1. Introduction
As the underlying topology of complex networks [1] and social networks [2], interconnection networks are becoming increasingly important. Many multiprocessor systems adopt an interconnection network as their underlying topology, and graphs are commonly used to model the processors and the links between them. As the number of processors in a system grows, the probability of node failures increases markedly. Faulty processors pose a substantial hazard to the system’s stability and to the integrity of its data. Consequently, the system must be equipped with algorithms capable of distinguishing faulty processors from fault-free ones. The procedure of making this distinction is referred to as fault diagnosis, and it is a particularly important research area for maintaining system stability.
At present, fault diagnosis methods fall into two groups: system-level diagnosis and nonsystem-level diagnosis. Nonsystem-level diagnosis relies mainly on hardware testing; it is inefficient, requires large test resources, and is prone to errors, especially in large-scale network systems. System-level diagnosis uses the network’s own processors to perform self-diagnosis. This strategy does not intrude on the system’s operational state, is highly efficient, and has a low error rate, which has led to its widespread application. Within system-level diagnosis, a variety of diagnosis models have been proposed to study the diagnosability of multiprocessor systems. The PMC model, formulated by Preparata et al. [3], is based on mutual testing between adjacent processors. In contrast, the MM model, proposed by Maeng and Malek [4] and also known as the comparison model, assumes that each processor is compared with all of its adjacent processors. A considerable body of research uses the PMC model and the MM model as foundational frameworks [5,6].
Faults in multiprocessor systems can be classified into two categories: permanent faults and intermittent faults. Permanently faulty nodes remain faulty indefinitely, while intermittently faulty nodes alternate between faulty and fault-free states and eventually become permanently faulty. Because the faulty states of intermittently faulty nodes are inherently difficult to observe, this fault type has received insufficient attention in previous research. Accordingly, this paper studies intermittent fault diagnosis using a system-level model. A system is classified as
n-
-diagnosable if all faulty nodes are permanent faults and can be accurately identified without replacement, under the condition that the number of faulty nodes does not exceed
n. In circumstances where the number of intermittently faulty nodes is no greater than
, a system is considered
-fault diagnosable if a fault-free node is never incorrectly diagnosed as faulty in any syndrome. Lai et al. [
7] introduced conditional diagnosability, which stipulates that each processor must possess at least one fault-free neighbor. Peng et al. [
8] proposed the concept of good-neighbor fault diagnosability. Within the context of the PMC model and the MM model, researchers have extensively studied the good-neighbor fault diagnosability of a multitude of well-established networks [
9,
10]. Currently, for these multiprocessor systems, research on permanent fault diagnosis has achieved a relatively high level of maturity. However, as evidenced by
Table 1, there is a dearth of research on intermittent fault diagnosis. Mallela et al. proposed a method for determining the intermittent fault diagnosability of any given system [
11]. Liang and Feng et al. provided the necessary and sufficient conditions for the diagnosability of crisp three-cycle networks under the PMC model [
12].
Existing studies on intermittent faults have difficulty handling some typical interconnection network topologies, such as the augmented cube. The
n-dimensional augmented cube
is a variant of the
n-dimensional hypercube
developed by Choudum and Sunitha [
15]. It has been documented that this structure not only preserves certain favorable properties of the hypercube but also possesses some embedded characteristics that are absent in the hypercube [
16,
17]. A substantial body of research has investigated the topological properties and diagnosability of augmented cubes [
18,
19]. Chang et al. [
20] showed that the conditional diagnosability of an
n-dimensional augmented cube
is
for
. In this paper, we evaluate the intermittent fault diagnosability of
under the PMC model. We show that the intermittent fault diagnosability of
is
for
. Furthermore, we propose an intermittent fault diagnosis algorithm applicable to augmented cube networks.
The unstable state of intermittently faulty nodes makes them more difficult to diagnose. A component composed of intermittently faulty nodes is considered to be in a faulty state if and only if it has no 0-path to any of its adjacent nodes. Based on this characteristic, Song et al. [21] proposed an intermittent fault diagnosis algorithm that uses a Depth-First Search (DFS) strategy. However, many of the tests between neighboring nodes in this algorithm are unnecessary. In [22], Ye et al. proposed a permanent fault diagnosis algorithm for Hamiltonian networks under the PMC model, named HTAD; this algorithm decomposes the Hamiltonian network into subpaths and then determines the node states from the properties of those subpaths. In [23], we modified the subpath partitioning step of the HTAD algorithm and proposed a corresponding fault diagnosis algorithm. In [24], Chen et al. extended the aforementioned algorithms to the PMC model and the MM model based on path topology and proposed corresponding fault diagnosis algorithms. The augmented cube studied in this paper is a Hamiltonian network, so we design a fast intermittent fault diagnosis algorithm based on this strategy, named IFDA.
The rest of this paper is structured as follows: In Section 2, we provide the terminology, lemmas, and theorems for system-level fault diagnosis. In Section 3, we supplement the properties of augmented cubes. In Section 4, we evaluate the intermittent fault diagnosability of the augmented cube. In Section 5, an intermittent fault diagnosis algorithm is proposed, and its performance is evaluated through simulation experiments in Section 6. Finally, we offer our conclusions.
2. Preliminaries
In the field of system-level fault diagnosis, multiprocessor systems are typically represented as an undirected graph , where is a node set and is an edge set. The node represents the processor, and the edge represents the communication link. , represents the adjacent nodes u and v. The set of neighbor nodes for node u is represented as , . A graph is a subgraph of , where and . The neighbor nodes of node x in a subgraph , denoted by , is the set of all nodes in S that are adjacent to node x. represents the degree of node x, and is the number of all nodes in S that are adjacent to node x, i.e., . Given a set of nodes , the neighboring nodes of U in is denoted by . The minimum degree in graph S are denoted as , and the maximum degree is denoted as , , . If , then graph S is called an m-regular graph.
Furthermore, the connectivity of graph G refers to the minimum number of nodes removed from G that causes G to disconnect, or causes only one node to remain, denoted by . Let and , if there is a bijective function g: such that if and only if , we say that G is isomorphic to , and is typically denoted by .
In graph theory, a Hamiltonian cycle of a graph G is a cycle that visits each vertex of G exactly once. A simple graph containing a Hamiltonian cycle is termed a Hamiltonian graph, and numerous well-known network structures exhibit this property. A subpath within a graph is characterized as a sequence of distinct vertices, denoted as where , with the condition that for each , the vertices and are adjacent. In this sequence, the vertex is referred to as the head, as the tail, as the tail−first vertex, and as the tail−last vertex. The length of a sequence is defined as the number of vertices contained within the sequence and is represented by .
In a multiprocessor system G under the PMC model, if node u is adjacent to node v, then node u can test whether node v is faulty or fault-free. The test result is represented by 0 or 1: a result of 0 indicates that node u has tested node v as fault-free, and a result of 1 indicates that node u has tested node v as faulty. If the testing node u is fault-free, the test result is reliable; if the testing node u is faulty, the test result is unreliable.
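The PMC test rule described above can be illustrated with a small simulation. This is only a sketch: the function name `pmc_test` and its parameters are hypothetical, and modeling an unreliable result as a uniformly random bit is an assumption made for illustration (the model itself only says that the result is arbitrary).

```python
import random

def pmc_test(tester_faulty, testee_faulty, rng=random):
    """Return the result of one PMC test: 0 means 'fault-free', 1 means 'faulty'.

    A fault-free tester always reports the true state of the testee.
    A faulty tester is unreliable; here (assumption) it reports a random bit.
    """
    if not tester_faulty:
        return 1 if testee_faulty else 0
    return rng.choice((0, 1))

# A fault-free tester gives reliable results.
print(pmc_test(False, False))  # 0
print(pmc_test(False, True))   # 1
```

A result produced by a faulty tester carries no information, which is why a diagnosis algorithm has to reason over the whole syndrome rather than over single tests.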
In augmented cubes, the system is modeled by an undirected graph
. For a node
u in graph
G, assume
v is an adjacent node in
, and the undirected link is represented by
. Based on the PMC model, the adjacent nodes will evaluate each other. We use
to represent the diagnosis results, where
is the result of
, and
is the result of
. Combined with intermittent fault characterization, four possible syndromes could be generated. If
, called 0-path,
u and
v have the same correctness; otherwise, it is called 1-path, and at least one of the two nodes is in a fault state. The results are shown in
Table 2.
The term round is used to describe one iteration of testing the graph. Because intermittently faulty processors are unstable, the system (graph) must be tested for multiple rounds, and the syndromes of all rounds are combined to generate the final syndrome. Assuming the graph is tested for n rounds, denotes the result of edge in round i, and the final test result is generated by function : |||……|.
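The combination of rounds can be sketched in a few lines of Python. The sketch rests on two assumptions that the extracted text does not state explicitly: the per-round results of one test are combined with a bitwise OR (as the `|` symbols above suggest), and an edge counts as a 0-path only when the combined results in both directions are 0. The function names are hypothetical.

```python
def combine_rounds(round_results):
    """OR together the 0/1 results that one test produced over several rounds."""
    final = 0
    for result in round_results:
        final |= result
    return final

def edge_label(results_u_to_v, results_v_to_u):
    """Label an undirected edge from the multi-round results of both directions.

    Assumption: the edge is a 0-path only if both combined results are 0.
    """
    pair = (combine_rounds(results_u_to_v), combine_rounds(results_v_to_u))
    return "0-path" if pair == (0, 0) else "1-path"

print(combine_rounds([0, 0, 1, 0]))          # 1
print(edge_label([0, 0, 0], [0, 0, 0]))      # 0-path
print(edge_label([0, 1, 0], [0, 0, 0]))      # 1-path
```

Under this OR rule a single failing round is enough to turn an edge into a 1-path, so adding rounds can only reveal more faults, never hide them.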
Let
,
=
; Dahbura and Masson [
25] present sufficient conditions to determine whether a system is
-diagnosable.
Definition 1 ([3]). A system is n--diagnosable if all faulty nodes are permanent faults and can be correctly identified without replacement, provided that the number of faulty nodes presented does not exceed n.
Lemma 1 ([25]). A system G is n--fault diagnosable if and only if for each pair of faulty set , ∈ with , ≤ n, and ≠. There is at least one test from subgraph to .
Lemma 2 ([11]). A system G is m--fault diagnosable without repair if and only if, given any two sets of units in the system, and , , ∈, , ≤ m, , and the set of the remaining units is such that both and receive at least one testing link from R.
Lemma 3 ([26]). If a system G is n--fault diagnosable and m--fault diagnosable, then the following inequality holds: .
3. Properties of Augmented Cubes
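The augmented cube is defined recursively as follows:
An n-dimensional augmented cube is denoted $AQ_n$. $AQ_1$ is the complete graph $K_2$, whose two vertices are labeled 0 and 1, respectively. When $n \geq 2$, $AQ_n$ is obtained by adding $2^n$ edges between two $(n-1)$-dimensional augmented cubes labeled $AQ_{n-1}^0$ and $AQ_{n-1}^1$, respectively, where $V(AQ_{n-1}^0) = \{0u_{n-1}\cdots u_1 : u_i \in \{0,1\}\}$, and $V(AQ_{n-1}^1) = \{1v_{n-1}\cdots v_1 : v_i \in \{0,1\}\}$. A node $u = 0u_{n-1}\cdots u_1$ of $AQ_{n-1}^0$ is adjacent to a node $v = 1v_{n-1}\cdots v_1$ of $AQ_{n-1}^1$ if and only if either (1) $u_i = v_i$ for $1 \leq i \leq n-1$, in which case $(u, v)$ is called a hypercube edge, or (2) $u_i = \overline{v_i}$ for $1 \leq i \leq n-1$, in which case $(u, v)$ is called a complement edge.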
Figure 1 shows
,
, and
. The number of nodes in an n-dimensional augmented cube is $2^n$, which follows directly from its definition (e.g., the 5-dimensional augmented cube contains 32 nodes, and the 6-dimensional augmented cube contains 64 nodes).
According to the above definition, it is recorded as = . We use L for and R for . That is, = L ⊕ R. We call the edge between L and R the cross edge, and we call L and R the subcube. Obviously each vertex in has two cross edges, i.e., a vertex in L has two vertices in R and vice versa. Sometimes we also use and to represent and , where . Obviously, is -regular. For any n-bit binary string u = . We use to denote the binary string and to denote the binary string .
Above, we gave a recursive definition of an n-dimensional augmented cube; however, the recursive definition does not represent the structural characteristics of an augmented cube well. Below, we give a nonrecursive definition of an n-dimensional augmented cube.
Definition 2. The set of points of is an array of n-elements defined on the set , i.e., . There are edges between the two vertices of , u = and v = , if and only if there is a positive integer , v = , or, there is a positive integer , .
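The nonrecursive definition lends itself to a direct construction. The sketch below assumes the standard definition of Choudum and Sunitha [15]: two n-bit labels are adjacent if they differ in exactly one bit, or if one is obtained from the other by complementing its l lowest bits for some l between 2 and n. Encoding labels as Python integers and the name `augmented_cube` are choices made here for illustration only.

```python
def augmented_cube(n):
    """Build the n-dimensional augmented cube as {node: set_of_neighbors}.

    Nodes are the integers 0 .. 2**n - 1, read as n-bit binary strings.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    adj = {u: set() for u in range(2 ** n)}
    for u in adj:
        for l in range(n):
            adj[u].add(u ^ (1 << l))          # flip a single bit
        for l in range(2, n + 1):
            adj[u].add(u ^ ((1 << l) - 1))    # complement the l lowest bits
    return adj

adj = augmented_cube(4)
print(len(adj))                                   # 16 nodes
print({len(neighbors) for neighbors in adj.values()})  # {7}, i.e. (2n - 1)-regular
```

With this encoding, the two subcubes L and R are the labels whose highest bit is 0 and 1, respectively, and each node has exactly two neighbors in the other subcube: one reached by flipping the highest bit and one reached by complementing all n bits.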
Property 1 ([
15])
. , and for . Property 2 ([
27])
. Let , where and , and u and v are two nodes in L. Then, is a complement pair of if and only if they have common (more precisely, exactly two) neighbors in R. Moreover, if is not a complement pair of L, then u and v have no common neighbor. Lemma 4 ([
28])
. Let x and y be two arbitrary nodes in , where . Then, holds. Lemma 5 ([
28])
. Let be three nodes in , where . Then, holds. Lemma 6 ([
28])
. Let be four nodes in , where . Then, holds. Lemma 7 ([
28])
. Let be five nodes in , where . If contains two complement pairs among , then holds. Lemma 8 ([
28])
. Let be five nodes in , where . If contains at least one complement pair among , then holds. According to Lemma 4 to Lemma 8, the following lemma can be obtained:
Lemma 9. Let be five nodes in , where . Then, holds.
Proof of Lemma 9. The complete proof of Lemma 9 is presented in
Appendix A.1. □
Lemma 10 ([
28])
. Let be six nodes in , where . If contains three complement pairs among , then holds. Lemma 11. Let be six nodes in , where . If contains two complement pairs among , then holds.
Proof of Lemma 11. Let and be the two complement pairs of , where and . We assume that and . For the distribution of u and v, we consider the following cases: □
Case 1. Both
u and
v are in the same subcube. We assume that
(see
Figure 2a). By Lemma 6, we have
. By Lemma 4, we have
. Hence,
holds.
Case 2. One of
u and
v is in one subcube, and the remaining one is in the other subcube. We assume that
and
(see
Figure 2b). By Lemma 5, we have
.
. Hence,
holds.
By combining the above two cases, we complete the proof.
Lemma 12. Let be six nodes in , where . If contains at least one complement pair among , then holds.
Proof of Lemma 12. Let be the complement pair of , where and . We assume that and . For the distribution of , we consider the following cases: □
Case 1. All of
are in the same subcube. We assume that
(see
Figure 2c). By Lemma 9, we have
. By the fact that
, hence
holds.
Case 2. Three of
are in the same subcube, and the remaining one is in the other subcube. We assume that
and
(see
Figure 2d). By Lemma 6, we have
. By Lemma 4, we have
. Hence,
holds.
Case 3. Two of
are in the same subcube, and the remaining two are in the other subcube. We assume that
and
(see
Figure 2e). By Lemma 5, we have
.
. Hence,
holds.
By combining the above three cases, we complete the proof.
Lemma 13. Let be six nodes in , where . Then, holds.
Proof of Lemma 13. We use mathematical induction to conduct the proofs. The base case for can be verified according to its structure diagram, i.e., let be six nodes in ; then, holds. Assume that the lemma holds for , where . Now, we consider , where and . For the distribution of , we consider the following four cases: □
Case 1. All of are in the same subcube. We assume that . According to the definition of the augmented cube, L contains at most three complement pairs among . Hence, there are the following subcases:
Case 1.1. L contains three complement pairs among
. We assume that
,
, and
are the three complement pairs of
L (see
Figure 3a). By Lemma 10, we have that
. By Property 2,
. Hence,
. Therefore,
holds.
Case 1.2. L contains two complement pairs among
. We assume that
and
are the two complement pairs of
L (see
Figure 3b). By Lemma 11, we have that
. By Property 2,
. Because
,
. Therefore,
holds.
Case 1.3. L contains only one complement pair among
. We assume that
is the complement pair of
L (see
Figure 3c). By Lemma 12, we have that
. By Property 2,
. Because
,
. Therefore,
holds.
Case 1.4. L contains no complement pair among
(see
Figure 3d). By the induction hypothesis, we have that
. Because
, hence
. Therefore,
holds.
Case 2. Five of
are in one subcube, and the remaining one is in the other subcube. We assume that
and
(see
Figure 3e). By Lemma 9, we have
. By Property 1, we have
. Therefore,
holds.
Case 3. Four of
are in one subcube, and the remaining two are in the other subcube. We assume that
and
(see
Figure 3f). By Lemma 6, we have
. By Lemma 4, we have
. Therefore,
holds.
Case 4. Three of are in one subcube, and the remaining three are in the other subcube. We assume that and . By Lemma 5, we have , . Therefore, holds.
By combining the above four cases, we complete the proof.
5. IFDA for Augmented Cubes
In this section, we aim to design a fast and accurate intermittent fault diagnosis algorithm for augmented cube networks, named the intermittent fault diagnosis algorithm (IFDA). The augmented cube is a Hamiltonian network, so all of its nodes lie on a Hamiltonian cycle. The proposed algorithm employs the cycle partitioning strategy introduced in [23] to decompose the target network into a series of subpaths. It then determines the states of the nodes within certain subpaths based on their inherent properties. Finally, the DFS strategy is invoked to diagnose the states of the remaining unknown nodes, and isolated nodes without a 0-path are identified as faulty. The cycle partitioning algorithm is detailed in Algorithm 1, followed by an analysis of the subpath properties.
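One simple way to obtain a Hamiltonian cycle of an augmented cube, for n at least 2, is the reflected Gray code: consecutive code words differ in exactly one bit, so they are joined by hypercube-type edges, which the augmented cube contains. The sketch below illustrates this observation and is not necessarily the cycle used in the experiments; the function names are hypothetical, and adjacency is checked with the standard bit-flip and low-block-complement rule.

```python
def gray_code_cycle(n):
    """Return all 2**n node labels in reflected Gray code order."""
    return [i ^ (i >> 1) for i in range(2 ** n)]

def is_augmented_cube_edge(u, v, n):
    """Check whether labels u and v are adjacent in the n-dimensional augmented cube."""
    diff = u ^ v
    single_bit = {1 << l for l in range(n)}              # differ in one bit
    low_block = {(1 << l) - 1 for l in range(2, n + 1)}  # l lowest bits complemented
    return diff in single_bit or diff in low_block

n = 4
cycle = gray_code_cycle(n)
closed = all(
    is_augmented_cube_edge(cycle[i], cycle[(i + 1) % len(cycle)], n)
    for i in range(len(cycle))
)
print(closed)  # True: consecutive labels, including last-to-first, are adjacent
```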
Algorithm 1 Cycle-Partitioning Algorithm Based on the PMC Model [23]
Input: An N-node cycle with syndrome based on the PMC model.
Output: The set of subpaths .
Step 1: Choose a 0-path followed by a 1-path in the clockwise direction.
Step 2: Let a be the edge following . If a is a 0-path, update to a and repeat Step 2; otherwise, go to Step 3.
Step 3: Mark with an X the edge following a. If it was not previously marked, set as the next edge of the X-marked edge and go to Step 2; otherwise, the algorithm terminates.
It should be noted that if no starting node satisfies the condition in Step 1, all nodes are deemed to be fault-free. Therefore, this scenario is not considered in the remainder of this paper. For the algorithm, Property 3 below is consistent with the corresponding property in [23]. When the test result between two adjacent nodes corresponds to a 1-path, at least one of these two nodes is faulty. Consequently, the following Property 4 also holds.
Property 3 ([23]). The test results for all subpaths always follow the following pattern: , where .
Property 4. In every subpath created by the cycle-partition algorithm, there exists at least one intermittent faulty node.
Property 5 ([22]). There are subpaths with lengths greater than or equal to by the Pigeonhole principle, where s is the number of subpaths.
It is observed that the states of all nodes in the subpaths, excluding the tail node, must exhibit consistency. Therefore, the subpaths are categorized into two types based on the state of the head node: Type-I subpaths and Type-II subpaths. The head node in a Type-I subpath is in a fault-free state, whereas the head node in a Type-II subpath is in a faulty state.
Lemma 15 ([23]). Let t be the number of intermittent faulty nodes in an N-node cycle system, and let s be the number of subpaths. If a subpath satisfies the inequality , then it is classified as a Type-I subpath.
In [22,23,24], the evaluation methods for the fault bound are essentially consistent, yet [24] presents a more concise evaluation approach. According to the fault bound evaluation method in [24], Lemma 16 restates the relationship between the fault bound T and the number of nodes N.
Lemma 16 ([22]). Let n and l be non-negative integers. In a path with N () nodes, the fault bound under the PMC model is calculated using the following formula: where , and . If , then, ; if , then ; otherwise, .
Subsequently, we introduce Algorithm 2 for diagnosing the status of all nodes in the system. In the first step of the intermittent fault diagnosis algorithm, the cycle-partition strategy is invoked to decompose the cycle into a series of subpaths. Then, Lemma 16 is applied to select the Type-I subpaths that meet the criteria. In each subpath identified in this step, all nodes except the tail node are identified as fault-free. Next, based on the syndrome of the cycle, the fault-free nodes in set are employed to determine the unknown states of nodes in set F. Finally, the Depth-First Search (DFS) algorithm is invoked on each unknown node to group the unknown nodes that are connected to it by 0-paths into a component. If this component has a 0-path connection with the fault-free nodes, the nodes within this component are fault-free; otherwise, they are faulty and are kept in the set F.
Algorithm 2 Intermittent Fault Diagnosis Algorithm (IFDA)
Input: A syndrome for an N-node cycle under the PMC model, and its intermittent fault diagnosability .
Output: The set of fault-free nodes, denoted as ; the set of fault nodes, denoted as F.
Step 1: Execute the cycle-partition algorithm (Algorithm 1) to map the cycle to subpaths , where .
Step 2: Determine the subpaths of Type-I according to Lemma 15 and the fault bound T in Lemma 16. If , then . Otherwise, move the tail node of subpath to the set F, and move the remaining nodes to the set .
Step 3: Based on the syndrome of the cycle, determine the status of neighbor nodes in F using the fault-free nodes in set .
Step 4: The DFS algorithm is invoked to partition nodes in set F that are interconnected via 0-paths into a component, denoted as ; if a node has no 0-path connection to any other node, it forms an isolated component.
Case 4.1: If , the nodes of the largest component constructed in Step 4 are fault-free, while the remaining nodes are faulty [21].
Case 4.2: If , all nodes in the component are classified as fault-free if a 0-path connection exists between and , ; otherwise, they are marked as faulty, .
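The component-building idea of Step 4 can be sketched as follows for the situation in which some fault-free nodes are already known. The data layout is an assumption made for illustration: `unknown` is the collection of undiagnosed nodes, `fault_free` the nodes already diagnosed as fault-free, and `zero_edges` a list of node pairs joined by a 0-path. Both function names are hypothetical, and the sketch does not reproduce the full algorithm.

```python
def zero_path_components(unknown, zero_edges):
    """Group the unknown nodes into components connected by 0-paths (iterative DFS)."""
    adj = {u: set() for u in unknown}
    for u, v in zero_edges:
        if u in adj and v in adj:
            adj[u].add(v)
            adj[v].add(u)
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components

def classify_components(unknown, fault_free, zero_edges):
    """Mark a component fault-free if a 0-path links it to a known fault-free node."""
    good, faulty = set(fault_free), set()
    for component in zero_path_components(unknown, zero_edges):
        linked = any(
            (u in component and v in fault_free) or (v in component and u in fault_free)
            for u, v in zero_edges
        )
        (good if linked else faulty).update(component)
    return good, faulty

good, faulty = classify_components(
    unknown=[3, 4, 5, 6, 7],
    fault_free={1, 2},
    zero_edges=[(1, 2), (2, 3), (3, 4), (5, 6)],
)
print(sorted(good))    # [1, 2, 3, 4]
print(sorted(faulty))  # [5, 6, 7]
```

Node 7 has no 0-path at all, so it forms an isolated component and ends up in the faulty set, as the description of Step 4 requires.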
6. Simulation Experiment
In this section, the performance of the IFDA is evaluated through simulation experiments. Prior research [22,23,24] has demonstrated the efficacy and performance of diagnosis algorithms based on a network decomposition strategy. Therefore, this experiment examines how effectively the algorithm diagnoses intermittently faulty nodes and evaluates the impact of the relevant parameters on the final diagnostic outcomes. In the experiments, we assume that the intermittently faulty nodes are randomly distributed. The IFDA is implemented in Python and executed with Python 3.10. To mitigate experimental errors arising from random functions, the mean value over 10,000 algorithm runs is taken as the experimental outcome.
For the simulation experiment, three metrics are used to evaluate the final performance of the algorithm: Detection Accuracy, Precision, and Recall (also known as True Positive Rate, TPR). Accuracy’ has the same definition as the standard Accuracy, but it is calculated after the third step of the intermittent fault diagnosis algorithm (IFDA) has finished. In a multiprocessor system, the number of test rounds between adjacent nodes has a direct impact on the syndrome. Additionally, the probability that an intermittently faulty node exhibits a failure, denoted by Ratio in this experiment, is another key parameter that affects the syndrome. Therefore, this paper assesses how both the Ratio and the number of test rounds influence the experimental results. The relevant evaluation parameters are defined as follows:
Detection Accuracy (Accuracy, DA) [21]: The ratio of the number of nodes that are correctly diagnosed to the total number of nodes.
Recall/TPR [21]: The ratio of the number of faulty nodes that are correctly diagnosed to the total number of faulty nodes.
Precision [21]: The ratio of the number of faulty nodes that are correctly diagnosed to the total number of diagnosed faulty nodes.
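The three definitions translate directly into set operations. The following sketch uses a hypothetical function name and arguments; returning 1.0 when a denominator is zero is a convention chosen here, not something the definitions specify.

```python
def diagnosis_metrics(true_faulty, diagnosed_faulty, all_nodes):
    """Return (accuracy, recall, precision) for one diagnosis result."""
    true_faulty = set(true_faulty)
    diagnosed_faulty = set(diagnosed_faulty)
    all_nodes = set(all_nodes)

    # Faulty nodes that were also diagnosed as faulty.
    hits = len(true_faulty & diagnosed_faulty)
    # A node is correctly diagnosed when its diagnosed state equals its true state.
    correct = sum((v in true_faulty) == (v in diagnosed_faulty) for v in all_nodes)

    accuracy = correct / len(all_nodes)
    recall = hits / len(true_faulty) if true_faulty else 1.0              # convention
    precision = hits / len(diagnosed_faulty) if diagnosed_faulty else 1.0  # convention
    return accuracy, recall, precision

# Ten nodes, four truly faulty, two of them detected and no false alarms.
print(diagnosis_metrics({1, 2, 3, 4}, {1, 2}, range(10)))  # (0.8, 0.5, 1.0)
```

In this example Precision is 1.0 because no fault-free node is labeled faulty, while Recall is only 0.5 because two faulty nodes remain undetected.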
To begin with, the experiment shown in Table 3 evaluates how the algorithm performs in augmented cubes of different dimensions, and the data from Table 3 are depicted in Figure 4. The Precision results show that the algorithm does not produce false diagnoses, that is, it does not label fault-free nodes as faulty. Moreover, as the dimension of the augmented cube increases, both the
and
parameters show a steady increase. The reason for the increase in Accuracy is that in higher-dimensional networks, each node has more neighboring nodes, which raises the probability of 0-path connections between adjacent nodes. It is important to note that the decrease in the Recall parameter is also due to this same factor. The improvement in Accuracy is a result of the growing difference between the fault bound of the intermittent fault diagnosis algorithm (
IFDA) and the intermittent fault diagnosability. This growing difference enables more subpaths to meet the Type-I criteria set out in Lemma 15.
Furthermore, considering the influence of the number of tests conducted between adjacent nodes on the diagnostic outcomes, the corresponding experimental results are presented in Table 4 and Figure 5a. In this experiment, the dimension of the network is set to 10, and the probability of intermittent faulty nodes exhibiting failures is assumed to be 80%. The experimental results reveal that the Precision and Accuracy’ parameters remain relatively constant. As the number of rounds increases, the Accuracy parameter gradually improves and eventually reaches 100%. Additionally, the Recall parameter increases rapidly with the number of rounds, because repeated tests raise the probability that an intermittently faulty node exhibits a failure state. It can be concluded that increasing the number of tests between adjacent nodes significantly improves the accuracy of the algorithm’s diagnostic results.
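A back-of-the-envelope calculation helps explain why Recall rises so quickly with the number of rounds. The sketch assumes that an intermittently faulty node exhibits a failure independently in each round with a fixed probability; this independence is an assumption made for illustration and is not stated in the experiment description. The function name is hypothetical.

```python
def prob_fault_shown(ratio, rounds):
    """Probability that a node shows its fault in at least one of `rounds` rounds,

    assuming independent rounds with per-round failure probability `ratio`.
    """
    return 1.0 - (1.0 - ratio) ** rounds

for rounds in (1, 2, 3, 5):
    print(rounds, round(prob_fault_shown(0.8, rounds), 5))
```

With a per-round probability of 80%, the chance that a faulty node never shows a failure drops from 20% after one round to under 1% after three rounds, which matches the qualitative trend described above.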
Finally, the simulation results shown in Table 5 evaluate the impact of the probability of intermittent faulty nodes exhibiting failures on the experimental results, and the data from Table 5 are illustrated in Figure 5b. In this experiment, the dimension of the augmented cube is set to 10, and the number of testing rounds between nodes is set to 3. Because the system contains a relatively large number of fault-free nodes, the parameters Accuracy and Accuracy’ exhibit minimal changes. Meanwhile, the value of Precision remains consistently at 100%. As the Ratio parameter increases, intermittent faulty nodes have a higher probability of exhibiting failure states, leading to a gradual increase in the Recall parameter. The experimental results demonstrate that the probability of intermittent faulty nodes exhibiting failures significantly influences the diagnostic outcomes of the algorithm.