Article

Immunity-Based Diagnosis for a Motherboard

1 Department of Information Network and Communication, Kanagawa Institute of Technology, 1030 Shimo-ogino, Atsugi, Kanagawa 243-0292, Japan
2 Department of Knowledge-Based Information Engineering, Toyohashi University of Technology, 1-1, Tempaku, Toyohashi, Aichi 441-8580, Japan
* Author to whom correspondence should be addressed.
Sensors 2011, 11(4), 4462-4473; https://doi.org/10.3390/s110404462
Submission received: 18 February 2011 / Revised: 29 March 2011 / Accepted: 14 April 2011 / Published: 18 April 2011
(This article belongs to the Section Physical Sensors)

Abstract: We have utilized immunity-based diagnosis to detect abnormal behavior of components on a motherboard. The immunity-based diagnostic model monitors voltages of some components, CPU temperatures, and fan speeds. We simulated abnormal behaviors of some components on the motherboard, and we utilized the immunity-based diagnostic model to evaluate motherboard sensors in two experiments. These experiments showed that the immunity-based diagnostic model was an effective method for detecting abnormal behavior of components on the motherboard.

1. Introduction

The technology of cloud computing has become prevalent, and the demand for data centers that provide such cloud computing has increased. Each server in the data center must be highly available for data processing and data transmission. To maintain system availability, it is important to detect equipment abnormalities during their early stages, before system failure. The simplest way of diagnosing abnormalities consists of evaluating each component individually by comparing the output value of its sensor with a predetermined threshold value. However, it is difficult to identify the abnormal component using this method [1].

Another method of diagnosis uses an immunity-based diagnostic model [2–7], which is derived primarily from the concept of an immune system [8]. In a biological immune system, each immune cell can test other immune cells and can be tested by them, and the system protects against disease by identifying and eliminating nonself entities (i.e., pathogens). Similarly, in our diagnostic model, mutual tests are performed among nodes (i.e., sensors), which protects against system failure by identifying abnormal nodes. Because the features of our diagnostic model resemble those of the biological immune system, it is called the immunity-based diagnostic model. This diagnostic model has been applied to node fault diagnosis in processing plants [9], to self-monitoring/self-repairing in distributed intrusion detection systems [3], and to sensor-based diagnostics for automobile engines [4]. This paper reports on the use of an immunity-based diagnostic model for detecting the abnormal behavior of components on a motherboard, including CPUs, memory, chipsets, and fans.

2. Embedded Sensors on the Motherboard

Since a motherboard has multiple sensors, including voltage, temperature, and fan speed sensors, abnormalities on the motherboard can be detected by monitoring these sensors. We therefore used sensor output values for diagnosis of the motherboard.

We collected sensor output values on a server from July 27th to September 18th. The specifications of the server are shown in Table 1. The average air temperature during that period was 25.3 °C, ranging from 20.1 °C to 32.8 °C. Data were collected using lm_sensors, a hardware health monitoring package for Linux that allows information to be obtained from temperature, voltage, and fan speed sensors.
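As a rough illustration of how such readings could be gathered into a program, the sketch below parses text in the style of the raw output mode of lm_sensors (`sensors -u`). It is only a sketch under stated assumptions: the paper does not describe its collection scripts, and the chip name, labels, and values in `SAMPLE`, as well as the helper name `parse_raw_sensors`, are hypothetical.

```python
import re

# Hypothetical sample in the style of `sensors -u` raw output; the chip name,
# labels, and values are invented for illustration.
SAMPLE = """\
examplechip-isa-0290
Adapter: ISA adapter
fan5:
  fan5_input: 1034.000
temp1:
  temp1_input: 18.000
  temp1_max: 60.000
"""

# An indented "name: number" line holds one subfeature reading.
SUBFEATURE = re.compile(r"^\s+(\w+):\s+(-?\d+(?:\.\d+)?)\s*$")


def parse_raw_sensors(text):
    """Map (chip, label) to a dict of subfeature name -> numeric value."""
    readings = {}
    chip = None
    label = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line separates chips.
            chip, label = None, None
            continue
        if chip is None:
            # The first non-blank line of a block names the chip.
            chip = line.strip()
            continue
        if line.startswith("Adapter:"):
            continue
        match = SUBFEATURE.match(line)
        if match and label is not None:
            readings[(chip, label)][match.group(1)] = float(match.group(2))
        elif not line.startswith(" ") and line.rstrip().endswith(":"):
            # An unindented "label:" line starts a new feature.
            label = line.rstrip()[:-1]
            readings[(chip, label)] = {}
    return readings


readings = parse_raw_sensors(SAMPLE)
print(readings[("examplechip-isa-0290", "fan5")]["fan5_input"])
```

In practice the text would come from running the tool on the monitored server at regular intervals; the sampling interval used in the study is not stated here.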

We collected the output values from all 29 sensors on the motherboard, from which we calculated the correlation coefficients of all sensors. The correlation coefficient C of a set of sensor data $(x, y) = \{(x_i, y_i) \mid i = 1, 2, \ldots, n\}$ is given by the following equation:

$$C = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
where:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
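The correlation coefficient defined above can be computed directly from two equally long series of sensor readings. The following is a minimal sketch using only the Python standard library; the function name `correlation` and the example data are illustrative and not taken from the paper.

```python
import math


def correlation(xs, ys):
    """Correlation coefficient C of paired samples (x_i, y_i)."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: product of the square roots of the summed squared deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("C is undefined for a constant series")
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))


# Illustrative data: a perfectly linear relationship and its reverse.
print(correlation([1, 2, 3], [2, 4, 6]))
print(correlation([1, 2, 3], [3, 2, 1]))
```

A nearly constant series, such as a voltage that barely changes, makes the denominator very small, so its coefficient is dominated by noise and should be read with care.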

We observed correlations among five sensors (Table 2), and test cases for evaluation are easy to devise for these five sensors. Therefore, we used these five sensors for evaluation.

3. Immunity-Based Diagnostic Model

The immunity-based diagnostic model has the features of a dynamic network [7], in which diagnoses are performed by mutually testing nodes, i.e., sensors, and by dynamically propagating their active states. In this paper, the targets of the immunity-based diagnosis are components with a sensor embedded on a motherboard. Each sensor can test linked sensors and can be tested by linked sensors. Each sensor is assigned a state variable Ri indicating its credibility.

The initial value of credibility Ri (0) is 1. The aim of the diagnosis is to decrease the credibility of all the abnormal sensors. If the credibility of a sensor is less than a threshold value, the sensor is considered abnormal in this model.

When the value of credibility Ri is between 0 and 1, the model is called a gray model, reflecting the ambiguous nature of credibility. The gray model is formulized by the equation:

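$$\frac{d r_i(t)}{dt} = \sum_j T_{ji}^{+} R_j(t) - r_i(t)$$
where:
$$R_i = \frac{1}{1 + \exp(-r_i(t))}$$
$$T_{ij}^{+} = \begin{cases} T_{ij} + T_{ji} - 1, & \text{if an evaluation from } i \text{ to } j \text{ or from } j \text{ to } i \text{ exists}, \\ 0, & \text{if neither an evaluation from } i \text{ to } j \text{ nor from } j \text{ to } i \text{ exists}, \end{cases}$$
$$T_{ij} = \begin{cases} 1, & \text{if a balance formula between sensors } i \text{ and } j \text{ is satisfied}, \\ -1, & \text{if a balance formula between sensors } i \text{ and } j \text{ is not satisfied}, \\ 0, & \text{if there is no balance formula between sensors } i \text{ and } j. \end{cases}$$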

Equation (3) controls the commitment of node i by determining the variable ri(t) from the evaluations to and from node i and from the active or inactive states of the nodes j that evaluate it and are evaluated by it. On the right-hand side of Equation (3), the first term is the sum of the evaluations of node i by the other nodes, and the second term is an inhibition term that maintains ambiguous states of credibility. The activeness of each node i is expressed by a continuous, time-dependent variable ri ∈ (−∞, ∞) or by its normalization Ri ∈ [0, 1], where Ri = 1 means fully active and Ri = 0 means fully inactive.
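To make the dynamics concrete, the gray model can be integrated numerically. The sketch below uses a forward Euler scheme, which is an assumption: the paper does not state how the differential equation is solved. The step size `dt`, the number of steps, and the finite starting value `r0` (standing in for the initial credibility of 1, which strictly corresponds to an unbounded ri) are likewise illustrative choices, and `diagnose` is a hypothetical name.

```python
import math


def sigmoid(r):
    """Normalize r_i in (-inf, inf) to a credibility R_i in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-r))


def diagnose(t_plus, r0=5.0, dt=0.05, steps=2000):
    """Integrate dr_i/dt = sum_j T+_ji R_j - r_i and return the credibilities.

    t_plus[j][i] holds T+_ji. r0, dt, and steps are assumptions of this sketch.
    """
    n = len(t_plus)
    r = [r0] * n
    for _ in range(steps):
        big_r = [sigmoid(v) for v in r]
        dr = [
            sum(t_plus[j][i] * big_r[j] for j in range(n) if j != i) - r[i]
            for i in range(n)
        ]
        r = [ri + dt * d for ri, d in zip(r, dr)]
    return [sigmoid(v) for v in r]


def fully_connected(n, failing=None):
    """T+ for n mutually testing nodes; every test involving `failing` is violated."""
    # Satisfied in both directions: 1 + 1 - 1 = 1. Violated: -1 - 1 - 1 = -3.
    return [
        [0 if i == j else (-3 if failing in (i, j) else 1) for j in range(n)]
        for i in range(n)
    ]


healthy = diagnose(fully_connected(5))
one_bad = diagnose(fully_connected(5, failing=4))
print([round(c, 2) for c in healthy])
print([round(c, 2) for c in one_bad])
```

In the second run only the node whose tests all fail is driven toward zero credibility, while the nodes that still agree with each other keep a high credibility, which is the behavior the mutual tests are meant to produce.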

In this model, equilibrium points satisfy $r_i(t) = \sum_j T_{ji}^{+} R_j(t)$. Thus Ri monotonically reflects the value of $\sum_j T_{ji}^{+} R_j(t)$. If $\sum_j T_{ji}^{+} R_j(t)$ is close to 0, then Ri is close to 0.5. The balance formulas are shown in Table 3. We determined the balance formulas by trial and error from the relationships among the output values of the sensors. The flowchart of the diagnostic model is shown in Figure 1.
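The mapping from balance formulas to test results can be sketched as follows. The coefficients and bounds are transcribed from Table 3, each read as |a − scale × b| ≤ bound, and the sample values are those of test case 2 in Table 5. The names `BALANCE`, `balance_test`, and `combined_weight` are hypothetical, and treating each formula as a symmetric test between the two sensors is an assumption of this sketch.

```python
# Balance formulas from Table 3, stored as (a, b): (scale, bound),
# meaning |a - scale * b| <= bound.
BALANCE = {
    ("CPU1", "Core2"): (1.0, 26),
    ("CPU1", "VcoreA"): (25.0, 20),
    ("CPU1", "Vbat"): (9.0, 18),
    ("CPU1", "Fan5"): (1 / 34, 18),
    ("Core2", "VcoreA"): (45.5, 28),
    ("Core2", "Vbat"): (16.0, 20),
    ("Core2", "Fan5"): (1 / 19, 21),
    ("VcoreA", "Vbat"): (1 / 2.8, 0.05),
    ("VcoreA", "Fan5"): (1 / 893, 0.07),
    ("Vbat", "Fan5"): (1 / 316, 0.07),
}

# Test case 2 from Table 5 (fan speed is very high).
CASE_2 = {"CPU1": 9, "Core2": 35, "VcoreA": 1.12, "Vbat": 3.23, "Fan5": 2000}


def balance_test(a, b, values):
    """Return T_ab: 1 if the formula holds, -1 if it is violated, 0 if none exists."""
    if (a, b) in BALANCE:
        scale, bound = BALANCE[(a, b)]
        first, second = values[a], values[b]
    elif (b, a) in BALANCE:
        # Assumption: the same formula serves as the test in both directions.
        scale, bound = BALANCE[(b, a)]
        first, second = values[b], values[a]
    else:
        return 0
    return 1 if abs(first - scale * second) <= bound else -1


def combined_weight(a, b, values):
    """Return T+_ab = T_ab + T_ba - 1 when a test exists, else 0."""
    t_ab = balance_test(a, b, values)
    t_ba = balance_test(b, a, values)
    if t_ab == 0 and t_ba == 0:
        return 0
    return t_ab + t_ba - 1


for other in ("CPU1", "Core2", "VcoreA", "Vbat"):
    print("Fan5", other, balance_test("Fan5", other, CASE_2))
```

With these values every formula involving Fan5 is violated, so each of its links contributes a weight of −3, whereas a satisfied pair such as Core2 and VcoreA contributes +1.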

4. Evaluations of Immunity-Based Diagnosis of the Motherboard

We evaluated the immunity-based diagnostic model for motherboard sensors in two experiments. In the first experiment, we compared two diagnostic models: a standalone diagnostic model and a mutual diagnostic model, i.e., an immunity-based diagnostic model. In the second experiment, we compared two networks in the immunity-based diagnostic model: a fully-connected network and a correlation-based network. We determined the normal ranges from the balance formulas; Table 4 shows these normal ranges. Each evaluation was based on the four test cases shown in Table 5, and the values in the test cases were based on the range of sensor output values shown in Table 2 and the normal ranges shown in Table 4.

Test cases 1 and 2 assumed that the speed of Fan5 was far outside the range shown in Table 2. A significant decrease in fan speed would cause the CPU temperature to rise, with the overheated CPU causing the server to crash. Conversely, a significant increase in fan speed would waste power and shorten the life span of the fan. In addition, the output values of the sensors were far outside the range shown in Table 4. Therefore, we determined that test cases 1 and 2 are abnormal.

Test cases 3 and 4 assumed that the output values of the sensors were slightly outside the range shown in Table 2. Test case 3 assumed that the speed of Fan5 was slightly higher than the range in Table 2, but that Fan5 was not abnormal. Test case 4 assumed that the temperature of CPU1 was slightly higher than the range in Table 2, but that CPU1 was not abnormal. Temperatures outside the range are not always abnormal, because these temperatures depend on room temperature. For example, the maximum temperature difference was 12.7 °C. In addition, the output values of the sensors were inside the range shown in Table 4. Therefore, we determined that test cases 3 and 4 are normal.

4.1. Standalone vs. Mutual Diagnosis

We evaluated a standalone diagnosis and a mutual diagnosis. According to the standalone diagnosis, a component is considered abnormal if the sensor output value is outside the range shown in Table 2. In contrast, mutual diagnosis uses the immunity-based diagnostic model.
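The standalone rule amounts to an independent range check per sensor. The sketch below applies the ranges of Table 2 to the four test cases of Table 5; the identifiers `RANGES`, `TEST_CASES`, and `standalone_credibility` are illustrative names rather than anything defined in the paper.

```python
SENSORS = ["CPU1", "Core2", "VcoreA", "Vbat", "Fan5"]

# Observed ranges from Table 2 (degrees C, degrees C, V, V, RPM).
RANGES = {
    "CPU1": (11.00, 48.00),
    "Core2": (35.00, 72.00),
    "VcoreA": (1.11, 1.19),
    "Vbat": (3.23, 3.26),
    "Fan5": (1012, 1044),
}

# Sensor output values of the four test cases in Table 5.
TEST_CASES = {
    1: {"CPU1": 70, "Core2": 65, "VcoreA": 1.12, "Vbat": 3.23, "Fan5": 200},
    2: {"CPU1": 9, "Core2": 35, "VcoreA": 1.12, "Vbat": 3.23, "Fan5": 2000},
    3: {"CPU1": 14, "Core2": 35, "VcoreA": 1.12, "Vbat": 3.23, "Fan5": 1050},
    4: {"CPU1": 50, "Core2": 60, "VcoreA": 1.12, "Vbat": 3.23, "Fan5": 1020},
}


def standalone_credibility(values):
    """Return 1 per sensor if its value lies within its own range, else 0."""
    result = []
    for sensor in SENSORS:
        low, high = RANGES[sensor]
        result.append(1 if low <= values[sensor] <= high else 0)
    return result


for case, values in TEST_CASES.items():
    print(case, standalone_credibility(values))
```

Because each sensor is judged in isolation, a value that is out of range for a benign reason, such as a warm room, is flagged just like a genuine fault.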

Tables 6 and 7 show the results of the standalone and mutual diagnoses, respectively. In Table 6, a credibility of 0 indicates that the output value was not within range, i.e., it was abnormal, whereas a credibility of 1 indicates that the output value was within range, i.e., it was normal. In Table 7 the credibility corresponds to Ri of Equation (2), i.e., it expresses the probability that component i is normal. We assumed that a component on the motherboard was abnormal if its credibility was less than 0.1. This threshold value was determined empirically by trial and error. A diagnosis of “X” indicates an abnormality, whereas a diagnosis of “O” indicates an absence of abnormality. An accuracy of “O” indicates a correct decision, an accuracy of “X” indicates an incorrect decision, and an accuracy of “P” indicates that the diagnostic model could not identify the abnormal component, although it detected multiple abnormalities.

The standalone diagnostic model detected abnormalities in all test cases, because every test case contains values outside the range. In test cases 1 and 2, the standalone diagnostic model failed to identify the abnormal component. This model also misdiagnosed test cases 3 and 4, judging them abnormal because the output values were slightly outside the range. In contrast, the mutual diagnosis model identified the abnormal fan in test case 2, since only the credibility of Fan5 was 0.00. In test case 3, the mutual diagnosis made a correct decision. Consequently, the mutual diagnosis model is more accurate than the standalone diagnosis model.

4.2. Fully-Connected Network vs. Correlation-Based Network

The immunity-based diagnostic model contains a network for mutually testing the credibility of nodes. In the above section, the network of the immunity-based diagnostic model was fully-connected, with each sensor connected to all other sensors, and each sensor mutually tested by all other sensors. A fully-connected network can include some connections between sensors with weakly correlated output values. These connections may be unreliable for mutually testing the credibility of their sensors. Therefore, we removed such connections from a fully-connected network, forming a correlation-based network.

We used the immunity-based diagnostic model to evaluate two network models, a fully-connected network and a correlation-based network. Figure 2 shows the correlation coefficients among the 5 sensors in Table 2. Any pair of sensors with a correlation greater than a threshold value was defined as connected. In this experiment, we built correlation-based networks for all the thresholds, using the correlation coefficients shown in Figure 2. Typical correlation-based networks are shown in Figure 3.
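Building a correlation-based network reduces to keeping the sensor pairs whose coefficient exceeds the threshold and noting which sensors end up with no connection. The sketch below illustrates this; the coefficients in `EXAMPLE_CORR` are hypothetical placeholders and are not the values shown in Figure 2, and the function names are likewise illustrative.

```python
SENSORS = ["CPU1", "Core2", "VcoreA", "Vbat", "Fan5"]

# Hypothetical coefficients for illustration only; NOT the values in Figure 2.
# Pairs that are not listed are treated as having no usable correlation.
EXAMPLE_CORR = {
    ("CPU1", "Core2"): 0.90,
    ("CPU1", "Fan5"): 0.60,
    ("Core2", "Fan5"): 0.70,
    ("CPU1", "Vbat"): 0.30,
    ("VcoreA", "Vbat"): 0.10,
}


def build_network(corr, threshold):
    """Return the set of sensor pairs whose correlation exceeds the threshold."""
    return {frozenset(pair) for pair, c in corr.items() if c > threshold}


def isolated_sensors(sensors, edges):
    """Return the sensors that take part in no edge of the network."""
    connected = set().union(*edges) if edges else set()
    return [s for s in sensors if s not in connected]


for threshold in (0.05, 0.50, 0.95):
    edges = build_network(EXAMPLE_CORR, threshold)
    print(threshold, len(edges), isolated_sensors(SENSORS, edges))
```

Raising the threshold removes weak links first, so the network shrinks toward its most strongly correlated core and eventually leaves every sensor isolated.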

All test cases were the same as those in Table 5. Table 8 shows the results of correlation-based networks. A network with a threshold less than 0.01 was identical to a fully-connected network, whereas a network with a threshold greater than 0.90 had no connection between any pair of sensors, i.e., a diagnostic model with a threshold greater than 0.90 was identical to a standalone diagnostic model. These diagnostic models were evaluated in the previous section.

In Table 8(A), the diagnostic model with a threshold of 0.01 misidentified the normal CPU1 in test cases 1 and 4. In Table 8(B), the diagnostic model with a threshold of 0.40 misidentified the normal CPU1 in test cases 1, 2 and 4. In Table 8(C), the diagnostic model with a threshold of 0.52 identified the abnormal fan in test cases 1 and 2, and did not falsely identify an abnormality in test case 3, but misidentified the normal CPU1 in test case 4 as abnormal. In Table 8(D,E), the diagnostic models with thresholds of 0.55 and 0.62 correctly identified the abnormal fan in test cases 1 and 2 and did not falsely identify abnormalities in test cases 3 and 4. In Table 8(F), the diagnostic model with a threshold of 0.90 identified only test case 3, because the abnormal sensor of Fan5 was isolated from the correlation-based network. This diagnostic model could not diagnose the isolated sensors, because the credibility of each was always 0.50.
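The credibility of 0.50 for isolated sensors follows from the gray model itself. As a short worked check, assuming the sigmoid normalization of Section 3 with a negative exponent: an isolated sensor i has no evaluations in either direction, so every weight $T_{ji}^{+}$ is 0 and the sum of evaluations vanishes.

```latex
\begin{align*}
\sum_j T_{ji}^{+} R_j(t) &= 0 \\
\Rightarrow\quad \frac{d r_i(t)}{dt} &= -r_i(t), \qquad r_i(t) \to 0 \text{ as } t \to \infty \\
\Rightarrow\quad R_i &= \frac{1}{1 + \exp(0)} = \frac{1}{2} = 0.50
\end{align*}
```

Since 0.50 is above the abnormality threshold of 0.1, an isolated sensor can never be flagged by the immunity-based model alone, whatever value it reports.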

Even the networks with the best thresholds, 0.55 and 0.62, have isolated sensors, namely VcoreA and Vbat. The sensor output values of VcoreA and Vbat were approximately constant over time, i.e., their standard deviations were very small (Table 2), such that the standalone diagnostic model would correctly detect their abnormalities. Therefore, we applied standalone diagnosis only to these isolated sensors (Figure 4). In other words, we used a hybrid diagnosis model that combines standalone and immunity-based diagnosis. Sensors on the correlation network were diagnosed by the immunity-based diagnostic model, and isolated sensors were diagnosed by the standalone diagnostic model.
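The hybrid rule can be sketched as a simple dispatch: a sensor that belongs to the network is judged by its credibility against the threshold of 0.1, and an isolated sensor is judged by its own range from Table 2. In the example, the network credibilities are the values reported for test case 1 in Table 8(E) and the sensor values come from Table 5; the function name `hybrid_diagnosis` and its signature are assumptions of this sketch.

```python
# Ranges of the isolated sensors, from Table 2 (V).
ISOLATED_RANGES = {"VcoreA": (1.11, 1.19), "Vbat": (3.23, 3.26)}

# Credibility below this value is treated as abnormal (Section 4.1).
CREDIBILITY_THRESHOLD = 0.1


def hybrid_diagnosis(values, network_credibility,
                     ranges=ISOLATED_RANGES, threshold=CREDIBILITY_THRESHOLD):
    """Return the sensors judged abnormal by the hybrid rule."""
    abnormal = []
    for sensor, value in values.items():
        if sensor in network_credibility:
            # Networked sensor: rely on the immunity-based credibility.
            if network_credibility[sensor] < threshold:
                abnormal.append(sensor)
        else:
            # Isolated sensor: fall back to the standalone range check.
            low, high = ranges[sensor]
            if not (low <= value <= high):
                abnormal.append(sensor)
    return abnormal


# Test case 1 (Table 5) with the networked credibilities of Table 8(E).
case_1 = {"CPU1": 70, "Core2": 65, "VcoreA": 1.12, "Vbat": 3.23, "Fan5": 200}
credibility_1 = {"CPU1": 0.84, "Core2": 0.84, "Fan5": 0.00}
print(hybrid_diagnosis(case_1, credibility_1))
```

The accuracy for isolated sensors is then exactly that of the range check, so the hybrid model inherits the standalone model's sensitivity to benign out-of-range readings on those sensors.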

4.3. Discussion of Multiple Diagnostic Networks

We hypothesized that utilizing multiple diagnostic networks, in which isolated nodes are connected to a network or another isolated node, would improve diagnostic accuracy. All combinations of the multiple networks used for immunity-based diagnosis are shown in Figure 5. Each evaluation was based on the four test cases shown in Table 5. The diagnostic accuracy of all multiple networks is shown in Table 9. In Table 9, a diagnostic accuracy of “P” indicates that the diagnostic model could not identify the abnormal component, although it detected multiple abnormalities.

We found that diagnostic models (A), (C), (F) and (G) made correct decisions, whereas the other diagnostic models made incorrect decisions. In test cases 1, 2 and 3, each of the diagnostic networks (A), (C), (F) and (G) consisted of 3 sensors including Fan5. In contrast, the other diagnostic networks either consisted of 2 sensors including Fan5 or were weakly correlated networks. In test case 4, all diagnostic networks other than (B) and (I) showed results similar to those of CPU1.

For example, Table 10 shows the successful results of diagnostic network (C), and Table 11 shows the unsuccessful results of diagnostic network (I).

The diagnostic model in Table 11 misidentified the abnormal Fan5 in test case 2 and test case 3. These results indicate that a diagnostic network consisting of three sensors is more accurate than a diagnostic network consisting of two sensors. In test case 4 of Table 11, the diagnostic network misidentified the normal CPU1 because of the weakly correlated network shown in Figure 2, although CPU1 belongs to a diagnostic network consisting of three sensors. These results indicate that a strongly correlated diagnostic network is more accurate than a weakly correlated diagnostic network. Therefore, these experiments showed that diagnostic accuracy depends on the number of sensors in the diagnostic network (i.e., the size of the diagnostic network) and on the correlation between the sensors of the network.

5. Conclusions

We have applied immunity-based diagnosis to the detection of abnormal behaviors of components on a motherboard. We simulated the abnormal behaviors of some components on the motherboard, and we evaluated the ability of this model to diagnose abnormalities of components from motherboard sensors in two experiments. In the first experiment, which compared an immunity-based diagnostic model with a standalone diagnostic model, we found that the immunity-based diagnostic model outperformed the standalone diagnostic model. In the second experiment, which compared a fully-connected network with a correlation-based network for mutually testing the credibility of sensors, we found that the correlation-based network improved the diagnostic accuracy in all test cases. In addition, we evaluated all the combinations of the diagnostic networks, and we showed that diagnostic accuracy depends on the size of the network and the correlation between nodes of the network. At the same time, we showed that the immunity-based diagnostic model with multiple diagnostic networks was an effective method for detecting abnormal behavior of components on the motherboard.

In addition, we utilized a hybrid model, consisting of the standalone and immunity-based diagnostic models, to diagnose nodes connected to the network, as well as nodes isolated from the network. The accuracy of hybrid diagnosis, however, depends on the standalone diagnosis for the isolated nodes. In the future, we will attempt to improve the accuracy of diagnosis of isolated nodes.

References

  1. Tanaka, T.; Kawazu, T.; Kanda, S. Computer-Assisted Diagnostic System Applied with ANFIS. Biomed. Fuzzy Syst. Assoc. 2003, 5, 49–54.
  2. Ishida, Y. An Immune Network Approach to Sensor-Based Diagnosis by Self-Organization. Complex Syst. 1996, 10, 73–90.
  3. Watanabe, Y.; Ishida, Y. Immunity-Based Approaches for Self-Monitoring in Distributed Intrusion Detection System. In Knowledge-Based Intelligent Information and Engineering Systems; Springer: Berlin, Germany, 2003; LNCS, Part 2; pp. 503–510.
  4. Ishida, Y. Designing an Immunity-Based Sensor Network for Sensor-Based Diagnosis of Automobile Engines. In Knowledge-Based Intelligent Information and Engineering Systems; Springer: Berlin, Germany, 2006; LNCS Volume 4252; pp. 146–153.
  5. Watanabe, Y.; Ishida, Y. Mutual Tests among Agents in Distributed Intrusion Detection Systems Using Immunity-Based Diagnosis. In Proceedings of International Symposium on Artificial Life and Robotics (AROB 8th), Beppu, Japan, January 2003; Volume 2, pp. 682–685.
  6. Laurentys, C.A.; Palhares, R.M.; Caminhas, W.M. Design of an Artificial Immune System Based on Danger Model for Fault Detection. Expert Syst. Appl. 2010, 37, 5145–5152.
  7. Laurentys, C.A.; Palhares, R.M.; Caminhas, W.M. A Novel Artificial Immune System for Fault Behavior Detection. Expert Syst. Appl. 2011, 38, 6957–6966.
  8. Jerne, N.K. The Immune System. Sci. Am. 1973, 229, 52–60.
  9. Ishida, Y. Immunity-Based Systems: A Design Perspective; Springer-Verlag: New York, NY, USA, 2004.
Figure 1. Flowchart of the diagnostic model.
Figure 2. Correlation coefficients among five sensors.
Figure 3. Correlation-based networks for thresholds of (a) 0.01, (b) 0.40, (c) 0.52, (d) 0.55, (e) 0.62, and (f) 0.90.
Figure 4. Example of a hybrid diagnostic model with a threshold of 0.55.
Figure 5. Multiple diagnostic networks.
Table 1. Server specification.

| Item | Specification |
|---|---|
| Motherboard | Supermicro® X7DVL-I |
| OS | Debian GNU/Linux 5.0 |
| Kernel | 2.6.26-2-amd64 |
| Module | lm-sensors version 3.0.2 with libsensors version 3.0.2 |
| CPU | Intel® Xeon E5410 2.33 GHz ×2 |
| Power supply | Thermaltake Toughpower 700 W |
| Fan | XFan model: RDM8025B ×2, Gentle Typhoon D0925C12B2AP ×2, ADDA CFX-120S |
Table 2. Sensors used for evaluation and the range of sensor output values.

| Sensor | Component | Range | Mean | Standard deviation |
|---|---|---|---|---|
| CPU1 | CPU temperature | 11.00–48.00 (°C) | 18.68 | 4.550 |
| Core2 | Core2 temperature | 35.00–72.00 (°C) | 42.79 | 4.450 |
| VcoreA | CoreA voltage | 1.11–1.19 (V) | 1.121 | 0.007 |
| Vbat | Internal battery voltage | 3.23–3.26 (V) | 3.237 | 0.009 |
| Fan5 | Fan speed | 1,012–1,044 (RPM) | 1034 | 5.021 |
Table 3. Balance formulas between sensors.

| Sensor pair | Balance formula |
|---|---|
| CPU1-Core2 | \|CPU1 − Core2\| ≤ 26 |
| CPU1-VcoreA | \|CPU1 − VcoreA × 25\| ≤ 20 |
| CPU1-Vbat | \|CPU1 − Vbat × 9\| ≤ 18 |
| CPU1-Fan5 | \|CPU1 − Fan5/34\| ≤ 18 |
| Core2-VcoreA | \|Core2 − VcoreA × 45.5\| ≤ 28 |
| Core2-Vbat | \|Core2 − Vbat × 16\| ≤ 20 |
| Core2-Fan5 | \|Core2 − Fan5/19\| ≤ 21 |
| VcoreA-Vbat | \|VcoreA − Vbat/2.8\| ≤ 0.05 |
| VcoreA-Fan5 | \|VcoreA − Fan5/893\| ≤ 0.07 |
| Vbat-Fan5 | \|Vbat − Fan5/316\| ≤ 0.07 |
Table 4. Normal ranges derived from the balance formulas.

| Sensor | Normal range |
|---|---|
| CPU1 | 4.75–78.16 (°C) |
| Core2 | 31.68–73.34 (°C) |
| VcoreA | 0.99–1.31 (V) |
| Vbat | 2.52–3.96 (V) |
| Fan5 | 821.56–1,232.34 (RPM) |
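Table 5. Test cases. Columns CPU1 through Fan5 give the sensor output values.

| Case | Description | CPU1 | Core2 | VcoreA | Vbat | Fan5 | State |
|---|---|---|---|---|---|---|---|
| 1 | Fan speed is very low. | 70 | 65 | 1.12 | 3.23 | 200 | Abnormal |
| 2 | Fan speed is very high. | 9 | 35 | 1.12 | 3.23 | 2,000 | Abnormal |
| 3 | Fan speed is slightly high. | 14 | 35 | 1.12 | 3.23 | 1,050 | Normal |
| 4 | CPU temperature is slightly high. | 50 | 60 | 1.12 | 3.23 | 1,020 | Normal |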
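Table 6. Results of the standalone diagnosis. Columns CPU1 through Fan5 give the credibility of each sensor.

| Test case | CPU1 | Core2 | VcoreA | Vbat | Fan5 | Decision | Accuracy |
|---|---|---|---|---|---|---|---|
| 1 | 0 | 1 | 1 | 1 | 0 | X | P |
| 2 | 0 | 1 | 1 | 1 | 0 | X | P |
| 3 | 1 | 1 | 1 | 1 | 0 | X | X |
| 4 | 0 | 1 | 1 | 1 | 1 | X | X |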
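Table 7. Results of the mutual diagnosis. Columns CPU1 through Fan5 give the credibility of each sensor.

| Test case | CPU1 | Core2 | VcoreA | Vbat | Fan5 | Decision | Accuracy |
|---|---|---|---|---|---|---|---|
| 1 | 0.00 | 0.83 | 0.78 | 0.78 | 0.00 | X | P |
| 2 | 0.73 | 0.99 | 0.99 | 0.73 | 0.00 | X | O |
| 3 | 0.99 | 0.91 | 0.99 | 0.99 | 0.99 | O | O |
| 4 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | X | X |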
Table 8. (A) Results of a correlation-based network with a threshold of 0.01. Columns CPU1 through Fan5 give the credibility of each sensor.

| Test case | CPU1 | Core2 | VcoreA | Vbat | Fan5 | Decision | Accuracy |
|---|---|---|---|---|---|---|---|
| 1 | 0.00 | 0.97 | 0.87 | 0.87 | 0.00 | X | P |
| 2 | 0.96 | 0.98 | 0.98 | 0.12 | 0.00 | X | O |
| 3 | 0.99 | 0.99 | 0.98 | 0.51 | 0.98 | O | O |
| 4 | 0.00 | 0.98 | 0.99 | 0.98 | 0.99 | X | X |
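Table 8. (B) Results of a correlation-based network with a threshold of 0.40.

| Test case | CPU1 | Core2 | VcoreA | Vbat | Fan5 | Decision | Accuracy |
|---|---|---|---|---|---|---|---|
| 1 | 0.00 | 0.97 | 0.87 | 0.87 | 0.00 | X | P |
| 2 | 0.00 | 0.98 | 0.98 | 0.12 | 0.00 | X | P |
| 3 | 0.99 | 0.99 | 0.98 | 0.73 | 0.73 | O | O |
| 4 | 0.00 | 0.99 | 0.88 | 0.98 | 0.98 | X | X |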
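Table 8. (C) Results of a correlation-based network with a threshold of 0.52.

| Test case | CPU1 | Core2 | VcoreA | Vbat | Fan5 | Decision | Accuracy |
|---|---|---|---|---|---|---|---|
| 1 | 0.34 | 0.67 | 0.50 | 0.34 | 0.01 | X | O |
| 2 | 0.81 | 0.61 | 0.50 | 0.81 | 0.00 | X | O |
| 3 | 0.99 | 0.99 | 0.50 | 0.73 | 0.73 | O | O |
| 4 | 0.00 | 0.98 | 0.50 | 0.98 | 0.98 | X | X |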
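Table 8. (D) Results of a correlation-based network with a threshold of 0.55.

| Test case | CPU1 | Core2 | VcoreA | Vbat | Fan5 | Decision | Accuracy |
|---|---|---|---|---|---|---|---|
| 1 | 0.87 | 0.97 | 0.50 | 0.87 | 0.00 | X | O |
| 2 | 0.87 | 0.97 | 0.50 | 0.87 | 0.00 | X | O |
| 3 | 0.98 | 0.99 | 0.50 | 0.88 | 0.98 | O | O |
| 4 | 0.67 | 0.95 | 0.50 | 0.87 | 0.67 | O | O |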
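Table 8. (E) Results of a correlation-based network with a threshold of 0.62.

| Test case | CPU1 | Core2 | VcoreA | Vbat | Fan5 | Decision | Accuracy |
|---|---|---|---|---|---|---|---|
| 1 | 0.84 | 0.84 | 0.50 | 0.50 | 0.00 | X | O |
| 2 | 0.84 | 0.84 | 0.50 | 0.50 | 0.00 | X | O |
| 3 | 0.61 | 0.81 | 0.50 | 0.50 | 0.81 | O | O |
| 4 | 0.81 | 0.61 | 0.50 | 0.50 | 0.81 | O | O |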
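Table 8. (F) Results of a correlation-based network with a threshold of 0.90.

| Test case | CPU1 | Core2 | VcoreA | Vbat | Fan5 | Decision | Accuracy |
|---|---|---|---|---|---|---|---|
| 1 | 0.84 | 0.84 | 0.50 | 0.50 | 0.50 | O | X |
| 2 | 0.84 | 0.84 | 0.50 | 0.50 | 0.50 | O | X |
| 3 | 0.84 | 0.84 | 0.50 | 0.50 | 0.50 | O | O |
| 4 | 0.84 | 0.84 | 0.50 | 0.50 | 0.50 | O | O |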
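Table 9. Diagnostic accuracy of multiple networks.

| Test case | (A) | (B) | (C) | (D) | (E) | (F) | (G) | (H) | (I) | (J) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | O | X | O | X | X | O | O | X | P | X |
| 2 | O | X | O | X | X | O | O | O | X | X |
| 3 | O | O | O | O | O | O | O | X | O | O |
| 4 | O | X | O | O | O | O | O | O | X | O |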
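Table 10. Results of diagnostic model (C). Columns CPU1 through Fan5 give the credibility of each sensor.

| Test case | CPU1 | Core2 | VcoreA | Vbat | Fan5 | Decision | Accuracy |
|---|---|---|---|---|---|---|---|
| 1 | 0.640 | 0.640 | 0.659 | 0.659 | 0.021 | X | O |
| 2 | 0.640 | 0.640 | 0.659 | 0.659 | 0.021 | X | O |
| 3 | 0.844 | 0.844 | 0.659 | 0.659 | 0.844 | O | O |
| 4 | 0.385 | 0.683 | 0.659 | 0.659 | 0.385 | O | O |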
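Table 11. Results of diagnostic model (I). Columns CPU1 through Fan5 give the credibility of each sensor.

| Test case | CPU1 | Core2 | VcoreA | Vbat | Fan5 | Decision | Accuracy |
|---|---|---|---|---|---|---|---|
| 1 | 0.021 | 0.293 | 0.640 | 0.640 | 0.293 | X | P |
| 2 | 0.385 | 0.293 | 0.683 | 0.385 | 0.293 | O | X |
| 3 | 0.844 | 0.659 | 0.844 | 0.844 | 0.659 | O | O |
| 4 | 0.021 | 0.659 | 0.640 | 0.640 | 0.659 | X | X |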

Shida, H.; Okamoto, T.; Ishida, Y. Immunity-Based Diagnosis for a Motherboard. Sensors 2011, 11, 4462-4473. https://doi.org/10.3390/s110404462
