1. Introduction
Monitoring the condition of complex systems in real time can save valuable maintenance time and cost. Fault diagnosis detects process anomalies and classifies their types, and has hence drawn enormous attention (e.g., [1,2,3]). In the survey papers [4] and [5], fault diagnosis methods are divided into model-based, signal-based, knowledge-based, and hybrid/active methods. The knowledge-based method is also called the data-driven method, where a fault diagnosis model is built from historical data rather than from a precise mathematical model. Therefore, a data-driven method is suitable for complex systems for which an accurate model is difficult to obtain or whose signal is unknown. Data-driven fault diagnosis has been applied to real systems such as wind turbine systems [6], high-speed trains [7], and induction motor drive systems [8].
On the other hand, many modern engineering systems are modeled as multi-agent systems (MASs), where two or more agents communicate through a designed protocol to work cooperatively [9,10]. Because of this communication, a fault in one agent can degrade the performance of its neighbors, and even of the whole network. Therefore, an effective fault diagnosis technique is crucial for MASs. Furthermore, a fault alarm in one agent can be induced by its neighboring agents; hence, fault diagnosis for a multi-agent system is more challenging than for a single-agent system. A variety of fault diagnosis approaches have been developed for MASs recently [11,12]. Most existing work on MASs is based on a precise state-space model of each agent as well as of their communication, e.g., [13,14,15]. However, the communication between agents can be unknown, which makes it difficult to establish an accurate mathematical model. As a result, data-driven fault diagnosis plays an important role in complex MASs.
Among the various data-driven fault diagnosis methods [16,17,18,19,20], the neural network can convert fault diagnosis into a multi-label classification problem and automatically learn the features of the original data. However, storing and learning from a large amount of data in real time is challenging for the computation and communication devices/software. To deal with the limited capability of the devices/software, event-triggered mechanisms [21,22] and distributed methods [23] have been hot topics in recent years. Specifically, event-triggered fault diagnosis methods have been developed in [21] and [22], where the mathematical model of the system is assumed to be known. Nevertheless, when the model and the communication of the MAS are not available, these event-triggered methodologies are not applicable. This motivates the development of event-triggered data-driven fault diagnosis for MASs with an unknown mathematical model and unknown communication.
In this paper, a residual-triggered fault diagnosis technique is proposed for MASs. Specifically, a neural network-based state prediction model is established by training on historical data offline. Then, an online comparison of the real state/output and the predicted state/output generates a residual signal, which indicates whether there is a fault. If the residual exceeds the threshold, it triggers a fault classification process to identify and locate the fault. This residual-triggered fault diagnosis method does not depend on a mathematical model or on communication information. Moreover, online identification of a fault is carried out only when a fault is detected; hence, the data transmission and calculation are reduced. A real experiment on a leader-follower inverted pendulum demonstrates the effectiveness of the developed algorithm. The contributions include: 1. Residual-triggered data-driven fault diagnosis for MASs is a novel topic, where data calculation can be reduced; 2. The designed fault classifiers are distributed, where a fault in one agent can be identified by the fault classifier of its neighbor; 3. The communication among agents is internal to the agents but unknown (not available) in state prediction and fault classification, which implies that the designed state prediction and fault diagnosis techniques are fully distributed. It should be mentioned that many existing estimation/prediction models of MASs rely on communication information among agents, such as the adjacency matrix [13,14,15]. Nevertheless, the adjacency matrix contains the overall communication information, which makes the developed methods centralized rather than distributed. In this article, only input and output data are required in the developed state prediction and fault classification method, and communication is not used.
The paper is organized as follows. After the introduction, the data-driven state prediction algorithm is introduced in Section 2. Based on the prediction model, a residual-triggered fault classification technique is proposed in Section 3. Section 4 presents the experimental results on a twin rotational inverted pendulum system with a leader-follower mechanism. The paper ends with Section 5, which gives the conclusion and future research.
2. Data-Driven State Prediction for Multi-Agent System
In this section, we introduce the establishment of a neural network model to predict the state of a multi-agent system with unknown communication. To be precise, the controller of each agent and communication protocol among the agents are pre-designed to guarantee the performance of a multi-agent system (i.e., consensus and robustness) in a fault-free case, and the design of the controller and communication is not of concern in this paper. The physical models of the agents are unknown or highly nonlinear. Moreover, the communication protocol is internal to the system, but not available for the prediction model.
The diagram of the prediction model for the multi-agent system is shown in Figure 1.
In Figure 1, $x_i$ and $u_i$ represent the state and control input of agent $i$, and $N$ is the number of agents; $k$ represents the time $kT$, where $T$ is the sampling time; $k-1$ and $k-2$ represent the times $(k-1)T$ and $(k-2)T$, respectively; and $\hat{x}_i(k)$ is the prediction of $x_i(k)$. Firstly, the state of each Agent $i$ is recorded in the corresponding Register $i$ at the past two sampling times, namely, $x_i(k-1)$ and $x_i(k-2)$ are obtained. Then, $x_i(k-1)$, $x_i(k-2)$, and the control input of Agent $i$ at the current time, $u_i(k)$, are used to train the Prediction Model $i$. The output of the prediction model is the predicted state at the current time, $\hat{x}_i(k)$. By comparing the real state $x_i(k)$ and the predicted state $\hat{x}_i(k)$, a residual $r_i(k)$ can be generated. The residual values are sent to the Enable Controller, which decides whether the residual exceeds the threshold. To be precise, when the residual exceeds the threshold, a fault is recognized in the system. At this time, the enable signal stops the prediction model and triggers the fault diagnosis algorithms, which will be presented in Section 3.
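As a minimal sketch of how the register contents could be arranged into training samples for one agent, the following assumes a scalar state and a scalar control input; the function name `build_samples` and the scalar simplification are illustrative assumptions and are not taken from the paper.

```python
from typing import List, Tuple

Sample = Tuple[Tuple[float, float, float], float]


def build_samples(x: List[float], u: List[float]) -> List[Sample]:
    """Pair the features (x(k-1), x(k-2), u(k)) with the target x(k) for k >= 2.

    x and u are the recorded state and control input sequences of one agent
    (scalar per sample here, which is an assumption made for brevity).
    """
    if len(x) != len(u):
        raise ValueError("state and input sequences must have equal length")
    samples: List[Sample] = []
    for k in range(2, len(x)):
        features = (x[k - 1], x[k - 2], u[k])
        samples.append((features, x[k]))
    return samples
```

Each agent would build its own sample set from its own records only, which is consistent with the statement that no communication information enters the prediction model.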
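The enable control algorithm is described as follows:
$$E_i(k) = \begin{cases} 1, & \lVert r_i(k) \rVert > \varepsilon_i \\ 0, & \text{otherwise} \end{cases}$$
where $r_i(k)$ is the residual of Agent $i$ at time $k$, $\varepsilon_i$ represents the residual threshold of Agent $i$, and $E_i(k)$ is the output of the Enable Controller.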
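The threshold test performed by the Enable Controller can be sketched as follows. The use of the Euclidean norm for the residual, the 0/1 encoding of the enable output, and the function names are assumptions made for illustration.

```python
import math
from typing import Sequence


def residual_norm(x_real: Sequence[float], x_pred: Sequence[float]) -> float:
    """Euclidean norm of the difference between the measured and predicted state."""
    if len(x_real) != len(x_pred):
        raise ValueError("state vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_real, x_pred)))


def enable_signal(residual: float, threshold: float) -> int:
    """Return 1 (trigger fault diagnosis) only if the residual exceeds the threshold."""
    return 1 if residual > threshold else 0
```

A residual exactly equal to the threshold does not trigger diagnosis in this sketch, since the text speaks of the residual exceeding the threshold.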
Remark 1. It should be mentioned that communication among agents is not used in the prediction model. The “unknown communication” in this paper means the communication is internal to the MAS, but cannot be used in the prediction/fault diagnosis. Moreover, the controllers are predesigned for the MAS, which is not under concern in this paper.
The network structure used to build the prediction model is the back propagation (BP) neural network, which is a multilayer feedforward neural network trained by the error back propagation algorithm. It can learn and store a large number of input-output pattern mapping relations without concrete mathematical functions. A neural network is composed of a number of neurons, and the BP neural network of a single neuron for predicting the concerned model is shown in Figure 2.
In the diagram, $w_{jk}^{l}$ and $b_{j}^{l}$ represent the weight parameter and bias parameter between hidden layers, respectively; $l$ represents the index of the current layer; $j$ and $k$ represent the index of the current node in the current layer and the index of the node in the upper layer, respectively. $Z$ represents the input of the neuron, that is, the output of the weighted multiplication summation, and $A$ represents the input or output of the neuron, where:
$$Z_{j}^{l} = \sum_{k} w_{jk}^{l} A_{k}^{l-1} + b_{j}^{l}, \qquad A_{j}^{l} = f\left(Z_{j}^{l}\right).$$
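The hidden layer takes the Tansig function as the excitation function $f$, where, for a neuron input $Z$:
$$f(Z) = \frac{2}{1 + e^{-2Z}} - 1.$$
The reason for using the Tansig function is that the training data changes periodically within a bounded interval, which matches the bounded output range $(-1, 1)$ of Tansig. Using Tansig can also accelerate the decline of the training gradient.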
The output of the neural network is the predicted value of the system state $\hat{x}_i(k)$ in a fault-free scenario. Therefore, the output layer uses the Purelin function as the activation function, which is defined as $f(Z) = Z$, and the predicted state is therefore the unscaled weighted sum computed by the output layer.
The predicted state $\hat{x}_i(k)$ is compared with the actual system state $x_i(k)$, and the network topology and training parameters should be designed to make $\hat{x}_i(k)$ close to $x_i(k)$.
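The forward pass described above (Tansig hidden layer followed by a Purelin output layer) can be sketched in plain Python. The layer sizes and the helper names `layer` and `predict` are illustrative assumptions; `math.tanh` is used because Tansig, $2/(1+e^{-2Z})-1$, is mathematically identical to the hyperbolic tangent.

```python
import math
from typing import Callable, List, Sequence


def tansig(z: float) -> float:
    """Tansig activation; equal to 2 / (1 + exp(-2 z)) - 1, i.e. tanh(z)."""
    return math.tanh(z)


def purelin(z: float) -> float:
    """Purelin activation: the identity function."""
    return z


def layer(inputs: Sequence[float], weights: Sequence[Sequence[float]],
          biases: Sequence[float], activation: Callable[[float], float]) -> List[float]:
    """Weighted sum plus bias for every node, followed by the activation."""
    return [activation(sum(w * a for w, a in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def predict(features: Sequence[float],
            w_hidden: Sequence[Sequence[float]], b_hidden: Sequence[float],
            w_out: Sequence[Sequence[float]], b_out: Sequence[float]) -> List[float]:
    """One hidden Tansig layer, then a Purelin output layer."""
    hidden = layer(features, w_hidden, b_hidden, tansig)
    return layer(hidden, w_out, b_out, purelin)
```

In the setting of this section, `features` would hold the two past states and the current control input of one agent, and the returned list would be the predicted current state.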
In the healthy state, the residual between $x_i(k)$ and $\hat{x}_i(k)$ is convergent. However, when the system is in a fault state, the residual exceeds the threshold. At this time, the system is deemed to be faulty and fault diagnosis is started.
The root mean square error (RMSE) between the predicted value and the actual value is used as the evaluation standard of the prediction accuracy. In the BP neural network, gradient descent is used to update the weights $w$ and biases $b$ until the RMSE between $\hat{x}_i(k)$ and $x_i(k)$ reaches a local minimum. As a result, the (locally) optimal weight and bias parameters of the neural network are obtained.
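A minimal sketch of this training loop is given below, assuming a single hidden Tansig layer, a scalar Purelin output, and per-sample (stochastic) gradient descent on the squared error. The hyperparameters (`hidden`, `lr`, `epochs`, `seed`) and the function names are illustrative assumptions, not the settings used in the paper.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def rmse(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean square error between predicted and actual values."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))


def train(samples: List[Tuple[Sequence[float], float]], hidden: int = 4,
          lr: float = 0.05, epochs: int = 500,
          seed: int = 0) -> Callable[[Sequence[float]], float]:
    """Fit a one-hidden-layer network by back propagation; return a predictor."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x: Sequence[float]) -> Tuple[List[float], float]:
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(w1, b1)]
        return h, sum(w * hj for w, hj in zip(w2, h)) + b2

    for _ in range(epochs):
        for x, y in samples:
            h, y_hat = forward(x)
            err = y_hat - y  # derivative of 0.5 * err**2 w.r.t. the output
            for j in range(hidden):
                # Back-propagate through tanh: d tanh(z)/dz = 1 - tanh(z)**2.
                grad_h = err * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                for i in range(n_in):
                    w1[j][i] -= lr * grad_h * x[i]
                b1[j] -= lr * grad_h
            b2 -= lr * err

    return lambda x: forward(x)[1]
```

Calling `train(samples)` returns a function that maps a feature tuple to a predicted value, and `rmse` can then be evaluated on held-out data to compare candidate configurations.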
There are a variety of candidate network structures and learning rates. To optimize the state prediction performance, the RMSEs of different layer structures are generated and compared under the same training parameters and the same training time. Generally speaking, a smaller RMSE indicates better training performance; however, the generalization capability should also be considered to avoid overfitting. The network structure is determined accordingly. Subsequently, the learning rate is determined by comparing the accuracy of candidate learning rates with the selected network structure.
Then, the developed state prediction model can be deployed in a real-time system to predict the state in the absence of faults. By comparing the real state and the predicted healthy state, a residual signal can be generated. This residual signal indicates whether a fault has occurred, and if the residual signal exceeds a threshold, it triggers a fault classification mechanism, which is designed in Section 3.
3. Sensor Fault Classification
The fault of one sensor may lead to the fault of the whole system [23]. Therefore, it is very important to diagnose sensor faults.
In this section, a data-driven sensor fault detection and classification technique is presented. Three typical sensor faults are under consideration: zero-output fault, drift fault, and deviation fault. Figure 3, Figure 4 and Figure 5 are schematic diagrams of the three types of sensor faults. Moreover, the three types of faults can exist in different sensors and different agents. The objective of this section is to use a neural network classifier to identify and locate different types of faults.
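Specifically, the zero-output sensor fault [24] is modeled as:
$$y_{z}(t) = \begin{cases} y(t), & t < t_f \\ 0, & t \ge t_f \end{cases}$$
where $y_{z}(t)$ represents the sensor output under the zero-output fault, $t_f$ denotes the time at which the sensor fault occurs, and $y(t)$ is the real system output. In practice, this fault typically occurs when the signal line is open-circuited. A deviation fault is modeled as:
$$y_{d}(t) = \begin{cases} y(t), & t < t_f \\ y(t) + \Delta, & t \ge t_f \end{cases}$$
where $y_{d}(t)$ represents the sensor output under the deviation fault and $\Delta$ is a bounded constant. The deviation fault tends to appear in current or voltage sensors [25]. A drift fault is modeled as:
$$y_{r}(t) = \begin{cases} y(t), & t < t_f \\ y(t) + d(t), & t \ge t_f \end{cases}$$
where $y_{r}(t)$ represents the sensor output under the drift fault and $d(t)$ is an irregular bounded disturbance signal, which is a sensor noise (due to the influence of the external environment and internal factors of the sensor) [26].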
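To illustrate the three fault types, the following sketch injects each of them into a sampled healthy output sequence from a given fault index onward. The function names, the sample-index form of the fault time, and the uniform random signal used for the drift disturbance are assumptions made for illustration; the paper only states that the drift term is an irregular bounded signal.

```python
import random
from typing import List, Sequence


def zero_output_fault(y: Sequence[float], k_fault: int) -> List[float]:
    """Sensor reads zero from sample k_fault onward."""
    return [v if k < k_fault else 0.0 for k, v in enumerate(y)]


def deviation_fault(y: Sequence[float], k_fault: int, delta: float) -> List[float]:
    """A constant offset delta is added from sample k_fault onward."""
    return [v if k < k_fault else v + delta for k, v in enumerate(y)]


def drift_fault(y: Sequence[float], k_fault: int, bound: float,
                seed: int = 0) -> List[float]:
    """A bounded irregular disturbance (assumed uniform in [-bound, bound])
    is added from sample k_fault onward."""
    rng = random.Random(seed)
    return [v if k < k_fault else v + rng.uniform(-bound, bound)
            for k, v in enumerate(y)]
```

Sequences produced this way, together with their fault labels, are one possible way to assemble labeled data for the supervised training of a classifier.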
The data used to train the classifier is the labeled historical fault data of the agents. The procedure for selecting an appropriate network structure and learning rate is the same as for state prediction. The output of the classifier is the probability of each fault category; therefore, the activation function of the output layer is replaced by the Softmax function. Through non-maximum suppression, only the largest entry of the network output is kept, so that the fault type and location with the highest probability can be determined. The network structure diagram of the fault classification model can be found in Figure 6.
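The output stage of such a classifier can be sketched as a Softmax over the raw network outputs followed by selection of the most probable class. The label strings in the usage note are hypothetical placeholders; the actual label set depends on the fault types and agents considered.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable Softmax: subtract the maximum before exponentiating."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits: Sequence[float], labels: Sequence[str]) -> Tuple[str, float]:
    """Return the most probable label and its probability."""
    if len(logits) != len(labels):
        raise ValueError("one label is required per network output")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```

For example, `classify([0.1, 2.0, -1.0], ["agent 1 zero-output", "agent 1 deviation", "agent 2 drift"])` selects the second (hypothetical) label, because it has the largest raw output.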
In the fault classification model, the amount of network input data can be large. Processing such an amount of data in real time challenges the available computation capability. As a result, a triggering mechanism is designed to activate the identification. Specifically, the prediction model introduced in Section 2 is implemented in the system to predict the system state and output in the absence of faults. By comparing the predicted healthy output and the measured output, which can be abnormal in the case of sensor faults, a residual signal can be generated. When a sensor fault occurs, the residual signal exceeds the threshold, and the neural network fault diagnosis model is triggered to identify and locate the fault. The state-prediction-triggered fault classification mechanism is illustrated in Figure 7.
When the residual in Figure 1 is greater than the set threshold, the Enable Controller sends an enable signal to the register of the fault classifier in Figure 7, and the register starts to record the abnormal state data of the agent for 4 s. The stored data is then sent to the fault diagnosis network. The fault diagnosis network is obtained by labeling historical fault data and off-line supervised learning. The diagnosis model can classify the faults in agent $i$ and its neighbor through the output of agent $i$. Moreover, communication is not utilized in the fault classifier.
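The triggered recording step can be sketched as a small buffer that stays idle until it is enabled, collects samples for a fixed window, and then releases the window once. The class name `TriggeredRegister`, the sampling time argument, and the release-once behaviour are illustrative assumptions; only the 4 s recording window comes from the text.

```python
from typing import List, Optional


class TriggeredRegister:
    """Records samples for a fixed time window after an enable signal."""

    def __init__(self, window_s: float, sample_time_s: float) -> None:
        # Number of samples that fit in the recording window (at least one).
        self.capacity = max(1, round(window_s / sample_time_s))
        self.buffer: List[float] = []
        self.enabled = False

    def enable(self) -> None:
        """Start a new recording unless one is already in progress."""
        if not self.enabled:
            self.enabled = True
            self.buffer = []

    def push(self, sample: float) -> Optional[List[float]]:
        """Store a sample; return the full window once it is complete, else None."""
        if not self.enabled:
            return None
        self.buffer.append(sample)
        if len(self.buffer) >= self.capacity:
            window, self.buffer = self.buffer, []
            self.enabled = False
            return window
        return None
```

The returned window is what would be passed to the fault diagnosis network; samples pushed while the register is idle are discarded, so no data is stored in the fault-free case.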