processes

Abstract: In the era of Industry 4.0, highly complex production equipment is becoming increasingly integrated and intelligent, posing new challenges for data-driven process monitoring and fault diagnosis. Technologies such as IIoT, CPS, and AI are seeing increasing use in modern industrial smart manufacturing. Cloud computing and big data storage greatly facilitate the processing and management of industrial information flow, which helps the development of real-time fault diagnosis (RTFD) technology. This paper provides a comprehensive review of the latest RTFD technologies in the field of industrial process monitoring and machine condition monitoring. The RTFD process is introduced in detail, starting with the data acquisition process. The current RTFD methods are divided into methods based on independent feature extraction, methods based on “end-to-end” neural networks, and methods based on qualitative knowledge reasoning from a new perspective. In addition, this paper discusses the challenges and potential trends of RTFD in future development to provide a reference for researchers focusing on this field.


Introduction
The design of process monitoring and fault diagnosis methods in industrial manufacturing has become a compelling research topic in recent years. With the rapid development of automation technology, modern industrial manufacturing equipment and production processes are becoming more and more complex. The new generation of networked information technologies, data analytics, and predictive modeling provide technical support to achieve more intelligent industrial manufacturing systems. The development of intelligence potentially amplifies the scope and depth of the impact of failures. The damage caused by a chain of abnormal reactions attached to a failure would be incalculable for modern large industrial production processes or safety-critical systems. A key component in the development of smart manufacturing is an effective fault diagnosis mechanism, which can accurately identify, diagnose, isolate, and recover abnormal operating conditions [1,2]. Therefore, smarter fault detection, diagnosis, and prediction have received increasing research attention.
Fault diagnosis techniques are mainly used in the following applications: condition monitoring and predictive maintenance of mechanical production equipment [3][4][5]; process monitoring [6] and production scheduling [7] in industrial manufacturing processes; and abnormal behavior monitoring of intelligent terminals such as robots [8,9] and self-driving cars [10,11]. In addition, real-time detection, identification, and diagnosis of faults in safety-critical systems that need to ensure human safety, environmental health, and economic security are particularly important [12]. This includes medical and surgical equipment, search and screening; data performance statistics; topic feature mining. The detailed steps are as follows.

Step 1: Literature Search and Screening
To capture international cutting-edge research results, this work uses the Web of Science, IEEE Xplore, and Science Direct science citation databases to conduct a global search of relevant literature. In order to find the right article within the topic, "real-time" and "fault diagnosis" were used as keywords for the topic search. In addition, to keep up with the trend of RTFD application in industrial smart manufacturing, some other search terms were added, such as "fault prognostics", "fault detection", "artificially intelligent", and "smart manufacturing".
More than 300 articles were retrieved via a precise search of subject matter such as titles, abstracts, and keywords while avoiding the substitution of synonyms. Considering the need for complete annual data indicators for statistical work, the time span chosen for this work was 2010-2021, and the document categories were restricted to "Articles", "Journals", and "Review Articles". Relevant journals dealing with the field of engineering were used as refinement conditions for a secondary search of the papers. The number of articles retained after the secondary search was 224. These articles were then filtered based on details such as abstracts to remove a subset that was not relevant to the focus of the review. Some articles that were repeated in the three major citation databases were also filtered out. Ultimately, a total of 110 papers were selected for the review and analysis of RTFD. In this section, the 110 papers are initially counted, analyzed, and divided according to application areas, among which 22 papers are related to "smart manufacturing and Internet of Things", 18 papers are related to "industrial process monitoring", 41 papers are related to "machinery condition monitoring", and 29 papers are related to "power transmission monitoring", as shown in Figure 1.

Step 2: Data Performance Statistics
A more detailed data review of the selected literature in Step 1 is necessary to highlight the research fervor and attention given to RTFD. In this work, the chronological volume trends of the selected literature were counted and the following results have been obtained. Since the beginning of 2010, there has been a general trend of exponential increase in the amount of international literature and institutional participation in fault diagnosis and smart manufacturing. Specifically, among the searched literature, 4 studies were published in 2010; this number had reached 34 by 2021, with a growth rate of 112.5% compared to the previous year. It is worth mentioning that the average cumulative literature growth rate during the 12-year period from 2010 to 2021 has reached as high as 35.72%, which shows that this research topic is popular, with a strong research focus, and the review study is valuable.
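The year-over-year growth figure quoted above can be checked with a few lines of arithmetic. This is only a sketch: the 2020 paper count is not stated in the text and is inferred here from the reported 2021 count (34) and growth rate (112.5%); the function names are illustrative.

```python
def yoy_growth(previous, current):
    """Year-over-year growth as a fraction (1.125 means 112.5%)."""
    return (current - previous) / previous

def implied_previous(current, growth):
    """Previous-year count implied by the current count and its growth rate."""
    return current / (1.0 + growth)

count_2021 = 34        # stated in the text
growth_2021 = 1.125    # 112.5%, stated in the text

# Inferred, not stated in the text: the count one year earlier.
count_2020 = implied_previous(count_2021, growth_2021)
print(count_2020, yoy_growth(count_2020, count_2021))
```

Under this reading, the 2021 figure corresponds to roughly a doubling of the previous year's output.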

Step 3: Topic Feature Mining
After sifting and refining the selected papers, this work further applied text mining tools for mining analysis of topic features and clustering statistics of scientific research relationships. ITGInsight software [112] is an advanced scientific text mining and visual analysis tool, which is mainly used for visual analysis and mining of scientific texts and Internet text data. The visualization mining methods used in this work mainly include coupling relationship visualization, co-occurrence relationship visualization, and evolution analysis visualization.
Before reviewing the vast amount of literature, it is necessary to categorize it according to relevance to facilitate thematic clustering and targeted analysis. This work achieved preliminary categorization by analyzing the coupling relationships among the literature. The literature based on the same citation set has similar research problems, which is suitable for a centralized review to obtain approximately consistent conclusions.
An article usually contains more than one topic to varying degrees, and keywords are the authors' high-level summaries of literature topics. Therefore, this work conducted a co-occurrence analysis for keywords to get the number of papers weighted for each topic to facilitate a more focused discussion. As shown in Figure 2, there are strong co-occurrence relationships among "fault diagnosis", "fault detection", "machine learning", "condition monitoring", "feature extraction", "smart manufacturing", and other keywords, which verifies the preliminary statistical results in Step 1. Moreover, keywords such as "real-time systems", "convolutional neural network", "predictive maintenance", "cloud computing", and "big data" need to be further analyzed to explore more potential research directions of RTFD. In order to further highlight the development trend and direction of fault diagnosis, this work also performed a visual evolution analysis of the regional sources and keywords of the literature, respectively, in order to inspire the readers' thinking, and the results are shown in Figure 3. In terms of the country share, most of the papers selected in this review were contributed by China and the United States, accounting for 40.9% and 16.3%, respectively. Figure 3a retains two countries with large contributions, showing the proportion and trend of the number of literature issued by different countries in each period. Figure 3b shows the hot topics and their weights in each period, which can provide a useful reference for researchers to get a comprehensive overview of this field.
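The keyword co-occurrence analysis described above can be illustrated with a minimal sketch. It does not reproduce the ITGInsight workflow; it only counts how many papers mention each pair of keywords together. The keyword lists are invented for illustration.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(keyword_lists):
    """Count how many papers mention each unordered keyword pair."""
    pairs = Counter()
    for keywords in keyword_lists:
        # Normalize case and drop duplicates so each paper counts a pair once.
        unique = sorted(set(k.lower() for k in keywords))
        pairs.update(combinations(unique, 2))
    return pairs

# Hypothetical keyword lists, invented for illustration only.
papers = [
    ["fault diagnosis", "machine learning", "feature extraction"],
    ["fault diagnosis", "condition monitoring", "machine learning"],
    ["fault detection", "smart manufacturing", "fault diagnosis"],
]

counts = cooccurrence(papers)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)
```

The pair counts can serve as edge weights in a co-occurrence network such as the one in Figure 2.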

Real-Time Fault Diagnosis Process
In this section, RTFD techniques are further classified according to the different implementation methods, and their industrial applications are highlighted for review. In addition, this section describes the RTFD process in detail, including data acquisition, preprocessing processes such as denoising or dimensionality reduction if necessary, as well as selection of the RTFD method and its working process. It also gives a flow chart of the method, as shown in Figure 4.

Data Acquisition
The collected value of the sensor directly represents the state of the detected object, so sensor data collection is essential for a fault diagnosis system. The data acquisition step is usually performed by various types of sensors, wireless sensor networks [39], or other information acquisition techniques. The types and manifestations of faults vary from scenario to scenario, which leads to large variation between fault information collection methods. In general, the commonly used monitoring signals in condition monitoring of mechanical equipment are vibration, acoustic emission, displacement, velocity, temperature, torque, and pressure [40][41][42], among which the vibration signal is the most commonly used fault diagnosis signal due to its low cost and high reliability. In industrial process monitoring, commonly monitored variables include pressure, temperature, flow, level, humidity, and concentration [43,44].
For different fault diagnosis objects, the selection of sensors can be referred to as follows. Fault information of critical rotating components such as bearings and gearboxes is expressed in the form of strain, vibration, and acoustic signals, and vibration and acoustic sensors can be used for data acquisition. In particular, the advantage of acoustic emission sensing technology is its ability to effectively detect early defects in bearings and gears, especially in low-speed operating conditions and low-frequency noise environments of machines. Magnetoelectric speed sensors based on the electromagnetic induction principle are commonly used to detect the rotational speed of rotating equipment. For fault information of electrical equipment such as circuit breakers and distribution boxes, current and voltage transformers can be used for data acquisition. In addition, since heat generation is usually an early symptom of equipment failure, thermal image-based fault diagnosis methods are widely used for electrical equipment, rotating machinery, etc., in industrial fields. As data acquisition techniques become more and more intelligent, this subsection further classifies data acquisition methods into invasive and non-invasive acquisition to facilitate researchers' choices.

Invasive Data Acquisition Methods
The so-called intrusive data acquisition method mainly refers to the use of wired or contact sensors to accomplish the information acquisition task. Sensors that detect vibration, acceleration, pressure, and electrical signals are mostly intrusive sensors. Many bearing failures manifest at high frequencies, so fault diagnosis methods based on high sampling rate modeling are more effective. In the bearing failure study by Shenfield et al. [45], experimental data were obtained from high-frequency vibration signals collected by accelerometers at the drive and fan ends of the test equipment, sampled at 48 kHz. The experimental dataset of Zhong et al. [46] was obtained from a vibration sensor mounted on a turbine gearbox. To avoid missing details of the fault information, the sampling frequency was set to 4096 Hz, which is twice as high as the meshing frequency. To monitor the vibration and noise generated by linear switched reluctance actuators (LSRA) during operation, Salvado et al. [47] developed a distributed vibration and noise analysis and monitoring system based on an intelligent sensor (IS) module. The IS module is connected to an accelerometer and placed on different mechanical parts of the LSRA structure. Voltage and current signals are the most effective way to characterize the state information of power systems such as microgrids [48]. In the study of Yılmaz et al. [49], the power quality disturbance signals (PQDs) of PV microgrids were automatically detected with the help of voltage signals only. Jiang et al. [50] developed a new data-driven probabilistic fault location method for power distribution systems that jointly uses historical fault location data and real-time or near real-time alarms from multiple sensors. Supervisory Control and Data Acquisition (SCADA) sensors in the feeder circuits were used to detect fault currents. The digital relay and Intelligent Electronic Devices (IEDs) were used to obtain the estimated fault distance.
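The sampling-rate choices above follow from the sampling theorem: a component is only captured faithfully if it lies below half the sampling frequency, and anything above that folds back to a lower apparent frequency. The helper names below are illustrative, and the 3000 Hz tone is a hypothetical example.

```python
def nyquist_frequency(fs):
    """Highest frequency (Hz) representable at sampling rate fs."""
    return fs / 2.0

def alias_frequency(f, fs):
    """Apparent frequency (Hz) of a tone at f Hz after sampling at fs Hz."""
    return abs(f - fs * round(f / fs))

fs = 4096  # sampling rate reported for the gearbox dataset

print(nyquist_frequency(fs))      # upper edge of the usable band
print(alias_frequency(1000, fs))  # below the Nyquist frequency: unchanged
print(alias_frequency(3000, fs))  # hypothetical tone above it: folds back
```

Sampling at exactly twice a frequency of interest places that frequency at the edge of the usable band, which is why higher rates (such as the 48 kHz used for bearings) leave more margin for fault harmonics.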
To facilitate the real-time processing of multi-source heterogeneous data collected by multiple sensors, researchers have also introduced wireless sensor networks and data fusion technologies. Moradi et al. [51] proposed a multi-sensor data fusion method that combines strain and vibration sensors for wind turbine blade stress-level detection and crack detection using microelectromechanical systems (MEMS) sensors. To address the problem that the fault detection results of vibration signals from a single sensor may be unreliable and unstable, Liu et al. [52] proposed an intelligent multi-sensor data fusion method using a relevance vector machine based on an ant colony optimization algorithm and successfully used the collected eight channels of vibration data for gearbox fault detection. Cheng et al. [53] applied a wireless sensor network constructed using LoRa wireless communication technology to motor condition monitoring and achieved real-time acquisition of voltage, current, speed, position, and other status information of motors in a cluster system. Chao et al. [54] collected power generation data of PV module arrays under different solar radiation levels, module temperatures, and fault conditions through a ZigBee wireless sensor network and developed a portable PV power system fault diagnostic instrument. The ZigBee wireless sensor network transmitter improves real-time fault diagnosis and allows remote fault diagnosis.

Non-Invasive Data Acquisition Methods
Intrusive acquisition methods usually affect the balance of machine motion to varying degrees. Non-intrusive data acquisition techniques have emerged to avoid interference with equipment or system status and to obtain the most realistic status information. In IIoT and CPS environments, real-time data acquisition can be performed at different spatial scales and is widely used in smart grids, medical monitoring, and industrial process control systems. Gong et al. [55] broke the limitation of vibration sensors to detect early faults in rotating machinery at ultrasonic frequencies (20-60 kHz) via acoustic signal acquisition through a non-contact acoustic sensor. In a study by Gupta et al. [56] on real-time monitoring of pharmaceutical powder processing, the near-infrared (NIR) spectroscopy sensor Turbido OFS-12S-120H was used to measure ribbon density and moisture content. They are also conducting some studies on non-invasive sensing techniques based on microwaves and X-rays. In addition to specific sensors, researchers have devised indirect, non-invasive data acquisition methods. Irfan et al. [57] proposed a non-invasive instantaneous power analysis (IPA) method for condition monitoring of asynchronous motors. They considered that the motor mechanical vibration is related to the component of the stator current at the specific characteristic frequencies. The modulation of the air gap by mechanical vibrations causes the motor current to increase with mechanical vibrations. The effect of this modulation is manifested in the stator current through the stator inductance of the motor. IPA is used as an indirect and non-invasive method to obtain frequency modulation characteristics in the current and voltage spectrum of a motor and thus detect motor mechanical vibration data without the need for special sensors.
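The idea behind instantaneous power analysis can be sketched on synthetic data. The sketch assumes, as a simplification not taken from [57], that a mechanical vibration at `f_vib` amplitude-modulates the stator current; the supply frequency, vibration frequency, modulation depth `m`, and phase lag `phi` are all hypothetical values. Multiplying voltage by current moves the modulation down to a spectral line directly at the vibration frequency.

```python
import cmath
import math

def dft_magnitude(x, k):
    """Single-sided amplitude of DFT bin k of the real sequence x."""
    n = len(x)
    s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
    return 2 * abs(s) / n

fs, n = 400, 400            # 1 s of samples, so bin k corresponds to k Hz
f_supply, f_vib = 50, 12    # hypothetical supply and vibration frequencies (Hz)
m, phi = 0.1, 0.5           # hypothetical modulation depth and phase lag (rad)

t = [i / fs for i in range(n)]
voltage = [math.cos(2 * math.pi * f_supply * ti) for ti in t]
current = [(1 + m * math.cos(2 * math.pi * f_vib * ti))
           * math.cos(2 * math.pi * f_supply * ti - phi) for ti in t]

# Instantaneous power p(t) = v(t) * i(t).
power = [v * i for v, i in zip(voltage, current)]

# Search the low-frequency band (1 to 40 Hz) for the dominant line.
spectrum = {k: dft_magnitude(power, k) for k in range(1, 41)}
detected = max(spectrum, key=spectrum.get)
print(detected, spectrum[detected])
```

In this idealized setting the low-frequency line has amplitude `0.5 * m * cos(phi)`, so its height tracks the modulation depth; real motor data would also contain noise and supply harmonics.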
In addition, relying on the development of machine vision-related technologies, non-invasive detection of specific target faults can also be achieved by processing image information streams in real time [58]. Lim et al. [59] proposed a thermal-image-based fault diagnosis method that acquires thermal images of rotating machinery using an infrared thermography camera while also acquiring the vibration signal of the machinery, and that treats the thermography signature in CIELab space as a pattern recognition paradigm. This method shows a better performance than vibration analysis in diagnosing early problems of stator windings. Sun et al. [60] proposed a vision-based approach to fault diagnosis by extracting image datasets from videos to represent the normal and fault behavior of a vibrating mechanical system. A Phantom Miro C110 high-speed camera with a resolution of 1280 × 800 and a frame rate of 500 fps in grayscale mode was used to collect images of the state of the machine, which were used as input to a deep learning model.

RTFD Methods
According to the different fault diagnosis bases such as equipment signals, process variables, and semantic data, and the need for independent extraction of figurative feature information, this work classifies current RTFD methods into three categories: methods based on independent feature extraction, methods based on "end-to-end" neural networks, and methods based on qualitative knowledge reasoning. A detailed description of these methods and their research cases will be shown in the following subsections.

Methods Based on Independent Feature Extraction
The idea of the methods based on independent feature extraction is to extract feature information of practical significance from the monitoring data and to establish the relationship between features and states for fault diagnosis through the classification of features. Extracting and classifying the characteristic information characterizing normal or faulty states, respectively, is the main step of the method. When there is a need to reduce noise or computational complexity, dimensionality reduction techniques such as PCA, LDA, and Relief can be used to compress features and eliminate redundant information [61]. The reduced-dimensional feature components help to build higher-performance models. Currently, the application of independent feature extraction-based methods in fault diagnosis is very widespread, especially for vibration and acoustic signals of mechanical equipment [62]. A detailed description of the two main steps in this method, feature extraction and feature classification, is given in Figure 5.

Step 1: Feature Extraction
Feature extraction is the process of extracting identifiable non-redundant feature information that characterizes various states from the raw data. The extracted features have obvious physical significance or statistical significance. As the most critical step of fault diagnosis, feature extraction is the basis for further detection of fault occurrence and identification of fault types, which directly affects the accuracy of diagnosis results [63]. The methods of feature extraction vary for different fault diagnosis application scenarios.

In machine condition monitoring based on signal data, features are usually extracted using signal analysis methods, which can be in the time domain, frequency domain, or a combination of both [64]. In plant-wide process monitoring based on process variables, multivariate statistical analysis is the standard method for extracting characteristics of statistical values, such as principal component analysis (PCA), partial least squares (PLS), and independent component analysis (ICA) [65].
(1) Feature extraction of machine state data via signal analysis
For mechanical equipment, signal analysis is the most effective feature extraction technique. It can extract proper and de-noised fault features of the original signal and clearly show the failure pattern in the time, frequency, or time-frequency domains. The corresponding characteristic indexes mainly include amplitude, kurtosis, power spectral density, etc. Typical methods are spectral analysis, Fourier transform [66], wavelet transform [67], S-transform, empirical mode decomposition [68], and Hilbert-Huang transform (HHT). Chung et al. [69] investigated a blockchain-network-based topic mining process for cognitive manufacturing. They used a short-time Fourier transform algorithm to perform signal processing on the information collected by various sensors to analyze the state information of equipment and human motion. Zhu et al. [70] extracted features based on the energy of each frequency component in the sensor signal. They used two methods, wavelet packet decomposition based on the Haar wavelet basis and empirical mode decomposition, to decompose the signal waveform step by step to generate a series of data sequences with different feature scales as the intrinsic mode functions and then extracted the variance contribution and modal energy value of each intrinsic mode function as the feature vector for pattern recognition. Zhang et al. [71] used the blower as a monitoring object and preprocessed the signal collected by the sensor using filtering, denoising, and compression methods. The standard deviation of wavelet coefficients was extracted from the processed equipment history signal as features. Then, new principal features were extracted from the original features using PCA as input to train the neural network. Gashteroodkhani et al. [72] proposed an intelligent fault detection and classification method for multi-distributed power microgrids. The method uses a time-time transform to extract energy, standard deviation, and median absolute deviation from the TT-matrix diagonal and TT-contours of current samples to calculate fault detection and classification features. Tonelli-Neto et al. [73] applied multi-resolution analysis to the discrete wavelet transform for signal feature extraction, analyzing feeder current signals at different resolution levels using multiple filters. Liu et al. [52] used ensemble empirical mode decomposition (EEMD) to preprocess the signal to eliminate the effects of noise and other uncorrelated signals in order to effectively extract the fault features from the non-linear, non-stationary raw vibration signal. Twenty-seven feature parameters were selected from each decomposed intrinsic mode function: fourteen time-domain statistical features and thirteen frequency-domain statistical features. To further eliminate feature redundancy and improve classification accuracy, the distance evaluation technique is employed to select dominant features as input of the relevance vector machine based on an ant colony optimization algorithm. Zhong et al. [46] proposed a data-driven real-time fault diagnosis method for wind turbine gearbox systems. The method integrates EEMD and HHT for fault feature extraction. They used EEMD to eliminate the mode mixing problem and proposed a combination of energy mode calculation and time-domain statistical analysis to extract fault features and reduce the feature size to improve computational efficiency. Then, the dimension vector was constructed from the intrinsic mode function energy, time-domain statistical features, and the maximum of the HHT marginal spectrum.
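Several of the studies above rely on time-domain statistical indicators such as RMS and kurtosis. The sketch below computes a handful of them with the standard library; the exact feature sets used in the cited works are not reproduced, and the "healthy" and "faulty" signals are synthetic (a pure sine, and the same sine with periodic impulses added).

```python
import math

def time_domain_features(x):
    """Return a few common time-domain condition indicators of signal x."""
    n = len(x)
    mean = sum(x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    var = sum((v - mean) ** 2 for v in x) / n
    # Non-excess kurtosis: 3 for Gaussian data, 1.5 for a pure sine.
    kurtosis = sum((v - mean) ** 4 for v in x) / (n * var ** 2)
    return {
        "mean": mean,
        "rms": rms,
        "kurtosis": kurtosis,
        "peak_to_peak": max(x) - min(x),
        "crest_factor": max(abs(v) for v in x) / rms,
    }

# Synthetic signals, for illustration only.
healthy = [math.sin(2 * math.pi * 5 * i / 1000) for i in range(1000)]
faulty = [v + (5.0 if i % 100 == 0 else 0.0) for i, v in enumerate(healthy)]

f_healthy = time_domain_features(healthy)
f_faulty = time_domain_features(faulty)
print(f_healthy["kurtosis"], f_faulty["kurtosis"])
```

Kurtosis and crest factor rise sharply when impulses appear, which is one reason they are popular indicators for localized bearing and gear defects.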
(2) Feature extraction of plant-wide process data by multivariate statistical analysis
For plant-wide industrial processes containing a large amount of variable data, the main method of feature extraction is to use multivariate statistical analysis methods such as extended PCA and linear discriminant analysis to extract statistical features [74][75][76][77]. In the work of Gupta et al. [56], a three-stage dual orthogonal wavelet was chosen to denoise the data, and statistical quantitative feature extraction was performed using PCA, combined with Hotelling's T² and Q statistics for fault detection and identification. Xia et al. [78] proposed a real-time fault detection and process control method based on multi-channel sensor data fusion. The method uses uncorrelated multilinear discriminant analysis (UMLDA) to extract features from multi-channel sensor data. It combines the extracted features with multivariate control charts to achieve real-time fault detection and process control for multi-process forging processes. UMLDA is a supervised multilinear feature extractor that directly processes multidimensional data. It considers class information when extracting features and extracts uncorrelated discriminative features through tensor-to-vector projection. Kim et al. [79] proposed a fault detection method capable of diagnosing abnormalities in equipment components in real time. First, data normalization is performed on the collected normal and abnormal state vibration data. Then the vibration signal is segmented using the Hamming window function, and the signal is denoised using the inverse spectral transform to enhance the intrinsic characteristics of the vibration signal. After preprocessing the data, ten statistical condition indicators, such as root mean square, average, effective value, and peak to peak, are extracted and used to train the feature classification model.
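PCA-based monitoring with Hotelling's T² and Q statistics can be sketched for the smallest possible case: two correlated process variables and one retained principal component. The training data are invented, the closed-form 2 × 2 eigen-decomposition assumes the two variables are correlated, and control limits are deliberately left out.

```python
import math

def fit_pca_monitor(samples):
    """Fit a one-component PCA model on two-variable normal-operation data."""
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    a = sum((s[0] - mx) ** 2 for s in samples) / (n - 1)
    c = sum((s[1] - my) ** 2 for s in samples) / (n - 1)
    b = sum((s[0] - mx) * (s[1] - my) for s in samples) / (n - 1)
    # Largest eigenvalue and eigenvector of [[a, b], [b, c]] (assumes b != 0).
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    vx, vy = b, lam - a
    norm = math.hypot(vx, vy)
    return (mx, my), (vx / norm, vy / norm), lam

def t2_and_q(sample, model):
    """T² measures variation inside the PCA model, Q the residual outside it."""
    (mx, my), (px, py), lam = model
    dx, dy = sample[0] - mx, sample[1] - my
    score = dx * px + dy * py
    t2 = score ** 2 / lam
    rx, ry = dx - score * px, dy - score * py
    return t2, rx * rx + ry * ry

# Hypothetical normal-operation data lying close to the line y = 2x.
normal = [(i, 2 * i + 0.1 * (-1) ** i) for i in range(20)]
model = fit_pca_monitor(normal)

t2_far, q_far = t2_and_q((30, 60), model)   # on the trend, far from the mean
t2_off, q_off = t2_and_q((10, 25), model)   # breaks the x-y correlation
print(t2_far, q_far, t2_off, q_off)
```

The two statistics are complementary: the first test point is flagged mainly by T² (an unusually large excursion along the normal correlation), while the second is flagged mainly by Q (the correlation structure itself is violated).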
Step 2: Feature Classification
Mapping the recognition results of features to process variables or machine states is the feature classification process. The main step of the process is to build an effective classifier model based on the available feature information and classify the extracted features according to the failure modes to obtain diagnostic results. Fault diagnosis can be regarded as a pattern classification problem in essence. As a powerful pattern recognition tool, the application of AI in fault diagnosis has attracted much attention from many researchers. There are a series of traditional machine learning classification algorithms, such as k-nearest neighbor (k-NN), artificial neural networks (ANN), support vector machines (SVM), and decision trees [80][81][82]. In recent years, deep learning has seen increasing use in fault diagnosis tasks with better real-time and generalization capabilities [83][84][85][86].
(1) Machine learning classification algorithms
ANN can be viewed as a learning machine composed of a large number of simple computational units (neuron nodes) that are interconnected. The activation function, connection weights, and network structure can be adapted to the actual problem or combined with other algorithms to make the network achieve a specific function. Zhang et al. [71] proposed a new approach to rotating machinery fault diagnosis combining wavelet transform, PCA, and ANN. The main fault features extracted from real-time signals are used as inputs for ANN training. The trained neural network can predict the status and degradation of components and machines and make a diagnosis of components and machines with faults. Tonelli-Neto et al. [73] normalized the energy function vector of current information as a feature indicator to input a multilayer fuzzy artificial neural network (FANN) to identify the state information of distribution feeders. The FANN consists of a pair of adaptive resonance theory modules and associative memory modules that finally output diagnostic information with the help of a voting scheme. Zhong et al. [46] proposed a pairwise-coupled sparse Bayesian extreme learning machine (PC-SBELM) for fast fault identification of a real-time gearbox monitoring system. The classifier uses the sparse Bayesian learning (SBL) algorithm to overcome the drawbacks in the extreme learning machine based on feedforward neural networks. It can generate smaller classification models and identify single and simultaneous faults more efficiently and accurately.
General-purpose supervised machine learning algorithms such as SVM are powerful tools for feature classification and statistical analysis. SVMs use inner product kernels and are able to classify linearly inseparable cases by mapping them to a higher-dimensional space. Abbas et al. [87] proposed an SVM classifier-based sensor fault detection method using a dataset containing 42 normal samples and 25 faulty samples for training. Wang et al. [88] introduced a hybrid fault diagnosis method that uses SVM and an improved particle swarm optimization (PSO) algorithm. The radial basis function (RBF) was used to build the SVM model. For the problem of selecting hyperparameters such as kernel function width and penalty factor, they proposed an improved PSO to ensure the classification efficiency and accuracy of SVM. In addition, Dou et al. [89] also introduced the particle swarm optimization-optimized support vector machine (PSO-SVM) algorithm. Chang et al. [90] developed an SVM-based automatic diagnostic procedure to characterize the performance of mechanical molds. They used sensors to measure the contact forces during the stamping of steel parts. The SVM uses features extracted from the obtained waveforms and spectra to construct hyperplanes that classify the state of the punch or die as sharp or blunt in order to evaluate the mechanical properties of the slotting machine.
Typical binary classification methods such as SVM and random forest require labeled normal and fault data for training diagnostic classification models. In practice, however, the amount of industrial fault data available is very small. In contrast, classification methods based on distance metrics, which classify samples according to their similarity, are not limited by the amount of fault data. For example, a Mahalanobis space can be constructed from the Mahalanobis distance (MD) using only normal signal data, and a new signal sample can then be tested for membership in that space. Therefore, Kim et al. [79] constructed two MD-based one-class classification models, the Mahalanobis distance classifier (MDC) and the Mahalanobis-Taguchi System (MTS), to detect anomalous states and evaluate the extracted data.
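A one-class detector built on the Mahalanobis distance can be sketched as follows. This is not the MDC or MTS procedure of [79]; it is a minimal two-feature illustration in which the feature values are invented and the decision threshold is simply taken as the largest distance seen in the normal training data, which is an assumption made here for brevity.

```python
import math

def fit_mahalanobis_space(normal):
    """Estimate mean and inverse covariance from normal two-feature samples."""
    n = len(normal)
    mx = sum(p[0] for p in normal) / n
    my = sum(p[1] for p in normal) / n
    sxx = sum((p[0] - mx) ** 2 for p in normal) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in normal) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in normal) / (n - 1)
    det = sxx * syy - sxy ** 2
    # Inverse of the symmetric 2x2 covariance matrix.
    return (mx, my), (syy / det, -sxy / det, sxx / det)

def mahalanobis(p, space):
    """Mahalanobis distance of point p from the center of the normal space."""
    (mx, my), (ixx, ixy, iyy) = space
    dx, dy = p[0] - mx, p[1] - my
    return math.sqrt(ixx * dx * dx + 2 * ixy * dx * dy + iyy * dy * dy)

# Hypothetical (RMS, kurtosis) features from normal operation only.
normal = [(1.0, 3.0), (1.1, 3.2), (0.9, 2.9), (1.05, 3.1),
          (0.95, 2.8), (1.0, 3.3), (1.08, 2.95)]
space = fit_mahalanobis_space(normal)

# Assumed threshold: the largest distance observed in the training data.
threshold = max(mahalanobis(p, space) for p in normal)

def is_anomalous(p):
    return mahalanobis(p, space) > threshold

print(is_anomalous((1.02, 3.05)), is_anomalous((1.6, 5.0)))
```

No fault samples are needed to fit the model, which is the property that makes this family of methods attractive when fault data are scarce.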
In addition, Zhu et al. [70] extracted features from the output data collected by the sensors under various failure modes and then input them into a decision tree for training to build a fault classification recognition model. This decision-tree model determines the fault type based on the variance contribution of each data series under different fault performances.
(2) Deep learning classification networks
For the purpose of effective classification of more complex features, some deep network models are also used in combination with signal analysis techniques to provide fault diagnosis methods based on independent feature extraction. Lee et al. [91] developed a CWT-CNN model to detect mechanical faults in variable speed settings. They collected machine state data at different rotational speeds using a triaxial accelerometer, extracted time-frequency features through continuous wavelet transform with Morlet wavelets, and applied them to a convolutional neural network (CNN) model. The model takes the final output of all filters in the convolutional layer as the feature map, reduces the input dimensionality through segmental sampling in the pooling layer, and outputs the detection results through the fully connected layer. To achieve real-time and accurate fault diagnosis, Wang et al. [92] proposed an integrated diagnosis method based on impulse signals, deep belief networks (DBN), and feature unification. For gearbox faults that often excite resonance at a certain frequency, they used a synthesis of an optimized Morlet wavelet transform, the kurtosis index, and adaptive soft thresholding to extract the impulse components from the original signal. Then, 17 time-domain features, 4 spectral frequency-domain features, and 4 envelope spectral frequency-domain features were extracted from the original signal and the impulse signal, respectively. The sensitivity of features to different faults was studied using probability density functions, and the DBN learned these differences to identify fault types based on the input features.
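The convolution, pooling, and fully connected stages described above can be sketched as a forward pass. For brevity the sketch is one-dimensional and uses the standard library only, whereas the CWT-CNN of [91] operates on two-dimensional time-frequency maps; the kernels and weights are hypothetical, untrained values, so the output probabilities carry no diagnostic meaning.

```python
import math

def conv1d(signal, kernel):
    """Valid 1D cross-correlation, as used in CNN convolutional layers."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(values):
    return [max(0.0, v) for v in values]

def max_pool(values, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def forward(signal, kernels, fc_weights, fc_bias):
    """Convolution -> ReLU -> pooling for each filter, then a dense layer."""
    features = []
    for kernel in kernels:
        features.extend(max_pool(relu(conv1d(signal, kernel)), 2))
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(fc_weights, fc_bias)]
    return softmax(logits)

# Hypothetical input and untrained parameters, for illustration only.
signal = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5, 0.0, 0.5]
kernels = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
fc_weights = [[0.1] * 8, [-0.1] * 8]   # 2 classes x 8 pooled features
fc_bias = [0.0, 0.0]

probs = forward(signal, kernels, fc_weights, fc_bias)
print(probs)
```

Each filter turns the ten-sample input into eight convolution outputs and four pooled features, so pooling halves the dimensionality passed to the fully connected layer.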
As the most popular classification network, the CNN architecture is widely used in deep learning-based fault diagnosis research through visual image feature extraction. Xia et al. [113] applied deep learning to visual monitoring of wire-arc additive manufacturing to diagnose different process abnormalities, including humping, spattering, and robot suspension. They used CNNs to learn the features of melt pool images, evaluated the classification performance of several representative CNN architectures, and obtained good classification results. Li et al. [114] developed a lightweight CNN structure called WearNet for surface scratch detection in metal-forming processes. The network gradually extracts and learns category features with good label recognition capability. Their design maintains excellent classification accuracy while simplifying the network structure as much as possible, which benefits response speed.

Methods Based on "End-to-End" Neural Networks
With the dramatic increase in the amount of data in industrial systems, traditional machine learning methods are struggling to meet the requirements. With their powerful feature learning and pattern recognition capabilities, neural networks can perform the entire abstract feature learning and classification process in an "end-to-end" manner without a separate feature extraction step. The principle of "end-to-end" neural network-based fault diagnosis is as follows: first, the network automatically learns deep abstract features from a large amount of input data layer by layer; then, the network is trained on the mapping from feature attributes to fault patterns; finally, the trained network is used for fault diagnosis of new monitoring data. Some deep hierarchical networks can even take source data directly as input and monitor it in real time. This subsection classifies these methods into single-type network models and multi-type network fusion models according to the type of network, as shown in Figure 6. In addition, researchers can further enhance the fault diagnosis performance of a model by introducing optimization algorithms to tune the network structure and parameters.
[Figure 6. Methods based on "end-to-end" neural networks: real-time processing of source data directly, with automatic learning of fault features for classification and identification.]
(1) Single-type network model
Some of the more representative deep neural networks in fault diagnosis are the CNN, DBN, stacked autoencoder network, and recurrent neural network (RNN) [93,94]. Eren et al. [95] used a compact adaptive 1D CNN classifier for real-time bearing fault diagnosis. The classifier takes the source data directly as input and, with proper training, efficiently learns optimal features. In the supervised training phase, the CNN's convolutional filters are optimized by back-propagating the classification error, so the model automatically learns highly discriminative features from the input data. The overall fault classification accuracy of this method on the IMS and CWRU bearing datasets was 93.9% and 93.2%, respectively. Xu et al. [96] proposed a two-phase digital-twin-assisted fault diagnosis (DFDD) method using deep transfer learning and constructed a deep neural network-based diagnosis model. Without a priori knowledge, representative features were extracted from a large amount of unlabeled simulation data, and PCA was used to reduce the features to three dimensions. Tests showed that the model had good feature clustering and was able to separate most features for different health conditions. Among the many neural network parameter optimization algorithms, PSO is one of the most commonly used.
Zhang et al. [97] used a spectrometer to capture high-dimensional optical signals and fed them into a stacked auto-encoder-based data-driven framework for real-time detection of high-power disc laser welding defects. They used PSO to optimize the framework, obtaining globally optimal parameters such as kernel size, weight coefficients of sparse terms, and weight decay coefficients, which enhanced its ability to extract representative features from the original high-dimensional signal. To address the poor real-time performance of CNNs when directly processing one-dimensional time series signals of aero engines for fault diagnosis, Li et al. [98] combined improved pattern gradient spectral entropy (IPGSE) with a CNN to propose an intelligent fault diagnosis scheme for aero engine control system sensors. To overcome the limited adaptability of the algorithm, the empirically selected PGSE parameters were replaced by using PSO to adaptively optimize the scale factor λ, so that the resulting spectral entropy map better matches the classification made by the CNN model. In addition, Wang et al. [99] proposed a lightweight convolutional neural network (LCNN) for the intelligent diagnosis of bearing faults. They constructed the LCNN structure from depthwise separable convolutions with inverted residual structures and linear bottleneck layers. A novel decomposed hierarchical search space then split the model into blocks and searched, for each block, the operations and block-to-block connections, so that the optimal LCNN for bearing fault diagnosis in the IIoT environment was found automatically. The model largely meets the requirements of a small number of parameters, small storage space, and high accuracy.
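To make the role of PSO in such parameter tuning more concrete, the following is a minimal sketch of a standard global-best particle swarm optimizer. It is not the implementation used in [97] or [98]; the objective `toy_validation_error`, its two parameters, and all coefficient values (`inertia`, `c1`, `c2`) are hypothetical choices, with the toy objective standing in for a real validation error that would require training a network.

```python
import random

def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` over box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # global best so far
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + cognitive pull + social pull.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def toy_validation_error(params):
    """Hypothetical stand-in for a validation error; minimum at (5.0, 0.01)."""
    kernel_size, weight_decay = params
    return (kernel_size - 5.0) ** 2 + 100.0 * (weight_decay - 0.01) ** 2

best_params, best_error = pso_minimize(
    toy_validation_error, bounds=[(1.0, 15.0), (0.0, 0.1)])
```

In practice each objective evaluation would involve training and validating a model, so the number of particles and iterations is usually kept small.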
(2) Multi-type network fusion model
Some recent studies have fused multiple types of neural networks to compensate for eigenmodes, capture dynamic features, or achieve further fusion of deep features. Such models can describe fault features more accurately and improve fault diagnosis accuracy while reducing computational costs. Liu et al. [100] proposed a dynamic deep learning algorithm based on incremental compensation (ICDDL) for fault diagnosis and prediction of bearing operating conditions. The method used a denoising autoencoder (DAE) to extract the characteristic patterns of newly generated data, and the weight of each pattern was then dynamically adjusted according to the difference in similarity between the new patterns and the historical failure patterns. Finally, the SVM algorithm was used for supervised classification of the weighted patterns, and the BP algorithm was used to fine-tune the whole network model according to the model error. The method provides dynamic compensation based on the importance of each eigenmode over time and achieves higher fault diagnosis accuracy. The proposed ICDDL method can accomplish both real-time extraction of bearing status features and reliable classification of failure modes.
Shenfield et al. [45] combined elements of a CNN with an RNN path to propose a novel dual-path recurrent neural network with a wide first kernel and deep convolutional neural network (RNN-WDCNN) pathway for diagnosing rolling bearing faults in electromechanical drive systems. The model works directly on the raw temporal data, avoiding the need for manual feature extraction or noise removal, and is robust to both environmental noise and changes in operating conditions. In the dual-path architecture of the model, the deep convolutional path consists of five convolutional stages for feature extraction and a dimensionality reduction stage for feature compression. The RNN block is mainly used to capture dynamic temporal features spanning many time steps and feed them into a convolutional path to learn and improve the final classification results. Finally, the output of the deep convolutional path is fed to the output classification layer together with that of the RNN path, and the probability distribution over the bearing fault classes is output by a softmax layer. Xue et al. [101] proposed a two-stream feature fusion convolutional neural network (TSFFCNN) for real-time diagnosis of bearing faults, using a 1D-CNN and a 2D-CNN to construct a two-channel network model. Two parallel sets of convolutional and pooling layers extract the 1D and 2D features of the normalized and reconstructed signals, and a feature fusion strategy fuses the two feature streams smoothly. Finally, classification is performed using a PSO-optimized SVM (PSO-SVM). The model is able to identify failure modes from vibration signals more accurately and in a shorter iteration time.
Yang et al. [115] proposed a fault detection method based on a teacher-student uncertainty autoencoder (TSAUAE) to monitor process-relevant and quality-relevant faults. In this method, the student network extracts representational features and the teacher network detects faults. A representation evaluation block (REB) is proposed to evaluate and reduce the feature difference between the teacher and student networks. Wang et al. [102] proposed a high real-time Optimal Transport-Capsule Network (OT-Caps) fault diagnosis model. Following the characteristics of the capsule network, the model expands the one-dimensional neuron of the traditional CNN into a multidimensional neuron, which enhances the data mining capability and fault feature storage capability of the deep network. Building on the traditional capsule network algorithm, they introduce an auxiliary loss to improve the network architecture during offline training and bring optimal transport theory into the auxiliary loss to accurately describe the error distribution of fault characteristics. With the improved network structure, this capsule network can directly process one-dimensional raw vibration signals, reducing the complexity of data processing. The model ensures high accuracy, early prediction, and relocatability of fault diagnosis while reducing the computational cost of multidimensional neuronal networks.

Methods Based on Qualitative Knowledge Reasoning
Recently, fuzzy logic, case-based reasoning (CBR), and hybrid methods based on data and knowledge have been popular techniques for fault diagnosis and fault prognosis [103,104]. Methods based on qualitative knowledge reasoning usually combine qualitative and quantitative analysis, establish inference rules with product knowledge, and use data as an auxiliary basis to achieve real-time reasoning about fault information. In brief, this kind of method proceeds as follows: first, failure modes are represented using a priori qualitative knowledge or data. Then, a fault information matching model and inference rules are established to perform real-time fault inference on the system state. As shown in Figure 7, the methods discussed in this subsection mainly include causal interpretation, logical inference, fuzzy rules, and Bayesian networks.
(1) Causal explanation
For fault diagnosis of control systems containing multiple complex operating units, qualitative modeling based on cause-effect interpretation is a practical approach when there are not enough direct data or failure mode models for quantitative modeling. Rathinasabapathy et al. [105] developed a qualitative modeling approach for process diagnosis called causal link assessment (CLA). CLA uses a plant model structure called Functional Representation (FR) to generate all causally linked device failure modes that match a plant snapshot at a given time step, by matching sensor characteristics to device states in a progressive process performed device by device. The CLA approach can readily handle multiple simultaneous faults and provide helpful diagnostic clues for faults not documented in the model. However, it is limited by the prior knowledge encoded in the FR model. Hamdan et al. [106] developed a real-time exceptional events management (EEM) framework integrating signed directed graph (SDG) and trend analysis (TA) methods for fault diagnosis.
First, a low-resolution search was performed based on the initial response tables (IRT) generated from the SDG representation of the causal relationships between variables. The SDG and associated IRT for known faults were generated from a priori knowledge of the process. They used a moving window to calculate the average value of each process variable and implemented fault detection by testing the deviation of the variable from its tolerance range or alarm limits. Once an anomaly was detected, the moving window was frozen and the diagnostic logic was initiated to determine possible faults. They then used a higher-resolution but more time-consuming diagnostic method based on qualitative trend comparison, comparing the deviation trends of the abnormal variables observed during the process with the fault trends in the information base to derive the most likely faults. The framework considers the simultaneous occurrence of different abnormal events and develops a multi-fault identification protocol that detects, diagnoses, and provides mitigation strategies for multiple simultaneous abnormal events within 10 s.
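The moving-window detection step described above can be sketched as follows. This is only a schematic reading of that step, not the EEM implementation of [106]; the function name `detect_deviation`, the window length, the limits, and the readings are all hypothetical.

```python
from collections import deque

def detect_deviation(samples, low, high, window=5):
    """Return (index, frozen_window) when the moving average first leaves
    the tolerance band [low, high], or None if it never does."""
    buf = deque(maxlen=window)          # holds the most recent samples
    for i, x in enumerate(samples):
        buf.append(x)
        if len(buf) < window:
            continue                    # wait until the window is full
        avg = sum(buf) / window
        if avg < low or avg > high:
            # Freeze the window so downstream diagnostic logic can
            # analyse the trend that triggered the alarm.
            return i, list(buf)
    return None

# Hypothetical process variable: steady at 50, then drifting upwards.
readings = [50.0] * 10 + [50.0 + 2.0 * k for k in range(1, 11)]
result = detect_deviation(readings, low=45.0, high=55.0, window=5)
```

Averaging over a window delays the alarm slightly compared with a per-sample limit check, but makes it less sensitive to isolated noisy samples.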
Yang et al. [26] proposed a process monitoring method based on causal graphical modeling (CGM) and a multiple-model framework for detecting blockages in a steel continuous casting process. By drawing on a smart manufacturing platform, the large amount of process data it collects, and a CGM developed from field knowledge, the proposed method can suppress the effects of unobservable disturbances and improve the robustness of blockage detection. Niu et al. [107] proposed a new scheme for fault characterization modeling with an integrated bond graph (BG) to achieve robust real-time monitoring and diagnosis of dynamic systems in multiple energy domains; the model can track the propagation and transformation of energy effects through causal paths. In addition, a complete fault detection and isolation framework was established by combining the novel BG model, empirical residual generation based on the multivariate state estimation technique, and threshold monitoring based on a sequential probability ratio test.
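For readers unfamiliar with the sequential probability ratio test mentioned above, the sketch below shows Wald's classical test applied to residuals, under the simplifying assumption that residuals are Gaussian with known standard deviation `sigma`, zero mean when healthy, and mean `shift` when faulty. These assumptions, the function name, and the error rates are illustrative and are not taken from [107].

```python
import math

def sprt_gaussian_mean(residuals, shift, sigma, alpha=0.01, beta=0.01):
    """Wald SPRT for H0: mean = 0 versus H1: mean = shift (known sigma).

    alpha is the target false-alarm rate, beta the target missed-alarm rate.
    Returns (decision, index of the sample at which it was reached).
    """
    upper = math.log((1 - beta) / alpha)   # crossing it accepts H1 (fault)
    lower = math.log(beta / (1 - alpha))   # crossing it accepts H0 (normal)
    llr = 0.0
    for i, r in enumerate(residuals):
        # Log-likelihood ratio increment for one Gaussian sample.
        llr += (shift / sigma ** 2) * (r - shift / 2)
        if llr >= upper:
            return "fault", i
        if llr <= lower:
            return "normal", i
    return "undecided", len(residuals) - 1

# Hypothetical residual sequences.
decision_fault = sprt_gaussian_mean([1.0] * 20, shift=1.0, sigma=1.0)
decision_normal = sprt_gaussian_mean([0.0] * 20, shift=1.0, sigma=1.0)
```

In continuous monitoring the statistic is typically reset after each decision so that the test can run indefinitely on the residual stream.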
(2) Logic or knowledge reasoning
Intelligent fault diagnosis methods based on logic or knowledge inference can reveal the internal logic of data and help address the growing need for real-time online processing of industrial data. Wang et al. [108] proposed a fault diagnosis scheme with knowledge inference and semantic data integration to meet real-time data processing requirements. They define a smart factory as a cloud-assisted self-organizing manufacturing system. The inference engine performs fault diagnosis and statistical analysis on semantic data collected and processed in real time from the production process. Specifically, the researchers used the unified modeling language (UML) and JUDE Community to build information models of manufacturing systems. On this basis, Protégé software was used to build the ontology model, and UaModeler was used to build the OPC UA model. The OPC UA server provides semantic data to the cloud, where the data and the ontology model are cyclically processed by Apache Jena. The Semantic Web Rule Language was used to describe fault detection and statistical analysis rules. Chen et al. [109] proposed an intelligent fault diagnosis method for power electronic converters based on generalized logic, according to the correlation between faults and basic measurements. The method successfully diagnoses different faults online using combinatorial logic and fuzzy logic. In their study, combinatorial logic was shown to diagnose specific faults and perform system recovery using redundant or standby components, while fuzzy logic was used to diagnose multiple faults, providing faster fault diagnosis times.
Fault detection and classification using a fuzzy rule-based approach has the following main features: low computational effort, high robustness, reliability, and efficiency. It also has the potential to incorporate new failures that differ from those in the existing model. Tonelli-Neto et al. [73] designed a fuzzy inference system (FIS) that maps the feeder current vector to a scalar output through the evolution of fuzzy rules. The membership of this output in the FIS fuzzy sets was then checked to identify the operating state of the feeder. They used Dempster-Shafer evidence theory, based on probabilistic reasoning and evidence combination, to aggregate state data into a probability value and a reliability value, thereby simplifying the diagnostic process and reducing the burden of manual decision-making. In addition, as an important subset of qualitative knowledge reasoning, fuzzy logic methods are also widely used for fault prediction modeling due to their good representation and evaluation abilities. Li et al. [116] combined fuzzy logic modeling with an improved quantum-behaved particle swarm optimization algorithm to predict the evolution of sheet metal surface scratching. They used ball-on-disk experiments to evaluate the contribution of specific fuzzy variables to surface damage. To improve the prediction accuracy, they refined the fuzzy model by optimizing the membership functions of the fuzzy variables. Padhi et al. [117] proposed a fuzzy inference system combined with Taguchi's philosophy to optimize the fused deposition modeling (FDM) process parameters and further proposed a prediction model to evaluate the dimensional accuracy of FDM-fabricated parts under various operating conditions. They used fuzzy logic decision-making to combine multiple performance characteristics into a single performance characteristic index.
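The evidence-combination idea behind Dempster-Shafer theory, mentioned above in connection with [73], can be illustrated with Dempster's rule of combination. The sketch below is a generic textbook version, not the procedure of Tonelli-Neto et al.; the two evidence sources and their mass values are hypothetical.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (dicts: frozenset -> mass) by Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence supports the intersection of the two sets.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Mass assigned to disjoint sets is counted as conflict.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalise by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

NORMAL, FAULT = frozenset({"normal"}), frozenset({"fault"})
EITHER = NORMAL | FAULT   # mass left on the whole frame expresses ignorance

# Hypothetical evidence from two independent indicators.
m_current = {FAULT: 0.6, NORMAL: 0.1, EITHER: 0.3}
m_voltage = {FAULT: 0.5, NORMAL: 0.2, EITHER: 0.3}
fused = dempster_combine(m_current, m_voltage)
```

With these example masses, the fused belief in the fault hypothesis is higher than that of either source alone, because the two sources largely agree.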

(3) Other related approaches
In general, the causal relationship between a failure and its cause in a complex system is not always certain. Bayesian networks provide an intuitive causal framework capable of handling uncertain inference problems with multiple sources of data and knowledge. Chen et al. [110] constructed a hierarchical Bayesian network based on failure mode and effects analysis and system composition, aiming to reduce uncertainty and improve troubleshooting efficiency in the diagnosis of complex aircraft systems by considering design knowledge and real-time monitoring information. The network model uses the failure mode and effects analysis from safety analysis as a priori knowledge and real-time monitoring events as observed information to show the cause-effect relationships of complex systems in a cause-mode-effect manner. In addition, T-S fuzzy dynamic modeling techniques are becoming popular in the study of non-linear industrial processes. Li et al. [118] proposed optimal observer-based fault detection and estimation approaches for T-S fuzzy systems. They addressed the fault detection issue of fuzzy systems in the time-varying framework for the first time and investigated the fault estimation scheme from the least squares optimization viewpoint.
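The Bayesian reasoning described at the start of this paragraph, in which prior knowledge is combined with observed monitoring events, can be sketched with a two-layer cause-to-symptom model. This is a deliberately simplified illustration that assumes a single active cause and symptoms that are conditionally independent given the cause; it is not the hierarchical network of [110], and all names and probabilities are hypothetical.

```python
def posterior_over_causes(prior, likelihood, observed):
    """Posterior P(cause | observed symptoms) by Bayes' rule.

    prior:      dict cause -> P(cause)
    likelihood: dict cause -> dict symptom -> P(symptom present | cause)
    observed:   dict symptom -> bool (True if the symptom was observed)
    """
    unnormalised = {}
    for cause, p in prior.items():
        for symptom, present in observed.items():
            p_s = likelihood[cause][symptom]
            # Multiply in the probability of what was actually observed.
            p *= p_s if present else (1.0 - p_s)
        unnormalised[cause] = p
    total = sum(unnormalised.values())
    return {c: v / total for c, v in unnormalised.items()}

# Hypothetical prior knowledge (for example, from an FMEA) and symptom model.
prior = {"pump_wear": 0.02, "valve_stuck": 0.01, "no_fault": 0.97}
likelihood = {
    "pump_wear":   {"low_pressure": 0.90, "high_temp": 0.70},
    "valve_stuck": {"low_pressure": 0.80, "high_temp": 0.10},
    "no_fault":    {"low_pressure": 0.01, "high_temp": 0.02},
}
post = posterior_over_causes(
    prior, likelihood, {"low_pressure": True, "high_temp": True})
most_likely = max(post, key=post.get)
```

Even though faults are rare a priori in this example, observing both symptoms together shifts most of the posterior mass onto a single cause.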
Moreover, several critical problems play a key role in real industrial fault detection and diagnosis, such as imbalanced or missing fault data and non-Gaussian or heavy-tailed data distributions. Wang et al. [111] proposed a bilayer convolutional transfer learning neural network to detect the agglomeration fault of an actual polyethylene process when there are far fewer faulty data than normal data. Considering the strong non-stationarity of time series from a real system, several common statistical indices, including the autocorrelation function, Hurst exponent, probability density distribution, and alpha-stable distribution, were analyzed to explore the hidden fractional feature, which helped to improve the generalization ability of the neural network model. Data acquired directly from industrial sensors should be preprocessed and characterized statistically before being used for learning of any kind.

Discussion, Challenges, and Future Trends
The details above show that RTFD has been widely used in all aspects of industrial smart manufacturing. The previous section reviewed the research results of RTFD in various application scenarios in detail. This section further discusses some of the research challenges and extensions of this topic in combination with the topic feature mining in Section 2.3. The following discussion is intended to help readers identify potential trends in the future development of this topic.

Data-Acquisition-Related Issues
As the primary medium for data acquisition, the sensor determines the validity of the collected data at the source. If the acquired diagnostic information is inaccurate, it will directly affect the correctness of data processing and undermine the diagnostic value of the model. In addition, if the diagnostic data are not adequately collected, resources are wasted and the improvement of the model's diagnostic performance is hindered to a certain extent. Therefore, the following two issues require adequate research attention.
(1) How to ensure the accuracy of the data collected by the sensor. Unreliable sensors or unsynchronized sampling rates can lead to unexpected loss of observations or differences in data dimensionality, which prevents accurate condition monitoring and affects diagnostic decisions. Only a little research has been carried out on the status detection and self-diagnosis of the sensors themselves.
(2) How to more fully capture machine status information. For the fault diagnosis of mechanical equipment, the choice of observation signal type is also crucial to the diagnosis results. In the course of the literature review, it was found that when the same vibration sensor measurement data are processed with different signal analysis methods, the diagnostic results vary significantly. Setting aside the influence of different methods, the sensor's performance in different frequency ranges may also be a major factor. In the future, the relative strengths of different types of sensors in different frequency bands could be exploited, with the signals in each band assigned to targeted, integrated processing.

Big Data Application Issues
(1) Data optimization processing issues
As a critical premise of data-driven work, data preprocessing has long been a challenge in fault diagnosis. In current RTFD efforts, the volume of data collected is growing rapidly, along with an increase in low-quality data that lack accuracy and completeness. Effective data cleaning is essential to improve the source data quality, which directly affects the performance of the fault-handling model. Removing redundant information, compressing data, and improving the validity of sample data will be a long-term challenge for the field. In addition, effective data fusion techniques must be developed to overcome differences in sampling rates, periods, and weights and to fuse heterogeneous data from multiple sources collected by various sensors. Data fusion can provide comprehensive global fault characteristics for fault diagnosis and yield more accurate diagnosis results.
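One elementary ingredient of fusing sensor streams with different sampling rates is bringing them onto a common time grid. The sketch below does this by linear interpolation; it is only one of many possible alignment strategies and is not drawn from any of the reviewed works. The function name `resample_linear`, the sensor names, and the sample values are hypothetical.

```python
from bisect import bisect_right

def resample_linear(times, values, grid):
    """Linearly interpolate (times, values) onto `grid`.

    `times` must be sorted in increasing order; points outside the
    measured range are held at the nearest measured value.
    """
    out = []
    for t in grid:
        if t <= times[0]:
            out.append(values[0])
            continue
        if t >= times[-1]:
            out.append(values[-1])
            continue
        j = bisect_right(times, t)          # first sample strictly after t
        t0, t1 = times[j - 1], times[j]
        w = (t - t0) / (t1 - t0)            # interpolation weight in [0, 1)
        out.append(values[j - 1] + w * (values[j] - values[j - 1]))
    return out

# Hypothetical streams: vibration sampled every 0.5 s, temperature every 1 s.
vib_t = [0.0, 0.5, 1.0, 1.5, 2.0]
vib_v = [0.0, 1.0, 2.0, 3.0, 4.0]
temp_t = [0.0, 1.0, 2.0]
temp_v = [20.0, 22.0, 24.0]

# Align the slower stream to the faster one and build fused samples.
temp_on_grid = resample_linear(temp_t, temp_v, vib_t)
fused = list(zip(vib_t, vib_v, temp_on_grid))
```

Interpolating the slower stream keeps the time resolution of the faster one; downsampling the faster stream instead would reduce data volume at the cost of high-frequency detail.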
(2) Building a big data ecosystem
With the further development of intelligent manufacturing, the amount of data to be handled in the industrial field is enormous. Better management and monitoring of the data stream will greatly improve real-time fault diagnosis. Cloud computing architectures are being mentioned more often. With the assistance of cloud computing technology, data flow management technologies such as data lakes and databases are introduced to establish a big data ecosystem for online processing, which is of great help to both real-time fault diagnosis and fault prognosis. Online data reflect not only the latest changes in the system's current state but also the cumulative correlation of the operational process. In addition, data mapping technologies such as the digital twin can be introduced to improve the efficiency and real-time performance of FD through dual diagnosis in virtual and physical space. However, the accompanying data security issues cannot be ignored. Transformation protocols and data encryption methods can be utilized to address data and network security issues.

Machine Fault Diagnosis for Variable Operating Conditions
In practical applications, for the fault diagnosis of mechanical equipment, especially rotating machinery, it is necessary to develop a diagnostic model applicable to variable machine running conditions. Variable machine operating conditions mainly refer to different motor loads, bearing speeds, and ambient noise. For some static feature classification methods, their diagnosis performance relies heavily on the extracted feature parameters, is limited to a small data set, and is only applicable to specific equipment operating conditions.
In other words, a fault model built with features extracted under one operating condition does not necessarily achieve the same ideal diagnostic effect under other different operating conditions. Many RTFD methods in the literature are not applicable to different operating conditions, which raises new requirements for future research.

Hybrid Fault Diagnosis of Plant-Wide Industrial Systems
Industrial systems will inevitably evolve towards complex systems that are high-dimensional, strongly non-linear, and strongly coupled, with large time delays, for which it is difficult to build an accurate model. How to manage the process data of large and complex systems and provide troubleshooting that supports their reliable operation is a crucial issue. The complexity of the actual system makes it impossible to rely on a single troubleshooting method to achieve the desired maintenance results. Therefore, fusing the different characteristics of different algorithms and developing a diagnosis method for hybrid faults applicable to complex systems is a highly relevant research direction.
On the other hand, the future diagnosis of hybrid faults requires data and knowledge fusion-driven decision making. With the increasing complexity of modern plant-wide processes, FD has become more challenging than ever. Qualitative knowledge modeling enables excellent diagnosis of special systems or specific fault types with little direct data volume. The application of causality and empirical knowledge helps design and optimize data modeling, improving diagnostic performance while reducing model complexity and computational cost. In addition, quantitative data modeling helps to avoid the uniqueness of diagnostic results of qualitative models and further improves the usefulness of qualitative methods. An integrated modeling strategy that introduces qualitative analysis into quantitative data processing is an effective way to solve complex FD tasks. The integration of the two in fault diagnosis will be very promising research work.

Interpretability and Robustness of Deep-Learning-Based FD Models
The problems of poor interpretability and robustness caused by the "black box" character and complex structure of deep neural networks have seriously affected the reliability and practicality of the models.
(1) Interpretability. For many practical applications, it is not enough to know the inputs and outputs of a deep learning-based FD model; the interpretation of its results is equally necessary. Research on the interpretability of deep learning can guide the construction of DNN-based diagnostic models with optimal architectures and enable clear, targeted improvements. It is challenging to work out generalized methods to explain deep-learning-based models. Creating specific interpretations of the model for specific questions and data may be the first step for future research. Some researchers have been focusing on the deep structure of traditional statistical monitoring to implement hierarchical latent variable extraction in the trade-off between model interpretability and model performance [119].
(2) Robustness. The effectiveness of deep-learning-based FD models relies on high-quality training data. DNN models can easily exhibit vulnerability if the input data contain adversarial interference. It is important to conduct research that yields DNN-based diagnostic models with both satisfactory performance and good robustness. Currently, adversarial training is the most successful method to obtain robust neural networks. In fact, research on robustness in deep learning is usually associated with adversarial sample attack and defense. However, the robust overfitting problem that affects adversarial training performance requires further attention from researchers.
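To illustrate what an adversarial perturbation of the input looks like, the sketch below applies the fast gradient sign method (FGSM), a well-known attack from the adversarial-robustness literature, to a tiny logistic-regression "fault detector". The technique is swapped in here for illustration only and is not discussed by name in the reviewed works; the weights, bias, input, and `eps` are hypothetical. Adversarial training would add such perturbed samples to the training set.

```python
import math

def predict_proba(w, b, x):
    """Probability of the 'fault' class under a logistic-regression model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def fgsm_perturb(w, b, x, y, eps):
    """Move x by eps (per feature) in the direction that increases the loss.

    For logistic regression with cross-entropy loss, the gradient of the
    loss with respect to the input is (p - y) * w.
    """
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical trained model and one correctly classified faulty sample.
w, b = [2.0, -1.0, 0.5], -0.5
x, y = [0.6, 0.2, 0.4], 1

x_adv = fgsm_perturb(w, b, x, y, eps=0.3)
p_clean = predict_proba(w, b, x)      # above 0.5: classified as fault
p_adv = predict_proba(w, b, x_adv)    # below 0.5: the fault is now missed
```

A bounded change to each feature is enough to flip the decision in this toy case, which is the kind of vulnerability that robustness research aims to reduce.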

Conclusions
This paper provides a detailed review of the most relevant techniques and methods for RTFD from 2010 to 2021. The research and discussion are derived from scientific literature search and text mining.
First, this paper presents a preliminary analysis of the collected literature using a text mining and visual analysis tool, ITGInsight. Then, after a careful reading, this paper categorizes the current RTFD methods from a new perspective and discusses their methodological applications separately. Finally, through topic mining and in-depth analysis of the literature, this paper provides prospects in the following five aspects in response to the challenges faced by RTFD, aiming to provide valuable guidance for future research in this field.

• The self-diagnosis and characteristics in different frequency bands of sensors play a key role in the accuracy and sufficiency of information acquisition;
• Effective data preprocessing techniques should be developed to achieve fusion of heterogeneous data from multiple sources;
• Fault diagnosis techniques under variable working condition are important due to the changing production needs;
• Hybrid fault diagnosis methods should be considered in order to be applied to the complex plant-wide industrial system;
• The interpretability and robustness should be discussed to improve the reliability of deep-learning-based fault diagnosis models.
In particular, this paper points out the development trend of real-time intelligent fault diagnosis in modern industrial smart manufacturing. With the surge of industrial data, the construction of big data ecosystems should be considered to support the monitoring and management of real-time data flow for online decision-making and real-time fault diagnosis.