Deep Transfer Learning Models for Industrial Fault Diagnosis Using Vibration and Acoustic Sensors Data: A Review

Abstract: Nondestructive testing techniques for finding bearing flaws have grown in favor as a means of evaluating final quality. The precision of vision-based technology built on image processing has greatly improved for defect identification, inspection, and classification. Deep Transfer Learning (DTL), a kind of machine learning, combines the strength of Transfer Learning (TL) in knowledge transfer with the benefits of Deep Learning (DL) in feature representation. As a result, DTL approaches have been extensively developed and researched in the discipline of Intelligent Fault Diagnosis (IFD). They can improve the robustness, reliability, and usefulness of DL-based fault diagnosis techniques. IFD has been the subject of several thorough and excellent reviews, although most of them have appraised the research from an algorithmic standpoint, neglecting real-world applications. DTL-based IFD strategies have also not yet undergone a full evaluation. In light of this, a review of the relevant DTL-based IFD publications is needed so that readers can grasp the most recent concepts and develop practical solutions to the IFD challenges they encounter. The theory behind DTL is briefly discussed before describing how transfer learning algorithms may be incorporated into deep learning models. This review looks at a number of vision-based methods for defect detection and identification using vibration and acoustic sensor data. Its goal is to assess the current state of vision inspection system research. In this respect, image processing, deep learning, machine learning, transfer learning, few-shot learning, and lightweight approaches and their selection are explored. The review also addresses the creation of defect classifiers and vision-based fault detection systems.


Introduction
Safety-critical systems include aeroengines, vehicle dynamics, chemical processes, manufacturing systems, electric machines, wind energy conversion systems, and industrial electronic equipment, among others. Industrial systems prone to process irregularities and component breakdowns prioritize reliability and safety. Defect-tolerant operation may prevent performance drops and harmful circumstances by immediately recognizing abnormalities or errors.
Any deviation from the expected, usual, or standard behavior of a characteristic property or parameter of the system is considered a fault [1]. Problems arise, for example, when a system component is disconnected or a sensor fails to work properly (such as when a sensor gets stuck at one value or its scale factor changes). Faults are therefore commonly classified as actuator faults, sensor faults, and plant faults (also known as component or parameter faults). These faults interrupt the controller's action on the plant, produce significant measurement errors, or directly change the dynamic input/output properties of the plant, degrading the system's performance and possibly causing the system to collapse [2]. Fault diagnosis is used to monitor, locate, and identify faults in a system by exploiting redundancy, either hardware redundancy or software redundancy (also called analytical redundancy), with the goal of improving system dependability. With hardware redundancy, the output signals of identical components fed with identical input signals can be compared, so diagnostic decisions may be made using approaches such as limit checking and majority voting. Hardware redundancy is reliable, but it is costly, adds weight, and occupies extra space [3]. It is therefore justified for key components but not for the whole system, particularly when installation cost, space, or weight is constrained. With the development of modern control theory, the analytical redundancy approach, whose schematic is illustrated in Figure 1, has been the mainstream of fault diagnosis research since the 1980s [4].
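The limit checking and majority voting mentioned above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names, the tolerance-based grouping, and the temperature readings are assumptions made for the example and do not come from the cited works.

```python
from collections import Counter


def limit_check(value, low, high):
    """Return True when a reading lies inside its allowed band."""
    return low <= value <= high


def majority_vote(readings, tolerance):
    """Pick the value that most redundant sensors agree on.

    Readings are grouped by rounding to multiples of `tolerance`
    (a simplification: two close values can still straddle a bin edge).
    The largest group wins and the remaining sensors are flagged.
    """
    bins = [round(r / tolerance) for r in readings]
    winner, votes = Counter(bins).most_common(1)[0]
    if votes <= len(readings) // 2:
        raise ValueError("no majority among redundant sensors")
    agreed = [r for r, b in zip(readings, bins) if b == winner]
    suspects = [i for i, b in enumerate(bins) if b != winner]
    return sum(agreed) / len(agreed), suspects


# Three redundant (hypothetical) temperature sensors; the third is stuck.
readings = [70.1, 70.2, 25.0]
value, suspects = majority_vote(readings, tolerance=1.0)
in_range = limit_check(value, low=20.0, high=90.0)
```

Here the voted value is the mean of the agreeing sensors, and `suspects` holds the index of the sensor that disagrees. The cost of this scheme is visible in the example itself: three physical sensors are needed to measure one quantity.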
For a controlled system, a fault diagnosis algorithm can be constructed from the input and output data to check whether they are consistent with the pre-existing knowledge of the healthy system, and thereby to detect actuator, process/component, and sensor faults. A diagnostic decision is then made using diagnostic logic. Analytical redundancy techniques are more cost-effective than hardware redundancy approaches, but they are also more challenging because of external disturbances, inherent modeling inaccuracy, and the complexity of the system dynamics and control structure.
In today's industrial environment, it is critical to monitor the health of machines. If machines fail, they can cause huge financial losses and even put the lives of the people who work with them at risk. Machine health monitoring has always been in demand because industrial equipment must be kept running at peak performance and reliability [5,6]. For rolling element bearings, health problems, such as faults of different diameters at various locations under different loads, may have a huge influence on the performance, stability, and service life of a rotating mechanism. Most often, a real-time vibration monitoring system is used while the rotating mechanism is operating [7,8].
Recent years have seen tremendous achievements in computer vision [9,10] and speech recognition [11,12] using deep learning approaches. The use of deep learning in machine health monitoring systems is not new. Lu et al. studied auto-encoders with three hidden layers in a comprehensive empirical investigation of fault detection for rotating machinery components [13]. Receptive input size, deep structure, sparsity constraint, and denoising operation were all examined in their research. Instead of feeding raw signals to auto-encoder (AE) models, some researchers used frequency-domain features as inputs. In [14], the frequency spectra of time-series data from rotating equipment were fed into a stacked auto-encoder (SAE), because frequency spectra may better discriminate the health conditions of a rotating machine. Zhu et al. [15] proposed an SAE-based deep neural network (DNN) for hydraulic pump fault diagnosis that uses Fourier transforms to produce frequency features. Liu et al. trained their DNN on the normalized short-time Fourier transform (STFT) of sound signals. Some studies [16,17] use SAE to fuse statistical features from multiple domains, such as the time, frequency, and time-frequency domains.
CNNs, initially proposed by LeCun [18], process data with a well-defined grid-like topology, such as 2D image data or 1D time series. CNNs have been very successful in image recognition in the last several years, and several architectures are available, including VGG-net, Res-net, and Inception v4. These image-oriented architectures, however, are not well suited to one-dimensional vibration signals [19]. In 2D, two successive 3 × 3 convolutional layers reach the same receptive field as a single 5 × 5 kernel with only 2 × 3 × 3 = 18 weights instead of 25. For a one-dimensional signal, two 3 × 1 convolutional layers also provide a 5 × 1 receptive field, but at the expense of 2 × 3 = 6 weights instead of 5, so the saving disappears. Inception v4 uses 1 × 7 and 7 × 1 convolutional layers; for 1D vibration signals, the 7 × 1 convolution is not possible [20,21]. It follows that standard 2D CNN models may not be as effective on 1D signals, and a CNN model designed for 1D vibration signals is required. CNNs have been used for fault diagnosis since 2015. Janssens et al. [22] diagnosed bearing health conditions with a CNN whose input is the DFT of the normalized vibration signal collected by two accelerometers placed perpendicular to one another and whose output covers four classification categories, including healthy bearing (HB), mildly inadequately lubricated bearing (MILB), and extremely inadequately lubricated bearing (EILB). The network includes a fully connected layer with 200 neurons. Structural damage can also be diagnosed and localized using 1D CNNs [23].
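The receptive-field and weight-count argument can be checked with a short calculation. The sketch below assumes stride-1 convolutions with a single input and output channel and no bias terms; the helper names are chosen for this example only.

```python
def stacked_receptive_field(kernel_sizes):
    """Receptive field (per axis) of stride-1 convolutions stacked in sequence."""
    field = 1
    for k in kernel_sizes:
        field += k - 1
    return field


def weights_2d(kernel_sizes):
    """Weights of square k x k kernels, one channel in and out, no bias."""
    return sum(k * k for k in kernel_sizes)


def weights_1d(kernel_sizes):
    """Weights of k x 1 kernels, one channel in and out, no bias."""
    return sum(kernel_sizes)


two_small = [3, 3]   # two stacked layers with kernel size 3
one_large = [5]      # a single layer with kernel size 5

same_field = stacked_receptive_field(two_small) == stacked_receptive_field(one_large)
saving_2d = weights_2d(one_large) - weights_2d(two_small)   # 25 - 18
saving_1d = weights_1d(one_large) - weights_1d(two_small)   # 5 - 6
```

Both stacks see five samples per axis, but the stacked small kernels save seven weights in 2D and cost one extra weight in 1D. This is why design rules borrowed from image CNNs do not carry over directly to 1D vibration signals.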
In that experiment, thirty accelerometers were installed at thirty joints on the main girders of a grandstand simulator, and an adaptive 1D CNN architecture was proposed to fuse the feature extraction and learning (damage detection) phases on the raw accelerometer data. With a five-layer WDCNN model that uses wide kernels in the first layer and small kernels in the subsequent layers, as described in [24], bearing faults can be diagnosed with high accuracy, even in noisy environments, provided that a significant quantity of training data is available.
Most deep learning studies in fault diagnosis aim to learn more from a smaller number of training samples and to obtain better results than typical machine learning algorithms. This is not consistent with real-world circumstances, since large volumes of data are easy to gather in industry and data augmentation methods [25][26][27] can further increase the number of training samples. With enough data, virtually all algorithms, classical or deep, can reach almost 100% accuracy. The ability of an algorithm to adapt to a real-world industrial setting therefore deserves more attention. In fault diagnosis, this adaptability can be measured in two ways. First, when the working load of a machine changes, a model trained under one load should still achieve high accuracy on data from another load [28,29]. Second, since noise is unavoidable, a model trained on noise-free samples should still attain high accuracy when tested on noisy data [30,31]. In noisy conditions, the performance of WDCNN declines quickly without AdaBN, which requires statistical knowledge of all test data. Even the state-of-the-art DNN [14] fails to diagnose adequately in noisy environments. Deep learning models therefore have not yet solved this issue.
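Testing a model trained on noise-free samples against noisy data requires a controlled way of corrupting the test signals. The text does not state which noise model the cited studies used; the sketch below assumes additive white Gaussian noise scaled to a chosen signal-to-noise ratio (SNR) in decibels, which is a common choice. The function names and the 50 Hz test tone are illustrative.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, rng):
    """Return the signal plus white Gaussian noise scaled to the requested SNR (dB)."""
    signal_power = sum(s * s for s in signal) / len(signal)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    noise_power = sum(n * n for n in noise) / len(noise)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / noise_power)
    return [s + scale * n for s, n in zip(signal, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB of a noisy record, given the clean reference."""
    signal_power = sum(c * c for c in clean)
    noise_power = sum((y - c) ** 2 for c, y in zip(clean, noisy))
    return 10 * math.log10(signal_power / noise_power)


rng = random.Random(0)
clean = [math.sin(2 * math.pi * 50 * t / 1000) for t in range(1000)]
noisy = add_noise_at_snr(clean, snr_db=0.0, rng=rng)   # noise as strong as the signal
```

Sweeping `snr_db` over a range of values and recording the accuracy at each level gives a simple robustness curve for a model trained on the clean data only.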

Transfer Learning
Rotating equipment relies on rolling element bearings. Bearing defects impair the machine's performance and can cause the whole rotary machine to break down, leading to significant property damage, downtime, and injuries [32][33][34]. A rolling element bearing consists of four subcomponents: the rolling elements, an inner race, an outer race, and a cage. Ninety percent of bearing failures originate in the inner or outer race, while only ten percent originate in the cage or rolling elements [35]. Fault detection for rolling element bearings, and bearing race fault diagnosis in particular, is therefore necessary to increase the safety and dependability of mechanical equipment. This work covers the identification of bearing race faults at the level of fault location, where the goal is to identify a single, isolated defect in one bearing subcomponent.
For autonomous fault identification in complicated industrial processes, data-driven approaches that start from historical data are better suited than well-developed model-based and signal-based techniques, which need prior knowledge of models or signal patterns [36,37]. Conventional data-driven fault detection commonly involves three steps: data collection, manual feature extraction and selection, and automated fault identification [38]. Although conventional data-driven approaches use machine learning algorithms such as the support vector machine (SVM), artificial neural network (ANN), and k-nearest neighbor (KNN) to automate diagnosis [39], feature extraction is still carried out by hand. Many feature extraction techniques have been proposed based on modern signal processing, including those based on entropy, spectral kurtosis (SK), and empirical mode decomposition (EMD) [40]. Despite the effectiveness of these methods, manual feature extraction is tedious and time-consuming [41]. Recent research focuses on end-to-end intelligent diagnosis based on deep learning techniques capable of hierarchical and nonlinear representation learning. These techniques include diagnosis methods based on convolutional neural networks (CNN), Restricted Boltzmann Machines (RBM), and the auto-encoder (AE) and its variants [42]. Both conventional machine learning and deep learning assume that the training and test sets follow the same distribution. As a result, many studies train and assess their diagnosis models on data for different health conditions that were collected with the same equipment under the same working conditions.
However, it is frequently challenging to obtain extensive historical fault data from the equipment to be diagnosed in advance in real-world diagnostic situations, particularly online diagnosis; only a limited number of fault data of specific equipment operating under specific circumstances are accessible [43,44]. The diagnostic model would be vulnerable to overfitting and challenging to generalize if it were developed with such a small dataset [45].
Approaches to the small sample problem can be roughly categorized into three groups: (1) generating new data from the existing data through data augmentation methods [46,47] or generative models [48,49] such as AE [49] and Generative Adversarial Networks (GAN) [49][50][51][52]; (2) relying on diagnosis knowledge learned from other datasets through transfer learning [53][54][55] or meta learning [56][57][58]; and (3) employing specific network structures or tricks. The approach discussed here follows the second motivation. A single-point fault on the race of a rolling element bearing usually produces periodic impacts, which appear in the bearing vibration signal as bearing characteristic frequencies (BCFs) in the frequency domain and as envelope characteristics in the time domain [59]. In actual vibration signals, however, these bearing-related fault characteristics are accompanied by numerous other complex signal components caused by the machine structure and by many random factors; these components are specific to the equipment and its working conditions. They enrich the content of the vibration signal and thereby contribute to the complex distribution of real data. When little in-service data is available, the learning algorithm cannot learn the effective bearing fault characteristics; it instead learns other attributes that merely separate the particular training dataset, which leads to a loss of generalization [60]. Physics-based modeling makes it easier to reveal, on theoretical grounds, the characteristics of a bearing under different operating conditions.
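The bearing characteristic frequencies referred to above follow from bearing geometry and shaft speed. The sketch below uses the standard kinematic formulas under the usual assumptions of a stationary outer race, a rotating inner race, and no slip; the function name and the example geometry (9 rolling elements, 8 mm ball diameter, 40 mm pitch diameter, 30 Hz shaft speed) are hypothetical values chosen for illustration.

```python
import math


def bearing_characteristic_frequencies(shaft_hz, n_balls, ball_d, pitch_d, contact_deg=0.0):
    """Kinematic fault frequencies in Hz (outer race fixed, inner race rotating)."""
    ratio = (ball_d / pitch_d) * math.cos(math.radians(contact_deg))
    return {
        # Ball pass frequency, outer race: rate at which balls cross an outer-race defect
        "BPFO": 0.5 * n_balls * shaft_hz * (1 - ratio),
        # Ball pass frequency, inner race
        "BPFI": 0.5 * n_balls * shaft_hz * (1 + ratio),
        # Ball spin frequency
        "BSF": 0.5 * (pitch_d / ball_d) * shaft_hz * (1 - ratio ** 2),
        # Fundamental train (cage) frequency
        "FTF": 0.5 * shaft_hz * (1 - ratio),
    }


# Hypothetical bearing: 9 balls, 8 mm ball diameter, 40 mm pitch diameter, 30 Hz shaft.
bcf = bearing_characteristic_frequencies(shaft_hz=30.0, n_balls=9, ball_d=8.0, pitch_d=40.0)
```

For this geometry the outer-race and inner-race frequencies are 108 Hz and 162 Hz, and they sum to the number of balls times the shaft frequency. Real bearings slip slightly, so measured peaks usually sit a little away from these kinematic values.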
In the bearing dynamic model, only bearing-related components are considered, and several random and nonlinear aspects are simplified. Bearing fault features are therefore substantially more evident in simulated signals than in real signals. In addition, by choosing the simulation parameters appropriately, it is straightforward to obtain sufficient vibration data for bearings of different types working under a range of conditions and failure modes. Motivated by this, diagnostic knowledge acquired from simulation data is used to help diagnose real-world cases [61].
To address the aforementioned gap, this study aims to provide a comprehensive assessment of DTL for fault diagnosis in industrial settings. First, in contrast to previous reviews that emphasized classical machine learning, deep learning, or transfer learning approaches to IFD, this review emphasizes the newer machine learning paradigm of DTL. Second, in contrast to prior review papers, it classifies DTL-based intelligent fault diagnosis (IFD) systems in the context of real industrial conditions. This categorization may help engineers identify the most suitable algorithms for their specific industrial needs. Finally, the bulk of previously published review articles focused on works released before 2020. As IFD attracts the interest of both academic and industry researchers, many new studies have been published recently, and most of the state-of-the-art DTL approaches available at the time of submission are included in this review.
The major contributions of this research are as follows:
• To give the reader an algorithmic grasp of DTL, this article introduces the fundamental concepts and theories of DTL, including instance-based DTL, model-based DTL, and feature-based DTL.
• This article discusses a crucial case of fault identification in which the diagnostic model is applied to previously unseen domains without knowledge of their data distributions. This setting is more appropriate for real-world diagnostic tasks than conventional data-driven methods.
• The well-designed approach can handle machinery fault diagnosis tasks efficiently. An experimental study demonstrates the significance and superiority of each generalization objective.

Objectives and Organization
Based on these results, it is important to carry out a full evaluation of these updated and improved fault detection and diagnosis (FDD) approaches to make them more useful, effective, and efficient.

• The primary focus of this study is the modeling of systems using both basic principles and signal form. In the FDD field, it is essential to understand the fundamentals of data-driven methodologies, to define the problems, and to offer perspectives.
• The second objective is to carry out a systematic review of the growing research effort on so-called data-driven FDD approaches for traction systems during the previous ten years, giving academics and practitioners a thorough understanding from both a theoretical and an application viewpoint.
• The third objective is to classify the surveyed work into three categories: fault detection datasets, deep learning based papers, and transfer learning based papers.
• With an emphasis on real-world applications and contemporary data analysis tools, the final objective is to identify research opportunities in data-driven FDD approaches for traction systems, together with research challenges and future prospects.
The rest of the paper is organized as shown in Figure 1, and Table 1 lists the abbreviations used.

Background Study of Deep Learning
Researchers have spent a lot of time and money trying to solve the problem of bearing fault diagnosis in rotating equipment. Knowledge-based approaches can solve the bearing diagnosis problem with high accuracy because they analyze sensor and actuator data efficiently. Deep learning (DL) is a family of machine learning algorithms whose benefit is that it gives precise diagnoses without relying on manual feature extraction. A DL-based model for the identification and classification of motor bearing faults is described in [62]. First, a signal-to-image conversion technique turns time-domain signals into images. A deep residual learning (DRL) network then learns an end-to-end mapping between the images and the motor bearing health state [62].
In this section, the relevant work is divided into two parts: knowledge-based fault diagnosis of bearings, and two subtopics related to the DRL network methodology, namely residual learning with skip connections and batch normalization.
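The exact signal-to-image conversion of [62] is not reproduced in this review. The sketch below shows one plausible variant that is often used for this purpose: a segment of size × size samples is min-max scaled to gray levels 0 to 255 and laid out row by row. The function name and the synthetic sine segment are assumptions made for the example.

```python
import math


def signal_to_gray_image(segment, size):
    """Map size*size samples to a size x size grid of gray levels in 0..255."""
    if len(segment) != size * size:
        raise ValueError("segment must contain size*size samples")
    low, high = min(segment), max(segment)
    span = (high - low) or 1.0   # avoid division by zero for a flat segment
    pixels = [round(255 * (s - low) / span) for s in segment]
    # Fill the image row by row: sample (row * size + col) becomes pixel (row, col).
    return [pixels[row * size:(row + 1) * size] for row in range(size)]


# A synthetic 64-sample segment stands in for a slice of a vibration record.
segment = [math.sin(2 * math.pi * 5 * i / 64) for i in range(64)]
image = signal_to_gray_image(segment, size=8)
```

Because the scaling is done per segment, the image keeps the shape of the waveform but discards its absolute amplitude, which may or may not be desirable for a given diagnosis task.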

Knowledge-Based Fault Diagnosis
Although fault diagnosis approaches can be grouped into model-based, signal-based, and knowledge-based methods, only knowledge-based studies are covered in this section because they capture the relationship between raw measured data and the diagnostic result. For complex industrial systems, the explicit models and signal patterns available for simple systems are hard to obtain [63,64].
CNN-based fault diagnosis was explored by Wen et al. [65]. Their CNN approach, based on LeNet-5, was applied to a typical dataset of raw motor bearing signals after the 1D raw signals were converted to 2D gray-level images. The results suggest that CNN-based methods for bearing fault detection have a bright future. Xia et al. [65] created a CNN-based model for bearing fault identification that uses 1D raw vibration data, and demonstrated the benefit of using multiple sensors instead of a single one. Lei et al. [43] used unsupervised feature learning to develop a two-stage learning system for diagnosing bearing faults: in the first stage, sparse filtering is applied to 1D raw vibration data, and in the second, softmax regression classifies the bearing health condition. Deep convolutional transfer learning networks for bearing fault diagnosis were proposed in [66]; a 1D CNN first learns discriminative features from the 1D raw signal and then classifies the bearing state. Wen et al. [67] explored the use of 1D raw bearing vibration data for fault diagnosis, extracting discriminative features from the raw data with a three-layer sparse auto-encoder. On a well-known motor bearing dataset, the method outperformed existing machine learning algorithms in classification accuracy. In another approach, feature learning with a deep stacked autoencoder is combined with a deep adversarial domain adaptation model to extract more useful fault features, and the reported classification accuracy is superior to that of other current machine learning and deep learning approaches. Janssens et al. [23] developed a CNN model that learns features from 1D raw bearing vibration data and monitors bearing health.
Guo et al. examined rolling bearing fault identification using a hierarchical deep CNN model. They introduced an adaptive learning rate for the model and found that it outperformed typical CNNs. Ding and He [7] transformed raw vibration data into 2D images and developed a bearing fault diagnosis approach based on deep convolutional networks. Eren [68] proposed a 1D CNN-based bearing fault detection method in which the network learns the relationship between bearing health and raw vibration signals. Chen et al. [69] proposed a DL-based method to enhance the classification of bearing faults: cyclic spectral analysis was used to characterize the different kinds of bearing faults, and the high-level feature learning capacity of a CNN model was then used to build the fault classifier. For bearing fault identification, Xu et al. [70] proposed a hybrid DL model that combines a CNN with a deep forest: a CNN processes time-frequency images of the vibration signal, and the extracted fault features are passed to the deep forest. Khorram et al. [71] examined end-to-end DL methods for bearing fault identification. Their model combines CNN and LSTM layers, uses raw vibration data directly without any preprocessing or transformation, and reports strong results.
A new DRL-based network model for bearing defect identification is provided in this paper. An end-to-end mapping approach, rather than hand-crafted features with restricted representation capabilities, is presented as an alternative. For the experiment to be more reliable, the effects of the number of residual blocks and the BN layer are examined.

Residual Learning and Skip Connection, and Batch Normalization
The techniques of residual learning and skip connection, as well as batch normalization, are briefly discussed in the following sections.
Residual learning and skip connection: As additional layers are stacked to address increasingly complicated problems, the model's training accuracy can degrade, and the deeper the network, the more difficult it is to train. The vanishing/exploding gradient issue during backpropagation also makes it difficult for a deep CNN to converge. For these reasons, He et al. [21] developed a residual learning strategy for CNN structures. Instead of learning the desired mapping H(x) directly, the stacked layers learn the residual mapping F(x) = H(x) − x, that is, the difference between the desired mapping and the layers' input. The residual mapping is realized with shortcut (skip) connections that bypass one or more layers: the outputs of the stacked layers and of the shortcut connection are added element by element [72]. A network with these connections is simpler to optimize, and its classification performance improves as a result. Residual learning with identity skip connections is used in the design of the model discussed here, in order to address the knowledge-based fault diagnosis problem, which had not previously been tackled with residual learning.
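The element-wise addition of the shortcut and the stacked-layer output can be illustrated with a toy residual block. This is a minimal sketch in plain Python: fully connected layers stand in for the convolutional layers of a real residual network, and all names and weights are chosen for the example rather than taken from the cited works.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def linear(v, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def residual_block(x, w1, b1, w2, b2):
    """Compute ReLU(F(x) + x), where F is two dense layers with a ReLU between them."""
    residual = linear(relu(linear(x, w1, b1)), w2, b2)    # F(x), the learned residual
    return relu([f + xi for f, xi in zip(residual, x)])   # identity shortcut adds x back


x = [1.0, 2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
no_bias = [0.0] * 3

# With all weights at zero, F(x) = 0 and the block passes its input through unchanged.
identity_out = residual_block(x, zeros, no_bias, zeros, no_bias)
```

The zero-weight case shows why such blocks are easy to optimize: a block can fall back to the identity mapping simply by driving its weights toward zero, so adding blocks should not make the training error worse.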
Batch normalization: Mini-batch stochastic gradient descent (SGD) is widely used to minimize the objective function of deep CNN models with respect to their parameters. SGD is simple and effective, but the initial parameter values and the learning rate must be chosen carefully. Because the inputs of each layer keep changing during training as the parameters of the preceding layers change, every layer must continually adapt to a new input distribution. This problem, known as internal covariate shift, can be addressed by inserting a BN layer between the convolutional layers and the nonlinearities. As far as we can tell, BN had not previously been studied for knowledge-based fault diagnosis. The results of this investigation show that batch normalization improves classification accuracy, in line with earlier reports.
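The normalization step performed by a BN layer during training can be written out directly. The sketch below covers only the training-time forward pass on a mini-batch of feature vectors; at inference time, running averages of the mean and variance are normally used instead. The function name and the toy batch are assumptions made for this example.

```python
import math


def batch_norm(batch, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale by gamma and shift by beta."""
    n = len(batch)
    n_features = len(batch[0])
    out = [[0.0] * n_features for _ in range(n)]
    for j in range(n_features):
        column = [row[j] for row in batch]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n   # biased variance over the batch
        for i, v in enumerate(column):
            out[i][j] = gamma[j] * (v - mean) / math.sqrt(var + eps) + beta[j]
    return out


# Two features on very different scales, four samples in the mini-batch.
batch = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0], [4.0, 400.0]]
normalized = batch_norm(batch, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

After normalization both features have approximately zero mean and unit variance over the batch, regardless of their original scale. The learnable `gamma` and `beta` let the network undo the normalization where that is useful.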
Permanent magnet synchronous motors (PMSMs) have been deployed in a wide range of industries in recent years, including manufacturing, electric cars, and power generation. In these applications, a mechanical transmission often connects the PMSM to the driven load. Because it transfers energy from the PMSM to the load, the transmission plays a critical role and is vulnerable to failure. Transmission faults may have serious consequences, ranging from lost productivity to safety hazards. There is therefore a need for monitoring and early fault detection and isolation (FDI) of PMSM transmissions to prevent failures and minimize downtime.
Because a transmission fault creates distinctive frequencies that modulate the vibration spectrum, vibration-based diagnosis is widely used. Vibrations are non-stationary and time-varying, depending on the kinematics and arrangement of the gears. For this reason, signal processing techniques such as Fourier series, the short-time Fourier transform, and the wavelet transform are often applied to vibration signals. Vibration-based diagnosis, however, is costly: continuous condition monitoring needs specialized accelerometers and data acquisition systems. Because the accelerometers must be mounted at specific locations, the vibration signal may be affected by outside disturbances, and in many locations and drive-train layouts the accelerometers cannot be installed at all.
Mathematical models that exploit the relationship between the input voltage signal and the output current signal can be used to better understand PMSM behavior. The voltage signal is fed into the mathematical model to compute an estimated current signal, and the estimated current is subtracted from the measured current to obtain the residual current. In this way, the transmission fault signature is preserved in the residual current signal, while harmonics and noise from the voltage and any compensation from the controller and inverter are removed. In contrast to typical residual signal analysis, the residual current signal is transformed here from the time domain to the frequency domain, and the fault signatures are observed in the residual current spectrum rather than in the time-domain residual current. Transmission faults act as disturbances on the rotating PMSM: a fault in the transmission disturbs the rotation at a characteristic frequency and may affect the PMSM rotation harmonics. Fault signatures are easier to detect in the frequency domain than in the time domain. In contrast to motor current signature analysis (MCSA), the model-based technique uses the residual current spectrum rather than the measured current spectrum.
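The benefit of analysing the residual current spectrum instead of the measured current spectrum can be seen in a toy calculation. In the sketch below, a pure sinusoid stands in for the output of the PMSM model, the model is assumed to be perfect, and the fault is represented as a small additive component; the supply frequency of 50 Hz, the fault frequency of 12 Hz, and the amplitudes are all hypothetical. A naive discrete Fourier transform is used so that the example needs only the standard library.

```python
import cmath
import math


def dft_magnitude(signal):
    """Single-sided magnitude spectrum via a naive O(N^2) DFT (fine for short records)."""
    n = len(signal)
    mags = []
    for k in range(n // 2):
        acc = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc) / n)
    return mags


fs, n = 256, 256                      # 1 s record, so DFT bin k corresponds to k Hz
supply_hz, fault_hz = 50, 12          # hypothetical supply and fault frequencies
t = [i / fs for i in range(n)]

# Stand-in for the model-estimated current (assumed to match the healthy motor exactly).
model_current = [math.sin(2 * math.pi * supply_hz * ti) for ti in t]
# Measured current = healthy current plus a weak fault-related component.
measured_current = [m + 0.05 * math.sin(2 * math.pi * fault_hz * ti)
                    for m, ti in zip(model_current, t)]

residual = [y - m for y, m in zip(measured_current, model_current)]

measured_spectrum = dft_magnitude(measured_current)
residual_spectrum = dft_magnitude(residual)
measured_peak_hz = max(range(len(measured_spectrum)), key=measured_spectrum.__getitem__)
residual_peak_hz = max(range(len(residual_spectrum)), key=residual_spectrum.__getitem__)
```

In the measured spectrum the dominant peak is the supply component, which is twenty times larger than the fault component; in the residual spectrum the fault frequency is the dominant peak. With a real, imperfect model the supply component would be attenuated rather than removed completely.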

Background Study of Transfer Learning
Deep Transfer Learning (DTL) is a machine learning approach that improves performance by combining the feature representation of Deep Learning (DL) with the knowledge transfer of Transfer Learning (TL). In the area of Intelligent Fault Diagnosis (IFD), DTL techniques have been the focus of much study and development, aimed at increasing the effectiveness, dependability, and robustness of DL-based fault diagnosis techniques. IFD has been the focus of several thorough reviews, although the majority of them have concentrated on algorithms rather than on how IFD is actually employed, and DTL-based IFD has not yet been reviewed comprehensively. A full review of the relevant publications on DTL-based IFD is therefore needed to help readers understand the current state of the art and devise feasible solutions to IFD problems in the real world. The theoretical foundations of DTL are addressed briefly before showing how transfer learning methods may be incorporated into deep learning models. Significant DTL applications and current advances in IFD are then studied in depth, followed by recommendations for implementing DTL algorithms in real-world applications and a discussion of potential challenges. We anticipate that the research reported in this article will inspire and serve as a resource for future researchers working to improve IFD [73].

Deep Transfer Learning
Industry 4.0, the fourth industrial revolution, is now a part of production. The goal of Industry 4.0 is for machines to have exact self-awareness, make decisions on their own, and communicate with each other in an intelligent way throughout the manufacturing process [74][75][76]. Industrial equipment (IE) has been focusing on providing economic advantages including quality enhancement, efficiency improvement, energy conservation, and cost reduction in response to such developments and revolutions. The IE, on the other hand, is often requested to do Herculean duties while working in a difficult operational environment and delivering long-term services [77]. Health status monitoring and diagnosis of industrial environments (IEs) are critical to the safety and dependability of the industrial environment because it reduces downtime, formulates planned maintenance and increases economic advantages. When it comes to accurately diagnosing faults for IE in time, the production process complexity and dynamicity makes this a huge difficulty.
Over the last several decades, academic and industrial researchers have devoted increasing attention to timely and accurate IFD, which many governments and organizations have cited as a major priority. In response, academics and engineers have created a wide range of intelligent algorithms, including deep learning and transfer learning, to solve a variety of practical problems in industrial settings, and have made significant progress in the IFD of IE.
Deep learning is the subfield of machine learning that artificial intelligence (AI) researchers use to extract higher-level representations from unstructured input data such as pixel-based images, audio files, and text [78]. Its methods include deep neural networks, deep belief networks, recurrent neural networks, convolutional neural networks, and graph neural networks (GNNs). Owing to deep learning's advantages in massive data processing, discriminative feature learning, and efficient pattern recognition, building intelligent models that map industrial data to IE health conditions end-to-end has proven promising in many manufacturing applications [79]. However, deep learning has limitations that hinder its progress and its application in increasingly challenging real-world circumstances. In particular, it rests on two idealized assumptions:
• Deep learning requires many labeled samples to train a model. It relies mostly on learning directly from labeled examples, and without enormous amounts of labeled training data the algorithms are prone to overfitting and generalize poorly.
• Deep learning requires the training and test data to follow the same distribution. If a model is trained on data that does not match the distribution it is deployed on, its performance will likely degrade sharply or fail altogether.
Transfer learning, a branch of machine learning that focuses on extracting common knowledge from one or more related but different application scenarios, has emerged as a viable way to help deep learning overcome these limitations. Much as people can use a handful of examples or past experience to handle unfamiliar situations, transfer learning can improve an AI model's learning performance even when training data is limited and sparse [46]. It can also give an AI model strong generalization performance when moving from similar but distinct application contexts to a new scenario. Traditional machine learning algorithms, however, may struggle to learn and retain the discriminative representations that transfer learning requires.
In recent years, a new machine learning paradigm has emerged that uses deep learning technologies to transfer knowledge: deep transfer learning (DTL). This paradigm can discover complicated patterns and acquire transferable knowledge more quickly. DTL is also well suited to real-world industrial applications because it integrates easily with the well-developed deep learning models already used for the IFD of IE [80].
This study focuses on DTL for fault diagnosis in manufacturing environments and aims to give readers interested in contributing to the development of IFD a thorough understanding of the technology. Fault diagnosis has been the subject of a number of systematic reviews [20]. The literature study in [21] concentrated heavily on traditional fault diagnosis and prognosis methods and does not represent current methodologies. Yan and Gao compiled the most popular deep learning models used for machine health monitoring prior to 2019, including SAE, DBN, DBM, CNN, and RNN, and provided the data and code needed to replicate the results [81]. Lei and Nandi [82] compiled an overview and road map for machine learning-based IFD approaches, dividing the evolution of IFD methods into three phases: classical machine learning, deep learning, and the future (transfer learning). There are also several related papers on deep learning-based [83], transfer learning-based, and convolutional neural network-based [55] approaches to machine fault diagnosis, which are not covered here. An overview of the relevant DTL articles is therefore needed to help readers quickly grasp the most recent concepts in IFD and devise practical solutions to common problems.

Evaluation of Selected Studies on Fault Detection
This section is divided into two categories: deep learning based papers and transfer learning based papers.

Deep Learning Based Paper
Ref. [5] notes that, because industrial systems are becoming increasingly complex and expensive, it is increasingly important to detect and identify potential abnormalities or faults as early as possible and to implement real-time fault-tolerant operation in order to minimize performance degradation and avoid dangerous situations. Over the past four decades, researchers have made significant progress in fault diagnosis and fault-tolerant control. This three-part survey focuses on the last decade's results in real-time fault diagnosis and fault-tolerant control, and the first part of the paper examines the methods and applications of model-based and signal-based fault diagnosis in depth.
Ref. [84] offers a novel deep learning approach with the following contributions. First, it presents an end-to-end technique that accepts raw temporal signals as input and so needs no time-consuming denoising preprocessing; the model attains a reasonable degree of accuracy in a noisy setting. Second, the model is independent of domain adaptation algorithms and does not need domain-specific knowledge; it attains high accuracy regardless of the operating load. Finally, the learned features are visualized in an attempt to explain the model's strong performance.
Ref. [85] describes a model-based method for fault detection and identification (FDI) in steady-state permanent magnet synchronous motor (PMSM) drive transmissions. The architecture employs a PMSM state-space model and an approximate transmission model, and a Recursive Least-Squares (RLS) algorithm is used to build regression models for parameter estimation. FDI is assessed through residual current spectrum thresholding, the characteristic frequency and magnitude of the fault, and parameter clustering. Transmissions in three distinct fault states are examined in the trials. As an initial effort in condition monitoring of PMSM drive transmissions, the findings indicate a promising strategy that takes both the residual current spectrum and the parameter clusters into account and achieves an acceptable level of decision-making in recognizing and diagnosing faulty conditions.
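The recursive least-squares step mentioned for [85] can be illustrated with a minimal sketch. This is not the estimator used in [85]: the generic linear model y ≈ θ·x, the forgetting factor `lam`, the initial covariance scale `p0`, and all function names are assumptions chosen only to show how RLS refines parameter estimates one sample at a time.

```python
def rls_update(theta, P, x, y, lam=0.99):
    """One recursive least-squares step for the linear model y ~ theta . x.

    theta: current parameter estimates, P: current (symmetric) covariance matrix,
    x: regressor vector, y: measured output, lam: forgetting factor (assumed value).
    """
    n = len(theta)
    # Px = P @ x
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(x[i] * Px[i] for i in range(n))
    gain = [Px[i] / denom for i in range(n)]
    # Prediction error with the current estimates
    err = y - sum(theta[i] * x[i] for i in range(n))
    theta_new = [theta[i] + gain[i] * err for i in range(n)]
    # P <- (P - gain * x^T P) / lam; x^T P equals Px because P is symmetric
    P_new = [[(P[i][j] - gain[i] * Px[j]) / lam for j in range(n)] for i in range(n)]
    return theta_new, P_new


def rls_fit(samples, n_params, lam=0.99, p0=1e4):
    """Run RLS over (x, y) samples, starting from theta = 0 and P = p0 * I."""
    theta = [0.0] * n_params
    P = [[p0 if i == j else 0.0 for j in range(n_params)] for i in range(n_params)]
    for x, y in samples:
        theta, P = rls_update(theta, P, x, y, lam)
    return theta
```

In a monitoring setting, a drift of the estimated parameters away from their healthy values is one possible fault indicator; how [85] clusters its parameters is not reproduced here.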
Ref. [86] observes that deep learning has made remarkable strides in fault diagnosis, but that the operating conditions of an axial piston pump vary and the data distribution is not uniform, which renders most deep learning models ineffective. To address this issue, a deep multi-signal fusion adversarial model based on transfer learning (MFAN) is proposed. A multi-signal fusion module assigns weights to the vibration and acoustic signals, which increases the method's capacity for dynamic adjustment. In addition, a residual network is incorporated into the shared feature generation module to capture rich feature information. Nine transfer scenarios are constructed from the various operating loads of the axial piston pump, and the proposed approach is compared with five common diagnostic techniques.
Ref. [87] continues the authors' prior work on constructing induction motor inter-turn fault (ITF) models (Part I). Based on the fault model developed in Part I, it presents an online ITF diagnostic technique for induction motors that uses the negative sequence current as the fault signal. The ITF model is used to analyze the relationships between fault parameters, negative sequence current, and fault copper loss. The findings show that the fault severity index, which is a function of the fault parameters, is directly related to the negative sequence current and the copper loss. The proposed model-based approach therefore estimates the fault severity index from the negative sequence current and identifies the ITF. With the estimated fault severity index, the fault copper loss caused by the ITF, which drives thermal degradation, can be computed. Finally, tests under several fault settings validate the proposed diagnostic approach.
Ref. [88] notes that, owing to the efficient processing of acquired sensor and actuator data, knowledge-based approaches can offer a highly accurate solution to the bearing diagnosis problem. Among machine learning algorithms, deep learning (DL) has the benefit of requiring no manual feature extraction while still offering accurate diagnosis. The study describes a new DL-based model for fault detection and classification of motor bearings. Time-domain signals are first transformed into images using a signal-to-image conversion technique. The resulting grayscale images are then fed into a deep residual learning (DRL) network designed to learn an end-to-end mapping between the images and the motor bearing health state. The effectiveness of the proposed DRL network is tested on the widely used vibration dataset provided by Case Western Reserve University (CWRU).
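To make the signal-to-image step concrete, the following minimal sketch shows one common way to turn a time-domain segment into a grayscale image: min-max scaling to the range 0 to 255 followed by row-wise reshaping. The exact conversion used in [88] is not specified here, so the function name `signal_to_image`, the square image shape, and the scaling rule are assumptions for illustration.

```python
def signal_to_image(signal, size):
    """Map the first size*size samples of a 1D signal to a size x size grayscale image.

    Each sample is min-max scaled to an integer pixel value in [0, 255] and the
    samples are laid out row by row (an assumed, commonly used layout).
    """
    n = size * size
    if len(signal) < n:
        raise ValueError("signal is shorter than size * size samples")
    segment = signal[:n]
    lo, hi = min(segment), max(segment)
    span = hi - lo
    if span == 0:
        # A constant segment carries no contrast; return an all-black image
        pixels = [0] * n
    else:
        pixels = [round((v - lo) / span * 255) for v in segment]
    return [pixels[r * size:(r + 1) * size] for r in range(size)]
```

An image produced this way can be passed to any 2D convolutional network; a longer vibration record would typically be cut into several such segments, each giving one training image.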
Ref. [89] gives interested readers an overview of several model-based FD approaches developed for linear time-invariant (LTI) systems so that they can learn about current advances in this area. Model-based FD approaches for LTI systems are categorized as parameter estimation techniques, parity space-based techniques, and observer-based techniques. The history, current advances, and practical applications of each category in relation to fault detection are explored.
Ref. [90] observes that existing fault diagnosis (FD) techniques cannot be reused across different types of multiphase machines because their stator phase layouts vary. It presents an enhanced machine-learning-based FD approach with adaptive secondary sample filtering for multiphase drive systems in the age of big data and artificial intelligence. Experimental findings on both five-phase and six-phase motor drive platforms demonstrate the method's generalizability as well as its accuracy and robustness.

Transfer Learning Based Paper
In this section, we try to identify the research gap for transfer learning. The framework proposed in [2] uses convolutional neural networks (CNNs) and parameter transfer techniques to carry diagnostic knowledge from large and varied simulation data, generated by a dynamic model of bearings, over to real-world conditions. The usefulness of the technique is shown through a comprehensive analysis of three fault diagnosis cases. The results show that, based on simulation data and parameter transfer in the CNN, the proposed method can learn more transferable features and reduce the discrepancy between feature distributions, which considerably improves fault diagnosis performance.
Ref. [3] argues that a thorough analysis of the pertinent DTL-based IFD publications is needed to help readers quickly understand the most advanced techniques and find solutions to practical IFD problems. It briefly presents the theoretical foundations of deep transfer learning (DTL) to show how transfer learning methods can be used with deep learning models, and then covers significant DTL applications and current IFD advances in depth. It also discusses how DTL algorithms can be applied in real applications, along with possible problems, before giving the survey's final findings. Its authors expect the study to assist and inspire researchers who wish to advance IFD.
Ref. [4] combines several partial distribution adaptation sub-networks (PDA-Subnets) with a multi-source diagnostic knowledge fusion module to form a multi-source transfer learning network (MSTLN), which aggregates diagnostic knowledge from multiple sources. The PDA-Subnets jointly adapt the partial distributions of source-target pairs and weight target samples with counter-balancing factors to reduce the adverse effects of discrepancies across the source machines, while the fusion module further fuses the diagnostic decisions of the individual PDA-Subnets. According to two case studies, MSTLN reduces misdiagnosis rates for imbalanced target samples and delivers better transfer performance than standard approaches.
The primary objective of [91] was to use pre-trained deep transfer learning (DTL) structures together with standard machine learning (ML) models as an automated method for diagnosing Parkinson's disease (PD) from sEMG data. To produce discriminative feature vectors, the features extracted by three pre-trained deep architectures, AlexNet, VGG-f, and CaffeNet, were first stacked. Despite the very large number of stacked features, selecting the right features overcomes over-fitting and improves robustness to varying amounts of added noise. To reduce the size of the extracted feature set, a soft combination of subset feature selection approaches, including receiver operating characteristic (ROC), entropy, and signal-to-noise ratio (SNR) criteria, was presented. Finally, PD was identified using a support vector machine (SVM) with a radial basis function (RBF) kernel.
Ref. [92] explains the design process of a discrete time-series convolutional neural network (DTCNN) and offers a hierarchical technique for transformer rectifier unit (TRU) fault detection, together with a transfer learning-based fault diagnosis method as an alternative to training new models for different TRUs. First, the DTCNN structure is determined. Then, the performance of the HDCNN is verified. On this basis, the requirements for a suitable source dataset for TRU fault diagnosis and the transfer layers of the pre-trained HDCNN are described. Comparisons of different algorithms under different noise settings show that transfer learning is a good way to build a diagnostic network for similar equipment and often leads to better performance.
In the framework proposed in [93], a convolutional autoencoder (CAE) is used as the feature extractor because of its noise reduction capability. In addition, CORrelation ALignment (CORAL) loss and domain classification loss are combined to strengthen the effect of domain confusion. The resulting model, CAE-DTLN, is applied to the transfer fault diagnosis of planetary gearboxes operating under various working loads and noise levels and is compared with other conventional transfer fault diagnosis models. The experimental findings show that CAE-DTLN has higher diagnostic accuracy and better generalizability, with diagnostic accuracy typically above 99 percent, and superior anti-noise performance.
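The CORAL term mentioned above can be sketched as follows. The sketch uses the commonly cited form of the loss, the squared Frobenius norm of the difference between source and target feature covariance matrices divided by 4d², where d is the feature dimension. It covers only the CORAL term, not the domain classification loss or the CAE, and the function names and plain-list feature format are assumptions rather than details taken from [93].

```python
def covariance(features):
    """Unbiased covariance matrix of a batch of feature vectors (rows are samples)."""
    n = len(features)
    d = len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in features]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def coral_loss(source, target):
    """CORAL loss: ||C_s - C_t||_F^2 / (4 * d^2) for source and target feature batches."""
    d = len(source[0])
    cs = covariance(source)
    ct = covariance(target)
    squared_frobenius = sum((cs[i][j] - ct[i][j]) ** 2
                            for i in range(d) for j in range(d))
    return squared_frobenius / (4 * d * d)
```

The loss is zero when both domains have identical second-order statistics and grows as their covariances diverge, which is why minimizing it alongside a classification loss pushes a feature extractor toward domain-invariant features.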
Ref. [94] provides a transfer learning-based technique for fault diagnosis in building chillers. Experiments on two water-cooled screw chillers captured both faulty and fault-free data. Different transfer learning tasks, numbers of training instances, learning scenarios, and transfer learning implementation strategies were considered in the fault diagnosis experiments. The findings demonstrate the value of transfer learning for fault detection and diagnosis (FDD) in building energy systems, particularly when little experimental data is available for model development. The largest accuracy improvements for the two learning tasks were 12.63 and 8.18 percent. The study's results can help in designing transfer learning-based FDD solutions for building energy systems.
Ref. [95] proposes a fault detection approach for wind turbines with small-scale data, based on parameter-based transfer learning and a convolutional autoencoder (CAE), in order to make the most of the scarce data available. The approach can transfer knowledge from similar wind turbines to the target wind turbine. Its performance is evaluated against transfer and non-transfer methods, and the comparison indicates that it is superior at identifying faults from small-scale wind turbine data.
Ref. [96] addresses transfer diagnosis with insufficient target data. The main idea is to pair source and target data that share the same machine condition and to perform individual domain adaptation, which mitigates the lack of target data, reduces the distribution discrepancy, and prevents negative transfer, given that sparse data describes the data distribution only vaguely. The network can also effectively handle mismatched label spaces. The method has been thoroughly tested on two case studies that take all transfer scenarios into account, such as different working conditions and different machines, and comprehensive testing shows that it outperforms conventional transfer learning techniques.
Ref. [97] notes that transfer learning has recently attracted growing interest in equipment fault diagnosis because of its high generalizability across industrial contexts. Existing techniques typically assume identical label spaces and propose reducing the marginal distribution difference between the source and target domains. This assumption often does not hold in real-world industries, where the testing data may cover only a subset of the source label space. It is therefore reasonable to transfer diagnostic knowledge from a broad source domain to a target domain with a limited set of machine conditions. In [97], a deep learning-based domain adaptation strategy is used to tackle this difficult partial transfer learning problem. A class-weighted adversarial neural network is proposed to promote positive transfer of the shared classes and ignore source outliers. Experimental findings on two rotating machinery datasets indicate that the approach is promising for partial transfer learning.
Ref. [98] examines the latest developments in deep transfer learning for machine fault diagnosis. It presents several deep transfer architectures and the associated theory, and it summarizes, categorizes, and explains a number of relevant works. On this basis, the article discusses the key achievements, open problems, and prospective research topics of deep transfer learning, and it gives explicit guidelines for the architecture selection, design, and implementation of deep transfer learning for machine fault diagnosis.
Ref. [99] introduces a novel transfer learning technique called improved joint distribution adaptation (IJDA), which aims to align the marginal and conditional distributions of datasets more accurately. On this basis, a fault diagnosis approach for vibration signals that is robust to working conditions is developed; it is described as consisting of three components. First, a novel data augmentation technique, which uses noise to improve network performance, is created to provide more relevant samples for imbalanced vibration signals. Second, the input dimension of IJDA is reduced via sparse filtering (SF).
Ref. [100] provides a reliable fault diagnosis technique based on acoustic spectrum imaging (ASI) of acoustic emission (AE) signals as an accurate indicator of health condition. Transfer learning is used to share knowledge with convolutional neural networks (CNNs) for accurate diagnosis under different operating conditions. ASI converts the amplitudes of the spectral components of the windowed time-domain AE signal into a spectrum image, giving a visual representation of the spectral properties of the acoustic emission. This yields improved spectrum images for transfer learning (TL) training and testing, as well as a robust classifier with excellent diagnostic accuracy.
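The idea of turning windowed spectral amplitudes into an image can be sketched as below. This is only an illustration of the general principle, not the ASI procedure of [100]: the non-overlapping frames, the Hann window, the direct DFT, the global scaling to 0 to 255, and the names `dft_magnitudes` and `spectrum_image` are all assumptions.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of the first half of the discrete Fourier transform of a frame."""
    n = len(frame)
    mags = []
    for k in range(n // 2):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def spectrum_image(signal, frame_len):
    """Build a grayscale spectrum image: one row of spectral amplitudes per frame.

    The signal is cut into non-overlapping frames, each frame is Hann-windowed,
    and all amplitudes are scaled jointly to integer pixel values in [0, 255].
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / frame_len) for t in range(frame_len)]
    rows = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = [signal[start + t] * window[t] for t in range(frame_len)]
        rows.append(dft_magnitudes(frame))
    peak = max((m for row in rows for m in row), default=0.0)
    if peak == 0:
        return [[0 for _ in row] for row in rows]
    return [[round(m / peak * 255) for m in row] for row in rows]
```

For a pure tone, every row of the resulting image has its brightest pixel at the frequency bin of that tone; a fault that adds energy at a characteristic frequency would show up as an additional bright column, which is the kind of pattern a CNN can learn from.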
Ref. [101] provides a new taxonomy and investigates unsupervised deep transfer learning (UDTL)-based IFD in the context of various workloads. A comparative investigation of a number of popular methods and datasets reveals open and essential issues in UDTL-based IFD that are seldom explored, including the transferability of features, the influence of backbones, negative transfer, and physical priors. To facilitate future studies and stress the importance and reproducibility of UDTL-based IFD, the whole test framework will be made available to the research community. The framework and comparative study may serve as an extensible interface and a basis for future work on UDTL-based IFD.
Ref. [102] describes a two-phase digital-twin-assisted fault diagnosis approach based on deep transfer learning (DFDD) that enables fault diagnosis in both the development and maintenance stages. In the first phase, front-running the high-fidelity model in the virtual space reveals possible issues that were not considered during design, while a deep neural network (DNN)-based diagnostic model is fully trained. In the second phase, deep transfer learning migrates the previously trained diagnostic model from the virtual space to the physical space for real-time monitoring and predictive maintenance. This ensures diagnostic accuracy and avoids wasting time and knowledge.
Ref. [103] proposes a distance metric called polynomial kernel induced maximum mean discrepancy (PK-MMD) to overcome the drawbacks of existing distance metrics. Building on PK-MMD, a framework for reusing diagnostic knowledge from one machine on another is developed. The proposed approaches are validated on two transfer learning cases in which the health states of locomotive bearings are identified using data from laboratory motor bearings and gearbox bearings.
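A maximum mean discrepancy computed with a polynomial kernel can be sketched in a few lines. The sketch uses the standard biased estimate of the squared MMD with the kernel k(x, y) = (a·⟨x, y⟩ + b)^degree; the kernel parameters `a`, `b`, and `degree`, their default values, and the function names are assumptions, and the exact formulation in [103] may differ.

```python
def polynomial_kernel(x, y, a=1.0, b=1.0, degree=2):
    """Polynomial kernel k(x, y) = (a * <x, y> + b) ** degree (parameter values assumed)."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    return (a * dot + b) ** degree


def mean_kernel(xs, ys, **kernel_params):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(polynomial_kernel(x, y, **kernel_params) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def pk_mmd_squared(source, target, **kernel_params):
    """Biased estimate of the squared MMD between two feature sets."""
    return (mean_kernel(source, source, **kernel_params)
            + mean_kernel(target, target, **kernel_params)
            - 2.0 * mean_kernel(source, target, **kernel_params))
```

Used as a training loss on features from two machines, a value near zero indicates that the source and target feature distributions are closely matched under the chosen kernel.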
We also discuss recent datasets for fault detection, which are described below.
Ref. [104] presents the "ToyADMOS" dataset, which is designed for anomaly detection in machine operating sounds (ADMOS). According to its authors, no large-scale datasets were available for ADMOS, even though large-scale datasets have contributed to recent breakthroughs in acoustic signal processing, because anomalous sound data is difficult to obtain. To collect anomalous operating sounds for the dataset, miniature machines (toys) were deliberately damaged. The publicly available dataset includes three sub-datasets: machine-condition inspection, fault diagnosis of machines with geometrically fixed tasks, and fault diagnosis of machines with moving tasks.
Ref. [105] provides a novel sound dataset for malfunctioning industrial machine investigation and inspection (the MIMII dataset). Normal sounds were recorded for different kinds of industrial machinery (e.g., valves, pumps, fans, and slide rails), and, to resemble a real-world situation, numerous anomalous sounds were also recorded (e.g., contamination, leakage, rotational imbalance, and rail damage). The MIMII dataset is made available to help the machine learning and signal processing communities develop automated facility maintenance.
Ref. [106] offers the large-scale "ToyADMOS2" dataset for anomaly detection in machine operating sounds (ADMOS). As with the earlier ToyADMOS dataset, a large number of operating sounds of miniature machines (toys) were collected under normal and anomalous conditions by deliberately damaging them, this time with a controlled depth of damage in the anomalous samples. The ToyADMOS2 dataset is designed for testing systems under domain-shift conditions, since most ADMOS application scenarios demand robust performance under such conditions. The publicly available dataset includes two sub-datasets for machine-condition inspection: fault diagnosis of machines with geometrically fixed tasks and fault diagnosis of machines with moving tasks.
Ref. [107] provides MIMII DUE, a new dataset for investigating and inspecting malfunctioning industrial machines under domain shifts caused by operational and environmental changes. Standard techniques for recognizing anomalous sounds run into practical difficulties because the feature distribution changes between the training and operational phases (referred to as domain shift) as a result of several real-world factors. Until now, no available dataset contained real domain shifts, which made it impossible to assess robustness against them. The new dataset contains normal and anomalous operating sounds of five different kinds of industrial machinery under two different operational and environmental conditions, corresponding to the source domain and the target domain, respectively.
Ref. [108] treats denoising as a critical stage in defect identification and condition monitoring (CM), because background noise makes faults hard to discover. In that study, background noise is removed efficiently, and the reliability of the defect identification method is increased, by applying Singular Value Decomposition (SVD) and Hankel matrix based denoising to the time-domain vibration signals and spectra of ball bearings.
Ref. [109] presents and discusses the empirical mode decomposition (EMD) method, together with the problem of mode mixing and its primary causes. Ensemble EMD (EEMD) and masking EMD (M-EMD) are described as remedies for EMD. To address mode mixing, both improved procedures were used to analyze an EEG signal and their results were compared. The M-EMD yielded the best results, successfully splitting the EEG signal into delta, theta, alpha, and beta bands [110].
In our own work, we have used the Hilbert transform for 1D to 2D conversion and applied transfer learning with a DCNN-LSTM architecture on the Machinery Failure Prevention Technology (MFPT) bearing vibration signals dataset. We have also converted 1D signals to 2D texture images, extracted features with a Gabor filter and SVD, and classified them with a support vector machine (SVM), using a vibration signal dataset for validation [111]. We used 1D to 2D conversion for texture generation, image segmentation, and a Gabor filter for feature extraction on an acoustic signal dataset [112]. We applied the Global Neighborhood Structure (GNS) for feature extraction with a multiclass SVM on vibration signals [113], used a DNS map for feature extraction with a multiclass SVM classifier on vibration signals [114], and applied different 1D to 2D texture generation techniques with a multiclass SVM, validated on acoustic signals [115,116].
Based on the studies surveyed, deep transfer learning appears to be the most suitable approach for fault detection. In this review, we have tried to compare deep transfer learning with other methods.

Challenges and Future Directions
In this section, we try to identify the research gaps, which are summarized in Table 2.

Table 2. Research gaps, challenges, and future directions.

Objectives and challenges: A dynamic model of bearings is used to apply diagnostic knowledge from simulation data to a real situation [2].
Method: Intelligent fault diagnosis, transfer learning, dynamic model, and convolutional neural network.
Dataset: CWRU data and MFPT data.

Objectives and challenges: The review begins with a brief overview of the theoretical foundations of DTL and of how deep learning models are turned into transfer learning approaches. It then covers the most important DTL applications and the most recent DTL advances in IFD [3].
Method: Fault diagnosis, deep learning, transfer learning, domain adaptation, and deep transfer learning.
Dataset: MIMII.

Objectives and challenges: A multi-source transfer learning network (MSTLN) structure is proposed to aggregate and transfer diagnostic knowledge from multiple sources. A multi-source diagnostic knowledge fusion module is used in conjunction with several partial distribution adaptation sub-networks (PDA-Subnets) [4].

Objectives and challenges: The primary objective was to use pre-trained deep transfer learning (DTL) structures and standard machine learning (ML) models as an automated method for diagnosing Parkinson's disease (PD) using sEMG data [2].
Method: Deep transfer learning and ensemble feature selection.
Dataset: Prosthetic fingers and gait rhythmicity datasets.

Objectives and challenges: The study explains the design process of a discrete time-series convolutional neural network (DTCNN) and provides a hierarchical technique for TRU fault detection, as well as a transfer learning-based fault diagnosis method that avoids training new models for different TRUs [76].
Method: Transformer rectifier units, intelligent fault diagnosis, convolutional neural network, and transfer learning.

Objectives and challenges: Researchers are employing transfer learning to identify faults because fault data is scarce. The paper investigates the advantages of transfer learning for AI-based fault detection problems [77].
Method: Fault diagnosis, feature extraction, feature transfer, and sensors.
Dataset: Two data sets, denoted Group A and Group B, are used in the comparative experiments. Group A covers the transfer of the Hp1 fault prediction model to Hp2, and Group B covers the transfer of the Hp3 fault prediction model to Hp4.

Objectives and challenges: The approach can be applied directly to transient data while preserving accuracy, without the need for a steady-state detector, which allows early fault diagnosis. The transformer design employs a new multi-head attention mechanism and, unlike standard deep learning techniques, has no convolutional or recurrent layers [78].
Method: Transformer architecture and deep learning method.
Dataset: N/A.

Objectives and challenges: By assessing the characteristics of observations from sensors installed in traction systems, the typical processes and the difficulties that may limit future FDD deployments in realistic high-speed trains are analyzed in detail. Building on the theoretical advances of data-driven FDD techniques, further insights are provided by embracing model-based FDD issues, system identification approaches, and new machine learning technologies, which offer a range of promising solutions to FDD strategies for high-speed train traction systems [79].
Method: Data-driven, fault diagnosis and detection (FDD), traction systems, and high-speed trains.
Dataset: N/A.

Objectives and challenges: To address deployment under unanticipated working conditions, the study presents a domain generalization-based hybrid diagnostic network. Using both intrinsic and extrinsic generalization objectives, the discriminant structure of the deep network is regularized, which allows the diagnostic model to learn robust features and then apply them to previously unseen domains [80].
Method: Deep learning, domain generalization, intelligent fault diagnosis, rotating equipment, and vibration signals.
Dataset: A record of problems with the gearbox.

Conclusions
This review provides an overview of the theory and practice of DTL techniques from an algorithmic perspective. It defines the terminology of DTL and describes how TL technologies can be used to enhance the performance of DL models. State-of-the-art uses of DTL-based IFD techniques have also been examined with real-world industrial applications in mind. Four key application scenarios have been defined and discussed in depth: improving generalization performance, diagnosing faults in partial domains, identifying faults that are just beginning to appear, and decoupling faults that are driven by other faults. Strategies for selecting DTL algorithms for a new IFD project are then outlined, and potential challenges and future trends are examined. The article thereby offers guidance to researchers who wish to help IFD develop and improve.