Diagnosis Methodology Based on Deep Feature Learning for Fault Identification in Metallic, Hybrid and Ceramic Bearings

Scientific and technological advances in the field of rotatory electrical machinery are leading to an increased efficiency in those processes and systems in which they are involved. In addition, the consideration of advanced materials, such as hybrid or ceramic bearings, are of high interest towards high-performance rotary electromechanical actuators. Therefore, most of the diagnosis approaches for bearing fault detection are highly dependent of the bearing technology, commonly focused on the metallic bearings. Although the mechanical principles remain as the basis to analyze the characteristic patterns and effects related to the fault appearance, the quantitative response of the vibration pattern considering different bearing technology varies. In this regard, in this work a novel data-driven diagnosis methodology is proposed based on deep feature learning applied to the diagnosis and identification of bearing faults for different bearing technologies, such as metallic, hybrid and ceramic bearings, in electromechanical systems. The proposed methodology consists of three main stages: first, a deep learning-based model, supported by stacked autoencoder structures, is designed with the ability of self-adapting to the extraction of characteristic fault-related features from different signals that are processed in different domains. Second, in a feature fusion stage, information from different domains is integrated to increase the posterior discrimination capabilities during the condition assessment. Third, the bearing assessment is achieved by a simple softmax layer to compute the final classification results. The achieved results show that the proposed diagnosis methodology based on deep feature learning can be effectively applied to the diagnosis and identification of bearing faults for different bearing technologies, such as metallic, hybrid and ceramic bearings, in electromechanical systems. The proposed methodology is validated in front of two different electromechanical systems and the obtained results validate the adaptability and performance of the proposed approach to be considered as a part of the condition-monitoring strategies where different bearing technologies are involved.


Introduction
Scientific and technological advances around rotatory electrical machinery are leading to increased efficiency and performance of the processes and systems in which they take part. In this regard, the consideration of advanced materials, such as hybrid or ceramic bearings, are of high interest in the move towards high-performance and oil-free rotatory electromechanical actuators. High demanding operative requirements have become the main characteristics of modern rotating machinery as well as their continuous operation, which also increases the risk of system failure and, specifically, the acceleration of bearing faults as one of the most common sources of malfunction [1,2]. Depending on the considered bearing technology, that is, metallic, hybrid, or full ceramic, the operative characteristics such as friction coefficient, thermal rejection ratio, or lubrication requirement among others differ. The suitability of these features to the application is, in fact, the motivation to decide the incorporation of one or another bearing technology. Metallic bearings are commonly used in simple and common applications of everyday life; the most complex applications for metallic bearings are probably those related to industrial machines, whereas ceramic bearings are being used in particular applications such as in high-speed lathes, chemical machinery and aeroengines, among others. In this regard, in environments in which high-speed, heavy-duty, low noise, and/or high precision are required, full ceramic bearings are being gradually incorporated [3]. However, from the maintenance point of view, different bearing technologies imply several differences in the degradation processes and also in the expected characteristic fault-related patterns that are usually considered in the design process of the condition-based maintenance solutions [4][5][6][7]. Thus, the differences in the fault mechanisms due to differences in brittleness or toughness between steel and ceramic have effects on the vibration response during the machine operation [8][9][10][11][12]. Specifically, overall vibration levels in full-ceramic bearings are modified by the rotational speed and do not suffer changes in presence of axial load. Meanwhile, under the influence of a given axial load and rotational speed, the vibratory levels of full-ceramic bearing are reduced, and for hybrid bearings they increase [13,14]. In this regard, although the mechanical principles from which the characteristic fault-related effects are estimated remain, the quantitative analysis over the vibration signals is affected in amplitude and frequency terms.
Indeed, there is no consensus related to the most convenient signal processing stage to face different bearing technologies. On the other side, concerning the condition assessment and fault identification in different bearing technologies, a great deal of approaches have been proposed and focused on solving the following problems: (i) the occurrence of faults on different parts of the bearing, where, the fault assessment in the inner race, outer race, balls and cage, have been widely studied [3,4,15,16]; (ii) the assessment and recognition of the bearing fault severity which has been normally analyzed over the inner race and outer race [9,12,17]; and (iii) the analysis and prediction of the degradation in any of the bearing elements with the aim of estimating its remaining useful life [6,18]. Despite the fact that several condition-monitoring approaches have been successfully proposed in the field of bearing fault identification to face the most common problems related to the assessment of faulty conditions, most of these proposals have been designed for being applied to a specific bearing technology, the metallic bearings. Under this premise and in order to provide solutions in front of the emerged bearing technologies, the proposal of new condition assessment approaches capable of self-adapting, independently of the bearing technology, such as metallic, hybrid and ceramic bearings, is mandatory. In actuality, it is possible to find works in the related literature of condition monitoring that consider different domains of analysis, that is, time domain, frequency domain and/or time-frequency domain [19][20][21]. Thus, motivated by the consideration that the characteristic fault-related patterns are modulated by the electromechanical system configuration and operating conditions in which the bearing operates, this leads to data-driven approaches as a more suitable solution for practical solutions and applications. In recent years, proposals that have been addressed the fault detection and identification by using artificial intelligence (AI)-based models have attracted great attention [22,23]. Initially, considering machine learning (ML) approaches, such as the work presented by A. Soualhi et al. in [24], based on support vector machines, or the work presented by J. J. Saucedo et al. in [25], based on artificial neural networks (NN), or even the work presented by J. Guo et al. in [26], based on k-nearest neighbors. However, despite the fault detection and identification can be carried out by such classical approaches in which depending on the bearing technology and its electromechanical context, the most significant fault-related features have to be selected, and recently, deep learning (DL) approaches are being considered. The use of DL has attracted attention since DL provides an adaptive framework that allows the design of architecture that can be adapted in the solution of specific problems that must be faced in which the engineering features and AI algorithms coexist. Hence, DL-based approaches have recently been established as a powerful tool to adapt the pattern characterization of electromechanical systems [27]. Significant examples of this fact are the works presented by J. Dai [15,22,28], respectively.
However, dealing with the generalization of the diagnosis methodology, and specifically, considering its application to different bearing technologies, such as metallic, hybrid and full ceramic, new DL-based procedures have to overcome a set of critical challenges that are avoiding its practical application [29]. Mainly, these limitations are: (i) the operation with limited amounts of data, which are difficult to obtain in industrial applications especially concerning the fault patterns, (ii) understandable interpretation of the learning process, that is not clear for practitioners, (iii) generalization-based approaches avoiding over-fitted solutions and (iv) the methodological hyperparameter tuning procedures to promote ease of access and configuration.
Thereby, the contribution of this work lies in the proposal of a novel data-driven monitoring methodology based on deep learning applied to the diagnosis and identification of bearing faults for different bearing technologies, such as metallic, hybrid and ceramic bearings, in electromechanical systems. Thus, the proposed diagnosis methodology promotes a common framework which valid for different bearing technologies and electromechanical systems. To address this issue, the proposed methodology consists of three main stages. First, a DL-based model is designed with the ability of self-adapting to the extraction of characteristic fault-related features from different signals including classical vibration and stator current; also, the proposed diagnosis approach is designed for being adapted to any other different available physical magnitude. In order to perform the proposed deep feature learning, a distributed DL-based structure supported by a stacked au-to encoder (SAE) is considered for extracting potential and meaningful fault-related features from the available signals that are characterized through its analysis in three different domains (time domain, frequency domain, and time-frequency domain). In this sense, the proposed methodology offers a practical way to implement a DL approach independently of the electromechanical configuration in general, but considering the bearing technology in particular. Second, a feature fusion stage that integrates information from different domains is considered to increase the posterior discrimination capabilities during the condition assessment. Finally, third, a simple softmax layer is included to compute the final classification result. The proposed methodology is validated in front of two different electromechanical systems. First, a self-designed experimental test bench including metallic, hybrid and full ceramic bearings, at multiple operating conditions under the same fault is considered to evaluate the adaptation capabilities of the proposed methodology in front of the bearing technology. Second, a standard benchmark as the Case Western Reserve University bearing data center [30,31], is also evaluated in order to assess the performance of the proposed diagnostic method in front of different metallic bearing failures in a reference framework. It should be noted that the proposed approach is of significant importance in the field, even more so when designing a fault detection and identification solution, since depending on the bearing technology, the electromechanical system configuration, and even the range of operating conditions, the meaning and correlation of characteristic fault-related features will differ. However, the proposed methodology represents a common framework for the design and implementation of feasible solutions to be applied in a wide range of electromechanical systems where the bearings are involved in multiple configurations.
The proposed work is organized as follows: Section 2 presents the theoretical background on the stacked auto-encoder as a deep-learning approach. Section 3 presents the proposed method presented in this study. In Section 4, the results and validation of this proposal are discussed. Finally, the conclusions are provided in Section 5.

Autoencoder-Based Deep Feature Learning
Unsupervised feature-learning techniques have been included as a fundamental part in most of the proposed condition-monitoring and diagnosis approaches. In this regard, the autoencoder (AE), which is well-known as an unsupervised NN, has a symmetrical structure and one of its main design purposes is focused to perform the representation of a high-dimensional feature space into a low-dimensional space; specifically, the AE goal is to reconstruct the input pattern at the output of the network by means of minimizing the reconstruction error between the input and output feature patterns [26,32]. The AE network structure is represented by a single hidden layer and the graphic representation is shown in Figure 1. The mathematical notation that belongs to a single AE network structure is described below; however, it should be clarified that aiming to facilitate the description of the mathematical basis, italic letters are used to define scalars, bold-face lower-case letters are considered in the definition of vectors and bold-face upper-case letters are used in the definition of matrices.

Autoencoder-Based Deep Feature Learning
Unsupervised feature-learning techniques have been included as a fundamental part in most of the proposed condition-monitoring and diagnosis approaches. In this regard, the autoencoder (AE), which is well-known as an unsupervised NN, has a symmetrical structure and one of its main design purposes is focused to perform the representation of a high-dimensional feature space into a low-dimensional space; specifically, the AE goal is to reconstruct the input pattern at the output of the network by means of minimizing the reconstruction error between the input and output feature patterns [26,32]. The AE network structure is represented by a single hidden layer and the graphic representation is shown in Figure 1. The mathematical notation that belongs to a single AE network structure is described below; however, it should be clarified that aiming to facilitate the description of the mathematical basis, italic letters are used to define scalars, bold-face lowercase letters are considered in the definition of vectors and bold-face upper-case letters are used in the definition of matrices. Since AE is an unsupervised NN, the training procedure is divided into two main phases: encoder and decoder; thus, during the encoder procedure, an input feature dataset vector is connected to the input encoder layer x. The encoder procedure aims to perform the mapping of the input x into a hidden representation which is denoted by the hidden layer or encoder vector h. Therefore, the hidden layer h contains the numerical information and maps that lead to achieve the connection of the input layer x with the hidden layer h through n sparse-activated neurons and non-linear transformation following Equation (1): where is the sigmoid function used as the non-linear activation function, and and are the corresponding weights and biases matrices for the encoder phase, respectively. Otherwise, the decoder procedure which is applied as a reverse transformation procedure aims to perform the reconstruction of the input feature vector x from the information contained in the hidden layer h; thus, an output decoder layer vector y is obtained during the decoder procedure, which is achieved by following Equation (2): From Equation (2), the corresponding weights and bias matrices considered in the decoder procedure are and , respectively; the AE output is represented by y as the achieved reconstruction of the corresponding input x. Therefore, the aim of the AE feature learning process lies in the optimization of the parameters = , , , , where the minimization of the reconstruction error between the input and the output feature vectors,

Encoder
Decoder Hidden layer Since AE is an unsupervised NN, the training procedure is divided into two main phases: encoder and decoder; thus, during the encoder procedure, an input feature dataset vector is connected to the input encoder layer x. The encoder procedure aims to perform the mapping of the input x into a hidden representation which is denoted by the hidden layer or encoder vector h. Therefore, the hidden layer h contains the numerical information and maps that lead to achieve the connection of the input layer x with the hidden layer h through n sparse-activated neurons and non-linear transformation following Equation (1): where f is the sigmoid function used as the non-linear activation function, and W e and B e are the corresponding weights and biases matrices for the encoder phase, respectively. Otherwise, the decoder procedure which is applied as a reverse transformation procedure aims to perform the reconstruction of the input feature vector x from the information contained in the hidden layer h; thus, an output decoder layer vector y is obtained during the decoder procedure, which is achieved by following Equation (2): From Equation (2), the corresponding weights and bias matrices considered in the decoder procedure are W d and B d , respectively; the AE output is represented by y as the achieved reconstruction of the corresponding input x. Therefore, the aim of the AE feature learning process lies in the optimization of the parameters θ = {W d , B e , W d , B d }, where the minimization of the reconstruction error between the input and the output feature vectors, x and y, respectively, is accomplished in terms of the mean squared error (MSE) by following Equation (3): where k = 1, 2, . . . M, x k , y k , represents each corresponding element of the input and the out feature vectors, respectively, and M is the total number of input and output elements. Moreover, in order to prevent overfitted responses on AE networks, it is necessary to introduce L 2 as the regularization term to the cost or fitness function. Specifically, L 2 is denominated as the weight decay regularization term, which is added to avoid the emergence of large weights in the weight matrix W, thus, such a term is formulated as Equation (4) depicts: where a is the number of weight parameters, b is the number of rows and c is the number of columns in each weight matrix W, and W i jk represents each element of W. Furthermore, a sparsity regularizer term is introduced to generate more specific AE network models capable of learning information from the input more effectively. This regularizer term is a function of the average output activation value of a neuron which is described as followed by Equation (5): where n is the total number of training samples, x j is the j th training sample, w T i is the i th row of the weight matrix W e and b i is the i-th entry of the bias vector, b e . This purpose of the sparsity function is to restrict the values ofρ i to be low, which causes the AE to generate a mapping such that each neuron in the hidden layer is activated with a small number of samples. In this regard, a term Ω Sparsity is added to the cost function to constrain theρ i values and make the neurons sparce. For this, the KL divergence is used to measure the distance between ρ andρ i , where ρ is the wanted sparsity parameter andρ i is the effective sparsity of a corresponding hidden neuron Subsequently, the fitness function defined as FunCost is described as the sum of the error term regularization penalty terms where λ is the specific parameter for the L 2 regularization term that takes control over the weight decay and β is the corresponding parameter to the sparsity regularization term. Therefore, the parameter-tuning process can be stated as an optimization problem where the network parameters are adjusted aiming to minimize the resulting cost function [16]. In this paper, the AE network aims to use the ability to learn and reconstruct patterns to characterize the different conditions and operating regimes of rotating systems. In addition, a deep neural network (DNN) can be built from stacking several AEs, as shown in Figure 2. In this way, a DNN based on SAE has a greater capacity to automatically extract fault features through multiple layers and non-linear transformations. The process to efficiently construct a DNN is described by Hilton et al. [33]. The result is a multilayer deep-autoencoder (DAE) structure with a bottleneck coding layer in the middle, which represents the features extracted in a low dimension.

Deep Feature Learning Based Methodology
As mentioned, new condition-monitoring strategies are required for those technological fields in which the demanding requirements of operation over rotatory electromechanical systems are considered. This fact is of special importance dealing with bearings of rotating machines since they are one of the most common sources of malfunctions in related electromechanical systems. Moreover, the bearing technological variants, that is, metallic, hybrid and ceramic, lead to an increased complexity when a common framework for a feasible characterization and posterior pattern recognition is desired. In this regard, the flowchart of the proposed deep feature learning based condition-monitoring methodology focused on the diagnosis and identification of bearing faults for different bearing technologies is shown in Figure 3. The diagnosis methodology has been designed following a proposed step-by-step strategy that facilitates its practical application as a conditionmonitoring scheme over electromechanical systems. In this regard, the methodology represents a common framework to face bearing condition characterization and its posterior recognition independently of the bearing's technology. The proposed methodology includes a SAE-based DNN to perform an automatic characterization of the available acquisitions processed under multiple domains, that is, time, frequency and time-frequency. It must be noted that the general structure of the proposed methodology is intended to serve as a guide to adapt such condition-monitoring procedures to different industrial systems, that is, considering differences in the registering modules (i.e., type and/or number of acquired physical magnitudes), and differences in the electromechanical configuration and operation (i.e., speed, torque, but also the bearing's technologies).
Encoder Decoder x' x z Feature extracted Input Recostruction

Deep Feature Learning Based Methodology
As mentioned, new condition-monitoring strategies are required for those technological fields in which the demanding requirements of operation over rotatory electromechanical systems are considered. This fact is of special importance dealing with bearings of rotating machines since they are one of the most common sources of malfunctions in related electromechanical systems. Moreover, the bearing technological variants, that is, metallic, hybrid and ceramic, lead to an increased complexity when a common framework for a feasible characterization and posterior pattern recognition is desired. In this regard, the flowchart of the proposed deep feature learning based condition-monitoring methodology focused on the diagnosis and identification of bearing faults for different bearing technologies is shown in Figure 3. The diagnosis methodology has been designed following a proposed step-by-step strategy that facilitates its practical application as a condition-monitoring scheme over electromechanical systems. In this regard, the methodology represents a common framework to face bearing condition characterization and its posterior recognition independently of the bearing's technology. The proposed methodology includes a SAE-based DNN to perform an automatic characterization of the available acquisitions processed under multiple domains, that is, time, frequency and time-frequency. It must be noted that the general structure of the proposed methodology is intended to serve as a guide to adapt such condition-monitoring procedures to different industrial systems, that is, considering differences in the registering modules (i.e., type and/or number of acquired physical magnitudes), and differences in the electromechanical configuration and operation (i.e., speed, torque, but also the bearing's technologies).
The consideration of DL techniques as part of the related condition-monitoring methodologies represents a complex task that has not been properly solved by most of the available studies in the field. The selection and configuration of the related hyperparameters as a result of wrapper or empirical approaches leads to a high risk of overfitting during the validation of the methodology. In this regard, the proposed three-stage scheme promotes the generalization of the resulting model by introducing the selection of the hyperparameters as part of the methodology, thus leading to a feasible method easy to implement and adaptable to different electromechanical configurations.

Multi-Domain Feature Calculation
Indeed, the proposed methodology has been designed for being adapted to different and/or multiple available physical magnitudes. In this regard, although the analysis of vibration signals represents the most significant source of information when the bearing fault diagnosis is addressed and faced [1,12], the proposed scheme allows the consideration of additional sources of information, that is, multiple vibration axis or even complementary physical magnitudes as the stator currents. Therefore, as Figure 3 depicts, the proposed approach may support a number of N available signals where the considered index i, for I = 1, 2, . . . , N, is used to identify each one of the considered physical magnitudes. Thus, regarding the proposed methodology, in the first stage, the available physical magnitudes are processed aiming to perform their characterization through the estimation of a potentially meaningful set of features. In this sense, the available signals, especially those related to the vibration of the electromechanical system in general, and the bearings in particular, are proposed to be processed into a D number of domains equal to three, that is, time domain (TD), frequency domain (FD) and time-frequency domain (TFD). The consideration of DL techniques as part of the related condition-monitoring methodologies represents a complex task that has not been properly solved by most of the available studies in the field. The selection and configuration of the related hyperparameters as a result of wrapper or empirical approaches leads to a high risk of overfitting during the validation of the methodology. In this regard, the proposed three-stage scheme promotes the generalization of the resulting model by introducing the selection of the hyperparameters as part of the methodology, thus leading to a feasible method easy to implement and adaptable to different electromechanical configurations.

Multi-Domain Feature Calculation
Indeed, the proposed methodology has been designed for being adapted to different and/or multiple available physical magnitudes. In this regard, although the analysis of vibration signals represents the most significant source of information when the bearing fault diagnosis is addressed and faced [1,12], the proposed scheme allows the consideration of additional sources of information, that is, multiple vibration axis or even complementary physical magnitudes as the stator currents. Therefore, as Figure 3 depicts, the proposed approach may support a number of N available signals where the considered index i, for I = 1, 2,…,N, is used to identify each one of the considered physical magnitudes. Thus, regarding the proposed methodology, in the first stage, the available physical magnitudes are processed aiming to perform their characterization through the estimation of a potentially meaningful set of features. In this sense, the available signals, especially those related to the vibration of the electromechanical system in general, and the bearings in particular, are proposed to be processed into a D number of domains equal to three, that is, time domain (TD), frequency domain (FD) and time-frequency domain (TFD).
Although the proposed methodology adapts to any signal processing techniques to Although the proposed methodology adapts to any signal processing techniques to be considered in each domain, a reference set of features is proposed to be calculated from the available physical magnitudes as a trade-off between simplicity and performance. Additionally, as a general framework, the different signals that are available are first subjected to a segmentation procedure, where each acquired signal is individually divided into equal parts of one second. Thus, considering that s i refers to each available signal, for i = 1, 2, . . . , N, the segmentation procedure is performed by following Equation (8) where L refers to the length of the time windows, but, specifically, is the number of sampled points for each segment of the signal; n is the total number of sampled points for each available signal. Through this segmentation process, each available signal is divided in equal parts obtaining n/L segments. Dealing with periodic patterns in rotatory electromechanical machinery, this temporal duration assures enough statistical consistency in most of the practical applications (i.e., rotatory speeds higher than 500 rpm). However, the acquisition time can be increased without any loss of performance for those low-speed applications. Thereby, each considered signal is processed by applying the TD analysis, specifically, a numerical set of T = 15 statistical time domain features is estimated from each segmented part of the available signals; as a result, a consecutive set of numerical features represented into a T-dimensional space is computed; precisely, a feature matrix composed by statistical time domain features is obtained, T ∈ R T . On the other hand, for the FD analysis, the FFT technique is applied to each segmented part of the signals in order to obtain its representative frequency spectrum. Subsequently, a numerical set of 14 statistical features is calculated from each one of the obtained frequency spectrums. Additionally, six characteristic-bearing fault-related frequency components are estimated, that is, the first and second harmonic corresponding to the outer, inner and ball bearing faults. Therefore, in the FD analysis, a resulting set of F = 20 features are calculated from each frequency spectrum, and a representative F-dimensional frequency-domain feature matrix, F ∈ R F , is calculated. Finally, for the TFD analysis, the acquired signals are, first, processed by means of one of the most suitable signal-decomposition techniques, the empirical mode decomposition (EMD), which allows obtaining a set of sub-signals containing the main oscillatory modes included in the original one. Thereby, the EMD technique is applied to each one of the segmented parts of the available signals and, following the related literature, the two first intrinsic modes functions (IMF), are taken into account for being characterized, since they contain the most significant information in terms of characteristic fault patterns. Later, the FFT technique is applied to each IMF to compute their corresponding frequency spectrum. The estimation of the same set of 14 statistical frequency-features is carried out for each IMF's frequency spectrum. Therefore, for the TFD analysis, a set of TF = 28 features are estimated for each acquisition. As a result, a representative TF-dimensional time-frequency domain feature matrix, TF ∈ R TF , is calculated.
Next, Tables 1 and 2 summarize the proposed sets of statistical features estimated during the signal processing in the TD and FD analysis, respectively. These sets of statistical features have been included in other related works; its application has been preferred since they offer a high-performance signal characterization due to the capability of modeling trends and changes in the TD analysis, whereas, for the FD analysis, the estimation of these features allow to identify abrupt changes in the amplitude of those characteristic frequency components [34,35]. Table 1. Proposed set of statistical time domain features estimated during the signal processing in the TD analysis, where each y(j) value belongs to each sample point of the processed signal for j = 1, 2, 3 . . . P, with a total number P samples.

Statistical Feature Mathematical Equation
Mean Maximum value Root mean square Square root mean Standard deviation Variance RMS Shape factor SRM Shape factor Crest factor t 9 = t 2 Table 1. Cont.

Statistical Feature Mathematical Equation
Latitude factor Impulse factor Skewness Kurtosis Fifth moment Sixth moment Table 2. Proposed set of statistical time domain features estimated from the frequency spectra during the signal processing in the FD analysis, where each z(i) is a spectrum for i = 1, 2, . . . ,Q, and Q is the total number of lines for an estimated spectrum; f q (i) is the frequency value of the i-th spectrum line.

Statistical Feature Mathematical Equation
Mean Variance Grand mean Standard deviation 1 D Factor Third moment 1 Fourth moment 1 H Factor J Factor

SAE Feature Learning
The second stage of the proposed method is focused on the self-adaptive extraction procedure to obtain a reduced set of features as a result of the learning procedure over the patterns described by the original feature matrices for each analyzed domain. This procedure is carried out by a multi-SAE structure that is applied over the feature matrices resulting from the analysis of the N available signals through a D number of domains of analysis, that is D = 3, T i , F i , and TF i , for i = 1, 2,...N. Thus, it is proposed to consider as many SAE structures as available feature matrices to extract; for each matrix, a reduced set of features is considered while preserving most of the characteristic information that allows a proper reconstruction of the original signal. In this sense, it must be noted that performing a specific domain-based feature learning procedure over the extracted patterns leads to maximizing the characterization capabilities from each matrix. Indeed, the SAE seeks for an optimum codification in terms of posterior reconstruction error minimization, thus leading to the preservation of the underlying physical behavior of the signal from an unsupervised point of view.
However, as each matrix describes a different statistical distribution in its highdimensional space, each SAE requires a specific hyperparameter tuning to reach a proper performance. Classically, in the related condition-monitoring literature, this procedure is faced empirically by means of a wrapper approach to maximize the final diagnosis performances. However, this approach leads to a high risk of overfitting, that is, the SAE is forced to prioritize those patterns providing higher diagnosis performances over those patterns that better characterize the original signal in terms of the reconstruction's MSE. On the other hand, in most of the cases, several hyperparameter-tuning strategies have been classically proposed in order to optimize the hyperparameters regarding to a single model criterion, in which, the misclassification error is considered to obtain a model with a high performance. In this sense, this proposal faces the challenge of achieving the hyperparameter-tuning procedure by taking into account the trade-off between model accuracy (proper reconstruction of the original signals) and preservation of the characteristic information (underlying physical behavior of the signal). Therefore, the hyperparameter tuning for each considered SAE structure is faced through a heuristic search algorithm, the genetic algorithm (GA), where the fitness function of the GA is focused on the minimization of the MSE value during the reconstruction. In fact, the use of heuristic search algorithms, such as GA, has been widely used and preferred since the reached solutions are based on a stochastic optimization method. Moreover, one of the main benefits of using GA is that optimal or near-optimal solutions of a large-scale problem may be obtained under a reasonable computational time.
Therefore, in this proposal the hyperparameter optimization process is individually carried out by a GA for each considered SAE structure; the SAE hyperparameters that are optimized are (i) the coefficient for the L 2 regularization term, (ii) the coefficient for the sparsity regularization term and (iii) the parameter for sparsity proportion. Thus, the optimization of the three mentioned hyperparameters is performed by following the next steps:

•
Step 1: Initialization of the population: the chromosomes of the GA are initially defined with a logical vector containing three elements, where each element represents each one of the hyperparameters. Subsequently, a random initialization of the population is performed by assigning a specific value to each particular hyperparameter; in fact, the values assigned to each hyperparameter are within a predefined range of values. Once the initialization of the population is achieved, the procedure continues in step 2.

•
Step 2: Evaluation of the population: in this step the fitness function is evaluated based on the minimization of the reconstruction error between the input and the output features. Specifically, the minimization of the reconstruction error is evaluated in terms of the MSE value following Equation (3), as mentioned in Section 2. Thereby, the optimization problem to be solved by the GA involves the search of those specific hyperparameter values that leads a high-performance feature mapping. Then, once the whole population is evaluated under a wide range of values, the condition of best hyperparameter values is analyzed and the procedure continues in step 4.

•
Step 3: Mutation operation: the mutation of the GA produces a new population by means of the roulette wheel selection; the newly generated population takes into account the choice of the best fitness value achieved by the previously evaluated population. Moreover, a mutation operation that is based on the Gaussian distribution is applied during the generation of the new population. Subsequently, the procedure continues in step 2.

•
Step 4: Stop criteria: there are two stop criteria for the GA: (i) the obtention of a reconstruction MSE value lower than a predefined threshold, 5%, and/or, (ii) reaching the maximum number of iterations, 1000. In the case of the first stop criterion, (i), the procedure is repeated iteratively until those optimal hyperparameter values are found until the GA evolves, then the procedure continues in step 3.
On the other hand, regarding the number of neurons for the hidden layers, multiple works reported in the related literature suggest a first hidden layer for expanding the number of neurons, and then the expansion is followed later with reduction in neurons in posterior layers [32]. In this regard, and following such good practices that have been validated in previous works, the SAEs are proposed to be constituted by a first hidden layer to expand neurons in a ratio from 5 to 10 times approximately, and three additional hidden layers reducing neurons in 1/3, 1/3 and 1/2 ratios approximately, thus resulting in a final hidden layer with a number of neurons equal to the number of considered classes, C. By means of this approach, each considered SAE structure provides the most representative C features that preserves the characteristic features of the signal for each considered domain as much as possible, that is, the most representative underlying physical behaviors seen from each domain that will be considered for posterior recognition. Thus, the SAE structures for TD, FD and TFD analysis are

Data Fusion and Fault Diagnosis
Finally, a concatenated vector of D × N number of SAE structures, resulting from the consideration of N available signals that are processed into N domains per C considered classes (D × N × C), is considered as a feature fusion approach for classification purposes. Taking into consideration that all the posterior stages are focused on the learning of the most significant features from each available signal in all three domains, in this stage, the most significant patterns for their recognition are extracted. It should be noted that the proposed methodology allows a simple configuration for the classification task, since the input vector represents a processed set of features highlighting the representative characteristics of the signals.
In this regard, a simple softmax layer is proposed for diagnosis purposes. Thus, carrying out a one-neuron layer training based on a supervised approach. The proposed softmax layer provides the corresponding probability as part of the activation function values, resulting in a vector of λ 1 (z), λ 2 (z), λ C (z), where the λ j (z) represents the probability of j-th class. The calculation of λ j (z) is defined following:

Pulley-Belt Electromechanical System
The electromechanical system used to experiment with different bearing technologies is a self-designed laboratory test bench based on a pulley-belt system, as shown in Figure 4. The electromechanical system comprises a 971-W three-phase induction motor (IM), which model is WEG00136APE48T; the IM has one pair of poles and supports 220 VAC as a power supply. A variable frequency driver (VFD), model WEGCFW08, is used to feed the motor allowing the control of its rotational speed. Moreover, the motor is coupled by means of a pulley-belt system to an ordinary alternator. That produces a nominal load in the IM between 25% and 35%. The automotive alternator is used as a mechanical load and it produces a nominal load with variations within 25% to 35% in the IM. Such percentages of nominal loads are in terms of the stator current consumption in the IM and these values are estimated through experimental tests and by doing comparisons with the nominal current of the IM at its full load. On the other hand, although the alternator may be used as a power generation source to feed an arrangement of resistive loads, the performed experiments were carried out without considering the additional connection of resistive loads to the alternator. Thereby, the aforementioned percentages of load conditions are produced just by the rotational inertia of the alternator.

Pulley-Belt Electromechanical System
The electromechanical system used to experiment with different bearing technologies is a self-designed laboratory test bench based on a pulley-belt system, as shown in Figure 4. The electromechanical system comprises a 971-W three-phase induction motor (IM), which model is WEG00136APE48T; the IM has one pair of poles and supports 220 VAC as a power supply. A variable frequency driver (VFD), model WEGCFW08, is used to feed the motor allowing the control of its rotational speed. Moreover, the motor is coupled by means of a pulley-belt system to an ordinary alternator. That produces a nominal load in the IM between 25% and 35%. The automotive alternator is used as a mechanical load and it produces a nominal load with variations within 25% to 35% in the IM. Such percentages of nominal loads are in terms of the stator current consumption in the IM and these values are estimated through experimental tests and by doing comparisons with the nominal current of the IM at its full load. On the other hand, although the alternator may be used as a power generation source to feed an arrangement of resistive loads, the performed experiments were carried out without considering the additional connection of resistive loads to the alternator. Thereby, the aforementioned percentages of load conditions are produced just by the rotational inertia of the alternator.  Regarding the acquisition of the data, two vibration axes and one stator current are measured and acquired with a proprietary data acquisition system (DAS) that is a low-cost design based on FPGA (field-programmable gate array). The designed DAS has a 12-bit 4-channel serial-output sampling analog-to-digital converter from Texas Instruments (i.e., model ADS7841). For the experimentation, different frequency values are set in the VFD to produce different rotational speeds in the IM and, for each operating frequency, the continuous measurement of the vibration signals and stator current is performed. Thus, the mechanical vibrations produced by the electromechanical system are acquired with an accelerometer model LIS3L02AS4 that is placed on the top of the IM, as Figure 4 illustrates; the acquired vibrations signals belong to the perpendicular plane of the rotating axis of the motor shaft. On the other hand, the stator current of one of the three phases of the IM is measured through a hall-effect sensor from Tamura Corporation model L08P050D15, such sensor has high linearity (i.e., 1%) and allows to perform measurements of 50 Amp; thus, the sensor is placed between the power supply lines, as shown in Figure 4. Furthermore, the used sensors are mounted individually on a PCB with its corresponding signal conditioning and anti-alias filtering. The sampling frequency to acquire the vibration signals and the stator current was set to 3000Hz and, for each performed experiment, 300 s of continuous operation are acquired and stored in a personal computer for posterior processing.
In regard to the assessed conditions, different bearing technologies are experimentally evaluated in the electrical motor to assess its corresponding condition. Hence, a metallic bearing, a hybrid bearing and a full ceramic bearing are experimentally tested. The bearing model is the 6203, and it is the end drive bearing of the IM. Two different bearing conditions are evaluated for each one of the considered technologies, that is, the healthy or normal condition (HC), and the outer race bearing defect or bearing damaged (BF) are tested. To produce the bearing defects, a metallic, a hybrid and a ceramic bearing were artificially damaged by drilling a hole with a tungsten drill bit of 1/16 of diameter; the drilled hole completely passed the bearing outer race of the three bearings. Thus, the drilled bearings are shown in Figure 5a-c for each corresponding technology: metallic, hybrid and full ceramic, respectively. Thus, for each considered bearing technology, both bearing conditions (HC and OD) are iteratively tested in the IM under different supply frequencies that are set in the VFD, that is, 5 Hz, 15 Hz, 50 Hz and 60 Hz. Thus, the condition assessment in different bearing technologies may represent a complex task, even more, whether a specific pattern is not shown. Theoretically, it is well-known that for bearings with defects in the inner race and/or outer race, the bearing system is under the influence of induced impacts when the balls get in contact with the defect area during the rotational working operation of the bearing. Although these impacts usually appear at intervals of time that are influenced by the size and shape of the defect, its appearance may not occur depending on the severity of the damage. In this sense, aiming to show the complexity of the addressed problem, from Figure 6a-c the vibration patterns acquired when the different bearing technologies are tested under the faulty condition, metallic, hybrid and full ceramic, are shown, respectively. As it can be appreciated, for any of the cases, a specific pattern does not exist that depicts the occurrence of the outer race faulty condition over the vibration patterns; specifically, the vibration patterns do not show periodic impacts that may be associated with the bearing defect. On the other hand, even though the appearing of periodic impacts on the vibration patterns may be induced by the influence of the outer race bearing fault in all considered bearing technologies, metallic, hybrid and full ceramic, respectively; the condition assessment and fault detection may be complicated due to the bearings under analysis have similar geometric properties. Additionally, aiming to provide a better understanding of the addressed problem, the corresponding frequency spectra for each raw vibration signal of Figure 6 are estimated by means of the fast Fourier transform. Thus, in Figure 7 is possible to observe the vibration spectra for all bearing technologies, metallic, hybrid and ceramic; as it can be appreciated, the theoretical fault-related frequency component of the outer race bearing defect is around 150 Hz when the bearings are operated at 50 Hz in the IM. Additionally, it should be highlighted that a clear difference between all estimated frequency spectra does not exist, that is, all the frequency spectra show similar amplitude values; therefore, additional signal processing techniques have to be implemented to improve the condition assessment.

Rolling Bearing CWRU Dataset
In order to verify the effectiveness of the proposed methodology, the public experimental dataset provided by the Case Western Reserve University Bearing Data Center (CWRU) has been also considered for being analyzed under the proposed DL-based fault

Rolling Bearing CWRU Dataset
In order to verify the effectiveness of the proposed methodology, the public experimental dataset provided by the Case Western Reserve University Bearing Data Center (CWRU) has been also considered for being analyzed under the proposed DL-based fault

Rolling Bearing CWRU Dataset
In order to verify the effectiveness of the proposed methodology, the public experimental dataset provided by the Case Western Reserve University Bearing Data Center (CWRU) has been also considered for being analyzed under the proposed DL-based fault diagnosis approach. Such database is considered as a standard database and it has been used widely used by many researchers to prove the effectiveness and applicability of their condition-monitoring approaches [30,36]; several condition-monitoring and fault diagnosis strategies have been specifically based on the analysis the CWRU data [15][16][17]30,31].
The test bench consists of a 1.49 kW reliance electric motor, a torque transducer, a dynamometer and an electronic controller; the database contains vibration data that have been collected from the experimental test rig when different damaged bearings are iteratively tested under four different loads conditions: 0, 0.74, 1.49 and 2.23 kW. Regarding the bearing conditions, besides the healthy condition or normal condition (HC), three different faulty conditions are introduced separately in the drive-end of the reliance electric motor, i.e., inner race fault (IF), outer race fault (OF) and ball fault (BF). The vibration signals were acquired through an accelerometer that was attached with a magnetic base at the drive-end of the reliance electric motor; the vibration data was collected with 12 kHz as a sampling frequency. Thereby, in order to validate the proposed diagnosis methodology in front of multiple bearing faults, the vibration signals corresponding to the HC, IF, OF and BF conditions with artificial single point faults of 0.014 in diameter are considered.

Evaluation of the Diagnostic Model in the Pulley-Belt Electromechanical System
In a first validation stage, the proposed method is applied over the available signals acquired from the pulley-belt-based electromechanical system, that is, two vibration signals and one stator current. As aforementioned and for posterior notations, the considered indexes to identify these physical magnitudes are i = 1 and i = 2 for vibration and stator current, respectively. Thus, as described previously, the available signals were acquired during 300 s of the continuous operation of the IM and, according to the proposed method, each one of the acquired signals is segmented in equal parts of 1 s to generate a consecutive set of samples. As a result, three hundred individual segments were obtained from each considered signal by following Equation (8). Afterward, the multi-domain analysis was performed over each one of the segmented parts of the signals to achieve the signal characterization in the three mentioned domains, that is, DT, DF and DTF. In this regard, for the analysis carried out in TD, a set of 15 statistical features was computed from each segment of each available signal; therefore, for both vibration signals a feature matrix T 1 that contains 30 statistical features represented with 300 samples was achieved; meanwhile, the feature matrix achieved from the stator current T 2 consists of 15 statistical features with 300 samples. Subsequently, when the FD analysis is performed, from each segmented part of each available signal, the corresponding frequency spectrum is estimated by means of applying the FFT technique and then, from each resulting spectrum, the proposed set of 14 statistical features is calculated to characterize each spectrum into a set of representative numerical values. As a result, a characteristic feature matrix F 1 with 28 statistical features and 300 samples is carried out from both vibrations signals, while the characteristic matrix F 2 estimated from the stator current contains 14 statistical features with 300 samples. Additionally, for each available signal, six fault-related frequency features are taken into account; these additional features are the two first harmonic frequencies associated with three possible sources of fault in the bearing (i.e., outer, inner and ball bearing faults); thereby, these fault-related frequency features are represented by its corresponding amplitude resulting from each frequency spectrum. Accordingly, for the FD analysis, the resulting feature matrix F 1 of both vibration signals has a total of 40 features with 300 samples, and the feature matrix F 2 for the stator current has a total of 20 features with 300 samples. Then, during the TFD analysis, each segmented part is evaluated by the EMD technique in order to obtain the intrinsic modes associated with nonlinearities of the signals; thus, from each segmented part of the available signals, the two first intrinsic mode functions (i.e., IMF 1 and IMF 2 ) are considered as the two main intrinsic modes. Consecutively, the FFT technique is applied over each intrinsic mode and the resulting frequency spectra are then individually characterized by the estimation of the proposed set of 14 statistical features. Therefore, a feature matrix TF 1 with 56 statistical features and 300 samples is achieved for both vibration signals, whereas, for the stator current a feature matrix TF 2 with 28 features and 300 samples is obtained.
Thus, six representative feature matrices are estimated for each of the three experiments considering different bearing technologies. In summary, the feature matrices resulting from the analysis of the vibration signals in TD, FD and TFD are T 1 , F 1 and TF 1 , respectively; whereas, T 2 , F 2 , TF 2 are the resulting feature matrices from the analysis of the stator current in the TD, FD and TFD, respectively. Additionally, it should be clarified that these representative feature matrices are also estimated for each one of the supply frequencies considered (i.e., 5 Hz, 15 Hz, 50 Hz and 60 Hz). Following the proposed method, for each one of the bearing technologies and for each one of the available signals, the whole resulting feature matrices, considering all supply frequencies and taking into account the assessed bearing conditions, are grouped according to the domain of analysis. For example, in Table 3 the grouping of the feature matrices that are estimated from the vibrations signals is summarized; the feature matrices are grouped according to the domain of analysis, as it can be appreciated, and each corresponding grouping considers the bearing conditions (i.e., HC and BD conditions). These grouped matrices are represented by T 1 group , F 1 group , TF 1 group , where Equation (40) gives the detail of a specific set of grouped matrices for T 1 group . Thus, the same grouping is applied to feature matrices that are estimated from analyzing the stator current in the TD, FD and TFD for each one of the supply frequencies tested in the VFD and the obtained grouped matrices are T 2 group , F 2 group and TF 2 group . )   Table 3. Representation of the grouping of the feature matrices computed by means of analyzing the vibration signals in TD, FD and TFD for each supply frequency for the considered bearing conditions; where the notations HC and BD represent the healthy condition and the bearing damaged condition.

Related condition
Normal T 1 @5Hz@HC F 1 @5Hz@HC TF 1 @5Hz@HC T 1 @15Hz@HC F 1 @15Hz@HC TF 1 @15Hz@HC T 1 @50Hz@HC F 1 @50Hz@HC TF 1 @50Hz@HC T 1 @60Hz@HC F 1 @60Hz@HC TF 1 @60Hz@HC Damaged T 1 @5Hz@BD F 1 @5Hz@BD TF 1 @5Hz@BD T 1 @15Hz@BD F 1 @15Hz@BD TF 1 @15Hz@BD T 1 @50Hz@BD F 1 @50Hz@BD TF 1 @50Hz@BD T 1 @60Hz@BD F 1 @60Hz@BD TF 1 @60Hz@BD Grouped matrices T 1 group F 1 group TF 1 group Afterward, for each bearing technology that is tested under different supply frequencies, the corresponding feature matrices achieved from the multi-domain analysis of the available vibrations and stator current are grouped according to the domain of analysis (i.e., TD, FD and TFD), and then such matrices are individually used as the input space in several SAE structures. Indeed, a particular SAE structure for each one of the grouped sets of feature matrices for each considered bearing technology and available signal is defined. Thus, i.e., for the metallic bearing, the considered SAE structures for the grouped matrices of the vibration signal analyzed in TD, FD and TFD are SAE respectively. Subsequently, as aforementioned, each SAE structure in conjunction with the GA is implemented over each corresponding feature space to be evaluated aiming to extract the most representative features that are related to the condition of the IM (i.e., the bearing condition). Moreover, as previously described, during the feature extraction procedure, the hyperparameters of the SAE network are automatically tuned by solving an optimization problem that uses the minimization of the reconstruction error as the fitness function. In this sense, in Tables 4 and 5 the hyperparameters that are tuned by the GA for each SAE structure that is considered for each domain of analysis for the available signals, vibrations and stator current, are summarized, respectively. Additionally, Tables 4 and 5 summarize the tuned hyperparameter for each of the bearing technologies tested. From the optimized hyperparameters for each feature space that are shown in Tables 4  and 5, it should be noted that each hyperparameter shows variations between a specific range of values. That is, it can be assumed for the three feature domains and for the three bearing technologies, that all the SAE structures show a high-performance characterization of the input feature space by considering values in the L 2 Regularization parameter about (2 ± 1) × 10 −5 , by considering values about (5 ± 3) × 10 −5 in the Sparsity Regularization parameter and by considering the Sparsity Proportion with values about (0.5 ± 0.2). On the other hand, Table 6 summarizes the corresponding MSE values that are carried out during the same optimization process are summarized. Indeed, such MSE values are considered as the metric to numerically compare the similarity between the original input features and the mapped and reconstructed features. As it can be noticed, all the MSE values summarized in Table 6 show small values which are desired to perform a high-performance characterization in the feature learning for the proposed SAE structures. Thus, to show the effectiveness of the SAE-based feature learning approach, from Figure 8a-c the original and reconstructed characteristic patterns that represent the HC in three different domains are shown, that is, TD, FD and TFD, respectively. These patterns belong to the metallic bearing tested with 5 Hz in the VFD, and as it is qualitatively appreciated, the reconstructed patterns match properly with the original ones.    Aiming to highlight the contribution of the proposed multi-domain analysis in front of the characterization of physical magnitudes (i.e., vibrations and stator currents) for fault diagnosis applied to different bearing technologies, different visual representations of some of the feature spaces mapped by their corresponding SAE structure are shown next. Specifically, from Figure 9a-d the resulting feature spaces mapped by their corresponding SAE are projected into a 2D space through the T-SNE technique. Such representation allows interpretation of the data distribution of the considered bearing conditions that are tested at different operating frequencies. Thus, the 2D projections of Figure 9a,b represent both conditions, HC and BD, of the metallic bearing when the vibration signals and stator current are analyzed in the FD, respectively, while Figure 9c,d represent both conditions of the metallic bearing when the acquired signals are processed in the TFD, the vibrations and stator current, respectively. As it can be seen from Figure 9a-d, regardless of the considered domain of analysis, there exists an overlapping between the NC and the BD conditions for all the tested supply frequencies. Additionally, independently of the bearing technology, the consideration of a unique domain of analysis applied over a specific physical magnitude, i.e., the stator current, may not provide enough information that leads to a clear discrimination between bearing conditions. In this regard, from Figure 10a-c the visual representations of the data distribution are shown for the different bearing technologies, metallic, hybrid and ceramic, respectively. The achieved projections of Figure 10a-c represent both bearing conditions (HC and BD) when the stator current signature is analyzed in the TD, and as it is observed, an overlap appears between the considered bearing conditions.  Lastly, from Figure 10a-c the visual representations of the data distributions when all feature spaces domains are considered are shown, and as it can be noticed, regardless of the bearing technology and the considered supply frequency, the data fusion in multiple domains leads to a high-performance characterization of different bearing conditions. From Figure 11a-c, it can be observed that a clear separation is performed between the corresponding HC and BD conditions for the considered bearing technologies: metallic, hybrid and ceramic, respectively. Thus, condition-monitoring schemes based on data fusion through DL techniques represent a practical solution to achieve reliable condition assessments in multiple applications. Finally, aiming to provide the automatic fault diagnosis, the feature spaces mapped by their corresponding SAE structure are concatenated as a feature fusion approach, and then the concatenated vector is evaluated through a simple softmax layer that is proposed for diagnosis purposes. Thereby, in Table 7 the achieved percentages of classification accuracy for each considered bearing technology are summarized; additionally, in Table 7 the classification accuracies corresponding to the particular evaluation of each feature domain, TD, FD and TFD, and the proposed fusion scheme TD+FD+TFD, are included. As the obtained results depict, the proposed condition-monitoring scheme based on the multi-domain feature calculation and SAE feature learning leads to obtaining a high-performance classification accuracy.  Lastly, from Figure 10a-c the visual representations of the data distributions when all feature spaces domains are considered are shown, and as it can be noticed, regardless of the bearing technology and the considered supply frequency, the data fusion in multiple domains leads to a high-performance characterization of different bearing conditions. From Figure 11a-c, it can be observed that a clear separation is performed between the corresponding HC and BD conditions for the considered bearing technologies: metallic, hybrid and ceramic, respectively. Thus, condition-monitoring schemes based on data fusion through DL techniques represent a practical solution to achieve reliable condition assessments in multiple applications. Finally, aiming to provide the automatic fault diagnosis, the feature spaces mapped by their corresponding SAE structure are concatenated as a feature fusion approach, and then the concatenated vector is evaluated through a simple softmax layer that is proposed for diagnosis purposes. Thereby, in Table 7 the achieved percentages of classification accuracy for each considered bearing technology are summarized; additionally, in Table 7 the classification accuracies corresponding to the particular evaluation of each feature domain, TD, FD and TFD, and the proposed fusion scheme TD+FD+TFD, are included. As the obtained results depict, the proposed condition-monitoring scheme based on the multi-domain feature calculation and SAE feature learning leads to obtaining a high-performance classification accuracy.    Finally, aiming to highlight the effectiveness of the proposed method in front of classical condition-monitoring approaches that are based on machine learning, the feature matrices obtained by analyzing the available signals in TD, FD and TFD are subjected to a dimensionality reduction procedure by means of two well-known techniques, the PCA and the LDA. Afterward, the extracted features represented in a 2-dimensional space are evaluated through a classical NN-based classifier to achieve the condition assessment and fault diagnosis. Thus, two classical condition-monitoring structures are considered, PCA+NN and LDA+NN, to evaluate the different studied conditions for the considered bearing technologies. Both classical structures are individually applied to each particular feature domain, TD, FD and TFD, and are also applied under the fusion scheme TD + FD + TFD. The achieved classification ratios are summarized in Table 7. As it can be noticed, the classification ratios achieved by the proposed approach (SAE + Softmax) increase the classification performance up to 38.6% in the most critical case (TF features of the metallic bearing). Thus, the consideration of classical approaches, such as PCA + NN and LDA + NN, leads to obtaining low-performance classification ratios due to the fact that an additional technique to select those representative and meaningful features is required; meanwhile, the proposed approach results in high-performance classification ratios due to the ability of self-adapting to the extraction of the characteristic fault-related features from different signals that are processed in different domains.
Lastly, in order to highlight the advantages and effectiveness of the proposed SAE feature learning approach in front of classical approaches, a quantitative analysis of the performance metrics, accuracy, precision, recall and F1 score is performed. Thus, these performance metrics are estimated from the resulting classification matrices achieved by means of the simple softmax layer. Subsequently, in Table 8 the accuracy, precision, recall, and F1 score corresponding to each particular evaluation of each feature domain, TD, FD and TFD, and the proposed fusion scheme TD + FD + TFD, in the softmax layer are summarized. As it can be appreciated, all the performance metrics show values greater than 0.91 in the most critical cases, whereas the values of the performance metrics are greater than 0.97 in most cases. The averaged values of the performance metrics related to the confusion matrices achieved by classical approaches are 0.59, 0.74, 0.72 and 0.74 for the accuracy, precision, recall and F1 score, respectively, and 0.73, 0.83, 0.86 and 0.84 for the accuracy, precision, recall and F1, respectively, when the classical approaches PCA + NN and LDA + NN are evaluated, respectively. Therefore, the achieved results show that the proposed diagnosis methodology based on deep feature learning can be effectively applied to the diagnosis and identification of bearing faults for different bearing technologies, such as metallic, hybrid and ceramic bearings, in electromechanical systems. Additionally, the obtained results make the proposed approach feasible to be implemented as a part of the condition-based maintenance programs in industrial applications that involve rotating machinery.

Evaluation of the Diagnostic Model in the CWRU Database
In this section, the vibration signals of the CWRU bearing dataset are analyzed, aiming to evaluate the performance of the proposed DL-based diagnostic method in a reference framework, particularly in front of different metallic bearing failures. Thereby, the bearing conditions to be evaluated are the healthy or normal condition (HC), inner (IF), outer (OF) and ball (BF) faults with single-point faults of 0.014 in of diameter; such bearing conditions are also tested in combination with four different loads conditions: 0, 0.74, 1.49 and 2.23 kW. Subsequently, the vibration signals of the CWRU dataset are processed by following the practical application of the proposed DL-based diagnostic method and the collected vibration signals are first segmented in equal parts, resulting in a consecutive set of 150 samples, for each assessed bearing condition. Each sample of the consecutive sets of samples consists of 3000 data points, then, over each estimated sample the proposed multi-domain feature calculation is applied, that is, the processing of the vibration signals in TD, FD and TFD. The three resulting feature matrices are T 1 , F 1 and TF 1 , where each feature matrix comprises of 15, 20 and 28 features with 150 samples. Accordingly, for each one of the three resulting feature matrices, the total number of 150 samples corresponding to each tested condition is divided for training and testing purposes; in this sense, a random number of 100 samples are adopted for training and the remaining 50 samples for testing.
Afterward and similarly as in the previous experimental stage, the SAE feature learning is performed by a multi-SAE structure to be applied over the feature matrices resulting from the analysis of the unique available signal (N = 1) through the different three domains, that is, T i , F i , and TF i , for i = 1,...N. Thus, the SAE structures for TD, FD and TFD analysis are SAE T i , SAE F i and SAE TF i , respectively. In this regard, the automatic optimization of the hyperparameters of the SAE network is carried out through the GA, aiming to minimize the MSE reconstruction value. Thereby, in Table 9 the achieved hyperparameters obtained through the application of the proposed tuning strategy are summarized; the average MSE value obtained during the optimization procedure shows values about (2 ± 1) × 10 −4 that depict a high-performance characterization in the feature learning by the means of the SAE structure. Aiming to prove the effectiveness of the SAE-based feature learning approach, from Figure 12a-d the original and reconstructed feature patterns that represent assessed bearing conditions, NC, IF, OF and BF in different domains are shown, respectively. As it can be appreciated, the reconstructed patterns match properly with the original ones. Figure 12a shows the patterns corresponding to the HC represented in TD and their corresponding reconstruction obtained by the characterization of the SAE. As can be seen qualitatively, the representing feature patterns are correctly characterized, presenting an error of only about 5.07%. On the other hand, Figure 12b-d show the characteristic feature patterns of the vibration signals related to the IF, OF and BF bearing conditions represented in TFD, TD and FD, respectively. The reconstruction error achieved during the feature learning of the considered bearing fault conditions are: 1.99%, 6.08% and 2.43%, correspondingly. Table 9. Hyperparameter used in a deep SAE-based model for vibration signals in CWRU database. (2 ± 1) × 10 −4 that depict a high-performance characterization in the feature learning by the means of the SAE structure. Aiming to prove the effectiveness of the SAE-based feature learning approach, from Figure 12a-d the original and reconstructed feature patterns that represent assessed bearing conditions, NC, IF, OF and BF in different domains are shown, respectively. As it can be appreciated, the reconstructed patterns match properly with the original ones. Figure 12a shows the patterns corresponding to the HC represented in TD and their corresponding reconstruction obtained by the characterization of the SAE. As can be seen qualitatively, the representing feature patterns are correctly characterized, presenting an error of only about 5.07%. On the other hand, Figure 12b-d show the characteristic feature patterns of the vibration signals related to the IF, OF and BF bearing conditions represented in TFD, TD and FD, respectively. The reconstruction error achieved during the feature learning of the considered bearing fault conditions are: 1.99%, 6.08% and 2.43%, correspondingly. Table 9. Hyperparameter used in a deep SAE-based model for vibration signals in CWRU database. Subsequently, the data distribution of the considered healthy states is projected into a 2D space by means of applying the T-SNE technique over the feature spaces mapped by their corresponding SAE structure. From Figure 13a, the representations of the considered bearing conditions, HC, IF, OF and BF are shown when the vibration signals of the CWRU dataset are analyzed in the TD, FD and TFD, respectively; Figure 13d shows the data distribution of the considered conditions when all feature space domains are taken into account, which is the proposed fusion approach (TD+FD+TFD). From these resulting projections, it should be mentioned that several samples appear overlapped when vibration signals are analyzed in TD (Figure 13a), whereas a better performance is achieved through the analysis in FD, TFD and the proposed fusion scheme (Figure 13b-d, respectively). As expected, the proposed DL-based diagnostic method exhibits the ability to learn and selfadapt the extraction of significant fault-related features from the available physical magnitudes for the characterization of electromechanical systems. Finally, the automatic fault Subsequently, the data distribution of the considered healthy states is projected into a 2D space by means of applying the T-SNE technique over the feature spaces mapped by their corresponding SAE structure. From Figure 13a, the representations of the considered bearing conditions, HC, IF, OF and BF are shown when the vibration signals of the CWRU dataset are analyzed in the TD, FD and TFD, respectively; Figure 13d shows the data distribution of the considered conditions when all feature space domains are taken into account, which is the proposed fusion approach (TD+FD+TFD). From these resulting projections, it should be mentioned that several samples appear overlapped when vibration signals are analyzed in TD (Figure 13a), whereas a better performance is achieved through the analysis in FD, TFD and the proposed fusion scheme (Figure 13b-d, respectively). As expected, the proposed DL-based diagnostic method exhibits the ability to learn and self-adapt the extraction of significant fault-related features from the available physical magnitudes for the characterization of electromechanical systems. Finally, the automatic fault diagnosis is performed through the proposed simple softmax layer, thus, the feature spaces mapped by its corresponding SAE structure are concatenated as a feature fusion approach. In Table 10, the achieved percentages of classification accuracy for each considered bearing condition of the CWRU dataset are summarized. Furthermore, in Table 10 the classification ratios corresponding to each particular feature domain, TD, FD and TFD, and the proposed fusion scheme, TD + FD + TDF, are appended. As the obtained results depict, the proposed fusion approach is superior to any of the individual domains due to capability of deep feature learning, which is supported by SAEs, for adapting and extracting the best fault-related characteristic patters of the electromechanical systems under evaluation.

Conclusions
In this work is proposed a novel data-driven condition-monitoring methodology based on deep feature learning applied to fault diagnosis and identification in differen bearings technology, that is, metallic, hybrid and full ceramic, in electromechanical sys

Conclusions
In this work is proposed a novel data-driven condition-monitoring methodology based on deep feature learning applied to fault diagnosis and identification in different bearings technology, that is, metallic, hybrid and full ceramic, in electromechanical systems. Advantageous results have been obtained through this proposed approach; indeed, there are three important aspects of this proposal that must be emphasized. The first one is that the use of a stacked autoencoder-neural network-based structure allows to perform a high-effective feature characterization during the assessment of bearing faults for different bearing technologies. Certainly, the important characteristic of the proposed SAE structure is its capability of learning and self-adapting to those meaningful features that are estimated from the available signal. The deep feature learning is supported by a proposed feature domain fusion approach that considers the signal processing in time, frequency and timefrequency domains through two processing techniques such as the FFT and the EMD. The second important aspect is the proposal and integration of the process that facilitates the tuning of the auto-encoder hyperparameters as part of the proposed approach; specifically, the use of a genetic algorithm in the tunning process leads to overcome the current limitations for practical application by industrial maintenance practitioners and leads to propose generalized-based approaches avoiding over-fitted solutions. Finally, the third aspect is related to the high-performance results obtained during the validation of the diagnosis method based on the deep feature learning approach in front of two different experimental scenarios, considering different bearing technologies and faults, different operating conditions, different electromechanical structures and even different acquisition systems. This fact proves the robustness and adaptability of the proposed method demonstrating that this proposal overcomes the challenge of noise immunity which represents a critical issue due to noise generation is inherent to the operation of rotating systems under industrial environments. Additionally, adaptability means other physical magnitudes and numerical features different from the considered in the proposal could be added since the method shows good feature learning capabilities during the characterization stage. Regarding the limitations of the method, the use and implementation of the genetic algorithm as a tool to search the optimal hyperparameter values may represent a challenge for industrial maintenance practitioners since prior knowledge is needed. The proposed DL diagnosis method is designed for the analysis of bearing faults, independently of the bearing technology; thus, this approach involves the validation of a common framework for the detection and classification of bearing faults that allows the consideration of multiple patterns. In this sense, future works consider the evolution towards evolving learning systems supported by the integration of novelty detection, incremental learning and transfer learning remain to be addressed, and that may allow the application of the deep feature learning method to identify the occurrence of faults in other elements such as gears, shafts, rotors and couplings, among others.