Article

A Machine Learning Vibration-Based Methodology for Robust Detection and Severity Characterization of Gear Incipient Faults Under Variable Working Speed and Load

by Dimitrios M. Bourdalos and John S. Sakellariou *
Stochastic Mechanical Systems & Automation (SMSA) Laboratory, Department of Mechanical Engineering and Aeronautics, University of Patras, 26504 Patras, Greece
* Author to whom correspondence should be addressed.
Machines 2026, 14(1), 9; https://doi.org/10.3390/machines14010009
Submission received: 31 October 2025 / Revised: 12 December 2025 / Accepted: 17 December 2025 / Published: 19 December 2025

Abstract

A machine learning (ML) methodology for the robust detection and severity characterization of incipient gear faults under variable speed and load is postulated. The methodology is trained using vibration signals from a single accelerometer mounted on the gearbox, acquired simultaneously with tachometer signals at a sample of working conditions (WCs) from the range of interest. A special parametric identification procedure for the gearbox dynamics, accounting for the continuous range of WCs, is introduced through ‘clouds’ of advanced stochastic data-driven Functionally Pooled models, estimated from angularly resampled vibration signals. Each cloud represents the gearbox dynamics at a specific fault severity level, while the pseudo-static effects of the WCs on the dynamics are accounted for through data pooling. Fault detection and severity characterization are achieved by testing the consistency of a vibration signal with each model cloud within a hypothesis testing framework in which the unknown load is also estimated. The methodology is assessed through 18,300 experiments on a single-stage spur gearbox, including four incipient single-tooth pinion faults, 61 speeds, and four load levels. The faults produce no significant changes in the time-domain signals, while their frequency-domain effects overlap with the variations caused by the WCs, rendering the diagnosis problem highly challenging. The comparison with a state-of-the-art deep Stacked Autoencoder (SAE) demonstrates the superior performance of the ML methodology, which achieves 95.4% and 91.6% accuracy in fault detection and severity characterization, respectively.

1. Introduction

Gearboxes are key components in various applications, enabling power transmission and motion control in complex mechanical systems such as robotics, vehicles, and wind turbines [1]. The diagnosis of gear faults, including detection and severity assessment, is essential for ensuring operational reliability and preventing unexpected downtime. Maximum effectiveness is achieved when diagnosis is performed at the incipient stage of fault development, so that maintenance actions are undertaken before catastrophic failures and costly breakdowns occur [2].
Among available diagnostic methods, vibration-based methods are widely adopted due to their high sensitivity, ease of implementation, and relatively low cost. The underlying premise is that faults induce alterations in gearbox dynamics that are reflected in measured vibration signals. Over the last decades, numerous vibration-based gearbox fault diagnosis methods have been proposed, ranging from classical signal-processing techniques (e.g., decomposition, demodulation, cyclostationary analysis, order tracking) [3] to more recent data-driven and deep learning approaches [4].
Most studies, however, are performed under constant working conditions (WCs), typically involving a single rotating speed and load level. In practical applications, gearboxes operate under variable WCs, encompassing multiple levels of rotating speed and load. Typical examples include industrial applications during production phases, where WCs change according to process requirements; transportation, with vehicles operating at distinct speeds and payloads that remain constant over route segments; and helicopter drivetrains, which experience different WCs across flight phases such as take-off and cruise [5]. Under these conditions, the vibration signals exhibit alterations, such as amplitude and frequency modulations, that may mask incipient fault signatures and lead to false alarms [2]. The present study focuses on this case of multiple discrete WCs that remain nominally constant over specific time intervals.
A significant portion of the relevant literature addressing this complex problem relies on non-parametric methods, where features are selected because they are highly sensitive to faults yet relatively insensitive to variations across WCs. These features are extracted from the vibration signals and include time-, frequency-, and time–frequency-domain features such as RMS, kurtosis, and mean frequency [6,7,8,9]. In an alternative approach, vibration signal decomposition is performed using methods like the Wavelet Packet Decomposition (WPD) [7,10,11] and the Empirical Mode Decomposition (EMD) [7,12] to decompose the original frequency bandwidth into multiple narrow-band components. Features based on WPD coefficients and EMD Intrinsic Mode Functions (IMFs), including entropy measures and residual RMS, are commonly applied [7,10,11]. Many of the above features have been coupled with machine learning (ML) classifiers, such as the Support Vector Machine (SVM) [8] and k-Nearest Neighbor (k-NN) [6,9,10], for incipient gear fault diagnosis under variable operating speed and load [6,7,10]. Despite their effectiveness, these methods are typically trained on large datasets comprising numerous multi-sensor vibration signals spanning all considered WCs, which are not always available in practice.
More recently, with the rapid development of artificial intelligence, gear fault diagnosis under variable WCs has been extensively investigated using non-parametric approaches based on neural networks (NNs) [13]. Among these, Deep Learning (DL) architectures have received considerable attention for their ability to automatically learn discriminative features directly from vibration signals, reducing or eliminating the need for manual feature extraction and user expertise [4]. In this direction, many Deep Transfer Learning (DTL) approaches have been proposed that interpret each working condition as a distinct domain and aim to learn domain-invariant yet fault-discriminative representations, so that vibration signals corresponding to the same health state remain similar across different WCs while separability between different health states is preserved. This is typically achieved by aligning the feature distributions of the source and target domains or by fine-tuning shared representations, allowing the knowledge learned under one working condition to be effectively transferred to another. In particular, feature decoupling architectures [14], generalization networks [15,16], and adversarial networks [17,18] have been proposed for incipient gear fault diagnosis under a small number of different speeds and loads. More recently, a denoising network incorporating equal-angle resampling of vibration signals has been introduced in [19], demonstrating improved feature extraction in transfer learning for fault diagnosis under variable working speed. However, the above frameworks typically require data from the target domain, meaning that vibration signals must be available for every working condition considered. When this is not the case, these approaches are usually restricted to transfer scenarios involving only a limited set of WCs.
To mitigate this limitation, DL methodologies primarily relying on deep Autoencoder (AE) architectures have been proposed. In these approaches, a network learns to reconstruct the input time series, and deviations between the input and its reconstruction (residuals) are used as an anomaly score for fault detection. Recent variants include adaptive graph-based AEs [20], sparse AEs [21], and speed-normalized AEs [22]. Despite their promise, these methods typically address only the fault detection task and generally require training data that cover all considered WCs, which limits their applicability. To move beyond detection and tackle full fault diagnosis, including detection and severity characterization under variable WCs, deep Stacked Autoencoders (SAEs) have been explored as a promising class of neural networks [23]. SAEs consist of multiple AEs arranged in layers, with each AE trained to reconstruct its input and thus learn progressively more abstract latent representations. During the pretraining stage, each AE is trained on its input (greedy layer-wise pretraining), and the encoder weights are retained, whereas the decoders are discarded. The encoders are then stacked to form a deep feed-forward network, upon which a classifier (e.g., softmax) is appended to map the learned latent features to distinct fault severities or classes [23]. SAEs have been effectively employed in a number of studies for gear fault diagnosis under a limited range of speeds and loads, with promising results even when evaluated at speeds and loads not included during training. Different SAE variants have been proposed, such as denoising SAEs [24,25], maximum-independence SAEs [26], sparse SAEs [27], and transfer SAEs [28]. Owing to its strong performance, the standard SAE is often adopted for comparative purposes [23,24,25,27].
Although NNs, including SAEs, have demonstrated promising performance in fault diagnosis under variable WCs, they typically require large amounts of data that adequately represent the variability across WCs and fault severities. Moreover, their performance tends to degrade when tested at WCs not included during training, considerably limiting their industrial applicability. More importantly, NNs operate as purely data-driven ‘black box’ systems, learning abstract feature representations without explicit correspondence to the underlying gearbox dynamics. Thus, the diagnostic decisions of such models are often difficult to interpret in terms of measurable quantities. Even when high accuracy is achieved, this lack of explainability deters the adoption of neural networks in real-world applications. Recent efforts toward explainable AI in rotating machinery diagnosis are emerging, but remain in their infancy, especially under variable WCs [29].
An interesting alternative to non-parametric methods is offered by parametric methods, which utilize statistical time series models of the AutoRegressive Moving Average with eXogenous input (ARMAX) type; thanks to their transparency, these methods allow explicit modeling of the gearbox dynamics and clear insight into why specific decisions are made. Their advantages include simplicity, high accuracy, and a compact representation of the underlying system dynamics. Interpretability and transparency are also offered by the direct relation between the model parameters and the physical characteristics of the gearbox [30,31]. In practice, vibration signals are used to identify ARX-type models representing the healthy dynamics under variable WCs. Fault detection and severity characterization are then performed by testing the model residuals via standard time-domain statistics (e.g., sample variance, kurtosis) or more advanced indices (e.g., periodic modulation intensity) [32]. In this context, Linear Parameter Varying (LPV) models, whose parameters are explicit functions of the WCs, have recently been employed to model non-stationary dynamics for gear fault diagnosis under continuously varying WCs [32,33,34]. In particular, sparse LPV-AR [33], LPV-VAR [34], and LPV-ARMA [32] models have been employed for gear tooth crack detection and severity characterization under random speed variations. These models, however, cannot be applied to scenarios with multiple discrete WCs that remain nominally constant over specific time intervals. To address such cases, several methods have been proposed for gear crack detection and severity characterization under multiple load levels at a constant rotating speed. These methods first apply Time Synchronous Averaging (TSA), incorporating shaft-angle information, to isolate cyclic vibrations from noise and other asynchronous components [35]. Subsequently, multiple distinct AR [36,37], ARX [38], or Vector AR (VAR) [39,40] models are identified to represent the dynamics at each load level. Despite promising results, their assessment relies on run-to-failure experiments culminating in complete tooth cracking and removal, leaving open whether detection is achieved at truly incipient stages.
It is important to note that, according to the relevant literature, the diagnosis of incipient gear faults, whether through non-parametric or parametric methods, has thus far been investigated under a limited set of WCs. The challenge of fault diagnosis over a broad range of simultaneously varying speeds and loads remains largely unexplored. The study of this problem is of particular significance, as variations in speed and load typically occur concurrently in industrial applications. Moreover, the benefits of parametric modeling have been exploited only from a limited perspective, that is, through multiple distinct models, one per working condition. A primary challenge in formulating a single parametric model that accurately incorporates the influence of variable WCs on the observed machinery dynamics is the direct effect of the WCs on the vibration spectral characteristics, with speed variations in particular shifting the frequencies of the spectral harmonics. To overcome this barrier, a novel statistical time series model-based method has been proposed in [41]. In particular, an advanced stochastic Functionally Pooled model of the AutoRegressive form (FP–AR) is employed, in which the parameters are expressed as functions of a scheduling variable corresponding to a specific working condition (speed or load). This approach is capable of representing the machinery dynamics under any working condition within the considered continuous range, rather than being confined to specific WCs as in the multiple model-based parametric methods [39,40]. An unsupervised FP–AR model-based fault detection method is then proposed, which is trained on a sample of WCs from the considered range, rather than on all WCs as required by most non-parametric and deep learning-based methods. The method has shown remarkable detection performance for shaft, coupler, and bearing-related faults, even at WCs not encountered during training. However, it remains limited to a single varying working condition (either speed or load), while fault severity characterization has yet to be addressed.
The goal of this study is to postulate a novel machine learning (ML) methodology for the detection and severity characterization of incipient gear faults under variable working speed and load, by extending the recently introduced statistical time series model-based method presented in [41]. The extension includes the use of Vector Functionally Pooled models of the AR type (VFP–AR), instead of scalar ones, thus incorporating both rotating speed and load simultaneously. Moreover, shaft-angle information is incorporated through angularly resampled vibration signals obtained via a dedicated computed order tracking procedure, specifically formulated for multiple nominally constant rotating speeds within a predefined range. To achieve high-accuracy modeling of the gearbox dynamics at different health states and fault severities, under multiple loads within a continuous and wide range of rotating speeds, the concept of ‘clouds’ of VFP–AR models representing the gearbox dynamics is additionally proposed. Each VFP–AR model of the different ‘clouds’ is identified using a set of angular vibration signals from a single accelerometer, acquired at a sample of the considered rotating speeds and loads rather than at all WCs. Each model may have a different structure and functional subspace, covering a specific subrange of rotating speeds while accounting for the effects of all considered loads on the dynamics. A VFP–AR model cloud-based fault detection and severity characterization method is then developed, utilizing model residual whiteness hypothesis testing. During training, both rotating speed and load are treated as known. During inspection (testing), the speed, which is typically readily available, is known, while the load is unknown and is estimated via a nonlinear optimization procedure integrated within the fault diagnosis procedure.
The methodology’s performance is systematically assessed through thousands of experiments with a single-stage spur gearbox operating under 61 different rotating speeds and four different loads. Four severities of an incipient fault at the base of a single pinion tooth are considered. Furthermore, the ML methodology is compared with a state-of-the-art deep Stacked Autoencoder-based alternative that has been widely adopted for comparative purposes [23,24,25,27]. All comparisons follow systematic and statistically reliable procedures involving thousands of inspection (testing) experiments, with performance presented via plots of the methods’ metrics, Receiver Operating Characteristic (ROC) curves [42] and confusion matrices [43].
The main contributions of this paper are the following:
(a) Parametric Modeling of Gearbox Dynamics: A parametric identification procedure for gearbox dynamics within a continuous range of working conditions, including variable speeds and loads, is presented. The approach employs clouds of VFP–AR models combined with vibration signals angularly resampled via a dedicated computed order tracking procedure for different nominally constant rotating speeds.
(b) Gear Fault Diagnosis under Variable Working Conditions: A machine learning incipient gear fault detection and severity characterization methodology is postulated. The methodology is trained on a minimal number of vibration signals from a sample of the considered WC range and can operate at any working condition within that range.
(c) Systematic Experimental and Comparative Assessment: A comprehensive experimental evaluation of the proposed methodology is conducted using data from thousands of experiments on a single-stage spur gearbox operating over a wide range of speeds and loads. A comparative assessment with a state-of-the-art deep Stacked Autoencoder-based alternative demonstrates the superior performance of the postulated methodology.
The paper’s remaining sections are arranged as follows: The precise problem formulation is presented in Section 2 and the machine learning methodology in Section 3. The experimental assessment of the method is presented in Section 4. Concluding remarks are finally summarized in Section 5.

2. Problem Formulation

To establish the basis for the machine learning methodology, this section formulates the problem of incipient gear fault detection and severity characterization under variable working speed and load. The formulation defines the notation, assumptions, and data organization used in the subsequent development of the methodology. Each working condition, i.e., a specific combination of rotating speed ($\omega$) and load ($l$), is represented by the vector $\mathbf{k} = [\omega \;\; l]^T$; vectors and matrices are denoted by bold lowercase and uppercase symbols, respectively. Fault detection and severity characterization are performed in two phases: training and inspection.
Training Phase: For each considered health state (healthy gearbox and faulty gearbox at different fault severities) and each sampled WC, a single vibration signal is collected from one accelerometer on the gearbox, acquired simultaneously with the signal from a tachometer. A total of $M = M_1 \times M_2$ such signal pairs is collected per health state for training the methodology, where $M_1$ and $M_2$ denote the limited numbers of speeds and loads sampled from the continuous ranges of the considered WCs. Each signal has a length of $N$ samples (discrete time index $t$) and is denoted by:

$$y_{\mathbf{k}}[t], \qquad \mathbf{k} \in \{\mathbf{k}_{1,1}, \ldots, \mathbf{k}_{M_1,M_2}\}, \qquad t = 1, 2, \ldots, N$$

Considering $S$ health states (healthy and faulty), a total of $S \times M$ vibration signals is available for training.
Inspection (real-time) Phase: Given a newly acquired vibration signal $y_u[t]$ from an unknown (subscript $u$) gearbox health state, obtained from the same accelerometer used in the training phase, together with the corresponding tachometer signal, the methodology first determines whether a fault is present. If a fault is detected, its severity level is subsequently characterized. The signal $y_u[t]$ may be acquired while the gearbox operates at any speed and load within the training range, including values not used in the training phase. The decision on the health state is obtained via standard hypothesis testing using the methodology’s metric (see Section 3).
It is noted that during training, both the speed $\omega$ (via the tachometer) and the load $l$ are known, while during inspection, $\omega$ is measured and $l$ is treated as completely unknown.

3. The Machine Learning Fault Detection and Severity Characterization Methodology

This section presents the ML fault detection and severity characterization methodology, which consists of three sequential steps, as outlined in Figure 1. The first step involves partitioning the considered speed range into shorter subranges and angularly resampling the vibration signals within each of them (Step 1). It is followed by the identification of the gearbox dynamics through clouds of VFP–AR models at each health state (Step 2, training phase). Finally, the developed model clouds are employed for real-time fault detection and severity characterization under unknown health conditions (Step 3, inspection phase).

3.1. Step 1: Special Angular Resampling for Different Nominally Constant Rotating Speeds

The first step of the methodology involves angular resampling of the vibration signals based on Computed Order Tracking [44], using the tachometer signal, in order to eliminate non-synchronous frequency modulation in the time-domain signals. This modulation arises primarily from fluctuations in shaft speed, which occur even when the motor inverter is set at a nominally constant value [44]. To retain all frequency content up to the Nyquist limit of the time-domain signal, the resampling frequency in the angular domain (or θ-domain) is selected as:

$$f_s^{\theta}(\omega) = \frac{f_s}{\omega} \tag{1}$$

where $f_s$ denotes the sampling frequency (in Hz) in the time domain, $\omega$ the nominal rotating speed in revolutions per second (rev/s), and $f_s^{\theta}$ the resampling frequency in samples per revolution (samples/rev). However, as recently shown in [19], using a speed-dependent $f_s^{\theta}$ prevents unified modeling of the gearbox dynamics across different speeds.
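The Computed Order Tracking resampling of Step 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a once-per-revolution tachometer whose pulse arrival times have already been extracted, a shaft angle that grows linearly between successive pulses, and linear interpolation of the vibration signal (which, as discussed in this subsection, should be low-pass filtered first). The function name `angular_resample` is hypothetical.

```python
import numpy as np

def angular_resample(y, fs, pulse_times, fs_theta):
    """Resample y (sampled at fs Hz) onto an equispaced shaft-angle grid with
    fs_theta samples/rev, given once-per-revolution pulse times (s).

    Assumption: the shaft angle grows linearly between successive pulses."""
    y = np.asarray(y, dtype=float)
    pulse_times = np.asarray(pulse_times, dtype=float)
    t = np.arange(len(y)) / fs                          # time axis of y
    revs = np.arange(len(pulse_times), dtype=float)     # angle (rev) at each pulse
    n_rev = len(pulse_times) - 1                        # complete revolutions covered
    n_theta = int(np.floor(n_rev * fs_theta))           # number of angular samples
    theta = np.arange(n_theta) / fs_theta               # target angles (rev)
    t_theta = np.interp(theta, revs, pulse_times)       # instant each angle is reached
    return np.interp(t_theta, t, y)                     # vibration at those instants
```

Higher-order (e.g., cubic) interpolation of both the angle and the signal is a common refinement of this linear version.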
A common resampling frequency is therefore adopted, dictated by the maximum rotating speed $\omega_{\max}$:

$$f_s^{\theta} = \frac{f_s}{\omega_{\max}} \tag{2}$$

with Nyquist limit equal to $f_s/(2\omega_{\max})$ (in orders, i.e., cycles/rev) and operational bandwidth $[0, f_s/(2\omega_{\max})]$ in the θ-domain, which maps to $[0, f_s/2]$ Hz in the time domain at $\omega = \omega_{\max}$. For $\omega < \omega_{\max}$, retaining the full time-domain band $[0, f_s/2]$ would require a larger resampling frequency, $f_s^{\theta}(\omega) > f_s^{\theta}$; with the common value, time-domain components above $f_s\,\omega/(2\omega_{\max})$ Hz map to orders beyond the angular-domain Nyquist limit, leading to aliasing. Therefore, to preserve a valid mapping and avoid aliasing, all vibration signals should be low-pass filtered prior to angular resampling, with a cut-off frequency defined by:

$$f_c(\omega) = \frac{f_s^{\theta}}{2}\,\omega = \frac{f_s}{2}\,\frac{\omega}{\omega_{\max}} \tag{3}$$
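Equations (2) and (3) can be checked numerically with a short sketch. For illustrative values $f_s$ = 1000 Hz and $\omega_{\max}$ = 50 rev/s, Equation (2) gives 20 samples/rev (Nyquist limit of 10 orders), and at $\omega$ = 25 rev/s Equation (3) gives $f_c$ = 250 Hz, which is exactly 10 orders at that speed. The zero-phase "brick-wall" FFT filter below is a simplified stand-in chosen for brevity; a practical anti-aliasing stage would typically use a designed FIR/IIR filter with a transition band. All function names are hypothetical.

```python
import numpy as np

def common_fs_theta(fs, omega_max):
    """Eq. (2): common angular resampling frequency (samples/rev)."""
    return fs / omega_max

def cutoff_frequency(omega, omega_max, fs):
    """Eq. (3): anti-aliasing cut-off (Hz) for nominal speed omega (rev/s)."""
    return 0.5 * common_fs_theta(fs, omega_max) * omega

def ideal_lowpass(y, fs, fc):
    """Zero-phase brick-wall low-pass applied in the frequency domain
    (illustrative stand-in for a properly designed anti-aliasing filter)."""
    Y = np.fft.rfft(y)
    f = np.fft.rfftfreq(len(y), d=1.0 / fs)
    Y[f > fc] = 0.0                      # discard content above the cut-off
    return np.fft.irfft(Y, n=len(y))
```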
As part of this filtering process, higher-frequency components are progressively eliminated as the rotating speed decreases. To limit this attenuation (if necessary), $m$ adjacent subranges that cover the entire range of rotating speeds may be selected via the following procedure. Select the highest frequency to retain, $f_H$ (in Hz), ensuring $f_H \le f_s/2$. Then, partition the speed range into $m$ adjacent subranges so that, in each subrange $i$, the following condition is satisfied:

$$f_c(\omega_{\min}^i) \ge f_H \;\Longleftrightarrow\; \frac{f_s}{2\,\omega_{\max}^i}\,\omega_{\min}^i \ge f_H \;\Longleftrightarrow\; \frac{\omega_{\min}^i}{\omega_{\max}^i} \ge \frac{2 f_H}{f_s} \tag{4}$$

with $\omega_{\min}^i$ and $\omega_{\max}^i$ designating the minimum and maximum speed of the $i$th subrange, respectively, and $f_c(\cdot)$ evaluated with $\omega_{\max}^i$ in place of $\omega_{\max}$. Then, the angular resampling frequency $f_s^{\theta,i}$ of each subrange may be obtained based on Equation (2) as:

$$f_s^{\theta,i} = \frac{f_s}{\omega_{\max}^i} \tag{5}$$
If $f_c(\omega)$ lies far beyond the range of diagnostic interest, as may occur when the original sampling frequency $f_s$ is very high (see Equation (3)), the frequency-based segmentation into adjacent subranges may be omitted. In this case, a common angular resampling frequency may be selected for all considered speeds based on Equation (2). It should be noted that, in either case, each considered speed range (or subrange) must include at least four distinct working speeds for the adequate identification of a Functionally Pooled model [41] in Step 2.
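One possible implementation of the partition procedure is a greedy sweep from the highest sampled speed downwards, extending each subrange as long as Equation (4) holds and then computing $f_s^{\theta,i}$ from Equation (5). The greedy strategy, the shared-boundary convention (borrowed from the Figure 2 example), and the function name are assumptions; the text does not prescribe a specific partitioning algorithm. The sketch also enforces the minimum of four speeds per subrange stated above.

```python
import numpy as np

def partition_speed_range(speeds, fs, f_high, min_speeds=4):
    """Greedy split of the sampled nominal speeds (rev/s) into adjacent subranges
    satisfying Eq. (4): omega_min_i / omega_max_i >= 2 f_high / fs.
    Adjacent subranges share their boundary speed (as in the Figure 2 example).
    Returns [(omega_min_i, omega_max_i, fs_theta_i), ...] in ascending speed
    order, with fs_theta_i = fs / omega_max_i from Eq. (5)."""
    if f_high > fs / 2:
        raise ValueError("f_high must not exceed fs / 2")
    ratio = 2.0 * f_high / fs
    w = np.sort(np.asarray(speeds, dtype=float))[::-1]   # descending speeds
    subranges, top = [], 0
    while top < len(w) - 1:
        low = top
        # extend the subrange downwards while Eq. (4) still holds
        while low + 1 < len(w) and w[low + 1] >= ratio * w[top]:
            low += 1
        if low - top + 1 < max(min_speeds, 2):
            raise ValueError("too few sampled speeds in a subrange for this f_high")
        subranges.append((float(w[low]), float(w[top]), fs / float(w[top])))
        top = low                                        # shared boundary speed
    return subranges[::-1]
```

For instance, 61 speeds from 10 to 70 rev/s with $f_s$ = 2000 Hz and $f_H$ = 590 Hz (illustrative values) yield four subranges: [10, 15], [15, 25], [25, 42], and [42, 70] rev/s.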
The above procedure for selecting adjacent subranges is illustrated through an indicative example in Figure 2. Signals are assumed to be measured at five nominal speeds $\omega_1 < \omega_2 < \omega_3 < \omega_4 < \omega_5 = \omega_{\max}$, each containing speed-proportional components (two shaft harmonics and two gear-mesh harmonics).
The case in which a single common resampling frequency $f_s^{\theta}$ is selected across all speeds based on Equation (2) is shown in Figure 2a,b. In particular, Figure 2a presents the frequency spectra at all considered speeds together with the corresponding $f_c(\omega)$ frequencies, as obtained via Equation (3). The frequencies above $f_c(\omega)$ (yellow shaded areas) must be filtered out prior to angular resampling. Figure 2b presents the resulting order spectra, where the order-domain Nyquist limit $f_s^{\theta}/2$ is set by $\omega_{\max}$.
The case in which a maximum frequency of interest $f_H$ is to be retained is shown in Figure 2c–f. Two adjacent subranges ($m = 2$) are considered, $[\omega_1, \omega_3]$ and $[\omega_3, \omega_5]$, with $f_c(\omega_{\min}^i) \ge f_H$ in each subrange, according to Equation (4). For each subrange, a distinct angular resampling frequency is adopted according to Equation (5), yielding the order-domain Nyquist limits $f_s/(2\omega_3)$ and $f_s/(2\omega_5)$, respectively. In this way, the band above $f_c(\omega)$ that must be filtered out (shown in yellow) is reduced within each subrange, while the band $[0, f_c(\omega_{\min}^i)]$ retained at all speeds of the subrange includes $f_H$, which is not the case in the previous configuration, as shown in Figure 2a.

3.2. Step 2 (Training Phase): VFP–AR Model Cloud-Based Gearbox Dynamics Identification

Once the $m$ adjacent speed subranges have been selected and the available (training phase) vibration signals have been angularly resampled with the corresponding frequency, the objective of this step of the methodology is to model the gearbox dynamics within the continuous range of working speeds and loads for each of the $S$ considered health states. This is achieved by constructing $S$ clouds of VFP–AR models, as illustrated in the upper part of Figure 3. Each cloud consists of $m$ VFP–AR models, corresponding to the $m$ speed subranges. Therefore, each VFP–AR model represents the gearbox dynamics at a specific health state over a specific speed subrange, that is, $\omega \in [\omega_{\min}^i, \omega_{\max}^i]$ ($i = 1, \ldots, m$), and the complete load range $l \in [l_{\min}, l_{\max}]$. Each model is identified from $M_i = M_1^i \times M_2$ vibration signals, where $M_1^i \le M_1$ is the number of sampled training speeds (see Section 2) belonging to the $i$th subrange. Thus, each model is trained using a set of signals from a sample of the considered speed and load ranges, rather than all working conditions, as also indicated in Section 2. The general form of the VFP–AR model is given by [45]:
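$$y_{\mathbf{k}}[\theta] + \sum_{i=1}^{n_a} a_i(\mathbf{k}) \cdot y_{\mathbf{k}}[\theta - i] = e_{\mathbf{k}}[\theta] \tag{6a}$$

$$a_i(\mathbf{k}) = \sum_{j=1}^{p} a_{i,j} \cdot G_j(\mathbf{k}), \qquad e_{\mathbf{k}}[\theta] \sim \mathrm{iid}\;\mathcal{N}\big(0, \sigma_e^2(\mathbf{k})\big), \qquad \mathbf{k} \in \mathbb{R}^2 \tag{6b}$$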
with $y_{\mathbf{k}}[\theta]$ designating the angular vibration signal, $\theta = 1, \ldots, N$ the index of the equispaced samples after angular resampling, $n_a$ the AR model order, and $e_{\mathbf{k}}[\theta]$ the model residual signal, while iid stands for independent and identically distributed and $\mathcal{N}(\cdot,\cdot)$ for the normal (Gaussian) distribution with the indicated mean and variance. The operating parameter vector is defined as $\mathbf{k} = [\omega_v \;\; l_n]^T \equiv \mathbf{k}_{v,n}$, where $v = 1, \ldots, M_1^i$ and $n = 1, \ldots, M_2$. The AR parameters $a_i(\mathbf{k})$ (Equation (6b)) are modeled as explicit functions of $\mathbf{k}$ belonging to a $p$-dimensional functional subspace spanned by the mutually independent functions $G_1(\mathbf{k}), G_2(\mathbf{k}), \ldots, G_p(\mathbf{k})$. These functions form a functional basis consisting of bivariate orthogonal polynomials, such as Chebyshev, Legendre, or others.
The constants $a_{i,j}$ designate the corresponding AR projection coefficients, which are aggregated in the vector $\boldsymbol{\beta} = [a_{1,1} \;\; \ldots \;\; a_{n_a,p}]^T_{[n_a p \times 1]}$. The representation of Equations (6a) and (6b) is referred to as a VFP–AR model of order $n_a$ and functional subspace dimensionality $p$, abbreviated as VFP–AR$(n_a)_p$. This model may be written in linear regression form as follows (⊗ designates the Kronecker product [46] (pp. 27–28)):
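$$y_{\mathbf{k}}[\theta] = \big[\boldsymbol{\varphi}_{\mathbf{k}}^T[\theta] \otimes \mathbf{g}^T(\mathbf{k})\big] \cdot \boldsymbol{\beta} + e_{\mathbf{k}}[\theta] = \boldsymbol{\phi}_{\mathbf{k}}^T[\theta] \cdot \boldsymbol{\beta} + e_{\mathbf{k}}[\theta] \tag{7}$$

with $\boldsymbol{\varphi}_{\mathbf{k}}[\theta] = [-y_{\mathbf{k}}[\theta-1] \;\; \ldots \;\; -y_{\mathbf{k}}[\theta-n_a]]^T_{[n_a \times 1]}$ and $\mathbf{g}(\mathbf{k}) = [G_1(\mathbf{k}) \;\; \ldots \;\; G_p(\mathbf{k})]^T_{[p \times 1]}$.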
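The regression form of Equation (7) can be checked numerically: with $\boldsymbol{\beta}$ ordered as $[a_{1,1} \ldots a_{1,p} \; a_{2,1} \ldots a_{n_a,p}]^T$, the row $\boldsymbol{\varphi}_{\mathbf{k}}^T[\theta] \otimes \mathbf{g}^T(\mathbf{k})$ times $\boldsymbol{\beta}$ reproduces $-\sum_i a_i(\mathbf{k})\,y_{\mathbf{k}}[\theta-i]$ from Equations (6a) and (6b). The sketch below assumes a tensor-product Chebyshev basis $G_j(\mathbf{k}) = T_a(x_\omega)\,T_b(x_l)$ on speed and load scaled to $[-1, 1]$; this construction, the chosen degree pairs, and the function names are hypothetical, since the text only states that bivariate orthogonal polynomials are used.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def basis(k, bounds, degree_pairs):
    """g(k) = [G_1(k) ... G_p(k)]^T with G_j(k) = T_a(x_w) * T_b(x_l), where T_n is
    the Chebyshev polynomial of degree n and x_w, x_l are the speed and load
    mapped to [-1, 1]. The tensor-product construction is an assumption."""
    x = [2.0 * (v - lo) / (hi - lo) - 1.0 for v, (lo, hi) in zip(k, bounds)]

    def cheb(xi, n):
        return C.chebval(xi, np.eye(n + 1)[n])          # T_n(xi)

    return np.array([cheb(x[0], a) * cheb(x[1], b) for a, b in degree_pairs])

def regressor_row(y, theta, k, na, bounds, degree_pairs):
    """phi_k^T[theta] = varphi_k^T[theta] ⊗ g^T(k) (Eq. (7)), with
    varphi_k[theta] = [-y[theta-1] ... -y[theta-na]]^T (0-based, theta >= na)."""
    varphi = np.array([-y[theta - i] for i in range(1, na + 1)])
    return np.kron(varphi, basis(k, bounds, degree_pairs))
```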
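Substituting the values for a single signal ($\theta = 1, \ldots, N$) corresponding to working condition $\mathbf{k}$ into the above expression leads to:

$$\begin{bmatrix} y_{\mathbf{k}}[1] \\ \vdots \\ y_{\mathbf{k}}[N] \end{bmatrix} = \begin{bmatrix} \boldsymbol{\phi}_{\mathbf{k}}^T[1] \\ \vdots \\ \boldsymbol{\phi}_{\mathbf{k}}^T[N] \end{bmatrix} \cdot \boldsymbol{\beta} + \begin{bmatrix} e_{\mathbf{k}}[1] \\ \vdots \\ e_{\mathbf{k}}[N] \end{bmatrix} \;\Longrightarrow\; \mathbf{y}_{\mathbf{k}} = \boldsymbol{\Phi}_{\mathbf{k}} \cdot \boldsymbol{\beta} + \mathbf{e}_{\mathbf{k}} \tag{8}$$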
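Pooling together all such expressions for $\mathbf{k}_{1,1}, \ldots, \mathbf{k}_{M_1^i,M_2}$ leads to:

$$\begin{bmatrix} \mathbf{y}_{\mathbf{k}_{1,1}} \\ \vdots \\ \mathbf{y}_{\mathbf{k}_{M_1^i,M_2}} \end{bmatrix}_{[NM_i \times 1]} = \begin{bmatrix} \boldsymbol{\Phi}_{\mathbf{k}_{1,1}} \\ \vdots \\ \boldsymbol{\Phi}_{\mathbf{k}_{M_1^i,M_2}} \end{bmatrix}_{[NM_i \times p\,n_a]} \cdot \boldsymbol{\beta} + \begin{bmatrix} \mathbf{e}_{\mathbf{k}_{1,1}} \\ \vdots \\ \mathbf{e}_{\mathbf{k}_{M_1^i,M_2}} \end{bmatrix}_{[NM_i \times 1]} \;\Longrightarrow\; \mathbf{y} = \boldsymbol{\Phi} \cdot \boldsymbol{\beta} + \mathbf{e} \tag{9}$$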
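The projection coefficient vector $\boldsymbol{\beta}$ may thus be obtained based on the Ordinary Least Squares (OLS) estimator (the hat $\hat{\cdot}$ denotes estimate/estimator) [45]:

$$\hat{\boldsymbol{\beta}} = \left(\boldsymbol{\Phi}^T\boldsymbol{\Phi}\right)^{-1} \cdot \boldsymbol{\Phi}^T\mathbf{y} = \left[\sum_{\theta=1}^{N} \boldsymbol{\Phi}^T[\theta]\,\boldsymbol{\Phi}[\theta]\right]^{-1} \cdot \left[\sum_{\theta=1}^{N} \boldsymbol{\Phi}^T[\theta]\,\mathbf{y}[\theta]\right] \tag{10}$$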
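where $\boldsymbol{\Phi}[\theta]$ and $\mathbf{y}[\theta]$ collect the rows of $\boldsymbol{\Phi}$ and $\mathbf{y}$ that correspond to angular sample $\theta$ across all $M_i$ working conditions (a reordering of rows that leaves $\boldsymbol{\Phi}^T\boldsymbol{\Phi}$ and $\boldsymbol{\Phi}^T\mathbf{y}$ unchanged), while the residual variance is obtained as:

$$\hat{\sigma}_e^2(\mathbf{k}, \hat{\boldsymbol{\beta}}) = \frac{1}{N} \sum_{\theta=1}^{N} e_{\mathbf{k}}^2[\theta, \hat{\boldsymbol{\beta}}], \qquad \mathbf{k} = \mathbf{k}_{1,1}, \mathbf{k}_{1,2}, \ldots, \mathbf{k}_{M_1^i,M_2} \tag{11}$$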
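Equations (8) to (11) can be sketched as a pooled least-squares fit: the regressor matrix of each training signal is built row-wise as $\boldsymbol{\varphi}_{\mathbf{k}}^T[\theta] \otimes \mathbf{g}^T(\mathbf{k})$, all blocks are stacked, and $\hat{\boldsymbol{\beta}}$ is obtained with `numpy.linalg.lstsq`, which returns the same OLS solution as Equation (10) without explicitly inverting $\boldsymbol{\Phi}^T\boldsymbol{\Phi}$. As assumptions of this sketch, the first $n_a$ samples of each signal (which lack past values) are skipped, the variance in Equation (11) is averaged over the remaining samples, and the tensor-product Chebyshev basis and function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def basis(k, bounds, degree_pairs):
    """Hypothetical tensor-product Chebyshev basis g(k) on [-1, 1]-scaled speed/load."""
    x = [2.0 * (v - lo) / (hi - lo) - 1.0 for v, (lo, hi) in zip(k, bounds)]

    def cheb(xi, n):
        return C.chebval(xi, np.eye(n + 1)[n])          # T_n(xi)

    return np.array([cheb(x[0], a) * cheb(x[1], b) for a, b in degree_pairs])

def build_phi(y, k, na, bounds, degree_pairs):
    """Rows phi_k^T[theta] of Eq. (8) for 0-based theta = na, ..., N-1, plus targets."""
    g = basis(k, bounds, degree_pairs)
    N = len(y)
    # column i-1 holds -y[theta - i], i.e. the vector varphi_k[theta] of Eq. (7)
    lagged = np.column_stack([-y[na - i:N - i] for i in range(1, na + 1)])
    # row-wise Kronecker product varphi^T ⊗ g^T
    Phi = (lagged[:, :, None] * g[None, None, :]).reshape(len(lagged), -1)
    return Phi, y[na:]

def fit_vfp_ar(signals, ks, na, bounds, degree_pairs):
    """Pooled OLS estimate of beta (Eqs. (9)-(10)) and residual variance per k (Eq. (11))."""
    blocks = [build_phi(y, k, na, bounds, degree_pairs) for y, k in zip(signals, ks)]
    Phi = np.vstack([P for P, _ in blocks])
    target = np.concatenate([t for _, t in blocks])
    # lstsq gives the OLS solution of Eq. (10) without forming (Phi^T Phi)^-1
    beta, *_ = np.linalg.lstsq(Phi, target, rcond=None)
    sigma2 = np.array([np.mean((t - P @ beta) ** 2) for P, t in blocks])
    return beta, sigma2
```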
Initially, a set of $M_i$ conventional AR($n_a$) models is estimated, with orders selected by minimizing the typical Bayesian Information Criterion (BIC) [47] (pp. 505–507), using the $M_i$ response vibration signals $y_{\mathbf{k}_{1,1}}, \ldots, y_{\mathbf{k}_{M_1^i,M_2}}$, and the maximum order among all models is adopted for the VFP–AR model. Then, the dimensionality of the VFP–AR model’s functional subspace is selected via a Genetic Algorithm (GA) [48] (pp. 27–49) minimizing the BIC properly adapted for FP models [45]. Finally, the VFP–AR model is validated by testing the uncorrelatedness (whiteness) of the resulting residual signals $e_{\mathbf{k}}[\theta]$, $\forall \mathbf{k}$, using typical hypothesis testing [47] (pp. 512–513).
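The order selection and residual validation described above can be sketched as follows: each training signal is fitted with least-squares AR($n$) models for $n = 1, \ldots, n_{\max}$, the BIC-optimal order is kept, and the maximum across signals becomes $n_a$. For the whiteness check, the Ljung–Box portmanteau test is used here as one typical choice; the text does not name the specific test. The chi-square critical value relies on the Wilson–Hilferty approximation to avoid a SciPy dependency. Function names and the default lag count are assumptions, and the GA-based subspace selection is not shown.

```python
import numpy as np
from statistics import NormalDist

def ar_ls(y, n, n_start):
    """Least-squares AR(n) fit of y[t] + sum_i a_i y[t-i] = e[t] over t >= n_start."""
    X = np.column_stack([-y[n_start - i:len(y) - i] for i in range(1, n + 1)])
    target = y[n_start:]
    a, *_ = np.linalg.lstsq(X, target, rcond=None)
    return a, target - X @ a

def bic_order(y, n_max):
    """AR order minimising BIC = N ln(sigma_e^2) + n ln(N); all candidate orders
    are fitted on the same window t >= n_max so their criteria are comparable."""
    N = len(y) - n_max
    bic = [N * np.log(np.mean(ar_ls(y, n, n_max)[1] ** 2)) + n * np.log(N)
           for n in range(1, n_max + 1)]
    return int(np.argmin(bic)) + 1

def ljung_box_white(e, h=20, n_params=0, alpha=0.05):
    """Ljung-Box test: True if residual whiteness is NOT rejected at level alpha.
    The chi-square(h - n_params) quantile uses the Wilson-Hilferty approximation."""
    e = np.asarray(e, dtype=float) - np.mean(e)
    N, dof = len(e), h - n_params
    r = np.array([np.dot(e[:-k], e[k:]) for k in range(1, h + 1)]) / np.dot(e, e)
    Q = N * (N + 2) * np.sum(r ** 2 / (N - np.arange(1, h + 1)))
    z = NormalDist().inv_cdf(1.0 - alpha)
    crit = dof * (1 - 2 / (9 * dof) + z * np.sqrt(2 / (9 * dof))) ** 3
    return bool(Q <= crit)
```

In use, `n_a = max(bic_order(y, n_max) for y in training_signals)` mirrors the "maximum order among all models" rule.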
Upon completion of Step 2 (training phase), a total of $S$ clouds of VFP–AR models is obtained, each corresponding to one gearbox health state. Each cloud contains $m$ individual VFP–AR models, one for each speed subrange, collectively describing the gearbox dynamics across the entire working range of speeds and loads. Hence, a total of $S \times m$ VFP–AR models is identified, each representing the gearbox dynamics under the healthy or a specific faulty condition (fault severity) within its speed subrange and the complete load range. It is worth noting that the individual VFP–AR models within each cloud may differ in both their order and the dimensionality of their functional subspace.
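The organization of the $S \times m$ models can be illustrated with a small container, together with the subrange lookup that Step 3 performs from the measured speed. The `VFPARMember` class, the dictionary keyed by health state, and the tie-breaking rule at shared boundary speeds are hypothetical choices made for this sketch; the text does not specify how the clouds are stored.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class VFPARMember:
    """One VFP-AR model of a cloud (hypothetical container)."""
    omega_min: float        # speed subrange [omega_min, omega_max] (rev/s)
    omega_max: float
    na: int                 # AR order (may differ between members)
    degree_pairs: list      # functional basis; p = len(degree_pairs) may differ
    beta: np.ndarray        # projection coefficients, length na * p

def select_member(cloud, omega):
    """Return the member whose speed subrange contains omega. At a shared
    boundary speed the higher subrange is chosen (arbitrary tie-breaking)."""
    hits = [m for m in cloud if m.omega_min <= omega <= m.omega_max]
    if not hits:
        raise ValueError(f"speed {omega} rev/s is outside the trained range")
    return max(hits, key=lambda m: m.omega_min)
```

A mapping such as `clouds = {"healthy": [...], "fault_1": [...]}` then holds $S$ lists of $m$ members each.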

3.3. Step 3 (Inspection Phase): VFP–AR Model Cloud-Based Fault Detection and Severity Characterization

The final step of the methodology corresponds to the inspection (real-time) phase, executed periodically or continuously during normal gearbox operation while the health state is unknown. A new vibration signal $y_u[t]$, together with the corresponding tachometer signal, is acquired under unknown health state and load. Based on these, the methodology performs two sequential tasks: fault detection, followed by severity characterization if a fault is detected.
Fault detection: The first task in Step 3 is fault detection, which relies on the ‘Healthy Cloud’ of VFP–AR models (see also the upper part of Figure 3), identified exclusively from healthy gearbox vibration signals during training (Step 2). Fault detection is therefore fully unsupervised within the proposed ML methodology [49] (p. 12) and is performed as follows. Based on the tachometer signal, the speed subrange to which the current speed belongs is first determined. The vibration signal $y_u[t]$ is then angularly resampled as previously described, yielding $y_u[\theta]$ (see also Figure 3). Next, $y_u[\theta]$ is driven through the VFP–AR model of the ‘Healthy Cloud’ that corresponds to this speed subrange, and Equation (6a) is written as:
$$y_u[\theta] + \sum_{i=1}^{n_a} a_i(k_{\omega_u, l_u}) \cdot y_u[\theta - i] = e_u[\theta, k_{\omega_u, l_u}] \tag{12}$$
where the load $l_u$ in $k_{\omega_u, l_u}$ is unknown. A Nonlinear Least Squares (NLS) optimization algorithm that combines golden-section search and parabolic interpolation (Brent's method [50] (pp. 61–75)) is then used to estimate the unknown load that minimizes the residual sum of squares (RSS):
$$\hat{l}_u = \arg\min_{l_u \in K} \sum_{\theta=1}^{N} e_u^2[\theta, k_{\omega_u, l_u}], \qquad \hat{\sigma}_{e_u}^2 = \frac{1}{N} \sum_{\theta=1}^{N} e_u^2[\theta, k_{\omega_u, \hat{l}_u}] \tag{13}$$
with $e_u[\theta, k_{\omega_u, l_u}]$ given by Equation (12) and $K = [l_{\min}, l_{\max}]$ the range of loads within which the gearbox normally operates.
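Equations (12) and (13) can be illustrated with the NumPy sketch below, which makes two simplifications. First, plain golden-section search replaces Brent's method, which additionally uses parabolic interpolation; SciPy's `scipy.optimize.minimize_scalar` with `method="bounded"` is one existing Brent-type implementation. Second, the speed is held fixed, so the basis depends on the load only. The basis `basis(l) = [1, l]`, the coefficient vector and the simulated inspection signal are hypothetical. Because this basis is linear in $l$, the residuals are linear in $l$ and the RSS is a convex quadratic, so the search is guaranteed to find the minimum. With higher-degree bases the RSS need not be unimodal.

```python
import math
import numpy as np

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0   # 1 / golden ratio, about 0.618

def golden_section_min(f, lo, hi, tol=1e-4):
    """Minimize a unimodal scalar function f on [lo, hi] by golden-section search."""
    a, b = lo, hi
    c, d = b - INV_PHI * (b - a), a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:                      # minimum bracketed in [a, d]
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:                            # minimum bracketed in [c, b]
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

def residuals(y, a):
    """e[theta] = y[theta] + sum_i a_i y[theta - i] for theta >= na (form of Equation (12))."""
    na = len(a)
    e = y[na:].copy()
    for i in range(1, na + 1):
        e += a[i - 1] * y[na - i:len(y) - i]
    return e

def ar_coeffs(beta, load, na, basis):
    """a_i(l) = sum_j a_{i,j} G_j(l), with beta ordered [a_{1,1..p}, a_{2,1..p}, ...]."""
    return beta.reshape(na, -1) @ basis(load)

def estimate_load(y, beta, na, basis, l_min, l_max):
    """Load in K = [l_min, l_max] minimizing the RSS, and the residual variance at that load."""
    rss = lambda l: float(np.sum(residuals(y, ar_coeffs(beta, l, na, basis)) ** 2))
    l_hat = golden_section_min(rss, l_min, l_max)
    return l_hat, rss(l_hat) / (len(y) - na)

# Hypothetical trained model at a fixed speed: AR(2) with a_1 depending on the load.
basis = lambda l: np.array([1.0, l])               # hypothetical g(k) = [1, l]
beta_hat = np.array([-1.2, 0.2, 0.5, 0.0])         # a_{1,1}, a_{1,2}, a_{2,1}, a_{2,2}

rng = np.random.default_rng(2)
a_true = ar_coeffs(beta_hat, 2.5, 2, basis)        # inspection signal recorded at load 2.5
noise = rng.standard_normal(5000)
y_u = np.zeros(5000)
for t in range(2, 5000):
    y_u[t] = -a_true[0] * y_u[t - 1] - a_true[1] * y_u[t - 2] + noise[t]

l_hat, sigma2_hat = estimate_load(y_u, beta_hat, 2, basis, 1.0, 4.0)
print(l_hat, sigma2_hat)
```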
The estimate $\hat{l}_u$ is asymptotically ($N \to \infty$) normally distributed with mean $\mu_{l_u}$ and variance $\sigma_{l_u}^2$, that is, $\hat{l}_u \sim \mathcal{N}(\mu_{l_u}, \sigma_{l_u}^2)$. The variance $\sigma_{l_u}^2$ is given by the Cramér–Rao lower bound and can be estimated as [51]:
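$$\hat{\sigma}_{l_u}^2 = \frac{\hat{\sigma}_{e_u}^2}{N} \left[ \frac{1}{N} \sum_{\theta=1}^{N} \left( \left( \varphi^T[\theta] \otimes \left. \frac{\partial g^T(k_{\omega_u, l_u})}{\partial l_u} \right|_{l_u = \hat{l}_u} \right) \cdot \hat{\beta} \right)^2 \right]^{-1} \tag{14}$$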
with $\varphi[\theta]$ defined similarly to Equation (7) and $\hat{\beta}$ the projection coefficient vector estimated in the training phase. If the current dynamics originate from the healthy gearbox, then $e_u[\theta, k_{\omega_u, \hat{l}_u}]$ should be a white (uncorrelated) signal. This may be examined via any typical whiteness testing procedure; herein, the Pena-Rodriguez (PR) D statistic, which follows a standard normal distribution for a white sequence [52], is employed. If $e_u[\theta, k_{\omega_u, \hat{l}_u}]$ is white, $y_u[\theta]$ is taken to originate from the healthy gearbox operating at speed and load $k = [\omega_u \;\; \hat{l}_u]^T$; otherwise, the gearbox is declared faulty.
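The whiteness check can be sketched with a log-determinant portmanteau statistic of the Peña–Rodríguez type, $D_m^* = -\frac{n}{m+1} \ln |\hat{R}_m|$. Here $\hat{R}_m$ is the $(m+1) \times (m+1)$ Toeplitz matrix of residual sample autocorrelations up to lag $m$. The choice of this variant is an assumption. Neither the transformation that makes the statistic of [52] approximately standard normal nor the lag $m$ used in the paper is reproduced, so only the raw statistic is computed; larger values indicate stronger residual correlation. The simulated residuals are hypothetical.

```python
import numpy as np

def sample_acf(e, max_lag):
    """Biased sample autocorrelations r[0], ..., r[max_lag] of a residual sequence."""
    e = np.asarray(e, dtype=float)
    e = e - e.mean()
    n = len(e)
    denom = np.dot(e, e)
    return np.array([np.dot(e[:n - h], e[h:]) / denom for h in range(max_lag + 1)])

def log_det_statistic(e, m):
    """D*_m = -(n / (m + 1)) * ln|R_m|, with R_m the Toeplitz autocorrelation matrix."""
    r = sample_acf(e, m)
    lag = np.abs(np.subtract.outer(np.arange(m + 1), np.arange(m + 1)))
    sign, logdet = np.linalg.slogdet(r[lag])    # r[lag] builds the Toeplitz matrix R_m
    if sign <= 0:
        raise ValueError("autocorrelation matrix is not positive definite")
    return -len(e) / (m + 1) * logdet

# Hypothetical residuals: white noise vs. residuals that are still correlated (AR(1), 0.5).
rng = np.random.default_rng(3)
white = rng.standard_normal(2000)
correlated = np.zeros(2000)
for t in range(1, 2000):
    correlated[t] = 0.5 * correlated[t - 1] + white[t]

D_white = log_det_statistic(white, m=10)
D_correlated = log_det_statistic(correlated, m=10)
print(D_white, D_correlated)
```

Because $|\hat{R}_m| \le 1$ for a correlation matrix with unit diagonal, the statistic is non-negative and grows as the residual correlation increases.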
Fault Severity Characterization: The second task in Step 3 (see also Figure 3) is fault severity characterization, once a fault has been detected. It relies on the ‘Faulty Clouds’ of VFP–AR models, which are also identified in the training phase (Step 2) from vibration signals corresponding to the available gear fault severities. Fault severity characterization is therefore performed in a supervised manner within the ML methodology [49] (p. 12). If the gearbox is detected as faulty, $y_u[\theta]$ is redirected to all fault severity clouds, as shown in Figure 3, and the fault detection procedure is repeated for each cloud. In particular, the signal is first driven through the VFP–AR model of the cloud that corresponds to the specific speed subrange. Then, the unknown load $l_u$ is estimated via Equation (13), and the resulting residual sequence $e_u[\theta, k_{\omega_u, \hat{l}_u}]$ is obtained. Finally, for every fault severity cloud, the D statistic of the PR test is calculated to test the whiteness of the residuals, and the unknown fault severity is characterized as follows:
$$\hat{s} = \arg\min_{s = 1, \ldots, j} D_s \tag{15}$$
where $D_s$ denotes the D statistic of the PR test computed for the residuals obtained when $y_u[\theta]$ is driven through the cloud of the $s$th fault severity level, and $j$ is the total number of considered fault severities. The estimated severity level $\hat{s}$ corresponds to the cloud with the minimum value of $D_s$, indicating the closest match between the measured and modeled gearbox dynamics. It is worth noting that if a fault severity not included in the training phase is encountered, the ML methodology will assign it to the closest known severity from training, that is, to the model cloud whose dynamic characteristics most closely resemble those of the new condition.
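The two-stage decision logic of Step 3 reduces to a threshold test on the Healthy Cloud statistic followed by Equation (15). The sketch below assumes that the D statistics have already been computed for the Healthy Cloud and for each fault severity cloud. The detection threshold is a user choice, and the value used here is only an illustrative assumption: 1.645, the one-sided 5% critical value of the standard normal distribution. The numerical D values in the example are hypothetical.

```python
from typing import Mapping, Optional, Tuple

def diagnose(d_healthy: float, d_faulty: Mapping[str, float],
             threshold: float = 1.645) -> Tuple[str, Optional[str]]:
    """Step 3 decision: fault detection, then severity characterization by Equation (15)."""
    if d_healthy <= threshold:                # Healthy Cloud residuals accepted as white
        return "healthy", None
    s_hat = min(d_faulty, key=d_faulty.get)   # cloud with the minimum D_s
    return "faulty", s_hat

# Hypothetical D statistics for two inspection signals.
d_clouds = {"F25": 9.1, "F50": 2.4, "F75": 5.0, "F100": 7.7}
print(diagnose(0.8, d_clouds))    # ('healthy', None)
print(diagnose(12.3, d_clouds))   # ('faulty', 'F50')
```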

4. Experimental Assessment

4.1. Gearbox, Gear Fault Scenarios and Vibration Signals

The experimental dataset has been acquired by the authors at the University of Patras, Greece. The set-up (Figure 4a), manufactured by ‘alphamach.gr’, consists of a single-stage spur gearbox driven by an AC electric motor and loaded by a DC motor. The gearbox features a pinion with 17 teeth and a gear with 34 teeth. The drive motor operates at 61 distinct speeds, ranging from 10 rev/s (600 rpm) to 25 rev/s (1500 rpm) in increments of 0.25 rev/s (15 rpm), regulated by a standard variable frequency drive (inverter). The load motor operates as a generator, allowing adjustable loading of the gearbox. Four load scenarios are implemented: for Load 1, the load motor is detached from the gearbox; for Load 2, the load motor is attached to the gearbox but remains unloaded (no devices are connected to the generator's outputs); for Load 3, the load motor is attached and a 500 W device is connected; and for Load 4, the load motor is attached with a 1000 W device connected. An incipient fault is introduced at the base of a single pinion tooth in the gearbox (Figure 4b) using a typical Dremel-type cutting tool. The fault scenarios are implemented at four severity levels, each defined as the percentage of the total tooth face width ($w$) affected (see Figure 4c): 25% ($L_1 = 0.25w$), 50% ($L_2 = 0.5w$), 75% ($L_3 = 0.75w$) and 100% ($L_4 = w$). These fault scenarios are designated as F25, F50, F75 and F100, respectively. It is worth noting that the faults have been introduced on the gears inside the gearbox without disassembly, thus avoiding additional uncertainties.
Vibration signals are acquired using a single triaxial accelerometer placed on the ball bearing housing of the secondary shaft (see Figure 4a). The signals are sampled at $f_s = 10{,}240$ Hz, and only the z-direction measurements (radial axis aligned with gravity) are used, since this direction typically exhibits the strongest sensitivity to spur gear faults [53,54,55]. Additionally, a laser tachometer simultaneously measures the rotating speed of the drive motor (see Figure 4a). A total of 18,765 vibration signals are recorded (see details in Table 1), covering all considered health states, rotating speeds and loads. Only 465 of them (≈2.5% of the complete dataset) are employed for training the ML methodology, while the remaining 18,300 signals (≈97.5%) are used exclusively in the inspection (testing) phase for the methodology's assessment and comparison.
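The dataset sizes quoted above follow from simple bookkeeping. The snippet below reproduces them as a quick consistency check, using only numbers stated in this section and in Sections 4.3 and 4.4: 61 speeds, four loads, five health states, 31 training speeds and three training loads, and 3660 test signals per health state. It also confirms that 10–25 rev/s corresponds to 600–1500 rpm.

```python
# Bookkeeping of the experimental dataset (all inputs taken from the text).
speeds_rev_s = [10.0 + 0.25 * i for i in range(61)]   # 61 speeds, 0.25 rev/s apart
speeds_rpm = [60.0 * s for s in speeds_rev_s]         # 1 rev/s = 60 rpm
n_loads, n_states = 4, 5                              # Loads 1-4; Healthy, F25, F50, F75, F100

wcs_per_state = len(speeds_rev_s) * n_loads           # working conditions per health state
n_train = n_states * 31 * 3                           # 31 speeds x 3 loads per state
n_test = n_states * 3660                              # 3660 inspection signals per state
n_total = n_train + n_test
train_pct = 100.0 * n_train / n_total

print(wcs_per_state, n_train, n_test, n_total, round(train_pct, 1))
# 244 465 18300 18765 2.5
```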

4.2. Effects of Variable Working Conditions and Incipient Faults on the Vibration Signals

Initially, the effects of the incipient gear faults on the vibration signals are evaluated in the time domain. In Figure 5, an indicative vibration signal from each health state at a constant speed of 20 rev/s under the second load level is presented, with different colors indicating the respective health states. It is observed that the fault scenarios do not introduce visible deviations relative to the healthy state signal. In particular, the healthy signal amplitude ranges from −0.71 to 1.09 m/s², while the four faulty signals lie between −0.82 and 1.02 m/s², without any visible amplitude-modulation pattern. The absence of fault-induced signatures confirms the incipient stage of the examined severities and demonstrates that fault detection and severity characterization remain challenging tasks, even under constant working conditions.
Additionally, the effects of variable working conditions and fault scenarios are evaluated using time-domain features that are commonly applied for gear fault diagnosis under variable WCs [6,9]. The considered features include RMS, peak-to-peak, kurtosis, crest factor, impulse factor, Energy I, skewness, standard deviation (std), variance, shape factor, margin factor, and Energy II (formal definitions in [9]). In Figure 6, the values of these features are depicted for 244 vibration signals per health state, one per combination of rotating speed and load. Each subplot corresponds to one feature, with health states distinguished by color, the x-axis denoting the signal number, and the y-axis representing the feature value. The two black dashed horizontal lines in each subplot mark the minimum and maximum values of the corresponding feature for the healthy state over all 244 signals, thereby defining the ‘healthy range’ for that feature. As shown, the points corresponding to all fault severities lie largely inside this healthy range, implying that separating healthy from faulty conditions with these features is highly challenging. For example, the RMS values of the healthy state range from 0.05 to 0.42, and only 5, 3, 11 and 19 signals from the F25, F50, F75 and F100 fault severities, respectively, fall outside this healthy RMS interval. A similar pattern is observed for all examined features, with the largest number of out-of-range signals occurring for skewness in the F100 fault scenario, where 42 signals lie outside the healthy range. Although these features are widely used in condition-monitoring applications and can be effective in many scenarios, in the present dataset their variations are dominated by changes in rotating speed and load rather than by faults.
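The features listed above can be computed as in the sketch below. It uses common textbook definitions, which may differ in detail from the formal definitions in [9]. Energy I and Energy II are omitted because their definitions are not given here. The sketch also reproduces the ‘healthy range’ count used in Figure 6, and the function names are illustrative. The check uses a unit sine, for which RMS $= 1/\sqrt{2}$, crest factor $= \sqrt{2}$ and kurtosis $= 1.5$.

```python
import numpy as np

def time_features(x):
    """Common time-domain condition indicators of a vibration signal (textbook definitions)."""
    x = np.asarray(x, dtype=float)
    mu, sd = x.mean(), x.std()            # population standard deviation
    rms = np.sqrt(np.mean(x ** 2))
    peak = np.max(np.abs(x))
    mean_abs = np.mean(np.abs(x))
    return {
        "rms": rms,
        "peak_to_peak": x.max() - x.min(),
        "kurtosis": np.mean((x - mu) ** 4) / sd ** 4,
        "skewness": np.mean((x - mu) ** 3) / sd ** 3,
        "std": sd,
        "variance": sd ** 2,
        "crest_factor": peak / rms,
        "impulse_factor": peak / mean_abs,
        "shape_factor": rms / mean_abs,
        "margin_factor": peak / np.mean(np.sqrt(np.abs(x))) ** 2,
    }

def count_outside_healthy_range(healthy_values, test_values):
    """Number of test values outside [min, max] of the healthy values of one feature."""
    lo, hi = np.min(healthy_values), np.max(healthy_values)
    test_values = np.asarray(test_values, dtype=float)
    return int(np.sum((test_values < lo) | (test_values > hi)))

n = np.arange(10000)
x = np.sin(2 * np.pi * n / 100)           # 100 full periods of a unit sine
f = time_features(x)
print(round(f["rms"], 4), round(f["crest_factor"], 4), round(f["kurtosis"], 4))
```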
Finally, the fault effects, as well as the variable speed and load effects, are investigated in the order domain via order-spectrum zones estimated through the Fast Fourier Transform (FFT). These zones are constructed using the 244 vibration signals per health state described above. For visualization purposes only, all signals are angularly resampled at a single rate of $f_{s\theta} = 409$ samples/rev. A common resampling frequency is not adopted in the ML methodology; it is applied here exclusively to place all signals on the same order axis, so that spectrum zones from different speeds and loads can be shown in a single figure and directly compared. The spectrum zones of the healthy state are compared with those of the four fault scenarios in Figure 7, where each of Figure 7a–d compares the healthy order-spectrum zone with one fault scenario. Variations in rotating speed and load evidently modulate the healthy signals in a manner that masks fault-related signatures, causing the range of faulty zone values to overlap almost entirely with the healthy zone. Considering this, together with the above observations on the time-domain signals and standard features, the considered fault diagnosis problem is particularly challenging.
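The visualization step above, angular resampling at a common 409 samples/rev followed by an FFT on an order axis, can be sketched as follows. The sketch assumes a constant shaft speed taken from the tachometer and uses linear interpolation. The computed order tracking and the speed-dependent resampling frequencies of the actual methodology (Equations (4) and (5)) are not reproduced. The test signal is a hypothetical synthetic tone at order 17, the pinion tooth count.

```python
import numpy as np

def angular_resample_constant_speed(y, fs, f_rot, samples_per_rev, n_revs):
    """Resample a time signal to uniform shaft-angle steps, assuming constant speed f_rot (rev/s)."""
    t = np.arange(len(y)) / fs
    revs = np.arange(n_revs * samples_per_rev) / samples_per_rev   # uniform angle grid (rev)
    t_theta = revs / f_rot                                           # time each angle is reached
    if t_theta[-1] > t[-1]:
        raise ValueError("signal too short for the requested number of revolutions")
    return np.interp(t_theta, t, y)                                  # linear interpolation

def order_spectrum(y_theta, samples_per_rev):
    """Magnitude spectrum of an angle-domain signal on a shaft-order (cycles/rev) axis."""
    mag = np.abs(np.fft.rfft(y_theta)) / len(y_theta)
    orders = np.fft.rfftfreq(len(y_theta), d=1.0 / samples_per_rev)
    return orders, mag

fs, f_rot = 10240.0, 20.0                       # sampling frequency (Hz), shaft speed (rev/s)
t = np.arange(int(2 * fs)) / fs                 # 2 s record, i.e. 40 revolutions
y = np.sin(2 * np.pi * 17 * f_rot * t)          # hypothetical tone at order 17
y_theta = angular_resample_constant_speed(y, fs, f_rot, samples_per_rev=409, n_revs=40)
orders, mag = order_spectrum(y_theta, 409)
peak_order = orders[np.argmax(mag)]
print(peak_order)
```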

4.3. Angular Resampling and VFP–AR Model Identification

Step 1: Following the procedure described in Section 3.1, the considered speed range is divided into $m = 4$ adjacent subranges based on Equation (4). The division ensures that a highest frequency of $f_H \approx 4096$ Hz is retained within each subrange and that each subrange includes at least four speeds for the FP modeling. The angular resampling frequency of each subrange is determined based on Equation (5), while the number of rotations over which the angular resampling is performed within each subrange is determined by its minimum speed. All of the above are summarized in Table 2.
Step 2: Based on the procedure described in Section 3.2, a set of $S = 5$ clouds of VFP–AR models is estimated from the 465 training vibration signals. These signals cover all health states, 31 different rotating speeds and three different load levels from the considered ranges (see Table 1). Each cloud corresponds to a certain health state of the gearbox (Healthy, F25, …, F100) and consists of four distinct VFP–AR models, one per speed subrange. A total of 20 distinct VFP–AR models are thus estimated, collectively representing the gearbox dynamics within the continuous range of considered speeds and loads at every health state. All details on the estimated models are summarized in Table 3. In particular, this table includes the order $n_a$ (equal to the number of AR parameters) and the number of basis functions $p$ for each VFP–AR($n_a$)$_p$ model, from which the number of estimated projection coefficients follows as the product $n_a \times p$. It is worth noting that $n_a$ and $p$ are the method's hyperparameters to be determined. Furthermore, Table 3 includes the Samples-Per-Parameter (SPP) ratio, defined as the total number of training samples used in the OLS estimator divided by the number of estimated projection coefficients, and the condition number of the inverted matrix in Equation (10).
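The two diagnostics reported in Table 3 can be computed directly from the pooled regression matrix $\Phi$, as in the sketch below; the function name and the toy matrix are hypothetical. For the VFP–AR(330)$_{11}$ model discussed next, the number of projection coefficients is $330 \times 11 = 3630$, so its SPP equals the number of rows of $\Phi$ divided by 3630.

```python
import numpy as np

def ols_diagnostics(Phi):
    """Samples-per-parameter ratio and condition number of Phi^T Phi (the matrix OLS inverts)."""
    n_samples, n_coefficients = Phi.shape      # n_coefficients = n_a * p
    spp = n_samples / n_coefficients
    cond = np.linalg.cond(Phi.T @ Phi)         # 2-norm condition number
    return spp, cond

Phi_demo = np.vstack([np.eye(3)] * 5)          # toy 15 x 3 regression matrix, Phi^T Phi = 5 I
spp, cond = ols_diagnostics(Phi_demo)
print(spp, cond)
```

For large models, `np.linalg.cond(Phi) ** 2` gives the same 2-norm value without forming $\Phi^T \Phi$ explicitly, provided $\Phi$ has full column rank.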
The identification procedure is presented indicatively for the fourth model of the Healthy Cloud, corresponding to the speed subrange [20–25] rev/s. The AR order is selected as $n_a = 330$ using the BIC and the Residual Sum of Squares (RSS) criteria, both shown in Figure 8a. As shown, the median BIC is 13.24, while the corresponding median RSS is 40.44. The functional subspace of the VFP–AR model is selected, via a Genetic Algorithm (GA) minimizing the BIC, to be spanned by $p = 11$ bivariate shifted Legendre polynomials. This yields a VFP–AR(330)$_{11}$ model that represents the healthy gearbox dynamics over the continuous subrange of speeds $[20, 25]$ rev/s and loads $[1, 4]$.
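A bivariate shifted Legendre basis can be sketched as tensor products $G_j(\omega, l) = \tilde{P}_a(\tilde{\omega})\, \tilde{P}_b(\tilde{l})$ of univariate shifted Legendre polynomials $\tilde{P}_n(x) = P_n(2x - 1)$. Speed and load are first mapped to $[0, 1]$. The tensor-product construction, the range normalization and the index pairs $(a, b)$ are assumptions for illustration. In the paper, the 11 retained functions are selected by the GA, and that selection is not reproduced here.

```python
import numpy as np

def shifted_legendre(n_max, x):
    """Values of P~_0, ..., P~_n_max at x in [0, 1], with P~_n(x) = P_n(2x - 1) (Bonnet recursion)."""
    u = 2.0 * np.asarray(x, dtype=float) - 1.0
    P = [np.ones_like(u), u]
    for n in range(1, n_max):
        P.append(((2 * n + 1) * u * P[n] - n * P[n - 1]) / (n + 1))
    return np.array(P[: n_max + 1])

def bivariate_basis(speed, load, pairs, speed_range, load_range, n_max):
    """G_j(speed, load) = P~_a(scaled speed) * P~_b(scaled load) for each index pair (a, b)."""
    s = (speed - speed_range[0]) / (speed_range[1] - speed_range[0])
    l = (load - load_range[0]) / (load_range[1] - load_range[0])
    Ps, Pl = shifted_legendre(n_max, s), shifted_legendre(n_max, l)
    return np.array([Ps[a] * Pl[b] for a, b in pairs])

# Hypothetical subset of index pairs (in the paper the GA selects the retained functions).
pairs = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0)]
g = bivariate_basis(22.0, 3.0, pairs, speed_range=(20.0, 25.0), load_range=(1.0, 4.0), n_max=2)
print(g)
```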
Although a plateau of BIC (and RSS) values is present over several candidate model orders (Figure 8a), the final order is determined by standard validation through a formal assessment of residual whiteness using the Autocorrelation Function (ACF). An illustrative case is shown in Figure 8b for a speed of 22 rev/s and load level 1, where the horizontal red lines mark the statistical limits at $\alpha = 0.05$. Satisfaction of the whiteness criterion indicates model adequacy. Selecting the lowest-order model within the BIC/RSS plateau that passes the whiteness test achieves statistical parsimony while maintaining performance, and reduces the risk of overfitting [47] (pp. 492–495).
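The ACF-based validation check can be sketched as follows. The sample autocorrelation of the residuals is compared with the approximate 95% limits $\pm 1.96/\sqrt{N}$ that apply to a white sequence, which corresponds to $\alpha = 0.05$. The maximum lag and the simulated residuals are hypothetical choices.

```python
import numpy as np

def acf_whiteness(e, max_lag, z=1.96):
    """Sample ACF at lags 1..max_lag, the +/- z/sqrt(N) limit, and which lags exceed it."""
    e = np.asarray(e, dtype=float)
    e = e - e.mean()
    n = len(e)
    r = np.array([np.dot(e[:n - h], e[h:]) for h in range(1, max_lag + 1)]) / np.dot(e, e)
    limit = z / np.sqrt(n)
    return r, limit, np.abs(r) > limit

rng = np.random.default_rng(4)
white = rng.standard_normal(2000)                 # hypothetical adequate-model residuals
colored = np.zeros(2000)                          # hypothetical under-fitted residuals
for t in range(1, 2000):
    colored[t] = 0.5 * colored[t - 1] + white[t]

_, limit, exceed_white = acf_whiteness(white, max_lag=20)
_, _, exceed_colored = acf_whiteness(colored, max_lag=20)
print(limit, exceed_white.sum(), exceed_colored.sum())
```

With 5% limits, about one lag in twenty is expected to exceed the bounds even for white residuals, so a few isolated exceedances are not by themselves evidence against whiteness.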
In Figure 8c,d, the gearbox dynamics captured by the VFP–AR model are visualized through the variability of two indicative parameters of the VFP–AR(330)$_{11}$ model as functions of rotating speed and load. In addition, at the constant load level 4, the model-based power spectral density (psd) magnitude as a function of shaft order and rotating speed is presented in Figure 9a–d (one subplot per subrange), whereas Figure 10a–d presents the model-based psd magnitude as a function of shaft order and load at four fixed rotating speeds (one representative speed from each subrange).
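A model-based psd of this kind follows from the AR transfer function. For a frozen working condition $k$, the coefficients $a_i(k)$ are evaluated from the projection coefficients, and $S(\omega) = \hat{\sigma}_e^2 / |1 + \sum_{i=1}^{n_a} a_i(k) e^{-\mathrm{j}\omega i}|^2$ is mapped to shaft orders through the angular sampling rate (samples/rev). The sketch below assumes the coefficient ordering $\beta = [a_{1,1}, \ldots, a_{1,p}, a_{2,1}, \ldots]^T$ and leaves the psd unnormalized. It is not necessarily how the figures of the paper were generated, and the example coefficients are hypothetical.

```python
import numpy as np

def frozen_ar_coeffs(beta, g, na):
    """a_i(k) = sum_j a_{i,j} G_j(k), assuming beta = [a_{1,1}, ..., a_{1,p}, a_{2,1}, ...]."""
    return np.asarray(beta, dtype=float).reshape(na, -1) @ np.asarray(g, dtype=float)

def ar_psd_on_orders(a, sigma2, samples_per_rev, n_points=512):
    """psd sigma2 / |A(e^{jw})|^2 of y[t] + sum_i a_i y[t-i] = e[t], on a shaft-order axis."""
    a = np.asarray(a, dtype=float)
    w = np.linspace(0.0, np.pi, n_points)             # rad/sample, from 0 to Nyquist
    A = 1.0 + np.exp(-1j * np.outer(w, np.arange(1, len(a) + 1))) @ a
    orders = w / (2.0 * np.pi) * samples_per_rev      # cycles per revolution
    return orders, sigma2 / np.abs(A) ** 2

# Hypothetical AR(1) model with basis g(k) = [1, 1]: a_1 = -0.8 - 0.1 = -0.9.
a_k = frozen_ar_coeffs([-0.8, -0.1], [1.0, 1.0], na=1)
orders, psd = ar_psd_on_orders(a_k, sigma2=1.0, samples_per_rev=409)
print(orders[0], psd[0])
```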

4.4. Fault Detection and Severity Characterization via the VFP–AR Based ML Methodology

The fault detection performance of the proposed ML methodology is assessed (inspection phase) based on 3660 signals from the healthy gearbox and 3660 from each fault severity, with the gearbox operating under 61 different speeds and four different load levels (not only those used in the training phase; see Table 1). The tachometer signal is acquired simultaneously with each vibration measurement in this phase, while the health state of the gearbox and the load level are considered completely unknown.
Based on the procedure described in Section 3.3, each vibration signal is first assigned to one of the speed subranges of the training phase and then angularly resampled. It is then driven through the VFP–AR model of the Healthy Cloud corresponding to its speed subrange, yielding the model residual signal $e_u[\theta, k_{\omega_u, \hat{l}_u}]$ through Equation (12) at the estimated load $\hat{l}_u$ (obtained from Equation (13)). Fault detection is subsequently performed via the D statistic of the Pena-Rodriguez whiteness test evaluated on the residuals.
The D statistic values of the methodology for all considered test signals (18,300) of the inspection phase are presented in Figure 11a–d using scatter plots and Receiver Operating Characteristic (ROC) curves [42]. An ROC curve depicts the true positive rate (percentage of correctly detected faults) versus the false positive rate (percentage of false alarms) as the decision threshold varies; perfect detection is achieved when the curve passes through the point (0,1).
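An ROC curve of this kind can be computed from the D statistics of healthy and faulty inspection signals by sweeping the detection threshold. In the NumPy sketch below, a signal is flagged as faulty when its statistic exceeds the threshold. The function names and the toy scores are hypothetical.

```python
import numpy as np

def roc_points(d_healthy, d_faulty):
    """FPR and TPR for every threshold from +inf down to -inf (flag faulty if D > threshold)."""
    h = np.asarray(d_healthy, dtype=float)
    f = np.asarray(d_faulty, dtype=float)
    scores = np.sort(np.concatenate([h, f]))[::-1]
    thresholds = np.concatenate(([np.inf], scores, [-np.inf]))
    fpr = np.array([np.mean(h > t) for t in thresholds])   # false alarms among healthy signals
    tpr = np.array([np.mean(f > t) for t in thresholds])   # detected faults among faulty signals
    return fpr, tpr, thresholds

def tpr_at_fpr(fpr, tpr, max_fpr=0.05):
    """Best TPR achievable while keeping the FPR at or below max_fpr."""
    return float(np.max(tpr[fpr <= max_fpr]))

# Hypothetical D statistics of four healthy and four faulty inspection signals.
fpr, tpr, _ = roc_points([0.1, 0.4, 0.2, 0.3], [0.35, 0.9, 0.8, 0.7])
print(tpr_at_fpr(fpr, tpr, 0.05), tpr_at_fpr(fpr, tpr, 0.25))   # 0.75 1.0
```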
More specifically, Figure 11(a1–d1) shows the values of the Pena-Rodriguez D statistic for the 1st–4th subranges, respectively. The inspection (test) vibration signals corresponding to the healthy gearbox are clearly separated from those corresponding to the gear faults. This indicates consistently high fault detection performance, including for rotating speeds and load levels not included in the method's training. The corresponding ROC curves in Figure 11(a2–d2) corroborate these results, showing TPR ≥ 94% at 5% FPR for all considered fault scenarios.
A summary of these results is also presented through a confusion matrix in Figure 12 for all 18,300 test cases considered. In the upper left 2 × 2 sub-matrix, each column corresponds to the actual health state and each row to the estimated state. The entry in position $(i, j)$ therefore indicates the number of times the actual $j$th health state was predicted as the $i$th health state, expressed as a fraction of the total number of inspection test signals. Diagonal entries correspond to correctly identified health states, and off-diagonal entries to incorrectly identified ones. The column on the far right shows the percentages of correctly (in green) and incorrectly (in red) identified signals out of all estimations made for each health state, while the bottom row shows the corresponding percentages out of all actual signals of that state. Finally, the bottom-right cell indicates the overall correct (in green) and false (in red) identification rates across both health states. In other words, this cell presents the total fault detection accuracy of the methodology, while the rightmost column and the bottom row indicate the typical precision and recall ratios, respectively, for each health state [56]. The results confirm the good performance of the proposed method. The healthy state is correctly detected in 87.2% of the healthy test cases, and the considered incipient faults (Faulty stands for all fault severities) are correctly detected in 97.4% of the faulty test cases, resulting in an overall detection rate of 95.4% (bottom-right cell). The mean computation time for fault detection per inspection signal ranges from 2.5 s to 6.5 s (MATLAB functions clock.m, etime.m; MATLAB Version R2024b; Computer: Intel Core(TM) i7-13700K CPU @ 3.4 GHz, 16 GB RAM, Windows 11 Enterprise Operating System).
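The quantities described above can be computed with the sketch below from a confusion matrix laid out as in Figure 12 (rows: estimated state, columns: actual state); the counts in the example are hypothetical. The last lines check that the reported per-state detection rates are consistent with the overall figure. With 3660 healthy and $4 \times 3660 = 14{,}640$ faulty test signals, $(3660 \times 0.872 + 14{,}640 \times 0.974)/18{,}300 \approx 0.954$.

```python
import numpy as np

def confusion_metrics(C):
    """C[i, j]: signals of actual state j estimated as state i (rows estimated, columns actual)."""
    C = np.asarray(C, dtype=float)
    precision = np.diag(C) / C.sum(axis=1)   # right-most column of the matrix
    recall = np.diag(C) / C.sum(axis=0)      # bottom row of the matrix
    accuracy = np.trace(C) / C.sum()         # bottom-right cell
    return precision, recall, accuracy

# Hypothetical counts, states ordered [healthy, faulty].
precision, recall, accuracy = confusion_metrics([[90, 5], [10, 395]])
print(precision, recall, accuracy)

# Consistency check with the reported recalls (87.2% healthy, 97.4% faulty).
overall = (3660 * 0.872 + 14640 * 0.974) / 18300
print(round(100 * overall, 1))   # 95.4
```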
Based on the procedure described in Section 3.3, the angularly resampled vibration signal of each case identified as faulty is redirected to all fault severity clouds. The signal is driven through the VFP–AR model of the corresponding speed subrange within each severity cloud, yielding a set of residual signals (one per severity level). Fault severity characterization is performed by selecting the severity level with the minimum Pena–Rodriguez D statistic across these residuals, in accordance with Equation (15).
The resulting performance is presented via the confusion matrix in Figure 13, aggregating all test signals (3660 per fault severity). Strictly, test signals misclassified during the fault detection stage would not proceed to severity characterization; nevertheless, all faulty test signals (see Table 1) are included here to provide a clearer overall summary. As shown in Figure 13, high fault severity characterization performance is achieved, with each severity level correctly identified in at least 87.7% of the corresponding test cases and an overall characterization rate of 91.6% (bottom-right cell). The mean computation time for fault severity characterization per inspection signal ranges from 6.5 s to 9.5 s (MATLAB functions clock.m, etime.m; MATLAB Version R2024b; Computer: Intel Core(TM) i7-13700K CPU @ 3.4 GHz, 16 GB RAM, Windows 11 Enterprise Operating System).

4.5. Comparison with a Stacked Autoencoder-Based Method

The standard deep Stacked Autoencoder (SAE) architecture of [23], widely used as a benchmark in related works [23,24,25,27] and shown to achieve promising performance for gearbox fault diagnosis under variable WCs, is employed in this section for comparison with the VFP–AR model-based ML methodology. For a fair comparison, the SAE framework is properly adapted. In particular, fault detection and severity characterization are addressed in the above studies as a single-stage multi-class setting, treating each health state (including the healthy one) as a separate class. The ML methodology of this study performs two sequential tasks: unsupervised fault detection, followed by supervised severity characterization. Based on this, a two-stage deep learning framework is adopted: a standard Deep Autoencoder (DAE) is first used for unsupervised fault detection, while the SAE is used solely for fault severity characterization. A brief description of the employed framework is provided below, while full details may be found in [22,23].
An Autoencoder (AE) is a neural network whose primary objective is to replicate its input data at the output layer [57]. A standard AE is built around three main components: an input layer, a hidden layer, and an output layer. This architecture is conceptually divided into two parts. The encoder, which encompasses the input and hidden layers, learns a compressed, lower-dimensional representation of the data. The decoder, composed of the hidden and output layers, takes this compact representation and attempts to reconstruct the original input signal. Considering a vibration signal $y_k$ obtained from a gearbox under the $k$th operational state (for notational simplicity, the time index is omitted), the output vector of the encoder, denoted $h_k$, is expressed as:
$$h_k = f_e(y_k, w_e) \tag{16}$$
Here, $f_e$ represents the encoder function, and $w_e$ contains the encoder's learned weight and bias parameters. Subsequently, the decoder generates a reconstruction $\hat{y}_k$ of the original signal:
$$\hat{y}_k = f_d(h_k, w_d) \tag{17}$$
In this equation, $f_d$ is the decoder function, and $w_d$ contains its parameters, whose weights are frequently tied to those of the encoder (set as the transpose of the weights in $w_e$). The overall transformation performed by the AE from input to output is thus:
$$\hat{y}_k = f(y_k, w_y) = f_d\big(f_e(y_k, w_e), w_d\big) \tag{18}$$
where $w_y = \{w_e, w_d\}$ aggregates all learnable parameters that are optimized during the training phase. An AE is trained by minimizing the reconstruction error (residuals) between the original input $y_k$ and its reconstruction $\hat{y}_k$, typically through the Mean Squared Error (MSE) loss function:
$$\hat{w}_y = \arg\min_{w_y} \frac{1}{N} \sum_{t=1}^{N} \big(y_k[t] - \hat{y}_k[t]\big)^2 \tag{19}$$
Upon successful training, the AE can be deployed for unsupervised gear fault detection. An inspection vibration signal is passed through the trained AE, and the Root Mean Square (RMS) value of the residuals (reconstruction error) is calculated. This RMS value serves as a quantitative health indicator. By comparing this indicator to a user-defined threshold, the presence of a fault can be flagged.
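The reconstruction and the RMS health indicator described above can be sketched in NumPy as a forward pass through a fully connected DAE. The network has ReLU hidden layers and a linear output layer, the layer pattern of the DAE used later in this section. The MSE of Equation (19) is also computed. Training (e.g., with the Adam optimizer in a deep learning framework) is omitted. The randomly initialized weights, the small layer sizes and the threshold in the example are therefore placeholders, not a trained detector.

```python
import numpy as np

def init_dae(layer_sizes, rng):
    """He-style random weights and zero biases for a fully connected autoencoder."""
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
            for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def dae_forward(params, Y):
    """Reconstruct each row of Y: ReLU on the hidden layers, linear output layer."""
    H = Y
    for idx, (W, b) in enumerate(params):
        H = H @ W + b
        if idx < len(params) - 1:
            H = np.maximum(H, 0.0)
    return H

def mse_loss(params, Y):
    """Mean squared reconstruction error, the training objective of Equation (19)."""
    return float(np.mean((Y - dae_forward(params, Y)) ** 2))

def health_indicator(params, Y):
    """Per-segment RMS of the reconstruction residuals."""
    return np.sqrt(np.mean((Y - dae_forward(params, Y)) ** 2, axis=1))

rng = np.random.default_rng(5)
params = init_dae([16, 8, 4, 8, 16], rng)      # small stand-in for 1024-512-128-512-1024
Y = rng.standard_normal((6, 16))               # six hypothetical signal segments
rms = health_indicator(params, Y)
threshold = 1.0                                # hypothetical user-defined threshold
flags = rms > threshold                        # True = flagged as faulty
print(rms, flags)
```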
A classical AE is characterized by its shallow structure, typically having only a single hidden layer. By integrating multiple additional hidden layers, the model becomes a Deep Autoencoder (DAE). An illustrative representation of a DAE architecture is provided in Figure 14, where the central hidden layer is commonly termed the bottleneck layer.
A deeper representation can be obtained by hierarchically stacking multiple autoencoders, forming a Stacked Autoencoder (SAE). In this scheme, each autoencoder is first pretrained independently to reconstruct its input by minimizing the Mean Squared Error (MSE) (Equation (19)). It is noted that, following the observations in [23], the original training vibration signals are first segmented into fixed-length segments, yielding a total of R training segments across all fault severity levels and working conditions considered. After pretraining, only the encoder part of each autoencoder is retained, while the decoders are discarded. The encoders are then stacked to build a deep architecture that maps the raw input to an increasingly compact representation. A softmax classification layer is subsequently attached on top of the final hidden representation, and the final SAE network is formed as illustrated in Figure 15. The trainable parameters of the SAE, comprising the encoder weights and biases of all autoencoders together with the softmax parameters, are then jointly estimated (end-to-end fine-tuning) via minimizing the categorical cross-entropy [43] (pp. 525–533).
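The classification head and the fine-tuning loss can be sketched as a numerically stable softmax followed by the mean categorical cross-entropy over a batch of segments. The logits and labels in the example are hypothetical. In practice, these quantities and their gradients would be provided by a deep learning framework.

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax; subtracting the row maximum avoids overflow in exp."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def categorical_cross_entropy(Z, labels):
    """Mean of -log p(true class) over the batch; labels are integer class indices."""
    P = softmax(Z)
    return float(-np.mean(np.log(P[np.arange(Z.shape[0]), labels])))

# Hypothetical logits for 3 segments and 4 severity classes (F25, F50, F75, F100).
Z = np.array([[2.0, 0.5, 0.1, -1.0],
              [0.0, 0.0, 0.0, 0.0],
              [-0.5, 0.3, 3.0, 0.2]])
labels = np.array([0, 1, 2])
loss = categorical_cross_entropy(Z, labels)
print(loss)
```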
The Deep Autoencoder employed for fault detection is trained on 93 vibration signals acquired from the healthy gearbox across 31 rotating speeds and three load levels (see details in Table 1). These signals are the same as those used for the construction of the ‘Healthy Cloud’ in the proposed ML methodology. Each signal is segmented into non-overlapping windows of 1024 samples, which serve as training inputs. The DAE architecture is composed of five layers with the following structure: 1024 (input)–512 (hidden)–128 (bottleneck)–512 (hidden)–1024 (output). Rectified Linear Unit (ReLU) activations are employed in all hidden layers, and a linear activation is used at the output layer to permit reconstruction. Training is performed for 400 epochs using the Adam optimizer (learning rate $10^{-3}$) and a batch size of 64. The window length, batch size, and layer widths are determined through a sensitivity analysis aiming at maximum fault detection performance, in which alternative configurations (1024–512–256–512–1024, 2048–1024–64–128–256, 512–256–64–256–512) and batch sizes (16, 32, 64) are investigated.
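The segmentation into non-overlapping 1024-sample windows can be sketched as below. Trailing samples that do not fill a complete window are dropped; this is an assumption, since the handling of the remainder is not stated. The signal lengths in the example are hypothetical.

```python
import numpy as np

def segment(y, window=1024):
    """Split a 1-D signal into non-overlapping windows (one per row), dropping the remainder."""
    y = np.asarray(y)
    n_windows = len(y) // window
    return y[: n_windows * window].reshape(n_windows, window)

def segment_all(signals, window=1024):
    """Stack the windows of several signals into one training matrix."""
    return np.vstack([segment(y, window) for y in signals])

X = segment_all([np.arange(2500.0), np.arange(3100.0)])   # hypothetical signal lengths
print(X.shape)   # (5, 1024)
```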
The assessment of the DAE fault detection method is based on the same set of signals used for the ML methodology assessment. In particular, performance is assessed using 3660 signals per health state spanning 61 rotating speeds and four load levels (see details in Table 1), thereby covering the full range of working conditions. Results are presented in Figure 16 (left) via the RMS of the residuals (reconstruction error), where a moderate separation between healthy and faulty conditions is observed. This is also confirmed by the confusion matrix in Figure 16 (right part), which indicates fair fault detection capability under variable WCs, yielding an overall accuracy of 84.7%.
The Stacked Autoencoder (SAE) employed for fault severity characterization is trained using the same 372 vibration signals as those used for the construction of the Faulty Clouds of the proposed ML methodology (see Table 1). Each vibration signal is segmented into non-overlapping windows of 1024 samples, yielding training inputs of consistent dimensionality. The SAE architecture consists of three stacked autoencoders and a final softmax classifier. The dimensionality of the layers is selected as 1024–512–128–64, providing an increasingly compact latent representation before the classification stage. The three autoencoders are pretrained independently for 150, 60, and 40 epochs, respectively. The softmax classifier is trained for 200 epochs, and the entire network is then fine-tuned for 150 epochs.
The assessment of the SAE-based fault severity characterization method is performed on test signals covering the complete range of speeds and loads (Table 1). Results are summarized in Figure 17, showing accuracies of 87.1% (F25), 86.2% (F50), 77.5% (F75), and 84.3% (F100), yielding an overall accuracy of 83.8%. The method thus demonstrates generally good classification performance, with higher accuracy at low severities (F25, F50) and fair performance at the high ones (F75, F100). Most misclassifications occur between adjacent severity levels, with the largest confusion observed for F75 (22.5 %), primarily toward F100.

5. Concluding Remarks

A machine learning (ML) methodology for the detection and severity characterization of incipient gear faults under variable working speed and load has been introduced. The methodology is trained on a limited number of vibration signals at a sample of working conditions (WCs) within the range of interest, along with tachometer signals, and can operate at any working condition within that range. The main findings drawn from this study are summarized below:
(a)
The fundamental component of the ML methodology is the accurate parametric modeling of gearbox dynamics within the continuous range of the considered WCs, achieved for the first time through ‘clouds’ of Vector Functionally Pooled AutoRegressive (VFP–AR) models estimated from appropriately angularly resampled vibration signals. This involves a novel procedure that incorporates angular resampling based on dedicated computed order tracking and special filtering for different nominally constant rotating speeds.
(b)
Fault diagnosis in the inspection phase, including fault detection and severity characterization, relies on a low-complexity whiteness hypothesis testing procedure applied to the VFP–AR model residuals, enabling real-time implementation.
(c)
The methodology’s performance has been rigorously assessed through thousands of experiments with a single-stage spur gearbox, demonstrating high effectiveness in detecting and characterizing incipient gear faults that leave no visible imprints in time-domain signals and whose frequency-domain effects significantly overlap with those induced by the different WCs. The ML methodology achieves an overall 95.4% accuracy in fault detection and 91.6% in fault severity characterization.
(d)
A comparative assessment of the ML methodology confirmed that it outperforms a state-of-the-art deep Stacked Autoencoder (SAE)-based alternative, which achieved 84.7% accuracy in fault detection and 83.8% in severity characterization. Furthermore, the VFP–AR-based ML methodology offers clear insights into modeling and diagnosis decisions with full transparency and interpretability due to the direct relation of the model parameters with the physical characteristics of the gearbox, unlike the SAE-based ‘black box’ counterpart.
Despite the promising performance of the VFP–AR-based ML methodology, its effectiveness warrants further investigation on more complex gearboxes. Additionally, ongoing research aims to enhance diagnosis accuracy by enriching the model clouds with physics-informed data-driven statistical time series models.

Author Contributions

Conceptualization, D.M.B. and J.S.S.; methodology, D.M.B. and J.S.S.; software, D.M.B.; validation, D.M.B.; formal analysis, D.M.B.; investigation, D.M.B. and J.S.S.; resources, D.M.B. and J.S.S.; data curation, D.M.B.; writing—original draft preparation, D.M.B.; writing—review and editing, D.M.B. and J.S.S.; visualization, D.M.B.; supervision, J.S.S.; project administration, J.S.S.; funding acquisition, J.S.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research work was supported by the Hellenic Foundation for Research and Innovation (HFRI) under the 4th Call for HFRI PhD Fellowships (Fellowship Number 10820).

Data Availability Statement

Dataset available on request from the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ACF: Autocorrelation Function
AE: Autoencoder
AR: Autoregressive
BIC: Bayesian Information Criterion
DAE: Deep Autoencoder
DL: Deep Learning
DTL: Deep Transfer Learning
EMD: Empirical Mode Decomposition
FFT: Fast Fourier Transform
FPR: False Positive Rate
GMF: Gear Mesh Frequency
ML: Machine Learning
MSE: Mean Squared Error
NLS: Nonlinear Least Squares
NN: Neural Network
OLS: Ordinary Least Squares
psd: power spectral density
RMS: Root Mean Square
ROC: Receiver Operating Characteristic
RSS: Residual Sum of Squares
SAE: Stacked Autoencoder
TPR: True Positive Rate
VFP–AR: Vector Functionally Pooled Autoregressive
WCs: Working Conditions

References

  1. Chen, J.; Lin, C.; Peng, D.; Ge, H. Fault Diagnosis of Rotating Machinery: A Review and Bibliometric Analysis. IEEE Access 2020, 8, 224985–225003.
  2. Feng, K.; Ji, J.C.; Ni, Q.; Beer, M. A review of vibration-based gear wear monitoring and prediction techniques. Mech. Syst. Signal Process. 2023, 182, 109605.
  3. Lu, S.; He, Q.; Wang, J. A review of stochastic resonance in rotating machine fault detection. Mech. Syst. Signal Process. 2019, 116, 230–260.
  4. Matania, O.; Dattner, I.; Bortman, J.; Kenett, R.S.; Parmet, Y. A systematic literature review of deep learning for vibration-based fault diagnosis of critical rotating machinery: Limitations and challenges. J. Sound Vib. 2024, 590, 118562.
  5. Hünemohr, D.; Litzba, J.; Rahimi, F. Usage Monitoring of Helicopter Gearboxes with ADS-B Flight Data. Aerospace 2022, 9, 647.
  6. Lei, Y.; Zuo, M.J. Gear crack level identification based on weighted K nearest neighbor classification algorithm. Mech. Syst. Signal Process. 2009, 23, 1535–1547.
  7. Lei, Y.; Zuo, M.J.; He, Z.; Zi, Y. A multidimensional hybrid intelligent method for gear fault diagnosis. Expert Syst. Appl. 2010, 37, 1419–1430.
  8. Xie, J.; Zhang, L.; Duan, L.; Wang, J. On cross-domain feature fusion in gearbox fault diagnosis under various operating conditions based on Transfer Component Analysis. In Proceedings of the IEEE International Conference on Prognostics and Health Management (ICPHM), Ottawa, ON, Canada, 20–22 June 2016; pp. 1–6.
  9. Tayyab, S.M.; Chatterton, S.; Pennacchi, P. Fault Detection and Severity Level Identification of Spiral Bevel Gears under Different Operating Conditions Using Artificial Intelligence Techniques. Machines 2021, 9, 173.
  10. Wang, D. K-nearest neighbors based methods for identification of different gear crack levels under different motor speeds and loads: Revisited. Mech. Syst. Signal Process. 2016, 70, 201–208.
  11. Boškoski, P.; Juričić, Đ. Fault detection of mechanical drives under variable operating conditions based on wavelet packet Rényi entropy signatures. Mech. Syst. Signal Process. 2012, 31, 369–381.
  12. Tabrizi, A.; Garibaldi, L.; Fasana, A.; Marchesiello, S. Early damage detection of roller bearings using wavelet packet decomposition, ensemble empirical mode decomposition and support vector machine. Meccanica 2015, 50, 865–874.
  13. Liu, R.; Yang, B.; Zio, E.; Chen, X. Artificial intelligence for fault diagnosis of rotating machinery: A review. Mech. Syst. Signal Process. 2018, 108, 33–47.
  14. Gao, T.; Yang, J.; Wang, W.; Fan, X. A domain feature decoupling network for rotating machinery fault diagnosis under unseen operating conditions. Reliab. Eng. Syst. Saf. 2024, 252, 110449.
  15. Shi, Y.; Deng, A.; Deng, M.; Xu, M.; Liu, Y.; Ding, X.; Bian, W. Domain augmentation generalization network for real-time fault diagnosis under unseen working conditions. Reliab. Eng. Syst. Saf. 2023, 235, 109188.
  16. Zhao, C.; Shen, W. A domain generalization network combing invariance and specificity towards real-time intelligent fault diagnosis. Mech. Syst. Signal Process. 2022, 173, 108990.
  17. Liu, X.; Sun, W.; Li, H.; Li, Q.; Ma, Z.; Yang, C. Unknown working condition fault diagnosis of rotate machine without training sample based on local fault semantic attribute. Adv. Eng. Inform. 2024, 61, 102515.
  18. Zhang, M.; Wang, D.; Lu, W.; Yang, J.; Li, Z.; Liang, B. A Deep Transfer Model With Wasserstein Distance Guided Multi-Adversarial Networks for Bearing Fault Diagnosis Under Different Working Conditions. IEEE Access 2019, 7, 65303–65318.
  19. Huang, Y.; Hu, X.; Wang, H.; He, Y.; Cao, J. OAIFAN: A Noise-Robust Discriminative Feature Unification Framework for Cross-Speed Fault Transfer Diagnosis. IEEE Trans. Instrum. Meas. 2025, 74, 1–18.
  20. Yan, S.; Shao, H.; Min, Z.; Peng, J.; Cai, B.; Liu, B. FGDAE: A new machinery anomaly detection method towards complex operating conditions. Reliab. Eng. Syst. Saf. 2023, 236, 109319.
  21. Niu, M.; Jiang, H.; Wu, Z.; Shao, H. An enhanced sparse autoencoder for machinery interpretable fault diagnosis. Meas. Sci. Technol. 2024, 35, 055108.
  22. Rao, M.; Zuo, M.J.; Tian, Z. A speed normalized autoencoder for rotating machinery fault detection under varying speed conditions. Mech. Syst. Signal Process. 2023, 189, 109109.
  23. Shao, H.; Jiang, H.; Zhao, H.; Wang, F. A novel deep autoencoder feature learning method for rotating machinery fault diagnosis. Mech. Syst. Signal Process. 2017, 95, 187–204.
  24. Lu, C.; Wang, Z.Y.; Qin, W.L.; Ma, J. Fault diagnosis of rotary machinery components using a stacked denoising autoencoder-based health state identification. Signal Process. 2017, 130, 377–388.
  25. Pang, S.; Yang, X. A Cross-Domain Stacked Denoising Autoencoders for Rotating Machinery Fault Diagnosis Under Different Working Conditions. IEEE Access 2019, 7.
  26. Pang, S. Stacked maximum independence autoencoders: A domain generalization approach for fault diagnosis under various working conditions. Mech. Syst. Signal Process. 2024, 208, 111035.
  27. Qi, Y.; Shen, C.; Wang, D.; Shi, J.; Jiang, X.; Zhu, Z. Stacked Sparse Autoencoder-Based Deep Network for Fault Diagnosis of Rotating Machinery. IEEE Access 2017, 5, 15066–15079.
  28. Pang, S.; Yang, X. Intelligent fault diagnosis among different rotating machines using novel stacked transfer auto-encoder optimized by PSO. ISA Trans. 2020, 105, 308–319.
  29. Brito, L.C.; Susto, G.A.; Brito, J.N.; Duarte, M.A. An explainable artificial intelligence approach for unsupervised fault detection and diagnosis in rotating machinery. Mech. Syst. Signal Process. 2022, 163, 108105.
  30. Avendaño-Valencia, L.D.; Fassois, S.D. Damage/fault diagnosis in an operating wind turbine under uncertainty via a vibration response Gaussian mixture random coefficient model based framework. Mech. Syst. Signal Process. 2017, 91, 326–353.
  31. Bourdalos, D.M.; Sakellariou, J.S. Vibration-based unsupervised detection of common faults in rotating machinery under varying operating speeds. In Proceedings of the Surveillance, Vibrations, Shock and Noise, Institut Supérieur de l’Aéronautique et de l’Espace [ISAE-SUPAERO], Toulouse, France, 10–13 July 2023.
  32. Chen, Y.; Li, Z.; Jiang, Y.; Gong, D.; Zhou, K. Sparse LPV-ARMA model for non-stationary vibration representation and its application on gearbox tooth crack detection under variable speed conditions. Mech. Syst. Signal Process. 2025, 224, 112161.
  33. Chen, Y.; Schmidt, S.; Heyns, P.S.; Zuo, M.J. A time series model-based method for gear tooth crack detection and severity assessment under random speed variation. Mech. Syst. Signal Process. 2021, 156, 107605.
  34. Chen, Y.; Zuo, M.J. A sparse multivariate time series model-based fault detection method for gearboxes under variable speed condition. Mech. Syst. Signal Process. 2022, 167, 108539.
  35. Braun, S. The Extraction of Periodic Waveforms by Time Domain Averaging. Acustica 1975, 32, 69–77.
  36. Wang, W.; Wong, A.K. Autoregressive model-based gear fault diagnosis. J. Vib. Acoust. 2002, 124, 172–179.
  37. Zhan, Y.; Mechefske, C.K. Robust detection of gearbox deterioration using compromised autoregressive modeling and Kolmogorov-Smirnov test statistic. Part II: Experiment and application. Mech. Syst. Signal Process. 2007, 21, 1983–2011.
  38. Yang, M.; Makis, V. ARX model-based gearbox fault detection and localization under varying load conditions. J. Sound Vib. 2010, 329, 5209–5221.
  39. Lin, C.; Makis, V. Application of Vector Time Series Modeling and T-squared Control Chart to Detect Early Gearbox Deterioration. Int. J. Perform. Eng. 2014, 10, 105–114.
  40. Li, X.; Zuo, H.; Hao, P.; Su, Y.; Liu, H.; Xue, C. Early Fault Detection of Gearbox Using TSA and VAR Model Considering Load Variation. In Proceedings of the 2021 Global Reliability and Prognostics and Health Management (PHM-Nanjing), Nanjing, China, 15–17 October 2021; pp. 1–6.
  41. Bourdalos, D.; Sakellariou, J. A statistical time series model-based method for robust detection of incipient faults in rotating machinery under different operating conditions. Mech. Syst. Signal Process. 2025, 238, 113204.
  42. Duda, R.O.; Hart, P.E.; Stork, D.G. Pattern Classification; John Wiley and Sons: Hoboken, NJ, USA, 2001; pp. 34–35.
  43. Bishop, C.M. Pattern Recognition and Machine Learning; Springer Science + Business Media: Berlin/Heidelberg, Germany, 2006.
  44. Endo, H.; Randall, R. Enhancement of autoregressive model based gear tooth fault detection technique by the use of minimum entropy deconvolution filter. Mech. Syst. Signal Process. 2007, 21, 906–919.
  45. Sakellariou, J.S.; Fassois, S.D. Functionally Pooled models for the global identification of stochastic systems under different pseudo-static operating conditions. Mech. Syst. Signal Process. 2016, 72, 785–807.
  46. Magnus, J.R.; Neudecker, H. Matrix Differential Calculus; John Wiley and Sons: Hoboken, NJ, USA, 1988.
  47. Ljung, L. System Identification: Theory for the User, 2nd ed.; Prentice Hall Information and System Sciences Series; Prentice Hall PTR: Englewood Cliffs, NJ, USA, 1999.
  48. Haupt, R.; Haupt, S. Practical Genetic Algorithms, 2nd ed.; John Wiley and Sons: Hoboken, NJ, USA, 2004.
  49. Farrar, C.R.; Worden, K. Structural Health Monitoring: A Machine Learning Perspective; John Wiley and Sons: Hoboken, NJ, USA, 2012.
  50. Brent, R.P. Algorithms for Minimization Without Derivatives; Prentice-Hall: Englewood Cliffs, NJ, USA, 1973.
  51. Sakellariou, J.; Fassois, S. Vibration based fault detection and identification in an aircraft skeleton structure via a stochastic functional model based method. Mech. Syst. Signal Process. 2008, 22, 557–573.
  52. Aravanis, T.C.I.; Sakellariou, J.S.; Fassois, S.D. A stochastic Functional Model based method for random vibration based robust fault detection under variable non-measurable operating conditions with application to railway vehicle suspensions. J. Sound Vib. 2020, 466, 115006.
  53. Girdhar, P.; Scheffer, C. Chapter 5: Machinery fault diagnosis using vibration analysis. In Practical Machinery Vibration Analysis and Predictive Maintenance; Girdhar, P., Scheffer, C., Eds.; Newnes: Oxford, UK, 2004; pp. 89–133.
  54. Sigonde, V.C.; Sozinando, D.F.; Tchomeni, B.X.; Alugongo, A.A. Coupled Nonlinear Dynamic Modeling and Experimental Investigation of Gear Transmission Error for Enhanced Fault Diagnosis in Single-Stage Spur Gear Systems. Dynamics 2025, 5, 37.
  55. Li, X.; Chen, K.; Huangfu, Y.; Ma, H.; Zhao, B.; Yu, K. Vibration characteristic analysis of spur gear systems under tooth crack or fracture. J. Low Freq. Noise Vib. Act. Control 2021, 40, 135–153.
  56. Korolis, J.; Bourdalos, D.; Sakellariou, J. Machine Learning-Based Damage Diagnosis in Floating Wind Turbines Using Vibration Signals: A Lab-Scale Study Under Different Wind Speeds and Directions. Sensors 2025, 25, 1170.
  57. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
Figure 1. Flowchart of the ML fault detection and severity characterization methodology.
Figure 2. Example of angular resampling and speed-subrange selection: (a) frequency spectra of the time-domain signals at the five speeds considered; (b) order spectra obtained using a common angular resampling frequency $f_s^{\theta} = f_s/\omega_5$; (c,e) frequency spectra for two adjacent speed subranges when frequencies up to a maximum frequency of interest $f_H$ must be retained; (d,f) the respective order spectra based on subrange-specific angular resampling frequencies. The yellow shaded regions indicate the bandwidths to be filtered out in each case prior to angular resampling.
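A minimal sketch of the angular resampling step in Figure 2, assuming a constant shaft speed and a shaft-angle signal already derived from the tachometer (both simplifications), and using plain linear interpolation. The angular sampling rate is taken as $f_s$ divided by the highest speed of the subrange, which matches $f_s^{\theta} = f_s/\omega_5$ in Figure 2 and gives 512 samples/rev for the 16–20 rev/s subrange. The helper name `angular_resample` is hypothetical.

```python
import numpy as np

FS = 10_240       # sampling frequency (Hz), Table 1
DURATION = 5.0    # record length (s), Table 1


def angular_resample(x, shaft_angle_rev, fs_theta, n_rev):
    """Linearly interpolate x(t) onto a uniform shaft-angle grid.

    shaft_angle_rev: shaft angle (revolutions) at each time sample, increasing.
    fs_theta: angular sampling rate (samples/rev); n_rev: whole revolutions kept.
    A low-pass (anti-aliasing) filter would normally precede this step; omitted here.
    """
    theta_grid = np.arange(n_rev * fs_theta) / fs_theta
    return np.interp(theta_grid, shaft_angle_rev, x)


speed = 16.0                              # rev/s, held constant for simplicity
t = np.arange(int(FS * DURATION)) / FS
angle = speed * t                         # stand-in for the tachometer-derived angle
x = np.sin(2 * np.pi * 3 * angle)         # synthetic 3rd-order component (48 Hz here)

fs_theta = FS // 20                       # f_s / highest speed of the 16-20 rev/s subrange
n_rev = int(angle[-1])                    # whole revolutions inside the record (79)
x_theta = angular_resample(x, angle, fs_theta, n_rev)

spectrum = np.abs(np.fft.rfft(x_theta))
orders = np.fft.rfftfreq(len(x_theta), d=1.0 / fs_theta)  # resolution: 1/n_rev orders
print(f"dominant order: {orders[np.argmax(spectrum)]:.3f}")  # dominant order: 3.000
```

Because a whole number of revolutions is kept, the 3rd-order component falls on a single bin of the order spectrum whatever the shaft speed, which is what makes order spectra comparable across speeds.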
Figure 3. Flowchart of the ML fault detection and severity characterization methodology training and inspection (real-time) phases.
Figure 4. The experimental set-up: (a) photo of the gearbox showing the sensor (accelerometer and tachometer) locations; (b) cross-sectional view of the single-stage gearbox; (c) the pinion single-tooth fault scenarios F25, F50, F75 and F100.
Figure 5. Time-domain vibration signals (one per health state) at an indicative speed of 20 rev/s and the 2nd load level: (a) Healthy, (b) Healthy & F25, (c) Healthy & F50, (d) Healthy & F75, (e) Healthy & F100.
Figure 6. Values of time-domain features commonly used for fault diagnosis, estimated from vibration signals of the healthy and faulty gearbox across all 61 speeds and 4 loads. Each subplot corresponds to a single feature and each color to a different health state; 1220 signals per plot (244 per health state, one per speed and load combination). The two black dashed horizontal lines in each subplot denote the minimum and maximum values of the corresponding feature for the healthy state.
Figure 7. Order-spectrum zones based on Fast Fourier Transform (FFT) amplitude estimates, using a single angular vibration signal per rotating speed and load (244 angular signals per health state): (a) Healthy vs. fault F25, (b) Healthy vs. fault F50, (c) Healthy vs. fault F75, (d) Healthy vs. fault F100. The black dashed lines mark the Gear Mesh Frequencies (GMFs); the zoomed views focus on 1×, 2× and 3× GMF.
Figure 8. Representative VFP–AR model identification for the healthy state in the fourth speed subrange (model No. 4 of the healthy cloud; see Table 3): (a) model order determination via the Bayesian Information Criterion (BIC; blue box plots) together with the Residual Sum of Squares (RSS; red box plots), computed from 33 training signals (one per speed and load); (b) model validation through the residual Autocorrelation Function (ACF) for an indicative signal from the training set; (c,d) indicative parameters ($a_1$ and $a_3$) of the identified VFP–AR model as functions of rotating speed and load level.
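Figure 8(a) selects the model order by BIC while also tracking the RSS. A minimal sketch of that idea for a plain scalar AR model at a single operating point (not the full VFP–AR model, whose parameter count would be $n_a \cdot p$ rather than $n_a$) is shown below. It assumes the common Gaussian form BIC = N ln(RSS/N) + d ln N and uses synthetic AR(2) data; the function names are hypothetical.

```python
import numpy as np


def ar_rss(x, na, max_order):
    """OLS fit of x[t] + a_1 x[t-1] + ... + a_na x[t-na] = e[t].

    Every order uses the same targets x[max_order:], so RSS and BIC values are
    computed on an identical number of samples and are directly comparable.
    Returns (a, rss, n_eff).
    """
    y = x[max_order:]
    phi = np.column_stack([-x[max_order - i:len(x) - i] for i in range(1, na + 1)])
    a, *_ = np.linalg.lstsq(phi, y, rcond=None)
    resid = y - phi @ a
    return a, float(resid @ resid), len(y)


def bic(rss, n_eff, n_params):
    """Gaussian-likelihood BIC: N ln(RSS/N) + d ln N."""
    return n_eff * np.log(rss / n_eff) + n_params * np.log(n_eff)


# Synthetic AR(2) signal (hypothetical data): x[t] - 1.5 x[t-1] + 0.75 x[t-2] = e[t]
rng = np.random.default_rng(0)
e = rng.standard_normal(5000)
x = np.zeros_like(e)
for t in range(2, len(e)):
    x[t] = 1.5 * x[t - 1] - 0.75 * x[t - 2] + e[t]

max_order = 8
results = {na: ar_rss(x, na, max_order) for na in range(1, max_order + 1)}
bics = {na: bic(rss, n, na) for na, (_, rss, n) in results.items()}
best = min(bics, key=bics.get)
print("selected order:", best, "a =", np.round(results[best][0], 3))
```

RSS keeps decreasing as the order grows, while the ln N penalty in BIC stops the selection once extra lags no longer pay for themselves, which is why the two curves are plotted together.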
Figure 9. VFP–AR model-based power spectral density (psd) magnitude estimates for the healthy state as a function of rotating speed at constant load: (a–d) correspond to the 1st–4th speed ranges, respectively.
Figure 10. VFP–AR model-based power spectral density (psd) magnitude estimates for the healthy state as a function of load at four constant speeds, one from each speed range: (a–d) correspond to the 1st–4th speed ranges, respectively.
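For a fixed speed and load, a VFP–AR model reduces to an ordinary AR model whose coefficients $a_i$ follow from the functional expansion, so psd surfaces like those in Figures 9 and 10 can be obtained by evaluating the standard AR spectrum $S(\omega) = \sigma_e^2 / |1 + \sum_{i=1}^{n_a} a_i e^{-\mathrm{j}\omega i}|^2$ at each operating point. The sketch below does this for a single hypothetical AR(2) coefficient set (not taken from the identified models), with the frequency axis in orders because the signals are angularly resampled; normalization constants are omitted and the name `ar_psd` is hypothetical.

```python
import numpy as np


def ar_psd(a, sigma2, fs_theta, n_points=4096):
    """psd of x[t] + sum_i a_i x[t-i] = e[t] (Var e = sigma2), up to a constant factor.

    Evaluates sigma2 / |1 + sum_i a_i exp(-j w i)|^2 for w in [0, pi] rad/sample
    and returns the frequency axis in orders (cycles/rev) for angle-domain data.
    """
    a = np.asarray(a, dtype=float)
    w = np.linspace(0.0, np.pi, n_points)
    lags = np.arange(1, len(a) + 1)
    A = 1.0 + np.exp(-1j * np.outer(w, lags)) @ a
    orders = w / (2.0 * np.pi) * fs_theta
    return orders, sigma2 / np.abs(A) ** 2


# Hypothetical AR(2) coefficients standing in for a VFP-AR model evaluated at one
# (speed, load): a pole pair of radius 0.95 placed at order 64 for 512 samples/rev.
fs_theta, r, order0 = 512, 0.95, 64
w0 = 2 * np.pi * order0 / fs_theta
a = [-2 * r * np.cos(w0), r**2]
orders, psd = ar_psd(a, sigma2=1.0, fs_theta=fs_theta)
print(f"psd peak at order {orders[np.argmax(psd)]:.2f}")
```

Repeating the evaluation over a grid of speeds (or loads) with the corresponding coefficient values would stack such curves into a surface.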
Figure 11. Fault detection results of the proposed ML methodology: (a–d) correspond to the 1st–4th speed ranges; (a1–d1) Peña–Rodríguez D statistic and (a2–d2) the corresponding ROC curves. In total, 18,300 test signals (inspection phase) from the healthy and faulty gearbox under 61 rotating speeds and 4 loads.
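The ROC curves in Figure 11 summarize how the true and false positive rates of a threshold-based detector trade off as the threshold moves. A minimal sketch of an empirical ROC and its area under the curve follows, for any scalar statistic where larger values indicate a fault; the statistic values are hypothetical placeholders, not results from the study, and the Peña–Rodríguez D computation itself is not reproduced.

```python
import numpy as np


def empirical_roc(stat_healthy, stat_faulty):
    """ROC of the rule 'declare fault if statistic >= threshold'.

    Sweeps the threshold from +inf down through every observed value and returns
    (FPR, TPR, AUC), with the AUC computed by the trapezoidal rule.
    """
    h = np.asarray(stat_healthy, dtype=float)
    f = np.asarray(stat_faulty, dtype=float)
    thresholds = np.concatenate([[np.inf], np.unique(np.concatenate([h, f]))[::-1]])
    fpr = np.array([(h >= th).mean() for th in thresholds])
    tpr = np.array([(f >= th).mean() for th in thresholds])
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
    return fpr, tpr, auc


# Hypothetical statistic values (larger = residuals less consistent with the healthy model).
healthy = [0.8, 1.1, 1.3, 2.0]
faulty = [1.9, 2.5, 3.1, 4.0]
fpr, tpr, auc = empirical_roc(healthy, faulty)
print(f"AUC = {auc:.4f}")  # AUC = 0.9375
```

Each point of the curve corresponds to one candidate threshold; picking an operating point on it fixes the false alarm rate that the detector is allowed.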
Figure 12. Fault detection results of the ML methodology shown as a confusion matrix (correct classifications in green, misclassifications in red; 18,300 inspection (test) signals).
Figure 13. Fault severity characterization results of the ML methodology shown as a confusion matrix (correct classifications in green, misclassifications in red; 14,640 inspection (test) signals).
Figure 14. Illustration of an indicative Deep Autoencoder (DAE) architecture.
Figure 15. Illustration of an indicative deep Stacked Autoencoder (SAE) architecture.
Figure 16. Fault detection results of the Deep Autoencoder (DAE) method: (left) residual RMS scatter plot and (right) the corresponding confusion matrix. In total, 18,300 test signals (inspection phase) from the healthy and faulty gearbox under 61 rotating speeds and 4 loads; correct classifications in green, misclassifications in red.
Figure 17. Fault severity characterization results of the deep Stacked Autoencoder (SAE) method shown as a confusion matrix (correct classifications in green, misclassifications in red; 14,640 inspection (test) signals).
Table 1. Details on the vibration signals used in the training and inspection phases.

| Gearbox State | Rotating Speed (rev/s) | Load Level | No. of Signals* per Speed & Load | No. of Different Speeds | No. of Signals per State |
|---|---|---|---|---|---|
| Training (learning) phase | | | | | |
| Healthy | {10, 10.5, …, 25} (step of 0.5) | [1, 2, 4] | 1 | 31 | 93 |
| F25/F50/F75/F100 | as for Healthy | as for Healthy | as for Healthy | as for Healthy | 93 |
| Inspection (testing) phase | | | | | |
| Healthy | {10, 10.25, …, 25} (step of 0.25) | [1, 2, 3, 4] | 15 | 61 | 3660 |
| F25/F50/F75/F100 | as for Healthy | as for Healthy | as for Healthy | as for Healthy | 3660 |

* Sampling frequency: $f_s$ = 10,240 Hz; signal length: N = 51,200 samples (5 s); frequency bandwidth: BW = [0–5120] Hz. Total No. of signals = 18,765; training signals = 465 (2.5%); inspection (test) signals = 18,300 (97.5%).
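The signal counts in Table 1 follow directly from the speed grids, the load levels and the number of repetitions per operating point. The short check below rebuilds them; variable names are illustrative.

```python
# Speed grids and load levels as listed in Table 1.
training_speeds = [10 + 0.5 * i for i in range(31)]      # 10, 10.5, ..., 25 rev/s
inspection_speeds = [10 + 0.25 * i for i in range(61)]   # 10, 10.25, ..., 25 rev/s
training_loads = [1, 2, 4]                               # load level 3 is not used for training
inspection_loads = [1, 2, 3, 4]
states = ["Healthy", "F25", "F50", "F75", "F100"]

train_per_state = len(training_speeds) * len(training_loads) * 1      # 1 signal per speed & load
test_per_state = len(inspection_speeds) * len(inspection_loads) * 15  # 15 signals per speed & load
n_train = train_per_state * len(states)
n_test = test_per_state * len(states)
n_total = n_train + n_test

fs, n_samples = 10_240, 51_200
print(train_per_state, test_per_state, n_train, n_test, n_total)  # 93 3660 465 18300 18765
print(f"training share {100 * n_train / n_total:.1f}%, record length {n_samples / fs} s, "
      f"Nyquist {fs / 2:.0f} Hz")
```

The second line reports a 2.5% training share, a 5.0 s record and a 5120 Hz Nyquist limit, matching the table footnote.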
Table 2. Details on the selected speed ranges.

| Range No. | Speeds (rev/s) | No. of Speeds | $f_s^{\theta}$ (samples/rev) | No. of Rotations |
|---|---|---|---|---|
| 1 | [10, 10.25, …, 13] | 13 | 787 | 49 |
| 2 | [13, 13.25, …, 16] | 13 | 640 | 64 |
| 3 | [16, 16.25, …, 20] | 17 | 512 | 79 |
| 4 | [20, 20.25, …, 25] | 21 | 409 | 99 |
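The $f_s^{\theta}$ and rotation columns of Table 2 can be reproduced under two assumptions that are inferred from the tabulated values rather than stated in this section: $f_s^{\theta}$ is $f_s$ divided by the highest speed of the subrange, rounded down (consistent with $f_s^{\theta} = f_s/\omega_5$ in Figure 2), and the number of rotations is the count of whole revolutions completed within a 5 s record at the lowest speed of the subrange.

```python
FS = 10_240        # sampling frequency (Hz), Table 1
N_SAMPLES = 51_200 # samples per record (5 s), Table 1
STEP = 0.25        # inspection speed grid step (rev/s)
SUBRANGES = [(10, 13), (13, 16), (16, 20), (20, 25)]  # rev/s, Table 2

rows = []
for lo, hi in SUBRANGES:
    n_speeds = round((hi - lo) / STEP) + 1
    # Assumption: fs_theta = floor(FS / highest speed), so no speed in the subrange
    # is sampled more densely per revolution after resampling than it was in time.
    fs_theta = FS // hi
    # Assumption: whole revolutions completed by the last sample at the lowest speed.
    n_rev = (lo * (N_SAMPLES - 1)) // FS
    rows.append((lo, hi, n_speeds, fs_theta, n_rev))
    print(f"[{lo}, {hi}] rev/s: {n_speeds} speeds, {fs_theta} samples/rev, {n_rev} rotations")
```

The three subrange boundaries (13, 16 and 20 rev/s) are shared by adjacent ranges, which is why the speed counts sum to 64 although only 61 distinct speeds are tested.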
Table 3. Estimation details for each cloud of VFP–AR models.

| Model No. | Speed Range (rev/s) | Model | Samples per Parameter | Condition Number |
|---|---|---|---|---|
| Healthy cloud | | | | |
| 1 | [10, 13] | VFP–AR(340)₉ | 264.65 | 9.17 × 10⁶ |
| 2 | [13, 16] | VFP–AR(360)₈ | 298.67 | 1.69 × 10³ |
| 3 | [16, 20] | VFP–AR(310)₁₃ | 274.42 | 3.75 × 10⁵ |
| 4 | [20, 25] | VFP–AR(330)₁₁ | 368.10 | 2.84 × 10⁴ |
| F25 cloud | | | | |
| 5 | [10, 13] | VFP–AR(310)₁₀ | 261.23 | 3.19 × 10⁶ |
| 6 | [13, 16] | VFP–AR(330)₁₁ | 236.96 | 7.39 × 10⁶ |
| 7 | [16, 20] | VFP–AR(270)₁₄ | 292.57 | 5.16 × 10⁵ |
| 8 | [20, 25] | VFP–AR(290)₁₀ | 460.76 | 1.16 × 10³ |
| F50 cloud | | | | |
| 9 | [10, 13] | VFP–AR(300)₁₄ | 192.82 | 1.78 × 10⁵ |
| 10 | [13, 16] | VFP–AR(340)₁₁ | 229.99 | 4.91 × 10⁵ |
| 11 | [16, 20] | VFP–AR(260)₁₆ | 265.85 | 1.43 × 10³ |
| 12 | [20, 25] | VFP–AR(240)₉ | 618.61 | 1.77 × 10⁵ |
| F75 cloud | | | | |
| 13 | [10, 13] | VFP–AR(290)₁₀ | 279.25 | 1.11 × 10⁵ |
| 14 | [13, 16] | VFP–AR(300)₁₄ | 204.80 | 2.45 × 10³ |
| 15 | [16, 20] | VFP–AR(260)₁₄ | 303.82 | 4.82 × 10⁶ |
| 16 | [20, 25] | VFP–AR(250)₁₄ | 381.77 | 3.06 × 10⁶ |
| F100 cloud | | | | |
| 17 | [10, 13] | VFP–AR(300)₁₁ | 245.40 | 1.18 × 10⁴ |
| 18 | [13, 16] | VFP–AR(270)₉ | 353.98 | 5.01 × 10⁵ |
| 19 | [16, 20] | VFP–AR(270)₁₄ | 288.91 | 3.64 × 10⁵ |
| 20 | [20, 25] | VFP–AR(230)₁₃ | 446.89 | 1.53 × 10⁶ |

Estimation method: Ordinary Least Squares. Genetic Algorithm options: population = 110, crossover fraction = 0.8, elite count = 30, tolerance of the objective function = $10^{-10}$, maximum generations = 1500. Functional basis: bivariate shifted Legendre (orthogonal) polynomials of total degree i + j, obtained as tensor products of two univariate polynomials of degrees i and j, respectively.
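A minimal sketch of how a functionally pooled AR model of this kind can be estimated by Ordinary Least Squares on a tensor-product shifted Legendre basis. It assumes the usual Functionally Pooled AR form for a scalar signal, $x_k[t] + \sum_{i=1}^{n_a} a_i(k)\,x_k[t-i] = e_k[t]$ with $a_i(k) = \sum_{j=1}^{p} a_{i,j} G_j(k)$, where the operating point k = (speed, load) is normalized to [0, 1], and reads VFP–AR(n_a)_p as $n_a$ lags each expanded on p basis functions. The basis pairs are fixed by hand (the genetic-algorithm structure search is not reproduced), the data are synthetic, and all function names are hypothetical. The samples-per-parameter helper assumes pooled samples = training signals × rotations × $f_s^{\theta}$ (Table 2) divided by $n_a \cdot p$; under that assumption it reproduces 264.65 for Model No. 1 and 368.10 for Model No. 4.

```python
import numpy as np
from numpy.polynomial import legendre


def shifted_legendre(n, x):
    """Shifted Legendre polynomial of degree n on [0, 1], i.e. P_n(2x - 1)."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0
    return legendre.legval(2.0 * np.asarray(x, dtype=float) - 1.0, coeffs)


def basis(k, beta, degree_pairs):
    """Tensor-product basis G_j(k, beta) = P_i(k) * P_l(beta) for each (i, l) pair.

    k (speed) and beta (load) are assumed normalized to [0, 1].
    """
    return np.array([shifted_legendre(i, k) * shifted_legendre(l, beta)
                     for i, l in degree_pairs])


def fit_vfp_ar(signals, ops, na, degree_pairs):
    """Pooled OLS estimate of the (na, p) coefficients a_{i,j} in
        x_k[t] + sum_i a_i(k) x_k[t-i] = e_k[t],  a_i(k) = sum_j a_{i,j} G_j(k),
    stacking the regressions of all operating points k = (speed, load).
    """
    blocks, targets = [], []
    for x, (k, beta) in zip(signals, ops):
        g = basis(k, beta, degree_pairs)  # (p,)
        lagged = np.column_stack([-x[na - i:len(x) - i] for i in range(1, na + 1)])
        blocks.append(np.einsum("ti,j->tij", lagged, g).reshape(len(lagged), -1))
        targets.append(x[na:])
    theta, *_ = np.linalg.lstsq(np.vstack(blocks), np.concatenate(targets), rcond=None)
    return theta.reshape(na, len(degree_pairs))


def samples_per_parameter(n_signals, samples_per_signal, na, p):
    """Pooled samples divided by the na * p AR coefficients."""
    return n_signals * samples_per_signal / (na * p)


# Synthetic check (hypothetical data): a_1 depends on speed only,
# a_1(k) = -0.5 * G_(0,0) - 0.3 * G_(1,0) = -0.5 - 0.3 * (2k - 1).
rng = np.random.default_rng(1)
pairs = [(0, 0), (1, 0)]
true = np.array([[-0.5, -0.3]])
signals, ops = [], []
for k in (0.0, 0.5, 1.0):
    for beta in (0.0, 1.0):
        a1 = true[0] @ basis(k, beta, pairs)
        e = rng.standard_normal(4000)
        x = np.zeros_like(e)
        for t in range(1, len(e)):
            x[t] = -a1 * x[t - 1] + e[t]
        signals.append(x)
        ops.append((k, beta))

est = fit_vfp_ar(signals, ops, na=1, degree_pairs=pairs)
print("estimated a_{1,j}:", np.round(est, 3))

# Model No. 1: 21 training signals (7 speeds x 3 loads), 49 rev x 787 samples/rev, VFP-AR(340)_9.
print(round(samples_per_parameter(21, 49 * 787, 340, 9), 2))  # 264.65
```

Pooling all operating points into one regression is what lets a single coefficient set describe the whole speed and load range; the condition number column in Table 3 reflects how well-posed that stacked regression is.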
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Bourdalos, D.M.; Sakellariou, J.S. A Machine Learning Vibration-Based Methodology for Robust Detection and Severity Characterization of Gear Incipient Faults Under Variable Working Speed and Load. Machines 2026, 14, 9. https://doi.org/10.3390/machines14010009
