Advances, Trends and Challenges for Determining the Condition of Railway Rolling Stock Using Automatic Classifiers: A Systematic Review

Junquera, Enrique; Pérez-Carrera, Carlos; Rubio, Higinio; Bustos, Alejandro

doi:10.3390/electronics15071381

¹ MAQLAB Research Group, Department of Mechanical Engineering, Universidad Carlos III de Madrid, Av. de la Universidad, 30, 28911 Leganés, Spain

² Department of Industrial Engineering, University of Salerno, Via Giovanni Paolo II, 132, Fisciano, 84084 Salerno, Italy

³ MAQLAB Research Group, Department of Mechanics, Universidad Nacional de Educación a Distancia (UNED), C/Juan del Rosal, 12, 28040 Madrid, Spain

* Author to whom correspondence should be addressed.

Electronics 2026, 15(7), 1381; https://doi.org/10.3390/electronics15071381

Submission received: 15 January 2026 / Revised: 27 February 2026 / Accepted: 23 March 2026 / Published: 26 March 2026

(This article belongs to the Special Issue Signal and Image Processing Applications in Artificial Intelligence, 2nd Edition)

Abstract

The use of Machine Learning tools for studying signals, vibration signals among others, has become widespread, as these tools enable a comprehensive analysis of the state of the elements under study. Considering the main traditional classification methods of these tools and their associated use of artificial intelligence, this paper thoroughly analyses both current approaches and trends in their use, and examines intelligent means for diagnosing faults and monitoring the condition of mechanical systems. These methodologies are becoming increasingly common in Industry 4.0. The objective of this paper is to systematically review the latest trends in research and development for fault diagnosis and condition monitoring of rotating equipment using Artificial Intelligence tools. Therefore, this paper studies Machine Learning techniques applied to the analysis of signals from rotating mechanical elements, particularly bearings and shafts, with a special focus on the classification of the condition of railway rolling stock.

Keywords:

machine learning; vibration analysis; machinery diagnostics; mechanical faults; signal processing

1. Introduction

We are currently undergoing the Fourth Industrial Revolution, commonly referred to as Industry 4.0, which essentially involves the integration of advanced sensors that collect all kinds of information in real time. This information is processed and analysed by computers with the appropriate software, thus optimizing decision-making.

The constant evolution of computer equipment allows the handling and storage of continuously growing volumes of information. In turn, applications can perform complex calculations on ever-increasing amounts of data in shorter timeframes. This combination of elements has given rise to what is known as Artificial Intelligence (AI) [1]. AI is, in essence, the development of techniques and methods that simulate the functioning of the human brain for information processing and decision-making [2].

The use of these tools in the railway sector can lead not only to a significant increase in operational safety, but also to a drastic reduction in the costs associated with maintenance and overall operation.

Currently, a wide variety of tools have been developed that, when combined, have made it possible to address numerous problems related to mechanics [3], medicine [4], climatology [5], and a wide range of other fields [6].

Therefore, this document explores the main trends in the use of AI tools and the challenges that must be addressed in the near future, both for their development and their application in maintenance tasks, primarily those related, in this case, to the maintenance of railway equipment.

Structure of the Paper

The structure of the paper is as follows: Section 2 describes the development of the techniques employed. Section 3 details the literature review methodology. Section 4 shows the results. Section 5 contains the discussion of this work. Section 6 contains the conclusions.

2. Historical Perspective

This section analyses the state of the art and the theoretical foundations that precede the present study, providing the context needed to frame its objectives, challenges and future work.

2.1. Early Works

One of the greatest aspirations of scientists throughout history has been to develop apparatus and instruments capable not only of performing calculations, but also of thinking autonomously. In this regard, Leonardo Torres Quevedo stated in 1914: “I will attempt to demonstrate in this note, from a purely theoretical point of view, that it is always possible to construct an automaton whose actions all depend on certain circumstances, more or less numerous, which obey rules that can be arbitrarily imposed at the time of construction. Evidently, these rules must be sufficient to determine, at any moment, without any uncertainty, the behavior of the automaton” [7]. In 1936, Turing introduced the concept of the universal machine [8] as a device capable of executing any algorithm. Although this concept was originally formulated from the perspective of mathematical logic, it ultimately became the theoretical foundation of automata theory and automatic computation. In 1943, the first mathematical model of neural networks was established by McCulloch and Pitts [9].

Significant advances in the fields of automation and computation were established by von Neumann. In 1945, he set out the essential principle underlying the operation of modern computers—the von Neumann architecture—[10] and subsequently made one of the earliest systematic attempts to formulate a general theory of automata in a lecture delivered in 1948 and published in 1951 [11].

Once the mechanical foundations were established, the subsequent step was for machines to acquire the capacity to think autonomously, that is, to replicate the functioning of the human brain. Thus, building on the advances in new electronic and control devices made during WWII, McCarthy et al. [1] defined the foundations of what would come to be known as Artificial Intelligence, based on the assumption that, through the use of computing machines, the functioning of the human brain can be replicated.

Although the foundations of what is referred to as Artificial Intelligence were laid by McCarthy et al. [1], the dominant theoretical framework remains the rational agent approach provided by Russell and Norvig [2], who set out what Artificial Intelligence signifies when understood in its broadest sense, “namely:

I.: Artificial Intelligence.
II.: Problem-solving.
III.: Knowledge, reasoning, and planning.
IV.: Uncertain knowledge and reasoning.
V.: Machine Learning.
VI.: Communicating, perceiving, and acting.”

In this sense, they describe Artificial Intelligence in terms of “four possible goals:

Systems that think like humans.
Systems that act like humans.
Systems that think rationally.
Systems that act rationally.”

Learning constitutes a fundamental component of the system, such that programs are able to improve their performance. This is machine learning, a term coined by A. Samuel [12], whose work laid the foundations for the subsequent development of the concept: the study and development of algorithms that learn from data in a general manner in order to perform tasks autonomously without the need for explicit instructions.

Machine Learning constitutes a fundamental component of what is understood as artificial intelligence; Figure 1 schematically represents Machine Learning as a constituent element.

Once it had been established how automata should operate and the capacity of computers to solve complex problems and perform intricate calculations, thereby laying the foundations that enable machines to learn, the next stage consisted in determining the ability of machines to imitate physical phenomena, as proposed by Feynman [13], and thus their capacity to imitate and replicate physical systems of all kinds.

2.2. Tooling Development

With the development of the theories underpinning Artificial Intelligence and, consequently, machine learning, it becomes necessary to act in parallel and to develop the required mathematical apparatus.

In the present work, a brief description is provided of the most significant tools employed in the research process that has given rise to this document.

Learning is presented as one of the fundamental components of Artificial Intelligence, insofar as it enables systems, through programs and algorithms, to autonomously perform tasks based on learning obtained from the interpretation of information.

Once Machine Learning has been described as a concept and method, the notion of teaching is subsequently considered. Holmberg et al. [14] note that Machine Learning is typically dependent on substantial data sets and, consequently, substantial computational capabilities to attain significant and high-calibre outcomes in domains such as autonomous vehicles and image recognition. Nevertheless, this approach is not without its limitations, insofar as it is deficient in its capacity to engage in logical reasoning and to discern causal relationships. The development of Machine Learning systems and methodologies is an iterative process that typically commences with the collection and labelling of data. This is followed by subsequent stages of analysis and algorithm selection, until a trained model can be developed and implemented [7].

Mosqueira et al. [15] present different learning possibilities according to the involvement of human beings in the process. These include the concept of Machine Teaching, which builds upon the classical notion of humans teaching machines and extends it to the teaching process understood in its entirety. As an approach to the general state of the art, they distinguish Active Learning (AL), Machine Teaching (MT), and Interactive Machine Learning (IML). The authors conclude that one possible way of classifying the methods to be employed is according to whether or not human intervention is involved in the learning and training process of the model.

After developing the concept of Machine Learning (ML), a wide range of possibilities for its application in different domains emerges. Thus, Bayesian networks and decision trees may be employed in transport regulation models [16], while specific techniques can be used for fault diagnosis in machinery and installations in nuclear power plants, as in Zhong and Ban [17], or for visual recognition, as in Liang et al. [18].

At present, Artificial Intelligence is experiencing significant momentum. On the one hand, advances in hardware provide increased computational capacity; on the other, the parallel development of new programs and software requires increasingly powerful and efficient systems. Prati [19] sets out the principles of quantum computing applied to AI, while Capra et al. [20] examine the most efficient architectures for use in neural networks, focusing both on architectures optimised in terms of energy consumption and on more efficient use of available memory and their acceleration methods. This line of research on acceleration and more efficient architectures is further developed by Dhilleswararao et al. [21].

Similarly, Zhang et al. [22] outline future options and possibilities for assessing the performance of ML programs and training methods in order to detect faults and errors, while Su et al. [23] present a comparable analysis for the hardware used in computation.

The emergence of more powerful equipment, with greater capacity and speed, capable of performing complex operations in large volumes over shorter periods of time [19,20], has driven the development of Deep Learning.

It is therefore a matter of solving a series of mathematical problems that, on the one hand, require a high computational capacity and, on the other, the ability to perform such calculations at an acceptable speed.

To this end, a range of tools and applications is available on the market to carry out these calculations, including open-source solutions accessible to the public either fully or partially free of charge (such as Python, R, and the Weka tool), as well as licensed software, including MATLAB®, Amazon AWS®, and IBM SPSS®.

Moreover, there is currently an ever-increasing number of datasets and libraries that provide a wide variety of information in each field. However, this also introduces uncertainty regarding the quality of the accessible information, as well as potential weaknesses, errors, and defects in the execution of files, which may lead to significant distortions in the models to be developed. For this reason, Exploratory Data Analysis (EDA) has been developed [24], enabling users to perform an exploratory analysis of the data available in libraries and thus identify potential issues.

2.3. Data and Information

Both the data used and the manner in which they are obtained are of paramount importance. The information must be secure and of the highest quality, since poor-quality information may distort the training of models and, consequently, lead to conclusions that are either erroneous or of limited utility.

The objective of the present document is the analysis of vibrational signals generated by different mechanical assemblies in order to obtain high-quality information to support decision-making.

Vibration signal analysis is currently one of the most widely utilised techniques for the inspection of mechanical components under operational conditions [25], building on early models such as that proposed by McFadden and Smith [26], as it enables testing and assessment across a wide range of elements and scenarios, for example in the operation of wind turbines [27] and within the railway sector, including both infrastructure and rolling stock. A usage guideline for the methodology has been proposed by Randall and Antoni [28], which addresses the analysis of bearing systems [29]; however, the technique can be extended to the analysis and classification of other rotating mechanical systems.

The study of vibrations for condition monitoring is a field common to many researchers, who apply their techniques to rotating components. For example, Antoni and Randall [30] analyse bearing faults in various types of assemblies and operating conditions. Railway systems constitute a key area of study for many researchers working on vibration analysis [31], while others focus on disturbances of the track or ground induced by the passage of rolling stock, as noted by Bustos et al. [32].

Vibration signal analysis is a common method for examining a wide range of mechanical components. Guo et al. [33,34] analyse signals to detect bearing faults, while other authors, such as Luo et al. [35] and Borghesani et al. [36], apply such analyses to shaft components.

One of the most important aspects of signal processing is the ability to interpret the information provided by data in different domains, such as the time and frequency domains [25]. Accordingly, depending on the purpose and the properties of the signal, one domain may be preferable to the other.

Vibration signals generated by mechanical systems are analysed [32] through the grouping and decomposition of their components using, among other methods, the Discrete Fourier Transform (DFT), the Fast Fourier Transform (FFT), as well as the Hilbert transform [37].
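As a minimal sketch of the frequency-domain analysis mentioned above, the following Python fragment computes a naive Discrete Fourier Transform and reports the dominant frequency of a signal. The FFT yields the same spectrum with far fewer operations; the Hilbert transform is not shown. The function names, the sampling rate and the synthetic two-tone signal are assumptions introduced purely for illustration and are not taken from the reviewed studies.

```python
import cmath
import math


def dft(samples):
    """Naive O(N^2) Discrete Fourier Transform of a real or complex sequence."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def dominant_frequency(samples, sampling_rate_hz):
    """Frequency (Hz) of the largest non-DC bin of the one-sided spectrum."""
    n = len(samples)
    spectrum = dft(samples)
    one_sided_bins = range(1, n // 2 + 1)
    k_peak = max(one_sided_bins, key=lambda k: abs(spectrum[k]))
    return k_peak * sampling_rate_hz / n


# Hypothetical test signal: a 50 Hz component plus a weaker 120 Hz component.
fs = 1000.0  # assumed sampling rate in Hz
n_samples = 200  # gives a frequency resolution of fs / n_samples = 5 Hz
signal = [
    math.sin(2 * math.pi * 50 * t / fs) + 0.3 * math.sin(2 * math.pi * 120 * t / fs)
    for t in range(n_samples)
]
peak_hz = dominant_frequency(signal, fs)
```

Here both tones fall exactly on spectral bins, so the strongest bin corresponds to the 50 Hz component. With real vibration data, windowing and longer records would normally be needed to limit spectral leakage.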

In recent years, a method for interpreting vibration signals has gained widespread acceptance. The methodology relies on constructing envelopes determined by local maxima and minima, which are linked using a cubic spline. The local maxima form the upper envelope, while the local minima define the lower envelope, ensuring that all data points lie between these bounds. Once the envelopes are established, the subsequent step is to define the conditions necessary to extract the so-called Intrinsic Mode Function (IMF) through the decomposition of the signals using Empirical Mode Decomposition (EMD), widely in use [38] since its introduction by Huang et al. [39] and the subsequent development of the methodology.
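The envelope construction described above can be sketched as a single sifting step. Note the deliberate simplification: the envelopes below are joined by linear interpolation instead of the cubic spline used in EMD, and the end values are simply held flat. The function names and the synthetic signal are hypothetical; a complete EMD would repeat this step until the candidate satisfies the IMF conditions.

```python
import math


def local_extrema(x):
    """Indices of strict interior local maxima and minima of a sequence."""
    maxima = [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] > x[i + 1]]
    minima = [i for i in range(1, len(x) - 1) if x[i - 1] > x[i] < x[i + 1]]
    return maxima, minima


def envelope(x, knots):
    """Envelope through the points (i, x[i]) for i in knots.

    Simplification: linear interpolation replaces the cubic spline,
    and the first and last knot values are held flat towards the ends.
    """
    if len(knots) < 2:
        raise ValueError("at least two extrema are needed to build an envelope")
    out = []
    for i in range(len(x)):
        if i <= knots[0]:
            out.append(x[knots[0]])
        elif i >= knots[-1]:
            out.append(x[knots[-1]])
        else:
            j = next(k for k in range(len(knots) - 1) if knots[k] <= i <= knots[k + 1])
            left, right = knots[j], knots[j + 1]
            w = (i - left) / (right - left)
            out.append((1 - w) * x[left] + w * x[right])
    return out


def sift_once(x):
    """One sifting step: subtract the mean of the upper and lower envelopes."""
    maxima, minima = local_extrema(x)
    upper = envelope(x, maxima)
    lower = envelope(x, minima)
    mean = [(u + l) / 2 for u, l in zip(upper, lower)]
    candidate = [xi - m for xi, m in zip(x, mean)]
    return candidate, mean


# Hypothetical signal: a fast oscillation riding on a slow linear trend.
signal = [math.sin(2 * math.pi * t / 8) + 0.01 * t for t in range(64)]
imf_candidate, local_mean = sift_once(signal)
```

In this example the envelope mean recovers the slow trend away from the signal ends, so the candidate retains only the fast oscillation, which is the behaviour expected of a first IMF.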

2.4. Model Training

In recent years, the diagnosis of mechanical components through the analysis of vibration signals using Machine Learning tools has experienced significant growth, driven by the synergies generated by research across a broad range of fields, which results in shared benefits. Therefore, the present document does not aim to provide an exhaustive survey of the model training methods employed; rather, it focuses on identifying the most relevant trends in the research field and the challenges it faces. Accordingly, a review of the most significant techniques is presented.

2.4.1. Machine Learning Traditional Tools

The so-called traditional methods are those that have been employed since the early stages of Artificial Intelligence and since the emergence of Machine Learning concepts. Nevertheless, at present, among all available approaches, three remain among the most popular and widely used: support vector machines, decision trees, and k-Nearest Neighbours.

Support Vector Machines (SVM) are robust and adaptable models that have remained a popular classification method since their introduction [40], being employed both as standalone approaches [41] and in combination with other methodologies such as Deep Learning [42]. Their ability to construct optimal decision boundaries by maximizing the margin between classes allows SVMs to achieve strong generalization performance, even in high-dimensional feature spaces. Furthermore, using kernel functions [43], SVMs are capable of effectively handling non-linear classification problems. As a result, SVMs have demonstrated considerable effectiveness in solving a wide range of problems across different application domains; however, they also exhibit a tendency towards over-dimension [44], an issue that has been addressed by various authors in different studies [45,46].
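For reference, margin maximisation and the role of the kernel can be written in the standard soft-margin form. The notation is an assumption introduced here and is not taken from the cited works: training pairs $(x_i, y_i)$ with $y_i \in \{-1, +1\}$, feature map $\phi$, slack variables $\xi_i$, penalty $C$, and dual coefficients $\alpha_i$.

```latex
\[
\min_{w,\,b,\,\xi}\; \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad
y_i\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1 - \xi_i,\qquad \xi_i \ge 0,
\]
\[
f(x) = \operatorname{sign}\Bigl(\sum_{i=1}^{n}\alpha_i\, y_i\, K(x_i, x) + b\Bigr),
\qquad K(x_i, x) = \phi(x_i)^{\top}\phi(x).
\]
```

Minimising $\lVert w\rVert$ widens the margin $2/\lVert w\rVert$ between the classes, while the kernel $K$ allows the decision function to be non-linear in the original features without computing $\phi$ explicitly.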

Decision trees are supervised Machine Learning models [47] that do not require parameterization methods and exhibit a high classification accuracy through feature-based data separation. However, they show a certain tendency towards overfitting under specific conditions due to their mode of operation [48], which is based on rule-based data splitting through partitions of the prediction space into regions using inherent statistical criteria. Nevertheless, they have been extensively developed and refined [49].
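The rule-based splitting described above can be illustrated with a single split on one feature. The sketch assumes the Gini impurity as the statistical criterion, which is one common choice among several, and the feature values and labels are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_threshold(values, labels):
    """Threshold on one feature that minimises the weighted Gini impurity."""
    pairs = sorted(zip(values, labels))
    best_score, best_thr = float("inf"), None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical feature values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for val, lab in pairs if val <= thr]
        right = [lab for val, lab in pairs if val > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_score, best_thr = score, thr
    return best_thr, best_score


# Hypothetical data: one vibration feature (e.g., an RMS level) per sample.
rms_values = [0.20, 0.25, 0.22, 0.80, 0.90, 0.85]
conditions = ["healthy", "healthy", "healthy", "faulty", "faulty", "faulty"]
threshold, impurity = best_threshold(rms_values, conditions)
```

A full tree applies this search recursively to each resulting region. Growing the tree until every leaf is pure is what produces the tendency towards overfitting noted above.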

K-nearest neighbours is a simple and straightforward method, based on analysing a given number, k, of neighbouring entities to perform prediction and classification of the units under study.
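A minimal sketch of the method follows, assuming Euclidean distance and majority voting. The two features and all numerical values are hypothetical and serve only to illustrate the procedure.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical feature vectors: (RMS level, kurtosis) of a vibration signal.
features = [(0.20, 3.0), (0.25, 3.1), (0.22, 2.9), (0.80, 6.5), (0.90, 7.0), (0.85, 6.8)]
labels = ["healthy", "healthy", "healthy", "faulty", "faulty", "faulty"]

prediction_a = knn_predict(features, labels, (0.30, 3.2), k=3)
prediction_b = knn_predict(features, labels, (0.82, 6.0), k=3)
```

In practice the features would usually be scaled before computing distances, since a feature with a larger numerical range otherwise dominates the vote.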

In their 2013 study, Andre et al. [50] used a combination of SVM and k-Nearest Neighbours for defect classification based on vibration signal analysis and the signatures of these signals, achieving results exceeding 80% accuracy under various operating conditions.

2.4.2. Wavelet Transform

Currently, the Wavelet Transform is increasingly widely used, particularly for signal analysis and, more specifically, for vibration signals of interest in railway maintenance applications. The Wavelet Transform analyses signals by decomposing them into components in both the time and frequency domains [51], representing a given signal as a combination of basis functions known as wavelets, which are derived from a base or mother wavelet through scaling and translation operations with significant differences with respect to the Fourier Transform [52]. This methodology enables the analysis of local signal characteristics at different resolutions and is therefore particularly well suited to non-stationary or transient signals [53].
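The time localisation that distinguishes the Wavelet Transform from the Fourier Transform can be illustrated with one level of a discrete wavelet decomposition. The sketch assumes the Haar wavelet, chosen only because it is the simplest mother wavelet, and a hypothetical signal containing a single transient.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar discrete wavelet transform."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    scale = math.sqrt(2.0)
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approximation = [(a + b) / scale for a, b in pairs]  # low-pass part
    detail = [(a - b) / scale for a, b in pairs]  # high-pass part
    return approximation, detail


# Hypothetical signal: constant level with one transient at sample 5.
signal = [1.0] * 16
signal[5] = 4.0

approximation, detail = haar_dwt(signal)
# Index of the detail coefficient with the largest magnitude; coefficient j
# covers the original samples 2*j and 2*j + 1.
transient_pair = max(range(len(detail)), key=lambda j: abs(detail[j]))
```

The only non-zero detail coefficient is the one covering samples 4 and 5, so the transient is located in time, whereas a Fourier spectrum of the same signal would spread it across all frequencies.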

As mentioned, the use of the Wavelet Transform provides a reliable option for the interpretation of non-stationary signals and also for comparative analysis of signals obtained under different operating conditions [54]. It is also employed in combination with other methodologies in systems that may be described as hybrid; for example, alongside more traditional methods such as k-Nearest Neighbours [54] or decision trees [55], as well as in combination with deep learning techniques [56].

Owing to its flexibility, the Wavelet Transform exhibits certain similarities with the Empirical Mode Decomposition method [57], as both approaches aim to decompose the signal. In the case of the WT, this is achieved using a mother wavelet, whereas in EMD the signal is decomposed into IMFs. In EMD, the decomposition follows criteria that are intrinsically linked to the signal itself [58], while in the case of the WT it is necessary to select and scale the analysing functions, with the risk of not achieving the desired level of efficiency [59].

The flexibility of the method is demonstrated in studies such as that by Puliafito et al. [60], which performs condition analysis using all three of the aforementioned transforms: Fourier, Hilbert–Huang, and wavelet. The method has also been employed not for classification but for the prediction of potential damage in dams [61].

2.4.3. Deep Learning (DL)

Deep learning (DL) [62], or Deep Neural Networks (DNNs) [63], is a subcategory of Machine Learning that handles both linear and non-linear data sets. DNNs consist of multiple hierarchical layers of nodes (neurons) with associated activation functions and weights, which are fully or partially connected and are typically trained, through weight adjustment, using backpropagation and optimisation algorithms [64]. For a schematic representation of deep learning see Figure 2.

Neural networks have undergone rapid development over the past two decades and are now employed in many aspects of everyday life [65].

In the field of deep learning, Convolutional Neural Networks (CNNs), for which the input is a structured tensor, have gained considerable acceptance and proliferation, significantly improving deep learning models for visual tasks since 2011 [66]. However, they are also gaining ground in other research fields, for example in condition assessment and operating analysis of various types of mechanical equipment [67], in particular, in rotating mechanical assemblies suitable for use in railway equipment [68]. They have also demonstrated their usefulness for other types of equipment and services, such as the electrical power grid [69]. They are also currently being used both to design hardware architectures and to improve their efficiency in order to enhance performance in the application of Artificial Intelligence tools [20].

A noteworthy advancement in deep learning is the so-called Transfer Learning (TL) [70]. While traditional approaches assume that all data are acquired under identical operating conditions and share the same distribution and feature space, with homogeneous feature vectors in all cases, current research has also moved towards a transfer learning approach. In this context, it is first necessary to learn features from source data by adjusting the parameters of neural networks, and subsequently to modify the network structure to accommodate changes in data distribution [71]. Thus, when data are obtained from multiple and diverse sources in order to integrate information—that is, when data and knowledge are shared across different equipment and devices with heterogeneous types of information—this paradigm is referred to as federated learning [72].

2.4.4. Fault Diagnosis (FD) and Condition Monitoring (CM)

In contemporary practice, and within the framework of mechanical system state determination, two fundamental principles are recognised: Fault Diagnosis (FD) and Condition Monitoring (CM).

Fault Diagnosis [73] may be defined as the systematic process of detecting abnormal operating conditions, isolating them from other events, and identifying them, particularly in the mechanical systems addressed in this manuscript and with particular emphasis on moving and dynamic systems.

Therefore, Fault Diagnosis [35,74] constitutes the early-stage diagnosis of mechanical systems aimed at determining the root causes of abnormal operating conditions and, consequently, enabling the implementation of appropriate interventions to ensure system reliability, operational continuity, and, ultimately, overall safety.

Condition Monitoring [75] may be defined as the systematic process of acquiring measurable parameters from a mechanical system in order to evaluate its operational condition through the analysis and interpretation of such metrics, and thereby enable the detection of faults or defects at their incipient stages, prior to the occurrence of a global failure that would compromise the system’s operability.

The current development of industry entails that both Fault Diagnosis and Condition Monitoring constitute fundamental tools for maintaining the optimal operational state of systems [76,77], which not only reduces operating costs but also significantly enhances safety—critical aspects within the railway industry. Advances in machine learning have contributed to a notable evolution of both tools, as illustrated in this document.

3. Literature Review Methodology

The protocol for this systematic review was retrospectively registered on OSF on 7 March 2026, after the data extraction process had been completed. The protocol is publicly available at: https://doi.org/10.17605/OSF.IO/78CM2 (accessed on 22 March 2026). No changes were made to the objectives or inclusion/exclusion criteria after finalisation of the review.

The selection of published studies was restricted to the 2015–2025 period, using the indicated search criteria and applying study selection methodologies based on relevance and impact. In addition, the Bibliometrix® (v. 5.0) [78] software was used to support the generation of analyses, figures, and tables. The last search for documents that could be used in this manuscript was carried out on 3 November 2025, after which the search was concluded.

3.1. Implementation of the Proposed Methodology

The information search was conducted using the Scopus and Web of Science databases, which are highly relevant and impactful sources. In addition, a supplementary search was performed using Google Scholar, according to the criteria specified below:

Publication years: 2015–2025.
Document type: article, review.
Publication language: English.
Document availability: Open Access.
Literature search strategy:

(“vibration monitoring” OR “vibration signal processing” OR “vibration diagnostics” OR “condition monitoring”) AND (“mechanical systems” OR “mechanical components” OR “rotating machinery” OR “mechanical structures” OR “machinery diagnostics” OR “mechanical faults”) AND (“techniques” OR “methods” OR “approaches” OR “algorithms” OR “signal processing” OR “data analysis”) AND (“Fourier transform” OR “Wavelet Transform” OR “short-time Fourier” OR “Hilbert-Huang transform” OR “machine learning” OR “deep learning”) AND (“bearings” OR “gears” OR “gearbox” OR “electric motors” OR “turbines” OR “pumps”).
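To keep the search reproducible across databases, the query above can be assembled programmatically from its five term groups. The following sketch is only an illustration: the helper name is hypothetical, straight quotes replace the typographic ones, and any database-specific field tags would still need to be added by hand.

```python
# Term groups transcribed from the search strategy above.
TERM_GROUPS = [
    ["vibration monitoring", "vibration signal processing",
     "vibration diagnostics", "condition monitoring"],
    ["mechanical systems", "mechanical components", "rotating machinery",
     "mechanical structures", "machinery diagnostics", "mechanical faults"],
    ["techniques", "methods", "approaches", "algorithms",
     "signal processing", "data analysis"],
    ["Fourier transform", "Wavelet Transform", "short-time Fourier",
     "Hilbert-Huang transform", "machine learning", "deep learning"],
    ["bearings", "gears", "gearbox", "electric motors", "turbines", "pumps"],
]


def build_query(groups):
    """Join the terms of each group with OR, then join the groups with AND."""
    clauses = ["(" + " OR ".join(f'"{term}"' for term in group) + ")" for group in groups]
    return " AND ".join(clauses)


query = build_query(TERM_GROUPS)
```

Storing the groups as data makes it straightforward to report exactly which terms were used and to rerun the search with an extended list.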

The methodology follows the PRISMA [79] guidelines to ensure both the clarity and the credibility of the results. Figure 3 illustrates the process of identifying the most relevant publications in accordance with these guidelines.

The total number of documents obtained was 91; after filtering through review techniques, this number was reduced to 51, which are the documents analysed in the present study.

3.2. Search Results

The final search results, in accordance with the selected criteria, limit the total number of documents to 91. Within the chosen time frame, these documents reflect current trends in related research as well as the challenges that must be addressed. The search results are shown in Table 1.

4. Results

The review of the manuscripts and data extraction was conducted independently by two reviewers, and any discrepancies were resolved by consensus.

In the previous section, the total number of records identified through the application of the search methods is reported. However, in accordance with the specific requirements related to their application to the maintenance of both rolling stock and railway infrastructure, the number of documents considered for analysis in the present study amounts to 51. Table 2 presents the characteristics of the aforementioned documents, while Figure 4 provides a graphical overview.

The selection of manuscripts was carried out through a progressive and cumulative consideration process which, following an initial screening phase, resulted in the exclusion of those studies that did not meet the previously defined eligibility criteria.

The full-text review enabled confirmation of the final inclusion of the research articles relevant to the intended analysis, thereby ensuring the traceability of the decisions taken across the different stages of the review process.

The inclusion and exclusion criteria were selected with the aim of ensuring that the studies included met the required methodological relevance. Accordingly, investigations whose objectives were applicable to railway systems, and which provided quantitative results together with a sufficiently detailed methodological description of the methods employed were included.

Inclusion criteria:

Studies focused on mechanical systems within the railway domain.
Studies that consider components that are relevant or potentially relevant to the railway system.
Works presenting data, indicators, or comparable results.
Publications with significant impact.

Exclusion criteria:

Lack of direct relevance to railway mechanical systems.
Studies focused on non-mechanical aspects.
Studies analysing systems without a clear justification for applicability or transferability to railway systems.
Duplicate publications or versions of studies already included.
Publications with lower significant impact.
In order to reduce bias and avoid potential conflicts related to spam, it was agreed to remove articles by the authors of this review and from their environment whenever possible.

Finally, and in accordance with the aforementioned criteria, four reviews have been included with the intention of providing theoretical support and reinforcement for the identification of trends in the fields under analysis.

The documents included are listed in Table 3, indicating whether each study addresses Fault Diagnosis (FD) or Condition Monitoring (CM), as well as the model training techniques employed, including Wavelet-based methods, Neural Networks, Deep Learning or generic Machine Learning techniques. The correspondence of each document with the specified items is indicated by means of a “yes” designation. In the combinations, the method used for signal feature extraction is designated by “(FE)” and the method used for classification by “(C)”. A comparison is denoted as “COM”, and a combined use as “CP”. No meta-analysis was conducted.

Two reviewers independently categorised the risk of bias into three levels (high, medium or low), resolving any discrepancies by consensus.

The investigations included in the review were generally assessed as presenting a low to medium risk of bias. Potential limitations were mainly associated with blinding; incomplete outcome data was another potential limitation.

For each synthesis, the characteristics of the included studies and their potential risk of bias were reviewed and summarised collectively.

The certainty of the evidence for each outcome was considered by examining the quality, robustness and consistency of the results reported in the studies included in this review.

As shown in Table 3, the study focuses primarily on Fault Diagnosis (FD) and Condition Monitoring (CM), as well as on the Machine Learning methodologies used for their analysis, which, as observed, fall largely into three main approaches. Figure 5 presents the keyword co-occurrence chart illustrating the presence of these three principal methods. In a Bibliometrix keyword co-occurrence analysis, keywords are represented as nodes connected by edges, where the thickness of each edge indicates the strength of co-occurrence between two keywords. The size of each node reflects the frequency of that keyword across the dataset, while colours highlight clusters of closely related keywords.
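As an illustration of how such a co-occurrence network can be derived, the following minimal Python sketch counts node sizes and edge weights from per-document keyword lists. It is not the Bibliometrix implementation; the function name `cooccurrence` and the keyword lists are hypothetical and serve only to show the counting principle.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(keyword_lists):
    """Return (node_sizes, edge_weights) for a keyword co-occurrence network.

    node_sizes[k]        -> number of documents containing keyword k
    edge_weights[(a, b)] -> number of documents containing both a and b (a < b)
    """
    node_sizes = Counter()
    edge_weights = Counter()
    for keywords in keyword_lists:
        unique = sorted(set(keywords))  # count each keyword once per document
        node_sizes.update(unique)
        edge_weights.update(combinations(unique, 2))
    return node_sizes, edge_weights


# Hypothetical keyword lists, one per document (not taken from the review).
docs = [
    ["fault diagnosis", "wavelet transform", "bearing"],
    ["fault diagnosis", "deep learning", "bearing"],
    ["condition monitoring", "machine learning", "bearing"],
]
nodes, edges = cooccurrence(docs)
print(nodes.most_common(2))
print(edges[("bearing", "fault diagnosis")])
```

In a plotted network, the values in `nodes` would drive the node size and the values in `edges` the edge thickness.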

Finally, Figure 6 represents the networks graphically: on the left of the chart are the most frequently cited authors, in the centre the authors of the documents cited in the present paper, and on the right the main topics addressed in the documents. The chart highlights the strong research connections among all of them.

The results were interpreted in relation to the selected research and its adaptability to the railway context. However, the size and heterogeneity of the selected sample (47 documents) must be taken into account, as must the limitations of the review process. In particular, prospective registration was absent, and the search was restricted to a certain number of languages and databases. These limitations invite further studies, for which this study may serve as a basis, for example by expanding the number of languages, the databases, or the amount of research considered.

All data extracted from the studies analysed in this manuscript are complete and consistent, and therefore no additional conversions, adjustments, or imputations were required. Consequently, all results were considered directly.

Studies were selected according to the predefined inclusion and exclusion criteria and therefore based on their relevance to the objectives of the study; they were subsequently assigned to the corresponding methodological group in each case.

No amendments were made to the information provided at registration or in the review protocol.

5. Discussion

As previously stated, the results of the research are structured into three main groups: studies employing Wavelet Transform techniques, studies based on the use of neural networks and deep learning in general, and investigations using traditional machine learning methods.

Although reference [88] is not strictly a paper reporting tests and experiments, it is included due to its relevance, as it provides a complete and publicly accessible dataset from an experimental investigation of an independent cart system driven by linear motors. The dataset includes faults in the inner race, outer race, and in the upper and lower bearings, with fault depths of 0.25 mm, 0.5 mm, 1.0 mm, and 1.5 mm. A total of eight experiments were conducted, each at two nominal speeds of 1000 mm/s and 2000 mm/s. The acquisition duration for each experiment ranged from 30 s to 2 min. A significant number of experiments included multiple runs in order to ensure statistical reliability. The data were recorded at a sampling frequency of 50 kHz with a 24-bit resolution.

5.1. Wavelet Transform

In a study analysing vibration signals obtained from a rotating shaft with different types of damage [124], nine fault conditions ranging from 4% to 50% of the diameter were considered, with the faults located at the mid-span of the shaft and at lateral sections. Crack diagnosis was carried out using the 3× energy component for cracks located in the mid-section. According to the authors, the Wavelet Packet Transform (WPT) proves robust, particularly at higher speed ranges (60 Hz); for each of the study conditions, 1500 vibration signals were acquired in 15 consecutive groups of 100. In the authors’ words: “It has applications in condition monitoring under stationary conditions. It allows the establishment of parameters that define a machine working under normal operating conditions, and the detection and location of the presence of a crack if a threshold value is exceeded”. The study thus confirms the validity and robustness of the technique and, by delimiting the operating conditions defined in the document, supports its extrapolation to other datasets.

The following study [123] presents a novel method for the detection of manufacturing defects in bearings, employing a technique based on nine different real-valued wavelets and using the Shannon Entropy Criterion to determine the condition of bearings through the analysis of vibration signals. The faults considered range from 0.6311 mm to 1.6236 mm and are additionally verified using an optical contact microscope (GARANT, Hoffmann Group, Remscheid, Germany), yielding a maximum deviation of 4.12% for the smallest fault analysed in the experiment. The experimental procedure shows that the Morlet wavelet, applied through the Continuous Wavelet Transform (CWT), achieves the highest values and is therefore the most effective both for fault detection and for the estimation of fault size.

Another approach [54] enables the analysis and differentiation of signals corresponding to four bearing conditions: a healthy bearing and three types of defective bearings, namely faults in the outer race, inner race, and roller. During the testing phase, vibration data were collected at a sampling frequency of 8 kHz. To assess the efficacy of the proposed approach rigorously, the four bearing conditions were recorded at five test speeds (1000, 1500, 2000, 2500, and 3000 rpm), yielding a total of 3200 datasets. The method optimises the wavelet analysis technique through an approach based on multiple-speed approximation and impulse modelling, and thus represents a methodological advancement. It can be effectively applied to extract fault information from vibration signals, providing high time–frequency resolution for signal analysis. Notably, the three-dimensional feature space was reduced, and an overall accuracy of 100% was achieved in the experimental results.

One of the methods used for condition monitoring of equipment is acoustic signature monitoring [122], which in this study is applied to the assessment of bearing condition. Each bearing exhibits its own acoustic signature, both in a healthy state and under faulty operating conditions. Active Noise Control (ANC), implemented through adaptive filters, is employed to filter the noisy acoustic signal. The investigation is conducted using two defective models compared against a healthy reference model. The Morlet wavelet is used as the mother wavelet, and two-dimensional scalograms based on it are generated from the acoustic signals. This work demonstrates that acoustic emission constitutes a viable alternative for diagnosing defects in rolling element bearings [122]. A similar method is employed to extract features of defective bearings from the stator current of a loaded machine using the CWT, also based on the complex Morlet wavelet. Two-dimensional and three-dimensional scalograms of the stator current signatures for healthy and faulty bearings are used to characterise defects in the time–frequency domain and, in conjunction with convolutional neural networks, achieve excellent results. The technique, which differs significantly in operation because it is based on noise filtering, yields encouraging results and is compared with other methodologies, thereby providing a method that can be applied under conditions of limited accessibility.

The technique aimed at determining optimal wavelet parameters and suitable statistical functions for conducting the analysis is the focus of the following document [118]. The optimisation algorithm selects the most appropriate feature submatrix in order to improve the final accuracy of the results through an iterative procedure, which represents an improvement in the application of the technique, enabling more efficient data extraction for the selection of the wavelets. The proposed algorithms are applied to experimental data collected during the operation of a heavy-duty industrial oil pump installed in a refinery. As variable-frequency vibration signals are analysed, the time–frequency method of the Wavelet Transform is employed to examine both the local and global content of the acquired signals. Using Genetic Algorithms (GAs), the most significant parameters of this procedure are selected. Consequently, the separability factor is increased, ensuring classification efficiency. The disparity between the decision boundaries is therefore enlarged, leading to a considerable improvement in classifier performance. Finally, it is concluded that the classification results are reliable, with a final test data accuracy of 98%.

In the field of engineering, particularly in the context of defining bearings, the internal radial clearance (IRC) is a paramount parameter [117], as it is crucial to bearing operation and exerts a strong influence on the dynamic response. The IRC is defined as the clearance between the rings and the rolling elements. The selection of the IRC in ball bearings is therefore fundamental to the dynamic behaviour and all associated operational aspects: its optimal value has a significant impact on noise, generated vibrations, thermal expansion and, consequently, the associated fatigue.

The analysis, which refines the use of the technology and, according to the authors, provides a basis for its application to, for example, different operating speeds, was evaluated using statistical indicators, the Fast Fourier Transform (FFT) and the Continuous Wavelet Transform (CWT). Through FFT and CWT, the characteristic frequencies present in the spectra were identified and subsequently visualised in the time domain. The method primarily focuses on the analysis of short time series; therefore, the signal of 15,000 points was initially divided into ten segments of 1500 data points each. The value of a given quantifier was then calculated for each short signal, and the final evaluation value was obtained as the average of the results. The optimal solution of the study corresponds to an IRC of 22 μm.
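The segment-averaging procedure described above can be sketched as follows. The quantifier used in [117] is not specified in this review, so the root mean square (RMS) is assumed here purely for illustration; the test signal, the sampling rate `fs` and the function names are likewise hypothetical.

```python
import math


def rms(segment):
    """Root mean square of a segment (assumed quantifier, for illustration)."""
    return math.sqrt(sum(v * v for v in segment) / len(segment))


def segment_average(signal, n_segments, quantifier):
    """Split the signal into equal segments and average the quantifier."""
    seg_len = len(signal) // n_segments
    values = [
        quantifier(signal[i * seg_len:(i + 1) * seg_len])
        for i in range(n_segments)
    ]
    return sum(values) / n_segments, values


# Hypothetical 15,000-point signal: a 50 Hz sine sampled at 1500 Hz.
fs = 1500
signal = [math.sin(2 * math.pi * 50 * n / fs) for n in range(15000)]

# Ten segments of 1500 points each, as in the procedure described above.
mean_rms, per_segment = segment_average(signal, 10, rms)
print(len(per_segment), round(mean_rms, 4))
```

Any other scalar quantifier can be passed in place of `rms` without changing the segmentation logic.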

The method employed in the following document is based on the instantaneous definition of Shannon Spectral Entropy (SSE) [114]. Instantaneous Spectral Entropy (ISE) is applied as a time-dependent, damage-sensitive feature for the detection of fault-related events. In this research, the CWT, and in particular the Generalised Morse Wavelet (GMW), is used to define the Time–Frequency (TF) representations required for the ISE analysis.

The study verifies that the exclusive use of ISE is problematic because the analysis is influenced by peaks such as very high-speed, short-duration transients. However, the combined use of ISE with GMW is shown to be effective in fault detection. This is an enhancement of the wavelet- and entropy-based method, developed in parallel with the previously reviewed document [124], which, in the authors’ words, is applicable to real-time operation and entails a low computational load.
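To make the notion of a time-dependent spectral entropy concrete, the sketch below computes a normalised Shannon spectral entropy frame by frame. It is a simplified stand-in, not the method of [114]: a plain short-time discrete Fourier transform replaces the CWT with the Generalised Morse Wavelet, and the signal, frame length and function names are assumptions chosen for illustration.

```python
import cmath
import math


def power_spectrum(frame):
    """One-sided power spectrum via a direct DFT (adequate for short frames)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]


def spectral_entropy(power):
    """Shannon entropy of the normalised spectrum, scaled to [0, 1]."""
    total = sum(power)
    if total == 0:
        return 0.0
    probs = [p / total for p in power if p > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return entropy / math.log2(len(power))


def instantaneous_spectral_entropy(signal, frame_len, hop):
    """Spectral entropy of each frame, giving an entropy value over time."""
    return [
        spectral_entropy(power_spectrum(signal[s:s + frame_len]))
        for s in range(0, len(signal) - frame_len + 1, hop)
    ]


# Hypothetical signal: a pure tone with one short transient in the third frame.
frame_len = 64
signal = [math.sin(2 * math.pi * 8 * n / frame_len) for n in range(4 * frame_len)]
signal[2 * frame_len + 10] += 20.0

ise = instantaneous_spectral_entropy(signal, frame_len, hop=frame_len)
print([round(v, 3) for v in ise])
```

The tone-only frames yield an entropy close to zero, whereas the frame containing the transient has a nearly flat spectrum and therefore a high entropy, which illustrates why short-duration peaks influence an entropy-only analysis.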

The methodology employed in the following document [112] is fault diagnosis based on vibration signal analysis, combining different machine learning techniques. In this experimental study, a total of 40 signal samples were acquired for each operating condition of the gearbox, without considering load conditions. From the acquired signals, features were extracted using the DWT; in this research, the Haar wavelet function was employed.

The extracted features were used as inputs to a decision tree in order to select the most significant ones for classifying the different operating conditions of the engine gearbox, which reduces the computational time required by the classifier for model construction. The performance of classifiers based on Lazy kNN and K-star was then evaluated. The use of decision trees for selecting significant signal features before applying the classifiers constitutes a refinement in the use of the wavelet-based technology.
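A minimal sketch of Haar-based DWT feature extraction is given below. The review does not state which features were computed in [112], so the energy of each decomposition level is assumed here as an example feature; the test signal, the number of levels and the function names are hypothetical.

```python
import math


def haar_dwt_step(x):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def haar_energy_features(signal, levels):
    """Energy of each detail level plus the final approximation energy."""
    features = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt_step(approx)
        features.append(sum(d * d for d in detail))
    features.append(sum(a * a for a in approx))
    return features


# Hypothetical signal: slow sine plus a sample-to-sample alternating component.
signal = [
    math.sin(2 * math.pi * 5 * n / 256) + (0.5 if n % 2 else -0.5)
    for n in range(256)
]
features = haar_energy_features(signal, levels=3)
print([round(f, 2) for f in features])
```

Because the Haar transform used here is orthonormal, the feature values sum to the total signal energy; a feature vector of this kind could then be passed to a decision tree or any other classifier.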

The comparative study revealed that the K-star classifier outperformed the other classifiers in diagnosing the condition of the internal combustion engine gearbox. The K-star algorithm achieved a maximum classification accuracy of approximately 97.5%, which is significantly higher than that of the remaining methods.

In a similar study [108], in which two levels of decomposition are required, an additional level is applied to achieve a more effective decomposition. Adjacent bands of the selected frequency band that reach a target value of at least 70% of the maximum calculated value are also employed for envelope analysis. After the envelope analysis, the defect frequencies are extracted. Finally, a Multi-Input Multi-Output (MIMO) Adaptive Neuro-Fuzzy Inference System (ANFIS) framework is used to identify bearing faults. The authors posit that this approach improves accuracy in the detection and isolation of faults, whilst ensuring that the method can be implemented efficiently in practical applications.
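The 70% band-selection rule reported for [108] can be illustrated with a short sketch. The review does not name the indicator that is calculated for each band, so the values below are hypothetical, as is the function name `select_bands`.

```python
def select_bands(band_values, ratio=0.7):
    """Return the index of the best band plus any adjacent band whose value
    reaches at least `ratio` times the maximum value."""
    peak = max(band_values)
    best = band_values.index(peak)
    selected = [best]
    for neighbour in (best - 1, best + 1):
        in_range = 0 <= neighbour < len(band_values)
        if in_range and band_values[neighbour] >= ratio * peak:
            selected.append(neighbour)
    return sorted(selected)


# Hypothetical per-band indicator values after the decomposition.
values = [0.8, 1.1, 3.2, 4.0, 2.9, 0.9]
bands = select_bands(values)
print(bands)  # both neighbours of band 3 exceed 70% of 4.0
```

The selected bands would then be passed on to the envelope analysis step.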

In the following study [53], the degradation of bearings was estimated through the analysis of six bearings, using both original data and artificially generated data, and applying two machine learning models: Ensemble Bagged Trees (EBT) and Squared Exponential Gaussian Process Regression (SEGPR). The first step involved capturing the raw vibration signals and preprocessing them using selected DWT functions. Thirteen statistical features were then extracted, and to demonstrate their usefulness for bearing degradation prediction, an artificial feature vector was generated for comparative purposes.

Subsequently, the two machine learning models were applied with five-, ten-, and fifteen-fold cross-validation. The Mean Absolute Error (MAE) values obtained were 0.0039 and 0.0040 for the SEGPR and EBT models, respectively. Similarly, the mean MAE values obtained with the artificial feature set were 0.0037 and 0.0039, respectively.
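The evaluation scheme, k-fold cross-validation scored by MAE, can be sketched as follows. A least-squares line is used as a placeholder model in place of the EBT and SEGPR models of [53], and the data are synthetic; all function names are assumptions made for this illustration.

```python
def mae(y_true, y_pred):
    """Mean Absolute Error between two equally long sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def k_fold_indices(n_samples, k):
    """Interleaved test folds: fold i holds samples i, i + k, i + 2k, ..."""
    return [list(range(i, n_samples, k)) for i in range(k)]


def cross_validated_mae(x, y, k, fit, predict):
    """Average MAE over k folds, training on the samples outside each fold."""
    scores = []
    for test_idx in k_fold_indices(len(x), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(x)) if i not in held_out]
        model = fit([x[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [predict(model, x[i]) for i in test_idx]
        scores.append(mae([y[i] for i in test_idx], preds))
    return sum(scores) / k


def fit_line(xs, ys):
    """Placeholder model: ordinary least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    return slope, my - slope * mx


def predict_line(model, x_value):
    slope, intercept = model
    return slope * x_value + intercept


# Synthetic, perfectly linear degradation indicator (hypothetical data).
x = [float(i) for i in range(30)]
y = [0.01 * v + 0.2 for v in x]
results = {k: cross_validated_mae(x, y, k, fit_line, predict_line)
           for k in (5, 10, 15)}
print(results)
```

With real degradation data, the placeholder `fit_line` and `predict_line` would be replaced by the regression model under study.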

As indicated in the next document [109], induction motors are extensively used in the manufacturing and energy sectors owing to their numerous advantages, including cost-effectiveness, mechanical simplicity, robust design, high efficiency, and reliability. Nonetheless, malfunctions are likely, given that motors are exposed to substantial electrical and mechanical stresses over extended operating periods. Faults can be categorised into various types, including bearing faults, rotor-related faults, and stator faults, among others. Approximately 44% of these faults occur in bearings, where they can arise in any of the four main components: the inner race, the outer race, the balls, and the cage. About 90% of bearing faults occur in the races.

The time–frequency analysis technique employed is the Wavelet Scattering Transform (WST). In this study, two ensemble learning algorithms are used: Random Forest (RF) and Extreme Gradient Boosting (XGBoost). During the data acquisition phase, 32 different test bearings were used, including six normal bearings, 12 defective bearings with artificially induced damage, and 14 defective bearings. This technique not only reduces the overall model complexity but also achieves an accuracy exceeding 99% using only the feature extraction and classification steps. According to the results reported by the authors, the enhancement applied to the classification method based on extracted features makes it more effective than other methods.

A study based on the use of the Wavelet Scattering Transform, as in the previous study [109], was conducted [90]. The number of wavelet scattering coefficients obtained from vibration signals of different lengths was found to vary due to the use of wavelet filters and the predefined scale within the scattering network. Moreover, these scattering coefficients are associated with different scattering paths within the wavelet scattering networks. An investigation was conducted into eight distinct signal lengths, corresponding to fifteen distinct fault classes and operating conditions. This investigation involved the extraction of wavelet scattering features. Specifically, 300, 450, 600, 1500, 3000, 4500, 6000, and 7500 samples were utilised for vibration signal epochs with lengths of 600, 900, 1200, 3000, 6000, 9000, 12,000, and 15,000 samples, respectively. The computational results indicated that a minimum of 20% of the total dataset was required to train the multiclass SVM classifiers to achieve a classification accuracy exceeding 98.7% across all bearing types.

The study [85] proposes a methodology for bearing condition monitoring, integrating feature extraction via the Wavelet Packet Transform with a Decision Tree (DT) classifier. The performance of the DT model is evaluated against Support Vector Machine (SVM) and Feedforward Neural Network (FFW) classifiers. The DT model achieved the highest classification accuracy of 95.83%, surpassing the 95.01% attained by the SVM and the 86.72% of the FFW. Beyond accuracy, the DT model also demonstrated superior precision, recall, and F1-score, with all metrics exceeding 95%. Furthermore, the DT model exhibited remarkable computational efficiency, with a training time of only 0.502 s, compared to 2.5478 s for the SVM and 62.2951 s for the FFW. This demonstrates that the use of WT in combination with traditional ML methods is viable for the classification of elements for both CM and FD.
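For reference, the metrics used in this comparison (accuracy, precision, recall and F1-score) can be computed from true and predicted labels as in the sketch below. The labels are hypothetical and do not reproduce the data of [85]; the function names are assumptions.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, label):
    """Precision, recall and F1-score for one class, treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels for eight bearings (not data from the reviewed study).
y_true = ["healthy"] * 4 + ["faulty"] * 4
y_pred = ["healthy", "healthy", "healthy", "faulty",
          "faulty", "faulty", "faulty", "healthy"]

acc = accuracy(y_true, y_pred)
prec, rec, f1 = precision_recall_f1(y_true, y_pred, "faulty")
print(acc, prec, rec, f1)
```

For multi-class problems, the per-class values would typically be averaged over all classes.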

The selected studies are of considerable interest both for their methodology and the results obtained, as well as for the adaptability they demonstrate for use in the categorisation of railway rolling stock. The study on the use of Wavelet Transform [85] to extract signal features for subsequent analysis using various Machine Learning methods highlights both the flexibility and the adaptable application of WT. Of particular interest is the comparison of results obtained with some of the most widely used methods, including DT and SVM, relative to neural networks—specifically, FFW in this case.

From the perspective of railway rolling stock categorisation, the study based on WPT [124] is particularly noteworthy due to its versatility. In that research, WPT was used to determine the condition of railway axles under various integrity states and operational conditions, yielding significant results in defect identification with a reliability considerably higher than 90%. Furthermore, the study enables the investigation of the method’s use for different crack types and across multiple locations along the axle, providing a basis for future research. Also noteworthy is the study utilising the WST [90] for bearings, in which data extracted through the WT is used for multi-class classification with SVM.

5.2. Deep Learning

This section begins by referring to two review articles that analyse conventional approaches and examine in detail intelligent methods for Fault Diagnosis and Condition Monitoring, one of electrical units [105] and the other of industrial robots [101]. These reviews also compile various methods that can be applied to both FD and CM in the railway sector.

In one study [119], a dataset was gathered from a rotor–bearing system by varying both the severity levels and types of faults present in the rotor and bearing components. This procedure resulted in 48 distinct machine operating conditions. The classifier was developed by integrating two one-dimensional Convolutional Neural Networks (1-D CNNs), with each network dedicated to diagnosing faults in the rotor and the bearings, respectively. By simulating five fault states, determined by the location and number of additional weights, the rotor and bearing fault-state models achieved test accuracies of 97.0% and 98.90%, respectively. The combination of these models within the dual rotor–bearing classification framework produced an overall accuracy of 95.93%. The study recognises the importance of considering noise in the signals, and the work represents an advancement due to the robustness of the method against Gaussian noise, to which the rotor classifier is particularly sensitive.
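The three reported accuracies are numerically consistent with a simple product rule, as the short check below shows. This is an observation made here under the assumption that the errors of the two classifiers are independent; the review does not state how the combined figure was obtained in [119], and the function name is hypothetical.

```python
import math


def joint_accuracy(*accuracies):
    """Probability that all classifiers are correct, assuming independent errors."""
    return math.prod(accuracies)


# Reported test accuracies of the rotor and bearing classifiers.
combined = joint_accuracy(0.970, 0.9890)
print(f"{combined:.2%}")  # prints 95.93%
```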

An investigation employing the U-Net model, originally developed for image analysis, is presented [116]. The model’s architecture enables pixel-level feature learning from vibration images. The U-Net architecture consists of two symmetrical paths, a contracting path and an expanding path. The methodology assigns a label to each individual sample in the dataset instead of using window-based labelling, which preserves all label-related information within the dataset. A nine-level U-Net architecture was employed for pixel-level feature learning in bearing fault classification, and a comparative analysis with other DL models confirmed its superiority, with a maximum accuracy of 98.91% and a highest F1-score of 99%. According to the researchers, this novel method is a robust approach that yields better results than, for example, CNN or LeNet-5, and therefore offers potential applicability for FD.

An article puts forward and evaluates a two-stage RGBVI–CNN method [115], which combines RGB Vibration Images (RGBVIs) and a CNN for bearing fault diagnosis. The first stage generates RGB vibration images from the input vibration signals. The one-dimensional vibration signals are first converted into two-dimensional greyscale vibration images, in which Regions Of Interest (ROIs) are then identified. An algorithm is subsequently applied to the greyscale images to generate RGB vibration images based on connected components, with colour sets and texture features, thereby producing vibration images with more discriminative features.
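The first conversion step, from a one-dimensional vibration signal to a two-dimensional greyscale image, can be sketched as follows. The exact mapping used in [115] is not given in this review; min–max scaling to 8-bit intensities and row-wise reshaping are assumed, and the image size and function name are hypothetical.

```python
import math


def signal_to_greyscale(signal, height, width):
    """Map the first height*width samples to 0-255 and reshape row by row."""
    n_pixels = height * width
    if len(signal) < n_pixels:
        raise ValueError("signal is too short for the requested image size")
    segment = signal[:n_pixels]
    lo, hi = min(segment), max(segment)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant signal
    pixels = [round(255 * (v - lo) / span) for v in segment]
    return [pixels[r * width:(r + 1) * width] for r in range(height)]


# Hypothetical vibration signal of 64 samples mapped to an 8 x 8 image.
signal = [math.sin(2 * math.pi * 3 * n / 64) for n in range(64)]
image = signal_to_greyscale(signal, height=8, width=8)
for row in image:
    print(row)
```

The ROI detection and the colouring of connected components described above would operate on an image of this kind.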

In the second stage, a CNN-based architecture is employed to automatically learn features from the resulting RGBVIs and to classify the bearing health conditions. The proposed method is validated on two bearing fault classification cases. The experimental results demonstrate that the RGBVI-CNN approach determines the bearing state from vibration signals and classifies bearing conditions under varying load levels with a high degree of accuracy. According to the authors, the method is in its initial stages and is promising for FD in bearings, and they note its potential for other applications, such as CM, through Transfer Learning.

The next article considered [107] proposes a hybrid diagnostic method based on a CNN–MLP (Multi-Layer Perceptron) model that combines heterogeneous data for bearing diagnosis. The method has been demonstrated to successfully detect and localise bearing defects using acceleration data acquired from a wireless accelerometer mounted on the shaft. The experimental results demonstrate that the hybrid model outperforms standalone CNN and MLP models, achieving a detection accuracy of 99.6% for bearing faults, compared with 98% for the CNN model and 81% for the MLP model.

The proposed framework integrates an MLP for numerical inputs and a CNN for Hilbert–Huang Transform images, obtained through Empirical Mode Decomposition (EMD) of the signals and the subsequent extraction of the associated Intrinsic Mode Functions (IMFs). All experiments reported in this study were conducted at a single shaft rotational frequency. According to the results obtained by the researchers, this novel hybrid model proves advantageous for FD in bearings, reportedly outperforming the individual CNN and MLP models, although they indicate that future studies will be conducted to validate the results for different shaft speeds.

The proposed Physics-Informed Residual Network (PIResNet) is built on the physical layer generated by the dominant modal property and on the domain conversion layer [100]. The PIResNet is composed of two parallel channels, one of which is analogous to a conventional Residual Network (ResNet). The sequence of layers in the model begins with the domain conversion layer, followed by a wide-kernel CNN layer that suppresses high-frequency noise. The remaining channels are structurally analogous, with the objective of automatically extracting high-dimensional features from the signal with dominant modal properties. Subsequently, both channels are flattened and concatenated, and fully connected layers and a Softmax function are employed for classification. It is important to note that neither Batch Normalisation (BN) nor the Rectified Linear Unit (ReLU) modifies the feature size.

Key hyperparameters used in the proposed PIResNet include the Adam optimiser, a mini-batch size of 128, 100 training epochs, a dropout rate of 0.5, and an initial learning rate of 0.001.

To demonstrate the effectiveness of the proposed PIResNet, validation was conducted in two experiments: bearings operating under variable speeds and loads, and bearings operating under variable speeds over time. The proposed PIResNet achieves the highest accuracy of 99.59% and can also reach a remarkable 99.76%. According to the authors, the method, which is based on a novel feature layer generated from the dominant physical modal property, is markedly superior to those included in the comparative analysis, such as DCNN or ResNet.

A bearing fault diagnosis method for edge devices is presented [93], including the acquisition of Acoustic Emission (AE) signals. A test rig was configured to acquire AE signals containing information about the bearing condition under different states and operating conditions. The acquired AE signals were then segmented using a fixed-length window (0.05 s), and a Short-Time Fourier Transform was applied to each segment to generate spectrogram images. Each STFT spectrogram was converted into a single-channel greyscale image. Teacher (ResNet-50) and student (MobileNetV2 and MobileNetV3-L) models were employed, with the first layer of each model modified from a three-channel input to a single-channel input, and the models were subsequently trained. The convolutional, batch normalisation, and Rectified Linear Unit (ReLU) layers in the trained student models were then fused, and the performance and other metrics of the final model (accuracy, memory usage, and inference time) were evaluated.
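The fusion of convolutional, batch normalisation and ReLU layers mentioned above relies on the fact that, at inference time, batch normalisation is an affine operation that can be folded into the preceding convolution. The sketch below verifies this for a single one-dimensional channel; it is a generic illustration rather than the implementation of [93], and all parameter values and function names are hypothetical.

```python
import math


def conv1d(x, weights, bias):
    """Valid 1-D cross-correlation with a single kernel, as in CNN layers."""
    k = len(weights)
    return [sum(weights[j] * x[i + j] for j in range(k)) + bias
            for i in range(len(x) - k + 1)]


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch normalisation with fixed running statistics."""
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in y]


def relu(y):
    return [max(0.0, v) for v in y]


def fuse_conv_bn(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold the batch-normalisation parameters into the kernel and bias."""
    scale = gamma / math.sqrt(var + eps)
    return [w * scale for w in weights], (bias - mean) * scale + beta


# Hypothetical input and layer parameters.
x = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5]
weights, bias = [0.2, -0.4, 0.1], 0.05
gamma, beta, mean, var = 1.5, -0.1, 0.3, 0.8

reference = relu(batch_norm(conv1d(x, weights, bias), gamma, beta, mean, var))
fused_w, fused_b = fuse_conv_bn(weights, bias, gamma, beta, mean, var)
fused = relu(conv1d(x, fused_w, fused_b))
print(reference)
print(fused)
```

The fused layer produces the same output with one pass instead of two, which is the kind of saving that matters on resource-constrained edge devices.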

According to the authors, the method benefits from a highly efficient STFT-based transformation and greyscale image representation, achieving a competitive accuracy of 98.7%, compared with 99.17% at scale 0.2 and 98.95% at scale 0.01. The novelty lies in a hybrid neural network method whose improved efficacy also offsets the limitations of CNN-based methods arising from their high computational demand in comparison with traditional approaches; in particular, the target is deployment on 16- or 32-bit CPU-based units.

In line with the preceding case study, Random Forest (RF) and the Recurrent Expansion Network (RexNet), an advanced version of the Long Short-Term Memory (LSTM) network, were employed to predict the Remaining Useful Life (RUL) of high-speed wind turbine bearings [94]. The study used a lifecycle dataset from a 2 MW wind turbine, focusing on the degradation of bearings caused by cracks in the inner raceway. To optimise feature selection, RF was employed under Bayesian optimisation, which enabled the identification of critical features such as load, safety index and health indicator. These features were fed into two RexNet variants: a single-layer RexNet and a two-layer RexNet2. It should be noted that computational limitations necessitated the interruption of operations. The findings indicated the efficacy of the RexNet variants in evaluation metrics and RUL prediction visualisations (curve fitting). The authors report significant achievements in RUL prediction.

A CNN–Transformer (CNN-T) system is proposed [86] that employs cross-attention for feature fusion and directly processes vibration signals without the need for prior preprocessing. Within the 1D-CNN encoder block, parallel convolutional layers with distinct kernel sizes are used to extract multiscale features. Max-pooling is then applied to reduce spatial dimensions, lower computational costs, and mitigate overfitting by emphasising the most significant and relevant features.
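The idea of parallel convolutions with distinct kernel sizes followed by max-pooling can be sketched in a few lines. The kernels below are fixed moving averages chosen for illustration, whereas the kernels in [86] are learned; the kernel sizes, pooling size and function names are assumptions.

```python
def conv1d_same(x, kernel):
    """1-D cross-correlation with zero padding so the output keeps len(x).

    Assumes an odd kernel length.
    """
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(kernel[j] * padded[i + j] for j in range(k))
            for i in range(len(x))]


def max_pool1d(x, size):
    """Non-overlapping max-pooling; reduces the length by a factor of `size`."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


def multiscale_features(x, kernels, pool_size):
    """Apply each kernel in parallel to the same input, then pool each branch."""
    return [max_pool1d(conv1d_same(x, kernel), pool_size) for kernel in kernels]


# Hypothetical input and three branches with kernel sizes 3, 5 and 7.
x = [1.0] * 16
kernels = [[1.0 / size] * size for size in (3, 5, 7)]
features = multiscale_features(x, kernels, pool_size=2)
print([len(branch) for branch in features])
```

Each branch sees the same signal at a different scale, and pooling halves the length of every branch before the features are combined.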

This study investigates blade fault diagnosis. To this end, it compares traditional Machine Learning approaches with a novel hybrid deep learning model of the one-dimensional convolutional transformer type. The hybrid model was tested on a multistage rotor and achieved an accuracy exceeding 93% under a range of operating and fault conditions. The authors posit that this represents a substantial improvement over conventional methodologies, which achieved accuracies ranging from 49.81% to 86.75%, and over established artificial neural networks, whose accuracy ranged between 88.43% and 90%.

This paper [82] is premised on preserving a standardised, simplified Artificial Neural Network (ANN) architecture while concentrating on the optimisation of vibration parameters derived from rotor dynamic analysis. An ANN-based Machine Learning model manages, processes and correlates the large dataset, enabling accurate fault detection without human error and without relying on the experience or knowledge of any individual. A simple ANN-based ML model from the preceding study is therefore employed. The ANN, configured as a Multi-Layer Perceptron (MLP) with four hidden layers, takes the vibration parameters derived from rotordynamics as inputs to classify machinery conditions and detect faults, leveraging its capacity to model nonlinear relationships between inputs and outputs.

Data were collected for four cases at three different operating speeds: 6 Hz, 12 Hz, and 18 Hz. For the healthy condition, 30 samples were acquired, each lasting 10 s with a sampling frequency of 25.6 kHz. The same sampling parameters were applied to the misalignment, imbalance, and bearing fault cases. The method achieved an effectiveness approaching 100%. According to the authors, the method employed and the adjustments implemented during the research make it possible to maintain the accuracy of the approach while substantially mitigating both overfitting and computational overhead.

The paper [81] proposes a Multiscale Domain Convolutional Attention Network (MSDCAN), which achieves accuracies of 97.3% under clean conditions, 96.6% at a signal-to-noise ratio (SNR) of 15 dB, 94.4% at an SNR of 10 dB, and a robust 85.5% under severe conditions with an SNR of 5 dB. The MSDCAN employs a three-stage hybrid architecture. First, a feature extractor systematically captures physical features across multiple domains, including the time, frequency, wavelet, and cyclic spectral domains. Subsequently, a deep hierarchical convolutional encoder with progressively multiscale kernels (from 64 to 3) automatically learns abstract representations from raw vibration signals. Finally, an enhanced hybrid attention fusion mechanism integrates heterogeneous features through bidirectional cross-attention interactions, enabling domain knowledge features to guide the interpretation of deep features while allowing data-driven patterns to enhance physical representations. As the authors note, the hybrid system is intended for noisy environments, which generally hinder the acquisition of the real signals requiring analysis.
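Robustness figures quoted at a given SNR are usually obtained by corrupting clean signals with additive white Gaussian noise of a matching power. The sketch below shows one common way to do this. The test signal, the seed and the function names are illustrative assumptions, not details taken from [81].

```python
import math
import random

def add_awgn(signal, snr_db, rng):
    """Return the signal plus zero-mean Gaussian noise at the requested SNR (dB)."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]

def measured_snr_db(clean, noisy):
    """Empirical SNR of a noisy copy with respect to its clean reference."""
    signal_power = sum(x * x for x in clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy))
    return 10 * math.log10(signal_power / noise_power)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
clean = [math.sin(2 * math.pi * 30 * n / 2000) for n in range(20000)]
noisy_by_snr = {snr: add_awgn(clean, snr, rng) for snr in (15, 10, 5)}
```

Each 5 dB reduction multiplies the noise power by about 3.16, so the 5 dB case carries ten times the noise power of the 15 dB case.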

Considering the results obtained from the studies reviewed in this article, neural networks, in their various forms of application, demonstrate significant outcomes, as well as flexibility of use and considerable potential for future development. Notably, among the studies examined, the hybrid method proposed by Sinitsin et al. [107] is of particular interest, as it combines the use of neural networks with EMD, achieving over 99% accuracy in the experiment conducted, surpassing the performance of the baseline methods employed in the research (CNN and MLP models). Also of interest is the study [115], due to its dimensional shift from one-dimensional vibration signals to two-dimensional RGB colour images. This approach enables not only state identification based on colour but also recognition based on textures. By employing CNN, this combination achieves high-precision classification results, which, in the researchers’ words, renders it a method of particular interest.

5.3. Machine Learning

To begin the section, two papers that review the most common Machine Learning methods and techniques are discussed. The first one [92] addresses supervised learning methods, including the Wavelet Transform and deep learning, all of which are focused on their application to induction motors. The second paper [95] focuses on the analysis of bearing faults, which account for approximately 45–50% of failures in rotating machinery. Both cases are of significant interest for their application to railway equipment, whether rolling stock or infrastructure, and, due to their relevance in identifying research trends, are therefore included in this study. A third document is also included to support the compilation of data presented in this paper [96].

The Self-Supervised Pyramid Transformer-Based Anomaly Detection (SPT-AD) algorithm is a transformer-based model that detects anomalies in time series by generating anomalous data through self-supervised learning [87]. It builds on AnomalyBERT, a BERT-based algorithm that uses only the encoder structure of the transformer. The SPT-AD architecture replaces the core Multi-head Self-Attention (MSA) component with the Pyramid Attention Module (PAM) from Pyraformer. In addition, a Coarse-Scale Construction Module (CSCM) is incorporated to improve anomaly detection performance and computational efficiency for time series data.

The data degradation methods used are Smooth Replacement, Uniform Replacement, Length Adjustment, and Noise Spiking. Experimental results show that SPT-AD outperforms the benchmarked anomaly detection models (LSTM AutoEncoder, MSCRED, USAD, and AnomalyBERT), achieving a 6% improvement in F1-score while significantly reducing computational overhead. The researchers acknowledge, as a limitation, that the model still needs to be trained and validated on datasets beyond those employed in the study.

A study [104] proposes a methodology for bearing fault diagnosis based on the processing of spectrogram images. The proposed approach enables accurate detection of bearing faults and classification of their type and severity through AI-based spectrogram recognition, even in the presence of noisy data. In addition, the article introduces a framework for interpreting the contribution of features to the model outcomes, addressing the issue of interpretability.

Spectrograms are extracted using the STFT, and a greyscale dataset is constructed and split according to the Pareto rule into 80% for training and 20% for testing. Feature extraction is performed using Randomized Principal Component Analysis (rPCA): the data matrices are projected onto a Principal Component (PC) sub-space that captures the majority of the data variance, generating the corresponding matrices known as eigen-spectrograms. This low-dimensional representation of the dataset is used as the feature space for a multiclass support vector machine (SVM) classifier with polynomial kernels. The results show that eigen-spectrograms efficiently capture the inherent structures of vibration data from a specific machine with limited human intervention. However, the applicability of the proposed model to larger and more challenging standardized datasets requires further experimental validation, and RUL assessment could also be explored. Research employing spectrograms and CNNs has already been examined in this document [93]; the study discussed here also uses spectrograms, in this case in conjunction with rPCA and SVM, achieving superior classification performance and a significant reduction in model training time (100 s as opposed to nearly 800 s for CNN).

Anomaly detection was investigated using a test rig equipped with an electric motor, and the underlying causes were subsequently diagnosed under various scenarios [99]. A total of 501 samples were generated for each of the five classes considered in the study, giving 2505 samples overall. For all trained algorithms, a cross-validation technique was adopted, with the samples split into 75% for training (1878 samples) and 25% for testing (627 samples).
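The sample counts quoted above follow from a single shuffled hold-out split with the training size rounded down. The sketch below reproduces that arithmetic. The seed, the function name and the absence of stratification are assumptions, since the source does not state how the split was drawn.

```python
import random

def holdout_split(n_samples, train_fraction, seed=0):
    """Shuffle sample indices and cut them into train and test partitions."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(n_samples * train_fraction)  # rounds down
    return indices[:n_train], indices[n_train:]

n_classes, per_class = 5, 501
labels = [c for c in range(n_classes) for _ in range(per_class)]
train_idx, test_idx = holdout_split(len(labels), 0.75)
```

With 2505 samples, 75% is 1878.75, which rounds down to 1878 training samples and leaves 627 for testing.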

A range of statistical classifiers and common learning methods was employed to diagnose faults in rotating machinery: k-NN, the Naïve Bayes (NB) classifier, SVM, and MLP. The supervised training was performed using balanced samples from five classes (target attributes) representing possible operating conditions in industrial applications involving electrically driven rotating machinery. The objective of the study was to demonstrate the continued validity and efficacy of so-called traditional machine learning techniques.
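Of the classifiers listed, k-NN is the simplest to state explicitly: a query is assigned the majority label among its k closest training samples. The following sketch uses Euclidean distance. The two features and the class names are hypothetical and are not taken from [99].

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training samples closest to the query."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-feature samples: (RMS amplitude, kurtosis).
train_X = [(0.10, 3.0), (0.12, 3.1), (0.11, 2.9),
           (0.45, 3.2), (0.50, 3.0), (0.48, 3.1)]
train_y = ["healthy"] * 3 + ["imbalance"] * 3

prediction = knn_predict(train_X, train_y, (0.47, 3.05))
```

In practice the features would normally be standardised first, because Euclidean distance is dominated by whichever feature has the largest numerical range.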

The investigation of bearing defects was conducted using data obtained from the oscillatory bearing test rig installed at CSIR–NAL [98]. The principal bearing features were extracted and the fundamental fault frequencies were calculated. Features from the non-redundant region and the diagonal slice of the bispectrum were utilised to capture higher-order statistical and spectral characteristics of the vibration signal. A set of sixteen machine learning models was employed for the classification of bearing faults. The models included decision trees, k-nearest neighbours, naïve Bayes and support vector machines. The evaluation process involved a robust 10-fold cross-validation technique.
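The fundamental fault frequencies mentioned above are normally obtained from the textbook kinematic formulas for a bearing with a stationary outer race and a continuously rotating inner race. The sketch below applies those formulas. The geometry values and the shaft speed are illustrative assumptions, and an oscillatory rig such as the one in [98] may require a different treatment.

```python
import math
from dataclasses import dataclass

@dataclass
class BearingGeometry:
    n_balls: int
    ball_diameter: float       # same length unit as pitch_diameter
    pitch_diameter: float
    contact_angle_deg: float = 0.0

def fault_frequencies(geom, shaft_hz):
    """Characteristic defect frequencies in Hz (outer race assumed fixed)."""
    ratio = (geom.ball_diameter / geom.pitch_diameter
             * math.cos(math.radians(geom.contact_angle_deg)))
    return {
        "BPFO": geom.n_balls / 2 * shaft_hz * (1 - ratio),   # outer race
        "BPFI": geom.n_balls / 2 * shaft_hz * (1 + ratio),   # inner race
        "BSF": (geom.pitch_diameter / (2 * geom.ball_diameter)
                * shaft_hz * (1 - ratio ** 2)),              # ball spin
        "FTF": shaft_hz / 2 * (1 - ratio),                   # cage
    }

# Illustrative geometry (assumption): 9 balls, 7.94 mm balls, 39.04 mm pitch.
geometry = BearingGeometry(n_balls=9, ball_diameter=7.94, pitch_diameter=39.04)
freqs = fault_frequencies(geometry, shaft_hz=30.0)
```

Two identities give a quick sanity check: BPFO plus BPFI equals the number of balls times the shaft frequency, and BPFO equals the number of balls times FTF.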

The findings indicated that the Decision Tree algorithm performed best, attaining an accuracy of 100%, while the Naïve Bayes algorithm performed worst, with an accuracy of 99.68%. Compared with CNNs, these algorithms required significantly less training time: the authors contrast their findings with a previous study of their own in which a CNN was employed, reporting training times exceeding 6 min for the CNN against less than 10 s for the models analysed in the present paper.

The following study [97] is cited in support of the hypothesis that the Advanced Fault Indicator (AFI), which uses the FFT of broadband accelerometer signals to compute the spectral content of vibration signals emitted by bearings, can identify faults earlier than the RMS method and is more robust than the Envelope Frequency Band (EFB) and the Mean Absolute Value of the Extremes (MAVE). The combination of RMS and AFI appears to be a promising approach for obtaining a reliable configuration for bearing fault detection. AFI acts as a predictive indicator, ideally providing an earlier warning than RMS, albeit with a certain probability of responding at an incorrect location.

The datasets used for these calculations were obtained under constant speed and load conditions. Subsequently, an operational class and a steady-state detection scheme were implemented to compensate for non-stationary operating conditions. The AFI algorithm was also implemented and tested in real-world industrial environments, including customer test-rig configurations for racing car development. The research took into account a range of practical considerations, including computational efficiency, real-time processing requirements and integration with existing condition monitoring systems.

The study [84] proposes an experimental framework using a custom-designed Bearing–Shaft Misalignment Simulator, specifically engineered to induce and control varying degrees of parallel misalignment under operational conditions. The configuration under consideration facilitates a methodical examination of the vibration behaviour exhibited by the bearing. Six classification models were evaluated: five Machine Learning algorithms (Multilayer Perceptron, Random Forest, Decision Tree, k-Nearest Neighbors, and Adaptive Boosting) and one DL model (Long Short-Term Memory, LSTM) for classifying four levels of misalignment severity. The results reveal a strong positive correlation between the magnitude of misalignment and vibration intensity, highlighting the increase in dynamic instability within the bearing system. Statistical feature extraction in the time domain significantly improved the performance of classical models, with KNN achieving a maximum accuracy of 92.9%.
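Time-domain statistical features of the kind referred to above are typically simple descriptors computed per signal window. The sketch below computes a common set. The choice of features is an assumption, as the exact feature list used in [84] is not given here.

```python
import math

def time_domain_features(x):
    """Common statistical descriptors of one vibration window."""
    n = len(x)
    mean = sum(x) / n
    centred = [v - mean for v in x]
    variance = sum(c * c for c in centred) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs(v) for v in x)
    return {
        "mean": mean,
        "std": math.sqrt(variance),
        "rms": rms,
        "peak": peak,
        "crest_factor": peak / rms,
        "kurtosis": (sum(c ** 4 for c in centred) / n) / variance ** 2,
    }

# Toy window: five full periods of a unit-amplitude sine.
window = [math.sin(2 * math.pi * 5 * n / 1000) for n in range(1000)]
features = time_domain_features(window)
```

For a pure sine the crest factor is the square root of 2 and the kurtosis is 1.5. Impulsive faults push both values upwards, which is why these features help classical classifiers.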

What can be considered traditional Machine Learning methods continue to be widely used today, often in combination with other methodologies. As noted, features are sometimes extracted via the WT and subsequently analysed using methods such as DT or SVM. Among the studies reviewed in this paper, the work of Sharma et al. [98] is noteworthy, as the results indicate that the decision trees employed achieved higher accuracy than both SVM and CNNs. Accordingly, the authors conclude that DT remain an effective method.

5.4. Combined DL and WT

The Wavelet Transform analyses signals by decomposing them, in both the time and frequency domains, into components called wavelets, which are derived from a base or mother wavelet through scaling and translation operations. This enables the extraction of, for example, energy-related parameters that may subsequently be used as inputs to the neural network.
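As an illustration of how energy-related parameters can be derived, the sketch below applies a multi-level discrete decomposition with the Haar wavelet and returns the energy of each sub-band. The Haar wavelet, the number of levels and the test signal are assumptions made for brevity. The reviewed studies use various mother wavelets.

```python
import math

def haar_dwt(signal):
    """One level of the orthonormal Haar DWT (signal length must be even)."""
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def wavelet_energies(signal, levels):
    """Energies of the detail bands (finest first), then the final approximation."""
    energies = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in approx))
    return energies

# Toy signal of length 64: a slow sine plus a fast alternating component.
signal = [math.sin(2 * math.pi * n / 32) + 0.3 * (-1) ** n for n in range(64)]
energies = wavelet_energies(signal, levels=3)
```

Because the transform is orthonormal, the band energies sum to the energy of the original signal, so the resulting vector can be normalised into an energy distribution and used as network input.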

With a deep CNN, it is challenging to meet the computational requirements of embedded systems due to the large amount of processing involved. In recent years, significant progress has been made in research on both deep and lightweight CNNs [67,121]. Convolutional networks can hierarchically extract and combine data features, especially when used in conjunction with the WPT. However, as the network depth increases, model accuracy tends to decrease. Therefore, employing a residual structure combined with Batch Normalization (BN) enables the training of very deep networks without the risk of performance degradation associated with increased depth.

Experimental results show that the proposed algorithm achieves the second-best performance under various levels of added Gaussian white noise, and its noise robustness is between 3% and 58% higher than that of existing algorithms. Furthermore, it demonstrates a transfer learning capability ranging from 1.45% to 52% [121]. In addition, the proposed algorithm also exhibits the lowest computational complexity and memory requirements, being 72.5% and 88.5% lower than those of existing algorithms, respectively [67]. A concept of network lightening is presented in a study based on the use of the integrated Frequency Slice Wavelet Transform (FSWT) [103], which also yields significant results regarding overall process efficiency. The authors further demonstrate that using a DC-ResNet structure provides superior feature extraction capability compared with the standard ResNet18, achieving an average diagnostic accuracy of 93.90%, which is higher than that of ResNet18 (89.98%).

A similar approach is presented through a Discrete Wavelet Convolutional Residual neural Network (DWCResNet) [80]. To attenuate the influence of noise and improve classification accuracy, wavelet transforms are incorporated into ResNet architectures. Specifically, one-dimensional DWT and Inverse Discrete Wavelet Transform (IDWT) operations are embedded within standard network layers. During the down-sampling process, DWCResNet suppresses the high-frequency components of fault-related features to enhance the noise robustness of ResNets, while extracting relevant features from low-frequency components. DWCResNet is designed as an end-to-end neural network: it has been shown to classify noisy raw input signals directly, without any prior denoising step.

An experiment was conducted using six different motor condition states [120]. All states were monitored under static conditions for 90 s. This dataset was subsequently used to train the classification model after data preprocessing, which included standardization and feature extraction using the CWT and scalograms. A CNN model based on the LeNet-5 architecture was designed. This model achieved an accuracy of 97.53% in classifying all motor conditions across the six different scenarios established in the experiment: defective bearings, loose assembly, rotor eccentricity, motor phase loss, and stator winding short circuit.

One approach that has been presented combines a digital twin with DL [83]. Real bearing vibration data are collected on an experimental platform. Following the removal of noise, a high-fidelity digital twin system is constructed by integrating a dynamic bearing model with a Generative Adversarial Network (GAN). Subsequently, the Wavelet Synchronous Extraction Transform (WSET) is employed for high-resolution time–frequency analysis, and convolutional neural networks are used to adaptively extract deep fault features. The fully connected layer of the CNN is combined with a Least Squares Support Vector Machine (LSSVM), with key parameters optimised using an improved Pelican Optimization Algorithm (IPOA) to enhance diagnostic accuracy. In comparison with alternative methodologies, the proposed CNN–IPOA–LSSVM model exhibits a substantial enhancement in diagnostic performance, and experimental validation shows that it attains an average diagnostic accuracy of 99.30%. The use of digital twins is becoming increasingly widespread, and this study introduces their application for analytical purposes by, as the researchers describe, establishing a processing chain that begins with signal generation by the corresponding digital twin, followed by the standard analytical procedures.

This section presents a variety of notable studies. Of particular interest, for its potential application in the categorisation of railway rolling stock, is the research focused on data collection to subsequently generate a digital twin [83]. In this study, signal features are extracted using the WT, followed by classification through neural networks combined with SVM. The authors report that the method achieves over 99% accuracy. Although the diagnostic model is based on an experimental test bench, the possibility of collecting data in an operational system under real-world conditions, together with the generation of an associated digital twin, offers promising prospects.

5.5. Combined DL and Traditional ML

In the following paper [125], data were collected via SCADA from a wind farm. The construction of a reference power curve (wind speed versus power output) for each of the 48 turbines on the farm was achieved by employing artificial neural networks and Gaussian processes. Subsequently, each reference model was used to predict the power output of the remaining turbines. This process resulted in the creation of a confusion matrix of regression model errors (MSE) for all combinations. For the neural networks, multilayer perceptrons were employed. In addition, the reference power curve is indicative of a healthy power curve. That is to say, it is constructed solely from data corresponding to time instances with a status code of “0” (“no fault” in the turbines).
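The cross-turbine comparison can be sketched as follows. For brevity, a binned mean power curve replaces the neural network and Gaussian process regressors used in [125]. The record layout, the bin width and the synthetic turbines are likewise assumptions for illustration only.

```python
from collections import defaultdict

def fit_power_curve(records, bin_width=1.0):
    """Binned mean power built from healthy records only (status code 0)."""
    sums = defaultdict(lambda: [0.0, 0])
    for wind, power, status in records:
        if status != 0:
            continue
        b = int(wind // bin_width)
        sums[b][0] += power
        sums[b][1] += 1
    return {b: total / count for b, (total, count) in sums.items()}

def mse(curve, records, bin_width=1.0):
    """Mean squared error of a reference curve on another turbine's healthy data."""
    errors = []
    for wind, power, status in records:
        predicted = curve.get(int(wind // bin_width))
        if status == 0 and predicted is not None:
            errors.append((predicted - power) ** 2)
    return sum(errors) / len(errors)

def cross_mse_matrix(turbines):
    """Entry [i][j]: error of turbine i's reference curve on turbine j's data."""
    curves = [fit_power_curve(records) for records in turbines]
    return [[mse(curve, records) for records in turbines] for curve in curves]

def make_turbine(scale):
    """Synthetic cubic power curve plus one faulty record (status code 1)."""
    healthy = [(w / 2, scale * (w / 2) ** 3, 0) for w in range(6, 25)]
    return healthy + [(8.0, 0.0, 1)]

matrix = cross_mse_matrix([make_turbine(1.0), make_turbine(1.0), make_turbine(0.8)])
```

A row with uniformly low errors indicates a reference model that transfers well. In this toy case the third turbine, which underperforms, stands out with a larger error against the first turbine's curve.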

The findings indicated that the majority of the models exhibited a high degree of robustness, characterised by consistently low MSE values. The dataset used in this study covers a complete year of operation. All SCADA extracts comprise 10-min statistics: the maximum, mean, minimum, and standard deviation values recorded for each 10-min interval. The actual sampling interval is shorter than 10 min.
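The 10-min statistics described above can be produced from faster raw samples by grouping on the window index. In the sketch below, the 1 Hz raw rate, the record layout and the use of the population standard deviation are assumptions, since the source gives no details of the SCADA aggregation.

```python
import statistics

def ten_minute_summary(samples, window_s=600):
    """Aggregate (timestamp_s, value) pairs into per-window statistics."""
    windows = {}
    for t, value in samples:
        windows.setdefault(int(t // window_s), []).append(value)
    return {
        w: {
            "max": max(values),
            "mean": statistics.fmean(values),
            "min": min(values),
            "std": statistics.pstdev(values),
        }
        for w, values in sorted(windows.items())
    }

# Toy input: 20 minutes of 1 Hz samples with a sawtooth value.
raw = [(t, float(t % 60)) for t in range(1200)]
summary = ten_minute_summary(raw)
```

Only these four statistics per window survive the aggregation, so any dynamics faster than 10 min are no longer visible to the models trained on the extracts.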

In this study [111], some of the most relevant Artificial Intelligence (AI) techniques are combined with a Range-Resolution Interferometry (RRI) instrument applied to wind turbine maintenance. A small wind turbine prototype, demonstrating component degradation and wear along with their effects, was used to facilitate fault diagnosis. A laser scanner was employed to detect vibrations in two different fault conditions. After each operational cycle, the in-process RRI measurements were found to correspond closely with manual measurements taken using in situ micrometers, as well as with measurements obtained via the laser scanner. Therefore, the proposed method is expected to be highly useful for the monitoring and diagnosis of faults in wind turbines operating in harsh environments. Additionally, it enables low-cost in-process measurements.

The following paper [110] is essentially a comparison between deep learning and traditional Machine Learning methods. The applicability of physical methods, such as the envelope spectrum enhanced with Cepstrum Pre-Whitening (CPW), was evaluated for ML-based classification. The application of this method to feature extraction yielded optimal results, thereby demonstrating its efficacy when employed in conjunction with time-domain features. The results obtained in this study surpassed those of a CNN trained on greyscale images of raw time-series signals under identical conditions. The conclusions presented herein should be regarded as a contribution to the study of pertinent features for machine learning-based rolling element bearing fault diagnosis. Overall, the ML models achieved excellent accuracies, with k-NN and SVM classifiers performing best. The potential exists for enhancing the precision of the CNN through the augmentation of the dataset.

This paper [106] explores the potential of eXplainable Artificial Intelligence (XAI) algorithms when employed within the context of convolutional neural networks, with a focus on their application in the domain of vibration-based condition monitoring. In this study, three XAI algorithms (GradCAM, LRP, and LIME) with a modified perturbation strategy were applied to classifications based on the Fourier transform, as well as to order analysis of the vibration signal. The employment of XAI methods resulted in the generation of saliency maps for the purpose of feature explanation.

Dataset classification was performed using CNNs, which allowed the application of both model-agnostic XAI methods, such as LIME, and deep neural network-specific methods, such as GradCAM and LRP. Each model was trained for 150 epochs using the Adam optimizer with a learning rate of 10⁻⁴.
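For reference, the Adam update applied at each training step can be written out for a single scalar parameter. The learning rate matches the value quoted above. The remaining hyperparameters are the commonly used defaults and the toy loss is hypothetical, as neither is stated in the source.

```python
import math

def adam_step(param, grad, state, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; state holds t, m and v."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad          # first moment
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad   # second moment
    m_hat = state["m"] / (1 - beta1 ** state["t"])                # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)

# Toy loss f(w) = (w - 3)^2 with gradient 2 * (w - 3), starting from w = 0.
w = 0.0
state = {"t": 0, "m": 0.0, "v": 0.0}
first = adam_step(w, 2 * (w - 3), state)
```

After bias correction, the first step has a magnitude close to the learning rate regardless of the gradient scale, which is one reason a small value such as 10⁻⁴ gives stable training.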

One study [111] is promising because it combines AI techniques applied to a model, in this case a prototype wind turbine, which, according to the authors, offers broad applicability for both fault detection and condition monitoring in wind turbines and could consequently be applied to the categorisation of railway rolling stock.

5.6. Perspective on the Methodological Evolution

From an examination of the studies compiled in this manuscript, a clear progression can be observed in the application of the methodology, following the patterns identified in recent research, as discussed below:

5.6.1. Classical Signal Processing + ML

The extraction of features from vibration signals using methods such as the FFT, and their subsequent analysis by means of traditional classifiers (decision trees, SVMs, etc.), is regarded as the application of classical signal processing combined with machine learning techniques [126] which, in the present study, correspond to the group of investigations classified under the Machine Learning Section.

5.6.2. Hybrid Feature Extraction + DL

The subsequent stage in the classification [126,127] is characterised by signal acquisition through different methods, including manual acquisition, followed by feature extraction for subsequent analysis. These features are derived either through traditional methods such as the FFT or by means of decomposition techniques such as the WT or EMD. In the present document, this stage encompasses a substantial proportion of the studies analysed, as the manuscripts grouped under the Sections Wavelet Transform, Combined DL and Traditional ML, and Combined DL and WT correspond to this methodological approach.

5.6.3. End-to-End DL

These models do not require feature preselection [128]; that is, they need no prior signal processing and perform the analysis directly, without intermediate stages, as presented in the DL Section of this manuscript. By definition, the methodology relies on a single working approach, which in practice corresponds to the use of neural networks, specifically DL.

5.6.4. Digital Twins

The virtual representation of systems is becoming a methodology that enables the simulation, analysis, and real-time prediction of system dynamics [129,130]. In other words, a digital twin is generated that faithfully reproduces the characteristics, states, and dynamics of its physical counterpart, from which the necessary information has been obtained through the use of sensors. An example of the application of digital twins is provided in the paper [83], which describes their use in detail and the promising results obtained.

6. Conclusions

This article reviews the different trends in the interpretation of information provided by vibration signals from various mechanical systems, as well as the different approaches based on the use of ML models. In addition, more than ninety recently published research studies from different related fields are reviewed, listed, categorised, and analysed. Subsequently, the methodologies and limitations of the most commonly used learning approaches are examined, and the 51 most relevant studies are selected in line with the objectives of this work. These techniques have demonstrated their capability and effectiveness in addressing a wide range of ML problems.

From the analysis of the selected documents, it is observed that, given the priority of fault diagnosis and condition monitoring, there are currently two main research directions: one based on DL and the other on wavelet-based approaches. Although these constitute the principal lines of research, other ML methods, such as SVM, DT, and other commonly used classification techniques, remain relevant and are occasionally combined with the main approaches.

The reduction in model complexity aimed at decreasing computational cost (at the risk of information loss and error induction) and direct application in real-time analysis systems constitute the most immediate challenges currently being addressed by ongoing research.

The studies reviewed in this paper provide a large number of highly effective methods, each presenting, depending on the preferred application, a range of advantages and disadvantages. In what follows, one of the studies cited in this article is indicated in brackets as an example of each point. The WT is shown to be a widely used method, both in its discrete form [112] and continuous form [117], offering, among other benefits, the advantage of being usable in combination with other ML methods [93] or even deep learning approaches [115]. However, it presents the drawback that careful selection of the mother wavelets [123] is required to avoid loss of effectiveness.

Meanwhile, deep learning methodologies exhibit considerable operational flexibility and broad applicability through the development of associated techniques such as Transfer Learning [70]. This methodology is receiving significant attention across a wide variety of fields [105,116], leading to substantial expansion and development. Nonetheless, issues of over-dimensioning arise with the increase in the number of layers in the network generated in each case [86].

The use of neural networks, among the two main selected methods, offers greater operational flexibility for the categorisation of railway rolling stock, and can be combined with other ML methods, such as SVM, to enhance effectiveness and mitigate issues associated with over-dimensioning.

Furthermore, it is essential to consider the classification arising from the methodological evolution, along with its defining characteristics, chronological development, and prospective future trajectories. While these aspects are inherently linked to the description presented above, their analysis provides an additional perspective of significant scholarly interest.

Classical methodologies have served as the foundation of ML and continue to be widely used and highly valuable. A substantial number of the reviewed studies demonstrate that these methods, which represent incremental refinements in their evolution, continue to provide highly reliable results. However, they occasionally exhibit issues of overparameterisation and overfitting. Support Vector Machines (SVM) and decision trees, in particular, continue to demonstrate robust performance. As previously noted, the results obtained using classical methods have been entirely satisfactory, in some cases outperforming those achieved with more advanced techniques.

In turn, Hybrid Feature Extraction combined with Deep Learning (DL) represents the next evolutionary step, marking a transition from classical methods to end-to-end approaches. This is achieved by integrating classical feature extraction techniques (such as FFT) with other developed methods (e.g., EMD, WT), followed by analysis and classification via DL. Although these methods offer improvements in both capacity and performance, they also entail a considerable increase in computational demands and, consequently, resource requirements, necessitating careful operational supervision. Overall, they can be regarded as incremental refinements in their evolution. It has been observed in the reviewed manuscripts that this methodology is extensively employed, owing to the operational flexibility it offers and the large volumes of data it can process. However, some studies have also highlighted the substantial computational resources required and the significantly longer operational time needed to complete the process. At the same time, these studies indicate that research is expanding and data acquisition methods are being progressively improved.

The next step in the evolutionary trajectory is the end-to-end methodology, which in this manuscript encompasses the Deep Learning (DL) group. Based on the articles analysed, this approach is currently in a phase of rapid expansion. It represents a methodological advancement that benefits from progress in computational capabilities, although it has been noted that it generally entails a substantial computational load. At the same time, it does not require data preselection or supervision during operation. This methodology is expected to undergo significant advancements and improvements, and it is anticipated that quantum computers, once accessible, will provide a genuine boost to the approach, potentially reducing or eliminating the limitations currently encountered.

Finally, digital twins, a methodology currently in its initial stages and with a highly promising future, undoubtedly represent a systemic methodological advance. By integrating sensors, simulations, and real-time analytics, in addition to replicated physical models, they consolidate all previous paradigms and offer a significant and promising future across a wide range of applications. In particular, for the characterisation of railway rolling stock, digital twins present the major advantage of combining all previous methods; however, they will also require highly rigorous working procedures to avoid biases in data collection, as well as reliable and robust information-gathering methods. Despite rapid advances in computing, it is estimated that some time will be needed before these systems can operate at full efficiency. Therefore, in the medium term, end-to-end and hybrid feature extraction combined with deep learning are expected to benefit from further development.

Author Contributions

Conceptualization, E.J., C.P.-C., H.R. and A.B.; methodology, E.J. and C.P.-C.; software, C.P.-C.; validation, H.R. and A.B.; formal analysis, E.J.; investigation, E.J. and C.P.-C.; resources, H.R. and A.B.; data curation, A.B.; writing—original draft preparation, E.J.; writing—review and editing, E.J., C.P.-C., H.R. and A.B.; visualization, E.J.; supervision, H.R. and A.B.; project administration, H.R.; funding acquisition, E.J. All authors have read and agreed to the published version of the manuscript.

Funding

This publication is part of the R&D&I project PID2024-160821OB-I00, funded by MICIU/AEI/10.13039/501100011033/.

Data Availability Statement

No new data was created.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript; only the most widely used ones are listed:

1D-CNN	One-Dimensional CNN
AI	Artificial Intelligence
AL	Active Learning
ANC	Active Noise Control
ANFIS	Adaptive Neuro-Fuzzy Inference System
ANN	Artificial Neural Network
CM	Condition Monitoring
CNN	Convolutional Neural Network
CNN-T	CNN–Transformer
CWT	Continuous Wavelet Transform
DL	Deep Learning
DT	Decision Tree
DWCResNet	Discrete Wavelet Convolutional Residual Neural Network
DWT	Discrete Wavelet Transform
EDA	Exploratory Data Analysis
EFB	Envelope Frequency Band
EMD	Empirical Mode Decomposition
FD	Fault Diagnosis
FFT	Fast Fourier Transform
FFW	Feedforward Neural Network
FSWT	Frequency Slice Wavelet Transform
GA	Genetic Algorithm
GMW	Generalised Morse Wavelet
HHT	Hilbert–Huang Transform
IMF	Intrinsic Mode Functions
IML	Interactive Machine Learning
IR	Inner Race
IRC	Internal Radial Clearance
k-NN	K-Nearest Neighbours
LSTM	Long Short-Term Memory
ML	Machine Learning
MLP	Multi-Layer Perceptron
MT	Machine Teaching
OR	Outer Race
PIResNet	Physics-Informed Residual Network
ResNet	Residual Neural Network
RexNet	Recurrent Expansion Network
RF	Random Forest
RUL	Remaining Useful Life
SSE	Shannon Spectral Entropy
STFT	Short-Time Fourier Transform
SVM	Support Vector Machines
TL	Transfer Learning
WPT	Wavelet Packet Transform
WST	Wavelet Scattering Transform
WT	Wavelet Transform
XGBoost	Extreme Gradient Boosting

References

McCarthy, J.; Minsky, M.L.; Shannon, C.E. A proposal for the Dartmouth summer research project on artificial intelligence—August 31, 1955. AI Mag. 2006, 27, 12–14. [Google Scholar]
Russell, S.J.; Norvig, P. Artificial Intelligence: A Modern Approach, 1st ed.; Prentice Hall: Boston, MA, USA, 1995. [Google Scholar]
Lei, Y.; Yang, B.; Jiang, X.; Jia, F.; Li, N.; Nandi, A.K. Applications of machine learning to machine fault diagnosis: A review and roadmap. Mech. Syst. Signal Process. 2020, 138, 106587. [Google Scholar] [CrossRef]
Vora, L.K.; Gholap, A.D.; Jetha, K.; Thakur, R.R.S.; Solanki, H.K.; Chavda, V.P. Artificial Intelligence in Pharmaceutical Technology and Drug Delivery Design. Pharmaceutics 2023, 15, 1916. [Google Scholar] [CrossRef]
Tsai, C.W.; Lin, M.-L.; Tung, J.Y. Spatial and temporal evolution of heatwaves in Taiwan in a changing climate using multi-dimensional complementary ensemble empirical mode decomposition. Ecol. Inform. 2024, 81, 102585. [Google Scholar] [CrossRef]
Talaei Khoei, T.; Kaabouch, N. Machine Learning: Models, Challenges, and Research Directions. Future Internet 2023, 15, 332. [Google Scholar] [CrossRef]
Torres Quevedo, L. Ensayos sobre Automática. Su definición. Extensión teórica de sus aplicaciones. In Revista de la Real Academia de Ciencias Exactas, Físicas y Naturales, XII; Spring: Berlin/Heidelberg, Germany, 1914; pp. 391–419. [Google Scholar]
Turing, A.M. On Computable Numbers, with an Application to the Entscheidungsproblem. Proc. Lond. Math. Soc. 1936, s2-42, 230–265. [Google Scholar]
McCulloch, W.S.; Pitts, W. A Logical Calculus of the Ideas Immanent in Nervous Activity. Bull. Math. Biophys. 1943, 5, 115–133. [Google Scholar] [CrossRef]
von Neumann, J. First Draft of a Report on the EDVAC; Moore School of Electrical Engineering, University of Pennsylvania: Philadelphia, PA, USA, 1945. [Google Scholar]
von Neumann, J. The General and Logical Theory of Automata. In Cerebral Mechanisms in Behavior: The Hixon Symposium; Jeffress, L.A., Ed.; John Wiley & Sons: Hoboken, NJ, USA, 1951; pp. 1–41. [Google Scholar]
Samuel, A.L. Some Studies in Machine Learning Using the Game of Checkers. IBM J. Res. Dev. 1959, 3, 210–229. [Google Scholar] [CrossRef]
Feynman, R.P. Simulating Physics with Computers. Int. J. Theor. Phys. 1982, 21, 467–488. [Google Scholar] [CrossRef]
Holmberg, L.; Davidsson, P.; Linde, P. A Feature Space Focus in Machine Teaching. In Proceedings of the 2020 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), Austin, TX, USA, 23–27 March 2020; pp. 1–2. [Google Scholar] [CrossRef]
Mosqueira-Rey, E.; Hernández-Pereira, E.; Alonso-Ríos, D.; Bobes-Bascarán, J.; Fernández-Leal, Á. Human-in-the-loop machine learning: A state of the art. Artif. Intell. Rev. 2023, 56, 3005–3054. [Google Scholar] [CrossRef]
Janssens, D.; Wets, G.; Brijs, T.; Vanhoof, K.; Arentze, T.; Timmermans, H. Integrating Bayesian networks and decision trees in a sequential rule-based transportation model. Eur. J. Oper. Res. 2006, 175, 16–34. [Google Scholar] [CrossRef]
Zhong, X.; Ban, H. Crack fault diagnosis of rotating machine in nuclear power plant based on ensemble learning. Ann. Nucl. Energy 2022, 168, 108909. [Google Scholar] [CrossRef]
Liang, C.; Yang, Z.; Zhu, L.; Yang, Y. Co-Learning Meets Stitch-Up for Noisy Multi-Label Visual Recognition. IEEE Trans. Image Process. 2023, 32, 2508–2519. Available online: https://ieeexplore.ieee.org/document/10112637 (accessed on 23 November 2023). [CrossRef]
Prati, E. Quantum neuromorphic hardware for quantum artificial intelligence. J. Phys. Conf. Ser. 2017, 880, 012018. [Google Scholar] [CrossRef]
Capra, M.; Bussolino, B.; Marchisio, A.; Shafique, M.; Masera, G.; Martina, M. An Updated Survey of Efficient Hardware Architectures for Accelerating Deep Convolutional Neural Networks. Future Internet 2020, 12, 113. [Google Scholar] [CrossRef]
Dhilleswararao, P.; Boppu, S.; Manikandan, M.S.; Cenkeramaddi, L.R. Efficient Hardware Architectures for Accelerating Deep Neural Networks: Survey. IEEE Access 2022, 10, 131788–131828. [Google Scholar] [CrossRef]
Zhang, J.M.; Harman, M.; Ma, L.; Liu, Y. Machine Learning Testing: Survey, Landscapes and Horizons. IEEE Trans. Softw. Eng. 2022, 48, 1–36. [Google Scholar] [CrossRef]
Su, F.; Liu, C.; Stratigopoulos, H.-G. Testability and Dependability of AI Hardware: Survey, Trends, Challenges, and Perspectives. IEEE Des. Test. 2023, 40, 8–58. [Google Scholar] [CrossRef]
Tufail, S.; Riggs, H.; Tariq, M.; Sarwat, A.I. Advancements and Challenges in Machine Learning: A Comprehensive Review of Models, Libraries, Applications, and Algorithms. Electronics 2023, 12, 1789. [Google Scholar] [CrossRef]
Braun, S. Discover Signal Processing: An Interactive Guide for Engineers/Simon Braun; Wiley: Chichester, UK; Hoboken, NJ, USA, 2008. [Google Scholar]
McFadden, P.D.; Smith, J.D. Model for the vibration produced by a single point defect in a rolling element bearing. J. Sound Vib. 1984, 96, 69–82. [Google Scholar] [CrossRef]
Vives, J. Incorporating Machine Learning into Vibration Detection for Wind Turbines. Model. Simul. Eng. 2022, 2022, 6572298. [Google Scholar] [CrossRef]
Randall, R.B.; Antoni, J. Rolling element bearing diagnostics—A tutorial. Mech. Syst. Signal Process. 2011, 25, 485–520. [Google Scholar] [CrossRef]
Antoni, J.; Randall, R.B. Differential Diagnosis of Gear and Bearing Faults. J. Vib. Acoust. 2002, 124, 165–171. [Google Scholar] [CrossRef]
Antoni, J.; Randall, R.B. A stochastic model for simulation and diagnostics of rolling element bearings with localized faults. J. Vib. Acoust. Trans. 2003, 125, 282–289. [Google Scholar] [CrossRef]
Entezami, M.; Roberts, C.; Weston, P.; Stewart, E.; Amini, A.; Papaelias, M. Perspectives on railway axle bearing condition monitoring. Proc. Inst. Mech. Eng. Part F J. Rail Rapid Transit 2019, 234, 17–31. [Google Scholar] [CrossRef]
Bustos, A.; Rubio, H.; Castejon, C.; Garcia-Prada, J.C. Condition monitoring of critical mechanical elements through Graphical Representation of State Configurations and Chromogram of Bands of Frequency. Measurement 2019, 135, 71–82. [Google Scholar] [CrossRef]
Guo, L.; Gao, H.; Huang, H.; He, X.; Li, S. Multifeatures Fusion and Nonlinear Dimension Reduction for Intelligent Bearing Condition Monitoring. Shock Vib. 2016, 2016, 4632562. [Google Scholar] [CrossRef]
Guo, J.; Wang, Z.; Li, H.; Yang, Y.; Huang, C.-G.; Yazdi, M.; Kang, H.S. A hybrid prognosis scheme for rolling bearings based on a novel health indicator and nonlinear Wiener process. Reliab. Eng. Syst. Saf. 2024, 245, 110014. [Google Scholar] [CrossRef]
Luo, H.; Bo, L.; Peng, C.; Hou, D. Fault Diagnosis for High-Speed Train Axle-Box Bearing Using Simplified Shallow Information Fusion Convolutional Neural Network. Sensors 2020, 20, 4930. [Google Scholar] [CrossRef]
Borghesani, P.; Smith, W.A.; Randall, R.B.; Antoni, J.; El Badaoui, M.; Peng, Z. Bearing signal models and their effect on bearing diagnostics. Mech. Syst. Signal Process. 2022, 174, 109077. [Google Scholar] [CrossRef]
Cao, P.; Zhang, S.; Tang, J. Preprocessing-Free Gear Fault Diagnosis Using Small Datasets with Deep Convolutional Neural Network-Based Transfer Learning. IEEE Access 2018, 6, 26241–26253. [Google Scholar] [CrossRef]
Junquera, E.; Rubio, H.; Bustos, A. Determination of the Condition of Railway Rolling Stock Using Automatic Classifiers. Electronics 2025, 14, 3006. [Google Scholar] [CrossRef]
Huang, N.E.; Shen, Z.; Long, S.R.; Wu, M.C.; Shih, H.H.; Zheng, Q.; Yen, N.-C.; Tung, C.C.; Liu, H.H. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. London Ser. A Math. Phys. Eng. Sci. 1998, 454, 903–995. [Google Scholar] [CrossRef]
Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
Gul, S.T.; Imran, M.; Khan, A.Q. An online incremental support vector machine for fault diagnosis using vibration signature analysis. In Proceedings of the 2018 IEEE International Conference on Industrial Technology (ICIT), Lyon, France, 20–22 February 2018; pp. 1467–1472. [Google Scholar] [CrossRef]
Ranawat, N.S.; Kankar, P.K.; Miglani, A. Fault diagnosis in centrifugal pump using support vector machine and artificial neural network. J. Eng. Res. 2021, 9, 99–111. [Google Scholar] [CrossRef]
Zhang, W. Learning Distance Metric for Support Vector Machine: A Multiple Kernel Learning Approach. Neural Process. Lett. 2019, 50, 2899–2923. [Google Scholar] [CrossRef]
Nakayama, Y.; Yata, K.; Aoshima, M. Support vector machine and its bias correction in high-dimension, low-sample-size settings. J. Stat. Plan. Inference 2017, 191, 88–100. [Google Scholar] [CrossRef]
Artemiou, A.; Dong, Y. Sufficient dimension reduction via principal Lq support vector machine. Electron. J. Stat. 2016, 10, 783–805. [Google Scholar] [CrossRef]
Qiao, X.; Zhang, L. Flexible High-dimensional Classification Machines and Their Asymptotic Properties. arXiv 2013, arXiv:1310.3004. [Google Scholar] [CrossRef]
Safavian, S.R.; Landgrebe, D. A survey of decision tree classifier methodology. IEEE Trans. Syst. Man Cybern. 1991, 21, 660–674. [Google Scholar] [CrossRef]
Quinlan, J.R. Simplifying decision trees. Int. J. Man-Mach. Stud. 1987, 27, 221–234. [Google Scholar] [CrossRef]
Bakirli, G.; Birant, D. DTreeSim: A new approach to compute decision tree similarity using re-mining. Turk. J. Electr. Eng. Comput. Sci. 2017, 25, 108–125. [Google Scholar] [CrossRef]
Andre, A.B.; Beltrame, E.; Wainer, J. A Combination of Support Vector Machine and K-Nearest Neighbors for Machine Fault Detection. Appl. Artif. Intell. 2013, 27, 36–49. [Google Scholar] [CrossRef]
Chui, C.K.; Li, X. Generalized wavelet decompositions of bivariate functions. Proc. Am. Math. Soc. 1994, 121, 125–131. [Google Scholar] [CrossRef]
Strang, G. Wavelet transforms versus Fourier transforms. Bull. Am. Math. Soc. 1993, 28, 288–305. [Google Scholar] [CrossRef]
Bhavsar, K.; Vakharia, V.; Chaudhari, R.; Vora, J.; Pimenov, D.Y.; Giasin, K. A Comparative Study to Predict Bearing Degradation Using Discrete Wavelet Transform (DWT), Tabular Generative Adversarial Networks (TGAN) and Machine Learning Models. Machines 2022, 10, 176. [Google Scholar] [CrossRef]
Huo, Z.; Zhang, Y.; Francq, P.; Shu, L.; Huang, J. Incipient Fault Diagnosis of Roller Bearing Using Optimized Wavelet Transform Based Multi-Speed Vibration Signatures. IEEE Access 2017, 5, 19442–19456. [Google Scholar] [CrossRef]
Saravanan, N.; Ramachandran, K.I. Fault diagnosis of spur bevel gear box using discrete wavelet features and Decision Tree classification. Expert Syst. Appl. 2009, 36, 9564–9573. [Google Scholar] [CrossRef]
Liao, M.; Liu, C.; Wang, C.; Yang, J. Research on a Rolling Bearing Fault Detection Method with Wavelet Convolution Deep Transfer Learning. IEEE Access 2021, 9, 45175–45188. [Google Scholar] [CrossRef]
Zeiler, A.; Faltermeier, R.; Keck, I.R.; Tomé, A.M.; Puntonet, C.G.; Lang, E.W. Empirical Mode Decomposition—An introduction. In Proceedings of the 2010 International Joint Conference on Neural Networks (IJCNN), Barcelona, Spain, 18–23 July 2010; pp. 1–8. [Google Scholar] [CrossRef]
Bustos, A.; Rubio, H.; Castejón, C.; García-Prada, J.C. EMD-Based Methodology for the Identification of a High-Speed Train Running in a Gear Operating State. Sensors 2018, 18, 793. [Google Scholar] [CrossRef]
Wang, C.; Liu, C.; Liao, M.; Yang, Q. An enhanced diagnosis method for weak fault features of bearing acoustic emission signal based on compressed sensing. Math. Biosci. Eng. 2021, 18, 1670–1688. [Google Scholar] [CrossRef]
Puliafito, V.; Vergura, S.; Carpentieri, M. Fourier, Wavelet, and Hilbert-Huang Transforms for Studying Electrical Users in the Time and Frequency Domain. Energies 2017, 10, 188. [Google Scholar] [CrossRef]
Zhang, C.; Fu, S.; Ou, B.; Liu, Z.; Hu, M. Prediction of Dam Deformation Using SSA-LSTM Model Based on Empirical Mode Decomposition Method and Wavelet Threshold Noise Reduction. Water 2022, 14, 3380. [Google Scholar] [CrossRef]
Le Cun, Y. Generalization and Network Design Strategies. In Connectionism in Perspective: Proceedings of the International Conference Connectionism in Perspective; University of Zurich: Zurich, Switzerland, 1989. [Google Scholar]
Guresen, E.; Kayakutlu, G. Definition of artificial neural networks with comparison to other networks. Procedia Comput. Sci. 2011, 3, 426–433. [Google Scholar] [CrossRef]
Fiesler, E. Neural network classification and formalization. Comput. Stand. Interfaces 1994, 16, 231–239. [Google Scholar] [CrossRef]
Rahmani, A.M.; Azhir, E.; Ali, S.; Mohammadi, M.; Ahmed, O.H.; Ghafour, M.Y.; Ahmed, S.H.; Hosseinzadeh, M. Artificial intelligence approaches and mechanisms for big data analytics: A systematic study. PeerJ Comput. Sci. 2021, 7, e488. [Google Scholar] [CrossRef]
Zhao, X.; Wang, L.; Zhang, Y.; Han, X.; Deveci, M.; Parmar, M. A review of convolutional neural networks in computer vision. Artif. Intell. Rev. 2024, 57, 99. [Google Scholar] [CrossRef]
Ma, S.; Cai, W.; Liu, W.; Shang, Z.; Liu, G. A Lighted Deep Convolutional Neural Network Based Fault Diagnosis of Rotating Machinery. Sensors 2019, 19, 2381. [Google Scholar] [CrossRef]
Afrasiabi, S.; Mohammadi, M.; Afrasiabi, M.; Parang, B. Modulated Gabor filter based deep convolutional network for electrical motor bearing fault classification and diagnosis. IET Sci. Meas. Technol. 2021, 15, 154–162. [Google Scholar] [CrossRef]
Klaar, A.C.R.; Seman, L.O.; Mariani, V.C.; Coelho, L.d.S. Random Convolutional Kernel Transform with Empirical Mode Decomposition for Classification of Insulators from Power Grid. Sensors 2024, 24, 1113. [Google Scholar] [CrossRef] [PubMed]
Iman, M.; Arabnia, H.R.; Rasheed, K. A Review of Deep Transfer Learning and Recent Advancements. Technologies 2023, 11, 40. [Google Scholar] [CrossRef]
Zhang, R.; Tao, H.; Wu, L.; Guan, Y. Transfer Learning with Neural Networks for Bearing Fault Diagnosis in Changing Working Conditions. IEEE Access 2017, 5, 14347–14357. [Google Scholar] [CrossRef]
Gafni, T.; Shlezinger, N.; Cohen, K.; Eldar, Y.C.; Poor, H.V. Federated Learning: A signal processing perspective. IEEE Signal Process. Mag. 2022, 39, 14–41. [Google Scholar] [CrossRef]
Isermann, R. Fault-Diagnosis Systems; Springer: Berlin/Heidelberg, Germany, 2006. [Google Scholar] [CrossRef]
Cerrada, M.; Sánchez, R.V.; Cabrera, D.; Zurita, G.; Li, C. Multi-Stage Feature Selection by Using Genetic Algorithms for Fault Diagnosis in Gearboxes Based on Vibration Signal. Sensors 2015, 15, 23903–23926. [Google Scholar] [CrossRef]
Mobley, R.K. An Introduction to Predictive Maintenance, 2nd ed.; Butterworth-Heinemann: Amsterdam, The Netherlands; New York, NY, USA, 2002. [Google Scholar]
Jardine, A.K.S.; Lin, D.; Banjevic, D. A review on machinery diagnostics and prognostics implementing condition-based maintenance. Mech. Syst. Signal Process. 2006, 20, 1483–1510. [Google Scholar] [CrossRef]
Jardine, A.K.S.; Tsang, A.H.C. Maintenance, Replacement, and Reliability: Theory and Applications, 2nd ed.; CRC Press-Taylor & Francis Group: Boca Raton, FL, USA, 2013; pp. 1–330. [Google Scholar]
Aria, M.; Cuccurullo, C. Bibliometrix: An R-tool for comprehensive science mapping analysis. J. Informetr. 2017, 11, 959–975. [Google Scholar] [CrossRef]
Moher, D.; Liberati, A.; Tetzlaff, J.; Altman, D.G. Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. Int. J. Surg. 2010, 8, 336–341. [Google Scholar] [CrossRef]
Ni, Y.; Li, S.; Guo, P. Discrete wavelet integrated convolutional residual network for bearing fault diagnosis under noise and variable operating conditions. Sci. Rep. 2025, 15, 16185. [Google Scholar] [CrossRef]
Xu, L.-M.; Wong, P.K.; Gao, Z.-J.; Yang, Z.-X.; Zhao, J.; Wang, X.-B. An Attention-Driven Multi-Scale Framework for Rotating-Machinery Fault Diagnosis Under Noisy Conditions. Electronics 2025, 14, 3805. [Google Scholar] [CrossRef]
Almutairi, K.; Wen, H.; Sinha, J.K. Standardisation of Vibration-based Parameters for Rotor and Bearing for Machine Faults Detection Using Machine Learning Model. J. Vib. Eng. Technol. 2025, 13, 504. [Google Scholar] [CrossRef]
Li, S.; Gong, Z.; Wang, S.; Meng, W.; Jiang, W. Fault Diagnosis Method for Rolling Bearings Based on a Digital Twin and WSET-CNN Feature Extraction with IPOA-LSSVM. Processes 2025, 13, 2779. [Google Scholar] [CrossRef]
Atmaji, F.T.D.; Jamasri; Yuniarto, H.A.; Made Miasa, I. Experimental investigation of shaft misalignment effects on bearing reliability through vibration signal analysis using machine learning and deep learning. Results Eng. 2025, 27, 106754. [Google Scholar] [CrossRef]
Nguyen, T.-D.; Nguyen, T.-H.; Do, D.-T.-B.; Pham, T.-H.; Liang, J.-W.; Nguyen, P.-D. Efficient and Explainable Bearing Condition Monitoring with Decision Tree-Based Feature Learning. Machines 2025, 13, 467. [Google Scholar] [CrossRef]
Imam, S.A.; Lim, M.H.; Abdelrhman, A.M.; Ahmad, I.; Leong, M.S. Enhanced Blade Fault Diagnosis Using Hybrid Deep Learning: A Comparative Analysis of Traditional Machine Learning and 1D Convolutional Transformer Architecture. Eng. Rep. 2025, 7, e70202. [Google Scholar] [CrossRef]
Gong, S.; Kim, T.; Jeong, J. SPT-AD: Self-Supervised Pyramidal Transformer Network-Based Anomaly Detection of Time Series Vibration Data. Appl. Sci. 2025, 15, 5185. [Google Scholar] [CrossRef]
Jabbar, A.; Cocconcelli, M.; D’Elia, G.; Borghi, D.; Capelli, L.; Molano, J.C.C.; Strozzi, M.; Rubini, R. MOIRA-UNIMORE Bearing Data Set for Independent Cart Systems. Appl. Sci. 2025, 15, 3691. [Google Scholar] [CrossRef]
Diversi, R.; Lenzi, A.; Speciale, N.; Barbieri, M. An Autoregressive-Based Motor Current Signature Analysis Approach for Fault Diagnosis of Electric Motor-Driven Mechanisms. Sensors 2025, 25, 1130. [Google Scholar] [CrossRef]
Janjarasjitt, S. Investigating the Effect of Vibration Signal Length on Bearing Fault Classification Using Wavelet Scattering Transform. Sensors 2025, 25, 699. [Google Scholar] [CrossRef]
Rong, Z.; Lee, J. An interpretable transfer learning method for bearing diagnosis across different systems, faults, and signal types. Struct. Health Monit. 2025, 14759217251363600. [Google Scholar] [CrossRef]
Hussain, R.; Alshaikh Saleh, M.; Refaat, S.S. Various Faults Classification of Industrial Application of Induction Motors Using Supervised Machine Learning: A Comprehensive Review. IEEE Access 2025, 13, 146649–146675. [Google Scholar] [CrossRef]
Nguyen, H.-A.-H.; Kim, C.H. Efficient Bearing Fault Diagnosis for Edge Computing Using Grayscale Spectrograms and Hybrid Neural Model Compression. IEEE Access 2025, 13, 147494–147510. [Google Scholar] [CrossRef]
Berghout, T.; Bechhoefer, E.; Djeffal, F.; Lim, W.H. Integrating Learning-Driven Model Behavior and Data Representation for Enhanced Remaining Useful Life Prediction in Rotating Machinery. Machines 2024, 12, 729. [Google Scholar] [CrossRef]
Soomro, A.A.; Muhammad, M.B.; Mokhtar, A.A.; Md Saad, M.H.; Lashari, N.; Hussain, M.; Sarwar, U.; Palli, A.S. Insights into modern machine learning approaches for bearing fault classification: A systematic literature review. Results Eng. 2024, 23, 102700. [Google Scholar] [CrossRef]
Sánchez, R.-V.; Macancela, J.C.; Ortega, L.-R.; Cabrera, D.; García Márquez, F.P.; Cerrada, M. Evaluation of Hand-Crafted Feature Extraction for Fault Diagnosis in Rotating Machinery: A Survey. Sensors 2024, 24, 5400. [Google Scholar] [CrossRef]
Gruber, H.; Fuchs, A.; Bader, M. Evaluation of a Condition Monitoring Algorithm for Early Bearing Fault Detection. Sensors 2024, 24, 2138. [Google Scholar] [CrossRef]
Sharma, A.; Patra, G.K.; Naidu, V.P.S. Machine Learning-based Bearing Fault Classification Using Higher Order Spectral Analysis. Def. Sci. J. 2024, 74, 505–516. [Google Scholar] [CrossRef]
Ferraz Júnior, F.; Romero, R.A.F.; Hsieh, S.-J. Machine Learning for the Detection and Diagnosis of Anomalies in Applications Driven by Electric Motors. Sensors 2023, 23, 9725. [Google Scholar] [CrossRef]
Ni, Q.; Ji, J.C.; Halkon, B.; Feng, K.; Nandi, A.K. Physics-Informed Residual Network (PIResNet) for rolling element bearing fault diagnostics. Mech. Syst. Signal Process. 2023, 200, 110544. [Google Scholar] [CrossRef]
Kumar, P.; Khalid, S.; Kim, H.S. Prognostics and Health Management of Rotating Machinery of Industrial Robot with Deep Learning Applications—A Review. Mathematics 2023, 11, 3008. [Google Scholar] [CrossRef]
Tu, Y.; Inoue, T.; Yabui, S.; Katayama, K.; Tomimatsu, S. Hybrid feature selection method for SVM classification and its application for fault diagnosis of wear and peeling in journal bearing with a little muddy water using long-term real data. J. Low Freq. Noise Vib. Act. Control 2023, 42, 231–252. [Google Scholar] [CrossRef]
Wu, Q.; Zhu, Z.; Tang, J.; Xia, Y. Fault diagnosis of printing press bearing based on deformable convolution residual neural network. Netw. Heterog. Media 2023, 18, 622–646. [Google Scholar] [CrossRef]
Brusa, E.; Delprete, C.; Di Maggio, L.G. Eigen-spectrograms: An interpretable feature space for bearing fault diagnosis based on artificial intelligence and image processing. Mech. Adv. Mater. Struct. 2023, 30, 4639–4651. [Google Scholar] [CrossRef]
Kumar, R.R.; Andriollo, M.; Cirrincione, G.; Cirrincione, M.; Tortella, A. A Comprehensive Review of Conventional and Intelligence-Based Approaches for the Fault Diagnosis and Condition Monitoring of Induction Motors. Energies 2022, 15, 8938. [Google Scholar] [CrossRef]
Mey, O.; Neufeld, D. Explainable AI Algorithms for Vibration Data-Based Fault Detection: Use Case-Adapted Methods and Critical Evaluation. Sensors 2022, 22, 9037. [Google Scholar] [CrossRef]
Sinitsin, V.; Ibryaeva, O.; Sakovskaya, V.; Eremeeva, V. Intelligent bearing fault diagnosis method combining mixed input and hybrid CNN-MLP model. Mech. Syst. Signal Process. 2022, 180, 109454. [Google Scholar] [CrossRef]
Rajabi, S.; Saman Azari, M.; Santini, S.; Flammini, F. Fault diagnosis in industrial rotating equipment based on permutation entropy, signal processing and multi-output neuro-fuzzy classifier. Expert Syst. Appl. 2022, 206, 117754. [Google Scholar] [CrossRef]
Toma, R.N.; Gao, Y.; Piltan, F.; Im, K.; Shon, D.; Yoon, T.H.; Yoo, D.-S.; Kim, J.-M. Classification Framework of the Bearing Faults of an Induction Motor Using Wavelet Scattering Transform-Based Features. Sensors 2022, 22, 8958. [Google Scholar] [CrossRef]
Cascales-Fulgencio, D.; Quiles-Cucarella, E.; García-Moreno, E. Computation and Statistical Analysis of Bearings’ Time- and Frequency-Domain Features Enhanced Using Cepstrum Pre-Whitening: A ML- and DL-Based Classification. Appl. Sci. 2022, 12, 10882. [Google Scholar] [CrossRef]
Vives, J.; Palací, J. Artificial Intelligence and 3D Scanning Laser Combination for Supervision and Fault Diagnostics. Sensors 2022, 22, 7649. [Google Scholar] [CrossRef]
Ravikumar, K.N.; Madhusudana, C.K.; Kumar, H.; Gangadharan, K.V. Classification of gear faults in internal combustion (IC) engine gearbox using discrete wavelet transform features and K star algorithm. Eng. Sci. Technol. Int. J. 2022, 30, 101048. [Google Scholar] [CrossRef]
Ma, J.; Li, S.; Wang, X. Condition Monitoring of Rolling Bearing Based on Multi-Order FRFT and SSA-DBN. Symmetry 2022, 14, 320. [Google Scholar] [CrossRef]
Civera, M.; Surace, C. An Application of Instantaneous Spectral Entropy for the Condition Monitoring of Wind Turbines. Appl. Sci. 2022, 12, 1059. [Google Scholar] [CrossRef]
Ahmed, H.O.A.; Nandi, A.K. Connected Components-based Colour Image Representations of Vibrations for a Two-stage Fault Diagnosis of Roller Bearings Using Convolutional Neural Networks. Chin. J. Mech. Eng. 2021, 34, 37. [Google Scholar] [CrossRef]
Kumar, D.; Kalwar, I.; Hussain, T.; Chowdhry, B.; Ujjan, S.; Memon, T. A Novel Method Based on UNET for Bearing Fault Diagnosis. Comput. Mater. Contin. 2021, 69, 393–408. [Google Scholar] [CrossRef]
Ambrożkiewicz, B.; Syta, A.; Meier, N.; Litak, G.; Georgiadis, A. Radial internal clearance analysis in ball bearings. Eksploat. Niezawodn.—Maint. Reliab. 2021, 23, 42–54. [Google Scholar] [CrossRef]
Ranjbar, A.; Suratgar, A.; Ghidary, S.; Milimonfared, J. Condition Monitoring of an Industrial Oil Pump Using a Learning Based Technique. Sound Vib. 2020, 54, 257–267. [Google Scholar] [CrossRef]
Chen, S.; Meng, Y.; Tang, H.; Tian, Y.; He, N.; Shao, C. Robust Deep Learning-Based Diagnosis of Mixed Faults in Rotating Machinery. IEEE/ASME Trans. Mechatron. 2020, 25, 2167–2176. [Google Scholar] [CrossRef]
Zimnickas, T.; Vanagas, J.; Dambrauskas, K.; Kalvaitis, A. A Technique for Frequency Converter-Fed Asynchronous Motor Vibration Monitoring and Fault Classification, Applying Continuous Wavelet Transform and Convolutional Neural Networks. Energies 2020, 13, 3690. [Google Scholar] [CrossRef]
Ma, S.; Liu, W.; Cai, W.; Shang, Z.; Liu, G. Lightweight Deep Residual CNN for Fault Diagnosis of Rotating Machinery Based on Depthwise Separable Convolutions. IEEE Access 2019, 7, 57023–57036. [Google Scholar] [CrossRef]
Sahoo, S.; Das, J.K.; Debnath, B. Rolling Element Bearing Condition Monitoring using Filtered Acoustic Emission. Int. J. Electr. Comput. Eng. 2018, 8, 3560–3567. [Google Scholar] [CrossRef]
Deák, K.; Mankovits, T.; Kocsis, I. Optimal Wavelet Selection for the Size Estimation of Manufacturing Defects of Tapered Roller Bearings with Vibration Measurement using Shannon Entropy Criteria. Stroj. Vestn.—J. Mech. Eng. 2017, 63, 3–14. [Google Scholar] [CrossRef]
Gómez, M.J.; Castejón, C.; Corral, E.; García-Prada, J.C. Analysis of the influence of crack location for diagnosis in rotating shafts based on 3 x energy. Mech. Mach. Theory 2016, 103, 167–173. [Google Scholar] [CrossRef]
Antoniadou, I.; Dervilis, N.; Papatheou, P.; Maguire, A.E.; Worden, K. Aspects of structural health and condition monitoring of offshore wind turbines. Philos. Trans. R. Soc. A 2015, 373, 20140075. Available online: https://royalsocietypublishing.org/doi/epdf/10.1098/rsta.2014.0075?src=getftr&utm_source=scopus&getft_integrator=scopus (accessed on 5 October 2025). [CrossRef]
Zaparoli Cunha, B.; Droz, C.; Zine, A.-M.; Foulard, S.; Ichchou, M. A review of machine learning methods applied to structural dynamics and vibroacoustic. Mech. Syst. Signal Process. 2023, 200, 110535. [Google Scholar] [CrossRef]
Parziale, M.; Lomazzi, L.; Giglio, M.; Cadini, F. Physics-Informed Neural Networks for the Condition Monitoring of Rotating Shafts. Sensors 2023, 24, 207. [Google Scholar] [CrossRef]
Chen, H.; Yu, Y.; Li, P. Transformer-Based Denoising of Mechanical Vibration Signals. arXiv 2023, arXiv:2308.02166. [Google Scholar] [CrossRef]
Nele, L.; Mattera, G.; Yap, E.W.; Vozza, M.; Vespoli, S. Towards the application of machine learning in digital twin technology: A multi-scale review. Discov. Appl. Sci. 2024, 6, 502. [Google Scholar] [CrossRef]
Azanaw, G.M. Application of Digital Twin in Structural Health Monitoring of Civil Structures: A Systematic Literature Review Based on PRISMA. J. Mech. Constr. Eng. 2024, 4, 1–10. [Google Scholar] [CrossRef]

Figure 1. Scheme of Machine Learning within artificial intelligence.

Figure 2. Schematic representation of Deep Learning and Machine Learning as part of artificial intelligence.

Figure 3. PRISMA flow diagram.

Figure 4. Selected documents: graphical overview.

Figure 5. Co-occurrence network. Keyword co-occurrence analysis in Bibliometrix represents keywords as nodes connected by edges, where the thickness of each edge indicates the strength of co-occurrence between two keywords. The size of each node reflects the frequency of that keyword across the dataset, while colours are used to highlight clusters of closely related keywords.

Figure 6. Authors and themes network.

Table 1. Search results.

Description	Results
Timespan	2015:2025
Sources (Journals, Books, etc.)	50
Documents	91
article	86
review	5
Annual Growth Rate %	25.89
Document Average Age (years)	3.98
Average citations per document	28.41
References	843
Single-authored documents	2
Co-Authors per document	3.99
International co-authorships %	29.67

Table 2. Characteristics of the selected documents.

Description	Results
Timespan	2015:2025
Sources (Journals, Books, etc.)	32
Documents	51
article	47
review	4
Annual Growth Rate %	31.10
Document Average Age (years)	3.65
Average citations per document	30.33
References	478
Single-authored documents	1
Co-Authors per document	4.06
International co-authorships %	33.33

Table 3. List of selected documents: titles, methodologies, and research fields.

Ref.	Year	Title	FD	CM	WT	DL	ML
[80]	2025	“Discrete wavelet integrated convolutional residual network for bearing fault diagnosis under noise and variable operating conditions”	Yes		Yes
[81]	2025	“An Attention-Driven Multi-Scale Framework for Rotating-Machinery Fault Diagnosis Under Noisy Conditions”	Yes		(FE)	(C)
[82]	2025	“Standardisation of Vibration-based Parameters for Rotor and Bearing for Machine Faults Detection Using Machine Learning Model”	Yes	Yes		Yes
[83]	2025	“Fault Diagnosis Method for Rolling Bearings Based on a Digital Twin and WSET-CNN Feature Extraction with IPOA-LSSVM”	Yes	Yes	(FE)	(C)
[84]	2025	“Experimental investigation of shaft misalignment effects on bearing reliability through vibration signal analysis using machine learning and deep learning”	Yes			Yes	Yes
[85]	2025	“Efficient and Explainable Bearing Condition Monitoring with Decision Tree-Based Feature Learning”	Yes	Yes			Yes
[86]	2025	“Enhanced Blade Fault Diagnosis Using Hybrid Deep Learning: A Comparative Analysis of Traditional Machine Learning and 1D Convolutional Transformer Architecture”	Yes	Yes		Yes
[87]	2025	“SPT-AD: Self-Supervised Pyramidal Transformer Network-Based Anomaly Detection of Time Series Vibration Data”	Yes				Yes
[88]	2025	“MOIRA-UNIMORE Bearing Data Set for Independent Cart Systems”	Yes	Yes			Yes
[89]	2025	“An Autoregressive-Based Motor Current Signature Analysis Approach for Fault Diagnosis of Electric Motor-Driven Mechanisms”	Yes	Yes	Yes
[90]	2025	“Investigating the Effect of Vibration Signal Length on Bearing Fault Classification Using Wavelet Scattering Transform”		Yes	Yes
[91]	2025	“An interpretable transfer learning method for bearing diagnosis across different systems, faults, and signal types”	Yes		Yes
[92]	2025	“Various Faults Classification of Industrial Application of Induction Motors Using Supervised Machine Learning: A Comprehensive Review”	Yes				Yes
[93]	2025	“Efficient Bearing Fault Diagnosis for Edge Computing Using Grayscale Spectrograms and Hybrid Neural Model Compression”	Yes			Yes
[94]	2024	“Integrating Learning-Driven Model Behavior and Data Representation for Enhanced Remaining Useful Life Prediction in Rotating Machinery”		Yes		Yes
[95]	2024	“Insights into modern machine learning approaches for bearing fault classification: A systematic literature review”		Yes			Yes
[96]	2024	“Evaluation of Hand-Crafted Feature Extraction for Fault Diagnosis in Rotating Machinery: A Survey”	Yes				Yes
[97]	2024	“Evaluation of a Condition Monitoring Algorithm for Early Bearing Fault Detection”		Yes			Yes
[98]	2024	“Machine Learning-based Bearing Fault Classification Using Higher Order Spectral Analysis”		Yes			Yes
[99]	2023	“Machine Learning for the Detection and Diagnosis of Anomalies in Applications Driven by Electric Motors”		Yes			Yes
[100]	2023	“Physics-Informed Residual Network (PIResNet) for rolling element bearing fault diagnostics”	Yes			Yes
[101]	2023	“Prognostics and Health Management of Rotating Machinery of Industrial Robot with Deep Learning Applications—A Review”	Yes			Yes
[102]	2023	“Hybrid feature selection method for SVM classification and its application for fault diagnosis of wear and peeling in journal bearing with a little muddy water using long-term real data”	Yes				Yes
[103]	2023	“Fault diagnosis of printing press bearing based on deformable convolution residual neural network”	Yes		Yes
[104]	2023	“Eigen-spectrograms: An interpretable feature space for bearing fault diagnosis based on artificial intelligence and image processing”	Yes	Yes			Yes
[105]	2022	“A Comprehensive Review of Conventional and Intelligence-Based Approaches for the Fault Diagnosis and Condition Monitoring of Induction Motors”	Yes	Yes		Yes
[106]	2022	“Explainable AI Algorithms for Vibration Data-Based Fault Detection: Use Case-Adapted Methods and Critical Evaluation”	Yes	Yes		(C)	(FE)
[107]	2022	“Intelligent bearing fault diagnosis method combining mixed input and hybrid CNN-MLP model”		Yes		Yes
[108]	2022	“Fault diagnosis in industrial rotating equipment based on permutation entropy, signal processing and multi-output neuro-fuzzy classifier”	Yes		Yes
[109]	2022	“Classification Framework of the Bearing Faults of an Induction Motor Using Wavelet Scattering Transform-Based Features”	Yes		Yes
[110]	2022	“Computation and Statistical Analysis of Bearings’ Time- and Frequency-Domain Features Enhanced Using Cepstrum Pre-Whitening: A ML- and DL-Based Classification”		Yes		COM	COM
[111]	2022	“Artificial Intelligence and 3D Scanning Laser Combination for Supervision and Fault Diagnostics”	Yes	Yes		CP	CP
[112]	2022	“Classification of gear faults in internal combustion (IC) engine gearbox using discrete Wavelet Transform features and K star algorithm”	Yes		Yes
[53]	2022	“A Comparative Study to Predict Bearing Degradation Using Discrete Wavelet Transform (DWT), Tabular Generative Adversarial Networks (TGAN) and Machine Learning Models”		Yes	Yes
[113]	2022	“Condition Monitoring of Rolling Bearing Based on Multi-Order FRFT and SSA-DBN”		Yes			Yes
[114]	2022	“An Application of Instantaneous Spectral Entropy for the Condition Monitoring of Wind Turbines”		Yes	Yes
[115]	2021	“Connected Components-based Colour Image Representations of Vibrations for a Two-stage Fault Diagnosis of Roller Bearings Using Convolutional Neural Networks”	Yes			Yes
[42]	2021	“Fault diagnosis in centrifugal pump using support vector machine and artificial neural network”		Yes			Yes
[116]	2021	“A novel method based on UNET for bearing fault diagnosis”	Yes			Yes
[117]	2021	“Radial internal clearance analysis in ball bearings”		Yes	Yes
[118]	2020	“Condition monitoring of an industrial oil pump using a learning-based technique”		Yes	Yes
[119]	2020	“Robust Deep Learning-Based Diagnosis of Mixed Faults in Rotating Machinery”	Yes			Yes
[120]	2020	“A technique for frequency converter-fed asynchronous motor vibration monitoring and fault classification, applying continuous Wavelet Transform and convolutional neural networks”		Yes	(FE)	(C)
[67]	2019	“A lighted deep convolutional neural network-based fault diagnosis of rotating machinery”	Yes		(FE)	(C)
[121]	2019	“Lightweight Deep Residual CNN for Fault Diagnosis of Rotating Machinery Based on Depthwise Separable Convolutions”	Yes		(FE)	(C)
[122]	2018	“Rolling element bearing condition monitoring using filtered acoustic emission”	Yes		Yes
[54]	2017	“Incipient Fault Diagnosis of Roller Bearing Using Optimized Wavelet Transform Based Multi-Speed Vibration Signatures”	Yes		Yes		Yes
[123]	2017	“Optimal wavelet selection for the size estimation of manufacturing defects of tapered roller bearings with vibration measurement using Shannon Entropy Criteria”		Yes	Yes
[124]	2016	“Analysis of the influence of crack location for diagnosis in rotating shafts based on 3 x energy”		Yes	Yes
[125]	2015	“Aspects of structural health and condition monitoring of offshore wind turbines”		Yes			Yes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Junquera, E.; Pérez-Carrera, C.; Rubio, H.; Bustos, A. Advances, Trends and Challenges for Determining the Condition of Railway Rolling Stock Using Automatic Classifiers: A Systematic Review. Electronics 2026, 15, 1381. https://doi.org/10.3390/electronics15071381

