1. Introduction
Railway systems play a vital role in global mobility networks, moving both passengers and cargo efficiently. However, maintaining these systems’ safety and performance presents ongoing challenges, particularly regarding wheel-related issues. One significant problem is the development of flat spots on train wheels, which typically occur when wheels lock or skid during braking [
1,
2,
3]. These wheel flats create recurring impacts during wheel rotation, leading to increased vibration and noise in the railway system [
4].
Wheel flats pose serious safety risks in railway operations. They create concentrated stress points where wheels meet rails, causing faster track wear and possible structural damage. These defects can lead to derailments in severe cases, threatening passenger and cargo safety. Derailments are a significant problem in the operation of rail systems and can account for up to 40% of rail accidents in some countries (data shown in
Figure 1), according to a 2023 report by Eurostat [
5]. Additionally, wheel flats increase noise pollution in urban areas and reduce ride comfort through vibration and sound [
6,
7].
Another important negative effect of flat spots is energy loss. Flat spots cause significant periodic impact forces between the wheel and the rail, leading to forced vibrations. These vibrations represent dissipated energy, which, if not recovered through regenerative dampers [
8] or harvested [
9], is lost. In addition, flat spots affect the entire power supply and drive system of the rail vehicle, causing increased engine speed fluctuations, additional torque loads [
10], and damage to motor elements [
11]. Another aspect worth mentioning is the noise generated—it has been shown that rolling noise associated with vibrations is the main source of sound emitted by the vehicle [
12]. A flat spot that intensifies the wheel–rail interaction therefore increases noise, which conflicts with the aims of sustainable transport [
13].
Traditional wheel inspection methods rely on visual checks, which are time-consuming and prone to human error. In response, researchers have developed more advanced detection methods using vibration and acoustic measurements. These newer approaches offer continuous monitoring capabilities and improved reliability [
3,
14,
15].
These detection methods operate in two main ways. Vibration-based systems use accelerometers to measure changes in wheel movement patterns caused by flat spots. Acoustic systems, on the other hand, analyze the distinct sounds produced by damaged wheels, which enables non-contact measurement. A subcategory of acoustic methods is acoustic emission measurement, which exploits the propagation of elastic waves in solids [
15,
16,
17].
This review focuses on vibroacoustic approaches (e.g., acceleration, acoustic, and acoustic emission sensing) because they provide scalable deployment options with relatively low-cost sensors, direct sensitivity to wheel–rail impact dynamics, and compatibility with both onboard and wayside architectures. Vision-based inspection and fiber-optic sensing are valuable complementary directions; however, they involve different sensing physics, infrastructure requirements, and evaluation metrics and are therefore considered out of scope for the present work.
The vibroacoustic signal in the diagnosis of rail transport components is often non-stationary and difficult to interpret [
18,
19]. To handle the aforementioned difficulties, which are amplified by challenges like varying train speeds and track conditions, these systems employ advanced signal processing techniques. Recent developments in data analysis methods, including but not limited to wavelet analysis, cepstrum analysis, empirical methods, machine learning, and deep learning, have improved the accuracy of wheel-flat detection. These improvements enable maintenance teams to identify problems as they develop, reducing system downtime [
20,
21,
22,
23,
24].
Early detection of wheel flats provides clear benefits in multiple operational aspects of railway systems:
Safety impact. Early detection of wheel flats helps prevent escalation of damage to rolling stock and track infrastructure, reducing the likelihood of hazardous running conditions and safety-critical failures under unfavorable operating scenarios.
Economic impact. Early detection supports condition-based maintenance by optimizing resource allocation, reducing unplanned downtime, and extending the operational life of wheel–rail components through timely intervention.
Environmental impact. Detecting wheel flats at an early stage can mitigate excessive noise and vibration emissions and indirectly reduce resource consumption associated with repeated maintenance actions and premature component replacement [
25].
The goal of this review is to support improved railway safety, reliability, and operational efficiency by synthesizing and critically structuring state-of-the-art vibroacoustic methods for wheel-flat detection and severity assessment in both onboard and wayside settings. The analysis of signal-processing and AI-based detection techniques is the means to achieve this goal, rather than an end in itself. By evaluating practical applications, performance limitations, and deployment constraints reported in recent studies, this review identifies research gaps and actionable directions for more reliable condition monitoring and condition-based maintenance in modern railway operations.
The remainder of this paper is organized as follows.
Section 2 outlines the review methodology, including databases, search strategy, and selection criteria.
Section 3,
Section 4 and
Section 5 synthesize onboard and wayside approaches and discuss challenges and deployment considerations.
Section 6 concludes the paper and outlines future research directions.
2. Methodology of Review
The present review adopts a scoping review methodology with the aim of investigating and describing the various methods of detecting flat spots on the wheels of rail vehicles. This approach was selected in order to evaluate the state-of-the-art methods in the field as comprehensively and broadly as possible and to identify the main concepts, possible research gaps and opportunities for professional application in the field. Unlike a systematic review, a scoping review does not restrict the selection to manuscripts whose reliability can be critically assessed, which allows a broader range of papers to be analyzed. The primary objective of this paper is to analyze existing methods for detecting flat spots using vibration and acoustics. In addition, it aims to highlight the strengths, limitations and possible opportunities for practical application of these methods. Furthermore, it seeks to identify research gaps and propose possible directions for development [
26].
The literature search for the analysis was conducted in various academic databases, including ScienceDirect, SpringerLink, Google Scholar and IEEE Xplore. In order to narrow down the search results and increase the quality of the reviewed material, only articles from peer-reviewed journals and conference proceedings were selected for further exploration [
27]. The time scope of the review was set to the years 2019–2025 in order to include only the most recent discoveries and advances in the field. In addition, large language models (LLMs) such as GPT-4.1 were used through dedicated front ends, in this case Perplexity and Consensus, which give the artificial intelligence model direct access to academic databases, with the aim of increasing the efficiency of the literature search. This reduced the risk that the language model would hallucinate, i.e., suggest references that do not exist. Nevertheless, each publication suggested by the language model was rigorously checked manually by the authors. The combination of tools enabled the analysis of large datasets, facilitating the identification of contextual relationships in the literature.
Search terms included combinations of keywords such as “flat spot detection in rail wheels,” “rail vehicle defect monitoring,” “wayside wheel flat detection,” “onboard detection methods,” and “vibration and acoustic-based detection.” Inclusion criteria required the primary focus of the articles to be on wheel-flat detection. Studies addressing other rail wheel defects, such as polygonization or shelling, were included only if the methods demonstrated applicability to wheel-flat detection. Additionally, only studies employing vibration-based or acoustic-based methods were considered. Articles describing multisensor or alternative approaches were included only if they explicitly analyzed vibration- or acoustic-based techniques. Simulation-based or experimental-based studies were emphasized, while purely theoretical approaches were excluded to focus on practical methodologies.
To improve reproducibility,
Table 1 lists representative database-specific search strings and operators used during the literature search.
The review organized the examined detection methodologies into two primary approaches: wayside and onboard systems. Wayside methods employ stationary detection systems positioned alongside rail tracks, whereas onboard methods utilize systems integrated into rail vehicles to enable continuous monitoring during operation. This categorization formed the analytical framework, facilitating the examination and structuring of the studies in order to emphasize the specific applications, advantages, and limitations of each approach.
The review process involved a three-stage screening procedure. First, the titles and abstracts were reviewed to ensure the relevance of the studies. Next, a detailed assessment of the full-text articles was conducted against the pre-defined inclusion criteria. Finally, a selection of the most relevant studies was made for comprehensive analysis. To maintain consistency, the data extraction followed a structured template, capturing key details such as the type of detection system, methodological approach, study objectives, accuracy and reliability, limitations, practical applications, and recommendations for future research. The review employed a thematic synthesis to classify the studies into two primary categories: wayside and onboard methods. This synthesis facilitated a comparative analysis of the methodologies, evaluating their performance in terms of accuracy, feasibility, and practical implementation.
Summarized inclusion criteria:
Peer-reviewed journal articles and conference papers published between 2019 and 2025;
Railway wheel flats or flat spots as the primary or one of the main defects of interest;
Vibroacoustic sensing and analysis, including vibration, acoustic, or acoustic emission signals;
Onboard and/or wayside monitoring configurations;
Sufficient methodological detail to reproduce the pipeline at a high level, including sensor setup and a description of signal processing and/or learning-based detection;
At least one form of evaluation, such as simulation, laboratory tests, or field measurements.
Title and abstract screening as well as data extraction were performed independently by two reviewers. Disagreements were resolved through discussion, and a consensus decision was recorded for each record.
The article presents accuracy metrics as declared by the authors of the individual publications. During the review, individual experiments were not reproduced, and the reliability of the results provided was not assessed.
Although this paper is a review rather than a methodological tutorial, a consistent signal analysis workflow underlies most vibroacoustic wheel-flat detection studies. In general, the pipeline comprises data acquisition and synchronization, preprocessing and noise mitigation, transformation to the time-frequency or order domains when needed, feature extraction or representation learning, and finally detection and severity estimation with performance evaluation under stated boundary conditions. Throughout this review, methods are discussed using this common workflow to enable comparison across sensing configurations and operating regimes without repeating full derivations that are available in standard signal processing references.
Detailed mathematical derivations are beyond the scope of this review and are therefore referenced rather than reproduced.
This scoping review focuses on recent simulation-based and experimental research on vibration- and acoustic-based wheel-flat detection, published in peer-reviewed sources between 2019 and 2025. By organizing the findings based on wayside and onboard methods, the review provides a broad and structured overview of the current state of knowledge in this field. Furthermore, the utilization of advanced search tools, including machine-learning search engines and large language models, enables a comprehensive and efficient identification of relevant studies, enhancing the depth and scope of the analysis.
3. Onboard Vibration-Based Methods
Onboard detection methods for wheel flats are an important component of railway diagnostics, allowing continuous monitoring of wheel health during train operation [
24,
28,
29]. These systems, installed on components such as axle boxes, bogies, or traction systems, rely on vibration and acoustic signals to identify wheel defects in real time. Unlike wayside systems, onboard methods provide direct interaction with the dynamic wheel–rail interface, offering precise and timely fault detection. Despite challenges such as noise, speed variability, and complex operating conditions, advancements in signal processing and machine learning have enhanced the reliability and practicality of these methods for condition-based maintenance (CBM).
One effective approach for addressing the variability of train speeds involves cepstrum and order analysis. The method, as detailed in [
15], transforms vibration signals from the time domain to the angular domain, allowing periodic impacts caused by wheel flats to be identified consistently across different speeds. Measurements were collected using accelerometers mounted on axle boxes, ensuring proximity to the source of vibration. Cross-correlation analysis was additionally employed to estimate the size of wheel flats. The combined approach proved effective, particularly under conditions of severe noise. However, the method’s dependence on precise speed measurements introduced a limitation, as inaccuracies in speed data could compromise the angular domain transformation.
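The angular-domain transformation described above can be illustrated with a minimal sketch. It is not the implementation from [15]: the wheel diameter, speed profile, and synthetic impact signal are assumptions, and the function names are hypothetical. The sketch integrates the measured speed to obtain the wheel angle and then interpolates the vibration signal onto a uniform angle grid, so that an impact occurring once per revolution falls at the same index in every revolution regardless of speed.

```python
import numpy as np

def cumulative_revolutions(speed_mps, fs, wheel_diameter_m):
    """Integrate speed to get the wheel revolution count at each sample."""
    rev_rate = np.asarray(speed_mps, dtype=float) / (np.pi * wheel_diameter_m)  # rev/s
    steps = 0.5 * (rev_rate[1:] + rev_rate[:-1]) / fs  # trapezoidal rule
    return np.concatenate(([0.0], np.cumsum(steps)))

def resample_to_angle(signal, speed_mps, fs, wheel_diameter_m, samples_per_rev=256):
    """Resample a time-domain signal onto a uniform wheel-angle grid."""
    signal = np.asarray(signal, dtype=float)
    t = np.arange(signal.size) / fs
    revs = cumulative_revolutions(speed_mps, fs, wheel_diameter_m)
    n_out = int(np.floor(revs[-1] * samples_per_rev))
    uniform_revs = np.arange(n_out) / samples_per_rev
    # revs increases monotonically while the vehicle moves forward,
    # so time can be looked up as a function of angle by interpolation.
    t_at_angle = np.interp(uniform_revs, revs, t)
    return np.interp(t_at_angle, t, signal)

# Synthetic run (assumed values): acceleration from 10 to 20 m/s,
# one impact per wheel revolution.
fs = 5000.0
diameter = 0.92                      # hypothetical wheel diameter in metres
n = int(4 * fs)
speed = np.linspace(10.0, 20.0, n)
phase = cumulative_revolutions(speed, fs, diameter) % 1.0
signal = np.exp(-((phase - 0.5) / 0.02) ** 2)   # impact at mid-revolution

spr = 256
angle_signal = resample_to_angle(signal, speed, fs, diameter, spr)
n_rev = angle_signal.size // spr
rows = angle_signal[: n_rev * spr].reshape(n_rev, spr)
peak_positions = rows.argmax(axis=1)  # impact index within each revolution
print(peak_positions)
```

In the time domain the spacing between impacts shrinks as the vehicle accelerates, whereas in the angular domain the impact position is constant. The sketch also makes the stated limitation visible: any error in the speed input propagates directly into the computed angle.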
Noise suppression remains a key challenge in detecting wheel flats, particularly in environments with frequent rail joints and crossings. Angular Domain Synchronous Averaging (ADSA), presented in [
30], processes vibration signals in the angular domain, leveraging averaging techniques to enhance signal clarity. This method was validated through both simulation and field experiments, where axle box accelerometers collected vibration signals. While ADSA effectively suppressed noise and isolated periodic wheel-flat impacts, it was found to require pre-filtering to address low-frequency disturbances. This additional preprocessing step, while improving accuracy, introduced computational complexity that could hinder its application in real-time onboard systems.
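The averaging step at the core of ADSA can be sketched as follows, assuming the signal has already been resampled to a fixed number of samples per wheel revolution. This is an illustrative reduction, not the procedure from [30]: the pre-filtering of low-frequency disturbances is omitted, and the synthetic impact, noise level, and rail-joint spike are assumptions.

```python
import numpy as np

def synchronous_average(angle_signal, samples_per_rev):
    """Average an angle-domain signal over complete wheel revolutions."""
    angle_signal = np.asarray(angle_signal, dtype=float)
    n_rev = angle_signal.size // samples_per_rev
    if n_rev == 0:
        raise ValueError("signal is shorter than one revolution")
    revolutions = angle_signal[: n_rev * samples_per_rev].reshape(n_rev, samples_per_rev)
    return revolutions.mean(axis=0)

# Synthetic angle-domain record (assumed values).
rng = np.random.default_rng(0)
spr, n_rev = 256, 64
template = np.zeros(spr)
template[100:104] = 1.0                            # wheel-flat impact, fixed angle
noisy = np.tile(template, n_rev) + rng.normal(0.0, 1.0, spr * n_rev)
noisy[37 * spr + 30] += 10.0                       # isolated rail-joint-like spike

averaged = synchronous_average(noisy, spr)
residual_std = float(np.std(averaged - template))
print(f"noise std before: 1.00, after averaging: {residual_std:.2f}")
```

Components that repeat at the same wheel angle survive the averaging, while uncorrelated noise is reduced by roughly the square root of the number of revolutions and a one-off disturbance is divided by the number of revolutions.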
Methods such as variational mode decomposition (VMD) have shown promise in isolating specific vibration modes from complex datasets. As described in [
31], VMD decomposes axle box vibration signals into intrinsic mode functions, enabling the separation of fault-specific frequencies from noise. The study employed envelope spectrum (ES) analysis to further enhance fault detection, with sensors mounted on both axle boxes and bogies, providing a comprehensive dataset. Field validation demonstrated the method’s high sensitivity to small wheel flats. However, overlapping frequencies caused by track irregularities posed challenges, necessitating advanced filtering techniques to ensure reliability in noisy environments. The proposed location of the acceleration transducer on the axle box is shown in
Figure 2.
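The envelope spectrum step can be illustrated in isolation. The sketch below does not implement VMD and is not the pipeline from [31]; it assumes that one fault-related mode has already been extracted and uses a synthetic stand-in for it (a hypothetical 600 Hz resonance excited 7 times per second). The analytic signal is computed with an FFT-based Hilbert transform written directly in NumPy.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal (same construction as a Hilbert transform)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1 : n // 2] = 2.0
    else:
        weights[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)

def envelope_spectrum(x, fs):
    """Magnitude spectrum of the mean-removed signal envelope."""
    envelope = np.abs(analytic_signal(x))
    envelope = envelope - envelope.mean()
    spectrum = np.abs(np.fft.rfft(envelope)) / envelope.size
    freqs = np.fft.rfftfreq(envelope.size, d=1.0 / fs)
    return freqs, spectrum

# Synthetic mode (assumed values): decaying 600 Hz bursts repeating at 7 Hz.
fs, duration, impact_rate = 4000.0, 2.0, 7.0
t = np.arange(int(duration * fs)) / fs
rng = np.random.default_rng(0)
x = 0.05 * rng.normal(size=t.size)
for k in range(int(duration * impact_rate)):
    tau = t - k / impact_rate
    active = tau >= 0
    x[active] += np.exp(-80.0 * tau[active]) * np.sin(2 * np.pi * 600.0 * tau[active])

freqs, spectrum = envelope_spectrum(x, fs)
peak_freq = float(freqs[1:][spectrum[1:].argmax()])
print(f"dominant envelope frequency: {peak_freq:.1f} Hz")
```

The raw spectrum of such a signal is concentrated around the resonance, whereas the envelope spectrum exposes the repetition rate of the impacts, which is the quantity tied to wheel rotation.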
In the context of analyzing dynamic wheel–rail interactions, torsional vibration analysis has emerged as a novel diagnostic method. Discussed in [
33], this approach focuses on vibrations within the traction system, which are influenced by defects such as wheel flats. Torsional signals were measured using transducers installed near the traction system. These measurements proved particularly effective for detecting defects at high speeds, where torsional responses to wheel flats become more pronounced. Despite its sensitivity, the method faced limitations in distinguishing torsional vibrations caused by wheel flats from those due to other irregularities, such as gear backlash. The study suggested additional filtering techniques to address these challenges.
Another advanced signal processing method is Adaptive Chirp Mode Decomposition (ACMD), which combines time-frequency analysis with signal mode separation to address variable-speed conditions. In [
1], sensors mounted on axle boxes and bogies collected vibration data, which were then analyzed using ACMD to extract fault-specific frequencies from noisy environments. The method demonstrated high accuracy in detecting even small wheel flats. However, its computational intensity posed challenges for real-time implementation, with the study recommending algorithmic optimization and hardware acceleration to improve feasibility.
A coupled rigid-flexible dynamic model, as described in [
34], was utilized to examine the impact of wheel flats on vehicles traversing railway turnouts. Analysis of axle box acceleration signals revealed characteristic frequency changes associated with wheel flats, with key frequencies in the 550–800 Hz range serving as reliable indicators for fault detection. The study demonstrated that vertical wheel–rail forces and axle box vibrations were notably elevated when wheel flats interacted with the frog zone of the turnout. Despite the method’s efficacy, it faced challenges in isolating fault-specific signals due to the overlapping vibrations caused by the turnout structure, emphasizing the need for advanced signal separation techniques.
In [
35], a data-driven method was developed for estimating wheel-flat lengths using a combination of surrogate modeling and optimization techniques. Axle box acceleration signals were analyzed to extract peak values corresponding to wheel-flat impacts, which were then used to train a Kriging surrogate model. An example using the aforementioned model to find a relationship between acceleration value and wheel-flat length is shown in
Figure 3. This model established a relationship between axle box accelerations, vehicle speeds, and wheel-flat lengths. To estimate defect lengths from operational data, the study employed a particle swarm optimization (PSO) algorithm, leveraging its iterative search capabilities to solve the inverse problem of identifying flat dimensions. The approach demonstrated high accuracy in simulations and experimental validation, successfully estimating wheel-flat lengths with minimal errors. However, real-world applicability was constrained by challenges such as noise from environmental conditions and vehicle non-linearities, as well as computational demands associated with the surrogate model and optimization process. These limitations highlight the need for further refinement to enable reliable real-time onboard implementation.
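The inverse problem described above can be sketched with a basic particle swarm. Everything numerical here is an assumption: `surrogate_peak_acc` is a hypothetical closed-form stand-in for the trained Kriging model of [35], and the swarm coefficients are common textbook values rather than those used in the study.

```python
import numpy as np

def surrogate_peak_acc(flat_length_mm, speed_kmh):
    """Hypothetical stand-in for a surrogate: peak acceleration vs. length and speed."""
    return 0.02 * flat_length_mm ** 1.5 * np.sqrt(speed_kmh)

def pso_minimize(cost, lower, upper, n_particles=30, n_iter=100, seed=0):
    """Minimal one-dimensional particle swarm optimizer."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(lower, upper, n_particles)
    vel = np.zeros(n_particles)
    best_pos = pos.copy()
    best_cost = np.array([cost(p) for p in pos])
    global_best = best_pos[best_cost.argmin()]
    for _ in range(n_iter):
        r1, r2 = rng.random(n_particles), rng.random(n_particles)
        # Inertia plus attraction to personal and global best positions.
        vel = 0.7 * vel + 1.5 * r1 * (best_pos - pos) + 1.5 * r2 * (global_best - pos)
        pos = np.clip(pos + vel, lower, upper)
        current = np.array([cost(p) for p in pos])
        improved = current < best_cost
        best_pos[improved] = pos[improved]
        best_cost[improved] = current[improved]
        global_best = best_pos[best_cost.argmin()]
    return float(global_best), float(best_cost.min())

# "Measured" peaks generated from the stand-in model for a 35 mm flat.
speeds = np.array([40.0, 60.0, 80.0])
true_length = 35.0
measured = surrogate_peak_acc(true_length, speeds)

def mismatch(length_mm):
    return float(np.sum((surrogate_peak_acc(length_mm, speeds) - measured) ** 2))

estimated_length, final_cost = pso_minimize(mismatch, lower=5.0, upper=80.0)
print(f"estimated flat length: {estimated_length:.2f} mm")
```

Because the data are generated from the same stand-in model, this only demonstrates the mechanics of the search. With real measurements, the noise and non-linearities mentioned above would enter through the mismatch term.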
Machine learning models have significantly advanced the automation of fault detection. In [
36], a supervised learning model based on k-Nearest Neighbors (kNN) and logistic regression was trained on axle box acceleration data collected from simulations and controlled tests. The system achieved 98.6% accuracy, effectively identifying wheel flats under diverse conditions. Nonetheless, the reliance on extensive labeled datasets was identified as a constraint, particularly for operators with limited data resources. In contrast, [
23] employed a hybrid approach, combining multi-layer perceptrons (MLP) with random forest classifiers. This method achieved a detection accuracy of 99%, supported by robust preprocessing techniques to manage noise caused by track irregularities. However, the preprocessing steps added complexity to real-world deployment.
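A feature-based classifier of this kind can be sketched in a few lines. The example below is a generic k-nearest-neighbors illustration on synthetic records, not the model or feature set of [36]; the two features (RMS and kurtosis), the record generator, and all numeric values are assumptions, and feature scaling is omitted for brevity.

```python
import numpy as np

def extract_features(x):
    """Two simple indicators: RMS level and kurtosis (impulsiveness)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    power = np.mean(x ** 2)
    return np.array([np.sqrt(power), np.mean(x ** 4) / power ** 2])

def knn_predict(train_X, train_y, query, k=5):
    """Majority vote among the k nearest training samples."""
    distances = np.linalg.norm(train_X - query, axis=1)
    nearest_labels = train_y[np.argsort(distances)[:k]]
    return int(np.bincount(nearest_labels).argmax())

def synthetic_record(rng, has_flat, n=2000, period=200):
    """Gaussian background with optional periodic impacts (assumed model)."""
    x = rng.normal(0.0, 1.0, n)
    if has_flat:
        x[::period] += 8.0
    return x

rng = np.random.default_rng(1)
labels = np.array([0, 1] * 30)                     # 0 = healthy, 1 = wheel flat
X = np.array([extract_features(synthetic_record(rng, bool(y))) for y in labels])
train_X, train_y = X[:40], labels[:40]
test_X, test_y = X[40:], labels[40:]

predictions = np.array([knn_predict(train_X, train_y, q) for q in test_X])
accuracy = float(np.mean(predictions == test_y))
print(f"accuracy on synthetic hold-out: {accuracy:.2f}")
```

The synthetic classes are far easier to separate than field data, so the accuracy printed here says nothing about the figures reported in the cited studies. The sketch mainly shows why labeled examples of both classes are a prerequisite.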
The integration of machine learning with advanced data preprocessing was further demonstrated in [
37], which introduced Activated Time-Domain Images (ATDI). This method transformed vibration signals into adaptive visual representations, allowing deep neural networks to classify faults effectively. Data collected from axle box accelerometers under varying speed conditions highlighted the method’s adaptability. However, the sensitivity of ATDI to noise spikes necessitated rigorous preprocessing, increasing computational demands. Similarly, [
38] explored hybrid Convolutional Neural Network—Long Short-Term Memory (CNN-LSTM) models for sequential data analysis, focusing on early-stage fault detection. While these models excelled in simulation settings, their computational intensity during training posed challenges for real-time onboard deployment. Cui et al. [
39] advanced this line of research by comparing LSTM and transformer models using data from a 1:10 scale test rig equipped with axle box accelerometers. Their results showed that transformer models, particularly with feature-level sensor fusion, outperformed LSTM, achieving a mean error of just 0.0069 mm in estimating wheel-flat depth. The system, implemented on embedded hardware, highlights the potential for accurate and cost-effective onboard detection under controlled but realistic conditions.
Transfer learning is another machine learning approach applied to wheel-flat detection. A notable method, Frequency-Domain Gramian Angular Field (FDGAF), was discussed in [
32]. This technique encoded vibration signals from axle box accelerometers into images, which were subsequently classified using transfer learning networks. The FDGAF method proved effective for small datasets, achieving high classification accuracy despite limited training samples. The computational demands of encoding vibration signals into images, however, limited the method’s real-time applicability, underscoring the need for specialized hardware to support faster processing.
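The image-encoding idea can be sketched as follows. The exact encoding used in [32] is not reproduced here; the sketch assumes a Gramian angular summation field applied to a block-averaged magnitude spectrum, and the function names and image size are hypothetical.

```python
import numpy as np

def gramian_angular_field(series):
    """Gramian angular summation field: G[i, j] = cos(phi_i + phi_j)."""
    s = np.asarray(series, dtype=float)
    lo, hi = s.min(), s.max()
    if hi == lo:
        raise ValueError("series is constant and cannot be rescaled")
    scaled = np.clip(2.0 * (s - lo) / (hi - lo) - 1.0, -1.0, 1.0)
    phi = np.arccos(scaled)                       # polar-angle encoding
    return np.cos(phi[:, None] + phi[None, :])

def frequency_domain_gaf(signal, image_size=64):
    """Encode the magnitude spectrum of a signal as a square image."""
    spectrum = np.abs(np.fft.rfft(np.asarray(signal, dtype=float)))
    usable = (spectrum.size // image_size) * image_size
    if usable == 0:
        raise ValueError("signal too short for the requested image size")
    # Average adjacent frequency bins to reduce the spectrum to image_size points.
    reduced = spectrum[:usable].reshape(image_size, -1).mean(axis=1)
    return gramian_angular_field(reduced)

# Synthetic vibration record (assumed values).
rng = np.random.default_rng(0)
n = 1024
signal = np.sin(2 * np.pi * 50 * np.arange(n) / n) + 0.3 * rng.normal(size=n)
image = frequency_domain_gaf(signal, image_size=64)
print(image.shape, float(image.min()), float(image.max()))
```

The resulting matrix is symmetric and bounded in [-1, 1], which makes it suitable as a single-channel input to a pretrained image network after resizing and channel replication. The quadratic growth of the image with the series length is one source of the computational cost noted above.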
Addressing computational constraints for embedded deployment, the authors of [
40] developed LightWFNet, a lightweight 1D CNN (LCNN) optimized through Bayesian optimization. Using carbody accelerometers—a challenging location due to suspension damping and noise—the method incorporated depthwise separable convolutions and squeeze-and-excitation (SE) modules, achieving 93.53% accuracy with 378 times fewer FLOPs than conventional LeNet. Field tests on tank wagons validated the approach, though accuracy decreased to 67.86% at speeds above 85 km/h.
In [
41], a unified framework combining deep residual networks (ResNet) with squeeze-and-excitation modules and supervised contrastive learning (SCL) was presented. Validated on SIMPACK simulation data, the system achieved 97.36% accuracy for wheel-flat detection while providing fault severity regression. The method’s interpretability analysis revealed amplification of structural resonances (140, 230, 320 Hz) while suppressing low-frequency vehicle dynamics. Despite its superior performance, its relatively complex architecture and reliance on simulated data limit immediate industrial deployment.
Ref. [
42] explored the application of signal processing and deep learning for anomaly detection in railway systems, focusing on the identification of wheel flats. Their study highlighted the potential of condition-based maintenance frameworks, which integrate onboard vibration-based detection methods, to significantly reduce maintenance costs (up to 30%) and extend wheelset lifespan (approximately 40%). The authors employed a combination of order analysis and short-time Fourier transforms (STFT) to preprocess vibration data, compensating for variations in wheel rotation speed and converting time-domain signals into spectrograms suitable for deep learning. A modified LeNet-5 convolutional neural network architecture was then used to classify wheel flats. However, the research also acknowledged the challenges of real-world CBM deployment, particularly the difficulty of isolating wheel-flat vibrations from other noise sources (e.g., track irregularities, traction motor noise) and the computational demands of processing large, high-frequency vibration datasets in real time.
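The conversion of a vibration record into a spectrogram image can be sketched as below. This is a generic short-time Fourier transform written with NumPy, not the preprocessing code of [42]: the order-analysis resampling and the LeNet-5 classifier are omitted, and the window length, hop size, and synthetic signal are assumptions.

```python
import numpy as np

def stft_magnitude(x, fs, n_fft=256, hop=64):
    """Hann-windowed STFT magnitude with shape (frequency, time)."""
    x = np.asarray(x, dtype=float)
    if x.size < n_fft:
        raise ValueError("signal is shorter than one analysis window")
    n_frames = 1 + (x.size - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hanning(n_fft)
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + n_fft / 2) / fs
    return freqs, times, magnitude

def to_image(magnitude, eps=1e-10):
    """Log-scale and normalize to [0, 1] for use as a network input."""
    log_mag = 20.0 * np.log10(magnitude + eps)
    return (log_mag - log_mag.min()) / (log_mag.max() - log_mag.min())

# Synthetic record (assumed values): 250 Hz bursts repeating every 0.1 s.
fs = 2000.0
t = np.arange(int(1.0 * fs)) / fs
rng = np.random.default_rng(0)
x = 0.05 * rng.normal(size=t.size)
for k in range(10):
    tau = t - 0.1 * k
    active = tau >= 0
    x[active] += np.exp(-60.0 * tau[active]) * np.sin(2 * np.pi * 250.0 * tau[active])

freqs, times, magnitude = stft_magnitude(x, fs)
image = to_image(magnitude)
dominant = float(freqs[magnitude.mean(axis=1).argmax()])
print(image.shape, dominant)
```

Each column of the image is one short analysis window, so the memory and compute cost scales with the sampling rate and record length, which is consistent with the real-time processing burden noted in the study.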
This comprehensive overview highlights the diverse methodologies employed for onboard wheel-flat detection, with a focus on vibration signal processing, machine learning, and experimental validation. By addressing the limitations of each method and leveraging their advantages, these techniques contribute to enhanced safety, operational reliability, and efficiency in railway systems. A summary of the onboard methods is presented in
Table 2.
Analysis of onboard wheel-flat detection methodologies reveals distinct patterns in signal processing approaches and validation strategies. Axle box-mounted accelerometers constitute the predominant sensor configuration (13 studies), with only limited exploration of alternative placements such as bogie and carbody [
43] or torsional variables [
33]. This concentration on axle box measurements reflects the direct mechanical coupling between wheel defects and sensing location, providing stronger signal characteristics for defect identification. The signal analysis methods distribution is shown in
Figure 4.
Deep learning and hybrid neural models represent a significant methodological category [
23,
32,
37,
38,
39,
40,
41], demonstrating superior performance metrics, with transformer models achieving an average error of 0.0069 mm [
39] and ATDI-DNN reaching 98–100% accuracy [
37]. These approaches excel particularly in handling variable operational conditions but require substantial computational resources and often extensive preprocessing.
Time-frequency analysis techniques [
15,
30] maintain relevance for specific applications, with ADSA [
30] demonstrating robust performance under variable track conditions. Signal decomposition and adaptive filtering methods [
1,
31,
34,
42] offer specialized approaches for isolating fault-related components, with VMD-ES [
31] proving effective at extracting flat-induced modes under noise interference.
Machine learning with feature extraction [
35,
36,
43] constitutes an intermediate approach between traditional signal processing and deep learning, with kNN-based methods achieving 98.6% accuracy [
36] through relatively straightforward implementation.
Validation strategies demonstrate considerable methodological diversity, ranging from pure simulation [
23,
35,
43] to field testing [
30,
37] and various hybrid approaches. Notably, scaled test rigs [
32,
39] provide controlled experimental environments while approximating operational conditions. This validation spectrum reflects different maturity levels across methodologies, with established techniques progressing to field implementation while emerging approaches remain in preliminary validation phases.
The temporal evolution of wheel-flat detection methodologies (2019–2025; as shown in
Figure 5) reveals a clear research progression from traditional approaches toward advanced computational techniques.
Machine learning with feature extraction dominated early research (2019) and was followed by significant methodological diversification in 2020 with the emergence of physics-based modeling. Signal decomposition methods gained prominence in 2021–2022, while deep learning approaches maintained consistent presence from 2020 onward, achieving exclusive representation by 2025 through transformer-based models [
39] with superior accuracy metrics (0.0069 mm average error). This trajectory demonstrates a systematic shift from conventional signal processing toward sophisticated machine learning paradigms capable of addressing the complexities of operational railway environments.
The validation methodology distribution across computational approaches (as shown in
Figure 6) reveals distinctive patterns in wheel-flat detection research.
Simulation predominates as the principal validation strategy (13 studies), with particularly strong representation in the machine learning with feature extraction and signal decomposition categories (3 studies each). Deep learning approaches demonstrate balanced validation distribution between all three methodologies, indicating methodological maturity across the research spectrum. Time-frequency analysis methods show less simulation emphasis compared to other approaches, suggesting greater progression toward practical implementation. Notably, physics-based modeling appears exclusively in simulation environments, reflecting its focus on theoretical foundations rather than operational application. Field testing and test rig validation demonstrate relatively equal distribution (4 and 5 studies, respectively) across most methodological categories, with machine learning showing a distinct preference for simulation validation. This comprehensive validation distribution complements the previously identified temporal evolution from traditional techniques toward advanced computational methods, with newer approaches progressing systematically through theoretical modeling and controlled testing toward field implementation.
4. Wayside Methods
Wayside methods for detecting wheel flats using vibration and acoustic signal processing provide a synchronized approach for monitoring all wheels on passing trains. These systems stand apart from onboard methods by avoiding the need for vehicle-mounted sensors, significantly reducing maintenance complexity and operational disruptions. By focusing on track-mounted sensors, wayside systems capture dynamic interactions between wheels and rails, utilizing distinct vibration patterns caused by defects like wheel flats to diagnose and address issues. This review examines these methods across five categories: time-domain, frequency-domain, time-frequency analysis, envelope analysis, and machine learning, highlighting their methodologies, effectiveness, and limitations.
One basic category of wayside methods uses the vibration signal measured at or near the rail. Time-domain methods directly analyze raw vibration signals, concentrating on features like root mean square (RMS) and peak amplitudes to identify defects. Ref. [
44] developed a mathematical model to simulate rail vertical vibrations caused by wheel flats, emphasizing the use of RMS values of vibration velocity as a primary detection metric. The model accounted for key factors, including wheel geometry and track irregularities, to ensure realistic simulations aligned with actual operating conditions. Vibration signals were measured using accelerometers placed along the rail, allowing precise data acquisition for evaluating the dynamic response. This approach demonstrated a robust capability to detect wheel flats under various simulated conditions, particularly for significant defects that produce noticeable vibration amplitudes. However, the method primarily focused on the transient nature of major defects, with limited analysis of early-stage or subtle anomalies exhibiting weaker vibration patterns. Despite its simplicity and reliance on RMS as a diagnostic indicator, the model effectively highlights changes in the dynamic response caused by wheel flats, providing a reliable yet straightforward tool for fault detection.
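An RMS-based time-domain indicator of this type can be sketched as follows. This is not the model from [44]: the signals are synthetic, and the window length and the ratio threshold are hypothetical values chosen only for illustration.

```python
import numpy as np

def moving_rms(x, window):
    """Short-term RMS computed over a sliding window."""
    squared = np.square(np.asarray(x, dtype=float))
    kernel = np.ones(window) / window
    return np.sqrt(np.convolve(squared, kernel, mode="valid"))

def flags_impact(x, window=100, ratio=2.0):
    """Flag a pass-by when the peak short-term RMS exceeds ratio times the median."""
    levels = moving_rms(x, window)
    return bool(levels.max() > ratio * np.median(levels))

# Synthetic rail vibration velocity (assumed values).
rng = np.random.default_rng(2)
healthy = rng.normal(0.0, 1.0, 5000)               # background response
flat = healthy.copy()
burst = np.arange(50)
flat[2000:2050] += 6.0 * np.sin(2 * np.pi * burst / 10)   # one wheel-flat impact

print("healthy flagged:", flags_impact(healthy))
print("flat flagged:   ", flags_impact(flat))
```

A ratio against the median makes the indicator less dependent on the overall vibration level of a given pass-by. It also shows why weak, early-stage defects are hard to catch with this approach: their short-term RMS may not rise clearly above the background.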
Frequency-domain methods delve deeper into the spectral characteristics of vibration signals to identify periodic impacts caused by wheel flats. In [
45], a method employing spectral kurtosis analysis to detect wheel flats by isolating transient high-energy events in vibration signals was proposed. Data were collected from strain gauges and accelerometers placed at 19 track positions to measure shear forces and accelerations. The method identified frequency bands with high kurtosis values, enabling precise defect detection through envelope spectrum analysis. The optimal frequency band for defect signals was determined using a kurtogram, with a center frequency of 0.1875 kHz and a bandwidth of 0.125 kHz. Accelerometers proved more reliable than strain gauges for long-term monitoring due to their resistance to environmental interference. While effective in controlled conditions, the method’s reliability diminished in noisy environments or where operational variability was significant. The approach demonstrated potential for wayside monitoring but was constrained by its dependence on proper sensor placement and preprocessing.
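Kurtosis is the statistic behind the kurtogram: it is high for impulsive signals and low for smooth ones. The sketch below computes plain time-domain kurtosis on hypothetical signals and converts the reported center frequency and bandwidth into band edges. It does not reproduce the kurtogram itself, which evaluates kurtosis separately for many candidate frequency bands, and the helper names are assumptions.

```python
import math

def kurtosis(signal):
    """Non-excess kurtosis: fourth central moment over squared variance."""
    n = len(signal)
    mean = sum(signal) / n
    m2 = sum((x - mean) ** 2 for x in signal) / n
    m4 = sum((x - mean) ** 4 for x in signal) / n
    return m4 / (m2 * m2)

def band_edges(center_hz, bandwidth_hz):
    """Lower and upper edge of a band given its center and width."""
    half = bandwidth_hz / 2
    return center_hz - half, center_hz + half

# Hypothetical signals: a pure sine and the same sine with periodic spikes.
smooth = [math.sin(2 * math.pi * n / 50) for n in range(500)]
impulsive = list(smooth)
for n in range(0, 500, 100):
    impulsive[n] += 10.0

k_smooth = kurtosis(smooth)
k_impulsive = kurtosis(impulsive)

# Band reported in [45]: center 0.1875 kHz, bandwidth 0.125 kHz.
f_lo, f_hi = band_edges(187.5, 125.0)
```

With these inputs the reported band spans 125 Hz to 250 Hz, and the spiky signal has a much higher kurtosis than the sine.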
Time-frequency methods, which combine temporal and spectral analysis, are particularly suited to analyzing non-stationary vibration signals. In [
46], researchers applied multi-channel singular spectrum analysis (MSSA) to decompose multi-sensor vibration signals. Vibrational signals were recorded using five piezoelectric accelerometers, each positioned at the rail heel in the vertical direction and spaced 600 mm apart along the rail. Data were sampled at 10 kHz during train passages at constant speeds of 10, 20, 30, and 40 km/h. MSSA was used to decompose these multi-sensor signals into reconstructed components (RCs), isolating features associated with wheel flats. Specifically, the second RC exhibited significant amplitude increases at fixed angular positions, serving as a key diagnostic indicator of defects. Crest factors were calculated to verify the consistency of these RCs across angular positions. Regions with high crest factors compared to other sectors confirmed the presence of potential wheel defects. The method effectively identified repeating patterns caused by wheel flats across various speeds and positions relative to the sensors. However, the study noted increased noise levels when the vehicle passed directly above the sensors, emphasizing the need for preprocessing and signal interpretation. Although MSSA demonstrated robustness in feature extraction, its computational complexity may limit real-time applicability in operational railway systems. Recent research in [
47] explored other time-frequency analysis techniques, combining the wavelet transform with the Wigner–Ville transform (WVT) to study vibration signals and pinpoint wheel-flat defects. These approaches provide a high-resolution view of the signals, helping to identify the signatures of problematic wheel flats. Data were collected using a single 3-axis accelerometer mounted on the fish plate of the rail. The wavelet transform was optimized to detect transient impulses in the vibration signal, while the WVT provided a time-frequency map to localize and confirm the presence of wheel-flat defects. The analysis demonstrated that bursts of energy in the low-frequency range (<5 Hz) corresponded to wheel-flat impacts, which were successfully localized to specific wheels and bogies of the Kolongpar Express. However, the WVT’s sensitivity to cross-frequency components posed challenges, requiring careful preprocessing to avoid artifacts. The study highlighted the computational intensity of the WVT as a limitation, particularly for real-time applications, but emphasized its effectiveness in confirming defect signatures detected by the wavelet transform.
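The crest-factor check applied to the reconstructed components in [46] can be sketched as follows. The example assumes a component that has already been resampled over one wheel revolution and splits it into equal angular sectors. The 12-sector split, the sample count, and the synthetic signal are illustrative assumptions, and the MSSA decomposition itself is not shown.

```python
import math

def crest_factor(samples):
    """Peak absolute value divided by the RMS of the samples."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return max(abs(x) for x in samples) / rms

def sector_crest_factors(signal, n_sectors):
    """Crest factor of each equal-length angular sector of one revolution."""
    size = len(signal) // n_sectors
    return [crest_factor(signal[i * size:(i + 1) * size])
            for i in range(n_sectors)]

# Hypothetical component over one revolution (360 samples, 1 per degree)
# with a single impact-like spike at 200 degrees.
revolution = [math.sin(2 * math.pi * n / 30) for n in range(360)]
revolution[200] += 6.0

factors = sector_crest_factors(revolution, 12)   # 12 sectors of 30 degrees
suspect = max(range(len(factors)), key=factors.__getitem__)
```

Here `suspect` is the index of the sector whose crest factor stands out from the others, which corresponds to the angular position of the synthetic impact.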
The study in [
48] presented a hybrid methodology for tram wheel-flat detection, combining micro-electro-mechanical systems-based (MEMS) vibration sensors with maximal overlap discrete wavelet packet transform (MODWPT) and energy-based classification. Acceleration data were recorded at a sampling rate of 1 kHz using MEMS accelerometers mounted on rails. MODWPT was applied to decompose the vibration signals into frequency bands, isolating energy changes associated with wheel faults. The weighted difference (DW) function was employed to determine the most diagnostic frequency band, with the range 418–422 Hz identified as highly effective for distinguishing normal and faulty wheels. Energy analysis in this frequency band revealed a 35% difference between healthy and defective wheels, establishing a clear threshold for defect detection. This approach demonstrated the potential for reliable classification of wheel conditions, even in challenging environments like tram depots with variable speeds (2–7 m/s). However, the method’s reliance on predefined parameters, including wavelet type and decomposition levels, highlighted the need for site-specific calibration. While the computational demands of MODWPT were manageable, the system required careful tuning to adapt to diverse operational scenarios.
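The band-energy comparison can be illustrated with a simplified stand-in. The sketch below measures energy in the 418–422 Hz band with a plain discrete Fourier transform rather than MODWPT, so it is not the method of [48]; the synthetic tones, amplitudes, and function names are all assumptions chosen to make the arithmetic easy to follow.

```python
import cmath
import math

def band_energy(signal, fs, f_lo, f_hi):
    """Sum of squared DFT magnitudes for bins with f_lo <= f <= f_hi."""
    n = len(signal)
    energy = 0.0
    for k in range(n // 2 + 1):
        if f_lo <= k * fs / n <= f_hi:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                        for i, x in enumerate(signal))
            energy += abs(coeff) ** 2
    return energy

FS = 1000  # sampling rate in Hz, as reported for the MEMS accelerometers

def tone(amplitude, freq_hz, n=FS):
    """One second of a sine tone sampled at FS."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / FS)
            for i in range(n)]

# Hypothetical records: both contain a 50 Hz component; the faulty one
# has a 20% larger amplitude at 420 Hz.
healthy = [a + b for a, b in zip(tone(1.0, 50), tone(1.0, 420))]
faulty = [a + b for a, b in zip(tone(1.0, 50), tone(1.2, 420))]

e_healthy = band_energy(healthy, FS, 418, 422)
e_faulty = band_energy(faulty, FS, 418, 422)
relative_difference = (e_faulty - e_healthy) / e_healthy
```

Because energy scales with the square of amplitude, the 20% amplitude increase yields a relative energy difference of about 44% in the band. A practical system would compare such a value against a threshold calibrated for the site.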
The approach detailed in [
49] utilizes acceleration measurements to enable the early identification of wheel-flat defects. Accelerometers installed along the rail capture vibration signals, which are then analyzed using continuous wavelet transform (CWT) to extract features sensitive to damage. To reduce the influence of environmental and operational variations and improve the consistency of the data, the researchers applied principal component analysis (PCA) to these features. The processed features were subsequently evaluated using the Mahalanobis distance to calculate a damage indicator, which quantifies the deviation of the wheel’s condition from the established baseline. The detection process compares this damage indicator to a statistically derived confidence boundary, calculated using a Gaussian inverse cumulative distribution function. If the damage indicator exceeds the confidence boundary, the wheel is classified as defective. This approach demonstrated high sensitivity, capable of detecting small defects often overlooked by time-domain methods. While the study utilized multiple sensors, the researchers noted that a single accelerometer could achieve comparable performance, potentially reducing system complexity and cost. Ref. [
50] proposed an unsupervised learning methodology for wheel-flat detection, utilizing acceleration signals collected from sensors installed along the rail. The methodology comprised four main steps: feature extraction, feature normalization, data fusion, and outlier analysis. Feature extraction techniques, including auto-regressive (AR), auto-regressive with exogenous input (ARX), PCA, and CWT, were applied to derive damage-sensitive features from the vibration signals. PCA was also used for feature normalization to reduce the effects of environmental and operational variability. Data fusion, based on the Mahalanobis distance (MD), combined features from multiple sensors to enhance sensitivity to wheel flats. The damage indicator (DI), computed from the fused data, was compared against a statistical confidence boundary to classify wheels as healthy or defective. The study highlighted AR and ARX as the most effective feature extraction methods, demonstrating higher sensitivity compared to PCA and CWT. While the methodology achieved reliable defect detection even with a minimal sensor configuration, the computational demands of the data fusion process were identified as a limitation for real-time, large-scale deployment. This approach shares conceptual similarities with the methodology in [
49], particularly in its use of PCA for normalization, Mahalanobis distance for data fusion, and reliance on a DI and statistical confidence boundary for classification. However, [
50] offers a broader evaluation of feature extraction methods, emphasizing the effectiveness of AR and ARX techniques.
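The shared core of [49] and [50], a Mahalanobis-distance damage indicator compared against a boundary from the Gaussian inverse cumulative distribution function, can be sketched as follows. The sketch assumes only two features per passage, hypothetical baseline values, and a 99% confidence level; none of these come from the cited studies, and the feature extraction and PCA normalization steps are omitted.

```python
import math
from statistics import NormalDist, mean, stdev

def fit_baseline(features):
    """Mean vector and sample covariance of 2-feature baseline vectors."""
    n = len(features)
    mu = (sum(f[0] for f in features) / n, sum(f[1] for f in features) / n)

    def cov(i, j):
        return sum((f[i] - mu[i]) * (f[j] - mu[j]) for f in features) / (n - 1)

    return mu, ((cov(0, 0), cov(0, 1)), (cov(1, 0), cov(1, 1)))

def mahalanobis(x, mu, cov):
    """Mahalanobis distance of x from mu, with the 2x2 inverse written out."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    d2 = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(max(d2, 0.0))

# Hypothetical feature vectors from passages of healthy wheels.
baseline = [(1.0, 2.0), (1.2, 2.1), (0.9, 1.8), (1.1, 2.3),
            (0.8, 1.9), (1.0, 2.2), (1.3, 2.0), (0.7, 2.1)]
mu, cov = fit_baseline(baseline)

# Damage indicators of the baseline define the confidence boundary.
baseline_di = [mahalanobis(f, mu, cov) for f in baseline]
boundary = NormalDist(mean(baseline_di), stdev(baseline_di)).inv_cdf(0.99)

def is_defective(feature_vector):
    """Flag a passage whose damage indicator exceeds the boundary."""
    return mahalanobis(feature_vector, mu, cov) > boundary
```

For example, `is_defective((2.0, 3.0))` flags a passage far from the baseline cloud, while a vector equal to the baseline mean is classified as healthy. With more features or sensors, the covariance inverse would normally be computed with a linear algebra library rather than by hand.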
In [
51], an unsupervised learning approach for wayside detection, localization, and severity classification of wheel flats using trackside vibration measurements was proposed. The method utilized eight accelerometers mounted along the rail, with dynamic responses processed through CWT for feature extraction, followed by PCA for dimensionality reduction. Mahalanobis distance was then applied to fuse these extracted features into a unified damage indicator, which made it possible to distinguish between healthy and defective wheels. A hidden Markov model (HMM) was employed for automatic segmentation of signals, facilitating precise localization of wheel flats for both single and multiple defect scenarios. The severity classification stage involved a sparse autoencoder (SAE)-based feature reconstruction, with subsequent k-means clustering categorizing defect severity into three distinct levels: low, moderate, and severe. The proposed approach demonstrated a detection accuracy of 100% in simulated scenarios; however, its practical implementation requires experimental validation under real operating conditions.
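The final clustering step can be illustrated with a minimal one-dimensional k-means. The sketch assumes that each defective wheel has already been reduced to a single scalar severity feature, which is a simplification: in [51] the clustering operates on SAE-reconstructed features. The values, the initialization rule, and the function name are hypothetical.

```python
def kmeans_1d(values, centers, n_iter=20):
    """Plain k-means on scalars, starting from the given centers."""
    centers = list(centers)

    def nearest(v):
        return min(range(len(centers)), key=lambda i: abs(v - centers[i]))

    for _ in range(n_iter):
        groups = [[] for _ in centers]
        for v in values:
            groups[nearest(v)].append(v)
        # Keep the old center if a cluster received no points.
        centers = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centers)]
    return centers, [nearest(v) for v in values]

# Hypothetical scalar severity features for nine defective wheels.
features = [0.9, 1.1, 1.0, 4.8, 5.2, 5.0, 9.7, 10.3, 10.0]
ordered = sorted(features)
initial = [ordered[0], ordered[len(ordered) // 2], ordered[-1]]

centers, labels = kmeans_1d(features, initial)

# Name the clusters by the rank of their centers.
rank = sorted(range(len(centers)), key=lambda i: centers[i])
names = {rank[0]: "low", rank[1]: "moderate", rank[2]: "severe"}
severity = [names[label] for label in labels]
```

The deterministic initialization (minimum, median, maximum) is a convenience for this example; k-means results generally depend on the starting centers.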
Envelope spectrum analysis was applied in [
52] to detect wheel flats by isolating periodic impacts in vibration signals. Using accelerometers mounted between sleepers, the approach employed a kurtogram to pinpoint the frequency band with the highest kurtosis, highlighting the most impulsive behavior in the signal. This frequency band was then demodulated and passed through a low-pass filter to derive an analytical signal, allowing the calculation of its spectral envelope. Unequal responses among sensors served as a clear indicator of defects, signaling localized impacts caused by wheel flats. The analysis effectively identified defective wheels by examining amplitude variations in the envelope spectrum. For instance, a 20 mm flat produced a maximum amplitude of approximately 2 m/s², which increased to 12 m/s² for an 80 mm flat and 25 m/s² for a 140 mm flat. Despite its high detection accuracy, the methodology required precise filter tuning and optimal sensor placement to ensure consistent performance across varying operational conditions. Another example, described in [
53], employed Hilbert transform-based (HT) envelope analysis to isolate transient impacts in rail vibration signals. Piezoelectric accelerometers were positioned on the rail above sleepers and spaced to ensure comprehensive signal coverage over the track. The methodology focused on processing acceleration signals to extract the envelope of the vibration signals, capturing the cyclic impulsive patterns caused by wheel flats. To enhance defect detection accuracy, the acceleration signals were filtered using a band-pass filter designed to match the resonance frequency of the rail system. The filtered signals were then subjected to demodulation, allowing the extraction of their envelope for further analysis. A significant diagnostic criterion was the amplitude of the envelope, with higher peaks corresponding to the presence of wheel flats. The study demonstrated that trams with wheel flats exhibited envelope amplitude peaks exceeding 35 m/s², compared to significantly lower amplitudes for defect-free trams. However, the method’s reliance on precise filter tuning and sensor calibration introduced challenges in noisy environments and operational variability.
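The envelope extraction step can be sketched with a Hilbert transform built from a plain discrete Fourier transform. This is a minimal illustration on a synthetic amplitude-modulated tone, not the processing chain of [52] or [53]: the band-pass filtering stage is omitted, the O(n²) transform is only suitable for short records, and all names and signal parameters are assumptions.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive discrete Fourier transform (or its inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n)
               for m in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out

def envelope(signal):
    """Magnitude of the analytic signal (Hilbert-transform envelope)."""
    n = len(signal)
    spectrum = dft(signal)
    # Keep DC (and Nyquist for even n), double the positive
    # frequencies, and zero the negative frequencies.
    gain = [0.0] * n
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
    for k in range(1, (n + 1) // 2):
        gain[k] = 2.0
    analytic = dft([s * g for s, g in zip(spectrum, gain)], inverse=True)
    return [abs(v) for v in analytic]

# Hypothetical record: a carrier (16 cycles) whose amplitude is
# modulated by a slow component (2 cycles), as repeated impacts would do.
N = 128
record = [(1 + 0.5 * math.cos(2 * math.pi * 2 * n / N))
          * math.cos(2 * math.pi * 16 * n / N) for n in range(N)]
env = envelope(record)
```

For this synthetic record the envelope recovers the slow modulation, ranging from 0.5 to 1.5, and its peak amplitude is the kind of quantity used as a diagnostic criterion above.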
The study in [
54] proposed a methodology for detecting wheel flats using data collected from four unidirectional vertical accelerometers installed mid-span between sleepers on each rail. The sensors were strategically positioned to optimize sensitivity to dynamic responses induced by wheel defects. FFT preprocessing was employed to transform time-domain acceleration signals into the frequency domain, allowing the identification of excitation frequencies linked to flat-induced impacts. The acceleration signals were sampled at 10 kHz, ensuring sufficient resolution for detecting subtle vibration features. The core methodology utilized a stacked sparse autoencoder (SSAE) for feature extraction and dimensionality reduction. This process compressed the data into a bottleneck layer, isolating features that best represented wheel-flat-induced vibrations. These features were subsequently fused using the Mahalanobis distance to compute a damage index for each case, distinguishing between healthy and defective wheels. This method was able to relate DI to the wheel-flat size range as shown in
Figure 7. The system achieved a classification accuracy of 90% when using two accelerometers (one on each rail), with results slightly lower (85%) when all four sensors were considered. While the methodology performed well under controlled conditions, it exhibited sensitivity to noise and variability in train speed, highlighting the importance of robust preprocessing and proper sensor placement.
The work presented by the authors of [
55] describes the development and validation of a trackside system for detecting wheel defects. It combines physics-based modeling and machine learning techniques. Physical models were utilized to optimize sensor placement and ensure the interpretability of results, while machine learning algorithms were employed to efficiently adjust correction parameters. The methodology centered on features extracted from acceleration measurements, particularly the root-mean-square value of the acceleration signal band-pass filtered around the first pinned–pinned frequency, as a key indicator of impact severity. The study accounted for the influence of speed and impact position on acceleration features, applying correction functions derived from numerical models to standardize the data and estimate defect severity independently of operational conditions. Machine learning further enabled stable predictions of peak contact forces caused by wheel flats, achieving a theoretical RMSE of less than 5%. Validation within a real-world metro track environment demonstrated the system’s potential, although the methodology requires tuning for specific track configurations, representing a key limitation.
In [
16], advanced acoustic signal analysis using Fourier and Hilbert transforms was applied in pass-by tests for detecting wheel flats. Acoustic signals were captured using dedicated trackside microphones and analyzed to discern impulse noises caused by flat spots on the wheels. This method highlighted the role of time-domain and frequency-domain analysis in isolating defect-related signals from background noise. While effective in detecting localized defects, the approach required precise calibration of sensor placement and environmental noise suppression to ensure accurate results, particularly in high-traffic railway settings.
The study in [
56] introduced a psychoacoustic framework for wayside wheel-flat detection and classification. Using microphones to capture acoustic emissions during train passings, researchers calculated psychoacoustic metrics such as loudness and sharpness to characterize wheel defects. This approach integrated perceptual considerations into the detection process, offering insights into both technical and human-audible aspects of the noise. While the method showed promise in accurately identifying defects and their perceived annoyance levels, its dependence on subjective auditory assessments posed challenges for standardization in broader applications.
The method presented in [
57] relies on acoustic emission (AE) techniques to monitor and detect wheel-flat defects under complex wheel–rail rolling interactions. By employing a feature fusion strategy—specifically, the Improved Synthesized Health Index (ISHI), which combines multiple discriminative features from AE signals—alongside a time-adaptive thresholding mechanism, researchers achieved improved detection rates and reduced interference from rolling noise. The transducers were positioned near the rails to capture AE signals with high sensitivity, while advanced feature extraction and fusion enhanced the reliability of the classification process. This approach demonstrated strong accuracy in controlled environments but faced challenges in noisy operational settings where additional acoustic events could affect the results.
The methodology presented in [
58] introduced an attentive feature selection sequential (AFSS) framework for processing acoustic emission signals in railway wheel defect detection. The approach integrated Mel-frequency and Gammatone cepstral coefficients for feature extraction with a bidirectional gated recurrent unit (biGRU) network architecture. Two AE sensors, mounted on the rail web, captured signals at a 2 MHz sampling rate. The system demonstrated exceptional performance, with a false alarm rate of 0.0022 ± 0.0066 and superior noise resistance compared to state-of-the-art methods. While implementation required increased computational resources during the training phase, the method effectively handled imbalanced datasets through ROC curve-based threshold rescaling. However, the detection capability for spalling defects remained inferior to wheel-flat identification, indicating areas for future improvement.
Ref. [
59] describes a combined methodology for detecting wheel flats using vibration and acoustic signal processing in a wayside monitoring system. Vibration signals were collected with track-mounted accelerometers, while acoustic data were analyzed using linear configuration pattern kurtograms (LCP-K). Signal segmentation isolated periodic elements, and feature extraction techniques, including wavelet packet energy (WPE) and time-domain features (TDF), were employed. Classification utilized machine learning algorithms, such as decision trees (DT), support vector machines (SVM), and Fisher’s linear discriminant analysis (FLDA), achieving detection accuracies of up to 100% for vibration-based features. Adaptive synthetic sampling (ADASYN) was used to address class imbalance, enhancing the dataset for classifier training. Advantages of this approach include the integration of multimodal data, which improves defect detection reliability and sensitivity to subtle changes in signal patterns. However, the method’s complexity increases with the use of multiple sensors, and the computational demands for processing high-dimensional data may pose challenges for real-time applications.
Wayside monitoring systems employing these methods offer a scalable alternative to onboard solutions, enabling comprehensive defect assessment for entire rail networks. These systems reduce maintenance burdens by eliminating the need for vehicle-mounted sensors, enhancing safety and reliability. The different approaches to wayside signal measurement are shown in
Figure 8. However, challenges remain in achieving real-time processing, improving computational efficiency, and ensuring adaptability to diverse operational environments. Future advancements in signal processing, sensor technology, and artificial intelligence will further enhance the capabilities of these systems, paving the way for improved railway safety and operational reliability. A summary of the wayside methods is presented in
Table 3.
Analysis of wheel-flat detection methodologies reveals substantive patterns across signal modalities, processing techniques, and validation approaches. Vibration signals constitute the predominant sensing modality, employed in 13 studies compared to 3 acoustic and 2 acoustic emission investigations. This distribution reflects the established reliability and implementation accessibility of vibration monitoring for railway applications.
Figure 9 shows the classification of signals into methods.
Time-frequency analysis methods [44, 45, 47, 52, 53] represent a significant portion of vibration-based approaches, effectively leveraging both temporal and spectral characteristics for defect identification. Machine learning with feature extraction constitutes the most prevalent methodological framework across all signal domains [50, 51, 54, 55, 56, 59], indicating a progressive shift toward automated pattern recognition in wheel defect diagnosis.
Validation strategies demonstrate methodological maturation, with simulation studies providing theoretical foundations, test rigs offering controlled experimental environments, and field tests confirming operational viability. This progression reflects the development cycle typical in diagnostic technology evolution. Notably, studies employing machine learning techniques frequently incorporate field validation [
56,
59], suggesting practical implementation potential.
Acoustic emission signal processing predominantly employs either signal decomposition [
57] or deep learning approaches [
58], reflecting the inherent complexity of stress wave analysis in railway environments with significant background disturbances. The superior noise resistance demonstrated by advanced techniques such as transformer models with feature-level fusion [
39] (achieving 0.0069 mm average error, 0.0985 mm maximum error) contrasts with traditional signal processing methods that, while computationally efficient, exhibit reduced robustness to interference.
Methods utilizing spectral kurtosis [
45] or envelope spectrum analysis [
52] demonstrate particular effectiveness for extracting impulsive components from vibration signals, while Hilbert transform approaches [
53] excel at identifying wheel flats through envelope amplitude analysis. MEMS-based sensors with wavelet packet decomposition [
48] offer cost-effective solutions for depot conditions, successfully discriminating faulty wheels within specific frequency bands (418–422 Hz).
The relative underrepresentation of deep learning applications [
39,
58] suggests an emerging research direction with significant potential for enhancing detection accuracy and robustness, particularly in operational environments characterized by variable speeds, track conditions, and background noise levels.
The heatmap (
Figure 10) reveals distinct validation patterns across wheel-flat detection methodologies. Field testing predominates with 9 studies, primarily employing time-frequency analysis and machine learning with feature extraction for vibration signals. Simulation environments (8 studies) show similar methodological distribution, focusing on vibration-based techniques. Test rig validation (4 studies) appears exclusively with deep learning approaches and acoustic emission signals, indicating that these represent emerging technologies at earlier development stages. Machine learning with feature extraction demonstrates the most balanced validation distribution, suggesting methodological maturity. The absence of field-validated deep learning applications highlights a significant implementation gap, while acoustic emission signals remain exclusively validated in controlled test environments, reflecting their greater sensitivity to operational conditions. This distribution illustrates a clear technological progression pathway where established techniques have completed the full validation cycle while emerging approaches remain in preliminary validation phases.
5. Discussion
5.1. The Importance of Real-World Validation in Wheel-Flat Detection Research
While the previous sections focused on advancements in signal processing and machine learning for wheel-flat detection, a critical barrier to real-world application remains largely unresolved: the absence of comprehensive, real-world validation. Many detection methods demonstrate high accuracy in controlled environments but lack evidence of their performance under operational railway conditions. Numerous studies rely on simulations, test rigs, or artificially induced wheel flats. Su et al. [
36] and Shaikh et al. [
23], for instance, reported accuracies of 98.6% and 99%, respectively, using machine learning models trained on simulated or lab-generated datasets. Yet, these methods have not been tested in real operational scenarios where environmental factors, track irregularities, and rolling stock diversity introduce considerable complexity. The authors of [
38] similarly demonstrated high accuracy using a hybrid CNN-LSTM model but acknowledged significant computational limitations for real-time deployment in field settings. Even recent contributions leveraging physical setups fall short of full-scale validation. Researchers in [
39] tested transformer and LSTM-based models on a 1:10 scale railway rig, which showed promising accuracy. However, the authors emphasized that scaled environments—despite being useful for prototyping—fail to replicate the complexity of high-speed train dynamics, contact mechanics, and structural loads seen in real operations. Similarly, [
32] demonstrated the potential of transfer learning and Gramian angular field encoding in a laboratory setup but explicitly noted the need for validation using naturally worn wheel flats in real conditions.
Simulation studies further dominate recent literature. Shim et al. [
42] generated wheel-flat signals via multibody simulation (SIMPACK), later combining them with recorded urban rail noise. While this hybrid approach improves realism, it still lacks validation against true field data. In [
43], an analysis of vibration detectability in a freight wagon showed that signal strength diminishes significantly between the axle box, bogie, and carbody. While insightful, this study—based on simulation—demonstrates that sensor placement strongly influences detectability yet fails to address how these effects play out under unpredictable field conditions.
There are rare examples of partial validation. Zhou et al. [
30] applied ADSA using both simulation and field test data from a freight vehicle. Although this hybrid approach adds credibility, it was limited to a single test scenario and cannot be generalized across multiple speeds, rail profiles, and vehicle types. Wang et al. [
33] specifically called for future testing with “real high-speed trains” to verify torsional vibration methods. Likewise, Chen et al. [
1] emphasized that their high-resolution ACMD-based detection method must be validated “on operational railway vehicles across diverse conditions” to ensure robustness.
This persistent validation gap underscores the need for industry–academic collaboration. Developing effective detection systems requires access to operational vehicles, exposure to naturally occurring defects, and testing across a wide range of scenarios. Without this, algorithms remain vulnerable to overfitting to idealized conditions. Furthermore, mounting locations, sensor types, and data transmission constraints can all affect practical deployment viability—issues rarely addressed in simulation studies.
Bridging this gap will also require the railway industry to actively support joint validation efforts. Standardized testing protocols and shared operational datasets would enable reproducible benchmarking across detection methods. Given the safety-critical nature of wheel defects—capable of causing derailments and structural degradation—such initiatives are not merely academic, but essential for responsible system deployment.
In conclusion, while simulation-based research continues to fuel innovation, its benefits cannot be fully realized without rigorous field validation. This transition from controlled proof-of-concept to robust, field-ready systems is imperative. Only through real-world testing and industry alignment can wheel-flat detection technologies mature into reliable tools that improve safety, reduce costs, and support predictive maintenance strategies across modern rail networks.
5.2. Comparative Analysis of Onboard and Wayside Detection Methods
Analysis of wayside and onboard methods allows for clear identification of areas of application by operators. The selection of the appropriate technology should always be preceded by an assessment of the operating context, the scale of the rolling stock, the available infrastructure and the objectives to be achieved.
Wayside methods currently form the basis for wheel condition diagnostics in large-scale railway networks with multiple operators. Installed in strategic locations, they allow the condition of all vehicles passing through a given point to be monitored, regardless of the carrier or type of rolling stock. They allow even serious defects that could lead to damage to the track infrastructure or pose a safety risk to be detected effectively with just one sensor [
45]. Furthermore, these solutions are relatively cost-effective from the perspective of the infrastructure manager, as the installation cost is spread over a very large number of monitored axles and can be limited to just one sensor. However, it should be noted that the effectiveness of wayside detection decreases at low speeds and in the case of single defects that are difficult to detect. In addition, the limited number of measuring points may mean that a defect occurring between measuring stations remains undetected for some time.
Onboard methods, using vibration and noise sensors installed directly on the vehicle, allow for continuous real-time monitoring of the condition of the wheels, regardless of location and operating conditions. Due to their greater ability to integrate with other signals from the vehicle—e.g., using a tachometer to correlate vibrations with wheel speed—they enable more accurate detection of flat spots [
30]. This is particularly important for operators who prioritize high safety standards and are ready to respond quickly to developing defects. The main barriers here are the installation cost and the need to integrate new functionalities with existing condition-based maintenance (CBM) systems. In practice, it makes the most economic sense to reuse onboard sensors that already monitor bearings, transmissions, or engines; however, this requires advanced filtering algorithms that can distinguish between signals from different types of damage. Another advantage of the onboard method is the ability to detect flat spots even at very low driving speeds.
In both approaches, the key challenge remains not so much the detection of the defect itself but the precise assessment of its size. This is because wheel turning, the only available repair method, shortens the wheel’s service life. In practice, operators and infrastructure managers therefore seek to reduce the number of unnecessary relocations, which requires reliable classification of flat spots according to their size. Vibroacoustic methods, both onboard and trackside, allow the level of damage to be estimated, although the accuracy of this classification is still the subject of research and development. The comparison of key factors for wayside/onboard systems implementation is shown in
Table 4.
When balancing early detection of flat spots against rolling stock costs, an additional factor is the regulations that define the largest flat spot permitted before a unit of rolling stock must be withdrawn from service for repair. These regulations vary, particularly for railways. In the United States, federal regulations for railways allow a wheel with a single flat spot ≤ 2.5 inches (≈64 mm) in length or with two adjacent flat spots ≤ 2 inches each (≈51 mm). In Sweden, repair is only required for flat spots with a length of 60 mm or more; this is similar to the rules in Australia (a summary is presented in
Table 5). Onboard systems capable of detecting very small flat spots, in the order of 10 mm, may not be practical for operators. Simpler and cheaper systems with an accuracy that allows flat spots to be detected only when they approach a size that requires withdrawal from service will be more advantageous.
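The practical effect of these limits can be made concrete with a short sketch. It encodes only the figures quoted above, read in a deliberately simplified way (a single flat per wheel, length only), and should not be taken as a statement of the regulations themselves; the function and constant names are hypothetical.

```python
INCH_MM = 25.4  # millimetres per inch

US_SINGLE_FLAT_LIMIT_MM = 2.5 * INCH_MM   # about 64 mm, single flat
SWEDEN_REPAIR_FROM_MM = 60.0              # repair required at 60 mm or more

def exceeds_us_single_limit(length_mm):
    """True if a single flat is longer than the quoted US limit."""
    return length_mm > US_SINGLE_FLAT_LIMIT_MM

def requires_repair_sweden(length_mm):
    """True if a flat reaches the quoted Swedish repair length."""
    return length_mm >= SWEDEN_REPAIR_FROM_MM

# Hypothetical measured flat lengths in millimetres.
for length in (10.0, 62.0, 70.0):
    print(length, exceeds_us_single_limit(length), requires_repair_sweden(length))
```

Under this simplified reading, a 10 mm flat triggers neither rule, a 62 mm flat triggers only the Swedish one, and a 70 mm flat triggers both. A detector tuned to resolve 10 mm flats therefore reports defects well below the sizes at which either rule requires action.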
Thus, it can be concluded that the accuracy of flat spot prediction alone is not a key factor in assessing the suitability of use by rail fleet operators. If there is no need for continuous monitoring of the defect, simpler wayside systems, whose sensitivity will enable the detection of a flat spot before it reaches a size that would exclude the vehicle from further operation under the regulations, will suffice. However, if the operator wishes to monitor its rolling stock continuously, regardless of its presence in depots or service stations, it is worth considering systems that predict the formation of flat spots from an onboard position, where an economical solution would be to integrate existing solutions that monitor other vehicle components, such as bearings.
5.3. Role of Machine Learning and Deep Learning in Enhancing Detection Accuracy
The integration of machine learning and deep learning approaches has significantly advanced wheel-flat detection capabilities compared to traditional signal processing methods. Recent research demonstrates that ML/DL techniques achieve superior classification accuracy while reducing false positives in both onboard and wayside monitoring systems. In particular, hybrid approaches combining ML feature extraction with classical classifiers have yielded remarkable results, with random forest classifiers achieving up to 99% detection accuracy when paired with multi-layer perceptron feature extractors [
23]. Similarly, data-driven methods incorporating k-Nearest Neighbor have demonstrated 98.6% accuracy in detecting wheel flats across diverse operational conditions [
36].
The distinctive advantage of ML/DL methods lies in their capacity to automatically learn complex, non-linear relationships within vibration and acoustic signals without requiring extensive domain expertise for feature engineering. This capability proves particularly valuable when addressing variable-speed conditions and environmental noise, which traditionally pose significant challenges for conventional signal processing techniques. The transformer model with feature-level sensor fusion, for instance, achieves an average error as low as 0.0069 mm (5.30% error) in wheel-flat depth estimation, substantially outperforming traditional methods in handling data from multiple sensors [
39]. Similarly, the application of Activated Time-Domain Images with deep neural networks demonstrates remarkable robustness to speed variations, maintaining detection accuracy above 98% across speeds ranging from 30 to 90 km/h [
37]. Additionally, [
51] illustrates these advantages by integrating unsupervised ML techniques, enabling effective multi-sensor data fusion and robust classification of wheel-flat severity. Despite achieving high accuracy in simulated scenarios, the study similarly highlights the critical need for experimental validation under real operational conditions.
Despite these advances, ML/DL approaches face several significant limitations. Computational demands represent a primary constraint, particularly for onboard systems where real-time processing capability is essential. Complex architectures like CNN-LSTM models [
38] and Frequency-Domain Gramian Angular Field [
32] encoders require substantial processing resources, potentially limiting their practical implementation without specialized hardware acceleration. Additionally, these methods often necessitate extensive training datasets, presenting challenges for operators with limited historical fault data. While techniques such as Adaptive Synthetic Sampling [
59] and transfer learning [
32] help address this limitation, they introduce additional complexity to the implementation process.
The trade-off between computational efficiency and detection accuracy presents a fundamental challenge in developing practical wheel-flat detection systems. Simpler ML models like decision trees offer computational advantages but may sacrifice some accuracy and robustness compared to more complex DL architectures. This balance is particularly important when considering real-time applications, where processing latency directly impacts the system’s ability to promptly identify defects. A promising approach to address this trade-off involves implementing feature-level fusion strategies that combine multiple sensor inputs before classification, achieving higher accuracy (up to 100% with vibration data [
59]) while maintaining reasonable computational demands.
Engineering considerations for multi-sensor fusion in practical deployments are not limited to model design but also include system-level integration constraints:
Time synchronization: Clock drift, timestamp jitter, and alignment errors; sub-millisecond consistency may be required for impact localization.
Sampling-rate mismatch: Resampling and anti-aliasing are needed to unify the time base prior to fusion.
Heterogeneous modalities: Vibration vs. acoustic vs. AE require normalization and consistent feature scaling.
Missing data/dropouts: Fusion should support confidence scoring and fallback modes (e.g., decision-level fusion when one channel fails).
Computational efficiency: Feature-level fusion is often preferable to raw-signal fusion for edge devices due to lower bandwidth and compute.
Latency budget: Detection window length and buffering must be matched to actionable maintenance needs and communication constraints.
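The sampling-rate and normalization items above can be illustrated with a minimal sketch. It assumes two already time-aligned channels (vibration and acoustic) recorded at different rates, upsamples the slower one by linear interpolation, and builds a small fused feature vector from z-scored signals. The function names and the choice of peak value as the feature are illustrative assumptions, not the method of any cited study; downsampling instead would require a low-pass (anti-aliasing) filter first, which is not shown.

```python
import math

def resample_linear(samples, src_rate_hz, dst_rate_hz):
    """Linearly interpolate `samples` onto a grid sampled at dst_rate_hz.

    Intended for upsampling only; downsampling needs anti-alias filtering.
    """
    if len(samples) < 2:
        return list(samples)
    duration_s = (len(samples) - 1) / src_rate_hz
    n_out = int(math.floor(duration_s * dst_rate_hz + 1e-9)) + 1
    out = []
    for k in range(n_out):
        pos = k * src_rate_hz / dst_rate_hz      # fractional index in source
        i = min(int(pos), len(samples) - 2)      # left neighbour
        frac = pos - i
        out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
    return out

def zscore(x):
    """Scale a channel to zero mean and unit variance (heterogeneous units)."""
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    return [(v - mean) / std if std > 0 else 0.0 for v in x]

def fused_features(vib, vib_rate_hz, ac, ac_rate_hz):
    """Feature-level fusion: one normalized peak value per channel."""
    target_hz = max(vib_rate_hz, ac_rate_hz)     # unify on the faster time base
    vib_r = resample_linear(vib, vib_rate_hz, target_hz)
    ac_r = resample_linear(ac, ac_rate_hz, target_hz)
    n = min(len(vib_r), len(ac_r))               # common window length
    vib_n, ac_n = zscore(vib_r[:n]), zscore(ac_r[:n])
    return [max(abs(v) for v in vib_n), max(abs(v) for v in ac_n)]
```

Only the two-element feature vector, rather than the raw signals, would need to be passed to a classifier or transmitted from an edge node, which is the bandwidth argument made in the list above.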
While ML/DL methods demonstrate impressive performance in controlled environments and simulations, a significant research gap remains in validating these approaches under diverse real-world operating conditions. The transition from laboratory validation to operational implementation requires addressing challenges related to sensor robustness, environmental interference, and long-term reliability. Nevertheless, the remarkable advances in ML/DL techniques, particularly those utilizing signal transformation and multi-sensor fusion, indicate a promising direction for developing more accurate, reliable, and efficient wheel-flat detection systems for railway applications.
5.4. Challenges in Data Acquisition and Quality
The effectiveness of wheel-flat detection systems fundamentally depends on data quality, which is challenged by multiple factors such as sensor placement, environmental noise, and varying operating conditions. Notably, suboptimal sensor positioning can significantly degrade signal quality; for instance, transitioning from wheelset to bogie-mounted sensors resulted in a 17.3 dB reduction in detectability indicators, underscoring the importance of placement strategies [
43]. Environmental influences—particularly ambient acoustic noise and track-induced vibration—can obscure defect signatures and hinder the reliability of acoustic-based detection methods [
16]. Moreover, variations in train speed alter the frequency content of impact signals [
45,
59], while the rarity of wheel-flat instances introduces significant dataset imbalance issues, which researchers have addressed using data augmentation methods such as ADASYN [
59]. Among signal processing strategies for wheel-flat assessment, data-driven estimation methods—such as Kriging surrogate modeling (KSM) combined with particle swarm optimization—have demonstrated improved accuracy, with relative errors remaining below 6% in simulations and within ±10.5% in field experiments. In contrast, traditional single-peak estimation methods exhibited errors up to −15.65%, indicating the potential of model-based approaches to improve robustness under variable and noisy operating conditions [
35].
Advanced filtering and demodulation techniques have also been employed to isolate defect-related information in noisy railway environments. Spectral kurtosis and kurtogram-based methods identify the most impulsive frequency bands without requiring prior historical data [
45], while Hilbert transforms and band-pass filtering near resonance frequencies (e.g., 8 kHz ± 500 Hz) have been used to isolate impact energy in acoustic signals [
16]. Standardization of sensor layouts—particularly through symmetric and multi-sensor configurations—has further improved detection coverage and consistency [
45]. Normalization strategies have improved detection robustness under varying operating conditions. Feature-level fusion of vibration data from onboard and trackside sensors has significantly enhanced accuracy, reducing the maximum detection error from 1.9879 mm to 0.0985 mm [
39], while model-based correction functions have been applied to normalize signal features across different values of speed, load, and impact position [
14]. Multi-sensor approaches have also demonstrated high performance, with classifiers trained on combined vibration and acoustic signals achieving detection accuracies of up to 100% in both measured and augmented datasets [
59].
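Kurtosis-based indicators work because periodic wheel-flat impacts make a signal more impulsive than smooth rolling vibration. The sketch below computes plain time-domain kurtosis on a synthetic signal; it is a simplified illustration only, since the spectral kurtosis and kurtogram methods cited above evaluate this quantity per frequency band, and the signal parameters used here are arbitrary assumptions.

```python
import math

def kurtosis(x):
    """Non-excess kurtosis m4 / m2^2 (1.5 for a sine, 3 for Gaussian noise)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (m2 * m2) if m2 > 0 else 0.0

# Synthetic example: 20 full periods of a sine as "healthy" rolling vibration.
healthy = [math.sin(2 * math.pi * k / 50) for k in range(1000)]

# Same signal with a short impact added once every 250 samples
# (a stand-in for one impact per wheel revolution).
flatted = [v + (5.0 if k % 250 == 0 else 0.0) for k, v in enumerate(healthy)]

k_healthy = kurtosis(healthy)   # close to 1.5
k_flatted = kurtosis(flatted)   # clearly higher because of the impacts
```

In a field setting the same comparison would be made after band-pass filtering, because broadband noise otherwise dilutes the contribution of the impacts.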
Real-time wheel-flat detection in railway sensor networks is often constrained by limited compute, power, and communication bandwidth, particularly for onboard and remote wayside nodes. Practical implementations therefore favor (i) compact feature extraction pipelines, (ii) lightweight classifiers (e.g., linear models, trees, small CNNs), and (iii) model compression (quantization/pruning) when deep models are required. The compute–accuracy trade-off becomes critical under higher speeds and higher sampling rates, where latency constraints may necessitate streaming inference, early-exit decision logic, or two-stage screening (fast anomaly detection followed by a slower severity estimator). In addition, deployment robustness requires handling drift (seasonal changes, rail wear, sensor aging) via periodic recalibration and monitoring of data quality indicators.
Table 6 summarizes common interference sources in field measurements and practical mitigation strategies, along with typical limitations that affect real-world deployment.
5.5. Implications for Energy Efficiency and Maintenance
The operational impact of wheel flats extends beyond detection challenges to fundamental efficiency losses and maintenance considerations. Dynamic simulations demonstrate that wheel flats induce significant vertical wheel–rail impact forces that increase progressively with defect size, with each additional 10 mm of flat adding ≈15 kN of impact force, the energy of which must be dissipated. Xu et al. [
34] quantified that wheel flats generate vertical forces approximately 80 kN higher in turnout zones compared to interval rails, with these impacts concentrated in the 550–800 Hz frequency band that accelerates infrastructure fatigue. Because that 80 kN spike recurs at every wheel rotation through the frog, the cumulative work lost to vertical impacts can exceed 0.5 MJ per kilometer for a single defective axle. This creates a destructive cycle where damaged components increase rolling resistance for all traffic while promoting further degradation.
Directly quantifying the share of total traction energy attributable specifically to wheel-flat impacts is challenging, because the additional losses depend on operating and infrastructure conditions, including speed, axle load, defect geometry, track irregularity level, and traction control strategy. Nevertheless, quantitative evidence indicates that increased dynamic excitation in the vehicle–track system leads to higher dissipative energy associated with vibration and friction. For example, multibody simulations under different track irregularity classes reported higher total dissipated energy on rougher tracks, with an increase of 28% between selected classes, and the wheel–rail contact contributing the largest proportion of the energy loss, at 51–57% [
63]. Wheel flats represent a localized and repeatable impact-like excitation at the wheel–rail interface. Accordingly, these findings support the plausibility of an energy-penalty mechanism mediated by increased contact-related dissipation, while the wheel-flat-specific magnitude still requires dedicated traction-energy measurements or coupled vehicle–track–traction simulations under clearly defined boundary conditions.
Part of the wasted mechanical input is also radiated as sound. From an acoustic perspective, Komorski et al. [
16] measured sound pressure level (SPL) differences of 8.9–13.6 dB between wheels with flats and healthy wheels at various harmonic frequencies, equivalent to roughly an 8- to 23-fold increase in acoustic power (about a three- to five-fold increase in sound pressure amplitude) and a non-recoverable diversion of drive energy. These acoustic emissions indicate substantial mechanical energy losses that directly impact traction efficiency and energy consumption.
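The decibel differences quoted above can be converted to linear ratios with the standard level definitions. The short sketch below does this for the two ends of the reported range; the helper names are illustrative, and the only inputs are the 8.9 dB and 13.6 dB values from the text.

```python
def db_to_power_ratio(delta_db):
    """Level difference in dB -> ratio of acoustic power (10*log10 rule)."""
    return 10 ** (delta_db / 10)

def db_to_pressure_ratio(delta_db):
    """Level difference in dB -> ratio of sound pressure amplitude (20*log10 rule)."""
    return 10 ** (delta_db / 20)

for delta in (8.9, 13.6):
    print(delta, "dB:",
          round(db_to_power_ratio(delta), 1), "x power,",
          round(db_to_pressure_ratio(delta), 1), "x pressure")
```

The distinction matters when acoustic measurements are used as a proxy for energy loss: power scales with the square of pressure amplitude, so the same level difference gives a much larger power ratio than pressure ratio.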
The safety implications are equally critical. As noted by [
35], wheel flats may cause shaft temperatures to rise sharply, potentially leading to hot axle failures. Elevated axle temperatures also raise rolling-bearing friction, increasing locomotive power demand by an estimated 1–2% while a defect is propagating. Additionally, the large impact vibrations can cause existing fatigue cracks in axles and wheel treads to expand rapidly, creating risks of catastrophic cold axle failures. Bernal et al. [
43] demonstrated that even small flats generate impact forces that propagate through vehicle structures, with wheelset acceleration peaks increasing systematically with both flat size and vehicle speed; damping of those high-frequency vibrations adds 40–60 W of parasitic thermal loss per bogie for a moderate (20 mm) flat.
Early detection enables timely intervention that preserves both vehicle and infrastructure integrity. Mosleh et al. [
49] emphasized that wheel defects can damage railway tracks, considerably increasing maintenance costs for both railway administrations and rolling stock operators. Their research demonstrated that a single sensor can detect defective wheels automatically, reliably distinguishing healthy wheels from defects in the 5–20 mm range, well before impact-energy losses become excessive and long before the traction-efficiency penalty becomes significant.
The economic justification for comprehensive wheel-flat detection systems is compelling. From the perspective of safety, economy, and maintenance strategy, early detection of wheel flats and the determination of their length has significant benefits [
35]. Undetected flats progress from minor defects to severe damage that compromises both vehicle components and track infrastructure. The ability to detect extremely small flats—such as those causing only about 2% increase in peak wheel-rail force [
55]—enables preventive maintenance before defects reach critical dimensions.
Multiple studies confirm that condition-based maintenance approaches are more cost-effective than traditional periodic inspections or reactive maintenance strategies [
59]. By enabling targeted interventions based on actual wheel condition rather than fixed schedules, detection systems optimize maintenance resource allocation while maximizing component service life. For railways positioning themselves as sustainable transport options, minimizing energy waste through preventive maintenance becomes essential. Wheel-flat detection thus represents not merely a safety measure but a fundamental efficiency optimization, preserving the low energy intensity that makes rail transport environmentally and economically competitive.
5.6. Integration of Wheel-Flat Detection with Predictive Maintenance Frameworks
The energy efficiency and infrastructure benefits outlined above can only be captured through systematic integration of detection technologies into CBM frameworks. Recent research illuminates both the technical requirements and implementation challenges of this integration.
CBM architectures for wheel-flat management require three integrated layers. The data acquisition layer processes continuous sensor streams, while analytical algorithms extract and quantify defect signatures. The critical decision layer translates these outputs into maintenance actions—generating work orders and scheduling interventions based on severity thresholds [
36]. Shim et al. [
42] demonstrated that success depends on this end-to-end integration rather than detection capability alone.
Technical challenges emerge in scaling from demonstration to operational deployment. Processing fleet-wide data streams necessitates hierarchical architectures that balance computational efficiency with detection reliability. Research shows that tiered approaches—rapid screening followed by detailed analysis of suspected defects—enable real-time monitoring without prohibitive infrastructure requirements [
42].
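The tiered idea described above, rapid screening followed by detailed analysis of suspected defects only, can be sketched as follows. This is a minimal illustration under stated assumptions: the crest factor is used as a stand-in for the fast screen, `slow_estimate` is a hypothetical placeholder for whatever expensive severity model an operator uses, and the threshold value is arbitrary.

```python
import math

def crest_factor(window):
    """Cheap impulsiveness score: peak amplitude divided by RMS."""
    rms = math.sqrt(sum(v * v for v in window) / len(window))
    return max(abs(v) for v in window) / rms if rms > 0 else 0.0

def screen_then_estimate(windows, fast_score, slow_estimate, screen_threshold):
    """Run the cheap score on every window; run the costly estimator
    only on windows whose score reaches the screening threshold."""
    results = []
    for idx, window in enumerate(windows):
        score = fast_score(window)
        if score < screen_threshold:
            results.append((idx, score, None))            # passed screening
        else:
            results.append((idx, score, slow_estimate(window)))
    return results

# Usage with a hypothetical placeholder for the detailed severity model.
def placeholder_severity(window):
    return max(abs(v) for v in window)

windows = [[1.0, -1.0, 1.0, -1.0], [0.0, 0.0, 0.0, 8.0]]
report = screen_then_estimate(windows, crest_factor, placeholder_severity, 1.5)
```

Because the costly estimator runs only on flagged windows, compute scales with the number of suspected defects rather than with the total volume of fleet data.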
Recent advances in industrial IoT provide crucial enabling technologies for railway CBM. Fifth-generation wireless networks, as specified in [
64], demonstrate sub-millisecond latency through ultra-reliable low-latency communications with 99.999% reliability, supporting up to 1 million devices per km², enabling comprehensive trackside sensor coverage. Edge computing platforms, with NVIDIA’s Jetson Orin series documented in Jetson Solutions Book [
65] as an example, deliver 20–275 TOPS AI performance, processing 1000+ vibrations per second locally with 2–5 W power consumption. Aviation sector implementations demonstrate substantial operational improvements with CBM systems, providing proven frameworks achieving significant reduction in unplanned downtime through integrated vehicle monitoring systems [
66]. Railway-specific digital twin implementations demonstrate great potential across different approaches. Erdozain et al. [
55] developed a wayside detection system combining finite element modeling with machine learning, achieving stable force prediction with RMSE below 5% in field validation. At the infrastructure level, Sresakoolchai and Kaewunruen achieved a 21% reduction in maintenance activities and a 68% reduction in defect occurrence using Advantage Actor-Critic algorithms validated over 4 years across 30 km of track [
67]. These complementary approaches—component-level detection and network-wide optimization—illustrate how digital twins can enhance CBM at multiple scales. Beyond wheel-focused diagnostics, infrastructure-level condition monitoring can be strengthened by integrating rail non-destructive evaluation outputs into the same CBM and digital twin pipeline (an example of such integration is shown in
Figure 11). In particular, in situ residual-stress assessment using non-destructive methods such as X-ray diffraction, ultrasonic, and magnetic or electromagnetic measurements provides complementary rail-integrity indicators that can be fused with wayside or onboard wheel-flat detection to support more robust, system-level predictive maintenance [
68]. The integration architecture leverages Multi-Access Edge Computing to reduce processing latency to sub-10 ms while achieving 62% reduction in energy consumption through optimized scheduling [
69].
Another aspect to be considered is optimal intervention thresholds. While studies demonstrate detection of 20–30 mm flats, limited research addresses how defect progression rates vary with operational conditions, making it difficult to optimize intervention timing. Ye et al. [
35] provided quantitative length measurements but acknowledged uncertainty in translating static measurements to remaining useful life predictions.
The integration of flat spot detection with other systems is in line with the eMaintenance trend, which has become increasingly popular in the railway industry in recent years [
70,
71,
72,
73]. An example of this is the use of RFID tag reading and its transfer to cloud systems (as shown in
Figure 12). It would be possible to synchronize data from the flat spot measurement system with vehicle data, which would enable the creation of a measurement–vehicle pair with the simultaneous transfer of important data such as speed.
Translating detection outputs into actionable maintenance decisions requires a severity- and confidence-aware decision logic rather than a single alarm flag. In practice, the detection module should output at least a calibrated severity indicator, such as estimated flat length or an equivalent damage index, together with an uncertainty or confidence score, and contextual metadata including speed, axle load proxy, and track segment identification. These outputs can then be mapped to maintenance-relevant action bands, for example, “monitor,” “inspect at next depot,” or “reprofile within a defined mileage or time window,” using risk-based thresholds that balance false alarms against the cost of missed detections. Because operational conditions and background excitation vary, thresholds should be adaptive or stratified by operating regime, and they should be periodically recalibrated using inspection feedback. Finally, a closed-loop workflow is required, in which confirmed maintenance outcomes are logged and fed back to the analytics layer to update calibration, monitor drift, and improve decision reliability over time.
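A minimal sketch of such severity- and confidence-aware decision logic is given below. The action band names follow the examples in the text; the field names, the 20 mm and 40 mm thresholds, and the 0.7 confidence floor are hypothetical values chosen only for illustration and would in practice be set from the applicable regulations and recalibrated from inspection feedback.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    flat_length_mm: float   # calibrated severity indicator
    confidence: float       # 0..1 confidence score from the detector
    speed_kmh: float        # contextual metadata
    segment_id: str         # track segment identification

def action_band(det, monitor_mm=20.0, reprofile_mm=40.0, min_confidence=0.7):
    """Map a detection to a maintenance action band (illustrative thresholds)."""
    if det.flat_length_mm < monitor_mm:
        return "monitor"
    if det.confidence < min_confidence:
        # Large estimate but uncertain: confirm manually before reprofiling.
        return "inspect at next depot"
    if det.flat_length_mm >= reprofile_mm:
        return "reprofile within a defined mileage or time window"
    return "inspect at next depot"

# Usage: a confident 45 mm estimate falls in the reprofiling band.
decision = action_band(Detection(45.0, 0.9, 80.0, "segment-A"))
```

Routing low-confidence large estimates to inspection rather than directly to reprofiling reflects the trade-off noted above between false alarms and missed detections; stratifying the thresholds by operating regime would replace the fixed defaults with a lookup keyed on speed or load.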
Organizational factors receive insufficient attention in the technical literature. While researchers emphasize detection accuracy, few studies examine how maintenance organizations adapt to continuous condition data. The shift from scheduled tasks to dynamic interventions requires new competencies and decision frameworks that remain largely unexplored in wheel-flat detection research.
Current implementations suggest that CBM integration remains in early stages, with most studies focusing on detection technology rather than complete maintenance systems. This gap between technical capability and operational implementation represents a critical area for future research, particularly in developing decision support tools that translate detection outputs into optimal maintenance strategies.
5.7. Future Research Directions and Emerging Technologies
The critical gap between laboratory demonstrations and operational deployment demands urgent attention. Despite high detection accuracies, the absence of real-world validation with naturally occurring wheel flats—acknowledged across studies [
23,
39]—remains the primary barrier to implementation.
Emerging solutions address specific constraints. Edge computing architectures proposed by Shaikh et al. [
23] could resolve computational bottlenecks through distributed processing, enabling real-time fleet monitoring. Future digital twin implementations could leverage the comprehensive dynamics models already developed for wheel-flat analysis. While Erdozain et al. [
55] demonstrated successful digital twin implementation for wayside detection, extending this approach to network-wide monitoring presents opportunities for comprehensive asset management. This approach demonstrates how physical models can be combined with machine learning for operational deployment. Similar foundations exist in the multibody dynamics frameworks with detailed vehicle-track interaction modeling [
1,
43]. These validated models could form the core of predictive digital twins that combine real-time sensor data with physics-based simulation to optimize intervention timing—addressing a key CBM limitation.
Advanced architectures show particular promise. Transformer models with sensor fusion achieved an average error of 0.0069 mm [
39], yet scaling to operational systems remains challenging. Federated learning could accelerate this transition by enabling cross-operator collaboration without sharing sensitive data, addressing the validation barrier that limits current research.
Building on the CBM infrastructure advances discussed earlier, emerging technologies offer solutions to specific implementation barriers. Network slicing capabilities [
74] enable simultaneous support for high-bandwidth visualization, medium-latency sensor networks, and ultra-low latency control systems on shared infrastructure—allowing phased deployment without dedicated networks. For the critical challenge of limited fault data, [
75] demonstrates that autoencoders achieve >90% accuracy with minimal training samples, while multi-working condition VAE systems maintain 92–96% accuracy across varying operational conditions. Energy harvesting advances beyond traditional batteries, with [
76] reporting hybrid triboelectric-piezoelectric systems achieving 139.39 μW RMS power with broadband frequency response suitable for railway vibrations. The use of this solution would enable the creation of an autonomous vibration processing system (simple schematics shown in
Figure 13). Looking toward future capabilities, [
77] describes elimination of central server requirements through peer-to-peer learning architectures, while physics-informed neural networks reviewed in [
78], already proven in the aerospace industry, integrate mechanical principles with data-driven models, which would allow improved generalization with limited railway-specific training data. The progression toward 6G networks, as outlined in emerging communications research, promises integrated sensing and communication capabilities with native AI support in network infrastructure.
Cost-effective MEMS sensors with energy harvesting and standardized IoT frameworks could enable network-wide deployment. However, as demonstrated throughout this review, the persistent gap between technical capabilities and operational implementation suggests that technology alone is insufficient—coordinated industry efforts for validation protocols and organizational transformation are equally critical.
Future research must bridge technical innovation with practical implementation, developing not only accurate detection algorithms but also complete systems addressing the real-world constraints of scalability, reliability, and operational integration.