1. Introduction
The rail sector plays a crucial role in modern transportation systems, providing an efficient, reliable, and sustainable means of moving passengers and freight over long distances. To ensure the safety and efficiency of rail operations, it is vital to prioritize the integrity of railway infrastructure, particularly the wheels and tracks. Defects in the wheels, like flats, can significantly degrade vehicle–track interaction, resulting in premature component wear, increased vibration levels, and reduced ride comfort. Therefore, early detection and classification of wheel defects are critical for enabling timely maintenance actions and preventing potential derailments and service disruptions.
Conventionally, railway operators have relied on regular manual inspections; however, advances in sensing technologies have led to the development of more comprehensive condition monitoring systems. Two primary monitoring strategies are currently employed: wayside and onboard monitoring. Wayside systems [1], which involve sensors positioned along the track, allow continuous monitoring of all trains passing through a given location, with relatively low maintenance costs and no need for train retrofitting. In contrast, onboard systems, which use sensors placed on the train itself [2], provide detailed, real-time information about the condition of that train. However, these systems are generally more expensive to install and maintain, and each installation monitors only the train on which it is mounted.
Both wayside and onboard monitoring systems are increasingly benefiting from the introduction of Artificial Intelligence (AI), which is revolutionizing the way rail networks ensure safety, efficiency, and reliability [
3]. AI-based systems can process and analyze vast amounts of sensor data in real time, enabling immediate decision-making and action, while helping to maintain system integrity with minimal disruption.
Several studies in the literature report the application of machine learning (ML) methods to the identification of out-of-roundness (OOR) wheel defects, in particular wheel flats. This type of methodology usually combines feature extraction from acceleration or strain measurements with a classifier that identifies the defects [
4]. Common feature extraction techniques include Auto-Regressive (AR) models [
5] and wavelet transforms [
6]. For feature classification, recent studies have increasingly applied machine learning methods to distinguish between healthy and defective wheels [
7,
8]. Typically, two different approaches are used: (i) supervised methods, where the algorithm learns from labeled training data [
9,
10,
11,
12], and (ii) unsupervised methods, where the model is trained using data without labeled outputs [
4]. Partitional and density-based clustering methods, such as k-means [13] and DBSCAN [14], are widely used for classification and pattern identification in complex datasets. While supervised learning typically offers higher accuracy due to the availability of labeled data, it requires significant effort for data annotation. In contrast, unsupervised learning avoids labeling costs and is more scalable, though often at the expense of classification accuracy and interpretability.
Advances in computing power have driven increased use of deep learning techniques for complex pattern recognition tasks. In particular, Convolutional Neural Networks (CNNs) have been widely adopted for classification and image-based applications [
15,
16] and have recently been applied to wheel failure detection using wheel tread images [
17,
18,
19].
Xing et al. [
19] developed a machine vision method based on an enhanced YOLOv3 framework, incorporating convolution layers for wheel flat detection. Trilla et al. [
18] developed a wayside system based on an image-processing technique using CNN to detect and diagnose tread defects, including wheel flats. Zhang et al. [
17] proposed a deep learning algorithm to improve upon existing YOLO models by incorporating optimized convolutional layers for detecting small defects. Liao et al. [
20] presented an effective application of evolutionary optimization to balance multiple performance metrics.
Overall, the integration of AI techniques into wayside systems has contributed to enhancing the accuracy of wheel flat detection. In the studies by Mosleh et al. [
21] and Mohammadi et al. [
5], which employed unsupervised methodologies based on outlier analysis, no misclassifications were observed in the simulated dataset using only one pair of accelerometers installed on the rail. Ghiasi et al. [
14], using a multistage clustering framework (M-CLUSTER) for unsupervised monitoring, achieved 98% accuracy in detecting train wheel flats using accelerometers. Despite the effectiveness of ML-based methods in damage detection, only a few can classify wheel flat severity, and those that do are not very efficient. Jorge et al. [
22] employed a sparse autoencoder and
k-means clustering technique to classify wheel flats. Subsequently, Mohammadi et al. [
23] extended this work by localizing the defects and classifying the length of flats in trains with multiple defective wheels. In another study, Guo et al. [
24] developed an image-based inspection technique for monitoring wheel tread defects, such as flats, using SVM and a supervised approach for defect classification.
While previous studies have successfully implemented machine learning techniques for wheel flat detection, most methods focus on either unsupervised clustering approaches or supervised classification techniques, often requiring extensive labeled datasets or complex feature extraction processes. For example, Mosleh et al. [
25] used supervised classification on acceleration signals, while Ghiasi et al. [
14] and Mosleh et al. [
4] applied unsupervised clustering. Despite their effectiveness, several critical gaps remain. Few studies provide a direct and quantitative comparison between supervised and unsupervised approaches under identical operational conditions, feature sets, and sensor layouts, making it difficult to objectively assess their relative suitability for wayside deployment. Moreover, the majority of reported systems are oriented toward binary defect detection, while explicit multi-class classification of wheel-flat severity, which is essential for maintenance prioritization, is rarely addressed. Most methods are validated under specific speeds, loads, or environmental conditions, leaving uncertainty about performance across variable railway operations. Furthermore, many approaches rely on complex feature extraction, data fusion, or extensive sensor arrays, limiting practical implementation in real-world systems.
Table 1 provides a quantitative benchmarking of representative studies and reported approaches for wheel-flat monitoring. The comparison summarizes key implementation aspects, including the number and type of sensors, investigated speed range, minimum reported detectable flat length (when available), and the methodological capability (defect detection versus severity classification). This overview allows the proposed framework to be positioned with respect to existing state-of-the-art methodologies and typical wayside monitoring configurations.
The benchmarking indicates that most reported approaches rely on multi-sensor configurations and are mainly oriented toward wheel flat detection rather than explicit severity classification. Moreover, a large portion of the existing studies are validated under limited operational conditions, such as restricted speed ranges or controlled loading scenarios, and in several cases focus on relatively large wheel flat sizes. In contrast, the proposed framework demonstrates reliable performance using a reduced number of rail-mounted accelerometers, supports multi-class wheel-flat severity classification, and maintains stable performance at higher vehicle speed (200 km/h). These characteristics indicate a favorable balance between classification capability, sensor economy, and practical applicability for wayside condition monitoring systems.
To address the gaps identified above and evidenced through the benchmarking in
Table 1, this study presents a systematic comparison between supervised (SVM) and unsupervised (
k-means clustering) approaches for wheel-flat identification using identical datasets, features, and sensor configurations. In addition to defect detection, the investigation focuses on multi-class severity classification, which is essential for maintenance prioritization and is rarely addressed in existing wayside systems. Furthermore, a simplified supervised framework is adopted that removes feature normalization and data fusion steps, aiming to reduce computational complexity and enable real-time implementation with a limited number of sensors.
Specifically, the main contributions of this study are:
A comparative analysis of supervised and unsupervised methods to assess their strengths and weaknesses in wheel flat identification, at train speeds of 120 and 200 km/h;
A sensitivity analysis on sensor placement to optimize the number of sensors while maintaining accuracy;
An implementation-oriented simplified supervised approach that eliminates the need for feature normalization and data fusion, reducing computational complexity while maintaining robust classification performance.
3. Numerical Modeling
This section introduces the virtual track-side monitoring system and presents numerical simulations of train-track interactions, wheel profiles, and track profiles. The numerical simulations of dynamic interactions between trains and tracks are conducted using the in-house software Vehicle–Structure Interaction (VSI), enabling the generation of synthetic measurement responses [
47]. Developed in MATLAB version R2018a [
48], the VSI tool integrates the structural matrices of the track and vehicle, which are initially modeled independently using a finite element (FE) package. Although these subsystems are modeled separately, VSI employs a fully coupled approach to integrate them. The train-to-track coupling is achieved through a 3D wheel-rail contact model based on Hertzian theory [
49], with normal and tangential contact forces including rolling friction creep calculated via the USETAB routine [
50].
The track is modeled through a finite element environment implemented in ANSYS release 19.2 [
51], in which the rail, fasteners/pad, sleeper, ballast, and foundation components are represented through a layered modeling approach that defines their interactions within the track system. The rails are simulated using BEAM181 elements, and the sleepers are also modeled using the same element type, with the rail–sleeper connection defined through rail pads and fastening systems that link the two components. The mechanical behavior of the rail pads, ballast, and foundation is introduced through linear spring-damper elements modeled using COMBIN14, which are defined in the longitudinal, lateral, and vertical directions to represent the force-displacement relations between the rail-sleeper assembly and the supporting layers. The mass associated with the ballast is incorporated through concentrated mass elements modeled using MASS21, which are connected to the sleeper nodes to account for the inertial contribution of the ballast within the numerical model. The coupling between the different layers of the track system is enabled by the COMBIN14 elements, which define the interaction between the rail, sleepers, ballast, and foundation. The mechanical properties assigned to the rail, sleepers, rail pads, ballast, and foundation are summarized in
Table 2, which provides the mechanical parameters used in the numerical simulations.
The railway vehicle considered in this study corresponds to an Alfa Pendular train (Portuguese passenger train) composed of six wagons and is numerically simulated using ANSYS [
51]. The kinematic configuration of the vehicle is defined using rigid beam finite elements, which establish the geometric connectivity between the main structural components without introducing structural flexibility. The car body, bogies, and wheelsets are connected through these rigid elements to reproduce the overall vehicle layout. The inertial properties of the vehicle components are introduced through concentrated mass elements (MASS21) located at the respective centers of gravity of the car body, bogies, and wheelsets. These mass elements account for both translational mass and rotary inertia, allowing the representation of roll, pitch, and yaw motions of each component. The primary and secondary suspension systems are modeled using linear spring–damper elements (COMBIN14), which connect the wheelsets to the bogies and the bogies to the car body, respectively. The suspension elements are defined independently in the longitudinal, lateral, and vertical directions, allowing the directional stiffness and damping characteristics of the suspensions to be applied. The model is therefore parameterized by the stiffness, damping, concentrated mass, and rotational inertia of the car body, bogies, and wheelsets. The geometric configuration of the vehicle is described through a set of characteristic longitudinal, transversal, and vertical distances defining the relative positioning of the car body, bogies, and wheelsets. The wheel–rail interface geometry is further characterized by the track gauge and the nominal wheel rolling radius. The values considered for each parameter in the present study are summarized in
Table 3, which provides the mechanical and geometric parameters used in the numerical simulations.
A comprehensive description of the numerical scheme of the train and track components is available in the study performed by Mosleh et al. [
36]. A schematic of the numerical simulation for the train–track interaction system is represented in
Figure 2a.
The proposed simulated wayside monitoring system’s sensor placement is illustrated in
Figure 2b. The wheel flat identification setup consists of eight accelerometers, located at the midspan of the right and left rails. Measurement points 1 through 4 correspond to the sensors on the right rail (opposite side of the defective wheel), while measurement points 5 through 8 are located on the left rail (side of the damage). Adjacent sensors are spaced 0.6 m apart.
From the literature [
59], it is possible to consider two main shapes for wheel flat geometries. A newly formed wheel flat has sharp edges and appears immediately after the defect is created (
Figure 3a). During operation, continued wheel rolling and tread wear gradually smooth these edges; therefore, the ends of the flat become rounded over time, while the middle part of the flat remains largely unchanged (
Figure 3b).
Different intervals of wheel flat lengths are considered across previous studies on wheel flat identification [
21,
60,
61]. This study simulates defective train wheels by considering two levels of wheel flat severity, classified as Damage 1 and Damage 2, based on the flat length. To consider the variability of wheel flats observed in operation, the flat length is modeled as a random variable following a uniform distribution, denoted by $U(a, b)$. Following this notation, the flat length is randomly generated between the lower boundary $a$ and the upper boundary $b$, and all values within this interval are equally likely to occur. Accordingly, the flat lengths corresponding to Damage 1 range between 10 and 25 mm and are distributed as $U(10, 25)$, while the flat lengths corresponding to Damage 2 range between 28 and 50 mm and are distributed as $U(28, 50)$. The depth of the wheel flat is determined using Equation (11), which involves the radius of the wheel.
Additionally, the vertical profile of a wheel flat is determined using Equation (12), in which the spatial coordinate is aligned with the track’s longitudinal direction and H denotes the periodic Heaviside function.
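The random generation of flat lengths described above can be sketched as follows. This is a minimal illustration, not the simulation code used in the study: the function name `sample_flat_lengths`, the number of samples per class, and the fixed seed are hypothetical choices; only the two intervals (10 to 25 mm and 28 to 50 mm) come from the text.

```python
import random

# Flat-length intervals in millimetres, taken from the text.
SEVERITY_RANGES_MM = {"Damage 1": (10.0, 25.0), "Damage 2": (28.0, 50.0)}


def sample_flat_lengths(n_per_class, seed=0):
    """Draw n_per_class flat lengths per severity class from U(lower, upper)."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    return {
        label: [rng.uniform(lower, upper) for _ in range(n_per_class)]
        for label, (lower, upper) in SEVERITY_RANGES_MM.items()
    }


# Hypothetical sample count, chosen only for illustration.
lengths = sample_flat_lengths(n_per_class=50)
```

The 3 mm gap between the two intervals (25 to 28 mm) means that no simulated flat sits exactly on the boundary between the two severity classes.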
Because real rails are never perfectly smooth, track irregularities, although minor, significantly affect wheel–rail contact forces, and it is therefore crucial to account for them even in numerical simulations. Artificial rail unevenness profiles in both the vertical and transverse directions are created in MATLAB [48] from generated power spectral density (PSD) curves [62]. In accordance with the European Standard EN 13848-2 [63], the rail unevenness profiles are generated for spatial wavelengths covering the D1 (3 m to 25 m) and D2 (25 m to 70 m) ranges. Since the wavelengths used to construct the unevenness profile are relatively long in comparison with wheel flats, the frequencies excited by track unevenness are considerably lower than the dominant frequencies of wheel flat impacts.
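The frequency argument can be made concrete with a short worked calculation. An irregularity of spatial wavelength $\lambda$ traversed at speed $v$ excites the temporal frequency $f = v/\lambda$; the numbers below simply apply this relation to the two train speeds and the D1/D2 wavelength limits quoted in the text, and are not results reported by the authors.

```latex
f = \frac{v}{\lambda}, \qquad
v_{120} = \frac{120}{3.6} \approx 33.3\ \text{m/s}, \qquad
v_{200} = \frac{200}{3.6} \approx 55.6\ \text{m/s}

\begin{array}{lcc}
\lambda & f \text{ at } 120\ \text{km/h} & f \text{ at } 200\ \text{km/h} \\
3\ \text{m}  & \approx 11.1\ \text{Hz} & \approx 18.5\ \text{Hz} \\
25\ \text{m} & \approx 1.33\ \text{Hz} & \approx 2.22\ \text{Hz} \\
70\ \text{m} & \approx 0.48\ \text{Hz} & \approx 0.79\ \text{Hz}
\end{array}
```

Under these assumptions, the unevenness content stays below roughly 20 Hz even at 200 km/h, which is consistent with the statement that it lies well below the dominant frequencies of wheel flat impacts.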
6. Wheel Flat Identification Application: Results and Discussion
This section presents a comprehensive analysis of wheel flat identification using both supervised and unsupervised machine learning approaches. A comparative study between the SVM method and
k-means clustering is conducted to evaluate their respective strengths and limitations. Additionally, a sensitivity analysis on sensor placement is performed to optimize the number of sensors required while ensuring high classification accuracy. Furthermore, a simplified preprocessing framework is introduced for supervised classification, which eliminates the need for feature normalization and data fusion. This method significantly reduces computational complexity while maintaining reliable classification performance, offering a more efficient solution for wheel flat identification. To address the need for a clear and reproducible description of the entire workflow,
Table 5 summarizes all preprocessing and classification stages together with their governing parameters and outputs. This unified representation complements
Figure 1 by providing an explicit parameter-level view of the pipeline. All parameters reported in
Table 5 are fully defined and described in the corresponding sections of the manuscript.
6.1. Damage Detection
The process of wheel flat damage detection involves four key steps: feature extraction, feature normalization, data fusion, and outlier analysis, as illustrated in
Figure 1. These techniques have been extensively studied in previous works by the authors [
5,
21] and are briefly elaborated in this section. The methodology ensures accurate detection of wheel flat damage while accounting for various operational conditions and train speeds.
Feature extraction is the first step in damage detection, converting raw acceleration data from wayside sensors into damage-sensitive indicators. Time-series data are collected from eight accelerometers installed at mid-span positions on both rails. For each recorded passage, 40 statistical features per sensor are extracted using an Auto-Regressive model of order 40, forming a structured three-dimensional dataset of size 250 × 40 × 8 (passages × features × sensors).
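As an illustration of the feature extraction step, the sketch below fits an AR model to one signal and returns its coefficients as a feature vector. The text does not state which estimator is used, so the Yule-Walker equations solved by the Levinson-Durbin recursion are an assumption here, and the synthetic AR(2) signal stands in for a measured acceleration record; all function names are hypothetical.

```python
import random


def autocovariance(x, max_lag):
    """Biased sample autocovariance r[0..max_lag] of the mean-removed signal."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    return [sum(xc[t] * xc[t + k] for t in range(n - k)) / n
            for k in range(max_lag + 1)]


def ar_coefficients(x, order):
    """Yule-Walker AR fit via the Levinson-Durbin recursion.

    Returns a[0..order-1] for the model x[t] = sum_k a[k] * x[t-1-k] + e[t].
    """
    r = autocovariance(x, order)
    a = []        # coefficients of the current model order
    err = r[0]    # prediction error variance
    for m in range(1, order + 1):
        acc = r[m] - sum(a[k] * r[m - 1 - k] for k in range(m - 1))
        refl = acc / err  # reflection coefficient of order m
        a = [a[k] - refl * a[m - 2 - k] for k in range(m - 1)] + [refl]
        err *= 1.0 - refl * refl
    return a


# Synthetic stand-in for one sensor record: an AR(2) process with known coefficients.
rng = random.Random(0)
signal = [0.0, 0.0]
for _ in range(20000):
    signal.append(0.6 * signal[-1] - 0.3 * signal[-2] + rng.gauss(0.0, 1.0))

# 40 AR features for one sensor and one passage, as in the 250 x 40 x 8 dataset.
features = ar_coefficients(signal, order=40)
```

Repeating this for every passage and every sensor would fill the three-dimensional feature array described above.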
Figure 5a and
Figure 6a display the extracted AR model features at sensor position 3 for train speeds of 120 km/h and 200 km/h, respectively. The results show clear ascending and descending patterns for damaged scenarios, with noticeable differences in amplitude. However, at higher speeds (200 km/h), the distinction between baseline and low-severity damage becomes less evident due to increased environmental and operational variations. This highlights the necessity of feature normalization to enhance robustness. Data normalization is a crucial step for mitigating the influence of environmental and operational effects on the extracted features. As illustrated in
Figure 4, distinguishing between defective and healthy wheels directly from time-series signals is a challenging task. This difficulty arises primarily from variations induced by environmental and operational conditions, which introduce additional variability into the feature space. To reduce the impact of these effects and enhance damage sensitivity, the extracted features must be appropriately modeled. In this study, feature normalization is carried out using a latent-variable approach based on Principal Component Analysis (PCA). By applying PCA to the dataset, external variations are effectively suppressed, allowing damage-sensitive features to remain dominant and improving the robustness of wheel flat identification. A feature matrix of size 8 × 40 is constructed for each passage, and the first two principal components (accounting for over 80% of the variance) are removed to enhance sensitivity to damage.
Figure 5b and
Figure 6b illustrate the impact of PCA-based feature normalization, showing a significant reduction in variability caused by environmental and operational conditions. However, despite this improvement, the separation between damaged and undamaged wheels remains ambiguous, necessitating an additional data fusion step.
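The idea of PCA-based normalization, discarding the leading principal components that mainly carry environmental and operational variability, can be sketched as below. This is a simplified illustration under stated assumptions: rows are taken to be passages and columns to be features of one sensor, the PCA is fitted on the rows supplied, the principal directions are found by plain power iteration with deflation, and the four-feature synthetic data are invented for the example.

```python
import random


def center(rows):
    """Subtract the column means from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def leading_eigenvector(mat, iters=500):
    """Dominant eigenvector of a symmetric matrix by power iteration."""
    d = len(mat)
    v = [float(j + 1) for j in range(d)]  # arbitrary starting vector
    norm = sum(c * c for c in v) ** 0.5
    v = [c / norm for c in v]
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(c * c for c in w) ** 0.5
        if norm == 0.0:
            break
        v = [c / norm for c in w]
    return v


def remove_leading_components(rows, n_remove=2):
    """Project out the n_remove principal directions with the largest variance."""
    x = center(rows)
    for _ in range(n_remove):
        v = leading_eigenvector(covariance(x))
        scores = [sum(r[j] * v[j] for j in range(len(v))) for r in x]
        x = [[r[j] - s * v[j] for j in range(len(v))] for r, s in zip(x, scores)]
    return x


def total_variance(rows):
    """Sum of the per-column sample variances."""
    c = center(rows)
    return sum(sum(r[j] ** 2 for r in c) / (len(c) - 1) for j in range(len(c[0])))


# Synthetic features: two strong "operational" directions plus weak noise.
rng = random.Random(1)
dir1, dir2 = [0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]
features = []
for _ in range(200):
    g1, g2 = rng.gauss(0, 5), rng.gauss(0, 2)
    features.append([g1 * dir1[j] + g2 * dir2[j] + rng.gauss(0, 0.1) for j in range(4)])

normalized = remove_leading_components(features, n_remove=2)
```

In this toy case almost all of the variance sits in the two removed directions, so what remains is dominated by the small residual term, which is the role the damage-sensitive content plays in the text.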
To further enhance damage detection accuracy, the Mahalanobis distance (MD) is employed as a damage index (DI). The MD calculation consolidates the 40 PCA-normalized features of each sensor into a single indicator that quantifies the deviation of a passage from the baseline condition. The output is an 8 × 250 matrix, where each entry represents the MD-based damage index for a specific sensor and train passage.
Figure 5c and
Figure 6c illustrate the effectiveness of this approach, providing a clearer visual separation between baseline and damaged scenarios. While this step successfully captures variations between damaged and undamaged wheels, the differences in MD amplitudes across sensors necessitate further validation through confidence boundary analysis to ensure consistent and reliable damage detection.
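A minimal sketch of a Mahalanobis-distance damage index is given below. It assumes that the mean vector and covariance matrix are estimated from baseline passages only and that the covariance matrix is invertible; the three-feature synthetic data and all function names are hypothetical.

```python
import math
import random


def mean_vector(rows):
    """Column-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def covariance_matrix(rows, mean):
    """Sample covariance matrix of the feature vectors."""
    n, d = len(rows), len(mean)
    return [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def mahalanobis(x, mean, cov):
    """Mahalanobis distance of x from the distribution described by mean and cov."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    y = solve(cov, diff)  # y = cov^{-1} @ diff without forming the inverse
    return math.sqrt(sum(d * yi for d, yi in zip(diff, y)))


# Synthetic baseline features (three features per passage, for illustration only).
rng = random.Random(0)
baseline = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(100)]
mu = mean_vector(baseline)
cov = covariance_matrix(baseline, mu)

di_healthy = mahalanobis([0.1, -0.2, 0.0], mu, cov)  # close to the baseline cloud
di_damaged = mahalanobis([4.0, 4.0, 4.0], mu, cov)   # far from the baseline cloud
```

If the covariance matrix is rank-deficient, which can happen once principal components have been projected out of the features, a pseudo-inverse or some form of regularization would be needed instead of the direct solve used here.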
To classify a passage as healthy or damaged, an outlier detection algorithm is applied based on the estimation of a confidence boundary (CB). The squared Mahalanobis distance values are approximated using a chi-squared distribution, allowing the damage indices to be modeled as a Gaussian distribution. The inverse cumulative distribution function (ICDF) is then used to determine a 1% CB threshold.
Figure 5d and
Figure 6d illustrate the final damage detection results, comparing the computed damage indices with the established confidence boundaries. The baseline scenarios (the first 120 passages, black dots) remain within the confidence boundary, confirming healthy wheel conditions, while the damaged scenarios (passages 121–250, green dots) exceed it, so the wheel flats are correctly detected. All damage scenarios are identified without false positives or false negatives, demonstrating the effectiveness of the methodology. Furthermore, it is observed that damage detection is reliable with a single sensor, regardless of defect severity or position. This highlights the potential for sensor optimization, which is further investigated in the following sections.
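The confidence-boundary test can be sketched with the Gaussian ICDF from the Python standard library. The assumptions here are that the boundary is the (1 − α) quantile of a Gaussian fitted to the baseline damage indices with α = 0.01, and that a passage is flagged when its damage index exceeds that boundary; the numerical values are invented for illustration.

```python
from statistics import NormalDist, mean, stdev


def confidence_boundary(baseline_di, alpha=0.01):
    """Upper boundary: the (1 - alpha) Gaussian quantile of the baseline damage indices."""
    return NormalDist(mean(baseline_di), stdev(baseline_di)).inv_cdf(1.0 - alpha)


def flag_outliers(di_values, boundary):
    """True where a damage index exceeds the confidence boundary."""
    return [v > boundary for v in di_values]


# Hypothetical damage indices of baseline passages for a single sensor.
baseline_di = [1.0, 1.2, 0.8, 1.1, 0.9]
cb = confidence_boundary(baseline_di, alpha=0.01)

# Two new passages: one close to the baseline, one far above it.
flags = flag_outliers([1.0, 5.0], cb)
```

With this definition, about 1% of genuinely healthy passages would be expected to exceed the boundary if the Gaussian assumption holds, which is the trade-off that the choice of α controls.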
6.2. Classification Approach Using k-Means
After successfully detecting wheel flat damage, the next step is to classify the severity of the defect. This section presents the classification results obtained using the k-means clustering technique, an unsupervised learning method. The analysis is performed for vehicle speeds of 120 km/h and 200 km/h, using eight accelerometers installed along the track. The goal is to evaluate the performance of k-means clustering in identifying different levels of wheel flat severity based on sensor data.
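For readers unfamiliar with the technique, the sketch below runs Lloyd's k-means algorithm on one-dimensional damage indices with three clusters (baseline, Damage 1, Damage 2). Clustering a single damage index per passage, the deterministic initialization, and the sample values are all assumptions made for illustration and are not taken from the study.

```python
def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm on scalar values; returns (labels, centroids)."""
    ordered = sorted(values)
    # Deterministic initialization: evenly spaced order statistics.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: nearest centroid for every value.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Hypothetical damage indices: four baseline, three Damage 1, three Damage 2 passages.
damage_indices = [0.9, 1.0, 1.1, 1.2, 4.8, 5.0, 5.3, 11.5, 12.0, 12.4]
labels, centroids = kmeans_1d(damage_indices, k=3)
```

Because the algorithm only sees the values and not the true classes, its success depends entirely on how well separated the three groups are, which is the property examined at 120 km/h and 200 km/h in the following paragraphs.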
Figure 7a shows the classification results for accelerometer 3 at a vehicle speed of 120 km/h. The three classes are well separated, with clear boundaries between the baseline and both levels of damage. The clustering process successfully distinguishes between Damage 1 and Damage 2, indicating that the extracted features are highly sensitive to defect severity. These results confirm that at 120 km/h,
k-means clustering provides highly accurate classification without any misclassification.
Figure 7b illustrates the classification results when the vehicle passes at 200 km/h. The performance of the clustering method declines at this higher speed due to increased noise and external variations. Damage classification for accelerometer 3 (
Figure 7b) at 200 km/h shows a significant overlap between baseline and damaged scenarios. The presence of misclassified points suggests that the extracted features become less reliable at higher speeds. The overlapping regions indicate that, at higher train speeds, the damage sensitivity of individual sensors decreases, leading to lower clustering accuracy. The main challenge at 200 km/h is the increased environmental and operational variability, which leads to overlapping damage index distributions. Because k-means clustering is an unsupervised approach, the algorithm does not clearly capture the pattern in the amplitude variation of the damage indices, which affects the feature distribution and reduces classification accuracy. As a result, a sensor fusion approach is applied in the next step to enhance the robustness of the classification process.
To improve classification accuracy at 200 km/h, a sensor fusion technique is applied, merging the information from all eight accelerometers. The results are presented in
Figure 7c. The clustering performance does not show a substantial improvement. The distinction between different damage severities remains unclear, indicating that external noise and operational variability continue to impact classification accuracy. Since sensor fusion alone does not significantly enhance classification performance, an additional step is introduced to refine the analysis. To further improve clustering accuracy, baseline (healthy) scenarios are removed from the dataset in the next stage of analysis. The rationale behind this approach is that eliminating non-damaged cases will allow the clustering algorithm to focus exclusively on distinguishing between different levels of wheel flat severity, thereby minimizing the influence of external variations. The results of this refined approach are discussed in the following section.
Figure 8 presents, as representative examples at a train speed of 200 km/h, the classification results for wheel flat severities considering feature fusion for accelerometer 3 (Figure 8a) and the outputs for sensor fusion (Figure 8b). In this analysis, only damaged scenarios are considered, and the features from all eight sensors are merged through feature fusion. Despite the expectation that removing baseline (healthy) scenarios would improve classification accuracy, the results indicate no significant improvement. Note that Damage 1 and Damage 2 are separated by a vertical red line in the figure. As shown in the figure, the overlap between different damage severity levels persists, suggesting that even with feature fusion and baseline exclusion, the
k-means clustering approach continues to struggle to achieve precise classification at higher speeds.
To further enhance the sensitivity of the extracted features to wheel flat damage, the defect classification at 200 km/h is re-evaluated using merged data after sensor fusion, considering only damaged scenarios. As illustrated in
Figure 8b, sensor fusion leads to an increase in classification accuracy. However, despite this improvement, the defect classification remains imprecise, with some degree of misclassification still present. These results indicate that while merging sensor data improves overall feature sensitivity, it does not completely resolve the challenges associated with high-speed classification using
k-means clustering.
6.3. Classification Approach Using SVM
The results from the previous section demonstrated a clear distinction between damaged and healthy wheel scenarios at a train speed of 120 km/h. However, at higher speeds (200 km/h), the separation between baseline and damaged cases became less evident due to environmental and operational variations. To address this limitation and improve classification accuracy, this section presents a supervised SVM-based approach. Different kernel functions, including Radial Basis Function (RBF), Polynomial (P), Linear (L), and Gaussian (G), are tested to optimize classification performance [
45]. The dataset used for training and testing consists of 50 simulations each for healthy wheels, Damage 1 (mild severity), and Damage 2 (high severity). Furthermore, 10-fold cross-validation is used to evaluate the performance of the SVM classifiers. This approach allows each sample to be used for both training and testing, reducing variability and providing a more robust estimate than a single random split, such as 80% training and 20% testing. The dataset is divided into 10 equal segments; in each iteration, nine folds are used for training and the remaining fold is used for validation. This process is repeated 10 times, with a different fold used for validation in each iteration. The confusion matrices reflect the testing results across all folds. It should also be noted that default SVM kernel parameters were employed (BoxConstraint = 1, KernelScale = ‘auto’, PolynomialOrder = 3 for the Polynomial kernel, and KernelOffset = 0 where applicable), as they provided satisfactory performance without the need for further hyperparameter optimization.
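The fold bookkeeping of the 10-fold cross-validation can be sketched as follows. Only the splitting logic is shown; the classifier itself (here it would be the SVM) is deliberately left out, and the shuffling seed and the function name `k_fold_indices` are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for a reproducible split
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation


# 150 samples: 50 healthy, 50 Damage 1, and 50 Damage 2 simulations.
splits = list(k_fold_indices(150, k=10))
```

Each sample appears in exactly one validation fold, so concatenating the validation predictions of all ten folds gives one prediction per sample, which is what a confusion matrix "across all folds" summarizes.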
Figure 9 presents the classification results obtained using SVM for both 120 km/h and 200 km/h train speeds in the form of confusion matrices. In addition to the class-wise prediction counts, the last column and last row report precision, false discovery rate (FDR), recall, and false negative rate (FNR), while the final cell shows the overall accuracy and error rate. It should be noted that, since the dataset is class-balanced and the task is a single-label multi-class classification problem, the micro-averaged F1-score is numerically equal to the overall accuracy. The analysis considers feature fusion from all eight installed accelerometers along the track. SVM provides accurate classification across all kernel functions, with no misclassifications observed at a train speed of 120 km/h. The method successfully distinguishes between healthy wheels and different levels of damage severity. The classification accuracy remains high when the train passes at 200 km/h, achieving 99.3% across all kernel functions. Despite the increased noise and external variations at higher speeds, the classifier is trained using labeled data and learns decision boundaries that effectively separate damage-related features from variations induced by speed, noise, and sensor-specific responses, resulting in significantly fewer misclassifications than the unsupervised
k-means clustering (
Figure 8). These results indicate that SVM, when applied with multiple sensors, ensures accurate classification regardless of train speed. Given its superior performance, the RBF kernel is selected for further analyses in subsequent sections.
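The metrics reported alongside the confusion matrices can be reproduced from the counts alone, as sketched below. The matrix used here is purely illustrative (one misclassified passage out of 150, which gives 149/150 ≈ 99.3%); which class the misclassification belongs to, and the convention that rows are true classes and columns are predicted classes, are assumptions.

```python
def classification_metrics(cm):
    """Per-class and overall metrics; cm[i][j] counts true class i predicted as class j.

    Assumes every class is both present and predicted at least once.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    tp = [cm[i][i] for i in range(k)]
    predicted = [sum(cm[r][i] for r in range(k)) for i in range(k)]  # column sums
    actual = [sum(cm[i]) for i in range(k)]                          # row sums
    precision = [tp[i] / predicted[i] for i in range(k)]
    recall = [tp[i] / actual[i] for i in range(k)]
    accuracy = sum(tp) / total
    # Micro-averaging pools the counts: every error is one false positive
    # for the predicted class and one false negative for the true class.
    errors = total - sum(tp)
    micro_p = sum(tp) / (sum(tp) + errors)
    micro_r = sum(tp) / (sum(tp) + errors)
    micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)
    return {
        "precision": precision,
        "fdr": [1.0 - p for p in precision],  # false discovery rate
        "recall": recall,
        "fnr": [1.0 - r for r in recall],     # false negative rate
        "accuracy": accuracy,
        "micro_f1": micro_f1,
    }


# Illustrative matrix (classes: healthy, Damage 1, Damage 2), not taken from Figure 9.
example_cm = [[50, 0, 0],
              [0, 49, 1],
              [0, 0, 50]]
metrics = classification_metrics(example_cm)
```

In single-label classification the pooled false positives and false negatives are always equal, so micro-averaged precision, recall, and F1 all coincide with the overall accuracy, as stated in the text.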
6.4. Comparison of Unsupervised and Supervised Methods: Influence of Sensor Position on Wheel Flat Identification Accuracy
This section evaluates the performance of unsupervised (
k-means clustering) and supervised SVM methods for classifying wheel-flat severities across different sensor layouts. The primary objective is to determine the influence of accelerometer positioning on classification accuracy and assess the feasibility of reducing the number of sensors while maintaining reliable defect identification. To analyze the impact of sensor positioning, classification results are first examined separately for sensors located on either side of the damage.
Figure 10 and
Figure 11 illustrate the accuracy of
k-means clustering for train speeds of 120 km/h and 200 km/h, respectively. Note that in
Figure 10, the baseline scenarios are separated from the defective scenarios by a vertical red line, whereas in
Figure 11, the same marker is used to distinguish Damage 1 from Damage 2. For the speed of 120 km/h, all train passages are included (
Figure 10), while for the speed of 200 km/h, the healthy train passages are excluded (
Figure 11).
As illustrated in these figures, damage clustering analysis proceeds without misclassification when using sensors positioned on the opposite side of the damaged wheel, for both 120 km/h and 200 km/h (
Figure 10a and
Figure 11a). This suggests that placing sensors on the opposite rail provides more stable and accurate damage classification by capturing a clearer response to the dynamic interaction between the wheel and rail. However, when sensors are positioned on the same side as the damage, classification performance declines, particularly at higher speeds. At 120 km/h, the classification still maintains reasonable accuracy, but a limited number of misclassifications appear (
Figure 10b). As the train speed increases to 200 km/h, the number of misclassified damage cases increases significantly (
Figure 11b), indicating that at higher speeds, local vibrations and external noise interfere with the clustering process, making accurate damage classification more challenging. These findings highlight the sensitivity of the unsupervised
k-means method to sensor placement, where sensors on the opposite side of the damage produce more reliable results, while those on the same side suffer from higher misclassification rates, particularly at elevated train speeds. The superior performance of sensors located on the opposite side of the damaged wheel is attributed to the spatial filtering effect of the rail–sleeper system, which captures a more stable structural response, whereas sensors on the same side are more influenced by localized high-frequency impact forces that increase feature variability and reduce classification separability.
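Because cluster indices carry no inherent meaning, an accuracy figure for a clustering result requires mapping clusters to damage classes first. The sketch below shows one common way to do this, by trying every cluster-to-class assignment and keeping the best one. It is an assumption that a mapping of this kind is suitable here; the source does not state how the k-means accuracy was computed, and the function name and labels are hypothetical.

```python
from itertools import permutations


def clustering_accuracy(true_labels, cluster_ids):
    """Best accuracy over all one-to-one cluster-to-class mappings.

    Assumes the number of clusters does not exceed the number of classes.
    """
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    best = 0.0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[c] == t for t, c in zip(true_labels, cluster_ids))
        best = max(best, hits / len(true_labels))
    return best


# Hypothetical passages: four with Damage 1, four with Damage 2.
truth = ["D1", "D1", "D1", "D1", "D2", "D2", "D2", "D2"]
clusters = [0, 0, 0, 1, 1, 1, 1, 1]  # one Damage 1 passage lands in cluster 1
acc = clustering_accuracy(truth, clusters)
```

The exhaustive search is only practical for a small number of classes, which is the case for the two or three severity levels considered here.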
Figure 12 presents the defect classification results using the SVM method with the R kernel function, analyzing the impact of sensor placement on classification accuracy. The classification is performed separately for sensors on either side of the track to assess whether sensor location affects the robustness of the supervised approach.
Figure 12a,b illustrate the damage classification results using sensors positioned on the opposite side of the damaged wheel (sensors 1–4) for train speeds of 120 km/h and 200 km/h, respectively. The results demonstrate high classification accuracy, with a clear distinction between baseline (healthy) wheels and different levels of wheel flat severity. The ability of the SVM method to maintain precise classification for both speed conditions indicates its robustness and reliability in damage detection, even when sensors are positioned away from the defect location.
On the other hand,
Figure 12c,d display the classification results using sensors located on the same side as the damaged wheel (sensors 5–8), considering both train speeds. The results indicate that the classification accuracy remains consistent with the previous cases, showing that the supervised method is unaffected by sensor placement. Unlike the unsupervised approach, which exhibited a strong dependency on sensor location, the SVM method provides stable and reliable classification performance, regardless of whether the sensors are positioned on the same or opposite side of the defect.
These findings confirm that, because the classifier is trained on labeled data and learns decision boundaries that remain consistent across different sensor locations, the supervised method delivers robust results for flat severity classification across different sensor layouts and speeds when four sensors are installed on each side of the track. In contrast, the unsupervised method demonstrated sensitivity to sensor positioning, particularly at higher speeds, where misclassifications increased significantly when sensors were placed on the same side as the defect. This highlights the advantage of the supervised approach with this layout, as it maintains accurate classification irrespective of sensor placement or train speed.
To further optimize the number of sensors required for accurate damage classification, a single-sensor approach is evaluated to determine if it is possible to reduce sensor usage while maintaining classification accuracy. As discussed earlier in this section, two different sensor layouts were initially tested: one configuration with eight sensors, four on each side of the track, and another setup considering four sensors on each side separately. Following these evaluations, the setup is further reduced to a single sensor, leading to the selection of sensors 3 and 7 for damage classification using SVM. This step aims to analyze the feasibility of achieving reliable classification performance with minimal sensor deployment.
Figure 13 illustrates the damage classification results using accelerometers 3 and 7 separately for both 120 km/h and 200 km/h train speeds. As observed in
Figure 13a,c, the classification accuracy is higher at 120 km/h, whereas at 200 km/h (
Figure 13b,d), the accuracy decreases due to increased noise and environmental variability. Moreover, a key observation from
Figure 13a,b is that when the single sensor is positioned on the opposite side of the damage, the flat severity classification achieves higher accuracy compared to when the sensor is installed on the same side as the defect (
Figure 13c,d). This trend holds regardless of vehicle speed, confirming that sensor placement significantly influences classification performance. Additionally, the sensitivity to the sensor location becomes more pronounced at higher speeds (200 km/h), further emphasizing the challenges associated with reduced sensor configurations.
Despite these variations, the results indicate that even with a single sensor, damage severity can still be classified with robust accuracy using the SVM method, achieving a minimum accuracy of 92% in the worst-case scenario. This finding highlights the feasibility of a cost-effective monitoring system that minimizes the number of required sensors while maintaining reliable classification capabilities. However, as noted earlier, using four sensors on each side of the track provides consistent classification performance without sensitivity to sensor placement or train speed, whereas reducing the number of sensors to one introduces sensitivity to the side of the track, particularly at higher speeds. This trade-off between sensor optimization and classification stability must be carefully considered when designing wayside monitoring systems for railway applications.
To verify that the performance difference between the single-sensor and multi-sensor configurations is not caused by random cross-validation variability, a paired statistical comparison was conducted using fold-wise accuracies obtained from identical 10-fold partitions for both configurations. The Wilcoxon signed-rank test indicated a statistically significant difference (p = 0.0078 < 0.05), which was further confirmed by the paired t-test (p = 0.0032 < 0.05). Since both models were evaluated on the same folds, the observed performance gap is unlikely to be attributable to random sampling variation. The results therefore indicate that the gap between the two configurations is systematic, while the single-sensor configuration still achieves accuracies of at least 92%.
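A paired comparison of this kind can be sketched using only the Python standard library. The example below computes an exact two-sided Wilcoxon signed-rank p-value by enumerating all sign assignments (feasible for ten folds) and the paired t statistic. The fold-wise counts of correctly classified passages are hypothetical, as are the function names; the source does not report the fold-wise values, and zero differences are dropped, which is one common convention.

```python
from itertools import product
from math import sqrt
from statistics import mean, stdev


def average_ranks(values):
    """1-based ascending ranks; tied values receive their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_exact(a, b):
    """Exact two-sided signed-rank p-value for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    observed = min(w_plus, total - w_plus)
    hits = 0
    # Under the null hypothesis every sign pattern is equally likely.
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= observed + 1e-12:
            hits += 1
    return hits / 2 ** len(ranks)


def paired_t_statistic(a, b):
    """Paired t statistic and its degrees of freedom."""
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / sqrt(len(d))), len(d) - 1


# Hypothetical correct-prediction counts per fold (15 passages per fold).
multi_sensor = [15, 15, 15, 15, 15, 15, 15, 15, 15, 15]
single_sensor = [14, 14, 13, 15, 14, 13, 14, 15, 14, 14]

p_wilcoxon = wilcoxon_exact(multi_sensor, single_sensor)
t_stat, dof = paired_t_statistic(multi_sensor, single_sensor)
```

With these hypothetical counts, eight folds differ and all of them favor the multi-sensor setup, so the exact p-value is 2/2^8 ≈ 0.0078. The p-value of the t statistic would be read from a Student t distribution with `dof` degrees of freedom, which is left out here to keep the sketch dependency-free.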
Table 6 and
Table 7 present a comparative analysis of SVM (supervised) and
k-means clustering (unsupervised) methods, evaluating their classification accuracy, sensitivity to sensor placement, and misclassification rates under different sensor configurations and train speeds. Moreover, the accuracy, precision, recall, and F1 score for different fault levels are evaluated to reveal class-specific detection capability. The results show that SVM consistently outperforms
k-means clustering, achieving higher accuracy and greater robustness across sensor placements and speeds. Additionally, the tables demonstrate that reducing the number of sensors to a single unit remains feasible with SVM. These findings reinforce that SVM is the more reliable approach for damage classification in railway condition monitoring systems. The decrease in
k-means accuracy at 200 km/h is mainly due to increased dynamic noise and vibration levels, which reduce the stability of the extracted AR features and lead to greater overlap between damage classes. In addition, speed-dependent shifts in dominant excitation frequencies alter the signal content, decreasing the separability of clusters in the distance-based algorithm. These effects, combined with the larger variability of high-speed responses, explain the reduced performance of the unsupervised method compared to the supervised SVM approach.
6.5. Simplified Preprocessing Framework for SVM-Based Classification
In the original unsupervised framework, preprocessing steps such as PCA-based feature normalization and data fusion using the Mahalanobis distance were primarily introduced to reduce feature dimensionality and mitigate redundancy, which is essential for stabilizing outlier analysis and k-means-based damage detection. In contrast, in the supervised framework adopted in this study, these steps are not strictly required. SVMs are inherently capable of handling moderately high-dimensional feature spaces and identifying optimal separating hyperplanes without the need for prior dimensionality reduction, provided that the input features are properly scaled. Moreover, the AR-model-based features used in this work are already compact and damage-sensitive; therefore, additional PCA or distance-based data fusion did not lead to measurable improvements in classification performance within the supervised scheme. Removing these steps simplifies the classification pipeline while maintaining comparable accuracy and interpretability.
To enhance damage classification efficiency while reducing computational complexity, a simplified preprocessing framework is introduced for supervised SVM classification. Classical SVM models often require extensive preprocessing steps, including feature normalization and data fusion, to optimize classification accuracy. However, these additional steps increase the computational load and may not be practical for real-time railway monitoring applications. The proposed simplified framework eliminates the need for feature normalization and data fusion, significantly streamlining the classification process without compromising accuracy.
The key advantage of this approach lies in its ability to directly utilize raw extracted features for classification. By removing normalization steps, the model maintains high accuracy while reducing processing time. Additionally, eliminating data fusion minimizes dependencies on multiple sensors, making the system more adaptable to real-world implementations where a reduced number of sensors is desirable.
The results demonstrate that the simplified framework maintains robust classification performance through SVM, achieving comparable accuracy to the classical SVM approach. Even with minimal preprocessing, the model effectively distinguishes between different levels of damage severity, confirming the feasibility of this optimization. Furthermore, the simplified approach retains its effectiveness across varying train speeds, ensuring consistent performance under different operational conditions.
Figure 14 and
Figure 15 illustrate the defect classification results obtained using the simplified framework with accelerometers 3 and 7, both separately and simultaneously, under different vehicle speeds.
Figure 14 presents the classification results when sensors 3 and 7 are used simultaneously for vehicle speeds of 120 km/h and 200 km/h. The results demonstrate that flat severity classification achieves high accuracy regardless of the vehicle speed. Furthermore, classification precision improves at 200 km/h, suggesting that the model benefits from higher dynamic interactions at higher velocities.
Figure 15 provides a detailed comparison of the classification performance when sensors 3 and 7 are analyzed separately. The findings indicate that at 120 km/h, the defect classification is not sensitive to the side of the damage (
Figure 15a,c). However, at 200 km/h, classification accuracy varies with sensor placement, with sensor 3 performing better than sensor 7 (
Figure 15b,d). This suggests that sensor location becomes more critical at higher speeds, affecting classification precision in the simplified framework.
The comparison between the classical SVM and the simplified framework, as illustrated in Figure 13 and Figure 15, highlights differences in classification accuracy. For sensor 3, at a train speed of 120 km/h, the classical SVM (Figure 13a) achieves 100% accuracy, whereas the simplified approach (Figure 15a) decreases slightly to 96%. At 200 km/h, the accuracy of the classical SVM (Figure 13b) is 96%, whereas the simplified framework (Figure 15b) reaches 98.7%. Similarly, for sensor 7, the classical SVM method (Figure 13c,d) attains 97.3% and 92% accuracy at 120 km/h and 200 km/h, respectively, while the simplified approach (Figure 15c,d) shows a slight drop to 96% and 91.3%. These results indicate that the simplified method slightly reduces classification accuracy in three of the four cases and improves it in one (sensor 3 at 200 km/h), so the overall impact remains minimal. Given its reduced computational complexity and faster processing time, the simplified framework remains a viable alternative, particularly for real-time railway monitoring applications where efficiency is a priority.
It should be noted that the reported classification results are obtained from numerical simulations conducted under controlled and class-balanced conditions, in which noise levels were explicitly introduced to better reflect realistic system behavior. In this context, the micro-averaged F1-score is equivalent to the overall classification accuracy, and the presented confusion matrices provide class-wise precision, recall, and error rates, which implicitly reflect variability in the model predictions. While these results demonstrate the effectiveness of the proposed SVM-based framework under controlled numerical conditions, performance degradation may be expected when applying the methodology to real experimental data, due to additional environmental variability, loading uncertainties, and other sources of operational noise that are not fully captured in the numerical model. Therefore, caution should be exercised when interpreting near-perfect classification accuracy obtained from simulated data, and further validation using real measurement datasets is required to assess the generalization capability and robustness of the proposed approach.
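The sensor-wise accuracies quoted in this section for the classical SVM (Figure 13) and the simplified framework (Figure 15) can be tabulated to make the size and sign of each change explicit. The snippet below is a small arithmetic sketch; the accuracy values are those stated in the text, while the variable and function names are illustrative.

```python
# Accuracies in percent as quoted in the text: (classical SVM, simplified framework).
reported = {
    ("sensor 3", 120): (100.0, 96.0),
    ("sensor 3", 200): (96.0, 98.7),
    ("sensor 7", 120): (97.3, 96.0),
    ("sensor 7", 200): (92.0, 91.3),
}


def accuracy_deltas(table):
    """Change in accuracy (simplified minus classical), in percentage points."""
    return {
        key: round(simplified - classical, 1)
        for key, (classical, simplified) in table.items()
    }


deltas = accuracy_deltas(reported)
for (sensor, speed), delta in deltas.items():
    print(f"{sensor} at {speed} km/h: {delta:+.1f} percentage points")

mean_delta = sum(deltas.values()) / len(deltas)
```

The changes range from a loss of 4.0 percentage points (sensor 3 at 120 km/h) to a gain of 2.7 percentage points (sensor 3 at 200 km/h), with an average change of about −0.8 percentage points across the four cases.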
7. Conclusions
This study evaluated supervised (SVM) and unsupervised (k-means) machine learning approaches for wayside detection and severity classification of railway wheel flats at train speeds of 120 and 200 km/h. The results show that the classical SVM approach provides consistent and reliable classification across the investigated scenarios, achieving accuracies above 99% with four sensors and of at least 92% using a single-sensor configuration. In contrast, the k-means clustering method exhibited pronounced sensitivity to train speed and sensor placement, with its accuracy decreasing to 75.2% at 200 km/h even when eight sensors were employed. It should be emphasized that these conclusions are derived from high-fidelity synthetic data generated through numerical simulations, and performance levels may differ under real-world operational conditions.
A sensitivity analysis on sensor placement confirmed that reliable detection can be achieved with a reduced number of accelerometers, particularly when sensors are positioned on the opposite side of the defect. This finding supports significant sensor optimization and associated cost reduction for practical wayside deployments.
A simplified preprocessing framework was introduced, eliminating feature normalization and data fusion while maintaining high accuracy (91.3–98.7%). The reduced computational burden makes the approach suitable for real-time edge implementation in existing Wheel Condition Monitoring System infrastructures. The proposed workflow can be directly embedded in current wayside condition monitoring systems, where vibration signals from rail-mounted accelerometers are processed on-site to extract AR features, followed by real-time SVM classification and transmission of severity alerts to the maintenance platform.
Overall, supervised learning—especially SVM and the simplified framework—proved superior for wheel-flat severity classification, enabling accurate and cost-effective monitoring with minimal sensor configurations. Compared with typical multi-sensor wayside installations, the optimized configuration offers substantial reductions in hardware, cabling, installation effort, and long-term maintenance costs per monitoring site.
The proposed methodology will be tested and validated in an upcoming field trial through ongoing projects involving different types of trains (passenger and freight), where its performance will be rigorously assessed under real operational conditions. Further developments will incorporate modeling uncertainties, sensor errors, and environmental effects to enhance robustness.