This section evaluates the effectiveness of the proposed method using real power data collected from S700K switch machines. It includes parameter determination, feature analysis, clustering evaluation, and comparative experiments.
4.1. Dataset Description
A total of 1000 switching power curves from S700K railway switch machines were used as experimental data. The signals were collected at several railway stations in Guangzhou, China, between 2016 and 2018 during routine turnout operations. Although all samples were recorded under normal operating conditions, they reflect varying levels of performance degradation accumulated during long-term service and thus provide representative data for degradation state analysis.
Each power curve corresponds to a complete switching cycle of the switch machine. The sampling interval was 0.04 s, yielding 137 sampling points per signal (equivalent sampling frequency: 25 Hz). All measurements were obtained under consistent operating conditions with standard power supply and normal mechanical load to ensure data comparability.
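The sampling figures above can be checked with a few lines of arithmetic. This is a minimal sketch; the assumption that the first sample is taken at t = 0 (so that 137 points span 136 intervals) is not stated in the dataset description.

```python
SAMPLING_INTERVAL_S = 0.04  # sampling interval from the dataset description
POINTS_PER_CURVE = 137      # sampling points per signal

# Equivalent sampling frequency in Hz.
sampling_rate_hz = 1.0 / SAMPLING_INTERVAL_S

# Assumption: the first sample is at t = 0, so the last one is at 136 intervals.
time_axis = [i * SAMPLING_INTERVAL_S for i in range(POINTS_PER_CURVE)]
curve_duration_s = time_axis[-1]

print(f"sampling rate: {sampling_rate_hz:.1f} Hz")
print(f"curve duration: {curve_duration_s:.2f} s")
```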
Before feature extraction, all signals were normalized to mitigate operational variations. The processed dataset was then used for multi-domain feature construction, UMAP-based feature fusion and dimensionality reduction, and K-means++ clustering to identify degradation states. The clustering procedure was fully unsupervised, and no fault labels were introduced. The resulting degradation stages were subsequently interpreted according to engineering knowledge of switch machine performance evolution.
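The text does not specify which normalization was applied. As one possible illustration only, the sketch below assumes per-curve min-max scaling to [0, 1]; the function name and the sample values are hypothetical and not taken from the dataset.

```python
def min_max_normalize(curve):
    """Scale one power curve to [0, 1]; a flat curve maps to all zeros."""
    lo, hi = min(curve), max(curve)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in curve]
    return [(x - lo) / span for x in curve]

# Hypothetical power values, for illustration only.
raw_curve = [0.0, 2.4, 0.8, 0.6, 0.6, 0.2]
normalized = min_max_normalize(raw_curve)
print(normalized)
```

Other schemes (for example z-score standardization) would serve the same purpose of reducing amplitude differences between operations.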
4.5. Determination of Optimal Cluster Number
In the analysis of switch machine power data, the K-means++ clustering algorithm is employed to partition the performance degradation states of the switch machine. The analyzed data cover the complete evolution process of power-related characteristics as the switch machine transitions from normal operating conditions toward degraded performance states. Previous studies and maintenance practice indicate that the degradation process of switch machines can typically be divided into several sequential stages. Based on this observation, the UMAP-fused low-dimensional feature set is used as the input for unsupervised clustering analysis in this study. Under identical feature conditions, clustering is performed with the number of clusters set to k = 3–6 for comparative evaluation.
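For readers unfamiliar with K-means++, the following is a minimal standard-library sketch of the seeding rule followed by ordinary Lloyd iterations. It is not the implementation used in this study; the toy points stand in for the UMAP-fused features and are purely hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_init(points, k, rng):
    """K-means++ seeding: each new center is drawn with probability
    proportional to its squared distance from the nearest chosen center."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers

def kmeans(points, k, rng, max_iter=100):
    """Lloyd iterations started from K-means++ seeds."""
    centers = kmeans_pp_init(points, k, rng)
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

# Hypothetical 2-D points forming three tight, well-separated groups.
offsets = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (0.0, -0.1)]
blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + dx, cy + dy) for cx, cy in blob_centers for dx, dy in offsets]

centers, labels = kmeans(points, 3, random.Random(0))
print(labels)
```

Running the same routine for k = 3 to 6 on identical inputs gives the partitions that are then compared.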
Figure 4 illustrates the visualization results of K-means++ clustering under different numbers of clusters. The subfigures correspond to k = 3, 4, 5, and 6, respectively, and the red cross symbols denote the cluster centers. As the number of clusters increases, the partitioning of samples becomes more refined; however, excessive subdivision may lead to over-segmentation of transitional degradation states without providing additional physically meaningful degradation stages. In particular, although the distribution in the embedded space under k = 4 may visually suggest additional separable groups, some of these clusters primarily subdivide intermediate degradation states rather than representing distinct degradation phases.
To quantitatively evaluate clustering performance under different cluster numbers, the silhouette coefficient is adopted as the evaluation metric, as it comprehensively reflects intra-cluster compactness and inter-cluster separation.
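As a reminder of how the metric is defined, the sketch below computes the mean silhouette coefficient from scratch: for each sample, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to any other cluster, and s = (b - a) / max(a, b). The four toy points and both labelings are hypothetical.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all samples (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # convention: s = 0 for a singleton cluster
        # Mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # Smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

# Hypothetical points: two tight pairs far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_score = silhouette_score(points, [0, 0, 1, 1])  # pairs kept together
bad_score = silhouette_score(points, [0, 1, 0, 1])   # pairs split apart
print(good_score, bad_score)
```

Values close to 1 indicate compact, well-separated clusters, while negative values indicate that samples sit closer to another cluster than to their own.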
Figure 5 presents the silhouette coefficient results for different values of k. It can be observed that when k = 3 the silhouette coefficient reaches its maximum value of 0.726, indicating a higher degree of separability among different degradation states while maintaining strong similarity within the same state. Therefore, k = 3 is selected as the optimal number of clusters for partitioning the performance degradation states of the switch machine. The obtained clusters are subsequently interpreted as degradation stages through feature trend analysis and engineering knowledge, rather than being predefined labels. As shown in Figure 4, the three clusters correspond to three consecutive degradation stages: the upper-left cluster (brown) represents Degradation State 1, the lower-left cluster (light blue) represents Degradation State 2, and the upper-right cluster (dark blue) represents Degradation State 3.
To assess how well the degradation states of switch machines are identified, the K-means++ clustering algorithm adopted in this study is compared with two widely used clustering methods, namely fuzzy C-means (FCM) and standard K-means, on the fused low-dimensional feature set. These algorithms have been extensively adopted in degradation state identification studies and thus provide a representative basis for comparison. To ensure a fair comparison, all clustering algorithms are evaluated with identical feature inputs and the same number of clusters.
The visualization results of the clustering analysis are presented in Figure 6 and Figure 7, where Figure 6 illustrates the clustering results obtained using the FCM algorithm and Figure 7 shows the results produced by the K-means algorithm. In both cases, the number of clusters is set to k = 3. To quantitatively assess the clustering performance, three evaluation metrics are employed: the silhouette coefficient, the Calinski–Harabasz (CH) index, and the Davies–Bouldin (DB) index. The corresponding quantitative evaluation results are summarized in Table 5.
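The CH and DB indices can be illustrated with a short standard-library sketch based on their usual definitions: CH is the ratio of between-cluster to within-cluster dispersion, each scaled by its degrees of freedom (higher is better), and DB averages, over clusters, the worst ratio of summed within-cluster scatter to centroid distance (lower is better). The toy points and labelings are hypothetical, and the code assumes at least two clusters with non-zero within-cluster dispersion.

```python
import math

def centroid(pts):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(col) / len(pts) for col in zip(*pts))

def group_by_label(points, labels):
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return clusters

def calinski_harabasz(points, labels):
    clusters = group_by_label(points, labels)
    n, k = len(points), len(clusters)
    overall = centroid(points)
    # Between-cluster dispersion, weighted by cluster size.
    between = sum(len(m) * math.dist(centroid(m), overall) ** 2
                  for m in clusters.values())
    # Within-cluster dispersion around each cluster's own centroid.
    within = 0.0
    for m in clusters.values():
        c = centroid(m)
        within += sum(math.dist(p, c) ** 2 for p in m)
    return (between / (k - 1)) / (within / (n - k))

def davies_bouldin(points, labels):
    clusters = group_by_label(points, labels)
    cents = {lab: centroid(m) for lab, m in clusters.items()}
    # Scatter: mean distance of a cluster's members to its centroid.
    scatter = {lab: sum(math.dist(p, cents[lab]) for p in m) / len(m)
               for lab, m in clusters.items()}
    worst = [max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                 for j in clusters if j != i)
             for i in clusters]
    return sum(worst) / len(worst)

# Hypothetical points: two tight pairs far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good, bad = [0, 0, 1, 1], [0, 1, 0, 1]
ch_good, ch_bad = calinski_harabasz(points, good), calinski_harabasz(points, bad)
db_good, db_bad = davies_bouldin(points, good), davies_bouldin(points, bad)
print(ch_good, ch_bad, db_good, db_bad)
```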
As indicated in Table 5, when the number of clusters is k = 3, the K-means++ algorithm achieves the best overall performance across all three evaluation metrics. Specifically, the silhouette coefficient reaches 0.726, the CH index attains the highest value, and the DB index achieves the lowest value, demonstrating that the K-means++ algorithm provides superior intra-cluster compactness and inter-cluster separability under this condition. These results verify the effectiveness and reliability of the proposed degradation state identification method.
Based on the observed degradation evolution patterns and the distribution characteristics revealed by the clustering results, and in conjunction with practical engineering experience, the performance degradation process of the switch machine is categorized into three stages: the normal operation stage, the moderate degradation stage, and the severe degradation stage. The characteristic descriptions of power curves and corresponding maintenance recommendations for each degradation stage are detailed in Table 6, providing practical guidance for switch machine condition monitoring and maintenance decision-making.
After determining the optimal number of clusters (k = 3), the clusters are interpreted as degradation stages through post hoc analysis, since the clustering is fully unsupervised. The three clusters exhibit a clear sequential distribution in the UMAP space and show progressively increased waveform fluctuation, entropy, and feature deviation, corresponding to healthy, moderate, and severe degradation states. These patterns are consistent with typical S700K switch-machine degradation behaviors reported in practice and literature. Therefore, the clusters are mapped to three consecutive degradation stages, as summarized in Table 6. Although larger k values may produce finer partitions, the silhouette, CH, and DB indices all indicate that k = 3 provides the best balance between compactness, separability, and engineering interpretability for maintenance-oriented stage identification.
It should be noted that, under certain parameter settings (e.g., k = 4), more than three visually separable groups may appear in the embedded feature space. However, the objective of this study is not to maximize geometric separability in the low-dimensional space, but to identify physically meaningful degradation stages that are consistent with the actual performance evolution of switch machines.
The internal clustering validation indices—including the silhouette coefficient, Calinski–Harabasz (CH) index, and Davies–Bouldin (DB) index—consistently indicate that k = 3 provides the best overall balance between intra-cluster compactness and inter-cluster separability. Moreover, in practical railway maintenance scenarios, switch-machine performance degradation typically evolves through three consecutive stages: healthy, moderate degradation, and severe degradation. The clustering results obtained with k = 3 exhibit a clear sequential distribution in the embedded space and show good correspondence with these practical degradation stages.
Although larger values of k may produce finer partitions, they mainly subdivide transitional states rather than revealing additional physically meaningful degradation modes. Density-based clustering methods (e.g., DBSCAN or HDBSCAN) may capture local density variations within transitional regions and could be explored in future work for fine-grained analysis. Nevertheless, for stage-oriented degradation identification aimed at maintenance decision support, the K-means++ model with k = 3 provides a more stable, interpretable, and practically meaningful partition of the degradation process.
4.6. Comparative Experimental Analysis
(1) Comparison of Different Dimensionality Reduction Methods
To evaluate the performance of different dimensionality reduction methods in the multi-source feature fusion scenario of switch machine power signals, three representative techniques—Principal Component Analysis (PCA), t-distributed Stochastic Neighbor Embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP)—are comparatively investigated. The focus is placed on assessing their dimensionality reduction effectiveness under multi-source power feature fusion, using both clustering visualization results and quantitative evaluation metrics.
Under identical clustering conditions, the K-means++ algorithm with the same number of clusters is applied to the low-dimensional features obtained by each dimensionality reduction method. The clustering visualization results corresponding to PCA, t-SNE, and UMAP are presented in Figure 8, while the quantitative evaluation results are summarized in Table 7.
As shown in Table 7, the UMAP-based features achieve a silhouette coefficient of 0.726, which is higher than those obtained using PCA and t-SNE under the same experimental conditions. This indicates that UMAP provides more discriminative low-dimensional representations in the considered feature fusion scenario and exhibits superior cluster separability.
(2) Comparison of Different Feature Combinations
To further verify the effectiveness of the proposed feature fusion strategy, clustering performance under different feature combinations is evaluated within the same UMAP + K-means++ framework. The visualization results of clustering outcomes for different feature sets are shown in Figure 9, and the corresponding quantitative evaluation results are reported in Table 8.
As indicated in Table 8, the fused feature set achieves a silhouette coefficient of 0.726, which is notably higher than those obtained using single time-domain features, frequency-domain features, and time–frequency feature combinations. These results demonstrate that multi-domain feature fusion enables a more comprehensive characterization of the performance degradation evolution of switch machines and significantly enhances the discriminative capability of degradation state clustering.
Overall, the comparative experimental results confirm that the proposed feature fusion method effectively improves the stability and reliability of clustering-based degradation state identification, providing robust data support for switch machine condition monitoring and maintenance decision-making.
4.7. Degradation State Identification
A total of 1000 switch machine power samples are used to construct the modeling dataset, while approximately 200 power curves corresponding to normal operating conditions are extracted from the CSM system as reference samples. Based on the previously obtained UMAP-based feature fusion results and K-means++ clustering model, the Euclidean distances between test samples and each cluster center are calculated. The degradation state of each test sample is then identified using a minimum-distance (nearest cluster center) decision criterion.
Table 9 presents representative distance calculation results for test samples. Each column corresponds to the distance between a sample and the cluster centers representing the three degradation stages (Stage I: healthy, Stage II: moderate degradation, and Stage III: severe degradation). The identification result is determined by the minimum-distance criterion.
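The minimum-distance criterion can be sketched as follows. The cluster-center coordinates and the test samples are hypothetical placeholders, not the values behind Table 9, and the ordering of centers by stage is assumed to come from the post hoc stage interpretation.

```python
import math

# Stage names follow the text; index order is an assumption of this sketch.
STAGES = (
    "Stage I: healthy",
    "Stage II: moderate degradation",
    "Stage III: severe degradation",
)

def identify_stage(sample, centers):
    """Return the index of the nearest cluster center and all distances."""
    distances = [math.dist(sample, c) for c in centers]
    nearest = min(range(len(centers)), key=distances.__getitem__)
    return nearest, distances

# Hypothetical cluster centers in a 2-D embedded feature space.
centers = [(-4.0, 3.0), (-3.0, -2.0), (5.0, 4.0)]

# Hypothetical test sample already projected into the same space.
test_sample = (4.2, 3.5)
stage_index, distances = identify_stage(test_sample, centers)
print(STAGES[stage_index], [round(d, 3) for d in distances])
```

In practice, the test sample must pass through the same normalization, feature extraction, and fitted embedding as the training data before distances to the cluster centers are meaningful.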
As illustrated in Figure 10, the confusion matrix of the K-means++-based identification results exhibits a clear diagonal dominance, indicating good separability among different degradation states.
Figure 11 further visualizes the distribution of training and test samples in the low-dimensional feature space, showing that test samples are consistently mapped to the corresponding degradation state regions defined by the training data.
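A confusion matrix of this kind can be built from reference and identified stage indices, as in the minimal sketch below. The two label lists are hypothetical and deliberately include one misidentified sample to show an off-diagonal entry; they do not reproduce the results in Figure 10.

```python
def confusion_matrix(reference, predicted, n_classes):
    """Rows are reference stages, columns are identified stages."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for r, p in zip(reference, predicted):
        matrix[r][p] += 1
    return matrix

def accuracy(matrix):
    """Fraction of samples on the main diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

# Hypothetical stage indices (0 = Stage I, 1 = Stage II, 2 = Stage III).
reference = [0, 0, 1, 1, 2, 2, 2]
predicted = [0, 0, 1, 2, 2, 2, 2]

matrix = confusion_matrix(reference, predicted, 3)
for row in matrix:
    print(row)
print(f"accuracy: {accuracy(matrix):.3f}")
```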
On the selected test dataset, the proposed method identifies the degradation state of every test sample correctly, which supports the effectiveness of the feature fusion and clustering-based identification framework for equipment health assessment on this dataset. Accurate identification of switch machine degradation states is a critical prerequisite for intelligent operation and maintenance. With predictive maintenance and condition-based inspection strategies tailored to each degradation stage, potential failure risks can be reduced, equipment reliability and availability can be improved, and maintenance resources can be allocated more efficiently.