1. Introduction
Rail corrugation is a common wheel–rail contact fatigue and uneven wear defect in urban rail transit [1,2,3,4]. It not only increases vehicle noise, reduces ride comfort, and generates more passenger complaints, but also induces high-frequency vibrations in the vehicle–track system [5,6,7], accelerating fatigue damage of critical wheel–rail components and increasing maintenance costs and frequency [8,9,10]. With the continuous expansion of urban rail networks and the increasing service life of lines, noise and vibration issues have received growing attention. Therefore, monitoring rail corrugation and evaluating grinding effectiveness have become essential tasks in rail maintenance and operation.
Currently, rail corrugation detection mainly relies on two approaches. (1) Contact-based measurement methods, such as track inspection trolleys [11,12,13], can obtain high-precision geometric features of corrugation, including wavelength and depth. However, these methods have long inspection cycles, are labor-intensive and costly, and often require nighttime operation, making them unsuitable for high-frequency monitoring. (2) Online monitoring based on vehicle vibration signals [14,15,16] infers track irregularities from acceleration responses during train operation. This approach offers real-time capability and convenient data acquisition, but vibration responses are sensitive to vehicle structure, suspension system, and operating speed. Consequently, response variations under different vehicle types or track conditions limit the stability of detection results.
Beyond vibration signals, recent studies have explored the sensitivity of acoustic signals to high-frequency vehicle–track responses and their potential application in onboard real-time monitoring. Zhang [17] analyzed the correlation between rail roughness on high-speed railways and in-train noise, showing that corrugation features are prominently reflected in operational noise. Wei [18] utilized onboard noise collected during train operation to diagnose rail corrugation in high-speed railways. Liu [19] applied a wavelet packet-based feature extraction method to wheel–rail noise for rapid evaluation of corrugation amplitude. Han [20] constructed diagnostic indices based on the energy ratio of intrinsic mode components in acoustic signals, effectively identifying corrugation conditions. These studies indicate that acoustic signals, as a physical quantity directly accessible in a “vehicle–onboard platform–real-time acquisition–online analysis” framework, possess unique advantages in rail condition characterization. However, acoustic signals are also prone to environmental noise interference, limiting the stability of characterization when used alone.
To overcome the limitations of single acoustic or vibration signals, sound–vibration fusion has gradually become an important research direction in rail structural health monitoring. By integrating complementary information from acoustic and vibration signals, more robust rail condition recognition can be achieved at the feature, decision, or model level. Wang [21] proposed a rail corrugation characterization index based on sound–vibration fusion for metro tracks. Huang [22] and Cong [23] designed corrugation identification methods based on combined acoustic–vibration features. Cai [24] and Li [25] further introduced machine learning and multi-task learning frameworks to achieve intelligent detection of rail corrugation.
However, existing sound–vibration fusion methods in rail monitoring have several limitations:
a. Limitations of empirical and statistical fusion methods
These methods typically assume feature independence and use global energy or variance indicators for weighting. Although computationally simple, they fail to capture sample distribution structures and cannot reflect nonlinear coupling among features under different operating conditions. When acoustic and vibration signals are affected by operational disturbances or background noise, the weighted results may be distorted, reducing robustness.
b. Limitations of traditional machine learning methods
Linear dimensionality reduction techniques such as PCA and PLS determine main directions based on global variance or covariance, focusing on overall information retention while neglecting local geometric relationships among samples. Supervised learning methods such as SVM can distinguish classes in high-dimensional space but rely on sufficient labels and hyperparameter tuning, which are constrained in rail monitoring by label scarcity and data imbalance. Moreover, these methods emphasize feature “discrimination” rather than “continuity” or “neighborhood preservation,” making it difficult to reflect the gradual evolution of corrugation.
c. Limitations of deep learning methods
Deep models can achieve nonlinear feature fusion in an end-to-end manner but heavily depend on large-scale samples and sufficient training. In metro onboard sound–vibration monitoring, limited sample size, variable operating conditions, and scarce labels restrict generalization. Additionally, the “black-box” nature of deep models reduces interpretability, hindering understanding of the physical relationship between sound–vibration features and rail conditions.
Therefore, for rail corrugation detection scenarios with small samples, weak labels, and complex noise, there is an urgent need for a feature-weighted fusion method that preserves local geometric structures under limited data, with good interpretability and robustness.
This study introduces the concept of manifold learning [26,27,28] and proposes a Laplacian manifold learning-based sound–vibration feature weighted fusion method, constructing a comprehensive indicator, the Laplacian-Weighted Vibro-Acoustic Feature (LWVAF), for rail condition characterization.
The remainder of this paper is organized as follows:
Section 2 presents the research background and the idea of sound–vibration feature fusion from a manifold learning perspective;
Section 3 describes the experimental setup;
Section 4 details the proposed method;
Section 5 presents experimental results and comparative analysis;
Section 6 concludes the paper and summarizes the main findings.
2. Problem Statement
Manifold learning provides a new perspective for feature fusion. This theory assumes that high-dimensional observed data lie on an underlying low-dimensional manifold. If two data points are close in the high-dimensional space, they should also remain proximate in the low-dimensional space. This “local preservation assumption” aligns with the physical characteristics of rail corrugation evolution: the generation and development of corrugation exhibit gradual, continuous changes, meaning that features from adjacent segments or nearby time points vary slowly, and the rail condition shows locally similar and globally smooth distribution patterns. Therefore, analyzing the neighborhood relationships of samples within the manifold structure can help understand the correlations among features.
Figure 1 illustrates the manifold structure of samples before and after rail grinding in a low-dimensional space, showing good preservation of neighborhood relationships. However, in the task of state characterization, the effectiveness of feature fusion depends not only on intra-class compactness but also on inter-class separability.
Figure 1a shows the projection distribution of features F1 and F2 (the distribution patterns do not represent the joint discriminative ability of the two features in 2D space, but reflect the distribution of each feature along its respective axis). Along the F1 dimension, the two states can be clearly distinguished, but the intra-class dispersion is large, indicating weak clustering. Correspondingly, in Figure 1b, the trend curves of F1 before and after grinding show noticeable differences, effectively reflecting changes in rail condition, but the curves themselves fluctuate considerably, indicating insufficient stability. In contrast, the F2 dimension exhibits good intra-class clustering, but the coordinate values of the two classes heavily overlap, reflecting poor separability. Correspondingly, Figure 1c shows the F2 trend curves before and after grinding, which fluctuate little but almost completely overlap, indicating that while this feature is stable, it lacks the ability to characterize state changes.
In summary, clustering reflects the consistency of sample distribution under the same state and forms the basis for stable characterization of rail health. Separability reflects the degree of difference between distinct states and is a key indicator for detecting rail changes, such as grinding effects. Only by considering both clustering and separability can features maintain reliability and discriminative power in rail state recognition. To obtain indicators that preserve the internal structure of a state while remaining sensitive to rail changes, it is necessary to develop a feature fusion method that leverages the advantages of multiple features, enabling a more robust characterization of rail corrugation.
The feature distributions illustrated in this section serve as conceptual examples to motivate the manifold-based fusion perspective rather than to present actual measured data. The real vibro-acoustic measurements, data-acquisition procedures, and the computation of the six features (sound-pressure RMS, vibration RMS, sound-pressure standard deviation, vibration standard deviation, sound spectral energy, and vibration spectral energy) are detailed in Section 3 and Section 5.1. These later sections, based on real collected data, provide the full feature definitions and preprocessing workflow that form the experimental foundation of this study.
4. Methodology
4.1. Theoretical Basis
This study introduces the concept of manifold learning and extends it to construct a new feature evaluation metric: the Laplacian-based indicator Lr, as shown in Figure 5. Traditional Laplacian approaches focus on the local similarity among data points, assessing a feature’s ability to preserve these relationships, typically used to characterize intra-class compactness. However, focusing solely on intra-class compactness is insufficient to fully capture differences in rail conditions. Therefore, this study further incorporates an evaluation dimension of feature separability, comprehensively assessing feature importance from both intra-class compactness and inter-class separability perspectives.
Specifically, the proposed method measures a feature’s similarity within local neighborhoods and the overall state distribution’s dispersion to achieve a balanced assessment of clustering and separability. A smaller Lr value indicates that the feature can effectively distinguish different states while preserving local similarity. Therefore, features with smaller Lr values are more important and better reflect the evolution of rail condition.
The “manifold preservation” theory posits that points close to each other in high-dimensional space remain proximate in a low-dimensional space, and any point can be linearly represented by other points in its neighborhood, meaning that the local geometric structure of samples is preserved under dimensionality reduction.
For example, a data point in low-dimensional space can be represented as:
Here, Xi is expressed as a linear combination of its k nearest neighbors, and Wij denotes the weight between Xi and its neighbors, usually related to Euclidean distance or similarity. Equation (1) reveals the local linear structure of the data: each point can be viewed as a linear combination of its neighbors, forming a locally linear approximation of the manifold.
Based on Equation (1), a kernel function is applied to map the low-dimensional points into a high-dimensional space, where the same linear relationship can be expressed as:
Equation (2) indicates that in the high-dimensional feature space after nonlinear mapping, the neighborhood relationships among data points should still be preserved. The Laplacian approach quantifies a feature’s ability to maintain local structure by constructing adjacency relationships among samples and defining a weight matrix.
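The two relations described above can be written out as follows. This is a reconstruction from the surrounding definitions rather than a quotation of Equations (1) and (2): the neighbor-set symbol N_k(i) and the mapping symbol φ are assumed notation, and the exact form used by the authors may differ.

```latex
\begin{align}
X_i &= \sum_{j \in N_k(i)} W_{ij}\, X_j \tag{1} \\
\phi(X_i) &= \sum_{j \in N_k(i)} W_{ij}\, \phi(X_j) \tag{2}
\end{align}
```

Read this way, the same weights Wij describe the neighborhood of Xi before and after the mapping, which is the sense in which the local structure is preserved.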
4.2. Construction of Data Adjacency Graph
To characterize the local manifold structure of the data, a data adjacency graph is first constructed, with the samples as vertices and neighborhood relations as edges. The Euclidean distance between each data point and all other points is calculated. If Xj is one of the k nearest neighbors of Xi, an edge exists between Xi and Xj in the adjacency graph and Sij = 1; otherwise, Sij = 0.
The adjacency matrix S is defined entry-wise from the k-nearest neighbor set of Xi: Sij = 1 if Xj belongs to that set, and Sij = 0 otherwise. This adjacency graph captures the neighborhood relationships between data points in the dataset. Once adjacency is established, a weight is assigned to each edge to reflect the similarity between neighboring data points.
When Sij = 1, the edge weight is determined using a heat kernel function. The scale parameter t (a constant) controls the decay of similarity. The weight Wij is an exponentially decaying function of the distance between the two points: the closer two points are, the larger the value of Wij, with 0 < Wij ≤ 1. When Sij = 0, the weight Wij = 0.
The adjacency weight between two points acts as a penalty factor, assigning greater weight to points that are closer and smaller weight to points that are farther apart.
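The graph construction described in this subsection can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the heat kernel is taken in its common form Wij = exp(−‖Xi − Xj‖² / t), the graph is symmetrized (an edge exists if either point lists the other as a neighbor), and the function name `build_heat_kernel_graph` is hypothetical.

```python
import numpy as np


def build_heat_kernel_graph(X, k=8, t=1.0):
    """Return (S, W) for samples X of shape (n_samples, n_features).

    S[i, j] = 1 if X[j] is among the k nearest neighbours of X[i]
    (or vice versa, by the symmetrisation assumed here), else 0.
    W[i, j] = exp(-||X[i] - X[j]||^2 / t) where S[i, j] = 1, else 0.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Squared Euclidean distance between every pair of samples.
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)

    # A point is never its own neighbour.
    np.fill_diagonal(d2, np.inf)
    nn = np.argsort(d2, axis=1)[:, :k]

    S = np.zeros((n, n), dtype=int)
    S[np.arange(n)[:, None], nn] = 1
    # Assumption: treat edges as undirected.
    S = np.maximum(S, S.T)

    np.fill_diagonal(d2, 0.0)
    W = np.where(S == 1, np.exp(-d2 / t), 0.0)
    return S, W
```

The dense pairwise distance matrix is adequate for a few hundred samples; a tree-based or approximate neighbor search would be the natural substitute for larger sets.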
4.3. Determination of Feature Evaluation Indicator
Based on the established adjacency graph, the Laplacian-based indicator Lr is defined for each feature to achieve a balanced assessment of clustering and separability:
where fri denotes the r-th feature value of the i-th sample, frj denotes the r-th feature value of the j-th sample, and Var(fr) represents the global variance of the r-th feature fr. A larger variance indicates stronger separability among feature values.
Equation (5) embodies the concept of “local clustering–global separability”: a small numerator indicates strong local clustering of the feature within neighborhoods, while a large denominator indicates a dispersed global distribution, reflecting good separability. A smaller Lr value implies that the feature performs better in maintaining local consistency while distinguishing global states, and is therefore more representative of the underlying data structure.
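As a rough illustration of the ratio just described, the sketch below computes Lr as the weighted sum of squared neighbor differences divided by the global variance, i.e. Lr = Σij (fri − frj)² Wij / Var(fr). This form is inferred from the numerator and denominator described in the text; any constant factor or alternative variance definition in the original Equation (5) is not reproduced, and the function name is hypothetical.

```python
import numpy as np


def laplacian_indicator(F, W):
    """Compute one indicator value per feature.

    F: feature matrix of shape (n_samples, n_features).
    W: graph weight matrix of shape (n_samples, n_samples).
    Returns an array L of shape (n_features,); smaller is better.
    """
    F = np.asarray(F, dtype=float)
    W = np.asarray(W, dtype=float)
    L = np.empty(F.shape[1])
    for r in range(F.shape[1]):
        f = F[:, r]
        # Numerator: squared differences between neighbouring samples.
        diff2 = (f[:, None] - f[None, :]) ** 2
        numerator = np.sum(diff2 * W)
        # Denominator: global (population) variance of the feature.
        denominator = np.var(f)
        if denominator == 0.0:
            raise ValueError(f"feature {r} is constant; indicator undefined")
        L[r] = numerator / denominator
    return L
```

A feature that varies little between neighbors but widely across the whole sample set receives a small value, matching the "local clustering, global separability" reading of the indicator.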
4.4. Sound–Vibration Feature Weighted Fusion
In this study, six basic acoustic and vibration features are extracted from sound–vibration samples (see Section 5.1 for details). The Laplacian indicator Lr of each feature is calculated according to Equation (5), and its reciprocal is used as the feature weighting coefficient:
where Ar represents the weighting coefficient of the r-th feature. Based on these coefficients, the sound–vibration fusion indicator is defined as:
where LWVAF(i) denotes the LWVAF value of the i-th sample, and R is the total number of features. Equation (7) achieves weighted fusion under the manifold-preserving constraint, ensuring that the fused feature both reflects the continuous evolution of rail conditions and maintains good discriminative ability.
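The weighting and fusion steps can be sketched as below, assuming Ar = 1 / Lr and LWVAF(i) = Σr Ar · fri summed over the R features. The text does not state whether the features are rescaled or the weights normalized before summation, so the sketch applies the reciprocal weights directly to the supplied feature values; the function name `lwvaf` is hypothetical.

```python
import numpy as np


def lwvaf(F, L):
    """Fuse features into one indicator value per sample.

    F: feature matrix of shape (n_samples, R).
    L: Laplacian indicator values, shape (R,).
    Returns (fused, A): fused values of shape (n_samples,) and weights A.
    """
    F = np.asarray(F, dtype=float)
    L = np.asarray(L, dtype=float)
    # Weighting coefficient: reciprocal of the indicator, so that
    # features with smaller L_r contribute more.
    A = 1.0 / L
    # Weighted sum over the R features for every sample.
    fused = F @ A
    return fused, A
```

Because the six features carry different physical units, any rescaling applied before this step would change the relative contribution of each term and should be treated as part of the method definition.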
4.5. Method Workflow and Implementation
To illustrate the overall framework of the proposed method, Figure 6 presents the workflow of the sound–vibration fusion approach based on Laplacian manifold learning, which constitutes the implementation process of the sound–vibration fusion indicator LWVAF.
To ensure that the choice of neighborhood size k and kernel scale t does not bias the Laplacian-based LWVAF computation, we performed a full sensitivity analysis covering k ∈ {5, 8, 10, 12, 15, 20} and t ∈ {0.2, 0.5, 1.0, 2.0}·σ². As detailed in Appendix A, both the between-section standard deviation of LWVAF and the mean reduction rate after grinding exhibit extremely small numerical variations across the k × t grid. For over 90% of the parameter combinations, the relative deviation from the baseline setting (k = 8, t = 0.5σ²) stays within ±2%. This confirms that LWVAF is robust with respect to graph-construction hyperparameters, and validates the use of the baseline setting in all experiments of this study.
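A grid sweep of this kind can be organized as in the following sketch. The callable `metric` is a placeholder for whatever summary quantity is being checked (for example, the mean reduction rate of LWVAF after grinding); it is an assumption of this illustration, as is expressing t as a multiple of σ². No values from Appendix A are reproduced here.

```python
import itertools

import numpy as np


def sensitivity_sweep(metric, k_values, t_values, baseline=(8, 0.5), tol=0.02):
    """Evaluate metric(k, t) on a grid and compare with the baseline.

    Returns (deviations, fraction_within), where deviations maps each
    (k, t) pair to its relative deviation from the baseline value and
    fraction_within is the share of pairs with |deviation| <= tol.
    """
    base = metric(*baseline)
    if base == 0:
        raise ValueError("baseline metric is zero; relative deviation undefined")
    deviations = {}
    for k, t in itertools.product(k_values, t_values):
        deviations[(k, t)] = (metric(k, t) - base) / base
    fraction_within = float(np.mean([abs(d) <= tol for d in deviations.values()]))
    return deviations, fraction_within
```

Calling it with the grid listed above evaluates 24 parameter combinations, and `fraction_within` corresponds to the "over 90%" style of statement made in the text.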
5. Results Analysis and Discussion
5.1. Selection of Basic Features
Feature extraction is a key step to mine state information from raw acoustic and vibration signals. Since rail corrugation affects both the structural vibration of wheel–rail contact and acoustic radiation, six basic features are extracted separately from the acoustic and vibration signals to describe rail conditions across different physical dimensions. These six features include: root mean square (RMS) of sound pressure, RMS of vibration, standard deviation of sound pressure, standard deviation of vibration, spectral energy of sound pressure, and spectral energy of vibration.
Let the raw acoustic signal be p(t) and the vibration signal be a(t). The signals are first processed by band-pass filtering, with a passband of 20 Hz–20 kHz for acoustic signals and 1–5 kHz for vibration signals. The continuous acoustic and vibration signals are then sampled at 51.2 kHz and segmented using a 1 s time window. Each window is treated as an independent data sample. After discrete sampling, the acoustic and vibration data are represented as p = [p(1), p(2), …, p(k), …, p(K)]T and a = [a(1), a(2), …, a(k), …, a(K)]T, where K is the number of samples in each data window.
The six basic features are defined as follows:
where x(k) represents either p(k) or a(k), Rp denotes the RMS of the acoustic signal, and Ra denotes the RMS of the vibration signal. This metric reflects the average signal energy and characterizes the overall amplitude variation induced by rail corrugation.
where x(k) represents p(k) or a(k), x̄ is the mean of x(k), Dp denotes the standard deviation of the acoustic signal, and Da denotes the standard deviation of the vibration signal. This metric characterizes signal dispersion, reflecting the random or periodic fluctuations of rail conditions. The acoustic SD represents the variations in the sound field, while the vibration SD reflects the fluctuation of the track structure response.
where X(f) is the Fourier spectrum of x(k), and f corresponds to the dominant frequency range of corrugation. Based on the analysis in Section 5.1, the range is 400–600 Hz. Ep denotes the spectral energy of the acoustic signal, and Ea denotes that of the vibration signal. Spectral energy reflects the concentration of energy in the corrugation frequency band, which is an important indicator for characterizing the strength of rail surface excitation.
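For reference, the three feature types described above can be written in their usual textbook form, with x standing for either p or a. These expressions are assumed standard definitions consistent with the symbols in the text, not a reproduction of the original equations; in particular, the 1/K normalization of the standard deviation and the discrete sum over the band are assumptions.

```latex
\begin{align}
R_x &= \sqrt{\frac{1}{K}\sum_{k=1}^{K} x(k)^2} \\
D_x &= \sqrt{\frac{1}{K}\sum_{k=1}^{K}\bigl(x(k)-\bar{x}\bigr)^2},
\qquad \bar{x} = \frac{1}{K}\sum_{k=1}^{K} x(k) \\
E_x &= \sum_{f \,\in\, [400,\,600]\ \mathrm{Hz}} \lvert X(f)\rvert^2
\end{align}
```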
The six extracted features form the feature vector for the i-th sample Fi= [Rp, Ra, Dp, Da, Ep, Ea], and the feature matrix for all samples is F = [F1, F2, …, FI]T, where I is the total number of samples. The first four features are general indicators based on signal energy and variability, not specifically designed for corrugation, and they reflect overall signal intensity and fluctuation in the time domain, forming the basis for studying the internal manifold structure of the data. The last two features are selected based on the physical mechanism of rail corrugation. Since rail corrugation exhibits a distinct dominant frequency range and measured results show a significant energy decrease after grinding, using the energy in this frequency band as a feature effectively enhances the model’s sensitivity to rail condition changes.
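The construction of the feature matrix can be sketched as follows, assuming the signals have already been band-pass filtered. The sampling rate, the 1 s window, and the 400–600 Hz band are taken from the text; the function names, the use of the population standard deviation, and the unscaled FFT magnitude are assumptions of this illustration.

```python
import numpy as np

FS = 51_200  # sampling rate in Hz, as stated in the text


def segment(signal, fs=FS, window_s=1.0):
    """Split a 1-D signal into non-overlapping windows of window_s seconds."""
    K = int(fs * window_s)
    n = len(signal) // K
    return np.asarray(signal[: n * K], dtype=float).reshape(n, K)


def window_features(x, fs=FS, band=(400.0, 600.0)):
    """Return (RMS, standard deviation, band energy) of one window."""
    x = np.asarray(x, dtype=float)
    rms = np.sqrt(np.mean(x ** 2))
    sd = np.std(x)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    energy = np.sum(np.abs(spectrum[in_band]) ** 2)
    return rms, sd, energy


def feature_matrix(p, a, fs=FS):
    """Build rows [Rp, Ra, Dp, Da, Ep, Ea], one per 1 s window."""
    rows = []
    for pw, aw in zip(segment(p, fs), segment(a, fs)):
        Rp, Dp, Ep = window_features(pw, fs)
        Ra, Da, Ea = window_features(aw, fs)
        rows.append([Rp, Ra, Dp, Da, Ep, Ea])
    return np.array(rows)
```

Each row follows the ordering of Fi given above, so the returned array has one row per sample and six columns.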
5.2. Feature Weight Analysis
Although five experimental sessions were conducted, only the two sets of data corresponding to the rail conditions before and after grinding for each section were selected for manifold learning analysis, as shown in Table 2. Manifold learning assumes that, to a certain extent, the proximity between samples determines their distribution in the low-dimensional space. Therefore, by selecting the data before and after rail grinding, it is possible to effectively evaluate whether the features can reflect the continuous changes in rail conditions in a targeted manner.
A total of 790 samples were selected from Table 2 to construct the feature matrix F of size 6 × 790. Based on the theoretical method in Section 4 and Equation (5), the feature evaluation results are summarized in Table 3:
According to the feature evaluation results in Table 3, the spectral energy of vibration (Ea) has the smallest Laplacian indicator, indicating that it contributes the most to the manifold structure of the samples and is the most discriminative for states before and after grinding. From a physical perspective, this result aligns with the excitation mechanism of rail corrugation. Rail corrugation is a typical structural irregularity, primarily causing high-frequency vibration excitation through the wheel–rail contact interface. This excitation propagates along the rail and wheel structure, significantly enhancing the vibration spectrum energy within the characteristic frequency band of 400–600 Hz. After grinding, due to the reduction in rail surface irregularity, the energy in this band decreases significantly, making the vibration spectral energy the most direct and sensitive indicator of rail surface corrugation.
In contrast, the RMS of vibration (Ra) also reflects the overall structural vibration intensity, but its time-domain averaging property reduces sensitivity to local frequency band variations. The spectral energy of sound (Ep) and RMS of sound (Rp) reflect the acoustic radiation effect of corrugation, but their stability and representativeness are slightly lower than the vibration response due to the influence of vehicle structure, carbody acoustic coupling, and environmental noise.
The standard deviations of sound and vibration (Dp and Da) mainly characterize signal fluctuation and non-stationarity, responding relatively indirectly to rail condition changes; hence, their importance in the feature weight analysis is lower.
5.3. Trend Analysis of LWVAF
According to Equation (7), the LWVAF for each month is calculated as:
Figure 7 shows the LWVAF trends from January to May for the eight sections. In the figure, the temporal variations of LWVAF values for different sections can be observed. Specifically, a rise in the LWVAF reflects deterioration of the rail condition and a fall reflects improvement, thereby indicating the state changes before and after grinding. Analysis of the trend plots reveals several key points:
Significant changes after rail grinding: After grinding, the LWVAF values of all eight sections show a noticeable decrease, which is consistent with expectations and indicates that the grinding treatment significantly improved the rail surface corrugation conditions.
Different trends across sections: Some sections exhibit relatively mild LWVAF trends reflecting slower rail degradation (e.g., S03, S04, S05), while other sections show more pronounced LWVAF variations indicating more significant rail condition changes (e.g., S01, S02, S06, S07, S08).
The LWVAF trend changes shown in Figure 7 not only reflect the temporal evolution of rail conditions but also validate the effectiveness of the Laplacian manifold learning–based feature fusion in characterizing rail corrugation. In particular, after corrugation maintenance, observing the LWVAF changes for each section allows one to relate the corrugation evolution to grinding operations, providing an important basis for dynamic monitoring and assessment of rail conditions.
5.4. Comparative Analysis
To further verify the advantage of the vibro-acoustic fusion indicator LWVAF over single acoustic or vibration indicators in rail condition characterization, this section presents both trend visualization and quantitative comparison of the three types of indicators.
First, from the perspective of trend changes, Figure 8 shows the variations of the RMS of sound pressure, RMS of vibration, and the LWVAF for the typical Section S01 before and after rail grinding. It can be clearly observed that both the sound and vibration RMS indicators decrease to some extent after grinding, but their changes differ; in contrast, the LWVAF shows a more pronounced downward trend with smaller fluctuations.
To quantitatively compare the sensitivity of each indicator to rail condition changes, the indicator change rate is defined as:
where IDbefore and IDafter denote the indicator values before and after grinding, respectively. A negative Δr indicates that the indicator value decreases after grinding, reflecting improvement in the rail surface condition. The larger the absolute value of Δr, the higher the indicator’s sensitivity to rail condition changes and, therefore, its characterization capability. Conversely, a smaller absolute Δr indicates lower sensitivity and weaker representation of rail condition changes.
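The change rate can be computed as in the short sketch below, assuming the usual relative-change form Δr = (IDafter − IDbefore) / IDbefore × 100%, which yields a negative value when the indicator drops after grinding, as the text describes. The function name is hypothetical.

```python
def change_rate(id_before, id_after):
    """Percentage change of an indicator after grinding.

    Negative values mean the indicator decreased after grinding.
    """
    if id_before == 0:
        raise ValueError("id_before must be non-zero")
    return (id_after - id_before) / id_before * 100.0
```

For instance, an indicator that halves after grinding gives `change_rate(2.0, 1.0) == -50.0`.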
From Table 4, it can be observed that all three types of indicators across the sections show negative changes after grinding, indicating that the rail grinding operation generally reduces the signal energy features. Among them, the change rate of the vibration RMS indicator is relatively unstable, with absolute values ranging from 0.79% to 73.6%. The sound RMS indicator is comparatively more stable than vibration, but in some sections the absolute change rate is relatively small; for example, the right rail of S03 shows an absolute change rate of 19.8%, failing to exhibit a pronounced variation. In contrast, the absolute change rates of the LWVAF indicator exceed 50% across all sections, significantly higher than those of the individual sound or vibration indicators, demonstrating that the fused feature is more sensitive in capturing changes in rail conditions.
Its superior performance essentially arises from two key reasons. First, acoustic signals represent the radiated response of rail corrugation excitation in the air medium, while vibration signals directly reflect the structural response of the wheel–rail system and are therefore more sensitive to track surface irregularities. The LWVAF indicator integrates energy-related features from both acoustic and vibration domains, capturing complementary information from the two physical channels. Second, the fusion process leverages manifold learning theory to exploit the intrinsic geometric relationships within the data. Through data-driven weight learning, feature dimensions that are strongly correlated with track-state variations are amplified, whereas noise-related dimensions are suppressed. As a result, LWVAF enhances state-discrimination capability while preserving the intrinsic consistency of the samples.
5.5. Feasibility of Incremental or Adaptive Manifold Updating
The graph used in this study is constructed by k-nearest neighbors and heat-kernel weights, both of which are inherently local operators. For a newly acquired sample, only its own neighborhood relations—and the relations of a small set of points that include it in their kNN list—need to be updated. The rest of the graph remains unchanged. This locality property implies that the underlying neighborhood graph can be maintained in an incremental manner without rebuilding the entire structure.
Since the LWVAF indicator depends only on local feature differences weighted by the current graph structure, its value can be updated by refreshing the local neighborhoods and their associated weights. No global recomputation of all pairwise relations is required. This property makes the LWVAF framework naturally compatible with adaptive or streaming data settings.
Depending on the continuity and rate of incoming data, several practical updating strategies are feasible:
Incremental updates: update graph and LWVAF only when new samples arrive.
Mini-batch updates: process data in short windows (e.g., 5–10 min).
These schemes require minimal computational overhead because only local modifications are made.
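The incremental strategy can be illustrated with the following sketch, which keeps a k-nearest-neighbor list per sample and touches only the lists affected by a new arrival. It is a simplified illustration under stated assumptions, not the authors' implementation: the class name is hypothetical, the heat kernel is taken as exp(−d² / t), and distances from the new sample to all stored samples are still computed, so the cost per update is linear in the number of samples rather than quadratic.

```python
import numpy as np


class IncrementalKnnGraph:
    """Per-sample k-nearest-neighbour lists updated one sample at a time."""

    def __init__(self, k=8, t=1.0):
        self.k = k
        self.t = t
        self.X = []        # stored samples
        self.nbr_idx = []  # neighbour indices of each sample
        self.nbr_d2 = []   # matching squared distances

    def add(self, x):
        """Insert sample x; return indices of existing samples whose lists changed."""
        x = np.asarray(x, dtype=float)
        new = len(self.X)
        if new:
            d2 = np.sum((np.stack(self.X) - x) ** 2, axis=1)
        else:
            d2 = np.empty(0)

        # Neighbourhood of the new sample itself.
        order = np.argsort(d2)[: self.k]
        self.X.append(x)
        self.nbr_idx.append([int(j) for j in order])
        self.nbr_d2.append([float(d2[j]) for j in order])

        # Existing samples change only if the new point is closer than
        # their current farthest neighbour (or their list is not full).
        touched = []
        for i in range(new):
            idx, dist = self.nbr_idx[i], self.nbr_d2[i]
            if len(idx) < self.k or d2[i] < max(dist):
                if len(idx) >= self.k:
                    worst = int(np.argmax(dist))
                    idx.pop(worst)
                    dist.pop(worst)
                idx.append(new)
                dist.append(float(d2[i]))
                touched.append(i)
        return touched

    def weights(self, i):
        """Heat-kernel weights of sample i's current neighbours."""
        return {j: float(np.exp(-d / self.t))
                for j, d in zip(self.nbr_idx[i], self.nbr_d2[i])}
```

The list returned by `add` identifies the samples whose local terms would need refreshing; in a mini-batch setting the same call can simply be repeated for every sample in the window.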
While the locality of the graph structure enables incremental updates, several engineering considerations remain:
Drift in long-term data distributions may require occasional reinitialization of the neighborhood graph.
Extreme outlier samples may distort local neighborhoods and should be detected prior to updating.
Computation on edge devices may restrict update frequency, requiring simplified neighborhood search.
Overall, the graph structure and LWVAF formulation used in this study possess natural mathematical and computational properties that make them suitable for incremental or adaptive manifold updating in long-term monitoring systems.
6. Conclusions
A manifold-learning-based acoustic–vibration fusion method for track-state representation was developed. By applying Laplacian manifold analysis, the adjacency relationships among acoustic–vibration samples were constructed, enabling the characterization of their distribution in a low-dimensional manifold space. From the perspective of feature-space locality, the method reveals the latent clustering properties of track-state features. This framework provides theoretical support for multimodal feature fusion, allowing acoustic and vibration information to be represented within a unified manifold space.
An acoustic–vibration fusion index, LWVAF, based on Laplacian weighting was proposed. Six fundamental features—acoustic RMS, vibration RMS, acoustic standard deviation, vibration standard deviation, acoustic spectral energy, and vibration spectral energy—were adaptively weighted through the Laplacian weighting mechanism to form the fusion index LWVAF. The results demonstrate that Laplacian weights effectively reflect the importance of each feature within the manifold structure, enabling adaptive feature-level fusion and overcoming the limitations of conventional subjective weighting approaches.
The superiority of the fusion index in track-state representation was validated. Trend analysis and change-rate evaluation across eight experimental sections show that LWVAF exhibits more pronounced trend variations and consistently larger absolute change rates before and after rail grinding. Compared with using acoustic or vibration indicators alone, LWVAF is more sensitive and stable in capturing changes in track condition. This confirms the robustness and enhanced representational capability of the proposed fusion method.
This study establishes a quantitative evaluation method for track condition characterization. By constructing a quantitative mapping between acoustic–vibration features and track states based on the LWVAF index, the proposed approach provides a measurable basis for assessing rail grinding effectiveness. The method exhibits strong robustness and engineering applicability under diverse operating conditions, and it holds promising potential for integration into onboard real-time diagnostic systems. It can also be extended to long-term track health monitoring, offering particular advantages in scenarios with limited prior knowledge. Leveraging its unified acoustic–vibration characterization framework and adaptive fusion capability, the method can further be applied to localized track defect detection. In addition to urban rail systems, the approach is equally applicable to more complex railway networks such as high-speed and heavy-haul railways.