1. Introduction
Railway transportation continues to be a fundamental mode for both passenger and freight movement, where safety, reliability, and operational continuity are of primary importance. Railway wagons are subjected to demanding operating conditions that include cyclic loading, continuous vibration, thermal variations, and long service duration [1,2]. Over time, these conditions contribute to the gradual deterioration of safety-critical components such as axle bearings, wheelsets, braking mechanisms, and suspension elements. If not detected at an early stage, such degradation can escalate into severe failures, posing risks to operational safety and resulting in substantial economic losses [3]. Conventional maintenance practices in railway systems are largely based on periodic inspections or predefined mileage thresholds. Although these approaches are widely adopted, they provide only a limited representation of the actual conditions of wagon components during operation. Faults that develop between inspection intervals may remain undetected, while components that are still in acceptable condition may be replaced prematurely. These shortcomings have led to increased interest in condition-based monitoring (CBM), which emphasizes maintenance decisions based on the observed operating condition of equipment rather than fixed schedules. The availability of low-cost sensors and advances in networked sensing technologies have enabled continuous acquisition of operational data from railway wagons. Sensor-based monitoring systems allow parameters such as vibration, temperature, and acoustic response to be observed during normal service, providing valuable insights into component health [4]. When such data are systematically analyzed, deviations from normal behavior can be identified at an early stage, supporting timely maintenance actions and enhancing overall system safety. Despite the growing adoption of sensor-based monitoring in railway applications, much of the existing research remains limited to offline analysis or focuses on individual components under controlled conditions. Furthermore, several studies emphasize fault detection performance without adequately addressing system-level integration, real-time health assessment, or safety-oriented decision support in operational environments [5]. These limitations indicate the need for a comprehensive monitoring framework that integrates sensing, data processing, and health assessment in a manner suitable for railway wagon operations.
This paper addresses the above-mentioned challenges by presenting a sensor-based health assessment framework for railway wagons aimed at safety-critical applications. The framework has been evaluated using a publicly available run-to-failure benchmark dataset and has not been deployed on physical IoT devices or embedded platforms. The primary focus of this study is to validate the underlying health assessment methodology in a controlled analytical setting; it does not address communication protocols, edge-cloud distribution, or IoT system deployment. Implementation on real hardware and practical field deployment are considered future extensions of this research.
The remainder of this paper is organized as follows. Section 2 presents a brief review of related studies on condition-based monitoring and health assessment techniques. Section 3 describes the problem formulation, defines the key concepts used in the proposed framework, and presents the algorithmic decision logic (Algorithm 1). The proposed sensor-based health assessment model and its operational workflow are detailed in Section 4. Section 5 describes the experimental setup, the dataset, and the performance evaluation metrics. Section 6 presents and discusses the results. Finally, Section 7 concludes the paper and outlines potential directions for future work.
2. Related Work
This section reviews existing research related to condition-based monitoring and health assessment of railway and rotating mechanical components [6,7,8,9,10,11]. The discussion focuses on commonly adopted monitoring strategies, including statistical threshold-based methods, feature-based classification approaches, and health indicator-driven assessment techniques.
Condition-based monitoring (CBM) has been extensively studied for improving safety and maintenance efficiency in railway systems and other rotating machinery. Early research predominantly relied on statistical and rule-based techniques, where sensor signals such as vibration and temperature were compared against predefined thresholds to detect abnormal behavior [12,13,14,15,16]. Jardine et al. [17] provided one of the earliest comprehensive reviews on CBM, highlighting the practicality of threshold-based approaches while also noting their sensitivity to operating variability and noise. Several studies have applied statistical threshold-based condition monitoring (ST-CBM) to railway and bearing systems due to its simplicity and low computational cost. For example, Wang et al. [18] employed vibration amplitude thresholds for bearing fault detection and demonstrated feasibility under controlled conditions. Despite their practical advantages, threshold-based monitoring methods suffer from several limitations. Fixed thresholds must be carefully calibrated for specific operating environments, and their performance degrades under varying loads, environmental disturbances, or measurement noise. In railway applications where wagons experience continuously changing operating conditions, static thresholds may lead to either delayed fault detection or frequent false alarms. These limitations have motivated the exploration of more advanced monitoring techniques that can adapt to complex degradation patterns.
To overcome the limitations of fixed thresholds, feature-based supervised health classification (FSHC) approaches have been widely explored. Samanta and Al-Balushi [19] used time- and frequency-domain features combined with supervised classifiers to detect bearing faults, achieving improved classification accuracy. Similarly, Widodo and Yang [20] applied support vector machines (SVMs) to machinery fault diagnosis and reported better generalization compared to rule-based methods. Although these approaches improve fault detection accuracy, they typically treat each observation independently and focus primarily on classification performance. As a result, they do not explicitly model the temporal evolution of degradation, which is an important aspect of machinery health monitoring, and may fail to capture gradual degradation trends that occur over long operational periods.
Another line of research has focused on instantaneous health indicator-based assessment (IHIA), where multiple features are fused into a single indicator representing component conditions. Yan et al. [21] proposed a health indicator for machinery monitoring that provided intuitive interpretation and improved sensitivity to degradation. Lei et al. [22] further emphasized the importance of health indicators in rotating machinery diagnostics. While health indicator-based approaches provide improved interpretability and sensitivity to degradation, evaluating the indicator instantaneously may lead to false alarms when transient disturbances occur. Without persistence validation or trend analysis, temporary fluctuations in sensor signals can be incorrectly interpreted as degradation events.
With the availability of run-to-failure datasets, degradation trend analysis has gained significant attention. Qiu et al. [23] demonstrated that tracking degradation trends enables earlier fault detection compared to static classification. The NASA IMS bearing dataset [24] has been widely used to study bearing degradation behavior and validate trend-based health assessment models. While these approaches provide valuable insights, many studies focus primarily on prognostics rather than reliable online health state classification. Recent research has also explored CBM in railway-specific contexts. Yang et al. [25] investigated vibration-based monitoring of railway axle bearings and highlighted the challenges associated with variable operating conditions. Similarly, Zakir et al. [26] emphasized the need for robust decision logic to reduce false alarms in railway environments.
Although considerable progress has been made in condition-based monitoring of rotating machinery and railway components, several challenges still remain. Statistical threshold-based methods are simple and computationally efficient, but they rely on fixed thresholds that are often sensitive to noise and changing operating conditions. Feature-based supervised classification techniques improve fault detection accuracy by learning patterns from historical data; however, these methods generally evaluate observations independently and do not adequately capture the gradual degradation behavior of components over time. Health indicator-based approaches provide an interpretable measure of equipment condition and allow continuous monitoring of degradation, but evaluating these indicators at a single time instant may lead to false alarms when temporary disturbances occur. In addition, many existing studies focus mainly on improving classification accuracy or analyzing degradation trends separately, without simultaneously addressing early fault detection, false alarm reduction, and stable decision-making, which are critical for safety-oriented railway monitoring applications. To overcome these limitations, this study proposes a health assessment framework that combines normalized health indicators, degradation trend analysis, and persistence-based decision validation. By integrating these elements within a unified monitoring process, the proposed approach enables more reliable detection of component degradation while minimizing unnecessary alerts, thereby supporting safer and more practical monitoring of railway wagon components during operation.
3. System Overview and Problem Formulation
This section presents the system overview, formal definitions, and the mathematical formulation of the railway wagon health assessment problem. The objective is to model continuous condition monitoring as an optimization problem that supports safety-oriented decision-making under operational constraints.
3.1. System Overview
Consider a railway wagon equipped with multiple sensors mounted on safety-critical components such as axle bearings, wheelsets, braking systems, and suspension units. These sensors continuously collect operational data during wagon movement. The collected data are transmitted through a communication layer to a processing unit where health assessment is performed in near real time.
Let the monitoring system operate over discrete time intervals as $t = 1, 2, \ldots, T$. At each time instant, sensor measurements are used to evaluate the current health conditions of wagon components.
3.2. Definitions and Notation
Let
$\mathcal{C}$ denote the set of monitored components, indexed by $i$.
$\mathcal{S}$ denote the set of sensors deployed on the wagon, indexed by $s$.
$x_s(t)$ represent the measurement obtained from sensor $s$ at time $t$.
$\mathbf{X}(t) = [x_1(t), x_2(t), \ldots, x_M(t)]$ denote the multivariate sensor observation vector, where $M$ is the number of sensors.
Health Indicator: To characterize the condition of each component, a health indicator is constructed from the sensor measurements. The health indicator of component $i$ at time $t$ is defined as
$$H_i(t) = f_i\bigl(\mathbf{X}(t)\bigr), \tag{1}$$
where $f_i(\cdot)$ is a feature aggregation function mapping sensor observations to a scalar health value. The health indicator is normalized to lie within a bounded interval:
$$0 \le H_i(t) \le 1, \tag{2}$$
where $H_i(t) = 1$ represents the normal operating condition and $H_i(t) = 0$ indicates a critical degraded state.
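Health State Classification: The health indicator of each component is mapped to a discrete health state. The health state of component $i$ at time $t$, denoted by $S_i(t)$, is defined as
$$S_i(t) = \begin{cases} 0 \ (\text{normal}), & H_i(t) \ge \theta_1, \\ 1 \ (\text{degraded}), & \theta_2 \le H_i(t) < \theta_1, \\ 2 \ (\text{critical}), & H_i(t) < \theta_2, \end{cases} \tag{3}$$
where $\theta_1$ and $\theta_2$, with $\theta_1 > \theta_2$, are predefined threshold values.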
Sliding Window Length: The parameter $L$ denotes the number of past time instances considered for evaluating the degradation trend. It defines the temporal horizon over which health evolution is analyzed.
Validation Window: The parameter $T_v$ represents the time interval used for confirming sustained degradation. It specifies the length of the decision accumulation period.
Persistence Parameter: The parameter $P$ defines the minimum number of degradation detections required within the validation window to confirm a maintenance alert.
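The threshold-based mapping from a normalized health indicator to a discrete state can be sketched in a few lines. This is a minimal illustration only: the state labels 0, 1, 2, the argument names `theta1` and `theta2`, and the threshold values in the usage line are assumptions made for the example, not values taken from the study.

```python
def classify_state(h, theta1, theta2):
    """Map a normalized health indicator h in [0, 1] to a discrete state.

    Returns 0 (normal), 1 (degraded), or 2 (critical).
    Higher h means healthier, so theta1 must exceed theta2.
    """
    if not 0.0 <= h <= 1.0:
        raise ValueError("health indicator must lie in [0, 1]")
    if theta2 >= theta1:
        raise ValueError("expected theta1 > theta2")
    if h >= theta1:
        return 0  # normal
    if h >= theta2:
        return 1  # degraded
    return 2      # critical


# Hypothetical thresholds, chosen only for illustration.
states = [classify_state(h, theta1=0.7, theta2=0.4) for h in (0.95, 0.55, 0.10)]
print(states)  # [0, 1, 2]
```

Values exactly equal to a threshold are assigned to the healthier state here; the opposite convention would be equally valid and only changes the comparison operators.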
3.3. Problem Formulation
The objective of the monitoring system is to minimize the overall health degradation risk while ensuring timely detection of critical conditions.
Let $w_i$ denote the safety importance weight associated with component $i$, where higher values correspond to more safety-critical components. The aggregate health risk at time $t$ is defined as
$$R(t) = \sum_{i} w_i \bigl(1 - H_i(t)\bigr). \tag{4}$$
The health assessment problem is formulated as a linear optimization model that determines whether maintenance attention is required for each component. Let $z_i(t)$ be a binary decision variable indicating whether component $i$ requires maintenance at time $t$. The objective is to minimize the weighted health degradation while limiting unnecessary maintenance actions:
$$\min_{z_i(t)} \; \sum_{i} w_i \bigl(1 - H_i(t)\bigr)\bigl(1 - z_i(t)\bigr) \tag{5}$$
subject to
$$z_i(t) \le \mathbb{1}\bigl[H_i(t) < \theta_s\bigr] \quad \forall i, \tag{6}$$
$$\sum_{i} z_i(t) \le K, \tag{7}$$
$$z_i(t) \in \{0, 1\} \quad \forall i. \tag{8}$$
Maintenance is triggered only if the health indicator falls below a safety threshold $\theta_s$, as shown in Equation (6). At any time instant, the number of components selected for maintenance is limited as indicated in Equation (7). Equation (8) indicates the binary decision constraint. Here, $K$ represents the maximum number of allowable maintenance actions. Although the above formulation resembles a constrained optimization problem, in this work it serves as a theoretical representation of risk-aware decision-making. The objective is realized implicitly through threshold-based decision logic, risk weighting, and maintenance selection constraints embedded within the health assessment procedure, rather than through an explicit optimization solver.
The proposed formulation models health assessment as a constrained optimization problem that balances safety requirements and operational limitations. By integrating normalized health indicators and component criticality weights, the framework supports systematic and interpretable decision-making suitable for safety-critical railway wagon monitoring applications.
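| Algorithm 1 Proposed Health Assessment Algorithm for Railway Wagon Monitoring |
| Require: Sensor data stream $\mathbf{X}(t)$, health thresholds $\theta_1$, $\theta_2$, $\theta_s$, component weights $w_i$, trend window length $L$, validation window $T_v$, persistence count $P$ |
| Ensure: Health indicator $H_i(t)$, health state $S_i(t)$, confirmed alert decision |
| 1: Initialize health history buffer for each component $i$ | |
| 2: Initialize decision history buffer for each component $i$ | |
| 3: for each time instant $t$ do | |
| 4: Acquire multivariate sensor data $\mathbf{X}(t)$ | |
| 5: Preprocess sensor signals: $\tilde{\mathbf{X}}(t) = \mathcal{P}(\mathbf{X}(t))$ | |
| 6: Extract features from preprocessed signals: $\mathbf{F}(t) = \Phi(\tilde{\mathbf{X}}(t))$ | |
| 7: for each monitored component $i$ do | |
| 8: Compute normalized health indicator: $H_i(t) = g_i(\mathbf{F}(t))$ | |
| 9: Update health history with $H_i(t)$ | |
| 10: if $H_i(t) \ge \theta_1$ then | |
| 11: $S_i(t) \leftarrow 0$ | ▹ Normal |
| 12: else if $H_i(t) \ge \theta_2$ then | |
| 13: $S_i(t) \leftarrow 1$ | ▹ Degraded |
| 14: else | |
| 15: $S_i(t) \leftarrow 2$ | ▹ Critical |
| 16: end if | |
| 17: if health history length $\ge L$ then | |
| 18: Compute degradation trend $\Delta H_i(t)$ over the last $L$ health values (negative when health is declining) | |
| 19: else | |
| 20: $\Delta H_i(t) \leftarrow 0$ | |
| 21: end if | |
| 22: Compute component risk index: $R_i(t) = w_i \bigl(1 - H_i(t)\bigr)$ | |
| 23: if $H_i(t) < \theta_s$ and $\Delta H_i(t) < 0$ then | |
| 24: $z_i(t) \leftarrow 1$ | |
| 25: else | |
| 26: $z_i(t) \leftarrow 0$ | |
| 27: end if | |
| 28: Update decision history with $z_i(t)$ | |
| 29: if $\sum_{k=t-T_v+1}^{t} z_i(k) \ge P$ then | |
| 30: Issue confirmed alert for component $i$ | |
| 31: end if | |
| 32: end for | |
| 33: end for | |
| 34: return Health indicators $H_i(t)$, health states $S_i(t)$, confirmed alerts | |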
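To make the per-component loop of Algorithm 1 more concrete, the following is a minimal Python sketch of one way it could be organized. It is not the authors' C implementation. The class name `ComponentMonitor`, the parameter names, the trend definition (average change per step across the window), and all numeric values are assumptions introduced for illustration; preprocessing and feature extraction are omitted, so the input is already a normalized health value.

```python
from collections import deque


class ComponentMonitor:
    """Per-component threshold, trend, and persistence checks (illustrative)."""

    def __init__(self, theta1, theta2, theta_s, weight, L, T_v, P):
        self.theta1, self.theta2, self.theta_s = theta1, theta2, theta_s
        self.weight = weight
        self.L, self.P = L, P
        self.health = deque(maxlen=L)       # health history buffer
        self.decisions = deque(maxlen=T_v)  # decision history buffer

    def update(self, h):
        """Process one normalized health value h in [0, 1]."""
        self.health.append(h)
        if h >= self.theta1:
            state = 0  # normal
        elif h >= self.theta2:
            state = 1  # degraded
        else:
            state = 2  # critical
        # Assumed trend definition: average change per step over the window.
        if len(self.health) >= self.L and self.L > 1:
            trend = (self.health[-1] - self.health[0]) / (self.L - 1)
        else:
            trend = 0.0
        risk = self.weight * (1.0 - h)  # weighted health deviation
        z = 1 if (h < self.theta_s and trend < 0.0) else 0
        self.decisions.append(z)
        alert = sum(self.decisions) >= self.P  # persistence check
        return {"state": state, "trend": trend, "risk": risk, "z": z, "alert": alert}


# Hypothetical parameters and health trajectory, for illustration only.
monitor = ComponentMonitor(theta1=0.7, theta2=0.4, theta_s=0.5,
                           weight=1.0, L=3, T_v=4, P=3)
series = [0.9, 0.8, 0.45, 0.9, 0.48, 0.44, 0.40, 0.35]
alerts = [monitor.update(h)["alert"] for h in series]
print(alerts)
```

In this run the isolated dip to 0.45 produces a single detection but no confirmed alert, while the sustained decline at the end accumulates three detections within the four-sample validation window and triggers the alert on the last sample.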
Numerical Illustration of Risk-Aware Maintenance Selection
To clarify how the threshold-based decision logic and maintenance constraints operationalize the objective defined in Equations (4)–(8), a numerical example is presented.
Consider a wagon with four monitored components at time $t$. Let the normalized health indicators and safety weights be given as shown in Table 1.
The weighted risk contribution of each component is computed as $R_i(t) = w_i \bigl(1 - H_i(t)\bigr)$.
The resulting risk values are
The aggregate degradation risk before maintenance is therefore
Assume a maintenance threshold
and a resource constraint
. According to the threshold condition, only components satisfying
are eligible for maintenance. Thus, components C3 and C4 are considered.
Since only one maintenance action is allowed (
), the component with the highest weighted risk among the eligible set is selected. Here, C3 has the highest risk value (
) and is therefore chosen. Assuming maintenance restores the health indicator to
, its risk contribution becomes zero. The new aggregate risk is
The reduction in aggregate risk is
This reduction corresponds to the largest removable weighted risk contribution under the constraint
.
Therefore, the threshold-based and risk-weighted selection mechanism achieves the maximum immediate reduction in aggregate degradation risk while satisfying operational constraints. This demonstrates that the implemented decision logic provides a computationally efficient realization of the constrained risk minimization objective formulated in Equations (4)–(8). This illustrative example reflects the operational behavior of Algorithm 1, where risk-weighted threshold evaluation and maintenance capacity constraints collectively guide component selection in accordance with the formulated objective.
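The selection rule used in this illustration can be sketched as follows. The numbers below are hypothetical and are not the entries of Table 1; the function name `select_for_maintenance`, the dictionary layout, and the threshold value are likewise assumptions made for the example.

```python
def select_for_maintenance(health, weights, theta_s, K):
    """Pick at most K eligible components with the largest weighted risk.

    health  : dict of component name -> normalized health indicator in [0, 1]
    weights : dict of component name -> safety importance weight
    Returns (chosen components, aggregate risk before, aggregate risk after),
    assuming maintenance restores the chosen components to full health.
    """
    risk = {c: weights[c] * (1.0 - health[c]) for c in health}
    eligible = [c for c in health if health[c] < theta_s]
    chosen = sorted(eligible, key=lambda c: risk[c], reverse=True)[:K]
    before = sum(risk.values())
    after = before - sum(risk[c] for c in chosen)
    return chosen, before, after


# Hypothetical values for illustration only.
health = {"C1": 0.90, "C2": 0.75, "C3": 0.40, "C4": 0.50}
weights = {"C1": 1.0, "C2": 2.0, "C3": 3.0, "C4": 2.0}
chosen, before, after = select_for_maintenance(health, weights, theta_s=0.6, K=1)
print(chosen, round(before, 2), round(after, 2))  # ['C3'] 3.4 1.6
```

Because the objective is a sum of independent per-component terms and the only coupling is the limit on the number of selections, ranking the eligible components by weighted risk and taking the top $K$ gives the largest possible immediate risk reduction, which is why no explicit solver is needed for this form of the problem.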
4. Proposed IoT-Based Health Assessment Model
The overall methodology of the proposed health assessment framework for railway wagons is illustrated in Figure 1. To improve clarity and interpretability, the framework is organized into four logical layers: sensing, data processing, health evaluation, and decision validation. The sensing layer acquires operational signals from safety-critical components. The data processing layer performs signal preprocessing and feature extraction to obtain representative diagnostic features. The health evaluation layer constructs a normalized health indicator and evaluates degradation trends over a sliding time window. Finally, the decision validation layer applies threshold evaluation and temporal persistence verification to generate reliable alerts. This layered representation provides a structured view of the relationship between signal acquisition, health modeling, and decision logic, thereby clarifying the methodological flow of the proposed approach. The detailed operational steps corresponding to this architecture are described in the subsequent subsections and summarized in Algorithm 1.
This section describes the proposed health assessment model for railway wagons in a sequential manner, detailing each operational stage from data acquisition to decision-making. The model is designed to support continuous condition evaluation of safety-critical components during normal wagon operation.
- Step 1: Sensor Data Acquisition
Safety-critical wagon components are instrumented with sensors that continuously measure operational parameters such as vibration and temperature. Let $x_s(t)$ denote the measurement obtained from sensor $s$ at time instant $t$. The complete set of observations at time $t$ is represented as
$$\mathbf{X}(t) = [x_1(t), x_2(t), \ldots, x_M(t)],$$
where $M$ is the total number of deployed sensors.
- Step 2: Signal Preprocessing
The raw sensor measurements are subject to environmental disturbances and measurement noise. To improve data quality, preprocessing operations including filtering, normalization, and segmentation are applied. The preprocessed signal vector is denoted as
$$\tilde{\mathbf{X}}(t) = \mathcal{P}\bigl(\mathbf{X}(t)\bigr),$$
where $\mathcal{P}(\cdot)$ represents the preprocessing function.
- Step 3: Feature Extraction
Relevant features are extracted from the preprocessed sensor signals to capture component condition characteristics. The feature extraction process transforms the sensor signals into a compact feature vector:
$$\mathbf{F}(t) = \Phi\bigl(\tilde{\mathbf{X}}(t)\bigr),$$
where $\Phi(\cdot)$ denotes the feature mapping function.
- Step 4: Health Indicator Construction
For each monitored component, a scalar health indicator is constructed from the extracted features. The health indicator for component $i$ at time $t$ is defined as
$$H_i(t) = g_i\bigl(\mathbf{F}(t)\bigr).$$
The indicator is normalized such that
$$0 \le H_i(t) \le 1,$$
where higher values correspond to healthier operating conditions.
- Step 5: Health State Evaluation
The health indicator is evaluated against predefined thresholds to determine the operating state of each component. The health state variable $S_i(t)$ is defined as
$$S_i(t) \in \{0, 1, 2\},$$
representing normal, degraded, and critical states, respectively.
- Step 6: Risk-Based Assessment
To account for varying safety importance among components, a risk index is computed as the weighted health deviation. Let $w_i$ denote the safety weight of component $i$. The component-level risk at time $t$ is given by
$$R_i(t) = w_i \bigl(1 - H_i(t)\bigr).$$
The overall wagon risk level is then expressed as
$$R(t) = \sum_{i} R_i(t).$$
- Step 7: Decision and Alert Generation
Based on the evaluated health states and risk levels, maintenance decisions are determined. A binary decision variable $z_i(t)$ is defined as
$$z_i(t) = \begin{cases} 1, & H_i(t) < \theta_s, \\ 0, & \text{otherwise}. \end{cases}$$
Alerts are generated according to the severity of the detected condition, enabling timely intervention for safety-critical components.
- Step 8: Temporal Validation
To avoid false alerts due to transient disturbances, a temporal validation rule is applied. A decision is confirmed only if the degradation condition persists over a time window of length $T_v$:
$$\sum_{k=t-T_v+1}^{t} z_i(k) \ge P,$$
where $P$ represents the minimum number of detections required within the window for confirmation.
The proposed health assessment model follows a structured sequence of sensing, processing, condition evaluation, and decision-making. By integrating multi-sensor data with risk-based assessment and temporal validation, the model supports reliable and safety-oriented monitoring of railway wagon components under real operating conditions.
Algorithm 1 summarizes the complete workflow of the proposed health assessment framework. The algorithm processes continuous sensor data streams through preprocessing and feature extraction stages to compute normalized health indicators for each monitored component (i.e., health indicator is a normalized condition metric). Component health states are determined using predefined thresholds, while degradation trends are evaluated over a sliding window to capture persistent deterioration. A safety-oriented decision logic combines health indicator deviation and trend information to generate maintenance decisions, which are further validated using temporal persistence constraints to reduce false alarms. This structured procedure enables reliable, real-time assessment of railway wagon component health under operational conditions.
5. Experimental Setup and Dataset Description
This section presents the experimental configuration adopted for evaluating the proposed health assessment model, followed by a description of the publicly available dataset used in the study. The selected dataset provides complete degradation trajectories required to validate all stages of the proposed model.
5.1. Experimental Setup
The experimental setup is designed to emulate continuous condition monitoring of safety-critical railway wagon components. Sensor data are processed sequentially to reflect real-time health assessment, where each new observation contributes to the evaluation of component conditions. Raw vibration signals are segmented into fixed-length time windows to enable consistent preprocessing and feature extraction. Signal normalization and noise filtering are applied to reduce the influence of measurement disturbances. Feature vectors derived from each segment are used to compute component-specific health indicators. Health indicators are evaluated at each time step using the methodology described in previous sections. Predefined thresholds are applied to classify component conditions into normal, degraded, and critical states. Degradation trends are evaluated over a sliding window to ensure robustness against transient fluctuations and to support reliable decision-making.
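The segmentation step described above can be illustrated with a short sketch. The non-overlapping windowing, the choice to drop an incomplete final window, and mean removal as the normalization step are assumptions made for this example; the study does not specify these details here.

```python
def segment(signal, window_len):
    """Split a sequence into consecutive non-overlapping windows.

    Samples that do not fill a complete final window are dropped.
    """
    if window_len <= 0:
        raise ValueError("window_len must be positive")
    n = len(signal) // window_len
    return [list(signal[k * window_len:(k + 1) * window_len]) for k in range(n)]


def remove_mean(window):
    """Center a window at zero (one simple form of normalization)."""
    mean = sum(window) / len(window)
    return [v - mean for v in window]


windows = segment(list(range(10)), window_len=4)
print(windows)                  # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(remove_mean(windows[0]))  # [-1.5, -0.5, 0.5, 1.5]
```

Each window would then be passed to feature extraction, so that one health indicator value is produced per window.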
All computational experiments were performed on a standard desktop system equipped with an Intel i7 processor and 16 GB RAM to evaluate algorithmic efficiency. The proposed algorithms were implemented in C (GCC version 9.3.0), while MATLAB (R2021a) was used for data analysis and visualization of results. The obtained performance metrics indicate potential suitability for real-time applications; however, validation on target hardware platforms is necessary to confirm deployment feasibility under practical operating conditions. The simulation and evaluation parameters used in this study are summarized in Table 2.
The thresholds $\theta_1$, $\theta_2$, and $\theta_s$ were predefined to ensure clear separation between normal, degraded, and critical health regions during degradation progression. The temporal parameters $L$, $T_v$, and $P$ were selected to support stable degradation confirmation within the persistence-based validation mechanism. The same parameter values were applied across all comparative methods to ensure fairness and eliminate bias arising from selective parameter adjustment. The focus of this study is on the methodological contribution of the health assessment framework rather than parameter optimization.
5.2. Dataset Description
The datasets used in this study correspond to the degradation of rotating mechanical components such as rolling element bearings (NASA-IMS [24] and PRONOSTIA [27]), turbofan engine components (C-MAPSS [28]), and industrial rotating machinery (MIMII [29]). These components exhibit degradation behavior similar to railway wagon axle bearings and wheelset assemblies, thereby providing realistic validation scenarios for the proposed health assessment framework. The proposed health assessment model is evaluated using the NASA-IMS bearing dataset. This dataset consists of run-to-failure vibration measurements collected from rolling element bearings operated under constant load and speed conditions. The dataset includes multiple experimental runs, each capturing the complete degradation process from healthy operation to bearing failure. Vibration signals are recorded at regular intervals using accelerometers mounted on the bearing housing. The availability of full degradation trajectories makes the dataset suitable for constructing health indicators, evaluating degradation trends, and validating health state classification.
Although the dataset originates from a laboratory test rig, the degradation mechanisms and vibration characteristics closely resemble those observed in railway wagon axle bearings. As a result, the dataset provides a realistic and widely accepted benchmark for evaluating condition-based health assessment methods in safety-critical monitoring applications.
To provide formal evidence supporting the methodological transferability of the NASA-IMS dataset to railway axle-bearing applications, quantitative signal-level analysis was conducted. The degradation progression was evaluated using root mean square (RMS), kurtosis, and frequency-domain spectral energy concentration. The RMS values exhibited a gradual monotonic increase over operational cycles, indicating progressive amplitude growth consistent with rolling element defect evolution. Kurtosis values increased significantly during later degradation stages, reflecting the emergence of impulsive fault signatures commonly reported in axle-bearing diagnostics.
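The two time-domain statistics named above can be computed as in the following sketch. The synthetic signals are assumptions made purely for illustration (a sine wave, and the same sine with periodic spikes standing in for impulsive fault signatures); they are not samples from the NASA-IMS dataset.

```python
import math


def rms(x):
    """Root mean square amplitude of a sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def kurtosis(x):
    """Non-excess kurtosis: fourth central moment over squared variance."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (m2 * m2)


# Synthetic signals, for illustration only.
smooth = [math.sin(2 * math.pi * k / 50) for k in range(200)]
impulsive = list(smooth)
for k in range(0, 200, 50):
    impulsive[k] += 5.0  # periodic spikes mimic impulsive fault signatures

print(round(rms(smooth), 3), round(kurtosis(smooth), 3))
print(rms(impulsive) > rms(smooth), kurtosis(impulsive) > kurtosis(smooth))
```

A pure sine has a kurtosis of 1.5 under this definition, and adding a few large spikes raises both RMS and kurtosis, with kurtosis responding much more strongly. That is the qualitative behavior the paragraph describes for late-stage degradation.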
Additionally, spectral analysis revealed progressive energy amplification near characteristic defect frequency bands as failure approached. These degradation patterns are consistent with vibration characteristics reported in railway axle-bearing studies [25,26]. While the dataset originates from laboratory conditions, the observed statistical and spectral fault evolution dynamics demonstrate structural similarity in degradation progression. Therefore, the dataset is employed as a methodological validation benchmark rather than as a direct railway field replication.
5.3. Multi-State and Binary Classification Performance Evaluation
The proposed framework defines three discrete health states: normal ($S = 0$), degraded ($S = 1$), and critical ($S = 2$). To ensure methodological consistency, performance evaluation is conducted under both three-class and binary settings.
5.3.1. Three-Class Evaluation
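Let $n_{jk}$ denote the number of samples belonging to actual class $j$ and predicted as class $k$, with $j, k \in \{0, 1, 2\}$. The three-class confusion matrix is therefore defined as
$$\mathbf{C} = \begin{bmatrix} n_{00} & n_{01} & n_{02} \\ n_{10} & n_{11} & n_{12} \\ n_{20} & n_{21} & n_{22} \end{bmatrix}.$$
For each class $k$, the following quantities are defined:
$$TP_k = n_{kk}, \qquad FP_k = \sum_{j \ne k} n_{jk}, \qquad FN_k = \sum_{j \ne k} n_{kj}.$$
The class-wise precision, recall, and F1-score are computed as
$$\text{Precision}_k = \frac{TP_k}{TP_k + FP_k}, \qquad \text{Recall}_k = \frac{TP_k}{TP_k + FN_k}, \qquad F1_k = \frac{2\,\text{Precision}_k\,\text{Recall}_k}{\text{Precision}_k + \text{Recall}_k}.$$
To obtain an overall balanced performance measure across all health states, the macro-averaged F1-score is computed as
$$F1_{\text{macro}} = \frac{1}{3} \sum_{k=0}^{2} F1_k.$$
Overall classification accuracy is defined as
$$\text{Accuracy} = \frac{\sum_{k} n_{kk}}{\sum_{j} \sum_{k} n_{jk}}.$$
The false alarm rate for each class is defined as
$$FAR_k = \frac{FP_k}{FP_k + TN_k},$$
where
$$TN_k = \sum_{j \ne k} \sum_{l \ne k} n_{jl}.$$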
The performance of the health assessment model is evaluated using the confusion matrix shown in Table 3. For binary evaluation, the degraded and critical states are combined into a single abnormal class to enable classification focused on early fault detection.
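The class-wise quantities described in this subsection can be derived from a confusion matrix as in the sketch below. The matrix entries are hypothetical and are not the values of Table 3; the function name and the convention of returning zero when a denominator is empty are assumptions made for the example.

```python
def per_class_metrics(cm):
    """Compute per-class metrics from a square confusion matrix.

    cm[j][k] is the number of samples with actual class j predicted as class k.
    Returns (list of per-class dicts, overall accuracy, macro-averaged F1).
    """
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    metrics = []
    for k in range(n_classes):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp
        fp = sum(cm[j][k] for j in range(n_classes)) - tp
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        far = fp / (fp + tn) if fp + tn else 0.0
        metrics.append({"precision": precision, "recall": recall,
                        "f1": f1, "far": far})
    accuracy = sum(cm[k][k] for k in range(n_classes)) / total
    macro_f1 = sum(m["f1"] for m in metrics) / n_classes
    return metrics, accuracy, macro_f1


# Hypothetical counts; rows are actual normal, degraded, critical.
cm = [[50, 3, 0],
      [4, 30, 2],
      [0, 1, 10]]
per_class, accuracy, macro_f1 = per_class_metrics(cm)
print(accuracy)  # 0.9
print(round(per_class[0]["far"], 4), round(macro_f1, 4))
```

The macro average weights the three states equally, so a small critical class influences the score as much as the much larger normal class.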
5.3.2. Binary Evaluation for Early Abnormality Detection
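For safety-oriented abnormal condition detection, degraded and critical states are grouped into a single abnormal class. The mapping is formally defined as
$$y = \begin{cases} 0 \ (\text{normal}), & S = 0, \\ 1 \ (\text{abnormal}), & S \in \{1, 2\}. \end{cases}$$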
Under this mapping, binary performance metrics (accuracy, precision, recall, F1-score, and false alarm rate) are computed using the standard two-class confusion matrix. The inclusion of both three-class and binary evaluations ensures methodological consistency while reflecting practical deployment scenarios focused on early abnormality detection. These metrics jointly provide a comprehensive evaluation of classification performance. While accuracy and precision reflect overall correctness, recall and false alarm rate are particularly important for safety-critical railway wagon monitoring.
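The grouping of degraded and critical states into one abnormal class amounts to summing blocks of the three-class confusion matrix. The sketch below uses hypothetical counts, not the study's results, and the function name `to_binary` is an assumption.

```python
def to_binary(cm3):
    """Collapse a 3x3 confusion matrix (normal, degraded, critical)
    into a 2x2 matrix (normal, abnormal)."""
    group = [0, 1, 1]  # normal -> 0; degraded and critical -> 1
    cm2 = [[0, 0], [0, 0]]
    for j in range(3):
        for k in range(3):
            cm2[group[j]][group[k]] += cm3[j][k]
    return cm2


# Hypothetical counts; rows are actual classes, columns are predictions.
cm3 = [[50, 3, 0],
       [4, 30, 2],
       [0, 1, 10]]
cm2 = to_binary(cm3)
print(cm2)  # [[50, 3], [4, 43]]

recall_abnormal = cm2[1][1] / (cm2[1][0] + cm2[1][1])
false_alarm_rate = cm2[0][1] / (cm2[0][0] + cm2[0][1])
print(round(recall_abnormal, 4), round(false_alarm_rate, 4))
```

Confusions between degraded and critical disappear after the collapse, so binary accuracy is never lower than three-class accuracy on the same predictions.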
6. Results and Discussion
This section presents the experimental results obtained using the NASA-IMS bearing dataset and discusses the performance of the proposed health assessment model. The NASA-IMS dataset is employed as a benchmark run-to-failure dataset for validating the proposed health assessment methodology. While it does not capture all railway-specific operational effects, it represents the degradation behavior of rotating mechanical components that are functionally similar to railway wagon axle bearings. The evaluation is carried out using eight performance metrics that jointly assess classification performance, detection reliability, and safety effectiveness. The proposed model is compared with three representative existing approaches: statistical threshold-based condition monitoring (ST-CBM) [30], feature-based supervised health classification (FSHC) [31], and instantaneous health indicator-based assessment (IHIA) [32].
6.1. Rationale of Existing Methods
ST-CBM [30] represents traditional rule-based condition monitoring approaches in which predefined statistical thresholds are applied directly to sensor features. Although this method is simple and computationally efficient, it is sensitive to noise and operating variations, often leading to higher false alarm rates and delayed detection of gradual degradation. FSHC [31] employs extracted features and supervised classification to identify health states. Compared to ST-CBM, this approach improves classification accuracy by learning decision boundaries from data. However, it generally evaluates observations independently and does not explicitly account for temporal degradation behavior, which limits early fault detection reliability. IHIA [32] constructs a normalized health indicator to represent component conditions at each time instant. This approach provides an interpretable representation of health and improves sensitivity to degradation. Nevertheless, the absence of temporal persistence validation makes the method susceptible to transient disturbances and spurious alerts.
6.2. Classification Performance
This section reports the classification performance of all evaluated methods. All performance metrics reported in this study, including accuracy, precision, recall, F1-score, and false alarm rate, are computed from the corresponding confusion matrices constructed for each experimental run. For three-class evaluation (normal, degraded, critical), a 3 × 3 confusion matrix is used. The proposed model achieves an accuracy of 94.8%, outperforming ST-CBM (84.3%), FSHC (88.6%), and IHIA (91.2%). Improvements in precision, recall, and F1-score further indicate reliable and balanced health state classification. These gains are attributed to the integration of normalized health indicators with temporal validation, which reduces misclassification caused by short-term fluctuations.
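As an illustration of how three-class metrics can be derived from a per-run confusion matrix, the following sketch builds the matrix and computes accuracy together with macro-averaged precision, recall, and F1-score. The class names, the function names, and the use of macro averaging are assumptions; the averaging scheme actually used in the study is not stated here.

```python
LABELS = ("normal", "degraded", "critical")  # assumed class names and order


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Return matrix[i][j]: count of samples with true class i predicted as j."""
    index = {name: k for k, name in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, guess in zip(y_true, y_pred):
        matrix[index[truth]][index[guess]] += 1
    return matrix


def macro_scores(matrix):
    """Accuracy plus macro-averaged precision, recall and F1 (an assumption)."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    precisions, recalls, f1s = [], [], []
    for k in range(size):
        tp = matrix[k][k]
        predicted = sum(matrix[i][k] for i in range(size))  # column sum
        actual = sum(matrix[k])                             # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    correct = sum(matrix[k][k] for k in range(size))
    return {
        "accuracy": correct / total if total else 0.0,
        "precision": sum(precisions) / size,
        "recall": sum(recalls) / size,
        "f1": sum(f1s) / size,
    }
```

Run-wise matrices computed this way can then be averaged across runs to obtain mean values of the kind reported in this section.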
6.2.1. Accuracy Analysis
Accuracy represents the overall correctness of health state classification by measuring the proportion of correctly identified operating conditions. Figure 2 illustrates the accuracy comparison between the proposed method and existing algorithms across multiple public datasets. The proposed framework consistently achieves higher accuracy due to its integrated health indicator normalization and trend-based validation. In contrast, ST-CBM [30] relies on fixed thresholds, which limits adaptability to varying operational conditions. FSHC [31] improves accuracy through feature-based learning but remains sensitive to class boundary ambiguity, while IHIA [32] benefits from normalized indicators but lacks temporal consistency, leading to occasional misclassifications.
6.2.2. Precision Analysis
Precision quantifies the reliability of degradation alerts by measuring the proportion of correctly identified abnormal conditions among all issued alerts. As shown in Figure 3, the proposed method demonstrates higher precision, indicating fewer false maintenance alarms. ST-CBM [30] exhibits lower precision due to its susceptibility to noise-induced threshold crossings. FSHC [31] improves alert reliability by leveraging discriminative features, whereas IHIA [32] achieves moderate gains through health indicator normalization but remains vulnerable to transient disturbances. The proposed framework enhances precision through persistence-based decision logic, ensuring that alerts are only generated for sustained degradation.
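The idea behind persistence-based decision logic can be sketched as follows. This is a simplified illustration, not the authors' implementation: the function name, the threshold of 0.6, the persistence length of 3 cycles, and the convention that a larger health indicator means worse condition are all assumptions.

```python
def persistent_alerts(health_indicator, threshold=0.6, persistence=3):
    """Raise an alert only after `persistence` consecutive threshold crossings.

    Assumes a larger indicator value means a worse condition; both default
    parameter values are hypothetical.
    """
    alerts = []
    run_length = 0  # number of consecutive cycles at or above the threshold
    for value in health_indicator:
        run_length = run_length + 1 if value >= threshold else 0
        alerts.append(run_length >= persistence)
    return alerts
```

With this rule, an isolated spike above the threshold resets to zero on the next cycle and never produces an alert, whereas a purely instantaneous rule would flag it immediately.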
6.2.3. Recall Analysis
Recall, also referred to as the detection rate, measures the ability of a method to correctly identify degraded or faulty conditions. Figure 4 shows that the proposed method consistently achieves higher recall across datasets, demonstrating improved sensitivity to early-stage degradation. ST-CBM [30] often exhibits lower recall due to conservative threshold selection, resulting in missed detections. FSHC [31] improves detection capability but remains limited by static classification boundaries, while IHIA [32] detects faults more effectively but lacks persistence validation. The proposed framework captures degradation trends over time, enabling reliable detection without sacrificing robustness.
6.2.4. F-Measure Analysis
The F-measure provides a balanced evaluation of classification performance by combining precision and recall into a single metric. As illustrated in Figure 5, the proposed method achieves the highest F1-scores with lower variability across datasets. This indicates a balanced trade-off between detection sensitivity and alert reliability. Existing methods tend to emphasize either detection or reliability in isolation, leading to suboptimal F-measure values. The proposed approach effectively balances these competing objectives through integrated health assessment and temporal validation.
6.3. Detection and Safety-Oriented Performance
Detection reliability and safety relevance are evaluated using the false alarm rate, detection latency, and critical condition detection rate. For safety-oriented abnormal condition detection, degraded and critical states are grouped into a single abnormal class and evaluated using a binary confusion matrix. Detection latency is defined as the number of monitoring cycles between the true onset of degradation and the first confirmed detection generated by the proposed method. The latency values reported in this study represent the average across all experimental runs. As shown in Figure 6 and Figure 7, the proposed model achieves a false alarm rate of 4.1%, which is significantly lower than those of the comparative methods. The detection latency is reduced to an average of six operational cycles, enabling earlier identification of degradation while maintaining robustness.
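The latency definition above translates directly into a small computation. The sketch below assumes that each run is described by a known onset cycle and a per-cycle list of confirmed alerts, that alerts raised before the onset are ignored for latency purposes, and that runs with no detection are excluded from the average; these conventions and the function names are illustrative assumptions.

```python
def detection_latency(onset_cycle, alerts):
    """Cycles between the true onset and the first confirmed alert after it.

    Returns None when no alert occurs at or after the onset (missed detection).
    """
    for cycle in range(onset_cycle, len(alerts)):
        if alerts[cycle]:
            return cycle - onset_cycle
    return None


def mean_latency(runs):
    """Average latency over (onset_cycle, alerts) pairs, skipping missed runs."""
    latencies = [detection_latency(onset, alerts) for onset, alerts in runs]
    detected = [value for value in latencies if value is not None]
    return sum(detected) / len(detected) if detected else None
```

For example, an onset at cycle 2 with the first confirmed alert at cycle 5 gives a latency of 3 cycles.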
6.3.1. False Alarm Rate Analysis
The false alarm rate (FAR) quantifies the proportion of normal operating conditions incorrectly classified as faulty. Figure 6 shows that the proposed framework significantly reduces FAR compared to existing algorithms. ST-CBM [30] exhibits the highest FAR due to fixed thresholds that are sensitive to noise and operational variability. FSHC [31] and IHIA [32] reduce FAR to some extent but still produce spurious alerts under transient disturbances. The proposed method minimizes false alarms by incorporating degradation persistence and trend confirmation, improving operational reliability.
6.3.2. Detection Latency Analysis
Detection latency measures the time or number of operational cycles required to identify degradation after its onset. Figure 7 demonstrates that the proposed method achieves lower detection latency across datasets, enabling earlier intervention. ST-CBM [30] typically exhibits delayed detection due to conservative thresholds, while FSHC [31] and IHIA [32] rely on instantaneous classification decisions that may delay reliable confirmation. The proposed framework leverages degradation trend analysis to identify emerging faults earlier without increasing false alarms.
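The exact form of the degradation trend analysis is not detailed in this section. One simple way such a check could be realized is a least-squares slope over a trailing window of the health indicator, as in the sketch below. The window length, the minimum slope, the function names, and the assumption that the indicator rises as condition worsens are all hypothetical.

```python
def window_slope(values):
    """Least-squares slope of the values against their sample index."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def trend_confirmed(health_indicator, window=5, min_slope=0.02):
    """Per-cycle flag: True when the trailing-window slope reaches min_slope.

    Cycles with fewer than `window` samples of history are never flagged.
    """
    flags = []
    for end in range(1, len(health_indicator) + 1):
        recent = health_indicator[max(0, end - window):end]
        flags.append(len(recent) == window and window_slope(recent) >= min_slope)
    return flags
```

A sustained upward drift produces a positive slope within a few cycles, before the indicator necessarily crosses a fixed alarm level, which is one way a trend check can shorten latency relative to conservative thresholds.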
6.3.3. Overall Classification Performance
Overall classification performance reflects the combined effectiveness of a health assessment framework across multiple metrics, including accuracy, precision, and recall, as shown in Figure 8. The overall classification performance was calculated from run-wise confusion matrices using three-class evaluation, and the reported values correspond to the mean performance across three independent degradation runs. The results demonstrate that the proposed method consistently achieves superior and more stable performance across multiple datasets. Existing methods exhibit trade-offs between sensitivity and reliability, whereas the proposed framework provides a balanced and robust solution by integrating health normalization, trend analysis, and persistence-based decision logic.
6.4. Computational Performance
The computational efficiency of the proposed framework is assessed in terms of processing time per assessment cycle. Computational performance was measured as the average time required to process one fixed-length time window through the complete health assessment model. For each window, execution time was recorded from feature extraction to final health state decision, including preprocessing, health indicator computation, and decision logic. The average processing time was computed for each degradation run and then averaged across the three independent runs, with standard deviation calculated to assess variability. Data loading and plotting were excluded so that the reported values reflect only the monitoring computation time.
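The timing procedure described above can be sketched as follows. The `assess` callable stands in for the complete health assessment pipeline and is hypothetical, as are the function names. The sketch also assumes that the standard deviation is taken across the per-run means using the sample formula, which the text does not specify.

```python
import statistics
import time


def time_windows(assess, windows):
    """Return the processing time of each window in milliseconds."""
    times_ms = []
    for window in windows:
        start = time.perf_counter()
        assess(window)  # feature extraction through final health state decision
        times_ms.append((time.perf_counter() - start) * 1000.0)
    return times_ms


def summarize_runs(assess, runs):
    """Mean per-window time for each run, then mean and spread across runs."""
    run_means = [statistics.mean(time_windows(assess, windows)) for windows in runs]
    spread = statistics.stdev(run_means) if len(run_means) > 1 else 0.0
    return statistics.mean(run_means), spread
```

Because the timer wraps only the call to `assess`, data loading and plotting performed elsewhere do not contribute to the measured values.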
Table 4 shows that the proposed model requires approximately
ms per cycle, which is comparable to FSHC and IHIA. Despite the inclusion of temporal validation, the computational overhead remains low, confirming the suitability of the framework for real-time assessment. The proposed framework demonstrates strong potential for deployment in railway wagon monitoring, subject to hardware validation and real-world implementation.
Remark 1. The marginal increase in computational time is mainly due to the additional steps used to analyze degradation trends and verify persistence over time. These steps help distinguish actual component degradation from short-term fluctuations, which is important for avoiding unnecessary alerts in safety-critical systems. Although this introduces a small computational overhead, the processing time remains well within real-time limits. The proposed design intentionally prioritizes reliable and stable decision-making rather than minimizing computation time.
Across all performance metrics, such as accuracy, precision, recall, F1-score, false alarm rate, detection latency, and computational time, the proposed health assessment model demonstrates consistent and balanced improvements over existing approaches. By integrating normalized health indicators, temporal persistence validation, and safety-oriented decision logic, the proposed framework effectively addresses the key limitations of traditional condition monitoring methods. The results confirm its suitability for safety-critical railway wagon health monitoring under real operating conditions.
7. Conclusions and Future Scope
This paper presented a comprehensive health assessment framework for railway wagon monitoring based on continuous condition evaluation of safety-critical components. The proposed approach integrates structured sensor data processing, normalized health indicator construction, temporal degradation analysis, and safety-oriented decision logic to enable reliable health state assessment under real operating conditions. Unlike traditional threshold-based or instantaneous monitoring approaches, the proposed framework emphasizes both the early detection of degradation and robustness against transient disturbances. Experimental evaluation using a publicly available run-to-failure bearing dataset demonstrated that the proposed model consistently outperforms representative existing methods across multiple performance metrics. Improvements were observed in classification accuracy, detection reliability, safety-oriented metrics, and real-time computational performance. The reduction in false alarm rate and detection latency highlights the effectiveness of incorporating temporal validation and trend-based assessment, which are essential for safety-critical railway applications.
The scope for future work includes several promising directions. First, the framework can be extended to incorporate remaining useful life estimation to support predictive maintenance planning and resource optimization. Second, the current health assessment model can be enhanced by integrating adaptive threshold selection to improve robustness under varying operational and environmental conditions. Third, large-scale deployment and validation using real-world railway wagon data will further strengthen the practical applicability of the proposed approach. Finally, the integration of optimization-based maintenance scheduling strategies can enable coordinated decision-making at the fleet level, contributing to improved safety, reliability, and cost efficiency in railway operations.
Author Contributions
S.K.M.G. contributed to the conceptualization of the study, development of the methodology, software implementation, formal analysis, investigation, data curation, and preparation of the original manuscript draft. He also carried out visualization tasks and led the overall technical execution of the work. K.A.N. contributed to the conceptualization of the research and provided supervision throughout the study. Validation of the proposed framework and experimental results was performed collaboratively by S.K.M.G., K.A.N., A.A.C., and N.C. A.A.C. and N.C. contributed to result verification, consistency checks, and critical review of the experimental findings. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
The data used in this study are publicly available benchmark datasets, including the NASA-IMS bearing dataset, PRONOSTIA, C-MAPSS, and MIMII datasets. These datasets can be accessed from their respective public repositories. The data generated during the study are derived from simulation and analytical processing and are available from the corresponding author upon reasonable request.
https://data.nasa.gov/dataset/ims-bearings (accessed on 5 January 2026).
Acknowledgments
During the preparation of this manuscript, the authors used ChatGPT (OpenAI, version GPT-5.3) for language editing and refinement of the manuscript text. The authors reviewed and edited the output and take full responsibility for the content of this publication.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Bešinović, N. Resilience in railway transport systems: A literature review and research agenda. Transp. Rev. 2020, 40, 457–478.
- Zhao, R.; Yan, R.; Chen, Z.; Mao, K.; Wang, P.; Gao, R.X. Deep learning and its applications to machine health monitoring. Mech. Syst. Signal Process. 2019, 115, 213–237.
- Kaur, G.; Moza, B. Exploring railway forensics: Top approaches and future directions. Asian J. Sci. Technol. 2023, 14, 12561–12567.
- Paixão, A.; Fortunato, E.; Calçada, R. Applications of low-cost and smart mobile devices for railway infrastructure performance assessment and characterization. In Digital Railway Infrastructure; Springer Nature: Cham, Switzerland, 2024; pp. 43–61.
- Ye, Y.; Zhang, J.; Liang, H. An acoustic-based recognition algorithm for the unreleased braking of railway wagons in marshalling yards. IEEE Access 2020, 8, 120295–120308.
- Hoelzl, C.; Dertimanis, V.; Landgraf, M.; Ancu, L.; Zurkirchen, M.; Chatzi, E. On-board monitoring for smart assessment of railway infrastructure: A systematic review. In The Rise of Smart Cities; Elsevier: Amsterdam, The Netherlands, 2022; pp. 223–259.
- Duan, Y.; Cao, X.; Zhao, J.; Xu, X. Health indicator construction and status assessment of rotating machinery by spatio-temporal fusion of multi-domain mixed features. Measurement 2022, 205, 112170.
- Wang, X.; Lu, S.; Chen, K.; Wang, Q.; Zhang, S. Bearing fault diagnosis of switched reluctance motor in electric vehicle powertrain via multisensor data fusion. IEEE Trans. Ind. Inform. 2021, 18, 2452–2464.
- Loidolt, M.; Egger, J.; Korenjak, A.K. Data-Driven Condition Monitoring of Fixed-Turnout Frogs Using Standard Track Recording Car Measurements. Appl. Sci. 2025, 15, 11122.
- Kabashkin, I. The Iceberg Model for Integrated Aircraft Health Monitoring Based on AI, Blockchain, and Data Analytics. Electronics 2024, 13, 3822.
- Ibadah, N.; Benavente-Peces, C.; Pahl, M.-O. Securing the Future of Railway Systems: A Comprehensive Cybersecurity Strategy for Critical On-Board and Track-Side Infrastructure. Sensors 2024, 24, 8218.
- Hu, W.; Xin, G.; Wu, J.; An, G.; Li, Y.; Feng, K.; Antoni, J. Vibration-based bearing fault diagnosis of high-speed trains: A literature review. High-Speed Railw. 2023, 1, 219–223.
- Zhu, D.; Lyu, J.; Gao, Q.; Lu, Y.; Zhao, D. Remaining useful life estimation of bearing using spatio-temporal convolutional transformer. Meas. Sci. Technol. 2024, 35, 045126.
- Zhang, W.; Li, C.; Peng, G.; Chen, Y.; Zhang, Z. A deep convolutional neural network with new training methods for bearing fault diagnosis under noisy environment and different working load. Mech. Syst. Signal Process. 2018, 100, 439–453.
- Sanz Bobi, J.D.; Garrido Martínez-Llop, P.; Rubio Marcos, P.; Solano Jiménez, Á.; Fernández, J.G. Prediction of Degraded Infrastructure Conditions for Railway Operation. Sensors 2024, 24, 2456.
- Dinh, T.P.; Le, Q.H.; Thach, T.N.; Kim, B.; Ahn, Y. Railway Track Structural Health Monitoring: Identifying Emerging Trends and Research Agendas Using Bibliometric and Topic Modeling. Appl. Sci. 2025, 15, 12462.
- Jardine, A.K.S.; Lin, D.; Banjevic, D. A review on machinery diagnostics and prognostics implementing condition-based maintenance. Mech. Syst. Signal Process. 2006, 20, 1483–1510.
- Wang, W. An adaptive predictor for dynamic system forecasting. Mech. Syst. Signal Process. 2007, 21, 809–823.
- Samanta, B.; Al-Balushi, K.R. Artificial neural network based fault diagnostics of rolling element bearings using time-domain features. Mech. Syst. Signal Process. 2003, 17, 317–328.
- Widodo, A.; Yang, B.-S. Support vector machine in machine condition monitoring and fault diagnosis. Mech. Syst. Signal Process. 2007, 21, 2560–2574.
- Yan, R.; Gao, R.X.; Chen, X. Wavelets for fault diagnosis of rotary machines: A review with applications. Signal Process. 2014, 96, 1–15.
- Lei, Y.; He, Z.; Zi, Y. A new approach to intelligent fault diagnosis of rotating machinery. Expert Syst. Appl. 2008, 35, 1593–1600.
- Qiu, H.; Lee, J.; Lin, J.; Yu, G. Wavelet filter-based weak signature detection method and its application on rolling element bearing prognostics. J. Sound Vib. 2006, 289, 1066–1090.
- Sacerdoti, D.; Strozzi, M.; Secchi, C. A comparison of signal analysis techniques for the diagnostics of the IMS rolling element bearing dataset. Appl. Sci. 2023, 13, 5977.
- Yang, Z.; Wu, B.; Shao, J.; Lu, X.; Zhang, L.; Xu, Y.; Chen, G. Fault detection of high-speed train axle bearings based on a hybridized physical and data-driven temperature model. Mech. Syst. Signal Process. 2024, 208, 111037.
- Shaikh, M.Z.; Ahmed, Z.; Chowdhry, B.S.; Baro, E.N.; Hussain, T.; Uqaili, M.A.; Mehran, S.; Kumar, D.; Shah, A.A. State-of-the-art wayside condition monitoring systems for railway wheels: A comprehensive review. IEEE Access 2023, 11, 13257–13279.
- Omoregbee, H.O.; Edward, B.A.; Olanipekun, M.U. Bearing failure diagnosis and prognostics modeling in plants for industrial purpose. J. Eng. Appl. Sci. 2023, 70, 17.
- Maulana, F.; Starr, A.; Ompusunggu, A.P. Explainable data-driven method combined with Bayesian filtering for remaining useful lifetime prediction of aircraft engines using NASA CMAPSS datasets. Machines 2023, 11, 163.
- Gantert, L.; Zeffiro, T.; Sammarco, M.; Campista, M.E.M. Multiclass classification of faulty industrial machinery using sound samples. Eng. Appl. Artif. Intell. 2024, 136, 108943.
- Deng, Y.; Hou, B.; Shen, C.; Wang, D. Statistical learning modeling based health indicator construction for machine condition monitoring. Meas. Sci. Technol. 2022, 34, 014008.
- Silva-Rodríguez, J.; Salvador, P.; Naranjo, V.; Insa, R. Supervised contrastive learning-guided prototypes on axle-box accelerations for railway crossing inspections. Expert Syst. Appl. 2022, 207, 117946.
- Sysyn, M.; Gerber, U.; Kluge, F.; Nabochenko, O.; Kovalchuk, V. Turnout remaining useful life prognosis by means of on-board inertial measurements on operational trains. Int. J. Rail Transp. 2020, 8, 347–369.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.