1. Introduction
Since the 20th century, as awareness has grown on the international stage, it has become accepted that development is not solely an economic matter, and the concept of sustainable development has emerged. Transportation is also affected by this concept.
The transportation sector plays a critical role in ensuring sustainable development, as it impacts all economic, environmental, and social dimensions. In this context, the concept of “sustainable urban transportation” has come to the fore, aiming to meet mobility and access needs in urban areas in the most economical and environmentally sensitive way [1]. Various policies have been developed worldwide to achieve this goal.
Today, successfully implementing urban transportation policies and ensuring sustainability is only possible by evaluating these policies within the framework of occupational health and safety (OHS) issues and regulations. As an environmentally friendly urban transportation mode, the metro recorded approximately 58 billion passenger trips worldwide in 2023 [2]. Besides environmental concerns, factors such as service quality, accessibility, affordability, sociodemographic characteristics, and built environment features play a role in the usage of transportation modes [3,4]. Additionally, OHS has a significant impact on transport mode choices by shaping how safe, manageable, and low-risk the commute feels, because perceived safety in stations and vehicles strongly influences whether people choose public transport [5,6]. Transporting thousands of passengers on a single ride imposes greater responsibility in terms of OHS during operations. Therefore, this study develops a comprehensive fuzzy risk analysis model that takes an evidence-based approach to the risks encountered during metro operations.
Many studies cover the concepts of occupational safety and health in metros; however, only a few have conducted fuzzy safety risk analysis of operational risks, especially in the last decade [7]. Rong et al. used questionnaires about underground railway safety, which were completed by Harbin Metro Line 1 staff and passengers [7]. Although the study focuses on operational safety, it relies solely on survey data and does not use experimental methods. Wu et al. present a Bayesian network and Delphi method model for evaluating fire risks and emergency decision-making in underground metro stations [8]. However, the research is based only on specific scenarios and expert opinions; furthermore, real-world data are considered only to a limited extent. In their study, Deng et al. aimed to analyze metro failures and evaluate their influencing factors to promote safety management in Nanjing Metro operations by failure modes, effects, and criticality analysis (FMECA) [9]. The analysis was conducted using a limited number of failure modes (FMs) and long-term field data. Then, Deng et al. analyzed the quantitative effect of the safety climate on workers’ safety behavior in subway operations using the structural equation model (SEM) [10]. However, since these data were based on a survey, they contained subjective judgments, and no fuzzy method was used to address uncertainty. Wu et al. combined a cloud model with an improved CRITIC method to evaluate urban rail safety. By accounting for uncertainty and assigning objective indicator weights, the framework, tested on Beijing Metro Line 1, provided more systematic and reliable insights into passenger satisfaction and operational risks [11]. A study by Li et al. measured public transportation passenger satisfaction through the fuzzy MULTIMOORA method using fuzzy ordered weights for decision-makers [12]. W. Wang et al. used the analytic hierarchy process (AHP) to calculate the weights of the risk evaluation indicators and used the fuzzy comprehensive evaluation method to assess the overall fire risk level of a metro line in China [13]. Moreover, Ju et al. presented a methodology for evaluating the fire risks of Changzhou subway stations by developing a comprehensive fire risk assessment index system via a combination of the AHP and TOPSIS methods [14]. Avcı and Koca applied the AHP to the security risk analysis of smart railway systems in the context of transportation safety [15]. The lack of an objective weighting mechanism is a methodological constraint. In 2025, Zou et al. proposed an integrated model combining Stimulus–Organism–Response (SOR) and the Technology Acceptance Model (TAM) to study thermal comfort in the Changsha Metro in terms of a safer and sustainable journey [16]. The collection of data over a limited four-month period limits the applicability of the results across metro systems.
Traditional FMEA estimates the FMs of actual or potential incidents only with inherently subjective crisp values, which do not account for uncertainty, vagueness, or hesitancy in decision-makers’ judgments [17]. Therefore, this study proposes a unique enhanced FMEA approach that integrates three key components: expert weighting, interval-valued intuitionistic fuzzy sets (IVIFSs), and risk parameter factors driven by real-world work incident data. By combining expert knowledge and experience with objective empirical data, the goal of this study is to help make underground stations safer by developing a more holistic and actionable risk analysis method. Finally, by implementing this novel approach in the underground stations of a metro line within a metropolitan city in Türkiye, improved risk prioritization is demonstrated, thereby enabling a reduction in workplace accidents in the operational phase of transportation.
This paper is divided into five sections, the remainder of which are organized as follows. In Section 2, the limitations of traditional FMEA are discussed; then, the methodology, consisting of calculating expert weights, implementing fuzzy transformation and dynamic weighting through the analysis of accident data, and the derivation of normalized scores, is described, and the practical implementation of this approach in the context of metro station evaluation is demonstrated. In Section 3, a comparison of the results of the traditional FMEA, IVIF FMEA, and the proposed IVIF FMEA methods is presented. Section 4 discusses the limitations of the study. Section 5 emphasizes that the proposed approach provides more realistic, data-driven, and reliable risk prioritization compared to traditional methods.
2. Materials and Methods
Although traditional FMEA is a useful risk analysis method, it has certain shortcomings. First, risk parameters are scored on a discrete and crisp scale of 1 to 10, heavily reliant on subjective expert judgment and failing to account for uncertainty [18]. Additionally, treating all risk factors as equally important does not reflect real-world conditions. Furthermore, different score combinations across risk factors can yield the same risk priority number (RPN) value, making FMs more difficult to rank. This disrupts the safety measures needed to prevent FMs, risking the safety of personnel and passengers. To address these challenges, researchers have developed various weighting methods and proposed different approaches to handle uncertainty since Zadeh introduced fuzzy sets [19].
This unique method is developed to address the weaknesses of traditional FMEA risk analysis, such as its inability to handle expert judgment uncertainty, and to quantify the vagueness of risk parameters, while also incorporating a scoring system based on actual accident records [20]. This work presents a detailed methodology for implementing this approach and conducts a comparative analysis of traditional FMEA, classical IVIF FMEA, and the proposed IVIF FMEA. The proposed model follows the steps below in sequence to perform a comprehensive analysis that incorporates historical workplace accident data while also accounting for the uncertainty in expert opinions, which the traditional FMEA method cannot do.
Data Input and Preparation: Initial FMEA expert ratings and historical incident data are loaded to process traditional RPN and derive objective incident factors.
Crisp-to-Fuzzy Conversion: The crisp severity (S), probability of occurrence (O), and detectability (D) scores provided by individual experts are converted into interval-valued intuitionistic fuzzy numbers (IVIFNs) using Table 1. This step captures the inherent vagueness and uncertainty of human judgment within a fuzzy framework.
Expert Weighting: Decision-makers are identified, and their influence is derived from their years of experience, education levels, and profession scores.
Fuzzy Aggregation: These IVIFNs, representing individual expert opinions, are then aggregated for each FM using the interval-valued intuitionistic fuzzy weighted averaging (IVIFWA) operator. This operator incorporates static expert weights to reflect the varying influence of each decision-maker.
Risk Parameter Factors Calculation: For each FM, the incident severity factor and incident frequency factor are calculated from the historical incident data. The detectability consensus factor measures the agreement among experts on detectability. These factors are normalized to reflect the relative contributions as risk parameter weights of each FM.
Defuzzification and Ranking: Traditional RPNs and the scores of the IVIF FMEA and proposed IVIF FMEA methods are calculated via the score function. Risk parameter factors are used to adjust the risk parameter weights of the proposed method.
In traditional FMEA risk analysis, the RPN is the widely used product of the 1–10 scores for the S, O, and D parameters, as shown in Equation (1) [21]. The average of each expert’s scores for S, O, and D is calculated first:

$$\mathrm{RPN} = \bar{S} \times \bar{O} \times \bar{D}, \qquad \bar{S} = \frac{1}{t}\sum_{k=1}^{t} S_k, \quad \bar{O} = \frac{1}{t}\sum_{k=1}^{t} O_k, \quad \bar{D} = \frac{1}{t}\sum_{k=1}^{t} D_k \tag{1}$$

where $t$ is the number of experts, $S_k$, $O_k$, and $D_k$ are the scores given by the $k$th expert, and where $1 \le \mathrm{RPN} \le 1000$. However, only 120 distinct RPN values can be generated from integer scores of 1 to 10. Many different risk scenarios therefore yield the same RPN value, even though the RPN is the quantity used to sort FMs in descending order when implementing risk-mitigating precautions. In such cases, it is unofficially recommended that the FM with the higher S score be prioritized. This tie-breaking approach is a forced workaround that lacks a mathematical foundation [22]. Moreover, traditional FMEA assigns equal weights to both experts and risk parameters. By incorporating uncertainties and differing expert judgments to systematically resolve tied rankings, fuzzy approaches provide much more precise risk prioritization.
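The tie problem described above can be checked with a few lines of Python. This is a minimal sketch, not part of the study's implementation: the function name `traditional_rpn` and the two example score profiles are hypothetical, and the only assumptions are integer scores from 1 to 10 and equal expert weights.

```python
from itertools import product

# All integer S, O, D combinations on the 1-10 scale (1000 combinations).
scores = range(1, 11)
rpn_values = {s * o * d for s, o, d in product(scores, repeat=3)}
distinct_rpn_count = len(rpn_values)


def traditional_rpn(s_scores, o_scores, d_scores):
    """Average each parameter over the experts, then multiply the means."""
    def mean(values):
        return sum(values) / len(values)
    return mean(s_scores) * mean(o_scores) * mean(d_scores)


# Two hypothetical FMs with different risk profiles but identical RPNs.
tie_a = traditional_rpn([8], [3], [5])
tie_b = traditional_rpn([4], [6], [5])
```

Here `distinct_rpn_count` confirms how few of the 1000 possible values the product can take, and `tie_a` and `tie_b` show two FMs that a crisp RPN cannot separate even though one has twice the severity of the other.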
The IVIFS theory was introduced by Atanassov to show and quantify the ambiguous nature of subjective judgments [23]. In this theory, the belonging of an element to a set is represented by a membership degree ($\mu$), a non-membership degree ($\nu$), and a hesitancy degree ($1 - \mu - \nu$), whose values are intervals rather than crisp values, representing the uncertainty of judgments [24]. Classical IVIF FMEA is a mode of fuzzy FMEA risk analysis that utilizes vagueness and decision-makers’ weighting, preferably through different aggregation operators. Here, expert scores for S, O, and D were converted into IVIFSs using the linguistic variables of Tooranloo and Saghafi, as shown in Table 1 [25]. Then, they were aggregated using the static expert-weighted IVIFWA operator. The aggregated IVIFSs for S, O, and D are multiplied to form the IVIF RPN, which is then defuzzified using the score function and normalized to a 0–1000 scale to facilitate comparison with the traditional RPN [26]. No adjustments based on the factors developed in this study are applied. This scenario is labeled “IVIF FMEA Score” in the analysis of the results.
Table 1. Interval-valued intuitionistic fuzzy linguistic variables [25].
| Severity IVIFN | Severity Linguistic Term | Occurrence IVIFN | Occurrence Linguistic Term | Detectability IVIFN | Detectability Linguistic Term |
|---|---|---|---|---|---|
| ([1.00,1.00], [0.00,0.00]) | Risky Without Warning (RWOW) | ([0.75,0.95], [0.00,0.05]) | Very High (VH) | ([0.05,0.10], [0.85,0.90]) | Absolutely Impossible (AI) |
| ([0.85,0.90], [0.05,0.10]) | Risky With Warning (RWW) | ([0.50,0.70], [0.05,0.25]) | High (H) | ([0.05,0.10], [0.70,0.75]) | Highly Unlikely (HU) |
| ([0.75,0.80], [0.05,0.10]) | Very High (VH) | ([0.30,0.50], [0.20,0.40]) | Medium (M) | ([0.20,0.25], [0.55,0.60]) | Unlikely (U) |
| ([0.65,0.70], [0.15,0.20]) | High (H) | ([0.05,0.25], [0.50,0.70]) | Low (L) | ([0.35,0.40], [0.45,0.50]) | Very Low (VL) |
| ([0.55,0.60], [0.25,0.30]) | Medium (M) | ([0.00,0.05], [0.75,0.95]) | Very Low (VL) | ([0.45,0.50], [0.35,0.40]) | Low (L) |
| ([0.45,0.50], [0.35,0.40]) | Low (L) | | | ([0.55,0.60], [0.25,0.30]) | Medium (M) |
| ([0.35,0.40], [0.45,0.50]) | Very Low (VL) | | | ([0.65,0.70], [0.15,0.20]) | Relatively High (RH) |
| ([0.20,0.25], [0.55,0.60]) | Insignificant (I) | | | ([0.75,0.80], [0.05,0.10]) | High (H) |
| ([0.05,0.10], [0.70,0.75]) | Very Insignificant (VI) | | | ([0.85,0.90], [0.05,0.10]) | Very High (VH) |
| ([0.05,0.10], [0.85,0.90]) | None (N) | | | ([1.00,1.00], [0.00,0.00]) | Absolutely Possible (AP) |
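The crisp-to-fuzzy conversion can be sketched as a simple lookup. The example below covers the severity column only and assumes that the ten rows of Table 1 correspond to crisp scores 10 (Risky Without Warning) down to 1 (None) in the order listed; that score-to-row mapping, the dictionary name, and the function name are illustrative assumptions rather than details stated in the text.

```python
# Severity column of Table 1 as ((mu_lower, mu_upper), (nu_lower, nu_upper)).
# ASSUMPTION: rows map to crisp scores 10..1 in the order they are listed.
SEVERITY_IVIFN = {
    10: ((1.00, 1.00), (0.00, 0.00)),  # Risky Without Warning (RWOW)
    9: ((0.85, 0.90), (0.05, 0.10)),   # Risky With Warning (RWW)
    8: ((0.75, 0.80), (0.05, 0.10)),   # Very High (VH)
    7: ((0.65, 0.70), (0.15, 0.20)),   # High (H)
    6: ((0.55, 0.60), (0.25, 0.30)),   # Medium (M)
    5: ((0.45, 0.50), (0.35, 0.40)),   # Low (L)
    4: ((0.35, 0.40), (0.45, 0.50)),   # Very Low (VL)
    3: ((0.20, 0.25), (0.55, 0.60)),   # Insignificant (I)
    2: ((0.05, 0.10), (0.70, 0.75)),   # Very Insignificant (VI)
    1: ((0.05, 0.10), (0.85, 0.90)),   # None (N)
}


def severity_to_ivifn(score):
    """Return the IVIFN for a crisp severity score between 1 and 10."""
    if score not in SEVERITY_IVIFN:
        raise ValueError(f"severity score must be an integer in 1..10, got {score!r}")
    return SEVERITY_IVIFN[score]


example = severity_to_ivifn(7)
```

The occurrence column lists only five linguistic terms, so a lookup for O would additionally need a rule for grouping the 1–10 scores, which is not given here.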
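Let $\tilde{A}$ be an IVIFS on a universe $X$, having the form

$$\tilde{A} = \left\{ \left\langle x, \left[\mu_{\tilde{A}}^{L}(x), \mu_{\tilde{A}}^{U}(x)\right], \left[\nu_{\tilde{A}}^{L}(x), \nu_{\tilde{A}}^{U}(x)\right] \right\rangle \;\middle|\; x \in X \right\}$$

where $[\mu_{\tilde{A}}^{L}(x), \mu_{\tilde{A}}^{U}(x)]$ denotes the lower and upper interval-valued membership degrees and $[\nu_{\tilde{A}}^{L}(x), \nu_{\tilde{A}}^{U}(x)]$ denotes the lower and upper interval-valued non-membership degrees of $x$ to $\tilde{A}$, respectively, with the following conditions [23]:

$$0 \le \mu_{\tilde{A}}^{L}(x) \le \mu_{\tilde{A}}^{U}(x) \le 1, \qquad 0 \le \nu_{\tilde{A}}^{L}(x) \le \nu_{\tilde{A}}^{U}(x) \le 1, \qquad \mu_{\tilde{A}}^{U}(x) + \nu_{\tilde{A}}^{U}(x) \le 1$$

The hesitation degree is given by [17]

$$\pi_{\tilde{A}}(x) = \left[1 - \mu_{\tilde{A}}^{U}(x) - \nu_{\tilde{A}}^{U}(x),\; 1 - \mu_{\tilde{A}}^{L}(x) - \nu_{\tilde{A}}^{L}(x)\right]$$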
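A short sketch can make the IVIFS conditions and the hesitation degree concrete. It assumes the standard interval-valued intuitionistic fuzzy definitions (membership and non-membership intervals inside [0, 1], with the two upper bounds summing to at most 1); the function names and the tuple layout are hypothetical choices for illustration.

```python
def is_valid_ivifn(mu, nu, tol=1e-9):
    """Check the interval and sum conditions for one IVIFN.

    mu and nu are (lower, upper) tuples for membership and non-membership.
    """
    (mu_l, mu_u), (nu_l, nu_u) = mu, nu
    return (
        0 <= mu_l <= mu_u <= 1
        and 0 <= nu_l <= nu_u <= 1
        and mu_u + nu_u <= 1 + tol
    )


def hesitation(mu, nu):
    """Hesitation interval: what is left after membership and non-membership."""
    (mu_l, mu_u), (nu_l, nu_u) = mu, nu
    return (1 - mu_u - nu_u, 1 - mu_l - nu_l)


# "Medium" occurrence from Table 1: ([0.30, 0.50], [0.20, 0.40]).
medium_occurrence = ((0.30, 0.50), (0.20, 0.40))
pi_lower, pi_upper = hesitation(*medium_occurrence)
```

For the "Medium" occurrence term, the hesitation interval is roughly [0.1, 0.5], which is the share of the judgment that neither supports nor contradicts membership.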
2.1. Expert Weighting
To account for the varying levels of expertise among decision-makers, a weighted aggregation approach, proposed by Yaşlı and Bolat, is employed based on each expert’s years of experience, education level, and profession [27]. These criteria are often used to reflect the practical knowledge, theoretical understanding, and domain-specific relevance of each decision-maker, ensuring that more qualified experts have a proportionally greater influence on the final aggregated fuzzy scores.
Table 2 shows the scores according to the professions, work experience, and education levels of the decision-makers.
If the sum of the scores collected by each expert based on their profession, experience, and education level is called the total score $TS_k$, and $w_k$ is the weight of the $k$th expert, then

$$w_k = \frac{TS_k}{\sum_{j=1}^{t} TS_j} \tag{5}$$

This scoring function calculates each expert’s score and assigns a weight to differentiate their judgments.
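As an illustration of the weighting step, the sketch below normalizes each expert's total score by the sum over all experts, which is the reading of the expert weight assumed here. The three total scores are made-up values, since the scoring table (Table 2) is not reproduced in this section, and the function name is hypothetical.

```python
def expert_weights(total_scores):
    """Turn per-expert total scores into weights that sum to 1."""
    total = sum(total_scores)
    if total <= 0:
        raise ValueError("total scores must sum to a positive value")
    return [score / total for score in total_scores]


# Hypothetical total scores (profession + experience + education) for 3 experts.
weights = expert_weights([9, 7, 4])
```

With these illustrative inputs the weights are 0.45, 0.35, and 0.20, so the most qualified expert contributes a little under half of the aggregated judgment.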
2.2. Fuzzy Aggregation
A crucial aspect of applying IVIFSs in group decision-making environments is aggregating diverse expert opinions. Xu developed the IVIFWA operator, which aggregates fuzzy numbers while accounting for the relative importance of different decision-makers or criteria [28].
The following equation represents a general IVIFWA operator for aggregating IVIFNs from multiple sources, such as different experts [29]. This operator accounts for the relative importance, or weight, of each expert when combining uncertain judgments.

$$\tilde{\alpha}_{il} = \mathrm{IVIFWA}_{w}\left(\tilde{\alpha}_{il}^{(1)}, \ldots, \tilde{\alpha}_{il}^{(t)}\right) = \left( \left[ 1 - \prod_{k=1}^{t} \left(1 - \mu_{il}^{L(k)}\right)^{w_k},\; 1 - \prod_{k=1}^{t} \left(1 - \mu_{il}^{U(k)}\right)^{w_k} \right], \left[ \prod_{k=1}^{t} \left(\nu_{il}^{L(k)}\right)^{w_k},\; \prod_{k=1}^{t} \left(\nu_{il}^{U(k)}\right)^{w_k} \right] \right)$$

where
$t$ is the total number of experts or information sources included in the aggregation process;
$k$ is the index of the expert or information source, ranging from 1 to $t$;
$w_k$ is the weight of the $k$th expert;
$\tilde{\alpha}_{il}$ represents the final, aggregated IVIF value obtained by combining the judgments of the experts for a specific FM ($i$) and evaluation criterion ($l$); this aggregated value is expressed as a pair including a membership degree and a non-membership degree, where $\tilde{\alpha}_{il}^{(k)}$ is the individual IVIF assessment provided by the $k$th expert for the same FM ($i$) and criterion ($l$), and each is expressed with its own membership degree ($\mu_{il}^{(k)}$) and non-membership degree ($\nu_{il}^{(k)}$);
$\mu_{il}^{(k)} = [\mu_{il}^{L(k)}, \mu_{il}^{U(k)}]$ is the membership degree specified by the $k$th expert for the value $\tilde{\alpha}_{il}^{(k)}$; this indicates how well the FM conforms to the relevant fuzzy property;
$\nu_{il}^{(k)} = [\nu_{il}^{L(k)}, \nu_{il}^{U(k)}]$ is the non-membership degree specified by the $k$th expert for the value $\tilde{\alpha}_{il}^{(k)}$; this indicates how much the risk does not conform to the relevant fuzzy property.
Through the aggregation process, the three IVIFNs provided by the experts for each of the S, O, and D parameters are converted into a single aggregated IVIFN, separately for each parameter, taking the expert weights into account. For FMs analyzed using the classical IVIF FMEA method, the procedure continues in Section 2.4, whereas the risk parameter factors must first be calculated for the proposed approach.
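The aggregation step can be sketched in Python as follows. This assumes the commonly used form of Xu's IVIFWA operator (weighted products of the complements of the membership bounds and weighted products of the non-membership bounds); the function name, the tuple layout, and the example weights are hypothetical.

```python
from math import prod


def ivifwa(ivifns, weights):
    """Aggregate IVIFNs given as ((mu_l, mu_u), (nu_l, nu_u)) with expert weights."""
    if len(ivifns) != len(weights):
        raise ValueError("need exactly one weight per IVIFN")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    pairs = list(zip(ivifns, weights))
    # Membership bounds: 1 minus the weighted product of complements.
    mu_l = 1 - prod((1 - x[0][0]) ** w for x, w in pairs)
    mu_u = 1 - prod((1 - x[0][1]) ** w for x, w in pairs)
    # Non-membership bounds: weighted product of the bounds themselves.
    nu_l = prod(x[1][0] ** w for x, w in pairs)
    nu_u = prod(x[1][1] ** w for x, w in pairs)
    return ((mu_l, mu_u), (nu_l, nu_u))


# Three experts rate severity as High, Medium, and Very High (Table 1 values),
# combined with illustrative expert weights.
ratings = [
    ((0.65, 0.70), (0.15, 0.20)),
    ((0.55, 0.60), (0.25, 0.30)),
    ((0.75, 0.80), (0.05, 0.10)),
]
aggregated = ivifwa(ratings, [0.45, 0.35, 0.20])
```

A useful sanity check on any implementation is idempotency: if every expert gives the same IVIFN, the aggregated value should equal that IVIFN regardless of the weights.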
2.3. Risk Parameter Factors
For FMs lacking historical incident data or significant expert disagreement, the proposed IVIF FMEA tends to align with the classical IVIF FMEA rankings, suggesting that the dynamic weighting mechanism requires sufficient data variance or expert disagreement to trigger substantial re-prioritization. The novel factors calculated for this study are listed below. The dynamic weight of risk parameter $l$ for each FM ($i$) is

$$W_{il} = \frac{I_{il}}{\sum_{l' \in \{S, O, D\}} I_{il'}} \tag{7}$$

where $I_{il}$ is the influence value derived from the factor of each risk parameter ($l$), as stated in each factor’s section below.
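The behavior of the dynamic weights can be illustrated with a small sketch. It assumes that each weight is the parameter's influence value divided by the sum of the three influence values, and that the influence values are 1 plus the severity factor, 1 plus the frequency factor, and 1 plus one minus the consensus factor, following the descriptions in the factor subsections. The function name and argument names are hypothetical, and the normalization itself is an assumption chosen because it reproduces the balanced and 0.500 weights reported in the results.

```python
def dynamic_weights(f_s, f_o, f_dc):
    """Dynamic S, O, D weights for one FM from its three factors (each in 0..1).

    ASSUMPTION: weight = influence / sum of influences.
    """
    for name, value in (("f_s", f_s), ("f_o", f_o), ("f_dc", f_dc)):
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must lie in [0, 1]")
    influence = {
        "S": 1 + f_s,          # more historical severity -> more weight on S
        "O": 1 + f_o,          # more historical frequency -> more weight on O
        "D": 1 + (1 - f_dc),   # less consensus on D -> more weight on D
    }
    total = sum(influence.values())
    return {param: value / total for param, value in influence.items()}


no_history_full_consensus = dynamic_weights(0.0, 0.0, 1.0)
no_history_no_consensus = dynamic_weights(0.0, 0.0, 0.0)
```

Under these assumptions, an FM with no incident history and full expert agreement gets equal weights of one third, which matches the classical IVIF FMEA behavior, while complete disagreement on D with no incident history pushes the D weight to 0.5.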
2.3.1. Incident Severity Factor
This factor is calculated using historical incident data obtained from the metro authority’s archives. The raw data were filtered for operational risks in underground stations over the last five years. Each of these incidents was scored according to the need for first aid, hospitalization, and the duration of incapacity for work and assigned to the FM that caused it using Table 3 [30].
The aggregated IVIFN for S, $\tilde{\alpha}_{iS}$, is adjusted via the incident severity factor ($F_S$) according to the impact of the incident, which is obtained from historical workplace incident records. Let $N$ be the total number of FMs, and let $c_{im}$ be the severity score of the $m$th of the $M_i$ incidents caused by FM $i$:

$$C_i = \sum_{m=1}^{M_i} c_{im}$$

The combined incident severity $C_i$ is the score that is calculated for each FM that caused an incident based on the impact, using Table 3. The sum of the combined incident severity across all related incidents is normalized by the total severity impact across all risks to obtain $F_{S,i}$, the value of $F_S$ for FM $i$. This factor quantifies the proportional impact of a risk based on its severity, as shown below:

$$F_{S,i} = \frac{C_i}{\sum_{j=1}^{N} C_j}$$
A higher $F_S$ indicates a greater historical severity impact, which should increase the membership degree $\mu$ and decrease the non-membership degree $\nu$ of the $\tilde{\alpha}_{iS}$ that was aggregated through IVIFWA. The dynamic weight of the parameter adjusts the IVIFN of each FM through Equation (10). Here, $W_S$ is the dynamic weight of the S parameter calculated through Equation (7), where the influence value is $I_S = 1 + F_S$. The reason for adding 1 to the factor is that, when no prior accident record is associated with the FM ($F_S = 0$), the influence value stays at 1 and the analysis resembles classical IVIF FMEA.
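The normalization behind the incident severity factor can be sketched as below. The sketch assumes that the factor for an FM is its summed incident severity scores divided by the total over all FMs, as described in the text; the FM labels and per-incident scores are made-up values (the scoring scheme of Table 3 is not reproduced here), and the function name is hypothetical.

```python
def incident_severity_factors(incident_scores_by_fm):
    """Map each FM to its share of the total historical severity impact.

    incident_scores_by_fm: dict of FM id -> list of per-incident severity scores.
    FMs without incidents get a factor of 0.
    """
    combined = {fm: sum(scores) for fm, scores in incident_scores_by_fm.items()}
    total = sum(combined.values())
    if total == 0:
        return {fm: 0.0 for fm in combined}
    return {fm: value / total for fm, value in combined.items()}


# Hypothetical records: two FMs with incidents, one without any.
records = {"FM_A": [1, 3], "FM_B": [4], "FM_C": []}
severity_factors = incident_severity_factors(records)

# Influence value used for the dynamic weight of S: 1 + factor.
severity_influence = {fm: 1 + f for fm, f in severity_factors.items()}
```

In this example `FM_C` has no recorded incidents, so its influence value stays at 1 and its severity assessment is left as in classical IVIF FMEA.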
2.3.2. Incident Frequency Factor
Although frequency is a parameter in the Fine–Kinney safety risk analysis method, it has an indirect relationship with the O parameter in FMEA. Based on Sematech’s occurrence ranking criteria, this parameter represents the potential number of occurrences per unit time [21]. Hence, the number of incidents caused by an FM in a year is captured through the incident frequency factor ($F_O$), which ranges from 0 to 1. This factor reflects the proportional historical frequency of a risk.
A higher value indicates that the FM is associated with a greater proportion of the total historical incident frequency, making it empirically more critical. This factor is then used to dynamically adjust the weight of the O parameter calculation, giving greater prominence to FMs with documented histories of frequent incidents. This factor behaves similarly to $F_S$. The dynamic weight of the O parameter adjusts the IVIFN of each FM through Equation (12).
The dynamic weight of the O parameter, $W_O$, is calculated through Equation (7), with the influence value $I_O = 1 + F_O$. The concept is the same as in the weight calculated for S.
2.3.3. Detectability Consensus Factor
Although the expert weights determine how influential each expert is on the RPN of a given FM, the proposed approach also assesses the consensus among expert scores for the D parameter. Because this parameter is unique to the FMEA method, experts may struggle to assign a D score, especially when transitioning from another risk analysis method. In addition, this parameter is highly dependent not only on experience but also on attention. If there are significant differences among the experts’ D scores, the detectability consensus factor ($F_{DC}$) value decreases, indicating a lower level of consensus on detection. This also increases the weight of D ($W_D$) and the RPN score to emphasize that this FM needs a higher priority. Here, $SD_D$ is the standard deviation of the experts’ D scores, from which $F_{DC}$ is derived. If $SD_D = 0$, $F_{DC}$ is set to 1. This factor is then used to adjust the weight of the D parameter in the proposed IVIF FMEA approach. If $F_{DC} = 1$, the original D of IVIF FMEA is returned, indicating that no adjustment is applied because the experts fully agree. The effect of this factor is the inverse of the incident factors, because, while a high D score indicates “hard to detect”, a high consensus factor indicates higher confidence or lower uncertainty. Meanwhile, $\tilde{\alpha}_{iD}$ is aggregated through IVIFWA and adjusted through Equation (14).
Because $F_{DC}$ deals with uncertainty, the factor is used as $1 - F_{DC}$. Hence, the influence value becomes $I_D = 1 + (1 - F_{DC})$ when calculating the dynamic weight of D.
2.4. Calculating Normalized Scores
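Aggregated IVIFSs provide an effective method for transforming subjective and uncertain assessments from multiple experts into a single, representative fuzzy value, while accounting for their relative expertise. Let $\widetilde{RPN}$ be the IVIF RPN, which is calculated by multiplying the aggregated IVIFNs of S, O, and D. If $\tilde{\alpha}_1 = ([a_1, b_1], [c_1, d_1])$ and $\tilde{\alpha}_2 = ([a_2, b_2], [c_2, d_2])$ are the IVIFNs of two different risk criteria and their product is defined as [31]

$$\tilde{\alpha}_1 \otimes \tilde{\alpha}_2 = \left( [a_1 a_2,\; b_1 b_2],\; [c_1 + c_2 - c_1 c_2,\; d_1 + d_2 - d_1 d_2] \right)$$

then it becomes $\widetilde{RPN} = \tilde{\alpha}_S \otimes \tilde{\alpha}_O \otimes \tilde{\alpha}_D$.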
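The multiplication of aggregated IVIFNs can be sketched as follows, assuming the standard IVIFN product (membership bounds multiply, non-membership bounds combine as a probabilistic sum). The function names and the example inputs are illustrative only.

```python
def ivifn_multiply(x, y):
    """Product of two IVIFNs given as ((a, b), (c, d))."""
    (a1, b1), (c1, d1) = x
    (a2, b2), (c2, d2) = y
    return (
        (a1 * a2, b1 * b2),
        (c1 + c2 - c1 * c2, d1 + d2 - d1 * d2),
    )


def ivif_rpn(s, o, d):
    """IVIF RPN as the product of the aggregated S, O, and D IVIFNs."""
    return ivifn_multiply(ivifn_multiply(s, o), d)


# Illustrative aggregated values for one FM.
s_agg = ((0.65, 0.70), (0.15, 0.20))
o_agg = ((0.30, 0.50), (0.20, 0.40))
d_agg = ((0.55, 0.60), (0.25, 0.30))
rpn_ivifn = ivif_rpn(s_agg, o_agg, d_agg)
```

Because membership bounds shrink and non-membership bounds grow with every multiplication, the product stays a valid IVIFN, and multiplying by ([1, 1], [0, 0]) leaves a value unchanged.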
The proposed IVIF FMEA is an IVIF FMEA procedure built on the IVIFWA operator that adds expert weighting, consensus factors, and evidence-based adjustments from work-related incident data to enhance risk prioritization. It follows the steps of classical IVIF FMEA, except that S, O, and D are adjusted with the incident severity factor, incident frequency factor, and detectability consensus factor, respectively, through the dynamic weighting shown in Equations (10), (12) and (14), before multiplication. The resulting RPN of the proposed method is the product of these adjusted IVIFNs. It must then be converted into a crisp score using a defuzzification method based on Bai’s improved score function, which transforms the interval-valued membership and non-membership degrees into a single real number [32].
The raw score typically ranges from 0 to 1. To obtain a normalized score on a scale comparable to the traditional RPN, the raw score is multiplied by 1000. This provides the final RPN score for each FM, representing its priority.
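The defuzzification step can be sketched as below. The formula is the form of Bai's improved score function as it is commonly stated for an IVIFN ([a, b], [c, d]), namely (a + a(1 − a − c) + b + b(1 − b − d)) / 2; treating this as the exact function used in the study is an assumption, and the function names are hypothetical.

```python
def bai_score(ivifn):
    """Improved score function (assumed form) for an IVIFN ((a, b), (c, d))."""
    (a, b), (c, d) = ivifn
    return (a + a * (1 - a - c) + b + b * (1 - b - d)) / 2


def normalized_score(ivifn):
    """Scale the raw score (0..1) to the 0..1000 range of the traditional RPN."""
    return 1000 * bai_score(ivifn)


best_case = normalized_score(((1.0, 1.0), (0.0, 0.0)))
worst_case = normalized_score(((0.0, 0.0), (1.0, 1.0)))
mid_case = normalized_score(((0.5, 0.6), (0.2, 0.3)))
```

The score rewards membership and also credits part of the hesitation margin in proportion to the membership already present, which is what lets two IVIFNs with equal membership but different hesitation receive different ranks.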
Consequently, scores are calculated and ranked for each scenario. The differences in rankings between the traditional RPN and the IVIFS-based methods highlight the value added by incorporating fuzzy logic, expert weighting, and historical data adjustment through the factors. The process of the proposed method, covering all these steps, is shown as a flowchart in Figure 1.
2.5. Real-World Application of the Proposed IVIF FMEA to Underground Metro Station Risks
The proposed method was implemented in FMEA for the operational risks of a metro line within a metropolitan city in Türkiye. A total of 65 FMs were identified for 11 underground stations, and 516 work incident records related to 19 different FMs were obtained from historical data. Three experts, who knew these risks well, were involved in the risk assessment stage. Each expert was provided with the FMEA scale in Table 1 and asked to score the FMs. The first 10 FMs are shown in Table 4.
Before adjusting the risk parameters with the factors, unlike in traditional FMEA processes, the expert weighting was calculated and applied to the IVIF FMEA method using the scores in Table 2 and Equation (5). Table 5 shows the calculated weights of the experts involved in the risk analysis as members of the metro organization’s risk assessment team according to the Turkish Occupational Safety and Health (OSH) Law. Compared with classical IVIF FMEA, the proposed method is among the most comprehensive IVIF FMEA methods, with dynamic weights based on external factors such as incident severity and frequency, as well as on detectability consensus. When historical incidents are present for a specific FM, the incident severity factor and incident frequency factor directly amplify the membership degrees and suppress the non-membership degrees of S and O, respectively, while the detectability consensus factor has the opposite effect.
3. Results
The analysis provides a comprehensive comparison of risk prioritization using three distinct methodologies: traditional RPN, IVIFWA-aggregated IVIF FMEA, and a real-world, data-driven, factored IVIFWA-aggregated IVIF FMEA as the proposed approach. The top 10 FMs, sorted by the proposed approach rank, are presented in this section, along with their scores and ranks across all three methodologies and the associated adjustment factors.
Figure 2a shows a bar chart of the RPNs of the top 10 factored FMs based on the proposed IVIF FMEA method, while Figure 2b shows the normalized scores of the fuzzy FMEA methods. Even though both fuzzy methods account for uncertainties and static weights in expert opinions, the difference in the scores is due to the inclusion of historical incident data as factors in the proposed method.
Although the scales of the traditional FMEA RPN and IVIF FMEA scores are fixed at 1–1000, they are not applied to the same scale for comparison. While RPNs have discrete values, the scores of IVIF FMEA methods have continuous values, as with other fuzzy FMEA methodologies. Therefore, traditional RPN and IVIF FMEA scores are shown on the separate graphs. The most striking outcome is the significant re-ranking of the FMs when moving from the traditional RPN to the IVIFS-based methods, especially with the proposed approach.
Table 6 shows the traditional RPNs and the ranks of the different methodologies, ordered according to the top 10 FMs of the proposed method, along with the dynamic weights of the S, O, and D risk parameters.
FM60, with the potential FM of “use of cutting and piercing tools”, is the only FM that ranks first under all three FMEA methodologies, with scores of approximately 380 and 670 for the IVIF FMEA methods, while its RPN is 220. For the proposed IVIF FMEA approach, the score increased further and the FM remained in first place, partly because of incidents such as punctures, stabbing, and cutting, but mainly because of the experts’ differing judgments on D. Low consensus led to a $W_D$ value of approximately 0.500, which is significantly higher than the values of $W_S$ and $W_O$. This illustrates the proposed method’s strength in highlighting risks with high inherent uncertainty in detection, even if historical incidents are rare. In contrast, FM15 has a perfect $F_{DC}$ of 1.000, leading to balanced dynamic weights that match those of the traditional FMEA parameters. Experts agree on the level of D of the risk posed by people sitting on parapet walls at station entrances, as well as the associated hazards. For FM17, which covers worn-out tactile paving, there were moderate incident factors and a moderate consensus on D. It appears that the dynamic re-weighting was proportionally consistent with its original assessment relative to other risks, or that the magnitude of adjustment was not strong enough to cause a relative shift in its rank. In particular, FM18 and FM49 stand out among the top 10 as the FMs with the greatest ranking increases, which means that more attention should be paid to eliminating blind spots in security cameras and to the work carried out on ventilation systems. Although no associated incidents have occurred according to historical incident records, these changes are due to conflicts in the D scores, which make the risk difficult to predict before it occurs. In terms of ranking differences, working at height on ventilation systems emerges as a risk that warrants greater emphasis. This implies that, even without recorded incidents, strong expert D consensus and high initial S and O scores, as reflected in the traditional RPN, can maintain an FM's high priority in the IVIFS framework.
Table 7 shows the effects, causes, and current measures for the potential FMs. Straight lines are due to the same score being obtained after the score function for the four pairs of FMs.
The relationship between the traditional RPNs and the proposed IVIF FMEA scores is shown as a scatter plot in Figure 3. Most of the values are clustered between 0 and 100 on the x-axis and between 100 and 500 for the proposed IVIF FMEA score. The scatter plot shows that the RPNs and fuzzy FMEA scores do not align. Therefore, these methodologies are frequently evaluated using FM ranks.
Although the rankings of the IVIF FMEA methods follow each other as expected, the traditional FMEA seems to diverge from them. The most divergent ranks lie especially in the section between the reference lines of FM21 and FM24. The scores of the proposed method and the traditional method diverge more significantly in this section. In fact, the highest rank changes are found in this section, as shown in Figure 4.
Table 8 lists the ranks, under each methodology, of the five FMs with the most considerable rank changes. It is sorted according to the magnitudes of the rank changes of the proposed method. The change in ranking between traditional FMEA and IVIF FMEA is due to expert weighting and the handling of uncertainty through fuzzy logic; the change in ranking between the IVIF FMEA methods is due to dynamic weighting in S, O, and D resulting from processing historical incident data. While FM59 caused only one incident with hospitalization, FM27 caused two incidents, of which only one resulted in hospitalization, although neither of these two FMs has caused any incapacitation. As shown in Table 9, the increases in their rankings indicate that work involving chemicals and visual inspection before moving should be addressed earlier than the others. Even though the rest of these five FMs have not caused any incidents, FM40’s rank has increased because of the relatively low consensus factor. These dramatic changes demonstrate that, without historical data, expert weights, and consensus factors, traditional RPNs may not accurately reflect actual FM priorities that warrant immediate attention. While FM36 and FM31 are related to fire prevention systems, inconsistent D scores from experts and the lack of historical data led to a decrease in these FMs’ ranks, resulting in insufficient current measures. The ranks of potential FMs, including flooding or door entrapment, are higher, indicating that more safety measures are needed.
4. Discussion
The decision-makers selected were the three individuals that were the most knowledgeable about the risks at metro stations, who were responsible for the organization’s risk assessment. Although they were weighted by experience, educational background, and profession, they may not have fully reflected the contributions of their areas of expertise to specific FMs. To overcome this, the dynamic risk parameter weights within the scope of the study could be made more complex, creating a model that decision-makers can adjust. In addition, the sensitivity of the expert weighting was evaluated by comparing the base case with four weight variation scenarios, which were equally weighted and shifted by 60% for each decision-maker, using Spearman’s Rho (ρ) and the top 10 FMs stability metric. The results show a satisfactory range of sensitivity, with ρ values ranging from 0.6483 to 0.9339, and the dominance scenario for DM1 had the highest. In addition, 80% of the top 10 FMs remained at the same rank for this scenario. This validates expert weighting as a balanced, reliable, and discriminative tool. Because the weights of each FM’s risk parameters are dynamically adjusted through the factors calculated by the proposed approach, sensitivity analysis cannot be performed for the risk parameters.
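The two sensitivity metrics mentioned above can be sketched in a few lines. The Spearman coefficient uses the closed form for rankings without ties, and the stability metric is interpreted here as the share of the base-case top-k FMs that keep exactly the same rank in a scenario; that interpretation, the function names, and the four-FM toy rankings are assumptions for illustration, not the study's data.

```python
def spearman_rho(base_ranks, scenario_ranks):
    """Spearman's rho for two rankings of the same FMs (no tied ranks)."""
    if set(base_ranks) != set(scenario_ranks):
        raise ValueError("both rankings must cover the same FMs")
    n = len(base_ranks)
    if n < 2:
        raise ValueError("need at least two FMs")
    d_squared = sum((base_ranks[fm] - scenario_ranks[fm]) ** 2 for fm in base_ranks)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def top_k_same_rank_share(base_ranks, scenario_ranks, k=10):
    """Share of the base-case top-k FMs whose rank is unchanged in the scenario."""
    top = [fm for fm, rank in base_ranks.items() if rank <= k]
    unchanged = sum(1 for fm in top if scenario_ranks.get(fm) == base_ranks[fm])
    return unchanged / len(top)


# Toy rankings: the scenario swaps the 2nd and 3rd FMs.
base = {"FM_A": 1, "FM_B": 2, "FM_C": 3, "FM_D": 4}
scenario = {"FM_A": 1, "FM_B": 3, "FM_C": 2, "FM_D": 4}
rho = spearman_rho(base, scenario)
stability = top_k_same_rank_share(base, scenario, k=4)
```

In the toy example a single swap of adjacent ranks gives a rho of 0.8 and leaves half of the four FMs at their original rank, which shows how the two metrics respond differently to the same perturbation.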
The IVIFWA operator was used to aggregate the expert IVIFNs. However, different aggregation operators (e.g., weighted geometric mean, prioritized weighted average) can produce different results and may be more suitable for different risk assessment scenarios.
If this approach were applied to a larger metro system where more historical workplace incident records are available, the results could be evaluated on a larger scale to ensure that sustainable transportation is safer.
Future research could explore the implementation of this approach across various industrial sectors and investigate other forms of fuzzy aggregation or defuzzification to refine the model’s performance further.
5. Conclusions
This study has developed and demonstrated an enhanced risk prioritization approach for FMEA safety risk analysis that addresses the key limitations of traditional methods. By integrating expert weighting, real-world incident data, and decision-maker consensus factors, along with the ability of fuzzy risk analysis to represent inherent vagueness, the proposed methodology offers a more robust, realistic, and data-driven approach to risk assessment.
The core novelty of this research lies in its holistic integration of empirical incident data to dynamically weight the S, P, and D parameters within an IVIF FMEA structure while accounting for experts’ consensus. Unlike prior fuzzy approaches that often rely on static or pre-defined weights for risk parameters, this model introduces a data-driven mechanism where workplace incidents directly influence the importance of S and P. Crucially, the model also dynamically adjusts the weight of D based on the degree of expert agreement, thereby explicitly accounting for epistemic uncertainty in the assessment process. In the resulting equation, the proposed method incorporates the dynamic weights of risk parameters, whereas the classical IVIF FMEA method does not consider these weights. This approach yields a more realistic and reliable framework by enabling the weighting of each individual risk parameter through the impact factors derived for it.
In addition, this study contributes to the field of risk management by developing a more robust, context-aware, and uncertainty-resilient risk prioritization framework. By employing IVIFS and the IVIFWA operator, the model moves beyond simplistic point estimates of crisp numbers, effectively resolving the issue of tied RPNs by providing a clearer, more actionable prioritization hierarchy. This allows for a more adaptive and realistic representation of risk, especially for FMs with high historical impacts or significant expert disagreement.
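The tied-RPN issue mentioned above can be seen in a minimal sketch of the classical crisp calculation, RPN = S × P × D. The FM labels and scores below are hypothetical and are chosen only to show how different risk profiles collapse to the same number.

```python
from collections import defaultdict


def crisp_rpn(s, p, d):
    """Classical risk priority number: product of the three crisp scores."""
    return s * p * d


def tied_groups(scores):
    """Group FM labels that share the same crisp RPN (groups of size > 1)."""
    groups = defaultdict(list)
    for label, (s, p, d) in scores.items():
        groups[crisp_rpn(s, p, d)].append(label)
    return {rpn: labels for rpn, labels in groups.items() if len(labels) > 1}


# Hypothetical (S, P, D) scores: three different profiles, one shared RPN.
scores = {
    "FM_x": (8, 2, 3),  # severe but rare
    "FM_y": (2, 8, 3),  # mild but frequent
    "FM_z": (4, 4, 3),  # moderate on both
    "FM_w": (5, 5, 2),
}

print(tied_groups(scores))
```

A severe but rare FM and a mild but frequent FM receive the same priority here, which is the kind of tie that an interval-valued representation with parameter weights is intended to separate.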
The analysis highlights several critical FMs that require focused attention to ensure safer metro station operations. FM60, which involves the “use of cutting/piercing tools”, particularly during rail renovations, is the highest-ranked FM across all methods, as it carries the potential for severe injuries such as punctures, cuts, or even fatal outcomes. FM49, which has the greatest rank increase in the top 10 FMs, underscores the need for more urgent safety precautions than originally anticipated regarding “working at height” to maintain the ventilation system on the station ceiling. Although there are no documented historical incidents related to this FM, expert disagreement over detectability elevates its priority, emphasizing that latent hazards must not be underestimated. FM59, the FM with the most significant overall rank increase, involves the use of chemical substances. It has gained importance due to historical incidents leading to hospitalization, alongside risks of chemical exposure, injury, and poisoning. Its rising priority demonstrates how real-world incident data can reveal underestimated risks, reinforcing the need for strict adherence to PPE protocols and safe handling procedures. FM40, which shows the second-highest rank increase and concerns “flooding or water inundation”, represents a significant operational hazard because excessive rain or plumbing failures can disrupt train services and potentially threaten personnel and passenger safety. Although it lacks a strong historical incident record, its rank has increased due to relatively low expert consensus on detectability, revealing how uncertainty in assessment can expose vulnerabilities in emergency preparedness. Finally, FM27, which has the third-highest rank increase, is related to the inability to perform visual inspections via operator monitors, a condition that poses a risk of entrapment or injury if the platform cannot be adequately viewed. Its ranking has risen due to a documented past incident that resulted in hospitalization, highlighting the practical consequences of limited visibility and emphasizing the need for reliable monitoring systems. Together, these FMs illustrate how the proposed approach more accurately exposes both high-impact and data-sensitive risks, ensuring that safety efforts are directed where they are truly needed.
The comparative analysis shows that the traditional RPN, due to its inherent subjectivity and inability to handle uncertainty or empirical data, often yields risk rankings that diverge significantly from those of IVIFS-based approaches. The IVIFWA-aggregated IVIF FMEA scenario already demonstrates a slight improvement by incorporating expert-weighted fuzzy judgments that capture the nuances of expert opinions, vagueness, and varying levels of expertise. Adding incident severity factors and incident frequency factors derived from historical data, along with detectability consensus factors, to the IVIFWA-aggregated IVIF FMEA scenario provides more comprehensive and actionable risk prioritization. The re-ranking observed across the scenarios highlights the potential for the misallocation of resources and oversight of critical risks if only traditional FMEA methods are employed. This suggests that prioritizing risks is as important as the safety measures themselves.
The proposed model offers a valuable tool for improving the OHS system in urban transportation. By providing a more reliable and empirically grounded mechanism for identifying and mitigating transportation risks, it can lead to improved safety, enhanced operational efficiency, and more effective resource allocation in terms of sustainability, because sustainable urban transportation strongly depends on ensuring not only environmentally friendly operations but also safer rides.