1. Introduction
Benchmarking, although defined in various ways, is widely accepted as a structured process aimed at improving standards through the adoption of best practices. Its primary purpose is the implementation of proven solutions in dynamic environments, serving as a tool for comparison and evaluation of all aspects of operations, products, or services against leading industry examples [1]. Traditionally, the focus of benchmarking has been on performance analysis in the later stages of the process, where retrospective evaluation of results supports planned organizational changes. However, to fully realize the potential of this method, its application needs to be expanded to anticipate unexpected changes and integrated into the early stages of the process [2]. In addition to enabling organizations to identify key processes that require improvement, benchmarking provides an opportunity to adopt applicable solutions from industry best practices [3,4]. Modern approaches increasingly emphasize proactive and preventive indicators that extend beyond financial metrics. These indicators enable the prediction of future outcomes and active influence over them, ensuring business stability, more efficient utilization of knowledge resources, and the strengthening of an organization’s competitive advantage [5,6]. Benchmarking has become an integral part of various sectors, including aviation, road, maritime, and railway transportation (performance analysis of airports and terminals [7], efficiency assessment of container terminals and ports [8], road safety evaluation [9], and assessment of service quality, safety, and punctuality of railway operators [10]). It is also widely applied in industries such as manufacturing, healthcare services, finance, education, and the public sector [11]. Specifically, in the research of railway infrastructure managers (RIMs), which are the subject of this study, the benchmarking framework has been used for measuring railways’ safety on a national [12] and cross-continental level [13], environmental and economic impacts on high-speed rails [14], the success of railway construction megaprojects [15], railway power management systems [16], etc.
European railway infrastructure is facing increasing operational demands, with rail networks expected to accommodate a 25% growth in freight transport and a 40% rise in passenger traffic by 2050 [17]. However, inefficiencies in infrastructure management remain a major concern, with studies indicating that outdated systems and operational bottlenecks can reduce network capacity by up to 30% [18]. This underscores the critical need for robust benchmarking tools to enhance performance and sustainability in railway infrastructure management. Effective railway infrastructure management is crucial for enhancing transportation systems, facilitating sustainable economic development, improving connectivity for people and goods, and reducing environmental impacts. As an environmentally friendly mode of transport, rail plays a significant role in achieving sustainable development goals. However, the complexity of railway infrastructure management, coupled with challenges such as outdated infrastructure, necessitates advanced tools for performance analysis and improvement. Benchmarking has come to the fore as an important tool for performance assessment, best practice identification, and strategic decision-making, allowing railway infrastructure managers to compare their performance against reference values, identify and address areas for improvement, and adopt best practices. At the European level, this process is led by the Platform of Rail Infrastructure Managers in Europe (PRIME), which has created a catalogue of standardized key performance indicators (KPIs). The indicators encompass almost all performance dimensions of infrastructure management, namely contextual framework, safety, environment, performance, service delivery, finance, and growth. The complexity of railway systems, the numerous factors that influence them, and the heterogeneity of data are the major challenges to traditional methods of performance analysis. With the use of standard KPIs and advanced analytical methods, benchmarking is not only a comparison tool for performance, but also a decision-making tool for strategic infrastructure management and resource optimization.
This study is based on the application of KPIs as input data for the development of a hybrid benchmarking model for assessing the performance of railway infrastructure managers. The proposed model integrates three methods: Principal Component Analysis (PCA) for data dimensionality reduction, the Grey Best–Worst Method (G-BWM) for determining the weight coefficients of KPIs, and Assurance Region Data Envelopment Analysis (AR-DEA) for evaluating relative efficiency. This approach enables a precise and comprehensive performance assessment, considering the complexity and diversity of the railway infrastructure sector. PCA allows for dimensionality reduction, identifying key KPIs that best explain variability within the dataset. The G-BWM provides a reliable and flexible approach for determining weight coefficients in conditions of uncertainty and limited information. By leveraging expert assessments, the G-BWM identifies the most important (best) and least important (worst) KPIs and, through pairwise comparisons, calculates optimal weight coefficients. This ensures that weight coefficients accurately reflect the true significance of each KPI, enabling a fair performance evaluation. This process is particularly important for sectors with complex and heterogeneous operational environments, such as the railway industry. Traditional DEA analysis often has limitations, due to the possibility that certain inputs or outputs receive zero weight, which compromises result accuracy and disregards critical factors. The AR-DEA (Assurance Region DEA) method overcomes this issue by introducing weight constraints (“assurance regions”) into the optimization process. These assurance regions define boundaries for input and output weight coefficients, ensuring that all key factors are included in the analysis. The AR-DEA method utilizes the weight coefficients derived through the G-BWM as initial values for setting assurance region limits. 
This establishes a link between subjective expert assessments and objective efficiency analysis, balancing model flexibility and accuracy. This integration enables the assessment of the relative efficiency of decision-making units (DMUs) while maintaining consistency in weight allocation. The application of the proposed model aims to enhance the benchmarking process by precisely identifying efficient and inefficient railway infrastructure managers. The model allows for a detailed comparison of different managers’ performances, highlighting best practices that can serve as guidelines for improving less efficient units. Furthermore, the model provides an in-depth understanding of key factors influencing performance, ensuring that decision-makers receive accurate and reliable information for strategic planning, resource optimization, and evidence-based decision-making in railway infrastructure management. This study makes a significant contribution to the development of methodologies for monitoring and improving railway infrastructure performance, establishing a foundation for future research and innovation in resource management optimization and railway system sustainability.
This study proposes a hybrid benchmarking model that integrates PCA, the G-BWM, and AR-DEA to address key efficiency challenges in railway infrastructure management. By offering a quantitatively grounded and decision-maker-informed efficiency assessment framework, the model provides actionable insights for optimizing infrastructure operations, guiding investment strategies, and ensuring long-term sustainability in the sector. Accordingly, the rest of the paper is organized as follows: Section 2 provides a review of the relevant literature, focusing on benchmarking in the railway sector, efficiency concepts, and the application of advanced methodologies, including PCA, the G-BWM, and AR-DEA. Section 3 presents a detailed description of the hybrid methodology proposed in this study, outlining the specific steps for integrating PCA, the G-BWM, and AR-DEA. Section 4 applies the methodology to a selected sample of nine RIMs, demonstrating its practical implementation and results. Section 5, the discussion section, analyzes the obtained results, exploring factors influencing efficiency, the advantages and limitations of the proposed methodology, and implications for both theory and practice. Finally, Section 6 concludes the study, summarizing the key findings and providing recommendations for future research.
2. Literature Review and Key Performance Indicators
This section presents a review of the relevant literature on benchmarking in the railway sector, with a focus on efficiency and the application of advanced methodological approaches. Special attention will be given to the use of methods such as Principal Component Analysis (PCA), the Grey Best–Worst Method (G-BWM), and Assurance Region Data Envelopment Analysis (AR-DEA), which are employed for performance analysis and improvement in railway transport. This review aims to provide a more profound understanding of existing theoretical and practical approaches that are relevant to this research.
2.1. Methodological Approaches in Benchmarking
For the development of this paper, the Systematic Literature Review (SLR) method was applied to evaluate research questions related to the application of hybrid models for benchmarking analysis of railway infrastructure managers. The review was conducted in five phases. In the first phase, scientific papers published in the last five years were selected, so that the review captures the most current and relevant research and aligns the study with contemporary advancements in hybrid models for benchmarking railway infrastructure managers. The second phase involved defining inclusion criteria, with key search terms focused on rail transport, infrastructure managers, performance, efficiency, and benchmarking, ensuring that only relevant and targeted literature was included in the analysis. The third phase involved structuring the literature according to the applied methodologies (PCA, the BWM, DEA, the G-BWM, and AR-DEA), which allowed for a systematic comparison of different approaches and facilitated the identification of trends and gaps in the current body of knowledge. In the fourth phase, the areas of application were analyzed to determine whether the combination of PCA, the G-BWM, and AR-DEA had been previously applied in railway transport or other sectors. Finally, the fifth phase focused on identifying the objectives and purposes of the proposed models in the reviewed papers, which contributes to a deeper understanding of the practical outcomes and goals of applying hybrid models in benchmarking and guides future research and practical implementation strategies.
The literature search was conducted using the following meta-query: (TITLE-ABS-KEY (“benchmarking” OR “performance” OR “efficiency”)) AND (TITLE-ABS-KEY (“railway infrastructure” OR “railway infrastructure managers” OR “rail transport” OR “traffic” OR “transport”)) AND (TITLE-ABS-KEY (“PCA” OR “principal component analysis” OR “G-BWM” OR “grey best-worst method” OR “BWM” OR “best-worst method” OR “AR DEA” OR “assurance region DEA” OR “DEA” OR “data envelopment analysis”)) AND ((LIMIT-TO (DOCTYPE, “ar”) OR LIMIT-TO (DOCTYPE, “re”) OR LIMIT-TO (DOCTYPE, “cp”) OR LIMIT-TO (DOCTYPE, “cr”) OR LIMIT-TO (DOCTYPE, “bo”))) AND (LIMIT-TO (LANGUAGE, “English”)).
The results of the systematic literature review include an analysis of 30 scientific papers. The review enabled the identification of key methodologies and approaches, highlighting PCA, the G-BWM, and AR-DEA as fundamental components for the hybrid model, integrating dimensionality reduction, weight coefficient determination, and relative efficiency evaluation.
Table 1 presents an overview of the analyzed studies, structured according to the defined inclusion criteria, including year of publication, key terms, applied methods, application domains, and objectives/purposes of the proposed models. Additionally, key challenges and emerging trends were identified, which will be further analyzed in the following sections of the study.
Through the literature review, various applications of advanced analytical methods, including PCA, DEA, and the BWM, have been identified. PCA has been used for dimensionality reduction and the identification of key variables in studies on energy efficiency in railway systems [19], the market competitiveness of railway freight transport [20], optimization of railway wheel maintenance [21], key human factors [23], and railway infrastructure lifecycle management [22]. The BWM has been applied for determining weight coefficients of risk factors in railway safety analyses [24], evaluating public–private partnerships [25], analyzing the sustainability of railway projects [26], and assessing the impact of changes in demand and supply on inter-regional railway connectivity [27]. In addition to these methods, researchers have employed other techniques to enhance analytical frameworks. For example, meta-analysis has been used to aggregate results from multiple studies and identify key efficiency variables in the railway sector [39].
The DEA method, known for its ability to assess the relative efficiency of decision-making units, has found extensive application in the railway sector. It has been used for evaluating the efficiency of railway transport systems in international and regional contexts [32,33]. Furthermore, DEA has been applied in socio-technical performance analysis [34] and in assessing the efficiency of freight railway companies [35], identifying key criteria for improving competitiveness. Additionally, network DEA models have enabled a deeper analysis of railway station efficiency [37], while specific DEA variants, such as the SBM and AR-DEA, have been employed to evaluate the performance of small intermodal terminals [36]. Although these methods have significantly contributed to understanding various aspects of the railway sector, their integration into a hybrid benchmarking model for railway infrastructure managers remains unexplored. The motivation for developing such a model stems from the application of these methods in other sectors. The combination of PCA, the BWM, and DEA has been applied in the banking sector [46], but only in the basic forms of these methods. An enhanced version that incorporates the G-BWM for improved flexibility and precision in weight determination under uncertainty, and AR-DEA, which introduces constraints on weight coefficients for a more realistic efficiency evaluation, represents a significant methodological advancement. This improved combination of methods has not been previously applied in the railway sector or other industries, underscoring the novelty and potential of the proposed hybrid model for enhancing performance evaluation and strategic decision-making.
The development of a hybrid model that combines PCA, the G-BWM, and AR-DEA represents a logical advancement in methodological frameworks, particularly within the railway sector. This model enables a more systematic analysis of complex interactions between key performance indicators (KPIs), where each method contributes to specific aspects of evaluation—from identifying key variables and ensuring reliable weighting of criteria, to accurately measuring relative efficiency. The model enhances performance evaluation by addressing DEA’s limitations, incorporating decision-maker preferences through weight bounds, and using the BWM to capture those preferences in a grey environment, which allows for handling imprecise information. Additionally, PCA reduces dimensionality and improves discrimination power, ensuring more accurate, efficient, and realistic identification of efficient DMUs, making the model valuable for performance assessment across various complex, multi-criteria environments. The proposed approach offers a unique opportunity for developing dynamic and adaptable infrastructure management strategies, thereby contributing to the long-term sustainability and competitiveness of the railway sector. Its application opens new perspectives for enhancing analytical practices in transportation and beyond, providing a methodological framework that can be adapted to the specific needs and challenges of other industries.
2.2. Role of KPIs in Railway Benchmarking
Benchmarking in the railway sector plays a key role in performance assessment by identifying best practices and comparing them with reference values. According to research, benchmarking facilitates the understanding of the relative efficiency of different railway systems, and supports strategic decision-making aimed at improving the operational and financial aspects of infrastructure management [33]. Various studies have utilized benchmarking analyses to evaluate the efficiency of railway companies and identify best practices for enhancing operational performance. KPI-based benchmarking analysis is widely applied in the railway sector to optimize different operational aspects. For instance, benchmarking analysis is used for improving the energy efficiency of urban railway systems [53], assessing service quality in railway transport [54], enhancing national railway system safety [12], comparing performance measurement standards among different railway operators [55], evaluating railway system performance across Europe [56], assessing high-speed rail network operational readiness [57], and evaluating the performance of infrastructure managers in the railway sector [58].
Benchmarking utilizing KPIs serves as a critical tool for assessing and improving railway infrastructure management, enabling systematic performance comparisons across different networks and identifying best practices for efficient management. The European Commission, through the PRIME (Platform of Rail Infrastructure Managers in Europe) initiative, has established a catalogue of KPIs to standardize performance monitoring and enhance railway infrastructure management. This system covers a broad range of aspects, including contextual framework, safety, environment, performance, service delivery, finance, and growth, allowing for a structured approach to analyzing and comparing infrastructure managers’ performance across Europe [59]. By utilizing KPIs, railway infrastructure managers can systematically monitor and analyze key parameters that impact the efficiency and safety of the railway system. The benchmarking process facilitates performance comparisons among different railway infrastructure managers, thereby identifying best practices. Using KPIs as a benchmarking foundation ensures precise evaluation and transparency, as clear metrics can be applied to assess success across various domains. The PRIME initiative not only provides a framework for internal evaluation, but also enables comparisons between different national and regional networks, fostering best practices in the railway transport sector. Through this methodology, performance comparisons can drive innovation and alignment with European standards, ultimately contributing to better conditions for passengers and more efficient railway system operations across Europe. Key challenges in benchmarking analysis within the railway sector include limited data availability, inconsistencies between different databases, and variations in regulatory frameworks across countries [60]. Consequently, methodologically robust approaches must be applied to ensure reliable and accurate performance comparisons. To ensure precise and reliable benchmarking, this paper employs KPIs of railway infrastructure managers. The benchmarking analysis is based on the application of the PCA, G-BWM, and AR-DEA methods, which enable a comprehensive evaluation of railway infrastructure managers’ performance.
3. Methodology—Hybrid PCA–G-BWM–AR-DEA Model
Benchmarking the performance of railway infrastructure managers is a key tool for identifying best practices and improving efficiency. This analysis requires a systematic approach that incorporates KPIs to ensure an objective and reliable evaluation. Given the complexity of infrastructure management, these indicators encompass various efficiency aspects, including operational, financial, technical, and environmental factors. However, evaluation challenges often arise, due to variable interdependencies, data multidimensionality, and subjectivity in determining the importance of individual criteria. To address these challenges, this paper proposes a hybrid model that integrates PCA, the G-BWM, and AR-DEA, leveraging their combined advantages to establish a comprehensive and robust framework for benchmarking railway infrastructure managers’ performance based on KPIs. The proposed hybrid model offers a significant advancement in performance evaluation by addressing key limitations of the traditional DEA method. DEA’s automatic weight assignment often leads to unrealistic evaluations, and the AR-DEA model overcomes this by incorporating decision-maker preferences through weight bounds. However, defining these bounds requires a reliable method for capturing preferences, which is why the BWM is employed. The grey environment further enhances the model’s robustness by enabling it to handle imprecise or incomplete information. Additionally, PCA is integrated to reduce dimensionality and improve discrimination power by focusing on the most critical KPIs, which enhances both computational efficiency and the accuracy of the evaluation. This combination of methods ensures more flexible, accurate, and realistic identification of efficient DMUs, contributing to a more effective performance evaluation framework. This model’s contributions to the field include improved discrimination power, computational efficiency, and applicability to complex, multi-criteria environments, making it a valuable tool for performance assessment across various disciplines.
PCA is employed to identify and extract the most relevant variables within the model. This method transforms the original dataset into a new set of variables, the principal components, that maximize variance and reduce redundancy among indicators [61,62]. The first step in PCA involves data standardization, followed by the calculation of the covariance matrix, the determination of eigenvalues and eigenvectors, and the formation of feature vectors that define the principal components. The objective of this process is to eliminate collinearity among variables and enhance model stability. After dimensionality reduction, the G-BWM is applied to determine the weight coefficients of KPIs. The G-BWM is a Multi-Criteria Decision-Making (MCDM) method based on grey system theory, allowing experts to quantify subjective preferences by identifying the best and worst criteria. The key steps in this method include defining the evaluation scale, selecting reference criteria, calculating grey vectors that describe the relative relationships among criteria, and estimating the initial and final weight coefficients. This approach enhances evaluation consistency under uncertainty and reduces subjective biases in weight assignment.
In the final stage, the AR-DEA method is used to assess the relative efficiency of railway infrastructure managers. DEA is a non-parametric method for measuring the efficiency of DMUs, employing linear programming to distinguish between efficient and inefficient units. In this study, AR-DEA extends the standard DEA model by introducing additional constraints that enhance the discriminatory power of the analysis, allowing for more precise ranking of units. The process involves data collection and normalization, the formation of input and output variable matrices, and the calculation of the relative efficiency of each DMU. The combination of these methods provides a comprehensive performance evaluation of railway infrastructure managers, integrating KPIs and expert assessments into the benchmarking process. The steps of the proposed model are described below and illustrated in Figure 1.
Step 1: Define the problem structure, i.e., the set of DMUs and the inputs and outputs for the efficiency evaluation.
Step 2: Apply PCA to analyze the relationships among input and output variables, reduce multicollinearity, and extract the most relevant components for further analysis [63].
Step 2.1: For the initial data $x_{it}$, where $i = 1, 2, \ldots, N$ and $t = 1, 2, \ldots, T$, standardization is performed using the following formula:
$$z_{it} = \frac{x_{it} - \mu_i}{\sigma_i}$$
where $\mu_i$ is the mean value, and $\sigma_i$ is the standard deviation of the variable $x_i$.
Step 2.2: Let $\Sigma$ denote the covariance matrix of the standardized data $z_{it}$; then, it can be decomposed as follows:
$$\Sigma = V \Lambda V^{\top}$$
where $\Lambda$ is a positive diagonal matrix containing the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_N$, and the columns of the matrix $V$ are the corresponding eigenvectors. It is assumed that the eigenvectors (principal components) are ordered in descending order of their eigenvalues, where the first K eigenvectors represent the factor loadings $v_1, \ldots, v_K$. Then, K features $f_{k,t}$ are obtained, which are defined as follows:
$$f_{k,t} = v_k^{\top} z_t, \quad k = 1, \ldots, K$$
where $z_t$ represents the N-dimensional vector of standardized values at time t.
The covariance matrix captures the relationships between variables and helps to identify the directions of maximum variance, while the eigenvectors define the principal components along those directions, enabling dimensionality reduction and the transformation of data into a new uncorrelated feature space.
Step 2.3: To determine how well a principal component explains the data, the total variance and the proportion of variance of the m-th principal component are used, defined as follows:
$$\text{Total variance} = \sum_{i=1}^{N} \lambda_i, \qquad \text{Proportion}_m = \frac{\lambda_m}{\sum_{i=1}^{N} \lambda_i}$$
This proportion represents the explanatory power of each principal component. Accordingly, the principal component with the highest explanatory power becomes the first principal component, followed by the second with the second highest explanatory power, and so on. The principal components that capture the largest portion of the total variance (typically 80–90%) are selected, ensuring that the most important aspects of the data are preserved for further analysis.
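The PCA steps above (standardization, eigendecomposition of the covariance matrix, and selection by cumulative variance proportion) can be sketched in a few lines of plain Python. This is an illustrative sketch only: the study itself used Minitab, the toy `data` matrix is hypothetical, the Jacobi rotation routine is simply one convenient way to obtain eigenvalues of a symmetric matrix without external libraries, and all function names are assumptions introduced here.

```python
import math

def standardize(data):
    """Column-wise z-scores, z = (x - mean) / std (population std, divisor n)."""
    n = len(data)
    cols = list(zip(*data))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in data]

def covariance(z):
    """Covariance matrix of centred data (divisor n, consistent with standardize)."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(p)] for i in range(p)]

def jacobi_eigen(a, sweeps=100, tol=1e-20):
    """Eigenvalues and eigenvectors of a symmetric matrix via cyclic Jacobi rotations."""
    p = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(p)] for i in range(p)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:
            break
        for r in range(p - 1):
            for c in range(r + 1, p):
                if abs(a[r][c]) < 1e-15:
                    continue
                # Angle that zeroes the (r, c) off-diagonal entry
                theta = 0.5 * math.atan2(2 * a[r][c], a[c][c] - a[r][r])
                cs, sn = math.cos(theta), math.sin(theta)
                for k in range(p):  # rotate columns r and c
                    akr, akc = a[k][r], a[k][c]
                    a[k][r], a[k][c] = cs * akr - sn * akc, sn * akr + cs * akc
                for k in range(p):  # rotate rows r and c
                    ark, ack = a[r][k], a[c][k]
                    a[r][k], a[c][k] = cs * ark - sn * ack, sn * ark + cs * ack
                for k in range(p):  # accumulate eigenvectors as columns of v
                    vkr, vkc = v[k][r], v[k][c]
                    v[k][r], v[k][c] = cs * vkr - sn * vkc, sn * vkr + cs * vkc
    pairs = sorted(((a[i][i], [v[k][i] for k in range(p)]) for i in range(p)),
                   key=lambda pair: -pair[0])
    return [lam for lam, _ in pairs], [vec for _, vec in pairs]

def components_for_threshold(eigenvalues, threshold=0.85):
    """Smallest number of leading components whose cumulative proportion reaches threshold."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, lam in enumerate(eigenvalues, start=1):
        cumulative += lam / total
        if cumulative >= threshold:
            return k
    return len(eigenvalues)

# Hypothetical toy data: 5 observations (rows) of 3 indicators (columns)
data = [[2.0, 4.1, 1.0], [3.0, 6.2, 0.5], [4.0, 7.9, 1.5], [5.0, 10.1, 0.7], [6.0, 12.0, 1.2]]
z = standardize(data)
cov = covariance(z)
eigenvalues, eigenvectors = jacobi_eigen(cov)
k = components_for_threshold(eigenvalues, threshold=0.85)
# Component scores: projection of each standardized row onto the first k eigenvectors
scores = [[sum(v[i] * row[i] for i in range(len(row))) for v in eigenvectors[:k]] for row in z]
print("eigenvalues:", [round(lam, 3) for lam in eigenvalues], "retained:", k)
```

Because the data are standardized, the covariance matrix coincides with the correlation matrix, so the eigenvalues sum to the number of variables; this gives a quick sanity check on any PCA output.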
Step 3: Define the input/output weight limits by applying the G-BWM (adapted from [64]).
Step 3.1: Define the grey scale for the evaluations (Table 2).
Step 3.2: Decision-makers (DMs) select the best and the worst input/output, and then evaluate the other inputs/outputs according to them using the scale from Table 2, thus obtaining the grey vectors “the best compared to others” and “others compared to the worst”.
Step 3.3: Obtain the optimal grey input/output weight limits
by solving the following optimization problem:
where
,
is the grey limit of the “best” criterion,
is the grey limit of the “worst” input/output,
is the grey limit of the input/output
j,
,
,
is the grey preference of the “best” input/output over the input/output
j, and
is the preference of the input/output
j over the “worst” input/output.
is the white value of the grey number
:
Step 3.4: Check the comparison consistency by calculating the consistency ratio (
CR) with the following formula:
where
is the white value of the grey number
, obtained by (6), and
CI is the consistency index:
where
The comparison is considered consistent if the
CR value is close to 0.
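To make the logic of Steps 3.2 to 3.4 more concrete, the sketch below uses the original crisp BWM rather than the grey variant applied in this study, so interval (grey) numbers and their whitening are deliberately left out. The comparison vectors are hypothetical, the function names are assumptions, and the closed-form weights are only optimal when the comparisons are fully consistent; in the general case the weights come from solving the optimization problem. The consistency index values are those tabulated for the crisp BWM.

```python
# Consistency index of the crisp BWM, indexed by a_BW (preference of best over worst)
CONSISTENCY_INDEX = {1: 0.00, 2: 0.44, 3: 1.00, 4: 1.63, 5: 2.30,
                     6: 3.00, 7: 3.73, 8: 4.47, 9: 5.23}

def weights_from_best_vector(best_to_others):
    """Weights implied by w_B / w_j = a_Bj; exact only for fully consistent comparisons."""
    inverse = [1.0 / a for a in best_to_others]
    total = sum(inverse)
    return [x / total for x in inverse]

def max_deviation(weights, best_to_others, others_to_worst, best, worst):
    """Largest violation of w_B / w_j = a_Bj and w_j / w_W = a_jW for the given weights."""
    dev_best = [abs(weights[best] / w - a) for w, a in zip(weights, best_to_others)]
    dev_worst = [abs(w / weights[worst] - a) for w, a in zip(weights, others_to_worst)]
    return max(dev_best + dev_worst)

def consistency_ratio(xi, a_bw):
    """CR = xi / CI; values close to 0 indicate consistent comparisons."""
    ci = CONSISTENCY_INDEX[a_bw]
    return 0.0 if ci == 0 else xi / ci

# Hypothetical example with three criteria: criterion 0 is the best, criterion 2 the worst
best_to_others = [1, 2, 8]   # "the best compared to others"
others_to_worst = [8, 4, 1]  # "others compared to the worst"
weights = weights_from_best_vector(best_to_others)
xi = max_deviation(weights, best_to_others, others_to_worst, best=0, worst=2)
cr = consistency_ratio(xi, a_bw=8)
print([round(w, 4) for w in weights], round(xi, 6), round(cr, 6))
```

In this example the comparisons satisfy a_Bj × a_jW = a_BW for every criterion, so the deviation and the consistency ratio are zero up to rounding; inconsistent judgments would yield a positive deviation and a CR further from 0.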
Step 3.5: Obtain the final input/output limits, as follows:
Step 4: Select the efficient DMUs by applying the AR-DEA method (adapted from [65]).
Step 4.1: Define the input value matrix for all DMUs (Xij) and the output value matrix for all DMUs (Yis) in the following manner:
$$X = [x_{ij}]_{o \times e}, \qquad Y = [y_{is}]_{o \times t}$$
where xij is the value of input j for DMU i, i = 1, …, o, j = 1, …, e, and yis is the value of output s for DMU i, s = 1, …, t; o is the number of DMUs, and e and t represent the total number of inputs and outputs taken into consideration, respectively.
Step 4.2: Define the normalized matrices of inputs and outputs. The normalized matrices of inputs (
) and outputs (
) are formed by the normalized values of inputs and outputs, obtained in the following manner:
Step 4.3: Calculate the efficiencies for each DMU. In order to calculate the efficiencies for each DMU, it is necessary to solve the following linear optimization model:
subject to the following:
where
vj,
j = 1, …,
e, and
us,
s = 1, …,
t, are the input and output weights, respectively;
and
are the lower and upper weight limits for the input
j; and
and
are the lower and upper weight limits for the output
s, calculated by applying Equations (11) and (12).
p and
q are the scaling factors, enabling the fulfilment of conditions for introducing the input and output weight bounds.
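For orientation, a generic textbook form of the input-oriented DEA multiplier model with bounded weights is given below for the evaluated unit k. It is a sketch under assumed notation, not necessarily the exact formulation adopted from [65]: the bound symbols $v_j^{L}$, $v_j^{U}$, $u_s^{L}$, $u_s^{U}$ are introduced here for illustration, and the scaling factors p and q mentioned above are not reproduced.

```latex
\begin{aligned}
\max_{u,\,v}\quad & \theta_k = \sum_{s=1}^{t} u_s\, y_{ks} \\
\text{subject to}\quad & \sum_{j=1}^{e} v_j\, x_{kj} = 1, \\
& \sum_{s=1}^{t} u_s\, y_{is} - \sum_{j=1}^{e} v_j\, x_{ij} \le 0, \qquad i = 1, \ldots, o, \\
& v_j^{L} \le v_j \le v_j^{U}, \qquad j = 1, \ldots, e, \\
& u_s^{L} \le u_s \le u_s^{U}, \qquad s = 1, \ldots, t.
\end{aligned}
```

Without the last two constraint sets, the model reduces to the standard CCR form, in which some weights may drop to zero; the bounds are what keep every input and output in play.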
4. Application of Proposed Model for Benchmarking Analysis of RIMs
In this section, the hybrid model for benchmarking analysis of RIMs is applied, utilizing a combination of the PCA, G-BWM, and AR-DEA methods. The main objective of this approach is to conduct a detailed evaluation of the performance of nine selected RIMs, using a set of 35 KPIs. These KPIs serve as the foundation for quantifying efficiency and identifying best practices in railway infrastructure management. This multidimensional approach enables a systematic analysis and comparison of results, taking into account various business aspects, including contextual framework, safety, environment, performance, service delivery, finance, and growth. In Step 1, the DMUs in this study consist of nine RIMs from different EU countries. To ensure data privacy and confidentiality, the identities of the RIMs remain anonymous, with each manager being assigned an alphanumeric code (RIM1, RIM2, etc.). These units were carefully selected to ensure comparability, thereby enhancing the accuracy and validity of the results. For the selection of KPIs, this study relies on the set of indicators defined in the PRIME report of the European Commission, which covers key aspects of railway infrastructure performance. The selection of KPIs was based on their high priority and relevance for benchmarking analysis. The indicators were chosen to ensure measurability and comprehensiveness, covering all critical performance dimensions of railway infrastructure managers. Additionally, the KPIs used in this study were selected based on previous research [60], where the same indicators were applied to assess the performance of railway infrastructure managers. Data collection in this phase is of critical importance, as the data must be accurate, verified, and standardized to ensure reliable subsequent analysis. Ensuring high-quality data and aligning them with the required standards guarantees that the analysis remains valid and that the results are relevant for decision-making. All data used in this phase were collected from reliable sources, including relevant reports and the PRIME platform, ensuring consistency and analytical integrity [18]. In Step 2, the PCA method was applied to reduce the dimensionality of the KPI dataset, thereby enhancing analytical efficiency and improving data interpretation. The PCA implementation was conducted using Minitab software version 17.1, which facilitated the necessary statistical computations and visualization of results, streamlining further analysis and data interpretation.
Table 3 presents the results of the principal component analysis, including the eigenvalues, proportions, and cumulative proportions of variance for each principal component (PC1 to PC8).
The first principal component (PC1) has an eigenvalue of 10.805, explaining 30.9% of the total variance in the dataset. PC2 and PC3, with eigenvalues of 8.411 and 5.118, account for an additional 24.0% and 14.6% of the variance, respectively. The eigenvalues and cumulative proportions show that the first five components (PC1–PC5) together capture 88.7% of the total variance, which was adopted as the threshold for retaining critical information while reducing dimensionality. The remaining components (PC6–PC8) contribute considerably less to the explained variance and are not used in the subsequent KPI selection.
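The reported proportions can be checked by hand under one assumption that the text does not state explicitly: that PCA was run on the correlation matrix of the 35 KPIs, so that the total variance equals the number of variables. The short sketch below uses only the three eigenvalues quoted above and reproduces the percentages under that assumption.

```python
# Assumption (not stated in the text): PCA on the correlation matrix,
# so the total variance equals the number of standardized KPIs.
N_KPIS = 35

# Eigenvalues quoted in the text for the first three components.
eigenvalues = {"PC1": 10.805, "PC2": 8.411, "PC3": 5.118}


def explained_share(eigenvalue, total_variance=N_KPIS):
    """Share of the total variance carried by one component, in percent."""
    return 100.0 * eigenvalue / total_variance


# Per-component share, rounded to one decimal as in the text.
shares = {pc: round(explained_share(ev), 1) for pc, ev in eigenvalues.items()}

# Cumulative share of the first three components.
cumulative_first_three = round(
    sum(explained_share(ev) for ev in eigenvalues.values()), 1
)

print(shares)
print(cumulative_first_three)
```

Under this assumption the three shares agree with the 30.9%, 24.0%, and 14.6% reported, and the first three components together cover about 69.5% of the variance, so PC4 and PC5 supply the remainder of the 88.7%. With nine RIMs, centred data can have at most eight non-zero components, which is consistent with Table 3 stopping at PC8.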
Figure 2 presents a scree plot of the extracted components, visually illustrating the distribution of eigenvalues and enabling the identification of key components that retain the highest variance.
Table 4 shows the results, including factor loadings, which indicate the strength of the relationship between individual KPIs and principal components. These insights help to identify the most significant factors for decision-making and performance optimization in railway infrastructure management.
Following the application of PCA, selection of KPIs was performed, identifying those with the most significant impact on data variability within the first five principal components (PC1–PC5), which together accounted for 88.7% of the total variance. The selected variables were identified based on factor loadings, considering indicators with the highest absolute values, ensuring the retention of critical information necessary for further analysis. In the first principal component (PC1), the variables C1 (0.257), G5 (0.252), and D5 (−0.254) emerged as the most representative. These variables correspond to the degree of railway electrification, ATP (Automatic Train Protection) system coverage, and track failure frequency, highlighting their significant contribution to railway infrastructure management efficiency. The second principal component (PC2) was primarily influenced by the variable G6 (−0.315), which relates to the connectivity of the railway network with maritime ports. For the third component (PC3), the key indicators were C5 (−0.329), S3 (0.308), and S5 (0.306), which include the share of the national network managed by the RIM, safety indicators related to infrastructure management, and the share of electric trains in the total traffic. These indicators emphasize the relationship between safety aspects and energy efficiency within the railway system. Within the fourth component (PC4), the most significant indicator was C3 (0.404), which represents the modal share of rail transport in freight transport. This variable underscores the importance of freight transport as a critical efficiency factor in the railway network. The fifth component (PC5) identified S2 (0.359) and P4 (−0.53) as the most relevant KPIs. These variables are associated with the number of serious accidents and the percentage of cancelled trains due to infrastructure-related issues, emphasizing the importance of safety and operational challenges in the railway sector.
All the selected variables have high interpretative value, as they provide a more accurate assessment of the indicators influencing railway infrastructure efficiency. Furthermore, these indicators can serve as a foundation for optimizing existing operational strategies, and support informed decision-making in railway system management.
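The selection rule described above (keep the KPIs with the largest absolute loadings on each retained component) can be illustrated with the loadings quoted in the text. This is only a sketch: the full loading matrix of Table 4 and the exact cut-off used by the authors are not reproduced here, so the dictionary contains the reported values and nothing else.

```python
# Loadings quoted in the text for the retained KPIs (component -> {KPI: loading}).
# The remaining entries of Table 4 are deliberately not included.
reported_loadings = {
    "PC1": {"C1": 0.257, "G5": 0.252, "D5": -0.254},
    "PC2": {"G6": -0.315},
    "PC3": {"C5": -0.329, "S3": 0.308, "S5": 0.306},
    "PC4": {"C3": 0.404},
    "PC5": {"S2": 0.359, "P4": -0.53},
}


def rank_by_absolute_loading(loadings):
    """Order the KPIs of one component by |loading|, largest first."""
    return sorted(loadings, key=lambda kpi: abs(loadings[kpi]), reverse=True)


# Ranking within each component; the sign of a loading is ignored.
ranking = {pc: rank_by_absolute_loading(vals) for pc, vals in reported_loadings.items()}

# Union of the retained KPIs across PC1-PC5.
selected = sorted({kpi for kpis in ranking.values() for kpi in kpis})

print(ranking["PC1"])
print(selected)
```

Because only the magnitude matters, D5 (−0.254) ranks between C1 and G5 on PC1, and P4 (−0.53) ranks ahead of S2 on PC5. The union contains ten distinct KPIs, which matches the number carried forward to the weighting and efficiency steps.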
The further evaluation is based on ten KPIs, categorized into input and output variables. The input variables are the degree of main railway line electrification (C1), coverage of the Automatic Train Protection (ATP) system (G5), share of the national network managed by the RIM (C5), connectivity with maritime ports (G6), modal share of rail freight transport (C3), and track failures relative to network size (D5). The output variables are the infrastructure manager-related safety indicators (S3), the share of electric trains in total traffic (S5), the number of serious accidents (S2), and the percentage of cancelled trains due to infrastructure-related issues (P4). These indicators serve as the foundation for assessing factor significance using the G-BWM and for evaluating efficiency through the AR-DEA method.
In Step 3, the G-BWM was applied to determine the weight boundaries for inputs and outputs. The process began with the definition of the grey scale for indicator evaluation (Table 2), utilizing interval values derived from grey number theory. Once the scale was defined, experts identified the most important (Best) and least important (Worst) input and output, which acted as reference values for evaluating the remaining indicators. The remaining input and output variables were then assessed relative to the best and worst indicators, forming two grey vectors. The first vector represents the preference of the best indicator over the others, while the second vector reflects the preference of all indicators over the worst indicator. The obtained scores were then used in an optimization model, where formulas (5) to (12) were applied to compute the lower and upper weight boundaries for each input and output. This process also included a consistency check to ensure that the expert assessments were logically coherent and mutually aligned, thereby increasing the reliability and robustness of the weighting process.
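The role of the two comparison vectors and of the consistency check can be sketched for the crisp (non-grey) Best–Worst Method. This is a simplification and not the grey formulation of formulas (5) to (12): the criteria names K1 to K4 and all comparison values are hypothetical, and the closed-form weights below hold only when the judgements are fully consistent. In the grey version each judgement is an interval, and the optimization returns a lower and an upper bound per weight.

```python
def is_fully_consistent(best_to_others, others_to_worst, worst, tol=1e-9):
    """Crisp BWM consistency: a_Bj * a_jW must equal a_BW for every criterion j."""
    a_bw = best_to_others[worst]
    return all(
        abs(best_to_others[j] * others_to_worst[j] - a_bw) <= tol
        for j in best_to_others
    )


def weights_if_consistent(best_to_others):
    """With fully consistent judgements, w_B / w_j = a_Bj, so w_j is proportional to 1 / a_Bj."""
    inverse = {j: 1.0 / a for j, a in best_to_others.items()}
    total = sum(inverse.values())
    return {j: value / total for j, value in inverse.items()}


# Hypothetical judgements: K1 is the Best criterion, K4 the Worst.
best_to_others = {"K1": 1, "K2": 2, "K3": 4, "K4": 8}   # preference of Best over each criterion
others_to_worst = {"K1": 8, "K2": 4, "K3": 2, "K4": 1}  # preference of each criterion over Worst

consistent = is_fully_consistent(best_to_others, others_to_worst, worst="K4")
weights = weights_if_consistent(best_to_others)

print(consistent)
print({k: round(w, 4) for k, w in weights.items()})
```

When the judgements are not fully consistent, no closed form exists and the weights come from the optimization model, which is why the consistency check is reported alongside the weights.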
Table 5 presents the input evaluations obtained using the G-BWM, where inputs are assessed relative to the most important (Best) and least important (Worst) criteria, while the final column displays the corresponding grey weights. Based on the results, the degree of main railway line electrification (C1) was identified as the most important input, having the highest weight range. In contrast, the modal share of rail freight transport (C3), the share of the national network managed by the RIM (C5), and ATP system coverage (G5) were assigned approximately equal importance. The lowest weight was assigned to track failures relative to network size (D5), indicating that while this indicator remains relevant for analysis, it has the least impact on overall infrastructure efficiency.
Table 6 presents the output evaluations, where the output variables are assessed relative to the best (Best) and worst (Worst) output. The infrastructure manager-related safety indicator (S3) was identified as the most significant output, exhibiting the highest weight, while the number of serious accidents (S2) was recognized as the least significant output, with the lowest weight range. The share of electric trains (S5) and the percentage of cancelled trains due to infrastructure issues (P4) demonstrated moderate importance in the performance evaluation, with P4 having slightly lower weights compared to S5. These results indicate that in assessing the efficiency of railway infrastructure managers, the greatest emphasis is placed on safety indicators associated with infrastructure management.
After determining the input and output weights using the G-BWM, Step 4 applied the AR-DEA method to assess the efficiency of the RIMs. This analysis was based on the ten KPIs selected above, with C1, C3, C5, D5, G5, and G6 defined as inputs and S2, S3, S5, and P4 as outputs. Each input and output was assigned the lower and upper weight limits obtained from the G-BWM in the previous step.
In the first step of the AR-DEA method, input and output matrices for all decision-making units (DMUs) were formed, as defined in formulas (13) and (14). These matrices enabled the evaluation of each unit’s efficiency based on the available data. Subsequently, the data were normalized to ensure comparability across all inputs and outputs, following formulas (15) and (16). Once normalization was completed, efficiency scores for each DMU were computed by solving an optimization problem, where the weighted sum of outputs was maximized under the condition that the weighted sum of inputs equaled 1, as shown in formulas (17) and (18). The predefined input and output weight limits obtained from the G-BWM were incorporated as constraints in the model, as represented in formulas (19) and (20). Finally, scaling factors were introduced to ensure that the weight limits remained within the defined intervals, as described in formulas (21) and (22). By solving this optimization problem, efficiency scores were determined for each DMU. Units with an efficiency score of 1 are considered fully efficient, while those with lower efficiency scores indicate areas for potential improvement.
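The effect of weight limits on DEA scores can be seen in a toy example. The sketch below is not the linear programme of formulas (17) to (22): it evaluates the ratio form of the model by a simple grid search, uses one input and two outputs, and relies on hypothetical DMUs (A, B, C) and hypothetical weight limits. With a single input the input weight cancels, so only the first output weight u1 has to be scanned (the second is 1 − u1).

```python
def weighted_ratio(dmu, u1):
    """Weighted output per unit of input for one DMU given output weight u1 (u2 = 1 - u1)."""
    x, y1, y2 = dmu
    return (u1 * y1 + (1.0 - u1) * y2) / x


def efficiency(dmus, target, u1_low=0.0, u1_high=1.0, steps=100):
    """Best relative score of `target` over all output weights inside [u1_low, u1_high]."""
    best = 0.0
    for i in range(steps + 1):
        u1 = u1_low + (u1_high - u1_low) * i / steps
        # The best-performing DMU under these weights defines the frontier.
        frontier = max(weighted_ratio(d, u1) for d in dmus.values())
        best = max(best, weighted_ratio(dmus[target], u1) / frontier)
    return best


# Hypothetical DMUs: (input, output 1, output 2).
dmus = {"A": (1.0, 4.0, 1.0), "B": (1.0, 1.0, 4.0), "C": (1.0, 2.0, 2.0)}

# Conventional DEA: each DMU may pick any weights.
free = {name: round(efficiency(dmus, name), 3) for name in dmus}

# Assurance region: u1 restricted to hypothetical limits [0.6, 0.8].
bounded = {name: round(efficiency(dmus, name, 0.6, 0.8), 3) for name in dmus}

print(free)
print(bounded)
```

Without limits, A and B both reach a score of 1 by placing all weight on their stronger output. Once the weight on the first output is confined to the assumed interval, only A remains fully efficient, which illustrates how weight limits sharpen the discrimination between units.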
Table 7 presents the input and output values for the analyzed railway infrastructure managers, along with their efficiency scores calculated using the AR-DEA method. The table displays the six selected inputs (C1, C3, C5, D5, G5, G6) and four outputs (S2, S3, S5, P4), which serve as the basis for determining efficiency coefficients for each RIM.
The results indicate that RIM8 is the only infrastructure manager with an efficiency score of 1.000, making it the reference unit in the analysis and demonstrating optimal resource utilization. The other RIMs exhibit varying levels of efficiency, with the lowest scores recorded for RIM2 (0.190) and RIM4 (0.222). Among the inefficient units, RIM6 achieves the highest efficiency score (0.341), which still leaves considerable room for improvement relative to RIM8. Analyzing the input and output values reveals that units with lower efficiency are often characterized by high values of the undesirable outputs (S2 and P4) or by inefficient use of inputs, such as a high number of track failures (D5) and suboptimal electrification utilization (C1). Conversely, the more efficient RIMs maintain a better balance between inputs and outputs, indicating more effective infrastructure management. These results enable the identification of potential improvements for less efficient units through resource optimization and performance enhancement of key indicators.
Sensitivity Analysis
The sensitivity analysis presented in Table 8 evaluates the stability of the obtained efficiency results across different scenarios of weight coefficient modifications. In the first four scenarios (Sc.1–Sc.4), the weight of the most important input was reduced by 25%, 50%, 75%, and 90%, respectively, while in the following four scenarios (Sc.5–Sc.8), the same reductions were applied to the most important output. In Sc.9, the weights of all outputs were equalized, whereas in Sc.10, the weights of both inputs and outputs were equalized to test the model’s stability under completely uniform criteria. To enhance the clarity of the sensitivity analysis, the results are visually represented in Figure 3, which facilitates the identification of efficiency trends across different scenarios.
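The scenario construction can be sketched as follows. Two points are assumptions rather than statements from the text: the weights W1 to W4 are hypothetical point values (the study works with grey interval weights), and the weight removed from the key criterion is redistributed over the remaining criteria in proportion to their current weights, since the redistribution rule is not specified.

```python
def reduce_key_weight(weights, key, reduction):
    """Cut weights[key] by `reduction` (0..1) and spread the removed mass
    over the other criteria in proportion to their weights (assumed rule)."""
    removed = weights[key] * reduction
    rest = sum(w for k, w in weights.items() if k != key)
    return {
        k: (w - removed) if k == key else w + removed * w / rest
        for k, w in weights.items()
    }


def equal_weights(weights):
    """Uniform weights over the same criteria, as in the equalized scenarios."""
    n = len(weights)
    return {k: 1.0 / n for k in weights}


# Hypothetical baseline weights; W1 plays the role of the most important criterion.
base = {"W1": 0.40, "W2": 0.30, "W3": 0.20, "W4": 0.10}

# Reductions of 25%, 50%, 75%, and 90% applied to the key weight.
scenarios = {
    f"Sc.{i}": reduce_key_weight(base, "W1", r)
    for i, r in enumerate((0.25, 0.50, 0.75, 0.90), start=1)
}
uniform = equal_weights(base)

for name, w in scenarios.items():
    print(name, {k: round(v, 4) for k, v in w.items()})
```

Each scenario keeps the weights summing to one, so the efficiency scores recomputed under the modified weights remain comparable with the baseline.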
The results indicate that changes in the weights of inputs and outputs had an expected impact on efficiency scores, but without drastic fluctuations, confirming the stability of the solution. The reduction in the most important input’s weight (Sc.1–Sc.4) led to a gradual decrease in efficiency for all RIMs, with the least efficient units experiencing the most significant drop. In Sc.4, where the weight of the key input was reduced by 90%, the efficiency of RIMs such as RIM2 and RIM7 significantly decreased, confirming that this indicator had a strong influence on the initial results. Conversely, in scenarios Sc.5–Sc.8, where the weight of the most important output was reduced, an increase in efficiency was observed for all RIMs. As expected, the least efficient units experienced the highest efficiency gains, as reducing the significance of the key output lowered their limiting constraints in the model. In Sc.8, where the weight of the key output was reduced by 90%, some RIMs (RIM6, RIM5) achieved efficiency scores close to 1, suggesting that this output was the main restricting factor in the baseline analysis. Sc.9 and Sc.10, in which all output weights and both input and output weights were equalized, demonstrated that efficiency values remained within an acceptable range, without significant deviations. This further confirms the robustness of the model, as the results do not exhibit extreme fluctuations, even when all factors are treated equally. The sensitivity analysis validates the stability and reliability of the obtained results, showing that changes in key input and output weights affect efficiency in the expected direction. More efficient RIMs displayed lower sensitivity to weight modifications, while less efficient RIMs exhibited greater variations across different scenarios, indicating potential opportunities for optimizing their performance.
5. Discussion
The application of a hybrid approach in this study, within the benchmarking analysis of RIMs, has enabled an objective assessment of their performance using a combination of PCA, G-BWM, and AR-DEA methods. This approach represents a significant advancement compared to traditional benchmarking methods, providing deeper insights into various aspects of railway infrastructure management and allowing for more precise strategic decision-making.
The proposed hybrid model is based on the DEA method, which identifies efficient DMUs using a set of KPIs as inputs and outputs. Compared to traditional performance evaluation approaches, DEA offers advantages such as handling multiple inputs and outputs simultaneously, not requiring predefined functional relationships, enabling direct comparisons between DMUs, and accommodating different units of measurement. However, conventional DEA has limitations, particularly in weight allocation, as it automatically assigns input and output weights without decision-maker (DM) influence. This can lead to extreme weighting scenarios in which a DMU appears efficient by emphasizing only one input and one output, while ignoring the others. To address this, the AR-DEA model introduces weight bounds, incorporating DM preferences and aligning the analysis with real-world conditions. Since defining these weight bounds relies on DM evaluations, a Multi-Criteria Decision-Making (MCDM) method is required. This study employs the Best–Worst Method (BWM) because of its simplicity, higher consistency, and reduced comparison requirements compared to alternatives such as AHP and ANP. Additionally, the BWM is applied within a grey environment, which supports decision-making in cases where DM evaluations are imprecise or incomplete. The grey number structure, defined by lower and upper values, aligns well with the AR-DEA model’s need for weight constraints. Another drawback of conventional DEA is that the number of efficient DMUs tends to increase with more inputs and outputs, reducing discrimination power. To mitigate this, PCA is integrated into the model to identify the most critical KPIs, reducing dimensionality while preserving key information. PCA minimizes multicollinearity, enhances differentiation among DMUs, and improves computational efficiency. Thus, the hybrid model combines PCA for key input–output selection, the grey BWM for determining their priority and weight constraints, and AR-DEA for identifying the efficient DMUs, ensuring a more realistic and effective evaluation framework.
In this specific case, a large number of KPIs are defined and monitored for the observed RIMs. Although this appears advantageous at first glance, it makes it difficult to identify the RIMs that are genuinely efficient. One reason is the significant correlation and overlap between individual KPIs; another is that the existing and most commonly used methods for determining efficiency cannot adequately handle a large number of inputs and outputs. In addition, there is no information on how important each KPI is in assessing the efficiency of RIMs. The proposed model overcomes these problems by enabling the identification of key KPIs for the observed RIMs. Based on the factor loadings, which indicate the strength of the relationship between individual KPIs and principal components, PCA allowed a reduction in the problem size. The AR-DEA method then identified efficient RIMs on the basis of the weight limits of the KPIs (inputs and outputs), which were obtained by applying the grey BWM to DM evaluations grounded in their experience and real-life circumstances. This made it possible to single out as efficient those RIMs that perform well in the business segments that are truly important to the participants in these processes.
The results indicate that the most critical KPIs relate to the degree of electrification, ATP system coverage, and track failure rates. The degree of electrification directly influences energy efficiency, operational costs, and environmental sustainability, with higher electrification levels reducing reliance on fossil fuels and lowering carbon emissions. ATP system coverage enhances railway safety by preventing accidents caused by human error, ensuring compliance with speed limits, and reducing the risk of collisions, which is vital for maintaining operational reliability. Track failure rates serve as a key measure of infrastructure integrity, as frequent failures can lead to increased maintenance costs, service disruptions, and safety hazards, directly impacting network reliability. This underscores the essential role of infrastructure resources and safety measures in achieving effective railway infrastructure management. The results show that more efficient railway infrastructure managers exhibit a better balance between infrastructure resources and safety indicators, while less efficient RIMs are often burdened by a higher number of failures and lower coverage of advanced safety systems. A detailed analysis of the results reveals that RIM8 is the only fully efficient manager, with an efficiency score of 1.000, meaning it optimally utilizes available resources and can serve as a benchmark in the analysis. Other RIMs exhibit varying levels of efficiency, with RIM2 and RIM4 showing the lowest efficiencies of 0.190 and 0.222, respectively, indicating significant room for improvement. The efficiency results are primarily driven by the weighting of KPIs using the G-BWM and their values in the AR-DEA model. The proportion of the national network managed by a railway infrastructure manager affects strategic planning and investment efficiency, as larger networks require advanced asset management strategies to optimize performance and minimize operational risks. 
The degree of electrification of the total main track (C1) and infrastructure manager-related precursors to accidents (S3) had the highest weights, making them decisive factors. Precursors to accidents managed by infrastructure operators serve as early warning indicators, enabling proactive risk management and targeted interventions to enhance railway safety and prevent major disruptions. A higher share of electric trains in total traffic is critical for achieving sustainability goals, as electric traction significantly reduces air pollution, lowers energy costs, and enhances network efficiency. RIM2’s inefficiency stems from its low value in infrastructure manager-related precursors to accidents (S3), which carries the highest weight. RIM8, despite having the highest track failures relative to network size (D5), achieved full efficiency due to strong performance in infrastructure manager-related precursors to accidents (S3), which had a greater influence in the AR-DEA model. Conversely, RIM4’s inefficiency is driven by its high number of fatalities and serious injuries (S2) and its values for infrastructure manager-related precursors to accidents (S3), both of which pull down its efficiency score. The number of fatalities and serious injuries is a fundamental safety KPI, as reducing these incidents is crucial for maintaining public trust, regulatory compliance, and the overall reliability of railway operations. High rates of passenger train cancellations due to infrastructure failures indicate operational inefficiencies and service reliability challenges, which can negatively impact customer satisfaction and financial performance. Strong connectivity with maritime ports is essential for intermodal transport efficiency, as it facilitates seamless freight transfers, reduces logistics costs, and enhances the competitiveness of rail freight networks.
A higher modal share of rail freight transport indicates a well-utilized railway network, contributing to economic efficiency and sustainability by reducing road congestion, fuel consumption, and greenhouse gas emissions. These findings highlight that, while infrastructure resources influence efficiency, safety and operational stability are the most critical factors in performance evaluation.
The conducted sensitivity analysis confirmed the stability of the model, where the reduction in the weight of the key input was gradually reflected in lower efficiency scores, while the reduction in the most important output weight led to an increase in efficiency. Notably, less efficient RIMs were more sensitive to weight changes, suggesting that their performance is more dependent on specific infrastructure factors. In scenarios where input and output weights were equalized, efficiency scores remained within an acceptable range, further confirming the robustness of the approach.
The findings of this paper provide valuable insights that can assist infrastructure managers in making informed decisions, optimizing operational strategies, and defining future investments. Identified areas for improvement allow for targeted actions that can enhance the efficiency of the railway sector as a whole. Additionally, benchmarking analysis enables comparative evaluation, ensuring transparency and fostering competitiveness among infrastructure managers. Despite the significant advantages of the developed model, certain challenges and limitations remain. The reliability of results depends on the availability of accurate and up-to-date data, while subjectivity in expert judgement may affect the precision of weight coefficients in the G-BWM. Furthermore, the complexity of the AR-DEA method requires careful interpretation of results to ensure their practical applicability.
The hybrid model presented in this study could be adapted for other transport sectors, such as road and air transport, by incorporating sector-specific KPIs and adjusting the weighting process to reflect industry-specific efficiency drivers. For example, in road transport, key indicators could include congestion levels, vehicle emissions, and infrastructure maintenance costs, while in air transport, factors such as runway utilization, on-time performance, and the environmental impact of airport operations could be integrated. Additionally, the model’s structured approach to efficiency evaluation could serve as a foundation for transport policy-making, offering data-driven insights to regulatory bodies for optimizing resource allocation, setting performance benchmarks, and promoting sustainable infrastructure development.
One of the key challenges in collecting and validating data for the evaluation of RIMs is the inaccessibility and inconsistency of data across different countries. The PRIME platform, an initiative of the European Commission for collecting data from RIMs, plays a crucial role in standardizing and making information available. However, data submission to the PRIME platform is not mandatory, leading to issues with data incompleteness. Many RIMs lack unified or standardized performance tracking systems, and some do not monitor all KPIs, which further complicates the benchmarking analysis. Additionally, a key limitation of this study is its exclusive focus on RIMs within the EU, where regulatory frameworks, operational standards, and data availability exhibit a relatively high degree of uniformity. Nevertheless, the proposed model and methodology can be adapted for application to RIMs outside Europe, provided that necessary modifications are made to accommodate distinct regulatory landscapes, operational complexities, and data accessibility constraints. As previously discussed, data unavailability poses a significant challenge even within the EU, and this issue becomes more pronounced in non-EU contexts, due to inconsistencies in data collection methodologies and KPI reporting standards. While the PRIME platform facilitates a certain level of data standardization among European RIMs, the non-mandatory nature of data submission results in gaps that impede direct comparability across different railway infrastructure networks.
Future research could refine benchmarking models by incorporating additional KPIs, such as infrastructure resilience metrics (e.g., track maintenance backlog, climate-related service disruptions) or digitalization indicators (e.g., level of automated train control, real-time monitoring adoption). Moreover, examining external variables such as regulatory policy changes, economic shifts, and emerging technologies (e.g., AI-driven asset management) would enhance the adaptability and robustness of efficiency assessments in railway infrastructure management. The results of this study confirm the usefulness of the proposed hybrid model in the railway infrastructure sector: it gives managers objective, quantitative insights for enhancing their performance, thereby contributing to sustainable development and more efficient resource management.
6. Conclusions
The performance of RIMs is a key factor in ensuring the efficiency, reliability, and sustainability of the railway sector. As rail networks face increasing demands for capacity expansion and service optimization, efficient infrastructure management is essential to achieving long-term competitiveness. Given the complexity of infrastructure operations and the necessity for data-driven decision-making, benchmarking analysis provides a methodological framework for comparative performance assessment: it enables the identification of best practices, the determination of reference values, and the recognition of areas that require improvement. Through systematic performance comparison among different RIMs, it yields objective insights that support informed decision-making and the optimization of operational processes. This study has developed an integrated framework for evaluating the efficiency of railway infrastructure managers, combining PCA, the G-BWM, and AR-DEA. The proposed approach enables the identification of key performance factors, the determination of the relative importance of selected indicators, and an objective assessment of the efficiency of RIMs. The research results highlight that infrastructure and safety factors are crucial for the efficiency of railway infrastructure management. From an initial set of 35 KPIs, the PCA method identified six key inputs (C1, C3, C5, D5, G5, G6) and four outputs (S2, S3, S5, P4), covering the most relevant aspects of infrastructure, safety, and operational performance.
The application of the G-BWM identified the degree of electrification (C1) as the most important input, while safety indicators (S3) and the share of electric trains (S5) were recognized as the most significant outputs. The AR-DEA enabled the assessment of the relative efficiency of infrastructure managers, with RIM8 identified as the only fully efficient unit, while other RIMs demonstrated varying levels of efficiency. RIM2 and RIM4 recorded the lowest efficiency scores, indicating substantial room for improvement. The sensitivity analysis confirmed the robustness of the model, showing that less efficient RIMs were more sensitive to changes in input and output weights, whereas more efficient RIMs demonstrated greater resilience to variations in indicator weights.
The key contributions of the study are the following:
Development of a Hybrid Benchmarking Framework: A novel methodological framework integrating PCA, the G-BWM, and AR-DEA for systematically assessing the performance of railway infrastructure managers (RIMs).
Reliable Dimensionality Reduction with PCA: PCA effectively reduces data dimensionality, identifying the most relevant performance indicators while eliminating redundancy and multicollinearity.
Objective Weight Determination Using the G-BWM: The Grey Best–Worst Method (G-BWM) ensures a structured and objective approach to weighting KPIs, even in conditions of uncertainty and incomplete information.
Enhanced Efficiency Measurement with AR-DEA: The Assurance Region Data Envelopment Analysis (AR-DEA) method improves efficiency evaluation by incorporating realistic operational constraints and decision-maker preferences.
Practical Guidelines for Performance Improvement: The study identifies key efficiency gaps and provides actionable recommendations for infrastructure managers to optimize resource allocation, improve maintenance strategies, and enhance overall network performance.
Support for Strategic Decision-Making: The model offers a data-driven tool for policymakers and infrastructure managers to develop investment strategies, optimize operational performance, and ensure long-term railway sector sustainability.
The practical implications of this study provide actionable insights that RIMs can implement to enhance efficiency and optimize operations. For instance, RIMs with lower efficiency scores, such as RIM2 and RIM4, can prioritize targeted investments in infrastructure upgrades, particularly by increasing network electrification and enhancing accident prevention measures, both of which were identified as key efficiency drivers. Additionally, by leveraging the benchmarking results, underperforming RIMs can adopt best practices from high-performing counterparts, such as RIM8, which demonstrated optimal resource utilization and safety management. For maintenance optimization, infrastructure managers can use the findings to develop predictive maintenance programmes, reducing downtime and infrastructure-related delays. Moreover, regulatory bodies can utilize this model to allocate funding based on efficiency gaps, directing resources toward RIMs that require urgent improvements in operational reliability and safety. By integrating these insights into long-term strategic planning, RIMs can systematically improve network performance, enhance service quality, and contribute to a more sustainable and competitive railway sector.
Future research should explore the impact of various external factors that directly influence efficiency and performance, including economic, regulatory, and technological changes. For example, evolving EU regulations on railway liberalization, infrastructure funding, and safety standards can significantly alter how resources are allocated and performance is measured. Technological innovations—such as IoT-based monitoring, AI-driven asset management, and predictive maintenance—are also transforming operations by enabling more accurate and proactive decision-making. Moreover, the way in which infrastructure managers are organized—whether under state control, private management, or hybrid models—can impact investment priorities and operational effectiveness. Considering these external drivers will help to ensure that future benchmarking models remain robust and practical for improving performance in an ever-changing industry. Additionally, the model could be expanded by incorporating additional criteria and adapting it to specific national and regional conditions, including the consideration of RIMs outside of the EU, where differences in regulatory frameworks and data availability may require further methodological adjustments. The proposed framework demonstrates high potential for practical application, providing reliable and objective insights that can serve as a foundation for decision-making aimed at improving the efficiency of the railway sector and achieving sustainable transport infrastructure development.