Principal Component Analysis of the Potential for Increased Rail Competitiveness in East-Central Europe

Abstract: Increased rail competitiveness has been the objective of many countries around the world, including member states of the EU. Although railway market liberalization has always been accompanied by high expectations of increased efficiency and competitiveness, the overall impact of such decisions can be considered controversial. This paper aims to contribute to the scientific debate by conducting a factor analysis of some East-Central European countries from the aspect of rail freight competitiveness. Since many highly correlated factors influence competitiveness, its mathematical-statistical representation and analysis is difficult due to the high number of dimensions of the factor space. Moreover, competitiveness cannot be measured directly but only as a latent variable, which is a feature of Principal Component Analysis (PCA). The introduced PCA model, by reducing the number of dimensions, can highlight the relations among the attributes, determine the most crucial issues capable of increasing rail competitiveness in the given countries, and cluster those national railway markets. Recommendations for structural changes in national rail freight markets of the region are also supplied. Our results show that international rail competitiveness depends on market efficiency rather than on market liberalization, since the Global Competitiveness Index and Export/Import attributes did not significantly correlate with market concentration. As for the larger domestic rail freight sectors, small freight forwarders, spawned by liberalization, are shown to play a significant role in increasing competitiveness.


Introduction
Economic theory states that market liberalization generally increases competition and hence motivates companies toward more efficient and more productive operations, which improves competitiveness in the market [1]. Regarding the European rail freight markets, the experience so far has not completely validated this logic [2]. On the one hand, several scientific studies confirm that the rail companies with the highest degree of independence from external influences are the most efficient and competitive players in the market [3]. Moreover, the expansion of the markets is often followed by increased technical innovation [4], which also contributes to efficiency. (In this paper, we use the term "competitiveness" in a broader sense than market efficiency. In our view, the notion of competitiveness covers both the efficiency/productivity of the operations and the demand for rail transport services. We also note that we only consider rail freight markets in our database and results. Although passenger rail markets undoubtedly have common characteristics with the rail freight sector, such as the infrastructure used and some macroeconomic attributes, there are a number of [...]
In the following sections, we first give an overview of the existing literature on the selected competitiveness attributes. Afterwards, the applied PCA methodology is introduced, followed by the results for the five analyzed countries. Finally, conclusions are drawn, together with limitations and recommendations for further research.

Theory/Literature Overview on Competitiveness Factors of Rail Freight Markets
In order to apply a complex approach integrating technical, cost- and user-preference elements, we consider competitiveness as a combination of efficiency attributes and the demand for railway transport services. Demand is considered here as a motivation for rail freight service and not as a consequence [13]. In the transportation literature, technical and cost issues are generally examined by statistical techniques [14], while multi-criteria decision-making techniques are applied for preference analyses [15]. In our study, influencing factors from both groups were selected, and direct competitiveness indices (Global Competitiveness Index (GCI), GCI6P, GCI2P) were also integrated to elaborate the dependencies of these influencers. Since it has been argued that single measures of competitiveness are not sufficient, several authors propose composite indices [16,17]. One of our objectives was to shed light on the connections between single factors and these composite indices, especially the GCI.
Undoubtedly, one of the most important determining factors of rail freight and passenger transport competitiveness is the quality of the infrastructure. This assertion has been proven for passenger transport [18] and for freight [19][20][21]. In their exhaustive survey, Purwanto et al. [17] concluded that increased competitiveness is among the wider benefits of transport infrastructure investments. The second pillar of the GCI is also related to the quality of rail infrastructure of the national railway.
Rail freight market competitiveness and track access fees of the country or region also have interrelations [22], hence we applied this factor among the influencers in our model.
Since one of the main objectives of this paper is to analyze the correlation between market liberalization and competitiveness, we have selected factors that reflect the assumed higher intensity of competition. Market concentration is generally measured by the Herfindahl-Hirschman index (HHI) [23] and, for rail freight markets, applicable references exist, e.g., Crozet, 2016 [24]. In his paper, Crozet leaves open the question of whether high market concentration in the rail freight industry should be regulated or whether this situation leads to high(er) competitiveness. He states, however, that the dominance of one or a few major operators might be abusive and is thus likely to decrease competitiveness. Based on this, we applied market concentration attributes such as the calculated HHI, the market share of the biggest player in the total freight volume, the market share of the smallest market player and the number of market players.
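To make the four market concentration attributes concrete, the following minimal Python sketch computes them from a list of market shares. The function name `market_concentration` and the example shares are hypothetical and are not taken from the dataset of this paper; the HHI is expressed here on a 0 to 1 scale, which is an assumption about the scaling.

```python
def market_concentration(shares):
    """Return HHI, biggest share, smallest share and number of players.

    `shares` are market shares of the total freight volume (any positive scale).
    """
    total = sum(shares)
    normalized = [s / total for s in shares]   # rescale so the shares sum to 1
    hhi = sum(s ** 2 for s in normalized)      # Herfindahl-Hirschman index, 0-1 scale
    return {
        "hhi": hhi,
        "biggest": max(normalized),
        "smallest": min(normalized),
        "players": len(normalized),
    }


# Hypothetical market: one incumbent and three small operators.
example = market_concentration([0.70, 0.15, 0.10, 0.05])
# A monopoly gives the maximal HHI of 1.
monopoly = market_concentration([1.0])
print(example)
```

In this illustration the dominant operator alone contributes 0.49 to the index, which shows why the HHI and the share of the biggest player tend to move together.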
Determining connections between regulative actions and competitiveness was also among the objectives of our survey. There have been several notable attempts in the scientific literature concerning this issue including the case of the European railways [25,26]. In our study of East-Central European countries, we decided to apply a direct regulatory factor, namely, the annual number of railway-related legislative actions in these countries.
Considering the demand side, many authors (e.g., Jarzemskis and Jarzemskiene, 2017 [27]) have stated that higher competitiveness in rail markets does not necessarily originate from a higher intensity of competition in the market but from the existing demand of the destination sites. In Lithuania, for example, there is only one rail freight operator, so the market is monopolized, and yet it has been proven to be more competitive internationally than Poland, where competition is more intense. Thus, we selected factors such as import, export and transit freight volumes, domestic freight volume and total rail volume in million tons, all measured at the national level in the five examined ECE countries.
Finally, we selected a direct competitiveness index, GCI, which is also officially measured/calculated by European countries. GCI, constructed and applied by the World Economic Forum, is accepted worldwide for indicating market and country competitiveness, including several transport- and logistics-related composites [28]. In our study, we applied three GCI-based attributes: GCI as an all-encompassing attribute related to the competitiveness of each examined country, the second pillar of GCI, which relates specifically to rail infrastructure quality, and the sixth pillar, which relates to market efficiency (The Global Competitiveness Report 2011-2012).
Based on the factors detailed in the literature review, and concentrating solely on rail freight attributes, the following influencers were applied in the model (Table 2). For better readability, we provide the 8-year average (2008-2015) values of each country for the 15 attributes, while the entire raw data table is presented in Appendix A of the paper.

Materials and Methods
In our research, countries are entities characterized by 15 attributes represented on different scales. Calculating the Euclidean distance between these entities in this high-dimensional space is not applicable. Therefore, we have to use a method that reduces the number of dimensions of the space while preserving as much information as possible.
Principal Component Analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert correlated variables into a set of linearly uncorrelated features which are called principal components [29].
The advantage of using PCA is twofold. On the one hand, a large attribute space can be compressed into a smaller space; on the other hand, it can be used in the visualization of the data with simple plots.
In our case, Table 3 makes it clear that many of the attributes are highly correlated. This means that PCA can be used to reduce the dimensionality of the space, that is, the number of attributes. As a result, we will obtain a lower-dimensional space defined by so-called PCA components, features or background variables.
The basic idea behind PCA is to compress the high-dimensional space into a lower-dimensional space such that it preserves as much variance as possible. After space compression into a convenient two-dimensional space, we can run a clustering algorithm in order to define clusters which group similar entities. This can help us analyze the connection between the different influencers (random variables) of rail freight market competitiveness and the entities. We expect that the representation in the two-dimensional space, defined by the obtained "background variables", will help us better understand market competitiveness.
In the following, we will present the main steps of the method, and we relate the steps to our practical problem. Let us suppose we have n entities in an $\mathbb{R}^p$ space. Each entity is characterized by p variables ($X_1, \ldots, X_p$). Based on this, an X matrix of dimensions $n \times p$ can be constructed, in which $x_{ij}$ denotes the value of the j-th variable for the i-th entity; the rows correspond to entities, and the columns correspond to variables.
The application of PCA has two main points of interest. The first one is related to the case of the entities when we use the rows of the X matrix. The second one is related to the case of the variables (attributes) in which we deal with the columns of the X matrix.
The fact that the variables are measured on different scales poses a common problem. To overcome this drawback, we standardize the values as follows. We calculate the arithmetic mean and the standard deviation of each variable; the standardized values are then computed by Formula (1):
$$z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j} \tag{1}$$
where $\bar{x}_j$ stands for the mean value and $s_j$ for the standard deviation of variable $j$.
Note that the final objective of PCA is substituting $X_1, \ldots, X_p$ by the principal components $C_1, \ldots, C_p$ and then reducing the dimension $p$ by selecting the principal components with the highest explanatory power (of variance) and omitting the rest.
It is easy to see that the new variables $Z_j$ associated with the columns will have 0 mean and all the standard deviations equal to 1.
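The standardization step can be sketched in a few lines of Python with NumPy. This is only an illustration on synthetic data (the three columns and their scales are invented), and it assumes that the standard deviation is computed with divisor n, which the text does not specify.

```python
import numpy as np


def standardize(X):
    """Column-wise z-scores: subtract the mean, divide by the standard deviation."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)  # ddof=0, i.e. divisor n (an assumption)
    return (X - mean) / std


# Synthetic stand-in for an n x p data matrix with columns on very different scales.
rng = np.random.default_rng(0)
X = rng.normal(loc=[5.0, 200.0, 0.3], scale=[1.0, 50.0, 0.05], size=(40, 3))
Z = standardize(X)

print(Z.mean(axis=0))  # close to 0 for every column
print(Z.std(axis=0))   # close to 1 for every column
```

After this step, no attribute dominates the analysis merely because it is expressed in larger units.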

Phase 1. PCA Applied to the Entities
Two entities are similar when they are close to each other in the p-dimensional space. In this subsection, we want to find the lower-dimensional space in which the entities (points associated with the row vectors of X) are projected onto the space defined by the principal components such that as much variation as possible is preserved. First, we define the first axis, for which the sum of squares of the projections is maximal, and denote it by $u_1$. It can then be proved that the two-dimensional plane with this property is determined by $u_1$ and $u_2$, where $u_2$ is orthogonal to $u_1$. The projections of the entities onto this plane are called scores. If two scores are close, then the entities have similar properties. Now we consider the projection of each entity $i$ onto component $k$ and denote it by $F_{ik}$. The vector with coordinates $F_{ik}$, $i = 1, \ldots, n$, is denoted by $F_{\cdot k}$, $k = 1, \ldots, p$. We obtain, for each component, an n-dimensional vector, since we projected n points.

Phase 1. Variables Represented in the Space Defined by the Principal Components
In this subsection, we express the correlation between $x_{\cdot k}$, $k = 1, \ldots, p$, which are the column vectors of the X matrix corresponding to the variables (attributes), and $F_{\cdot k}$, $k = 1, \ldots, p$. If we project only onto the two-dimensional space defined by the first two principal components, then we calculate the correlations between $x_{\cdot k}$ and $F_{\cdot 1}$ and between $x_{\cdot k}$ and $F_{\cdot 2}$, for $k = 1, \ldots, p$. In this way, variable $x_{\cdot k}$ has the coordinates $r(F_{\cdot 1}, x_{\cdot k})$ and $r(F_{\cdot 2}, x_{\cdot k})$. This means that the variables are projected inside a circle of radius 1, known as the correlation circle.
Since one of the main objectives of PCA is to determine which variables are correlated with each other, we need the correlation matrix of the variables. The covariance of the standardized j-th variable and the standardized l-th variable is:
$$\operatorname{cov}(z_j, z_l) = \frac{1}{n} \sum_{i=1}^{n} z_{ij} z_{il} \tag{2}$$
The strength of the linear relation between two variables (attributes) of a particular phenomenon is expressed by the correlation coefficient:
$$r_{jl} = \frac{1}{n} \sum_{i=1}^{n} \frac{(x_{ij} - \bar{x}_j)(x_{il} - \bar{x}_l)}{s_j s_l} \tag{3}$$
where $0 \le |r_{jl}| \le 1$. The closer the absolute value of $r_{jl}$ lies to one, the stronger the linear relationship between the two variables is. This is the basis for formulating the model through linear functions. From a geometrical point of view, the correlation between the centered variables represents the cosine of the angle $\theta_{jl}$ between the vectors associated with the two variables.
The problem is that these variables are in a high-dimensional space. So, it is impossible to interpret the angles and therefore the closeness between the variables. The idea is to project these variables onto a two-dimensional space. Based on their projection and with the help of the angles we obtain between the projections, we can obtain information on which of the variables are "close" to each other.
We emphasize here that only those variables whose associated vector has a length close to 1 are deemed well represented. Variables with shorter vectors are not well represented, and the angles that appear for them are not meaningful. We used this criterion in our data analysis.
The low-dimensional space can also be created in a way that it retains as much variance as possible.
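Let us consider the correlation matrix:
$$R = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1p} \\ r_{21} & 1 & \cdots & r_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ r_{p1} & r_{p2} & \cdots & 1 \end{pmatrix}$$
Accordingly, by Formulas (2) and (3), the correlation matrix can also be written as follows:
$$R = \frac{1}{n} Z^{\top} Z$$
where $Z$ is the $n \times p$ matrix of the standardized values. The R matrix for the standardized variables will be used to determine the principal components.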
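The link between standardized variables and the correlation matrix can be checked numerically. The sketch below uses synthetic data (not the data of this study) and assumes standardization with divisor n; under that assumption, the cross-product of the standardized matrix divided by n reproduces the Pearson correlation matrix returned by `numpy.corrcoef`.

```python
import numpy as np

# Synthetic data: 40 entities, 4 variables, two of them deliberately correlated.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 4))
X[:, 1] += 0.8 * X[:, 0]

# Standardize with divisor n (an assumption), then form the cross-product matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
R = Z.T @ Z / Z.shape[0]

# Reference: Pearson correlations computed directly from the raw columns.
R_reference = np.corrcoef(X, rowvar=False)
print(np.round(R, 2))
```

The diagonal of R equals 1, and every off-diagonal entry is the cosine of the angle between the two corresponding centered variable vectors.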

Phase 2. Determination of the Principal Components
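The idea is to find the direction $v_1$ such that the variance of the projections of the entities onto it is maximal:
$$v_1 = \arg\max_{\|v\| = 1} v^{\top} R v$$
and then to find the direction $v_2$, orthogonal to $v_1$, with the same property, and so on. These directions are orthogonal to each other and have the property of maximizing the variance of the projections.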
The principal components can be found by singular value decomposition or by diagonalizing the correlation matrix to extract the eigenvectors and the eigenvalues. We denote by $u_s$ the eigenvectors corresponding to the eigenvalues $\lambda_s$. Let us note that $\lambda_s$ represents the variance of the entities projected onto the $u_s$ component. The eigenvalues are arranged in decreasing order.
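Both routes to the principal components mentioned above can be illustrated with NumPy. The following sketch runs on synthetic data and assumes the divisor-n convention for the correlation matrix; the variable names are illustrative only.

```python
import numpy as np

# Synthetic standardized data: 40 entities, 5 partly correlated variables.
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 5))
X[:, 1] += X[:, 0]
X[:, 3] -= 0.5 * X[:, 2]
Z = (X - X.mean(axis=0)) / X.std(axis=0)
n, p = Z.shape
R = Z.T @ Z / n

# Route 1: diagonalize the symmetric correlation matrix.
# eigh returns eigenvalues in ascending order, so reorder them to be decreasing.
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Projections of the entities onto the components (scores).
scores = Z @ eigenvectors

# Route 2: singular value decomposition of Z gives the same variances.
singular_values = np.linalg.svd(Z, compute_uv=False)
eigenvalues_from_svd = singular_values ** 2 / n

print(np.round(eigenvalues, 3))
```

The variance of each column of `scores` equals the corresponding eigenvalue, and the eigenvalues sum to p because each standardized variable contributes one unit of variance.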

Phase 2. The Representation of the Entities and Variables
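The representation of the entities with respect to the attributes is expressed by the scores with respect to the loadings in the following two formulae:
$$F_{is} = \frac{1}{\sqrt{\lambda_s}} \sum_{k=1}^{p} z_{ik} G_{ks}, \quad \text{coordinate } s \text{ for entity } i$$
$$G_{ks} = \frac{1}{\sqrt{\lambda_s}} \, \frac{1}{n} \sum_{i=1}^{n} z_{ik} F_{is}, \quad \text{coordinate } s \text{ for variable } k$$
These formulae imply the very important conclusion that entities lie on the same side as the variables for which they have high values.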
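The relation between scores and loadings can be verified numerically. The sketch below is a hedged illustration on synthetic data: it assumes the common convention in which the loading of variable k on component s is the correlation between that variable and the component, and in which the correlation matrix uses divisor n. The names `F` and `G` follow the notation of this section; everything else is illustrative.

```python
import numpy as np

# Synthetic standardized data: 40 entities, 6 variables.
rng = np.random.default_rng(3)
X = rng.normal(size=(40, 6))
X[:, 1] += X[:, 0]
X[:, 2] += X[:, 0]
Z = (X - X.mean(axis=0)) / X.std(axis=0)
n, p = Z.shape

# Eigen-decomposition of the correlation matrix, eigenvalues in decreasing order.
lam, U = np.linalg.eigh(Z.T @ Z / n)
order = np.argsort(lam)[::-1]
lam, U = lam[order], U[:, order]

F = Z @ U             # scores: coordinates of the entities
G = U * np.sqrt(lam)  # loadings: coordinates of the variables

# Each set of coordinates can be recovered from the other one.
F_from_G = Z @ G / np.sqrt(lam)
G_from_F = Z.T @ F / (n * np.sqrt(lam))

# A loading equals the correlation between a variable and a component.
corr_var0_pc1 = np.corrcoef(Z[:, 0], F[:, 0])[0, 1]
print(round(float(G[0, 0]), 3), round(float(corr_var0_pc1), 3))
```

Because an entity's score is a weighted sum of its standardized values with the loadings as weights, an entity with high values on variables that load positively on a component ends up on the positive side of that component.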
Some important concluding assertions:
• Entities that have close projections have similar attributes; entities that are far from each other differ in their attributes.
• Attributes (variables) that are close to each other correlate in a similar way with the principal components, i.e., their angle with the principal components is similar (see Figure 1).
• The attributes are projected onto the subspace of the two principal components such that their length is at most 1. Variables that have lengths close to 1 are well represented and can be interpreted in this subspace (Figure 1).
• Entities that lie on the same side as a group of variables have high values for those variables. Entities that lie in the opposite direction have small values for these variables.
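The length criterion for well-represented variables can be sketched as follows. The helper `representation_quality` and the cut-off of 0.8 are illustrative assumptions, not values used in this study, and the data are synthetic: three columns share a common factor and three are pure noise.

```python
import numpy as np


def representation_quality(Z, n_components=2):
    """Length of each variable's loading vector in the leading component plane."""
    n = Z.shape[0]
    lam, U = np.linalg.eigh(Z.T @ Z / n)
    keep = np.argsort(lam)[::-1][:n_components]
    G = U[:, keep] * np.sqrt(lam[keep])  # loadings on the retained components
    return np.linalg.norm(G, axis=1)


# Synthetic data: columns 0-2 follow one common factor, columns 3-5 are noise.
rng = np.random.default_rng(5)
common = rng.normal(size=40)
X = np.column_stack(
    [common + 0.1 * rng.normal(size=40) for _ in range(3)]
    + [rng.normal(size=40) for _ in range(3)]
)
Z = (X - X.mean(axis=0)) / X.std(axis=0)

lengths = representation_quality(Z)
well_represented = lengths >= 0.8  # illustrative threshold (an assumption)
print(np.round(lengths, 2))
```

Variables with short vectors lie mostly outside the retained plane, so the angles between their projections should not be read as correlations.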
Sustainability 2019, 11, x FOR PEER REVIEW 9 of 19
Based on these assertions, we can analyze our rail freight-related dataset and formulate some findings.

Results
Following the general procedure of the PCA, first, the standardized raw data matrix must be constructed. In our case, the 15 influencing factors, as Table 2 demonstrates, are the variables. So, the value of p is 15. The entities are the countries, whose freight market data we gathered from 2008 to 2015 (we were able to obtain a reliable complete dataset only for this period to include all countries for all 15 factors). So, the number of entities is five times eight (the number of years). Thus n = 40 in our model. All five countries had liberalized their rail freight market by the initial date of the examined period.
Afterwards, we constructed the R correlation matrix of the 15 variables, presented in Table 3. Elements under the main diagonal represent the density plots of the pair of random variables. Elements on the diagonal represent the histograms of the univariate variables.
For the scope of this study, the elements above the main diagonal are the most essential; they are the pairwise correlations of the variables. Table 3 justifies the selection of PCA, since it demonstrates not only the high degree of correlation of the analyzed attributes, but also the complexity of these interrelations. This complexity can be interpreted more appropriately by reducing the dimensions with PCA. The data represented in this table are official data of the five Central-European countries over eight years. Apart from the rather trivial ones, strong correlations (over 0.8) could be detected between some pairs of attributes. Interestingly, the percentage share of the smallest market player in the total rail freight volume of the country (SMP) is strongly correlated with domestic traffic (DT, intensity: 0.9), import (IM, intensity: 0.85) and total rail traffic (TRT, intensity: 0.93). Moreover, the number of market players (NMP) is strongly correlated with domestic traffic (DT, intensity: 0.8) and total rail traffic (TRT, intensity: 0.8). Furthermore, a strong correlation could also be detected between the percentage share of the biggest market player (BMP) and transit traffic (TT, intensity: 0.81). This could suggest that the composition of the rail freight market in terms of the size of its competitors has an impact on domestic, import and transit traffic, or at least an interrelation with them, since we do not know the direction of the effect (Figure 2). Table 3 also shows that the directly measured market efficiency (GCI 6th pillar, GCI6P) is correlated with international and export freight, as could be expected, but is also interrelated with market concentration (MC), with the third strongest intensity, 0.48. The direct measure of rail infrastructure quality (GCI 2nd pillar, GCI2P) is tied to total traffic, which is quite obvious.
However, there is also significant correlation between GCI2P and market concentration (0.7) and the biggest market player (0.76). Based on the above, PCA is worth performing in order to reveal more complex interrelations of the variables and their connections to the entities.
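Screening a correlation matrix for strongly related attribute pairs, as done above with the 0.8 threshold, can be sketched as follows. The helper `strong_pairs` is hypothetical, and the three columns are synthetic stand-ins that merely borrow the attribute abbreviations used in this paper; they are not the actual data.

```python
import numpy as np


def strong_pairs(X, names, threshold=0.8):
    """List attribute pairs whose absolute Pearson correlation reaches the threshold."""
    R = np.corrcoef(X, rowvar=False)
    pairs = []
    for j in range(len(names)):
        for l in range(j + 1, len(names)):  # upper triangle only
            if abs(R[j, l]) >= threshold:
                pairs.append((names[j], names[l], round(float(R[j, l]), 2)))
    return pairs


# Synthetic columns: the first two share a common signal, the third is independent.
rng = np.random.default_rng(4)
base = rng.normal(size=40)
X = np.column_stack([
    base,
    base + 0.2 * rng.normal(size=40),
    rng.normal(size=40),
])
names = ["TRT", "DT", "TAF"]
pairs = strong_pairs(X, names)
print(pairs)
```

Such a screening only flags the strength of a linear association; as noted above, it says nothing about the direction of the effect.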
Applying Formulas (4) and (5) described in the Materials and Methods section, the principal components can be determined. In our case, the 15 influencers are substituted by 15 principal components. After reducing the dimensions, only those principal components that explain the bulk of the total variance remain in the model (they can be identified from the eigenvalues, because it can be proven that the standard deviation of each principal component equals the square root of the corresponding eigenvalue). Having calculated first the unstandardized and then the standardized principal components, the following was deduced. Figure 3 shows that the first two principal components explain 67% of the total variance. Thus, selecting these two can be sufficient for the analysis. Therefore, the space generated by PC1 and PC2 (which is a rotated and projected space compared to the original 15-dimensional space of the raw variables) is applied to describe the interrelations of the variables and entities. The first three principal components would have explained 79.5% of the total variance, but the contribution of the third one would only have been 10.8%, and the interpretation of the results would have been much more difficult in a three-dimensional space, so we decided to consider the first two principal components. Figure 1 exhibits the situation of the 15 attributes in this two-dimensional space.
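The share of variance explained by each component follows directly from the eigenvalues. The sketch below is an illustration on synthetic data with hypothetical helper names (`explained_variance`, `components_needed`); it does not reproduce the percentages reported for the rail freight dataset.

```python
import numpy as np


def explained_variance(Z):
    """Share of total variance per component and the cumulative share."""
    n = Z.shape[0]
    lam = np.sort(np.linalg.eigvalsh(Z.T @ Z / n))[::-1]  # decreasing eigenvalues
    ratio = lam / lam.sum()
    return ratio, np.cumsum(ratio)


def components_needed(cumulative, target):
    """Smallest number of components whose cumulative share reaches the target."""
    k = int(np.searchsorted(cumulative, target)) + 1
    return min(k, len(cumulative))


# Synthetic standardized data driven by two latent factors plus noise.
rng = np.random.default_rng(6)
factors = rng.normal(size=(40, 2))
weights = rng.normal(size=(2, 8))
X = factors @ weights + 0.3 * rng.normal(size=(40, 8))
Z = (X - X.mean(axis=0)) / X.std(axis=0)

ratio, cumulative = explained_variance(Z)
print(np.round(ratio, 3))
print(components_needed(cumulative, 0.67))
```

The trade-off described above is visible in such output: each further component adds a smaller share of variance while making the representation harder to interpret visually.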
The PCA technique constructed two axes, PC1 and PC2, and the proximity of the attributes demonstrates the weight of each variable in the linear combination for the construction of PC1 and PC2. Consequently, in the case of PC2, the attributes 'Biggest market player' and 'Market concentration' play the most significant roles (these have the highest PC2 coordinates). Based on this, the "shadow attribute" is most likely the degree of liberalization (not legislative but the real situation). For PC1, the most significant attributes are 'Total rail traffic', 'Smallest market player', 'Import' and 'Domestic traffic'.
Evidently, PC1 "shadow attribute" is the quantity of rail transport, measured in tons. It is interesting that the 'Smallest market player' has significant weight in constructing the PC1 axis.
We note that some conclusions can be drawn by the directions of the vectors in this space. Opposite direction means negative correlation of an attribute on another or between an attribute and an axis. Also, the position of the entity points means positive or negative correlation with the others in the constructed PCA space. In our case, the Romanian points (marked with light blue color) are remote from the other ones in the negative area of the coordinate space, which means that their data correlate negatively with the other countries' data.
Another remarkable issue in PCA is that in the created space, the position of the attributes shows the degree of their correlation. If two variables are close to each other, that means they are strongly interrelated. Based on this, the strong interrelation of transit traffic and the quality of railway infrastructure is visible and also the connection between the smallest market player and the total rail traffic. The proximity of 'International traffic' and 'Export' indicates that the majority of railway transport in Eastern-Central European countries is motivated by export transportation.
In Figure 1, we consider only the variables for which the associated vectors have lengths close to one (in the current scaling, 0.1). Only those vectors are well represented, so 'Rail share' and 'Track access fee' can be omitted from the analysis, since their projection is not significant in the vector space constructed by the PC1 and PC2 axes. One can observe that in Figure 3, three main attribute groups can be determined.
The first group consists of the biggest market player (BMP), transit traffic (TT), quality index of infrastructure (GCI2P) and market concentration (MC); track access fee (TAF) can also be connected here. This group best demonstrates the intensity of freight competition. If the market is highly concentrated and the biggest company is very dominant in the C-E region, then high infrastructure quality and low fees could be expected, along with big volumes of transit traffic.
Market efficiency (GCI6P), international traffic (INT), export (EX), import (IM), rail share (RS) and the global competitiveness index (GCI) constitute the second group of variables. This group can be called international competitiveness indicators, since GCI is a direct measure of it and the others all reflect the global capability of the national rail system. Based on the conducted PCA, however, market liberalization and thus market concentration have no serious impact on international competitiveness, because there is no significant correlation between this second group and the 'Market Concentration' and 'Number of Market players' attributes. Increasing market efficiency instead of further liberalization might, however, have a positive impact on the international competitiveness of the countries, which supports the findings of Bougna and Crozet (2016) cited in the Introduction.
The third group consists of domestic traffic (DT), smallest market player (SMP), number of market players (NMP) and total rail traffic (TRT). Hence, we can conclude that for national rail freight, the existence of small freight forwarders is essential and has a positive impact on the total rail freight performance. In conclusion, the implementation of market liberalization should concentrate on motivating the small forwarders in order to increase domestic traffic, although this does not contribute to international competitiveness.
Figure 1 also includes the positions of the examined five Eastern-Central European countries in the two-dimensional principal component space. The results strengthen the findings of Feuerstein et al. [14], who stated that different competitiveness influencers have different impact and importance in EU countries. Based on the attribute effects, the clustering of the examined national rail freight markets can be completed; for better visualization, we constructed Figure 2.
National characteristics can be easily detected in this figure. Since the rail freight market of Poland is very much related to the other examined countries, in its case domestic factors are dominant. For Slovakia and Hungary, transit traffic is very important, so they are situated near the transit factors. The Czech Republic is also situated next to transit-like factors but also close to international competitiveness variables. Romania lies in the opposite direction to the variables, meaning that it is characterized by low values for these variables. This can be explained by the relative closedness of its rail freight market (lower proportion of international freight volumes) and by features that differ from those of the other Eastern-Central European countries.
Regarding the recommended national transport policy implications, Figure 1 is also rather telling. For Slovakia and Hungary, measures related to infrastructure developments and market concentration (focusing on the biggest participants of the competition) can be the most effective in terms of raising competitiveness. As for the Czech Republic, market efficiency issues have to be prioritized. In Poland, the situation of small freight forwarders is the most crucial issue and many more small rail freight companies would be necessary. For Romania, further expansion of the railway to international traffic can be recommended.
The analysis revealed the interrelation of market liberalization and international competitiveness in a way that the roles of cause and effect are not trivial. Based on the results of the correlations of influencers and positions of countries, we can state that market liberalization issues do not necessarily cause increased international competitiveness in the rail market. Merely opening the market and motivating new players to enter in the market does not automatically cause higher competitiveness; see the different/opposite cases of Poland and Hungary. Because of the complex relations between railway attributes and the relative remote positions of liberalization and competitiveness influencers demonstrated by PCA, and also considering the significant distance between the examined countries, we emphasize the need for country-specific analyses before taking market liberalization measures.

Conclusions
Improving railway markets is essential from both economic and sustainability perspectives, which has been recognized by many states all over the world. However, the best ways of raising rail competitiveness are not trivial and need thorough analysis because a general solution may not exist for the different cases of national markets.
This paper aimed to contribute to the ongoing debate about the relation of rail freight market liberalization and competitiveness by applying the complex multivariate method of PCA. The contradiction between the high expectations of railway market liberalization and the stagnation of the Central-Eastern European rail freight sector has been palpable in recent years. The findings of our analysis can partly explain this discrepancy.
Our results support the findings of Bougna and Crozet in terms of international competitiveness; dealing with market efficiency questions appears to be more important than the drive for liberalization. However, for domestic rail freight markets, especially in countries with big volumes of rail freight and a large national market, focusing on small forwarder companies can be even more advisable.
Rail infrastructure developments contribute most directly to transit traffic and, in these cases, the biggest rail forwarders play the most significant role.
National characteristics of railway markets could be also detected, which supports the robustness of our study results.
Furthermore, the statements of Zunder et al. (2013) [30], that liberalization alone cannot explain the different rail freight performance of the countries in the EU, have also been verified. They found that organizational and managerial bottlenecks strongly determine this performance and, thus, market competitiveness. Our results support this idea as market concentration and market efficiency issues have not been positioned closely in our PCA vector space. Thus, liberalization measures cannot be the only means to improve rail freight performance.
Involving more data from other countries (from regions beyond the territory of the European Union, or from old EU members over a different period owing to their earlier liberalization) might be a promising subject of further research. Selecting and integrating other attributes might also improve on our findings in the future.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.