Artificial Intelligence for Cluster Analysis: Case Study of Transport Companies in Czech Republic

Abstract: What is the situation of the transport sector in the Czech Republic and what is its importance for the economy of the Czech Republic? How and to what extent do businesses operating in this sector influence the sector as such, and how many businesses in this sector have such influence? Additionally, what happens if the most important businesses in the transport sector go bankrupt, and which businesses are the most important ones? Searching for the answers to these questions is the subject of this contribution, which focuses primarily on cluster analysis using artificial neural networks (ANN), specifically Kohonen networks, which represent the main method for processing a large volume of data on transport companies, not only accounting data. In this research, the dataset consists of the financial statements of transport companies for the years 2015–2018. The research part of the contribution deals mainly with the development of the transport sector in the years 2015–2018 and with the companies operating in this sector, and tries to identify the most important companies in terms of their importance for this sector. The results show that the whole transport sector is influenced mainly by the two largest companies, whose potential changes can affect not only the companies themselves but, to a great extent, also the development of the whole transport sector. For the two companies, financial analysis is carried out using ratios, whose results show that despite the negative values of the important value generators of one of these companies, the company is still able to significantly influence the situation in the transport sector of the CR. This information is a clear guide for experts and development analysts to determine the further development of the whole sector when focusing on the development of the two specific companies only. A question arises as to how the created model can be applied to other economic sectors, especially in other EU countries.


Introduction
In today's world of fast-developing technologies and innovations, the need for thinking ahead and being one step ahead is becoming stronger. This can be facilitated by big datasets with collected data (sometimes also called big data), which can be analyzed and used to predict further development or to assess the current state. Data processing and evaluation can be carried out using a plethora of methods, whose application may help to find results to provide an outline for a future situation or to adopt measures that may enable users to prevent unwanted events that could affect the development either of the whole sector or one business only. Transport companies are not an exception; their data enable the analysis and assessment of the whole transport sector of the CR. However, data are currently very valued, and their collection is a long-term and complex process. Data can be analyzed using two methods: traditional statistical methods or advanced data processing methods, such as artificial intelligence or artificial neural networks. Compared to other statistical methods, a big advantage of neural networks is their ability to learn. (Cho et al. 2009) believe that artificial intelligence methods enable achieving much better results than traditional statistical methods. Very accurate and complex results are provided also by hybrid models based on the combination of statistical and artificial intelligence methods. However, (Li et al. 2015) point to the fact that traditional statistical methods are still popular due to their easy interpretation, comprehensibility, and acceptable predictive performance. According to (Rowland and Vrbka 2016), the disadvantages of artificial neural networks (ANN) are the demand for high-quality data and possible illogical behavior of an artificial neural network. 
(Vochozka and Machová 2017) state that ANN differ from traditional methods especially in the so-called adaptation phase, in which the neural network learns from appropriately selected training patterns that represent a given problem. The ability to learn is the biggest advantage of ANN. The volume of the data and the possibilities that the application of ANN brings make them a suitable tool for analyzing the data of transport companies. The most suitable tool for cluster analysis is the so-called Kohonen network, sometimes also referred to as a self-organizing map. These networks are used for analyzing the data of transport companies in the CR, which form the basic dataset for the network and on the basis of which the economic development of the transport sector in the CR can be analyzed and interpreted. Transport companies and the transport sector are a very important part of the economy, not only in the Czech Republic. The analysis of these companies provides real results, which can be used by the transport sector to improve its future development and which enable companies to focus on the aspects that affect them most, pay particular attention to them, and take the necessary measures. The objective of this contribution is to evaluate the development and the situation in the transport sector based on the data resulting from the analysis performed using Kohonen self-organizing maps and to determine how many companies most influence the transport sector and on what basis. The analysis will also show the percentage of companies whose influence is low. The assumed shares of companies with a significant and with a small influence on the transport sector are based on the Pareto principle (80-20 rule), i.e., twenty percent of companies are assumed to have a major influence on the transport sector, while the remaining eighty percent have a small influence.

Literature Research
Like many other infrastructure-intensive economic activities, the transport sector is an important part of the economy, which influences the development and well-being of the population. When efficient, transport systems provide economic and social opportunities and benefits that lead to positive multiplier effects, such as better access to markets, employment rate, and other investments. When inefficient in terms of their capacity or reliability, they might relate to economic costs, such as decreased or lost opportunities (Rodrigue et al. 2013). Transportation of people, information, or goods can be considered a very important component of the national economy of the CR. According to (Svobodová et al. 2013), the extraordinary importance of transport in the CR results from the location of the CR at the crossroads of trans-European routes; in the CR, of course, inland transport also plays a great role, especially in the import and export of raw materials and products and of citizens and in satisfying the need for inland and international passenger transport. As in other sectors, even in the transport sector, it is necessary to analyze and monitor its development. A tool for analyzing the data of transport companies in the CR as well as the entire transport sector can be both traditional statistical and advanced research methods. (Rayala and Kalli 2021) state that currently, there are various methods (including advanced ones) for processing large volumes of data. The authors believe that deep learning could be a strong paradigm for analyzing large datasets; however, it requires many samples for model training, which is costly and time-consuming. This can be avoided by using a fuzzy approach. (Rayala and Kalli 2021) proposed and developed an improvised Fuzzy C-means (IFCM), which includes the model of Convolutional Neural Network (CNN) and Fuzzy C-means (FCM) for improving the clustering mechanism. 
In addition, a comparative analysis was performed for each dataset, which showed that IFCM surpasses the existing model. Cluster analysis (clustering) is a major problem in unsupervised machine learning. It can be applied in many fields such as bioinformatics, gene sequencing, market research, medicine, analysis of social networks, and recommender systems. The main purpose of clustering is to arrange similar data objects from a given dataset into clusters (groups). Clusters usually represent a kind of real-world entity or a meaningful abstraction (Lakhawat and Somani 2017). (Bakoben et al. 2017) used cluster analysis of the behavior of credit card accounts to assess the level of credit risk. The credit card account was parametrically modeled, and subsequently, behavioral cluster analysis was applied using a recently proposed dissimilarity measure of the statistical parameters of the model. (Bakoben et al. 2017) considered only the popular k-medoids clustering method; however, they state that a problem to be solved in the future is the computational complexity of the new metric, which grows with the size of the sample. (Tyukhova and Sizykh 2019) studied the possibility of applying cluster analysis for creating preliminary groups of securities on the basis of which a portfolio is subsequently created. The research results showed that applying cluster analysis of shares as a preparatory phase for creating investment portfolios increases their profitability and efficiency and reduces risks. It is more efficient to apply cluster analysis as a preliminary phase followed by portfolio optimization using the Markowitz model. (Cahyana et al. 2020) used hybrid cluster analysis to classify the customers of PT Pelindo I based on their satisfaction with the services it offers. Cluster analysis was performed for the purpose of grouping the research objects based on their characteristic similarities.
The results showed that 72% of PT Pelindo I customers perceived the services provided by PT Pelindo I as special, while the remaining 28% perceived this service as good. (Feranecova et al. 2016) used cluster analysis to classify the companies in different financial situations operating in automotive into groups with similar characteristics, thus creating clusters of companies in a similar financial situation. Cluster analysis provided a methodology for determining the financial health of companies operating in automotive. In the study by (Vahalík and Staníčková 2016), factor analysis was carried out prior to cluster analysis. Key factors of foreign trade competitiveness were defined by means of factor analysis; subsequently, cluster analysis was used to identify the countries with similar characteristics of competitiveness factors. (Abdelkafi et al. 2018) carried out the classification of 42 developed and developing countries based on the impact of risk crises by means of Kohonen networks (SOM). The results of SOM (self-organizing maps) indicate the optimal map represented by 56 micro-classes (cells or neurons) and five areas of grouped countries, which is defined by five different economic situations. The classification was based on the level of development, development of economic ratios, and period. (Kohonen 1986) describes artificial neural networks as undetermined classification algorithms based on neural networks, consisting of competitive layers, which use Kohonen classification rules for input classification. Neurons in competitive layers learn to distinguish the groups of similar input vectors. This is a self-organizing method of mapping, which has two layers only (input and output) and each layer is made up of neurons. At the same time, the application of the Kohonen network with database clustering enables the projection of multidimensional data into two-dimensional space and the analysis of the resulting cluster system. 
Choosing cluster numbers is based on the calculation of cluster indicators, unlike the traditional statistical approach used by (Liashenko et al. 2018). A clear choice for the analysis and processing of the dataset containing the data of transport companies in the form of financial statements are Kohonen ANN (artificial neural networks), which are able to work with big data.

Dataset
A fundamental step for achieving the objective set is the collection and subsequent modification of data. The analysis used the data obtained from financial statements, specifically the balance sheet and profit and loss account (PLA). The data source for the research sample was Bisnode's Albertina database. The data were obtained from section H-Transportation and Storage according to the CZ NACE classification. All legal persons from section H for the period 2015-2018 were selected.
After the generation of the data, the dataset consisted of 21,882 data rows, where each row contained the following items: 1. Specification of the company: company registration number (IČO), name of the company; 2. Information on the company: date of resources, beginning and end of the period, number of months of the financial statement; 3. Financial statements: balance sheet, profit and loss account (PLA).
The data were modified in MS Office Excel as follows:
a. The value "0" was added to empty cells: the table containing the generated data was selected, and "Replace" was used to fill in all empty cells in the table with the value 0.
b. A new column "EBIT" was added as the sum of the columns "EBT" and "Interest payable".
c. Rows whose period began or ended outside 2015-2018 were removed.
d. The columns and rows containing zero entries, the columns with zero variance, and duplicate rows were removed ("Data"-"Remove duplicates"-"by company registration number"). Subsequently, the entities (in rows) whose data covered a period other than 12 months (different accounting periods) were removed, as well as the non-numeric entries.
e. ROA and ROE were calculated, where ROA is the ratio of EBIT to assets and ROE is the ratio of EAT to equity.
f. All necessary components for calculating EVA Equity (from the point of view of shareholders) were calculated: risk-free return-rf; indicators characterizing the company size-rLA; indicators characterizing the production power-rentrepreneurship; XP; indicators characterizing the relationship between assets and liabilities-rfinstab; and weighted average costs of capital-WACC (risk-free return + indicators characterizing the company size + indicators characterizing the production power + indicators characterizing the relationship between assets and liabilities) according to (Vochozka 2020): (1)
h. It was necessary to carry out the so-called data cleansing by removing nonsensical and extreme values; subsequently, non-numerical entries of EVA Equity were removed, e.g., those with the error message stating that it is not possible to divide by zero.
In order to keep entities (enterprises) which were further processed in the dataset, the enterprises needed to meet the following 29 conditions: positive assets, positive fixed assets, positive fixed financial assets, positive fixed intangible assets, positive current assets, positive inventories, positive long-term receivables, positive short-term receivables, positive trade receivables, positive receivables to associates, positive share capital, positive reserve funds, positive reserves, positive cash, positive sales of goods, positive consumed material, positive production-consumption, positive performance, positive costs of providing goods, positive depreciation, positive sales of fixed assets, positive sales of material, positive net book value of sold fixed assets, positive interest payable, wage costs higher than CZK 120 thousand per year, ROA in the interval (-100%, +100%), ROE in the interval (-100%, +100%), alternative costs of equity in the interval (0%, +100%), and sales of goods and performance together at least CZK 120 thousand per year.
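The derived columns and a few of the filter conditions above can be illustrated with a minimal Python sketch. It is not the authors' Excel workflow: the `Statement` fields and the function names `ratios` and `keep` are hypothetical, and only a subset of the 29 conditions is shown (12-month period, positive assets, positive interest payable, ROA and ROE inside the open interval from -100% to +100%). The zero-equity guard is an assumption that mirrors the removal of division-by-zero entries.

```python
from dataclasses import dataclass


@dataclass
class Statement:
    # Hypothetical field names; they are not the Albertina column names.
    company_id: str
    months: int               # length of the accounting period
    assets: float             # thousand CZK
    equity: float
    ebt: float                # earnings before tax
    interest_payable: float
    eat: float                # earnings after tax


def ratios(s: Statement) -> tuple[float, float, float]:
    """Return (EBIT, ROA, ROE) as defined in steps b. and e."""
    ebit = s.ebt + s.interest_payable
    roa = ebit / s.assets
    roe = s.eat / s.equity
    return ebit, roa, roe


def keep(s: Statement) -> bool:
    """Apply an illustrative subset of the filter conditions."""
    if s.months != 12:
        return False
    # Assumption: zero equity is dropped to avoid division by zero in ROE.
    if s.assets <= 0 or s.interest_payable <= 0 or s.equity == 0:
        return False
    _, roa, roe = ratios(s)
    return -1.0 < roa < 1.0 and -1.0 < roe < 1.0
```

A row passing `keep` would remain in the dataset; the remaining conditions from the list could be added to the same function in the same way.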

Methods
As regards the methods for processing and analyzing the data used in this work, it is necessary to focus primarily on the Kohonen networks. To clarify self-organizing processes, their operation is first demonstrated by ultimately simplified system models. According to (Kohonen 1982), the basic components of these systems are: 1. A set of processor units that receive coherent inputs from the event space and form simple discriminant functions of their input signals; 2. A mechanism that compares the discriminant functions and selects the unit with the highest function value; 3. A kind of local interaction that simultaneously activates the selected unit and its closest neighbors; 4. An adaptive process that makes the parameters of the activated units increase their discriminant function values with respect to the current input.

Ordered mapping
To clarify the methodology, let us take a look at Figure 1, which describes a simple one-stage self-organizing system. Information about events $A_1, A_2, A_3, \ldots$, which take place in the outside world, is transmitted in the form of sensory signals to a set of processor units (shown here as a one-dimensional array for simplicity) via a transmission network. The sets of sensory signals $S_i$ distributed to each processor unit $i$ may be non-identical, and the number of signals in each $S_i$ may be different; however, these signals are considered coherent in the sense that they are uniquely determined by the same events $A_k$. Suppose that the events $A_k$ can be ordered in some metric or topological way so that $A_1 R A_2 R A_3 R \ldots$, where $R$ stands for a general ordering relation that is transitive (the above implies, e.g., $A_1 R A_3$). Next, the processor units produce output responses to the events with scalar values $\eta_i(A_1), \eta_i(A_2), \ldots$ The system according to Figure 1 is said to implement a one-dimensional ordered mapping if, for $i_1 > i_2 > i_3 > \ldots$,

$$\eta_{i_1}(A_1) = \max_j \{\eta_j(A_1) \mid j = 1, 2, \ldots, n\},$$
$$\eta_{i_2}(A_2) = \max_j \{\eta_j(A_2) \mid j = 1, 2, \ldots, n\},$$
$$\eta_{i_3}(A_3) = \max_j \{\eta_j(A_3) \mid j = 1, 2, \ldots, n\}.$$

The above definition is easily generalizable to two- or higher-dimensional arrays of processor units; in this case, a topological order, induced by more than one ordering relation with respect to different attributes, must be definable for the events $A_k$. On the other hand, the topology of the array is simply defined by the definition of the neighbors of each unit. If the unit with the maximum response to a specific event is regarded as the image of that event, then the mapping is considered ordered if the topological relationships of the images and the events are similar (Kohonen, 1982). Figure 2 defines a rectangular array of processor units. In the first experiment, the transmission network was neglected, and the same set of input signals $\{\xi_1, \xi_2, \ldots, \xi_n\}$ was connected to all units.
In accordance with the notation used in mathematical system theory, this set of signals is expressed as a column vector $x = [\xi_1, \xi_2, \ldots, \xi_n]^T \in \mathbb{R}^n$, where $T$ indicates transposition. Unit $i$ has input weights or parameters $\mu_{i1}, \mu_{i2}, \ldots, \mu_{in}$, expressible as another vector $m_i = [\mu_{i1}, \mu_{i2}, \ldots, \mu_{in}]^T \in \mathbb{R}^n$. Following (Kohonen 1982), the unit forms the discriminant function

$$\eta_i = \sum_{j=1}^{n} \mu_{ij}\,\xi_j = m_i^T x \qquad (4)$$

Creation of topological maps in a two-dimensional field with the same inputs to all units
In addition, a discrimination mechanism selects the maximum of $\eta_i$:

$$\eta_k = \max_i \{\eta_i\} \qquad (5)$$

For unit $k$ and all eight of its closest neighbors (except at the edges of the array, where the number of neighbors is different), the following adaptive process is assumed to be active:

$$m_i(t+1) = \frac{m_i(t) + \alpha\, x(t)}{\lVert m_i(t) + \alpha\, x(t) \rVert} \qquad (6)$$

where the variables are labeled by the discrete-time index $t$ (an integer), $\alpha$ is the "adaptation parameter" of the adaptation, and the denominator is the Euclidean norm of the numerator. Equation (6) otherwise resembles the well-known perceptron learning rule, except that the direction of the corrections is always the same as that of $x$ (no decision-making process or supervision is involved) and the weight vectors are normalized. Normalization improves selectivity in discrimination and is also beneficial in maintaining the "memory resources" at a certain level. Notice that the process of Equation (6) does not change the length of $m_i$ but only rotates it toward $x$. However, the norm does not always have to be Euclidean, as it is in Equation (6).
The data were processed using the method of Kohonen networks. Figure 3 shows the self-organizing algorithm.
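One adaptation step of the kind described above (inner-product discriminant function, selection of the maximum response, and a normalized update of the winner and its eight nearest neighbors) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the algorithm as implemented in Statistica: the function names `normalize`, `winner`, and `adapt` are hypothetical, the grid is a nested list of weight vectors, and the neighborhood is fixed to the 3 x 3 block around the winner.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def winner(weights, x):
    """Return (row, col) of the unit with the largest response m_i^T x."""
    best, best_eta = None, -math.inf
    for r, row in enumerate(weights):
        for c, m in enumerate(row):
            eta = sum(mu * xi for mu, xi in zip(m, x))
            if eta > best_eta:
                best, best_eta = (r, c), eta
    return best


def adapt(weights, x, alpha):
    """Move the winner and its up-to-eight neighbors toward x, then renormalize."""
    rows, cols = len(weights), len(weights[0])
    kr, kc = winner(weights, x)
    # Units at the edge of the grid simply have fewer neighbors.
    for r in range(max(0, kr - 1), min(rows, kr + 2)):
        for c in range(max(0, kc - 1), min(cols, kc + 2)):
            moved = [mu + alpha * xi for mu, xi in zip(weights[r][c], x)]
            weights[r][c] = normalize(moved)
    return kr, kc
```

Because every updated vector is renormalized, the step only rotates the weight vectors toward the input, which matches the remark that the process does not change their length.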

Training-network setting and results
The data were processed using the TIBCO Statistica program. We chose "SANN: Cluster analysis". Both "Topological height" and "Topological width" were set to 10. The number of training cycles was set to 10,000, and the "Learning rates" were set to start in the order of tenths and end in the order of thousandths.
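How a learning rate might fall from tenths to thousandths over the training cycles can be illustrated as follows. The sketch assumes a linear decay from 0.1 to 0.001; both the exact endpoint values and the linear shape are assumptions made for illustration, since the text does not state them and the schedule used internally by Statistica may differ. The function name `learning_rate` is hypothetical.

```python
GRID_HEIGHT = 10          # "Topological height" from the settings above
GRID_WIDTH = 10           # "Topological width" from the settings above
TRAINING_CYCLES = 10_000


def learning_rate(cycle, total_cycles=TRAINING_CYCLES, start=0.1, end=0.001):
    """Linearly interpolate the learning rate (assumed schedule and endpoints)."""
    fraction = cycle / (total_cycles - 1)
    return start + (end - start) * fraction
```

With these assumptions, the rate equals `start` in cycle 0 and reaches `end` in the last cycle.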
Procedure for evaluating the strongest cluster

i. Assignment of cluster positions to individual enterprises
As a first step, one must assign the resulting clusters to the enterprises in the original dataset in each year, as shown in Table 1. Here, the positions in the cluster are listed as "Position 2015". The source of individual clusters is the result of neural network training.

Determination of enterprise value generators as variables
Position assignment is followed by the determination of variables that have been specified as enterprise value generator entries and then the creation of Pivot Table reports.
The variables are defined as generators of the value of the enterprise: total assets; fixed assets; sales; operating profit or loss.

ii. Determination of cluster ranking and number of their points
The individual variables mentioned above are filtered out of the PivotTables, along with the positions in the clusters, and these data are transferred to new tables where ranks 1-10 are assigned; see Table 2. The sorting was done in such a way that the first is always the cluster where the variable acquired the highest value. This was done for all years and for all variables. The aim is to find out which cluster is the strongest and then find and compare, using financial analysis, the companies in it for 2015, 2016, 2017, and 2018.
From the tables shown in Table 2, pivot tables were created (see Table 3), and a 1-10 point scale was added, with 10 points awarded if the cluster took first place in the order. In this way, the scoring continued down to the tenth place, where the cluster received only one point for its position. The position of the cluster depends on the amount of assets, fixed assets, sales, and operating income. The "Order Points Count" column is added to each of the Pivot Tables, and using the function IF(B7 = 1;"10"; IF(B7 = 2;"9"; IF(B7 = 3;"8"; IF(B7 = 4;"7"; IF(B7 = 5;"6"; IF(B7 = 6;"5"; IF(B7 = 7;"4"; IF(B7 = 8;"3"; IF(B7 = 9;"2"; IF(B7 = 10;"1")))))))))), the number of points is assigned based on the order in the table. This is done for all variables in all years.
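The ranking and point assignment for one variable in one year can also be expressed compactly in code. The sketch below is a Python analogue of the pivot-table procedure, not the authors' spreadsheet: the row layout (a dictionary with a `"cluster"` key and one key per variable) and the function names `cluster_totals` and `score_top_ten` are hypothetical.

```python
from collections import defaultdict


def cluster_totals(rows, variable):
    """Sum one variable (e.g. "assets") over all companies in each cluster."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["cluster"]] += row[variable]
    return dict(totals)


def score_top_ten(totals):
    """Give 10 points to the cluster with the highest total, 9 to the next, down to 1."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)[:10]
    return {cluster: 11 - rank for rank, (cluster, _) in enumerate(ranked, start=1)}
```

The expression `11 - rank` plays the role of the nested IF function: rank 1 yields 10 points and rank 10 yields 1 point, while clusters outside the top ten receive no entry.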

iii. Assembling a cluster leaderboard and determining the strongest cluster
To select the strongest cluster, the results from all the Pivot Tables above must be copied and a single Pivot Table report must be created in which the clusters can be sorted based on all points earned for all years and for all variables. Subsequently, financial analysis was carried out, which is limited to ratio indicators.
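The final leaderboard amounts to summing the points of each cluster over all years and variables. A minimal Python sketch, assuming the per-table scores are already available as dictionaries keyed by a hypothetical `(year, variable)` pair, could look like this; the function name `leaderboard` is illustrative only.

```python
from collections import Counter


def leaderboard(scores_by_year_and_variable):
    """Sum points per cluster over all (year, variable) tables, highest first."""
    total = Counter()
    for scores in scores_by_year_and_variable.values():
        total.update(scores)      # adds the points cluster by cluster
    return total.most_common()
```

The first element of the returned list is the strongest cluster together with its total number of points.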

Dataset
The basis for the research was a dataset including the data from 21,882 companies, whose number was reduced after modifications to 1868 companies. The data were obtained from Bisnode's Albertina database from section H-Transportation and Storage according to the CZ NACE classification, where all legal persons from section H for the years 2015-2018 were selected.
Each data row contains: 1. Specification of the company: company registration number (IČO), name of the company; 2. Information on the company: date of resources, beginning and end of the period, and number of months of the financial statement; 3. Financial statements: balance sheet, profit and loss account (PLA).
The dataset was divided into four other sheets, where the information about the companies and the variables were presented for the individual years of the monitored period (2015-2018).

Training of Neural Networks and Its Results
Network training was performed in the Statistica programme using the cluster analysis function. The basis for the training was the dataset containing the data on transport companies obtained from the financial statements for the years 2015-2018. Important information for the research is the location and number of companies in individual clusters, as presented in Table 4 and 3D Graph, which can be generated directly in the Statistica programme. The number of companies in individual clusters for the year 2015 is shown in Table 1.
Tables 4-8 are read using the row and column numbers shown along the edges of each table, as in Table 4. These numbers give the precise location of each cluster, and the cell values give the number of companies in the cluster. To find the number of companies in, e.g., the cluster (2, 1), start from the first number in the horizontal row and proceed to the number in the vertical column. In 2015, the cluster (2, 1) contained four companies. It can be seen from Table 4 that the strongest cluster in terms of the number of companies significantly exceeds the average number of companies per cluster (1). The average number of companies per cluster in 2015 was 18.69.
It can be noted that the high number of companies in a cluster does not mean it is the strongest cluster in the overall analysis, since if a cluster contains too many companies, it indicates that the cluster consists of more companies that are weaker but show the most similarities based on which the network included them in the cluster. Experience shows that the strongest cluster usually contains fewer than the average number of companies.
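The per-cluster counts and the average number of companies per cluster can be derived directly from the assigned positions. The following Python sketch assumes 1-based `(row, column)` positions on a 10 x 10 map; the orientation of the two coordinates and the function name `cluster_counts` are assumptions, not taken from the Statistica output.

```python
from collections import Counter


def cluster_counts(positions, height=10, width=10):
    """Return a height x width table of company counts and the average per cluster."""
    counts = Counter(positions)
    table = [
        [counts.get((r, c), 0) for c in range(1, width + 1)]
        for r in range(1, height + 1)
    ]
    average = len(positions) / (height * width)
    return table, average
```

Empty clusters appear as zeros in the table, so they are counted in the average in the same way as occupied ones.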
Scheme 1 shows the number of companies in clusters; i.e., it shows how many companies are in the given cluster in a specific year. The scheme also shows that the biggest cluster in terms of the number of companies is in the position (6, 10), as seen in Table 4, on which Scheme 1 is based.
Scheme 2 is a graphical representation of the location and number of companies in clusters based on the information contained in Table 5. The highest number of companies in 2016 (220) was in the cluster (8, 10). The average number of companies per cluster in 2016 was 18, which is slightly fewer than in 2015.
Table 6 shows the number of companies in individual clusters in the year 2017.
Scheme 3 also indicates the changes in the number of companies in clusters and the location of clusters. In 2017, the number of companies in the first clusters increased, while the more distant clusters contain fewer companies, as seen in Scheme 3.
The average number of companies per cluster in 2017 is 19. As follows from the graph, the cluster with the highest number of companies is (8, 1), with 289 companies. Table 7 below shows the data for the year 2018, which indicate that the strongest cluster in terms of the number of companies is the cluster (10, 10), with 222 companies. It is also the first year in which clusters with zero companies appear: clusters (2, 1) and (3, 2).

Analysis of Strongest Cluster
The variables of business value generators are Assets, Fixed assets, Sales, and Operating result. These variables are used for the evaluation of the individual clusters. For a general comparison of clusters, other items are selected, specifically indebtedness, services, personnel costs, and interest payable.
First, the assets for the year 2015 were evaluated and ten clusters with the highest sum of company assets (in descending order) in each cluster were selected. The column "Number of companies from company ID" in Table 8 indicates the number of companies in each cluster. "Sum of Assets in total" indicates the sum of assets of individual companies in each cluster. Table 8 indicates that the strongest cluster in terms of the total assets (193,983,526 thousand CZK) is the cluster (1, 1), which contains five companies. The second strongest cluster is (2, 1) containing four companies with a total sum of assets of 49,520,740 thousand CZK. The cluster with the third-highest sum of assets is the cluster (3, 1) containing six companies with a total sum of assets being 13,980,772 thousand CZK.
The analysis by assets is as follows. The cluster (1, 1) is also the strongest one in the year 2016 (see Table 9). Unlike in 2015, however, the cluster contains only four companies. The cluster (2, 1) dropped to the third position and contained only two companies. The second place is occupied by the cluster (1, 3), which was in the 9th position in 2015. The year 2017 brought a major change in the form of the cluster (10, 10) (see Table 10), which replaced the cluster (1, 1). However, this does not mean that the companies in the cluster (1, 1) lost a major part of their assets in this period; the cluster (1, 1) was merely replaced by the cluster (10, 10) during network training.
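Whether a cluster has merely moved on the map between two trainings, rather than lost its assets, can be checked by comparing company membership across years. The sketch below is one possible way to automate such a check; it is not described in the source as the authors' procedure, and the function name `best_match` and the company identifiers in the usage note are hypothetical.

```python
def best_match(members_prev, clusters_next):
    """Find the next-year cluster that shares the most companies with members_prev.

    members_prev  -- iterable of company IDs in one cluster in the earlier year
    clusters_next -- dict mapping cluster position to company IDs in the later year
    """
    prev = set(members_prev)
    best, best_overlap = None, 0
    for position, members in clusters_next.items():
        overlap = len(prev & set(members))
        if overlap > best_overlap:
            best, best_overlap = position, overlap
    return best, best_overlap
```

For example, if the four hypothetical companies "A" to "D" form one cluster in the earlier year and all of them appear in the cluster (10, 10) in the later year, `best_match` returns `((10, 10), 4)`.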
The cluster (10, 10) is considered to be the cluster (1, 1) since a detailed examination of the companies showed that the companies from the cluster (1, 1) are now in the cluster (10, 10) (see Table 11).  As seen from Table 12, in 2018, the cluster (1, 1) was again in the first position, with the total assets being 154,281,306 thousand CZK. In 2018, it contained three companies. The second position was occupied by the cluster (1, 2) with five companies, while the third position was occupied by the cluster (3, 1) with three companies. Scheme 5 below represents the development of assets in the cluster (1, 1), which was replaced by the cluster (10, 10) in 2017. The analysis by fixed assets is as follows. In terms of the sum of fixed assets, the strongest cluster for the year 2015 is (1, 1) (see Table 13), which occupied the highest positions in the evaluation of clusters by the highest sum of total assets. The cluster (1, 1) is represented by five companies, with the total sum of fixed assets being 160,176,338 thousand CZK. In 2016, the strongest cluster was again (1, 1) represented by four companies only, with the total sum of fixed assets being 151,923,525 thousand CZK (see Table 14 below).  Table 15 shows that the cluster (1, 1) was replaced again by the cluster (10, 10). This cluster is thus again considered the cluster (1, 1); in 2017, it was represented by five companies, with the total sum of fixed assets being 163,426,753 thousand CZK. In 2018, the strongest cluster in terms of the total sum of fixed assets was (1, 1), which so far appears to be the strongest cluster overall. As seen in Table 16, the second strongest cluster is (1, 2), with a total of five companies; the third strongest cluster (1, 2) contains four companies. The development of the sum of fixed assets in the strongest cluster (1, 1) (cluster (10, 10) for the year 2017) is shown in Scheme 6. 
In 2018, the curve goes down, which indicates the drop in the total sum of fixed assets of all companies in the cluster (1, 1).
The analysis by sales is as follows. As seen from Table 17, the first position in the year 2015 is again occupied by the cluster (1, 1), with the total sum of sales of 69,370,150 thousand CZK. In 2015, the cluster includes five companies. The second position is occupied by the cluster (2, 10) with twelve companies, and the third position by the cluster (1, 10), containing three companies. As in the previous cases, the strongest cluster in the year 2016 turned out to be the cluster (1, 1), which contained four companies with the total sum of sales being 51,118,919 thousand CZK. The second strongest cluster, which contained thirteen companies, was the cluster (1, 10) (see Table 18). In the year 2017, the cluster (1, 1) was, as before, replaced by the cluster (10, 10). As seen from Table 19, the cluster contains five companies with the sum of sales being 59,077,349 thousand CZK. The second position is occupied by the cluster (10, 9); the third strongest cluster was (1, 8), which appeared among the three strongest clusters for the first time. The year 2018 was no exception; the first position was occupied by the cluster (1, 1) (see Table 20). The cluster (1, 1) consists of three companies, with the total sum of sales being 44,758,784 thousand CZK.
The analysis by operating results is as follows. In the case of the variable "operating result" (hereinafter also referred to as "OR"), a significant change was recorded, which concerned the strongest clusters. The strongest cluster appears to be the cluster (2, 1), represented by four companies, whose sum of OR is 2,486,859 thousand CZK. The cluster (1, 1), represented by five companies as in the previous cases, occupies the second position in 2015; the third strongest cluster appears to be the cluster (2, 10), with its twelve companies.
Table 21 shows the ten clusters with the highest sum of OR for the year 2015. As for the results for the year 2016, Table 22 indicates that the cluster (1, 2) occupies the first position. The cluster (1, 1) is not among the three best ones; the second position is occupied by the cluster (1, 10), while the third strongest one is the cluster (2, 1). The cluster (1, 10) consists of thirteen companies, while the cluster (2, 1) includes two companies. Table 23 shows that in 2017 the first position is occupied by the cluster (9, 10), with a total sum of OR of 3,556,518 thousand CZK. The second position in 2017 is occupied by the cluster (10, 10), containing five companies, while the third strongest cluster is the cluster (2, 10), with a total of four companies. In 2018, the cluster (1, 1) is again among the strongest clusters; specifically, it occupies the second position with its three companies. As in the year 2016, the strongest cluster is the cluster (1, 2), with the total sum of OR being 5,138,259 thousand CZK (see Table 24). The development of the highest sum of OR for the years 2015-2018 is shown in Scheme 8 below. In this case, the scheme does not follow the companies of a single cluster, since the strongest cluster is not the same in every year.

Order of Clusters and Their Scoring
To identify the strongest clusters, it was necessary to determine their order based on the sum of the given variable (e.g., Table 25 shows the sum of assets).
Subsequently, the clusters were assigned points according to their position: 10 points for the first position, 9 points for the second, 8 for the third, 7 for the fourth, 6 for the fifth, 5 for the sixth, 4 for the seventh, 3 for the eighth, 2 for the ninth, and 1 point for the tenth position.
A graphical representation of the number of points assigned to the individual clusters in the individual years is shown in Scheme 12.
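The scoring rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `score_ranking` and `overall_ranking` and the input sums are hypothetical and are not taken from the paper's tables, and ties between clusters are not treated specially (the paper does not say how ties were resolved).

```python
from collections import defaultdict

def score_ranking(sums_by_cluster, top_n=10):
    """Return {cluster: points} for one table (one variable in one year).

    The cluster with the largest sum gets `top_n` points, the next one
    `top_n - 1`, and so on down to 1 point; clusters outside the first
    `top_n` positions get no points.
    """
    ordered = sorted(sums_by_cluster, key=sums_by_cluster.get, reverse=True)
    return {cluster: top_n - pos for pos, cluster in enumerate(ordered[:top_n])}

def overall_ranking(tables):
    """Add up the points from several tables and sort clusters by total."""
    totals = defaultdict(int)
    for sums_by_cluster in tables:
        for cluster, points in score_ranking(sums_by_cluster).items():
            totals[cluster] += points
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Hypothetical sums (thousand CZK) for two tables; NOT values from the paper.
assets_2015 = {(1, 1): 900, (1, 2): 400, (3, 1): 250}
sales_2015 = {(1, 1): 300, (2, 10): 280, (1, 2): 100}

ranking = overall_ranking([assets_2015, sales_2015])
for cluster, points in ranking:
    print(cluster, points)
```

With four variables and four years, a cluster could collect at most 160 points under this rule, which gives a scale for reading the totals in the overall ranking.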

Ranking of Strongest Clusters
After the ranking was compiled, the overall table was, due to its extent, placed in the appendices as Appendix B: Overall ranking of clusters. Table 30 presents the order of the ten strongest clusters. As predicted in the previous chapter, the first position is occupied by the cluster (1, 1), with a total of 108 points. Scheme 14 is a graphical representation of the clusters by their total number of points.
The strongest cluster based on the variables considered to be business value generators turned out to be the cluster (1, 1). Table 31 shows the companies from the cluster (1, 1) in the individual years. It follows from the table that the cluster (1, 1) contained five companies in 2015, four companies in 2016, five companies in 2017, and three companies in 2018. Here, it shall be noted that the grouping of companies for the year 2017 takes into account that, as already mentioned, the cluster (10, 10) replaced the cluster (1, 1). Table 31 also indicates that only two companies appeared in the cluster in all years of the monitored period, specifically the company with the registration number 70994234 and the company with the registration number 5886. Those two companies are subject to subsequent analysis.
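Selecting the companies that stay in the strongest cluster throughout the period amounts to a set intersection over the years, once the 2017 label (10, 10) is mapped back to (1, 1). The following Python sketch illustrates this; the company identifiers "A" to "F" are placeholders rather than registration numbers from Table 31, and only the cluster sizes (5, 4, 5, 3) follow the text.

```python
# Hypothetical membership per year: {year: {cluster: set of company IDs}}.
# The IDs are placeholders, NOT data from the paper.
membership = {
    2015: {(1, 1): {"A", "B", "C", "D", "E"}},
    2016: {(1, 1): {"A", "B", "C", "D"}},
    2017: {(10, 10): {"A", "B", "C", "E", "F"}},
    2018: {(1, 1): {"A", "B", "F"}},
}

# In 2017 the same group of companies landed in cluster (10, 10),
# so that cluster is treated as (1, 1) for that year.
ALIASES = {2017: {(10, 10): (1, 1)}}

def members(year, cluster):
    """Companies of `cluster` in `year`, after applying the yearly aliases."""
    merged = set()
    for label, companies in membership[year].items():
        if ALIASES.get(year, {}).get(label, label) == cluster:
            merged |= companies
    return merged

def present_in_all_years(cluster):
    """Companies that belong to `cluster` in every year of the period."""
    yearly = [members(year, cluster) for year in sorted(membership)]
    return set.intersection(*yearly)

stable = present_in_all_years((1, 1))
print(sorted(stable))
```

In this made-up example, two companies remain after the intersection, which mirrors the situation described for Table 31.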

Analysis of Selected Companies from Cluster (1, 1) and Their Participation in the Creation of the Selected Components
An interesting fact is how the selected companies from the cluster (1, 1) participated in the creation of the selected components (assets, fixed assets, sales, and operating result) in the individual years of the monitored period. This is presented in the table in Figure 4, where the column "Assets" shows the sum of assets of a given company for a given year, while the column "Total assets" shows the sum of assets of all companies in the same year. The same system is used for all components in the table. The data were taken from the dataset that was presented to the Kohonen network for training; based on this dataset, a map of clusters was created as an output of the network. The table shows that the company with the registration number 70994234 participated in the creation of the total assets as follows: 17.5% in 2015, 18.4% in 2016, and 15.95% in 2017; in 2018, its share decreased to its lowest level of 14.82%. The company with the registration number 5886 showed a slightly higher share in the creation of the total assets: in 2015, the company accounted for 19.52% of the total assets of all companies in the dataset, followed by 20.01% in 2016, 19.20% in 2017, and 19.93% in 2018. In the case of fixed assets, the participation of both companies was higher: in 2015, the share of the company with the registration number 70994234 in the total fixed assets was 23.84%, while the share of the company with the registration number 5886 was 25.93%. In 2016, the share of the company with the registration number 70994234 increased to 24.90%, and that of the company with the registration number 5886 to 26.77%. In the year 2017, the shares decreased to 22.24% and 25.89%, respectively. The lowest shares were recorded in the year 2018, specifically 21.07% and 25.77%.
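The shares quoted above are simple ratios of a company's value to the total for all companies in the dataset in the same year. A minimal Python sketch of this arithmetic follows; the function name `share_percent` and the figures are hypothetical and chosen only for illustration.

```python
def share_percent(company_value, total_value):
    """Share of one company in the total of all companies, in percent."""
    if total_value == 0:
        raise ValueError("total_value must be non-zero")
    return round(100 * company_value / total_value, 2)

# Hypothetical figures in thousand CZK; NOT values from the paper.
company_assets = 35_000
all_assets = 200_000
print(share_percent(company_assets, all_assets))  # prints 17.5
```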
As for sales, in the years 2015-2018, the share of the company with the registration number 70994234 ranged between 2.04% and 2.18%, while in the case of the company with the registration number 5886, the share was slightly higher, ranging between 5.44% and 6.49%. The problem is the operating result, where the company with the registration number 70994234 shows a negative value. This is because the company is controlled by the state, and its financing differs from that of private companies. Companies managed by the state provide strategic services and functions for the state, and their operation is financed from the state budget; they are not intended to generate profit but to ensure various functions for all citizens, so they operate on a different principle than private companies. They are state organizations whose task is to ensure the transport infrastructure, not to generate profit. It may seem that this company should not be included among the other companies in the cluster, but it should be considered that it exists and creates the environment this contribution deals with. The negative operating result will thus not be commented on in terms of the percentage share. However, the company with the registration number 5886 showed positive operating results for all years of the monitored period, and its share in the years 2015-2018 ranged between 9.38% and 13%. Table 32 presents a financial analysis of the company in the form of ratios. The first three components in the table need not be commented on, since their negative values are explained in the previous chapter (see Negative operating result). As for the cash position ratio, it indicates to what extent the company can pay its short-term liabilities immediately, i.e., with cash, money in bank accounts, cheques, or short-term securities.
The recommended value of the cash position ratio ranges between 0.2 and 0.5; the ratio is calculated by dividing financial assets by the sum of short-term liabilities and current bank loans. The company with the registration number 70994234, as seen in Table 32, shows a cash position ratio of 0.11 in 2015, which does not reach even the lower limit of the recommended range. Low values of the cash position ratio indicate a reduced ability to pay short-term liabilities; too high values indicate inefficient management. In the case of this company, the cash position ratio increased to 0.45, 0.54, and 0.56 in the years 2016-2018; the values for 2017 and 2018 exceed the upper limit of the recommended range, which indicates inefficient management. Quick ratio, sometimes also called second-degree liquidity, provides information on how many times the company can pay its short-term liabilities if financial assets and short-term receivables are converted into cash. The recommended value ranges between 0.7 and 1.2. As seen in Table 32, the quick ratio corresponds to the recommended range in all years of the monitored period. As for current liquidity, its values range between 0.73 and 1.09, whereas the recommended range is 1.5-2. This ratio indicates how many times the company would be able to pay its short-term liabilities if all current assets were converted into cash.
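The three liquidity ratios and their recommended ranges can be checked with a short Python sketch. It rests on assumptions: the text gives the denominator (short-term liabilities plus current bank loans) only for the cash position ratio, and the same denominator is assumed here for the quick and current ratios; the quick ratio numerator (financial assets plus short-term receivables) follows the usual textbook definition. The function names and the balance sheet figures are hypothetical; the ranges are those quoted in the text.

```python
# Recommended ranges as quoted in the text.
RECOMMENDED = {
    "cash": (0.2, 0.5),
    "quick": (0.7, 1.2),
    "current": (1.5, 2.0),
}

def liquidity_ratios(financial_assets, short_term_receivables,
                     current_assets, short_term_liabilities,
                     current_bank_loans=0.0):
    """Compute cash, quick, and current ratios from balance sheet items.

    Assumption: all three ratios use the same denominator,
    short-term liabilities + current bank loans.
    """
    due = short_term_liabilities + current_bank_loans
    if due <= 0:
        raise ValueError("short-term debt must be positive")
    return {
        "cash": financial_assets / due,
        "quick": (financial_assets + short_term_receivables) / due,
        "current": current_assets / due,
    }

def assess(name, value):
    """Compare a ratio with its recommended range."""
    low, high = RECOMMENDED[name]
    if value < low:
        return "below range"
    if value > high:
        return "above range"
    return "within range"

# Hypothetical balance sheet items (thousand CZK); NOT data from Table 32.
ratios = liquidity_ratios(financial_assets=110, short_term_receivables=700,
                          current_assets=1000, short_term_liabilities=1000)
for name, value in ratios.items():
    print(name, round(value, 2), assess(name, value))
```

Applied to the cash position ratios reported in the text, `assess("cash", 0.11)` falls below the range, `assess("cash", 0.45)` lies within it, and `assess("cash", 0.54)` lies above it.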

Company with the Registration Number 70994234-Správa Železnic, Státní Podnik
The items including the word "turnover" indicate how long the assets remain in their initial form before being converted into sales or cash. In the case of assets, this period should be as short as possible, while in the case of liabilities, it is the other way around. Financial leverage shows the ratio of debt capital to the total assets of a given company. The higher the financial leverage (the smaller the share of equity in the total resources), the greater the effect of the financial leverage on return on equity. Table 33 shows the financial analysis using ratios for the years 2015-2018. Comparison of the tables for both companies shows that the results of the company 5886 are slightly better. Return on assets is significantly shorter than in the case of the company 70994234, while financial leverage is comparable with that of the company Správa železnic, státní podnik. EBITDA stands for earnings before interest, taxes, depreciation, and amortization; it shows the operating performance of a company. Table 33 shows that its value is significantly higher than in the case of the company 70994234. On the contrary, liquidity shows worse values. The current ratio does not reach even the lower limit of the recommended range in any year of the monitored period. The quick ratio shows better values, which lie within the recommended range. The cash position ratio exceeds the upper limit of the recommended range in all years, which indicates inefficient management.

Discussion
Within this contribution, one research question was formulated: are there only a few companies that largely affect the growth of the transport sector? The research question was answered in the research section of this contribution. The dataset including the accounting data of all companies was presented to the Kohonen network for training; the result was a map of clusters graphically representing the presence and number of companies in the individual clusters. Within the research, the so-called business value generators were used to determine to what extent the individual clusters participated in their creation. A ranking of the strongest clusters was compiled for the individual years and individual variables, where the first position was mostly occupied by the cluster (1, 1), whose place was nominally taken by the cluster (10, 10) in the year 2017. All clusters were assigned points based on their position in the individual tables; subsequently, a ranking of the strongest clusters for all variables in all years of the monitored period was compiled. The strongest cluster turned out to be the cluster (1, 1), containing 3-5 companies between 2015 and 2018. The second strongest cluster was the cluster (2, 1). Since the cluster (1, 1), the strongest cluster in terms of the volume of assets, fixed assets, sales, and operating result in the years 2015-2018, contained only 3-5 companies, the answer to the research question is yes. The transport sector is to a large extent influenced by only 3-5 companies, which are interconnected with the transport sector in such a way that if the companies included in the strongest cluster (1, 1) make a profit, stagnate, or even go bankrupt, this will have a significant influence on the transport sector of the CR.
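For readers unfamiliar with Kohonen networks, the clustering step summarized above can be illustrated with a minimal self-organizing map written in NumPy. This is a sketch under stated assumptions, not the implementation used in the research: the software, the hyperparameters, and the preprocessing are not specified in this part of the text, so the 10 x 10 grid (inferred from cluster labels such as (10, 10)), the Gaussian neighbourhood, the linearly decaying learning rate and radius, and the random input data are all illustrative choices.

```python
import numpy as np

def best_matching_unit(weights, x):
    """Grid position (row, col) of the node whose weights are closest to x."""
    distances = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(distances), distances.shape)

def train_som(data, rows=10, cols=10, epochs=20, lr0=0.5, sigma0=3.0, seed=0):
    """Train a small Kohonen map; returns weights of shape (rows, cols, n_features)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every node, used for neighbourhood distances.
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                indexing="ij"), axis=-1)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.permutation(data):          # shuffle the rows each epoch
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
            bmu = np.array(best_matching_unit(weights, x))
            dist2 = np.sum((grid - bmu) ** 2, axis=-1)
            h = np.exp(-dist2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights

def assign_clusters(weights, data):
    """1-based (row, col) cluster label for each record, e.g. (1, 1)."""
    labels = []
    for x in np.asarray(data, dtype=float):
        r, c = best_matching_unit(weights, x)
        labels.append((int(r) + 1, int(c) + 1))
    return labels

# Hypothetical input: 50 companies x 4 variables, already scaled to [0, 1].
demo = np.random.default_rng(1).random((50, 4))
som = train_som(demo)
labels = assign_clusters(som, demo)
print(labels[:5])
```

The orientation of such a map depends on its random initialization, so a map trained separately on another year's data may place the same group of companies in a different corner. This is one plausible reason why the companies of the cluster (1, 1) could appear in the cluster (10, 10) in 2017, although the text does not state the cause.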
The companies that appeared in the cluster (1, 1) in all four years of the monitored period are Správa železnic, státní podnik and Dopravní podnik hl. města Prahy. These companies differ from each other in terms of their financing and management; however, both were identified as leading companies in the transport sector of the Czech Republic in terms of generating business value. Their share in the total assets of all companies in the transport sector included in the original dataset ranges between 15.82% and 20.01%. This means that Dopravní podnik hl. města Prahy alone, with its share of 20.01%, makes up more than a fifth of the total assets of all companies. In the case of fixed assets, this company achieves its highest share in the year 2016 (26.77%). In the case of operating result, its share is highest in the year 2017, when it reaches 13%.
A similar application of Kohonen networks was discussed by Stehel et al. (2019), whose aim was to analyze companies operating in the agriculture sector of the Czech Republic using the Kohonen network and subsequently predict their further development. As in the case of the research concerning the transport sector, they created a dataset from financial statements, in their case of 4201 companies operating in the agriculture sector of the Czech Republic in the year 2016. The dataset was generated from Bisnode's Albertina database and was subsequently subjected to cluster analysis using the Kohonen network. The advantages of using the Kohonen network are also mentioned by Du Jardin and Séverin (2012), whose results show that the out-of-sample error achieved by the map remains more stable over time than the error achieved by traditional methods used for designing failure models (discriminant analysis, logistic regression, Cox model, and neural networks).

Conclusions
The transport sector is very important for the Czech Republic, as it is interconnected with other sectors and contributes greatly to the economy of the Czech Republic. The transport sector includes mainly companies providing services. The goal of the research was to analyze the financial situation of the companies operating in the transport sector, and of the transport sector of the Czech Republic as a whole, using artificial intelligence, specifically Kohonen networks. The contribution pointed to the importance of the transport sector and explained how cluster analysis, and specifically an artificial intelligence method, the Kohonen network, can be used for creating clusters based on similar characteristics; it also presented the clustering system and the importance of clusters as such. The methodology chapter was quite extensive because it was necessary to explain the methods and procedures used in the research and to specify the methods used for identifying the strongest cluster in the transport sector based on raw data, since working with neural networks is a complex and exacting discipline. The methodology also included information on the dataset used for the training and on the modification of the dataset so that it could be presented to the network. Both companies included in the cluster (1, 1) in all years of the monitored period were finally subjected to financial analysis and subsequently described. A thorough analysis of the economic development of the transport sector using the Kohonen network, together with the subsequent prediction of its development, made it possible to meet the objective of the research. The added value of the research and its benefits can be seen in its comprehensiveness and level of detail, which provided information on which companies affect the transport sector most and how. This approach can be considered generally valid; it is thus applicable to other sectors or groups of companies.
It also provides analysts with the opportunity to monitor the situation in these companies and, in the case of a fall or sudden growth, to focus on the specific companies that are most likely to have given rise to the situation, since the changes they underwent affected the whole transport sector. The limitations of the research consist in the lack of international reach; however, this can be a subject of further investigation. With a slight modification of the choice of variables (business value generators), this model would be applicable to any group of companies in the same or a different sector. A possible improvement of the research can be the application of an optimized Kohonen network, as done by Harchli et al. (2014), who applied a heuristic method prior to the network learning phase. The main objective was to find the initial parameters of the map, which means finding centroids in the most homogeneous areas of the dataset. Subsequently, the Kohonen learning phase was started and the obtained clustering was evaluated using the map. These two phases were repeated until the desired number of iterations was reached. The result shows that the proposed method can provide better clustering results than a traditional topological map. In further research on the application of the Kohonen map to certain sectors or groups of companies, the optimization of the model could thus provide better results.