Forecasting the Number of Freight Trains by Categories Using Time Series Regression Analysis

Martinov, Svetoslav; Ivanov, Ivan; Petkov, Kiril

doi:10.3390/engproc2025121003

Open Access Proceeding Paper

by Svetoslav Martinov ¹,*, Ivan Ivanov ² and Kiril Petkov ³

¹ Department of Railway Engineering, Technical University of Sofia, 8 Kl Ohridski Blvd., 1000 Sofia, Bulgaria
² Department of Mathematical Analysis and Differential Equations, Technical University of Sofia, 1000 Sofia, Bulgaria
³ Department of Mathematical Modelling and Numerical Methods, Technical University of Sofia, 1000 Sofia, Bulgaria
* Author to whom correspondence should be addressed.

Presented at the 17th International Scientific Conference on Aerospace, Automotive, and Railway Engineering (BulTrans-2025), Sozopol, Bulgaria, 10–13 September 2025.

Eng. Proc. 2026, 121(1), 3; https://doi.org/10.3390/engproc2025121003

Published: 12 January 2026

(This article belongs to the Proceedings of The 17th International Scientific Conference on Aerospace, Automotive, and Railway Engineering)


Abstract

This paper proposes a new hybrid model for forecasting the number of freight trains. The forecast combines classification analysis with a Histogram-based Gradient Boosting Regressor (HGBR). The freight trains were classified into categories according to the transportation parameters and the train’s structure. A total of 100,441 freight trains with a constant number of wagons were studied. The trains ran over a 3-year basis period. The forecast was performed by train category for a forecast period of 1 year. The results were validated against real data for the trains that ran during the forecast period. The adequacy of the forecasting model was assessed using standard indicators; the Coefficient of Determination is equal to 0.805. The results show a high level of accuracy of the model, despite discrepancies in the basis-period data on train parameters for 34% of the trains.

Keywords:

number of freight trains; classification analysis; train categories; forecasting model; gradient boosting; HGBR; machine learning; time series regression; transportation data

1. Introduction

Rapidly changing conditions in the modern world challenge the different modes of transport to provide reliable and competitive freight transport services. Rail transport is a key element of modern logistics chains. In this regard, forecasting railway transport is an essential part of planning the activities of railway enterprises, using enterprise resources, and forming transport tariffs.

It is a widespread practice in rail freight transportation for railway undertakings to apply freight rate tariffs. The tariffs are documents that outline the prices and rules for transporting goods in specific conditions [1,2]. The tariffs of railway enterprises are valid for a specific period of time and are formed for an upcoming period. This is related to the application of costing and pricing models. A main task of railway undertakings is to determine the value of the costs associated with transportation. It is necessary to determine a transportation price for the company’s customers. The tariffs are developed in a period preceding the period of their validity. This requires taking into account the upcoming conditions that impact transportation costs. On the one hand, these are the components involved in the formation of transportation costs: price of resources, infrastructure charges, costs for use of rolling stock, etc. On the other hand, there are the quantity of transported goods and the parameters of the transportation: number of trains, railway traffic organization, train composition, transport distance, etc.

The costs associated with rail freight transportation can be grouped into two main groups: operating costs directly related to the transportation process, and indirect costs arising from the activities of the railway undertaking [3]. A study of the operating costs of freight trains in Bulgaria [4] shows that some parameters involved in their formation have a more significant influence than others. Between 92% and 98% of the operating costs of a freight train [4] are formed through parameters related to the transport distance and/or the gross mass of the train. These parameters are as follows:

Charges for using the railway infrastructure [5]: the values are determined by the transport distance and the gross mass of the train.
Locomotive costs [4]: these include the costs of moving the train locomotives and of their hourly and kilometer-based use; the values are determined by the transport distance and the mass of the train.
Train staff costs [4]: these cover the hourly and kilometer-based service of the train by train drivers and transport crews and are determined, respectively, indirectly and directly by the transport distance.

The operating costs of trains also include the costs associated with the isolated movement and downtime of train locomotives before and after serving the train [4]. These costs amount to between 6% and 27% of the total operating cost per train and include the following: infrastructure charges; kilometer- and hour-based use of locomotives; energy costs for locomotive traction; and kilometer- and hour-based servicing of locomotives by locomotive drivers.

In this regard, it could be concluded that the main parameters that influence the value of operating costs of freight trains are as follows: the transport distance; the gross mass of the train; and the number of locomotives serving the train. These parameters are considered to be fundamental in developing the forecasting model and are used in classifying the trains in this study.

Freight transportation forecasting is part of the process of developing rate tariffs in rail freight transport. The estimated values for the volume and characteristics of transportation are used in planning the resources of the railway undertakings, for costing, and for developing freight rate tariffs. There are various models used for forecasting in rail freight transport [6,7,8,9].

A new hybrid model for forecasting the number of freight trains carried by a railway undertaking has been proposed in this study. The model is an element of a complex costing system [3,4] being developed for use by freight railway undertakings in Bulgaria. The model and the obtained results can be used in calculating the expected operating costs for the trains serviced by a railway undertaking during the forecast period.

2. Methodology of Research

The proposed hybrid model for forecasting the number of freight trains is based on combining a classification analysis of train composition and movement parameters and using Histogram-based Gradient Boosting Regressor (HGBR). A grouped structure of trains according to the basic parameters related to their composition and transportation was created by applying classification analysis. A database of the number and parameters of trains operated by railway undertakings during a past base period was used in the forecast. The expected number of trains during the forecast period was determined according to the classification analysis by applying the HGBR model.

The methodology used in the study includes the following basic steps:

Step 1. Database and period formulation

In this step, the historical basis period of data used for forecasting, the train composition data, and the transport parameters used in the forecast are defined.

Step 2. Train classification

The trains of the basis period are classified into groups and subgroups. The large number of studied trains and the diversity of their characteristics and composition (transport distance, net mass of the freight, gross mass of the train, number of locomotives, etc.) require the trains to be classified. The classification systematizes the number of studied options, allows us to take into account the respective characteristics of each train from the basis period, provides an opportunity to apply the obtained results in studies related to costing the expected operating costs for train movement during the forecast period, etc.

Step 3. Model selection

The Histogram-based Gradient Boosting Regressor (HistGradientBoostingRegressor, HGBR) from the scikit-learn library [10,11] was applied in this study to determine the number of trains during the forecast period. This model is well suited to structured tabular data and large-scale datasets. It provides an efficient histogram-based implementation of gradient boosting: the algorithm discretizes continuous features into integer bins, which significantly reduces memory usage and training time while maintaining competitive predictive performance [12]. It can also natively handle missing values, which is particularly beneficial when real-world historical transport databases are used.

Step 4. Validation of the results

The forecasting results are validated by comparing them with the available actual data for the trains that ran during the forecast period. The actual data are not used in the forecasting process; they are used only to compare and validate the forecasting results.

2.1. Database and Period Formulation

Forecasting was carried out using data from the basis period, a past period for which data on the parameters of freight trains are available. The forecast period follows immediately after the basis period and lasts one year. This duration allows seasonal fluctuations in freight transportation to be taken into account. In the present study, the actual number and parameters of the trains serviced in the forecast period are known, which makes it possible to compare the forecasting results with the actual data for the forecast period.

The basis period covers 3 consecutive years (2021–23). A total of 100,441 freight trains with a constant number of wagons in the train’s composition (this number does not change at intermediate railway stations) were studied: 39,118 in the first year, 34,160 in the second year, and 27,163 in the third year. The trains were transported over distances of up to 635 km and consisted of up to 44 wagons, with up to 2998 t of gross mass and up to 2213 t of net mass. The trains were serviced by 1, 2, or 3 locomotives. The number of wagons in the train compositions and the gross mass of the freight trains were determined by different organizational, technological, and technical parameters: train route, characteristics of the railway infrastructure on the route, amount of cargo, number and type of serving locomotives, etc. These parameters are not the object of research in this study, and their values were taken from the available data.

The following are adopted as the basic parameters for developing the forecasting model: the transport distance of the trains, the gross mass of the trains, and the number of locomotives serving the trains.

2.2. Train Classification

The classification of trains was carried out using the transport parameters that have a significant impact on the value of the operating costs. The purpose of the classification was to reduce the total number of studied options. This is important because of the large number of trains with different characteristics that are transported during the basis period. The systematization of the options is carried out using the division of the trains with similar transport parameters into categories. The results obtained in the forecasting are based on the classification of the trains by categories. This allows for the results to be used in the planning activity of railway undertakings and in calculating the operating costs related to forecast trains.

In this study, freight trains with a constant number of wagons are classified into categories using four main parameter groups (Table 1):

Group A—number of locomotives serving the train (N_L, number);
Group B—the net mass (Q_N, t) of the goods transported by one train. This group contains two subgroups—trains carrying only empty wagons and trains carrying loaded wagons;
Group C—the length of the train route (L, km);
Group D—gross mass of the train (Q_G, t) according to the wagon list.

The different groups take into account the main parameters affecting the value of the operating costs of the train. Each main group consists of two or more subgroups. The subgroups determine the intervals of change in the parameters in the main group.

A four-digit designation of the symbol type ABCD is used for the classification of the trains into categories (Table 1). The meanings of the symbols and digits are as follows:

The first digit (symbol “A” of the designation) refers to the number of locomotives serving the train (Group A parameters). Symbol “A” takes one of two values: 1 or 2. When a train is served by a single locomotive (N_L = 1), the digit 1 is used (A = 1). When a train is served by two or three locomotives (1 < N_L ≤ 3, integer), the digit 2 is used (A = 2). An analysis of the data for the trains transported during the basis period shows that 85% of the trains were served by one locomotive. Two locomotives served 14% of the trains, and about 1% of the trains were served by three locomotives. In the classification, trains served by two or three locomotives form a common group.
The second digit (symbol “B”) refers to the net mass of the goods carried by the train (Group B). Two subgroups are defined within this group, with values of 0 and 1, respectively. Value 0 is used (B = 0) when the train was running with empty wagons (Q_N = 0 t). Value 1 is used (B = 1) when the train was carrying a load with net mass Q_N > 0 t. The data for the basis period show that 42.5% of the trains had only empty wagons. A significant share of empty mileage of freight wagons is often observed in rail freight transportation. This could be related to the specialization of the wagons used in the transportation process or to imperfections in the organization of transportation.
The third digit (symbol “C”) refers to the length of the train route. The trains in the basis period are classified into four subgroups within this group (Group C). Value 1 (C = 1) is used for trains with a route length L < 20 km (43% of all trains studied during the basis period). Value 2 is used (C = 2) for trains with a route length 20 ≤ L < 70 km (26% of the trains). Value 3 is used (C = 3) for trains with a route length 70 ≤ L < 170 km (21% of the trains). Value 4 is used (C = 4) for trains with a route length L ≥ 170 km (11% of the trains). The intervals defining the route length were selected by expert judgment to ensure a sufficient number of trains in each subgroup.
The fourth digit (symbol “D”) refers to the gross mass of the train. The trains of the basis period are classified into three subgroups within Group D. Trains with a gross mass Q_G < 800 t (58% of all trains) are designated with digit 1 (D = 1). Digit 2 (D = 2) denotes trains with a gross mass 800 ≤ Q_G < 1600 t (36% of trains), and digit 3 (D = 3) stands for trains with a gross mass Q_G ≥ 1600 t (6% of all trains). The subgroup intervals are based on studies related to the costing analysis of freight trains and to determining the charges for using the railway infrastructure [4,13,14].

For the subgroup criteria described in Table 1, the maximum number of theoretically possible category combinations is 48 (2 × 2 × 4 × 3). The number of combinations possible in practice is limited by the technological and technical parameters of the transportation: the length of the trains, the number of wagons in a train, and the train’s mass. The distribution of the number of basis-period trains by category is shown in Figure 1; the trains fall into a total of 45 categories.
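The ABCD designation described above can be sketched as a small function. This is only an illustration of the stated subgroup intervals; the function name `train_category` and its argument names are assumptions and do not come from the study.

```python
def train_category(n_loc: int, net_mass_t: float, route_km: float, gross_mass_t: float) -> str:
    """Return the four-digit ABCD category code for one train (illustrative)."""
    if n_loc not in (1, 2, 3):
        raise ValueError("the text describes trains served by 1, 2, or 3 locomotives")
    a = 1 if n_loc == 1 else 2           # Group A: one locomotive vs. two or three
    b = 0 if net_mass_t == 0 else 1      # Group B: empty wagons only vs. loaded
    if route_km < 20:                    # Group C: route length, km
        c = 1
    elif route_km < 70:
        c = 2
    elif route_km < 170:
        c = 3
    else:
        c = 4
    if gross_mass_t < 800:               # Group D: gross mass, t
        d = 1
    elif gross_mass_t < 1600:
        d = 2
    else:
        d = 3
    return f"{a}{b}{c}{d}"


# Number of theoretically possible codes: subgroups in A, B, C, and D multiplied.
N_THEORETICAL = 2 * 2 * 4 * 3

print(train_category(1, 0, 15, 600))       # 1011
print(train_category(3, 1200, 250, 1900))  # 2143
```

Keeping the code as a string preserves the digit positions, so each group can be recovered later by indexing into the code.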

No trains fall into three of the possible categories (1033, 1043, and 2043). Categories 1013, 1023, 2013, 2023, and 2033 correspond to trains with a gross mass of over 1600 t carrying only empty wagons. Such trains would have to carry more than 44 empty wagons to reach the declared gross mass of over 1600 t. This does not correspond to the basis-period data, in which the maximum number of wagons per train is 44. Therefore, the trains in categories 1013, 1023, 2013, 2023, and 2033 can be considered to have been classified on the basis of incorrect basis-period data.

The basis-period data contain trains carrying only empty wagons with an unrealistically high gross train mass (a total of 34,172 trains, or 34% of all trains in the basis period). These trains were identified using the number of wagons in the relevant train composition and an assumed average tare of 22 t per wagon. Such inaccurate data do not match the technological and organizational standards for railway transport and distort the forecast results.
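The plausibility argument can be illustrated with a short calculation. The sketch below assumes that the gross mass of an empty train is approximately the number of wagons times the assumed average tare of 22 t; the function names and the `tolerance` parameter are hypothetical, and the exact rule used in the study is not stated in the text.

```python
import math

AVG_TARE_T = 22.0  # assumed average tare per wagon, in tonnes (value from the text)


def max_plausible_empty_gross_mass(n_wagons: int, tare_t: float = AVG_TARE_T) -> float:
    """Approximate gross mass of a train made up only of empty wagons."""
    return n_wagons * tare_t


def min_wagons_for_mass(gross_mass_t: float, tare_t: float = AVG_TARE_T) -> int:
    """Smallest number of empty wagons needed to reach a given gross mass."""
    return math.ceil(gross_mass_t / tare_t)


def is_implausible_empty_train(n_wagons: int, net_mass_t: float, gross_mass_t: float,
                               tolerance: float = 1.0) -> bool:
    """Flag an empty train whose declared gross mass exceeds the tare-based estimate.

    `tolerance` is a hypothetical multiplier allowing for heavier wagon types.
    """
    if net_mass_t > 0:
        return False  # loaded trains are outside the scope of this check
    return gross_mass_t > tolerance * max_plausible_empty_gross_mass(n_wagons)


print(max_plausible_empty_gross_mass(44))  # 968.0 t for the longest train in the data
print(min_wagons_for_mass(1600))           # 73 wagons needed to reach 1600 t
```

Under this assumption, an empty train of 44 wagons weighs about 968 t, well below the 1600 t threshold of subgroup D = 3.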

2.3. Forecasting Model

2.3.1. Problem Formulation

The forecasting task in this study is formulated as a supervised regression problem, where the target variable is the weekly count of freight trains for each predefined train category. Each data point represents one week for one train category. The objective is to estimate the expected number of trains per category in future periods based on historical operational data.

Although the data exhibit temporal characteristics, the prediction task is not approached with classical time series models such as the Autoregressive Integrated Moving Average (ARIMA) model [15] or exponential smoothing. Instead, the problem is reframed as a regression setting, which allows the use of tabular machine learning algorithms that leverage multiple features. This transformation is suitable when:

The goal is one-step-ahead forecasting on a fixed time window (e.g., weekly resolution);
Categorical groupings are present (e.g., train type, route length, mass of train, etc.);
Temporal patterns (such as holidays or seasonality) can be represented explicitly as engineered features [16].

To capture temporal dynamics, a set of calendar-based features could be engineered, including the week of the year, the year, the day of the week, the month, and binary indicators for whether the week includes a national holiday or falls in a high-traffic season.

Additionally, the category codes derived from the structured train classification scheme (Section 2.2) have been used as key predictive features.
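A minimal sketch of how such a feature row could be built for one category–week pair is given below. The holiday set, the months treated as high season, and the function name `week_features` are illustrative assumptions; the study does not list the exact features or calendars it used.

```python
from datetime import date, timedelta

# Illustrative inputs (assumptions): a few Bulgarian public holidays and an
# arbitrarily chosen "high-traffic" season.
HOLIDAYS = {date(2021, 3, 3), date(2021, 5, 24), date(2021, 12, 25)}
HIGH_SEASON_MONTHS = {9, 10, 11}


def week_features(week_start: date, category: str) -> dict:
    """Build calendar features for the week starting on `week_start` (a Monday)."""
    days = [week_start + timedelta(days=i) for i in range(7)]
    iso_year, iso_week, _ = week_start.isocalendar()
    return {
        "category": category,                 # ABCD code from the classification
        "year": iso_year,
        "week_of_year": iso_week,
        "month": week_start.month,
        "has_holiday": int(any(d in HOLIDAYS for d in days)),
        "high_season": int(week_start.month in HIGH_SEASON_MONTHS),
    }


row = week_features(date(2021, 3, 1), "1011")
print(row)
```

Each row then pairs these features with the weekly train count of the category as the regression target.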

2.3.2. Model Selection

For the regression model, the Histogram-based Gradient Boosting Regressor was selected. This algorithm offers a computationally efficient implementation of gradient boosting based on histogram binning, which accelerates training and reduces memory usage. It is especially suitable for large tabular datasets and natively handles missing values in the input features [11].

Compared to traditional gradient boosting implementations, HGBR scales well to tens of thousands of training examples. Its design is inspired by LightGBM [12], which has shown superior performance in structured data challenges and time series forecasting competitions.

Although deep learning methods such as long short-term memory (LSTM) networks or Temporal Convolutional Networks are popular for sequence modeling, recent studies have shown that gradient boosting algorithms can match or surpass them on real-world forecasting tasks, especially when the data volume is limited or when explainability is essential [17,18]. HGBR offers a reliable and interpretable alternative with efficient training and has also demonstrated strong performance in real-world applications involving transport, logistics, and energy forecasting, where input variables are mixed (categorical and numerical) and the temporal resolution is weekly or daily.

2.3.3. Feature Engineering and Handling Missing Combinations

During preprocessing, we observed that some category–week pairs had no train activity. Although it is common in time series forecasting to include such zero-traffic periods to preserve temporal continuity, in our case this significantly degraded the model’s performance. Including these records led to higher prediction errors and a substantial decrease in the R² score. This is likely due to the imbalance between active and inactive weeks and to the difficulty of predicting sparse, zero-valued outputs in real transport operations.

Consequently, we chose to exclude category–week pairs with zero recorded trains from the training dataset. This decision allowed the model to focus on learning the dynamics of actual transport activity, where meaningful patterns exist. Future work could explore modeling zero-activity periods using a separate classification model or hybrid time series approaches [19].
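The difference between the two training tables can be sketched as follows. The records and the helper name `weekly_counts` are hypothetical; the point is only that counting observed trains yields a table without zero rows, whereas a dense category-by-week grid introduces them.

```python
from collections import Counter
from datetime import date

# Hypothetical train records: (departure date, ABCD category code).
records = [
    (date(2021, 1, 4), "1011"),
    (date(2021, 1, 5), "1011"),
    (date(2021, 1, 6), "1122"),
    (date(2021, 1, 12), "1011"),
]


def weekly_counts(records):
    """Count trains per (category, ISO year, ISO week)."""
    counts = Counter()
    for day, category in records:
        iso_year, iso_week, _ = day.isocalendar()
        counts[(category, iso_year, iso_week)] += 1
    return counts


counts = weekly_counts(records)

# Sparse table: only category-week pairs with at least one train.
train_rows = sorted(counts.items())

# Dense alternative: every category in every observed week, with zeros filled in.
all_categories = sorted({c for c, _, _ in counts})
all_weeks = sorted({(y, w) for _, y, w in counts})
dense_rows = [((c, y, w), counts.get((c, y, w), 0))
              for c in all_categories for (y, w) in all_weeks]

print(len(train_rows), len(dense_rows))  # 3 4
```

Here category 1122 has no trains in the second week, so it appears only in the dense table, with a count of zero.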

2.4. Model Validation

To comprehensively evaluate the model’s predictive capabilities, several regression metrics were employed, each providing unique insights into the model’s performance.

Mean Absolute Error (MAE)

MAE measures the average magnitude of the errors in a set of predictions, without considering their direction. MAE offers a straightforward interpretation of the average error magnitude and is less sensitive to outliers than squared-error metrics [20]. It is calculated as

\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_{i} - \hat{y}_{i} \right|    (1)

where y_{i} is the actual count of trains, \hat{y}_{i} is the predicted count, and n is the number of observations.

Mean Squared Error (MSE)

MSE calculates the average of the squares of the errors, emphasizing larger errors due to squaring. MSE penalizes large errors more heavily, making it suitable when such errors are particularly undesirable [21]. The value of MSE is calculated as

\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_{i} - \hat{y}_{i} \right)^{2}    (2)

Root Mean Squared Error (RMSE)

RMSE is the square root of MSE, converting the error metric back to the original units of the target variable, which aids interpretability. RMSE is sensitive to outliers and is widely used as a standard measure of predictive accuracy [21]. The value of RMSE is calculated as

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_{i} - \hat{y}_{i} \right)^{2}}    (3)

Coefficient of Determination (R²)

The Coefficient of Determination (R²) represents the proportion of the variance in the dependent variable that is predictable from the independent variables. Values of R² closer to 1 indicate a better model fit [22]. The Coefficient of Determination is defined as

R^{2} = 1 - \frac{\sum_{i=1}^{n} \left( y_{i} - \hat{y}_{i} \right)^{2}}{\sum_{i=1}^{n} \left( y_{i} - \bar{y} \right)^{2}}    (4)

where \bar{y} is the mean of the observed values.
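Equations (1)–(4) can be computed directly from paired lists of actual and predicted counts. The following sketch uses made-up numbers purely to show the arithmetic; the function name `regression_metrics` is an assumption.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE, and R^2 as defined in Equations (1)-(4)."""
    n = len(y_true)
    errors = [yt - yp for yt, yp in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)               # sum of squared errors
    mse = ss_res / n
    rmse = math.sqrt(mse)
    mean_y = sum(y_true) / n
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Made-up weekly counts for illustration only.
m = regression_metrics([10, 20, 30, 40], [12, 18, 33, 37])
print(m)
```

For these numbers the errors are ±2 and ±3, so MAE = 2.5, MSE = 6.5, and R² = 1 − 26/500 = 0.948.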

3. Results and Discussion

The purpose of the study is to present an approach for forecasting the number of trains in a future period from historical data using time series regression analysis. The forecast data are structured so that the results can be used to determine the expected operating costs of freight trains during the forecast period. The study was conducted with real data provided by a railway enterprise. These data cover a 3-year period (2021–23) that includes the consequences of the coronavirus disease 2019 (COVID-19) pandemic for world trade. The significant decrease in the number of trains observed during the basis period may be due to this or to factors internal to the enterprise. A more in-depth study could apply the proposed model to data for a longer basis period. The model can also be expanded with other transportation parameters: train route, type of cargo, type of locomotives, etc. This would significantly increase the number of train categories. In addition, part of such data is commercially sensitive information for railway undertakings. For this reason, only the main parameters described above (number of locomotives, length of the train route, net mass of the goods, and gross mass of the train) were used in this study. These are the basic parameters that affect the operating costs of transportation. The study does not aim to analyze or optimize the transportation process and its efficiency, including the transportation of empty wagons, or its interaction with other modes of transport. Its aim is to investigate the possibilities of applying time series regression analysis to predicting the number of freight trains.

The number of freight trains by category in the forecast period was forecast using the Histogram-based Gradient Boosting Regressor. The forecast is based on the basis-period data for 100,441 trains with a constant number of wagons in each train composition. The summarized results for the predicted (a total of 27,523) and the actual (a total of 24,309) number of trains by category in the forecast period are shown in Figure 2.

A comparison of the percentage of trains by subgroups of parameters for the forecasted and actual number of trains during the forecast period is shown in Figure 3a. The results show a difference of over 5% in the two subcategories (A = 1 and A = 2) of the parameters from Group A relating to the number of locomotives. The average weekly distribution for the number of predicted and actual trains during the forecast period is presented in Figure 3b.

The predictive model was trained on a dataset encompassing three years of historical data and evaluated on a separate one-year test set. The model’s performance was assessed using standard regression evaluation metrics: Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and the Coefficient of Determination (R²). The obtained performance metrics are as follows: MAE = 4.25; MSE = 45.32; RMSE = 6.73; and R² = 0.805. The average weekly values of Mean Absolute Error (MAE) by train category are shown in Figure 4.

In this study, R² = 0.805 means that the model explains approximately 80.5% of the variance in weekly train counts, which indicates that it captures the underlying data patterns with a high level of accuracy.

4. Conclusions

The forecasting model demonstrated strong performance in predicting the weekly number of trains across various categories. Evaluated on a separate test set of real operational data for one calendar year, the model achieved a Mean Absolute Error (MAE) of 4.25 and a Root Mean Squared Error (RMSE) of 6.73. These values indicate that the model’s weekly predictions deviate, on average, by only a few trains from the observed counts. The relatively low RMSE suggests that large prediction errors are relatively rare.

Most notably, the Coefficient of Determination R² reached 0.805, signifying that over 80% of the variance in weekly train counts is explained by the model. This level of explained variance suggests that the model effectively captures both time trends and category-specific fluctuations in traffic.

The results show that the model can be considered reliable for short-to-medium-term forecasting in operational settings. It has practical value for railway undertakings, enabling scheduling based on resource allocation and the early identification of demand shifts. While the current approach provides a good prediction, further improvements could involve integrating external variables (such as rail infrastructure capabilities or disruptions, seasonality of transportation, economic indicators, etc.) or refining the input data to capture additional variability and further enhance forecast precision.

In conclusion, the model presents a solid foundation for predictive decision-making in railway transportation, with demonstrated effectiveness on real-world data. The forecast numbers of trains obtained in this study could be used to calculate the expected operating costs of a railway undertaking and in the pricing and tariffing of transportation activities.

Author Contributions

Conceptualization, S.M. and I.I.; methodology, S.M. and I.I.; software, I.I. and K.P.; validation, S.M. and I.I.; formal analysis, S.M.; investigation, S.M. and I.I.; resources, S.M. and I.I.; data curation, S.M. and I.I.; writing—original draft preparation, S.M., I.I. and K.P.; writing—review and editing, S.M.; visualization, I.I. and K.P.; supervision, S.M.; project administration, S.M.; funding acquisition, S.M. All authors have read and agreed to the published version of the manuscript.

Funding

The proceeding paper was funded by the Research and Development Sector at the Technical University of Sofia.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The study uses real data that are not publicly available. Restrictions apply to the availability of these data. The data were obtained from a third party and are available from the authors with the permission of the third party.

Acknowledgments

The authors would like to thank the Research and Development Sector at the Technical University of Sofia for the financial support.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Dolinayová, A.; Černá, L.; Hřebíček, Z.; Zitrický, V. Methodology for the Tariff Formation in Railway Freight Transport. Naše More 2018, 65, 297–304.
Jarocka, M.; Ryciuk, U. Pricing in the railway transport. In Proceedings of the 9th International Scientific Conference “Business and Management 2016”, Vilnius, Lithuania, 12–13 May 2016.
Martinov, S.; Yanakiev, T.; Yanev, M. Guidelines for developing a pricing model in rail cargo transport. Mech. Transp. Commun. 2024, 22, 2571. (In Bulgarian)
Martinov, S. Study of the Operating Costs of Freight Trains; Monograph; Technical University of Sofia: Sofia, Bulgaria, 2024; p. 190. ISBN 978-619-167-554-8. (In Bulgarian)
Network Statement 2024-2025 of the State Enterprise; National Railway Infrastructure Company: Sofia, Bulgaria, 2025. Available online: https://www.rail-infra.bg/en/377 (accessed on 25 June 2025).
Wang, W. A Review of Rail Freight Forecasting Studies. Int. Core J. Eng. 2022, 8, 609–616.
Liu, C.; Zhang, J.; Luo, X.; Yang, Y.; Hu, C. Railway Freight Demand Forecasting Based on Multiple Factors: Grey Relational Analysis and Deep Autoencoder Neural Networks. Sustainability 2023, 15, 9652.
Feng, C.; Lei, Y. Research on interval prediction method of railway freight based on big data and TCN-BiLSTM-QR. IET Intell. Transp. Syst. 2024, 18, 2713–2724.
Mu, X.; Cheng, X.; Zhu, Y.; Tang, Y. Prediction on the number of railway freight trains based on binary linear model. Zhongguo Tiedao Kexue/China Railw. Sci. 2013, 34, 113–117.
Khiari, J.; Olaverri-Monreal, C. Boosting Algorithms for Delivery Time Prediction in Transportation Logistics. In Proceedings of the 2020 International Conference on Data Mining Workshops (ICDMW), Sorrento, Italy, 17–20 November 2020; p. 8.
Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Proceedings of the 31st Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Available online: https://papers.nips.cc/paper_files/paper/2017/file/6449f44a102fde848669bdd9eb6b76fa-Paper.pdf (accessed on 25 June 2025).
Thompson, L. Charges for the Use of Rail Infrastructure 2008; International Transport Forum (ITF): Paris, France, 2008.
Report from the Commission to the European Parliament and the Council. Eighth Monitoring Report on the Development of the Rail Market Under Article 15(4) of Directive 2012/34/EU of the European Parliament and of the Council. COM(2023) 510 Final, Brussels. Available online: https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:52023DC0510 (accessed on 25 June 2025).
Kontopoulou, V.I.; Panagopoulos, A.D.; Kakkos, I.; Matsopoulos, G.K. A Review of ARIMA vs. Machine Learning Approaches for Time Series Forecasting in Data Driven Networks. Future Internet 2023, 15, 255.
Bandara, K.; Bergmeir, C.; Smyl, S. Forecasting across time series databases using recurrent neural networks on groups of similar series: A clustering approach. Expert Syst. Appl. 2020, 140, 112896.
Hewamalage, H.; Bergmeir, C.; Bandara, K. Recurrent neural networks for time series forecasting: Current status and future directions. Int. J. Forecast. 2021, 37, 388–427.
Alizadegan, H.; Malki, B.R.; Radmehr, A.; Karimi, H.; Ilani, M.A. Comparative study of long short-term memory (LSTM), bidirectional LSTM, and traditional machine learning approaches for energy consumption prediction. Energy Explor. Exploit. 2025, 43, 281–301.
Shih, S.-Y.; Sun, F.-K.; Lee, H.-Y. Temporal pattern attention for multivariate time series forecasting. Mach. Learn. 2019, 108, 1421–1441.
Willmott, C.J.; Matsuura, K. Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Clim. Res. 2005, 30, 79–82.
Chai, T.; Draxler, R.R. Root mean square error (RMSE) or mean absolute error (MAE)?—Arguments against avoiding RMSE in the literature. Geosci. Model Dev. 2014, 7, 1247–1250.
Chicco, D.; Warrens, M.J.; Jurman, G. The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation. PeerJ Comput. Sci. 2021, 7, e623.

Figure 1. Number of trains by category during the base period.

Figure 2. Predicted and actual number of trains by category.

Figure 3. Train distribution: (a) percentage comparison of predicted and actual trains by subgroup; (b) average weekly distribution of the number of predicted and actual trains.

Figure 4. Values of Mean Absolute Error per week by train category.

Table 1. Groups and subgroups of parameters.

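Main Group Parameters	Subgroup Criteria	Train Category Designation
Group A. Number of locomotives (N_L, number)	N_L = 1	1BCD
Group A. Number of locomotives (N_L, number)	1 < N_L ≤ 3	2BCD
Group B. Net mass of the goods transported by one train (Q_N, t)	Q_N = 0 t	A0CD
Group B. Net mass of the goods transported by one train (Q_N, t)	Q_N > 0 t	A1CD
Group C. Length of the train route (L, km)	L < 20 km	AB1D
Group C. Length of the train route (L, km)	20 ≤ L < 70 km	AB2D
Group C. Length of the train route (L, km)	70 ≤ L < 170 km	AB3D
Group C. Length of the train route (L, km)	L ≥ 170 km	AB4D
Group D. Gross mass of the train (Q_G, t)	Q_G < 800 t	ABC1
Group D. Gross mass of the train (Q_G, t)	800 ≤ Q_G < 1600 t	ABC2
Group D. Gross mass of the train (Q_G, t)	Q_G ≥ 1600 t	ABC3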
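
The thresholds in Table 1 can be read as a lookup from four train parameters to a category code. The following Python sketch is illustrative only and is not the authors' implementation. The function name `classify_train`, its argument names, and the rejection of values outside the listed ranges are assumptions. It also assumes that a full category is written as four characters, one per group in the order A, B, C, D, as the placeholder letters in the designation column suggest.

```python
from bisect import bisect_right

# Lower bounds of the second and later subgroups, taken from Table 1.
ROUTE_KM_BOUNDS = [20, 70, 170]      # Group C: L, km
GROSS_MASS_T_BOUNDS = [800, 1600]    # Group D: Q_G, t


def classify_train(n_locomotives: int, net_mass_t: float,
                   route_km: float, gross_mass_t: float) -> str:
    """Return a four-character category code (hypothetical helper).

    Position 1 is Group A, position 2 Group B, position 3 Group C and
    position 4 Group D, following the designations in Table 1.
    """
    # Group A: N_L = 1 -> "1"; 1 < N_L <= 3 -> "2".
    if n_locomotives == 1:
        a = "1"
    elif 1 < n_locomotives <= 3:
        a = "2"
    else:
        raise ValueError("Table 1 only covers trains with 1 to 3 locomotives")

    # Negative masses or lengths are treated as invalid (an assumption).
    if net_mass_t < 0 or route_km < 0 or gross_mass_t < 0:
        raise ValueError("masses and route length must be non-negative")

    # Group B: Q_N = 0 t -> "0"; Q_N > 0 t -> "1".
    b = "0" if net_mass_t == 0 else "1"

    # Groups C and D: bisect_right counts how many lower bounds are <= value,
    # so a value equal to a bound falls into the higher subgroup.
    c = str(bisect_right(ROUTE_KM_BOUNDS, route_km) + 1)
    d = str(bisect_right(GROSS_MASS_T_BOUNDS, gross_mass_t) + 1)

    return a + b + c + d


if __name__ == "__main__":
    # One locomotive, no goods, 15 km route, 500 t gross mass.
    print(classify_train(1, 0.0, 15.0, 500.0))  # prints 1011
```

Under these assumptions the two subgroups of Group A, two of Group B, four of Group C and three of Group D give at most 2 × 2 × 4 × 3 = 48 possible codes.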
