Classification Machine Learning Models for Enhancing the Sustainability of Postal System Modules Within the Smart Transportation Concept

Banjanin, Milorad K.; Stojčić, Mirko; Popović, Đorđe; Anđelković, Dejan; Jauševac, Goran; Husić, Maid

doi:10.3390/su17198718

Open Access Article


by Milorad K. Banjanin ¹,²,*; Mirko Stojčić ³,*; Đorđe Popović ⁴; Dejan Anđelković ⁵; Goran Jauševac ³; and Maid Husić ⁶

¹ Department of Computer Science and Systems, Faculty of Philosophy Pale, University of East Sarajevo, Alekse Šantića 1, 71420 East Sarajevo, Bosnia and Herzegovina

² Faculty of Technical Sciences, University of Novi Sad, Trg Dositeja Obradovića 6, 21000 Novi Sad, Serbia

³ Department of Information and Communication Systems in Traffic, Faculty of Transport and Traffic Engineering Doboj, University of East Sarajevo, Vojvode Mišića 52, 74000 Doboj, Bosnia and Herzegovina

⁴ Government of the Republic of Srpska, Ministry of Energy and Mining, 78000 Banja Luka, Bosnia and Herzegovina

⁵ Faculty of Applied Sciences in Niš, University Business Academy in Novi Sad, Višegradska 47, 18000 Nis, Serbia

⁶ City of Zavidovici, City Administration, Mehmed-paše Sokolovića 9, 72220 Zavidovići, Bosnia and Herzegovina

* Authors to whom correspondence should be addressed.

Sustainability 2025, 17(19), 8718; https://doi.org/10.3390/su17198718

Submission received: 10 September 2025 / Revised: 24 September 2025 / Accepted: 26 September 2025 / Published: 28 September 2025

(This article belongs to the Special Issue Sustainable Traffic Flow Management and Smart Transportation)


Abstract

Postal traffic and transport face challenges related to the rapid growth of parcel volumes, increasing demands for sustainability, and the need for integration into the smart transportation concept. This study explores the application of machine learning (ML) models to the classification of postal delivery times, with the aim of improving service efficiency and quality. As a case study, the Postal Center Zenica, one of the seven organizational units of the Public Enterprise “BH Pošta” in Bosnia and Herzegovina, was analyzed. The available dataset comprised 11,138 instances, which were cleaned and filtered, then expanded through two iterations of data augmentation using an autoencoder neural network. Five ML models, namely Random Forest, Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), k-Nearest Neighbors (kNN), and Multi-Layer Perceptron (MLP), were developed and compared; their hyperparameters were optimized using the Bayesian method, and they were evaluated through standard classification metrics. The results indicate that data augmentation significantly improves model performance, particularly in the classification of delayed shipments, with ensemble methods, especially Random Forest and XGBoost, emerging as the most robust solutions. Beyond its contributions to postal traffic and transport, the proposed methodological framework has interdisciplinary relevance, as it can also be applied to telecommunication traffic classes, where similar network dynamics require reliable predictive models.

Keywords:

machine learning; postal traffic; smart transportation; data augmentation; delivery time classification

1. Introduction

The postal system is a complex multimodular transport network involving both public and private operators, inseparably linked to the complex roles and urban needs of businesses and citizens. The development of e-commerce has made it one of the key drivers of economic growth and technological progress at the local and regional levels, while global trends indicate continuous growth in the parcel transport module and a slowdown in the decline of the letter mail module. Today, sustainable development is oriented not only toward continuously updated economic objectives and business outcomes, but also toward overcoming the limits of operators’ capacity to adapt to new cultural and societal needs through innovative and digital approaches. Within the smart transportation concept (STC), postal systems are becoming part of a broader ecosystem of smart mobility, where the integration of new technologies, data exchange, and the application of machine learning (ML) enable resource optimization, more accurate predictions, and more efficient and environmentally friendly services [1].

The STC operates through a cyclical loop of research, planning, execution, control, monitoring, and reflection, based on the principles of sustainability, equity, and resilience to various categories of constraints in the business processes of multimodal transport systems. An important aspect of STC is urban mobility, which develops in an environment with realistic sociodemographic attributes, allowing heterogeneous and context-sensitive behavior of service users and postal operators in multimodal and unstable urban scenarios. The smart transportation framework encompasses the multidimensional roles of digital technologies and data-driven approaches in environments often characterized by unstable states and dynamic actions, where ML models stand out as key tools (methods, techniques, and algorithms). According to [2], smart transportation is a concept based on the application of advanced sensors, information and communication technologies, and management strategies with the aim of improving the efficiency and safety of transport systems. Its core principles (sustainability, integration, safety, and adaptability) are directed toward mobility, environmental responsibility, and economic development.

ML plays a central role in this context, as it enables the analysis of large datasets to identify patterns, predict demand for product and service portfolios, estimate delivery times of shipments, and optimize resources in line with the needs of business and urban users. Applications include the analysis of shipment flows, assessment of operator competitiveness, and planning of new logistics capacities. Nevertheless, proven methods such as Random Forest and boosting algorithms are still not widely applied in the postal sector, which opens space for research aimed at developing solutions that contribute to the sustainability of the postal transport system and its integration into the smart transportation environment [3], making the system capable of reflective adaptation to environmental disruptions and to changes in the personal preferences of postal service users and transport operators.

Previous research by the authors addressed the prediction of the number of express mail parcels using statistical models (SARIMA, Prophet) and machine learning models (Support Vector Regression (SVR) and artificial neural networks (ANN)), enabling the identification of seasonal and daily patterns and more efficient resource planning [4]. In addition, models for classifying parcel arrival rates (Random Forest, XGBoost, MLP, SVM) were developed, with the SVM optimized by the Bayesian method achieving the highest accuracy, confirming the importance of ML approaches for enhancing the sustainability of the postal system [5].

Although ML and data-driven approaches have been widely applied in logistics and multimodal transport, the smart transportation framework in the postal system remains insufficiently explored. This study aims to fill that gap by developing and comparing multiple ML models for the classification of delivery time, a particularly sensitive categorical and temporal process variable. The case study focuses on Postal Center Zenica, one of the seven organizational units of the Public Enterprise “BH Pošta” in Bosnia and Herzegovina, which allows the proposed approach to be validated on real data. In addition, the research examines the potential transfer of the developed solutions to the field of telecommunication traffic, where similar network dynamics create opportunities for interdisciplinary application. Although many modules of the postal system could be analyzed (the parcel module, the letter mail module, the logistics and distribution module, the universal postal service module, the financial-postal module, the philatelic module, the electronic and hybrid services module, the courier-express module, the international shipment module, and even innovative drone-based solutions), this paper focuses on express mail. In Bosnia and Herzegovina, this module is characterized by strong competition and is the most sensitive area with regard to the accuracy of delivery time prediction, which justifies its selection as the central focus of the research.

The main contributions of this study can be summarized as follows:

Comparative analysis of five ML models, namely Random Forest (RF), Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), k-Nearest Neighbors (kNN), and Multi-Layer Perceptron (MLP), through a systematic evaluation using several performance metrics.
Development of an original methodology that combines the data augmentation method using an autoencoder with hyperparameter optimization through the Bayesian method, thereby improving classification accuracy and overcoming the limitations of a relatively small dataset.
Application to real-world data from the postal transport system in a geographic area where no similar studies have been conducted, which gives the research the character of a unique and significant case study.
Practical applicability of the proposed solutions, reflected in the relative ease of integration into a real postal transport system, with direct potential to improve service quality and reliability.
Interdisciplinary significance, as the methodological framework developed for the postal transport system can also be applied in other domains of smart transportation, including telecommunications and urban logistics.

The paper is structured into five sections. Following the introduction, the second section provides a review of relevant literature on the development of the postal transport system within the smart transportation concept, with particular emphasis on the application of machine learning. The third section describes in detail the research data, their preparation, and the applied modeling methodology. The fourth section presents the experimental results and their interpretation in the context of postal transport system sustainability. Finally, the fifth section offers conclusions and outlines directions for future research.

2. Review of Relevant Published Research

In a study conducted in Turkey [6], the ML algorithms Random Forest, Gradient Boosting (GB), kNN, and Neural Networks (NN) were applied to predict the delivery time of postal shipments delivered by e-scooter. GB achieved the best results (R² = 0.845) with the lowest error values, confirming its reliability in estimating the duration of certain mail delivery activities. A study based on Canada Post data for the Toronto area [7] examined the application of deep learning to delivery time prediction in the last-mile segment. The authors developed an end-to-end neural model within an Internet of Things (IoT) and cloud-based smart city architecture, using Origin–Destination (OD) data and weather conditions as inputs. The results showed that convolutional neural networks significantly outperformed classical ML models and OD baseline approaches, enabling more accurate delivery time predictions and enhancing last-mile logistics. For Deutsche Post in Germany, a courier-oriented routing algorithm and an ML model for predicting delivery time windows were developed [8]. By combining operations research, statistics, and ML, the model leverages the implicit knowledge of experienced couriers captured in historical delivery sequence data.

Medić et al. [3], in their systematic literature review on the application of ML in urban logistics systems, emphasized studies focused on delivery time estimation in the last-mile segment. They noted that models such as random forest, convolutional, and residual neural networks were applied to large postal datasets (e.g., Canada Post) to accurately predict delivery times from depots to end-users. In [9], the authors compared linear regression models and ensemble approaches (random forest, bagging, boosting) and demonstrated that algorithms such as LightGBM and CatBoost achieved superior results compared to classical methods. Dobrodolac et al. [10] analyzed potential applications of artificial intelligence in postal systems, highlighting that AI models enable route optimization and more precise delivery time estimation, including adjustments of delivery schedules according to customer expectations. Karakaya [11] compared various ML models and ensemble learning methods using a large Amazon delivery dataset for delivery time classification. Results showed that ensemble methods, particularly a combination of SVM, Naive Bayes (NB), and Linear Discriminant Analysis (LDA) with Extra Trees (ET) as a meta-model, achieved extremely high accuracy (99.89%).

In [12], the authors focused on service time prediction (STP) in last-mile delivery, a key component of total delivery time. They proposed the Meta-learning Service Time Prediction (MetaSTP) model, a neural network based on meta-learning and the Transformer architecture, which uses location-based features to better represent complex delivery conditions and to address class imbalance. Experiments on two real datasets showed that MetaSTP outperformed the baseline models by at least 9.5% and 7.6% on the two datasets, respectively. Küp et al. [13] developed a framework for delay prediction using real-world data from an e-commerce logistics company. The delivery process was modeled in two variants, with 11 and 15 operational steps, where each step represented a phase of shipment handling (e.g., acceptance, transfer, transport, terminal operations). Separate models were trained for each phase to estimate delay probabilities in real time. Logistic Regression (LR), Random Forest, XGBoost, and Categorical Boosting (CatBoost) were compared, with the boosting methods achieving the highest Area Under the Curve (AUC) values (up to 99.9%).

In [14], the authors addressed the prediction of on-time delivery rates at the courier level, which is essential for workforce productivity and customer satisfaction in the express industry. They proposed a deep spatio-temporal neural network, the Regional Courier Correlation Network (RCCNet), combining Node2vec for road network graph encoding, Graph Convolutional Networks (GCN) for modeling courier interdependencies, and Long Short-Term Memory (LSTM) networks for leveraging historical sequences. In [15], factors affecting food delivery time were analyzed, and different ML models were tested on a Kaggle dataset. Linear Regression, Decision Trees, Random Forest, and Extreme Gradient Boosting Regressor (XGBRegressor) were compared, with XGBRegressor achieving the best results. In [16], the focus was on Service Time Prediction (STP), which, combined with travel time, determines the number of deliveries that can be planned in a day. Traditional approaches rely on planners’ manual estimates, which are often slow and inaccurate. Instead, a data-driven method based on GPS data was proposed for automatically determining historical service times and applying them in a kNN regression model.

In [17], the authors introduced Transformer-based multi-task Package Delivery Time prediction (TransPDT) for accurately estimating delivery times in complex logistics scenarios where couriers simultaneously perform deliveries and pickups. The model employs a Transformer encoder to capture spatio-temporal dependencies from couriers’ historical routes and shipment sets, while a memory module with attention improves the modeling of pickups, which have stricter time constraints. In [18], the possibility of predicting Service Level Agreement (SLA) travel times for goods and document shipments of PT Pos Indonesia from Java to the islands of Kalimantan, Sulawesi, Maluku, and Papua was analyzed. Random Forest was used for modeling, achieving an average accuracy of 83.86% across four experiments.

Table 1 provides a comparative overview of selected studies relevant to the application of machine learning in postal and logistics systems within the smart transportation framework. The reviewed papers demonstrate a wide range of approaches, from classical ML models such as Random Forest and Gradient Boosting to advanced deep learning architectures, including convolutional and Transformer-based neural networks. Although the contexts analyzed ranged from e-scooter deliveries and last-mile parcel logistics to large-scale e-commerce platforms and national postal operators, the common objective was to improve delivery time estimation, route optimization, and service quality. These contributions highlight the growing importance of integrating ML and data-driven methods into smart transportation systems, with particular relevance for enhancing the efficiency and sustainability of postal transport.

3. Materials and Research Methods

The research methodology was based on a multi-stage process of data processing and modeling, as illustrated in Figure 1. The initial dataset of 11,138 instances first underwent a feature engineering process, in which new variables were derived, followed by a data cleaning phase that reduced the number of instances to 9691. Input variables were defined as numerical, while the output variable was categorical. In this stage, an initial evaluation of several ML models with default hyperparameters was conducted in order to obtain a preliminary insight into performance and directions for further modeling. After additional filtering of the data, resulting in a subset of 4166 instances, the models were tested again under the same conditions to compare results on a smaller but higher-quality sample. Subsequently, two iterations of augmentation were carried out: in Iteration I the dataset was expanded to 8332 instances, while in Iteration II the number of instances was increased to 16,664. In both cases, training, validation, and testing of five ML models with hyperparameter optimization were performed. Finally, an analysis and evaluation of model performance were conducted for both iterations.
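The instance counts reported for the successive stages can be reconciled with simple arithmetic, using the 1447 rows removed during cleaning (Section 3.3) and the doubling performed by each augmentation iteration:

$$11{,}138 - 1447 = 9691, \qquad 2 \times 4166 = 8332, \qquad 2 \times 8332 = 16{,}664 = 4 \times 4166$$

The final augmented dataset is therefore four times the size of the filtered subset.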

3.1. Research Variables and Data

The postal transport data used in this study were provided by the Public Enterprise “BH Pošta,” Postal Center Zenica, upon formal request. A portion of these structured research data is presented in Table 2, whose columns show the spatial and temporal variables (1) Origin Post Office, (2) Date of Acceptance, (3) Destination Post Office, and (4) Date and Time of Delivery Recording. The complete dataset comprises 11,138 instances.

The number of unique origin post codes (Table 2) within the area organizationally covered by Postal Center Zenica, where shipments were accepted, is 39. The number of destination post offices across Bosnia and Herzegovina in the observed dataset is considerably higher, amounting to 365. The data were collected from 2 March 2022 to 31 March 2022.

3.2. Derivation of New Variables

Analysis of the data presented in Table 2 shows that they are not fully suitable for processing with ML methods, which makes it necessary to derive new, more informative variables using feature engineering techniques. New variables are derived from the original temporal and spatial variables to identify patterns in the dynamics of postal shipment acceptance, improve the predictive power of the models, and make the results easier to interpret in practice. Based on the postal addresses of the origin and destination post offices, geocoding was performed using the OpenStreetMap service to obtain the coordinates of these locations as latitude/longitude pairs. The great-circle (aerial) distance between the two points was then calculated using the Haversine formula [19]:

$d = 2r \cdot \arcsin\left(\sqrt{\sin^{2}\left(\frac{\Delta\varphi}{2}\right) + \cos(\varphi_{1})\cos(\varphi_{2})\sin^{2}\left(\frac{\Delta\lambda}{2}\right)}\right)$

(1)

where φ₁ and φ₂ are the geographic latitudes (in radians); λ₁ and λ₂ are the geographic longitudes (in radians); Δφ = φ₂ − φ₁ and Δλ = λ₂ − λ₁; r is the radius of the Earth (≈6371 km); and d is the surface distance on the Earth (in km). Furthermore, based on the coordinates of the destination post office, reverse geocoding was applied to determine the geographical affiliation of this location to a broader region—Republika Srpska (RS), Federation of BiH (FBiH), or the Brčko District in Bosnia and Herzegovina.
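For readers who want to reproduce the distance feature, Equation (1) translates directly into code. The Python function below is a minimal sketch, not the authors' implementation (the paper lists the R packages tidygeocoder and geosphere among its libraries). It assumes the geocoded coordinates are available in decimal degrees and converts them to radians before applying the formula; the function name and signature are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius r used in Equation (1)


def haversine_km(lat1_deg, lon1_deg, lat2_deg, lon2_deg, r=EARTH_RADIUS_KM):
    """Hypothetical helper: great-circle distance in km between two points given in decimal degrees."""
    # Equation (1) expects angles in radians.
    phi1, phi2 = math.radians(lat1_deg), math.radians(lat2_deg)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2_deg - lon1_deg)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    # min() guards against floating-point values marginally above 1.0.
    return 2 * r * math.asin(math.sqrt(min(1.0, a)))
```

For example, two points one degree of longitude apart on the equator are 6371·π/180 ≈ 111.19 km apart.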

From the temporal data of shipment acceptance, two additional variables were derived: the day of the week of acceptance (Monday–Saturday) and the hour of acceptance (6:00–20:00). In addition, the delivery time, expressed in hours, was calculated as the difference between the delivery date and time and the acceptance date and time. These derived variables enable more precise modeling and identification of patterns in postal flows. In this stage of the research, a total of five variables were thus derived (four independent and one dependent), as presented in Table 3: two spatial (distance X₁ and region X₂) and three temporal (day of acceptance X₃, hour of acceptance X₄, and delivery time Y).
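The three temporal variables can be derived from the acceptance and delivery timestamps as in the sketch below. It assumes the timestamps have already been parsed into Python `datetime` objects; the function name is hypothetical, and ISO weekday numbering (1 = Monday) is chosen because it matches the coding of X₃ used in Section 3.4.

```python
from datetime import datetime


def temporal_features(accepted: datetime, delivered: datetime) -> tuple:
    """Hypothetical helper returning (X3 day of week, X4 hour of acceptance, Y delivery time in h)."""
    day_of_week = accepted.isoweekday()  # 1 = Monday, ..., 6 = Saturday
    hour = accepted.hour                 # acceptance hours fall between 6:00 and 20:00
    delivery_hours = (delivered - accepted).total_seconds() / 3600.0
    return day_of_week, hour, delivery_hours
```

For an invented shipment accepted on Wednesday, 2 March 2022 at 9:30 and delivered the next day at 11:00, the function returns `(3, 9, 25.5)`.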

3.3. Data Cleaning

Out of the initial 11,138 instances, the geocoding algorithm used to determine the distance between the origin and destination post offices failed to assign correct values in 1447 cases: instead of the actual distance in kilometers, variable X₁ was equal to zero in these instances. After these rows were removed, the dataset was reduced to 9691 instances.

3.4. Training, Validation, and Testing of ML Models with Default Hyperparameters

Before building the ML models, the variables were categorized, a common practice that reduces the influence of noise and outliers and makes models more stable and interpretable. Grouping highly variable values into broader categories also facilitates the operation of algorithms such as decision trees. Variable X₁ (distance, in km) was categorized according to the following rules:

if 0 ≤ distance < 50, then X₁ = “small”;
if 50 ≤ distance < 150, then X₁ = “medium”;
if distance ≥ 150, then X₁ = “large.”

Variables X₂ and X₃ already contained natural categorical values:

X₂: Republika Srpska, Federation of BiH, or Brčko District;
X₃: 1 (Monday), 2 (Tuesday), …, 6 (Saturday).

The rules for categorizing variable X₄ were as follows:

if 6 ≤ acceptance hour < 10, then X₄ = “morning”;
if 10 ≤ acceptance hour < 14, then X₄ = “noon”;
if 14 ≤ acceptance hour < 20, then X₄ = “afternoon”.

The values of the dependent variable Y were grouped into two classes:

“express”—delivery time of up to 24 h, and
“delay”—delivery time exceeding 24 h.
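The binning rules for X₁, X₄, and Y can be expressed as three small functions. This is a Python sketch rather than the SPSS Modeler configuration the authors used; the function names are hypothetical, and raising an error for values outside the stated ranges (negative distances, acceptance hours outside 6:00 to 20:00) is a design choice of the sketch, not something the paper specifies.

```python
def categorize_distance(km: float) -> str:
    """X1: distance bins (small < 50 km <= medium < 150 km <= large)."""
    if km < 0:
        raise ValueError("distance must be non-negative")
    if km < 50:
        return "small"
    if km < 150:
        return "medium"
    return "large"


def categorize_hour(hour: int) -> str:
    """X4: acceptance-hour bins within the 6:00-20:00 window."""
    if 6 <= hour < 10:
        return "morning"
    if 10 <= hour < 14:
        return "noon"
    if 14 <= hour < 20:
        return "afternoon"
    raise ValueError(f"hour {hour} is outside the 6:00-20:00 acceptance window")


def delivery_class(hours: float) -> str:
    """Y: 'express' for delivery within 24 h, otherwise 'delay'."""
    return "express" if hours <= 24 else "delay"
```

Note that the boundary values follow the half-open intervals of the rules: 50 km is already “medium”, and exactly 24 h is still “express”.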

After data preparation, ML models were developed in the IBM SPSS Modeler 18.0 environment using the Auto Classifier node, which automatically generates multiple classification models with default hyperparameters and ranks them by accuracy. The dataset of 9691 instances was split into three subsets: 80% for training, 10% for hold-out validation, and 10% for testing. The three models with the highest accuracy were SVM, C5.0, and Chi-squared Automatic Interaction Detection (CHAID). The SVM was trained with a Radial Basis Function (RBF) kernel (C = 10, gamma = 1.0, epsilon = 0.1); the C5.0 decision tree was built in simple mode, optimized for accuracy, without boosting or cross-validation; and the CHAID model was built with a maximum depth of five levels, significance thresholds of 0.05, and Bonferroni adjustment.

3.5. Data Filtering

The underlying assumption was that focusing on a single dominant location would enable more precise modeling and potentially improve prediction results. The origin post office with postal code 72102 Zenica, which had the highest number of accepted shipments in the observed one-month period, was therefore selected. After all other origin locations were filtered out, the dataset was reduced to 4166 instances.

3.6. Training, Validation, and Testing of ML Models with Default Hyperparameters and Filtered Data

In this step, ML models were created on the filtered dataset in the same manner as previously described, using the Auto Classifier node. The set of 4166 instances was divided in the same ratio as in the earlier processing, i.e., 80:10:10. For the filtered dataset, two further models were created in addition to the previously described C5.0 decision tree. The C&R Tree was trained in expert mode with the Gini index, pruning enabled, prior probabilities based on the training data, and stopping rules of at least 2% of records in a parent node and 1% in a child node. The Logistic Regression was implemented as a multinomial main-effects model in simple mode, with all predictors entered simultaneously (Enter method) and the constant term included.

3.7. Data Augmentation Using Autoencoder Neural Network—Iteration I

Artificial dataset expansion was carried out using an autoencoder neural network implemented in the MATLAB R2016a environment, with its architecture presented in Figure 2. An autoencoder is a type of feedforward neural network in which information flows strictly from the input layer toward the output, forming a directed acyclic structure. Its purpose is not classification but learning a compact internal representation of the data. The input is projected into a reduced-dimensional space through the hidden layer, where the network seeks to capture the most relevant features. From this latent representation, the model then reconstructs the original input as closely as possible. In this sense, the hidden layer functions as a feature extractor, enabling both dimensionality reduction and input reconstruction [20]. Since the autoencoder network in MATLAB works only with numerical values, each category of independent variables was represented by a unique number using integer coding.

The architecture of the autoencoder, shown in Figure 2, consists of an input layer with four nodes that receive the independent variables (X₁, …, X₄), a hidden layer with three neurons, and an output layer with four nodes in which the inputs are reconstructed as (X₁′, …, X₄′). The transfer functions of the neurons in the hidden and output layers are logistic sigmoid functions (logsig) [20]:

$f(z) = \frac{1}{1 + e^{-z}}$

(2)

where z is the net input to the neuron. The dependent variable was also numerically encoded, with the values 1 (“express”) and 2 (“delay”). However, it was not processed through the neural network; instead, its values were directly assigned to the synthetic copies generated from the corresponding real inputs. Combining the 4166 original instances with their 4166 synthetic copies doubled the dataset to a total of 8332 instances. The autoencoder was trained in MATLAB R2016a using the trainAutoencoder function with default parameters: logistic sigmoid transfer functions for both encoder and decoder, 1000 training epochs, scaled conjugate gradient optimization, L2 regularization (0.001), sparsity regularization (1) with a target sparsity proportion of 0.05, mean squared error adjusted for sparsity as the loss function, and automatic data scaling.
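The augmentation scheme (train an autoencoder on the integer-coded inputs, reconstruct every row, and attach the original label to each reconstruction) can be sketched in plain Python as follows. This is an illustrative stand-in for MATLAB's trainAutoencoder, not a reproduction of it: it uses per-sample gradient descent on the squared reconstruction error instead of scaled conjugate gradient, omits the L2 and sparsity penalties, and uses min-max scaling to [0, 1] in place of MATLAB's automatic scaling. The paper does not state whether the reconstructed integer codes were rounded, so the sketch leaves them continuous. All function names, the learning rate, and the epoch count are hypothetical.

```python
import math
import random


def _sigmoid(z):
    """Logistic sigmoid, Equation (2)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_autoencoder(X, n_hidden=3, epochs=500, lr=0.5, seed=123):
    """Hypothetical helper: fit an n_in-n_hidden-n_in autoencoder with sigmoid units.

    Returns a function mapping a row of X to its reconstruction in original units.
    """
    n_in = len(X[0])
    # Min-max scaling to [0, 1] so the targets lie in the sigmoid's output range.
    lo = [min(col) for col in zip(*X)]
    span = [(max(col) - min(col)) or 1.0 for col in zip(*X)]

    def scale(x):
        return [(v - l) / s for v, l, s in zip(x, lo, span)]

    def unscale(z):
        return [v * s + l for v, l, s in zip(z, lo, span)]

    rng = random.Random(seed)
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    b1, b2 = [0.0] * n_hidden, [0.0] * n_in

    def forward(x):
        h = [_sigmoid(sum(w * v for w, v in zip(W1[j], x)) + b1[j]) for j in range(n_hidden)]
        out = [_sigmoid(sum(w * v for w, v in zip(W2[k], h)) + b2[k]) for k in range(n_in)]
        return h, out

    data = [scale(x) for x in X]
    for _ in range(epochs):
        for x in data:
            h, out = forward(x)
            # Output error terms: derivative of 0.5 * (out - x)^2 w.r.t. the net input.
            d_out = [(out[k] - x[k]) * out[k] * (1 - out[k]) for k in range(n_in)]
            # Hidden error terms, backpropagated through W2 before W2 is updated.
            d_hid = [sum(d_out[k] * W2[k][j] for k in range(n_in)) * h[j] * (1 - h[j])
                     for j in range(n_hidden)]
            for k in range(n_in):
                for j in range(n_hidden):
                    W2[k][j] -= lr * d_out[k] * h[j]
                b2[k] -= lr * d_out[k]
            for j in range(n_hidden):
                for i in range(n_in):
                    W1[j][i] -= lr * d_hid[j] * x[i]
                b1[j] -= lr * d_hid[j]

    def reconstruct(x):
        return unscale(forward(scale(x))[1])

    return reconstruct


def augment(X, y, **train_kwargs):
    """Return the original rows followed by their reconstructions, doubling the data.

    Labels bypass the network: each synthetic row inherits the label of its source row.
    """
    reconstruct = train_autoencoder(X, **train_kwargs)
    X_synthetic = [reconstruct(x) for x in X]
    return X + X_synthetic, y + y
```

Applied to 4166 rows, `augment` returns 8332 rows; applying it again to that output returns 16,664, mirroring the two iterations. For datasets of this size, a vectorized implementation would be considerably faster than this pure-Python loop.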

3.8. Training, Validation, and Testing of Five ML Models with Hyperparameter Optimization—Iteration I

The ML models were developed in the R programming language (version 4.4.3) using the RStudio integrated development environment, and their implementation can be summarized in the following algorithmic steps:

1. Data preparation. The research dataset was imported from an Excel file; the independent variables were converted to the numeric type (continuous or discrete numerical variables), while the dependent variable was transformed into the factor format (categorical variable).
2. Stratified data partitioning. The dataset was split into training (80%), validation (10%), and test (10%) subsets while preserving the class distribution, so that each subset reflected the class proportions of the full dataset.
3. Task and resampling definition. A classification task was created on the training and validation sets, while the test set was reserved for the final evaluation. The validation set was defined through a custom resampling procedure with predefined indices.
4. Model and hyperparameter definition. Five models were developed: Random Forest, SVM (RBF kernel), XGBoost, kNN, and MLP [21]. For each model, a hyperparameter space or search interval was predefined, as presented in Table 4.
5. Optimization and training. Hyperparameters were optimized using Bayesian optimization with 30 evaluation cycles, with overall classification accuracy as the target metric. Each model was trained on the training set and validated on the validation subset, then retrained with the best hyperparameters and tested on the independent test set.
6. Performance evaluation. Model performance was assessed using the following metrics:

Overall accuracy—the ratio of the total number of correctly classified cases to the total number of cases (instances), calculated according to the following formula [22]:

$\text{Accuracy} = \frac{\text{Number of correctly classified cases}}{\text{Total number of cases}}$

(3)
Balanced accuracy. Accuracy measures the proportion of correctly classified instances but can be misleading when classes are imbalanced. Balanced accuracy is calculated as the average of the per-class recall values [23]:

$\text{Balanced Accuracy} = \frac{1}{K} \sum_{i=1}^{K} \frac{TP_{i}}{TP_{i} + FN_{i}}$

(4)

where TP_i (True Positive) is the number of correctly recognized instances of class i, FN_i (False Negative) is the number of instances of class i misclassified as another class, and K is the total number of classes.
F-beta: the weighted harmonic mean of precision and recall [24]:

$F_{\beta} = (1 + \beta^{2}) \cdot \frac{\text{Precision} \cdot \text{Recall}}{\beta^{2} \cdot \text{Precision} + \text{Recall}}$

(5)

where the parameter β (beta) determines the relative weight given to recall versus precision:
β = 1: the classical F1 score (balanced precision and recall).
β > 1: recall is emphasized more.
β < 1: precision is emphasized more.
Precision is defined as the ratio of correctly classified positive instances (TP) to all instances predicted as positive by the model, i.e., TP plus the false positives (FP) [22]:

$\text{Precision} = \frac{TP}{TP + FP}$

(6)
Recall represents the ratio of correctly classified positive instances to the total number of actual positive instances in the dataset [22]:

$\text{Recall} = \frac{TP}{TP + FN}$

(7)
Area Under the Curve (AUC). The AUC is the area under the receiver operating characteristic (ROC) curve, i.e., the integral of the True Positive Rate (TPR, also called sensitivity) with respect to the False Positive Rate (FPR, equal to 1 − specificity) [25]:

$\text{AUC} = \int_{0}^{1} \frac{TP}{P} \, d\left(\frac{FP}{N}\right)$

(8)

where P and N are the numbers of positive and negative observations. In addition to these metrics, confusion matrices were generated, and the optimal hyperparameter values and predictions were saved in separate files. The implemented algorithm used the following R packages: readxl, dplyr, mlr3, mlr3verse, mlr3learners, mlr3tuning, paradox, data.table, tidygeocoder, writexl, geosphere. To ensure the reproducibility of results, a fixed random seed (set.seed(123)) was applied in the experiments.
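To make Equations (3) to (8) concrete, the sketch below computes the same metrics from lists of labels and scores. It is an illustrative Python version, not the mlr3 measures the authors used, and all function names are hypothetical. In this binary setting, F-beta and AUC require one class to be designated as positive; the paper does not state which class was used, so the example designates “delay”. The AUC function uses the rank (Mann–Whitney) form of Equation (8): the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one, with ties counted as one half, which equals the area under the ROC curve.

```python
def _counts(y_true, y_pred, positive):
    """TP, FP, FN for the class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, fp, fn


def accuracy(y_true, y_pred):
    """Equation (3): share of correctly classified cases."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Equation (4): mean of per-class recall TP_i / (TP_i + FN_i)."""
    recalls = []
    for c in sorted(set(y_true)):
        tp, _, fn = _counts(y_true, y_pred, c)
        recalls.append(tp / (tp + fn))
    return sum(recalls) / len(recalls)


def f_beta(y_true, y_pred, positive, beta=1.0):
    """Equations (5)-(7); returns 0.0 when precision and recall are both zero."""
    tp, fp, fn = _counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta ** 2 * precision + recall
    return (1 + beta ** 2) * precision * recall / denom if denom else 0.0


def auc(y_true, scores, positive):
    """Equation (8) via the rank (Mann-Whitney) form; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For five invented shipments with true labels [express, express, express, delay, delay] and predictions [express, express, delay, delay, express], accuracy is 0.6, balanced accuracy is (2/3 + 1/2)/2 ≈ 0.583, and F1 for “delay” is 0.5, which illustrates how balanced accuracy penalizes the weaker recall on the minority class.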

3.9. Data Augmentation Using Autoencoder Neural Network—Iteration II

The results of the ML models built on the data expanded in the first augmentation iteration indicated that there was still room for improving their classification performance. Therefore, the 8332 instances of independent variables were used as input to the autoencoder network, which generated reconstructed copies at its output, as explained in Section 3.7. Merging the input data with the generated synthetic data yielded a total of 16,664 instances (Figure 3).

3.10. Training, Validation, and Testing of Five ML Models with Hyperparameter Optimization—Iteration II

The second iteration of building the Random Forest, SVM, XGBoost, kNN, and MLP models followed the same algorithm described in Section 3.8. The models were trained, validated, and tested on a dataset enlarged fourfold relative to the filtered data (4166 instances), with a total size of 16,664 instances.
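The Conclusions report an 80:10:10 ratio for the training, validation, and test sets. A minimal sketch of such a split follows, assuming a simple (non-stratified) random split of instance indices, which the text does not specify. The seed 123 only mirrors the `set.seed(123)` convention. Python's random generator differs from R's, so the resulting partition will not match the one used in the study. The function name `split_80_10_10` is hypothetical.

```python
import random


def split_80_10_10(n, seed=123):
    """Shuffle instance indices and split them 80:10:10 into train/validation/test.

    Integer arithmetic avoids floating-point rounding; any remainder goes to the test set.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = n * 8 // 10
    n_val = n // 10
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


train_idx, val_idx, test_idx = split_80_10_10(16664)
print(len(train_idx), len(val_idx), len(test_idx))  # 13331 1666 1667
```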

3.11. Analogy of Applying the Solution in Telecommunication Traffic

The methodological framework developed for classification modeling of the postal transport system can also be applied in telecommunication traffic, given the numerous similarities between these domains, which can be summarized as follows:

Telecommunications originated within postal companies (telegraph and telephone as support for faster transmission of letter mail) [26];
The basic unit of transfer or transport can be either a physical postal item or a digital data packet traveling through a network of nodes (post offices or routers/base stations) from source to destination;
Both postal and telecommunication systems function as complex networks of nodes connected by routes or communication channels for data exchange [27,28];
Both postal and telecommunication services require advanced methods of planning and flow monitoring to ensure efficiency and service quality [29,30];
Continuous technological advancement affects not only the volume of services but also their nature, introducing new forms of service offerings and transforming the way traditional services are used and perceived;
Both sectors are subject to regulation that ensures their role within the broader networked system [29].

The variables used in this study, such as physical (geographical) distance, territorial affiliation, acceptance time, and delivery time, have counterparts in telecommunication traffic variables such as latency, jitter, throughput, and packet loss. ML is already used successfully in telecommunications for traffic classification, congestion prediction, and quality of service/experience (QoS/QoE) optimization, and its application can further improve reliability and efficiency in this context. A clear parallel can also be drawn between postal shipments and traffic classes in telecommunication networks. Express shipments correspond to streaming traffic, where the priority is fast and reliable delivery with minimal delay, similar to real-time video or audio streams. Conversely, delayed shipments can be associated with background traffic, where delivery is mandatory and content integrity must be preserved, but delivery time is neither strictly defined nor expected in real time [31]. This analogy suggests that the classification and prediction methodology applied in the postal transport system can be transferred to telecommunication networks, which underlines its interdisciplinary significance.

3.12. Software and Computational Environment

Machine learning models with hyperparameter optimization were implemented in R version 4.4.3 (28 February 2025, “Trophy Case”) on the platform x86_64-w64-mingw32/x64. The following R packages were used: readxl 1.4.5, dplyr 1.1.4, mlr3 1.0.0, mlr3verse 0.3.1, mlr3learners 0.12.0, mlr3tuning 1.4.0, paradox 1.0.1, data.table 1.17.0, tidygeocoder 1.0.6, writexl 1.5.4, and geosphere 1.5-20. Baseline models with default hyperparameters were created in IBM SPSS Modeler 18.0 (IBM Corp., Armonk, NY, USA). Data augmentation was carried out in MATLAB R2016a (MathWorks Inc., Natick, MA, USA). All computations were performed on a workstation running Windows 10 Pro (64-bit) with an Intel(R) Core(TM) i5-6300U CPU @ 2.40 GHz and 16 GB RAM, without GPU acceleration.

Compared with deep learning approaches such as ResNet, Transformer-based models, or RCCNet, which typically demand large datasets, GPU acceleration, and substantial computational resources, the proposed framework (Random Forest, XGBoost, SVM, kNN, MLP) is simpler, faster to train, and implementable in standard environments without specialized hardware. This offers a favorable trade-off, achieving competitive accuracy with lower resource consumption, while the ensemble methods further reduce sensitivity to error propagation compared with more complex deep learning architectures.

4. Results

4.1. ML Models with Default Hyperparameters Created on the Cleaned Dataset

The initial modeling results on the cleaned dataset of 9691 instances are presented in Table 5. Using automated modeling with default hyperparameters, the three models with the best overall classification accuracy on the test set were created and selected: SVM, C5.0, and CHAID. In addition to accuracy, Table 5 shows the macro-averaged Precision and Recall values for the training and test sets.

Based on the results presented in Table 5, all three models achieved similar performance. The best result was obtained by the SVM model (73.810%), closely followed by C5.0 (73.700%), while CHAID achieved a slightly lower value (73.390%). The small differences in accuracy suggest that the cleaned dataset has limited informational capacity, since no algorithm achieves notably higher results. All three models also achieved very similar precision and recall values on both the training and test sets, indicating stable performance without signs of overfitting. The confusion matrix (Table 6) presents the aggregated results for the three best models on the entire dataset. In total, the models correctly classified 4629 express shipments and 2534 delayed shipments, while 1260 express shipments were misclassified as delayed and 1268 delayed shipments were misclassified as express, for a total of 9691 instances.

According to Table 6, the models achieved an overall accuracy of approximately 73.91%, with 7163 of 9691 shipments correctly classified. Class-wise analysis shows that the models are more reliable in recognizing express deliveries (4629 of 5889, about 78.6%) than delayed ones (2534 of 3802, about 66.6%). One likely reason for these results is the imbalance between the classes (5889 express versus 3802 delayed shipments).

4.2. ML Models with Default Hyperparameters Created on the Filtered Data

The results of training and testing the models on the filtered dataset of 4166 instances are presented in Table 7. The C5.0 model achieved slightly better accuracy (79.460%) than C&R Tree (79.230%) and Logistic Regression (79.010%). Although the differences are minimal, C5.0 was the most successful classifier in this experiment. C5.0 and C&R Tree showed a smaller gap between precision and recall, whereas Logistic Regression achieved the highest test precision (0.758) at the expense of the lowest test recall (0.626).

According to the confusion matrix (Table 8), the models achieved an overall accuracy of approximately 79.2%, with 3300 of 4166 shipments correctly classified. Class-wise analysis reveals that they are considerably more reliable in recognizing express deliveries (2837 of 3015 correctly classified, about 94.1%) than delayed shipments, where only 463 of 1151 cases (about 40.2%) were correctly identified.
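The overall and class-wise figures quoted for Tables 6 and 8 can be reproduced directly from the confusion-matrix counts. The sketch below uses the tables' layout, with rows for the predicted class and columns for the actual class. It also reports balanced accuracy, taken here as the mean of the per-class recalls. The function name `summarize_confusion` is hypothetical.

```python
def summarize_confusion(cm, classes=("express", "delay")):
    """Summarize a confusion matrix laid out as in Tables 6 and 8.

    cm[i][j] counts shipments predicted as classes[i] whose actual class is classes[j]
    (rows = prediction, columns = actual class).
    """
    k = len(classes)
    n = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    recall = {}
    for j, name in enumerate(classes):
        actual_j = sum(cm[i][j] for i in range(k))  # column total = size of the actual class
        recall[name] = cm[j][j] / actual_j
    return {
        "accuracy": correct / n,
        "recall": recall,
        "balanced_accuracy": sum(recall.values()) / k,  # mean per-class recall
    }


table8 = [[2837, 688],
          [178, 463]]
summary = summarize_confusion(table8)
print(round(summary["accuracy"], 3), round(summary["balanced_accuracy"], 3))
```

For Table 8 this gives an accuracy of about 0.792 but a balanced accuracy of only about 0.672. The low recall for delayed shipments pulls the class-averaged figure down.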

Comparing Table 5 and Table 7 shows an increase in accuracy for all presented models, despite the reduction in dataset size. The accuracy of the best-ranked model increased by 5.65 percentage points (from 73.810% for SVM to 79.460% for C5.0), supporting the assumption that focusing on a single postal location is beneficial. Furthermore, all models achieved approximately the same accuracy, which suggests that the dataset has reached its informational limit. In other words, performance in this case depends little on the choice of ML algorithm or on model hyperparameters.

4.3. ML Models Created with Hyperparameter Optimization—Iteration I

The results presented in Figure 4 and Figure 5 show that all models achieved consistent performance on both the training and test sets, with only minor differences, indicating no evident overfitting. Random Forest and XGBoost recorded the most stable outcomes, with an accuracy of 0.90 on the training set and 0.91 on the test set, F-beta values of 0.93 and 0.94, and AUC values of 0.94 and 0.93, respectively. These results confirm the robustness of ensemble approaches. MLP followed closely, with a training accuracy of 0.89 and a test accuracy of 0.91, accompanied by F-beta values of 0.93 and 0.94, which suggests that it generalizes well to unseen data. SVM (RBF) showed slightly lower values, with an accuracy of 0.88 (train) and 0.89 (test) and a stable F-beta of 0.92–0.93, while its AUC remained notably lower (0.84–0.85), reflecting reduced class separability. The kNN model performed at a similar level to SVM, with an accuracy of 0.88–0.89 and an F-beta of 0.92, but without reaching the balanced performance of Random Forest, XGBoost, and MLP. Overall, the comparison of training and test results shows that the models generalize consistently. The ensemble models (Random Forest and XGBoost) and the neural network (MLP) provide the most reliable solutions, while SVM and kNN exhibit modest but consistent performance.

The analysis of the confusion matrices presented in Figure 6 reveals that all models achieved a high level of accuracy, though the distribution of errors varied across classes. Random Forest and XGBoost confirmed their robustness, with XGBoost showing the lowest number of misclassifications of Class 1, while Random Forest maintained balanced predictions between both classes. MLP also delivered stable results, with only a slightly increased number of errors for Class 2. SVM produced acceptable outcomes but remained less effective compared to ensemble methods and neural networks. The kNN model, on the other hand, exhibited the weakest performance, primarily due to the highest number of misclassified instances of Class 1, while its errors in Class 2 were comparable to those of the other models. Overall, ensemble models (Random Forest and XGBoost) and MLP stood out as the most reliable approaches, whereas SVM and kNN showed clear limitations in classification consistency.

4.4. ML Models Created with Hyperparameter Optimization—Iteration II

Figure 7 and Figure 8 illustrate that all evaluated models performed at a high level, though with certain differences between the metrics. Random Forest and XGBoost proved to be the most consistent, both reaching an accuracy of 0.95 on training and test data, balanced accuracy in the range of 0.91–0.93, F-beta values of 0.96–0.97, and the highest AUC scores of 0.98–0.99. SVM and kNN followed closely, with slightly lower balanced accuracy (0.90–0.92), which points to a modest sensitivity to class imbalance, while their F-beta (0.96) and AUC (0.97–0.98) remained strong. MLP showed the weakest results, with an accuracy of 0.89–0.90, balanced accuracy of 0.83–0.84, and AUC values of 0.93–0.94, making it less competitive under the given conditions. All models achieved nearly identical training and test values across metrics. This indicates that there is no evident overfitting, as the models generalized well to unseen data. The consistency of the ensemble methods (Random Forest and XGBoost) further underlines their robustness, while the MLP demonstrated lower but still stable generalization performance.

In the confusion-matrix heat maps for the second iteration (Figure 9), all models maintained a high level of accuracy. Random Forest and XGBoost again delivered the most balanced results and the fewest misclassifications across both classes, while MLP recorded a notably higher number of errors for Class 2 than in the first iteration. Compared with the matrices in Figure 6, SVM and kNN remained stable but continued to lag behind the ensemble approaches, confirming the consistent advantage of Random Forest and XGBoost in both iterations.

In summary, the comparison between Iteration I and Iteration II shows that Random Forest, XGBoost, SVM, and kNN achieved clear performance gains. Accuracy rose to about 0.95, balanced accuracy improved by 7–8 percentage points, F-beta scores exceeded 0.96, and AUC values approached 0.99. These improvements appear across multiple test metrics (accuracy, balanced accuracy, F-beta, precision, recall, and AUC), which indicates that the gains are systematic rather than an artifact of a single metric. MLP, by contrast, showed no notable change.

Table 9 presents the optimal hyperparameter values for all ML models obtained through Bayesian optimization. The results indicate that the ensemble methods, particularly Random Forest and XGBoost, required tuning of several parameters to achieve robust and balanced performance. For kNN and MLP, the optimization process identified values that enhanced model stability and ensured proper convergence. For SVM, the high cost (9.958) and gamma (0.997), both close to the upper limits of their search intervals (Table 4), indicate a preference for a flexible, weakly regularized decision boundary and confirm the model's sensitivity to the structure of the dataset. Overall, the optimized hyperparameters calibrated each model to the specific data characteristics, allowing for a reliable and fair comparison of their predictive performance.
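Whether an optimum lies at the edge of its search interval can be checked mechanically. The sketch below hard-codes the intervals from Table 4 and the optima from Table 9. Relative position is measured on a linear scale, which is an assumption: some intervals, such as gamma or decay, may have been searched on a log scale. The 5% tolerance is an arbitrary choice, and the names `SEARCH_SPACE`, `OPTIMUM`, and `near_bounds` are hypothetical.

```python
# Search intervals (Table 4) and second-iteration optima (Table 9)
SEARCH_SPACE = {
    ("Random Forest", "num.trees"): (200, 1000),
    ("Random Forest", "mtry.ratio"): (0.2, 1.0),
    ("Random Forest", "min.node.size"): (1, 20),
    ("SVM (RBF)", "cost"): (0.1, 10),
    ("SVM (RBF)", "gamma"): (0.0001, 1),
    ("XGBoost", "nrounds"): (100, 700),
    ("XGBoost", "eta"): (0.01, 0.3),
    ("XGBoost", "max_depth"): (2, 8),
    ("XGBoost", "subsample"): (0.5, 1.0),
    ("XGBoost", "colsample_bytree"): (0.5, 1.0),
    ("kNN", "k"): (3, 31),
    ("MLP", "size"): (3, 30),
    ("MLP", "decay"): (0.00001, 0.1),
    ("MLP", "maxit"): (100, 500),
}
OPTIMUM = {
    ("Random Forest", "num.trees"): 204,
    ("Random Forest", "mtry.ratio"): 0.207,
    ("Random Forest", "min.node.size"): 13,
    ("SVM (RBF)", "cost"): 9.958,
    ("SVM (RBF)", "gamma"): 0.997,
    ("XGBoost", "nrounds"): 433,
    ("XGBoost", "eta"): 0.243,
    ("XGBoost", "max_depth"): 5,
    ("XGBoost", "subsample"): 0.658,
    ("XGBoost", "colsample_bytree"): 0.882,
    ("kNN", "k"): 27,
    ("MLP", "size"): 20,
    ("MLP", "decay"): 0.037,
    ("MLP", "maxit"): 182,
}


def near_bounds(space, optimum, tol=0.05):
    """Flag hyperparameters whose optimum lies within `tol` (fraction of the interval
    width, linear scale) of the lower or upper search bound."""
    flagged = {}
    for key, (lo, hi) in space.items():
        pos = (optimum[key] - lo) / (hi - lo)  # relative position in [0, 1]
        if pos <= tol:
            flagged[key] = "lower"
        elif pos >= 1 - tol:
            flagged[key] = "upper"
    return flagged


print(near_bounds(SEARCH_SPACE, OPTIMUM))
```

With these inputs, the SVM cost and gamma lie near their upper bounds, and the Random Forest num.trees and mtry.ratio lie near their lower bounds. An optimum at the edge of its interval may indicate that widening that interval could be worth exploring.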

5. Conclusions

For the purpose of this study, postal systems were selected as the subject of analysis due to their highly complex organizational and functional structures within transport activities. By their target function, these are spatially distributed systems designed to provide products and services that meet a broad portfolio of urban needs, including urban mobility that develops in environments characterized by heterogeneous and context-sensitive behavior of service users and postal operators in multimodal and unstable urban scenarios.

This article presents an interdisciplinary comparative approach to improving the classification of postal delivery times using machine learning (ML) models. Five classification models (Random Forest, SVM, XGBoost, kNN, and MLP) were analyzed to address business and functional challenges in the context of applying the smart transportation concept. Their hyperparameters were optimized using the Bayesian method, and the models were evaluated through standard classification metrics for postal delivery time classification, with the aim of enhancing efficiency and service quality. The focus on ML models for delivery time classification is aligned with broader trends in digital transformation within logistics and service optimization.

The Smart Transportation Concept (STC) enables postal systems to become part of a wider ecosystem of smart mobility, where the integration of new technologies, data exchange, and ML applications supports resource optimization, more accurate predictions, and the provision of more efficient and environmentally sustainable services for both individual and business users. The study shows that this concept has not yet reached the necessary level of implementation within the cyclical loop of planning, execution, management, and reflection, in relation to the requirements of sustainability, equity, and resilience to various categories of constraints in the business processes, strategies, and policies of multimodal transport systems.

An important innovative contribution of this study is the development of an original methodological framework for implementing the smart transportation concept in the evolution of the postal multimodal system, with the goal of creating a functional, ecologically, and economically sustainable system capable of reflective adaptation to environmental disruptions and changes in the personal preferences of postal service users as well as the business demands placed on postal operators.

The particular relevance of this study, which addresses critical challenges in postal traffic and transport, is reinforced by unpredictable and growing parcel volumes, sustainability pressures, and the need for integration into smart transportation systems. For this purpose, a real dataset of 11,138 instances was collected at the Postal Center Zenica, Bosnia and Herzegovina. The data were cleaned, filtered, and expanded through two iterations of autoencoder-based data augmentation, and then used to develop and compare five ML models. All developed models are consistently described and comparatively analyzed, with results presented through textual explanations, tables, and figures.

The final dataset was divided into training, validation, and test sets in the ratio of 80:10:10. The presented results demonstrate the application of a relevant methodological procedure, with performance metrics confirming the value of data augmentation in improving performance, particularly for the minority class (delayed deliveries). The emphasis on ensemble models (Random Forest and XGBoost) as robust solutions further strengthens the practical implications. The interdisciplinary applicability to telecommunications broadens the relevance of this article. Overall, the paper contains the key elements needed for its evaluation, in line with the scope of the journal, and is supported by a visual overview and a systematic treatment of the cited literature.

The most important limitation of this study is the size of the dataset collected within the observed geographic area. Data augmentation proved useful in overcoming this limitation and in improving model performance. However, augmentation must be applied with caution and limited to a small number of iterations, since each synthetic instance is a reconstruction of an existing one and therefore adds limited new information.

Future research will incorporate realistic sociodemographic attributes of service users and postal operators in order to capture heterogeneous and context-sensitive behavior in multimodal and unstable urban scenarios. Furthermore, the framework could be extended with techniques such as the Synthetic Minority Over-sampling Technique (SMOTE) or class weighting to address class imbalance.

Author Contributions

Conceptualization, M.K.B. and M.S.; methodology, M.K.B. and M.S.; software, M.S., G.J. and M.H.; validation, M.K.B., Đ.P. and D.A.; formal analysis, M.K.B. and M.S.; investigation, M.K.B., Đ.P. and D.A.; resources, Đ.P., D.A. and G.J.; data curation, M.S. and M.H.; writing—original draft preparation, M.K.B. and M.S.; writing—review and editing, M.K.B. and M.S.; visualization, M.S., G.J. and M.H.; supervision, Đ.P.; project administration, M.K.B., Đ.P., G.J. and D.A.; funding acquisition, Đ.P., D.A., G.J. and M.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors gratefully acknowledge the Public Enterprise BH Pošta and the Postal Office Center Zenica for providing the data used in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Čačić, N.; Ninović, M.; Šarac, D. Future development trends in the postal market: An overview. Int. J. Traffic Transp. Eng. 2023, 13, 28–39.
Oladimeji, D.; Gupta, K.; Kose, N.A.; Gundogan, K.; Ge, L.; Liang, F. Smart transportation: An overview of technologies and applications. Sensors 2023, 23, 3880.
Medić, A.; Kosovac, A.; Muharemović, E.; Begović, M. Machine Learning Application for Improving Customer and Postal Logistics Operator Satisfaction in Urban Areas—A Review. Promet-Traffic Transp. 2025, 37, 287–300.
Banjanin, M.K.; Stojčić, M.; Vasiljević, M.; Stjepanović, A.; Jotanović, G.; Husić, M.; Jauševac, G. Prediction of the Number of Parcels Sent via Express Mail Service in a Time Series Using Statistical and Machine Learning Models. In Proceedings of the 2025 MIPRO 48th ICT and Electronics Convention, Opatija, Croatia, 19–23 May 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 192–197.
Stojčić, M.; Banjanin, M.K.; Popović, Đ.; Husić, M. Machine Learning Models for the Classification of Parcel Arrival Rate in a Regional Postal Center. In Proceedings of the NEW HORIZONS of Transport and Communications 2025–TransportaCom, Doboj, Bosnia and Herzegovina, 5–8 November 2025; Faculty of Transport and Traffic Engineering, University of East Sarajevo: Doboj, Bosnia and Herzegovina, 2025; accepted.
İnaç, H.; Ayözen, Y.E.; Atalan, A.; Dönmez, C.Ç. Estimation of postal service delivery time and energy cost with e-scooter by machine learning algorithms. Appl. Sci. 2022, 12, 12266.
de Araujo, A.C.; Etemad, A. End-to-end prediction of parcel delivery time with deep learning for smart-city applications. IEEE Internet Things J. 2021, 8, 17043–17056.
Arıkan, U.; Kranz, T.; Sal, B.C.; Schmitt, S.; Witt, J. Human-centric parcel delivery at Deutsche Post with operations research and machine learning. INFORMS J. Appl. Anal. 2023, 53, 359–371.
Khiari, J.; Olaverri-Monreal, C. Boosting algorithms for delivery time prediction in transportation logistics. In Proceedings of the 2020 International Conference on Data Mining Workshops (ICDMW), Sorrento, Italy, 17–20 November 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 251–258.
Dobrodolac, M.; Lazarević, D.; Trifunović, A.; Petrović, M. Exploring the potential applications of artificial intelligence in parcel delivery systems. Manag. Sci. Adv. 2025, 2, 107–116.
Karakaya, İ. Evaluation of Machine Learning and Ensemble Learning Models for Classification Using Delivery Data. Veriml. Derg. 2025, 89–104.
Ruan, S.; Long, C.; Ma, Z.; Bao, J.; He, T.; Li, R.; Zheng, Y. Service time prediction for delivery tasks via spatial meta-learning. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Washington, DC, USA, 14–18 August 2022; ACM: New York, NY, USA, 2022; pp. 3829–3837.
Küp, B.Ü.; Küp, E.T.; Koçak, G.; Yucekaya, A.D.; Hekimoğlu, M. Real-Time Prediction of Delivery Delay in Supply Chains Using Machine Learning Approaches; SSRN: Rochester, NY, USA, 2024; Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5054862 (accessed on 9 September 2025).
Yi, J.; Yan, H.; Wang, H.; Yuan, J.; Li, Y. RCCNet: A spatial-temporal neural network model for logistics delivery timely rate prediction. ACM Trans. Intell. Syst. Technol. 2024, 15, 1–21.
Yalçinkaya, E.; Hızıroğlu, O.A. A comparative analysis of machine learning models for time prediction in food delivery operations. Artif. Intell. Theory Appl. 2024, 4, 43–56.
Song, J.; Wen, R.; Xu, C.; Tay, J.W.E. Service time prediction for last-yard delivery. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 3933–3938.
Yi, J.; Yan, H.; Wang, H.; Yuan, J.; Li, Y. Learning to estimate package delivery time in mixed imbalanced delivery and pickup logistics services. In Proceedings of the 32nd ACM International Conference on Advances in Geographic Information Systems (SIGSPATIAL), Hamburg, Germany, 29 October–1 November 2024; ACM: New York, NY, USA, 2024; pp. 432–443.
Ansori, M.I.; Kusumawati, R.; Hariyadi, M.A. Prediction of service level agreement time of delivery of goods and documents at PT Pos Indonesia using the random forest method. Int. J. Adv. Data Inf. Syst. 2023, 4, 2.
Ikasari, D.; Andika, R. Determine the Shortest Path Problem Using Haversine Algorithm: A Case Study of SMA Zoning in Depok. In Proceedings of the 2021 3rd International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA), Ankara, Turkey, 11–13 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–6.
Banjanin, M.K.; Stojčić, M.; Danilović, D.; Ćurguz, Z.; Vasiljević, M.; Puzić, G. Classification and prediction of sustainable quality of experience of telecommunication service users using machine learning models. Sustainability 2022, 14, 17053.
Marinković, D.; Dezső, G.; Milojević, S. Application of Machine Learning During Maintenance and Exploitation of Electric Vehicles. Adv. Eng. Lett. 2024, 3, 132–140.
Hossin, M.; Sulaiman, M.N. A Review on Evaluation Metrics for Data Classification Evaluations. Int. J. Data Min. Knowl. Manag. Process 2015, 5, 1–11.
mlr3measures: Performance Measures for ‘mlr3’. R Package Version 0.5.0. Available online: https://cran.r-project.org/package=mlr3measures (accessed on 23 September 2025).
Ferrer, L. Analysis and Comparison of Classification Metrics. arXiv 2022, arXiv:2209.05355.
Jiang, X.; Menon, A.; Wang, S.; Kim, J.; Ohno-Machado, L. Doubly Optimized Calibrated Support Vector Machine (DOC-SVM): An algorithm for joint optimization of discrimination and calibration. PLoS ONE 2012, 7, e48823.
Kljak, T.; Bolarić, M.; Binički, M. Impact of Mobile Telecommunications Traffic on the Development of Postal Traffic. Promet-Traffic Transp. 2011, 23, 359–365.
Computer Networking Simplified. Available online: https://computernetworkingsimplified.wordpress.com/2013/06/16/compare-and-contrast-computer-networks-with-postal-networks/#:~:text=Each%20post%20office%20passes%20on,vary%20from%20hop%20to%20hop (accessed on 9 September 2025).
Afolalu, S.A.; Ikumapayi, O.M.; Abdulkareem, A.; Emetere, M.E.; Adejumo, O. A short review on queuing theory as a deterministic tool in sustainable telecommunication system. Mater. Today Proc. 2021, 44, 2884–2888.
Chołodecki, M. The future EU postal regulation. What can be learnt from the telecommunication regulations. Comput. Law Secur. Rev. 2024, 52, 105938.
CCITT. Recommendation E.600: Terms and Definitions of Traffic Engineering; International Telegraph and Telephone Consultative Committee: Geneva, Switzerland, 1988.
ETSI TS 123 107 V5.5.0; Universal Mobile Telecommunications System (UMTS); Quality of Service (QoS) Concept and Architecture (3GPP TS 23.107 version 5.5.0 Release 5). European Telecommunications Standards Institute: Sophia Antipolis, France, 2002.

Figure 1. Research Methodology.

Figure 2. Autoencoder Neural Network for Data Augmentation.

Figure 3. Second Iteration of Data Augmentation Using an Autoencoder Neural Network.

Figure 4. Comparison of Performance Metrics (Accuracy, Balanced Accuracy, F-beta) for Different ML Models Created in the First Iteration.

Figure 5. Comparison of Performance Metrics (Precision, Recall, AUC) for Different ML Models Created in the First Iteration.

Figure 6. Visual interpretability of the confusion matrix for the classification models created in the first iteration: (a) Random Forest, (b) SVM, (c) XGBoost, (d) kNN, (e) MLP.

Figure 7. Comparison of Performance Metrics (Accuracy, Balanced Accuracy, F-beta) for Different ML Models Created in the Second Iteration.

Figure 8. Comparison of Performance Metrics (Precision, AUC, Recall) for Different ML Models Created in the Second Iteration.

Figure 9. Visual interpretability of the confusion matrix for the classification models created in the second iteration: (a) Random Forest, (b) SVM, (c) XGBoost, (d) kNN, (e) MLP.

Table 1. Comparative overview of selected studies relevant to the application of machine learning in postal and logistics systems within the smart transportation framework.

Reference	Context/Case Study	Methods/Models	Main Contribution	Relevance to Smart Transportation
[6]	Delivery time prediction with e-scooters	RF, GB, kNN, NN	GB achieved best results (R² = 0.845)	Highlights sustainable urban delivery (green transport, micro-mobility)
[7]	Last-mile delivery time prediction	Deep learning (DL) within IoT and cloud	Outperformed classical ML and OD models	Integration of IoT, cloud, and AI in smart city context
[8]	Routing and delivery time windows	OR + statistics + ML	Leveraged couriers’ implicit knowledge from historical data	Courier-centered optimization in postal transport
[3]	Urban logistics systems	RF, ResNet	Identified ML models for accurate last-mile delivery time estimation	Provides systematic evidence of ML use in smart transportation
[11]	Delivery time classification	SVM, NB, LDA, ET (ensemble)	Achieved 99.89% accuracy	Large-scale application of ensemble ML in e-commerce logistics
[12]	Service time prediction	Meta-learning, Transformer	Outperformed baselines by 7–9.5%	Advanced DL architecture for dynamic urban conditions
[13]	Real-time delay prediction	LR, RF, XGBoost, CatBoost	Boosting achieved highest AUC (99.9%)	Real-time monitoring and prediction in supply chains
[14]	On-time delivery rate	RCCNet (Node2vec, GCN, LSTM)	Modeled courier interdependencies	Intelligent network modeling for courier efficiency
[17]	Mixed delivery and pickup services	Transformer-based multi-task model	Improved pickup-sensitive delivery predictions	Addresses complexity of urban courier services
[18]	SLA delivery time prediction	Random Forest	83.86% accuracy in four experiments	ML support for inter-island postal transport

Improvement in this study (relative to all studies above): data augmentation (autoencoder-based) for small datasets; improved recognition of the minority class; systematic comparison of multiple ML models; regional case study in Bosnia and Herzegovina; interdisciplinary methodological relevance transferable to telecommunication traffic; simpler, faster to train, and implementable in standard environments without specialized hardware.

Table 2. Research Data.

Instance	Origin Post Office	Date and Time of Acceptance	Destination Post Office	Date and Time of Delivery Recording
1	71300	2 March 2022 08:48:16.347	78418	3 March 2022 09:02:41.000
2	71300	2 March 2022 09:01:11.347	77220	3 March 2022 14:39:20.300
3	71300	2 March 2022 09:16:51.577	77220	3 March 2022 14:39:20.300
4	71300	2 March 2022 09:51:18.597	72220	3 March 2022 08:33:27.537
⋮	⋮	⋮	⋮	⋮
11,138	74266	31 March 2022 13:57:48.097	78230	1 April 2022 14:30:01.000

Table 3. Overview of Derived Variables.

Variable Name	Label	Context	Role in Models
Distance between origin and destination post offices (km)	X₁	Spatial	Independent
Territorial affiliation of destination (RS, FBiH, or Brčko District)	X₂	Spatial	Independent
Day of acceptance in the week (Monday–Saturday)	X₃	Temporal	Independent
Hour of acceptance during the day (6–20 h)	X₄	Temporal	Independent
Delivery time required for the shipment (h)	Y	Temporal	Dependent

Table 4. Search Intervals for the Optimal Values of ML Model Hyperparameters.

Model	Hyperparameter	Search Interval
Random Forest	`num.trees`—number of trees in the forest	200–1000
	`mtry.ratio`—proportion of variables randomly selected at each split	0.2–1.0
	`min.node.size`—minimum number of instances in a terminal node	1–20
SVM (RBF)	`cost`—regularization parameter (C)	0.1–10
	`gamma`—RBF kernel parameter (γ)	0.0001–1
XGBoost	`nrounds`—number of boosting iterations	100–700
	`eta`—learning rate	0.01–0.3
	`max_depth`—maximum depth of each tree	2–8
	`subsample`—proportion of data used for each tree	0.5–1.0
	`colsample_bytree`—proportion of variables for each tree	0.5–1.0
kNN	`k`—number of nearest neighbors	3–31
MLP	`size`—number of neurons in the hidden layer	3–30
	`decay`—weight decay regularization rate	0.00001–0.1
	`maxit`—maximum number of training iterations	100–500

Table 5. Three Best-Ranked ML Models by Accuracy.

Model	Test Accuracy (%)	Test Precision (macro)	Test Recall (macro)	Training Accuracy (%)	Training Precision (macro)	Training Recall (macro)
SVM	73.810	0.722	0.721	74.14	0.729	0.728
C5.0	73.700	0.721	0.721	74.03	0.728	0.728
CHAID	73.390	0.718	0.717	73.16	0.719	0.717

Table 6. Confusion Matrix Obtained by Automated Modeling.

Prediction\Actual Class	Express	Delay	Total (Prediction)
Express	4629	1268	5897
Delay	1260	2534	3794
Total (actual class)	5889	3802	9691
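The standard classification metrics can be recomputed from a confusion matrix such as Table 6, where rows are predicted classes and columns are actual classes. The Python sketch below computes accuracy together with per-class and macro-averaged precision and recall. The function name `classification_metrics` is hypothetical, and macro averaging is an assumption: the article does not state which averaging scheme underlies the precision and recall values reported in Tables 5 and 7.

```python
# Confusion matrix from Table 6: rows = predicted class, columns = actual class
LABELS = ("express", "delay")
CM_TABLE6 = [[4629, 1268],
             [1260, 2534]]


def classification_metrics(cm, labels):
    """Accuracy plus per-class and macro-averaged precision and recall."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    precision, recall = {}, {}
    for i, label in enumerate(labels):
        predicted_as_i = sum(cm[i])                    # row total
        actually_i = sum(cm[r][i] for r in range(n))   # column total
        precision[label] = cm[i][i] / predicted_as_i
        recall[label] = cm[i][i] / actually_i
    return {
        "accuracy": correct / total,
        "precision": precision,
        "recall": recall,
        "macro_precision": sum(precision.values()) / n,
        "macro_recall": sum(recall.values()) / n,
    }


metrics = classification_metrics(CM_TABLE6, LABELS)
print(f"accuracy = {metrics['accuracy']:.4f}")
for label in LABELS:
    print(f"{label}: precision = {metrics['precision'][label]:.3f}, "
          f"recall = {metrics['recall'][label]:.3f}")
print(f"macro precision = {metrics['macro_precision']:.3f}, "
      f"macro recall = {metrics['macro_recall']:.3f}")
```

For this matrix the sketch gives an accuracy of about 0.739 and macro-averaged precision and recall of about 0.726. These are close to, but not identical with, the values of the top-ranked model in Table 5, so the exact correspondence between Table 6 and Table 5 cannot be confirmed from these numbers alone.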

Table 7. Three Best-Ranked ML Models by Accuracy (for Filtered Data).

Model	Test Accuracy (%)	Test Precision	Test Recall	Training Accuracy (%)	Training Precision	Training Recall
C5.0	79.460	0.748	0.653	79.33	0.769	0.677
C&R Tree	79.230	0.744	0.648	79.33	0.771	0.675
Logistic Regression	79.010	0.758	0.626	78.79	0.796	0.645

Table 8. Confusion Matrix Obtained by Automated Modeling (for Filtered Data).

Prediction\Actual Class	Express	Delay	Total (Prediction)
Express	2837	688	3525
Delay	178	463	641
Total (actual class)	3015	1151	4166
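Accuracy on the filtered data should be read against the class imbalance visible in Table 8: 3015 of the 4166 actual shipments are express, so a classifier that always predicts "express" would already reach about 72.4% accuracy, while only 463 of the 1151 delayed shipments (about 40%) are recognized. The Python sketch below computes the observed accuracy, this majority-class baseline, and Cohen's kappa, which corrects observed agreement for agreement expected by chance. The function name `accuracy_baseline_kappa` is hypothetical; kappa is not one of the metrics reported in the article and is shown only as an illustration.

```python
# Confusion matrix from Table 8: rows = predicted class, columns = actual class
CM_TABLE8 = [[2837, 688],
             [178, 463]]


def accuracy_baseline_kappa(cm):
    """Observed accuracy, majority-class baseline accuracy, and Cohen's kappa."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    row_totals = [sum(row) for row in cm]                             # predicted counts
    col_totals = [sum(cm[r][c] for r in range(n)) for c in range(n)]  # actual counts
    observed = sum(cm[i][i] for i in range(n)) / total
    # Agreement expected by chance if predictions were independent of the actual class
    expected = sum(row_totals[i] * col_totals[i] for i in range(n)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    # Accuracy obtained by always predicting the most frequent actual class
    baseline = max(col_totals) / total
    return observed, baseline, kappa


observed, baseline, kappa = accuracy_baseline_kappa(CM_TABLE8)
print(f"accuracy = {observed:.3f}, majority baseline = {baseline:.3f}, kappa = {kappa:.3f}")
```

For Table 8 this yields an accuracy of about 0.79 against a baseline of about 0.72 and a kappa of about 0.40, which makes the weak recognition of the "delay" class more visible than accuracy alone.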

Table 9. Optimal Hyperparameter Values of ML Models Created in the Second Iteration.

Model	Hyperparameter	Optimal Value
Random Forest	num.trees	204
	mtry.ratio	0.207
	min.node.size	13
SVM (RBF)	cost	9.958
	gamma	0.997
XGBoost	nrounds	433
	eta	0.243
	max_depth	5
	subsample	0.658
	colsample_bytree	0.882
kNN	k	27
MLP (nnet)	size	20
	decay	0.037
	maxit	182
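Comparing Table 9 with Table 4 shows where each optimum lies within its search interval. The Python sketch below computes the relative position of every optimal value (0 at the lower bound, 1 at the upper bound) and marks values within 5% of a bound, an arbitrary threshold chosen for illustration. An optimum close to a bound can indicate that a better value lies outside the searched interval, although it may also reflect a flat objective in that region. The sketch also converts `mtry.ratio` into a number of candidate variables per split, assuming the rule max(⌈ratio · p⌉, 1) (as used, to our understanding, by the mlr3 wrapper for ranger) and p = 4 predictors (X₁ to X₄ in Table 3, with X₂ passed as a single factor). The names `RESULTS`, `relative_position`, and `mtry_from_ratio` are hypothetical.

```python
import math

# (lower bound, upper bound) from Table 4 and optimal value from Table 9
RESULTS = {
    ("Random Forest", "num.trees"): (200, 1000, 204),
    ("Random Forest", "mtry.ratio"): (0.2, 1.0, 0.207),
    ("Random Forest", "min.node.size"): (1, 20, 13),
    ("SVM (RBF)", "cost"): (0.1, 10, 9.958),
    ("SVM (RBF)", "gamma"): (0.0001, 1, 0.997),
    ("XGBoost", "nrounds"): (100, 700, 433),
    ("XGBoost", "eta"): (0.01, 0.3, 0.243),
    ("XGBoost", "max_depth"): (2, 8, 5),
    ("XGBoost", "subsample"): (0.5, 1.0, 0.658),
    ("XGBoost", "colsample_bytree"): (0.5, 1.0, 0.882),
    ("kNN", "k"): (3, 31, 27),
    ("MLP", "size"): (3, 30, 20),
    ("MLP", "decay"): (0.00001, 0.1, 0.037),
    ("MLP", "maxit"): (100, 500, 182),
}


def relative_position(low, high, value):
    """0.0 = lower bound, 1.0 = upper bound of the search interval."""
    return (value - low) / (high - low)


def mtry_from_ratio(ratio, n_features):
    """Assumed conversion of mtry.ratio to mtry (at least one variable)."""
    return max(math.ceil(ratio * n_features), 1)


for (model, name), (low, high, value) in RESULTS.items():
    pos = relative_position(low, high, value)
    flag = "  <- within 5% of a bound" if pos < 0.05 or pos > 0.95 else ""
    print(f"{model:14s} {name:17s} {pos:5.2f}{flag}")

print("mtry =", mtry_from_ratio(0.207, 4))
```

Under these assumptions, `num.trees`, `mtry.ratio`, `cost`, and `gamma` are marked, and an `mtry.ratio` of 0.207 corresponds to a single randomly chosen variable at each split.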

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Banjanin, M.K.; Stojčić, M.; Popović, Đ.; Anđelković, D.; Jauševac, G.; Husić, M. Classification Machine Learning Models for Enhancing the Sustainability of Postal System Modules Within the Smart Transportation Concept. Sustainability 2025, 17, 8718. https://doi.org/10.3390/su17198718

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
