1. Introduction
In recent years, the food delivery sector has undergone dynamic expansion, driven both by rising customer expectations and by an increasing number of enterprises operating in the industry [1]. A central determinant of competitiveness in this market is process optimization, particularly with regard to last-mile delivery times. This dimension is critical not only for enhancing customer satisfaction but, more importantly, for improving the energy and environmental performance of distribution systems. Accelerating urbanization and growing order volumes imply that even marginal improvements in delivery time planning can contribute significantly to reducing fuel consumption and CO₂ emissions, thereby lowering operational costs. Existing studies provide quantitative evidence that improvements in routing and scheduling can simultaneously shorten delivery times and reduce energy use and emissions. For example, optimization of last-mile delivery routes has been shown to cut delivery times by 10–25% while lowering fuel consumption and CO₂ emissions by approximately 8–20%, depending on the urban context and vehicle type. Simulation-based analyses for urban courier services similarly indicate that a reduction in average delivery time is accompanied by proportional decreases in distance travelled and energy consumption, particularly in congested city centres [1]. The environmental dimension, with an emphasis on energy efficiency, is of particular importance. Delivery time optimization reduces energy consumption per delivery, which is especially pertinent for the Light Electric Vehicles (LEVs) commonly deployed in the sector. Moreover, these improvements support sustainable urban logistics strategies by reducing congestion, air pollution and noise levels.
Planning and forecasting last-mile deliveries is a multi-component process that encompasses both economic and environmental aspects. These include reductions in fuel consumption and pollutant emissions, noise mitigation and more efficient utilization of transport resources, including electric vehicles. In operational practice, every logistics decision (whether route planning, vehicle allocation, or delivery time-window selection) becomes a decisive factor in a sustainable transport system in which energy and ecological efficiency must be balanced against service quality and enterprise competitiveness.
The food industry currently faces numerous challenges in ensuring the efficient execution of distribution processes. The primary issue is delivery lead time, a parameter of key importance for customer satisfaction. Food orders must be delivered within a very short time window, usually up to 30 min. Meeting this window increases customer satisfaction and reduces the risk of order cancellations [1]. Delivery time, in turn, depends on the degree of urbanization of a city: traffic congestion, a lack of parking spaces, or zones closed to combustion-engine vehicles may cause delivery delays. Another important factor is maintaining product freshness and safety, including proper temperature conditions, isolation from contaminated environments, and control of packaging integrity [2]. Other relevant challenges in the industry include the high courier turnover rate, which can reach as much as 20%. This significantly affects service quality and increases costs, particularly the costs of training new employees [3]. Sustainable development is equally important. It concerns both the number of single-use packages consumed daily and the social pressure to implement a zero-emission fleet [1].
Such conditions underscore the importance of research in this area as a fundamental component of sustainable and energy-efficient transport systems. The development of mathematical models based on rigorous analysis of historical process data provides the basis for forecasting future trends, optimizing the use of resources and identifying potential risks and inefficiencies. In turn, this facilitates the creation of decision-support tools for both strategic and operational levels, alongside the design of solutions that foster emission reduction, improvements in energy efficiency, and the integration of renewable energy sources into transport networks.
Therefore, the aim of this study is to develop a delivery time optimization algorithm for the food delivery sector using selected machine learning methods, supporting the implementation of sustainable development principles in the operations of transport enterprises. The study is based on historical delivery data, incorporating variables such as courier characteristics, vehicle type, order execution date, and weather and traffic conditions. Using this dataset, mathematical models were constructed through multiple regression, random forest regression and gradient boosting algorithms, which were subsequently evaluated to identify the key variables influencing delivery time and energy efficiency.
In summary, this study introduces the following innovative elements:
The identified research gap is addressed through a comprehensive approach to modeling delivery time in food delivery systems, using data that include traffic conditions, weather variables, and operational parameters. In contrast to previous studies, the publication integrates data obtained from delivery platforms, which enables a more accurate representation of the dynamics of urban logistics processes. The modeling process incorporates both classical regression methods and modern machine learning algorithms, allowing for an empirical comparison of their effectiveness in predicting delivery times.
This study also contributes to filling a gap in the analysis of contextual factors, such as weather and traffic intensity, which have often been neglected or considered in a simplified manner. The inclusion of organizational components—such as courier experience and the technical condition of the vehicle—extends the research perspective, demonstrating the potential for businesses to influence delivery time reduction in practice. The developed model further allows flexible route planning and dynamic fleet management, addressing the need to adapt to changing environmental and urban conditions.
From a theoretical perspective, the study extends the discussion on the effectiveness of analytical tools in last-mile logistics, demonstrating that the use of machine learning methods—particularly the random forest model—ensures high predictive accuracy. From an applied perspective, this study presents a concept of intelligent distribution strategies that balance operational and environmental objectives, thereby contributing to the advancement of sustainable urban logistics. Consequently, the proposed solutions provide a practical pathway for reducing delivery time and costs while enhancing service quality.
This study is composed of six sections.
Section 1 introduces the study.
Section 2 provides a comprehensive literature review on last-mile food delivery, with particular emphasis on delivery time determinants and optimization approaches in the context of sustainable urban logistics.
Section 3 describes the dataset, the operational variables considered, and the methodological framework, including the specification and calibration of the regression and machine learning models.
Section 4 presents the empirical results of the developed models, reporting their predictive performance and the estimated impact of individual factors on delivery time.
Section 5 discusses the findings in relation to the existing body of research and elaborates on their managerial and policy implications.
Finally, Section 6 summarizes the main conclusions, highlights the practical contributions for logistics enterprises, and outlines avenues for future research.
2. Literature Review
In recent years, the food delivery services sector has experienced dynamic growth. The primary factors driving the industry's rising value include the development of applications that facilitate food ordering and changes in consumer habits, largely resulting from increased digital adoption during the COVID-19 pandemic [4]. In 2024, the global market value of the sector reached USD 288.84 billion, with a projected annual growth rate of 9.4% [5]. These dynamics indicate that the sector constitutes a vital segment of the economy. At the same time, ongoing urbanization and digitalization underscore the need for innovative, faster, and more environmentally sustainable solutions.
A critical element of food delivery operations is last-mile logistics, defined as the phase of transport extending from the distribution point or restaurant to the final customer, typically at their residence or workplace [6]. This stage is estimated to account for up to 50% of the total costs within the sector [7]. These costs include fuel expenditures, labor costs, and losses resulting from inefficient route planning and late deliveries. This stage is also one of the most emission-intensive phases of the distribution process, responsible for approximately 40% of total CO₂ emissions in transport [8], primarily due to low vehicle load factors and frequent trips. Increasing consumer awareness of environmental concerns, combined with regulatory pressure from European Union directives and regulations, highlights the urgency of implementing energy-efficient and environmentally sustainable solutions [9]. Among the promising approaches are advanced mathematical models designed to optimize delivery times, thereby improving the energy performance and operational efficiency of transport enterprises.
Ensuring the competitiveness of enterprises requires measures that maintain a high level of supply chain reliability, particularly with regard to the timeliness of services [10]. A comprehensive review of the literature reveals that delivery time optimization has been the subject of extensive scholarly investigation. Within this body of research, several categories of factors have been identified as exerting a significant influence on delivery time, including but not limited to:
Location-related factors [11,12]: Including the distance between distribution points, the density of delivery points, delivery zones, and the degree of urbanization of the area.
Meal preparation time [13]: Encompassing the average food preparation duration, queue length, and the current workload of the food service establishment.
Traffic conditions and delivery route [14,15]: Such as traffic congestion, the number of deliveries completed during a single route, and the time and day of order placement.
Weather conditions and random disruptions [16,17]: Including rain, snow, fog, road closures, and the availability of detours.
Driver-related factors [18,19,20]: Such as courier attributes and experience, familiarity with urban topography, and the number of couriers operating in a given area.
Order characteristics [21,22]: Particularly the complexity of the order.
Technological factors [23,24,25]: Such as the use of specialized order management models and customer communication applications.
The influence of each of these factors on delivery time is discussed below. A review of the literature reveals that the spatial distance between the point of order fulfillment and the delivery destination is a critical determinant of the duration of the distribution process [26]. It is generally recognized that greater distance results in longer delivery times. For this reason, service providers frequently limit the delivery radius to ensure that orders reach end customers within the maximum acceptable time window, often specified as 30 min [1]. Furthermore, the findings of Liu et al. show that higher restaurant density within a given region helps minimize delivery time, primarily through delivery pooling strategies and intelligent order consolidation mechanisms [12].
Hildebrandt et al. highlight that the physical order fulfillment time exerts a significant influence on the overall delivery duration to the customer. This phase frequently exceeds the time required for the courier's arrival and is often extended by heavy workload and queues within the restaurant. Additional delays may also arise during the packing and handover of orders. It is estimated that elevated operational loads at food service establishments can increase delivery times by 15 to 30 min [1].
A further aspect discussed in the literature concerns traffic conditions and delivery routes. Traffic congestion is considered one of the most significant factors that may increase order fulfillment times in food delivery services. The temporal variables associated with order placement, specifically the time of day and the day of the week, are also relevant, as they directly influence road conditions. Empirical studies indicate that under high-traffic conditions, delivery times may increase by approximately 15–30% [1]. Delivery time can be minimized by selecting routes adapted to prevailing traffic conditions [27]. Martinez-Sykora et al. indicate that the choice between main and secondary roads, depending on specific conditions, can reduce delivery time. The use of secondary roads has been shown to decrease travel time during peak traffic hours (by 10–18%), when the delivery is performed by an experienced courier (by approximately 18%), and for deliveries to gated communities (by 10–15%). In contrast, the use of main roads proves more effective in reducing delivery times under adverse weather conditions (by 12–20%) and during nighttime or off-peak hours (by 7–14%) [28].
Weather conditions represent another category of factors analyzed in the literature as potential determinants of delivery time performance. Empirical evidence indicates that, relative to neutral weather, delivery duration increases in the presence of fog (by 10%), rainfall (by 13%), snowfall or road icing (by 25%), strong winds (by 5%), and heatwaves (by 4–8%) [29,30]. Delivery performance is significantly influenced by the courier's level of professional experience and by their spatial familiarity with the urban layout [31]. An increase in work tenure has been shown to reduce delivery time by approximately 5%, 10%, and 15%, corresponding respectively to experience of up to three months, up to six months, and more than six months [32]. This effect is primarily attributable to more efficient use of routes, better knowledge of shortcuts, and faster decision-making. Another relevant determinant is the complexity of the order. It is estimated that large group orders may extend delivery times by approximately 20% compared with orders for individual customers [33]. The type of meal is also a relevant factor, with traditional or gourmet dishes extending delivery duration by as much as 30% to 50% [34]. In addition, delivery time is affected by batched delivery, that is, the consolidation of multiple orders into a single trip carried out by the same courier, which may extend fulfillment time by up to 40% [35].
Technological factors constitute the final set of parameters examined in the literature, particularly the deployment of digital tools for order distribution management. Noteworthy solutions include dynamic route optimization algorithms, which have been shown to decrease delivery times by approximately 35% [36], automated dispatch systems, which reduce them by around 10% [37], and comprehensive digital fleet monitoring platforms that integrate GPS and real-time traffic data, yielding delivery time improvements of approximately 15% [1,14].
The literature presents a range of studies employing diverse mathematical models to optimize delivery time. Key approaches include the classical Vehicle Routing Problem (VRP) [38] and its extension with time windows (VRPTW) [39], network flow models [40], stochastic vehicle routing problems [41], predictive-optimization methods [42,43], metaheuristic algorithms [44,45], dynamic programming [46], machine learning [1] and artificial intelligence-based techniques [47]. In recent years, machine learning (ML) methods have become an important research direction in modelling and predicting delivery time in food-delivery systems. These approaches rely on large operational datasets from platform-based delivery models or publicly available repositories. They enable the identification of complex, non-linear relationships between operational, environmental and behavioural variables. The growing body of literature indicates that ML-based models frequently outperform classical statistical approaches by a considerable margin, particularly in dynamic and heterogeneous urban environments. A summary of representative ML-based studies is presented in Table 1.
In summary, the literature shows that food delivery time is influenced by a wide range of operational, contextual, and organizational factors, and that advanced optimization and routing models can improve both service quality and environmental performance. However, many existing studies analyse only selected subsets of determinants, rely on limited or non-operational datasets, and rarely combine comparative evaluations of machine learning methods with an interpretable assessment of factor importance. These limitations correspond directly to the research gaps identified in the Introduction, where we emphasized the need for an integrated, data-driven approach to modelling delivery time in platform-based food delivery systems. The empirical strategy adopted in this paper is therefore designed to address these gaps and to answer the research questions.
3. Materials and Methods
The data used in the study originate from real logistics operations of a company providing food delivery services within a platform-based model. Specifically, they form a publicly available food-delivery dataset obtained from the Kaggle platform, containing 45,584 anonymised records of individual deliveries carried out in a large urban agglomeration, as described by the original data provider. The dataset includes operational information about couriers, vehicles, orders, weather conditions and traffic, and is widely used as a benchmark for delivery-time prediction tasks. In this paper, we use the original feature schema (including the weather categories and city-type labels) and apply the additional cleaning and recoding steps described below. The records are anonymized and contain no information that could enable user identification. Prior to analysis, a data verification and cleaning process was conducted. The study examined courier-specific variables, notably age, mode of transportation and its technical condition, as well as customer ratings of service performance. Additional variables included delivery distance, day and time of delivery, traffic density, and weather conditions.
In the analyzed dataset, delivery person ratings refer to ratings assigned to completed deliveries. Therefore, this variable was not fully available ex ante at the moment of order assignment and should not be interpreted as a strictly prospective predictor in real-time operational settings. In the present study, it was retained as a proxy for service performance quality, which may be associated with delivery efficiency. Future studies should replace this measure with lagged or historically aggregated courier ratings available before the analyzed delivery.
Detailed information on individual variables is presented in Table 2.
For the purpose of delivery time modelling, four distinct mathematical models were constructed.
The variable Hour of order was retained in its original operational scale (0–23), as this representation directly reflects the actual time at which an order enters the delivery system. It should be noted, however, that the relationship between order time and delivery time may be non-linear, due to demand peaks, restaurant workload, and time-varying traffic conditions observed throughout the day. For this reason, the multiple linear regression model was treated primarily as a benchmark specification, whereas the substantive interpretation of predictor effects relied mainly on non-linear machine learning models, namely Random Forest, XGBoost, and GBM. These methods are able to capture non-linear effects and interactions between variables without requiring prior transformation of the predictor into a ranked or manually recoded form.
In the present dataset, weather conditions and road traffic density describe the operational context associated with the delivery process and were not explicitly stored as separate ex ante forecast inputs recorded at the exact moment of order assignment. Nevertheless, from an operational perspective, both variables may serve as valid predictive inputs in real-time applications, provided that they are supplied through current traffic information, traffic nowcasts, and short-term weather forecasts available before dispatch. Therefore, in the current study these variables should be interpreted primarily as contextual operational descriptors in retrospective modelling, while their online predictive use would require temporally aligned ex ante data feeds.
The first model developed was a multiple linear regression model. This statistical technique models the relationship between one dependent variable and several independent variables, which makes it possible to assess the significance and impact of selected factors on the phenomenon under investigation [50].
The multiple linear regression model typically takes the form of the following equation:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon \qquad (1)$$
where:
$Y$ – the dependent variable,
$X_i$ – the value of the $i$th independent variable ($i = 1, \dots, k$),
$\beta_i$ – regression parameters (where $i = 0, \dots, k$),
$\varepsilon$ – random error term.
The model parameters are estimated using the method of least squares, which selects the model coefficients so that the sum of squared differences between the observed values and the values predicted by the model is minimized [51].
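The following minimal sketch makes the least-squares step concrete. It solves the normal equations $(\mathbf{X}^\top\mathbf{X})\boldsymbol{\beta} = \mathbf{X}^\top\mathbf{y}$ for equation (1). The code is plain Python (standard library only) for illustration and is not the R code used in the study. The function names and toy data are hypothetical. The data are generated exactly from known coefficients, so the recovered estimates can be checked.

```python
def ols_fit(X, y):
    """Estimate (beta_0, ..., beta_k) by ordinary least squares.

    X is a list of rows holding k predictor values; an intercept column of
    ones is prepended here. The normal equations (X^T X) beta = X^T y are
    solved by Gaussian elimination with partial pivoting, so X must have
    full column rank.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    m = [[sum(r[a] * r[b] for r in rows) for b in range(p)]
         + [sum(r[a] * yi for r, yi in zip(rows, y))]
         for a in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, p):
            factor = m[r][col] / m[col][col]
            for c in range(col, p + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        rhs = m[r][p] - sum(m[r][c] * beta[c] for c in range(r + 1, p))
        beta[r] = rhs / m[r][r]
    return beta


def sum_squared_residuals(X, y, beta):
    """The quantity that least squares minimizes."""
    preds = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds))


# Hypothetical toy data: two predictors, e.g. distance and courier age.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [13.0, 14.5, 18.0, 19.5, 22.5]   # generated exactly as 10 + 2*x1 + 0.5*x2
beta = ols_fit(X, y)
print([round(b, 6) for b in beta])   # → [10.0, 2.0, 0.5]
```

Statistical software such as R's `lm()` usually relies on a QR decomposition instead of forming $\mathbf{X}^\top\mathbf{X}$ explicitly, for numerical stability. The normal-equation route is used here only because it maps directly onto the least-squares criterion.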
Subsequently, a random forest model built from regression decision trees was constructed. Regression trees are machine learning algorithms designed to predict continuous values based on input features. They recursively partition the feature space into regions (leaves), and within each region the prediction is the mean value of the target variable for the samples it contains. The partitioning criterion is the minimization of the mean squared error (MSE). Each split is chosen so that the MSE of the resulting child nodes, weighted by their sizes, is as small as possible, where for a single leaf $R_j$ [52]:
$$\mathrm{MSE}(R_j) = \frac{1}{|R_j|} \sum_{i \in R_j} \left( y_i - \bar{y}_{R_j} \right)^2$$
where:
$y_i$ – actual value corresponding to observation $i$,
$R_j$ – a leaf (terminal) node after splitting, containing $|R_j|$ observations,
$\bar{y}_{R_j}$ – the mean target value in leaf node $R_j$.
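The value predicted by a leaf node of the tree is given by the formula:
$$\hat{y}(x) = \bar{y}_{R_j} = \frac{1}{|R_j|} \sum_{i \in R_j} y_i, \qquad x \in R_j$$
where $|R_j|$ is the number of training observations in leaf $R_j$ and $x$ is an input that falls into that leaf.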
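The splitting rule and the leaf prediction can be illustrated on a single numeric feature. The sketch below (plain Python, hypothetical names and toy data) scans all candidate thresholds and keeps the one with the smallest summed squared error of the two children. This is equivalent to the size-weighted MSE. The leaf prediction is the mean of the leaf's samples.

```python
def leaf_value(ys):
    """Prediction of a leaf R_j: mean target value of its training samples."""
    return sum(ys) / len(ys)


def leaf_sse(ys):
    """|R_j| * MSE(R_j): summed squared deviations from the leaf mean."""
    mu = leaf_value(ys)
    return sum((v - mu) ** 2 for v in ys)


def best_split(xs, ys):
    """Find the threshold t for the split x <= t that minimizes the summed
    squared error of the two child leaves. Returns (t, total_sse)."""
    pairs = sorted(zip(xs, ys))
    best_t, best_sse = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # tied x values cannot be separated by a threshold
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = leaf_sse(left) + leaf_sse(right)
        if total < best_sse:
            best_t, best_sse = (pairs[i - 1][0] + pairs[i][0]) / 2, total
    return best_t, best_sse


# Hypothetical toy data: distance (km) versus delivery time (min)
distance = [1, 2, 3, 4, 5, 6]
minutes = [10, 11, 10, 30, 31, 29]
t, sse = best_split(distance, minutes)
print(t)                                    # → 3.5
print(round(leaf_value([10, 11, 10]), 2))   # → 10.33
```

A full tree applies `best_split` recursively to each child until a stopping rule is met, for example a minimum node size.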
This study employed the Random Forest algorithm, an ensemble learning approach in which multiple decision trees are generated on different subsamples of the dataset and on randomly selected subsets of features. Each tree is trained independently, which enhances the model's robustness against overfitting and improves its generalization capability [1]. The final regression estimate is obtained by averaging the predictions generated by all constituent trees in the ensemble [53]:
$$\hat{y}(x) = \frac{1}{M} \sum_{m=1}^{M} \hat{y}_m(x)$$
where:
$M$ – the number of trees in the random forest,
$\hat{y}_m(x)$ – the predicted value for input $x$ generated by tree $m$.
The random forest algorithm introduces randomness both in the sampling process (bootstrap) and in the selection of features for splitting. As a result, individual trees are only weakly correlated, which substantially improves the generalization ability of the model [54].
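The sketch below shows the two sources of randomness (bootstrap sampling and random feature subsets) and the averaging step. To keep it short, it uses depth-1 trees (stumps). A real random forest, such as R's `randomForest`, grows deep trees and samples `mtry` features at every split. With only one split per tree, drawing the feature subset once per tree is equivalent. All names and data are hypothetical.

```python
import random


def _sse(v):
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v)


def fit_stump(X, y, features):
    """Depth-1 regression tree: among `features`, pick the feature/threshold
    pair whose two child leaves have the smallest summed squared error."""
    mean_all = sum(y) / len(y)
    best = (float("inf"), None, None, mean_all, mean_all)
    for f in features:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            s = _sse(left) + _sse(right)
            if s < best[0]:
                best = (s, f, t, sum(left) / len(left), sum(right) / len(right))
    _, f, t, left_mean, right_mean = best
    if f is None:  # no split possible: predict the sample mean
        return lambda row: mean_all
    return lambda row: left_mean if row[f] <= t else right_mean


def fit_forest(X, y, n_trees=50, max_features=1, seed=0):
    """Each tree sees a bootstrap sample and a random subset of features."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap, with replacement
        feats = rng.sample(range(p), max_features)    # random feature subset
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return trees


def predict_forest(trees, row):
    """Random forest estimate: average of the M individual tree predictions."""
    return sum(tree(row) for tree in trees) / len(trees)


# Hypothetical data: feature 0 drives delivery time, feature 1 is noise.
X = [[i, (7 * i) % 10] for i in range(1, 11)]
y = [10.0 if row[0] <= 5 else 30.0 for row in X]
forest = fit_forest(X, y, n_trees=50, max_features=1, seed=0)
print(predict_forest(forest, [2, 0]) < predict_forest(forest, [9, 0]))  # → True
```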
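The next model developed was XGBoost (Extreme Gradient Boosting). This algorithm, also based on the ensemble learning paradigm, uses the gradient boosting technique to build robust predictive models from an aggregation of weak learners, specifically decision trees [55]:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \qquad f_k \in \mathcal{F}$$
where:
$\hat{y}_i$ – the predicted value for observation $i$,
$K$ – the number of trees,
$f_k$ – the $k$-th regression tree, belonging to the space of regression trees $\mathcal{F}$.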
A key feature of XGBoost is the sequential addition of trees, where each subsequent tree is trained to correct the residual errors made by its predecessors. This process minimizes a regularized objective function. Regularization mitigates overfitting and enhances the model's generalization capability, that is, its performance on previously unseen data [56].
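The general form of the objective function used in the algorithm is:
$$\mathrm{Obj}^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t)}\right) + \sum_{k=1}^{t} \Omega(f_k)$$
where:
$l$ – loss function (e.g., mean squared error),
$y_i$ – observed value,
$\hat{y}_i^{(t)}$ – predicted value after tree $t$,
$\Omega(f_k)$ – the regularization term for tree $f_k$,
$n$ – the number of observations.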
Regularization and optimization of the approximated objective function enable the XGBoost model to achieve high accuracy and robustness against overfitting, even when dealing with large datasets and numerous features.
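For squared-error loss, XGBoost works with a second-order approximation of the objective in which each observation contributes a gradient $g_i$ and a Hessian $h_i$. This sketch assumes the regularizer $\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \sum_j w_j^2$ from the original XGBoost formulation, with $T$ leaves of weight $w_j$; $\lambda$ and $\gamma$ correspond to the library's `lambda` and `gamma` parameters. Under that assumption, the optimal leaf weight is $w^* = -G/(H+\lambda)$, and the gain of a split follows from the summed $G$ and $H$ of each child. The code computes both on hypothetical numbers. It illustrates the formulas only and is not the library's implementation.

```python
def grad_hess_squared_error(y_true, y_pred):
    """g and h of l = 0.5 * (y_true - y_pred)**2 with respect to y_pred."""
    return y_pred - y_true, 1.0


def leaf_weight(G, H, lam):
    """Optimal leaf weight w* = -G / (H + lambda)."""
    return -G / (H + lam)


def _score(G, H, lam):
    return G * G / (H + lam)


def split_gain(left, right, lam=1.0, gamma=0.0):
    """Reduction of the approximate regularized objective obtained by
    splitting a leaf into `left` and `right` (lists of (g, h) pairs)."""
    GL, HL = sum(g for g, _ in left), sum(h for _, h in left)
    GR, HR = sum(g for g, _ in right), sum(h for _, h in right)
    return 0.5 * (_score(GL, HL, lam) + _score(GR, HR, lam)
                  - _score(GL + GR, HL + HR, lam)) - gamma


# Hypothetical: four deliveries; the current ensemble predicts 20 min for all.
y_true = [10.0, 12.0, 30.0, 32.0]
gh = [grad_hess_squared_error(t, 20.0) for t in y_true]
left, right = gh[:2], gh[2:]
w_left = leaf_weight(sum(g for g, _ in left), sum(h for _, h in left), lam=1.0)
print(w_left)                                      # → -6.0
print(round(split_gain(left, right, lam=1.0), 4))  # → 133.0667
```

A positive gain means the split improves the regularized objective. A larger `gamma` raises the bar a split must clear, and a larger `lambda` shrinks leaf weights toward zero. In the library, each new tree's weights are additionally multiplied by the learning rate `eta` before being added to the ensemble.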
Another model developed was the Gradient Boosting Machine (GBM), an ensemble learning method that constructs a strong predictive model by combining multiple weak learners, typically decision trees [57].
A key aspect of the GBM model is the minimization of a defined loss function with respect to the predictions. The model is constructed iteratively, and at each stage it is updated additively:
$$F_m(x) = F_{m-1}(x) + \eta\, h_m(x)$$
where:
$F_{m-1}(x)$ – the prediction after iteration $m-1$,
$h_m(x)$ – the newly trained decision tree,
$\eta$ – the weight assigned to each learner (learning rate).
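The final prediction for an observation is obtained as the sum of the contributions from all iterations:
$$\hat{y}(x) = F_M(x) = F_0(x) + \eta \sum_{m=1}^{M} h_m(x)$$
where $F_0(x)$ is the initial (constant) prediction and $M$ is the number of boosting iterations.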
The learning rate parameter allows more precise control over model complexity: a lower value reduces the contribution of each tree and mitigates overfitting, although more iterations are then needed to achieve satisfactory predictive performance [58,59].
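The additive update and the role of the learning rate can be sketched for squared-error loss. For this loss, the negative gradient is simply the residual $y - F_{m-1}(x)$. The code below uses plain Python, hypothetical names and data, one feature, and depth-1 trees for brevity. It starts from the constant $F_0 = \bar{y}$ and adds $\eta\,h_m(x)$ at each iteration. It is not the R `gbm` implementation used in the study.

```python
def fit_stump_1d(xs, residuals):
    """Depth-1 regression tree on one feature, fitted to the residuals.
    Requires at least two distinct x values."""
    def sse(v):
        mu = sum(v) / len(v)
        return sum((a - mu) ** 2 for a in v)

    pairs = sorted(zip(xs, residuals))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [r for _, r in pairs[:i]]
        right = [r for _, r in pairs[i:]]
        s = sse(left) + sse(right)
        if best is None or s < best[0]:
            t = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (s, t, sum(left) / len(left), sum(right) / len(right))
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean


def gbm_fit(xs, ys, n_iter, eta):
    """Squared-error gradient boosting: F_m(x) = F_{m-1}(x) + eta * h_m(x)."""
    f0 = sum(ys) / len(ys)                 # F_0: constant initial prediction
    trees = []
    preds = [f0] * len(ys)
    for _ in range(n_iter):
        residuals = [y - p for y, p in zip(ys, preds)]   # negative gradient
        h = fit_stump_1d(xs, residuals)
        trees.append(h)
        preds = [p + eta * h(x) for p, x in zip(preds, xs)]
    return lambda x: f0 + eta * sum(h(x) for h in trees)


def train_mse(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


xs = [1, 2, 3, 4, 5, 6, 7, 8]               # hypothetical distance (km)
ys = [12, 14, 15, 20, 24, 25, 30, 33]       # hypothetical delivery time (min)
few = gbm_fit(xs, ys, n_iter=10, eta=0.1)
many = gbm_fit(xs, ys, n_iter=100, eta=0.1)
print(train_mse(many, xs, ys) < train_mse(few, xs, ys))   # → True
```

Each stump is a least-squares fit to the current residuals, so every step with $0 < \eta \le 1$ can only lower the training error. A smaller $\eta$ lowers it more slowly, which is why it has to be paired with more iterations. Training error alone does not reveal overfitting, however, so the number of iterations is normally chosen on held-out data.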
The selection of these four models was methodologically justified by the study objective, which was to compare methods that differ in interpretability, functional flexibility, and sensitivity to data complexity. Comparing multiple regression, random forest, XGBoost and GBM enables an empirical assessment of both classical statistical modelling and modern machine learning methods in predicting delivery time. Alternative methods, such as support vector regression (SVR) and neural networks, were considered but ultimately not included, for two reasons. First, in preliminary experiments they did not outperform boosting-based models in terms of predictive accuracy, while incurring substantially higher computational costs. Second, the objective of the study was not only prediction but also interpretation of the impact of operational variables on delivery time. For SVR and neural networks, this would require additional model-agnostic explainability tools for black-box models [60].
More specifically, the choice of these four models was intended to cover a meaningful spectrum of predictive approaches, ranging from an interpretable parametric benchmark to flexible ensemble methods capable of capturing non-linear effects and higher-order interactions. Multiple linear regression was included because it provides transparent coefficient-based interpretation, but its main limitation lies in the assumption of linear and additive relationships, which may be too restrictive for complex delivery-time data. Random Forest was selected as a robust non-parametric method with good resistance to overfitting and strong ability to model heterogeneous predictor effects; however, its limitations include reduced interpretability of the prediction mechanism, possible bias in variable-importance measures, and limited extrapolation beyond the range of the training data. XGBoost was included because of its high predictive efficiency and its ability to optimize complex relationships through regularized boosting, although it is more sensitive to hyperparameter specification and may become less transparent from an interpretive perspective. GBM was retained as an additional boosting-based method because it offers a useful intermediate comparison between predictive flexibility and model control; at the same time, like other sequential boosting approaches, it may be computationally more demanding, sensitive to tuning choices, and more difficult to interpret than classical regression. Thus, the selected set of models allowed comparison not only of predictive accuracy, but also of the methodological trade-offs between interpretability, flexibility, robustness, and implementation complexity.
Multiple linear regression was used as the baseline model because it relies on the same predictor set as the machine-learning methods, thereby providing a controlled reference for assessing the added value of non-linear modelling rather than the added value of a richer information set.
The hyperparameters of the models were tuned as follows. All experiments were conducted in the R environment (version 4.5.3). In the Random Forest model, 500 trees (ntree = 500) were used to improve stability, and the number of variables randomly sampled at each split (mtry) was selected automatically. In the XGBoost model, the hyperparameters (max_depth = 6, eta = 0.1, subsample = 0.8, colsample_bytree = 0.8) were selected using cross-validation (xgb.cv) with early stopping (early_stopping_rounds), which helps limit overfitting by halting boosting once the validation error stops improving. For the GBM model, the number of boosting iterations and the tree depth (n.trees and interaction.depth) were tuned using 5-fold cross-validation.
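The 5-fold cross-validation used for tuning can be illustrated generically. The data are shuffled into five folds, and each fold serves once as the validation set. The candidate hyperparameter value with the lowest mean out-of-fold RMSE is kept. In the sketch below, the tuned model is a hypothetical one-dimensional k-nearest-neighbours regressor standing in for GBM's `n.trees` and `interaction.depth`. The names, data and candidate grid are illustrative assumptions.

```python
import math
import random


def kfold_indices(n, k=5, seed=42):
    """Shuffle row indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_rmse(fit, xs, ys, k=5, seed=42):
    """Mean out-of-fold RMSE; `fit(train_x, train_y)` must return a predictor."""
    scores = []
    for fold in kfold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        sq_err = [(ys[i] - predict(xs[i])) ** 2 for i in fold]
        scores.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return sum(scores) / len(scores)


def knn_fit(n_neighbors):
    """Hypothetical tunable model: 1-D k-nearest-neighbours regression."""
    def fit(train_x, train_y):
        def predict(x):
            nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - x))
            top = nearest[:n_neighbors]
            return sum(y for _, y in top) / len(top)
        return predict
    return fit


xs = list(range(20))
ys = [2 * x + (x % 3) for x in xs]                # hypothetical data
scores = {c: cv_rmse(knn_fit(c), xs, ys) for c in (1, 3, 5)}
best = min(scores, key=scores.get)                # value with the lowest CV RMSE
print(best, round(scores[best], 3))
```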
4. Results
Prior to model development, a data quality assessment was conducted, covering completeness, temporal consistency, and the presence of outliers. The goal was to clean the database by removing records with missing values in key operational variables, which accounted for less than 1% of the dataset. Subsequently, a preliminary screening of extreme observations was performed using the three-sigma rule as a heuristic data-quality check. This step should not be interpreted as implying normality of the target variable, particularly because delivery-time data in last-mile operations may exhibit skewness and extended upper tails. Therefore, the outlier assessment was treated as an auxiliary cleaning procedure and complemented by an empirical inspection of variable distributions.
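The three-sigma screening can be expressed in a few lines. The sketch below (hypothetical data, Python `statistics` module) flags values that lie more than three sample standard deviations from the mean. As noted in the text, this is only a heuristic. With skewed delivery times, or when a few very large values inflate the standard deviation itself, it can both miss and over-flag observations. For this reason, it was combined with inspection of the distributions.

```python
import statistics


def three_sigma_flags(values, k=3.0):
    """True for values farther than k sample standard deviations from the mean."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [abs(v - mu) > k * sd for v in values]


# Hypothetical delivery times (min) with one extreme record at the end.
times = [20, 22, 25, 23, 21, 24, 26, 22, 25, 23] * 2 + [90]
flags = three_sigma_flags(times)
print(sum(flags), flags[-1])    # → 1 True
```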
The distributions of continuous variables are presented in Table 3. Most of them are approximately symmetric (|skewness| < 0.5), indicating no pronounced asymmetry. The exceptions are delivery person ratings and hour of order, which show strong and moderate left-skewness, respectively. This pattern is consistent with the characteristics of the analyzed sector: couriers typically receive high ratings from customers of delivery platforms, and most orders are placed in the afternoon and evening hours.
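The skewness classification can be reproduced with the moment coefficient $g_1 = m_3 / m_2^{3/2}$. The sketch below is illustrative. It assumes cut-offs of 0.5 for moderate and 1.0 for strong skewness, a common rule of thumb; only the 0.5 bound appears in the text. Statistical packages may apply small-sample corrections that give slightly different values. The sample ratings are hypothetical.

```python
def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def describe_skew(g, moderate=0.5, strong=1.0):
    """Label a skewness value; the thresholds are assumed rules of thumb."""
    if abs(g) < moderate:
        return "approximately symmetric"
    side = "left" if g < 0 else "right"
    level = "moderately" if abs(g) < strong else "strongly"
    return f"{level} {side}-skewed"


ratings = [4.9, 4.8, 4.9, 5.0, 4.7, 4.9, 2.5]    # hypothetical courier ratings
print(describe_skew(skewness(ratings)))          # → strongly left-skewed
```

A negative $g_1$ indicates a long left tail: most ratings cluster near the maximum, and a few low ratings pull the mean down.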
To assess the balance of categorical variables, the share of the dominant category in each variable was analyzed. A variable was considered balanced if the proportion of the most frequent category did not exceed 50%, moderately imbalanced if it ranged between 50% and 80%, and strongly imbalanced if it exceeded 80%. This analysis makes it possible to evaluate the risk of model bias arising from the dominance of a single category in the dataset. The results (Table 4) showed that most variables are well balanced. Only the variable Events in city is strongly imbalanced, because days with special events occur relatively rarely, which reflects real logistics conditions.
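The balance criterion maps directly onto a small function. The sketch below uses the 50% and 80% thresholds from the text. The function name and the example counts are hypothetical.

```python
from collections import Counter


def balance_status(labels):
    """Share of the dominant category and the resulting balance class."""
    share = max(Counter(labels).values()) / len(labels)
    if share <= 0.5:
        status = "balanced"
    elif share <= 0.8:
        status = "moderately imbalanced"
    else:
        status = "strongly imbalanced"
    return share, status


events = ["No"] * 95 + ["Yes"] * 5       # hypothetical "Events in city" counts
print(balance_status(events))            # → (0.95, 'strongly imbalanced')
```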
The collected results confirm the good quality of the data and their representativeness for the analyzed logistics process. Variability within the variables has been preserved, and the presence of mild or moderate skewness in selected variables reflects the natural operating conditions of the delivery sector. The data were therefore deemed suitable for subsequent predictive modeling.
For the purposes of this study, the dataset was divided into training and test sets in a ratio of 0.7 to 0.3 using a random holdout procedure. This split was adopted to enable a comparative assessment of model classes under the same data conditions. At the same time, for time-ordered delivery data, such a validation strategy may yield more optimistic performance estimates than a chronological split. The reported metrics should therefore be interpreted primarily as internal validation results for historical data rather than as a direct estimate of sequential real-time forecasting performance. The analysis commenced with a multiple linear regression model to verify the existence of linear relationships. For this model, the reference levels listed in Table 5 were established.
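The following sketch shows, with scikit-learn, the 70/30 random holdout described above alongside the chronological alternative mentioned as a caveat. The column name `order_time`, the helper names and the synthetic orders are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def random_holdout(df: pd.DataFrame, seed: int = 42):
    """70/30 random split, as used for the model comparison."""
    return train_test_split(df, test_size=0.3, random_state=seed)


def chronological_holdout(df: pd.DataFrame, time_col: str = "order_time"):
    """70/30 split that keeps the most recent 30% of orders for testing."""
    ordered = df.sort_values(time_col)
    cut = int(round(0.7 * len(ordered)))
    return ordered.iloc[:cut], ordered.iloc[cut:]


# Hypothetical orders placed once per hour over 100 hours.
orders = pd.DataFrame({
    "order_time": pd.to_datetime("2022-03-01") + pd.to_timedelta(np.arange(100), unit="h"),
    "delivery_time": np.arange(100) % 40 + 10,
})
train_r, test_r = random_holdout(orders)
train_c, test_c = chronological_holdout(orders)
# In the chronological split every test order is later than every training order.
print(train_c["order_time"].max() < test_c["order_time"].min())
```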
Each categorical variable listed in Table 5 was encoded using dummy variables, and the indicated reference level serves as the comparison point for interpreting the model coefficients. The estimated regression parameters for the remaining categories therefore represent the difference in predicted delivery time relative to the reference category.
For example, adopting “electric scooter” as the reference category for the vehicle type variable implies that the regression coefficients for the other vehicle types (bicycle, motorcycle, scooter) indicate how many minutes longer or shorter the delivery takes compared with deliveries made by electric scooter (holding all other variables constant). Similarly, Wednesday was chosen as the reference level for the “day of the week” variable, which allows the analysis of time differences relative to a neutral operational day. In the same way, “Sunny” weather conditions and “High” traffic density were selected as baseline categories due to their frequent occurrence in the dataset and their stable nature. This approach enables correct interpretation of categorical variables in the regression model and makes it possible to compare the relative impact of their individual categories on delivery time.
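One way to enforce a chosen reference level in pandas is to order each categorical column so that the reference category comes first, and then drop the first dummy. This is only a sketch. The column names and category labels are hypothetical and may not match the dataset's actual coding.

```python
import pandas as pd

# Reference levels in the spirit of Table 5 (hypothetical labels).
REFERENCE = {"vehicle_type": "electric_scooter", "weather": "Sunny",
             "traffic": "High", "weekday": "Wednesday"}


def encode_with_reference(df: pd.DataFrame, reference: dict) -> pd.DataFrame:
    """Dummy-encode categorical columns, omitting the chosen reference level."""
    out = df.copy()
    for col, ref in reference.items():
        others = sorted(c for c in out[col].unique() if c != ref)
        # Reference first, so drop_first=True removes exactly that level.
        out[col] = pd.Categorical(out[col], categories=[ref] + others)
    return pd.get_dummies(out, columns=list(reference), drop_first=True, dtype=int)


orders = pd.DataFrame({
    "vehicle_type": ["motorcycle", "electric_scooter", "scooter", "bicycle"],
    "weather": ["Sunny", "Fog", "Cloudy", "Sunny"],
    "traffic": ["High", "Low", "Jam", "Medium"],
    "weekday": ["Monday", "Wednesday", "Friday", "Wednesday"],
    "distance_km": [3.2, 7.5, 1.1, 4.8],
})
X = encode_with_reference(orders, REFERENCE)
print(sorted(X.columns))
```

A regression coefficient on, for example, `vehicle_type_motorcycle` is then read as the difference from electric-scooter deliveries, with the other variables held constant.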
To ensure the correctness of parameter estimation in the linear regression model, multicollinearity diagnostics were carried out for the explanatory variables. The Variance Inflation Factor (VIF) was used for this purpose, and for multi-category variables its generalized form, the Generalized Variance Inflation Factor (GVIF), was applied. VIF values exceeding 10 are commonly regarded in the literature as an indication of serious multicollinearity. The adjusted measure, GVIF adj = GVIF^(1/(2·df)), where df is the number of model terms representing the variable, is on the scale of √VIF, so the corresponding cut-off is √10 ≈ 3.16. The results reported in Table 6 show that all analyzed variables have GVIF adj values well below this threshold. The highest values were observed for vehicle condition (GVIF adj = 1.34) and hour of order (GVIF adj = 1.30). The remaining predictors fell within the range from 1.00 to 1.12, which indicates a very low degree of interdependence between explanatory variables. These findings confirm the absence of substantial multicollinearity in the linear regression model. The parameter estimates can therefore be considered stable, and the identified relationships between variables are substantively interpretable.
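The GVIF can be computed from the correlation matrix R of the regressors using the determinant formula of Fox and Monette: GVIF = det(R11)·det(R22)/det(R). Here R11 is the block for the variable's own columns and R22 is the block for all remaining columns. The sketch below also returns GVIF^(1/(2·df)), the adjusted value discussed above. For a single continuous predictor, this formula reduces to the ordinary VIF. The data and helper names are hypothetical.

```python
import numpy as np


def gvif(X: np.ndarray, groups: dict[str, list[int]]) -> dict[str, tuple[float, float]]:
    """Generalized VIF (Fox and Monette) for groups of design-matrix columns.

    X      : regressors without the intercept column (n x p)
    groups : variable name -> indices of its columns (one for a continuous
             predictor, several for the dummies of a categorical one)
    Returns {name: (GVIF, GVIF ** (1 / (2 * df)))}.
    """
    R = np.corrcoef(X, rowvar=False)
    det_all = np.linalg.det(R)
    result = {}
    for name, idx in groups.items():
        rest = [k for k in range(X.shape[1]) if k not in idx]
        g = (np.linalg.det(R[np.ix_(idx, idx)])
             * np.linalg.det(R[np.ix_(rest, rest)]) / det_all)
        result[name] = (float(g), float(g ** (1 / (2 * len(idx)))))
    return result


rng = np.random.default_rng(1)
n = 500
age = rng.normal(30, 5, n)
distance = 0.5 * age + rng.normal(0, 5, n)  # mildly related to age
weather = rng.integers(0, 3, n)             # 3 levels -> 2 dummy columns
X = np.column_stack([age, distance, weather == 1, weather == 2]).astype(float)
res = gvif(X, {"age": [0], "distance": [1], "weather": [2, 3]})
print(res)
```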
The estimated values of the regression model parameters are presented in Table 7.
Based on the conducted analysis, the intercept is estimated at 25.07 min. It represents the predicted delivery time for the reference categories of all categorical variables with all continuous predictors set to zero. This is a formal anchor rather than a realistic order: a courier age of zero, for example, lies outside the data. For continuous features, the following effects are observed:
Each additional year of the courier’s age increases the delivery time by an average of 0.42 min;
An improvement of 1 point in the courier’s rating decreases the delivery time by 2.90 min on average;
Each additional kilometer adds 0.38 min, on average, to the delivery duration;
Each additional hour in the time of order placement reduces the delivery time by 0.07 min on average.
This coefficient should be interpreted as an average linear trend within the benchmark regression model rather than as evidence of a strictly monotonic effect across all hours of the day.
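To illustrate, the continuous part of the fitted equation can be written out and used to compare two hypothetical orders. The orders differ only in the courier's age (25 vs. 35 years) and rating (4.8 vs. 4.2). All other terms, represented by the ellipsis, cancel in the difference. The courier values are invented for this example.

```latex
\begin{aligned}
\hat{y} &= 25.07 + 0.42\,\mathrm{Age} - 2.90\,\mathrm{Rating} + 0.38\,\mathrm{Distance} - 0.07\,\mathrm{Hour} + \dots \\
\Delta\hat{y} &= 0.42\,(35 - 25) - 2.90\,(4.2 - 4.8) = 4.20 + 1.74 = 5.94\ \text{min}
\end{aligned}
```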
Analysis of categorical variables indicates that, relative to Wednesday, mean delivery times are shorter by 0.38 min on Monday, 0.62 min on Tuesday, 0.56 min on Thursday, 0.79 min on Saturday, and 0.36 min on Sunday. The difference for Friday is minimal and not statistically significant (0.17 min shorter). Weather conditions (with “Sunny” as the reference category) differentiate delivery durations much more strongly. Cloudiness is associated with an average increase of 6.20 min, fog with 6.36 min, “normal” conditions with 3.64 min, sandstorms with 3.60 min, storms with 3.49 min, and wind with 3.76 min. Within the analyzed regression model, day-of-week effects therefore remain below 1 min, whereas meteorological effects range from about 3.5 to 6.4 min.
Traffic density exerts a measurable influence on delivery times within the studied data. With high traffic as the reference category, the longest delivery durations were observed during traffic congestion, resulting in an average increase of 0.86 min. Conversely, deliveries conducted under light traffic conditions were expedited by an average of 6.49 min, while moderate traffic yielded a mean reduction of 2.54 min in delivery duration.
Further analysis of auxiliary variables revealed that each additional point in vehicle condition rating corresponded with a mean decrease of 2.35 min in delivery time. Both the condition and type of vehicle contributed significantly to the observed outcomes. In comparison to the reference category, motorcycle utilization resulted in deliveries that were approximately 0.40 min faster, scooters expedited deliveries by nearly 1 min, and bicycles led to a delay of 0.83 min, although this last effect was not statistically significant. An increment in concurrent assignments along the same route led to an average increase of 2.76 min in delivery duration, and days with events in city were associated with deliveries that were extended by 10.70 min. Deliveries performed in urban areas were, on average, 2.05 min shorter than those in metropolitan settings.
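Because the linear model is additive, the reported coefficients can be combined to compare scenarios. The sketch below encodes a subset of the estimates quoted above; the feature names are hypothetical labels, and the intercept cancels in a difference. It compares two otherwise identical orders: a sunny, low-traffic one and a foggy one placed in high traffic on an event day. It ignores estimation uncertainty and only shows the arithmetic.

```python
# Subset of the coefficients (minutes) reported in the text. Keys are hypothetical
# labels; dummies are 1/0 relative to the reference levels (Sunny, High traffic,
# no event, metropolitan area).
COEF = {
    "age": 0.42, "rating": -2.90, "distance_km": 0.38, "hour": -0.07,
    "vehicle_condition": -2.35, "multiple_deliveries": 2.76,
    "weather_Cloudy": 6.20, "weather_Fog": 6.36,
    "traffic_Jam": 0.86, "traffic_Medium": -2.54, "traffic_Low": -6.49,
    "event_in_city": 10.70, "city_Urban": -2.05,
}


def predicted_difference(order_a: dict, order_b: dict) -> float:
    """Change in predicted delivery time (min) when moving from order_a to order_b.

    Missing keys count as 0, i.e. the reference level; the intercept cancels.
    """
    return sum(c * (order_b.get(k, 0) - order_a.get(k, 0)) for k, c in COEF.items())


calm = {"distance_km": 5, "traffic_Low": 1}                         # sunny, low traffic
adverse = {"distance_km": 5, "weather_Fog": 1, "event_in_city": 1}  # fog, high traffic, event
print(round(predicted_difference(calm, adverse), 2))
```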
The most pronounced extensions in delivery time were attributed to fog and cloud cover (each exceeding 6.00 min) and to days with events in city (10.70 min). Low traffic volume shortened delivery times by more than 6.00 min. Higher courier and vehicle-condition ratings, as well as the use of scooters and motorcycles instead of electric scooters, were also associated with reduced delivery times. Despite the inclusion of numerous predictors, the model accounted for only 54% of the variability in delivery time, indicating the probable influence of unobserved factors or non-linear relationships. More advanced machine learning models, specifically random forest, XGBoost, and GBM, were therefore applied next, as they are capable of modeling the complex, non-linear interactions inherent to transportation and logistics systems.
As the next approach, a random forest regression model was constructed using an ensemble of 500 decision trees. At each split node within a tree, four predictor variables were randomly selected from the full set of available features, and the optimal split was determined among these candidates. For this specification, the out-of-bag (OOB) mean squared error was 15.60 min², which corresponds to an OOB root mean squared error of about 3.95 min (√15.60).
The model accounted for 82.42% of the variance in the dependent variable, a statistic that serves as its coefficient of determination (R²). The relative importance of individual predictors is presented in Figure 1.
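The terminology used here (%IncMSE, IncNodePurity) points to the R randomForest package. Below is a roughly equivalent configuration in Python's scikit-learn, offered as a stand-in illustration on synthetic data. `max_features=4` plays the role of the four candidate predictors per split. The out-of-bag predictions yield both an OOB MSE and an OOB R².

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def fit_forest(X: np.ndarray, y: np.ndarray, seed: int = 0) -> RandomForestRegressor:
    """500 trees, 4 candidate predictors per split, out-of-bag evaluation."""
    forest = RandomForestRegressor(
        n_estimators=500,  # number of trees, as in the text
        max_features=4,    # predictors sampled at each split
        oob_score=True,    # score each tree on rows left out of its bootstrap sample
        random_state=seed,
        n_jobs=-1,
    )
    return forest.fit(X, y)


# Synthetic stand-in data: 8 predictors, delivery time in minutes.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 8))
y = 26 + 3 * X[:, 0] - 2 * X[:, 1] + 4 * np.sin(X[:, 2]) + rng.normal(0, 1, 400)
forest = fit_forest(X, y)
oob_mse = np.mean((y - forest.oob_prediction_) ** 2)  # analogue of the OOB error
print(f"OOB MSE = {oob_mse:.2f} min^2, OOB RMSE = {np.sqrt(oob_mse):.2f} min, "
      f"OOB R^2 = {forest.oob_score_:.3f}")
```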
In Figure 1, the left panel presents the Mean Decrease Accuracy (%IncMSE). This measure records the increase in out-of-bag mean squared error when the values of a given predictor are randomly permuted, averaged over all trees and scaled by the standard deviation of these differences. It reflects the variable's contribution to predictive performance, with higher values indicating greater importance. Delivery person age exhibits the most substantial influence (352.30%), followed by delivery person ratings (250.10%) and distance (208.30%).
The right-hand panel presents IncNodePurity, the regression counterpart of the Mean Decrease Gini. It is the total decrease in node impurity, measured by the residual sum of squares, that results from splits on a given variable, averaged over all trees in the ensemble. By this measure, delivery person age attained the highest ranking (498,350.21), followed by vehicle condition (463,453.04) and delivery distance (391,855.06). These variables therefore produced the largest reductions in within-node variance during tree construction, yielding the most homogeneous splits.
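The two kinds of importance can be contrasted in scikit-learn, again as a stand-in for the R implementation. Three differences from R are worth noting:
- scikit-learn normalizes impurity importances so that they sum to one;
- scikit-learn computes permutation importance on whatever data it receives (here a held-out split);
- R's %IncMSE instead uses out-of-bag rows and is scaled by the standard deviation of the differences.

The data and settings below are synthetic assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in: column 0 matters most, column 3 is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 4))
y = 26 + 4 * X[:, 0] + 2 * X[:, 1] + 1 * X[:, 2] + rng.normal(0, 1, 600)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Impurity-based: normalized total decrease in squared error from splits on each
# feature, the analogue of IncNodePurity.
impurity = forest.feature_importances_

# Permutation-based: increase in test MSE when one column is shuffled,
# the analogue of %IncMSE.
perm = permutation_importance(forest, X_te, y_te, n_repeats=10, random_state=0,
                              scoring="neg_mean_squared_error")
print(np.round(impurity, 3), np.round(perm.importances_mean, 2))
```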
The next model implemented was XGBoost, which uses the gradient boosting framework to combine weak learners sequentially. The ensemble comprised 156 trees, each with a maximum depth of six, and a learning rate of 0.1 controlled how strongly each new tree refines the current predictions. At each boosting round, a tree is added to minimize a regularized loss function. Both first- and second-order gradient statistics are used to select splits and compute leaf weights.
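The role of the first- and second-order statistics can be made concrete with the split-gain formula from the XGBoost paper (Chen and Guestrin). The minimal re-implementation below is not the library's code. For squared-error loss the gradients are g = prediction - target and the Hessians are h = 1. The parameters `lam` and `gamma` correspond to XGBoost's L2 penalty on leaf weights and its per-leaf cost. The data are hypothetical.

```python
import numpy as np


def split_gain(g, h, left, lam=1.0, gamma=0.0):
    """Second-order split gain used in XGBoost-style boosting.

    g, h : first- and second-order derivatives of the loss for each row
    left : boolean mask sending rows to the left child
    """
    def score(mask):
        G, H = g[mask].sum(), h[mask].sum()
        return G * G / (H + lam)

    everything = np.ones_like(left, dtype=bool)
    return 0.5 * (score(left) + score(~left) - score(everything)) - gamma


# Squared-error loss 0.5 * (pred - y)**2 gives g = pred - y and h = 1.
y = np.array([18.0, 20.0, 22.0, 34.0, 36.0, 38.0])    # hypothetical minutes
distance = np.array([1.0, 2.0, 3.0, 8.0, 9.0, 10.0])  # hypothetical km
pred = np.full_like(y, y.mean())                      # prediction before the new tree
g, h = pred - y, np.ones_like(y)
for threshold in (2.5, 5.5):
    print(threshold, round(split_gain(g, h, distance < threshold), 2))
```

The split at 5.5 km separates the short and long deliveries and receives the larger gain (144.0 vs. 86.4), so it would be chosen.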
Within this architecture, feature importance is assessed using Gain, which quantifies the total reduction in the loss function attributable to splits on each variable throughout the ensemble. The Gain metric thus captures the contribution of each predictor to model accuracy and serves as an analog to IncNodePurity in random forests. The relative ranking is visualized in Figure 2, providing a direct measure of each feature's utility in partitioning the data and refining the predictions.
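In R's xgboost package, `xgb.importance` reports Gain as each feature's fractional share of the total gain of the splits that use it. The normalization can be sketched on hypothetical split records as follows. Extracting such records from a fitted model is outside the scope of this illustration.

```python
from collections import defaultdict


def fractional_gain(splits):
    """Sum split gains per feature and normalize them to shares of the total."""
    totals = defaultdict(float)
    for feature, gain in splits:
        totals[feature] += gain
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return {feature: total / grand_total for feature, total in ranked}


# Hypothetical (feature, gain) records collected from every split in an ensemble.
splits = [("distance", 144.0), ("traffic_Low", 90.0), ("distance", 36.0),
          ("rating", 20.0), ("weather_Sunny", 10.0)]
shares = fractional_gain(splits)
print({f: round(s, 3) for f, s in shares.items()})
```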
Among the predictors contributing most substantially to model error reduction are delivery person rating (0.10); road traffic density—low (0.62); and distance (0.62). In this model, weather conditions exert a moderate effect; only the sunny category ranks among the top features (0.09).
As an alternative to XGBoost, a Gradient Boosting Machine (GBM) was implemented. GBM is the classical gradient boosting algorithm. Unlike XGBoost, it does not add an explicit regularization term to its objective, so overfitting is controlled mainly through shrinkage (a small learning rate), limited tree depth, subsampling and the number of trees. The model was trained with 5000 trees, each with a maximum depth of 4, and a learning rate of 0.01. Five-fold cross-validation was applied to enhance model robustness. The model used a set of 12 predictors, whose relative importances are visualized in Figure 3.
For this model, weather conditions were the most influential predictor (18.70%), followed by delivery person ratings (17.80%) and road traffic density (17.00%), whose importances were very similar. Distance also made a substantial contribution (13.80%).
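If the model was fitted with R's gbm package, five-fold cross-validation is typically used to select the number of trees that minimizes the cross-validated error. A comparable procedure in scikit-learn is sketched below, but it should be treated as a loose approximation. In particular, `max_depth=4` only approximates gbm's interaction depth. The demo also uses fewer trees and a larger learning rate than the text so that it runs quickly.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold


def cv_best_n_trees(X, y, n_trees=5000, depth=4, lr=0.01, folds=5, seed=0):
    """Choose the number of boosting iterations that minimizes mean k-fold CV MSE.

    Defaults mirror the configuration described in the text.
    """
    curves = np.zeros((folds, n_trees))
    splitter = KFold(n_splits=folds, shuffle=True, random_state=seed)
    for f, (tr, va) in enumerate(splitter.split(X)):
        model = GradientBoostingRegressor(n_estimators=n_trees, max_depth=depth,
                                          learning_rate=lr, random_state=seed)
        model.fit(X[tr], y[tr])
        # staged_predict yields validation predictions after 1, 2, ..., n_trees trees.
        for t, pred in enumerate(model.staged_predict(X[va])):
            curves[f, t] = np.mean((y[va] - pred) ** 2)
    mean_curve = curves.mean(axis=0)
    return int(np.argmin(mean_curve)) + 1, mean_curve


# Synthetic demo, shortened (300 trees, learning rate 0.1) so that it runs quickly.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 26 + 3 * X[:, 0] + 2 * np.abs(X[:, 1]) + rng.normal(0, 1, 300)
best, curve = cv_best_n_trees(X, y, n_trees=300, lr=0.1)
print(best, round(float(curve[best - 1]), 3))
```

Once the number of trees has been selected, the final model would be refit on the full training set with that many trees.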
The empirical results obtained in Section 4 directly address the research questions formulated in Section 1 and Section 2. First, the comparative evaluation of the four predictive models (multiple regression, random forest, XGBoost and GBM) clearly shows that ensemble machine learning methods, in particular random forest and XGBoost, achieve substantially higher predictive accuracy than the classical linear regression model, as evidenced by lower RMSE and MAE values and higher R² scores (Table 8). Second, the feature importance analyses presented in Figure 1, Figure 2 and Figure 3 confirm that a consistent set of key determinants exerts the strongest influence on delivery time. This set includes delivery person ratings and age, delivery distance, weather conditions and road traffic density, and it provides a quantitative answer to the question concerning the relative impact of operational and contextual factors. Third, the results demonstrate that several of these determinants are at least partly manageable by logistics enterprises (e.g., vehicle condition, courier performance, order bundling and route planning). This directly links the modelling outcomes to the practical goal of using delivery time optimization as a tool for enhancing competitiveness within the framework of sustainable urban logistics.
5. Discussion
To summarize, a comparative assessment of the calculated performance metrics for all evaluated models was conducted. The results of this comparison are presented in Table 8, enabling objective evaluation of model predictive quality based on established quantitative indicators.
The random forest model, which aggregates the predictions of many independently grown decision trees, achieved the highest predictive performance. It had the lowest root mean square error (RMSE = 3.96), mean absolute error (MAE = 3.14) and mean absolute percentage error (MAPE = 13.66%), as well as the highest coefficient of determination (R² = 0.81). Each tree is trained on a bootstrap sample of observations and considers only a random subset of features at each split. This decorrelates the trees, so averaging them reduces model variance and yields stable, accurate predictions. Random forests are also comparatively robust to overfitting and typically require less intensive regularization and hyperparameter tuning.
The second-best results were obtained with the XGBoost model. Its R² was marginally lower (also 0.81 after rounding) and its forecast errors slightly higher: RMSE (4.09), MAE (3.26), and MAPE (14.13%). This places XGBoost immediately behind random forest while offering finer control over model complexity. XGBoost allows learning rates, sampling proportions and regularization penalties to be tuned, facilitating an optimized trade-off between error minimization and the mitigation of overfitting. A comparative visualization of all model results is provided in Figure 4.
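The four metrics used in this comparison can be computed as follows. This is a minimal NumPy sketch; the MAPE line assumes that no observed delivery time is zero. The toy values are invented for checking the arithmetic.

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict[str, float]:
    """RMSE and MAE in the target's units (minutes), MAPE in percent, and R^2."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_true - y_pred
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(100 * np.mean(np.abs(err / y_true))),  # assumes no zero times
        "R2": float(1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)),
    }


print(regression_metrics([20, 30, 40, 50], [22, 27, 41, 50]))
```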
An analysis of predictor rankings for each model was also conducted (Table 9). In the multiple regression model, the factors associated with the largest increases in delivery time were fog and clouds (each adding approximately six minutes) and days with events in city (adding approximately 10.7 minutes).
For the random forest model, the most influential factors are delivery person age, rating and distance. For the XGBoost model, delivery person ratings, low road traffic density and distance have the greatest impact. For the GBM model, weather conditions, delivery person ratings and road traffic density are the most important.
Although courier age appears as a highly important predictor in the random forest model, the proposed models are designed exclusively for forecasting delivery time and analysing operational determinants of service duration, not for supporting hiring, firing or scheduling decisions at the individual level. Any practical deployment should therefore avoid using age as a decision variable in personnel management; instead, age-related effects should be interpreted only in aggregate form or replaced by less sensitive operational proxies (e.g., tenure or experience), with additional checks to ensure that the predictive system does not induce systematic disadvantage for specific age groups.
It should be noted that the analyzed machine learning models use different mechanisms to compute variable importance. The random forest measures are based on the increase in error after permuting a variable (%IncMSE) and on the decrease in node impurity, which for regression trees is measured by the residual sum of squares (IncNodePurity). XGBoost uses the gain, i.e., the reduction in the regularized loss achieved by splits on a variable, while GBM evaluates the relative influence of each variable on minimizing the loss function. Discrepancies in the detailed ranking of features are therefore expected and do not undermine the credibility of the results. What matters is that a largely common set of variables emerges across the models as having a key impact on delivery time: courier rating, delivery distance, weather conditions, and traffic congestion. This convergence supports the stability of the results and their practical usefulness.
An additional methodological limitation concerns the validation strategy. The reported results were obtained using a random 70/30 train–test split, which is suitable for comparing alternative modelling approaches within the same dataset, but may overestimate predictive performance when observations are temporally ordered and partially autocorrelated. For this reason, the present results should be interpreted mainly as evidence of relative model performance in historical data. Future research should extend the evaluation framework by applying chronological holdout schemes or rolling-origin validation in order to assess temporal generalization more rigorously.
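A rolling-origin (expanding-window) evaluation of the kind proposed above could be organized with scikit-learn's `TimeSeriesSplit`. This sketch assumes that the rows are already sorted by order time. The linear model and the drifting synthetic data are placeholders.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit


def rolling_origin_rmse(X, y, n_splits=5, model_factory=LinearRegression):
    """Expanding-window evaluation: each fold trains on earlier rows, tests on later ones.

    Rows of X and y must already be sorted by order time.
    """
    scores = []
    for train_idx, test_idx in TimeSeriesSplit(n_splits=n_splits).split(X):
        model = model_factory().fit(X[train_idx], y[train_idx])
        err = y[test_idx] - model.predict(X[test_idx])
        scores.append(float(np.sqrt(np.mean(err ** 2))))
    return scores


rng = np.random.default_rng(0)
t = np.arange(600)
X = np.column_stack([rng.uniform(1, 20, 600)])             # hypothetical distance (km)
y = 15 + 0.4 * X[:, 0] + 0.01 * t + rng.normal(0, 2, 600)  # slow upward drift over time
scores = rolling_origin_rmse(X, y)
print([round(s, 2) for s in scores])
```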
A methodological limitation should be noted with regard to the variable delivery person ratings. In the present dataset, this measure was recorded after the completion of the focal delivery, which means that it should not be interpreted as a purely ex ante predictor for real-time forecasting. Its association with delivery time may nevertheless reflect broader and more persistent aspects of courier service quality and behavioral performance. For practical deployment, a more appropriate solution would be to use historical courier ratings aggregated from earlier deliveries.
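A historical rating of the kind suggested above can be derived without leakage by averaging each courier's ratings over strictly earlier deliveries. The column names `courier_id`, `order_time` and `rating` are hypothetical.

```python
import pandas as pd


def add_prior_rating(df: pd.DataFrame) -> pd.DataFrame:
    """Add each courier's mean rating over strictly earlier deliveries.

    shift(1) excludes the current delivery's own rating, so the feature is
    available before dispatch; a courier's first delivery gets NaN.
    """
    out = df.sort_values(["courier_id", "order_time"]).copy()
    out["prior_mean_rating"] = (
        out.groupby("courier_id")["rating"]
        .transform(lambda r: r.shift(1).expanding().mean())
    )
    return out


deliveries = pd.DataFrame({
    "courier_id": ["A", "A", "A", "B", "B"],
    "order_time": pd.to_datetime(["2022-03-01 12:00", "2022-03-01 13:00",
                                  "2022-03-02 19:00", "2022-03-01 12:30",
                                  "2022-03-01 20:00"]),
    "rating": [4.0, 5.0, 4.5, 3.0, 4.0],
})
res = add_prior_rating(deliveries)
print(res[["courier_id", "order_time", "prior_mean_rating"]])
```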
This issue is also relevant from the perspective of feature-ablation analysis. In future work, it would be advisable to compare the full specification with reduced models built on predefined feature groups and with an ex ante feature set excluding post hoc variables such as delivery-specific ratings. Such analyses would provide a clearer estimate of the performance contribution attributable to each family of predictors and improve the operational interpretability of the modelling framework.
A related consideration applies to weather conditions and road traffic density. In the present database, these variables characterize the realized delivery context, but this does not diminish their practical relevance for prediction. In real-time deployment settings, equivalent information can be obtained before dispatch from current traffic monitoring systems, traffic nowcasts, and short-term weather forecasts. Accordingly, their strong association with delivery time should be understood as evidence of operational importance, while their direct implementation requires temporally valid ex ante input streams.
It should also be emphasized that the relatively lower importance of Hour of order in some model outputs does not imply that this factor is operationally irrelevant. Its effect is likely to be context-dependent and non-monotonic, as different times of day are associated with different levels of demand intensity, meal preparation workload, and traffic congestion. Therefore, the influence of order time should not be assessed solely on the basis of the linear regression coefficient, but rather in the context of the non-linear models applied in this study, which are more suitable for representing such time-dependent patterns.
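A small synthetic example shows why a near-zero linear coefficient is compatible with a strong hour-of-day effect. When lunch and dinner peaks are placed symmetrically within the service day, the least-squares slope vanishes even though the effect spans about 4 minutes. The hourly profile is invented for illustration.

```python
import numpy as np

# Hypothetical hourly effect (minutes) with lunch (12:00) and dinner (18:00) peaks,
# placed symmetrically around the middle of an 08:00-22:00 service day.
hours = np.arange(8, 23)  # 8, 9, ..., 22
effect = 4 * np.exp(-(hours - 12) ** 2 / 2) + 4 * np.exp(-(hours - 18) ** 2 / 2)

slope = np.polyfit(hours, effect, 1)[0]  # least-squares linear trend (min per hour)
spread = effect.max() - effect.min()     # actual size of the hour-of-day effect
print(f"linear slope = {slope:.3f} min/h, peak-to-trough = {spread:.2f} min")
```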
At the same time, the reported variable-importance rankings should not be interpreted as a substitute for formal ablation experiments. While they indicate which predictors were most influential within each modelling framework, they do not quantify the marginal predictive contribution of broader feature families, such as traffic-weather variables, temporal variables, operational variables, or courier-vehicle attributes. A more rigorous assessment of feature-family contribution would require grouped ablation analyses comparing model performance after sequential removal of predefined sets of predictors.
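Below is a sketch of a leave-one-family-out variant of the grouped ablation described above. The feature families, the random forest settings and the synthetic data are assumptions. In practice, the same split and model configuration as for the full model would be reused.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split


def grouped_ablation(X, y, groups, seed=0):
    """Test RMSE of the full model and of models refit without each feature family."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)

    def test_rmse(cols):
        model = RandomForestRegressor(n_estimators=200, random_state=seed)
        model.fit(X_tr[:, cols], y_tr)
        return float(np.sqrt(np.mean((y_te - model.predict(X_te[:, cols])) ** 2)))

    all_cols = list(range(X.shape[1]))
    results = {"full": test_rmse(all_cols)}
    for name, cols in groups.items():
        kept = [c for c in all_cols if c not in cols]
        results[f"without {name}"] = test_rmse(kept)
    return results


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = 26 + 4 * X[:, 0] + 3 * X[:, 1] + 0.5 * X[:, 2] + X[:, 3] + rng.normal(0, 1, 500)
# Hypothetical families: traffic-weather (cols 0-1), temporal (2), courier-vehicle (3-5).
families = {"traffic-weather": [0, 1], "temporal": [2], "courier-vehicle": [3, 4, 5]}
results = grouped_ablation(X, y, families)
print({k: round(v, 2) for k, v in results.items()})
```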
From an operational perspective, the developed models are intended to serve as a predictive layer that can be integrated into existing dispatching and routing tools rather than as a standalone optimisation engine. In practice, predicted delivery times can be used in three ways:
- as inputs to VRPTW-based route planners, to check time-window feasibility more accurately;
- to support dynamic courier–order assignment when new orders arrive;
- to trigger re-routing when predicted arrival times indicate a high risk of delay.

Because more accurate predictions make these checks more reliable, such integration has the potential to increase on-time delivery rates and reduce both delivery cancellations and cost per order, for example by lowering the number of failed attempts and enabling more efficient use of vehicles and driver working time.
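The time-window check mentioned above might look like the following minimal sketch. It propagates predicted leg times along a fixed stop sequence and flags stops whose predicted arrival would exceed the end of the promised window minus a safety slack. The `Stop` structure and all numbers are hypothetical. This is a feasibility check that a planner could call, not a VRPTW solver.

```python
from dataclasses import dataclass


@dataclass
class Stop:
    order_id: str
    predicted_leg_min: float  # model-predicted travel time from the previous stop
    service_min: float        # handover time at the customer
    window_end_min: float     # latest promised arrival, minutes after dispatch


def late_stops(route: list[Stop], start_min: float = 0.0, slack_min: float = 0.0) -> list[str]:
    """Return ids of stops whose predicted arrival exceeds window end minus slack."""
    clock, late = start_min, []
    for stop in route:
        clock += stop.predicted_leg_min            # predicted arrival time
        if clock > stop.window_end_min - slack_min:
            late.append(stop.order_id)             # candidate for re-routing
        clock += stop.service_min                  # departure after handover
    return late


route = [Stop("A", 12.0, 2.0, 20.0), Stop("B", 9.0, 2.0, 26.0), Stop("C", 8.0, 2.0, 32.0)]
print(late_stops(route), late_stops(route, slack_min=5.0))
```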
From an empirical standpoint, the main strengths of the reported results are their high predictive accuracy, their robustness across multiple modelling approaches and the convergence of feature importance rankings. Together, these enhance confidence in the stability and practical relevance of the identified determinants of delivery time. At the same time, the models are subject to several limitations. They rely on data from a single platform-based operator and urban agglomeration, and they omit certain potentially relevant behavioural and institutional variables; both limitations may restrict the generalizability of the findings.
Despite these limitations, the results have important implications for using food delivery time modelling as a tool for developing competitive advantages in logistics within the context of sustainable development. The identification of key, partly controllable drivers of delivery time provides a concrete basis for designing operational policies that simultaneously shorten delivery times, reduce energy use and improve service reliability. By integrating the proposed predictive models into decision-support systems for fleet management and route planning, logistics enterprises can better align their operations with sustainability objectives, for example by minimizing unnecessary vehicle kilometres, favouring low-emission vehicles and adapting to adverse environmental conditions while maintaining a high level of customer service.
6. Conclusions
Systematic analysis of food delivery time data enables the identification and quantification of critical determinants influencing order fulfillment speed and operational efficiency, thereby facilitating the enhancement of customer satisfaction levels and strengthening the competitive positioning of food delivery services within the market landscape. Understanding the influence of such factors enables route optimization and better resource management, shortening delivery times and reducing the number of deliveries, which minimizes energy consumption and operational costs. Advanced analytics and automation integrated into daily operational decisions enable companies to respond quickly to ongoing challenges and dynamically adapt logistics processes to changing market conditions.
This article addresses delivery time optimization from the perspective of distribution process energy efficiency. The study develops and analyzes four predictive models: multiple regression, random forest (an ensemble of decision trees), XGBoost, and GBM. The best fit, evaluated using selected forecast error metrics, was achieved by the random forest and Extreme Gradient Boosting (XGBoost) models. Weather conditions, delivery person age and rating, delivery distance, and road traffic density were identified as the most influential factors on the phenomenon under study.
The application of tree-based ensembles (random forest) and gradient boosting models (XGBoost, GBM) allows for highly accurate predictions of delivery time, clearly surpassing the traditional regression approach. Critical success factors are appropriate regularization and parameter optimization, which minimize the risk of overfitting and improve the overall effectiveness of decision-making in last-mile logistics, including food delivery.
The obtained analysis results have significant practical implications for the food delivery sector, in which order lead time is one of the key parameters shaping both competitiveness and energy consumption in logistics processes. The conducted modelling confirmed that delivery time is determined not only by operational factors such as distance or time of order execution, but also to a large extent by contextual variables, including weather conditions, traffic congestion, and the experience and performance quality of the courier.
From a practical standpoint, it is particularly important that factors related to work organization (e.g., courier rating, technical condition of the vehicle, order bundling) can be directly managed by companies, which creates an opportunity to shorten delivery time without the need for high capital expenditure. The results further indicate that adverse weather conditions and traffic jams significantly extend delivery time, which justifies the implementation of flexible route planning and delivery time prediction based on real-time data. The applied machine learning models, particularly the Random Forest model, demonstrated high predictive performance, which confirms the rationale for implementing data analytics in fleet management and route planning systems.
The presented findings therefore provide a basis for developing intelligent distribution strategies that can shorten delivery times, improve service punctuality, and reduce energy consumption in last-mile logistics. The analysis clearly showed that the developed models provide highly accurate delivery time predictions, which can serve as a basis for operational strategies aimed at reducing idle periods, improving vehicle utilisation and lowering fuel or energy costs, in line with the principles of sustainable logistics, although these operational effects were not directly measured in this study.
Future research directions may include the application of other machine learning methods and their integration with neural network models. It is also possible to build hybrid decision models that combine classic operations research techniques with machine learning algorithms such as deep learning or reinforcement learning, or with digital twin simulations. Furthermore, modeling delivery time in last-mile logistics based on real-time data could further streamline processes by monitoring routes, detecting random events, and enabling automated vehicle control through IT systems.
Future extensions should also include grouped ablation experiments in order to quantify the marginal contribution of major feature families and to compare the full model with temporally valid ex ante predictor sets.
An important extension of the present framework would also be the use of pre-existing historical courier ratings instead of delivery-specific post hoc evaluations, which would improve the temporal validity of predictor design in future predictive models. In subsequent research, we also plan to extend the set of compared algorithms to include additional state-of-the-art tree-based ensembles, such as LightGBM, in order to further verify the robustness of the obtained results.