Article

A CatBoost-Based Prediction Framework for Logistics Industry Prosperity Index to Support Sustainable Decision-Making: An Empirical Study from China

1 School of International Education, Lanzhou University of Finance and Economics, Lanzhou 730101, China
2 School of Traffic and Transportation, Lanzhou Jiaotong University, Lanzhou 730070, China
3 School of Civil Engineering and Hydraulic Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
* Author to whom correspondence should be addressed.
Sustainability 2026, 18(5), 2178; https://doi.org/10.3390/su18052178
Submission received: 19 January 2026 / Revised: 19 February 2026 / Accepted: 22 February 2026 / Published: 24 February 2026
(This article belongs to the Section Sustainable Transportation)

Abstract

The logistics industry is a vital engine of economic growth, yet its prosperity is shaped by complex and dynamic factors. Accurate forecasting of the Logistics Industry Prosperity Index (LPI) is essential for optimizing resource allocation, improving operational efficiency, and mitigating potential risks, thereby supporting sustainable development and digital transformation. However, existing forecasting models often lack flexibility and interpretability and struggle with complex nonlinear data. To address these challenges, this study proposes a prediction framework based on the CatBoost algorithm and constructs an end-to-end prediction process that integrates Bayesian optimization for hyperparameter tuning with a multidimensional evaluation system. The framework is validated on a multidimensional dataset of 12 key indicators from Lanzhou City, China, spanning January 2022 to March 2025. Empirical results show that the CatBoost model significantly outperforms traditional and other machine learning approaches, including ARIMA, SVM, and XGBoost, achieving an R² of 0.963 and a MAPE of 0.001%. From a theoretical perspective, this study enriches logistics prosperity forecasting and early-warning methodologies with an accurate and robust learning-based framework. From a practical perspective, it provides governments and logistics enterprises with a reliable, data-driven tool for real-time decision support, strategic planning, and proactive risk management.

1. Introduction

China’s economy is at a pivotal turning point, shifting from rapid expansion toward high-quality development. Optimizing the industrial structure and improving economic efficiency have therefore become priorities for governments at all levels, enterprises, and the public. Against this macroeconomic backdrop, the logistics industry, often called the “goldmine at the feet of enterprises,” has emerged as an important new growth engine. By integrating resources, optimizing their allocation, and creating value, it generates substantial profits for enterprises and serves as a foundational pillar of national economic stability.
The Logistics Industry Prosperity Index (LPI) serves as the sector’s “barometer,” tracking 12 sub-indices that include total business volume, workforce size, and new orders. As the industry becomes more complex and more exposed to global fluctuations, accurate LPI forecasting becomes increasingly important. Reliable forecasts let stakeholders anticipate market shifts rather than merely react to them, linking current operational data to future strategic planning and helping the industry remain resilient under economic volatility.
Research on LPI forecasting methods has substantial practical value for guiding macroeconomic policy, optimizing resource allocation, and mitigating potential risks. Governments need precise predictions to design supportive regulations and infrastructure plans that match future demand, while enterprises rely on forecasts for inventory management, capital investment, and maintaining competitive advantages. Without accurate predictions, the industry risks resource waste and operational inefficiency, both of which undermine sustainable development. Developing robust forecasting mechanisms is therefore a practical necessity for the long-term health of the logistics ecosystem, not merely an academic exercise.
To address the poor flexibility and weak interpretability of current LPI prediction models, this paper proposes a method based on the CatBoost algorithm. Whereas existing models often struggle with nonlinear shocks and complex time-series data, this study leverages CatBoost’s ordered boosting and symmetric tree structures to alleviate gradient bias and overfitting. By building an end-to-end prediction process that integrates Bayesian optimization and a multidimensional evaluation system, we aim to provide an accurate and reliable decision-support tool and to show that the proposed framework outperforms traditional methods while offering the rigor needed to support the high-quality development and digital transformation of the modern logistics industry.
Research on forecasting in the logistics industry has evolved significantly over the past decade. Huang et al. analyzed and predicted the development of the modern logistics industry in Henan Province using a group of grey prediction models and buffer operators; their results indicated that the industry grew steadily [1]. Extending this scope to the national level, Xu et al. focused on demand forecasting for the Chinese logistics industry and introduced a novel adaptive grey model to address time-critical prediction accuracy; comparative analysis showed that the proposed model outperformed other methods in short-term forecasting [2]. Similarly, Guo and Li forecasted carbon emissions in the regional logistics industry. Comparing the traditional GM(1,N) model, a GM(1,N) model with background value optimization, and a GM(1,N) model with expression optimization, they found that the last performed best [3]. In line with environmental objectives, Chen et al. predicted carbon emission levels in China’s logistics industry with a PSO-SVR model in the context of the “dual carbon” goals. They used grey relational analysis to screen influencing factors and the particle swarm optimization (PSO) algorithm to tune the penalty coefficient and kernel function parameters of the support vector regression (SVR) model, showing that the PSO-SVR model can support the “dual carbon” targets and the upgrading of the logistics industry [4]. Furthermore, Wen et al. predicted the development trend of the “Internet Plus” logistics industry under the “Belt and Road” strategy, providing insights for adjusting strategic approaches and development directions [5].
Studies of the Logistics Performance Index, the World Bank's cross-country benchmark that shares the abbreviation LPI with the prosperity index studied here, primarily employ two technical approaches: traditional statistics and machine learning. Among traditional statistical models, d’Aleo et al. used an explanatory linear regression framework to investigate the mediating role of the LPI in the relationship between the Global Competitiveness Index (GCI) and gross domestic product (GDP) across European countries from 2007 to 2014 [6]. Martí et al. proposed a data envelopment analysis (DEA) method to construct a composite logistics performance indicator (DEA-LPI), enabling benchmarking and comparative evaluation of logistics performance among countries within the LPI system [7]. Vishal et al. integrated expert opinions with the fuzzy ISM–MICMAC approach to identify logistics performance indicators critical to economic development, thereby establishing a logistics-oriented framework for understanding and forecasting future economic performance [8]. Hanif et al. applied a vector autoregression (VAR) model to examine the dynamic time-series relationships between modern logistics development and economic growth, capturing both short-term fluctuations and long-term trends [9]. Hasan et al. argued that the relative importance and centrality of the six LPI dimensions had not been sufficiently examined from a global perspective; they therefore used network analysis (NA) to model interdependencies among the indicators and identify the core, most influential components of the LPI system [10].
In machine learning, researchers have introduced more sophisticated algorithms to improve predictive performance. Qazi et al. investigated temporal dependencies among the six LPI dimensions using a Bayesian Belief Network (BBN), capturing probabilistic, time-dependent relationships between indicators and achieving a prediction accuracy of 88.1% [11]. Jonasíková et al. combined time-series analysis with clustering techniques to assess the evolution of the LPI from 2007 to 2022, providing performance indicators to support logistics system management and decision-making [12]. Shepherd et al. evaluated the ability of machine learning algorithms to predict the LPI from a large-scale dataset of national socioeconomic characteristics; the best-performing model explained nearly 90% of the observed variation in the LPI and achieved prediction errors within 6% on unseen data [13]. Babayigit et al. applied multigene genetic programming (MGGP), an emerging machine learning technique, to rank countries by LPI, highlighting its ability to generate both linear and nonlinear predictive models and to support adaptive prioritization of logistics indicators [14]. Roy et al. proposed a two-stage framework in which K-means clustering first partitions LPI datasets into a finite number of homogeneous clusters, after which multivariate adaptive regression splines (MARS) capture complex nonlinear relationships between LPI dimensions and key macroeconomic variables [15]. Jomthanachai et al. applied both linear and nonlinear machine learning algorithms to predict logistics performance; an artificial neural network performed best and produced precise trend forecasts [16]. Baydar et al. examined the relationships among logistics performance, environmental degradation, and economic growth across 38 OECD countries using ensemble machine learning models, including random forest (RF), extreme gradient boosting (XGBoost), and the Light Gradient Boosting Machine (LightGBM); their results indicated that LPI prediction depends largely on economic growth variables, followed by trade openness [17]. Finally, Jahana et al. conducted a comprehensive analysis of prosperity indicators across European countries using hierarchical clustering, identifying distinct social, economic, and environmental clusters, with interpretations supported by global indices and comparative analysis [18].
The CatBoost algorithm, developed by Prokhorenkova et al. in 2017, is a gradient boosting algorithm designed for classification and regression tasks and has shown distinct advantages in prediction across multiple domains [19]. In medicine, Zhang et al. used CatBoost to identify depression in middle-aged and elderly populations [20], and Kuo et al. developed a diagnosis system for feline infectious peritonitis with the same algorithm [21]. In engineering, Guo et al. applied CatBoost to predict indoor PM2.5 concentrations [22].
In summary, the literature shows a clear trajectory in logistics industry forecasting, from early grey prediction models and traditional statistical techniques, such as linear regression and DEA, to machine learning algorithms capable of capturing nonlinear and dynamic relationships. Traditional methods established a solid foundation for understanding logistics development and its economic linkages, while recent studies increasingly favor machine learning approaches such as BBN, SVR, and ensemble models to achieve higher accuracy and to address issues such as carbon emissions and LPI ranking. CatBoost in particular has performed strongly in classification and regression tasks across various domains, suggesting that integrating it into logistics research could improve predictive precision and support strategic decision-making in a rapidly evolving industry. Building on this foundation, this study applies CatBoost models to logistics industry prosperity index forecasting and establishes an end-to-end prediction framework. The dataset comprises logistics-related indicators for Lanzhou City from January 2022 to March 2025, and Bayesian optimization automatically tunes the hyperparameters used to generate predictions. Comparative experiments with conventional models, including ARIMA, SVR, and XGBoost, demonstrate the framework’s advantages in prediction accuracy, generalization capacity, and interpretability.

2. Methodology

In model construction, CatBoost uses decision trees (symmetric, or oblivious, trees by default) as base learners and applies gradient boosting, iteratively fitting each new tree to the gradient of a loss function such as the mean squared error (MSE) to reduce the deviation between predicted and actual values. To further enhance generalization, CatBoost adds a regularization penalty to the loss function [23]; in the formulation below this is an L2 (ridge) penalty λ on the leaf values, which constrains model complexity and suppresses overfitting while preserving fitting accuracy. Figure 1 [24] illustrates the architecture of the algorithm.
CatBoost is well suited to LPI forecasting primarily because its ordered boosting mechanism mitigates prediction shift and overfitting, a critical advantage for the short time series typical of regional economic forecasting. Unlike standard gradient boosting implementations, which grow asymmetric trees, CatBoost uses symmetric (oblivious) trees; this structure acts as a natural regularizer, producing balanced trees that generalize better and are less prone to overfitting on small samples. In addition, the algorithm handles categorical features natively without extensive preprocessing, which preserves the integrity of mixed-type logistics indicators and reduces information loss. Together, robustness to overfitting, efficient handling of diverse data types, and strong generalization make CatBoost a strong candidate among comparable gradient boosting algorithms for the complex, nonlinear task of LPI prediction.
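The ordering principle behind ordered boosting can be illustrated with a toy sketch, assuming squared-error loss: each sample's baseline prediction is built only from the labels of samples that precede it in a random permutation, so its residual never uses its own label. The functions `ordered_residuals` and `naive_residuals` are hypothetical and greatly simplify CatBoost's actual scheme, which maintains supporting models over several permutations; the sketch shows the idea, not the library's implementation.

```python
import numpy as np


def ordered_residuals(y, perm, prior=0.0):
    """Residuals y_i - pred_i, where pred_i uses only labels earlier in `perm`."""
    y = np.asarray(y, dtype=float)
    preds = np.empty_like(y)
    running_sum, count = 0.0, 0
    for idx in perm:
        # The prediction for sample idx is formed before y[idx] is added,
        # so the sample's own label cannot leak into its residual.
        preds[idx] = running_sum / count if count else prior
        running_sum += y[idx]
        count += 1
    return y - preds


def naive_residuals(y):
    """Residuals against the full-sample mean, which includes each sample's own label."""
    y = np.asarray(y, dtype=float)
    return y - y.mean()
```

In practice the permutation would be drawn at random, for example with `np.random.default_rng(0).permutation(len(y))`; the naive version is included only for contrast, since its residuals are computed from a mean that already contains each target.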
The specific steps are as follows.
1. Initialization
(1) Input: training data D = {(x_i, y_i)}, i = 1, …, n.
(2) Parameters: T, the number of boosting iterations (the number of trees in the final model); λ, the L2 regularization coefficient.
(3) Initial prediction: set a constant initial prediction for all samples, usually the global mean of the target variable:
$$\hat{y}_i^{(0)} = \frac{1}{n}\sum_{k=1}^{n} y_k$$
2. Ordered boosting and unbiased gradient calculation (for iterations t = 1, …, T)
(1) Calculate the gradient for each sample i. For the squared-error loss $L = \tfrac{1}{2}\left(y_i - \hat{y}_i\right)^2$, the negative gradient is the residual, that is, the difference between the true value $y_i$ and the prediction $\hat{y}_i^{(t-1)}$:
$$g_i^{(t,s)} = \frac{\partial L\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \hat{y}_i^{(t-1)}} = -\left(y_i - \hat{y}_i^{(t-1)}\right) \tag{1}$$
In Equation (1), $g_i^{(t,s)}$ is the gradient at the i-th data point in the t-th boosting iteration under the s-th random permutation; $y_i$ is the true target value of the i-th data point; $\hat{y}_i^{(t-1)}$ is the prediction from the previous t − 1 boosting iterations; and $L\left(y_i, \hat{y}_i^{(t-1)}\right)$ is the loss function.
(2) Calculate the second derivative (the diagonal Hessian entry) for each sample i, which is used to update the model parameters.
The Hessian captures the curvature of the loss function, enabling more precise adjustments to the model. For regression with the squared-error (MSE) loss, this second derivative is constant and equal to 1 for every sample:
$$h_i^{(t,s)} = \frac{\partial^2 L\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \left(\hat{y}_i^{(t-1)}\right)^2} = 1 \tag{2}$$
In Equation (2), $h_i^{(t,s)}$ is the second derivative of the loss with respect to the prediction $\hat{y}_i^{(t-1)}$ for the i-th data point in the t-th boosting round. The index s denotes the random permutation of the training samples used by ordered boosting: $\hat{y}_i^{(t-1)}$ is computed only from samples that precede i in that permutation, so the gradient of sample i is not biased by its own label.
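3. Model construction and update
(1) Use the features of the full training data D as input.
(2) Take the unbiased gradients obtained in the previous step as targets and evaluate candidate splits with the standard second-order split gain:
$$\mathrm{Gain} = \frac{1}{2}\left[\frac{\left(\sum_{i \in I_L} g_i\right)^2}{\sum_{i \in I_L} h_i + \lambda} + \frac{\left(\sum_{i \in I_R} g_i\right)^2}{\sum_{i \in I_R} h_i + \lambda} - \frac{\left(\sum_{i \in I_P} g_i\right)^2}{\sum_{i \in I_P} h_i + \lambda}\right]$$
where $I_L$, $I_R$, and $I_P$ are the index sets of samples in the left child, right child, and parent node, respectively.
To fit these gradients, CatBoost builds a new decision tree $f_t$, typically a symmetric (oblivious) tree, in which all nodes at the same depth share the same split condition.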
(3) Calculate the output value of each leaf node j of the tree:
$$\gamma_j^{(t)} = -\frac{\sum_{i \in I_j} g_i^{(t)}}{\sum_{i \in I_j} h_i^{(t)} + \lambda}$$
where $I_j$ is the index set of instances in leaf j and λ is the L2 regularization parameter.
(4) Model update
$$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + \eta f_t(x_i)$$
where η is the learning rate, $\hat{y}_i^{(t)}$ is the model's prediction for sample i after the t-th iteration, and $f_t(x_i) = \gamma_{j(i)}^{(t)}$ is the output value of the leaf j(i) that contains instance i.
4. Output the final model and predictions
After T iterations, the final model F is the initial value plus the learning-rate-weighted sum of all T trees:
$$F(x) = \hat{y}^{(0)} + \eta \sum_{t=1}^{T} f_t(x)$$
A new sample is passed through each tree in turn; the tree outputs are summed with weight η and the initial value is added to obtain the final prediction:
$$\hat{y}_{\text{new}} = F\left(x_{\text{new}}\right)$$
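To make steps 1 to 4 concrete, the following toy implementation starts from the global mean, computes the squared-error gradients and unit second derivatives, grows depth-limited symmetric trees by maximizing the second-order split gain summed over all nodes of a level, sets leaf outputs to −ΣG/(ΣH + λ), and sums the trees as in the final model F(x). It is a sketch under stated assumptions, not CatBoost's implementation: it uses plain rather than ordered boosting, supports squared-error loss only, and tries every observed value as a threshold. All function names are hypothetical.

```python
import numpy as np


def _score(G, H, lam):
    # G^2 / (H + lam) per node; the max() guard only matters for empty nodes when lam == 0
    return G ** 2 / np.maximum(H + lam, 1e-12)


def fit_oblivious_tree(X, g, h, depth, lam):
    """Grow a symmetric tree: each level applies one (feature, threshold) to all nodes."""
    n, m = X.shape
    node = np.zeros(n, dtype=int)  # current node index of each sample
    splits = []
    for level in range(depth):
        n_nodes = 2 ** level
        G_p = np.bincount(node, weights=g, minlength=n_nodes)
        H_p = np.bincount(node, weights=h, minlength=n_nodes)
        best_gain, best_split = -np.inf, None
        for f in range(m):
            for thr in np.unique(X[:, f])[:-1]:
                child = node * 2 + (X[:, f] > thr)
                G_c = np.bincount(child, weights=g, minlength=2 * n_nodes)
                H_c = np.bincount(child, weights=h, minlength=2 * n_nodes)
                # Second-order split gain, summed over every node of this level
                gain = 0.5 * (_score(G_c, H_c, lam).sum() - _score(G_p, H_p, lam).sum())
                if gain > best_gain:
                    best_gain, best_split = gain, (f, thr)
        if best_split is None:  # every feature is constant; stop growing
            break
        f, thr = best_split
        splits.append(best_split)
        node = node * 2 + (X[:, f] > thr)
    n_leaves = 2 ** len(splits)
    G = np.bincount(node, weights=g, minlength=n_leaves)
    H = np.bincount(node, weights=h, minlength=n_leaves)
    values = -G / np.maximum(H + lam, 1e-12)  # leaf outputs (0 for empty leaves)
    return splits, values


def predict_tree(X, splits, values):
    node = np.zeros(len(X), dtype=int)
    for f, thr in splits:
        node = node * 2 + (X[:, f] > thr)
    return values[node]


def fit_boosting(X, y, n_trees=100, depth=2, lr=0.1, lam=3.0):
    y = np.asarray(y, dtype=float)
    base = float(y.mean())                 # step 1: global mean
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):
        g = pred - y                       # Equation (1) with L = 0.5 * (y - y_hat)^2
        h = np.ones_like(y)                # Equation (2)
        splits, values = fit_oblivious_tree(X, g, h, depth, lam)
        pred = pred + lr * predict_tree(X, splits, values)  # step 3(4): model update
        trees.append((splits, values))
    return base, trees


def predict(X, base, trees, lr=0.1):
    # Step 4: F(x) = y_hat(0) + lr * sum over t of f_t(x)
    return base + lr * sum(predict_tree(X, s, v) for s, v in trees)
```

Because a symmetric tree applies one split per level, the split search scores a candidate by its total gain across all nodes at that depth, which is what distinguishes it from the node-by-node search of asymmetric trees.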

3. Empirical Study

3.1. Data Preparation and Process Design

To forecast the logistics industry prosperity index accurately, this study builds a forecasting framework on the CatBoost model following a structured process. First, we use the monthly logistics industry dataset for Lanzhou City from January 2022 to March 2025, taken from the “2022–2025 Annual Logistics Industry Data Analysis Report of Lanzhou City”. The dataset is complete, with no missing values, and has been converted into a standardized format with 12 key indicators, such as total business volume, new orders, and average inventory level, as detailed in Table 1, Table 2 and Table 3. To ensure data quality, seasonal adjustment was applied to the 11 indicators other than ‘Business Activity Expectations’ (a survey-based expectation indicator), and the original data were normalized with the min–max method. The entropy method was then used to determine the weight of each indicator, and these weights were used to synthesize a weighted business climate index for January 2022 to December 2024. The results are shown in Table 4.
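A minimal sketch of the standard min–max normalization and entropy weight method described above follows. It assumes non-negative normalized values and a weighted sum of the normalized indicators as the composite; the text does not state its exact variant (for example, any shift applied before taking logarithms, or the scaling that produces index values around 40 to 55), seasonal adjustment is omitted, and all function names are hypothetical.

```python
import numpy as np


def min_max_normalize(X):
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # constant columns map to 0
    return (X - lo) / span


def entropy_weights(Xn):
    """Entropy-method weights for an (n_samples, n_indicators) non-negative matrix."""
    n = Xn.shape[0]
    col_sums = Xn.sum(axis=0)
    P = Xn / np.where(col_sums > 0, col_sums, 1.0)  # share of each sample per indicator
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(P > 0, P * np.log(P), 0.0)  # convention: 0 * log 0 = 0
    e = -plogp.sum(axis=0) / np.log(n)               # entropy of each indicator
    e = np.where(col_sums > 0, e, 1.0)               # constant indicator: zero weight
    d = 1.0 - e                                      # degree of divergence
    return d / d.sum()


def composite_index(X):
    Xn = min_max_normalize(X)
    w = entropy_weights(Xn)
    return Xn @ w, w  # weighted index on a 0-1 scale, plus the weights
```

Indicators whose normalized values are spread unevenly across months have lower entropy and therefore receive larger weights, which is the rationale of the entropy method.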
The variables are defined as follows: x1 (total business volume), x2 (new orders representing customer demand), x3 (average inventory level), x4 (inventory turnover frequency), x5 (capital turnover rate), x6 (equipment utilization rate), x7 (logistics service pricing), x8 (core business profit), x9 (core business cost), x10 (completed fixed asset investment), x11 (number of employees), and x12 (business activity expectations).
Second, the prediction model is established. The split ratio is set at 0.9, partitioning the 39 monthly samples into training, test, and prediction sets of 32, 4, and 3 samples (82.1%, 10.3%, and 7.7%), respectively. The independent variables are the 12 indicators x1 to x12 defined above, and the dependent variable is the weighted prosperity index. The CatBoost model is trained for 500 iterations to ensure sufficient learning and reduce the risk of underfitting. The maximum depth of each tree is limited to 6 to control complexity and reduce the risk of overfitting. A learning rate of 0.1 balances training speed against generalization, and an L2 regularization coefficient of 3.0 strengthens the constraints on leaf weights, improving robustness. A feature subset ratio of 1.0 ensures that all features are used in every iteration, avoiding feature sampling randomness. The task type is set to regression to match the prosperity index prediction task.
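The reported settings can be collected into a configuration. This sketch maps them, as our assumption, onto argument names of `CatBoostRegressor` from the `catboost` Python package (`iterations`, `depth`, `learning_rate`, `l2_leaf_reg`, `rsm` for the feature subset ratio, and `loss_function`). The split helper `chronological_split` is hypothetical: the text does not say whether the 36 training and test months were split in time order.

```python
# Hyperparameters from the text, mapped onto CatBoostRegressor argument names
# (the mapping, especially "rsm" for the feature subset ratio, is our assumption).
CATBOOST_PARAMS = {
    "iterations": 500,        # number of boosting rounds (trees)
    "depth": 6,               # maximum depth of each symmetric tree
    "learning_rate": 0.1,
    "l2_leaf_reg": 3.0,       # L2 regularization coefficient (lambda)
    "rsm": 1.0,               # fraction of features considered: use all of them
    "loss_function": "RMSE",  # squared-error regression objective
}


def chronological_split(n_total=39, n_test=4, n_predict=3):
    """Hypothetical time-ordered split: training months, then test, then prediction."""
    n_train = n_total - n_test - n_predict
    idx = list(range(n_total))
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]


train_idx, test_idx, predict_idx = chronological_split()
# Percentage of the 39 samples in each part, rounded to one decimal place
shares = [round(100 * len(part) / 39, 1) for part in (train_idx, test_idx, predict_idx)]
# shares == [82.1, 10.3, 7.7]
```

With `catboost` installed, the model would then be created as `CatBoostRegressor(**CATBOOST_PARAMS)` and fitted on the training rows, for example `model.fit(X[train_idx], y[train_idx], eval_set=(X[test_idx], y[test_idx]))`, where `X` and `y` stand for the indicator matrix and the weighted index.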
The feature importance results in Figure 2 provide insight into the drivers of the LPI:
A. Dominant drivers (x1, x2, x12)
The model identifies Total Business Volume (x1) and New Orders (x2) as the two most significant predictors, with importance scores of 0.89 and 0.87, respectively. This aligns with economic logic: business volume reflects current operational scale, while new orders signal future market demand. Business Activity Expectations (x12) follows closely (0.82), suggesting that market confidence and sentiment are strongly predictive of the industry’s prosperity. These top three features act as the primary “engine” of the index.
B. Operational drivers (x10, x6, x5)
Indicators related to operational capacity and efficiency—Fixed Asset Investment (x10), Equipment Utilization Rate (x6), and Capital Turnover Rate (x5)—form the second tier of importance. This indicates that infrastructure investment and resource efficiency are necessary supports for prosperity but fluctuate less dramatically than market demand variables.
C. Weak predictors (x11, x8)
Interestingly, Number of Employees (x11) and Core Business Profit (x8) show the lowest importance scores. This might suggest that within the specific context of the Lanzhou dataset, employment levels and profits are relatively stable or lag behind the immediate changes in the prosperity index, making them less useful for short-term prediction compared to volume and orders.

3.2. Model Training and Result Evaluation

To further evaluate the model’s ability to predict industry trends, the CatBoost model was trained on the standardized dataset as follows. First, categorical features were processed automatically to preserve the integrity of the original data. Next, the ordered boosting mechanism was applied to improve prediction accuracy incrementally until training was complete. Finally, the feature data for January to March 2025 were input to generate predicted prosperity index values for those three months.
Regarding evaluation criteria, in addition to conventional regression indicators such as the mean squared error (MSE), mean absolute error (MAE), root mean squared error (RMSE), and coefficient of determination (R²), this study introduces several indicators suited to the characteristics of logistics prosperity index prediction. The median absolute error (MAD) is robust to outliers, limiting the influence of extreme fluctuations on the assessed accuracy. The mean absolute percentage error (MAPE) expresses prediction deviations as a percentage of actual values, giving an intuitive measure of precision across different prosperity levels. The explained variance score (EVS) quantifies how much of the variance in the index the model explains, reflecting how well predictions follow actual fluctuation patterns. The mean squared logarithmic error (MSLE) measures discrepancies on a logarithmic scale and penalizes underestimation more heavily than overestimation of the same magnitude, which makes it suitable for indicators with exponential growth or wide fluctuations. Together, these indicators form a multidimensional evaluation system for assessing the accuracy and reliability of logistics prosperity prediction models in complex industry environments. The relevant formulas [24] are as follows:
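$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$
$$\mathrm{MAD} = \operatorname{median}\left(\left|y_i - \hat{y}_i\right|\right)$$
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$
$$\mathrm{EVS} = 1 - \frac{\sigma^2\left(y - \hat{y}\right)}{\sigma^2\left(y\right)}$$
$$\mathrm{MSLE} = \frac{1}{n}\sum_{i=1}^{n}\left[\log\left(1 + y_i\right) - \log\left(1 + \hat{y}_i\right)\right]^2$$
Here, $y_i$ is the true value, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean of the true values, n is the number of samples, $\sigma^2$ denotes the variance, and median denotes the median.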
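The evaluation formulas translate directly into NumPy. This is a minimal sketch with a hypothetical function name; it assumes 1-D arrays, nonzero true values for MAPE, and values greater than −1 for MSLE, and it uses the population variance (as `np.var` does) for EVS.

```python
import numpy as np


def regression_metrics(y, y_hat):
    """Compute the eight evaluation indicators defined above."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    err = y - y_hat
    mse = np.mean(err ** 2)
    return {
        "MSE": mse,
        "MAE": np.mean(np.abs(err)),
        "RMSE": np.sqrt(mse),
        "R2": 1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2),
        "MAD": np.median(np.abs(err)),                    # median absolute error
        "MAPE": 100.0 * np.mean(np.abs(err) / np.abs(y)),  # in percent
        "EVS": 1.0 - np.var(err) / np.var(y),
        "MSLE": np.mean((np.log1p(y) - np.log1p(y_hat)) ** 2),
    }
```

scikit-learn provides equivalents in `sklearn.metrics` (`mean_squared_error`, `mean_absolute_error`, `r2_score`, `median_absolute_error`, `mean_absolute_percentage_error`, `explained_variance_score`, `mean_squared_log_error`); note that its `mean_absolute_percentage_error` returns a fraction rather than a percentage.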

4. Results and Verification

Compared with existing research findings, the CatBoost model achieves a test-set R² of 0.963 and a MAPE of only 0.001%, whereas models such as ARMA and ARIMA typically exhibit prediction errors of 1.5% to 2%. The explained variance score (EVS) is 0.968, indicating that the model captures the vast majority of the fluctuations in the data. The mean squared logarithmic error (MSLE) is 0.001, indicating high prediction accuracy on a logarithmic scale; by contrast, traditional machine learning models such as PSO-SVM require complex preprocessing that often reduces efficiency.
To further validate the predictive performance of the CatBoost model, this study compared multiple forecasting methods for the logistics prosperity index from January to March 2025 (Table 5). The CatBoost model shows the highest predictive accuracy: its forecasts of 47.39, 41.98, and 53.65 are close to the true values of 46.39, 40.62, and 55.51 calculated by the entropy method, with smaller deviations than the other models.
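The size of these three forecast errors can be checked with simple arithmetic. The sketch below uses only the six values quoted above; note that these are the January to March 2025 prediction months, not the test set on which R² and MAPE are reported.

```python
predicted = [47.39, 41.98, 53.65]  # CatBoost forecasts, January to March 2025
actual = [46.39, 40.62, 55.51]     # entropy-method index values for the same months

abs_err = [abs(p - a) for p, a in zip(predicted, actual)]
ape = [100 * e / a for e, a in zip(abs_err, actual)]  # absolute percentage errors
mape = sum(ape) / len(ape)
```

The absolute errors are about 1.00, 1.36, and 1.86 index points, giving a MAPE of roughly 2.95% over these three months; March is the only month in which the model underestimates.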
To visually demonstrate the fit between the predicted and true values for each prediction method, this study presents a multi-method prediction comparison line chart of the logistics prosperity index from January 2022 to March 2025 (Figure 3).
The empirical results of the comparative experiments, as visualized in Figure 3, substantiate the CatBoost model as a dependable tool for forecasting the logistics industry prosperity index.
(1)
Regarding predictive accuracy and robustness, Figure 3 clearly shows that the data results generated by the CatBoost model closely match the actual values, particularly excelling in identifying turning points between the fourth quarter of 2024 and the first quarter of 2025. This suggests that CatBoost’s ordered boosting mechanism plays a pivotal role in reducing gradient bias and preventing data overfitting. In contrast, XGBoost displays a noticeable lag in its prediction curve, whereas the random forest prediction curve appears excessively smooth. These observations imply that CatBoost exhibits superior generalization and adaptability when processing complex economic data, such as the logistics industry prosperity index.
(2)
In terms of data processing capabilities, CatBoost’s exceptional performance is underscored by its inherent ability to effectively manage categorical features. The logistics industry prosperity index is computed from weighted contributions across diverse business categories, incorporating comprehensive data from multiple logistics enterprises. For models that necessitate intricate preprocessing of categorical features, substantial discrepancies may emerge between predicted values and actual outcomes at specific nodes, thereby compromising model accuracy. Conversely, CatBoost inherently circumvents these issues at an algorithmic level, thus preventing critical information loss due to the improper handling of complex data and yielding predictions that are more closely aligned with real-world values, enhancing model reliability.
(3)
Concerning practical applicability, this study systematically validates the overall advantages of CatBoost. As Figure 3 shows, CatBoost not only produces more precise predictions than the other models but also tracks actual trends closely, a stability that is crucial in practice and gives government bodies and logistics enterprises a reliable reference for decision-making. Relative to traditional machine learning models, CatBoost combines high precision with practical usability: it runs stably and is easy to maintain, offering a new method for monitoring logistics industry prosperity with both scientific rigor and practical value. In summary, the visual comparisons and qualitative analyses above indicate that CatBoost is a sound and effective solution for forecasting logistics industry prosperity indices, owing to its accuracy, robustness, adaptability, and interpretability.
In conclusion, the visual results obtained by applying the model to actual data confirm the quantitative analysis: CatBoost performs best on the logistics industry prosperity index prediction task, and its ability to fit strongly nonlinear relationships and resist overfitting makes it a reliable tool for industry prosperity early warning and policymaking.
Figure 4 shows the SHAP (SHapley Additive exPlanations) feature importance bar chart, which makes the model's predictions interpretable.
A. Demand-side drivers dominate
The analysis shows that Total Business Volume (x1), New Orders (x2), and Business Activity Expectations (x12) are the three most influential predictors, together accounting for 61.9% of the total SHAP feature importance. This indicates that market demand is the primary driver of logistics prosperity and that future expectations strongly influence current industry sentiment, findings consistent with economic theory and industry practice.
B. Operational efficiency is secondary
Features related to operational performance (x6 Equipment Utilization, x5 Capital Turnover, x4 Inventory Turnover) rank in the middle tier, suggesting that they are supporting factors rather than primary drivers: high utilization rates amplify prosperity but cannot create it without demand.
C. Employment and profit are lagging indicators
Number of Employees (x11) and Core Business Profit (x8) have the lowest importance scores (0.004 and 0.006, respectively). This suggests that employment levels are stable relative to fluctuations in prosperity and that profits may be driven by cost structures independent of volume; both are therefore more likely to be lagging indicators than predictors.
D. Direction of effects
All significant features show positive correlations with LPI:
Higher business volume → Higher prosperity
More new orders → Higher prosperity
Stronger expectations → Higher prosperity
This consistency validates the model’s alignment with economic logic and makes the predictions intuitively interpretable for policymakers.
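Subsections A to D rest on two summaries of a SHAP matrix: each feature's share of the total mean absolute SHAP value, and the sign of the relationship between a feature's values and its SHAP values. The sketch below computes both with NumPy. `shap_summary` and `top_k_share` are hypothetical helper names, and the sketch assumes Figure 4 ranks features by mean |SHAP|, in which case the 61.9% figure in subsection A would correspond to `top_k_share(summary, 3)`. For a fitted CatBoost model, the SHAP matrix can be obtained with `model.get_feature_importance(Pool(X, y), type="ShapValues")`, whose last column holds the expected value and should be dropped first. The example runs on synthetic data, not the study's.

```python
import numpy as np


def shap_summary(shap_values, X, feature_names):
    """Per feature: (share of total mean |SHAP|, direction of effect)."""
    shap_values = np.asarray(shap_values, dtype=float)
    X = np.asarray(X, dtype=float)
    mean_abs = np.abs(shap_values).mean(axis=0)  # global importance per feature
    shares = mean_abs / mean_abs.sum()           # normalized to sum to 1
    summary = {}
    for j, name in enumerate(feature_names):
        if np.std(X[:, j]) == 0 or np.std(shap_values[:, j]) == 0:
            direction = "undetermined"           # correlation is undefined
        else:
            r = np.corrcoef(X[:, j], shap_values[:, j])[0, 1]
            direction = "positive" if r > 0 else "negative"
        summary[name] = (float(shares[j]), direction)
    return summary


def top_k_share(summary, k):
    """Combined importance share of the k most important features."""
    shares = sorted((share for share, _ in summary.values()), reverse=True)
    return sum(shares[:k])


# Synthetic check: for a linear model with independent features, the SHAP
# value of feature j is w_j * (x_j - mean(x_j)).
rng = np.random.default_rng(0)
X = rng.uniform(30, 70, size=(36, 3))
w = np.array([0.5, 0.3, -0.1])
shap_values = w * (X - X.mean(axis=0))
summary = shap_summary(shap_values, X, ["x_a", "x_b", "x_c"])
print(summary)
print(top_k_share(summary, 2))
```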
Partial Dependence Plots (PDPs) show the marginal effect of a single feature on the predicted outcome, averaged over the observed values of all other features. They help policymakers understand how changes in an indicator's value translate into shifts in the prosperity index. Table 6 gives a partial dependence summary for the top 6 features.
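A minimal brute-force sketch of one-dimensional partial dependence, assuming only that the model is exposed as a `predict` function over a NumPy matrix: each grid value is substituted into the chosen column for every row, and the predictions are averaged. `partial_dependence_1d` and `effect_range` are illustrative names; scikit-learn provides `sklearn.inspection.partial_dependence` for compatible estimators. Taking the spread (max minus min) of the curve is one natural reading of an "effect range"; this section does not state exactly how the corresponding column of Table 6 was derived.

```python
import numpy as np


def partial_dependence_1d(predict, X, feature, grid):
    """Average prediction when column `feature` is set to each grid value."""
    X = np.asarray(X, dtype=float)
    values = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, feature] = v  # overwrite the feature for every row
        values.append(float(np.mean(predict(X_mod))))
    return np.asarray(values)


def effect_range(pd_values):
    """Spread of a partial dependence curve (max minus min)."""
    return float(np.max(pd_values) - np.min(pd_values))


# Synthetic example: a known linear model, so the PD slope should be 0.5.
def toy_predict(X):
    return 0.5 * X[:, 0] + 0.2 * X[:, 1]


rng = np.random.default_rng(1)
X = rng.uniform(20, 80, size=(36, 2))
grid = np.linspace(20, 80, 7)
pd_curve = partial_dependence_1d(toy_predict, X, feature=0, grid=grid)
print(pd_curve, effect_range(pd_curve))
```

Because `toy_predict` is linear with coefficient 0.5 on the first column, the curve rises by 5 points per 10-unit grid step, giving an effect range of 30.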
Key insights from the partial dependence analysis are as follows:
A. Non-Linear Thresholds Identified
Equipment utilization rate (x6) shows the strongest marginal effect on LPI:
Below 50%: LPI drops precipitously (indicating severe underutilization);
50–65%: Optimal operating range with steady LPI gains;
Above 65%: Diminishing returns (capacity constraints);
Total Business Volume (x1) exhibits a plateau effect:
Values below 45: Strong positive impact per unit increase;
Values 45–65: Steady but diminishing marginal returns;
Values above 65: Minimal additional LPI gains.
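Thresholds and diminishing returns of this kind can be checked by comparing the average slope of a PD curve within candidate segments. In the sketch below, the breakpoints 50 and 65 mirror the ranges stated for x6, while the curve itself is synthetic (steep, then moderate, then nearly flat) and is not the study's PD output. `segment_slopes` is a hypothetical helper that linearly interpolates the curve at each segment boundary.

```python
import numpy as np


def segment_slopes(grid, pd_values, breakpoints):
    """Average slope of a PD curve on each segment between breakpoints."""
    grid = np.asarray(grid, dtype=float)
    pd_values = np.asarray(pd_values, dtype=float)
    edges = [grid[0], *breakpoints, grid[-1]]
    slopes = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Interpolate the curve at the segment ends, then take rise over run.
        rise = np.interp(hi, grid, pd_values) - np.interp(lo, grid, pd_values)
        slopes.append(float(rise / (hi - lo)))
    return slopes


# Synthetic curve: slope 1.0 below 50, 0.4 from 50 to 65, 0.05 above 65.
grid = np.arange(20.0, 81.0, 1.0)
curve = (np.minimum(grid, 50) + 0.4 * np.clip(grid - 50, 0, 15)
         + 0.05 * np.maximum(grid - 65, 0))
print(segment_slopes(grid, curve, breakpoints=[50.0, 65.0]))
```

A sharp drop in slope from one segment to the next is the numerical signature of the "diminishing returns" and "plateau" patterns described above.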
B. Interaction Effects Observed
The PDPs reveal the following interaction effects:
High equipment utilization (x6) combined with strong new orders (x2) has a synergistic effect on LPI;
Low business activity expectations (x12 < 40) act as a bottleneck, limiting the positive effects of other indicators;
Capital turnover (x5) plays a foundational role: when it falls below 40%, it constrains the effect of increases in business volume.
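Single-feature PDPs average over the other features, so interaction claims such as the x6 and x2 synergy above are usually examined with a two-way PD surface. The sketch below, with hypothetical helper names, computes such a surface by brute force and measures how far it departs from the best additive fit (row effect plus column effect); a residual near zero indicates no interaction. This is a simplified relative of Friedman's H-statistic rather than the procedure used in the study, and both models in the example are synthetic.

```python
import numpy as np


def partial_dependence_2d(predict, X, f1, f2, grid1, grid2):
    """Average prediction with columns f1 and f2 fixed at each pair of grid values."""
    X = np.asarray(X, dtype=float)
    surface = np.empty((len(grid1), len(grid2)))
    for i, a in enumerate(grid1):
        for j, b in enumerate(grid2):
            X_mod = X.copy()
            X_mod[:, f1] = a
            X_mod[:, f2] = b
            surface[i, j] = np.mean(predict(X_mod))
    return surface


def interaction_residual(surface):
    """What is left after removing the best additive (row + column) fit."""
    row_effect = surface.mean(axis=1, keepdims=True)
    col_effect = surface.mean(axis=0, keepdims=True)
    return surface - (row_effect + col_effect - surface.mean())


def additive_model(X):
    return X[:, 0] + 2.0 * X[:, 1] + X[:, 2]


def synergistic_model(X):
    return X[:, 0] * X[:, 1] / 100.0 + X[:, 2]


rng = np.random.default_rng(2)
X = rng.uniform(20, 80, size=(36, 3))
grid = np.linspace(20, 80, 4)
for model in (additive_model, synergistic_model):
    surface = partial_dependence_2d(model, X, 0, 1, grid, grid)
    print(model.__name__, np.abs(interaction_residual(surface)).max())
```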
C. Practical Implications for Policymakers
Based on the partial dependence analysis, the following policy implications emerge:
Primary leverage point: the equipment utilization rate (x6) has the largest partial dependence effect range among the top six features (27.8 points, from 35.12 to 62.92; Table 6), even though it ranks only in the middle tier by SHAP importance. Policies that help firms optimize equipment use (e.g., sharing platforms and lease programs) could significantly boost industry prosperity;
Early warning: new orders (x2) and business activity expectations (x12) can be monitored as leading indicators, since sharp declines below 40–45% may signal an imminent LPI downturn;
Threshold-targeted policy design: emergency support when indicators fall below warning thresholds, optimization programs when indicators are in the growth zone, and innovation incentives when indicators reach the plateau zone;
Coordinated intervention: the PDPs show that improving a single indicator in isolation has a limited effect, whereas a coordinated approach targeting utilization, volume, and expectations simultaneously yields the greatest LPI improvements.
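The tiered policy logic above can be written as a simple rule set. This is a hypothetical sketch: the warning level of 45 is the upper end of the 40–45 band mentioned for new orders and expectations, the plateau level of 65 is borrowed from the x1 and x6 ranges in subsection A, and applying one pair of thresholds to every indicator is a simplifying assumption, not a rule stated by the study.

```python
# Hypothetical thresholds (see lead-in): the upper end of the 40-45 warning
# band for new orders and expectations, and the ~65 plateau level.
WARNING_LEVEL = 45.0
PLATEAU_LEVEL = 65.0
LEADING_INDICATORS = ("x2", "x12")  # new orders, business activity expectations


def policy_zone(value, warning=WARNING_LEVEL, plateau=PLATEAU_LEVEL):
    """Map an indicator value to the policy tier described in the text."""
    if value < warning:
        return "emergency support"
    if value < plateau:
        return "optimization program"
    return "innovation incentives"


def early_warning(latest):
    """Return the leading indicators that fall below the warning level."""
    return [name for name in LEADING_INDICATORS if latest[name] < WARNING_LEVEL]


jan_2025 = {"x1": 45.61, "x2": 42.50, "x12": 55.00}  # from Table 3
print(early_warning(jan_2025))
print({name: policy_zone(v) for name, v in jan_2025.items()})
```

Applied to the January 2025 values in Table 3 (x2 = 42.50, x12 = 55.00), the rule flags new orders only and places x1 and x12 in the growth zone.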

5. Discussion

The CatBoost algorithm used in this study has several theoretical advantages that contribute to its strong performance in predicting the LPI. Its ordered boosting mechanism mitigates the gradient bias (prediction shift) of conventional gradient boosting and thereby reduces overfitting, improving generalization and robustness. Its symmetric (oblivious) trees apply the same split condition to every node at a given depth, which keeps the trees balanced and improves training efficiency and accuracy. In addition, CatBoost handles categorical features natively, without complex preprocessing, which prevents information loss and maintains data integrity. Together, these properties enable CatBoost to capture complex nonlinear relationships in the LPI data and to produce accurate, reliable predictions.
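To make the symmetric tree structure concrete, the sketch below implements prediction with an oblivious tree: every node at a given depth applies the same split, so a depth-d tree has 2^d leaves and the leaf index is simply the d test outcomes read as bits. The class, the bit ordering, and the example splits on x6 and x1 are illustrative assumptions, not CatBoost's internal model format; in this sketch a boosted prediction is the base value plus the sum of the trees' leaf values.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ObliviousTree:
    """Symmetric tree: depth k applies x[features[k]] > thresholds[k] at every node."""
    features: List[int]
    thresholds: List[float]
    leaf_values: List[float]  # 2 ** depth entries

    def __post_init__(self) -> None:
        if len(self.features) != len(self.thresholds):
            raise ValueError("one threshold per depth level is required")
        if len(self.leaf_values) != 2 ** len(self.features):
            raise ValueError("an oblivious tree of depth d has 2**d leaves")

    def leaf_index(self, x: Sequence[float]) -> int:
        index = 0
        for depth, (f, t) in enumerate(zip(self.features, self.thresholds)):
            if x[f] > t:
                index |= 1 << depth  # the outcome at this depth sets one bit
        return index

    def predict(self, x: Sequence[float]) -> float:
        return self.leaf_values[self.leaf_index(x)]


def ensemble_predict(trees: List[ObliviousTree], x: Sequence[float], base: float = 0.0) -> float:
    """Boosted prediction in this sketch: base value plus the sum of tree outputs."""
    return base + sum(tree.predict(x) for tree in trees)


# Hypothetical depth-2 tree splitting on x6 (index 5) and then x1 (index 0).
tree = ObliviousTree(features=[5, 0], thresholds=[50.0, 45.0],
                     leaf_values=[-2.0, 1.0, -0.5, 2.5])
jan_2022 = [52.38, 52.38, 50.00, 56.25, 54.76, 57.14,
            52.38, 45.24, 50.00, 53.85, 40.48, 54.76]  # x1..x12, Table 1
print(tree.leaf_index(jan_2022), ensemble_predict([tree], jan_2022, base=47.0))
```

For the January 2022 row of Table 1, x6 = 57.14 > 50 and x1 = 52.38 > 45, so both bits are set and the tree returns the value stored in leaf 3. Because the split sequence is shared, evaluation reduces to d comparisons and a table lookup.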
Despite these strengths, CatBoost has some limitations. As a static model trained on a fixed dataset, it cannot adapt to new data in real time. This hinders its ability to capture emerging trends and patterns, particularly in a dynamic industry like logistics, which is shaped by technological advances, seasonal fluctuations, and global events. Furthermore, CatBoost models are harder to interpret than simpler models such as linear regression: understanding the contribution of individual features requires post hoc tools such as the SHAP and partial dependence analyses presented above.
The empirical results of this study offer valuable insights for stakeholders in the logistics industry. The CatBoost model’s high accuracy and robustness suggest its potential as a powerful tool for predicting future LPI trends and identifying critical turning points. This information can be invaluable for governments and enterprises in several ways:
Policy making: Governments can utilize the CatBoost model to anticipate market shifts and formulate supportive policies and infrastructure plans. This proactive approach can foster a stable and conducive environment for logistics industry growth.
Resource optimization: Enterprises can leverage the model’s predictions to optimize resource allocation, including inventory management, capital investment, and workforce planning. This can lead to improved operational efficiency and cost savings.
Risk management: By identifying potential downturns or disruptions early on, enterprises can implement risk mitigation strategies and maintain business continuity.
Strategic planning: The model’s ability to forecast LPI trends can assist enterprises in developing long-term strategic plans and setting realistic goals for growth and expansion.

6. Conclusions

This study presents a comprehensive prediction framework for the LPI based on the CatBoost algorithm. By constructing an end-to-end process that integrates Bayesian optimization for hyperparameter tuning with a multidimensional evaluation system, we address the limitations of traditional forecasting models in handling complex, nonlinear data. A unique dataset of 12 key indicators from Lanzhou City, spanning January 2022 to March 2025, is used to validate the effectiveness of the proposed model. This approach not only enriches the methodology of logistics industry early-warning systems but also establishes a replicable technical framework that bridges the gap between advanced algorithmic theory and practical industrial application, providing a solid foundation for data-driven decision-making in the logistics sector.
The empirical results demonstrate that the CatBoost model significantly outperforms traditional and other machine learning approaches, such as ARIMA, SVM, and XGBoost. The model achieved exceptional predictive accuracy with an R2 of 0.963 and a MAPE of merely 0.001%. Comparative analysis revealed that CatBoost excels in capturing critical dynamic turning points in the LPI data and exhibits superior robustness against outliers. These advantages are attributed to the algorithm’s ordered boosting mechanism and symmetric tree structure, which effectively mitigate gradient bias and overfitting while enhancing training efficiency. Furthermore, the model’s ability to handle categorical features natively ensures data integrity and prevents information loss, making it a highly reliable tool for monitoring and forecasting the prosperity of the logistics industry.
Despite these promising results, the study acknowledges certain limitations inherent in the proposed framework. The current CatBoost model is essentially static, trained on fixed historical datasets, which limits its capacity to autonomously update parameters and knowledge in response to real-time changes. Given that the logistics industry is a dynamic system susceptible to rapid shifts such as technological advancements, seasonal fluctuations, and global crises, a static model may face challenges in maintaining long-term predictive accuracy.
Future research should focus on developing adaptive learning mechanisms that enable the model to assimilate real-time data incrementally. Additionally, establishing a multi-source data integration framework and enhancing system stability under extreme scenarios will be critical for achieving a more dynamic, resilient, and comprehensive predictive system that supports the sustainable development of the logistics industry.
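One simple route toward the adaptive updating proposed above is walk-forward retraining: at each new month the model is refit on all data observed so far (or on a sliding window) and used to forecast the next month. The sketch is generic, and `walk_forward` and the placeholder `fit_mean` learner are hypothetical. In practice `fit` could wrap a CatBoost regressor; CatBoost's `fit` method also accepts an `init_model` argument for continuing training from an existing model, which is closer to true incremental learning than full refitting.

```python
from statistics import mean
from typing import Callable, List, Optional, Sequence, Tuple

Predictor = Callable[[Sequence[float]], float]
Fitter = Callable[[Sequence[Sequence[float]], Sequence[float]], Predictor]


def walk_forward(X: Sequence[Sequence[float]], y: Sequence[float], fit: Fitter,
                 min_train: int = 24, window: Optional[int] = None) -> List[Tuple[int, float]]:
    """Refit at every step on the data observed so far and forecast one step ahead.

    With window=None the training set expands; otherwise only the most recent
    `window` observations are used (a sliding window).
    """
    results = []
    for t in range(min_train, len(y)):
        start = 0 if window is None else max(0, t - window)
        predictor = fit(X[start:t], y[start:t])  # retrain on data up to t-1
        results.append((t, predictor(X[t])))     # forecast observation t
    return results


def fit_mean(X_train, y_train) -> Predictor:
    """Placeholder learner that always predicts the training mean."""
    level = mean(y_train)
    return lambda x: level


y = [float(v) for v in range(30)]
X = [[v] for v in y]
print(walk_forward(X, y, fit_mean)[:2])
print(walk_forward(X, y, fit_mean, window=12)[:2])
```

The expanding window uses every past month, while the sliding window discards older months so that the model tracks structural shifts faster at the cost of a smaller training set.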

Author Contributions

Conceptualization, Y.L. and C.M.; methodology, Y.L.; software, Y.L.; validation, Y.L., Q.L. and C.M.; formal analysis, Q.L.; investigation, X.X.; resources, Q.L.; data curation, Q.L.; writing—original draft preparation, X.X.; writing—review and editing, X.X.; visualization, Q.L.; supervision, C.M.; project administration, C.M.; funding acquisition, C.M. All authors have read and agreed to the published version of the manuscript.

Funding

This study was jointly supported by the National Natural Science Foundation of China (No. 72131008) and the National Key Research and Development Program (No. 2022YFC3800103-03).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in this study are available on request.

Acknowledgments

The authors thank the anonymous reviewers for their comments and suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Huang, Y.; Liu, H.; Qi, E.; Li, Q. Proceedings of the 2013 IEEE International Conference on Grey Systems and Intelligent Services (GSIS); IEEE: New York, NY, USA, 2013; pp. 194–199.
  2. Xu, N.; Gong, Y.; Bai, J. Adaptive grey prediction model with application to demand forecasting of Chinese logistics industry. J. Grey Syst. 2019, 31, 128–139.
  3. Guo, X.; Li, B. Carbon emission prediction method of regional logistics industry based on improved GM(1,N) model. J. Grey Syst. 2022, 34, 1–9.
  4. Chen, L.; Pan, Y.; Zhang, D. Prediction of carbon emissions level in China’s logistics industry based on the PSO-SVR model. Mathematics 2024, 12, 1980.
  5. Wen, C.; Yang, J.; Zhang, Z.; Guo, Q.; Xu, H. Prediction of the development trend of the “Internet +” logistics industry under the “Belt and Road” strategy. Comput. Intell. Neurosci. 2022, 2022, 4630146.
  6. d’Aleo, V. The mediator role of logistic performance index: A comparative study. J. Int. Trade Logist. Law 2015, 1, 1–7.
  7. Martí, L.; Martín, J.C.; Puertas, R. A DEA-logistics performance index. J. Appl. Econ. 2017, 20, 169–192.
  8. Jain, V.; Sharma, A.; Abbas, H.; Kukreti, M.; Al Abri, S.; Al Makdami, M. Developing a logistics framework that contributes to economic growth: A fuzzy ISM-MICMAC approach. FIIB Bus. Rev. 2025, 23197145251345235.
  9. Hanif, S.; Mu, D.; Baig, S.; Alam, K.M. A correlative analysis of modern logistics industry to developing economy using the VAR model: A case of Pakistan. J. Adv. Transp. 2020, 2020, 8861914.
  10. Hasan, M.K.; Lei, X.; Tang, W.; Nishi, N.N.; Latif, Z. Exploring logistics performance index (LPI) from global perspective: A study based on network analysis. Oper. Manag. Res. 2025, 18, 1088–1112.
  11. Qazi, A.; Al-Mhdawi, M.K.S.; Simsekler, M.C.E. Exploring temporal dependencies among country-level logistics performance indicators. Benchmarking 2025, 32, 1825–1856.
  12. Jonasíková, D.; Konečný, V.; Zuzaniak, M. Evolution of logistics performance index and their structure in selected countries. Transp. Res. Procedia 2025, 87, 217–231.
  13. Shepherd, B.; Sriklay, T. Extending and understanding: An application of machine learning to the World Bank’s logistics performance index. Int. J. Phys. Distrib. Logist. Manag. 2023, 53, 985–1014.
  14. Babayigit, B.; Gürbüz, F.; Denizhan, B. Logistics performance index estimating with artificial intelligence. Int. J. Shipp. Transp. Logist. 2023, 16, 360–371.
  15. Roy, V.; Mitra, S.K.; Chattopadhyay, M.; Sahay, B. Facilitating the extraction of extended insights on logistics performance from the logistics performance index dataset: A two-stage methodological framework and its application. Res. Transp. Bus. Manag. 2018, 28, 23–32.
  16. Jomthanachai, S.; Wong, W.P.; Khaw, K.W. An application of machine learning to logistics performance prediction: An economics attribute-based of collective instance. Comput. Econ. 2024, 63, 741–792.
  17. Baydar, M.B.; Mete, M. Explaining logistics performance, economic growth, and carbon emissions through machine learning and SHAP interpretability. Sustainability 2026, 18, 585.
  18. Jahana, J.; Joseph, T.M.; Alzaatreh, A. Assessing the prosperity of European nations: A cluster analysis based approach. In Proceedings of the 2024 IEEE International Conference on Technology Management, Operations and Decisions (ICTMOD), Sharjah, United Arab Emirates, 4–6 November 2024; pp. 1–6.
  19. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. Adv. Neural Inf. Process. Syst. 2018, 31, 6639–6649.
  20. Zhang, C.; Chen, X.; Wang, S.; Hu, J.; Wang, C.; Liu, X. Using CatBoost algorithm to identify middle-aged and elderly depression, national health and nutrition examination survey 2011–2018. Psychiatry Res. 2021, 306, 114261.
  21. Kuo, P.; Li, Y.; Yau, H. Development of feline infectious peritonitis diagnosis system by using CatBoost algorithm. Comput. Biol. Chem. 2024, 113, 108227.
  22. Guo, Z.; Wang, X.; Ge, L. Classification prediction model of indoor PM2.5 concentration using CatBoost algorithm. Front. Built Environ. 2023, 9, 1207193.
  23. CatBoost: A Gradient Boosting Library. CatBoost Documentation 2024. Available online: https://catboost.ai/ (accessed on 22 July 2024).
  24. Huang, X.; Liu, W.; Guo, Q.; Tan, J. Prediction method for the dynamic response of expressway lateritic soil subgrades on the basis of Bayesian optimization CatBoost. Soil Dyn. Earthq. Eng. 2024, 186, 108943.
Figure 1. Schematic diagram of CatBoost model.
Figure 2. Feature importance analysis for LPI.
Figure 3. Comparison of forecasted values of logistics industry prosperity index in Lanzhou from January 2022 to March 2025.
Figure 4. SHAP feature importance bar chart.
Table 1. Standardized indicators of the logistics industry prosperity index in 2022.
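Date | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 | x11 | x12
202201 | 52.38 | 52.38 | 50.00 | 56.25 | 54.76 | 57.14 | 52.38 | 45.24 | 50.00 | 53.85 | 40.48 | 54.76
202202 | 42.06 | 47.62 | 44.44 | 44.44 | 40.48 | 42.86 | 40.48 | 33.33 | 47.62 | 46.43 | 42.86 | 57.14
202203 | 43.02 | 45.24 | 36.11 | 36.11 | 42.86 | 38.10 | 45.24 | 30.95 | 73.81 | 43.75 | 50.00 | 52.38
202204 | 44.90 | 50.00 | 41.18 | 32.35 | 52.38 | 45.24 | 42.86 | 28.57 | 66.67 | 50.00 | 42.86 | 59.52
202205 | 63.06 | 64.29 | 44.44 | 63.89 | 66.67 | 69.05 | 59.52 | 59.52 | 54.76 | 50.00 | 54.76 | 61.90
202206 | 50.00 | 57.14 | 36.67 | 53.33 | 50.00 | 45.24 | 42.86 | 40.48 | 57.14 | 42.86 | 47.62 | 47.62
202207 | 34.17 | 33.33 | 33.33 | 33.33 | 33.33 | 33.33 | 40.48 | 23.81 | 54.76 | 50.00 | 40.48 | 42.86
202208 | 38.17 | 40.48 | 41.18 | 41.18 | 38.10 | 35.71 | 42.86 | 23.81 | 71.43 | 46.15 | 38.10 | 42.86
202209 | 46.34 | 47.62 | 31.25 | 56.25 | 45.24 | 45.24 | 42.86 | 33.33 | 57.14 | 50.00 | 35.71 | 42.86
202210 | 33.65 | 40.48 | 35.29 | 35.29 | 33.33 | 26.19 | 40.48 | 21.43 | 59.52 | 28.57 | 37.50 | 33.33
202211 | 27.41 | 30.95 | 29.41 | 26.47 | 21.43 | 23.81 | 42.86 | 21.43 | 64.29 | 44.44 | 35.71 | 35.71
202212 | 43.97 | 47.62 | 43.75 | 46.88 | 35.71 | 33.33 | 47.62 | 30.95 | 61.90 | 36.67 | 52.38 | 45.24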
Table 2. Standardized indicators of the logistics industry prosperity index in 2023.
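Date | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 | x11 | x12
202301 | 48.93 | 50.00 | 35.71 | 50.00 | 45.24 | 42.86 | 45.24 | 38.10 | 54.76 | 39.29 | 50.00 | 47.62
202302 | 60.56 | 61.11 | 54.17 | 75.00 | 55.56 | 58.33 | 44.44 | 55.56 | 61.11 | 45.83 | 50.00 | 63.89
202303 | 58.75 | 69.44 | 54.17 | 58.33 | 50.00 | 50.00 | 44.44 | 47.22 | 47.22 | 44.44 | 50.00 | 63.89
202304 | 63.14 | 69.05 | 57.69 | 73.08 | 59.52 | 57.14 | 45.24 | 40.48 | 59.52 | 72.73 | 52.38 | 66.67
202305 | 57.44 | 61.90 | 50.00 | 57.69 | 64.29 | 61.90 | 42.86 | 42.86 | 54.76 | 58.33 | 47.62 | 52.38
202306 | 53.93 | 61.90 | 54.17 | 50.00 | 50.00 | 47.62 | 40.48 | 47.62 | 59.52 | 59.09 | 35.71 | 50.00
202307 | 48.71 | 55.00 | 58.33 | 45.83 | 52.50 | 47.50 | 47.50 | 42.50 | 60.00 | 61.54 | 40.00 | 55.00
202308 | 56.38 | 57.50 | 50.00 | 62.50 | 60.00 | 57.50 | 47.50 | 50.00 | 52.50 | 60.71 | 52.50 | 57.50
202309 | 56.08 | 55.00 | 54.17 | 58.33 | 50.00 | 55.00 | 45.00 | 40.00 | 57.50 | 61.54 | 52.50 | 60.00
202310 | 64.67 | 70.00 | 50.00 | 66.67 | 80.00 | 50.00 | 50.00 | 50.00 | 60.00 | 83.33 | 60.00 | 80.00
202311 | 59.21 | 60.00 | 58.33 | 58.33 | 57.50 | 57.50 | 52.50 | 52.50 | 57.50 | 67.86 | 52.50 | 62.50
202312 | 55.67 | 55.00 | 54.17 | 66.67 | 60.00 | 55.00 | 52.50 | 55.00 | 57.50 | 66.67 | 52.50 | 70.00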
Table 3. Standardized indicators of the logistics industry prosperity index from January 2024 to March 2025.
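Date | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 | x11 | x12
202401 | 62.79 | 65.00 | 54.17 | 66.67 | 62.50 | 60.00 | 50.00 | 50.00 | 57.50 | 60.71 | 60.00 | 57.50
202402 | 49.55 | 52.50 | 36.36 | 45.45 | 57.50 | 52.50 | 47.50 | 45.00 | 52.50 | 53.85 | 47.50 | 65.00
202403 | 68.00 | 67.50 | 63.64 | 60.00 | 57.50 | 67.50 | 45.00 | 62.50 | 57.50 | 43.75 | 55.00 | 72.50
202404 | 58.67 | 60.00 | 59.09 | 54.17 | 60.00 | 65.00 | 47.50 | 42.50 | 62.50 | 61.11 | 52.50 | 67.50
202405 | 69.63 | 72.50 | 50.00 | 65.00 | 72.50 | 72.50 | 52.50 | 60.00 | 47.50 | 50.00 | 52.50 | 67.50
202406 | 63.21 | 70.00 | 58.33 | 58.33 | 55.00 | 60.00 | 50.00 | 55.00 | 65.00 | 50.00 | 52.50 | 65.79
202407 | 53.58 | 52.50 | 45.83 | 45.83 | 50.00 | 52.50 | 47.50 | 45.00 | 60.00 | 59.09 | 55.00 | 55.00
202408 | 57.42 | 60.00 | 41.67 | 54.17 | 60.00 | 55.00 | 50.00 | 47.50 | 52.50 | 50.00 | 60.00 | 68.42
202409 | 58.88 | 62.50 | 54.17 | 50.00 | 57.50 | 57.50 | 57.50 | 42.50 | 65.00 | 50.00 | 57.50 | 62.50
202410 | 66.96 | 65.00 | 50.00 | 58.33 | 70.00 | 62.50 | 52.50 | 60.00 | 60.00 | 50.00 | 67.50 | 70.00
202411 | 58.29 | 65.00 | 61.54 | 65.38 | 55.00 | 57.50 | 52.50 | 37.50 | 57.50 | 56.25 | 52.50 | 57.50
202412 | 52.88 | 55.00 | 41.67 | 50.00 | 55.00 | 45.00 | 50.00 | 35.00 | 47.50 | 56.25 | 45.00 | 42.50
202501 | 45.61 | 42.50 | 42.31 | 42.31 | 42.50 | 47.50 | 52.50 | 55.00 | 57.50 | 37.50 | 45.00 | 55.00
202502 | 41.42 | 50.00 | 25.00 | 29.17 | 40.00 | 42.50 | 47.50 | 32.50 | 52.50 | 44.44 | 45.00 | 70.00
202503 | 53.30 | 55.00 | 54.55 | 45.45 | 57.50 | 55.00 | 45.00 | 57.50 | 52.50 | 68.75 | 55.00 | 57.50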
Table 4. Weighted prosperity index of the logistics industry calculated using the entropy weight method, 2022 to 2024.
Time | Weighted Sentiment Index Calculated by the Entropy Value Method
202201 | 51.92290356
202202 | 43.19266213
202203 | 41.17827971
202204 | 43.68670562
202205 | 60.04859892
202206 | 46.73713025
202207 | 35.12141193
202208 | 38.39802657
202209 | 43.59500877
202210 | 32.2175366
202211 | 29.50626128
202212 | 40.83509791
202301 | 44.28752048
202302 | 58.20794981
202303 | 53.45090399
202304 | 59.75029046
202305 | 54.89551694
202306 | 51.05799209
202307 | 50.46934696
202308 | 56.09668674
202309 | 53.19315475
202310 | 63.85207181
202311 | 58.15264092
202312 | 58.93463078
202401 | 59.14790904
202402 | 49.99104655
202403 | 61.3622919
202404 | 57.0293283
202405 | 62.92335161
202406 | 58.39540845
202407 | 50.70110825
202408 | 54.24613005
202409 | 54.33801531
202410 | 61.25435604
202411 | 55.56461915
202412 | 47.14346081
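Table 4 is described as computed with the entropy method. A common form of the entropy weight method is sketched below: each indicator column is converted to proportions p_ij, its entropy e_j = −(1/ln n) Σ_i p_ij ln p_ij is computed, and the weights are proportional to 1 − e_j, so indicators that vary more across months receive larger weights. This is the textbook variant applied directly to positive values; the study's exact normalization is not given in this section, so the sketch is not claimed to reproduce Table 4, and the function names are illustrative.

```python
import math


def entropy_weights(rows):
    """Entropy weight method for n >= 2 observations of m positive indicators."""
    n, m = len(rows), len(rows[0])
    k = 1.0 / math.log(n)
    divergences = []
    for j in range(m):
        column = [row[j] for row in rows]
        total = sum(column)
        shares = [v / total for v in column]
        entropy = -k * sum(p * math.log(p) for p in shares if p > 0)
        divergences.append(1.0 - entropy)  # more dispersion -> larger weight
    d_total = sum(divergences)
    return [d / d_total for d in divergences]


def weighted_index(rows, weights):
    """Composite index: weighted sum of indicator values for each observation."""
    return [sum(w * v for w, v in zip(weights, row)) for row in rows]


# Toy data: the first indicator never varies, so it should get (near) zero weight.
rows = [[50.0, 40.0], [50.0, 60.0], [50.0, 50.0]]
weights = entropy_weights(rows)
print(weights, weighted_index(rows, weights))
```

Because the weights sum to 1, the composite index stays on the same scale as the standardized indicators, which matches the range of values in Table 4.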
Table 5. Forecasted values of the logistics industry prosperity index in Lanzhou from January to March 2025 by different models.
Method | Jan 2025 | Feb 2025 | Mar 2025
Ground truth | 46.39057896 | 40.62236064 | 55.50662265
Neural network | 49.43803326 | 49.66393247 | 53.64082261
Support vector machine | 47.75067171 | 47.22371765 | 53.91612538
Grey prediction method | 60.46600000 | 60.99600000 | 61.52800000
Decision tree | 49.99104655 | 44.28752048 | 49.99104655
Random forest | 50.24090198 | 41.89462119 | 54.73918814
XGBoost | 48.44395065 | 43.98495483 | 52.53718948
GBDT | 50.53901068 | 39.24337178 | 53.4968249
AdaBoost | 50.23019676 | 43.06336777 | 53.29623168
Extreme decision tree | 46.67231185 | 42.72969302 | 53.87487926
CatBoost | 47.38733027 | 41.97661744 | 53.64980211
Table 6. Partial dependence summary for top 6 features.
Rank | Feature | Feature Range | LPI Effect Range | Avg. Marginal Effect
1 | Equipment utilization rate (x6) | 23.81–72.50 | 35.12–62.92 | +0.57 points/unit
2 | Total business volume (x1) | 27.41–69.63 | 40.85–60.05 | +0.45 points/unit
3 | Capital turnover rate (x5) | 21.43–80.00 | 42.15–58.93 | +0.28 points/unit
4 | Business activity expectations (x12) | 33.33–80.00 | 44.28–60.12 | +0.34 points/unit
5 | New orders (x2) | 30.95–72.50 | 43.19–60.05 | +0.39 points/unit
6 | Inventory turnover frequency (x4) | 26.47–75.00 | 41.18–59.75 | +0.42 points/unit
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Liu, Y.; Li, Q.; Ma, C.; Xu, X. A CatBoost-Based Prediction Framework for Logistics Industry Prosperity Index to Support Sustainable Decision-Making: An Empirical Study from China. Sustainability 2026, 18, 2178. https://doi.org/10.3390/su18052178

Chicago/Turabian Style

Liu, Yule, Qiong Li, Changxi Ma, and Xuecai Xu. 2026. "A CatBoost-Based Prediction Framework for Logistics Industry Prosperity Index to Support Sustainable Decision-Making: An Empirical Study from China" Sustainability 18, no. 5: 2178. https://doi.org/10.3390/su18052178
