Article

Fine-Tuning Pre-Trained Large Language Models for Price Prediction on Network Freight Platforms

School of Computer and Information, Anhui Polytechnic University, Wuhu 241000, China
*
Author to whom correspondence should be addressed.
Mathematics 2025, 13(15), 2504; https://doi.org/10.3390/math13152504
Submission received: 30 June 2025 / Revised: 28 July 2025 / Accepted: 1 August 2025 / Published: 4 August 2025

Abstract

Various factors influence the formation and adjustment of network freight prices, including transportation costs, cargo characteristics, and policies and regulations. The interaction of these factors makes it difficult to predict network freight prices accurately with regression or other machine learning models, especially when the amount and quality of training data are limited. This paper introduces large language models (LLMs) to predict network freight prices using their inherent prior knowledge. Different data sorting methods and serialization strategies are employed to construct the corpora of LLMs, which are then tested on multiple base models. A few-shot sample dataset is constructed to test the performance of models under insufficient information. The Chain of Thought (CoT) is employed to construct a corpus that demonstrates the reasoning process in freight price prediction. Two training configurations are used: cross entropy loss with LoRA fine-tuning and cosine annealing learning rate adjustment, and Mean Absolute Error (MAE) loss with full fine-tuning and OneCycle learning rate adjustment. The experimental results demonstrate that LLMs are better than or competitive with the best comparison model. Tests on a few-shot dataset demonstrate that LLMs outperform most comparison models. This method provides a new reference for predicting network freight prices.

1. Introduction

Network freight [1] prices are affected by many factors, such as distance, vehicle length, model, type of cargo, fuel prices, and more. Standard billing methods include distance-based billing, weight-based billing, transit time-based billing, or mixed billing. Different types of cargo have different vehicle requirements. For example, high-rail or low-rail trucks are selected based on the height of the cargo, while vans are selected for cargo with higher sealing requirements. In addition, policies and regulations impact freight prices by increasing operational costs through taxes, tariffs, and compliance requirements or by reducing costs through trade agreements and infrastructure investments. The interaction of these factors increases the complexity of network freight price forecasting.
Network freight price prediction is a tabular data forecasting problem [2] that estimates freight prices based on various features, such as distance, cargo weight, delivery time, and other logistical factors. Ensemble Tree models are well suited for this problem, as they can model continuous output variables. However, the performance of these models is limited by the quality of the training data. Additionally, these models tend to overfit noisy data and are not well suited for small samples. Other advanced models are also a research direction, such as graph neural networks. However, graph neural networks excel at processing graph structure data between nodes and edges, but struggle with processing tabular data.
Large language models (LLMs) utilize their prior knowledge to address the issue of overfitting in noisy and small-sample cases. They are also better at processing tabular data by converting it into a text format. This paper proposes an approach to price prediction in the network freight domain using pre-trained LLMs. Fine-tuning [3] the base model allows it to interpret new data patterns emerging in the network freight market, thereby improving the accuracy and efficiency of forecasts.
Three data sorting [4] methods (initial sort, feature-based sort, and distance-based sort) and four data serialization [5] methods (Named Feature Sequence, Value Only Sequence, List Temp Sequence, and Json Sequence) are used in this paper to build a diversified LLMs corpus and perform multiple tests on base models. These methods enhance the data structure and format of model training, and are used to verify the specific impact of different data formats on model performance, thereby providing an experimental basis for an in-depth understanding of the relationship between data format and model performance.
The few-shot learning performance of LLMs is investigated in this paper by constructing different training sample sets. Our experiments evaluate how LLMs perform under varying amounts of training data, with a particular emphasis on their ability to generalize from limited examples.

2. Related Work

2.1. Network Freight Price Prediction

Regression prediction task: This type of problem involves predicting order prices based on the characteristics of a network freight order. Standard methods include a tree ensemble model, a neural network, and a linear model. Examples of the neural network method include Budak et al. [6], who investigate the price forecasting of the truckload spot market from the truckers’ perspective, considering a comprehensive set of variables, and LI et al. [7], who design a transaction pricing prediction model of a network freight platform based on dual LSTM and predict the results by K-means cluster analysis. Examples of the tree ensemble model method include Jang et al. [8], who identify the key variables influencing the determination of shipping costs and propose a recommended shipping cost derived from a price prediction model based on machine learning techniques, and Jingwei Guo et al. [9], who integrate the cargo floating price prediction model with the neural network algorithm (NNA) to develop a predictive framework. Examples of the linear model method include Macarringue et al. [10], who enhance the understanding of the variables influencing road freight costs by developing a road freight prediction system based on a multiple linear regression model, employing variable selection techniques such as Stepwise, Forward, and Backward elimination. In addition, Spreeuwenberg [11] employs best subset regression for feature selection and builds a multiple linear regression model to forecast freight prices in Poland, and Lindsey et al. [12] focus on predicting truckload freight rates in spot markets using linear regression models based on shipment- and lane-level data. This method identifies key cost-driving factors and builds predictive models to support more accurate freight rate estimation and lane performance analysis.
Time series prediction task: This problem is studied to predict the time index related to logistics prices, which changes with time. Kjeldsberg et al. [13] examine the factors influencing PSV time charter freight rates and investigate the use of AutoML modeling to capture nonlinearities in forecasting PSV freight rates over out-of-sample horizons of 1, 3, and 6 months. Bae et al. [14] provide valuable information to stakeholders by forecasting the tramp shipping market. Koyuncu et al. [15] explore different time-series models related to the Shanghai Containerized Freight Index.
Pricing strategy from the perspective of game theory: This type of problem research uses pricing games to improve the efficiency and stability of the network freight market. Tamannaei et al. [16] investigate a competitive freight transportation pricing problem involving two Intermodal Service Providers (ISPs) and a Direct Transportation System (DTS). Dimitriou et al. [17] develop a methodological framework and apply it to a real-world system, integrating the concept of pricing differentiation among competing container port facilities.

2.2. The Application of LLMs in Task Prediction

Regression prediction task: LLMs are primarily developed for natural language understanding and generation. Their potential applications in regression prediction tasks are being explored. Requeima et al. [4] construct a regression model to process numerical data and generate probabilistic predictions at arbitrary locations, leveraging natural language text to incorporate the user’s prior knowledge. Dinh et al. [18] propose Language Interface Fine-Tuning (LIFT) and study its effectiveness and limitations through extensive empirical studies on non-linguistic classification and regression tasks. Song et al. [19] propose OmniPred, a framework for training language models as universal end-to-end regressors designed to evaluate data from diverse real-world experiments. Rubungo et al. [20] propose LLM-Prop, which leverages the general learning ability of LLMs to predict the physical and electronic properties of crystals from text descriptions.
Time series prediction task: Due to LLMs’ robust context understanding and long-range dependence modeling capabilities, some scholars have studied their application to time series prediction tasks. Jin et al. [21] introduce a reprogramming framework that repurposes LLMs for general time series forecasting while preserving the integrity of the underlying language models. Xue et al. [22] propose a new forecasting paradigm: prompt-based time series forecasting (PromptCast). Zhou et al. [23] tackle the challenge of insufficient training data by utilizing language or computer vision models, pre-trained on billions of tokens, for time series analysis. Jia et al. [24] introduce GPT4MTS, a simultaneous prompt-based large language model (LLM) framework designed to leverage numerical data and textual information.
Few-shot learning: Because LLMs are pre-trained on vast amounts of data, they possess prior knowledge that gives them an advantage in few-shot learning scenarios. Hegselmann et al. [5] propose TabLLM to study the application of LLMs in zero-shot and few-shot classification of tabular data. Perez et al. [25] evaluate the few-shot ability of LLMs when such held-out examples are unavailable. Chen et al. [26] explore the potential of transformers to enhance clinical prediction performance relative to traditional machine learning methods while also addressing the challenge of few-shot learning in predicting rare disease areas.

2.3. Text Analytics

Moreno and Redondo [27] explore the integration of text analytics with big data and artificial intelligence, focusing on the application and challenges of text analytics technology in processing massive amounts of unstructured data. They review the primary technologies of text analytics, including information extraction, named entity recognition, topic detection, and sentiment analysis, and highlight the role of machine learning and deep learning in enhancing the results of text analytics. Yuan et al. [28] present a novel text analytics framework using Domain-Constraint Latent Dirichlet Allocation (DC-LDA) to predict crowdfunding success. By extracting latent semantic features from project descriptions, the framework outperforms traditional keyword-based approaches in identifying factors influencing funding outcomes. Gandomi and Haider [29] highlight text analytics as a key method for extracting valuable insights from unstructured data in big data environments. They discuss techniques such as information extraction, text summarization, and sentiment analysis, which are essential for converting large volumes of text into structured, actionable knowledge. These text analytics methods enable businesses to make evidence-based decisions and enhance their predictive capabilities by analyzing data from sources such as social media, customer reviews, and corporate documents.

3. Problem Formulation

Network freight comprises four main entities: shippers, network freight platforms, carriers, and consignees. The shipper publishes the transportation demand through the platform. After the carrier receives the supply order, it confirms the freight, collection, and delivery times, as well as the cargo and other detailed information with the shipper and the platform, and then signs the transportation contract. The shipper pays the freight through the platform, and the platform pays the freight to the carrier after confirming the delivery of the cargo. The transportation process is completed after the consignee confirms receipt of the cargo. As a hub, the network freight platform facilitates the collaboration of all parties and provides price estimation services. The relation is illustrated in Figure 1.
Given a set of network freight data features, the goal is to predict the network freight price for each order. The input to the problem is represented by a feature matrix, $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{P \times N}$, where each $x_i \in \mathbb{R}^{P}$ is a feature vector corresponding to the $i$-th order, $P$ is the number of features, and $N$ is the total number of network freight orders. The output is denoted as $Y = [y_1, y_2, \ldots, y_N] \in \mathbb{R}^{N}$, where $y_i$ represents the price associated with the $i$-th order. The objective is to develop a predictive model that, given the features $x_i$, predicts the corresponding price $y_i$. The descriptions of all symbols can be found in Table 1.

4. Methodology

4.1. Framework

As illustrated in Figure 2, the construction of the LLM corpus involves data serialization and data sorting. The pre-trained model, which has a transformer-based [30] architecture, is then fine-tuned on the processed network freight data. The training strategies are based on GLM4-9b-chat [31], Qwen2.5-7B-instruct [32], Llama3-8b-instruct [33], and T5 [34], with LoRA [35] fine-tuning and full fine-tuning applied using cross entropy loss and MAE loss, respectively. The knowledge acquired by the pre-trained model is transferred to the fine-tuning stage. Through fine-tuning via transfer learning [36], the model assimilates knowledge of network freight and adapts to network freight price prediction tasks, even when the target task has limited data.

4.2. Data Serialization

This paper uses the following four data serialization [5] methods (Named Feature Sequence, Value Only Sequence, List Temp Sequence, and Json Sequence). See Figure 3 for specific examples.
  • Named Feature Sequence (NFS)
    The input data combines feature names, feature values, and commas. Each table’s feature names and corresponding values are concatenated into key-value pairs, separated by commas. The sequence of key-value pairs forms the training samples for the LLMs.
  • Value Only Sequence (VOS)
    A space separates all the feature values of the network freight data.
  • List Temp Sequence (LTS)
    The input data combines feature name, feature value, and line feed. The feature names of each table data item and corresponding feature values are concatenated into key-value pairs, separated by newlines.
  • Json Sequence (JS)
    The input data is constructed in JSON format, consisting of a set of key-value pairs (feature name, feature value) enclosed in curly braces. The key is of the string type and is enclosed in double quotation marks (“key”). Commas separate the key-value pairs.
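The four serialization methods can be illustrated with a minimal Python sketch. The feature names and values in `order`, and the `": "` delimiter inside each key-value pair, are assumptions made for illustration; the exact templates are those shown in Figure 3.

```python
import json


def named_feature_sequence(row: dict) -> str:
    # NFS: "name: value" pairs joined by commas (the ": " delimiter is an assumption).
    return ", ".join(f"{name}: {value}" for name, value in row.items())


def value_only_sequence(row: dict) -> str:
    # VOS: feature values only, separated by single spaces.
    return " ".join(str(value) for value in row.values())


def list_temp_sequence(row: dict) -> str:
    # LTS: one "name: value" pair per line.
    return "\n".join(f"{name}: {value}" for name, value in row.items())


def json_sequence(row: dict) -> str:
    # JS: a JSON object whose keys are double-quoted strings.
    return json.dumps(row, ensure_ascii=False)


# Hypothetical order record used only to show the four output formats.
order = {"Total Mileage": 120.5, "Vehicle Length": 9.6, "Cargo Type": "general"}
for serialize in (named_feature_sequence, value_only_sequence,
                  list_temp_sequence, json_sequence):
    print(serialize(order))
    print("---")
```

The Value Only Sequence produces the shortest text, which is consistent with its lower hardware resource usage, whereas the other three formats keep the feature names visible to the model.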

4.3. Data Sorting

This paper uses the following three data sort [4] methods: initial sort, feature-based sort, and distance-based sort.
  • Initial sort (IS)
    The order of the raw data follows its initial sort shown in Figure 4.
  • Feature-based sort (FS)
    As illustrated in Figure 5, this paper selected the three features most relevant to price. The sorting principle is as follows: primarily by the first feature, secondarily by the second feature if the first is equal, and finally by the third feature if the second is also equal.
  • Distance-based sort (DS)
    As illustrated in Figure 6, the sum of the feature differences between the target and all other points is calculated as the distance. The closer the target point is to the centre of the data cluster, the smaller the calculated distance will be. According to this distance, the data is sorted from smallest to largest.
For a dataset $X$, every data point $x_i$ consists of features $\{x_{i1}, x_{i2}, \ldots, x_{iP}\}$, and the target data is $x^{*} = \{x^{*}_1, x^{*}_2, \ldots, x^{*}_{N-1}\}$. For each data point $x_i$, the distance to $x^{*}_j$ is computed as $d_{ij} = \sum_{k=1}^{P} |x_{ik} - x^{*}_{jk}|$, and the objective function is defined as $S_i = \sum_{j=1}^{N} d_{ij}$. The data points are then sorted by $S_i$ in ascending order.
The initial sort reflects the initial state of the data, the feature-based sort reflects the importance of the data features, and the distance-based sort reflects the distance of the data from the centre of the data cluster.
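The feature-based and distance-based sorts can be sketched as follows. This is a minimal illustration, assuming ascending order for the feature-based sort and raw (unscaled) numeric features; the feature names in `orders` are hypothetical.

```python
def feature_based_sort(rows, key_features):
    # Sort by the first feature, break ties with the second, then the third.
    return sorted(rows, key=lambda row: tuple(row[name] for name in key_features))


def distance_based_sort(points):
    # d_ij = sum_k |x_ik - x_jk| and S_i = sum_j d_ij; sort by S_i, smallest first.
    def total_distance(x_i):
        return sum(sum(abs(a - b) for a, b in zip(x_i, x_j)) for x_j in points)

    return sorted(points, key=total_distance)


# Hypothetical orders described by three price-related features.
orders = [
    {"mileage": 300, "drive_min": 200, "unload_min": 30},
    {"mileage": 120, "drive_min": 90, "unload_min": 45},
    {"mileage": 120, "drive_min": 90, "unload_min": 20},
    {"mileage": 120, "drive_min": 60, "unload_min": 60},
]
print(feature_based_sort(orders, ["mileage", "drive_min", "unload_min"]))

# Points near the centre of the cluster come first; the outlier comes last.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (4.0, 4.0), (10.0, 10.0)]
print(distance_based_sort(points))
```

In the second example the point (2.0, 2.0) has the smallest summed distance and is placed first, while the outlier (10.0, 10.0) is placed last.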

4.4. Chain-of-Thought

Chain-of-thought (CoT) [37] reduces the complexity of tasks by breaking them down into smaller tasks, gradually guiding LLMs to the correct answer, and improving the reasoning ability of the LLMs.
In this paper, we utilize the features most relevant to freight pricing to train a linear formula for calculating freight prices. The reasoning process of the LLMs’ thought chain is as follows: the basic freight price is calculated through the formula from Step 1 to Step N, and the final freight price is determined by considering other factors. The prompt “Let’s think step by step” guides the LLMs to reason step by step. The REASONING tag marks the reasoning process in the LLMs’ output, which increases the interpretability of the reasoning, and the ANS tag marks the final price.
On the MathorCup dataset, the linear model test performance of the top-1 to top-3 features relevant to price gradually improves. When there are more than 3 features, the linear model test performance does not improve significantly. Therefore, 3 features are selected to build a linear model. The linear model performance can be found in Table 2. The price correlation can be found in Section 5.1, Datasets.
MathorCup dataset CoT corpus construction is shown in Equation (1) and Figure 7.
$$Y = 26.7603 + 4.2793 \times X_1 + 0.2446 \times X_2 + 2.0341 \times X_3 \qquad (1)$$
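The following sketch shows how a CoT training target could be assembled from Equation (1). It is an illustration only: the coefficients are taken as printed, while the angle-bracket form of the REASONING and ANS tags, the wording of each step, the feature values, and the final price are all hypothetical; the actual template is the one shown in Figure 7.

```python
INTERCEPT = 26.7603
COEFFICIENTS = (4.2793, 0.2446, 2.0341)  # weights of X1, X2, X3 as printed in Equation (1)


def base_price(features):
    """Evaluate the linear formula for one order."""
    return INTERCEPT + sum(w * x for w, x in zip(COEFFICIENTS, features))


def build_cot_target(features, final_price):
    """Compose a target text with a step-by-step reasoning trace and a final answer."""
    lines = ["Let's think step by step."]
    running = INTERCEPT
    for step, (w, x) in enumerate(zip(COEFFICIENTS, features), start=1):
        running += w * x
        lines.append(
            f"Step {step}: add {w} x {x} = {w * x:.4f}, running total {running:.4f}"
        )
    reasoning = "\n".join(lines)
    # Tag syntax is an assumption; the text only names the REASONING and ANS tags.
    return f"<REASONING>\n{reasoning}\n</REASONING>\n<ANS>{final_price:.2f}</ANS>"


# Hypothetical feature values and final price for one order.
print(build_cot_target((100.0, 60.0, 30.0), final_price=545.00))
```

The final price is passed in separately because, as described above, it is determined from the basic price together with other factors rather than by the linear formula alone.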
On the HackerEarth dataset, a linear model using the top-9 most price-related features achieves an $R^2$ performance that is 97.3% of that obtained using all features. The remaining features have a low correlation with price. To reduce the length of the LLM’s inference sequence, the top 9 most price-related features are selected. The linear model performance can be found in Table 3. The price correlation can be found in Section 5.1, Datasets.
HackerEarth dataset CoT corpus construction is shown in Equation (2) and Figure 8. Freight cost is a compressed value.
$$Y = 4.1022 + 1.618714 \times X_1 + 0.000005 \times X_2 + 0.012749 \times X_3 + 0.015024 \times X_4 + 0.009922 \times X_5 + 0.013978 \times X_6 + 0.146301 \times X_7 + 0.111457 \times X_8 + 0.078866 \times X_9 \qquad (2)$$

4.5. Loss Function

Cross Entropy Loss. Although traditionally used for classification, cross entropy loss can be adapted for regression by discretizing the continuous output space into intervals and treating the task as a classification problem over these intervals.
$$L = -\sum_{i=1}^{N} \sum_{c=1}^{K} t_{ic} \log \hat{t}_{ic}$$
Here, $t_{ic}$ is the boolean indicator of whether sample $i$ belongs to price class $c$, $\hat{t}_{ic}$ is the predicted probability for that class, and $K$ is the number of price classes.
MAE Loss. It measures the average of the absolute differences between the predicted and actual values, providing a robust performance metric that is not unduly influenced by outliers.
$$L = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i|$$
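Both losses can be worked through on small numbers. The sketch below assumes equal-width price intervals for the discretization, which the text does not specify; the function names and the example values are hypothetical.

```python
import math


def price_to_class(price, low, high, num_classes):
    """Map a price to one of num_classes equal-width intervals on [low, high]."""
    width = (high - low) / num_classes
    index = int((price - low) / width)
    return min(max(index, 0), num_classes - 1)  # clip prices outside the range


def cross_entropy(true_classes, predicted_probs):
    """L = -sum_i sum_c t_ic * log(t_hat_ic); with one-hot t only the true class remains."""
    return -sum(math.log(probs[c]) for c, probs in zip(true_classes, predicted_probs))


def mae_loss(y_true, y_pred):
    """L = (1/N) * sum_i |y_i - y_hat_i|."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


# A price of 250 falls into the second of four intervals on [0, 1000].
true_class = price_to_class(250.0, 0.0, 1000.0, 4)
print(true_class)
print(cross_entropy([true_class], [[0.1, 0.7, 0.1, 0.1]]))
print(mae_loss([100.0, 200.0], [110.0, 180.0]))
```

The cross entropy depends only on the probability assigned to the correct interval, whereas the MAE penalizes a prediction in proportion to its distance from the true price.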

4.6. Fine-Tuning

When resources were limited, LoRA was used for efficient adaptation, while full fine-tuning was applied when the resources and the model size permitted. The hyperparameters were selected manually based on the training loss, yielding a configuration with a low, although nonzero, loss.
LoRA Fine-Tuning. Suppose $X \in \mathbb{R}^{b \times p}$ is a matrix representing network freight order data, where $b$ is the batch size and $p$ is the dimensionality of the features. In LoRA [35], the weight matrix $W$ is obtained by adding a low-rank modification $AB$ to a base weight matrix $W_0$:
$$W = W_0 + AB$$
Here, $A \in \mathbb{R}^{p \times r}$ and $B \in \mathbb{R}^{r \times n}$ are newly introduced low-rank matrices, $n$ is the output dimension, and the rank $r$ is significantly smaller than $p$ and $n$, so the rank of $AB$ is at most $r$.
The modified weight matrix $W$ is used to compute the result $Y$:
$$Y = XW = X(W_0 + AB)$$
This shows that the output $Y$ depends not only on the original weights $W_0$ but also on the low-rank update defined by $A$ and $B$. Because only $A$ and $B$ are trained, LoRA adapts the model’s behaviour to the network freight prediction task while adjusting far fewer parameters than full fine-tuning. The pseudocode of LoRA fine-tuning is as follows.
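# Pseudocode: the pre-trained weights W0 stay frozen; only A and B are trainable.
for layer in model.layers:
    layer.A, layer.B = initialize_low_rank_matrices()
for epoch in range(num_epochs):
    for batch in dataset:
        optimizer.zero_grad()                     # clear gradients from the previous step
        loss = compute_loss(model(batch), price)  # compare predictions with target prices
        loss.backward()                           # gradients reach only A and B
        optimizer.step()                          # update A and B of every layer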
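The effect of the low-rank update can be checked numerically. The sketch below uses small hand-written matrices (all values are hypothetical) to show that $X(W_0 + AB)$ equals $XW_0 + (XA)B$ and to compare the number of trainable parameters.

```python
def matmul(a, b):
    # Plain-Python matrix product of two lists of rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    # Element-wise sum of two matrices with the same shape.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


p, n, r = 4, 3, 1
W0 = [[1.0, 0.0, 0.0],
      [0.0, 1.0, 0.0],
      [0.0, 0.0, 1.0],
      [0.0, 0.0, 0.0]]              # frozen base weights, p x n
A = [[0.1], [0.2], [0.3], [0.4]]    # trainable factor, p x r
B = [[1.0, 2.0, 3.0]]               # trainable factor, r x n
X = [[1.0, 1.0, 1.0, 1.0]]          # one input row, b x p with b = 1

Y_merged = matmul(X, add(W0, matmul(A, B)))            # X (W0 + A B)
Y_split = add(matmul(X, W0), matmul(matmul(X, A), B))  # X W0 + (X A) B

full_params = p * n          # parameters updated by full fine-tuning of this layer
lora_params = r * (p + n)    # parameters updated by LoRA for the same layer
print(Y_merged, Y_split, full_params, lora_params)
```

Both expressions give approximately [2.0, 3.0, 4.0], and the layer trains 7 parameters under LoRA instead of 12. The saving grows quickly with layer size because the LoRA count scales with $r(p + n)$ rather than $pn$.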
Full Fine-Tuning. All parameters of the pre-trained model are updated during the training process [38]. Let $X \in \mathbb{R}^{b \times p}$ be the input data matrix. The model weights $W$ are initialized from a pre-trained model and directly adjusted during training.
The output $Y$ is computed as:
$$Y = XW$$
The pseudocode for Full fine-tuning is as follows.
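# Pseudocode: every parameter of the pre-trained model is trainable.
for epoch in range(num_epochs):
    for batch in dataset:
        optimizer.zero_grad()                # clear gradients from the previous step
        outputs = model(batch)               # forward pass
        loss = compute_loss(outputs, price)  # compare predictions with target prices
        loss.backward()                      # gradients for all model weights
        optimizer.step()                     # update all model weights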

4.7. Learning Rate Adjustment

Cosine Annealing. In this learning strategy, the learning rate follows a cosine function of the iteration count [39]. The learning rate $\eta_t$ at any time $t$ varies between the minimum learning rate $\eta_{\min}$ and the maximum learning rate $\eta_{\max}$ over the cycle period $T_{\max}$, and can be expressed as:
$$\eta_t = \eta_{\min} + \frac{1}{2}\left(\eta_{\max} - \eta_{\min}\right)\left(1 + \cos\left(\frac{\pi t}{T_{\max}}\right)\right)$$
In this model, the learning rate $\eta_t$ varies between the minimum $\eta_{\min}$ and the maximum $\eta_{\max}$, reflecting the nonlinear relationship between learning rate and iterations. The frequency of learning rate variation is determined by the period $T_{\max}$. The learning rate example curve for this study is shown in Figure 9.
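The cosine annealing formula translates directly into code. In the sketch below the function name and the numeric learning rates are hypothetical; only the formula itself comes from the text.

```python
import math


def cosine_annealing_lr(t, t_max, eta_min, eta_max):
    """Learning rate at iteration t for a cosine schedule with period t_max."""
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t / t_max))


# Hypothetical settings: decay from 1e-4 to 1e-6 over 1000 iterations.
for t in (0, 250, 500, 750, 1000):
    print(t, cosine_annealing_lr(t, t_max=1000, eta_min=1e-6, eta_max=1e-4))
```

The schedule starts at $\eta_{\max}$, passes through the midpoint of the two bounds at $t = T_{\max}/2$, and reaches $\eta_{\min}$ at $t = T_{\max}$.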
OneCycle. The learning rate is increased once and then decreased once over the course of training, which speeds up convergence and helps the optimizer avoid poor local optima [40]. The learning rate example curve is shown in Figure 10.
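A one-cycle schedule can be sketched as a single rise followed by a single decay. The version below is a simplified illustration, assuming a linear warm-up and a cosine decay; the function name, the `warmup_fraction` parameter, and all numeric values are hypothetical and are not taken from the experiments.

```python
import math


def one_cycle_lr(step, total_steps, max_lr, start_lr, final_lr, warmup_fraction=0.3):
    """Simplified one-cycle schedule: linear rise to max_lr, then cosine decay."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        # Rising phase: interpolate linearly from start_lr to max_lr.
        return start_lr + (max_lr - start_lr) * step / warmup_steps
    # Falling phase: cosine decay from max_lr down to final_lr.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return final_lr + 0.5 * (max_lr - final_lr) * (1 + math.cos(math.pi * progress))


schedule = [one_cycle_lr(s, 100, max_lr=1e-4, start_lr=4e-6, final_lr=1e-7)
            for s in range(101)]
print(schedule[0], max(schedule), schedule[-1])
```

With these settings the peak is reached at step 30, after which the rate decays towards `final_lr` at the last step.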

5. Experiments

5.1. Datasets

In this research, three datasets are employed, and their key characteristics are presented in Table 4. The changes in dataset size are shown in Table 5. The first dataset is from MathorCup (MC) 2020, which focuses on the pricing issues of Vehicle Free Carrier Platform Routes. The MathorCup dataset was provided by the official MathorCup competition organizers and is publicly available through the competition website http://www.mathorcup.org/detail/2294 (accessed on 1 January 2025). The data preprocessing procedures for this dataset are analogous to those detailed in the paper “Prediction model of transaction pricing in internet freight transport platform based on a combination of dual long short-term memory networks”. The second dataset is Exhibit A(rt) from HackerEarth (HE) Machine Learning, which focuses on predicting the cost of shipping paintings, antiques, sculptures, and other collectables to customers. The Exhibit A(rt) dataset was obtained from HackerEarth, an online coding platform and developer assessment software, and is also publicly available at https://www.kaggle.com/datasets/oossiiris/hackerearth-machine-learning-exhibit-art/data (accessed on 1 January 2025). For the HE dataset, outliers are removed using the Interquartile Range (IQR) method. Since the freight cost spans a wide range, the log function is used to compress the freight cost values. The data preprocessing steps are similar to those described in the Kaggle notebook: https://www.kaggle.com/code/rachitjain124/exhibition-art-shipment-cost-prediction (accessed on 1 January 2025). The third dataset is the Company (CO) dataset, which comes from the logistics companies in the cooperation project.
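The two preprocessing steps applied to the HE dataset can be sketched as follows. This is an illustration under stated assumptions: the conventional 1.5 × IQR fences, the natural logarithm, and the sample cost values are not given in the text and are chosen here only as an example.

```python
import math
import statistics


def remove_iqr_outliers(values, k=1.5):
    """Keep values inside [Q1 - k * IQR, Q3 + k * IQR]; k = 1.5 is the usual convention."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def log_compress(values):
    """Compress a wide range of positive costs with the natural logarithm."""
    return [math.log(v) for v in values]


# Hypothetical freight costs with one extreme value.
costs = [100, 120, 130, 150, 160, 180, 5000]
kept = remove_iqr_outliers(costs)
print(kept)
print(log_compress(kept))
```

Predictions made on the compressed scale have to be mapped back with the exponential function before they can be read as freight costs.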
Transport freight trends in MathorCup are shown in Table 6. Region 4’s freight costs are not only much higher on average compared with other regions but also exhibit substantial fluctuations. The trends could be due to various factors, such as specialized or large-scale transportation needs, infrastructure limitations, or regional economic disparities, which result in higher or inconsistent freight prices. Region 5 shows relatively high freight costs. However, the variation in costs here is significantly lower than in Region 4, indicating a more stable freight pricing structure. Region 1 represents a region where freight costs are relatively moderate but still exhibit some level of variability. This phenomenon suggests that, while freight services are generally affordable, there may be occasional fluctuations due to factors such as seasonality, demand spikes, or specific routes. Region 3 suggests that, while the average freight costs are relatively similar to those in Region 1, the costs in this region fluctuate significantly more. This phenomenon may imply that Region 3 experiences certain supply chain disruptions or irregularities in freight pricing. Region 6 indicates a mid-range freight cost structure with moderate fluctuations. This region may have more stable infrastructure and less variability in its freight services compared with other regions. Region 2 reflects a region where freight costs are lower, possibly due to less complex logistics or a well-developed transportation network that drives down costs.
Transport freight trends in HackerEarth are shown in Table 7. Airways have the highest average freight cost with a significant variance, indicating that, while the average cost for air transport is considerably high, prices fluctuate substantially. This result could be due to factors such as the high value of goods being transported, seasonal demand variations, or the reliance on air transport for time-sensitive deliveries, which may incur premium charges. Roadways show a more moderate cost structure compared with airways. Although the average cost is still substantial, the variance is lower, suggesting that freight prices via road transport are relatively more stable. Roadway transport represents a balance between cost and flexibility, accommodating a wide variety of goods and distances. Waterways have the lowest average freight cost, along with the smallest variance. The lower cost and smaller variance may indicate that waterways are a cost-effective option for bulk goods over long distances. The reduced volatility suggests fewer market fluctuations, which is likely due to the more predictable nature of shipping schedules and routes.
Missing points heatmaps are shown in Figure 11 and Figure 12. There are many missing values in the features ‘Sub-package number’, ‘Number of installations’, ‘Number of unloading’, and ‘B-side bargaining lowest price’ in the MathorCup dataset. Features such as Transport, Artist Reputation, and Remote Location have missing values in the HackerEarth dataset.
Price correlation analysis [41] is shown in Figure 13 and Figure 14. The features most relevant to price include ‘Total Mileage’, ‘Driving Minutes Before Planned Arrival’, ‘Unloading Minutes Planned’, etc., in the MathorCup dataset. ‘Artist Reputation’, ‘Weight’, and ‘Price Of Sculpture’ are most correlated with price in the HackerEarth dataset.

5.2. Setups

Hardware Environment: The experiments were run on a single NVIDIA RTX 4090 or 4090D GPU, an AMD EPYC 9754 128-core Processor, and 60 GB of RAM.
Software Environment: The base models include GLM-4, Qwen2.5, and Llama 3. GLM-4’s multi-dimensional indicators are comparable to those of OpenAI’s GPT-4. Qwen2.5 performs well in understanding and generating structured data. Llama 3 has demonstrated state-of-the-art performance in various industry benchmarks. These models are open source. The fine-tuning frameworks employed are LLaMA-Factory [42] and the Transformers library [30]. Table 8 lists the software versions.

5.3. Evaluation Metrics

The four evaluation metrics employed are as follows.
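  • MAE (Mean Absolute Error)
    $\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$
  • MSE (Mean Squared Error)
    $\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$
  • MAPE (Mean Absolute Percentage Error)
    $\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$
  • $R^2$ (Coefficient of Determination)
    $R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}$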
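The four metrics can be computed with a few lines of plain Python. The function names and the three example prices are hypothetical and serve only as a worked example of the formulas.

```python
def mae(y_true, y_pred):
    # Mean of the absolute errors.
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    # Mean of the squared errors.
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    # Mean absolute error relative to the true value, in percent (requires y != 0).
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    # One minus the ratio of residual to total sum of squares.
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [100.0, 200.0, 300.0]
y_pred = [110.0, 190.0, 330.0]
print(mae(y_true, y_pred), mse(y_true, y_pred), mape(y_true, y_pred), r2(y_true, y_pred))
```

For these values the MAE is about 16.67, the MSE about 366.67, the MAPE about 8.33%, and $R^2$ is 0.945. Lower is better for the first three metrics, and higher is better for $R^2$.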

5.4. Performance

5.4.1. Compared Models Experiments in Full Data

The comparison encompasses various types of models, including tree ensemble methods (such as XGBoost, LightGBM, Random Forest (RF), and DecisionTree (DT)), deep learning approaches (e.g., deep neural networks), and classical linear models (e.g., PLS, Lasso, Ridge, Kernel Ridge (KR), PCA Regression (PCA), and Bayesian Ridge (BR)). The performance of the compared models is depicted in Table 9.

5.4.2. Cross Entropy Loss and LoRA Fine-Tuning Experiments in Full Data

The cross entropy loss experiments’ parameters are shown in Table 10. The learning rate strategy is cosine annealing. The performance results achieved by utilizing different serialization approaches and sorting methods are presented in Table 11, Table 12 and Table 13.

5.4.3. Chain-of-Thought Experiments

The chain-of-thought experiments are configured as follows: the data serialization method is the Named Feature Sequence, the data sorting method is the initial sort, and the loss function is cross entropy loss. The experimental results are shown in Table 14.

5.4.4. MAE Loss Experiments in Full Data

The Value Only Sequence was chosen for data serialization due to its lower hardware resource usage. The initial sort was used as the data sorting method. The learning rate strategy is OneCycle. The code is adapted from the open-source code of [20]. The MAE loss experiments’ training parameters are shown in Table 15. The performance results are presented in Table 16 [41].

5.4.5. Few-Shot Learning Experiments

This paper explored the model’s performance using few-shot learning with varying data percentages across three datasets, as shown in Table 17, Table 18 and Table 19. The datasets were divided into different percentages of the total data, ranging from 40% to as low as 0.25%. Visualization trends are shown in Figure 15 and Figure 16. This paper measured the model’s performance for each dataset based on the number of training samples available for each setting.
GLM was selected as the base large language model (LLM), and the Named Feature Sequence was chosen for data serialization. The initial sort was used as the data sorting method.
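Few-shot training sets of decreasing size can be drawn as in the sketch below. It assumes random sampling with a fixed seed and nested subsets, neither of which is stated in the text; only the 40% and 0.25% endpoints come from the experiments, and the intermediate fractions are hypothetical.

```python
import random


def few_shot_subsets(train_rows, fractions, seed=0):
    """Return one training subset per fraction, drawn from a single shuffled copy."""
    rng = random.Random(seed)
    shuffled = list(train_rows)
    rng.shuffle(shuffled)
    # Taking prefixes of one shuffle makes each smaller subset part of the larger ones.
    return {f: shuffled[:max(1, int(len(shuffled) * f))] for f in fractions}


# Hypothetical training set of 1000 order indices.
rows = list(range(1000))
subsets = few_shot_subsets(rows, fractions=(0.4, 0.1, 0.01, 0.0025))
for fraction, subset in subsets.items():
    print(fraction, len(subset))
```

Nesting the subsets means that differences between settings reflect the amount of data rather than a different random draw, which is one reasonable design choice among several.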

6. Discussion

6.1. Results Analysis

Analysis of Full Data Learning: On the MathorCup dataset, LLMs achieved the best MAE and MAPE relative to the baseline models. Notably, on the HackerEarth and Company datasets, LLMs achieved the best MAE, MSE, MAPE, and $R^2$ relative to the baseline models. This performance demonstrates that, in network freight price prediction, LLMs are superior to the linear models and competitive with the tree ensemble models, the strongest regression prediction models.
Analysis of Few-Shot Learning: LLMs outperformed most comparison models on both datasets when only a small number of training samples was available. This result suggests that LLMs draw on prior knowledge to improve prediction performance when samples are scarce.
Analysis of Data Serialization: The Named Feature Sequence achieved the maximum number of top performances across four evaluation metrics on two datasets. Named Feature Sequence presents key-value pairs in the form of concise named statements. This representation method provides clear semantic information for the model, making the relationship between features more intuitive and identifiable, and improving the model’s ability to understand data features. The sequence combines input features with values to clarify the contextual relationship of each feature, which enables the model to better understand the actual meaning of each data point and its impact on prediction results. The corpus constructed by the Named Feature Sequence aligns with human language habits and explicitly expresses features and semantics, thereby improving the model’s prediction performance.
Analysis of Data Sorting: The distance-based sort performs best in most cases. The closer the sample is to the “centre” of the data distribution, the more representative it is. Distance-based sort enables LLMs to learn from core samples first, recognize the mainstream pattern, and mitigate the interference of outliers.
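The ordering idea can be sketched as follows. The sketch assumes numeric features, z-score standardization, and Euclidean distance to the feature-wise mean as the "centre"; these choices, the function name `distance_sort`, and the sample rows are illustrative assumptions rather than the exact procedure used in the experiments.

```python
import math


def distance_sort(rows):
    """Order numeric feature rows by distance to the centroid, nearest first."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    # Population standard deviation per feature; fall back to 1.0 if constant.
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
        for j in range(dims)
    ]

    def dist(r):
        return math.sqrt(
            sum(((r[j] - means[j]) / stds[j]) ** 2 for j in range(dims))
        )

    return sorted(rows, key=dist)


# Hypothetical (mileage, minutes) rows: three typical orders and two extremes.
rows = [(10.0, 1.0), (50.0, 5.0), (52.0, 5.5), (48.0, 4.5), (200.0, 20.0)]
ordered = distance_sort(rows)
```

In this toy example the three typical rows come first and the extreme row (200.0, 20.0) comes last, so a model trained in this order sees the mainstream pattern before the outliers.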
Analysis of CoT: CoT prompting improved the MSE, MAPE, and $R^2$ metrics in some cases.

6.2. Interpretability

LLMs can perform automated feature engineering, extracting features such as cargo types and market trends directly from raw tabular data. They also incorporate extensive prior knowledge, including historical price patterns and industry trends, and use large parameter sets to model complex, high-dimensional relationships. Together, these three factors enhance the models’ predictive accuracy.

6.3. Limitations

Fine-tuning pre-trained LLMs requires more hardware resources and training time, which raises the barrier to using them on large-scale datasets. Dataset biases, such as uneven value distributions, could also influence the model’s performance in different scenarios.

7. Conclusions

This paper converts tabular data into text, builds diverse corpora for LLMs, and implements network freight price prediction by fine-tuning privately deployed open-source LLMs. Experimental evaluations on three distinct datasets demonstrate that the LLMs perform better than or comparably to established tree ensemble models. The findings not only broaden the applications of LLMs in predicting freight prices but also provide a reference for employing this method in other tabular data prediction fields.
The main contributions of this paper are as follows: 1. To the best of the authors’ knowledge, LLMs are applied to network freight price prediction for the first time, providing a more reliable decision-support tool for the logistics industry. 2. LLMs are capable of identifying complex associations in network freight price prediction by integrating prior knowledge to achieve more accurate price prediction. 3. In most cases, LLMs showed superior performance among the compared models in scenarios involving small sample sizes.
The limitations of this method are as follows: 1. Fine-tuning LLMs takes longer than training tree models and, unlike tree models, requires GPU(s) as well as additional storage resources. 2. Model tuning and hyperparameter selection are complex and challenging when fine-tuning LLMs.
The suggestions for future research are as follows: 1. Investigate parameter-efficient fine-tuning methods for LLMs to reduce computing resource usage and speed up the fine-tuning process. 2. Investigate more efficient hyperparameter search methods for fine-tuning LLMs.

Author Contributions

Conceptualization, P.Z.; methodology, P.L.; software, P.L. and J.W.; validation, P.L.; formal analysis, P.L.; investigation, P.L.; resources, P.Z. and T.L.; data curation, P.L.; writing—original draft preparation, P.L.; writing—review and editing, P.L., P.Z., J.W., X.W., Y.M. and T.L.; visualization, P.L.; supervision, P.Z. and T.L.; project administration, P.Z. and T.L.; funding acquisition, P.Z. and T.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Anhui Provincial University Research Project (grant number 2024AH050109) and Anhui Future Technology Research Institute Enterprise Cooperation Project (grant number 2023qyhz12).

Data Availability Statement

The MathorCup dataset is available from http://www.mathorcup.org/detail/2294 (accessed on 1 January 2025), and the HackerEarth dataset is available from https://www.kaggle.com/datasets/oossiiris/hackerearth-machine-learning-exhibit-art/data (accessed on 1 January 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Park, A.; Chen, R.; Cho, S.; Zhao, Y. The determinants of online matching platforms for freight services. Transp. Res. Part E Logist. Transp. Rev. 2023, 179, 103284. [Google Scholar] [CrossRef]
  2. Gorishniy, Y.; Rubachev, I.; Khrulkov, V.; Babenko, A. Revisiting Deep Learning Models for Tabular Data. In Proceedings of the Advances in Neural Information Processing Systems, Virtual-only, 6–14 December 2021; pp. 18932–18943. [Google Scholar]
  3. Xia, Y.; Kim, J.; Chen, Y.; Ye, H.; Kundu, S.; Hao, C.C.; Talati, N. Understanding the Performance and Estimating the Cost of LLM Fine-Tuning. In Proceedings of the 2024 IEEE International Symposium on Workload Characterization, Vancouver, BC, Canada, 15–17 September 2024; pp. 210–223. [Google Scholar] [CrossRef]
  4. Requeima, J.; Bronskill, J.; Choi, D.; Turner, R.E.; Duvenaud, D. LLM Processes: Numerical Predictive Distributions Conditioned on Natural Language. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 10–15 December 2024; pp. 109609–109671. [Google Scholar]
  5. Hegselmann, S.; Buendia, A.; Lang, H.; Agrawal, M.; Jiang, X.; Sontag, D. TabLLM: Few-shot Classification of Tabular Data with Large Language Models. In Proceedings of the 26th International Conference on Artificial Intelligence and Statistics, Valencia, Spain, 25–27 April 2023; pp. 5549–5581. [Google Scholar]
  6. Budak, A.; Ustundag, A.; Guloglu, B. A forecasting approach for truckload spot market pricing. Transp. Res. Part A Policy Pract. 2017, 97, 55–68. [Google Scholar] [CrossRef]
  7. Li, Y.; Hu, Z.; Chen, C.; Yang, P.; Dong, Y. Prediction model of transaction pricing in internet freight transport platform based on combination of dual long short-term memory networks. J. Comput. Appl. 2022, 42, 1616. [Google Scholar]
  8. Jang, H.S.; Chang, T.W.; Kim, S.H. Prediction of Shipping Cost on Freight Brokerage Platform Using Machine Learning. Sustainability 2023, 15, 1122. [Google Scholar] [CrossRef]
  9. Guo, J.; Wang, J.; Li, Q.; Guo, B. Construction of Prediction Model of Neural Network Railway Bulk Cargo Floating Price Based on Random Forest Regression Algorithm. Neural Comput. Appl. 2019, 31, 8139–8145. [Google Scholar] [CrossRef]
  10. Macarringue, A.M.J.S.; Oliveira, A.L.R.d.; Dias, C.T.d.S.; Marsola, K.B. Multidimensionality of agricultural grain road freight price: A multiple linear regression model approach by variable selection. Ciência Rural 2023, 54, e20220335. [Google Scholar] [CrossRef]
  11. Spreeuwenberg, S. Developing a Forecast Model for Freight Prices in Poland. Master’s Thesis, Department of Economics and Econometrics, Tilburg University, Tilburg, The Netherlands, 2020. [Google Scholar]
  12. Lindsey, C.; Frei, A.; Alibabai, H.; Mahmassani, H.S.; Park, Y.W.; Klabjan, D.; Reed, M.; Langheim, G.; Keating, T. Modeling carrier truckload freight rates in spot markets. In Proceedings of the Submitted for presentation at the 92nd 24 Annual Meeting of the Transportation Research Board, Washington, DC, USA, 13–17 January 2013. [Google Scholar]
  13. Kjeldsberg, F.; Haque Munim, Z. Automated machine learning driven model for predicting platform supply vessel freight market. Comput. Ind. Eng. 2024, 191, 110153. [Google Scholar] [CrossRef]
  14. Bae, S.H.; Lee, G.; Park, K.S. A Baltic Dry Index prediction using deep learning models. J. Korea Trade 2021, 25, 17–36. [Google Scholar] [CrossRef]
  15. Koyuncu, K.; Tavacıoğlu, L. Forecasting Shanghai Containerized Freight Index by Using Time Series Models. Mar. Sci. Technol. Bull. 2021, 10, 426–434. [Google Scholar] [CrossRef]
  16. Tamannaei, M.; Zarei, H.; Aminzadegan, S. A Game-Theoretic Approach to the Freight Transportation Pricing Problem in the Presence of Intermodal Service Providers in a Competitive Market. Networks Spat. Econ. 2021, 21, 123–173. [Google Scholar] [CrossRef]
  17. Dimitriou, L. Optimal competitive pricing in European port container terminals: A game-theoretical framework. Transp. Res. Interdiscip. Perspect. 2021, 9, 100287. [Google Scholar] [CrossRef]
  18. Dinh, T.; Zeng, Y.; Zhang, R.; Lin, Z.; Gira, M.; Rajput, S.; Sohn, J.Y.; Papailiopoulos, D.; Lee, K. LIFT: Language-Interfaced Fine-Tuning for Non-language Machine Learning Tasks. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 28 November–9 December 2022; pp. 11763–11784. [Google Scholar]
  19. Song, X.; Li, O.; Lee, C.; Yang, B.; Peng, D.; Perel, S.; Chen, Y. OmniPred: Language Models as Universal Regressors. arXiv 2025, arXiv:2402.14547. [Google Scholar]
  20. Rubungo, A.N.; Arnold, C.; Rand, B.P.; Dieng, A.B. LLM-Prop: Predicting Physical and Electronic Properties of Crystalline Solids From Their Text Descriptions. arXiv 2023, arXiv:2310.14029. [Google Scholar]
  21. Jin, M.; Wang, S.; Ma, L.; Chu, Z.; Zhang, J.Y.; Shi, X.; Chen, P.Y.; Liang, Y.; Li, Y.F.; Pan, S.; et al. Time-LLM: Time Series Forecasting by Reprogramming Large Language Models. arXiv 2024, arXiv:2310.01728. [Google Scholar]
  22. Xue, H.; Salim, F.D. PromptCast: A New Prompt-Based Learning Paradigm for Time Series Forecasting. IEEE Trans. Knowl. Data Eng. 2024, 36, 6851–6864. [Google Scholar] [CrossRef]
  23. Zhou, T.; Niu, P.; Wang, X.; Sun, L.; Jin, R. One Fits All: Power General Time Series Analysis by Pretrained LM. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023; pp. 43322–43355. [Google Scholar]
  24. Jia, F.; Wang, K.; Zheng, Y.; Cao, D.; Liu, Y. GPT4MTS: Prompt-based Large Language Model for Multimodal Time-series Forecasting. In Proceedings of the Association for the Advancement of Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; pp. 23343–23351. [Google Scholar] [CrossRef]
  25. Perez, E.; Kiela, D.; Cho, K. True Few-Shot Learning with Language Models. In Proceedings of the Advances in Neural Information Processing Systems, Virtual-only, 6–14 December 2021; pp. 11054–11070. [Google Scholar]
  26. Chen, Z.; Balan, M.M.; Brown, K. Language Models are Few-shot Learners for Prognostic Prediction. arXiv 2023, arXiv:2302.12692. [Google Scholar]
  27. Moreno, A.; Redondo, T. Text analytics: The convergence of big data and artificial intelligence. Int. J. Interact. Multimed. Artif. Intell. 2016, 3, 57–64. [Google Scholar] [CrossRef]
  28. Yuan, H.; Lau, R.Y.; Xu, W. The determinants of crowdfunding success: A semantic text analytics approach. Decis. Support Syst. 2016, 91, 67–76. [Google Scholar] [CrossRef]
  29. Gandomi, A.; Haider, M. Beyond the hype: Big data concepts, methods, and analytics. Int. J. Inf. Manag. 2015, 35, 137–144. [Google Scholar] [CrossRef]
  30. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.U.; Polosukhin, I. Attention is All you Need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  31. Zeng, A.; Xu, B.; Wang, B.; Zhang, C.; Yin, D.; Zhang, D.; Rojas, D.; Feng, G.; Zhao, H.; Lai, H.; et al. ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools. arXiv 2024, arXiv:2406.12793. [Google Scholar]
  32. Bai, J.; Bai, S.; Chu, Y.; Cui, Z.; Dang, K.; Deng, X.; Fan, Y.; Ge, W.; Han, Y.; Huang, F.; et al. Qwen Technical Report. arXiv 2023, arXiv:2309.16609. [Google Scholar]
  33. Grattafiori, A.; Dubey, A.; Jauhri, A.; Pandey, A.; Kadian, A.; Al-Dahle, A.; Letman, A.; Mathur, A.; Schelten, A.; Vaughan, A.; et al. The Llama 3 Herd of Models. arXiv 2024, arXiv:2407.21783. [Google Scholar]
  34. Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 2020, 21, 1–67. [Google Scholar]
  35. Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. Lora: Low-rank adaptation of large language models. In Proceedings of the Tenth International Conference on Learning Representations, Virtual-only, 25–29 April 2022; p. 3. [Google Scholar]
  36. Ma, Y.; Chen, S.; Ermon, S.; Lobell, D.B. Transfer learning in environmental remote sensing. Remote Sens. Environ. 2024, 301, 113924. [Google Scholar] [CrossRef]
  37. Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Xia, F.; Chi, E.; Le, Q.V.; Zhou, D.; Ichter, B. Chain-of-thought prompting elicits reasoning in large language models. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 28 November–9 December 2022; pp. 24824–24837. [Google Scholar]
  38. Christophe, C.; Kanithi, P.K.; Munjal, P.; Raha, T.; Hayat, N.; Rajan, R.; Al-Mahrooqi, A.; Gupta, A.; Salman, M.U.; Gosal, G.; et al. Med42–evaluating fine-tuning strategies for medical LLMs: Full-parameter vs. parameter-efficient approaches. arXiv 2024, arXiv:2404.14779. [Google Scholar]
  39. Loshchilov, I.; Hutter, F. SGDR: Stochastic Gradient Descent with Warm Restarts. arXiv 2017, arXiv:1608.03983. [Google Scholar]
  40. Hannan, M.A.; How, D.N.T.; Mansor, M.B.; Hossain Lipu, M.S.; Ker, P.J.; Muttaqi, K.M. State-of-Charge Estimation of Li-ion Battery Using Gated Recurrent Unit with One-Cycle Learning Rate Policy. IEEE Trans. Ind. Appl. 2021, 57, 2964–2971. [Google Scholar] [CrossRef]
  41. Lu, P.; Wang, Y.; Tang, Z.; Wu, X.; Liu, T.; Zhang, P.; Liu, S.; Bao, X. Network Freight Price Forecast via Bayesian Hierarchical Model. Int. J. Mach. Learn. Cybern. 2024; submitted. [Google Scholar]
  42. Zheng, Y.; Zhang, R.; Zhang, J.; Ye, Y.; Luo, Z.; Feng, Z.; Ma, Y. LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, Bangkok, Thailand, 11–16 August 2024. [Google Scholar]
Figure 1. Diagram of the logistics delivery process and key participants on the network freight platform.
Figure 2. Framework for network freight price prediction based on LLMs.
Figure 3. Data serialization approaches in freight price prediction.
Figure 4. Initial sort in the MathorCup dataset: (a) Demonstration in three-dimensional space; (b) Feature variation over the first 1000 data indices. Blue represents Total Mileage, orange represents Driving Minutes Before Planned Arrival, and green represents Planned Unloading Minutes.
Figure 5. Feature-based sort in the MathorCup dataset: (a) Demonstration in three-dimensional space; (b) Feature variation over the first 1000 data indices. Blue represents Total Mileage, orange represents Driving Minutes Before Planned Arrival, and green represents Planned Unloading Minutes.
Figure 6. Distance-based sort in the MathorCup dataset: (a) Demonstration in three-dimensional space; (b) Feature variation over the first 1000 data indices. Blue represents Total Mileage, orange represents Driving Minutes Before Planned Arrival, and green represents Planned Unloading Minutes.
Figure 7. Chain-of-thought prompting for MathorCup dataset. For better readability, line breaks have been added to the output section.
Figure 8. Chain-of-thought prompting for HackerEarth dataset. For better readability, line breaks have been added to the output section.
Figure 9. Cosine annealing learning rate.
Figure 10. OneCycle learning rate.
Figure 11. Missing data in MathorCup dataset.
Figure 12. Missing data in HackerEarth dataset.
Figure 13. Price correlation analysis of each feature in MathorCup dataset.
Figure 14. Price correlation analysis of each feature in HackerEarth dataset.
Figure 15. Comparison of model performance trends with varying data size on the MathorCup dataset. (a) Performance trends of the LLM with respect to data size, using MAE and $R^2$ as metrics. The MAE increases as the data size decreases, and $R^2$ increases with smaller data sizes. (b) Performance trends of the XGBoost model, using the same metrics. As the data size decreases, the MAE increases and $R^2$ decreases.
Figure 16. Comparison of model performance trends with varying data size on the HackerEarth dataset. (a) Performance trends of the LLM with respect to data size, using MAE and $R^2$ as metrics. As the data size decreases, MAE increases and $R^2$ decreases. (b) Performance trends of the XGBoost model, using the same metrics. A similar trend is observed, with MAE increasing and $R^2$ decreasing as the data size decreases.
Table 1. Definitions and clarifications of mathematical symbols.
Symbol | Dimension | Description
$N$ | 1 | Total number of network freight orders.
$P$ | 1 | Number of features in a network freight order.
$C$ | 1 | Total number of prices in network freight orders.
$t_{ic}$ | 1 | The value is 1 when the i-th network freight order is predicted to have the c-th price; otherwise, it is 0.
$\hat{t}_{ic}$ | 1 | The predicted probability of the i-th network freight order belonging to the c-th price.
$y_i$ | 1 | Real price of the i-th network freight order.
$\hat{y}_i$ | 1 | Predicted price of the i-th network freight order by the model.
$q_i$ | 1 | The i-th fundamental element of the token-sequence output.
$X$ | $P \times N$ | Network freight order feature data.
$Y$ | $N$ | Network freight order price data.
Table 2. Linear model performance varies with the number of features most relevant to price on MathorCup dataset.
Top-K | MAE | MSE | MAPE | $R^2$
1 | 163.525 | 153,377 | 0.182099 | 0.983775
2 | 164.559 | 154,045 | 0.181416 | 0.983704
3 | 147.086 | 150,113 | 0.128677 | 0.984120
4 | 147.086 | 150,113 | 0.128677 | 0.984120
5 | 148.388 | 151,284 | 0.135270 | 0.983996
6 | 148.352 | 151,276 | 0.135085 | 0.983997
7 | 148.729 | 151,276 | 0.135738 | 0.983997
8 | 148.567 | 151,109 | 0.135800 | 0.984015
9 | 148.547 | 151,086 | 0.135817 | 0.984017
all | 161.084 | 136,784 | 0.165328 | 0.985530
Red: Number of features modeled.
Table 3. Linear model performance varies with the number of features most relevant to price on HackerEarth dataset.
Top-K | MAE | MSE | MAPE | $R^2$
1 | 0.440 | 0.319 | 0.074291 | 0.329808
2 | 0.331 | 0.216 | 0.055515 | 0.545345
3 | 0.314 | 0.180 | 0.052673 | 0.620787
4 | 0.247 | 0.117 | 0.041768 | 0.754109
5 | 0.221 | 0.097 | 0.037702 | 0.796323
6 | 0.219 | 0.096 | 0.037434 | 0.798213
7 | 0.219 | 0.094 | 0.037325 | 0.802006
8 | 0.216 | 0.092 | 0.036823 | 0.807407
9 | 0.212 | 0.089 | 0.036217 | 0.811891
all | 0.192 | 0.079 | 0.032687 | 0.833838
Red: Number of features modeled.
Table 4. Overview of dataset key features.
Dataset | Feature Name
MC | Total Mileage
 | Driving Minutes Before Planned Arrival
 | Unloading Minutes Planned
 | Transaction Runtime Minutes
 | Available Minutes Before Actual Docking
 | Available Minutes Before Actual Departure
 | Minutes Before Actual Completion
 | Price Adjustment Review Minutes
 | Available Minutes Before Planned Docking
 | Available Minutes Before Planned Departure
 | Packaging Type
 | Transport Grade
 | Price Adjustment Type
 | Price Adjustment Urgency
 | Transaction Counterparty
 | Region
 | Demand Urgency
 | Within or Outside Province
HE | Artist Reputation
 | Height
 | Width
 | Weight
 | Price Of Sculpture
 | Base Shipping Price
 | Area
 | Combined Price
 | International
 | Express Shipment
 | Installation Included
CO | Distance
 | Vehicle length
 | Vehicle type
 | Cargo weight
 | Cargo volume
 | Pick-up province
 | Unloading province
 | Transportation time
 | Holiday
 | Month
 | Price
Table 5. Overview of dataset sizes.
Dataset | Description | Count
MC | Total Raw Entries | 16,016
 | Raw Feature Dimensions | 63
 | Total Preprocessed Entries | 13,615
HE | Total Raw Entries | 6500
 | Raw Feature Dimensions | 20
 | Total Preprocessed Entries | 4001
CO | Total Raw Entries | 36,000
 | Raw Feature Dimensions | 14
 | Total Preprocessed Entries | 33,195
Table 6. Regional freight trends in MathorCup.
Region | Price Mean | Price Variance
1 | 412.22 | 1919.98
2 | 152.88 | 984.12
3 | 491.03 | 80,545.38
4 | 11,871.47 | 1,985,854.00
5 | 1188.18 | 29,320.30
6 | 730.35 | 19,950.27
Table 7. Transport freight trends in HackerEarth.
Transport | Cost Mean | Cost Variance
Airways | 25,138.71 | $7.960766 \times 10^{10}$
Roadways | 16,321.61 | $3.062830 \times 10^{10}$
Waterways | 9187.90 | $1.415873 \times 10^{10}$
Table 8. Library versions.
Library Name | Current Version
transformers | ≥4.41.2
datasets | ≥2.16.0
accelerate | ≥0.30.1
peft | ≥0.11.1
trl | ≥0.8.6
torchmetrics | 0.11.4
Table 9. Performance of compared models in full data.
Dataset | Model | MAE | MSE | MAPE | $R^2$
MC | XGBoost | 12.036 | 6868 | 0.00453 | 0.999261
 | LightGBM | 20.543 | 9525 | 0.01080 | 0.998975
 | RF | 15.423 | 7409 | 0.00730 | 0.999203
 | GBDT | 13.299 | 8670 | 0.00960 | 0.999067
 | ExtraTrees | 12.782 | 9439 | 0.00438 | 0.998984
 | DT | 24.448 | 11,873 | 0.01704 | 0.998723
 | DNNs | 203.973 | 165,706 | 0.32918 | 0.982179
 | PLS | 190.998 | 173,439 | 0.26772 | 0.981348
 | Lasso | 155.075 | 133,631 | 0.15450 | 0.985629
 | Ridge | 152.920 | 128,510 | 0.16307 | 0.986179
 | ElasticNet | 145.528 | 146,988 | 0.14749 | 0.984192
 | SVR | 1158.886 | 10,306,138 | 0.63346 | −0.108338
 | KR | 153.193 | 128,584 | 0.16440 | 0.986171
 | PCA | 137.174 | 146,294 | 0.13850 | 0.984267
 | BR | 152.512 | 129,621 | 0.16105 | 0.986060
HE | XGBoost | 128.068 | 113,079 | 0.23581 | 0.523815
 | LightGBM | 145.901 | 140,377 | 0.26584 | 0.408859
 | RF | 112.437 | 86,568 | 0.22128 | 0.635453
 | GBDT | 98.204 | 77,988 | 0.17398 | 0.671584
 | ExtraTrees | 137.114 | 117,320 | 0.26962 | 0.505957
 | DT | 160.048 | 144,692 | 0.31006 | 0.390689
 | DNNs | 250.959 | 264,763 | 0.50535 | −0.114936
 | PLS | 168.751 | 141,021 | 0.33730 | 0.406149
 | Lasso | 186.399 | 146,295 | 0.41760 | 0.383940
 | Ridge | 93.682 | 58,309 | 0.17747 | 0.754455
 | ElasticNet | 191.285 | 153,673 | 0.43078 | 0.352873
 | SVR | 252.096 | 250,254 | 0.59910 | −0.053837
 | KR | 93.351 | 57,823 | 0.17724 | 0.756500
 | PCA | 93.634 | 58,350 | 0.17738 | 0.754284
 | BR | 93.908 | 58,754 | 0.17693 | 0.752582
CO | XGBoost | 441.196 | 1,119,809 | 0.18362 | 0.885241
 | LightGBM | 584.293 | 1,153,242 | 0.26571 | 0.881815
 | RF | 785.043 | 1,935,786 | 0.34883 | 0.801619
 | GBDT | 440.845 | 1,251,295 | 0.17728 | 0.871766
 | ExtraTrees | 740.413 | 1,577,908 | 0.36291 | 0.838295
 | DT | 610.468 | 1,585,597 | 0.25134 | 0.837507
 | DNNs | 539.212 | 1,083,497 | 0.21791 | 0.888962
 | PLS | 970.893 | 2,472,120 | 0.48266 | 0.746655
 | Lasso | 828.010 | 1,913,604 | 0.41144 | 0.803892
 | Ridge | 827.111 | 1,910,470 | 0.41082 | 0.804214
 | ElasticNet | 891.745 | 2,254,588 | 0.41789 | 0.768948
 | SVR | 1723.153 | 10,142,073 | 0.61695 | −0.039364
 | KR | 827.094 | 1,910,485 | 0.41080 | 0.804212
 | PCA | 899.578 | 2,244,267 | 0.42656 | 0.770006
 | BR | 828.079 | 1,912,611 | 0.41169 | 0.803994
Red: The best performance in each dataset.
Table 10. Training parameters for cross entropy loss experiments.
Dataset | Model | Loss Function | Epoch | Batch Size | Learning Rate
MC; CO | GLM | Cross Entropy | 5 | 1 | $5.0 \times 10^{-5}$
 | Qwen | Cross Entropy | 5 | 1 | $5.0 \times 10^{-5}$
 | Llama | Cross Entropy | 5 | 1 | $5.0 \times 10^{-5}$
HE | GLM | Cross Entropy | 15 | 1 | $5.0 \times 10^{-5}$
 | Qwen | Cross Entropy | 15 | 1 | $5.0 \times 10^{-5}$
 | Llama | Cross Entropy | 15 | 1 | $5.0 \times 10^{-5}$
Table 11. Performance of cross entropy loss in MathorCup full data.
Sort | Serialization | Model | MAE | MSE | MAPE | $R^2$
IS | NFS | GLM | 9.250 | 8590 | 0.00426 | 0.999076
 | NFS | Llama | 17.929 | 65,775 | 0.00666 | 0.992926
 | NFS | Qwen | 25.766 | 36,025 | 0.01409 | 0.996125
 | LTS | GLM | 11.502 | 14,221 | 0.00509 | 0.998470
 | LTS | Llama | 11.990 | 12,671 | 0.00568 | 0.998637
 | LTS | Qwen | 20.676 | 98,205 | 0.00603 | 0.989438
 | VOS | GLM | 8.550 | 7616 | 0.00397 | 0.999180
 | VOS | Llama | 12.215 | 12,786 | 0.00449 | 0.998624
 | VOS | Qwen | 11.514 | 9791 | 0.00684 | 0.998946
 | JS | GLM | 11.597 | 11,836 | 0.00434 | 0.998727
 | JS | Llama | 13.388 | 13,641 | 0.00547 | 0.998532
 | JS | Qwen | 14.247 | 11,295 | 0.00701 | 0.998785
FS | NFS | GLM | 12.651 | 15,303 | 0.00529 | 0.998354
 | NFS | Llama | 14.429 | 23,675 | 0.00533 | 0.997453
 | NFS | Qwen | 86.617 | 4,523,043 | 0.01537 | 0.513584
 | LTS | GLM | 13.211 | 17,917 | 0.00533 | 0.998073
 | LTS | Llama | 10.731 | 12,381 | 0.00503 | 0.998668
 | LTS | Qwen | 18.692 | 54,918 | 0.00839 | 0.994093
 | VOS | GLM | 13.038 | 46,614 | 0.00625 | 0.994986
 | VOS | Llama | 17.133 | 61,115 | 0.00470 | 0.993427
 | VOS | Qwen | 14.940 | 17,658 | 0.00499 | 0.998100
 | JS | GLM | 11.355 | 12,491 | 0.00513 | 0.998656
 | JS | Llama | 13.468 | 15,049 | 0.00590 | 0.998381
 | JS | Qwen | 12.269 | 9261 | 0.00738 | 0.999004
DS | NFS | GLM | 12.439 | 14,418 | 0.00452 | 0.998449
 | NFS | Llama | 13.404 | 15,870 | 0.00755 | 0.998293
 | NFS | Qwen | 34.860 | 113,884 | 0.01587 | 0.987752
 | LTS | GLM | 11.040 | 11,449 | 0.00486 | 0.998768
 | LTS | Llama | 10.108 | 10,873 | 0.00480 | 0.998830
 | LTS | Qwen | 19.407 | 57,803 | 0.00822 | 0.993783
 | VOS | GLM | 12.631 | 14,389 | 0.00592 | 0.998452
 | VOS | Llama | 11.078 | 13,256 | 0.00378 | 0.998574
 | VOS | Qwen | 13.283 | 9876 | 0.00625 | 0.998937
 | JS | GLM | 12.341 | 14,000 | 0.00488 | 0.998494
 | JS | Llama | 13.019 | 14,445 | 0.00558 | 0.998446
 | JS | Qwen | 12.188 | 7475 | 0.00641 | 0.999196
Red: The best performance.
Table 12. Performance of cross entropy loss in HackerEarth full data.
Sort | Serialization | Model | MAE | MSE | MAPE | $R^2$
IS | NFS | GLM | 52.019 | 41,091 | 0.09127 | 0.826960
 | NFS | Llama | 67.513 | 57,063 | 0.11394 | 0.759702
 | NFS | Qwen | 59.806 | 34,408 | 0.10746 | 0.855104
 | LTS | GLM | 53.893 | 42,187 | 0.09305 | 0.822344
 | LTS | Llama | 71.830 | 56,008 | 0.12763 | 0.764143
 | LTS | Qwen | 53.559 | 38,376 | 0.09543 | 0.838396
 | VOS | GLM | 52.566 | 28,680 | 0.10173 | 0.879224
 | VOS | Llama | 72.931 | 57,678 | 0.11917 | 0.757113
 | VOS | Qwen | 74.744 | 52,115 | 0.13603 | 0.780540
 | JS | GLM | 58.541 | 44,243 | 0.10167 | 0.813687
 | JS | Llama | 72.083 | 57,559 | 0.12378 | 0.757614
 | JS | Qwen | 67.472 | 51,681 | 0.11861 | 0.782366
FS | NFS | GLM | 55.755 | 33,336 | 0.09795 | 0.859617
 | NFS | Llama | 67.786 | 38,224 | 0.12080 | 0.839035
 | NFS | Qwen | 62.159 | 38,574 | 0.11292 | 0.837561
 | LTS | GLM | 59.501 | 41,466 | 0.10677 | 0.825382
 | LTS | Llama | 64.944 | 43,955 | 0.11912 | 0.814902
 | LTS | Qwen | 56.222 | 39,077 | 0.09985 | 0.835442
 | VOS | GLM | 54.576 | 45,173 | 0.09528 | 0.809773
 | VOS | Llama | 76.244 | 57,926 | 0.12491 | 0.756067
 | VOS | Qwen | 69.861 | 43,801 | 0.13752 | 0.815550
 | JS | GLM | 56.899 | 48,528 | 0.10523 | 0.795644
 | JS | Llama | 62.989 | 48,225 | 0.10605 | 0.796920
 | JS | Qwen | 68.097 | 55,602 | 0.11973 | 0.765852
DS | NFS | GLM | 54.053 | 26,951 | 0.09980 | 0.886506
 | NFS | Llama | 69.727 | 38,556 | 0.12487 | 0.837636
 | NFS | Qwen | 56.677 | 38,957 | 0.10530 | 0.835946
 | LTS | GLM | 59.696 | 49,923 | 0.10693 | 0.789769
 | LTS | Llama | 67.190 | 44,472 | 0.12127 | 0.812723
 | LTS | Qwen | 57.872 | 37,594 | 0.10651 | 0.841686
 | VOS | GLM | 55.055 | 40,002 | 0.10220 | 0.831546
 | VOS | Llama | 71.846 | 51,875 | 0.12423 | 0.781550
 | VOS | Qwen | 68.013 | 40,673 | 0.13872 | 0.828721
 | JS | GLM | 54.151 | 44,010 | 0.09790 | 0.814671
 | JS | Llama | 76.802 | 64,002 | 0.13111 | 0.730479
 | JS | Qwen | 57.217 | 35,293 | 0.11160 | 0.851378
Red: The best performance.
Table 13. Performance of cross entropy loss in Company full data.
Sort | Serialization | Model | MAE | MSE | MAPE | $R^2$
IS | NFS | GLM | 424.096 | 1,241,060 | 0.16155 | 0.872815
 | NFS | Llama | 428.400 | 952,435 | 0.16720 | 0.902393
 | NFS | Qwen | 478.156 | 1,097,089 | 0.19459 | 0.887569
 | LTS | GLM | 413.750 | 1,008,239 | 0.17440 | 0.896675
 | LTS | Llama | 436.841 | 975,876 | 0.17588 | 0.899991
 | LTS | Qwen | 469.119 | 1,090,185 | 0.18615 | 0.888277
 | VOS | GLM | 548.922 | 1,239,516 | 0.20284 | 0.872973
 | VOS | Llama | 433.555 | 908,955 | 0.16529 | 0.906849
 | VOS | Qwen | 460.234 | 1,115,836 | 0.17760 | 0.885648
 | JS | GLM | 426.274 | 2,254,773 | 0.15571 | 0.768929
 | JS | Llama | 430.236 | 1,054,372 | 0.17077 | 0.891947
 | JS | Qwen | 465.082 | 1,019,952 | 0.17306 | 0.895474
FS | NFS | GLM | 411.467 | 1,015,870 | 0.16323 | 0.895893
 | NFS | Llama | 422.903 | 875,652 | 0.16194 | 0.910262
 | NFS | Qwen | 483.389 | 1,105,674 | 0.18713 | 0.886689
 | LTS | GLM | 411.299 | 927,983 | 0.16326 | 0.904899
 | LTS | Llama | 421.303 | 905,436 | 0.19347 | 0.907210
 | LTS | Qwen | 463.050 | 1,010,131 | 0.18767 | 0.896481
 | VOS | GLM | 418.586 | 1,149,595 | 0.15690 | 0.882188
 | VOS | Llama | 428.686 | 903,838 | 0.17692 | 0.907374
 | VOS | Qwen | 445.562 | 1,024,561 | 0.17431 | 0.895002
 | JS | GLM | 452.947 | 6,512,821 | 0.17279 | 0.332562
 | JS | Llama | 415.319 | 890,255 | 0.15696 | 0.908766
 | JS | Qwen | 473.275 | 1,062,804 | 0.17771 | 0.891083
DS | NFS | GLM | 420.648 | 1,033,676 | 0.16313 | 0.894068
 | NFS | Llama | 429.152 | 1,319,645 | 0.17543 | 0.864762
 | NFS | Qwen | 470.139 | 963,586 | 0.18147 | 0.901251
 | LTS | GLM | 412.089 | 910,550 | 0.15608 | 0.906686
 | LTS | Llama | 428.334 | 975,630 | 0.18217 | 0.900016
 | LTS | Qwen | 474.552 | 1,136,802 | 0.17707 | 0.883499
 | VOS | GLM | 424.572 | 1,102,354 | 0.16726 | 0.887030
 | VOS | Llama | 435.400 | 931,192 | 0.16224 | 0.904570
 | VOS | Qwen | 443.729 | 953,152 | 0.17089 | 0.902320
 | JS | GLM | 416.293 | 1,095,406 | 0.16136 | 0.887742
 | JS | Llama | 436.216 | 1,071,534 | 0.18969 | 0.890188
 | JS | Qwen | 451.418 | 885,174 | 0.17072 | 0.909286
Red: The best performance.
Table 14. Performance of Chain-of-thought in full data.
Dataset | Model | MAE | MSE | MAPE | $R^2$
MC | GLM | 11.656 | 16,897 | 0.00491 | 0.998182
 | Llama | 18.499 | 27,745 | 0.00650 | 0.997016
 | Qwen | 28.360 | 46,377 | 0.00794 | 0.995012
HE | GLM | 71.786 | 51,576 | 0.12890 | 0.782809
 | Llama | 77.426 | 46,579 | 0.13774 | 0.803850
 | Qwen | 79.780 | 55,558 | 0.13991 | 0.766040
Blue: The improved performance.
Table 15. Training parameters for MAE loss experiments.
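Dataset | Model | Loss Function | Epoch | Batch Size | Learning Rate
MC | T5 Small | MAE | 100 | 28 | $1 \times 10^{-3}$
 | T5 Base | MAE | 50 | 12 | $1 \times 10^{-3}$
 | T5 Large | MAE | 50 | 4 | $1 \times 10^{-3}$
HE | T5 Small | MAE | 100 | 28 | $1 \times 10^{-3}$
 | T5 Base | MAE | 50 | 12 | $1 \times 10^{-3}$
 | T5 Large | MAE | 50 | 4 | $1 \times 10^{-3}$
CO | T5 Small | MAE | 100 | 28 | $1 \times 10^{-3}$
 | T5 Base | MAE | 50 | 12 | $1 \times 10^{-3}$
 | T5 Large | MAE | 50 | 4 | $1 \times 10^{-3}$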
Table 16. Performance of MAE loss in full data.
Dataset | Model | MAE | MSE | MAPE | $R^2$
MC | T5 Small | 41.863 | 28,357 | 0.03467 | 0.996950
 | T5 Base | 29.691 | 18,734 | 0.02079 | 0.997985
 | T5 Large | 41.236 | 22,648 | 0.03359 | 0.997564
HE | T5 Small | 98.632 | 69,020 | 0.16599 | 0.709351
 | T5 Base | 57.839 | 48,064 | 0.09739 | 0.797595
 | T5 Large | 61.229 | 51,320 | 0.10177 | 0.783886
CO | T5 Small | 528.107 | 1,096,372 | 0.19747 | 0.887643
 | T5 Base | 555.073 | 1,204,992 | 0.21182 | 0.876511
 | T5 Large | 1566.161 | 7,853,424 | 0.54867 | 0.195177
Table 17. Performance of few-shot learning on the MathorCup dataset.
Model | Percentage | MAE | MSE | MAPE | $R^2$ | Percentage | MAE | MSE | MAPE | $R^2$
XGBoost | 10% | 70.784 | 98,759 | 0.02330 | 0.988885 | 1% | 139.369 | 149,331 | 0.06088 | 0.979095
LightGBM | | 86.858 | 59,753 | 0.05822 | 0.993275 | | 368.219 | 251,874 | 0.83560 | 0.964740
RF | | 83.344 | 155,790 | 0.02729 | 0.982466 | | 150.544 | 211,453 | 0.05423 | 0.970399
GBDT | | 92.484 | 202,808 | 0.02845 | 0.977175 | | 106.575 | 184,301 | 0.04130 | 0.974200
ExtraTrees | | 90.302 | 185,739 | 0.02135 | 0.979096 | | 158.962 | 301,627 | 0.06792 | 0.957776
DT | | 106.095 | 232,591 | 0.03201 | 0.973823 | | 97.666 | 165,625 | 0.03595 | 0.976814
DNNs | | 155.538 | 132,525 | 0.21955 | 0.985085 | | 307.463 | 282,313 | 0.45798 | 0.960480
PLS | | 180.958 | 137,786 | 0.24531 | 0.984492 | | 289.414 | 212,690 | 0.62406 | 0.970226
Lasso | | 175.751 | 130,629 | 0.19564 | 0.985298 | | 363.770 | 317,582 | 0.59632 | 0.955542
Ridge | | 183.452 | 132,195 | 0.21916 | 0.985122 | | 440.042 | 616,708 | 0.68669 | 0.913669
ElasticNet | | 139.307 | 122,506 | 0.12834 | 0.986212 | | 252.995 | 285,663 | 0.29306 | 0.960011
SVR | | 1156.840 | 9,865,396 | 0.62953 | −0.1102 | | 1011.638 | 7,710,331 | 0.75720 | −0.0793
KR | | 183.461 | 132,181 | 0.21913 | 0.985123 | | 439.360 | 614,449 | 0.68651 | 0.913985
PCA | | 141.844 | 125,673 | 0.15727 | 0.985856 | | 366.601 | 428,677 | 0.59912 | 0.939991
BR | | 141.129 | 130,159 | 0.13768 | 0.985351 | | 248.203 | 232,980 | 0.33562 | 0.967385
LLM (Ours) | | 68.068 | 131,154 | 0.02308 | 0.985239 | | 35.407 | 7155 | 0.05394 | 0.998998
XGBoost | 0.5% | 201.993 | 78,339 | 0.17918 | 0.987761 | 0.25% | 284.638 | 496,171 | 0.05278 | 0.960573
LightGBM | | 1664.269 | 3,928,443 | 3.53132 | 0.386307 | | 2467.628 | 12,584,874 | 3.34101 | $1.1 \times 10^{-5}$
RF | | 253.687 | 245,858 | 0.14144 | 0.961592 | | 136.092 | 53,550 | 0.10006 | 0.995744
GBDT | | 173.060 | 52,691 | 0.15813 | 0.991768 | | 289.647 | 359,299 | 0.10778 | 0.971449
ExtraTrees | | 248.856 | 357,113 | 0.11844 | 0.944212 | | 131.778 | 89,377 | 0.03565 | 0.992897
DT | | 113.500 | 28,589 | 0.12518 | 0.995533 | | 61.571 | 15,205 | 0.06499 | 0.998791
DNNs | | 489.202 | 357,306 | 0.58756 | 0.944182 | | 602.336 | 865,778 | 0.57239 | 0.931204
PLS | | 344.668 | 163,820 | 0.47048 | 0.974408 | | 3580.109 | 74,752,720 | 8.57418 | −4.9399
Lasso | | 1000.275 | 2,101,886 | 0.80242 | 0.671647 | | 3347.128 | 70,096,321 | 8.15408 | −4.5699
Ridge | | 504.707 | 902,297 | 0.31729 | 0.859045 | | 3819.303 | 85,402,808 | 9.08876 | −5.7862
ElasticNet | | 215.798 | 138,551 | 0.18579 | 0.978355 | | 1464.080 | 11,572,647 | 3.35574 | 0.080421
SVR | | 1237.296 | 7,781,308 | 0.70576 | −0.2155 | | 1652.180 | 14,885,208 | 0.51258 | −0.1827
KR | | 508.501 | 885,526 | 0.32318 | 0.861664 | | 3224.048 | 59,189,352 | 7.60270 | −3.7032
PCA | | 414.288 | 469,819 | 0.37086 | 0.926605 | | 6241.154 | 234,392,468 | 14.9256 | −17.625
BR | | 211.273 | 108,629 | 0.21944 | 0.983030 | | 256.008 | 201,155 | 0.164471 | 0.984015
LLM (Ours) | | 98.071 | 24,080 | 0.10184 | 0.996238 | | 84.428 | 25,585 | 0.04720 | 0.997966
Red: The best performance.
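Several baselines in the few-shot tables report negative R² values. Under the standard definition R² = 1 − MSE / Var(y), this happens whenever a model's MSE exceeds the variance of the test targets, meaning it does worse than predicting the mean. The sketch below is a rough consistency check, assuming that definition and a shared test set across models. It recovers the implied target variance from the XGBoost entry at 0.25% (MSE 496,171, R² 0.960573) and uses it to re-derive the SVR entry (MSE 14,885,208, reported R² −0.1827). The helper names are illustrative.

```python
def implied_variance(mse, r2):
    """Target variance implied by R^2 = 1 - MSE / Var(y)."""
    if r2 >= 1.0:
        raise ValueError("R^2 must be below 1 to recover a finite variance")
    return mse / (1.0 - r2)


def r2_from_mse(mse, variance):
    """R^2 for a model with the given MSE on targets with the given variance."""
    return 1.0 - mse / variance


# XGBoost at 0.25% on the MathorCup dataset (values from Table 17).
var_y = implied_variance(496_171, 0.960573)

# SVR at 0.25%: its MSE is larger than the implied variance, so R^2 < 0.
r2_svr = r2_from_mse(14_885_208, var_y)
print(round(var_y), round(r2_svr, 3))
```

The re-derived SVR value lands close to the reported one, which suggests the large negative entries reflect genuinely poor fits at very small sample sizes and not a different metric definition.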
Table 18. Performance of few-shot learning on the HackerEarth dataset.
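| Model | Percentage | MAE | MSE | MAPE | R² | Percentage | MAE | MSE | MAPE | R² |
|---|---|---|---|---|---|---|---|---|---|---|
| XGBoost | 40% | 126.176 | 73,303 | 0.24537 | 0.571201 | 4% | 119.057 | 41,224 | 0.24815 | 0.742702 |
| LightGBM | 40% | 148.044 | 98,816 | 0.28331 | 0.421957 | 4% | 187.292 | 119,959 | 0.39860 | 0.251286 |
| RF | 40% | 117.710 | 62,282 | 0.23509 | 0.635664 | 4% | 107.816 | 36,847 | 0.21431 | 0.770021 |
| GBDT | 40% | 102.814 | 54,722 | 0.19622 | 0.679893 | 4% | 105.958 | 27,933 | 0.22433 | 0.825658 |
| ExtraTrees | 40% | 142.851 | 84,344 | 0.288807 | 0.506611 | 4% | 128.904 | 52,921 | 0.27950 | 0.669698 |
| DecisionTree | 40% | 173.449 | 103,832 | 0.35535 | 0.392615 | 4% | 172.465 | 77,173 | 0.36361 | 0.518335 |
| DNNs | 40% | 247.359 | 196,243 | 0.502789 | −0.1479 | 4% | 303.393 | 242,021 | 0.54808 | −0.5105 |
| PLS | 40% | 177.502 | 112,821 | 0.343827 | 0.340028 | 4% | 141.108 | 56,309 | 0.33716 | 0.648554 |
| Lasso | 40% | 193.585 | 118,250 | 0.41148 | 0.308272 | 4% | 156.429 | 61,507 | 0.37025 | 0.616106 |
| Ridge | 40% | 104.918 | 54,837 | 0.198891 | 0.679221 | 4% | 178.764 | 72,806 | 0.50923 | 0.545585 |
| ElasticNet | 40% | 197.999 | 123,849 | 0.419878 | 0.275518 | 4% | 144.506 | 54,224 | 0.35056 | 0.661566 |
| SVR | 40% | 246.484 | 183,469 | 0.577319 | −0.0732 | 4% | 241.913 | 168,015 | 0.59739 | −0.0486 |
| KR | 40% | 104.255 | 54,176 | 0.19734 | 0.683084 | 4% | 126.558 | 29,147 | 0.38227 | 0.818079 |
| PCA | 40% | 103.991 | 54,963 | 0.19607 | 0.678478 | 4% | 202.144 | 91,370 | 0.57942 | 0.429723 |
| BR | 40% | 105.619 | 55,730 | 0.19841 | 0.673996 | 4% | 124.041 | 41,482 | 0.28412 | 0.741092 |
| LLM (Ours) | 40% | 86.099 | 45,682 | 0.16685 | 0.732771 | 4% | 117.408 | 28,093 | 0.30643 | 0.824656 |
| XGBoost | 2% | 129.298 | 29,648 | 0.29729 | 0.534075 | 1% | 153.377 | 44,418 | 0.31597 | 0.524929 |
| LightGBM | 2% | 214.822 | 79,057 | 0.52405 | −0.2423 | 1% | 264.698 | 120,603 | 0.61248 | −0.2898 |
| RF | 2% | 136.020 | 30,361 | 0.29062 | 0.522876 | 1% | 153.408 | 53,752 | 0.29238 | 0.425093 |
| GBDT | 2% | 127.303 | 26,447 | 0.28865 | 0.584383 | 1% | 142.823 | 37,719 | 0.29144 | 0.596572 |
| ExtraTrees | 2% | 102.495 | 15,244 | 0.31706 | 0.760436 | 1% | 178.964 | 67,227 | 0.38004 | 0.280976 |
| DT | 2% | 164.154 | 53,473 | 0.37666 | 0.159669 | 1% | 171.699 | 62,632 | 0.33932 | 0.330122 |
| DNNs | 2% | 315.454 | 135,305 | 0.85028 | −1.126 | 1% | 2431.351 | 35,649,749 | 3.29146 | −380.28 |
| PLS | 2% | 135.986 | 37,047 | 0.35577 | 0.417798 | 1% | 190.642 | 87,639 | 0.44465 | 0.062662 |
| Lasso | 2% | 119.140 | 26,106 | 0.32196 | 0.589737 | 1% | 1019.443 | 5,725,984 | 1.46376 | −60.241 |
| Ridge | 2% | 260.600 | 202,873 | 0.57028 | −2.188 | 1% | 3478.882 | 87,166,006 | 4.39456 | −931.27 |
| ElasticNet | 2% | 136.576 | 32,560 | 0.37810 | 0.488309 | 1% | 656.878 | 1,900,999 | 1.002837 | −19.331 |
| SVR | 2% | 214.272 | 78,294 | 0.52741 | −0.2303 | 1% | 264.698 | 119,222 | 0.620743 | −0.2751 |
| KR | 2% | 504.205 | 1,346,740 | 0.77187 | −20.163 | 1% | 259.406 | 103,791 | 0.69181 | −0.1100 |
| PCA | 2% | 340.676 | 366,570 | 0.70031 | −4.7606 | 1% | 3656.360 | 96,735,311 | 4.61560 | −1033.6 |
| BR | 2% | 98.142 | 16,834 | 0.30762 | 0.735450 | 1% | 532.141 | 1,024,422 | 0.87501 | −9.9565 |
| LLM (Ours) | 2% | 86.894 | 13,551 | 0.23143 | 0.787032 | 1% | 158.141 | 49,153 | 0.44522 | 0.474282 |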
Red: The best performance.
Table 19. Performance of few-shot learning on the Company dataset.
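| Model | Percentage | MAE | MSE | MAPE | R² | Percentage | MAE | MSE | MAPE | R² |
|---|---|---|---|---|---|---|---|---|---|---|
| XGBoost | 10% | 678.274 | 2,602,308 | 1.54346 | 0.704506 | 1% | 828.113 | 2,804,942 | 0.24349 | 0.770074 |
| LightGBM | 10% | 644.173 | 1,942,556 | 1.49265 | 0.779421 | 1% | 1073.330 | 3,020,699 | 0.418153 | 0.752388 |
| RF | 10% | 807.454 | 2,419,322 | 1.91215 | 0.725284 | 1% | 966.844 | 2,803,007 | 0.29242 | 0.770233 |
| GBDT | 10% | 665.141 | 1,811,665 | 1.62176 | 0.794284 | 1% | 1086.814 | 4,011,988 | 0.28038 | 0.671131 |
| ExtraTrees | 10% | 763.653 | 2,077,473 | 1.65822 | 0.764101 | 1% | 934.732 | 2,286,298 | 0.318563 | 0.812588 |
| DecisionTree | 10% | 739.638 | 1,975,886 | 1.52680 | 0.775636 | 1% | 1209.963 | 6,212,170 | 0.27361 | 0.490778 |
| DNNs | 10% | 693.445 | 1,794,295 | 1.64194 | 0.796256 | 1% | 918.370 | 2,716,467 | 0.26314 | 0.777326 |
| PLS | 10% | 1038.642 | 3,214,362 | 1.88071 | 0.635007 | 1% | 1277.455 | 3,251,590 | 0.56641 | 0.733462 |
| Lasso | 10% | 914.111 | 3,108,801 | 1.63472 | 0.646993 | 1% | 1307.215 | 3,163,580 | 0.63511 | 0.740676 |
| Ridge | 10% | 907.840 | 2,935,392 | 1.63253 | 0.666684 | 1% | 1328.783 | 3,199,060 | 0.66023 | 0.737768 |
| ElasticNet | 10% | 892.580 | 2,666,383 | 1.65756 | 0.697230 | 1% | 1156.112 | 3,009,026 | 0.43564 | 0.753345 |
| SVR | 10% | 1827.157 | 9,737,510 | 2.33241 | −0.1057 | 1% | 2196.533 | 13,666,364 | 0.67648 | −0.1202 |
| KR | 10% | 907.900 | 2,935,535 | 1.63258 | 0.666668 | 1% | 1328.802 | 3,198,836 | 0.660228 | 0.737786 |
| PCA | 10% | 907.849 | 2,701,152 | 1.52074 | 0.693282 | 1% | 1237.612 | 3,048,120 | 0.484269 | 0.750140 |
| BR | 10% | 881.133 | 2,328,954 | 1.60296 | 0.735545 | 1% | 1130.364 | 3,105,342 | 0.41088 | 0.745450 |
| LLM (Ours) | 10% | 651.628 | 2,064,281 | 1.60970 | 0.765599 | 1% | 831.67 | 2,282,021 | 0.25193 | 0.812939 |
| XGBoost | 0.5% | 649.852 | 799,603 | 0.31793 | 0.706064 | 0.1% | 5246.544 | 64,223,122 | 0.46810 | −0.1337 |
| LightGBM | 0.5% | 1042.782 | 2,016,012 | 0.53112 | 0.258910 | 0.1% | 6689.912 | 77,861,414 | 1.15371 | −0.3745 |
| RF | 0.5% | 716.388 | 853,757 | 0.50741 | 0.686157 | 0.1% | 5179.325 | 58,892,696 | 0.54641 | −0.0396 |
| GBDT | 0.5% | 644.161 | 759,383 | 0.31877 | 0.720849 | 0.1% | 5342.480 | 64,803,963 | 0.44945 | −0.1440 |
| ExtraTrees | 0.5% | 595.826 | 624,855 | 0.42709 | 0.770302 | 0.1% | 5999.521 | 70,046,514 | 0.73166 | −0.2365 |
| DT | 0.5% | 663.657 | 809,278 | 0.31951 | 0.702507 | 0.1% | 6323.361 | 83,481,876 | 0.55840 | −0.4737 |
| DNNs | 0.5% | 609.241 | 749,226 | 0.28172 | 0.724583 | 0.1% | 4457.259 | 52,756,022 | 0.40766 | 0.068663 |
| PLS | 0.5% | 1031.991 | 1,477,825 | 0.70006 | 0.456748 | 0.1% | 4990.569 | 54,251,933 | 0.64673 | 0.042254 |
| Lasso | 0.5% | 921.377 | 1,309,347 | 0.44930 | 0.518681 | 0.1% | 4956.708 | 59,723,169 | 0.46992 | −0.0543 |
| Ridge | 0.5% | 893.642 | 1,244,710 | 0.43541 | 0.54244 | 0.1% | 4203.288 | 48,164,688 | 0.36632 | 0.149717 |
| ElasticNet | 0.5% | 958.983 | 1,392,343 | 0.52743 | 0.488172 | 0.1% | 3645.388 | 46,408,284 | 0.36984 | 0.180723 |
| SVR | 0.5% | 1344.858 | 2,878,792 | 0.81348 | −0.0582 | 0.1% | 6821.218 | 89,557,879 | 0.847606 | −0.5810 |
| KR | 0.5% | 889.829 | 1,231,500 | 0.43486 | 0.547298 | 0.1% | 4980.952 | 54,407,169 | 0.52580 | 0.039514 |
| PCA | 0.5% | 1099.088 | 1,748,811 | 0.63081 | 0.357134 | 0.1% | 4220.395 | 48,083,041 | 0.43200 | 0.151158 |
| BR | 0.5% | 811.657 | 1,144,587 | 0.40786 | 0.579247 | 0.1% | 3936.478 | 46,725,100 | 0.40774 | 0.175131 |
| LLM (Ours) | 0.5% | 709.941 | 902,333 | 0.33883 | 0.668300 | 0.1% | 3643.361 | 41,174,589 | 0.39681 | 0.273117 |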
Red: The best performance.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Lu, P.; Zhang, P.; Wu, J.; Wu, X.; Mao, Y.; Liu, T. Fine-Tuning Pre-Trained Large Language Models for Price Prediction on Network Freight Platforms. Mathematics 2025, 13, 2504. https://doi.org/10.3390/math13152504