Fine-Tuning Pre-Trained Large Language Models for Price Prediction on Network Freight Platforms

Lu, Pengfei; Zhang, Ping; Wu, Jun; Wu, Xia; Mao, Yunsheng; Liu, Tao

doi:10.3390/math13152504

by Pengfei Lu, Ping Zhang ^*, Jun Wu, Xia Wu, Yunsheng Mao and Tao Liu

School of Computer and Information, Anhui Polytechnic University, Wuhu 241000, China

^* Author to whom correspondence should be addressed.

Mathematics 2025, 13(15), 2504; https://doi.org/10.3390/math13152504

Submission received: 30 June 2025 / Revised: 28 July 2025 / Accepted: 1 August 2025 / Published: 4 August 2025

(This article belongs to the Special Issue New Advances in Combinatorial Multi-Objective Optimization and Computational Intelligence)

Abstract

Various factors influence the formation and adjustment of network freight prices, including transportation costs, cargo characteristics, and policies and regulations. The interaction of these factors makes it difficult to predict network freight prices accurately with regression or other machine learning models, especially when the amount and quality of training data are limited. This paper introduces large language models (LLMs) to predict network freight prices using their inherent prior knowledge. Different data sorting methods and serialization strategies are employed to construct the LLM corpora, which are then tested on multiple base models. A few-shot sample dataset is constructed to test model performance under insufficient information. Chain of Thought (CoT) is employed to construct a corpus that demonstrates the reasoning process in freight price prediction. Two training configurations are used: cross entropy loss with LoRA fine-tuning and cosine annealing learning rate adjustment, and Mean Absolute Error (MAE) loss with full fine-tuning and OneCycle learning rate adjustment. The experimental results demonstrate that LLMs are better than or competitive with the best comparison model. Tests on the few-shot dataset show that LLMs outperform most comparison models. This method provides a new reference for predicting network freight prices.

Keywords: network freight; price prediction; LLMs; few-shot learning; transfer learning

MSC: 68T07; 90B06; 68T50

1. Introduction

Network freight [1] prices are affected by many factors, such as distance, vehicle length, model, type of cargo, fuel prices, and more. Standard billing methods include distance-based billing, weight-based billing, transit time-based billing, or mixed billing. Different types of cargo have different vehicle requirements. For example, high-rail or low-rail trucks are selected based on the height of the cargo, while vans are selected for cargo with higher sealing requirements. In addition, policies and regulations impact freight prices by increasing operational costs through taxes, tariffs, and compliance requirements or by reducing costs through trade agreements and infrastructure investments. The interaction of these factors increases the complexity of network freight price forecasting.

Network freight price prediction is a tabular data forecasting problem [2] that estimates freight prices based on various features, such as distance, cargo weight, delivery time, and other logistical factors. Tree ensemble models are well suited for this problem, as they can model continuous output variables. However, their performance is limited by the quality of the training data: they tend to overfit noisy data and are not well suited for small samples. Other advanced models, such as graph neural networks, are also a research direction. Graph neural networks excel at processing graph-structured data defined over nodes and edges, but they struggle with tabular data.

Large language models (LLMs) utilize their prior knowledge to address the issue of overfitting in noisy and small-sample cases. They are also better at processing tabular data by converting it into a text format. This paper proposes an approach to price prediction in the network freight domain using pre-trained LLMs. Fine-tuning [3] the base model allows it to interpret new data patterns emerging in the network freight market, thereby improving the accuracy and efficiency of forecasts.

Three data sorting [4] methods (initial sort, feature-based sort, and distance-based sort) and four data serialization [5] methods (Named Feature Sequence, Value Only Sequence, List Temp Sequence, and Json Sequence) are used in this paper to build a diversified LLM corpus and perform multiple tests on base models. These methods shape the structure and format of the training data and are used to verify the specific impact of different data formats on model performance, thereby providing an experimental basis for understanding the relationship between data format and model performance.

The few-shot learning performance of LLMs is investigated in this paper by constructing different training sample sets. Our experiments evaluate how LLMs perform under varying amounts of training data, with a particular emphasis on their ability to generalize from limited examples.

2. Related Work

2.1. Network Freight Price Prediction

Regression prediction task: This type of problem involves predicting order prices based on the characteristics of a network freight order. Standard methods include tree ensemble models, neural networks, and linear models. Examples of the neural network method include Budak et al. [6], who investigate price forecasting in the truckload spot market from the truckers’ perspective, considering a comprehensive set of variables, and Li et al. [7], who design a transaction pricing prediction model for a network freight platform based on dual LSTM and predict the results by K-means cluster analysis. Examples of the tree ensemble method include Jang et al. [8], who identify the key variables influencing the determination of shipping costs and propose a recommended shipping cost derived from a price prediction model based on machine learning techniques, and Guo et al. [9], who integrate the cargo floating price prediction model with the neural network algorithm (NNA) to develop a predictive framework. Examples of the linear model method include Macarringue et al. [10], who enhance the understanding of the variables influencing road freight costs by developing a road freight prediction system based on a multiple linear regression model, employing variable selection techniques such as Stepwise, Forward, and Backward elimination. In addition, Spreeuwenberg [11] employs best subset regression for feature selection and builds a multiple linear regression model to forecast freight prices in Poland, and Lindsey et al. [12] focus on predicting truckload freight rates in spot markets using linear regression models based on shipment- and lane-level data. Their method identifies key cost-driving factors and builds predictive models to support more accurate freight rate estimation and lane performance analysis.

Time series prediction task: This type of problem involves predicting logistics price indices that change over time. Kjeldsberg et al. [13] examine the factors influencing PSV time charter freight rates and investigate the use of AutoML modeling to capture nonlinearities in forecasting PSV freight rates over out-of-sample horizons of 1, 3, and 6 months. Bae et al. [14] provide valuable information to stakeholders by forecasting the tramp shipping market. Koyuncu et al. [15] explore different time-series models related to the Shanghai Containerized Freight Index.

Pricing strategy from the perspective of game theory: This line of research uses pricing games to improve the efficiency and stability of the network freight market. Tamannaei et al. [16] investigate a competitive freight transportation pricing problem involving two Intermodal Service Providers (ISPs) and a Direct Transportation System (DTS). Dimitriou et al. [17] develop a methodological framework and apply it to a real-world system, integrating the concept of pricing differentiation among competing container port facilities.

2.2. The Application of LLMs in Task Prediction

Regression prediction task: LLMs are primarily developed for natural language understanding and generation. Their potential applications in regression prediction tasks are being explored. Requeima et al. [4] construct a regression model to process numerical data and generate probabilistic predictions at arbitrary locations, leveraging natural language text to incorporate the user’s prior knowledge. Dinh et al. [18] propose Language Interface Fine-Tuning (LIFT) and study its effectiveness and limitations through extensive empirical studies on non-linguistic classification and regression tasks. Song et al. [19] propose OmniPred, a framework for training language models as universal end-to-end regressors designed to evaluate data from diverse real-world experiments. Rubungo et al. [20] propose LLM-Prop, which leverages the general learning ability of LLMs to predict the physical and electronic properties of crystals from text descriptions.

Time series prediction task: Due to LLMs’ robust context understanding and long-range dependence modeling capabilities, some scholars have studied their application to time series prediction tasks. Jin et al. [21] introduce a reprogramming framework that repurposes LLMs for general time series forecasting while preserving the integrity of the underlying language models. Xue et al. [22] propose a new forecasting paradigm: prompt-based time series forecasting (PromptCast). Zhou et al. [23] tackle the challenge of insufficient training data by utilizing language or computer vision models, pre-trained on billions of tokens, for time series analysis. Jia et al. [24] introduce GPT4MTS, a simultaneous prompt-based large language model (LLM) framework designed to leverage numerical data and textual information.

Few-shot learning: Because LLMs are pre-trained on vast amounts of data, they possess prior knowledge that gives them an advantage in few-shot learning scenarios. Hegselmann et al. [5] propose TabLLM to study the application of LLMs in zero-shot and few-shot classification of tabular data. Perez et al. [25] evaluate the few-shot ability of LLMs when such held-out examples are unavailable. Chen et al. [26] explore the potential of transformers to enhance clinical prediction performance relative to traditional machine learning methods while also addressing the challenge of few-shot learning in predicting rare disease areas.

2.3. Text Analytics

Moreno and Redondo [27] explore the integration of text analytics with big data and artificial intelligence, focusing on the application and challenges of text analytics technology in processing massive amounts of unstructured data. They review the primary technologies of text analytics, including information extraction, named entity recognition, topic detection, and sentiment analysis, and highlight the role of machine learning and deep learning in enhancing the results of text analytics. Yuan et al. [28] present a novel text analytics framework using Domain-Constraint Latent Dirichlet Allocation (DC-LDA) to predict crowdfunding success. By extracting latent semantic features from project descriptions, the framework outperforms traditional keyword-based approaches in identifying factors influencing funding outcomes. Gandomi and Haider [29] highlight text analytics as a key method for extracting valuable insights from unstructured data in big data environments. They discuss techniques such as information extraction, text summarization, and sentiment analysis, which are essential for converting large volumes of text into structured, actionable knowledge. These text analytics methods enable businesses to make evidence-based decisions and enhance their predictive capabilities by analyzing data from sources such as social media, customer reviews, and corporate documents.

3. Problem Formulation

Network freight comprises four main entities: shippers, network freight platforms, carriers, and consignees. The shipper publishes the transportation demand through the platform. After the carrier receives the supply order, it confirms the freight, collection, and delivery times, as well as the cargo and other detailed information with the shipper and the platform, and then signs the transportation contract. The shipper pays the freight through the platform, and the platform pays the freight to the carrier after confirming the delivery of the cargo. The transportation process is completed after the consignee confirms receipt of the cargo. As a hub, the network freight platform facilitates the collaboration of all parties and provides price estimation services. The relation is illustrated in Figure 1.

Given a set of network freight data features, the goal is to predict the network freight price for each order. The input to the problem is represented by a feature matrix, $X = [x_{1}, x_{2}, \dots, x_{N}] \in \mathbb{R}^{P \times N}$, where each $x_{i} \in \mathbb{R}^{P}$ is a feature vector corresponding to the i-th order, P is the number of features, and N is the total number of network freight orders. The output is denoted as $Y = [y_{1}, y_{2}, \dots, y_{N}] \in \mathbb{R}^{N}$, where $y_{i}$ represents the price associated with the i-th order. The objective is to develop a predictive model that, given the features $x_{i}$, predicts the corresponding price $y_{i}$. The descriptions of all symbols can be found in Table 1.

4. Methodology

4.1. Framework

As illustrated in Figure 2, corpus construction for the LLMs involves data serialization and data sorting. The pre-trained model, which has a transformer-based [30] architecture, is then fine-tuned on the processed network freight data. The training strategies are based on GLM4-9b-chat [31], Qwen2.5-7B-instruct [32], Llama3-8b-instruct [33], and T5 [34], with LoRA [35] fine-tuning and Full fine-tuning applied using cross entropy loss and MAE loss. The knowledge acquired by the pre-trained model is transferred to the fine-tuning stage. Through fine-tuning via transfer learning [36], the model assimilates knowledge of network freight and adapts to network freight price prediction tasks, even when the target task has limited data.

4.2. Data Serialization

This paper uses four data serialization [5] methods: Named Feature Sequence, Value Only Sequence, List Temp Sequence, and Json Sequence. Figure 3 gives specific examples.

Named Feature Sequence (NFS)
The input combines feature names, feature values, and commas. For each table row, the feature names and their corresponding values are concatenated into key-value pairs, separated by commas. The resulting sequence of key-value pairs forms a training sample for the LLMs.
Value Only Sequence (VOS)
The input contains only the feature values of a network freight record, separated by spaces.
List Temp Sequence (LTS)
The input combines feature names, feature values, and line feeds. For each table row, the feature names and their corresponding values are concatenated into key-value pairs, separated by newlines.
Json Sequence (JS)
The input is constructed in JSON format: a set of key-value pairs (feature name, feature value) enclosed in curly braces. Each key is a string enclosed in double quotation marks (“key”), and commas separate the key-value pairs.
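
The four serializations described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the exact templates are those of Figure 3, which is not reproduced here, so the `name: value` joiner inside each pair is an assumption, and the feature names other than 'Total Mileage' are hypothetical.

```python
import json

def named_feature_sequence(row):
    # NFS: "name: value" pairs separated by commas (the ": " joiner is an assumption).
    return ", ".join(f"{name}: {value}" for name, value in row.items())

def value_only_sequence(row):
    # VOS: feature values only, separated by single spaces.
    return " ".join(str(value) for value in row.values())

def list_temp_sequence(row):
    # LTS: one "name: value" pair per line.
    return "\n".join(f"{name}: {value}" for name, value in row.items())

def json_sequence(row):
    # JS: a JSON object with double-quoted keys and comma-separated pairs.
    return json.dumps(row, ensure_ascii=False)

# Hypothetical order record, used for illustration only.
order = {"Total Mileage": 120.5, "Vehicle Length": 9.6, "Cargo Type": "steel"}

for serialize in (named_feature_sequence, value_only_sequence,
                  list_temp_sequence, json_sequence):
    print(serialize.__name__)
    print(serialize(order))
```

VOS yields the shortest sequences because it drops the feature names, which is consistent with its lower hardware resource usage reported in Section 5.4.4.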

4.3. Data Sorting

This paper uses the following three data sorting [4] methods: initial sort, feature-based sort, and distance-based sort.

Initial sort (IS)
The raw data keeps its original order, as shown in Figure 4.
Feature-based sort (FS)
As illustrated in Figure 5, the three features most relevant to price are selected. The data is sorted primarily by the first feature, secondarily by the second feature if the first is equal, and finally by the third feature if the second is also equal.
Distance-based sort (DS)
As illustrated in Figure 6, the distance of a target point is calculated as the sum of the feature differences between the target and all other points. The closer the target point is to the centre of the data cluster, the smaller the calculated distance will be. The data is sorted by this distance from smallest to largest.

For a dataset X, every data point $x_{i}$ consists of features $\{x_{i 1}, x_{i 2}, \dots, x_{i P}\}$, and the target data is $x^{*} = \{x_{1}^{*}, x_{2}^{*}, \dots, x_{N - 1}^{*}\}$. For each data point $x_{i}$, the distance to $x_{j}^{*}$ is computed as $d_{i j} = \sum_{k = 1}^{P} | x_{i k} - x_{j k}^{*} |$. The objective function for $x_{i}$ is then defined as $S_{i} = \sum_{j = 1}^{N - 1} d_{i j}$, and the data points are sorted by $S_{i}$ in ascending order.

The initial sort reflects the initial state of the data, the feature-based sort reflects the importance of the data features, and the distance-based sort reflects the distance of the data from the centre of the data cluster.
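
The feature-based and distance-based sorts can be sketched as follows. This is an illustrative sketch under stated assumptions: the features are taken to be numeric, the feature-based sort is assumed to be ascending (the text does not state the direction), and the helper names are hypothetical. Summing over all points, including the point itself, gives the same $S_{i}$ as summing over the other points, because the distance of a point to itself is zero.

```python
def feature_sort(rows, keys):
    # Sort by the first key, break ties with the second, then the third.
    # Ascending order is an assumption.
    return sorted(rows, key=lambda row: tuple(row[k] for k in keys))

def l1_distance(a, b):
    # d_ij: sum of absolute feature differences between two points.
    return sum(abs(x - y) for x, y in zip(a, b))

def distance_sort(points):
    # S_i: summed L1 distance from point i to every point in the dataset.
    scores = [sum(l1_distance(p, q) for q in points) for p in points]
    order = sorted(range(len(points)), key=lambda i: scores[i])
    return [points[i] for i in order], [scores[i] for i in order]

# Toy data: the middle point is closest to the cluster centre.
points = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]]
sorted_points, sorted_scores = distance_sort(points)
print(sorted_points)  # [[1.0, 1.0], [0.0, 0.0], [10.0, 10.0]]
print(sorted_scores)  # [20.0, 22.0, 38.0]
```

In practice, features on very different scales would dominate the L1 distance, so some normalization is probably needed; the source does not say whether one was applied.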

4.4. Chain-of-Thought

Chain-of-thought (CoT) [37] reduces the complexity of tasks by breaking them down into smaller tasks, gradually guiding LLMs to the correct answer, and improving the reasoning ability of the LLMs.

In this paper, we use the features most relevant to freight pricing to fit a linear formula for calculating freight prices. The reasoning process of the LLMs’ thought chain is as follows: the basic freight price is calculated through the formula from Step 1 to Step N, and the final freight price is determined by considering other factors. The prompt “Let’s think step by step” guides the LLMs to reason step by step. The REASONING tag guides the LLMs to output their reasoning process, which increases the interpretability of their predictions, and the ANS tag marks the final price.

On the MathorCup dataset, the test performance of the linear model improves gradually as the top-1 to top-3 price-relevant features are added. With more than 3 features, the test performance does not improve significantly. Therefore, 3 features are selected to build the linear model. The linear model performance can be found in Table 2. The price correlation can be found in Section 5.1, Datasets.

The CoT corpus construction for the MathorCup dataset is shown in Equation (1) and Figure 7.

Y = 26.7603 + 4.2793 \times X_{1} + 0.2446 \times X_{2} + 2.0341 \times X_{3}

(1)
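
As a rough illustration of how Equation (1) could feed a CoT training sample, the sketch below computes the base price and wraps it in REASONING and ANS tags. The actual corpus template is the one in Figure 7; the angle-bracket tag syntax, the two-step wording, and the `build_cot_sample` helper are assumptions made for this example.

```python
def base_price(x1, x2, x3):
    # Equation (1): linear base price fitted on the MathorCup dataset.
    return 26.7603 + 4.2793 * x1 + 0.2446 * x2 + 2.0341 * x3

def build_cot_sample(serialized_order, x1, x2, x3, final_price):
    # Hypothetical template: the prompt ends with the CoT trigger phrase and
    # the response shows the reasoning steps before the final answer.
    base = base_price(x1, x2, x3)
    prompt = f"{serialized_order}\nLet's think step by step."
    response = (
        "<REASONING>"
        f"Step 1: base price = 26.7603 + 4.2793 * {x1} + 0.2446 * {x2}"
        f" + 2.0341 * {x3} = {base:.2f}. "
        "Step 2: adjust the base price for the remaining factors."
        "</REASONING>"
        f"<ANS>{final_price:.2f}</ANS>"
    )
    return {"prompt": prompt, "response": response}

# Illustrative values only; they are not taken from the dataset.
sample = build_cot_sample("X1: 1, X2: 1, X3: 1", 1, 1, 1, final_price=40.0)
print(sample["response"])
```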

On the HackerEarth dataset, a linear model using the top-9 most price-related features achieves an $R^{2}$ performance that is 97.3% of that obtained using all features. The remaining features have a low correlation with price. To reduce the length of the LLM’s inference sequence, the top 9 most price-related features are selected. The linear model performance can be found in Table 3. The price correlation can be found in Section 5.1, Datasets.

The CoT corpus construction for the HackerEarth dataset is shown in Equation (2) and Figure 8. The freight cost here is the log-compressed value described in Section 5.1.

\begin{aligned} Y = {} & 4.1022 + 1.618714 \times X_{1} + 0.000005 \times X_{2} + 0.012749 \times X_{3} + 0.015024 \times X_{4} + 0.009922 \times X_{5} \\ & + 0.013978 \times X_{6} + 0.146301 \times X_{7} + 0.111457 \times X_{8} + 0.078866 \times X_{9} \end{aligned}

(2)

4.5. Loss Function

Cross Entropy Loss. Although traditionally used for classification, cross entropy loss can be adapted for regression by discretizing the continuous output space into intervals and treating the task as a classification problem over these intervals.

L = - \sum_{i = 1}^{N} \sum_{c = 1}^{K} t_{i c} \log {\hat{t}}_{i c}

(3)

Here, $t_{i c}$ is a binary indicator that equals 1 when the i-th order falls in price class c and 0 otherwise, ${\hat{t}}_{i c}$ is the predicted probability of that class, and K is the number of price classes.

MAE Loss. MAE loss measures the average of the absolute differences between the predicted and actual values, providing a robust performance metric that is not unduly influenced by outliers.

L = \frac{1}{N} \sum_{i = 1}^{N} | y_{i} - {\hat{y}}_{i} |

(4)
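
The two losses can be illustrated with a small plain-Python sketch that follows the description above: prices are discretized into intervals for the cross entropy loss, while the MAE loss works on the continuous values directly. The bin edges, prices, and probabilities are hypothetical, and the helper names are not from the paper.

```python
import bisect
import math

def price_to_class(price, edges):
    # Map a continuous price to the index of the interval it falls in.
    return bisect.bisect_right(edges, price)

def cross_entropy(class_ids, probs):
    # Equation (3) with one-hot targets: only the true class contributes.
    return -sum(math.log(p[c]) for c, p in zip(class_ids, probs))

def mae_loss(y, y_hat):
    # Equation (4): mean absolute difference between targets and predictions.
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

edges = [100.0, 200.0]  # hypothetical bin edges, giving three price classes
class_ids = [price_to_class(p, edges) for p in (50.0, 150.0, 250.0)]

# Hypothetical predicted class probabilities for the three orders.
probs = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]

print(class_ids)                                  # [0, 1, 2]
print(cross_entropy(class_ids, probs))
print(mae_loss([100.0, 200.0], [110.0, 180.0]))   # 15.0
```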

4.6. Fine-Tuning

When resources were limited, we used LoRA for efficient adaptation, while Full fine-tuning was applied when resources and the model permitted, to obtain the best performance. The hyperparameters were selected manually based on the training loss, resulting in a configuration with a low, though non-zero, loss.

LoRA Fine-Tuning. Suppose $X \in \mathbb{R}^{b \times p}$ is a matrix representing network freight order data, where b is the batch size and p is the dimensionality of the features. In LoRA [35], the weight matrix, W, is adjusted by adding a low-rank modification, $A B$, to a base weight matrix, $W_{0}$:

W = W_{0} + A B

(5)

Here, $A \in \mathbb{R}^{p \times r}$ and $B \in \mathbb{R}^{r \times n}$ are newly introduced low-rank matrices, n is the output dimension, and r is a rank significantly smaller than p and n, so the rank of $A B$ is restricted to at most r.

The modified weight matrix, W, is used to compute the result, Y:

Y = X W = X (W_{0} + A B)

(6)

This shows that the output, Y, depends not only on the original weights, $W_{0}$, but also on the low-rank update defined by A and B. Because only A and B are trained, LoRA adjusts the model’s behaviour with far fewer parameters and thereby adapts it to the network freight prediction task. The pseudocode of LoRA fine-tuning is as follows.

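# Freeze the pre-trained weights and attach trainable low-rank matrices A and B.
for layer in model.layers:
    layer.W0.requires_grad = False
    layer.A, layer.B = initialize_low_rank_matrices()

for epoch in range(num_epochs):
    for batch in dataset:
        optimizer.zero_grad()                        # clear gradients from the previous step
        loss = compute_loss(model(batch), batch.price)
        loss.backward()                              # gradients flow only into A and B
        optimizer.step()                             # update the low-rank matrices; W0 stays fixed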
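
To make Equations (5) and (6) concrete, the following dependency-free sketch computes the LoRA forward pass as $X W_{0} + (X A) B$ and compares the number of trainable parameters with full fine-tuning. The dimensions and values are toy numbers chosen for illustration. Initializing one factor to zero so that the update starts at $W = W_{0}$ follows common LoRA practice and is an assumption about the setup used here.

```python
def matmul(a, b):
    # Plain-Python matrix product of two nested lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    # Element-wise sum of two matrices of the same shape.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def lora_forward(X, W0, A, B):
    # Y = X (W0 + A B), computed as X W0 + (X A) B so that the full
    # p x n update matrix A B is never materialized.
    return add(matmul(X, W0), matmul(matmul(X, A), B))

p, n, r = 4, 3, 1                                   # toy dimensions (hypothetical)
W0 = [[0.1 * (i + j) for j in range(n)] for i in range(p)]  # frozen base weights
A = [[0.5] for _ in range(p)]                       # p x r, trainable
B = [[0.0] * n for _ in range(r)]                   # r x n, zero so that W == W0 initially
X = [[1.0, 2.0, 3.0, 4.0]]                          # one order with p features

y_start = lora_forward(X, W0, A, B)                 # equals X W0 before any update
y_updated = lora_forward(X, W0, A, [[1.0, 0.0, 0.0]])  # after a hypothetical update to B

full_params = p * n                                 # parameters trained in full fine-tuning
lora_params = r * (p + n)                           # parameters trained with LoRA
print(full_params, lora_params)                     # 12 7
```

The saving is modest at this toy scale but grows quickly: for p = n = 4096 and r = 8, the same formulas give about 16.8 million parameters for the full matrix against 65,536 for the low-rank factors.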

Full Fine-Tuning. All parameters of the pre-trained model are updated during the training process [38]. Let $X \in \mathbb{R}^{b \times p}$ be the input data matrix. The model weights, W, are initialized from a pre-trained model and directly adjusted during training.

The output, Y, is computed as:

Y = X W

(7)

The pseudocode for Full fine-tuning is as follows.

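for epoch in range(num_epochs):
    for batch in dataset:
        optimizer.zero_grad()                       # clear gradients from the previous step
        outputs = model(batch)
        loss = compute_loss(outputs, batch.price)   # compare predictions with the true prices
        loss.backward()                             # gradients flow into all model weights
        optimizer.step()                            # update every parameter of the model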

4.7. Learning Rate Adjustment

Cosine Annealing. The learning rate follows a cosine function of the iteration count [39]. At iteration t, the learning rate $η_{t}$ varies between the minimum learning rate $η_{min}$ and the maximum learning rate $η_{max}$ over the cycle period $T_{max}$:

η_{t} = η_{min} + \frac{1}{2} (η_{max} - η_{min}) (1 + \cos (\frac{π t}{T_{max}}))

(8)

The learning rate therefore changes nonlinearly with the iterations, and the period $T_{max}$ determines how quickly it varies. The learning rate example curve for this study is shown in Figure 9.
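
Equation (8) translates directly into a small function. The sketch below evaluates it at the start, midpoint, and end of one cycle; the learning rate bounds and cycle length are hypothetical values, not the settings used in the experiments.

```python
import math

def cosine_annealing_lr(t, eta_min, eta_max, t_max):
    # Equation (8): starts at eta_max for t = 0 and decays to eta_min at t = t_max.
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t / t_max))

eta_min, eta_max, t_max = 1e-6, 1e-4, 1000   # hypothetical settings

for t in (0, t_max // 2, t_max):
    print(t, cosine_annealing_lr(t, eta_min, eta_max, t_max))
```

At the midpoint the cosine term vanishes, so the learning rate equals the average of $η_{min}$ and $η_{max}$.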

OneCycle. The learning rate is increased once and then decreased once over the course of training, which speeds up convergence and helps the model avoid poor local optima [40]. The learning rate example curve is shown in Figure 10.

5. Experiments

5.1. Datasets

In this research, three datasets are employed, and their key characteristics are presented in Table 4. The changes in dataset size are shown in Table 5. The first dataset is from MathorCup (MC) 2020, which focuses on the pricing issues of Vehicle Free Carrier Platform Routes. The MathorCup dataset was provided by the official MathorCup competition organizers and is publicly available through the competition website http://www.mathorcup.org/detail/2294 (accessed on 1 January 2025). The data preprocessing procedures for this dataset are analogous to those detailed in the paper “Prediction model of transaction pricing in internet freight transport platform based on a combination of dual long short-term memory networks”. The second dataset is from HackerEarth (HE) Machine Learning, namely Exhibit A(rt), which focuses on predicting the cost of shipping paintings, antiques, sculptures, and other collectables to customers. The Exhibit A(rt) dataset was obtained from HackerEarth, an online coding platform and developer assessment tool, and is also publicly available at https://www.kaggle.com/datasets/oossiiris/hackerearth-machine-learning-exhibit-art/data (accessed on 1 January 2025). In the HE case, outliers are removed using the Interquartile Range (IQR) method. Since the freight cost spans a large range, the log function is used to compress the freight cost value. The data preprocessing steps are similar to those described in the Kaggle notebook: https://www.kaggle.com/code/rachitjain124/exhibition-art-shipment-cost-prediction (accessed on 1 January 2025). The third dataset is the Company (CO) dataset, which comes from the logistics companies in the cooperation project.
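
The HackerEarth preprocessing described above, IQR outlier removal followed by log compression, can be sketched as follows. The 1.5 multiplier is the conventional IQR rule and the natural logarithm is one possible choice of log function; the text specifies neither, so both are assumptions, as are the sample costs.

```python
import math
import statistics

def remove_iqr_outliers(values, k=1.5):
    # Keep values inside [Q1 - k * IQR, Q3 + k * IQR]; k = 1.5 is an assumption.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

def compress(cost):
    # Log compression of the freight cost (natural log assumed).
    return math.log(cost)

def decompress(value):
    # Inverse transform, used to map a prediction back to a price.
    return math.exp(value)

costs = [10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 500.0]   # hypothetical freight costs
kept = remove_iqr_outliers(costs)
targets = [compress(c) for c in kept]
print(kept)
print(targets)
```

Predictions made on the compressed scale have to be passed through `decompress` before they can be read as prices.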

Transport freight trends in MathorCup are shown in Table 6. Region 4’s freight costs are not only much higher on average compared with other regions but also exhibit substantial fluctuations. The trends could be due to various factors, such as specialized or large-scale transportation needs, infrastructure limitations, or regional economic disparities, which result in higher or inconsistent freight prices. Region 5 shows relatively high freight costs. However, the variation in costs here is significantly lower than in Region 4, indicating a more stable freight pricing structure. Region 1 represents a region where freight costs are relatively moderate but still exhibit some level of variability. This phenomenon suggests that, while freight services are generally affordable, there may be occasional fluctuations due to factors such as seasonality, demand spikes, or specific routes. Region 3 suggests that, while the average freight costs are relatively similar to those in Region 1, the costs in this region fluctuate significantly more. This phenomenon may imply that Region 3 experiences certain supply chain disruptions or irregularities in freight pricing. Region 6 indicates a mid-range freight cost structure with moderate fluctuations. This region may have more stable infrastructure and less variability in its freight services compared with other regions. Region 2 reflects a region where freight costs are lower, possibly due to less complex logistics or a well-developed transportation network that drives down costs.

Transport freight trends in HackerEarth are shown in Table 7. Airways have the highest average freight cost with a significant variance, indicating that, while the average cost for air transport is considerably high, prices fluctuate substantially. This result could be due to factors such as the high value of goods being transported, seasonal demand variations, or the reliance on air transport for time-sensitive deliveries, which may incur premium charges. Roadways show a more moderate cost structure compared with airways. Although the average cost is still substantial, the variance is lower, suggesting that freight prices via road transport are relatively more stable. Roadway transport represents a balance between cost and flexibility, accommodating a wide variety of goods and distances. Waterways present the lowest average freight cost, along with the smallest variance. The lower cost and smaller variance may indicate that waterways are a cost-effective option for bulk goods over long distances. The reduced volatility suggests fewer market fluctuations, which is likely due to the more predictable nature of shipping schedules and routes.

Missing-value heatmaps are shown in Figure 11 and Figure 12. In the MathorCup dataset, the features ‘Sub-package number’, ‘Number of installations’, ‘Number of unloading’, and ‘B-side bargaining lowest price’ have many missing values. In the HackerEarth dataset, features such as Transport, Artist Reputation, and Remote Location have missing values.

Price correlation analysis [41] is shown in Figure 13 and Figure 14. The features most relevant to price include ‘Total Mileage’, ‘Driving Minutes Before Planned Arrival’, ‘Unloading Minutes Planned’, etc., in the MathorCup dataset. ‘Artist Reputation’, ‘Weight’, and ‘Price Of Sculpture’ are most correlated with price in the HackerEarth dataset.

5.2. Setups

Hardware Environment: The experiments run on a single NVIDIA RTX 4090 or 4090D GPU, an AMD EPYC 9754 128-core processor, and 60 GB of RAM.

Software Environment: The base models include GLM-4, Qwen2.5, and Llama 3. GLM-4’s multi-dimensional indicators are comparable to those of OpenAI’s GPT-4. Qwen2.5 performs well in understanding and generating structured data. Llama 3 has demonstrated state-of-the-art performance in various industry benchmarks. These models are open source. The fine-tuning frameworks employed are LLaMA-Factory [42] and the Transformers library [30]. Table 8 lists the software versions.

5.3. Evaluation Metrics

The four evaluation metrics employed are as follows.

MAE (Mean Absolute Error)

$MAE = \frac{1}{n} \sum_{i = 1}^{n} |y_{i} - {\hat{y}}_{i}|$

(9)
MSE (Mean Squared Error)

$MSE = \frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}$

(10)
MAPE (Mean Absolute Percentage Error)

$MAPE = \frac{100\%}{n} \sum_{i = 1}^{n} |\frac{y_{i} - {\hat{y}}_{i}}{y_{i}}|$

(11)
$R^{2}$ (Coefficient of Determination)

$R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}$

(12)
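
For reference, Equations (9) to (12) can be written as plain Python functions. This is a minimal sketch with hypothetical prices; MAPE is returned in percent and assumes that no true price is zero.

```python
def mae(y, y_hat):
    # Equation (9): mean absolute error.
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def mse(y, y_hat):
    # Equation (10): mean squared error.
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

def mape(y, y_hat):
    # Equation (11): mean absolute percentage error, in percent.
    return 100.0 / len(y) * sum(abs((a - b) / a) for a, b in zip(y, y_hat))

def r2(y, y_hat):
    # Equation (12): one minus the ratio of residual to total sum of squares.
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

y_true = [100.0, 200.0, 300.0]   # hypothetical actual prices
y_pred = [110.0, 190.0, 300.0]   # hypothetical predicted prices

print(mae(y_true, y_pred), mse(y_true, y_pred))
print(mape(y_true, y_pred), r2(y_true, y_pred))
```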

5.4. Performance

5.4.1. Compared Models Experiments in Full Data

The comparison encompasses various types of models, including tree ensemble methods (such as XGBoost, LightGBM, Random Forest (RF), and DecisionTree (DT)), deep learning approaches (e.g., deep neural networks), and classical linear models (e.g., PLS, Lasso, Ridge, Kernel Ridge (KR), PCA Regression (PCA), and Bayesian Ridge (BR)). The performance of the compared models is depicted in Table 9.

5.4.2. Cross Entropy Loss and LoRA Fine-Tuning Experiments in Full Data

The cross entropy loss experiments’ parameters are shown in Table 10. The learning rate strategy is cosine annealing. The performance results achieved by utilizing different serialization approaches and sorting methods are presented in Table 11, Table 12 and Table 13.

5.4.3. Chain-of-Thought Experiments

The Chain-of-thought experiments are configured as follows: the data serialization is Named Feature Sequence, the data sorting method is the initial sort, and the loss function is cross entropy loss. The experimental results are shown in Table 14.

5.4.4. MAE Loss Experiments in Full Data

The Value Only Sequence was chosen for data serialization due to its lower hardware resource usage, and the initial sort was used for data sorting. The learning rate strategy is OneCycle. The implementation is adapted from the open-source code of [20]. The MAE loss experiments’ training parameters are shown in Table 15. The performance results are presented in Table 16 [41].

5.4.5. Few-Shot Learning Experiments

This paper explored the model’s performance using few-shot learning with varying data percentages across three datasets, as shown in Table 17, Table 18 and Table 19. The datasets were divided into different percentages of the total data, ranging from 40% to as low as 0.25%. Visualization trends are shown in Figure 15 and Figure 16. This paper measured the model’s performance for each dataset based on the number of training samples available for each setting.

GLM was selected as the base large language model (LLM), the Named Feature Sequence was chosen for data serialization, and the initial sort was used for data sorting.

6. Discussion

6.1. Results Analysis

Analysis of Full Data Learning: On the MathorCup dataset, LLMs achieved a better MAE and MAPE than all baseline models. Notably, on the HackerEarth and Company datasets, LLMs achieved the best MAE, MSE, MAPE, and $R^{2}$ relative to the baseline models. These results indicate that, in network freight price prediction, LLMs outperform the linear models and are competitive with the tree ensemble models, which are the strongest regression baselines.
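
For reference, the four evaluation metrics can be computed as in the sketch below, which uses their standard definitions with $y_{i}$ as the real price and ${\hat{y}}_{i}$ as the predicted price (Table 1). The function name is hypothetical, and MAPE is returned as a fraction rather than a percentage on the assumption that this matches the scale of the values reported in the tables.

```python
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute MAE, MSE, MAPE (as a fraction) and R^2 for paired prices."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    # MAPE divides by the real price, so every real price must be non-zero.
    mape = sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2 is undefined when all real prices are identical (ss_tot == 0).
    r2 = 1.0 - sum(e * e for e in errors) / ss_tot
    return {"MAE": mae, "MSE": mse, "MAPE": mape, "R2": r2}


# Made-up prices for illustration only.
print(regression_metrics([100, 200, 300, 400], [110, 190, 300, 420]))
```

Under this definition $R^{2}$ becomes negative whenever a model predicts worse than the constant mean price, which is how negative entries such as those for SVR in Table 9 can arise.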

Analysis of Few-Shot Learning: With only a small number of training samples, LLMs outperformed most comparison models on both datasets. This suggests that LLMs draw on prior knowledge to improve prediction performance when few samples are available.

Analysis of Data Serialization: The Named Feature Sequence achieved the largest number of best results across the four evaluation metrics on two datasets. The Named Feature Sequence presents key-value pairs in the form of concise named statements. This representation provides clear semantic information for the model, making the relationships between features more intuitive and identifiable and improving the model's ability to understand the data features. Pairing each feature name with its value clarifies the context of that feature, which enables the model to better understand the actual meaning of each data point and its impact on the prediction results. The corpus constructed by the Named Feature Sequence aligns with human language habits and explicitly expresses features and semantics, thereby improving the model's prediction performance.
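
The difference between the two serializations named in this section can be sketched as follows. The statement template "name is value." and the comma-separated value format are assumptions made for illustration; the exact templates used in the paper are the ones shown in Figure 3. The feature names come from Table 4, and the values are made up.

```python
from typing import Mapping


def named_feature_sequence(row: Mapping[str, object]) -> str:
    """Serialize one order as short 'name is value.' statements."""
    return " ".join(f"{name} is {value}." for name, value in row.items())


def value_only_sequence(row: Mapping[str, object]) -> str:
    """Serialize one order as its values only, in column order."""
    return ", ".join(str(value) for value in row.values())


# Hypothetical order: feature names follow Table 4, values are invented.
order = {"Total Mileage": 120, "Transport Grade": "Level 2", "Region": 3}
print(named_feature_sequence(order))  # Total Mileage is 120. Transport Grade is Level 2. Region is 3.
print(value_only_sequence(order))     # 120, Level 2, 3
```

The named form carries the feature semantics explicitly at the cost of a longer sequence, whereas the value-only form is shorter, which is consistent with its lower hardware resource usage.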

Analysis of Data Sorting: The distance-based sort performs best in most cases. The closer the sample is to the “centre” of the data distribution, the more representative it is. Distance-based sort enables LLMs to learn from core samples first, recognize the mainstream pattern, and mitigate the interference of outliers.
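
The idea behind the distance-based sort can be sketched as follows. This is a minimal example that assumes the "centre" is the feature-wise mean of the samples, that distance is Euclidean, and that the features are numeric and already on comparable scales; the function name is hypothetical and the paper's actual procedure may differ.

```python
import math
from typing import List, Sequence


def distance_based_sort(rows: Sequence[Sequence[float]]) -> List[int]:
    """Return row indices ordered from closest to farthest from the centroid."""
    if len(rows) == 0:
        return []
    n_features = len(rows[0])
    # Centroid: mean of each feature over all samples (assumed definition of the centre).
    centroid = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
    return sorted(range(len(rows)), key=lambda i: math.dist(rows[i], centroid))


# Made-up two-feature samples; the last one is an outlier.
samples = [[0, 0], [10, 10], [5, 5], [4, 6], [30, 30]]
print(distance_based_sort(samples))  # [1, 2, 3, 0, 4]
```

Samples near the centre appear first in the training corpus, while the outlier is pushed to the end of the ordering.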

Analysis of CoT: CoT prompting can improve the MSE, MAPE, and $R^{2}$ metrics in some cases.

6.2. Interpretability

LLMs excel in automatically extracting features from raw tabular data, such as identifying cargo types and discerning market trends, thanks to their ability to perform automated feature engineering. They also incorporate extensive prior knowledge, including historical price patterns and industry trends, and utilize large parameter sets to model complex, high-dimensional relationships. The synergistic effect of using these three factors enhances the predictive accuracy of these models.

6.3. Limitations

Fine-tuning pre-trained LLMs requires more hardware resources and training time than the comparison models, which raises the barrier to use on large-scale datasets. Dataset biases, such as uneven value distributions, could also influence the model's performance in different scenarios.

7. Conclusions

This paper converts tabular data into text, builds diverse corpora for LLMs, and implements network freight price prediction by fine-tuning privately deployed open-source LLMs. Experimental evaluations on three distinct datasets have demonstrated that the LLMs perform better than or comparably to established tree ensemble models. The findings from this study not only broaden the applications of LLMs in predicting freight prices but also provide a reference for employing this method in other tabular data prediction fields.

The main contributions of this paper are as follows: 1. To the best of the authors' knowledge, LLMs are applied to the domain of network freight price prediction for the first time, providing a more reliable decision-support tool for the logistics industry. 2. LLMs are capable of identifying complex associations in network freight price prediction by integrating prior knowledge to achieve more accurate price prediction. 3. In most cases, LLMs showed superior performance among the compared models in scenarios involving small sample sizes.

The limitations of this method are as follows: 1. Fine-tuning LLMs takes longer than training tree models and, unlike tree models, requires GPUs and additional storage resources. 2. Model tuning and hyperparameter selection are complex and challenging in the process of fine-tuning LLMs.

The suggestions for future research are as follows: 1. Investigate parameter-efficient fine-tuning methods for LLMs to reduce computing resource usage and speed up the fine-tuning process. 2. Investigate more efficient methods for searching the hyperparameters used in fine-tuning LLMs.

Author Contributions

Conceptualization, P.Z.; methodology, P.L.; software, P.L. and J.W.; validation, P.L.; formal analysis, P.L.; investigation, P.L.; resources, P.Z. and T.L.; data curation, P.L.; writing—original draft preparation, P.L.; writing—review and editing, P.L., P.Z., J.W., X.W., Y.M. and T.L.; visualization, P.L.; supervision, P.Z. and T.L.; project administration, P.Z. and T.L.; funding acquisition, P.Z. and T.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Anhui Provincial University Research Project (grant number 2024AH050109) and Anhui Future Technology Research Institute Enterprise Cooperation Project (grant number 2023qyhz12).

Data Availability Statement

The MathorCup dataset is available at http://www.mathorcup.org/detail/2294 (accessed on 1 January 2025), and the HackerEarth dataset is available at https://www.kaggle.com/datasets/oossiiris/hackerearth-machine-learning-exhibit-art/data (accessed on 1 January 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

Park, A.; Chen, R.; Cho, S.; Zhao, Y. The determinants of online matching platforms for freight services. Transp. Res. Part E Logist. Transp. Rev. 2023, 179, 103284.
Gorishniy, Y.; Rubachev, I.; Khrulkov, V.; Babenko, A. Revisiting Deep Learning Models for Tabular Data. In Proceedings of the Advances in Neural Information Processing Systems, Virtual-only, 6–14 December 2021; pp. 18932–18943.
Xia, Y.; Kim, J.; Chen, Y.; Ye, H.; Kundu, S.; Hao, C.C.; Talati, N. Understanding the Performance and Estimating the Cost of LLM Fine-Tuning. In Proceedings of the 2024 IEEE International Symposium on Workload Characterization, Vancouver, BC, Canada, 15–17 September 2024; pp. 210–223.
Requeima, J.; Bronskill, J.; Choi, D.; Turner, R.E.; Duvenaud, D. LLM Processes: Numerical Predictive Distributions Conditioned on Natural Language. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 10–15 December 2024; pp. 109609–109671.
Hegselmann, S.; Buendia, A.; Lang, H.; Agrawal, M.; Jiang, X.; Sontag, D. TabLLM: Few-shot Classification of Tabular Data with Large Language Models. In Proceedings of the 26th International Conference on Artificial Intelligence and Statistics, Valencia, Spain, 25–27 April 2023; pp. 5549–5581.
Budak, A.; Ustundag, A.; Guloglu, B. A forecasting approach for truckload spot market pricing. Transp. Res. Part A Policy Pract. 2017, 97, 55–68.
Li, Y.; Hu, Z.; Chen, C.; Yang, P.; Dong, Y. Prediction model of transaction pricing in internet freight transport platform based on combination of dual long short-term memory networks. J. Comput. Appl. 2022, 42, 1616.
Jang, H.S.; Chang, T.W.; Kim, S.H. Prediction of Shipping Cost on Freight Brokerage Platform Using Machine Learning. Sustainability 2023, 15, 1122.
Guo, J.; Wang, J.; Li, Q.; Guo, B. Construction of Prediction Model of Neural Network Railway Bulk Cargo Floating Price Based on Random Forest Regression Algorithm. Neural Comput. Appl. 2019, 31, 8139–8145.
Macarringue, A.M.J.S.; Oliveira, A.L.R.d.; Dias, C.T.d.S.; Marsola, K.B. Multidimensionality of agricultural grain road freight price: A multiple linear regression model approach by variable selection. Ciência Rural 2023, 54, e20220335.
Spreeuwenberg, S. Developing a Forecast Model for Freight Prices in Poland. Master’s Thesis, Department of Economics and Econometrics, Tilburg University, Tilburg, The Netherlands, 2020.
Lindsey, C.; Frei, A.; Alibabai, H.; Mahmassani, H.S.; Park, Y.W.; Klabjan, D.; Reed, M.; Langheim, G.; Keating, T. Modeling carrier truckload freight rates in spot markets. In Proceedings of the 92nd Annual Meeting of the Transportation Research Board, Washington, DC, USA, 13–17 January 2013.
Kjeldsberg, F.; Haque Munim, Z. Automated machine learning driven model for predicting platform supply vessel freight market. Comput. Ind. Eng. 2024, 191, 110153.
Bae, S.H.; Lee, G.; Park, K.S. A Baltic Dry Index prediction using deep learning models. J. Korea Trade 2021, 25, 17–36.
Koyuncu, K.; Tavacıoğlu, L. Forecasting Shanghai Containerized Freight Index by Using Time Series Models. Mar. Sci. Technol. Bull. 2021, 10, 426–434.
Tamannaei, M.; Zarei, H.; Aminzadegan, S. A Game-Theoretic Approach to the Freight Transportation Pricing Problem in the Presence of Intermodal Service Providers in a Competitive Market. Networks Spat. Econ. 2021, 21, 123–173.
Dimitriou, L. Optimal competitive pricing in European port container terminals: A game-theoretical framework. Transp. Res. Interdiscip. Perspect. 2021, 9, 100287.
Dinh, T.; Zeng, Y.; Zhang, R.; Lin, Z.; Gira, M.; Rajput, S.; Sohn, J.Y.; Papailiopoulos, D.; Lee, K. LIFT: Language-Interfaced Fine-Tuning for Non-language Machine Learning Tasks. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 28 November–9 December 2022; pp. 11763–11784.
Song, X.; Li, O.; Lee, C.; Yang, B.; Peng, D.; Perel, S.; Chen, Y. OmniPred: Language Models as Universal Regressors. arXiv 2025, arXiv:2402.14547.
Rubungo, A.N.; Arnold, C.; Rand, B.P.; Dieng, A.B. LLM-Prop: Predicting Physical and Electronic Properties of Crystalline Solids From Their Text Descriptions. arXiv 2023, arXiv:2310.14029.
Jin, M.; Wang, S.; Ma, L.; Chu, Z.; Zhang, J.Y.; Shi, X.; Chen, P.Y.; Liang, Y.; Li, Y.F.; Pan, S.; et al. Time-LLM: Time Series Forecasting by Reprogramming Large Language Models. arXiv 2024, arXiv:2310.01728.
Xue, H.; Salim, F.D. PromptCast: A New Prompt-Based Learning Paradigm for Time Series Forecasting. IEEE Trans. Knowl. Data Eng. 2024, 36, 6851–6864.
Zhou, T.; Niu, P.; Wang, X.; Sun, L.; Jin, R. One Fits All: Power General Time Series Analysis by Pretrained LM. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023; pp. 43322–43355.
Jia, F.; Wang, K.; Zheng, Y.; Cao, D.; Liu, Y. GPT4MTS: Prompt-based Large Language Model for Multimodal Time-series Forecasting. In Proceedings of the Association for the Advancement of Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; pp. 23343–23351.
Perez, E.; Kiela, D.; Cho, K. True Few-Shot Learning with Language Models. In Proceedings of the Advances in Neural Information Processing Systems, Virtual-only, 6–14 December 2021; pp. 11054–11070.
Chen, Z.; Balan, M.M.; Brown, K. Language Models are Few-shot Learners for Prognostic Prediction. arXiv 2023, arXiv:2302.12692.
Moreno, A.; Redondo, T. Text analytics: The convergence of big data and artificial intelligence. Int. J. Interact. Multimed. Artif. Intell. 2016, 3, 57–64.
Yuan, H.; Lau, R.Y.; Xu, W. The determinants of crowdfunding success: A semantic text analytics approach. Decis. Support Syst. 2016, 91, 67–76.
Gandomi, A.; Haider, M. Beyond the hype: Big data concepts, methods, and analytics. Int. J. Inf. Manag. 2015, 35, 137–144.
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.U.; Polosukhin, I. Attention is All you Need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017.
Zeng, A.; Xu, B.; Wang, B.; Zhang, C.; Yin, D.; Zhang, D.; Rojas, D.; Feng, G.; Zhao, H.; Lai, H.; et al. ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools. arXiv 2024, arXiv:2406.12793.
Bai, J.; Bai, S.; Chu, Y.; Cui, Z.; Dang, K.; Deng, X.; Fan, Y.; Ge, W.; Han, Y.; Huang, F.; et al. Qwen Technical Report. arXiv 2023, arXiv:2309.16609.
Grattafiori, A.; Dubey, A.; Jauhri, A.; Pandey, A.; Kadian, A.; Al-Dahle, A.; Letman, A.; Mathur, A.; Schelten, A.; Vaughan, A.; et al. The Llama 3 Herd of Models. arXiv 2024, arXiv:2407.21783.
Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 2020, 21, 1–67.
Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. LoRA: Low-rank adaptation of large language models. In Proceedings of the Tenth International Conference on Learning Representations, Virtual-only, 25–29 April 2022; p. 3.
Ma, Y.; Chen, S.; Ermon, S.; Lobell, D.B. Transfer learning in environmental remote sensing. Remote Sens. Environ. 2024, 301, 113924.
Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Xia, F.; Chi, E.; Le, Q.V.; Zhou, D.; Ichter, B. Chain-of-thought prompting elicits reasoning in large language models. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 28 November–9 December 2022; pp. 24824–24837.
Christophe, C.; Kanithi, P.K.; Munjal, P.; Raha, T.; Hayat, N.; Rajan, R.; Al-Mahrooqi, A.; Gupta, A.; Salman, M.U.; Gosal, G.; et al. Med42–evaluating fine-tuning strategies for medical LLMs: Full-parameter vs. parameter-efficient approaches. arXiv 2024, arXiv:2404.14779.
Loshchilov, I.; Hutter, F. SGDR: Stochastic Gradient Descent with Warm Restarts. arXiv 2017, arXiv:1608.03983.
Hannan, M.A.; How, D.N.T.; Mansor, M.B.; Hossain Lipu, M.S.; Ker, P.J.; Muttaqi, K.M. State-of-Charge Estimation of Li-ion Battery Using Gated Recurrent Unit with One-Cycle Learning Rate Policy. IEEE Trans. Ind. Appl. 2021, 57, 2964–2971.
Lu, P.; Wang, Y.; Tang, Z.; Wu, X.; Liu, T.; Zhang, P.; Liu, S.; Bao, X. Network Freight Price Forecast via Bayesian Hierarchical Model. Int. J. Mach. Learn. Cybern. 2024; submitted.
Zheng, Y.; Zhang, R.; Zhang, J.; Ye, Y.; Luo, Z.; Feng, Z.; Ma, Y. LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, Bangkok, Thailand, 11–16 August 2024.

Figure 1. Diagram of the logistics delivery process and key participants on the network freight platform.

Figure 2. Framework for network freight price prediction based on LLMs.

Figure 3. Data serialization approaches in freight price prediction.

Figure 4. Initial sort in the MathorCup dataset: (a) Demonstration in three-dimensional space; (b) Feature variation over the first 1000 data indices. Blue represents Total Mileage, orange represents Driving Minutes Before Planned Arrival, and green represents Planned Unloading Minutes.

Figure 5. Feature-based sort in the MathorCup dataset: (a) Demonstration in three-dimensional space; (b) Feature variation over the first 1000 data indices. Blue represents Total Mileage, orange represents Driving Minutes Before Planned Arrival, and green represents Planned Unloading Minutes.

Figure 6. Distance-based sort in the MathorCup dataset: (a) Demonstration in three-dimensional space; (b) Feature variation over the first 1000 data indices. Blue represents Total Mileage, orange represents Driving Minutes Before Planned Arrival, and green represents Planned Unloading Minutes.

Figure 7. Chain-of-thought prompting for MathorCup dataset. For better readability, line breaks have been added to the output section.

Figure 8. Chain-of-thought prompting for HackerEarth dataset. For better readability, line breaks have been added to the output section.

Figure 9. Cosine annealing learning rate.

Figure 10. OneCycle learning rate.

Figure 11. Missing data in MathorCup dataset.

Figure 12. Missing data in HackerEarth dataset.

Figure 13. Price correlation analysis of each feature in MathorCup dataset.

Figure 14. Price correlation analysis of each feature in HackerEarth dataset.

Figure 15. Comparison of model performance trends with varying data size on the MathorCup dataset. (a) This subfigure shows the performance trends of the LLM with respect to data size, using MAE and $R^{2}$ as metrics. The MAE increases as the data size decreases, and $R^{2}$ increases with smaller data sizes. (b) This subfigure shows the performance trends of the XGBoost model, comparing the MAE and $R^{2}$ metrics. As the data size decreases, the MAE increases and $R^{2}$ decreases.

Figure 16. Comparison of model performance trends with varying data size on the HackerEarth dataset. (a) This subfigure shows the performance trends of the LLM with respect to data size, using MAE and $R^{2}$ as metrics. As the data size decreases, MAE increases and $R^{2}$ decreases. (b) This subfigure shows the performance trends of the XGBoost model, comparing the MAE and $R^{2}$ metrics. A similar trend is observed, with MAE increasing and $R^{2}$ decreasing as the data size decreases.

Table 1. Definitions and clarifications of mathematical symbols.

Symbol	Dimension	Description
N	1	Total number of network freight orders.
P	1	Number of features in a network freight order.
C	1	Total number of prices in network freight orders.
$t_{i c}$	1	The value is 1 when the i-th network freight order is predicted to have the c-th price; otherwise, it is 0.
${\hat{t}}_{i c}$	1	The predicted probability of the i-th network freight order belonging to the c-th price.
$y_{i}$	1	Real price of the i-th network freight order.
${\hat{y}}_{i}$	1	Predicted price of the i-th network freight order by the model.
$q_{i}$	1	The i-th fundamental element of the token-sequence output.
X	$P \times N$	Network freight order feature data.
Y	N	Network freight order price data.

Table 2. Linear model performance varies with the number of features most relevant to price on MathorCup dataset.

Top-K	MAE	MSE	MAPE	$R^{2}$
1	163.525	153,377	0.182099	0.983775
2	164.559	154,045	0.181416	0.983704
3	147.086	150,113	0.128677	0.984120
4	147.086	150,113	0.128677	0.984120
5	148.388	151,284	0.135270	0.983996
6	148.352	151,276	0.135085	0.983997
7	148.729	151,276	0.135738	0.983997
8	148.567	151,109	0.135800	0.984015
9	148.547	151,086	0.135817	0.984017
all	161.084	136,784	0.165328	0.985530

Red: Number of features modeled.

Table 3. Linear model performance varies with the number of features most relevant to price on HackerEarth dataset.

Top-K	MAE	MSE	MAPE	$R^{2}$
1	0.440	0.319	0.074291	0.329808
2	0.331	0.216	0.055515	0.545345
3	0.314	0.180	0.052673	0.620787
4	0.247	0.117	0.041768	0.754109
5	0.221	0.097	0.037702	0.796323
6	0.219	0.096	0.037434	0.798213
7	0.219	0.094	0.037325	0.802006
8	0.216	0.092	0.036823	0.807407
9	0.212	0.089	0.036217	0.811891
all	0.192	0.079	0.032687	0.833838

Red: Number of features modeled.

Table 4. Overview of dataset key features.

Dataset	Feature Name
MC	Total Mileage
	Driving Minutes Before Planned Arrival
	Unloading Minutes Planned
	Transaction Runtime Minutes
	Available Minutes Before Actual Docking
	Available Minutes Before Actual Departure
	Minutes Before Actual Completion
	Price Adjustment Review Minutes
	Available Minutes Before Planned Docking
	Available Minutes Before Planned Departure
	Packaging Type
	Transport Grade
	Price Adjustment Type
	Price Adjustment Urgency
	Transaction Counterparty
	Region
	Demand Urgency
	Within or Outside Province
HE	Artist Reputation
	Height
	Width
	Weight
	Price Of Sculpture
	Base Shipping Price
	Area
	Combined Price
	International
	Express Shipment
	Installation Included
CO	Distance
	Vehicle length
	Vehicle type
	Cargo weight
	Cargo volume
	Pick-up province
	Unloading province
	Transportation time
	Holiday
	Month
	Price

Table 5. Overview of dataset sizes.

Dataset	Description	Count
MC	Total Raw Entries	16,016
	Raw Feature Dimensions	63
	Total Preprocessed Entries	13,615
HE	Total Raw Entries	6500
	Raw Feature Dimensions	20
	Total Preprocessed Entries	4001
CO	Total Raw Entries	36,000
	Raw Feature Dimensions	14
	Total Preprocessed Entries	33,195

Table 6. Regional freight trends in MathorCup.

Region	Price Mean	Price Variance
1	412.22	1919.98
2	152.88	984.12
3	491.03	80,545.38
4	11,871.47	1,985,854.00
5	1188.18	29,320.30
6	730.35	19,950.27

Table 7. Transport freight trends in HackerEarth.

Transport	Cost Mean	Cost Variance
Airways	25,138.71	$7.960766 \times 10^{10}$
Roadways	16,321.61	$3.062830 \times 10^{10}$
Waterways	9187.90	$1.415873 \times 10^{10}$

Table 8. Library versions.

Library Name	Current Version
transformers	≥4.41.2
datasets	≥2.16.0
accelerate	≥0.30.1
peft	≥0.11.1
trl	≥0.8.6
torchmetrics	0.11.4

Table 9. Performance of compared models in full data.

Dataset	Model	MAE	MSE	MAPE	$R^{2}$
MC	XGBoost	12.036	6868	0.00453	0.999261
	LightGBM	20.543	9525	0.01080	0.998975
	RF	15.423	7409	0.00730	0.999203
	GBDT	13.299	8670	0.00960	0.999067
	ExtraTrees	12.782	9439	0.00438	0.998984
	DT	24.448	11,873	0.01704	0.998723
	DNNs	203.973	165,706	0.32918	0.982179
	PLS	190.998	173,439	0.26772	0.981348
	Lasso	155.075	133,631	0.15450	0.985629
	Ridge	152.920	128,510	0.16307	0.986179
	ElasticNet	145.528	146,988	0.14749	0.984192
	SVR	1158.886	10,306,138	0.63346	−0.108338
	KR	153.193	128,584	0.16440	0.986171
	PCA	137.174	146,294	0.13850	0.984267
	BR	152.512	129,621	0.16105	0.986060
HE	XGBoost	128.068	113,079	0.23581	0.523815
	LightGBM	145.901	140,377	0.26584	0.408859
	RF	112.437	86,568	0.22128	0.635453
	GBDT	98.204	77,988	0.17398	0.671584
	ExtraTrees	137.114	117,320	0.26962	0.505957
	DT	160.048	144,692	0.31006	0.390689
	DNNs	250.959	264,763	0.50535	−0.114936
	PLS	168.751	141,021	0.33730	0.406149
	Lasso	186.399	146,295	0.41760	0.383940
	Ridge	93.682	58,309	0.17747	0.754455
	ElasticNet	191.285	153,673	0.43078	0.352873
	SVR	252.096	250,254	0.59910	−0.053837
	KR	93.351	57,823	0.17724	0.756500
	PCA	93.634	58,350	0.17738	0.754284
	BR	93.908	58,754	0.17693	0.752582
CO	XGBoost	441.196	1,119,809	0.18362	0.885241
	LightGBM	584.293	1,153,242	0.26571	0.881815
	RF	785.043	1,935,786	0.34883	0.801619
	GBDT	440.845	1,251,295	0.17728	0.871766
	ExtraTrees	740.413	1,577,908	0.36291	0.838295
	DT	610.468	1,585,597	0.25134	0.837507
	DNNs	539.212	1,083,497	0.21791	0.888962
	PLS	970.893	2,472,120	0.48266	0.746655
	Lasso	828.010	1,913,604	0.41144	0.803892
	Ridge	827.111	1,910,470	0.41082	0.804214
	ElasticNet	891.745	2,254,588	0.41789	0.768948
	SVR	1723.153	10,142,073	0.61695	−0.039364
	KR	827.094	1,910,485	0.41080	0.804212
	PCA	899.578	2,244,267	0.42656	0.770006
	BR	828.079	1,912,611	0.41169	0.803994

Red: The best performance in each dataset.

Table 10. Training parameters for cross entropy loss experiments.

Dataset	Model	Loss Function	Epoch	Batch Size	Learning Rate
MC;CO	GLM	Cross Entropy	5	1	$5.0 \times 10^{- 5}$
	Qwen	Cross Entropy	5	1	$5.0 \times 10^{- 5}$
	Llama	Cross Entropy	5	1	$5.0 \times 10^{- 5}$
HE	GLM	Cross Entropy	15	1	$5.0 \times 10^{- 5}$
	Qwen	Cross Entropy	15	1	$5.0 \times 10^{- 5}$
	Llama	Cross Entropy	15	1	$5.0 \times 10^{- 5}$

Table 11. Performance of cross entropy loss in MathorCup full data.

Sort	Serialization	Model	MAE	MSE	MAPE	$R^{2}$
IS	NFS	GLM	9.250	8590	0.00426	0.999076
	NFS	Llama	17.929	65,775	0.00666	0.992926
	NFS	Qwen	25.766	36,025	0.01409	0.996125
	LTS	GLM	11.502	14,221	0.00509	0.998470
	LTS	Llama	11.990	12,671	0.00568	0.998637
	LTS	Qwen	20.676	98,205	0.00603	0.989438
	VOS	GLM	8.550	7616	0.00397	0.999180
	VOS	Llama	12.215	12,786	0.00449	0.998624
	VOS	Qwen	11.514	9791	0.00684	0.998946
	JS	GLM	11.597	11,836	0.00434	0.998727
	JS	Llama	13.388	13,641	0.00547	0.998532
	JS	Qwen	14.247	11,295	0.00701	0.998785
FS	NFS	GLM	12.651	15,303	0.00529	0.998354
	NFS	Llama	14.429	23,675	0.00533	0.997453
	NFS	Qwen	86.617	4,523,043	0.01537	0.513584
	LTS	GLM	13.211	17,917	0.00533	0.998073
	LTS	Llama	10.731	12,381	0.00503	0.998668
	LTS	Qwen	18.692	54,918	0.00839	0.994093
	VOS	GLM	13.038	46,614	0.00625	0.994986
	VOS	Llama	17.133	61,115	0.00470	0.993427
	VOS	Qwen	14.940	17,658	0.00499	0.998100
	JS	GLM	11.355	12,491	0.00513	0.998656
	JS	Llama	13.468	15,049	0.00590	0.998381
	JS	Qwen	12.269	9261	0.00738	0.999004
DS	NFS	GLM	12.439	14,418	0.00452	0.998449
	NFS	Llama	13.404	15,870	0.00755	0.998293
	NFS	Qwen	34.860	113,884	0.01587	0.987752
	LTS	GLM	11.040	11,449	0.00486	0.998768
	LTS	Llama	10.108	10,873	0.00480	0.998830
	LTS	Qwen	19.407	57,803	0.00822	0.993783
	VOS	GLM	12.631	14,389	0.00592	0.998452
	VOS	Llama	11.078	13,256	0.00378	0.998574
	VOS	Qwen	13.283	9876	0.00625	0.998937
	JS	GLM	12.341	14,000	0.00488	0.998494
	JS	Llama	13.019	14,445	0.00558	0.998446
	JS	Qwen	12.188	7475	0.00641	0.999196

Red: The best performance.

Table 12. Performance of cross entropy loss in HackerEarth full data.

Sort	Serialization	Model	MAE	MSE	MAPE	$R^{2}$
IS	NFS	GLM	52.019	41,091	0.09127	0.826960
	NFS	Llama	67.513	57,063	0.11394	0.759702
	NFS	Qwen	59.806	34,408	0.10746	0.855104
	LTS	GLM	53.893	42,187	0.09305	0.822344
	LTS	Llama	71.830	56,008	0.12763	0.764143
	LTS	Qwen	53.559	38,376	0.09543	0.838396
	VOS	GLM	52.566	28,680	0.10173	0.879224
	VOS	Llama	72.931	57,678	0.11917	0.757113
	VOS	Qwen	74.744	52,115	0.13603	0.780540
	JS	GLM	58.541	44,243	0.10167	0.813687
	JS	Llama	72.083	57,559	0.12378	0.757614
	JS	Qwen	67.472	51,681	0.11861	0.782366
FS	NFS	GLM	55.755	33,336	0.09795	0.859617
	NFS	Llama	67.786	38,224	0.12080	0.839035
	NFS	Qwen	62.159	38,574	0.11292	0.837561
	LTS	GLM	59.501	41,466	0.10677	0.825382
	LTS	Llama	64.944	43,955	0.11912	0.814902
	LTS	Qwen	56.222	39,077	0.09985	0.835442
	VOS	GLM	54.576	45,173	0.09528	0.809773
	VOS	Llama	76.244	57,926	0.12491	0.756067
	VOS	Qwen	69.861	43,801	0.13752	0.815550
	JS	GLM	56.899	48,528	0.10523	0.795644
	JS	Llama	62.989	48,225	0.10605	0.796920
	JS	Qwen	68.097	55,602	0.11973	0.765852
DS	NFS	GLM	54.053	26,951	0.09980	0.886506
	NFS	Llama	69.727	38,556	0.12487	0.837636
	NFS	Qwen	56.677	38,957	0.10530	0.835946
	LTS	GLM	59.696	49,923	0.10693	0.789769
	LTS	Llama	67.190	44,472	0.12127	0.812723
	LTS	Qwen	57.872	37,594	0.10651	0.841686
	VOS	GLM	55.055	40,002	0.10220	0.831546
	VOS	Llama	71.846	51,875	0.12423	0.781550
	VOS	Qwen	68.013	40,673	0.13872	0.828721
	JS	GLM	54.151	44,010	0.09790	0.814671
	JS	Llama	76.802	64,002	0.13111	0.730479
	JS	Qwen	57.217	35,293	0.11160	0.851378

Red: The best performance.

Table 13. Performance of cross entropy loss in Company full data.

Sort	Serialization	Model	MAE	MSE	MAPE	$R^{2}$
IS	NFS	GLM	424.096	1,241,060	0.16155	0.872815
	NFS	Llama	428.400	952,435	0.16720	0.902393
	NFS	Qwen	478.156	1,097,089	0.19459	0.887569
	LTS	GLM	413.750	1,008,239	0.17440	0.896675
	LTS	Llama	436.841	975,876	0.17588	0.899991
	LTS	Qwen	469.119	1,090,185	0.18615	0.888277
	VOS	GLM	548.922	1,239,516	0.20284	0.872973
	VOS	Llama	433.555	908,955	0.16529	0.906849
	VOS	Qwen	460.234	1,115,836	0.17760	0.885648
	JS	GLM	426.274	2,254,773	0.15571	0.768929
	JS	Llama	430.236	1,054,372	0.17077	0.891947
	JS	Qwen	465.082	1,019,952	0.17306	0.895474
FS	NFS	GLM	411.467	1,015,870	0.16323	0.895893
	NFS	Llama	422.903	875,652	0.16194	0.910262
	NFS	Qwen	483.389	1,105,674	0.18713	0.886689
	LTS	GLM	411.299	927,983	0.16326	0.904899
	LTS	Llama	421.303	905,436	0.19347	0.907210
	LTS	Qwen	463.050	1,010,131	0.18767	0.896481
	VOS	GLM	418.586	1,149,595	0.15690	0.882188
	VOS	Llama	428.686	903,838	0.17692	0.907374
	VOS	Qwen	445.562	1,024,561	0.17431	0.895002
	JS	GLM	452.947	6,512,821	0.17279	0.332562
	JS	Llama	415.319	890,255	0.15696	0.908766
	JS	Qwen	473.275	1,062,804	0.17771	0.891083
DS	NFS	GLM	420.648	1,033,676	0.16313	0.894068
	NFS	Llama	429.152	1,319,645	0.17543	0.864762
	NFS	Qwen	470.139	963,586	0.18147	0.901251
	LTS	GLM	412.089	910,550	0.15608	0.906686
	LTS	Llama	428.334	975,630	0.18217	0.900016
	LTS	Qwen	474.552	1,136,802	0.17707	0.883499
	VOS	GLM	424.572	1,102,354	0.16726	0.887030
	VOS	Llama	435.400	931,192	0.16224	0.904570
	VOS	Qwen	443.729	953,152	0.17089	0.902320
	JS	GLM	416.293	1,095,406	0.16136	0.887742
	JS	Llama	436.216	1,071,534	0.18969	0.890188
	JS	Qwen	451.418	885,174	0.17072	0.909286

Red: The best performance.

Table 14. Performance of Chain-of-thought in full data.

Dataset	Model	MAE	MSE	MAPE	$R^{2}$
MC	GLM	11.656	16,897	0.00491	0.998182
	Llama	18.499	27,745	0.00650	0.997016
	Qwen	28.360	46,377	0.00794	0.995012
HE	GLM	71.786	51,576	0.12890	0.782809
	Llama	77.426	46,579	0.13774	0.803850
	Qwen	79.780	55,558	0.13991	0.766040

Blue: The improved performance.

Table 15. Training parameters for MAE loss experiments.

Dataset	Model	Loss Function	Epoch	Batch Size	Learning Rate
MC	T5 Small	MAE	100	28	$1 \times 10^{- 3}$
	T5 Base	MAE	50	12	$1 \times 10^{- 3}$
	T5 Large	MAE	50	4	$1 \times 10^{- 3}$
HE	T5 Small	MAE	100	28	$1 \times 10^{- 3}$
	T5 Base	MAE	50	12	$1 \times 10^{- 3}$
	T5 Large	MAE	50	4	$1 \times 10^{- 3}$
CO	T5 Small	MAE	100	28	$1 \times 10^{- 3}$
	T5 Base	MAE	50	12	$1 \times 10^{- 3}$
	T5 Large	MAE	50	4	$1 \times 10^{- 3}$

Table 16. Performance of MAE loss in full data.

Dataset	Model	MAE	MSE	MAPE	$R^{2}$
MC	T5 Small	41.863	28,357	0.03467	0.996950
	T5 Base	29.691	18,734	0.02079	0.997985
	T5 Large	41.236	22,648	0.03359	0.997564
HE	T5 Small	98.632	69,020	0.16599	0.709351
	T5 Base	57.839	48,064	0.09739	0.797595
	T5 Large	61.229	51,320	0.10177	0.783886
CO	T5 Small	528.107	1,096,372	0.19747	0.887643
	T5 Base	555.073	1,204,992	0.21182	0.876511
	T5 Large	1566.161	7,853,424	0.54867	0.195177

Table 17. Performance of few-shot learning on the MathorCup dataset.

Model	Percentage	MAE	MSE	MAPE	$R^{2}$	Percentage	MAE	MSE	MAPE	$R^{2}$
XGBoost	10%	70.784	98,759	0.02330	0.988885	1%	139.369	149,331	0.06088	0.979095
LightGBM		86.858	59,753	0.05822	0.993275		368.219	251,874	0.83560	0.964740
RF		83.344	155,790	0.02729	0.982466		150.544	211,453	0.05423	0.970399
GBDT		92.484	202,808	0.02845	0.977175		106.575	184,301	0.04130	0.974200
ExtraTrees		90.302	185,739	0.02135	0.979096		158.962	301,627	0.06792	0.957776
DT		106.095	232,591	0.03201	0.973823		97.666	165,625	0.03595	0.976814
DNNs		155.538	132,525	0.21955	0.985085		307.463	282,313	0.45798	0.960480
PLS		180.958	137,786	0.24531	0.984492		289.414	212,690	0.62406	0.970226
Lasso		175.751	130,629	0.19564	0.985298		363.770	317,582	0.59632	0.955542
Ridge		183.452	132,195	0.21916	0.985122		440.042	616,708	0.68669	0.913669
ElasticNet		139.307	122,506	0.12834	0.986212		252.995	285,663	0.29306	0.960011
SVR		1156.840	9,865,396	0.62953	−0.1102		1011.638	7,710,331	0.75720	−0.0793
KR		183.461	132,181	0.21913	0.985123		439.360	614,449	0.68651	0.913985
PCA		141.844	125,673	0.15727	0.985856		366.601	428,677	0.59912	0.939991
BR		141.129	130,159	0.13768	0.985351		248.203	232,980	0.33562	0.967385
LLM (Ours)		68.068	131,154	0.02308	0.985239		35.407	7155	0.05394	0.998998
XGBoost	0.5%	201.993	78,339	0.17918	0.987761	0.25%	284.638	496,171	0.05278	0.960573
LightGBM		1664.269	3,928,443	3.53132	0.386307		2467.628	12,584,874	3.34101	$-1.1 \times 10^{-5}$
RF		253.687	245,858	0.14144	0.961592		136.092	53,550	0.10006	0.995744
GBDT		173.060	52,691	0.15813	0.991768		289.647	359,299	0.10778	0.971449
ExtraTrees		248.856	357,113	0.11844	0.944212		131.778	89,377	0.03565	0.992897
DT		113.500	28,589	0.12518	0.995533		61.571	15,205	0.06499	0.998791
DNNs		489.202	357,306	0.58756	0.944182		602.336	865,778	0.57239	0.931204
PLS		344.668	163,820	0.47048	0.974408		3580.109	74,752,720	8.57418	−4.9399
Lasso		1000.275	2,101,886	0.80242	0.671647		3347.128	70,096,321	8.15408	−4.5699
Ridge		504.707	902,297	0.31729	0.859045		3819.303	85,402,808	9.08876	−5.7862
ElasticNet		215.798	138,551	0.18579	0.978355		1464.080	11,572,647	3.35574	0.080421
SVR		1237.296	7,781,308	0.70576	−0.2155		1652.180	14,885,208	0.51258	−0.1827
KR		508.501	885,526	0.32318	0.861664		3224.048	59,189,352	7.60270	−3.7032
PCA		414.288	469,819	0.37086	0.926605		6241.154	234,392,468	14.9256	−17.625
BR		211.273	108,629	0.21944	0.983030		256.008	201,155	0.164471	0.984015
LLM (Ours)		98.071	24,080	0.10184	0.996238		84.428	25,585	0.04720	0.997966

Red: The best performance.
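
The percentage columns in the few-shot tables indicate how much of the training data each model sees. One way to build such reduced training sets is sketched below. It assumes uniform random sampling without replacement and a fixed seed; the paper's actual sampling procedure is not described in this section, and the helper name `subsample` is hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def subsample(rows: Sequence[T], fraction: float, seed: int = 0) -> List[T]:
    """Return a reproducible random subset containing `fraction` of `rows`.

    Illustrative only: uniform sampling without replacement is an assumption,
    not the procedure confirmed by the source.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if len(rows) == 0:
        raise ValueError("rows must be non-empty")

    # Keep at least one sample so very small fractions stay usable.
    k = max(1, round(len(rows) * fraction))
    rng = random.Random(seed)
    return rng.sample(list(rows), k)


# Build the four training-set sizes used in the MathorCup few-shot table
# from a placeholder training set of 2000 row indices.
train_rows = list(range(2000))
subsets = {f: subsample(train_rows, f, seed=42) for f in (0.10, 0.01, 0.005, 0.0025)}
```

Fixing the seed keeps every compared model on the same reduced training set, so differences in the reported metrics reflect the models rather than the sample drawn.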

Table 18. Performance of few-shot learning on the HackerEarth dataset.

Model	Percentage	MAE	MSE	MAPE	$R^{2}$	Percentage	MAE	MSE	MAPE	$R^{2}$
XGBoost	40%	126.176	73,303	0.24537	0.571201	4%	119.057	41,224	0.24815	0.742702
LightGBM		148.044	98,816	0.28331	0.421957		187.292	119,959	0.39860	0.251286
RF		117.710	62,282	0.23509	0.635664		107.816	36,847	0.21431	0.770021
GBDT		102.814	54,722	0.19622	0.679893		105.958	27,933	0.22433	0.825658
ExtraTrees		142.851	84,344	0.288807	0.506611		128.904	52,921	0.27950	0.669698
DT		173.449	103,832	0.35535	0.392615		172.465	77,173	0.36361	0.518335
DNNs		247.359	196,243	0.502789	−0.1479		303.393	242,021	0.54808	−0.5105
PLS		177.502	112,821	0.343827	0.340028		141.108	56,309	0.33716	0.648554
Lasso		193.585	118,250	0.41148	0.308272		156.429	61,507	0.37025	0.616106
Ridge		104.918	54,837	0.198891	0.679221		178.764	72,806	0.50923	0.545585
ElasticNet		197.999	123,849	0.419878	0.275518		144.506	54,224	0.35056	0.661566
SVR		246.484	183,469	0.577319	−0.0732		241.913	168,015	0.59739	−0.0486
KR		104.255	54,176	0.19734	0.683084		126.558	29,147	0.38227	0.818079
PCA		103.991	54,963	0.19607	0.678478		202.144	91,370	0.57942	0.429723
BR		105.619	55,730	0.19841	0.673996		124.041	41,482	0.28412	0.741092
LLM (Ours)		86.099	45,682	0.16685	0.732771		117.408	28,093	0.30643	0.824656
XGBoost	2%	129.298	29,648	0.29729	0.534075	1%	153.377	44,418	0.31597	0.524929
LightGBM		214.822	79,057	0.52405	−0.2423		264.698	120,603	0.61248	−0.2898
RF		136.020	30,361	0.29062	0.522876		153.408	53,752	0.29238	0.425093
GBDT		127.303	26,447	0.28865	0.584383		142.823	37,719	0.29144	0.596572
ExtraTrees		102.495	15,244	0.31706	0.760436		178.964	67,227	0.38004	0.280976
DT		164.154	53,473	0.37666	0.159669		171.699	62,632	0.33932	0.330122
DNNs		315.454	135,305	0.85028	−1.126		2431.351	35,649,749	3.29146	−380.28
PLS		135.986	37,047	0.35577	0.417798		190.642	87,639	0.44465	0.062662
Lasso		119.140	26,106	0.32196	0.589737		1019.443	5,725,984	1.46376	−60.241
Ridge		260.600	202,873	0.57028	−2.188		3478.882	87,166,006	4.39456	−931.27
ElasticNet		136.576	32,560	0.37810	0.488309		656.878	1,900,999	1.002837	−19.331
SVR		214.272	78,294	0.52741	−0.2303		264.698	119,222	0.620743	−0.2751
KR		504.205	1,346,740	0.77187	−20.163		259.406	103,791	0.69181	−0.1100
PCA		340.676	366,570	0.70031	−4.7606		3656.360	96,735,311	4.61560	−1033.6
BR		98.142	16,834	0.30762	0.735450		532.141	1,024,422	0.87501	−9.9565
LLM (Ours)		86.894	13,551	0.23143	0.787032		158.141	49,153	0.44522	0.474282

Red: The best performance.

Table 19. Performance of few-shot learning on the Company dataset.

Model	Percentage	MAE	MSE	MAPE	$R^{2}$	Percentage	MAE	MSE	MAPE	$R^{2}$
XGBoost	10%	678.274	2,602,308	1.54346	0.704506	1%	828.113	2,804,942	0.24349	0.770074
LightGBM		644.173	1,942,556	1.49265	0.779421		1073.330	3,020,699	0.418153	0.752388
RF		807.454	2,419,322	1.91215	0.725284		966.844	2,803,007	0.29242	0.770233
GBDT		665.141	1,811,665	1.62176	0.794284		1086.814	4,011,988	0.28038	0.671131
ExtraTrees		763.653	2,077,473	1.65822	0.764101		934.732	2,286,298	0.318563	0.812588
DT		739.638	1,975,886	1.52680	0.775636		1209.963	6,212,170	0.27361	0.490778
DNNs		693.445	1,794,295	1.64194	0.796256		918.370	2,716,467	0.26314	0.777326
PLS		1038.642	3,214,362	1.88071	0.635007		1277.455	3,251,590	0.56641	0.733462
Lasso		914.111	3,108,801	1.63472	0.646993		1307.215	3,163,580	0.63511	0.740676
Ridge		907.840	2,935,392	1.63253	0.666684		1328.783	3,199,060	0.66023	0.737768
ElasticNet		892.580	2,666,383	1.65756	0.697230		1156.112	3,009,026	0.43564	0.753345
SVR		1827.157	9,737,510	2.33241	−0.1057		2196.533	13,666,364	0.67648	−0.1202
KR		907.900	2,935,535	1.63258	0.666668		1328.802	3,198,836	0.660228	0.737786
PCA		907.849	2,701,152	1.52074	0.693282		1237.612	3,048,120	0.484269	0.750140
BR		881.133	2,328,954	1.60296	0.735545		1130.364	3,105,342	0.41088	0.745450
LLM (Ours)		651.628	2,064,281	1.60970	0.765599		831.67	2,282,021	0.25193	0.812939
XGBoost	0.5%	649.852	799,603	0.31793	0.706064	0.1%	5246.544	64,223,122	0.46810	−0.1337
LightGBM		1042.782	2,016,012	0.53112	0.258910		6689.912	77,861,414	1.15371	−0.3745
RF		716.388	853,757	0.50741	0.686157		5179.325	58,892,696	0.54641	−0.0396
GBDT		644.161	759,383	0.31877	0.720849		5342.480	64,803,963	0.44945	−0.1440
ExtraTrees		595.826	624,855	0.42709	0.770302		5999.521	70,046,514	0.73166	−0.2365
DT		663.657	809,278	0.31951	0.702507		6323.361	83,481,876	0.55840	−0.4737
DNNs		609.241	749,226	0.28172	0.724583		4457.259	52,756,022	0.40766	0.068663
PLS		1031.991	1,477,825	0.70006	0.456748		4990.569	54,251,933	0.64673	0.042254
Lasso		921.377	1,309,347	0.44930	0.518681		4956.708	59,723,169	0.46992	−0.0543
Ridge		893.642	1,244,710	0.43541	0.54244		4203.288	48,164,688	0.36632	0.149717
ElasticNet		958.983	1,392,343	0.52743	0.488172		3645.388	46,408,284	0.36984	0.180723
SVR		1344.858	2,878,792	0.81348	−0.0582		6821.218	89,557,879	0.847606	−0.5810
KR		889.829	1,231,500	0.43486	0.547298		4980.952	54,407,169	0.52580	0.039514
PCA		1099.088	1,748,811	0.63081	0.357134		4220.395	48,083,041	0.43200	0.151158
BR		811.657	1,144,587	0.40786	0.579247		3936.478	46,725,100	0.40774	0.175131
LLM (Ours)		709.941	902,333	0.33883	0.668300		3643.361	41,174,589	0.39681	0.273117

Red: The best performance.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Lu, P.; Zhang, P.; Wu, J.; Wu, X.; Mao, Y.; Liu, T. Fine-Tuning Pre-Trained Large Language Models for Price Prediction on Network Freight Platforms. Mathematics 2025, 13, 2504. https://doi.org/10.3390/math13152504

