LTransformer: A Transformer-Based Framework for Task Offloading in Vehicular Edge Computing

Abstract: Vehicular edge computing (VEC) is essential in vehicle applications such as traffic control and in-vehicle services. In the task offloading process of VEC, predictive-mode transmission based on deep learning is constrained by limited computational resources. Furthermore, the accuracy of deep learning algorithms in VEC is compromised because the algorithms do not incorporate edge computing features. To solve these problems, this paper proposes a task offloading optimization approach that enables edge servers to store deep learning models. Moreover, this paper proposes the LTransformer, a transformer-based framework that incorporates edge computing features. The framework consists of pre-training, an input module, an encoding-decoding module, and an output module. Compared with four sequential deep learning methods, namely a Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), a Gated Recurrent Unit (GRU), and the Transformer, the LTransformer achieves the highest accuracy, reaching 80.1% on the real dataset. In addition, the LTransformer takes 0.008 s to predict a single trajectory, fully satisfying the fundamental requirements of real-time prediction and enabling task offloading optimization.


Introduction
In recent years, edge computing has received extensive attention from researchers [1,2]. Vehicular edge computing (VEC), as a part of edge computing, provides real-time service to vehicular users. It has excellent prospects in the fields of intelligent transportation systems, smart city applications, and vehicular applications.
As the infrastructure becomes well established, edge servers extend their service coverage to a wider scope. In traffic control, edge computing servers can acquire and regulate real-time traffic. In in-vehicle tasks, edge computing servers can provide high-quality services to users. However, the quality of service (QoS) in VEC still cannot be significantly improved, and one of its bottlenecks is inefficient task offloading. Traditional task offloading methods are plagued by issues such as significant latency, high time and space complexity, and low transmission quality.
To solve the problems of task offloading, trajectory prediction methods are used in the task offloading scheme. For example, tasks that take up a lot of computational resources can be offloaded to other edge servers using predictive-mode multi-hop transmission. Once the vehicle enters the transmission range of the edge server, it obtains the computation results directly [3,4].
Current task offloading schemes mainly focus on resource allocation. Few studies discuss the deployment of advanced trajectory prediction methods. Nowadays, deep learning is often utilized for trajectory prediction. However, the limited computational resources in edge servers make it difficult to deploy common deep learning algorithms. Furthermore, trajectory prediction schemes have relatively poor accuracy in VEC.
Therefore, the current task offloading scheme with predictive-mode transmission encounters two issues: (1) Existing edge servers have limited resources to deploy deep learning models, which consume massive storage and computational resources. (2) Neither the short trajectory prediction based on a Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and a Gated Recurrent Unit (GRU) nor the long trajectory prediction based on the Transformer takes into account the features of VEC.
To solve the above problems, we propose the LTransformer, a transformer-based prediction framework in VEC. We also propose a task offloading scheme applied in VEC. Specifically, the contributions of this paper can be summarized as follows:
(1) We propose a task offloading approach in which the deep learning model can be deployed for predictive-mode transmission. On the cloud server, the predictive model is trained based on historical trajectory data. On the edge server, real-time trajectory prediction and task offloading optimization are achieved based on the predictive model.
(2) We propose the LTransformer, which has a four-module structure. In the pre-training stage, stationary latitude and longitude data are embedded. In the input module, multidimensional information such as geography, sequence, and time is integrated. In the encoding-decoding module, encoders and decoders are used to train on the trajectory data. In the output module, problematic results are removed using an error correction method.
(3) Experiments are carried out on a real dataset. The proposed method is compared with other deep learning models to analyze its accuracy and applicability in VEC.
The remainder of this manuscript is organized as follows. In Section 2, we describe existing vehicular edge computing schemes and machine learning algorithms related to trajectory prediction. In Section 3, we introduce the task offloading optimization method and the LTransformer. In Section 4, we conduct experiments to analyze the accuracy and efficiency of the LTransformer. Section 5 summarizes the accomplishments and provides an overview of potential avenues for future research.

Related Work

Vehicular Edge Computing
Task offloading in VEC is the process of transmitting the computing task and related parameters from the service requestor to the service providers through Vehicle-To-Vehicle (V2V) and Vehicle-To-Infrastructure (V2I) communications [1]. Saeik et al. [5] summarized the communication issues in task offloading and proposed a novel task offloading scheme that combines edge and cloud resources. An example of vehicular edge computing is shown in Figure 1. The resources of edge servers can be fully utilized to provide better QoS to users by optimizing the task offloading scheme. Zhang et al. [4] presented an efficient predictive combination-mode relegation scheme wherein the tasks are adaptively offloaded to the edge servers through direct uploading or predictive relay transmissions. Zhan et al. [3] converted the global task offloading optimization problem into multiple local optimization problems with a heuristic mobility-aware offloading algorithm (HMAOA) to approximate the optimal offloading scheme. Yang et al. [6] proposed a low-complexity semiparametric predictive model that takes into account the periodic characteristics and spatial/temporal correlations of dynamic road events. Although these methods have shown some improvements, they still fail to achieve an optimal balance between efficiency and accuracy in VEC. Therefore, how to predict vehicle trajectories more accurately while ensuring efficiency is a pressing issue in task offloading at this stage.

Trajectory Prediction
Trajectory prediction problems can be categorized into two types based on the data type, namely continuous trajectory prediction problems and discrete trajectory prediction problems. The continuous trajectory prediction problem is a regression problem. Alahi et al. [7] developed an LSTM model that can learn general human movement and predict future trajectories. Han et al. [8] proposed a short-term real-time trajectory coordinate point prediction method based on a GRU (Gated Recurrent Unit) recurrent neural network. This method improves the accuracy of real-time forecasting by updating the model parameters in real time. Huang et al. [9] discussed a new traffic network modeling algorithm based on the context of traffic intersections that maps vehicle trajectory nodes into a high-dimensional space vector, so that Bi-GRU can be used to bidirectionally model the trajectory matrix for the purpose of prediction. Amichi et al. [10] designed a two-step predictive framework solely based on personal location data. This framework aims to address the prediction of visits to new places and adjust prediction resolution to account for probable explorations of new locations.
Monreale et al. [11] proposed a T-pattern tree for trajectory prediction. The tree is constructed using trajectory patterns that represent specific areas, and it can serve as a predictor for the next location of a new trajectory by identifying the best-matching path within the tree. Dong et al. [12] put forward a new method named RTMatch to predict the future location of a moving object using the storage structures RTPT and HT, which can be updated dynamically and provide dynamic analysis of trajectory patterns according to real-time information. Zeng et al. [13] presented a next-location prediction approach based on an RNN and a self-attention mechanism to predict trajectory patterns from a sequence of discrete nodes. Feng et al. [14] proposed DeepMove, an attentional recurrent network for mobility prediction from lengthy and sparse trajectories. DeepMove exploits the periodic nature of mobility to augment the RNN for mobility prediction. Liu et al. [15] created a geographical-temporal awareness hierarchical attention network (GT-HAN) to distinguish different user preferences.
Recent research shows that the Transformer outperforms other deep learning methods in trajectory prediction. Amirloo et al. [16] proposed LatentFormer, a transformer-based model able to predict future vehicle trajectories by leveraging a novel technique to model interactions among dynamic objects in the scene. Accounting for the interaction between vehicles, Yan et al. [17] proposed two spatial attention mechanisms to help the model understand the surrounding environment better and thus improve its prediction accuracy. Yu et al. [18] introduced the Spatio-Temporal grAph tRansformer (STAR) framework, a novel framework for spatio-temporal trajectory prediction based purely on a self-attention mechanism, with TGConv, a Transformer-based graph convolution mechanism. Dai et al. [19] proposed a novel neural architecture, Transformer-XL, which enables learning dependency beyond a fixed length without disrupting temporal coherence. Wang et al. [20] used a low-rank approximation method to approximate the self-attention mechanism, which maintains high performance while reducing the computational cost. Kitaev et al. [21] introduced reversible residual layers that reduce the memory consumption of the model and give the model the ability to handle larger datasets. Kong et al. [22] proposed the Spatial-Temporal Graph Attention Network (STGAT) for traffic flow forecasting. They demonstrated that STGAT can be generalized directly not only to graphs with an arbitrary structure, but also to completely unseen graphs.
None of the existing deep-learning-based prediction methods consider the features of VEC. These deep learning methods need a large amount of storage and computational resources. However, edge servers have limited resources, so these methods cannot be directly applied to VEC.

Method
There are two problems with the existing research: (1) in VEC, task offloading methods based on trajectory prediction encounter limitations in resources, which impedes the deployment of deep learning; (2) existing deep learning methods cannot be directly applied to VEC, and do not incorporate the features of VEC.
For (1), a task offloading optimization approach is proposed. It supports the deployment of deep learning models to optimize task offloading through prediction results. In this approach, the model is trained on the cloud server and stored in edge servers to provide trajectory prediction services in real time, which consequently optimizes task offloading. For (2), the LTransformer is proposed and applied to task offloading in VEC. The proposed model uses the Transformer as the overall architecture and incorporates the stationary and adjacent features of edge servers.

Task Offloading
Existing machine learning methods consume a lot of edge server computational resources. Therefore, we add a portion of cloud computing to VEC in order to break through the limitation of edge server resources. In our optimization approach, resource-hungry computational tasks are carried out on the cloud server. The edge servers only store the well-trained model and predict the trajectory based on the data provided by vehicles.

Task Offloading Optimization Approach
The task offloading optimization approach can be divided into the training process and the prediction and optimization process. The training process involves gathering historical trajectory data, training the predictive model, and deploying the model. The input of this process is the trajectory data stored by the vehicle, and the output is the trajectory predictive model stored by edge servers. The detailed steps are shown in Figure 2a. The prediction and optimization process encompasses the collection of trajectories for prediction, the generation of prediction results, and the optimization of computational tasks. The input is the trajectory data stored by the vehicle, and the output is the result of the computational task transmitted to the vehicle. The detailed steps are shown in Figure 2b.
The training process can be divided into three steps, each of which corresponds to a serial number in Figure 2a. The details of the training process are as follows:
Input: the vehicle trajectory set $A$ stored in the vehicle set $V$, where $A = \{T_1, T_2, \ldots, T_n\}$, $T = \{s_1, s_2, \ldots, s_l\}$, and $V = \{v_1, v_2, \ldots, v_n\}$.
Output: the predictive model $M$ stored in the set of edge servers $S$, where $S = \{s_1, s_2, \ldots, s_m\}$.

1. Historical vehicle trajectories stored in the vehicle are uploaded to the edge server. Specifically, the vehicle trajectory $T_h$ stored in the vehicle $v_h$ is uploaded to the surrounding edge server $s_h$ through the roadside units (RSU).

2. Edge servers upload vehicle trajectories to the cloud server. Specifically, the vehicle trajectories $T_p, \ldots, T_q$ stored in the edge server $s_h$ are uploaded to the cloud server $c$.

3. The predictive model is trained on the cloud server based on the trajectory data of vehicles. Once the training process is completed, the predictive model is transmitted to the edge servers. Specifically, the predictive model $M$ is trained on the cloud server $c$ based on the uploaded vehicle trajectories. After training, the predictive model $M$ is transmitted to each edge server $s_h$.
The prediction and optimization process is similarly divided into three steps, each of which corresponds to a serial number in Figure 2b. The detailed process of prediction and optimization is as follows:
Input includes the following three components:

• The computational task $D_p$.

• The vehicle trajectory $T_i$ stored in the vehicle $v_i$.

• The model $M$ stored in the edge server $s_j$.

Output: the completed computational task $D_q$ stored in the vehicle $v_i$.

4. Computational tasks and vehicle trajectories stored in the vehicle are uploaded to the edge server. Specifically, the computational task $D_p$ and vehicle trajectory $T_i$ stored in the vehicle $v_i$ are uploaded to the surrounding edge server $s_j$ via the RSU.

5. The edge server uses the predictive model to predict the vehicle trajectory and optimizes the task offloading according to the prediction result. Specifically, the vehicle trajectory $T_i$ is predicted in the edge server $s_j$ using the predictive model $M$. Assuming the prediction result is the edge server $s_k$, the computational task or the result of the task is transmitted to the edge server $s_k$ via V2V or V2I.

6. The vehicle downloads the results of the computational task as it arrives around the predicted location. Specifically, the computational task result $D_q$ in the edge server $s_k$ is transmitted to the vehicle $v_i$ when the vehicle $v_i$ arrives in its vicinity.
The task offloading optimization approach has many applications. For example, in smart cities, this approach can be used to deploy deep learning in VEC to predict and control the overall flow of vehicles, and thus optimize the traffic situation. In in-vehicle services, this approach also deploys deep learning to improve the QoS of users.

Search Stage and Energy Management
In edge computing, we propose a task offloading strategy that combines the dynamism of edge computing networks with the hardware environment. Let $r_s$ represent the remaining computational resources of the server, $f_s$ the computing efficiency, and $P_s$ the server's computational power. Computational tasks typically consist of several sub-tasks. Therefore, we define $N$ consecutive sub-tasks as $n = \{1, 2, \ldots, N\}$, and the computational resource required for task $n$ as $r_n$. After receiving a computational task, each edge server needs to consider two scenarios:

1. The current edge server is capable of completing the computational task on its own. In that case, the energy consumption is only that required for the edge server to execute computational task $n$ itself.

2. The remaining computational resources on this edge server cannot meet the minimum requirements of this computational task. The edge server transfers this computational task to an edge server in the direction of the vehicle's movement. In that case, the total energy consumption required to complete computational task $n$ additionally includes $E_T$, the energy consumption for task transmission, and $E_p$, the energy consumption for trajectory prediction.

Namely, if each edge server completes the computational task in the first scenario, the energy consumption is minimized. However, if an edge server handles a lot of computational tasks in the first scenario, it may deplete the resources on this edge server. Therefore, this strategy can be used to search for idle resources on edge servers and reduce the occurrence of denial-of-service incidents caused by computational task accumulation. The relevant pseudocode is presented in Algorithm 1. It is worth noting that if the predicted edge server still faces resource constraints, it will continue to follow the second scenario for execution. However, prediction tasks will update the trajectory as the vehicle moves, thus dynamically altering the prediction results in real time.
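The two scenarios can be illustrated with a minimal Python sketch. It is not the paper's Algorithm 1: the names `EdgeServer` and `place_task` are hypothetical, the predicted path is assumed to be given as a fixed list (whereas the approach re-predicts as the vehicle moves), and the energy terms are not computed.

```python
from dataclasses import dataclass


@dataclass
class EdgeServer:
    name: str
    remaining: float  # r_s: remaining computational resources


def place_task(required, predicted_path):
    """Return (server, hops) for the first server that can host the task.

    `required` plays the role of r_n. `predicted_path` lists edge servers in
    the vehicle's predicted direction of movement, starting with the server
    that received the task. hops == 0 corresponds to the first scenario;
    hops > 0 corresponds to the second scenario, which additionally incurs
    transmission and prediction energy.
    """
    for hops, server in enumerate(predicted_path):
        if server.remaining >= required:
            server.remaining -= required  # reserve the resources
            return server, hops
    return None, len(predicted_path)  # no server on the path has capacity
```

For example, with a path whose first server has 1.0 units left and whose second has 5.0, a task requiring 3.0 units is forwarded one hop and leaves 2.0 units on the second server.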
As a result, trajectory prediction becomes a crucial component of this optimization approach. To enhance prediction accuracy, we introduce the LTransformer.

LTransformer
In the task offloading optimization process, this paper proposes the LTransformer framework for trajectory prediction. The input of this framework is the set $T = \{s_1, s_2, \ldots, s_n\}$, where $s_i \in \mathbb{R}^{d_{in}}$ and $d_{in} = 2$. Here, $s_i$ denotes an edge server node and $d_{in}$ denotes the input dimension. Each edge server node $s_i$ includes (1) the timestamp at which the vehicle trajectory passes through this node and (2) the serial number of the edge server node. The output of this framework is the prediction result $p$, where $p \in \mathbb{R}^{d_{out}}$ and $d_{out} = 2$. The prediction result $p$ represents the position of the predicted edge server node and $d_{out}$ denotes the output dimension; the two dimensions of $p$ are the longitude and latitude of the edge server node. Therefore, this prediction process is defined as a function on any vehicle trajectory set $T$:
$$f(T) = p$$
The optimization objective of the LTransformer framework is expressed in terms of $\Pr(p = p_i)$, which denotes the probability that the prediction result $p$ is the edge server node with serial number $i$, where $p_i$ belongs to a finite set of edge server nodes.
The LTransformer consists of four modules: the pre-training module, the input module, the encoding-decoding module, and the output module. The LTransformer adopts the overall Transformer architecture, and improves on the original Transformer by modifying the input module and adding the pre-training and output modules. In the pre-training module, stationary latitude and longitude data are embedded into a multidimensional space. Three aspects are considered in the input module: location, position, and time. In the location aspect, we use trainable embedding and locally linear embedding (LLE). In the position aspect, we combine the trajectories with their serial numbers. In the time aspect, we add temporal information using the method in Informer [23]. In the output module, this paper proposes an error correction mechanism to remove faulty results. The overall framework of the LTransformer is shown in Figure 3. These changes improve prediction performance: in pre-training, the number of parameters to be trained is reduced to save time; in the input module, multiple aspects are considered to improve accuracy; in the output module, faulty results are removed to correct the output.

Pre-Training
In VEC, the edge servers' geographic locations are stationary. During pre-training, the latitude and longitude of each edge server node are embedded into a multidimensional space. We also observed the locations of the edge server nodes in a vehicle trajectory set and found a local linear relationship between the nodes. In other words, the latitudes and longitudes of nodes in the same vehicle trajectory set are roughly linearly distributed. During the training period, the model's parameters are adjusted through backpropagation to capture the location relationship between nodes. Therefore, in pre-training, we use LLE and trainable embedding methods. The process of pre-training is depicted in Figure 4. In pre-training, according to the node order, a matrix $W_i$ is generated as the input to trainable embedding and LLE. The matrix $W_i$ consists of nodes $a_i$, each of which represents an edge server node and contains only location information.

Trainable Embedding
The trainable embedding maps each $a_i$ of the matrix $W_i$ from a two-dimensional space to a multidimensional space, with values initialized from the standard normal distribution. According to the results of hyperparameter tuning, an embedding dimension of 512 leads to the best prediction results. Therefore, trainable embedding generates the trainable matrix $W_t$ from the matrix $W_i$, with $W_t \in \mathbb{R}^{m \times 512}$, where $m$ denotes the total number of edge server nodes; i.e., each row of the matrix $W_t$ corresponds to the latitude and longitude features of a node and numerically follows a normal distribution. During the training process, backpropagation updates the parameters of the trainable embedding and reduces the cross-entropy loss.

LLE
LLE generates the geographic embedded matrix $W_g$ through two optimization processes. The embedded matrix $W_g$ effectively integrates the positional stationarity and local linear relationship of edge server nodes.

Optimization Process 1: Generate Weight Coefficients Based on Local Linear Features
Optimization Process 1 aims to calculate the weight coefficients $w_{ij}$, which are an intermediate result that preserves the local linear relationship.
Initially, the Euclidean distance is computed between each node and every other node. Subsequently, the $k$ neighboring nodes with the smallest Euclidean distances are selected. Then, the $k$ weight coefficients $w_{ij}$ corresponding to each node $a_i$ are optimized, where $w_{ij}$ denotes the weight between node $a_i$ and its neighboring node $a_j$. Since the mean square error can reflect a linear relationship, the optimization minimizes the mean square error subject to the constraint that the weight coefficients $w_{ij}$ of each node sum to 1, i.e., normalization. The optimization process is as follows:
$$\min_{w} \sum_{i=1}^{m} \Big\| a_i - \sum_{j \in Q(i)} w_{ij} a_j \Big\|^2 \quad \text{s.t.} \quad \sum_{j \in Q(i)} w_{ij} = 1$$
where the set $Q(i)$ represents the set of serial numbers corresponding to the neighboring nodes of node $a_i$. For example, if $k = 2$ and the two neighboring nodes of $a_1$ are $a_2$ and $a_3$, then $Q(1) = \{2, 3\}$.

Optimization Process 2: Generate the Geographic Embedded Matrix Based on the Weight Coefficients
In order to maintain the local linear relationship between nodes after embedding, LLE generates the geographic embedding matrix $W_g$ based on the weight coefficients $w_{ij}$. Therefore, the optimization minimizes the mean square error after embedding, subject to normalization constraints on the embedded vectors. The optimization process is as follows:
$$\min_{y} \sum_{i=1}^{m} \Big\| y_i - \sum_{j \in Q(i)} w_{ij} y_j \Big\|^2 \quad \text{s.t.} \quad \sum_{i=1}^{m} y_i = 0, \quad \frac{1}{m} \sum_{i=1}^{m} y_i y_i^{\top} = I$$
Based on the training results, the LTransformer sets the dimension of $y_i$ to 512. All embedded vectors $y_i$ constitute the geographic embedding matrix $W_g \in \mathbb{R}^{m \times 512}$, where $m$ denotes the number of edge server nodes.
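The two optimization processes can be sketched with NumPy using the standard closed-form LLE solution. This is an illustrative sketch, not the authors' implementation: the function names, the regularization constant `reg`, and the small embedding dimension in the usage note are assumptions (a 512-dimensional embedding would require more than 512 nodes).

```python
import numpy as np


def lle_weights(A, k, reg=1e-3):
    """Process 1: reconstruction weights w_ij from the k nearest neighbors.

    A has shape (m, 2), one (latitude, longitude) row per edge server node.
    """
    m = A.shape[0]
    W = np.zeros((m, m))
    for i in range(m):
        dist = np.linalg.norm(A - A[i], axis=1)
        dist[i] = np.inf                      # a node is not its own neighbor
        Q = np.argsort(dist)[:k]              # Q(i): indices of the k neighbors
        Z = A[Q] - A[i]                       # neighbors centered on a_i
        G = Z @ Z.T                           # local Gram matrix (k x k)
        G += reg * (np.trace(G) + 1e-12) * np.eye(k)  # assumed regularization
        w = np.linalg.solve(G, np.ones(k))
        W[i, Q] = w / w.sum()                 # enforce sum_j w_ij = 1
    return W


def lle_embed(W, dim):
    """Process 2: embedding that preserves the weights, shape (m, dim)."""
    m = W.shape[0]
    I_W = np.eye(m) - W
    _, vecs = np.linalg.eigh(I_W.T @ I_W)     # eigenvalues in ascending order
    # Drop the first eigenvector (the constant vector when the neighbor graph
    # is connected) and scale so that (1/m) * Y^T Y = I.
    return vecs[:, 1:dim + 1] * np.sqrt(m)
```

As a usage example, `W = lle_weights(A, k=2)` followed by `W_g = lle_embed(W, dim=4)` produces one 4-dimensional row per node; each row of `W` sums to 1 by construction.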

Input Module
The input module consists of three components: location embedding, position encoding, and temporal encoding, which encode geographic, sequential, and temporal information, respectively. The input of this module is a collection of multiple vehicle trajectory sets $T$. We use a matrix $M \in \mathbb{R}^{p \times q \times d_{in}}$ to represent the input data, where $p$ denotes the number of trajectories (batch size) at each training epoch, $q$ denotes the length of the longest trajectory, and $d_{in}$ denotes the dimension of each node.
In the input module, the matrix $M$ can be divided into two parts along $d_{in}$: the matrix $M_l$ ($d_{in} = 1$), containing serial information, and the matrix $M_s$ ($d_{in} = 1$), containing temporal information.

Location Embedding
The input of the location embedding process is the matrix $M_l$, and the embedding is generated from the results of the pre-training process. Specifically, two embedded matrices, $M_t$ and $M_g$, are generated by replacing the serial numbers in $M_l$ with the corresponding embedded vectors in the trainable embedded matrix $W_t$ and the geographic embedded matrix $W_g$, where $M_t \in \mathbb{R}^{p \times q \times 512}$ and $M_g \in \mathbb{R}^{p \times q \times 512}$. Finally, the matrices $M_t$ and $M_g$ are combined in a weighted sum to obtain the matrix $M_L$, which represents the location information:
$$M_L = \theta_1 M_t + \theta_2 M_g$$
where $\theta_1$ and $\theta_2$ are the weights of the matrix $M_t$ and the matrix $M_g$. The weights are updated in the backpropagation process.
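The lookup-and-sum step can be sketched in NumPy as follows. The function name `location_embedding` is hypothetical, the weights are passed in as fixed numbers rather than learned, and padding of trajectories shorter than $q$ is not handled.

```python
import numpy as np


def location_embedding(M_l, W_t, W_g, theta1, theta2):
    """Replace serial numbers with embedded vectors and sum them by weight.

    M_l: integer array of shape (p, q) holding edge server serial numbers.
    W_t, W_g: embedding tables of shape (m, d), one row per edge server node.
    Returns M_L of shape (p, q, d).
    """
    M_t = W_t[M_l]  # trainable-embedding lookup, shape (p, q, d)
    M_g = W_g[M_l]  # geographic (LLE) embedding lookup, shape (p, q, d)
    return theta1 * M_t + theta2 * M_g
```

In a trained model, `theta1`, `theta2`, and `W_t` would be parameters updated by backpropagation, while `W_g` comes from LLE.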

Position Encoding
The positional information determines the order of the nodes in the trajectory, so position encoding is critical. The LTransformer follows the Transformer's position encoding method; i.e., position information is represented by trigonometric functions. The encoding results are computed for each position and dimension, i.e.,
$$PE(t, 2i) = \sin\left(\frac{t}{10000^{2i/d}}\right), \quad PE(t, 2i+1) = \cos\left(\frac{t}{10000^{2i/d}}\right) \tag{7}$$
where $t$ denotes the position number, $i$ denotes the dimension index, and $d$ denotes the dimension after embedding. For each element in $M$ and each of its dimensions, $M_P$ is calculated according to Equation (7). The calculation is shown in Equation (8).
The result of position encoding is the matrix $M_P$, representing the sequential information.
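A minimal sketch of the sinusoidal encoding follows, assuming the standard Transformer formulation with base 10000. The function name `position_encoding` is illustrative.

```python
import math


def position_encoding(t, d):
    """Return the d-dimensional sinusoidal encoding of position t."""
    pe = [0.0] * d
    for k in range(0, d, 2):  # k = 2i indexes the even dimensions
        angle = t / (10000 ** (k / d))
        pe[k] = math.sin(angle)
        if k + 1 < d:
            pe[k + 1] = math.cos(angle)
    return pe


# Encode the first three positions of a trajectory with d = 512.
M_P = [position_encoding(t, 512) for t in range(3)]
print(len(M_P), len(M_P[0]))
```

Because the encoding depends only on the position and the dimension, it can be computed once and reused for every trajectory in a batch.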

Time Encoding
In VEC, vehicle trajectories follow temporal patterns. For example, traffic volumes and the overall direction of movement in the morning differ from those in the evening. Therefore, the LTransformer framework uses the time encoding from Informer, which can express such temporal patterns. First, based on the training results, the LTransformer selects five aspects of temporal pattern expression: hour, day, week, month, and year. Then, we calculate each expression for every timestamp in the matrix $M_s$. The calculation of each aspect has its own characteristic; e.g., for the hour aspect, we calculate the proportion of the current minute within the hour. The specific formulation of each aspect is as follows:
$$h = \frac{\text{minute}}{60}, \quad d = \frac{\text{hour}}{24}, \quad w = \frac{\text{day}}{7}, \quad m = \frac{\text{day}}{30}, \quad y = \frac{\text{day}}{365} \tag{11}$$
After the calculation, the results are concatenated and embedded into a 512-dimensional space using a linear layer to generate $M_T \in \mathbb{R}^{p \times q \times 512}$, which represents the temporal information:
$$M_T = \mathrm{Linear}(\mathrm{Concat}(h, d, w, m, y)) \tag{12}$$
It is demonstrated in the Transformer that if the embeddings are summed directly, the distinctions and relationships between the dimensions can still be captured during the training process. Therefore, in the input module, the sum of $M_L$, $M_P$, and $M_T$ represents the geographic, sequential, and temporal information. It forms the output $M_O$ of the input module, i.e.,
$$M_O = M_L + M_P + M_T$$
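The five temporal ratios can be sketched with the standard library as follows. Reading "day" as day of week, day of month, and day of year for $w$, $m$, and $y$ respectively is an assumption, as is the function name `time_features`. The linear layer that maps the five values to 512 dimensions is omitted.

```python
from datetime import datetime


def time_features(ts):
    """Return the five normalized temporal features (h, d, w, m, y) of a timestamp."""
    h = ts.minute / 60                # position within the hour
    d = ts.hour / 24                  # position within the day
    w = ts.weekday() / 7              # day of week, Monday = 0 (assumed)
    m = ts.day / 30                   # day of month (assumed)
    y = ts.timetuple().tm_yday / 365  # day of year (assumed)
    return [h, d, w, m, y]


# Monday, 2 January 2023, 06:30.
features = time_features(datetime(2023, 1, 2, 6, 30))
print(features)
```

Each feature is a small bounded ratio, so the five values can be concatenated and fed to a single linear layer without further scaling.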

Encoding-Decoding Module
The main structure of the encoding-decoding module is similar to that of the Transformer, comprising $N$ encoders and $N$ decoders. Each encoder or decoder includes multi-head attention with $p$ heads and a fully connected neural network with a $q$-dimensional hidden layer, and employs residual connections and normalization after each encoding or decoding step. Here, $p$ and $q$ are hyperparameters of this module and are distinct from the batch size and trajectory length of the input module. Based on the training performance, the LTransformer sets the hyperparameters to $N = 6$, $p = 16$, and $q = 4096$.
The features of the vehicle trajectories and edge server nodes are learned by the encoders. The decoder is trained with teacher forcing, adjusting the weights to reduce the cross-entropy loss so that the prediction results gradually approach the real ones during training. In addition, since the input module generates a matrix with geographic, sequential, and temporal information, the multidimensional features of the vehicle trajectories and the edge server nodes can be captured by the multi-head attention in the encoding-decoding module. Thus, the vehicle trajectory prediction draws on information from multiple aspects.

Output Module
The output module filters the results using an error correction mechanism, which takes into account the adjacency relationships of edge servers. Specifically, in VEC, the predicted node must be adjacent to the last node of the vehicle trajectory. If the prediction is not adjacent to the last node, the result is treated as an error. Therefore, this paper proposes the error correction mechanism to prevent erroneous results.
First, we define the concept of the adjacent node: if a two-node trajectory sequence $T_j$ is a subsequence of any trajectory $T_i$, i.e., $T_j = \{s_p, s_q\}$ and $T_i = \{s_1, s_2, \ldots, s_m\}$ satisfy $T_j \subseteq T_i$, then we refer to the node $s_p$ in $T_j$ as the adjacent node of $s_q$. Based on this definition, the process of the error correction mechanism is shown in Figure 5.
In the error correction mechanism, first, the prediction results are ranked in order of probability. Second, the node $p_i$ corresponding to the maximum probability is obtained. Third, it is determined whether the node $p_i$ is an adjacent node of the last node of the vehicle trajectory. If $p_i$ is not an adjacent node, it is deleted and the remaining results are ranked again. Finally, once $p_i$ is an adjacent node, we take $p_i$ as the final prediction result of the LTransformer.
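The ranking-and-filtering loop of the error correction mechanism can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: adjacency is derived from consecutive nodes of known trajectories and treated as symmetric, and the names `build_adjacency` and `correct_prediction` are hypothetical.

```python
def build_adjacency(trajectories):
    """Collect adjacent node pairs from consecutive nodes of known trajectories."""
    adjacent = {}
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            adjacent.setdefault(a, set()).add(b)
            adjacent.setdefault(b, set()).add(a)  # symmetric adjacency (assumed)
    return adjacent


def correct_prediction(probabilities, last_node, adjacent):
    """Return the most probable node that is adjacent to last_node, or None."""
    candidates = dict(probabilities)
    neighbors = adjacent.get(last_node, set())
    while candidates:
        best = max(candidates, key=candidates.get)
        if best in neighbors:
            return best
        del candidates[best]  # discard the erroneous result and re-rank
    return None


adjacent = build_adjacency([[1, 2, 3], [2, 4]])
# Node 3 has the highest probability but is not adjacent to node 1.
result = correct_prediction({3: 0.6, 2: 0.3, 4: 0.1}, last_node=1, adjacent=adjacent)
print(result)
```

Returning `None` when no candidate is adjacent is a choice made for this sketch; the source does not describe that case.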
The LTransformer fully considers the features of edge computing, and this framework can be applied to in-vehicle edge computing. Specifically, traffic control and in-vehicle services require efficient task offloading mechanisms, and the LTransformer framework can optimize the task offloading mechanisms to provide high-quality services.

LTransformer Complexity
In this subsection, we will analyze the time complexity of the LTransformer in various stages and use this analysis to assess its feasibility for application in VEC.
Since pre-training is performed only once, the time complexity of the LTransformer is dominated by the encoding-decoding module. During training, the time complexity of the encoding-decoding module is determined by the computation of attention:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d}}\right)V$$
where $Q, K, V \in \mathbb{R}^{n \times d}$, $n$ refers to the length of the sequence, and $d$ is the dimension.
The main computational step in the above equation is calculating the similarity $QK^{T}$, which is a matrix multiplication between an $n \times d$ matrix and a $d \times n$ matrix, resulting in an $n \times n$ matrix at a cost of $O(n^2 d)$. The softmax is applied to each row of this $n \times n$ matrix at $O(n)$ per row, i.e., $O(n^2)$ in total, and multiplying the result by $V$ costs another $O(n^2 d)$; thus, the overall time complexity of the formula is $O(n^2 d)$.
In the prediction process, the time complexity of the LTransformer is the same as during training, i.e., $O(n^2 d)$. During prediction, the sequence length $n$ is generally short, so computation is fast and can meet the demands of edge computing.
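The $O(n^2 d)$ growth can be checked with a small single-head attention written on plain lists that also counts scalar multiplications. This is an illustrative sketch, not the model's code: the helper `toy_matrix` and its values are arbitrary, and a real implementation would use batched tensor operations.

```python
import math


def attention(Q, K, V):
    """Scaled dot-product attention on lists; also counts multiplications."""
    n, d = len(Q), len(Q[0])
    mults = 0
    out = []
    for i in range(n):
        # Similarity of query i with every key: n dot products of length d.
        scores = [sum(Q[i][k] * K[j][k] for k in range(d)) / math.sqrt(d)
                  for j in range(n)]
        mults += n * d
        # Row-wise softmax, O(n) per row.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the value vectors: another n * d multiplications.
        out.append([sum(weights[j] * V[j][k] for j in range(n)) for k in range(d)])
        mults += n * d
    return out, mults


def toy_matrix(n, d):
    """Arbitrary deterministic n x d matrix used only for the demonstration."""
    return [[(i + k) % 3 / 3 for k in range(d)] for i in range(n)]


_, ops_small = attention(toy_matrix(4, 3), toy_matrix(4, 3), toy_matrix(4, 3))
_, ops_large = attention(toy_matrix(8, 3), toy_matrix(8, 3), toy_matrix(8, 3))
print(ops_small, ops_large)
```

Doubling the sequence length from 4 to 8 quadruples the multiplication count, which matches the quadratic dependence on $n$.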
It is worth noting that the LTransformer model is continuously updated and iterated on the cloud server. Over time, the LTransformer will perform incremental training based on the latest trajectory data, updating itself to achieve higher accuracy. Due to the pre-training process, the LTransformer does not need to train the parameters in different dimensions. Therefore, when updating parameters in the LTransformer, the required training time can meet the demands of regular version iteration.

Experimental Environment and Dataset
This section introduces the experimental environment. The experimental analysis was conducted on a Linux server, and the source code was written in Python. The detailed platform parameters are shown in Table 1. The experimental data were real vehicle trajectory data collected using real equipment in a city in China. The vehicle trajectory data include four fields, namely longitude, latitude, timestamp, and serial number, as presented in Table 2.

Data Processing
This section primarily discusses the data processing, training process, and prediction process during the experiment. Each trajectory in the dataset was composed of several nodes. Each node contained information about latitude and longitude and a timestamp indicating the time at which the vehicle passed a specific geographical location. Let the length of the trajectory be $n$.
Before starting the experiment, each trajectory was split into a sequence of length $n - 1$ and the last node. The sequence of length $n - 1$ served as the input to the model. The latitude and longitude of the last node served as the ground truth for the model's predictions. In other words, the model needed to predict the location of the last node of the trajectory based on the sequence of the preceding $n - 1$ nodes.
During the training process, we divided the dataset into training and testing sets in a ratio close to 2:1. All baseline algorithms and the LTransformer were trained using the teacher forcing mode.
During the prediction process, we fed the test dataset into the trained model and compared the obtained results with the ground truth. Finally, we calculated the accuracy according to the comparison results.
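The data preparation and evaluation steps above can be sketched as follows. The trajectories, the ordered 2:1 split, the stand-in predictions, and the use of node identifiers in place of latitude-longitude pairs are all illustrative assumptions. Accuracy is computed as the share of correct predictions.

```python
def split_trajectory(trajectory):
    """Split a trajectory of n nodes into the first n - 1 nodes and the last node."""
    return trajectory[:-1], trajectory[-1]


def train_test_split(trajectories, train_parts=2, test_parts=1):
    """Split a list of trajectories into training and testing sets (about 2:1)."""
    cut = len(trajectories) * train_parts // (train_parts + test_parts)
    return trajectories[:cut], trajectories[cut:]


def accuracy(predictions, ground_truth):
    """Fraction of predictions that match the ground truth."""
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    return correct / len(ground_truth)


# Hypothetical trajectories given as sequences of node identifiers.
trajectories = [[1, 2, 3], [2, 3, 4], [4, 3, 2], [1, 2, 4], [3, 2, 1], [2, 4, 3]]
train, test = train_test_split(trajectories)
inputs, targets = zip(*(split_trajectory(t) for t in test))
predictions = [1, 4]  # stand-in for model outputs on the test inputs
print(len(train), len(test), accuracy(predictions, targets))
```

In practice the split would normally be randomized; an ordered split is used here only to keep the example deterministic.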

Approach Comparison
In order to compare and analyze the performance of the LTransformer, the following baseline methods were chosen in this experiment.
RNN [24]: the RNN (Recurrent Neural Network) is a deep learning method commonly used to predict sequential data.
LSTM [25]: LSTM (Long Short-Term Memory) improves the RNN with gated structures and solves the short-term memory problem of the RNN.
GRU [8]: the GRU (Gated Recurrent Unit) simplifies LSTM with two gating mechanisms (reset gate and update gate) and solves the slow loss descent problem.
Transformer [26]: the Transformer performs attention mechanisms in the encoder layers and decoder layers to analyze or predict sequential data.
Additionally, the parameters of the model affect the training and prediction performance, and the corresponding experimental parameters are shown in Table 3. The RNN, LSTM, and GRU typically achieve better performance with the Adam optimizer, while their loss reduction is slower when using the SGD optimizer. Conversely, the Transformer tends to show a suboptimal performance with the Adam optimizer, while the SGD optimizer proves to be more suitable and effective.

Accuracy Verification
We conducted a comparative analysis by training all of the models for the same number of epochs on our dataset. We define accuracy as
$$\mathrm{Accuracy} = \frac{T}{T + F}$$
where $T$ denotes the number of correct predictions and $F$ denotes the number of incorrect predictions. Following this definition, the experiments compared and analyzed the accuracy of each model. All models were trained for the same number of rounds using the same training set and loss function. To reduce experimental variability, we selected three different samples from the dataset. Each sample contained the same number of trajectories but was drawn from a different part of the dataset. The specific results are presented in Tables 4 and 5.
Tables 4 and 5 demonstrate that the LTransformer achieved higher accuracy and a better fitting performance than the commonly used sequential deep learning methods. In terms of accuracy, the LTransformer improved the average accuracy to 80.1%. In terms of loss, the LTransformer reduced the loss to 0.248. This suggests that the LTransformer has a superior predictive performance. As the loss value reflects the fitting status, the relationship between the loss and the number of epochs was recorded during training. Figure 6 shows the specific results.
Figure 6 records the average loss per round for each model during training. After extensive experimental validation, we found that after 30 rounds of training, the losses of all models no longer changed significantly. Therefore, we chose 30 rounds as the number of training rounds for all models. Figure 6 also indicates that the LTransformer has a slower rate of loss reduction, which is attributed to the incorporation of multiple dimensions in the LTransformer, as the model must fit different features across these dimensions. Nevertheless, having been trained to fit multidimensional features, the LTransformer reached a final fitting performance (at the 30th epoch) better than that of the other baseline models.
As shown in Table 6, as the number of trajectories increases, the average time consumption tends towards 0.008 s per trajectory. Compared to processing computational tasks, the time required for trajectory prediction is almost negligible. Therefore, the LTransformer essentially meets the requirements for real-time prediction.

Memory Consumption
Memory is one of the key resources for machine learning model training and inference. Machine learning models deployed on edge servers should use memory efficiently to enhance model efficiency and performance. Therefore, we compared the memory usage of the LTransformer and the baseline algorithms when predicting 80,000 trajectories. It is worth noting that the LTransformer has similar memory usage to the Transformer, so no separate comparison was made between the two. In addition, we compared the memory consumption on three different samples. The specific data are shown in Table 7. It can be observed that the LTransformer also has an advantage in terms of memory consumption. This may be attributed to the more advanced encoding techniques adopted by the Transformer model when processing trajectory sequence data, which compress the data storage space. This allows it to be compatible with a greater number of edge servers.
First, experiments were conducted to compare and analyze the embedded dimensions of the LTransformer. The results show that the LTransformer had the best prediction performance with 512 dimensions. This is because the LTransformer goes through pre-training, which reduces the number of dimensions to be fitted. Meanwhile, all of the embeddings are summed rather than concatenated, which also reduces the number of dimensions.
In addition, the experiment also tested the other hyperparameters, and finally set the other hyperparameters to N = 6, p = 16, and q = 4096, for which the LTransformer had the best prediction performance.
Furthermore, the experiment tested the prediction time of the LTransformer for a single trajectory datum, which only requires 0.008 s. Therefore, the experimental results indicate that the LTransformer achieves a higher accuracy compared to other baseline algorithms and meets the requirement of real-time prediction, making it more suitable for deployment in VEC. Meanwhile, the experiments also demonstrated that the LTransformer can be applied in traffic control, in-vehicle services, and other VEC applications.

Conclusions
This paper proposes a task offloading optimization approach in VEC which enables the edge servers to deploy deep learning methods for predicting vehicle trajectories and optimizing task offloading strategies.
Considering the trajectory prediction approach, this paper introduces the LTransformer, which has four modules. In the pre-training, the two-dimensional space containing latitude-longitude information is embedded into a multidimensional space. The input module integrates geographic, sequential, and temporal information. The encoding-decoding module incorporates the encoder and decoder of the Transformer to train the features of multidimensional data. In the output module, the error correction mechanism is employed to remove certain erroneous results. In the experiment, a comparison was made with four commonly used sequential deep learning methods. The experimental results demonstrate that the LTransformer achieves more accurate vehicle trajectory predictions, making it suitable for VEC. Meanwhile, in VEC, the framework can be applied in traffic control and in-vehicle services.
In the future, we will combine other technologies [27] to further optimize the task offloading process in VEC. In trajectory prediction, we will incorporate additional dimensions and behavioral features [28] to enhance the deep learning approach, aiming to achieve higher accuracy and efficiency in VEC. Moreover, privacy and security measures also need to be taken into further consideration.

Figure 1. An example of vehicular edge computing.
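Figure 5. Flowchart of the error correction mechanism (figure not reproduced; recoverable labels: the probability set $\{\Pr(p = p_1), \Pr(p = p_2), \ldots, \Pr(p = p_m)\}$; take the node $p_i$ corresponding to the maximum probability; the last node $p_j$).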

Figure 6. Comparison of losses of each algorithm.

Resource Consumption

Time Consumption
This subsection primarily discusses the feasibility of using the LTransformer in real-time prediction. Relevant experiments were conducted to test the time required for prediction with trajectory data volumes of 1, 10, 100, and 1000 in three different samples. The specific results are shown in Table 6.

Parameter Adjustment
The hyperparameters of the LTransformer are parameters whose values control the deep learning process and determine the values of the parameters that the algorithm ends up learning. Different hyperparameters of the LTransformer were analyzed in the experiment. We tested the accuracy of the model by adjusting the value of one parameter while keeping the training set and the other parameters constant. Figure 7 shows the accuracy of the LTransformer model with different dimensions.

Figure 7. Comparison of accuracy of the LTransformer with different dimensions.

Table 1. Parameters of the experimental platform.

Table 2. Source and fields of the experimental dataset.

Table 3. Models and corresponding parameters.

Table 4. Accuracies in different samples of each model.

Table 5. Average accuracy and loss of each model.

Table 6. Time consumption for different numbers of trajectories.

Table 7. Memory consumption in different models.