1. Introduction
China’s road freight transported 40.34 billion tons of commercial cargo in 2023 [
1], representing significant growth in the transportation sector. Accurate heavy-duty vehicle load estimation is crucial for improving logistics efficiency, reducing costs, and minimizing environmental pollution. However, traditional weighing technologies struggle to meet real-time performance and cost-effectiveness requirements for such large-scale operations. Current load monitoring systems primarily rely on weighbridges at highway entrances, which, despite having high accuracy, suffer from fixed locations and operational inefficiencies. These limitations have prompted the development of vehicle-mounted sensor methods. Tosoongnoen et al. [
2] developed a strain gauge-based sensor for real-time truck freight monitoring and found linear relationships between sensor values and vehicle loads. However, their method exhibited limited measurement range and struggled with dynamic responses under complex conditions. Nikishechkin et al. [
3] employed fluxgate sensors to measure magnetic field strength from DC traction motors, but their method required identical calibration routes, severely limiting practical deployment. Marszalek et al. [
4] utilized inductive-loop technology for load estimation of moving passenger cars, which achieved good results for light vehicles but lacked scalability to heavy-duty applications.
While these sensor-based approaches demonstrate feasibility, they universally face challenges of high installation costs, complex calibration procedures, and limited adaptability across diverse vehicle types. To address these limitations, researchers have turned to Unobtrusive On-Board Weighing (UOBW) methods, which estimate loads through vehicle dynamics without hardware modifications. Torabi et al. [
5] applied feedforward neural networks using acceleration, torque, and speed parameters, achieving an RMSE of approximately 1% in specific scenarios. However, their approach was limited to simulation data and particular slope conditions. Jensen et al. [
6] developed mass estimation based on longitudinal dynamics using IMU and CAN-bus data; while their method achieved errors within 5% for light vehicles, it struggled with heavy-duty vehicle applications and varying road conditions. Korayem et al. [
7] compared physics-based and machine learning methods for trailer mass estimation, demonstrating errors below 10% but requiring extensive calibration for different vehicle types.
While these UOBW methods eliminate hardware requirements, they still face challenges in generalization across diverse operating conditions and vehicle configurations, particularly for heavy-duty vehicles. Deep learning integration has shown promise in overcoming traditional limitations. Han et al. [
8] utilized Bi-LSTM networks for real-time truck load estimation based on deep learning, achieving an average relative error of 3.58% on 80% of samples. Nevertheless, their method required high-quality structured data and manual preprocessing. Gu et al. [
9] enhanced this approach with multi-head attention mechanisms, reducing MAE by 6% and RMSE by 5%, but their method still faced challenges with data quality requirements and computational complexity. Li et al. [
10] employed Internet-of-Vehicles big data with clustering analysis, controlling errors within 10%, but their method suffered from non-end-to-end processing that introduced variability in results.
While deep learning methods demonstrate improved accuracy over traditional approaches, they still face challenges with data quality issues and lack interpretability, motivating researchers to explore complementary approaches. Recent advances include computer vision-based and physics-constrained methods. Feng et al. [
11] applied computer vision for moving vehicle weight estimation through tire deformation analysis, but their method faced challenges with environmental conditions and tire variability. Yu et al. [
12] developed a physics-constrained generative adversarial network for probabilistic vehicle weight estimation; their method showed innovation in handling uncertainty but was limited by computational efficiency and specific infrastructure requirements. İşbitirici et al. [
13] noted that traditional LSTM-based virtual load sensors struggle with dynamically changing features and long-term dependencies in heavy-duty vehicle applications. Integrating prior physics-based knowledge into deep learning models is a key research direction for enhancing their performance and data efficiency.
Despite recent advances, current methods face the following critical limitations:
- (1)
Excessive sensor dependence that increases costs and limits widespread adoption;
- (2)
Restricted application scenarios due to the lack of universality across vehicle types and conditions;
- (3)
Significant manual intervention required for data preprocessing, which reduces practical deployment feasibility.
To overcome the shortcomings of existing models, current state-of-the-art research has advanced in two key directions: physics-informed machine learning (PIML) and graph neural networks (GNNs). PIML techniques, in particular, have demonstrated tangible benefits. For instance, Pestourie et al. [
14] showed that coupling a low-fidelity simulator with a neural network can achieve a threefold accuracy improvement while reducing data requirements by a factor of over 100. Other hybrid architectures, such as the dual-branch neural ODE proposed by Tian et al. [
15], have achieved state-of-the-art performance in air quality prediction. Beyond accuracy, PIML can also tackle fundamental scientific challenges; Zou et al. [
16] have shown that ensembles of PINNs can discover multiple physical solutions, successfully identifying both stable and unstable results for problems like the Allen–Cahn equation. In parallel, GNNs have excelled in modeling complex relational systems. Liu et al. [
17], for example, incorporated the flow conservation law into the loss function of a heterogeneous GNN, which outperformed conventional models in both convergence rate and prediction accuracy for traffic assignment.
Despite these advances, a critical research gap remains for the specific challenge of real-world vehicle load estimation. Existing PIML techniques often require well-defined governing equations or simulators that are impractical for noisy OBD data where key variables (e.g., road grade and wind resistance) are unobserved. Similarly, while GNNs can model relationships, a standard application does not inherently enforce vehicle-specific physical laws or address the significant data quality issues (noise, outliers, and missing values) endemic to real-world OBD streams. Therefore, an innovative method is needed that synergistically combines the structural relational learning of GNNs with the domain knowledge of physics, while also being inherently robust to low-quality, unlabeled sensor data.
To bridge this gap, this paper proposes a Self-Supervised Reconstruction Heterogeneous Graph Convolutional Network (SSR-HGCN), a novel framework designed for robust and accurate load estimation based on raw OBD data. Our approach is founded on three key innovations that directly address the aforementioned limitations:
- (1)
A physics-constrained heterogeneous graph structure: Unlike models that ignore physics or require rigid equations, we encode vehicle dynamics directly into the model’s architecture. We construct a heterogeneous graph where distinct node types represent kinematic and dynamic features, and cross-domain edges explicitly enforce known physical relationships. This provides a strong inductive bias, enhancing both accuracy and interpretability.
- (2)
A self-supervised reconstruction mechanism: To tackle real-world data quality issues without costly manual labeling, we introduce a self-supervised auxiliary task. The model is forced to reconstruct the input features from their learned representations, which promotes the learning of robust features that are invariant to noise and sensor irregularities.
- (3)
A hierarchical feature extraction framework: We design a multi-layer architecture combining graph convolutional networks (GCNs) and graph attention networks (GATs) to effectively aggregate information across temporal, kinematic, and dynamic dimensions, capturing complex dependencies in the data.
2. Materials and Methods
2.1. Data Collection and Preprocessing
Considering that vehicle loads remain stable during continuous highway driving and that highway entrance/exit weighbridge measurements are accurate, we collected heavy-duty vehicle trajectory data from highway segments as our research material. The vehicle trajectory data originates from second-by-second OBD transmission data required by China’s Stage VI Heavy-Duty Vehicle Emission Standards, covering 800 6-axle heavy-duty vehicle trips on selected highways in Hebei Province from September to December 2024, with an average single trip duration of 4 h. The data format is shown in
Table 1.
The vehicle highway weighing data comes from real-time weighbridge data uploaded through the national highway networked toll collection platform, collected at highway entrances/exits corresponding to the trajectory data. By matching weighing data with trajectory data from similar time periods, we obtained actual vehicle loads for each trip.
The specific data matching process includes the following steps:
- (1)
Grouping samples based on X_VIN identifiers, using time windows to divide data segments with time intervals not exceeding 2 h under the same X_VIN into independent data groups, ensuring temporal continuity.
- (2)
Matching the highway weighing raw data (containing X_VIN, timestamp, and weight values) with the segmented datasets through X_VIN and timestamps, retaining only successfully matched data groups as the reliable dataset.
- (3)
Supplementing the acceleration and traction force features, which correlate strongly with load, using the empirical formulas in Equations (1) and (2):

$a = \frac{v_t - v_{t-1}}{T_t - T_{t-1}}, \qquad F = \frac{T_q \cdot u}{r}$

where $v_t$ is the current vehicle speed, $v_{t-1}$ is the previous vehicle speed, $T_t$ is the current timestamp, $T_{t-1}$ is the previous timestamp, $T_q$ is the engine net output torque, $u$ is the transmission efficiency (0.8, an empirical value), and $r$ is the wheel radius (0.5 m, an empirical value), with units corresponding to those presented in Table 1.
- (4)
Filtering outliers for all fields to ensure logical data features, with the range boundaries based on transportation industry empirical data, while maximizing data continuity and completeness; the filtering ranges are shown in Table 1.
- (5)
Calculating the mean and variance of all fields and standardizing them to obtain complete valid input. For a feature $X$ with mean $\mu$ and standard deviation $\sigma$, the standardization follows Equation (3), where $z$ is the standardized result:

$z = \frac{X - \mu}{\sigma}$
The complete dataset includes all standardized fields shown in
Table 1, except for X_VIN and timestamp. The final sample weight distribution is shown in
Figure 1, which exhibits a unimodal distribution centered at 40–48 t, indicating that vehicles typically operate at full capacity for bulk material transportation, with most tasks requiring vehicles to reach the design load limits, reflecting the scale characteristics of logistic transportation. Low-load intervals have lower proportions, corresponding to empty return trips and scattered cargo scenarios.
2.2. Overall Framework Design
We combined deep learning technology with graph neural networks, physical constraints, and self-supervised techniques to construct a comprehensive estimation system framework, achieving a physically constrained model SSR-HGCN with strong interpretability and high estimation accuracy.
The architectural design of the SSR-HGCN is fundamentally informed by the principle of leveraging representational asymmetry to capture underlying physical symmetries. The governing laws of vehicle dynamics exhibit invariance across diverse operational contexts, a symmetry that conventional models struggle to learn due to their flawed inductive bias of treating all input features homogeneously. Our framework directly confronts this by constructing a heterogeneous graph, which introduces a structural asymmetry that segregates kinematic and dynamic features into distinct node domains. This physics-informed asymmetry provides a more accurate inductive bias, compelling the model to learn the universal, symmetric laws of motion with higher fidelity from noisy, real-world data.
As shown in
Figure 2, the graph neural networks serve as the feature extraction module, leveraging their message-passing mechanism to explicitly model physical constraints between features [
18]. The methodology comprises four main steps:
- (1)
Constructing a heterogeneous graph topology with physical constraint edges based on the coupling between vehicle load and kinematic/dynamic features;
- (2)
Designing a corresponding feature extraction network;
- (3)
Implementing feature reconstruction for enhanced robustness;
- (4)
Applying fully connected layers for final load estimation.
As shown in
Figure 2, the heterogeneous graph construction begins with establishing separate node domains for kinematic and dynamic features. Cross-domain dependency edges enable feature interaction and enforce physical constraints (illustrated as red edges in
Figure 2). Additionally, temporal edges connect adjacent data records within each domain. These three edge types collectively form the data topology and corresponding adjacency matrix.
For feature extraction, linear mapping first embeds the heterogeneous graph features into high-dimensional space to enhance expressiveness. The graph convolutional and attention layers then aggregate neighborhood information hierarchically, extracting three distinct feature representations: kinematic domain features, dynamic domain features, and cross-domain constraint information. Pooling and concatenation operations subsequently reduce dimensionality while preserving semantic information from all three feature types.
The feature reconstruction module enhances model robustness by reconstructing pre-pooling representations back to the original input space through linear mapping. This reconstruction task, serving as an auxiliary objective, enables the network to learn robust representations from noisy data.
Finally, a multilayer perceptron (MLP) transforms the extracted features into load estimates through multiple linear transformations with activation functions. The main task (load estimation) and auxiliary task (feature reconstruction) jointly optimize model parameters, thereby improving both accuracy and robustness to data quality variations.
2.3. Feature Reconstruction
The reconstruction head, as a component that can restore model-extracted feature representations to original inputs [
19], operates on the core mechanism of forcing the model to learn the intrinsic structure and semantic information of input data through reconstruction auxiliary tasks.
This component typically works collaboratively with the feature extraction module to enhance model generalization capability. In heavy-duty vehicle load estimation scenarios, the acquisition cost of high-quality labeled data is significant, but the self-supervised reconstruction head can alleviate limitations brought by low-quality data (such as noisy data, missing values, and outliers). By identifying low-quality data features through reconstruction errors, it ensures that the model maintains stable estimation performance under complex data features corresponding to various working conditions. At the same time, it minimizes the impact on model accuracy.
We adopted a regression-based self-supervised learning reconstruction head that forces the graph convolutional networks to learn more semantically representative graph embedding structures through a node feature reconstruction mechanism. The workflow is shown in
Figure 3. The model connects a linear reconstruction head after the feature extraction network, with its structure shown in Equation (4):

$\hat{X} = Z W + b$

where $\hat{X}$ represents the reconstructed original input; $Z \in \mathbb{R}^{N \times d}$ represents the node embeddings output by the feature extraction network; $d$ is the output dimension corresponding to the nodes; $W \in \mathbb{R}^{d \times F}$ and $b \in \mathbb{R}^{F}$ are the weight and bias terms of the reconstruction head, respectively; and $F$ is the corresponding dimension of the input features. The reconstruction head maps the node embeddings back to the original feature space, supervised by the mean squared error loss during training, as shown in Equation (5):

$L_{rec} = \frac{1}{N} \sum_{i=1}^{N} \lVert \hat{x}_i - x_i \rVert^2$

where $x_i$ represents the original input features and $N$ is the total number of nodes. The reconstruction head serves as an auxiliary task, minimizing $L_{rec}$ through gradient descent and thereby forcing the feature extraction network to learn the intrinsic structure and semantic information of the input data.
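A minimal numerical sketch of this head, assuming Equation (4) is a single linear map $\hat{X} = ZW + b$ and Equation (5) averages the squared reconstruction error over nodes; names and shapes are illustrative:

```python
import numpy as np

def reconstruct_and_score(Z, X, W, b):
    """Linear reconstruction head: X_hat = Z W + b maps node embeddings
    Z (N x d) back to the input space (N x F), then scores with the mean
    squared reconstruction error over all N nodes (Eq. (5))."""
    X_hat = Z @ W + b
    loss = float(np.mean(np.sum((X_hat - X) ** 2, axis=1)))
    return X_hat, loss
```

In training, `loss` would be the auxiliary term added to the main objective; here it is just returned for inspection.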
2.4. Graph Convolutional Network
The graph structure $G$ consists of nodes $V$ and edges $E$, represented formally as shown in Equation (6):

$G = (V, E)$

where $N = |V|$ represents the number of nodes, the edge structure is represented by the adjacency matrix $A$, the node feature matrix is $X \in \mathbb{R}^{N \times F}$, and $i$ and $j$ correspond to different graph node indices. Graph convolutional networks (GCNs) extract cross-node features through message-passing mechanisms that aggregate node neighborhood information to learn node-level representations. The main processes include message generation (Equation (7)), message aggregation (Equation (8)), and feature updating (Equation (9)).
In message generation, for each neighbor $j$ of node $i$, a message $m_{ij} = \phi(h_j)$ is generated based on its features, where $\phi$ is a custom message function that is typically a simple linear mapping neural network. For message aggregation, node $i$ aggregates the messages passed from all neighbors to obtain an aggregated message $m_i = \rho(\{\, m_{ij} : j \in N(i) \,\})$, where $\rho$ is the aggregation function, typically global average aggregation or global maximum aggregation. For feature updating, node features are updated based on the aggregated messages as $h_i' = \psi(m_i)$, where $\psi$ is the update function, typically a nonlinear transformation. We used a ReLU activation function plus a fully connected layer as the update function, as shown in Equation (10):

$\psi(m_i) = \mathrm{ReLU}(W m_i + b)$
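The three message-passing steps above can be sketched in one round as follows, assuming a linear message function, mean aggregation, and a ReLU-plus-fully-connected update; all weight shapes and names are illustrative:

```python
import numpy as np

def message_passing_step(H, neighbors, W_msg, W_upd):
    """One message-passing round: linear messages (Eq. (7)),
    mean aggregation over each node's neighbors (Eq. (8)),
    then a ReLU + fully connected feature update (Eqs. (9)-(10))."""
    msgs = H @ W_msg                                  # m_j = W_msg^T h_j
    H_new = np.zeros((H.shape[0], W_upd.shape[1]))
    for i, nbrs in enumerate(neighbors):
        agg = msgs[nbrs].mean(axis=0)                 # mean aggregation
        H_new[i] = np.maximum(agg @ W_upd, 0.0)       # ReLU(FC(agg))
    return H_new
```

`neighbors` is an adjacency list; in a library implementation this loop would be replaced by sparse matrix operations.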
Heavy-duty vehicle trajectory data exhibits temporal dependencies and large feature fluctuations; thus, we adopted graph convolutional networks as the feature extraction network [20]. As shown in Equation (11), the core idea is to perform a normalized linear transformation on each node and its neighborhood feature matrix:

$h_i^{(l+1)} = \sigma \Big( \sum_{j \in \tilde{N}(i)} \frac{1}{\sqrt{\tilde{d}_i \tilde{d}_j}} \, W^{(l)} h_j^{(l)} \Big)$

where $h_i^{(l)}$ is the feature vector of node $i$ in the $l$-th layer; $\tilde{N}(i)$ denotes the first-order neighborhood of node $i$, with the tilde indicating the inclusion of self-loops; $\tilde{d}_i$ represents the degree of node $i$ with self-loops; $W^{(l)}$ is the learnable weight matrix of the $l$-th layer; and $\sigma$ is a nonlinear activation function, referring to Equation (10). This transformation process is extended to the entire graph, as shown in Equations (12) and (13):

$\hat{A} = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}, \qquad \tilde{A} = A + I$

where $\hat{A}$ is the normalized adjacency matrix, $A$ is the original adjacency matrix, $I$ is the identity matrix, and $\tilde{D}$ is the diagonal degree matrix of $\tilde{A}$. The final update process of the global feature matrix $H$ is shown in Equation (14):

$H^{(l+1)} = \sigma \big( \hat{A} H^{(l)} W^{(l)} \big)$
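A minimal sketch of this normalized propagation rule, assuming the standard GCN form $H' = \sigma(\tilde{D}^{-1/2}(A+I)\tilde{D}^{-1/2} H W)$ given in Equations (11)–(14); a ReLU activation is assumed:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: add self-loops, symmetrically normalize the
    adjacency matrix, then apply the linear transform and ReLU."""
    A_tilde = A + np.eye(A.shape[0])          # A + I (self-loops)
    d = A_tilde.sum(axis=1)                   # degrees with self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # D^{-1/2}
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt # normalized adjacency
    return np.maximum(A_hat @ H @ W, 0.0)     # ReLU(A_hat H W)
```

Stacking such layers (with distinct `W` per layer) yields the hierarchical aggregation used by the feature extraction network.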
2.5. Graph Attention Network
A graph attention network (GAT) is a feature extraction module that dynamically assigns weights based on feature importance. Its core idea is to weight connections between nodes through dynamically learned attention mechanisms, determining neighbor importance during feature aggregation. GATs differ from GCNs in that GCNs use degree-based weighted summation to aggregate neighbor node information, whereas GATs calculate attention coefficients through attention mechanisms to adjust node weights; this enables better capture of non-local graph information, as shown in Equation (15). To make the model focus more on features directly related to load, we used a GAT as a special GCN for feature extraction in ablation experiments [21].

$e_{ij} = \mathrm{LeakyReLU}\big( a^{\top} [\, W h_i \,\|\, W h_j \,] \big)$

where $a$ is a learnable weight vector representing the attention mechanism weights; $\|$ represents the feature concatenation operation; and $W h_i$ and $W h_j$ are the features obtained by linear transformation of nodes $i$ and $j$, respectively. LeakyReLU is an activation function that performs better than ReLU in GATs; it is shown in Equation (16), where $\alpha$ is an adjustable negative-slope parameter:

$\mathrm{LeakyReLU}(x) = \begin{cases} x, & x > 0 \\ \alpha x, & x \le 0 \end{cases}$

Finally, attention coefficients are normalized through the Softmax function to obtain the normalized attention coefficient $\alpha_{ij}$, representing the influence degree of node $j$ on node $i$. The Softmax function is shown in Equation (17), where $\exp$ represents the natural exponential function:

$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in N(i)} \exp(e_{ik})}$
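The attention computation of Equations (15)–(17) can be sketched for a single node as follows; the shapes and the negative-slope default are illustrative:

```python
import numpy as np

def gat_attention(h_i, neighbor_feats, W, a, neg_slope=0.2):
    """Attention coefficients for one node: score each neighbor with
    LeakyReLU(a . [W h_i || W h_j]) (Eqs. (15)-(16)), then normalize
    the scores with a numerically stable softmax (Eq. (17))."""
    wi = W @ h_i
    scores = []
    for h_j in neighbor_feats:
        z = np.concatenate([wi, W @ h_j])              # [W h_i || W h_j]
        e = float(a @ z)
        scores.append(e if e > 0 else neg_slope * e)   # LeakyReLU
    s = np.array(scores)
    ex = np.exp(s - s.max())                           # stable softmax
    return ex / ex.sum()
```

The returned coefficients sum to one and weight each neighbor's message during aggregation.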
Figure 4 illustrates the GAT feature processing pipeline.
2.6. SSR-HGCN Model
Building upon the self-supervised reconstruction module and graph neural network described above, we categorized OBD parameters into two groups: kinematic features (vehicle speed, engine speed, and acceleration) and dynamic features (engine torque, friction torque, fuel flow, power, and traction) for heavy-duty vehicle load estimation. To capture the temporal interactions between kinematic and dynamic features that influence vehicle load, we proposed the SSR-HGCN model, which enforces physical constraints through cross-domain message passing between kinematic and dynamic features.
As shown in Figure 5, the complete structure and data flow of the SSR-HGCN model are as follows: After window grouping, standardization, and feature domain division of the original input features, we obtain the kinematic feature input $X_1$ and the dynamic feature input $X_2$. Cross-domain edges connect corresponding time steps between $X_1$ and $X_2$ to encode physical constraints. After extraction via the feature extraction network, the three types of features generate the graph-level kinematic representation $X_3$, dynamic representation $X_5$, and physical constraint representation $X_4$, respectively. After pooling, $X_3$ and $X_5$ are concatenated with the dimensionally matched physical constraint representation $X_4$ and input to the multilayer perceptron to form the main task, producing the load prediction $Y_2$ for comparison with the actual load $Y_1$. Pre-pooling representations are reconstructed through the reconstruction head into temporal edge features consistent with the original temporal edges, forming the auxiliary task. The two tasks are trained collaboratively through a joint loss function, ultimately forming the complete model training process. Below is further explanation of the feature dimensions and physical constraint logic:
The original features of dimension $(batch\_size,\ window,\ F_1 + F_2 - 2)$ (where the “−2” corresponds to the derivative fields, acceleration and traction, that have not yet been supplemented) are split into the kinematic and dynamic features $X_1 \in \mathbb{R}^{batch\_size \times window \times F_1}$ and $X_2 \in \mathbb{R}^{batch\_size \times window \times F_2}$ after window grouping, standardization processing, and derivative field supplementation. The processing logic is detailed in the experimental section below. Here, batch_size represents the number of data groups participating in each batch input, window is the time window length, and $F_1$ and $F_2$ correspond to the kinematic and dynamic feature dimensions, respectively.
The kinematic feature $X_1$ is connected through temporal edges and passed through the feature extraction network to output node embeddings $H_1 \in \mathbb{R}^{batch\_size \times window \times C}$ (where $C$ is the number of output channels of the feature extraction network), yielding the graph-level representation $X_3$ after pooling. The dynamic feature $X_2$ is processed through a feature extraction network with a different input dimension but an otherwise identical structure, outputting node embeddings $H_2$ and yielding the representation $X_5$ after pooling.
The longitudinal force balance equation for a moving vehicle describes the equilibrium between the traction force and all resistive/inertial forces acting upon the vehicle. The general form of this equilibrium is

$F_t = F_i + F_g + F_r + F_a$

where $F_t$ is the traction force (N) transmitted from the engine to the driving wheels via the powertrain; $F_i$ is the inertial force (N) opposing vehicle acceleration, calculated as $F_i = M a$, with $M$ being the total mass of the vehicle; $F_g$ is the grade resistance (N) caused by road inclination, expressed as $F_g = M g \sin\theta$, where $g$ is the gravitational acceleration; $F_r$ is the rolling resistance (N) between the tires and the road surface, defined as $F_r = M g f \cos\theta$; and $F_a$ is the aerodynamic drag (N) opposing vehicle motion, calculated using the standard air resistance formula $F_a = \frac{1}{2} \rho C_d A v^2$.
Based on the above formula, and neglecting the grade term on approximately level highway segments ($\sin\theta \approx 0$, $\cos\theta \approx 1$), it can be derived that (Equation (18))

$m = \frac{F_t - \frac{1}{2} \rho C_d A v^2}{a + g f} - m_0$

where $m$ is the load, $F_t$ is the traction force, $\rho$ is the air density (kg/m$^3$), $A$ is the frontal area (m$^2$), $C_d$ is the air resistance coefficient, $v$ is the vehicle speed (m/s), $g$ is the gravitational acceleration, $f$ is the rolling resistance coefficient, $a$ is the acceleration (m/s$^2$), and $m_0$ is the vehicle’s unladen weight. Physical constraints are constructed based on these relationships between the measured variables, including acceleration, vehicle speed, and traction. Specifically, after abstracting the kinematic and dynamic features into graph nodes and corresponding temporal edges, nodes of the same time step are connected across domains to obtain the physical constraint edges, which are transformed into the graph-level physical constraint representation $X_4$ after the feature extraction network and pooling operations.
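The level-road form of Equation (18) can be sketched numerically as below. All parameter defaults (air density, drag coefficient, frontal area, rolling resistance coefficient) are illustrative values, not calibrated constants from the paper:

```python
def physics_load_estimate(F_t, v, a, m0,
                          rho=1.225, Cd=0.6, A=8.0, f=0.01, g=9.81):
    """Physics-based load estimate under a flat-road assumption:
    total mass m = (F_t - 0.5*rho*Cd*A*v^2) / (a + g*f),
    load = m - unladen weight m0. Parameter values are illustrative."""
    F_air = 0.5 * rho * Cd * A * v ** 2   # aerodynamic drag F_a
    m = (F_t - F_air) / (a + g * f)       # solve force balance for mass
    return m - m0                         # subtract unladen weight
```

In practice this closed-form estimate is noisy (grade and wind are unobserved), which is precisely why the model encodes the relationship as a soft graph constraint rather than applying it directly.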
The reconstruction head reconstructs the pre-pooling kinematic and dynamic node embeddings through linear mapping into adjacency-matrix structures corresponding to the original temporal edges, and the reconstruction loss is calculated as the mean squared error between the reconstructed and original temporal features.
The three types of graph-level representations are concatenated and fused, outputting the final load prediction result through the multilayer perceptron, which is compared with the actual load through the mean squared error to obtain the fitting main task loss.
We adopted a joint optimization learning method for gradient descent, balancing prediction accuracy with generalization. The total loss function $L$ is calculated as shown in Equation (19):

$L = L_{main} + \lambda L_{rec}$

where $L_{main}$ is the main-task (load estimation) loss, $L_{rec}$ is the reconstruction loss, and $\lambda$ is the self-supervised loss weight determined through ablation experiments. The reconstruction head design considers two key aspects: the loss balance parameter $\lambda$ and the reconstruction head structure. First, to ensure that the encoder captures sufficiently compact semantic features to support reconstruction, we adopted a simple single-layer linear transformation as the main structure of the reconstruction module. Second, by adjusting the loss balance parameter, we ensure that the model does not deviate from the prediction target.
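A minimal sketch of the joint objective in Equation (19), with both terms taken as mean squared errors as described above; names are illustrative:

```python
def joint_loss(y_pred, y_true, x_rec, x_orig, lam=0.1):
    """Joint objective L = L_main + lambda * L_rec: the main load-estimation
    MSE plus the weighted self-supervised reconstruction MSE. The default
    lambda here is illustrative; the paper tunes it via ablation."""
    l_main = sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_pred)
    l_rec = sum((r - o) ** 2 for r, o in zip(x_rec, x_orig)) / len(x_rec)
    return l_main + lam * l_rec
```

Both tasks backpropagate through the shared feature extraction network, so minimizing this single scalar trains them collaboratively.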
2.7. Evaluation Metrics
As shown in
Figure 1, the main distribution interval of the original load data concentrates at 40–48 t, directly reflecting the prevalence of 6-axle heavy-duty vehicle loads under full-load scenarios; the low-proportion interval at 10–25 t on the left corresponds to the typical load characteristics of empty states and return cargo scenarios. Further analysis of the box plot shows a median load of 44 t, with most experimental samples being 42–46 t.
Based on the above analysis, the original load data exhibits certain long-tail characteristics that closely match actual transportation scenarios but cause conventional evaluation metrics such as the mean squared error to inadequately represent model fit. To alleviate this evaluation distortion, we adopted the root mean square error (RMSE) as the core evaluation metric, calculated as shown in Equation (20):

$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$

where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $n$ is the test set size. However, a disadvantage of the RMSE is its sensitivity to outliers with large absolute errors. To limit the influence of extreme outliers in the test samples, we also introduced the mean absolute percentage error (MAPE) to measure model fitting consistency across the overall sample set. The MAPE significantly reduces outlier interference on the evaluation results by averaging the relative error between the predicted and actual values, as shown in Equation (21):

$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$
Lower values for both metrics indicate higher model estimation accuracy. The combined application of both metrics enables both quantitative analysis of individual sample fitting accuracy and reliable overall evaluation of models under complex data distributions.
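The two metrics can be computed directly from their definitions in Equations (20) and (21):

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error (Eq. (20)) over the test set."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mape(y_true, y_pred):
    """Mean absolute percentage error (Eq. (21)), in percent."""
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n
```

RMSE penalizes large absolute errors quadratically, while MAPE normalizes each error by the true load, which is why the two are reported together.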
2.8. Implementation Details
The SSR-HGCN model was implemented using PyTorch 1.12.0. All experiments were conducted on a server equipped with NVIDIA RTX 3090 GPU (24 GB memory) and Intel Core i9-10900K CPU. The dataset was split into 70% training, 15% validation, and 15% test sets. The key parameters were as follows:
Time window size: 60 s (corresponding to 60 time steps at a sampling rate of 1 Hz);
Batch size: 32 (the number of samples processed in each training iteration);
Learning rate: 0.001, using the Adam optimizer;
Training epochs: 200, with an early stopping strategy (patience = 20: training stops when the validation set’s root mean square error (RMSE) does not decrease for 20 consecutive epochs);
Dropout probability: 0.2 (used for model regularization to prevent overfitting).
To enable real-world deployment, the field terminal was connected to a 4G-Cat1 device via the OBD-II interface, powered by the vehicle’s electrical system. Upon power-up, it automatically completes network registration. The terminal periodically reads critical data such as engine RPM, vehicle speed, and fault codes. This data is encapsulated into JSON messages with timestamps and vehicle identifiers, and then securely transmitted to the cloud via MQTT-TLS. The edge device handles only data collection and transmission and does not perform local computation. Model inference is uniformly executed on cloud GPU servers, enabling a complete streaming workflow.
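A hypothetical sketch of the terminal’s JSON message construction described above; the field names and schema here are illustrative, not the deployed format:

```python
import json
import time

def build_payload(vin, rpm, speed_kmh, fault_codes):
    """Build one telemetry message as a JSON string. The keys below
    (vin, ts, engine_rpm, speed_kmh, fault_codes) are assumed names,
    not the actual schema used by the deployed terminal."""
    msg = {
        "vin": vin,                 # vehicle identifier
        "ts": int(time.time()),     # Unix timestamp (s)
        "engine_rpm": rpm,
        "speed_kmh": speed_kmh,
        "fault_codes": fault_codes, # list of DTC strings
    }
    return json.dumps(msg)
```

In deployment, a string like this would be published over MQTT-TLS to the cloud, where model inference runs on GPU servers.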
4. Conclusions
We addressed three key challenges in dynamic load estimation for heavy-duty vehicles: the insufficient accuracy of traditional models, the lack of interpretability, and data quality issues arising from equipment and environmental factors. Through the collection of kinematic and dynamic domain feature data from heavy-duty vehicles via OBD devices, we developed the SSR-HGCN model, which integrates graph neural networks, physical constraints, and self-supervised learning techniques. This model achieves robust physical constraints, high interpretability, and superior measurement accuracy for load assessment in dynamic driving scenarios. With 60 s of OBD data collected at 1 Hz, the model completes inference in just 0.005 s, achieving high-precision estimation with a mean absolute percentage error of 7.27%.
The key innovations include (1) a heterogeneous graph architecture that structurally encodes physical relationships between kinematic and dynamic features through cross-domain edges; (2) a self-supervised reconstruction module that enhances robustness to noisy OBD data without extensive manual labeling; and (3) a hierarchical feature extraction framework combining GCN and GAT layers for effective information aggregation across temporal and physical constraint dimensions.
Comprehensive experimental validation demonstrated that SSR-HGCN significantly outperformed traditional time-series models, achieving reductions of 20.76% in RMSE and 41.23% in MAPE over LSTM. It also outperformed the standard graph model GraphSAGE, reducing RMSE by 21.98% and MAPE by 7.15%, ultimately achieving < 15% error for over 90% of test samples. The model completed inference in 0.18 ms per sample at a batch size of 32, confirming its viability for real-time fleet monitoring applications.
4.1. Practical Implications
The SSR-HGCN framework offers substantial advantages for real-world deployment in transportation and environmental regulation. Unlike traditional weighbridge-based systems, this approach requires only standard OBD devices mandated by China’s Stage VI Heavy-Duty Vehicle Emission Standards, eliminating infrastructure costs while enabling continuous monitoring across entire highway networks. The model’s real-time computational performance supports immediate applications in fleet supervision, overloading detection, and regulatory enforcement for environmental compliance.
4.2. Limitations and Future Directions
Several limitations warrant future investigation.
Model limitations: The current model shows insufficient response capability in scenarios with changing data features. Future work could explore more complex physical constraints and temporal dependency structures to enrich graph model expressiveness and enhance the capability to capture temporal changes. Additionally, incorporating external environmental factors such as road gradients, traffic flow, and different vehicle types could further improve prediction accuracy.
Scalability considerations: Given the generality of the load estimation problem, future research should introduce state-of-the-art pre-trained models like BERT on large-scale datasets to pre-train heterogeneous graph models for improved performance. We also plan to explore multi-task learning frameworks to achieve the joint optimization of load prediction and vehicle state assessment in order to enhance model comprehensive applicability across multiple scenarios.
Data quality enhancement: While our self-supervised approach addresses some data quality issues, developing more sophisticated data cleaning and augmentation techniques could further improve model robustness, particularly for edge cases and rare events.