1. Introduction
Mobile robots are equipped with a variety of sensors, which not only provide rich environmental information but also enable high-precision pose estimation. By using three-dimensional (3D) mapping based on the LIO architecture, the vehicle can better understand its driving environment [1,2]. To achieve real-time six-degree-of-freedom pose estimation, a lightweight LiDAR odometry method is applied [3]. It has high accuracy and computational efficiency and can reduce pose estimation errors through the SLAM framework; however, it does not consider point cloud distortion during motion. Reference [4] proposed a tightly coupled iterative Kalman filter algorithm for fusing IMU and LiDAR data in scenarios where the robot moves rapidly and encounters sparse features. However, this algorithm was limited to small-scale environments and did not incorporate loop detection, which may lead to cumulative errors in large-scale scenarios, thus limiting practicality. Y. Wang et al. proposed a prediction module to separately estimate and optimize the rotation and displacement directions, providing more accurate initial values for the back-end module [5]. However, the optimization process significantly increases computation time and complexity and does not consider the effect of height information. Effective 3D mapping can provide essential environmental information for autonomous driving systems, and path planning can leverage this information to make effective navigation decisions.
Path planning is a key component in the autonomous navigation of robots. Considering safety, driving efficiency, passenger comfort, and energy consumption simultaneously complicates the planning process [6,7]. Based on an improved RRT algorithm, the work in [8] proposed a strategy that added the endpoint cost to each node in the RRT* search, thereby reducing the path length of the random search and improving path smoothness. However, this study mainly focused on static environments and did not consider the shapes of surrounding obstacles. Reference [9] proposed a path-planning method based on RRT*, aiming to improve robots' mission success rate under uncertain terrain conditions. Although the proxy model reduces computational cost, the introduced reliability constraints may increase overall computational complexity and affect real-time performance. Reference [10] used an artificial potential field (APF) method to improve the expansion-point efficiency of the A* algorithm, successfully reducing the number of search nodes and improving obstacle avoidance efficiency. However, it did not smooth the path, potentially causing curvature discontinuities and affecting the robot's tracking efficiency. After path planning is completed, the robot needs to analyze the road surface information along the selected path to enhance environmental adaptability.
In actual operation, the friction coefficients of different road types affect the robot's motion performance; thus, enhancing environmental perception capabilities has become urgent [11]. In reference [12], a multiscale convolutional attention network was introduced for the road detection task, which can more effectively capture information from occluded areas. However, this method mainly focused on road extraction from satellite images and failed to effectively segment road surface materials. An improved PointNet++ model capable of segmenting road irregularities and quantifying road roughness was proposed in [13], thereby improving the vehicle's perception of surface unevenness and enhancing driving safety. However, this work mainly focused on road surface unevenness, did not consider road surface materials, and was not validated through real-vehicle testing. A nonholonomic robot model with integrated vision was proposed in [14]; this model utilizes visual information to control vehicle motion and offers advantages such as low computational cost, simplicity, robustness, and finite-time convergence. A large-scale visual model can detect and identify obstacles on the road in real time, such as potholes, standing water, and stones, helping vehicles avoid potential risks during driving. In addition, the visual model can identify traffic signs and road markings, providing more comprehensive environmental information and assisting vehicles in decision-making.
As one of the core technologies for autonomous navigation of mobile robots [15], path tracking will continue to promote the application of mobile robots in more complex and diverse environments. Reference [16] adopted a reaction-based method to generate a path and used the pure-pursuit (PP) algorithm to achieve accurate path tracking. However, this study involved iterative optimization, which increased computation time and affected the real-time performance of the robot. In addition, the influence of road conditions and disturbances was not considered in the path-tracking process. Y. Tian et al. used important state information as input for reinforcement learning (RL) during the training process [17], which greatly improved the convergence efficiency of the training framework. They also designed a reward function to ensure tracking accuracy during autonomous driving and to improve driving safety. However, this study did not consider the impact of road surface materials and did not conduct real-vehicle tests. In [18], a switching strategy for robot path tracking was adopted: in a known environment, a deterministic PP method was used; in an unknown, obstacle-filled environment, a deep RL-based method was employed for path tracking to better adapt to different environments. However, this scheme required the design of corresponding rules, and there was significant uncertainty in environmental assessment. Reference [19] used an RL path-tracking framework, which can effectively handle the effects of uncertainty and disturbance. Additionally, a new experience pool priority mechanism was designed to improve the reward mechanism, and a dynamic reward function was introduced to reduce computing resource consumption, thereby accelerating convergence and avoiding local minima. However, this study did not incorporate robot body information, focused only on path tracking, and did not consider factors such as efficiency and energy consumption in the tracking process.
This paper proposes a velocity-adaptive robot path-tracking framework based on the SAC algorithm, aiming to achieve efficient and accurate path tracking and thus promote the application and development of mobile robots in complex environments. The framework integrates a variety of advanced technologies. In particular, the road state detection results from the large-scale visual model are fed into the reinforcement learning (RL) algorithm as feature values, significantly enhancing the perception and decision-making capabilities of the autonomous driving system. Specifically, the visual model processes real-time images to extract the road state, obstacle information, and other relevant features. These extracted features are then converted into state representations for the reinforcement learning model, forming the state space. Based on the current state information, the reinforcement learning algorithm optimizes the vehicle’s control strategy—including acceleration, deceleration, and steering—to adapt to different road conditions. During driving, the vehicle continuously collects new visual and feedback data, updating and optimizing the control strategy through reinforcement learning, which improves the adaptability and robustness of the system. The main contributions of this work are as follows:
The 3D environment mapping and localization method based on LIO is used to achieve stable robot pose output. In addition, by combining a gradient descent-based obstacle avoidance and path smoothing algorithm, the computational cost is reduced and planning efficiency is improved, all while satisfying the robot’s kinematic constraints.
A U-Net-based classification model is employed to perform detailed classification of road scenes, thereby enhancing the robot’s ability to perceive the road surface. This allows the robot to slow down in advance in complex environments and improves safety.
Leveraging the powerful learning capability of the ASAC algorithm, the proposed approach better represents the vehicle’s dynamic model and incorporates road surface information to enable road surface perception. This allows the robot to slow down in advance on slippery roads and increase its driving speed on dry and normal roads, thereby ensuring safety and improving movement efficiency.
Our Stanley_ASAC controller requires only the nominal values of vehicle parameters, rather than their exact values, to generate optimal acceleration and convert it into speed commands. In addition, the optimal look-ahead distance is adaptively determined by the geometry-based Stanley controller to obtain the optimal front wheel angle, which greatly reduces computational load and improves tracking efficiency.
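The perception-to-decision pipeline described above can be sketched as follows. This is an illustrative sketch, not the authors' released code; all names (`road_class_probs`, `lateral_error`, `heading_error`, `speed`) are hypothetical placeholders for the features the text says are extracted from the visual model and vehicle feedback.

```python
# Hypothetical sketch: building the RL state vector from the visual model's
# road-class output plus vehicle feedback, as the framework description outlines.

def build_state(road_class_probs, lateral_error, heading_error, speed):
    """Concatenate perception features and vehicle feedback into one state vector."""
    return list(road_class_probs) + [lateral_error, heading_error, speed]

# Example: three road-class probabilities plus three feedback scalars.
state = build_state([0.1, 0.8, 0.1], 0.05, -0.02, 1.2)
```

The RL policy would consume this vector each control step; as new images arrive, the state is rebuilt and the control strategy is updated, matching the closed loop described above.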
4. Pavement Classification Algorithm
Different road types typically correspond to different friction coefficients. In this work, we do not directly estimate the friction coefficient of the road surface. Instead, we classify road surface types and incorporate this classification information into the ASAC controller. This enables the controller to adaptively adjust to varying road conditions, thereby improving path-tracking accuracy. The U-Net network, which has a symmetrical encoder–decoder architecture [22], is used to classify the actual road type in real time. The friction coefficient is then dynamically mapped based on the identified road type and input to the controller, enabling the control strategy to adapt to real-time changes in friction. Semantic segmentation is performed using a dataset containing 701 frames from RTK. We divide the road surface materials into 13 different categories: black—everything not related to the road; light blue—roads with asphalt surfaces; greenish blue—various pavements; peach/light orange—unpaved roads; white—road markings; pink—speed bumps; yellow—cat's eyes; purple—storm drains; cyan—manhole covers; dark blue—patches on asphalt roads; dark red—water puddles; red—potholes; and orange—cracks.
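The "road type to friction coefficient" mapping described above can be sketched as a simple lookup. This is a hedged illustration: the coefficient values below are placeholder assumptions (the paper does not list them here), and only a subset of the 13 categories is shown.

```python
# Illustrative sketch: mapping detected road classes to nominal friction
# coefficients fed to the controller. Values are placeholder assumptions.
ROAD_FRICTION = {
    "asphalt": 0.80,       # light blue class
    "paved": 0.70,         # greenish blue class
    "unpaved": 0.55,       # peach/light orange class
    "water_puddle": 0.30,  # dark red class
    "pothole": 0.40,       # red class
}

def friction_for(road_class, default=0.60):
    """Return a nominal friction coefficient for a detected road class."""
    return ROAD_FRICTION.get(road_class, default)
```

At runtime, the per-frame U-Net classification would select an entry from this table, giving the controller a friction estimate that tracks real-time changes in road surface.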
For testing, we separated most of the frames by category, and we present the results in Figure 3. The original image is shown on the left, and the prediction results are in the middle, where different road types are represented as color images generated by the network, each corresponding to a specific road category. The rightmost part displays the detection result after the prediction output is mapped onto the original image with the background removed. The total number of parameters in the network is 1,941,309. The input image size is , and the output is , indicating the category to which each pixel belongs.
Table 2 shows the specifications of the U-Net, including each layer and its corresponding input and output parameter dimensions.
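The shape bookkeeping behind the symmetric encoder–decoder in Table 2 can be illustrated with a small helper. This is a sketch under the standard U-Net assumptions (2×2 max pooling halves the spatial size at each encoder level; 2× up-convolution doubles it on the way back up), not the network's actual layer specification; the 256×256 input and four levels are hypothetical example values.

```python
# Sketch: spatial sizes down the encoder and back up the decoder of a
# symmetric U-Net style network, assuming 2x2 pooling / 2x up-convolution.
def unet_shapes(size, depth):
    down = [size]
    for _ in range(depth):
        size //= 2           # 2x2 max pool halves the resolution
        down.append(size)
    up = [size]
    for _ in range(depth):
        size *= 2            # 2x transposed conv doubles it again
        up.append(size)
    return down, up

down, up = unet_shapes(256, 4)  # hypothetical 256x256 input, 4 levels
```

The symmetry (the decoder retraces the encoder's sizes) is what lets the network emit a per-pixel class map at the input resolution, matching the per-pixel output described above.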
To improve data diversity and model robustness, we employed various data augmentation strategies during our experiments, including horizontal and vertical flipping, rotation within , scaling from 0.8 to 1.2, color jittering (brightness, contrast, and saturation), and random cropping. These augmentation methods effectively expanded the distribution of the training samples and provided strong data support for subsequent performance improvements. Details are given in Table 3, where precision is defined as the ratio of correctly predicted positive pixels to all predicted positive pixels, and recall represents the proportion of true positive pixels that are correctly identified.
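The per-class pixel metrics defined above can be computed as follows. This is an illustrative helper, not the authors' evaluation code; inputs are flat lists of per-pixel class labels.

```python
# Sketch of the metrics in Table 3: precision = TP / (TP + FP),
# recall = TP / (TP + FN), counted over pixels of one class.
def pixel_precision_recall(pred, truth, cls):
    tp = sum(1 for p, t in zip(pred, truth) if p == cls and t == cls)
    fp = sum(1 for p, t in zip(pred, truth) if p == cls and t != cls)
    fn = sum(1 for p, t in zip(pred, truth) if p != cls and t == cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```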
5. ASAC Controller and Path Tracking
5.1. ASAC Controller
The ASAC controller effectively handles interference and accurately reflects vehicle dynamics, thereby generating reasonable speed control gains and improving tracking stability. It is able to adapt to changes in vehicle dynamics parameters and can improve tolerance to model uncertainties by adjusting control gains online.
State: , Roadseg_map, the lateral tracking error , and the heading tracking error .
Action: The output action variables of the ASAC algorithm are , and the control gain k. The setting of k is crucial to vehicle response: if k is too large, the vehicle becomes overly sensitive to lateral errors and is prone to oscillation; if k is too small, the response is sluggish and it becomes difficult to correct deviations in a timely manner. A reasonable selection of k is key to ensuring both stability and accuracy in path tracking.
Reward: To more realistically reflect and optimize tracking performance, the reward function in this design focuses on both tracking performance and ride comfort. The maximum penalty is , which is set to 20 in this work and serves to penalize violations of the system requirements.
- 1.
During the control process, the controller should minimize both lateral and heading errors, with particular emphasis on lateral errors. To help the controller determine whether the vehicle has sufficient space to turn safely, half of the vehicle body width, , serves as an important reference for evaluating tracking performance.
- 2.
The robot moves within the road’s speed limit. A higher speed can reduce the time required to travel the same route, thereby improving tracking efficiency, which is primarily reflected in speed. At the same time, to enhance ride comfort, frequent and large accelerations or decelerations should be avoided as much as possible.
Finally, the total reward is given as (8).
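A reward of the shape described above can be sketched as follows. This is a hedged illustration of the structure only: the weights `w_lat`, `w_head`, `w_v`, and `w_a` are placeholder assumptions, not the coefficients of Equation (8); only the maximum penalty of 20 comes from the text.

```python
# Illustrative reward sketch: penalize lateral and heading error (lateral
# weighted more heavily), reward speed up to the limit for efficiency,
# penalize large accelerations for comfort, and apply the maximum penalty
# P_MAX = 20 when the lateral error exceeds half the vehicle width.
P_MAX = 20.0

def reward(e_lat, e_head, speed, accel, half_width, v_limit,
           w_lat=2.0, w_head=1.0, w_v=0.5, w_a=0.2):
    if abs(e_lat) > half_width:       # not enough space to turn safely
        return -P_MAX
    r = -w_lat * abs(e_lat) - w_head * abs(e_head)
    r += w_v * min(speed, v_limit)    # efficiency: faster within the limit
    r -= w_a * abs(accel)             # comfort: avoid large accelerations
    return r
```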
This study systematically designed the hyperparameter tuning process for the SAC algorithm to ensure reproducible results. We prioritized key hyperparameters such as the learning rate, discount factor, and target network update frequency, and set them appropriately. See Table 4 for detailed parameter configurations.
5.2. Network of the ASAC
Figure 4 shows the network structure, including the input types, output dimensions, and the corresponding activation functions of each layer. “Linear” indicates a direct linear output without an activation function, while “Dense” refers to a fully connected layer. We use the Q value as the critic and update the network parameters using temporal difference learning. The update equations for the critic and actor are given by (9) and (10), respectively.
where  denotes the value of taking action A in state s as output by the critic network,  is the discount factor,  and  are the learning rates of the critic and actor networks, respectively;  is the coefficient of the entropy regularization term;  is computed by two independent critic networks and represents the minimum value of the action to be taken in the next state;  represents the probability density of taking an action in the next state;  is the output of the actor network; and  is the gradient of the actor network.
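The soft (entropy-regularized) temporal-difference target at the heart of the critic update in (9) can be sketched as follows. Plain floats stand in for the network outputs; the function and argument names are hypothetical, and `gamma`/`alpha` are example values rather than the paper's settings.

```python
# Sketch of the SAC critic's TD target: the next-state value is the minimum
# over the two independent critics minus the entropy term alpha * log pi,
# discounted by gamma and zeroed at episode termination.
def soft_td_target(r, done, q1_next, q2_next, logp_next,
                   gamma=0.99, alpha=0.2):
    v_next = min(q1_next, q2_next) - alpha * logp_next
    return r + gamma * (1.0 - done) * v_next
```

Taking the minimum of two critics counters overestimation bias, and the entropy term keeps the policy stochastic, which is the mechanism the section describes for robust exploration.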
5.3. Path-Tracking Controller
The Stanley-based tracking controller shown in Figure 1 sets the reference point at the center of the front wheel to achieve global convergence in path tracking, and the error decay rate is not affected by the vehicle speed. It ensures heading correction and position error correction, and the generated steering angle remains within the limits of the robot's dynamics. It acts as a safety net to ensure that the control output is physically feasible and to enhance the system's robustness to dynamic differences. These considerations ensure the effectiveness and safety of the controller during path tracking.
To eliminate the heading error relative to the path, set . Assume that the lateral error of the robot is zero, i.e., . To achieve steering and eliminate the lateral error , triangle  can be constructed as shown in Figure 5. Take the nearest point on the tracking path to the front wheel center as C and determine point D at the intersection along the direction of the velocity. In this way, the angle  generated by  can be obtained. Define the distance of  as , then Equation (11) can be derived.
As  is a speed-related quantity, introduce the ratio . Thus, the front wheel angle can be expressed as in Equation (12). For , the action is generated by reinforcement learning. The resulting steering angle is always within the vehicle dynamics range and is constrained to .
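The geometric law this subsection builds toward follows the standard Stanley formulation: the front-wheel angle combines the heading error with the arctangent of the lateral-error term, then is clipped to the physical steering limit. The sketch below assumes that form; the gain `k` plays the role of the RL-generated control gain, and the 30° limit `delta_max` is an illustrative assumption, not the paper's constraint value.

```python
import math

# Sketch of a Stanley-style steering law: heading correction plus
# arctangent lateral correction, clipped to the steering limit.
def stanley_steering(heading_error, lateral_error, speed, k,
                     delta_max=math.radians(30)):
    delta = heading_error + math.atan2(k * lateral_error, speed)
    return max(-delta_max, min(delta_max, delta))
```

Using `atan2(k * e, v)` keeps the lateral correction bounded and well defined at low speed, while the final clamp guarantees the commanded angle stays within the vehicle's dynamic range, as the text requires.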
7. Conclusions
In this paper, we propose a U-Net-based road semantic segmentation reinforcement learning controller to achieve accurate path tracking for autonomous vehicles on various road surfaces. First, we use a Lidar-IMU odometry framework to model the surrounding environment in 3D and output the vehicle’s positioning information in real time. Then, road surface detection information is obtained through U-Net semantic segmentation. Based on this information, we leverage the powerful learning capability of reinforcement learning to enable the controller to adapt to different road conditions. To reduce computational complexity, we introduce Stanley, a simple tracking controller based on a geometric algorithm, and combine it with an adaptive lateral deviation gain generated by reinforcement learning. This approach significantly reduces the computational burden and improves tracking efficiency while ensuring tracking accuracy. The experimental results show that our algorithm exhibits strong robustness against disturbances caused by changes in the road environment. This system achieves rapid response and correction of path-tracking errors through high-frequency IMU positioning and efficient controller design. The IMU’s output frequency of 100 Hz, combined with the controller’s average runtime of less than 0.0978 s, ensures that the control cycle is shorter than the positioning update interval. As a result, the system can correct trajectory deviations within 0.01 s, effectively improving path-tracking accuracy. In addition, we have conducted real-world experiments, and the results demonstrate that the algorithm performs well in terms of lightweight design, tracking accuracy, and real-time performance.
Limitations:
This method is limited by the tracking accuracy of the controller in extremely narrow passage scenarios [26]. If the passage width is less than the minimum distance required for the robot to pass safely, path planning becomes infeasible, thus limiting the applicability of the method in such environments [27].
This study has a real-time status-monitoring module that continuously monitors the output frequency of each subsystem and the lateral deviation during path tracking. If any indicator exceeds the preset threshold or the information has not been updated for a long time, the system will identify it as an abnormal situation, immediately stop the robot movement, and record the relevant data for subsequent fault analysis and system improvement. However, this monitoring mechanism may still have blind spots in extreme or unforeseen environments. Therefore, the safety and robustness of the system are limited by the coverage and sensitivity of the monitoring method, which is also a major limitation of this study.
This algorithm does not currently take road slope information into account, which to some extent limits its adaptability in complex road environments. Future work could incorporate slope information into the algorithm’s design to further enhance its adaptability to diverse road conditions.
Future work:
- 1.
We plan to incorporate additional environmental information, such as slope and traffic conditions, to address more complex application scenarios. We will then conduct a more in-depth analysis of the impact of hyperparameters on controller performance to further enhance the algorithm’s generalization and adaptability.
- 2.
We plan to increase the robot’s operating speed and introduce a trajectory prediction module for dynamic obstacles, thereby further enhancing the system’s adaptability and safety in complex environments.