Article

Lane Boundary Detection for Intelligent Vehicles Using Deep Convolutional Neural Network Architecture

College of Automobile and Traffic Engineering, Liaoning University of Technology, Jinzhou 121001, China
* Author to whom correspondence should be addressed.
World Electr. Veh. J. 2025, 16(4), 198; https://doi.org/10.3390/wevj16040198
Submission received: 18 February 2025 / Revised: 20 March 2025 / Accepted: 21 March 2025 / Published: 1 April 2025
(This article belongs to the Special Issue Electric Vehicle Autonomous Driving Based on Image Recognition)

Abstract

To address the limitation of monocular 2D lane detection methods, which fail to capture the three-dimensional position of lane boundaries, this study proposes a convolutional neural network architecture for 3D lane detection. The deep residual network ResNet50 is employed as the feature extraction backbone and is augmented with a coordinate attention mechanism to facilitate shallow feature extraction, multi-scale feature map generation, and extraction of small-scale high-order feature information. The BIFPN network is utilized for bidirectional feature fusion across different scales, significantly enhancing the accuracy of lane boundary detection. An inverse perspective transformation model (IPM) converts the front view into the bird's-eye view. A dedicated 3D lane detection head based on lane boundary anchor lines is designed to fuse and downsample the multi-scale feature maps efficiently; by predicting the offset between lane boundaries and anchor lines, the 3D position of lane boundaries is effectively detected. Validation experiments on the OpenLane dataset demonstrate that the proposed method not only detects the spatial locations of lane boundaries but also identifies attributes such as color, solid or dashed, single or double lines, and left-dashed or right-dashed configurations. Additionally, the method achieves an inference speed of 64.9 FPS on an RTX 4090 GPU, demonstrating its computational efficiency.

1. Introduction

In recent years, research on improving vehicle driving safety and reducing road accidents, especially the development of driver assistance systems, has grown rapidly worldwide [1,2,3]. Lane departure warning, lane keeping, adaptive cruise control (ACC), lane change assistance, and trajectory tracking control all belong to advanced driver assistance systems (ADASs) and intelligent driving systems [4,5,6]. As one of the core technologies of ADAS, lane boundary detection plays a key role in autonomous and assisted driving. However, 2D lane detection results cannot be used directly by downstream planning and control tasks. Three-dimensional lane boundary detection based on spatial position therefore compensates for the shortcomings of 2D lane detection and has important research significance and broad application prospects.
To realize 3D lane boundary detection, researchers have carried out extensive work. Garnett et al. proposed the 3D-LaneNet model, which converts the front-view feature map into a bird's-eye view, adopts an anchor-line-based lane boundary representation that recasts lane boundary detection as an object detection problem, and realizes 3D lane boundary detection by predicting the offset between lane boundaries and their associated anchor lines [7]. The authors of [8] propose Gen-LaneNet, a two-stage 3D lane boundary detection method. The authors of [9] propose an end-to-end monocular 3D lane boundary detection method that uses a perspective transformation to convert the front view into a bird's-eye view and adopts a 2D/3D anchor line design to detect 3D lane boundaries. The authors of [10] propose BEV-LaneDet, a 3D lane boundary detection method based on semantic segmentation: the front view captured by the vehicle camera is mapped to the front view of a virtual camera, the front-view feature map is converted into the bird's-eye view by a multi-layer perceptron, and 3D lane boundaries are detected by semantic segmentation. The authors of [11] propose the BEVFormer architecture, which learns unified BEV features through spatio-temporal transformers, combines spatial cross-attention and temporal self-attention modules, and effectively fuses multi-camera image information and temporal cues to achieve velocity estimation and occluded-object detection under low-visibility conditions. The authors of [12,13] fuse BEV feature maps constructed from multi-frame image features into a unified BEV space and use historical sparse curve queries and dynamic anchor point sets to achieve effective temporal propagation. The authors of [14] use Transformer sparse queries and a cross-attention mechanism to regress 3D lane polynomial coefficients. Building on the query-anchor modeling of CurveFormer, reference [15] constructs a lane-aware query generator and a dynamic 3D ground position embedding to extract lane information. The authors of [16] propose the CurveFormer++ single-stage decoder, which does not require an image-feature view-conversion module; a curve cross-attention module is introduced to compute the similarity between image features and curve queries and to infer 3D lane detection results directly from perspective image features. In reference [17], virtual cameras are used to ensure spatial consistency, and 3D lane boundaries are represented by key points to adapt to more complex scenes. The authors of [18] introduce a row-wise classification method in BEV that supports lanes in any direction and allows interaction with feature information within instance groups.
In this study, a convolutional neural network for 3D lane boundary detection is constructed based on deep learning. An inverse perspective transformation model (IPM) converts the front view into the bird's-eye view. The residual network ResNet50 is used as the feature extraction network, and a coordinate attention mechanism is introduced to improve its feature extraction capability. To improve detection performance, a bidirectional feature pyramid fusion module fuses the multi-scale front-view features extracted by the backbone network. A feature map fusion and downsampling module is designed as the 3D lane boundary detection head to fuse and downsample the multi-scale bird's-eye view (BEV) feature maps. Lane boundary anchor lines are constructed, and the deviation between the predicted lane boundary and its associated anchor line is predicted to detect the spatial position of the lane boundary. The proposed model is verified on the OpenLane dataset.

2. Inverse Perspective Transformation Model (IPM)

2.1. Three-Dimensional Lane Geometry Modeling

Figure 1 shows the relationship between the vehicle coordinate system and the camera coordinate system when the vehicle is driving on an uphill road, where O-X-Y-Z is the ego-vehicle coordinate system and OC-XC-YC-ZC is the camera coordinate system. The distance from the camera center to the road is h, and the angle θ formed by the YC axis and the Y axis of the ego-vehicle coordinate system is the camera pitch angle; the camera roll and yaw angles are not considered. To infer the position of the lane boundary from the front-view image taken by the vehicle camera, the line between the camera center OC and a point (x, y, z) on the lane surface is extended until it intersects the BEV plane at the point (x′, y′, z′ = 0). The geometric relationship between the camera center OC, the point (x, y, z) on the lane surface, and the point (x′, y′, z′ = 0) on the BEV plane is shown in Figure 2.
The solid black line in Figure 2 represents the lane surface, and the Y axis represents the BEV plane. According to similar triangles, the point (x, y, z) and the point (x′, y′, z′ = 0) satisfy the following relationship:
$$\frac{h}{h - z} = \frac{x'}{x} = \frac{y'}{y} \qquad (1)$$
The conversion relationship between the BEV coordinates and the ego-vehicle coordinates is then obtained as follows:
$$x = x'\left(1 - \frac{z}{h}\right), \qquad y = y'\left(1 - \frac{z}{h}\right) \qquad (2)$$
According to Formula (2), predicting the three-dimensional position (x, y, z) of a lane boundary point in the ego-vehicle coordinate system can be converted into predicting its BEV coordinates (x′, y′) and its height z.
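As a minimal illustration of Formula (2), the following Python sketch (using NumPy; the function name and example values are ours, not taken from the paper's code) converts predicted BEV coordinates and heights back into ego-vehicle coordinates:

```python
import numpy as np

def bev_to_ego(x_bev, y_bev, z, cam_height):
    """Convert BEV-plane coordinates (x', y') and predicted height z into
    ego-vehicle coordinates (x, y, z) using Formula (2): x = x'(1 - z/h)."""
    scale = 1.0 - z / cam_height          # shrink factor for points raised above the BEV plane
    return x_bev * scale, y_bev * scale, z

# Example: a lane point seen at (x', y') = (1.8 m, 40 m) on the BEV plane,
# predicted 0.5 m above it, with the camera mounted 1.5 m above the road.
x, y, z = bev_to_ego(np.array([1.8]), np.array([40.0]), np.array([0.5]), cam_height=1.5)
print(x, y, z)
```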

2.2. Inverse Perspective Transformation

For a point M = (X, Y, Z) in a spatial coordinate system, its projection by the camera into the pixel coordinate system can be expressed as:
$$z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} = K[R, T]M \qquad (3)$$
where K and [R, T] are the intrinsic and extrinsic parameter matrices of the camera, R is the rotation matrix from the vehicle coordinate system to the camera coordinate system, T is the translation vector from the vehicle coordinate system to the camera coordinate system, and $z_c$ is the depth of the point in the camera coordinate system.
Multiplying the matrices in Formula (3) together gives:
$$z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \\ m_{31} & m_{32} & m_{33} & m_{34} \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \qquad (4)$$
Assuming the point lies on the BEV plane (Z = 0), the third column of the matrix drops out and the following is obtained:
$$z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} m_{11} & m_{12} & m_{14} \\ m_{21} & m_{22} & m_{24} \\ m_{31} & m_{32} & m_{34} \end{bmatrix} \begin{bmatrix} X \\ Y \\ 1 \end{bmatrix} \qquad (5)$$
Given any point (Xbev, Ybev) in the BEV plane, its corresponding position (u, v) in the pixel coordinate system can be calculated, and the corresponding relationship is shown in Figure 3.
As can be seen from Figure 3, the rectangular area formed by the four points A–D in front of the vehicle is the region of interest on the road ahead. Substituting the coordinates of these four points into Formula (5) yields the corresponding pixel coordinates on the front view: a(ua, va), b(ub, vb), c(uc, vc), d(ud, vd). Suppose the four vertex coordinates of a fixed-resolution bird's-eye view are a′(u′a, v′a), b′(u′b, v′b), c′(u′c, v′c), d′(u′d, v′d); then, the inverse perspective transformation model can be constructed according to Formula (6).
$$\begin{bmatrix} u_a & u_b & u_c & u_d \\ v_a & v_b & v_c & v_d \\ 1 & 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} g_{11} & g_{12} & g_{13} \\ g_{21} & g_{22} & g_{23} \\ g_{31} & g_{32} & g_{33} \end{bmatrix} \begin{bmatrix} u'_a & u'_b & u'_c & u'_d \\ v'_a & v'_b & v'_c & v'_d \\ 1 & 1 & 1 & 1 \end{bmatrix} = G \begin{bmatrix} u'_a & u'_b & u'_c & u'_d \\ v'_a & v'_b & v'_c & v'_d \\ 1 & 1 & 1 & 1 \end{bmatrix} \qquad (6)$$
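For illustration, the point correspondences in Formula (6) can be turned into a warping matrix with OpenCV. The sketch below is our own example, not the authors' code: the front-view pixel coordinates and BEV image size are assumed values, and the matrix returned by cv2.getPerspectiveTransform here maps front-view pixels to BEV pixels, i.e., it plays the role of the inverse of G in Formula (6):

```python
import cv2
import numpy as np

# Front-view pixel coordinates of the ROI corners a, b, c, d
# (assumed values for illustration; in practice they come from Formula (5)).
front_pts = np.float32([[420, 480], [860, 480], [1180, 720], [100, 720]])

# Corresponding corner positions a', b', c', d' in a fixed-resolution BEV image.
bev_pts = np.float32([[0, 0], [256, 0], [256, 512], [0, 512]])

# Homography mapping front-view pixels to BEV pixels (the inverse of G in Formula (6)).
H = cv2.getPerspectiveTransform(front_pts, bev_pts)

front_view = np.zeros((720, 1280, 3), dtype=np.uint8)    # placeholder front-view image
bev_view = cv2.warpPerspective(front_view, H, (256, 512),
                               flags=cv2.INTER_LINEAR)   # pure bilinear interpolation, no learnable parameters
```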

3. Three-Dimensional Lane Boundary Detection Network Construction

Figure 4 shows the overall structure of the 3D lane boundary detection model, which is mainly composed of the feature extraction network ResNet50, the feature fusion network BIFPN, the inverse perspective transformation module (IPM), and the 3D lane boundary detection head. PersFormer uses EfficientNet-B7 as its feature extraction network to balance accuracy and computational efficiency. In this study, the convolutional residual network ResNet50 is used for feature extraction instead, which reduces computing cost and benefits real-time detection while preserving network depth. A large 7 × 7 convolutional kernel is placed at the first layer of the network to receive the RGB three-channel image. Before ResNet50 downsampling, a coordinate attention (CA) operation is applied to the feature map, and different convolutional downsampling layers then downsample the feature map to encode multi-scale feature maps. A second CA operation is then performed to extract the "elongated" features of the lane boundary. The bidirectional feature fusion module (BIFPN) fuses the multi-scale front-view feature maps output by the backbone network. In contrast, PersFormer has no feature fusion module for the multi-scale front-view feature maps; it generates dynamic offsets to adjust the positions of reference points in the front-view feature map, determines the features of target points in the BEV feature map, and thereby converts the front-view feature map into the BEV feature map. The drawbacks of such a Transformer include a large number of learnable parameters, high computational complexity, a large demand for computing resources, and poor real-time performance. In this study, the inverse perspective transformation module (IPM) converts the fused multi-scale front-view feature maps into multi-scale BEV feature maps using the intrinsic and extrinsic camera parameters. This is a pure bilinear interpolation process with no learnable parameters, which helps keep the model lightweight and improves real-time performance. The BEV feature maps are fed into the 3D lane boundary detection head, and the spatial positions of the lane boundaries are detected after fusion and downsampling.
Table 1 shows the ResNet50 network structure. Here, conv2d represents a convolution operation, bottleneck represents the bottleneck structure, s = 2 represents a convolution stride of 2, s = (1, 1, 1) represents the strides of the three convolutions inside a bottleneck, k = 7 represents the size of the convolution kernel, and k = (1, 3, 1) represents the kernel sizes of the three convolutions inside a bottleneck. ResNet50 performs five 2× downsampling operations on the input image. First, a convolution with stride 2 and a 7 × 7 kernel and a max pooling operation with stride 2 and a 3 × 3 pooling kernel rapidly downsample the input image twice to obtain an intermediate feature map with rich lane semantic information. The bottleneck module extracts semantic information from feature maps through three stacked convolution layers with stride 1 and kernel sizes of 1 × 1, 3 × 3, and 1 × 1, respectively. In addition, it adopts a residual connection, adding a convolution with stride 1 or 2 and a 1 × 1 kernel on the residual path to change the size or the number of channels of the input feature map so that the feature map scales are consistent when residual fusion is performed. The bottlenecks at layers 11, 23, and 41 downsample the intermediate feature map by setting the stride of the second convolution to 2 with a 3 × 3 kernel.
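As an illustration of how multi-scale front-view features can be taken from such a backbone, the sketch below uses torchvision's standard ResNet-50 (not the authors' released code) and an input size assumed from Table 1; it collects the outputs of the four bottleneck stages for later fusion:

```python
import torch
from torchvision.models import resnet50
from torchvision.models.feature_extraction import create_feature_extractor

# ResNet-50 backbone; layer1-layer4 are the bottleneck stages in Table 1
# with 256, 512, 1024, and 2048 output channels.
backbone = create_feature_extractor(
    resnet50(weights=None),
    return_nodes={"layer1": "c2", "layer2": "c3", "layer3": "c4", "layer4": "c5"},
)

image = torch.randn(1, 3, 576, 1024)   # RGB front-view image (576 x 1024 assumed from Table 1)
features = backbone(image)             # dict of multi-scale front-view feature maps
for name, feat in features.items():
    print(name, tuple(feat.shape))     # c2: (1, 256, 144, 256) ... c5: (1, 2048, 18, 32)
```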

3.1. Coordinate Attention Mechanism Introduction

To handle the elongated shape of lane boundaries, the coordinate attention (CA) mechanism is introduced into the feature extraction network so that it can capture long-distance dependencies between the spatial positions of lane boundaries and focus on extracting these "elongated" features.
One-dimensional average pooling is carried out along the width and height directions of the input feature map [C, H, W] to obtain two feature maps, [C, H, 1] and [C, 1, W]. The two feature maps are concatenated along the spatial direction to obtain [C, 1, H + W]. After a 1 × 1 convolution for channel compression, normalization, and nonlinear activation, the feature map [C/r, 1, H + W] is obtained and then split into two independent tensors, [C/r, H, 1] and [C/r, 1, W]. Two 1 × 1 convolutions then restore the channel number, yielding new feature maps [C, H, 1] and [C, 1, W]. The Sigmoid activation function maps the values of these feature maps to (0, 1) to obtain the coordinate attention weights along the height and width directions, which are multiplied with the original features to emphasize important positions on the feature map. The specific structure is shown in Figure 5.
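A minimal PyTorch sketch of such a coordinate attention block is shown below. It is our own re-implementation of the standard CA design for illustration; the reduction ratio and activation choices are assumptions, not values reported in the paper:

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Coordinate attention: pool along H and W, encode jointly, then re-weight the input."""
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)   # channel compression to C/r
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)  # restore channels (height branch)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)  # restore channels (width branch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        pool_h = x.mean(dim=3, keepdim=True)                      # [B, C, H, 1]
        pool_w = x.mean(dim=2, keepdim=True).transpose(2, 3)      # [B, C, W, 1]
        y = torch.cat([pool_h, pool_w], dim=2)                    # [B, C, H+W, 1]
        y = self.act(self.bn1(self.conv1(y)))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        att_h = torch.sigmoid(self.conv_h(y_h))                   # [B, C, H, 1]
        att_w = torch.sigmoid(self.conv_w(y_w.transpose(2, 3)))   # [B, C, 1, W]
        return x * att_h * att_w                                  # re-weight original features

# Example: apply CA to a 256-channel front-view feature map.
feat = torch.randn(1, 256, 144, 256)
out = CoordinateAttention(256)(feat)
```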

3.2. Improved Feature Fusion Network

Figure 6 shows the BIFPN structure of the bidirectional feature pyramid fusion module. Different from the traditional feature pyramid fusion module FPN, BIFPN introduces bidirectional connections between adjacent levels of the feature pyramid, which enables the feature information to be transmitted between high-level features and low-level features.
The BIFPN module fuses feature maps of different scales. By relying on image context information, the fused features allow the network to judge the positions of lane boundaries and to handle discontinuities correctly, such as lane boundaries that are occluded or worn.
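The following PyTorch sketch shows one fast normalized fusion node of the kind used in BiFPN. It is an illustrative re-implementation under our own assumptions about channel counts and resolutions, not the authors' exact module:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiFPNFusionNode(nn.Module):
    """Fast normalized weighted fusion of two feature maps with the same channel width."""
    def __init__(self, channels: int):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(2))            # learnable fusion weights
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, high: torch.Tensor, low: torch.Tensor) -> torch.Tensor:
        # Upsample the coarser (higher-level) map to the finer map's resolution.
        high = F.interpolate(high, size=low.shape[-2:], mode="nearest")
        w = F.relu(self.weights)
        w = w / (w.sum() + 1e-4)                               # fast normalized fusion
        return self.conv(w[0] * low + w[1] * high)

# Example: fuse a 36x64 map into a 72x128 map, both reduced to 256 channels beforehand.
p4, p3 = torch.randn(1, 256, 36, 64), torch.randn(1, 256, 72, 128)
fused = BiFPNFusionNode(256)(p4, p3)
```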

3.3. Three-Dimensional Lane Detection Head Design

Figure 7 shows the design of the three-dimensional lane boundary detection head based on anchor lines. On the BEV plane, with the ego-vehicle coordinate origin as the reference, a number of fixed sampling points are taken on the Y axis to represent fixed longitudinal distances of the lane points. Further, 16 anchor line starting points are defined with fixed spacing on the X axis, and 7 anchor lines are set at different angles at each starting point, so that a total of 112 anchor lines on the BEV plane serve as candidate anchor lines. Any lane boundary l in three-dimensional space is mapped to the BEV plane, and the corresponding set of BEV lane point horizontal coordinates $\{x_i\}_{i=0}^{N}$ is obtained at the predefined sampling points $\{y_i\}_{i=0}^{N}$ on the Y axis. Then, the average distance between the lane point horizontal coordinates and each candidate anchor line is calculated, and the nearest anchor line is selected as the associated anchor line.
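A small NumPy sketch of this anchor construction and association step might look as follows. The starting-point spacing, angles, and sampling distances are our own illustrative assumptions, since the paper does not list their exact values:

```python
import numpy as np

N_POINTS = 10
y_samples = np.linspace(5.0, 100.0, N_POINTS)                # fixed longitudinal sampling distances (m), assumed
x_starts = np.linspace(-12.0, 12.0, 16)                      # 16 anchor starting points on the X axis
angles_deg = np.array([-30, -20, -10, 0, 10, 20, 30])        # 7 anchor directions per starting point

# Each anchor line gives an x coordinate at every y sample: x = x0 + y * tan(angle).
anchors = np.stack([x0 + y_samples * np.tan(np.deg2rad(a))
                    for x0 in x_starts for a in angles_deg])  # shape (112, N_POINTS)

def associate_anchor(lane_x: np.ndarray) -> int:
    """Return the index of the candidate anchor line closest (on average) to the lane."""
    mean_dist = np.abs(anchors - lane_x[None, :]).mean(axis=1)
    return int(np.argmin(mean_dist))

# Example: a gently curving lane starting 1.8 m to the right of the vehicle.
lane_x = 1.8 + 0.01 * y_samples
k = associate_anchor(lane_x)
offsets = lane_x - anchors[k]                                 # regression targets Δx_i for anchor k
```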
For each three-dimensional lane boundary, the detection head only needs to predict, relative to the associated anchor line $\{X_i^k\}_{i=0}^{N}$, the set of structure vectors shown in Formula (7).
$$l^k = \{(\Delta x_i^k,\; z_i,\; vis_i)\}_{i=0}^{N} \qquad (7)$$
where $\Delta x_i^k$ represents the deviation in the X-axis direction between the i-th sampling point of the k-th anchor line and the predicted lane point, $z_i$ represents the predicted height of the lane point in three-dimensional space, and $vis_i$ represents the visibility of the predicted lane point, which marks the starting and ending positions of the lane and thus predicts the lane boundary length. The three-dimensional coordinates of each lane point on the BEV plane can then be expressed as $\{\Delta x_i^k + X_i^k,\; y_i,\; z_i\}_{i=0}^{N}$. According to the predicted visibility probability of each lane point, the lane point coordinates whose probability exceeds a threshold are retained. Spline interpolation is used to fit the space curve, and the three-dimensional space curve equation of the lane boundary is output to realize three-dimensional lane boundary detection. For each predefined anchor line, the designed 3D lane detection head predicts the lateral deviation $\Delta x$ of each sampling point relative to the anchor line, the height $z$ of each sampling point relative to the BEV plane, whether the sampling point is visible ($vis$), the category ($type$) of the lane boundary, and the confidence $\hat{C}$ of the lane boundary's associated anchor line.
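As an illustration of how such predictions can be decoded into 3D lane points, the sketch below reconstructs ego-frame coordinates from one anchor's predicted offsets, heights, and visibilities. It is our own example: the 0.5 visibility threshold, the anchor geometry, and the use of SciPy splines are assumptions, not details from the paper:

```python
import numpy as np
from scipy.interpolate import make_interp_spline

def decode_lane(anchor_x, y_samples, dx, z, vis, vis_thresh=0.5):
    """Turn per-anchor predictions (Δx, z, visibility) into visible 3D lane curves x(y), z(y)."""
    keep = vis > vis_thresh
    x_bev = anchor_x[keep] + dx[keep]              # Δx_i + X_i^k on the BEV plane
    y_bev = y_samples[keep]
    height = z[keep]
    deg = min(3, len(y_bev) - 1)                   # spline degree, reduced for short lanes
    return make_interp_spline(y_bev, x_bev, k=deg), make_interp_spline(y_bev, height, k=deg)

# Illustrative inputs: a straight anchor at x = 2 m, ten samples from 5 m to 100 m.
y_samples = np.linspace(5.0, 100.0, 10)
anchor_x = np.full(10, 2.0)
dx = 0.1 * np.ones(10)                             # pretend network offsets
z = np.zeros(10)                                   # flat road
vis = np.array([0.9] * 8 + [0.2, 0.1])             # lane ends before the last two samples
x_of_y, z_of_y = decode_lane(anchor_x, y_samples, dx, z, vis)
print(x_of_y(50.0), z_of_y(50.0))                  # 3D lane position at y = 50 m
```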

3.4. Evaluation Index of Three-Dimensional Lane Boundary Detection

Assuming that the predicted lane anchor line $\{\Delta x_i, z_i, vis_i, type_j, c_k\}_{i=0}^{N}$ and the corresponding ground truth $\{\Delta \hat{x}_i, \hat{z}_i, \hat{v}_i, \hat{t}_j, \hat{c}_k\}_{i=0}^{N}$ are known, the loss function of three-dimensional lane boundary detection is expressed as Formula (8).
$$
\begin{aligned}
L_{prob} &= -\frac{1}{K}\sum_{k=1}^{K}\Big[\hat{c}_k \log(c_k) + (1-\hat{c}_k)\log(1-c_k)\Big] \\
L_{reg} &= \sum_{k=1}^{K}\sum_{i=0}^{N} \hat{c}_k \Big( \hat{v}_i \,\lVert \Delta x_i - \Delta\hat{x}_i \rVert_1 + \hat{v}_i \,\lVert z_i - \hat{z}_i \rVert_1 \Big) \\
L_{vis} &= -\frac{1}{N}\sum_{k=1}^{K}\sum_{i=0}^{N} \hat{c}_k \Big[ \hat{v}_i \log(vis_i) + (1-\hat{v}_i)\log(1-vis_i) \Big] \\
L_{cls} &= -\sum_{j=0}^{S} \hat{t}_j \log(type_j)
\end{aligned} \qquad (8)
$$
In Formula (8), $L_{prob}$ is the existence loss of the lane boundary, K is the total number of anchor lines, and $\hat{c}_k$ is the confidence of the anchor line $X^k$ associated with the lane boundary. $L_{reg}$ is the regression loss of the lateral deviation and the predicted height at the sampling points $y_i$, and N is the total number of sampling points. $L_{vis}$ is the visibility loss at sampling point $y_i$; the length of the predicted lane boundary is determined by the predicted visibility $vis_i$ of the sampling points. $L_{cls}$ is the classification loss of the lane boundary, and S is the total number of lane boundary classes.
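A hedged PyTorch sketch of these loss terms is given below. It reflects our own reading of Formula (8); the tensor shapes, dictionary keys, and reductions are assumptions rather than the authors' implementation:

```python
import torch
import torch.nn.functional as F

def lane_losses(pred, gt):
    """pred/gt: dicts of tensors. conf: (K,); dx, z, vis: (K, N+1); type_logits: (K, S)."""
    # Existence loss: binary cross-entropy over the K anchor lines.
    l_prob = F.binary_cross_entropy(pred["conf"], gt["conf"])
    # Regression loss: L1 on Δx and z, weighted by anchor confidence and point visibility.
    w = gt["conf"][:, None] * gt["vis"]
    l_reg = (w * ((pred["dx"] - gt["dx"]).abs() + (pred["z"] - gt["z"]).abs())).sum()
    # Visibility loss: per-point binary cross-entropy, weighted by anchor confidence.
    l_vis = (gt["conf"][:, None] *
             F.binary_cross_entropy(pred["vis"], gt["vis"], reduction="none")).mean()
    # Classification loss: cross-entropy over lane boundary categories.
    l_cls = F.cross_entropy(pred["type_logits"], gt["type"])
    return l_prob, l_reg, l_vis, l_cls

# Example shapes: K = 112 anchors, N + 1 = 10 sampling points, S = 14 categories (assumed).
K, Np, S = 112, 10, 14
pred = {"conf": torch.rand(K), "dx": torch.randn(K, Np), "z": torch.randn(K, Np),
        "vis": torch.rand(K, Np), "type_logits": torch.randn(K, S)}
gt = {"conf": (torch.rand(K) > 0.9).float(), "dx": torch.randn(K, Np),
      "z": torch.randn(K, Np), "vis": (torch.rand(K, Np) > 0.3).float(),
      "type": torch.randint(0, S, (K,))}
print(lane_losses(pred, gt))
```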
In addition to the total loss function, 3D lane boundary detection performance is evaluated mainly with the following indicators: the F1 score, which combines the proportion of detected lane boundaries that are correct and the proportion of ground-truth lane boundaries that are detected, and the mean absolute errors between the x and z coordinates of the inferred lane boundaries and those in the labels, computed in the close range (0–40 m) as $\bar{x}_{e\_c}$ and $\bar{z}_{e\_c}$ and in the far range (40–100 m) as $\bar{x}_{e\_f}$ and $\bar{z}_{e\_f}$, to measure the detection quality.
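For illustration, the sketch below computes these error metrics for one matched lane and an F1 score from counts, under our own assumptions about sampling and range splits; the paper itself follows the official OpenLane evaluation, which also defines how predictions are matched to ground truth:

```python
import numpy as np

def lane_errors(pred_xyz, gt_xyz):
    """pred_xyz, gt_xyz: arrays of shape (N, 3) with columns (x, y, z) sampled at the same y values."""
    near = gt_xyz[:, 1] <= 40.0                               # close range: 0-40 m
    far = (gt_xyz[:, 1] > 40.0) & (gt_xyz[:, 1] <= 100.0)     # far range: 40-100 m
    err = np.abs(pred_xyz - gt_xyz)
    return {"x_err_close": err[near, 0].mean(), "x_err_far": err[far, 0].mean(),
            "z_err_close": err[near, 2].mean(), "z_err_far": err[far, 2].mean()}

def f1_score(num_correct_pred, num_pred, num_gt):
    precision = num_correct_pred / max(num_pred, 1)
    recall = num_correct_pred / max(num_gt, 1)
    return 2 * precision * recall / max(precision + recall, 1e-9)

# Example: 10 points along one lane, with small synthetic offsets as "prediction errors".
y = np.linspace(5, 100, 10)
gt = np.stack([2.0 + 0.01 * y, y, 0.02 * y], axis=1)
pred = gt + np.array([0.2, 0.0, 0.1])
print(lane_errors(pred, gt), f1_score(58, 70, 80))
```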

4. Experimental Verification

To verify the validity of the three-dimensional lane boundary detection network built in this study, training and testing were carried out on the OpenLane dataset, in which the training set contains 157,807 images, the validation set contains 39,981 images, and the test set contains 42,007 images. The model was trained for 70 epochs with a batch size of 32. The Adam optimizer was used with an initial learning rate of 0.0002, momentum parameters of 0.9 and 0.999, and a weight decay of 1 × 10−3 to prevent overfitting. For the first 2000 training iterations, linear warmup is used to adjust the learning rate dynamically; afterwards, the polynomial decay strategy $lr = lr_0 (1 - iter/total\_iter)^{0.9}$ is adopted to prevent the model from falling into a local optimum and failing to converge, where $lr$ is the current learning rate, $lr_0$ is the initial learning rate, $iter$ is the current iteration number, and $total\_iter$ is the total number of iterations.
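A minimal PyTorch sketch of this optimizer and schedule is given below; the warmup length and hyperparameters come from the description above, while the per-iteration scheduling granularity and the placeholder model are our own assumptions:

```python
import torch

model = torch.nn.Linear(10, 2)                        # placeholder for the detection network
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4,
                             betas=(0.9, 0.999), weight_decay=1e-3)

total_iters, warmup_iters = 70 * 4932, 2000           # ~157,807 images / batch 32 ≈ 4932 iterations per epoch

def lr_lambda(it):
    if it < warmup_iters:
        return (it + 1) / warmup_iters                # linear warmup toward the initial learning rate
    return (1 - it / total_iters) ** 0.9              # polynomial decay afterwards

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for it in range(total_iters):
    # loss = ...; loss.backward(); optimizer.step(); optimizer.zero_grad()
    scheduler.step()                                  # advance the learning rate schedule each iteration
```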
This study adopts the official evaluation indicators of the OpenLane dataset: the F1 score and the mean absolute errors of the x and z coordinates of the inferred lane boundaries relative to the labels in the close range (0–40 m) and the far range (40–100 m). The proposed method was compared with other 3D lane boundary detection models trained on the OpenLane dataset; the comparison results are shown in Table 2 and Table 3.
As can be seen from Table 2, the method in this study achieves clear improvements in three respects: the F1 score, the far-range x coordinate error $\bar{x}_{e\_f}$, and the far-range z coordinate error $\bar{z}_{e\_f}$. Compared with the other models, the close-range x coordinate error $\bar{x}_{e\_c}$ and the close-range z coordinate error $\bar{z}_{e\_c}$ are also improved. The comparison of F1 scores for the three mainstream models in different scenarios in Table 3 shows that the three-dimensional lane boundary detection model designed in this study clearly outperforms PersFormer in the uphill/downhill scenario and in the other test scenarios, obtaining higher F1 scores, which demonstrates that the proposed method is effective and performs well.
Finally, Figure 8 shows the test results of the proposed model compared with the mainstream models in different scenarios from the OpenLane dataset, such as curves, dark conditions, night, uphill and downhill, and lane boundary occlusion. In each comparison in Figure 8, the left image is our method, the middle image is the 3D-LaneNet method, and the right image is the Gen-LaneNet method, where pred denotes a detected lane boundary line and gt denotes the ground-truth lane boundary (green line in Figure 8). Regarding the legend in the left images of our method: only the green gt lines are labeled in the result plots, while the pred lines are drawn in different colors and with dashed or solid styles according to their detected categories. Because the detection results contain many lines of different colors and styles, displaying a legend entry for every pred line would clutter the figure and reduce readability.
As can be seen from Figure 8, the monocular-vision 3D lane boundary detection network designed in this study can identify multiple lane boundary categories, such as single white solid lines, single white dashed lines, single yellow solid lines, single yellow dashed lines, double yellow solid lines, left-dashed/right-solid lines, left-solid/right-dashed lines, and road edge lines. The detected lines (in different colors) overlap well with the ground-truth lane boundaries (green lines), and the detection distance reaches about 100 m ahead. In addition, the model can also perform three-dimensional lane boundary detection in different scenes, such as curves, dark conditions, night, uphill and downhill, and lane boundary occlusion.

5. Closing Remarks

A convolutional neural network structure for 3D lane boundary detection is constructed based on deep learning. The main conclusions and achievements are as follows: (1) The coordinate attention mechanism is introduced into the feature extraction network ResNet50, and the feature map is decomposed into two one-dimensional feature encoding processes during feature extraction, so that the network can capture long-distance dependencies between the spatial positions of lane boundaries and focus on extracting their "elongated" features. (2) For the multi-scale feature maps extracted by the backbone, the BIFPN fusion network fuses front-view feature maps with different receptive fields, which alleviates the problems of occluded lane boundaries and discontinuous wear. (3) The designed model can simultaneously identify different categories of lane boundaries, such as single or double lines, white or yellow lines, dashed or solid lines, left-dashed/right-solid, and left-solid/right-dashed, with a detection distance of about 100 m ahead. This provides an accurate basis for judging whether the vehicle can change lanes or steer. (4) Compared with mainstream models, the proposed model has superior detection performance and higher inference speed and can meet the real-time requirements of three-dimensional lane boundary detection.

Author Contributions

X.C. (Xuewen Chen) was responsible for conceptualization and writing; C.X. was responsible for the editing and standardization of figures and tables and participated in the writing; X.C. (Xiaohai Chen) was responsible for simulation verification and result analysis. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported, in part, by the Natural Science Foundation of China under Grant 62373175 and by the 2024 Fundamental Research Funding of the Educational Department of Liaoning Province under Grant LJZZ232410154016.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original structure diagrams and simulation result diagrams can be provided on request. The original data presented in this study are openly available in reference [9].

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Mitiku, M.A.; Shao, Y.M.; Zhang, L.; Wang, R.; Peng, J.S. Test and Evaluation of Lane Departure Warning System for Passenger Cars. J. Chongqing Jiaotong Univ. (Nat. Sci.) 2020, 39, 141–146. [Google Scholar]
  2. Peng, P.; Geng, K.K.; Wang, Z.W.; Liu, Z.C.; Yin, G.D. Review on Environmental Perception Methods of Autonomous Vehicles. J. Mech. Eng. 2023, 59, 281–303. [Google Scholar]
  3. Huang, J.; Choudhury, P.K.; Yin, S.; Zhu, L. Real-Time Road Curb and Lane Detection for Autonomous Driving Using LiDAR Point Clouds. IEEE Access 2021, 9, 144940–144951. [Google Scholar] [CrossRef]
  4. Wang, B.K. Lane Detection Method Based on Deep Learning. Master’s Thesis, University of Science and Technology of China, Hefei, China, 2021. [Google Scholar]
  5. Li, C.; Wang, S.F.; Liu, M.C.; Peng, Z.B.; Ne, S.Q. Research on the Trajectory Tracking Control of Autonomous Vehicle Based on the Multi-Objective Optimization. Automob. Technol. 2022, 559, 8–15. [Google Scholar]
  6. Gao, Z.H.; Bao, M.X.; Gao, F.; Tang, M.H.; Lu, Y. A Uni-Modal Network Prediction Method for Surrounding Vehicle Expected Trajectory in Intelligent Driving System. Automob. Technol. 2022, 1–9. [Google Scholar]
  7. Garnett, N.; Cohen, R.; Pe’er, T.; Lahav, R.; Levi, D. 3d-lanenet: End-to-end 3d multiple lane detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 2921–2930. [Google Scholar]
  8. Guo, Y.; Chen, G.; Zhao, P.; Zhang, W.; Miao, J.; Wang, J.; Choe, T.E. Gen-lanenet: A generalized and scalable approach for 3d lane detection. In Computer Vision-ECCV 2020, Proceedings of the 16th European Conference, Glasgow, UK, 23–28 August 2020, Proceedings, Part XXI 16; Springer International Publishing: Cham, Switzerland, 2020; pp. 666–681. [Google Scholar]
  9. Chen, L.; Sima, C.; Li, Y.; Zheng, Z.; Xu, J.; Geng, X.; Li, H.; He, C.; Shi, J.; Qiao, Y. Persformer: 3D lane detection via perspective transformer and the openlane benchmark. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer Nature: Cham, Switzerland, 2022; pp. 550–567. [Google Scholar]
  10. Wang, R.; Qin, J.; Li, K.; Li, Y.; Cao, D.; Xu, J. BEV-LaneDet: A Simple and Effective 3D Lane Detection Baseline. arXiv 2022, arXiv:2210.06006. [Google Scholar]
  11. Li, Z.; Wang, W.; Li, H.; Xie, E.; Sima, C.; Lu, T.; Qiao, Y.; Dai, J. Bevformer: Learning bird’s-eye-view representation from multi-camera images via spatiotemporal transformers. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 2020–2036. [Google Scholar] [CrossRef] [PubMed]
  12. Liang, T.; Xie, H.; Yu, K.; Xia, Z.; Lin, Z.; Wang, Y.; Tang, T.; Wang, B.; Tang, Z. Bevfusion: A simple and robust lidar-camera fusion framework. Adv. Neural Inf. Process. Syst. 2022, 35, 10421–10434. [Google Scholar]
  13. Wang, Y.; Guo, Q.; Lin, P.; Cheng, G.; Wu, J. Spatio-temporal fusion-based monocular 3d lane detection. In Proceedings of the British Machine Vision Conference (BMVC), London, UK, 21–24 November 2022. [Google Scholar]
  14. Bai, Y.; Chen, Z.; Fu, Z.; Peng, L.; Liang, P.; Cheng, E. Curveformer: 3D lane detection by curve propagation with curve queries and attention. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; pp. 7062–7068. [Google Scholar]
  15. Luo, Y.; Zheng, C.; Yan, X.; Kun, T.; Zheng, C.; Cui, S.; Li, Z. Latr: 3D lane detection from monocular images with transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 7941–7952. [Google Scholar]
  16. Bai, Y.F.; Chen, Z.R.; Liang, P.P.; Cheng, E.K. CurveFormer++: 3D Lane Detection by Curve Propagation with Temporal Curve Queries and Attention. arXiv 2024, arXiv:2402.06423v1. [Google Scholar]
  17. Wang, R.; Qin, J.; Li, K.; Li, Y.; Cao, D.; Xu, J. Bev-lanedet: An efficient 3d lane detection based on virtual camera via key-points. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 1002–1011. [Google Scholar]
  18. Li, Z.; Han, C.; Ge, Z.; Yang, J.; Yu, E.; Wang, H.; Zhao, H.; Zhang, X. Grouplane: End-to-end 3d lane detection with channel-wise grouping. arXiv 2023, arXiv:2307.09472. [Google Scholar] [CrossRef]
Figure 1. Diagram of the relationship between the vehicle and camera coordinate systems in an uphill scene.
Figure 2. Lane surface section in the ego-vehicle coordinate system.
Figure 3. Schematic diagram of the inverse perspective transformation.
Figure 4. Overall structure of the 3D lane detection model.
Figure 5. Structure of the coordinate attention mechanism.
Figure 6. Bidirectional feature pyramid (BIFPN) module structure.
Figure 7. Design of the 3D lane boundary detection head based on anchor lines.
Figure 8. Comparison of different detection methods on the OpenLane dataset.
Table 1. ResNet50 module structure.

Layer    Type                                        Out-Channels   Out-Size
1        conv2d, s = 2, k = 7                        64             288 × 512
         maxpool, s = 2, k = 3                       64             144 × 256
2–10     bottleneck, s = (1, 1, 1), k = (1, 3, 1)    256            144 × 256
         bottleneck, s = (1, 1, 1), k = (1, 3, 1)    256            144 × 256
         bottleneck, s = (1, 1, 1), k = (1, 3, 1)    256            144 × 256
11–22    bottleneck, s = (1, 2, 1), k = (1, 3, 1)    512            72 × 128
         bottleneck, s = (1, 1, 1), k = (1, 3, 1)    512            72 × 128
         bottleneck, s = (1, 1, 1), k = (1, 3, 1)    512            72 × 128
         bottleneck, s = (1, 1, 1), k = (1, 3, 1)    512            72 × 128
23–40    bottleneck, s = (1, 2, 1), k = (1, 3, 1)    1024           36 × 64
         bottleneck, s = (1, 1, 1), k = (1, 3, 1)    1024           36 × 64
         bottleneck, s = (1, 1, 1), k = (1, 3, 1)    1024           36 × 64
         bottleneck, s = (1, 1, 1), k = (1, 3, 1)    1024           36 × 64
         bottleneck, s = (1, 1, 1), k = (1, 3, 1)    1024           36 × 64
         bottleneck, s = (1, 1, 1), k = (1, 3, 1)    1024           36 × 64
41–49    bottleneck, s = (1, 2, 1), k = (1, 3, 1)    2048           18 × 32
         bottleneck, s = (1, 1, 1), k = (1, 3, 1)    2048           18 × 32
         bottleneck, s = (1, 1, 1), k = (1, 3, 1)    2048           18 × 32
50       conv2d, s = 1, k = 1                        512            18 × 32
Table 2. Comparison results of three-dimensional lane boundary detection models.

Method        F1 Score↑   x̄_e_c    x̄_e_f    z̄_e_c    z̄_e_f
3D-LaneNet    41.5        0.269    0.815    0.142    0.685
Gen-LaneNet   30.6        0.298    0.865    0.160    0.738
PersFormer    50.5        0.311    0.553    0.149    0.541
Our method    55.7        0.247    0.416    0.135    0.411
Table 3. Comparison of 3D lane boundary detection models in different scenes (F1 score).

Method        Uphill and Downhill   Curve   Severe Weather   Night   Intersection   Bifurcation
3D-LaneNet    38.5                  44.2    43.9             40.5    30.6           37.4
Gen-LaneNet   25.7                  32.3    28.5             19.8    22.6           29.1
PersFormer    43.6                  53.8    50.1             48.4    39.2           46.4
Our method    45.8                  58.4    54.1             49.5    45.5           49.8
