1. Introduction
The key to realizing autonomous driving is the correct identification and detection of the various complex lane lines that occur in traffic scenarios: perception technology transmits the processed lane line information to the control system of the intelligent vehicle, which then makes correct path-planning and behavior decisions to realize autonomous driving. At the same time, accurate semantic segmentation and detection of lane lines provide an information basis for advanced driver assistance systems (ADASs), such as automatic cruise control, lane keeping, lane changing, and overtaking, and can further improve active safety during driving [1,2].
At present, under good road conditions (structured roads with clear lane lines, no obvious illumination changes, and no obstacle occlusion), lane line detection algorithms achieve good results and meet the functional requirements of ADASs. However, actual traffic scenarios are complex and changeable, with interferences such as illumination changes, occlusion by other vehicles, and unclear lane markings, which can lead to false detections, missed detections, and even the complete failure of lane line detection algorithms. This greatly limits the application scenarios of ADASs and affects both driving safety and user experience. Therefore, how to adapt to the various complex road conditions of actual traffic scenarios while balancing accuracy and real-time performance is the key problem facing lane line detection, and it is also an important goal in the continuous optimization and upgrading of ADASs.
For lane line recognition, the Sobel and Canny operators are two widely used edge detection algorithms. Reference [3] used the Otsu algorithm [4] to extract a region of interest (ROI) from the gray image, applied the Sobel operator within the ROI, and then extracted the lane lines through threshold segmentation and piecewise fitting. Reference [5] combined the Sobel operator with non-local maximum suppression (NLMS) to select numerous candidate lane lines in the image ROI, and then screened the candidates using a combination of lane line structure and color features to complete the extraction.
Reference [6] applied the Canny operator and the paired features of lane lines to the ROI extracted from the original image, and then obtained the lane lines through denoising and threshold segmentation. Reference [7] first used the Canny operator to detect image edges, then combined the Hough transform with the maximum likelihood method, determined the control points of a Catmull–Rom spline based on the perspective principle, and finally completed the lane line extraction.
The common feature of these algorithms is that their mechanisms are transparent, their processing is controllable, and they are fast, although their feature extraction patterns are limited. While they can achieve good recognition in one or a few scenarios, they are very vulnerable to changes in the road environment; their robustness is poor, and they cannot meet the needs of lane line recognition in complex and changeable real scenarios.
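As a minimal sketch of the edge-based idea behind these classical methods (not the exact pipeline of any of the cited references), the Sobel operator can be implemented in a few lines of NumPy; the kernel values are the standard 3 × 3 Sobel kernels, while the synthetic image and the threshold below are purely illustrative:

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude via 3x3 Sobel kernels (zero-padded borders)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    padded = np.pad(img.astype(float), 1)
    h, w = img.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            patch = padded[i:i + h, j:j + w]
            gx += kx[i, j] * patch
            gy += ky[i, j] * patch
    return np.hypot(gx, gy)

# Synthetic image: dark road with one bright vertical "lane line" at column 10.
img = np.zeros((20, 20))
img[:, 10] = 255
edges = sobel_magnitude(img) > 200   # threshold chosen only for illustration
```

A real pipeline would then run ROI extraction, threshold selection (e.g., Otsu), and line fitting on such an edge map, as the cited works do.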
In recent years, the rapid development of computer hardware has greatly promoted deep learning theory and technology, making deep learning more and more widely used in many fields, including lane line recognition, where it has become the mainstream direction for future research.
Reference [8] proposed a lane line identification method based on a support vector machine (SVM), using a spline curve to fit the lane lines screened by the SVM; however, this method relies on massive training samples. Reference [9] used digital image processing to determine the start and end positions of a lane line by calculating the magnitude and direction of the gradient change at the lane line edge, with corrections based on a density function to determine the optimal lane line; however, this algorithm only worked well on flat roads. Reference [10] investigated preprocessing for lane line identification, using a Gaussian filter for denoising and edge enhancement to enrich lane line details and obtain accurate boundaries; however, the algorithm has low accuracy when lane lines are damaged. Reference [11] proposed a lane line identification algorithm based on a multi-resolution, multi-scale Hough transform, which obtained the lane lines by thresholding their geometric characteristics; however, setting the thresholds demanded many experiments. Reference [12] estimated the directional changes of subsequent lane lines through an edge distribution function, but the results were not ideal for dotted lines and bends.
The algorithm proposed in [13] required segmentation before clustering to obtain its results, so in addition to the speed and accuracy of neural network feature extraction, it also depended on the real-time performance and accuracy of the clustering algorithm. The method proposed in [14] comprised two relatively independent neural networks that needed to be pre-trained, spliced, and fine-tuned; the training process was complex and convergence was slow. References [15,16] treated a continuous lane line as a whole and applied the YOLOv3 network for target detection on an inverse-perspective-transformed image to complete the lane line recognition, which caused the prediction to deviate when the road slope changed or the vehicle was bumpy.
In scenarios with highly dynamic lighting changes and poorly structured roads, all the above algorithms suffer from low accuracy and poor robustness. The aforementioned machine-learning-based algorithms require large training sets, and the collection of such data is expensive [8,14]. For methods based on digital image processing [10,11], the selection of threshold values is inefficient. Thus, this paper proposes a lane line identification method based on Markov random fields. First, the lane line pixels are enhanced through image preprocessing; then, a Markov random field is used to model the image; finally, Markov random field reasoning is performed with the graph cut method, which achieves accurate and rapid identification of lane lines.
3. Markov Random Field
After image preprocessing, we modeled the image using a Markov random field and further segmented out the lane line pixels. A Markov random field (MRF) is an undirected probabilistic graphical model that can classify the different pixels in an image [19] and is well suited to describing the labels assigned to pixels. For an image, let S denote the set of all pixel positions, let W = {W_n : n ∈ S} denote the set of random variables associated with the pixel positions, and let E_n denote the neighborhood of pixel n, where N specifies the neighborhood size; in this study, we set N = 4. For a Markov random field to be established, the model must satisfy the Markov property of Equation (4):

P(W_n | W_{S∖{n}}) = P(W_n | W_{E_n})    (4)

Here, W_n represents the category of pixel n; in the lane line detection task, we specify that the categories are only lane line and non-lane line. W_{S∖{n}} represents the categories of all the other pixels in the image, and W_{E_n} represents the categories within the neighborhood E_n of pixel n. P(W_n | W_{S∖{n}}) is the conditional probability that the class of pixel n is W_n given the classes of the rest of the image. This property expresses the principle of conditional independence in undirected graphical models: given its neighborhood, each pixel variable is conditionally independent of all the others.
We further modeled the image with the MRF, specifying a one-to-one correspondence between image pixels and MRF nodes. In the MRF model, each node represents a variable, and the edges between nodes depict the dependencies between variables. As in Equation (5), the joint probability of the variables is written as a product of potential functions:

P(x) = (1/Z) ∏_j ψ_j(x_{C_j})    (5)

where Z is the normalization factor of the joint probability distribution, usually called the partition function, which ensures that the result is a valid probability distribution, and the factor ψ_j represents the jth potential function, which maps a set of random variables to the real domain and returns a non-negative value. This value depends on the state of the variable subset C_j.
In computer vision and image processing, we must consider not only the continuity of adjacent pixels but also the correct classification of individual pixels. Therefore, it is necessary to use a pairwise Markov random field, in which node potential functions are added to form the joint probability distribution, as shown in Equation (6):

P(x) = (1/Z) ∏_p φ_p(x_p) ∏_{(p,q)} ψ_{pq}(x_p, x_q)    (6)

where φ_p(x_p) is the energy potential function at node p, and ψ_{pq}(x_p, x_q) is the energy potential function on the edge connecting the adjacent nodes p and q.
Figure 9 shows a 2 × 2 MRF graphical model. By defining the energy potential functions on the edges of the four-node neighborhood and on each single node, the joint probability distribution of the model is determined, as shown in Equation (7):

P(x) = (1/Z) φ_1(x_1) φ_2(x_2) φ_3(x_3) φ_4(x_4) ψ_{12}(x_1, x_2) ψ_{34}(x_3, x_4) ψ_{13}(x_1, x_3) ψ_{24}(x_2, x_4)    (7)
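To make Equations (5)–(7) concrete, the joint distribution of a 2 × 2 binary MRF can be computed by brute-force enumeration of its 16 states; the potential values below are arbitrary illustrations, not values from this paper:

```python
import itertools
import numpy as np

# Hypothetical potentials for a 2x2 pairwise MRF with binary labels.
# phi[p][x_p] is the node potential; psi[x_p][x_q] rewards label agreement.
phi = {p: np.array([1.0, 2.0]) for p in range(4)}      # illustrative values
psi = np.array([[2.0, 0.5],
                [0.5, 2.0]])                           # Potts-like attraction
edges = [(0, 1), (2, 3), (0, 2), (1, 3)]               # 2x2 grid, 4-neighborhood

def unnormalized(x):
    """Product of node and edge potentials, as in Equation (7)."""
    val = np.prod([phi[p][x[p]] for p in range(4)])
    for p, q in edges:
        val *= psi[x[p], x[q]]
    return val

states = list(itertools.product([0, 1], repeat=4))
Z = sum(unnormalized(x) for x in states)               # partition function
probs = {x: unnormalized(x) / Z for x in states}
best = max(probs, key=probs.get)                       # most probable labeling
```

With these potentials, label 1 is individually cheaper and agreement is rewarded, so the all-ones labeling comes out as the most probable state; on real images, exhaustive enumeration is of course infeasible, which is why the graph cut reasoning of the next section is needed.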
4. MRF Reasoning Based on Graph Cut Method
Probabilistic reasoning is equivalent to solving the model: when the joint probability distribution is known, the goal is to determine the maximum posterior probability state of the model through reasoning, and the label combination with the highest probability is taken as the optimal classification [20]. Therefore, this study used the graph cut method for exact reasoning on the MRF, which supports effective classification according to the pixel information.
4.1. Maximum Posterior Probability Reasoning
After using a Markov random field for image modeling, the image processing problem is transformed into a maximum posterior probability reasoning problem defined on the MRF, as shown in Equation (8):

x* = arg max_x p(x), with p(x) ∝ ∏_p φ_p(x_p) ∏_{(p,q)} ψ_{pq}(x_p, x_q)    (8)

where the maximum posterior probability p(x) is proportional to the product of the pairwise MRF energy potential functions. Maximizing the posterior probability is equivalent to minimizing an energy E(x); the problem is therefore transformed into the logarithmic domain for solution, taking θ_p = −log φ_p and θ_{pq} = −log ψ_{pq}, and can be expressed through Equation (9):

E(x) = Σ_p θ_p(x_p) + Σ_{(p,q)} θ_{pq}(x_p, x_q)    (9)

where θ_p(x_p) represents the observation cost of the given state x_p at the single pixel p, which can also be regarded as the cost of classifying that sample. Similarly, θ_{pq}(x_p, x_q) represents the cost of placing the category labels x_p and x_q at the two adjacent pixels p and q. The normalization factor Z of the MRF is a constant during reasoning and can therefore be omitted.
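A toy two-pixel sketch of Equation (9), with purely hypothetical costs, shows that minimizing E(x) selects the same labeling as maximizing p(x) ∝ exp(−E(x)):

```python
import itertools
import math

# Hypothetical unary and pairwise costs for two pixels with binary labels,
# in the form of Equation (9): E(x) = sum_p theta_p(x_p) + sum_pq theta_pq.
theta_p = {0: [0.0, 1.0],          # pixel 0 prefers label 0
           1: [1.5, 0.0]}          # pixel 1 prefers label 1
theta_pq = {(0, 1): [[0.0, 0.8],
                     [0.8, 0.0]]}  # agreement is free, disagreement costs 0.8

def energy(x):
    e = sum(theta_p[p][x[p]] for p in theta_p)
    return e + sum(theta_pq[pq][x[pq[0]]][x[pq[1]]] for pq in theta_pq)

states = list(itertools.product([0, 1], repeat=2))
x_star = min(states, key=energy)                         # energy minimization
x_map = max(states, key=lambda x: math.exp(-energy(x)))  # MAP; same result
```

Here the two unary preferences outweigh the small disagreement penalty, so the minimizer assigns each pixel its individually preferred label.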
4.2. Binary Graph Cut Method
The graph cut method is widely used in computer vision and image processing, mainly to solve energy minimization problems [21]. When using the binary graph cut method for exact reasoning, each random variable must take one of two values, and the potential function between adjacent variables must satisfy the inequality of Equation (10):

θ_{pq}(0, 0) + θ_{pq}(1, 1) ≤ θ_{pq}(0, 1) + θ_{pq}(1, 0)    (10)

where θ_{pq}(x_p, x_q) represents the corresponding energy potential value for the given label pair; in particular, θ_{pq}(0, 1) and θ_{pq}(1, 0) are the values when the pixels p and q take different values. This inequality (submodularity) ensures that no cut has negative cost. The MRF model can then be represented by a directed graph G = (V, E), where V is the set of variables and E is the set of edges between adjacent variables. The edge weights of graph G are assigned according to the energy function: each node potential and edge potential of the energy function is assigned to graph G in turn through fixed rules, the assigned edge weights are accumulated, and finally a weighted graph G is formed.
By performing an s–t cut on graph G, the corresponding maximum posterior probability is obtained. As shown in Figure 10, each node in the figure corresponds to a pixel. The node set V is divided into two disconnected subsets, S and T, where the source node s lies in S and the sink node t lies in T. When a node u belongs to S, its variable takes the value 0; when u belongs to T, its variable takes the value 1. Adjacent nodes in the grid are connected by a pair of directed edges, and the source and sink nodes are each connected to every pixel node by a directed edge.
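One common assignment rule (a sketch; this paper's exact rules are not reproduced here) maps the unary costs to the source/sink link capacities and a Potts pairwise cost to the neighbor link capacities. Brute-force enumeration then confirms that every cut cost equals the energy of the corresponding labeling; all cost values below are illustrative:

```python
import itertools

# Hypothetical 1-D "image" of three pixels with binary labels.
# unary[p] = (theta_p(0), theta_p(1)); a Potts smoothness cost LAMBDA
# is paid whenever two neighboring pixels take different labels.
LAMBDA = 1.0
unary = [(0.2, 2.0),   # pixel 0: label 0 is cheap
         (1.0, 1.1),   # pixel 1: ambiguous
         (2.0, 0.3)]   # pixel 2: label 1 is cheap
n_pix = len(unary)
s, t = n_pix, n_pix + 1                      # source and sink node indices

# Capacity matrix of the weighted graph G: s->p is cut when p lands on the
# sink side (label 1), p->t is cut when p stays on the source side (label 0),
# and neighbor links are cut when the two labels differ.
cap = [[0.0] * (n_pix + 2) for _ in range(n_pix + 2)]
for p, (cost0, cost1) in enumerate(unary):
    cap[s][p] = cost1
    cap[p][t] = cost0
for p in range(n_pix - 1):
    cap[p][p + 1] = cap[p + 1][p] = LAMBDA

def energy(labels):
    e = sum(unary[p][labels[p]] for p in range(n_pix))
    return e + sum(LAMBDA for p in range(n_pix - 1)
                   if labels[p] != labels[p + 1])

def cut_cost(labels):
    # Source side holds s and the label-0 pixels; sum capacities S -> T.
    in_S = [lab == 0 for lab in labels] + [True, False]
    return sum(cap[u][v]
               for u in range(n_pix + 2) for v in range(n_pix + 2)
               if in_S[u] and not in_S[v])

states = list(itertools.product([0, 1], repeat=n_pix))
best = min(states, key=cut_cost)             # brute-force minimum s-t cut
```

Because the cut cost equals the labeling energy for every state, the minimum cut yields the minimum-energy labeling; real implementations find it with a max-flow algorithm rather than enumeration.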
As shown in Figure 11, the minimum s–t cut problem is to find a cut that minimizes the total weight of the cut edges. In other words, when classifying pixels, the cost of allocating the nodes to S and T is the sum of the weights of the edges crossing the cut. Considering both the attributes of the single pixels and the continuity of the pixels within a neighborhood, the minimum cost can be calculated, as expressed in Equation (11):

C(S, T) = Σ_{u ∈ S, v ∈ T} c(u, v)    (11)
In conclusion, the energy minimization problem can be solved by an s–t cut of the directed graph G, and each variable can be accurately classified by selecting an appropriate energy potential function.
4.3. Maximum Flow Problem
According to [22], the minimum cut is equivalent to the maximum flow from the source node to the sink node. Many algorithms with polynomial time complexity exist for the maximum flow problem [23]. In a given graph G connected by directed edges, each edge has a non-negative capacity c(u, v), and there are two special vertices, s and t, called the source and sink nodes, respectively.
In the maximum flow problem, subject to the capacity of each edge, the goal is to push as much “flow” as possible from the source node to the sink node.
The augmenting path algorithm is often used to solve the maximum flow problem; it iteratively pushes “flow” through the graph. First, a path from the source node s to the sink node t with strictly positive residual capacity is found. Then, the maximum “flow” that can pass through this path, i.e., the smallest residual capacity among its edges, is computed and pushed along the path; the bottleneck edge loses all of its remaining capacity and becomes a new saturated edge. This process is iterated until no path satisfying the condition remains.
The saturated edges in the maximum flow problem separate the source node from the sink node; therefore, the saturated edges can be used to achieve the segmentation. The minimum cost is achievable by choosing the energy function reasonably. Therefore, the maximum flow problem is equivalent to the minimum s–t cut problem.
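The augmenting path procedure described above can be sketched as an Edmonds–Karp implementation (BFS selects the shortest augmenting path each iteration), which also recovers the minimum cut from the final residual graph; the toy capacities are illustrative:

```python
from collections import deque

def max_flow(capacity, s, t):
    """Edmonds-Karp: repeatedly push flow along shortest augmenting paths."""
    n = len(capacity)
    residual = [row[:] for row in capacity]
    flow = 0
    while True:
        # BFS for an s-t path with positive residual capacity.
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            break                          # no augmenting path: flow is maximal
        # Bottleneck capacity along the path, then update the residuals.
        bottleneck = float('inf')
        v = t
        while v != s:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            residual[parent[v]][v] -= bottleneck
            residual[v][parent[v]] += bottleneck   # reverse edge allows "undo"
            v = parent[v]
        flow += bottleneck
    # Min cut: S = nodes reachable from s in the final residual graph.
    seen = [False] * n
    seen[s] = True
    q = deque([s])
    while q:
        u = q.popleft()
        for v in range(n):
            if not seen[v] and residual[u][v] > 0:
                seen[v] = True
                q.append(v)
    return flow, seen

# Toy s-t graph: node 0 = source, node 3 = sink; two "pixel" nodes 1, 2.
cap = [[0, 4, 2, 0],
       [0, 0, 1, 3],
       [0, 0, 0, 5],
       [0, 0, 0, 0]]
value, in_S = max_flow(cap, 0, 3)
```

On this graph the maximum flow saturates the two edges leaving the source, so the returned cut places only the source on the S side, and its capacity equals the flow value, as the equivalence requires.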
4.4. Determination of Energy Potential Function
In a Markov random field, the energy potential function reflects the interaction between nodes and is equivalent to the cost function of the minimum s–t cut. Treating the pixels in the image as mutually independent, the energy potential function can be considered to obey a Gibbs distribution, which can be expressed by Equation (12):

P(w) = (1/z) exp( −(1/T) Σ_{c ∈ C} V_c(w) )    (12)

where z is the normalization factor; the parameter T is a non-zero constant that indicates the concentration degree of the classification; C is the set of all cliques; and V_c(w) is the potential energy value of clique c, which indicates the degree of coupling within the clique. It can be expressed as shown in Equation (13):

V_c(w) = −β if w_p = w_q, and V_c(w) = β otherwise    (13)

where β is an adjustable parameter used to specify the degree of difference between the potential functions of the s and t sets. Through the above formulas, the energy minimization problem can be solved iteratively, i.e., the lane line pixels are segmented.
In this study, the task of lane line recognition was first transformed into the problem of reasoning over the segmentation energy of the pixels by establishing a Markov model on the preprocessed image. Then, the energy potential function was determined and applied to the s–t cut; finally, the optimal segmentation was obtained by solving the equivalent maximum flow problem, completing the recognition of the lane line pixels.