Stereo Matching Algorithm of Multi-Feature Fusion Based on Improved Census Transform

This article proposes an improved stereo matching algorithm to address the issue that the conventional Census transform is overly dependent on the center pixel of the window, which makes the algorithm susceptible to noise interference and results in low matching accuracy in regions with weak or complex texture. In the cost calculation stage, a noise threshold is set using absolute-difference detection, and center pixels that exceed the threshold are replaced with the mean gray value of the neighboring pixels in the 3 × 3 window. This stage also introduces a gradient cost, which is coupled with edge and feature point information to produce the final matching cost. In the cost aggregation stage, the cross approach is employed to build an adaptive support domain and aggregate the costs. The disparity is then calculated using the WTA technique, and a multi-step refinement process is employed to produce the final disparity map. The experiments demonstrate that the proposed algorithm has good anti-noise performance. Compared with other improved or composite algorithms, the average mismatch rate on the four standard images of the Middlebury test platform is 5.53%, lower than that of the other algorithms, indicating high matching accuracy. The proposed algorithm provides ideas for subsequent improved algorithms.


Introduction
Stereo matching seeks to identify analogous pixels from images captured from varying perspectives. The process involves computing the disparity between corresponding pixels to obtain depth information. Stereo matching technology is extensively adopted in autonomous driving, target tracking and three-dimensional reconstruction [1][2][3].
The matching accuracy and real-time performance of stereo matching algorithms have increased significantly as a result of extensive recent research both domestically and internationally [4]. In their summary of the matching process, Scharstein et al. [5] categorized it into four stages: cost calculation, cost aggregation, disparity calculation and disparity refinement. They also classified standard stereo matching algorithms into global and local stereo matching algorithms [6,7]. The global algorithm obtains the optimal disparity value by minimizing an energy function; the main global algorithms are based on dynamic programming [8], belief propagation [9] and graph cuts [10]. The global stereo matching algorithm has a good matching effect, but it is difficult to apply in scenarios with high real-time requirements due to its high computational complexity and long runtime. The local stereo matching algorithm uses the pixel information in the neighborhood of the point to be matched to calculate the cost; its advantages are low complexity and good real-time performance. Sum of absolute differences (SAD) [11], relative gradient [12], normalized cross correlation (NCC) [13] and the Census transform (CT) [14] are commonly used to calculate the matching cost of two pixels. Among them,

Matching Cost Calculation
The matching cost is a crucial aspect of stereo matching. It is used to evaluate the similarity of corresponding pixels in two images taken by cameras from different perspectives of the same scene, and it is typically influenced by factors such as lighting and background noise [26].

Traditional Matching Cost Calculation
The conventional Census algorithm traverses the image's pixels with a rectangular window, chooses the gray value of the window's center point as the reference value, and compares the gray value of each pixel in the window to the reference value. The Boolean values obtained from the comparisons are mapped into a bit string, and the value of the bit string is used as the Census transform value of the central pixel [27]. Normally, the transformation can be given as

CT(p) = ⊗_{q∈Ω} ξ[I(p), I(q)]    (1)

where ⊗ represents bit-wise concatenation, I(p) is the gray value of the central pixel p in the window, I(q) is the gray value of another point q in the same window, and Ω comprises all the other points in the support window besides the central point. In Equation (1), the traditional Census auxiliary function for I(p) and I(q) is given as

ξ[I(p), I(q)] = 0 if I(p) ≤ I(q), and 1 if I(p) > I(q)    (2)

where ξ[I(p), I(q)] is the Census transform value of point q.
According to Equation (1), the Census transform is performed on the left and right images over a certain disparity range, and two bit strings are obtained. The two bit strings are compared bit by bit via the Hamming distance, so the similarity between the two pixels is computed as

C(p, d) = Ham[C_l(p), C_r(p − d)], d_min ≤ d ≤ d_max

where C(p, d) is the matching cost at disparity d, C_l(p) is the bit string of the left image, and C_r(p − d) is the bit string of the right image; d_max and d_min are the maximum and minimum values of the disparity range, respectively.
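As a concrete illustration, the transform and cost above can be sketched in Python. This is a minimal sketch, not the paper's implementation: images are plain 2D lists of gray values, the window radius `r` is an assumed parameter, and the comparison follows the convention of Equation (2).

```python
def census_bits(img, x, y, r=1):
    """Census-transform the pixel at (x, y): compare each neighbor in the
    (2r+1) x (2r+1) window against the center value and pack the Boolean
    results into a bit string (here, a Python int), raster-scan order."""
    ref = img[y][x]
    bits = 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dx == 0 and dy == 0:
                continue  # skip the center pixel itself
            # xi = 1 when I(p) > I(q), else 0, matching Equation (2)
            bits = (bits << 1) | (1 if img[y + dy][x + dx] < ref else 0)
    return bits

def hamming(a, b):
    """Hamming distance: number of differing bits between two bit strings."""
    return bin(a ^ b).count("1")

def census_cost(left, right, x, y, d, r=1):
    """Matching cost C(p, d) between left pixel (x, y) and right pixel (x - d, y)."""
    return hamming(census_bits(left, x, y, r), census_bits(right, x - d, y, r))
```

Because identical windows yield identical bit strings, the cost of a pixel matched against itself at disparity 0 is exactly zero, which is a quick sanity check for any Census implementation.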

Improved Matching Cost Calculation
When choosing the reference value, the traditional Census algorithm only takes the gray value of the window's center point into consideration. The result is overly dependent on the center point, which leads to sensitivity to noise and easily causes mismatches. To solve this problem, Ma et al. [28] replaced the gray value of the center pixel with the mean gray value of the neighborhood pixels in the support window. Although the dependence on the center pixel is reduced, the reliability of the obtained reference value becomes worse. In light of the aforementioned issues, this paper presents a matching cost algorithm rooted in multi-feature fusion. Figure 1 is the flow chart of the algorithm.

The Rank-Ordered Absolute Differences (ROAD) method [29] is used to detect whether the center point is a noise point. The principle is as follows: define p as the center pixel and Ω_{3×3} as the set of pixels except the center point in the 3 × 3 window. d_{p,q} is the absolute value of the gray-value difference between the center pixel p and a neighborhood pixel q:

d_{p,q} = |I(p) − I(q)|, q ∈ Ω_{3×3}

Then, all the d_{p,q} values in the window are arranged in ascending order, and the ROAD value is defined as

ROAD_m(p) = Σ_{i=1}^{m} r_i(x), 2 ≤ m ≤ 7

where r(x) is the ascending sequence of the d_{p,q} values and r_i(x) denotes the value at the i-th position in that order.
The edge pixels and impulse noise pixels of the Lena image are contrasted in Figure 2. By comparing the two images, it can be noticed that the neighborhood around the edge pixel has almost the same intensity, which means that its ROAD value is very low.
The internal area and edges of the image are continuous, so when the ROAD value is low, the gray value of the center pixel is similar to those of the neighboring pixels. However, when the center point is affected by noise, its gray value differs significantly from most or all of the neighboring pixels, resulting in a higher ROAD value. The value of ROAD_4 represents the similarity between half of the pixels in the 3 × 3 window and the center pixel, which makes it the best choice for judging noise. Therefore, this paper utilizes ROAD_4 to determine whether the center point of the support window is a noise point.
During the cost calculation, a threshold T_noise is established, and the center pixel is modified when the threshold is surpassed. By replacing the gray value of the center pixel with the average gray value of the neighboring pixels, the mismatch rate can be drastically reduced. However, the window selected around the center point is typically larger in the actual cost calculation, and image edges or multiple noise points are more likely to appear in a larger window. In most cases, the points in the 3 × 3 window have good continuity and the likelihood of multiple noise points is low. Therefore, this paper employs the average gray value of the 3 × 3 window pixels, excluding the center pixel, as the reference gray value. The reference gray value I(p) used in the transformation is given as

I(p) = (1/8) Σ_{q∈N(p)} I(q)

where N(p) is the set of all neighborhood points except the central pixel in the 3 × 3 window. Figure 3 depicts the transformation and comparison procedure. The Census transform without noise is shown in the first part, followed by the conventional Census transform with noise added, and finally the improved Census transform with noise added. Even though the center point is disturbed, the bit string obtained by the transformation is less affected. Therefore, this algorithm reduces the dependence on the center point. At the same time, the noise judgment improves the reliability of the reference value of the support window in the Census transform.
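The detection and replacement steps can be sketched as follows. This is a minimal sketch with images as plain 2D lists; the threshold value `t_noise` is an assumed illustrative number, since the paper does not state its value here.

```python
def road4(img, x, y):
    """ROAD_4 value at (x, y): sum of the 4 smallest absolute gray-value
    differences between the center pixel and its 8 neighbors in the
    3 x 3 window."""
    ref = img[y][x]
    diffs = sorted(abs(img[y + dy][x + dx] - ref)
                   for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                   if (dx, dy) != (0, 0))
    return sum(diffs[:4])

def reference_value(img, x, y, t_noise=40):
    """Reference gray value for the Census transform: if ROAD_4 exceeds the
    noise threshold (t_noise is an assumed value), the center pixel is judged
    noisy and replaced by the mean of its 8 neighbors; otherwise the center
    value itself is kept."""
    if road4(img, x, y) <= t_noise:
        return img[y][x]
    neigh = [img[y + dy][x + dx]
             for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dx, dy) != (0, 0)]
    return sum(neigh) / len(neigh)
```

On a flat patch the ROAD value is zero and the center pixel is kept; an isolated impulse produces a large ROAD value and is replaced by the neighborhood mean, exactly the behavior Figure 3 illustrates.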
Although the reliance on the center point is lessened through the judgment and adjustment of the gray value of the center point, this alone cannot effectively improve the algorithm's performance in weakly textured and repeatedly textured regions. Therefore, this paper introduces edge and feature point information to improve the precision of the algorithm.
First, Canny edge detection is applied to create an initial binary edge image, which is then encoded in order to extract rich edge texture information. In the coding conversion, E(q) is the edge binary value obtained at the point q and Ω_edge is the set of neighborhood pixels of q in the edge image. Then, the Harris corner detection method is employed to obtain the corner points in the image. By setting a minimum distance between the corner points, the feature point set Ω_feature is obtained. Finally, the edge information and corner information are combined to construct a comparison function that augments the Census coding. Taking the 3 × 3 support window as an example, the specific Census transform coding and combination process of edge and feature point information is shown in Figure 4. 'X' marks the positions where the feature points are distributed in the support window.
After introducing the edge and feature point information, the matching cost is calculated in accordance with Formula (2) and is linearly fused with the initial cost calculated from the gray scale. The matching cost C_weight(p, d) is given as

C_weight(p, d) = ε · C_cen(p, d) + (1 − ε) · C_edge+feature(p, d)

where C_cen(p, d) is the Census matching cost based on gray-value information, C_edge+feature(p, d) is the matching cost based on edge and feature points, and ε is the control parameter. When ε is 0.5, the fusion is an equally weighted linear combination.

In order to improve the smoothness after edge filtering, this paper combines the enhanced transform with the gradient transform. When calculating the gradient, the gradient value of each pixel in the x and y directions should be taken into consideration. The gradient-based cost can be expressed as

C_grad(p, d) = |∇_x I_l(p) − ∇_x I_r(p − d)| + |∇_y I_l(p) − ∇_y I_r(p − d)|

where ∇ is the directional derivative, I_l(p) is the gray value of point p in the left image, and I_r(p − d) is the gray value of the point to be matched in the right image. The final cost derived from the improved Census algorithm is fused with the gradient cost using normalization, which is expressed as

C(p, d) = [1 − exp(−C_weight(p, d)/λ_1)] + [1 − exp(−C_grad(p, d)/λ_2)]

where λ_1 and λ_2 are the parameters that affect the cost weight.
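The fusion steps above can be sketched numerically. Note the caveats: the exponential normalization is a common robust-fusion form reconstructed from the λ_1, λ_2 description (as used in AD-Census-style methods), so treat it as an assumption rather than the paper's exact expression, and the gradient values are passed in precomputed.

```python
import math

def fused_census_cost(c_cen, c_edge_feature, eps=0.5):
    """Linear fusion of the gray-value Census cost and the edge/feature-point
    cost; eps = 0.5 weights the two terms equally."""
    return eps * c_cen + (1 - eps) * c_edge_feature

def gradient_cost(grad_l, grad_r):
    """Gradient cost: sum of absolute x- and y-gradient differences between
    the left pixel and the right candidate pixel; each argument is an
    (x-gradient, y-gradient) pair."""
    return abs(grad_l[0] - grad_r[0]) + abs(grad_l[1] - grad_r[1])

def total_cost(c_weight, c_grad, lam1, lam2):
    """Normalized fusion of the improved Census cost and the gradient cost.
    Each term is mapped into [0, 1) by 1 - exp(-c / lambda) before summing,
    so neither cost can dominate. This exponential form is an assumption
    reconstructed from the lambda_1 / lambda_2 description."""
    return (1 - math.exp(-c_weight / lam1)) + (1 - math.exp(-c_grad / lam2))
```

The normalization maps both raw costs onto a comparable scale, which is the usual reason for introducing the λ parameters: a large λ flattens that term's contribution, a small λ sharpens it.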


Cost Aggregation
Cost aggregation plays a key role in stereo matching and directly affects the final disparity map. For cost aggregation, this paper employs the adaptive window based on cross intersection from Reference [16]. Firstly, a cross-domain is built for each pixel, which involves extending the point in four directions subject to limitation criteria; the extension is terminated when the restriction conditions are no longer satisfied. The specific process is shown in Figure 5. In Figure 5, p is the center point, q is the end point of the pixel cross arm centered on p, and q_1 is the point after q. In the restriction conditions, τ_1, τ_2 and τ_3 are the color thresholds and L_1 and L_2 are the distance thresholds.
Then, a support window is constructed based on the cross-arm. Each point in the four extended arms is extended vertically and horizontally with the same constraints. However, the support windows of the two views may not always be the same. In order to ensure good matching accuracy, the intersection of the two support windows is taken as the final support window, which can be observed in Figure 6. In Figure 6, orange is the support window of point A, green is the support window of point B and grey is the cross-arm. Since the two support windows do not fully intersect, the rightmost part of the support window of point B is discarded.
The final aggregation cost is given as

C′(x, y, d) = (1/m) Σ_{q∈U(x,y,d)} C(q, d)

where U(x, y, d) represents the support window, q represents a pixel point in the window, C(q, d) represents the initial matching cost, C′(x, y, d) represents the value obtained after the cost aggregation, and m represents the total number of points in the window.

Disparity Calculation
The disparity calculation adopts a simple and efficient WTA (winner-takes-all) strategy. The initial disparity D(p) is chosen as the disparity value corresponding to the minimal aggregation cost:

D(p) = arg min_{0 ≤ d ≤ d_max} C′(p, d)

where d_max is the maximum disparity.
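In code, the WTA step reduces to an argmin over one pixel's aggregated cost curve (a minimal sketch):

```python
def wta_disparity(costs):
    """Winner-takes-all: pick the disparity whose aggregated cost is minimal.
    `costs` is the list [C'(p, 0), C'(p, 1), ..., C'(p, d_max)]."""
    return min(range(len(costs)), key=lambda d: costs[d])
```

Run per pixel over the cost volume, this yields the initial disparity map that the refinement steps below operate on.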
Then, a multi-step refinement scheme is adopted for the initial disparity map, including left-right consistency detection, iterative support-domain voting, abnormal point classification and interpolation, sub-pixel refinement and median filtering.
For each point in the initial disparity map, left-right consistency is checked with

|D_L(p) − D_R[p − D_L(p)]| ≤ δ

where D_L(p) is the value of point p in the left disparity map, D_R[p − D_L(p)] is the value of the corresponding point in the right disparity map, and δ is the error tolerance.
The points that satisfy the validation are marked as valid points, and other points are marked as outliers.
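The check can be sketched as follows, with disparity maps as 2D lists; treating projections that fall outside the right image as outliers is an assumption, since the paper does not specify border handling.

```python
def lr_consistent(d_left, d_right, x, y, delta=1):
    """Left-right consistency check: point (x, y) is valid when the left
    disparity agrees, within delta, with the right disparity at the
    corresponding column x - D_L(p)."""
    dl = d_left[y][x]
    xr = x - dl
    if xr < 0:
        return False  # projects outside the right image: treat as outlier
    return abs(dl - d_right[y][xr]) <= delta
```

Points failing this test are the outliers passed on to the voting and interpolation stages.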
In the above formula, |R(p)| is the number of points in the support domain R(p) where the point p is located, the vote count is that of the disparity d_R(p) with the highest number of votes in the support domain R(p), and τ_n and τ_r are the voting thresholds. If the support domain where an outlier is located satisfies condition (16), the value of d_R(p) is used to replace the outlier, which is then marked as a valid point. In order to handle as many outliers as possible, this process is iterated repeatedly.
Abnormal point classification and interpolation are used to handle the remaining abnormal points. Firstly, the anomalous points are categorized into occlusion points and mismatch points based on geometric principles. Then, an occlusion point is replaced with the nearest valid point in its left or right direction, while for a mismatched point, the nearest valid points in the left and right directions are both found and the smaller of their two disparity values is used as the replacement.
Finally, sub-pixel refinement and median filtering are applied to the disparity map in order to obtain the ultimate disparity map.

Data and Experiments
In order to verify the effectiveness and stability of the proposed algorithm, it is tested on stereo image pairs from the Middlebury dataset [30][31][32]. The mismatch rate, an important criterion in the test, can be expressed as

PBM = (1/N) Σ_{(x,y)} [ |d_e(x, y) − d_r(x, y)| > σ_d ] × 100%

where N is the number of effective pixels in the image region, d_e(x, y) is the disparity map calculated by the stereo matching algorithm, d_r(x, y) is the true disparity map provided by the dataset, and σ_d is the disparity threshold, which is taken as 1 in the experiment. When the difference between the disparity value calculated by the stereo matching algorithm and the real disparity value is larger than 1, the pixel is regarded as a mismatched point.
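The criterion can be computed directly as sketched below; this is a minimal sketch, and a full evaluation would additionally mask occluded or invalid pixels when counting N.

```python
def mismatch_rate(d_est, d_true, sigma_d=1.0):
    """Percentage of pixels whose estimated disparity differs from the ground
    truth by more than sigma_d (the bad-pixel criterion with threshold 1)."""
    n = bad = 0
    for row_e, row_t in zip(d_est, d_true):
        for de, dt in zip(row_e, row_t):
            n += 1
            if abs(de - dt) > sigma_d:
                bad += 1
    return 100.0 * bad / n
```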

Anti-Noise Experiment
Firstly, to assess the accuracy of the proposed algorithm under the influence of noise, salt-and-pepper noise and Gaussian noise are added to the four sets of standard test images. The coverage of the salt-and-pepper noise is 2%, 5%, 10% and 15%, and the standard deviation of the Gaussian noise is 2, 4, 6 and 8. Using the three different algorithms, the initial cost matrix and the unoptimized initial disparity map are acquired. Subsequently, the mean error values of the non-occluded area are assessed and compared with those of the MCT [28] and SGM [33] algorithms. The experimental results are shown in Table 1. To provide a clearer comparison of the anti-noise capabilities of the three algorithms, a line chart of the average mismatch rate of the non-occluded area under salt-and-pepper and Gaussian noise was created from the data in Table 1.
The statistical analysis presented in Table 1 and Figure 7 indicates that, for the non-occluded area without noise, the MCT algorithm performs close to the proposed algorithm. However, after salt-and-pepper noise is added, the mismatch rates of the MCT and SGM algorithms increase rapidly, which indicates that these two algorithms are more sensitive to impulse noise. The proposed algorithm, by contrast, remains relatively stable and robust under the same conditions. Although the distinction between the proposed method and the other two algorithms is less pronounced under Gaussian noise, an advantage nevertheless exists. Consequently, the improved algorithm in this paper is distinguished by higher robustness against noise.

Comparison of Final Disparity Map Results
Figure 8 lists the final test results of the proposed algorithm and two traditional algorithms on four image pairs from the Middlebury dataset. In the test, the cost calculation stage differs while the other three stages are identical across the three algorithms. (a) is the left original image, (b) is the real disparity map, (c) is the result of the traditional Census algorithm, (d) is the result of the traditional SGM algorithm and (e) is the result of the proposed algorithm. The white boxes mark areas with multiple edges and complex textures.

From the final disparity maps, it is evident that the two traditional algorithms produce unsatisfactory disparities in areas with complex textures and numerous edges, and their overall effect is poor. In the disparity map obtained by the proposed algorithm, the edges are smoother and the effect in complex texture areas is better. Consequently, compared to the other two algorithms, the advantages of the proposed algorithm are more apparent, and the resulting disparity map approximates the actual one with higher accuracy.

The Overall Performance Test of the Algorithm
In order to test the overall performance, experiments were conducted on four classical image pairs, and the results of the proposed algorithm were compared with those of four non-traditional algorithms: SSD+MF [34], GlobalGCP [35], the adaptive weight algorithm [36] and RINCensus [3]. The mismatch rate (PBM) of the Non-occ, All and Disc regions is used as the evaluation index, and the results are shown in Table 2.

Figure 1. Flow chart of cost calculation proposed in this paper.

Figure 2. Lena image noise pixel and pulse pixel comparison diagram.

Figure 3. Comparison of improved algorithm and traditional algorithm under the influence of noise.

Figure 4. Edge and feature point information coding and combination process.

Figure 7. The mismatch rate line chart of different algorithms. (a) Salt and pepper noise; (b) Gaussian noise.

Figure 8. Final disparity maps on four Middlebury image pairs: (a) left original image; (b) real disparity map; (c) traditional Census algorithm; (d) traditional SGM algorithm; (e) proposed algorithm.

Table 1. The mismatch rate of different algorithms under two kinds of noise.