Improved Cost Computation and Adaptive Shape Guided Filter for Local Stereo Matching of Low Texture Stereo Images

Abstract: Dense stereo matching has been widely used in photogrammetry and computer vision applications. Even though it has a long research history, dense stereo matching is still challenging for occluded, textureless and discontinuous regions. This paper proposes an efficient and effective matching cost measurement and an adaptive shape guided filter-based matching cost aggregation method to improve the stereo matching performance for large textureless regions. First, an efficient matching cost function combining an enhanced image gradient-based matching cost and an improved census transform-based matching cost is introduced. This matching cost function is robust against radiometric variations and textureless regions. Following this, an adaptive shape cross-based window is constructed for each pixel, and a modified guided filter based on this adaptive shape window is implemented for cost aggregation. The final disparity map is obtained after disparity selection and multi-step disparity refinement. Experiments were conducted on the Middlebury benchmark dataset to evaluate the effectiveness of the proposed cost measurement and cost aggregation strategy. The experimental results demonstrate that the average matching error rate on the Middlebury standard image pairs is 9.40%. Compared with the traditional guided filter-based stereo matching method, the proposed method achieves a better matching result in textureless regions.


Introduction
Dense stereo matching is a significant research topic in the field of photogrammetry and computer vision, greatly benefiting applications like 3D reconstruction, DSM (Digital Surface Model) production, virtual reality and autonomous vehicles [1][2][3][4]. A large number of efficient stereo matching algorithms have been developed in recent years, but it is still a challenging task to handle the stereo matching problem in occluded, textureless and discontinuous regions. According to the classical taxonomy proposed by Scharstein and Szeliski [5], existing stereo algorithms can be mainly classified into global and local approaches. Global algorithms explicitly incorporate a smoothness assumption into an energy function that combines data and smoothness terms and estimate disparity by minimizing the global energy function. Belief propagation [6,7], graph cuts [8], and dynamic programming [9] are among the most commonly used global stereo matching optimization algorithms. They usually produce a more accurate disparity map than local methods but with higher computational complexity. On the other hand, local stereo matching algorithms only use the local information within a finite support window to estimate the disparity of each pixel, which makes them much more efficient but less reliable in ambiguous regions. In this paper, we construct an adaptive shape support region for each pixel and apply a modified guided filter based on this adaptive shape region to improve the reliability of the disparity map, especially for low texture structures. The winner-take-all strategy is then carried out to obtain the optimal disparity for each pixel. Finally, multi-constraints-based disparity refinement is applied to further improve the disparity map and eventually obtain a sub-pixel accuracy disparity map.
The remainder of this paper is organized as follows. The proposed matching cost computation method and matching cost aggregation strategy are first described in Section 2. Section 3 presents the experimental results and discussions about the method and results. Finally, Section 4 concludes this paper.

Methods
The workflow of the proposed method is shown in Figure 1. Similar to a basic local stereo matching method, the proposed method consists of four steps. (1) Matching cost computation: we propose a new matching cost computation method that combines the enhanced image gradient-based cost and the improved census transform-based cost. (2) Cost aggregation: first, a cross-based adaptive shape support window is constructed for each pixel; then a modified guided filter is implemented based on the constructed support window to aggregate the matching cost inside the window. (3) Disparity selection: a winner-take-all (WTA) strategy is used to find the optimal disparity for each pixel according to the aggregated cost. (4) Disparity refinement: in order to further detect and correct incorrect matching results, a multi-constraints-based disparity refinement framework is implemented, which includes outlier detection with a left-right consistency check, occlusion/mismatch handling, a weighted median filter, and subpixel enhancement.

Matching Cost Computation
In this step, a cost volume is built by computing the per-pixel matching cost at all disparity values under consideration. This cost volume is a three-dimensional array with a size of H × W × D, where H, W and D denote the height, width and disparity range of the images, respectively. Although the absolute difference on intensity or color channels is very simple and fast, it is too sensitive to radiometric differences and noise. Stereo matching methods using the absolute difference on intensity or color produce many errors on the disparity maps, especially for outdoor images, in which radiometric changes and noise are unavoidable. By contrast, gradient similarity [29] and the census transform [30] are more robust to radiometric distortion. To make the matching cost more robust to radiometric changes and noise, we propose a matching cost function that combines an enhanced image gradient-based cost with an improved census transform-based cost.
In order to obtain stronger edge information, an image enhancement algorithm is first applied to the input stereo images. Here, the input images are enhanced using the Contrast Limited Adaptive Histogram Equalization (CLAHE) algorithm [31]. Then the sum of absolute derivative differences of the left and right enhanced images in the x and y directions is used as a gradient-based cost measure. The enhanced image gradient-based matching cost C_GRAD^CLAHE(p, d) is computed according to Equation (1), where I_L represents the enhanced left image and I_R represents the enhanced right image. ∇_x I_L(p) and ∇_y I_L(p) denote the gradients in the x and y directions of the enhanced left image at a pixel p. Accordingly, ∇_x I_R(p − d) and ∇_y I_R(p − d) denote the gradients in the x and y directions of the enhanced right image at the pixel p − d, where d is the disparity.

Besides the gradient-based cost, the census transform-based cost is also computed using the enhanced left and right images. The original census transform is based on local intensity relations between the center pixel and its neighbor pixels. It relies only on the relative ordering of intensities and not their values, and thus it compensates for all radiometric distortions that preserve this ordering. However, Mei et al. [14] showed that the traditional census transform produces wrong matches in regions with repetitive local structures. To address this limitation, we present an improved census transform using gradients rather than the intensity itself. The improved census transform is formulated in Equation (2), where the operator ⊗ denotes bit-wise concatenation and N(p) represents the neighbor pixels of the anchor pixel p. q is a neighbor of p. ∇I(p) and ∇I(q) are the gradients of the enhanced image at pixel p and neighbor pixel q, and |·| denotes the gradient magnitude. ξ is a function that determines the bit value, as described in Equation (3).
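Equations (1)–(3) are not reproduced in this version of the text. Based on the surrounding definitions, a plausible reconstruction is the following (the exact bit convention in ξ is an assumption):

```latex
% Eq. (1): enhanced gradient-based matching cost
C^{\mathrm{CLAHE}}_{\mathrm{GRAD}}(p,d) =
  \left|\nabla_x I_L(p) - \nabla_x I_R(p-d)\right| +
  \left|\nabla_y I_L(p) - \nabla_y I_R(p-d)\right|

% Eq. (2): improved census transform on gradient magnitudes
CTg(p) = \bigotimes_{q \in N(p)} \xi\!\left(\left|\nabla I(p)\right|,\,\left|\nabla I(q)\right|\right)

% Eq. (3): bit-value function
\xi(a,b) =
\begin{cases}
  1, & b < a \\
  0, & \text{otherwise}
\end{cases}
```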
Then the Hamming distance between the two bit strings of the left and right images is calculated and used as the improved census transform-based matching cost.
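As an illustration, a minimal sketch of a census bit string and its Hamming-distance cost in Python (this uses plain intensity ordering; the paper's variant compares gradient magnitudes of the CLAHE-enhanced images instead, and the function names here are ours):

```python
import numpy as np

def census_bits(patch):
    # Bit string from comparing each neighbor against the window center
    # (classic census; the paper's improved variant compares gradient
    # magnitudes of the enhanced image instead of raw intensities).
    h, w = patch.shape
    center = patch[h // 2, w // 2]
    bits = (patch < center).astype(np.uint8).ravel()
    return np.delete(bits, bits.size // 2)  # drop the center self-comparison

def hamming(bits_l, bits_r):
    # Census matching cost: number of differing bits between the two strings.
    return int(np.count_nonzero(bits_l != bits_r))
```

Because only orderings matter, a monotonic intensity change (e.g. an exposure shift) leaves the bit string, and hence the cost, unchanged.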
In Equation (4), CTg_L and CTg_R are the census transform bit strings of pixels p and p − d in the left and right images respectively, and d denotes the disparity between the two pixels. The final combined matching cost is derived by merging the two cost components mentioned above. Here, λ is a normalizing parameter that controls the influence of outliers, and the exponential function is used to normalize each cost component to the range [0, 1] and ensure that the final matching cost will not be severely biased toward one of the component costs. C(p, d) is the final matching cost that combines the enhanced image gradient-based matching cost (C_GRAD^CLAHE) with the improved census transform-based matching cost (C_CTg).

Figure 2 presents the disparity maps obtained using the proposed enhanced gradient-based matching cost and improved census transform-based matching cost on the Tsukuba pair of the Middlebury dataset [32]. The original gradient-based and census transform-based matching costs are also shown in Figure 2 for comparison. To make the comparison fair, we use the same cost aggregation strategy as [23] and apply no refinement process. The first row compares the results of the original gradient-based and enhanced gradient-based costs: the enhanced gradient-based cost produces a better disparity map than the raw gradient-based cost at image boundaries (red circle mark). The second row compares the proposed modified census transform-based cost with the original census transform-based cost: according to the disparity maps, our modified cost performs visibly better at repetitive local structures (blue circle mark in Figure 2).
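The combined-cost equation itself is missing from this version of the text. A common robust-combination form consistent with the description above (the use of separate per-component parameters λ_GRAD and λ_CTg is our assumption) is:

```latex
\rho(c,\lambda) = 1 - \exp\!\left(-\frac{c}{\lambda}\right),
\qquad
C(p,d) = \rho\!\left(C^{\mathrm{CLAHE}}_{\mathrm{GRAD}}(p,d),\,\lambda_{\mathrm{GRAD}}\right)
       + \rho\!\left(C_{CTg}(p,d),\,\lambda_{CTg}\right)
```

Each ρ(·, λ) maps its cost component to [0, 1], so neither term can dominate the sum, matching the stated role of the exponential normalization.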
Figure 3 compares the proposed combined matching cost computation method with common individual cost methods, namely the absolute difference of intensity (AD), the absolute difference of image gradient, and the traditional census transform-based cost measurement. Tests are implemented on two image pairs of the Middlebury stereo dataset (i.e., Tsukuba, Teddy). All of the disparity maps are initial stereo matching results without any post-processing, generated by the same cost aggregation method, which is described in the next section, and the winner-take-all disparity computation strategy. From Figure 3, it can be found that the comprehensive performance of our proposed cost method clearly outperforms AD, the gradient-based matching cost, and the traditional census transform-based matching cost. AD cannot handle large textureless regions effectively, the gradient-based matching cost struggles with thin boundaries, while the traditional census transform-based matching cost fails at repetitive or similar local structures.

Matching Cost Aggregation Method
Per-pixel matching cost is sensitive to noise. Aggregating the matching cost over a local support window is an effective way to improve the accuracy and robustness of local stereo matching methods. Matching cost aggregation can be viewed as filtering on the cost volume. Given a cost volume C, the aggregated matching cost at pixel i with disparity d can be computed according to Equation (7), where C_{j,d} is the matching cost of pixel j with disparity d, pixel j is a neighbor of pixel i, W_ij is the weight coefficient of pixel j, and C′_{i,d} is the aggregated matching cost at pixel i with disparity d.
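Equation (7) is not reproduced here; from the definitions just given, it plausibly reads:

```latex
C'_{i,d} = \sum_{j} W_{ij}\, C_{j,d}
```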
As an efficient and effective edge-preserving image filter, the guided image filter has been successfully adopted in local stereo matching algorithms, achieving commendable disparity maps [23,33,34]. After applying the guided filter with guidance image G, the kernel W_ij in Equation (7) can be expressed as in Equation (8), where i and j represent pixel indexes, and G_i and G_j are the values of the guidance image at pixels i and j. µ_k and σ²_k are the mean and variance of the kernel window ω_k in the guidance image G, reflecting the statistical characteristics of the pixels inside the support window. |ω| is the number of pixels in the window ω_k with a fixed size r × r, and ε is a smoothness parameter.
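Equation (8) is missing from this version of the text; the standard guided-filter kernel of He et al., written with the symbols defined above, is:

```latex
W_{ij}(G) = \frac{1}{|\omega|^{2}}
\sum_{k:\,(i,j)\in\omega_{k}}
\left( 1 + \frac{(G_i - \mu_k)(G_j - \mu_k)}{\sigma_k^{2} + \varepsilon} \right)
```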
Matching cost aggregation implicitly assumes that the disparities of the pixels inside the aggregation window are the same, which means that the support window should only contain neighbor pixels that have the same disparity as the center pixel. However, the fixed-size support window of the traditional guided filter can hardly adapt to objects of different sizes in the scene or guarantee that the pixels inside the window have the same disparity, resulting in inappropriate aggregation, especially in discontinuous regions, because pixels with different disparities are involved. Besides, larger support windows are preferred in textureless regions because more accurate mean (µ_k) and variance (σ²_k) values can be estimated, improving the matching cost aggregation performance. In summary, fixed-size window guided filtering is not suitable for matching cost aggregation in textureless and discontinuous regions.
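For reference, the fixed-window baseline that the paper modifies can be sketched in a few lines of Python/NumPy (the standard He et al. formulation; function names and parameter defaults are ours, not the paper's):

```python
import numpy as np

def box_sum(a, r):
    # Sum over a (2r+1) x (2r+1) window around each pixel,
    # with edge replication at the image borders.
    padded = np.pad(a, r, mode="edge")
    win = np.lib.stride_tricks.sliding_window_view(padded, (2 * r + 1, 2 * r + 1))
    return win.sum(axis=(-2, -1))

def guided_filter(G, C, r=4, eps=1e-4):
    """Fixed-window guided filter applied to one cost slice C with
    guidance image G; window radius r, smoothing parameter eps."""
    n = float((2 * r + 1) ** 2)
    mean_G = box_sum(G, r) / n
    mean_C = box_sum(C, r) / n
    cov_GC = box_sum(G * C, r) / n - mean_G * mean_C
    var_G = box_sum(G * G, r) / n - mean_G ** 2
    a = cov_GC / (var_G + eps)   # per-window linear coefficient
    b = mean_C - a * mean_G      # per-window offset
    # Average the local linear models over all windows covering each pixel.
    return box_sum(a, r) / n * G + box_sum(b, r) / n

def aggregate_cost_volume(G, cost_volume, r=4, eps=1e-4):
    # Filter every disparity slice of the H x W x D cost volume.
    return np.stack(
        [guided_filter(G, cost_volume[:, :, d], r, eps)
         for d in range(cost_volume.shape[2])], axis=2)
```

Note the fixed square window in `box_sum` is exactly what the paper's adaptive shape support region replaces.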
In order to better aggregate matching cost in discontinuous and textureless regions, an adaptive shape support window is required. In general, pixels with similar intensities within a constrained area are more likely to be captured from the same image structure, and thus have similar disparities. Based on this assumption, Zhang et al. [19] and Mei et al. [14] proposed to construct cross-based support windows. Cross-based support regions are constructed by expanding each pixel p to the neighbor pixels that have intensities similar to p in the horizontal and vertical directions, expressed as H(p) and V(p) and bounded by four arm lengths. The support region S(p) is then generated by merging all of the pixels lying in the horizontal direction for each pixel q belonging to the vertical arm V(p), as illustrated in Figure 4.
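In the usual cross-based notation (the arm-length symbols here are ours), this construction can be written as:

```latex
H(p) = \left\{ (x, y) \;\middle|\; x \in \left[p_x - h_p^{-},\; p_x + h_p^{+}\right],\; y = p_y \right\}

V(p) = \left\{ (x, y) \;\middle|\; x = p_x,\; y \in \left[p_y - v_p^{-},\; p_y + v_p^{+}\right] \right\}

S(p) = \bigcup_{q \in V(p)} H(q)
```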
One of the core issues in constructing the cross-based adaptive shape support window is how to design proper rules to expand the pixel p to its neighbors. Zhang et al. [19] used the color similarity D_c(p, q) and a constant color similarity threshold to construct the cross-based support window. This method cannot perform well in depth-discontinuous regions and low-texture regions at the same time. In order to perform better in both, Mei et al. [14] used both the color similarity D_c(p, q) and the spatial distance D_s(p, q) to construct the support window. Two color similarity thresholds were set according to two spatial distance thresholds, as illustrated in Figure 5a: if the spatial distance is smaller than the threshold L_1, a larger color similarity threshold τ_1 is used; if the spatial distance is between L_1 and L_2, a smaller color similarity threshold τ_2 is used. We further improve the method of Mei et al. [14] by calculating the color similarity threshold for each pixel adaptively. As illustrated in Figure 4, to construct the adaptive support region of pixel p, the color difference D_c(p, q) and the spatial distance D_s(p, q) between pixels p and q are first calculated. According to the spatial distance D_s(p, q), the pixel q is labeled as a textured region pixel or a textureless region pixel: if D_s(p, q) is larger than the spatial distance threshold d_Lim, the pixel q is labeled as a textureless region pixel; otherwise it is labeled as a textured region pixel. In richly-textured regions, the color similarity thresholds should be larger and decrease with increasing spatial distance. In textureless regions, the color similarity thresholds can be smaller and should also decrease with increasing spatial distance.
Based on these findings, we adaptively calculate the color similarity threshold for each pixel according to the following two rules. In these rules, L_1 is a relatively small spatial distance constant and τ_1 is a relatively large color similarity constant for richly-textured regions; L_2 is a relatively large spatial distance constant and τ_2 is a relatively small color similarity constant for low texture regions. τ_large(D_s(p, q)) and τ_small(D_s(p, q)) are the adaptively calculated color similarity thresholds for richly-textured regions and textureless regions respectively. Rule 1 is a restrictive condition which ensures that only a limited area is included in richly-textured regions, while Rule 2 ensures that as many points from the same depth as possible are included in textureless regions. The adaptively calculated color similarity threshold is illustrated in Figure 5b.
Figure 5. The spatial distance thresholds and color similarity thresholds proposed by Mei et al. [14] (a) and the adaptive color similarity threshold proposed by this paper (b).
The support regions generated by these three approaches are presented in Figure 6, and we can find that more valid pixels are included in the support window generated by our proposed method in large low texture regions compared with the other two methods.

After the adaptive shape support region is constructed, the matching cost can be aggregated using the guided image filter over the support region. The guided filter kernel in Equation (8) is designed for a fixed-size square window. In order to apply the guided filter with an adaptive shape support window to the cost volume, the guided filter kernel must be modified accordingly. The modified weight kernel for the adaptive shape support window is defined as follows, where |N_i| and |N_k| represent the pixel numbers of the adaptive shape support regions N_i and N_k respectively. G_i and G_j are obtained directly from the guidance image, and µ_k and σ²_k are calculated from the pixels of the guidance image G inside the support region. Thus, the weight coefficient for each neighbor can be computed. To speed up the cost aggregation process, we use an orthogonal integral image technique [19] to aggregate costs in the adaptive shape regions horizontally and vertically respectively.
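The modified kernel equation is missing from this version of the text. A reconstruction that replaces the fixed windows ω_k (and the factor 1/|ω|²) of the standard kernel by the adaptive shape regions would be (this exact form is our assumption, inferred from the symbols |N_i| and |N_k| defined above):

```latex
W_{ij}(G) = \frac{1}{|N_i|}
\sum_{k:\,(i,j)\in N_k} \frac{1}{|N_k|}
\left( 1 + \frac{(G_i - \mu_k)(G_j - \mu_k)}{\sigma_k^{2} + \varepsilon} \right)
```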

Disparity Selection
The initial disparity for each pixel can be directly selected using the winner-take-all (WTA) strategy, where C′(p, d) represents the aggregated matching cost volume obtained by the cost aggregation and D denotes the set of all allowed candidate disparities. The disparity d_p for a specific pixel p is obtained by choosing the disparity that has the minimum aggregated matching cost.
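WTA selection is simply d_p = argmin over d in D of C′(p, d), i.e. a per-pixel arg-min over the cost volume; a minimal sketch using the H × W × D axis layout defined earlier:

```python
import numpy as np

def winner_take_all(cost_volume):
    """Select, for every pixel, the disparity index with the minimum
    aggregated cost. cost_volume has shape (H, W, D)."""
    return np.argmin(cost_volume, axis=2)
```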

Disparity Refinement by Multi-Constraints-Based Methods
The initial disparity maps obtained by WTA strategy still have many occluded and mismatched pixels. In this section, multi-constraints-based disparity refinement methods that consist of outlier detection, occlusion/mismatch interpolation, weighted median filter, and subpixel refinement are adopted to handle the disparity errors.
Outlier detection: In order to find the outliers in the left and right disparity maps, a left-right consistency check is applied. A pixel p is labeled as an outlier if it violates the following constraint, where d_L(p) and d_R(p − d_L(p)) are the disparities of pixels p and p − d_L(p) in the left and right disparity maps respectively. The outliers are further categorized into occlusions and mismatches according to the technique proposed by Hirschmuller [35], so that the disparity of each outlier pixel can be interpolated with an appropriate method in the interpolation step.

Occlusion/mismatch interpolation: In this step, we adopt different interpolation strategies for the detected occlusion and mismatch pixels. For an occluded pixel p, a valid non-occluded disparity value from the background region is required, since occluded areas normally lie on the background. Therefore, we extract the nearest reliable pixels in eight different directions and select the pixel with the lowest disparity value for interpolation. In contrast, the holes due to mismatches are filled by selecting the pixels with the most similar color value.
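The consistency constraint is the usual |d_L(p) − d_R(p − d_L(p))| ≤ t; a direct sketch in Python (the threshold of 1 pixel is our assumption, matching the common choice):

```python
import numpy as np

def left_right_check(d_left, d_right, thresh=1):
    """Mark a left-image pixel p as consistent when
    |d_L(p) - d_R(p - d_L(p))| <= thresh."""
    h, w = d_left.shape
    valid = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            xr = x - int(d_left[y, x])  # matching column in the right image
            if 0 <= xr < w and abs(d_left[y, x] - d_right[y, xr]) <= thresh:
                valid[y, x] = True
    return valid  # False entries are outliers (occlusions or mismatches)
```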
Weighted median filter: A weighted median filter [36] is usually implemented after the interpolation process to smooth outliers and streak-like artifacts while preserving object boundaries. In this paper, only the invalid pixels are filtered, using bilateral weights in which I(p) and I(q) represent the intensity values of the pixels p and q, and the parameters σ_s and σ_c adjust the spatial distance and color similarity terms respectively. The filter assigns higher weights to pixels that are spatially close and similar in color.

Subpixel refinement: Finally, a subpixel estimation approach based on quadratic polynomial interpolation is performed to reduce the discontinuities caused by discrete disparity levels [35]. For each pixel p, its optimal sub-pixel disparity value d_sub is determined by fitting a parabola through the costs around the minimum, where d is the discrete disparity with the minimal cost, d⁻ = d − 1, and d⁺ = d + 1. C′(p, d), C′(p, d⁺), and C′(p, d⁻) denote the aggregated costs at the corresponding disparities. Finally, a 3 × 3 median filter is adopted to remove a few remaining spikes.
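The two formulas referenced above are missing from this version of the text; standard forms consistent with the definitions are the following (the exact normalizations inside the bilateral exponent are assumptions):

```latex
% bilateral weight between pixels p and q
w(p,q) = \exp\!\left(-\frac{\|p-q\|^{2}}{2\sigma_s^{2}}
                     -\frac{|I(p)-I(q)|^{2}}{2\sigma_c^{2}}\right)

% sub-pixel disparity from a parabola fit through (d^-, d, d^+)
d_{\mathrm{sub}} = d + \frac{C'(p,d^{-}) - C'(p,d^{+})}
                        {2\left(C'(p,d^{-}) - 2\,C'(p,d) + C'(p,d^{+})\right)}
```

The parabola-vertex form follows directly from quadratic interpolation of the three aggregated costs around the WTA minimum.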

Results and Discussion
In this section, we evaluate the performance of our proposed cost computation measurement and cost aggregation strategy. The rectified stereo image pairs from the Middlebury benchmark dataset [32] are used as experimental data. The experimental parameters are given in Table 1, and they are kept constant for all tests. The percentage of bad pixels of the estimated disparities over the stereo pairs serves as the evaluation criterion, and the disparity error threshold was set to 1.0 pixel.

Evaluation of the Robustness to the Illumination and Exposure of Our Cost Computation Method
To verify the effectiveness of our proposed cost computation method, we used six pairs of stereo images with ground truth: Aloe, Baby1, Bowling1, Cloth1, Flowerpots, and Rocks1, provided by the Middlebury 2006 datasets [32], which have three illuminations (1, 2, 3) and three different exposure settings (0, 1, 2). Figure 7 shows the left image of the Aloe data under the three different illuminations (no exposure variation) and with the three different exposure settings (no illumination change). Three widely used cost computation methods were considered for comparison: a function combining the sum of absolute differences (SAD) and gradient [6], a function combining the absolute difference (AD) with the census transform [14], and a combination of absolute difference, gradient, and census transform [15]. To highlight the performance of the cost computation functions, all disparity maps were computed with the same cost aggregation algorithm proposed in Section 2.2, and no further refinement was applied.

Figures 8 and 9 show two pairs of the left and right images, the ground truth maps, and the disparity maps obtained by our proposed cost function and the other three combined cost methods under various illuminations and with different exposure settings, respectively. In Figure 8, the left images are captured under Illumination 1, the right images are captured under Illumination 3, and both use the same Exposure 1. In Figure 9, the left images are taken with Exposure 0, while the right images are taken with Exposure 2, and both the left and right images are under the same Illumination 2. Comparing Figure 8g with Figure 8d-f, the disparity maps in Figure 8g are better than those in Figure 8d-f, which indicates that the proposed cost computation method is more robust to illumination variation for these textureless stereo images.
Comparing Figure 9g with Figure 9d,f, the disparity maps in Figure 9g are better than those in Figure 9d,f, which indicates that the proposed cost computation method is more robust to exposure variation than the cost combining SAD with gradient and the cost combining AD, gradient and census. The disparity maps in Figure 9g and the disparity maps in Figure 9e show no significant differences.

Evaluation of the Robustness to the Illumination and Exposure of Our Cost Computation Method
To verify the effectiveness of our proposed cost computation method, we used six pairs of stereo images with ground truth: Aloe, Baby1, Bowling1, Cloth1, Flowerpots, and Rocks1, provided by the Middlebury 2006 dataset [32], which offers three illuminations (1, 2, 3) and three exposure settings (0, 1, 2). Figure 7 shows the left image of the Aloe data under the three illuminations (no exposure variation) and with the three exposure settings (no illumination change). Three widely used cost computation methods were considered for comparison: a function combining the sum of absolute differences (SAD) and gradient [6], a function combining the absolute difference (AD) with the census transform [14], and a combination of absolute difference, gradient, and census transform [15]. To isolate the influence of the cost computation function, all disparity maps were computed with the same cost aggregation algorithm proposed in Section 2.2, and no further refinement was applied. Figures 8 and 9 show two pairs of left and right images, the ground truth maps, and the disparity maps obtained by our proposed cost function and the three other combined cost methods, under varying illumination and varying exposure, respectively. In Figure 8, the left images are captured under Illumination 1 and the right images under Illumination 3, both with the same Exposure 1. In Figure 9, the left images are taken with Exposure 0 and the right images with Exposure 2, both under the same Illumination 2. Comparing Figure 8g with Figure 8d-f, the disparity maps in Figure 8g are better than those in Figure 8d-f, which indicates that the proposed cost computation method is more robust to illumination variation for these textureless stereo images.
Comparing Figure 9g with Figure 9d,f, the disparity maps in Figure 9g are better than those in Figure 9d,f, which indicates that the proposed cost computation method is more robust to exposure variation than the cost combining SAD with gradient and the cost combining AD, gradient, and census. The disparity maps in Figure 9g and those in Figure 9e show no significant differences.

The average percentages of bad pixels of the disparity maps computed with these four cost measurements under the two radiometric conditions are shown in Tables 2 and 3. As a reference, we also computed the disparity maps without any illumination change (Illumination 1) or exposure change (Exposure 1); the result is shown in Table 4. In Table 2, the proposed cost computation method achieved the lowest matching error rate on five of the six stereo pairs and the best average error rate, which indicates that it is more robust to illumination variation than the competing methods. In Table 3, the proposed method again achieved the lowest error rate on five of the six pairs and the best average, which indicates that it is more robust to exposure variation than the competing methods. In Table 4, the proposed method achieved the lowest error rate on three of the six pairs and the best average, which indicates that it is comparable to state-of-the-art cost computation methods when no radiometric change is present. From the results in Tables 2-4, we conclude that our proposed cost function is less sensitive to lighting and exposure changes. This is mostly because the absolute difference is too sensitive to image intensity variations, which weakens the accuracy of the other three approaches.

Additionally, our proposed cost method is more robust under different exposure configurations than under different illumination settings. One possible reason is that a change of exposure can be regarded as a global linear transformation, while a change of illumination results in local radiometric differences.
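The general pattern of the combined cost can be illustrated with a small sketch. The snippet below is not the paper's exact formulation: the 3 x 3 census window, the horizontal gradient operator, and the exponential blending with weights lam_grad and lam_ct (standing in for the regularization parameters discussed later) are assumptions chosen only to show how a census Hamming cost and a gradient cost are typically fused into one robust measure.

```python
import numpy as np

def census_transform(img, win=3):
    """Census transform: each pixel becomes a bit string recording whether
    each neighbour in a win x win window is darker than the centre pixel."""
    h, w = img.shape
    r = win // 2
    pad = np.pad(img, r, mode="edge")
    code = np.zeros((h, w), dtype=np.uint64)
    for dy in range(win):
        for dx in range(win):
            if dy == r and dx == r:
                continue  # skip the centre pixel itself
            bit = (pad[dy:dy + h, dx:dx + w] < img).astype(np.uint64)
            code = (code << np.uint64(1)) | bit
    return code

def hamming(a, b):
    """Per-pixel Hamming distance between two census bit strings."""
    x = a ^ b
    cnt = np.zeros(x.shape, dtype=np.uint64)
    while x.any():
        cnt += x & np.uint64(1)
        x >>= np.uint64(1)
    return cnt.astype(np.float64)

def combined_cost(left, right, d, lam_grad=25.0, lam_ct=15.0):
    """One cost-volume slice at disparity d: exponential blend of a
    gradient cost and a census Hamming cost (hypothetical weights)."""
    right_s = np.roll(right, d, axis=1)  # shift right image by disparity d
    gl = np.gradient(left.astype(np.float64), axis=1)
    gr = np.gradient(right_s.astype(np.float64), axis=1)
    c_grad = np.abs(gl - gr)
    c_ct = hamming(census_transform(left), census_transform(right_s))
    # Robust combination: each term saturates towards 1 for large costs,
    # so neither term can dominate under radiometric distortion.
    return 2.0 - np.exp(-c_grad / lam_grad) - np.exp(-c_ct / lam_ct)
```

Because both the census comparison and the gradient are invariant to an additive intensity offset, a cost built this way changes little under global exposure shifts, which is consistent with the behaviour reported above.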

Evaluation of Adaptive Shape Guided Filter on the Middlebury Benchmark Dataset
In this section, we chose 21 stereo pairs from the Middlebury 2006 dataset for evaluation. To verify the effectiveness of our proposed algorithm, we evaluated both the local stereo matching method with the original guided image filter (GF) [23] and our proposed stereo matching method. The parameters of cost aggregation with the original guided filter, such as the window size and the smoothing parameter, were set according to [23]. We adopted the same cost computation method in both algorithms, which was shown in Section 3.1 to be more robust than the other combined cost computation methods. Further, to obtain more accurate disparity maps, we also applied the same multi-constraints-based disparity refinement to both methods to exclude as many outliers as possible.
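The baseline aggregation in this comparison is the guided image filter of [23]. A minimal grayscale sketch of that filter is given below; the paper's adaptive-shape modification replaces the fixed square window with the cross-based region, and the radius r and smoothing parameter eps here are placeholder values, not the paper's settings.

```python
import numpy as np

def box_filter(a, r):
    """Mean filter over a (2r+1) x (2r+1) window via an integral image,
    with edge padding so the output has the same shape as the input."""
    k = 2 * r + 1
    h, w = a.shape
    p = np.pad(a, r, mode="edge")
    c = np.cumsum(np.cumsum(p, axis=0), axis=1)
    c = np.pad(c, ((1, 0), (1, 0)))  # zero row/column for clean differencing
    s = c[k:k + h, k:k + w] - c[:h, k:k + w] - c[k:k + h, :w] + c[:h, :w]
    return s / (k * k)

def guided_filter(guide, src, r=9, eps=0.01):
    """Guided filter of one cost slice `src` using grayscale `guide`:
    locally fits src ~ a * guide + b, then averages the coefficients."""
    I = guide.astype(np.float64)
    p = src.astype(np.float64)
    mean_I, mean_p = box_filter(I, r), box_filter(p, r)
    var_I = box_filter(I * I, r) - mean_I * mean_I
    cov_Ip = box_filter(I * p, r) - mean_I * mean_p
    a = cov_Ip / (var_I + eps)      # eps regularizes flat regions
    b = mean_p - a * mean_I
    return box_filter(a, r) * I + box_filter(b, r)
```

In cost aggregation, `guided_filter` is applied to every disparity slice of the cost volume with the reference image as the guide, so aggregated costs follow intensity edges rather than blurring across them.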
The stereo matching results of six pairs of textureless stereo images (Lampshade1, Lampshade2, Midd1, Midd2, Monopoly, and Plastic) are shown in Figure 10. Comparing the disparity maps in Figure 10e produced by our proposed method with those in Figure 10c produced by the traditional guided filter algorithm, the disparity maps of our proposed method are much smoother in the textureless regions and preserve the edges better. Comparing the error maps of the proposed method in Figure 10f with those of the traditional guided filter algorithm in Figure 10d, the proposed method yields far fewer error pixels. The results presented in Figure 10 indicate that our proposed algorithm performs better than the original guided filter on images with large low texture regions.
The percentages of bad pixels in the matching results of all 21 stereo pairs for our proposed method and the traditional guided filter algorithm are shown in Table 5. The traditional guided filter algorithm performs notably better (by more than one percentage point of bad pixels) than our proposed method on 7 of the 21 stereo pairs (Aloe, Baby1, Baby2, Bowling1, Bowling2, Cloth2, and Wood1). On 8 of the 21 stereo pairs (Baby3, Cloth1, Cloth3, Cloth4, Flowerpots, Rocks1, Rocks2, and Wood2), our proposed method achieved a bad pixel rate comparable to the traditional guided filter algorithm. For the six pairs of low texture stereo images, the proposed method achieved a much better result than the traditional guided filter algorithm. Although the error pixels increased in 8 of the 21 image pairs, our algorithm is clearly more effective for textureless images. In particular, for the textureless image pairs Midd1 and Midd2, our proposed algorithm recovers noticeably more correct disparities in the low texture background (the third and fourth rows in Figure 10). The error rate of the Midd1 data is reduced by 23.8% and that of the Midd2 data by about 19%.
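The bad pixel percentages reported in Table 5 follow the standard Middlebury error measure: the share of valid pixels whose disparity error exceeds a threshold. A minimal sketch is given below; the one-pixel threshold and the convention that unknown ground truth is encoded as zero are assumptions about the evaluation setup.

```python
import numpy as np

def bad_pixel_rate(disp, gt, threshold=1.0, valid=None):
    """Percentage of valid pixels whose absolute disparity error exceeds
    `threshold` (the Middlebury 'bad pixel' measure)."""
    if valid is None:
        valid = gt > 0  # assume unknown ground-truth disparity is encoded as 0
    err = np.abs(disp.astype(np.float64) - gt.astype(np.float64))
    bad = (err > threshold) & valid
    return 100.0 * bad.sum() / valid.sum()
```

For example, a ground truth map with five valid pixels, one of which is estimated more than one pixel off, yields a bad pixel rate of 20%.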


Evaluation of the Influences of Parameter Settings
Selecting proper parameters is very important in local stereo matching methods. In this section, we explore the influence of different parameter settings. The main parameters in the cost computation step are the regularization parameters λ_GRAD and λ_CTg, which adjust the relative proportions of the two costs in the combined cost function. We chose five textureless images as test images and tuned each parameter individually while keeping the rest constant. Figure 11a,b shows the quantitative influence of λ_GRAD and λ_CTg on disparity estimation in detail. The disparity results are stable with λ_GRAD and λ_CTg varying from 15 to 45; the overall performance is relatively better when λ_GRAD is set to 25 and λ_CTg to 15, so these values are recommended. In the cross-based support window construction step, the color similarity threshold τ_1 and the arm length threshold L_1 are used to adjust the size of the support window in richly textured regions. These two parameters have no significant influence on the disparity accuracy for most images, as shown in Figure 11c,d, since we set a small distance threshold (d_Lim = 9) to distinguish textureless regions from textured ones. For the Lampshade2 dataset, the parameter settings (τ_1 = 30, L_1 = 31) improved the disparity accuracy by about 2% in the discontinuous regions. The experimental results are more affected by tuning the color similarity threshold τ_2 and the arm length threshold L_2, which determine the size of the adaptive support window in large textureless regions. In this paper, we set τ_2 to 6 and L_2 to 80 to obtain good accuracy.
In addition, the disparity results degraded gradually with an increasing distance threshold d_Lim, so a small value of 9 was selected. The smoothing parameter ε of the modified guided filter was set according to [23].
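The roles of τ_1/L_1, τ_2/L_2, and d_Lim in growing the cross-shaped support window can be sketched as follows. The paper's exact growing rule is not reproduced; this sketch only assumes the common cross-based scheme in which an arm extends while the color difference to the anchor pixel stays below a threshold, switching from the looser pair (τ_1, L_1) to the stricter, longer-range pair (τ_2, L_2) once the arm passes d_Lim.

```python
import numpy as np

def arm_length(img, y, x, dy, dx, tau1=30, L1=31, tau2=6, L2=80, d_lim=9):
    """Grow one arm of the cross window from (y, x) in direction (dy, dx).
    Within d_lim pixels the looser threshold tau1 (capped at L1) applies;
    beyond d_lim the stricter tau2 (capped at L2) takes over, letting arms
    span large textureless regions without leaking across real edges."""
    h, w = img.shape
    ref = float(img[y, x])
    n = 0
    for step in range(1, L2 + 1):
        yy, xx = y + step * dy, x + step * dx
        if not (0 <= yy < h and 0 <= xx < w):
            break  # stop at the image border
        tau, cap = (tau1, L1) if step <= d_lim else (tau2, L2)
        if step > cap or abs(float(img[yy, xx]) - ref) > tau:
            break  # length cap reached or color similarity violated
        n = step
    return n

def cross_window(img, y, x, **kw):
    """Arm lengths (up, down, left, right) of the adaptive cross-shaped
    support region anchored at pixel (y, x)."""
    dirs = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
    return {name: arm_length(img, y, x, dy, dx, **kw)
            for name, (dy, dx) in dirs.items()}
```

With this rule, arms in uniform regions keep growing past d_Lim because the strict threshold τ_2 is still satisfied, while arms stop early at intensity edges, which is the behaviour the parameter study above examines.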

Conclusions
It is challenging to handle occluded, textureless, and discontinuous regions in stereo matching. In order to obtain better disparity maps in large textureless regions, this paper proposed a local stereo matching method using an efficient combined matching cost measurement and an adaptive shape guided filter. A matching cost computation method is proposed that combines an enhanced gradient-based matching cost with an improved census transform-based matching cost and is more robust against exposure variations, illumination changes, and textureless areas. In addition, a cross-based adaptive shape support region is constructed for each pixel using an adaptively calculated color similarity threshold, and an adaptive shape guided filter based on this cross-shaped support region is implemented to aggregate the matching cost, thereby improving the accuracy of disparity estimation in large low texture regions. Experiments were conducted on the Middlebury benchmark dataset to validate the proposed methods. The experimental results indicate that our proposed stereo matching method can produce more accurate disparity maps for large low texture regions. The average percentage of bad pixels of our proposed method is about 9.40%, which is much lower than that of the original guided filter-based method.