## 1. Introduction

Stereo matching is an important process in the field of computer vision, the goal of which is to reconstruct three-dimensional (3D) information from a scene with left and right stereo images [1]. Stereo matching algorithms have been commonly applied in medical imaging and 3D imaging systems, such as satellite-based earth and space exploration, autonomous robots, and vehicle and security systems [2]. Stereo matching is a challenging task due to difficulties such as textureless regions, occlusion, illumination variation, the fattening effect, discontinuity, flying snow, sun flare, and rain blur [3,4].

Sparse stereo matching methods typically use feature descriptors, such as the scale-invariant feature transform [5] and speeded-up robust features [6], to compute a sparse disparity map, in which not all pixels have disparity values [7,8,9]. Sarkis and Diepold [10] introduced an approach to convert sparse disparity maps to dense maps. The efficient large-scale stereo matching method (ELAS) [11] operates on rectified input images, such that correspondences are restricted to the same line in both images.

In our work, we address a different problem, in which the input stereo images have been rectified but the rectification is imperfect. Unlike ELAS, our proposed method does not assume that correspondences are restricted to the same line in both images. In addition, our proposed method is a dense stereo matching method, so it requires no interpolation step.

Scharstein et al. [12] classified stereo matching algorithms into local and global algorithms, which consist of matching cost computation, cost aggregation, depth map computation, and depth map refinement steps. The matching cost computation step is required for both types of stereo matching algorithms and is important to the accuracy of the disparity map. The output of the matching cost computation step is a disparity space image $\mathbf{C}$ [12], in which ${\mathbf{C}}_{d}\left(\mathbf{p}\right)$ is the matching cost value of a pixel $\mathbf{p}$ in the reference image, e.g., the left image of a stereo pair, at a disparity hypothesis $d$.

Local stereo matching algorithms use cost aggregation techniques to locally smooth the matching cost values in $\mathbf{C}$. Let ${\mathbf{C}}^{\prime}$ be the result of applying a cost aggregation technique to $\mathbf{C}$. From ${\mathbf{C}}^{\prime}$, a disparity value for $\mathbf{p}$ can be obtained by using a winner-takes-all strategy, as follows:

$${\mathbf{D}}_{E}\left(\mathbf{p}\right)=\underset{d}{\arg\min}\;{\mathbf{C}}_{d}^{\prime}\left(\mathbf{p}\right),\qquad (1)$$

where ${\mathbf{D}}_{E}$ is an estimated disparity map.
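The winner-takes-all selection can be sketched in a few lines of NumPy. This is only an illustration, not the authors' implementation: the array layout `(num_disparities, height, width)` and the names `winner_takes_all` and `toy_costs` are assumptions made for the example, and the costs are assumed to be dissimilarities (lower is better).

```python
import numpy as np

def winner_takes_all(cost_volume: np.ndarray) -> np.ndarray:
    """Return, for every pixel, the disparity with the lowest aggregated cost.

    cost_volume has shape (num_disparities, height, width); entry [d, y, x]
    plays the role of C'_d(p) for the pixel p = (x, y). The layout is an
    assumption of this sketch.
    """
    return np.argmin(cost_volume, axis=0)

# Toy aggregated cost volume: 3 disparity hypotheses on a 1 x 2 image.
toy_costs = np.array([
    [[5.0, 1.0]],  # costs at d = 0
    [[2.0, 4.0]],  # costs at d = 1
    [[7.0, 3.0]],  # costs at d = 2
])
disparity_map = winner_takes_all(toy_costs)
```

For a similarity measure, where larger values indicate better matches, the same sketch would use `np.argmax` instead.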

Global stereo algorithms can use global optimization methods, such as graph-cut [13] or belief propagation [14], to minimize an energy function that constrains the smoothness of the disparities between two neighboring pixels. In global stereo matching, the energy function is first defined and is then solved as an energy minimization problem. A disparity map with higher energy is more erroneous, whereas a disparity map with lower energy is more accurate. The typical form of an energy function in stereo matching is

$$E\left(\mathbf{D}\right)={E}_{data}\left(\mathbf{D}\right)+{E}_{smooth}\left(\mathbf{D}\right),\qquad (2)$$

where ${E}_{data}$ is the measurement of the photo consistency, which is computed using a matching cost function, and ${E}_{smooth}$ is a measurement of the smoothness, which is defined as follows:

$${E}_{smooth}\left(\mathbf{D}\right)=\sum_{\left(\mathbf{p},\mathbf{q}\right)\in \mathsf{\Omega}}s\left({d}_{\mathbf{p}},{d}_{\mathbf{q}}\right)\qquad (3)$$

and

$$s\left({d}_{\mathbf{p}},{d}_{\mathbf{q}}\right)=\begin{cases}0, & {d}_{\mathbf{p}}={d}_{\mathbf{q}}\\ \Delta , & \text{otherwise,}\end{cases}\qquad (4)$$

where $\Delta $ is a predefined penalty value that balances the smoothness and data terms, $\mathsf{\Omega}$ is the set of neighboring pixels in the reference image, and $s\left(\right)$ is a smoothness function that gives a penalty if the disparities of two pixels are different. ${d}_{\mathbf{p}}$ and ${d}_{\mathbf{q}}$ are the disparity values of pixels $\mathbf{p}$ and $\mathbf{q}$, respectively.
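To make the roles of the data and smoothness terms concrete, the following sketch evaluates the energy of a given disparity map. It rests on several assumptions that are not stated in the text: the data term is taken as the sum of the matching costs of the chosen disparities, the neighborhood is 4-connected, and the smoothness function is a constant (Potts-style) penalty `delta` for every neighboring pair with different disparities. The function name `potts_energy` and the toy arrays are hypothetical.

```python
import numpy as np

def potts_energy(cost_volume, disparity_map, delta):
    """Energy of a disparity map: data term plus a constant-penalty smoothness term."""
    height, width = disparity_map.shape
    rows, cols = np.indices((height, width))
    # Data term (assumed form): cost of the selected disparity at every pixel.
    e_data = cost_volume[disparity_map, rows, cols].sum()
    # Smoothness term: one penalty per 4-connected pair with unequal disparities.
    horizontal = disparity_map[:, 1:] != disparity_map[:, :-1]
    vertical = disparity_map[1:, :] != disparity_map[:-1, :]
    e_smooth = delta * (horizontal.sum() + vertical.sum())
    return float(e_data + e_smooth)

# Toy cost volume with 2 disparity hypotheses on a 2 x 2 image.
toy_costs = np.array([
    [[1.0, 4.0], [1.0, 1.0]],  # costs at d = 0
    [[3.0, 2.0], [3.0, 3.0]],  # costs at d = 1
])
noisy_map = np.array([[0, 1], [0, 0]])   # pixel-wise cheapest choice
smooth_map = np.array([[0, 0], [0, 0]])  # constant disparity

energy_noisy = potts_energy(toy_costs, noisy_map, delta=1.5)
energy_smooth = potts_energy(toy_costs, smooth_map, delta=1.5)
```

In this toy case the pixel-wise cheapest map has the lower data term (5 versus 7) but pays two smoothness penalties, so with `delta=1.5` the constant map ends up with the lower total energy. A global optimizer such as graph-cut searches for the map that minimizes this total rather than evaluating candidates one by one.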

According to Hirschmuller et al. [15], radiometric differences between stereo images are inherent and inevitable even when the images are produced under controlled lighting and exposure conditions. However, advanced stereo matching cost functions [16,17] can operate robustly with stereo images of different intensity transformations. In other words, the radiometric distortion problem in stereo matching can be solved in the matching cost computation step. Textureless regions, discontinuity, and occlusion problems can be solved by cost aggregation or depth map computation processes [18].

The assumption of existing dense stereo matching algorithms is that input stereo images are perfectly rectified such that correspondent pixels between the rectified stereo images have the same y-coordinate values. This assumption is commonly known as the frontal-parallel assumption. However, obtaining perfect rectification for a stereo pair, especially for large stereo images, is currently a challenge [19]. Therefore, when working on stereo images with high resolution, stereo matching algorithms are required to consider this imperfect rectification problem, as the frontal-parallel assumption does not hold true anymore.

A stereo pair typically undergoes a rectification process before being used as input to a stereo matching algorithm. The rectification process aims for correspondent pixels between stereo images to be located in the same frontal-parallel lines (or epipolar lines). However, according to [19], it is difficult to achieve perfect results with current rectification methods when operating on a stereo pair with high resolution. Correspondent pixels in stereo images with imperfect rectification may be located in different epipolar lines [19]. This means that correspondent pixels do not satisfy the frontal-parallel assumption that all dense stereo matching algorithms require. The imperfect rectification problem is unavoidable when rectifying high resolution stereo images, even when using advanced rectification methods [19]. At the same time, the need for high resolution stereo images is on the rise [18,19]. However, there is a lack of research on imperfect rectification in stereo matching, and most previous studies [20,21,22,23,24,25,26,27,28] do not address this problem of high resolution images.

Existing stereo matching methods are dense methods that compute disparity values for each pixel, and most algorithms implicitly or explicitly assume an epipolar geometry in which corresponding pixels lie on the same epipolar line. Currently, only the Middlebury dataset provides stereo images with high resolution and imperfect rectification, and these stereo images are not included in its benchmark. Therefore, existing research focuses only on low- and high-resolution stereo images with perfect rectification.

In this paper, we propose several novel matching cost functions, built on state-of-the-art matching cost functions, for high resolution stereo images. We use the Middlebury dataset [19] to evaluate the proposed matching cost functions in local and global stereo matching frameworks. The tested local stereo matching algorithms include the absolute difference (AD)-based window algorithm, squared difference (SD)-based window algorithm, Rank-based window algorithm, Census-based window algorithm, normalized cross correlation (NCC), and zero-mean normalized cross correlation (ZNCC) [29]. According to [15,30], NCC and ZNCC can be considered local stereo algorithms, so in our experiments, we do not apply cost aggregation (via a window) for NCC and ZNCC. The tested global stereo matching algorithms include the AD and graph cut (GC) [13], SD and GC, Rank and GC, and Census and GC algorithms.

## 3. Experimental Results

We used the Middlebury [19,33] dataset to measure the performance of matching cost functions including AD, SD, NCC, ZNCC, Rank, Census, ImpAD, ImpSD, ImpNCC, ImpZNCC, ImpRank, and ImpCensus in local and global frameworks. In the present experiments, we do not intend to compare the tested matching cost functions and stereo matching algorithms against one another. Instead, we compare the performance of each stereo matching algorithm before and after applying the modification that addresses the imperfect rectification problem.

For each of the test matching cost functions, we implemented local and global stereo matching algorithms that use the function in the matching cost computation. For local stereo matching algorithms, we used a 15 × 15 window to aggregate the matching costs in $\mathbf{C}$. For global stereo matching, we used graph-cut (GC) [34] to smooth $\mathbf{C}$, with the GC source code from [35]. We carefully tuned the GC parameters for the global stereo algorithms that use AD, SD, Rank, and Census by using stereo images with perfect rectification as training examples. The global stereo matching algorithms that are based on ImpAD, ImpSD, ImpRank, and ImpCensus use the same parameter values as the global algorithms that are based on the AD, SD, Rank, and Census matching cost functions, respectively.

According to [15,30], NCC and ZNCC can be considered local stereo matching algorithms; hence, we do not apply cost aggregation techniques and global optimization methods for NCC, ZNCC, ImpNCC, and ImpZNCC. For the Rank, Census, ImpRank, ImpCensus, NCC, ZNCC, ImpNCC, and ImpZNCC functions, which require a support window, we used a 9 × 9 window.

For AD, SD, ImpAD, and ImpSD, we subtract from each pixel of the input stereo images the mean value of an image window centered on that pixel. As a result, these four matching cost functions are less affected by illumination differences between the stereo images, and we can better measure the effect of the modification that addresses the imperfect rectification problem. We used a 9 × 9 window for this mean subtraction.
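The mean subtraction can be sketched as follows. The text specifies only the 9 × 9 window, so the border handling (edge replication) and the function name `subtract_local_mean` are assumptions of this example rather than details of the authors' implementation.

```python
import numpy as np

def subtract_local_mean(image, window=9):
    """Subtract from every pixel the mean of the window centered on it."""
    half = window // 2
    # Border handling is an assumption: replicate the edge pixels.
    padded = np.pad(image.astype(np.float64), half, mode="edge")
    height, width = image.shape
    local_sum = np.zeros((height, width), dtype=np.float64)
    # Accumulate the window sum by adding shifted copies of the padded image.
    for dy in range(window):
        for dx in range(window):
            local_sum += padded[dy:dy + height, dx:dx + width]
    return image - local_sum / (window * window)

rng = np.random.default_rng(0)
left = rng.integers(0, 256, size=(12, 16)).astype(np.float64)
brighter = left + 40.0  # same image under a uniform brightness offset

normalized_left = subtract_local_mean(left)
normalized_brighter = subtract_local_mean(brighter)
```

A uniform brightness offset shifts each pixel and its local mean by the same amount, so both normalized images coincide. This is why the pre-processing makes AD and SD less sensitive to such illumination differences.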

The performance of these four matching cost functions is measured by using the winner-takes-all strategy for $\mathbf{C}$. All of the matching cost algorithms were evaluated using the average percentage of erroneous pixels in all zones, except occluded areas, computed at a 2-pixel error threshold. This error threshold is the default value in Middlebury benchmark 3 [19]. The error percentage ($Err$) was computed as follows:

$$Err=\frac{100\%}{\left|{I}_{nocc}\right|}\sum_{\mathbf{p}\in {I}_{nocc}}\left[\,\left|{\mathbf{D}}_{G}\left(\mathbf{p}\right)-{\mathbf{D}}_{E}\left(\mathbf{p}\right)\right|>2\,\right],$$

where ${I}_{nocc}$ is the set of all nonoccluded pixels, $\left|{I}_{nocc}\right|$ is the number of pixels in ${I}_{nocc}$, ${\mathbf{D}}_{G}\left(\mathbf{p}\right)$ and ${\mathbf{D}}_{E}\left(\mathbf{p}\right)$ are the ground truth and estimated disparity at $\mathbf{p}$, respectively, and the bracket equals 1 when the condition inside it holds and 0 otherwise.
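The error measure can be illustrated with a short NumPy sketch. The function name `error_rate`, the boolean mask representation of the nonoccluded set, and the toy values are assumptions made for the example; a pixel is counted as erroneous when its absolute disparity error exceeds the 2-pixel threshold.

```python
import numpy as np

def error_rate(ground_truth, estimate, nonoccluded, threshold=2.0):
    """Percentage of nonoccluded pixels whose disparity error exceeds the threshold."""
    bad = np.abs(ground_truth - estimate) > threshold
    return 100.0 * bad[nonoccluded].sum() / nonoccluded.sum()

# Toy example: four pixels, the last one is occluded and therefore ignored.
ground_truth = np.array([[10.0, 20.0, 30.0, 40.0]])
estimate = np.array([[10.5, 23.0, 30.0, 90.0]])
nonoccluded = np.array([[True, True, True, False]])

err = error_rate(ground_truth, estimate, nonoccluded)
```

Here only the second pixel is both nonoccluded and off by more than 2 pixels, so one of three evaluated pixels is counted as erroneous.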

Middlebury dataset 3 [36] provides test and training stereo images under four conditions: varying illumination, varying exposure, perfect rectification, and imperfect rectification. The training stereo images come with ground truth, whereas the test images do not. The Middlebury benchmark 3 compares submitted stereo matching algorithms using the test dataset with these four conditions. However, in this paper, we focus on solving the imperfect rectification problem of stereo images. Therefore, in our experiments, we use the training datasets, which contain stereo images with imperfect rectification and varying illumination and exposure.

Table 1 presents data for the stereo images in the training datasets. We implemented three versions of each proposed matching cost function, with $R=0$, $R=1$, and $R=2$. Setting $R=0$ leaves the original matching cost function unchanged; for example, ImpZNCC with $R=0$ is simply ZNCC.

#### 3.1. ImpCensus and ImpRank

We conducted experiments to evaluate the performance of the Census, Rank, ImpCensus, and ImpRank matching cost functions in local and global stereo matching approaches. Denote ImpCensus/Win/R1 as a local stereo matching algorithm that uses the ImpCensus matching cost function with $R=1$ to construct $\mathbf{C}$ and aggregates matching costs using a window. In addition, denote ImpCensus/GC/R1 as a global stereo matching algorithm that uses ImpCensus with $R=1$ and GC to globally optimize the energy function, as described in Equation (2). The other stereo matching algorithms are denoted analogously by changing the matching cost function and the $R$ value.

Figure 1 shows the results of the ImpCensus-based stereo matching algorithms using the Backpack stereo images with different $R$ values. Disparity maps in the second row are the result of the ImpCensus-based local algorithms, whereas the third row shows the disparity maps of the ImpCensus-based global algorithms. Census/Win and Census/GC produced the most erroneous disparity maps because they do not account for the imperfect rectification problem. ImpCensus/GC/R1 and ImpCensus/GC/R2 reduced the error rates. The error rate reduction is clearly seen in Figure 1g,h, especially in textured image regions. These observations agree with those in [19] that the imperfect rectification problem commonly happens in textured image regions.

Table 2 and Table 3 show the quantitative results of local and global stereo matching algorithms that use Rank and ImpRank, and Census and ImpCensus, respectively. The ImpCensus-based stereo matching algorithms outperformed the Census-based algorithms for all the test stereo images. Similarly, the performance of the ImpRank-based stereo matching algorithms was superior to that of the Rank-based algorithms. In the Playtable stereo images, for example, the modification allows the ImpCensus-based local algorithm to reduce the error rate by up to 27.9 percentage points (65.28% for Census/Win and 37.38% for ImpCensus/Win/R2). In the global approach, the error rate of ImpCensus/GC/R2 was 39 percentage points smaller than that of Census/GC (70.29% for Census/GC and 31.20% for ImpCensus/GC/R2).

For the Census- and ImpCensus-based local and global stereo matching algorithms, the average error rates of ImpCensus/Win/R1 (39.49%) and ImpCensus/Win/R2 (38.50%) were about 6 to 7 percentage points smaller than that of Census/Win (45.46%), whereas the average error rates of ImpCensus/GC/R1 (37.74%) and ImpCensus/GC/R2 (32.42%) were more than 12 percentage points smaller than that of Census/GC (50.38%). Similarly, accounting for the imperfect rectification of high resolution images had a positive effect on the ImpRank-based stereo matching algorithms: the ImpRank-based algorithms with $R=1$ and $R=2$ had smaller average error rates than the Rank-based algorithms.

#### 3.2. ImpAD and ImpSD

We performed experiments to evaluate the performance of AD, SD, ImpAD, and ImpSD in local and global stereo matching approaches.

Table 4 and Table 5 show the quantitative results of local and global stereo matching algorithms that use AD and ImpAD, and SD and ImpSD, respectively. For all of the test stereo images, ImpAD/Win/R1 and ImpAD/Win/R2 outperformed AD/Win, and ImpAD/GC/R1 and ImpAD/GC/R2 were superior to AD/GC. Similarly, the error rates of ImpSD/Win/R1 and ImpSD/Win/R2 were smaller than those of SD/Win, and ImpSD/GC/R1 and ImpSD/GC/R2 performed better than SD/GC for all the test stereo pairs.

We computed the average performance of each of the test stereo matching algorithms over the test stereo images. For the AD- and ImpAD-based stereo matching algorithms, AD/Win and AD/GC had the largest errors in their corresponding groups, with average error rates of 54.46% and 45.72%, respectively. In contrast, ImpAD/Win/R1 and ImpAD/GC/R1 had the best performance in the local and global approaches, respectively. ImpAD/Win/R1 had an average error rate of 48.38%, whereas ImpAD/GC/R1 had an average error rate of 34.59% over the test stereo pairs.

For the SD- and ImpSD-based stereo matching algorithms, SD/Win and SD/GC had the largest errors in their corresponding groups, with average error rates of 54.76% and 45.47%, respectively. In contrast, ImpSD/Win/R1 and ImpSD/GC/R1 had the best performance in the local and global approaches, respectively. ImpSD/Win/R1 had an average error rate of 48.79%, whereas ImpSD/GC/R1 had an average error rate of 35.39% over the test stereo pairs.

#### 3.3. ImpNCC and ImpZNCC

We evaluated the performance of NCC and ZNCC with and without the modification. The disparity maps of NCC, ImpNCC, ZNCC, and ImpZNCC were obtained directly from the corresponding disparity space image $\mathbf{C}$ using a winner-takes-all strategy. Denote ImpNCC/R1 as a matching cost function that uses ImpNCC with $R=1$ to construct $\mathbf{C}$.

Figure 2 shows the results of the ImpZNCC matching cost functions with different $R$ values using the Motorcycle stereo images. Figure 2a,b show the left and right images, whereas the ground truth of the left image is shown in Figure 2c. Disparity maps of ZNCC, ImpZNCC/R1, and ImpZNCC/R2 are shown in Figure 2d–f, respectively. ZNCC produced the most erroneous disparity maps, with an average error rate of 49.02%, because ZNCC ignores the imperfect rectification problem. ImpZNCC/R1 and ImpZNCC/R2 reduced the average error rate to 43.73% and 43.64%, respectively.

Table 6 and Table 7 show the quantitative results of the NCC and ImpNCC, and ZNCC and ImpZNCC matching functions, respectively. Without the modification, NCC had the worst performance, producing more erroneous disparity maps than ImpNCC/R1 and ImpNCC/R2. Similarly, accounting for the imperfect rectification of high resolution images improved the performance of ImpZNCC/R1 and ImpZNCC/R2, which were superior to ZNCC for all of the test stereo pairs.

#### 3.4. Stereo Image with Radiometric Distortion

Stereo matching algorithms need to operate robustly on stereo images with radiometric distortion so that they can be used for outdoor applications and road-driving images. In this subsection, we evaluate the performance of the stereo matching algorithms that account for imperfect rectification on stereo images with both radiometric distortion and imperfect rectification. We used two Middlebury sub-datasets: one with imperfect rectification and varying exposure, and the other with imperfect rectification and varying illumination.

In the present experiments, because Census is one of the most robust matching functions for stereo images with radiometric distortions [15], we use only the ImpCensus-based global stereo matching algorithms.

Figure 3 shows the results of Census/GC, ImpCensus/GC/R1, and ImpCensus/GC/R2 using two stereo pairs. The second line shows the disparity maps of the test stereo matching algorithms using a stereo pair (a) and (b) with varying exposure and imperfect rectification, whereas the third line shows the disparity maps using a stereo pair (a) and (c) with varying illumination and imperfect rectification. The error rates of ImpCensus/GC/R1 and ImpCensus/GC/R2 were smaller than those of Census/GC in the two stereo pairs.

Table 8 and Table 9 show the quantitative results of the local and global stereo matching algorithms, which use ImpCensus and the two Middlebury sub-datasets. For all of the cases in the two tables, the performance of the ImpCensus-based global stereo matching algorithms was improved. Stereo images with varying illumination are often more challenging for stereo matching algorithms than stereo images with varying exposure [15]. Overall, the performance of ImpCensus/GC/R1 and ImpCensus/GC/R2 was superior to that of Census/GC for all the test stereo images.

#### 3.5. Using Normal Stereo Images

In this subsection, we evaluate the performance of the proposed stereo matching methods using normal stereo images. In other words, we measure the Imperfect-based method using perfectly rectified Middlebury stereo datasets.

We used sub-datasets, including Aloe, Baby1, Baby2, Baby3, Cloth1, Cloth2, Cloth3, Cloth4, Rocks1, Rocks2, Wood1, and Wood2, to evaluate the Imperfect-based method with different $R$.

Figure 4 shows the qualitative results of the ImpCensus-based method for the Aloe, Baby1, Rocks1, and Wood2 image pairs. The ImpCensus-based method explores correspondences in a larger search space, whose size is controlled by the expansion parameter $R$. As a result, the performance of the ImpCensus-based method marginally degraded for perfectly rectified stereo images.

Table 10 shows the error rates for the ImpCensus-based method using perfectly rectified stereo images. Clearly, the expansion parameter $R$ had no benefit for these images. Looking for correspondences in a larger search space (with $R=1$ and $R=2$) made the ImpCensus-based method more erroneous.

#### 3.6. Computation Time

In order to measure the computation times of the matching cost functions, we used the Bicycle stereo images with a resolution of $1968\times 3052$ and a disparity range of 180. We experimentally investigated the matching cost functions, including ImpCensus, ImpRank, ImpAD, ImpSD, ImpNCC, and ImpZNCC, with $R=0$, $R=1$, and $R=2$. The experimental PC platform consisted of a 4.00 GHz Intel Core i7 CPU and 16.00 GB of memory.

Table 11 shows the computation times that are needed for the test matching cost functions to compute the disparity space image $\mathbf{C}$. The tested algorithms require more computation time as the expansion factor $R$ increases.

As shown in the above tables, methods with the expansion range $R=1$ clearly reduce the error rates of their original versions. However, methods with $R=2$ performed comparably to or only marginally better than those with $R=1$.

In addition, we further evaluated the performance of the proposed local stereo matching methods for $R=3$ and $R=4$ using the imperfectly rectified stereo images of the Middlebury dataset, as shown in Table 12. These larger values of the parameter range $R$ had a negative effect and increased the error rates. Therefore, $R=1$ is generally the most appropriate value.

Let $\left|I\right|$ be the image size and $D$ be the disparity range. AD and SD are pixel-wise methods, so their computational complexity is $\mathcal{O}\left(\left|I\right|\times D\right)$. Rank and Census are window-based cost functions, in which each matching cost is computed over a window $W$ of $P$ pixels. For each pair of windows, Rank accumulates the relative order between the center pixel and its neighbors. Therefore, the computational complexity of Rank is $\mathcal{O}\left(\left|I\right|\times D\times (P-1)\right)$. Census encodes the $(P-1)$ relative orders into a bit string and then computes a matching cost by comparing the differences between the two strings. Therefore, the computational complexity of Census is $\mathcal{O}\left(\left|I\right|\times D\times {P}^{2}\right)$.
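As a rough illustration of the two window-based costs, the sketch below computes Rank and Census costs for a single pair of windows. It follows the common textbook form of these transforms, which may differ in detail from the implementation evaluated here: the comparison direction (neighbor smaller than center), the absolute rank difference, and all function names are assumptions of the example.

```python
import numpy as np

def census_bits(window):
    """Encode the P - 1 center-versus-neighbor comparisons as a boolean string."""
    center = window[window.shape[0] // 2, window.shape[1] // 2]
    bits = (window < center).ravel()
    # Drop the center's own entry (middle of an odd-sized square window).
    return np.delete(bits, bits.size // 2)

def census_cost(left_window, right_window):
    """Hamming distance between the two census bit strings."""
    return int(np.count_nonzero(census_bits(left_window) != census_bits(right_window)))

def rank_value(window):
    """Number of neighbors that are darker than the center pixel."""
    center = window[window.shape[0] // 2, window.shape[1] // 2]
    return int(np.count_nonzero(window < center))

def rank_cost(left_window, right_window):
    """Absolute difference of the two rank values."""
    return abs(rank_value(left_window) - rank_value(right_window))

left_window = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
brighter_window = left_window + 100                            # uniform offset
swapped_window = np.array([[9, 2, 3], [4, 5, 6], [7, 8, 1]])   # two corners swapped

cost_offset = census_cost(left_window, brighter_window)
cost_swapped_census = census_cost(left_window, swapped_window)
cost_swapped_rank = rank_cost(left_window, swapped_window)
```

Both costs depend only on the ordering of intensities, so the uniform offset leaves them at zero. Swapping the two corners changes two census bits but not the number of darker neighbors, so Census detects the change while Rank does not.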

The proposed cost functions with the parameter range $R$ need to process $K=2R+1$ pixels in the right image for each pixel in the left image. Therefore, the computational complexity of ImpAD and ImpSD is $\mathcal{O}\left(\left|I\right|\times D\times K\right)$, and that of ImpCensus and ImpRank is $\mathcal{O}\left(\left|I\right|\times D\times (P-1)\times K\right)$.
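The factor $K$ can be made concrete with a sketch of an AD-style cost that examines $K=2R+1$ candidate pixels per disparity. The definition of the proposed cost functions is not given in this section, so the sketch assumes one plausible reading: the candidates are the pixels at vertical offsets $-R,\dots,R$ in the right image, and the minimum absolute difference is kept. The function name `imp_ad_cost_volume`, the edge-replicating padding, and the synthetic test pair are all hypothetical.

```python
import numpy as np

def imp_ad_cost_volume(left, right, max_disparity, expansion):
    """AD costs minimized over 2 * expansion + 1 vertically shifted candidates."""
    height, width = left.shape
    # Pad rows so that vertical offsets stay inside the array (assumed handling).
    padded = np.pad(right, ((expansion, expansion), (0, 0)), mode="edge")
    volume = np.full((max_disparity + 1, height, width), np.inf)
    for d in range(max_disparity + 1):
        for dy in range(-expansion, expansion + 1):
            shifted = padded[expansion + dy:expansion + dy + height, :]
            # Left pixel (y, x) is compared with right pixel (y + dy, x - d).
            cost = np.abs(left[:, d:] - shifted[:, :width - d])
            volume[d, :, d:] = np.minimum(volume[d, :, d:], cost)
    return volume

rng = np.random.default_rng(0)
right = rng.random((8, 12))
# Synthetic left view: true disparity 2 plus a one-row vertical misalignment.
left = np.roll(np.roll(right, 2, axis=1), 1, axis=0)

volume_r0 = imp_ad_cost_volume(left, right, max_disparity=3, expansion=0)
volume_r1 = imp_ad_cost_volume(left, right, max_disparity=3, expansion=1)
disparity_r1 = np.argmin(volume_r1, axis=0)
```

With `expansion=0` the inner loop runs once and the result is the plain AD cost volume, which matches the statement that $R=0$ leaves the original function unchanged. With `expansion=1` the loop body runs three times per disparity, which is the source of the extra factor $K$, and on this synthetic pair the one-row misalignment no longer prevents the correct disparity from being found away from the image border.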