Article

A Method for Adapting Stereo Matching Algorithms to Real Environments

by
Adam L. Kaczmarek
Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, ul. G. Narutowicza 11/12, 80-233 Gdansk, Poland
Appl. Sci. 2025, 15(7), 4070; https://doi.org/10.3390/app15074070
Submission received: 9 March 2025 / Revised: 2 April 2025 / Accepted: 2 April 2025 / Published: 7 April 2025
(This article belongs to the Section Robotics and Automation)

Abstract

This study challenges the commonly used testbeds and benchmarks for testing stereo matching algorithms. Although the algorithms listed in the rankings based on these testbeds score exceptionally high, stereo matching technology still suffers from major drawbacks; as such, it is much less popular in commercial use than other technologies for 3D scanning, such as structured-light 3D scanners. One of the main problems is that the poor quality of the results is either blamed on an inappropriate stereo camera calibration or a bad stereo matching algorithm. However, this study shows that both of these steps need to be considered together. In this paper, a solution is proposed by integrating the problem of camera calibration with the execution of a stereo matching algorithm. This approach makes it possible to restore stereo matching as a technology that is competitive with other methods of 3D image acquisition.

1. Introduction

Current research on stereo matching algorithms is dominated by designing algorithms best suited to processing images from well-known testbeds, such as the Middlebury Stereo Evaluation or the KITTI Vision Benchmark Suite [1,2,3,4]. Although these testbeds have had a tremendous impact on the development of the stereo vision algorithms tested with them, the algorithms tend to be usable only on the images included in these testbeds. The most evident example of such an approach is the algorithm proposed by Andreas Geiger, the main contributor to the KITTI benchmark, who released the code of a stereo matching algorithm called Efficient Large-scale Stereo Matching (ELAS) [5]. In the algorithm's source code, it can be observed that there is one configuration for the real world and another for benchmarks. As far as other algorithms are concerned, their authors do not provide any information regarding whether their code was tested or used with images other than those from the testbeds. Moreover, the authors of these algorithms usually do not provide the source code of their solutions, in contrast to those presented in the KITTI ranking [4]. The lack of access to source code makes applying the algorithms and verifying their performance harder. However, there are also cases of highly popular algorithms that rank low. One such algorithm is the Semi-Global Block Matching (StereoSGBM) algorithm implemented in the popular OpenCV library [6].
The problem of the insufficient usability of stereo matching algorithms has a significant impact on the industrial use of stereo cameras, which is surprisingly low. When the 3D scanning of an object is required, methods other than stereo cameras are used, such as LIDAR scanners, structured-light 3D scanners, or multi-view stereo methods [7,8,9]. It needs to be emphasized that the stereo matching algorithms listed in the relevant rankings achieve error rates close to 0%. However, even though these algorithms are very good, stereo camera technology is rarely used commercially. Such a conclusion can be drawn from the availability of different kinds of devices on the market. There are a large number of companies manufacturing LIDARs and structured-light 3D scanners, including Pepperl+Fuchs Inc., Analog Devices, SICK AG, XenomatiX N.V., Innoviz Technologies Ltd., RIEGL Laser Measurement Systems GmbH, and many others [10], and they offer a large variety of models. In the case of stereo cameras, there are only a few manufacturers with limited offerings: StereoLabs offers four different models [11], and in 2024, Basler also added stereo cameras to its portfolio [12]. However, many of these devices are based on active stereo matching, in which light patterns are emitted in order to perform the measurement. Other companies offer only stereo camera hardware without a stereo matching algorithm [13].
The problem with the datasets in such rankings is that they are calibrated with exceptionally high precision, which is impossible to achieve when using stereo cameras in real environments. For example, the cameras used to acquire the KITTI benchmark images were calibrated every day during routine image acquisition [14]. This is reasonable when preparing a high-quality testbed, but it is unacceptable for commercial use. Moreover, the calibration of stereo cameras deteriorates over time when they are actively used in their target environment [15].
This study addresses the problem of designing stereo matching algorithms that are only suitable for testbeds. The main contributions of this paper are the following: (1) The author proposes a novel, additional module for existing stereo matching algorithms. This module adapts algorithms to real applications. (2) The paper introduces a testing method for stereo matching algorithms. The method is used to verify the usability of algorithms in real environments to a larger extent than those currently used in testbeds. (3) The author also presents the results of experiments using the proposed methods.

2. Related Work

Pattern-based calibration is the most common method for calibrating stereo cameras, whereas bundle adjustment is regularly used to process a larger set of images [16,17]. The research presented in this paper is derived from both of these technologies.

2.1. Pattern-Based Calibration

Zhang presented a highly influential paper on stereo camera calibration [16]. His work has been implemented in the well-known OpenCV library [6]. The method proposed by Zhang is based on taking a series of images of a planar chessboard pattern with black and white squares. These images are taken with a stereo camera from different angles and points of view with respect to the location of the pattern. The purpose of preparing such an image set is to determine the coefficients that characterize the analyzed stereo camera. In the OpenCV version of Zhang's work, these coefficients are applied in Equation (1), which describes the transformation of a point $P_w$ in the 3D coordinates of real space to a point $p$ located in the 2D image plane.
$$ s\,p = A\,[R \mid t]\,P_w \tag{1} $$
where $R$ and $t$ are the rotation matrix and the translation vector, $A$ is the camera intrinsic matrix, and $s$ is a scaling factor.
Matrices R and t are determined by coefficients. In this paper, the same model as the one used in the fourth version of the OpenCV library is used, in which sixteen coefficients are taken into account, including six radial coefficients, four tangential distortion coefficients, four thin-prism-distortion coefficients, and two angular parameters for rotation [18].
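For orientation, the following is a minimal sketch of pattern-based stereo calibration of this kind using OpenCV's Python bindings (the implementation described later in this paper uses C++). The file names, board geometry, and square size are assumptions, and corner refinement is omitted for brevity.

```python
# A minimal sketch of pattern-based stereo calibration with OpenCV (cv2).
# Assumptions: image file names, a 9x6 inner-corner chessboard, 25 mm squares.
import glob
import cv2
import numpy as np

BOARD = (9, 6)      # inner corners of the chessboard (assumed)
SQUARE = 0.025      # square edge length in meters (assumed)

# 3D pattern points in the plane z = 0 of the board.
objp = np.zeros((BOARD[0] * BOARD[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:BOARD[0], 0:BOARD[1]].T.reshape(-1, 2) * SQUARE

obj_pts, left_pts, right_pts = [], [], []
for lf, rf in zip(sorted(glob.glob("left_*.png")), sorted(glob.glob("right_*.png"))):
    img_l = cv2.imread(lf, cv2.IMREAD_GRAYSCALE)
    img_r = cv2.imread(rf, cv2.IMREAD_GRAYSCALE)
    ok_l, corners_l = cv2.findChessboardCorners(img_l, BOARD)
    ok_r, corners_r = cv2.findChessboardCorners(img_r, BOARD)
    if ok_l and ok_r:   # use a view only if the pattern is found in both images
        obj_pts.append(objp)
        left_pts.append(corners_l)
        right_pts.append(corners_r)

size = img_l.shape[::-1]                 # (width, height)
# Intrinsics A0, A1 and distortion coefficients of each camera separately.
_, A0, d0, _, _ = cv2.calibrateCamera(obj_pts, left_pts, size, None, None)
_, A1, d1, _, _ = cv2.calibrateCamera(obj_pts, right_pts, size, None, None)
# Mutual pose [R|t] of the two cameras, with the intrinsics kept fixed.
_, A0, d0, A1, d1, R, t, E, F = cv2.stereoCalibrate(
    obj_pts, left_pts, right_pts, A0, d0, A1, d1, size,
    flags=cv2.CALIB_FIX_INTRINSIC)
```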
Another calibration method was described by Geiger et al., who participated in the preparation of the KITTI Vision Benchmark Suite [19]. They proposed calibrating cameras using sets of chessboards placed in different locations in 3D space. Li et al. proposed applying calibration patterns generated from partly random noise [20]; the pattern they proposed contains more features of varying scales than a chessboard pattern and was designed for calibrating sets of multiple cameras. Blaschitz et al. used a pattern containing dots of different sizes. Recently, deep learning methods have also been adapted to camera calibration; Duan et al. described a calibration method intended for large-scale scenes [21].

2.2. Bundle Adjustment

Bundle adjustment (BA) is a method commonly used in multiple-view geometry; it aims to reconstruct the 3D structure of objects visible in images [22]. Typically, bundle adjustment is performed on a set of images taken from a variety of different points of view in a scene, and its purpose is to estimate the camera poses from which the images were taken. This method is used in algorithms for multi-view stereo and structure from motion [9]. Bundle adjustment has a wide range of applications; e.g., Li et al. used it for satellite images [23,24].
In the context of this paper, it is significant that bundle adjustment, contrary to pattern-based stereo calibration, does not operate on predefined patterns. Bundle adjustment is based on identifying keypoints, which are located using descriptors such as the scale-invariant feature transform (SIFT) or speeded-up robust features (SURF) [25]. Keypoints are matched with each other in order to identify the same parts of objects in different images from the processed set. Errors can appear during the process of matching keypoints; therefore, having many images of the same part of a reconstructed object is crucial.
The second feature of bundle adjustment that affects the research presented in this paper is that BA is intended for images that are not taken by cameras adjacent to each other, as they are in a stereo camera setup. This issue is discussed in papers describing the problem of image selection for 3D reconstruction [26]. The exact poses of cameras cannot be determined using BA, and such inaccuracies are crucial in the case of stereo camera calibration, where the cameras are close to each other. For this reason, pattern-based calibration is used with this kind of equipment.
Another difference from pattern-based camera calibration is that BA is an iterative process in which multiple camera poses, one per processed image, are estimated by minimizing pose-estimation errors. A series of images taken from different points of view is also used in pattern-based stereo camera calibration, but the purpose of the calibration is different: the camera poses with respect to the pattern locations in 3D space fall outside its scope. Its aim is to estimate the mutual location of the cameras in a stereo setup and their intrinsic parameters.
Bundle adjustment is typically not used together with pattern-based calibration. However, a prominent paper by Scharstein et al. described the application of bundle adjustment to images obtained from a stereo camera after pattern-based calibration [27]. They used bundle adjustment to reduce distortions in the y-axis and supported their calculations with structured lighting. Other authors have also used descriptors to calibrate camera arrays [28].

2.3. Self-Calibration

Self-calibration, also called autocalibration, is a camera calibration method that is often applied when performing pattern-based calibration is not possible. Zhang et al. described performing self-calibration on the stereo vision system of a lunar rover [29]. Yang et al. presented the self-calibration of stereo cameras placed on a satellite [30].
There is a similarity between self-calibration and bundle adjustment; both of these methods identify keypoints in images and, on this basis, estimate camera parameters. Liu et al. proposed a self-calibration method for stereo cameras based on SIFT [31]. Boukamcha, Atri, and Smach analyzed the usage of various kinds of descriptors for the purpose of autocalibration [32]. Hartley and Zisserman presented an in-depth description of self-calibration methods for multi-view geometry [22].
There are also calibration methods designed for Simultaneous Localization and Mapping (SLAM) [33]. The main feature of SLAM is that it is applied to robots or vehicles moving around an area. In this process, a large number of images of different scenes, taken from different points of view, can be captured by cameras placed on the moving device, and these data are used to perform self-calibration on the cameras used with SLAM.

2.4. Stereo Matching Algorithms

There is a large variety of stereo matching algorithms, and many of them are listed in rankings of such algorithms. The Middlebury Stereo Evaluation—Version 3 ranking contains 247 algorithms as of 23 August 2024 [3]. The KITTI Vision Benchmark Suite [4] includes 340 algorithms as of 23 August 2024.
As mentioned in the introduction, in the case of most of these algorithms, their source codes are not provided by the authors; this makes it hard to verify their functionality. However, there are also algorithms that are usable in real environments. One such algorithm, for example, is the well-known Semi-Global Block Matching (StereoSGBM) algorithm, which has been implemented in the OpenCV library [6]. The algorithm was derived from the work of Hirschmüller [34].
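A minimal example of running this algorithm through OpenCV's Python bindings is shown below; the parameter values are illustrative assumptions rather than a tuning recommended by this paper.

```python
# A minimal StereoSGBM usage sketch with OpenCV.
import cv2

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

block = 5
sgbm = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,         # must be divisible by 16
    blockSize=block,
    P1=8 * block * block,       # smoothness penalties as in the OpenCV docs
    P2=32 * block * block,
    uniquenessRatio=10,
    speckleWindowSize=100,
    speckleRange=2)

# compute() returns fixed-point disparities scaled by 16.
disparity = sgbm.compute(left, right).astype("float32") / 16.0
```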
Stereo cameras also have some limitations. In particular, obtaining data for objects with transparent or reflective surfaces is problematic. Flat surfaces of a uniform color are also problematic because it is hard to match parts of objects visible in one image of a stereo pair with the corresponding parts visible in the other.

3. Materials and Methods

Pattern-based calibration and bundle adjustment are procedures that are not intended to be executed in real time. Pattern-based calibration is regarded as a part of the process of assembling a stereo camera. When the camera is ready, stereo matching can be used. Bundle adjustment is expected to process a large number of input images; therefore, using it in real time can be a challenging task, but it is possible.

3.1. The Method of Auxiliary Calibration

In this paper, a two-stage stereo camera calibration method is proposed. The first stage uses regular pattern-based calibration, and the second stage uses additional calibration based on bundle adjustment and self-calibration. In this paper, this second stage is called the auxiliary calibration step (ACS). It is executed right before the stereo matching algorithm; in fact, it forms a part of the stereo matching algorithm. As described in the introduction, commonly used stereo camera calibration methods do not calibrate cameras to the extent achieved for the purpose of preparing testbeds, whereas stereo matching algorithms are frequently designed for processing images calibrated with such high precision. The auxiliary calibration step aims to additionally transform images obtained from stereo cameras used in real environments in order to improve their calibration quality. The process of obtaining disparity maps using the ACS is presented in Figure 1.
The first step required to use the ACS is the preparation of a calibration pattern in order to obtain transformation parameters for pattern-based camera calibration. Uncalibrated images from a stereo camera are then transformed into calibrated ones. The images after these transformations are typically used to obtain disparity maps using a stereo matching algorithm.
However, with the use of the ACS module, the images are further calibrated. This module is integrated into a stereo matching algorithm because it is applied to every pair of images processed by a stereo matching algorithm; this is unlike regular calibration, which aims to generally identify distortions created by stereo cameras.
The main purpose of adding the ACS module is to reduce discrepancies in the y-axis. This kind of distortion severely affects the possibility of obtaining a proper disparity map. As presented in Section 2.2, Scharstein et al. used bundle adjustment after pattern-based calibration in order to handle distortions in the y-axis [27]. The work described in [27] aimed to prepare a testbed. However, the ACS module is expected to work in real time for every pair of images processed by a stereo matching algorithm. Therefore, it is not acceptable for the ACS to be considered part of an iterative algorithm, as is the case for bundle adjustment. This paper proposes a solution based on bundle adjustment, but it focuses only on distortions in the y-axis.
The functions of the ACS module are presented in this paper using the same notations as in Section 2.1. If pattern-based calibration is performed according to the transformations presented in Equation (1), then this calibration applies to both of the cameras included in a stereo camera. Therefore, Equation (1) takes the form presented in Equation (2).
$$ p_0 = A_0 [R_0 \mid t_0] P_{w0}, \qquad p_1 = A_1 [R_1 \mid t_1] P_{w1} \tag{2} $$
where index 0 refers to the left camera, which is assumed to be the reference camera; the reference camera provides the point of view of the stereo camera. Index 1 in Equation (2) denotes the right camera, which is the side camera. An image from this camera is used to determine the disparities of the points visible in the image taken from the reference camera. Equation (1) presented in Section 2.1 applies to homogeneous coordinates; therefore, the parameter $s$ is excluded from the further calculations presented in this paper and is not present in Equation (2).
Equation (2) sufficiently describes the transformations required to obtain a calibrated pair of images from a stereo camera if a perfect pattern-based calibration exists. However, inaccuracies remain after calibration. These inaccuracies are modeled by the vector $v_1$ included in Equation (3).
$$ p_0 = A_0 [R_0 \mid t_0] P_{w0}, \qquad p_1 + v_1 = A_1 [R_1 \mid t_1] P_{w1} \tag{3} $$
The concept of the $v_1$ vector is derived from the bundle adjustment process; however, in this paper, it is applied to a single stereo camera instead of a large set of images taken from different points of view. $v_1$ reflects the inaccuracy of mapping point $P_{w1}$ onto the 2D image plane of the side camera. In Equation (3), $p_1$ denotes the location of a point after a perfectly performed calibration.
The $v_1$ vector is included only in the equation for the side camera because adjusting the side image to the reference image is sufficient to obtain the extrinsic calibration; in the ACS module, only a rotation and a translation in the y-axis are considered. Not including a vector $v_0$ in the equation for the reference image is even more important because this does not interfere with the process of transforming disparity maps into depth maps. Stereo matching algorithms produce a disparity map; however, in real-world applications, it is a depth map that determines the distances to the objects visible to an image sensor, and the calibration parameters of the reference camera make it possible to convert a disparity map into a depth map. If a $v_0$ vector were added to the calibration formula of the reference image, it would influence these transformations. Excluding $v_0$ avoids this problem.
The process of auxiliary calibration aims to minimize the vector $v_1$. In order to achieve this, the author added an additional transformation $C$, which is the result of executing the ACS module presented in Figure 1. Transformation $C$ does not modify matrices $A_1$, $R_1$, and $t_1$; it only further processes images. After adding the ACS module, Equation (3) is converted into Equation (4).
$$ p_0 = A_0 [R_0 \mid t_0] P_{w0}, \qquad p_1 + v_1' = C\,A_1 [R_1 \mid t_1] P_{w1} \tag{4} $$
where $v_1'$ is the result of minimizing $v_1$. Vector $v_1'$ reflects the discrepancy from a perfect calibration that remains even after executing the ACS module.
The $C$ matrix contains the transformation parameters corresponding to the transformation model applied in the ACS module. These transformations include translating the side image in the y-axis and rotating this image; the aim of the ACS is to reduce these extrinsic distortions. The side image is shifted by a number of pixels equal to $c$ and rotated by the angle $\theta$. Therefore, the transformation $C$ has the form presented in Equation (5).
$$ C = \begin{bmatrix} \cos(\theta) & -\sin(\theta) & -\sin(\theta)\,c \\ \sin(\theta) & \cos(\theta) & \cos(\theta)\,c \\ 0 & 0 & 1 \end{bmatrix} \tag{5} $$
The process of estimating the values of parameters $c$ and $\theta$ is similar to bundle adjustment; however, it is performed on the basis of only two images. It is described in the next section.
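Given estimated values of $c$ and $\theta$, the correction itself can be applied as in the following sketch, which uses OpenCV's Python bindings. Equation (5) is written for a rotation about the coordinate origin; the sketch rotates about the image center (the rotation center named in the next section), and `apply_acs_correction` is a hypothetical helper name.

```python
# Applying the ACS correction of Eq. (5) to the side image: a rotation by
# theta combined with a shift of c pixels in the y-axis.
import cv2
import numpy as np

def apply_acs_correction(side_img, theta_rad, c_pixels):
    h, w = side_img.shape[:2]
    # 2x3 affine matrix for a rotation about the image center
    # (OpenCV expects the angle in degrees, counterclockwise positive).
    M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), np.degrees(theta_rad), 1.0)
    M[1, 2] += c_pixels          # add the translation in the y-axis
    return cv2.warpAffine(side_img, M, (w, h), flags=cv2.INTER_LINEAR)
```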

Calculation of ACS Parameter

In order to execute the ACS module, it is necessary to estimate the values of parameters c and θ for a pair of images processed by a stereo matching algorithm. The estimation of these parameters consists of the steps presented in Figure 2.
In the first step, keypoints are identified in both of the images from a stereo camera. In the method presented in this paper, SURF descriptors are used for this purpose [25]. The keypoints detected on the left image are then matched with the keypoints of the right image.
The next step consists of excluding some of the matched points from further calculations. One reason for excluding keypoints is related to the locations of pairs of matched keypoints. Let us assume that a keypoint located in the left image at coordinates $(x_{k0}, y_{k0})$ is matched with a keypoint located in the right image at coordinates $(x_{k1}, y_{k1})$. If the difference $|y_{k0} - y_{k1}|$ is beyond an acceptable value, this indicates that these keypoints were matched incorrectly. The acceptance threshold is set by parameter H.
The value of this parameter depends on the extent of the stereo camera inaccuracies that are expected to occur. The value of H is also influenced by the expected deterioration in camera calibration caused by its usage over time. In general, the inaccuracies remaining after a pattern-based calibration are low and limited to a few pixels. Such a value is high enough to significantly deteriorate the quality of a disparity map obtained from these images, yet low enough to be corrected by additional calibration. A precise selection of parameter H is not crucial for the final results because its purpose is only to exclude incorrectly matched pairs of points from further calculations, i.e., points whose location in the side image is evidently different from the location expected on the basis of the matched point in the reference image. The influence of the selection of parameter H on the quality of the results is presented in the experiments described in Section 4.
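As an illustration, a possible implementation of this matching-and-filtering step is sketched below in Python with OpenCV. SURF lives in the opencv-contrib `xfeatures2d` module and may be unavailable in some builds, so ORB is substituted in that case; `matched_keypoints` and the default value of H are assumptions.

```python
# Sketch of the keypoint stage: detect, match, and discard pairs whose
# vertical offset exceeds H.
import cv2

def matched_keypoints(ref_img, side_img, H=15):
    try:
        detector = cv2.xfeatures2d.SURF_create()
        norm = cv2.NORM_L2              # SURF descriptors are float vectors
    except (AttributeError, cv2.error):
        detector = cv2.ORB_create(nfeatures=2000)
        norm = cv2.NORM_HAMMING         # ORB descriptors are binary
    kp0, des0 = detector.detectAndCompute(ref_img, None)
    kp1, des1 = detector.detectAndCompute(side_img, None)
    matches = cv2.BFMatcher(norm, crossCheck=True).match(des0, des1)

    pairs = []
    for m in matches:
        x0, y0 = kp0[m.queryIdx].pt
        x1, y1 = kp1[m.trainIdx].pt
        if abs(y0 - y1) <= H:           # reject evident mismatches
            pairs.append(((x0, y0), (x1, y1)))
    return pairs
```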
After the selection of points, the ACS module estimates the c parameter. The c parameter is calculated directly from the disparity values of pairs of keypoints, contrary to a typical bundle adjustment, which is performed iteratively. As mentioned earlier, the ACS module executes for every pair of images processed by a stereo matching algorithm. Many of these algorithms are expected to work in real time. Therefore, any time-consuming process, such as iterative bundle adjustment, is not acceptable. Parameter c is obtained using Equation (6).
$$ c = \frac{1}{K} \sum_{k=1}^{K} (y_{k0} - y_{k1}) \tag{6} $$
where $y_{k0}$ and $y_{k1}$ are the y-coordinates of corresponding keypoints, and $K$ is the number of keypoint pairs. The value of $c$ is equal to the mean of the differences in the vertical coordinates of corresponding keypoints.
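Equation (6) translates directly into a single, non-iterative pass over the accepted keypoint pairs; a minimal sketch (reusing the pairs produced by the hypothetical `matched_keypoints` helper above) follows.

```python
# Eq. (6) as code: c is the mean vertical offset of the accepted pairs,
# computed in one pass with no iterative optimization.
def estimate_c(pairs):
    return sum(y0 - y1 for (_, y0), (_, y1) in pairs) / len(pairs)
```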
As far as the rotations of images are concerned, the ACS module can improve calibration by rotating a side image. It is possible to estimate the extent of a necessary rotation only on the basis of the discrepancies of corresponding keypoints in the y-axis. The locations of corresponding keypoints in the x-axis are highly affected by the disparities caused by differences in the distances of the considered points from a stereo camera. The ACS module executes before stereo matching; therefore, differences in the locations of points in the x-axis are inconclusive for estimating inaccuracies in rotation.
Rotation affects the coordinates of points to a different extent, depending on the locations of the points. Points located along a horizontal line passing through the center of an image are shifted almost entirely in the y-axis; in this paper, this line is called line X. Let us denote by $\alpha$ the acute angle between line X and a line passing through the center of the image and some point $p$. The coordinates of $p$ are affected by rotation to a different extent, depending on the angle $\alpha$. In order to estimate rotation, the ACS module considers the keypoints located in the vicinity of line X, within the range for which $\alpha \le \pi/6$. This area is marked in gray in Figure 3.
The keypoints in this area are used to estimate the rotation performed in the ACS module on the basis of three factors. The first factor is related to the location of a keypoint in the image. If a point in the right part of the side image is shifted towards the top of the image, this indicates that the rotation should be greater in the clockwise direction. In contrast, in the left part of the image, a point located higher than its corresponding point in the reference image indicates that it should be moved lower by rotating the side image counterclockwise. Points located too low in the side image affect the rotation in the analogous, opposite way. The rotation of an image performed in the ACS module is estimated on the basis of the average discrepancies in the y-axis of the keypoints, each expressed as a correcting angle.
This angle is equal to $\operatorname{atan}((y_{k1} - y_{k0}) / (x_{center} - x_{k1}))$, where $y_{k1} - y_{k0}$ is the discrepancy in the y-axis between the matched keypoints, and $x_{center} - x_{k1}$ is the distance of the keypoint in the side image from the rotation center, which lies in the middle of the image. This angle is, in fact, the rotation that should be executed in order to minimize the discrepancy of the considered point. The average of these angles over the set of all considered keypoints is calculated in order to adjust the rotation to all of them, and this value is the first factor influencing the extent of the rotation $\theta$ performed in the ACS module. The whole formula for estimating the rotation is presented in Equation (7).
$$ \theta = \frac{1}{K} \sum_{k=1}^{K} \operatorname{atan}\!\left( \frac{y_{k1} - y_{k0}}{x_{center} - x_{k1}} \right) \cdot \frac{1 + \frac{1}{\cos(\pi/6)}}{2} \cdot \frac{|x_{center} - x_{k1}|}{x_{center}} \tag{7} $$
The second factor influencing the rotation is equal to $(1 + \frac{1}{\cos(\pi/6)})/2$. The presence of this factor is a consequence of taking into account the keypoints from the whole area marked in gray in Figure 3. For these points, the rotation partly affects both their x- and y-coordinates. The impact on the y-coordinate is lowest for points placed along the line passing through the center and inclined at an angle of $\pi/6$ to the horizontal line. Observing a discrepancy of $e$ for a point on this line indicates that a rotation moving the point by $e \cdot \frac{1}{\cos(\pi/6)}$ should occur. The rotation implied by a discrepancy at the edge of the considered area therefore needs to be larger than the rotation implied by the same discrepancy along line X, where a discrepancy of $e$ means that a point should be moved by $e$ pixels in the y-axis. For this reason, the second factor is equal to $(1 + \frac{1}{\cos(\pi/6)})/2$, which is the average value for the whole considered area.
The third factor included in Equation (7) is $|x_{center} - x_{k1}| / x_{center}$. It is added in order to reduce the significance of points located close to the center of an image. The significance of these points needs to be reduced because even a minor inaccuracy in the estimation of their coordinates would otherwise indicate that a large rotation should be performed. For example, suppose a keypoint has a discrepancy of one pixel towards the top of an image and is located two pixels to the side of the image center; such a discrepancy indicates that the image should be rotated by about $\pi/6$. However, if this point were located only two pixels lower, it would indicate that such a rotation should be executed in the opposite direction. Therefore, points close to the center cannot be fully taken into account because any error in estimating the coordinates of these keypoints overwhelmingly affects the result.
There is also a problem with rotating images: it modifies the values of pixels. When an image is shifted by an integer number of pixels, no such problem occurs because the pixels remain the same and are only placed at different coordinates. Rotating an image, in contrast, requires pixel interpolation; therefore, rotating it by a minor angle may cause image-quality deterioration at the level of individual pixels without a significant gain from the improvement in calibration. For this reason, the ACS module does not rotate images when the angle $\theta$ is below a certain value. In the experiments presented in this paper, this value was set to $\pi/600$.
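The following Python sketch combines Equation (7) with the area restriction of Figure 3 and the $\pi/600$ cut-off; `estimate_theta` is a hypothetical helper reusing the pairs from the matching sketch above, and the second factor is written out as reconstructed from the text.

```python
# Sketch of the rotation estimate of Eq. (7). Only keypoints near line X
# (position angle alpha <= pi/6 relative to the horizontal through the
# image center) contribute; each correcting angle is weighted by the
# averaged projection factor (1 + 1/cos(pi/6))/2 and by the normalized
# distance of the keypoint from the image center.
import math

PROJ = (1.0 + 1.0 / math.cos(math.pi / 6)) / 2.0   # second factor of Eq. (7)

def estimate_theta(pairs, image_size, min_theta=math.pi / 600):
    w, h = image_size
    xc, yc = w / 2.0, h / 2.0
    angles = []
    for (x0, y0), (x1, y1) in pairs:
        dx = xc - x1
        if dx == 0:
            continue
        # Keep only points in the gray area of Figure 3.
        if math.atan2(abs(y1 - yc), abs(dx)) > math.pi / 6:
            continue
        corr = math.atan((y1 - y0) / dx)    # correcting rotation for this pair
        angles.append(corr * PROJ * abs(dx) / xc)
    theta = sum(angles) / len(angles) if angles else 0.0
    # Skip tiny rotations: interpolation losses would outweigh the gain.
    return theta if abs(theta) >= min_theta else 0.0
```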

3.2. Test Data

In order to test the ACS module, it is necessary to use pairs of images that resemble images taken by stereo cameras in real applications. As presented in the introduction, images from testbeds do not fulfill this criterion because of their superior calibration quality. Therefore, in order to prepare test data for the ACS module, the author of this paper implemented software that transforms images from testbeds into a form similar to images taken by a regular stereo camera. This software is based on reversing, to some extent, the calibration process. Every point $p_1$ of a side image is transformed into a point $p_{1o}$ as per Equation (8).
$$ p_{1o} = \begin{bmatrix} \cos(\omega) & -\sin(\omega) & 0 \\ \sin(\omega) & \cos(\omega) & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & g \\ 0 & 0 & 1 \end{bmatrix} p_1 \tag{8} $$
where $\omega$ is a rotation angle, $g$ is the extent of the translation of the image in the y-axis, and $p_{1o}$ is a point of the image obtained as a result of the transformation.
Reversing the calibration involves modifying the extrinsic parameters of a stereo camera; therefore, the transformations performed are a rotation and a translation in the y-dimension. These transformations do not cover all possible miscalibrations that occur in real-world applications. However, they were selected because they can be addressed by the additional calibration that the ACS module performs together with the stereo matching algorithms. Only the side image is modified. Modifying only the side image is sufficient for adjusting the extrinsic parameters and, at the same time, does not interfere with the ground truth of the reference image. Thus, the ground truth prepared for the stereo pairs can be used both with and without the transformations described in this section.
The extent of the transformations is defined by parameters $\omega$ and $g$. These parameters set the extent of the inaccuracies injected into the calibration so that the performance of the ACS module, which is designed to reduce these faults, can be tested. The tests are described in the next section.
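A sketch of this reverse-calibration transform with OpenCV's Python bindings is shown below. The paper does not state the rotation center used in Equation (8); the sketch assumes the middle of the image, and `decalibrate_side_image` is a hypothetical helper name.

```python
# Sketch of the reverse-calibration transform of Eq. (8): shift the side
# image by g pixels in the y-axis, then rotate it by omega.
import cv2
import numpy as np

def decalibrate_side_image(side_img, omega_rad, g_pixels):
    h, w = side_img.shape[:2]
    rot = np.eye(3)                       # 3x3 rotation about the image center
    rot[:2, :] = cv2.getRotationMatrix2D((w / 2.0, h / 2.0),
                                         np.degrees(omega_rad), 1.0)
    shift = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, float(g_pixels)],
                      [0.0, 0.0, 1.0]])
    M = (rot @ shift)[:2, :]              # compose as in Eq. (8)
    return cv2.warpAffine(side_img, M, (w, h), flags=cv2.INTER_LINEAR)
```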
The datasets used for performing the experiments with reverse calibration were acquired from Middlebury Stereo Vision [1,27,35,36,37]. Tests were performed on the images included in the training sets provided by the evaluation SDK, released as a part of the Middlebury Stereo Vision project. These sets are called Adirondack, Jadeplant, Motorcycle, Playroom, Shelves, Vintage, Piano, Pipes, Playtable, Recycle, and Teddy. They contain over 3.5 million test points.

4. Results

The experiments using the ACS module were focused on analyzing the impact that this module has on the quality of the results of stereo matching. The ACS was applied to pairs of images with different extents of inaccuracies in their calibration. In order to prepare sets of images with a different calibration precision level, image sets from the Middlebury Stereo Vision project were processed using the method described in Section 3.2.
Figure 4a shows the relationship between parameter g in Equation (8) and the quality of the resulting disparity map measured with the widely used Bad Matched Pixels (BMP) metric [1]. BMP version 1.0 was used, and the results were obtained by using the SDK application provided by the Middlebury Stereo Vision project. Figure 4b shows the average error in the disparity value.
The red line presented in the charts refers to the BMP of the disparity maps obtained without the ACS module. It is noticeable that if the right image is shifted by more than three pixels in the y-axis ($g > 3$), the quality of the disparity map deteriorates to a BMP of over 75%. The blue line shows the results obtained with the ACS: if the side image is shifted by three pixels, the average BMP remains below 30%. This shows that the ACS module fulfills its purpose.
A similar impact of the ACS module was observed for rotations of the side image. Figure 5 shows the influence of the $\omega$ parameter on the results of stereo matching. The red line reflects the quality of the disparity maps obtained without the ACS module: if the side image is rotated by only $\pi/180$, the BMP rises to over 88%. The blue line in Figure 5 corresponds to the results obtained with the ACS: the errors in stereo matching are reduced, and the BMP remains below 36% when the side image is rotated by less than $1.5\pi/180$.
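For reference, the BMP metric used in these comparisons can be sketched as follows; the reported numbers were produced with the official Middlebury SDK, and reading "BMP version 1.0" as a 1.0-pixel error threshold is an assumption here.

```python
# Sketch of the Bad Matched Pixels metric: the share of valid ground-truth
# pixels whose disparity error exceeds a threshold.
import numpy as np

def bad_matched_pixels(disp, gt, threshold=1.0):
    valid = np.isfinite(gt) & (gt > 0)    # ignore pixels without ground truth
    bad = np.abs(disp[valid] - gt[valid]) > threshold
    return 100.0 * bad.mean()             # percentage of bad pixels
```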
Figure 6a concerns parameter H, described in Section Calculation of ACS Parameter. The figure presents the extent of the translation in the y-axis estimated by the ACS module for different values of H. The estimates were obtained for a pair of images in which the side image was moved up by one pixel. These estimates do not depend significantly on the selection of parameter H: for $3 \le H \le 30$, all results were in the range between 0.8221 and 0.9093. The values closest to the correct one were observed for H = 11, H = 27, and H = 28.
Similarly, there is a wide range of acceptable values of the H parameter for the purpose of estimating rotation. Figure 6b presents the relation between H and the rotation calculated by the ACS module for images rotated by $1\pi/180$ rad. For values of H between 5 and 30, the estimates lie between $0.879\pi/180$ and $1.054\pi/180$, with the best estimates obtained for $12 \le H \le 19$. Therefore, an accurate selection of the H parameter is not crucial for the successful performance of the ACS module; however, it contributes to improved results.
In these calculations, parameter H is used to exclude possible errors in matching points on the basis of descriptors. Khan et al. analyzed the performance of the SURF and SIFT descriptors [38]. Their research shows that these descriptors are almost invariant to minor rotations and translations of images; the scaling of images and inserted noise are more problematic.
Theoretically, the best value of the H parameter is slightly higher than the extent of the miscalibration that occurs as a translation of the side image in the y-axis: it needs to be high enough for the SURF algorithm to retain corresponding descriptors, while greater values of H do not further improve calibration quality. The experiments presented in this paper show that even if the value of H is much higher than its ideal selection, this does not have a crucial impact on the quality of the results for the tested input data. Parameter H can be set according to the resolution of the images from the stereo cameras and their application; by default, it can be set to the maximum expected miscalibration in the y-axis.
The ACS module does not require many hardware resources. Its execution time was below 240 ms for a single input pair of images. The tests were performed on a single thread of an Intel Core i7-7700 CPU (Intel Corporation, Santa Clara, CA, USA). The module was implemented in C++11 using version 3.4.13 of the OpenCV library, and the implementation did not use a GPU for calculations. As far as memory usage is concerned, the application used less than 49 MB in total; because the module processes only two images, memory usage is not high. The module is intended for use with images that are not over- or underexposed; if such images need to be processed, additional filtering is required to normalize their brightness.
As far as the computational complexity of the ACS module is concerned, the SURF algorithm has the greatest influence. Drews et al. reported that the computational complexity of SURF is O(n × m + K), where n and m refer to the size of an image and K is the number of identified points [39]. All other steps of the ACS module have a lower computational complexity, including the step in which an image is translated and rotated. Translation and rotation require performing an operation for every pixel of an image; therefore, the computational complexity of these steps is O(n × m). The calculation of the parameters c and θ used in the ACS depends on the number of keypoints; therefore, its computational complexity is O(K).

5. Conclusions

The research presented in this paper shows that the ACS module is most effective at reducing decalibration caused by the side image being moved in the y-axis with respect to the reference image. If the side image is moved by less than two pixels, the quality deterioration of the resulting disparity maps remains below 1% when the ACS is used. It needs to be emphasized that, without this module, the results contain over 38% more errors when the extent of the decalibration in the y-axis is equal to two pixels. Decalibration caused by rotation is more problematic; nevertheless, the ACS module limits the increase in the error rate. When the rotation is below $\pi/180$, the increase in the error rate does not exceed 13% when the ACS is used. This is a good result in comparison to cases where the ACS is not used, in which the results contain over 66% more errors for a rotation equal to $\pi/180$.
The problem with developing technology based on stereo cameras is that research has focused either on camera calibration or on stereo matching algorithms, with each of these fields assuming perfect results from the other, which cannot be achieved in real applications. This study presents a hybrid approach combining bundle adjustment with pattern-based calibration and stereo matching. Working under the assumption that there are imperfections in camera calibration can start an entire branch of research focused on designing stereo matching algorithms intended for use with imperfectly calibrated cameras, including both inventing such algorithms and preparing methods for testing them.

Funding

This work was supported, in part, by a ministry subsidy for research to Gdansk University of Technology, Poland.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The author declares no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
LIDAR: Light Detection and Ranging
ELAS: Efficient Large-scale Stereo Matching
StereoSGBM: Stereo Semi-Global Block Matching
BMP: Bad Matched Pixels
ACS: Auxiliary calibration step
SIFT: Scale-invariant feature transform
SURF: Speeded-up robust features

References

  1. Scharstein, D.; Szeliski, R. A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms. Int. J. Comput. Vis. 2002, 47, 7–42. [Google Scholar] [CrossRef]
  2. Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 3354–3361. [Google Scholar] [CrossRef]
  3. Middlebury Stereo Evaluation—Version 3. Available online: https://vision.middlebury.edu/stereo/eval3/ (accessed on 23 August 2024).
  4. KITTI Vision Benchmark Suite. Available online: https://www.cvlibs.net/datasets/kitti/ (accessed on 23 August 2024).
  5. Geiger, A.; Roser, M.; Urtasun, R. Efficient Large-Scale Stereo Matching. In Computer Vision—ACCV 2010, Proceedings of the 10th Asian Conference on Computer Vision, Queenstown, New Zealand, 8–12 November 2010, Revised Selected Papers, Part I; Kimmel, R., Klette, R., Sugimoto, A., Eds.; Lecture Notes in Computer Science (LNCS, Volume 6492); Springer: Berlin/Heidelberg, Germany, 2011; pp. 25–38. [Google Scholar] [CrossRef]
  6. Bradski, D.G.R.; Kaehler, A. Learning Opencv, 1st ed.; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2008. [Google Scholar]
  7. Georgopoulos, A.; Ioannidis, C.; Valanis, A. Assessing the performance of a structured light scanner. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2010, 38, 251–255. [Google Scholar]
  8. Roriz, R.; Cabral, J.; Gomes, T. Automotive LiDAR Technology: A Survey. IEEE Trans. Intell. Transp. Syst. 2022, 23, 6282–6297. [Google Scholar] [CrossRef]
  9. Seitz, S.M.; Curless, B.; Diebel, J.; Scharstein, D.; Szeliski, R. A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), New York, NY, USA, 17–22 June 2006; Volume 1, pp. 519–528. [Google Scholar] [CrossRef]
  10. 19 LiDAR Sensor Manufacturers in 2025. Available online: https://us.metoree.com/categories/lidar/ (accessed on 24 March 2025).
  11. StereoLab Store. Available online: https://www.stereolabs.com/en-pl/store (accessed on 24 March 2025).
  12. Basler Stereo Cameras. Available online: https://www.baslerweb.com/en/cameras/basler-stereo-camera/ (accessed on 24 March 2025).
  13. Stereoscopic Photography with StereoPi and a Raspberry Pi. Available online: https://www.raspberrypi.com/news/stereoscopic-photography-stereopi-raspberry-pi/ (accessed on 24 March 2025).
  14. Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets Robotics: The KITTI Dataset. Int. J. Robot. Res. 2013, 32, 1231–1237. [Google Scholar]
  15. Moravec, J.; Šára, R. High-recall calibration monitoring for stereo cameras. Pattern Anal. Appl. 2024, 27, 41. [Google Scholar] [CrossRef]
  16. Zhang, Z. A flexible new technique for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 1330–1334. [Google Scholar] [CrossRef]
  17. Triggs, B.; McLauchlan, P.F.; Hartley, R.I.; Fitzgibbon, A.W. Bundle Adjustment—A Modern Synthesis. In Vision Algorithms: Theory and Practice, Proceedings of the International Workshop on Vision Algorithms, Corfu, Greece, 21–22 September 1999; Triggs, B., Zisserman, A., Szeliski, R., Eds.; Springer: Berlin/Heidelberg, Germany, 2000; pp. 298–372. [Google Scholar]
  18. Camera Calibration and 3D Reconstruction. Available online: https://docs.opencv.org/4.x/d9/d0c/group__calib3d.html (accessed on 24 March 2025).
  19. Geiger, A.; Moosmann, F.; Car, O.; Schuster, B. Automatic Calibration of Range and Camera Sensors using a single Shot. In Proceedings of the 2012 IEEE International Conference on Robotics and Automation, Saint Paul, MN, USA, 14–18 May 2012. [Google Scholar]
  20. Li, B.; Heng, L.; Koser, K.; Pollefeys, M. A multiple-camera system calibration toolbox using a feature descriptor-based calibration pattern. In Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, Tokyo, Japan, 3–7 November 2013; pp. 1301–1307. [Google Scholar] [CrossRef]
  21. Duan, Q.; Wang, Z.; Huang, J.; Xing, C.; Li, Z.; Qi, M.; Gao, J.; Ai, S. A deep-learning based high-accuracy camera calibration method for large-scale scene. Precis. Eng. 2024, 88, 464–474. [Google Scholar] [CrossRef]
  22. Hartley, R.I.; Zisserman, A. Multiple View Geometry in Computer Vision, 2nd ed.; Cambridge University Press: Cambridge, UK, 2004; ISBN 0521540518. [Google Scholar]
  23. Li, H.; Yin, J.; Jiao, L. Digital Surface Model Generation from Satellite Images Based on Double-Penalty Bundle Adjustment Optimization. Appl. Sci. 2024, 14, 7777. [Google Scholar] [CrossRef]
  24. Li, H.; Yin, J.; Jiao, L. An Improved 3D Reconstruction Method for Satellite Images Based on Generative Adversarial Network Image Enhancement. Appl. Sci. 2024, 14, 7177. [Google Scholar] [CrossRef]
  25. Bay, H.; Ess, A.; Tuytelaars, T.; Gool, L.V. Speeded-Up Robust Features (SURF). Comput. Vis. Image Underst. 2008, 110, 346–359. [Google Scholar] [CrossRef]
  26. Hosseininaveh, A.; Serpico, M.; Robson, S.; Hess, M.; Boehm, J.; Pridden, I.; Amati, G. Automatic Image Selection in Photogrammetric Multi-view Stereo Methods. In Proceedings of the 13th International Symposium on Virtual Reality, Archaeology and Cultural Heritage VAST, Brighton, UK, 19–21 November 2012; pp. 9–16. [Google Scholar]
  27. Scharstein, D.; Hirschmüller, H.; Kitajima, Y.; Krathwohl, G.; Nešić, N.; Wang, X.; Westling, P. High-Resolution Stereo Datasets with Subpixel-Accurate Ground Truth. In Pattern Recognition, Proceedings of the 36th German Conference, GCPR 2014, Münster, Germany, 2–5 September 2014; Jiang, X., Hornegger, J., Koch, R., Eds.; Springer: Cham, Switzerland, 2014; pp. 31–42. [Google Scholar]
  28. Kaczmarek, A.L.; Blaschitz, B. Equal Baseline Camera Array—Calibration, Testbed and Applications. Appl. Sci. 2021, 11, 8464. [Google Scholar] [CrossRef]
  29. Zhang, S.; Liu, S.; Ma, Y.; Qi, C.; Ma, H.; Yang, H. Self calibration of the stereo vision system of the Chang’e-3 lunar rover based on the bundle block adjustment. ISPRS J. Photogramm. Remote Sens. 2017, 128, 287–297. [Google Scholar] [CrossRef]
  30. Yang, B.; Pi, Y.; Li, X.; Yang, Y. Integrated geometric self-calibration of stereo cameras onboard the ZiYuan-3 satellite. ISPRS J. Photogramm. Remote Sens. 2020, 162, 173–183. [Google Scholar] [CrossRef]
  31. Liu, R.; Zhang, H.; Liu, M.; Xia, X.; Hu, T. Stereo Cameras Self-Calibration Based on SIFT. In Proceedings of the 2009 International Conference on Measuring Technology and Mechatronics Automation, Zhangjiajie, China, 11–12 April 2009; Volume 1, pp. 352–355. [Google Scholar] [CrossRef]
  32. Boukamcha, H.; Atri, M.; Smach, F. Robust auto calibration technique for stereo camera. In Proceedings of the 2017 International Conference on Engineering MIS (ICEMIS), Monastir, Tunisia, 8–10 May 2017; pp. 1–6. [Google Scholar] [CrossRef]
  33. Yin, H.; Ma, Z.; Zhong, M.; Wu, K.; Wei, Y.; Guo, J.; Huang, B. SLAM-Based Self-Calibration of a Binocular Stereo Vision Rig in Real-Time. Sensors 2020, 20, 621. [Google Scholar] [CrossRef] [PubMed]
  34. Hirschmüller, H. Stereo Processing by Semiglobal Matching and Mutual Information. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 328–341. [Google Scholar] [CrossRef] [PubMed]
  35. Scharstein, D.; Szeliski, R. High-accuracy stereo depth maps using structured light. In Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings, Madison, WI, USA, 18–20 June 2003; Volume 1, pp. 195–202. [Google Scholar] [CrossRef]
  36. Scharstein, D.; Pal, C. Learning Conditional Random Fields for Stereo. In Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA, 17–22 June 2007; pp. 1–8. [Google Scholar] [CrossRef]
  37. Hirschmuller, H.; Scharstein, D. Evaluation of Cost Functions for Stereo Matching. In Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA, 17–22 June 2007; pp. 1–8. [Google Scholar] [CrossRef]
  38. Khan, N.Y.; McCane, B.; Wyvill, G. SIFT and SURF Performance Evaluation against Various Image Deformations on Benchmark Dataset. In Proceedings of the 2011 International Conference on Digital Image Computing: Techniques and Applications, Noosa, Australia, 6–8 December 2011; pp. 501–506. [Google Scholar] [CrossRef]
  39. Drews, P.; de Bem, R.; de Melo, A. Analyzing and exploring feature detectors in images. In Proceedings of the 2011 9th IEEE International Conference on Industrial Informatics, Lisbon, Portugal, 26–29 July 2011; pp. 305–310. [Google Scholar] [CrossRef]
Figure 1. The process of using the ACS module.
Figure 2. The process of estimating the parameters of transformations performed in the ACS module.
Figure 3. Points affecting rotation estimation in the ACS module.
Figure 4. (a) BMP with respect to g; (b) average error in the disparity value with respect to g (red—without the ACS module; blue—with the ACS module).
Figure 5. (a) BMP with respect to ω; (b) average error in the disparity value with respect to ω (red—without the ACS module; blue—with the ACS module).
Figure 6. The influence of parameter H on the estimated translation in the y-axis (a) and the estimated rotation (b).