An Illumination-Insensitive Descriptor Combining CS-LBP Features for Street View Images in Augmented Reality: Experimental Studies

Common feature matching algorithms for street view images are sensitive to illumination changes in augmented reality (AR), which may cause low matching accuracy between street view images. This paper proposes a novel illumination-insensitive feature descriptor that integrates the center-symmetric local binary pattern (CS-LBP) into a common feature description framework. The proposed descriptor can be used to improve the performance of eight commonly used feature-matching algorithms: SIFT, SURF, DAISY, BRISK, ORB, FREAK, KAZE, and AKAZE. We perform experiments on five street view image sequences with different illumination changes. Compared with the eight original algorithms, the evaluation results show that our improved algorithms increase the matching accuracy of street view images under changing illumination, while the time consumption increases only slightly. Therefore, our combined descriptors are much more robust against lighting changes and satisfy the high-precision requirement of augmented reality (AR) systems.


Introduction
Augmented reality (AR) is an emerging form of experience in which the real world (RW) is enhanced by computer-generated content tied to specific locations and/or activities. In simple terms, AR allows digital content to be seamlessly overlaid and mixed into our perception of the real world [1]. AR technology can be divided into two categories: marker-based AR and markerless AR. Marker-based AR uses visual features or an object as a trigger, while markerless AR uses some technology to detect the relative position (via feature matching) between virtual objects and the real world [2]. The principle of markerless AR is to extract feature points from template objects through a series of algorithms, such as SURF, ORB, FERN, etc., and to record or learn these feature points. Tracking based on image processing uses natural features in images, such as color, shape, texture and interest points, to calculate a camera's pose [3]. It uses the homography matrix between adjacent frame images, obtained by image matching, to solve the position and pose of the camera for registration [4]. In a preprocessing phase, the vision system generates a huge quantity of 3D features. GIS (Geographic Information System) research has been carried out to design and build a geographic database that can store and query these features and can add the features generated by the augmented reality system [5].
Street view images are the main component of a real scene in an AR system. Since illumination changes and the different photosensitivity of different cameras cause complex brightness changes of the same street view scene after imaging, matching the resulting images is difficult [6]. To address the difficulties in image feature matching caused by illumination changes, researchers have proposed many related algorithms. Lowe proposed a scale-invariant local feature detection and description algorithm called SIFT [7]. Thanks to its high distinctiveness and robustness, SIFT has been demonstrated as a state-of-the-art algorithm in computer vision. However, it does not consider the information loss and brightness changes of decolorized images, and its high-dimensional descriptor makes feature extraction very slow. To speed this up, Bay proposed the SURF algorithm for the same purpose with lower computational complexity, but illumination changes remain problematic [8]. Tola proposed another local feature description algorithm on the basis of SIFT, which convolves several Gaussian filter functions at different scales with several orientation maps of the original image [9]. The efficiency of the algorithm is improved, but illumination changes are not considered [10]. Leutenegger proposed a feature detector and descriptor called BRISK [11], which uses the FAST detector [12] and a binary feature descriptor; it is therefore faster than SURF and performs well under rotation and scale changes, but it is not robust against illumination changes [6]. Developed from the oriented FAST detector and rotated BRIEF descriptor [12], Rublee [13] proposed a fast feature detection and description algorithm called ORB, which is also sensitive to illumination changes [6].
Following this, Alahi proposed FREAK to improve on BRISK [14]; it adopts a better sampling pattern, very similar to that of the retinal ganglion cells of the human eye when receiving visual information. It is faster but less precise than BRISK. Alcantarilla proposed KAZE, a feature detection and description algorithm that operates in a nonlinear scale space [15]. By using additive operator splitting (AOS) techniques [16] and variable conductance diffusion to construct a nonlinear scale space with arbitrary step sizes, KAZE avoids the blurred boundaries and loss of image detail caused by the Gaussian scale space used in SIFT and SURF. KAZE improves localization accuracy, but its computational efficiency is lower. Later, Alcantarilla proposed a more accurate and efficient feature detection and description algorithm called AKAZE [17], which introduces the fast explicit diffusion (FED) scheme to solve the partial differential equations; FED is much faster and more accurate than AOS [18]. Moreover, a highly efficient modified local difference binary (M-LDB) descriptor is introduced to increase robustness against rotation and scale changes, but it achieves low precision in scenes with clear lighting changes. Gevrekci proposed an illumination-robust interest point detection algorithm, but its high complexity makes its real-time performance extremely poor [19]. Liu improved the Hu invariant moments to increase their illumination robustness, but this can cause loss of image information, leading to relatively low precision [20]. Ouyang matches images of different brightness in affine space with good performance, but the large amount of computation makes the method unsuitable for real-time image matching [21].
Lyu uses the non-linear correlation of grayscale information in a high-dimensional vector space to improve the illumination robustness of image matching, but this also causes loss of image information [22].
To solve the above problems, this paper integrates the center-symmetric local binary pattern (CS-LBP), a simple but strongly illumination-invariant feature, into eight common feature correspondence algorithms [23]. We conduct experiments on the combined descriptors, the CS-LBP descriptors and the eight original descriptors in five scenes with different illumination changes. The experimental results show that the improved algorithms increase the precision of image matching between street view images under different illumination changes, while still satisfying the real-time requirements of AR systems.

CS-LBP Feature Descriptor
The CS-LBP [24] is a simplified local binary pattern (LBP) [25] that describes the relationship between the pixel intensity of a point and that of its local region. The implementation steps of the CS-LBP descriptor can be summarized as follows:
Step-1: Suppose P_i(x, y, σ, θ) is a keypoint identified by a detector, where (x, y) is the pixel coordinate in the original image, σ defines the scale and θ denotes the orientation of the keypoint.
Step-2: Take a 9 × 9 area centered on the keypoint (inside the black line in Figure 1a) at the scale σ of the scale space, which is used to compute the CS-LBP feature descriptor of the keypoint (here we expand the area one pixel outward to compute the CS-LBP feature of the pixels at the edge). Suppose j is an arbitrary pixel within the 9 × 9 area, as shown in Figure 1a. Then rotate this region by the angle θ to the orientation of the keypoint to obtain rotation invariance, as shown in Figure 1b.
Step-3: Compute the CS-LBP feature of pixel j according to Equation (1):

cslbp_j = Σ_{i=0}^{N/2−1} s(g_i − g_{i+N/2}) · 2^i,  s(x) = 1 if x > T, 0 otherwise,  (1)

where g_i and g_{i+N/2} represent the normalized gray values of a center-symmetric pair among N equally spaced pixels on a circle of radius R, and T defines the threshold on the normalized gray value. The parameter values used in this paper are illustrated in Figure 1c.
Step-4: If the keypoint is detected by the BRISK, ORB, FREAK or AKAZE algorithm (whose original descriptors are binary), we construct an 81-dimensional binary CS-LBP description vector V_i from the 81 values cslbp_j according to Equation (2):

V_i = (cslbp_1, cslbp_2, …, cslbp_81).  (2)

Instead, if it is detected by the SIFT, SURF, DAISY or KAZE algorithm, we further normalize the binary vector V_i by Equation (3) to obtain a non-binary vector U_i [26]:

U_i = V_i / ||V_i||.  (3)
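As an illustration, the per-pixel comparison of Step-3 can be sketched as follows. This is a minimal sketch assuming N = 8 neighbors at radius R = 1 and a threshold T = 0.01; these parameter values are assumptions for illustration (the values actually used are those of Figure 1c), and the code is not the authors' implementation.

```python
import numpy as np

def cs_lbp_codes(patch, threshold=0.01):
    """CS-LBP code of every interior pixel of a normalized gray patch.

    Assumes N = 8 neighbors at radius R = 1, so each pixel yields
    N/2 = 4 center-symmetric comparisons and a code in 0..15.
    Passing an 11 x 11 patch returns the 9 x 9 = 81 codes used for
    the descriptor (the one-pixel border handles the edge pixels).
    """
    # Neighbor offsets ordered so that index i and index i + 4 are
    # center-symmetric about the current pixel.
    offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
               (0, -1), (1, -1), (1, 0), (1, 1)]
    h, w = patch.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            code = 0
            for i in range(4):  # the N/2 pairs of Equation (1)
                g_i = patch[r + offsets[i][0], c + offsets[i][1]]
                g_j = patch[r + offsets[i + 4][0], c + offsets[i + 4][1]]
                if g_i - g_j > threshold:   # s(g_i - g_{i+N/2})
                    code |= 1 << i          # weight 2^i
            codes[r - 1, c - 1] = code
    return codes
```

Because only gray-value differences enter Equation (1), adding a constant brightness offset to the patch leaves every code unchanged, which is the source of the descriptor's illumination insensitivity.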

Combined Descriptors
Firstly, we obtain the keypoints and their original descriptors through the eight original image feature matching algorithms (SIFT, SURF, DAISY, BRISK, ORB, FREAK, KAZE, AKAZE). Then we compute the CS-LBP descriptor of each keypoint. Afterwards, we append the CS-LBP descriptors to the eight original descriptors, respectively, to construct eight combined descriptors, denoted SIFT-CSLBP, SURF-CSLBP, DAISY-CSLBP, BRISK-CSLBP, ORB-CSLBP, FREAK-CSLBP, KAZE-CSLBP and AKAZE-CSLBP. Lastly, we match the keypoints of two images by the nearest distance ratio method. The implementation steps are summarized as follows:
Step-1: The keypoint sets of the two images are detected by one of the eight original feature detection algorithms.
Step-2: The original descriptor of each keypoint is computed by one of the eight original feature description algorithms.
Step-3: The CS-LBP descriptor of each keypoint is computed by the steps in Section 2.1.
Step-4: The original descriptor is simply concatenated with the CS-LBP descriptor, yielding a new combined descriptor of increased dimension.
The flow chart for calculating the combined descriptors and matching the keypoints is shown in Figure 3. The original feature detection algorithm in the chart can be any of the eight feature detection and matching algorithms mentioned above.
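The concatenation of Step-4 and the nearest-distance-ratio matching can be sketched with placeholder arrays as follows. This is a hedged sketch: in practice the original descriptors come from the eight detectors above, and the 0.8 ratio threshold is an assumption for illustration, not necessarily the value used in the experiments.

```python
import numpy as np

def combine(original_desc, cslbp_desc):
    """Concatenate original and CS-LBP descriptors row-wise (Step-4).

    Each row is one keypoint; the combined descriptor's dimension is
    the sum of the two input dimensions.
    """
    return np.hstack([original_desc.astype(np.float64),
                      cslbp_desc.astype(np.float64)])

def ratio_match(desc1, desc2, ratio=0.8):
    """Nearest-distance-ratio matching: keep a match only if the
    nearest neighbor is clearly closer than the second nearest."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))
    return matches
```

The ratio test discards ambiguous correspondences, which is what makes the stricter combined descriptor raise precision at some cost in recall.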

Experiments and Results
We first introduce the data and method of the experiment as well as the evaluation indices for the algorithm performance. Then we design a series of experiments to compare the matching performance between the combined descriptors, the CS-LBP descriptors and the original descriptors. We finally design experiments to compare the computational and matching speed of the three types of descriptors.

Datasets
The matching experiments were performed on five street view image sequences provided by Mikolajczyk, which are benchmark sequences used to test the performance of image matching algorithms [27]. Each sequence is presented as follows:
- Leuven sequence. The Leuven sequence has only illumination changes and contains six images (Figure 4). Each image has a size of 921 × 614 pixels.
- Bikes sequence. The Bikes sequence has blur changes. The Normal Bikes (Figure 5a) and New Bikes (Figure 5b) with increasing blur are selected for the experiment. The exposures of New Bikes are adjusted in Photoshop to simulate changing illumination, with values of -1, -2, -3, +1, +2 and +3, respectively. The magnitude of the illumination change increases with the absolute value of the exposure: negative exposure means underexposure (a darker image), while positive exposure means overexposure (a brighter image). After processing, this sequence has both blur and illumination changes. Each image has a size of 1000 × 700 pixels (Figure 5).
- Boat sequence. The Boat sequence has scale and rotation changes. The Normal Boat (Figure 6a) and New Boat (Figure 6b), with a scale reduction factor of 1.9 and a rotation angle of 45 degrees, are selected for the experiment. The exposures of New Boat are adjusted in Photoshop in the same way as for the Bikes sequence, so that this sequence has scale, rotation and illumination changes. Each image has a size of 800 × 640 pixels (Figure 6).

Experiment Method
In order to compare the matching performance and speed of the combined descriptors (SIFT-CSLBP, SURF-CSLBP, DAISY-CSLBP, BRISK-CSLBP, ORB-CSLBP, FREAK-CSLBP, KAZE-CSLBP, AKAZE-CSLBP) and the CSLBP descriptors with the eight original descriptors (SIFT, SURF, DAISY, BRISK, ORB, FREAK, KAZE and AKAZE), this paper tests the five sequences of Section 3.1 in a programming environment of Visual Studio 2013 VC++ and OpenCV 3.3. (Here, a CSLBP algorithm is an original algorithm in which the feature descriptor is replaced by the CS-LBP descriptor; the only difference between the two algorithms is the description step.) The specific experimental process is as follows:
Step-1: Selection of evaluation indices for algorithm performance. The performance of an image feature matching algorithm commonly comprises the accuracy and efficiency of keypoint detection, description and matching [28]. A good algorithm should obtain more correct matches with lower time consumption. Therefore, we select recall, precision and operating time [28] to evaluate the performance of our improved algorithms.
Step-2: Comparison of the performance of the original and improved algorithms. We match subfigure (a) of Figures 4-8 against the other images within each sequence using the original, the CSLBP and the combined descriptors, respectively.
Step-3: Comparison of speed of the original and the improved algorithms.
Since the feature detection step in each improved algorithm is identical to that of the original algorithm, we only compare the computational and matching speed of each description vector. We chose Figure 4a of the Leuven sequence for testing. Firstly, we extract 30 keypoints and construct the original, the CSLBP and the improved descriptors, respectively. Then we match the 30 keypoints against the same 30 keypoints using each type of descriptor. The time consumption of constructing and matching the descriptors for the 30 keypoints is recorded.
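The timing comparison of Step-3 can be reproduced with a simple wall-clock helper such as the one below. This is an illustrative helper of our own, not the paper's measurement code; the `repeats` parameter is an assumption used to smooth out timer jitter.

```python
import time

def average_time(fn, *args, repeats=10):
    """Average wall-clock seconds of fn(*args) over several runs.

    Returns the last result together with the mean elapsed time,
    so the same helper can time both descriptor construction and
    descriptor matching.
    """
    start = time.perf_counter()
    for _ in range(repeats):
        result = fn(*args)
    elapsed = (time.perf_counter() - start) / repeats
    return result, elapsed
```

For example, passing a descriptor-construction call as `fn` times the description step, and passing a matcher call times the matching step, mirroring the two measurements recorded for the 30 keypoints.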

Evaluation Indices
The recall is computed as the ratio between the number of correctly matched keypoints and the number of corresponding keypoints [27], as shown in Equation (4). The precision is the number of correct matches with respect to the total number of matches, as shown in Equation (5).
recall = #correct matches / #correspondences,  (4)

precision = #correct matches / #total matches,  (5)

where #correct matches is the number of correct matches, #correspondences is the number of corresponding keypoints, i.e., the total number of possible correct matches including matched and unmatched keypoints, and #total matches is the total number of actual matches, including both correct and false matches.
To verify the correct matches, we use the criterion proposed by Mikolajczyk [28]: the match between two keypoints is correct if the pixel error in relative location is less than 3 pixels, i.e., ||x_a − H x_b|| < 3, where x_a and x_b are the pixel coordinates of the keypoint in the training and querying image, respectively, and H is the homography between the two images.
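The correctness criterion and the two indices above can be sketched as follows (a sketch assuming H maps training-image coordinates to querying-image coordinates; the 3-pixel tolerance is the one stated above):

```python
import numpy as np

def project(H, pt):
    """Apply a 3x3 homography H to a 2-D point (x, y)."""
    v = H @ np.array([pt[0], pt[1], 1.0])
    return v[:2] / v[2]  # back from homogeneous coordinates

def is_correct(H, pt_train, pt_query, tol=3.0):
    """A match is correct if the query keypoint lies within `tol`
    pixels of the training keypoint projected by H."""
    err = np.linalg.norm(project(H, pt_train) - np.asarray(pt_query, float))
    return float(err) < tol

def recall_precision(n_correct, n_correspondences, n_total_matches):
    """Equations (4) and (5)."""
    return n_correct / n_correspondences, n_correct / n_total_matches
```

Counting correct matches with `is_correct` over all matched pairs and plugging the totals into `recall_precision` yields the two curves plotted in the following experiments.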

Influences of Only Illumination Changes
With respect to the Leuven sequence (Figure 4), we match Figure 4a against the other images in the group using the original, the CSLBP and the combined descriptors, respectively. The recall and precision of the matching results of each descriptor are shown in Figures 9 and 10: the red line represents the original descriptors, the green line the CSLBP descriptors and the blue line the combined descriptors. The reductions in recall of the CSLBP and combined descriptors relative to the original descriptors are listed in Table 1 (the highest decrease for each descriptor in bold), and the improvements in precision in Table 2 (the highest increase for each descriptor in bold). From these figures and tables, we can draw the following conclusions:
1. In most scenes, the recall of the CSLBP descriptors is far lower than that of the original descriptors, while that of the combined descriptors is only slightly lower.
2. The precision of the CSLBP descriptors is sometimes higher and sometimes lower than that of the original descriptors, while that of the combined descriptors is always higher.
3. Among the combined descriptors, FREAK-CSLBP demonstrates the best matching performance, with a small reduction in recall (within 13.8%) and a high precision (over 90%).

Influences of Blur and Illumination Changes
With respect to the Bikes sequence (Figure 5), the matching results are shown in Figures 11 and 12 and Tables 3 and 4. From these figures and tables, we can draw the following conclusions:
1. In most scenes, the recall of the CSLBP descriptors approaches zero and is far lower than that of the original descriptors, since the CS-LBP feature is sensitive to blur changes, while that of the combined descriptors is only slightly lower.
2. The precision of the CSLBP descriptors approaches zero and is far lower than that of the original descriptors, while that of the combined descriptors is much higher.
3. Among the combined descriptors, SURF-CSLBP demonstrates the best matching performance, with a small reduction in recall (within 9.0%) and a high precision (over 90%).


Influences of Scale, Rotation and Illumination Changes
As for the Boat sequence (Figure 6), the matching results are shown in Figures 13 and 14 and Tables 5 and 6. From these figures and tables, we can draw the following conclusions:
1. In most scenes, the recall of the CSLBP descriptors approaches zero and is far lower than that of the original descriptors (except that the recall of DAISY is zero as well), since the CS-LBP feature is sensitive to scale changes, while that of the combined descriptors is mostly higher.
2. The precision of the CSLBP descriptors mostly approaches zero (except that of the CSLBP descriptor for ORB features) and is far lower than that of the original descriptors (except that the precision of DAISY is zero as well), while that of the combined descriptors is mostly much higher (except that the precision of DAISY-CSLBP is still zero).
3. Among the combined descriptors, ORB-CSLBP shows the best matching performance, with a small reduction in recall (within 9.7%) and a high precision (over 90%).


Influences of Viewpoint and Illumination Changes
For the Graffiti sequence (Figure 7), the matching results are shown in Figures 15 and 16 and Tables 7 and 8. From these figures and tables, we can draw the following conclusions:
1. In most scenes, the recall of the CSLBP descriptors approaches zero and is far lower than that of the original descriptors (except that the recall of DAISY and KAZE approaches zero as well), since the CS-LBP feature is sensitive to viewpoint changes, while that of the combined descriptors is mostly higher.
2. The precision of the CSLBP descriptors mostly approaches zero (except that of the CSLBP descriptor for BRISK and FREAK features) and is far lower than that of the original descriptors (except that the precision of the DAISY and KAZE descriptors is almost zero as well), while that of the combined descriptors is mostly clearly higher (except that the precision of the DAISY-CSLBP and KAZE-CSLBP descriptors is still nearly zero).
3. Among the combined descriptors, FREAK-CSLBP shows the best matching performance, with a small reduction in recall (within 12.6%) and a precision over 50%.


Influences of JPEG Compression and Illumination Changes
With respect to the Ubc sequence (Figure 8), the matching results are shown in Figures 17 and 18 and Tables 9 and 10. From these figures and tables, we can draw the following conclusions:
1. In most scenes, the recall of the CSLBP descriptors is far lower than that of the original descriptors; the non-binary CSLBP descriptors for SIFT, SURF, DAISY and KAZE features almost approach zero, while the binary CSLBP descriptors for BRISK, ORB, FREAK and AKAZE features do not, since the binary CSLBP feature is more robust against JPEG compression than the non-binary CSLBP feature. The recall of the combined descriptors is mostly higher.
2. The precision of the CSLBP descriptors is mostly lower than that of the original descriptors, while that of the combined descriptors is always much higher.
3. Among the combined descriptors, ORB-CSLBP shows the best matching performance, with a small reduction in recall (within 20.8%) and a precision over 70%.

Computational and Matching Speed Analysis
The time consumption of constructing and matching each descriptor for 30 keypoints is shown in Figure 19a,b. The ratios of the CSLBP and combined descriptors to the original descriptors in computational and matching speed are given in Table 11. From these figures and the table, we can draw the following conclusions:
1. The computational speed of the CSLBP descriptor is much faster than that of the SIFT, DAISY, KAZE and AKAZE descriptors and a bit faster than that of the SURF descriptor, while slower than that of the BRISK, ORB and FREAK descriptors. The SIFT-CSLBP, DAISY-CSLBP, KAZE-CSLBP and AKAZE-CSLBP descriptors are computed only slightly more slowly than the SIFT, DAISY, KAZE and AKAZE descriptors, the SURF-CSLBP descriptor a bit more slowly than the SURF descriptor, and the BRISK-CSLBP, ORB-CSLBP and FREAK-CSLBP descriptors much more slowly than the BRISK, ORB and FREAK descriptors.
2. The matching speed of the CSLBP descriptor is a little faster than that of the SIFT, SURF, DAISY, BRISK, FREAK, KAZE and AKAZE descriptors, while a bit slower than that of the ORB descriptor. The improved descriptors also match a bit faster than the original descriptors, except that the matching speed of the ORB-CSLBP descriptor is slightly slower than that of the ORB descriptor.
3. Since the time consumed constructing a descriptor is much longer than the time consumed matching it, the total time consumption of the improved algorithms is only a little longer than that of the original algorithms.

Discussion
This paper addresses the accuracy problem of eight common feature matching algorithms (SIFT, SURF, DAISY, BRISK, ORB, FREAK, KAZE and AKAZE), whose decrease in accuracy under strong illumination changes leads to poor performance in AR systems. We observed that the CS-LBP descriptor has strong illumination invariance, so we created new descriptors by combining the descriptors of the original algorithms with the CS-LBP descriptor. Experiments were carried out to show that robustness against lighting changes improves when using our combined descriptors. We conducted experiments in five different scenes with varied illumination conditions to evaluate the performance of these descriptors in different environments, as well as experiments on the time consumption of computation and matching.
Our experimental results show that the precision of feature matching with the combined descriptors improves significantly in most street view scenes with strong illumination changes. Specifically, the FREAK-CSLBP descriptor shows the best matching performance under illumination changes only; the SURF-CSLBP descriptor under blur and illumination changes; the ORB-CSLBP descriptor under scale, rotation and illumination changes; the FREAK-CSLBP descriptor under viewpoint and illumination changes; and the ORB-CSLBP descriptor under JPEG compression and illumination changes. In addition, the time consumption of the combined descriptors is only a little longer than that of the original algorithms, so they can still satisfy the real-time requirements of AR systems.
Although the precision of each combined descriptor increases, the recall decreases; that is, fewer correctly matched keypoints are detected when using our combined descriptors. The reason may be that when two descriptors are combined into a new one, the filtering of keypoints becomes stricter than before, which can over-filter keypoint candidates. In future research, we will further improve the combined descriptor to deal with this over-filtering problem. We will also focus on improving the computation of the related algorithms to increase the speed of a real-time AR system. In an AR system, feature detection and matching are important procedures, and lighting changes affect the reliability of these steps; by using our proposed descriptor for feature matching, the stability of the whole AR system will be improved. With the development of 5G and other technologies, AR technology will improve rapidly in the near future.