A Unified Framework for Street-View Panorama Stitching

In this paper, we propose a unified framework to generate a pleasant and high-quality street-view panorama by stitching multiple panoramic images captured from the cameras mounted on the mobile platform. Our proposed framework is comprised of four major steps: image warping, color correction, optimal seam line detection and image blending. Since the input images are captured without a precisely common projection center from the scenes with the depth differences with respect to the cameras to different extents, such images cannot be precisely aligned in geometry. Therefore, an efficient image warping method based on the dense optical flow field is proposed to greatly suppress the influence of large geometric misalignment at first. Then, to lessen the influence of photometric inconsistencies caused by the illumination variations and different exposure settings, we propose an efficient color correction algorithm via matching extreme points of histograms to greatly decrease color differences between warped images. After that, the optimal seam lines between adjacent input images are detected via the graph cut energy minimization framework. At last, the Laplacian pyramid blending algorithm is applied to further eliminate the stitching artifacts along the optimal seam lines. Experimental results on a large set of challenging street-view panoramic images captured form the real world illustrate that the proposed system is capable of creating high-quality panoramas.


Introduction
Nowadays, with the development of street-view panoramas, which provide 360 • panoramic views along streets in the real world, the demand for high-quality panoramic images gradually becomes greater. Image stitching is the key technology to produce high-quality panoramic images, which is also an important and classical problem in the fields of photogrammetry [1][2][3][4][5], remote sensing [6][7][8][9] and computer vision [10][11][12][13][14][15], which is widely used to merge multiple aligned images into a single wide-angle composite image as seamlessly as possible.
In an ideally static scene in which both the geometric misalignments and the photometric inconsistencies do not exist or are not obviously visible in overlap regions, the stitched or mosaicked image looks perfect only when the geometric distance criterion is used. However, as we know, most of the street-view panoramic images are captured by a panoramic camera mounted on a mobile platform. Generally, the panoramic camera is comprised of multiple wide-angle or fish-eye cameras whose projection centers are slightly different. Therefore, those images cannot be precisely aligned in geometry; namely, there exist the geometric deviations for corresponding pixels from different images to different extents. In addition, there also exist photometric inconsistencies to different extents in overlap regions between adjacent images due to illumination variations and/or different exposure settings. This paper focuses on creating a visually pleasant street-view panorama by stitching or mosaicking multiple street-view panoramic images among which there may exist the severe geometric misalignments and the strong photometric inconsistencies.
One traditional and efficient way to eliminate the stitching artifacts caused by the large geometric misalignments existing in the input aligned panoramic images is to detect the optimal seam lines that avoid crossing the majority of visually obvious objects and most of the overlap regions with low image similarity and large object dislocation. The optimal seam line detection methods search for the seam lines in overlap regions between images where their intensity or gradient differences are not significant. Based on the optimally-detected seam lines, multiple aligned images can be mosaicked into a single composite image in which the obvious image parallax caused by image misalignments can be magnificently concealed. Many methods [2][3][4][5][6][16][17][18][19] regarded the optimal seam line detection as an energy optimization problem and solved it by minimizing a specially-designed energy function defined to represent the difference between the original images along the seam lines. For these methods, the key ideas concentrate on how to define the effective energy functions and how to guarantee the optimality of the solution. The energy functions are often defined by considering color, gradient and texture and are optimized via different optimization algorithms, e.g., the snake model [20], Dijkstra's algorithm [21], dynamic programming [22] and graph cuts [23]. Nowadays, the optimal seam line detected by many algorithms can avoid crossing the regions with low image similarity and high object dislocation. In our previous work presented in [19], we proposed an efficient optimal seam line detection algorithm for mosaicking aerial and panoramic images based on the graph cut energy minimization framework. In this paper, we will apply this algorithm to detect the optimal seam lines.
However, when the geometric misalignments are very large, the stitching artifacts perhaps cannot be completely avoided even though the optimal seam lines are detected, especially for street-view panoramic images among which there always exist geometric misalignments to different extents due to those images being captured from scenes with large depth differences by a panoramic camera comprised of multiple wide-angle or fish-eye cameras without a precisely common projection center, which means that the geometric misalignments are different at different positions. Therefore, the large geometric misalignments existing in the input aligned panoramic images should be eliminated as much as possible before finding the optimal seam lines. In this paper, we creatively propose an image warping algorithm based on the optical flow field to reduce the geometric misalignments between input panoramic images. Image warping is a transformation that maps all positions in one image plane to the corresponding ones in another plane [24], which has been popularly applied in many fields of computer vision, such as image morphing [25,26], image retargeting [27,28] and image mosaicking [29,30]. The key technique of image warping is to find the appropriate transformation functions based on the control conditions and then eliminate the distortions between input images. One famous image warping algorithm worked based on thin-plate splines [31] that attempted to minimize the amount of bending in the deformation. They used the radial basis functions with thin-plate splines to find a space deformation defined by control points. However, local non-uniform scaling and shearing possibly occurred in the deformed images. The work in [32] firstly introduced the concept of as-rigid-as-possible transformations, which have the property that both local scaling and shearing are very slight. To produce as-rigid-as-possible deformations, [33] proposed a point-based image deformation technique, which firstly triangulated the input image and then geometrically minimized the distortion associated with each triangle. However, this algorithm needs to triangulate the input image at first, and the results may not be smooth across triangle boundaries. The work in [34] provided an image deformation method based on moving least squares [35] using various classes of linear functions including affine, similarity and rigid transformations. It first found the deformation functions based on the control points or the line segments and then applied the deformation functions on each grid instead of each pixel to reduce the transformation time. At last, it filled the resulting quads using the bilinear interpolation. The work in [36] proposed an image warping algorithm based on radial basis functions, which formulated the image warping problem as the scattered data interpolation problem and used the radial basis functions to construct the interpolation. It aimed at identifying the best radial basis functions for image warping. Our image warping method is similar to this algorithm, but we used the Multilevel B-splines Approximation (MBA) [37] to solve the scattered data interpolation problem. Recently, the b-spline approximation technique has been widely used for image registration [38,39], image morphing, image warping, curve/surface fitting and geometric modeling.
In addition, due to the differences of both the image capturing viewpoints and the camera exposure settings, there are large differences of color and brightness between the warped panoramic images. The large color differences between those images also can cause the stitching artifacts in the last stitched or mosaicked panorama. Furthermore, the large color differences may affect the quality of the seam lines. Therefore, we also need to suppress the color differences between warped images before we apply the optimal seam line detection. Generally, the color correction approaches can be divided into two broad categories according to [40]: parametric and non-parametric. Panoramic approaches assume that the color relationship between images can be described by a certain model. A few noteworthy parametric approaches are described here. The work in [41] proposed a simple linear model to transform the color of the source image to the target image. The transformation matrix was estimated by using the histogram mapping over the overlap regions. The work in [12] applied the gain compensation (i.e., the diagonal model) to reduce color differences between input images. They computed all gains by minimizing an error function, which is the sum of gain normalized intensity errors for all overlapping pixels. The work in [42] also employed the diagonal model for the color and luminance compensation where the correction coefficients were computed as the ratio of the sum of pixel values in the overlap regions. As stated in [43], the linear transformation models can provide a simple yet effective way to transform colors, but they have clear limitations in explaining the complicated nonlinear transformations in the imaging process. Non-parametric approaches can handle this problem well. Non-parametric approaches do not follow any particular model for the color mapping, and most of them use some form of a look-up table to record the mapping of the full range of color levels. As stated in [40], parametric approaches are more effective in extending the color in non-overlap regions without generating gain artifacts, while non-parametric approaches can provide better color matching results. The work in [44] proposed to use the joint histogram of correspondences matched using the SIFT features [45] to correct the color differences. The color mapping function was estimated by using an energy minimization scheme. The work in [46] proposed a color correction approach by using the cumulative color histogram. This method used the cumulative histogram-based mapping to automatically adapt the color of all source images to the reference image. The work in [43] presented a nonlinear and nonparametric color transfer framework that operates in a 3D color space. Based on some control corresponding colors in a given image pair, this method used the probabilistic moving least squares to interpolate the transformation functions for each color. We correct the color differences between two images based on the matched extreme points, which are extracted from the histograms over the overlap regions. Both the Probability Density Functions (PDFs) and Cumulative Distribution Functions (CDFs) are used to find the reliably-matched extreme points. To reduce the gain artifacts in non-overlap regions, we propose to apply the alpha correction method to smooth the transition from non-overlapping regions to overlapping ones.
Although we propose efficient approaches to correct the color differences and detect the optimal seam lines between warped panoramic images, there may also exist some color transitions along the seam lines due to the color differences not being able to be eliminated completely. In order to further conceal these artifacts, the image blending techniques can be further applied along the seam lines. In the last several decades, many image blending algorithms have been proposed to smooth the color differences along the seam lines, such as feathering [47], alpha blending [48], Laplacian pyramid blending [49], Poisson blending [50] and the gradient domain image blending approach [51]. In this paper, we simply applied the Laplacian pyramid blending algorithm [49] to eliminate the stitching artifacts and generate the last pleasant panorama.
In this paper, we propose a unified framework for our developed street-view panorama stitching system, as described in Figure 1. First, multiple original images, which were captured from a single panoramic camera comprised of multiple wide-angle or fish-eye cameras (usually digital SLR cameras) without a precisely common projection center, are fed into our stitching system as the input. Therefore, we will align these input images into a common spherical coordinate system based on the found feature correspondences using the existing open-source library. After that, our proposed image warping method based on the dense optical flow field approximately interpolated from the sparse feature matches, which is detailed described in Section 2, is used to greatly reduce the geometric misalignments. Then, an automatic contrast adjustment and an efficient histogram matching-based color correction approach presented in Section 3 are used to reduce the color differences. Finally, we adopt an efficient seam line detection approach based on the graph cut energy minimization framework to find the optimal seam lines between two overlapped images followed by applying the image blending to eliminate the color transitions along the seam lines. By our proposed unified panorama stitching framework, our system can generate a pleasant street-view panorama as seamlessly as possible by stitching multiple panoramic images from the cameras mounted on the mobile platform. Experimental results on challenging street-view panoramic images are reported in Section 5 followed by the conclusions drawn in Section 6.

Image Warping
In our developed street-view panorama stitching system, we first check whether all input images are geometrically aligned into a common spherical coordinate system. If not, we will align them by using the open-source library PanoTools (available at http://www.panoramatools.com/), which also serves as the underlying core engine for many image stitching software, such as PTGui (available at http://www.ptgui.com/) and Hugin (available at http://hugin.sourceforge.net/). However, there always exist large geometric misalignments between these aligned images to different extents because those images were captured from scenes with large depth differences by a single panoramic camera comprised of multiple wide-angle or fish-eye cameras without a precisely common projection center.
Those geometric misalignments are so large that the stitching artifacts cannot be avoided completely even though the optimal seam lines are detected for image stitching. To ensure the high quality of the last stitched panorama, we propose to apply the image warping technique to eliminate those large geometric misalignments as much as possible. To describe our proposed image warping algorithm more clearly, we first consider a simple case of two aligned images I and I with an overlap. The process of our proposed image warping algorithm is described as follows. Firstly, the corresponding points between two images are found as the control points of image warping, and the sparse optical flows are calculated for those control points. Secondly, the Multilevel B-splines Approximation (MBA) algorithm [37] is used to approximately interpolate the dense optical flows for all integral pixels in the warped image with respect to the original one from the sparse optical flows. Lastly, we warp the two input images based on the dense optical flows, and thus, the geometric misalignments can be greatly lessened. For the case of multiple images, a simple strategy is proposed to first handle the horizontal images and then to deal with the vertical ones.

Feature Point Matching
To warp two images with large geometric misalignments, we need to find the control points at first. The quality of the warped image mainly depends on the accuracy and densities of control points. In this paper, we apply the feature matching algorithm to robustly find the sparse matching points, namely the control points. The main ideal for feature matching is to first extract local invariant features independently from two images and then characterize them by invariant descriptors. The distance between two descriptor vectors is used to identify candidate matches. However, the nearest neighbors is not always the best match due to occlusion and deformation derived from large viewpoint changes and repeated structures in the scenes. Then, we applied the assumption presented in [52] and the epipolar geometrical constraint to remove the outliers. The major steps of the feature matching include initial matching and outlier detection, which are summarized in Algorithm 1. An example of finding point correspondences between two panoramic images with an overlap is illustrated in Figure 2.

Algorithm 1
The proposed feature point matching algorithm.

Initial Matching
(a) Extract and describe two sets of local invariant features from two overlapped images I and I by using the SURF algorithm, respectively; (b) Find the initial point matches M initial between I and I according to the conditions listed in Section 2.1.1.

Initial Matching
Given two adjacent images I and I with an overlap, the local invariant features are extracted and described by the SURF algorithm [53]. Let f = (x, d) be a feature point where x = (x, y) denotes the 2D coordinate of this feature point, and d representing its corresponding invariant descriptor vector, and be the feature point sets extracted from I and I , respectively, where M and N denote the numbers of the feature points extracted from I and I , respectively. Generally, for one feature point f p in F , the feature point f q with the nearest can be regarded as the corresponding matching point of f p . However, this simple strategy has some drawbacks in the context of feature matching. This mainly because that the distance values between different corresponding pairs may vary in a relatively large range, so any permissive distance threshold T d cannot avoid the appearance of high rate outliers when covering most of the good correspondences. Thus, we propose to modify the matching strategy as follows. In this paper, we accept two feature points f p and f q as a potential match only when they satisfy the following conditions: • The feature points f p ∈ F and f q ∈ F are the nearest neighbors of each other. Namely, for the feature point f p , f q is its nearest neighbor in F . At the same time, for the feature point f q , f p is its nearest neighbor in F .

•
The Euclidean descriptor vector distance d(f p , f q ) between two feature points f p and f q is not We represent the nearest distance between f p and F as By this matching strategy, we obtain a set of initial matches denoted as

Outlier Detection
After initial matching, there may still exist a few outliers in M initial . Of course, we need to filter out those outliers. According to the assumption proposed by [52] that the matches in a small neighborhood tend to have the consistent location changes (i.e., motions), in this paper, we firstly apply this assumption to identify the outliers. Then, we applied the widely-used constraint of the epipolar geometric to further identify the outliers.
Given a match f p , f q , the motions from f p to f q along the horizontal direction and the vertical one are calculated, respectively, as follows: where f p = (x p , y p ) and f q = (x q , y q ) . Thus, the magnitude value of the motion vector (m can be calculated as: (2) Here, we use m( to represent all the three motion components of the match f p , f q . At first, we assign the labels of all matches as Inlier, namely, for each match f p , f q ∈ M initial , the label L( f p , f q ) = Inlier, and then we iteratively find the outliers. For each match f p , f q ∈ M initial , we find K n (K n = 60 was used in this paper) neighboring match points of f p from F denoted as the set N (f p ). Then, we collect all matches whose labels are Inlier from N (f p ) as a new set N inlier (f p ). If the number of inliers in N (f p ), namely, the size of N inlier (f p ), is less than K i (K i = 10 was used in this paper), we directly label this match as an Outlier, namely, L( f p , f q ) = Outlier, otherwise, we determine whether this match is an inlier by checking whether it has the consistent motion with its neighbors N inlier (f p ). For each match f m , f n ∈ N inlier (f p ), the motion m(f m ) from f m to f n can be calculated according to both Equations (1) and (2). Then, the mean motion µ µ µ( of all match points in N inlier (f p ) can be determined easily. According to the following measurement proposed by [52], the label of the match f p , f q can be determined as follows: | denotes the absolute distances in three components between the motion m(f p ) of the match f p , f q and the mean motion µ µ µ(f p ) of its neighbor matches. λ is a predefined parameter, and we set as λ = 3. After that, we applied the epipolar geometric constraint based on the estimated fundamental matrix to further detect the outliers. The fundamental matrix is estimated by using the RANSAC algorithm. As we known, we cannot estimate the fundamental matrix between two panoramic images which have been projected into the spherical coordinate at first. Therefore, we first project the panoramic images into the perspective plane, and the fundamental matrix is estimated in this plane by using the RANSAC algorithm. The epipolar lines are calculated in the perspective plane at first, and then we re-project them into the spherical coordinate to identify the outliers. At last, we can find all the inliers from M initial denoted as of the set M inlier = { f m , f n |f m ∈ F , f n ∈ F }.

Approximate Interpolation of Dense Optical Flows
LetĪ andĪ be the warped images of two adjacent images I and I , respectively. The aim of our proposed image warping algorithm is to ensure that the geometric alignments between the warped imagesĪ andĪ become smaller. To achieve this objective, we propose to approximately interpolate the optical flows of all of the integral pixels inĪ with respect to I and all of the integral pixels inĪ with respect to I based on the disparity vectors of the reliable point matches with respect to each other as the control points. Firstly, we calculate the disparity vectors d(x m ) and d(x n ) of each reliable point match x m , x n in M inlier from the warped images to the original ones as follows: where x m = (x m , y m ) and x n = (x n , y n ) . In this way, we expect to warp the images I and I based on the half offsets of real disparity vectors to reduce the warping distortion. Secondly, we propose to approximately interpolate the optical flows of all integral pixels in the warped imagesĪ andĪ based on the disparity vectors {d( respectively. This problem can be formulated as the scattered data interpolation problem. Due to the sparsity of the control points, in this paper, we adapt to apply the Multilevel B-Splines Approximation (MBA) [37] to solve this problem, which has been widely used for image registration, image morphing, image warping, curve/surface fitting and geometric modeling. By this MBA interpolation, we separately interpolate the horizontal and vertical components of optical flows (i.e., disparity vectors) of all the integral pixels inĪ andĪ , respectively. In this way, we finally obtain the dense optical flows D(Ī) = {d(p)} p∈Ī and D(Ī ) = {d(p )} p ∈Ī of all of the integral pixels {p} p∈I and {p } p ∈I in the warped imagesĪ andĪ with respect to the original images I and I , respectively.

Two Image Warping
Here, we demonstrate how to generate the warped imageĪ from the original image I based on the dense optical flows D(Ī) ofĪ with respect to I, and the generation of the warped imageĪ is similar. For each pixel p ∈Ī, we can easily calculate its corresponding 2D position in I based on its approximately interpolated optical flow (i.e., disparity vector)d(p) as p +d(p). Then, we use the bilinear interpolation algorithm to interpolate the intensity of the corresponding point p +d(p) in I as the intensity of the integral pixel p ∈Ī.
According to the above image warping procedure, we can obtain two warped images from two input panoramic images with the overlap. The geometric misalignments between warped images become smaller than those between the original images after warping correction.

Multiple Image Warping
Until now, we have introduced how to warp two images based on the optical flows. However, we need to warp multiple input images to generate the last panorama. In the experimental results presented in this paper, the input images are comprised of five horizontal ones and one vertical one, which are represented as (I 1 , I 2 , I 3 , I 4 , I 5 , I 6 ), whose correspondingly warped images are represented as (Ī 1 ,Ī 2 ,Ī 3 ,Ī 4 ,Ī 5 ,Ī 6 ), and the overlap relationship of those images is shown in Figure 3. For this particular case, here we will in detail introduce how to warp these six images for producing the last panorama before color correction. Other cases of multiple images can be handled in a similar way. For multiple input panoramic images, we first collect all image pairs according to their overlap relationship, as shown in Figure 3. Obviously, there are five image pairs along the horizontal direction, and one image pair along the vertical direction. We first handle the horizontal image pairs and then deal with the vertical image pair. For each horizontal image pair, we match them one by one by the method presented in Section 2.1, as an illustrative example shown in Figure 4, from which we can find that one horizontal image is overlapped with two adjacent images in the horizontal direction. For example, for the image I 1 , it overlaps with I 2 and I 5 , respectively, so we need to collect all matching points from these two overlap regions as the control points for warping I 1 . The dense optical flow field of the warped imageĪ 1 with respect to the original image I 1 can be approximately interpolated based on those control points via the MBA algorithm. Therefore, five horizontal warped imagesĪ 1 , I 2 ,Ī 3 ,Ī 4 andĪ 5 can be generated by warping their corresponding original images according to the method presented in Section 2.3, respectively. Figure 5 shows an example for warping one horizontal image. After that, we generate the bottom blended image I H by blending all horizontal warped images according to the proposed color correction method presented in Section 3 and the adopted image mosaicking strategy described in Section 4. Finally, to produce the last panorama, the top image I 6 and the horizontal blended image I H will be warped according to those matching points as the control ones.

Summary
In this section, we proposed a new image warping method to eliminate the large geometrical misalignments between adjacent aligned images. This method includes three stages: feature point matching, approximate interpolation of dense optical flows and image warping. In the first stage, we first used the SURF algorithm [53] to find the initial matching points between two images and then applied the assumption presented in [52] and the epipolar constraint to eliminate outliers. The sparse optical flows can be calculated between two matched feature points. In the second stage, we proposed to apply the dense optical flows to simulate the non-rigid deformations between aligned images. The dense optical flows will be interpolated via the MBA algorithm [37] from sparse ones. In the last stage, we proposed to warp the initially aligned images based on the dense optical fields, and we also proposed a strategy to handle multiple image warping. The key contribution of this section is that we proposed to use the optical flows to simulate and eliminate the large non-rigid geometrical misalignments between input aligned street-view images. This method is simple and completed by combining some conventional algorithms, such as SURF, MBA, and so on. However, to the best of the authors' knowledge, no one has applied this efficient strategy to handle the large geometrical misalignments between street-view panoramic images before color correction and image mosaicking.

Color Correction
The large geometric misalignments can be efficiently eliminated by our proposed image warping algorithm, but there also exist the color differences between the warped images, so the stitching artifacts are still visible. Generally, the image blending technique can solve this easily by smoothing the color along the seam lines. However, it does not work well for input images with very large color differences. The simple image blending may not be able to efficiently conceal the artifacts if we don not magnificently correct color differences between images in advance, which results in low-quality panoramic images, as the illustrative example shown in Figure 6. In addition, the large color differences maybe also affect the quality of the detected seam lines. Thus, in this paper, we propose to reduce the color differences between warped images before the optimal seam lines are detected. Generally, the color differences should be also corrected before the image warping step to ensure the quality of feature matching results. However, our adopted SURF feature matching algorithm is robust enough to the large photometric inconsistencies, so there is no obvious influence on our algorithm if we apply the color correction after image warping. In this paper, we first apply the automatic contrast adjustment to reduce the brightness differences between images and then propose a novel and efficient color correction algorithm via matching extreme points of intensity histograms to further reduce the color differences. For the overlap image regions between two images, we construct their own Probability Density Functions (PDFs) and Cumulative Distribution Functions (CDFs) with respect to the intensity histograms in the three HSV channels, respectively. One way to eliminate color differences is to ensure that the three CDFs of the overlap regions in the first image in the three HSV channels are approximately the same as those CDFs of the overlap regions in the second image, respectively. Obviously, we can correct the CDFs based on several uniformly-spaced knots, as [54] did. However, due to the existence of geometric misalignments, the scenes presented by two images in the overlap regions are not completely consistent. To solve this problem, we replace the knots by the matched extreme points extracted from the two PDFs. If the number of matched extreme points is not sufficient, we will suitably introduce those uniformly-space knots. At last, the intensities of all of the pixels in the two images are modified afterwards based on the matched extreme points extracted from the PDFs, not only for the pixels in the overlap regions, but also in the non-overlap regions.

Automatic Contrast Adjustment
At first, in order to make sure that multiple images have the similar contrast, which can produce satisfactory blending results, the three RGB channels of individual images are automatically adjusted in contrast. The histograms of a color image are calculated firstly in each of the three RGB channels, respectively. Let I be a single-channel image and I = {I k } N k=1 be a set of one-dimensional sorted intensities of all valid pixels in I in the ascending order, where N denotes the total number of valid pixels in I and I k represents the intensity of the k-th sorted pixel in I. The minimal and maximal intensities I min and I max in I are defined, respectively, as follows: where ∆ denotes the upper integer of a real value ∆ and c is a small percentage value in the range of (0, 50) (c = 0.1 was empirically used in this paper), which can be used to skip over a part of the real minimal and maximal intensities due to the fact that these pixels may be caused by noises and information lacking in most cases. The minimal and maximal intensity values of the R, G and B channels of a color image are denoted as R min , G min , B min , R max , G max , and B max , respectively. The minimal and maximal intensity values of the whole color image are defined as V min = min(R min , G min , B min ) and V max = max(R max , G max , B max ), respectively. Therefore, any intensity I of the R, G and B channels of a color image will be modified as: In the same way, all of the images to be used for creating a panorama will be automatically adjusted in contrast, which will slightly reduce the brightness differences between images.

Finding Extreme Points
After applying the automatic contrast adjustment on the multiple panoramic images, we propose to further reduce the color differences between panoramic images by matching extreme points of histograms. For the statistic analysis, only valid pixels in the overlap regions between two images are considered. Let A and B be the overlap image regions in two images, respectively. To make a better description of the information hidden behind the image, we convert A and B from the original RGB color space to the HSV color space, respectively. For each channel of A and B, we calculate their PDFs and CDFs, which are denoted as PDF A , PDF B , CDF A and CDF B , respectively.
To robustly find extreme points in both PDF A and PDF B , these two PDFs are smoothed first by a Gaussian function to suppress possible noise. The initial local extreme points can be easily obtained from the smoothed PDF A and PDF B . In an ideal situation, the extreme points should be uniformly distributed in the color space. However, most of the extreme points are relatively centralized in some cases, which will lead to the information redundancy due to that multiple extreme points being selected out to represent the similar image statistical information. To avoid the situation mentioned above, we further checkout all initial extreme points by the local window suppression. Let in PDF A , which are sorted in the ascending order. Given an extreme point P i A , we generate a neighborhood range [L i A − w, L i A + w] centered on the corresponding intensity L i A with the size of (2w + 1). We set w = 2 if not specifically stated in this paper. If there exists more than one extreme points located in this neighborhood range, the extreme point with the highest frequency in PDF A will be retained, and other extreme points will be removed. All initial extreme points are checked in this way, and the retained extreme points are used for the following matching. The final extreme points extracted from PDF A

Matching Extreme Points
The extreme points can sufficiently reflect image statistical characteristics. To efficiently adjust the color differences, one way is to ensure that the intensities of corresponding extreme points are the same. Thus, we should match the extreme points firstly. To reliably match these extreme points , we define a cost function to measure the matching similarity of two extreme points P i A and P j B as: where F max is the maximal frequency of all of the extreme points in both PDF A and PDF B . The above cost function judges the two extreme points from the view of both PDF and CDF. The first term indicates that those possibly matched extreme points with the higher frequencies generate higher costs, which may be peak out first in the following matching selection strategy. The second term indicates that there are the similar frequencies for two possibly matched extreme points P i A and P i B . The last term is applied to ensure that the accumulative values of two possibly matched extreme points P i A and P i B are approximate. From this term, we can find that if the small ranges of cumulative values of two extreme points are similar, the numerator max , which results in that this term being close to one. In contrast, if the numerator is smaller and the denominator is larger, this term will approach to zero. In summary, if the frequencies of two extreme points are larger and more similar and the accumulative values of those points are more approximate, their matching cost is bigger. In contrast, it is smaller. The higher the cost function value is, the more likely these two extreme points are matched. Based on this cost definition, a N A × N B matching cost matrix M = [M ij ] N A ×N B is created. In order to efficiently eliminate the impossibly matched extreme points, we empirically designed three hard conditions from the view of both PDF and CDF to check whether two extreme points P i A and P j B are possibly matched as follows: Based on the computed matching cost matrix M, we propose an efficient iterative strategy to find the matched extreme points as the following steps:

•
Step 1: Finding the highest non-zero cost element M ij from the matrix M and its corresponding extreme points P i A and P j B is selected out as a reliable extreme point match.

•
Step 2: Updating the matrix M by removing the i-th row and the j-th column due to that P i A and P j B have been successfully matched.

•
Step 3: Performing the above two steps iteratively until the updated matrix M is empty or there exists no non-zero element in M.
By the above iterative strategy, a set of reliable extreme point matches will be found. In Figure 7, we have shown a visual example of our proposed color correction approach. The two input images have large color differences in overlap regions, as shown in Figure 7a,b. We find 5 matched extreme points in the PDF of one channel. Based on those correspondences, the large color differences can be eliminated, as shown in Figure 7e,f. From this example, we can find that our proposed approach can handle the images with large color differences very well. Sometimes, no match or too few matches can be reliably found via the above matching strategy in the whole CDF range or some relatively large CDF range. In this case, we will introduce more matches with the help of both CDF A and CDF B , which are selected from H uniformly distributed points {C k A } H k=1 and {C k B } H k=1 from CDF A and CDF B , respectively, but not from the previously found extreme points. The same number of sampling points in CDF A and CDF B are uniformly selected in accordance with the cumulative density values. In our experiments, the percentages of sampling intervals were used as [0.1, 0.3, 0.5, 0.7, 0.9]. If there exists no extreme point match found in the ranges [C k A − κC max , C k A + κC max ] and [C k B − κC max , C k B + κC max ], the current sampling points C k A and C k B will be added into the matching set as a new point match, where κ is a given threshold in advance (κ = 0.1 was used in this paper).

Correcting Color Difference
The extracted matching points in the overlap image regions are then applied to correct the intensities of two adjacent images, including the pixels in non-overlap regions. Let Based on these corrections, the intensity of any pixel in both A and B will be adjusted linearly. For example, given a pixel p ∈ A whose intensity L A (p) will be linearly corrected as: where L A (p) ∈ [L u A , L l A ], L u A and L l A denote the intensities of two matching points in A that are closest to L A (p), and theL u A andL l A are the corresponding corrected intensities. In order to produce a smooth and gradual transition from non-overlaping regions to overlaping ones, the alpha correction method is conducted as: where L A (p) denotes the finally fused intensity of the pixel p, L A (p) is the original intensity of the pixel p whileL A (p) is the corrected intensity of the corresponding pixel based on the above-mentioned correction method, and α(p) is a function that related to the distance between the pixel p and the center line of the overlap image region, which ranges between zero and one as shown in Figure 8, where the smaller the distance to the center line, the larger the α is. All the pixels in other images will be processed in the same way.

Summary
In this section, we proposed a new color correction method to reduce the color differences between warped images. This method also includes three stages: automatic contrast adjustment, histogram extreme point detection and matching and color difference correction. In the first stage, we applied the traditional method to adjust the image contrast. This stage can be regarded as the pre-processing. The second stage is the key contribution of the proposed color correction method. In this stage, we proposed a new cost function to evaluate the similarity between two extreme points extracted from two PDFs and then find all matched extreme points iteratively. In the last stage, based on the matched histogram extreme points, the color differences can be corrected easily. The key contribution of this section is that we proposed a new color correction method via matching extreme points of histograms. This method is effective and efficient, which can handle large color differences between two images.

Image Mosaicking
Although the large geometric misalignments and photometric inconsistencies have been greatly reduced through our proposed image warping and color correction algorithms, respectively, there always exist small geometric misalignments and color differences between adjacent images. To stitch the color corrected panoramic images into the single composite panorama, we also need to find the optimal seam lines in the overlap image regions between warped images to magnificently conceal the parallax. In this paper, the optimal seam lines between color corrected images will be efficiently extracted using the graph cuts-based seam line detection algorithm presented in [19]. For multiple images, we apply the traditional frame-to-frame optimization strategy to efficiently find all optimal seam lines. The details of this strategy are described in Section 3.1 of [19]. Furthermore, the Laplacian pyramid blending [49] algorithm will be further applied to eliminate the stitching artifacts caused by small color differences along the seam lines.

Experimental Results
Extensive experiments on representative street-view panoramic images were conducted to comprehensively evaluate the performance of our proposed unified framework for street-view panorama stitching. In this paper, all used street-view panoramic images were captured from the real world scenes by an integrated multi-camera equipment with six Nikon D7100 cameras of 24 million pixels with wide-angle lenses mounted on a mobile vehicle platform. Six camera images were aligned into a common spherical coordinate system with the image size of 12, 000 × 6000 pixels. Due to that the projection centers of these six cameras are not precisely the same, there always exist large geometrical misalignments at different extents between the adjacently aligned images, especially in the image regions close to the camera centers. The overlap relationship of those six panoramic images is shown in Figure 3. Our algorithms in this paper were implemented with C++ under Windows and tested in a computer with an Intel Core i7-4770 at 3.4 GHz and the 16 GB RAM memory. Due to the limit of pages, more experimental results and analysis are presented at http://cvrs.whu.edu.cn/projects/ PanoStitching/. In addition, the stitching framework provided by this paper has been adopted by IShowChina(http://www.ishowchina.com/) to product the street-view panoramas for many cities in China. Nowadays, the street-view maps of some cities are already available on IShowChina, such as Beijing(http://map.ishowchina.com/index.html#c=110000&s=005200-0-201308190236430390& heading=127&pitch=-3&zoom=0) and Macao(http://map.ishowchina.com/index.html#c=440400&s= 005223-0-201401210342400290&heading=353&pitch=10&zoom=0). The large set of panoramic images generated by our proposed framework have proved that our framework is robust and effective.

Image Warping
In this section, we conducted the experiments on two groups of panoramic images to prove the effectiveness and superiority of our proposed image warping algorithm described in Section 2. The panorama stitching results without and with the use of our proposed image warping algorithm in the first group of six panoramic images are shown in Figure 9a,b, respectively. We can find that the whole seamlines in two panoramas cross the similar regions with the high image similarity. However, from the whole stitching results and especially the detailed local regions shown in Figure 9a,b, we observed that the stitching artifacts caused by the geometric dislocation in the panorama, as shown in Figure 9a, stitched without the use of image warping algorithm are more obvious than the panorama, as shown in Figure 9b, stitched with its use. Noticeably, the stitching artifacts caused by geometric dislocation become smaller as expected when the image warping algorithm was applied, as shown in Figure 9b. While not using the image warping algorithm, the geometric dislocation is very large, as shown in Figure 9a. For example, in the first enlarged local region, the seamline crossed the text without the used of image warping, and it avoided crossing the text when the image warping was used. In the second enlarged local region, although two seamlines crossed the road with pavement stairs, we can find that the dislocation is almost invisible in the pavement stairs when the image warping was used, but it is so obvious without the use of the image warping. In the aspect of computational cost, without the use of image warping, our algorithm took around 17.89 s in the above experiment, only the elapsed time in six optimal seamlines detection is included. However, with its use, our algorithm took around 70.93 s consisting of all the elapsed times in the image warping and the optimal seamline detection. From this comparison, we observed that the seamline detection is efficient, but the image warping is relatively time-consuming. This is mainly because that we need to find the inlier matches for all image pairs at first and then interpolate the dense optical flows by MBA for each image, which is time-consuming. But our proposed image warping algorithm can significantly improve the quality of the last stitched panorama. (c) (d) Figure 9. Visual comparison of the stitching results with the optimal seamlines in two groups of six panoramic images when our proposed image warping algorithm was used (Right: (b,d)) or not (Left: (a,c), namely, the stitching results of [19]). The red lines stand for the detected optimal seamlines between images.
The comparative experimental results on another group of panoramic images are presented in Figure 9c,d, respectively, and the similar conclusions can be drawn. The computational times of our algorithm without the use of image warping and with its use are 13.19 s and 56.77 s, respectively.
From the above experimental results on two groups of panoramic images, we observed that our proposed image warping algorithm can effectively eliminate the stitching artifacts caused by the geometrical dislocations and can also slightly improve the quality of the found optimal seamlines to some extent.

Color Correction and Image Blending
In this section, we conducted the experiments in two group panoramic images to prove that our proposed color correction algorithm can magnificently reduce the large color differences between the warped images. In addition, we also presented the last panoramas generated by our proposed system and compared them with the open-source software Enblend (Available at http://enblend.sourceforge. net/.) which are popularly used to generate the street-view panorama by stitching the registered panoramic images. Figure 10 shows the experimental results on the first group of panoramic images. The panorama stitching results without and with the use of color correction are shown in Figure 10a,b, respectively. From the whole stitching results and especially the detailed local regions shown in Figure 10a,b, we can find that color differences between the warped images were significantly reduced and are almost invisible. In addition, the quality of the detected optimal seamlines was improved as expected when the color correction algorithm was used due to that the color differences were greatly reduced before the seamlines were found. For example, the seamline rounded the advertising board instead of crossing it when the color correction algorithm was used, as shown in the detailed image regions in Figure 10a,b. In the aspect of computational cost, without the use of the color correction, our algorithm took around 18.01 s to find all six optimal seamlines. With its use, our algorithm took around 33.52 s to correct the color differences and find the optimal seamlines, means that the color correction algorithm took around 15.51 s. To generate the last panorama, the Laplacian pyramid blending algorithm was further applied, whose generated result is shown in Figure 10d. And in Figure 10c, we also present the last panorama generated by Enblend. From the visual comparison, we can observe that our proposed stitching system with image warping and color correction obviously outperforms Enblend. Noticeably, the stitching artifacts caused by geometric misalignments and photometric inconsistencies still exist in the panorama generated by Enblend, as shown in Figure 10c but they almost disappeared in our produced panorama, as shown in Figure 10d. In the aspect of computational cost, the Laplacian pyramid blending algorithm took 35.56 s. The experimental results on another group of panoramic images are presented in Figure 11 and the similar conclusion can be drawn. The large color differences were greatly reduced by our proposed color correction algorithm, especially in the regions of sky and the tall buildings, and the quality of the detected seamlines was slightly improved to some extent. The seamlines bypass the buildings and the white lane when the color differences were corrected for the warped images. Likewise, the stitching artifacts existed in the panorama produced by Enblend disappeared in the panorama generated by our proposed system. The elapsed times in color correction, optimal seamline detection and image blending are 17.33, 13.77 and 35.61 s, respectively. Figure 11. Visual comparison in the second group of six panoramic images: (a,b) the stitching results with the optimal seamlines when the our proposed color correction was used (b) or not (a); (c,d) the last panoramas generated by Enblend (c) and our proposed stitching system (d).

Image Stitching
To illustrate the effectiveness of our proposed framework for street-view panorama stitching, we presented the last panoramas stitched by different combination of optimal seamline detection (S), image warping (W), color correction (C) and image blending (B) algorithms in Figure 12.
At first, Figure 12a shows the panorama generated by the optimal seamline detection algorithm presented by [19], from which we can find that there are many stitching artifacts caused by geometric misalignments and photometric inconsistencies in the last stitching image, especially obvious in the detailed local regions. For example, the white lanes on the road were broken due to the large geometric dislocations. In addition, there also exist large color differences along the optimal seamlines. Our proposed image warping and color correction algorithm can eliminate large geometric misalignments and photometric inconsistencies, as shown in Figure 12b,c, respectively. The last blended panorama generated by our proposed system is shown in Figure 12d from which we can observe that the last stitched panoramic image is pleasant and high-quality, which can meet the application requirement of the street-view map.

Comparative Results
At last, to prove that our approach is superior and can generate high-quality panoramas, we compared our proposed approach with another algorithm, the open source library and the open tools. At first, we compared our proposed framework with the Xiong and Pulli's approach [42] step by step. We used two representative groups of panoramic images for visual comparison. The color differences in the first group of images are relatively small but large in the second group. Because the Xiong and Pulli's approach has not eliminated the influence of large geometric misalignments between aligned images, so we used the warped images generated by our image warping algorithm as the input ones for comparing two approaches. In addition, their approach applied the Poisson blending algorithm to generate the last blended image, however, our approach used the Laplacian pyramid blending algorithm. To evaluate the last blended panoramas generated by two approaches more fair, we replaced the Poisson blending algorithm in the tested Xiong and Pulli's approach with the Laplacian pyramid blending algorithm. Figure 13 shows the stitching results of the first group of images with relatively small color differences. Figure 13a,b illustrate the stitching results just with the detected seamlines of the Xiong and Pulli's approach and our approach without the use of color correction, respectively. Figure 13c,d illustrate the stitching results of two approaches with the use of color correction, respectively, from which we observed that both of two approaches can eliminate the small color differences effectively. Figure 13e,f show the last blended panoramas generated by two approaches, respectively, from which we found that there is some petty ghost on the top of the tallest building in the second enlarged region shown in Figure 13e, which disappeared in the panorama generated by our approach, as shown in Figure 13f. This is mainly because the horizontal seamline between bottom and top input images detected by the Xiong and Pulli's approach is close to this building, as shown in Figure 13c. In conclusion, if the color differences between input images are small, both of two approaches can generate high-quality panoramas. (e,f) the last generated panoramas. Figure 14 shows the stitching results of the second group of images with very large color differences. Figure 13a,b show the stitching results of the Xiong and Pulli's approach and our approach without the use of color correction, respectively. Figure 13c,d present the stitching results of two approaches with the use of color correction, respectively. From the visual comparison, we observed that our proposed color correction algorithm obviously outperformed than the algorithm presented in [42], especially obvious in two locally enlarged regions. For example, in the first enlarged region (from left to right), the detected seamline divides the building into two parts, one comes from the top input image which is dark, and another comes from the bottom input image which is relatively lighter. After color correction, the top image is also very dark in the result generated by the Xiong and Pulli's approach and the color differences along the seamline are also very large. In addition, due to that the top image is too dark, many detailed informations cannot be pleasantly observed. But, in our result, the color of the top image is similar with the bottom one, and more detailed informations of this region can be clearly observed. Figure 13e,f show the last blended panoramas generated by two approaches, respectively. In the second enlarged region of Figure 14e, we found that there are some very obvious ghosts on the top of the building, which disappeared in the panorama generated by our approach, as shown in Figure 14f. In addition, in Figure 14e, the color of top sky regions almost is white, which is not pleasant. However, in the last panorama generated by our approach, the color of those regions is slightly bluish, which is more reasonable and pleasant, as shown in Figure 14f.
In conclusion, if the color differences between input images are large, our approach can also generate high-quality panoramas, but the results generated by the Xiong and Pulli's approach are not so good.
(e) (f) Figure 14. Visual comparison between our approach in the left column and the Xiong and Pulli's approach in the right column on the second group of images with large color differences: (a,b) the results without the use of color correction; (c,d) the results with the use of color correction; (e,f) the last generated panoramas.
In the aspect of computational times of two approaches, the average times on two groups of images are presented in Table 1, from which we can find that our approach is a litter bit more time-consuming than their approach. This is mainly because their approach applied dynamic programming to detect the optimal seamlines but we used graph cuts, which is more time-consuming than dynamic programming. We also observed that the computation times of our proposed color correction algorithm and their algorithm are 16.83 s and 16.08 s, respectively, which are almost the same. But our color correction algorithm is more effective than their algorithm. In addition, we presented more comparative results in Figure 15. We also compared our proposed approach with two widely used open tools Enblend and Smartblend (Available at http://wiki.panotools. org/SmartBlend), and also with the stitching functions provided by OpenCV (Available at http: //opencv.org/). Similar with the experiments presented in Figures 13 and 14, we also used the warped images generated by our image warping algorithm as the input images. From those more visually comparative results, we also can find that our proposed system can generate the best high-quality street-view panoramas. At last, to strongly prove the superiority of our proposed approach, we applied the Oriented Gradients Image Assessment (OG-IQA) algorithm [55] to quantitatively evaluate the qualities of the final panoramic images stitched by different tools and approaches. Here, we evaluated the panoramic images presented in Figures 13 and 14, and six groups of panoramic images presented in Figure 15.
The results of quality assessment are presented in Table 2. In Table 2, we used the Group 1 to Group 6 to represent the panoramic images in the first row to the last row of Figure 15 . From these results, we observed that our approach performed best in most of cases, except of Group 4. But, in Group 4, the quality score of our approach is only litter less than the score of Enblend. Table 2. The quantitative quality assessments of five tested tools and approaches.

Conclusions
In this paper, we proposed a unified framework for street-view panorama stitching system which is comprised of image warping, color correction, optimal seamline detection and image blending for stitching or mosaicking a set of geometrically aligned street-view panoramic images with large geometric misalignments and photometric inconsistencies into a visual-appealing and informative wide-angle composite image. The contributions in this paper are summarized as follows: • We creatively proposed a novel image warping method based on the dense optical flows to greatly reduce the large geometric misalignment existed in the input images as much as possible.
Experimental results have demonstrated the superiority of our proposed image warping method, which can efficiently and greatly eliminate the influence of the large geometric misalignment.

•
We proposed a novel color correction and image blending method to further reduce the color differences between panoramic images based on extreme point matching of histograms of the overlapped image regions of two involved images via both probability density functions and cumulative distribution functions. Experimental results on representative street-view panoramic images have proved that our proposed color correction method is capable of eliminating the large color differences between adjacent images captured in different illumination conditions and/or different exposure settings.

•
We proposed a unified framework for street-view panorama stitching system. Even thought there are large geometrical misalignments and photometric inconsistencies in the input aligned images, our system can also generate pleasant and high-quality panoramas. Our proposed framework outperforms the open-source tools Enblend, Smartblend, OpenCV stitching functions and the approach proposed by [42].
Nevertheless, the proposed system may be improved in the future in the following ways. First, when detecting the optimal seamlines, the superpixel segmentation can be introduced to greatly improve the optimization efficiency by decreasing the number of elements in graph cuts, and the scene understanding or parsing can also be applied in some particular image data. For example, the roads can be detected out for guiding the seamlines. Second, the whole image mosaicking method can be improved to handle more different types of images, not only street-view panoramic ones, but also aerial and oblique ones. At last, the parallel optimization strategy is expected to be developed to more efficiently generate the last panorama.
Author Contributions: Li Li and Jian Yao conducted the algorithm design, and Li Li wrote the paper and performed all the experiments under the supervision of Jian Yao. Renping Xie, Menghan Xia and Wei Zhang worked on the image alignment and contributed to prepare and analyze the experimental data. All authors were involved in modifying the paper, the literature review and the discussion of the results.

Conflicts of Interest:
The authors declare no conflict of interest.