# Rapid 3D Reconstruction for Image Sequence Acquired from UAV Camera


## Abstract


## 1. Introduction

## 2. Literature Review

#### 2.1. SfM

#### 2.2. MVS

#### 2.3. SLAM

## 3. Method

#### 3.1. Algorithm Principles

#### 3.2. Selecting Key Images

Two images (${I}_{1}$, ${I}_{2}$) satisfy the key image constraint ($R\left({I}_{1},{I}_{2}\right)$) if they have a sufficient overlap area. In this study, we propose a method for directly selecting key images for reconstructing the UAV camera's images (the GPS equipped on the UAV can only reach an accuracy on the order of meters; using GPS information as a reference for key image selection would therefore produce discontinuous images). The overlap area between images can be estimated from the correspondence between the feature points of the images. To reduce the computational complexity of feature point matching, we propose a method of compressing the feature points based on principal component analysis (PCA). It is assumed that the images used for reconstruction are rich in texture. Three principal component points (PCPs) can be generated from PCA, each reflecting the distribution of the feature points in an image. If two images are captured at almost the same position, their PCPs almost coincide; otherwise, the PCPs move to different positions on the image. The processing steps are as follows. First, we use the scale-invariant feature transform (SIFT) [19] feature detection algorithm to detect the feature points of each image (Figure 2a). There must be at least four feature points, and the centroid of these feature points can then be calculated as follows:

$$\overline{p}=\frac{1}{n}\sum_{i=1}^{n}{p}_{i}\hspace{2em}(1)$$

where ${p}_{i}$ is the pixel coordinate of the feature point, and $\overline{p}$ is the centroid. The following matrix is formed by the image coordinates of the feature points:

where ${p}_{m1}$, ${p}_{m2}$, and ${p}_{m3}$ are the three PCPs, and ${V}_{1}$ and ${V}_{2}$ are the two vectors of ${V}^{*}$. The PCPs reflect the distribution of the feature points in the image. After that, by calculating the positional relationship of the corresponding PCPs between two consecutive images, we can estimate the overlap area between the images. The average displacement ${d}_{p}$ between the corresponding PCPs is calculated as

$${d}_{p}=\frac{1}{3}\sum_{i=1}^{3}\left\Vert {p}_{1i}-{p}_{2i}\right\Vert \hspace{2em}(5)$$

${d}_{p}$ reflects the relative displacement of the feature points. When ${d}_{p}<{D}_{l}$, the two images are likely captured at almost the same position; when ${d}_{p}>{D}_{h}$, the overlap area of the two images becomes too small. In this paper, we use 1/100 of the image resolution as the value of ${D}_{l}$ and 1/10 of the resolution as the value of ${D}_{h}$. When ${d}_{p}$ lies within the range

$${D}_{l}<{d}_{p}<{D}_{h}\hspace{2em}(6)$$

the two images meet the key image constraint $R\left({I}_{1},{I}_{2}\right)$:

where ${p}_{1i}$ is the $i$th PCP of the first image (${I}_{1}$), and ${p}_{2i}$ is that of the second image (${I}_{2}$). The result is presented in Figure 2c. This method estimates the overlap areas between images; it is not necessary to calculate the actual correlation between the two images when selecting key images. Moreover, neither the calculation of the PCPs nor the estimation of the distance between PCPs is time-consuming. Therefore, this method is suitable for quickly selecting key images from a UAV camera's video image sequence.
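The PCP compression and the key image test can be sketched as follows. The construction of the three PCPs from the centroid and the two scaled principal axes is an assumption about Equations (2)–(4), and in practice the input points would come from a SIFT detector rather than being supplied directly:

```python
import numpy as np

def pcps(points):
    """Compress a feature-point set into three principal component points.

    Assumed construction: the centroid plus the two principal axes of the
    point cloud, each scaled by its standard deviation along that axis.
    """
    pts = np.asarray(points, dtype=float)      # (n, 2) pixel coordinates
    centroid = pts.mean(axis=0)                # Equation (1): centroid
    centered = pts - centroid
    # PCA via SVD of the centered coordinate matrix
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scale = s / np.sqrt(len(pts))              # std. dev. along each axis
    p1 = centroid
    p2 = centroid + scale[0] * vt[0]
    p3 = centroid + scale[1] * vt[1]
    return np.stack([p1, p2, p3])

def is_key_pair(pcps1, pcps2, d_low, d_high):
    """Key image constraint R(I1, I2): D_l < d_p < D_h (Equations (5)-(6))."""
    d_p = np.mean(np.linalg.norm(pcps1 - pcps2, axis=1))
    return d_low < d_p < d_high
```

Because only three point pairs are compared per image pair, the test costs a few arithmetic operations instead of a full descriptor matching pass.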

#### 3.3. Image Queue SfM

#### 3.3.1. SfM Calculation for the Images in the Queue

The selected key images are placed into an image queue ${C}_{q}$, and the structure of all of the images in ${C}_{q}$ is calculated.

- Two images are selected from the queue as the initial image pair using the method proposed in [21]. The fundamental matrix of the two images is obtained by the random sample consensus (RANSAC) method [22], and the essential matrix between the two images is then calculated when the intrinsic matrix (obtained by the calibration method proposed in [23]) is known. The first two terms of the radial and tangential distortion parameters are also obtained and used for image rectification. After remapping the pixels onto new locations based on the distortion model, the image distortion caused by the lens is eliminated. Then, the positions and orientations of the images can be obtained by decomposing the essential matrix according to [24].
- According to the correspondence of the feature points in different images, the 3D coordinates of the feature points are obtained by triangulation (the feature points are denoted as ${P}_{i}\text{}\left(i=1,\dots ,t\right)$).
- The structure of the initial image pair is calculated, and one of the coordinate systems of the cameras taking the image pair is set as the global coordinate system. The image of the queue that has completed the structure calculation is placed into the set ${C}_{SFM}$ (${C}_{SFM}\subset {C}_{\mathrm{q}}$).
- The new image (${I}_{new}$) is placed into the set (${C}_{SFM}$), and the structural calculation is performed. The new image must meet the following two conditions. First, there should be at least one image in ${C}_{SFM}$ that has common feature points with ${I}_{new}$. Second, at least six of these common feature points must be in ${P}_{i}\left(i=1,\dots ,t\right)$ (in order to improve the stability of the algorithm, this study requires at least 15 common feature points). Finally, all of the parameters from the structure calculation are optimized by bundle adjustment.
- Repeat step 6 until the structure of all of the images inside the queue is calculated (${C}_{SFM}$ = ${C}_{\mathrm{q}}$).
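The triangulation step above can be sketched with a standard linear (DLT) two-view triangulation. The solver below is an illustrative assumption rather than the paper's exact implementation:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2 : 3x4 camera projection matrices.
    x1, x2 : matching pixel coordinates (u, v) in each image.
    Each observation contributes two linear constraints on the homogeneous
    point X; the solution is the null vector of the stacked system.
    """
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]          # homogeneous -> Euclidean
```

With noise-free correspondences the null vector recovers the point exactly; with noise, the smallest-singular-vector solution minimizes the algebraic error.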

#### 3.3.2. Updating the Image Queue

The 3D points reconstructed from the images remaining in the queue (${C}_{r}$) are marked as ${P}_{r}$, their projections onto the images in ${C}_{r}$ are marked as a set ${U}_{r}$, and the projection relationship is expressed as $P:{P}_{r}\to {U}_{r}$. Then, the pixels of the feature points (marked as ${U}_{k}$) of the newly added images ${C}_{k}$ are detected, and the pixels in ${U}_{k}$ and ${U}_{r}$ are matched. We obtain the correspondence $M:{U}_{rc}\leftrightarrow {U}_{kc}$ $({U}_{rc}\in {U}_{r},{U}_{kc}\in {U}_{k})$, where ${U}_{rc}$ and ${U}_{kc}$ are the image pixels of the same object points (marked as ${P}_{c}$) in different images from ${C}_{r}$ and ${C}_{k}$, respectively, expressed as $P:{P}_{c}\to {U}_{kc},{P}_{c}\to {U}_{rc}$, where ${P}_{c}$ denotes the control points. The projection matrices of the images in ${C}_{k}$ can be estimated from the projection relationship between ${P}_{c}$ and ${U}_{kc}$; then, the positions and orientations of the cameras can be calculated. In addition, ${P}_{c}$ is used in the later weighted bundle adjustment to ensure the continuity of the structure. Then, we repeat step 6 until ${C}_{SFM}={C}_{q}$. Finally, the structure of all of the images can be calculated by repeating the following two procedures alternately: calculating the SfM of the images in the queue and updating the image queue.
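The estimation of a projection matrix from the control-point correspondences ${P}_{c}\leftrightarrow {U}_{kc}$ can be sketched with a plain DLT resection. The function below is an illustrative assumption (no RANSAC, no coordinate normalization), not the paper's exact estimator:

```python
import numpy as np

def estimate_projection_matrix(X3d, x2d):
    """Estimate a 3x4 projection matrix from >= 6 3D-2D control points (DLT).

    X3d : (n, 3) control points P_c; x2d : (n, 2) their pixels U_kc.
    Each correspondence yields two linear equations in the 12 entries of P;
    the solution is the null vector of the stacked 2n x 12 system.
    """
    rows = []
    for (X, Y, Z), (u, v) in zip(X3d, x2d):
        p = np.array([X, Y, Z, 1.0])
        rows.append(np.concatenate([p, np.zeros(4), -u * p]))
        rows.append(np.concatenate([np.zeros(4), p, -v * p]))
    A = np.asarray(rows)
    _, _, vt = np.linalg.svd(A)
    return vt[-1].reshape(3, 4)
```

The recovered matrix is defined only up to scale, so correctness is best checked by reprojecting the control points rather than comparing matrix entries directly.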

#### 3.3.3. Weighted Bundle Adjustment

where ${P}_{i}$ is the 3D feature point, $\left(\begin{array}{c}{v}_{i}\\ {u}_{i}\end{array}\right)$ is the actual pixel coordinate of the feature point, and $\left(\begin{array}{c}{v}_{i}{}^{f}\\ {u}_{i}{}^{f}\end{array}\right)$ is the pixel coordinate calculated from the structural parameters. The number of control points is $k$. The bundle adjustment is a nonlinear least-squares problem: the structural parameters $\left(R,T,{P}_{i\left(i=1,\dots ,n\right)}\right)$ are optimized by adjusting their values to minimize the reprojection error ${e}_{\mathrm{project}}$.
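A minimal sketch of the weighted reprojection cost, assuming control-point residuals are simply up-weighted by a factor `w` (an illustrative value; the paper defines its own weighting):

```python
import numpy as np

def weighted_reprojection_error(observed, predicted, is_control, w=10.0):
    """Weighted bundle adjustment cost (sketch).

    observed, predicted : (n, 2) actual and reprojected pixel coordinates.
    is_control          : boolean mask marking the k control points, whose
                          squared residuals receive weight w instead of 1,
                          anchoring the new structure to the existing one.
    """
    sq = np.sum((observed - predicted) ** 2, axis=1)  # squared pixel errors
    weights = np.where(is_control, w, 1.0)
    return float(np.sum(weights * sq))
```

A nonlinear least-squares solver (e.g., Levenberg-Marquardt, as in Ceres [26]) would minimize this cost over the camera poses and 3D points.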

#### 3.3.4. MVS

## 4. Experiments

#### 4.1. Data Sets

#### 4.2. Precision Evaluation

The point cloud (PC) produced by each method is compared with the standard point cloud ${\mathrm{PC}}_{\mathrm{STL}}$, which is captured by structured-light scans (the RMS error of all ground-truth poses is within 0.15 mm) provided by roboimagedata [27]. The accuracy of the algorithm is determined by calculating the nearest-neighbor distance between the two point clouds [28]. First, the point clouds are registered by the iterative closest point (ICP) method. For the common part of PC and ${\mathrm{PC}}_{\mathrm{STL}}$, for each point ${p}_{1}$ in PC, the nearest point ${p}_{1}^{\prime}$ in ${\mathrm{PC}}_{\mathrm{STL}}$ is found, and the Euclidean distance between ${p}_{1}$ and ${p}_{1}^{\prime}$ is calculated. The distance point cloud is obtained after the distance calculation for each point and is marked with different colors. We compare the results of our method with those of openMVG [7], openMVS [16], and MicMac [29,30,31] (three open-source software packages). The main concern of openMVG is the SfM calculation, while the main concern of openMVS is dense reconstruction. MicMac is a free open-source photogrammetric suite that can be used in a variety of 3D reconstruction scenarios. All of them have achieved state-of-the-art results. The open-source software CloudCompare [32] is used for the test. The results are presented in Figures 7–12.
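The nearest-neighbor distance step can be sketched as follows. This brute-force version assumes the clouds are already registered by ICP; real tools such as CloudCompare use accelerated search structures (e.g., octrees or k-d trees):

```python
import numpy as np

def nearest_neighbor_distances(pc, pc_ref):
    """Distance from each point of pc to its nearest neighbor in pc_ref.

    pc, pc_ref : (n, 3) and (m, 3) arrays of 3D points.
    Returns an (n,) array: the per-point values of the distance point cloud.
    """
    # (n, m) matrix of pairwise Euclidean distances
    diff = pc[:, None, :] - pc_ref[None, :, :]
    d = np.linalg.norm(diff, axis=2)
    return d.min(axis=1)
```

Coloring each point of `pc` by its returned distance yields the distance point clouds shown in Figures 8 and 11.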

#### 4.3. Speed Evaluation

First, for feature matching, the time complexity of matching all $n$ images against each other is $O\left({n}^{2}\right)$. After using the method proposed in this study, the time complexity becomes $\frac{n}{k}O\left(m\times k\right)$ because the matching calculation occurs only for the images inside the image queue. Because $m$ and $k$ are fixed and their values are generally much smaller than $n$, the speed of the matching is greatly improved. Second, for the SfM calculations, most of the time is spent on bundle adjustment. Bundle adjustment is a nonlinear least-squares problem that optimizes the camera and structural parameters, and its calculation time grows with the number of parameters. The proposed method divides the global bundle adjustment, which optimizes a large number of parameters, into several local bundle adjustments so that the number of parameters remains small and the calculation speed of the algorithm improves greatly.
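The matching-cost argument can be illustrated with a simple pair count, assuming the queue of $m$ images is rebuilt roughly $n/k$ times over the sequence:

```python
def matching_cost(n, m, k):
    """Approximate numbers of image-pair matchings (illustrative counts).

    Exhaustive matching compares every pair of the n images: n(n-1)/2.
    Queue-based matching compares only within each queue of m images,
    and the queue is updated about n/k times: roughly (n/k) * m * k pairs.
    """
    exhaustive = n * (n - 1) // 2
    queue_based = (n // k) * m * k
    return exhaustive, queue_based
```

For the Village data set (145 images, m = 15, k = 6), the queue-based count is several times smaller than the exhaustive one, consistent with the speedups in Table 2.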

#### 4.4. Results

## 5. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## References

- Polok, L.; Ila, V.; Solony, M.; Smrz, P.; Zemcik, P. Incremental Block Cholesky Factorization for Nonlinear Least Squares in Robotics. Robot. Sci. Syst. **2013**, 46, 172–178.
- Kaess, M.; Johannsson, H.; Roberts, R.; Ila, V.; Leonard, J.J.; Dellaert, F. iSAM2: Incremental smoothing and mapping using the Bayes tree. Int. J. Robot. Res. **2012**, 31, 216–235.
- Liu, M.; Huang, S.; Dissanayake, G.; Wang, H. A convex optimization based approach for pose SLAM problems. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Vilamoura-Algarve, Portugal, 7–12 October 2012; pp. 1898–1903.
- Beardsley, P.A.; Torr, P.H.S.; Zisserman, A. 3D model acquisition from extended image sequences. In Proceedings of the European Conference on Computer Vision, Cambridge, UK, 14–18 April 1996; pp. 683–695.
- Mohr, R.; Veillon, F.; Quan, L. Relative 3-D reconstruction using multiple uncalibrated images. Int. J. Robot. Res. **1995**, 14, 619–632.
- Dellaert, F.; Seitz, S.M.; Thorpe, C.E.; Thrun, S. Structure from motion without correspondence. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Hilton Head Island, SC, USA, 15 June 2000; Volume 552, pp. 557–564.
- Moulon, P.; Monasse, P.; Marlet, R. Adaptive structure from motion with a contrario model estimation. In Proceedings of the Asian Conference on Computer Vision, Daejeon, Korea, 5–9 November 2012; pp. 257–270.
- Wu, C. Towards linear-time incremental structure from motion. In Proceedings of the International Conference on 3DTV-Conference, Aberdeen, UK, 29 June–1 July 2013; pp. 127–134.
- Gherardi, R.; Farenzena, M.; Fusiello, A. Improving the efficiency of hierarchical structure-and-motion. In Proceedings of the Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 1594–1600.
- Moulon, P.; Monasse, P.; Marlet, R. Global fusion of relative motions for robust, accurate and scalable structure from motion. In Proceedings of the IEEE International Conference on Computer Vision, Portland, OR, USA, 23–28 June 2013; pp. 3248–3255.
- Crandall, D.J.; Owens, A.; Snavely, N.; Huttenlocher, D.P. SfM with MRFs: Discrete-continuous optimization for large-scale structure from motion. IEEE Trans. Pattern Anal. Mach. Intell. **2013**, 35, 2841–2853.
- Sweeney, C.; Sattler, T.; Höllerer, T.; Turk, M. Optimizing the viewing graph for structure-from-motion. In Proceedings of the IEEE International Conference on Computer Vision, Los Alamitos, CA, USA, 7–13 December 2015; pp. 801–809.
- Snavely, N.; Simon, I.; Goesele, M.; Szeliski, R.; Seitz, S.M. Scene reconstruction and visualization from community photo collections. Proc. IEEE **2010**, 98, 1370–1390.
- Wu, C.; Agarwal, S.; Curless, B.; Seitz, S.M. Multicore bundle adjustment. In Proceedings of the Computer Vision and Pattern Recognition, Colorado Springs, CO, USA, 20–25 June 2011; pp. 3057–3064.
- Furukawa, Y.; Ponce, J. Accurate, dense, and robust multiview stereopsis. IEEE Trans. Pattern Anal. Mach. Intell. **2010**, 32, 1362–1376.
- Shen, S. Accurate multiple view 3D reconstruction using patch-based stereo for large-scale scenes. IEEE Trans. Image Process. **2013**, 22, 1901–1914.
- Li, J.; Li, E.; Chen, Y.; Xu, L. Bundled depth-map merging for multi-view stereo. In Proceedings of the Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 2769–2776.
- Schönberger, J.L.; Zheng, E.; Frahm, J.M.; Pollefeys, M. Pixelwise View Selection for Unstructured Multi-View Stereo; Springer International Publishing: New York, NY, USA, 2016.
- Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. **2004**, 60, 91–110.
- Moulon, P.; Monasse, P. Unordered feature tracking made fast and easy. In Proceedings of the European Conference on Visual Media Production, London, UK, 5–6 December 2012.
- Moisan, L.; Moulon, P.; Monasse, P. Automatic homographic registration of a pair of images, with a contrario elimination of outliers. Image Process. Line **2012**, 2, 329–352.
- Fischler, M.A.; Bolles, R.C. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Read. Comput. Vis. **1987**, 24, 726–740.
- Zhang, Z. A flexible new technique for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. **2000**, 22, 1330–1334.
- Hartley, R.; Zisserman, A. Multiple View Geometry in Computer Vision, 2nd ed.; Cambridge University Press: Cambridge, UK, 2003.
- Triggs, B.; Mclauchlan, P.F.; Hartley, R.I.; Fitzgibbon, A.W. Bundle adjustment—A modern synthesis. In Proceedings of the International Workshop on Vision Algorithms: Theory and Practice, Corfu, Greece, 21–22 September 1999; pp. 298–372.
- Ceres Solver. Available online: http://ceres-solver.org (accessed on 14 January 2018).
- Sølund, T.; Buch, A.G.; Krüger, N.; Aanæs, H. A large-scale 3D object recognition dataset. In Proceedings of the Fourth International Conference on 3D Vision, Stanford, CA, USA, 25–28 October 2016; pp. 73–82. Available online: http://roboimagedata.compute.dtu.dk (accessed on 14 January 2018).
- Jensen, R.; Dahl, A.; Vogiatzis, G.; Tola, E. Large scale multi-view stereopsis evaluation. In Proceedings of the Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 406–413.
- Pierrot Deseilligny, M.; Clery, I. Apero, an Open Source Bundle Adjustment Software for Automatic Calibration and Orientation of Set of Images. In Proceedings of the ISPRS—International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XXXVIII-5/W16, Trento, Italy, 2–4 March 2012; pp. 269–276.
- Galland, O.; Bertelsen, H.S.; Guldstrand, F.; Girod, L.; Johannessen, R.F.; Bjugger, F.; Burchardt, S.; Mair, K. Application of open-source photogrammetric software MicMac for monitoring surface deformation in laboratory models. J. Geophys. Res. Solid Earth **2016**, 121, 2852–2872.
- Rupnik, E.; Daakir, M.; Deseilligny, M.P. MicMac—A free, open-source solution for photogrammetry. Open Geosp. Data Softw. Stand. **2017**, 2, 14.
- Cloud Compare. Available online: http://www.cloudcompare.org (accessed on 14 January 2018).

**Figure 2.** Feature point compression. (**a**) Detecting the feature points of an image; (**b**) calculating the principal component points (PCPs) of the feature points; and (**c**) matching the PCPs.

**Figure 6.** Images for the experiment. (**a**) Garden; (**b**) Village; (**c**) Building; (**d**) Botanical Garden; (**e**) Factory land; (**f**) Academic building; (**g**) Pot; and (**h**) House.

**Figure 7.** Point cloud comparison. (**a**) Point cloud of our method (m = 15, k = 6); (**b**) Point cloud of openMVG + openMVS; (**c**) Point cloud of MicMac; (**d**) Standard point cloud.

**Figure 8.** (**a**) Distance point cloud between the proposed method's result and the standard point cloud; (**b**) Distance point cloud between openMVG + openMVS's result and the standard point cloud; (**c**) Distance point cloud between MicMac's result and the standard point cloud.

**Figure 9.** (**a**) Distance histogram of our result; (**b**) Distance histogram of openMVG + openMVS's result; (**c**) Distance histogram of MicMac's result.

**Figure 10.** Point cloud comparison. (**a**) Point cloud of our method (m = 15, k = 6); (**b**) Point cloud of openMVG + openMVS; (**c**) Point cloud of MicMac; (**d**) Standard point cloud.

**Figure 11.** (**a**) Distance point cloud between the proposed method's result and the standard point cloud; (**b**) Distance point cloud between openMVG + openMVS's result and the standard point cloud; (**c**) Distance point cloud between MicMac's result and the standard point cloud.

**Figure 12.** (**a**) Distance histogram of our result; (**b**) Distance histogram of openMVG + openMVS; (**c**) Distance histogram of MicMac.

**Figure 13.** Reconstruction result of a garden. (**a**) Part of the images used for reconstruction; (**b**) Structure calculation of image queue SfM (green points represent the positions of the camera); (**c**) Dense point cloud of the scene.

**Figure 14.** Reconstruction result of a village. (**a**) Part of the images used for reconstruction; (**b**) Structure calculation of image queue SfM (green points represent the positions of the camera); (**c**) Dense point cloud of the scene.

**Figure 15.** Reconstruction result of buildings. (**a**) Part of the images used for reconstruction; (**b**) Structure calculation of image queue SfM (green points represent the positions of the camera); (**c**) Dense point cloud of the scene.

**Figure 16.** Reconstruction result of botanical garden. (**a**) Part of the images used for reconstruction; (**b**) Structure calculation of image queue SfM (green points represent the positions of the camera); (**c**) Dense point cloud of the scene.

**Figure 17.** Reconstruction result of botanical garden. (**a**) Part of the images used for reconstruction; (**b**) Structure calculation of image queue SfM (green points represent the positions of the camera); (**c**) Dense point cloud of the scene.

**Figure 18.** Reconstruction result of botanical garden. (**a**) Part of the images used for reconstruction; (**b**) Structure calculation of image queue SfM (green points represent the positions of the camera); (**c**) Dense point cloud of the scene.

| Name | Images | Resolution | (m, k) | ${D}_{l}$ | ${D}_{h}$ |
|---|---|---|---|---|---|
| Garden | 126 | 1920 × 1080 | (15, 6), (20, 7), (40, 15) | 25 | 150 |
| Village | 145 | 1920 × 1080 | (15, 6), (20, 7), (40, 15) | 25 | 150 |
| Building | 149 | 1280 × 720 | (15, 6), (20, 7), (40, 15) | 20 | 150 |
| Botanical Garden | 42 | 1920 × 1080 | (15, 6), (20, 7), (40, 15) | 25 | 150 |
| Factory Land | 170 | 1280 × 720 | (15, 6), (20, 7), (40, 15) | 20 | 150 |
| Academic Building | 128 | 1920 × 1080 | (15, 6), (20, 7), (40, 15) | 25 | 150 |
| Pot | 49 | 1600 × 1200 | (8, 3), (10, 4), (15, 6) | 20 | 200 |
| House | 49 | 1600 × 1200 | (8, 3), (10, 4), (15, 6) | 20 | 200 |

| Name | Images | Resolution | Our Method Time (s) | | | OpenMVG Time (s) | MicMac Time (s) |
|---|---|---|---|---|---|---|---|
| | | | m = 15, k = 6 | m = 20, k = 7 | m = 40, k = 15 | | |
| Garden | 126 | 1920 × 1080 | 284.0 | 291.0 | 336.0 | 1140.0 | 3072.0 |
| Village | 145 | 1920 × 1080 | 169.0 | 209.0 | 319.0 | 857.0 | 2545.0 |
| Building | 149 | 1280 × 720 | 171.0 | 164.0 | 268.0 | 651.0 | 2198.0 |
| Botanical Garden | 42 | 1920 × 1080 | 77.0 | 82.0 | 99.0 | 93.0 | 243.0 |
| Factory Land | 170 | 1280 × 720 | 170.0 | 207.0 | 343.0 | 1019.0 | 3524.0 |
| Academic Building | 128 | 1920 × 1080 | 124.0 | 182.0 | 277.0 | 551.0 | 4597.0 |
| | | | m = 15, k = 6 | m = 10, k = 4 | m = 8, k = 3 | | |
| Pot | 49 | 1600 × 1200 | 35.0 | 39.0 | 47.0 | 56.0 | 351.0 |
| House | 49 | 1600 × 1200 | 59.0 | 53.0 | 54.0 | 74.0 | 467.0 |

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Qu, Y.; Huang, J.; Zhang, X.
Rapid 3D Reconstruction for Image Sequence Acquired from UAV Camera. *Sensors* **2018**, *18*, 225.
https://doi.org/10.3390/s18010225
