Urban Visual Localization of Block-Wise Monocular Images with Google Street Views
Abstract
1. Introduction
2. Related Work
2.1. Visual Localization with Perspective Images
2.2. Visual Localization with Panoramic Images
2.3. Template Matching
3. Methodology
3.1. Permanent Object Segmentation
3.2. GSV Correspondence Finding with Template Matching
3.3. Image-Wise and Block-Wise Similarity Computation
3.4. Pose Estimation of the Query Image
4. Experimental Datasets and Evaluation
4.1. Datasets
4.2. Experimental Results
5. Discussion
5.1. Significance of Permanent Objects
5.2. Size of the GSV Block
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Dataset | Coverage | Query Images (Count) | Query Images (Date) | GSV Images (Count) | GSV Images (Date) | Ratio (Query:GSV) |
---|---|---|---|---|---|---|
UCF | 0.8 km² | 300 | 2012–2014 | 1291 | 2018–2019 | 1:4.3 |
MSV | 0.6 km² | 436 | 2014–2020 | 3411 | 2018–2019 | 1:7.8 |
PUVBN | 1.0 km² | 714 | 2022 | 1820 | 2018–2019 | 1:2.5 |
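
The Query:GSV ratio column can be reproduced from the two count columns. The short sketch below assumes the ratio is simply the GSV image count divided by the query image count, rounded to one decimal place; the dictionary and function names are illustrative and do not come from the paper.

```python
# Counts copied from the dataset table above: (query images, GSV images).
# The names `datasets` and `gsv_per_query` are illustrative assumptions.
datasets = {
    "UCF": (300, 1291),
    "MSV": (436, 3411),
    "PUVBN": (714, 1820),
}


def gsv_per_query(query_count: int, gsv_count: int) -> float:
    """Return the number of GSV images available per query image."""
    return gsv_count / query_count


ratios = {name: gsv_per_query(q, g) for name, (q, g) in datasets.items()}

for name, ratio in ratios.items():
    # Formatted in the same "1:x" style as the table.
    print(f"{name}: 1:{ratio:.1f}")
```

Under that assumption the computed values agree with the tabulated ratios for all three datasets.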
Similarity | UCF | MSV | PUVBN |
---|---|---|---|
Image-wise | 4.51 ± 12.75 m | 2.79 ± 9.33 m | 2.54 ± 7.97 m |
Block-wise | 2.12 ± 9.01 m | 1.35 ± 7.29 m | 1.09 ± 5.77 m |
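
To make the gap between the two rows easier to read, the sketch below computes the relative reduction from the image-wise to the block-wise value for each dataset. It assumes that the first number in each cell is a mean error in metres and the second a spread; this extract does not label the statistic, so that reading is an assumption, as are the variable and function names.

```python
# First value of each cell in the table above, in metres:
# (image-wise, block-wise). Treating them as mean errors is an assumption.
errors_m = {
    "UCF": (4.51, 2.12),
    "MSV": (2.79, 1.35),
    "PUVBN": (2.54, 1.09),
}


def relative_reduction(image_wise: float, block_wise: float) -> float:
    """Fraction by which the block-wise value is lower than the image-wise one."""
    return (image_wise - block_wise) / image_wise


reductions = {
    name: relative_reduction(img, blk) for name, (img, blk) in errors_m.items()
}

for name, reduction in reductions.items():
    print(f"{name}: {reduction:.1%} lower with block-wise similarity")
```

On these numbers the block-wise value is roughly half the image-wise value for every dataset, with the largest relative reduction on PUVBN.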
Dataset | Road | Sidewalk | Building | Wall | Fence | Pole | Traffic Light | Traffic Sign | Total |
---|---|---|---|---|---|---|---|---|---|
UCF | −0.063 | −0.0186 | 0.184 | −0.058 | −0.103 | 0.026 | −0.021 | 0.015 | 0.153 |
MSV | 0.122 | −0.002 | 0.154 | 0.172 | 0.116 | −0.187 | −0.016 | −0.052 | 0.217 |
PUVBN | 0.142 | 0.072 | 0.234 | 0.064 | 0.009 | 0.132 | 0.014 | 0.080 | 0.281 |
GSV Block Size | Horizontal | Vertical | Orientation |
---|---|---|---|
3 | 1.09 ± 5.77 m | 1.24 ± 3.22 m | 4.88 ± 3.21° |
6 | 1.18 ± 3.91 m | 1.33 ± 1.55 m | 5.02 ± 3.19° |
9 | 1.21 ± 2.60 m | 1.17 ± 1.20 m | 4.18 ± 2.67° |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, Z.; Li, S.; Anderson, J.; Shan, J. Urban Visual Localization of Block-Wise Monocular Images with Google Street Views. Remote Sens. 2024, 16, 801. https://doi.org/10.3390/rs16050801