2D3D-DescNet: Jointly Learning 2D and 3D Local Feature Descriptors for Cross-Dimensional Matching
Abstract
1. Introduction
- A novel end-to-end network, 2D3D-DescNet, is proposed to learn 2D and 3D local feature descriptors that work in both the 2D and 3D domains.
- The proposed 2D-3D consistent loss balances the 2D and 3D local feature descriptors across the two domains and bridges the domain gap between 2D images and 3D point clouds.
- Compared with existing approaches that jointly learn 2D and 3D feature descriptors, the 2D and 3D local feature descriptors learned by 2D3D-DescNet achieve state-of-the-art performance on 2D-3D retrieval and 2D-3D matching.
2. Related Works
2.1. 2D and 3D Feature Descriptor
2.2. Unified 2D and 3D Feature Descriptor
3. Method
3.1. 2D3D-DescNet
3.1.1. Feature Extractor
3.1.2. Cross-Domain Image-Wise Feature Map Extractor
3.1.3. Metric Network
3.2. 2D-3D Consistent Loss Function
3.3. Chamfer Loss
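As a reminder of what a Chamfer-type loss measures, the sketch below computes the standard symmetric Chamfer distance between two small point sets in plain Python. It is only an illustration under stated assumptions: the function name `chamfer_distance`, the use of squared Euclidean distances, and the averaging over points are choices made here, and the exact form used in 2D3D-DescNet may differ.

```python
from typing import Sequence

Point = Sequence[float]


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer distance between point sets a and b.

    Assumption: squared Euclidean distances, averaged per set.
    Other variants (unsquared distances, sums instead of means) exist.
    """
    if not a or not b:
        raise ValueError("both point sets must be non-empty")

    def sq_dist(p: Point, q: Point) -> float:
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def one_way(src: Sequence[Point], dst: Sequence[Point]) -> float:
        # For every source point, take the distance to its nearest
        # neighbour in the destination set, then average.
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


if __name__ == "__main__":
    cloud_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    cloud_b = [(0.0, 0.0, 0.0)]
    print(chamfer_distance(cloud_a, cloud_b))
```

The brute-force nearest-neighbour search is quadratic in the number of points, which is acceptable for small local point cloud volumes but would normally be replaced by a batched tensor operation in training code.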
3.3.1. Hard Triplet Margin Loss
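A hard triplet margin loss can be sketched for a batch of matching 2D and 3D descriptors as follows. This is a minimal illustration in the hardest-in-batch style of the local descriptor learning loss cited in the references (Mishchuk et al.), not the authors' implementation: the function names, the margin value of 1.0, and the plain Euclidean distance are assumptions made for this example.

```python
import math
from typing import Sequence

Descriptor = Sequence[float]


def l2(u: Descriptor, v: Descriptor) -> float:
    """Euclidean distance between two descriptors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def hard_triplet_margin_loss(
    desc_2d: Sequence[Descriptor],
    desc_3d: Sequence[Descriptor],
    margin: float = 1.0,  # assumed value, not taken from the paper
) -> float:
    """Mean hinge loss with the hardest in-batch negative.

    desc_2d[i] and desc_3d[i] are assumed to describe the same keypoint.
    """
    n = len(desc_2d)
    if n != len(desc_3d) or n < 2:
        raise ValueError("need two equally sized batches with at least 2 pairs")

    # dist[i][j] = distance between 2D descriptor i and 3D descriptor j
    dist = [[l2(p, v) for v in desc_3d] for p in desc_2d]

    total = 0.0
    for i in range(n):
        positive = dist[i][i]
        # Hardest negative: closest non-matching descriptor in either direction.
        hardest_negative = min(
            min(dist[i][j], dist[j][i]) for j in range(n) if j != i
        )
        total += max(0.0, margin + positive - hardest_negative)
    return total / n


if __name__ == "__main__":
    patches = [(1.0, 0.0), (0.0, 1.0)]
    volumes = [(1.0, 0.0), (0.0, 1.0)]
    print(hard_triplet_margin_loss(patches, volumes, margin=2.0))
```

Mining the hardest negative inside each batch avoids a separate negative sampling step, at the cost of making the loss sensitive to batch composition.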
3.3.2. Adversarial Loss
3.3.3. Cross-Entropy Loss
3.3.4. Total Loss
3.4. Training Strategy
4. Experiments
4.1. Dataset
4.2. Similarity of 2D and 3D Feature Descriptors
4.2.1. 2D-3D Retrieval
4.2.2. Histogram Visualization of 2D & 3D Feature Descriptors
4.3. 2D-3D Matching
4.4. 3D Point Cloud Registration
4.5. Outdoor 2D-3D Retrieval
4.6. Ablation Study
4.6.1. Dimensions of 2D and 3D Feature Descriptor
4.6.2. Cross-Domain Image-Wise Feature Map
4.6.3. Metric Network and Adversarial Loss
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Liu, W.; Wang, C.; Chen, S.; Bian, X.; Lai, B.; Shen, X.; Cheng, M.; Lai, S.H.; Weng, D.; Li, J. Y-Net: Learning Domain Robust Feature Representation for Ground Camera Image and Large-scale Image-based Point Cloud Registration. Inf. Sci. 2021, 581, 655–677.
2. Nadeem, U.; Bennamoun, M.; Togneri, R.; Sohel, F.; Rekavandi, A.M.; Boussaid, F. Cross domain 2D-3D descriptor matching for unconstrained 6-DOF pose estimation. Pattern Recognit. 2023, 142, 109655.
3. Shi, C.; Chen, X.; Lu, H.; Deng, W.; Xiao, J.; Dai, B. RDMNet: Reliable Dense Matching Based Point Cloud Registration for Autonomous Driving. IEEE Trans. Intell. Transp. Syst. 2023, 24, 11372–11383.
4. Chen, L.; Rottensteiner, F.; Heipke, C. Feature detection and description for image matching: From hand-crafted design to deep learning. Geo-Spat. Inf. Sci. 2021, 24, 58–74.
5. Lowe, D.G. Distinctive Image Features from Scale-invariant Keypoints. Int. J. Comput. Vis. (IJCV) 2004, 60, 91–110.
6. Bay, H.; Tuytelaars, T.; Van Gool, L. Surf: Speeded up Robust Features. In Proceedings of the European Conference on Computer Vision (ECCV), Graz, Austria, 7–13 May 2006; pp. 404–417.
7. Rusu, R.B.; Blodow, N.; Beetz, M. Fast Point Feature Histograms (FPFH) for 3D Registration. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Kobe, Japan, 12–17 May 2009; pp. 3212–3217.
8. Tombari, F.; Salti, S.; Di Stefano, L. Unique Signatures of Histograms for Local Surface Description. In Proceedings of the European Conference on Computer Vision (ECCV), Heraklion, Greece, 5–11 September 2010; pp. 356–369.
9. Guo, Y.; Sohel, F.; Bennamoun, M.; Lu, M.; Wan, J. Rotational Projection Statistics for 3D Local Surface Description and Object Recognition. Int. J. Comput. Vis. (IJCV) 2013, 105, 63–86.
10. Dhal, P.; Azad, C. A Comprehensive Survey on Feature Selection in the Various Fields of Machine Learning. Appl. Intell. 2022, 52, 4543–4581.
11. Bello, S.A.; Yu, S.; Wang, C.; Adam, J.M.; Li, J. Deep Learning on 3D Point Clouds. Remote Sens. 2020, 12, 1729.
12. Dubey, S.R. A Decade Survey of Content Based Image Retrieval Using Deep Learning. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 2687–2704.
13. Simo-Serra, E.; Trulls, E.; Ferraz, L.; Kokkinos, I.; Fua, P.; Moreno-Noguer, F. Discriminative Learning of Deep Convolutional Feature Point Descriptors. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 118–126.
14. Tian, Y.; Fan, B.; Wu, F. L2-net: Deep Learning of Discriminative Patch Descriptor in Euclidean Space. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 661–669.
15. Tian, Y.; Yu, X.; Fan, B.; Wu, F.; Heijnen, H.; Balntas, V. Sosnet: Second Order Similarity Regularization for Local Descriptor Learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 11016–11025.
16. Tyszkiewicz, M.; Fua, P.; Trulls, E. DISK: Learning Local Features with Policy Gradient. Adv. Neural Inf. Process. Syst. (NeurIPS) 2020, 33, 14254–14265.
17. Zhang, J.; Jiao, L.; Ma, W.; Liu, F.; Liu, X.; Li, L.; Zhu, H. RDLNet: A Regularized Descriptor Learning Network. IEEE Trans. Neural Networks Learn. Syst. 2021, 34, 5669–5681.
18. Lindenberger, P.; Sarlin, P.E.; Pollefeys, M. Lightglue: Local Feature Matching at Light Speed. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Paris, France, 2–3 October 2023; pp. 17627–17638.
19. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. Pointnet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. Adv. Neural Inf. Process. Syst. (NeurIPS) 2017, 30, 5099–5108.
20. Li, Y.; Bu, R.; Sun, M.; Wu, W.; Di, X.; Chen, B. Pointcnn: Convolution on X-transformed Points. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Montreal, QC, Canada, 3–8 December 2018; pp. 820–830.
21. Deng, H.; Birdal, T.; Ilic, S. Ppfnet: Global Context Aware Local Features for Robust 3D Point Matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 195–205.
22. Bai, X.; Luo, Z.; Zhou, L.; Fu, H.; Quan, L.; Tai, C.L. D3feat: Joint Learning of Dense Detection and Description of 3D Local Features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 6359–6367.
23. Ao, S.; Hu, Q.; Yang, B.; Markham, A.; Guo, Y. Spinnet: Learning a General Surface Descriptor for 3D Point Cloud Registration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 11753–11762.
24. Qian, G.; Li, Y.; Peng, H.; Mai, J.; Hammoud, H.; Elhoseiny, M.; Ghanem, B. Pointnext: Revisiting Pointnet++ with Improved Training and Scaling Strategies. Adv. Neural Inf. Process. Syst. (NeurIPS) 2022, 35, 23192–23204.
25. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Nets. Adv. Neural Inf. Process. Syst. (NeurIPS) 2014, 27, 2672–2680.
26. Ojala, T.; Pietikainen, M.; Maenpaa, T. Multiresolution Gray-scale and Rotation Invariant Texture Classification with Local Binary Patterns. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 2002, 24, 971–987.
27. Chen, J.; Kellokumpu, V.; Zhao, G.; Pietikäinen, M. RLBP: Robust Local Binary Pattern. In Proceedings of the British Machine Vision Conference (BMVC), Bristol, UK, 9–13 September 2013.
28. Calonder, M.; Lepetit, V.; Strecha, C.; Fua, P. Brief: Binary Robust Independent Elementary Features. In Proceedings of the European Conference on Computer Vision (ECCV), Heraklion, Greece, 5–11 September 2010; pp. 778–792.
29. Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An Efficient Alternative to SIFT or SURF. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Barcelona, Spain, 6–13 November 2011; pp. 2564–2571.
30. Wang, Z.; Fan, B.; Wu, F. Local Intensity Order Pattern for Feature Description. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Barcelona, Spain, 6–13 November 2011; pp. 603–610.
31. Wang, Z.; Fan, B.; Wang, G.; Wu, F. Exploring Local and Overall Ordinal Information for Robust Feature Description. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 2015, 38, 2198–2211.
32. Guo, Y.; Bennamoun, M.; Sohel, F.; Lu, M.; Wan, J.; Kwok, N.M. A Comprehensive Performance Evaluation of 3D Local Feature Descriptors. Int. J. Comput. Vis. (IJCV) 2016, 116, 66–89.
33. Ma, J.; Jiang, X.; Fan, A.; Jiang, J.; Yan, J. Image Matching from Handcrafted to Deep Features: A Survey. Int. J. Comput. Vis. (IJCV) 2020, 129, 23–79.
34. Xia, Y.; Xu, Y.; Li, S.; Wang, R.; Du, J.; Cremers, D.; Stilla, U. SOE-Net: A Self-attention and Orientation Encoding Network for Point Cloud Based Place Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 11348–11357.
35. Xia, Y.; Gladkova, M.; Wang, R.; Li, Q.; Stilla, U.; Henriques, J.F.; Cremers, D. CASSPR: Cross Attention Single Scan Place Recognition. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Paris, France, 2–3 October 2023; pp. 8461–8472.
36. Xia, Y.; Shi, L.; Ding, Z.; Henriques, J.F.; Cremers, D. Text2Loc: 3D Point Cloud Localization from Natural Language. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024.
37. Georgiou, T.; Liu, Y.; Chen, W.; Lew, M. A Survey of Traditional and Deep Learning-based Feature Descriptors for High Dimensional Data in Computer Vision. Int. J. Multimed. Inf. Retr. 2020, 9, 135–170.
38. Jiang, X.; Ma, J.; Xiao, G.; Shao, Z.; Guo, X. A Review of Multimodal Image Matching: Methods and Applications. Inf. Fusion 2021, 73, 22–71.
39. Han, X.F.; Feng, Z.A.; Sun, S.J.; Xiao, G.Q. 3D Point Cloud Descriptors: State-of-The-Art. Artif. Intell. Rev. 2023, 56, 12033–12083.
40. Feng, M.; Hu, S.; Ang, M.H.; Lee, G.H. 2D3D-Matchnet: Learning to Match Keypoints Across 2D Image and 3D Point Cloud. In Proceedings of the International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 4790–4796.
41. Liu, W.; Lai, B.; Wang, C.; Bian, X.; Yang, W.; Xia, Y.; Lin, X.; Lai, S.H.; Weng, D.; Li, J. Learning to Match 2D Images and 3D LiDAR Point Clouds for Outdoor Augmented Reality. In Proceedings of the IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), Atlanta, GA, USA, 22–26 March 2020; pp. 655–656.
42. Liu, W.; Shen, X.; Wang, C.; Zhang, Z.; Wen, C.; Li, J. H-Net: Neural Network for Cross-domain Image Patch Matching. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Stockholm, Sweden, 13–19 July 2018; pp. 856–863.
43. Pham, Q.H.; Uy, M.A.; Hua, B.S.; Nguyen, D.T.; Roig, G.; Yeung, S.K. LCD: Learned Cross-Domain Descriptors for 2D-3D Matching. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), New York, NY, USA, 7–12 February 2020; pp. 11856–11864.
44. Liu, W.; Lai, B.; Wang, C.; Bian, X.; Wen, C.; Cheng, M.; Zang, Y.; Xia, Y.; Li, J. Matching 2D Image Patches and 3D Point Cloud Volumes by Learning Local Cross-domain Feature Descriptors. In Proceedings of the IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), Lisbon, Portugal, 27 March–1 April 2021; pp. 516–517.
45. Lai, B.; Liu, W.; Wang, C.; Bian, X.; Su, Y.; Lin, X.; Yuan, Z.; Shen, S.; Cheng, M. Learning Cross-Domain Descriptors for 2D-3D Matching with Hard Triplet Loss and Spatial Transformer Network. In Proceedings of the Image and Graphics: 11th International Conference (ICIG), Haikou, China, 6–8 August 2021; pp. 15–27.
46. Lai, B.; Liu, W.; Wang, C.; Fan, X.; Lin, Y.; Bian, X.; Wu, S.; Cheng, M.; Li, J. 2D3D-MVPNet: Learning Cross-domain Feature Descriptors for 2D-3D Matching Based on Multi-view Projections of Point Clouds. Appl. Intell. 2022, 52, 14178–14193.
47. Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. Pointnet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 652–660.
48. Mishchuk, A.; Mishkin, D.; Radenovic, F.; Matas, J. Working Hard to Know Your Neighbor’s Margins: Local Descriptor Learning Loss. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017; pp. 4826–4837.
49. Zeng, A.; Song, S.; Nießner, M.; Fisher, M.; Xiao, J.; Funkhouser, T. 3Dmatch: Learning Local Geometric Descriptors from RGB-D Reconstructions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1802–1811.
50. Wu, Q.; Shen, Y.; Jiang, H.; Mei, G.; Ding, Y.; Luo, L.; Xie, J.; Yang, J. Graph Matching Optimization Network for Point Cloud Registration. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 1–5 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 5320–5325.
51. Tamata, K.; Mashita, T. Feature Description with Feature Point Registration Error Using Local and Global Point Cloud Encoders. IEICE Trans. Inf. Syst. 2022, 105, 134–140.
52. Bai, X.; Luo, Z.; Zhou, L.; Chen, H.; Li, L.; Hu, Z.; Fu, H.; Tai, C.L. Pointdsc: Robust point cloud registration using deep spatial consistency. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 15859–15869.
53. Ren, Y.; Luo, W.; Tian, X.; Shi, Q. Extract descriptors for point cloud registration by graph clustering attention network. Electronics 2022, 11, 686.
54. Choi, S.; Zhou, Q.Y.; Koltun, V. Robust reconstruction of indoor scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 5556–5565.
55. Zhou, Q.Y.; Park, J.; Koltun, V. Fast global registration. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 766–782.
56. Gojcic, Z.; Zhou, C.; Wegner, J.D.; Wieser, A. The perfect match: 3d point cloud matching with smoothed densities. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5545–5554.
Method | TOP1 | TOP5 |
---|---|---|
2D3D-MatchNet [40] | 0.2097 | 0.4318 |
Siam2D3D-Net [41] | 0.2123 | 0.4567 |
2D3D-GAN-Net [44] | 0.5842 | 0.8811 |
LCD [43] | 0.7174 | 0.9412 |
HAS-Net [45] | 0.7565 | 0.9623 |
2D3D-MVPNet [46] | 0.8011 | 0.9482 |
2D3D-DescNet (Ours) | 0.9271 | 0.9916 |
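To make the TOP1 and TOP5 columns concrete, the sketch below shows one common way a top-k retrieval accuracy can be computed from paired descriptors. It is an assumed protocol for illustration only: the function name `top_k_accuracy`, the convention that query `i` matches gallery entry `i`, and the use of squared Euclidean distance for ranking are choices made here and are not taken from the paper.

```python
from typing import Sequence

Descriptor = Sequence[float]


def sq_dist(u: Descriptor, v: Descriptor) -> float:
    """Squared Euclidean distance (ranking is the same as for L2)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def top_k_accuracy(
    queries: Sequence[Descriptor],
    gallery: Sequence[Descriptor],
    k: int,
) -> float:
    """Fraction of queries whose true match is among the k nearest gallery items.

    Assumption: queries[i] (e.g. a 2D patch descriptor) corresponds to
    gallery[i] (e.g. a 3D volume descriptor).
    """
    if len(queries) != len(gallery) or not queries:
        raise ValueError("queries and gallery must be non-empty and paired")

    hits = 0
    for i, q in enumerate(queries):
        # Rank all gallery indices by distance to the query descriptor.
        ranked = sorted(range(len(gallery)), key=lambda j: sq_dist(q, gallery[j]))
        if i in ranked[:k]:
            hits += 1
    return hits / len(queries)


if __name__ == "__main__":
    desc_2d = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
    desc_3d = [(0.1, 0.0), (3.0, 3.0), (5.0, 5.1)]
    print(top_k_accuracy(desc_2d, desc_3d, k=1))
    print(top_k_accuracy(desc_2d, desc_3d, k=2))
```

In this toy example the second query is closer to the first gallery entry than to its own match, so it counts as a miss at k = 1 and as a hit at k = 2.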
Method | FPR95 | Precision |
---|---|---|
2D3D-DescNet | 0.1891 | 99.605 |
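FPR95 is commonly read as the false positive rate at the distance threshold where 95% of the true matching pairs are accepted. The sketch below implements that reading in plain Python; the function name `fpr_at_recall`, the label convention (1 for a matching pair, 0 for a non-matching pair), and the rule for picking the threshold are assumptions for illustration, and the evaluation code behind the table may differ in detail.

```python
import math
from typing import Sequence


def fpr_at_recall(
    distances: Sequence[float],
    labels: Sequence[int],
    recall: float = 0.95,
) -> float:
    """False positive rate at the threshold that reaches the given recall.

    distances[i] is the descriptor distance of pair i;
    labels[i] is 1 for a matching pair and 0 for a non-matching pair.
    """
    positives = sorted(d for d, y in zip(distances, labels) if y == 1)
    negatives = [d for d, y in zip(distances, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one matching and one non-matching pair")

    # Smallest threshold that accepts at least `recall` of the matching pairs.
    needed = max(1, math.ceil(recall * len(positives)))
    threshold = positives[needed - 1]

    # Non-matching pairs at or below the threshold are false positives.
    false_positives = sum(1 for d in negatives if d <= threshold)
    return false_positives / len(negatives)


if __name__ == "__main__":
    dists = [0.1, 0.2, 0.3, 0.4, 0.35, 0.5, 0.6, 0.05]
    labs = [1, 1, 1, 1, 0, 0, 0, 0]
    print(fpr_at_recall(dists, labs))
```

Lower values are better: a small FPR95 means few non-matching pairs fall below the distance threshold needed to recover 95% of the true matches.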
Scene | 2D3D-DescNet | CZK [54] | FGR [55] | 3DMatch [49] | 3DSmoothNet [56] | PointNetAE [43] | LCD [43] |
---|---|---|---|---|---|---|---|
Kitchen | 0.781 | 0.499 | 0.305 | 0.853 | 0.871 | 0.766 | 0.891 |
Home1 | 0.613 | 0.632 | 0.434 | 0.783 | 0.896 | 0.726 | 0.783 |
Home2 | 0.528 | 0.403 | 0.283 | 0.610 | 0.723 | 0.579 | 0.629 |
Hotel1 | 0.714 | 0.643 | 0.401 | 0.786 | 0.791 | 0.786 | 0.808 |
Hotel2 | 0.589 | 0.667 | 0.426 | 0.590 | 0.846 | 0.680 | 0.769 |
Hotel3 | 0.423 | 0.577 | 0.385 | 0.577 | 0.731 | 0.731 | 0.654 |
Study | 0.572 | 0.547 | 0.291 | 0.633 | 0.556 | 0.641 | 0.662 |
MIT Lab | 0.444 | 0.378 | 0.200 | 0.511 | 0.467 | 0.511 | 0.600 |
Average | 0.583 | 0.543 | 0.342 | 0.688 | 0.735 | 0.677 | 0.725 |
Method | TOP1 | TOP5 |
---|---|---|
2D3D-MatchNet [40] | 0.0081 | 0.0449 |
Siam2D3D-Net [41] | 0.0084 | 0.0365 |
2D3D-GAN-Net [44] | 0.0101 | 0.0489 |
LCD [43] | 0.0343 | 0.0698 |
HAS-Net [45] | 0.1782 | 0.2393 |
2D3D-MVPNet [46] | 0.2588 | 0.3615 |
2D3D-DescNet (Ours) | 0.6971 | 0.7563 |
Dimension | TOP1 | TOP5 | FPR95 | Precision |
---|---|---|---|---|
64 | 0.8813 | 0.9844 | 0.2206 | 99.605 |
128 | 0.9271 | 0.9916 | 0.1891 | 99.605 |
256 | 0.9221 | 0.9922 | 0.2153 | 99.610 |
Method | TOP1 | TOP5 | FPR95 | Precision |
---|---|---|---|---|
2D3D-DescNet | 0.9271 | 0.9916 | 0.1891 | 99.605 |
2D3D-DescNet w/o image-wise feature map | 0.9169 | 0.9900 | 0.1839 | 99.660 |
2D3D-DescNet w/o metric network | 0.8873 | 0.9512 | 0.1855 | 99.391 |
2D3D-DescNet w/o adversarial loss | 0.8452 | 0.9374 | 0.1872 | 98.824 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Chen, S.; Su, Y.; Lai, B.; Cai, L.; Hong, C.; Li, L.; Qiu, X.; Jia, H.; Liu, W. 2D3D-DescNet: Jointly Learning 2D and 3D Local Feature Descriptors for Cross-Dimensional Matching. Remote Sens. 2024, 16, 2493. https://doi.org/10.3390/rs16132493