HFF-Net: An Efficient Hierarchical Feature Fusion Network for High-Quality Depth Completion
Abstract
1. Introduction
2. Related Work
2.1. Image-Based Depth Prediction
2.1.1. Monocular Depth Estimation
2.1.2. Binocular/Multi-View Stereo Matching
2.2. Learning-Based Depth Completion
3. Methodology
3.1. Generating Multi-Scale Feature Maps
3.2. Hierarchical Depth Completion Architecture
3.3. MLSPN-Based Depth Map Refinement
3.4. Training Loss
4. Experiments and Analysis
4.1. Implementation Details
4.2. Quantitative Evaluation
4.3. Discussion
4.3.1. Ablation Study
4.3.2. Comparison with Different-Level Sparsity Measurements
4.3.3. Zero-Shot Generalization Ability
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Wang, Y.; Li, B.; Zhang, G.; Liu, Q.; Gao, T.; Dai, Y. LRRU: Long-short Range Recurrent Updating Networks for Depth Completion. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 9422–9432. [Google Scholar]
- Chen, Z.; Li, W.; Cui, Z.; Zhang, Y. Surface depth estimation from multi-view stereo satellite images with distribution contrast network. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 17837–17845. [Google Scholar] [CrossRef]
- Hu, P.; Yang, B.; Dong, Z.; Yuan, P.; Huang, R.; Fan, H.; Sun, X. Towards reconstructing 3D buildings from ALS data based on gestalt laws. Remote Sens. 2018, 10, 1127. [Google Scholar] [CrossRef]
- Han, X.; Liu, C.; Zhou, Y.; Tan, K.; Dong, Z.; Yang, B. WHU-Urban3D: An urban scene LiDAR point cloud dataset for semantic instance segmentation. ISPRS J. Photogramm. Remote Sens. 2024, 209, 500–513. [Google Scholar] [CrossRef]
- Chen, C.; Jin, A.; Wang, Z.; Zheng, Y.; Yang, B.; Zhou, J.; Xu, Y.; Tu, Z. SGSR-Net: Structure Semantics Guided LiDAR Super-Resolution Network for Indoor LiDAR SLAM. IEEE Trans. Multimed. 2023, 26, 1842–1854. [Google Scholar] [CrossRef]
- Yan, Z.; Wang, K.; Li, X.; Zhang, Z.; Li, G.; Li, J.; Yang, J. Learning complementary correlations for depth super-resolution with incomplete data in real world. IEEE Trans. Neural Netw. Learn. Syst. 2022, 35, 5616–5626. [Google Scholar] [CrossRef] [PubMed]
- Thiel, K.; Wehr, A. Performance capabilities of laser scanners–an overview and measurement principle analysis. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2004, 36, 14–18. [Google Scholar]
- Uhrig, J.; Schneider, N.; Schneider, L.; Franke, U.; Brox, T.; Geiger, A. Sparsity invariant cnns. In Proceedings of the 2017 International Conference on 3D Vision (3DV), Qingdao, China, 10–12 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 11–20. [Google Scholar]
- Yin, W.; Liu, Y.; Shen, C. Virtual normal: Enforcing geometric constraints for accurate and robust depth prediction. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 7282–7295. [Google Scholar] [CrossRef]
- Yin, W.; Zhang, C.; Chen, H.; Cai, Z.; Yu, G.; Wang, K.; Chen, X.; Shen, C. Metric3d: Towards zero-shot metric 3d prediction from a single image. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 9043–9053. [Google Scholar]
- Yao, Y.; Luo, Z.; Li, S.; Fang, T.; Quan, L. Mvsnet: Depth inference for unstructured multi-view stereo. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 767–783. [Google Scholar]
- Chang, J.; Chen, Y. Pyramid Stereo Matching Network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 5410–5418. [Google Scholar]
- Jiang, L.; Wang, F.; Zhang, W.; Li, P.; You, H.; Xiang, Y. Rethinking the Key Factors for the Generalization of Remote Sensing Stereo Matching Networks. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 18, 4936–4948. [Google Scholar] [CrossRef]
- Zuo, Y.; Deng, J. OGNI-DC: Robust Depth Completion with Optimization-Guided Neural Iterations. arXiv 2024, arXiv:2406.11711. [Google Scholar]
- Min, D.; Lu, J.; Do, M.N. Depth video enhancement based on weighted mode filtering. IEEE Trans. Image Process. 2011, 21, 1176–1190. [Google Scholar] [CrossRef]
- Zhang, Y.; Feng, Y.; Liu, X.; Zhai, D.; Ji, X.; Wang, H.; Dai, Q. Color-guided depth image recovery with adaptive data fidelity and transferred graph Laplacian regularization. IEEE Trans. Circuits Syst. Video Technol. 2019, 30, 320–333. [Google Scholar] [CrossRef]
- Zhang, Y.; Guo, X.; Poggi, M.; Zhu, Z.; Huang, G.; Mattoccia, S. Completionformer: Depth completion with convolutions and vision transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 18527–18536. [Google Scholar]
- Long, C.; Zhang, W.; Chen, Z.; Wang, H.; Liu, Y.; Tong, P.; Cao, Z.; Dong, Z.; Yang, B. SparseDC: Depth Completion from sparse and non-uniform inputs. Inf. Fusion 2024, 110, 102470. [Google Scholar] [CrossRef]
- Silberman, N.; Hoiem, D.; Kohli, P.; Fergus, R. Indoor segmentation and support inference from rgbd images. In Proceedings of the Computer Vision–ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, 7–13 October 2012; Proceedings, Part V 12. Springer: Berlin/Heidelberg, Germany, 2012; pp. 746–760. [Google Scholar]
- Cheng, X.; Wang, P.; Yang, R. Learning depth with convolutional spatial propagation network. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 42, 2361–2379. [Google Scholar] [CrossRef] [PubMed]
- Park, J.; Joo, K.; Hu, Z.; Liu, C.-K.; Kweon, I.S. Non-local spatial propagation network for depth completion. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XIII 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 120–136. [Google Scholar]
- Hu, M.; Wang, S.; Li, B.; Ning, S.; Fan, L.; Gong, X. Penet: Towards precise and efficient image guided depth completion. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 13656–13662. [Google Scholar]
- Tang, J.; Tian, F.-P.; An, B.; Li, J.; Tan, P. Bilateral Propagation Network for Depth Completion. arXiv 2024, arXiv:2403.11270. [Google Scholar]
- Cheng, X.; Wang, P.; Guan, C.; Yang, R. Cspn++: Learning context and resource aware convolutional spatial propagation networks for depth completion. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 10615–10622. [Google Scholar]
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 213–229. [Google Scholar]
- Cheng, X.; Wang, P.; Yang, R. Depth estimation via affinity learned with convolutional spatial propagation network. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 103–119. [Google Scholar]
- Kendall, A.; Martirosyan, H.; Dasgupta, S.; Henry, P. End-to-End Learning of Geometry and Context for Deep Stereo Regression. In Proceedings of the International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 66–75. [Google Scholar]
- Xu, Z.; Jiang, Y.; Wang, J.; Wang, Y. A Dual Branch Multi-scale Stereo Matching Network for High-resolution Satellite Remote Sensing Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 18, 949–964. [Google Scholar] [CrossRef]
- Wang, K.; Yan, Z.; Fan, J.; Li, J.; Yang, J. Learning Inverse Laplacian Pyramid for Progressive Depth Completion. arXiv 2025, arXiv:2502.07289. [Google Scholar]
- He, S.; Li, S.; Jiang, S.; Jiang, W. HMSM-Net: Hierarchical multi-scale matching network for disparity estimation of high-resolution satellite stereo images. ISPRS J. Photogramm. Remote Sens. 2022, 188, 314–330. [Google Scholar] [CrossRef]
- Gu, X.; Fan, Z.; Zhu, S.; Dai, Z.; Tan, F.; Tan, P. Cascade cost volume for high-resolution multi-view stereo and stereo matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2495–2504. [Google Scholar]
- Saxena, A.; Chung, S.; Ng, A. Learning depth from single monocular images. Adv. Neural Inf. Process. Syst. 2005, 18, 1161–1168. [Google Scholar]
- Liu, B.; Gould, S.; Koller, D. Single image depth estimation from predicted semantic labels. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 1253–1260. [Google Scholar]
- Ladicky, L.; Shi, J.; Pollefeys, M. Pulling things out of perspective. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 89–96. [Google Scholar]
- Eigen, D.; Puhrsch, C.; Fergus, R. Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inf. Process. Syst. 2014, 27, 2366–2374. [Google Scholar]
- Eigen, D.; Fergus, R. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 2650–2658. [Google Scholar]
- Wang, K.; Yan, Z.; Fan, J.; Zhu, W.; Li, X.; Li, J.; Yang, J. Dcdepth: Progressive monocular depth estimation in discrete cosine domain. Adv. Neural Inf. Process. Syst. 2024, 37, 64629–64648. [Google Scholar]
- Yang, L.; Kang, B.; Huang, Z.; Xu, X.; Feng, J.; Zhao, H. Depth anything: Unleashing the power of large-scale unlabeled data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 10371–10381. [Google Scholar]
- Hartley, R.; Zisserman, A. Multiple View Geometry in Computer Vision; Cambridge University Press: Cambridge, UK, 2003. [Google Scholar]
- Yao, Y.; Luo, Z.; Li, S.; Shen, T.; Fang, T.; Quan, L. Recurrent MVSNet for High-Resolution Multi-View Stereo Depth Inference. In Proceedings of the Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 5525–5534. [Google Scholar]
- Wang, X.; Xu, G.; Jia, H.; Yang, X. Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching. arXiv 2024, arXiv:2403.00486. [Google Scholar]
- Zhang, S.; Wei, Z.; Xu, W.; Zhang, L.; Wang, Y.; Zhang, J.; Liu, J. Edge aware depth inference for large-scale aerial building multi-view stereo. ISPRS J. Photogramm. Remote Sens. 2024, 207, 27–42. [Google Scholar] [CrossRef]
- Zbontar, J.; Lecun, Y. Computing the stereo matching cost with a convolutional neural network. In Proceedings of the Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1592–1599. [Google Scholar]
- Hirschmuller, H.; Scharstein, D. Evaluation of Stereo Matching Costs on Images with Radiometric Differences. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 31, 1582–1599. [Google Scholar] [CrossRef] [PubMed]
- Rho, K.; Ha, J.; Kim, Y. Guideformer: Transformers for image guided depth completion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 6250–6259. [Google Scholar]
- Liu, L.-K.; Chan, S.H.; Nguyen, T.Q. Depth reconstruction from sparse samples: Representation, algorithm, and sampling. IEEE Trans. Image Process. 2015, 24, 1983–1996. [Google Scholar] [CrossRef] [PubMed]
- Hawe, S.; Kleinsteuber, M.; Diepold, K. Dense disparity maps from sparse disparity measurements. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 2126–2133. [Google Scholar]
- Ku, J.; Harakeh, A.; Waslander, S.L. In defense of classical image processing: Fast depth completion on the cpu. In Proceedings of the 2018 15th Conference on Computer and Robot Vision (CRV), Toronto, ON, Canada, 8–10 May 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 16–22. [Google Scholar]
- Yan, Z.; Wang, K.; Li, X.; Zhang, Z.; Li, J.; Yang, J. RigNet: Repetitive image guided network for depth completion. In Proceedings of the Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, 23–27 October 2022; Proceedings, Part XXVII. Springer: Berlin/Heidelberg, Germany, 2022; pp. 214–230. [Google Scholar]
- Eldesokey, A.; Felsberg, M.; Khan, F.S. Confidence propagation through cnns for guided sparse depth regression. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 42, 2423–2436. [Google Scholar] [CrossRef]
- Lu, K.; Barnes, N.; Anwar, S.; Zheng, L. From depth what can you see? Depth completion via auxiliary image reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11306–11315. [Google Scholar]
- Zhao, S.; Gong, M.; Fu, H.; Tao, D. Adaptive context-aware multi-modal network for depth completion. IEEE Trans. Image Process. 2021, 30, 5264–5276. [Google Scholar] [CrossRef]
- Liu, X.; Shao, X.; Wang, B.; Li, Y.; Wang, S. Graphcspn: Geometry-aware depth completion via dynamic gcns. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 90–107. [Google Scholar]
- Yan, Z.; Wang, K.; Li, X.; Zhang, Z.; Li, J.; Yang, J. Desnet: Decomposed scale-consistent network for unsupervised depth completion. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 3109–3117. [Google Scholar]
- Ma, F.; Cavalheiro, G.V.; Karaman, S. Self-supervised sparse-to-dense: Self-supervised depth completion from lidar and monocular camera. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 3288–3295. [Google Scholar]
- Tang, J.; Tian, F.-P.; Feng, W.; Li, J.; Tan, P. Learning guided convolutional network for depth completion. IEEE Trans. Image Process. 2020, 30, 1116–1129. [Google Scholar] [CrossRef]
- Liu, L.; Song, X.; Sun, J.; Lyu, X.; Li, L.; Liu, Y.; Zhang, L. MFF-Net: Towards Efficient Monocular Depth Completion with Multi-Modal Feature Fusion. IEEE Robot. Autom. Lett. 2023, 8, 920–927. [Google Scholar] [CrossRef]
- Wang, Y.; Mao, Y.; Liu, Q.; Dai, Y. Decomposed Guided Dynamic Filters for Efficient RGB-Guided Depth Completion. IEEE Trans. Circuits Syst. Video Technol. 2023, 34, 1186–1198. [Google Scholar] [CrossRef]
- Yan, Z.; Li, X.; Zhang, Z.; Li, J.; Yang, J. RigNet++: Efficient Repetitive Image Guided Network for Depth Completion. arXiv 2023, arXiv:2309.00655. [Google Scholar]
- Liu, S.; De Mello, S.; Gu, J.; Zhong, G.; Yang, M.-H.; Kautz, J. Learning affinity via spatial propagation networks. Adv. Neural Inf. Process. Syst. 2017, 30, 1520–1530. [Google Scholar]
- Xu, Z.; Yin, H.; Yao, J. Deformable spatial propagation networks for depth completion. In Proceedings of the 2020 IEEE International Conference on Image Processing (ICIP), Virtual, 25–28 September 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 913–917. [Google Scholar]
- Zhou, W.; Yan, X.; Liao, Y.; Lin, Y.; Huang, J.; Zhao, G.; Cui, S.; Li, Z. BEV@DC: Bird’s-Eye View Assisted Training for Depth Completion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 9233–9242. [Google Scholar]
- Yan, Z.; Lin, Y.; Wang, K.; Zheng, Y.; Wang, Y.; Zhang, Z.; Li, J.; Yang, J. Tri-Perspective View Decomposition for Geometry-Aware Depth Completion. arXiv 2024, arXiv:2403.15008. [Google Scholar]
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6230–6239. [Google Scholar]
- Sun, K.; Xiao, B.; Liu, D.; Wang, J. Deep high-resolution representation learning for human pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 5693–5703. [Google Scholar]
- Zhang, Z. Microsoft kinect sensor and its effect. IEEE Multimed. 2012, 19, 4–10. [Google Scholar] [CrossRef]
- Guney, F.; Geiger, A. Displets: Resolving stereo ambiguities using object knowledge. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 4165–4175. [Google Scholar]
- Loshchilov, I.; Hutter, F. Decoupled weight decay regularization. arXiv 2017, arXiv:1711.05101. [Google Scholar]
- Nazir, D.; Pagani, A.; Liwicki, M.; Stricker, D.; Afzal, M.Z. SemAttNet: Toward Attention-Based Semantic Aware Guided Depth Completion. IEEE Access 2022, 10, 120781–120791. [Google Scholar] [CrossRef]
- Chen, H.; Yang, H.; Zhang, Y. Depth completion using geometry-aware embedding. In Proceedings of the 2022 International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23–27 May 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 8680–8686. [Google Scholar]
- Qiu, J.; Cui, Z.; Zhang, Y.; Zhang, X.; Liu, S.; Zeng, B.; Pollefeys, M. Deeplidar: Deep surface normal guided depth prediction for outdoor scene from sparse lidar data and single color image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3313–3322. [Google Scholar]
- Zhao, Y.; Bai, L.; Zhang, Z.; Huang, X. A surface geometry model for lidar depth completion. IEEE Robot. Autom. Lett. 2021, 6, 4457–4464. [Google Scholar] [CrossRef]
- Song, S.; Lichtenberg, S.P.; Xiao, J. Sun rgb-d: A rgb-d scene understanding benchmark suite. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 567–576. [Google Scholar]

| Input | Setting | Output |
|---|---|---|
| or | or | |
| Conv0_1 | ||
| Conv0_2 | ||
| Conv1_1 | ||
| Conv2_1 | ||
| Conv3_1 | ||
| Conv4_1 | ||
| Conv5_1 | ||
| Dconv1_1 | Deconv add Conv4_1 | |
| Dconv2_1 | Deconv add Conv4_1 | |
| Dconv3_1 | Deconv add Conv4_1 | |
| Dconv4_1 | Deconv add Conv4_1 | |
| Dconv5_1 | Deconv add Conv4_1 |
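
The layer listing above outlines the multi-scale feature extractor: a stack of convolutions (Conv0_1–Conv5_1) that progressively reduces resolution, followed by deconvolutions (Dconv1_1–Dconv5_1) whose upsampled outputs are fused with encoder features by element-wise addition. The PyTorch sketch below illustrates this general encoder–decoder pattern only; the input channels, channel widths, kernel sizes, and strides are illustrative assumptions rather than the exact HFF-Net configuration.

```python
import torch
import torch.nn as nn

class EncoderDecoderSketch(nn.Module):
    """Illustrative encoder-decoder with additive skip connections.
    Channel widths and strides are assumptions, not the published HFF-Net settings."""
    def __init__(self, in_ch=4, base=32):
        super().__init__()
        # Encoder (Conv0_1 ... analogue): stride-2 convolutions halve the resolution
        self.enc0 = nn.Sequential(nn.Conv2d(in_ch, base, 3, 1, 1), nn.ReLU(inplace=True))
        self.enc1 = nn.Sequential(nn.Conv2d(base, base * 2, 3, 2, 1), nn.ReLU(inplace=True))
        self.enc2 = nn.Sequential(nn.Conv2d(base * 2, base * 4, 3, 2, 1), nn.ReLU(inplace=True))
        self.enc3 = nn.Sequential(nn.Conv2d(base * 4, base * 8, 3, 2, 1), nn.ReLU(inplace=True))
        # Decoder (Dconv analogue): each deconvolution doubles the resolution
        self.dec3 = nn.ConvTranspose2d(base * 8, base * 4, 4, 2, 1)
        self.dec2 = nn.ConvTranspose2d(base * 4, base * 2, 4, 2, 1)
        self.dec1 = nn.ConvTranspose2d(base * 2, base, 4, 2, 1)

    def forward(self, x):
        e0 = self.enc0(x)                     # full resolution
        e1 = self.enc1(e0)                    # 1/2
        e2 = self.enc2(e1)                    # 1/4
        e3 = self.enc3(e2)                    # 1/8
        d2 = torch.relu(self.dec3(e3) + e2)   # "Deconv add Conv" fusion at 1/4
        d1 = torch.relu(self.dec2(d2) + e1)   # 1/2
        d0 = torch.relu(self.dec1(d1) + e0)   # full
        # Multi-scale feature maps at 1/8, 1/4, 1/2 and full resolution
        return e3, d2, d1, d0
```
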
| Datasets | Image Resolution | Data Splitting | Batch Size | Epochs |
|---|---|---|---|---|
| KITTI DC | 1226 × 370 pixels | 85,898, 1000, and 1000 image frames were used for model training, validation, and testing, respectively. | 3 | 30 |
| NYUv2 | 640 × 480 pixels | 45,205, 2379, and 654 image frames were used for model training, validation, and testing, respectively. | 12 | 100 |
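
For NYUv2, the sparse depth input is typically synthesized by sampling a small number of valid pixels from the dense ground-truth depth map. The snippet below is a minimal sketch of that sampling step, assuming the common 500-sample protocol; the sample count and masking rule are assumptions, not values confirmed by the table above.

```python
import numpy as np

def sample_sparse_depth(dense_depth, num_samples=500, seed=None):
    """Simulate a sparse depth input by keeping a random subset of valid pixels.
    The 500-sample setting follows the common NYUv2 depth-completion protocol and
    is an assumption here."""
    rng = np.random.default_rng(seed)
    sparse = np.zeros_like(dense_depth)
    valid = np.argwhere(dense_depth > 0)  # (K, 2) row/col indices of valid pixels
    keep = rng.choice(len(valid), size=min(num_samples, len(valid)), replace=False)
    rows, cols = valid[keep, 0], valid[keep, 1]
    sparse[rows, cols] = dense_depth[rows, cols]
    return sparse
```
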
| Methods | RMSE (mm) ↓ | MAE (mm) ↓ | iRMSE (1/km) ↓ | iMAE (1/km) ↓ |
|---|---|---|---|---|
| CSPN [26] | 1019.64 | 279.46 | 2.93 | 1.15 |
| SD2 [55] | 814.73 | 249.95 | 2.80 | 1.21 |
| GAENet [70] | 773.90 | 231.29 | 2.29 | 1.08 |
| DeepLiDAR [71] | 758.38 | 226.50 | 2.56 | 1.15 |
| ACMNet [52] | 744.91 | 206.09 | 2.08 | 0.90 |
| CSPN++ [24] | 743.69 | 209.28 | 2.07 | 0.90 |
| NLSPN [21] | 741.68 | 199.59 | 1.99 | 0.84 |
| GuideNet [56] | 736.24 | 218.83 | 2.25 | 0.99 |
| PENet [22] | 730.08 | 210.55 | 2.17 | 0.94 |
| GuideFormer [45] | 721.48 | 207.76 | 2.14 | 0.97 |
| CFormer [17] | 708.87 | 203.45 | 2.01 | 0.88 |
| LRRU [1] | 696.51 | 189.96 | 1.87 | 0.81 |
| HFF-Net | 694.90 | 201.54 | 1.95 | 0.88 |
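
The KITTI DC metrics above are root-mean-square error and mean absolute error in millimetres, plus their inverse-depth counterparts iRMSE and iMAE in 1/km. A minimal sketch of how these are commonly computed over valid ground-truth pixels follows; the official KITTI development kit remains the authoritative implementation.

```python
import numpy as np

def kitti_dc_metrics(pred_mm, gt_mm):
    """RMSE/MAE in millimetres and iRMSE/iMAE in 1/km on valid ground-truth pixels."""
    mask = gt_mm > 0
    pred, gt = pred_mm[mask].astype(np.float64), gt_mm[mask].astype(np.float64)
    err = pred - gt
    rmse = np.sqrt(np.mean(err ** 2))       # mm
    mae = np.mean(np.abs(err))              # mm
    # Inverse-depth errors: depth in mm -> inverse depth in 1/km is 1e6 / depth_mm
    inv_err = 1e6 / pred - 1e6 / gt
    irmse = np.sqrt(np.mean(inv_err ** 2))  # 1/km
    imae = np.mean(np.abs(inv_err))         # 1/km
    return rmse, mae, irmse, imae
```
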
| Methods | RMSE (m) ↓ | REL ↓ | δ < 1.25 (%) ↑ | δ < 1.25² (%) ↑ | δ < 1.25³ (%) ↑ |
|---|---|---|---|---|---|
| SD2 [55] | 0.230 | 0.044 | 97.1 | 99.4 | 99.8 |
| CSPN [26] | 0.117 | 0.016 | 99.2 | 99.9 | 100.0 |
| CSPN++ [24] | 0.116 | - | - | - | - |
| DeepLiDAR [71] | 0.115 | 0.022 | 99.3 | 99.9 | 100.0 |
| ACMNet [52] | 0.105 | 0.015 | 99.4 | 99.9 | 100.0 |
| GuideNet [56] | 0.101 | 0.015 | 99.5 | 99.9 | 100.0 |
| NLSPN [21] | 0.092 | 0.012 | 99.6 | 99.9 | 100.0 |
| LRRU [1] | 0.091 | 0.011 | 99.6 | 99.9 | 100.0 |
| CFormer [17] | 0.090 | 0.012 | 99.6 | 99.9 | 100.0 |
| HFF-Net | 0.093 | 0.013 | 99.6 | 99.9 | 100.0 |
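
The NYUv2 metrics are RMSE in metres, mean absolute relative error (REL), and the threshold accuracies δ < 1.25, 1.25², and 1.25³ reported as percentages. A short sketch of the usual computation is given below; the validity mask is an assumption.

```python
import numpy as np

def nyu_metrics(pred_m, gt_m):
    """RMSE (m), relative error (REL), and threshold accuracies delta < 1.25**k (%)."""
    mask = gt_m > 0
    pred, gt = pred_m[mask], gt_m[mask]
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    rel = np.mean(np.abs(pred - gt) / gt)
    ratio = np.maximum(pred / gt, gt / pred)
    deltas = [np.mean(ratio < 1.25 ** k) * 100.0 for k in (1, 2, 3)]  # percentages
    return rmse, rel, deltas
```
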
| Models | Hierarchical Depth Completion Subnetwork | MLSPN-Based Depth Refinement Subnetwork | Results (mm) | ||||||
|---|---|---|---|---|---|---|---|---|---|
| Full-Scale | 1/2-Scale | 1/4-Scale | 1/8-Scale | RMSE ↓ | MAE ↓ | ||||
| Baseline | √ | 754.60 | 220.42 | ||||||
| Model-Net1 | √ | √ | 746.89 | 231.74 | |||||
| Model-Net2 | √ | √ | √ | 739.72 | 225.44 | ||||
| Model-Net3 | √ | √ | √ | √ | 747.03 | 240.62 | |||
| Model-Net4 | √ | √ | √ | √ | 733.05 | 206.91 | |||
| Model-Net5 | √ | √ | √ | √ | √ | 732.05 | 205.83 | ||
| HFF-Net | √ | √ | √ | √ | √ | √ | 726.21 | 200.64 | |
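
The ablation contrasts the hierarchical completion branches at each scale with the MLSPN-based refinement. As a rough illustration of what a spatial-propagation refinement stage does, the sketch below implements a generic CSPN-style update in which each pixel's depth is iteratively replaced by an affinity-weighted combination of its 3×3 neighbours; this is a simplified stand-in under assumed tensor shapes, not the paper's exact MLSPN formulation.

```python
import torch
import torch.nn.functional as F

def spn_refine(depth, affinity, iters=6):
    """Generic convolutional spatial-propagation refinement (CSPN-style).
    depth:    (B, 1, H, W) initial dense depth prediction
    affinity: (B, 8, H, W) learned weights for the 8 neighbours of each pixel
    Simplified stand-in for the paper's MLSPN refinement, not its exact formulation."""
    # Normalise neighbour weights and derive the centre weight so each pixel's weights sum to 1
    abs_sum = affinity.abs().sum(dim=1, keepdim=True).clamp(min=1e-6)
    w = affinity / abs_sum                        # (B, 8, H, W)
    w_center = 1.0 - w.sum(dim=1, keepdim=True)   # (B, 1, H, W)
    for _ in range(iters):
        # Gather the 3x3 neighbourhood of every pixel, then drop the centre entry
        patches = F.unfold(depth, kernel_size=3, padding=1)           # (B, 9, H*W)
        patches = patches.view(depth.size(0), 9, *depth.shape[-2:])   # (B, 9, H, W)
        neigh = torch.cat([patches[:, :4], patches[:, 5:]], dim=1)    # (B, 8, H, W)
        depth = w_center * depth + (w * neigh).sum(dim=1, keepdim=True)
    return depth
```
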
| Methods | Scanning Lines | RMSE (mm) ↓ | MAE (mm) ↓ |
|---|---|---|---|
| GuideNet [56] | 1-line | 20,432.87 | 14,620.19 |
| NLSPN [21] | 1-line | 14,005.34 | 9086.54 |
| CFormer [17] | 1-line | 16,079.99 | 11,719.06 |
| LRRU [1] | 1-line | 15,513.83 | 11,282.55 |
| HFF-Net | 1-line | 13,985.01 | 9079.81 |
| GuideNet [56] | 4-line | 15,259.46 | 10,224.12 |
| NLSPN [21] | 4-line | 6154.77 | 3416.86 |
| CFormer [17] | 4-line | 6907.37 | 4080.68 |
| LRRU [1] | 4-line | 7628.76 | 4204.29 |
| HFF-Net | 4-line | 6709.29 | 3847.34 |
| GuideNet [56] | 8-line | 7102.85 | 4548.64 |
| NLSPN [21] | 8-line | 3722.92 | 1697.49 |
| CFormer [17] | 8-line | 3639.23 | 1749.24 |
| LRRU [1] | 8-line | 3610.21 | 1588.76 |
| HFF-Net | 8-line | 3495.12 | 1730.93 |
| GuideNet [56] | 16-line | 3122.79 | 1292.43 |
| NLSPN [21] | 16-line | 1974.55 | 672.50 |
| CFormer [17] | 16-line | 2178.57 | 752.09 |
| LRRU [1] | 16-line | 1883.25 | 676.93 |
| HFF-Net | 16-line | 1835.00 | 654.21 |
| GuideNet [56] | 32-line | 1385.81 | 475.08 |
| NLSPN [21] | 32-line | 1192.62 | 347.92 |
| CFormer [17] | 32-line | 1380.82 | 400.03 |
| LRRU [1] | 32-line | 1125.11 | 353.77 |
| HFF-Net | 32-line | 1129.75 | 358.89 |
| GuideNet [56] | 64-line | 760.19 | 218.90 |
| NLSPN [21] | 64-line | 790.67 | 201.28 |
| CFormer [17] | 64-line | 821.74 | 211.38 |
| LRRU [1] | 64-line | 743.04 | 196.18 |
| HFF-Net | 64-line | 726.21 | 200.64 |
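
The sparsity study feeds the networks LiDAR depth reduced from 64 scan lines down to as few as 1. When per-point ring indices are unavailable, a common approximation is to bin points by elevation angle and keep every k-th bin, as in the sketch below; this binning scheme is an assumption rather than the paper's exact subsampling protocol.

```python
import numpy as np

def subsample_scan_lines(points, num_lines=16, total_lines=64):
    """Keep a subset of LiDAR scan lines to simulate a sparser sensor.
    points: (N, 3) array of x, y, z in the sensor frame (Velodyne-style geometry assumed).
    Scan lines are approximated by binning elevation angles into `total_lines` bins and
    keeping every (total_lines // num_lines)-th bin; real ring indices, if available,
    would be more accurate."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    elev = np.arctan2(z, np.sqrt(x ** 2 + y ** 2))  # elevation angle per point
    edges = np.linspace(elev.min(), elev.max(), total_lines + 1)[1:-1]
    bins = np.digitize(elev, edges)                 # bin index in [0, total_lines - 1]
    keep_every = total_lines // num_lines
    mask = (bins % keep_every) == 0
    return points[mask]
```
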
| Methods | Params (M) | Memory (GB) | RMSE (m) ↓ | REL ↓ | Time (s) |
|---|---|---|---|---|---|
| NLSPN [21] | 26.23 | 0.90 | 2.269 | 1.121 | 0.279 |
| CFormer [17] | 83.51 | 4.77 | 0.558 | 0.150 | 0.154 |
| HFF-Net | 8.95 | 7.08 | 0.490 | 0.109 | 0.168 |
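
The efficiency comparison reports parameter count, peak GPU memory, and per-frame inference time. A minimal PyTorch sketch of one way to obtain these numbers is shown below; the warm-up and averaging settings are assumptions and may differ from the paper's measurement protocol.

```python
import time
import torch

def profile_model(model, sample_input, device="cuda", warmup=10, runs=50):
    """Return parameter count (M), peak GPU memory (GB), and mean inference time (s)."""
    model = model.to(device).eval()
    sample_input = sample_input.to(device)
    params_m = sum(p.numel() for p in model.parameters()) / 1e6
    torch.cuda.reset_peak_memory_stats(device)
    with torch.no_grad():
        for _ in range(warmup):                 # warm-up passes excluded from timing
            model(sample_input)
        torch.cuda.synchronize(device)
        start = time.time()
        for _ in range(runs):
            model(sample_input)
        torch.cuda.synchronize(device)
    mem_gb = torch.cuda.max_memory_allocated(device) / 1024 ** 3
    return params_m, mem_gb, (time.time() - start) / runs
```
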
