Learning Depth from Focus with Multi-Candidate Estimation and Proximal Refinement
Abstract
1. Introduction
2. Related Work
2.1. Traditional Approaches
2.2. Deep Learning-Based Approaches
3. Proposed Framework
3.1. Feature Volume Construction
3.2. Depth Candidate Generation
3.3. Proximal Refinement
3.4. Gated Proximal Network
3.5. Loss Function
4. Results and Discussion
4.1. Experimental Setup
4.2. Ablation Study
4.3. Comparative Analysis
5. Conclusions
Funding
Data Availability Statement
Conflicts of Interest

| Setting | MAE | RMS | logRMS | AbsRel | Acc_1 | Acc_2 | Acc_3 | Corr |
|---|---|---|---|---|---|---|---|---|
| Iteration 1 | 1.525 | 2.952 | 0.081 | 0.040 | 98.89 | 99.62 | 99.77 | 0.996 |
| Iteration 3 | 1.475 | 2.867 | 0.072 | 0.036 | 99.11 | 99.74 | 99.85 | 0.997 |
| Iteration 5 | 1.469 | 2.849 | 0.069 | 0.036 | 99.18 | 99.77 | 99.87 | 0.997 |
| Refined | 1.460 | 2.842 | 0.066 | 0.035 | 99.28 | 99.81 | 99.89 | 0.997 |

| Dataset | MAE | RMS | logRMS | AbsRel | Acc_1 | Acc_2 | Acc_3 | Corr |
|---|---|---|---|---|---|---|---|---|
| MB | 23.59 | 29.94 | 0.60 | 1.42 | 4.92 | 37.32 | 93.11 | 0.76 |
| HCI | 0.07 | 0.16 | 0.06 | 0.03 | 97.78 | 99.53 | 99.91 | 0.97 |

| Method | MAE | RMS | logRMS | AbsRel | Acc_1 | Acc_2 | Acc_3 | Corr |
|---|---|---|---|---|---|---|---|---|
| RFVR | 11.89 | 23.63 | 0.79 | 1.55 | 72.79 | 80.81 | 84.60 | 0.73 |
| AiFDNet | 6.81 | 13.14 | 0.59 | 0.73 | 85.43 | 87.67 | 88.87 | 0.93 |
| DFV-FV | 6.33 | 12.10 | 0.57 | 0.89 | 85.09 | 87.60 | 89.52 | 0.95 |
| DFV-Diff | 5.51 | 10.65 | 0.53 | 0.62 | 86.18 | 88.09 | 89.93 | 0.97 |
| DWild | 5.54 | 10.44 | 0.53 | 0.61 | 86.35 | 88.21 | 89.84 | 0.97 |
| Ours | 1.46 | 2.84 | 0.07 | 0.04 | 99.28 | 99.81 | 99.89 | 0.99 |
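
The tables report MAE, RMS, logRMS, AbsRel, Acc_1 to Acc_3, and Corr, but the definitions are not restated alongside them. The sketch below shows one common way such depth metrics are computed. It is an illustration under stated assumptions, not the evaluation code behind the reported numbers: the function name `depth_metrics` is hypothetical, the accuracies are assumed to be threshold accuracies with max(pred/gt, gt/pred) < 1.25^k reported in percent, Corr is assumed to be the Pearson correlation, and pixels with non-positive ground truth are assumed to be invalid.

```python
import numpy as np

def depth_metrics(pred, gt, eps=1e-8):
    """Common depth-evaluation metrics (assumed definitions, see lead-in)."""
    pred = np.asarray(pred, dtype=np.float64).ravel()
    gt = np.asarray(gt, dtype=np.float64).ravel()

    # Assumption: ground-truth values <= 0 mark invalid pixels and are skipped.
    valid = gt > 0
    pred, gt = pred[valid], gt[valid]
    pred = np.maximum(pred, eps)  # keep logarithms and ratios finite

    err = pred - gt
    ratio = np.maximum(pred / gt, gt / pred)  # symmetric relative ratio, >= 1

    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMS": float(np.sqrt(np.mean(err ** 2))),
        "logRMS": float(np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2))),
        "AbsRel": float(np.mean(np.abs(err) / gt)),
        # Assumed threshold accuracies, in percent.
        "Acc_1": float(100.0 * np.mean(ratio < 1.25)),
        "Acc_2": float(100.0 * np.mean(ratio < 1.25 ** 2)),
        "Acc_3": float(100.0 * np.mean(ratio < 1.25 ** 3)),
        # Assumed Pearson correlation between prediction and ground truth.
        "Corr": float(np.corrcoef(pred, gt)[0, 1]),
    }

if __name__ == "__main__":
    # Toy values for illustration only; not data from the paper.
    print(depth_metrics([1.0, 2.0, 6.0], [1.0, 2.0, 4.0]))
```

Under these definitions, lower is better for MAE, RMS, logRMS, and AbsRel, and higher is better for the accuracies and Corr, which matches how the rows in the tables above are ordered.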
© 2026 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Mahmood, M.T. Learning Depth from Focus with Multi-Candidate Estimation and Proximal Refinement. Electronics 2026, 15, 2548. https://doi.org/10.3390/electronics15122548
