Expectation–Maximization Method for RGB-D Camera Calibration with Motion Capture System
Abstract
1. Introduction
- A unified RGB-D camera calibration framework is established that handles different types of RGB-D cameras, such as monocular and binocular devices, in a single pipeline.
- A camera calibration method based on the EM algorithm is proposed that estimates the hardware (intrinsic) parameters and the lens distortion parameters simultaneously, which efficiently improves calibration accuracy.
- The depth data are calibrated against a motion capture system, converting the lens's straight-line depth into spatial depth, which provides a new solution for depth calibration.
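The alternating scheme in the second bullet can be illustrated with a minimal sketch on synthetic data. This is not the authors' implementation: it assumes a standard pinhole model with two radial distortion coefficients (k1, k2), an E-step that refits the intrinsics (fx, fy, cx, cy) by linear least squares with the distortion held fixed, and an M-step that refits the distortion with the intrinsics held fixed. All variable names and numeric values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth: pinhole intrinsics plus two radial distortion terms.
fx_t, fy_t, cx_t, cy_t = 600.0, 610.0, 320.0, 240.0
k1_t, k2_t = -0.10, 0.01

# Random 3D points in front of the camera and their distorted projections.
n = 200
X = rng.uniform(-1, 1, n)
Y = rng.uniform(-1, 1, n)
Z = rng.uniform(2, 5, n)
x, y = X / Z, Y / Z                       # normalized image coordinates
r2 = x**2 + y**2
d = 1 + k1_t * r2 + k2_t * r2**2          # radial distortion factor
u = fx_t * x * d + cx_t
v = fy_t * y * d + cy_t

# Alternating estimation, starting from zero distortion.
k1, k2 = 0.0, 0.0
for _ in range(100):
    # E-step: distortion fixed -> intrinsics are linear in the pixel equations.
    dh = 1 + k1 * r2 + k2 * r2**2
    fx, cx = np.linalg.lstsq(np.column_stack([x * dh, np.ones(n)]), u, rcond=None)[0]
    fy, cy = np.linalg.lstsq(np.column_stack([y * dh, np.ones(n)]), v, rcond=None)[0]
    # M-step: intrinsics fixed -> (k1, k2) are linear in the same pixel residuals.
    A = np.column_stack([
        np.concatenate([fx * x * r2, fy * y * r2]),
        np.concatenate([fx * x * r2**2, fy * y * r2**2]),
    ])
    b = np.concatenate([u - cx - fx * x, v - cy - fy * y])
    k1, k2 = np.linalg.lstsq(A, b, rcond=None)[0]

# Reprojection RMSE after the alternation converges.
dh = 1 + k1 * r2 + k2 * r2**2
res = np.concatenate([fx * x * dh + cx - u, fy * y * dh + cy - v])
rmse = float(np.sqrt(np.mean(res**2)))
```

Because each half-step minimizes the same pixel-space least-squares objective over its own block of parameters, the residual decreases monotonically; on noiseless synthetic data the loop recovers the generating parameters.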
2. Related Work
2.1. Traditional Universal Camera Calibration Methods
2.2. Learning-Based Camera Calibration Methods
2.3. RGB-D Camera Calibration
3. Method
3.1. Equipment and Data
3.2. EM Algorithm for RGB Camera Calibration
3.2.1. Expectation Step (E-Step)
3.2.2. Maximization Step (M-Step)
3.3. Depth Correction
4. Results and Discussion
4.1. Evaluation Criteria
4.2. Pixel Point Calibration for RGB Image
4.3. Initialization of Latent Variable Parameters
4.4. Depth Calibration
4.5. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| DLT | Direct Linear Transform |
| BA | Bundle Adjustment |
| EM | Expectation–Maximization |
| LM | Levenberg–Marquardt |
| LS | Least Squares |
| RGB-D | Red–Green–Blue and Depth |
| ToF | Time of Flight |
| TRF | Trust Region Reflective |
| Parameter | Intel RealSense D455 (Cam 1) | Orbbec Femto Bolt (Cam 2) |
|---|---|---|
| Focal length fx/fy (px) | 387.3/388.7 | 1121.29/1120.45 |
| Principal point cx/cy (px) | 321.2/243.7 | 950.894/547.579 |
| Width/Height (px) | 640/480 | 1920/1080 |
| Distortion coefficients | 0.0063/−0.0040/0.0028 | 0.0790/−0.0111/0.0480 |
| Depth range | 0.6–6 m | 0.25–5.46 m |
| Algorithm: Expectation Step for Intrinsic Parameter Estimation |
|---|
| Input: 3D points, 2D pixel points, distortion parameters, image width w and height h. |
| Output: Intrinsic parameters. |
| Step 1: Initialization. The intrinsic parameters are initialized. |
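The initialization formula in the box above is not reproduced in this extract. A short sketch of what the E-step's linear structure looks like follows; this is an illustration, not the paper's exact procedure. It assumes (a common convention, not necessarily the authors') that the principal point starts at the image center, and it shows that with the distortion held fixed the intrinsics can be refit in closed form by least squares. All values are hypothetical.

```python
import numpy as np

# Step 1 (assumed convention): initialize principal point at the image
# center and the focal lengths from the image width.
w, h = 640, 480
fx0, fy0, cx0, cy0 = float(w), float(w), w / 2.0, h / 2.0

# Synthetic correspondences: normalized coordinates x = X/Z, y = Y/Z of the
# 3D points, projected with hypothetical "true" intrinsics and fixed distortion.
rng = np.random.default_rng(1)
n = 100
x = rng.uniform(-0.5, 0.5, n)
y = rng.uniform(-0.4, 0.4, n)
k1 = -0.05                          # distortion, held fixed during the E-step
d = 1 + k1 * (x**2 + y**2)
u = 605.0 * x * d + 318.0
v = 598.0 * y * d + 242.0

# With d fixed, u = fx*(x*d) + cx and v = fy*(y*d) + cy are linear in the
# intrinsics, so a least-squares fit recovers them directly.
fx, cx = np.linalg.lstsq(np.column_stack([x * d, np.ones(n)]), u, rcond=None)[0]
fy, cy = np.linalg.lstsq(np.column_stack([y * d, np.ones(n)]), v, rcond=None)[0]
```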
| Algorithm: Maximization Step for Distortion Parameter Estimation |
|---|
| Input: 3D points, 2D pixel points, intrinsic parameters, image width w and height h. |
| Output: Distortion parameters. |
| Step 1: Radial distance computation. For each correspondence, the radial distance is computed. |
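The radial-distance formula in the box above is not reproduced in this extract. The sketch below assumes the standard radial model, in which the squared radial distance is r² = x² + y² in normalized coordinates; with the intrinsics held fixed, the two radial coefficients are then linear in the pixel residuals and can be fit by least squares. This is an illustration on synthetic data, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
fx, fy, cx, cy = 600.0, 600.0, 320.0, 240.0      # intrinsics, fixed in the M-step
x = rng.uniform(-0.5, 0.5, n)                    # normalized coordinates X/Z
y = rng.uniform(-0.4, 0.4, n)

# Step 1: squared radial distance per correspondence (standard radial model).
r2 = x**2 + y**2

# Hypothetical true distortion used to generate the observed pixels.
k1_t, k2_t = -0.10, 0.01
d = 1 + k1_t * r2 + k2_t * r2**2
u = fx * x * d + cx
v = fy * y * d + cy

# u - cx - fx*x = fx*x*r2*k1 + fx*x*r2^2*k2 (and similarly for v),
# so (k1, k2) solve a linear least-squares problem in pixel space.
A = np.column_stack([
    np.concatenate([fx * x * r2, fy * y * r2]),
    np.concatenate([fx * x * r2**2, fy * y * r2**2]),
])
b = np.concatenate([u - cx - fx * x, v - cy - fy * y])
k1, k2 = np.linalg.lstsq(A, b, rcond=None)[0]
```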
| Method | Cam 1: <2 px | Cam 1: <4 px | Cam 1: <6 px | Cam 1: >6 px | Cam 2: <2 px | Cam 2: <4 px | Cam 2: <6 px | Cam 2: >6 px |
|---|---|---|---|---|---|---|---|---|
| DLT | 12 | 26 | 27 | 0 | 3 | 15 | 22 | 5 |
| LS | 8 | 23 | 26 | 1 | 1 | 13 | 20 | 7 |
| LM | 4 | 21 | 24 | 3 | 0 | 12 | 16 | 11 |
| TRF | 9 | 22 | 26 | 1 | 1 | 13 | 17 | 10 |
| EM | 24 | 27 | 27 | 0 | 6 | 22 | 26 | 1 |
| Method | RMSE Cam 1 (px) | RMSE Cam 2 (px) | Speed Cam 1 (s) | Speed Cam 2 (s) |
|---|---|---|---|---|
| DLT | 1.26 | 4.02 | 0.001 | 0.001 |
| LS | 1.47 | 4.73 | 1.494 | 0.083 |
| LM | 1.70 | 5.02 | 1.802 | 1.871 |
| TRF | 1.50 | 4.72 | 1.011 | 0.370 |
| EM (Ours) | 0.89 | 3.22 | 0.007 | 0.007 |
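The two evaluation criteria reported in the tables above, cumulative counts of points below reprojection-error thresholds and overall RMSE, can be computed from per-point residuals as sketched below. The residual values here are made up for illustration, not taken from the paper's experiments.

```python
import numpy as np

# Illustrative per-point reprojection errors (pixels); values are made up.
errors = np.array([0.5, 1.2, 1.8, 2.5, 3.1, 3.9, 4.4, 5.8, 6.7])

# Overall RMSE, as in the second table.
rmse = float(np.sqrt(np.mean(errors**2)))

# Cumulative threshold counts, as in the first table: points with error
# below 2, 4, and 6 pixels, plus the count above 6 pixels.
below = {t: int(np.sum(errors < t)) for t in (2, 4, 6)}
above6 = int(np.sum(errors > 6))
# below -> {2: 3, 4: 6, 6: 8}; above6 -> 1
```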
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Lin, J.; Du, G.; Zhang, Y.; Zhao, Y.; Xie, Q.; Yao, J.; Khadka, A. Expectation–Maximization Method for RGB-D Camera Calibration with Motion Capture System. Photonics 2026, 13, 183. https://doi.org/10.3390/photonics13020183