3D Semantic Map Reconstruction for Orchard Environments Using Multi-Sensor Fusion
Abstract
1. Introduction
- Data from LiDAR, a camera, and an IMU are fused for real-time localization and 3D semantic map construction, addressing the efficiency and accuracy limitations of vision-only approaches.
- Semantic information extraction is divided into two stages to cope with the complex and variable conditions of orchard environments.
- A Bayesian fusion method further improves the accuracy of the constructed semantic maps.
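The article does not reproduce the fusion formula at this point, but a per-voxel recursive Bayesian label update bounded by the confidence range listed in the parameter table ([0.55, 0.95]) can be sketched as follows. The function name and the binary-class simplification are illustrative assumptions, not the authors' implementation:

```python
def bayes_fuse(prior: float, likelihood: float,
               lo: float = 0.55, hi: float = 0.95) -> float:
    """Recursive binary Bayes update of the probability that a voxel
    carries a given semantic label. The detection score is clamped to
    the fusion bounds so a single over-confident observation cannot
    saturate the belief."""
    p = min(max(likelihood, lo), hi)
    return p * prior / (p * prior + (1.0 - p) * (1.0 - prior))

# Fusing three consecutive detections of the same voxel:
belief = 0.5                     # uninformative prior
for score in (0.8, 0.7, 0.9):
    belief = bayes_fuse(belief, score)
```

With repeated consistent detections the belief converges toward 1 while the clamp keeps any single frame from deciding the label outright.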
2. Materials and Methods
2.1. Data Collection and Processing
2.2. Methods
2.2.1. Open Vocabulary Detection and Semantic Segmentation Algorithm
2.2.2. LiDAR-Based Positioning and Point Cloud Map Construction
- Sequential ESIKF Updates: LiDAR measurements are processed first via point-to-plane residuals, followed by visual updates using sparse direct image alignment. This sequential approach avoids dimensional mismatch while maintaining theoretical equivalence to joint updates.
- Unified Voxel Map Representation: A single adaptive voxel hash map stores both geometric (LiDAR points, plane parameters) and visual (image patches) information, enabling efficient data association and memory management through ring-buffer sliding.
- Adaptive Robustness Features: The system incorporates several adaptive mechanisms to enhance robustness. First, plane priors derived from LiDAR points are utilized to improve the accuracy of visual alignment. Second, online exposure time estimation dynamically compensates for varying illumination conditions. Third, an on-demand raycasting strategy actively retrieves map points in scenarios where LiDAR measurements become sparse, ensuring continuous constraints for state estimation.
- Fault-Tolerant Design: The ESIKF naturally weights sensor measurements based on their noise characteristics. During sensor degradation (e.g., LiDAR in featureless environments or camera in low-light conditions), the system automatically relies on remaining reliable sensors, preventing catastrophic failure.
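As a rough illustration of the unified voxel map idea above (a sketch, not the authors' implementation), the snippet below quantizes 3D points into a hash map whose buckets hold both LiDAR points and visual patch references, so a single lookup serves both the geometric and the visual update. Class and method names are hypothetical:

```python
import math
from collections import defaultdict

class VoxelHashMap:
    """Minimal unified voxel hash map: each bucket, keyed by the
    quantized coordinates of its voxel, stores geometric points and
    visual patch references side by side."""

    def __init__(self, voxel_size: float = 0.5):
        self.voxel_size = voxel_size
        self.voxels = defaultdict(lambda: {"points": [], "patches": []})

    def key(self, p):
        # Quantize a 3D point to its integer voxel index.
        return tuple(math.floor(c / self.voxel_size) for c in p)

    def insert_point(self, p):
        self.voxels[self.key(p)]["points"].append(p)

    def insert_patch(self, p, patch):
        self.voxels[self.key(p)]["patches"].append(patch)

    def query(self, p):
        # Data association: everything stored in the voxel containing p.
        return self.voxels.get(self.key(p))
```

A ring-buffer eviction policy (dropping voxels that slide out of a spatial window around the sensor) would sit on top of this structure; it is omitted here for brevity.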
2.2.3. 3D Semantic Maps Construction and Postprocessing
3. Experiments and Results
3.1. Experiments Platform
3.2. Semantic Segmentation Performance Analysis
3.3. Evaluation of 3D Semantic Map Construction in an Orchard
- Occlusion issues: In natural orchard environments, dragon fruit are often obscured by dense branches and foliage. Some fruits have only a small portion of their surface exposed within the sensor’s field of view, preventing complete reconstruction in the 3D point cloud and leading to missed detections.
- Point cloud noise: Influenced by factors such as LiDAR measurement accuracy, multipath reflections, and edge effects, semantic point clouds inevitably contain outliers and regions with blurred boundaries. This noise interferes with density-based clustering algorithms, causing adjacent fruits to be incorrectly merged or individual fruits to be excessively fragmented.
- Semantic segmentation errors: During the projection of 2D image segmentation results onto the 3D point cloud, imprecision in segmentation boundaries is transferred to point cloud annotations, affecting final clustering and counting outcomes.
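The counting pipeline implied by these failure modes (density-based clustering of semantic points with noise rejection) can be illustrated with a minimal Euclidean region-growing sketch. The radius and minimum-cluster-size values below are placeholders, not the paper's parameters:

```python
import math

def euclidean_cluster(points, radius=0.06, min_pts=8):
    """Region-growing Euclidean clustering: a point joins a cluster
    if it lies within `radius` of any member; clusters smaller than
    `min_pts` are rejected as noise (the outlier problem above)."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) <= radius]
            unvisited.difference_update(near)
            cluster.extend(near)
            frontier.extend(near)
        if len(cluster) >= min_pts:
            clusters.append(cluster)
    return clusters

# Two tight fruit-sized blobs plus one stray outlier point:
pts = ([(0.01 * i, 0.0, 0.0) for i in range(8)]
       + [(1.0 + 0.01 * i, 1.0, 1.0) for i in range(8)]
       + [(5.0, 5.0, 5.0)])
fruit_count = len(euclidean_cluster(pts))
```

With these placeholder values the two blobs are counted as separate fruits and the stray point is discarded, which is exactly the behavior that boundary noise and over-fragmentation disturb in practice.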
3.4. Localization Accuracy Evaluation
4. Conclusions
5. Discussion and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Kazerouni, I.A.; Fitzgerald, L.; Dooly, G.; Toal, D. A survey of state-of-the-art on visual SLAM. Expert Syst. Appl. 2022, 205, 117734. [Google Scholar] [CrossRef]
- Liu, T.; Kang, H.; Chen, C. ORB-Livox: A real-time dynamic system for fruit detection and localization. Comput. Electron. Agric. 2023, 209, 107834. [Google Scholar] [CrossRef]
- Yuan, Q.; Wang, P.; Luo, W.; Zhou, Y.; Chen, H.; Meng, Z. Simultaneous Localization and Mapping System for Agricultural Yield Estimation Based on Improved VINS-RGBD: A Case Study of a Strawberry Field. Agriculture 2024, 14, 784. [Google Scholar] [CrossRef]
- Kutyrev, A.; Khort, D.; Smirnov, I.; Zubina, V. UAV-based sustainable orchard management: Deep learning for apple detection and yield estimation. In E3S Web of Conferences; EDP Sciences: Les Ulis, France, 2025; Volume 614, p. 03021. [Google Scholar]
- Casado-García, A.; Heras, J.; Milella, A.; Marani, R. Semi-supervised deep learning and low-cost cameras for the semantic segmentation of natural images in viticulture. Precis. Agric. 2022, 23, 2001–2026. [Google Scholar] [CrossRef]
- Liu, Z.; Wang, J.; Liu, C.; Li, Z.; Jiang, H.; Ma, Y.; Zhang, Y.; Wang, Z. Online Point Coverage Path Planning for Prior-Free Robotic Weeding Using Deep Reinforcement Learning. Authorea Prepr. 2025. Available online: https://www.techrxiv.org/doi/full/10.36227/techrxiv.175338547.70910319 (accessed on 13 January 2026).
- Rapado-Rincon, D.; Kootstra, G. Tree-SLAM: Semantic object SLAM for efficient mapping of individual trees in orchards. Smart Agric. Technol. 2025, 12, 101439. [Google Scholar] [CrossRef]
- Lei, J.; Prabhu, A.; Liu, X.; Cladera, F.; Mortazavi, M.; Ehsani, R.; Chaudhari, P.; Kumar, V. Spatio-Temporal Metric-Semantic Mapping for Persistent Orchard Monitoring: Method and Dataset. IEEE Robot. Autom. Lett. 2025, 10, 8610–8617. [Google Scholar] [CrossRef]
- Peng, C.; Roy, P.; Luby, J.; Isler, V. Semantic mapping of orchards. IFAC-PapersOnLine 2016, 49, 85–89. [Google Scholar] [CrossRef]
- Xiong, J.; Liang, J.; Zhuang, Y.; Hong, D.; Zheng, Z.; Liao, S.; Hu, W.; Yang, Z. Real-time localization and 3D semantic map reconstruction for unstructured citrus orchards. Comput. Electron. Agric. 2023, 213, 108217. [Google Scholar] [CrossRef]
- Nakaguchi, V.M.; Abeyrathna, R.R.D.; Liu, Z.; Noguchi, R.; Ahamed, T. Development of a Machine stereo vision-based autonomous navigation system for orchard speed sprayers. Comput. Electron. Agric. 2024, 227, 109669. [Google Scholar] [CrossRef]
- Papadimitriou, A.; Kleitsiotis, I.; Kostavelis, I.; Mariolis, I.; Giakoumis, D.; Likothanassis, S.; Tzovaras, D. Loop closure detection and slam in vineyards with deep semantic cues. In Proceedings of the 2022 International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23–27 May 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 2251–2258. [Google Scholar]
- Wang, S.; Song, J.; Qi, P.; Yuan, C.; Wu, H.; Zhang, L.; Liu, W.; Liu, Y.; He, X. Design and development of orchard autonomous navigation spray system. Front. Plant Sci. 2022, 13, 960686. [Google Scholar] [CrossRef]
- Blok, P.M.; van Boheemen, K.; van Evert, F.K.; IJsselmuiden, J.; Kim, G.H. Robot navigation in orchards with localization based on Particle filter and Kalman filter. Comput. Electron. Agric. 2019, 157, 261–269. [Google Scholar] [CrossRef]
- Dong, W.; Roy, P.; Isler, V. Semantic mapping for orchard environments by merging two-sides reconstructions of tree rows. J. Field Robot. 2020, 37, 97–121. [Google Scholar] [CrossRef]
- Peng, H.; Guo, S.; Zou, X.; Wang, H.; Xiong, J.; Liang, Q. UAVO-NeRF: 3D reconstruction of orchards and semantic segmentation of fruit trees based on neural radiance field in UAV images. Comput. Electron. Agric. 2025, 237, 110631. [Google Scholar] [CrossRef]
- Pan, Y.; Hu, K.; Cao, H.; Kang, H.; Wang, X. A novel perception and semantic mapping method for robot autonomy in orchards. Comput. Electron. Agric. 2024, 219, 108769. [Google Scholar] [CrossRef]
- Fu, H.; Li, X.; Zhu, L.; Xin, P.; Wu, T.; Li, W.; Feng, Y. DSC-DeepLabv3+: A lightweight semantic segmentation model for weed identification in maize fields. Front. Plant Sci. 2025, 16, 1647736. [Google Scholar] [CrossRef] [PubMed]
- Sodano, M.; Magistri, F.; Marks, E.; Hosn, F.; Zurbayev, A.; Marcuzzi, R.; Malladi, M.V.; Behley, J.; Stachniss, C. 3D Hierarchical Panoptic Segmentation in Real Orchard Environments Across Different Sensors. arXiv 2025, arXiv:2503.13188. [Google Scholar] [CrossRef]
- Cuaran, J.; Ahluwalia, K.S.; Koe, K.; Uppalapati, N.K.; Chowdhary, G. Active Semantic Mapping with Mobile Manipulator in Horticultural Environments. In Proceedings of the 2025 IEEE International Conference on Robotics and Automation (ICRA), Atlanta, GA, USA, 19–23 May 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 12716–12722. [Google Scholar]
- Liu, Z.; Feng, Q.; Qin, C.; Lin, Y.; Xia, P.; Wang, H.; Gong, L.; Liu, C. EDSC-HRAFNet: An apple tree branch semantic segmentation model for harvesting robots under complex orchard conditions. Artif. Intell. Agric. 2026; in press. [Google Scholar] [CrossRef]
- Cheng, T.; Song, L.; Ge, Y.; Liu, W.; Wang, X.; Shan, Y. YOLO-World: Real-Time Open-Vocabulary Object Detection. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 19–21 June 2024. [Google Scholar]
- Zhao, X.; Ding, W.; An, Y.; Du, Y.; Yu, T.; Li, M.; Tang, M.; Wang, J. Fast Segment Anything. arXiv 2023, arXiv:2306.12156. [Google Scholar] [CrossRef]
- Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 652–660. [Google Scholar]
- Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep hierarchical feature learning on point sets in a metric space. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5105–5114. [Google Scholar]
- Hu, Q.; Yang, B.; Xie, L.; Rosa, S.; Guo, Y.; Wang, Z.; Trigoni, N.; Markham, A. Randla-net: Efficient semantic segmentation of large-scale point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 14–19 June 2020; pp. 11108–11117. [Google Scholar]
- Zhou, H.; Zhu, X.; Song, X.; Ma, Y.; Wang, Z.; Li, H.; Lin, D. Cylinder3d: An effective 3d framework for driving-scene lidar semantic segmentation. arXiv 2020, arXiv:2008.01550. [Google Scholar]
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 248–255. [Google Scholar]
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft coco: Common objects in context. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2014; pp. 740–755. [Google Scholar]
- Zheng, C.; Xu, W.; Zou, Z.; Hua, T.; Yuan, C.; He, D.; Zhou, B.; Liu, Z.; Lin, J.; Zhu, F.; et al. Fast-livo2: Fast, direct lidar-inertial-visual odometry. IEEE Trans. Robot. 2024, 41, 326–346. [Google Scholar] [CrossRef]
- Liu, Z.; Li, H.; Yuan, C.; Liu, X.; Lin, J.; Li, R.; Zheng, C.; Zhou, B.; Liu, W.; Zhang, F. Voxel-slam: A complete, accurate, and versatile lidar-inertial slam system. arXiv 2024, arXiv:2410.08935. [Google Scholar]
- Zheng, C.; Zhu, Q.; Xu, W.; Liu, X.; Guo, Q.; Zhang, F. Fast-livo: Fast and tightly-coupled sparse-direct lidar-inertial-visual odometry. In Proceedings of the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 23–27 October 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 4003–4009. [Google Scholar]
- Carion, N.; Gustafson, L.; Hu, Y.T.; Debnath, S.; Hu, R.; Suris, D.; Ryali, C.; Alwala, K.V.; Khedr, H.; Huang, A.; et al. SAM 3: Segment Anything with Concepts. arXiv 2025, arXiv:2511.16719. [Google Scholar] [CrossRef]
- Cheng, B.; Misra, I.; Schwing, A.G.; Kirillov, A.; Girdhar, R. Masked-attention Mask Transformer for Universal Image Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–20 June 2022; pp. 1290–1299. [Google Scholar]
- Vasconez, J.P.; Delpiano, J.; Vougioukas, S.; Cheein, F.A. Comparison of convolutional neural networks in fruit detection and counting: A comprehensive evaluation. Comput. Electron. Agric. 2020, 173, 105348. [Google Scholar] [CrossRef]
- Yuan, M.; Wang, L.; Waslander, S.L. Opennav: Open-world navigation with multimodal large language models. In Proceedings of the 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hangzhou, China, 19–25 October 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 18948–18955. [Google Scholar]
| Parameter | Value | Description |
|---|---|---|
| Confidence range | [0.55, 0.95] | Bayesian fusion score bounds |
| Voxel resolutions | 1.0, 0.5, 0.2 m | Multi-scale grid sizes |
| Label threshold | 0.6 | Dominant category ratio |
| Smoothing KNN | 10 | Boundary point neighbors |
| Smoothing threshold | 60% | Minimum vote for reassignment |
| Min 3D depth | 0.5 m | Detection box depth lower bound |
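Two of the tabulated postprocessing steps, dominant-label voting (threshold 0.6) and KNN boundary smoothing (k = 10, 60% vote), might be sketched as below. The function signatures and the precomputed neighbor map are illustrative assumptions:

```python
from collections import Counter

def dominant_label(labels, ratio=0.6):
    """Majority vote inside a voxel: accept the top label only when
    it covers at least `ratio` of the points; otherwise leave the
    voxel undecided (mixed)."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) >= ratio else None

def knn_smooth(point_labels, neighbors_of, k=10, vote=0.6):
    """Relabel a boundary point when at least `vote` of its k nearest
    neighbors (precomputed in `neighbors_of`) agree on one label."""
    smoothed = dict(point_labels)
    for p, nbrs in neighbors_of.items():
        nbrs = nbrs[:k]
        label, count = Counter(point_labels[n] for n in nbrs).most_common(1)[0]
        if count / len(nbrs) >= vote:
            smoothed[p] = label
    return smoothed
```

Voting at the voxel level suppresses isolated mislabeled points, while the KNN pass cleans ragged class boundaries left by 2D-to-3D projection.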

| Method | mIoU (%) | PA (%) | F1 (%) | FPS |
|---|---|---|---|---|
| BiSeNet | 78.61 | 87.75 | 88.0 | 115.15 |
| U-Net | 66.65 | 83.75 | 87.56 | 67.3 |
| DeepLabV3+ | 67.14 | 82.24 | 79.28 | 133.08 |
| FastSAM | 74.33 | 85.02 | 84.72 | 74.85 |

| Experiment ID | Actual Value (Count) | Predicted Value (Count) | Relative Error (%) |
|---|---|---|---|
| 1 | 24 | 19 | 20.8 |
| 2 | 20 | 20 | 0.0 |
| 3 | 19 | 16 | 15.7 |
| 4 | 19 | 17 | 10.5 |
| 5 | 20 | 23 | 15.0 |
| Mean | – | – | 12.4 |
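Assuming the tabulated errors are defined as |predicted − actual| / actual, a quick check reproduces the reported mean of roughly 12.4%:

```python
def relative_error(actual: int, predicted: int) -> float:
    """Percent counting error: |predicted - actual| / actual * 100."""
    return abs(predicted - actual) / actual * 100.0

# (actual, predicted) pairs from the five trials in the table:
trials = [(24, 19), (20, 20), (19, 16), (19, 17), (20, 23)]
mean_error = sum(relative_error(a, p) for a, p in trials) / len(trials)
```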

| Condition | Max (m) | Mean (m) | Median (m) | RMSE (m) | Std (m) |
|---|---|---|---|---|---|
| Slow motion, stable lighting | 1.256 | 0.529 | 0.434 | 0.628 | 0.337 |
| Fast motion, rapid lighting changes | 2.357 | 1.290 | 1.375 | 1.376 | 0.479 |
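The trajectory-error statistics in the table can be computed from per-pose absolute position errors as below. Whether the reported Std is the population or sample deviation is not stated, so this sketch uses the population form as an assumption:

```python
import math
import statistics

def ate_stats(errors):
    """Summary statistics (metres) over per-pose absolute position
    errors, matching the columns reported in the table."""
    return {
        "max": max(errors),
        "mean": statistics.fmean(errors),
        "median": statistics.median(errors),
        "rmse": math.sqrt(statistics.fmean(e * e for e in errors)),
        "std": statistics.pstdev(errors),  # population std (assumption)
    }
```

Note that with the population deviation, RMSE² = mean² + std², a useful consistency check on reported rows.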
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Wang, Q.; Chen, Y.; Li, J.; Chen, Y.; Wang, H. 3D Semantic Map Reconstruction for Orchard Environments Using Multi-Sensor Fusion. Agriculture 2026, 16, 455. https://doi.org/10.3390/agriculture16040455

