Search Results (69)

Search Parameters:
Keywords = stereo RGB images

28 pages, 4026 KiB  
Article
Multi-Trait Phenotypic Analysis and Biomass Estimation of Lettuce Cultivars Based on SFM-MVS
by Tiezhu Li, Yixue Zhang, Lian Hu, Yiqiu Zhao, Zongyao Cai, Tingting Yu and Xiaodong Zhang
Agriculture 2025, 15(15), 1662; https://doi.org/10.3390/agriculture15151662 - 1 Aug 2025
Viewed by 244
Abstract
To address the reliance of traditional methods on destructive sampling, the poor adaptability of fixed equipment, and the susceptibility of single-view measurements to occlusion, a non-destructive and portable device for three-dimensional phenotyping and biomass detection in lettuce was developed. Based on Structure-from-Motion Multi-View Stereo (SFM-MVS) algorithms, a high-precision three-dimensional point cloud model was reconstructed from multi-view RGB image sequences, and 12 phenotypic parameters, such as plant height and crown width, were accurately extracted. Regression analyses of plant height, crown width, and crown height yielded R2 values of 0.98, 0.99, and 0.99 and RMSE values of 2.26 mm, 1.74 mm, and 1.69 mm, respectively. On this basis, four biomass prediction models were developed using Adaptive Boosting (AdaBoost), Support Vector Regression (SVR), Gradient Boosting Decision Tree (GBDT), and Random Forest Regression (RFR). The results indicated that the RFR model based on the projected convex hull area, point cloud convex hull surface area, and projected convex hull perimeter performed best, with an R2 of 0.90, an RMSE of 2.63 g, and an RMSEn of 9.53%, indicating that RFR can accurately estimate lettuce biomass. This research achieves three-dimensional reconstruction and accurate biomass prediction of facility lettuce, and provides a portable and lightweight solution for facility crop growth monitoring. Full article
(This article belongs to the Section Crop Production)
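A minimal sketch of the biomass-regression stage described in the abstract, assuming the three point-cloud features (projected convex hull area, hull surface area, hull perimeter) have already been extracted per plant; the feature ranges, synthetic targets, and normalization of RMSEn below are placeholders, not the authors' data or exact protocol.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(0)
n = 120
# Columns: projected convex hull area, hull surface area, hull perimeter (synthetic values)
X = rng.uniform([50, 100, 20], [400, 900, 120], size=(n, 3))
y = 0.05 * X[:, 0] + 0.02 * X[:, 1] + rng.normal(0, 2.0, n)   # synthetic biomass (g)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
rfr = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
pred = rfr.predict(X_te)

rmse = mean_squared_error(y_te, pred) ** 0.5
# RMSEn here is RMSE normalized by the observed range (one common convention, assumed)
print(f"R2={r2_score(y_te, pred):.2f}  RMSE={rmse:.2f} g  RMSEn={100 * rmse / np.ptp(y_te):.1f}%")
```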

37 pages, 55522 KiB  
Article
EPCNet: Implementing an ‘Artificial Fovea’ for More Efficient Monitoring Using the Sensor Fusion of an Event-Based and a Frame-Based Camera
by Orla Sealy Phelan, Dara Molloy, Roshan George, Edward Jones, Martin Glavin and Brian Deegan
Sensors 2025, 25(15), 4540; https://doi.org/10.3390/s25154540 - 22 Jul 2025
Viewed by 240
Abstract
Efficient object detection is crucial to real-time monitoring applications such as autonomous driving or security systems. Modern RGB cameras can produce high-resolution images for accurate object detection. However, increased resolution results in increased network latency and power consumption. To minimise this latency, Convolutional Neural Networks (CNNs) often have a resolution limitation, requiring images to be down-sampled before inference, causing significant information loss. Event-based cameras are neuromorphic vision sensors with high temporal resolution, low power consumption, and high dynamic range, making them preferable to regular RGB cameras in many situations. This project proposes the fusion of an event-based camera with an RGB camera to mitigate the trade-off between temporal resolution and accuracy, while minimising power consumption. The cameras are calibrated to create a multi-modal stereo vision system where pixel coordinates can be projected between the event and RGB camera image planes. This calibration is used to project bounding boxes detected by clustering of events into the RGB image plane, thereby cropping each RGB frame instead of down-sampling to meet the requirements of the CNN. Using the Common Objects in Context (COCO) dataset evaluator, the average precision (AP) for the bicycle class in RGB scenes improved from 21.08 to 57.38. Additionally, AP increased across all classes from 37.93 to 46.89. To reduce system latency, a novel object detection approach is proposed where the event camera acts as a region proposal network, and a classification algorithm is run on the proposed regions. This achieved a 78% improvement over baseline. Full article
(This article belongs to the Section Sensing and Imaging)
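A sketch of the ROI hand-off idea in this abstract: a bounding box produced from event clusters is mapped into the RGB image plane and used to crop the frame rather than down-sampling it for the CNN. The fixed homography H stands in for the calibrated event-to-RGB mapping (an assumption for illustration); box coordinates and padding are hypothetical.

```python
import numpy as np

def project_box(box_xyxy, H):
    """Map an axis-aligned box through homography H and return its bounding box."""
    x0, y0, x1, y1 = box_xyxy
    corners = np.array([[x0, y0, 1], [x1, y0, 1], [x0, y1, 1], [x1, y1, 1]], dtype=float).T
    warped = H @ corners
    warped = warped[:2] / warped[2]               # back to inhomogeneous coordinates
    return warped[0].min(), warped[1].min(), warped[0].max(), warped[1].max()

def crop_for_cnn(rgb, box_xyxy, pad=16):
    """Crop the RGB frame around the projected box (with padding) for the detector."""
    h, w = rgb.shape[:2]
    x0, y0, x1, y1 = box_xyxy
    x0, y0 = max(int(x0) - pad, 0), max(int(y0) - pad, 0)
    x1, y1 = min(int(x1) + pad, w), min(int(y1) + pad, h)
    return rgb[y0:y1, x0:x1]

H = np.eye(3)                                     # placeholder for the calibrated mapping
rgb_frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
event_box = (400, 300, 560, 520)                  # hypothetical event-cluster box
crop = crop_for_cnn(rgb_frame, project_box(event_box, H))
print(crop.shape)
```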

18 pages, 4774 KiB  
Article
InfraredStereo3D: Breaking Night Vision Limits with Perspective Projection Positional Encoding and Groundbreaking Infrared Dataset
by Yuandong Niu, Limin Liu, Fuyu Huang, Juntao Ma, Chaowen Zheng, Yunfeng Jiang, Ting An, Zhongchen Zhao and Shuangyou Chen
Remote Sens. 2025, 17(12), 2035; https://doi.org/10.3390/rs17122035 - 13 Jun 2025
Viewed by 461
Abstract
In fields such as military reconnaissance, forest fire prevention, and autonomous driving at night, there is an urgent need for high-precision three-dimensional reconstruction in low-light or night environments. RGB cameras rely on external light to acquire remote sensing data, so image quality declines significantly and often fails to meet task requirements. Lidar-based methods perform poorly in rain and fog, at close range, and in scenarios that require thermal imaging data. In contrast, infrared cameras can effectively overcome these challenges because their imaging mechanism differs from that of RGB cameras and lidar. However, research on three-dimensional scene reconstruction from infrared images is relatively immature, especially in infrared binocular stereo matching. Two main challenges arise: first, there is no dataset dedicated to infrared binocular stereo matching; second, the lack of texture information in infrared images limits the extension of RGB-based methods to the infrared reconstruction problem. To solve these problems, this study first constructs an infrared binocular stereo matching dataset and then proposes an innovative transformer method based on perspective projection positional encoding to complete the infrared binocular stereo matching task. A stereo matching network combining a transformer with a cost volume is constructed. Existing work on transformer positional encoding usually adopts a parallel projection model to simplify computation. Our method is instead based on the actual perspective projection model, so that each pixel is associated with a different projection ray. This effectively addresses the difficulty of feature extraction and matching caused by insufficient texture information in infrared images and significantly improves matching accuracy. Experiments on the infrared binocular stereo matching dataset proposed in this paper demonstrate the effectiveness of the method. Full article
(This article belongs to the Collection Visible Infrared Imaging Radiometers and Applications)
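A sketch of one way to realize a perspective-projection positional encoding as described in the abstract: each pixel is assigned the normalized direction of its back-projected ray through the camera center, computed from the intrinsic matrix K. The intrinsics are placeholders, and how these ray channels are injected into the transformer is not specified here.

```python
import numpy as np

def ray_direction_encoding(height, width, K):
    u, v = np.meshgrid(np.arange(width), np.arange(height))          # pixel grid
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(float)   # homogeneous pixel coords
    rays = pix @ np.linalg.inv(K).T                                  # back-project: K^-1 [u v 1]^T
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)             # unit ray directions
    return rays                                                      # (H, W, 3) per-pixel encoding

K = np.array([[640.0, 0.0, 320.0],
              [0.0, 640.0, 256.0],
              [0.0, 0.0, 1.0]])                                      # placeholder intrinsics
enc = ray_direction_encoding(512, 640, K)
print(enc.shape, enc[0, 0])
```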

18 pages, 7939 KiB  
Article
Edge_MVSFormer: Edge-Aware Multi-View Stereo Plant Reconstruction Based on Transformer Networks
by Yang Cheng, Zhen Liu, Gongpu Lan, Jingjiang Xu, Ren Chen and Yanping Huang
Sensors 2025, 25(7), 2177; https://doi.org/10.3390/s25072177 - 29 Mar 2025
Cited by 1 | Viewed by 882
Abstract
With the rapid advancements in computer vision and deep learning, multi-view stereo (MVS) based on conventional RGB cameras has emerged as a promising and cost-effective tool for botanical research. However, existing methods often struggle to capture the intricate textures and fine edges of plants, resulting in suboptimal 3D reconstruction accuracy. To overcome this challenge, we propose Edge_MVSFormer, built on TransMVSNet, which focuses on improving the accuracy of plant leaf edge reconstruction. The model integrates an edge detection algorithm to augment the edge information fed to the network and introduces an edge-aware loss function that focuses the network's attention on edge regions, where depth estimation errors are markedly larger. Edge_MVSFormer was pre-trained on two public MVS datasets and fine-tuned with our private data of 10 model plants collected for this study. Experimental results on 10 test model plants demonstrated that, for depth images, the proposed algorithm reduces the edge error and overall reconstruction error by 2.20 ± 0.36 mm and 0.46 ± 0.07 mm, respectively. For point clouds, the edge and overall reconstruction errors were reduced by 0.13 ± 0.02 mm and 0.05 ± 0.02 mm, respectively. This study underscores the critical role of edge information in the precise reconstruction of plant MVS data. Full article
(This article belongs to the Section Sensing and Imaging)
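A minimal sketch of an edge-aware depth loss in the spirit of the abstract: the per-pixel L1 depth error is up-weighted wherever an edge map fires. The weighting scheme (1 + lambda * edge) and the tensor shapes are assumptions for illustration, not the paper's exact formulation.

```python
import torch

def edge_aware_l1(pred_depth, gt_depth, edge_mask, lam=4.0):
    """pred_depth, gt_depth: (B,1,H,W); edge_mask: (B,1,H,W) in [0,1] from an edge detector."""
    weight = 1.0 + lam * edge_mask                      # emphasize leaf-edge regions
    return (weight * (pred_depth - gt_depth).abs()).mean()

pred = torch.rand(2, 1, 64, 64)
gt = torch.rand(2, 1, 64, 64)
edges = (torch.rand(2, 1, 64, 64) > 0.9).float()        # placeholder edge map
print(edge_aware_l1(pred, gt, edges).item())
```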

20 pages, 13379 KiB  
Article
From Simulation to Field Validation: A Digital Twin-Driven Sim2real Transfer Approach for Strawberry Fruit Detection and Sizing
by Omeed Mirbod, Daeun Choi and John K. Schueller
AgriEngineering 2025, 7(3), 81; https://doi.org/10.3390/agriengineering7030081 - 17 Mar 2025
Cited by 1 | Viewed by 1862
Abstract
Typically, developing new digital agriculture technologies requires substantial on-site resources and data. However, the crop’s growth cycle provides only limited time windows for experiments and equipment validation. This study presents a photorealistic digital twin of a commercial-scale strawberry farm, coupled with a simulated ground vehicle, to address these constraints by generating high-fidelity synthetic RGB and LiDAR data. These data enable the rapid development and evaluation of a deep learning-based machine vision pipeline for fruit detection and sizing without continuously relying on real-field access. Traditional simulators often lack visual realism, leading many studies to mix real images or adopt domain adaptation methods to address the reality gap. In contrast, this work relies solely on photorealistic simulation outputs for training, eliminating the need for real images or specialized adaptation approaches. After training exclusively on images captured in the virtual environment, the model was tested on a commercial-scale strawberry farm using a physical ground vehicle. Two separate trials with field images resulted in F1-scores of 0.92 and 0.81 for detection and a sizing error of 1.4 mm (R2 = 0.92) when comparing image-derived diameters against caliper measurements. These findings indicate that a digital twin-driven sim2real transfer can offer substantial time and cost savings by refining crucial tasks such as stereo sensor calibration and machine learning model development before extensive real-field deployments. In addition, the study examined geometric accuracy and visual fidelity through systematic comparisons of LiDAR and RGB sensor outputs from the virtual and real farms. Results demonstrated close alignment in both topography and textural details, validating the digital twin’s ability to replicate intricate field characteristics, including raised bed geometry and strawberry plant distribution. The techniques developed and validated in this strawberry project have broad applicability across agricultural commodities, particularly for fruit and vegetable production systems. This study demonstrates that integrating digital twins with simulation tools can significantly reduce the need for resource-intensive field data collection while accelerating the development and refinement of agricultural robotics algorithms and hardware. Full article

19 pages, 26378 KiB  
Article
2D to 3D Human Skeleton Estimation Based on the Brown Camera Distortion Model and Constrained Optimization
by Lan Ma and Hua Huo
Electronics 2025, 14(5), 960; https://doi.org/10.3390/electronics14050960 - 27 Feb 2025
Viewed by 1404
Abstract
In the rapidly evolving field of computer vision and machine learning, 3D skeleton estimation is critical for applications such as motion analysis and human–computer interaction. While stereo cameras are commonly used to acquire 3D skeletal data, monocular RGB systems attract attention due to benefits including cost-effectiveness and simple deployment. However, persistent challenges remain in accurately inferring depth from 2D images and reconstructing 3D structures with monocular approaches. Current 2D-to-3D skeleton estimation methods rely heavily on training over large datasets while neglecting the intrinsic structure of the human body and the principles of camera imaging. To address this, this paper introduces an innovative 2D-to-3D gait skeleton estimation method that leverages the Brown camera distortion model and constrained optimization. Gait video was captured with an Azure Kinect depth camera, and the Azure Kinect Body Tracking SDK was employed to extract 2D and 3D joint positions. The camera's distortion properties were analyzed using the Brown camera distortion model, which suits this scenario, and an iterative method was applied to compensate for the distortion of the 2D skeleton joints. By integrating the geometric constraints of the human skeleton, an optimization algorithm was formulated to achieve precise 3D joint estimates. Finally, the framework was validated by comparing the estimated 3D joint coordinates with corresponding measurements captured by depth sensors. Experimental evaluations confirmed that this training-free approach achieved superior precision and stability compared to conventional methods. Full article
(This article belongs to the Special Issue 3D Computer Vision and 3D Reconstruction)
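A sketch of iteratively compensating Brown(-Conrady) radial and tangential distortion for a 2D joint, given distortion coefficients (k1, k2, p1, p2) and camera intrinsics. The coefficient values and intrinsics are placeholders, and the paper's exact iteration scheme and stopping criteria may differ.

```python
import numpy as np

def undistort_point(u, v, K, k1, k2, p1, p2, iters=10):
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    xd = (u - cx) / fx                      # distorted normalized coordinates
    yd = (v - cy) / fy
    x, y = xd, yd                           # initial guess: undistorted == distorted
    for _ in range(iters):                  # fixed-point iteration inverting the Brown model
        r2 = x * x + y * y
        radial = 1.0 + k1 * r2 + k2 * r2 * r2
        dx = 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
        dy = p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
        x = (xd - dx) / radial
        y = (yd - dy) / radial
    return fx * x + cx, fy * y + cy          # back to pixel coordinates

K = np.array([[600.0, 0, 320.0], [0, 600.0, 240.0], [0, 0, 1.0]])   # placeholder intrinsics
print(undistort_point(100.0, 80.0, K, k1=-0.12, k2=0.03, p1=0.001, p2=-0.0005))
```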

24 pages, 6085 KiB  
Article
Research on Apple Recognition and Localization Method Based on Deep Learning
by Zhipeng Zhao, Chengkai Yin, Ziliang Guo, Jian Zhang, Qing Chen and Ziyuan Gu
Agronomy 2025, 15(2), 413; https://doi.org/10.3390/agronomy15020413 - 6 Feb 2025
Cited by 1 | Viewed by 1022
Abstract
The development of robotic systems for apple picking is a crucial advancement in agricultural technology, particularly in light of ongoing labor shortages and the continuous evolution of automation technologies. Currently, during apple picking in complex environments, it is difficult to classify and identify an apple's growth pattern while simultaneously obtaining information about its attitude. In this paper, through the integration of deep learning and stereo vision technology, the growth pattern and attitude of apples in the natural environment are identified, and three-dimensional spatial positioning is realized. This study proposes a fusion recognition method based on an improved YOLOv7 for classifying apple growth morphology and locating the fruit. Firstly, the multi-scale feature fusion network is improved by adding a 160 × 160 feature scale layer in the backbone network to enhance the model's sensitivity to very small local features. Secondly, the CBAM attention mechanism is introduced to improve the network's attention to the target region of interest in the input image. Finally, the Soft-NMS algorithm is adopted, which prevents densely overlapping targets from being suppressed all at once and thus avoids missed detections. In addition, the UNet segmentation network is combined with minimum enclosing circle and rectangle features to obtain the unobstructed apple position. A depth image of the apple is obtained using an RGB-D camera, and the 3D coordinates of the apple picking point are obtained by combining the 2D coordinates in the RGB image with the depth value. The experimental results show that the recognition accuracy, recall, and average recognition precision of the improved YOLOv7 are 86.9%, 80.5%, and 87.1%, respectively, which are 4.2, 2.2, and 3.7 percentage points higher than the original YOLOv7 model; in addition, the average angular error of the apple attitude detection method is 3.964°, with an accuracy of 94%, and the error in the three-dimensional coordinate positioning of the apple is within the range of 0.01 mm–1.53 mm, which meets the demands of apple-picking system operation. The deep-learning-based stereo vision system constructed herein can effectively identify and locate apples and meets the vision system requirements for the automated picking task performed by an apple-picking robot, laying the foundation for lossless and efficient apple picking. Full article
(This article belongs to the Section Precision and Digital Agriculture)
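A sketch of the final localization step described in the abstract: the 2D picking-point pixel from the RGB image is combined with the aligned depth value and back-projected through the pinhole model to camera-frame 3D coordinates. The intrinsics and depth value are placeholders; real RGB-D cameras also require depth-to-color registration, which is omitted here.

```python
import numpy as np

def pixel_to_3d(u, v, depth_mm, K):
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    z = depth_mm                                     # depth along the optical axis
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])                       # camera-frame coordinates (mm)

K = np.array([[615.0, 0, 320.0], [0, 615.0, 240.0], [0, 0, 1.0]])   # placeholder intrinsics
print(pixel_to_3d(350, 210, depth_mm=820.0, K=K))
```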

21 pages, 12015 KiB  
Article
Segment Any Leaf 3D: A Zero-Shot 3D Leaf Instance Segmentation Method Based on Multi-View Images
by Yunlong Wang and Zhiyong Zhang
Sensors 2025, 25(2), 526; https://doi.org/10.3390/s25020526 - 17 Jan 2025
Cited by 2 | Viewed by 1346
Abstract
Exploring the relationships between plant phenotypes and genetic information requires advanced phenotypic analysis techniques for precise characterization. However, the diversity and variability of plant morphology challenge existing methods, which often fail to generalize across species and require extensive annotated data, especially for 3D datasets. This paper proposes a zero-shot 3D leaf instance segmentation method using RGB sensors. It extends the 2D segmentation model SAM (Segment Anything Model) to 3D through a multi-view strategy. RGB image sequences captured from multiple viewpoints are used to reconstruct 3D plant point clouds via multi-view stereo. HQ-SAM (High-Quality Segment Anything Model) segments leaves in 2D, and the segmentation is mapped to the 3D point cloud. An incremental fusion method based on confidence scores aggregates results from different views into a final output. Evaluated on a custom peanut seedling dataset, the method achieved point-level precision, recall, and F1 scores over 0.9 and object-level mIoU and precision above 0.75 under two IoU thresholds. The results show that the method achieves state-of-the-art segmentation quality while offering zero-shot capability and generalizability, demonstrating significant potential in plant phenotyping. Full article
(This article belongs to the Section Smart Agriculture)
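A sketch of the view-to-cloud fusion idea in this abstract: each reconstructed 3D point is projected into every calibrated view, the 2D instance label at that pixel is looked up, and confidence-weighted votes are accumulated so each point keeps the label with the highest total confidence. The camera model, data layout, and voting rule are assumptions for illustration, not the paper's exact pipeline.

```python
import numpy as np

def fuse_views(points, views):
    """points: (N,3) world coordinates. views: list of dicts with K (3x3), R (3x3), t (3,),
    mask (H,W) integer instance labels (0 = background), conf (H,W) per-pixel confidence."""
    votes = [dict() for _ in range(len(points))]
    for view in views:
        cam = view["R"] @ points.T + view["t"][:, None]          # world -> camera frame
        uvw = view["K"] @ cam                                    # project to the image plane
        u = np.round(uvw[0] / uvw[2]).astype(int)
        v = np.round(uvw[1] / uvw[2]).astype(int)
        h, w = view["mask"].shape
        ok = (uvw[2] > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        for i in np.flatnonzero(ok):
            label = int(view["mask"][v[i], u[i]])
            if label:                                            # skip background pixels
                votes[i][label] = votes[i].get(label, 0.0) + float(view["conf"][v[i], u[i]])
    return [max(d, key=d.get) if d else 0 for d in votes]        # per-point instance id
```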

24 pages, 31029 KiB  
Article
InCrowd-VI: A Realistic Visual–Inertial Dataset for Evaluating Simultaneous Localization and Mapping in Indoor Pedestrian-Rich Spaces for Human Navigation
by Marziyeh Bamdad, Hans-Peter Hutter and Alireza Darvishy
Sensors 2024, 24(24), 8164; https://doi.org/10.3390/s24248164 - 21 Dec 2024
Cited by 1 | Viewed by 1803
Abstract
Simultaneous localization and mapping (SLAM) techniques can support navigation for visually impaired people, but the development of robust SLAM solutions for crowded spaces is limited by the lack of realistic datasets. To address this, we introduce InCrowd-VI, a novel visual–inertial dataset specifically designed for human navigation in indoor pedestrian-rich environments. Recorded using Meta Aria Project glasses, it captures realistic scenarios without environmental control. InCrowd-VI features 58 sequences totaling 5 km of trajectory and 1.5 h of recording time, including RGB, stereo images, and IMU measurements. The dataset captures important challenges such as pedestrian occlusions, varying crowd densities, complex layouts, and lighting changes. Ground-truth trajectories, accurate to approximately 2 cm, are provided, derived from the Meta Aria Project machine perception SLAM service. In addition, a semi-dense 3D point cloud of the scene is provided for each sequence. The evaluation of state-of-the-art visual odometry (VO) and SLAM algorithms on InCrowd-VI revealed severe performance limitations in these realistic scenarios. Under challenging conditions, systems exceeded the required localization accuracy of 0.5 m and the 1% drift threshold, with classical methods showing drift of up to 5–10%. While deep learning-based approaches maintained high pose estimation coverage (>90%), they failed to achieve the real-time processing speeds necessary for walking-pace navigation. These results demonstrate the need for, and value of, a new dataset to advance SLAM research for visually impaired navigation in complex indoor environments. Full article
(This article belongs to the Section Sensors and Robotics)
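A sketch of the kind of metrics quoted in the abstract: absolute trajectory error for an aligned estimate, and drift expressed as final-position error over ground-truth path length. Trajectory alignment (e.g., a Umeyama fit) is omitted, and the trajectories below are synthetic placeholders.

```python
import numpy as np

def ate_rmse(est_xyz, gt_xyz):
    """Root-mean-square absolute trajectory error for already-aligned trajectories (N,3)."""
    return float(np.sqrt(np.mean(np.sum((est_xyz - gt_xyz) ** 2, axis=1))))

def drift_percent(est_xyz, gt_xyz):
    """Final-position error as a percentage of the ground-truth path length."""
    path_len = np.sum(np.linalg.norm(np.diff(gt_xyz, axis=0), axis=1))
    return 100.0 * np.linalg.norm(est_xyz[-1] - gt_xyz[-1]) / path_len

gt = np.cumsum(np.random.default_rng(1).normal(0, 0.1, (500, 3)), axis=0)   # synthetic walk
est = gt + np.random.default_rng(2).normal(0, 0.02, gt.shape)               # noisy estimate
print(f"ATE RMSE: {ate_rmse(est, gt):.3f} m   drift: {drift_percent(est, gt):.2f} %")
```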

21 pages, 16376 KiB  
Article
Triple-Camera Rectification for Depth Estimation Sensor
by Minkyung Jeon, Jinhong Park, Jin-Woo Kim and Sungmin Woo
Sensors 2024, 24(18), 6100; https://doi.org/10.3390/s24186100 - 20 Sep 2024
Cited by 1 | Viewed by 1481
Abstract
In this study, we propose a novel rectification method for three cameras using a single image for depth estimation. Stereo rectification serves as a fundamental preprocessing step for disparity estimation in stereoscopic cameras. However, off-the-shelf depth cameras often include an additional RGB camera for creating 3D point clouds. Existing rectification methods only align two cameras, necessitating an additional rectification and remapping process to align the third camera. Moreover, these methods require multiple reference checkerboard images for calibration and aim to minimize alignment errors, but often result in rotated images when there is significant misalignment between two cameras. In contrast, the proposed method simultaneously rectifies three cameras in a single shot without unnecessary rotation. To achieve this, we designed a lab environment with checkerboard settings and obtained multiple sample images from the cameras. The optimization function, designed specifically for rectification in stereo matching, enables the simultaneous alignment of all three cameras while ensuring performance comparable to traditional methods. Experimental results with real camera samples demonstrate the benefits of the proposed method and provide a detailed analysis of unnecessary rotations in the rectified images. Full article
(This article belongs to the Collection 3D Imaging and Sensing System)

18 pages, 4787 KiB  
Article
Estimating Bermudagrass Aboveground Biomass Using Stereovision and Vegetation Coverage
by Jasanmol Singh, Ali Bulent Koc, Matias Jose Aguerre, John P. Chastain and Shareef Shaik
Remote Sens. 2024, 16(14), 2646; https://doi.org/10.3390/rs16142646 - 19 Jul 2024
Cited by 3 | Viewed by 1143
Abstract
Accurate information about the amount of standing biomass is important in pasture management for monitoring forage growth patterns, minimizing the risk of overgrazing, and ensuring the necessary feed requirements of livestock. The morphological features of plants, like crop height and density, have been proven to be prominent predictors of crop yield. The objective of this study was to evaluate the effectiveness of stereovision-based crop height and vegetation coverage measurements in predicting the aboveground biomass yield of bermudagrass (Cynodon dactylon) in a pasture. Data were collected from 136 experimental plots within a 0.81 ha bermudagrass pasture using an RGB-depth camera mounted on a ground rover. The crop height was determined from the disparity between images captured by the two stereo cameras of the depth camera. The vegetation coverage was extracted from the RGB images using a machine learning algorithm that segments vegetative and non-vegetative pixels. After camera measurements, the plots were harvested and sub-sampled to measure the wet and dry biomass yields for each plot. The wet biomass yield prediction function based on crop height and vegetation coverage was generated using linear regression analysis. The results indicated that the combination of crop height and vegetation coverage showed a promising correlation with aboveground wet biomass yield. However, the prediction function based only on crop height showed smaller residuals at the extremes than the combined prediction function (crop height and vegetation coverage) and was thus declared the recommended approach (R2 = 0.91; SeY = 1824 kg-wet/ha). The crop height-based prediction function was used to estimate the dry biomass yield using the mean dry matter fraction. Full article
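A sketch of the recommended model form: a linear regression of wet biomass yield on stereo-derived crop height. The coefficients below are fitted to synthetic placeholder values, not the study's data; vegetation coverage could be added as a second predictor in the same way.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(3)
height_cm = rng.uniform(5, 45, 136).reshape(-1, 1)                  # plot-level crop height
wet_yield = 250.0 * height_cm.ravel() + rng.normal(0, 1500, 136)    # kg-wet/ha, synthetic

model = LinearRegression().fit(height_cm, wet_yield)
pred = model.predict(height_cm)
print(f"slope={model.coef_[0]:.1f}  intercept={model.intercept_:.1f}  "
      f"R2={r2_score(wet_yield, pred):.2f}")
```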

27 pages, 14613 KiB  
Article
A UAV-Based Single-Lens Stereoscopic Photography Method for Phenotyping the Architecture Traits of Orchard Trees
by Wenli Zhang, Xinyu Peng, Tingting Bai, Haozhou Wang, Daisuke Takata and Wei Guo
Remote Sens. 2024, 16(9), 1570; https://doi.org/10.3390/rs16091570 - 28 Apr 2024
Cited by 2 | Viewed by 1908
Abstract
This article addresses the challenges of measuring the 3D architecture traits, such as height and volume, of fruit tree canopies, information that is essential for assessing tree growth and informing orchard management. Traditional methods are time-consuming, prompting the need for efficient alternatives. Recent advancements in unmanned aerial vehicle (UAV) technology, particularly using Light Detection and Ranging (LiDAR) and RGB cameras, have emerged as promising solutions. LiDAR offers precise 3D data but is costly and computationally intensive. RGB photogrammetry techniques such as Structure from Motion and Multi-View Stereo (SfM-MVS) are a cost-effective alternative to LiDAR, but they remain computationally demanding. This paper introduces an innovative approach using UAV-based single-lens stereoscopic photography to overcome these limitations. The method utilizes color variations in canopies and a dual-image-input network to generate a detailed canopy height map (CHM). Additionally, a block structure similarity method is presented to enhance height estimation accuracy in single-lens UAV photography. As a result, the average rates of growth in canopy height (CH), canopy volume (CV), canopy width (CW), and canopy projection area (CPA) were 3.296%, 9.067%, 2.772%, and 5.541%, respectively. The R2 values of CH, CV, CW, and CPA were 0.9039, 0.9081, 0.9228, and 0.9303, respectively. In addition, compared to the commonly used SfM-MVS approach, the proposed method reduces the time cost of canopy reconstruction by 95.2% and the cost of the images needed for canopy reconstruction by 88.2%. This approach allows growers and researchers to apply UAV-based methods in actual orchard environments without incurring high computation costs. Full article

19 pages, 10016 KiB  
Article
LNMVSNet: A Low-Noise Multi-View Stereo Depth Inference Method for 3D Reconstruction
by Weiming Luo, Zongqing Lu and Qingmin Liao
Sensors 2024, 24(8), 2400; https://doi.org/10.3390/s24082400 - 9 Apr 2024
Cited by 6 | Viewed by 2532
Abstract
With the widespread adoption of modern RGB cameras, an abundance of RGB images is available everywhere. Therefore, multi-view stereo (MVS) 3D reconstruction has been extensively applied across various fields because of its cost-effectiveness and accessibility, which involves multi-view depth estimation and stereo matching algorithms. However, MVS tasks face noise challenges because of natural multiplicative noise and negative gain in algorithms, which reduce the quality and accuracy of the generated models and depth maps. Traditional MVS methods often struggle with noise, relying on assumptions that do not always hold true under real-world conditions, while deep learning-based MVS approaches tend to suffer from high noise sensitivity. To overcome these challenges, we introduce LNMVSNet, a deep learning network designed to enhance local feature attention and fuse features across different scales, aiming for low-noise, high-precision MVS 3D reconstruction. Through extensive evaluation of multiple benchmark datasets, LNMVSNet has demonstrated its superior performance, showcasing its ability to improve reconstruction accuracy and completeness, especially in the recovery of fine details and clear feature delineation. This advancement brings hope for the widespread application of MVS, ranging from precise industrial part inspection to the creation of immersive virtual environments. Full article
(This article belongs to the Special Issue 3D Reconstruction with RGB-D Cameras and Multi-sensors)

22 pages, 13859 KiB  
Article
Stereo Vision for Plant Detection in Dense Scenes
by Thijs Ruigrok, Eldert J. van Henten and Gert Kootstra
Sensors 2024, 24(6), 1942; https://doi.org/10.3390/s24061942 - 18 Mar 2024
Cited by 2 | Viewed by 2138
Abstract
Automated precision weed control requires visual methods to discriminate between crops and weeds. State-of-the-art plant detection methods fail to reliably detect weeds, especially in dense and occluded scenes. In the past, using hand-crafted detection models, both color (RGB) and depth (D) data were used for plant detection in dense scenes. Remarkably, the combination of color and depth data is not widely used in current deep learning-based vision systems in agriculture. Therefore, we collected an RGB-D dataset using a stereo vision camera. The dataset contains sugar beet crops in multiple growth stages with varying weed densities. This dataset was made publicly available and was used to evaluate two novel plant detection models: the D-model, which uses depth data as input, and the CD-model, which uses both color and depth data as inputs. For ease of use with existing 2D deep learning architectures, the depth data were transformed into a 2D image using color encoding. As a reference model, the C-model, which uses only color data as input, was included. The limited availability of suitable training data for depth images demands the use of data augmentation and transfer learning. Using our three detection models, we studied the effectiveness of data augmentation and transfer learning for depth data transformed to 2D images. It was found that geometric data augmentation and transfer learning were equally effective for both the reference model and the novel models using the depth data. This demonstrates that combining color-encoded depth data with geometric data augmentation and transfer learning can improve the RGB-D detection model. However, when testing our detection models on the use case of volunteer potato detection in sugar beet farming, it was found that the addition of depth data did not improve plant detection at high vegetation densities. Full article
(This article belongs to the Special Issue Intelligent Sensing and Machine Vision in Precision Agriculture)
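A sketch of turning a single-channel depth map into a 3-channel color image so it can be fed to a standard 2D detection backbone, as the abstract describes. The jet colormap and the depth range used for normalization are assumptions; the paper's exact encoding is not specified here.

```python
import cv2
import numpy as np

def colorize_depth(depth_m, d_min=0.3, d_max=1.5):
    depth = np.clip(depth_m, d_min, d_max)
    depth_8u = ((depth - d_min) / (d_max - d_min) * 255).astype(np.uint8)   # normalize to 0-255
    return cv2.applyColorMap(depth_8u, cv2.COLORMAP_JET)                    # (H, W, 3) BGR image

depth = np.random.default_rng(4).uniform(0.3, 1.5, (480, 640)).astype(np.float32)  # placeholder
rgb_like = colorize_depth(depth)
print(rgb_like.shape, rgb_like.dtype)
```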

16 pages, 11571 KiB  
Article
Design and Characterization of a Powered Wheelchair Autonomous Guidance System
by Vincenzo Gallo, Irida Shallari, Marco Carratù, Valter Laino and Consolatina Liguori
Sensors 2024, 24(5), 1581; https://doi.org/10.3390/s24051581 - 29 Feb 2024
Cited by 4 | Viewed by 1578
Abstract
The current technological revolution driven by advances in machine learning has motivated a wide range of applications aiming to improve our quality of life. Representative of such applications are autonomous and semiautonomous Powered Wheelchairs (PWs), where the focus is on providing the wheelchair user with a degree of autonomy in guidance and interaction with the environment. From this perspective, the current research focuses on designing lightweight systems that provide the necessary navigation accuracy while enabling an embedded implementation. This motivated us to develop a real-time measurement methodology that relies on a monocular RGB camera to detect the caregiver's feet using a deep learning method and then measure the caregiver's distance from the PW. An important contribution of this article is the metrological characterization of the proposed methodology in comparison with measurements made with dedicated depth cameras. Our results show that, despite shifting from 3D to 2D imaging, distance estimation achieves metrological performance comparable to Light Detection and Ranging (LiDAR) and even better than stereo cameras. In particular, we obtained instrument classes comparable to LiDAR and stereo cameras, with measurement uncertainties on the order of 10 cm. This is complemented by a significant reduction in data volume and object detection complexity, which facilitates deployment thanks to the simpler initial calibration and positioning required compared with three-dimensional segmentation algorithms. Full article
(This article belongs to the Collection Instrument and Measurement)
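An illustrative monocular geometry, not necessarily the paper's method: if the camera height above the floor and its pitch are known, the image row of a detected foot point can be converted into a ground-plane distance under a flat-floor assumption. All parameter values below are placeholders.

```python
import numpy as np

def ground_distance(v_pixel, fy, cy, cam_height_m, pitch_rad):
    """Distance along the floor to the point imaged at row v_pixel (feet on the ground)."""
    ray_angle = np.arctan2(v_pixel - cy, fy) + pitch_rad     # angle of the ray below the horizon
    if ray_angle <= 0:
        return float("inf")                                  # ray does not hit the floor ahead
    return cam_height_m / np.tan(ray_angle)

print(f"{ground_distance(v_pixel=420, fy=600.0, cy=240.0, cam_height_m=1.0, pitch_rad=0.05):.2f} m")
```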
