Search Results (181)

Search Parameters:
Keywords = binocular camera

28 pages, 7272 KiB  
Article
Dynamic Object Detection and Non-Contact Localization in Lightweight Cattle Farms Based on Binocular Vision and Improved YOLOv8s
by Shijie Li, Shanshan Cao, Peigang Wei, Wei Sun and Fantao Kong
Agriculture 2025, 15(16), 1766; https://doi.org/10.3390/agriculture15161766 - 18 Aug 2025
Viewed by 360
Abstract
The real-time detection and localization of dynamic targets in cattle farms are crucial for the effective operation of intelligent equipment. To overcome the limitations of wearable devices, including high costs and operational stress, this paper proposes a lightweight, non-contact solution. The goal is to improve the accuracy and efficiency of target localization while reducing the complexity of the system. A novel approach is introduced based on YOLOv8s, incorporating a C2f_DW_StarBlock module. The system fuses binocular images from a ZED2i camera with GPS and IMU data to form a multimodal ranging and localization module. Experimental results demonstrate a 36.03% reduction in model parameters, a 33.45% decrease in computational complexity, and a 38.67% reduction in model size. The maximum ranging error is 4.41%, with localization standard deviations of 1.02 m (longitude) and 1.10 m (latitude). The model is successfully integrated into an ROS system, achieving stable real-time performance. This solution offers the advantages of being lightweight, non-contact, and low-maintenance, providing strong support for intelligent farm management and multi-target monitoring.
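Not from the paper: a minimal sketch of the rectified-stereo ranging relation such a binocular module relies on (Z = f·B/d). The focal length, baseline, and disparity values below are illustrative, not the ZED2i's actual calibration.

```python
import numpy as np

def depth_from_disparity(disparity_px: np.ndarray, focal_px: float, baseline_m: float) -> np.ndarray:
    """Pinhole stereo relation for a rectified pair: Z = f * B / d."""
    d = np.where(disparity_px > 0, disparity_px, np.nan)  # mask invalid disparities
    return focal_px * baseline_m / d

# Illustrative values only: a 42 px disparity with f = 1050 px, B = 0.12 m
print(depth_from_disparity(np.array([42.0]), focal_px=1050.0, baseline_m=0.12))  # ~[3.0] m
```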

26 pages, 15535 KiB  
Article
BCA-MVSNet: Integrating BIFPN and CA for Enhanced Detail Texture in Multi-View Stereo Reconstruction
by Ning Long, Zhengxu Duan, Xiao Hu and Mingju Chen
Electronics 2025, 14(15), 2958; https://doi.org/10.3390/electronics14152958 - 24 Jul 2025
Viewed by 247
Abstract
The 3D point cloud generated by MVSNet has good scene integrity but lacks sensitivity to detail, causing holes and non-dense areas in flat and weakly textured regions. To address this problem and enrich the point cloud in weak-texture areas, the BCA-MVSNet network is proposed in this paper. The network integrates the Bidirectional Feature Pyramid Network (BiFPN) into the feature processing of the MVSNet backbone to accurately extract the features of weak-texture regions. In the feature map fusion stage, the Coordinate Attention (CA) mechanism is introduced into the 3D U-Net to obtain direction-aware positional information along the channel dimension, improving detail feature extraction, optimizing the depth map, and increasing depth accuracy. The experimental results show that BCA-MVSNet not only improves the accuracy of detail texture reconstruction but also keeps the computational overhead in check. On the DTU dataset, the Overall and Comp metrics of BCA-MVSNet are reduced by 10.2% and 2.6%, respectively; on the Tanks and Temples dataset, the Mean metric across the eight scenarios is improved by 6.51%. Three scenes were captured with a binocular camera, and combining the camera parameters with the BCA-MVSNet model yields excellent reconstruction quality in weak-texture areas.

18 pages, 2592 KiB  
Article
A Minimal Solution for Binocular Camera Relative Pose Estimation Based on the Gravity Prior
by Dezhong Chen, Kang Yan, Hongping Zhang and Zhenbao Yu
Remote Sens. 2025, 17(15), 2560; https://doi.org/10.3390/rs17152560 - 23 Jul 2025
Viewed by 260
Abstract
High-precision positioning is the foundation for the functionality of various intelligent agents. In complex environments, such as urban canyons, relative pose estimation using cameras is a crucial step in high-precision positioning. To take advantage of the ability of an inertial measurement unit (IMU) to provide relatively accurate gravity prior information over a short period, we propose a minimal solution method for the relative pose estimation of a stereo camera system assisted by the IMU. We rigidly connect the IMU to the camera system and use it to obtain the rotation matrices in the roll and pitch directions for the entire system, thereby reducing the minimum number of corresponding points required for relative pose estimation. In contrast to classic pose-estimation algorithms, our method can also calculate the camera focal length, which greatly expands its applicability. We constructed a simulated dataset and used it to compare and analyze the numerical stability of the proposed method and the impact of different levels of noise on algorithm performance. We also collected real-scene data using a drone and validated the proposed algorithm. The results on real data reveal that our method exhibits smaller errors in calculating the relative pose of the camera system compared with two classic reference algorithms. It achieves higher precision and stability and can provide a comparatively accurate camera focal length.
(This article belongs to the Section Urban Remote Sensing)
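To make the gravity-prior idea concrete, here is the standard reduction such minimal solvers build on (notation mine, not the authors'): once the IMU supplies roll and pitch, each camera frame can be pre-rotated into a gravity-aligned frame, leaving a single unknown yaw angle.

```latex
% Gravity-aligned reduction (sketch; notation mine). R_{g_1}, R_{g_2} are the
% known roll/pitch corrections from the IMU; only the yaw \theta is unknown.
\[
  R \;=\; R_{g_2}^{\top}\, R_y(\theta)\, R_{g_1},
  \qquad
  R_y(\theta) \;=\;
  \begin{pmatrix}
    \cos\theta & 0 & \sin\theta\\
    0 & 1 & 0\\
    -\sin\theta & 0 & \cos\theta
  \end{pmatrix},
  \qquad
  E \;=\; [\,t\,]_{\times} R .
\]
% The essential matrix then depends only on \theta and t, so fewer point
% correspondences are needed, and an unknown focal length can be folded in
% as one extra unknown -- consistent with the paper's claim of recovering f.
```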

20 pages, 3688 KiB  
Article
Intelligent Fruit Localization and Grasping Method Based on YOLO VX Model and 3D Vision
by Zhimin Mei, Yifan Li, Rongbo Zhu and Shucai Wang
Agriculture 2025, 15(14), 1508; https://doi.org/10.3390/agriculture15141508 - 13 Jul 2025
Viewed by 625
Abstract
Recent years have seen significant interest among agricultural researchers in using robotics and machine vision to enhance intelligent orchard harvesting efficiency. This study proposes an improved hybrid framework integrating YOLO VX deep learning, 3D object recognition, and SLAM-based navigation for harvesting ripe fruits in greenhouse environments, achieving servo control of robotic arms with flexible end-effectors. The method comprises three key components: First, a fruit sample database containing varying maturity levels and morphological features is established and interfaced with an optimized YOLO VX model for target fruit identification. Second, a 3D camera acquires the target fruit's spatial position and orientation data in real time, and these data are stored in the collaborative robot's microcontroller. Finally, employing binocular calibration and triangulation, the SLAM navigation module guides the robotic arm to the designated picking location via unobstructed target positioning. Comprehensive comparative experiments between the improved YOLO v12n model and earlier versions were conducted to validate its performance. The results demonstrate that the optimized model surpasses traditional recognition and harvesting methods, offering a faster target fruit identification response (minimum 30.9 ms) and significantly higher accuracy (91.14%).
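As a concrete illustration of the binocular calibration-and-triangulation step described above, here is a minimal sketch using OpenCV's triangulation; the projection matrices and pixel coordinates are invented for the example, not taken from the paper.

```python
import cv2
import numpy as np

# Illustrative rectified stereo rig: f = 1000 px, principal point (640, 360),
# 60 mm baseline. P = K [R | t] from stereo calibration.
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.06], [0.0], [0.0]])])

# Matched pixel of a detected fruit center in the left/right images (2 x N).
xl = np.array([[700.0], [400.0]])
xr = np.array([[650.0], [400.0]])

Xh = cv2.triangulatePoints(P1, P2, xl, xr)      # homogeneous 4 x 1 result
X = (Xh[:3] / Xh[3]).ravel()                    # metric 3D point for the arm
print(X)  # ~[0.072, 0.048, 1.2] m
```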

30 pages, 9360 KiB  
Article
Dynamic Positioning and Optimization of Magnetic Target Based on Binocular Vision
by Jing Li, Yang Wang, Ligang Qu, Guangming Lv and Zhenyu Cao
Machines 2025, 13(7), 592; https://doi.org/10.3390/machines13070592 - 8 Jul 2025
Viewed by 222
Abstract
Aiming at the problems of visual occlusion, reduced positioning accuracy, and pose loss during the dynamic scanning of large aviation components, this paper proposes a binocular vision dynamic positioning method based on magnetic targets. The method detects the spatial coordinates of the magnetic targets in real time through the binocular camera, extracts the target centers to construct a unified reference frame for the measurement platform, and uses MATLAB simulation to analyze the influence of different target layouts on scanning stability and positioning accuracy. On this basis, a dual-objective optimization model is established that minimizes the number of targets while maximizing the uniformity of their spatial distribution, and Monte Carlo simulation is used to evaluate robustness under Gaussian noise and random frame loss. Experimental results on the C-Track optical tracking platform show that the optimized magnetic target layout reduces the rotation error of dynamic scanning from 0.055° to 0.035° and the translation error from 0.31 mm to 0.162 mm, while increasing scanning efficiency by 33%, significantly improving the positioning accuracy and tracking stability of the system under complex working conditions. This method provides an effective solution for the high-precision dynamic measurement of large aviation components.
(This article belongs to the Section Robotics, Mechatronics and Intelligent Machines)
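A minimal sketch of the kind of Monte Carlo robustness check the abstract describes, assuming Gaussian noise on target centers and random frame loss; the centroid-spread error proxy and all values are simplifications of mine, not the paper's error model.

```python
import numpy as np

rng = np.random.default_rng(0)

def layout_robustness(targets_mm, noise_sigma=0.1, drop_prob=0.1, trials=2000):
    """Perturb target centers with Gaussian noise and randomly drop targets
    (simulating frame loss), then measure the spread of the recovered centroid
    as a crude proxy for pose stability under a given layout."""
    errors = []
    for _ in range(trials):
        keep = rng.random(len(targets_mm)) > drop_prob
        if keep.sum() < 3:                      # a pose needs >= 3 targets
            continue
        noisy = targets_mm[keep] + rng.normal(0, noise_sigma, (keep.sum(), 3))
        errors.append(np.linalg.norm(noisy.mean(0) - targets_mm[keep].mean(0)))
    return float(np.mean(errors))

layout = rng.uniform(0, 1000, (8, 3))           # 8 candidate magnetic targets (mm)
print(layout_robustness(layout))
```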

21 pages, 33500 KiB  
Article
Location Research and Picking Experiment of an Apple-Picking Robot Based on Improved Mask R-CNN and Binocular Vision
by Tianzhong Fang, Wei Chen and Lu Han
Horticulturae 2025, 11(7), 801; https://doi.org/10.3390/horticulturae11070801 - 6 Jul 2025
Viewed by 517
Abstract
With the advancement of agricultural automation technologies, apple-harvesting robots have gradually become a focus of research. As their “perceptual core,” machine vision systems directly determine picking success rates and operational efficiency. However, existing vision systems still exhibit significant shortcomings in target detection and positioning accuracy in complex orchard environments (e.g., uneven illumination, foliage occlusion, and fruit overlap), which hinders practical applications. This study proposes a vision system for apple-harvesting robots based on an improved Mask R-CNN and binocular vision to achieve more precise fruit positioning. The binocular camera (ZED2i) carried by the robot acquires dual-channel apple images. The improved Mask R-CNN performs instance segmentation of apple targets in the binocular images, followed by a template-matching algorithm with parallel epipolar constraints for stereo matching. Four pairs of feature points from corresponding apples in the binocular images are selected to calculate disparity and depth. Experimental results demonstrate an average coefficient of variation of 5.09% and a positioning accuracy of 99.61% in binocular positioning. During harvesting operations with a self-designed apple-picking robot, the single-image processing time was 0.36 s, the average single harvesting cycle took 7.7 s, and the overall harvesting success rate reached 94.3%. This work presents a novel high-precision visual positioning method for apple-harvesting robots.
(This article belongs to the Section Fruit Production Systems)
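A minimal sketch of template matching under the parallel epipolar constraint the abstract mentions: in a rectified pair, the match for a left-image patch lies on the same image rows in the right image, so the search reduces to a horizontal band. The function name, search width, and NCC score are illustrative choices of mine.

```python
import cv2

def match_along_epipolar(left_gray, right_gray, box, search_px=120):
    """Rectified-pair stereo matching: search only along the template's own
    rows in the right image (the epipolar constraint), scoring with NCC."""
    x, y, w, h = box                            # apple bounding box, left image
    template = left_gray[y:y + h, x:x + w]
    x0 = max(0, x - search_px)                  # apples shift left in right view
    band = right_gray[y:y + h, x0:x + w]        # same rows only
    scores = cv2.matchTemplate(band, template, cv2.TM_CCOEFF_NORMED)
    _, _, _, best = cv2.minMaxLoc(scores)
    disparity = x - (x0 + best[0])
    return disparity                            # then Z = f * B / disparity
```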

20 pages, 7167 KiB  
Article
Drone-Based 3D Thermal Mapping of Urban Buildings for Climate-Responsive Planning
by Haowen Yan, Bo Zhao, Yaxing Du and Jiajia Hua
Sustainability 2025, 17(12), 5600; https://doi.org/10.3390/su17125600 - 18 Jun 2025
Viewed by 582
Abstract
The urban thermal environment is directly linked to the health and comfort of local residents, as well as to energy consumption. Drone-based thermal infrared image acquisition provides an efficient and flexible way of assessing urban heat distribution, thereby supporting climate-resilient and sustainable urban development. Here, we present an approach that uses a drone-mounted thermal infrared camera for high-resolution building wall temperature measurement and achieves centimeter accuracy. Following binocular vision theory, a three-dimensional (3D) reconstruction of the thermal infrared images is first conducted, and the two-dimensional building wall temperature is then extracted. Real-world validation shows that our approach can measure wall temperature within a 5 °C error, which confirms its reliability. A field measurement of Yuquanting in Xiong'an New Area, China during three time periods, i.e., morning (7:00–8:00), noon (13:00–14:00) and evening (18:00–19:00), was used as a case study to demonstrate our approach. The results show that during the heating season the building wall temperature was highest at noon and lowest in the evening, driven mostly by solar radiation. The highest wall temperature at noon was 55 °C, on a wall under direct solar radiation. The maximum wall temperature differences were 39 °C, 55 °C, and 20 °C in the morning, noon, and evening periods, respectively. Lighter wall coating colors tended to yield lower temperatures than darker ones. Beyond this application, the approach has potential as a foundational element of future autonomous thermal environment measurement systems.
(This article belongs to the Special Issue Air Pollution Control and Sustainable Urban Climate Resilience)

18 pages, 4774 KiB  
Article
InfraredStereo3D: Breaking Night Vision Limits with Perspective Projection Positional Encoding and Groundbreaking Infrared Dataset
by Yuandong Niu, Limin Liu, Fuyu Huang, Juntao Ma, Chaowen Zheng, Yunfeng Jiang, Ting An, Zhongchen Zhao and Shuangyou Chen
Remote Sens. 2025, 17(12), 2035; https://doi.org/10.3390/rs17122035 - 13 Jun 2025
Viewed by 514
Abstract
In fields such as military reconnaissance, forest fire prevention, and autonomous driving at night, there is an urgent need for high-precision three-dimensional reconstruction in low-light or night environments. RGB cameras rely on external light to acquire remote sensing data, so image quality declines significantly and the task requirements are difficult to meet. Lidar-based methods image poorly in rainy and foggy weather, in close-range scenes, and in scenarios requiring thermal imaging data. In contrast, infrared cameras can effectively overcome these challenges because their imaging mechanism differs from that of RGB cameras and lidar. However, research on three-dimensional scene reconstruction from infrared images is relatively immature, especially in the field of infrared binocular stereo matching. This situation presents two main challenges: first, there is no dataset specifically for infrared binocular stereo matching; second, the lack of texture information in infrared images limits the extension of RGB methods to the infrared reconstruction problem. To solve these problems, this study first constructs an infrared binocular stereo matching dataset and then proposes an innovative transformer method based on perspective projection positional encoding to complete the infrared binocular stereo matching task. A stereo matching network combining a transformer with a cost volume is constructed. Existing work on transformer positional encoding usually adopts a parallel projection model to simplify the calculation. Our method is based on the actual perspective projection model, so each pixel is associated with a different projection ray. This effectively solves the feature extraction and matching problems caused by insufficient texture information in infrared images and significantly improves matching accuracy. Experiments on the infrared binocular stereo matching dataset proposed in this paper demonstrate the effectiveness of the method.
(This article belongs to the Collection Visible Infrared Imaging Radiometers and Applications)
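To illustrate the contrast the abstract draws between parallel and perspective projection, here is a minimal sketch of per-pixel ray directions computed from camera intrinsics; feeding these to a transformer as positional encoding captures the general idea, while the paper's exact encoding may differ.

```python
import numpy as np

def perspective_ray_encoding(K: np.ndarray, h: int, w: int) -> np.ndarray:
    """Each pixel (u, v) maps to the unit direction of K^{-1} [u, v, 1]^T, so
    every pixel gets a distinct ray -- unlike a parallel-projection model,
    where all rays share one direction."""
    u, v = np.meshgrid(np.arange(w, dtype=float), np.arange(h, dtype=float))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x HW
    rays = np.linalg.inv(K) @ pix
    rays /= np.linalg.norm(rays, axis=0, keepdims=True)
    return rays.T.reshape(h, w, 3)   # (H, W, 3) positional encoding input
```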

19 pages, 2465 KiB  
Article
The Design and Implementation of a Dynamic Measurement System for a Large Gear Rotation Angle Based on an Extended Visual Field
by Po Du, Zhenyun Duan, Jing Zhang, Wenhui Zhao, Engang Lai and Guozhen Jiang
Sensors 2025, 25(12), 3576; https://doi.org/10.3390/s25123576 - 6 Jun 2025
Cited by 1 | Viewed by 552
Abstract
High-precision measurement of large gear rotation angles is a critical technology in gear meshing-based measurement systems. To address the challenge of high-precision rotation angle measurement for large gears, this paper proposes a binocular vision method. The methodology consists of the following steps: First, sub-pixel edges of the calibration circles on a 2D dot-matrix calibration board are extracted using edge detection algorithms to obtain the pixel coordinates of the circle centers. Second, high-precision calibration of the measurement reference plate is achieved through a 2D four-parameter coordinate transformation algorithm. Third, binocular cameras capture images of the measurement reference plates attached to the large gear before and after rotation. The coordinates of the camera's field-of-view center in the measurement reference plate coordinate system are calculated via image processing and rotation angle algorithms, thereby determining the rotation angle of the large gear. Finally, a binocular vision rotation angle measurement system was developed, and experiments were conducted on a 600 mm-diameter gear to validate the feasibility of the proposed method. The results demonstrate a measurement accuracy of 7 arcseconds (7″) and a repeatability of 3 arcseconds (3″) within the 0–30° rotation range, indicating high accuracy and stability. The proposed method and system effectively meet the requirements for high-precision rotation angle measurement of large gears.
(This article belongs to the Section Physical Sensors)
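For reference, the 2D four-parameter (similarity) transformation named in the abstract has the standard form below (notation mine); with a = s cos α and b = s sin α it is linear in the unknowns, so two or more circle centers determine it by least squares.

```latex
\[
  \begin{pmatrix} x' \\ y' \end{pmatrix}
  =
  \begin{pmatrix} t_x \\ t_y \end{pmatrix}
  + s
  \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix}
  \begin{pmatrix} x \\ y \end{pmatrix}
  =
  \begin{pmatrix} t_x \\ t_y \end{pmatrix}
  +
  \begin{pmatrix} a & -b \\ b & a \end{pmatrix}
  \begin{pmatrix} x \\ y \end{pmatrix},
  \qquad a = s\cos\alpha,\; b = s\sin\alpha .
\]
```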

25 pages, 3655 KiB  
Article
A Multi-Sensor Fusion Approach Combined with RandLA-Net for Large-Scale Point Cloud Segmentation in Power Grid Scenario
by Tianyi Li, Shuanglin Li, Zihan Xu, Nizar Faisal Alkayem, Qiao Bao and Qiang Wang
Sensors 2025, 25(11), 3350; https://doi.org/10.3390/s25113350 - 26 May 2025
Viewed by 835
Abstract
With the continuous expansion of power grids, traditional manual inspection methods face numerous challenges, including low efficiency, high costs, and significant safety risks. As critical infrastructure in power transmission systems, power grid towers require intelligent recognition and monitoring to ensure the reliable and stable operation of power grids. However, existing methods struggle with accuracy and efficiency when processing large-scale point cloud data in complex environments. To address these challenges, this paper presents a comprehensive approach combining multi-sensor fusion and deep learning for power grid tower recognition. A data acquisition scheme that integrates LiDAR and a binocular depth camera, implemented with the FAST-LIO algorithm, is proposed to achieve the spatiotemporal synchronization and fusion of sensor data. This integration enables the construction of a colored point cloud dataset with rich visual and geometric features. Based on the RandLA-Net framework, an efficient processing method for large-scale point cloud segmentation is developed and explicitly optimized for power grid tower scenarios. Experimental validation demonstrates that the proposed method achieves 90.8% precision in tower body recognition and maintains robust performance under various environmental conditions. The approach successfully processes point clouds containing over ten million points while handling challenges such as uneven point distribution and environmental interference. These results validate the reliability of the proposed method in providing technical support for the intelligent inspection and management of power grid infrastructure.
(This article belongs to the Special Issue Progress in LiDAR Technologies and Applications)
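A minimal sketch of the sensor-fusion step that produces a colored point cloud: project time-synchronized LiDAR points into the camera image and attach RGB. The extrinsic matrix T_cam_lidar and intrinsics K are assumed known from calibration; all names are illustrative, not the paper's code.

```python
import numpy as np

def colorize_points(points_lidar, image, K, T_cam_lidar):
    """Attach RGB to LiDAR points by projecting them through the camera model.
    points_lidar: (N, 3); image: (H, W, 3); K: (3, 3); T_cam_lidar: (4, 4)."""
    n = points_lidar.shape[0]
    pts_h = np.hstack([points_lidar, np.ones((n, 1))])       # homogeneous N x 4
    pts_cam = (T_cam_lidar @ pts_h.T)[:3]                    # 3 x N, camera frame
    front = pts_cam[2] > 0                                   # keep points ahead
    uv = K @ pts_cam[:, front]
    uv = np.round(uv[:2] / uv[2]).astype(int)
    h, w = image.shape[:2]
    ok = (uv[0] >= 0) & (uv[0] < w) & (uv[1] >= 0) & (uv[1] < h)
    colors = image[uv[1, ok], uv[0, ok]]                     # sample pixel RGB
    return np.hstack([points_lidar[front][ok], colors])      # (M, 6) XYZRGB
```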

17 pages, 5356 KiB  
Article
A Study on the Features for Multi-Target Dual-Camera Tracking and Re-Identification in a Comparatively Small Environment
by Jong-Chen Chen, Po-Sheng Chang and Yu-Ming Huang
Electronics 2025, 14(10), 1984; https://doi.org/10.3390/electronics14101984 - 13 May 2025
Viewed by 634
Abstract
Tracking across multiple cameras is a complex problem in computer vision. Its main challenges include camera calibration, occlusion handling, camera overlap and field of view, person re-identification, and data association. In this study, we designed a laboratory as a research environment that facilitates our exploration of several of these challenges. The study uses stereo camera calibration and key point detection to reconstruct the three-dimensional key points of the person being tracked, thereby performing person-tracking tasks. The results show that 3D spatial tracking with dual cameras provides noticeably more continuous monitoring than a single camera alone. The study adopts four measures of person similarity, which effectively reduce the generation of spurious person identities. However, using all four measures simultaneously may not produce better results than a specific measure alone, owing to differences in people's activity patterns.
(This article belongs to the Collection Computer Vision and Pattern Recognition Techniques)

24 pages, 22571 KiB  
Article
Non-Invasive Multivariate Prediction of Human Thermal Comfort Based on Facial Temperatures and Thermal Adaptive Action Recognition
by Kangji Li, Fukang Liu, Yanpei Luo and Mushtaque Ali Khoso
Energies 2025, 18(9), 2332; https://doi.org/10.3390/en18092332 - 2 May 2025
Viewed by 556
Abstract
Accurately assessing human thermal comfort plays a key role in improving indoor environmental quality and the energy efficiency of buildings. Non-invasive thermal comfort recognition has shown great application potential compared with other methods. Based on thermal correlation analysis, human facial temperature recognition and thermal adaptive action detection are both performed with one binocular infrared camera. The YOLOv5 algorithm is applied to extract facial temperatures in key regions, from which a random forest model performs thermal comfort recognition. Meanwhile, the Mediapipe tool is used to detect probable thermal adaptive actions, from which the corresponding thermal comfort level is also assessed. The two results are combined with the PMV calculation for multivariate human thermal comfort prediction, and a weighted fusion strategy is designed. Seventeen subjects participated in experiments to collect facial temperatures and thermal adaptive actions under different thermal conditions. Prediction results show that, on the experimental data, the overall accuracies of the proposed fusion strategy reach 82.86% (7-class thermal sensation voting, TSV) and 94.29% (3-class TSV), better than those of facial temperature-based thermal comfort prediction (7-class: 78.57%, 3-class: 90%) and the PMV model (7-class: 20.71%, 3-class: 65%). When probable thermal adaptive actions are detected, the accuracy of the proposed fusion model further improves to 86.8% (7-class) and 100% (3-class). Furthermore, by varying the clothing thermal resistance and metabolic level of subjects in the experiments, the influence on thermal comfort prediction is investigated. The proposed strategy still achieves better accuracy than the single methods, showing good robustness and generalization across applications.
(This article belongs to the Section G: Energy and Buildings)
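A minimal sketch of a weighted fusion of the three predictors the abstract combines (facial-temperature model, action-based assessment, PMV). The weights, the 7-class probability inputs, and the action-gating rule are placeholder assumptions of mine, not the paper's fitted strategy.

```python
import numpy as np

def fuse_tsv(p_face, p_action, p_pmv, action_detected, w=(0.5, 0.3, 0.2)):
    """Fuse 7-class TSV probability vectors; trust the action branch only when
    a thermal adaptive action was actually detected."""
    w_face, w_act, w_pmv = w
    if not action_detected:                      # redistribute the action weight
        w_face, w_act, w_pmv = w_face + w_act / 2, 0.0, w_pmv + w_act / 2
    fused = w_face * np.asarray(p_face) + w_act * np.asarray(p_action) \
            + w_pmv * np.asarray(p_pmv)
    return int(np.argmax(fused)) - 3             # class index 0..6 -> TSV -3..+3
```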

19 pages, 1974 KiB  
Article
MFBCE: A Multi-Focal Bionic Compound Eye for Distance Measurement
by Qiwei Liu, Xia Wang, Jiaan Xue, Shuaijun Lv and Ranfeng Wei
Sensors 2025, 25(9), 2708; https://doi.org/10.3390/s25092708 - 24 Apr 2025
Viewed by 522
Abstract
In response to the demand for small-size, high-precision, real-time target distance measurement on platforms such as autonomous vehicles and drones, this paper investigates a multi-focal bionic compound eye (MFBCE) and its associated distance measurement algorithm. The MFBCE integrates multiple lenses with different focal lengths and a CMOS array. Based on this system, a multi-eye distance measurement algorithm built on target detection is proposed. The algorithm generalizes binocular distance measurement to cameras with different focal lengths, overcoming the limitation of traditional binocular algorithms that require identical cameras. By exploiting the multi-scale information obtained from the lenses with different focal lengths, the ranging accuracy of the MFBCE is improved. The telephoto lenses, with their narrow field of view, are beneficial for capturing detailed target information, while the wide-angle lenses, with their larger field of view, are useful for acquiring information about the target's surroundings. Experiments using the least squares method for ranging targets at 100 cm yielded a mean absolute error (MAE) of 1.05, approximately half that of the binocular distance measurement algorithm. The proposed MFBCE shows significant potential for applications in near-range obstacle avoidance, robotic grasping, and assisted driving.
(This article belongs to the Section Biosensors)
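The generalization the abstract refers to can be written compactly (notation mine): for two parallel cameras with baseline B but unequal focal lengths f₁ ≠ f₂, similar triangles give a disparity relation in normalized image coordinates, which reduces to the classic Z = fB/d when the focal lengths match. With several lens pairs, stacking one such equation per pair and solving by least squares yields the fused range estimate.

```latex
\[
  \frac{x_1}{f_1} = \frac{X}{Z}, \qquad
  \frac{x_2}{f_2} = \frac{X - B}{Z}
  \quad\Longrightarrow\quad
  Z = \frac{B}{\dfrac{x_1}{f_1} - \dfrac{x_2}{f_2}}
  \;\;\xrightarrow{\;f_1 = f_2 = f\;}\;\;
  Z = \frac{fB}{x_1 - x_2}.
\]
```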

19 pages, 13912 KiB  
Article
MSDP-Net: A YOLOv5-Based Safflower Corolla Object Detection and Spatial Positioning Network
by Hui Guo, Haiyang Chen and Tianlun Wu
Agriculture 2025, 15(8), 855; https://doi.org/10.3390/agriculture15080855 - 15 Apr 2025
Cited by 1 | Viewed by 544
Abstract
In response to the challenge of low detection and positioning accuracy for safflower corollas during field operations, we propose a deep learning-based object detection and positioning algorithm, the Mobile Safflower Detection and Position Network (MSDP-Net). The approach is designed to overcome the small size of safflower corollas and their tendency to be occluded in complex agricultural environments. For object detection, we introduce an improved YOLO v5m model, referred to as C-YOLO v5m, which integrates a Convolutional Block Attention Module (CBAM) into both the backbone and neck networks. This modification enhances the model's ability to focus on key features, increasing precision, recall, and mean average precision by 4.98%, 4.3%, and 5.5%, respectively. For spatial positioning, we propose a mobile camera-based method in which a binocular camera is mounted on a translation stage, enabling horizontal movement that maintains optimal positioning accuracy and mitigates occlusion. Field experiments demonstrate that this mobile positioning method achieves a success rate of 93.79% with average deviations of less than 3 mm in the X, Y, and Z directions. Moreover, comparisons with five mainstream object detection algorithms show that MSDP-Net offers superior overall performance, making it highly suitable for safflower corolla detection. Finally, when applied to our self-developed safflower harvesting robot, 500 indoor trials achieved a harvest success rate of 90.20%, and field tests along a 15 m row confirmed a success rate above 90%, validating the effectiveness of the proposed methods.
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
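For reference, a minimal PyTorch sketch of the published CBAM design that C-YOLO v5m inserts into its backbone and neck; this follows Woo et al.'s module rather than the paper's exact code, and the reduction ratio and kernel size are common defaults, not the paper's settings.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention from pooled
    descriptors through a shared MLP, then spatial attention from a 7x7 conv
    over channel-wise mean/max maps."""
    def __init__(self, c: int, reduction: int = 16, k: int = 7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(c, c // reduction, 1, bias=False), nn.ReLU(),
            nn.Conv2d(c // reduction, c, 1, bias=False))
        self.spatial = nn.Conv2d(2, 1, k, padding=k // 2, bias=False)

    def forward(self, x):
        ca = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True))
                           + self.mlp(x.amax((2, 3), keepdim=True)))
        x = x * ca                                   # channel-refined features
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa                                # spatially refined output
```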

23 pages, 1297 KiB  
Article
Multi-Granularity and Multi-Modal Feature Fusion for Indoor Positioning
by Lijuan Ye, Yi Wang, Shenglei Pei, Yu Wang, Hong Zhao and Shi Dong
Symmetry 2025, 17(4), 597; https://doi.org/10.3390/sym17040597 - 15 Apr 2025
Viewed by 523
Abstract
Despite the widespread adoption of indoor positioning technology, existing solutions still face significant challenges. On one hand, Wi-Fi-based positioning struggles to balance accuracy and efficiency in complex indoor environments and architectural layouts formed by pre-existing access points (APs). On the other hand, vision-based methods, while offering high-precision potential, are hindered by the prohibitive cost of the binocular camera systems required for depth image acquisition, limiting large-scale deployment. Additionally, channel state information (CSI), containing multi-subcarrier data, maintains amplitude symmetry under ideal free-space conditions but becomes susceptible to periodic positioning errors in real environments due to multipath interference, while image-based positioning often suffers from spatial ambiguity in texture-repeated areas. To address these challenges, we propose a novel hybrid indoor positioning method that integrates multi-granularity, multi-modal features. By fusing CSI data with visual information, the system leverages spatial consistency constraints from images to mitigate CSI error fluctuations, while CSI's global stability corrects local ambiguities in image-based positioning. In the initial coarse-grained positioning phase, a neural network model trained on image data roughly localizes indoor scenes; the model captures the geometric relationships within images, providing a foundation for more precise localization in subsequent stages. In the fine-grained positioning stage, CSI features from Wi-Fi signals and Scale-Invariant Feature Transform (SIFT) features from image data are fused to create a rich feature-fusion fingerprint library that enables high-precision positioning. The experimental results show that the proposed method synergistically combines the strengths of Wi-Fi fingerprints and visual positioning, substantially enhancing positioning accuracy: 45% of positioning points fall within 0.4 m and 67% within 0.8 m. Overall, this approach charts a promising path for advancing indoor positioning technology.
(This article belongs to the Section Mathematics)
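A minimal sketch of the fine-grained stage's idea: nearest-neighbor lookup over a fingerprint distance that blends CSI features with SIFT-derived image features. The max-normalization and the blending weight alpha are illustrative assumptions of mine, not the paper's trained fusion.

```python
import numpy as np

def locate(query_csi, query_img, db_csi, db_img, db_xy, alpha=0.5):
    """Fused-fingerprint nearest neighbor: blend normalized CSI and image
    feature distances, then return the best-matching reference position."""
    d_csi = np.linalg.norm(db_csi - query_csi, axis=1)
    d_img = np.linalg.norm(db_img - query_img, axis=1)
    d = alpha * d_csi / d_csi.max() + (1 - alpha) * d_img / d_img.max()
    return db_xy[np.argmin(d)]

# Usage sketch: db_csi (M, C), db_img (M, D), db_xy (M, 2) reference points.
```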
