SCEW-YOLOv8 Detection Model and Camera-LiDAR Fusion Positioning System for Whole-Growth-Cycle Management of Cabbage
Abstract
1. Introduction
- Theoretical optimization of the recognition architecture: An improved SCEW-YOLOv8 object perception model is designed to overcome the strong morphological heterogeneity and severe leaf occlusion of cabbage. Specifically, it introduces the SPD-Conv space-to-depth mapping to prevent fine-grained feature loss. It also incorporates the C2f-CX module to enhance global contextual modeling via large-kernel convolution. Furthermore, it embeds the EMA cross-space attention mechanism to adaptively suppress background noise, and utilizes the WIoU v3 dynamic non-monotonic loss function to optimize bounding box regression.
- Theoretical framework for cross-modal 3D perception: A loosely coupled, visual-semantic-driven 3D fusion ranging algorithm is constructed. It resolves the spatial-temporal misalignment between 2D visual semantics and 3D LiDAR point clouds. Through a two-stage cascaded filtering strategy based on directional spatial constraints and statistical outlier elimination, it establishes a rigorous mathematical framework for millimeter-level positioning. This effectively circumvents the depth degradation of traditional stereo-matching vision schemes under extreme field illumination.
- Empirical validation and performance gains: Extensive experiments systematically validate the presented designs. The SCEW-YOLOv8 model achieves 95.8% mAP@0.5 and 90.8% recall. Furthermore, the fusion positioning system achieves a 1.45 mm mean absolute error (MAE) in the height direction, a 96.3% reduction in spatial positioning error at extended operating distances compared with mainstream pure-vision baselines, providing reliable technical support for the automated management of locally dominant green cabbage cultivars.
2. Materials and Methods
2.1. Construction and Preprocessing of Whole-Growth-Cycle Cabbage Image Dataset
2.1.1. Field Image Acquisition and Scene Coverage
2.1.2. Data Augmentation and Dataset Division
2.2. Improved SCEW-YOLOv8 Model for Cabbage Perception
2.2.1. Optimization of Space-to-Depth Downsampling Based on SPD-Conv
Algorithm 1: Forward pass of the SPD-Conv module

Input: intermediate feature map $X \in \mathbb{R}^{S \times S \times C_1}$; scale factor $s$. Output: downsampled feature map $X'' \in \mathbb{R}^{(S/s) \times (S/s) \times C_2}$.
1: Initialize an empty list: $F \leftarrow [\,]$
2: for $i = 0$ to $s-1$ do
3:   for $j = 0$ to $s-1$ do
4:     Extract a sub-feature map by strided slicing:
5:     $f_{i,j} \leftarrow X[i:S:s,\ j:S:s,\ :]$
6:     Append $f_{i,j}$ to $F$
7:   end for
8: end for
9: Concatenate all sub-feature maps along the channel dimension:
10: $X' \leftarrow \mathrm{Concat}(F)$, with $X' \in \mathbb{R}^{(S/s) \times (S/s) \times (s^2 C_1)}$
11: Apply a non-strided convolution to fuse features and adjust channels:
12: $X'' \leftarrow \mathrm{Conv}_{\mathrm{stride}=1}(X')$
13: return $X''$
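For concreteness, the space-to-depth operation can be sketched in a few lines of PyTorch. This is a minimal illustration following Sunkara and Luo [28], not the exact training code; the class name, 3 × 3 fusion kernel, and default scale factor of 2 are illustrative assumptions.

```python
# Minimal sketch of the SPD-Conv forward pass (Algorithm 1); names are illustrative.
import torch
import torch.nn as nn

class SPDConv(nn.Module):
    def __init__(self, in_channels: int, out_channels: int, scale: int = 2):
        super().__init__()
        self.scale = scale
        # Non-strided convolution fuses the stacked sub-maps and sets the channel count.
        self.fuse = nn.Conv2d(in_channels * scale * scale, out_channels,
                              kernel_size=3, stride=1, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = self.scale
        # Space-to-depth: slice the H x W grid into s*s interleaved sub-maps.
        subs = [x[:, :, i::s, j::s] for i in range(s) for j in range(s)]
        # Concatenate along channels: spatial size shrinks by s, channels grow by
        # s*s, so no pixel information is discarded by the downsampling.
        x = torch.cat(subs, dim=1)
        return self.fuse(x)
```

Because every pixel of the input survives into some channel of the stacked tensor, the subsequent convolution can still see fine-grained detail that a strided convolution would have skipped over.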
2.2.2. Global Feature Extraction Enhancement Based on C2f-CX Module
Algorithm 2: Forward pass of the ConvNeXt V2 bottleneck inside C2f-CX

Input: feature map $X \in \mathbb{R}^{H \times W \times C}$. Output: enhanced feature map $Y \in \mathbb{R}^{H \times W \times C}$.
1: Large-kernel depthwise convolution for global geometry modeling:
2: $Z \leftarrow \mathrm{DWConv}_{7 \times 7}(X)$
3: $Z \leftarrow \mathrm{LayerNorm}(Z)$
4: Pointwise convolution to expand the channel dimension:
5: $Z \leftarrow \mathrm{PWConv}_1(Z)$ ($C \to 4C$)
6: $Z \leftarrow \mathrm{GELU}(Z)$
7: Apply Global Response Normalization (GRN):
8: $Z \leftarrow \mathrm{GRN}(Z)$
9: Pointwise convolution to project back to the original channel dimension:
10: $Z \leftarrow \mathrm{PWConv}_2(Z)$ ($4C \to C$)
11: Residual connection to maintain gradient stability:
12: $Y \leftarrow X + Z$
13: return $Y$
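A compact PyTorch sketch of this bottleneck, following the public ConvNeXt V2 formulation of Woo et al. [29], is given below. The 4× expansion ratio and channels-last handling mirror the reference design; the class names are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GRN(nn.Module):
    """Global Response Normalization (ConvNeXt V2); expects channels-last tensors."""
    def __init__(self, dim: int):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1, 1, 1, dim))
        self.beta = nn.Parameter(torch.zeros(1, 1, 1, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gx = torch.norm(x, p=2, dim=(1, 2), keepdim=True)    # global aggregation per channel
        nx = gx / (gx.mean(dim=-1, keepdim=True) + 1e-6)     # divisive feature normalization
        return self.gamma * (x * nx) + self.beta + x         # calibrated residual output

class ConvNeXtV2Block(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)  # large-kernel depthwise
        self.norm = nn.LayerNorm(dim)
        self.pwconv1 = nn.Linear(dim, 4 * dim)   # pointwise expansion (C -> 4C)
        self.grn = GRN(4 * dim)
        self.pwconv2 = nn.Linear(4 * dim, dim)   # pointwise projection back (4C -> C)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        shortcut = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)                # NCHW -> NHWC for LayerNorm/Linear
        x = self.norm(x)
        x = F.gelu(self.pwconv1(x))
        x = self.grn(x)
        x = self.pwconv2(x)
        x = x.permute(0, 3, 1, 2)                # back to NCHW
        return shortcut + x                      # residual connection (Algorithm 2, steps 11-12)
```

The GRN step is what distinguishes V2 from the original ConvNeXt block: it promotes inter-channel feature competition, which helps separate cabbage contours from visually similar weed and soil responses.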
2.2.3. Target Feature Focusing Optimization Based on EMA Cross-Space Attention
Algorithm 3: Forward pass of Efficient Multi-Scale Attention (EMA)

Input: feature map $X \in \mathbb{R}^{C \times H \times W}$; number of groups $G$. Output: attention-calibrated feature map $Y \in \mathbb{R}^{C \times H \times W}$.
1: Split the input into $G$ groups along the channel dimension:
2: $\{X_g\}_{g=1}^{G} \leftarrow \mathrm{Split}(X)$, with $X_g \in \mathbb{R}^{(C/G) \times H \times W}$
3: for each group $X_g$ do
4:   1×1 branch: encode spatial features along H and W
5:   $[a_h; a_w] \leftarrow \mathrm{Conv}_{1 \times 1}(\mathrm{Concat}(\mathrm{Pool}_W(X_g), \mathrm{Pool}_H(X_g)))$
6:   $\tilde{X}_g \leftarrow \mathrm{GroupNorm}(X_g \odot \sigma(a_h) \odot \sigma(a_w))$
7:   3×3 branch: capture local multi-scale context
8:   $\hat{X}_g \leftarrow \mathrm{Conv}_{3 \times 3}(X_g)$
9:   Cross-spatial interaction via matrix dot products:
10:  $w \leftarrow \mathrm{Softmax}(\mathrm{GAP}(\tilde{X}_g))^{\mathsf T} \cdot \mathrm{Flat}(\hat{X}_g) + \mathrm{Softmax}(\mathrm{GAP}(\hat{X}_g))^{\mathsf T} \cdot \mathrm{Flat}(\tilde{X}_g)$
11:  Re-weight the original grouped features:
12:  $Y_g \leftarrow X_g \odot \sigma(\mathrm{Reshape}(w))$
13: end for
14: Concatenate all groups back to the original shape:
15: $Y \leftarrow \mathrm{Concat}(\{Y_g\})$
16: return $Y$
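The following PyTorch sketch reconstructs the EMA forward pass from the published design of Ouyang et al. [31]; the default grouping factor and layer names are assumptions, and the tensor bookkeeping follows the paper rather than the authors' exact code.

```python
import torch
import torch.nn as nn

class EMA(nn.Module):
    """Sketch of Efficient Multi-Scale Attention (Algorithm 3); names illustrative."""
    def __init__(self, channels: int, groups: int = 8):
        super().__init__()
        self.g = groups
        cg = channels // groups
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))   # pool along W -> (H, 1) stripe
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))   # pool along H -> (1, W) stripe
        self.gap = nn.AdaptiveAvgPool2d((1, 1))
        self.conv1x1 = nn.Conv2d(cg, cg, kernel_size=1)
        self.conv3x3 = nn.Conv2d(cg, cg, kernel_size=3, padding=1)
        self.gn = nn.GroupNorm(cg, cg)
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.size()
        xg = x.reshape(b * self.g, c // self.g, h, w)   # split into channel groups
        # 1x1 branch: directional pooling along H and W, shared 1x1 conv, sigmoid gates
        x_h = self.pool_h(xg)                           # (bg, cg, h, 1)
        x_w = self.pool_w(xg).permute(0, 1, 3, 2)       # (bg, cg, w, 1)
        a_h, a_w = torch.split(self.conv1x1(torch.cat([x_h, x_w], dim=2)), [h, w], dim=2)
        x1 = self.gn(xg * a_h.sigmoid() * a_w.permute(0, 1, 3, 2).sigmoid())
        # 3x3 branch: local multi-scale context
        x2 = self.conv3x3(xg)
        # Cross-spatial interaction: each branch's channel descriptor attends to the
        # other branch's spatial map via a matrix dot product (Algorithm 3, step 10)
        y1 = self.softmax(self.gap(x1).reshape(b * self.g, -1, 1).permute(0, 2, 1))
        y2 = self.softmax(self.gap(x2).reshape(b * self.g, -1, 1).permute(0, 2, 1))
        m1 = torch.matmul(y1, x2.reshape(b * self.g, c // self.g, -1))
        m2 = torch.matmul(y2, x1.reshape(b * self.g, c // self.g, -1))
        weights = (m1 + m2).reshape(b * self.g, 1, h, w).sigmoid()
        return (xg * weights).reshape(b, c, h, w)       # re-weight and merge groups
```

Because the attention weights are computed per channel group and combine both a global (pooled) and a local (3×3) view, background soil and weed responses are suppressed without discarding the fine spatial cues needed for small seedlings.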
2.2.4. Bounding Box Regression Loss Optimization Based on WIoU v3
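As a hedged illustration of the WIoU v3 dynamic non-monotonic focusing mechanism of Tong et al. [32] used here for bounding box regression: the sketch below derives the outlier degree β from a running mean of the IoU loss and uses it to re-weight each box. The function signature, the eps guard, and the defaults α = 1.9 and δ = 3 are assumptions taken from the original paper, not the authors' training configuration.

```python
import torch

def wiou_v3_loss(pred: torch.Tensor, target: torch.Tensor,
                 running_mean_iou_loss: torch.Tensor,
                 alpha: float = 1.9, delta: float = 3.0,
                 eps: float = 1e-7) -> torch.Tensor:
    """pred/target: (N, 4) boxes as (x1, y1, x2, y2); returns per-box WIoU v3 loss."""
    # IoU loss: L_IoU = 1 - IoU
    ix1 = torch.maximum(pred[:, 0], target[:, 0])
    iy1 = torch.maximum(pred[:, 1], target[:, 1])
    ix2 = torch.minimum(pred[:, 2], target[:, 2])
    iy2 = torch.minimum(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    l_iou = 1.0 - inter / (area_p + area_t - inter + eps)

    # Distance attention R_WIoU over the smallest enclosing box (denominator detached)
    cxp, cyp = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cxt, cyt = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    wg = torch.maximum(pred[:, 2], target[:, 2]) - torch.minimum(pred[:, 0], target[:, 0])
    hg = torch.maximum(pred[:, 3], target[:, 3]) - torch.minimum(pred[:, 1], target[:, 1])
    r_wiou = torch.exp(((cxp - cxt) ** 2 + (cyp - cyt) ** 2)
                       / (wg ** 2 + hg ** 2 + eps).detach())

    # Non-monotonic focusing: beta measures how anomalous each anchor is, and the
    # gradient gain r peaks for ordinary-quality anchors (beta near delta)
    beta = l_iou.detach() / running_mean_iou_loss
    r = beta / (delta * alpha ** (beta - delta))
    return r * r_wiou * l_iou  # WIoU v3 = r * R_WIoU * L_IoU
```

Here `running_mean_iou_loss` stands for the exponential moving average of the IoU loss that the trainer maintains across batches; because the weight r decays for both very easy and very anomalous boxes, hard-but-learnable samples, such as blurred or partially occluded cabbage heads, receive the largest gradient gain.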
2.3. Design of Camera-LiDAR Fusion Perception System
2.3.1. Sensor Hardware Selection and System Integration
2.3.2. Multi-Sensor Time Synchronization Scheme Based on ROS2
- Global clock unified calibration: The original point cloud topic, /livox/lidar, is received through the lidar_time_converter node. The device clock timestamp carried by the LiDAR hardware is uniformly converted to the ROS2 system global clock timestamp. Subsequently, the calibrated point cloud topic, /livox/lidar_converted, is output, effectively eliminating the clock offset between the two sensor types at the source.
- Inter-frame precise matching: The sync_node constructs a sliding-window cache queue based on the approximate-time synchronization strategy of the message_filters library. This enables inter-frame matching between the calibrated point cloud data and the original camera image topic, /image_raw. Finally, the time-synchronized image topic, /sync/image, and the point cloud topic, /sync/point_cloud, are output to the downstream fusion nodes (a minimal sketch follows this list).
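A minimal rclpy sketch of this matching step is shown below, using the topic names given above; the queue size and 50 ms slop are illustrative assumptions rather than the authors' exact parameters.

```python
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image, PointCloud2
from message_filters import Subscriber, ApproximateTimeSynchronizer

class SyncNode(Node):
    def __init__(self):
        super().__init__('sync_node')
        img_sub = Subscriber(self, Image, '/image_raw')
        pcd_sub = Subscriber(self, PointCloud2, '/livox/lidar_converted')
        # Sliding-window cache with approximate-time matching (50 ms slop assumed)
        self.sync = ApproximateTimeSynchronizer([img_sub, pcd_sub],
                                                queue_size=10, slop=0.05)
        self.sync.registerCallback(self.on_pair)
        self.img_pub = self.create_publisher(Image, '/sync/image', 10)
        self.pcd_pub = self.create_publisher(PointCloud2, '/sync/point_cloud', 10)

    def on_pair(self, img: Image, pcd: PointCloud2):
        # Republish the matched image/point-cloud pair to the downstream fusion nodes.
        self.img_pub.publish(img)
        self.pcd_pub.publish(pcd)

def main():
    rclpy.init()
    rclpy.spin(SyncNode())

if __name__ == '__main__':
    main()
```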
2.3.3. Camera-LiDAR Spatial Joint Calibration Method
- Camera internal parameter calibration: The camera intrinsic parameters were calibrated using the standard ROS2 camera_calibration functional package, which implements the classic Zhang's calibration method [34]. This process solved for the intrinsic parameter matrix as well as the radial and tangential distortion coefficients. The calibration setup is illustrated in Figure 10. A checkerboard with an 8 × 6 internal corner specification and a grid size of 30 mm × 30 mm was employed as the calibration target. Twenty-five groups of images, captured at varying distances and pitch/yaw attitudes, were collected in a controlled environment. Through sub-pixel corner extraction and perspective projection geometric constraints, the parameters were solved iteratively. Verification confirmed an average reprojection error of 0.42 pixels, which meets the pixel-level accuracy requirements for field cabbage recognition (a minimal OpenCV sketch of this intrinsic step follows this list).
- Targetless external parameter calibration of camera-LiDAR: Traditional checkerboard calibration is often hindered by crop occlusion and terrain constraints in complex field environments. This makes it difficult to rapidly perform extrinsic calibration before agricultural robots begin operation. To overcome this critical limitation, we adopted a targetless calibration method based on natural edge features. Specifically, the spatial pose between the monocular camera and the Livox Mid-40 LiDAR was solved using the open-source tool livox_camera_calib [35]. Since laser beam divergence inherently causes edge expansion errors, our method simultaneously extracts 3D LiDAR edge features with continuous depth and 2D image edge features using the Canny operator. It then establishes geometric constraints between the laser and image edges. By setting the minimum reprojection residual of the edge points as the optimization objective, the algorithm iteratively solves the rigid transformation extrinsic parameters using the BFGS nonlinear optimization method.
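For the intrinsic step described above, Zhang's method is exposed directly by OpenCV, which the ROS2 camera_calibration package wraps. The sketch below is a minimal offline equivalent under the stated checkerboard geometry (8 × 6 internal corners, 30 mm squares); the image directory and sub-pixel refinement window are illustrative assumptions.

```python
import glob
import cv2
import numpy as np

# 8 x 6 internal corners on a 30 mm grid, as described in the text
pattern = (8, 6)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * 30.0  # mm

obj_pts, img_pts = [], []
for path in glob.glob('calib_images/*.png'):   # hypothetical image directory
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if not found:
        continue
    # Sub-pixel corner refinement, as described above
    corners = cv2.cornerSubPix(
        gray, corners, (11, 11), (-1, -1),
        (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
    obj_pts.append(objp)
    img_pts.append(corners)

# Zhang's method: solves intrinsics K plus radial/tangential distortion coefficients
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_pts, img_pts, gray.shape[::-1], None, None)
print(f'RMS reprojection error: {rms:.2f} px')  # the paper reports 0.42 px on average
```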

2.3.4. Visual Semantic-Driven Camera-LiDAR Fusion 3D Ranging Algorithm
- Visual semantic feature extraction of cabbage targets: This stage corresponds to the primary camera_object_detection node. Its fundamental purpose is to provide target-level semantic anchors for the otherwise semantics-free LiDAR point clouds. Field environments present significant perception challenges, particularly overlapping leaf occlusion and background weed interference. To address these issues, the improved SCEW-YOLOv8 algorithm performs real-time identification: it receives the time-synchronized image topic, generates the bounding box, and extracts its center pixel coordinates as the positioning anchor. Simultaneously, the semantic category, center coordinates, and hardware timestamp are encapsulated into a structured topic. This ensures a strict spatiotemporal correspondence between the semantic data and the LiDAR point clouds, establishing a unified benchmark for subsequent fusion.
- Pixel space-LiDAR space coordinate mapping: This stage corresponds to the central coordinate conversion logic within the camera_lidar_fusion node. Based on the camera intrinsic and extrinsic parameters calibrated in Section 2.3.3, a geometric association between the 2D pixels and 3D LiDAR space is established. First, the target center pixel coordinates $(u, v)$ are back-projected into the camera coordinate system to obtain the ray vector originating from the camera's optical center. Subsequently, this ray vector is mapped to the LiDAR coordinate system using the rotation extrinsic matrix $R$. After L2-norm normalization, the target unit direction vector $\hat{\mathbf{d}}_L$ is obtained. Since this step only computes the spatial ray direction, and the translation extrinsic parameters merely shift the coordinate-system origin without altering direction, only the rotation matrix is involved in this specific operation. The fundamental conversion formulas are defined in Equations (9) and (10):

$$\mathbf{r}_C = \left[ \frac{u - c_x}{f_x},\ \frac{v - c_y}{f_y},\ 1 \right]^{\mathsf T} \tag{9}$$

$$\hat{\mathbf{d}}_L = \frac{R\,\mathbf{r}_C}{\lVert R\,\mathbf{r}_C \rVert_2} \tag{10}$$

where $f_x$ and $f_y$ are the equivalent focal lengths of the x and y axes of the camera calibrated in Section 2.3.3, $(c_x, c_y)$ are the principal point pixel coordinates of the camera imaging plane, and $\lVert \cdot \rVert_2$ denotes the L2 norm.
- Fine screening and denoising of target point clouds: To mitigate point cloud noise caused by leaf occlusion and soil clutter, a two-stage cascaded screening strategy was designed to ensure robust ranging. The first stage implements an angular spatial constraint based on the target unit direction vector; only the effective point set with an angular deviation of less than 1° from the vector is retained. The second stage applies statistical outlier filtering using the $1\sigma$ (mean ± 1 standard deviation) criterion for real-time processing. By calculating the mean and standard deviation of the ranging values frame-by-frame, outliers falling outside this interval are eliminated. This effectively suppresses ranging fluctuations caused by complex field environments. If the number of effective points after screening is fewer than five, the frame is skipped to prevent statistical bias, ensuring a high-quality resultant point cloud set.
- Calculation of target 3D coordinates and ranging values: Based on the refined point cloud set, the 3D positioning and ranging of cabbage targets are finalized. First, the average ranging value $\bar{d}$ of the effective point set is computed and used as the final ranging result to enhance robustness. Then, combined with the target unit direction vector $\hat{\mathbf{d}}_L$, the 3D spatial coordinate $\mathbf{P}_L$ corresponding to the center of the detection box is back-calculated. This relationship is defined in Equation (11):

$$\mathbf{P}_L = \bar{d}\,\hat{\mathbf{d}}_L \tag{11}$$

where $\bar{d}$ is the average ranging value of the target point cloud set, that is, the straight-line distance of the cabbage target relative to the LiDAR. A consolidated sketch of this fusion pipeline follows this list.
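The four stages above reduce to a short geometric routine. The following NumPy sketch implements Equations (9)–(11) together with the two-stage screening; the function name and argument layout are illustrative, `K` is the intrinsic matrix, and `R` denotes the camera-to-LiDAR rotation calibrated in Section 2.3.3.

```python
import numpy as np

def locate_target(u, v, K, R, points_lidar,
                  max_angle_deg=1.0, min_points=5):
    """Back-project a detection center, filter the point cloud, and return the
    target's 3D position in the LiDAR frame (Equations (9)-(11))."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # Eq. (9): pixel -> camera-frame ray; Eq. (10): rotate into LiDAR frame, normalize
    ray_cam = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    d = R @ ray_cam
    d /= np.linalg.norm(d)

    # Stage 1: angular spatial constraint (< 1 deg deviation from the target ray)
    norms = np.linalg.norm(points_lidar, axis=1)
    cos_ang = points_lidar @ d / np.maximum(norms, 1e-9)
    sel = norms[cos_ang > np.cos(np.radians(max_angle_deg))]
    if sel.size == 0:
        return None

    # Stage 2: 1-sigma statistical outlier rejection on the per-point range values
    mu, sigma = sel.mean(), sel.std()
    sel = sel[np.abs(sel - mu) <= sigma]
    if sel.size < min_points:
        return None  # skip this frame to avoid statistical bias

    d_bar = sel.mean()   # final ranging value (average of the effective point set)
    return d_bar * d     # Eq. (11): 3D coordinate of the detection-box center
```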
2.4. Experimental Setup and Evaluation Metrics
2.4.1. Training Environment and Hyperparameter Configuration
2.4.2. Performance Evaluation Metrics
3. Results and Analysis
3.1. Performance Evaluation of Cabbage Recognition Model
3.1.1. Comparative Experiment of Mainstream Object Recognition Models
3.1.2. Ablation Experiment of Improved Modules
- Feature retention gain of SPD-Conv module: Compared to the baseline (Group 1), Group 2—which exclusively introduces the SPD-Conv module—elevated the recall rate from 85.7% to 88.2% and increased the mAP@0.5 by 0.7 percentage points, while the inference speed decreased by a marginal 0.4 FPS. This demonstrates that replacing traditional strided convolutions with space-to-depth operations effectively mitigates the loss of fine-grained spatial information. This mechanism proves instrumental in detecting small-sized seedlings and leaf-occluded targets.
- Synergistic effect of feature enhancement of C2f-CX and EMA modules: Building upon the SPD-Conv foundation, configurations that cumulatively integrated the C2f-CX and EMA modules (e.g., Group 12) further increased the mAP@0.5 to 95.3%, with precision and recall rising synchronously. This confirms the efficacy of global large-kernel convolution modeling and cross-space attention within complex backgrounds. These components significantly enhance feature discriminability between cabbage targets and soil/weed interference, strengthen global contour extraction, and reduce false detection rates.
- Regression accuracy optimization of WIoU v3 loss function: The final configuration (Group 16), integrating all modules including the WIoU v3 loss function, achieved optimal detection performance, peaking at 95.8% mAP@0.5 and a 90.8% recall rate. This validates the crucial role of the dynamic non-monotonic weighting mechanism in optimizing bounding box positioning. It is particularly effective for targets with irregular seedling morphology, blurred edges, and severe field occlusion, as it accelerates the model’s ability to learn from hard samples.
3.1.3. Visual Analysis of Inference Results
3.2. Performance Analysis of the Fusion Ranging System
3.2.1. Ranging Experimental Platform and Test Scheme
3.2.2. Ranging Error Statistics and Performance Analysis
4. Conclusions and Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Asano, M.; Onishi, K.; Fukao, T. Robust Cabbage Recognition and Automatic Harvesting under Environmental Changes. Adv. Robot. 2023, 37, 960–969.
- Júnior, M.R.B.; Santos, R.G.d.; Sales, L.d.A.; Oliveira, L.P.d. Advancements in Agricultural Ground Robots for Specialty Crops: An Overview of Innovations, Challenges, and Prospects. Plants 2024, 13, 3372.
- Thakur, A.; Venu, S.; Gurusamy, M. An Extensive Review on Agricultural Robots with a Focus on Their Perception Systems. Comput. Electron. Agric. 2023, 212, 108146.
- Huang, Y.; Xu, S.; Chen, H.; Li, G.; Dong, H.; Yu, J.; Zhang, X.; Chen, R. A Review of Visual Perception Technology for Intelligent Fruit Harvesting Robots. Front. Plant Sci. 2025, 16, 1646871.
- Varghese, R.; Sambath, M. YOLOv8: A Novel Object Detection Algorithm with Enhanced Performance and Robustness. In Proceedings of the 2024 International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS); IEEE: Piscataway, NJ, USA, 2024; pp. 1–6.
- Appe, S.N.; Arulselvi, G.; Balaji, G.N. CAM-YOLO: Tomato Detection and Classification Based on Improved YOLOv5 Using Combining Attention Mechanism. PeerJ Comput. Sci. 2023, 9, e1463.
- Bai, Y.; Yu, J.; Yang, S.; Ning, J. An Improved YOLO Algorithm for Detecting Flowers and Fruits on Strawberry Seedlings. Biosyst. Eng. 2024, 237, 1–12.
- Lawal, M.O. Tomato Detection Based on Modified YOLOv3 Framework. Sci. Rep. 2021, 11, 1447.
- Nan, Y.; Zhang, H.; Zeng, Y.; Zheng, J.; Ge, Y. Intelligent Detection of Multi-Class Pitaya Fruits in Target Picking Row Based on WGB-YOLO Network. Comput. Electron. Agric. 2023, 208, 107780.
- Hai, T.; Shao, Y.; Zhang, X.; Yuan, G.; Jia, R.; Fu, Z.; Wu, X.; Ge, X.; Song, Y.; Dong, M.; et al. An Efficient Model for Leafy Vegetable Disease Detection and Segmentation Based on Few-Shot Learning Framework and Prototype Attention Mechanism. Plants 2025, 14, 760.
- Fu, Y.; Shi, C. ProtoLeafNet: A Prototype Attention-Based Leafy Vegetable Disease Detection and Segmentation Network for Sustainable Agriculture. Sustainability 2025, 17, 7443.
- Yuan, K.; Wang, Q.; Mi, Y.; Luo, Y.; Zhao, Z. Improved Feature Fusion in YOLOv5 for Accurate Detection and Counting of Chinese Flowering Cabbage (Brassica campestris L. ssp. chinensis var. utilis Tsen et Lee) Buds. Agronomy 2023, 14, 42.
- Zheng, J.; Wang, X.; Shi, Y.; Zhang, X.; Wu, Y.; Wang, D.; Huang, X.; Wang, Y.; Wang, J.; Zhang, J. Keypoint Detection and Diameter Estimation of Cabbage (Brassica oleracea L.) Heads under Varying Occlusion Degrees via YOLOv8n-CK Network. Comput. Electron. Agric. 2024, 226, 109428.
- Tian, Y.; Zhao, C.; Zhang, T.; Wu, H.; Zhao, Y. Recognition Method of Cabbage Heads at Harvest Stage under Complex Background Based on Improved YOLOv8n. Agriculture 2024, 14, 1125.
- Jiang, P. Field Cabbage Detection and Positioning System Based on Improved YOLOv8n. Plant Methods 2024, 20, 96.
- Tian, Y.; Cao, X.; Zhang, T.; Wu, H.; Zhao, C.; Zhao, Y. CabbageNet: Deep Learning for High-Precision Cabbage Segmentation in Complex Settings for Autonomous Harvesting Robotics. Sensors 2024, 24, 8115.
- Yang, Z.; Wang, X.; Wang, Z.; Xu, Q.; Xu, X.; Liu, H. Improving Stability of Gaze Target Detection in Videos. In Proceedings of the IECON 2023—49th Annual Conference of the IEEE Industrial Electronics Society, Singapore, 16–19 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–5.
- Xu, G.; Cao, X.; Liu, J.; Fan, J.; Li, E.; Long, X. Robust and Accurate Depth Estimation by Fusing LiDAR and Stereo. Meas. Sci. Technol. 2023, 34, 125107.
- Song, H.; Choi, W.; Kim, H. Robust Vision-Based Relative-Localization Approach Using an RGB-Depth Camera and LiDAR Sensor Fusion. IEEE Trans. Ind. Electron. 2016, 63, 3725–3736.
- Liu, H.; Wu, C.; Wang, H. Real Time Object Detection Using LiDAR and Camera Fusion for Autonomous Driving. Sci. Rep. 2023, 13, 8056.
- Gao, S.; Chen, X.; Wu, X.; Zeng, T.; Xie, X. Analysis of Ranging Error of Parallel Binocular Vision System. In Proceedings of the 2020 IEEE International Conference on Mechatronics and Automation (ICMA), Beijing, China, 13–16 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 621–625.
- Li, N.; Ho, C.P.; Xue, J.; Lim, L.W.; Chen, G.; Fu, Y.H.; Lee, L.Y.T. A Progress Review on Solid-State LiDAR and Nanophotonics-Based LiDAR Sensors. Laser Photonics Rev. 2022, 16, 2100511.
- Karim, M.R.; Reza, M.N.; Jin, H.; Haque, M.A.; Lee, K.-H.; Sung, J.; Chung, S.-O. Application of LiDAR Sensors for Crop and Working Environment Recognition in Agriculture: A Review. Remote Sens. 2024, 16, 4623.
- Kang, H.; Wang, X.; Chen, C. Accurate Fruit Localisation Using High Resolution LiDAR-Camera Fusion and Instance Segmentation. Comput. Electron. Agric. 2022, 203, 107450.
- Ban, C. A Camera-LiDAR-IMU Fusion Method for Real-Time Extraction of Navigation Line between Maize Field Rows. Comput. Electron. Agric. 2024, 223, 109114.
- Hu, X.; Zhang, X.; Chen, X.; Zheng, L. Research on Corn Leaf and Stalk Recognition and Ranging Technology Based on LiDAR and Camera Fusion. Sensors 2024, 24, 5422.
- Oppenheim, A.V.; Schafer, R.W. Discrete-Time Signal Processing, 3rd ed.; Pearson: Upper Saddle River, NJ, USA, 2009; pp. 140–153.
- Sunkara, R.; Luo, T. No More Strided Convolutions or Pooling: A New CNN Building Block for Low-Resolution Images and Small Objects. In Machine Learning and Knowledge Discovery in Databases; Springer Nature Switzerland: Cham, Switzerland, 2023; Volume 13715, pp. 443–459.
- Woo, S.; Debnath, S.; Hu, R.; Chen, X.; Liu, Z.; Kweon, I.S.; Xie, S. ConvNeXt V2: Co-Designing and Scaling ConvNets with Masked Autoencoders. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 16133–16142.
- Duda, R.O.; Hart, P.E.; Stork, D.G. Pattern Classification, 2nd ed.; John Wiley & Sons: New York, NY, USA, 2000.
- Ouyang, D.; He, S.; Zhang, G.; Luo, M.; Guo, H.; Zhan, J.; Huang, Z. Efficient Multi-Scale Attention Module with Cross-Spatial Learning. In Proceedings of the ICASSP 2023—2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–5.
- Tong, Z.; Chen, Y.; Xu, Z.; Yu, R. Wise-IoU: Bounding Box Regression Loss with Dynamic Focusing Mechanism. arXiv 2023, arXiv:2301.10051.
- Sun, S.; Mo, B.; Xu, J.; Li, D.; Zhao, J.; Han, S. Multi-YOLOv8: An Infrared Moving Small Object Detection Model Based on YOLOv8 for Air Vehicle. Neurocomputing 2024, 588, 127685.
- Zhang, Z. Flexible Camera Calibration by Viewing a Plane from Unknown Orientations. In Proceedings of the Seventh IEEE International Conference on Computer Vision; IEEE: Piscataway, NJ, USA, 1999; Volume 1, pp. 666–673.
- Yuan, C.; Liu, X.; Hong, X.; Zhang, F. Pixel-Level Extrinsic Self Calibration of High Resolution LiDAR and Camera in Targetless Environments. IEEE Robot. Autom. Lett. 2021, 6, 7517–7524.
- Liu, X.; Jing, X.; Jiang, H.; Younas, S.; Wei, R.; Dang, H.; Wu, Z.; Fu, L. Performance Evaluation of Newly Released Cameras for Fruit Detection and Localization in Complex Kiwifruit Orchard Environments. J. Field Robot. 2024, 41, 881–894.
- Coll-Ribes, G.; Torres-Rodríguez, I.J.; Grau, A.; Guerra, E.; Sanfeliu, A. Accurate Detection and Depth Estimation of Table Grapes and Peduncles for Robot Harvesting, Combining Monocular Depth Estimation and CNN Methods. Comput. Electron. Agric. 2023, 215, 108362.
- Abeyrathna, R.M.R.D.; Nakaguchi, V.M.; Minn, A.; Ahamed, T. Recognition and Counting of Apples in a Dynamic State Using a 3D Camera and Deep Learning Algorithms for Robotic Harvesting Systems. Sensors 2023, 23, 3810.
- Fan, Z.; Sun, N.; Qiu, Q.; Li, T.; Zhao, C. Depth Ranging Performance Evaluation and Improvement for RGB-D Cameras on Field-Based High-Throughput Phenotyping Robots. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); IEEE: Piscataway, NJ, USA, 2021; pp. 3299–3304.
- Neupane, C.; Koirala, A.; Wang, Z.; Walsh, K.B. Evaluation of Depth Cameras for Use in Fruit Localization and Sizing: Finding a Successor to Kinect V2. Agronomy 2021, 11, 1780.
| Device | Type | Parameter | Value |
|---|---|---|---|
| Solid-state LiDAR | Mid-40 (Livox, Shenzhen, China) | Maximum detection range/m | 260 (@ 80% target reflectivity) |
| | | Range precision/cm | <2 (@ 20 m, 80% target reflectivity) |
| | | Circular field of view (FoV)/° | 38.4 |
| | | Point rate/points·s⁻¹ | 100,000 |
| | | Frame rate/frames·s⁻¹ | 10 (typical) |
| | | Dimensions (L × W × H)/mm | 88 × 69 × 76 |
| RGB industrial camera | MV-CS020-10UC (HIKROBOT, Hangzhou, China) | Resolution/pixels | 1624 × 1240 |
| | | Lens focal length/mm | 8 |
| | | Maximum frame rate/frames·s⁻¹ | 90 |
| | | Dimensions (L × W × H)/mm | 29 × 29 × 30 |
| Model | P/% | R/% | mAP@0.5/% | mAP@0.5:0.95/% | Parameters/M | GFLOPs | FPS |
|---|---|---|---|---|---|---|---|
| Faster R-CNN | 91.4 | 82.0 | 90.8 | 69.3 | 41.1 | 370.2 | 13.5 |
| SSD | 81.9 | 72.2 | 80.5 | 41.2 | 23.7 | 35.0 | 27.8 |
| YOLOv5n | 88.7 | 84.2 | 90.5 | 68.8 | 2.5 | 7.2 | 61.3 |
| YOLOv8n | 90.1 | 85.7 | 93.5 | 72.4 | 3.0 | 8.2 | 65.8 |
| YOLOv8s | 91.2 | 87.1 | 94.2 | 73.5 | 11.2 | 28.6 | 48.5 |
| YOLOv9t | 90.7 | 86.3 | 94.0 | 72.0 | 2.5 | 7.5 | 70.2 |
| RT-DETR-R18 | 90.4 | 87.8 | 93.9 | 71.8 | 11.4 | 12.1 | 22.5 |
| EfficientDetV2-Lite0 | 91.2 | 87.0 | 94.1 | 72.2 | 18.5 | 20.7 | 18.2 |
| YOLOv10n | 87.3 | 87.0 | 92.9 | 69.7 | 3.2 | 8.4 | 68.5 |
| YOLO11n | 92.2 | 88.1 | 95.1 | 73.0 | 3.5 | 6.4 | 67.3 |
| Ours | 92.6 | 90.8 | 95.8 | 73.8 | 3.8 | 8.5 | 64.2 |
| Group | SPD-Conv | C2f-CX | EMA | WIoU v3 | Precision/% | Recall/% | mAP@0.5/% | FPS |
|---|---|---|---|---|---|---|---|---|
| 1 (Baseline) | - | - | - | - | 90.1 | 85.7 | 93.5 | 65.8 |
| 2 | ✓ | - | - | - | 90.9 | 88.2 | 94.2 | 65.4 |
| 3 | - | ✓ | - | - | 90.7 | 87.5 | 94.0 | 65.5 |
| 4 | - | - | ✓ | - | 90.5 | 86.9 | 93.8 | 65.6 |
| 5 | - | - | - | ✓ | 90.3 | 86.3 | 93.6 | 65.7 |
| 6 | ✓ | ✓ | - | - | 91.5 | 88.9 | 94.8 | 65.0 |
| 7 | ✓ | - | ✓ | - | 91.2 | 88.5 | 94.5 | 65.2 |
| 8 | ✓ | - | - | ✓ | 91.1 | 88.3 | 94.4 | 65.3 |
| 9 | - | ✓ | ✓ | - | 91.0 | 88.1 | 94.3 | 65.3 |
| 10 | - | ✓ | - | ✓ | 90.9 | 87.9 | 94.2 | 65.4 |
| 11 | - | - | ✓ | ✓ | 90.8 | 87.7 | 94.1 | 65.4 |
| 12 | ✓ | ✓ | ✓ | - | 92.1 | 90.2 | 95.3 | 64.5 |
| 13 | ✓ | ✓ | - | ✓ | 92.0 | 90.0 | 95.2 | 64.6 |
| 14 | ✓ | - | ✓ | ✓ | 91.8 | 89.7 | 95.0 | 64.7 |
| 15 | - | ✓ | ✓ | ✓ | 91.7 | 89.5 | 94.9 | 64.8 |
| 16 (Full) | ✓ | ✓ | ✓ | ✓ | 92.6 | 90.8 | 95.8 | 64.2 |
| ID | Real Coordinate (mm) | Detection Coordinate (mm) | Coordinate Error (mm) |
|---|---|---|---|
| 1 | (1000, −45, 75) | (1001.8, −47.7, 72.2) | (1.8, 2.7, 2.8) |
| 2 | (1050, 30, −20) | (1052.9, 32.1, −17.2) | (2.9, 2.1, 2.8) |
| 3 | (1100, −80, 120) | (1100.7, −82.8, 118.1) | (0.7, 2.8, 1.9) |
| 4 | (1150, 55, −60) | (1154.2, 52.2, −63.1) | (4.2, 2.8, 3.1) |
| 5 | (1200, −110, 90) | (1201.9, −112.2, 88.2) | (1.9, 2.2, 1.8) |
| 6 | (1250, 70, −95) | (1253.1, 73.8, −92.2) | (3.1, 3.8, 2.8) |
| 7 | (1300, −35, 150) | (1300.8, −37.1, 147.2) | (0.8, 2.1, 2.8) |
| 8 | (1350, 95, −40) | (1354.9, 92.2, −42.1) | (4.9, 2.8, 2.1) |
| 9 | (1400, −140, 60) | (1401.7, −142.9, 58.1) | (1.7, 2.9, 1.9) |
| 10 | (1450, 25, −130) | (1453.0, 27.9, −127.1) | (3.0, 2.9, 2.9) |
| 11 | (1500, −70, 180) | (1500.6, −72.1, 178.2) | (0.6, 2.1, 1.8) |
| 12 | (1550, 110, −75) | (1554.1, 107.2, −77.9) | (4.1, 2.8, 2.9) |
| 13 | (1600, −100, 110) | (1601.8, −102.9, 108.1) | (1.8, 2.9, 1.9) |
| 14 | (1650, 45, −160) | (1653.0, 47.9, −157.1) | (3.0, 2.9, 2.9) |
| 15 | (1700, −160, 85) | (1700.7, −162.1, 83.2) | (0.7, 2.1, 1.8) |
| 16 | (1750, 135, −20) | (1754.9, 132.2, −22.9) | (4.9, 2.8, 2.9) |
| 17 | (1800, −50, 210) | (1801.8, −52.9, 208.1) | (1.8, 2.9, 1.9) |
| 18 | (1850, 65, −190) | (1853.0, 67.9, −187.1) | (3.0, 2.9, 2.9) |
| 19 | (1900, −120, 140) | (1900.6, −122.1, 138.2) | (0.6, 2.1, 1.8) |
| 20 | (2000, 85, −110) | (2004.1, 82.2, −112.9) | (4.1, 2.8, 2.9) |
| Hardware Configuration | 3D Positioning Accuracy (Range: 1.0–2.0 m) | Robustness to Field Environment | Hardware Cost (USD) |
|---|---|---|---|
| Depth camera (e.g., RealSense D415/D435i/D455) | Error escalates to 30–50+ mm RMSE | Medium (prone to failure under strong direct sunlight or on textureless leaves) | ~350–540 |
| Monocular camera + solid-state LiDAR (this work) | 1.45 mm MAE (stable across the range) | Excellent (physical ToF ranging, immune to ambient light) | 853.26 |