1. Introduction
Vegetables contain a variety of nutrients essential for human health and are indispensable in daily diets. Plug seedling transplantation, as an emerging vegetable cultivation method, has achieved cost reductions, efficiency improvements, yield increases, and quality enhancements by incorporating advanced technologies such as modern biotechnology and environmental control during the seedling stage [1,2]. Before commercial plug seedlings are sold, they typically undergo sorting to remove missing or weak seedlings and replace them with healthy ones, ensuring uniform and robust seedlings [3,4]. However, sorting in industrialized seedling production is still predominantly manual, which is time-consuming and labor-intensive, making it difficult to meet the demands of large-scale production [5,6]. Research on automated plug seedling sorting technology is therefore of great significance, and machine vision-based seedling grading is a crucial approach to achieving automated sorting.
At present, methods for the detection and grading of tray seedlings fall into two groups, traditional machine learning and deep learning, each of which has demonstrated distinct technical advantages and application potential in this field. Traditional machine learning methods rely on hand-crafted feature engineering, such as feature extraction algorithms based on color, texture, and geometric shape, and build mature detection and classification models with classifiers such as support vector machines (SVMs) and random forests. Their advantages lie in strong model interpretability and relatively low computing resource requirements, making them suitable for scenarios with clear features and limited data. Deep learning methods, relying on architectures such as convolutional neural networks (CNNs) and Transformers, automatically learn complex feature patterns from massive data and demonstrate outstanding performance in object detection and semantic segmentation tasks, especially when dealing with complex backgrounds, multi-scale objects, and occlusion. The following review introduces tray seedling detection and grading methods from these two perspectives: traditional machine learning methods and deep learning methods.
As early as the last century, researchers began to apply traditional machine vision technology to plant classification. In 1994, Tai et al. [7] used cameras and laser emitters to capture images of plug seedlings, applied the area of interest (AOI) technique for empty-cell detection, and achieved a 95% success rate in identifying empty cells. In 1996, Ling et al. [8] successfully measured the canopy area using Otsu’s adaptive thresholding method, providing key features for seedling grading. In 2009, Jiang Huanyu et al. [9] employed the watershed algorithm to extract features such as leaf area and perimeter from tomato seedlings, achieving a 98% classification accuracy. In 2010, Sun Guoxiang et al. [10] proposed a Freeman chain code-based method for leaf area extraction in overlapping tomato seedling leaves, achieving 100% and 96% segmentation success rates for 72-cell and 128-cell plug trays, respectively. In 2013, Hu Fei et al. [11] developed a machine vision-based method to detect empty cells and substandard seedlings using a CK-JV200CI CCD industrial camera and threshold segmentation, achieving over 95.8% accuracy for 13-day-old seedlings in 72-cell trays. The same year, Giselsson et al. [12] introduced two novel shape-feature generation methods based on distance transformation, demonstrating superior performance compared to traditional feature sets. In 2018, Wang Yongwei et al. [13] designed an automatic seedling replenishment system for Arabidopsis seedlings, achieving 100% accuracy in detecting empty and occupied cells by analyzing pixel statistics from grayscale and threshold-segmented images. In 2020, Zhang Guodong et al. [14] proposed a machine vision-based method to identify empty cells and unhealthy seedlings using CKVisionBuilder software, achieving over 95.8% accuracy for lettuce, cabbage, and flowering Chinese cabbage seedlings in 72-cell trays. In 2021, Tong et al. [15] developed a mobile detection system for seedling replenishment, optimizing image stitching algorithms (block matching, Harris corner detection, and SURF feature detection) to achieve 98.7% accuracy in seedling health assessment. Wang Jizhang et al. [16] used a Kinect camera to acquire color and depth images, calculating germination rate, plant height, and leaf area to establish a robust seedling index model with high measurement precision. In 2022, Zhang Lina et al. [17] proposed an RGB-D image-based method for detecting delayed emergence, combining point cloud segmentation (conditional filtering, statistical filtering, and Euclidean clustering) with α-shape-based leaf area and curvature-derived plant height measurements, achieving 95% accuracy. Jin et al. [18] optimized seedling extraction paths using edge recognition and orthogonal experiments to minimize transplant damage. Jin et al. [19] employed an Intel RealSense D415 camera to capture point clouds, planning L-shaped paths to avoid stem and leaf contact, reducing damage rates by 11.11% with only a 0.029 s increase in transplant time. With the advances in computer technology, deep learning has achieved breakthrough progress, with practical applications emerging in the industrial, agricultural, and medical fields through models such as ChatGPT-4.0, ERNIE Bot 3.0, and DeepSeek-R1 [20,21,22]. As technology continues to break new ground, modern agriculture stands at the cusp of a revolutionary transformation. To usher in an efficient and intelligent new era of agriculture, the concept of precision agriculture has been proposed, utilizing various smart sensors combined with big data technology and decision-making algorithms to achieve the digitalization and increased intelligence of agriculture [23,24,25].
The development of deep learning-based image processing technology has further advanced agricultural intelligence. Thanks to their powerful feature extraction capabilities, deep learning algorithms hold significant advantages over traditional methods in processing large and complex datasets. Deep learning models can perform appearance quality inspection of seeds before sowing, assist in monitoring abnormal growth conditions, and conduct quality grading of fruits and vegetables post-harvest. The successful application of deep learning in computer vision has provided new tools for intelligent agricultural and forestry plant information management [26,27,28,29]. In 2019, He Yan et al. [30] developed a closed image acquisition system using an AdaBoost algorithm, multilayer perceptron (MLP), and convolutional neural network (CNN) recognition technology, achieving 97.58% accuracy in identifying tobacco seedling tray categories and robust seedlings. Zhang Yong et al. [31] employed the LeNet deep learning algorithm as the core method to identify empty cells and inferior seedlings in trays, achieving 98.7% recognition accuracy, and subsequently designed a deep learning-based image processing system for seedling grading and transplanting. Xiao et al. [32] proposed a transfer learning-based classification method for plug seedlings by extracting regions of interest from original images and applying grayscale processing, then constructing a classification model using a VGG16 convolutional neural network, ultimately achieving 95.50% classification accuracy. In 2021, Perugachi-Diaz et al. [33] used a dataset containing 13,200 seedling images to predict the growth success rate of cabbage seedlings, comparing traditional logistic regression (LR), a multilayer perceptron (MLP), and four pre-trained CNN architectures (AlexNet, DenseNet, ResNet, and VGG). The results showed AlexNet performed best, achieving 94% accuracy and 0.95 AUC on the test set, demonstrating CNNs’ superiority over traditional methods in processing image data. Kolhar et al. [34] employed spatiotemporal deep neural networks to classify Arabidopsis thaliana strains, comparing 3D CNNs, CNNs with convolutional long short-term memory (ConvLSTM) networks, and the Vision Transformer. These methods utilized temporal and spatial information from time-series RGB images to classify four different Arabidopsis strains. Experimental results showed the Vision Transformer achieved the highest classification accuracy at 98.59%, though with more parameters, while CNN-ConvLSTM achieved 97.97% accuracy with fewer parameters. In 2022, Jin et al. [35] established a healthy lettuce seedling identification model based on the ResNet18 network through deep learning and transfer learning strategies. The model achieved 97.44% detection accuracy with model loss maintained at approximately 0.005, outperforming physical feature-based recognition models.
Most current plug seedling grading methods only detect missing seedlings in trays without assessing seedling quality. The few studies that do perform quality grading rely on extracting basic seedling features, where occlusion and interference between seedlings are inevitable, significantly reducing grading accuracy. High-precision, high-stability plug seedling grading methods therefore still require further research. Meanwhile, to address the problem that features extracted from a single viewing angle during classification are incomplete, this study investigates an intelligent grading method for pepper plug seedlings based on RGB and point cloud images, aiming to achieve quality grading during the transplanting process. By utilizing RGB and point cloud images of pepper plug seedlings for precise grading, this method improves grading accuracy and efficiency while reducing labor costs. Additionally, an intelligent grading system for pepper plug seedlings using RGB and point cloud images is designed to better facilitate grading and transplanting operations.
The main contributions of this paper are as follows:
The design and construction of an image acquisition platform to collect RGB and point cloud images of pepper plug seedlings. The acquired images were preprocessed and annotated to create three distinct datasets: an RGB seedling recognition dataset, an RGB leaf recognition dataset, and a 2D point cloud image segmentation dataset.
The investigation of mainstream object detection algorithms through comparative experiments to establish pepper seedling recognition and leaf recognition models. Based on these experiments, YOLOv11 was selected as the detection network for this method.
The improvement of the U-Net image segmentation network by incorporating C-Res residual modules and ResAG gate attention modules into the skip connections. The enhanced U-Net was experimentally evaluated against mainstream segmentation networks, demonstrating performance improvements across all metrics. Comparative analysis of the segmentation results confirmed the functional effectiveness of the proposed modules, and ablation studies further verified that each modification contributed to performance enhancement, with the final optimized model meeting the practical requirements of this method.
The development of feature extraction methods for plug seedlings, obtaining parameters including normal leaf count, abnormal leaf count, leaf area, and plant height. These features were used to create a seedling grading dataset. After training with various classification algorithms, the random forest algorithm demonstrated superior performance and was selected as the grading method for this approach.
2. Materials and Methods
2.1. Intelligent Grading Method for Pepper Seedling Trays Based on RGB and Point Cloud Images
Currently, algorithms for plug seedling grading primarily use 2D images, following two mainstream methods whose advantages and disadvantages are summarized in Table 1. This paper proposes an intelligent plug seedling grading system based on RGB and point cloud images. The image acquisition device is placed above the seedlings to capture RGB and point cloud images from top to bottom. The RGB images are mainly used to provide the position and leaf count of individual seedlings, while the point cloud images primarily provide plant height and leaf area. Benefiting from the three-dimensional information of point cloud images, they not only provide plant height but also better restore the actual size of the leaves at the time of shooting, thereby yielding a more accurate leaf area.
The core of this paper lies in using deep learning for seedling feature extraction, followed by grading through machine learning. If the extracted features have significant errors, the grading results will be severely affected. Therefore, the accuracy of feature extraction is crucial. To improve accuracy, this paper first uses deep learning algorithms to identify and segment individual seedlings from the tray, reducing interference from other seedlings and facilitating subsequent leaf identification and segmentation. Leaf count is one of the important parameters for evaluating seedling quality, yet it rarely appears in the grading indicators of various methods. The reason is that unlike other parameters, leaves vary in shape and size, making it difficult for traditional methods to capture their features. Therefore, we adopted deep learning methods to identify leaves and then counted them to obtain the leaf number. Plant height and leaf area are also the most intuitive features for evaluating seedling grade. Benefiting from the high-dimensional information of point cloud images, this paper easily obtained the plant height and leaf area information from point clouds. The only challenge is determining the precise position of seedlings in point cloud images. This paper projects point cloud images onto the xy-plane to obtain 2D point cloud images, then uses deep learning segmentation to locate seedlings accurately on the xy-plane. The position information is then used to extract the corresponding 3D point cloud data of each seedling. Finally, plant height and leaf area are calculated based on point cloud height and density.
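Concretely, the xy-plane projection can be a minimal rasterization step. The sketch below assumes the point cloud is an N × 3 NumPy array in camera coordinates with non-negative heights; the function name and grid resolution are illustrative, not details of the original system:

```python
import numpy as np

def project_to_xy_image(points, resolution=1.0):
    """Project an N x 3 point cloud onto the xy-plane as a 2D height image.

    Each pixel keeps the maximum z value of the points that fall into it,
    preserving the top-view silhouette used for segmentation (assumes z >= 0).
    """
    xy, z = points[:, :2], points[:, 2]
    # Map metric xy coordinates to integer pixel indices.
    cols, rows = ((xy - xy.min(axis=0)) / resolution).astype(int).T
    image = np.zeros((rows.max() + 1, cols.max() + 1), dtype=np.float32)
    np.maximum.at(image, (rows, cols), z)  # highest point per pixel (top-down view)
    return image
```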
The specific process of the grading method in this paper is shown in Figure 1. First, a 2D camera and a 3D camera are used to collect RGB and point cloud images of the plug seedlings in the tray. The point cloud images are converted into 2D images through image processing, and the 2D point cloud images are aligned with the RGB images. Then, an object recognition deep learning algorithm is applied to the RGB images to identify the position of each plug seedling in the tray. Using the position information obtained from recognition, the plug seedlings are segmented from the RGB images and the 2D point cloud images. The segmented RGB images of the plug seedlings undergo deep learning-based leaf recognition to obtain the number of leaves, while the segmented 2D point cloud images undergo deep learning-based leaf segmentation to obtain the position information of the leaves. With the position information of the leaves, the point clouds belonging to the leaves in the original point cloud image can be retrieved, yielding the leaf area; meanwhile, the plant height can be obtained from the height information of the point clouds. Finally, a grading model for the plug seedlings is constructed from the extracted characteristic parameters, and the grading results of the plug seedlings in the tray are obtained.
2.2. Data Collection
In order to collect 3D images and 2D images simultaneously, this paper designed and built an image acquisition platform for the grading of pepper plug seedlings. The image acquisition platform mainly consisted of a conveying structure, image acquisition equipment, and image processing equipment. The conveying structure was composed of a conveyor belt and a control module, which was used to simulate the actual working conditions during the operation of the transplanter. The image acquisition equipment was composed of an industrial camera and a laser camera. The image processing equipment was a computer.
The training effect of a deep learning model depends largely on the quality of the dataset; therefore, an industrial camera was selected for dataset collection. The MVCAM MI-230U150C industrial camera (Hangzhou HikRobot Co., Ltd., Hangzhou, China) was used. Owing to its high resolution and excellent imaging quality, this camera is widely used in the field of machine vision, making it an ideal choice, and it was therefore selected as the acquisition device for RGB images in this paper; a photograph of the camera is shown in Figure 2a. The SICK TriSpector 1000 (SICK AG, Waldkirch, Germany) is a high-performance 3D vision sensor that can quickly and accurately obtain the three-dimensional point cloud of an object. Its main features include high measurement accuracy and fast data processing; therefore, it was selected as the acquisition device for point cloud images in this paper. The point cloud acquisition device is shown in Figure 2b.
Figure 3 shows the image acquisition platform built in this paper. The main frame of the platform is made of aluminum profiles and mainly consists of an MI-230U150C industrial camera and lens, a SICK TriSpector 1000, an acquisition box, an LED lamp, a light source controller, a conveyor belt, a conveyor belt controller, a communication cable, and a computer. The specific parameters of the experimental platform were as follows: the maximum height of the platform was 150 cm; the conveyor belt was 70 cm above the ground, 150 cm long, and 30 cm wide; the acquisition box was 80 cm high; the industrial camera was 59 cm above the conveyor belt; the SICK TriSpector 1000 was 47 cm above the conveyor belt; and the light source was 78 cm above the conveyor belt. During image acquisition, the seedling trays were placed in the acquisition box with the industrial camera directly above the tray position to be acquired. After receiving the control command, the camera took photos. Once the computer received the RGB images, it moved the conveyor belt to the right; when the tray reached the point cloud acquisition area, the SICK TriSpector 1000 received the command and began collecting point cloud images. After image acquisition was completed, the conveyor belt stopped moving.
To construct a dataset of pepper plug seedlings, pepper seedlings of the “Xiangla 66” variety were selected for the experiment. The stems of “Xiangla 66” chili seedlings are thick and strong, with good adaptability and disease resistance. Pepper farmers in Hunan Province often adopt modern seedling raising techniques, such as raising seedlings together with their substrate soil; 72-cell trays combined with imported substrate soil have become the mainstream choice, enabling the root system to form a close symbiosis with the soil and effectively reducing root damage during transplanting. Seedlings of this type are transplanted at the 6–8 leaf stage. The cultivation of the plug seedlings was entrusted to Hengyang Vegetable Seeds Co., Ltd. (No. 46, Xianfeng Road, Yanfeng District, Hengyang City, Hunan Province, China). Pepper seedlings with a seedling age of 7–10 days were selected for image acquisition. The industrial camera was first used to photograph whole trays of plug seedlings to obtain RGB images. In total, 20 trays of pepper plug seedlings were collected; each tray had a 5 × 10 layout, giving a total of 1000 pepper seedlings.
Figure 4 and Figure 5 show the RGB images and point cloud images collected in this paper; the point cloud images are colored according to height. First, the three-dimensional point cloud images were converted to two dimensions in preparation for dataset production and model training. Then, to align the RGB images with the two-dimensional point cloud images, both were cropped using region of interest (ROI) technology. After cropping, the size of the two-dimensional point cloud images was adjusted to match that of the RGB images, in preparation for cropping after the recognition of the plug seedlings. Figure 6 shows a comparison of the two-dimensional point cloud image and the RGB image before and after processing.
After image collection, the captured pictures of pepper seedlings were annotated, with each pepper seedling framed individually using the LabelImg 1.8.6 software. After annotation was completed, the 20 RGB pictures of plug seedling trays were divided into a training set and a validation set at a ratio of 7:3 to obtain the RGB dataset for plug seedling recognition.
To extract the leaf count feature of the plug seedlings, single-plant images were cropped out using the annotation frames in the RGB plug seedling recognition dataset. A total of 500 single-plant images were selected, and the leaves of each plug seedling were annotated with the LabelImg software. Two types of leaves were annotated: normal leaves and abnormal leaves, the latter including insect-eaten, curled, and yellow leaves. After annotation was completed, the annotated dataset was again divided into a training set and a validation set at a ratio of 7:3 to obtain the RGB leaf recognition dataset.
To extract the leaf area feature of the plug seedlings, single-plant images were cropped from the two-dimensional point cloud images using the positions of the annotation frames in the RGB plug seedling recognition dataset. Similarly, 500 single-plant images were selected, and the single-plant plug seedlings were segmented and annotated using the Labelme software. After annotation was completed, the annotated dataset was divided into a training set and a validation set at a ratio of 7:3 to obtain the 2D point cloud image segmentation dataset.
This study formed an annotation team consisting of two domain researchers and three annotators who had received standardized training. All annotators passed a pre-annotation assessment (pass score ≥ 90%) and completed training on the unified annotation standard before formal annotation began.
2.3. The YOLO Object Recognition Algorithm
The You Only Look Once (YOLO) series algorithms are advanced one-stage object detection algorithms. Since Joseph Redmon et al. proposed YOLOv1 in 2016, continuous iteration has produced multiple subsequent models, including YOLOv2, YOLOv3, YOLOv4, YOLOv5, YOLOv6, YOLOv7, and YOLOv8. Each version has introduced improvements and innovations in aspects such as the network structure, loss function, and training techniques, and the series has become one of the best-performing families of algorithms for object recognition tasks.
At present, the YOLO series has been iterated to version v11. In this paper, the latest YOLOv11 network was adopted as the main network for the identification of pepper plug seedlings and leaf identification.
Figure 7 is the network structure diagram of YOLOv11 used in this paper.
The YOLOv11 network is mainly composed of three parts: the backbone network (Backbone), the neck network (Neck), and the detection head (Head). The backbone network serves as the foundation of the entire YOLOv11 series of algorithms, primarily responsible for feature extraction of the input image. Through a series of operations such as convolutional layers and pooling layers, it gradually transforms the original image data into feature maps with different abstract features. It adopts the CSPDarknet structure, introducing the concept of cross-stage partial connection (CSP), which reduces the computational load while ensuring the feature extraction ability and improves the operation efficiency of the model. Meanwhile, using Darknet-53 as the backbone network endows it with powerful feature extraction capabilities.
The neck network is located between the backbone network and the detection head, and its main function is to further process and fuse the feature maps extracted by the backbone network. Since the feature maps extracted by the backbone network usually have different scales, and feature maps of different scales contain target information of different sizes, the neck network fuses these multi-scale feature maps so that the model can use feature information at several scales simultaneously, improving its ability to detect objects of different sizes. Compared with other versions, YOLOv11 adopts the path aggregation network (PAN) architecture and incorporates the spatial pyramid pooling-fast (SPPF) module and the C2PSA attention mechanism in the neck network, along with a new C3K2 module. The PAN architecture helps to integrate features from multiple scales and optimize the efficiency of feature transfer, and the C3K2 module, as an evolved version of the C2F module, further enhances the feature processing ability.
The detection head is the part of the YOLO series of algorithms that finally performs object detection. Based on the fused feature maps output by the neck network, it predicts the category and location information of the targets. The detection head usually contains multiple convolutional layers, which process the feature maps through these convolutional layers and output information such as the category probability and bounding box coordinates of each prediction box. YOLOv11 adds two depth-wise separable convolutions (DWConv) in the classification detection head. In this way, while reducing the computational load and the number of parameters, the network can effectively improve the operation efficiency of the model and perform inference and prediction more quickly.
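For illustration, a minimal sketch of loading a YOLOv11 detection model and reading the predicted boxes with the Ultralytics package, which distributes YOLOv11 weights; the weight file and image path are placeholders, not the paper's trained model:

```python
from ultralytics import YOLO

# Load a YOLOv11 detection model (placeholder pretrained weights).
model = YOLO("yolo11n.pt")

# Run inference on a tray image; each box carries a class id, confidence,
# and corner coordinates that locate one seedling.
results = model("tray_seedlings.jpg", conf=0.25)
for box in results[0].boxes:
    print(int(box.cls), float(box.conf), box.xyxy.tolist())
```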
2.4. Improvements Based on the U-Net Image Segmentation Algorithm
In this paper, the U-Net network was used as the basic network for the segmentation of two-dimensional point cloud images, and improvements were made to the original network. The improved network structure is shown in Figure 8.
Improvements were made to the skip connection structure and the decoder of the U-Net. Adding the residual module C-Res to the skip connections deepens the network while allowing the input information to bypass some layers and be added directly to subsequent outputs. This alleviates the gradient vanishing and degradation problems that arise as network depth increases, makes it easier for the network to learn identity mappings, and thus permits the training of very deep networks; it also accelerates training convergence and improves the generalization ability of the model, enabling it to better capture complex features in the data. In addition, a module named residual attention gate (ResAG) was added to the decoder. A residual block is used to extract deep information, and then an attention gate (AG) module focuses on the key areas of the target, dynamically assigning weights to the input feature maps to highlight important information and suppress secondary information. This improves the model’s ability to capture key features and enhances its representational ability and performance.
2.5. The Improved Residual Module C-Res
In traditional neural networks, for example CNN [36] and VGG [37] networks, information may gradually be lost or blurred during transmission as the depth of the network increases. In residual networks, residual connections allow input information to skip some network layers and be added directly to the output after operations such as convolution. In this way, during forward propagation, the input information is transmitted to subsequent layers more directly and completely, avoiding excessive attenuation. During backpropagation, residual networks help solve the gradient vanishing problem: since the input participates directly in the output, gradients flow more smoothly through the residual connection path, making the network easier to train. Even in very deep networks, gradients can be effectively propagated to earlier layers, enabling the network to learn more complex mappings. In this paper, the improved residual module C-Res is added to the network to achieve better results.
Figure 9 shows the structural diagram of the proposed C-Res module in this paper.
This paper enables the U-Net to incorporate extremely deep network structures without encountering problems of gradient vanishing and degradation by introducing the improved residual module C-Res. In this way, it can learn very complex and advanced feature representations, improving the performance of the model in segmentation tasks. The residual block of this residual module is composed of a CBR convolutional block. A CBR convolutional block generally refers to a module composed in sequence of a convolutional layer, a batch normalization layer, and an activation function (such as the ReLU function). This module moves the convolutional kernel in the convolutional layer on the input feature map to perform a convolution operation, enabling it to obtain the key features in the feature map. Convolutional kernels of different sizes can capture different types of features.
In the convolutional module, the batch normalization layer mainly normalizes the feature map output by the convolutional layer, adjusting its distribution to be close to a standard normal distribution with a mean of 0 and a variance of 1. This can accelerate the convergence speed of the network and reduce problems of gradient vanishing or explosion, allowing the network to be trained with a larger learning rate and thus shortening the training time. At the same time, batch normalization also has a certain regularization effect, which can reduce the model’s dependence on initial parameters and improve the generalization ability of the model. The activation function introduces non-linear factors into the network, enabling the network to learn the non-linear relationship between the input and output, allowing the network to fit any complex function and thus handle various complex tasks. Moreover, a 1 × 1 convolution is added to the original residual structure. By using the 1 × 1 convolution, linear combinations can be made of the different feature information contained in each channel of the feature map, realizing cross-channel information interaction. After the features of different channels are convolved, they will be fused together, enabling the network to learn the correlations between channels and excavate more representative features. For example, when processing an image, different channels may represent information such as the color and texture of the image. The 1 × 1 convolution can fuse this information to generate a richer feature representation.
It can be expressed by the following formula:

CBR(x) = ReLU(BN(Conv_{n×n}(x)))
y = CBR(x) ⊕ Conv_{1×1}(x)

where Conv_{n×n}() represents the convolution operation using a convolution kernel of size n × n; ⊕ denotes the addition operation; BN() represents normalization using a batch normalization layer; and ReLU() represents the ReLU activation function.
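A minimal PyTorch sketch of a module matching this description, i.e., a CBR branch added to a 1 × 1-convolved skip path; the channel counts and kernel size are illustrative assumptions, since the paper's exact configuration is given only in Figure 9:

```python
import torch.nn as nn

class CBR(nn.Sequential):
    """Convolution -> BatchNorm -> ReLU block."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__(
            nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

class CRes(nn.Module):
    """Residual module: a CBR branch added to a 1x1-convolved skip path."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = CBR(in_ch, out_ch)
        self.skip = nn.Conv2d(in_ch, out_ch, 1, bias=False)  # cross-channel fusion

    def forward(self, x):
        # y = CBR(x) + Conv1x1(x), as in the formula above.
        return self.body(x) + self.skip(x)
```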
2.6. The Improved Attention Gate Module ResAG
The attention mechanism is a crucial technology in deep learning. It mimics the human visual system’s ability to selectively focus on different parts of an image, enabling the model to concentrate on important regions and thereby enhancing performance and efficiency. In 2018, Oktay et al. [38] proposed the gated attention mechanism. As a variant of the attention mechanism, the gated attention module is mainly used to regulate the information flow of feature maps. It typically has a control signal (such as the features transmitted through the skip connections in U-Net) and input features (such as the features in the decoder). By performing linear transformations on the input features and the control signal, adding the two results, and passing them through an activation function, a gating signal is obtained. This gating signal is then multiplied element-wise with the input features to achieve the screening and enhancement of features.
To improve the network’s extraction of deep features, this paper first applies a residual operation to the input features of the gated attention before attention processing, obtaining a feature map containing deeper feature information. The residual-processed feature map is then fed into the gated attention for feature screening and adjustment. By introducing the gating mechanism, the flow of features can be controlled more precisely: certain features can be enhanced or suppressed according to task requirements, improving the model’s performance in specific regions or tasks. This effectively suppresses background noise and highlights the features of the target area, making the segmentation results more consistent with the actual situation. The detailed structure of ResAG is shown in Figure 10.
The two inputs of the gated attention are the features passed directly through the skip connection and the decoder features processed by the residual network, respectively. Each is convolved with a 1 × 1 convolution. The resulting feature maps are added together, and the number of channels is then reduced to 1 through a ReLU activation function and another 1 × 1 convolution. A weight coefficient is then obtained using a Sigmoid activation function, and the Resample module restores the map to the size before processing. In this way, an attention coefficient with the same shape as the feature map is obtained, and the feature map is finally weighted by this attention coefficient. Since the gated attention module is mainly based on element-wise operations and simple linear transformations, it does not require the large-scale similarity calculations and matrix operations of ordinary attention mechanisms; it therefore has advantages in computational efficiency and is better suited to tasks with high real-time requirements.
It can be expressed by the formula as follows:

α = Resample(σ(Conv_{1×1}(ReLU(Conv_{1×1}(g) ⊕ Conv_{1×1}(Res(x))))))
ŷ = α ⊗ x

Among them, g denotes the features from the skip connection and x the decoder features; Res() represents the residual operation; σ() represents the Sigmoid activation function; ⊕ denotes the addition operation; and ⊗ denotes the element-wise multiplication that weights the feature map.
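A minimal PyTorch sketch of the gate described above, reusing the CRes block from the previous sketch; the intermediate channel count and the bilinear Resample are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResAG(nn.Module):
    """Attention gate whose decoder input first passes through a residual block."""
    def __init__(self, g_ch, x_ch, inter_ch):
        super().__init__()
        self.res = CRes(x_ch, x_ch)              # deep-feature extraction (C-Res sketch)
        self.w_g = nn.Conv2d(g_ch, inter_ch, 1)  # 1x1 conv on the skip-connection features
        self.w_x = nn.Conv2d(x_ch, inter_ch, 1)  # 1x1 conv on the residual output
        self.psi = nn.Conv2d(inter_ch, 1, 1)     # reduce to a single channel

    def forward(self, g, x):
        xr = self.res(x)
        # Add the two projections, squash to one channel, map to [0, 1].
        alpha = torch.sigmoid(self.psi(F.relu(self.w_g(g) + self.w_x(xr))))
        # Resample the attention map back to the feature-map size
        # (a no-op when g and x already share one spatial size).
        alpha = F.interpolate(alpha, size=xr.shape[2:], mode="bilinear",
                              align_corners=False)
        return xr * alpha  # weight the feature map with the attention coefficient
```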
2.7. Feature Extraction Methods for Plug Seedlings
In this paper, the plug seedlings were graded mainly according to their characteristic parameters; therefore, correctly extracting these characteristics was one of the key steps in grading the pepper plug seedlings. The local standard of Bayingolin Mongol Autonomous Prefecture, “Quality Grading of Plug Seedlings for Processing Peppers” (DB 6528/T 110-2023) [39], evaluates seedlings using plant height, number of leaves, stem base thickness, dry weight root–shoot ratio, and the vigorous seedling index; an evaluation index is computed from weighted combinations of these measures to obtain the final grade. Interviews with growers revealed that manual grading is carried out mainly by visual observation: growers judge the grade of plug seedlings by the number of leaves, plant height, leaf area, and the presence of insect-infested and diseased leaves. Based on this investigation, and in order to grade the plug seedlings quickly and accurately without damaging them, the leaf area, number of leaves, and plant height were finally selected as the grading features in this paper.
2.7.1. Extraction of the Leaf Area Parameter of Plug Seedlings
Since the point cloud generated by the point cloud camera is unordered, with no topological relationship among the points, the three-dimensional point cloud must be projected onto a two-dimensional plane along the normal direction, after which the points in the plane are triangulated to obtain the connectivity between points. Edelsbrunner et al. [40] proposed a point cloud surface-reconstruction method based on the α-shape algorithm. This method first performs Delaunay triangulation on the point cloud and then rolls a sphere of radius α through the point set. For each simplex in the triangulation (tetrahedron, triangular patch, edge, or vertex), the interval of α values belonging to the α-shape is calculated; if α lies within this interval, the simplex is retained, otherwise it is deleted. This algorithm achieves good surface reconstruction, and the constructed surface is essentially free of holes. In the α-shape method, the choice of α affects the result in several ways. From the perspective of shape characteristics, α determines how tightly the reconstructed shape fits the point set. When α is large, the α-shape tends to include scattered points, forming a relatively smooth, broad contour in which local details may be lost. When α is small, the generated shape follows the densely distributed regions of the point set and can capture sharp corners and fine structures, but it is prone to jagged, irregular edges due to overfitting. Either extreme ultimately leads to inaccurate area measurement.
Through preliminary experiments, α = 0.2454 was found to give a good surface reconstruction. The reconstructed point cloud of the seedling leaves is composed of many triangular patches with topological relationships, and the sum of the areas of all triangular patches is the leaf area. The visualization of the leaf area fitted by the α-shape is shown in Figure 11.
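As an illustration, the triangulated area can be computed with Open3D's alpha-shape reconstruction; the library choice is an assumption, as the paper does not name its point cloud toolkit:

```python
import open3d as o3d

def leaf_area_from_points(points, alpha=0.2454):
    """Reconstruct the leaf surface with an alpha shape and sum the patch areas."""
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points)  # points: N x 3 array
    mesh = o3d.geometry.TriangleMesh.create_from_point_cloud_alpha_shape(pcd, alpha)
    # The surface area is the sum of the areas of all triangular patches.
    return mesh.get_surface_area()
```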
The leaf area values calculated by the α-shape algorithm were linearly fitted against the true values to obtain the linear regression equation (Equation (5)), where x represents the area calculated by the algorithm and y represents the actual leaf area, in cm². The fitting results are shown in Figure 12.
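The calibration itself is an ordinary least-squares line of the same form as Equation (5); a sketch with NumPy on illustrative placeholder data:

```python
import numpy as np

# Illustrative placeholder data: algorithm-computed vs. measured areas (cm^2).
algo_area = np.array([10.2, 12.5, 8.7, 15.1])
true_area = np.array([11.0, 13.4, 9.5, 16.2])

# Ordinary least-squares line of the form y = a*x + b, as in Equation (5).
a, b = np.polyfit(algo_area, true_area, 1)
fitted = a * algo_area + b
print(f"y = {a:.3f}x + {b:.3f}, mean abs error = "
      f"{np.mean(np.abs(fitted - true_area)):.2f} cm^2")
```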
The leaf area fitted through Equation (5) was compared with the true value. As shown in Figure 13, the average error between the two is 1.12 cm², corresponding to an average relative error of 9.31%, which represents relatively high accuracy. Therefore, this paper directly uses the fitted leaf area in calculating the grading coefficient and the grading threshold for late emergence.
2.7.2. Extraction of Plant Height Parameter for Plug Seedlings
In the document “Quality Grading of Plug Seedlings for Processing Peppers”, plant height is defined as the vertical distance from the highest point of the seedling to the ground in its natural state, and this paper measures plant height in the same way. In Section 4, the positions of the leaves of each plug seedling in the two-dimensional point cloud image were obtained through image segmentation. This position information is converted into polygons, which are then projected onto the point cloud image. The highest point cloud within each polygon is found algorithmically, giving the maximum z-axis height of that region. By converting the point cloud coordinates into real-world coordinates, the plant height of the plug seedlings is obtained. The relationship between actual height and point cloud height was obtained through measurement experiments on the point cloud acquisition device: a cube 120 mm long and wide and 90 mm high was selected as the measurement sample, and its point cloud was collected.
Through the measurement experiment, the point cloud height of the conveyor belt was 59.86 and that of the top of the measured object was 150.2, so the height of the measured object in the point cloud image was 90.34, while its actual height was 90 mm. The conversion coefficient between point cloud coordinates and real coordinates is therefore 90/90.34 ≈ 0.996. The final conversion from point cloud coordinates to real coordinates is

h = 0.996 × (H − G)

where h represents the height of the plug seedling in real coordinates, G represents the height of the plug tray cells in the point cloud image, and H represents the height of the plug seedling in the point cloud image.
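A sketch of this height extraction, assuming the point cloud is an N × 3 NumPy array and the leaf polygon comes from the segmentation step; matplotlib's Path test is one way (an assumption) to select the points inside the polygon:

```python
import numpy as np
from matplotlib.path import Path

def plant_height(points, leaf_polygon, tray_cell_height, scale=0.996):
    """Real-coordinate seedling height following h = scale * (H - G).

    points: N x 3 point cloud; leaf_polygon: M x 2 xy polygon vertices from
    segmentation; tray_cell_height: G, the cell height in point cloud units.
    """
    inside = Path(leaf_polygon).contains_points(points[:, :2])
    H = points[inside, 2].max()  # highest point inside the leaf region
    return scale * (H - tray_cell_height)
```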
The plant height obtained through Equation (6) was compared with the true value, as shown in Figure 14. This paper directly uses the converted plant height in calculating the grading coefficient and the grading threshold for late emergence.
2.7.3. Extraction of the Normal and Abnormal Leaf Count Parameters of Plug Seedlings
The number of leaves of plug seedlings is one of the important indicators in the grading process. During growth, insect-infested, diseased, and damaged leaves have a significant impact on grading; judgment is generally made comprehensively based on the number of such leaves, their severity, and their influence on the overall growth of the seedling. When there are only a few abnormal leaves, their overall impact is small and growth is not significantly disturbed, so the effect on grading is generally minor. When there are many abnormal leaves, photosynthesis and transpiration are affected, growth is inhibited to a certain degree, and the seedling fails to reach acceptable growth.
In this paper, a deep learning algorithm was used to recognize the leaves of the plug seedlings, identifying the normal and abnormal leaves of each seedling and thereby obtaining the counts of normal and abnormal leaves.
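A sketch of the counting step on top of the detector output, assuming Ultralytics-style results; the class ids are assumptions that depend on how the dataset was configured:

```python
def count_leaves(result, normal_id=0, abnormal_id=1):
    """Count normal and abnormal leaves in one detection result.

    Class ids are assumptions; they depend on the dataset configuration.
    """
    classes = [int(c) for c in result.boxes.cls]
    return classes.count(normal_id), classes.count(abnormal_id)

# Illustrative usage with a trained leaf detector (see Section 2.3):
# normal, abnormal = count_leaves(model("seedling_crop.jpg")[0])
```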
2.8. Methods for Grading Plug Seedlings
The quality grading problem of plug seedlings is a typical classification problem based on features. In machine learning, there are many classification algorithms that can solve this kind of problem, such as support vector machines (SVMs), random forests, etc. In this paper, the plug seedlings are divided into three grades, namely first-grade seedlings, second-grade seedlings, and unqualified seedlings. In this paper, the features of 500 seedlings are collected as the dataset for machine learning, including four features: the number of normal leaves, the number of abnormal leaves, the leaf area, and the plant height.
Random forest is an ensemble learning algorithm that combines the results of multiple decision trees to improve the accuracy and stability of the model. A decision tree is a model that makes decisions based on a tree structure: each internal node represents an attribute test, the branches are the test outcomes, and the leaf nodes are the categories. It tests the attributes of a sample and uses the results to divide samples step by step into child nodes until a leaf node is reached, achieving classification or prediction. Random forest completes the classification or regression task by constructing multiple decision trees and synthesizing their results. When constructing each decision tree, it introduces two kinds of randomness. The first is random data sampling: bootstrap sampling draws a certain number of samples with replacement from the original training set to form a new training subset for each tree, so the training data of each tree may contain repetitions and omissions, increasing the differences among the trees. The second is random feature selection: at each node split, only a random subset of the features is considered as candidates, further decorrelating the trees.
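A minimal sketch of such a grading model with scikit-learn; the stand-in data and hyperparameters are illustrative, not values reported in this paper:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in for the 500-seedling dataset with four features per seedling:
# [normal leaf count, abnormal leaf count, leaf area, plant height].
X = np.random.rand(500, 4)
y = np.random.randint(0, 3, 500)  # 0/1/2 = first-grade / second-grade / unqualified

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, bootstrap=True, random_state=0)
clf.fit(X_tr, y_tr)
print("validation accuracy:", clf.score(X_te, y_te))
```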
3. Results
3.1. Training Environment and Parameter Setting
The training environment was as follows: Intel(R) Core(TM) i9-10900X CPU @ 3.70 GHz; GPU: RTX 3080 with 10 GB of video memory; Windows 10 operating system; CUDA 11.1. Deep learning framework: PyTorch 1.10.0 with Python 3.9. The YOLOv11 training parameters were set as follows: batch size 2, initial learning rate 0.001, stochastic gradient descent (SGD) optimizer with a momentum factor of 0.937 and a weight decay factor of 0.0005. The model was trained for 300 epochs in total, with weights saved every 10 iterations, and the model with the highest recognition accuracy was finally selected. For the improved U-Net experiments, the batch size was set to 1 and the optimizer was RMSprop, with an initial learning rate of 0.0001 and a weight decay factor of 1 × 10⁻⁸.
3.2. Evaluation Indicators
The evaluation metrics adopted in this paper are precision (P), recall (R), mean average precision (mAP), DICE, and mean intersection over union (MIOU). Precision represents the proportion of samples predicted as positive by the model that are actually positive, i.e., the ratio of correctly predicted positive samples to all samples predicted as positive. The formula is expressed as:

P = TP / (TP + FP)

In the formula, TP (true positive) is the number of samples that are actually positive and correctly predicted as positive by the model; FP (false positive) is the number of samples that are actually negative but wrongly predicted as positive by the model.
Recall refers to the ratio of the number of positive samples the model correctly detects to the actual number of positive samples, reflecting the model’s ability to capture positive examples. The formula is expressed as:

R = TP / (TP + FN)

In the formula, FN (false negative) is the number of samples that are actually positive but wrongly predicted as negative by the model.
mAP is a comprehensive metric calculated from precision and recall, used to measure the overall detection performance of the model across categories. It is obtained by calculating the average precision (AP) for each category and then averaging the AP values over all categories:

mAP = (1/N) Σᵢ APᵢ, i = 1, …, N

In the formula, N represents the number of identified categories.
The DICE index, also known as the DICE coefficient, is calculated from the intersection and union of the predicted results and the true labels, and is mainly used for binary segmentation tasks. The maximum value of DICE is 1: the larger the value, the higher the similarity between the predicted segmentation and the true label, and the better the segmentation. Conversely, the closer the DICE value is to 0, the worse the segmentation. The DICE coefficient is calculated as follows:

DICE = 2|A ∩ B| / (|A| + |B|)

where A and B represent the predicted segmentation result set and the true annotation set, respectively.
The mean intersection over union (MIOU) measures the overlap between the predicted results and the true labels relative to their union. When MIOU is 1, the predicted results of all categories coincide exactly with the true labels, i.e., the segmentation of every category is perfect; this is the ideal state. When MIOU is 0, the predictions have no overlap with the true labels at all, i.e., the segmentation of every category is completely wrong. In general, the closer the MIOU value is to 1, the better the overall segmentation across categories and the higher the performance of the model; the closer it is to 0, the worse the segmentation. MIOU is calculated as follows:

MIOU = (1/n) Σᵢ |Aᵢ ∩ Bᵢ| / |Aᵢ ∪ Bᵢ|, i = 1, …, n

where A is the area predicted by the model for a category, B is the truly labeled area of that category, and n is the number of categories. In this work, n = 1.
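For the single-category case used here (n = 1), both metrics reduce to set operations on binary masks; a NumPy sketch:

```python
import numpy as np

def dice_and_iou(pred, label, eps=1e-8):
    """DICE and IoU for binary segmentation masks (n = 1, as in this work)."""
    pred, label = pred.astype(bool), label.astype(bool)
    inter = np.logical_and(pred, label).sum()
    union = np.logical_or(pred, label).sum()
    dice = 2.0 * inter / (pred.sum() + label.sum() + eps)
    iou = inter / (union + eps)
    return dice, iou
```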
3.3. Experimental Results and Analysis of the Recognition Algorithm for Pepper Plug Seedlings Based on YOLOv11
After conducting experiments on the above experimental platform and in the given environment, we compared the results of training YOLOv11 on the RGB plug seedling recognition dataset with those of other versions of the YOLO algorithm. The experimental results are shown in Table 2.
From the table, we can see that the recognition effect of the YOLO series on the dataset for recognizing plug seedlings in RGB images generally improves with the iteration of the versions. Among them, YOLOv3 has the worst performance, with relatively low precision and recall rates, and its Mean Average Precision (mAP) is the lowest at 70.2%. YOLOv11 has the best performance. Both its precision and recall rates are the highest, reaching 97.8% and 96.3%, respectively, and its mAP reaches 98.5%. Therefore, the model trained by the YOLOv11 network we selected can meet the requirements of practical applications.
3.4. Experimental Results and Analysis of the Pepper Leaf Recognition Algorithm Based on YOLOv11
In order to obtain the number of leaves of pepper seedlings and determine whether problems such as insect-infested, yellow, or curled leaves are present, we used the recognition algorithm to identify the leaves of pepper seedlings, training the network on the RGB leaf recognition dataset. In this dataset, leaves are labeled in two types: normal leaves, labeled 1, and abnormal leaves, labeled 2. The dataset contains more than 1500 normal leaves and more than 300 abnormal leaves. We compared the evaluation indicators and detection effects of the models trained by different networks; the evaluation indicators of each model are shown in Table 3.
In Table 3, the first row of each model represents the detection metrics for normal leaves, and the second row represents those for abnormal leaves. From the table, it can be observed that due to the relative simplicity of the dataset, each method achieves a relatively high recognition rate for normal leaves, meeting the requirements of practical applications. However, abnormal leaves exhibit inconsistent abnormal states. For example, yellow leaves and insect-infested leaves differ significantly not only in color but also in leaf morphology. This makes it challenging for the networks to capture the common features of abnormal leaves, resulting in generally lower metrics for abnormal leaves compared to normal leaves. Therefore, this paper focused on analyzing the recognition effects of each model on abnormal leaves.
Regarding the recognition of abnormal leaves, YOLOv3 still shows relatively low metrics across the board, while YOLOv11 performs excellently in all aspects. In terms of precision, YOLOv11 outperforms YOLOv3 by 19.7%. In terms of recall, YOLOv11 is 4.8% higher than YOLOv3. In terms of mAP, YOLOv11 exceeds YOLOv3 by 13.6%. YOLOv11 is clearly superior to YOLOv3 in every aspect.
3.5. Experimental Results and Analysis of the Improved U-Net Segmentation Algorithm
To verify the effectiveness of the method proposed in this paper, the improved U-Net network model was compared with the original U-Net, DeepLabv3, MBSNet, AttUnet, TransAttUnet, and SegNet networks, all trained on the two-dimensional point cloud image segmentation dataset. The experimental results of the trained models are shown in Table 4.
From the table, it can be seen that the improved U-Net performs excellently in the three indicators of DICE, IOU, and recall, and also has a relatively high precision, indicating that its overall performance in the segmentation task is good. AttUnet and DeepLabv3 also have good performances and have their own advantages in different indicators. MBSNet has a relatively high precision but a low recall, suggesting that there may be cases of missed detections. TransAttUnet and the original U-Net perform relatively poorly in the various indicators.
In addition to comparing the evaluation indicators of each model, this paper also randomly selected five images for segmentation by each model, so as to more intuitively display and analyze the segmentation effects of each model.
As can be seen from Figure 15, the improved U-Net outperforms the other models. Comparing the images segmented by different models shows that the improved U-Net has essentially no missed detections. Among the other networks, SegNet suffers from relatively serious missed detections; the remaining networks avoid missed detections but produce leaves of low completeness. The essence of these phenomena is that the network fails to fully learn the target features, so the trained model cannot completely identify the leaves. Thanks to the residual module C-Res added to the skip connections of the U-Net in this paper, the ability to extract deep features is enhanced, so the network learns the deep information of the leaves better during training, identifies leaves more completely, and misses fewer detections. Therefore, the leaves segmented by the proposed method are the most complete among the compared methods.
In addition, the images segmented by the improved U-Net are cleaner than those of the other models. Noise points appear in the images segmented by other models, especially U-Net, MBSNet, and DeepLabv3, whose outputs contain a large number of noise points that seriously degrade segmentation quality. During training, a network collects both correct target features and incorrect ones, so the trained model may segment non-target areas and introduce noise. In the improved U-Net, features are screened during feature fusion by the gate attention module, reducing the influence of incorrect information and strengthening the contribution of correct information, so the trained model identifies the target area better and produces fewer noise points. This paper further improves on the original gate attention module: the ResAG module uses a residual block to strengthen feature extraction, giving the network more and deeper features, which are then screened and fused by the gate attention. Compared with the original U-Net, the improved network therefore has better feature extraction and learning abilities, and the model it trains achieves the best segmentation results.
To further verify the effectiveness of the proposed modules, an ablation experiment was conducted to measure the contribution of each module to the network. The results are shown in Table 5.
Model 1 in Table 5 denotes the model trained by the original U-Net, Model 2 the U-Net with the C-Res module added, and Model 3 the U-Net with both the C-Res and ResAG modules added, i.e., the improved U-Net proposed in this paper. As the table shows, Model 3, which introduces the C-Res and ResAG modules on top of the base model, has clear advantages over Model 1 and Model 2 across multiple performance indicators. Specifically, the DICE coefficient of Model 3 reached 0.834, an increase of 0.126 and 0.024 over Model 1 (0.708) and Model 2 (0.810), respectively; its IoU was 0.725, an increase of 0.148 and 0.022 over Model 1 (0.577) and Model 2 (0.703). Model 3 also outperforms Models 1 and 2 in precision (0.914) and recall (0.780), indicating that it captures the target area more comprehensively while maintaining high detection accuracy, reducing both missed and false detections.
3.6. Experiments and Result Analysis of the Grading Model
After the dataset was obtained, the samples were manually graded according to the extracted features. The labeled dataset was then fed into different machine learning algorithms for training; the resulting models were analyzed, and the best-performing one was selected as the grading model of this paper.
In this paper, accuracy, F1 score, and AUC–ROC are selected as the evaluation indicators of the model. Accuracy is the proportion of correctly classified samples among all samples and reflects how correctly the model classifies the full sample set. It can be expressed by the formula

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$

where TP is the number of samples that are actually positive and predicted as positive by the model; TN is the number of samples that are actually negative and predicted as negative; FP is the number of samples that are actually negative but predicted as positive; and FN is the number of samples that are actually positive but predicted as negative.
The F1 score is the harmonic mean of precision and recall:

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$

It takes both the accuracy and the completeness of the model's predictions into account and therefore evaluates classification performance on positive and negative samples more comprehensively, which is especially suitable when the positive and negative samples are imbalanced.
The ROC curve plots the true positive rate (TPR) on the ordinate against the false positive rate (FPR) on the abscissa, and the AUC–ROC is the area under this curve. The larger the AUC–ROC, the better the classification performance of the model. It measures the model's ability to distinguish positive from negative examples across different thresholds and evaluates the overall performance of the classifier without being affected by class imbalance. The area can be computed with the trapezoidal rule:

$$\mathrm{AUC} = \sum_{i=1}^{n-1} \frac{(x_{i+1} - x_i)(y_i + y_{i+1})}{2},$$

where $(x_i, y_i)$ and $(x_{i+1}, y_{i+1})$ are the coordinates of two adjacent points on the ROC curve and $n$ is the number of points on the curve.
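For illustration, all three indicators can be obtained with scikit-learn as in the following minimal sketch (toy data only; the variable names are placeholders, not the paper's dataset):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Toy labels and scores purely for illustration, not the paper's data.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_score = np.array([0.9, 0.2, 0.7, 0.6, 0.4, 0.8, 0.3, 0.1])  # P(positive)
y_pred = (y_score >= 0.5).astype(int)

print("accuracy:", accuracy_score(y_true, y_pred))  # (TP + TN) / N
print("F1:", f1_score(y_true, y_pred))              # 2PR / (P + R)
print("AUC-ROC:", roc_auc_score(y_true, y_score))   # trapezoidal area under ROC
```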
As can be seen from Table 6, logistic regression shows relatively stable performance, with an accuracy of 92.0%, an F1 score of 91.8%, and an AUC–ROC of 96.8%; although it does not match random forest or XGBoost, it has advantages in simplicity and computational efficiency. Random forest performs best, with an accuracy of 97.0%, an F1 score of 96.9%, and an AUC–ROC of 99.1%, indicating a strong ability to handle complex data and nonlinear relationships. XGBoost comes close, with an accuracy of 96.0%, an F1 score of 95.8%, and an AUC–ROC of 99.4%. The support vector machine achieves an accuracy and an F1 score of 96.0% each and an AUC–ROC of 95.5%; its performance is good, but its training time is long. K-Means clustering performs worst, with an accuracy of 71.0%, an F1 score of 70.5%, and an AUC–ROC of 83.4%; as an essentially unsupervised method, it cannot exploit the label information, so the model it produces is the weakest. Based on these experiments, the random forest algorithm performs best on this dataset and is therefore selected as the grading algorithm in this paper.
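As a minimal sketch of the selected approach, a random forest can be trained on the four seedling features used in this paper (normal leaf count, abnormal leaf count, leaf area, and plant height); the data below are random placeholders and the hyperparameters are assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Each row: [normal_leaf_count, abnormal_leaf_count, leaf_area, plant_height].
# Random placeholder data; the real dataset comes from the feature
# extraction stage described in this paper.
rng = np.random.default_rng(0)
X = rng.random((200, 4))
y = rng.integers(0, 2, size=200)        # seedling grade labels (placeholder)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
clf = RandomForestClassifier(n_estimators=100, random_state=0)  # assumed settings
clf.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```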
4. Discussion
At present, most mainstream methods for grading tray seedlings only detect whether cells are missing seedlings and do not grade seedling quality. The few studies on quality grading rely on extracting basic characteristics of the seedlings, during which occlusion and interference between seedlings are unavoidable and greatly reduce grading accuracy. A high-precision, high-stability grading method for tray seedlings therefore still requires further research. This paper studies an intelligent grading method for pepper tray seedlings based on RGB and point cloud images, intended to grade seedling quality during the grading and transplanting process. The RGB and point cloud images of the pepper tray seedlings are used to grade the seedlings accurately, improving grading accuracy and efficiency and reducing labor costs. An intelligent grading system for pepper tray seedlings based on RGB and point cloud images is also designed so that the method can be better applied in grading and transplanting work.
This paper focuses on improving a deep learning segmentation algorithm. Building on U-Net, the algorithm introduces the innovative C-Res residual module and the ResAG gate attention module, which give it better segmentation accuracy and stability. Experimental comparison of the improved U-Net with mainstream segmentation networks shows significant improvements across the indicators: precision, recall, DICE coefficient, and mean intersection over union (IoU) reach 91.4%, 78.0%, 83.4%, and 72.5%, respectively, the best among the compared algorithms. Compared with the algorithm before improvement, these four indicators increased by 2.6%, 6.4%, 5.5%, and 6.5%, respectively.
The extraction of various features of the plug seedlings was also studied, and the number of normal leaves, the number of abnormal leaves, the leaf area, and the plant height were extracted. These characteristic parameters were used to build a grading dataset for plug seedlings, on which mainstream classification algorithms were trained. The random forest algorithm, which performed best, was finally adopted as the grading method, with an accuracy, F1 score, and AUC–ROC of 97.0%, 96.9%, and 99.1%, respectively.
In terms of real-time performance, the model's computing speed can meet the real-time requirements of seedling production even with large-scale data or complex scenarios, improving the overall efficiency of transplanting. Despite these clear technical advantages, many challenges remain for actual deployment. In terms of cost, the research and development, procurement, and maintenance of the complete system are expensive, covering hardware investments such as high-precision sensors and dedicated computing equipment as well as software costs such as algorithm optimization and system upgrades; this places significant economic pressure on small agricultural enterprises and farmers. Occlusion is also a prominent problem: in real seedling-raising environments, mutual occlusion of tray seedlings is widespread. Although the model in this paper can grade tray seedlings under slight occlusion, under large-scale occlusion it struggles to accurately identify the characteristics of the occluded parts, which can bias the detection results and reduce grading accuracy.