In Situ Measuring Stem Diameters of Maize Crops with a High-Throughput Phenotyping Robot

Remote Sens. 2022, 14, 1030

Abstract: Robotic High-Throughput Phenotyping (HTP) technology has been a powerful tool for selecting high-quality crop varieties among large quantities of traits. Due to the advantages of multi-view observation and high accuracy, ground HTP robots have been widely studied in recent years. In this paper, we study an ultra-narrow wheeled robot equipped with RGB-D cameras for inter-row maize HTP. The challenges of the narrow operating space, intensive light changes, and messy cross-leaf interference in rows of maize crops are considered. An in situ and inter-row stem diameter measurement method for HTP robots is proposed. To this end, we first introduce the stem diameter measurement pipeline, in which a convolutional neural network is employed to detect stems, and the point cloud is analyzed to estimate the stem diameters. Second, we present a clustering strategy based on DBSCAN for extracting stem point clouds under the condition that the stem is shaded by dense leaves. Third, we present a point cloud filling strategy to fill the stem region with missing depth values due to the occlusion by other organs. Finally, we employ convex hull and plane projection of the point cloud to estimate the stem diameters. The results show that the R² and RMSE of stem diameter measurement reach 0.72 and 2.95 mm, respectively, demonstrating the effectiveness of the method.


Introduction
Agricultural production is facing the ever-increasing challenges of global climate change, natural disasters, and population growth. The global population is expected to reach 10 billion [1,2], meaning that a 70% increase in crop yields has to be achieved over the next 30 years to meet the growing demand for food [3,4]. To increase crop yields, molecular/genetic technology has been utilized in the breeding process [5]. However, manual phenotypic screening in breeding is usually high-cost, time-consuming, and laborious, leading to a bottleneck in the further development of breeding technology [6,7]. The robotic HTP technique provides abundant phenotypic information in an automated and effective way and considerably eases the manual workload of breeders selecting high-yield varieties from a large number of samples [8]. Consequently, robotic HTP has been one of the most attractive topics in agriculture [9].
Maize, one of the main food and economic crops, has been the focus of many breeders in cultivating new varieties. High-throughput phenotyping of maize crops is a critical step to improve the yield. Particularly, the stem diameter of maize crops is an important index to measure the lodging resistance [10], which requires the HTP robots to collect phenotypic data of stems through inter-row shuttling. Unfortunately, the inter-row environment is narrow, the lighting varies, and leaf-occlusion clutter is present. Furthermore, the depth noise or error of three-dimensional (3D) sensors used to collect phenotypic data has a negative effect on the estimation accuracy of phenotypic parameters. Low-quality depth information cannot completely represent the 3D structure of crops [11,12]. Determining how to overcome these difficulties to accurately estimate the stem diameter between crop rows is urgent and challenging.
In our work, we study a stem diameter measurement pipeline on a self-developed phenotyping platform equipped with an RGB-D camera and propose an HTP robot system that covers the whole process from data acquisition to phenotypic parameter analysis and measures maize stem diameters under complicated field conditions. The main contributions are summarized as follows:

1. A strategy of stem point cloud extraction is proposed to cope with the stems in the shade of dense leaves. This strategy solves the problem of extracting stem point clouds under canopy with narrow row spacing and cross-leaf occlusion;
2. A real-time measurement pipeline is proposed to estimate the stem diameters. In this pipeline, we present two novel stem diameter estimation approaches based on stem point cloud geometry. Our approaches can effectively reduce the influences of depth noise or error on the estimation results;
3. A post-processing approach is presented to fill the missing parts of the stem point clouds caused by the occlusion of dense adjacent leaves. This approach ensures the integrity of the stem point clouds obtained by RGB-D cameras in complex field scenarios and improves the accuracy of stem diameter estimation.
We hope that the study of stem diameter estimation for high-stem crops, such as maize, in real and complex field scenarios can accelerate the development of field phenotypic equipment and technology.

HTP Platforms
Field-based HTP is a multi-scale crop observation technique based on phenotyping platforms equipped with multiple types of sensors [13]. Currently, the common phenotyping platforms can be roughly divided into three types: fixed, aerial, and mobile platforms [14]. Based on the inherent properties of these platforms, phenotyping parameters can be obtained at various scales, from organ level to plot level. Generally, the fixed platforms fitted with visual and laser sensors allow for high-accuracy monitoring of different crop organs with a 360° view [15,16]. However, they only obtain the phenotyping parameters of fixed plots. Aerial platforms, such as unmanned aerial vehicles (UAVs), enable rapid observation of crops at the plot level [17]. However, the sensing accuracy gradually decreases with increases in flight altitude [18,19]. Mobile platforms, especially mobile robots, are a new type of phenotyping platform that can travel freely in breeding fields. This high mobility gives them a unique ability: they can automatically obtain phenotypic parameters between narrow crop rows in different plots [20,21]. Therefore, the mobile robots used for phenotypic observation integrate the dual advantages of fixed and aerial platforms, which is significant for promoting the rapid development of phenomics [22].

HTP Robots
As an interdisciplinary subject of agronomy, computer science, and robotics, phenotyping technologies based on mobile robots equipped with various phenotyping sensors, known as HTP robots, have been widely reported [23-26]. The representative HTP robots come from some of the top research institutions, such as Carnegie Mellon University [27,28] and the University of Illinois at Urbana-Champaign [29,30]. Their HTP robots focus on crop row phenotyping of high-stem crops (maize, sorghum, and sugarcane, for example) by stereo cameras and/or depth cameras. Ref. [28] presented a deep-learning-based online pipeline for in situ sorghum stem detection and grasping. Ref. [29] developed a tracked HTP robot from design to field evaluation, and measured the stem height and width of energy sorghum based on their previous work. Ref. [30] described high-precision control and corn stand counting algorithms for an autonomous field robot. Additionally, Ref. [31] employed the "Phenomobile", a phenotyping robot [32] equipped with 3D LIDAR, to obtain the row spacing and plant height of a maize field. The highlight of this work is that the HTP robot could obtain parcel-level phenotyping parameters by moving along the side of a road in a breeding field, rather than traveling between crop rows.

Phenotyping Sensors
These advanced robots have exhibited excellent performance in plant phenotyping, which benefits from the phenotyping sensors used to perceive crop information. Color digital cameras, spectrometers (hyperspectral and multispectral), thermal infrared cameras, etc. are widely used on HTP robots [33]. In addition, 3D sensors, such as stereo cameras, depth cameras, and LIDAR, can be used to obtain crop 3D data. Based on the sensing data, crops' morphological and structural parameters can be extracted [34]. Although these sensors work well in phenotyping, they usually only capture a single type of data. For example, color digital cameras and LIDAR can only capture RGB images and 3D point clouds, respectively. Stereo cameras can calculate the depth values of observed targets from two RGB images with different perspectives, but they have heavy computation loads [35]. In recent years, RGB-D cameras have received considerable attention in HTP applications [36,37], because they can obtain color and depth images in the same frame at close range, which makes them a potential alternative to color cameras and LIDAR. In our work, we conduct phenotyping using a self-developed phenotyping platform equipped with an RGB-D camera.

Maize Phenotyping
Field-based maize phenotyping is difficult under natural growing conditions because of the disturbance of lighting conditions and the crossover/shading of leaves from adjacent shoots that occurs in the later growth stages. Thus, many phenotyping studies have focused on the early growth stages, observing the maize canopy from a top view in relatively regular scenarios [38-40]. Ref. [39] proposed an approach to extract morphological and color-related phenotypes that uses an end-to-end segmentation network on top-view images at the seedling stage. Ref. [40] utilized image sequences obtained by UAV to reconstruct a 3D model of maize crops and estimated the leaf number, plant height, individual leaf area, etc. Ref. [38] developed a robot system composed of a four-degree-of-freedom robotic manipulator, a ToF camera, and a linear potentiometer, which used deep learning and conventional image processing to detect and grasp maize stems in a greenhouse. Some exploratory studies have also been carried out in real open fields, as mentioned in references [27,29]. Ref. [41] described a 3D reconstruction and point cloud processing pipeline of maize crops in the native environment, and realized the extraction of main parameters for individual plant phenotypic characteristics. However, some key issues, such as feature recognition, phenotypic parameter extraction, and the occlusion mentioned above, still need to be addressed in a better way. The maize yield is directly influenced by lodging tolerance. Related studies have shown that lodging can cause 80% yield losses, depending on the crop and field location [42]. The stem strength of crops is an important index of lodging tolerance. Herein, our research interests focus on the automatic measurement of maize stem diameters during the mature stage in the natural environment.

HTP Platform
Maize plants usually have row spacing of 0.5-0.8 m for most cultivation patterns of equal row spacing. To extract the phenotypic parameters automatically under these cultivation patterns, we developed an ultra-narrow phenotyping robot platform that can move between maize crop rows [43]. The mechanical dimensions of the platform are 0.80 m × 0.45 m × 0.40 m (length × width × height, not counting the height of the mast), with a mass of 40 kg. In addition, a retractable mast could be mounted on the robot platform. It is used to fix sensors to observe crop organs during any growth period of maize. The maximum height of the mast can reach 2.2 m.
The robot system adopted a distributed design. We divided it into three control units: driving module, navigation module, and phenotypic data acquisition module. The driving module was a wireless remote-control system and the navigation module was a control system based on an industrial personal computer (IPC). The low-level controller of the robot was an STM32 development board, which was used for the motion control of four electronic governors by connecting to a central expansion board. The four electronic governors drove four deceleration motors connected to the wheel bearings. The navigation module used a Global Navigation Satellite System (GNSS) and a laser scanner to realize the mapping and navigation in the field based on Cartographer, a simultaneous localization and mapping (SLAM) algorithm. Note that the navigation module was an integral part of our robot, but it was not the research focus of this paper. The data acquisition module consisted of an RGB-D camera and four lighting devices, which ensured that the robot moving under the dense canopy could still capture sufficient 3D information of crops, even in poor lighting conditions. The robot software we developed was coded under the Robot Operating System (ROS) on Ubuntu 20.04. The schematic of our HTP robot platform is shown in Figure 1. The specifications and parameters of our robot are shown in Table 1.

Table 1 (recovered rows).
Steering mode: Differential steering
Ground clearance: 0.10 m
Working environment: In-row
Applied coding interface: ROS, C++, Python

Field Data Collection
We conducted field experiments at the Beijing Academy of Agriculture and Forestry Sciences in August 2020. We selected a planting area of 18 × 22 m² as the experimental area, which was composed of three parts: crops, crop rows, and aisles. Our robot could move freely through the crop rows and aisles in the teleoperation mode. Usually, the stem portion at the base of crops had a greater impact on the lodging resistance. To ensure that the sensor's field of view covered the basal stem area as much as possible, we fixed the RGB-D camera (Intel® RealSense D435i) to a tray at a height of 0.5 m. In addition, due to the poor lighting conditions under dense crop canopies, we installed four LED lighting devices on the sensor trays. These LED lights were kept on as the robot worked between crop rows. To prevent the camera lens from being blocked by the messy maize leaves, we kept the camera lens facing opposite to the moving direction of the robot, which reduced the contact frequency between the camera lens and the leaves. In this way, the RGB-D camera could capture the stems on either side of the crop rows.
The specific experimental scheme is shown in Figure 2. The aisle divided the experimental field into two areas of 10 × 18 m². For each area, the crop row spacing was 0.6 m. The robot collected phenotypic data from 8 crop rows at a speed of 0.1 m/s. In addition, to enrich our data sets, we kept our robot moving in an aisle with a width of 1.4 m to collect phenotypic data for both sides of the crop areas. The arrows in Figure 2 show the route traveled by our robot.
During our experiments, the resolution and frame rate of the RGB and depth images from the RealSense D435i cameras were set to 640 × 480 and 30 fps, respectively. The cameras captured approximately 360 samples of maize plants. It is worth noting that we marked the maize plants in the experimental areas with different color flags and manually measured 120 sets of stem diameters beside the flags as the benchmark values to verify the measurement performance. The experimental scenarios are shown in Figure 3.

Data Processing
The comprehensive framework of our algorithm for calculating the stem diameters using the RGB-D camera consisted of two steps: (1) extracting the point cloud for the maize stems, which consisted of stem detection, mask processing, and point cloud extraction; (2) estimating the stem diameters with the two approaches we proposed: one is based on the point cloud convex hull (SD-PCCH), and the other is based on the projection of the point cloud (SD-PPC). In addition, we filled the missing stem point cloud in the process of calculating the stem diameters. Figure 4 shows the flowchart for the whole data processing framework. To speed up the coding process, we developed the data processing project using the Point Cloud Library (PCL).

Extraction of Stem Point Cloud
Stem detection is a critical pre-processing step, which helps to accurately extract regions of interest (ROI) from noisy data. The Faster RCNN model, a two-stage object detector, is well suited to real-time detection of field stems. The model consists of three parts: the backbone, the Region Proposal Network (RPN), and the classification and regression module. The backbone is a convolutional network used to extract the feature maps of input images. We adopted a residual network based on ResNet50 as the backbone. At the same time, a feature pyramid network (FPN) was introduced into the backbone to improve the precision of the feature maps. The RPN is a core network of the Faster RCNN, which is used to quickly generate potential regions of interest. The classification and regression module uses the features in each ROI to identify the ROI classes and generate the object bounding boxes.
Due to the limited number of labeled images, we adopted transfer learning to accelerate the convergence of model training on small sample datasets. The Faster RCNN model was initialized using weights pretrained on the Pascal VOC 2012 dataset, a large annotated image dataset open to the public. For model training, we annotated a total of 1800 images with maize stems in the format of Pascal VOC 2012. These images included typical field scenes under different lighting intensities (e.g., strong lighting intensity, backlighting, and exposure) and fields of view (e.g., close-distance and long-distance).
As the maize stem was the only object to be detected, the number of classes for the detection model was set to 2, i.e., background and stem. The Faster RCNN model was trained using stochastic gradient descent (SGD) with momentum, with an initial learning rate of 0.005, a momentum of 0.9, and a weight decay of 0.0005. To improve the stability of model convergence, the learning rate was adjusted once every 5 epochs by a ratio factor of 0.33. Starting from the model pretrained on Pascal VOC 2012, a total of 300 epochs were used to ensure the model convergence for the maize stem detection task. The model weights were saved after each epoch, and we chose the weights with the highest accuracy for stem detection. The model was trained on a graphic workstation, a Dell Precision 7920 Tower (two Xeon Silver 4214R CPUs @ 2.4 GHz, 128 GB RAM, and an NVIDIA GeForce RTX 3070 (8 GB)), running Ubuntu 20.04 with PyTorch 1.6.0.
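As a rough illustration of the learning rate adjustment described above, the following sketch assumes a multiplicative step decay, i.e., the learning rate is multiplied by 0.33 once every 5 epochs. The function name `step_decay_lr` and the 0-indexed epoch convention are illustrative assumptions, not part of the original training code.

```python
def step_decay_lr(epoch, base_lr=0.005, gamma=0.33, step_size=5):
    """Learning rate at a 0-indexed epoch under step decay.

    Assumption: "adjusted once every 5 epochs by a ratio factor of 0.33"
    is read as multiplying the rate by 0.33 after each block of 5 epochs.
    """
    return base_lr * gamma ** (epoch // step_size)


if __name__ == "__main__":
    # Print the rate at a few epochs around the first two decay steps.
    for epoch in (0, 4, 5, 9, 10):
        print(f"epoch {epoch:3d}: lr = {step_decay_lr(epoch):.6f}")
```

Under this reading the rate stays at 0.005 for epochs 0 to 4, drops to 0.00165 for epochs 5 to 9, and so on.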
The stems detected by the Faster RCNN model were annotated with red bounding boxes based on their pixel coordinates. To extract all of the pixels of these stems based on the color information, the interiors of the bounding boxes were filled with red rectangles to highlight the stem pixels, a step we call mask processing. Generally, the HSV color space represents the color characteristics of an object better than the RGB space. Thus, we converted the RGB images with rectangular markers to the HSV space for post-processing. We defined the red mask threshold in the HSV space as follows:

$$\mathrm{mask}(u, v) = \begin{cases} 1, & Hue_{min} \le H(u, v) \le Hue_{max},\; Saturation_{min} \le S(u, v) \le Saturation_{max},\; Value_{min} \le V(u, v) \le Value_{max} \\ 0, & \text{otherwise} \end{cases}$$

where H(u, v), S(u, v), and V(u, v) are the hue, saturation, and value of the pixel at (u, v), and Hue_min = 50, Saturation_min = 100, Value_min = 100, Hue_max = 70, Saturation_max = 255, and Value_max = 255. According to the mask area, the pixel coordinates of the stem regions could be extracted. To better distinguish the stem and background pixels in the color image, the background pixels were replaced with zeros.

In general, the stem point cloud could be obtained based on the RGB images and their corresponding depth images. All of the depth values of stems in the depth images could be extracted according to the coordinates of the stem pixels in the RGB images. These depth values were used to calculate the stem point cloud based on the camera parameters. Specifically, stem point cloud extraction contains three steps: (1) judging whether each pixel coordinate belongs to the stem region according to the color information. As described above, the color values of the stem area were non-zero, while the color values of the background pixels were zero; (2) obtaining the non-zero coordinates of the RGB images, which were the stem coordinates of the depth images, so that the depth values of stems could be extracted according to these non-zero coordinates; and (3) generating a 3D point cloud from the depth values of stems and the camera parameters.
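The mask processing and the pixel coordinate extraction described above can be sketched as follows. This is a minimal illustration that assumes the image is already in HSV form, stored as nested lists of (H, S, V) tuples on the same numeric scale as the stated thresholds; the helper names `in_mask`, `apply_mask`, and `stem_pixel_coordinates` are hypothetical.

```python
# Thresholds as stated in the text: (Hue, Saturation, Value).
HSV_MIN = (50, 100, 100)
HSV_MAX = (70, 255, 255)


def in_mask(hsv, lower=HSV_MIN, upper=HSV_MAX):
    """True if every HSV channel lies inside its [min, max] interval."""
    return all(lo <= c <= hi for c, lo, hi in zip(hsv, lower, upper))


def apply_mask(hsv_image):
    """Binary mask (1 = marked stem region, 0 = background)."""
    return [[1 if in_mask(px) else 0 for px in row] for row in hsv_image]


def stem_pixel_coordinates(mask):
    """(u, v) coordinates of non-zero mask pixels; u is the column, v the row."""
    return [(u, v)
            for v, row in enumerate(mask)
            for u, value in enumerate(row) if value]


if __name__ == "__main__":
    # Tiny 2 x 2 synthetic HSV image (values chosen for illustration only).
    image = [[(60, 200, 200), (10, 200, 200)],
             [(60, 50, 200), (70, 255, 255)]]
    mask = apply_mask(image)
    print(mask)
    print(stem_pixel_coordinates(mask))
```

The returned coordinates are the ones at which depth values would then be read from the aligned depth image.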
The equations for calculating the 3D points are:

$$P_z = \frac{d}{\text{camera\_factor}}, \qquad P_x = \frac{(u - \text{camera\_cx}) \, P_z}{\text{camera\_fx}}, \qquad P_y = \frac{(v - \text{camera\_cy}) \, P_z}{\text{camera\_fy}}$$

where P_z, P_x, and P_y represent the spatial coordinates of the 3D point, d is the depth value of the current pixel, u and v represent the pixel coordinates of the depth image, camera_cx and camera_cy determine the aperture center of the camera, camera_fx and camera_fy represent the focal length of the camera on the X and Y axes, respectively, and camera_factor is the scale factor for the depth image.
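The back-projection of stem pixels into 3D can be sketched as below, assuming the standard pinhole model: P_z = d / camera_factor, P_x = (u - camera_cx) P_z / camera_fx, and P_y = (v - camera_cy) P_z / camera_fy. The intrinsic values in the example are placeholders rather than the calibration of the camera used in the experiments, and the function names are hypothetical.

```python
def depth_pixel_to_point(u, v, d, camera_fx, camera_fy,
                         camera_cx, camera_cy, camera_factor):
    """Back-project one depth pixel (u, v, d) to a 3D point (P_x, P_y, P_z)."""
    p_z = d / camera_factor
    p_x = (u - camera_cx) * p_z / camera_fx
    p_y = (v - camera_cy) * p_z / camera_fy
    return (p_x, p_y, p_z)


def stem_point_cloud(depth, mask, **intrinsics):
    """Collect 3D points for pixels that are inside the mask and have valid depth."""
    points = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if mask[v][u] and d > 0:  # zero depth means no measurement
                points.append(depth_pixel_to_point(u, v, d, **intrinsics))
    return points


if __name__ == "__main__":
    # Placeholder intrinsics (assumed): depth stored in millimetres.
    intrinsics = dict(camera_fx=600.0, camera_fy=600.0,
                      camera_cx=1.0, camera_cy=1.0, camera_factor=1000.0)
    depth = [[0, 1000, 0],
             [1000, 1000, 1000],
             [0, 1000, 0]]
    mask = [[0, 1, 0],
            [1, 1, 0],
            [0, 0, 0]]
    for point in stem_point_cloud(depth, mask, **intrinsics):
        print(point)
```

A pixel at the principal point maps to a point on the optical axis, which gives a quick sanity check of the intrinsics.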
Normally, the extraction of stem point clouds is complete after calculating all of the 3D points in the detected rectangle markers. However, the stem detection model recognizes all maize stems within the camera's field of view, resulting in multiple maize stem point clouds for each frame. Additionally, one stem may be divided into multiple point clouds due to the occlusion of leaves. Figure 5a shows both cases. Thus, we need to accurately detect every stem and cluster the point clouds belonging to the same stem. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is an excellent clustering algorithm for point clouds that have significant density characteristics [44]. This algorithm can be used to cluster stem point clouds, since the point clouds belonging to the same stem have higher density. For DBSCAN, we set the neighborhood distance threshold to 0.01 m and the minimum number of points in the neighborhood of a core point to 20. To improve the efficiency of clustering, we introduced a KD-Tree search algorithm to search for neighborhood points in DBSCAN.

In fact, the existing DBSCAN algorithm may lead to the mis-clustering of stem point clouds. For example, two plants were divided into four parts, which were considered to be four plants by DBSCAN, as shown in Figure 5a. As a result, we propose an improved DBSCAN, named 2D-DBSCAN, as shown in Figure 5. 2D-DBSCAN can realize accurate clustering of stem point clouds. We assumed that the growth direction of crops is roughly parallel to the vertical axis (Y-axis) of the 3D coordinates. Then, a stem could be divided into several segments at different heights along the Y-axis. 2D-DBSCAN consists of three steps: (i) the stem point clouds in Figure 5a are projected onto the X-Z plane; (ii) DBSCAN is used to cluster each stem in the 2D plane, as shown in Figure 5b; (iii) the stem point clouds after clustering are restored to 3D space, as shown in Figure 5c.
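The three steps of 2D-DBSCAN can be sketched as follows. This is a minimal pure-Python illustration, not the PCL-based implementation: it uses a brute-force neighbor search instead of a KD-Tree, and it assumes that the DBSCAN parameters stated earlier (0.01 m and 20 points) are reused for the 2D clustering. The function names and the synthetic point cloud are hypothetical.

```python
import math


def dbscan_2d(points, eps, min_pts):
    """Plain DBSCAN on 2D points; returns one label per point (-1 = noise)."""
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        xi, zi = points[i]
        return [j for j in range(n)
                if math.hypot(points[j][0] - xi, points[j][1] - zi) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1              # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster     # noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:    # only core points expand the cluster
                queue.extend(nbrs)
    return labels


def cluster_stems_2d_dbscan(cloud, eps=0.01, min_pts=20):
    """Cluster (x, y, z) points by DBSCAN on their X-Z projection."""
    projected = [(x, z) for x, _y, z in cloud]       # (i) project onto the X-Z plane
    labels = dbscan_2d(projected, eps, min_pts)      # (ii) cluster in 2D
    clusters = {}
    for point, label in zip(cloud, labels):          # (iii) restore to 3D
        if label != -1:
            clusters.setdefault(label, []).append(point)
    return list(clusters.values())


def synthetic_split_stems():
    """Two stems 0.3 m apart, each split into two vertical segments by a gap."""
    cloud = []
    for x0 in (0.0, 0.3):
        for i in range(5):
            for k in range(5):
                for y in (0.00, 0.01, 0.02, 0.10, 0.11, 0.12):
                    cloud.append((x0 + 0.004 * i, y, 1.0 + 0.004 * k))
    return cloud


if __name__ == "__main__":
    stems = cluster_stems_2d_dbscan(synthetic_split_stems())
    print(len(stems), [len(s) for s in stems])
```

Because the projection discards the Y coordinate, segments of the same stem that are separated by a vertical gap fall on the same footprint in the X-Z plane and are merged into one cluster.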
The white rectangle in Figure 5c indicates that the stems previously divided into multiple point clouds are merged into one cluster. Note that the grey point cloud in Figure 5c is the missing part caused by occlusion or observation error. The approach for filling the missing stem parts is introduced in detail in the subsequent section.

Estimation of Stem Diameters
We used two approaches, SD-PCCH and SD-PPC, to estimate the stem diameters of maize plants based on the point cloud clustering results. Among them, SD-PCCH used the volume and height of the point cloud convex hull to calculate the stem diameters. This approach assumed that the geometry of the detected maize stem parts was semi-cylindrical. The estimation process is shown in Figure 6a. SD-PPC is another approach to calculating the stem diameters, which is based on the projection of the point cloud on a 2D plane. Figure 6b shows the calculation process for SD-PPC.
SD-PCCH estimates the stem diameters by constructing a convex hull. The convex hull is the convex polyhedron formed by connecting the outermost points of a point cloud cluster. Convex hull detection is often used in object recognition, gesture recognition, and boundary detection. Thus, based on the geometry of the stem point cloud, the volume and height of the point cloud can be estimated by convex hull detection algorithms. SD-PCCH consists of four steps: (1) conducting pre-processing, such as statistical filtering, to remove the noise points from the point cloud cluster; (2) generating the point cloud convex hull based on the Convex Hull function in PCL, and obtaining the volume of the convex hull; (3) obtaining the maximum and minimum values on the Y-axis (Y_max and Y_min) for each stem point cloud; and (4) regarding the convex hull as a semi-cylinder based on the geometric characteristics of the stem. Note that the volume of the convex hull is half of the volume of the cylinder, because the point cloud covers only half of the stem. In step (3), the height of the stem is calculated by:

$$H = Y_{max} - Y_{min}$$

Estimation of Stem Diameters
We used two approaches, SD-PCCH and SD-PPC, to estimate the stem diameters of maize plants based on the point cloud clustering results. Among them, SD-PCCH used the volume and height of the point cloud convex hull to calculate the stem diameters. This approach assumed that the geometry of the detected maize stem parts was semi-cylindrical. The estimation process is shown in Figure 6a. SD-PPC is another approach to calculating the stem diameters, which is based on the projection of the point cloud on a 2D plane. Figure 6b shows the calculation process for SD-PPC.  In this way, the stem diameters can be calculated as: where D and V are the width and volume of the semi-cylinder, respectively. The estimation of stem diameters is effective using the convex hull of point clouds because the shape of the convex hull can be regarded as a semi-cylinder, which is similar to the real morphological structure of stems. However, this approach has a high requirement for the reconstruction precision of the stem point cloud. We know that the convex hull is composed of the vertex coordinates of the point cloud in all directions. Thus, even a small number of outliers will have a great influence on the measurement accuracy of stems. As shown in Figure 7, this image is a stem point cloud generated by 2D-DBSCAN clustering. We can see that there are a few outliers around the stem, such as the points within the red circular area. As a result, the convex hull generated by the point cloud has a larger volume than the actual stem, which will result in a larger stem diameter.
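The semi-cylinder assumption behind SD-PCCH can be illustrated with a minimal Python sketch. It assumes the convex hull volume has already been computed (for example with PCL) and derives the diameter from the stated half-cylinder geometry; the function name `sd_pcch_diameter` and the synthetic numbers are hypothetical and are not taken from the authors' implementation.

```python
import math


def sd_pcch_diameter(hull_volume, y_values):
    """Estimate a stem diameter (m) under the semi-cylinder assumption.

    hull_volume: convex hull volume in m^3 (assumed to be computed elsewhere).
    y_values: Y coordinates (m) of the stem points, used for the stem height.
    """
    height = max(y_values) - min(y_values)  # H = Y_max - Y_min
    if height <= 0 or hull_volume <= 0:
        raise ValueError("hull volume and stem height must be positive")
    # Half cylinder: V = (1/2) * pi * (D/2)^2 * H, so D = sqrt(8 V / (pi H)).
    return math.sqrt(8.0 * hull_volume / (math.pi * height))


# Synthetic check: half of a cylinder with radius 10 mm and height 100 mm.
radius, stem_height = 0.010, 0.100
volume = 0.5 * math.pi * radius ** 2 * stem_height
diameter = sd_pcch_diameter(volume, [0.0, stem_height])

# A hull inflated by outliers (here by 20%, an arbitrary figure) gives a larger diameter.
inflated = sd_pcch_diameter(1.2 * volume, [0.0, stem_height])
```

Because the diameter grows with the square root of the hull volume, any outlier that enlarges the hull directly biases the estimate upward, which matches the sensitivity described above.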
Here, we propose the second approach for measuring stem diameters, SD-PPC. As mentioned above, the depth values of the point cloud, that is, the values along the Z-axis, are sometimes inaccurate, which affects the stem measurement accuracy, as shown in Figure 8a. Projecting the point cloud onto the X-Y plane eliminates the influence of inaccurate depth values while still describing the shape of the stem, as shown in Figure 8b. We therefore used the projection of the stem point cloud on the X-Y plane to estimate the stem diameters, as shown in Figure 8. This approach consists of seven steps. (1) Pre-processing. This step is the same as the pre-processing for SD-PCCH.
(2) Establishing a plane projection model, which sets the Z values of the point cloud to zero. In this way, the point cloud changes from 3D to 2D, as shown in Figure 8b. (3) Generating a virtual point cloud. This point cloud is used to fill in the areas where data are missing due to leaf occlusion. The detailed method is described in Section 3.
The remaining steps extract the contour of the projected point cloud, calculate the polygon area enclosed by the contour, obtain the minimum and maximum values on the Y-axis from the 2D point cloud coordinates, and calculate the stem diameter with the following equation:
$$W_{stem} = \frac{S_{area}}{Y_{max} - Y_{min}}$$
Here, $W_{stem}$ is the stem diameter, $S_{area}$ is the polygon area of the point cloud on the X-Y projection plane, and $Y_{max}$ and $Y_{min}$ are the maximum and minimum values on the Y-axis of the 2D point cloud coordinates, respectively.
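The projection-based estimate can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' PCL implementation: the contour is approximated by a 2D convex hull (Andrew's monotone chain), the area is computed with the shoelace formula, and the helper names are hypothetical.

```python
def convex_hull_2d(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula for the area of a simple polygon."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0


def sd_ppc_diameter(points_3d):
    """Drop Z (X-Y projection) and return S_area / (Y_max - Y_min)."""
    projected = [(x, y) for x, y, _z in points_3d]
    ys = [y for _x, y in projected]
    height = max(ys) - min(ys)
    if height <= 0:
        raise ValueError("stem height must be positive")
    # Assumption: the convex hull stands in for the extracted contour.
    return polygon_area(convex_hull_2d(projected)) / height


# Synthetic stem strip: 20 mm wide, 100 mm tall, with noisy depth values.
stem = [(i * 0.005, j * 0.01, 0.5 + 0.01 * ((i + j) % 3))
        for i in range(5) for j in range(11)]
width = sd_ppc_diameter(stem)
```

Since the Z coordinate is discarded before the area is computed, changing the depth values of `stem` leaves `width` unchanged, which is the property SD-PPC relies on.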

Filling Strategy for Missing Stem Parts in the Point Cloud
In the above section, we have realized the real-time estimation of stem diameters with our pipeline. However, the extracted stem point cloud usually has missing regions due to occlusion by plant leaves and holes in the depth images of RGB-D cameras. These regions split the point cloud belonging to the same stem into two or more parts, which affects the stem measurement results. In Section 3.3.1, we proposed the 2D-DBSCAN to accurately cluster the 3D stem points. In this section, we fill in the missing parts after clustering, shown as the gray point cloud in Figure 5c. We propose a missing point cloud filling strategy based on grid division, as shown in Figure 9, which consists of six steps: (i) Traversing the point cloud from the minimum value to the maximum value of the Y coordinate at a grid threshold interval of 0.003 m on the Y-axis. The range of the traversal is given by:
$$0 \le i \le \frac{y_{max} - y_{min}}{\alpha}$$
where $i$ is the number of rows traversed on the Y-axis, $\alpha = 0.003$ indicates that the grid threshold of the traversal interval is 0.003 m, and $y_{max}$ and $y_{min}$ are the maximum and minimum values of the point cloud coordinates on the Y-axis, respectively.
(ii) For the region between row $i$ and row $i + 1$ on the Y-axis during the traversal process, if the region contains 3D points, that is, if Equation (7) is satisfied, the region does not need to be filled:
$$y_{min} + i\alpha \le P_y \le y_{min} + (i + 1)\alpha \tag{7}$$
where $P_y$ represents the coordinate value of existing 3D points along the Y-axis.
(iii) If the condition of the previous step is not met, point cloud filling is performed on the corresponding region. (iv) The number of points that need to be filled in this region can be expressed as:
$$K = \frac{x_{max} - x_{min}}{\alpha}$$
where $x_{max}$ and $x_{min}$ represent the maximum and minimum values of the point cloud on the X-axis, respectively, and $K$ is the number of points that need to be filled.
(v) The X and Y coordinates of the added points are assigned within the region to be filled, that is, between row i and row i + 1 on the Y-axis and between x min and x max on the X-axis. (vi) The Z values of the added points are set to 0, because SD-PPC does not need the Z values to calculate the stem diameters. Meanwhile, for SD-PCCH, the convex hull already encloses the missing point cloud area, so there is also no need to calculate the Z values.
It is worth noting that SD-PCCH does not need to fill the point cloud for missing areas, because this approach relies only on the point cloud convex hull. In contrast, the point cloud filling strategy is applicable to SD-PPC, because SD-PPC needs to extract the contour of the point cloud to calculate the stem diameters, and missing areas easily lead to a wrong point cloud contour.
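The grid-based filling strategy can be sketched as below. This is a minimal illustration under explicit assumptions: the number of points per empty row is taken as the X extent divided by the grid threshold, and the added points are placed at the row centre and spaced evenly along X. Both choices, and the function name `fill_missing_rows`, are hypothetical and may differ from the authors' implementation.

```python
import math


def fill_missing_rows(points, alpha=0.003):
    """Return virtual (x, y, 0.0) points for empty Y-rows of one stem cluster.

    points: list of (x, y, z) tuples in metres.
    alpha: grid threshold of the traversal interval in metres.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)

    n_rows = int(math.ceil((y_max - y_min) / alpha))
    occupied = set()
    for y in ys:
        # Row index of each existing point; the top point is clamped into the last row.
        occupied.add(min(int((y - y_min) / alpha), max(n_rows - 1, 0)))

    # Assumption: K points per empty row, with K = (x_max - x_min) / alpha.
    k = max(int(round((x_max - x_min) / alpha)), 1)
    filled = []
    for i in range(n_rows):
        if i in occupied:
            continue  # the row already contains 3D points, nothing to fill
        y_fill = y_min + (i + 0.5) * alpha  # assumed placement: row centre
        for j in range(k):
            x_fill = x_min + (j + 0.5) * (x_max - x_min) / k
            filled.append((x_fill, y_fill, 0.0))  # Z is set to 0
    return filled


# Synthetic stem split into a lower and an upper part by an occluding leaf.
lower = [(x, 0.0005 + 0.001 * j, 0.5) for x in (0.0, 0.006, 0.012) for j in range(10)]
upper = [(x, 0.0310 + 0.001 * j, 0.5) for x in (0.0, 0.006, 0.012) for j in range(10)]
virtual = fill_missing_rows(lower + upper)
```

In this synthetic case every virtual point falls inside the gap between the two parts, so the merged cloud yields one continuous contour instead of two.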

Extraction of Stem Point Clouds
The extraction of stem point clouds consists of two steps: stem detection and point cloud extraction. Figure 10 shows the stem detection results of Faster RCNN under natural scenarios, including strong lighting, backlighting, close-distance views, and long-distance views. We used the mean average precision (mAP) to evaluate the performance of the stem detection model. The average precision (AP) of a class is the area under its precision-recall (PR) curve, and mAP is the mean AP over all classes. mAP is computed as:
$$Precision = \frac{TP}{TP + FP}, \qquad Recall = \frac{TP}{TP + FN}$$
$$AP = \frac{\sum Precision_C}{N_C}, \qquad mAP = \frac{\sum AP}{Q}$$
where $TP$, $FP$, and $FN$ are the numbers of true positives, false positives, and false negatives, respectively, $N_C$ is the number of objects of class $C$ in all images, and $Q$ is the number of detected object classes. Since the maize stem is the only detection object in our model, $Q = 1$. Figure 11 shows the loss curve and PR curve after model convergence. We found that the mAP of stems was 67%.
The stem point cloud extracted based on the stem detection results is shown in Figure 12. Figure 12b shows the mask processing results based on the stem bounding box. Figure 12c shows the ROI of the extracted stem coordinates. Figure 12d shows the stem point cloud obtained from the depth image based on the RGB-D camera parameters.
Because the stem area coordinates in the color image are the same as those in the depth image for RGB-D camera frames, the stem depth values in the depth image can be extracted according to the stem pixel locations in the color image.
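The conversion from a detected bounding box to a stem point cloud can be illustrated with the standard pinhole camera model. The sketch below assumes that the depth image is already aligned to the color image and that raw depth values are stored in millimetres (`depth_scale=0.001`); the function name and the toy intrinsics are hypothetical and do not come from the authors' code or from a real camera calibration.

```python
def deproject_roi(depth, bbox, fx, fy, cx, cy, depth_scale=0.001):
    """Convert depth pixels inside a bounding box into 3D points in metres.

    depth: 2D list (rows) of raw depth values aligned to the color image.
    bbox: (u_min, v_min, u_max, v_max) in pixels, with the max bounds exclusive.
    fx, fy, cx, cy: pinhole intrinsics of the camera.
    depth_scale: metres per raw depth unit (assumed value).
    """
    u_min, v_min, u_max, v_max = bbox
    points = []
    for v in range(v_min, v_max):
        for u in range(u_min, u_max):
            z = depth[v][u] * depth_scale
            if z <= 0:
                continue  # hole in the depth image: no valid measurement
            # Pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


# Toy 4 x 4 depth image at 1 m with a single hole inside the stem box.
depth_image = [[1000] * 4 for _ in range(4)]
depth_image[1][1] = 0
stem_points = deproject_roi(depth_image, (1, 1, 3, 3), fx=2.0, fy=2.0, cx=2.0, cy=2.0)
```

Pixels without a valid depth value are skipped, which is how holes in the depth image turn into missing regions of the stem point cloud.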

Visualization of Convex Hull and 2D Projection of Point Cloud
Building convex hulls and plane projections of stem point clouds are the key steps in implementing SD-PCCH and SD-PPC. Figure 13 shows the results of constructing convex hulls and plane projections of stem point clouds. Figure 13b shows the point clouds of each stem obtained by 2D-DBSCAN clustering from Figure 13a. Figure 13c,d show the convex hulls and plane projections of each extracted stem, respectively.
Figure 14 shows the results of our proposed point cloud filling strategy. The red areas are the point clouds obtained from the depth images, and the gray areas are the filling points. Our approach can join point clouds belonging to the same stem. It is worth noting that point cloud filling does not significantly affect the measured stem diameters, but it avoids multiple measurements of the same observed stem in SD-PPC, because a split stem without filling generates two or more point cloud contours.

Stem Diameter Estimation with SD-PCCH and SD-PPC
In this section, we compare the stem diameters estimated by our approaches with the manually measured values. Specifically, we evaluate the measurement accuracy of the two approaches, SD-PCCH and SD-PPC. Figures 15 and 16 show the comparison of our two approaches with the manual measurements, where the horizontal and vertical coordinates represent the manual measurement values and the values estimated by our approaches, respectively. We used approximately 120 sets of stem samples from the testing dataset to estimate the stem diameters. We employed $R^2$ and RMSE to evaluate the stem diameter estimation results. $R^2$ measures how well the estimated values agree with the manual measurements, where $R^2 = 1$ means that the estimated values match the manual measurements exactly. RMSE is the root-mean-square error, which emphasizes large deviations between the estimated values and the manual measurement values. $R^2$ and RMSE can be expressed as:
$$R^2 = 1 - \frac{\sum_{k=1}^{n} (y_k - \hat{y}_k)^2}{\sum_{k=1}^{n} (y_k - \bar{y})^2}$$
$$RMSE = \sqrt{\frac{1}{n} \sum_{k=1}^{n} (y_k - \hat{y}_k)^2} \tag{11}$$
where $n$ is the data sample size, $y_k$ and $\hat{y}_k$ are the $k$-th manual measurement value and estimated value, respectively, and $\bar{y}$ is the mean value of the manual measurements.
As can be seen from Figures 15 and 16, both approaches followed the manual measurements, but with clearly different accuracy. The $R^2$ and RMSE for SD-PCCH were 0.35 and 4.99 mm, respectively. The $R^2$ and RMSE for SD-PPC were 0.72 and 2.95 mm, respectively. The experimental results showed that SD-PPC had better measurement accuracy than SD-PCCH. The reason is that noisy depth values increased the volumes of the point cloud convex hulls, which caused the estimated stem diameters to increase accordingly. In contrast, the stem diameter estimation of SD-PPC was not affected by the depth values of the point clouds.
We also performed a statistical analysis of the distribution of the SD-PCCH, SD-PPC, and manual measurement values, as shown in Figure 17. The results showed that SD-PPC had a higher measurement accuracy than SD-PCCH, because the maximum, minimum, and median of SD-PPC were in better agreement with the manual measurements than those of SD-PCCH. Meanwhile, according to the interquartile range, the estimated values of SD-PPC were more concentrated around the median than those of SD-PCCH.

Additionally, we compared our stem diameter estimation results with those previously reported in [25,27]. We took $R^2$, RMSE, and the Mean Absolute Error (MAE) as evaluation indexes. We added MAE as a comparison term because it is less sensitive than RMSE to outliers in the estimated values, so it better expresses the robustness of different algorithms. MAE is computed as:
$$MAE = \frac{1}{n} \sum_{k=1}^{n} \left| y_k - \hat{y}_k \right|$$
where $y_k$ and $\hat{y}_k$ are the $k$-th manual measurement value and estimated value, respectively. The comparison results are shown in Table 2. Our stem diameter estimation results were better than those of [25] and [27], because our $R^2$ was greater and our RMSE and MAE were smaller.
We also compared SD-PCCH with SD-PPC, and found that SD-PPC had better estimation results than SD-PCCH in all metrics.
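The three evaluation indexes can be computed with a short Python sketch. It assumes that $R^2$ is the coefficient of determination with the manual measurements as reference values; the function name and the sample values are hypothetical and are not data from this study.

```python
import math


def regression_metrics(manual, estimated):
    """Return (R^2, RMSE, MAE) of estimated values against manual measurements."""
    n = len(manual)
    if n == 0 or n != len(estimated):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_manual = sum(manual) / n
    ss_res = sum((y - e) ** 2 for y, e in zip(manual, estimated))
    ss_tot = sum((y - mean_manual) ** 2 for y in manual)
    if ss_tot == 0:
        raise ValueError("manual measurements must not all be identical")
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(y - e) for y, e in zip(manual, estimated)) / n
    return r2, rmse, mae


# Hypothetical stem diameters in millimetres, for illustration only.
manual_mm = [20.0, 22.0, 24.0, 26.0]
estimated_mm = [21.0, 21.0, 25.0, 27.0]
r2, rmse, mae = regression_metrics(manual_mm, estimated_mm)
```

RMSE squares each deviation before averaging, so a single large error raises it more than it raises MAE, which is why the two are reported together.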

Discussion
Robotics-based high-throughput phenotyping has the potential to break the phenotyping bottleneck. In particular, HTP robots based on ground mobile platforms have proven to be an effective way to accelerate automatic phenotyping [27,29,45]. Currently, HTP robots equipped with sensing devices can realize non-contact measurement of crop morphological parameters. In this study, we present a stem diameter measurement pipeline using a self-developed mobile phenotyping platform equipped with an RGB-D camera. The experiments show that the pipeline can meet the requirements of automatically measuring maize stems in fields. More generally, our pipeline should generalize to measuring the stem diameters of tall crops under ideal data collection conditions. The existing measurement pipeline still has some challenges and limitations: (1) Real-time performance. It is difficult for phenotyping robots to process large amounts of phenotype data online in field conditions due to the limitations of hardware devices. (2) Multi-parameter measurements. Our pipeline is currently used only to estimate the stem diameters. How to measure multiple phenotypic parameters simultaneously is a direction to be explored in the future.
Currently, we need to address the following issues: (i) Improving the stem detection accuracy of the convolutional neural network. We used the existing two-stage object detector Faster RCNN to identify field stems. The mAP of stem detection after network convergence was 67%. This may be caused by the strong lighting changes and the inconspicuous color characteristics of stems under the crop canopy. In the future, we hope to improve the detection accuracy by labeling more data and adjusting the network structure. (ii) Evaluating the 3D image quality of the RealSense D435i. RealSense D435i cameras have been shown to have excellent ranging performance under natural conditions. However, it is still necessary to evaluate the depth accuracy for different crop organs in order to improve the 3D imaging quality, which will help to improve the measurement accuracy of maize stem diameters. (iii) Improving the real-time phenotyping performance of our algorithm pipeline. Currently, our algorithm pipeline is implemented on a graphics workstation. During our experiment, the bag recording function of ROS was used to obtain crop images in the field, and these image data were then parsed on the graphics workstation to run our phenotyping algorithm. In the future, our algorithm will run in real time on an edge computing module on our HTP robot. (iv) Extending our algorithm pipeline to different crop varieties. At present, maize crops are our main focus. However, the proposed algorithm pipeline is expected to be applicable to other common high-stem plants, such as sorghum and sugarcane. Furthermore, we believe that our method can also be used to measure the phenotypic parameters of various crop organs by adjusting only some necessary algorithm parameters.

Conclusions
This paper aimed to investigate a high-throughput phenotyping solution based on mobile robots and RGB-D sensing technologies. An in situ and inter-row stem diameter measurement pipeline for maize crops was proposed. In this pipeline, we used Faster RCNN to detect stems in color images and employed the point clouds converted from depth images to measure the maize stem diameters. We first solved the inaccurate clustering of stem point clouds caused by dense leaf occlusion using a dimension reduction clustering algorithm based on DBSCAN. Then, we presented a point cloud filling strategy to fill the missing depth values of the stem point clouds. Finally, we proposed two stem diameter estimation approaches (SD-PCCH and SD-PPC) by analyzing the geometric structures of the stem point clouds. SD-PCCH and SD-PPC calculate the stem diameters using the 3D point cloud convex hull and the 2D projection of point clouds, respectively. The comparison of our approaches with the existing literature showed that SD-PCCH and SD-PPC were effective in measuring stem diameters. In addition, since SD-PPC avoids the effect of depth noise on the estimation results, SD-PPC has higher measurement accuracy than SD-PCCH. Over 120 test samples, the $R^2$ and RMSE of SD-PPC reached 0.72 and 2.95 mm, respectively.
Currently, greenhouse or controlled scenarios are still dominant for high-throughput phenotyping [9]. However, field-based crop cultivation is the main mode of food production. In-field phenotyping is still in the exploratory stage due to some intractable problems, such as intensive lighting changes and cluttered leaf occlusion. The phenotyping robot we developed is designed specifically for measuring the stem diameters of field crops in situ. We hope that our algorithm pipeline can improve the efficiency of phenotype screening and better serve breeders in the future. Meanwhile, we are also trying to integrate more advanced algorithms into our robot to realize online measurement of multiple phenotypic parameters, such as leaf length, leaf number, and leaf angle.