Point Cloud Completion of Occluded Corn with a 3D Positional Gated Multilayer Perceptron and Prior Shape Encoder

Gao, Yuliang; Li, Zhen; Liu, Tao; Li, Bin; Zhang, Lifeng

doi:10.3390/agronomy15051155

by Yuliang Gao ¹, Zhen Li ²,*, Tao Liu ³, Bin Li ⁴ and Lifeng Zhang ¹,*

¹ Graduate School of Engineering, Kyushu Institute of Technology, Kitakyushu 8040015, Japan
² School of Electrical Engineering, Nantong University, Nantong 226021, China
³ Institute of Smart Agriculture, College of Agriculture, Yangzhou University, Yangzhou 225009, China
⁴ College of Artificial Intelligence, Yangzhou University, Yangzhou 225012, China
* Authors to whom correspondence should be addressed.

Agronomy 2025, 15(5), 1155; https://doi.org/10.3390/agronomy15051155

Submission received: 6 April 2025 / Revised: 7 May 2025 / Accepted: 7 May 2025 / Published: 9 May 2025

(This article belongs to the Special Issue Intelligent Detection and Classification of External Traits in Crop Plants, Fruits, and Vegetables)

Abstract

To obtain the complete shape and pose of corn under occlusion, this study proposes a point cloud completion algorithm for completing the fragmented corn point cloud after segmentation. Considering that this work focuses on a single-class crop—corn—the proposals mainly focus on the deep learning model size and the completion of the overall shape of the corn. In this work, the 3D corn models derived from segmentation are employed to systematically output the fragmented point cloud data in batches. The Shape Coding PointAttN (SCPAN) algorithm is also proposed, which is based on PointAttN. The model’s structure is simplified to output sparse point clouds and minimize computational complexity, and a gated multilayer perceptron (MLP) containing 3D position coding is introduced to enhance the model’s spatial awareness. In addition, the prior shape encoder module is initially trained and subsequently integrated into the model to enhance its focus on shape characteristics. Compared to the original model, PointAttN, SCPAN achieves a 34.2% reduction in the number of parameters, and the inference time is reduced by 30 ms while maintaining comparable accuracy. The experimental results show that the proposed method can complete the corn point cloud more effectively, using a small model to help estimate the pose and dimensions of corn accurately. This work supports the precise phenotypic analysis of corn and similar crops, such as citrus and tomatoes, and promotes the development of smart agricultural technology.

Keywords: corn; point cloud completion; pose estimation; 3D position coding; prior shape encoder

1. Introduction

Retrieving the precise position and dimensions of corn is essential for expanding the use of smart and precision agriculture in corn cultivation [1], as corn is one of the three primary grain crops in the world [2]. Under ideal conditions without occlusion, the pose, shape, and dimensions of a corn fruit can be obtained directly from its point cloud using an oriented bounding box (OBB). However, in real agricultural environments, the acquisition of corn point cloud data is often affected by occlusion from leaves, leading to fragmented point clouds. This incompleteness negatively impacts subsequent pose estimation and shape measurements. In contrast to classical 6D pose estimation [3], corn exhibits consistent width and height, so the roll angle does not need to be estimated. Consequently, there are two fewer detection parameters: the height and the roll angle.

With this information, several agricultural applications become feasible as follows:

(1): Precise spraying [4]: Pose estimation enables targeted pesticide application for common corn diseases such as kernel rot and corn smut. Leveraging spatial data improves the dosage estimation, ensuring efficiency and accuracy.
(2): Precision harvesting [5]: Pose estimation determines corn’s position relative to the camera and mechanical equipment. Through incorporating its specific shape and position, robotic arms can execute precise harvesting.
(3): Phenotypic measurement [6]: Pose estimation streamlines the extraction of phenotypic traits, eliminating the need for manual measurements or 3D reconstruction. Deep learning models directly obtain phenotype data with greater speed and accuracy.
(4): Field monitoring: Combined with camera positioning, pose estimation maps the location of each corn plant accurately, enhancing the monitoring of overall growth conditions.

Thus, 3D information acquisition is crucial for advancing precision agriculture in corn cultivation.

To address the occlusion problem, mature image segmentation techniques, such as the Segment Anything Model (SAM) [7] and Mask R-CNN [8], have been developed to extract corn images from corn farmland data effectively. These deep learning-based segmentation methods have been extensively studied in smart agriculture and have achieved promising results. Semantic segmentation has mature and diverse application scenarios in agriculture. Lei [9], Luo [10], and Charisis [11] conducted detailed research on these applications. Semantic segmentation based on deep learning can be applied to crop growth monitoring and plant health analysis. There has been a great deal of research into weed segmentation [12] and fruit segmentation [13].

In the existing literature, deep learning-based segmentation techniques have been widely adopted for extracting crop images from occluded scenes, achieving high accuracy in handling leaf occlusion. In particular, utilizing the SAM pre-trained model, the corn body can be efficiently extracted with minimal background prompts, enabling precise segmentation.

However, research on the point cloud completion of corn from fragmented point clouds remains limited. Most existing studies have focused on public datasets for common object completion, while related research on point cloud completion [14] for specific agricultural crops remains insufficient.

In the domain of agriculture, point cloud-related technologies are primarily employed in phenotype extraction. For example, Gené-Mola [15] generated 3D point clouds via visual technology to estimate the size of apples in the field under different occlusion conditions; Guo [16] accomplished precise phenotypic analysis of cabbages based on 3D point cloud segmentation methods; Magistri [17] predicted the three-dimensional shape of fruit by using RGB-D data to perform point cloud completion; and Chen [18] realized the 3D reconstruction of leaf point clouds utilizing deep learning technology, thereby extracting leaf phenotypes.

This work comprehensively investigates the task of corn point cloud completion. Data generation is first carried out. High-quality corn point cloud data are collected under occlusion-free conditions and detailed 3D corn models are constructed. To enhance the data diversity and simulate the morphological variations encountered in real-world scenarios, we generate 3D corn point cloud models of varying scales and structures through random transformations. Furthermore, we use a random occlusion strategy to create a large-scale dataset of incomplete corn point clouds, which serves as the training data for subsequent algorithm development.

In terms of algorithms, to meet the requirements for the precise localization and measurement of corn in 3D space, Shape Coding PointAttN (SCPAN) is proposed. The original model architecture is first simplified to reduce the number of parameters, as dense point clouds are not required for capturing the overall shape. Consequently, the simplified model is designed to directly output sparse point clouds. A gated multilayer perceptron (MLP) with 3D position encoding [19] is incorporated into the completion network to enhance spatial feature perception. Additionally, given that this study focuses on a single-class point cloud completion task for corn and primarily aims at overall shape completion, a prior shape encoder [20] is trained using the original 3D corn models and is incorporated into the model.

The main contributions of this study are as follows:

(1): This work develops an efficient method, SCPAN, for completing fragmented corn point clouds to obtain the pose and shape information about corn, helping the precise phenotypic analysis of corn and providing technical support for intelligent perception and decision making in smart agriculture.
(2): It proposes a novel data generation process for the fragmented corn point cloud. In this way, we construct a corn point cloud completion dataset through data collection, segmentation, modeling, transformation, and occlusion simulation, providing high-quality training data for point cloud completion.
(3): It examines the architecture of the baseline model PointAttN against the goal of our task. The structure of the original model is simplified to output sparse point clouds and fit the single-class point cloud completion task.
(4): A prior shape encoder is trained and incorporated into SCPAN, which helps the model to focus on the overall shape. A novel gated MLP with spatial feature enhancement is also incorporated into SCPAN. Three-dimensional position encoding is introduced into the gated MLP to improve the model’s spatial awareness, thereby enhancing its point cloud completion ability.

2. Materials and Methods

2.1. Data Acquisition

The core objective of this study is to complete fragmented corn point clouds with varying positions and sizes under occlusion conditions. Therefore, obtaining high-quality data is a crucial step. In this work, unoccluded corn data were first collected from multiple perspectives, and the segmented corn was converted into 3D models. Considering the natural variations in corn size during growth, a point cloud transformation algorithm was applied to adjust the scale, thereby expanding the dataset and enhancing the model’s robustness. Finally, a random occlusion strategy was introduced to generate a large set of incomplete point clouds, which serve as training data for point cloud completion. The data production process is shown in Figure 1.

2.1.1. Original Data Collection

The data collection process was conducted in the experimental fields of Yangzhou University, Jiangsu Province. To address the challenges posed by strong outdoor light, a ZED2i stereo depth camera (Stereolabs, San Francisco, CA, USA) [21] was used for depth data acquisition, as shown in Figure 2. In total, 100 unoccluded corn RGB-D images with different orientations were captured. The RGB images had a resolution of 1080 × 720 pixels, and the corresponding depth maps shared the same resolution. All images were saved in PNG format. To extract corn from the background, we utilized the SAM and converted the segmented corn data into point cloud models based on the depth data.
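The conversion from a segmented RGB-D frame to a point cloud can be sketched as a pinhole back-projection. This is a minimal illustration under stated assumptions, not the authors' pipeline: the intrinsics `fx`, `fy`, `cx`, `cy`, the function name `depth_to_points`, and the toy depth values are hypothetical, and the Boolean mask merely stands in for a SAM segmentation result.

```python
import numpy as np

def depth_to_points(depth, mask, fx, fy, cx, cy):
    """Back-project masked depth pixels into camera-frame 3D points.

    depth: (H, W) array of metric depths; mask: (H, W) Boolean array.
    Pixels with non-finite or non-positive depth are skipped.
    """
    valid = mask & np.isfinite(depth) & (depth > 0)
    v, u = np.nonzero(valid)            # pixel rows (v) and columns (u)
    z = depth[v, u]
    x = (u - cx) * z / fx               # pinhole model, assumed intrinsics
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

# Toy 4 x 4 frame at a constant depth of 2 m; the mask selects the central 2 x 2 block.
depth = np.full((4, 4), 2.0)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
points = depth_to_points(depth, mask, fx=500.0, fy=500.0, cx=2.0, cy=2.0)
```

In practice the intrinsics would come from the camera calibration rather than from hand-picked values.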

2.1.2. Model Transformation and Random Occlusion

To introduce variability and simulate real-world conditions, the length and width of the point cloud models were randomly scaled by factors in the range of 0.8–1.2. The transformed models were then randomly occluded to produce a large number of incomplete samples. The sample data changes are shown in Figure 3. Table 1 shows the composition of the dataset for this work. The dataset is divided in a 9:1 ratio for training and testing.
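The scaling, occlusion, and split steps can be sketched as follows. The source gives only the scaling range (0.8–1.2) and the 9:1 split; everything else here is an assumption for illustration: the axis convention, the occlusion rule (dropping the points nearest a random seed point), the 30% drop ratio, and the function names.

```python
import numpy as np

def random_scale(points, rng, low=0.8, high=1.2):
    """Scale length and width by independent factors drawn from [low, high]."""
    # Assumption: axis 0 is the long axis (length); axes 1 and 2 share the width factor.
    s_len, s_wid = rng.uniform(low, high, size=2)
    return points * np.array([s_len, s_wid, s_wid])

def random_occlude(points, rng, drop_ratio=0.3):
    """Remove the drop_ratio fraction of points closest to a random seed point."""
    seed = points[rng.integers(len(points))]
    dist = np.linalg.norm(points - seed, axis=1)
    n_drop = int(round(drop_ratio * len(points)))
    keep = np.argsort(dist)[n_drop:]    # discard the n_drop nearest points
    return points[keep]

rng = np.random.default_rng(0)
model = rng.normal(size=(1000, 3)) * np.array([0.10, 0.02, 0.02])  # toy elongated blob

pairs = []
for _ in range(20):
    complete = random_scale(model, rng)       # ground-truth shape
    partial = random_occlude(complete, rng)   # fragmented network input
    pairs.append((partial, complete))

n_train = len(pairs) * 9 // 10                # 9:1 train/test split
train, test = pairs[:n_train], pairs[n_train:]
```

Removing a contiguous neighbourhood, rather than random individual points, is one way to mimic a leaf hiding a patch of the surface.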

2.2. Acquisition of the Pose and Shape Information

The pose, shape, and dimensions of the corn are obtained from the OBB of the completed point cloud, which provides the following information:

(1): Dimensions: These are the width and length of the OBB, which describe the size of the corn, as shown in Figure 4.
(2): Three-dimensional coordinates of the center: These indicate the 3D coordinates of the object center in the camera coordinate system. As shown in Figure 4, they specify the object’s location along the x-, y-, and z-axes.
(3): Rotation matrix: This is the rotation matrix that transforms the local OBB axes to world coordinates. Each column of the rotation matrix represents the direction of one of the OBB’s local axes in world coordinates. Figure 4 illustrates the connection between these rotation angles.
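The three quantities listed above can be illustrated with a PCA-based bounding box. The source computes the OBB with Open3D; the NumPy version below is a swapped-in approximation chosen to keep the example dependency-free, and the function name `pca_obb` and the toy cloud are assumptions.

```python
import numpy as np

def pca_obb(points):
    """Approximate an oriented bounding box from the principal axes of a cloud.

    Returns (center, R, extents). The columns of R are the box axes in the
    input frame, ordered by decreasing variance along each axis.
    """
    mean = points.mean(axis=0)
    cov = np.cov((points - mean).T)
    _, vecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    R = vecs[:, ::-1].copy()                # largest-variance axis first
    if np.linalg.det(R) < 0:                # keep a right-handed rotation
        R[:, 2] *= -1
    local = (points - mean) @ R             # coordinates in the box frame
    lo, hi = local.min(axis=0), local.max(axis=0)
    extents = hi - lo                       # length, width, height
    center = mean + R @ ((lo + hi) / 2)     # box center in the input frame
    return center, R, extents

# Toy cloud: a 0.20 x 0.04 x 0.02 grid, rotated 30 degrees about z and shifted.
grid = np.stack(np.meshgrid(np.linspace(-0.10, 0.10, 21),
                            np.linspace(-0.02, 0.02, 5),
                            np.linspace(-0.01, 0.01, 5),
                            indexing="ij"), axis=-1).reshape(-1, 3)
theta = np.deg2rad(30.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
cloud = grid @ Rz.T + np.array([0.5, 0.2, 1.0])
center, R, extents = pca_obb(cloud)
```

A PCA box is not guaranteed to be the minimum-volume box, and when two extents are nearly equal (as with the width and height of corn) the two short axes are not uniquely determined; only the long axis is stable.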

2.3. Point Cloud Completion Algorithm

In this work, a method for estimating the position and pose of occluded corn using point cloud completion is proposed. SCPAN, a point cloud completion network that extends PointAttN [22], is introduced. PointAttN is a strong point cloud completion algorithm that achieves better results on the PCN dataset [23] than PCN [24] and SnowflakeNet [25].

The proposed SCPAN model simplifies the original structure, while also incorporating a gated MLP that integrates 3D positional encoding and a prior shape encoder to improve the model’s performance. The model is capable of completing fragmented point clouds, and the pose information of the corn can be extracted by calculating the OBB of the completed point cloud using Open3D. The process proposed in this work is shown in Figure 5.

2.3.1. Standard PointAttN Model

PointAttN is a point cloud completion framework designed to predict a complete 3D shape from an incomplete point cloud. It addresses the limitations of k-nearest neighbors (KNNs), which struggle to describe local geometric structures. Its key innovation is the elimination of explicit local region partitioning and the introduction of cross-attention and self-attention mechanisms to capture both short-range and long-range structural relationships among points. PointAttN consists of the following three main modules: the feature extractor, seed generator, and point generator. The feature extractor adaptively captures both local structural details and the global context of an object to generate a shape code. The seed generator takes the shape code as input to produce a sparse yet complete point cloud. Finally, the point generator refines this output by taking both the sparse point cloud and shape code as input. To achieve this, two key blocks are integrated within these modules: the Geometric Details Perception (GDP) block, which facilitates information aggregation, and the Self-Feature Augmentation (SFA) block, which captures intricate geometric details about 3D shapes. The original PointAttN structure is illustrated in Figure 6.

2.3.2. Shape Coding PointAttN

The standard PointAttN demonstrates excellent performance in completing point clouds. However, when it comes to achieving shape and pose estimation through the completion of corn point clouds, there is still room for improvement. Firstly, the results of this work are derived from an OBB and therefore do not require dense point clouds. Secondly, the original model focuses excessively on the recovery of local details. Thirdly, for a single category (such as corn), the model's ability to represent the 3D structure of point clouds needs to be strengthened to improve overall performance. Based on this analysis, the model structure of PointAttN is simplified to output sparse point clouds. The model’s attention to local details is appropriately reduced, and it instead focuses on the completion of the overall shape and the processing of sparse point clouds. To this end, we propose a gated MLP combined with 3D position encoding and introduce a prior shape encoder to enhance the model’s shape regression ability. The overall architecture is shown in Figure 7.

The improvements proposed in this work are as follows:

(1): Model structure simplification:
The original PointAttN model consists of two main stages—the generation of a sparse point cloud in the first stage, followed by densification in the second stage. However, in this work, the final output is derived through the generation of an OBB, which is independent of point cloud density. Therefore, the model structure is simplified accordingly, as illustrated in Figure 7.
(2): Three-dimensional positional gated MLP:
The MLP is an important component of PointAttN. However, traditional MLPs present the following problem: they rely on a single activation function, which restricts their ability to model complex features. In contrast, gated MLP utilizes the activation function to regulate the information flow, enhancing feature representation and making it more suitable for point cloud tasks with complex geometric structures. However, conventional gated MLP lacks spatial feature modeling, making it less effective for spatial tasks. To address this limitation, we introduce a 3D position encoding structure into the gated MLP to propose a 3D positional gated MLP, improving the model’s spatial awareness as shown in Figure 7.
(3): Prior shape encoder:
This work primarily focuses on the point cloud completion of single-crop corn, given that the shape of corn is relatively consistent. A prior shape encoder is constructed by encoding 3D corn models to capture their geometric characteristics. As illustrated in Figure 7, the prior shape encoder takes the 3D point cloud of the corn models as input, extracts global features via the MLP, and subsequently obtains high-dimensional feature vectors through a pooling-based prior encoder. During model training, these high-dimensional feature vectors are embedded into the model to enhance its shape perception capabilities.
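The two added components can be sketched as forward passes in NumPy. The source does not give layer sizes, the form of the position encoding, or the activation, so everything below is an assumption for illustration: a learned linear embedding of xyz added to the features, a GELU-gated product of two branches, and a point-wise MLP followed by max pooling for the prior shape encoder. The class and function names are hypothetical and the weights are random.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(d_in, d_out):
    """Return a randomly initialised (weight, bias) pair for a dense layer."""
    return rng.normal(scale=d_in ** -0.5, size=(d_in, d_out)), np.zeros(d_out)

def gelu(x):
    """Tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

class PositionalGatedMLP:
    """Gated MLP applied to features augmented with an embedding of xyz."""

    def __init__(self, dim, hidden):
        self.pos = linear(3, dim)           # xyz -> feature-sized position code
        self.value = linear(dim, hidden)
        self.gate = linear(dim, hidden)
        self.out = linear(hidden, dim)

    def __call__(self, feats, xyz):
        # feats: (N, dim) per-point features; xyz: (N, 3) coordinates
        x = feats + xyz @ self.pos[0] + self.pos[1]
        value = x @ self.value[0] + self.value[1]
        gate = gelu(x @ self.gate[0] + self.gate[1])   # activation regulates the flow
        return (value * gate) @ self.out[0] + self.out[1]

class PriorShapeEncoder:
    """Point-wise MLP followed by max pooling, giving one global shape vector."""

    def __init__(self, hidden, code_dim):
        self.l1 = linear(3, hidden)
        self.l2 = linear(hidden, code_dim)

    def __call__(self, xyz):
        h = np.maximum(xyz @ self.l1[0] + self.l1[1], 0.0)   # ReLU
        h = h @ self.l2[0] + self.l2[1]
        return h.max(axis=0)                # order-invariant pooling over points

xyz = rng.normal(size=(256, 3))
feats = rng.normal(size=(256, 64))
block = PositionalGatedMLP(dim=64, hidden=128)
encoder = PriorShapeEncoder(hidden=64, code_dim=128)
out = block(feats, xyz)                     # (256, 64) refined per-point features
code = encoder(xyz)                         # (128,) shape code
```

Max pooling makes the shape code independent of point order, which suits unordered point clouds.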

2.4. Experimental Setting

Table 2 lists the hardware and software setups used for model training and testing in this study. Training runs for 400 epochs with a batch size of nine. Optimization is carried out with the Adam optimizer with a momentum of 0.9. The learning rate starts at 0.0001 and is adjusted every 100 epochs.
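A step schedule consistent with this description can be sketched as below. The source states only the starting rate (0.0001) and the 100-epoch interval; the decay factor `gamma` is not given, so the value 0.5 is purely hypothetical, as is the function name. In common Adam implementations, a momentum of 0.9 corresponds to the first-moment coefficient β₁.

```python
def step_lr(epoch, base_lr=1e-4, step=100, gamma=0.5):
    """Learning rate at a 0-based epoch under step decay.

    gamma is a hypothetical decay factor; the source does not state it.
    """
    return base_lr * gamma ** (epoch // step)

# Learning rate at the first and last epoch of each 100-epoch stage over 400 epochs.
schedule = {epoch: step_lr(epoch) for epoch in (0, 99, 100, 199, 200, 299, 300, 399)}
```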

2.5. Training Loss

In order to define the loss function during training, we use the Chamfer Distance ($CD$) as the measure for point cloud similarity. The seed point cloud and the output point cloud from the two cascaded point generators are denoted by $P_{0}$, $P_{1}$, and $P_{2}$, respectively. Three sub-clouds, $S_{0}$, $S_{1}$, and $S_{2}$, are obtained by downsampling the ground truth point cloud using FPS. These sub-clouds have the same density as $P_{0}$, $P_{1}$, and $P_{2}$, respectively. The definition of the model loss is shown in Equation (1), expressed as follows:

$$L = \sum_{i=0}^{2} \lambda_{i} \, d_{CD}(P_{i}, S_{i}) \tag{1}$$

where $d_{CD}$ is the Chamfer Distance loss, which is defined by Equation (2), expressed as follows:

$$d_{CD}(P, S) = \frac{1}{|P|} \sum_{p \in P} \min_{s \in S} \| p - s \| + \frac{1}{|S|} \sum_{s \in S} \min_{p \in P} \| s - p \| \tag{2}$$

In the implementation, each $\lambda_{i}$ is set to 1.
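Equations (1) and (2) can be sketched in NumPy as follows. This is a brute-force illustration rather than the training code: FPS is read as farthest point sampling, "same density" is interpreted as "same number of points", and the stage sizes (64, 128, 256), the noise level, and the function names are assumptions.

```python
import numpy as np

def chamfer(P, S):
    """Equation (2): symmetric mean nearest-neighbour distance (unsquared)."""
    d = np.linalg.norm(P[:, None, :] - S[None, :, :], axis=-1)   # (|P|, |S|)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def farthest_point_sampling(points, k, start=0):
    """Pick k points, each new one farthest from those already chosen."""
    chosen = [start]
    dist = np.linalg.norm(points - points[start], axis=1)
    for _ in range(k - 1):
        nxt = int(dist.argmax())
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return points[chosen]

def total_loss(preds, gt, lambdas=(1.0, 1.0, 1.0)):
    """Equation (1): weighted sum of Chamfer terms, one per output stage."""
    loss = 0.0
    for lam, P in zip(lambdas, preds):
        S = farthest_point_sampling(gt, len(P))   # match the size of P_i
        loss += lam * chamfer(P, S)
    return loss

rng = np.random.default_rng(0)
gt = rng.normal(size=(512, 3))
# Noisy subsets of the ground truth stand in for the network outputs P_0, P_1, P_2.
preds = [farthest_point_sampling(gt, k) + 0.01 * rng.normal(size=(k, 3))
         for k in (64, 128, 256)]
loss = total_loss(preds, gt)
```

The pairwise distance matrix costs O(|P||S|) memory, which is acceptable for sparse clouds but would normally be replaced by a GPU kernel during training.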

2.6. Evaluation Metrics

To evaluate the quality of the completed point clouds, we adopt the Chamfer Distance, a widely used metric in point cloud generation and completion tasks. The $CD$ measures the average closest-point distances between the predicted point cloud and the ground truth and effectively captures both the accuracy and completeness of the reconstructed shape. Let $\hat{P} = \{\hat{x}_{i}\}_{i=1}^{N}$ denote the predicted point cloud and $Y = \{y_{j}\}_{j=1}^{M}$ denote the ground truth point cloud. We compute the $CD$ in both directions as follows:

The $CD$ from the predicted point cloud to the ground truth is defined as shown in Equation (3), expressed as follows:

$$CD_{p} = \frac{1}{N} \sum_{\hat{x} \in \hat{P}} \min_{y \in Y} \| \hat{x} - y \|_{2}^{2} \tag{3}$$

The $CD$ from the ground truth to the prediction is defined as shown in Equation (4), expressed as follows:

$$CD_{t} = \frac{1}{M} \sum_{y \in Y} \min_{\hat{x} \in \hat{P}} \| y - \hat{x} \|_{2}^{2} \tag{4}$$

Here, $CD_{p}$ evaluates how accurately the predicted points approximate the ground truth surface, while $CD_{t}$ reflects the completeness of the predicted shape with respect to the ground truth.

In the case of multi-stage completion frameworks, we further report the $CD$ at the coarse prediction stage, denoted as $\hat{P}_{coarse}$. The coarse-level metrics are defined as in Equation (5) as follows:

$$\begin{aligned} CD_{p}^{coarse} &= \frac{1}{|\hat{P}_{coarse}|} \sum_{\hat{x} \in \hat{P}_{coarse}} \min_{y \in Y} \| \hat{x} - y \|_{2}^{2} \\ CD_{t}^{coarse} &= \frac{1}{|Y|} \sum_{y \in Y} \min_{\hat{x} \in \hat{P}_{coarse}} \| y - \hat{x} \|_{2}^{2} \end{aligned} \tag{5}$$

All $CD$ values are reported in squared Euclidean distance. Lower values of each metric indicate a better reconstruction performance. In our experiments, we report all four metrics, $CD_{p}$, $CD_{t}$, $CD_{p}^{coarse}$, and $CD_{t}^{coarse}$, to comprehensively assess the quality of both the final and coarse predictions.
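The two directions of the metric can be made concrete with a small example. The sketch below implements Equations (3) and (4) by brute force; the function name `cd_metrics` and the three-point toy clouds are assumptions chosen so that the values can be checked by hand.

```python
import numpy as np

def cd_metrics(pred, gt):
    """Return (CD_p, CD_t) of Equations (3) and (4) using squared distances."""
    sq = ((pred[:, None, :] - gt[None, :, :]) ** 2).sum(axis=-1)   # (N, M)
    cd_p = sq.min(axis=1).mean()    # prediction -> ground truth (accuracy)
    cd_t = sq.min(axis=0).mean()    # ground truth -> prediction (completeness)
    return cd_p, cd_t

gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
pred = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])   # misses the third point
cd_p, cd_t = cd_metrics(pred, gt)
```

Here every predicted point lies exactly on the ground truth, so $CD_{p}$ is 0, while the missing third point contributes a squared distance of 1 to one of three terms, giving $CD_{t} = 1/3$. Passing the coarse prediction as `pred` yields the coarse-level metrics of Equation (5).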

The accuracy of the pose estimates is measured by $\mathrm{Accuracy}$, which is computed as shown in Equation (6), expressed as follows:

$$\mathrm{Accuracy} = \frac{|\mathrm{Predictions} - GT|}{|GT|} \tag{6}$$

where $\mathrm{Predictions}$ denotes the model’s predicted values and $GT$ denotes the associated ground truth values.
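A worked instance of Equation (6) is sketched below. As written, the right-hand side is a relative deviation from the ground truth; expressing the result as one minus this deviation is an assumption made here for illustration, since the source does not state how the reported percentage is derived from it. The lengths and names are hypothetical.

```python
def relative_deviation(prediction, ground_truth):
    """Right-hand side of Equation (6): |prediction - GT| / |GT|."""
    return abs(prediction - ground_truth) / abs(ground_truth)

length_pred, length_gt = 0.197, 0.200   # hypothetical corn lengths in metres
deviation = relative_deviation(length_pred, length_gt)

# Assumption: one way to turn the deviation into a percentage agreement.
agreement = 1.0 - deviation
```

For these values the deviation is 0.003 / 0.200 = 0.015, i.e., 1.5%.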

3. Results

3.1. Comparisons with Baseline

Based on PointAttN, this work simplifies the model structure and introduces a gated MLP with position encoding while integrating a prior shape encoder. As illustrated in Table 3, SCPAN has 34.2% fewer parameters than the baseline, without compromising the accuracy of completing the fragmented point cloud. In terms of inference time, PointAttN takes approximately 80 ms, while the proposed SCPAN takes around 50 ms, a reduction of about 30 ms. Compared with other point cloud completion networks such as PCN [24] and SnowflakeNet [25], SCPAN also achieves better point cloud completion results. Table 3 also illustrates the impact of each proposal. Simplifying the model structure reduces the number of parameters but may cost some capability; the 3D positional gated MLP and prior shape encoder not only compensate for this loss but also further enhance the model’s performance. These two components yield their most significant improvements in $CD_{p}^{coarse}$ and $CD_{t}^{coarse}$, suggesting that they more effectively enhance the model’s overall shape regression capability.
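The reported reductions can be put on a common relative scale with a short calculation. The symbols $N_{\text{SCPAN}}$ and $N_{\text{PointAttN}}$ for the parameter counts are introduced here for illustration, and because both inference times are approximate, the resulting percentage is approximate as well.

```latex
\frac{80\ \text{ms} - 50\ \text{ms}}{80\ \text{ms}} = 0.375 \approx 37.5\%,
\qquad
N_{\text{SCPAN}} = (1 - 0.342)\, N_{\text{PointAttN}} = 0.658\, N_{\text{PointAttN}}
```

In relative terms, the runtime saving (about 37.5%) is therefore of the same order as the parameter saving (34.2%).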

3.2. Ablation Experiments

Table 4 presents the results of the ablation experiments, showing the impact of different combinations of the proposed components. The results show that our proposals complete the fragmented point clouds better and enhance the ability to complete the overall shape. They also indicate that adding a 3D positional gated MLP and prior shape encoder to the original PointAttN can yield a performance comparable to that of the proposed SCPAN; however, the number of parameters increases substantially. This confirms that the SCPAN proposed in this work achieves superior results with fewer parameters.

3.3. Statistical Evaluation of Point Cloud Completion Metrics

To rigorously assess the reliability and consistency of our point cloud completion method, we report not only the mean Chamfer Distance values but also their associated standard deviations and 95% confidence intervals over the test dataset. These statistics offer insights into the model’s robustness across diverse samples and its generalization capability. The statistical evaluation of SCPAN is shown in Table 5, and the statistical evaluation of PointAttN is shown in Table 6.

From the statistical evaluation results, SCPAN exhibits better robustness, as reflected in its lower standard deviations and narrower 95% confidence intervals across all metrics. For instance, the standard deviation of $CD_{p}$ is only 0.0024 for SCPAN, compared to 0.0028 for PointAttN, and its 95% CI spans from 0.0076 to 0.0220, while PointAttN’s CI ranges from 0.0155 to 0.0324. These tighter intervals suggest that SCPAN not only produces more accurate predictions but also maintains greater consistency across varying input conditions. This robustness is especially valuable for practical deployment in scenarios with complex or incomplete observations.
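Summary statistics of this kind can be computed from per-sample metric values as sketched below. The source does not state how its intervals were obtained, so the normal-approximation interval for the mean used here is an assumption and may not reproduce the reported numbers; the per-sample values and the function name are hypothetical.

```python
import math
import statistics

def summarize(values, z=1.96):
    """Mean, sample standard deviation, and a normal-approximation 95% CI of the mean."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)
    half_width = z * std / math.sqrt(len(values))
    return mean, std, (mean - half_width, mean + half_width)

# Hypothetical per-sample CD_p values from a test set.
cd_values = [0.012, 0.015, 0.011, 0.018, 0.014, 0.016, 0.013, 0.017]
mean, std, ci = summarize(cd_values)
```

An interval describing the spread of individual samples rather than the uncertainty of the mean would instead use percentiles or a multiple of the standard deviation.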

3.4. Loss During Training

Figure 8 displays the loss change curves of PointAttN and SCPAN during training. Both models exhibit similar loss changes, indicating a comparable performance in the training process.

3.5. Presentation of Results

Figure 9 shows the process of the proposed method to complete the fragmented point clouds and then detect their OBBs. Through this process, the length and width of the corn, rotation matrix, and 3D center-point can be obtained.

Estimation Accuracy of SCPAN

As shown in Table 7, our proposed SCPAN demonstrates an accuracy of approximately 98.5% in the pose and shape estimation of corn compared to the ground truth.

4. Discussion

4.1. Main Work

In complex agricultural environments, occlusion by corn leaves makes it very difficult to obtain the complete three-dimensional structure of corn and its precise pose. To tackle this problem, this work puts forward a method based on point cloud completion: the fragmented point cloud resulting from occlusion is completed, and the pose and shape of the corn in three dimensions are then obtained from the OBB of the completed corn model. To train the point cloud completion model effectively, this work generates data systematically in batches. It utilizes high-quality 3D models of corn, in combination with a random occlusion strategy, to construct a large-scale point cloud completion dataset, thereby enhancing the generalization ability of the model.

To further improve the accuracy and stability of point cloud completion, this work proposes a point cloud completion algorithm, which we call SCPAN, based on PointAttN. SCPAN is a transformer-based network architecture; although such architectures generally involve more parameters, it offers higher accuracy and improved robustness when dealing with corn of varying sizes at different growth stages. SCPAN first simplifies the structure of the model to reduce the number of model parameters and outputs a sparse point cloud as the result. This work also proposes a gated MLP incorporating 3D position encoding to enhance the model’s awareness of spatial features. In addition, to keep the model small, we use the 3D corn models to train a prior shape encoder and introduce it into the model training.

The experimental outcomes indicate that, on the occluded point cloud dataset constructed in this study, the point cloud completion algorithm proposed in this work is capable of achieving a completion effect equivalent to the baseline with fewer parameters, verifying the effectiveness of the proposed method.

4.2. Comparison with Existing Research

Table 8 shows the comparison between this work and previous related work. This work is the first in the field of smart agriculture to employ the point cloud completion technique to obtain the pose and shape information about corn under occlusion. It processes a single corn plant in about 50 ms. In terms of speed, this work outperforms those of Chen [18] and Gené-Mola [15], which complete shape estimation using 3D modeling. In terms of both speed and simplicity, it is superior to that of Guo [16], which uses point cloud segmentation.

Compared with our previous work [26], as the results show in Table 7, this work can better solve the occlusion problem, updating the data generation method to improve the robustness of the algorithm.

The idea of this work is similar to that of Magistri [17], which uses existing 3D models for supervision; however, that approach is difficult to extend to data that have not been collected. In this work, the dataset can be expanded in batches through the random transformation of 3D models, so that the model can complete corn plants for which no 3D model was collected, making it more suitable for large-scale farmland environments.

4.3. Limitations

The results on the test set were presented in Section 3. To further evaluate the robustness of the proposed method under challenging conditions, we simulated point cloud completion in extreme scenarios with severe occlusion, testing the method on data with 40% to 60% of the input points occluded. As illustrated in Figure 10, when the missing portion reaches 50%, SCPAN is still capable of recovering the overall shape of the corn, but a noticeable discrepancy remains between the dimensions of the completed corn and the ground truth, which hinders the precise estimation of the corn’s physical measurements. When the occlusion level exceeds 50%, the completion results degrade significantly: the predicted point cloud diverges noticeably from the ground truth, indicating failure to recover the original object shape. This highlights the difficulty of accurate completion under extreme occlusion.

This study also has other limitations. Firstly, this work only investigates point cloud completion under partial occlusion; that is, the occluded area is confined to a portion of the corn, and more complex occlusion scenarios, such as occlusion by multiple leaves or corn positioned frontally toward the observer, are not covered. Moreover, this work did not explore scenes containing multiple corn plants simultaneously.

Additionally, the research objectives of this paper mainly pertain to corn at the maturity stage, where the characteristics of the corn are relatively pronounced, and existing deep learning segmentation techniques can extract them from the background with relative ease. However, in practical agricultural applications, the morphological characteristics of corn exhibit significant variations at different growth stages, particularly in the early growth period, when the contrast between the corn and the leaves is low, and traditional segmentation methods might face challenges in extracting the target area effectively. Hence, future research could further explore methods for completing and segmenting occluded point clouds of corn at different growth stages and under more challenging occlusion circumstances to enhance the robustness, practicality, and adaptability of the model.

5. Conclusions

This work addressed point cloud completion for occluded corn in order to recover its pose and shape. We proposed a method that completes the fragmented point cloud obtained after segmentation to produce a complete corn model; the pose, shape, and dimensions are then obtained from the OBB of that model. We also proposed SCPAN, an improved point cloud completion algorithm based on PointAttN. Because the task focuses on completing the overall shape, SCPAN simplifies the model structure to output a sparse point cloud, which significantly reduces the number of model parameters. In addition, a gated MLP with 3D position encoding and a prior shape encoder strengthen the model's regression of the overall object shape. The experimental results showed that SCPAN achieves comparable point cloud completion results with 34% fewer parameters than PointAttN and an inference time that is 30 ms shorter. The results also show that the 3D positional gated MLP and the prior shape encoder notably improve the ${CD}_{p}^{coarse}$ and ${CD}_{t}^{coarse}$ metrics, indicating that these two components help the model focus on the overall shape of the object. Statistical evaluation further supported the validity of the proposal. The method and algorithm are applicable not only to corn pose estimation but also to other crops (e.g., citrus and tomatoes). This work can provide technical support for automatic picking and phenotypic extraction in precision agriculture, promoting the advancement of smart agriculture.
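As a reading aid for how pose and dimensions can be read off an OBB, the following is a minimal sketch that fits a PCA-aligned box to a completed point cloud. PCA alignment is an assumption made here for illustration; this section does not state how the OBB is actually computed, and the function name `pca_obb` is hypothetical.

```python
import numpy as np


def pca_obb(points: np.ndarray):
    """Return (center, axes, extents) of a PCA-aligned oriented bounding box.

    points  : (N, 3) array, e.g. a completed corn point cloud.
    axes    : (3, 3) matrix whose columns are the box axes, longest first.
    extents : full box length along each axis.
    NOTE: PCA alignment is an illustrative assumption, not the paper's method.
    """
    mean = points.mean(axis=0)
    # Eigenvectors of the covariance matrix give the principal directions.
    cov = np.cov((points - mean).T)
    eigvals, eigvecs = np.linalg.eigh(cov)
    axes = eigvecs[:, np.argsort(eigvals)[::-1]]  # sort by decreasing variance
    if np.linalg.det(axes) < 0:                   # keep a right-handed frame
        axes[:, 2] *= -1
    # Express the points in the box frame and take min/max along each axis.
    local = (points - mean) @ axes
    lo, hi = local.min(axis=0), local.max(axis=0)
    center = mean + axes @ ((lo + hi) / 2.0)
    extents = hi - lo
    return center, axes, extents
```

Under these assumptions, `center` would correspond to the 3D center, `extents` to the dimensions, and the columns of `axes` to a rotation matrix from which orientation angles could be derived.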

Author Contributions

Conceptualization, Y.G.; Methodology, Y.G., B.L. and Z.L.; Software, Y.G. and Z.L.; Validation, Y.G.; Formal analysis, Y.G. and B.L.; Data curation, T.L.; Investigation, Y.G. and T.L.; Writing—original draft, Y.G. and T.L.; Writing—review and editing, Y.G. and Z.L.; Visualization, Y.G. and T.L.; Supervision, B.L., Z.L. and L.Z.; Project administration, B.L. and L.Z.; Resources, B.L. and L.Z.; Funding acquisition, L.Z. All authors read and agreed to the published version of the manuscript.

Funding

This work was supported by JST SPRING, Japan, Grant Number JPMJSP2154.

Data Availability Statement

The datasets analyzed during the current study are available from the corresponding author upon reasonable request. The related code will be released at https://github.com/GYLLLLLL/corn-point-cloud-completion (accessed on 7 May 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

SAM	Segment Anything Model
SCPAN	Shape Coding PointAttN
MLP	Multilayer Perceptron
OBB	Oriented Bounding Box
GDP	Geometric Details Perception
SFA	Self-Feature Augmentation
CD	Chamfer Distance

References

1. Karunathilake, E.; Le, A.T.; Heo, S.; Chung, Y.S.; Mansoor, S. The path to smart farming: Innovations and opportunities in precision agriculture. Agriculture 2023, 13, 1593.
2. Erenstein, O.; Jaleta, M.; Sonder, K.; Mottaleb, K.; Prasanna, B.M. Global maize production, consumption and trade: Trends and R&D implications. Food Secur. 2022, 14, 1295–1319.
3. Hoque, S.; Arafat, M.Y.; Xu, S.; Maiti, A.; Wei, Y. A comprehensive review on 3D object detection and 6D pose estimation with deep learning. IEEE Access 2021, 9, 143746–143770.
4. Zanin, A.R.A.; Neves, D.C.; Teodoro, L.P.R.; da Silva Júnior, C.A.; da Silva, S.P.; Teodoro, P.E.; Baio, F.H.R. Reduction of pesticide application via real-time precision spraying. Sci. Rep. 2022, 12, 5638.
5. Zhou, H.; Wang, X.; Au, W.; Kang, H.; Chen, C. Intelligent robots for fruit harvesting: Recent developments and future challenges. Precis. Agric. 2022, 23, 1856–1907.
6. Visakh, R.L.; Anand, S.; Reddy, S.B.; Jha, U.C.; Sah, R.P.; Beena, R. Precision Phenotyping in Crop Science: From Plant Traits to Gene Discovery for Climate-Smart Agriculture. Plant Breed. 2024.
7. Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.Y.; et al. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–3 October 2023; pp. 4015–4026.
8. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
9. Lei, L.; Yang, Q.; Yang, L.; Shen, T.; Wang, R.; Fu, C. Deep learning implementation of image segmentation in agricultural applications: A comprehensive review. Artif. Intell. Rev. 2024, 57, 149.
10. Luo, Z.; Yang, W.; Yuan, Y.; Gou, R.; Li, X. Semantic segmentation of agricultural images: A survey. Inf. Process. Agric. 2024, 11, 172–186.
11. Charisis, C.; Argyropoulos, D. Deep learning-based instance segmentation architectures in agriculture: A review of the scopes and challenges. Smart Agric. Technol. 2024, 8, 100448.
12. Champ, J.; Mora-Fallas, A.; Goëau, H.; Mata-Montero, E.; Bonnet, P.; Joly, A. Instance segmentation for the fine detection of crop and weed plants by precision agricultural robots. Appl. Plant Sci. 2020, 8, e11373.
13. Perez-Borrero, I.; Marin-Santos, D.; Vasallo-Vazquez, M.J.; Gegundez-Arias, M.E. A new deep-learning strawberry instance segmentation methodology based on a fully convolutional neural network. Neural Comput. Appl. 2021, 33, 15059–15071.
14. Zhuang, Z.; Zhi, Z.; Han, T.; Chen, Y.; Chen, J.; Wang, C.; Cheng, M.; Zhang, X.; Qin, N.; Ma, L. A survey of point cloud completion. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 5691–5711.
15. Gené-Mola, J.; Sanz-Cortiella, R.; Rosell-Polo, J.R.; Escola, A.; Gregorio, E. In-field apple size estimation using photogrammetry-derived 3D point clouds: Comparison of 4 different methods considering fruit occlusions. Comput. Electron. Agric. 2021, 188, 106343.
16. Guo, R.; Xie, J.; Zhu, J.; Cheng, R.; Zhang, Y.; Zhang, X.; Gong, X.; Zhang, R.; Wang, H.; Meng, F. Improved 3D point cloud segmentation for accurate phenotypic analysis of cabbage plants using deep learning and clustering algorithms. Comput. Electron. Agric. 2023, 211, 108014.
17. Magistri, F.; Marks, E.; Nagulavancha, S.; Vizzo, I.; Läbe, T.; Behley, J.; Halstead, M.; McCool, C.; Stachniss, C. Contrastive 3D shape completion and reconstruction for agricultural robots using RGB-D frames. IEEE Robot. Autom. Lett. 2022, 7, 10120–10127.
18. Chen, H.; Liu, S.; Wang, C.; Wang, C.; Gong, K.; Li, Y.; Lan, Y. Point cloud completion of plant leaves under occlusion conditions based on deep learning. Plant Phenomics 2023, 5, 0117.
19. Wu, Z.; Wu, Y.; Pu, J.; Li, X.; Wang, X. Attention-based depth distillation with 3D-aware positional encoding for monocular 3D object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 2892–2900.
20. Tian, M.; Ang, M.H.; Lee, G.H. Shape prior deformation for categorical 6D object pose and size estimation. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XXI 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 530–546.
21. Tadic, V.; Toth, A.; Vizvari, Z.; Klincsik, M.; Sari, Z.; Sarcevic, P.; Sarosi, J.; Biro, I. Perspectives of RealSense and ZED depth sensors for robotic vision applications. Machines 2022, 10, 183.
22. Wang, J.; Cui, Y.; Guo, D.; Li, J.; Liu, Q.; Shen, C. PointAttN: You only need attention for point cloud completion. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 5472–5480.
23. Xie, H.; Yao, H.; Zhou, S.; Mao, J.; Zhang, S.; Sun, W. GRNet: Gridding residual network for dense point cloud completion. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 365–381.
24. Yuan, W.; Khot, T.; Held, D.; Mertz, C.; Hebert, M. PCN: Point completion network. In Proceedings of the 2018 International Conference on 3D Vision (3DV), Verona, Italy, 5–8 September 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 728–737.
25. Xiang, P.; Wen, X.; Liu, Y.S.; Cao, Y.P.; Wan, P.; Zheng, W.; Han, Z. SnowflakeNet: Point cloud completion by snowflake point deconvolution with skip-transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 5499–5509.
26. Gao, Y.; Li, Z.; Hong, Q.; Li, B.; Zhang, L. Corn pose estimation using 3D object detection and stereo images. Comput. Electron. Agric. 2025, 231, 110016.

Figure 1. Workflow of the proposed method: 3D models are generated from the original image using the segmentation result and depth map, and fragmented point clouds are then produced by random transformation and occlusion.

Figure 2. Experimental field and equipment used in this study.

Figure 3. (A–D) Generation of fragmented point clouds from four different 3D corn models. The data generation process begins with capturing images of corn under non-occluded conditions. Individual corn point cloud models are then extracted through segmentation. The final dataset is generated by applying random geometric transformations and occlusions to the point cloud models.

Figure 4. Information about the pose and shape of the corn with dimensions obtained using an OBB.

Figure 5. Process of obtaining the shape and pose of the fragmented corn point cloud using SCPAN.

Figure 6. Original PointAttN framework.

Figure 7. Proposed framework for SCPAN. Based on PointAttN, it simplifies the model structure and incorporates a prior shape encoder and a 3D positional gated MLP.

Figure 8. Loss change during training.

Figure 9. Process of completing the fragmented point clouds and obtaining their OBBs.

Figure 10. Failure cases of the point cloud completion algorithm.

Table 1. Composition of the dataset produced.

Original Corn Models	Transformed Corn Models	Fragmented Point Clouds
100	500	4000
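The counts in Table 1 are consistent with 5 random transformations per original model (100 × 5 = 500) and 8 random occlusions per transformed model (500 × 8 = 4000). The sketch below illustrates that kind of generation loop. The per-model counts are inferred from the totals only, and the rotation range, shift range, occlusion scheme (removing the points nearest a random seed point), and all function names are assumptions for illustration, not the authors' procedure.

```python
import numpy as np


def random_rigid_transform(points, rng, max_shift=0.1):
    """Rotate about a random axis by a random angle, then translate slightly."""
    axis = rng.normal(size=3)
    axis /= np.linalg.norm(axis)
    angle = rng.uniform(0.0, 2.0 * np.pi)
    # Rodrigues' formula: R = I + sin(a) K + (1 - cos(a)) K^2
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    R = np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)
    shift = rng.uniform(-max_shift, max_shift, size=3)
    return points @ R.T + shift


def random_occlusion(points, rng, keep_ratio=0.6):
    """Simulate one occluder by dropping the points nearest a random seed point."""
    seed = points[rng.integers(len(points))]
    dist = np.linalg.norm(points - seed, axis=1)
    n_keep = max(1, int(round(len(points) * keep_ratio)))
    keep = np.argsort(dist)[-n_keep:]  # keep the points farthest from the seed
    return points[keep]


def build_dataset(models, rng, n_transforms=5, n_occlusions=8):
    """Return a list of (fragmented, complete) point cloud pairs."""
    pairs = []
    for model in models:
        for _ in range(n_transforms):
            complete = random_rigid_transform(model, rng)
            for _ in range(n_occlusions):
                pairs.append((random_occlusion(complete, rng), complete))
    return pairs
```

With 100 input models and the default arguments, `build_dataset` would return 4000 pairs, matching the last column of Table 1.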

Table 2. Software and hardware configuration.

Component	Configuration
CPU	Intel(R) Xeon(R) CPU E5-1650 v4
RAM	64 GB
Operating system	Ubuntu 18.04
GPU	NVIDIA GeForce GTX 1080 Ti × 3
Development environment	Python 3.8, PyTorch 1.9.1, CUDA 11.3

Table 3. Results of our proposals and other algorithms.

Model	${CD}_{p}$	${CD}_{t}$	${CD}_{p}^{coarse}$	${CD}_{t}^{coarse}$	Parameters
PCN [24]	20.82	13.58	34.62	39.67	6.85 M
SnowflakeNet [25]	7.96	5.21	26.54	36.42	73.7 M
PointAttN	6.92	4.53	23.08	31.67	146.7 M
PointAttN(Simplified)	9.18	7.18	23.49	29.74	96.4 M
PointAttN(Simplified)+3DMLP	4.86	2.38	6.25	14.25	96.7 M
SCPAN	2.99	1.41	4.36	11.46	97.0 M
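For readers unfamiliar with the column headings in Table 3, the sketch below computes two Chamfer Distance variants between a predicted and a ground-truth point cloud. It assumes the convention common in public point cloud completion code, where the "p" variant averages unsquared nearest-neighbor distances and the "t" variant sums the mean squared distances. The definitions actually used are given in the paper's Evaluation Metrics section and are not restated here, so this is an illustration rather than the authors' exact formula, and any scaling applied to the tabulated values is not reproduced.

```python
import numpy as np


def chamfer_distances(pred: np.ndarray, gt: np.ndarray):
    """Return (cd_p, cd_t) for point sets of shape (N, 3) and (M, 3).

    ASSUMED convention (not confirmed by this section):
      cd_p = (mean nearest distance pred->gt + mean nearest distance gt->pred) / 2
      cd_t = mean squared nearest distance pred->gt + the same gt->pred
    """
    # Squared Euclidean distance between every predicted and ground-truth point.
    diff = pred[:, None, :] - gt[None, :, :]
    sq = np.sum(diff ** 2, axis=-1)   # shape (N, M)
    d_pred = sq.min(axis=1)           # nearest ground-truth point per prediction
    d_gt = sq.min(axis=0)             # nearest prediction per ground-truth point
    cd_p = (np.sqrt(d_pred).mean() + np.sqrt(d_gt).mean()) / 2.0
    cd_t = d_pred.mean() + d_gt.mean()
    return cd_p, cd_t
```

The dense pairwise matrix keeps the example short; for large point clouds a nearest-neighbor structure such as a k-d tree would use far less memory.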

Table 4. Results of the ablation experiments.

Model	${CD}_{p}$	${CD}_{t}$	${CD}_{p}^{coarse}$	${CD}_{t}^{coarse}$	Parameters
PointAttN (Simplified) + Prior shape encoder	6.84	4.51	22.68	31.76	96.6 M
PointAttN + Prior shape encoder	5.86	2.88	7.25	16.25	146.9 M
PointAttN + 3DMLP	5.26	2.68	6.85	14.85	147.0 M
PointAttN + Prior shape encoder + 3DMLP	2.85	1.38	4.34	11.36	147.2 M

Table 5. Statistical evaluation of SCPAN.

Metric	Mean	Std Dev	95% CI
${CD}_{p}$	0.0148	0.0024	(0.0076, 0.0220)
${CD}_{t}$	0.0042	0.0009	(0.0014, 0.0070)
${CD}_{t}^{coarse}$	0.0148	0.0019	(0.0090, 0.0207)
${CD}_{p}^{coarse}$	0.0611	0.0052	(0.0452, 0.0769)

Table 6. Statistical evaluation of PointAttN.

Metric	Mean	Std Dev	95% CI
${CD}_{p}$	0.0240	0.0028	(0.0155, 0.0324)
${CD}_{t}$	0.0119	0.0025	(0.0043, 0.0195)
${CD}_{t}^{coarse}$	0.0313	0.0078	(0.0075, 0.0551)
${CD}_{p}^{coarse}$	0.0713	0.0115	(0.0364, 0.1062)

Table 7. Accuracy (%) of pose estimation in test set.

3D Center	Dimensions	Rotation Matrix	Pitch Angle ( $γ$ )	Orientation Angle ( $β$ )
98.7	99.5	96.1	99.4	96.5

Table 8. Comparison between the Shape Coding PointAttN (SCPAN) algorithm and previous studies.

Author	Crop	Method	Task
Magistri et al. [17]	Fruit	Encoder–decoder	3D shape estimation
Chen et al. [18]	Cabbage	3D reconstruction	Phenotype extraction
Guo et al. [16]	Cabbage	Point cloud segmentation	Phenotype extraction
Gené-Mola et al. [15]	Apple	Point cloud generation	Size estimation
Gao et al. [26]	Corn	3D object detection	Pose estimation
SCPAN (Ours)	Corn	Point cloud completion	Pose estimation

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Gao, Y.; Li, Z.; Liu, T.; Li, B.; Zhang, L. Point Cloud Completion of Occluded Corn with a 3D Positional Gated Multilayer Perceptron and Prior Shape Encoder. Agronomy 2025, 15, 1155. https://doi.org/10.3390/agronomy15051155

