Article

Pepper-4D: Spatiotemporal 3D Pepper Crop Dataset for Phenotyping

1 School of Information and Intelligent Science (SIIS), Donghua University, Shanghai 201620, China
2 Engineering Research Center of Digitized Textile & Fashion Technology, Ministry of Education, Donghua University, Shanghai 201620, China
3 State Key Laboratory of Advanced Fiber Materials (SKLAFM), Donghua University, Shanghai 201620, China
4 College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, China
5 Key Laboratory of Spectroscopy Sensing, Ministry of Agriculture and Rural Affairs, Hangzhou 310058, China
6 School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan 430200, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Plants 2026, 15(4), 599; https://doi.org/10.3390/plants15040599
Submission received: 19 January 2026 / Revised: 9 February 2026 / Accepted: 10 February 2026 / Published: 13 February 2026
(This article belongs to the Special Issue AI-Driven Machine Vision Technologies in Plant Science)

Abstract

Pepper (Capsicum annuum) is a globally significant horticultural crop cultivated for its culinary, medicinal, and economic value. Traditional approaches for boosting pepper production, notably expanding farmland, have become increasingly unsustainable. Recent advancements in artificial intelligence and 3D computer vision have started to transform crop cultivation and phenotyping, shedding new light on increasing production through advanced breeding. However, the field still lacks 3D pepper data with sufficient detail for organ-level analysis. Therefore, we propose Pepper-4D, a new, high-precision 4D point cloud dataset that records both the spatial structure and temporal development of pepper plants across continuous growth stages. Our dataset is divided into three subsets comprising a total of 916 individual point clouds from 29 indoor-cultivated pepper plant samples. It provides manual annotations at both the plant level and the organ level, supporting phenotyping tasks such as pepper growth status classification, organ semantic segmentation, organ instance segmentation, organ growth tracking, new organ detection, and even the generation of synthetic 3D pepper plants.

Graphical Abstract

1. Introduction

Pepper (Capsicum annuum) is a globally significant horticultural crop cultivated for its culinary, medicinal, and economic value. Originating in Central and South America, peppers have a long history of domestication [1,2], with archeological evidence indicating their use as early as 7500 BC. Through the Columbian Exchange, Capsicum peppers spread from the Americas across continents and were rapidly integrated into regional diets and agricultural systems [3]. Today, peppers enrich a wide range of cuisines and support a multi-billion-dollar global trade, underpinning the livelihoods of millions of smallholder farmers [4]. The presence of capsaicinoids, which exhibit antioxidant and anti-inflammatory activities, combined with high levels of vitamins A and C, underscores the significance of peppers in nutritional science and preventive healthcare. Together, these culinary, economic, and nutritional/health-related aspects highlight the strategic importance of pepper production [5].
In parallel with the expanding economic and nutritional role of pepper cultivation, global agriculture faces significant challenges, including global population growth, climate change, and shrinking arable land; these challenges have already negatively impacted crop yields, especially key staples such as wheat, maize, and rice [6,7]. Traditional approaches for boosting agricultural production, notably expanding farmland, have become increasingly unsustainable due to biodiversity loss, soil degradation, and environmental instability [8,9]. Recent advancements in artificial intelligence (AI) and computer vision have started to transform crop cultivation and breeding methods. Plant phenotyping techniques, such as Convolutional Neural Networks (CNNs) and YOLO-style detection networks, enable automated tasks such as disease detection [10], pest identification [11], fruit localization [12], and yield estimation [13] using two-dimensional (2D) images.
Although 2D-based methods have shown strong results in specific agricultural applications, they have fundamental limitations due to their flat perspective. Visual obstructions and viewpoint dependence often result in data loss, making it difficult to accurately reconstruct a plant's overall shape, volume, and spatial arrangement of organs. Furthermore, existing 2D plant datasets only record snapshots at one or several discrete time points, which limits their utility in analyzing continuous growth and structural changes over time. To address these shortcomings, several 3D crop datasets have been developed to capture detailed spatial information and enable more accurate organ-level analysis. These include three crop datasets introduced in [14] (tomato, tobacco, and sorghum), as well as datasets for rose [15], maize [16], and soybean [17], which collectively demonstrate the value of 3D data for advancing structural phenotyping across diverse species. However, most of these 3D datasets likewise capture only static snapshots at one or a few discrete time points, which reduces their effectiveness for studying dynamic developmental processes such as germination, rapid expansion, and senescence. To the best of our knowledge, no public dataset currently captures the complete 3D structural and developmental dynamics of pepper plants in enough detail for organ-level analysis.
To overcome these challenges, we introduce Pepper-4D, a new, high-precision 4D point cloud dataset that records both the spatial structure and temporal development of pepper plants across continuous growth stages. This dataset offers detailed insights into morphological changes from early vegetative stages to flowering, fruiting, and aging. By addressing the lack of temporal coverage in existing 3D datasets, Pepper-4D aims to serve as a benchmark for advancing dynamic 3D plant phenotyping, automated growth monitoring, and the structural analysis of plant organs at different developmental phases. Our contributions are three-fold.
(i)
We establish Pepper-4D, a large-scale spatiotemporal dataset comprising 916 individual point clouds from 29 indoor-cultivated pepper plant samples. The peppers were scanned daily across a period of more than 45 days with the Neural Radiance Field (NeRF) technique, which uses about 100 images from different viewing angles to generate a single pepper point cloud containing between 36,762 and 1,320,895 points. In total, the dataset contains 322.72 million points.
(ii)
Pepper-4D contains three different subsets. Subset 1 records a long period of growth for 11 potted peppers; Subset 2 contains eight pepper sequences for geotropism tests, during which the cultivation pots are overturned for a period and then recovered; and Subset 3 contains 10 pepper growth sequences. These subsets document developmental events such as budding, flowering, fruiting, organ disappearance, and withering, and contain plant-level and organ-level annotations for testing different phenotyping algorithms.
(iii)
Our dataset provides manual annotations including point-level growth status labels (healthy or withering), point-level semantic organ labels (e.g., stem, leaf), organ instance annotations, temporally tracked point-level organ labels, and point-level new organ labels. Based on these annotations, we successfully conducted experiments with existing strong baselines that cover the phenotyping tasks of pepper growth status classification, pepper organ semantic segmentation, pepper organ instance segmentation, pepper organ growth tracking, new organ detection, and even the generation of synthetic 3D pepper plants.

2. Related Work

2.1. 2D Pepper Datasets and Applications

Nowadays, crop-related image datasets are numerous and large in scale [18], and new open datasets continue to emerge. Therefore, we will only review datasets that are closely related to pepper plants. Two-dimensional (2D) image datasets have played a foundational role in the development of deep learning models for automated pepper phenotyping. Spanning from real-world agricultural images to synthetic renderings, these datasets have enabled substantial progress in precision agriculture for tasks such as disease classification [19], fruit detection [2], and fruit grading [20].
Multiple 2D image datasets have been developed to support studies on peppers, particularly in disease recognition and classification. Barth et al. [21] introduced a Capsicum annuum dataset for image-based semantic segmentation in agricultural scenes, showing that learning-based models can effectively parse pepper plant imagery. In a complementary direction, Sosa-Herrera et al. [19] investigated Capsicum annuum crop health estimation from RGB imagery using deep learning, further demonstrating the potential of 2D vision models for pepper phenotyping and monitoring tasks. Pallewatta et al. [22] introduced the BellCrop dataset with 4860 leaf images covering three categories (healthy, powdery mildew-infected, and magnesium-deficient leaves), where EfficientNet-B7 achieved 96.67% accuracy. Bezabh et al. [23] proposed a concatenated model combining VGG16 and AlexNet for pepper leaf and fruit disease classification on a dataset containing 1596 images with six disease classes. Yin et al. [20] constructed a large-scale hot pepper dataset with 28,011 images and 34 classes of diseases and pests and used a ResNet50-based model for classification. For external fruit grading, Ren et al. [24] developed a dataset of 776 images, including 400 images of normal peppers and 376 images of inferior peppers. While these studies demonstrate strong performance, most of these datasets remain unavailable to the public.
Beyond disease classification, several studies have focused on detecting organs or disease/pest areas on peppers to support robotic harvesting and yield estimation. Lopez-Barrios et al. [25] applied Mask R-CNN with a ResNet-101 backbone to detect fruits and peduncles of green sweet peppers in greenhouse conditions, using 30,000 images captured under varying light intensity. Li et al. [26] applied an improved YOLOv4-tiny model for green pepper detection on a dataset of 1500 greenhouse images collected under different illumination conditions. Zheng et al. [27] proposed MSPB-YOLO, an improved YOLOv8-based model that integrates RVB-EMA attention, RepGFPN feature fusion, and DIoU loss to detect multi-site pepper blight disease on leaves, stems, and fruits with a mAP of 96.4% under complex conditions. Kapetas et al. [28] developed a YOLOv11-Small model for leaf segmentation with Transformer and RT-qPCR validation, achieving 87.42% accuracy and an F1-score of 81.13% for the early detection of the fungal pathogen Botrytis cinerea on pepper leaves.
Despite these advances, existing pepper image datasets still fall short in representing the structural and developmental complexity of pepper plants, and applications mainly focus on detection or identification, lacking organ-level segmentation and analysis. Most of the datasets consist of static snapshots without temporal information, making them unsuitable for continuous growth monitoring and detailed organ-level analysis over time. These issues motivated us to bridge the research gap by establishing a comprehensive, annotated, spatiotemporal 3D dataset of pepper plants.

2.2. 3D Crop Datasets

Three-dimensional (3D) crop datasets have become increasingly popular in advancing structural plant phenotyping [29,30], organ-level analysis [31], and growth monitoring [32]. Unlike traditional 2D datasets, 3D point clouds preserve volumetric and spatial integrity, which enables high-precision, organ-level trait extraction under both controlled and field conditions. One of the earliest efforts was made by Conn et al. [14], who captured 558 high-resolution models of tomato, tobacco, and sorghum species under different growth conditions over a growth period. However, the lack of organ-level annotations restricts the applicability of the dataset for fine-grained phenotyping. Subsequent work [33,34] addressed this limitation by introducing manual organ annotations for this dataset. Dutagaci et al. [15] developed ROSE-X, an X-ray tomography-based dataset of 11 rosebushes with organ-level semantic labels for stems and leaves. This dataset enables the application of organ-level segmentation methods; however, its small scale limits its use in large-scale studies. Schunck et al. [16] established Pheno4D, a publicly available spatiotemporal dataset featuring point clouds of seven maize and seven tomato plants collected repeatedly over their growth periods. In total, the dataset provides 49 labeled maize point clouds and 77 labeled tomato point clouds to support tasks such as semantic and instance segmentation.
More recent contributions have expanded both feature diversity and scale. Sun et al. [17] introduced Soybean-MVS, which reconstructed 102 soybean point clouds with real colors across 13 growth stages using Multi-view Stereo (MVS); the authors also provided both semantic and instance labels for leaves. Mertoglu et al. [35] presented PLANesT-3D, providing 34 annotated color point clouds of pepper, rose, and ribes plants reconstructed via MVS, with both semantic and instance annotations. However, the 10 pepper point clouds in PLANesT-3D each capture only a single growth stage without temporal coverage. Marks et al. [36] presented BonnBeetClouds3D, a large-scale 3D sugar beet plant dataset captured in real breeding plots. The dataset provides organ-level labels for 186 plants and 2661 leaves across 48 varieties, along with over 10,000 salient features such as leaf tips, leaf corners, and plant centers.
At a larger scale, Zhu et al. [37] released Crops3D, which compiled 1230 point clouds from eight crop species (maize, rice, wheat, potato, cabbage, tomato, cotton, and rapeseed), scanned with a combination of Terrestrial Laser Scanning (TLS), Structured Light, and MVS techniques. With semantic and instance annotations for multiple organs, Crops3D supports classification and segmentation applications. Most recently, Kimara et al. [38] introduced AgriField3D, an open dataset that provides a total of 1045 maize point clouds, of which 520 are labeled. The dataset also offers multiple resolution versions (100k, 50k, and 10k points) to facilitate diversified downstream applications, making it an effective dataset for phenotypic analysis.
Despite the growing number of publicly available 3D crop datasets, the research community still lacks public, high-quality 3D datasets for pepper plants. Although temporal pepper datasets based on RGB image sequences and morphological measurements have been released [39,40], publicly available temporal 3D datasets for pepper remain scarce, which restricts dynamic growth monitoring and phenotyping across time. These shortcomings also explain why a dense, labeled, spatiotemporal dataset of pepper is important.

2.3. Applications on 3D Crop Datasets

Three-dimensional (3D) crop datasets have enabled a wide range of advanced applications in plant phenotyping. The increasing availability of annotated point cloud datasets has accelerated progress in the semantic segmentation of crops, organ-level trait extraction, and growth monitoring.
3D crop datasets facilitate the tasks of organ semantic segmentation, organ instance segmentation, fruit detection, and trait analysis. Li et al. [33] introduced PlantNet, a dual-function deep learning network that simultaneously performs semantic segmentation of stems and leaves and instance segmentation of individual leaves from 3D plant point clouds of multiple species. Zhang et al. [41] proposed SN-MGGE, an automatic organ semantic segmentation network for cucumber seedling point cloud segmentation and phenotypic parameter extraction; SN-MGGE achieves high accuracy by integrating Euclidean local geometric relationships and the HGS to enhance point embedding. Cui et al. [42] proposed PVSegNet, which leverages voxel convolutions to enhance feature extraction for both organ semantic and instance segmentation of mature soybean pods and stems in point clouds. Wu et al. [43] proposed a skeleton extraction approach for deriving phenotypic traits of maize plants directly from 3D point clouds; the method integrates point cloud de-noising, Laplacian-based contraction, adaptive sampling, and skeleton calibration to generate organ-level structural skeletons. Xie et al. [44] introduced Plant-MAE, a self-supervised 3D plant organ semantic segmentation framework trained on large, unlabeled point cloud datasets of eight crop species, including maize, tomato, potato, cotton, cabbage, rapeseed, rice, and wheat. Dong et al. [45] proposed an automatic two-stage organ instance segmentation method that integrates PointNeXt for stem–leaf semantic segmentation and Quickshift++ for leaf instance segmentation on sugarcane, maize, and tomato point clouds. Gene-Mola et al. [46] developed a fruit detection and 3D localization framework that integrates Mask R-CNN instance segmentation with structure-from-motion (SfM) photogrammetry to identify and spatially locate fruits in orchard environments; the method was evaluated on 11 Fuji apple trees containing 1455 fruits and showed satisfactory results for 3D fruit detection and localization. Liu et al. [47] introduced FF3D, a 3D apple detection framework that integrates an anisotropic Gaussian-based next-best-view (NBV) estimator to achieve robust fruit localization under occlusion for robotic harvesting.
Tracking the same organ across growth stages and detecting growth events (such as budding, bifurcation, and decay) are also essential for dynamic plant trait analysis. Li et al. [48] presented a framework for analyzing plant growth from 4D point cloud data, enabling the robust detection of spatiotemporal growth events such as budding, bifurcation, and decay, as well as the tracking of individual organs through time. Magistri et al. [49] developed a registration framework that aligns plant point clouds across growth stages by integrating SVM-based stem–leaf classification, DBSCAN-based leaf instance segmentation, and self-organizing maps (SOMs). Li et al. [50] developed TrackPlant3D, a 3D organ growth tracking framework that enables dynamic organ-level phenotyping from time-series crop point clouds. The study introduced a spatiotemporal dataset of four plant species (maize, sorghum, tobacco, and tomato) with temporally consistent organ instance annotations, supporting the detection of growth events. Most recently, Li et al. [32] presented 3D-NOD, a spatiotemporal segmentation framework for detecting new organs from time-series point clouds. The method reached a mean F1 of 88.1% and a mean IoU of 80.7% for new organ detection on species including tobacco, tomato, and sorghum.
Existing 3D crop datasets mainly focus on crops such as maize, tomato, cotton, and apple; annotated 3D resources for pepper plants are still missing. Moreover, evaluations of a broad range of algorithms on different phenotyping tasks over existing 3D datasets are rare, especially for spatiotemporal analysis at the organ level. All these reasons motivated us to build the Pepper-4D dataset.

3. Materials and Methods

3.1. Materials

The Pepper-4D dataset comprises three distinct subsets that collectively capture developmental diversity and experimental variability in pepper plants cultivated under controlled indoor conditions. Indoor lighting was provided for approximately 12 h per day, with an average ambient temperature of ~25 °C and average relative humidity of ~45%. Point clouds were scheduled to be acquired at two-day intervals once plants reached a minimum size suitable for reliable 3D scanning; the reported month ranges denote the overall cultivation period. All three subsets were collected under the same indoor cultivation conditions, and representative samples are shown in Figure 1. Subset 1 (Figure 1a) contains 460 point clouds from 11 potted plant sequences collected during the July–December 2024 cultivation period, covering the full developmental cycle from early vegetative growth to flowering, fruiting, and senescence. Subset 2 (Figure 1b) includes 238 point clouds from 8 plant sequences collected between September and December 2024 for geotropism testing: potted plants were placed on their side (approximately horizontal) for a controlled duration and then restored upright to observe recovery behavior. This subset captures structural reorientation and morphological adaptation under gravitational perturbation. Subset 3 (Figure 1c) consists of 218 point clouds from 10 plant sequences collected during the July–September 2025 cultivation period, focusing on early-to-mid developmental phases characterized by rapid organ emergence and canopy expansion. Together, these subsets span major growth events such as budding, flowering, fruiting, organ disappearance, and withering; therefore, our Pepper-4D dataset is a comprehensive benchmark for precise plant-level and organ-level dynamic spatiotemporal phenotyping of pepper plants.

3.2. Data Acquisition

The workflow of the data acquisition is illustrated in Figure 2, consisting of three sequential stages—image acquisition and preprocessing (Figure 2a), 3D reconstruction (Figure 2b), and preparation of the plant-only point cloud (Figure 2c). During image acquisition, each sample plant was photographed with a handheld cellphone RGB camera moved along two circular trajectories, collecting approximately 100 multi-view images per scan under uniform indoor LED illumination to ensure consistent color and low reflection artifacts. The preprocessing sub-step uses the Pose Refinement (PR) method to obtain a sparse scene point cloud and estimate camera poses from the captured multi-view images. PR uses the Scale-Invariant Feature Transform (SIFT) to identify key points and their SIFT feature descriptors in each image. Subsequently, the SIFT feature descriptors are matched across different images. Then, the RANSAC algorithm is used to estimate the relative motion (i.e., relative rotation and translation) between each pair of images. Finally, through Bundle Adjustment (BA), the internal and external parameters of the camera are refined, and a sparse 3D point cloud of the pepper (the rightmost of Figure 2a) is obtained. In the 3D reconstruction stage, we turn the sparse point cloud into a high-precision one. NeRFacto [51], an improved version of NeRF, is used as the neural backbone for the 3D reconstruction step (Figure 2b). NeRFacto takes rough scene point clouds and camera poses as input and outputs precise point clouds with adjustable density via the Point Cloud Exporter (a part of NeRFacto). Finally, we prepare the plant-only point cloud (shown in Figure 2c) by keeping the crop part above the cultivation pot and then applying the Statistical Outlier Removal (SOR) operation from CloudCompare [52] to reduce noise.
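For illustration, the following minimal Python sketch reproduces the pairwise feature-matching and relative-motion step described above using OpenCV's SIFT and RANSAC routines. The function name relative_pose and the pre-calibrated intrinsic matrix K are our assumptions for this sketch; the actual PR pipeline further refines all camera parameters jointly through Bundle Adjustment, which is omitted here.

```python
import cv2
import numpy as np

def relative_pose(img1_path, img2_path, K):
    """Estimate the relative rotation R and translation t between two views.
    K is an assumed pre-calibrated 3x3 camera intrinsic matrix."""
    img1 = cv2.imread(img1_path, cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread(img2_path, cv2.IMREAD_GRAYSCALE)

    # Detect key points and compute SIFT descriptors in both images.
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Match descriptors and keep unambiguous matches (Lowe's ratio test).
    matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]
    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

    # RANSAC rejects outlier correspondences while fitting the essential matrix.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                   prob=0.999, threshold=1.0)
    # Decompose E into the relative rotation and translation between views.
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t
```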

3.3. Data Annotation

Data annotations were carried out at both the plant level and the organ level, and we explain the annotation tasks with visualized examples from Subset 1 and Subset 2. The data annotation examples for Subset 1 are shown in Figure 3, and annotations for Subset 2 are shown in Figure 4. All point clouds were manually annotated using the CloudCompare [52] tool. In Subset 1, we focus on four different types of labels spanning the organ level to the plant level. One sequence from Subset 1 is visualized in Figure 3 to show how we annotate organ instance labels [32,33], organ semantic labels (stem and leaf), temporally aligned organ labels (for organ tracking) [50], and plant-level health labels. In Subset 2, we focus on two other types of labels, again spanning the organ level to the plant level. One sequence from Subset 2 is visualized in Figure 4 to show plant-level geotropism and organ-level new organ growth events. The organ-level new organ growth events were annotated using the Backward and Forward Labeling (BFL) technique proposed in 3D-NOD [32], in which each pepper point cloud is compared bidirectionally along the timeline to identify new buds and new leaves.

4. Tasks and Results

4.1. Health Assessment by Classification

Plant phenotyping fundamentally aims to monitor crop health, because the timely recognition of stress symptoms is essential for optimizing cultivation management and accelerating breeding decisions. Here, we show that automatic crop health assessment can be carried out on Pepper-4D; the task is realized as a binary point cloud classification task on the pepper sequences, where each point cloud is classified as either normal (vigorous and healthy) or withering (undergoing stress-induced decline). Three representative deep architectures, PointNet [53], PointNet++ [54], and Dynamic Graph CNN (DGCNN) [55], were benchmarked on Pepper-4D to evaluate the accuracy and robustness of plant health assessment.

4.1.1. Methodology

Point clouds provide a natural and compact representation of plant architecture, preserving the complete three-dimensional arrangement of stems, leaves, and fruits. PointNet [53] is a classical baseline model for point cloud classification and segmentation. PointNet applies shared Multi-layer Perceptrons (MLPs) to individual points and aggregates features through max pooling into a global descriptor, ensuring permutation invariance. PointNet++ (Figure 5a), an improved version of PointNet, extends the feature extraction capability of PointNet through a hierarchical architecture design, where points are sampled, grouped into local neighborhoods, and processed to capture multi-scale geometric features. DGCNN (Figure 5b) employs dynamic graph learning with EdgeConv operations on k-nearest-neighbor graphs of feature embeddings to model local geometry and higher-order structural relationships, producing a more adaptive representation of plant morphology.
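To make the shared-MLP and max-pooling idea concrete, the following is a minimal PointNet-style binary classifier sketched in PyTorch. This is an illustrative simplification (the class name TinyPointNet is ours): the published PointNet additionally includes input and feature transform networks, and PointNet++ and DGCNN build hierarchical or graph-based feature extraction on top of similar primitives.

```python
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        # Shared MLP applied independently to every point; input is (B, 3, N).
        self.point_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Linear(1024, 256), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(256, num_classes),
        )

    def forward(self, xyz):                           # xyz: (B, N, 3)
        feat = self.point_mlp(xyz.transpose(1, 2))    # (B, 1024, N)
        # Max pooling over points yields a permutation-invariant descriptor.
        return self.head(feat.max(dim=2).values)      # (B, num_classes) logits

logits = TinyPointNet()(torch.randn(4, 2048, 3))      # e.g., 2048 points per cloud
```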

4.1.2. Health Assessment Results

The health assessment experiment on the Pepper-4D dataset includes 460 point clouds from 11 growth sequences, of which nine sequences contain withering events. We enriched the training data using standard point cloud augmentation, with Farthest Point Sampling (FPS) applied to generate uniform point sets, following the augmentation protocol in the original implementation [54]. The normal pepper point clouds were augmented 10 times, and the withering (dying) point clouds were augmented 100 times to keep a balanced training set, since the dying samples are substantially fewer than the normal ones. After augmentation, the training set contains 2710 “Normal” samples and 2400 “Withering” samples, while the testing set contains 1540 “Normal” and 1100 “Withering” samples, and each point cloud contains exactly 2048 points. The quantitative results of our health assessment experiments are summarized in Table 1, which includes comparative metrics for all three networks. In Table 1, three metrics—precision (Prec), recall (Rec), and F1 [33]—are listed to show the quantitative performance. All three networks achieve high accuracy, with mean F1-scores above 96.00%. PointNet++ achieves the best result under all metrics, with a mean Prec of 98.38%, a mean Rec of 98.43%, and a mean F1-score of 98.32%.
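The FPS procedure used above for down-sampling and augmentation can be summarized by the standard greedy algorithm below, sketched in NumPy; starting from different random seed points yields different uniform subsets of the same cloud, which is how FPS-based augmentation produces multiple training samples. Practical pipelines typically use GPU implementations.

```python
import numpy as np

def farthest_point_sampling(points, n_samples, seed=None):
    """points: (N, 3) array. Greedily pick n_samples indices that maximize
    the minimum distance to the already-selected set."""
    rng = np.random.default_rng(seed)
    chosen = np.empty(n_samples, dtype=np.int64)
    chosen[0] = rng.integers(points.shape[0])         # random seed point
    dist = np.linalg.norm(points - points[chosen[0]], axis=1)
    for i in range(1, n_samples):
        chosen[i] = int(np.argmax(dist))              # farthest from chosen set
        new_d = np.linalg.norm(points - points[chosen[i]], axis=1)
        dist = np.minimum(dist, new_d)                # distance to nearest chosen
    return chosen

# e.g., idx = farthest_point_sampling(cloud_xyz, 2048, seed=0)
```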
The qualitative results of the health assessment are shown in Figure 6. PointNet++ shows the best prediction results compared with the other two models, with no misclassification among the ten test samples. Both quantitative and qualitative results indicate that, among the three models, PointNet++ is best suited for the pepper health assessment task, achieving a stable and consistent performance by effectively capturing hierarchical geometric features and local structural information.

4.2. Organ Segmentation

Organ segmentation is a key component of 3D phenotyping, enabling fine-grained analysis of plant structure. Organ segmentation usually comprises two subtasks—organ semantic segmentation, which assigns an organ semantic label (stem/leaf/fruit, etc.) to each point of the crop, and organ instance segmentation, which, e.g., separates a dense canopy into individual leaves. We apply two representative dual-branch networks—PlantNet [33] and PSegNet [34]—to carry out organ segmentation on the Pepper-4D dataset. Both models are capable of performing organ semantic segmentation and organ instance segmentation simultaneously.

4.2.1. Methodology

PlantNet [33] is a dual-function segmentation network specifically developed for multi-species crop point cloud datasets. It is able to perform leaf–stem semantic segmentation and leaf instance segmentation simultaneously. As shown in Figure 7a, the PlantNet model comprises a shared encoder and a dual-pathway decoder serving the two segmentation tasks, respectively. To effectively capture fine-grained geometric characteristics, PlantNet incorporates multiple Local Feature Extraction Operations (LFEOs) constructed upon EdgeConv blocks [55]. A Feature Fusion Module (FFM) is subsequently employed to integrate multi-level representations, while a spatial attention mechanism near the end of the architecture emphasizes spatially salient features to enhance segmentation. Similarly, PSegNet [34] is a biologically inspired dual-function deep network designed for organ semantic and instance segmentation of crop point clouds. As illustrated in Figure 7b, its architecture begins with a shared encoder containing a series of Local Feature Extraction Modules (LFEMs) for efficient local feature aggregation. In the middle part, a biologically inspired dual-pathway decoder that resembles the dorsal and ventral streams of the human visual system first forms two streams of features, and then a Dual-granularity Feature Fusion Module (DGFFM) integrates the two complementary feature streams to enrich the feature representation. The final stage combines spatial and channel attention mechanisms to further improve feature extraction. Ultimately, one pathway carries out organ semantic segmentation, and the other pathway conducts organ instance segmentation.
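Because both networks build their local feature extractors (LFEOs/LFEMs) on EdgeConv, a minimal PyTorch sketch of one EdgeConv layer is given below for reference; the helper names knn_indices and EdgeConv are ours, and real implementations recompute the k-nearest-neighbor graph per layer with optimized GPU kernels.

```python
import torch
import torch.nn as nn

def knn_indices(x, k):
    """x: (B, N, C). Return (B, N, k) indices of the k nearest neighbors."""
    d = torch.cdist(x, x)                                  # pairwise distances
    return d.topk(k + 1, largest=False).indices[..., 1:]   # drop self-match

class EdgeConv(nn.Module):
    def __init__(self, in_c, out_c, k=20):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(nn.Linear(2 * in_c, out_c), nn.ReLU())

    def forward(self, x):                                  # x: (B, N, C)
        B, N, C = x.shape
        idx = knn_indices(x, self.k)                       # (B, N, k)
        nbrs = torch.gather(
            x.unsqueeze(1).expand(B, N, N, C), 2,
            idx.unsqueeze(-1).expand(B, N, self.k, C))     # (B, N, k, C)
        center = x.unsqueeze(2).expand_as(nbrs)
        # Edge feature = [center, neighbor - center]; max-aggregate over neighbors.
        edge = torch.cat([center, nbrs - center], dim=-1)
        return self.mlp(edge).max(dim=2).values            # (B, N, out_c)

feats = EdgeConv(3, 64)(torch.randn(2, 1024, 3))
```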

4.2.2. Semantic Segmentation Results

In the organ semantic segmentation experiments with PlantNet and PSegNet on the Pepper-4D dataset, a total of 187 point clouds from Subset 1 were used for training and 50 point clouds from Subset 1 for testing. During data preparation, each point cloud was down-sampled to 4096 points using the Farthest Point Sampling (FPS) method to ensure uniform spatial resolution and low point redundancy. Ten-fold data augmentation based on FPS was applied to all training point clouds, resulting in a total of 1870 training samples. The quantitative results of our organ semantic segmentation experiments are summarized in Table 2. The evaluation metrics include precision (Prec), recall (Rec), F1-score (F1), and Intersection-over-Union (IoU) [33], computed separately for stems and leaves in Table 2. Both networks achieve high accuracy, with mean F1-scores and mean IoU values above 90.00%. Among them, PSegNet demonstrates a slightly better overall quantitative performance than PlantNet.
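For reproducibility, the per-class point-wise metrics in Table 2 follow the standard definitions, which can be computed as in the short NumPy sketch below (the function name class_metrics is ours; a small epsilon guards against empty classes).

```python
import numpy as np

def class_metrics(pred, gt, cls, eps=1e-9):
    """pred, gt: (N,) integer label arrays; cls: class id (e.g., stem or leaf)."""
    tp = np.sum((pred == cls) & (gt == cls))
    fp = np.sum((pred == cls) & (gt != cls))
    fn = np.sum((pred != cls) & (gt == cls))
    prec = tp / (tp + fp + eps)       # fraction of predicted-cls points correct
    rec = tp / (tp + fn + eps)        # fraction of true-cls points recovered
    f1 = 2 * prec * rec / (prec + rec + eps)
    iou = tp / (tp + fp + fn + eps)   # intersection over union for this class
    return prec, rec, f1, iou
```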
The qualitative results of organ semantic segmentation by the two networks are shown in Figure 8. Both PlantNet and PSegNet accurately separate the stem and leaf regions across samples of different structural complexity and produce results close to the Ground Truth. PSegNet, in particular, produces more stable predictions in dense canopy regions (the last column of Figure 8) and effectively mitigates local misclassifications along the boundaries between leaves and stems. The superiority of PSegNet over PlantNet may be explained by its biologically inspired architecture, which enhances its capability to capture hierarchical geometric cues and spatial information.

4.2.3. Instance Segmentation Results

Organ instance segmentation aims to separate individual organs (usually leaves and fruits) for detailed phenotypic analysis. In this task, we applied PlantNet and PSegNet to the Pepper-4D dataset, with 187 training and 50 testing point clouds from Subset 1, to segment leaf instances. During data preparation, each point cloud was down-sampled to 4096 points using the Farthest Point Sampling (FPS) method to ensure uniform spatial resolution and to reduce point redundancy across samples. To further improve the training process, ten-fold data augmentation based on FPS was applied to all training point clouds, resulting in a total of 1870 training samples. The quantitative results of our leaf instance segmentation experiments are summarized in Table 3, including the evaluation metrics of mean precision (mPrec), mean recall (mRec), mean Coverage (mCov), and mean Weighted Coverage (mWCov) [33]. Both PlantNet and PSegNet achieved strong performances, with all metrics exceeding 70.00%. PSegNet exhibited slightly superior results compared to PlantNet, showing higher mPrec, mCov, and mWCov values.
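For reference, mCov and mWCov are commonly defined as the (size-weighted) mean over ground-truth instances of the best IoU achieved by any predicted instance; the sketch below illustrates these definitions under the assumption that each instance is represented as an array of point indices (the helper names are ours).

```python
import numpy as np

def instance_iou(a, b):
    a, b = set(a.tolist()), set(b.tolist())
    return len(a & b) / len(a | b)

def coverage(gt_instances, pred_instances):
    """Both arguments: lists of 1-D index arrays, one per instance.
    Returns (mCov, mWCov) over the ground-truth instances."""
    # Best IoU achieved by any prediction for each ground-truth instance.
    best = np.array([max(instance_iou(g, p) for p in pred_instances)
                     for g in gt_instances])
    sizes = np.array([len(g) for g in gt_instances], dtype=float)
    m_cov = best.mean()                           # unweighted mean coverage
    m_wcov = (sizes / sizes.sum() * best).sum()   # size-weighted coverage
    return float(m_cov), float(m_wcov)
```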
The qualitative results of leaf instance segmentation by the two networks are presented in Figure 9, showing individual leaf segmentations across different growth stages with varying canopy densities. Both networks accurately identified most leaf instances, even under partial occlusion. PlantNet effectively detected small or emerging leaves near the bottom of the main stem but occasionally misclassified stem regions. PSegNet, in contrast, predicted more stable leaf instances along the main stem; however, it tends to over-segment several leaves. Overall, both quantitative and qualitative results show that PSegNet achieved more reliable organ instance segmentation than PlantNet in our task, which may be explained by its capacity to effectively capture hierarchical geometric features through the biologically inspired architecture and dual-granularity feature extraction.

4.3. Detection of New Organs

The detection of new plant organs in crop point clouds is a task focused on identifying emerging buds and new leaves in a 3D growth sequence. This task is crucial for plant phenotyping, as accurately detecting these temporal growth events offers critical insights into the physiological and structural responses of plants under varying environmental conditions. The new organ detection task remains challenging due to the tiny scale of buds, the class imbalance between mature organs and new organs, and the drastic morphological change as a “new” organ matures into an “old” one during growth. To address this task on the Pepper-4D dataset, we adopt the 3D-NOD [32] framework, which detects new organs across the point cloud sequences of pepper plants.

4.3.1. Methodology

The 3D-NOD [32] framework is specifically designed for new organ detection in spatiotemporal point cloud datasets. The design of 3D-NOD drew inspiration from how an experienced human annotator uses spatiotemporal information to identify buds on a plant across several adjacent growth stages in the timeline. This framework transforms the detection problem into a point cloud segmentation task by integrating three key components. The Backward and Forward Labeling (BFL) process ensures temporally consistent annotations by refining organ labels across consecutive growth stages. The Registration and Mix-up (RMU) module performs spatial alignment between adjacent crop point clouds, fusing spatial and temporal information. The Humanoid Data Augmentation (HDA) strategy further strengthens model robustness by simulating human annotator behaviors, such as viewpoint variation and panoramic visibility, which aids in capturing small, newly emerging organs. Additionally, a Dynamic Graph CNN (DGCNN) [55] backbone, enriched with temporal encoding, is employed for new organ segmentation. Together, these components allow 3D-NOD to capture the subtle geometric and structural variations caused by growth development, thereby facilitating accurate and temporally consistent new organ detection.
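The following simplified sketch illustrates the alignment-and-mix-up idea behind RMU: two adjacent stages are registered and merged into one cloud carrying a binary time-stage channel. Rigid ICP from Open3D is used here purely as a stand-in for the registration step; the actual RMU procedure in 3D-NOD may employ a different registration method, and the function name rmu_pair is ours.

```python
import numpy as np
import open3d as o3d

def rmu_pair(xyz_t0, xyz_t1, max_corr_dist=0.02):
    """xyz_t0, xyz_t1: (2048, 3) arrays from two adjacent growth stages.
    Returns a (4096, 4) mixture: xyz plus a 0/1 time-stage channel."""
    src, tgt = o3d.geometry.PointCloud(), o3d.geometry.PointCloud()
    src.points = o3d.utility.Vector3dVector(xyz_t0)
    tgt.points = o3d.utility.Vector3dVector(xyz_t1)

    # Rigid ICP as a stand-in for the registration step of RMU.
    reg = o3d.pipelines.registration.registration_icp(
        src, tgt, max_corr_dist, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    src.transform(reg.transformation)    # align stage t0 onto stage t1

    # Mix-up: concatenate both stages and tag each point with its stage.
    mixed = np.vstack([np.asarray(src.points), xyz_t1])
    t_flag = np.concatenate([np.zeros(len(xyz_t0)), np.ones(len(xyz_t1))])
    return np.hstack([mixed, t_flag[:, None]])
```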

4.3.2. New Organ Detection Results

The new organ detection experiments were conducted on eight partial growth sequences from Subset 2 of our Pepper-4D dataset, including a total of 80 point clouds, of which 60 samples were used for training and 20 for testing. During data preparation, we followed the steps of 3D-NOD and first annotated the entire dataset at the point level using the Backward and Forward Labeling (BFL) strategy [32], defining two semantic classes—old organ and new organ (examples are given in Figure 4b). Each point cloud was then down-sampled to 2048 points using the Farthest Point Sampling (FPS) algorithm to ensure both uniform spatial resolution and computational efficiency. Subsequently, we applied the Registration and Mix-up (RMU) step [32], which aligned every two adjacent point clouds in the timeline and then merged them into a single mixture containing 4096 points. This process automatically aligned the same organ instances in 4D space, ensuring both spatial and temporal coherence. Finally, the Humanoid Data Augmentation (HDA) strategy was employed to generate 10 augmented copies of each mixed point cloud to increase data diversity. After training on a modified DGCNN backbone [55], a Split and Refinement step was applied to separate the combined point cloud back into the two original point clouds, effectively reversing the earlier Mix-up process. This step enables the 3D-NOD framework to refine the spatial boundaries of detected new organs. The evaluation metrics included precision (Prec), recall (Rec), F1-score (F1), and Intersection-over-Union (IoU) [32]. The 3D-NOD framework achieved a strong quantitative performance on the testing dataset, with a mean precision of 77.64%, recall of 84.90%, F1-score of 80.82%, and IoU of 71.88%. The qualitative results of two growth sequences from the testing dataset, together with the GTs, are illustrated in Figure 10, in which the red-circled regions highlight newly detected organs. The quantitative and qualitative results show that 3D-NOD identifies new organs in our Pepper-4D dataset with high sensitivity and satisfactory accuracy. Overall, 3D-NOD exhibits a robust and temporally consistent performance for new organ detection on Pepper-4D, which also proves Pepper-4D to be well suited for time-series phenotyping analysis under spatiotemporal conditions.

4.4. Organ Tracking

Dynamic 3D phenotyping is essential for understanding plant growth and morphological development, as it enables the quantitative analysis of temporal structural variations and organ-level traits. However, significant challenges remain due to non-rigid organ deformation, frequent growth events such as budding and withering that continuously alter organ numbers, and the difficulty of maintaining the spatial and temporal correspondences of the same organ. To overcome these challenges, we not only need good organ-level growth tracking algorithms but also a high-precision spatiotemporal crop dataset, such as Pepper-4D, that allows algorithm testing. In order to perform the organ tracking task on Pepper-4D, we adopted TrackPlant3D [50] to track individual leaves across the different growth stages of pepper plants.

4.4.1. Methodology

TrackPlant3D [50] is an automated organ-level growth tracking framework developed for time-series crop point clouds, aiming to establish temporally consistent organ labels throughout sequential growth stages. The framework comprises three main stages—point cloud processing, organ correspondence estimation, and per-organ growth tracking. In the first stage, adjacent plant point clouds in the timeline are spatially aligned through iterated non-rigid registrations to minimize structural discrepancies caused by non-rigid morphological changes during growth. The second stage associates the same organ instance across time by combining geometric similarity and spatial proximity at both local and global scales. Finally, the third stage propagates and refines organ labels throughout the sequence to produce tracked point clouds. By modeling spatiotemporal relationships between consecutive growth stages, TrackPlant3D is able to achieve accurate short- and long-term organ tracking across several different plant species.
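The association step in the second stage can be illustrated by a much-simplified sketch that matches organ instances across two registered stages by centroid proximity with the Hungarian algorithm; TrackPlant3D combines richer local and global geometric cues, so the snippet below (with our helper name match_organs) only conveys the core idea.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_organs(organs_t0, organs_t1):
    """organs_t0/t1: lists of (M_i, 3) arrays, one per organ instance,
    taken after the point clouds have been registered.
    Returns (i, j) pairs matching instances of stage t0 to stage t1."""
    c0 = np.array([o.mean(axis=0) for o in organs_t0])   # organ centroids, t0
    c1 = np.array([o.mean(axis=0) for o in organs_t1])   # organ centroids, t1
    cost = np.linalg.norm(c0[:, None, :] - c1[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)  # minimize total centroid distance
    # Instances left unmatched correspond to newly emerged or withered organs.
    return list(zip(rows.tolist(), cols.tolist()))
```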

4.4.2. Organ Tracking Results

In the organ tracking experiment, we ran TrackPlant3D [50] on the Pepper-4D dataset. A total of 236 point clouds from Subset 1 were used for evaluation. Each point cloud contains temporally aligned organ instance labels, including the stem system and individual leaves (several examples are shown in Figure 3d). Considering the dense organ structure of pepper plants, each point cloud was down-sampled to 10,000 points using the Farthest Point Sampling (FPS) method to ensure uniform spatial resolution and to reduce point redundancy before computation. In the organ tracking process, randomly arranged organ instance labels were used as the segmentation input to TrackPlant3D, and the framework successfully established temporal correspondences among independently segmented organs in most cases, achieving accurate qualitative tracking results (Figure 11). Despite bending and the emergence of new leaves, most organ outputs of TrackPlant3D retain a unique and consistent color identity across the time-series sequence, clearly illustrating label consistency. The satisfactory tracking result of TrackPlant3D also underscores the Pepper-4D dataset's significance for reliable organ-level analysis and dynamic phenotyping research.

4.5. Generating Natural and Vivid 3D Plants

Natural and vivid 3D reconstructions of a plant/crop benefit plant physiology research, precision agriculture, plant phenotyping research, and even the game industry. The generation of 3D plants aims to accurately represent real plant morphology with coherent organ topology, realistic branching, and smooth geometric transitions among different organs. When serving the plant phenotyping purpose, 3D-generated plants not only support the development of data-driven growth models for physiological simulation and crop output prediction, but can also augment limited real datasets to improve the robustness of deep learning models. Mature methodologies in this domain, such as L-systems [56,57], have achieved realistic visualization of some crop species; however, generating a structurally coherent 3D plant remains challenging due to non-rigid deformation, self-occlusion, and the continuous evolution of hierarchical organ development during growth. Generative Adversarial Networks (GANs) [58] offer a promising solution by learning the underlying distribution of the training data to synthesize realistic samples. By generating additional data beyond the original dataset, GANs alleviate data scarcity through synthetic augmentation, enhancing model robustness and generalization in deep learning applications. Most existing GAN frameworks target two-dimensional images or generic geometric objects; synthesizing realistic 3D plant point clouds remains more difficult due to the intricate topology of stems, leaves, and branching architectures. To advance this direction, the Pepper-4D dataset provides a comprehensive resource for plant structure generation, and two representative GAN architectures—TreeGAN [59] and WarpingGAN [60]—were employed to evaluate their generative performance on realistic pepper point clouds.

4.5.1. Methodology

Generative Adversarial Networks (GANs) [58] are a class of deep generative models that learn the underlying distribution of training data to synthesize new, realistic samples through an adversarial learning process. A GAN typically consists of two adversarial components—a generator and a discriminator—that are jointly optimized in opposition to each other, as illustrated in Figure 12. The generator receives random noise as input and synthesizes realistic 3D point clouds that emulate authentic plant structures in the training dataset. The discriminator, in contrast, learns to distinguish real point clouds from the outputs of the generator, thereby driving the generator to progressively refine its outputs until they resemble real crop data. Based on our Pepper-4D dataset, two representative GAN architectures, TreeGAN [59] and WarpingGAN [60], were employed for 3D pepper plant generation. TreeGAN utilizes tree-structured graph convolutions that preserve structurally consistent point distributions and coherent organ geometries. WarpingGAN, on the other hand, incorporates a warping mechanism that deforms multiple uniform 3D priors into realistic plant shapes, achieving enhanced geometric coherence and structural uniformity across synthesized pepper samples.
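The adversarial training procedure common to both architectures can be summarized by the compact PyTorch sketch below, where simple fully connected networks stand in for the actual TreeGAN/WarpingGAN generator and discriminator (which are far more specialized); the latent size of 96 and the learning rates are illustrative assumptions.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(96, 512), nn.ReLU(), nn.Linear(512, 2048 * 3))
D = nn.Sequential(nn.Linear(2048 * 3, 512), nn.LeakyReLU(0.2), nn.Linear(512, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_clouds):                  # real_clouds: (B, 2048, 3)
    B = real_clouds.shape[0]
    fake = G(torch.randn(B, 96))              # generate from latent noise

    # Discriminator: push real clouds toward 1, generated clouds toward 0.
    opt_d.zero_grad()
    loss_d = bce(D(real_clouds.reshape(B, -1)), torch.ones(B, 1)) \
           + bce(D(fake.detach()), torch.zeros(B, 1))
    loss_d.backward()
    opt_d.step()

    # Generator: fool the discriminator into predicting 1 on generated clouds.
    opt_g.zero_grad()
    loss_g = bce(D(fake), torch.ones(B, 1))
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```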

4.5.2. Results of 3D Generation of Pepper Plants

In this subsection, we show the point cloud results from both TreeGAN and WarpingGAN on the Pepper-4D dataset and evaluate their respective generative performances. A total of 20 pepper plant samples were used for training, with each point cloud down-sampled to 2048 points using Farthest Point Sampling (FPS) and augmented 128-fold to ensure both uniform spatial resolution and robust adversarial training. As illustrated in Figure 13, the qualitative results demonstrate that both TreeGAN and WarpingGAN can generate plausible pepper plant morphologies that approximate the original samples while introducing diversity at the same time. It is also interesting to note that TreeGAN and WarpingGAN differ in detail. TreeGAN produces sharper organ-level geometries (such as leaf surfaces and branching hierarchies), owing to its tree-structured graph convolutions; however, several TreeGAN-generated outputs suffer from incomplete reconstruction, with sparse or missing leaf regions. WarpingGAN generates globally coherent plants but sometimes exhibits distorted or partially deformed organs, particularly in the upper canopy. Together, TreeGAN and WarpingGAN demonstrate the effectiveness of adversarial learning for generating complex 3D plant architectures, and they also reveal the potential of our Pepper-4D dataset to advance research on 3D-generated plant models.

5. Conclusions

This study introduces Pepper-4D, an open spatiotemporal (4D) point cloud dataset that captures the developmental cycle of pepper plants from early growth to senescence under controlled indoor conditions. The dataset consists of 916 high-precision point clouds collected from 29 plants and supports a wide range of 3D plant phenotyping tasks, including health classification, organ semantic segmentation, organ instance segmentation, new organ detection, organ-level growth tracking, and 3D virtual plant generation.
Beyond benchmarking 3D plant-understanding models, Pepper-4D provides practical value for dynamic phenotyping and controlled-environment cultivation research. It enables the quantitative analysis of stage-dependent architectural changes (e.g., organ emergence, canopy expansion, and senescence progression) and supports algorithm development for applications such as new growth detection, health/stress assessment over time, and organ-level trait extraction. By combining detailed 3D structural information with spatiotemporal annotations, Pepper-4D helps fill an important need for the accurate, scalable, time-resolved analysis of growth dynamics from 3D data.
Our future work will focus on enhancing dataset completeness and usability by extending organ-level annotations to the full dataset and incorporating additional labels for new organ detection and growth tracking analysis. In parallel, we will explore advanced learning frameworks for organ segmentation and spatiotemporal plant growth modeling that better leverage the temporal richness of Pepper-4D. We believe Pepper-4D will continue to evolve as a reliable and comprehensive 4D resource for plant phenotyping and precision agriculture.

Author Contributions

Conceptualization, D.L.; methodology, F.A. and D.L.; software, F.A. and B.Z.; validation, Z.W., J.H. (Jiali Huang), T.L., J.H. (Jingjing Huang), J.H. (Jiahui Hou), S.J. and H.Y.; formal analysis, H.Y.; resources, D.L.; data curation, Z.W., J.H. (Jiali Huang), T.L., J.H. (Jingjing Huang), J.H. (Jiahui Hou) and S.J.; writing—original draft preparation, F.A.; writing—review and editing, D.L.; visualization, F.A. and D.L.; supervision, D.L.; project administration, H.Y.; funding acquisition, D.L. and H.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Shanghai Sailing Program under Grant 24YF2701200, in part by the Fundamental Research Funds for the Central Universities under Grant 2232025D-50, and in part by the Donghua University 2025 Cultivation Project of Discipline Innovation under Grant xkcx-202505.

Data Availability Statement

Data and code can be found at https://github.com/foysalahmed10/Pepper-4D (accessed on 9 February 2026).

Acknowledgments

No GenAI has been used in the preparation of this work.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Silvar, C.; Rocha, F.; Barata, A.M. Tracing Back the History of Pepper (Capsicum annuum) in the Iberian Peninsula from a Phenomics Point of View. Plants 2022, 11, 3075. [Google Scholar] [CrossRef]
  2. Kondo, F.; Kumanomido, Y.; D’aNdrea, M.; Palombo, V.; Ahmed, N.; Futatsuyama, S.; Nemoto, K.; Matsushima, K. Prediction of fruit shapes in F1 progenies of chili peppers (Capsicum annuum) based on parental image data using elliptic Fourier analysis. Comput. Electron. Agric. 2025, 236, 110422. [Google Scholar] [CrossRef]
  3. da Veiga, V.F., Jr.; Wiedemann, L.S.M.; de Araujo, C.P., Jr.; da Silva Antonio, A. Chemistry and Nutritional Effects of Capsicum; Royal Society of Chemistry: London, UK, 2022. [Google Scholar]
  4. Palmer, N. Spicing Up Global Agriculture: WorldVeg Pepper Breeder Derek Barchenger Wins 2025 Borlaug Field Award. World Vegetable Center (WorldVeg). 2025. Available online: https://avrdc.org/spicing-up-global-agriculture-worldveg-pepper-breeder-derek-barchenger-wins-2025-borlaug-field-award/ (accessed on 9 February 2026).
  5. Ilie, M.A.; Caruntu, C.; Tampa, M.; Georgescu, S.-R.; Matei, C.; Negrei, C.; Ion, R.-M.; Constantin, C.; Neagu, M.; Boda, D. Capsaicin: Physicochemical properties, cutaneous reactions and potential applications in painful and inflammatory conditions. Exp. Ther. Med. 2019, 18, 916–925. [Google Scholar] [CrossRef]
  6. Ray, D.K.; West, P.C.; Clark, M.; Gerber, J.S.; Prishchepov, A.V.; Chatterjee, S. Climate change has likely already affected global food production. PLoS ONE 2019, 14, e0217148. [Google Scholar] [CrossRef]
  7. Intergovernmental Panel on Climate Change. Climate Change 2007: Impacts, Adaptation and Vulnerability; Intergovernmental Panel on Climate Change: Geneva, Switzerland, 2001. [Google Scholar]
  8. Godfray, H.C.J.; Beddington, J.R.; Crute, I.R.; Haddad, L.; Lawrence, D.; Muir, J.F.; Pretty, J.; Robinson, S.; Thomas, S.M.; Toulmin, C. Food security: The challenge of feeding 9 billion people. Science 2010, 327, 812–818. [Google Scholar] [CrossRef] [PubMed]
  9. Department of Economic. World Population Prospects 2024: Summary of Results; Stylus Publishing, LLC: Sterling, VA, USA, 2024. [Google Scholar]
  10. Li, D.; Ahmed, F.; Wu, N.; Sethi, A.I. Yolo-JD: A Deep Learning Network for jute diseases and pests detection from images. Plants 2022, 11, 937. [Google Scholar] [CrossRef] [PubMed]
  11. Liu, J.; Wang, X. A multimodal framework for pepper diseases and pests detection. Sci. Rep. 2024, 14, 28973. [Google Scholar] [CrossRef]
  12. Chen, H.; Zhang, R.; Peng, J.; Peng, H.; Hu, W.; Wang, Y.; Jiang, P. YOLO-chili: An efficient lightweight network model for localization of pepper picking in complex environments. Appl. Sci. 2024, 14, 5524. [Google Scholar] [CrossRef]
  13. Nan, Y.; Zhang, H.; Zeng, Y.; Zheng, J.; Ge, Y. Faster and accurate green pepper detection using NSGA-II-based pruned YOLOv5l in the field environment. Comput. Electron. Agric. 2023, 205, 107563. [Google Scholar] [CrossRef]
  14. Conn, A.; Pedmale, U.V.; Chory, J.; Navlakha, S. High-resolution laser scanning reveals plant architectures that reflect universal network design principles. Cell Syst. 2017, 5, 53–62. [Google Scholar] [CrossRef]
  15. Dutagaci, H.; Rasti, P.; Galopin, G.; Rousseau, D. ROSE-X: An annotated data set for evaluation of 3D plant organ segmentation methods. Plant Methods 2020, 16, 28. [Google Scholar] [CrossRef] [PubMed]
  16. Schunck, D.; Magistri, F.; Rosu, R.A.; Cornelißen, A.; Chebrolu, N.; Paulus, S.; Léon, J.; Behnke, S.; Stachniss, C.; Kuhlmann, H.; et al. Pheno4D: A spatio-temporal dataset of maize and tomato plant point clouds for phenotyping and advanced plant analysis. PLoS ONE 2021, 16, e0256340. [Google Scholar] [CrossRef]
  17. Sun, Y.; Zhang, Z.; Sun, K.; Li, S.; Yu, J.; Miao, L.; Zhang, Z.; Li, Y.; Zhao, H.; Hu, Z.; et al. Soybean-MVS: Annotated three-dimensional model dataset of whole growth period soybeans for 3D plant organ segmentation. Agriculture 2023, 13, 1321. [Google Scholar] [CrossRef]
  18. Hong, K.; Zhou, Y.; Han, H. The pipelines of deep learning-based plant image processing. Quant. Plant Biol. 2025, 6, e23. [Google Scholar] [CrossRef]
  19. Sosa-Herrera, J.A.; Alvarez-Jarquin, N.; Cid-Garcia, N.M.; López-Araujo, D.J.; Vallejo-Pérez, M.R. Automated health estimation of Capsicum annuum L. crops by means of deep learning and RGB aerial images. Remote Sens. 2022, 14, 4943. [Google Scholar] [CrossRef]
  20. Yin, H.; Gu, Y.H.; Park, C.J.; Park, J.H.; Yoo, S.J. Transfer learning-based search model for hot pepper diseases and pests. Agriculture 2020, 10, 439. [Google Scholar] [CrossRef]
  21. Barth, R.; IJsselmuiden, J.; Hemming, J.; Van Henten, E.J. Data synthesis methods for semantic segmentation in agriculture: A Capsicum annuum dataset. Comput. Electron. Agric. 2018, 144, 284–296. [Google Scholar] [CrossRef]
  22. Pallewatta, P.; Halloluwa, T.; Karunanayaka, K.; Seneviratne, G.; Arachchi, S.M. BellCrop–A Bell Pepper Leaf Dataset for Disease Classification and Yield Enhancement using Machine Learning. In Proceedings of the IECON 2024-50th Annual Conference of the IEEE Industrial Electronics Society, Chicago, IL, USA, 3–6 November 2024; pp. 1–7. [Google Scholar]
  23. Bezabh, Y.A.; Salau, A.O.; Abuhayi, B.M.; Mussa, A.A.; Ayalew, A.M. CPD-CCNN: Classification of pepper disease using a concatenation of convolutional neural network models. Sci. Rep. 2023, 13, 15581. [Google Scholar] [CrossRef] [PubMed]
  24. Ren, R.; Zhang, S.; Sun, H.; Gao, T. Research on pepper external quality detection based on transfer learning integrated with convolutional neural network. Sensors 2021, 21, 5305. [Google Scholar] [CrossRef]
  25. López-Barrios, J.D.; Escobedo Cabello, J.A.; Gómez-Espinosa, A.; Montoya-Cavero, L.E. Green sweet pepper fruit and peduncle detection using mask R-CNN in greenhouses. Appl. Sci. 2023, 13, 6296. [Google Scholar] [CrossRef]
  26. Li, X.; Pan, J.; Xie, F.; Zeng, J.; Li, Q.; Huang, X.; Liu, D.; Wang, X. Fast and accurate green pepper detection in complex backgrounds via an improved Yolov4-tiny model. Comput. Electron. Agric. 2021, 191, 106503. [Google Scholar] [CrossRef]
  27. Zheng, X.; Shao, Z.; Chen, Y.; Zeng, H.; Chen, J. MSPB-YOLO: High-Precision Detection Algorithm of Multi-Site Pepper Blight Disease Based on Improved YOLOv8. Agronomy 2025, 15, 839. [Google Scholar] [CrossRef]
  28. Kapetas, D.; Kalogeropoulou, E.; Christakakis, P.; Klaridopoulos, C.; Pechlivani, E.M. Comparative Evaluation of AI-Based Multi-Spectral Imaging and PCR-Based Assays for Early Detection of Botrytis cinerea Infection on Pepper Plants. Agriculture 2025, 15, 164. [Google Scholar] [CrossRef]
  29. Yikai, Q.; Yang, Z.; Longbin, X. An automated skeleton extraction method for 3D point-cloud phenotyping of Schima superba seedlings. PLoS ONE 2025, 20, e0329715. [Google Scholar] [CrossRef]
  30. Li, D.; Zhou, Z.; Wei, Y. Unsupervised shape-aware SOM down-sampling for plant point clouds. ISPRS J. Photogramm. Remote Sens. 2024, 211, 172–207. [Google Scholar] [CrossRef]
  31. Li, D.; Huang, J.; Zhao, B.; Wen, W. Organ3DNet: A deep network for segmenting organ semantics and instances from dense plant point clouds. Artif. Intell. Agric. 2025, 16, 342–364. [Google Scholar] [CrossRef]
  32. Li, D.; Ahmed, F.; Wang, Z. 3D-NOD: 3D new organ detection in plant growth by a spatiotemporal point cloud deep segmentation framework. Plant Phenomics 2025, 7, 100002. [Google Scholar] [CrossRef]
  33. Li, D.; Shi, G.; Li, J.; Chen, Y.; Zhang, S.; Xiang, S.; Jin, S. PlantNet: A dual-function point cloud segmentation network for multiple plant species. ISPRS J. Photogramm. Remote Sens. 2022, 184, 243–263. [Google Scholar] [CrossRef]
  34. Li, D.; Li, J.; Xiang, S.; Pan, A. PSegNet: Simultaneous semantic and instance segmentation for point clouds of plants. Plant Phenomics 2022, 2022, 9787643. [Google Scholar] [CrossRef] [PubMed]
  35. Mertoğlu, K.; Şalk, Y.; Sarıkaya, S.K.; Turgut, K.; Evrenesoğlu, Y.; Çevikalp, H.; Gerek, Ö.N.; Dutağacı, H.; Rousseau, D. PLANesT-3D: A new annotated dataset for segmentation of 3D plant point clouds. arXiv 2024, arXiv:2407.21150. [Google Scholar]
  36. Marks, E.; Bömer, J.; Magistri, F.; Sah, A.; Behley, J.; Stachniss, C. BonnBeetClouds3D: A dataset towards point cloud-based organ-level phenotyping of sugar beet plants under field conditions. arXiv 2023, arXiv:2312.14706. [Google Scholar]
  37. Zhu, J.; Zhai, R.; Ren, H.; Xie, K.; Du, A.; He, X.; Cui, C.; Wang, Y.; Ye, J.; Wang, J.; et al. Crops3D: A diverse 3D crop dataset for realistic perception and segmentation toward agricultural applications. Sci. Data 2024, 11, 1438. [Google Scholar] [CrossRef]
  38. Kimara, E.; Hadadi, M.; Godbersen, J.; Balu, A.; Jubery, T.; Li, Y.; Krishnamurthy, A.; Schnable, P.S.; Ganapathysubramanian, B. AgriField3D: A Curated 3D Point Cloud and Procedural Model Dataset of Field-Grown Maize from a Diversity Panel. arXiv 2025, arXiv:2503.07813. [Google Scholar]
  39. Ruiz-Gonzalez, R.; Nascimento, A.M.M.D.; Santos, M.B.d.C.; Porto, R.K.S.d.B.; Medeiros, A.M.; dos Santos, F.S.; Martínez-Martínez, V.; Barroso, P.A. Temporal forecasting of plant height and canopy diameter from RGB images using a CNN-based regression model for ornamental pepper plants (Capsicum spp.) growing under high-temperature stress. Neural Comput. Appl. 2025, 37, 22107–22128. [Google Scholar] [CrossRef]
  40. do Nascimento, A.M.M.; Ruiz-Gonzalez, R.; Martínez-Martínez, V.; Medeiros, A.M.; Santos, F.S.D.; Rêgo, E.R.D.; Pimenta, S.; Sudré, C.P.; Bento, C.d.S.; Cambra, C.; et al. Ornamental potential classification and prediction for pepper plants (Capsicum spp.): A comparison using morphological measurements and RGB images as data source. Appl. Sci. 2025, 15, 7801. [Google Scholar] [CrossRef]
  41. Zhang, Y.; Xie, Y.; Zhou, J.; Xu, X.; Miao, M. Cucumber seedling segmentation network based on a multiview geometric graph encoder from 3D point clouds. Plant Phenomics 2024, 6, 0254. [Google Scholar] [CrossRef]
  42. Cui, D.; Liu, P.; Liu, Y.; Zhao, Z.; Feng, J. Automated Phenotypic Analysis of Mature Soybean Using Multi-View Stereo 3D Reconstruction and Point Cloud Segmentation. Agriculture 2025, 15, 175. [Google Scholar] [CrossRef]
  43. Wu, S.; Wen, W.; Xiao, B.; Guo, X.; Du, J.; Wang, C.; Wang, Y. An accurate skeleton extraction approach from 3D point clouds of maize plants. Front. Plant Sci. 2019, 10, 248. [Google Scholar] [CrossRef] [PubMed]
  44. Xie, K.; Cui, C.; Jiang, X.; Zhu, J.; Liu, J.; Du, A.; Yang, W.; Song, P.; Zhai, R. Automated 3D Segmentation of Plant Organs via the Plant-MAE: A Self-Supervised Learning Framework. Plant Phenomics 2025, 7, 100049. [Google Scholar] [CrossRef] [PubMed]
  45. Dong, S.; Fan, X.; Li, X.; Liang, Y.; Zhang, M.; Yao, W.; Yang, X.; Wang, Z. Automatic 3D Plant Organ Instance Segmentation Method Based on PointNeXt and Quickshift++. Plant Phenomics 2025, 7, 100065. [Google Scholar] [CrossRef] [PubMed]
  46. Gené-Mola, J.; Sanz-Cortiella, R.; Rosell-Polo, J.R.; Morros, J.R.; Ruiz-Hidalgo, J.; Vilaplana, V.; Gregorio, E. Fruit detection and 3D location using instance segmentation neural networks and structure-from-motion photogrammetry. Comput. Electron. Agric. 2020, 169, 105165. [Google Scholar] [CrossRef]
  47. Liu, T.; Wang, X.; Hu, K.; Zhou, H.; Kang, H.; Chen, C. FF3D: A rapid and accurate 3D fruit detector for robotic harvesting. Sensors 2024, 24, 3858. [Google Scholar] [CrossRef] [PubMed]
  48. Li, Y.; Fan, X.; Mitra, N.J.; Chamovitz, D.; Cohen-Or, D.; Chen, B. Analyzing growing plants from 4D point cloud data. ACM Trans. Graph. (TOG) 2013, 32, 1–10. [Google Scholar] [CrossRef]
  49. Magistri, F.; Chebrolu, N.; Stachniss, C. Segmentation-based 4D registration of plants point clouds for phenotyping. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Virtual, 25–29 October 2020; pp. 2433–2439. [Google Scholar]
  50. Li, D.; Liu, L.; Xu, S.; Jin, S. TrackPlant3D: 3D organ growth tracking framework for organ-level dynamic phenotyping. Comput. Electron. Agric. 2024, 226, 109435. [Google Scholar] [CrossRef]
  51. Tancik, M.; Weber, E.; Ng, E.; Li, R.; Yi, B.; Wang, T.; Kristoffersen, A.; Austin, J.; Salahi, K.; Ahuja, A.; et al. Nerfstudio: A modular framework for neural radiance field development. In Proceedings of the ACM SIGGRAPH 2023 Conference Proceedings, Los Angeles, CA, USA, 6–10 August 2023; pp. 1–12. [Google Scholar]
  52. CloudCompare Development Team. CloudCompare (Version 2.x) [GPL Software]. 2025. Available online: https://www.cloudcompare.org/ (accessed on 9 February 2026).
  53. Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 652–660. [Google Scholar]
  54. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Adv. Neural Inf. Process. Syst. 2017, 30, 5105–5114. [Google Scholar]
  55. Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic graph cnn for learning on point clouds. ACM Trans. Graph. (TOG) 2019, 38, 1–12. [Google Scholar] [CrossRef]
  56. Ghrer, S.; Godin, C.; Wuhrer, S. Learning to Infer Parameterized Representations of Plants from 3D Scans. arXiv 2025, arXiv:2505.22337. [Google Scholar] [CrossRef]
  57. Prusinkiewicz, P.; Shirmohammadi, M.; Samavati, F. L-systems in geometric modeling. Int. J. Found. Comput. Sci. 2012, 23, 133–146. [Google Scholar] [CrossRef]
  58. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
  59. Shu, D.W.; Park, S.W.; Kwon, J. 3d point cloud generative adversarial network based on tree structured graph convolutions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3859–3868. [Google Scholar]
  60. Tang, Y.; Qian, Y.; Zhang, Q.; Zeng, Y.; Hou, J.; Zhe, X. WarpingGAN: Warping multiple uniform priors for adversarial 3D point cloud generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 6397–6405. [Google Scholar]
Figure 1. Representative pepper samples from the three subsets of the Pepper-4D dataset. (a) Several mature pepper plants at the fruiting stage in Subset 1; (b) pepper samples from Subset 2 before the geotropism test; and (c) potted samples from Subset 3.
Figure 2. Workflow of image acquisition and 3D reconstruction for the Pepper-4D dataset. (a) Image acquisition and preprocessing; (b) 3D reconstruction with the NeRFacto framework; and (c) preparation of the plant-only point cloud by removing the pot and filtering the output of the Point Cloud Exporter.
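For illustration, the pot-removal and filtering step in (c) can be approximated in a few lines of Open3D. This is a minimal sketch assuming a z-up cloud, a hypothetical file name, and a scene-specific height threshold; it is not the exact procedure used to build Pepper-4D.

```python
# Minimal post-processing sketch, assuming Open3D is available and the pot
# sits below the plant along the z-axis. File names and the pot_rim_z
# threshold are hypothetical and must be tuned per scene.
import numpy as np
import open3d as o3d

pcd = o3d.io.read_point_cloud("pepper_nerf_export.ply")  # hypothetical file

# Drop points at or below an assumed pot-rim height.
pts = np.asarray(pcd.points)
pot_rim_z = 0.0  # hypothetical threshold
plant = pcd.select_by_index(np.where(pts[:, 2] > pot_rim_z)[0])

# Statistical outlier removal cleans floating reconstruction noise.
plant, _ = plant.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)
o3d.io.write_point_cloud("pepper_plant_only.ply", plant)
```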
Figure 3. Illustration of the annotation of a pepper point cloud sequence from Subset 1 of the Pepper-4D dataset. From top to bottom: row (a) shows the original point clouds reconstructed across consecutive growth stages of the same sequence; row (b) shows the manually annotated organ instance labels (individual leaf instances and the stem) in different colors; row (c) shows the manually annotated organ semantic labels (stem class and leaf class); row (d) shows temporally aligned organ instance indices for organ tracking; and row (e) shows the plant-level health status labels, ranging from the normal (green) stages to the withering (yellow) stages. The sampling interval is 2 days.
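As an illustration of how such per-point annotations are typically consumed, the sketch below loads a labeled cloud under an assumed "x y z semantic instance" column layout; the actual Pepper-4D file format and label coding may differ, so treat every column index and file name here as hypothetical.

```python
# Loading sketch under an ASSUMED whitespace-separated layout of
# "x y z semantic_label instance_label"; columns are hypothetical.
import numpy as np

data = np.loadtxt("pepper_stage_01.txt")     # hypothetical file name
xyz = data[:, :3]                            # point coordinates
sem = data[:, 3].astype(int)                 # e.g., 0 = stem, 1 = leaf (assumed coding)
inst = data[:, 4].astype(int)                # per-organ instance index

leaf_count = len(np.unique(inst[sem == 1]))  # number of annotated leaf instances
print(f"{xyz.shape[0]} points, {leaf_count} leaf instances")
```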
Figure 4. Representative annotations of a pepper point cloud sequence from Subset 2 of the Pepper-4D dataset. Row (a) shows the original point clouds of the sequence, in which the stages containing the geotropism event are labeled "Geotropism". Row (b) presents annotations of the organ-level growth event (new organ) obtained with the Backwards and Forward Labeling (BFL) mechanism proposed in [32].
Figure 5. Architectures of tested networks for health assessment. (a) The PointNet++ architecture for pepper point cloud health classification; and (b) the DGCNN architecture for pepper point cloud health classification.
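To make the classification setup concrete, the following is a heavily simplified PointNet-style classifier in PyTorch (a shared per-point MLP, global max pooling, and a fully connected head). It is only a sketch of the general paradigm, not the authors' PointNet++ or DGCNN implementation.

```python
# Simplified PointNet-style binary classifier (normal vs. withering).
import torch
import torch.nn as nn

class TinyPointClassifier(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        # Per-point shared MLP implemented with 1x1 convolutions.
        self.features = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Linear(1024, 256), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):               # x: (batch, 3, num_points)
        f = self.features(x)            # (batch, 1024, num_points)
        g = torch.max(f, dim=2).values  # order-invariant global feature
        return self.head(g)             # (batch, num_classes) logits

logits = TinyPointClassifier()(torch.randn(4, 3, 2048))  # toy forward pass
```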
Figure 6. Comparative qualitative results of health assessment on Pepper-4D. The figure illustrates representative examples from the testing set, covering normal and withering pepper plants. The Ground Truth (GT) labels are shown in the 1st row; the results of PointNet, PointNet++, and DGCNN are listed in the 2nd, 3rd, and 4th rows, respectively. Plants classified as normal (healthy) are shown in green, and those classified as withering (unhealthy) in olive green. Misclassified plants are highlighted with dotted red circles. Across all shown samples, PointNet++ gives the best predictions.
Figure 7. Overall segmentation pipelines of (a) PlantNet and (b) PSegNet on the Pepper-4D dataset. Both networks employ dual-pathway architectures that simultaneously perform organ semantic segmentation (stem/leaf) and organ instance segmentation (individual leaf separation).
Figure 8. Qualitative results of organ semantic segmentation by the two networks on the Pepper-4D dataset. The figure illustrates representative samples from the testing set, covering plants with varying structural complexity and canopy density. The GTs are shown in the first row, while the results of PlantNet and PSegNet are presented in the second and third rows, respectively. Points belonging to leaves are shown in pink, and stem points in green. Several regions are enlarged for detailed comparison.
Figure 9. Comparative qualitative results of instance segmentation by the two networks on the Pepper-4D dataset. The GT labels are shown in the first row, while the predictions of PlantNet and PSegNet are displayed in the second and third rows, respectively. Each leaf instance is rendered in a distinct color for visual clarity. Misclassified or incomplete regions are highlighted with red dotted circles.
Figure 10. Visualization of two testing sequences in Subset 2 of Pepper-4D. Each sequence is tested by 3D-NOD for new organ detection, and the results are contrasted with GTs. For better visual effect, some areas containing small buds and leaves are enlarged. The points of new organs are rendered in purple, and the points of old organs are rendered in green.
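For contrast with a learned detector such as 3D-NOD, a naive geometric baseline for new-organ detection can be sketched as follows: after registering two consecutive stage clouds, later-stage points that lie far from every earlier-stage point are flagged as new. The distance threshold below is an assumption and depends on the cloud's units.

```python
# Naive geometric new-organ baseline (NOT the 3D-NOD method).
import numpy as np
from scipy.spatial import cKDTree

def naive_new_points(earlier: np.ndarray, later: np.ndarray,
                     thresh: float = 0.01) -> np.ndarray:
    """earlier: (N, 3) and later: (M, 3) registered clouds; thresh is an
    assumed distance threshold. Returns a boolean mask over 'later'."""
    dists, _ = cKDTree(earlier).query(later)  # nearest-neighbor distances
    return dists > thresh                     # True = candidate new-organ point
```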
Figure 11. Qualitative results of organ tracking on Pepper-4D using the TrackPlant3D framework. The figure presents two representative pepper growth sequences with manually labeled GTs. In each sequence, the 1st row shows the GTs of organ tracking; the 2nd row shows the instance segmentations used as the input of TrackPlant3D (note that the segmented organs are not aligned along the timeline); and the 3rd row shows the result of TrackPlant3D. Only two regions are incorrectly tracked; they are highlighted with red dotted circles. In the instance segmentation results, different organ colors are used only for visual clarity, whereas in the organ tracking results, different organ colors represent different organ indices.
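To make the temporal index-alignment task concrete, a crude centroid-based baseline (not TrackPlant3D itself) can match organ instances across two stages with the Hungarian algorithm:

```python
# Centroid-based organ matching across two stages via optimal assignment;
# a crude stand-in for TrackPlant3D, shown only for illustration.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_organs(centroids_t: np.ndarray, centroids_t1: np.ndarray):
    """centroids_t: (K, 3) organ centroids at stage t; centroids_t1: (L, 3)
    at stage t+1. Returns (i, j) pairs linking stage-t organs to t+1 organs."""
    cost = np.linalg.norm(
        centroids_t[:, None, :] - centroids_t1[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)  # Hungarian algorithm
    return list(zip(rows, cols))
```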
Figure 12. A standard Generative Adversarial Network (GAN) framework for generation of 3D pepper plants. The framework consists of two adversarial components—a generator and a discriminator. The architecture has a training stage and an evaluation stage, and the final output is generated at the evaluation (testing) stage.
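The adversarial interplay in Figure 12 can be sketched with a generic PyTorch training loop; real TreeGAN and WarpingGAN generators and discriminators are far more elaborate than the toy MLPs assumed here.

```python
# Generic GAN training-step sketch for 3D point clouds (toy MLPs, not the
# TreeGAN/WarpingGAN architectures).
import torch
import torch.nn as nn

N_POINTS, Z_DIM = 2048, 96

gen = nn.Sequential(   # maps a latent code to N_POINTS xyz coordinates
    nn.Linear(Z_DIM, 256), nn.ReLU(),
    nn.Linear(256, N_POINTS * 3),
)
disc = nn.Sequential(  # scores a flattened cloud as real/fake
    nn.Linear(N_POINTS * 3, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
)
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_clouds: torch.Tensor):  # real_clouds: (B, N_POINTS, 3)
    b = real_clouds.size(0)
    real_flat = real_clouds.reshape(b, -1)
    fake_flat = gen(torch.randn(b, Z_DIM))
    # Discriminator: push real scores up, fake scores down.
    loss_d = bce(disc(real_flat), torch.ones(b, 1)) + \
             bce(disc(fake_flat.detach()), torch.zeros(b, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator: try to fool the discriminator.
    loss_g = bce(disc(fake_flat), torch.ones(b, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```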
Figure 13. Qualitative comparison of GAN-based 3D plant generation on the Pepper-4D dataset. The real samples (1st row) illustrate the natural structural variability of pepper plants across the Pepper-4D dataset. TreeGAN (2nd row) and WarpingGAN (3rd row) generate virtual pepper point clouds that capture both similarity to real morphological patterns and diversity across generated instances. The different colors in this figure are used only for better visual effect.
Table 1. Quantitative performance of PointNet, PointNet++, and DGCNN on the Pepper-4D health assessment (normal vs. withering). Metrics are reported as class-wise precision, recall, and F1-score (%), with mean values averaged across both classes. The “↑” sign means the higher the better, and bold indicates the best result in comparison.

Metric      | Network    | Normal | Withering | Mean
Prec (%) ↑  | PointNet   | 99.95  | 94.84     | 97.39
            | PointNet++ | 99.54  | 97.23     | 98.38
            | DGCNN      | 99.98  | 93.22     | 96.60
Rec (%) ↑   | PointNet   | 96.13  | 99.98     | 98.05
            | PointNet++ | 97.43  | 99.44     | 98.43
            | DGCNN      | 94.81  | 99.96     | 97.38
F1 (%) ↑    | PointNet   | 98.04  | 97.34     | 97.69
            | PointNet++ | 98.31  | 98.34     | 98.32
            | DGCNN      | 97.33  | 96.49     | 96.91
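For reference, the class-wise precision, recall, and F1 scores in Table 1 follow the standard definitions; a minimal sketch with toy label arrays (0 = normal, 1 = withering; the arrays themselves are hypothetical) is shown below.

```python
# Class-wise precision/recall/F1 and their class-averaged means.
import numpy as np

def prf(y_true: np.ndarray, y_pred: np.ndarray, cls: int):
    tp = np.sum((y_pred == cls) & (y_true == cls))
    fp = np.sum((y_pred == cls) & (y_true != cls))
    fn = np.sum((y_pred != cls) & (y_true == cls))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

y_true = np.array([0, 0, 1, 1, 1, 0])  # toy ground-truth labels
y_pred = np.array([0, 1, 1, 1, 0, 0])  # toy predictions
per_class = [prf(y_true, y_pred, c) for c in (0, 1)]
mean_prec, mean_rec, mean_f1 = np.mean(per_class, axis=0)
```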
Table 2. Quantitative performances of PlantNet and PSegNet on organ semantic (stem/leaf) segmentation in Pepper-4D Subset 1. The “↑” sign means the higher the better, and bold indicates the best result in comparison.

Metric      | Network  | Stem  | Leaf  | Mean
Prec (%) ↑  | PlantNet | 92.89 | 97.60 | 95.25
            | PSegNet  | 93.06 | 97.68 | 95.38
Rec (%) ↑   | PlantNet | 91.63 | 98.08 | 94.86
            | PSegNet  | 92.13 | 98.21 | 95.17
F1 (%) ↑    | PlantNet | 92.26 | 97.84 | 95.05
            | PSegNet  | 92.59 | 97.95 | 95.27
IoU (%) ↑   | PlantNet | 85.63 | 95.78 | 90.71
            | PSegNet  | 86.21 | 95.98 | 91.10
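The IoU values in Table 2 follow the standard point-wise definition for a semantic class c (true positives over the union of predicted and ground-truth points of that class), with the mean taken over the stem and leaf classes:

```latex
\mathrm{IoU}_c = \frac{TP_c}{TP_c + FP_c + FN_c},
\qquad
\mathrm{mIoU} = \tfrac{1}{2}\left(\mathrm{IoU}_{\text{stem}} + \mathrm{IoU}_{\text{leaf}}\right)
```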
Table 3. Quantitative performances of PlantNet and PSegNet for leaf instance segmentation on Pepper-4D. The “↑” sign means the higher the better, and bold indicates the best result in comparison.

Metric       | Network  | Leaf Instance Segmentation
mPrec (%) ↑  | PlantNet | 73.86
             | PSegNet  | 76.58
mRec (%) ↑   | PlantNet | 87.48
             | PSegNet  | 86.34
mCov (%) ↑   | PlantNet | 81.62
             | PSegNet  | 82.57
mWCov (%) ↑  | PlantNet | 85.79
             | PSegNet  | 86.37
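The coverage metrics in Table 3 follow the convention common in the point cloud instance segmentation literature, which we assume applies here: mCov averages, over ground-truth instances, the best IoU with any predicted instance, while mWCov weights each ground-truth instance by its point count. A minimal sketch, assuming per-point instance-id arrays:

```python
# Instance coverage (mCov) and size-weighted coverage (mWCov) sketch.
import numpy as np

def coverage(gt_inst: np.ndarray, pred_inst: np.ndarray):
    """gt_inst / pred_inst: per-point instance ids of equal length."""
    gt_ids, pred_ids = np.unique(gt_inst), np.unique(pred_inst)
    covs, sizes = [], []
    for g in gt_ids:
        g_mask = gt_inst == g
        best = 0.0
        for p in pred_ids:  # best-overlapping predicted instance
            p_mask = pred_inst == p
            inter = np.sum(g_mask & p_mask)
            union = np.sum(g_mask | p_mask)
            best = max(best, inter / union if union else 0.0)
        covs.append(best)
        sizes.append(g_mask.sum())
    covs, sizes = np.array(covs), np.array(sizes)
    mcov = covs.mean()                          # unweighted mean coverage
    mwcov = np.sum(covs * sizes / sizes.sum())  # weighted by instance size
    return mcov, mwcov
```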
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
