1. Introduction
In recent years, research on plant phenotyping using artificial intelligence has progressed significantly in large-scale greenhouse agriculture [1]. Among various crops, tomatoes are particularly important for large-scale greenhouse horticulture. Numerous studies have focused on facility-grown tomatoes, including automated detection of tomato fruits, leaf area estimation, and yield prediction [2,3,4]. In these studies, object detection and instance segmentation techniques played a vital role in automating the identification of tomato fruits, thereby contributing to labor cost reduction and improved operational efficiency [5,6,7].
However, accurately detecting tomato fruits and estimating their size and color remains a significant challenge in greenhouse environments, where a large number of plants are cultivated in dense, multi-row arrangements. This difficulty is exacerbated in the case of cherry tomatoes, which require high-precision size estimation. Although previous research has successfully applied image-based methods to yield estimation, the effectiveness of image-based assessment decreases when data collection is limited or the viewing angle of the object is unsuitable. Despite advances in tomato phenotypic analysis, approaches that remain reliable under these conditions have been underexplored.
To address these challenges, the proposed method detects individual cherry tomato fruits in daily video recordings captured in a greenhouse, extracts size and color information, and analyzes their growth dynamics over time. For precise size estimation, the method reconstructs a 3D point cloud for each fruit detected by instance segmentation with the YOLOv8x-seg model [8,9]. The 3D reconstruction pipeline integrates Structure-from-Motion (SfM), Multi-View Stereo (MVS), and the Nerfacto framework to generate detailed point clouds, which are then refined using ellipsoid-fitting techniques [10,11].
This research focuses on developing an integrated 3D growth tracking framework that combines instance segmentation with point cloud visualization. The framework enables accurate fruit size estimation through a shape-fitting approach and facilitates the modeling of growth dynamics over time using logistic curve fitting, thus providing a comprehensive strategy for quantitative phenological analysis.
The proposed approach enables automated and quantitative growth assessment, contributing to sustainable greenhouse agriculture by reducing manual labor, increasing resource efficiency, and supporting data-driven crop management approaches that align with sustainable food production goals.
Growth estimation is performed based on morphological and color features. Temporal changes in fruit size and color from fruit set to harvest are modeled using logistic growth curves, enabling the objective estimation of current growth stages and future harvest timing [12].
The effectiveness of the proposed method is validated using real-world data collected from operational greenhouse farms.
The remainder of this paper is structured as follows. Section 2 reviews related studies to contextualize the background of this research. Section 3 details the proposed methodology for tomato phenotyping and the estimation of ripeness and size. Section 4 presents the experimental results obtained using the proposed approach, and Section 5 discusses these results in detail. Finally, Section 6 concludes the paper by summarizing the main findings.
3. Materials and Methods
This section presents the methods used in this study, including details of the data and study area, the tomato detection method, 3D reconstruction, feature extraction for estimating tomato size and ripeness, and the analysis used to estimate tomato growth.
3.1. Area of Study and Description of Data
This study focused on agricultural technology, specifically targeting cherry tomatoes, which are small tomato cultivars. The monitoring period spans approximately 30–40 days, from the initial fruiting stage to full maturity, depending on the seasonal conditions.
The data used in this study were obtained from video recordings collected at Asai Nursery, located in Tsu, Mie Prefecture, Japan (longitude 136°28′16.8″ E, latitude 34°47′07.9″ N). The tomatoes were cultivated in a greenhouse, and recordings were captured using a standard mobile phone camera (Lenovo Tab M9, 8-megapixel resolution). Videos were recorded in MP4 format at a resolution of 1920 × 1080 pixels. In this study, the camera was not fixed or calibrated for daily use, resulting in slight variations in camera position and angle across different days, with a capture distance of up to approximately 60 cm. To avoid disturbing plant growth, videos were taken from an approximate 180° angle, rather than a full 360° view. Consequently, this study required the application of additional techniques to compensate for incomplete viewpoints before performing accurate size estimation.
3.2. Approach Overview
The overall approach of this research involved estimating tomato growth patterns through the extraction of key features such as size and ripeness. This methodology combines 3D modeling for accurate measurements with mathematical modeling to represent the growth dynamics of tomato plants and fruits. By integrating computer vision and mathematical approaches, this study aims to provide a comprehensive framework for the nondestructive monitoring of tomato development in controlled-environment agriculture.
The process begins with Step 1, data preparation, which includes converting raw video data into individual frames, followed by annotation and preprocessing to ensure high-quality input for model training. The prepared data were then used to train an instance segmentation model in Step 2 for accurate tomato detection, enabling the identification and localization of individual fruits across different growth stages. Once tomatoes were detected, the process branched into two parallel paths to estimate their size and ripeness.
For size estimation, after detection, the workflow proceeds to Step 3, where a 3D reconstruction model is created to generate a point cloud representation of the tomatoes. This step ensured that the fine morphological details were preserved for reliable measurements. The reconstruction was followed by shape fitting to obtain a complete 3D tomato model, after which the actual size of the tomatoes, such as diameter and volume, was estimated with high precision.
Simultaneously, for ripeness estimation, the process continues to Step 4, which evaluates the ripeness of each detected tomato using the color features extracted from the CIELAB (L*a*b*) color space. This approach enables the quantitative assessment of maturity levels, which are closely associated with fruit quality and harvest readiness.
Finally, once the size and ripeness estimations were complete, Step 5 involved the construction of a logistic growth model to describe the relationship between tomato size and ripeness over time using a mathematical formulation. This integration provides a dynamic representation of phenological changes, offering valuable insights into the growth trajectory and the maturation process.
An overview of the methodology is illustrated in Figure 1, and detailed descriptions of each step are provided in the following sections.
As illustrated in Figure 1, Step 1 corresponds to the input stage in which raw data are prepared for subsequent analysis. Step 2 involves tomato detection using an instance segmentation model; further details are provided in Section 3.3. Step 3 focuses on size estimation through 3D reconstruction and shape fitting, as described in Sections 3.4, 3.5, and 3.6. In parallel, Step 4 addresses ripeness estimation based on color analysis, as discussed in Section 3.7. Finally, Step 5 presents the construction of a logistic growth model that integrates the outcomes of the size and ripeness estimations, as explained in Section 3.8. Together, these steps establish a unified framework that enables accurate phenotypic measurements and dynamic growth modeling of tomatoes, forming the basis for subsequent experimental evaluation.
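As an illustration of Step 1 (converting raw video into individual frames), the following is a minimal sketch rather than the procedure actually used in this study. It assumes OpenCV is available and that one frame is kept every half second; the sampling interval, the function names, and the output naming scheme are all hypothetical.

```python
from pathlib import Path


def frame_indices(total_frames: int, fps: float, every_s: float = 0.5) -> list[int]:
    """Indices of the frames kept when sampling one frame every `every_s` seconds."""
    step = max(1, round(fps * every_s))  # never step by less than one frame
    return list(range(0, total_frames, step))


def extract_frames(video_path: str, out_dir: str, every_s: float = 0.5) -> int:
    """Write the sampled frames of a video file to `out_dir` as PNG images."""
    import cv2  # OpenCV, imported lazily so frame_indices works without it

    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    keep = set(frame_indices(total, fps, every_s))
    Path(out_dir).mkdir(parents=True, exist_ok=True)

    saved, index = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of the video (or a read error)
            break
        if index in keep:
            cv2.imwrite(str(Path(out_dir) / f"frame_{index:06d}.png"), frame)
            saved += 1
        index += 1
    cap.release()
    return saved
```

A denser sampling gives SfM more overlapping views at the cost of longer reconstruction times, so the interval would need to be tuned to the walking speed of the operator.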
3.3. Instance Segmentation and Color Overlay
In this study, an instance segmentation model based on YOLOv8x-seg was trained to identify tomatoes in video frames and to delineate each detected tomato. YOLOv8 is a deep learning object detection model in the YOLO series, released in 2023 by Ultralytics. It supports instance segmentation and is available in five variants (n, s, m, l, and x), each with a different internal structure and performance. Yue et al. [9] compared the performance of the YOLOv8 variants on tasks such as the segmentation of healthy and diseased tomato plants. The best-performing variant, YOLOv8x-seg, achieved a mean average precision at an IoU threshold of 0.5 (mAP@0.5) of 90.7%; however, it also has the largest model size. In this study, YOLOv8x-seg was selected because tracking performance was prioritized over the slightly larger model size. To improve model generalization, basic data augmentation strategies were applied during training, including vertical and horizontal flipping, 90° rotations clockwise and counterclockwise, random rotations within ±45°, and shear transformations within ±15°. Examples of these augmentations are shown in Figure 2.
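The geometric augmentations must be applied identically to an image and its segmentation mask so that the two stay aligned. The sketch below illustrates this for the flips and 90° rotations only, using NumPy; it is an assumed, simplified illustration (the function name is hypothetical), and the random rotations and shear listed above are omitted because they require interpolation.

```python
import numpy as np


def flip_and_rotate(image: np.ndarray, mask: np.ndarray) -> dict:
    """Return flipped and 90-degree-rotated copies of an image and its mask."""
    ops = {
        "vertical_flip": np.flipud,
        "horizontal_flip": np.fliplr,
        "rot90_ccw": lambda a: np.rot90(a, k=1),   # counterclockwise
        "rot90_cw": lambda a: np.rot90(a, k=-1),   # clockwise
    }
    # The same operation is applied to both arrays so the mask still
    # outlines the same fruit after the transformation.
    return {name: (op(image), op(mask)) for name, op in ops.items()}
```

For example, a 1080 × 1920 frame becomes 1920 × 1080 after either 90° rotation, and the mask changes shape in the same way.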
The YOLOv8x-seg model was trained in the computing environment described in Section 4, using a batch size of 16, 10 epochs, and a learning rate of 0.01. All input images were resized to 640 × 640 pixels. The hyperparameter settings used for training are summarized in Table 1.
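With the Ultralytics Python package, a training run with these settings could look like the sketch below. Only the four hyperparameter values come from the text; the dataset file name `tomato.yaml` and the function name are hypothetical, and the remaining Ultralytics options are assumed to be left at their defaults.

```python
# Hyperparameters stated in the text: 10 epochs, batch size 16,
# 640 x 640 input images, initial learning rate 0.01.
TRAIN_ARGS = {"epochs": 10, "batch": 16, "imgsz": 640, "lr0": 0.01}


def train_segmenter(data_yaml: str = "tomato.yaml"):
    """Fine-tune YOLOv8x-seg; `tomato.yaml` is a hypothetical dataset file."""
    from ultralytics import YOLO  # third-party package, imported lazily

    model = YOLO("yolov8x-seg.pt")  # pretrained segmentation weights
    return model.train(data=data_yaml, **TRAIN_ARGS)
```

Calling `train_segmenter()` would start training, provided the dataset file describes the image folders and class names in the Ultralytics format.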
Color masking refers to the process of highlighting important objects identified through object detection before generating 3D images. This step is crucial because various factors in 3D image generation can distort the results. One significant factor is the color of the image. When an image is converted to 3D, many point clusters are created, each retaining its original color. This can cause background colors or unrelated objects to blend with the object of interest, thereby distorting the final 3D shape. For example, a green tomato surrounded by green leaves and branches, when converted into a cluster of 3D points without distinguishing colors, may result in overlapping green points, making accurate analysis impossible [26,27,28].
Color masking also simplifies point grouping. The process begins by converting the RGB image into grayscale. Subsequently, a unique color is assigned to each detected tomato so that overlapping or closely positioned tomatoes are not confused. During the color masking process, the color values are converted from RGB to HSV, where H represents the hue, S represents the saturation, and V represents the value (or brightness). The advantage of using HSV is that the H value alone determines the color, so it can be easily adjusted to give each tomato a different color. For example, with an 8-bit H channel (256 possible values), up to 256 different colors can be assigned to different tomatoes with S and V held constant.
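The idea of varying only the hue can be sketched with the Python standard library. This is an illustrative example under the assumption that hues are spaced evenly around the color wheel; the function name and the choice S = V = 1 are not taken from the study.

```python
import colorsys


def instance_colors(n: int, s: float = 1.0, v: float = 1.0) -> list[tuple[int, int, int]]:
    """Return n RGB colors (0-255) with evenly spaced hues and fixed S and V."""
    colors = []
    for i in range(n):
        # Hue is the only component that changes between instances.
        r, g, b = colorsys.hsv_to_rgb(i / n, s, v)
        colors.append((round(r * 255), round(g * 255), round(b * 255)))
    return colors
```

Each color would then be painted over the pixels of one instance mask on the grayscale frame, so that every reconstructed 3D point carries the color of the fruit it belongs to.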
3.4. 3D Point Clouds Generation
Cherry tomatoes have small dimensions, typically in millimeters (mm), making accurate size measurements highly dependent on image resolution. As discussed in Section 2, measurements based on 3D images generally provide a higher accuracy than those based on 2D images [18]. Furthermore, because the camera used in this study was not positioned at a fixed location, it was not possible to reliably calculate the focal length, making accurate size estimation from 2D images particularly challenging. Therefore, the generation of accurate 3D point clouds played a critical role in this research [16].
In 3D image generation, the primary inputs are camera position information and point cloud coordinates. During the input acquisition process, the SfM and MVS techniques were employed. SfM calculates a 3D structure from multiple photographs captured from different camera positions or viewpoints. SfM detects specific features in each image, such as corners or edges, and matches these features across different images. Geometric calculations are then used to estimate the camera positions and generate 3D points.
This process effectively compensates for the lack of a fixed camera setup, since SfM automatically estimates the relative camera poses from overlapping images, allowing accurate 3D reconstruction even under slight variations in angle and position. The output of SfM is a sparse point cloud that is primarily used to create the main structure of the image [18]. The fundamental principle of SfM is shown in Figure 3.
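The geometric step that turns matched features into 3D points can be illustrated with linear (DLT) triangulation from two views. The sketch below assumes the two 3 × 4 projection matrices are already known, whereas a full SfM system estimates them as well; it is a simplified illustration, not the COLMAP implementation.

```python
import numpy as np


def project(P: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Project a 3D point X with a 3x4 camera matrix P to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def triangulate(P1, P2, uv1, uv2) -> np.ndarray:
    """Linear (DLT) triangulation of one 3D point from two matched pixels."""
    rows = []
    for P, (u, v) in ((P1, uv1), (P2, uv2)):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]
```

With noise-free matches the triangulated point reproduces the original 3D position exactly; with real feature matches, SfM refines cameras and points jointly by bundle adjustment.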
Once the camera position information and main structure of the 3D image are obtained, the MVS technique is utilized to create a high-resolution 3D model based on multi-view photographs with known camera positions. This process results in a significantly higher-resolution image [29].
Subsequently, the data were used to train the Nerfacto model to produce high-quality 3D scenes. Nerfacto is a pipeline designed to generate NeRF models by integrating NeRF with instant-NGP, thereby improving the speed and efficiency of 3D scene reconstruction. The training process leverages neural networks to learn the radiance and point density values in 3D space, enabling the generation of photorealistic renderings [23,24].
This workflow utilizes COLMAP for SfM and MVS, and Nerfstudio to train Nerfacto, thus providing a user-friendly framework that enhances the efficiency of generating and rendering high-quality 3D models.
In summary, the 3D image generation process consists of the following steps. First, the SfM technique estimates the intrinsic and extrinsic camera parameters and generates a 3D image structure in the form of sparse point clusters. Then, the MVS technique increases the density of the point clusters to make the image more complete. Finally, the Nerfacto framework utilizes the output of the SfM–MVS stages to enhance texture and render a more comprehensive and realistic 3D image.
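The three stages summarized above map onto command-line tools roughly as sketched below. The exact commands and options used in this study are not stated, so this listing is an assumption based on the standard COLMAP and Nerfstudio command-line interfaces; the folder layout, the function name, and the `nerfstudio_data` directory (assumed to be already prepared in a format Nerfstudio can read) are hypothetical.

```python
from pathlib import Path


def reconstruction_commands(images: str, work: str) -> list[list[str]]:
    """Build the SfM, MVS, and Nerfacto commands for one video's frames."""
    w = Path(work)
    db = str(w / "database.db")
    sparse = str(w / "sparse")      # must exist before running the mapper
    dense = str(w / "dense")
    return [
        # SfM: feature detection, matching, and sparse reconstruction
        ["colmap", "feature_extractor", "--database_path", db, "--image_path", images],
        ["colmap", "exhaustive_matcher", "--database_path", db],
        ["colmap", "mapper", "--database_path", db, "--image_path", images,
         "--output_path", sparse],
        # MVS: undistortion, depth estimation, and fusion into a dense cloud
        ["colmap", "image_undistorter", "--image_path", images,
         "--input_path", str(w / "sparse" / "0"), "--output_path", dense],
        ["colmap", "patch_match_stereo", "--workspace_path", dense],
        ["colmap", "stereo_fusion", "--workspace_path", dense,
         "--output_path", str(w / "dense" / "fused.ply")],
        # Nerfacto training on a dataset folder readable by Nerfstudio
        ["ns-train", "nerfacto", "--data", str(w / "nerfstudio_data")],
    ]
```

Each command list can be passed to `subprocess.run(cmd, check=True)` in order; keeping them as data makes it easy to log or rerun a single stage.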
3.5. 3D Point Cloud Identification and Shape Fitting
Individual tomatoes were identified after generating 3D point clouds from the processed images. This was achieved using a combination of clustering techniques and geometric analyses. Specifically, the point clouds were segmented into distinct groups representing individual tomatoes. Several steps were followed to achieve this goal.
1. Point Cloud Clustering: Using density-based spatial clustering (DBSCAN or another suitable clustering algorithm), groups of points corresponding to individual tomatoes were identified. In addition to spatial clustering, the color values assigned during color masking were used to assist with grouping, allowing for finer differentiation between tomatoes. This helps to isolate tomatoes within a point cloud [21,22].
2. Ellipsoid fitting: The morphology of the cherry tomato closely resembles an ellipsoid rather than a perfect sphere, as its major and minor diameters differ slightly but consistently. Therefore, the ellipsoid model is suitable for fitting the fruit’s natural geometry. The approach helped fit occluded or partially reconstructed tomatoes. Even when point clusters were incomplete due to leaf or fruit overlap during SfM–MVS, the least-squares fitting process could approximate missing regions by inferring the most likely fruit geometry. This provided a more reliable estimate of fruit size and shape, even under partial visibility conditions. Each cluster was fitted to a 3D ellipsoid model using the least-squares ellipsoid-fitting technique. The ellipsoid is defined by three semi-principal axes (a, b, and c, representing the principal dimensions of the fruit) and a center point. The goal is to minimize the distance between the points in the cluster and the surface of the ellipsoid, providing an estimate of the shape and size of the tomato in 3D space.
The ellipsoid equation is expressed as:
$$\mathbf{x}^{T} S\, \mathbf{x} = d$$
where S is a 3 × 3 symmetric matrix defined as:
$$S = \begin{pmatrix} s_{11} & s_{12} & s_{13} \\ s_{12} & s_{22} & s_{23} \\ s_{13} & s_{23} & s_{33} \end{pmatrix}$$
The parameter d represents the scale factor (set to d = 3 in this case for a 3D ellipsoid). Least-squares optimization was employed to minimize the residuals of the equation x^T S x = d over all points x in the cloud. The eigenvalues and eigenvectors of S were used to calculate the principal axes and volume of the ellipsoid. These results provide insights into the geometry and growth characteristics of tomatoes.
The diameters of the ellipsoid along its axes were derived from the eigenvalues, and the volume of the ellipsoid was calculated using the equation:
$$V = \frac{4}{3}\pi a b c$$
where a, b, and c are the lengths of the semi-principal axes. This method enables accurate measurements of tomato dimensions and volume, supporting detailed growth analysis [19].
3. Visibility estimation: For clusters that do not fit well to an ellipsoid owing to occlusions or incomplete data, interpolation methods are used to fill in the missing parts. In some cases, manual refinement may be necessary to correct significant occlusions caused by leaves or overlapping fruits.
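The ellipsoid-fitting step can be sketched as a linear least-squares problem. The version below is an assumed, simplified formulation rather than the exact procedure of this study: it fits a general quadric with linear terms so that the center is estimated from the data (useful when only about half of the fruit is visible), then rewrites the result in the centered form x^T S x = d with d = 3 and derives the semi-axes from the eigenvalues of S. All function and variable names are illustrative.

```python
import numpy as np


def fit_ellipsoid(points: np.ndarray, d: float = 3.0):
    """Algebraic least-squares ellipsoid fit to an N x 3 point cluster.

    Returns (center, semi_axes, S, volume), where
    (x - center)^T S (x - center) = d on the fitted surface.
    """
    centroid = points.mean(axis=0)
    x, y, z = (points - centroid).T  # shift for numerical stability
    # General quadric q^T Q q + 2 g^T q = 1, linear in the 9 unknowns.
    D = np.column_stack([x * x, y * y, z * z,
                         2 * x * y, 2 * x * z, 2 * y * z,
                         2 * x, 2 * y, 2 * z])
    v, *_ = np.linalg.lstsq(D, np.ones(len(points)), rcond=None)
    Q = np.array([[v[0], v[3], v[4]],
                  [v[3], v[1], v[5]],
                  [v[4], v[5], v[2]]])
    g = v[6:9]
    c = -np.linalg.solve(Q, g)          # center in shifted coordinates
    k = 1.0 - g @ c                     # (q - c)^T Q (q - c) = k
    S = d * Q / k                       # rescale to the form x^T S x = d
    eigvals = np.linalg.eigvalsh(S)
    semi_axes = np.sort(np.sqrt(d / eigvals))[::-1]   # a >= b >= c
    volume = 4.0 / 3.0 * np.pi * np.prod(semi_axes)
    return c + centroid, semi_axes, S, volume
```

On noise-free points sampled from only the upper half of an ellipsoid, this fit recovers the full set of semi-axes, which is the property that makes shape fitting attractive for partially reconstructed fruits. With noisy or strongly occluded clusters the algebraic fit can become unstable, so the result should be checked (for example, that all eigenvalues of S are positive).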
Tomato growth rate estimation utilizes the morphological characteristics of tomato fruits [16], such as fruit volume, fruit surface area, vertical diameter, transverse diameter, fruit shape index, and fruit color, as follows:
Fruit volume: the volume of the convex hull of the generated 3D point cloud.
Fruit surface area: the total area of the surface meshes.
Vertical diameter: distance between the pedicel and opposite end of the tomato fruit.
Transverse diameter: diameter of the largest transverse section perpendicular to the vertical diameter.
Fruit shape index: the ratio between the vertical and transverse diameters.
Fruit color: the color range of tomatoes indicates the ripeness of the fruit.
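Two of these traits can be computed directly from a fruit's point cluster, as in the sketch below. It assumes SciPy is available and uses its convex hull routine for the volume; the function names are illustrative, and the diameters are assumed to have been measured beforehand.

```python
import numpy as np
from scipy.spatial import ConvexHull


def hull_volume(points: np.ndarray) -> float:
    """Fruit volume as the volume of the convex hull of an N x 3 point cloud."""
    return float(ConvexHull(points).volume)


def shape_index(vertical_diameter: float, transverse_diameter: float) -> float:
    """Fruit shape index: ratio of the vertical to the transverse diameter."""
    return vertical_diameter / transverse_diameter
```

A shape index below 1 indicates a fruit that is wider than it is tall; the hull volume is expressed in the cubed units of the point cloud, so it becomes a physical volume only after metric calibration.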
However, before estimating the growth rate, the accuracy of 3D data must be evaluated. In this study, the transverse diameter obtained from 3D images was compared with the actual size measured using calipers.
3.6. Size Estimation
Once the 3D reconstruction and identification of each fruit were completed, the next crucial step was to calibrate the 3D model to ensure accurate size estimation. Metric calibration is essential for aligning the distances and dimensions in a point cloud with real-world measurements [20]. For this purpose, a reference object was introduced during the data collection process: a ball approximately the size of a ping-pong ball (40 mm in diameter), which closely matches the average size of a real-world tomato. Calibrating the 3D model against this known reference object ensures that the scales and sizes of the reconstructed tomatoes are accurate. The fruit size was calculated using the following equation:
$$S_{\mathrm{tomato}} = L_{\mathrm{tomato3D}} \times \frac{40}{L_{\mathrm{ref3D}}}$$
where S_tomato is the estimated tomato size, L_tomato3D is the length of the tomato obtained from the 3D image, L_ref3D is the length of the reference ball obtained from the 3D image, and 40 is the actual diameter of the reference ball in millimeters. The ratio of the actual reference ball size to the 3D reference ball size is the calibration factor. Multiplying the 3D tomato size by this factor gives the size of the tomato in millimeters.
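The calibration amounts to a single rescaling, sketched below. The function name and the example values are hypothetical; only the 40 mm reference diameter comes from the text.

```python
REF_DIAMETER_MM = 40.0  # actual diameter of the reference ball


def to_millimeters(length_3d: float, ref_length_3d: float,
                   ref_mm: float = REF_DIAMETER_MM) -> float:
    """Convert a length in point-cloud units to millimeters."""
    if ref_length_3d <= 0:
        raise ValueError("reference length must be positive")
    return length_3d * ref_mm / ref_length_3d
```

For instance, if the reference ball measures 0.25 units in the reconstructed scene and a tomato measures 0.15 units, the tomato diameter is 0.15 × 40 / 0.25 = 24 mm.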
3.7. Color-Based Ripeness Estimation
In this study, CIELAB standard color measurements were used to assess tomato ripeness, specifically the lightness (L*), red-green component (a*), and yellow-blue component (b*). These parameters are effective for distinguishing different ripening stages [15].
Following the segmentation of the tomatoes, the a* and b* values were extracted for each pixel. A scatter plot was generated using these values, and the slope of the line fitted through the points (the interpolation slope) was calculated to quantify the color-based ripeness characteristics.
The interpolation slope serves as a quantitative indicator of ripeness, reflecting the color transition during the maturation process. Once the color-based ripeness estimation was completed, the tomato ripeness grade was determined based on Figure 4 to classify tomatoes according to their ripeness stage.
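The slope computation can be sketched as follows, assuming the segmented pixels are available as sRGB values, that the conversion to CIELAB uses the D65 white point, and that the line is obtained by an ordinary least-squares fit of b* against a*. These choices and the function names are assumptions; the mapping from slope to ripeness grade (Figure 4) is not reproduced here.

```python
import numpy as np


def srgb_to_lab(rgb) -> np.ndarray:
    """Convert sRGB values in [0, 255] (N x 3) to CIELAB under a D65 white point."""
    c = np.asarray(rgb, dtype=float) / 255.0
    # Undo the sRGB gamma to obtain linear RGB.
    lin = np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)
    m = np.array([[0.4124564, 0.3575761, 0.1804375],
                  [0.2126729, 0.7151522, 0.0721750],
                  [0.0193339, 0.1191920, 0.9503041]])
    xyz = lin @ m.T / np.array([0.95047, 1.0, 1.08883])  # normalize by white
    eps, kappa = 216 / 24389, 24389 / 27
    f = np.where(xyz > eps, np.cbrt(xyz), (kappa * xyz + 16) / 116)
    L = 116 * f[..., 1] - 16
    a = 500 * (f[..., 0] - f[..., 1])
    b = 200 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)


def ab_slope(rgb_pixels) -> float:
    """Slope of the least-squares line of b* versus a* over the fruit's pixels."""
    lab = srgb_to_lab(rgb_pixels)
    slope, _intercept = np.polyfit(lab[:, 1], lab[:, 2], 1)
    return float(slope)
```

Because a* moves from negative (green) to positive (red) as the fruit ripens, the fitted slope changes over the maturation period, which is what makes it usable as a ripeness indicator.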
3.8. Logistic Modeling of Tomato Size and Ripeness Transitions
Monitoring tomato fruit growth in large-scale greenhouse environments is essential for accurate yield estimation and for maximizing labor efficiency, which aligns with the objectives of the present study. To achieve this, mathematical modeling is commonly employed to characterize the growth patterns of plants and fruits and to understand their responses to environmental influences.
Baar et al. [12] implemented a logistic model for precise tomato fruit growth prediction based on fruit diameter measurements under greenhouse conditions and demonstrated the effectiveness of a simple logistic approach. Their findings highlighted that such models can accurately predict tomato maturation with minimal complexity, supporting their continued use in real-time greenhouse monitoring systems.
Among the various modeling approaches, the logistic function is one of the most suitable for describing plant and fruit growth dynamics. The general form of the logistic equation is as follows:
$$f(x) = \frac{A}{1 + e^{-d\,(x - x_0)}}$$
In this study, A is the maximum size of the tomato, x0 is the day of peak growth, and d describes the growth rate of the tomato. These parameters also allow comparison with other datasets. If the values of A and x0 are the same, the values of d can be compared directly to determine which dataset has the faster growth rate. When the values of A and x0 differ, the maximum growth rate (gmax) can be used to compare the results instead; it is the maximum of the derivative of f, which is attained at x = x0:
$$g_{\max} = \max_{x}\frac{df}{dx} = \left.\frac{df}{dx}\right|_{x = x_0}$$
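For reference, the maximum growth rate can be worked out in closed form. The derivation below is a sketch that assumes the three-parameter logistic f(x) = A / (1 + e^{-d(x - x0)}), with d acting as a rate coefficient; if d were instead defined as a time scale dividing (x - x0), the same steps would give A/(4d).

```latex
\begin{aligned}
\frac{df}{dx} &= \frac{A\,d\,e^{-d(x-x_0)}}{\bigl(1+e^{-d(x-x_0)}\bigr)^{2}}
              = d\,f(x)\left(1-\frac{f(x)}{A}\right),\\[4pt]
\frac{d^{2}f}{dx^{2}} &= d\,\frac{df}{dx}\left(1-\frac{2f(x)}{A}\right) = 0
  \;\Longleftrightarrow\; f(x)=\frac{A}{2}
  \;\Longleftrightarrow\; x = x_0,\\[4pt]
g_{\max} &= \left.\frac{df}{dx}\right|_{x=x_0} = \frac{A\,d}{4}.
\end{aligned}
```

Under this assumption, gmax depends on both the asymptote A and the rate d, which is why it remains comparable between datasets whose A and x0 differ.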
To obtain the optimal parameters, a minimization approach was applied using the Generalized Reduced Gradient (GRG) nonlinear method, with the initial parameter limits assigned as shown in Table 2.
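A bounded logistic fit can be sketched in Python as shown below. This is not the GRG nonlinear solver used in the study: SciPy's `curve_fit` with bounds (a trust-region least-squares method) is swapped in as a readily available alternative. The logistic form with d as a rate coefficient, the function names, and the bounds in the usage example are assumptions; the limits of Table 2 are not reproduced here.

```python
import numpy as np
from scipy.optimize import curve_fit


def logistic(x, A, x0, d):
    """Three-parameter logistic curve (assumed form, d as a rate coefficient)."""
    return A / (1.0 + np.exp(-d * (x - x0)))


def fit_logistic(days, values, bounds):
    """Least-squares fit of (A, x0, d) within (lower, upper) parameter bounds."""
    lower, upper = bounds
    p0 = [(lo + hi) / 2.0 for lo, hi in zip(lower, upper)]  # start mid-range
    params, _cov = curve_fit(logistic, days, values, p0=p0, bounds=bounds)
    return params


def g_max(A: float, d: float) -> float:
    """Maximum growth rate of the assumed logistic form, reached at x = x0."""
    return A * d / 4.0
```

As a usage example with hypothetical numbers, fitting noise-free samples of a curve with A = 25 mm, x0 = 15 days, and d = 0.2 per day under the bounds ([10, 0, 0.01], [40, 40, 1.0]) recovers these parameters, and `g_max(25, 0.2)` gives 1.25 mm per day.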
Various logistic-type functions, including the sigmoid, Richards, and Gompertz models, are commonly employed to describe biological growth processes. Fang et al. [31] provided a detailed analysis of these functions and proposed enhancements for several complex formulations.
Nevertheless, this study adopts the standard logistic model introduced earlier primarily to reduce computational complexity and simplify parameter estimation. Future work may explore the integration of more advanced models to enhance fitting accuracy and better capture biological variability.
4. Experimental Results
This section presents the experimental results obtained by applying the proposed methodology to video data collected in a greenhouse environment. The effectiveness of each component (instance segmentation, 3D reconstruction, size and ripeness estimation, and growth modeling) was evaluated. Quantitative assessments were performed to validate the accuracy of the detection and size estimation processes, whereas temporal analyses illustrated the growth trends and seasonal variations in tomato development. Visual examples and statistical comparisons are included to demonstrate the performance of the system under real-world conditions.
The experiments were conducted on Windows 11 using a 13th Gen Intel® Core™ i7-13700KF CPU and an NVIDIA GeForce RTX 4070 Ti GPU. Training each instance segmentation model took approximately 40 min, and generating a 3D model took approximately one and a half hours. The deep learning environment was Python 3.10.0 with PyTorch 2.1.2 and CUDA 12.2, accelerated by cuDNN 9.1.0.2. The 3D reconstruction was performed with Nerfstudio 1.1.3 and COLMAP 3.10.
4.1. Accuracy of Instance Segmentation Model
The object detection model employed in this study was trained using the publicly available Laboro Tomato dataset hosted on GitHub [32]. This dataset includes tomato images categorized into three ripeness stages: fully ripe, half-ripe, and green. It contains 804 images, each of which features multiple tomatoes of varying ripeness levels. The images were obtained at two resolutions: 3024 × 4032 and 3120 × 4160 pixels.
YOLOv8x-seg was selected based on the comparison of the YOLOv8 variants (n, s, m, l, and x) discussed in Section 3.3, which demonstrated its high segmentation performance and suitability for instance-level detection in tomato images. In addition, it was compared with other commonly used instance segmentation frameworks, including Mask R-CNN [8,9]. The performance of each model is shown in Table 3.
The instance masks produced by the model achieved a mAP of 0.881 at an Intersection over Union (IoU) threshold of 0.8, demonstrating robust performance in segmentation tasks. The overall precision and recall across all classes were 0.806 and 0.818, respectively, indicating effective identification of tomato fruits at various ripening stages. By class, the model performed best on green tomatoes, with a mAP of 0.919, followed by 0.886 for fully ripened tomatoes and 0.839 for half-ripened tomatoes. These results highlight the effectiveness of the model in accurately distinguishing different tomato ripeness levels, underscoring its potential for practical applications in agricultural monitoring and management.
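To make the IoU criterion concrete, the sketch below computes the IoU of two binary masks and checks that the three per-class values reported above average to the overall figure. The function name is illustrative, and treating the overall mAP as the unweighted mean of the class values is an assumption that happens to match the reported numbers.

```python
import numpy as np


def mask_iou(pred, truth) -> float:
    """Intersection over Union of two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:  # both masks empty
        return 0.0
    return float(np.logical_and(pred, truth).sum() / union)


# Per-class values reported in the text.
per_class_map = {"green": 0.919, "fully_ripe": 0.886, "half_ripe": 0.839}
mean_map = sum(per_class_map.values()) / len(per_class_map)  # about 0.881
```

A predicted mask counts as a correct detection at this setting only if its IoU with the annotated mask is at least 0.8, which is a stricter criterion than the commonly reported threshold of 0.5.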
4.2. 3D Tomato Phenotyping and Metric Calibration
Evaluating the accuracy of a reconstructed 3D point cloud is challenging. A common method involves comparison with 3D images acquired using precise instruments, such as LiDAR and structured light scanners [10,16,25]; however, the high cost of such devices can be a limitation. In this study, a metric calibration technique was used to evaluate accuracy by employing a reference object to compare the measurements obtained from the 3D model with the corresponding real-world values. The variables selected for comparison were the transverse diameter of the tomato and the diameter of the reference ball [20].
After instance segmentation to detect tomatoes, 3D reconstruction was performed, beginning with color masking (staining) to highlight key features before generating the 3D model. The point cloud data were then segmented, and the tomatoes of interest were selected using DBSCAN together with color value clustering. The resulting 3D point groups were analyzed to isolate the target tomatoes, allowing for a more focused assessment of their growth characteristics and other relevant features (Figure 5).
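Combining spatial and color information in the clustering can be sketched as follows. To keep the example self-contained, a minimal brute-force DBSCAN is written out in NumPy; in practice a library implementation (for example, the one in scikit-learn) would normally be used. Appending the mask hue as a fourth feature, the `hue_weight` parameter, and the function names are assumptions, and the circular nature of hue is ignored for simplicity.

```python
from collections import deque

import numpy as np


def dbscan(X, eps: float, min_samples: int) -> np.ndarray:
    """Brute-force DBSCAN; returns cluster labels, with -1 marking noise."""
    X = np.asarray(X, dtype=float)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # O(n^2) memory
    neighbors = [np.flatnonzero(row <= eps) for row in dist]
    core = np.array([len(nb) >= min_samples for nb in neighbors])
    labels = np.full(len(X), -1)
    cluster = 0
    for i in range(len(X)):
        if labels[i] != -1 or not core[i]:
            continue
        labels[i] = cluster
        queue = deque([i])
        while queue:
            j = queue.popleft()
            if not core[j]:
                continue  # border points join a cluster but do not expand it
            for k in neighbors[j]:
                if labels[k] == -1:
                    labels[k] = cluster
                    queue.append(k)
        cluster += 1
    return labels


def cluster_fruits(xyz, hue, eps: float, min_samples: int = 10,
                   hue_weight: float = 1.0) -> np.ndarray:
    """Cluster points on position plus the (weighted) hue assigned by masking."""
    features = np.column_stack([np.asarray(xyz, dtype=float),
                                hue_weight * np.asarray(hue, dtype=float)])
    return dbscan(features, eps, min_samples)
```

With the hue included, two fruits that touch in space but carry different mask colors fall into separate clusters, which spatial clustering alone would tend to merge.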
4.3. Shape Fitting Techniques and Size Estimation Analysis for Tomatoes
The point clouds were separated into the reference sphere and the tomato fruits (Figure 6). A shape was then fitted to each point cloud following the principles outlined in Section 3.5: a sphere was fitted to the reference ball, whereas an ellipsoid was fitted to each tomato, as this tomato variety exhibits an ellipsoidal shape. After completing these processes, the diameter of the tomato was measured, and the variables specified in Section 3.6 were calculated to estimate the growth rate.
After measuring the tomato size from the 3D point cloud, the accuracy of the size estimation was evaluated by comparison with the actual size measured using calipers. The results indicate that the growth trends were similar, with an average percentage error of 8.01% (Figure 7). The 3D-based estimates were slightly larger than the caliper measurements, but the error remained within acceptable limits for agricultural use.
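The error metric can be sketched as below, assuming that "average percentage error" means the mean of the absolute percentage differences between the 3D estimates and the caliper measurements. The function name and the example values are hypothetical and are not data from this study.

```python
def mean_percentage_error(estimated, measured) -> float:
    """Mean absolute percentage error of estimates against reference values."""
    if len(estimated) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs(e - m) / m * 100.0 for e, m in zip(estimated, measured)]
    return sum(errors) / len(errors)
```

For example, estimates of 11 mm and 21 mm against caliper readings of 10 mm and 20 mm give errors of 10% and 5%, and therefore a mean of 7.5%.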
In Figure 7, the time (days) begins when the tomato was first detectable by instance segmentation and continues until the fruit reached harvestable size. The growth curves in the graph were fitted using logarithmic approximation equations for comparison [12].
After evaluating the accuracy of the tomato size estimation, a logistic function was used for curve fitting to construct a transition graph of tomato growth for analysis based on growth principles, as presented in Section 4.5.
4.4. Assessment of Tomato Ripeness Status Based on Color Estimation
The results of the color-based ripeness estimation, calculated following the method described in Section 3.7, are shown in Figure 8. After calculating the pixel-by-pixel color values, the data form a cluster of points distributed on the graph, where the x-axis represents the a* values and the y-axis represents the b* values. The slope was then determined using linear interpolation. For example, on 1 July, 2 July, 4 July, and 12 July, the calculated slope values were −0.3, 0.03, 0.83, and 1.31, with ripeness levels classified as classes 1, 2, 5, and 8, respectively.
After estimating the ripeness of tomatoes based on color values and classifying the ripeness levels, a logistic function was used for curve fitting to generate a tomato ripeness progression graph for the growth pattern analysis, as presented in Section 4.5.
4.5. Growth Rate Estimation and Comparative Analysis Across Seasons
This section demonstrates a technique for estimating the tomato growth rate by analyzing the size and ripeness, and generating a graph using a logistic function, as shown in Figure 9.
Figure 9 shows the comparative growth patterns of the 11 tomatoes from the same bunch, with Tomatoes 1 and 11 representing the lowest and topmost tomatoes, respectively.
The graph shows a clear consistency between the data and the actual images: the top tomato grows faster than the bottom tomato. For example, the sample image shows that Tomato 11 is almost fully grown, whereas Tomato 1 remains green and immature. This aligns with the graph in which the top tomato exhibited a faster increase in size and ripeness than the bottom tomato.
Moreover, the graph shows that tomatoes grew rapidly in size during the first 40 days and remained relatively constant after day 45. In terms of ripeness, the tomatoes began to change color around day 20, progressing sequentially from the top to the bottom.
Various environmental factors significantly influenced the growth rate of tomatoes. The main factors in this study include daily temperature, humidity deficit, and solar radiation [33]. These factors varied seasonally, even under greenhouse conditions, particularly during winter and summer. When environmental data from these two periods were compared, the differences were clearly visible, as shown in Figure 10.
Figure 10, based on data collected in 2024, compares the measured environmental factors between winter and summer. The graph shows that these factors differ across seasons. Winter temperatures do not exceed 20 °C, whereas summer temperatures average 25.7 °C and reach a maximum of 30 °C. Temperatures decrease at night. Furthermore, humidity deficits are slightly higher in summer than in winter, and solar radiation levels are also higher in summer, with an average of 12,226 J/cm² compared with 9125 J/cm² in winter.
The environmental factors of each season affect the growth period of tomatoes in terms of both size and ripening. Analyzing the growth rate of tomatoes in summer and winter with the logistic transition estimation method makes these differences easier to compare. The relationship between growth size and ripeness, as well as the differences in tomato cultivation between summer and winter, are shown in Figure 11.
Figure 11 illustrates the relationship between the tomato growth rate, size, and ripeness at each growth stage and shows that these features are correlated. For example, tomatoes grow rapidly while they remain green, up to the point of color change. After the tomatoes were fully ripe, their size remained relatively constant.
Based on the gmax values calculated using the method in Section 3.8, the gmax values of tomato size and tomato ripeness in summer are 1.08 and 0.41, respectively, while those in winter are 0.87 and 0.21, respectively. The comparison of the results from both seasons reveals that the growth rate of tomato size in summer was approximately 24.14% higher than in winter. In addition, tomatoes in summer ripened approximately 95.24% faster than in winter.
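The two percentages follow from the relative difference between the summer and winter gmax values, taking winter as the baseline. The short sketch below reproduces the arithmetic; the function name is illustrative.

```python
def relative_increase(summer: float, winter: float) -> float:
    """Percentage by which the summer value exceeds the winter value."""
    return (summer - winter) / winter * 100.0


size_gain = relative_increase(1.08, 0.87)      # gmax of size
ripeness_gain = relative_increase(0.41, 0.21)  # gmax of ripeness
```

Rounded to two decimals, `size_gain` is 24.14 and `ripeness_gain` is 95.24, matching the figures quoted above.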
A comparison of tomato growth between winter and summer showed that tomatoes grew faster and changed color earlier in summer than in winter. Moreover, growth in winter lagged that in summer by approximately 10 days.
5. Discussion
5.1. Instance Segmentation Framework
This study presents a novel instance segmentation framework for tomato tracking and 3D tomato phenotyping based on daily video images. The YOLOv8x-seg model demonstrated high segmentation performance, achieving a mAP of 0.881 at an IoU threshold of 0.8, confirming its effectiveness in distinguishing tomatoes at different ripening stages.
5.2. 3D Reconstruction and Morphological Analysis
A combination of SfM, MVS, and the Nerfacto framework was used to generate high-quality, high-resolution 3D point clouds of tomatoes. This reconstruction process enabled a detailed and precise analysis of fruit morphology. To separate the individual tomatoes from the point cloud data, the DBSCAN clustering algorithm was applied, which effectively grouped the points corresponding to each fruit for further analysis. A key contribution of this study is the implementation of ellipsoid fitting for 3D shape analysis. This noninvasive approach allows the estimation of important morphological traits, such as volume, surface area, and vertical diameter, which serve as reliable indicators of tomato growth. Compared with the image-based approach, the proposed 3D framework captures the natural shape and spatial structure of the fruit more effectively. These techniques can therefore be applied to increase efficiency in smart and sustainable agriculture.
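To illustrate how such traits can be derived once an ellipsoid has been fitted, the following sketch computes volume, surface area, and vertical diameter from three semi-axis lengths. It is not the authors' implementation: the function name, the choice of c as the vertical semi-axis, and the use of Knud Thomsen's approximation for the surface area are assumptions made here, and the example semi-axes are hypothetical.

```python
import math


def ellipsoid_traits(a: float, b: float, c: float) -> dict:
    """Traits of an ellipsoid with semi-axes a, b, c (c assumed vertical)."""
    volume = 4.0 / 3.0 * math.pi * a * b * c
    # A general ellipsoid has no elementary closed-form surface area;
    # Knud Thomsen's approximation (p = 1.6075) is accurate to about 1%.
    p = 1.6075
    mean_term = ((a * b) ** p + (a * c) ** p + (b * c) ** p) / 3.0
    surface_area = 4.0 * math.pi * mean_term ** (1.0 / p)
    return {
        "volume": volume,
        "surface_area": surface_area,
        "vertical_diameter": 2.0 * c,
    }


# Hypothetical semi-axes in millimetres, for illustration only.
traits = ellipsoid_traits(14.0, 14.0, 12.5)
print(traits)
```

For a sphere (a = b = c) the approximation reduces to the exact surface area 4πr², which gives a convenient sanity check.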
5.3. Tomato Feature Extraction: Size Estimation and Ripeness Assessment
To accurately assess tomato growth, metric calibration was performed using a spherical reference object, enabling reliable size estimation with an average error of 8.01% compared with manual caliper measurements over a 40-day cultivation period. This calibration improved the accuracy of the morphological traits derived from the ellipsoid fitting, such as volume and diameter, and provided precise size information.
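The calibration and error computation can be sketched as follows. This is an illustrative example under stated assumptions: the scale factor is taken as the ratio of the known sphere diameter to its reconstructed diameter, the error is taken as a mean absolute percentage error against caliper readings, and all numbers and function names are hypothetical rather than drawn from the study data.

```python
def scale_factor(true_diameter_mm: float, model_diameter: float) -> float:
    """Millimetres per model unit, from a reference sphere of known size."""
    return true_diameter_mm / model_diameter


def mean_absolute_percentage_error(estimates, references) -> float:
    """Average of |estimate - reference| / reference, in percent."""
    errors = [abs(e - r) / r * 100.0 for e, r in zip(estimates, references)]
    return sum(errors) / len(errors)


# Hypothetical reference sphere: 40 mm in reality, 0.8 units in the point cloud.
scale = scale_factor(40.0, 0.8)

# Hypothetical fruit diameters in model units, converted to millimetres.
model_diameters = [0.50, 0.56, 0.60]
estimated_mm = [d * scale for d in model_diameters]

# Hypothetical caliper measurements of the same fruits (mm).
caliper_mm = [24.0, 29.0, 31.0]

print(f"mean error: {mean_absolute_percentage_error(estimated_mm, caliper_mm):.2f}%")
```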
The observed errors could be due to slight inaccuracies in manual caliper measurements. For example, when the data collector applies too much force to the instrument, soft-skinned tomatoes may deform, causing size discrepancies. Additionally, the error may be caused by limitations in the 3D point cloud generation using daily video. To further reduce these errors, future work will focus on validating the manual measurement process and may consider LiDAR technology for performance comparisons and for potential application in 3D image generation.
For ripeness assessment, the CIELAB color space was utilized, which comprises the lightness (L*), red-green (a*), and yellow-blue (b*) components. After segmenting the tomatoes from the images, pixel-wise a* and b* values were extracted to generate scatter plots representing the color distribution. The slope of the line fitted through these points quantified the color transition during ripening, serving as a robust numerical indicator of ripeness. This slope-based metric aligned well with the tomato color standard (
Figure 4) and allowed classification into ripeness stages consistent with visual observations.
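A slope-based indicator of this kind can be sketched with an ordinary least-squares fit over the pixel-wise (a*, b*) pairs. The example below is a hedged illustration only: it assumes b* is regressed on a*, uses made-up pixel values, and leaves out the segmentation and RGB-to-CIELAB conversion steps, so it should not be read as the exact procedure used in the study.

```python
def least_squares_slope(a_values, b_values) -> float:
    """Slope of the least-squares line b* = slope * a* + intercept."""
    n = len(a_values)
    mean_a = sum(a_values) / n
    mean_b = sum(b_values) / n
    sxx = sum((a - mean_a) ** 2 for a in a_values)
    sxy = sum((a - mean_a) * (b - mean_b) for a, b in zip(a_values, b_values))
    if sxx == 0:
        raise ValueError("a* values are constant; the slope is undefined")
    return sxy / sxx


# Hypothetical a* and b* values for the pixels of one segmented fruit.
a_star = [-10.0, 0.0, 10.0, 20.0]
b_star = [30.0, 32.0, 34.0, 36.0]

slope = least_squares_slope(a_star, b_star)
print(f"a*-b* slope: {slope:.3f}")
```

Tracking this slope per fruit across recording days gives a single number per day that can then be compared against ripeness-stage thresholds.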
5.4. Growth Rate Estimation and Seasonal Comparison
Growth rate estimation using logistic function curve fitting effectively captured the dynamic development of individual tomatoes over time. The modeled growth patterns revealed a clear trend in which tomatoes positioned higher in the bunch exhibited faster increases in size and ripeness than those at the bottom (
Figure 9). This observation aligned well with the actual images, validating the ability of the framework to reflect real-world growth variability within a single cluster.
The growth trajectory demonstrated a rapid increase in size during the initial 40 days, followed by a stabilization phase after day 45, consistent with typical fruit maturation stages. Similarly, ripeness progression, indicated by color changes starting around day 20, followed a top-to-bottom sequence within the bunch, reflecting biological developmental patterns.
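The rapid-increase and stabilization phases described above are characteristic of a logistic curve, whose steepest slope occurs at the inflection point. The sketch below assumes the common three-parameter form K / (1 + exp(−r(t − t0))), for which the peak growth rate is K·r/4; whether this coincides with the g_max definition in Section 3.8 is an assumption, and the parameter values are hypothetical rather than fitted to the study data.

```python
import math


def logistic(t: float, K: float, r: float, t0: float) -> float:
    """Three-parameter logistic curve: K / (1 + exp(-r * (t - t0)))."""
    return K / (1.0 + math.exp(-r * (t - t0)))


def max_growth_rate(K: float, r: float) -> float:
    """Analytic peak slope of the logistic curve, attained at t = t0."""
    return K * r / 4.0


# Hypothetical parameters: final size K (mm), rate r (1/day), midpoint t0 (day).
K, r, t0 = 30.0, 0.15, 20.0

# Central finite difference at the inflection point as a numerical cross-check.
h = 1e-4
numeric_peak = (logistic(t0 + h, K, r, t0) - logistic(t0 - h, K, r, t0)) / (2 * h)
analytic_peak = max_growth_rate(K, r)

print(analytic_peak, numeric_peak)
```

In practice the parameters K, r, and t0 would be obtained by fitting the curve to the daily size or ripeness measurements of each fruit before the peak rate is evaluated.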
The seasonal analysis revealed that the tomato growth rate was higher in summer than in winter, with size growth and ripening faster by approximately 24.14% and 95.24%, respectively.
Moreover, the seasonal comparison highlighted the significant environmental influences on tomato growth. Tomatoes cultivated in summer displayed faster growth rates and earlier color changes than those grown in winter, when maturation was delayed by approximately 10 days. This finding emphasizes the importance of considering seasonal effects in phenotyping studies and supports the utility of the proposed system for adaptive management practices to optimize yield and quality across different growing conditions.
6. Conclusions
This study developed a framework for 3D tomato phenotyping and growth monitoring using daily video recordings under real greenhouse conditions. Integrating instance segmentation (YOLOv8x-seg) and 3D reconstruction techniques (SfM, MVS, and Nerfacto) enabled accurate morphological and ripeness analyses of individual tomato fruits. Ellipsoid fitting and metric calibration supported reliable size estimation, whereas ripeness classification was achieved through color analysis based on the CIELAB color space.
Logistic modeling of tomato growth revealed consistent developmental trends and highlighted seasonal differences in size and color progression, with summer-grown tomatoes maturing faster than those in winter. These findings demonstrate the potential of the proposed framework for supporting nondestructive, data-driven crop monitoring in precision agriculture.
Although the system achieved high accuracy in instance segmentation and 3D reconstruction, some limitations remain, particularly regarding occlusions, lighting variability, and the need for manual corrections. Future enhancements will aim to improve automation, reduce the sensitivity to environmental variations, and expand the applicability of the system to other crops and growing conditions. These developments will be key to realizing a fully practical and scalable phenotyping solution for smart farming.
One major challenge is the occlusion caused by leaves and overlapping fruits, which can interfere with accurate detection and shape fitting. Although clustering and coloring techniques are used to mitigate this problem, reconstruction errors can still occur, particularly when tomatoes are only partially visible in video frames.
In addition, variations in camera position and lighting owing to handheld recordings may introduce inconsistencies in 3D modeling and ripeness estimation. These variations may have affected the stability of the results on different days.
Furthermore, although many processes in the pipeline are automated, some steps, such as object selection, shape verification, and tracking, still require manual intervention. This reduces the efficiency of the system when applied over long monitoring periods.
Future work will focus on improving the robustness of the system under actual greenhouse conditions, minimizing manual adjustments, and exploring ways to adapt the method to other tomato varieties and environmental conditions. These improvements will enhance the practicality and scalability of the proposed approach for broader agricultural applications.