Review Reports - Tomato Growth Monitoring and Phenological Analysis Using Deep Learning-Based Instance Segmentation and 3D Point Cloud Reconstruction

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Thank you for submitting your manuscript. I have carefully reviewed the manuscript and appreciate the authors' efforts in tomato growth monitoring and phenological period analysis. Below are my comments and suggestions for further improvement.

1. The abstract mentions the use of SfM, MVS, and Nerfacto for 3D reconstruction but does not briefly explain the key advantages of this combination. It is recommended to add one or two sentences describing these advantages so that the innovation of the research is more apparent.

2. Section 3.1 does not specify the core acquisition parameters, which directly affect the consistency and accuracy of the 3D reconstruction. It is recommended that these be provided so that other researchers can replicate the experimental design.

3. Section 3.3 does not specify whether data augmentation strategies (such as random rotation, cropping, illumination/contrast perturbations, and noise addition, which are often required in agricultural image processing to improve model generalization) were used.

4. It is recommended that key parameters of the training process (such as batch size, learning rate scheduling strategy, and number of iterations) be supplemented, and that the data partitioning (training/validation/test set ratios) be clarified, to improve the traceability of model training.

5. Section 4.1 only reports the mAP (0.881), precision (0.806), and recall (0.818) of YOLOv8x-seg but does not compare its performance with other mainstream segmentation models of the same period (such as YOLOv9-seg, Mask R-CNN, and SegNeXt) on the same Laboro Tomato dataset. This lack of horizontal comparison fails to highlight the advantages of this model choice. It is recommended to supplement the mAP, speed, and parameter count of each model to better demonstrate the advantages of the method.

6. Section 4.5 states that "tomatoes mature 10 days later in winter," but does not provide the differences in key environmental parameters between the winter and summer greenhouses (such as average daily temperature, diurnal temperature difference, daylight hours, and relative humidity). It is recommended to supplement specific environmental monitoring data from both seasons to establish a correlation between "environmental differences and growth differences" and to enhance the scientific validity of the seasonal comparison conclusions.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The abstract should be strengthened to emphasize the motivation and the limitations of current techniques.
The research question/problem statement should be explicitly provided in the Introduction section.
The relevance to sustainability should be highlighted in the Introduction section. Additionally, a deeper analytical connection to related studies should be drawn in the Introduction. Above all, the research gap is not clear in this section and needs to be strengthened and described clearly. The novelty and contribution of the paper should also be highlighted in more detail in the Introduction.
The related work section should be more analytical, comparing and contrasting prior studies rather than merely summarizing them.
A flow diagram describing the different steps adopted in the methodology would be helpful.
The details of the dataset used in the study should be provided, together with details of the pre-processing step, the training parameters, the color analysis, the validation process, etc., which are crucial for the reproducibility of the whole study.
Most of the model hyperparameters, such as the learning rate, batch size, and number of epochs, are missing and should be detailed.
The manuscript lacks quantitative comparative study against prior related work or against a reasonable baseline in this direction.
The results section is mostly descriptive rather than analytical; it should provide justification and reasons for the results, along with analysis and contrasting.
The study lacks an analysis of the computational cost of the proposed framework, which is important and should be provided in the manuscript.
The paper refers to the estimated error but never specifies which kind of error was adopted or how it was computed.
It is important to provide the details of the hardware specifications used in the experiments.
In Equation (4), the parameters should be renamed to reflect their actual roles in the equation. For example, what is meant by "3D Tomato"? If it refers to the tomato's size, it should be renamed to reflect that, or better, represented by a variable, since this description is too long for an equation.
The captions of some figures are overly long, for example the caption of Fig. 6.
Some references are not peer-reviewed, for example, reference [24]. Please use peer-reviewed alternatives.
Please adjust the information for reference [1], which is written in Japanese; it should be written in English.
Please kindly complete the missing information for reference [8].
References [28], [29], and [30] are barely relevant (or even not relevant) to the provided study, please try to use alternative relevant references.
The iThenticate report shows a 17% similarity index; please consider reducing it as much as possible.

Comments on the Quality of English Language

Several sentences throughout the manuscript are overly long, for example, those in lines 68 to 71 and 73 to 75.
The first-person voice (e.g., we, our) used throughout the paper is not recommended in academic writing; please use the passive voice instead.
Line 378: “performed particularly well” is too informal for academic writing, better use an alternative phrase.
Line 234: “generating of accurate 3D” should be “generation of accurate 3D”
Line 410: "The top shows sphere fitting for the spherical reference ball, and the bottom …" It is better to use "the upper section and the bottom section of the image", since "top" and "bottom" are vague.
Line 413: “using a caliper” should be “using calipers”
Line 471: “we present a novel …” should be “We present a novel …”

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

sustainability-3937326-peer-review-v1

It would be beneficial if the authors provided more results in the abstract. In its current form, the abstract is quite general and does not give a real summary of the results obtained in the research.

In my opinion, this is an interesting paper. I am not an expert in modeling; however, the information in the manuscript is presented in an easy-to-follow way, and the authors have explained all steps with clear logic and presentation. The text is easy to follow. Arguments are presented systematically, covering the positive and negative sides of existing approaches to the use of high technology in farming and suggesting improvements that would be appropriate steps toward solving the problems of existing technologies.

The authors have provided additional visual material confirming their experimental procedures as an algorithm for applying the suggested methods.

In the Discussion section, it would be beneficial if the authors cited similar research projects in which 3D approaches were suggested and applied. In addition, some arguments in favor of 3D compared to 2D, with appropriate references, would be a useful addition to the Discussion section. This would further strengthen the paper.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

Thank you for the opportunity to review this manuscript. The novelty of the paper is high. The integration of YOLOv8x-seg and Nerfacto for greenhouse phenotyping is state-of-the-art. My recommendation is to emphasize this novelty in the manuscript. My specific comments are as follows:

In the abstract, include a phrase explicitly linking the work to sustainability and emphasize the novelty.
In Section 3.3, please provide more details about YOLOv8x-seg, such as epochs, learning rate, and training-validation split.
In Sections 3.4 to 3.6, explain how occluded or partially visible tomatoes were handled during SfM-MVS reconstruction.
In Section 3.8, please clarify whether the parameters of the logistic growth model were fitted per tomato or per dataset, and whether the curve was normalized to time or growth ratio.

This paper addresses the integration of YOLOv8x-seg and Nerfacto for 3D tomato phenotyping under greenhouse conditions. The research question, whether advanced instance segmentation and 3D reconstruction can provide accurate, nondestructive growth monitoring, is clearly defined and highly relevant. The topic is original and timely, filling a gap between costly LiDAR-based methods and low-resolution 2D imaging. The study contributes by integrating cutting-edge deep learning with photogrammetry and logistic modeling for real-time, sustainable crop monitoring. Methodological improvements: see the specific comments listed above.
Conclusions are consistent with the data and demonstrate strong practical potential for sustainable agriculture. References are current and appropriate. Figures are clear but could better illustrate the workflow and seasonal differences.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 5 Report

Comments and Suggestions for Authors

In the manuscript titled "Tomato Growth Monitoring and Phenological Analysis Using Deep Learning-Based Instance Segmentation and 3D Point Cloud Reconstruction", the authors propose a DL method for tomato growth monitoring. However, the manuscript is not well written, and there are many traces suggesting that it was generated by AI, for example, lines 62 to 66 and the conclusion section. The manuscript is not recommended for publication in the journal Sustainability.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 6 Report

Comments and Suggestions for Authors

Manuscript ID: Sustainability-3937326

Title: Tomato Growth Monitoring and Phenological Analysis Using Deep Learning-Based Instance Segmentation and 3D Point Cloud Reconstruction

Summary: This study introduces an automated framework for monitoring cherry tomato growth. It combines deep learning with 3D point cloud reconstruction to non-destructively estimate fruit size and ripeness from videos. Tested in a greenhouse, the method yielded an 8.01% size estimation error and identified a 10-day maturity delay in winter. This approach enables precise growth tracking and seasonal comparison for smart farming.

My specific comments are below for the authors' consideration.

While the abstract serves as a good introduction, it omits key elements such as the research question, methodology, and the duration of data collection. These details should be included.

The introduction is too brief. Please expand it with a more detailed and in-depth discussion.

In line 90, the authors wrote "As highlighted by Wang et al.," but this citation is not correct. I suggest adding a better reference; there is no need to mention the scholars' names.

The Materials and Methods section is unclear. Please revise it to clearly state the specific methods and experiments used in this research.

The authors present excellent figures, data, and experimental work. However, the discussion section is currently too brief and incomplete. Please expand on this to further interpret the key findings and highlight their significance in more detail.

I found a recently published research paper with a well-structured format and recommend using it as a reference: https://doi.org/10.3390/foods13111610

A Limitations and Future Research Directions section should be added after the Conclusions section.

Good Luck!

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors have improved the paper based on the suggested revisions, meeting the standards for publication in the journal.

Author Response

I sincerely appreciate your thorough review and your encouraging comments.

Reviewer 2 Report

Comments and Suggestions for Authors

Most of the comments from the previous review cycle have been adequately addressed in the manuscript, except for one core comment, comment number 8, which requires additional experiments for a proper comparison with a related benchmark. Such experiments must be provided for the completeness of the experimental methods. They cannot be deferred to future work, since they are needed to validate the results presented in the manuscript.
One minor issue remains: reference [25] is not peer-reviewed. Please use a peer-reviewed alternative.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 5 Report

Comments and Suggestions for Authors

The research findings are relatively solid; however, the writing of the manuscript still needs improvement. My comments on the revision are as follows.

In Section 3.4, the manuscript mentions that SfM, MVS, and Nerfacto work together to generate 3D point clouds, but it does not clarify the technical sequence of their integration (e.g., how SfM generates sparse point clouds, how MVS increases point cloud density based on these sparse clouds, and at which stage Nerfacto optimizes surface continuity). Please check this.
The authors mention least-squares ellipsoid fitting to estimate fruit size, but they do not explain "why the ellipsoid model is suitable for the morphology of cherry tomatoes" (e.g., whether the difference between the major and minor axes of cherry tomatoes matches ellipsoid characteristics), nor do they clarify "how ellipsoid parameters (semi-major axis, semi-minor axis) were converted into actual fruit diameter/volume". Please clarify this in detail.
In Section 4.5, the authors state that the growth rate of tomato size in summer is 24.14% higher than in winter and that ripening is 95.24% faster. The section also mentions that temperature, humidity deficit, and solar radiation are key environmental factors, but it does not quantify "the contribution of a single factor change to the growth rate". It is recommended to add a correlation analysis or other analysis results for this.
The authors report an average size estimation error of 8.01% in Section 4.3, but they do not decompose the error sources. Please check this. An error-tracing analysis would strengthen this research.
In Section 3.1, the authors mention that the camera "lacked fixed position calibration," leading to angle variations, and in Section 3.4 they note that "unfixed camera positions made 2D size estimation difficult". However, the two sections do not explain "how 3D reconstruction compensates for the defect of unfixed cameras" (e.g., SfM eliminates single-view position deviation through multi-view matching). Please check this.
The authors use "mini tomatoes" and "cherry tomatoes" interchangeably (e.g., "cherry tomatoes" in Section 3.1 and "mini tomatoes" in Section 3.4), which may cause confusion; please unify the terminology. The same applies to "ripeness" and "maturity".

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 6 Report

Comments and Suggestions for Authors

Thank you for addressing the comments well.

Author Response

I sincerely appreciate your thorough review and your encouraging comments.

Round 3

Reviewer 2 Report

Comments and Suggestions for Authors

All comments have been addressed in the updated manuscript except for one minor but important point: the comparison would be much better if the proposed framework were compared against a related model serving as a comparison baseline, not just against the different YOLO versions shown in Table 3.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 5 Report

Comments and Suggestions for Authors

Thanks for the authors' work; most comments were well addressed. However, some figures need improvement; for example, Figure 7 is still of insufficient quality for publication. Good luck.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf