RepC-MVSNet: A Reparameterized Self-Supervised 3D Reconstruction Algorithm for Wheat 3D Reconstruction
Round 1
Reviewer 1 Report
3D crop modeling is the process of creating 3D models of agricultural crops, and it has many applications. It can be used to visualize plants for phenotyping, for plant breeding and agricultural experimentation, and to analyze plant growth and development in order to optimize growing conditions, but also for educational and design purposes. Methods developed for this purpose include devices for reproducing the geometry of plants, physical models describing plant growth (L-systems), and modeling based on 3D scanner images.
The article addresses the use of digital 3D models to reproduce a very reliable image of a plant, covering both its geometric structure and leaf area, using wheat as an example. The issues discussed involve very advanced techniques in digital image analysis, physics, and neural networks, and the paper is edited to a high standard of content. However, due to the use of specialized language with a huge number of technical terms, definitions, and jargon, the text is difficult to understand for a reader who is not an expert in these fields but is interested in agronomy. In principle, the work is interdisciplinary in name, but it lacks a broader and more practical description of its possible uses in agriculture, which would bring it closer to researchers dealing with agriculture.
The authors of this article propose a solution (RepC-MVSNet algorithm) for phenotyping wheat plants (an important cereal crop) based on images taken from a camera in order to improve the acquisition of plant habit information for breeding and research purposes. It is much less expensive than a 3D scanner.
Extensive research has been conducted to test the algorithms. The average number of reconstructed 3D points is 19.4% higher than with the NN algorithm, the average track length is 31.2% higher than with the NN algorithm, and the minimum reprojection error is reduced by 2.7%.
There is a lack of statistical measures used in modeling, such as: mean squared error, relative mean squared error, standard error of prediction, index of agreement, or others.
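The measures suggested above are straightforward to compute from paired observed/predicted values. A minimal sketch follows, assuming NumPy; the normalization used for the relative MSE and the specific form of the index of agreement (Willmott's d) are my assumptions, since the review names the measures but not their exact formulas.

```python
import numpy as np

def regression_metrics(observed, predicted):
    """Illustrative model-evaluation metrics (formula choices are assumptions).

    Assumes at least two samples and non-constant observed values,
    so the index-of-agreement denominator is nonzero.
    """
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    err = p - o
    mse = np.mean(err ** 2)                 # mean squared error
    rmse = np.sqrt(mse)                     # root mean squared error
    rel_mse = mse / np.mean(o ** 2)         # relative MSE (one common normalization)
    sep = np.std(err, ddof=1)               # standard error of prediction
                                            # (bias-corrected SD of the residuals)
    # Willmott's index of agreement d, bounded in [0, 1]; 1 = perfect agreement
    denom = np.sum((np.abs(p - o.mean()) + np.abs(o - o.mean())) ** 2)
    d = 1.0 - np.sum(err ** 2) / denom
    return {"MSE": mse, "RMSE": rmse, "relMSE": rel_mse, "SEP": sep, "d": d}
```

Such metrics could be reported alongside the reconstruction statistics (point counts, track length, reprojection error) already given in the paper.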
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
The main question is how to construct a wheat point cloud generation dataset based on multi-view images in order to complete the phenotypic analysis and 3D reconstruction of wheat and accelerate the research and breeding process.
The topic is relevant in the field. Alternatives to imaging, such as manual measurement and laser scanning, have high costs. The work does address a specific gap in the field.
The authors propose an integrated framework for non-contact multi-view 3D reconstruction based on MVS (Multi-View Stereo) and SfM (Structure from Motion), and introduce various optimization and adjustment strategies to enhance network performance. According to the authors, this method achieves non-invasive reconstruction of the 3D phenotypic structure of realistic objects; the accuracy of the proposed model improves reconstruction by nearly 43.3%, and the overall value is improved by nearly 14.3%, which provides a new idea for the development of virtual 3D digitization.
The methodology is correct.
The conclusions are consistent with the evidence and arguments presented and they address the main question posed.
The Figures and Tables are clear and complete.
Excellent job.
I would just add a few corrections:
Many of the references are incorrect.
For example,
Line 56: "S. Lakshmi et al. [5] improved wheat crop yield by using image analysis based ()"
5. Toda, Y.; Okura, F.; Ito, J.; Okada, S.; Kinoshita, T.; Tsuji, H.; Saisho, D. Training instance segmentation neural network with synthetic datasets for crop seed phenotyping. Communications Biology 2020, 3.
6. Lakshmi, S.; Sivakumar, R. Plant Phenotyping Through Image Analysis Using Nature Inspired Optimization Techniques. Intelligent Systems Reference Library 2018.
Haoqiang Fan et al. [21] proposed a single view 3D reconstruction scheme
21. Qi, C.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. In Proceedings of the NIPS, 2017.
22. Fan, H.; Su, H.; Guibas, L.J. A Point Set Generation Network for 3D Object Reconstruction from a Single Image. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, 2463-2471.
Line 398: "In this paper, we use RANSAC [33] fitting data points, 3D points, Average Track Length and Minimum Reprojection Error to evaluate the SfM system."
33. Lindenberger, P.; Sarlin, P.-E.; Larsson, V.; Pollefeys, M. Pixel-Perfect Structure-from-Motion with Featuremetric Refinement. 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021, 5967-5977.
We used the hand-made local feature SIFT [34] and the deep learning-based SuperPoint [35, 36], D2Net [36, 1], and R2D2 [37] algorithms for comparison in order to find the optimal strategy.
34. Chum, O.; Matas, J.; Kittler, J. Locally Optimized RANSAC. In Proceedings of the DAGM-Symposium, 2003.
The correct citation is:
Chum, O., Matas, J., Kittler, J. (2003). Locally Optimized RANSAC. In: Michaelis, B., Krell, G. (eds) Pattern Recognition. DAGM 2003. Lecture Notes in Computer Science, vol 2781. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45243-0_31
35. Lowe, D.G. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision 2004, 60, 91-110.
36. DeTone, D.; Malisiewicz, T.; Rabinovich, A. SuperPoint: Self-Supervised Interest Point Detection and Description. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2018, 337-33712.
37. Revaud, J.; Weinzaepfel, P.; Souza, C.R.d.; Pion, N.; Csurka, G.; Cabon, Y.; Humenberger, M. R2D2: Repeatable and Reliable Detector and Descriptor. ArXiv 2019, abs/1906.06195.
From my point of view, the correct citation for D2-Net is:
1. Dusmanu, M.; Rocco, I.; Pajdla, T.; Pollefeys, M.; Sivic, J.; Torii, A.; Sattler, T. D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. ArXiv 2019, abs/1905.03561.
Charles R. Qi et al. pioneered the concept of "point cloud features": "[19][20]" should read "[19, 20]".
Yuhang Yang [22] et al. built a semantic information-based 3D reconstruction method for
Among them, "Di cChang et al [30]" should read "Di Chang et al. [30]".
Some figures are not referenced in the text, e.g.:
"At the same time, we compared the point cloud generation result of our model with that of the traditional method Colmap [38] (Figure 14)."
Line 13: "… have high costs, and multi-view multiview unsupervised reconstruction methods are still blank in the …" (duplicated/inconsistent word).
Line 16: "… sequential wheat crop images. Firstly, the Camera Intrin-sics …" ("Intrin-sics" should read "Intrinsics").
Line 17: "… from-motion system by fea-ture …" ("fea-ture" should read "feature").
Line 20: "… difficulty of capturing complex features by the tradition-al MVS model." ("tradition-al" should read "traditional").
Line 60: "Currently, point cloud technology is rapidly developing to reconstruct data points on the surface of objects into 3D models. On the one hand, point cloud data… data set to the original sensor; on the other hand, point cloud stores … compared to two-dimensional data." (unclear phrasing).
Line 103: "… model based on deep neural network ,using spatial pyramidal pooling to improve wheat …" (misplaced comma: should read ", using").
"The experimental environments were:Ubuntu(Canonical Ltd,London,UK),CUDA 11.3(NVIDIA Corporation,Santa Clara,CA,USA),Python 3.8(Python Software Foundation,New Castle,DE,USA),PyTorch 1.10.0(Facebook Artificial Intelligence Institute,New York,NY,USA)." Missing spaces.
"unsupervised trunk CasMVSNet[27]" should read "CasMVSNet [27]".
Author Response
Please see the attachment.
Author Response File: Author Response.pdf