Deep Learning-Based 3D Reconstruction for Defect Detection in Shipbuilding Sub-Assemblies
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This paper presents an unsupervised, reconstruction-based anomaly detection framework for detecting defects in shipbuilding components from 3D point clouds. However, some revisions are required before acceptance.
- The validity and transferability of the experimental setup are insufficiently demonstrated. Detailed revisions are recommended.
- The text states that normal samples are divided into training, testing, and validation sets in a 70/20/10 ratio, while defective samples are "only used in the validation stage". However, Table 2 reports "Test performance", and the contamination parameter is adjusted for each model/object combination to maximize F1. If this parameter is selected using defect-labelled data, then that dataset actually assumes the role of a validation set, and reporting its results as "test performance" would lead to data leakage/optimistic bias and an overestimation of generalization ability. The authors should clearly clarify and standardize the evaluation protocol.
- The statements regarding whether "AtlasNet" is included are inconsistent. The abstract and the main sections clearly list AtlasNet as one of the candidate architectures, and the figures also include this model; however, the experimental part and Tables 1/2 present only four architectures (VAE, FoldingNet, DGCNN, PointNet++ AE). This inconsistency undermines the completeness and reproducibility of the comparative experiments.
- The lack of defect localization and local interpretability makes it difficult to meet the requirements of industrial rework. The current pipeline compresses the reconstruction error of the entire point cloud into a global metric (CD/EMD) and feeds it into the Isolation Forest, yielding only a component-level defect/no-defect judgment. For industrial rework scenarios, however, locating the defect area and estimating its extent are equally crucial. Using a global scalar as the feature may be insufficiently sensitive to small local deviations and hinders interpretation and integration into the engineering loop.
Author Response
Comment 1: This paper presents an unsupervised, reconstruction-based anomaly detection framework for detecting defects in shipbuilding components from 3D point clouds. However, some revisions are required before acceptance. 1. The validity and transferability of the experimental setup are insufficiently demonstrated. Detailed revisions are recommended.
Response 1: We would like to thank the reviewer for the feedback and the time dedicated to reviewing this paper. We have clarified the scope of our claims in the introduction by stating that the purpose of the study is to compare several approaches and identify suitable options for automated overshooting detection, rather than a single optimal solution. The revised text now reads as follows:
“(...) The purpose of this study is to compare several approaches and identify suitable options for automated overshooting detection in a realistic industrial setting. Our goal is to offer practical insights for implementing reliable quality control in shipbuilding and other high-precision manufacturing environments. (...)”
In addition, we have expanded the conclusions to provide explicit practical recommendations on when each architecture should be preferred, thereby clarifying how the experimental findings can be transferred to new shipbuilding sub-assemblies with different geometries and computational constraints. The revised conclusions now include the following text:
“(...) From a practical point of view, our results indicate that FoldingNet is the most suitable default choice for overshooting detection in new shipbuilding sub-assemblies, as it offers a robust trade-off between accuracy, stability across geometries, and model complexity. PointNet++ Autoencoder can be recommended when slightly higher computational cost is acceptable in exchange for comparable performance, particularly on geometries similar to objects 206 and 221, while DGCNN should be reserved for scenarios where its behaviour can be validated on shapes close to object 301, given its strong dependence on the underlying geometry. (...)”
Should any aspects of the validity or transferability of the experimental setup remain unclear, we would be grateful if the reviewer could specify which points require further clarification, so that we can address them more precisely in the manuscript.
Comment 2: The text states that normal samples are divided into training, testing, and validation sets in a 70/20/10 ratio, while defective samples are "only used in the validation stage". However, Table 2 reports "Test performance", and the contamination parameter is adjusted for each model/object combination to maximize F1. If this parameter is selected using defect-labelled data, then that dataset actually assumes the role of a validation set, and reporting its results as "test performance" would lead to data leakage/optimistic bias and an overestimation of generalization ability. The authors should clearly clarify and standardize the evaluation protocol.
Response 2: Thank you very much for pointing out this issue and for carefully examining our experimental setup. The confusion arose from imprecise wording in the manuscript: in our experiments, anomalous (defective) samples are used only in the validation stage, never for training, and the reported F1 scores correspond to this validation step. To make this explicit and avoid any misunderstanding, we have revised the sentence in the Results and Analysis subsection to:
“(...) The best validation performance achieved by each reconstruction model on the three objects is detailed in Table 2 (...)”
We have also changed the caption of Table 2 to “Validation performance of the models for each object”.
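To make the protocol concrete, the following minimal sketch shows how the contamination parameter can be swept on the labelled validation set while the Isolation Forest is fitted on features from normal training samples only. This is a hypothetical illustration using scikit-learn; the function and variable names are our assumptions, and it is not the code used in the study.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import f1_score

def tune_contamination(train_feats, val_feats, val_labels,
                       grid=np.linspace(0.01, 0.30, 15)):
    """Sweep the contamination parameter on the labelled validation set
    and keep the value that maximizes F1 (illustrative sketch only).
    Features are e.g. per-sample CD/EMD values, shape (n_samples, n_feats)."""
    best_c, best_f1 = None, -1.0
    for c in grid:
        clf = IsolationForest(contamination=float(c), random_state=0)
        clf.fit(train_feats)                  # fitted on normal samples only
        pred = clf.predict(val_feats) == -1   # -1 marks predicted anomalies
        score = f1_score(val_labels, pred)    # val_labels: 1 = defective
        if score > best_f1:
            best_c, best_f1 = float(c), score
    return best_c, best_f1
```

Because the defect-labelled set participates in this sweep, the scores it yields are validation figures rather than test figures, which is exactly the distinction the revised caption makes explicit.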
Comment 3: The statements regarding whether "AtlasNet" is included are inconsistent. The abstract and the main sections clearly list AtlasNet as one of the candidate architectures, and the figures also include this model; however, the experimental part and Tables 1/2 present only four architectures (VAE, FoldingNet, DGCNN, PointNet++ AE). This inconsistency undermines the completeness and reproducibility of the comparative experiments.
Response 3: Thank you for pointing out this inconsistency. AtlasNet was considered in the initial design of the study but was not included in the final set of implemented and evaluated architectures. The remaining references to AtlasNet in the manuscript were left by mistake and could mislead readers regarding the completeness and reproducibility of the experiments. We have therefore removed AtlasNet from the manuscript so that it consistently describes only the four architectures that were actually implemented and evaluated.
Comment 4: The lack of defect localization and local interpretability makes it difficult to meet the requirements of industrial rework. The current pipeline compresses the reconstruction error of the entire point cloud into a global metric (CD/EMD) and feeds it into the Isolation Forest, yielding only a component-level defect/no-defect judgment. For industrial rework scenarios, however, locating the defect area and estimating its extent are equally crucial. Using a global scalar as the feature may be insufficiently sensitive to small local deviations and hinders interpretation and integration into the engineering loop.
Response 4: We appreciate this insightful comment. We agree that our current pipeline, which compresses the reconstruction error of the entire point cloud into a global scalar (CD/EMD) for Isolation Forest, is limited to component-level defect detection and does not directly provide defect localization or defect size estimation. However, in the shipbuilding context we consider, even a single overshooting defect in these base sub-assemblies is sufficient to render the part unusable, so our primary objective in this work is defect detection at the component level rather than precise defect localization. We nevertheless acknowledge that localizing and quantifying the defective region is highly relevant for industrial settings, so we have added the following paragraph to the conclusions and future work section:
“(...) For future work, we first suggest expanding the number and diversity of objects considered, including different types of sub-assemblies and defect geometries, to more thoroughly assess the generalization capability of the reconstruction models. We also suggest training and evaluating the full pipeline directly on point clouds acquired from real shipbuilding components instead of relying solely on 3D-printed representations. In addition, future work will focus on extending the proposed reconstruction-based approach from component-level defect detection to point-wise or region-wise defect localization on 3D point clouds, in order to better support rework procedures in complex shipbuilding environments. Finally, we suggest integrating the best-performing models into an actual industrial inspection workflow to validate their performance under real-time constraints and evaluate their impact on shipyard quality control processes. (...)”
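As a rough pointer to this future direction, one simple way to obtain point-wise scores is to measure, for every acquired point, the distance to its nearest neighbour in the reconstructed cloud. The sketch below is a hypothetical illustration of that idea, not a method implemented in the paper; per_point_anomaly_scores is an assumed name.

```python
import numpy as np

def per_point_anomaly_scores(measured, reconstructed):
    """Distance from each measured point to its nearest reconstructed
    neighbour; large values flag candidate defect regions.
    Hypothetical sketch, not part of the evaluated pipeline."""
    # measured: (N, 3), reconstructed: (M, 3). This brute-force version
    # uses O(N*M) memory; for large clouds a KD-tree
    # (e.g. scipy.spatial.cKDTree) would be preferable.
    d = np.linalg.norm(measured[:, None, :] - reconstructed[None, :, :],
                       axis=-1)
    return d.min(axis=1)  # shape (N,): one score per measured point
```

Points whose score exceeds, for example, a high percentile of the scores observed on non-defective samples could then be highlighted as a candidate rework region.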
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The article deals with a practical subject matter. Detection of defects in ship construction is a significant issue. It encompasses numerous subfields with distinct technical and scientific characteristics. Handling and manufacture of large parts can sometimes lead to damage with consequences for subsequent assembly processes. The authors' study actually represents the initial stage of broader research into anomaly detection in this specific context. The article outlines cases with evident damage, and the authors' intention is to compare various automatic detection algorithms to inform assessments for more challenging scenarios. It has been carefully designed to be easily followed from start to finish. In the reviewer's opinion, some interventions are nonetheless required.
1. The title "Deep Learning-Based 3D Reconstruction for Defect Detection in Metalworking Industry" implies broad applicability to the metalworking industry, but the paper focuses specifically on "T subassemblies". The entire approach to this situation is developed on a very general theoretical basis, valid in other fields as well, but not consolidated into a specific methodology. As a result:
(a) The title should align with the actual thematic area addressed in the manuscript.
(b) The authors should specify their distinct contributions.
(c) A potential contribution could include a methodology that is illustrated through a flowchart and also incorporates Figure 2.
2. In general, an article employs each notation (symbol) with a single, fixed meaning. The manuscript employs the same symbol with distinct meanings in multiple segments of the theoretical foundation. It should be specified that these meanings have only local, not general, applicability.
3. Some figures should be improved. Figure 3 does not show the perspective image. In Figures 3–5 the lettering is very small. Also, the size of Figure 5 could be reduced.
4. Several text sequences are repeated (L63-68, L290-294).
5. The term “overshooting defects” should be explained.
6. The article lacks theoretical goals but its practical focus, backed by mathematical evidence, warrants encouragement. The topic of "Deep Learning-Based 3D Reconstruction for Defect Detection" is well documented in existing literature. To synthesize a methodology and its associated graphic material, one can refer to studies that are available on Google Scholar under the topic previously mentioned.
7. The authors state the following in L148-151: "The purpose of this study is to identify the optimal approach for automated overshooting detection in a realistic industrial setting. Our goal is to provide practical guidance for implementing reliable quality control in shipbuilding and other high-precision manufacturing environments." The article fails to deliver on its promises of an 'optimal approach...' or 'practical guidance...', as the conclusions fall short. L693-793 consist of general statements, L704-730 include specific quantitative summaries, L731-737 contain overall analyses without methodological revisions, and L738-743 outline future plans in line with L148-L151.
Author Response
Comment 1: The title "Deep Learning-Based 3D Reconstruction for Defect Detection in Metalworking Industry" implies broad applicability to the metalworking industry, but the paper focuses specifically on "T subassemblies". The entire approach to this situation is developed on a very general theoretical basis, valid in other fields as well, but not consolidated into a specific methodology. As a result:
(a) The title should align with the actual thematic area addressed in the manuscript.
(b) The authors should specify their distinct contributions.
(c) A potential contribution could include a methodology that is illustrated through a flowchart and also incorporates Figure 2.
Response 1: We would like to thank the reviewer for the feedback and the time dedicated to reviewing this paper. Regarding Comment 1:
(a) Title alignment: We agree that the original title suggested a broader scope than the one actually addressed. The title has been revised to better reflect the specific focus on overshooting defects in shipbuilding sub-assemblies: “Deep Learning-Based 3D Reconstruction for Defect Detection in Shipbuilding Sub-Assemblies”.
(b) Specification of contributions: We have added a short paragraph at the end of the Introduction explicitly listing the main contributions of this work, highlighting our proposed pipeline and the comparative analysis of the different reconstruction models.
“(...) Based on this context, the main contributions of this work are threefold. First, we propose an unsupervised, reconstruction-based pipeline for detecting overshooting defects in shipbuilding sub-assemblies using 3D point clouds and anomaly scores derived from reconstruction errors. Second, we systematically compare several deep learning architectures for point cloud reconstruction under a common training and evaluation setup. We analyze their convergence behavior, computational cost, and detection performance. Third, we demonstrate the practical relevance of this approach on multiple sub-assembly geometries, showing how the choice of architecture and anomaly threshold affects the trade-off between defect detection capability and false alarms, and providing guidelines for its adoption in industrial quality control. (...)”
(c) Methodology and flowchart: We have clarified the overall methodology in the Approach section as follows: “Our approach follows a four-stage reconstruction-based anomaly detection pipeline, illustrated in Figure 2. The main stages are: (1) component modeling and defect definition, (2) point cloud acquisition and dataset construction, (3) training of point cloud reconstruction models on non-defective data, and (4) anomaly detection using Isolation Forest on reconstruction error features. (...) In the third stage (model training), (...) In the final stage (anomaly detection) (...)”
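For illustration, stages (3) and (4) can be sketched as follows: a reconstruction model is trained on non-defective clouds with a Chamfer-style loss, and the per-sample reconstruction errors later serve as features for the Isolation Forest. The toy PointNet-like autoencoder below is an assumption made purely for this sketch; it is not one of the four architectures evaluated in the paper.

```python
import torch
import torch.nn as nn

class PointCloudAE(nn.Module):
    """Toy point cloud autoencoder (illustrative placeholder, not one of
    the architectures compared in the manuscript)."""
    def __init__(self, n_points=1024, latent_dim=128):
        super().__init__()
        self.n_points = n_points
        self.encoder = nn.Sequential(              # shared per-point MLP
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, latent_dim, 1),
        )
        self.decoder = nn.Sequential(              # latent code -> full cloud
            nn.Linear(latent_dim, 512), nn.ReLU(),
            nn.Linear(512, n_points * 3),
        )

    def forward(self, x):                          # x: (B, N, 3)
        feats = self.encoder(x.transpose(1, 2))    # (B, latent_dim, N)
        z = feats.max(dim=2).values                # order-invariant pooling
        return self.decoder(z).view(-1, self.n_points, 3)

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between batched clouds (B, N, 3)."""
    d = torch.cdist(p, q)                          # (B, N, M) pairwise dists
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()

def train_reconstruction(model, train_loader, epochs=50, lr=1e-3):
    """Stage (3): fit the model on non-defective clouds only."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for batch in train_loader:                 # batch: (B, N, 3) tensor
            loss = chamfer_distance(model(batch), batch)
            opt.zero_grad()
            loss.backward()
            opt.step()
```

In stage (4), the scalar chamfer_distance (or EMD) between each cloud and its reconstruction would be the feature passed to the Isolation Forest.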
Comment 2: In general, an article employs each notation (symbol) with a single, fixed meaning. The manuscript employs the same symbol with distinct meanings in multiple segments of the theoretical foundation. It should be specified that these meanings have only local, not general, applicability.
Response 2: We thank the reviewer for this observation regarding the notation. In response, we have added a paragraph in the Models subsection stating that, in the following subsections, symbols are defined locally within each model description and that their meaning does not extend beyond the corresponding subsection.
“In the following subsections, we describe each reconstruction model using local mathematical notation. All symbols such as X, X̂, z, U, and N are defined within the context of each model and their meaning does not extend beyond the corresponding subsection.”
Comment 3: Some figures should be improved. Figure 3 does not show the perspective image. In Figures 3–5 the lettering is very small. Also, the size of Figure 5 could be reduced.
Response 3: Thank you very much for your comment. We have revised and updated the relevant figures (Figures 3–5) to improve their clarity, readability, and overall presentation quality in the manuscript.
Comment 4: Several text sequences are repeated (L63-68, L290-294).
Response 4: We thank the reviewer for carefully spotting this issue. The sentences were unintentionally duplicated while we were reworking the wording of that paragraph. We have removed the redundant text and kept a single, streamlined version of the paragraphs in the revised manuscript.
Comment 5: The term “overshooting defects” should be explained.
Response 5: We thank the reviewer for this remark. The meaning of the term overshooting defect has been clarified in the Case Study section as follows:
“One of the most common issues observed during production is the overshooting defect, which occurs when material is unintentionally removed beyond the intended design surface during the cutting process. This typically happens when the thermal or mechanical cutting tool slightly exceeds the programmed trajectory, or when positioning or alignment errors cause the torch or blade to advance too far. As a result, small recesses or notches are generated along the edge, where the actual contour lies inside the nominal boundary, reducing the effective contact area at welded joints and potentially affecting both assembly precision and structural integrity.”
Comment 6: The article lacks theoretical goals but its practical focus, backed by mathematical evidence, warrants encouragement. The topic of "Deep Learning-Based 3D Reconstruction for Defect Detection" is well documented in existing literature. To synthesize a methodology and its associated graphic material, one can refer to studies that are available on Google Scholar under the topic previously mentioned.
Response 6: We thank the reviewer for this insightful comment. In the revised manuscript, we have expanded the Introduction by adding a new paragraph and several additional references on deep learning-based 3D reconstruction for defect detection in industrial contexts. We also distinguish these approaches from our work, clarifying the relationship between our contribution and the existing literature on this topic. The modifications made to the Introduction are the following:
“Recent studies have shown that deep learning-based 3D reconstruction can be effectively employed for defect detection in industrial components. For example, a recent 3D anomaly detection method based solely on point cloud reconstruction demonstrates that autoencoder-style reconstruction can localize industrial surface defects without relying on large memory banks or pre-trained models [21]. Other authors propose full point-cloud reconstruction pipelines for 3D anomaly detection on industrial parts, demonstrating competitive performance on datasets such as MVTec 3D-AD [22]. Reconstruction-based autoencoders have also been applied to surface inspection scenarios, where deviations are detected by comparing reconstructed and measured shapes from laser sensors [23]. In parallel, several reviews highlight the growing role of point-cloud deep learning in industrial production and defect inspection, with reconstruction-based methods emerging as a key direction for automatic quality control [24,25]. Unlike these works, which mainly focus on generic benchmark datasets or other manufacturing domains, the application of reconstruction-based point cloud models to real industrial components presents an opportunity to extend their use to more complex and critical scenarios, such as shipbuilding sub-assemblies.”
Comment 7: The authors state the following in L148-151: "The purpose of this study is to identify the optimal approach for automated overshooting detection in a realistic industrial setting. Our goal is to provide practical guidance for implementing reliable quality control in shipbuilding and other high-precision manufacturing environments." The article fails to deliver on its promises of an 'optimal approach...' or 'practical guidance...', as the conclusions fall short. L693-793 consist of general statements, L704-730 include specific quantitative summaries, L731-737 contain overall analyses without methodological revisions, and L738-743 outline future plans in line with L148-L151.
Response 7: Thank you very much for the comment. We agree that the original wording of the introduction and the conclusions overstated the strength of our practical claims. To address this, we have revised the introduction as follows:
“(...) The purpose of this study is to compare several approaches and identify suitable options for automated overshooting detection in a realistic industrial setting. Our goal is to offer practical insights for implementing reliable quality control in shipbuilding and other high-precision manufacturing environments. (...)”
Additionally, we have expanded the conclusions to include the following explicit practical recommendation:
“(...) From a practical point of view, our results indicate that FoldingNet is the most suitable default choice for overshooting detection in new shipbuilding sub-assemblies, as it offers a robust trade-off between accuracy, stability across geometries, and model complexity. PointNet++ Autoencoder can be recommended when slightly higher computational cost is acceptable in exchange for comparable performance, particularly on geometries similar to objects 206 and 221, while DGCNN should be reserved for scenarios where its behaviour can be validated on shapes close to object 301, given its strong dependence on the underlying geometry. (...)”
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
The authors have accepted all proposed content changes and incorporated them appropriately into the paper. They still need to resolve several remaining issues in the figures.
1) Now, two figures have the same number: 3.
2) The font size in the last three figures (3 bis, 4, and 5) should be enlarged. There is sufficient space available.
Author Response
We thank the reviewer for the careful reading of our manuscript and the helpful revisions suggested.
Comment 1: Now, two figures have the same number: 3.
Response 1: Thank you for noticing this issue; the duplicated numbering of the figures was caused by an error in the LaTeX code when inserting the figures, and we have now corrected the figure numbers in the revised version.
Comment 2: The font size in the last three figures (3 bis, 4, and 5) should be enlarged. There is sufficient space available.
Response 2: We thank the reviewer for this helpful suggestion. We have increased the font size in the last three figures to improve readability in the revised manuscript. If the reviewer still finds the text insufficiently legible, we can further adjust the layout by placing these figures in a single column instead of two columns.
