Article
Peer-Review Record

Uncertainty Quantification to Assess the Generalisability of Automated Masonry Joint Segmentation Methods

Infrastructures 2025, 10(4), 98; https://doi.org/10.3390/infrastructures10040098
by Jack M. W. Smith * and Chrysothemis Paraskevopoulou
Reviewer 1:
Reviewer 3: Anonymous
Submission received: 28 February 2025 / Revised: 3 April 2025 / Accepted: 12 April 2025 / Published: 18 April 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The manuscript addresses an important and timely topic regarding the automation of masonry joint segmentation in railway tunnels using deep learning techniques. It is particularly valuable in the context of aging infrastructure, which is critical for ensuring the safety of railway networks. The proposed use of uncertainty quantification methods in conjunction with semantic segmentation is highly relevant and innovative, especially for practical applications in civil engineering. The manuscript is well-structured and provides a clear introduction to the problem and the motivation behind the study. 
1. Image segmentation with unsupervised learning should be reviewed with state-of-the-art papers such as 10.1155/2024/8933148.

2. There are some minor typos and formatting inconsistencies (e.g., missing punctuation marks, inconsistent spacing between paragraphs) that should be addressed to improve readability.

3. The computational cost of uncertainty quantification (increased inference time) is briefly mentioned, but the manuscript would benefit from more detailed discussions on how this might be overcome in real applications. 

4. While the study shows promise, there is little discussion on how engineers would actually use the uncertainty maps.

5. How the results could assist infrastructure management should be mentioned, with reference to work such as 10.1061/JBENF2.BEENG-6159 and other related studies.

6. The figures are useful and provide clear visualizations of segmentation results and uncertainty maps. However, figure captions could be more descriptive. 

7. The results are thorough, and the performance evaluation metrics (e.g., Intersection Over Union) are appropriate for the task. However, the discussion around uncertainty metrics could be expanded to better explain the practical implications of these results. For instance, how might uncertainty be used to guide engineering decisions, such as whether a particular tunnel section requires further inspection or manual correction?

Author Response

The authors would like to thank the reviewer for their comments and suggestions towards the improvement of the paper’s quality. The Reviewer’s comments have been taken into account by the authors and specific changes are made as follows:

Reviewer 1:

The manuscript addresses an important and timely topic regarding the automation of masonry joint segmentation in railway tunnels using deep learning techniques. It is particularly valuable in the context of aging infrastructure, which is critical for ensuring the safety of railway networks. The proposed use of uncertainty quantification methods in conjunction with semantic segmentation is highly relevant and innovative, especially for practical applications in civil engineering. The manuscript is well-structured and provides a clear introduction to the problem and the motivation behind the study. 


  1. Image segmentation with unsupervised learning should be reviewed with state-of-the-art papers such as 10.1155/2024/8933148.

R1: This paper focuses on uncertainty quantification for supervised deep learning approaches. As a result, unsupervised methods are not reviewed here.

  2. There are some minor typos and formatting inconsistencies (e.g., missing punctuation marks, inconsistent spacing between paragraphs) that should be addressed to improve readability.

R2: This has been adjusted.

  3. The computational cost of uncertainty quantification (increased inference time) is briefly mentioned, but the manuscript would benefit from more detailed discussions on how this might be overcome in real applications.

R3: The following text is added to the conclusion to elaborate further (Page 17, Line 527):

“There is a substantial runtime increase when implementing uncertainty quantification methods. The runtime increase is proportional to the number of augmentations or dropout variations that are assessed, since inference needs to be computed for every Monte Carlo and augmentation sample. For this study, implementing TTA and MCD increased the inference time by approximately 1500%, as 100 MCD samples and 50 TTA samples were used. Reducing the number of samples reduces the effectiveness of the method as it is necessary to generate a distribution of outputs in order to more confidently determine the mean and standard deviation of them. With a low number of Monte Carlo samples it is possible that key augmentation/dropout permutations are missed, generating misleading results. It is recommended that MCD and TTA should only be implemented on standard office hardware when the computational cost of inference is small. Alternatively, cloud compute instances could be rented for the inference process. This would prevent the analysis time forming a bottleneck in conducting a condition assessment without requiring the purchase of expensive specialist hardware that may only have occasional use over the lifetime of a project.”
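As a rough illustration of why the cost grows with the number of samples, the sketch below runs a stand-in stochastic predictor repeatedly for a single pixel and summarises the outputs by their mean and standard deviation. The function `noisy_joint_probability` is a hypothetical placeholder for one MCD or TTA forward pass; it is not the network used in the paper, and the Gaussian noise is purely an assumption for demonstration.

```python
import random
import statistics


def noisy_joint_probability(rng):
    """Hypothetical stand-in for one stochastic forward pass (one MCD or TTA sample).

    Returns a joint probability for a single pixel, clipped to [0, 1].
    """
    return min(1.0, max(0.0, rng.gauss(0.7, 0.1)))


def monte_carlo_summary(n_samples, seed=0):
    """Run n_samples forward passes; return (mean, std, number of passes)."""
    rng = random.Random(seed)
    outputs = [noisy_joint_probability(rng) for _ in range(n_samples)]
    # One forward pass is needed per sample, so runtime grows linearly with n_samples.
    return statistics.mean(outputs), statistics.stdev(outputs), len(outputs)


if __name__ == "__main__":
    for n in (5, 50, 100):
        mean, std, passes = monte_carlo_summary(n)
        print(f"{n:>3} samples: mean={mean:.3f} std={std:.3f} passes={passes}")
```

With few samples the estimated mean and standard deviation vary more from run to run, which is the trade-off against runtime described in the quoted text.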

  4. While the study shows promise, there is little discussion on how engineers would actually use the uncertainty maps.

R4: The following text is added to section 3.5 (Page 15, Line 565):

“Without uncertainty maps, it would be necessary for an engineer to inspect every segmentation map in detail to validate the segmentations. However, Sections 3.3 and 3.4 show that although high levels of uncertainty are correlated with poor performance, it is possible for a patch with low epistemic or aleatoric uncertainty to also generate a poor segmentation. As a result, areas with poor performance, where the segmentation needs to be manually analyzed and corrected cannot be exclusively determined using uncertainty values. It is necessary for an engineer to take a holistic approach when identifying locations with poor segmentation performance. The following workflow is suggested:

  1. An engineer should identify the typical masonry block dimensions from the segmentation maps. If there are multiple types of masonry present, then the engineer should conduct the following steps over each type of masonry in turn as uncertainty values are not directly comparable between areas with substantially different properties.
  2. TTA and MCD image patches should be sorted by uncertainty level.
  3. Starting from patches with the highest uncertainty, the patch predictions should be observed alongside the pixelwise uncertainty values and the input depth map. If the predicted joint locations do not appear realistic, then the segmentation should be manually corrected. The MCD pixel uncertainty maps show segmentation outputs with variants of the trained neural network. They can therefore be used as a guide to identify more realistic segmentation candidates.
  4. Step 3 should be conducted for patches with progressively smaller uncertainties until the observed patches have qualitatively acceptable segmentations.
  5. It is necessary to account for areas where there may be poor segmentation performance despite a low level of uncertainty being identified. These regions are likely caused by abnormalities in the input depth map caused by tunnel features that have not been encountered during training and are challenging to accurately identify from the depth map alone. While many of these cases will lead to epistemic uncertainty, it is possible for the network to be confidently incorrect if a joint is not visible in the depth map. This may occur, for example, if the mortar is level with the masonry surface. In addition, high levels of noise are not always detected by TTA as aleatoric uncertainty if no reasonable segmentations can be generated. As a result, the engineer should conduct further pixelwise segmentation verification in areas they have identified as anomalous during their on-site qualitative inspection of the tunnel.

 

Although this method is not guaranteed to remove all incorrect segmentations, it is a cost effective procedure for improving segmentation performance given limited available manual analysis time and would substantially reduce the analysis time compared to fully manual labelling of masonry block locations.”  
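The ordering part of the workflow above (steps 1 to 4) can be sketched as follows. This is a minimal sketch under stated assumptions: the `Patch` record, its field names, and the `stop_after` stopping rule are hypothetical, and `looks_acceptable` stands in for the engineer's visual judgement rather than any automated check from the paper.

```python
from dataclasses import dataclass


@dataclass
class Patch:
    """Hypothetical record for one image patch; field names are illustrative."""
    patch_id: str
    masonry_type: str
    uncertainty: float  # patch-level TTA or MCD uncertainty


def review_order(patches):
    """Group patches by masonry type, then sort each group from most to least uncertain."""
    groups = {}
    for patch in patches:
        groups.setdefault(patch.masonry_type, []).append(patch)
    return {
        kind: sorted(group, key=lambda p: p.uncertainty, reverse=True)
        for kind, group in groups.items()
    }


def triage(ordered, looks_acceptable, stop_after=3):
    """Walk patches in descending uncertainty and collect those needing correction.

    Review stops once stop_after consecutive patches are judged acceptable
    (an assumed stopping rule, not one specified in the paper).
    """
    to_correct, streak = [], 0
    for patch in ordered:
        if looks_acceptable(patch):
            streak += 1
            if streak >= stop_after:
                break
        else:
            streak = 0
            to_correct.append(patch.patch_id)
    return to_correct


if __name__ == "__main__":
    patches = [
        Patch("a", "brick", 0.9), Patch("b", "brick", 0.2),
        Patch("c", "stone", 0.5), Patch("d", "brick", 0.6),
    ]
    ordered = review_order(patches)
    print([p.patch_id for p in ordered["brick"]])
    print(triage(ordered["brick"], lambda p: p.uncertainty < 0.5, stop_after=1))
```

Step 5 is deliberately not covered: confidently incorrect patches have low uncertainty by definition, so they would need to be flagged from the on-site inspection rather than from this ordering.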

  5. How the results could assist infrastructure management should be mentioned, with reference to work such as 10.1061/JBENF2.BEENG-6159 and other related studies.

R5: A short discussion on the impact on infrastructure management is added to section 1 as follows (Page 2, Line 56):

“It is vital that condition assessment tasks are digitalised, as automated digital analysis workflows enable better standardisation and traceability of the reasoning behind maintenance recommendations [8]. Generating consistent and reliable automated structural condition assessments would also pave the way for more reliable predictive maintenance strategies [9,10], reducing cost and improving safety across an asset manager’s portfolio.”

  6. The figures are useful and provide clear visualizations of segmentation results and uncertainty maps. However, figure captions could be more descriptive.

R6: These have been updated.

  7. The results are thorough, and the performance evaluation metrics (e.g., Intersection Over Union) are appropriate for the task. However, the discussion around uncertainty metrics could be expanded to better explain the practical implications of these results. For instance, how might uncertainty be used to guide engineering decisions, such as whether a particular tunnel section requires further inspection or manual correction?

R7: See R4

Reviewer 2 Report

Comments and Suggestions for Authors

The main purpose of this article is to compare 2 methods for masonry joint segmentation of tunnel linings. More precisely, the Monte Carlo Dropout and the Test Time Augmentation processes are compared. The data were retrieved from 4 linings of masonry, stone and brickwork, in the UK that have been measured with lidar. The data are then divided into learning and test data.
Three learning networks have been composed, the first being well generalised, the second overfitted and the third underfitted. In order to qualify the processes and the networks, both the Intersection Over Union and the Area Variation Coefficient criteria were applied. It turns out there is no clear preference or best choice for the process or the accuracy criterion. Therefore, further research is needed before obtaining a practical tool for masonry joint segmentation of tunnel lining.
The reviewer understands this submission to be a reflection of the present state of development. The paper is quite heavy to read, which can be expected from this type of topic. However, the latter is of high relevance to the community, since thorough analysis of weathered and torn masonry, whether it be with micro-, meso- or macromodels, is of capital importance for the assessment of old tunnel linings.
The illustrations are well presented and the wording and writing is good.
This reviewer has but small comments and questions, which will be easily addressed by the authors.
1.    In subsection 2.6, line 208, the question arises why you call Figure 1 (left) a depth map? It seems to be a 2-D map, and joint depth will probably not be quantified in this manner. Please clarify.
2.    Equally, in subsection 2.9, you state that contrast adjustment (scaling of pixel values) allows the different mortar depths to be qualified. How is the depth quantified? Obviously, for practical application, the mortar depth of joints is of capital importance.
3.    In subsection 3.1, would you not agree that the accuracy aimed at should approach a value of 1? The outcome for tunnels 2 and 4 is rather poor, compared to the images of Figure 2.

Author Response

The authors would like to thank the reviewer for their comments and suggestions towards the improvement of the paper’s quality. The Reviewer’s comments have been taken into account by the authors and specific changes are made as follows:

Reviewer 2:

The main purpose of this article is to compare 2 methods for masonry joint segmentation of tunnel linings. More precisely, the Monte Carlo Dropout and the Test Time Augmentation processes are compared. The data were retrieved from 4 linings of masonry, stone and brickwork, in the UK that have been measured with lidar. The data are then divided into learning and test data.
Three learning networks have been composed, the first being well generalised, the second overfitted and the third underfitted. In order to qualify the processes and the networks, both the Intersection Over Union and the Area Variation Coefficient criteria were applied. It turns out there is no clear preference or best choice for the process or the accuracy criterion. Therefore, further research is needed before obtaining a practical tool for masonry joint segmentation of tunnel lining.
The reviewer understands this submission to be a reflection of the present state of development. The paper is quite heavy to read, which can be expected from this type of topic. However, the latter is of high relevance to the community, since thorough analysis of weathered and torn masonry, whether it be with micro-, meso- or macromodels, is of capital importance for the assessment of old tunnel linings.
The illustrations are well presented and the wording and writing is good.
This reviewer has but small comments and questions, which will be easily addressed by the authors.
1.    In subsection 2.6, line 208, the question arises why you call Figure 1 (left) a depth map? It seems to be a 2-D map, and joint depth will probably not be quantified in this manner. Please clarify.

R1:

The depth map presented here is the 3D offset of the tunnel lining point cloud from a fitted cylinder. This has been unrolled into a plane and converted to a 2D image where pixel values represent the out of plane distance into the wall. The depth of mortar or block damages can be seen through this. Colour or lidar intensity data are not used in this study. The following text is added to section 2.6 to clarify this (Page 5, Line 214):

“Pixel intensities in the depth map image correspond to the out of plane distance of each point from an ideal cylindrical tunnel lining.”
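The unrolling described in this response can be sketched as below. This is an illustrative sketch only: it assumes the fitted cylinder axis already coincides with the y-axis and that the radius is known, whereas a real lining would first need a cylinder fit and a change of coordinates. The function names and the mean-per-pixel binning are assumptions, not the authors' implementation.

```python
import math


def unroll_point(x, y, z, radius):
    """Map a lidar point to (arc length, chainage, depth) for a cylinder along the y-axis."""
    r = math.hypot(x, z)        # radial distance from the cylinder axis
    theta = math.atan2(z, x)    # angle around the tunnel cross-section
    # depth > 0 means the point lies further into the wall than the ideal cylinder
    return radius * theta, y, r - radius


def rasterise(points, radius, pixel_size, width, height):
    """Bin unrolled points into a 2D grid whose values are the mean depth per pixel."""
    sums = [[0.0] * width for _ in range(height)]
    counts = [[0] * width for _ in range(height)]
    for x, y, z in points:
        u, v, depth = unroll_point(x, y, z, radius)
        col, row = int(u // pixel_size), int(v // pixel_size)
        if 0 <= row < height and 0 <= col < width:
            sums[row][col] += depth
            counts[row][col] += 1
    # Pixels that received no points are left as None rather than interpolated.
    return [[s / c if c else None for s, c in zip(srow, crow)]
            for srow, crow in zip(sums, counts)]
```

In this representation a recessed mortar joint appears as a line of pixels with larger depth values than the surrounding block faces.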


2.    Equally, in subsection 2.9, you state that contrast adjustment (scaling of pixel values) allows the different mortar depths to be qualified. How is the depth quantified? Obviously, for practical application, the mortar depth of joints is of capital importance.

R2:

The mortar joint depths are presented in each depth map image used as training data. Scaling the pixel values (changing the image contrast) in the depth map image changes the represented depth of mortar. Creating a wider variation in depth of mortar within the training data improves the generalisability of the network. The following text is added to section 2.9 (Page 7, Line 277):

“Through trial and error, the data augmentations that lead to the best test data performance were determined and applied for Network A.”
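A contrast augmentation of the kind described in this response might look like the sketch below. It assumes pixel values are signed offsets from the ideal cylinder, so scaling about a reference of 0.0 changes the represented joint depth. The function names and the factor range of 0.5 to 1.5 are illustrative assumptions; the augmentation settings actually used for Network A are not given in this record.

```python
import random


def scale_depth_contrast(depth_map, factor, reference=0.0):
    """Scale every pixel's offset from `reference` by `factor`.

    factor > 1 exaggerates the represented joint depth; factor < 1 flattens it.
    """
    return [[reference + factor * (value - reference) for value in row]
            for row in depth_map]


def random_contrast_augment(depth_map, low=0.5, high=1.5, rng=None):
    """Apply a randomly drawn contrast factor; the range is an illustrative assumption."""
    rng = rng or random.Random()
    factor = rng.uniform(low, high)
    return scale_depth_contrast(depth_map, factor), factor


if __name__ == "__main__":
    patch = [[0.0, 0.01], [0.02, 0.0]]  # toy depth values in metres
    augmented, factor = random_contrast_augment(patch, rng=random.Random(1))
    print(factor, augmented)
```

Because only the depth values change, the joint label mask for the patch stays valid after this augmentation.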


3.    In subsection 3.1, would you not agree that the accuracy aimed at should approach a value of 1? The outcome for tunnels 2 and 4 is rather poor, compared to the images of Figure 2.

R3:

Yes, if the neural network output were to perfectly match the ground truth, then an ideal segmentation would be achieved and the IOU would be 1. Typically, an IOU of 0.5 is considered adequate for segmentation tasks. However, we are only considering blocks with topologically correct joint maps, i.e. blocks with full joint closure. The IOU drops significantly when there are many gaps in the segmented joints, as this causes the detected block to be substantially larger than it should be, e.g. for Tunnels 2 and 4 in Networks B and C. The following text is added to section 3.1 (Page 8, Line 321):

“Although the decrease in performance when moving from Network A to B and C is qualitatively visible in Figure 2 for tunnels 2 and 4, the IOU shown in Table 3 decreases substantially. This is because even when the joints are segmented largely correctly, small gaps in the joints connect adjacent block instances, leading to a substantial breakdown in block segmentation performance.”
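The effect described above can be reproduced on a toy grid. The sketch below is not the authors' evaluation code: it assumes 4-connected flood fill to extract a block instance and uses a hand-made 5 x 7 example in which a single missed joint pixel merges two blocks.

```python
def flood_fill(grid, start):
    """Return the set of non-joint pixels 4-connected to `start` (joint pixels are 1)."""
    rows, cols = len(grid), len(grid[0])
    seen, stack = set(), [start]
    while stack:
        r, c = stack.pop()
        if (r, c) in seen or not (0 <= r < rows and 0 <= c < cols) or grid[r][c] == 1:
            continue
        seen.add((r, c))
        stack.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return seen


def iou(a, b):
    """Intersection over union of two pixel sets."""
    return len(a & b) / len(a | b)


def joint_pixels(grid):
    """Return the set of pixel coordinates labelled as joint."""
    return {(r, c) for r, row in enumerate(grid) for c, v in enumerate(row) if v == 1}


# Ground truth: a vertical joint (1) separating two 5 x 3 blocks (0).
truth = [[0, 0, 0, 1, 0, 0, 0] for _ in range(5)]
# Prediction: the same joint with a single missed pixel in the middle row.
pred = [row[:] for row in truth]
pred[2][3] = 0

joint_iou = iou(joint_pixels(truth), joint_pixels(pred))
block_iou = iou(flood_fill(truth, (0, 0)), flood_fill(pred, (0, 0)))
print(f"joint IoU = {joint_iou:.2f}, block IoU = {block_iou:.2f}")
```

Here the joint IoU is 4/5 = 0.80, but the block IoU falls to 15/31, roughly 0.48, because the predicted block leaks through the gap and absorbs its neighbour.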

Reviewer 3 Report

Comments and Suggestions for Authors

This research analyzes the potential of Monte Carlo Dropout (MCD) and Test Time Augmentation (TTA) methods for quantifying the uncertainty of neural networks used in masonry joint segmentation from lidar data. The goal is to enhance the reliability of automated procedures for assessing the condition of masonry-lined tunnels, which rely on deep learning-based segmentation.  

The study proposes a method for automatically identifying anomalous segmentations, allowing engineers to manually adjust or remove them. The main contributions include: a comparison between MCD and TTA for uncertainty evaluation, an analysis of the relationship between uncertainty and model performance, and an examination of the usefulness of generated uncertainty maps.  

The paper outlines the process of training a neural network for masonry joint semantic segmentation and then adapts the model to assess the two uncertainty quantification methods. By comparing MCD and TTA, the research investigates the extent to which uncertainty can serve as an indicator of model performance. The findings can contribute to improving the reliability of automated structural inspection methods, enabling specialists to interpret data more accurately and confidently. The article makes original contributions to the field, the research method is innovative, and the bibliography is comprehensive. Consequently, the article can be accepted for publication.

Author Response

The authors would like to thank the reviewer for their time and analysis of the paper. No changes are proposed according to their comments.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

Although this paper primarily focuses on uncertainty quantification in supervised deep learning approaches, it is important to note that unsupervised methods for uncertainty quantification often offer a more efficient alternative. Recent research, such as the study by [10.1155/2024/8933148], highlights the advantages of unsupervised techniques, including reduced reliance on labeled data and enhanced scalability in complex datasets. This work, along with other critical studies in the field, provides valuable guidance for advancing research in unsupervised uncertainty quantification, emphasizing its potential to complement or even surpass supervised methods in certain applications. 

Author Response

Comments 1:

Although this paper primarily focuses on uncertainty quantification in supervised deep learning approaches, it is important to note that unsupervised methods for uncertainty quantification often offer a more efficient alternative. Recent research, such as the study by [10.1155/2024/8933148], highlights the advantages of unsupervised techniques, including reduced reliance on labeled data and enhanced scalability in complex datasets. This work, along with other critical studies in the field, provides valuable guidance for advancing research in unsupervised uncertainty quantification, emphasizing its potential to complement or even surpass supervised methods in certain applications.

 

R1: This paper focuses on uncertainty quantification for supervised deep learning approaches. A short review of unsupervised uncertainty quantification is added as follows to section 2 line 138:

 

Unsupervised methods for anomaly segmentation and associated uncertainty estimation have also been developed [25]. These methods typically involve self-training a student network with a teacher one and have recently been applied to medical images [26] and for structural surface damage detection [27]. However, these methods have not been designed to quantify the aleatoric and epistemic uncertainty of the output from existing trained models, so are not examined further. 

Author Response File: Author Response.docx

Round 3

Reviewer 1 Report

Comments and Suggestions for Authors

All concerns have been addressed.
