Semi-Symmetrical, Fully Convolutional Masked Autoencoder for TBM Muck Image Segmentation
Abstract
1. Introduction
1.1. Challenges in TBM Muck Segmentation Task
1.2. Innovation of Our Work
- The lightweight decoder of the original MAE [33] is replaced by a multi-tier sparse convolutional decoder. This modification allows reconstruction errors tied to the precise spatial locations of low-level image features to be computed accurately and back-propagated through the decoder, which in turn strengthens the encoder's feature extraction capabilities (a dense sketch of one decoder tier follows this list).
- Histogram of Oriented Gradients (HOG) descriptors [38] and labels generated through Gauss–Laplace filtering of the original image (referred to as "Laplacian features" henceforth) have been incorporated as additional self-supervision targets. These targets provide the model with extra training signals for contrast relationships and boundary features, preventing convergence to trivial solutions.
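Below is a minimal, dense PyTorch sketch of one tier of such a decoder. It is illustrative only: the actual MuckSeg-SS-FCMAE decoder operates on sparse tensors (via MinkowskiEngine) so that computation is restricted to unmasked locations, and the class names, channel widths, and block counts here are assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class ConvNeXtBlock(nn.Module):
    """Simplified ConvNeXt-style residual block (dense illustration)."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, kernel_size=7, padding=3, groups=ch),  # depth-wise conv
            nn.GroupNorm(1, ch),                # stand-in for channel-wise LayerNorm
            nn.Conv2d(ch, 4 * ch, kernel_size=1),
            nn.GELU(),
            nn.Conv2d(4 * ch, ch, kernel_size=1),
        )

    def forward(self, x):
        return x + self.body(x)

class DecoderTier(nn.Module):
    """One tier of a multi-tier decoder: 2x up-sampling followed by a few
    ConvNeXt-style blocks that refine spatial detail at the new scale."""
    def __init__(self, in_ch, out_ch, n_blocks=2):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.blocks = nn.Sequential(*[ConvNeXtBlock(out_ch) for _ in range(n_blocks)])

    def forward(self, x):
        return self.blocks(self.up(x))

# Example: three tiers carry an 8x down-sampled latent map back toward input scale
decoder = nn.Sequential(DecoderTier(256, 128), DecoderTier(128, 64), DecoderTier(64, 32))
out = decoder(torch.randn(1, 256, 16, 16))   # -> (1, 32, 128, 128)
```

Because each tier back-propagates errors at its own spatial resolution, low-level localization mistakes are penalized directly instead of being absorbed by a single lightweight projection.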
1.3. Terms and Abbreviations
2. Materials and Methods
2.1. Fundamentals of Fully Convolutional Masked Autoencoders
- MAEs obscure a substantial portion (typically over 60%) of the input image on a patch-by-patch basis. A patch is a small, contiguous area within the image, typically 16 × 16 pixels. This is the most salient feature distinguishing MAEs from denoising autoencoders. This masking strategy prevents the model from merely replicating adjacent pixels in the masked areas to reconstruct the image, thereby compelling the model to capture more abstract image features.
- MAEs generate training signals by comparing the pixel value differences between the model's output and the original input image. In contrast, BeiT [31] initially pre-trains a dVAE [39] to map image patches to visual tokens and to build a codebook, which turns the reconstruction of masked areas into a classification task over the codebook. However, experiments with MAEs have demonstrated that the tokenization step is not essential for pre-training in Computer Vision [33]. (A sketch of patch masking and the masked-pixel reconstruction loss follows this list.)
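To make both points concrete, here is a minimal sketch of random patch masking and an MSE reconstruction loss evaluated only on the masked patches, in the spirit of MAE [33]; the 60% mask ratio and 16 × 16 patch size follow the description above, while the function names are ours.

```python
import torch

def random_patch_mask(h, w, patch=16, mask_ratio=0.6):
    """Boolean (h, w) mask; True marks pixels belonging to masked patches."""
    gh, gw = h // patch, w // patch
    flat = torch.zeros(gh * gw, dtype=torch.bool)
    flat[torch.randperm(gh * gw)[:int(gh * gw * mask_ratio)]] = True
    # Expand the patch-grid mask back to pixel resolution
    return flat.view(gh, gw).repeat_interleave(patch, 0).repeat_interleave(patch, 1)

def masked_mse(pred, target, mask):
    """MSE evaluated only on masked pixels, as in MAE-style reconstruction."""
    m = mask.to(pred.dtype)
    return ((pred - target) ** 2 * m).sum() / m.sum().clamp(min=1)

img = torch.rand(1, 1, 256, 256)
mask = random_patch_mask(256, 256)          # masks ~60% of the 16 x 16 patches
recon = torch.rand_like(img)                # stand-in for the decoder's output
loss = masked_mse(recon, img, mask)
```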
2.2. Design of Semi-Symmetrical, Fully Convolutional Masked Autoencoders
2.2.1. Challenges Associated with the Original MAE in Muck Segmentation Tasks
2.2.2. Self-Supervision Target
- The input image after 4× down-sampling.
- The HOG feature map of the input image. The HOG descriptor, a feature descriptor widely employed in Computer Vision tasks, encapsulates local shape and texture information by calculating the distribution of gradient orientations within localized sections of an image. As depicted in Figure 5, the MuckSeg-SS-FCMAE computes HOG descriptors using 4 × 4 pixel cells across 8 directions. Block normalization is executed over an 8 × 8 pixel block with a 4 × 4 pixel stride, using the L2-Hys normalization method with a threshold of 0.4. The block-level HOG descriptors are then reorganized into a cell-wise format, as illustrated in Figure 5b. Because each cell's HOG descriptor can be used up to four times during block normalization, this process yields 4 distinct sets of HOG feature vectors with dimensions (W/4) × (H/4) × 8, where W and H represent the width and height of the input image in pixels, respectively. These sets are concatenated to form a comprehensive HOG feature map with dimensions (W/4) × (H/4) × 32, which serves as the supervision target, as demonstrated in Figure 5c. Note that the HOG feature vectors at the image's periphery (the corners and edges) participate in block normalization only once or twice and are therefore replicated to complete the feature map (a sketch of this computation follows this list).
- Boundary features extracted via Gauss–Laplace filtering. The Laplacian filter, a second-order derivative filter, is employed to enhance edge features in images. Because it is sensitive to noise, it is typically combined with Gaussian filtering to form Gauss–Laplace filtering. As depicted in Figure 6, the input image is first blurred with a Gaussian filter with a kernel size of 13, then filtered with a Laplacian filter with a kernel size of 5, and normalized. The filtered image is binarized with a threshold of 0.8, and morphological opening is applied to eliminate noise spots. Finally, the image is down-sampled by a factor of 4 to produce a binary image with dimensions (W/4) × (H/4) × 1, which is used for supervision (a sketch of this pipeline also follows this list).
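A possible implementation of the HOG target using scikit-image is sketched below. Note one deliberate deviation: skimage's L2-Hys normalization clips at a fixed 0.2 rather than the 0.4 threshold stated above; the function name and array layout are our assumptions.

```python
import numpy as np
from skimage.feature import hog

def hog_feature_map(image):
    """Cell-wise HOG feature map roughly following the recipe above.

    image: (H, W) array with H and W divisible by 4.
    Returns (H/4, W/4, 32): the four block-normalized copies of each
    cell's 8-bin histogram, concatenated; edge cells replicate copies.
    """
    # Block-level descriptors of shape (n_by, n_bx, 2, 2, 8): 8x8-px blocks
    # of 2x2 cells at a one-cell (4x4-px) stride, which skimage always uses.
    blocks = hog(image, orientations=8, pixels_per_cell=(4, 4),
                 cells_per_block=(2, 2), block_norm='L2-Hys',
                 feature_vector=False)
    n_by, n_bx = blocks.shape[:2]
    n_cy, n_cx = n_by + 1, n_bx + 1            # cells per axis
    copies = []
    for dy in (0, 1):
        for dx in (0, 1):
            # The block holding cell (i, j) at intra-block slot (dy, dx);
            # clipping replicates descriptors for corner and edge cells.
            by = np.clip(np.arange(n_cy) - dy, 0, n_by - 1)
            bx = np.clip(np.arange(n_cx) - dx, 0, n_bx - 1)
            copies.append(blocks[by][:, bx, dy, dx, :])   # (n_cy, n_cx, 8)
    return np.concatenate(copies, axis=-1)                # (n_cy, n_cx, 32)
```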
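The Laplacian-feature pipeline maps naturally onto OpenCV, as in the sketch below; the structuring element used for the morphological opening is not specified in the text, so the 3 × 3 ellipse here is an assumption.

```python
import cv2
import numpy as np

def laplacian_target(image):
    """Binary boundary supervision target via Gauss-Laplace filtering.

    image: (H, W) float32 array in [0, 1], H and W divisible by 4.
    Returns an (H/4, W/4) uint8 binary map.
    """
    blurred = cv2.GaussianBlur(image, (13, 13), 0)   # sigma derived from size
    lap = cv2.Laplacian(blurred, cv2.CV_32F, ksize=5)
    lap = cv2.normalize(lap, None, 0.0, 1.0, cv2.NORM_MINMAX)
    binary = (lap > 0.8).astype(np.uint8)
    # Opening removes isolated noise spots; the kernel is our assumption
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
    h, w = opened.shape
    return cv2.resize(opened, (w // 4, h // 4), interpolation=cv2.INTER_NEAREST)
```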
2.2.3. Network Structure
2.2.4. Decoder Used for Downstream End-to-End Finetuning
2.2.5. Loss Function
2.3. Data Preparation
3. Experiments and Results
3.1. Experiment Environment
3.1.1. Hardware Specifications
- CPU: AMD Ryzen 9 5950X 16-core processor.
- GPU: Nvidia GeForce RTX 4090.
- RAM: 64 GB DDR4 at 3200 MHz.
3.1.2. Software Environment
- Operating system: Ubuntu 22.04.
- Programming language: Python 3.9.13.
- Deep Learning Framework: PyTorch 2.0.1 with Lightning 2.0.1.
- Sparse convolution operator library: MinkowskiEngine 0.5.4.
3.2. Evaluation Criteria
- Image reconstruction capability. The autoencoder’s ability to reconstruct images is a vital metric for determining if the learned representations hold enough information to replicate the original input. This reflects not only the model’s grasp of the input data’s structure but also its ability to identify essential features.
- Feature extraction capability of the MAE encoder. The MAE is designed to empower the encoder to identify fundamental image features autonomously. This positions the model at a more advantageous starting point in the parameter space for downstream tasks, increasing the likelihood of optimal parameter convergence and reducing the risk of entrapment in local minima. Furthermore, exposing the encoder to a broader dataset enhances the model’s generalization and robustness. Thus, the MAE encoder’s proficiency in discerning target feature semantics is crucial for gauging MAE’s effectiveness.
- Performance improvement when transferring to downstream tasks. The MAE's impact on muck segmentation tasks is most directly observed by comparing the improvement in the IoU of boundary and region labels predicted by the pre-trained network against those of a network trained from scratch. The IoU is calculated as follows:

IoU = |P ∩ G| / |P ∪ G|

where P denotes the set of pixels predicted to carry a label and G the corresponding ground-truth set.
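For binary masks, the metric reduces to a few NumPy operations, as in this sketch (the function name is ours):

```python
import numpy as np

def iou(pred, gt):
    """IoU between two binary masks of equal shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0                      # both masks empty: perfect agreement
    return np.logical_and(pred, gt).sum() / union
```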
3.3. Training Procedure
3.4. Image Reconstruction Capability
3.5. Feature Extraction Capability
3.6. Improvement on Segmentation Results
3.7. Hyperparameter Optimization
1. Non-independent hyperparameters include the following:
   - The dimension of the feature map produced by the stem block, denoted as D.
   - The count of ConvNeXt blocks in each encoder tier, denoted as Nei for i = 1, 2, 3, and 4.
2. Independent hyperparameters include the following:
   - The feature dimension of the mask token, denoted as Dm.
   - The count of ConvNeXt blocks in each MAE decoder tier, denoted as Ndi for i = 1 and 2.
   - The count of ConvNeXt blocks in the MAE neck, denoted as Nn.
- The number of model parameters, measured in millions of parameters (MParams).
- The computational cost of processing a single sample, measured in billions of floating-point operations (GFLOPs).
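For reference, the parameter count in MParams can be read directly off a PyTorch module, as in this small sketch (FLOPs counting typically relies on a profiler such as fvcore or ptflops and is omitted here):

```python
import torch.nn as nn

def mparams(model: nn.Module) -> float:
    """Number of trainable parameters, in millions (MParams)."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

print(f"{mparams(nn.Conv2d(3, 64, 7)):.3f} MParams")  # tiny example module
```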
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Li, J.B.; Jing, L.J.; Zheng, X.F.; Li, P.Y.; Yang, C. Application and Outlook of Information and Intelligence Technology for Safe and Efficient TBM Construction. Tunn. Undergr. Space Technol. 2019, 93, 103097.
- Li, J.B.; Chen, Z.Y.; Li, X.; Jing, L.J.; Zhang, Y.P.; Xiao, H.H.; Wang, S.J.; Yang, W.K.; Wu, L.J.; Li, P.Y.; et al. Feedback on a Shared Big Dataset for Intelligent TBM Part I: Feature Extraction and Machine Learning Methods. Undergr. Space 2023, 11, 1–25.
- Liu, B.; Wang, J.W.; Wang, R.R.; Wang, Y.X.; Zhao, G.Z. Intelligent Decision-Making Method of TBM Operating Parameters Based on Multiple Constraints and Objective Optimization. J. Rock Mech. Geotech. Eng. 2023, 15, 2842–2856.
- Guo, D.; Li, J.H.; Jiang, S.H.; Li, X.; Chen, Z.Y. Intelligent Assistant Driving Method for Tunnel Boring Machine Based on Big Data. Acta Geotech. 2022, 17, 1019–1030.
- Zhang, Y.K.; Gong, G.F.; Yang, H.Y.; Chen, Y.X.; Chen, G.L. Towards Autonomous and Optimal Excavation of Shield Machine: A Deep Reinforcement Learning-Based Approach. J. Zhejiang Univ. Sci. A 2022, 23, 458–478.
- Yokota, Y.; Yamamoto, T.; Shirasagi, S.; Koizumi, Y.; Descour, J.; Kohlhaas, M. Evaluation of Geological Conditions Ahead of TBM Tunnel Using Wireless Seismic Reflector Tracing System. Tunn. Undergr. Space Technol. 2016, 57, 85–90.
- Li, C.S.; Gu, T.; Ding, J.F.; Yu, W.G.; He, F.L. Horizontal Sound Probing (HSP) Geology Prediction Method Appropriated to TBM Construction. J. Eng. Geol. 2008, 16, 111–115.
- Li, S.C.; Nie, L.C.; Liu, B. The Practice of Forward Prospecting of Adverse Geology Applied to Hard Rock TBM Tunnel Construction: The Case of the Songhua River Water Conveyance Project in the Middle of Jilin Province. Engineering 2018, 4, 131–137.
- Kaus, A.; Boening, W. BEAM—Geoelectrical Ahead Monitoring for TBM-Drives. Geomech. Tunn. 2008, 1, 442–449.
- Mohammadi, M.; Khademi Hamidi, J.; Rostami, J.; Goshtasbi, K. A Closer Look into Chip Shape/Size and Efficiency of Rock Cutting with a Simple Chisel Pick: A Laboratory Scale Investigation. Rock Mech. Rock Eng. 2020, 53, 1375–1392.
- Tuncdemir, H.; Bilgin, N.; Copur, H.; Balci, C. Control of Rock Cutting Efficiency by Muck Size. Int. J. Rock Mech. Min. Sci. 2008, 45, 278–288.
- Heydari, S.; Khademi Hamidi, J.; Monjezi, M.; Eftekhari, A. An Investigation of the Relationship between Muck Geometry, TBM Performance, and Operational Parameters: A Case Study in Golab II Water Transfer Tunnel. Tunn. Undergr. Space Technol. 2019, 88, 73–86.
- Barron, L.; Smith, M.L.; Prisbrey, K. Neural Network Pattern Recognition of Blast Fragment Size Distributions. Part. Sci. Technol. 1994, 12, 235–242.
- Jemwa, G.T.; Aldrich, C. Estimating Size Fraction Categories of Coal Particles on Conveyor Belts Using Image Texture Modeling Methods. Expert Syst. Appl. 2012, 39, 7947–7960.
- Rispoli, A.; Ferrero, A.M.; Cardu, M.; Farinetti, A. Determining the Particle Size of Debris from a Tunnel Boring Machine Through Photographic Analysis and Comparison Between Excavation Performance and Rock Mass Properties. Rock Mech. Rock Eng. 2017, 50, 2805–2816.
- Abu Bakar, M.Z.; Gertsch, L.S.; Rostami, J. Evaluation of Fragments from Disc Cutting of Dry and Saturated Sandstone. Rock Mech. Rock Eng. 2014, 47, 1891–1903.
- Al-Thyabat, S.; Miles, N.J.; Koh, T.S. Estimation of the Size Distribution of Particles Moving on a Conveyor Belt. Miner. Eng. 2007, 20, 72–83.
- Chen, H.A.; Jin, Y.; Li, G.Q.; Chu, B. Automated Cement Fragment Image Segmentation and Distribution Estimation via a Holistically-Nested Convolutional Network and Morphological Analysis. Powder Technol. 2018, 339, 306–313.
- Liu, H.Q.; Yao, M.B.; Xiao, X.M.; Xiong, Y.G. RockFormer: A U-Shaped Transformer Network for Martian Rock Segmentation. IEEE Trans. Geosci. Remote Sens. 2023, 61, 4600116.
- Fan, L.L.; Yuan, J.B.; Niu, X.W.; Zha, K.K.; Ma, W.Q. RockSeg: A Novel Semantic Segmentation Network Based on a Hybrid Framework Combining a Convolutional Neural Network and Transformer for Deep Space Rock Images. Remote Sens. 2023, 15, 3935.
- Liang, Z.Y.; Nie, Z.H.; An, A.J.; Gong, J.; Wang, X. A Particle Shape Extraction and Evaluation Method Using a Deep Convolutional Neural Network and Digital Image Processing. Powder Technol. 2019, 353, 156–170.
- Zhou, X.X.; Gong, Q.M.; Liu, Y.Q.; Yin, L.J. Automatic Segmentation of TBM Muck Images via a Deep-Learning Approach to Estimate the Size and Shape of Rock Chips. Autom. Constr. 2021, 126, 103685.
- Vincent, P.; Larochelle, H.; Bengio, Y.; Manzagol, P.-A. Extracting and Composing Robust Features with Denoising Autoencoders. In Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 5–9 July 2008; pp. 1096–1103.
- Pathak, D.; Krahenbuhl, P.; Donahue, J.; Darrell, T.; Efros, A.A. Context Encoders: Feature Learning by Inpainting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 2536–2544.
- Baevski, A.; Hsu, W.N.; Xu, Q.; Babu, A.; Gu, J.; Auli, M. Data2vec: A General Framework for Self-Supervised Learning in Speech, Vision and Language. In Proceedings of the 39th International Conference on Machine Learning; PMLR, Baltimore, MD, USA, 28 June 2022; pp. 1298–1312.
- Assran, M.; Caron, M.; Misra, I.; Bojanowski, P.; Bordes, F.; Vincent, P.; Joulin, A.; Rabbat, M.; Ballas, N. Masked Siamese Networks for Label-Efficient Learning. In Proceedings of the Computer Vision—ECCV, Tel Aviv, Israel, 23–27 October 2022; Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T., Eds.; Springer Nature: Cham, Switzerland, 2022; pp. 456–473.
- He, K.M.; Fan, H.Q.; Wu, Y.X.; Xie, S.N.; Girshick, R. Momentum Contrast for Unsupervised Visual Representation Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9729–9738.
- Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A Simple Framework for Contrastive Learning of Visual Representations. In Proceedings of the 37th International Conference on Machine Learning; PMLR, Virtual Event, 13–18 July 2020; pp. 1597–1607.
- Dong, X.Y.; Bao, J.M.; Zhang, T.; Chen, D.D.; Zhang, W.M.; Yuan, L.; Chen, D.; Wen, F.; Yu, N.H.; Guo, B.N. PeCo: Perceptual Codebook for BERT Pre-Training of Vision Transformers. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 552–560.
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of NAACL-HLT, Minneapolis, MN, USA, 2–7 June 2019; Volume 1, pp. 4171–4186.
- Bao, H.B.; Dong, L.; Piao, S.H.; Wei, F.R. BEiT: BERT Pre-Training of Image Transformers. arXiv 2023, arXiv:2106.08254.
- Dong, X.Y.; Bao, J.M.; Zhang, T.; Chen, D.D.; Zhang, W.M.; Yuan, L.; Chen, D.; Wen, F.; Yu, N.H. Bootstrapped Masked Autoencoders for Vision BERT Pretraining. In Proceedings of the Computer Vision—ECCV, Tel Aviv, Israel, 23–27 October 2022; Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T., Eds.; Springer Nature: Cham, Switzerland, 2022; pp. 247–264.
- He, K.M.; Chen, X.L.; Xie, S.N.; Li, Y.H.; Dollár, P.; Girshick, R. Masked Autoencoders Are Scalable Vision Learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 16000–16009.
- Xie, Z.D.; Zhang, Z.; Cao, Y.; Lin, Y.T.; Bao, J.M.; Yao, Z.L.; Dai, Q.; Hu, H. SimMIM: A Simple Framework for Masked Image Modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 9653–9663.
- Cai, Z.X.; Ghosh, S.; Stefanov, K.; Dhall, A.; Cai, J.F.; Rezatofighi, H.; Haffari, R.; Hayat, M. MARLIN: Masked Autoencoder for Facial Video Representation LearnINg. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 1493–1504.
- Pang, Y.T.; Wang, W.X.; Tay, F.E.H.; Liu, W.; Tian, Y.H.; Yuan, L. Masked Autoencoders for Point Cloud Self-Supervised Learning. In Proceedings of the Computer Vision—ECCV, Tel Aviv, Israel, 23–27 October 2022; Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T., Eds.; Springer Nature: Cham, Switzerland, 2022; pp. 604–621.
- Reed, C.J.; Gupta, R.; Li, S.; Brockman, S.; Funk, C.; Clipp, B.; Keutzer, K.; Candido, S.; Uyttendaele, M.; Darrell, T. Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 4088–4099.
- Dalal, N.; Triggs, B. Histograms of Oriented Gradients for Human Detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), San Diego, CA, USA, 25–26 June 2005; Volume 1, pp. 886–893.
- Ramesh, A.; Pavlov, M.; Goh, G.; Gray, S.; Voss, C.; Radford, A.; Chen, M.; Sutskever, I. Zero-Shot Text-to-Image Generation. In Proceedings of the 38th International Conference on Machine Learning; PMLR, Virtual Event, 18–24 July 2021; pp. 8821–8831.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: New York, NY, USA, 2017; Volume 30.
- Woo, S.; Debnath, S.; Hu, R.H.; Chen, X.L.; Liu, Z.; Kweon, I.S.; Xie, S.N. ConvNeXt V2: Co-Designing and Scaling ConvNets with Masked Autoencoders. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 16133–16142.
- Graham, B.; van der Maaten, L. Submanifold Sparse Convolutional Networks. arXiv 2023, arXiv:1706.01307.
- Deng, M.J. Challenges and Thoughts on Risk Management and Control for the Group Construction of a Super-Long Tunnel by TBM. Engineering 2018, 4, 112–122.
- Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. arXiv 2023, arXiv:1711.05101.
- Smith, L.N. Cyclical Learning Rates for Training Neural Networks. In Proceedings of the 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), Santa Rosa, CA, USA, 17–24 March 2017; pp. 464–472.
- Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.-Y.; et al. Segment Anything. arXiv 2023, arXiv:2304.02643.
Abbreviation | Full Phrase | Description |
---|---|---|
BCE | Binary Cross-Entropy | A loss function for binary classification problems |
BERT | Bidirectional Encoder Representations from Transformers | A self-supervised training framework for natural language processing |
BEiT | Bidirectional Encoder representation from Image Transformers | A BERT-like self-supervised training framework for image processing |
CNN | Convolutional Neural Network | A class of neural networks centered around convolutional operations |
ConvNeXt | – | A modern CNN design concept, which also refers to a specific convolutional module structure |
CV | Computer Vision | A field of study that enables computers to understand visual information from the world |
dVAE | Discrete Variational Autoencoder | A type of generative model that learns to encode data into a discrete latent space |
FCMAE | Fully Convolutional Masked Autoencoder | A variant of the MAE that uses a CNN instead of a transformer as its encoder
HOG | Histogram of Oriented Gradients | A classical image feature descriptor |
IoU | Intersection over Union | A metric used to evaluate the conformity of two sets of spatial positions |
MAE | Masked Autoencoder | A self-supervised training framework for image processing which directly regresses the masked pixels |
MIM | Masked Image Modeling | A category of algorithms that learn image features by artificially removing parts of the information in images |
MLP | Multi-Layer Perceptron | A basic neural network structure |
MSE | Mean Squared Error | A loss function for regression problems |
NLP | Natural Language Processing | A branch of artificial intelligence that focuses on the interaction between computers and humans through natural language |
ROI | Region Of Interest | A selected region within an image, singled out for analysis or processing
SimMIM | – | Another masked image modeling method that is very similar to the MAE |
TBM | Tunnel Boring Machine | A highly integrated piece of heavy machinery used for tunnel construction
UCS | Uniaxial Compressive Strength | A measure of the maximum stress that a rock specimen can withstand under uniaxial loading conditions before failure occurs |
ViT | Vision Transformer | A transformer-based neural network architecture for image processing |
Network Units | Kernel Size | Padding | Number of Feature Dimensions |
---|---|---|---|
Identity | – | – | 1 |
Maximum pooling | 3 × 3 | 1 × 1 | 2 |
Average pooling | 3 × 3 | 1 × 1 | 1 |
Depth-wise convolution | 3 × 3 | 1 × 1 | 8 |
Depth-wise convolution | 5 × 5 | 2 × 2 | 4 |
Depth-wise convolution | 7 × 7 | 3 × 3 | 4 |
Depth-wise convolution | 9 × 9 | 4 × 4 | 4 |
Depth-wise convolution × 2 | 3 × 3 | 1 × 1 | 4 |
Depth-wise convolution × 2 | 5 × 5 | 2 × 2 | 4 |
# | Project Name | Primary Rock Types | Estimated Range of UCS/MPa | Total Image Count |
---|---|---|---|---|
1 | Xi-Er tunnel of the project in [43] | granite, gneiss, and quartzite | 30–150 | over 32,000
2 | Tianshan Shengli tunnel | granite, gneiss, diorite, and sandstone | 50–120 | over 8000
Symbol | Description | Value |
---|---|---|
B | Batch size | 2 |
ntotal | Total training steps | 300,000 |
β1 | 1st-moment decay coefficient of AdamW | 0.9
β2 | 2nd-moment decay coefficient of AdamW | 0.99
λ | Weight decay coefficient of AdamW | 0.01
LB | Base learning rate | 10^−5
LU | Learning rate limit | 10^−4
ζ | Damping coefficient | 1 − 10^−5
SU | Learning rate increase period | 2000 |
SD | Learning rate decrease period | 2000 |
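As an illustration of how the scheduling parameters above can combine, the sketch below implements a triangular cyclical learning rate with per-step exponential damping, in the spirit of Smith's cyclical learning rates; the exact schedule used in the paper may differ.

```python
def cyclical_lr(step: int,
                base_lr: float = 1e-5,      # LB
                max_lr: float = 1e-4,       # LU
                step_up: int = 2000,        # SU
                step_down: int = 2000,      # SD
                damping: float = 1 - 1e-5,  # zeta
                ) -> float:
    """Triangular cyclic LR with per-step exponential damping (a sketch)."""
    pos = step % (step_up + step_down)
    frac = pos / step_up if pos < step_up else 1 - (pos - step_up) / step_down
    return (base_lr + (max_lr - base_lr) * frac) * damping ** step
```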
Metric | MSD-UNet | Scratch | Orig. FCMAE | SS-FCMAE w/o HOG & Lap. | SS-FCMAE |
---|---|---|---|---|---|
IoU (Boundary) | 0.725 | 0.741 | 0.755 | 0.775 | 0.785 |
IoU (Region) | 0.878 | 0.897 | 0.908 | 0.913 | 0.919 |