Exploring Deep Learning Models on GPR Data: A Comparative Study of AlexNet and VGG on a Dataset from Archaeological Sites
Abstract
1. Introduction
2. Methodology
2.1. Dataset Description
- Anomaly: a generic class representing strong subsurface reflections identified as stratigraphic layers, bedrock, buried metallic objects, or buried objects unrelated to the archaeological context. Their shape and size vary, from small and circular (e.g., a metallic object) to large and irregular or exhibiting some linearity (e.g., a stratigraphic layer).
- Noise: reflections of linear form, created either by rough terrain (e.g., plowing lines) or by residual noise remaining after the background-removal correction is applied.
- Structure: patterns of identified buried foundations and walls of residential and public complexes that are linear, forming corners and rectangles. The structural remains included in this dataset date from the Neolithic, Minoan, Hellenistic, Roman, and early Byzantine periods. They were all detected at depths of 0.5–1.5 m. The identified walls exhibit thicknesses of ~0.3 to ~1.5 m, and most structures are built of limestone. Linear patterns delineating ancient roads of the Hellenistic period were also included.
2.1.1. C-scan Processing and Preprocessing
2.1.2. Training Datasets
2.1.3. Evaluation Set
2.2. Deep Learning Architectures
2.2.1. AlexNet
2.2.2. VGG-16 and VGG-19
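Assuming the TensorFlow/Keras stack cited in the references, a rough sketch (not the paper's exact code) of a VGG-16 classifier with a three-class softmax head for Anomaly, Noise, and Structure follows; the 224 × 224 × 3 input size and the decision to train from scratch (weights=None) are assumptions:

```python
import tensorflow as tf

# Minimal sketch of a VGG-16 classifier with a 3-class softmax head
# (Anomaly, Noise, Structure). weights=None trains from scratch; the
# 224x224x3 input size is an illustrative assumption.
def build_vgg16_classifier(input_shape=(224, 224, 3), num_classes=3):
    base = tf.keras.applications.VGG16(
        include_top=False, weights=None, input_shape=input_shape)
    x = tf.keras.layers.Flatten()(base.output)
    x = tf.keras.layers.Dense(4096, activation="relu")(x)
    x = tf.keras.layers.Dropout(0.5)(x)
    x = tf.keras.layers.Dense(4096, activation="relu")(x)
    x = tf.keras.layers.Dropout(0.5)(x)
    out = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(base.input, out)

# VGG-19 differs only in depth: swap in tf.keras.applications.VGG19.
```

With this head, the VGG-16 and VGG-19 variants total roughly 134 M and 140 M parameters, consistent with the counts reported in the table below.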
2.3. Training Overview
- The first trial uses dataset-1, which consists of 15,000 training samples. The resulting models for each architecture were named AlexNet-1, VGG16-1, and VGG19-1.
- The second trial uses dataset-2, which was produced with image augmentation techniques and contains 60,000 training samples. The resulting models were named AlexNet-2, VGG16-2, and VGG19-2.
- The third trial uses dataset-1, with image augmentation applied during training to replace training samples without increasing the dataset volume (see the sketch after this list). The resulting models were named AlexNet-3, VGG16-3, and VGG19-3.
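A minimal sketch of how the second and third trials can differ in practice, assuming Keras preprocessing layers; the specific transforms (flips, rotations, translations) are illustrative assumptions, not the exact augmentation operations used:

```python
import tensorflow as tf

# Hypothetical augmentation pipeline; the chosen transforms are
# illustrative, not the paper's exact augmentation operations.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal_and_vertical"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomTranslation(0.05, 0.05),
])

# Trial 2 enlarges dataset-1 offline (15,000 -> 60,000 samples).
# Trial 3 instead applies the transforms on the fly, so every epoch
# sees altered versions of the 15,000 originals without growing the set.
def make_dataset(images, labels, on_the_fly=True, batch_size=32):
    ds = tf.data.Dataset.from_tensor_slices((images, labels))
    ds = ds.shuffle(len(images)).batch(batch_size)
    if on_the_fly:
        ds = ds.map(lambda x, y: (augment(x, training=True), y),
                    num_parallel_calls=tf.data.AUTOTUNE)
    return ds.prefetch(tf.data.AUTOTUNE)
```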
2.4. Metrics and Performance Evaluation
2.4.1. Training Performance Metrics
- The loss represents the error between the predicted output and the true output for the images of the training set; it measures how well the model fits the training data.
- Validation loss measures the same error for the images of the test set, which were not used during training.
- Accuracy expresses the fraction of correctly classified images out of the total number of images, i.e., the percentage of predictions the model got right on the training set. A higher accuracy indicates better performance.
- Validation accuracy is the accuracy of the model calculated on the test set (see the sketch below).
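Assuming the Keras stack from the sketches above, these four quantities are what `model.fit()` records when the model is compiled with an accuracy metric and given validation data; the optimizer, loss, and epoch count here are placeholder assumptions:

```python
# Placeholder training call; optimizer, loss, and epochs are assumptions.
# Integer class labels are assumed (hence the sparse loss);
# train_ds / val_ds are as built with make_dataset() above.
model = build_vgg16_classifier()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

history = model.fit(train_ds, validation_data=val_ds, epochs=50)

# history.history holds the four curves discussed in Section 3.1:
# "loss", "accuracy", "val_loss", and "val_accuracy" per epoch.
```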
2.4.2. Classification Metrics
- Confusion matrix: a matrix built from the counts of true positives (TPs), false positives (FPs), false negatives (FNs), and true negatives (TNs). It shows the number of correct and incorrect predictions the model makes in each class and helps assess the performance of a classification model.
- Precision: the proportion of TPs among the predicted positives, i.e., the model's ability to identify TPs without including FPs. A high precision indicates a low FP rate.
- Recall: the proportion of TPs among the actual positives, i.e., the model's ability to identify all actual positives. A high recall indicates a low FN rate.
- F1 score: the harmonic mean of precision and recall. It balances the two and is a good single metric for evaluating the overall performance of a classification model.
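All four metrics can be computed from model predictions with scikit-learn, which is cited in the references. A minimal sketch, in which `eval_images` and the integer label array `y_true` are hypothetical names for the evaluation set:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

# Per-class definitions, matching the list above:
#   precision = TP / (TP + FP)
#   recall    = TP / (TP + FN)
#   F1        = 2 * precision * recall / (precision + recall)
class_names = ["Anomaly", "Noise", "Structure"]

y_prob = model.predict(eval_images)   # softmax scores per class
y_pred = np.argmax(y_prob, axis=1)    # predicted class indices

print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred, target_names=class_names))
```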
2.4.3. Grad-CAM
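Grad-CAM highlights the image regions that drove a prediction by weighting a convolutional layer's feature maps with the gradients of the class score. The sketch below follows the standard Grad-CAM recipe in TensorFlow/Keras rather than the paper's exact implementation; `conv_layer_name` must name the (architecture-dependent) last convolutional layer:

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, image, conv_layer_name, class_index=None):
    # Map the input to both the target conv activations and the predictions.
    grad_model = tf.keras.models.Model(
        model.inputs,
        [model.get_layer(conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        if class_index is None:
            class_index = tf.argmax(preds[0])  # default: top predicted class
        class_score = preds[:, class_index]
    # Gradients of the class score w.r.t. the conv feature maps.
    grads = tape.gradient(class_score, conv_out)
    # Global-average-pool the gradients to one weight per channel.
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))
    # Weighted sum of feature maps, ReLU, then normalize to [0, 1].
    cam = tf.nn.relu(tf.reduce_sum(conv_out[0] * weights, axis=-1))
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()
```

Upsampling the returned map to the input size and overlaying it on the C-scan yields the kind of heatmaps discussed in Sections 3.3 and 4.3.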
3. Results
3.1. Training Performance
- AlexNet-1 appears more stable towards the end of training. In contrast to the validation curves, the training curves are smooth, with AlexNet-1 presenting the fastest convergence to 1, followed closely by AlexNet-2. Among the three models, AlexNet-3 performed the worst, having very noisy validation curves, while its training accuracy and loss curves do not converge to 1 and 0, respectively. Convergence of accuracy to 1 and of loss to 0 is an important indicator of how effectively a model learns during training; faster convergence suggests faster learning.
- The models trained with the VGG-16 architecture demonstrate overall better training performance. Fluctuations are still present in the validation curves but are more limited than for the AlexNet models. VGG16-1 and VGG16-2 performed similarly, with VGG16-1 slightly more stable toward the end of training. All three models exhibit smooth training curves, with VGG16-2 converging fastest, followed closely by VGG16-1. VGG16-3 performs worse, exhibiting noisier validation curves and slower convergence of the training curves.
- The behavior of the VGG-19 models is mixed compared to the VGG-16 models. VGG19-1 and VGG19-3 perform worse than VGG16-1 and VGG16-3, respectively, with more fluctuations in the validation curves, although VGG19-1 also stabilizes towards the end of training. On the other hand, VGG19-2 performed best, better than VGG16-2, exhibiting much smoother and more stable validation curves; its training curves also behave well, with the accuracy converging to 1 and the loss to 0.
3.2. Generalization
- For the Anomaly class, most models have precision ranging from 0.80 to 0.87; the exceptions are VGG16-3 and VGG19-3, with relatively low precisions of 0.74 and 0.76, respectively. The highest precision, 0.87, is scored by VGG19-2, which also has a recall of 0.95. The f1-score ranges from 0.83 to 0.91, with VGG19-2 scoring the highest and VGG16-3 the lowest. Overall, the best-performing model for the Anomaly class is VGG19-2.
- The Noise class's precision is very high, ranging from 0.89 to 1.00, with AlexNet-1 scoring the lowest and AlexNet-3, VGG16-2, VGG16-3, and VGG19-2 scoring the highest. Recall ranges from 0.85 to 0.96, with VGG19-3 scoring the lowest and VGG19-2 the highest. Finally, the f1-score is very high, ranging from 0.91 to 0.98, with VGG19-3 the lowest and VGG19-2 the highest. Here, too, VGG19-2 performs best, while VGG19-3 performs worst.
3.3. Grad-CAM Results
4. Discussion
4.1. Training Performance Comparisons
4.2. Classification Results Comparisons
4.3. Grad-CAM Results Comparison
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Conyers, L.B. Ground-Penetrating Radar for Archaeology; AltaMira Press: Walnut Creek, CA, USA, 2004.
- Goodman, D. GPR methods for archaeology. In Seeing the Unseen: Geophysics and Landscape Archaeology; Taylor & Francis: Abingdon, UK, 2009; pp. 229–244.
- Manataki, M.; Sarris, A.; Donati, J.C.; Cuenca-García, C.; Kalayci, T. GPR: Theory and Practice in Archaeological Prospection. In Best Practices of Geoinformatic Technologies for the Mapping of Archaeolandscapes; Archaeopress Archaeology: Oxford, UK, 2015; pp. 13–24.
- Manataki, M.; Vafidis, A.; Sarris, A. GPR Data Interpretation Approaches in Archaeological Prospection. Appl. Sci. 2021, 11, 7531.
- Küçükdemirci, M.; Sarris, A. GPR Data Processing and Interpretation Based on Artificial Intelligence Approaches: Future Perspectives for Archaeological Prospection. Remote Sens. 2022, 14, 3377.
- Shen, D.; Wu, G.; Suk, H.-I. Deep Learning in Medical Image Analysis. Annu. Rev. Biomed. Eng. 2017, 19, 221–248.
- Grigorescu, S.; Trasnea, B.; Cocias, T.; Macesanu, G. A survey of deep learning techniques for autonomous driving. J. Field Robot. 2020, 37, 362–386.
- Guo, G.; Zhang, N. A survey on deep learning based face recognition. Comput. Vis. Image Underst. 2019, 189, 102805.
- Li, L.; Zhang, S.; Wang, B. Plant Disease Detection and Classification by Deep Learning—A Review. IEEE Access 2021, 9, 56683–56698.
- Huang, J.; Yang, X.; Zhou, F.; Li, X.; Zhou, B.; Lu, S.; Ivashov, S.; Giannakis, I.; Kong, F.; Slob, E. A deep learning framework based on improved self-supervised learning for ground-penetrating radar tunnel lining inspection. Comput. Aided Civ. Infrastruct. Eng. 2023; early view.
- Li, X.; Liu, H.; Zhou, F.; Chen, Z.; Giannakis, I.; Slob, E. Deep learning–based nondestructive evaluation of reinforcement bars using ground-penetrating radar and electromagnetic induction data. Comput. Aided Civ. Infrastruct. Eng. 2022, 37, 1834–1853.
- Elghaish, F.; Matarneh, S.T.; Talebi, S.; Abu-Samra, S.; Salimi, G.; Rausch, C. Deep learning for detecting distresses in buildings and pavements: A critical gap analysis. Constr. Innov. 2021, 22, 554–579.
- Küçükdemirci, M.; Sarris, A. Deep learning based automated analysis of archaeo-geophysical images. Archaeol. Prospect. 2020, 27, 107–118.
- Wunderlich, T.; Wilken, D.; Majchczack, B.S.; Segschneider, M.; Rabbel, W. Hyperbola Detection with RetinaNet and Comparison of Hyperbola Fitting Methods in GPR Data from an Archaeological Site. Remote Sens. 2022, 14, 3665.
- Manataki, M.; Vafidis, A.; Sarris, A. Comparing Adam and SGD optimizers to train AlexNet for classifying GPR C-scans featuring ancient structures. In Proceedings of the 2021 11th International Workshop on Advanced Ground Penetrating Radar (IWAGPR), Valletta, Malta, 1–4 December 2021; pp. 1–6.
- Abu-Mostafa, Y.S.; Magdon-Ismail, M.; Lin, H.-T. Learning from Data; AMLBook: New York, NY, USA, 2012; Volume 4.
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; The MIT Press: Cambridge, MA, USA, 2016.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; Curran Associates, Inc.: Red Hook, NY, USA, 2012. Available online: https://papers.nips.cc/paper_files/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html (accessed on 29 April 2023).
- Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv 2015, arXiv:1502.03167.
- Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2015, arXiv:1409.1556.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. arXiv 2016, arXiv:1603.04467.
- Chollet, F. Keras: Deep Learning for Humans. 2015. Available online: https://keras.io/ (accessed on 8 May 2023).
- Hossin, M.; Sulaiman, M.N. A review on evaluation metrics for data classification evaluations. Int. J. Data Min. Knowl. Manag. Process 2015, 5, 1.
- Lever, J. Classification evaluation: It is important to understand both what a classification metric expresses and what it hides. Nat. Methods 2016, 13, 603–605.
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
- Donati, J.C.; Sarris, A.; Papadopoulos, N.; Kalaycı, T.; Simon, F.-X.; Manataki, M.; Moffat, I.; Cuenca-García, C. A regional approach to ancient urban studies in Greece through multi-settlement geophysical survey. J. Field Archaeol. 2017, 42, 450–467.
- Driessen, J.; Sarris, A. Archaeology and Geophysics in Tandem on Crete. J. Field Archaeol. 2020, 45, 571–587.
- Springenberg, J.T.; Dosovitskiy, A.; Brox, T.; Riedmiller, M. Striving for simplicity: The all convolutional net. arXiv 2014, arXiv:1412.6806.
- Chattopadhay, A.; Sarkar, A.; Howlader, P.; Balasubramanian, V.N. Grad-CAM++: Generalized Gradient-Based Visual Explanations for Deep Convolutional Networks. In Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018; pp. 839–847.
| Model | Average Epoch Time (s) | Total Parameters | Model Size (MB) |
|---|---|---|---|
| AlexNet-1 | ~55.4 | 58,299,139 | 222 |
| AlexNet-2 | ~173.1 | 58,299,139 | 222 |
| AlexNet-3 | ~208.2 | 58,299,139 | 222 |
| VGG16-1 | ~253.2 | 134,289,731 | 512 |
| VGG16-2 | ~972.0 | 134,289,731 | 512 |
| VGG16-3 | ~283.2 | 134,289,731 | 512 |
| VGG19-1 | ~309.3 | 139,604,547 | 533 |
| VGG19-2 | ~1217.8 | 139,604,547 | 533 |
| VGG19-3 | ~307.8 | 139,604,547 | 533 |
| Model | Anomaly Precision | Anomaly Recall | Anomaly f1-Score | Noise Precision | Noise Recall | Noise f1-Score | Structure Precision | Structure Recall | Structure f1-Score |
|---|---|---|---|---|---|---|---|---|---|
| AlexNet-1 | 0.84 | 0.90 | 0.87 | 0.89 | 0.94 | 0.92 | 0.90 | 0.83 | 0.86 |
| AlexNet-2 | 0.85 | 0.95 | 0.90 | 0.98 | 0.90 | 0.94 | 0.92 | 0.91 | 0.91 |
| AlexNet-3 | 0.80 | 0.95 | 0.87 | 1.00 | 0.87 | 0.93 | 0.88 | 0.87 | 0.87 |
| VGG16-1 | 0.80 | 0.88 | 0.84 | 0.92 | 0.92 | 0.92 | 0.88 | 0.83 | 0.85 |
| VGG16-2 | 0.80 | 0.95 | 0.87 | 1.00 | 0.94 | 0.97 | 0.94 | 0.88 | 0.91 |
| VGG16-3 | 0.74 | 0.95 | 0.83 | 1.00 | 0.92 | 0.96 | 0.93 | 0.83 | 0.88 |
| VGG19-1 | 0.80 | 0.93 | 0.86 | 0.91 | 0.94 | 0.92 | 0.91 | 0.80 | 0.85 |
| VGG19-2 | 0.87 | 0.95 | 0.91 | 1.00 | 0.96 | 0.98 | 0.95 | 0.92 | 0.93 |
| VGG19-3 | 0.76 | 0.98 | 0.85 | 0.98 | 0.85 | 0.91 | 0.90 | 0.84 | 0.87 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).