
Open Access Article

Peer-Review Record

Gravity Data-Driven Machine Learning: A Novel Approach for Predicting Volcanic Vent Locations in Geohazard Investigation

GeoHazards 2025, 6(3), 49; https://doi.org/10.3390/geohazards6030049

by Murad Abdulfarraj^1,2, Ema Abraham^3, Faisal Alqahtani^1,2 and Essam Aboud^4,*

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 22 July 2025 / Revised: 21 August 2025 / Accepted: 25 August 2025 / Published: 29 August 2025

(This article belongs to the Topic Machine Learning and Big Data Analytics for Natural Disaster Reduction and Resilience)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The prediction of volcanic vent locations using gravity data is highly interesting, but several sections of the manuscript, particularly regarding the key dataset components, lack clarity and require careful reconsideration and revision by the authors.

The introduction fails to adequately summarize current methodologies (both conventional and machine learning-based) for volcanic vent localization using gravity data, including their advancements and limitations.

The caption of Figure 3 should be explicitly revised to "Bouguer gravity anomaly map" (unit: mGal).

The relevance of Bouguer anomaly data to this study requires clarification. Please provide geophysical rationale for this dataset selection.

Reorganize the text arrangement in Figure 5 to improve visual coherence.

Standardize the presentation of the five ML models by: Bolding model names (e.g., Random Forest). Placing model names at the beginning of each description

‌6. Training Dataset Provenance and Reproducibility Concerns‌

The manuscript merely states "spatial coordinates were labeled as vent locations (1) or non-vent areas (0)" without clarifying whether these labels derive from:

(a) ‌Local field measurements‌ within the study area, or

(b) ‌Existing datasets‌ from other volcanic regions.

The authors should explicitly specify in the Methods section:

(a) ‌Training dataset sources‌;

(b) ‌Data partitioning strategy‌ (train/validation/test split ratios), ‌sample sizes‌, and provide ‌geospatial distribution maps‌ of sampling locations to ensure experimental reproducibility.

The discussion should address: potential regional sensitivity variations across models; whether performance discrepancies stem from parameter configurations.

Figure 9 annotation deficiencies: add a color scale bar; clearly indicate which model's prediction results are displayed.

Comments on the Quality of English Language

The English could be improved to more clearly express the research.

Author Response

Author’s Response to Reviewer’s Report

geohazards-3802743 – Gravity Data-Driven Machine Learning: A Novel Approach for Predicting Volcanic Vent Locations in Geohazard Investigation

Thank you very much for all the excellent comments, suggestions and corrections you offered in your review. These have significantly improved our manuscript.

We have overhauled the manuscript and additional information/data has been added to present a more robust outlook of the revised manuscript. Please find the comments below as our response to your review.

Note: Reviewer’s comment is denoted by R and Authors response by A.

R1. The prediction of volcanic vent locations using gravity data is highly interesting, but several sections of the manuscript, particularly regarding the key dataset components, lack clarity and require careful reconsideration and revision by the authors.

The introduction fails to adequately summarize current methodologies (both conventional and machine learning-based) for volcanic vent localization using gravity data, including their advancements and limitations.

A1. Thank you for this observation. In the revised manuscript, we have expanded the introduction to summarize field-based geological mapping, remote sensing interpretation, and classical geophysical methods (gravity inversion, residual anomaly mapping, forward modeling), highlighting their strengths in structural delineation but also their limitations in terms of spatial coverage, resolution, and operational cost. We have also reviewed prior studies that applied supervised and unsupervised ML algorithms to volcanic hazard mapping, including vent localization using gravity and other geophysical datasets. Finally, we emphasize that, to the best of our knowledge, no previous study has applied machine learning specifically to predict volcanic vent locations in the Rahat Volcanic Field using gravity data alone, which justifies our approach (pages 2-4).

R2. The caption of Figure 3 should be explicitly revised to "Bouguer gravity anomaly map" (unit: mGal).

A2. Done. Thank you for your suggestion.

R3. The relevance of Bouguer anomaly data to this study requires clarification. Please provide geophysical rationale for this dataset selection.

A3. Thank you for your comment. We have added the comments below to the manuscript. Page 4.

The Bouguer anomaly dataset was selected for this study because it provides a refined representation of subsurface density variations after correcting observed gravity for elevation, latitude, and the gravitational attraction of surface masses. In volcanic terrains such as the Rahat Volcanic Field (RVF), these density contrasts are critical indicators of geologic structures influencing magma ascent (Alqahtani et al., 2022; Abdulfarraj et al., 2024; Aboud et al., 2023). Low Bouguer anomalies typically correspond to zones of partial melt, vesicular volcanic deposits, or fractured crustal segments that facilitate magma migration and vent formation. Conversely, high Bouguer anomalies may signify solidified intrusions, dense basaltic flows, or unfractured basement rocks that can act as barriers to magma movement (Alqahtani et al., 2022). By working with residual Bouguer anomalies, where regional trends are removed, we emphasize localized subsurface heterogeneities linked to magmatic pathways and potential vent locations. This geophysical basis makes the Bouguer anomaly data a direct and meaningful input for our ML models, ensuring that the predictive features are physically tied to volcanic processes.
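
To make the idea of a residual anomaly concrete, the following is a minimal sketch of one common way to remove a regional trend: fit a plane to the station values by least squares and subtract it. The response does not state how the regional field was actually removed, so the planar trend, the function names, and the toy station values are all assumptions for illustration.

```python
def fit_plane(xs, ys, gs):
    """Least-squares fit of g ~ a + b*x + c*y; returns (a, b, c)."""
    n = len(xs)
    sx, sy, sg = sum(xs), sum(ys), sum(gs)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxg = sum(x * g for x, g in zip(xs, gs))
    syg = sum(y * g for y, g in zip(ys, gs))
    # Augmented normal equations for the design matrix columns [1, x, y].
    m = [[n, sx, sy, sg], [sx, sxx, sxy, sxg], [sy, sxy, syy, syg]]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def residual_anomaly(xs, ys, gs):
    """Bouguer value minus the fitted planar regional trend, per station."""
    a, b, c = fit_plane(xs, ys, gs)
    return [g - (a + b * x + c * y) for x, y, g in zip(xs, ys, gs)]


# Toy example (hypothetical values): a 3 x 3 grid of stations in km,
# a planar regional field, and a local 5 mGal low at the central station.
points = [(x, y) for y in (0.0, 1.0, 2.0) for x in (0.0, 1.0, 2.0)]
xs = [p[0] for p in points]
ys = [p[1] for p in points]
gs = [10.0 + 2.0 * x - 1.0 * y for x, y in points]
gs[4] -= 5.0  # local density deficit at (1, 1)

residuals = residual_anomaly(xs, ys, gs)
```

In this toy case the regional gradient disappears from `residuals` and the central station stands out as the only negative value, which is the kind of localized low the paragraph above associates with possible magma pathways.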

R4. Reorganize the text arrangement in Figure 5 to improve visual coherence.

A4. Done. We have replaced Figure 5 with a more detailed section that discusses the concepts for implementation. Thank you for your comments.

R5. Standardize the presentation of the five ML models by: Bolding model names (e.g., Random Forest). Placing model names at the beginning of each description.

A5. Done. Thank you for your suggestion.

R6. Training Dataset Provenance and Reproducibility Concerns

The manuscript merely states "spatial coordinates were labeled as vent locations (1) or non-vent areas (0)" without clarifying whether these labels derive from:

(a) ‌Local field measurements‌ within the study area, or

(b) ‌Existing datasets‌ from other volcanic regions.

The authors should explicitly specify in the Methods section:

(a) ‌Training dataset sources‌;

A6. Thank you for your comment. We have added the section below to the manuscript.

2.2 Training Dataset Provenance and Partitioning Strategy

Dataset sources and labeling

All volcanic vent coordinates (label “1”) used in this study were sourced exclusively from the Rahat Volcanic Field (RVF). Verified vent positions were obtained from the comprehensive geological mapping and vent inventory provided by Robinson and Downs (2023), supplemented by field-validated locations from Alqahtani et al. (2022). Non-vent samples (label “0”) were generated from spatial coordinates within the study area that do not intersect with any known vent location. These non-vent points were drawn from the same gravity dataset grid to ensure consistent spatial resolution. No data from other volcanic fields or regions were incorporated.

Labeling process

The binary classification target was created by assigning a value of “1” to coordinates matching verified vent locations and “0” to all other grid points in the gravity dataset. This approach ensures that the model learns from both positive (vent) and negative (non-vent) examples within the same geophysical context.
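
The labeling step described above could be sketched as follows. The response does not say how a grid point is judged to "match" a vent, so the distance tolerance, the projected kilometre coordinates, and the function name `label_grid` are assumptions made only for this illustration.

```python
import math


def label_grid(grid_points, vent_points, tolerance_km):
    """Return 1 for grid points within tolerance_km of any vent, else 0."""
    labels = []
    for gx, gy in grid_points:
        near_vent = any(
            math.hypot(gx - vx, gy - vy) <= tolerance_km
            for vx, vy in vent_points
        )
        labels.append(1 if near_vent else 0)
    return labels


# Hypothetical 3 x 3 grid with 1 km spacing and two mapped vents.
grid = [(float(x), float(y)) for y in range(3) for x in range(3)]
vents = [(0.1, 0.0), (2.0, 1.9)]
labels = label_grid(grid, vents, tolerance_km=0.5)
```

With these toy inputs only the grid nodes at (0, 0) and (2, 2) receive label 1. The choice of tolerance matters in practice, because it controls how many positive samples exist and therefore how imbalanced the two classes are.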

Data partitioning and sample sizes

The labeled dataset was randomly split into three subsets while maintaining the same proportion of vent and non-vent samples across each:

a) Training set: 80% of total samples
b) Validation set: 10% of total samples
c) Testing set: 10% of total samples

This stratified partitioning prevents class imbalance bias and ensures robust model evaluation.
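
A stratified 80/10/10 split of this kind can be sketched with the standard library alone, as below. This is an illustrative version, not the authors' code: the function name, the seed, and the toy class counts are assumptions. In practice the same result is often obtained by calling scikit-learn's `train_test_split` twice with its `stratify` argument.

```python
import random


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split sample indices into train/validation/test, class by class."""
    rng = random.Random(seed)
    train, validation, test = [], [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(fractions[0] * len(idx))
        n_val = round(fractions[1] * len(idx))
        train.extend(idx[:n_train])
        validation.extend(idx[n_train:n_train + n_val])
        test.extend(idx[n_train + n_val:])  # remainder goes to the test set
    return train, validation, test


# Hypothetical imbalanced dataset: 50 vent samples and 950 non-vent samples.
labels = [1] * 50 + [0] * 950
train, validation, test = stratified_split(labels)

vent_share = [
    sum(labels[i] for i in subset) / len(subset)
    for subset in (train, validation, test)
]
```

Because each class is shuffled and divided separately, every subset keeps the same 5% share of vent samples as the full toy dataset, which is what "maintaining the same proportion" means here.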

Geospatial distribution maps

To ensure transparency and reproducibility, we generated geospatial distribution maps of vent and non-vent sample locations for the training, validation, and testing sets (Figure 3); non-vent samples cover all parts of the study area without mapped vents. These maps confirm that all subsets are spatially representative of the study area and that no geographic clustering bias was introduced during sampling.

R7. The discussion should address: potential regional sensitivity variations across models; whether performance discrepancies stem from parameter configurations.

A7. Thank you for your comments. We have added the comments below to the Discussion to provide more information and address the concerns.

The performance differences observed across the tested models may be partially explained by regional sensitivity variations within the RVF. Areas with pronounced gravity anomalies and high-density vent clusters (particularly in the northern RVF) tend to provide stronger predictive signals, which most models captured with high accuracy. Conversely, regions with weaker or more diffuse anomalies, such as the southern RVF, posed greater challenges for detection, leading to reduced sensitivity in some algorithms. This suggests that the predictive strength of each model is not uniform across the study area and is influenced by spatial heterogeneity in the geophysical signal. Discrepancies in performance can also be attributed to differences in parameter configurations. The Random Forest model, for example, benefitted from optimized hyperparameters (for example, number of estimators and maximum depth), which enhanced its ability to utilize the full feature space and control overfitting. In contrast, the SVM’s relatively lower performance may reflect suboptimal kernel selection and parameter scaling, factors to which this algorithm is highly sensitive in large, heterogeneous datasets. These findings underscore the importance of both regional geophysical context and careful parameter tuning in achieving robust volcanic vent prediction performance.
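
One way to check the regional sensitivity variation described above is to compute sensitivity (recall on the vent class) separately for the northern and southern parts of the field. The sketch below assumes a single northing value that divides the two regions and uses made-up labels and predictions; neither comes from the manuscript.

```python
def recall_by_region(y_true, y_pred, northing, boundary):
    """Vent-class recall for points north and south of a boundary northing."""
    counts = {"north": [0, 0], "south": [0, 0]}  # [detected vents, actual vents]
    for truth, pred, n in zip(y_true, y_pred, northing):
        if truth != 1:
            continue  # sensitivity only considers actual vents
        region = "north" if n >= boundary else "south"
        counts[region][1] += 1
        counts[region][0] += int(pred == 1)
    return {
        region: (hits / total if total else float("nan"))
        for region, (hits, total) in counts.items()
    }


# Hypothetical test-set points: label, model prediction, northing in km.
y_true = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
y_pred = [1, 1, 1, 1, 0, 1, 0, 1, 0, 1]
northing = [80, 75, 90, 70, 85, 20, 10, 30, 25, 15]

recalls = recall_by_region(y_true, y_pred, northing, boundary=50)
# recalls == {"north": 1.0, "south": 0.5}
```

Reporting such per-region values for each model alongside the overall accuracy would show directly whether a model's weaker performance is concentrated in the area with more diffuse anomalies.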

R8. Figure 9 annotation deficiencies: add a color scale bar; clearly indicate which model's prediction results are displayed.

A8. Done. Thank you for your observation.

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

The authors submitted a manuscript titled "Gravity data-driven machine learning: a novel approach for predicting volcanic vent locations in geohazard investigation". They attempted to evaluate various machine learning algorithms using high-resolution gravity data from the Rahat volcanic field (Saudi Arabia). The goal was to develop an AI method for better prediction of vent locations for hazard monitoring. As a result, the authors selected the Random Forest model, which had the best evaluation metrics: prediction accuracy (95%) and correlation (75%) with the current vent coordinates. In my opinion, it is questionable what kind of data the authors used. Figure 4 indeed indicates that two types of data were used. However, the body text says nothing about testing data; only training data is described. How can the authors explain this? This is odd. The authors "tested models", but no unknown, blind data was used for evaluation, only randomly picked data with no clearly presented attributes. I understand that validating the proposed methodology is barely possible in a real environment at this particular test site. So it looks like a very theoretical issue to me, and as such its contribution to hazard monitoring is low. As I read the material, the authors contributed rather to site geology characterization and petrology. Please note that with more detailed/advanced petrological/volcanological input the manuscript could be of higher scientific value. I suggest modifying the structure so the material can be published in a journal with a different focus, such as petrology or volcanology.

Selected comments:

1. Please clearly present the data used, including its limitations.

2. The authors provided data, methods, and results; unfortunately, no analysis section was provided. Please improve.

3. The Discussion section was badly arranged. It contains characteristics of the test site and descriptions of figures from previous sections, but it should interpret the results, refer to the literature, and provide suggestions for future studies. Please modify the structure. Additionally, please clarify whether any similar investigations were carried out elsewhere, whether they were checked by field reconnaissance, and what the potential is to do so. Please discuss the nature of the problem, including its capabilities, potential, limitations, and concerns.

4. The figures also require corrections. It would be nice to see some satellite/optical photos of the described test site and the vents. As a remark: Fig. 1 (tectonic scheme/map) and Fig. 2 (geological map) should be better correlated in size and location. The same applies to the data figure (Figure 3). Please zoom in on the site, or superimpose the figures.

Other comments:

L.36: "signify a significant"? Please improve the style.

L.39: The keywords should fit better. They are not very descriptive.

L.56: "indirect indicators"? Please improve the style.

L.89/91: Figure (3) or (Figure 3). Please decide.

L.93: Please provide a source. Please unify the scale units for all figures. Please correlate the data figure with the test site graphically. The figures have scale and coordinates; however, good graphics do a better job.

L.158/162: "concepts" or "concept". Please decide.

L.162: It is a process flow, isn't it?

L.188-189: This is processing, not a result. Please modify the structure.

L.191-193: Please note that interpretation of the model is not a result. Remark as above. Authors probably thought “understand”.

L.194-199: This paragraph is on the methods used. Remark as above.

L.202: Please improve the caption.

L.222: What algorithm was used for density map creation? Please clarify and improve the caption.

L.253-266: This material characterizes the tectonics and geology of the test site. This content should be placed in the introduction. Please reorganize as mentioned above.

Author Response

Author’s Response to Reviewer’s Report 2

geohazards-3802743 – Gravity Data-Driven Machine Learning: A Novel Approach for Predicting Volcanic Vent Locations in Geohazard Investigation

We are grateful for your comments and suggestions for improving our manuscript. The attached reviewer annotations on the manuscript have also been very helpful, and the clarifications, corrections, and rephrasing they suggested have been applied to improve the quality of our manuscript. Our responses to your review comments are given below. Thank you.

Note: Reviewer’s comment is denoted by R and Authors response by A.

R1. The authors submitted a manuscript titled "Gravity data-driven machine learning: a novel approach for predicting volcanic vent locations in geohazard investigation". They attempted to evaluate various machine learning algorithms using high-resolution gravity data from the Rahat volcanic field (Saudi Arabia). The goal was to develop an AI method for better prediction of vent locations for hazard monitoring. As a result, the authors selected the Random Forest model, which had the best evaluation metrics: prediction accuracy (95%) and correlation (75%) with the current vent coordinates. In my opinion, it is questionable what kind of data the authors used. Figure 4 indeed indicates that two types of data were used. However, the body text says nothing about testing data; only training data is described. How can the authors explain this? This is odd. The authors "tested models", but no unknown, blind data was used for evaluation, only randomly picked data with no clearly presented attributes. I understand that validating the proposed methodology is barely possible in a real environment at this particular test site. So it looks like a very theoretical issue to me, and as such its contribution to hazard monitoring is low. As I read the material, the authors contributed rather to site geology characterization and petrology. Please note that with more detailed/advanced petrological/volcanological input the manuscript could be of higher scientific value. I suggest modifying the structure so the material can be published in a journal with a different focus, such as petrology or volcanology.

A1. We appreciate the reviewer’s detailed feedback and understand the concerns regarding the description of the testing data and the perceived applicability of our results to hazard monitoring. We have addressed the reviewer’s concerns by revising the Materials and Methods and Discussion sections accordingly.

Clarification on testing data:

The dataset used in this study was split into three subsets, training (80%), validation (10%), and testing (10%), using stratified random sampling to maintain the same vent/non-vent ratio in each subset.

The testing dataset consisted exclusively of spatial coordinates and gravity anomaly values not used in model training or validation. These points were “unknown” to the model during training, ensuring that evaluation metrics (including the reported 95% accuracy and 75% spatial correlation) were computed on truly unseen data.

We have explicitly stated in the revised manuscript (Discussion section – Page 11) that the model performance evaluation was based solely on this held-out testing set, which functioned as a blind dataset within the study’s context.

Addressing real-environment validation limitations:

We acknowledge that field validation of predicted vent locations is challenging in the RVF due to possible burial by younger flows or limited surface exposure. We have added this as a limitation in the Discussion and emphasize that while our approach provides a robust proof-of-concept using existing gravity data, further validation, either through targeted field campaigns or integration with complementary geophysical datasets, is essential for operational hazard monitoring.

Scientific contribution scope:

While our primary focus was methodological, the work does contribute to understanding the geological structure of the RVF by correlating gravity anomalies with known vent distributions and identifying potential concealed vents.

In response to the reviewer’s suggestion, we have strengthened the petrological and volcanological context by incorporating additional lithological, eruption history, and structural framework information from prior studies (Introduction section - Pages 3 and 4), making the manuscript more valuable to readers in both applied geohazards and fundamental volcanic research.

R2. - Selected comments:

Please clearly present the data used, including its limitations.

A2. Thank you for your comments. We have added the information below to various sections of the manuscript to provide more clarification on the data utilized as advised. Pages 5 – 8.

Gravity dataset

The gravity data used in this investigation were sourced from Alqahtani et al. (2022). A total of 149 gravity stations, spaced at intervals of 0.2-1 km, were processed to generate a residual Bouguer anomaly map (Figure 3). Alqahtani et al. (2022) describe the processing steps in detail, including the corrections applied to the dataset. The data presented in Figure 3 were used for further analysis in the current study.

Vent location dataset

Verified volcanic vent coordinates (positive class, label “1”) were obtained from detailed geologic mapping by Robinson and Downs (2023) and supplemented with field-verified vent positions from Alqahtani et al. (2022). These include scoria cones, maars, cryptodomes, and craters mapped across the RVF. Non-vent samples (negative class, label “0”) were generated from spatial coordinates within the gravity dataset that did not overlap with any known vent locations.

Labeling approach

Each data point was assigned a binary label based on its spatial coincidence with verified vent coordinates (“1” for vent, “0” for non-vent regions). This ensured that model training incorporated both positive and negative examples from the same geophysical and geological context.

Data partitioning

The labeled dataset was stratified to preserve the vent/non-vent ratio and randomly split into:

Training set (80%) – model fitting.

Validation set (10%) – hyperparameter tuning and overfitting control.

Testing set (10%) – final blind evaluation, never used in model training or tuning.
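
The separate roles of the three subsets listed above can be illustrated with a deliberately simple stand-in model. The sketch below does not use the Random Forest or any of the authors' models: it uses a one-feature threshold rule ("residual anomaly below a threshold means vent"), and all sample values are invented. Candidate thresholds come from the training set, the validation set selects among them, and the test set is scored once at the end.

```python
def accuracy(threshold, samples):
    """Share of (anomaly_mgal, label) samples where 'anomaly < threshold' matches label 1."""
    hits = sum(int((g < threshold) == (y == 1)) for g, y in samples)
    return hits / len(samples)


def select_threshold(candidates, validation):
    """Pick the candidate with the highest validation accuracy (first one on ties)."""
    return max(candidates, key=lambda t: accuracy(t, validation))


# Hypothetical (residual anomaly in mGal, label) samples.
train = [(-4.0, 1), (-3.5, 1), (-1.0, 0), (0.5, 0), (1.2, 0), (2.0, 0)]
validation = [(-3.0, 1), (-0.5, 0), (1.0, 0)]
test = [(-3.8, 1), (0.2, 0), (-1.2, 0)]

# "Fitting": candidate thresholds are midpoints between sorted training values.
values = sorted(g for g, _ in train)
candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

# "Tuning": the validation set chooses among the candidates.
best_threshold = select_threshold(candidates, validation)

# "Blind evaluation": the test set is used once, after all choices are fixed.
test_accuracy = accuracy(best_threshold, test)
```

The point of the structure is that `test` never influences `candidates` or `best_threshold`, so `test_accuracy` is an estimate on data the model has not seen. Note that a random hold-out of grid points from the same survey is still spatially close to the training points, which is a separate question from whether the split is leak-free.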

Possible Limitations of the dataset

Incomplete surface representation: Some vents may remain undetected due to burial beneath younger lava flows, pyroclastic deposits, or erosion, potentially leading to underrepresentation of the positive class.
Resolution constraints: The gravity station spacing limits the ability to resolve very small or subtle density anomalies.
Single dataset dependency: Additional datasets (from other methods) could improve predictive robustness.
Field validation: No direct field verification of predicted vent locations was conducted due to logistical constraints; this remains a recommended step for future work.

R3. 2. The authors provided data, methods, and results; unfortunately, no analysis section was provided. Please improve.

A3. Thank you for your comment. In addition to the analysis provided in the Discussion section, we have created a new section (Section 4) in the revised manuscript that does three things. It interprets model outputs, comparing evaluation metrics across all tested algorithms and explaining the reasons for performance differences based on the data characteristics and model properties. It explores spatial patterns, analyzing the distribution of predicted vents relative to known vents, structural lineaments, and gravity anomaly zones, and quantifying agreement using correlation coefficients and density distribution metrics. It examines feature importance for the Random Forest model, highlighting which gravity-derived features contributed most to vent prediction.
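
As an illustration of how agreement between predicted and known vent distributions might be quantified, the sketch below counts vents per grid cell for each set and computes the Pearson correlation between the two count grids. The response does not specify the density algorithm the authors used (the reviewer asks about it separately), so the simple cell counting, the cell size, and the coordinates here are assumptions.

```python
import math


def cell_counts(points, cell_km, nx, ny):
    """Count points per cell on an nx-by-ny grid with square cells of cell_km."""
    counts = [0] * (nx * ny)
    for x, y in points:
        i, j = int(x // cell_km), int(y // cell_km)
        if 0 <= i < nx and 0 <= j < ny:
            counts[j * nx + i] += 1
    return counts


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical vent coordinates in km on a 2 x 2 grid of 1 km cells.
known = [(0.5, 0.5), (0.6, 0.4), (1.5, 1.5)]
predicted = [(0.4, 0.6), (1.4, 1.6), (1.6, 1.4)]

known_density = cell_counts(known, cell_km=1.0, nx=2, ny=2)
predicted_density = cell_counts(predicted, cell_km=1.0, nx=2, ny=2)
r = pearson(known_density, predicted_density)
```

The resulting coefficient depends strongly on the cell size: very small cells make most counts zero, while very large cells hide spatial mismatch, so the grid resolution should be reported together with any correlation value.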

R4. 3. The Discussion section was badly arranged. It contains characteristics of the test site and descriptions of figures from previous sections, but it should interpret the results, refer to the literature, and provide suggestions for future studies. Please modify the structure. Additionally, please clarify whether any similar investigations were carried out elsewhere, whether they were checked by field reconnaissance, and what the potential is to do so. Please discuss the nature of the problem, including its capabilities, potential, limitations, and concerns.

A4. Done. Thank you for your comments and suggestions. The entire Discussion section has been overhauled to incorporate all your suggestions to include our method’s capabilities, potential, limitations, and concerns.

R5. 4. The figures also require corrections. It would be nice to see some satellite/optical photos of the described test site and the vents. As a remark: Fig. 1 (tectonic scheme/map) and Fig. 2 (geological map) should be better correlated in size and location. The same applies to the data figure (Figure 3). Please zoom in on the site, or superimpose the figures.

A5. We thank the reviewer for the suggestion to include satellite/optical photographs of the Rahat Volcanic Field and its vents. However, we could not include these in the current submission in order to maintain clarity and avoid visual clutter in the presentation. The vent locations used in this study were already derived from a combination of high-resolution satellite image interpretation and field mapping (Robinson & Downs, 2023; Alqahtani et al., 2022). Therefore, the imagery served as an integral part of the dataset preparation process rather than as a separate figure. To address the reviewer’s other valuable points, we have improved the correlation between Figures 1, 2, and 3 by aligning their spatial extents and scales, and by zooming in on the core study area where appropriate. These modifications enhance figure coherence while preserving the manuscript’s readability and focus.

R6. Other comments:

L.36: "signify a significant"? Please improve the style.

L.39: The keywords should fit better. They are not very descriptive.

L.56: "indirect indicators"? Please improve the style.

L.89/91: Figure (3) or (Figure 3). Please decide.

L.158/162: "concepts" or "concept". Please decide.

L.162: It is a process flow, isn't it?

L.188-189: This is processing, not a result. Please modify the structure.

L.191-193: Please note that interpretation of the model is not a result. Remark as above. Authors probably thought “understand”.

L.194-199: This paragraph is on the methods used. Remark as above.

L.202: Please improve the caption.

L.222: What algorithm was used for density map creation? Please clarify and improve the caption.

L.253-266: This material characterizes the tectonics and geology of the test site. This content should be placed in the introduction. Please reorganize as mentioned above.

A6. Thank you very much for the corrections and suggestions. We have duly effected all the corrections in our revised manuscript.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

No further comments. Please just check for grammatical and spelling errors.

Author Response

Author’s Response to Reviewer’s Report

geohazards-3802743 – Gravity Data-Driven Machine Learning: A Novel Approach for Predicting Volcanic Vent Locations in Geohazard Investigation

Note: Reviewer’s comment is denoted by R and Authors response by A.

R1. No further comments. Please just check for grammatical and spelling errors.

A1. We sincerely thank the reviewer for his review and acceptance of the revised version of our manuscript.

Reviewer 2 Report

Comments and Suggestions for Authors

Manuscript is improved according to the reviewer's suggestions. All clarifications are involved. Please change the title "4.Analysis" to "4.Interpretation of results".

Author Response

Author’s Response to Reviewer’s Report

geohazards-3802743 – Gravity Data-Driven Machine Learning: A Novel Approach for Predicting Volcanic Vent Locations in Geohazard Investigation

Note: Reviewer’s comment is denoted by R and Authors response by A.

R1. Manuscript is improved according to the reviewer's suggestions. All clarifications are involved. Please change the title "4.Analysis" to "4.Interpretation of results".

A1. We sincerely thank the reviewer for the positive feedback and acknowledgment that the manuscript has been improved according to the reviewer’s suggestions. We appreciate the confirmation that all requested clarifications have been incorporated.

Regarding the specific suggestion to change the title from "4. Analysis" to "4. Interpretation of results," we have implemented this revision as requested. The section heading has been updated to "4. Interpretation of results" to better reflect the content and provide clearer guidance to readers about the nature of the discussion within this section.

We believe this change enhances the manuscript's clarity and organization, and we are grateful for the reviewer's constructive feedback throughout the review process.
