Advancements in Geohazard Investigations: Developing a Machine Learning Framework for the Prediction of Vents at Volcanic Fields Using Magnetic Data

Abdulfarraj, Murad; Abraham, Ema; Alqahtani, Faisal; Aboud, Essam

doi:10.3390/geosciences14120328

¹ Geohazards Research Center, King Abdulaziz University, Jeddah 21589, Saudi Arabia

² Department of Petroleum Geology and Sedimentology, Faculty of Earth Sciences, King Abdulaziz University, Jeddah 21589, Saudi Arabia

³ Department of Geology/Geophysics, Alex Ekwueme Federal University, Ndufu-Alike Ikwo, P.M.B. 1010, Abakaliki 482131, Ebonyi State, Nigeria

* Author to whom correspondence should be addressed.

Geosciences 2024, 14(12), 328; https://doi.org/10.3390/geosciences14120328

Submission received: 24 September 2024 / Revised: 26 November 2024 / Accepted: 27 November 2024 / Published: 3 December 2024

(This article belongs to the Section Geophysics)

Abstract

This study investigates the application of machine learning techniques for predicting volcanic vent locations based on aeromagnetic geophysical data. Magnetic data, known to reflect subsurface geological structures, presents a valuable source of information for understanding volcanic activity. Leveraging this data, we aim to develop and validate predictive models capable of discerning the presence of volcanic vents. Through a comprehensive data analysis, feature engineering, and model training, we explore the intricate relationships between magnetic variations and volcanic vent locations. Various machine learning algorithms were evaluated for their efficacy in binary classification, with a focus on identifying areas with a high likelihood of volcanic vent presence. The Random Forest model (RFM) was adopted given its high performance metrics, achieving a prediction accuracy of 92%. Our results demonstrate the successful prediction of volcanic vent locations, with a significant correlation of 86% between the actual and predicted vent locations and a high Degree of Certainty (DC) at 97%. This research contributes to the advancement of geospatial data analysis within the field of geoscience, showcasing the potential of machine learning in interpreting and utilizing magnetic data for volcanic hazard assessment and early warning systems. The findings represent a significant step towards enhancing our understanding of volcanic dynamics and improving the predictive tools available for volcanic hazard assessment.

Keywords: volcanic vents; magnetic; machine learning; random forest regression; geohazard

1. Introduction

Volcanic eruptions are natural phenomena that pose significant hazards to human populations, infrastructure, and the environment. Timely and accurate prediction of volcanic vents is crucial for mitigating these hazards and implementing effective emergency response plans. The concealment of volcanic vents beneath the Earth’s surface is an important consideration for volcanic hazard assessment and monitoring. It underscores the need for comprehensive geological and geophysical studies to understand the subsurface structure and potential volcanic hazards in volcanic regions. Predicting volcanic vents is a critical aspect of geohazard investigations, as it can significantly contribute to the mitigation of volcanic risks and the protection of human lives and infrastructure. Traditional methods of volcanic hazard assessment rely heavily on geological surveys [1,2], seismic monitoring [3,4,5], and ground deformation measurements [6,7]. However, these methods often have limitations in regard to providing early warnings and precise predictions of vent locations in terms of spatial coverage, resolution, and real-time monitoring capabilities. Recent advancements in geophysical data analysis, particularly using magnetic anomalies, offer promising opportunities to enhance our understanding and predictive capabilities in volcanic hazard investigations. Magnetic anomalies result from variations in the magnetic properties of rocks, which can be influenced by the presence of hydrothermal alterations, magma chambers, and volcanic vents beneath the Earth’s surface. Integrating machine learning techniques with magnetic data provides a robust framework for detecting subtle changes in magnetic anomalies that may correlate with volcanic vent locations. Recent advancements in machine learning and geospatial data analysis have opened up new possibilities for predicting vent locations using magnetic data.

This study aims to contribute to the field of geohazard investigation by developing a novel machine learning framework specifically designed for predicting vents at volcanic fields using magnetic data. By harnessing the power of machine learning algorithms in conjunction with the advanced feature engineering of magnetic anomalies, this research seeks to improve the accuracy and reliability of vent location predictions. The proposed framework not only leverages the spatial and temporal resolution capabilities of magnetic data but also enhances our ability to discern complex subsurface structures associated with volcanic activity.

Magnetic data, which record the strength and direction of the Earth’s magnetic field at various locations, are becoming more widely acknowledged as a valuable resource for comprehending subsurface geological formations, such as the locations of volcanic vents. Magnetic anomalies connected with volcanic activity can provide valuable insights into subsurface structures, including magma chambers and pathways where the susceptibility of the rocks varies, which are essential for understanding the potential locations of vents.

Machine learning frameworks, including supervised and unsupervised learning algorithms, can be trained on magnetic data to identify patterns and relationships that are indicative of volcanic vent locations. By extracting relevant features from magnetic measurements and incorporating geospatial information, these models can learn to make spatial predictions of vent locations within volcanic fields. By applying machine learning to magnetic data, our approach aims to leverage the relationship between magnetic characteristics and the presence of volcanic vents to make informed predictions about potential vent locations. This would be a valuable tool for volcanic monitoring and risk assessment. The significance of this research lies in its potential to revolutionize current practices in volcanic hazard assessment by providing early warnings and precise predictions of vent locations. These advancements are expected to have profound implications for disaster management agencies, urban planners, and communities residing in volcanic regions worldwide. By advancing our ability to forecast volcanic vents using machine learning and magnetic data, this study represents a critical step forward in enhancing public safety and mitigating the impacts of volcanic eruptions on society and the environment. In this paper, we present the methodology, results, and implications of our research, demonstrating how the integration of machine learning with geophysical data can pave the way for more effective volcanic hazard assessments and management strategies.

The study area is the northern part of the Rahat volcanic field (Figure 1), the Cenozoic lava field in western Saudi Arabia. It is one of the largest volcanic fields on the western margin of the peninsula and a part of the Red Sea rift. At an approximate coverage area of 20,000 km², the Rahat evolution commenced at 10 Ma [8] with 36 shield volcanoes, 644 scoria cones, and 24 domes [9,10,11]. Harrat Rahat comprises over 900 exposed vents, including maars, cryptodomes, craters, and scoria cones. Among these, 289 vents are isolated by more recent volcanic deposits and have not been associated with the 234 distinct volcanic rock units identified through geological mapping [12]. A recent investigation and mapping effort focused on 32 geologic units that erupted within the northern Rahat volcanic field (RVF) after 1 Ma [13]. These units consist of mugearite, tholeiitic and continental basalts, hawaiites, and intraplate alkali (Figure 2).

The RVF is elongated in an N–S direction of about 310 km in the direction of the Makkah–Madinah–Nafud (MMN) volcanic line and comprises four smaller harrats that form an elongated shield-like shape with noticeable linear vents in the central area [9,14].

The most recent volcanic event in the region occurred in 1256 A.D.; it is known as the Al-Madinah eruption or historical eruption and resulted in a 2.25 km long fissure eruption and the formation of a 23 km long lava flow east of Medina. There were earlier eruptions, such as in 641 CE, which made finger-like flows to the east of the 1256 A.D. flow [15], thus fueling concerns of reoccurrences in the near future. This volcanic field appears to be the biggest lava field in Saudi Arabia [16]. In addition to these concerns, there has also been an increase in seismic activity in the region since 2009 [17]. The 1256 A.D. eruption lasted for 52 days, during which 0.5 km³ of alkali olivine basalt was extruded from the fissure, leading to the creation of six scoria cones. The extensive lava flow reached a distance of nearly 8 km from the city of Al-Madinah. The erupted basaltic rocks encompass a diverse range, from alkaline olivine basalts to hawaiite, as well as olivine transitional basalts. Additionally, formations of benmoreite, mugearite, and trachyte are present in the form of tuff and lava flows. The Harrat Al-Madinah basalts are further categorized into lower and upper Al-Madinah Al-Munawwarah basalts, as documented by Camp et al. [8]. This eruption event provides valuable insights into the geological history and volcanic activity of the region, shedding light on the complex processes associated with volcanic events in this area. Further details regarding the general geology in the study area are provided in Aboud et al. [9], Alqahtani et al. [10], Robinson and Downs [12], Al-Amri et al. [14], El-Hussain et al. [18], and Moufti et al. [19].

Figure 1. Cenozoic lava fields of western Saudi Arabia with their distribution of ages (modified from [9,19]). The inserted yellow square portrays the study area.

Figure 2. Geology map of the region considered in this study (modified from [10]).

The RVF is of special interest owing to its proximity to the city of Al-Madinah al Munawwarah, which sits within, and is continuing to expand southward over, the north end of the volcanic field [12]. Multiple volcanic vent systems exist within the study area (Figure 3). Such settings have been considered as pointers to volcanic centers that host very large high-temperature reservoirs [20]. The relationship between the number of volcanic vents and magma reservoir characteristics is not always a simple direct correlation. Some large reservoirs may have a low number of vents, while smaller reservoirs can sometimes have multiple vents. Nevertheless, multiple vents may often indicate a larger underlying magma reservoir system, and larger magma chambers typically have the capacity to feed multiple surface expressions [21]. The depth and geometry of the reservoir may affect vent distribution, just as local tectonics and the stress field play major roles in vent formation. In addition, rock type and structural features influence how easily multiple pathways can form [22]. An inversion of the gravity field data from northern Rahat [23] indicated a wide depression of the basement surface. This depression appears deeper along the primary vent axis of the eastern and southwestern parts of the volcanic field. A less dense basement underneath the vent axis in the field was also identified and suggested to have arisen from lithological deviations in the basement that occurred prior to Cenozoic volcanism. In a series of field campaigns in 2014, 2015, 2016, and 2017, Robinson and Downs [12] identified probable contacts between volcanic rocks and deposits through examination of shaded relief images produced from digital elevations and from satellite photographic images in a geographic information system (GIS). Mapped units were then characterized, correlated, and distinguished by thin-section petrographic, geochemical, paleomagnetic, and geochronologic studies.

Volcanic vents and eruptive fissures were delineated and characterized using a combination of advanced remote sensing methodologies and comprehensive field studies at Harrat Khaybar, located further north of Harrat Rahat (Alohali et al. [24]). Their in-depth analysis unveiled the evolutionary history of Harrat Khaybar, revealing a progression through five distinct eruptive phases. Notably, the spatial distribution of vent locations exhibited a notable convergence towards the central axis, resulting in the formation of a prominent north–south (NS) trend. Furthermore, a concentrated cluster of vents was observed along the axis of the regional Makkah–Madinah–Nafud (MMN) line. Drawing from their research, the team derived estimations for the cumulative probabilities of at least one eruption occurring within Harrat Khaybar over the next 100 years. The calculated probabilities ranged from 1.09% to 16.3%, representing the lower and upper bounds, respectively, with the highest probabilities being centered within the region of the central axis. This insight provides valuable implications for understanding the volcanic activity and associated risks within the Harrat Khaybar region.

Runge et al. [25] devised a statistical approach for identifying eruptive events from observable vents. Their methodology was evaluated using the 968 vents within the Harrat Rahat volcanic field, spanning from 10 Ma to 0.6 ka. Furthermore, they conducted an assessment to estimate the quantity of concealed vents within a substantial volcanic deposit in order to establish an enhanced spatial recurrence rate estimate. The study yielded an average temporal recurrence rate of 7.5 × 10⁻⁵ events per year for eruptive events. With application of some geophysical techniques of tilt derivative, Euler deconvolution, and 2D modeling inversion, Aboud et al. [26] analyzed gravity and aeromagnetic data from the northern part of the RVF. The outcomes of their study indicated that the thickness of the lava flows in the research area ranged from 100 m on the eastern and western sides to as much as 300–500 m in the central part.

Several geophysical studies ([9,10]) have also been conducted at the Rahat region for the purposes of investigating the geothermal potentials of the region. The objective of the present study is to leverage machine learning techniques to explore patterns and correlations within the magnetic data and discern potential indicators associated with the presence of volcanic vents. This study would contribute to addressing the critical need for improved volcanic hazard assessment and management, as well as the development of early warning systems.

2. Methodology

The magnetic data used in this study were extracted from a collection of aeromagnetic datasets from the periods of 1962 to 1983. Surveys were carried out over designated individual blocks with ground clearances set at 150, 300, and 500 m, employing a line spacing of approximately 800 m. Further information on the aeromagnetic data has been provided by Alqahtani et al. [10] and Zahran et al. [27]. Figure 3 shows the extracted aeromagnetic data covering the study area and also displays known vent locations. The slight mismatch in some vent locations between Figure 2 and Figure 3 likely arises due to differences in the data sources used to prepare the maps. The geological map (Figure 2) was derived from some early geological surveys (1958–1963), while the vent locations on map (Figure 3) were extracted from recent satellite imagery from the Saudi Geological Survey (2022). These sources may have different levels of accuracy or interpretations in terms of the precise locations of vents. Therefore, considering the vent boundaries, some vents might be represented as points in one map and as larger features (e.g., vent clusters or craters) in another, as seen in Figure 2. This simplification or generalization could lead to apparent mismatches. The vent data on the general geology map may not have been updated to match recent satellite observations and geological studies. The vent locations depicted in Figure 3, derived from high-resolution satellite imagery, are likely more reliable than those in Figure 2 due to the inherent advantages of remote sensing. Satellite-derived data provide precise geospatial information, consistent coverage, and the ability to detect subtle volcanic features that may not be evident in older field surveys or geological maps. Furthermore, the alignment of these vents with gravity lows and other geological patterns supports their validity. 
While the slight discrepancies between Figure 2 and Figure 3 exist, the advanced techniques used to derive vent data in Figure 3 make it a more robust dataset for predicting volcanic vent locations.

Machine learning techniques were used to analyze the magnetic characteristics of known volcanic vent locations and to make predictions about the likelihood of new volcanic vents being present at other geographical coordinates. Multiple machine learning models were considered for this study. The criteria for consideration were based on the available data and the complexity of the relationships between the features and the target variable. The target variable was defined as the presence or absence of a new volcanic vent at specific geographical coordinates. This was represented as binary labels (1 for presence, 0 for absence). With the target variable defined, the model was trained using existing vents as positive examples and other non-vent locations as negative examples in order to learn the patterns associated with volcanic vent locations in the magnetic data.

It is often beneficial to experiment with multiple models and compare their performance using appropriate evaluation metrics to determine the most effective model for a specific investigation. In our study, we considered a range of machine learning models, each with its own strengths and suitability for different tasks and datasets. The models we focused on include:

Logistic Regression:
- A simple and interpretable model suitable for binary classification tasks.
- Provides the probability of the input belonging to a certain class.
- Can be extended to handle multi-class classifications.
Decision Trees and Random Forests:
- Capable of capturing non-linear relationships and interactions between features.
- Decision trees are easy to interpret and visualize.
- Random Forests are an ensemble of decision trees, providing improved generalization and robustness.
Gradient Boosting Machines (GBMs):
- Captures complex relationships in the data by combining multiple weak learners.
- Can handle non-linear relationships and interactions.
- Often provides high predictive accuracy.
Support Vector Machines (SVMs):
- Effective for binary classification tasks, especially when the decision boundary is non-linear and the feature space is high-dimensional.
- Can be extended to handle multi-class classification.
- Utilizes the concept of maximizing the margin to find the optimal decision boundary.

We recognize the importance of experimenting with multiple models and comparing their performance using appropriate evaluation metrics to determine the most effective model for our specific investigation. We plan to assess the models using common evaluation metrics such as accuracy, precision, recall, F1-score, and area under the ROC curve (AUC-ROC) for binary classification tasks. By comparing the performance of these models on our dataset, we aim to identify the most effective model based on the specific characteristics of our data and the goals of our study. Regression algorithms are a type of machine learning algorithm used to predict numerical values based on input data. The goal of regression is to find a mathematical relationship between the input features and the target variable that can be used to make accurate predictions on new, unseen data [28].

Logistic regression, a supervised machine learning algorithm, excels in binary classification tasks by predicting the probability of a specific outcome, event, or observation. This model yields a binary or dichotomous outcome, offering two possible results, such as yes/no, 0/1, or true/false [29].

Equation (1) represents logistic regression:

y = \frac{e^{(b_{0} + b_{1} X)}}{1 + e^{(b_{0} + b_{1} X)}}

(1)

where X = input value, y = predicted output, b_{0} = bias or intercept term, and b_{1} = coefficient for input (X).
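As a minimal sketch of Equation (1), the function below evaluates the logistic curve for a single input value and thresholds it into a binary label. The function names, the 0.5 threshold, and the coefficient values in the usage note are illustrative assumptions, not values reported in this study.

```python
import math


def logistic(x, b0, b1):
    """Evaluate Equation (1): y = e^(b0 + b1*x) / (1 + e^(b0 + b1*x))."""
    z = b0 + b1 * x
    # Algebraically equivalent, numerically stable forms that avoid
    # overflow in exp() when |z| is large.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def classify(x, b0, b1, threshold=0.5):
    """Turn the predicted probability into a label (1 = vent, 0 = non-vent).

    The 0.5 threshold is an assumed default, not taken from the study.
    """
    return 1 if logistic(x, b0, b1) >= threshold else 0
```

For example, with the hypothetical coefficients b0 = -1.0 and b1 = 1.0, an input of 2.0 gives a probability above 0.5 and is labeled 1, while an input of 0.0 is labeled 0.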

Decision trees: Supervised machine learning operations that model decisions, outcomes, and predictions using a flowchart-like tree structure. A demonstration of the basic decision tree flow chart is represented in Figure 4.

The Gini index measures the likelihood of a randomly selected data point being misclassified by a particular node. It serves as the cost function for evaluating feature splits in a dataset. The Gini index is given by Equation (2):

\mathrm{Gini} = 1 - \sum_{i = 1}^{n} (p_{i})^{2}

(2)

where p_{i} = probability of an object being classified to class i, and n = number of classes.
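The Gini index of Equation (2) can be sketched in a few lines. The helper names below are hypothetical, and the size-weighted split score is the usual way a decision tree compares candidate splits rather than something described in this study.

```python
from collections import Counter


def gini_index(labels):
    """Equation (2): 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_gini(left_labels, right_labels):
    """Size-weighted Gini index of a candidate split (lower is better).

    Assumes at least one of the two child nodes is non-empty.
    """
    n = len(left_labels) + len(right_labels)
    return (len(left_labels) / n) * gini_index(left_labels) + (
        len(right_labels) / n
    ) * gini_index(right_labels)
```

A pure node (all vent or all non-vent) scores 0, while a node with an even mix of the two classes scores 0.5, the maximum for binary labels.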

Attribute Selection Measure (ASM) is a technique used for selecting the best attribute for discrimination among tuples. It gives rank to each attribute, and the best attribute is selected as the splitting criterion. When considering decision trees, they aid in constructing a suitable tree for choosing the optimal splitter, with values typically ranging between 0 and 1.

Random Forest stands out as a highly potent machine learning model in predictive analytics, establishing itself as a cornerstone in industrial machine learning applications. This model operates as an additive model, leveraging a series of base models to make predictions effectively. For a new input data point, the Random Forest Regression model makes a prediction by aggregating the predictions of all the individual trees in the forest. The mathematical equation can be represented as:

\hat{y} = \frac{1}{N} \sum_{i = 1}^{N} f_{i}(x)

(3)

where \hat{y} = the predicted output for the input data point x; N = total number of trees in the Random Forest; f_{i}(x) = the prediction of the i-th decision tree for the input data point x.

This equation represents the averaging of the predictions of all the individual trees in the Random Forest to obtain the final prediction for the input data point. For more on the Random Forest regression, see Breiman [30].
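The averaging in Equation (3) can be illustrated with a toy ensemble. The four threshold "trees" below are hypothetical stand-ins for fitted decision trees, and their cut-off values are invented for illustration only.

```python
def forest_predict(trees, x):
    """Equation (3): average the predictions f_i(x) of all N trees."""
    if not trees:
        raise ValueError("the forest needs at least one tree")
    return sum(tree(x) for tree in trees) / len(trees)


# Hypothetical one-split trees acting on a single magnetic anomaly value.
# The thresholds are illustrative, not fitted to any data.
trees = [
    lambda x: 1.0 if x > 100.0 else 0.0,
    lambda x: 1.0 if x > 150.0 else 0.0,
    lambda x: 1.0 if x > 50.0 else 0.0,
    lambda x: 1.0 if x > 120.0 else 0.0,
]

# Three of the four trees vote 1 for an input of 130.0.
score = forest_predict(trees, 130.0)
```

When each tree outputs 0 or 1, the average can be read as the fraction of trees voting for vent presence, which is one way such a score could be thresholded into a binary label.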

A GBM, which integrates gradient descent into boosting, follows the same algorithm as gradient boosting. This decision tree-based model utilizes a gradient descent approach to determine the α (step size) for a tree with T leaves. To calculate α at, for example, iteration m, and given several derivations in the region R_{jm} (leaf j), the basic equation for a GBM is [31]:

f_{m}(x) = f_{m - 1}(x) + \sum_{j = 1}^{T} \alpha_{jm} \mathbb{1}_{R_{jm}}(x)

(4)

\alpha_{jm} = \arg\min_{\alpha} \sum_{x_{i} \in R_{jm}} L(y_{i}, f_{m - 1}(x_{i}) + \alpha)

(5)

where f_{m}(x) is our model at iteration m, \mathbb{1}_{R_{jm}}(x) equals 1 when x falls in region R_{jm} and 0 otherwise, y is the actual value, and L is the loss function.
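One boosting iteration of Equations (4) and (5) can be sketched under a stated assumption: with a squared-error loss, the minimizing step size for each leaf is simply the mean residual of the samples in that leaf. The study does not say which loss was used, so the loss choice and all names below are illustrative.

```python
def leaf_step_sizes(y, f_prev, leaf_of):
    """Equation (5) under an assumed squared-error loss.

    For L = (y - f)^2, the alpha minimizing the summed loss over a leaf
    is the mean residual (y_i - f_{m-1}(x_i)) of the samples in that leaf.
    leaf_of[i] is the leaf index j that sample i falls into.
    """
    sums, counts = {}, {}
    for yi, fi, j in zip(y, f_prev, leaf_of):
        sums[j] = sums.get(j, 0.0) + (yi - fi)
        counts[j] = counts.get(j, 0) + 1
    return {j: sums[j] / counts[j] for j in sums}


def boosting_update(f_prev, leaf_of, alphas):
    """Equation (4): add the step size of the leaf containing each sample."""
    return [fi + alphas[j] for fi, j in zip(f_prev, leaf_of)]


# Toy data: two samples per leaf, previous model predicts 0.5 everywhere.
y = [1.0, 1.0, 0.0, 0.0]
f_prev = [0.5, 0.5, 0.5, 0.5]
leaf_of = [0, 0, 1, 1]

alphas = leaf_step_sizes(y, f_prev, leaf_of)
f_new = boosting_update(f_prev, leaf_of, alphas)
```

In this toy case the update moves every prediction onto its label in one step. Practical implementations usually scale each step by a learning rate, which is omitted here.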

SVMs: A binary SVM aims to classify subjects into one of two classes by utilizing specific features. The basic mathematical equation for Support Vector Machines (SVMs) can be described as follows:

Given a set of training data (x₁,y₁), (x₂,y₂), ..., (x_n,y_n), where x_i represents the input features and y_i represents the class labels (either −1 or 1 for binary classification), the goal of an SVM is to find the optimal hyperplane that separates the data into different classes while maximizing the margin. The equation for the decision function of a linear SVM can be represented as:

f(x) = \mathrm{sign}(w \cdot x + b)

(6)

where f(x) is the decision function that predicts the class of the input x; w is the weight vector; x is the input feature vector; b is the bias term; and \cdot represents the dot product.

The optimal hyperplane is determined by finding the weight vector w and the bias term b that maximize the margin while correctly classifying the training data. This is achieved by solving the optimization problem of maximizing the margin subject to the constraint so that the data are correctly classified. In the case of non-linear Support Vector Machines (SVMs), the process entails employing kernel functions to transform the input features into a higher-dimensional space. This transformation allows for the use of a linear hyperplane to effectively separate the classes. The mathematical formulations for the optimization problem and the utilization of kernel functions in non-linear SVMs are notably intricate, involving the principles of Lagrange multipliers and the dual form of the optimization problem. Further insights into SVMs are given by Vapnik [32] and James et al. [33].
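The linear decision function of Equation (6) can be sketched as follows. The weights and inputs used in the checks are arbitrary, the rule for a score of exactly zero is a convention chosen here, and the margin formula 2/||w|| is the standard result for a hard-margin linear SVM rather than something derived in this paper.

```python
import math


def svm_decision(w, x, b):
    """Equation (6): f(x) = sign(w . x + b), returning +1 or -1.

    A score of exactly zero is assigned to +1 by convention here.
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


def margin_width(w):
    """Width of the margin between the two supporting hyperplanes, 2/||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))
```

A kernel SVM replaces the dot product with a kernel evaluation against the support vectors. That case is not shown here.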

Concept Implementation

In this study, we used the following input data for the machine learning model:

Longitude: The geographic longitude of each data point, which provides spatial information relevant to the location of magnetic anomalies and volcanic vents.
Latitude: The geographic latitude of each data point, paired with longitude to give a complete spatial reference for the dataset.
Magnetic Anomaly: The measured magnetic field intensity at each data point. This value is critical for identifying subsurface geological features that may be associated with volcanic activity.
Vent Presence (Yes/No): A binary indicator that specifies whether a volcanic vent is present at the corresponding location. This variable serves as the target variable for the regression model, allowing us to train the model to recognize patterns associated with vent locations.

The concepts for implementation were as follows:

- Data Collection and Preprocessing: The magnetic data should include information about the longitude, latitude, and geomagnetic characteristics (magnetic field intensity) of known volcanic vent locations. Data from non-vent locations could also be included to provide a contrast for the model in order to learn the differences between vent and non-vent areas. The data would be preprocessed to handle missing values, scale the features, and prepare for model training.
- The target variable for the machine learning input would be defined as the presence or absence of a new volcanic vent at specific geographical coordinates.
- Feature Engineering (Geomagnetic Features): The geomagnetic data, including magnetic field intensity and other relevant characteristics, would serve as the features for the model (note: other relevant geographical or geological features could also be included, if available).
- Model Training (Training Data): The model would be trained using the known volcanic vent locations as positive examples (label 1) and non-vent locations as negative examples (label 0).
- We used a portion of the vent locations and magnetic field data for training and reserved a separate set for model evaluation to assess its performance reliably. Specifically, we applied an 80–20 split, where 80% of the data was randomly selected for training and the remaining 20% was used exclusively for testing.
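The random 80–20 split described in the last step can be sketched with the standard library alone. The function name, the fixed seed, and the integer stand-in records are assumptions for illustration; the study does not state how the random selection was implemented.

```python
import random


def train_test_split_80_20(records, test_fraction=0.2, seed=0):
    """Shuffle the records and hold out test_fraction of them for testing.

    Returns (train, test). The seed only makes the sketch repeatable.
    """
    rng = random.Random(seed)
    shuffled = list(records)  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical records: here just integer ids standing in for
# (longitude, latitude, magnetic anomaly, vent label) rows.
records = list(range(100))
train, test = train_test_split_80_20(records)
```

Because vent locations are much rarer than non-vent locations, a purely random split can leave few positive examples in the test set, so a split that preserves class proportions may be preferable in practice.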

Model Training (Machine Learning Model): A binary classification model, such as logistic regression, decision tree, Random Forest, GBM, or SVM, was trained to learn the patterns associated with volcanic vent locations in the geomagnetic data.

To ensure that the model learned spatial patterns in the magnetic field data rather than simply memorizing the coordinates of known vents, we implemented several techniques. First, we applied a stratified k-fold cross-validation approach, ensuring that each fold included a diverse mix of vent and non-vent locations spread across different regions. This forced the model to generalize from patterns in the magnetic anomalies rather than focusing on specific vent coordinates. Additionally, we experimented with data augmentation by creating artificial spatial offsets in both the longitude and latitude. This technique introduced minor shifts to the magnetic field data while preserving underlying patterns, helping the model focus on features associated with vent presence rather than absolute positions.
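The two techniques in this paragraph can be sketched as follows. The number of folds, the offset size of 0.001 degrees, and the record layout (longitude, latitude, anomaly, label) are assumptions made for illustration; the study does not report these values. This sketch also stratifies only by class label, not by region.

```python
import random


def stratified_k_fold(labels, k=5, seed=0):
    """Split sample indices into k folds with similar class proportions.

    Indices of each class are shuffled and dealt round-robin into the
    folds, so every fold receives a near-equal share of vents and
    non-vents.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def jitter_coordinates(points, max_offset_deg=0.001, seed=0):
    """Augment data by shifting longitude and latitude by small offsets.

    The anomaly value and label of each point are left unchanged.
    """
    rng = random.Random(seed)
    return [
        (
            lon + rng.uniform(-max_offset_deg, max_offset_deg),
            lat + rng.uniform(-max_offset_deg, max_offset_deg),
            anomaly,
            label,
        )
        for lon, lat, anomaly, label in points
    ]
```

Each fold in turn serves as the validation set while the remaining folds are used for training. Ensuring that folds also span different regions, as the text describes, would need an additional spatial grouping step that is not shown.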

We approximated each vent as a central point to simplify the modeling process, as the exact vent boundaries were not always clearly defined in the dataset. When multiple mesh nodes fell within the extent of a vent, we assigned a target value of 1 to all such nodes, treating them as being representative of the larger vent structure. This approach allowed the model to recognize the spatial extent of vents without requiring detailed boundary information.
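A possible way to assign the target value to mesh nodes is sketched below. It treats each vent as a central point with a circular extent. The 0.5 km radius, the function names, and the coordinates in the usage note are hypothetical, since the study does not state how vent extent was measured.

```python
import math


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two lon/lat points (degrees)."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2.0 * earth_radius_km * math.asin(math.sqrt(a))


def label_nodes(nodes, vents, radius_km=0.5):
    """Return 1 for every mesh node within radius_km of any vent centre.

    nodes and vents are lists of (longitude, latitude) tuples.
    radius_km is an assumed vent extent, not a value from the study.
    """
    labels = []
    for lon, lat in nodes:
        inside = any(
            haversine_km(lon, lat, vlon, vlat) <= radius_km for vlon, vlat in vents
        )
        labels.append(1 if inside else 0)
    return labels
```

For instance, with a single vent centre at (39.70, 24.30), a node about 0.1 km away would be labeled 1 and a node about 10 km away would be labeled 0.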

Hyperparameters in machine learning are configuration settings that are external to the model and are not learned from the data. They are set prior to training and control the learning process. For this study, the hyperparameters associated with the Random Forest algorithm were as follows:

n_estimators: The number of decision trees in the Random Forest. Each tree in the forest is trained on a random subset of the training data, and n_estimators controls how many of these trees are used in the ensemble.
max_depth: The maximum depth (number of levels) of each decision tree in the Random Forest; limiting it can help prevent overfitting.
min_samples_split: The minimum number of samples required to split an internal node during the construction of a decision tree; it helps control the complexity of the model and prevent overfitting.
min_samples_leaf: The minimum number of samples required at a leaf node; it also helps control the complexity of the model and prevent overfitting.
bootstrap: Whether bootstrap samples are used when building trees. If set to True, each tree in the Random Forest is built on a bootstrap sample of the training data, which introduces randomness and helps prevent overfitting.

To improve the performance of the machine learning model, model training and evaluation were performed with the best hyperparameters derived from tuning over the following values: n_estimators (100, 200, 300), max_depth (10, 20, 30, None), min_samples_split (2, 5, 10), min_samples_leaf (1, 2, 4), and bootstrap (True, False).
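The tuning values listed above define a search grid, which can be enumerated as in the sketch below. Exhaustive enumeration is an assumption made here; the study does not state the search strategy or library used.

```python
from itertools import product

# Candidate values as listed in the text.
param_grid = {
    "n_estimators": [100, 200, 300],
    "max_depth": [10, 20, 30, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "bootstrap": [True, False],
}


def iter_param_combinations(grid):
    """Yield one dict per combination of hyperparameter values."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


combinations = list(iter_param_combinations(param_grid))
n_combinations = len(combinations)  # 3 * 4 * 3 * 3 * 2
```

The grid contains 216 combinations, so an exhaustive search with k-fold cross-validation would fit 216 × k models. A dictionary of this shape is also the form accepted by scikit-learn's GridSearchCV, should that library be used.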

- Prediction and Inference: Once the model was trained, it was used to make predictions about the likelihood of new volcanic vents being present at other geographical coordinates based on their geomagnetic characteristics. These predictions were used to identify potential areas for further investigation or monitoring for volcanic activity.
- Model Evaluation and Refinement: The model’s performance was evaluated using metrics such as accuracy, precision, recall, and F1-score.
- Make predictions using the trained Random Forest model.

The flowchart for the application of the concepts used in this study is shown in Figure 5. The trained model was used to make predictions about the likelihood of new volcanic vents being present at other geographical coordinates given their magnetic characteristics.

3. Model Evaluation Metrics

Accuracy measures the proportion of correctly classified instances out of the total instances. It is calculated as (TP + TN)/(TP + TN + FP + FN), where TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives (misclassifying a non-vent area as a vent), and FN is the number of false negatives (misclassifying a vent area as non-vent) [34].

Precision measures the proportion of true positive predictions out of all positive predictions made by the model. It is calculated as TP/(TP + FP).

Recall (sensitivity) measures the proportion of true positive predictions out of all actual positive instances in the dataset. It is calculated as TP/(TP + FN). Recall is useful when the cost of false negatives is high.

The F1-score is the harmonic mean of precision and recall. It balances the two measures and is calculated using Equation (7):

\text{F1-score} = \frac{2 \times (\text{precision} \times \text{recall})}{\text{precision} + \text{recall}}

(7)
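The four metrics defined above can be computed directly from the confusion-matrix counts. The following is a minimal sketch; the function name and the example counts are hypothetical and are not taken from the study's data.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1-score from confusion counts.

    Ratios with a zero denominator are reported as 0.0 (a convention chosen
    for this sketch).
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * (precision * recall) / denominator if denominator else 0.0  # Equation (7)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for a vent / non-vent classifier.
metrics = classification_metrics(tp=90, tn=85, fp=15, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts, the F1-score from Equation (7) equals 2TP/(2TP + FP + FN) = 180/205, which is an equivalent form of the harmonic mean.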

A Python script was written to compute these metrics and process the data according to the flowchart in Figure 5. The output report for the evaluated models is shown in Table 1. Figure 6 and Figure 7 display the model performance plot and the Receiver Operating Characteristic (ROC) curve with Area Under the Curve (AUC) evaluations, respectively. The ROC curve and the AUC are important tools for evaluating the performance of classification models. The ROC curve provides a visual representation of the model’s performance across different classification thresholds, while the AUC provides a single metric to quantify the overall discriminatory power of the model. These tools are useful for comparing and selecting the best-performing model among the multiple classifiers considered.
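To make the relationship between the ROC curve and the AUC concrete, the sketch below sweeps the classification threshold over a set of predicted scores, records the false positive rate and true positive rate at each step, and integrates the curve with the trapezoidal rule. The function names, labels, and scores are hypothetical illustrations, not the study's script or data.

```python
def roc_curve_points(y_true, y_score):
    """Return (fpr, tpr) lists obtained by lowering the threshold step by step."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC needs at least one positive and one negative sample")
    # Rank samples from the highest to the lowest predicted score.
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all samples tied at this score so that ties form a single step.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for k in range(1, len(fpr)):
        area += (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2.0
    return area


# Hypothetical labels (1 = vent, 0 = non-vent) and predicted vent probabilities.
y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
fpr, tpr = roc_curve_points(y_true, y_score)
auc = auc_trapezoid(fpr, tpr)
print(round(auc, 4))
```

An AUC of 0.5 corresponds to the diagonal of random guessing, and values approaching 1 correspond to curves that bend towards the top-left corner.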

Figure 8 shows the predicted vent locations plotted on the magnetic anomaly data and Figure 9 shows a superimposed plot of the actual and predicted volcanic vents. Figure 10 shows the distribution density of actual and predicted vent locations, and Figure 11 displays the correlation assessment plot of distribution density of actual and predicted vent locations (Figure 10). A moving window approach was used to calculate the correlation as a spatial function in Figure 11. Specifically, we applied a moving window over the magnetic field data and vent locations, calculating the correlation coefficient within each window to capture localized variations in correlation strength across the study area. This allowed us to represent the spatial correlation between magnetic anomalies and vent presence as a continuous function rather than a single, global value. By using this method, we intended to identify regions with strong local correlations that may indicate vent-related magnetic signatures.
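The moving-window idea described above can be sketched as follows. The code slides a square window over two co-registered grids and stores the Pearson correlation coefficient of the values inside each window. The window size, the clipping of windows at the grid edges, the choice of Pearson correlation, and the synthetic grids are all assumptions made for illustration; the study's actual settings are not given in the text.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN when either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return math.nan
    return cov / math.sqrt(var_x * var_y)


def moving_window_correlation(grid_a, grid_b, half_width=1):
    """Local correlation map between two equally shaped 2D grids.

    Each output cell holds the correlation of the values inside a
    (2 * half_width + 1)-wide square window centred on that cell.
    Windows are clipped at the grid edges.
    """
    n_rows, n_cols = len(grid_a), len(grid_a[0])
    result = [[math.nan] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            a_vals, b_vals = [], []
            for rr in range(max(0, r - half_width), min(n_rows, r + half_width + 1)):
                for cc in range(max(0, c - half_width), min(n_cols, c + half_width + 1)):
                    a_vals.append(grid_a[rr][cc])
                    b_vals.append(grid_b[rr][cc])
            result[r][c] = pearson(a_vals, b_vals)
    return result


# Synthetic 4 x 4 grids: vent density decreases linearly as the anomaly increases.
magnetic = [[r * 4 + c for c in range(4)] for r in range(4)]
vent_density = [[-2 * v + 3 for v in row] for row in magnetic]
local_corr = moving_window_correlation(magnetic, vent_density, half_width=1)
print(local_corr[0][0])
```

Because the synthetic vent density is an exact decreasing linear function of the anomaly, every window returns a coefficient of -1; real data would produce a map that varies from place to place.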

4. Discussion

The analysis of the geological characteristics of the study area (Figure 2) and the distribution of volcanic vents across the magnetic dataset (Figure 3) provides a nuanced picture of the region’s subsurface dynamics. The vent system is generally sparse; however, significant concentrations of vents occur in specific geological formations and regions, providing key insights into the underlying geological and geophysical processes.

In the northern section of the research area, the presence of olivine basalt, hawaiite, mugearite, benmoreite, and trachyte formations within the Madinah Basalt is notable. This area emerges as a focal point with higher concentrations of vents, correlated with low magnetic anomalies (<−20 nT). These low anomalies suggest a complex subsurface structure, likely influenced by past volcanic and tectonic activity. The concentration of vents in these formations indicates a significant correlation between the geochemistry of the host rocks and the location of volcanic activity. This supports the hypothesis that the geochemical composition of these rocks plays a crucial role in the formation and distribution of volcanic vents. In addition to the northern area, denser vent concentrations are also observed in the eastern and central regions of the study area, corresponding to low magnetic anomalies (<−10 nT). Alqahtani et al. [10] and Berthier et al. [35] have characterized these regions, particularly those situated on olivine basalt, as being indicative of an abnormal subsurface body with potential geothermal resources. This abnormality could be the result of past volcanic activity or geothermal processes, which aligns well with the observed magnetic anomalies.

Conversely, the southern region displays a sparser distribution of vents and is primarily associated with olivine basalt formations. This pattern may indicate less volcanic activity or different subsurface conditions that are less conducive to vent formation. The variability in vent distribution across the study area highlights the influence of geological and geophysical factors in volcanic activity. The observed volcanic vents (Figure 3) are predominantly concentrated within the anomalous body identified by the magnetic data, indicating a strong correlation between vent distribution and subsurface geological anomalies. In northern Harrat Rahat, approximately 900 constructional vents, including scoria cones and spatter ramparts, are visible. Among these, 289 have been enclosed by more recent lava flows, effectively isolating them from their associated effusive products [12].

The selection of the optimal machine learning model for predicting vent locations was guided by factors such as dataset size, data complexity, and model interpretability. Table 1 compares the adopted models and their performance metrics. Among the models tested, the Random Forest model demonstrates the highest accuracy, precision, recall, and F1-score. Its robust performance metrics and ability to handle data imbalances make it the preferred choice for further optimization and fine-tuning. The model performance plot (Figure 6), ROC curve, and AUC evaluations (Figure 7) confirm the selection of the Random Forest classifier, with an AUC value of 0.97 indicating superior performance in differentiating between positive and negative classes.

The predicted vent locations (Figure 8) show a satisfactory alignment with the actual vent locations and identify additional potential vent sites. The predicted vents are distributed across the study area, with significant additions in the northern region, likely linked to the observed dense vent distribution. This suggests that some predicted vents might be concealed beneath the surface, obscured by volcanic eruptive deposits or structural changes such as faulting, subsidence, or uplift. During previous eruptions, volcanic vents could have been covered by lava flows, pyroclastic flows, and ash, concealing the original vent openings. The dated and mapped volcanic deposits of northern Harrat Rahat have been assigned to 12 eruptive stages [36], but the older stages are poorly resolved because they are concealed by younger volcanic products and affected by erosion. As a result, some older map units are composites of multiple eruptions [12].

Figure 9 superimposes the predicted vent locations on the actual vent locations to facilitate direct comparison. Newly predicted vent locations are shown as red triangles, while predictions that coincide with actual vents (black dots) are plotted on top of those vents. Significant predictions were observed in the northern, northeastern, northwestern, and central regions, with fewer predictions in the southwestern regions. The correlation between actual and predicted vent locations is 86%, with a Degree of Certainty (DC) of 97%. Figure 10 depicts the spatial distribution of the actual and predicted vent locations, with color intensity representing vent density in different areas. Comparing Figure 10a,b shows similar density patterns in the same geographic areas, supporting the model’s performance. The predicted vent density closely matches the actual vent density, suggesting that the model captures the spatial patterns of vent locations effectively. Figure 11 provides a correlation measure between the two density distributions, with a correlation coefficient of 1, indicating a strong positive correlation. This high correlation supports the model’s predictions and its ability to identify vent locations accurately. In this way, the study quantitatively assesses the similarity between the distribution densities of actual and predicted vent locations. The insights gained from the correlation between magnetic anomalies and vent distribution provide valuable information for further geological and geophysical investigations.

5. Conclusions

The present study is the first to predict volcanic vent locations within the RVF by using the magnetic properties of the region. The application of machine learning techniques has led to the successful prediction of volcanic vent locations in northern Harrat Rahat. The Random Forest classifier was selected for this task due to its superior performance compared to other models. The predicted vent locations are distributed across the study area, with a higher density in the northern region, which is characterized by a notable magnetic anomaly and a higher concentration of actual vents. The positive correlation between the actual and predicted vent locations enhances confidence in our model’s predictions.

We hypothesize that some of the predicted vents may be concealed by successive volcanic deposits or by tectonic processes. Therefore, further investigation is recommended to verify these potential vent locations. While our statistical evaluations provide some confidence in the model’s predictive capability, we recognize that they do not constitute proof that new, undiscovered vents have been correctly predicted in the field. To address this limitation, we propose that future work involve targeted field surveys in regions where the model has predicted potential vent locations. Such surveys would provide the necessary ground truth to verify the model’s predictions and improve the reliability of the vent prediction method. The insights gained can significantly contribute to our understanding of volcanic processes and aid in the effective management of volcanic hazards and geothermal resource exploration.

Author Contributions

Conceptualization, M.A. and E.A. (Ema Abraham); methodology, M.A., E.A. (Ema Abraham) and F.A.; software, E.A. (Essam Aboud) and E.A. (Ema Abraham); validation, F.A., E.A. (Essam Aboud) and E.A. (Ema Abraham); formal analysis, E.A. (Ema Abraham); investigation, M.A. and F.A.; resources, F.A.; data curation, M.A., E.A. (Essam Aboud), F.A. and E.A. (Ema Abraham); writing—original draft preparation, M.A. and E.A. (Ema Abraham); writing—review and editing, F.A. and E.A. (Ema Abraham); visualization, E.A. (Essam Aboud) and F.A.; supervision, E.A. (Ema Abraham); project administration, F.A.; funding acquisition, F.A. All authors have read and agreed to the published version of the manuscript.

Funding

This project was funded by the Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, under grant no. (GPIP: 701-869-2024).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

The authors acknowledge, with thanks, the DSR for its technical and financial support.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Ewert, J.W.; Diefenbach, A.K.; Ramsey, D.W. 2018 Update to the U.S. Geological Survey National Volcanic Threat Assessment. In Scientific Investigations Report; U.S. Geological Survey: Reston, VA, USA, 2018.
2. Mackenzie, K.; Muschalik, M.; Broesche, B. Comment on the 2018 Update to the USGS National Volcanic Threat Assessment; The University of Texas at Austin: Austin, TX, USA, 2021.
3. Aki, K.; Ferrazzini, V. Seismic monitoring and modeling of an active volcano for prediction. J. Geophys. Res. Solid Earth 2000, 105, 16617–16640.
4. Amigo, A. Volcano monitoring and hazard assessments in Chile. Volcanica 2021, 4 (Suppl. S1), 1–20.
5. Chang, J.-M.; Kuo, Y.T.; Chao, W.A.; Lin, C.M.; Lan, H.W.; Yang, C.M.; Chen, H. Landslide Warning Area Delineation through Seismic Signals and Landslide Characteristics: Insights from the Silabaku Landslide in Southern Taiwan. Seismol. Res. Lett. 2024, 95, 2986–2996.
6. Bellucci, F.; Woo, J.; Christopher, R.J.; Rolandi, G. Ground deformation at Campi Flegrei, Italy: Implications for hazard assessment. Geol. Soc. Lond. Spec. Publ. 2006, 269, 141–157.
7. Bonaccorso, A.; Bonforte, A.; Gambino, S.; Mattia, M.; Guglielmino, F.; Puglisi, G.; Boschi, E. Insight on recent Stromboli eruption inferred from terrestrial and satellite ground deformation measurements. J. Volcanol. Geotherm. Res. 2009, 182, 172–181.
8. Camp, V.E.; Hooper, P.R.; Roobol, M.J.; White, D.L. The Madinah Historical Eruption: Magma Mixing and Simultaneous Extrusion of Three Basaltic Chemical Types. In Saudi Arabian Directorate General of Mineral Resources, Open File Report DGMR-OF-06-32; Ministry of Petroleum and Mineral Resources: Jiddah, Saudi Arabia, 1989; p. 52.
9. Aboud, E.; Abraham, E.; Alqahtani, F.; Abdulfarraj, M. High potential geothermal areas within the Rahat volcanic field, Saudi Arabia, from gravity data and 3D geological modeling. Acta Geophys. 2023, 72, 1713–1729.
10. Alqahtani, F.; Abraham, E.M.; Aboud, E.; Rajab, M. Two-dimensional gravity inversion of basement relief for geothermal energy potentials at the Harrat Rahat volcanic field, Saudi Arabia, using particle swarm optimization. Energies 2022, 15, 2887.
11. Rehman, S. Saudi Arabian Geothermal Energy Resources—An Update. In Proceedings of the World Geothermal Congress 2010, Bali, Indonesia, 25–29 April 2010.
12. Robinson, J.E.; Downs, D.T. Overview of the Cenozoic Geology of the Northern Harrat Rahat Volcanic Field, Kingdom of Saudi Arabia. In Professional Paper 1862-R; Sisson, T.W., Calvert, A.T., Mooney, W.D., Eds.; Active volcanism on the Arabian Shield—Geology, volcanology, and geophysics of northern Harrat Rahat and vicinity, Kingdom of Saudi Arabia; Saudi Geological Survey Special Report SGS–SP–2021–1; U.S. Geological Survey: Reston, VA, USA, 2023; p. 20.
13. Downs, D.T.; Robinson, J.E.; Stelten, M.E.; Champion, D.E.; Dietterich, H.R.; Sisson, T.W.; Zahran, H.; Hassan, K.; Shawali, J. Geologic Map of the Northern Harrat Rahat Volcanic Field, Kingdom of Saudi Arabia. In Scientific Investigations Map 3428; Saudi Geological Survey Special Report SGS–SP–2019–2; U.S. Geological Survey: Reston, VA, USA, 2019; p. 65.
14. Al-Amri, A.M.; Mellors, R.; Harris, D.; El-Sayed, K.A. Geothermal and Volcanic Evaluation of Harrat Rahat, Northwestern Arabian Peninsula; Project No.: 11-SPA 2208-02; National Plan for Science, Technology & Innovation; King Saud University: Riyadh, Saudi Arabia, 2016.
15. Stevens, J. Living on Lava—NASA Earth Observatory. 2019. Available online: https://earthobservatory.nasa.gov/images/144471/living-on-lava (accessed on 6 June 2022).
16. Brown, G.F.; Schmidt, D.L.; Huffman, A.C., Jr. Geology of the Arabian Peninsula; Shield Area of Western Saudi Arabia; USGS Publication: Reston, VA, USA, 1989.
17. Moufti, M.R.; Németh, K. Harrat Rahat: The Geoheritage Value of the Youngest Long-Lived Volcanic Field in the Kingdom of Saudi Arabia. In Geoheritage of Volcanic Harrats in Saudi Arabia; Springer International Publishing: Cham, Switzerland, 2016; pp. 33–120. ISBN 978-3-319-33013-6.
18. El-Hussain, I.; Al-Shijbi, Y.; Deif, A.; Mohamed, A.M.E.; Ezzelarab, M. Developing a seismic source model for the Arabian Plate. Arab. J. Geosci. 2018, 11, 435.
19. Moufti, M.R.; Moghazi, A.M.; Ali, K.A. 40Ar/39Ar geochronology of the Neogene-Quaternary Harrat Al-Madinah intercontinental volcanic field, Saudi Arabia: Implications for duration and migration of volcanic activity. J. Asian Earth Sci. 2013, 62, 253–268.
20. Aboud, E.; Alqahtani, F.; Elmasry, N.; Abdulfarraj, M.; Osman, H. Geothermal anomaly detection using potential field geophysical Data in Rahat volcanic field, Madinah, Saudi Arabia. J. Geol. Geophys. 2022, 11, 1026.
21. Gudmundsson, A. Magma Chambers: Formation, Local stresses, excess pressure, and compartments. J. Volcanol. Geotherm. Res. 2012, 237–238, 19–41.
22. Tibaldi, A. Structure of volcano plumbing systems: A review of multi-parameter effects. J. Volcanol. Geotherm. Res. 2015, 298, 85–135.
23. Langenheim, V.E.; Ritzinger, B.T.; Zahran, H.; Shareef, A.; Al-dahri, M. Crustal structure of the northern Harrat Rahat volcanic field (Saudi Arabia) from gravity and aeromagnetic data. Tectonophysics 2019, 750, 9–21.
24. Alohali, A.; Bertin, D.; de Silva, D.; Cronin, S.; Duncan, R.; Qaysi, S.; Moufti, M.R. Spatio-temporal forecasting of future volcanism at Harrat Khaybar, Saudi Arabia. J. Appl. Volcanol. 2022, 11, 12.
25. Runge, M.G.; Bebbington, M.S.; Cronin, S.J.; Lindsay, J.M.; Kenedi, C.L.; Moufti, M.R.H. Vents to events: Determining an eruption event record from volcanic vent structures for the Harrat Rahat, Saudi Arabia. Bull. Volcanol. 2014, 76, 804.
26. Aboud, E.; El-Masry, N.; Qaddah, A.; Alqahtani, F.; Moufti, M.R.H. Magnetic and gravity data analysis of Rahat volcanic field, El-Madinah city, Saudi Arabia. NRIAG J. Astron. Geophys. 2015, 4, 154–162.
27. Zahran, H.; Stewart, I.C.F.; Johnson, P.R.; Basahel, M.H. Aeromagnetic-anomaly maps of central and western Saudi Arabia. In Saudi Geological Survey Open-File Report SGS-OF-2002-8; 4 Plates; U.S. Geological Survey: Reston, VA, USA, 2003; p. 6.
28. Pandey, A.K. Regression Algorithms. Medium. 2023. Available online: https://arunp77.medium.com/regression-algorithms-29f112797724 (accessed on 22 August 2023).
29. Kanade, V. What Is Linear Regression? Types, Equation, Examples, and Best Practices for 2022. Available online: https://www.spiceworks.com/tech/artificial-intelligence/articles/what-is-linear-regression/#_004 (accessed on 12 February 2024).
30. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
31. Baghel, V.S. Math Behind GBM and xGboost. Analytics Vidhya. Medium. 2019. Available online: https://medium.com/analytics-vidhya/math-behind-gbm-and-xgboost-d00e8536b7de (accessed on 22 August 2023).
32. Vapnik, V.N. The Nature of Statistical Learning Theory. In Information Science and Statistics, 2nd ed.; Springer: New York, NY, USA, 2000; ISBN 978-1-4419-3160-3.
33. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning, with Applications in R; Springer Science + Business Media, LLC, Part of Springer Nature 2021; Springer: New York, NY, USA, 2021; ISBN 978-1-0716-1418-1.
34. Lang, N. What Is Model Evaluation? Machine Learning. Data Base Camp. 2023. Available online: https://databasecamp.de/en/ml/model-evaluation-en (accessed on 12 February 2024).
35. Berthier, F.; Demange, J.; Iundt, F. Geothermal Resources of Harrat Khaybar and Harrat Rahat Progress Report 1400–1401 the Kingdom of Saudi Arabia. In Saudi Arabian Deputy Ministry for Mineral Resources Open-File Report BRGM-OF-02-44; Ministry of Petroleum and Mineral Resources: Jiddah, Saudi Arabia, 1982; p. 116.
36. Downs, D.T.; Stelten, M.E.; Champion, D.E.; Dietterich, H.R.; Nawab, Z.; Zahran, H.; Hassan, K.; Shawali, J. Volcanic history of the northernmost part of the Harrat Rahat volcanic field, Saudi Arabia. Geosphere 2018, 14, 1253–1282.

Figure 3. Aeromagnetic anomaly map with known vent locations in the northern part of the RVF.

Figure 4. Basic flow chart demonstration of the decision tree [29].

Figure 5. Flowchart for implementing the study concept.

Figure 6. Model performance plots of the tested models.

Figure 7. ROC curve and AUC for the evaluated models. The ROC curve provides a visual representation of the balance between sensitivity and specificity as the classification threshold is adjusted. In the plot, the diagonal line running from the bottom-left to the top-right (in black) signifies random guessing, whereas a curve that bends towards the top-left corner indicates superior model performance, as observed in the case of the Random Forest model. A higher AUC value indicates better overall performance of the model in terms of its ability to distinguish between positive and negative classes.

Figure 8. Predicted vent locations plotted on the magnetic map.

Figure 9. A superimposed representation of the actual and predicted volcanic vents on the magnetic map. Clearly, new vents can be seen from the predicted results while some regions with existing vents have also been predicted by our computations. The correlation between actual and predicted vents is 86%, with the Degree of Certainty (DC) being 97%.

Figure 10. Distribution density of actual and predicted vent locations. A similar density pattern is observed in (a,b), as the predicted vent density matches the actual vent density.

Figure 11. Correlation assessment plot of distribution density for the actual and predicted vent locations in Figure 10. The correlation evaluation is 1, indicating a strong positive correlation. The Kernel Density Estimation (KDE) curve is used here to visualize the distribution of actual and predicted vent locations, and the bandwidth controls the smoothness of that curve. The plot shows where the vents are concentrated and allows the patterns of the actual and predicted locations to be compared. The calculated bandwidth values indicate the level of detail captured in the density estimation.

Table 1. Summary scores of evaluation metrics applied to methods.

	Model	Accuracy	Precision	Recall	F1-Score
1	Logistic Regression	0.521	0.513	0.480	0.496
2	Decision Tree	0.835	0.820	0.850	0.835
3	Random Forest	0.915	0.905	0.923	0.914
4	Gradient Boosting	0.815	0.776	0.877	0.823
5	Support Vector Machine	0.517	0.513	0.330	0.402
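As a quick consistency check on Table 1, the F1-scores can be recomputed from the reported precision and recall using Equation (7). The sketch below does this and also selects the model with the highest F1-score. The variable names are hypothetical, and the tolerance of 0.001 is an assumption that allows for the three-decimal rounding of the tabulated values.

```python
# (accuracy, precision, recall, f1) as reported in Table 1.
table_1 = {
    "Logistic Regression": (0.521, 0.513, 0.480, 0.496),
    "Decision Tree": (0.835, 0.820, 0.850, 0.835),
    "Random Forest": (0.915, 0.905, 0.923, 0.914),
    "Gradient Boosting": (0.815, 0.776, 0.877, 0.823),
    "Support Vector Machine": (0.517, 0.513, 0.330, 0.402),
}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall, as in Equation (7)."""
    return 2 * (precision * recall) / (precision + recall)


# Recompute each F1-score and compare it with the reported value.
recomputed = {
    model: f1_from(precision, recall)
    for model, (_, precision, recall, _) in table_1.items()
}
for model, (_, _, _, reported_f1) in table_1.items():
    difference = abs(recomputed[model] - reported_f1)
    print(f"{model}: reported {reported_f1:.3f}, recomputed {recomputed[model]:.3f}, "
          f"difference {difference:.4f}")

# Model with the highest reported F1-score.
best_model = max(table_1, key=lambda model: table_1[model][3])
print(best_model)
```

The recomputed values agree with the reported F1-scores to within rounding, and the Random Forest row has the highest value in every column.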

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
