1. Introduction
For geotechnical investigation, the primary method for assessing subsurface conditions is borehole drilling. Geotechnical engineers work with geologists and other engineers to study the resulting data, which are then used to inform the design of various structures, such as foundations, tunnels, and slopes [1]. However, the material extracted directly from the boreholes constitutes only a small fraction of the ground. Given the highly complex and heterogeneous nature of geological materials, uncertainty persists even when data from multiple boreholes are available [2]. This uncertainty is particularly critical in geotechnical applications where the consequences of inaccurate predictions can be catastrophic. Traditional statistical methods have long been integrated into geotechnical engineering in order to cope with uncertainty [3]. Nevertheless, while these methods can effectively quantify the extent of unknowns, they often fall short in enhancing predictive accuracy.
Machine learning (ML) has emerged as a powerful method for uncovering complex patterns and nonlinear relationships within large sets of data. Compared to traditional statistical methods, which focus on quantifying uncertainty, ML methods are used for making predictions, with various techniques for obtaining high accuracy [
4]. The versatile nature of ML allows it to be applied to a wide variety of fields and tasks, including image recognition, natural language processing, and optimization. Different researchers have used ML to analyze data from borehole logs. Wang et al. [
5] used ML for developing more accurate geological cross-sections. In their work, they used a Bayesian supervised ML model and highlighted its potential to be both more efficient and more accurate than traditional methods. Similarly, Zhou et al. [
6] used recurrent neural networks for 3D geo-stratigraphic models. Brungard et al. [
7] used ML for predicting soil types in three semi-arid areas in the United States. They found that for greater geological and topographical complexity, there are a larger number of soil classes and lower ML accuracy. Barman and Choudhury [
8] analyzed soil samples in farming lands in India. They used ML to analyze camera photos of the soil and predict soil texture. Shao et al. [
9] provided an overview of research where ML was used for the classification of soil types. They found that different ML algorithms have been used successfully for this purpose, including support vector machines, decision trees, and random forests. Arguably, in the current state of practice, where the performance of many ML models can be readily compared, selecting the ideal model is of secondary importance. While previous studies have focused on applying ML for improving geological cross-sections and stratigraphic models, the primary question addressed in the current study is “How can ML be applied to borehole data to gain deeper insight for geotechnical understanding?”
In order to tackle this question, ML is used to analyze a large data set that consists of multiple boreholes drilled across the Gaza Strip. This coastal region is characterized by its predominantly sandy soils and sedimentary rock formations. These formations were influenced by wind-driven processes and historic fluctuations in sea levels, which led to alternating layers of marine and terrestrial deposits. Additionally, carbonate rocks, such as limestone and chalk, also exist and were formed during periods of marine submergence. This diverse and heterogeneous geological landscape complicates subsurface stability, with geotechnical challenges arising from variability in soil cohesion and the presence of loose sand pockets within denser formations.
During recent years, the Gaza Strip has been found to host a vast network of underground tunnels constructed as a series of covert operations [
10]. Hence, it is likely that standard geotechnical practices such as comprehensive site investigations were rarely carried out, leading to an increased reliance on human experience and intuition. Given both the challenging geological environment and circumstances, the Gaza Tunnel Network (GTN) serves as a unique case study [
11].
Accordingly, the ML analysis is carried out in an attempt to enhance the understanding of this geological unit, using existing geological knowledge and the GTN case study as references to compare ML with human-based learning. To achieve this, the ML results are analyzed to compare their predictability of soil type per unit depth with the human intuition of the tunnel builders. It is emphasized that the primary objective of this paper is not to provide a geological or geotechnical analysis of the Gaza Strip. Instead, it aims to demonstrate how ML techniques can be applied to borehole data to extract valuable insights, offering an approach that can be adapted to other regions.
The remainder of this paper is organized as follows. First, the geological characteristics of the Gaza Strip are reviewed. Second, an overview of the GTN is provided, discussing its construction, stability, and relevance as a case study. Third, the methodology for data analysis is detailed, including a description of the data set, the procedure for data preparation, and the ML analyses. Fourth, the results of the analysis are presented. Fifth, the implications of the results are discussed, along with recommendations for future research. Finally, the key findings are summarized.
2. Geology
The geology of the Gaza Strip is largely characterized by a combination of sandy and loamy soils, shaped by the region’s coastal proximity and historical climatic conditions, particularly during the Pleistocene–Holocene epochs. A key study by [
12] examines the development of paleosols, or ancient soils, in the Gaza region, revealing that windblown dust (loess) from the Sinai and Saharan deserts played a key role in the formation of these soils. The climatic conditions of the Pleistocene and Holocene epochs played a crucial role in this process. During glacial and interglacial periods, changes in sea levels affected sediment deposition along the Mediterranean coast. Colder, drier periods saw stronger aeolian activity, leading to a greater accumulation of wind-blown sand and loess, while warmer interglacial periods promoted vegetation growth, soil stabilization, and the buildup of organic matter. These shifts in sea levels also caused alternating layers of marine and terrestrial deposits, reflecting the region’s dynamic environmental history. The presence of carbonate rocks such as limestone and chalk formations can be attributed to the periods of submergence.
Zaineldeen et al. [
13] investigated the geological structure of the coastal aquifer in the southern part of the Gaza Strip and focused on the tectonic forces and faults that formed in this area. Their study highlights how the permeability and porosity of these geological formations vary, with more porous sandy layers acting as major water-bearing units, while the clay-rich layers serve as confining units. Tectonic movement and localized fracturing might also affect the stability of subsurface structures, including tunnels.
Further insights into the region’s geological history are provided by [
14], who studied the Middle to Late Pleistocene sand sheet sequence at Kerem Shalom, an area in the Israeli Negev that borders with the southern part of the Gaza Strip. Their research highlights how wind-driven sand incursions from the Mediterranean coastal plain occurred during periods of lowered sea levels and intensified arid conditions. These incursions resulted in the accumulation of extensive sand sheets, documenting significant episodes of desertification and environmental changes over tens of thousands of years. These sand deposits, shaped by aeolian processes, played a crucial role in defining the landscape and stratigraphy of the broader coastal region, including the Gaza Strip. The sandy soils, derived from sandy parent materials, dominate the coastal plain, which influences the stratigraphy and overall landscape.
From a geotechnical perspective, purely sandy soils, due to their larger grain size and low organic content, exhibit low cohesion, making them more prone to erosion and movement compared to finer soils such as loam or clay. In deeper layers, cementation processes occur, where minerals like calcium carbonate precipitate via groundwater flow, leading to the formation of more cohesive materials. The locally termed kurkar is a type of calcareous sandstone. The kurkar strength varies from weakly cemented sands to harder sandstone. The kurkar is prevalent all along the Mediterranean coast and has been encountered in tunneling projects in Tel Aviv, Israel [
15]. Although there is limited research specifically addressing the cementation of deeper sand layers in Gaza, the presence of kurkar suggests that such processes have contributed to the stability of these layers. Additionally, clay content in the Gaza Strip’s soils, introduced through aeolian (i.e., wind transport), alluvial (i.e., water transport), and marine sedimentation processes, further contributes to the region’s soil texture and composition. Together, these processes define the diverse and heterogeneous geological landscape of Gaza, complicating tunneling operations. In some cases, fine particles transported by wind or water settle and form clay pockets and contribute to the cohesion of the sand. In other cases, variations in groundwater flow, or clay settlement, create pockets of loose sand within denser formations. The latter present the greatest threat to tunnel workers, as they may collapse without warning, prior to the installation of support elements.
3. The Gaza Tunnel Network
Tunnels have been used for warfare since ancient times and for multiple purposes [
16]. Nevertheless, the GTN stands out in terms of its scale and density. The Gaza Strip is 41 km long and 6 to 12 km wide. The overall length of the GTN is estimated to be in the range of 500 to 1000 km [
10]. This suggests that the tunnels extend across multiple geological units. Despite several attacks that were targeted to destroy the tunnels, the GTN has proven to be remarkably resilient [
10]. The typical tunnel cross-section is small, allowing a single person to pass through the tunnels, while slightly bent [17]. The estimated cross-section dimensions and a photo are shown in Figure 1. As can be seen in Figure 1, the standard support system used for these tunnels includes thin concrete wall and roof elements, approximately 5 cm thick. These elements consist of a single layer of rebar, with a bar diameter estimated to be 8 mm.
The tunneling procedure is such that a vertical shaft is excavated to a certain depth from where horizontal tunneling is initiated. Subsequently, through a repetitive process, the soil is excavated, and the support elements are erected.
These Gaza tunnels follow a one-size-fits-all design using similar thin concrete elements, and the tunnel workers do not adapt the tunnel support to different soil types. This contrasts with the common approach for conventional tunneling, where the support system is varied and selected based on changing geotechnical conditions [18]. In addition to these standard tunnels, the GTN includes additional underground components, such as wider underground rooms, and even large caverns that allow large vehicles to drive through them. For the latter, heavier reinforcement was used, including thick concrete and even steel beams.
Over the years, there were a number of media reports regarding collapse incidents that resulted in the death of the builders [
17]. Given the secretive circumstances of the GTN, it cannot be known exactly how many such incidents occurred. Most likely, these instances occurred when excavators encountered unpredicted large pockets of cohesionless sand.
Based on a range of strength parameters typically used for sandy formations, Elmo and Mitelman [11] carried out geotechnical analyses of the Gaza tunnels. These analyses showed that the majority of elements should exhibit cracking and even complete failure. However, reality has shown that this is not the case. This disparity could be attributed to two factors. First, it may reflect potential biases in traditional geotechnical practice, which often leans towards conservatism when selecting soil strength parameters. Second, it indicates that local experience has allowed the tunnel excavators to identify stable soil layers and avoid weaker soils.
To summarize, the scale of the GTN, which traverses the entire Gaza Strip, makes it a unique large-scale geotechnical project that reflects the statistical variability in the region’s geology. The GTN demonstrates how large-scale intuition has often guided successful tunneling, while also revealing the limitations of predictability, with occasional unforeseen pockets of loose sand causing collapses. By comparing ML results with these observations, the case study can help validate and enhance data-driven approaches to geotechnical assessments.
4. Methodology and Data Analysis
For the current study, a data set that consists of borehole logs from the Gaza Strip is analyzed using ML methods. The objective of the analysis is to assess the predictive power of the ML models, and to compare these findings with the geological knowledge and the observed stability of the GTN. The tunnels’ ability to remain stable across a widespread area, even without site investigations, suggests that tunnel builders relied on predictable geological conditions. Therefore, if the ML models demonstrate high accuracy, it would reinforce the idea that the subsurface conditions are sufficiently consistent, allowing tunneling operations to succeed despite the moderate density of geotechnical data. Additionally, the analysis will include an examination of the misclassifications produced by the ML models. For example, if the model’s predictions frequently confuse similar soils, e.g., sand-clay and sand-silt, this would be a less severe error than mistaking a soil formation for a rock formation.
A general review of ML and data analysis can be found in [
19]. Here, we provide a concise description of the procedure undertaken in this study. All coding and data analysis for this study were carried out via Python, an open-source programming language, with ML models implemented primarily via the Scikit-learn library. This choice ensured access to a wide range of robust data analysis tools [
20].
Prior to conducting ML, it is crucial to prepare the data. This preparation includes examining the quality of the data and ensuring that false information such as duplicates and nonsensical entries are cleaned from the data set. Furthermore, a process called feature engineering must be carried out. This process involves using domain knowledge in order to structure the data features in a manner that is assumed to improve the performance of the ML model.
The data set for the current study consists of 632 borehole logs from a number of archives, collected and digitized by others.
Figure 2 shows the spatial distribution of the boreholes from the data set superimposed on an aerial map. The distance between the two farthest-apart boreholes is 54.2 km and the average density of the boreholes is approximately 1.8 boreholes/km². The data include the following columns: (1) borehole ID number, (2) the ground level at the borehole, (3) the x and y coordinates which define the location of the borehole, (4) the range of depths for each distinct soil type, (5) the soil type, (6) additional soil descriptions, and (7) data source.
Table 1 presents a sample of a single borehole profile from the raw data. Since the data represent a single borehole, the borehole ID, ground level, and spatial coordinates are constant, and the soil layers change with depth. This sample reflects the heterogeneity typical of the Gaza Strip subsurface, with transitions from loose sand to more cohesive layers like clay and interspersed sandstone formations.
For data preparation, a number of preprocessing steps were carried out. First, the columns with additional descriptions of the soil and data source were removed. One reason for removing the additional descriptions is that they appeared only in a minority of the boreholes. However, this is not the primary reason for their exclusion, as ML models are capable of training even on incomplete data sets [20]. The primary reason is that these descriptions are given in natural language, for example: “Light brown sand or sandstone with round flint pebbles”. As these descriptions do not follow any systematic classification methodology, it is highly challenging to divide them into groups. In this context, it is important to understand that for classification models, the entire data set must be converted into numerical representations to allow the ML model to execute its calculations. The prospect of integrating advanced ML models, such as large language models (LLMs), for data preparation is discussed in Sections 5 and 6.
Additional data preparation involved converting the depth column from ranges to incremental values. In other words, the initial data set included the top and bottom of each distinct soil type, as shown in
Table 1, so that a single row represented a soil layer. These data were transformed so that increments no larger than one meter were represented by separate rows. This was assumed to improve the consistency of the data so that the ML training process would be enhanced.
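The range-to-increment conversion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the script used in the study: the function name `expand_layer`, the field names, and the choice to split each layer into equal slices no thicker than one meter are all hypothetical.

```python
import math


def expand_layer(top, bottom, soil, max_step=1.0):
    """Split one soil layer into slices no thicker than max_step (meters).

    Assumption: each layer is divided into equal slices; the study may have
    used a different slicing rule.
    """
    thickness = bottom - top
    # Small tolerance so that an exact 3.0 m layer yields 3 slices, not 4.
    n_slices = max(1, math.ceil(thickness / max_step - 1e-9))
    step = thickness / n_slices
    return [
        {"depth_from": top + i * step, "depth_to": top + (i + 1) * step, "soil": soil}
        for i in range(n_slices)
    ]


# Hypothetical borehole profile given as (top, bottom, soil type) per layer.
layers = [(0.0, 2.5, "Sand"), (2.5, 3.0, "Clay"), (3.0, 6.0, "Sandstone")]
rows = [row for top, bottom, soil in layers for row in expand_layer(top, bottom, soil)]

for row in rows:
    print(row)
```

In this sketch, thin layers are kept as single rows rather than being merged, so that no soil type recorded in the log is lost.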
Finally, simple checks were carried out to identify whether nonsensical data exist. Examples include writing a script to check that the soil depths are indeed consistent from top to bottom and checking that no spelling errors cause additional soil classes to be created. It is important to acknowledge, however, that certain errors may remain undetected. These can stem from various sources, including misclassifications made by the geologist during the borehole investigation, as well as inaccuracies introduced during the data collection and digitization processes. While every effort was made to clean and standardize the data, such potential sources of error must be considered when interpreting the results of the analysis.
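Checks of this kind can be illustrated with a short sketch. The helper names `depth_problems` and `unexpected_labels`, the reference list of soil labels, and the sample layers are hypothetical and serve only to show the idea of testing depth continuity and spotting misspelled classes.

```python
def depth_problems(layers):
    """Return (layer index, message) pairs for inconsistent depth ranges."""
    problems = []
    previous_bottom = None
    for index, (top, bottom, _soil) in enumerate(layers):
        if bottom <= top:
            problems.append((index, "bottom is not deeper than top"))
        if previous_bottom is not None and abs(top - previous_bottom) > 1e-6:
            problems.append((index, "gap or overlap with the layer above"))
        previous_bottom = bottom
    return problems


def unexpected_labels(layers, known_labels):
    """Return soil labels that are not in the reference list (e.g., typos)."""
    return sorted({soil for _top, _bottom, soil in layers} - set(known_labels))


known = {"Sand", "Clay", "Sandstone"}  # hypothetical reference list
good = [(0.0, 2.5, "Sand"), (2.5, 3.0, "Clay"), (3.0, 6.0, "Sandstone")]
bad = [(0.0, 2.5, "Sand"), (3.0, 2.8, "Cley"), (2.8, 6.0, "Sandstone")]

print(depth_problems(good), unexpected_labels(good, known))
print(depth_problems(bad), unexpected_labels(bad, known))
```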
The next step is selecting an appropriate ML model to analyze the data set. ML models can be classified into supervised and unsupervised learning. Supervised learning involves building a correlation between input and output data. In contrast, for unsupervised learning, there are no output data; rather, the data are grouped into clusters with similar characteristics. For this study, both approaches were used, as described below.
Another important distinction in supervised ML is between regression and classification models. For the former, the input data are assigned to numerical outputs, and for the latter, the input data are related to different classes of data. For the current study, the supervised learning involved classification, as the output data are the type of soil identified during borehole drilling.
The supervised learning process is divided into two primary stages: (1) training and (2) testing. During the training stage, a relationship between the input and output data is established through an iterative process. This is achieved by analyzing only a subset of the data, while reserving a smaller portion of the data for the subsequent testing stage. In the testing stage, the accuracy of the ML model is evaluated against the out-of-sample data, thus ensuring that the model is generalizing well and has not overfitted, i.e., that it has not merely matched the data it has been trained on at the expense of its performance on unseen data. For the supervised classification learning in this study, accuracy is used as the primary performance metric. Accuracy is calculated by dividing the number of correct predictions by the total number of predictions. In the context of this study, accuracy refers to the model’s ability to correctly identify the type of soil based on the input features, i.e., the location, depth, and ground level from the testing data set. The accuracy score indicates the reliability of the model to correctly classify soil types in other locations.
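The hold-out procedure and the accuracy metric can be sketched in a few lines. The 20% test fraction, the random seed, and the function names are assumptions made for illustration; the text does not state the split ratio used in the study.

```python
import random


def holdout_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and reserve a fraction of them for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true, y_pred):
    """Number of correct predictions divided by the total number of predictions."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


rows = list(range(10))  # stand-in for the prepared data rows
train_rows, test_rows = holdout_split(rows)

# Three of four hypothetical predictions match the logged soil type.
example_accuracy = accuracy(
    ["Sand", "Clay", "Sand", "Sandstone"],
    ["Sand", "Sand", "Sand", "Sandstone"],
)
print(len(train_rows), len(test_rows), example_accuracy)
```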
The ML model used for this analysis is random forest (RF). RFs are an extension of decision trees, which are ML models that predict the value of the output variable by learning decision rules derived from the input features. A decision tree is constructed by splitting the data, starting at the root node, into smaller subsets based on feature values, a process that continues recursively at each resulting node. RF models enhance this approach by generating multiple decision trees and using a voting mechanism for selecting the final prediction [21]. This model was chosen as it has been found to outperform other models for similar tasks [9]. A comparison to other ML ensemble methods is carried out as well, as detailed in Section 5. A key strength of ensemble methods like RF is their ability to perform well even without hyperparameter tuning or feature normalization. For the current study, the Scikit-learn default settings were used, further demonstrating the effectiveness of RFs for geotechnical borehole data analysis.
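A minimal sketch of such an RF classification with Scikit-learn default hyperparameters is given here. The data are synthetic and the rule that generates the soil labels is invented purely for illustration; the 20% test fraction and the fixed `random_state` (used only for reproducibility) are likewise assumptions and not details reported for the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_rows = 600

# Synthetic input features mirroring those named in the text.
X = np.column_stack([
    rng.uniform(0, 10_000, n_rows),  # x coordinate (m)
    rng.uniform(0, 10_000, n_rows),  # y coordinate (m)
    rng.uniform(20, 80, n_rows),     # ground level (m)
    rng.uniform(0, 60, n_rows),      # depth below ground level (m)
])

# Invented labeling rule, for illustration only.
y = np.where(X[:, 3] > 30, "Sandstone", np.where(X[:, 0] > 5_000, "Sand", "Clay"))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Default hyperparameters; no feature normalization is applied.
model = RandomForestClassifier(random_state=0)
model.fit(X_train, y_train)

rf_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {rf_accuracy:.3f}")
```

Because the synthetic rule is far simpler than real stratigraphy, the accuracy obtained here says nothing about the accuracy to be expected on actual borehole data.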
For unsupervised learning, the k-means algorithm was used. K-means is a widely used method for dividing a data set into a specified number of clusters, where each cluster groups data points that are more similar to each other than to those in other clusters [20]. The algorithm iteratively assigns each data point to its nearest centroid and then updates each centroid to the mean of its assigned points, effectively identifying natural groupings within the data.
For this study, clustering was performed on the borehole soil profiles, which were first transformed into numerical values using one-hot encoding. This process converted categorical soil types into a binary representation, making it possible for the k-means algorithm to process and analyze them. The k-means model was assigned to group boreholes based on similarities in their soil layers across different depths. A minimum of three clusters was assigned to the k-means model. To determine the number of clusters, the elbow method was used. According to this method, the k-means algorithm is run repeatedly with an increasing number of clusters. Subsequently, the number of clusters is selected at the point of diminishing returns, i.e., the point beyond which adding more clusters does not significantly improve the compactness of the clusters.
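The one-hot encoding and elbow procedure can be sketched as follows, using Scikit-learn's `KMeans` and its `inertia_` attribute (the within-cluster sum of squared distances) as the measure of compactness. The soil profiles are synthetic, the four archetype profiles are invented, and all boreholes are assumed to share the same number of depth increments, which is a simplification.

```python
import numpy as np
from sklearn.cluster import KMeans

soil_types = ["Sand", "Clay", "Sandstone"]
rng = np.random.default_rng(0)

# Invented archetype profiles: one soil label per 1 m depth increment.
archetypes = [
    ["Sand"] * 5 + ["Sandstone"] * 5,
    ["Clay"] * 5 + ["Sand"] * 5,
    ["Sandstone"] * 10,
    ["Sand"] * 10,
]


def one_hot_profile(profile):
    """Encode a soil profile as a flat binary vector (depth x soil type)."""
    encoded = np.zeros((len(profile), len(soil_types)))
    for i, soil in enumerate(profile):
        encoded[i, soil_types.index(soil)] = 1.0
    return encoded.ravel()


profiles = []
for k in range(80):
    profile = list(archetypes[k % len(archetypes)])
    # Perturb one random depth increment to mimic local variability.
    profile[rng.integers(len(profile))] = soil_types[rng.integers(len(soil_types))]
    profiles.append(one_hot_profile(profile))
X = np.vstack(profiles)

# Elbow method: record compactness for an increasing number of clusters,
# starting from the minimum of three clusters.
inertias = {}
for n_clusters in range(3, 8):
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    inertias[n_clusters] = km.inertia_

for n_clusters, inertia in inertias.items():
    print(n_clusters, round(inertia, 1))
```

The elbow is then read from the printed values as the cluster count after which the inertia stops dropping sharply; in this synthetic case the data were built from four archetypes.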
The unsupervised analysis can be modified to include a number of different weighted features for the clustering. For this study, the clustering considered only the similarity in soil profile, without considering other features, such as location and ground level.
The results of the clustering analysis can be assessed by visualizing the clusters on a scatter plot, where boreholes are colored according to their assigned group. Effective clustering is indicated by well-defined groups that have similar soil profiles and are of comparable size. If one of the clusters has very few boreholes compared to the others, this may indicate that this cluster is a group of anomalous boreholes.
The purpose of performing the unsupervised analysis is to identify distinct geological regions within the study area. By analyzing large sets of data, the analysis can reveal divisions in the subsurface conditions that may not be immediately apparent through traditional methods. Potentially, this approach allows for understanding broader geological trends across the region. Practical implications of clustering analysis can include adjusting tunneling methods according to local conditions or conducting additional investigations in areas that have been flagged as outliers.
5. Results
Before examining the results of the supervised ML analysis, it is important to first understand the characteristics of the data set, particularly the distribution of soil types. Gaining insight into the prevalence of different soil types is crucial for interpreting the model’s results accurately. As a simple example, if 90% of the data set consists of sandstone, a model with 90% accuracy does not necessarily indicate strong performance, as it could simply be predicting ‘sandstone’ for all inputs. The data set used in this study contains 38 distinct soil types, and their distribution is shown in Figure 3. To simplify the visualization, soil types that account for less than 3% of the total are grouped together under the ‘Other’ category.
Notably, sandstone is the most prevalent soil type, making up 38.1% of the data set, followed by sand at 21%. The ‘Other’ category, which encompasses the less common or anomalous soil types, constitutes 19.4% of the data set. This significant presence of varied, lower-frequency soils suggests a degree of geological complexity within the region, with numerous transitions and localized variations. For the ML model, this introduces a challenge, as accurately predicting these less prevalent soils requires the model to distinguish subtle differences rather than simply relying on dominant patterns.
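The point about class prevalence can be made concrete with a majority-class baseline, i.e., the accuracy obtained by always predicting the most frequent soil type. The sketch uses the shares reported above for sandstone, sand, and ‘Other’; the label list of 1000 rows is synthetic, and the ‘Remaining types’ entry is a placeholder that lumps together the individual soil types not quantified in the text.

```python
from collections import Counter

# Synthetic label list reproducing the reported shares out of 1000 rows.
labels = (
    ["Sandstone"] * 381          # 38.1%
    + ["Sand"] * 210             # 21%
    + ["Other"] * 194            # 19.4%
    + ["Remaining types"] * 215  # placeholder for several unquantified types
)

counts = Counter(labels)
majority_class, majority_count = counts.most_common(1)[0]

# Accuracy of a trivial model that always predicts the majority class.
baseline_accuracy = majority_count / len(labels)
print(majority_class, baseline_accuracy)
```

Under these shares, always predicting sandstone would be correct for roughly 38% of the rows, which gives a reference level against which a model's accuracy can be judged.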
The RF model achieved an overall accuracy of 75.26%, i.e., it correctly classified soil types in approximately three-quarters of the test set. Commonly occurring soil types, such as sandstone, sand, and clay, achieved higher accuracy, indicating that the ML model is less successful in identifying the less prevalent soil types. In order to examine the performance of other ML models, two additional ensemble models, Gradient Boosting and XGBoost, were also trained and tested. While all these ML models use decision trees, RFs employ bagging, where multiple decision trees are independently trained on different random subsets of the training data, and the final prediction is made according to the majority of classification outputs. In contrast, in boosting techniques, used in models like Gradient Boosting and XGBoost, decision trees are built in a sequential process, where each tree corrects the errors of the previous ones. XGBoost is an advanced implementation of Gradient Boosting, with additional regularization techniques, parallel computation, and a more sophisticated objective function that optimizes for both accuracy and execution speed.
Figure 4 shows the accuracy of each of the models. These results show that the RF model performed best (75.3%), while XGBoost performed comparably at 73.0%, and Gradient Boosting lagged significantly behind with 56.3%. These results are similar to the findings by Shao et al. [9]. Given that RF outperformed the other models, the remainder of this paper focuses on analyzing its results. However, for more complex or noisy data sets, XGBoost’s advanced features, such as its ability to handle missing data and its fine-tuning capabilities, might enable it to outperform RF.
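A comparison of a bagging model and a boosting model on the same train/test split can be sketched as follows. Only Scikit-learn's `RandomForestClassifier` and `GradientBoostingClassifier` are used; XGBoost is omitted because it is a separate library. The data and the labeling rule are synthetic, so the scores produced here are unrelated to the accuracies reported in Figure 4.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n_rows = 500

# Synthetic features: x, y, ground level, depth.
X = np.column_stack([
    rng.uniform(0, 10_000, n_rows),
    rng.uniform(0, 10_000, n_rows),
    rng.uniform(20, 80, n_rows),
    rng.uniform(0, 60, n_rows),
])
# Invented labeling rule, for illustration only.
y = np.where(X[:, 3] > 30, "Sandstone", np.where(X[:, 0] > 5_000, "Sand", "Clay"))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Random forest": RandomForestClassifier(random_state=0),          # bagging
    "Gradient boosting": GradientBoostingClassifier(random_state=0),  # boosting
}

# Both models see exactly the same training and test rows.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```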
The accuracy of the model was evaluated against different depth intervals, as shown in Figure 5. Presumably, at greater depths, where bedrock can be found, it would be easier for the model to make accurate predictions. Looking into the data set, while the percentage of sandstone does increase with depth, it is apparent that there are several pockets of soil even at great depths. Sand is the most prevalent soil type at depths greater than 50 m below ground level. This aligns with the geological literature, which notes that the Gaza Strip’s geology is heavily influenced by aeolian processes. As can be seen in Figure 5, there is a slight improvement in accuracy with increasing depth; however, it is minor and inconsistent. This could be attributed to the prevalence of sand and other soils at great depths. Additionally, many boreholes in the data set have limited depth, reducing the amount of deep-level data available for training. This lack of data coverage at greater depths likely contributes to the observed variability in accuracy, limiting the model’s ability to consistently improve predictions at these levels.
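Evaluating accuracy per depth interval amounts to grouping the test predictions by depth bin and computing the accuracy within each bin. The sketch assumes 10 m intervals and a hypothetical function name `accuracy_by_depth`; the interval width used for Figure 5 is not stated in the text.

```python
from collections import defaultdict


def accuracy_by_depth(depths, y_true, y_pred, bin_width=10.0):
    """Return {(bin start, bin end): accuracy} for each occupied depth bin."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for depth, true_label, predicted_label in zip(depths, y_true, y_pred):
        bin_index = int(depth // bin_width)
        totals[bin_index] += 1
        hits[bin_index] += int(true_label == predicted_label)
    return {
        (b * bin_width, (b + 1) * bin_width): hits[b] / totals[b]
        for b in sorted(totals)
    }


# Hypothetical test rows: depth (m), logged soil type, predicted soil type.
depths = [2, 8, 12, 18, 25]
y_true = ["Sand", "Sand", "Clay", "Clay", "Sandstone"]
y_pred = ["Sand", "Clay", "Clay", "Clay", "Sand"]

per_depth = accuracy_by_depth(depths, y_true, y_pred)
for interval, value in per_depth.items():
    print(interval, value)
```

Reporting the number of rows per bin alongside each accuracy value would help to flag deep intervals in which few boreholes contribute data.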
To gain deeper insight into the supervised model’s performance, the 10 most prevalent misclassifications of the model are listed in
Table 2. These results indicate that the most frequent errors involve misclassifications between similar or related soil types, such as ‘Sand-Silt’ and ‘Pebble/Conglomerate-Sand’, or ‘Sand-Silt’ and ‘Chalk-Silt’. Unlike a human observer who might confuse these soils based on physical similarities in composition and texture, the RF model relies solely on numerical data inputs. Therefore, it is argued that these misclassifications demonstrate that the model is picking up on actual patterns that reflect underlying geological causes, rather than making random errors.
From a geotechnical perspective, the greatest risk stems from failing to predict sand while tunneling through stable ground. The variability in kurkar’s cementation, from weakly to strongly cohesive sandstone, presents challenges as its mechanical properties can fluctuate considerably. As pockets of sand are prevalent even within the bedrock, this imposes a significant challenge to tunneling operations. Accordingly, the misclassifications in which one of the rock types was wrongly identified as sand were counted and found to total 391, whereas the total number of classifications computed during the test stage was 13,713 (i.e., 2.85%). This proportion is relatively minor.
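Both the ranking of the most frequent misclassifications and the count of a specific confusion can be sketched with a simple tally of (logged, predicted) pairs. The function names and the sample labels are hypothetical. The direction of the count (logged rock type, predicted sand) follows the wording of the text; the opposite direction can be obtained by swapping the arguments. The last line reproduces the reported share from the two counts given above.

```python
from collections import Counter


def top_misclassifications(y_true, y_pred, k=10):
    """Return the k most frequent (logged, predicted) pairs that disagree."""
    pairs = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    return pairs.most_common(k)


def count_predicted_as(y_true, y_pred, true_labels, predicted_label):
    """Count rows whose logged label is in true_labels but predicted otherwise."""
    return sum(
        1
        for t, p in zip(y_true, y_pred)
        if t in true_labels and p == predicted_label
    )


# Hypothetical test rows.
y_true = ["Sandstone", "Sandstone", "Sand", "Sand-Silt", "Sand-Silt", "Clay"]
y_pred = ["Sand", "Sandstone", "Sand", "Chalk-Silt", "Chalk-Silt", "Clay"]

top = top_misclassifications(y_true, y_pred, k=2)
rock_as_sand = count_predicted_as(y_true, y_pred, {"Sandstone"}, "Sand")

# Share reported in the text: 391 of 13,713 test classifications.
reported_share = 391 / 13713
print(top, rock_as_sand, f"{reported_share:.2%}")
```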
Another interesting finding is that even after reducing the size of the training set to 30% of its original size, the RF model maintained an accuracy of around 75%. This was examined by selecting a random and reduced subset of boreholes and rerunning the RF analysis. In order to confirm that the results are consistent, this analysis was repeated three times, each time randomly selecting a different group of boreholes. This result shows that the underlying relationships between the input features and soil types are strong enough that even a smaller sample of borehole data can capture these patterns effectively. The implications of this finding are discussed in Section 6.
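The reduced-data experiment can be sketched by drawing a random 30% of the boreholes, rerunning the RF analysis on their rows, and repeating this three times with different random draws. The data are synthetic, the labeling rule is invented, and the row-level 80/20 split inside each run is an assumption; a split by borehole would be a stricter alternative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_boreholes, rows_per_borehole = 100, 20

# One row per 1 m depth increment; location and ground level are per borehole.
borehole_id = np.repeat(np.arange(n_boreholes), rows_per_borehole)
x = np.repeat(rng.uniform(0, 10_000, n_boreholes), rows_per_borehole)
y_coord = np.repeat(rng.uniform(0, 10_000, n_boreholes), rows_per_borehole)
ground = np.repeat(rng.uniform(20, 80, n_boreholes), rows_per_borehole)
depth = np.tile(np.arange(rows_per_borehole, dtype=float), n_boreholes)

X = np.column_stack([x, y_coord, ground, depth])
# Invented labeling rule, for illustration only.
soil = np.where(depth >= 10, "Sandstone", np.where(x > 5_000, "Sand", "Clay"))


def run_rf(features, labels, seed):
    """Train and test a default RF on a row-level 80/20 split."""
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, random_state=seed
    )
    model = RandomForestClassifier(random_state=seed).fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


scores = []
for seed in range(3):  # three repetitions with different borehole subsets
    subset_rng = np.random.default_rng(seed)
    kept = subset_rng.choice(n_boreholes, size=int(0.3 * n_boreholes), replace=False)
    mask = np.isin(borehole_id, kept)
    scores.append(run_rf(X[mask], soil[mask], seed))

print([round(score, 3) for score in scores])
```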
To summarize the supervised ML analysis, the results demonstrate that soil types are generally predictable, with most misclassifications occurring between similar soil types. However, the data also reveal a notable minority of outliers that remain challenging for the model to classify accurately. In cases where these misclassifications involve weak or unstable soils, the implications for tunnel construction could be catastrophic, as has been reported in some instances.
As explained in Section 4, the objective of the unsupervised learning is to divide the data set into clusters with similar properties. The k-means model examines not just what soil types are present in each borehole, but also how they change with depth. The result of clustering with the k-means algorithm is shown in Figure 6. Four clusters have been created, with the number of boreholes for clusters 0–3 being 163, 159, 333, and 86, respectively. While Cluster 3 is the smallest, it is still significant in size and is therefore likely not a group of anomalies.
Looking into
Figure 6, a number of observations can be made. There are no clear-cut boundaries between clusters, and there is some overlap between clusters. This suggests that there are no sharp changes in soil profile in the Gaza Strip. Nevertheless, a diagonal pattern offset from the coastline can still be recognized. As the ground level rises similarly to this trend, this may reflect the transitions in geological formations, such as sedimentary layers, that are inclined.
An attempt to visualize the results by building representative soil profiles from each cluster did not yield interpretable results, as inconsistencies were observed. Additionally, the current analysis did not account for the difference in borehole depth, a factor that can create undesired biases. Future studies could improve the clustering outcomes by employing more advanced data preparation techniques, such as filtering out anomalous layers or normalizing borehole depths. Additionally, more sophisticated unsupervised ML models may be able to capture underlying patterns more effectively, offering a clearer geological interpretation.
6. Discussion
The ML analysis results indicate that model accuracy does not significantly improve at greater depths. This aligns with the known heterogeneity of the sedimentary formations in the study area, where various geological processes have created statistical variability in subsurface conditions. From a geotechnical perspective, this finding is important because it highlights that even when tunneling through bedrock at greater depths, there is still a possibility of encountering unstable soils. This underscores the need for careful planning and risk assessment during tunneling operations, regardless of depth.
Another noteworthy finding is that reducing the data set size to 30% of its original volume did not affect the model’s accuracy. Geologically, this suggests that subsurface conditions across the Gaza Strip are relatively predictable and consistent over large areas. If soil types were highly erratic, the model would likely need a much larger data set to achieve the same accuracy. The model’s ability to perform well with less data indicates that certain soil types occur in consistent, recognizable formations. This observation is further supported by the unsupervised learning results, where distinct regions were not clearly identified through the clustering process. Furthermore, this result is consistent with the practical reality of the GTN, which has shown that the subsurface is mostly predictable to tunnel builders.
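The data-reduction experiment can be illustrated with a small sketch that trains on different fractions of a training set and scores each on the same held-out test set. To keep the example self-contained, a 1-nearest-neighbour classifier stands in for the RF model, and the data are synthetic (soil type depends only on depth); neither reflects the actual model or data of this study, and all names are hypothetical.

```python
import random

def make_synthetic(n, rng):
    """Hypothetical layered ground: the soil label depends only on depth."""
    data = []
    for _ in range(n):
        x, y, depth = rng.uniform(0, 10), rng.uniform(0, 10), rng.uniform(0, 30)
        soil = "sand" if depth < 10 else ("clay" if depth < 20 else "sandstone")
        data.append(((x, y, depth), soil))
    return data

def predict_1nn(train, point):
    """Return the label of the training sample closest to `point`."""
    nearest = min(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], point)))
    return nearest[1]

def accuracy(train, test):
    hits = sum(predict_1nn(train, p) == label for p, label in test)
    return hits / len(test)

def accuracy_vs_fraction(train, test, fractions, seed=0):
    """Score a random subset of the training data for each fraction."""
    rng = random.Random(seed)
    results = {}
    for f in fractions:
        subset = rng.sample(train, max(1, round(f * len(train))))
        results[f] = accuracy(subset, test)
    return results

rng = random.Random(42)
data = make_synthetic(400, rng)
train, test = data[:300], data[300:]
scores = accuracy_vs_fraction(train, test, [1.0, 0.3])
print(scores)
```

If the two scores are close, the ground is regular enough that the additional samples add little information, which is the interpretation given above; a large gap would instead point to erratic conditions that require denser sampling.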
The low frequency of misclassifications involving sand reinforces the notion that the particular risk of collapse is manageable. This predictability may explain why tunnel builders have historically been able to construct stable tunnels without extensive geotechnical investigations, relying instead on a general understanding of the area’s geology and intuition gained through experience.
These findings have practical implications for the ongoing question of how many boreholes are necessary for effective site investigations. While general guidelines exist in the literature, determining the optimal number of boreholes remains an open question in geotechnical engineering. For this study, increasing the borehole density did not improve the model’s ability to detect localized geological anomalies, suggesting that a lower borehole density may be sufficient for broad, large-scale assessments. Potentially, this could lead to cost savings without compromising accuracy. However, for projects requiring the precise identification of localized features, particularly in areas where unexpected soil transitions could pose risks (e.g., pockets of sand within bedrock), a higher density of boreholes within a smaller area may still be necessary. This highlights the need to balance cost efficiency with local geological variability, and further studies should explore how ML models can guide more efficient and adaptive approaches to borehole planning.
It is important to acknowledge the differences between how humans and ML models learn information. ML models are often referred to as “black boxes” because the mechanisms driving their predictions can be difficult to interpret [19]. In this study, the data set was limited to a few straightforward features, specifically depth, spatial coordinates, and ground level. These variables are crucial in geotechnical contexts as they strongly influence soil transitions, which can vary notably with depth and location. While the ML models might seem complex, with these data it is likely that the ML model detects relatively simple patterns, such as frequent transitions between specific soil types at certain depths. These patterns might be intuitive to some degree to experts and experienced workers, as they align with known geological behaviors and stratigraphy. Nevertheless, ML models have a strong advantage in identifying subtle correlations that might be too nuanced for a human observer to detect. This ability allows the model to reveal underlying geological trends that human observers may overlook. On the other hand, human workers can incorporate additional sensory inputs that were not included in the ML data set, such as soil color, texture, and moisture content. Workers can use these inputs to identify trends that depend on such features, enhancing their own predictive power.
Looking into the future, a number of approaches could be applied to enhance similar studies. First, LLMs could be applied to incorporate soil descriptions given in natural language. By coupling recursive computing with LLMs, large data sets with several qualitative observations could be studied and then classified and added as features to the ML analysis.
Another promising avenue of research is applying ML interpretation tools to extract additional insights that can complement human understanding. While this has been attempted in the current study, this field is rapidly advancing, and sophisticated tools are being developed for this purpose [22].
Finally, by promoting data sharing, larger data sets could be built, which would allow for harnessing the full potential of ML. ML methods are particularly powerful for the analysis of large sets of data. While geological formations may vary significantly, techniques such as transfer learning could be used to learn generalizations from different data sets [23]. These studies could benefit from integrating additional types of data, such as laboratory test results, geophysical surveys, and remote sensing data, to further enhance the accuracy and robustness of ML methods in geotechnical applications [24].
7. Conclusions
This study demonstrates the potential of ML to enhance geotechnical insights from borehole data, using the Gaza Strip as a case study. The data set consists of 632 boreholes, and its features are the spatial coordinates, ground level, and soil type per depth.
Supervised learning was carried out to predict the soil type. A comparative analysis of ensemble models revealed that the RF model outperformed both Gradient Boosting and XGBoost in terms of accuracy, achieving 75.2% compared to 56.3% and 73.0%, respectively. This reinforces the suitability of RF for the current data set and similar applications.
When reducing the data set size to 30% of its initial size, the RF model retained this accuracy, indicating that the Gaza Strip geology is generally predictable over large distances. This finding aligns with the reality of the Gaza tunnels, which were constructed without comprehensive site investigations, yet demonstrated remarkable resilience.
Analysis of the prevalent misclassifications from the RF model found that these occur largely between similar types of soil. This reflects the model’s ability to recognize actual geological patterns. Misclassifications in which rock types were mistakenly identified as sand made up only a minor proportion, consistent with the reality that the risk of collapse during tunneling was relatively low.
Unsupervised learning with the k-means model was carried out based on similarities in soil profile. This analysis did not identify clear-cut boundaries between clusters, suggesting that, while the geology is generally predictable on a large scale, localized anomalies are prevalent.
Several implications can be drawn from this study. First, the ability of ML models to uncover geological patterns is demonstrated. Second, in cases where ML models maintain their accuracy with reduced data sets, fewer boreholes may be sufficient for large-scale assessments. Finally, for future research, it is recommended to integrate advanced ML tools to enhance the proposed approach. This could include incorporating LLMs to analyze qualitative data from borehole logs and applying interpretability methods to increase ML explainability.