1. Introduction
In recent times, the frequency of natural disasters has increased due to various natural and anthropogenic factors. These natural disasters have contributed to the destruction of various types of geotechnical infrastructure, causing slope failures, landslides, soil instability, and foundation failures, often resulting in the loss of property and lives and damage to other infrastructure. The risk of infrastructure failure can be minimized by incorporating real-time monitoring and advanced design techniques.
Conventionally, risk assessment of geotechnical infrastructure is performed manually or using various numerical techniques, such as the finite element method, finite difference method, and discrete element method. However, manual techniques are time-consuming and labor-intensive, and they are not feasible for real-time risk assessment. Numerical techniques, in turn, are complex, computationally time-consuming, and of limited generalizability. These problems can be addressed by deploying advanced data-driven techniques along with advanced sensing and communication devices for geotechnical risk assessment and monitoring.
Artificial Intelligence (AI) has emerged as a state-of-the-art data-driven technique for simulating functions of the human mind, such as problem-solving, learning, reasoning, and perception. By analyzing datasets, AI identifies patterns that it uses to make predictions and decisions. AI is transforming industry tools across a range of applications, from generative tools and healthcare diagnostics to the automatic execution of tasks in manufacturing. Unlike conventional software, AI learns patterns from data to improve its performance.
Machine Learning (ML) is a subfield of AI that focuses on equipping systems with the ability to understand, learn, and improve from experience without being explicitly programmed. ML models are trained using datasets containing both input and output data for supervised learning and only input data for unsupervised learning. The model learns different patterns from the data, allowing it to make accurate predictions. Common examples of machine learning algorithms are random forest, gradient boosting, decision tree, and support vector machine [1, 2]. Common applications of machine learning in geotechnical engineering include bearing capacity estimation, slope stability analysis, and soil property prediction. Conventional ML models have certain shortcomings, including a limited ability to capture complex non-linear relationships, poor performance on sequential and time-series data, and difficulty processing unstructured data, such as images. These shortcomings can be alleviated using deep learning (DL) models.
DL is a specialized subset of ML inspired by the function and structure of the human brain and uses neural networks with multiple hidden layers to mimic its learning process. DL involves the use of an artificial neural network (ANN), which comprises interconnected nodes that process and transmit information in a manner similar to biological neurons. DL algorithms have proven successful for tasks such as natural language processing, image recognition, and speech recognition. Some popular deep learning architectures include Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and the transformer architecture.
Conventionally, deep learning algorithms such as RNNs and transformer architectures [3, 4, 5] are used to solve various problems in geotechnical engineering. However, there are certain drawbacks associated with deep learning models, including a lack of understanding of natural language, poor generalization, and the need for large amounts of training data for model development. The inability of deep learning models to understand human language limits their ability to automate various workflows in geotechnical engineering. Moreover, as deep learning models are developed for specific cases, their generalizability decreases. For instance, a deep learning model trained to predict the factor of safety of slopes in one region cannot reliably be used to predict the factor of safety in another region, as it was trained on a limited amount of data. These shortcomings of deep learning models can be addressed by incorporating large language models (LLMs) into geotechnical engineering.
2. Problem Statement and Methodology
This study was conducted to underscore the potential of LLMs in transforming different sectors of geotechnical engineering. The research objectives are (a) to highlight the applications, advantages, and shortcomings of applying LLMs in geotechnical engineering, (b) to identify gaps in existing research, and (c) to outline future applications of LLMs for empowering geotechnical designers and practitioners.
The review paper contributes to the knowledge base in geotechnical engineering by addressing three questions: (a) How can LLMs enhance the design and numerical analysis of geotechnical infrastructure, and other processes in geotechnical engineering? (b) What are the challenges faced by researchers and practitioners in developing and incorporating LLM-based frameworks in geotechnical engineering? (c) How can LLMs be integrated with modern technologies to improve the state of the art in geotechnical engineering?
This study briefly introduces the concept of LLMs, the transformer architecture, and the different LLM architectures. It presents an overview of LLM applications for slope stability analysis, foundation bearing capacity computation, tunnel and underground infrastructure design, automated numerical modeling of infrastructure, and automated geotechnical site characterization. Moreover, this study will help geotechnical engineers and researchers understand the benefits and pitfalls of applying LLMs in geotechnical engineering. Lastly, this study contributes to the fields of AI and geotechnical engineering by defining areas for future research and outlining ways that LLMs can be integrated to perform tasks that empower geotechnical practitioners.
Figure 1 shows a schematic representation of the application of LLMs in geotechnical engineering.
This review commenced with a systematic identification of research papers to ensure congruence with the research objectives. The keywords used during the literature acquisition were geotechnical engineering, geoenvironmental engineering, geomechanics, slope stability, tunneling, geotechnical site characterization, bearing capacity estimation, and large language model.
The literature was acquired from different databases, including ScienceDirect, the American Society of Civil Engineers database, and the Multidisciplinary Digital Publishing Institute (MDPI), using pairs of keywords. Some of the keyword combinations include large language model and geotechnical engineering, large language model and slope stability, and large language model and tunnel design. The literature was organized into two clusters: (a) LLM description and overview, and (b) application of LLMs in geotechnical engineering. For the second cluster, the acquired literature was grouped into eight sub-categories, namely: (1) stability analysis of slopes, (2) design of tunnels and underground infrastructure, (3) bearing capacity computation, (4) content generation and knowledge support, (5) risk assessment of geotechnical infrastructure, (6) automation of numerical modeling, (7) geotechnical site investigation, and (8) workflow automation in geotechnical engineering. This organization facilitated a critical review of the instrumental role of LLMs in improving geotechnical engineering practices, highlighting their benefits and pitfalls.
3. Application of Large Language Models in Geotechnical Engineering
3.1. Large Language Models
LLMs have emerged as the next generation of advanced AI models and are increasingly adopted to understand and solve complex problems, analyze large volumes of data across various domains, and generate coherent human language. These models are trained on large volumes of data and, unlike traditional natural language models that were trained on specific domains, they are designed as task-agnostic architectures with a very large number of parameters (up to hundreds of billions) [6, 7, 8]. Due to their scale and scope, LLMs have also shown emergent capabilities such as autonomous decision-making, reasoning, contextual learning, and planning.
The architectures of the most prevalent LLMs are based on the transformer architecture, with encoder-only, decoder-only, or encoder–decoder variants. Transformers are a type of deep learning architecture developed by Vaswani et al. [3] for machine translation that can capture long-term dependencies in sequences. Before transformers, the Long Short-Term Memory (LSTM) architecture was the most popular choice for capturing dependencies in sequential data. However, the LSTM architecture struggles to capture very long-range dependencies, which limits its suitability for language processing. Another benefit of the transformer architecture is its faster data processing compared with the LSTM architecture, owing to its capacity for parallel processing [3, 9, 10]. Some examples of sequential data modeling in geotechnical engineering are the evaluation of the stability of slopes due to rainfall events, the study of the deformation of geotechnical infrastructure due to earthquake loading, and the settlement of soil during the construction of embankments. The main components of the transformer architecture are the encoder and decoder blocks, along with positional encoding. The encoder block processes the input data to generate a mathematical representation of the input sequence, and the decoder block generates the output sequence one token at a time. Self-attention by itself carries no information about the order of the inputs, yet positional information is important in sequence modeling; positional encoding therefore provides positional information for each data point in the input sequence. Equations (1) and (2) represent the mathematical computation of the positional encoding block.
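$$PE_{(pos,\,2i)} = \sin\left(\frac{pos}{10000^{2i/d_{model}}}\right) \tag{1}$$
$$PE_{(pos,\,2i+1)} = \cos\left(\frac{pos}{10000^{2i/d_{model}}}\right) \tag{2}$$
where $pos$ represents the position of the token in the input data, $i$ represents the dimension index of the positional encoding vector, and $d_{model}$ represents the dimension of the representation.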
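As a minimal sketch of the positional encoding step, the following NumPy function builds the sinusoidal encoding matrix of Vaswani et al. [3]. The function name, the argument names `seq_len` and `d_model`, and the restriction to an even `d_model` are assumptions made for illustration only.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Return a (seq_len, d_model) matrix of sinusoidal positional encodings."""
    if d_model % 2 != 0:
        raise ValueError("this sketch assumes an even d_model")
    positions = np.arange(seq_len)[:, None]        # token positions, shape (seq_len, 1)
    pair_index = np.arange(d_model // 2)[None, :]  # index i of each sin/cos pair
    angles = positions / np.power(10000.0, 2.0 * pair_index / d_model)
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles)  # even dimensions use the sine term
    encoding[:, 1::2] = np.cos(angles)  # odd dimensions use the cosine term
    return encoding

# Example: encodings for a sequence of 4 tokens in an 8-dimensional model
pe = sinusoidal_positional_encoding(seq_len=4, d_model=8)
```

The resulting matrix is added element-wise to the token embeddings, so each row gives the model a distinct signature for that position in the sequence.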
The important components of the transformer architecture are a multi-head self-attention layer and a position-wise feed-forward network. The multi-head self-attention layer computes relationships between different data points in the input, as shown in Equations (3) and (4). The feed-forward neural network then processes the self-attention layer’s output to help the model understand complex data relationships; Equation (5) represents this computation. The dropout layer helps prevent overfitting, residual connections ease the training of deep networks by mitigating vanishing gradients, and layer normalization stabilizes network training.
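$$\mathrm{Attention}(Q, K, V) = \sigma\left(\frac{QK^{T}}{\sqrt{d_s}}\right)V \tag{3}$$
$$Q = XW^{Q},\quad K = XW^{K},\quad V = XW^{V} \tag{4}$$
$$\mathrm{FFN}(x_i) = \mathrm{ReLU}(x_i W_1 + b_1)W_2 + b_2 \tag{5}$$
where $X$ represents the encoded value with positional encoding, $\sigma$ represents the Softmax function, $Q$, $K$, and $V$ represent the query, key, and value vectors, respectively, $d_s$ represents the dimension of the key vector, and $W^{Q}$, $W^{K}$, and $W^{V}$ represent the weight matrices of the query, key, and value, respectively. $x_i$ represents the input vector at position $i$, $W_1$ and $W_2$ represent the weight matrices of the first and second linear layers, respectively, $b_1$ and $b_2$ represent the bias vectors of the first and second linear layers, respectively, and $\mathrm{ReLU}$ represents the rectified linear unit activation function.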
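The attention and feed-forward computations described above can be sketched for a single attention head as follows. This is an illustrative NumPy sketch under stated assumptions, not a full transformer layer: it omits the multi-head concatenation, dropout, residual connections, and layer normalization, and all array sizes and random weights are hypothetical.

```python
import numpy as np

def softmax(scores: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention over the rows of X."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v        # query, key, and value projections
    d_s = K.shape[-1]                          # dimension of the key vectors
    weights = softmax(Q @ K.T / np.sqrt(d_s))  # each row sums to 1
    return weights @ V, weights

def feed_forward(x, W1, b1, W2, b2):
    """Position-wise feed-forward network with a ReLU activation."""
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2

# Hypothetical sizes and random weights, for illustration only
rng = np.random.default_rng(0)
seq_len, d_model, d_s, d_ff = 5, 8, 4, 16
X = rng.normal(size=(seq_len, d_model))  # inputs with positional encoding added
W_q, W_k, W_v = (rng.normal(size=(d_model, d_s)) for _ in range(3))
attended, weights = self_attention(X, W_q, W_k, W_v)
W1, b1 = rng.normal(size=(d_s, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)
out = feed_forward(attended, W1, b1, W2, b2)
```

In a multi-head layer, several such heads run in parallel with separate weight matrices, and their outputs are concatenated and projected back to the model dimension before the feed-forward network.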
LLMs based on the transformer architecture can be classified into three categories, namely: (a) encoder-only models, (b) decoder-only models, and (c) encoder–decoder models. Figure 2 shows a schematic representation of the three LLM architectures. Encoder-only models (Figure 2b) are generally used for analyzing data and can be used for summarizing borehole logs and geotechnical data reports. Some of the popular encoder-only LLM architectures are Bidirectional Encoder Representations from Transformers (BERT), A Robustly Optimized BERT Pretraining Approach (RoBERTa), and A Lite BERT (ALBERT). Decoder-only models (Figure 2c) are used for prediction and content generation, for generating code for numerical analysis in geotechnical engineering, and for answering questions about different concepts in geotechnical engineering. Some of the popular decoder-only architectures are Generative Pretrained Transformer (GPT), Large Language Model Meta AI (Llama), Claude, and Mistral. Encoder–decoder models (Figure 2a) are used for language translation and can be applied to translate geotechnical engineering reports from one language to another. Two of the popular encoder–decoder architectures are Text-to-Text Transfer Transformer (T5) and Bidirectional and Auto-Regressive Transformers (BART). Discussion of the different LLMs is beyond the scope of this manuscript; readers are referred to other literature for more information on these models.
3.2. Stability Analysis of Slope
Slope stability is an important aspect of geotechnical engineering, critical for the stability of highway and railway embankments, bridge abutments, earth dams, mine pits, and tailings storage facilities. Conventionally, stability analysis of soil slopes is performed using limit equilibrium methods such as the Swedish Circle/Fellenius Method, Bishop’s Simplified Method, the Janbu Method, and the Morgenstern-Price Method. The shortcomings of the conventional techniques are the assumption of a predefined failure surface, the lack of pore water pressure calculation during analysis, the failure to consider the constitutive relationship of the soil mass, and the inability to model deformation during slope failure. These shortcomings were overcome by using finite element analysis for slope stability, available in different commercial software packages. The main problem associated with the commercial software is the complexity of the finite element analysis procedure.
LLMs can facilitate slope stability analysis by providing guidance on the analysis and by interpreting its results.
Figure 3 shows a schematic representation of slope stability analysis using an LLM. The LLM receives a prompt from the user, processes it, generates code, and uses that code to produce results. The user reviews the results and provides feedback to the LLM. This process continues until the result appears correct to the user.
Table 1 shows the application of LLMs for automating slope stability analysis.
Kim et al. [12] used ChatGPT-generated MATLAB code to identify critical failure surfaces and calculate the Factor of Safety (FS) using the Fellenius method of slices. Additionally, ChatGPT was prompted to generate MATLAB code for solving seepage flow using the Finite Difference Method. The code computed hydraulic head distributions and flow nets comparable to those produced by commercial software (GeoStudio SEEP/W). The slope stability results were validated against GeoStudio SLOPE/W: the code accurately identified the critical failure surface and yielded an FS of 1.630. The major advantage of this study is that ChatGPT was able to logically establish programming sequences, including the definition of variables and domains, the formulation of governing equations, iterative operations and convergence checks, and the visualization of results. In addition, ChatGPT’s outputs were consistent with results from commercial software such as GeoStudio SEEP/W and SLOPE/W.
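To make the kind of calculation discussed here concrete, the following is a minimal Python sketch of the Fellenius (ordinary) method of slices for a single trial slip surface. It is not the MATLAB code from Kim et al. [12]; the `Slice` data structure, the function name, and the example slice values are hypothetical, and the search for the critical failure surface is omitted.

```python
import math
from dataclasses import dataclass

@dataclass
class Slice:
    weight: float                # slice weight W per unit length (kN/m)
    base_angle_deg: float        # inclination alpha of the slice base (degrees)
    base_length: float           # length l of the slice base (m)
    pore_pressure: float = 0.0   # pore water pressure u at the base (kPa)

def fellenius_factor_of_safety(slices, cohesion, friction_angle_deg):
    """Factor of safety from the ordinary method of slices.

    FS = sum(c' l + (W cos(alpha) - u l) tan(phi')) / sum(W sin(alpha))
    """
    tan_phi = math.tan(math.radians(friction_angle_deg))
    resisting = 0.0
    driving = 0.0
    for s in slices:
        alpha = math.radians(s.base_angle_deg)
        effective_normal = s.weight * math.cos(alpha) - s.pore_pressure * s.base_length
        resisting += cohesion * s.base_length + effective_normal * tan_phi
        driving += s.weight * math.sin(alpha)
    if driving <= 0.0:
        raise ValueError("driving force must be positive for a meaningful FS")
    return resisting / driving

# Hypothetical slices along one trial slip circle (illustrative values only)
example_slices = [
    Slice(weight=80.0, base_angle_deg=-5.0, base_length=2.0),
    Slice(weight=150.0, base_angle_deg=10.0, base_length=2.1),
    Slice(weight=170.0, base_angle_deg=25.0, base_length=2.3),
    Slice(weight=110.0, base_angle_deg=42.0, base_length=2.8),
]
fs_example = fellenius_factor_of_safety(example_slices, cohesion=10.0, friction_angle_deg=28.0)
```

A complete analysis would repeat this calculation over many trial slip surfaces and report the minimum FS, which is the step that identifies the critical failure surface.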
Xu et al. [14] introduced a GPT-4o-based Multi-GeoLLM, a multimodal, multi-agent framework that integrates text and image inputs to automate geotechnical tasks, such as footing design, bearing capacity, and settlement analysis, and to generate GPT-assisted MATLAB code for slope stability evaluation. In addition, the model generates design drawings using Python-based logic and equations derived from standard codes. Multi-GeoLLM achieved a perfect accuracy of 1.0 across 60 multimodal footing design cases (text, image, and text–image sub-tests), and an accuracy of 0.97 on 100 textual cases.
Wu et al. [13] conducted a study that used photo analysis and textual reasoning to automate visual inspection of slopes and assess landslide risk. In addition, ChatGPT was employed for site-similarity prediction, simulation-parameter recommendation, and site grouping based on seismic hazard. The primary role of the LLM in these applications was to group sites with similar seismic characteristics, auto-generate Python code for spatial analysis and plotting, extract guidance from the LIQCA manual using Retrieval-Augmented Generation (RAG), recommend parameter values for given soil types, and compare scatter plots of clay properties across sites. GPT successfully grouped sites based on horizontal-to-vertical spectral ratio (HVSR) curves and spatial data. In addition, when location data were added, the GPT-generated Python code improved clustering accuracy to match expert recommendations. In site-similarity prediction, GPT’s rankings of similar sites were generally consistent with those of a hierarchical Bayesian model.
Kwak and Won [11] integrated an LLM, specifically ChatGPT, into advanced geotechnical analyses by developing a framework for seepage-induced slope stability assessment. Their study showed that ChatGPT can generate Python code for seepage modeling, slope stability calculations using Bishop’s simplified method, and the coupling of both analyses, achieving factors of safety within a 1.86% margin of those from the industry-standard tools (SEEP/W and SLOPE/W). The LLM incorporated optimization techniques, automated phreatic line extraction, and reduced computational time by up to 70% through adaptive algorithms, demonstrating the potential of LLMs to make geotechnical workflows more efficient, accessible, and sustainable. This work underscores how LLMs can make complex numerical modeling accessible, reduce reliance on expensive software, and accelerate decision-making. This is a significant step toward embedding AI-driven automation in geotechnical engineering for sustainable infrastructure design. Additionally, the study highlights a human-in-the-loop approach to refining prompts when ChatGPT misinterprets tasks (e.g., correcting the slice angle calculation). This demonstrates that while LLMs can automate engineering workflows, expert oversight remains critical for accuracy and robustness.
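For readers unfamiliar with Bishop’s simplified method, the sketch below shows why it requires iteration: the factor of safety appears on both sides of the equation. This is an illustrative Python sketch and not the code generated in Kwak and Won [11]; the function name, the tuple layout of the slice data, and the example values are assumptions, and the seepage coupling is reduced to a prescribed pore pressure per slice.

```python
import math

def bishop_simplified_fs(slices, cohesion, friction_angle_deg, tol=1e-6, max_iter=100):
    """Iterate Bishop's simplified equation to a converged factor of safety.

    slices: iterable of (weight, base_angle_deg, width, pore_pressure) tuples.
    FS = sum((c' b + (W - u b) tan(phi')) / m_alpha) / sum(W sin(alpha)),
    with m_alpha = cos(alpha) + sin(alpha) tan(phi') / FS.
    """
    slices = list(slices)
    tan_phi = math.tan(math.radians(friction_angle_deg))
    driving = sum(w * math.sin(math.radians(a)) for w, a, _, _ in slices)
    if driving <= 0.0:
        raise ValueError("driving force must be positive for a meaningful FS")
    fs = 1.0  # initial guess
    for _ in range(max_iter):
        resisting = 0.0
        for w, a, b, u in slices:
            alpha = math.radians(a)
            m_alpha = math.cos(alpha) + math.sin(alpha) * tan_phi / fs
            resisting += (cohesion * b + (w - u * b) * tan_phi) / m_alpha
        new_fs = resisting / driving
        if abs(new_fs - fs) < tol:  # convergence check on successive estimates
            return new_fs
        fs = new_fs
    raise RuntimeError("Bishop iteration did not converge")

# Hypothetical slices: (weight kN/m, base angle deg, width m, pore pressure kPa)
example_slices = [
    (80.0, -5.0, 2.0, 5.0),
    (150.0, 10.0, 2.0, 12.0),
    (170.0, 25.0, 2.0, 10.0),
    (110.0, 42.0, 2.0, 0.0),
]
fs_example = bishop_simplified_fs(example_slices, cohesion=10.0, friction_angle_deg=28.0)
```

In a coupled workflow, the pore pressures passed to each slice would come from the seepage solution (for example, from the extracted phreatic line) rather than being typed in by hand.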
Overall, researchers used various LLMs, including ChatGPT, BERT, T5, and ChatGPT-4.0, to automate slope inspection and evaluate landslide risk, and to generate Python and MATLAB code for slope stability analysis. Kim et al. [12] focused on using ChatGPT to generate MATLAB code for conventional methods such as the Fellenius method and seepage analysis, whereas Kwak and Won [11] employed an LLM to generate Python code for solving slope stability problems using Bishop’s simplified method. Xu et al. [14] proposed a multimodal, multi-agent GeoLLM framework for generating MATLAB code for slope stability. Wu et al. [13] employed LLMs for tasks beyond numerical analysis, including visual inspection, site similarity assessment, and seismic hazard-based clustering. In these studies, LLMs therefore emerged as an alternative tool for slope stability analysis, with performance similar to that of commercial software. One pitfall of LLM-driven slope stability analysis is the generation of incorrect code due to LLM hallucinations or incorrect prompts. The effects of hallucinations can be mitigated by having geotechnical engineering subject matter experts review the generated code and analysis results.
3.3. LLM-Based Design and Analysis of Tunnels and Underground Infrastructure
Tunnels are underground structures that pass through soil and rock and are important components of the transportation infrastructure, facilitating the movement of people and freight by highways and railways. Traditionally, tunnel stability analysis and underground engineering design are performed using analytical and empirical techniques developed by researchers, as well as numerical methods such as the finite element and finite difference methods. The problems with empirical methods include limited generalizability, as these methods may not be applicable across different geologic conditions, and the neglect of the stress–strain behavior of rock or soil. Unlike empirical methods, numerical techniques can be applied across different geologic conditions and account for the stress–strain behavior of geomaterials. However, numerical techniques are complex and time-consuming. Researchers have used LLMs to automate different processes in tunnel engineering; Table 2 shows recent applications of LLMs in designing underground structures.
Wu et al. [16] demonstrated a multimodal framework driven by LLMs (Tunnel GPT, Tunnel DeepSeek, Tunnel AliQwen, etc.) that integrates images, videos, drilling data, Ground Penetrating Radar (GPR) signals, and geological sketches into a unified knowledge graph to automate tunnel face stability prediction. LLMs were employed to create high-fidelity synthetic rock mass images to improve dataset balance and to increase the diversity of geological conditions. By combining LLMs for synthetic rock mass image generation with computer vision models and a structured knowledge graph, the framework achieves high accuracy (up to 96%) under complex geological conditions and reduces reliance on manual inspections. Overall, these innovations make LLM-driven multimodal systems an important technology for achieving real-time evaluation of tunnel face stability.
In the same year, Tiwari et al. [20] demonstrated a semantic AI framework, GeoSemantica, that uses fine-tuned LLMs to assess seismic soil liquefaction risk. The key application of the LLM is to perform binary classification of soil liquefaction occurrence under seismic loading. The LLM examines the semantic history derived from geotechnical and seismic inputs to determine whether liquefaction is possible at the site. GeoSemantica translates geotechnical parameters, such as effective stress, soil type, Standard Penetration Test (SPT) N value, and seismic loading, into domain-informed natural language to perform geotechnical reasoning. This allows the LLM to capture interactions between soil properties and seismic demand. The GeoSemantica LLM achieved an accuracy of 75%, an F1 score of 81.5%, and a high recall, outperforming other LLMs. This study shows that LLM approaches can provide more reliable decision-making in geotechnical earthquake engineering.
In another study, Hu et al. [18] developed an LLM-based intelligent assistant for autonomous Tunnel Boring Machine (TBM) tunneling. This research combined an LLM with domain-specific knowledge and a multi-agent framework to enable human–machine collaboration in complex underground construction scenarios. Moreover, by combining a stepwise LLM with Retrieval-Augmented Generation (RAG), the framework can predict operator intention, support decision-making, and monitor anomalies during tunneling. Case studies of metro tunnel projects demonstrate that LLM-based assistants notably enhance system transparency, reduce manual intervention, and improve operational safety. From a construction perspective, this work demonstrates how LLMs can enable more efficient, reliable geotechnical construction by optimizing automated operations and minimizing manual errors.
Mehrishal et al. [19] presented an AI-driven framework, Tunnel Rapid AI Classification (TRaiC), that demonstrates the role of LLMs in geotechnical engineering workflows, particularly in underground engineering. This research combined computer vision-based discontinuity detection, 360° tunnel face imaging, 3D digital twin generation, and a Retrieval-Augmented Generation-Large Language Model (RAG-LLM) system to automate interpretation and standardized reporting. In this framework, the LLM acts as an intelligent geotechnical assistant, blending multimodal inputs such as images, discontinuity data, and historical tunnel data to provide rock mass descriptions and rock mass rating (RMR) values aligned with engineering standards. By minimizing reliance on manual tunnel face mapping, this LLM-based system improves efficiency and safety while reducing human involvement in risky environments and situations.
The most recent study on the application of LLMs in tunnel engineering was conducted by Wu et al. [15], who developed a Tunnel Rock Integrity Prediction GPT (Tunnel RIP GPT) framework for tunnel rock mass integrity assessment, a critical task for construction safety. The study commenced with data augmentation and balancing to enable the model to identify the different rock types. During this step, synthetic images of rock fracture types were generated, reviewed by experts, and subsequently used to augment the training data. Thereafter, an advanced computer vision model combining a Swin Transformer and a Convolutional Neural Network (CNN) was developed to segment rock fractures from tunnel face images. Digital image processing was employed to convert the fracture data into measurable parameters such as density. Multimodal data fusion was performed by combining data from ground-penetrating radar, drilling logs, rock mass structure, fracture density from computer vision, water seepage, physico-mechanical properties, and design parameters. Finally, the Tunnel RIP GPT model uses these parameters to provide different stability classifications. Traditional ML models such as CNNs and transformer-based models struggle with multimodal integration, whereas LLMs apply attention-based, language-driven interactions to achieve end-to-end prediction of rock mass integrity, with accuracy exceeding 90%. Moreover, the study includes diffusion-based image generation to address data imbalance and enables prompt-based interactions for tunnel engineers, reducing dependence on site testing.
In tunnel engineering, the applications of LLMs are multifaceted, spanning construction, monitoring, geologic prediction, face stability assessment, and rock integrity prediction. Some of the LLMs employed are GPT-4, GPT-4o, DeepSeek-R1, Gemini 1.5, Qwen1.5-32B, BERT, GPT-2, and Llama. One limitation of current LLM integrations is the technology readiness level of these methods. Although these methods were successfully deployed for a few projects, their performance in broader real-world application remains unknown. Therefore, these methods should be deployed under expert supervision to assess their performance and shortcomings.
3.4. LLM-Assisted Bearing Capacity Estimation
Bearing capacity is an important concept in geotechnical engineering and is used for the design of shallow and deep foundations. Conventionally, bearing capacities are estimated using standard penetration test values, soil types from borehole logs, and empirical equations developed by researchers. Empirical equations have certain shortcomings, including an assumed failure mechanism of the soil, neglect of the stress–strain relationship of the soil, inadequate representation of soil stratification, and neglect of the stress history of the soil. These shortcomings can be addressed by performing a finite element analysis of foundations. However, finite element analysis is complex and computationally time-consuming.
Table 3 summarizes the application of LLM for bearing capacity calculation.
Figure 4 shows a schematic representation of bearing capacity estimation for shallow foundations using an LLM. Users provide a prompt to the LLM, the LLM processes it, and generates code to implement the design. The user reviews the results, and once satisfied, the design can be implemented.
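As an illustration of the kind of code an LLM might be prompted to produce in this workflow, the sketch below evaluates the classical three-term bearing capacity equation for a strip footing, q_u = c N_c + q N_q + 0.5 γ B N_γ. The choice of bearing capacity factors (the Prandtl/Reissner expressions for N_c and N_q, and Vesić's expression for N_γ) is an assumption made for this example, as are the function names and input values; shape, depth, and inclination factors are deliberately omitted.

```python
import math

def bearing_capacity_factors(friction_angle_deg: float):
    """Return (N_c, N_q, N_gamma) for a given friction angle in degrees."""
    phi = math.radians(friction_angle_deg)
    n_q = math.exp(math.pi * math.tan(phi)) * math.tan(math.pi / 4.0 + phi / 2.0) ** 2
    if friction_angle_deg == 0.0:
        n_c = math.pi + 2.0  # limiting value for undrained (phi = 0) conditions
    else:
        n_c = (n_q - 1.0) / math.tan(phi)
    n_gamma = 2.0 * (n_q + 1.0) * math.tan(phi)  # Vesic's expression (assumed here)
    return n_c, n_q, n_gamma

def ultimate_bearing_capacity_strip(cohesion, friction_angle_deg, unit_weight, width, depth):
    """Ultimate bearing capacity (kPa) of a strip footing, without correction factors."""
    n_c, n_q, n_gamma = bearing_capacity_factors(friction_angle_deg)
    surcharge = unit_weight * depth  # overburden pressure at founding level (kPa)
    return cohesion * n_c + surcharge * n_q + 0.5 * unit_weight * width * n_gamma

# Hypothetical inputs: c = 5 kPa, phi = 30 deg, gamma = 18 kN/m3, B = 2 m, D = 1 m
q_ult = ultimate_bearing_capacity_strip(5.0, 30.0, 18.0, 2.0, 1.0)
q_allowable = q_ult / 3.0  # factor of safety of 3 is an illustrative assumption
```

Code of this kind is easy for a reviewer to check against hand calculations or published tables of bearing capacity factors, which is the review step shown in the workflow before a design is implemented.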
Xu et al. [22] developed a Gemini-pro-based GeoLLM model to estimate bearing capacity and settlement for a single pile. The main tasks involve extracting design parameters from geotechnical texts and performing calculations in accordance with European, Chinese, and American design codes. In addition, this study evaluates various LLMs, including Gemini-pro, GPT-4, GLM-4, and the Qwen family, for accuracy in extracting geotechnical parameters and reliability in performing engineering calculations. The study demonstrates that LLMs with more than 100 billion parameters are suitable for high-precision engineering tasks. The main advantage of this model is its remarkable text comprehension and human-like responses, enabled by its transformer architecture. The GeoLLM model also attained high precision (up to 0.988) for intelligent geotechnical designs. The following year, Kim et al. [21] presented a study demonstrating the use of ChatGPT to automate the calculation of vertical pile bearing capacity in accordance with the American Petroleum Institute Recommended Practice 2A (API RP 2A) design standard. The key applications of the LLM in this study are to generate Python code for calculating pile vertical bearing capacity, to read and understand the API RP 2A design standard, and to extract equations, parameter limits, and tabulated coefficients. The study highlighted that ChatGPT successfully generated valid computational workflows for shaft friction, end bearing capacity, and penetration depth estimation through prompt interaction. LLM-assisted code generation clearly outperforms direct numerical computation by LLMs and minimizes arithmetic errors. This approach was shown to deliver consistent geotechnical design workflows by reducing repetitive manual calculations, thereby promoting sustainable geotechnical problem-solving.
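The following sketch illustrates the shaft friction and end bearing calculation for a pile in clay using the alpha method as it is commonly summarized from API RP 2A (unit shaft friction f = α c_u, with α depending on ψ = c_u / p'_0 and capped at 1.0, and unit end bearing q = 9 c_u). It is not the code from Kim et al. [21]: the function names, layer discretization, plugged-pile assumption, and input values are hypothetical, and any real design should be checked against the current edition of the standard.

```python
import math

def alpha_factor(undrained_strength: float, effective_overburden: float) -> float:
    """Adhesion factor alpha as a function of psi = c_u / p'_0, capped at 1.0."""
    psi = undrained_strength / effective_overburden
    alpha = 0.5 * psi ** -0.5 if psi <= 1.0 else 0.5 * psi ** -0.25
    return min(alpha, 1.0)

def pile_capacity_clay(diameter: float, layers):
    """Return (shaft_capacity, end_bearing) in kN for a pile in clay.

    layers: list of (thickness_m, undrained_strength_kPa, effective_overburden_kPa),
    ordered from the ground surface to the pile tip, with the overburden pressure
    taken at the mid-depth of each layer (an assumption of this sketch).
    """
    perimeter = math.pi * diameter
    shaft = sum(
        alpha_factor(su, p0) * su * perimeter * thickness
        for thickness, su, p0 in layers
    )
    tip_strength = layers[-1][1]
    # Assumes a plugged pile, so the full gross tip area mobilizes end bearing.
    end_bearing = 9.0 * tip_strength * math.pi * diameter ** 2 / 4.0
    return shaft, end_bearing

# Hypothetical soil profile for a 1.0 m diameter pile (illustrative values only)
profile = [(10.0, 30.0, 40.0), (10.0, 60.0, 130.0), (10.0, 90.0, 220.0)]
shaft_kn, tip_kn = pile_capacity_clay(1.0, profile)
total_kn = shaft_kn + tip_kn
```

Delegating the arithmetic to a script of this kind, rather than asking the LLM to compute the numbers directly, is what keeps the arithmetic errors noted above out of the result.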
Overall, the integration of LLMs for bearing capacity estimation is at a nascent stage, with only a few studies published. One challenge in applying LLMs to bearing capacity estimation is the selection of incorrect values from design codes due to hallucination. Some future areas of research are the LLM-assisted estimation of the bearing capacity of deep foundations under cyclic loading, of shallow foundations on slopes, and of piles supporting offshore wind turbines.
3.5. Virtual Assistance, Content Generation, and Knowledge Support
Previously, the major sources of knowledge for geotechnical engineers were books, journal papers, lecture notes, and videos, and it was cumbersome and time-consuming for engineers to learn various geotechnical engineering concepts from them. LLMs are now used to retrieve information on various geotechnical engineering concepts. LLM chatbots are developed based on different scientific literature and can answer basic to advanced-level questions in geotechnical engineering. Geotechnical engineers can leverage LLMs as a quick reference.
Table 4 summarizes the application of LLM for providing knowledge support in geotechnical engineering.
Figure 5 shows the process of numerical modeling code generation using LLM. The input from the user is processed using a decoder-based architecture LLM, and subsequently, the code is generated.
Chen et al. [28] addressed a major research gap by systematically evaluating GPT-4’s capabilities in geotechnical education and problem-solving. The study includes a question bank of 391 questions covering soil mechanics, permeability, shear strength, slope stability, and bearing capacity. In this study, GPT-4 is envisioned as an AI tutor that can provide personalized instruction to students, correct errors in responses, explain reasoning steps, and serve as a feedback mechanism. GPT-4 was also applied to solve textbook-based geotechnical problems, including calculations of stresses, void ratios, and bearing capacities. GPT-4 achieved 28.9% accuracy at baseline without guidance, 34% accuracy when reasoning steps were requested, and 67% accuracy when domain-specific instructions were provided.
Liu and Shi [30] conducted a study demonstrating the capability of an LLM (GPT-4) to automatically extract critical information, such as geological conditions, laboratory test results, and engineering recommendations, from conventional geotechnical reports. Moreover, GPT-4 can parse general project metadata, subsurface and hydrogeologic conditions, design recommendations, spatial artifacts such as site maps and boring logs, and laboratory tests, and stream these outputs into Augmented Reality (AR)-based 3D visualizations for onsite decision support. This practice reduces the time and expertise required for manual data processing, reduces human errors, and promotes data-driven decision-making.
Soranzo [
24] demonstrated that LLMs, including ChatGPT 4.0, DistilBERT, and MiniLM, when fine-tuned on geotechnical textbooks and domain-specific texts, can generate high-quality educational content, automate grading of technical responses and reports, and support consistent decision-making aligned with established soil mechanics and geotechnical design principles. In this study, GPT 4.0, BERT, and MiniLM were employed to generate geotechnical question–answer pairs, create synthetic student answers, compute cosine similarity for grading, and classify student answers into Grades 1 to 5. LLM-based grading systems, supplemented by cosine similarity and retrieval-augmented generation, improved the evaluation of open-ended geotechnical questions, achieving up to 98% accuracy after fine-tuning and surpassing traditional similarity-based methods. Moreover, a web-based, threshold-based tool for embedding and grading was developed, which instantly evaluates student responses and provides feedback. In summary, the fine-tuned LLMs delivered near-human consistency, with accuracy of 97.5 to 98.3% on open-ended grading and 71.4% on full technical reports, while offering scalable, low-effort deployment and immediate feedback loops.
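The similarity-and-threshold grading step described above can be sketched as follows. This is a minimal illustration, not the tool from the cited study: the embedding vectors are assumed to come from some sentence-embedding model, and the threshold values and the convention that Grade 5 is the best grade are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_a * norm_b)


# Hypothetical lower bounds on similarity for each grade (5 assumed best).
GRADE_THRESHOLDS = [(0.90, 5), (0.75, 4), (0.60, 3), (0.40, 2)]


def grade_from_similarity(similarity):
    """Map a similarity score to a grade from 1 to 5 using fixed thresholds."""
    for lower_bound, grade in GRADE_THRESHOLDS:
        if similarity >= lower_bound:
            return grade
    return 1


# Usage: compare a student-answer embedding with a reference-answer embedding.
reference = [0.2, 0.7, 0.1]   # illustrative embedding of the model answer
student = [0.25, 0.65, 0.05]  # illustrative embedding of the student answer
grade = grade_from_similarity(cosine_similarity(reference, student))
```

In practice the thresholds would need to be calibrated against human-graded answers before such a tool is used for assessment.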
In the same year, Babu et al. [
26] evaluated ChatGPT, Microsoft Copilot, and Google Gemini across various geotechnical concepts, such as slope stability, frost action, and cross anisotropy, and rated their performance as good, fair, or poor. The primary contribution of this study is a domain-specific evaluation of general-purpose LLMs as virtual assistants for fundamental, practical, and advanced technical topics. The study showed that LLMs can assist engineers with conceptual understanding, preliminary analysis, and literature review by providing fluent explanations of soil mechanics problems. While LLMs have strong potential to assist with geotechnical tasks, limitations such as misattributed references, incorrect technical generalizations, and failure to contextualize site-specific geotechnical conditions were observed.
Tophel et al. [
25] demonstrated the application of GPT-4 and Llama 3 as AI educators for undergraduate geotechnical engineering, emphasizing the RAG framework. By merging the geotechnical literature with formula repositories via an API, this research demonstrated that LLMs can improve accuracy and reliability in solving problems on topics such as consolidation, shear strength, and stress analysis. A GPT-4-based LLM achieved nearly 95% accuracy, showcasing the benefit of blending LLMs with geotechnical knowledge from the literature. Furthermore, this study underscores the use of LLMs as a supplementary resource similar to textbooks or solution manuals. Through these applications, this study demonstrates that domain-adapted LLMs can serve as scalable, 24/7 knowledge-support tools.
Reddy and Janga [
27] explored AI adoption through a global survey of geotechnical and geoenvironmental professionals, demonstrating that LLMs are primarily used for literature review, technical content preparation, code generation, and data interpretation. Moreover, LLMs have the potential to support geotechnical engineering practices by enabling efficient analysis of large geotechnical reports and reducing the time required for manual tasks, such as report preparation and data visualization. Apart from these advantages, LLMs also have disadvantages, such as hallucinations, numerical inaccuracies, and a lack of engineering judgment, which, at this point, make LLMs unsuitable for final design decisions.
Collectively, these studies show that LLMs have advanced from general-purpose tools to domain-aware knowledge-support tools for geotechnical engineering. Chen et al. [
28] reported that GPT-4’s problem-solving ability in geotechnical education is very sensitive to guidance. When domain-specific instructions are given, accuracy rises from a poor baseline of 28.9% to 67%. This makes GPT-4 a useful AI tutor under guidance, but not an autonomous solver. Liu and Shi [
30] demonstrated that GPT-4 is robust for information extraction and visualization from geotechnical reports, thereby increasing efficiency and reducing human error. In contrast, Soranzo [
24] found that fine-tuned LLMs such as GPT-4 and BERT variants can match near-human consistency in grading and educational content generation, with more than 97% accuracy. Babu et al. [
26] adopted a practitioner-focused approach and observed that LLM output quality varies across basic and advanced topics. They also found persistent issues such as technical generalizations and contextual mismatches, reinforcing the need for human judgment. Similarly, Reddy and Janga [
27] identified LLM adoption in tasks such as literature review, reporting, and data interpretation, but highlighted hallucinations and numerical inaccuracies as barriers to their use in the final design. Finally, Tophel et al. [
25] showed that combining LLMs with structured geotechnical knowledge via RAG can achieve near textbook-level precision. The common takeaway is that LLMs work best as intelligent, scalable assistants for education, analysis, and decision support when carefully guided and fine-tuned. Although LLMs have emerged as successful tools for content generation and virtual assistance, they sometimes generate incorrect content because of hallucinations or poorly specified prompts. Therefore, LLM-generated content should be verified by subject-matter experts.
3.6. LLM-Based Risk Assessment of Geotechnical Infrastructure
Risk assessment in geotechnical engineering is performed using stochastic techniques, such as Monte Carlo simulation, to determine the probability of failure of geotechnical infrastructure. More recently, researchers have begun using LLMs to assess the risk of different infrastructure systems.
Table 5 shows the application of LLM for risk assessment of geotechnical infrastructure.
Figure 6 shows a schematic representation of the risk assessment process using an LLM. The user input to the LLM includes site investigation parameters, sensor data, and instructions for performing the analysis. Based on the available data, the LLM performs Monte Carlo simulations and finite element analyses to generate the probability of failure. The probability of failure can be used in decision-making and hazard mitigation.
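As a minimal sketch of the Monte Carlo step named above, the probability of failure can be estimated by sampling a resistance R and a load effect S and counting the cases where R − S < 0. The normal distributions and the parameter values below are illustrative assumptions, not data from the reviewed studies; they were chosen because this case has a closed-form answer that the simulation can be checked against.

```python
import math
import random


def monte_carlo_pf(mu_r, sd_r, mu_s, sd_s, n_samples=50000, seed=0):
    """Estimate P(R - S < 0) for independent normal R (resistance) and S (load)."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    failures = 0
    for _ in range(n_samples):
        resistance = rng.gauss(mu_r, sd_r)
        load = rng.gauss(mu_s, sd_s)
        if resistance - load < 0.0:
            failures += 1
    return failures / n_samples


def exact_pf(mu_r, sd_r, mu_s, sd_s):
    """Closed-form P(R - S < 0) = Phi(-beta) for independent normal R and S."""
    beta = (mu_r - mu_s) / math.sqrt(sd_r ** 2 + sd_s ** 2)  # reliability index
    return 0.5 * (1.0 + math.erf(-beta / math.sqrt(2.0)))


# Usage with hypothetical values (e.g., kN): reliability index beta = 2.
pf_simulated = monte_carlo_pf(150.0, 20.0, 100.0, 15.0)
pf_reference = exact_pf(150.0, 20.0, 100.0, 15.0)
```

Comparing the sampled estimate with a closed-form reference is one way to catch the wrong-distribution errors that LLM-generated risk code can contain.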
Njock et al. [
17] presented a study on how LLMs can be operationalized for geotechnical risk assessment. The authors developed DistilBERT TunnelRisk to enable natural language-driven prediction of structural failure risk in shield tunnels. By converting conventional geotechnical inputs such as geological conditions and groundwater levels into question–answer pairs, the model enables engineers to query tunnel risk through conversational text rather than through structured numerical interfaces. The model achieves high predictive accuracy (precision, recall, and F1 scores of 0.96–1.0) and outperforms general-purpose LLMs such as GPT-4 and DeepSeek in domain-specific reasoning. Overall, this research represents an advancement in applying LLMs to geotechnical engineering tasks, including excavation stability, slope failure assessment, and foundation risk evaluation.
In the same year, Areerob et al. [
33] integrated an LLM with multimodal AI in their study on geotechnical hazard interpretation, particularly expert-level landslide image analysis. The study linked aerial imagery with LLM-based reasoning to recreate the tacit decision-making processes conventionally carried out by experienced geotechnical engineers. By developing both a VQA–LLM hybrid framework and an end-to-end multimodal LLM (MLLM), the authors showed how LLMs can be used for causal interpretation and future risk assessment of slope failures from visual data. A major focus of this study is the digitization of expert geotechnical knowledge, captured via verbal commentary and structured using LLMs. The outcome demonstrates that LLM-driven systems can provide geologically relevant interpretations and risk insights comparable to those of human experts, highlighting the strong potential of LLMs as decision-support tools in geotechnical engineering. The approach is fast, easy to apply, and scalable for landslide assessment.
Another study in 2025 by Pang et al. [
32] examined the reconstruction of landslides and the automation of post-landslide investigation using LLM-based agentic AI. In this research, an LLM was combined with RAG to extract engineering-relevant information, and a multimodal LLM was integrated with fine-tuned vision models, such as YOLO, to estimate landslide geometry from site images. By using pretrained foundation models and CoT prompting, the proposed framework reduces reliance on large databases and heavy manual effort, two major drawbacks of traditional geotechnical analysis. Results from LLMs applied to historical landslide cases in Hong Kong show that summaries and geometric estimates are consistent with professional forensic reports. This highlights the potential of LLM-based agentic AI to achieve greater efficiency and scalability in hazard investigation, supporting quick risk assessment, better decision-making, and improved planning. Downsides of LLM-generated risk assessments include the selection of an inappropriate probability distribution and the omission of some risks due to hallucination. Therefore, step-by-step instructions should be provided to the LLM when it is used for risk assessment.
3.7. LLM-Driven Automation of Numerical Modeling
Numerical modeling of geotechnical infrastructure is performed to assess slope stability, estimate bearing capacity and settlement, and design tunnels and other underground structures. It is carried out using finite difference, finite element, and discrete element methods. Although finite element and finite difference models are very accurate, these techniques have shortcomings, including complexity, time consumption, and the need for manual interpretation of results.
LLMs can speed up analysis and help interpret results with reduced human intervention.
Table 6 shows the application of LLMs for numerical modeling in geotechnical engineering.
Figure 7 shows a schematic representation of numerical modeling for slope stability analysis using an LLM. The LLM receives the prompt from the user and generates code to solve numerical problems. There is a feedback loop between the LLM agent and the numerical solver, which enables the LLM to improve the code and debug errors.
Bekele [
35] introduced GeoSim.AI, which demonstrates how LLMs can reshape computational geomechanics through numerical simulations, enabling them to be managed via natural language. GeoSim.AI uses LLMs as its central processing unit to translate natural-language or image inputs into full geomechanical simulation scripts for tools such as ADONIS, HYRCAN, PLAXIS, and FLAC. Moreover, this study showcases slope stability modeling in ADONIS and HYRCAN using text-only prompts and combined image-and-text prompts. GeoSim.AI automates repetitive setup tasks, allowing researchers to focus more on geomechanical behavior rather than software operations. Overall, GeoSim.AI’s ability to translate natural language and visual inputs into fully structured numerical models makes it efficient for geotechnical design.
Kim et al. [
34] conducted a study on the use of ChatGPT for finite element (FE) analysis of soil–structure interaction and coupled hydro-mechanical problems. This work demonstrates how LLMs can generate executable FE code for single-field problems, such as 1D consolidation using Terzaghi’s equation, and mixed-field problems, such as coupled displacement–pore pressure formulations. The work addresses three benchmark problems: 1D consolidation (fluid mass diffusion), differential settlement of a strip footing, and gravity-driven seepage in unsaturated soil. By validating GPT-generated FE codes against analytical solutions and experimental data, this study provides a proof-of-concept for integrating AI into computational geomechanics workflows. With a high-level library such as FEniCS, ChatGPT required minimal code revisions and passed verification tests quickly, which is its primary advantage. In contrast, code generated for a lower-level implementation in MATLAB failed even after multiple prompt augmentations and required direct human intervention.
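For reference, the 1D consolidation benchmark mentioned above is governed by Terzaghi’s equation, and generated FE code for this problem is commonly verified against its classical series solution. The form below assumes a uniform initial excess pore pressure u_0 and a depth z measured from the drained boundary; the source does not state which analytical form was used in the cited study.

```latex
\frac{\partial u}{\partial t} = c_v \frac{\partial^2 u}{\partial z^2},
\qquad
u(z,t) = \sum_{m=0}^{\infty} \frac{2 u_0}{M}
\sin\!\left(\frac{M z}{H_{dr}}\right) e^{-M^2 T_v},
\qquad
M = \frac{\pi}{2}(2m+1), \quad T_v = \frac{c_v t}{H_{dr}^2}
```

Here u is the excess pore-water pressure, c_v is the coefficient of consolidation, H_dr is the drainage path length, and T_v is the dimensionless time factor.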
Kamran et al. [
31] demonstrated the integration of LLMs and generative AI for geotechnical risk prediction, focusing on rockburst hazards in underground construction. By leveraging Google Gemini’s multimodal reasoning (text, code, audio, images, PDFs, and video) and prompt engineering-driven automation, the authors showed how LLMs can generate, refine, and validate Python code for complex geotechnical analyses, transforming traditionally manual and time-intensive processes into adaptive, data-driven workflows. Furthermore, the LLM was used to generate pie charts of rockburst intensity distribution, pairwise scatter plots of variable relationships, and 3D plots of factor analysis and clustering results. The research highlights how LLMs help engineers shift from reactive safety measures to predictive, sustainable risk mitigation strategies by enabling automated data processing, factor analysis, clustering, and ML-based intensity forecasting with high accuracy. This work represents an emerging direction in which LLMs act not only as conversational assistants but also as intelligent analytical partners capable of enhancing underground risk assessment.
Downsides of LLM-generated code for numerical modeling include incorrect code caused by hallucinations and misrepresentation of real-world conditions caused by incorrect user prompts. Users can reduce these mistakes by applying a divide-and-conquer approach and requesting code from the LLM step by step. For example, to use an LLM for the stability analysis of a slope subject to surcharge loading, the user should first request code for slope stability considering only the self-weight of the soil, verify it, and then request the extension that adds the surcharge.
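The staged approach can be illustrated with a deliberately simple model. The sketch below uses a dry infinite slope with a planar slip surface, which is an assumption made for illustration and not the method of any reviewed study; the surcharge q is assumed to be a uniform vertical load per unit horizontal area. Step 1 handles self-weight only, step 2 adds the surcharge, and setting q = 0 in step 2 must reproduce step 1, which gives a cheap consistency check between the two stages.

```python
import math


def fs_self_weight(c, phi_deg, gamma, depth, beta_deg):
    """Step 1: factor of safety of a dry infinite slope under self-weight only.

    c: cohesion (kPa), phi_deg: friction angle (deg), gamma: unit weight (kN/m^3),
    depth: vertical depth to the slip plane (m), beta_deg: slope angle (deg).
    """
    beta = math.radians(beta_deg)
    phi = math.radians(phi_deg)
    sigma_v = gamma * depth  # vertical stress from soil weight
    resisting = c + sigma_v * math.cos(beta) ** 2 * math.tan(phi)
    driving = sigma_v * math.sin(beta) * math.cos(beta)
    return resisting / driving


def fs_with_surcharge(c, phi_deg, gamma, depth, beta_deg, q):
    """Step 2: same model with a uniform vertical surcharge q (kPa) added."""
    beta = math.radians(beta_deg)
    phi = math.radians(phi_deg)
    sigma_v = gamma * depth + q  # surcharge adds to the vertical stress
    resisting = c + sigma_v * math.cos(beta) ** 2 * math.tan(phi)
    driving = sigma_v * math.sin(beta) * math.cos(beta)
    return resisting / driving


# Usage with hypothetical soil parameters.
fs_step1 = fs_self_weight(10.0, 30.0, 18.0, 2.0, 25.0)
fs_step2 = fs_with_surcharge(10.0, 30.0, 18.0, 2.0, 25.0, q=20.0)
```

Verifying each stage before moving on localizes any error to the most recently generated piece of code.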
3.8. Automation in Geotechnical Site Investigation Planning
Conventionally, geotechnical site investigation planning is performed manually based on project requirements. Manual methods are time-consuming and costly. Researchers are developing LLM-based techniques to automate geotechnical site investigation planning. Qian and Shi [
36] presented a GPT-4o-based study that advances geotechnical engineering workflows by integrating RAG and agentic human–machine collaboration. Their work demonstrates how LLMs can automate key components of site investigation, including information retrieval from multi-source site investigation design codes, automated borehole layout planning, and rapid geological characterization from multimodal data. The resulting system can automatically generate borehole spacing, depth, and layout schemes in accordance with regional codes, providing near-real-time, code-compliant site investigation plans. By proposing a multihop RAG framework capable of accurately extracting domain-specific clauses and generating feasible sampling schemes, the study showcases the potential of LLMs to enhance efficiency, reduce human error, and support real-time, risk-informed decision-making. This contribution highlights an important movement toward sustainable geotechnical engineering by enabling digital transformation, improving resource optimization during site investigations, and fostering interpretable AI-assisted geotechnical analysis.
Table 7 shows the application of LLM for automating geotechnical site investigation planning.
Figure 8 shows the automated site investigation planning process using an LLM. The LLM receives parameters from the user and project information and generates borehole plans that account for various constraints and requirements. The design engineer reviews the output, and there is a feedback loop between the user and the LLM to optimize the design.
Li and Shi [
39] presented a study on the automatic generation of geological cross-sections from sparse borehole data using ChatGPT 4.0. This research showed that LLMs can pick up geotechnical reasoning from few-shot textual examples without model retraining. The authors developed a prompting strategy that integrates few-shot examples to teach domain rules, chain-of-thought (CoT) reasoning to enforce multistep logic, and self-consistency sampling. The framework was validated on two case studies, one in tunneling and one in reclamation, in which LLMs were employed to determine stratigraphic boundaries and generate 2D geological cross-sections. It achieved an accuracy of ~77%, demonstrating that LLMs can reduce reliance on expert manual interpretation and improve consistency and efficiency. This suggests that LLMs are a promising mechanism for scalable subsurface modeling.
In the same year, Wu et al. [
38] conducted a study on the use of LLM-based agentic AI in geotechnical engineering, demonstrating its ability to transform labor-intensive, expert-driven workflows. The LLM agent was used for geotechnical site planning, landslide investigation and post-event analysis, liquefaction analysis, and shield tunnel safety evaluation. Additionally, LLMs were used to extract design clauses from multilingual geotechnical codes and guidelines using RAG, and to automatically generate geological cross-sections from sparse borehole data using agentic workflows. Moreover, this study introduced natural language-based geotechnical computation, which uses natural language as a formal interface for site characterization, design reasoning, risk evaluation, and regulatory compliance checks. These automations reduce repetitive manual effort and enable proactive risk management, supporting sustainable geotechnical engineering practice.
3.9. LLM-Driven Workflow Automation in Geotechnical Data Analysis
Recent studies have demonstrated that LLMs can act as domain-specific workflow controllers rather than only as text generators. Zhang et al. [
29] conducted a study in which LLMs serve as action agents that interpret natural-language queries, retrieve geoscientific details, and execute analytical and visualization tasks via external tools. The LLM transforms natural-language queries into machine-readable parameters for the OpenMindat Application Programming Interface (API). By placing the LLM as a connecting layer between users and domain APIs, the workflow reduces the programming effort required of users and improves the consistency of data-driven analyses. Moreover, this study demonstrates that domain-specific fine-tuning of LLMs is not essential for complicated geoscience workflows. Instead, well-designed prompts and tool schemas can enable general-purpose LLMs to work productively in specialized engineering contexts.
4. Integration of LLM Capabilities for Empowering Geotechnical Practitioners
Practicing geotechnical engineers across different consulting and design organizations often face challenges performing various tasks due to resource and budgetary constraints. Some of these tasks include scanning geotechnical reports and borehole logs, reviewing geotechnical instrumentation data, drafting geotechnical reports, conducting preliminary design and analysis of infrastructure, and summarizing the literature to synthesize knowledge across different concepts.
LLMs can be employed to advance and automate these activities that practicing engineers perform.
Figure 9 lists some tasks that can be automated using an LLM. Conventionally, scanning information from geotechnical reports and borehole logs requires a considerable amount of time; applying an LLM for this purpose can save time for geotechnical engineers. Geotechnical engineers can use LLMs to obtain quick answers on various geotechnical engineering concepts; however, these answers should be verified by experts for accuracy before using them for decision-making in a project. Engineers can use an LLM to review sensor data from geotechnical infrastructure to perform risk assessment. Moreover, engineers can use LLMs to perform preliminary design and analysis of geotechnical infrastructure. Although LLMs cannot perform finite element analysis, they can generate high-quality code to automate various geotechnical engineering processes.
LLMs can support real-time monitoring of infrastructure by processing data from geotechnical sensors, such as piezometers, inclinometers, earth pressure cells, and fiber-optic sensors, to detect anomalies in infrastructure behavior and issue early warnings. Real-time monitoring is useful for extreme-consequence facilities, such as water dams, tailings storage facilities, and nuclear power plants. For example, if an LLM detects movement of a dam foundation from inclinometer readings, it can issue an early warning, allowing precautionary measures to be taken.
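The early-warning logic in the inclinometer example can be sketched as a simple rule-based check that an LLM-driven monitoring pipeline might call as a tool. The function name, the alert labels, and both limits are hypothetical; real trigger levels would come from the project's monitoring plan.

```python
def displacement_alerts(days, displacement_mm,
                        rate_limit_mm_per_day=2.0, total_limit_mm=25.0):
    """Flag readings whose displacement rate or cumulative value exceeds a limit.

    days: reading times in days (strictly increasing).
    displacement_mm: cumulative lateral displacement at each reading.
    Returns a list of (day, reason) tuples, where reason is "rate" or "total".
    """
    if len(days) != len(displacement_mm):
        raise ValueError("days and displacement_mm must have the same length")
    alerts = []
    for i in range(1, len(days)):
        dt = days[i] - days[i - 1]
        if dt <= 0:
            raise ValueError("reading times must be strictly increasing")
        rate = (displacement_mm[i] - displacement_mm[i - 1]) / dt
        if abs(rate) > rate_limit_mm_per_day:
            alerts.append((days[i], "rate"))
        if abs(displacement_mm[i]) > total_limit_mm:
            alerts.append((days[i], "total"))
    return alerts


# Usage: the jump between day 2 and day 3 exceeds the assumed rate limit.
alerts = displacement_alerts([0, 1, 2, 3], [0.0, 0.5, 1.0, 4.0])
```

Keeping the threshold check deterministic and letting the LLM only interpret and report the alerts avoids relying on the model for numerical comparisons.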
5. Challenges and Future Roadmaps
5.1. Challenges in LLM-Driven Geotechnical Engineering
Although LLMs have found large-scale applications across different sectors of geotechnical engineering, there are certain challenges in applying them to geotechnical design and analysis. Some challenges in implementing LLMs in geotechnical engineering include the need for large volumes of data, hallucinations in LLMs, and the requirement for extensive computational resources.
LLMs generally provide better generalizability than conventional statistical, machine learning, and deep learning models. This generalizability comes at the cost of requiring a large volume of data, which is difficult to generate in geotechnical engineering. For instance, LLM models developed for slope stability or embankment construction analysis are based on limited sets of subsurface data because of the high cost of acquiring them. The limited amount of subsurface data affects the generalizability of the models. One potential solution to this issue is to generate data using data augmentation techniques.
LLMs sometimes hallucinate and provide incorrect solutions and code for geotechnical engineering problems. Incorrect answers from LLMs may mislead geotechnical practitioners, leading to incorrect estimates of bearing capacity and of the factor of safety for slopes and embankments. One solution to this problem is to cross-check the LLM-based solution against the literature. Very little information is available in the literature on the LLM hallucination rate during code generation for finite element analysis, which makes it a topic of interest to the scientific community.
Another challenge in incorporating an LLM-based approach in geotechnical engineering is its dependence on prompt engineering, the process of structuring and designing instructions for LLMs. Adequate prompt engineering reduces hallucination, reduces computational time, and improves automation. Kumar [
23] presented a study demonstrating the stochastic parrot problem in LLMs, in which LLMs produce fluent but incorrect or misleading outputs. The study provided a clear experimental example in which GPT produced incorrect soil classification results when asked for direct answers. However, when the author applied chain-of-thought prompting, the model correctly followed each step of the logic, including the liquid limit threshold, plasticity index computation, and A-line comparison, and generated the correct classification. Furthermore, this study is among the first to formalize prompt engineering as a methodological requirement rather than a user-convenience tool.
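The three reasoning steps named above (liquid limit threshold, plasticity index computation, and A-line comparison) correspond to the plasticity-chart rules for fine-grained soils in the Unified Soil Classification System. The sketch below is a simplified illustration that assumes an inorganic fine-grained soil; it is not the prompt or code used in the cited study, and the function name is hypothetical.

```python
def classify_fine_grained(liquid_limit, plastic_limit):
    """Simplified USCS group symbol for an inorganic fine-grained soil."""
    # Step 1: plasticity index from the Atterberg limits.
    plasticity_index = liquid_limit - plastic_limit
    # Step 2: A-line value at this liquid limit, PI = 0.73 (LL - 20).
    a_line_pi = 0.73 * (liquid_limit - 20.0)
    on_or_above_a_line = plasticity_index >= a_line_pi
    # Step 3: liquid limit threshold of 50 separates high and low plasticity.
    if liquid_limit >= 50.0:
        return "CH" if on_or_above_a_line else "MH"
    if on_or_above_a_line and plasticity_index > 7.0:
        return "CL"
    if on_or_above_a_line and plasticity_index >= 4.0:
        return "CL-ML"
    return "ML"


# Usage with hypothetical Atterberg limits (LL = 40, PL = 20).
symbol = classify_fine_grained(40.0, 20.0)
```

Writing the steps out explicitly, as a chain-of-thought prompt asks the model to do, makes each intermediate value available for checking.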
Finally, one major problem in developing LLM-based geotechnical tools is the need for substantial computational resources, because the models are trained on very large volumes of data. This requirement increases costs and limits their use on edge devices. The computational cost of fine-tuning, retraining, or developing an LLM is a topic of interest to geotechnical practitioners and researchers, but little information on it is available in the current literature.
5.2. Future Roadmaps
Future research on LLM applications in geotechnical engineering can be grouped into three components: (a) development of LLMs with improved architecture, achieving better accuracy and performance, (b) development of LLMs with better generalizability, and (c) development of new LLM-based approaches that do not yet exist. An LLM with better architecture can be developed for complex tasks that involve nonlinear stress–strain behavior or large-strain behavior, including predicting particle flow during landslides and slope failures and the consolidation of compressible soils. Moreover, advanced LLMs should be developed to solve multi-physics problems in geotechnical engineering, including the coupled thermal, hydraulic, and mechanical behavior of energy piles and the hydraulic and mechanical behavior of geomaterials during rainfall-induced landslides.
One of the major problems in geotechnical engineering is the acquisition of large volumes of data, which hinders the generalizability of LLMs. The generalizability of LLMs in geotechnical engineering can be improved by generating data with different data augmentation techniques or by using generative AI. Another approach to increasing generalizability is to train an LLM on data from regions around the world. For instance, an LLM trained to analyze slope stability in one region can be trained on data from another region to improve generalization.
Some potential future research areas for integrating LLMs into the geotechnical engineering workflow are as follows:
Slope stability analysis of tailings dams under changes in tailings deposition rate and precipitation conditions using a pre-trained LLM. The Factor of Safety (FOS) and phreatic line from the model should be benchmarked against commercial finite element software.
Study of the effects of extreme climatic conditions (floods and heavy rainfall) on the stability of earth dams using a pre-trained model or a newly developed model. The deformation, FOS, phreatic line, and generated pore-water pressure should be benchmarked against commercial software.
Determination of the deformation behavior of tunnels in earthquake-prone regions using a pre-trained LLM, with results validated using 3D finite element software.
LLM-driven code generation for developing physics-informed neural networks to compute the settlement of energy piles.
LLMs can be integrated with modern technologies, such as digital twins, to enhance the safety and reliability of geotechnical infrastructure. A digital twin consists of sensing devices, a virtual model, and a communication system. The sensing devices acquire real-time information from the geotechnical infrastructure; the virtual model functions as a replica of the real infrastructure; and the communication system enables real-time information transfer between the model and the infrastructure. Within this framework, an LLM can serve as a decision-making tool. One area of interest in the research and practice communities is the integration of LLMs into a digital twin framework for tailings storage facilities. As tailings storage facilities are extreme-consequence facilities requiring real-time monitoring, a digital twin framework can be developed for them with an LLM serving as the data processing unit. The LLM can combine climate data (precipitation, humidity, and temperature), geotechnical instrumentation data, and other historical data from the facility to detect problems within it.
6. Conclusions
This paper presented a state-of-the-art review of the integration of LLMs into geotechnical engineering and their emerging role in the analysis, design, risk assessment, and numerical modeling of geotechnical infrastructure. LLMs have also found applications in automating site investigation planning and in knowledge support. The study began with a systematic literature review using keywords related to geotechnical engineering and LLMs across multiple databases. The literature was categorized into two main segments: LLMs and their applications in geotechnical engineering. The second segment was divided into eight categories. The review first introduced the concepts of LLMs and the transformer architecture, and briefly described three LLM architectures and their potential applications in geotechnical engineering.
LLMs are increasingly used to generate automated MATLAB/Python code for seepage flow analysis, failure surface detection, and slope stability evaluation, with results that closely match those of commercial solvers such as SLOPE/W and SEEP/W. The research underscores the human-in-the-loop approach for clarifying prompts when ChatGPT misinterprets tasks (e.g., correcting slice angle calculation). Furthermore, advanced models such as Multi GeoLLM achieved a perfect accuracy of 1.0 with 60 multimodal cases and enabled automated design drawings.
In tunneling and underground engineering, practice is shifting from purely numerical or empirical approaches to language-driven, multimodal reasoning systems. Recent research in this domain demonstrated that geological sketches, drilling logs, tunnel face imagery, GPR measurements, and operational data can be converted into linguistically structured inputs using knowledge graphs and prompt engineering. This framework enables LLMs to perform tunnel face stability assessments, rock mass integrity predictions, and advanced geological forecasting with precision exceeding 90%. Models such as Tunnel-GPT and GeoPredict-LLM use these multimodal inputs to automate and streamline forecasting with high accuracy.
For bearing capacity calculation and settlement estimation, several limitations of traditional methods, such as the finite element method, can be overcome by using LLMs for foundation design, as LLMs (such as GeoLLM) are trained on large datasets and exhibit better generalizability across conditions. Domain-specific GeoLLM demonstrates the ability to extract geotechnical parameters from textual reports, interpret international design standards (API, Eurocodes, Chinese, and American codes), and develop reliable computational workflows for shallow and deep foundations. Recent studies even claim the automation of pile-bearing capacity calculations using LLMs, generating a trustworthy workflow that can minimize manual tasks.
Geotechnical engineering is shifting from textbook-based learning to LLM-based virtual assistance, which can provide instant support. Apart from numerical analysis, research in 2024 and 2025 highlights LLMs’ capabilities as virtual assistants, educational tools, and knowledge-support systems. When augmented with RAG and domain-specific instructions, GPT-4 and similar models were used to solve soil mechanics problems, extract insights from geotechnical reports, grade student responses, and even support 3D AR-based visualization. With fine-tuning and RAG frameworks, LLMs achieved a high accuracy of 95–98%, but limitations such as hallucinations and a lack of engineering judgment persist. The performance of an LLM depends on the prompt structure and domain guidance. Expert oversight and structured reasoning prompts are therefore necessary to reduce hallucinations and contextual errors. Overall, LLMs can serve as always-available digital tutors and knowledge partners.
LLMs also open new approaches to geotechnical risk assessment by converting probabilistic, multimodal, and post-event investigation workflows into interpretable, conversational interfaces. Agentic LLM frameworks combining Monte Carlo simulations, vision models, and stepwise reasoning have shown strong agreement with expert forensic reports in landslide and tunnel failure analyses, suggesting that LLMs are well-suited for preliminary risk screening, hazard reconstruction, and decision prioritization.
Despite these advances, several limitations remain. Data scarcity, site-specific variability in geotechnical conditions, hallucinated outputs, numerical inaccuracies, and the high computational cost of LLMs currently constrain their direct deployment for final designs. Consequently, LLMs should be viewed as intelligent co-pilots that support engineers in automation and synthesis rather than as autonomous designers.
Overall, this study demonstrates that LLMs are reframing geotechnical engineering workflows by serving as reasoning engines that can handle text, code, images, numerical models, and design knowledge. When carefully deployed within human-in-the-loop, prompt-driven, and domain-aware frameworks, LLMs deliver considerable gains in efficiency, consistency, accessibility, and sustainability. Continued evolution of LLMs, particularly through multimodal integration, improved domain grounding, and coupling with digital twins and real-time sensing, can make LLMs a foundational technology for the next generation of safe, data-driven geotechnical infrastructure systems.