Human–AI Teaming in Structural Analysis: A Model Context Protocol Approach for Explainable and Accurate Generative AI

Avila, Carlos; Ilbay, Daniel; Rivera, David

doi:10.3390/buildings15173190

Open AccessArticle

Human–AI Teaming in Structural Analysis: A Model Context Protocol Approach for Explainable and Accurate Generative AI

by

Carlos Avila

^1,*,

Daniel Ilbay

²

and

David Rivera

^1,*

¹

Grupo de Investigación de Energía, Minas y Agua (GIEMA), Facultad de Ciencias, Ingeniería y Construcción, Universidad UTE, Quito 170527, Ecuador

²

Facultad de Ciencias, Ingeniería y Construcción, Ingeniería Civil, Universidad UTE, Quito 170527, Ecuador

^*

Authors to whom correspondence should be addressed.

Buildings 2025, 15(17), 3190; https://doi.org/10.3390/buildings15173190

Submission received: 22 July 2025 / Revised: 18 August 2025 / Accepted: 28 August 2025 / Published: 4 September 2025

(This article belongs to the Special Issue Automation and Intelligence in the Construction Industry)

Download

Browse Figures

Versions Notes

Abstract

The integration of large language models (LLMs) into structural engineering workflows presents both a transformative opportunity and a critical challenge. While LLMs enable intuitive, natural language interactions with complex data, their limited arithmetic reasoning, contextual fragility, and lack of verifiability constrain their application in safety-critical domains. This study introduces a novel automation pipeline that couples generative AI with finite element modelling through the Model Context Protocol (MCP)—a modular, context-aware architecture that complements language interpretation with structural computation. By interfacing GPT-4 with OpenSeesPy via MCP (JSON schemas, API interfaces, communication standards), the system allows engineers to specify and evaluate 3D frame structures using conversational prompts, while ensuring computational fidelity and code compliance. Across four case studies, the GPT+MCP framework demonstrated predictive accuracy for key structural parameters, with deviations under 1.5% compared to reference solutions produced using conventional finite element analysis workflows. In contrast, unconstrained LLM use produces deviations exceeding 400%. The architecture supports reproducibility, traceability, and rapid analysis cycles (6–12 s), enabling real-time feedback for both design and education. This work establishes a reproducible framework for trustworthy AI-assisted analysis in engineering, offering a scalable foundation for future developments in optimisation and regulatory automation.

Keywords:

generative AI-assisted structural analysis; Model Context Protocol; human–machine interaction; computational efficiency; LLM–FEM integration

1. Introduction

Complexity has become central to modern engineering and is no longer a peripheral issue. Each decision—across systems from power grids to software—is entangled in socio-technical networks [1,2,3]. Emerging technologies and AI amplify this interconnectedness. When misunderstood, they introduce risk rather than insight [4]. In the past, engineers often relied on trial and error, lacking a formal understanding of complex interdependencies [3]. Seeing complexity as a problem often leads to hesitation and missed opportunities for learning. A more constructive approach is to view complexity as a valuable part of the engineering process. Nowadays, with clear principles, reliable data, and the support of AI tools, it is possible to reduce unnecessary complications while keeping the detail needed for robust and flexible solutions. This approach supports better decisions, faster progress, more effective collaboration across different groups, and effective human–machine interactions.

AI emulates human cognition by manipulating structured and unstructured data. Generative AI extends this capacity by synthesising new solutions, inferring goals, and navigating varied data formats [5]. Large language models (LLMs) emerged from this progress. Trained on massive corpora, they interpret linguistic input, generate text and code, and perform tasks with minimal fine-tuning. In structural engineering, LLMs now assist in design analysis by translating domain-specific queries into useful insights [6,7,8]. However, their outputs remain statistical guesses, not verifiable computations. They frequently hallucinate numbers, misapply logic, or embed biases—unacceptable in engineering contexts [9,10]. Their accuracy depends entirely on the quality and context of the prompt. To mitigate this, engineers must design domain-specific prompts and use frameworks that enforce structured reasoning. Embedding context-awareness and external validation is critical for making generative outputs trustworthy and auditable.

Generative AI models benefit significantly from context-aware integration. A system is context-aware when it tailors responses using information about its environment or task [11]. Traditional AI architectures, though adept at pattern recognition, lack the semantic depth to interpret complex real-world inputs. This is where context engines excel—they simulate human-like understanding by linking meaning across data. The Model Context Protocol (MCP) was developed to formalise these interactions. It connects LLMs with external tools through APIs, allowing structured communication with analytical software [12]. MCP is model-agnostic. It allows AI to read files, call functions, and respond to enrich contextual prompts [13]. This addresses a longstanding challenge in AI systems: maintaining consistent context across distributed tools [14,15]. When deployed within generative AI systems, MCP grounds the model’s responses in explicit data and reduces hallucinations. It strengthens accuracy and makes AI tools genuinely useful for domain-specific decision-making.

Unlike conventional interpreter–agent frameworks, such as those built with LangChain, which rely on predefined tool chains and modular orchestration components (e.g., LangGraph for workflow modelling, LangServe for scalable API deployment, and LangSmith for monitoring), MCP natively supports autonomous tool discovery, context-driven selection, and execution without requiring manual API wiring or platform-specific plugin management [15,16]. While LangChain provides a mature ecosystem for orchestrating multi-step workflows, Retrieval-Augmented Generation, and secure API integrations, the MCP introduces a dynamic and standardised interface that extends beyond these capabilities. MCP enables the communication of an LLM with external resources, prompt templates, and specialised tools, thereby contextualising tasks with domains not originally present in the LLM’s training data. This architecture not only permits the implementation of agents or AI-based applications over MCP but also ensures interoperability, computational fidelity, and traceability within real-time engineering workflows.

Generative AI models such as GPT-4, Claude 4, and LLaMA-3 are transforming construction industry workflows. Their natural language interfaces streamline documentation and enable rapid knowledge retrieval from case studies and codes [17]. However, without careful curation, their training data may omit edge cases—such as atypical structural systems or extreme climates [18]. As a result, generated advice may not generalise well across projects. For instance, recent studies have shown how LLMs can translate structural descriptions into OpenSeesPy scripts [6,18]. But LLMs still cannot verify geometry, assess structural logic, or detect orphaned elements. They lack intrinsic understanding of code compliance, safety margins, and construction feasibility. These limitations, along with dataset biases, underscore the need for moderated, context-constrained AI integration.

Recent studies highlight LLMs’ growing role in the building domain, spanning energy modelling, HVAC design, knowledge standardisation, and automated code compliance [19,20,21]. While these works demonstrate potential—e.g., generating energy models from natural language inputs, passing professional HVAC exams, and standardising performance data—limitations remain. Reported issues include legislative hallucinations, inaccurate thermal parameters, weak generalisation across jurisdictions, and performance gaps in open-source models. These constraints reinforce the need for domain-specific, context-aware integration to ensure LLM outputs are technically accurate, verifiable, and aligned with engineering standards [21].

Beyond these general limitations, generative AI is fundamentally reshaping the early phases of engineering and architectural design by enabling the rapid generation and exploration of extensive design alternatives in response to defined objectives [22]. Through parameterisation of material constraints, performance targets, and sustainability goals, GenAI algorithms can generate thousands of optimised variations in minutes rather than weeks [23]. When linked to Building Information Modelling (BIM) workflows, these solutions can be refined to enhance structural performance, material efficiency, and energy use [15]. In structural engineering, generative AI is already applied to optimise forms for minimal material consumption while preserving stability and safety [24]. However, baseline large language models, such as LLaMA-3.3 70B Instruct, while demonstrating qualitative comprehension of structural mechanics, lack the quantitative reliability required for safe analysis under complex loading and support conditions. Recasting structural analysis as a code-generation task and employing structured prompt engineering can achieve reliability above 99%, yet residual hallucinations and challenges in interpreting complex or visually described configurations remain. These developments illustrate that, despite generative AI’s unprecedented capacity to accelerate design exploration, rigorous contextualisation of input requirements and domain-specific validation are essential to mitigate risks and ensure trustworthy outputs. Because LLMs are weak in numerical reasoning and code compliance, structural engineering demands controlled integration frameworks that enforce validation and fidelity. Rather than use LLMs for direct calculations, a better strategy is to delegate computations to verified solvers. LLMs should serve as interfaces—interpreting user intent and relaying structured instructions. This study proposes such a system, embedding conversational LLMs within the Model Context Protocol. This framework connects natural language prompts to finite-element tools like OpenSeesPy (3.7.1, PEER). By enabling transparent feedback and structured inputs, MCP democratises access to advanced simulations. It helps engineers interpret results safely and guards against overconfidence in generative outputs.

2. Methods

2.1. Architecture Workflow

The proposed system uses a modular client–server design rooted in the MCP. It separates natural language processing from simulation execution. This architecture connects complex structural models with intuitive language-based interfaces (Figure 1). Engineers can run advanced simulations without needing to write code or manipulate software directly. The platform uses FastAPI (3.1.1) design to facilitate communication. Through this, engineers submit prompts, receive feedback, and manage simulations in an accessible and scalable environment. By lowering the barrier to entry, the system promotes interdisciplinary collaboration and accelerates design workflows.

The system consists of three layers (Figure 1). First, the Host Application Layer collects user prompts describing geometry and loading. Second, the Communication Layer validates inputs and routes requests. It uses FastAPI to connect the interface with simulation tools. Third, the External Application Layer includes OpenSeesPy, which performs structural simulations. This layer handles seismic analysis, load cases, and geometry creation. The analytical outputs are then returned to the reasoning engine, as context, to support results interpretation.

The proposed system architecture is structured into three principal layers, each fulfilling a distinct role in the execution pipeline (Figure 1). First, the Host Application Layer allows users to articulate structural configurations and loading scenarios through natural language prompts, submitted via an interactive interface or API client. Second, the Communication Protocol Layer, implementable using FastAPI, intermediates between user intent and computational execution. It ensures robust input validation, manages tool invocation, and routes data through a centralised orchestration server. Third, the External Application Layer comprises domain-specific tools such as OpenSeesPy, a Python (3.8.x, Python Software Foundation) interface to OpenSees (3.7.1, PEER), which conducts advanced structural simulations, particularly seismic analyses. The analytical outputs are then returned to the reasoning engine to support results interpretation. It is important to clarify that, while this study employs the MCP architecture in conjunction with OpenSeesPy, its design is not restricted to this specific platform. The MCP framework is inherently domain-agnostic and can be adapted to interface with any engineering software that provides an accessible API. This extensibility ensures that the proposed workflow can be repurposed across different computational domains, thereby enhancing its applicability beyond the scope of the present structural analysis case studies.

This layered setup ensures traceability by tracking each computational step. It also supports scalability through modular components and extensibility for future tools. Engineers can modify or extend the platform without retraining the language model. This makes the system durable and adaptable for long-term use. Potential extensions include dynamic response analysis, fragility curve estimation, and automated code checks. The separation of roles ensures reliable performance across evolving requirements.

2.2. Implementation of MCP

The MCP rationale implemented in this work is structured into three functional layers that coordinate natural language reasoning with structural simulation tools, enabling an interpretable and modular interface (Table 1). In the Client Layer, ChatGPT (GPT-4o, version released on 13 May 2024, OpenAI) processes user prompts and extracts parameters like geometry, loads, and material properties. It converts these into structured JSON that guides simulation tools. These tools are declared in the Server Layer, which specifies functions, inputs, and execution rules. Results update the context for the LLM hosted in the Client Layer, keeping the interaction coherent. The approach allows autonomous tool use within a verified framework. It enables simulations that are transparent, reproducible, and context-aware—essential features for AI in engineering.

Complementing this, the Server Layer incorporates specialised Python modules—such as seismic loading and static linear analysis tools—that interface with OpenSeesPy via a local API. These modules generate domain-specific outputs, including storey drifts, shear forces, and mode shapes, while maintaining modularity and reusability. The Server Layer manages the execution workflow by validating inputs, sequencing tasks, and handling errors. It returns results in structured, machine-readable JSON, ensuring consistency and traceability. This layered integration not only operationalises language-driven structural analysis but also supports future enhancements such as automated code compliance checks and structural optimisation routines.

2.3. MCP Client–Server Workflow

The proposed system adopts a REST-based client–server architecture implemented through FastAPI, with a streamlined communication pipeline centred on a unified /analyse endpoint that accepts natural language prompts as POST requests. These two components establish the operational foundation for a modular and scalable workflow that begins with the initialisation of a Model–Context environment, where all simulation and modelling resources are preloaded. Once activated, the prompt is interpreted by ChatGPT, which extracts engineering intent and encodes it into a structured JSON schema defining geometry, materials, and design actions. This structured payload then feeds into a modelling engine that generates a three-dimensional reinforced concrete frame using OpenSeesPy. Subsequent analysis routines simulate static and seismic loading conditions while incorporating code-based constraints. These steps converge to produce critical engineering outputs, including storey-level drifts, shear force distributions, and modal responses. Parsed results are evaluated against normative thresholds, such as the Ecuadorean Construction Code NEC-15 [25] and the American Standard for Ensuring Load Safety for All Infrastructure (ASCE 7-22) [26], and subsequently synthesised by the LLM into a coherent technical report. This sequential and modular design ensures deterministic, transparent, and reproducible outputs, while enabling flexible integration with command-line interfaces, interactive dashboards, or external systems, aligning with modern principles of traceable, AI-assisted structural analysis.

2.4. Tool Integration and Execution Logic

The proposed framework integrates advanced modelling, simulation, and validation tools to support structural engineering workflows. While OpenSeesPy was used for its open-source accessibility and seismic analysis capabilities, the architecture is platform-agnostic. Any structural analysis software with API access—open-source or commercial—can be integrated into the MCP architecture. This flexibility enables automated code-based verification and supports diverse engineering environments efficiently.

First, the Modelling Tool enables the construction of detailed 3D structural models by translating LLM-defined parameters into engineering representations, incorporating critical assumptions such as material behaviour and torsional modifiers. Second, the Simulation Tool executes seismic analyses using established code procedures (e.g., ASCE 7-22, NEC-15), ensuring accurate force distribution and job-level isolation for reproducibility. Third, the Parsing and Validation Tool extracts and assesses performance metrics (e.g., inter-storey drift compliance), formatting outputs in a machine-readable structure to support interpretability and traceability.

These three tools converge to support the Tool Invocation Logic, which is orchestrated by the MCP. The MCP architecture ensures correct sequencing, input validation, and dynamic selection of tools based on prompt semantics and LLM reasoning. This orchestration enables the LLM to act not merely as a query interpreter but as an intelligent controller of engineering workflows.

Consequently, the system achieves a stateless, modular, and declarative architecture, where each tool performs a distinct role, and their coordination is dynamically governed to support adaptive, context-aware structural analysis.

2.5. Model-to-Code Translation and Error Handling

The proposed MCP-based workflow implements a two-tier error-handling strategy to manage ambiguity, incompleteness, and edge cases in user prompts. These mechanisms operate at the Client Layer and Server Layer, ensuring robustness from prompt interpretation to simulation execution (Supplementary Material S1).

Client Layer validation ensures that all prompts conform to a predefined structured schema prior to transmission to the server. Missing parameters are explicitly populated with null values rather than inferred, thereby avoiding unintended assumptions. If a required field is absent, ill-typed, or inconsistent, a standardised diagnostic message is immediately returned to the user. This mechanism prevents the submission of invalid simulation requests and explicitly identifies the causes of rejection. For instance, a prompt to “analyse a concrete frame” without specifying bay lengths or storey heights triggers a client-side diagnostic listing the missing fields and halts execution until the required information is provided.

Server Layer monitoring operates after input reception, where parameters are normalised and the OpenSeesPy execution is encapsulated within runtime supervision. This process detects solver errors, convergence failures, or model misconfigurations and returns structured diagnostics. When a runtime error occurs—such as a stiffness matrix singularity or non-convergence—the server identifies the failure stage (input validation, execution, or numerical solution) and delivers both a machine-readable report and a concise human-readable explanation. For example, a model passing client-side checks but omitting stiffness modifiers may fail during assembly; this is captured by the server, and corrective guidance is issued.

Edge cases identified during testing included (1) contradictory span definitions leading to infeasible geometry, and (2) load cases exceeding solver stability thresholds. Addressing these scenarios illustrates the operational limits of the framework and highlights its capacity to recover from misinterpretations without manual debugging.

2.6. Case Study

This study applies the MCP to establish a structured communication interface between ChatGPT and the OpenSeesPy API, enabling the automated generation of structural analysis models through natural language input. A CIDI-style prompt [27] is formulated to describe a representative three-dimensional frame system, encoding key geometric and mechanical parameters—including member lengths, cross-sectional dimensions, bay spans in the X and Y directions, number of storeys, loading conditions, and material properties (Table 2). While the main text details a single reference case to illustrate the methodology, three additional case studies—designed using the same specification logic—are presented in Table 3 to assess the method’s generalizability and predictive accuracy. Prompt variability is addressed through a structured JSON extraction layer and error-tolerant parsing functions, ensuring consistent parameter interpretation regardless of linguistic or formatting differences. Supplementary Material S2 presents examples where syntactically distinct yet semantically equivalent prompts produce identical model definitions, demonstrating the MCP workflow’s robustness against ambiguous or incomplete inputs.

The defined model is intended for the analysis of storey drifts, shear forces, and vibration modes (Table 4). To ensure the correctness of the AI-assisted process, the same structural model is also manually implemented using conventional OpenSees programming techniques and manual modelling in ETABS (20.3.0, Computers and Structures, CSI). This enables the establishment of a baseline for comparison and supports validation and verification procedures. All configurations are evaluated with respect to compliance with relevant structural performance criteria, particularly storey drift limitations specified in the NEC-15 and ASCE 7-22 standards.

The comparison involved four groups. The GPT group introduced the CIDI prompt directly to the LLM. GPT+MCP employed the Model Context Protocol to connect the LLM with the OpenSees API engine, enabling automated generation and execution of analysis scripts without manual coding. The OpenSees group was manually programmed by the authors, while ETABS was also manually implemented and used as the benchmark. Although GPT+MCP and OpenSees yield identical numerical results, the key difference is procedural: GPT+MCP streamlines workflows, reduces manual intervention, and enhances accessibility by enabling structural analysis through natural language interaction rather than coding.

Finally, the relative error was employed to evaluate differences between the GPT-Standalone, GPT + OPENSEES, and OPENSEES results against ETABS (Equation (1)).

R e l a t i v e E r r o r = \frac{|X_{m o d e l} - X_{E T A B S}|}{X_{E T A B S}} \times 100 %

(1)

where X represents the respective structural response parameter.

3. Results

3.1. Storey Drift

The inter-storey drift results for Cases A–D in both X (Table 5) and Y (Table 6) directions demonstrate considerable performance differences between the evaluated models. The standalone GPT model consistently produced the largest deviations from the ETABS benchmark. Maximum errors reached 38.49% (Case A in Y) and 35.53% (Case A in X), with substantial inaccuracies observed across all storeys and cases (e.g., 0.006 vs. 0.012 in Case A in Y Storey 1).

In contrast, the GPT+MCP hybrid model exhibited markedly improved accuracy. Its drift predictions were closely aligned with both the OpenSees simulations and ETABS results across all storeys and loading cases. The maximum error for GPT+MCP relative to ETABS was limited to 2.756% (Case C, both directions, Table 7), with most errors below 2% (e.g., 1.41% in Case A, 1.09% in Case B in X, 1.27% in Case D in Y) (Table 7 and Table 8). Notably, the GPT+MCP outputs were virtually identical to the OpenSees results in every instance, confirming the hybrid model’s ability to correct the standalone GPT’s limitations.

3.2. Max Displacement

Figure 2 presents a comparison of maximum storey-level displacements in the X and Y directions, obtained using four different approaches: the standalone GPT model, the GPT+MCP pipeline, the native OpenSees solver, and the structural analysis software ETABS, across four case studies (A–D). In the X-direction, the standalone GPT method substantially overestimated deformations, yielding mean displacements of 0.047 m (Case A) to 0.180 m (Case C), versus ETABS values of 0.0128 m to 0.0356 m. By contrast, both GPT+MCP and the direct OpenSees implementation closely replicated ETABS results, with X-direction differences consistently below 3% (maximum of 2.83% for Case C) and virtually identical outputs (e.g., 0.013 m for both in Case A).

A similar trend appears in the Y-direction: the standalone GPT model registered displacements from 0.043 m (Case A) to 0.163 m (Case C), markedly exceeding ETABS benchmarks (0.013 m–0.037 m). In contrast, GPT+MCP and OpenSees closely matched ETABS across all cases, with Y-direction errors under 2% (minimum of 0.23% in Case C and maximum of 1.62% in Case B). Furthermore, standalone GPT produced errors exceeding 200% in every scenario (up to 481.6% in X for Case C), whereas GPT+MCP and native OpenSees remained within a narrow ±3% band (Table 9).

3.3. Base Shear

The results presented in Figure 3 illustrate the storey-wise accumulation of base shear forces and the associated relative errors across four structural case studies (A through D), evaluated using distinct analytical approaches. The standalone GPT consistently underestimated base shear compared to benchmark software such as ETABS and OpenSees. For instance, in Case C, standalone GPT predicted a base shear of 556.444 kN, whereas ETABS and OpenSees reported 861.322 kN and 861.136 kN, respectively.

The integration of standalone GPT with GPT+MCP exhibited remarkable consistency with the outputs of both OpenSees and ETABS. The base shear values obtained from GPT+MCP were nearly identical to those from OpenSees across all cases, and exhibited minimal deviations from ETABS. For instance, in Case D, GPT+MCP, OpenSees, and ETABS reported 712.342 kN, 712.342 kN, and 712.460 kN, respectively.

Further analysis of the relative errors (Table 8) highlights the substantial discrepancies of the standalone GPT model, with errors ranging from 27.774% (Case A) to 43.590% (Case D) when compared to ETABS. Conversely, the GPT+MCP and OpenSees models exhibited remarkably low relative errors, consistently below 0.03% across all cases, indicating a high degree of accuracy and reliability when compared to the benchmark ETABS results (Table 10).

3.4. Building Period

The GPT-only model consistently underpredicted the fundamental period, reporting values of 0.38 s, 0.44 s, 0.57 s, and 0.51 s for Cases A, B, C, and D, respectively (Figure 4). In contrast, both the GPT+MCP and the pure OpenSees analyses yielded identical period estimates—0.485 s, 0.679 s, 0.705 s, and 0.769 s—which closely approximated the ETABS results of 0.489 s, 0.684 s, 0.715 s, and 0.777 s.

The relative error evaluation, computed with respect to the ETABS reference, highlights the disparity between the methods. Standalone GPT incurred substantial errors of 22.29%, 35.67%, 20.28%, and 34.36% for Cases A–D, respectively, indicating limited reliability in isolation (Table 11). Conversely, both GPT+MCP and OpenSees confined their errors within a narrow window of 0.73% to 1.40%. Specifically, the GPT+MCP integration achieved minimum and maximum errors of 0.82% (Case A) and 1.03% (Case D), precisely matching the performance of the OpenSees model.

4. Discussion

Engineering analyses demand the careful integration of myriad variables to arrive at robust solutions. Because each technical decision unfolds within a web of interdependent factors—ranging from material properties and geometric constraints to load patterns and boundary conditions—the field of engineering is inherently complex [28]. This complexity is further amplified by the adoption of emergent technologies such as large language models, which are increasingly employed to automate evaluations and generate design insights [6,29]. However, inadequate proficiency in prompting and interpreting LLM outputs can yield misleading or erroneous results, thereby compromising both efficiency and reliability [24]. Recognising the dual challenge posed by engineering’s intrinsic intricacy and the potential misapplication of generative AI, this study proposes an integrative approach that embeds LLM capabilities—specifically those of ChatGPT—within an MCP architecture. This methodology systematically constrains model interactions, enforces input validation, and guides the interpretation of generated content, ensuring consistency with fundamental engineering principles.

By coupling the computational rigour of engineering software with the generative and creative affordances of GPT, the proposed approach enables dynamic user interaction with both numerical simulations and natural language reasoning systems. This interaction is not superficial; rather, it empowers users to interrogate, reformulate, and contextualise technical data through iterative dialogue. The ability to engage with engineering platforms in real time is essential in environments where feedback loops and rapid scenario evaluation are critical to success.

Simultaneously, GPT’s generative capacity facilitates the exploration of alternative hypotheses, clarification of ambiguous results, and formulation of new lines of inquiry. The convergence of these capabilities within the MCP interface yields a hybrid analytical space in which human insight is augmented—rather than supplanted—by artificial intelligence. Ultimately, this integration enables users to engage critically and constructively with complex engineering outputs, such as those derived from three-dimensional frame analysis. The result is an enhanced decision-making environment where computational analysis is complemented by structured interpretive dialogue, mitigating misinterpretation while preserving fidelity to engineering rigour.

The MCP framework is inherently regulation-agnostic, functioning as an orchestration layer capable of accommodating code-compliance requirements from any jurisdiction. Its modular architecture allows seamless integration of country-specific regulatory modules or API-accessible verification tools, ensuring adaptability to diverse legal and technical contexts. In the present implementation, the compliance layer was configured to verify structural models against NEC-15 (Ecuador) and ASCE 7-22 (United States) provisions. However, this configuration is not limiting. By adjusting the validation layer, the framework can be extended to incorporate other standards—such as Eurocodes, Indian Standards, or region-specific seismic and wind load regulations—without altering the core client–server logic. This flexibility supports the deployment of MCP in multi-jurisdictional engineering environments, facilitating automated, code-compliant design verification across international projects.

In the present study, the implemented MCP-based system was subjected to a series of benchmark structural-engineering queries—including single-span beams, multi-storey frames, and seismic loading scenarios—to assess its modelling robustness. Across every test, the system not only generated syntactically and physically valid OpenSees models but also executed them without runtime failures, subsequently translating numerical outputs into clear, natural language interpretations. This consistent performance directly substantiates the system’s capacity to perform automated structural analyses with high reliability. Independently, a detailed comparative evaluation of storey-drift, nodal displacement, and vibrational period metrics revealed that GPT+MCP predictions consistently aligned with ETABS benchmarks within a one percent error margin at each storey level. Such congruence indicates that the MCP integration not only guides the generative model toward engineering-grade precision but also elevates its predictive fidelity to match that of both open-source and commercial finite-element platforms. A similar approach was reported by Liang et al. (2025), where an LLM generated OpenSeesPy code for 2D frame analysis, which was then automatically executed within a Python-based framework [6]. Together, these convergent lines of evidence demonstrate that embedding MCP within an LLM-driven workflow effectively constrains its outputs to the stringent accuracy demands of structural engineering practice, thereby validating the proposed method’s practical viability.

Leveraging carefully structured, CIDI-style prompts establishes a rigorous foundation for guiding the LLM’s interpretation [30]. The prompt design produced responses that closely followed the input instructions, demonstrating a high level of consistency with limited variation. Together, these factors converge to support the observation that the system’s textual interpretations reliably report key metrics such as maximum drifts, storey-level forces, and compliance thresholds, although ambiguous requests occasionally yielded overly generic summaries, underscoring the necessity of prompt specificity. In parallel, the adoption of a structured JSON interface between the LLM and computational tools further reinforces reproducibility and traceability: even slight variations in prompt wording did not compromise the stability of parameter extraction or tool invocation. These combined mechanisms ensure that each analysis run generates an identical, timestamped pipeline—from the initial Tcl script through raw OpenSees outputs to parsed performance metrics—while the LLM remains stateless across executions. Therefore, the modular MCP protocol architecture not only guarantees executional fidelity but also delivers comprehensive logging at every stage, a feature essential for educational deployment, rigorous peer review, and regulatory auditing of AI-assisted design workflows.

Our integrated prompt-to-output pipeline—encompassing generative LLM synthesis, numerical simulation in OpenSees, and result parsing—achieves full cycle times of just 6–12 s on an Apple M1 Pro workstation. Such sub-dozen-second latencies align with the demands of interactive teaching environments and design charrettes, where rapid feedback deepens insight and accelerates structural hypothesis refinement. By contrast, conventional workflows that combine manual scripting in OpenSees with modelling in ETABS incur iteration times at least fifteen times longer—often spanning several minutes—due to code editing, batch file orchestration, and post-processing overhead. This extended turnaround interrupts the cognitive flow of engineers and students and constrains the exploration of design alternatives [31]. By unifying AI-driven model generation with high-performance simulation in a single coherent framework, our method preserves modelling accuracy while substantially reducing latency. Moreover, this streamlined methodology minimises manual intervention overhead, ensuring consistency across iterations and reducing human error. As a result, it enables real-time educational engagement, fosters iterative design exploration, and significantly enhances overall workflow efficiency in 3D frame structure modelling. This study thus advocates adopting this integrated approach to accelerate design cycles without compromising structural fidelity.

Although the case study focused on static evaluation, the GPT-MCP workflow is not restricted to static analysis. Its applicability extends to dynamic simulations, as supported by the external solver. The observed limitations reflect the specific configuration of OpenSeesPy used in this study, not the architecture of the workflow itself, which remains adaptable to any analysis type permitted by the external software’s capabilities. Furthermore, soil–structure interaction effects, foundation flexibility, and higher-order P-Δ phenomena are not modelled, potentially underestimating global demand for slender or soft-storey systems.

Furthermore, beyond regular structural systems, potential challenges emerge in irregular or code-exempt design scenarios where LLM-based reasoning could diverge more substantially. Within the proposed framework, the Client Layer governs LLM interaction, while the Server Layer mediates external computational tools, particularly OpenSees. In such irregular cases, it is the structural analysis engine that generates the specialised engineering context required to anchor LLM outputs. By feeding this validated computational evidence back to the LLM, the framework ensures that user interactions remain appropriate, traceable, and consistent with structural behaviour, thereby mitigating the risk of unconstrained or misleading responses.

In addition, limitations emerge from the language model’s reliance on pre-trained knowledge and the specific prompt it receives. As a result, while the system can produce coherent textual outputs, it lacks the built-in ability to verify whether the predicted values align with physical laws unless such checks are explicitly added in later stages of the process. The current response time and memory use are suitable for small-scale applications but may not support larger studies, such as those involving broad parameter variations or regional evaluations. These challenges point to future improvements, including the integration of physical verifiers, simplified dynamic solvers, and distributed computing methods, to support more advanced structural assessments. Finally, the use of commercial solvers may hinder access for academic institutions with limited resources, reducing the potential for widespread adoption.

An additional limitation concerns the reliance on a commercial GPT model accessed via a proprietary black-box API. The model’s internal architecture, training corpus, and parameterisation are undisclosed, and updates may occur without notification, potentially altering output behaviour and limiting reproducibility unless explicit version control is implemented. We recommend documenting the model name, release date, and snapshot identifier to improve traceability. Moreover, transmitting structural data through public APIs introduces security and privacy risks. Future implementations should adopt secure transmission protocols, anonymisation, or on-premises deployments. Importantly, the MCP framework is model-agnostic, enabling seamless integration of generative model improvements while ensuring reliable, context-grounded outputs.

5. Conclusions

This study has shown that integrating LLMS with an MCP enables accurate, auditable, and accessible structural analysis. By structuring the interaction between natural language interpretation and numerical computation through the MCP framework, the GPT+MCP system achieves both modularity and integration. While the language model and the simulation engine remain distinct components, the protocol enables contextual information from the engineering software to inform the language model’s responses. This bidirectional flow fosters coherent interaction, ensuring accurate, code-compliant outputs. Case studies confirm that the system consistently reproduces benchmark results with minimal latency, meeting professional engineering standards.

The significance of this work lies in its ability to transform generative AI from a speculative tool into a dependable, domain-specific interface. Unlike standalone models, the MCP architecture enforces validation, ensures reproducibility, and guides language models toward context-aware, code-compliant outcomes. This elevates decision-making quality, supports transparent model interpretation, and makes advanced analysis accessible to non-programmers and emerging engineers alike.

Looking ahead, the MCP offers a promising foundation for integrating generative AI with a wide range of engineering computational software, provided such systems expose their functionality in a structured manner. In many cases, no additional adaptations are necessary—GPT models can interact effectively through the context provided by the software itself. This potential for seamless integration highlights one of MCP’s key advantages: its capacity to mediate between natural language processing and domain-specific execution environments. Nevertheless, further evaluation is warranted to assess how reliably this approach scales across more complex or less standardised systems. As AI-assisted design continues to evolve, such context-aware frameworks will be essential for advancing responsible, interoperable, and technically sound innovation in engineering practice.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/buildings15173190/s1, Supplementary Material S1: Handling Ambiguity, Incompleteness, and Execution Failures; Supplementary Material S2: Robustness to Prompt Variations.

Author Contributions

Conceptualisation, C.A. and D.R.; formal analysis, C.A.; investigation, D.R.; methodology, C.A.; software, C.A. and D.I.; supervision, C.A. and D.R.; validation, D.I.; writing—original draft, D.I. and D.R.; writing—review and editing, C.A. and D.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author, D.R., upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AI	artificial intelligence
API	Application Programming Interface
ASCE 7-22	American Society of Civil Engineers Standard 7, 2022 Edition
CIDI	Context–Intent–Details–Instructions (prompting structure)
ETABS	Extended Three-Dimensional Analysis of Building Systems
GPT	Generative Pre-trained Transformer
GPT+MCP	Generative Pre-trained Transformer integrated with Model Context Protocol
JSON	JavaScript Object Notation
LLM	Large Language Model
MCP	Model Context Protocol
NEC-15	Norma Ecuatoriana de la Construcción, 2015 Edition
OpenSees	Open System for Earthquake Engineering Simulation
OpenSeesPy	Python interface for OpenSees
REST	Representational State Transfer
Tcl	Tool Command Language

References

Garza Morales, G.A.; Nizamis, K.; Bonnema, G.M. Engineering complexity beyond the surface: Discerning the viewpoints, the drivers, and the challenges. Res. Eng. Des. 2023, 34, 367–400. [Google Scholar] [CrossRef]
Morales, G.A.G.; Nizamis, K.; Bonnema, G.M. Why is there complexity in engineering? A scoping review on complexity origins. In Proceedings of the 2023 IEEE International Systems Conference (SysCon), Vancouver, BC, Canada, 17–20 April 2023; IEEE: New York, NY, USA, 2023; pp. 1–8. [Google Scholar] [CrossRef]
Suh, N.P. Complexity in Engineering. CIRP Ann. 2005, 54, 46–63. [Google Scholar] [CrossRef]
Oladele Junior, A.; Ibrahim, A. Artificial Intelligence for Systems Engineering Complexity: A Review on the Use of AI and Machine Learning Algorithms. Comput. Sci. IT Res. J. 2024, 5, 787–808. [Google Scholar] [CrossRef]
Salehi, H.; Burgueño, R. Emerging artificial intelligence methods in structural engineering. Eng. Struct. 2018, 171, 170–189. [Google Scholar] [CrossRef]
Liang, H.; Kalaleh, M.T.; Mei, Q. Integrating Large Language Models for Automated Structural Analysis. arXiv 2025, arXiv:2504.09754. [Google Scholar] [CrossRef]
Cha, Y.-J.; Ali, R.; Lewis, J.; Büyükӧztürk, O. Deep learning-based structural health monitoring. Autom. Constr. 2024, 161, 105328. [Google Scholar] [CrossRef]
Zhang, L.; Le, B.; Akhtar, N.; Lam, S.-K.; Ngo, T. Large Language Models for Computer-Aided Design: A Survey. ACM Comput. Surv. 2025, 37, 31. [Google Scholar] [CrossRef]
Yang, X.; Chen, B.; Tam, Y.-C. Arithmetic Reasoning with LLM: Prolog Generation & Permutation. In Proceedings of the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Mexico City, Mexico, 16–21 June 2024. [Google Scholar]
Ismayilzada, M.; Paul, D.; Montariol, S.; Geva, M.; Bosselut, A. CRoW: Benchmarking Commonsense Reasoning in Real-World Tasks. The 2023 Conference on Empirical Methods in Natural Language Processing. arXiv 2023. [Google Scholar] [CrossRef]
Hong, J.; Suh, E.; Kim, S.-J. Context-aware systems: A literature review and classification. Expert Syst. Appl. 2009, 36, 8509–8522. [Google Scholar] [CrossRef]
Anthropic. Introducing the Model Context Protocol. 2024. Available online: https://www.anthropic.com/news/model-context-protocol (accessed on 17 June 2025).
Ray, P.P.; Pratim, P.R. A Survey on Model Context Protocol: Architecture, State-of-the-art, Challenges and Future Directions. TechRxiv 2025. [Google Scholar] [CrossRef]
Krishnan, N. Advancing Multi-Agent Systems Through Model Context Protocol: Architecture, Implementation, and Applications. arXiv 2025, arXiv:2504.21030. [Google Scholar] [CrossRef]
Hou, X.; Zhao, Y.; Wang, S.; Wang, H. Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions. arXiv 2025. [Google Scholar] [CrossRef]
Mavroudis, V. LangChain v0.3. Preprints 2024. [Google Scholar] [CrossRef]
Rane, N. Role of ChatGPT and Similar Generative Artificial Intelligence (AI) in Construction Industry. SSRN Electron. J. 2023. [Google Scholar] [CrossRef]
Ghimire, P.; Kim, K.; Acharya, M. Opportunities and Challenges of Generative AI in Construction Industry: Focusing on Adoption of Text-Based Models. Buildings 2024, 14, 220. [Google Scholar] [CrossRef]
Lu, J.; Tian, X.; Zhang, C.; Zhao, Y.; Zhang, J.; Zhang, W.; Feng, C.; He, J.; Wang, J.; He, F. Evaluation of large language models (LLMs) on the mastery of knowledge and skills in the heating, ventilation and air conditioning (HVAC) industry. Energy Built Environ. 2024, in press. [CrossRef]
Jiang, G.; Chen, J. Efficient fine-tuning of large language models for automated building energy modeling in complex cases. Autom. Constr. 2025, 175, 106223. [Google Scholar] [CrossRef]
Jurišević, N.; Kowalik, R.; Gordić, D.; Novaković, A.; Vukašinović, V.; Rakić, N.; Nikolić, J.; Vukićević, A. Large Language Models as Tools for Public Building Energy Management: An Assessment of Possibilities and Barriers. Int. J. Qual. Res. 2025, 19. [Google Scholar] [CrossRef]
Albukhari, I.N. The Role of Artificial Intelligence (AI) in Architectural Design: A Systematic Review of Emerging Technologies and Applications; Springer: Berlin/Heidelberg, Germany, 2025; pp. 1–20. [Google Scholar] [CrossRef]
Onatayo, D.; Onososen, A.; Oyediran, A.O.; Oyediran, H.; Arowoiya, V.; Onatayo, E. Generative AI Applications in Architecture, Engineering, and Construction: Trends, Implications for Practice, Education & Imperatives for Upskilling—A Review. Architecture 2024, 4, 877–902. [Google Scholar] [CrossRef]
Liu, J.; Geng, Z.; Cao, R.; Cheng, L.; Bocchini, P.; Cheng, M. A Large Language Model-Empowered Agent for Reliable and Robust Structural Analysis. arXiv 2025, arXiv:2507.02938. [Google Scholar]
CAMICON; MIDUVI. Norma Ecuatoriana de la Construcción—NEC: NEC-SE-MP—Mamposteria Estructural. Quito, Dec. 2014. Available online: https://www.habitatyvivienda.gob.ec/wp-content/uploads/2023/03/10.-NEC-SE-MP-Mamposteria-Estructural.pdf (accessed on 28 August 2025).
American Society of Civil Engineers. Minimum Design Loads and Associated Criteria for Buildings and Other Structures; ASCE/SEI 7-22; ASCE: Reston, VA, USA, 2022. [Google Scholar]
Hardman, P. Structured Prompting for Educators. 2023. Available online: https://drphilippahardman.substack.com/p/structured-prompting-for-educators (accessed on 13 July 2025).
Baccarini, D. The concept of project complexity—A review. Int. J. Proj. Manag. 1996, 14, 201–204. [Google Scholar] [CrossRef]
Joffe, I.; Felobes, G.; Elgouhari, Y.; Kalaleh, M.T.; Mei, Q.; Chui, Y.H. The Framework and Implementation of Using Large Language Models to Answer Questions about Building Codes and Standards. J. Comput. Civ. Eng. 2025, 39, 05025004. [Google Scholar] [CrossRef]
Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Ichter, B.; Xia, F.; Chi, E.H.; Le, Q.V.; Zhou, D. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv 2023, arXiv:2201.11903. [Google Scholar]
Nielsen, J. Response Times: The 3 Important Limits. 1993. Available online: https://www.nngroup.com/articles/response-times-3-important-limits/ (accessed on 17 July 2025).

Figure 1. MCP Architecture Workflow.

Figure 2. Maximum inter-storey displacement in the X (a) and Y (b) directions for each case study (A–D) and computational method. The GPT+MCP pipeline corresponds to ChatGPT-assisted modelling using the MCP, while standalone GPT results are generated without model feedback. ETABS and OpenSees serve as reference solutions.

Figure 3. Comparison of base shear in the case studies with the different calculation approaches. The GPT+MCP pipeline corresponds to ChatGPT-assisted modelling using the MCP, while standalone GPT results are generated without model feedback. ETABS and OpenSees serve as reference solutions.

Figure 4. Comparison of building period in the case studies with respect to the different calculation approaches. The GPT+MCP pipeline corresponds to ChatGPT-assisted modelling using the MCP, while standalone GPT results are generated without model feedback. ETABS and OpenSees serve as reference solutions.

Table 1. Tool declaration and integration role.

MCP Layer	Role of Tools Declaration
Client Layer	Hosts the LLM-based application and initiates tool execution requests.
Server Layer	Provides the tools to the Client Layer and executes them upon request.
External Applications	Supply expert context and domain-specific information to support tool execution.

Table 2. Geometric, mechanical, and loading parameters defining the reference case study.

Category	Parameter	Magnitude
Geometry	Number of Storeys	5
	Storey Heights	3.0, 2.5, 2.5, 2.5, 2.5 m (from bottom to top)
	Spans in X-Direction	4 m, 5 m, 6 m
	Spans in Y-Direction	3.5 m, 4.5 m
Sections	Beam Cross-Section	0.30 m × 0.40 m
	Column Cross-Section	0.40 m × 0.40 m
	Cracking Factor (Beams)	0.7
	Cracking Factor (Columns)	0.8
Material	Concrete Young’s Modulus	21,458,890.83 kN/m²
Loads	Dead Load	4.9 kN/m²
	Live Load	1.9 kN/m²
	Dead Load Weight Coefficient	1
	Live Load Weight Coefficient	0.15
Seismic Parameters	Base Shear Coefficient	0.1488
	Vertical Distribution Coefficient	1
	Accidental Torsion Coefficient	0.05
	Drift Amplification Factor (for Inelastic Drift)	6
	Maximum Allowable Drift	0.02

Table 3. Basic structural organisation of the case studies.

Case ID	No. of Floors	Storey Heights (m)	Spans X (m)	Spans Y (m)	Geometry Type
A	2	3.0–3.0	4–4	4–4	Symmetric
B	3	3.0–3.0–3.0	4–5–4	3.5–4.5	Asymmetric
C	5	3.0–2.5–2.5–2.5–2.5	4–4–4	4–4–4	Symmetric
D	5	3.0–2.5–2.5–2.5–2.5	4–5–6	3.5–4.5	Asymmetric

Note: Case D represents the scenario detailed in the Case Study section.

Table 4. Structural analysis case study: (a) three-dimensional model for OpenSees and ETABS implementation, and (b) natural language specification via MIDI-formatted prompt.

(a)

(b)

prompt = (
“Context:”

“You are an expert in structural analysis using natural language and numerical simulation with OpenSeesPy.“
“The implemented system is capable of interpreting technical prompts and generating automated structural simulations“
“based on international seismic-resistant design standards.”

“Instructions:”

“Analyse a three-dimensional reinforced concrete frame, verify code compliance for inter-storey drift,“
“and apply structural optimisation if necessary.”

“Details:”

“The modulus of elasticity for concrete is 21,458,890.83 kN/m².“
“The structural system has spans of 4, 5, and 6 m in the X-direction, and spans of 3.5 and 4.5 m in the Y-direction.“
“The structure has 5 storeys, with storey heights of: 3.0, 2.5, 2.5, 2.5, and 2.5 m, respectively.“
“Beams have a cross-sectional dimension of 0.3 × 0.4 m, and columns are 0.40 × 0.45 m.“
“Cracking factors are 0.7 for beams and 0.8 for columns.“
“Dead load is 4.9 Kn/m² and live load is 1.9 kN/m².“
“The weight coefficients are: 1.0 for dead load, 0.15 for live load, 0.1488 for base shear coefficient,“
“1.0 for vertical distribution of base shear, 0.05 for accidental torsion,“
“and a drift amplification factor of 6.0 is applied to estimate inelastic drift. The maximum allowable drift is 0.02.”

“Tasks:”

“Perform linear static seismic analysis using the equivalent lateral force method with OpenSeesPy.“
“Compute maximum displacements and storey drifts per level and direction (X and Y).“
“Perform strict numerical validation:”
-
“Iterate through all obtained inelastic drift values.“
-
“For each value, compare it against the allowable maximum (0.02).“
-
“If * at least one value * exceeds 0.02, * you must not state that all values are compliant *.“
-
“Report precisely: storey number, direction (X or Y), and the drift value that exceeds the limit.“
-
“Only if * all drifts * are ≤ 0.02, the code compliance can be confirmed.“
-
“Present results in tabular format and be rigorous with numerical precision.”
“Also determine floor-by-floor shear forces and vibration modes.”
“Structural optimisation:”
-
“If any drift exceeds the limit, propose a structural optimisation based on displacements,“
-
“storey drifts, shear forces, and vibration modes.“
-
“Generate 10 alternatives by modifying material properties and section dimensions.“
-
“Evaluate drift for each alternative and present the comparison in tabular format.“
-
“Highlight the configurations that meet code requirements and provide better structural efficiency.”

“Intent:”

“Generate an automated technical report, including detailed structural analysis, code validation,“
“and optimisation in case of non-compliance. The output must be expressed in technical language and clear tables,“
“suitable for professional and academic environments.”

)

The asterisk (*) is used as a markup device to highlight specific terms and facilitate the interpretation of user requirements. It should not be understood as a mathematical operator or syntactic element.

Table 5. Inter-storey drift in X.

Inter-Storey Drift X
Case	Storey	GPT	GPT+MCP	OPENSEES	ETABS
A	2	0.009	0.013	0.013	0.013
A	1	0.007	0.012	0.012	0.012
B	3	0.022	0.015	0.015	0.016
	2	0.014	0.022	0.022	0.022
	1	0.010	0.015	0.015	0.015
C	5	0.022	0.007	0.007	0.008
	4	0.016	0.012	0.012	0.013
	3	0.014	0.016	0.016	0.017
	2	0.012	0.019	0.019	0.019
	1	0.007	0.014	0.014	0.015
D	5	0.016	0.009	0.009	0.009
	4	0.012	0.015	0.015	0.015
	3	0.011	0.020	0.020	0.020
	2	0.010	0.022	0.022	0.022
	1	0.006	0.016	0.016	0.016

Table 6. Inter-storey drift in Y.

Inter-Storey Drift in Y
Case	Storey	GPT	GPT+MCP	OPENSEES	ETABS
A	2	0.008	0.013	0.013	0.013
A	1	0.006	0.012	0.012	0.012
B	3	0.019	0.017	0.017	0.017
	2	0.013	0.024	0.024	0.024
	1	0.009	0.016	0.016	0.017
C	5	0.022	0.007	0.007	0.008
	4	0.016	0.012	0.012	0.013
	3	0.014	0.016	0.016	0.017
	2	0.012	0.019	0.019	0.019
	1	0.007	0.014	0.014	0.015
D	5	0.015	0.010	0.010	0.010
	4	0.011	0.016	0.016	0.016
	3	0.010	0.020	0.020	0.021
	2	0.008	0.022	0.022	0.023
	1	0.005	0.015	0.015	0.016

Table 7. Relative error in the inter-storey drift in X.

	MAX ERROR vs. ETABS
CASE	GPT	GPT+MCP	OPENSEES	ETABS
A	35.527	1.408	1.408	NA
B	2.441	1.087	1.087	NA
C	12.120	2.756	2.756	NA
D	26.119	1.900	1.900	NA
SD of the max error	14.667	0.727	0.727	NA