1. Introduction
Design is a cognitively demanding activity that involves iterative cycles of problem framing, idea generation, evaluation and refinement, moving back and forth between divergent exploration and convergent decision-making [1,2,3]. In recent years, generative artificial intelligence (GenAI) has become a new kind of design aid: image- and text-based systems can rapidly generate alternatives, support concept development and visualisation, and expand the breadth and depth of designers’ solution spaces [4,5,6]. GenAI platforms built on large language models (LLMs), such as ChatGPT and Gemini, are now integrated into everyday workflows far beyond computer science, including architectural and interior design practice [7,8,9]. Designers increasingly turn to these tools not only for images but also for explicit spatial strategies and design rationales. Yet, while GenAI clearly accelerates divergent exploration, it is unclear how reliable and theoretically sound its proposals are when designers work with specific, evidence-based design principles rather than purely stylistic aims [10,11].
Biophilia describes the innate human tendency to seek connections with living systems and natural processes [12,13,14]. Biophilic design operationalises this concept by embedding natural elements, patterns and experiences into the built environment to restore human-nature relationships in increasingly urbanised and technologically mediated settings [15,16,17,18,19]. Empirical studies associate exposure to natural features with reduced stress, anxiety and depression, improved mood and social functioning, and a range of physiological benefits, including lower blood pressure, reduced pain and faster recovery in healing environments [20,21,22,23,24,25,26]. Cognitive outcomes such as improved attention, memory and learning further underline the relevance of biophilic strategies for performance and restoration in work and clinical settings [21,22,27,28,29,30,31]. These effects are grounded in theoretical frameworks, including Stress Recovery Theory (SRT) and Attention Restoration Theory (ART), which explain how unthreatening natural environments support psychophysiological recovery and the restoration of fatigued directed attention through “soft fascination” and “being away” [32,33,34].
To guide practice, several structured biophilic frameworks have been proposed. Kellert and colleagues organised design parameters into multi-layered taxonomies of human-nature relationships [13,17]. Terrapin Bright Green’s 14 Patterns of Biophilic Design distilled this work into a widely used set of patterns clustered under Nature in the Space, Natural Analogues and Nature of the Space, including visual and non-visual connections with nature, thermal and airflow variability, presence of water, dynamic and diffuse light, biomorphic forms, material connection, complexity and order, prospect, refuge, mystery and risk/peril [35]. This pattern-based approach can also be situated within a longer architectural lineage of “pattern thinking” rooted in Alexander et al.’s A Pattern Language (1977), which links recurring spatial configurations to everyday human activities, comfort, and well-being [36]. From this perspective, several biophilic patterns, particularly relational and experiential ones such as prospect-refuge, gradations of privacy, perceived human-scaled complexity, and continuity between interior and exterior, can be read as contemporary, evidence-oriented reinterpretations of pattern-language principles.
Despite the parallel rise in GenAI and biophilic design, their intersection has barely been explored. A scoping review by Tekin et al. (2025) identified only five studies that involve artificial intelligence within the biophilic design domain, most of which use AI as an analytical back-end rather than as an interactive design partner [37]. No prior work systematically evaluates how LLM-based GenAI, when directly prompted by designers, interprets biophilic frameworks, applies patterns such as the 14 Patterns of Biophilic Design, or supports psychologically targeted goals, such as cognitive restoration grounded in Attention Restoration Theory. Given that practitioners already consult tools like ChatGPT and Gemini for biophilic design ideas, this lack of validation is problematic.
This study addresses that gap by examining how two widely used LLM-based GenAI systems, ChatGPT 5.1 (with DALL·E image generation) and Gemini 3 (with Imagen), perform as early-stage assistants for biophilic interior design. Using three interior scenarios of increasing contextual complexity, the study asks each model to (1) generate a step-by-step biophilic design strategy and (2) produce corresponding images. These outputs are then evaluated qualitatively against the 14 Patterns of Biophilic Design and related restorative theories, and quantitatively through expert ratings provided by architects trained in biophilic design and environmental psychology.
The central research question is: To what extent can ChatGPT and Gemini be considered reliable and effective co-design partners when architects and interior designers seek to apply biophilic design principles, particularly the 14 patterns and attention-restorative goals, in the early stages of interior design? Sub-questions examine: (1) how each model conceptualises biophilic strategies in text across different design briefs; (2) how consistently these strategies are translated into images; and (3) how experts assess the resulting proposals in terms of visual quality, biophilic depth, spatial experience, functionality, contextual sensitivity, and support for cognitive restoration.
2. Methodology
This study employed a comparative, mixed-methods multiple case study design to systematically evaluate the capabilities of two widely used multimodal generative AI systems, OpenAI’s ChatGPT-5.1 (Plus) with DALL·E 3 image generation and Google’s Gemini 3 (Pro) Advanced with Imagen image generation. In both platforms, the textual design strategy is produced by a large language model (LLM), while the photorealistic visual outputs are generated by an integrated image model; accordingly, this study evaluates the end-to-end performance of each integrated system (LLM and image generator) as typically used in early-stage design workflows, rather than isolating the LLM component alone. As summarised in Figure 1, the experiment used three interior case inputs (tabula rasa, contextual renovation, and strategic restoration) processed in parallel through a standardised three-phase protocol: (Phase I) analysis and planning from the source image and analytical prompt to generate a strategic text plan; (Phase II) initial visualisation to generate a first photorealistic image from the strategy; and (Phase III) iterative refinement using a standardised critique structure to produce a refined final image selected for analysis. Data collection and evaluation followed a dual mechanism combining (1) qualitative framework-based coding of the textual plans (14 Patterns of Biophilic Design) together with plan-to-image fidelity assessment, and (2) quantitative ratings of the final images via an independent expert-panel Likert-scale survey. This mixed-methods structure grounds interpretation in both theory-informed qualitative analysis and systematically collected expert judgements, while acknowledging that differences in renderings may partly reflect the characteristics of the image-generation models (DALL·E 3 vs. Imagen) in addition to the preceding textual plans.
2.1. Case Selection and Justification
Three distinct cases were purposefully selected to test a range of AI capabilities, moving from minimal constraints (creativity) to complex constraints (problem-solving) and finally to strategic, goal-oriented design (psychological application).
Case 1—“Tabula Rasa” (Creativity Test): This case used an image of an empty, well-lit room with neutral walls and a clean wooden floor. The purpose was to establish a baseline by examining each model’s ability to generate biophilic design ideas when architectural constraints are minimal. This case tested the tools’ inherent creativity, aesthetic reasoning, and capacity to conceptualise biophilic principles from scratch (Figure 2A).
Case 2—“Contextual Renovation” (Problem-Solving and Contextual Awareness Test): This case presented a rustic and visibly damaged room that included a distinctive built-in wooden platform. The aim was to assess the models’ analytical and problem-solving abilities, including their capacity to identify and address observable damage and to recognise, preserve, and meaningfully incorporate unique architectural features within a biophilic design proposal (Figure 2B).
Case 3—“Strategic Restoration” (Psychological Objective Test): The third case involved a sterile hospital staff break room. This scenario examined the models’ strategic competence in applying biophilic design principles to achieve a defined psychological outcome. The design brief instructed the AI tools to transform the space into a “cognitive restoration sanctuary” for high-stress healthcare staff. This required the models to consider stress reduction and mental recovery as explicit design goals, while also navigating fixed constraints, such as a wall-mounted medical panel, and utilising available assets, including a large exterior window (Figure 2C).
2.2. Data Collection Protocol
A standardised, multi-phase protocol was applied to both AI models for all three cases to ensure strict comparability and methodological consistency across outputs.
To avoid cross-contamination from prior conversational memory or adaptive stylistic drift, each test was conducted in a new, memory-less (clean) chat session. This ensured that both models responded solely to the provided stimuli and not to prior interactions.
Phase I—Analysis and Planning: Each model was first provided with the case image together with a standardised analytical prompt aligned with the specific design objective of that case.
Case 1 Prompt: “Design this room from scratch using biophilic design principles. Provide a step-by-step, justified plan.”
Case 2 Prompt: “Renovate this space using biophilic design principles. Provide a detailed, justified plan that explains how you will address the damaged areas and how you will handle unique architectural elements like the wooden platform.”
Case 3 Prompt: “This is a hospital staff break room… Your task is to redesign this room as a ‘cognitive restoration sanctuary’ using the proven restorative and stress-reducing effects of biophilic design. The goal is to allow staff to quickly recharge their mental energy and reduce cognitive fatigue during short breaks. Explain your strategic plan with justifications, including how you will manage existing elements…”
The Phase I prompts intentionally used the general phrasing “biophilic design principles” to reflect common real-world prompting behaviour and to avoid over-constraining or “coaching” the models toward the study’s evaluation framework. The 14 Patterns of Biophilic Design were applied as an external coding and assessment framework during analysis, rather than being explicitly embedded in the prompt, to reduce the risk of criterion leakage between the prompt and the evaluation rubric.
Phase II—Initial Visualisation: After generating the textual design plan, each model was issued an identical visualisation prompt: “Based on the plan you provided, generate a photorealistic image of the finished room.” This ensured that both systems translated their own strategies into images without the researcher’s influence.
Phase III—Iterative Refinement: To test iterative learning, responsiveness, and co-creative capacity, a standardised critique was given for every initial rendering produced in Phase II: “I’ve reviewed the image. There are some problems:…” Follow-up prompts were then tailored to specific deficiencies observed in each model’s output, while the critique structure remained constant across cases to preserve comparability.
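As a reading aid, the three-phase protocol can be laid out as a small data structure. The sketch below is illustrative only and is not the authors' tooling: the names `Step`, `build_protocol` and the system labels are hypothetical, the prompt strings are quoted from the text above with their elisions left exactly as printed, and Phase III is reduced to its standardised opener even though the study used several tailored follow-up prompts per image.

```python
from dataclasses import dataclass

# Phase I prompts as quoted in Section 2.2; the "…" elisions are kept as printed.
PHASE_I_PROMPTS = {
    "Case 1": (
        "Design this room from scratch using biophilic design principles. "
        "Provide a step-by-step, justified plan."
    ),
    "Case 2": (
        "Renovate this space using biophilic design principles. Provide a detailed, "
        "justified plan that explains how you will address the damaged areas and how "
        "you will handle unique architectural elements like the wooden platform."
    ),
    "Case 3": (
        "This is a hospital staff break room… Your task is to redesign this room as a "
        "‘cognitive restoration sanctuary’ using the proven restorative and "
        "stress-reducing effects of biophilic design. The goal is to allow staff to "
        "quickly recharge their mental energy and reduce cognitive fatigue during short "
        "breaks. Explain your strategic plan with justifications, including how you "
        "will manage existing elements…"
    ),
}
PHASE_II_PROMPT = (
    "Based on the plan you provided, generate a photorealistic image of the finished room."
)
PHASE_III_OPENER = "I’ve reviewed the image. There are some problems:…"

SYSTEMS = ("ChatGPT", "Gemini")  # shorthand labels, not exact product names


@dataclass(frozen=True)
class Step:
    session: str  # one fresh chat session per (system, case) pair
    phase: str
    prompt: str


def build_protocol():
    """Return the ordered prompt sequence for every system and case."""
    steps = []
    for system in SYSTEMS:
        for case, analytical_prompt in PHASE_I_PROMPTS.items():
            session = f"{system} / {case}"
            steps.append(Step(session, "I", analytical_prompt))
            steps.append(Step(session, "II", PHASE_II_PROMPT))
            steps.append(Step(session, "III", PHASE_III_OPENER))
    return steps


protocol = build_protocol()
for step in protocol[:3]:
    print(step.session, step.phase, step.prompt[:40])
```

Keeping the session label on every step makes the clean-session requirement explicit: no prompt in one session can depend on the output of another.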
In this paper, “plan” refers to the AI-generated textual spatial strategy (implied layout logic, activity placement, and circulation intent) derived from single-view interior images, rather than a measured architectural floor plan. Likewise, “zones” are used to describe implied functional groupings (e.g., work vs. refuge/relaxation areas) inferred from the proposed layout logic and the visible room affordances. Because the case inputs are photographic interior perspectives without verified dimensions, the analysis does not treat room scale metrically; therefore, references to “human-scaled complexity” are interpreted as perceptual qualities (as operationalised through pattern-based coding and expert judgement) rather than dimensioned, construction-ready claims.
2.3. AI Image Generation and Selection
For Case 1, both AI tools required several iterations before a final image could be selected. ChatGPT’s initial rendering (Figure 3(C1A)) only partially reflected its biophilic plan, and the revised version (Figure 3(C1B)) offered minimal improvement. Gemini’s first image (Figure 3(G1A)) was sparse, and a second prompt requesting a full photorealistic rendering (Figure 3(G1B)) produced a more complete but still cluttered and incomplete design (Figure 3(G1C)). A final instruction to strictly follow its plan (Figure 3(G1D)) produced no change. Thus, for both models, the images used for analysis were the closest available approximations, despite not fully implementing their own strategies.
For Case 2, multiple revision prompts were again required, but neither model successfully corrected the inconsistencies between plan and image. ChatGPT’s renderings remained incomplete and altered key architectural features even after clarifications regarding the window and cupboard configuration (Figure 3(C2A,C2D)). Gemini’s outputs showed cluttered layouts and unaddressed structural issues such as the unrepaired floor and platform (Figure 3(G2A)). Additional prompts to fix the floor, adjust the platform size, remove the fluorescent ceiling fixture, and revise wall colour (Figure 3(G2B,G2C)) resulted only in minor aesthetic adjustments (Figure 3(G2D)). The final images, therefore, reflect the best available outputs rather than faithful applications of each plan.
For Case 3, both systems struggled to produce photorealistic or plan-consistent imagery. ChatGPT’s first render lacked realism and diverged from its strategy (Figure 3(C3A)), and subsequent revisions led to random, cluttered compositions that preserved unnecessary clinical elements (Figure 3(C3B,C3C)). Because further prompting worsened results, the initial image was retained. Gemini similarly resisted revision: attempts to increase comfort (Figure 3(G3A)), add a lying-down opportunity (Figure 3(G3B)), or correct furniture arrangements (Figure 3(G3C,G3D)) resulted only in small additions like a rug or recliner, with no meaningful reconfiguration. The final image selected was the least problematic version (Figure 3(G3B)). Across all cases, both models demonstrated limited ability to revise or correct their images, and final visuals often fell short of the biophilic strategies they themselves proposed.
2.4. Data Analysis and Validation Protocol
Data analysis consisted of three complementary components: (1) qualitative framework analysis of the AI-generated textual plans, (2) an internal plan-to-image fidelity assessment and (3) expert quantitative evaluation of the final AI-generated images. Together, these three components operationalise the study’s notion of system “reliability” in a design context, i.e., the consistency of (a) theoretical alignment, (b) text-to-image translation fidelity (including operational fidelity where relevant), and (c) expert-rated performance.
2.4.1. Qualitative Framework Analysis of AI Strategic Plans
The AI-generated textual plans (Phase I) were analysed separately using the 14 Patterns of Biophilic Design as the primary coding framework. Each plan was examined for explicit or implicit inclusion of these patterns to assess the conceptual depth and strategic sophistication of the model’s proposed approach. Plans were evaluated on a continuum from superficial (limited to basic visual cues) to deep (integrating multiple biophilic patterns with spatial logic and psychological intent).
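The coding step can be pictured as a coverage count over the 14 patterns, grouped by their three categories. The sketch below is a hypothetical illustration rather than the authors' coding instrument: the function `coverage` and the `example_coding` set are invented for demonstration and do not reproduce the study's coded data, while the pattern names follow the published Terrapin Bright Green list.

```python
# The 14 Patterns of Biophilic Design, grouped by category.
PATTERNS = {
    "Nature in the Space": [
        "Visual Connection with Nature",
        "Non-Visual Connection with Nature",
        "Non-Rhythmic Sensory Stimuli",
        "Thermal & Airflow Variability",
        "Presence of Water",
        "Dynamic & Diffuse Light",
        "Connection with Natural Systems",
    ],
    "Natural Analogues": [
        "Biomorphic Forms & Patterns",
        "Material Connection with Nature",
        "Complexity & Order",
    ],
    "Nature of the Space": ["Prospect", "Refuge", "Mystery", "Risk/Peril"],
}
ALL_PATTERNS = [name for names in PATTERNS.values() for name in names]


def coverage(coded):
    """Summarise which of the 14 patterns a coder marked as addressed in a plan."""
    coded = set(coded)
    unknown = coded - set(ALL_PATTERNS)
    if unknown:
        raise ValueError(f"Not one of the 14 patterns: {sorted(unknown)}")
    by_category = {
        category: sum(name in coded for name in names)
        for category, names in PATTERNS.items()
    }
    return {"addressed": len(coded), "total": len(ALL_PATTERNS), "by_category": by_category}


# Hypothetical coding of one plan, for illustration only (not data from the study).
example_coding = {
    "Prospect",
    "Refuge",
    "Material Connection with Nature",
    "Biomorphic Forms & Patterns",
}
result = coverage(example_coding)
print(result)
```

A count like this captures only breadth. The superficial-to-deep continuum described above also depends on how patterns are tied to spatial logic and psychological intent, which remains a qualitative judgement.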
2.4.2. Plan-to-Image Fidelity Assessment
To evaluate internal consistency, each model’s textual plan was compared directly with its corresponding generated image. This “gap analysis” examined whether the visual output accurately implemented the strategies proposed in the text. This fidelity assessment provided insight into the reliability of each model’s conceptual-to-visual translation and its potential usefulness as a design collaborator.
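One simple way to picture the gap analysis is as a comparison between the elements a plan promises and the elements that can be identified in the image. The sketch below is an assumption about how such a check could be tabulated, not the procedure the authors report: the helper `gap_analysis` is hypothetical, and the element labels paraphrase the Case 1 observations for ChatGPT given in Section 3.1.1.

```python
def gap_analysis(planned, visible):
    """Split planned elements into those visible in the image and those missing."""
    visible = set(visible)
    implemented = [item for item in planned if item in visible]
    missing = [item for item in planned if item not in visible]
    return implemented, missing


# Element labels paraphrased from the Case 1 discussion of ChatGPT's plan and image.
planned_elements = [
    "functional zones",
    "view-oriented desk",
    "refuge corner",
    "natural material palette",
    "tabletop water feature",
    "scent diffuser",
    "operable window cues",
    "dimmable lighting controls",
    "three-layer lighting hierarchy",
]
visible_elements = {
    "functional zones",
    "view-oriented desk",
    "refuge corner",
    "natural material palette",
}

implemented, missing = gap_analysis(planned_elements, visible_elements)
print("Implemented:", implemented)
print("Missing:", missing)
```

The lists are a bookkeeping aid only. The fidelity levels reported in the study (for example, "moderate-to-high") are qualitative judgements and are not derived from a ratio of these counts.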
2.4.3. Expert Evaluation of Visual Outputs
The expert evaluation was conducted through an online survey. A total of 25 invitation emails were distributed to practising architects with formal training in biophilic design, and 15 experts completed the full assessment (response rate: 60%). All 15 panellists were architects who had completed Master’s- or PhD-level coursework in biophilic design and related disciplines. Professional experience ranged from 2 to 12 years. Four panellists had primarily academic roles, while the remaining 11 were practice-based architects. All panellists reported prior familiarity with generative AI tools at a moderate-to-professional level. Using a Condensed Quantitative Evaluation Matrix (5-point Likert scale), panellists evaluated each final AI-generated image across three core domains: Technical & Visual Quality, Biophilic Principle Depth, and Contextual & Functional Success. An overall creativity score was also collected. A double-blind protocol ensured that panellists were not informed of which AI system produced each image.
Quantitative data from the expert matrices were aggregated and analysed using descriptive statistics (means and standard deviations) to compare performance between the two AI systems. Where appropriate, paired statistical tests (parametric or non-parametric based on distribution) were applied to determine whether observed differences between models were significant. Written comments were thematically organised to contextualise the numerical findings.
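The paired comparison can be illustrated with a short sketch. The text does not name the specific test used, so the example below substitutes an exact two-sided sign test as one simple non-parametric option (a Wilcoxon signed-rank test would be another common choice). The ratings are hypothetical placeholders for 15 panellists, not the study's data, and the function names are illustrative.

```python
from math import comb
from statistics import mean, stdev


def describe(ratings):
    """Mean and sample standard deviation of one criterion's ratings."""
    return round(mean(ratings), 2), round(stdev(ratings), 2)


def sign_test_p(ratings_a, ratings_b):
    """Two-sided exact sign test on paired ratings; tied pairs are dropped."""
    diffs = [a - b for a, b in zip(ratings_a, ratings_b) if a != b]
    n = len(diffs)
    if n == 0:
        return 1.0
    positives = sum(d > 0 for d in diffs)
    tail = min(positives, n - positives)
    # Probability of a split at least this uneven under a fair coin, doubled.
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)


# Hypothetical 5-point ratings from 15 panellists for one criterion (not study data).
system_a = [4, 4, 5, 3, 4, 4, 5, 4, 3, 4, 4, 5, 4, 3, 4]
system_b = [3, 3, 4, 3, 2, 3, 4, 3, 3, 2, 3, 4, 3, 2, 3]

print("System A mean, SD:", describe(system_a))
print("System B mean, SD:", describe(system_b))
print("Sign test p-value:", sign_test_p(system_a, system_b))
```

Because each panellist rates both images, the test works on within-rater differences, so it is not distorted by some raters being systematically harsher than others.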
3. Comparative Analysis of AI Outputs
The case analyses that follow examine how each AI model performed across three distinct biophilic design scenarios, considering both their conceptual plans and the visual outputs derived from them. For each case, the evaluation integrates two complementary components: a qualitative assessment based on biophilic design principles and plan-image alignment, and a quantitative appraisal provided by the expert panel. Together, these analyses provide a focused understanding of how reliably and effectively the AI tools operate as early-stage biophilic design assistants under different spatial conditions and design challenges.
3.1. Case 1 “Tabula Rasa”: Empty Room Baseline
Case 1 explored how each AI handled a blank spatial canvas, testing their baseline capacity to conceptualise and visualise biophilic design in an empty, well-lit room (Figure 4).
3.1.1. Qualitative Analysis of Strategic Plans and Images
The textual plans revealed a clear difference in design approach between the two systems. ChatGPT responded as if acting in the role of a spatial designer. Its plan was organised around functional zoning, user experience and circulation, explicitly defining a refuge corner, a primary workspace oriented towards the window, and a clear separation between work and relaxation areas. The language was directive and volumetric (e.g., positioning of the desk, creation of a reading nook), indicating an understanding of the room as a three-dimensional environment with distinct activity zones.
Gemini focused more on mood and styling than on spatial organisation. Its plan emphasised natural textures, organic shapes, sheer curtains and warm materials, but gave little guidance on how these elements should be arranged in relation to one another. The room was largely treated as a container to be filled with biophilic objects, rather than as an integrated spatial system shaped by prospect, refuge, and flow.
When coded against the 14 Patterns of Biophilic Design, ChatGPT engaged with a broader and deeper range of patterns. It addressed 10 of the 14, including key spatial-psychological patterns such as Prospect and Refuge, and environmental control strategies such as airflow variation, scent and layered, dynamic lighting. Gemini performed well on decorative and material-oriented patterns (Biomorphic Forms and Material Connection), but did not meaningfully address more complex patterns, such as Prospect, Refuge, or Thermal and Airflow Variability (Table 1). Overall, ChatGPT demonstrated a more structurally and experientially grounded biophilic strategy, while Gemini produced a visually coherent but comparatively superficial, object-driven scheme.
The plan-to-image comparison confirmed this contrast. ChatGPT’s image achieved moderate-to-high fidelity, correctly representing the main zones, view-oriented desk, refuge corner, and natural material palette described in the plan. However, it failed to visualise several promised non-visual and control elements, such as a tabletop water feature, scent diffuser, operable window cues, dimmable lighting controls, and a complete three-layer lighting hierarchy. Storage and clutter-reduction strategies were also largely invisible. Gemini’s image displayed moderate fidelity to its own plan in terms of atmosphere, with organic furniture forms, soft daylight, and natural textures well captured. At the same time, the functional duality of the space was diluted; the room read primarily as a lounge, with no clear workstation, and the layered or tunable lighting strategy and richer nature artefacts mentioned in the text were not fully realised.
3.1.2. Quantitative Analysis of Expert Panel Ratings
Fifteen architects with formal training in biophilic design rated both images on an 8-item, 5-point Likert scale (1 = “very poor,” 5 = “excellent”). The criteria covered visual quality, biophilic depth, spatial experience, functionality and overall creativity.
Across the panel, Gemini was clearly preferred for visual realism and lighting, and scored slightly higher on material/colour quality, as well as perceived creativity, reflecting its strong stylistic and rendering capabilities. ChatGPT, in contrast, was rated higher on spatial composition and flow, human-spatial responses, and especially overall functionality and feasibility, in line with its more architectural, zoning-oriented plan (Table 2).
3.1.3. Synthesis of Case 1 Findings
Taken together, the qualitative analysis and expert ratings suggest that each AI exhibits distinct strengths in the “Tabula Rasa” condition. ChatGPT generates a more conceptually rich and spatially coherent biophilic strategy and is perceived by experts as more functional and architecturally plausible, but its image output only partially realises the non-visual and environmental-control aspects it proposes. Gemini produces visually polished and atmospheric renderings that experts rate higher in realism, but its underlying strategy is less structurally grounded and its image tends to under-represent functional requirements and deeper biophilic patterns. In Case 1, therefore, neither model delivers a fully integrated biophilic solution; instead, they occupy complementary niches: ChatGPT as a stronger conceptual and spatial assistant, and Gemini as a stronger visual stylist and renderer.
3.2. Case 2 “Contextual Renovation”: Damaged Rustic Room
Case 2 examined how the AI models handled a constrained renovation problem: a rustic, partially damaged room with a distinctive built-in wooden platform. The brief asked the models to repair the visible damage while applying biophilic design principles and respecting the character of the existing elements (Figure 5).
3.2.1. Qualitative Analysis of Strategic Plans and Images
The textual plans showed a marked difference in design persona and problem-solving methodology. ChatGPT adopted a heritage-sensitive stance, framing its strategy around “assessment and preservation” and emphasising the need to “retain the soul of the structure.” It treated the project as a gentle restoration, focusing on atmosphere, refuge and material authenticity. The platform was reframed as a “cosy refuge” or daybed, with suggestions to use reclaimed timber, lime plaster and local materials to maintain the rustic character. Repair instructions (e.g., replacing damaged floorboards) were present but relatively schematic, reading more like a design brief than a technical specification.
Gemini’s plan read more like a conservation or building-physics report. It was organised into explicit operational phases: structural repair, enhancement of unique elements, and biophilic additions. Its language was both diagnostic and prescriptive, referring to joists, subfloor repairs, and reinforcement, and justifying material choices through a biophilic lens (e.g., breathable lime plaster for improved air quality). It extended this systems view into lighting, proposing circadian-tuned LEDs and “living frames” around openings, and treated the platform as a “nest” or retreat zone within a broader performance-driven scheme.
Coded against the 14 Patterns of Biophilic Design, both plans were strong in Material Connection to Nature and Refuge, correctly identifying the platform as the primary opportunity for a retreat-like zone (Table 3). ChatGPT excelled at constructing a cohesive atmospheric narrative centred on refuge and tactile richness, but engaged less with dynamic systems and natural cycles. Gemini offered a more explicitly systems-based biophilic strategy, particularly for Dynamic and Diffuse Light and Natural Systems, linking lighting and daily rhythms and reimagining existing wall niches as “living frames” that expand visual connection with nature.
The plan-to-image comparison revealed almost the opposite pattern in visualisation. ChatGPT’s image delivered a pleasing, refuge-like atmosphere but showed low technical fidelity to the repair intentions: the broken floor remained visibly damaged and partially open, creating a sense of “picturesque ruin” rather than a safe, renovated space. The platform became a daybed, but edge and step details appeared abrupt and unresolved, undermining the functional safety implied in the text. Gemini’s image, in contrast, achieved high technical fidelity to the basic renovation: the floor and platform appeared fully repaired and structurally sound, consistent with its “Structural Repair” phase. However, several of its more ambitious biophilic moves—such as integrated planters within the platform, clearly legible sustainable finishes, and the nuanced expression of breathable lime plaster—were reduced to a generic rustic language with freestanding plants and undifferentiated materials.
3.2.2. Quantitative Analysis of Expert Panel Ratings
In contrast to Case 1, where the expert questionnaire focused on general visual quality, biophilic depth, spatial experience and overall creativity, the Case 2 survey included two additional, context-specific criteria. Alongside the same core dimensions used in Case 1, experts were also asked to rate how well the design respected and integrated the original elements of the room (platform, cupboards, window, ceiling, flooring) and how successfully the visible damage was repaired in a context-sensitive way. These extra items reflect the renovation-oriented nature of Case 2 and its emphasis on dealing with an existing, partially deteriorated fabric.
Across almost all dimensions, experts preferred the ChatGPT solution (Table 4). The largest differences appear in spatial composition and flow (4.00 vs. 2.53), human-spatial responses (3.87 vs. 2.87), overall functionality and feasibility (3.73 vs. 2.87), and contextual handling of damage (3.67 vs. 2.60). Gemini’s scores remain close to the “average” band, with somewhat better performance on material palette and contextual integration of original elements, but consistently lower evaluations in spatial, functional and renovation-related aspects. Overall creativity is also rated higher for ChatGPT (3.33 vs. 2.73), reinforcing the impression that, in a renovation context, its proposal is experienced as both more functional and more inventive.
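The size of these differences is easier to compare once the gaps are computed and ranked. The short sketch below uses only the mean scores quoted in the paragraph above; the variable names are illustrative, and the criterion labels are shortened from the text.

```python
# Mean expert ratings quoted in the text for Case 2: (ChatGPT, Gemini).
reported_means = {
    "Spatial composition and flow": (4.00, 2.53),
    "Human-spatial responses": (3.87, 2.87),
    "Overall functionality and feasibility": (3.73, 2.87),
    "Contextual handling of damage": (3.67, 2.60),
    "Overall creativity": (3.33, 2.73),
}

# Gap in favour of ChatGPT, rounded to two decimals to avoid float noise.
gaps = {
    criterion: round(chatgpt - gemini, 2)
    for criterion, (chatgpt, gemini) in reported_means.items()
}
ranked = sorted(gaps.items(), key=lambda item: item[1], reverse=True)

for criterion, gap in ranked:
    print(f"{criterion}: {gap:+.2f}")
```

On a 5-point scale, three of the five quoted criteria differ by a full scale point or more, with spatial composition and flow showing the widest gap.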
3.2.3. Synthesis of Case 2 Findings
Case 2 exposes a different pattern from Case 1. At the plan level, Gemini appears more technically rigorous and systems-aware, particularly regarding repair sequencing and circadian lighting, whereas ChatGPT tells a strong narrative about refuge, warmth and material authenticity. At the image level, however, the roles partially invert: ChatGPT’s rendering is atmospherically convincing but structurally unreliable, leaving major floor damage unresolved; Gemini’s rendering is structurally resolved but biophilically diluted, reducing integrated and performative strategies to a more conventional rustic style.
The expert ratings lean strongly towards ChatGPT in this scenario, suggesting that, when evaluating the images as architectural proposals, practitioners value coherent spatial organisation, clear refuge qualities, and convincing treatment of damage more than the invisible technical systems mentioned in the text. In other words, ChatGPT’s atmospheric “gentrification” of the ruin reads as a more compelling renovation proposal than Gemini’s technically driven but visually and biophilically underexpressed solution. At the same time, both models struggle again to make performance-based aspects of biophilic renovation, such as breathable plasters, low-VOC finishes, or thermal comfort, legible in their images, underscoring the limitations of current AI image generation for verifying sustainable and health-related design strategies.
3.3. Case 3 “Strategic Restoration”: Hospital Staff Break Room
Case 3 examined how the AI models handled a psychologically oriented brief: transforming a sterile hospital staff break room into a “cognitive restoration sanctuary” for high-stress healthcare workers (Figure 6). The focus was on how effectively each model applied biophilic design and Attention Restoration Theory (ART), particularly the dimensions of “Being Away” and “Soft Fascination,” while managing existing clinical constraints, such as the wall-mounted medical services rail.
3.3.1. Qualitative Analysis of Strategic Plans and Images
At the level of strategic planning, both models recognised the need for psychological restoration but pursued it through different architectural logics. ChatGPT framed its approach around “sensory load reduction” and organised the room into distinct behavioural zones: a Quiet Restoration Zone for solitary recovery and a Social Recharge Zone for informal support among staff. Its application of ART emphasised Soft Fascination through layered sensory inputs, nature murals, living or moss walls, warmer materials, and a “nature soundscape” intended to mask mechanical and clinical noise. The medical services rail was addressed pragmatically: ChatGPT proposed concealing it behind a bamboo or timber feature panel while maintaining access, thus reducing visual stress without erasing function.
Gemini adopted a more radical “de-clinicalisation” stance, explicitly aligning its strategy with ART by targeting Being Away. It aimed to suppress institutional cues wherever possible. Its key move was to transform the medical services rail into a biomorphic wood screen, conceptually turning a stress-inducing medical device into a piece of fractal art. Spatially, Gemini emphasised Prospect and Refuge by proposing high-back chairs to create enclosed “refuge nooks” and by positioning seating to frame the window as a “portal” to natural systems.
When coded against the 14 Patterns of Biophilic Design, both plans demonstrated strengths, albeit in different clusters. ChatGPT excelled in sensory richness and compatibility, particularly in Non-Visual Connection to Nature (soundscape, scents, tactility) and Material Connection to Nature, providing an immersive, multi-sensory experience that current image generation cannot fully replicate. Gemini was stronger in spatial psychology and systemic biophilia, notably in Prospect and Refuge, Dynamic and Diffuse Light, Natural Systems and Biomorphic Forms, using fractal screens and circadian lighting to target the physiological mechanisms behind stress reduction (Table 5).
The plan-to-image fidelity assessment highlighted a common gap between conceptual ambition and visual output. In this study, operational fidelity refers to the extent to which the visual output accurately represents the functional or technical intent of the textual design plan. ChatGPT’s image achieved high visual fidelity but low operational fidelity: it successfully implemented the overall zoning and colour palette, and replaced the clinical feeling with a calming, nature-themed mural. However, the “living wall” became a flat mural, the acoustic rug was reduced to a modest decorative carpet, the medical rail appeared essentially erased rather than concealed with operational access, and the ceiling lighting remained generic rather than clearly upgraded to tunable or dimmable fixtures. Gemini’s image also achieved high visual fidelity in its central elements: the biomorphic screen over the medical rail was clearly depicted as a carved wooden feature, and the refuge seating zone was legible. Yet the screen no longer read as hinged or operable, the window treatment did not fully express the “portal” with layered planting, and the circadian and acoustic performance of the system remained invisible. In both cases, the images captured the “look” of the strategies more than their technical or operational intent.
3.3.2. Quantitative Analysis of Expert Panel Ratings
Fifteen architects with biophilic design training rated the Case 3 images on nine criteria using a 5-point Likert scale. The criteria covered visual quality, biophilic depth, spatial experience, functionality, strategic success in supporting cognitive restoration, and overall creativity.
Overall, the ratings indicate that neither image was perceived as fully successful in delivering a convincing restorative environment, with most scores clustered between “weak” and “average”. Gemini scored slightly higher on visual realism and lighting (2.93 vs. 2.80), suggesting that its rendering was marginally more convincing at a purely visual level (Table 6). However, ChatGPT was rated higher across all other dimensions, particularly for material and colour quality, spatial composition and flow, human-spatial responses, overall functionality, and strategic success in meeting the cognitive restoration brief (3.13 vs. 2.53). Overall creativity was also rated higher for ChatGPT (2.80 vs. 2.40), although both remained below the “good” threshold.
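To make the size of these differences easier to compare, the following minimal sketch computes the gap between the two models for the three score pairs quoted in the paragraph above. It uses only those reported means (the remaining criteria appear in Table 6 and are not reproduced here); the dictionary keys and the function name `rating_gaps` are illustrative assumptions, not part of the study's procedure.

```python
# Mean expert ratings (5-point Likert scale) quoted in the text for Case 3.
# Only the three pairs stated in the prose are included; key names are illustrative.
case3_means = {
    "visual_realism_and_lighting": {"ChatGPT": 2.80, "Gemini": 2.93},
    "strategic_success": {"ChatGPT": 3.13, "Gemini": 2.53},
    "overall_creativity": {"ChatGPT": 2.80, "Gemini": 2.40},
}


def rating_gaps(means):
    """Return {criterion: (leader, gap)}, where gap is the difference in mean score."""
    gaps = {}
    for criterion, scores in means.items():
        leader = max(scores, key=scores.get)    # model with the higher mean
        trailer = min(scores, key=scores.get)   # model with the lower mean
        gaps[criterion] = (leader, round(scores[leader] - scores[trailer], 2))
    return gaps


gaps = rating_gaps(case3_means)
for criterion, (leader, gap) in gaps.items():
    print(f"{criterion}: {leader} leads by {gap:.2f}")
```

Read this way, Gemini's advantage in realism (0.13) is considerably smaller than ChatGPT's advantage in strategic success (0.60) or creativity (0.40). The sketch compares means only and says nothing about statistical significance.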
3.3.3. Synthesis of Case 3 Findings
Case 3 highlights the difficulty current generative AI systems have in operationalising psychologically targeted biophilic design in a clinical context. At the level of written strategy, both models displayed a meaningful engagement with ART: ChatGPT through multi-sensory soft fascination and compatibility, Gemini through de-clinicalisation, prospect-refuge and systemic lighting strategies. In practice, however, the images that experts evaluated were perceived as only moderately successful, with ChatGPT’s proposal viewed as somewhat more coherent, functional and aligned with the cognitive restoration goal.
The expert ratings suggest that, in this psychologically demanding context, spatial organisation, material comfort and legible refuge qualities are more influential for practitioners than marginal differences in photorealism. At the same time, Case 3 reinforces a pattern already visible in earlier cases: non-visual biophilic patterns (sound, scent, air movement) and operational systems (circadian lighting, movable screens, medical access) are consistently lost or flattened in AI-generated imagery. The tools tend to produce visually plausible “biophilic atmospheres” but struggle to communicate the deeper environmental performance and psychological mechanisms that their own textual plans describe.
4. Cross-Case Synthesis and Implications
The three cases collectively show that current multimodal generative AI tools can support biophilic design thinking in a useful but partial and uneven way. Their strengths and limitations are not random; they follow clear patterns across different spatial conditions and design tasks.
4.1. Cross-Case Patterns in Strategic Planning
Across all three scenarios, both tools demonstrated a non-trivial grasp of biophilic concepts at the textual level. Each model was able to name appropriate strategies, reference recognised patterns (e.g., refuge, prospect, natural materials, dynamic lighting), and frame its proposals in ways that are broadly consistent with contemporary biophilic design discourse.
However, a consistent differentiation emerged:
- ChatGPT tended to behave like a spatial designer: organising space into zones, articulating user flows and activities, and embedding biophilic moves into a coherent spatial narrative. Its plans generally engaged a wider range of Terrapin’s 14 Patterns, including more complex spatial-psychological ones such as Prospect and Refuge, and, in the hospital case, explicit references to Attention Restoration Theory and cognitive restoration.
- Gemini behaved more like a stylist and technical consultant: strong on mood, material palettes, and, in Case 2, technically framed repair sequences and systems (e.g., circadian lighting, breathable plasters). Its biophilic integration was often more systemic in concept (light, rhythms, de-clinicalisation), but less deeply articulated in terms of spatial organisation.
Taken together, the three cases suggest that both models are capable of generating credible biophilic concepts on paper, but with different emphases: one is more spatial and narrative, while the other is more atmospheric and occasionally systems-oriented.
4.2. Cross-Case Patterns in Visual Outputs and Expert Judgements
When these textual strategies were translated into images, a different pattern emerged. The expert ratings indicate that:
- Visual realism and lighting were where Gemini most often had a slight edge (particularly in the empty-room baseline), confirming its strength as a renderer.
- Spatial composition, human-spatial responses, functionality and context were consistently rated higher for ChatGPT, especially in the more complex renovation and clinical cases.
- In Case 1, the experts perceived a form of complementarity: Gemini as the more realistic visualiser, ChatGPT as the more coherent and functional spatial designer.
- In Case 2 and Case 3, that balance shifted: ChatGPT’s images, despite their flaws, were more often experienced as architecturally convincing proposals, while Gemini’s tended to remain at the level of visually plausible but conceptually shallow scenes.
Overall, the ratings cluster around the “average” band rather than the “good” or “excellent” levels. This suggests that, in their current state, these tools do not yet produce biophilic design proposals that trained architects would accept as ready-to-build or strongly restorative, but they can reach a baseline level of adequacy that may be useful for early idea generation.
4.3. Recurrent Gaps: From Biophilic Text to Image
Across all cases, there are systematic losses when moving from strategic text to visual output:
- Non-visual biophilic patterns (sound, scent, airflow, thermal variability) appear robustly in the textual plans but are effectively absent in the images. Water features become implied or decorative; soundscapes and scents disappear completely. The AI imagery is strongly oculocentric, biased towards what can be seen rather than what can be felt, heard or smelled.
- Operational and performative systems (circadian lighting, dimmers, operable screens, breathable materials, medical access) are either simplified into generic elements or visually indistinguishable from conventional solutions. Even when models describe sophisticated systems in text, the corresponding images rarely make those capabilities legible.
- Safety and technical integrity are only inconsistently represented. In the renovation case, structural damage remained visible in one image despite textual promises of repair; in the hospital case, critical infrastructure was visually “cleaned away” rather than realistically integrated.
These recurring discrepancies highlight a fundamental limitation: the textual “intelligence” of the language model is not yet reliably coupled to the visual “intelligence” of the image generator. As a result, the AI may sound more biophilic and technically aware than it looks.
4.4. Implications for Design Practice
For architects and interior designers, these findings suggest a nuanced but clear position on the role of such tools:
- They are promising as early-stage assistants for ideation, especially when the goal is to quickly explore biophilic atmospheres, material palettes, or high-level spatial concepts.
- They are not reliable as autonomous design consultants for biophilic performance, safety, or psychologically sensitive environments. Critical decisions about structure, services, and psychological impact still require professional judgement and detailed design.
- The clearest current value lies in co-creation: designers can use text-based outputs to widen their conceptual field (e.g., patterns they might not have considered) and use images as conversational sketches rather than as final proposals.
- Because the tools repeatedly under-represent non-visual and systemic biophilic patterns, professionals should treat AI-generated images as partial visualisations and explicitly check them against more comprehensive biophilic frameworks before taking them seriously.
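The check against a fuller framework described in the last point can be made explicit as a simple coverage checklist. The sketch below is an illustration under stated assumptions rather than the procedure used in this study: it lists the 14 Patterns using the names given in the Terrapin framework, takes a hypothetical set of patterns that a reviewer judges to be legible in an image, and reports which patterns are missing. The function name `missing_patterns` and the example input are invented for illustration.

```python
# The 14 Patterns of Biophilic Design, named as in the Terrapin framework.
PATTERNS = (
    "Visual Connection with Nature",
    "Non-Visual Connection with Nature",
    "Non-Rhythmic Sensory Stimuli",
    "Thermal & Airflow Variability",
    "Presence of Water",
    "Dynamic & Diffuse Light",
    "Connection with Natural Systems",
    "Biomorphic Forms & Patterns",
    "Material Connection with Nature",
    "Complexity & Order",
    "Prospect",
    "Refuge",
    "Mystery",
    "Risk/Peril",
)


def missing_patterns(observed):
    """Return the patterns (in framework order) that are not in `observed`."""
    unknown = set(observed) - set(PATTERNS)
    if unknown:
        # Guard against typos so that a misspelt name is not silently ignored.
        raise ValueError(f"Unrecognised pattern names: {sorted(unknown)}")
    return [p for p in PATTERNS if p not in observed]


# Hypothetical reviewer judgement for one AI-generated image (not study data).
observed_in_image = {
    "Visual Connection with Nature",
    "Material Connection with Nature",
    "Refuge",
}

missing = missing_patterns(observed_in_image)
print(f"{len(PATTERNS) - len(missing)}/{len(PATTERNS)} patterns legible in the image")
for pattern in missing:
    print(f"  not legible: {pattern}")
```

A list of this kind does not judge design quality; it only makes visible which patterns an image leaves unaddressed, so that non-visual and systemic patterns are not overlooked during review.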
4.5. Implications for Biophilic Design Research
From a research perspective, this study demonstrates that evaluating AI through both a theoretical framework and expert ratings offers a robust approach to moving beyond “impressive examples” towards a systematic assessment. Biophilic design, with its combination of sensory, spatial, and psychological dimensions, is an effective stress test for AI tools, exposing where visual generation remains shallow or incomplete. Future work could explore prompt strategies, fine-tuning, or custom models that explicitly encode non-visual and systemic biophilic aspects, aiming to close the current gap between conceptual depth and visual expression.
In sum, across the three cases, ChatGPT and Gemini demonstrate that generative AI can already play a supportive but limited role in biophilic design. They are capable of generating structured, sometimes sophisticated biophilic narratives and visually plausible scenes, yet they fall short in reliably translating full biophilic intent, especially non-visual and performative dimensions, into images that experts would consider truly restorative or buildable.
While the tested systems can rapidly recombine known biophilic strategies and generate visually persuasive atmospheres, this capacity is better understood as computational or combinatorial generation than as human creative agency. In architectural design, creativity extends beyond novelty production to include intentionality, contextual judgement, ethical responsibility, and the interpretation of lived experience. These dimensions are especially central in biophilic design, where the designer must translate wellbeing goals into situated spatial experiences that respond to specific users, constraints, and potential risks.
Accordingly, the distinct value of human designers lies in meaning-making and care: deciding what matters in a given context, for whom, and why, and taking responsibility for consequences. The findings therefore support positioning multimodal GenAI as an early-stage co-creative aid for ideation and visualisation, rather than as an autonomous author of restorative environments. Human judgement remains essential to ensure that biophilic intentions are grounded in empathy, culture, ethics, and accountable decision-making.
5. Conclusions
This study set out to examine how two widely used multimodal GenAI systems, ChatGPT-5.1 with DALL·E 3 and Gemini 3 with Imagen, perform as early-stage co-design partners for biophilic interior design. It used three deliberately contrasting cases (blank canvas, damaged renovation, and clinical restorative space) and combined framework-based analysis with expert ratings from architects trained in biophilic design.
Across all cases, both models were able to articulate broadly credible biophilic strategies in text, but with distinct emphases: ChatGPT consistently behaved more like a spatial designer, structuring programmes, flows and refuge zones around recognised patterns such as prospect and refuge, while Gemini tended to operate as a visual stylist or technically framed consultant, stronger in mood, rendering quality and, in some instances, systems language (e.g., circadian lighting). When these strategies were translated into images and evaluated by experts, Gemini was rated marginally higher for visual realism and lighting, whereas ChatGPT was more often judged superior in spatial composition, human-spatial responses, functionality, contextual handling of existing elements and, in the clinical case, strategic support for cognitive restoration. Yet in all three scenarios, mean scores clustered around “average”, indicating that neither system currently produces biophilic designs that experts regard as fully convincing or build-ready.
A recurrent limitation was the systematic loss of non-visual and performative dimensions (sound, scent, airflow, thermal variation, circadian systems and operable infrastructure) when moving from text to image. The tools generated persuasive “biophilic atmospheres” but struggled to visualise safety-critical repairs, medical access, or the operational logic underpinning restorative environments. A limitation of the comparative setup is that the image outputs are produced by different image-generation models (DALL·E 3 vs. Imagen), so visual differences cannot be attributed exclusively to the LLM component. In addition, outcomes may be prompt-sensitive: explicitly referencing the “14 Patterns of Biophilic Design” (rather than the broader phrase “biophilic design principles”) may increase category-wise completeness, particularly for models that respond better to structured taxonomies. For practice, this suggests that current GenAI tools are best treated as fast, idea-generating partners for early exploration of atmospheres, patterns and layouts, rather than as reliable advisors on biophilic performance or psychologically sensitive design. For research, the work demonstrates the value of using established frameworks, such as the 14 Patterns of Biophilic Design, together with expert judgement to benchmark AI behaviour, and points towards future studies on prompt strategies, fine-tuned or domain-specific models, additional building types, and longitudinal impacts on design education. As GenAI systems continue to evolve, theory-informed evaluations of this kind will be essential to ensure that their growing presence in design practice strengthens, rather than dilutes, evidence-based approaches to health-supportive environments.
Funding
This research received no external funding.
Institutional Review Board Statement
The study was conducted in accordance with relevant ethical standards for research involving human participants and was approved by the Ethics Committee of Sakarya University, Faculty of Engineering and Natural Sciences (Decision No. 66/04, Date of Approval: 16 December 2025).
Informed Consent Statement
Informed consent for participation was obtained from all subjects involved in the study.
Data Availability Statement
The data presented in this study are available on request from the corresponding author.
Acknowledgments
During the preparation of this manuscript/study, the author used ChatGPT 5.1 and Gemini 3 PRO for the purposes of data extraction, as clearly stated in the methodology. The author has reviewed and edited the output and takes full responsibility for the content of this publication.
Conflicts of Interest
The author declares no conflicts of interest.
References
- Howard, T.J.; Culley, S.J.; Dekoninck, E. Describing the creative design process by the integration of engineering design and cognitive psychology literature. Des. Stud. 2008, 29, 160–180.
- Cortes, R.A.; Weinberger, A.B.; Daker, R.J.; Green, A.E. Re-examining prominent measures of divergent and convergent creativity. Curr. Opin. Behav. Sci. 2019, 27, 90–93.
- Dorst, K.; Cross, N. Creativity in the design process: Co-evolution of problem–solution. Des. Stud. 2001, 22, 425–437.
- Chiou, L.-Y.; Hung, P.-K.; Liang, R.-H.; Wang, C.-T. Designing with AI: An Exploration of Co-Ideation with Image Generators. In DIS ’23: Proceedings of the 2023 ACM Designing Interactive Systems Conference; Association for Computing Machinery: New York, NY, USA, 2023; pp. 1941–1954.
- Peckham, O.; Raines, J.; Bulsink, E.; Goudswaard, M.; Gopsill, J.; Barton, D.; Nassehi, A.; Hicks, B. Artificial Intelligence in Generative Design: A Structured Review of Trends and Opportunities in Techniques and Applications. Designs 2025, 9, 79.
- Chandrasekera, T.; Hosseini, Z.; Perera, U. Can artificial intelligence support creativity in early design processes? Int. J. Arch. Comput. 2025, 23, 122–136.
- Rane, N.; Choudhary, S.; Rane, J. Gemini Versus ChatGPT: Applications, Performance, Architecture, Capabilities, and Implementation. J. Appl. Artif. Intell. 2024, 5, 69–93.
- Kasneci, E.; Sessler, K.; Küchemann, S.; Bannert, M.; Dementieva, D.; Fischer, F.; Gasser, U.; Groh, G.; Günnemann, S.; Hüllermeier, E.; et al. ChatGPT for good? On opportunities and challenges of large language models for education. Learn. Individ. Differ. 2023, 103, 102274.
- McIntosh, T.R.; Susnjak, T.; Liu, T.; Watters, P.; Xu, D.; Liu, D.; Halgamuge, M.N. From Google Gemini to OpenAI Q* (Q-Star): A Survey on Reshaping the Generative Artificial Intelligence (AI) Research Landscape. Technologies 2025, 13, 51.
- Saadi, J.I.; Yang, M.C. Generative Design: Reframing the Role of the Designer in Early-Stage Design Process. J. Mech. Des. 2023, 145, 041411.
- Fang, C.; Zhu, Y.; Fang, L.; Long, Y.; Lin, H.; Cong, Y.; Wang, S.J. Generative AI-enhanced human-AI collaborative conceptual design: A systematic literature review. Des. Stud. 2025, 97, 101300.
- Kellert, S.; Wilson, E.O. The Biophilia Hypothesis; Island Press: Washington, DC, USA, 1993.
- Kellert, S.; Heerwagen, J.; Mador, M. Biophilic Design: The Theory, Science and Practice of Bringing Buildings to Life; John Wiley & Sons: Hoboken, NJ, USA, 2008.
- Wilson, E. Biophilia. 1984. Available online: https://www.degruyter.com/document/doi/10.4159/9780674045231/html (accessed on 17 June 2021).
- Tekin, B.H.; Aktog, M.A. A Conceptual Framework for Biophilic Architectural Design in Cold Climates: A Meta-Synthesis Analysis. Buildings 2025, 15, 3825.
- Tekin, B.H.; Gutiérrez, R.U. Human-centred health-care environments: A new framework for biophilic design. Front. Med. Technol. 2023, 5, 1219897.
- Kellert, S.; Calabrese, E. The Practice of Biophilic Design. 2015. Available online: www.biophilic-design.com (accessed on 23 March 2021).
- Tekin, B.H. From greenwashing to grounded practice: A context-specific biophilic design framework for hot arid climates. Build. Environ. 2026, 292, 114295.
- Browning, W.D.; Ryan, C.O. Nature Inside: A Biophilic Design Guide; RIBA Publishing: London, UK, 2020.
- Ulrich, R.S. View through a window may influence recovery from surgery. Science 1984, 224, 420–421.
- Tekin, B.H.; Corcoran, R.; Gutiérrez, R.U. A Systematic Review and Conceptual Framework of Biophilic Design Parameters in Clinical Environments. Health Environ. Res. Des. J. 2022, 16, 233–250.
- Berman, M.G.; Jonides, J.; Kaplan, S. The cognitive benefits of interacting with nature. Psychol. Sci. 2008, 19, 1207–1212.
- Tekin, B.H. Healing Environments for Cancer Care: Toward a Patient-Centered Design Guideline for Inpatient Settings. Health Environ. Res. Des. J. 2025.
- Park, S.-H.; Mattson, R.H. Ornamental Indoor Plants in Hospital Rooms Enhanced Health Outcomes of Patients Recovering from Surgery. J. Altern. Complement. Med. 2009, 15, 975–980.
- Clarke, D.M.; Currie, K.C. Depression, anxiety and their relationship with chronic diseases: A review of the epidemiology, risk and treatment evidence. Med. J. Aust. 2009, 190, S54–S60.
- Laursen, J.; Danielsen, A.; Rosenberg, J. Effects of Environmental Design on Patient Outcome: A Systematic Review. HERD Health Environ. Res. Des. J. 2014, 7, 108–119.
- Kaplan, R.; Kaplan, S. The Experience of Nature: A Psychological Perspective; Cambridge University Press: Cambridge, UK, 1989.
- Tekin, B.H.; Corcoran, R.; Gutiérrez, R.U. The impact of biophilic design in Maggie’s Centres: A meta-synthesis analysis. Front. Arch. Res. 2023, 12, 188–207.
- Turner, J.; Kelly, B. Emotional dimensions of chronic disease. West. J. Med. 2000, 172, 124.
- Abdelaal, M.S.; Soebarto, V. Biophilia and Salutogenesis as restorative design approaches in healthcare architecture. Archit. Sci. Rev. 2019, 62, 195–205.
- Murphy, M.; Mansfield, J. Can architecture heal? Building as instruments of health. Arch. Des. 2017, 87, 82–89.
- Kaplan, S. The restorative benefits of nature: Toward an integrative framework. J. Environ. Psychol. 1995, 15, 169–182.
- Ulrich, R.S.; Simons, R.F.; Losito, B.D.; Fiorito, E.; Miles, M.A.; Zelson, M. Stress recovery during exposure to natural and urban environments. J. Environ. Psychol. 1991, 11, 201–230.
- Joye, Y.; Dewitte, S. Nature’s broken path to restoration. A critical look at Attention Restoration Theory. J. Environ. Psychol. 2018, 59, 1–8.
- Browning, W.D.; Ryan, C.O.; Clancy, J.O. 14 Patterns of Biophilic Design: Improving Health & Well-Being in the Built Environment; Terrapin Bright Green, LLC: New York, NY, USA, 2014; Available online: https://www.terrapinbrightgreen.com/wp-content/uploads/2014/09/14-Patterns-of-Biophilic-Design-Terrapin-2014p.pdf (accessed on 22 August 2020).
- Alexander, C.; Ishikawa, S.; Silverstein, M. A Pattern Language: Towns, Buildings, Construction; Oxford University Press: Oxford, UK, 1977; Available online: https://www.scirp.org/reference/referencespapers?referenceid=1225982 (accessed on 8 February 2026).
- Tekin, B.H.; Tunahan, G.I.; Disci, Z.N.; Ozer, H.S. Biophilic Design in the Built Environment: Trends, Gaps and Future Directions. Buildings 2025, 15, 2516.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.