AI-Enabled Sustainable Landscape Design: A Decision-Support Framework Based on “Generative-Critical” Multi-Agent

Li, Li; Yang, Xuesong; Liu, Sijia; Deng, Feiyang

doi:10.3390/urbansci10010056

Open Access Article


by Li Li ¹,², Xuesong Yang ¹,*, Sijia Liu ³ and Feiyang Deng ¹

¹ School of Urban Design, Wuhan University, Wuhan 430072, China

² Research Center of Hubei Small Town Development, Hubei Engineering University, Xiaogan 432000, China

³ College of Horticulture and Forestry, Huazhong Agricultural University, Wuhan 430070, China

* Author to whom correspondence should be addressed.

Urban Sci. 2026, 10(1), 56; https://doi.org/10.3390/urbansci10010056

Submission received: 5 November 2025 / Revised: 23 December 2025 / Accepted: 27 December 2025 / Published: 16 January 2026

(This article belongs to the Special Issue Emerging -Scapes: Conceptual and Spatial Constructs in Architectural and Urban Studies)


Abstract

Under the dual pressures of global climate change and accelerating urbanization, landscape design has been tasked with the critical mission of enhancing urban environmental resilience and ecological livability. However, conventional design practices often struggle to efficiently integrate complex sustainability norms with aesthetic creativity, leading to a disconnect between form and function. To address this issue, this study proposes and validates an AI-enabled sustainability decision-support framework. The framework is based on a “Generative-Critical” multi-agent workflow that enables “Self-Correcting” iterative optimization of design schemes through a built-in expert knowledge base and a quantitative scorecard. The framework’s effectiveness was validated through a cultural park case study and a blind evaluation by 10 experts. It guided a design from an initial concept with only aesthetic forms and lacking effective stormwater management, to an ecologically integrated scheme that strategically incorporated bioretention ponds at key nodes and converted hard plazas into permeable pavements. This transformation significantly elevated the scheme’s sustainability score from 59.3 to 88.0 (p < 0.001), while the framework itself achieved a high system usability scale (SUS) score of 85.5. These results confirm that the proposed “Generative-Critical” mechanism can effectively guide AIGC to adhere to ecological-technical norms and constraints while pursuing aesthetic innovation, thereby achieving a scientific integration of aesthetic form and ecological function at the early conceptual design stage. This study offers a scalable methodology for AI-assisted sustainable design and provides a novel intelligent tool for creating resilient urban landscapes that possess both environmental performance and aesthetic value.

Keywords: sustainability; landscape design; decision-support framework; generative mechanism; multi-agent system; urban resilience

1. Introduction

Urban landscapes and environments play a pivotal role in contemporary society, shaping not only the visual and functional characteristics of urban spaces [1,2] but also profoundly influencing ecological conservation [3], social well-being [4], and sustainable development [5]. Amid the dual pressures of accelerating global urbanization and climate change, urban landscapes are tasked with a more demanding mission: to simultaneously address complex ecological constraints—such as mitigating urban heat islands, managing stormwater, and promoting biodiversity—and diverse sociocultural needs, including enhancing public space quality, preserving local context, and achieving social equity [3,6,7]. As the core discipline for achieving these goals, landscape architecture integrates art, ecology, urban planning, and social sciences to create spaces that are aesthetically valuable, functional, and ecologically sustainable [1,2,8].

However, systematically integrating sustainability knowledge into design practice remains a long-standing global challenge. According to the recently proposed Landscape Sustainability Science (LSS) framework, this integration requires designs to move beyond local aesthetics and develop a holistic understanding of ecological processes and spatial structures [9]. Yet, designers often face a trade-off dilemma between aesthetic expression and ecological function, leading to schemes where form is prioritized, function is weakened, or ecological concepts are applied only superficially [10,11]. For instance, a recent study of 224 built park projects worldwide from 2015 to 2025 found that although a growing number of projects claim to adopt “ecological design principles,” those that truly achieve multiple ecological benefits (e.g., stormwater management, biodiversity conservation) rather than merely “visual greening” remain a minority [12]. Collectively, how to scientifically bridge the gap between form and function from the outset of the design process—thereby achieving synergy among ecology, function, and aesthetics—is a critical issue that contemporary landscape and urban design disciplines urgently need to address.

To address increasingly complex design challenges, the field of landscape architecture has undergone innovations from hand-drawing to digital tools, yet these tools remain fundamentally passive [13,14,15]. Software such as CAD, SketchUp, and GIS function as efficient digital drawing boards and calculators, capable of precisely executing a designer’s commands but unable to comprehend the semantics of the design content itself [16,17,18]. They have optimized the design representation stage but have not fundamentally resolved the knowledge integration challenge at the conceptual generation stage [19].

In recent years, the rapid advancement of artificial intelligence has offered new avenues for addressing this issue, with Generative Artificial Intelligence (AIGC) and Multi-Agent Systems (MAS) emerging as the two most promising technological paths, sparking widespread exploration.

The rise of AIGC [20,21,22] has brought the potential for proactive generation to the landscape design field. However, current applications of AIGC in landscape architecture are highly concentrated on visual translation and style generation [23,24,25]. For example, Ye et al. utilized Stable Diffusion and ControlNet models to automate the generation of photorealistic landscape renderings from text descriptions or sketches [23]. Such studies reveal a profound “knowledge blindness” problem: the training data for existing AIGC tools consist predominantly of visual images, which cannot represent or apply deep knowledge about ecological functions, material performance, or technical specifications. Consequently, while AIGC excels at generating “good-looking” images, it cannot guarantee the ecological soundness of its proposals, restricting its use to downstream representation stages rather than providing effective support during conceptual decision-making.

Meanwhile, Multi-Agent Systems (MAS) offer another path to solving complex problems. An agent is a computational entity capable of autonomous perception, reasoning, and action [26]. Owing to their distributed and collaborative nature, MAS have demonstrated considerable potential in the macro-level simulation of urban planning [26,27,28]. For example, the CityGen framework developed by Zhou et al. uses MAS to simulate the decision-making behaviors of stakeholders such as residents, governments, and designers to facilitate urban co-design [26]. However, existing research has rarely explored how the collaborative mechanisms of MAS can be leveraged to handle micro-scale, multi-objective spatial design generation tasks such as landscape design.

Based on the analysis above, a significant gap exists in current research: on the one hand, mainstream AIGC tools lack the capability to integrate sustainability knowledge; on the other hand, MAS technology, which can handle complex collaboration, has not yet been applied to solve this specific, micro-scale design generation problem. The core challenge has shifted from “How can AI generate images?” to “How can AI, within a single conceptual framework, logically intertwine and balance creative aesthetic inspiration with ecological constraints?”

To address these questions and challenges, this study proposes an AI-enabled sustainability decision-support framework. This framework does not attempt to directly generate final visual forms; rather, it focuses on solving the more upstream decision-making problem. The core of the framework is a novel “Self-Correcting” workflow built on a “Generative-Critical” multi-agent architecture, designed to embed sustainability as a hard constraint within the AIGC conceptual text generation process. The main contributions of this study are threefold:

(1): Construction of a dual retrieval-augmented knowledge base: This study innovatively constructs and integrates two types of heterogeneous knowledge: inspiration-oriented knowledge from aesthetic precedents and function-oriented knowledge from ecological-technical norms. In contrast to mainstream AIGC applications that rely on general web data or singular visual datasets, this specialized aesthetic–functional dual structure enables the AI to logically link and weave a specific ecological requirement with a referential aesthetic form at the textual level, thereby bridging the form-function gap directly at the knowledge source.
(2): A novel “Generative-Critical” multi-agent methodology: The core theoretical innovation of this study is the application of the MAS collaborative paradigm from macro-level simulation to micro-level design generation. Unlike previous MAS research in which agents play the role of Stakeholders, the agents in this framework mimic the internal collaboration between a designer (Generation Agent) and an Expert Critic (Evaluation Agent). Through a “Critique-and-Refine” loop driven by a quantitative sustainability scorecard, the system internalizes the expert review mechanism, compelling the AIGC’s creative process to meet a preset sustainability threshold. This addresses the fundamental flaw of general-purpose AIGC, which only generates without evaluating.
(3): An explainable human–AI collaboration paradigm: The practical contribution is the development of a human–computer interaction (HCI) framework integrated with an Expert Consultation Agent. This framework not only supports designers in iterative refinement through multi-turn dialogue but, more importantly, it makes the AI’s decision-making process transparent through a visible sustainability scorecard, preventing it from being a black box. This clearly defines a new role for both AI and humans: the AI is no longer a black-box artist replacing the designer, but a blueprint planner providing rigid ecological function guarantees and professional aesthetic inspiration. The designer then utilizes this high-quality blueprint for the final visual and spatial creation. The system functions not only as a decision-support tool but also as a design education tool, holding significant practical and educational value.

In summary, the AI-enabled multi-agent framework proposed in this study serves as more than a mere technical instrument; it catalyzes the formation of a contemporary landscape paradigm—specifically, an intelligent-ecological landscape shaped by the convergence of ecological performance metrics, generative intelligence, expert knowledge, and iterative human–AI collaboration. By demonstrating how computational logic coordinates rigid environmental constraints with fluid generative creativity, this study aims to enrich the connotation and practical paradigms of these emerging landscapes.

This paper is organized as follows: Section 2 details the system architecture of the “Generative-Critical” decision-support framework, including the knowledge base construction, MAS workflow, and HCI design. Section 3 demonstrates the framework’s operation through a case study in cultural park landscape design and validates its effectiveness via quantitative and qualitative expert evaluations. Section 4 discusses the study’s innovative value, the significance of explainable artificial intelligence (XAI) for decision support, and the research limitations and future directions. Finally, Section 5 concludes by summarizing the study’s theoretical and practical contributions.

2. Research Methods

2.1. Research Framework

The proposed research framework is an AI-enabled sustainability decision-support system, the core architecture of which comprises four modules (Figure 1):

(1): A sustainability-focused knowledge base construction module that builds a dual knowledge base by collecting aesthetic precedents and sustainability norms or technical knowledge, such as The Sustainable SITES Initiative (SITES), LEED for Neighborhood Development (LEED-ND), and Sponge City guidelines. This knowledge base is designed to bridge the gap between aesthetics and ecology, providing the AI with a professional basis for decision-making.
(2): A “Generative-Critical” multi-agent core. As the core methodological innovation of this framework, it employs a “Self-Correcting” loop driven by two collaborative agents: (a) a Generation Agent, acting as the designer, which is responsible for retrieving from the dual knowledge base and generating an initial conceptual scheme draft based on user inputs (e.g., design briefs, site images); and (b) an Evaluation Agent, serving as the internal Expert Critic, whose core is a quantitative sustainability scorecard. This agent assesses and scores the draft based on multidimensional indicators such as ecological, hydrological, and social performance. The scheme must undergo a “Critique-and-Refine” loop, wherein the Evaluation Agent continuously provides feedback for the Generation Agent to iterate upon, until the scheme’s score reaches a preset sustainability quality threshold.
(3): A human–computer interaction workflow. Realized primarily through an Expert Consultation Agent, this module allows users to proactively consult with and iterate on the scheme via multi-turn dialogue. More importantly, this interface integrates XAI features. Specifically, the system outputs the final scheme along with its sustainability scorecard results, making the AI’s decision-making process transparent and thereby enabling genuine decision support and design education.
(4): A framework validation module. To verify the framework’s value, this study designed a dual-pronged evaluation: (a) a usability assessment using standard scales and user interviews to evaluate the HCI’s ease of use and satisfaction; (b) a sustainability performance evaluation, which invites human experts to blind-review schemes generated with AI assistance against a baseline, using the same internal scorecard, followed by statistical analysis to quantitatively demonstrate the framework’s significant effectiveness in enhancing scheme sustainability.

Figure 1. The sustainable landscape decision-support framework based on “Generative-Critical” multi-agent.

This figure depicts the workflow of the research framework, illustrating the dynamic relationships between the knowledge base, the multi-agent core, and the HCI interface. By distinguishing between data flow (solid lines) and logical invocation (dashed lines), the diagram details the framework’s core mechanism: a “Critique-and-Refine” feedback loop driven by quantitative evaluation. This loop ensures that the design scheme is autonomously iterated and optimized until it meets preset sustainability criteria (Score ≥ T), thereby achieving high-quality scheme generation.
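
The “Critique-and-Refine” loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `max_rounds` safety cap, the toy scores, and the threshold value are all assumptions introduced for illustration, and the two agents are replaced by plain callables.

```python
from typing import Callable, List, Optional, Tuple


def critique_and_refine(
    generate: Callable[[str, Optional[str]], str],
    evaluate: Callable[[str], Tuple[float, str]],
    brief: str,
    threshold: float,
    max_rounds: int = 5,  # assumed safety cap; the source does not state one
) -> Tuple[str, float, List[float]]:
    """Alternate generation and evaluation until score >= threshold."""
    feedback: Optional[str] = None
    history: List[float] = []
    draft, score = "", float("-inf")
    for _ in range(max_rounds):
        draft = generate(brief, feedback)   # Generation Agent stand-in
        score, feedback = evaluate(draft)   # Evaluation Agent stand-in
        history.append(score)
        if score >= threshold:              # Score >= T ends the loop
            break
    return draft, score, history


# Toy stand-ins for the two agents (hypothetical behaviour and scores).
def toy_generate(brief: str, feedback: Optional[str]) -> str:
    base = f"Concept for {brief}"
    if feedback is None:
        return base
    return base + " with bioretention pond and permeable paving"


def toy_evaluate(draft: str) -> Tuple[float, str]:
    score = 90.0 if "bioretention" in draft else 60.0
    return score, "Add stormwater management measures."


if __name__ == "__main__":
    final, final_score, scores = critique_and_refine(
        toy_generate, toy_evaluate, "cultural park", threshold=80.0
    )
    print(final, final_score, scores)
```

Keeping the agents behind simple callables makes the stopping rule easy to inspect separately from any particular LLM backend.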

2.2. Sustainability-Focused Knowledge Base Construction

Unlike conventional RAG systems that rely on singular, generic knowledge sources, the knowledge base in this framework is purposefully designed as two parallel, specialized repositories: an Aesthetic Precedent Knowledge Base and an Ecological and Technical Norms Knowledge Base (Table 1). This dual structure serves as the fundamental prerequisite for the subsequent “Generative-Critical” multi-agent workflow to achieve balanced decision-making.

2.2.1. Aesthetic Precedent Knowledge

The function of this module is to provide the AI with rich, high-quality design inspiration and tacit knowledge, addressing the challenges of aesthetic bias and creative homogenization in AIGC. This ensures that the AI’s aesthetic inspiration is sourced from excellent, real-world built projects rather than from random, unverified knowledge. The specific processing workflow is as follows (Figure 2):

(1): Data collection: This study selected the professional landscape design portal Gooood.cn as the primary data source. This platform was chosen for two main reasons: (a) Professional authority: Gooood.cn is a globally leading (top-five) architecture and landscape design portal whose published projects are curated by professional editors and include landscape cases associated with renowned organizations and firms such as ASLA and AECOM, representing a high standard of contemporary design; (b) High-quality content: The collected cases typically contain complete design concepts and rich textual descriptions, providing abundant material for knowledge extraction [29]. We employed automated scripts to systematically acquire cases covering a wide range of design scenarios (e.g., urban parks, waterfronts, campus landscapes, and cultural heritage sites), ensuring the knowledge base’s diversity and representativeness (Figure 3).
(2): Knowledge structuring: The raw HTML text acquired directly from the web contains substantial non-semantic noise (e.g., advertisements, navigation links, copyright notices). This noise can severely contaminate the knowledge base, causing the AI to generate hallucinations or off-topic responses during RAG retrieval. To resolve this, we utilized a high-performance LLM (i.e., Qwen-turbo) as a data processing engine. Through carefully designed prompt engineering, we achieved semantic filtering and text structuring. This LLM agent automatically performs the following tasks: (a) semantic filtering: identifying and removing all text noise irrelevant to the design description; (b) language standardization: retaining only Chinese content to ensure corpus consistency; and (c) content structuring: reorganizing unstructured long-form descriptions into a structured Markdown format (e.g., Project Overview, Design Concept, Node Details) (Table 2).
(3): Update mechanism: To ensure the aesthetic and precedent knowledge base continuously reflects the latest design trends and practices, we implemented scheduled incremental crawling and duplicate content filtering. A script is set to perform incremental crawls of new landscape cases on Gooood.cn at a low frequency of once per month. This low-frequency strategy allows for capturing newly published excellent cases while effectively avoiding the triggering of anti-scraping mechanisms on the target website. Before new data are imported, the titles and content of new cases are compared against the existing database to automatically identify and filter out duplicate or highly similar entries, ensuring the purity of the knowledge base.

This LLM-based semantic processing method demonstrates significantly higher robustness and accuracy compared to traditional rule-based matching using regular expressions (Regex). It can understand context and preserve the complete design logic, providing a high-quality, clean data foundation for subsequent semantic chunking and vectorization.
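
The duplicate-filtering step of the update mechanism can be sketched as follows. This is only an illustration using Python's standard `difflib`; the source does not say how similarity is measured, so the 0.9 similarity threshold, the dictionary field names, and the function names are all assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, List

Case = Dict[str, str]  # assumed record shape: {"title": ..., "content": ...}


def is_duplicate(new_case: Case, existing: List[Case], threshold: float = 0.9) -> bool:
    """Return True if the title matches or the content is highly similar."""
    new_title = new_case["title"].strip().lower()
    for old in existing:
        if new_title == old["title"].strip().lower():
            return True
        ratio = SequenceMatcher(None, new_case["content"], old["content"]).ratio()
        if ratio >= threshold:  # assumed cut-off for "highly similar"
            return True
    return False


def merge_new_cases(existing: List[Case], incoming: List[Case],
                    threshold: float = 0.9) -> List[Case]:
    """Append only those incoming cases that are not duplicates."""
    merged = list(existing)
    for case in incoming:
        if not is_duplicate(case, merged, threshold):
            merged.append(case)
    return merged


if __name__ == "__main__":
    db = [{"title": "Riverside Park",
           "content": "A linear park along the river with terraced wetlands."}]
    new = [{"title": "Campus Courtyard",
            "content": "Permeable paving and native planting in a university courtyard."}]
    print([c["title"] for c in merge_new_cases(db, new)])
```

Pairwise comparison is quadratic in the number of cases, which is acceptable for a knowledge base of roughly a hundred precedents updated monthly.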

Figure 2. Data collection and optimization workflow for the Aesthetic Precedent Knowledge Base.

This figure illustrates the data collection and preprocessing workflow for the Aesthetic Precedent Knowledge Base, using Gooood.cn as an example. The system locates relevant case pages via keywords (e.g., “urban park”) and extracts project metadata (title, URL, content) through parsing. Subsequently, an LLM is used for structural rewriting of the raw descriptions, ultimately generating structured data with high information density and a uniform format. The symbols “###” and “**” appearing in the generated text represent Markdown syntax used for structural formatting. This data is then saved to a data storage module, preparing it for subsequent vectorization. Additionally, the ellipsis indicates that the full text has been omitted for brevity and clarity. (Sources of the example data: https://www.gooood.cn/umea-campus-park.htm?lang=zh_CN and https://www.gooood.cn/the-urban-carpet-by-atelier-scale.htm, accessed on 30 October 2025).

Figure 3. Distribution of aesthetic precedent types.

This pie chart illustrates the thematic distribution of the 108 aesthetic precedents collected in the knowledge base. The cases cover a diverse range of landscape types to ensure the representativeness and breadth of the inspiration source.

2.2.2. Ecological and Technical Norms

This module is central to the framework. Its function is to provide the AI with explicit knowledge—specifically, the complex, technical, and performance-based evidence for sustainability. It directly addresses the knowledge blindness of AIGC and serves as the source of objective criteria for the Evaluation Agent’s internal review. The data sources for this module are not aesthetic precedents but rather recognized, authoritative sustainability design standards and technical guidelines. This study integrated 12 core normative texts (Table 3). The selection of these norms followed these principles:

(1): Authority and universality: Priority was given to internationally recognized authoritative standards such as The Sustainable SITES Initiative (SITES) and LEED for Neighborhood Development (LEED-ND). SITES is the most comprehensive globally recognized rating system for sustainable landscapes, and its criteria regarding water management, soil and vegetation, materials selection, and human health and well-being provide the direct theoretical basis for our sustainability scorecard [30]. Meanwhile, LEED-ND enables the AI to look beyond the site boundary and consider factors at a broader urban planning scale, such as land use, transportation connectivity, and green infrastructure networks, ensuring a high-level design perspective [31,32].
(2): Localized adaptability: Key Chinese national standards, such as the Technical Guidelines for Sponge City Construction [33] and the Standard for Urban Green Space Planning [34], were included because sustainable design must be closely integrated with regional environmental challenges and regulatory requirements. The Technical Guidelines for Sponge City Construction provides systematic solutions for stormwater issues in China’s high-density cities, including specific design parameters for bioretention facilities and grassed swales—localized practical knowledge not detailed in international standards like SITES. This ensures that the AI’s proposals not only align with advanced international concepts but also effectively address local climate and policy contexts, enhancing the framework’s regional applicability.
(3): Comprehensive scale coverage: The knowledge base covers multiple scales, from macro-level community planning and meso-level site-wide design to micro-level specific technical details. This multi-scale knowledge integration enables the AI to reason about projects of varying scales.

Furthermore, it is crucial to keep the ecological and technical knowledge base synchronized with the latest industry standards. Because laws, regulations, and technical specifications are updated infrequently and only through official releases, we manually check the official publication channels for all norms in this knowledge base every six months to verify their version status. If major revisions or new policies or standards are found, the new documents are manually downloaded and the knowledge base content is updated.

2.2.3. RAG-Based Knowledge Base Construction

This study employed RAG technology to construct the knowledge base, implementing a complete workflow from data processing and semantic chunking to vector embedding and storage (Figure 4).

(1): Semantic chunking strategy: To ensure the precision of RAG retrieval, this study adopted a semantic-based adaptive chunking strategy. Traditional fixed-length chunking methods can crudely sever semantic continuity (e.g., splitting a single SITES credit across two text chunks), leading to fragmented retrieval results. Our method employs multi-level semantic boundary detection, combined with Jieba for Chinese lexical analysis and punctuation marks (e.g., 。, !, ?, ;, :) as delimiters, to achieve precise sentence- or paragraph-level segmentation. While ensuring semantic integrity, this method dynamically adjusts chunk size and overlap to maintain the coherence of complex contexts. Each chunk retains rich metadata, such as its source, chapter, URL, and chunk ID, to support precise traceability and XAI presentation later on.
(2): Vector embedding and storage: All cleaned and chunked text data were vectorized using a BERT-based embedding model optimized for Chinese (text2vec-base-chinese). This model is capable of capturing the deep semantic relationships between landscape architecture terms. All vectors are stored in FAISS (Facebook AI Similarity Search), a high-performance library for vector similarity search. We employed an index that combines an inverted file index with vector quantization to support efficient similarity search, optimizing storage and query efficiency while maintaining retrieval accuracy.
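
The sentence-boundary chunking idea in item (1) can be illustrated with a simplified sketch. It uses only punctuation delimiters and omits the Jieba lexical analysis mentioned above; the delimiter set, `max_chars`, the one-sentence overlap, and the metadata fields are assumptions chosen for illustration.

```python
import re
from typing import Dict, List, Union

# Assumed delimiter set: full-width and ASCII sentence-ending punctuation.
SENTENCE = re.compile(r"[^。！？；：!?;]+[。！？；：!?;]?")

Chunk = Dict[str, Union[int, str]]


def split_sentences(text: str) -> List[str]:
    """Split text into sentences, keeping each closing punctuation mark."""
    return [s.strip() for s in SENTENCE.findall(text) if s.strip()]


def chunk_text(text: str, source: str, max_chars: int = 200,
               overlap_sentences: int = 1) -> List[Chunk]:
    """Group whole sentences into chunks; sentences are never cut in half."""
    chunks: List[Chunk] = []
    current: List[str] = []

    def flush() -> None:
        chunks.append({"chunk_id": len(chunks), "source": source,
                       "text": "".join(current)})

    for sent in split_sentences(text):
        if current and sum(len(s) for s in current) + len(sent) > max_chars:
            flush()
            # Carry the last sentence(s) over so adjacent chunks share context.
            current = current[-overlap_sentences:] if overlap_sentences > 0 else []
        current.append(sent)
    if current:
        flush()
    return chunks


if __name__ == "__main__":
    demo = "雨水花园收集径流。透水铺装减少积水。乡土植物降低维护成本。"
    for c in chunk_text(demo, source="demo", max_chars=20):
        print(c)
```

Each chunk carries its own `chunk_id` and `source`, which is the kind of metadata the traceability features rely on.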

Ultimately, by integrating over 100 aesthetic precedents and 12 core ecological and technical norms, this sustainability-focused knowledge base constitutes a specialized brain that fuses aesthetic inspiration with ecological constraints. It provides a solid, reliable, and balanced knowledge foundation for the subsequent multi-agent workflow.

Figure 4. Workflow for the construction of the dual knowledge base and the RAG process.

This figure illustrates the core technical architecture of the system. The upper section depicts the knowledge base construction (vector storage) process, where the system first acquires data from two heterogeneous sources: the Aesthetic Precedent Knowledge Base and the Ecological and Technical Norms Knowledge Base. After preprocessing, cleaning, and semantic chunking, the data is fed into an embedding model to be converted into high-dimensional vectors. These vectors, along with a BM25 index, are then stored in a vector database to complete the construction of the dual knowledge base. The lower section illustrates the RAG process: when a user inputs a query, the system similarly vectorizes it and performs a hybrid search in the database to retrieve the most relevant document chunks. These chunks are re-ranked and then combined with the original query to form an augmented prompt. This prompt is finally fed to a large language model to generate a final answer that is infused with professional knowledge and is traceable.

2.3. The “Generative-Critical” Multi-Agent Workflow

2.3.1. Generation Agent

The Generation Agent plays the role of the designer within the “Generative-Critical” core. Its core responsibility is to respond to the user’s initial requirements and undertake the primary creative generation tasks. Its task is to first access the pre-built dual knowledge base via retrieval augmentation, and then synthesize all input information to generate a structured initial conceptual scheme draft (Figure 5). This draft serves as the subject for review by the subsequent Evaluation Agent. The agent’s workflow is designed as a multi-stage reasoning process, comprising three key mechanisms:

(1): Cross-modal contextual understanding: As the designer, the agent must first precisely comprehend the task. It is designed to receive and process heterogeneous data inputs, including textual design briefs and drawings of the site’s existing conditions, performing both textual context parsing and visual context translation:

Textual context parsing: The agent parses the design brief to structurally extract the project’s core constraints, such as design goals, functional requirements, site boundaries, and style preferences.
Visual context translation: This study uses Qwen-VL-Max for the spatial-semantic translation of the site’s existing condition plan. Using customized prompts, the visual language model translates topographic features and surrounding environmental elements (e.g., water bodies, roads) from the drawings into precise natural-language descriptions.

The product of this mechanism is a unified, context-rich problem description that integrates the explicit requirements from the text and the implicit conditions from the images, laying a solid foundation for subsequent generation.

(2): Dual knowledge retrieval and fusion: After clearly defining the problem, the agent enters the information gathering and inspiration phase. To ensure the professionalism and innovativeness of the generated scheme, the agent performs two types of knowledge retrieval tasks in parallel. The first is external world knowledge retrieval, where the study leverages the embedded web search capability of the DeepSeek-R1 LLM to query dynamic external information such as the project site’s climate, history, and master plans, ensuring the scheme’s site-specificity. The second is internal professional knowledge base retrieval, where the agent queries the dual knowledge base. To simultaneously satisfy the needs for technical precision and inspirational relevance, this study adopted a hybrid retrieval strategy:

Keyword-based sparse retrieval (BM25): This method excels at exact matching of professional terminologies, ensuring that the hard constraints from technical norms are recalled. This algorithm is a probabilistic retrieval model based on term frequency-inverse document frequency (TF-IDF), with its relevance score calculated as follows [35]:

$$\mathrm{Score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_{i}) \cdot \frac{f(q_{i}, D) \cdot (k_{1} + 1)}{f(q_{i}, D) + k_{1} \cdot \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)} \tag{1}$$

where $f(q_{i}, D)$ represents the frequency of the query term $q_{i}$ in document $D$, $|D|$ is the length of document $D$, $\mathrm{avgdl}$ is the average document length, and $k_{1}$ and $b$ are tuning parameters.
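
Equation (1) can be made concrete with a small pure-Python sketch. The source does not define the IDF term or give values for the tuning parameters, so the smoothed IDF used here and the defaults `k1=1.5` and `b=0.75` are assumptions (common choices, not the authors' settings). Documents are assumed to be pre-tokenized and the corpus non-empty.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query_terms: List[str], docs: List[List[str]],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score every tokenized document against the query with Equation (1)."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores: List[float] = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for q in query_terms:
            f = tf[q]
            if f == 0:
                continue
            # Assumed smoothed IDF variant; it stays positive for all terms.
            idf = math.log((n_docs - df[q] + 0.5) / (df[q] + 0.5) + 1.0)
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores


if __name__ == "__main__":
    corpus = [
        ["bioretention", "pond", "stormwater"],
        ["plaza", "paving"],
        ["stormwater", "swale", "stormwater", "runoff"],
    ]
    print(bm25_scores(["stormwater"], corpus))
```

Documents that never contain a query term score exactly zero, which is why sparse retrieval suits exact matching of technical terms.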

Semantic-based dense retrieval: This method uses a Chinese semantic model (text2vec-base-chinese) to convert text into 768-dimensional vectors. It excels at understanding abstract concepts to recall semantically relevant aesthetic precedents for inspiration. Retrieval is performed by measuring the semantic distance between the query vector and document vectors using cosine similarity, calculated as [36]

$$\mathrm{similarity}(A, B) = \frac{A \cdot B}{\|A\| \cdot \|B\|} \tag{2}$$

where $A$ and $B$ are two vectors representing the query text and the knowledge base document, respectively; $A \cdot B$ is their dot product; and $\|A\|$ and $\|B\|$ are the vector lengths. This formula measures the cosine of the angle between two vectors in vector space, with a value closer to 1 indicating higher semantic similarity.
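
As a quick illustration of Equation (2), the following sketch computes cosine similarity for plain Python sequences. Returning 0.0 for a zero-length vector is a convention assumed here, not something the source specifies; real embeddings would be 768-dimensional rather than the short toy vectors shown.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between vectors a and b, as in Equation (2)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed convention for zero vectors
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    query = [0.2, 0.7, 0.1]
    doc = [0.4, 1.4, 0.2]  # same direction as the query, different length
    print(cosine_similarity(query, doc))
```

Because only the angle matters, two vectors pointing in the same direction are treated as equally similar regardless of their magnitudes.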

This study fuses the results of both retrieval methods using an empirically validated optimal weight ratio of 0.4:0.6, calculating a combined score that is then sorted in descending order:

$$\mathrm{Score}_{\mathrm{combined}} = 0.4 \times \mathrm{Score}_{\mathrm{BM25}} + 0.6 \times \mathrm{Score}_{\mathrm{vector}} \tag{3}$$

Finally, a relevance threshold of 0.75, determined through extensive experimental tuning, is applied to filter the results. We found that below this threshold, the relevance of retrieved knowledge chunks to the query drops significantly, easily introducing noise. Conversely, a higher threshold risks recalling too few chunks, failing to provide sufficient context to inspire design creativity. Therefore, 0.75 represents an optimal balance between ensuring knowledge relevance and richness. By complementing the advantages of both retrieval methods, this process provides the agent with high-quality, structurally balanced knowledge evidence.
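
The fusion and filtering steps can be sketched as follows. Raw BM25 scores are unbounded, and the source does not say how they are brought onto the same scale as cosine similarity, so the min-max normalization below is an assumption, as is applying the 0.75 threshold to the combined score. The function and parameter names are hypothetical.

```python
from typing import List, Sequence, Tuple


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fuse_and_filter(
    chunk_ids: Sequence[str],
    bm25: Sequence[float],
    vector: Sequence[float],
    w_bm25: float = 0.4,
    w_vector: float = 0.6,
    threshold: float = 0.75,
) -> List[Tuple[str, float]]:
    """Combine both scores with the 0.4:0.6 weights, filter, sort descending."""
    bm25_norm = min_max(bm25)  # assumed normalization step
    combined = [
        (cid, w_bm25 * s + w_vector * v)
        for cid, s, v in zip(chunk_ids, bm25_norm, vector)
    ]
    kept = [(cid, score) for cid, score in combined if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    ranked = fuse_and_filter(["a", "b", "c"], [4.0, 3.0, 0.0], [0.7, 0.95, 0.95])
    print(ranked)
```

In this toy run, chunk "c" has high semantic similarity but no keyword overlap, so its combined score falls below the threshold and it is dropped.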

(3): Prompt-guided draft generation: This is the final stage of the Generation Agent’s work. The agent is responsible for synthesizing and refining the outputs from the previous two mechanisms—the problem description and the knowledge evidence—to generate the first version of the conceptual scheme draft. This process relies on a structured prompt template. The template not only instructs the agent to reference precedents and output a complete scheme with sections like concept theme, functional zoning, and node design, but more critically, it explicitly requires the agent to attempt to integrate the retrieved sustainability knowledge, such as LID facilities and the application of native plants.

By simulating a designer’s cognitive chain of “comprehend-analyze-retrieve-synthesize,” the Generation Agent transforms unstructured user requirements into a structured scheme draft that contains specific design content and represents an initial attempt at integrating sustainability considerations. This draft is then immediately passed to the Evaluation Agent, entering the next phase of the “Generative-Critical” workflow: critical review.

Figure 5. Construction workflow of the generation agent for landscape conceptual schemes.

This figure details the internal working mechanism of the Generation Agent, which is designed to produce an initial conceptual scheme that balances aesthetics and ecology. The process begins with cross-modal contextual understanding, where the agent processes both the design brief and the site’s existing condition plan in parallel. The analysis results then trigger a dual knowledge retrieval, where the system performs online searches for external information based on instructions and executes a hybrid search on the pre-built dual knowledge base. Finally, it generates a structured landscape conceptual scheme (Draft V1.0), which is subsequently submitted to the Evaluation Agent for review. (Source of the example data: https://www.gooood.cn/nanjing-tangshan-quarry-park-china-by-zt-studio.htm, accessed on 30 October 2025).

2.3.2. Evaluation Agent

The Evaluation Agent plays the role of the internal Expert Critic within the “Generative-Critical” multi-agent core. Its core task is to receive the initial conceptual scheme draft submitted by the Generation Agent and conduct a rigorous, objective sustainability performance assessment. The agent’s evaluation mechanism is not based on vague, subjective judgments but on a clear, quantitative evaluation framework: the sustainability scorecard. To construct this scorecard, this study drew upon internationally recognized professional rating systems like SITES and distilled them into four core dimensions that are most operable and assessable at the landscape conceptual design stage: ecological resilience, hydrological performance, social and human well-being, and resource and material efficiency [30].

In its workflow, the Evaluation Agent first utilizes the text comprehension and information extraction capabilities of an LLM to review the scheme draft item by item against the specific criteria in the scorecard (Table 4). It seeks descriptions of specific design strategies, technical measures, or material applications that support sustainability goals. Based on the draft’s degree of compliance with these criteria, the agent generates an individual score between 0 and 100 for each of the four dimensions. Subsequently, the system calculates the scheme’s overall sustainability score using the following weighted sum formula:

S_{total} = w_{eco} S_{eco} + w_{water} S_{water} + w_{social} S_{social} + w_{material} S_{material}

(4)

where $S_{total}$ is the overall sustainability score of the draft, which is compared against the acceptance threshold to decide whether the subsequent refinement loop is triggered; $S_{eco}$, $S_{water}$, $S_{social}$, and $S_{material}$ are the individual scores for the four dimensions of ecological resilience, hydrological performance, social and human well-being, and resource and material efficiency, respectively; and $w_{eco}$, $w_{water}$, $w_{social}$, and $w_{material}$ are the weight coefficients for each dimension, satisfying $\sum w = 1$.

To enhance the specificity of the assessment, the agent is equipped with a project type identification module. Before evaluation, the agent first identifies the project type from the design brief (e.g., ecological wetland park, community pocket park) and then automatically selects a matching set of weight coefficients from a predefined configuration library (Table 5) to reflect the varying emphasis on sustainability goals across different projects.
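A minimal sketch of the weighted scoring and weight lookup might look like this. The dictionary layout, the key names, and the function `total_score` are hypothetical; only the cultural-park weights are filled in, taken from the case study in Section 3.1.3, because the remaining rows of Table 5 are not reproduced in this section.

```python
# Hypothetical configuration library keyed by project type (cf. Table 5).
WEIGHT_LIBRARY = {
    "cultural_park": {"eco": 0.3, "water": 0.2, "social": 0.4, "material": 0.1},
}


def total_score(dimension_scores, project_type, library=WEIGHT_LIBRARY):
    """Weighted sum of the four dimension scores, as in Eq. (4)."""
    weights = library[project_type]
    if set(dimension_scores) != set(weights):
        raise ValueError("scores and weights must cover the same dimensions")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    for name, value in dimension_scores.items():
        if not 0 <= value <= 100:
            raise ValueError(f"score for {name!r} must lie in [0, 100]")
    return sum(weights[name] * dimension_scores[name] for name in weights)
```

Validating that the weights sum to 1 keeps the total on the same 0 to 100 scale as the individual dimension scores, so it remains comparable with the acceptance threshold.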

2.3.3. The “Critique-and-Refine” Loop

The “Critique-and-Refine” loop is the operational mechanism and the soul of the “Generative-Critical” multi-agent core. It connects the Generation Agent and the Evaluation Agent into an automated, self-optimizing closed-loop system. This mechanism simulates the real-world design process where a scheme is iteratively refined between a Designer and an Expert Critic, with the goal of ensuring that the final output scheme meets preset high-standard sustainability requirements. The workflow of this loop is designed as a score-driven, automated iterative process (Figure 6):

(1): Generation: The loop begins. The Generation Agent receives the user’s initial requirements and generates the first version of the conceptual scheme draft (V1.0).
(2): Evaluation: The draft (V1.0) is automatically submitted to the Evaluation Agent. The agent reviews it based on the sustainability scorecard (Table 4) and dynamic weights (Table 5), calculating the overall sustainability score, $S_{total}$.
(3): Judgment: The system compares $S_{total}$ with a preset sustainability acceptance threshold. In this study, the threshold is set to 80. This value, determined empirically, represents a scheme quality level transitioning from good to excellent, indicating that the scheme has achieved a sufficient and balanced consideration of all key sustainability dimensions, rather than merely meeting minimum standards.
(4): Critique and feedback generation: If $S_{total}$ < 80 (not met), the Evaluation Agent not only returns a low score but also generates specific, actionable feedback, i.e., a critical review.
(5): Regeneration: This critical review serves as a new, high-priority instruction that is fed back to the Generation Agent along with the original user requirements. The agent then initiates a revision task, focusing on optimizing and rewriting the scheme to address the deficiencies pointed out in the review (e.g., adding permeable pavements), thus generating draft V2.0. This new version is then returned to step (2) for a new round of evaluation.
(6): Output on acceptance: If $S_{total}$ ≥ 80 (met), the loop terminates. The system determines that the current version of the scheme (e.g., V2.0) has met the sustainability standards. The accepted scheme, along with its final sustainability scorecard (e.g., $S_{total}$ = 85), is then output to the user as the final result.

To prevent infinite loops caused by significant scheme deficiencies or overly strict evaluation criteria, this study introduces a maximum iteration count, $N_{max}$ (e.g., $N_{max}$ = 3), as a breaker mechanism. If the scheme still fails to reach the score of 80 after 3 iterations, the system halts the loop and outputs the current best-scoring scheme along with its complete critical review, prompting for human intervention.
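The control flow of steps (1) to (6), together with the iteration cap, can be sketched as a short loop. This is an illustration under assumptions: `generate` and `evaluate` are hypothetical callables standing in for the two agents, and `n_max` is read here as the total number of generate-evaluate rounds, which the text leaves open.

```python
def critique_and_refine(requirements, generate, evaluate, threshold=80.0, n_max=3):
    """Score-driven loop: regenerate until the threshold is met or n_max rounds pass.

    generate(requirements, critique) -> draft   (critique is None on the first round)
    evaluate(draft) -> (score, critique)
    """
    if n_max < 1:
        raise ValueError("n_max must be at least 1")
    best = None       # best-scoring (score, draft, critique) seen so far
    critique = None
    for iteration in range(1, n_max + 1):
        draft = generate(requirements, critique)
        score, critique = evaluate(draft)
        if best is None or score > best[0]:
            best = (score, draft, critique)
        if score >= threshold:
            return {"accepted": True, "draft": draft, "score": score,
                    "critique": critique, "iterations": iteration}
    # Breaker: hand the best attempt and its review back for human intervention.
    return {"accepted": False, "draft": best[1], "score": best[0],
            "critique": best[2], "iterations": n_max}
```

Tracking the best-scoring draft separately matters because a later revision is not guaranteed to score higher than an earlier one.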

This figure illustrates the “Critique-and-Refine” loop mechanism within the “Generative-Critical” core. After the Generation Agent creates a draft, the Evaluation Agent calculates the score $S_{total}$.

2.4. Human–Computer Interaction Workflow

2.4.1. Expert Consultation Agent

The human–computer interaction (HCI) workflow serves as the framework’s user-facing front end. It is responsible for handling further user interactions after an accepted scheme is obtained, transforming the system from an automated design tool into a consultative and exploratory design assistant. The core of this workflow is the Expert Consultation Agent.

The Expert Consultation Agent functions as the system’s conversational memory and explanation hub. Its primary task is to handle the unstructured, often unpredictable, follow-up queries from users after they receive the final scheme, such as the following:

Evaluation explanation: “Why was the score for hydrological performance low?”
Knowledge deep-dive: “What specific ecological restoration measures were taken in the referenced Nanjing Tangshan Quarry Park?”
Hypothetical reasoning: “If I replace the entrance plaza paving with permeable bricks, how much will the hydrological performance score increase?”

The core challenge facing this mechanism is the trade-off between conversational fluidity and computational rigor. The system must be able to fluently comprehend the context while ensuring that, when responding to hypothetical reasoning, the scores are genuinely recalculated rather than hallucinated. To resolve this conflict, this study designed a three-stage response mechanism termed “Primary-Verification-Integrator” (Figure 7):

(1): Primary agent: This agent acts as the conversational interface, receiving the user’s query and relevant context, such as the final scheme text and the overall evaluation score. It prioritizes understanding user intent and ensuring conversational flow, generating a complete, colloquial preliminary response draft. This draft is not directly shown to the user but is intercepted internally.
(2): Verification agent: Serving as the fact-checking center, this agent intercepts the preliminary response and verifies its key information. This agent performs two tasks: (a) Fact-checking: It calls the knowledge base retriever to cross-reference any cited cases or norms for accuracy. (b) Calculation-checking: If the query involves hypothetical reasoning, such as replacing existing paving with permeable bricks, it activates the Evaluation Agent to recalculate a new, precise score for the modified scheme fragment.
(3): Integrator agent: Acting as the final synthesizer, this agent merges the fluent draft from the Primary Agent with the precise data packet (including corrections and new scores) from the Verification Agent to generate a final response that is both rigorous and fluid.
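The three stages above can be sketched as a small pipeline. Every callable here (`primary`, `check_facts`, `rescore`, `integrate`, `is_hypothetical`) is a hypothetical stand-in for an LLM call or a retriever, named only for illustration; the point is the ordering and the fact that a score is recomputed, not taken from the draft.

```python
def consult(query, context, primary, check_facts, rescore, integrate, is_hypothetical):
    """Primary -> Verification -> Integrator response pipeline (illustrative sketch)."""
    # Stage 1: fluent draft, intercepted internally and never shown to the user.
    draft = primary(query, context)
    # Stage 2a: cross-reference cited cases and norms against the knowledge base.
    corrections = check_facts(draft)
    # Stage 2b: for what-if queries, recompute the score instead of trusting the draft.
    new_score = rescore(query, context) if is_hypothetical(query) else None
    # Stage 3: merge the fluent draft with the verified data packet.
    return integrate(draft, corrections, new_score)
```

Keeping the recalculation behind an explicit branch means that explanatory questions do not pay the cost of a full re-evaluation.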

This mechanism ensures the high reliability of the system when functioning as an expert assistant. It transforms the system from a one-time delivery tool into a collaborative design partner that supports in-depth exploration.

Figure 7. The “primary-verification-integrator” mechanism of the expert consultation agent.

This figure shows the response mechanism designed for the Expert Consultation Agent to resolve the conflict between conversational fluidity and computational rigor, based on the “Primary-Verification-Integrator” model.

2.4.2. Human–Computer Interaction Interface

The HCI interface is the visual front end for user interaction with the multi-agent system. To ensure a fluid user experience and functional focus, the page layout is divided into three core functional areas: an input area, a scheme display and evaluation area, and a dialogue interaction area. These three areas work synergistically to support the entire workflow, from data input and scheme generation to interactive optimization (Figure 8).

(1): Input area: Located on the left side of the interface, this serves as the starting point of the workflow. Here, users can upload design briefs and site existing condition plans in various formats. After the user clicks the “Start Generation” button, the system invokes the Generation Agent to perform multimodal data parsing and reasoning analysis. Intermediate states during the generation process (approx. 5–8 min) are displayed here to provide feedback to the user, ensuring efficient processing of multimodal inputs.
(2): Scheme display and evaluation area: Positioned in the center of the interface, this is the presentation area for core deliverables. When the “Critique-and-Refine” loop outputs an accepted scheme, this area automatically presents two core components: (a) Detailed scheme content: This includes the detailed text of the landscape conceptual scheme, covering the concept theme, functional zoning, node designs, and links to reference cases; (b) sustainability scorecard: Directly linked to the Evaluation Agent, this module clearly displays the final overall sustainability score and the detailed scores for each sub-dimension in a tabular format.
(3): Dialogue interaction area: Located on the right side, this serves as the front-end carrier for the Expert Consultation Agent. In this multi-turn dialogue window, users can pose explanatory questions or propose hypothetical reasoning and modification requests regarding the generated scheme content. The agent generates professional answers based on the scheme data and conversation history. If a modification is involved, the dynamically updated results are fed back in real-time to both the dialogue window and the scheme display area, ensuring the immediacy and relevance of the interaction.

In summary, through the synergistic design of these three functional areas, the HCI interface achieves a closed-loop operational process from initial input, scheme presentation, and quantitative evaluation to dynamic optimization. This interface design not only enhances user engagement and operational convenience but also boosts the adaptability and personalization of design schemes through its intelligent interaction mechanism, providing effective support for intelligent and user-oriented landscape design.

Following conventional user habits, we have clearly divided the interaction interface from left to right into an input area, a scheme display area, and a dialogue interaction area. This layout facilitates a closed-loop interaction process from task input and results evaluation to dialogue-based refinement.

In summary, following the design science research paradigm, this study constructs an AI decision-support framework that integrates a dual knowledge base, a “Generative-Critical” multi-agent workflow, and an explainable human–computer interaction interface. This framework offers a novel technological path for resolving the challenge of integrating ecological function and aesthetic expression in landscape design. To scientifically validate the practical value of this framework, the next section demonstrates its application process through a specific case study of a cultural park design. It will also report the detailed results of our effectiveness evaluation experiments, including a system usability test and a double-blind expert review, to quantitatively prove its efficacy in enhancing the sustainability of design proposals.

Figure 8. The human–computer interaction interface for sustainable landscape scheme generation.

3. Results

3.1. Case Study: Sustainable Cultural Park Design

To validate the effectiveness, robustness, and intelligence level of the multi-agent framework constructed in this study, we selected the “Xizhu Town Baita Cultural Theme Park” project as a case study. This section details the framework’s entire process, from task input and knowledge retrieval to iterative generation and human–computer interaction.

3.1.1. Case Background and Task Input

The project aims to develop a comprehensive park on a planned 13.3-hectare site that integrates local cultural features with ecological and recreational functions. The design is based on the existing cultural buildings, such as the “Yu Ling Cultural Memorial Hall” and “Yiren Academy.” Via the HCI interface, the designer uploaded the design brief, which emphasized the principles of cultural significance and ecological integrity, along with a site existing condition plan that included elevation points (Figure 9).

Upon receiving the task, the Generation Agent was initiated. Its visual analysis module read the site existing condition plan and the design brief, identifying key spatial features: the site’s topography exhibits a general trend of being high in the northwest and low in the southeast (with elevation points at 11.0 m in the northwest, 9.0 m in the northeast, and 8.0 m in the southwest, indicating the southeast corner is the lowest area). Concurrently, it noted that water resources on the site are relatively scarce, with no significant surface water bodies identified. This presented both a challenge and an opportunity for low-impact development (LID) design.

Figure 9. Case study background.

3.1.2. Knowledge Base Retrieval and Association

To acquire sufficient knowledge, the agent performed a dual knowledge retrieval. The external search identified the local cultural IPs as Yixing Zisha pottery and Yu Ling (a playwright). The internal knowledge base retrieval recalled SITES specifications, Sponge City guidelines, and aesthetic precedents, such as “The Orchestra Park” [37]. The system recognized this precedent’s success in translating abstract cultural symbols (the melodies of Jiangnan Sizhu music) into tangible landscape forms (flowing, curved pathways), providing inspiration for an integrated aesthetic-ecological approach for the current project. The specific retrieval and association results are shown in Table 6.

This figure displays the core content of the case study’s design brief and the site’s existing condition plan.

3.1.3. Empirical Validation of the “Critique-and-Refine” Loop

Subsequently, the system activated the “Critique-and-Refine” loop, set the sustainability acceptance threshold to T = 80, and initiated the iterative generation and evaluation of the scheme. Based on the design brief’s emphasis on culture and ecology, the Evaluation Agent classified the project type as a cultural park and invoked the corresponding preset weights from Table 5 ($w_{eco} = 0.3$, $w_{water} = 0.2$, $w_{social} = 0.4$, $w_{material} = 0.1$).

(1): First-Round generation (V1.0):

Knowledge-driven: The Generation Agent first responded to the cultural themes and aesthetic precedents from the knowledge base (Table 6). Inspired by the formal translation of abstract cultural symbols in “The Orchestra Park” [37], the agent attempted a strong visual symbol translation of the cultural imagery of “Yixing Zisha” and “Yu Ling’s plays.”

Scheme generation: This led to the first version of the conceptual scheme (V1.0), which prioritized the cultural principles and the designer’s potential preference for aesthetic form. The scheme, themed “Literary Charm and Ink Fragrance,” pursued a strong sense of visual symbolism in its aesthetic form. Drawing on the technique of morphing cultural rhythms into form from “The Orchestra Park,” the scheme designed an “Ink Mark” pathway. This path, using dark polished granite, mimicked the sharp, flowing lines of calligraphy to connect the three dispersed cultural pavilions. At the site’s high point in the northwest, a viewing platform was set up using the topographical advantage, serving as the park’s prologue and entrance marker. In the vast, gentle slopes of the west, the existing farmland texture was treated as land art, forming large striped color blocks with ornamental grasses of different hues. In the center of the northeastern building cluster, a large-scale “Zisha Seal” plaza was planned, paved with solid Zisha-red granite, intended to create a “land-seal” with a strong sense of volume and cultural symbolism. The low-lying area in the east of the site was simply treated as an expansive sun-drenched lawn (Figure 10).

(2): First-round evaluation and critical feedback:

Evaluation results: The V1.0 scheme was automatically submitted to the Evaluation Agent. The evaluation of V1.0 (Table 7) revealed that the scheme excelled in the social and human well-being dimension ($S_{social} = 90$) due to its strong aesthetic form and cultural theme, but scored extremely low on hydrological performance ($S_{water} = 30$) and failed in ecological resilience ($S_{eco} = 45$). The final total sustainability score was 61.5, which did not meet the threshold, thus failing the scheme.

Critical feedback: The Evaluation Agent generated specific critical feedback. It first acknowledged that V1.0 had a clear cultural theme and a complete aesthetic form but immediately pointed out its core flaw: a severe disconnection between aesthetics and ecology. Based on the Sponge City guidelines and SITES standards from the knowledge base, the agent recommended the following: (1) retrofitting the pathway and plaza for permeability while retaining their aesthetic form; (2) constructing an LID system that leverages the northwest-high, southeast-low topography; and (3) replacing the lawn in the east corner with a newly constructed bioretention pond or rainwater wetland to serve as the final collection, purification, and storage point for site runoff.

(3): Second-round iteration (V2.0):

Knowledge-driven: The Generation Agent, upon receiving the critical feedback, treated it as a top constraint. It re-queried the knowledge base, this time seeking technologies and precedents that could integrate form with permeable function. It associated the “cracks” aesthetic from “Zisha pottery” culture with technologies like vegetated swales and permeable paving from the Sponge City guidelines.

Scheme generation: The agent revised the V1.0 scheme to produce V2.0. The core of this revision was to integrate sustainable functions in a holistic design without sacrificing aesthetics. The concept was deepened to “Pottery Charm, Resilient Habitat.” The aesthetic form of the “Ink Mark” pathway was retained, but its material was replaced with a novel permeable pavement made from dark, recycled ceramic aggregates mixed with resin, maintaining the visual effect of an “ink mark” while enabling rainwater infiltration. The monolithic form of the “Zisha Seal” plaza was artistically shattered, introducing the aesthetic concept of “pottery cracks.” These cracks were designed as linear, vegetated bioswales crisscrossing the plaza. The main paving was also replaced with permeable bricks inlaid with recycled Zisha pottery shards, unifying cultural, aesthetic, and hydrological functions. Concurrently, the scheme fully leveraged the northwest—high, southeast—low topography, building a complete gravity-flow LID system: terraced rain gardens were added below the northwest entrance platform. The land art in V1.0 was transformed into a functional ecological purification zone, with multiple micro-bioretention swales on the gentle slopes. Water collected from high points flows through this zone before being channeled into the central plaza’s bioswales. The original sun-drenched lawn was redesigned as a “Five-Color Earth” themed bioretention pond, where a new core water feature was created. This retention pond not only serves as the terminus of the LID system, but its form and embankment design also echo the five natural colors of Zisha pottery, creating a rich biodiversity habitat (Figure 10).

(4): Second-round evaluation:

Evaluation results: The evaluation results for the V2.0 scheme (Table 7) showed significant improvements in the ecological, hydrological, and material dimensions. The ecological resilience score ($S_{eco}$) increased to 90, hydrological performance ($S_{water}$) to 90, and material efficiency ($S_{material}$) to 85, while the advantage in social and human well-being ($S_{social}$) was maintained at 90, resulting in a final overall score of 89.5. Since 89.5 ≥ 80, the agent determined that scheme V2.0 met the threshold, and the “Critique-and-Refine” loop terminated. The agent then pushed the V2.0 scheme and its final evaluation report to the user’s HCI interface.
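As a check on the reported totals, the weighted sum of Equation (4) can be worked through with the cultural park weights. The V2.0 line uses only values stated in the text. The V1.0 line is a back-calculation: the material score of 60 is implied by the reported total of 61.5 and is an inference here, since the text itself does not quote it (the reported value is in Table 7).

```latex
\begin{aligned}
S_{total}^{V2.0} &= 0.3 \times 90 + 0.2 \times 90 + 0.4 \times 90 + 0.1 \times 85 \\
                 &= 27 + 18 + 36 + 8.5 = 89.5 \geq 80 \\[4pt]
S_{total}^{V1.0} &= 0.3 \times 45 + 0.2 \times 30 + 0.4 \times 90 + 0.1 \times S_{material} \\
                 &= 55.5 + 0.1 \times S_{material} = 61.5
                 \;\Rightarrow\; S_{material} = 60
\end{aligned}
```

The arithmetic also shows why V1.0 failed despite its high social score: even with the largest weight (0.4), a social score of 90 contributes only 36 points, so the low ecological and hydrological scores keep the total well below 80.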

Figure 10. Comparative analysis of landscape conceptual schemes V1.0 and V2.0.

We drew this figure based on our understanding of the AI-generated schemes to illustrate the iteration process from V1.0 to V2.0: (a) V1.0 prioritized aesthetics, using large areas of impermeable pavement, which led to uncontrolled stormwater runoff and flood risk at the site’s lowest point. (b) V2.0, while retaining the aesthetic framework of V1.0, established a source-to-end gravity-flow LID system by ecologically retrofitting key nodes (plaza, pathway, land art area), transforming the flood risk into a sustainable ecological water feature.

3.1.4. Human–Computer Interactive Consultation

After the automated completion of the “Critique-and-Refine” loop, the designer began to review the final V2.0 scheme. As shown in Figure 11, the scheme display area of the HCI interface presented the text for the “Pottery Charm, Resilient Habitat” scheme, while the scorecard module displayed the final scores and details for each dimension. The designer then posed a follow-up query in the dialogue interaction area regarding the integration of aesthetics and function in the scheme: “How does the scheme balance the aesthetic form of the ‘Zisha Seal’ with hydrological functions, especially given the site’s original water scarcity?”

To respond to this query, the Expert Consultation Agent internally activates a “Primary-Verification-Integrator” three-stage response mechanism, with the detailed workflow as follows:

(1): Primary Agent: The Primary Agent first parses the user’s intent, identifying the core keywords “aesthetics” and “hydrological function.” It then quickly accesses the structured data of the V2.0 scheme, retrieving design elements and strategies directly related to these keywords. It locates key nodes such as topography utilization, bioswales, and a newly created bioretention pond. Based on the logical relationships between these nodes, it rapidly generates a natural language draft, roughly stating: “The scheme uses the terrain, breaks the seal into bioswales to channel water, and finally builds a new pond in the low area to store water.”
(2): Verification Agent: Upon receiving the draft from the Primary Agent, the Verification Agent acts as a fact-checker. It rigorously compares each argument in the draft against the original data. First, it retrieves the prior site analysis data, confirming a clear elevation difference with 11.0 m in the northwest and below 8.0 m in the southeast, thus validating the “uses the terrain” argument. Second, it traces back to the V2.0 scheme text, confirming that the “cracks” are functional vegetated bioswales. Finally, it confirms that the V2.0 scheme explicitly mentions the creation of a “bioretention pond” in the southeast corner and associates it with the “Five-Color Earth” aesthetic theme.

After verifying the facts, the Verification Agent further enriches the draft with more precise terminology and data, such as refining “breaks” to “morphological transformation,” specifying “store water” as “directs all LID runoff to the site’s lowest point,” and adding elevation point data to enhance persuasiveness.

(3): Integrator Agent: The Integrator Agent is responsible for the final polishing and formatting. It reorganizes the verified content fragments into a logically clear and professionally worded text. It adds polite and structured introductory phrases like “The scheme balances both…”, and uses a bulleted list format. Ultimately, it synthesizes the highly structured and readable response that the user sees: “The scheme balances both aspects through two design strategies: form-function integration and turning constraints into assets:

Leveraging topography: The scheme fully utilizes the site’s ‘northwest—high, southeast—low’ elevation gradient (from elevation point 11.0 m down to below 8.0 m), designing the ‘Ink Mark’ pathway as a water interception facility at the higher elevation.

Morphological transformation: The impermeable complete seal of V1.0 was upgraded to the pottery cracks form in V2.0. These cracks are designed as functional, vegetated bioswales to guide and purify runoff.

Water body creation: To address the site’s water scarcity, the V2.0 scheme directs all LID runoff to the site’s lowest point (the southeast corner), upgrading the dry lawn of V1.0 into a newly created ‘Five-Color Earth’ themed bioretention pond.”

Before answering, the Expert Consultation Agent internally activates its three-stage response mechanism: the Primary Agent generates a fluent draft; the Verification Agent checks the V2.0 scheme text, confirming keywords such as “northwest–high, southeast–low,” “cracks,” and “newly created retention pond”; and the Integrator Agent synthesizes the final reply.

To summarize, this section showcased the complete application workflow of the research framework within a cultural park design case. Commencing with a task analysis of the site’s existing conditions, the process advanced through knowledge retrieval and association from the dual knowledge base before initiating a two-round “Critique-and-Refine” loop. In this loop, an initial scheme driven by aesthetics (V1.0, with an AI internal score of 61.5) was iteratively revised, based on critical feedback from the Evaluation Agent, into a final scheme that successfully integrates ecological function and cultural aesthetics (V2.0, with an AI internal score of 89.5). The process concluded by demonstrating the framework’s ability to explain its design decisions via human–computer interactive consultation.

Figure 11. Prototype of the landscape conceptual scheme generation agent’s interface based on human–computer interaction.

This figure shows the user interface prototype for the landscape conceptual scheme generation constructed in this study, displaying the actual results of the agent-generated scheme and user consultation.

3.2. Agent Effectiveness Evaluation

This study recruited 10 participants with professional backgrounds in landscape architecture or architecture, including designers and educators. The evaluation experiment was divided into two phases: a usability test and a blind evaluation of sustainability performance.

3.2.1. Usability Assessment of the Human–Computer Interaction Interface

The system’s usability was assessed using the internationally standardized system usability scale (SUS). SUS is a 10-item Likert scale questionnaire (Table 8) used for the rapid assessment of a system’s overall usability, ease of use, and user satisfaction, with a score range of 0–100 [38]. After the 10 participants completed their tasks, their SUS questionnaires yielded an average score of 85.5 ± 7.2 (Mean ± SD). In qualitative interviews, participants commonly reported that the interface’s three-part layout was logical and that the real-time feedback from the scorecard’s evaluation was intuitive and easy to understand. According to industry benchmarks for SUS scores, an average score of 68 is considered acceptable, while scores above 80.3 are rated as excellent; the framework’s score of 85.5 therefore falls in the excellent range.
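For readers unfamiliar with how a SUS questionnaire becomes a 0–100 score, the standard scoring rule can be sketched as follows. It assumes the questionnaire in Table 8 follows the standard SUS layout, in which odd-numbered items are positively worded and even-numbered items are negatively worded, and that responses are coded 1 to 5; the function name is hypothetical.

```python
def sus_score(responses):
    """Convert ten 1-5 Likert responses into a 0-100 SUS score (standard rule)."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items contribute (r - 1); even items are reverse-scored as (5 - r).
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5
```

The reported figure of 85.5 would then be the mean of the ten participants' individual scores computed this way.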

3.2.2. Blind Evaluation of Sustainability Performance

To validate the true effectiveness of the “Critique-and-Refine” loop—that is, whether the agent’s self-iteration genuinely optimizes the scheme—this section conducted a crucial expert blind evaluation. This experiment selected two key scheme versions generated by the AI in the case study as the evaluation subjects.

For the evaluation procedure, we anonymized the scheme texts of V1.0 and V2.0, removed version numbers and internal AI scores, randomized their order, and distributed them to the 10 experts. The experts independently scored the two schemes using the identical scorecard and cultural park weights as the AI. The quantitative statistical results of the expert blind evaluation are shown in Table 9.

To test whether the differences in sustainability scores between the V1.0 and V2.0 schemes were statistically significant, this study conducted paired-samples t-tests on the total score and sub-scores. The results show that the total score and the scores for the ecology, hydrology, and material dimensions of the AI-refined scheme (V2.0) were all significantly higher than those of the baseline scheme (V1.0).
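The paired-samples t statistic behind this comparison can be sketched with the standard library alone. The function name is hypothetical, and any inputs used to try it out should be understood as toy numbers, not the experts' ratings from Table 9. The p-value is not computed here; it would come from a t distribution with n − 1 degrees of freedom, for example via a statistics package.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for a paired-samples t-test."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    # Each rater scores both schemes, so the test works on per-rater differences.
    diffs = [a - b for b, a in zip(before, after)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd_d == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    return mean_d / (sd_d / math.sqrt(n)), n - 1
```

Pairing is appropriate because the same 10 experts rated both versions, so between-rater variation cancels out in the differences.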

4. Discussion

4.1. The Mechanistic Role of the “Generative-Critical” Multi-Agent Framework

The V1.0 scheme in this case study reveals a common behavioral pattern in current large language models within professional design domains: “creative but lacking in rigor.” In its initial generation of V1.0, the AI agent demonstrated strong capabilities in morphological translation and aesthetic narrative. This was achieved through effective retrieval from its internal knowledge base, particularly its successful adaptation of the technique of formalizing cultural symbols from the Orchestra Park case. It successfully translated cultural elements like “Zisha” and “calligraphy” into tangible design languages such as the “Seal Plaza” and “Ink Mark Pathway.” However, the scheme’s sustainability evaluation score was extremely low, especially in hydrological performance ($S_{water} = 30$) and ecological resilience ($S_{eco} = 45$), directly reflecting its design flaws in addressing real-world site constraints (such as topographical differences and water scarcity). This outcome indicates that without a professional evaluation mechanism for constraint, even advanced AI might generate “formalistic green” schemes that appear to align with the design theme but functionally contradict sustainability goals.

The “Generative-Critical” framework provided a mechanistic solution to this problem in the case study. In the V1.0 scheme, the “Zisha Seal” plaza had aesthetic value, but its large impermeable pavement resulted in a very low hydrological performance score. The framework’s Evaluation Agent quantitatively identified this disconnect via the scorecard and generated precise revision directives. The final V2.0 scheme integrated hydrological function into the aesthetic form by introducing pottery-crackle style bioswales. This process shows how the framework, through internalized, code-based critique, makes ecological function a required part of the aesthetic conception, addressing the problem at a mechanistic level rather than merely raising a score. It indicates that an automated “Critique-and-Refine” feedback loop can guide AI beyond superficial stylistic imitation toward a deeper integration of aesthetic expression and ecological function.

Finally, the human–computer interaction interface, particularly the expert consultation module, provided explainability (explainable AI, XAI) and a final validation step for the entire framework. When the user challenged the V2.0 scheme with “how to balance aesthetics and hydrology on a water-scarce site,” the AI was able to clearly articulate its design logic based on the gravity-flow LID system. This not only enhanced trust between the human and the machine but also confirmed that the AI’s revisions were based on understandable, professional reasoning rather than another random generation, closing the loop on the validation of the entire iterative process.

4.2. Innovation and Generalizability of the “Generative-Critical” Multi-Agent Framework

While AIGC shows immense potential in creative fields, its application in domains requiring deep specialized knowledge, strict normative constraints, and multi-objective trade-offs (e.g., architecture, engineering) still faces challenges of frequent hallucinations and inconsistent quality [20]. The V1.0 scheme from the case study is a prime example: although aesthetically creative, it suffered from inadequate consideration of actual site conditions, resulting in poor sustainability performance.

The core theoretical innovation of this study lies in proposing and validating a “Generative-Critical” multi-agent framework. This framework introduces a domain-knowledge-driven “Critic” to constrain and guide a “Generator,” thereby achieving an internalized expert review mechanism. This mechanism simulates the crucial expert review stage of the design process, embedding it within the AI system to form an automated feedback loop. The leap from V1.0 to V2.0 was driven precisely by the Evaluation Agent identifying the aesthetic-ecological disconnect and issuing targeted revision directives. This demonstrates that the framework can significantly raise the baseline quality and reliability of AIGC outputs in professional domains, moving beyond simple inspiration to generate implementable design schemes that meet professional standards.

To more clearly define the academic contribution of this study, we systematically compared this framework with two dominant AIGC design paradigms: image-centric AIGC and general-purpose LLMs (Table 10).

First, in terms of the operating mechanism, the most significant breakthrough of this study is the shift from a linear open-loop to a closed-loop iteration. Whether using image generation models like Midjourney [41] or engaging in direct Q&A with LLMs (e.g., ChatGPT) [42], existing approaches typically follow a unidirectional linear flow from prompt to result. While efficient, this mechanism lacks intrinsic self-correction capabilities; once the initial generation contains errors, the model cannot perceive them. In contrast, the “Generative-Critical” cycle constructed in this study simulates the divergent–convergent cognitive process of human designers. As demonstrated in the case study, the framework did not stop at the random generation of V1.0 but, through the intervention of the Evaluation Agent, forced the system to undergo multiple rounds of reflection and revision. This closed-loop mechanism fundamentally reduces reliance on the quality of a single generation, significantly enhancing the robustness of the final scheme through iterative optimization.
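The difference between an open-loop call and the closed-loop cycle can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: the names `refine_loop`, `Critique`, and the `toy_*` stand-ins, the acceptance threshold of 80, and the cap of 5 rounds are all hypothetical, and the LLM calls are replaced by string functions so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Critique:
    score: float            # total scorecard score, 0-100
    directives: List[str]   # revision instructions for the generator


def refine_loop(
    generate: Callable[[str], str],
    revise: Callable[[str, List[str]], str],
    critique: Callable[[str], Critique],
    brief: str,
    threshold: float = 80.0,  # assumed acceptance score
    max_rounds: int = 5,      # assumed iteration cap
) -> Tuple[str, List[float]]:
    """Generate once, then alternate critique and revision until the score passes."""
    scheme = generate(brief)
    history: List[float] = []
    for _ in range(max_rounds):
        result = critique(scheme)
        history.append(result.score)
        if result.score >= threshold:
            break
        # Feed the critic's directives back into the generator.
        # If the cap is reached, the last revision is returned without a score.
        scheme = revise(scheme, result.directives)
    return scheme, history


# Toy stand-ins so the sketch runs without an LLM.
def toy_generate(brief: str) -> str:
    return brief


def toy_revise(scheme: str, directives: List[str]) -> str:
    return scheme + "".join(f"; {d}" for d in directives)


def toy_critique(scheme: str) -> Critique:
    missing = [k for k in ("bioswale", "permeable paving") if k not in scheme]
    return Critique(100 - 30 * len(missing), [f"add {k}" for k in missing])


scheme, history = refine_loop(toy_generate, toy_revise, toy_critique, "seal plaza")
print(history)
```

An open-loop pipeline corresponds to calling `generate` once and stopping; the loop adds the critic's score as a stopping condition and its directives as the next input.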

Second, regarding knowledge sources and constraint handling, this framework resolves the dilemma of uncontrollable implicit knowledge. Existing image models rely on pixel patterns within training data [43], while general-purpose LLMs depend on the probability distribution of pre-trained weights [20]. These two sources of implicit knowledge lead to two core defects: hallucinations, where the model confidently generates content that appears plausible but violates common sense; and soft constraints, where it is difficult to precisely enforce strict engineering standards like SITES codes. In sharp contrast, this study utilizes RAG technology to integrate an explicit dual knowledge base. This ensures that every decision made by the agent is based on traceable normative clauses rather than probabilistic guessing, thereby transforming sustainability requirements from mere suggestions into hard constraints for design generation.
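To illustrate what "traceable" means here, the sketch below retrieves knowledge-base entries for a query and returns their identifiers, so a later decision can cite the entry it relied on. Plain token overlap stands in for whatever retriever the framework actually uses, and the entries are invented placeholders (ids and texts are assumptions, not quotations from SITES or any other standard).

```python
import re
from collections import Counter

# Placeholder entries: the "text" fields are invented stand-ins, not real clauses.
NORMS = [
    {"id": "norm-A", "text": "permeable pavement infiltration and load bearing"},
    {"id": "norm-B", "text": "native plants community composition and maintenance"},
    {"id": "norm-C", "text": "runoff control storage and reuse of rainwater"},
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def retrieve(query, entries, k=2):
    """Rank entries by shared query tokens and return the ids of the top k."""
    q = Counter(tokenize(query))
    scored = []
    for e in entries:
        overlap = sum((q & Counter(tokenize(e["text"]))).values())
        if overlap:
            scored.append((overlap, e["id"]))
    # Highest overlap first; ties are broken by id for a deterministic order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [entry_id for _, entry_id in scored[:k]]


print(retrieve("How should plaza runoff be handled with permeable pavement?", NORMS))
```

Passing the returned ids along with the retrieved text into the generation prompt is one way to keep each design decision linked to a specific normative source.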

Third, concerning the nature of the output and explainability, this framework aims to bridge the gap between form and function. The output of image-centric AIGC is essentially an aesthetic skin, prioritizing visual impact while often lacking engineering logic [23,44]; the output of general-purpose LLMs is unverified text, which possesses logic but lacks professional depth [20]. However, by integrating environmental data with normative knowledge, our framework produces not just textual descriptions but a functional skeleton containing functional layouts and parameter suggestions derived from that evidence, ensuring engineering feasibility. Furthermore, thanks to the presence of the Evaluation Agent, every revision suggestion is documented (e.g., specific scores and comments), greatly enhancing the explainability of AI decision-making and overcoming the black box problem of traditional end-to-end models.

More importantly, this “Generative-Critical” framework possesses significant methodological generalizability, offering a scalable paradigm for addressing the challenges of applying generic AIGC in other professional fields. Just as Chen and Bao applied LLM Agents to structural engineering to achieve code compliance [45], our framework could be transferred to urban planning or architectural design by replacing the “Critic” with a traffic flow simulator or a building energy consumption analyzer. By internalizing domain-specific expertise and evaluation criteria, the “Generative-Critical” framework is poised to be a key architecture in advancing AIGC from a general-purpose tool to a reliable collaborator in specialized fields.

4.3. Significance of XAI for Decision Support and Design Education

Beyond improving the quality of automated design, the practical value of this framework is further manifested in its potential for decision support and design education, based on explainable AI. On one hand, the sustainability scorecard itself constitutes an intuitive decision-support system. It quantifies abstract sustainability goals, enabling designers to visually inspect a scheme’s performance weaknesses and make data-driven design decisions. On the other hand, the Expert Consultation Agent provides procedural explainability. As shown in the case study, it can clearly articulate the design logic behind the V2.0 scheme—such as “how to balance aesthetics and hydrology on a water-scarce site”—breaking the black box of the AI and enhancing human–machine trust and collaboration.

These two XAI features make the framework an effective design education tool, especially for novices. Through the “Critique-and-Refine” loop, the system can automatically expose cognitive biases in a beginner’s scheme, explain the reasons via critical feedback, demonstrate superior solutions through iteration, and reinforce understanding via dialogue. This is akin to an AI teaching assistant that accelerates a learner’s professional growth in design practice.

4.4. Limitations and Future Research

Although this research framework demonstrates the potential to integrate aesthetics and sustainability, it still has several limitations in its methodology and technical implementation, which also point to clear directions for future research.

First, enhancing the quantitative precision of the evaluation mechanism is a key direction for future research. To ensure efficiency during the conceptual design phase, this framework currently employs a qualitative evaluation based on text keywords. While this approach quickly captures design intent, it does not yet integrate quantitative engineering simulation data (e.g., runoff control rates). Future research should focus on integrating professional environmental performance simulation tools (such as SWMM) and using their quantitative outputs as the basis for sustainability scoring, particularly in the later stages of design (such as detailed design), to make the leap from qualitative judgment to quantitative feedback.
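A keyword-based evaluation of the kind described above might look like the following minimal sketch. The criterion names and keyword lists are hypothetical, loosely modelled on the hydrology criteria in Table 4, and the scoring rule (share of criteria with at least one keyword hit) is an assumption rather than the framework's actual rule.

```python
# Hypothetical keyword lists per criterion for the hydrology dimension.
HYDROLOGY_CRITERIA = {
    "lid_integration": ["bioretention", "rain garden", "sponge city"],
    "runoff_control": ["runoff", "detention", "retention"],
    "permeable_paving": ["permeable"],
    "water_quality": ["bioswale", "purification"],
}


def keyword_score(text, criteria):
    """Return (score 0-100, unmet criteria) from case-insensitive substring hits."""
    lowered = text.lower()
    unmet = [
        name
        for name, words in criteria.items()
        if not any(w in lowered for w in words)
    ]
    score = 100.0 * (len(criteria) - len(unmet)) / len(criteria)
    return score, unmet


draft = "A granite seal plaza with an ink-mark pathway."
revised = "Permeable seal plaza; crackle-pattern bioswales route runoff to a bioretention pond."
print(keyword_score(draft, HYDROLOGY_CRITERIA))
print(keyword_score(revised, HYDROLOGY_CRITERIA))
```

The sketch also makes the limitation concrete: a text that mentions an "impermeable" plaza would still match the keyword "permeable", and no keyword count can tell whether a bioswale is large enough for the site's runoff. Simulation outputs would replace this kind of proxy.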

Second, deepening the exploration of the evaluation weighting system is another important research topic. In this study, the scorecard weights rely on presets, and their scientific basis and dynamic nature need to be strengthened. Future work could proceed on two levels: on one hand, scientifically calibrating the weights for different project types using established methods like the Delphi method [46] or the Analytic Hierarchy Process (AHP) [47]; on the other hand, allowing users to interactively adjust the weights and introducing sensitivity analysis to reveal the design trade-off space under different value preferences (e.g., prioritizing ecology over economy), thereby providing deeper insights for multi-objective decision-making.
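A simple form of the weight sensitivity analysis mentioned above can be sketched as follows, assuming the total score is a weighted sum of the four dimension scores. The ecology and hydrology values reuse the V1.0 scores quoted in Section 4.1 and the weights reuse the ecological wetland park preset from Table 5; the social and material scores, and the helper names `weighted_total` and `shift_weight`, are assumptions for illustration.

```python
def weighted_total(scores, weights):
    """Weighted sum of dimension scores; weights must match the keys and sum to 1."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must use the same dimensions")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[k] * scores[k] for k in scores)


def shift_weight(weights, target, new_value):
    """Set one weight and rescale the others proportionally so the total stays 1."""
    rest = 1.0 - weights[target]
    if rest <= 0:
        raise ValueError("cannot rescale when the target already holds all weight")
    scale = (1.0 - new_value) / rest
    return {k: (new_value if k == target else w * scale) for k, w in weights.items()}


# eco and water follow the V1.0 values in Section 4.1; social and material are assumed.
scores = {"eco": 45, "water": 30, "social": 80, "material": 60}
wetland = {"eco": 0.4, "water": 0.4, "social": 0.1, "material": 0.1}  # Table 5 preset

for w_social in (0.1, 0.3, 0.5):
    w = shift_weight(wetland, "social", w_social)
    print(w_social, round(weighted_total(scores, w), 1))
```

Sweeping one weight while rescaling the rest shows how strongly a scheme's total depends on value preferences: a scheme that is weak in hydrology but strong socially gains as the social weight grows, which is the trade-off space an interactive weighting interface would expose.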

Third, exploring the synergistic output of text and visuals is the next step to enhance the framework’s usability. The framework currently generates mainly design text, which requires designers to deeply understand its internal logic before creatively translating and developing it visually. Although this preserves the designer’s leading role in visual expression, achieving a multimodal output that integrates text and images will be a critical next step toward improving human–machine collaboration efficiency. Future work could achieve synchronous generation of scheme text and concept images by integrating advanced image generation models (such as Stable Diffusion).

Finally, enhancing the framework’s regional adaptability and generalization capabilities is equally crucial. Although the case study preliminarily considered site topography, more complex factors such as climate and soil have not yet been systematically integrated. Future research should focus on building an extensible regional knowledge base and conducting stress tests across a diverse range of project types (such as wetland parks and community parks) to systematically improve the framework’s generalization ability. By deepening research in these directions, future studies hold the promise of developing this framework into an intelligent, sustainable landscape design platform that is more precise in evaluation, more transparent in decision-making, more intuitive in expression, and more adaptive.

5. Conclusions

To address the core challenge of unreliable output quality from AIGC in professional domains like landscape design, which stems from a lack of domain-specific knowledge constraints, this study successfully constructed and empirically validated a “Generative-Critical” multi-agent framework. The core innovation of this framework lies in its internalized expert review mechanism, realized through an Evaluation Agent. This agent automatically assesses and provides critical feedback on generated schemes based on a quantitative sustainability scorecard, thereby driving the scheme’s autonomous iterative optimization.

The empirical results of the case study demonstrate the framework’s significant effectiveness. It successfully transformed an initial scheme that prioritized aesthetic form but had severe sustainability deficiencies (V1.0, mean expert blind score: 59.3) into a high-quality, accepted scheme that balances cultural aesthetics with ecological functions (V2.0, mean expert blind score: 88.0). Furthermore, the HCI interface’s usability test yielded a high SUS score of 85.5, confirming the system’s ease of use.

Although this study has limitations, primarily that the evaluation mechanism relies on a qualitative assessment of text and the output modality is confined to text, it nonetheless provides a new methodological paradigm for the application of AIGC in other professional domains (e.g., architecture, urban planning, engineering) that can ensure output quality and professional reliability. Its XAI-based scorecard and expert consultation mechanism also demonstrate significant practical value in design decision support and professional design education.

Author Contributions

Conceptualization, X.Y. and S.L.; methodology, L.L., X.Y. and S.L.; software, F.D.; formal analysis, L.L. and S.L.; investigation, L.L. and F.D.; resources, L.L.; data curation, L.L.; writing—original draft preparation, L.L.; writing—review and editing, S.L.; visualization, F.D.; supervision, X.Y.; project administration, X.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding. The APC was funded by the authors.

Institutional Review Board Statement

Ethical review and approval for this study were waived in accordance with Article 32 (Chapter 3) of the Measures for Ethical Review of Life Science and Medical Research Involving Humans (available at: https://www.gov.cn/zhengce/zhengceku/2023-02/28/content_5743658.htm; accessed on 12 October 2025). This exemption applies because the research adopts a survey-based design without any form of intervention on participants. All data collected are either publicly available or fully anonymized, and the study does not involve sensitive personal information or commercial interests, thereby complying with the criteria for ethical review exemption stipulated in the aforementioned regulation.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

The authors would like to thank the anonymous reviewers and the editor for their constructive comments and suggestions that greatly improved the quality of this manuscript.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

1. Fang, X.; Li, J.; Ma, Q. Integrating Green Infrastructure, Ecosystem Services and Nature-Based Solutions for Urban Sustainability: A Comprehensive Literature Review. Sustain. Cities Soc. 2023, 98, 104843.
2. Vilanova, C.; Ferran, J.S.; Concepción, E.D. Integrating Landscape Ecology in Urban Green Infrastructure Planning: A Multi-Scale Approach for Sustainable Development. Urban For. Urban Green. 2024, 94, 128248.
3. Ghadiri Khanaposhtani, M.; Gasc, A.; Francomano, D.; Villanueva-Rivera, L.J.; Jung, J.; Mossman, M.J.; Pijanowski, B.C. Effects of Highways on Bird Distribution and Soundscape Diversity around Aldo Leopold’s Shack in Baraboo, Wisconsin, USA. Landsc. Urban Plan. 2019, 192, 103666.
4. Nath, T.K.; Zhe Han, S.S.; Lechner, A.M. Urban Green Space and Well-Being in Kuala Lumpur, Malaysia. Urban For. Urban Green. 2018, 36, 34–41.
5. Cai, A.; Wang, J.; MacLachlan, I.; Zhu, L. Modeling the Trade-Offs between Urban Development and Ecological Process Based on Landscape Multi-Functionality and Regional Ecological Networks. J. Environ. Plan. Manag. 2020, 63, 2357–2379.
6. Li, F.; Liu, X.; Hu, D.; Wang, R.; Yang, W.; Li, D.; Zhao, D. Measurement Indicators and an Evaluation Approach for Assessing Urban Sustainable Development: A Case Study for China’s Jining City. Landsc. Urban Plan. 2009, 90, 134–142.
7. Miao, C.; Wang, J.; Wang, D. Research Progress on Urban Forest Ecosystem Services and Multifunctionality. Int. J. Environ. Sci. Technol. 2024, 22, 11557–11566.
8. Mohd Hussain, M.R. The Tangible and Intangible Values of River towards Sustainable Urban Landscape Development. J. Archit. Plan. Constr. Manag. 2020, 2, 1–22.
9. Qiu, J.; Nassauer, J.I.; Ahern, J.; Huang, L.; Reed, J.; Ding, S.; Guo, J.; Liu, Z.; Ou, W.; Ouyang, Z.; et al. Advancing Landscape Sustainability Science: Key Challenges and Strategies for Integration with Landscape Design and Planning. Landsc. Ecol. 2025, 40, 25.
10. Sasaki. Enabling Synergies: Integrating Ecology with Landscape Architecture in Design Practice. Available online: https://www.sasaki.com/voices/enabling-synergies-integrating-ecology-with-landscape-architecture-in-design-practice/ (accessed on 5 December 2025).
11. Parsons, R. Conflict between Ecological Sustainability and Environmental Aesthetics: Conundrum, Canärd or Curiosity. Landsc. Urban Plan. 1995, 32, 227–244.
12. Rechner Dika, I. Are Ecological Design Principles Becoming the Norm in Contemporary Landscape Design? A Comparative Analysis of Realized Park Projects (2015–2025). Sustainability 2025, 17, 6620.
13. Lallawmzuali, R.; Pal, A.K. Computer Aided Design and Drafting in Landscape Architecture. Curr. J. Appl. Sci. Technol. 2023, 42, 1–11.
14. Jiang, W.; Zhang, Y. Application of 3D Visualization in Landscape Design Teaching. Int. J. Emerg. Technol. Learn. 2019, 14, 53.
15. Yong, S.d.; Kusumarini, Y.; Tedjokoesoemo, P.E.D. Interior Design Students’ Perception for AutoCAD, SketchUp and Rhinoceros Software Usability. IOP Conf. Ser. Earth Environ. Sci. 2020, 490, 012015.
16. Liu, M.; Nijhuis, S. The Application of Advanced Mapping Methods and Tools for Spatial-Visual Analysis in Landscape Design Practice. Sustainability 2021, 13, 7952.
17. Zhang, H.; Nijhuis, S.; Newton, C. Advanced Digital Methods for Analysing and Optimising Accessibility and Visibility of Water for Designing Sustainable Healthy Urban Environments. Sustain. Cities Soc. 2023, 98, 104804.
18. Kim, J.; Kang, J. Development of Hazard Capacity Factor Design Model for Net-Zero: Evaluation of the Flood Adaptation Effects Considering Green-Gray Infrastructure Interaction. Sustain. Cities Soc. 2023, 96, 104625.
19. Sędzicki, D.; Cudzik, J.; Bonenberg, W.; Nyka, L. Computer-Aided Automated Greenery Design—Towards a Green BIM. Sustainability 2022, 14, 8927.
20. Cao, Y.; Li, S.; Liu, Y.; Yan, Z.; Dai, Y.; Yu, P.; Sun, L. A Survey of AI-Generated Content (AIGC). ACM Comput. Surv. 2025, 57, 1–38.
21. Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-Resolution Image Synthesis with Latent Diffusion Models. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022.
22. Cao, Y.; Li, S.; Liu, Y.; Yan, Z.; Dai, Y.; Yu, P.S.; Sun, L. A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT. arXiv 2023, arXiv:2303.04226.
23. Ye, X.; Huang, T.; Song, Y.; Li, X.; Newman, G.; Wu, D.J.; Zeng, Y. Generating Conceptual Landscape Design via Text-to-Image Generative AI Model. Environ. Plan. B Urban Anal. City Sci. 2025. Online ahead of print.
24. Kapsalis, T. UrbanGenAI: Reconstructing Urban Landscapes Using Panoptic Segmentation and Diffusion Models. arXiv 2024, arXiv:2401.14379.
25. Wang, Q.; Liang, Y.; Zheng, Y.; Xu, K.; Zhao, J.; Wang, S. Generative AI for Urban Planning: Synthesizing Satellite Imagery via Diffusion Models. Comput. Environ. Urban Syst. 2025, 122, 102339.
26. Qian, K.; Mao, L.; Liang, X.; Ding, Y.; Gao, J.; Wei, X.; Guo, Z.; Li, J. AI Agent as Urban Planner: Steering Stakeholder Dynamics in Urban Planning via Consensus-Based Multi-Agent Reinforcement Learning. arXiv 2023, arXiv:2310.16772.
27. Odeh, A.; Jayousi, R.; Rutrot, A. Toward Competent City Management Using Multi-Agent System (MAS). In Proceedings of the ICAT23, Istanbul, Turkey, 17–19 August 2023.
28. Hussain, T.; Urlamma, D.; Vericharla, R.; Dhatterwal, J.S. Augmenting Traffic Flow Efficiency Using Multi-Agent Systems (MAS). Int. J. Inf. Technol. 2025, 17, 3119–3124.
29. Gooood. Gooood Design Website. Available online: https://www.gooood.cn/ (accessed on 30 October 2025).
30. Green Business Certification Inc. SITES V2 Rating System For Sustainable Land Design and Development; Green Business Certification Inc.: Washington, DC, USA, 2014.
31. U.S. Green Building Council. LEED Reference Guide for Neighborhood Development, Version 4; U.S. Green Building Council: Washington, DC, USA, 2014.
32. Pedro, J.; Silva, C.; Pinheiro, M.D. Scaling up LEED-ND Sustainability Assessment from the Neighborhood towards the City Scale with the Support of GIS Modeling: Lisbon Case Study. Sustain. Cities Soc. 2018, 41, 929–939.
33. Ministry of Housing and Urban-Rural Development. Notice of the Ministry of Housing and Urban-Rural Development on Issuing the Technical Guidelines for Sponge City Construction—Low Impact Development Stormwater System Construction (Trial); China Architecture & Building Press (CABP): Beijing, China, 2014.
34. Ministry of Housing and Urban-Rural Development of the People’s Republic of China. Standard for Planning of Urban Green Space; China Architecture & Building Press (CABP): Beijing, China, 2019.
35. Zhang, Z. An Improved BM25 Algorithm for Clinical Decision Support in Precision Medicine Based on Co-Word Analysis and Cuckoo Search. BMC Med. Inform. Decis. Mak. 2021, 21, 81.
36. Turney, P.D.; Pantel, P. From Frequency to Meaning: Vector Space Models of Semantics. J. Artif. Intell. Res. 2010, 37, 141–188.
37. Gooood. The Orchestra Park by SoBA. Available online: https://www.gooood.cn/the-orchestra-park-by-soba.htm (accessed on 30 October 2025).
38. Brooke, J. SUS: A “Quick and Dirty” Usability Scale. In Usability Evaluation In Industry; CRC Press: Boca Raton, FL, USA, 1996.
39. Chicco, D.; Sichenze, A.; Jurman, G. A Simple Guide to the Use of Student’s t-Test, Mann-Whitney U Test, Chi-Squared Test, and Kruskal-Wallis Test in Biostatistics. BioData Min. 2025, 18, 56.
40. Tam, T.Y.C.; Sivarajkumar, S.; Kapoor, S.; Stolyar, A.V.; Polanska, K.; McCarthy, K.R.; Osterhoudt, H.; Wu, X.; Visweswaran, S.; Fu, S.; et al. A Framework for Human Evaluation of Large Language Models in Healthcare Derived from Literature Review. npj Digit. Med. 2024, 7, 258.
41. Xing, Y.; Gan, W.; Chen, Q.; Yu, P.S. AI-Generated Content in Landscape Architecture: A Survey. AI Open 2025, 6, 220–243.
42. Huang, K.-L.; Liu, Y.; Dong, M.-Q. Incorporating AIGC into Design Ideation: A Study on Self-Efficacy and Learning Experience Acceptance under Higher-Order Thinking. Think. Ski. Creat. 2024, 52, 101508.
43. Ho, J.; Jain, A.; Abbeel, P. Denoising Diffusion Probabilistic Models. arXiv 2020, arXiv:2006.11239.
44. Jang, S.; Roh, H.; Lee, G. Generative AI in Architectural Design: Application, Data, and Evaluation Methods. Autom. Constr. 2025, 174, 106174.
45. Chen, J.; Bao, Y. Multi-Agent Large Language Model Framework for Code-Compliant Automated Design of Reinforced Concrete Structures. Autom. Constr. 2025, 177, 106331.
46. Alkrides, B.F. Promoting Sustainable Urban Walkability: A Modified Delphi Study on Key Indicators for Urban Walkability in Gulf Cooperation Council Urban Streets. Sustainability 2025, 17, 1179.
47. Srdjevic, B.; Srdjevic, Z.; Reynolds, K.M.; Lakicevic, M.; Zdero, S. Using Analytic Hierarchy Process and Best–Worst Method in Group Evaluation of Urban Park Quality. Forests 2022, 13, 290.

Figure 6. The “critique-and-refine” loop mechanism.

Table 1. Composition of the sustainability-focused knowledge base.

Type	Data Source	Topic Coverage	No. of Entries	Purpose Summary
Aesthetic Precedents	Professional design portals (e.g., Gooood.cn)	Urban parks, waterfronts, campus landscapes, cultural parks, urban plazas, etc.	108	To provide inspiration for form and composition, references for details, and insights into spatial narratives and placemaking.
Ecological and Technical Norms	International, national, and industry standards and technical guidelines	SITES (site-level), LEED-ND (neighborhood-level), Sponge City concepts, low-impact development (LID), green space systems, vegetation, biodiversity, accessibility, permeable pavements, etc.	12	To provide functional constraints and evidence for hydrology, vegetation, materials, accessibility, and human health and well-being.
Total	—	—	120	To serve as the knowledge foundation for the RAG and Evaluation Agents.

This table outlines the composition and scale of the dual knowledge base constructed in this study. It comprises two parallel sub-repositories: one for Aesthetic Precedents to provide design inspiration, and another for Ecological and Technical Norms to supply functional constraints. This dual structure is the fundamental prerequisite for the subsequent “Generative-Critical” agents to achieve balanced decision-making.

Table 2. Sample of LLM-based knowledge structuring and summarization for aesthetic precedents.

Id	Title	Structured Content Extracted and Summarized by LLM	URL
1	Qingdao Vanke City Time Park	(1) Project Information: A 36,000 m² urban regeneration park in Qingdao, located on a site with a 4 m north-to-south slope, adjacent to residential and commercial areas. (2) Design Concept: Explores the core aesthetic concepts of “Landmarks” (utilizing terrain, bridges, and play structures) and “Boundaries” (creating curved, participatory edges). (3) Challenges and Solutions: Aims to coordinate site relations and integrate local culture; solved via three strategies: “Interface Activation,” “Activity Enhancement,” and “Cultural ID” (e.g., paper-cutting and wave elements). (4) Zonal Design: Includes key nodes such as the Yuguang Corner (entrance), Yuedong Bridge, Guangleyuan (play area), Urban Balcony (running track), and Tide Theater.	https://www.gooood.cn/qingdao-vanke-city-time-park-by-zap.htm (accessed on 30 October 2025)
2	The Orchestra Park	(1) Project Background: A 12,130 m² site in Kunshan, adjacent to residential communities, located at a river confluence with a well-preserved natural wetland and woodland base. (2) Design Inspiration: Draws from the local “Jiangnan Sizhu” (silk and bamboo instruments) intangible cultural heritage, translating its musical melodies and instrument curves into the landscape layout. (3) Design Strategies: Summarized into three main strategies: Integrating Natural Texture with “Sizhu” Form (preserving existing topography and woodlands); Natural Experience in Riverside Belt (adopting a minimal intervention philosophy); Community Interaction in Vibrant Zone (linking facilities like a skate park and climbing wall).	https://www.gooood.cn/the-orchestra-park-by-soba.htm (accessed on 30 October 2025)

This table presents a selection of sample cases scraped from the Gooood.cn website. These cases were chosen because their design descriptions consciously emphasize their aesthetic form, visual symbols, or artistic concepts, making them valuable aesthetic references for designers.

Table 3. List of ecological and technical norms.

No.	Item Name	Applicable Scale	Key Functions
1	The Sustainable SITES Initiative	Site	Performance criteria and credits for water resources, soil and vegetation, materials, and human health and well-being.
2	LEED for Neighborhood Development	Neighborhood	Smart location, compact development, walkability, and green infrastructure networks.
3	Technical guidelines for sponge city construction: low-impact development (LID) rainwater systems (Trial)	Site or Regional	Parameters and details for LID or GI facilities such as rain gardens, bioretention, and permeable pavements; includes facility types, control targets, and design principles.
4	Standard for urban green space planning	City	Green space classification, green coverage ratio, service radius, and accessibility.
5	Code for park design	Site	Park classification, functional zoning, target user groups, and facility allocation ratios.
6	Standard for classification of urban green space	City	Green space type definitions, coding, and statistical criteria.
7	Technical specification for urban biodiversity conservation assessment	City or Site	Habitat restoration, corridor connectivity, indicator species, and monitoring.
8	Design code for sponge city rainwater control and utilization projects	Site	Runoff control, storage, reuse, and design calculation methods.
9	Technical specification for permeable pavement construction and maintenance	Site	Material selection, pavement structure, and performance metrics for infiltration and load-bearing capacity.
10	Code for construction and acceptance of landscape engineering	Site	Soil amendment, planting techniques, maintenance, and acceptance.
11	Universal design code for accessibility in buildings and municipal engineering	Site	Path continuity, tactile paving, and facility accessibility.
12	Guidelines for selecting native plants in territorial ecological restoration projects	Site or Regional	Suitability of native vegetation, community composition, maintenance costs, and ecological benefits.

This table presents key domestic and international norms, standards, and guidelines related to sustainability, covering various scales from site and neighborhood to city and region. It provides comprehensive ecological knowledge support for sustainable landscape design.

Table 4. Evaluation dimensions and criteria of the sustainability scorecard.

Evaluation Dimension	Dimension Symbol	Core Evaluation Criteria
Ecological resilience	$S_{eco}$	Selection of native plants and promotion of community diversity. Creation and restoration strategies for biodiversity habitats. Design for ecological corridor connectivity. Conservation and amendment of existing healthy on-site soil.
Hydrological performance	$S_{water}$	Degree of integration for Sponge City or low-impact development (LID) facilities. Measures for rainwater runoff control and management. Extent of application for permeable pavement materials. Design for on-site water quality purification and protection (e.g., bioswales).
Social and human well-being	$S_{social}$	Design for all-ages usability and universal accessibility. Creation of public health-promoting and therapeutic landscapes. Exploration and continuation of place identity and local culture. Community engagement and the safety and equity of public spaces.
Resource and material efficiency	$S_{material}$	Use of recycled, local, or low-carbon materials. On-site waste reduction and resource utilization. Measures to mitigate the urban heat island (UHI) effect (e.g., providing shade, high-albedo pavements). Energy-efficient design (e.g., solar lighting, reduced maintenance).

The evaluation dimensions and criteria in this table are primarily adapted from mainstream international sustainable site design rating systems, refined for operability at the conceptual design stage. These criteria form the core basis for the Evaluation Agent’s text analysis and quantitative scoring, aiming to translate qualitative descriptions in the draft schemes into computable performance indicators.

Table 5. Example of preset weight coefficient configurations for different project types.

Project Type (Example)	$w_{eco}$	$w_{water}$	$w_{social}$	$w_{material}$	Rationale for Weight Setting
Ecological wetland park	0.4	0.4	0.1	0.1	Core functions are ecological restoration and hydrological regulation; social recreation and material attributes are secondary.
Industrial brownfield remediation park	0.5	0.2	0.2	0.1	The primary task is ecological restoration (soil, vegetation) and pollution remediation; social functions are a secondary goal post-remediation.
Waterfront open space	0.3	0.3	0.3	0.1	Emphasizes a balance among ecological (shoreline restoration), hydrological (flood resilience), and social (public access to water) aspects.
Cultural park	0.3	0.2	0.4	0.1	The primary objective is to support cultural narratives, place identity, and social-educational functions, which require a high-quality ecological environment as a carrier for the cultural experience.
Community pocket park	0.2	0.2	0.5	0.1	The core function is to serve surrounding residents, making social well-being, accessibility, and equity the primary objectives.
Campus or Educational landscape	0.2	0.2	0.4	0.2	Emphasizes high-frequency social functions (student activities, outdoor learning) and spatial quality (materials), while also serving a demonstrative role for LID and ecological practices.
Urban central plaza	0.1	0.3	0.4	0.2	Emphasis is placed on social carrying capacity under high-intensity use, while LID pavements (hydrology) and material durability (materials) are also crucial.
Balanced type (default)	0.25	0.25	0.25	0.25	Applied to standard urban green spaces or when the project type is ambiguous, providing a balanced consideration of all dimensions.

This table aims to illustrate how the Evaluation Agent dynamically adjusts its assessment focus based on project type. The listed types and weight configurations are illustrative examples demonstrating this dynamic mechanism; in practice, this configuration library can be expanded with more specific project types. The setting of each weight coefficient is primarily based on the core functions and key sustainability challenges of each landscape type.
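The preset mechanism described above can be sketched as a simple lookup. This is a minimal illustration, not the authors' implementation: the weight values are copied from Table 5, while the names `PRESET_WEIGHTS` and `get_weights`, the lowercase keys, and the fallback to the balanced preset for unlisted types are assumptions made for the example.

```python
import math

# Weights copied from Table 5, in the order (eco, water, social, material).
# The dictionary name and key spelling are hypothetical.
PRESET_WEIGHTS = {
    "ecological wetland park": (0.4, 0.4, 0.1, 0.1),
    "industrial brownfield remediation park": (0.5, 0.2, 0.2, 0.1),
    "waterfront open space": (0.3, 0.3, 0.3, 0.1),
    "cultural park": (0.3, 0.2, 0.4, 0.1),
    "community pocket park": (0.2, 0.2, 0.5, 0.1),
    "campus or educational landscape": (0.2, 0.2, 0.4, 0.2),
    "urban central plaza": (0.1, 0.3, 0.4, 0.2),
    "balanced": (0.25, 0.25, 0.25, 0.25),
}


def get_weights(project_type):
    """Return the preset for a project type, or the balanced default.

    Falling back to "balanced" mirrors the table's note that the default
    applies when the project type is ambiguous (assumed behavior).
    """
    key = project_type.strip().lower()
    return PRESET_WEIGHTS.get(key, PRESET_WEIGHTS["balanced"])


# Sanity check: every preset should sum to 1 so totals stay on a 0-100 scale.
for name, weights in PRESET_WEIGHTS.items():
    assert math.isclose(sum(weights), 1.0), name

print(get_weights("Cultural park"))
print(get_weights("Rooftop garden"))  # unlisted type, uses the default
```

Keeping the presets in a plain mapping makes it straightforward to extend the configuration library with more specific project types, as the table note suggests.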

Table 6. Knowledge base retrieval and design association.

Knowledge Type	Retrieval Source	Retrieved Relevant Content and Knowledge Points	Role in Design
Cultural knowledge (External)	Public internet data	1. Yixing Zisha pottery culture (purple clay, five-color earth, pottery cracks). 2. The life and representative works of famous playwright Yu Ling.	Provides core cultural themes and aesthetic symbols for the design, such as seals and cracks.
Ecological knowledge (Internal)	Technical Guidelines for Sponge City Construction	1. Source reduction: Emphasizes using permeable paving, bioretention ponds, etc. 2. Process conveyance: Uses terrain elevation differences for gravity-flow organization. 3. End-of-pipe storage: Sets up rain gardens and detention ponds at the lowest point of the site.	Constructs a complete LID stormwater management chain from source to end.
Normative knowledge (Internal)	SITES v2 Rating System	1. Section 3: Site Design—Water. Water Prerequisite 3.1: Manage precipitation on site (Required); Water Credit 3.3: Manage precipitation beyond baseline (4–6 points). 2. Section 4: Site Design—Soils + Vegetation. Soil + Veg Prerequisite 4.3: Use appropriate plants (Required).	Ensures the design meets sustainability certification standards, such as for stormwater management and native plant use.
Aesthetic knowledge (Internal)	Built-in precedent library	1. Case name: The Orchestra Park [37]. 2. Core experience: Successfully translated abstract cultural symbols (melodies of Jiangnan Sizhu music) into tangible landscape forms (flowing, curved pathways).	Offers a reference for the design technique of formalizing cultural symbols, inspiring the translation of cultural symbols like “Zisha Seal” into landscape elements.

This table shows the output of the Generation Agent after receiving the task and performing dual knowledge retrieval. External retrieval refers to general cultural knowledge related to the project location, obtained from public internet data. Internal retrieval refers to the recall of norms, technical guides, and aesthetic precedents related to sustainable design from the proprietary knowledge base built in this study. The last column demonstrates how the AI translates the retrieved discrete knowledge points into specific design objectives to drive subsequent scheme generation.

Table 7. Iterative evaluation scores of the “critique-and-refine” loop in the case study.

Scheme Version	$S_{eco}$ (w = 0.3)	$S_{water}$ (w = 0.2)	$S_{social}$ (w = 0.4)	$S_{material}$ (w = 0.1)	$S_{total}$	Evaluation Result
V1.0	45	30	90	60	61.5	Not met
V2.0	90	90	90	85	89.5	Met

This table displays the internal evaluation scores assigned by the AI system to the two scheme versions detailed in Section 3.1.2. The V1.0 scheme was rated “Not met” because its total score (61.5) fell below the preset threshold of 80. After one “Critique-and-Refine” cycle, the V2.0 scheme’s total score (89.5) met the threshold and was accepted. The 28-point improvement demonstrates the effectiveness of the “Critique-and-Refine” loop in enhancing the scheme’s sustainability performance.
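The totals in Table 7 are consistent with a weighted sum of the four dimension scores compared against the threshold of 80. The sketch below reproduces that arithmetic; the function names, the dictionary keys, and the choice of `>=` for the acceptance test are assumptions for illustration, since the paper reports only the scores and the outcome.

```python
# Cultural-park weights from the Table 7 header; threshold from the table note.
WEIGHTS = {"eco": 0.3, "water": 0.2, "social": 0.4, "material": 0.1}
THRESHOLD = 80.0


def total_score(scores, weights=WEIGHTS):
    """Weighted sum of dimension scores (each on a 0-100 scale)."""
    return sum(weights[dim] * scores[dim] for dim in weights)


def evaluate(scores):
    """Return (total rounded to one decimal, "Met" or "Not met")."""
    total = round(total_score(scores), 1)
    # Assumption: a score equal to the threshold counts as met.
    return total, ("Met" if total >= THRESHOLD else "Not met")


# Dimension scores copied from Table 7.
v1 = {"eco": 45, "water": 30, "social": 90, "material": 60}
v2 = {"eco": 90, "water": 90, "social": 90, "material": 85}

print(evaluate(v1))  # (61.5, 'Not met')
print(evaluate(v2))  # (89.5, 'Met')
```

In V1.0 the social dimension contributes 36 of the 61.5 points, so the shortfall comes almost entirely from the ecological and hydrological scores, which is where the refinement cycle made its gains.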

Table 8. Questions from the system usability scale (SUS).

No.	Questionnaire Item
1	I think that I would like to use this system frequently.
2	I found the system unnecessarily complex.
3	I thought the system was easy to use.
4	I think that I would need the support of a technical person to be able to use this system.
5	I found the various functions in this system were well integrated.
6	I thought there was too much inconsistency in this system.
7	I would imagine that most people would learn to use this system very quickly.
8	I found the system very cumbersome to use.
9	I felt very confident using the system.
10	I needed to learn a lot of things before I could get going with this system.

This table lists the 10 standard items of the SUS questionnaire. Participants rated each item using a five-point Likert scale (1 = Strongly Disagree, 5 = Strongly Agree). The ratings were then converted into a final SUS score ranging from 0 to 100 using the standard formula: for odd-numbered items, subtract 1 from the rating (score − 1); for even-numbered items, subtract the rating from 5 (5 − score); then sum the ten results and multiply by 2.5. According to industry benchmarks for SUS scores, an average score of 68 is considered acceptable, while scores above 80.3 are rated as excellent.
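The conversion rule described above can be written as a short function. This is a sketch of the standard SUS scoring procedure for a single respondent; the function name and the example ratings are illustrative and are not data from the study.

```python
def sus_score(responses):
    """Convert ten 1-5 Likert ratings (items 1..10 in order) to a 0-100 SUS score."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item ratings")
    total = 0
    for item_no, rating in enumerate(responses, start=1):
        if not 1 <= rating <= 5:
            raise ValueError(f"item {item_no}: rating must be between 1 and 5")
        # Odd items are positively worded, even items negatively worded.
        total += (rating - 1) if item_no % 2 == 1 else (5 - rating)
    return total * 2.5


# Illustrative ratings only (not participant data).
print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # 100.0, best possible
print(sus_score([3] * 10))                        # 50.0, neutral on every item
```

A study-level SUS score such as the 85.5 reported here is the mean of these per-respondent scores.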

Table 9. Comparison of sustainability scores for the baseline and AI-refined schemes from the expert blind evaluation.

Evaluation Dimension	Weight	Baseline Scheme (Mean ± SD)	AI-Refined Scheme (Mean ± SD)	t-Test Results (df = 9)
Ecological resilience ($S_{eco}$)	$w_{eco} = 0.3$	42.0 ± 10.5	86.5 ± 6.0	t = 19.7, p < 0.001 *
Hydrological performance ($S_{water}$)	$w_{water} = 0.2$	28.5 ± 9.0	92.0 ± 5.5	t = 23.4, p < 0.001 *
Social and human well-being ($S_{social}$)	$w_{social} = 0.4$	87.5 ± 7.0	88.0 ± 7.2	t = 0.3, p = 0.74
Resource and material efficiency ($S_{material}$)	$w_{material} = 0.1$	60.0 ± 12.0	84.0 ± 8.0	t = 9.3, p < 0.001 *
Total score ($S_{total}$)	-	59.3 ± 8.2	88.0 ± 6.5	t = 22.8, p < 0.001 *

This table compares the scores given by 10 human experts to the two schemes. The experts’ mean total scores for the baseline scheme (59.3) and the AI-refined scheme (88.0) are highly consistent with the internal scores given by the AI’s Evaluation Agent (61.5 for V1.0 and 89.5 for V2.0). The last column reports the results of a paired-samples t-test on the score differences between the V1.0 and V2.0 schemes; df = 9 is the degrees of freedom (n − 1). The t-value expresses the mean difference relative to its standard error; a larger absolute t-value indicates stronger evidence of a difference. The p-value is the probability of observing a difference at least this large if there were no true difference between the schemes; a smaller p-value indicates that the difference is less likely to be due to chance. By convention, a difference is considered statistically significant if p < 0.05, and highly significant if p < 0.001 [39,40]. The asterisks (*) indicate statistical significance at the p < 0.001 level.
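For reference, the paired-samples t statistic has the standard textbook form shown below. The paper does not print the formula, so the symbols $d_i$, $\bar{d}$, and $s_d$ are introduced here for illustration; $x_i^{V1.0}$ and $x_i^{V2.0}$ denote the scores that expert $i$ gave to the baseline and AI-refined schemes.

$$
d_i = x_i^{V2.0} - x_i^{V1.0}, \qquad
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^2}
$$

$$
t = \frac{\bar{d}}{s_d/\sqrt{n}}, \qquad df = n - 1 = 9 \quad (n = 10)
$$

Because the statistic depends on the spread of the per-expert differences $s_d$, it cannot be recomputed from the means and standard deviations in Table 9 alone; the individual paired scores are needed.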

Table 10. Comparison of the proposed framework with existing AIGC design paradigms.

Dimensions	Image-Centric AIGC	General-Purpose LLM	Proposed Framework
Core capability	Visual rendering and style transfer	General text generation and broad reasoning	Domain decision support and scheme optimization
Operating mechanism	Linear open-loop	Linear or simple multi-turn dialogue	Cyclical closed-loop
Knowledge source	Implicit pixel patterns	Implicit general knowledge	Explicit Dual Knowledge Base
Constraint handling	Weak	Moderate (soft constraints)	Strong (hard constraints)
Output nature	Aesthetic skin	Unverified text	Functional skeleton
Explainability	Low	Medium	High

This table systematically compares image generation models, general-purpose language models, and the multi-agent framework proposed in this study across six dimensions: core capability, operating mechanism, knowledge source, constraint handling, output nature, and explainability. It highlights the structural advantages of the “Generative-Critical” mechanism when addressing professional design tasks.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Li, L.; Yang, X.; Liu, S.; Deng, F. AI-Enabled Sustainable Landscape Design: A Decision-Support Framework Based on “Generative-Critical” Multi-Agent. Urban Sci. 2026, 10, 56. https://doi.org/10.3390/urbansci10010056

