Article

A Large Language Model-Based Agent Framework for Simulating Building Users’ Air-Conditioning Setpoint Adjustment Behavior Under Demand Response

1 School of Energy Engineering, Zhejiang University, Hangzhou 310000, China
2 Organizations in Digital Transition, Utrecht University of Applied Sciences, 3584 CS Utrecht, The Netherlands
* Author to whom correspondence should be addressed.
Buildings 2026, 16(5), 887; https://doi.org/10.3390/buildings16050887
Submission received: 15 January 2026 / Revised: 6 February 2026 / Accepted: 13 February 2026 / Published: 24 February 2026
(This article belongs to the Section Building Energy, Physics, Environment, and Systems)

Abstract

Agent-based modeling (ABM) is a powerful tool for simulating building users’ dynamic behavior in demand response (DR) programs. However, ABM faces several challenges, particularly in encoding building users’ natural language features and common sense into rules or mathematical equations. To overcome these limitations, this paper proposes an agent framework based on large language models (LLMs) to simulate building users’ air-conditioning setpoint adjustment behavior under DR. This framework leverages LLMs’ natural language processing capabilities to replicate building users’ reasoning and decision-making processes. It consists of five modules: persona, perception, decision, reflection, and memory. Agents are assigned diverse personas through natural language descriptions based on empirical survey data. LLMs drive agents to reason and make decisions based on incentive prices and historical experiences. The results show that the LLM-based agent has common sense derived from natural language-defined personas and exhibits human-like irrational characteristics. This demonstrates the feasibility of replacing rules with natural language in ABM. The LLM-based agent can more effectively model hard-to-parameterize human features and provide decision explanations through LLM outputs. The results also show that the inclusion of reflection and memory modules enables the agent to learn from previous decisions and reduce unreasonable choices.

1. Introduction

Demand response (DR) is recognized as an effective method for managing peak electricity loads and alleviating grid strain [1]. The load flexibility of building air-conditioning systems offers a significant opportunity for DR implementation in the building sector [2]. In incentive-based DR programs, system operators use pricing incentives to encourage building users to adjust their air-conditioning setpoint, compensating them based on load reduction during peak periods [3,4]. As shown in [5,6], the success of DR programs depends heavily on building users’ participation and response behaviors. Therefore, accurately simulating these behaviors is crucial for the ex ante assessment of DR programs.
Extensive research has been conducted in the field of occupant behavior (OB) modeling [7,8]. The most widely used modeling technique is regression analysis, which describes the probability of a specific group’s response to certain stimuli (e.g., indoor and outdoor temperatures) [9]. While regression-based models provide straightforward implementation, they often struggle to capture the adaptive behavior of occupants in dynamic environments [10]. Agent-based modeling (ABM) has been introduced to address this challenge [11]. In ABM, building users are treated as autonomous agents, with their behaviors defined by various attributes and decision-making strategies [12,13].
In the context of DR, ABM methods can be categorized into three types based on whether building users have an energy management system (EMS) and their interaction with it. The first type of ABM method assumes that building users do not have an EMS and respond directly to the DR signals from the system operator. A set of predefined rules is used to describe building users’ air-conditioning setpoint adjustment behavior [14]. These rules are usually derived from expert knowledge and behavioral theories. For example, Zhou et al. [15] established a piecewise linear adjustment rule for the cooling setpoint in office buildings, based on real-time electricity prices. The second type of ABM method assumes that building users have an EMS and strictly follow its recommendations. In this case, the building user is treated as a rational agent [16]. Within the ABM framework, the EMS is typically represented mathematically using optimization models [17]. These models require the establishment of building thermal dynamic prediction models, as well as the definition of objective functions and constraints [18]. Specifically, the prediction models can be either physically driven [19] or data-driven [20,21]. The objective functions typically aim to minimize costs or user discomfort [22]. The constraints ensure that decision variables remain within acceptable ranges, such as maintaining the setpoint within a comfort range [23]. Therefore, building users’ setpoint adjustment behavior can be represented by various objective functions, their corresponding weights, and the associated constraints [24]. For example, Reis et al. [25] assigned different weight coefficients to the cost and comfort objective functions to reflect the heterogeneous preferences of building users. 
The third type of ABM method assumes that building users have an EMS but need to decide how to configure the EMS and whether to follow the EMS recommendations based on changing environmental conditions (e.g., time-varying DR signals). In this case, the agent’s behavior may exhibit irrationality (i.e., decisions are based on incomplete information or do not necessarily follow the optimal plan [26]). Some behavioral theories from sociology, psychology, or behavioral economics are integrated into the agent models to better capture human characteristics [27,28]. For example, Deng et al. [29] proposed an agent model based on the belief–desire–intention (BDI) framework, using fuzzy logic to model building users’ knowledge base and establishing decision rules to represent their setpoint adjustment behavior.
In summary, the ABM methods discussed above utilize rules or mathematical equations to model building users’ setpoint adjustment behavior. However, three major challenges remain. First, when modeling complex human behaviors that involve natural language descriptions (e.g., users’ DR attitudes) or common sense that is difficult to formalize, existing ABM methods tend to overlook these factors or simplify them through parameterization or heuristic rules. For example, Hoffmann et al. [30] conducted a survey among building users and performed clustering analyses based on characteristics such as personal responsibility, attitude, preferences, energy-saving experience, and trust utility. Four different building user personas were identified. Corresponding heuristic rules were then implemented in the agent model to simulate their response behaviors. Second, these agent behaviors still rely on predefined rules or parameter settings. While this approach can be effective in certain cases, it introduces researcher bias and uncertainty. Finally, capturing agent heterogeneity, a key aspect of the real world, remains challenging. Although ABM methods attempt to account for heterogeneity through corresponding parameter settings, this requires a deep understanding of how these parameters affect behavior. Successfully translating this heterogeneity into specific behaviors often depends on the researcher’s experience, making parameter tuning a significant challenge.
Recently, large language models (LLMs) such as ChatGPT-4o and Llama 3.2 have shown extraordinary capabilities in understanding, generating, and translating human language. In the building sector, many studies have leveraged LLMs for various applications [31], including building energy model generation [32,33,34], building structure design [35], building control [36,37] and its interpretability [38], knowledge management [39], information queries [40,41], data analysis [42], real-time building monitoring platforms [43], and building energy retrofit challenge evaluations [44]. However, a research gap still exists in modeling building users’ decision-making behavior. With recent breakthroughs in LLMs for simulating complex human decisions, new technical pathways are emerging for modeling building users’ behavior in ABM [45,46]. Table 1 compares the features of LLM-based ABM with traditional ABM. First, LLMs are capable of processing natural language, which allows them to describe complex human behaviors and common-sense reasoning. Empirical research has shown that factors such as attitude, behavior ability, external motivators, and energy-saving technologies influence building users’ willingness to participate in DR [47]. These factors are often described in natural language in survey questionnaires. With LLM-based prompt engineering, researchers can design agent models using natural language, significantly reducing the effort required for rule design and parameter selection. Second, by incorporating LLMs into ABM, agents can make decisions based on reasoning and past experiences, such as adjusting setpoint and adapting to environmental changes. This enhances the realism of the simulation and provides deeper insights into the mechanisms driving behavior. Finally, LLMs enable researchers to assign agents different personas through natural language descriptions, simulating the heterogeneity of building users. 
These advantages make LLMs highly promising for modeling building users’ air-conditioning setpoint adjustment behavior.
Inspired by this, this paper proposes an LLM-based agent framework to simulate building users’ air-conditioning setpoint adjustment behavior under DR. The framework consists of five modules: persona, perception, decision, reflection, and memory. First, agents are assigned diverse personas through natural language descriptions based on empirical survey data. Next, LLMs drive agents to reason and make decisions based on incentive prices and historical experiences. Finally, LLMs facilitate agents’ reflection on their past decisions when indoor temperature or thermal discomfort exceeds specific thresholds. Case simulations are conducted to evaluate the key modeling capabilities of the proposed LLM-based agent, with rule-based and optimization-based agents selected for comparison. The main contributions of this paper are as follows:
(1)
This paper proposes an LLM-based agent framework to simulate building users’ air-conditioning setpoint adjustment behavior under DR. By defining the agents’ persona characteristics and common sense through natural language, this approach reduces the effort required for rule design and parameter selection compared to traditional ABM methods.
(2)
This paper leverages LLMs’ natural language processing capabilities to replicate building users’ reasoning, decision-making, and reflection processes in response to DR signals. Compared to traditional ABM methods, this approach emphasizes the reasoning and learning capabilities of LLM-based agents, capturing the variability in behavior.
(3)
This paper assigns different persona characteristics to agents using natural language descriptions, thereby achieving heterogeneity among them. The resulting behavioral differences among the agents are evaluated.

2. Methodology

2.1. LLM-Based Agent Framework

Figure 1 shows the proposed LLM-based agent framework for simulating building users’ air-conditioning setpoint adjustment behavior under DR. The framework consists of five modules: persona, perception, decision, reflection, and memory. The persona module defines the agent’s personal characteristics. The perception module collects real-time environmental data by communicating with the environment. The decision module retrieves relevant few-shot experiences from the memory module. It combines these with the current environmental data to create a prompt. This prompt is input into the decision LLM to generate an action. The action corresponds to an air-conditioning temperature setpoint. The reflection module evaluates past decisions based on DR performance indicators. It uses the evaluation to generate a prompt. This prompt is processed by the reflection LLM to correct any errors. These corrected actions are then stored back in the memory module. The memory module holds historical action data to support continual learning and adaptation. The detailed implementation of each module is presented in the following sections.
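As a concrete illustration, the wiring between the five modules can be expressed as a minimal Python skeleton. All class, method, and parameter names here are illustrative assumptions, not the paper's implementation:

```python
class LLMAgent:
    """Minimal skeleton of the five-module agent loop (illustrative names only)."""

    def __init__(self, persona_text, decision_llm, reflection_llm):
        self.persona = persona_text          # persona module: natural-language description
        self.decision_llm = decision_llm     # callable: prompt -> response text
        self.reflection_llm = reflection_llm
        self.memory = []                     # memory module: stored experiences

    def step(self, scenario):
        """One 30-min decision step: perceive -> retrieve -> prompt -> decide."""
        examples = self.retrieve(scenario)               # memory module
        prompt = self.build_prompt(scenario, examples)   # perception + persona
        return self.decision_llm(prompt)                 # decision module

    def retrieve(self, scenario):
        # Placeholder: return the most similar past experiences as few-shot examples.
        return self.memory[:2]

    def build_prompt(self, scenario, examples):
        return f"{self.persona}\nExamples: {examples}\nScenario: {scenario}"

    def reflect_daily(self, day_log):
        # Reflection module: review the day's decisions and store the correction.
        corrected = self.reflection_llm(str(day_log))
        self.memory.append(corrected)
        return corrected
```

This sketch only fixes the data flow between modules; the actual prompts and retrieval logic are detailed in Sections 2.2 to 2.6.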

2.2. Persona

The persona module defines the agent’s personalities, preferences, and behaviors. In order to capture building users’ heterogeneous characteristics, different personas are developed. Generally, persona development involves analyzing data, conducting interviews, clustering user profiles, and constructing representative personas [49]. This process is time-consuming and requires significant manual effort. LLMs allow personas to be defined using natural language. This reduces the time spent on persona definition and eliminates the need for detailed behavioral parameters. Table 2 lists examples of empirical questionnaire data related to building users’ DR behavior. This data is used to define agent personas in natural language.
Different building users often exhibit inconsistent responses to questionnaire items due to their heterogeneous characteristics. This makes assigning an agent persona to each user a challenging task. Hoffmann et al. [30] identified four types of building users through empirical research and cluster analysis: hesitant skeptics (those reluctant to take action), spendthrifts (those with no clear tendencies, where cost is not a significant factor), cost-conscious materialists (those most likely to act for cost minimization), and eco-responsible helpers (those who feel a strong sense of responsibility and frequently take action). Based on these types, this paper defines four corresponding agent personas: (a) comfort-oriented, (b) balanced, (c) cost-oriented, and (d) grid-friendly. The questionnaire items related to these personas can be selected from Table 2. These agent personas are described in natural language through prompt engineering, as shown in Figure 2.
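For illustration, the four persona types could be mapped to system-prompt fragments along the following lines. The wording below is hypothetical and only loosely inspired by Figure 2, not the paper's exact prompts:

```python
PERSONA_TEMPLATES = {
    # Hypothetical natural-language persona descriptions for the four types.
    "comfort-oriented": (
        "You prioritize thermal comfort above all else. You are reluctant to "
        "raise the cooling setpoint even when incentive prices are high."
    ),
    "balanced": (
        "You weigh comfort and cost evenly. You accept moderate setpoint "
        "increases when the incentive price makes it worthwhile."
    ),
    "cost-oriented": (
        "You aim to minimize cooling cost and maximize DR compensation. You "
        "tolerate noticeable discomfort if the incentive price is attractive."
    ),
    "grid-friendly": (
        "You feel responsible for grid stability and respond to DR events "
        "whenever possible, even with little financial benefit."
    ),
}

def build_persona_prompt(persona: str) -> str:
    """Return the system-prompt fragment describing the agent's persona."""
    if persona not in PERSONA_TEMPLATES:
        raise ValueError(f"Unknown persona: {persona}")
    return f"You are a building user. {PERSONA_TEMPLATES[persona]}"
```

Defining personas this way replaces behavioral parameter tuning with a short natural-language edit, which is the key practical advantage claimed for the persona module.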

2.3. Perception

The perception module collects dynamic information from the environment. It supports the agent’s decision-making for air-conditioning setpoint adjustments. In a DR program, building users without an EMS make setpoint decisions directly. These decisions are mainly influenced by factors such as the current indoor temperature, the current incentive price, and the incentive price for the next time step [50,51]. Building users with an EMS also consider the EMS recommendations. They need to decide whether to follow the suggested setpoint.
In this paper, an optimization model is developed to represent the EMS in the building, as shown in Equation (1). The optimization time interval is set to 30 min. The EMS recommends the optimized setpoint Topt,t to building users for the current time step.
\[
\min_{x_t,\; t \in T} \; \mathrm{Obj}(P_t, x_t) \quad \text{s.t.} \quad f(x_t) = 0, \;\; h(x_t) \le 0
\]
where T is the set of time steps for the optimization day; Obj is the objective function; xt is the decision variable, representing the air-conditioning setpoint at time t; Pt is the incentive price at time t during the DR period; and f and h are the corresponding equality and inequality constraints.
Three key objective functions are defined: response quantity, cooling cost, and thermal discomfort, as shown in Equations (2)–(4).
\[
\mathrm{Obj}_1 = F_{DR\_quantity} = \sum_{t=T_{start}}^{T_{end}} \left( Q_{base,t} - Q_{cool,t} \right)
\]
\[
\mathrm{Obj}_2 = F_{cost} = \sum_{t \in T} \left( P_{cool,t} \times Q_{cool,t} \right) - \sum_{t=T_{start}}^{T_{end}} P_t \times \left( Q_{base,t} - Q_{cool,t} \right)
\]
\[
\mathrm{Obj}_3 = F_{discomfort} = \sum_{t=T_{start}}^{T_{end}} \left| T_{in,t} - T_{default} \right| \times \Delta t
\]
where FDR_quantity is the quantity of cooling load reduction achieved during the DR period [kWh]; Fcost is the cooling cost for the optimization day [CNY]; Fdiscomfort is the thermal discomfort level for the optimization day [°C·h]; T is the duration of the optimization day; Tstart is the start time of the DR period; Tend is the end time of the DR period; Qbase,t is the baseline cooling load at time t [kW]; Qcool,t is the actual cooling load at time t [kW]; Pt is the incentive price at time t [CNY/kWh]; Pcool,t is the unit cooling price at time t [CNY/kWh]; Tin,t is the indoor temperature at time t [°C]; Tdefault is the default setpoint [°C]; and Δt is the optimization time interval [h].
The overall objective function of the EMS is presented in Equation (5).
\[
\mathrm{Obj} = \eta_1 \times \mathrm{Obj}_1 + \eta_2 \times \mathrm{Obj}_2 + \eta_3 \times \mathrm{Obj}_3
\]
where η1, η2, and η3 are the weight coefficients of the corresponding objective functions.
The lower and upper bounds of the setpoint range are constrained as shown in Equation (6).
\[
T_{in,\min} \le x_t \le T_{in,\max}
\]
where Tin,min and Tin,max are the lower and upper limits of the acceptable indoor temperature, respectively.
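To make Equations (1)-(6) concrete, the sketch below enumerates candidate setpoint plans and scores them with the weighted objective. The load model (a fixed kW reduction per degree of setpoint increase), the negation of the response-quantity term so that maximizing it fits the overall minimization, and all default parameter values are illustrative assumptions; a real EMS would use a thermal model such as 5R1C and a proper solver:

```python
from itertools import product

def ems_optimize(prices, q_base, t_default=26.0, t_min=24.0, t_max=28.0,
                 p_cool=0.7, weights=(1.0, 1.0, 1.0), dt=0.5,
                 load_per_deg=0.8):
    """Brute-force sketch of the EMS optimization in Eqs. (1)-(6).

    prices: incentive price per DR time step [CNY/kWh].
    q_base: baseline cooling load per DR time step [kW].
    Assumes (for illustration only) that raising the setpoint by 1 degC cuts
    the cooling load by `load_per_deg` kW and that the indoor temperature
    tracks the setpoint.
    """
    eta1, eta2, eta3 = weights
    candidates = [t_default + d for d in (-2.0, -1.0, 0.0, 1.0, 2.0)
                  if t_min <= t_default + d <= t_max]
    best_plan, best_obj = None, float("inf")
    for plan in product(candidates, repeat=len(prices)):
        q_cool = [max(qb - load_per_deg * (x - t_default), 0.0)
                  for qb, x in zip(q_base, plan)]
        obj1 = -sum(qb - qc for qb, qc in zip(q_base, q_cool))   # response quantity (negated)
        obj2 = sum(p_cool * qc - p * (qb - qc)                   # cooling cost minus compensation
                   for p, qb, qc in zip(prices, q_base, q_cool))
        obj3 = sum(abs(x - t_default) * dt for x in plan)        # thermal discomfort
        obj = eta1 * obj1 + eta2 * obj2 + eta3 * obj3
        if obj < best_obj:
            best_obj, best_plan = obj, plan
    return best_plan, best_obj
```

With a 30 min interval and a 3 h DR window the plan has six steps, so exhaustive enumeration over five actions per step remains tractable; larger problems would call for the optimization solvers cited in Section 1.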
Prompt engineering is employed to describe the environmental scenario at time t in natural language. The prompt includes the current indoor temperature Tin,t, the current incentive price Pt, the incentive price for the next time step Pt+1, the default setpoint Tdefault, the intuitive setpoint Tadj,t (detailed in Section 2.4.1), and the EMS recommendations Topt,t. An environmental scenario perception prompt template is shown in Figure 3.

2.4. Decision

In the decision module, the agent uses the LLM’s common knowledge to reason and make decisions based on the current environmental scenario. In this paper, the decision interval of the LLM is set to 30 min. Additionally, if the agent has a memory module, it can retrieve historical experiences from its memory. The following two sections present the decision-making process from two perspectives: a decision without an EMS and a decision with an EMS.

2.4.1. Decision Without an EMS

The agent without an EMS makes setpoint adjustment decisions directly. The decision-making process is divided into four steps: environmental perception, experience retrieval, prompt generation, and decision output.
Step 1: Environmental perception. The agent perceives the current environmental scenario and describes it as a corresponding prompt.
Step 2: Experience retrieval. The current environmental scenario is embedded into a vector. This vector is then queried within the memory module to retrieve the most similar past experience (detailed in Section 2.6). The retrieved experience is used as a few-shot example and integrated into the prompt generator.
Step 3: Prompt generation. The decision-making prompt consists of four key components: the system prompt, the few-shot experiences, the scenario description, and the available actions, as shown in Figure 4 (see Supplementary Material A1 for details).
The system prompt defines the agent’s persona and constrains the LLM’s response output. The few-shot experiences represent similar memory entries retrieved from the memory module. The scenario description provides the environmental perception information. The available actions outline the actions the agent can take. In this study, five actions are defined:
  • 0: Significantly Increase—Increase the temperature setpoint by 2 °C (ΔT = +2 °C);
  • 1: Lightly Increase—Increase the temperature setpoint by 1 °C (ΔT = +1 °C);
  • 2: No Change—Keep the temperature setpoint unchanged (ΔT = +0 °C);
  • 3: Lightly Decrease—Decrease the temperature setpoint by 1 °C (ΔT = −1 °C);
  • 4: Significantly Decrease—Decrease the temperature setpoint by 2 °C (ΔT = −2 °C).
It is worth noting that human decision-making is often imprecise, so a 1 °C action interval is considered reasonable. Additionally, building users usually avoid large setpoint changes within a half-hour. Therefore, a maximum adjustment of 2 °C from the previous setpoint is also reasonable.
At each decision time step, a prompt is generated based on the current environmental scenario. The decision LLM then uses this prompt to reason and determine the appropriate action for the current time step.
Step 4: Decision output. To enable the agent to reason like humans, chain-of-thought (CoT) prompting techniques are used [52]. These techniques allow the LLM to generate a series of sentences outlining the step-by-step reasoning process, leading to the final decision. Once the prompt is input, the LLM generates the setpoint adjustment action for the current time step. This action is then translated into an air-conditioning setpoint Tadj,t (=Tset,t + ΔT), which adjusts the setpoint in the building model. By repeating these steps, the agent’s decision-making process forms a closed-loop system.
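Steps 3 and 4 can be sketched as a single decision function. The prompt wording, the `llm` callable, and the answer-parsing logic are illustrative assumptions rather than the paper's exact implementation:

```python
ACTIONS = {
    # Action index -> setpoint change in degC (Section 2.4.1).
    0: +2, 1: +1, 2: 0, 3: -1, 4: -2,
}

def decide_setpoint(llm, persona, scenario, few_shot, t_set_prev,
                    t_min=24.0, t_max=28.0):
    """One decision step: build the CoT prompt, query the LLM, map the
    chosen action to a new setpoint, and clamp it to the comfort range."""
    prompt = (
        f"{persona}\n"
        f"Past experiences: {few_shot}\n"
        f"Current scenario: {scenario}\n"
        "Available actions: 0=+2C, 1=+1C, 2=no change, 3=-1C, 4=-2C.\n"
        "Think step by step, then answer with 'Action: <index>'."
    )
    response = llm(prompt)
    # Naive parse of the final answer line (a real agent should validate harder).
    action = int(response.rsplit("Action:", 1)[1].split()[0])
    t_adj = t_set_prev + ACTIONS[action]
    return max(t_min, min(t_max, t_adj)), response
```

Returning the full response alongside the setpoint preserves the chain-of-thought text, which is what gives the LLM-based agent its decision explainability.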

2.4.2. Decision with an EMS

The agent with an EMS considers the EMS recommendations when making decisions. The agent compares the setpoint it derives through its own common-sense reasoning (as described in Section 2.4.1) with the optimized setpoint provided by the EMS. In this section, the setpoint based on intuitive experience is limited to the upper limit of the acceptable indoor temperature. The agent then decides which setpoint to execute. The decision-making process prompt is shown in Figure 5 (see Supplementary Material A2 for details).

2.5. Reflection

The reflection module enables the agent to evaluate its DR performance indicators and historical decisions, accumulate experience, and enrich the memory module. This module is important for the agent’s continuous learning and adaptation. It helps the agent improve performance by analyzing erroneous or imperfect decisions. The following two sections focus on reflections regarding exceeding the maximum indoor temperature and maximum discomfort.

2.5.1. Reflection on Exceeding the Maximum Indoor Temperature

At the end of each simulation day, the agent evaluates whether the indoor temperature exceeded the maximum acceptable temperature (Tin,max), as shown in Figure 6 (see Supplementary Material A3 for details). If the temperature surpasses this threshold at any given time (e.g., at time tmistake in Figure 6), it indicates that an incorrect decision was made at time t (= tmistake − Δt). The reflection LLM then corrects this error. The historical message at time t and the corresponding decision LLM response are used as a prompt for the reflection LLM. The reflection LLM identifies the cause of the mistake and provides a corrected action. This corrected action is then stored in the memory module.

2.5.2. Reflection on Exceeding the Maximum Discomfort

At the end of the simulation day, if the indoor temperature remains within the maximum acceptable range, the agent evaluates whether the thermal discomfort exceeded the maximum acceptable threshold (Fdiscomfort,max), as shown in Figure 7 (see Supplementary Material A4 for details). If the thermal discomfort surpasses this threshold, it indicates that the agent made imperfect decisions. The reflection LLM then corrects these decisions. To do so, the decision LLM’s response messages from historical scenarios across all time steps during the DR period are integrated into a prompt for the reflection LLM. The reflection LLM analyzes the causes of the imperfect decisions and generates corrected actions. These corrected actions are then stored in the memory module.
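The two reflection checks can be combined into one end-of-day routine. The log record format, the threshold values, and the reflection-LLM interface below are hypothetical:

```python
def reflect_end_of_day(reflection_llm, day_log, t_in_max=28.0,
                       discomfort_max=4.0, dt=0.5, t_default=26.0):
    """End-of-day reflection (Sections 2.5.1-2.5.2, illustrative sketch).

    day_log: list of dicts with keys 't', 'T_in', 'response' (decision text).
    Returns corrections produced by the reflection LLM, or [] if neither
    threshold was exceeded.
    """
    # Check 1: did the indoor temperature ever exceed the maximum?
    for i, rec in enumerate(day_log):
        if rec["T_in"] > t_in_max and i > 0:
            mistake = day_log[i - 1]  # decision made at t = t_mistake - dt
            prompt = (f"Indoor temperature exceeded {t_in_max} C.\n"
                      f"Earlier decision: {mistake['response']}\n"
                      "Identify the cause and give a corrected action.")
            return [reflection_llm(prompt)]
    # Check 2: otherwise, was total discomfort above the threshold?
    discomfort = sum(abs(r["T_in"] - t_default) * dt for r in day_log)
    if discomfort > discomfort_max:
        history = "\n".join(r["response"] for r in day_log)
        prompt = (f"Discomfort {discomfort:.1f} C*h exceeded "
                  f"{discomfort_max} C*h.\nDecisions:\n{history}\n"
                  "Analyze the causes and give corrected actions.")
        return [reflection_llm(prompt)]
    return []
```

Note the ordering: the temperature check short-circuits the discomfort check, mirroring the condition in Section 2.5.2 that discomfort is only evaluated when the temperature stayed within range.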

2.6. Memory

The memory module stores historical setpoint adjustment experiences, which help guide the agent in making accurate decisions. The memory module’s implementation consists of three steps: initialization, memory retrieval, and memory storage, as shown in Figure 8.
Step 1: Initialization. The initialization of the memory module is similar to the foundational knowledge a building user has about air-conditioning setpoint adjustment before participating in a DR program. A memory entry is defined as a vector M = [T, PT, PT+1, Tset,T, Tin,T, AT], where T is the simulation time step, PT is the incentive price at time T, PT+1 is the incentive price at the next time step, Tset,T is the setpoint at previous time step, Tin,T is the indoor temperature at time T, and AT is the setpoint adjustment action at time T. The initial memory consists of n memory entries. In this study, it is assumed that the initial memory is empty.
Step 2: Memory retrieval. At each decision time step, the agent’s perception of the current environmental scenario is embedded into a vector P = [t, Pt, Pt+1, Tset,t, Tin,t]. This vector is then searched within the memory module’s setpoint adjustment memory database to find the most similar memory. In this study, similarity is calculated using the formula shown in Equation (7). The memories in the memory database are sorted in ascending order of this similarity score (smaller values indicate closer matches), and the most similar memories are selected. These similar memories are then presented to the agent in a few-shot manner to assist it in making accurate reasoning and decisions for the current environmental scenario. In this paper, memory retrieval is independent of the agent’s persona type. It is worth noting that these few-shot experiences merely provide a reference based on past decisions. The final decision still relies on the LLM’s reasoning based on the current environmental scenario.
\[
\mathrm{Similarity}_n = \omega_1 \times \left| t^n - t \right| + \omega_2 \times \left| P_t^n - P_t \right| + \omega_3 \times \left| P_{t+1}^n - P_{t+1} \right| + \omega_4 \times \left| T_{set,t}^n - T_{set,t} \right| + \omega_5 \times \left| T_{in,t}^n - T_{in,t} \right|
\]
where Similarityn is the similarity between the current environmental scenario vector and the nth memory entry vector, and ω1–ω5 are the weight coefficients.
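Since Equation (7) is a weighted sum of absolute differences, it behaves as a distance. A minimal retrieval sketch follows; the weight values and `k` are illustrative assumptions:

```python
def retrieve_memories(memory, scenario, weights=(1.0, 1.0, 1.0, 1.0, 1.0), k=2):
    """Return the k memory entries closest to the current scenario.

    memory: list of entries M = [T, P_T, P_T+1, T_set, T_in, A].
    scenario: perception vector P = [t, P_t, P_t+1, T_set, T_in].
    Equation (7) yields a weighted L1 distance: smaller values mean more
    similar entries, so sorting ascending puts the closest matches first.
    """
    def distance(entry):
        return sum(w * abs(e - s)
                   for w, e, s in zip(weights, entry[:5], scenario))
    return sorted(memory, key=distance)[:k]
```

The returned entries are then formatted as few-shot examples in the decision prompt (Section 2.4.1, Step 2).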
Step 3: Memory storage. At the end of each simulation day, the agent reflects on its past decisions and corrects them, resulting in new setpoint adjustment experiences. Each experience is represented as a vector M as defined in Step 1. Memory entries are updated according to the following rules:
(a)
Exact duplicate: If an existing memory entry has identical values for all elements of M, no new entry is added.
(b)
Action correction: If all elements except AT match an existing memory entry, the existing AT is replaced with the corrected action.
(c)
New scenario: If any of [T, PT, PT+1, Tset,T, Tin,T] differ, the new memory vector is added as a new entry in the memory database.
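The three update rules (a)-(c) can be implemented directly. A sketch, using plain lists for memory entries:

```python
def store_memory(memory, new_entry):
    """Apply update rules (a)-(c) from Section 2.6, Step 3, in place.

    new_entry: M = [T, P_T, P_T+1, T_set, T_in, A].
    (a) exact duplicate -> no change; (b) same scenario, different action ->
    overwrite the stored action; (c) new scenario -> append a new entry.
    """
    for entry in memory:
        if entry == new_entry:            # (a) exact duplicate
            return memory
        if entry[:5] == new_entry[:5]:    # (b) action correction
            entry[5] = new_entry[5]
            return memory
    memory.append(new_entry)              # (c) new scenario
    return memory
```

Rule (b) is what lets the reflection module overwrite a mistaken action for an already-seen scenario, as in the 21 August example analyzed in Section 4.2.2.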

3. Case Study

3.1. Simulation Platform

To demonstrate the key modeling capabilities of the LLM-based agent, a simulation platform was constructed based on [53], as shown in Figure 9. The building’s dynamic thermal model was established using the 5R1C reduced order model [54]. Weather data were sourced from the Meteonorm v8.0.0 meteorological software. The internal heat gain schedule (e.g., from occupants and equipment) was obtained from the EnergyPlus v24.1.0 building simulation software. The acceptable temperature range for building users was derived from the ASHRAE database. Building model parameters were calibrated using the building’s design documents. The building’s default setpoint was set to 26 °C. The unit cooling price for the building was 0.7 CNY/kWh. The system operator provides DR signals, with the DR period occurring between 15:00 and 18:00. The incentive prices during the DR period are shown in Table 3. The building’s EMS provides a recommended setpoint. The LangChain framework was employed to model LLM-based agents for building users. Additionally, rule-based and optimization-based agents were developed for comparison. The entire simulation ran for 10 days, from 21 August to 30 August, with a time step of 30 min.

3.2. Rule-Based Agent Settings

Table 4 lists the setpoint adjustment rules for the rule-based agent based on [15]. The rule-based agent is compared with the LLM-based agent without an EMS to assess whether the LLM-based agent exhibits common-sense knowledge similar to that of the rule-based agent.

3.3. Optimization-Based Agent Settings

Table 5 lists the acceptable indoor temperature ranges and weight coefficients of objective functions for the optimization-based agent based on [25]. The agent follows the optimization model described in Section 2.3. The optimization-based agent is compared with the LLM-based agent with an EMS to verify the setpoint override behavior of the LLM-based agent.

3.4. LLM-Based Agent Settings

Table 6 lists the parameters for the LLM-based agent. In this paper, OpenAI’s ChatGPT was accessed via its API; the model version used in both the decision and reflection modules was “gpt-4o.” The LLM temperature was set to 0.5, a setting that balances response relevance with appropriate variability.

4. Results and Discussion

Section 4.1 compares the DR performance of the LLM-based agent with rule-based and optimization-based agents across four different personas. Section 4.2 provides a detailed analysis of these agents’ decision-making processes. Section 4.3 evaluates the performance of the LLM-based agent without reflection and memory modules. Section 4.4 analyzes the effects of response randomness in LLMs on decision-making.

4.1. DR Performance Comparison

4.1.1. Comparison Between Rule-Based and LLM-Based Agents

Figure 10 shows the DR performance gap (response quantity, cooling cost, and thermal discomfort) between rule-based agents and LLM-based agents (without EMS) across four personas. As the simulation progresses, the DR performance of the LLM-based agents with comfort-oriented and grid-friendly personas remains unchanged. The performance of the LLM-based agents with balanced and cost-oriented personas gradually stabilizes, with little change after 24 August. Due to the reflection and memory modules, the LLM-based agent can accumulate experience and make reasonable choices within the constraints of maximum indoor temperature and thermal discomfort. The LLM-based agents with comfort-oriented and balanced personas perform consistently with the rule-based agents across all DR indicators. The cost-oriented persona agent shows minor differences, while the grid-friendly persona agent exhibits larger discrepancies. The detailed decision-making process is further analyzed in Section 4.2.

4.1.2. Comparison Between Optimization-Based and LLM-Based Agents

Figure 11 shows the DR performance gap (response quantity, cooling cost, and thermal discomfort) between optimization-based agents and LLM-based agents (with EMS) across four personas. Except for the grid-friendly persona agent, the DR performance of other persona agents shows noticeable differences. This is because the optimization-based agents rationally balance cost and comfort using corresponding weight coefficients. However, in practice, building users may exhibit an intuitive response behavior driven by their own experience. Based on their attitudes and understanding of incentive prices, they may act irrationally, overriding the EMS’s optimized recommendations [29]. The detailed decision-making process is further analyzed in Section 4.2.

4.2. Analysis of the Decision-Making Process

The following sections provide a detailed analysis of the decision-making processes of agents across four personas. LLM responses for agents with different personas are provided in Supplementary Materials A5–A7.

4.2.1. Comfort-Oriented

Figure 12 shows the setpoint curves of LLM-based, rule-based, and optimization-based agents with the comfort-oriented persona. The rule-based and LLM-based agents (with or without EMS) maintain a constant setpoint of 26 °C. The optimization-based agent adjusts the setpoint to 26.4 °C at 16:00, increases it to 28 °C at 17:00, and then lowers it back to 26 °C at 17:30. This indicates that the LLM-based agent possesses common sense corresponding to the comfort-oriented persona. Even when the EMS recommends a moderate increase during the period of high incentive prices, the LLM-based agent prioritizes maintaining its setpoint unchanged.
At 16:00 on 21 August, although the EMS recommended a setpoint of 28 °C, the LLM-based agent (with EMS) chose to “Follow your intuitive experience” (Action 0). The reasoning aligns with the agent’s comfort-oriented persona.

4.2.2. Balanced

Figure 13 shows the setpoint curves of LLM-based, rule-based, and optimization-based agents with the balanced persona. On 21 August, the LLM-based agent (without EMS) raised the setpoint to 30 °C; it then used reflection and memory to correct past erroneous decisions, and by 24 August its decisions had stabilized. Compared to the rule-based agent, the LLM-based agent (without EMS) set the setpoint to 26 °C at 15:00 (vs. 27 °C) and 27 °C at 17:00 (vs. 26 °C). Despite these slight differences, the two achieved similar DR performance (Figure 10). Compared to the optimization-based agent, the LLM-based agent (with EMS) exhibited setpoint override behavior that appeared random; Section 4.4.2 discusses this randomness in detail.
At 15:30 on 21 August, the LLM-based agent (without EMS) decided to keep the setpoint at 27 °C (Action 2: No change) under an incentive price of 2.5 CNY/kWh. It predicted that the incentive price would rise to 4.5 CNY/kWh in the next time step and planned to increase the setpoint then, maximizing economic benefit while maintaining the same level of comfort loss. This decision reflects the LLM's human-like common sense; rule-based agents would generally require complex rule design to achieve it.
By the end of 21 August, six experience tuples, generated at 30 min decision intervals and denoted as M = [T, P_T, P_T+1, T_set,T, T_in,T, A_T], namely [30, 2.5, 2.5, 26, 26, 1], [31, 2.5, 4.5, 27, 27, 2], [32, 4.5, 4.5, 27, 27, 1], [33, 4.5, 0.5, 28, 28, 1], [34, 0.5, 0.5, 29, 29, 1], and [35, 0.5, 0, 30, 30, 3], had been stored in the memory module as newly acquired experiences. The agent then noticed that the indoor temperature exceeded the 28 °C threshold. This occurred because the agent increased the setpoint from 28 °C to 29 °C at 16:30 (Action 1: Lightly increase), causing the temperature to surpass the limit by 17:00. The agent therefore invoked its reflection module to review the decision made at 16:30 and corrected it to the appropriate action (Action 2: No change). The corrected experience [33, 4.5, 0.5, 28, 28, 2] replaced the previously incorrect experience [33, 4.5, 0.5, 28, 28, 1]. The corrected actions [34, 0.5, 0.5, 28, 28, 2] and [35, 0.5, 0, 28, 28, 2] were treated as new experiences and subsequently stored in the memory module.
At the end of 22 August, the agent had kept the indoor temperature at or below 28 °C throughout the day. However, it found that the maximum thermal discomfort was 5 °C·h, exceeding the acceptable limit of 3 °C·h. The agent then invoked its reflection module to review the day's decisions and re-generated corrected actions for the corresponding moments. These corrected actions were added to its memory: [34, 0.5, 0.5, 28, 28, 3] and [35, 0.5, 0, 28, 28, 3] replaced the previously incorrect experiences [34, 0.5, 0.5, 28, 28, 2] and [35, 0.5, 0, 28, 28, 2].
Through these corrections by the reflection module, the LLM continuously updated the experiences stored in the memory module, enabling the agent to make decisions that complied with both the maximum temperature and thermal comfort constraints. As shown in Figure 13, the agent’s actions gradually stabilized over the course of the simulation, initially exceeding the maximum temperature but later remaining within both the temperature and comfort limits, demonstrating the effectiveness of the reflection module.
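The storage and reflection-driven correction of experience tuples described above can be sketched as follows. The tuple layout [T, P_T, P_T+1, T_set, T_in, A] and the example values come from the text; the class and method names are hypothetical illustrations, not the paper's implementation:

```python
from dataclasses import dataclass, field

@dataclass
class MemoryModule:
    """Stores experience tuples of the form [T, P_T, P_T+1, Tset, Tin, A]."""
    experiences: list = field(default_factory=list)

    def store(self, exp):
        self.experiences.append(list(exp))

    def replace(self, old, new):
        # Reflection overwrites an incorrect past experience in place.
        for i, exp in enumerate(self.experiences):
            if exp == list(old):
                self.experiences[i] = list(new)

mem = MemoryModule()
# Experiences generated at 30 min decision intervals on 21 August.
for exp in [[30, 2.5, 2.5, 26, 26, 1], [31, 2.5, 4.5, 27, 27, 2],
            [32, 4.5, 4.5, 27, 27, 1], [33, 4.5, 0.5, 28, 28, 1],
            [34, 0.5, 0.5, 29, 29, 1], [35, 0.5, 0.0, 30, 30, 3]]:
    mem.store(exp)

# Reflection: the 16:30 decision pushed the indoor temperature past 28 C,
# so the action at time step 33 is corrected from Action 1 to Action 2.
mem.replace([33, 4.5, 0.5, 28, 28, 1], [33, 4.5, 0.5, 28, 28, 2])
```

Repeated over successive days, this replace-and-store cycle is what gradually stabilizes the agent's decisions within the temperature and comfort constraints.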

4.2.3. Cost-Oriented

Figure 14 shows the setpoint curves of LLM-based, rule-based, and optimization-based agents with the cost-oriented persona. The LLM-based agent (without EMS) gradually stabilizes its setpoint through reflective behavior, as explained in Section 4.2.2. Compared to the rule-based agent, it sets the setpoint to 27 °C at 15:00 and 16:30 (vs. 28 °C). This is due to the LLM’s reflective behavior, which forms inherent memories from few-shot experiences. However, these memories can be broken by the LLM’s common-sense knowledge, meaning it does not always strictly follow the few-shot experiences, as detailed in Section 4.4.3. Compared to the optimization-based agent, the LLM-based agent (with EMS) tends to maintain the setpoint of 28 °C. This is due to the agent’s persona, which favors adjusting the setpoint for cost reduction. As a result, it does not strictly follow the recommended setpoint from the optimization model. This reflects the human-like irrationality in decision-making.

4.2.4. Grid-Friendly

Figure 15 shows the setpoint curves of LLM-based, rule-based, and optimization-based agents with the grid-friendly persona. The setpoint of the LLM-based agent (without EMS) rises continuously, reaching 38 °C, which effectively turns the air-conditioning off. This reflects the proactive DR behavior of the grid-friendly persona. The optimization-based agent and the LLM-based agent (with EMS) are constrained by temperature limits, both reaching the upper limit of 30 °C.
At 16:00 on 21 August, even though the indoor temperature reached the upper limit of 30 °C, the LLM-based agent (without EMS) still decided to increase the setpoint (Action 0: Significant increase). Additionally, the few-shot experiences were not enough to constrain the agent’s decision-making. This indicates that the agent’s behavior is entirely driven by its persona, as defined through natural language. It also reflects that the LLM possesses the common sense associated with the grid-friendly persona.
All LLM-based agents with reflection and memory modules perform experience retrieval at each simulation time step when a DR signal is received, querying their memory repositories for the most similar past experience. Table 7 summarizes the numbers of memory entries added and modified for different agent personas during the simulation. It can be observed that after 25 August, memory entries are no longer frequently updated. This is due to the relatively stable external incentive signals and the agents having already accumulated sufficient experience to respond effectively.
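The per-time-step experience retrieval could be implemented, for example, as a nearest-neighbour lookup over the stored tuples. This is a hedged sketch: the choice of features and the Euclidean distance are illustrative assumptions, not the paper's stated similarity measure:

```python
def retrieve_most_similar(memory, price, indoor_temp):
    """Return the stored experience closest to the current DR state.

    `memory` holds tuples [T, P_T, P_T+1, Tset, Tin, A]; similarity here
    is an illustrative Euclidean distance over (incentive price, indoor
    temperature).
    """
    def distance(exp):
        return ((exp[1] - price) ** 2 + (exp[4] - indoor_temp) ** 2) ** 0.5
    return min(memory, key=distance) if memory else None

memory = [[31, 2.5, 4.5, 27, 27, 2], [33, 4.5, 0.5, 28, 28, 2]]
best = retrieve_most_similar(memory, price=4.5, indoor_temp=28)
# The retrieved experience is passed to the LLM as a few-shot example.
```

The retrieved tuple is injected into the decision prompt, which is why memory updates taper off once the stored experiences already cover the recurring incentive-price scenarios.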
In summary, the results show that the LLM-based agent’s decision-making behavior aligns with its persona characteristics and is largely consistent with the rule-based agent’s behavior. This demonstrates that the LLM-based agent possesses human-like common sense derived from natural language-defined personas. It also proves the feasibility of replacing rules with natural language in ABM. With EMS, the LLM-based agent resets the EMS-recommended setpoint based on its persona. This reflects the agent’s irrational decision-making characteristics, as explained in [16]. The agent can learn during the simulation using the reflection and memory modules. However, LLM outputs still show randomness. A detailed analysis of this will be provided in Section 4.4.

4.3. Evaluation of the LLM-Based Agent Without Reflection and Memory Modules

Figure 16 shows the setpoint curves for LLM-based agents with four personas (comfort-oriented, balanced, cost-oriented, and grid-friendly) during the 5-day simulation. The comfort-oriented and grid-friendly agents exhibit constant setpoint adjustment behavior. The balanced and cost-oriented agents display more randomness and variability in their adjustments. However, their behaviors still align with their respective personas. Without reflection and memory modules, their decisions are less stable, highlighting the importance of these modules. Table 8 shows the maximum, minimum, mean, and standard deviation of the thermal discomfort indicator for the four persona agents during the 10-day simulation. The cost-oriented agent has a higher mean discomfort than the balanced agent. This validates the rationality of the prompt design.

4.4. Analysis of LLM Response Randomness

4.4.1. Evaluation of LLM Response Randomness

To analyze the randomness of LLM outputs, 10 repeated simulations were conducted. Figure 17 and Figure 18 show the randomness in DR performance indicators (response quantity, cooling cost, and thermal discomfort) for comfort-oriented, balanced, cost-oriented, and grid-friendly agents (without and with EMS) during the 10 repeated simulations. The error bars represent the standard deviation of the mean values across simulations. In Figure 17, the standard deviation is larger in the first two or three days, as the agent is still learning from its past experience. In the later days, the agent relies more on its learned experience, reducing randomness, although some persists. Figure 18 shows randomness only in the balanced agent, while the other persona agents exhibit none. This is because the natural language descriptions of the personas shape the agent's decision-making tendencies; the balanced agent, lacking a clear tendency, is the only one to show randomness in its decisions.
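The error bars in Figures 17 and 18 correspond to the mean and standard deviation of each indicator across the 10 repeated runs. A minimal sketch of that aggregation, with hypothetical cooling-cost values in place of the paper's actual simulation outputs:

```python
import statistics

def daily_mean_std(runs):
    """Mean and sample standard deviation of one day's indicator values
    across repeated simulation runs."""
    return statistics.mean(runs), statistics.stdev(runs)

# Hypothetical cooling-cost values (CNY) for one day across 10 repeated runs.
costs = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.0, 12.1]
mean, std = daily_mean_std(costs)
```

A shrinking standard deviation over successive days is then the quantitative signature of the agents converging on their learned experience.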

4.4.2. Example of LLM Response Randomness

This section presents an example of LLM response randomness for the LLM-based agent with the balanced persona. The conditions are as follows: the incentive price is 0.5 CNY/kWh, the current indoor temperature is 28 °C, the intuitive setpoint is 28 °C, and the EMS-recommended setpoint is 26 °C. In this scenario, the agent randomly chooses one of two actions: (1) Action 0: Following the intuitive experience (the LLM response is shown in Box 1), and (2) Action 1: Following the EMS recommendations (the LLM response is shown in Box 2). The agent's reasoning and decision-making processes can be understood through the LLM outputs. Both actions are reasonable; this mirrors the variability in human decisions.
Box 1. LLM response corresponding to Action 0: Following the intuitive experience.
#### The current incentive price is 0.50 CNY/kWh, which is a moderate incentive to reduce electricity consumption.
#### The current indoor temperature is 28.00 °C, which is at the upper limit of your comfort tolerance. Adjusting the setpoint to 28.00 °C would maintain the current temperature, potentially maximizing economic benefits by reducing energy consumption.
#### The energy management system recommends a setpoint of 26.00 °C, which would increase electricity consumption and reduce the potential economic benefits from the incentive price.
#### Given the moderate incentive and the fact that the current indoor temperature is already at 28.00 °C, following your intuitive experience to maintain the setpoint at 28.00 °C would maximize economic benefits without further reducing comfort.
Response to user: #### 0 (Following the intuitive experience)
Box 2. LLM response corresponding to Action 1: Following the EMS recommendations.
#### The current incentive price is 0.50 CNY/kWh, which is a moderate incentive to reduce electricity consumption.
#### The current indoor temperature is 28.00 °C, which is at the upper limit of my comfort tolerance.
#### The baseline setpoint is 26 °C, which is more comfortable than the current indoor temperature.
#### The energy management system recommends a setpoint of 26.00 °C, which aligns with the baseline and would maintain comfort.
#### Following my intuitive experience would mean keeping the setpoint at 28.00 °C, which would maximize economic benefits by reducing electricity consumption, but it might compromise comfort.
#### Since the incentive price is not very high and comfort is important, following the energy management system’s recommendation to maintain a setpoint of 26.00 °C seems reasonable.
#### Response to user: #### 1 (Following the EMS recommendations)

4.4.3. Impact of Few-Shot Experiences on LLM Outputs

This section evaluates whether few-shot experiences limit the diversity of LLM outputs. With an incentive price of 4.5 CNY/kWh and an indoor temperature of 28 °C, the agent’s historical experience suggests Action 2: No change. However, in repeated simulations, the agent may also choose Action 1: Lightly increase (the LLM response is shown in Box 3). This indicates that the agent does not rely entirely on few-shot experiences and is still influenced by the natural language-defined persona characteristics.
Box 3. LLM response corresponding to Action 1: Lightly increase.
#### The current incentive price is 4.50 CNY/kWh, which is relatively high, indicating a strong encouragement to reduce electricity consumption.
#### The current indoor temperature is 28.00 °C, which is within a comfortable range for many people.
#### In a previous similar scenario with the same incentive price and indoor temperature, the decision was to take no action (Action 2).
#### However, given the high incentive price, it might be beneficial to consider a slight adjustment to maximize economic benefits while staying within comfort tolerance.
#### A Lightly Increase (Action 1) would slightly reduce electricity consumption and take advantage of the high incentive price without significantly impacting comfort.
#### Considering the balance between economic benefits and comfort, a Lightly Increase (Action 1) seems to be a reasonable choice.
Response to user: #### 1 (Lightly increase)

5. Conclusions

Traditional ABM methods struggle to encode building users’ natural language features and common sense into rules or mathematical equations. This paper proposes an LLM-based agent framework to simulate building users’ air-conditioning setpoint adjustment behavior under DR. The LLM-based agent is compared with rule-based and optimization-based agents. The results show that the proposed LLM-based agent has human-like common sense derived from natural language-defined personas. This proves that replacing rules with natural language in ABM is feasible. The conclusions are summarized as follows:
(1)
The LLM-based agent can define personas using natural language based on empirical survey data. This reduces the effort involved in rule design and parameter selection compared to traditional ABM methods. Without an EMS, the LLM-based agent performs similarly to the rule-based agent in DR performance indicators. This shows that natural language can provide the agent with proper persona characteristics and common sense. With an EMS, the LLM-based agent exhibits setpoint override behavior compared to the optimization-based agent. This reflects human-like irrational factors in decision-making based on the agent’s common sense.
(2)
The LLM-based agent can model human characteristics that are difficult to parameterize. It can reason and reflect, using LLM outputs to explain its decisions. Memory and reflection modules enable the agent to learn and adapt during the simulation. The LLM outputs may exhibit randomness when the persona tendencies from natural language are not clearly defined. This randomness partly mirrors the variability in human decisions.
(3)
The LLM-based agent’s persona heterogeneity can be achieved through varied natural language descriptions based on empirical survey data. This reduces the time needed for traditional persona development.
This study primarily presents a conceptual framework integrating LLMs into agent-based modeling for simulating building users’ air-conditioning setpoint adjustment behavior. As an emerging conceptual approach, several limitations remain and warrant future investigation. First, the proposed framework has been compared only with rule-based and optimization-based agent-based modeling approaches. Its ability to accurately reflect real human DR behavior in practical settings still requires further validation through empirical data. Integrating such real-world data with the proposed agent framework poses a key challenge for future research. Second, variations in model parameters, such as the choice of LLM or temperature settings, may affect model outputs. Systematic exploration of these factors is necessary to better understand the robustness and generalizability of the framework.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/buildings16050887/s1, Notes A1–A4: Detailed prompt designs for Figure 4, Figure 5, Figure 6 and Figure 7; Notes A5–A7: LLM responses for agents with different personas.

Author Contributions

Conceptualization, M.D.; Data curation, M.D.; Formal analysis, M.D.; Funding acquisition, X.P.; Investigation, M.D.; Methodology, M.D.; Project administration, X.P.; Resources, M.D.; Software, M.D.; Supervision, X.P.; Validation, M.D.; Visualization, X.P.; Writing—original draft, M.D.; Writing—review and editing, X.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Merian Fund by the Dutch Research Council, grant number 482.20.608.

Data Availability Statement

The original contributions proposed in this study are included within the manuscript. For further inquiries, please contact the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhang, Y.; Zhao, Y.; Zhang, C.; Feng, C. Multi-Agent Reinforcement Learning-Based Method for Demand Response of Building HVAC Systems. J. Build. Eng. 2025, 108, 112734. [Google Scholar] [CrossRef]
  2. Madani, S.; Pineau, P.-O.; Charlin, L.; Desage, Y. Towards Sustainable Energy Use: Reinforcement Learning for Demand Response in Commercial Buildings. Energy Build. 2025, 342, 115721. [Google Scholar] [CrossRef]
  3. Zhen, C.; Niu, J.; Tian, Z.; Lu, Y.; Liang, C. Risk-Averse Transactions Optimization Strategy for Building Users Participating in Incentive-Based Demand Response Programs. Appl. Energy 2025, 380, 125009. [Google Scholar] [CrossRef]
  4. Koch, M.; Sawant, P.; Eismann, R.; Jones, C.N. A Commissioning-Oriented Approach to Data-Driven Modeling of Buildings with Heat Pumps for Predictive Demand Response. J. Build. Eng. 2025, 111, 113016. [Google Scholar] [CrossRef]
  5. Langevin, J.; Cetin, K.; Willems, S.; Kang, J.; Mahmud, R.; Christensen, T.H.; Li, R.; Knotzer, A.; Olawale, O.W.; Saelens, D.; et al. Customer Enrollment and Participation in Building Demand Management Programs: A Review of Key Factors. Energy Build. 2024, 320, 114618. [Google Scholar] [CrossRef]
  6. Skoczkowski, T.; Bielecki, S.; Wołowicz, M.; Sobczak, L.; Węglarz, A.; Gilewski, P. Participation in Demand Side Response. Are Individual Energy Users Interested in This? Renew. Energy 2024, 232, 121104. [Google Scholar] [CrossRef]
  7. Yan, D.; Hong, T.; Dong, B.; Mahdavi, A.; D’Oca, S.; Gaetani, I.; Feng, X. IEA EBC Annex 66: Definition and Simulation of Occupant Behavior in Buildings. Energy Build. 2017, 156, 258–270. [Google Scholar] [CrossRef]
  8. Mylonas, A.; Tsangrassoulis, A.; Pascual, J. Modelling Occupant Behaviour in Residential Buildings: A Systematic Literature Review. Build. Environ. 2024, 265, 111959. [Google Scholar] [CrossRef]
  9. Carlucci, S.; De Simone, M.; Firth, S.K.; Kjærgaard, M.B.; Markovic, R.; Rahaman, M.S.; Annaqeeb, M.K.; Biandrate, S.; Das, A.; Dziedzic, J.W.; et al. Modeling Occupant Behavior in Buildings. Build. Environ. 2020, 174, 106768. [Google Scholar] [CrossRef]
  10. Putra, H.C.; Andrews, C.J.; Senick, J.A. An Agent-Based Model of Building Occupant Behavior during Load Shedding. Build. Simul. 2017, 10, 845–859. [Google Scholar] [CrossRef]
  11. Malik, J.; Mahdavi, A.; Azar, E.; Chandra Putra, H.; Berger, C.; Andrews, C.; Hong, T. Ten Questions Concerning Agent-Based Modeling of Occupant Behavior for Energy and Environmental Performance of Buildings. Build. Environ. 2022, 217, 109016. [Google Scholar] [CrossRef]
  12. Mashuk, M.S.; Pinchin, J.; Siebers, P.-O.; Moore, T. Comparing Different Approaches of Agent-Based Occupancy Modelling for Predicting Realistic Electricity Consumption in Office Buildings. J. Build. Eng. 2024, 84, 108420. [Google Scholar] [CrossRef]
  13. Mishra, S.; Silva, T.L.; Hellemo, L.; Jaehnert, S.; Egner, L.E.; Petersen, S.A.; Signer, T.; Zimmermann, F.; Bordin, C. Agent-Based Modeling: Insights into Consumer Behavior, Urban Dynamics, Grid Management, and Market Interactions. Energy Strategy Rev. 2025, 57, 101613. [Google Scholar] [CrossRef]
  14. Mazzarino, P.R.; De Vizia, C.; Macii, E.; Patti, E.; Bottaccioli, L. An Agent-Based Framework for Smart Grid Balancing Exploiting Thermal Flexibility of Residential Buildings. In Proceedings of the 2021 IEEE International Conference on Environment and Electrical Engineering and 2021 IEEE Industrial and Commercial Power Systems Europe (EEEIC/I&CPS Europe), Bari, Italy, 7–10 September 2021; pp. 1–6. [Google Scholar]
  15. Zhou, X.; Xue, S.; Du, H.; Ma, Z. Optimization of Building Demand Flexibility Using Reinforcement Learning and Rule-Based Expert Systems. Appl. Energy 2023, 350, 121792. [Google Scholar] [CrossRef]
  16. Good, N. Using Behavioural Economic Theory in Modelling of Demand Response. Appl. Energy 2019, 239, 107–116. [Google Scholar] [CrossRef]
  17. Reis, I.F.G.; Gonçalves, I.; Lopes, M.A.R.; Antunes, C.H. Assessing the Influence of Different Goals in Energy Communities’ Self-Sufficiency—An Optimized Multiagent Approach. Energies 2021, 14, 989. [Google Scholar] [CrossRef]
  18. Reis, I.F.G.; Gonçalves, I.; Lopes, M.A.R.; Antunes, C.H. Towards Inclusive Community-Based Energy Markets: A Multiagent Framework. Appl. Energy 2022, 307, 118115. [Google Scholar] [CrossRef]
  19. Liu, Y.; Kergus, P.; Claveau, F.; Chevrel, P.; Lacarrière, B. Building Thermal Control: Hierarchical Design from Limited Data Using Gray-Box or Black-Box Internal Models for Model Predictive Control. J. Build. Eng. 2025, 110, 112936. [Google Scholar] [CrossRef]
  20. Tang, R.; Fan, C.; Zeng, F.; Feng, W. Data-Driven Model Predictive Control for Power Demand Management and Fast Demand Response of Commercial Buildings Using Support Vector Regression. Build. Simul. 2022, 15, 317–331. [Google Scholar] [CrossRef]
  21. Ala’raj, M.; Radi, M.; Abbod, M.F.; Majdalawieh, M.; Parodi, M. Data-Driven Based HVAC Optimisation Approaches: A Systematic Literature Review. J. Build. Eng. 2022, 46, 103678. [Google Scholar] [CrossRef]
  22. Peng, J.; Luo, Z.; Tan, Y.; Jiang, H.; Yin, R.; Yan, J. Balancing Stakeholder Benefits: A Many-Objective Optimal Dispatch Framework for Home Energy Systems Inspired by Maslow’s Hierarchy of Needs. Adv. Appl. Energy 2024, 13, 100160. [Google Scholar] [CrossRef]
  23. Taheri, S.; Hosseini, P.; Razban, A. Model Predictive Control of Heating, Ventilation, and Air Conditioning (HVAC) Systems: A State-of-the-Art Review. J. Build. Eng. 2022, 60, 105067. [Google Scholar] [CrossRef]
  24. Xu, S.; Chen, X.; Xie, J.; Rahman, S.; Wang, J.; Hui, H.; Chen, T. Agent-Based Modeling and Simulation of the Electricity Market with Residential Demand Response. CSEE J. Power Energy Syst. 2020, 7, 368–380. [Google Scholar] [CrossRef]
  25. Reis, I.F.G.; Gonçalves, I.; Lopes, M.A.R.; Antunes, C.H. A Multi-Agent System Approach to Exploit Demand-Side Flexibility in an Energy Community. Util. Policy 2020, 67, 101114. [Google Scholar] [CrossRef]
  26. Price, C.S.; Moodley, D.; Pillay, A.; Rens, G. An Adaptive, Probabilistic, Cognitive Agent Architecture for Modelling Sugarcane Growers’ Operational Decision-Making. S. Afr. Comput. J. (SACJ) 2022, 34, 152–191. [Google Scholar] [CrossRef]
  27. Schwarzer, J.; Engel, D. Consumer Participation in Demand Response Programs: Development of a Consumat-Based Toy Model. In Advances in Social Simulation; Czupryna, M., Kamiński, B., Eds.; Springer Proceedings in Complexity; Springer International Publishing: Cham, Switzerland, 2022; pp. 315–327. ISBN 978-3-030-92842-1. [Google Scholar]
  28. Vellei, M.; Martinez, S.; Le Dréau, J. Agent-Based Stochastic Model of Thermostat Adjustments: A Demand Response Application. Energy Build. 2021, 238, 110846. [Google Scholar] [CrossRef]
  29. Deng, M.; Peng, X.; Zhao, Y. A Belief-Desire-Intention Agent for Modeling Building Managers’ Air-Conditioning Setpoint Adjustment Behavior under Demand Response. Energy Build. 2025, 346, 116248. [Google Scholar] [CrossRef]
  30. Hoffmann, S.; Adelt, F.; Weyer, J. Modelling End-User Behavior and Behavioral Change in Smart Grids. An Application of the Model of Frame Selection. Energies 2020, 13, 6674. [Google Scholar] [CrossRef]
  31. Zhang, L.; Chen, Z. Opportunities of Applying Large Language Models in Building Energy Sector. Renew. Sustain. Energy Rev. 2025, 214, 115558. [Google Scholar] [CrossRef]
  32. Zhang, L.; Chen, Z.; Ford, V. Advancing Building Energy Modeling with Large Language Models: Exploration and Case Studies. Energy Build. 2024, 323, 114788. [Google Scholar] [CrossRef]
  33. Jiang, G.; Chen, J. Efficient Fine-Tuning of Large Language Models for Automated Building Energy Modeling in Complex Cases. Autom. Constr. 2025, 175, 106223. [Google Scholar] [CrossRef]
  34. Zhang, Y.; Wang, D.; Wang, G.; Xu, P.; Zhu, Y. Data-Driven Building Load Prediction and Large Language Models: Comprehensive Overview. Energy Build. 2025, 326, 115001. [Google Scholar] [CrossRef]
  35. Qin, S.; Guan, H.; Liao, W.; Gu, Y.; Zheng, Z.; Xue, H.; Lu, X. Intelligent Design and Optimization System for Shear Wall Structures Based on Large Language Models and Generative Artificial Intelligence. J. Build. Eng. 2024, 95, 109996. [Google Scholar] [CrossRef]
  36. Song, L.; Zhang, C.; Zhao, L.; Bian, J. Pre-Trained Large Language Models for Industrial Control. arXiv 2023, arXiv:2308.03028. [Google Scholar] [CrossRef]
  37. Zhang, L.; Huang, J.; Chen, C. Multi-Condition Building Decarbonization Using Deep Reinforcement Learning and Large Language Model. Energy Build. 2025, 341, 115810. [Google Scholar] [CrossRef]
  38. Zhang, L.; Chen, Z. Large Language Model-Based Interpretable Machine Learning Control in Building Energy Systems. Energy Build. 2024, 313, 114278. [Google Scholar] [CrossRef]
  39. Wang, Z.; Liu, Z.; Lu, W.; Jia, L. Improving Knowledge Management in Building Engineering with Hybrid Retrieval-Augmented Generation Framework. J. Build. Eng. 2025, 103, 112189. [Google Scholar] [CrossRef]
  40. Li, Y.; Ji, M.; Chen, J.; Wei, X.; Gu, X.; Tang, J. A Large Language Model-Based Building Operation and Maintenance Information Query. Energy Build. 2025, 334, 115515. [Google Scholar] [CrossRef]
  41. Yin, M.; Tang, L.; Webster, C.; Xu, S.; Li, X.; Ying, H. An Ontology-Aided, Natural Language-Based Approach for Multi-Constraint BIM Model Querying. J. Build. Eng. 2023, 76, 107066. [Google Scholar] [CrossRef]
  42. Zheng, Z.; Marié, S.; Farazdaghi, E.; Yahia, E.; Makhoul, K.; Lagarde, T.; Meouche, R.E.; Ababsa, F. Mastering Building Management Systems Data Points Tagging with Minimal Examples: Unveiling the Power of Large Language Models. Energy Build. 2025, 328, 115173. [Google Scholar] [CrossRef]
  43. Xu, Y.; Zhu, S.; Cai, J.; Chen, J.; Li, S. A Large Language Model-Based Platform for Real-Time Building Monitoring and Occupant Interaction. J. Build. Eng. 2025, 100, 111488. [Google Scholar] [CrossRef]
  44. Chen, L.; Darko, A.; Zhang, F.; Chan, A.P.C.; Yang, Q. Can Large Language Models Replace Human Experts? Effectiveness and Limitations in Building Energy Retrofit Challenges Assessment. Build. Environ. 2025, 276, 112891. [Google Scholar] [CrossRef]
  45. Gao, C.; Lan, X.; Li, N.; Yuan, Y.; Ding, J.; Zhou, Z.; Xu, F.; Li, Y. Large Language Models Empowered Agent-Based Modeling and Simulation: A Survey and Perspectives. Humanit. Soc. Sci. Commun. 2024, 11, 1259. [Google Scholar] [CrossRef]
  46. Dang, P.; Zhu, J.; Li, W.; Xie, Y.; Zhang, H. Large-Language-Model-Driven Agents for Fire Evacuation Simulation in a Cellular Automata Environment. Saf. Sci. 2025, 191, 106935. [Google Scholar] [CrossRef]
  47. Wang, B.; Cai, Q.; Sun, Z. Determinants of Willingness to Participate in Urban Incentive-Based Energy Demand-Side Response: An Empirical Micro-Data Analysis. Sustainability 2020, 12, 8052. [Google Scholar] [CrossRef]
  48. Wu, Z.; Peng, R.; Han, X.; Zheng, S.; Zhang, Y.; Xiao, C. Smart Agent-Based Modeling: On the Use of Large Language Models in Computer Simulations. arXiv 2023, arXiv:2311.06330. [Google Scholar] [CrossRef]
  49. Anik, S.M.H.; Gao, X.; Meng, N. Towards Automated Occupant Profile Creation in Smart Buildings: A Machine Learning-Enabled Approach for User Persona Generation. Energy Build. 2023, 297, 113485. [Google Scholar] [CrossRef]
  50. Lyu, J.; Li, J.; Zhao, Z.; Miao, X.; Du, H.; Lai, D.; Yang, Y.; Lian, Z. How Do People Set Air Conditioning Temperature Setpoint in Urban Domestic–Behavior Model in Chinese Three Climate Zones Based on Historical Usage Data. Energy Build. 2023, 284, 112856. [Google Scholar] [CrossRef]
  51. Hua, P.; Wang, H.; Xie, Z.; Lahdelma, R. Integrated Demand Response Method for Heating Multiple Rooms Based on Fuzzy Logic Considering Dynamic Price. Energy 2024, 307, 132577. [Google Scholar] [CrossRef]
  52. Zhang, Z.; Zhang, A.; Li, M.; Smola, A. Automatic Chain of Thought Prompting in Large Language Models. arXiv 2022, arXiv:2210.03493. [Google Scholar] [CrossRef]
  53. Wang, Z.; Chen, B.; Li, H.; Hong, T. AlphaBuilding ResCommunity: A Multi-Agent Virtual Testbed for Community-Level Load Coordination. Adv. Appl. Energy 2021, 4, 100061. [Google Scholar] [CrossRef]
  54. Hallik, J.; Arumägi, E.; Pikas, E.; Kalamees, T. Comparative Assessment of Simple and Detailed Energy Performance Models for Urban Energy Modelling Based on Digital Twin and Statistical Typology Database for the Renovation of Existing Building Stock. Energy Build. 2024, 323, 114775. [Google Scholar] [CrossRef]
Figure 1. The proposed LLM-based agent framework.
Figure 1. The proposed LLM-based agent framework.
Buildings 16 00887 g001
Figure 2. Persona description prompt template.
Figure 2. Persona description prompt template.
Buildings 16 00887 g002
Figure 3. Environmental scenario perception prompt template.
Figure 3. Environmental scenario perception prompt template.
Buildings 16 00887 g003
Figure 4. Decision-making prompt template for agents without an EMS.
Figure 4. Decision-making prompt template for agents without an EMS.
Buildings 16 00887 g004
Figure 5. Decision-making prompt template for agents with an EMS.
Figure 5. Decision-making prompt template for agents with an EMS.
Buildings 16 00887 g005
Figure 6. Reflection on exceeding maximum indoor temperature and prompt template.
Figure 6. Reflection on exceeding maximum indoor temperature and prompt template.
Buildings 16 00887 g006
Figure 7. Reflection on exceeding maximum discomfort and prompt template.
Figure 7. Reflection on exceeding maximum discomfort and prompt template.
Buildings 16 00887 g007
Figure 8. Initialization, memory retrieval, and memory storage of the memory module.
Figure 8. Initialization, memory retrieval, and memory storage of the memory module.
Buildings 16 00887 g008
Figure 9. Simulation platform for demonstrating the LLM-based agent.
Figure 9. Simulation platform for demonstrating the LLM-based agent.
Buildings 16 00887 g009
Figure 10. DR performance gap (response quantity, cooling cost, and thermal discomfort) between rule-based agents and LLM-based agents (without EMS) across four personas.
Figure 10. DR performance gap (response quantity, cooling cost, and thermal discomfort) between rule-based agents and LLM-based agents (without EMS) across four personas.
Buildings 16 00887 g010
Figure 11. DR performance gap (response quantity, cooling cost, and thermal discomfort) between optimization-based agents and LLM-based agents (with EMS) across four personas.
Figure 11. DR performance gap (response quantity, cooling cost, and thermal discomfort) between optimization-based agents and LLM-based agents (with EMS) across four personas.
Buildings 16 00887 g011
Figure 12. Setpoint curves of LLM-based, rule-based, and optimization-based agents with the comfort-oriented persona. The blue and yellow curves are nearly identical and therefore overlap in the figure.
Figure 13. Setpoint curves of LLM-based, rule-based, and optimization-based agents with the balanced persona.
Figure 14. Setpoint curves of LLM-based, rule-based, and optimization-based agents with the cost-oriented persona.
Figure 15. Setpoint curves of LLM-based, rule-based, and optimization-based agents with the grid-friendly persona. The green and purple curves are nearly identical and therefore overlap in the figure.
Figure 16. Setpoint curves for LLM-based agents with four personas (comfort-oriented, balanced, cost-oriented, and grid-friendly) during the 5-day simulation.
Figure 17. Randomness in DR performance indicators (response quantity, cooling cost, and thermal discomfort) for comfort-oriented, balanced, cost-oriented, and grid-friendly agents (without EMS) during the 10 repeated simulations.
Figure 18. Randomness in DR performance indicators (response quantity, cooling cost, and thermal discomfort) for comfort-oriented, balanced, cost-oriented, and grid-friendly agents (with EMS) during the 10 repeated simulations.
Table 1. Comparison of LLM-based ABM and traditional ABM features [48].
Feature | LLM-Based ABM | Traditional ABM
Model implementation | Natural language instruction | Programming
Personalization | Natural language | By parameters
Common sense | Built-in LLM | Specified by rules
Learning and reasoning | Few-shot + zero-shot | Data-demanding or time-consuming learning; no reasoning
Randomness and diversity | Random and diversified | Limited by rules
Model interpretability | By LLM's output | By rules and parameters
Table 2. Examples of empirical questionnaire data related to building users’ DR behavior.
Influence Factors | Description | Item *
Attitude [47] | Reflecting the motivation for energy-saving behavior
  • No matter what others do, I have a moral obligation to reduce power consumption.
  • I think my energy-saving actions can help reduce air pollution and alleviate power shortages.
Behavior ability [47] | Reflecting personal past experience and future obstacles
  • I think it is troublesome to save electricity.
  • I don’t think I can reduce my electricity use because it will reduce my comfort.
External motivators [47] | Incentives and policy factors that change personal electricity consumption behavior
  • If monetary rewards were provided to consumers who save electricity, I would be more willing to save electricity.
  • The rules and regulations require saving electricity, and I will resolutely implement it.
Preferences [30] | Subjective preferences when making decisions; related to (1) costs, (2) comfort, (3) grid-friendly behavior
  • When making electricity-related decisions, it is important to ensure that my comfort is not compromised.
  • When making electricity-related decisions, it is important to keep the setpoint within my tolerance range while maximizing my economic benefits.
  • When making electricity-related decisions, it is important to achieve as much load reduction as possible, even if it means sacrificing some of my comfort.
* Response choices (except preferences): 1–5 Likert scale (Strongly disagree; Disagree; Neutral; Agree; Strongly agree).
Table 3. Incentive prices during the DR period.
Time | 0:00–15:00 | 15:00–16:00 | 16:00–17:00 | 17:00–18:00 | 18:00–00:00
Incentive price (CNY/kWh) | 0 | 2.5 | 4.5 | 0.5 | 0
Table 4. Setpoint adjustment rules for the rule-based agent.
Time | 0:00–15:00 | 15:00–16:00 | 16:00–17:00 | 17:00–18:00 | 18:00–00:00
Incentive price (CNY/kWh) | 0 | 2.5 | 4.5 | 0.5 | 0
Setpoint for different personas (°C)
Comfort-oriented | 26 | 26 | 26 | 26 | 26
Balanced | 26 | 27 | 28 | 26 | 26
Cost-oriented | 26 | 28 | 28 | 27 | 26
Grid-friendly | 26 | 30 | 30 | 30 | 26
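The rule-based baseline in Table 4 amounts to a fixed lookup from the DR period (and hence the incentive price) to a persona-specific setpoint. The following minimal sketch illustrates this mapping; the function and variable names are illustrative, not taken from the paper's implementation, and only the setpoint values come from Table 4.

```python
# Rule table from Table 4: one fixed setpoint (degrees C) per DR period
# and persona. Periods follow the incentive-price schedule in Table 3.
RULE_TABLE = {
    "comfort-oriented": [26, 26, 26, 26, 26],
    "balanced":         [26, 27, 28, 26, 26],
    "cost-oriented":    [26, 28, 28, 27, 26],
    "grid-friendly":    [26, 30, 30, 30, 26],
}
PERIODS = [(0, 15), (15, 16), (16, 17), (17, 18), (18, 24)]  # hours of day

def rule_based_setpoint(persona: str, hour: float) -> int:
    """Return the fixed setpoint for a persona at a given hour of day."""
    for i, (start, end) in enumerate(PERIODS):
        if start <= hour < end:
            return RULE_TABLE[persona][i]
    raise ValueError("hour must be in [0, 24)")
```

Unlike the LLM-based agent, this mapping is deterministic: the same hour and persona always produce the same setpoint, which is why the rule-based curves in Figures 12-15 show no day-to-day variation.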
Table 5. Parameters for the optimization-based agent.
Parameters | Variables | Comfort-Oriented | Balanced | Cost-Oriented | Grid-Friendly
Minimum acceptable indoor temperature (°C) | Tin,min | 24 | 24 | 24 | 24
Maximum acceptable indoor temperature (°C) | Tin,max | 28 | 28 | 28 | 30
Weight coefficients of objective functions | [η1, η2, η3] | [0, 0.2, 0.8] | [0, 0.5, 0.5] | [0, 0.8, 0.2] | [1, 0, 0]
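The weight vectors [η1, η2, η3] in Table 5 scalarize the three DR objectives (response quantity, cooling cost, and thermal discomfort) for the optimization-based agent. A minimal sketch of such a weighted sum follows; the sign convention and the mapping of η1/η2/η3 to the individual objectives are assumptions for illustration, as the section shown here does not spell out the exact objective function.

```python
# Hedged sketch: scalarize the three DR objectives with persona weights.
# Assumption: eta1 weights response quantity (to be maximized), eta2 the
# cooling cost, and eta3 the thermal discomfort (both to be minimized).
def weighted_objective(response, cost, discomfort, weights):
    """Combine the three objectives into one score; smaller is better.
    Response quantity is negated so larger responses lower the score."""
    eta1, eta2, eta3 = weights
    return -eta1 * response + eta2 * cost + eta3 * discomfort

# The grid-friendly persona ([1, 0, 0]) scores only response quantity,
# consistent with its willingness to tolerate discomfort in Table 8.
grid_score = weighted_objective(response=5.0, cost=3.0, discomfort=21.0,
                                weights=(1, 0, 0))
```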
Table 6. Parameters for the LLM-based agent.
Parameters | Variables | Comfort-Oriented | Balanced | Cost-Oriented | Grid-Friendly
Minimum acceptable indoor temperature (°C) | Tin,min | 24 | 24 | 24 | 24
Maximum acceptable indoor temperature (°C) | Tin,max | 28 | 28 | 28 | 30
Weight coefficients of objective functions | [η1, η2, η3] | [0, 0.2, 0.8] | [0, 0.5, 0.5] | [0, 0.8, 0.2] | [1, 0, 0]
Maximum tolerable thermal discomfort level (°C·h) | Fdiscomfort,max | 3 | 3 | 6 | /
Weight coefficients in similarity calculation | ω1–ω5 | 1, 0.25, 0.25, 0.25, 0.25 (all personas)
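The weights ω1-ω5 in Table 6 enter the memory module's similarity calculation used for retrieval (Figure 8). The sketch below shows one plausible weighted-similarity form; the choice of the five compared features and the distance measure are assumptions for illustration, since the section shown here lists only the weights.

```python
# Hedged sketch of similarity-weighted memory retrieval. Table 6 gives
# omega_1 = 1 and omega_2..omega_5 = 0.25; the five features (e.g., the
# incentive price plus four context features, each normalized to [0, 1])
# are illustrative assumptions, not the paper's exact feature set.
def similarity(query, entry, weights=(1, 0.25, 0.25, 0.25, 0.25)):
    """Weighted similarity between the current situation and a stored
    memory entry, each given as five normalized numeric features."""
    return sum(w * (1 - abs(q - e)) for w, q, e in zip(weights, query, entry))

def retrieve(query, memory):
    """Return the stored entry most similar to the current situation."""
    return max(memory, key=lambda entry: similarity(query, entry))
```

Because ω1 dominates the other weights, a mismatch on the first feature penalizes similarity four times as heavily as a mismatch on any other feature.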
Table 7. Numbers of memory entries added and modified for different agent personas during the simulation.
Simulation Day | Comfort-Oriented (Added/Modified) | Balanced (Added/Modified) | Cost-Oriented (Added/Modified) | Grid-Friendly (Added/Modified)
8/21 | 6/0 | 8/1 | 9/1 | 6/0
8/22 | 0/0 | 0/2 | 2/1 | 0/0
8/23 | 0/0 | 0/1 | 2/1 | 0/0
8/24 | 0/0 | 1/0 | 0/0 | 0/0
8/25 | 0/0 | 0/0 | 0/0 | 0/0
Table 8. Maximum, minimum, mean, and standard deviation of the thermal discomfort indicator for the four persona agents during the 10-day simulation.
Personas | Max | Min | Mean | Standard Deviation
Comfort-oriented | 0 | 0 | 0 | 0
Balanced | 10 | 6.5 | 7.6 | 0.99
Cost-oriented | 11 | 7.5 | 8.85 | 1.11
Grid-friendly | 21 | 21 | 21 | 0
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Deng, M.; Peng, X. A Large Language Model-Based Agent Framework for Simulating Building Users’ Air-Conditioning Setpoint Adjustment Behavior Under Demand Response. Buildings 2026, 16, 887. https://doi.org/10.3390/buildings16050887

