Article

Alternative Approaches to HVAC Control of Chat Generative Pre-Trained Transformer (ChatGPT) for Autonomous Building System Operations

1 Department of Building Research, Korea Institute of Civil Engineering and Building Technology, Goyang-si 10223, Republic of Korea
2 Department of Building Energy Research, Korea Institute of Civil Engineering and Building Technology, Goyang-si 10223, Republic of Korea
3 Research Strategic Planning Department, Korea Institute of Civil Engineering and Building Technology, Goyang-si 10223, Republic of Korea
* Author to whom correspondence should be addressed.
Buildings 2023, 13(11), 2680; https://doi.org/10.3390/buildings13112680
Submission received: 5 September 2023 / Revised: 14 October 2023 / Accepted: 23 October 2023 / Published: 24 October 2023
(This article belongs to the Special Issue AI and Data Analytics for Energy-Efficient and Healthy Buildings)

Abstract

Artificial intelligence (AI) technology has rapidly advanced and transformed the nature of scientific inquiry. The recent release of the large language model Chat Generative Pre-Trained Transformer (ChatGPT) has attracted significant attention from the public and various industries. This study applied ChatGPT to autonomous building system operations, specifically coupling it with an EnergyPlus reference office building simulation model. The operational objective was to minimize the energy use of the building systems, including four air-handling units, two chillers, a cooling tower, and two pumps, while ensuring that indoor CO2 concentrations remain below 1000 ppm. The performance of ChatGPT in an autonomous operation was compared with control results based on a deep Q-network (DQN), which is a reinforcement learning method. The ChatGPT- and DQN-based controls lowered the total energy use by 16.8% and 24.1%, respectively, compared with the baseline operation, while maintaining an indoor CO2 concentration below 1000 ppm. Notably, compared with the DQN, ChatGPT-based control does not require a learning process to develop intelligence for building control. In real-world applications, the high generalization capabilities of the ChatGPT-based control, resulting from its extensive training on vast and diverse data, could potentially make it more effective.

1. Introduction

The building sector accounts for more than 40% of primary energy use, and about 30% of building energy use is consumed by heating, ventilation, and air conditioning (HVAC) systems [1,2]. Energy savings have become a priority in the design and operation of modern HVAC systems, and numerous studies have reported that advanced HVAC control can achieve average energy savings of 13% to 28% [3]. However, the majority of buildings today adopt simple rule-based control [3], and operational inefficiencies waste 4–20% of the energy used by HVAC systems in most commercial and residential buildings [2].
To date, approaches to HVAC system control have been categorized as (1) rule-based, (2) model-based, and (3) model-free [4]. The rule-based approach, which uses a predetermined set of intuitive rules, requires the least modeling effort [4] and has been used primarily for indoor temperature regulation [5] and sequential control of HVAC systems [6]. However, its performance can be inferior to that of other control methods because it cannot precisely consider the dynamic behavior of HVAC systems [4]. In the last decade, model-based approaches, particularly model predictive control (MPC), have become dominant control strategies in research on intelligent building operations [3]. MPC has the capability to predict system states and determine optimal values of control variables for minimizing the cost function [7,8,9,10,11]. The cost function accounts for the control objectives, e.g., minimizing energy use. The prediction horizon is the period over which the cost function is to be minimized. The optimization solver determines the optimal values of the control variables that minimize the cost function over the prediction horizon. White-, gray-, and black-box models have been used for MPC. White- and gray-box models suffer from uncertainties related to the model structure and parameters, whereas black-box models face challenges in terms of accuracy and reliability, which depend on factors such as the quantity, quality, and relevance of the collected dataset [4].
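To make the structure of MPC concrete, the following minimal sketch (not taken from any of the cited studies) minimizes a hypothetical energy cost over a short prediction horizon; the one-zone response model, its coefficients, and the use of scipy's bounded optimizer are illustrative assumptions only.

```python
# Minimal MPC-style sketch: a toy cost function over a prediction horizon,
# minimized by an off-the-shelf solver. The response model and coefficients
# are purely illustrative, not a validated building model.
import numpy as np
from scipy.optimize import minimize

HORIZON = 6                 # prediction horizon (hours)
T_SET, T_OUT = 26.0, 32.0   # indoor setpoint and assumed outdoor temperature (degC)

def predicted_energy(u):
    """Toy response: cooling energy over the horizon as a function of the
    chilled-water setpoints u (degC); higher setpoints cost less energy here."""
    return float(np.sum((T_OUT - T_SET) * (12.0 - u) * 5.0))

u0 = np.full(HORIZON, 7.0)                                   # initial guess
res = minimize(predicted_energy, u0, bounds=[(6.0, 10.0)] * HORIZON)
print("setpoint trajectory over the horizon:", np.round(res.x, 1))
```

In practice, the cost function would embed the white-, gray-, or black-box model discussed above and would be re-solved at every control step with updated state measurements (the receding-horizon principle).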
Reinforcement learning (RL) can be employed for the model-free control of HVAC systems. In RL, an agent interacts with a process (or system) at each time step through three components: a state that characterizes the environment, an action through which the agent affects the environment, and a reward that provides immediate feedback on the agent’s performance [12]. The primary aim of RL is to establish an optimal policy that maximizes the anticipated cumulative reward over the agent’s interactions with the environment [4]. Valladares et al. [13] proposed an RL framework to minimize energy consumption and enhance the overall energy efficiency of buildings. Azuatalam et al. [14] used an RL architecture to optimally control whole-building HVAC systems for energy savings and thermal comfort while harnessing their demand response capabilities. However, RL requires iterative interactions between the environment and the agent until its policy becomes capable of optimal decision-making. Additionally, an optimal RL policy for a specific environment may not be directly applicable to other environments if it is not trained under generic conditions [4]. Transfer learning has been applied to RL tasks to improve the generalization performance of RL policies; however, further technology integration and validation studies are still needed, considering the diverse building types and HVAC system configurations.
Over the past few years, language models have benefited significantly from the rapid development of artificial intelligence (AI) and natural language processing (NLP), rendering them more accurate, flexible, and useful than ever before [15,16]. The advancement of AI and NLP technologies has been driven by increased computational power, large datasets, and innovative model architectures, enabling large language models to achieve unprecedented levels of performance. Chat Generative Pre-Trained Transformer (ChatGPT) is an AI large language model (LLM) that is trained with RL from human feedback (RLHF) and fine-tuned with proximal policy optimization [17,18,19]. The ChatGPT model was fine-tuned with supervised learning using a demonstration dataset produced by human labelers. The policy of ChatGPT that generates an output was then optimized to maximize a reward reflecting the labelers’ preferences among the model’s outputs [17,18]. ChatGPT is a robust NLP model that can comprehend and generate natural language for a wide range of applications, including text production, language understanding, and interactive programs [20]. Owing to its adaptability and superior NLP skills, ChatGPT is expected to be applicable in numerous domains, including legal support, learning and teaching, coding and programming, journalism and media, and financial institutions [17]. Furthermore, ChatGPT is being considered for use in various academic research fields, such as biology and environmental science [19], healthcare [21], and medical research [22,23].
This study aims to investigate the potential of the pre-trained large language model ChatGPT as an alternative to existing approaches to HVAC system control. The authors implemented a novel approach with ChatGPT, referred to as ChatGPT pre-trained LLM control, in order to minimize building energy consumption while ensuring that the indoor CO2 concentration remains below 1000 ppm. The authors employed ChatGPT to make consecutive decisions for HVAC systems, thereby enabling the operation of buildings based on the decision results. To investigate the feasibility of the ChatGPT pre-trained LLM control, model-free control using a deep Q-network (DQN) was employed and compared with the results of the ChatGPT-based operation. For this study, a reference office building was chosen, which is an EnergyPlus model [24] provided by the U.S. Department of Energy (DOE). ChatGPT- and DQN-based HVAC controls were implemented by coupling them with the EnergyPlus model using Python.

2. Methods

2.1. ChatGPT

The history of ChatGPT began with GPT-1, the first-generation generative pre-trained transformer model, marking a milestone in the development of NLP technology [25,26]. GPT-1 is pre-trained on a large corpus of text data using a language modeling objective, enabling it to capture general language features and linguistic patterns, and it is subsequently fine-tuned on specific downstream tasks using labeled data [27]. In particular, the model’s pretraining enables it to learn contextualized representations, which enhances its ability to understand and generate text [27].
GPT-2, trained on 40 GB of textual data, exhibited the capability to generate highly coherent and plausible outputs [26,28]. Utilizing unsupervised learning, GPT-2 acquires linguistic patterns and general language features [29]. Furthermore, the multitasking learning ability of GPT-2 enables it to perform various language-related tasks without task-specific fine-tuning [29]. GPT-2’s notably scalable architecture incorporates models spanning 117 million to 1.5 billion parameters and facilitates the generation of coherent text across diverse prompts and contexts, demonstrating substantial architectural flexibility [29].
GPT-3 generates human-like coherent sequences of words, code, and other data [25,28,30]. As an autoregressive language model with 175 billion parameters, it embodies the concept of few-shot learning, enabling it to perform proficiently across different tasks when given only a few examples [31]. Although few-shot learning significantly reduces the need for task-specific data, it has the notable drawback of inferior performance compared with state-of-the-art fine-tuned models [31]. In addition, GPT-3 exhibits three primary shortcomings noted by researchers: the inability to answer semantic [26,28,30], factual [28], and ethical questions [26,28,30].
On 30 November 2022, OpenAI released ChatGPT, which is a conversational chatbot model capable of responding to follow-up questions, admitting mistakes, challenging incorrect premises, and rejecting inappropriate requests [18]. The training data for ChatGPT encompass a wide variety of textual content, including web pages, books, news articles, Wikipedia entries, academic papers, and dialogue records, covering a broad spectrum of subjects and domains as well as incorporating multiple languages. ChatGPT was fine-tuned from a model in the GPT-3.5 series, including text-davinci-002, which is an InstructGPT model [32]. InstructGPT was developed as an advanced language model with the aim of enhancing the ability to effectively follow instructions by utilizing human feedback during training [33]. The development approach involved training the model on a broad range of tasks by providing it with instruction-like prompts and corresponding responses generated by human AI trainers [33]. The pre-trained GPT-3 model was fine-tuned using a dataset in which OpenAI’s labelers demonstrated the desired behavior on input prompts. A dataset of output comparisons was also gathered, where the preferred output for a given input was chosen by labelers. After a reward model was trained to predict the preferred output, its output was used as a reward. Subsequently, the policy was optimized to maximize this reward using the proximal policy optimization (PPO) algorithm [33]. InstructGPT demonstrates the ability to accurately follow instructions and produce contextually relevant outputs, which has significant implications for various applications requiring precise and instructive language generation [33]. ChatGPT was trained by RLHF, employing methods similar to InstructGPT but with slight differences in the data collection setup [18]. It was fine-tuned to imitate human-like conversational skills and to engage in smooth, natural, instant dialogues with the general public through a free, easy-to-use web interface [25,34,35]. Currently, the latest GPT model is GPT-4, which was released on 14 March 2023, and it has been applied to ChatGPT Plus [36]. GPT-4, which is an improvement on GPT-3.5, can solve difficult problems with greater accuracy owing to its broader general knowledge and advanced reasoning capabilities [37].

2.2. Deep Q-Network (DQN)

An agent can develop its behavior by interacting with a dynamic environment in the context of RL [38]. In this framework, the agent is the term used for the learner and decision-maker, while the environment includes everything apart from the agent with which interactions occur [39]. The agent makes its decision regarding action $a_t$ by utilizing the information of state $s_t$ from the environment at the current time instant $t$. At the following time instant $t+1$, the agent obtains a numerical reward $r_{t+1}$ and a new state $s_{t+1}$ as a result of taking action $a_t$. At each time instant, the agent acts in accordance with the policy $\pi_t$, which specifies the probabilities of choosing among the feasible actions for each state. RL involves a sequential decision-making problem, where the objective is to maximize the cumulative rewards obtained through a series of interactions between an agent and the environment [4]. For continuing tasks, the return (Equation (1)) represents the total cumulative reward, discounted by the discount rate $\gamma \in [0, 1]$.
$$R_t = r_{t+1} + \gamma r_{t+2} + \gamma^{2} r_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \quad (1)$$
where $R_t$ denotes the return at time $t$, $\gamma$ represents the discount rate, and $r_{t+k+1}$ corresponds to the reward at time $t+k+1$. In RL, the action value function $Q(s, a)$ (Equation (2)) defines the total amount of reward that the agent should expect to receive over the long term by choosing an action while in a specific state [39]. Specifically, $Q(s, a)$ represents the expected return of taking a specific action $a$ in a given state $s$ under policy $\pi$. Policy $\pi$ maps each state-action pair $(s, a)$ to the probability $\pi(s, a)$ of taking action $a$ when the environment (or system) is in state $s$ [39].
$$Q(s, a) = E\left[ R_t \mid s_t = s,\, a_t = a \right] = E\left[ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s,\, a_t = a \right] \quad (2)$$
In Q-learning [39,40,41], an essential component is the Q-table, a lookup table that contains the expected returns, known as Q-values, associated with specific actions for a finite set of state-action pairs $(s, a)$. An optimal policy is obtained as $\pi^*(s) = \arg\max_a Q^*(s, a)$ for all $s \in S$, where $S$ is the set of states [42]. Experiences $(s, a, r, s')$ are used to update the Q-values, and this update follows the rule given by the Bellman equation [4]; $s'$ represents any state from the set of possible states at time $t+1$. However, Q-learning is not well-suited to problems featuring a large number of state-action pairs, particularly when dealing with time-varying continuous states, because of the significant memory storage and computation time required for Q-table updates [39,42].
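As a minimal illustration of this tabular update (the dimensions, learning rate, and discount rate below are assumed values, not taken from the cited works), a single experience $(s, a, r, s')$ moves one Q-table entry toward its Bellman target:

```python
# Illustrative tabular Q-learning update on a tiny discretized state-action space.
import numpy as np

n_states, n_actions = 4, 2
Q = np.zeros((n_states, n_actions))     # the Q-table of expected returns
alpha, gamma = 0.1, 0.95                # learning rate and discount rate (assumed)

def q_update(s, a, r, s_next):
    """Move Q(s, a) toward the Bellman target r + gamma * max_a' Q(s', a')."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])

q_update(s=0, a=1, r=-350.0, s_next=1)  # e.g., reward taken as negative energy use
print(Q)
```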
Instead of using the Q-table, a function approximator, such as a deep neural network, can be used: $Q(s, a) \approx Q(s, a; \theta)$ [38,43]. Here, $\theta$ denotes the parameter set of the approximator, and $Q(s, a; \theta)$ represents a Q-network [43,44]. Mnih et al. [45] and Nair et al. [38] introduced a DQN that utilizes a pair of Q-networks: the Q-network $Q(s, a; \theta)$ and the target Q-network $\hat{Q}(s, a; \theta^{-})$. The Q-network $Q(s, a; \theta)$ is used to determine the optimal action, and the target Q-network $\hat{Q}(s, a; \theta^{-})$ generates the target value used to update $\theta$. In each iteration $i$, $Q(s, a; \theta_i)$ is updated to minimize the mean-square error between $Q(s, a; \theta_i)$ and the target value $r + \gamma \max_{a'} \hat{Q}(s', a'; \theta_i^{-})$ by optimizing the loss function in Equation (3) [38], where $a'$ denotes the possible actions at time $t+1$. The target Q-network parameters $\theta_i^{-}$ are only updated with the Q-network parameters $\theta_i$ every $N$ steps and are held fixed between individual updates [45].
$$L_i(\theta_i) = E\left[ \left( r + \gamma \max_{a'} \hat{Q}(s', a'; \theta_i^{-}) - Q(s, a; \theta_i) \right)^{2} \right] \quad (3)$$
In the context of a DQN, exploration involves the assessment of possible actions, while exploitation relies on knowledge from prior experience, and the $\epsilon$-greedy method can be used to balance exploration and exploitation [4]. Furthermore, a DQN utilizes an experience replay that collects the agent’s experience $(s_t, a_t, r_t, s_{t+1})$ at each time step $t$ and stores it in a dataset known as the replay memory [38,45]. Experience samples are randomly drawn from the replay memory to update the Q-networks [45].
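The following compact sketch illustrates these mechanics (the Q-network/target-network pair, epsilon-greedy action selection, and experience replay) in PyTorch; the state and action dimensions, layer sizes, discount rate, and learning rate are placeholder values chosen for illustration, not a specification of the study's implementation.

```python
# Compact DQN sketch: Q-network / target Q-network, epsilon-greedy selection,
# experience replay, and one minimization step of the loss in Equation (3).
import random
from collections import deque

import torch
import torch.nn as nn

def make_q_net(n_states: int, n_actions: int) -> nn.Module:
    # Small fully connected Q-network; layer sizes here are illustrative.
    return nn.Sequential(
        nn.Linear(n_states, 30), nn.ReLU(),
        nn.Linear(30, 30), nn.ReLU(),
        nn.Linear(30, n_actions),
    )

n_states, n_actions, gamma = 24, 20, 0.95          # assumed dimensions and discount
q_net = make_q_net(n_states, n_actions)
target_net = make_q_net(n_states, n_actions)
target_net.load_state_dict(q_net.state_dict())     # synchronize theta^- with theta
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=1430)                        # experience replay memory

def select_action(state, epsilon: float) -> int:
    # Epsilon-greedy balance between exploration and exploitation.
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return int(q_net(torch.tensor(state, dtype=torch.float32)).argmax())

def train_step(batch_size: int = 32) -> None:
    # One gradient step on a random minibatch drawn from the replay memory.
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)
    s  = torch.tensor([b[0] for b in batch], dtype=torch.float32)
    a  = torch.tensor([b[1] for b in batch], dtype=torch.int64)
    r  = torch.tensor([b[2] for b in batch], dtype=torch.float32)
    s2 = torch.tensor([b[3] for b in batch], dtype=torch.float32)
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)          # Q(s, a; theta)
    with torch.no_grad():
        target = r + gamma * target_net(s2).max(dim=1).values     # target from Q-hat
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Every N steps (once per episode in Section 3.3):
# target_net.load_state_dict(q_net.state_dict())
```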

3. Application of the Methods

3.1. Target Building Model

In this study, an office building was selected from the EnergyPlus reference models (RefBldgLargeOfficeNew2004_Chicago.idf [24]) for the ChatGPT pre-trained LLM control and the DQN model-free control. The building has a total floor area of 46,320 m2, comprising 12 stories above ground and one underground floor. A 10-times multiplier, a modeling trick in EnergyPlus [46], was applied to the 6th floor to represent the 2nd to 11th floors (Figure 1). The building includes four air handling units (AHUs), two electric chillers, a cooling tower, and two pumps, one each for the chilled water loop and the condenser loop. To maintain an appropriate CO2 concentration, the outdoor air (OA) volume delivered by each AHU was determined from the maximum ventilation rate and its fraction, which varied from 0% to 100%.
The internal loads of the occupants, lighting, and equipment in the target building were the same as those of the EnergyPlus reference model. The cooling setpoint for the indoor air temperature and the operating hours were set to 26 °C and 05:00 to 18:00, respectively. The indoor CO2 concentration is influenced by the ventilation rate, infiltration rate, outdoor CO2 concentration, and CO2 generation rate per person. To practically consider stochastic occupancy [47], a random number generated using the Python numpy module in the range of 0.8 to 1.2 was multiplied by the occupant density at every time step (1 h). In this study, the cooling season was considered: 11 days (1 July to 11 July) were used to explore the optimal policy of the DQN, and 3 days (13 July to 15 July) were used to demonstrate the ChatGPT pre-trained LLM and DQN model-free controls. Meanwhile, the authors determined the specifications of the HVAC systems (Table 1) through an EnergyPlus sizing simulation so that the specifications remained unchanged during the policy exploration process of the DQN.
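A minimal sketch of this stochastic occupancy scaling is given below; the nominal occupant count is an assumed value used only for illustration.

```python
# At every 1 h time step, a uniform random multiplier in [0.8, 1.2] scales the
# nominal occupancy of a zone (the nominal value here is assumed).
import numpy as np

rng = np.random.default_rng(seed=0)
nominal_occupants = 100
for hour in range(5, 19):                        # operating hours 05:00-18:00
    occupants = nominal_occupants * rng.uniform(0.8, 1.2)
    print(f"{hour:02d}:00  {occupants:.0f} occupants")
```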
The objective of the HVAC control was to reduce the energy use of the HVAC systems while ensuring that indoor CO2 levels remained below 1000 ppm. The energy use of the HVAC systems included the electrical energy used by the two chillers, the cooling tower, the two pumps, and the fans of the four AHUs. The state information of the target building and the control variables of the HVAC system utilized in the ChatGPT pre-trained LLM and DQN model-free controls were identical, as listed in Table 2 and Table 3, respectively.

3.2. ChatGPT Pre-Trained LLM Control

Online co-simulation was executed to integrate the target building with ChatGPT (Figure 2). The external interface in EnergyPlus 8.3 was used to define the state variables to be delivered (Table 2) and the control variables to be received (Table 3). In Python 3.9, the socket module of the Building Controls Virtual Test Bed (BCVTB) was employed to receive the states from EnergyPlus and to transmit the values corresponding to the actions determined by ChatGPT. At every time step, the authors provided ChatGPT with information on the current state of the building received through the BCVTB socket and requested a decision on the optimal values of the control variables to reduce building energy consumption while maintaining an indoor CO2 concentration under 1000 ppm. Example 1 shows the query message format sent to ChatGPT; the values for the 16 variables (Var.1 to Var.16) within the { } brackets are determined from the values transferred and calculated through the BCVTB socket from EnergyPlus. The values of the control variables ($a_1^*$, $a_2^*$) determined by ChatGPT and received through the ChatGPT application programming interface (API) at each time step were passed to EnergyPlus as the control variable values.
OpenAI provides a Chat Completion API [48], which can employ GPT-4 to facilitate interactive and dynamic conversations through natural language processing, enabling responsive dialogue generation. In the context of the Chat Completion API, temperature is a parameter that ranges between 0 and 1 and influences the randomness of the GPT’s answers. A higher temperature value, closer to 1, generates more diverse and imaginative outputs by allowing greater randomness; conversely, a value closer to 0 produces more deterministic and focused responses, resulting in less variation in the generated text. In this study, among the parameters of the Chat Completion API, the model and temperature were set to ‘gpt-4’ and ‘0’, respectively, while the other parameters were left at their default values. The ChatGPT pre-trained LLM control thus used the OpenAI-provided API directly, without training any traditional AI or machine learning (ML) models.
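A minimal sketch of such a call is shown below, assuming the legacy openai Python package interface that exposed ChatCompletion at the time of this study; the API key and the prompt string (the Example 1 template with the Var.1 to Var.16 placeholders already filled in) are placeholders.

```python
# Sketch of the Chat Completion request issued for each control decision
# (legacy openai-python interface; API key and prompt content are placeholders).
import openai

openai.api_key = "YOUR_API_KEY"

def ask_chatgpt(prompt: str) -> str:
    response = openai.ChatCompletion.create(
        model="gpt-4",      # model parameter used in this study
        temperature=0,      # deterministic, focused responses
        messages=[{"role": "user", "content": prompt}],
    )
    return response["choices"][0]["message"]["content"]
```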
Example 1.
As a building operator, I’m responsible for checking the indoor and outdoor environmental conditions, monitoring the energy usage of the HVAC systems, and adjusting the control variables for the HVAC systems every hour. Here in Korea, it’s a weekday in July, which is dominated by cooling load.
Specifically, I need to control the outdoor damper opening ratio of the air handling units and the leaving chilled water temperature of the chiller. Currently, the outdoor air temperature has {Var.1: ‘increased’ or ‘decreased’} by {Var.2: $s_{1,t-1} - s_{1,t-2}$} °C to {Var.3: $s_{1,t-1}$} °C, and the indoor CO2 concentration has {Var.4: ‘increased’ or ‘decreased’} by {Var.5: (average of $s_{2,t-1}$ to $s_{17,t-1}$) − (average of $s_{2,t-2}$ to $s_{17,t-2}$)} ppm to {Var.6: average of $s_{2,t-1}$ to $s_{17,t-1}$} ppm, and the energy usage has {Var.7: ‘increased’ or ‘decreased’} by {Var.8: (sum of $s_{18,t-1}$ to $s_{22,t-1}$) − (sum of $s_{18,t-2}$ to $s_{22,t-2}$)} kWh to {Var.9: sum of $s_{18,t-1}$ to $s_{22,t-1}$} kWh, and the entering chilled water temperature of the chiller has {Var.10: ‘increased’ or ‘decreased’} by {Var.11: $s_{24,t-1} - s_{24,t-2}$} °C to {Var.12: $s_{24,t-1}$} °C, and the temperature difference between the entering and leaving chilled water temperatures of the chiller has {Var.13: ‘increased’ or ‘decreased’} by {Var.14: $(s_{24,t-1} - s_{23,t-1}) - (s_{24,t-2} - s_{23,t-2})$} °C, in comparison to an hour ago.
The current setpoints for the outdoor damper opening ratio and the leaving chilled water temperature of the chiller are {Var.15: $a_{2,t-1}$} % and {Var.16: $a_{1,t-1}$} °C, respectively.
Unfortunately, the indoor air temperature setpoint cannot be changed, so the chiller’s energy usage varies based on the cooling load changes caused by indoor and outdoor environmental factors. In addition, as the outdoor damper opening ratio increases, the indoor CO2 concentration decreases.
The indoor CO2 concentration of the building has to be maintained within 1000 ppm by changing the outdoor damper opening ratio, and the building’s energy consumption should be reduced as much as possible by changing the leaving chilled water temperature.
The important thing is that, to maintain the indoor CO2 concentration below a certain threshold, it is possible to increase the ventilation rate by raising the damper opening, but ventilation should be kept to the necessary minimum, as it also increases the cooling load.
I need your help to select an optimal value for the outdoor damper opening ratio from 40%, 50%, 60%, and 70%. Additionally, please choose an optimal value for the leaving chilled water temperature of the chiller from 6 °C, 7 °C, 8 °C, 9 °C, and 10 °C.
I would prefer if you could inform me of challenging control settings that focus on energy saving, rather than conventional and stable ones, when it comes to decision-making. However, if you have reached a judgment in a specific situation, I would appreciate it if you could provide consistent judgment results when faced with similar situations.
For your information, the baseline setpoints for the outdoor damper opening ratio and the leaving chilled water temperature of the chiller are 50% and 6 °C, respectively. Please note the optimal values you have determined as follows:
Outdoor Damper Opening Ratio: (optimal value),
Leaving Chilled Water Temperature: (optimal value)
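To illustrate how such a query can be assembled and how the two setpoints can be recovered from the reply, the sketch below uses assumed helper names, assumed dictionary keys, and an abridged template; it is an illustration of the mechanism, not the study's actual code.

```python
# Fill an abridged version of the Example 1 template with values received from
# EnergyPlus (via the BCVTB socket) and parse the two setpoints from the reply.
import re

QUERY_TEMPLATE = (
    "Currently, the outdoor air temperature has {var1} by {var2} degC to {var3} degC, "
    "... "  # remaining Var.4 to Var.16 clauses omitted for brevity
    "Outdoor Damper Opening Ratio: (optimal value),\n"
    "Leaving Chilled Water Temperature: (optimal value)"
)

def build_query(states_now: dict, states_prev: dict) -> str:
    """states_* hold the Table 2 quantities for the current and previous hour."""
    d_oa = states_now["t_oa"] - states_prev["t_oa"]
    return QUERY_TEMPLATE.format(
        var1="increased" if d_oa >= 0 else "decreased",
        var2=round(d_oa, 1),
        var3=round(states_now["t_oa"], 1),
    )

def parse_setpoints(reply: str):
    """Extract a2 (damper opening, %) and a1 (leaving chilled water temp, degC)."""
    damper = float(re.search(r"Outdoor Damper Opening Ratio:\s*([\d.]+)", reply).group(1))
    t_chw = float(re.search(r"Leaving Chilled Water Temperature:\s*([\d.]+)", reply).group(1))
    return damper, t_chw
```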

3.3. DQN Model-Free Control

The objective of the DQN model-free control is identical to that of the ChatGPT pre-trained LLM control. In this study, the DQN explored the optimal policy using twenty-four states (Table 2), two actions (Table 3), and rewards gathered from iterative experiences. The reward was set as the total building energy used by two chillers, a cooling tower fan, and two pumps. Furthermore, a weighting factor of 1.5 was applied to penalize any action that led to an indoor CO2 concentration exceeding 1000 ppm.
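One plausible reading of this reward is sketched below; because the text does not fully specify how the 1.5 weighting factor is applied, the multiplicative form used here is an assumption.

```python
# Illustrative reward: total HVAC electrical energy for the time step, scaled by
# 1.5 when the indoor CO2 concentration exceeds 1000 ppm (interpretation assumed).
def reward(chiller_kwh: float, tower_kwh: float, pump_kwh: float, co2_ppm: float) -> float:
    total_energy = chiller_kwh + tower_kwh + pump_kwh
    penalty = 1.5 if co2_ppm > 1000.0 else 1.0
    return total_energy * penalty      # the DQN policy is trained to minimize this

print(reward(300.0, 20.0, 30.0, co2_ppm=1050.0))   # 525.0 (penalized step)
```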
The optimal policy of the DQN was explored to determine actions that could minimize the reward, i.e., the total building energy use, at each 1 h time step between 05:00 and 18:00 over an 11-day period from 1 July to 11 July. A sequence of EnergyPlus simulations conducted continuously from 1 July to 11 July was considered a single episode, and a total of 500 episodes were executed to explore the optimal policy of the DQN. The 3 days from 13 July to 15 July, which were not employed for policy exploration, were used to test the performance of the DQN model-free control; the same 3-day testing period was used for the ChatGPT pre-trained LLM control.
The Q-network is composed of two hidden layers, each configured with 30 neurons. The activation function was the rectified linear unit (ReLU), and the Q-network’s parameters $\theta$ were iteratively updated using the Adam optimization method, which is particularly appropriate for deep neural networks with numerous layers and neurons [42]. The initial value of $\epsilon$ in the $\epsilon$-greedy method was set to 1.0 and gradually decayed by multiplying it by 0.9999 at each control step during the operating hours (05:00 to 18:00) of the 500 episodes. The DQN replay memory removed the oldest experience as new ones were received, maintaining an experience history of 1430 time steps (13 h from 05:00 to 18:00 × 11 days × 10 episodes).
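The exploration schedule and replay sizing stated above can be summarized numerically as follows; this is a small sketch, and the final epsilon value is simply the arithmetic result of the stated multiplicative decay.

```python
# Epsilon starts at 1.0 and is multiplied by 0.9999 at each hourly control step
# during the operating hours of the 500 training episodes; the replay memory
# keeps the most recent 1430 experiences.
epsilon, decay = 1.0, 0.9999
steps_per_episode = 13 * 11                 # 13 h/day (05:00-18:00) x 11 days
replay_capacity = 13 * 11 * 10              # 1430 time steps of experience

for _ in range(steps_per_episode * 500):
    epsilon *= decay
print(f"replay capacity: {replay_capacity}, epsilon after 500 episodes: {epsilon:.4f}")
```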
DQN model-free control was implemented using the same online co-simulation method as the ChatGPT pre-trained LLM control (Figure 3). At every time step of the operating hours in all episodes, the parameters $\theta$ of the Q-network $Q(s, a; \theta)$ were trained to minimize the loss function (Equation (3)), and the optimal action $a^* = \arg\max_{a} Q(s, a; \theta)$ was determined. At the end of each episode, the parameters $\theta^{-}$ of the target Q-network $\hat{Q}(s, a; \theta^{-})$ were synchronized with the parameters $\theta$ of the Q-network $Q(s, a; \theta)$.

4. Results

In contrast to the ChatGPT pre-trained LLM control, which utilizes a pre-trained language model and requires no additional training, the DQN model-free control requires iterative experience to explore the optimal policy. Figure 4 shows the return (Equation (1)), the cumulative reward over the time steps within each episode. The return started at an initial value of 35,563 and decreased by 11.87%, reaching 31,343 at the 500th episode. The maximum return occurred in the 17th episode, with a value of 36,270, whereas the minimum return was observed in the 351st episode, with a value of 31,113. The decreasing pattern of returns indicates the DQN’s ability to learn autonomously, although fluctuations in the return curve can be attributed to the random exploration of the policy through the $\epsilon$-greedy method (Section 3.3) and the randomness of occupancy (Section 3.1). In this study, we adopted the policy of the 498th episode, which had accumulated experience over nearly 500 episodes and yielded a relatively low return of 31,141.
To investigate the ChatGPT pre-trained LLM control and the DQN model-free control, a baseline operation was executed with a set-point chilled water temperature of 6 °C and an OA damper opening rate ($a_2$) of 50%. The performances of the three operation modes (baseline, ChatGPT pre-trained LLM control, and DQN model-free control) were compared over a three-day testing period (13 July to 15 July). Table 4 and Table 5 show the values of the 16 variables received from the real-time EnergyPlus simulation, which were automatically inserted into the query message format (Example 1) for ChatGPT. Except for the values related to the outdoor air temperature (Var.1, Var.2, Var.3), the other values were determined through EnergyPlus simulations based on ChatGPT’s decision-making. For example, the indoor CO2 concentration value (Var.6) and its change compared with the previous time step (Var.4, Var.5) were determined by the set-point for the OA damper opening ratio chosen by ChatGPT (Var.15, $a_2$). Additionally, the energy usage (Var.9) and its change compared with the previous time step (Var.7, Var.8) were determined by the set-point for the chilled water temperature of the chiller (Var.16, $a_1$). Similarly, in accordance with ChatGPT’s decisions influencing the cooling load of the target building, the values of the entering chilled water temperature of the chiller (Var.12) and its change (Var.10, Var.11), as well as the temperature difference between the entering and leaving chilled water temperatures of the chiller (Var.14) and its change (Var.13), were determined.
Examples 2 to 7 show sample responses of ChatGPT obtained by submitting the query of Example 1, populated with the values in Table 4 and Table 5, at each time step. ChatGPT formulated responses based on the information contained in the question, leveraging both pre-trained knowledge and the information mentioned in the query message, including data from the current time step and the immediately preceding time step. Consequently, when energy usage increased in the previous time step, ChatGPT raised the leaving chilled water temperature of the chiller (Var.16, $a_1$) to reduce energy usage (from 8 °C in Example 2 to 10 °C in Example 3; from 7 °C in Example 6 to 10 °C in Example 7). Conversely, considering the decrease in energy usage and the increase in the entering chilled water temperature, ChatGPT assessed the chillers’ operational status and adjusted the leaving chilled water temperature of the chiller for energy conservation (from 10 °C in Example 4 to 8 °C in Example 5). The OA damper opening ratio (Var.15, $a_2$) was determined by comparing the current CO2 concentration with the 1000 ppm threshold.
Example 2.
Based on the given information the indoor CO2 concentration is still within the acceptable range of 1000 ppm but it has increased. To control this we can increase the outdoor damper opening ratio. However considering the energy saving focus we should not increase it too much. Therefore an optimal value for the outdoor damper opening ratio could be 60%.
As for the leaving chilled water temperature of the chiller the current setpoint is 9.0 °C which is higher than the baseline setpoint of 6 °C. The chiller’s energy usage has decreased and the temperature difference between entering and leaving chilled water temperature of the chiller has also decreased. This suggests that the chiller is currently operating more efficiently. To further reduce energy consumption we can decrease the leaving chilled water temperature. However a too drastic decrease might increase the energy usage again. Therefore an optimal value for the leaving chilled water temperature could be 8 °C.
Please note that these are challenging control settings focusing on energy saving and they might need to be adjusted based on the actual performance and further changes in environmental conditions.
Outdoor Damper Opening Ratio: 60%
Leaving Chilled Water Temperature: 8 °C
Example 3.
Outdoor Damper Opening Ratio: 70%
Leaving Chilled Water Temperature: 10 °C
Explanation:
The indoor CO2 concentration has increased significantly and it’s still within the acceptable range but getting closer to the upper limit of 1000 ppm. Therefore increasing the outdoor damper opening ratio to 70% would help to decrease the CO2 concentration by increasing the ventilation rate.
On the other hand the chiller’s energy usage has also increased significantly. To reduce the energy consumption increasing the leaving chilled water temperature to 10 °C would help. This is because a higher leaving water temperature means the chiller doesn’t have to work as hard to cool the water thus saving energy.
These settings are more aggressive in terms of energy saving and might pose some challenges in maintaining the indoor environmental conditions. However they should be effective in reducing the energy consumption while maintaining the indoor CO2 concentration within the acceptable range.
Please monitor the indoor and outdoor conditions closely after applying these settings and adjust them as necessary.
Example 4.
Outdoor Damper Opening Ratio: 70%
Leaving Chilled Water Temperature: 10 °C
Explanation:
The outdoor damper opening ratio is increased to 70% to ensure the indoor CO2 concentration is maintained below the threshold of 1000 ppm. This is a higher value than the baseline but it is necessary to ensure adequate ventilation and maintain indoor air quality.
The leaving chilled water temperature is set to 10 °C which is higher than the baseline and current setpoint. This is to reduce the energy consumption of the chiller. A higher leaving chilled water temperature means the chiller doesn’t have to work as hard to cool the water thus saving energy. However this might slightly increase the indoor temperature but as long as it’s within acceptable comfort levels it’s a worthwhile trade-off for energy savings.
These settings are more challenging and focus on energy saving as requested. They might require more frequent monitoring and adjustments but they should help to achieve the goal of reducing energy consumption while maintaining indoor air quality.
Example 5.
Outdoor Damper Opening Ratio: 60%
Leaving Chilled Water Temperature: 8 °C
Explanation:
The outdoor air temperature has increased which means the cooling load will also increase. However the indoor CO2 concentration has decreased which means we can afford to decrease the damper opening ratio a bit to save energy. However considering the baseline setpoint of 50% and the current setpoint of 70% a damper opening ratio of 60% seems to be a good compromise between maintaining indoor air quality and energy saving.
On the other hand the chiller’s energy usage has decreased and the entering chilled water temperature has increased which means the chiller is not working as hard as before. Therefore we can afford to increase the leaving chilled water temperature a bit to save energy. Considering the baseline setpoint of 6 °C and the current setpoint of 10 °C a leaving chilled water temperature of 8 °C seems to be a good compromise between maintaining the cooling load and energy saving.
These settings are more challenging than the conventional ones as they require a careful balance between maintaining indoor air quality meeting the cooling load and saving energy. However they should provide consistent results in similar situations.
Example 6.
Outdoor Damper Opening Ratio: 60%
Leaving Chilled Water Temperature: 7 °C
Explanation:
The indoor CO2 concentration has increased significantly but it is still below the threshold of 1000 ppm. Therefore it is necessary to increase the outdoor damper opening ratio to 60% to ensure adequate ventilation and maintain the CO2 concentration within the acceptable range.
On the other hand the chiller’s energy usage has decreased and the temperature difference between entering and leaving chilled water temperature of the chiller has increased indicating that the chiller is operating more efficiently. Therefore it is possible to reduce the leaving chilled water temperature to 7 °C to further decrease the energy consumption.
This control setting is more challenging than the conventional one as it requires a careful balance between maintaining indoor air quality and energy saving. However it is consistent with the goal of reducing energy consumption as much as possible while ensuring the indoor environmental conditions are within acceptable levels.
Example 7.
Outdoor Damper Opening Ratio: 70%
Leaving Chilled Water Temperature: 10 °C
Explanation:
The outdoor damper opening ratio is increased to 70% to ensure the indoor CO2 concentration is maintained below 1000 ppm. This is a higher value than the baseline but it is necessary to ensure adequate ventilation and maintain indoor air quality.
The leaving chilled water temperature is set to 10 °C which is higher than the baseline and current setpoint. This is to reduce the energy consumption of the chiller. A higher leaving chilled water temperature means the chiller doesn’t have to work as hard to cool the water thus saving energy. However this might slightly increase the indoor temperature but as the setpoint for indoor temperature cannot be changed this is a compromise that needs to be made for energy saving.
These settings are more challenging and focus on energy saving as requested. They might not provide the same level of comfort as the conventional settings but they will help to reduce the building’s energy consumption.
Please note that these settings are based on the current conditions and might need to be adjusted if the conditions change. For example if the outdoor temperature or the indoor CO2 concentration increases significantly the damper opening ratio might need to be increased further or the leaving chilled water temperature might need to be decreased.
Figure 5 shows the control results of the ChatGPT pre-trained LLM and DQN model-free controls. For the set-point chilled water temperature ($a_1$), the DQN model-free control operated with a policy of maintaining the maximum allowable temperature of 10 °C (Table 3) at all times to reduce energy use. In contrast, the ChatGPT pre-trained LLM control operated by initially raising the set-point chilled water temperature to 10 °C, then lowering it to 7 or 8 °C and raising it back to 10 °C in a cyclic manner. In terms of the OA damper opening rate ($a_2$), the ChatGPT pre-trained LLM control employed a continuous control approach, alternating between 50% and 60% or between 60% and 70%, whereas the DQN model-free control adhered to the baseline operational condition of 50%, transitioning to a 60% opening setting starting at 16:00.
Figure 6 and Table 6 show the results of the total building energy use under the baseline operation, ChatGPT pre-trained LLM control, and DQN model-free control. Because the ChatGPT pre-trained LLM control repeatedly raised and lowered the set-point chilled water temperature in a cyclic manner (Figure 5a; Examples 2 to 7) based on information from the current and immediately preceding time steps (Example 1, Table 4 and Table 5), the energy-use pattern also exhibited a recurring cycle of high and low trends. However, the DQN model-free control maintained the set-point chilled water temperature at its maximum value of 10 °C, sustaining the total energy use at approximately 350 kWh during operational hours (Figure 6a). Consequently, the difference in energy use between the baseline operation and the ChatGPT pre-trained LLM control was 16.8%, and that for the DQN model-free control was 24.1% (Table 6). Both the ChatGPT pre-trained LLM control and the DQN model-free control satisfied the requirement to maintain the CO2 concentration below 1000 ppm during building operational hours. In particular, the DQN model-free control kept the indoor CO2 concentration just below the 1000 ppm threshold while achieving its objective. By contrast, the ChatGPT pre-trained LLM control maintained the indoor CO2 concentration at significantly lower levels than the DQN model-free control, resulting in much higher ventilation rates (Figure 5b). This is attributed to the DQN’s improvement of its policy, which enables it to make decisions at each time step that maximize the cumulative long-term reward (Equation (1)). By contrast, ChatGPT was limited to considering only the information from the previous and current time steps, as constrained by the specific query format defined by the authors in this study (Example 1).
It is important to highlight, however, that the DQN model-free control results were derived from a policy refined over 500 repeated episodes, whereas the ChatGPT pre-trained LLM control results were obtained without additional training or development, leveraging only its pre-trained language model capabilities. In other words, although the ChatGPT pre-trained LLM control did not utilize thermodynamic or numerical analysis, simulation models, or iterative experience with the target building as in reinforcement learning, relying solely on its extensive pre-trained dataset, it saved only 7.3 percentage points less energy than the DQN model-free control. In particular, while the performance of the DQN model-free control may be superior to that of the ChatGPT pre-trained LLM control in terms of goal achievement, in practical real-world applications, the high generalization capability of the ChatGPT pre-trained LLM control, resulting from its extensive training on vast and diverse data, could potentially make it more effective.

5. Limitations

Mnih et al. [45] demonstrated that a DQN can learn and achieve human-level control by rescaling Atari 2600 game frames from 210 × 160-pixel images with a 128-color palette to 84 × 84-pixel images with a Y channel (luminance) and stacking the four most recent frames to represent states. Therefore, the DQN, which can approximate high-dimensional states and actions, has demonstrated its applicability in real-world domains, such as autonomous driving and robot control [49], game AI [45,50], resource optimization, and energy management [51]. In this study, ChatGPT and a DQN were used to make optimal sequential decisions using twenty-four states (Table 2) and two actions (Table 3). Online co-simulations were implemented to couple the simulation model with ChatGPT and the DQN. Therefore, considering real-world application cases [45,49,50,51], even when dealing with high-dimensional building data, it is expected that the DQN can be effectively employed for complex HVAC control problems.
Meanwhile, despite setting the temperature parameter to zero in the Chat Completion API to reduce variations in GPT’s responses, the format of the answers was not consistently the same at every request (Examples 2 to 4). In addition, while some responses (Example 2) were based on the quantified values of the current building operational information provided with every question (Example 1), there were also answers based on theoretical background knowledge (Examples 2 to 7). In particular, ChatGPT, developed as a conversational chatbot model, may be unsuitable for precise or quantified decision-making problems involving extensive, high-dimensional information. Although the ChatGPT pre-trained LLM control may not represent the most optimal operation fitted to the target building compared with the DQN model-free control, it demonstrates the feasibility of operating based on theoretical foundations. In other words, ChatGPT pre-trained LLM control can be implemented efficiently in terms of time and cost compared with other HVAC control methods, because its generalization performance, grounded in theoretical foundations, allows it to target various types of buildings or systems in real-world applications without the need for thermodynamic and numerical analysis, simulation models, optimization solvers, or iterative experience. In addition, OpenAI supports fine-tuning [52], which provides ChatGPT with opportunities to enhance its performance in specific domains and applications; fine-tuning enables it to generate more precise and domain-specific answers. However, it is important to conduct a risk analysis of the decision-making in ChatGPT pre-trained LLM control, since the outcomes of control commands could affect occupant comfort and system stability.

6. Conclusions

This study utilized ChatGPT and a DQN to control HVAC systems with the aim of reducing building energy use while maintaining a CO2 concentration below 1000 ppm. Both the ChatGPT pre-trained LLM control and the DQN model-free control made decisions regarding HVAC system control at each time step through online co-simulation with the reference office building. ChatGPT was utilized directly through the Chat Completion API provided by OpenAI, which offers a pre-developed language model. The DQN improved its policy over 500 episodes, each spanning an 11-day training period. The performances of both control methods were evaluated over 3 days, resulting in energy savings of 16.8% for the ChatGPT pre-trained LLM control and 24.1% for the DQN model-free control.
In this study, the ChatGPT pre-trained LLM control demonstrated lower energy-saving performance than the DQN model-free control. However, the ChatGPT pre-trained LLM control showed the ability to make plausible decisions using real-time building operational information and pre-learned domain background knowledge, without the need for a traditional training process to develop AI or ML models. In future research, we aim to investigate fine-tuning of the ChatGPT model to apply it to more specialized HVAC control tasks and to effectively manage high-dimensional building state information.

Author Contributions

K.U.A. designed and implemented the study. K.U.A., D.-W.K., H.M.C. and C.-U.C. described the results of this study. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Korea Agency for Infrastructure Technology Advancement (KAIA) grant funded by the Ministry of Land, Infrastructure, and Transport (Grant RS-2020-KA157018).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data supporting the results reported in this study are available upon request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. IEA. Policy Pathways: Modernizing Building Energy Codes. 2013. Available online: https://www.iea.org/reports/policy-pathway-modernising-building-energy-codes-2013 (accessed on 26 August 2023).
  2. Li, X.; Wen, J. Review of Building Energy Modeling for Control and Operation. Renew. Sustain. Energy Rev. 2014, 37, 517–537. [Google Scholar] [CrossRef]
  3. Drgoňa, J.; Arroyo, J.; Cupeiro Figueroa, I.C.; Blum, D.; Arendt, K.; Kim, D.; Ollé, E.P.; Oravec, J.; Wetter, M.; Vrabie, D.L.; et al. All You Need to Know About Model Predictive Control for Buildings. Annu. Rev. Control 2020, 50, 190–232. [Google Scholar] [CrossRef]
  4. Ahn, K.U.; Park, C.S. Application of Deep Q-Networks for Model-Free Optimal Control Balancing Between Different HVAC Systems. Sci. Technol. Built Environ. 2020, 26, 61–74. [Google Scholar] [CrossRef]
  5. Shaikh, P.H.; Nor, N.B.M.; Nallagownden, P.; Elamvazuthi, I.; Ibrahim, T. A Review on Optimized Control Systems for Building Energy and Comfort Management of Smart Sustainable Buildings. Renew. Sustain. Energy Rev. 2014, 34, 409–429. [Google Scholar] [CrossRef]
  6. Wang, S.W.; Ma, Z.J. Supervisory and Optimal Control of Building HVAC Systems: A Review. HVAC&R Res. 2008, 14, 3–32. [Google Scholar] [CrossRef]
  7. Yun, K.; Luck, R.; Mago, P.J.; Cho, H. Building Hourly Thermal Load Prediction Using an Indexed ARX Model. Energy Build. 2012, 54, 225–233. [Google Scholar] [CrossRef]
  8. Eisenhower, B.; O’Neill, Z.; Narayanan, S.; Fonoberov, V.A.; Mezić, I. A Methodology for Meta-model Based Optimization in Building Energy Models. Energy Build. 2012, 47, 292–301. [Google Scholar] [CrossRef]
  9. Prívara, S.; Cigler, J.; Váňa, Z.; Oldewurtel, F.; Sagerschnig, C.; Žáčeková, E. Building Modeling as a Crucial Part for Building Predictive Control. Energy Build. 2013, 56, 8–22. [Google Scholar] [CrossRef]
  10. Corbin, C.D.; Henze, G.P.; May-Ostendorp, P. A Model Predictive Control Optimization Environment for Real-Time Commercial Building Application. J. Build. Perform. Simul. 2013, 6, 159–174. [Google Scholar] [CrossRef]
  11. Afram, A.; Janabi-Sharifi, F. Theory and Applications of HVAC Control Systems—A Review of Model Predictive Control (MPC). Build. Environ. 2014, 72, 343–355. [Google Scholar] [CrossRef]
  12. Busoniu, L.; Babuska, R.; Schutter, B.D.; Ernst, D. Reinforcement Learning and Dynamic Programming Using Function Approximators, Automation and Control Engineering; CRC Press: Boca Raton, FL, USA, 2010. [Google Scholar]
  13. Valladares, W.; Galindo, M.; Gutiérrez, J.; Wu, W.-C.; Liao, K.-K.; Liao, J.-C.; Lu, K.-C.; Wang, C.-C. Energy optimization associated with thermal comfort and indoor air control via a deep reinforcement learning algorithm. Build. Environ. 2019, 155, 105–117. [Google Scholar] [CrossRef]
  14. Azuatalam, D.; Lee, W.L.; de Nijs, F.; Liebman, A. Reinforcement Learning for Whole-Building HVAC Control and Demand Response. Energy AI 2020, 2, 100020. [Google Scholar] [CrossRef]
  15. Gill, S.S.; Kaur, R. ChatGPT: Vision and Challenges. Internet. Things Cyber-Phys. Syst. 2023, 3, 262–271. [Google Scholar] [CrossRef]
  16. Thorp, H.H. ChatGPT Is Fun, but Not an Author. Science 2023, 379, 313. [Google Scholar] [CrossRef]
  17. OpenAI. ChatGPT. 2023. Available online: https://openai.com/chatgpt (accessed on 26 August 2023).
  18. OpenAI. Introducing ChatGPT. 2023. Available online: https://openai.com/blog/chatgpt (accessed on 26 August 2023).
  19. Agathokleous, E.; Saitanis, C.J.; Fang, C.; Yu, Z. Use of ChatGPT: What Does It Mean for Biology and Environmental Science? Sci. Total Environ. 2023, 888, 164154. [Google Scholar] [CrossRef] [PubMed]
  20. Van Dis, E.A.M.; Bollen, J.; Zuidema, W.; van Rooij, R.; Bockting, C.L. ChatGPT: Five Priorities for Research. Nature 2023, 614, 224–226. [Google Scholar] [CrossRef]
  21. Vaishya, R.; Misra, A.; Vaish, A. ChatGPT: Is This Version Good for Healthcare and Research? Diabetes Metab. Syndr. 2023, 17, 102744. [Google Scholar] [CrossRef]
  22. Blanchard, F.; Assefi, M.; Gatulle, N.; Constantin, J.M. ChatGPT in the World of Medical Research: From How It Works to How to Use It. Anaesth. Crit. Care Pain Med. 2023, 42, 101231. [Google Scholar] [CrossRef] [PubMed]
  23. Currie, G.; Singh, C.; Nelson, T.; Nabasenja, C.; Al-Hayek, Y.; Spuur, K. ChatGPT in Medical Imaging Higher Education. Radiography 2023, 29, 792–799. [Google Scholar] [CrossRef] [PubMed]
  24. DOE. EnergyPlus. 2019. Available online: https://energyplus.net (accessed on 26 August 2023).
  25. Taecharungroj, V. “What Can ChatGPT Do?” Analyzing Early Reactions to the Innovative AI Chatbot on Twitter. Big Data Cogn. Comput. 2023, 7, 35. [Google Scholar] [CrossRef]
  26. Zhang, M.; Li, J. A Commentary of GPT-3 in MIT Technology Review 2021. Fundam. Res. 2021, 1, 831–833. [Google Scholar] [CrossRef]
  27. Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving Language Understanding by Generative Pretraining. Available online: https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf (accessed on 26 August 2023).
  28. Dale, R. GPT-3: What’s It Good For? Nat. Lang. Eng. 2021, 27, 113–118. [Google Scholar] [CrossRef]
  29. Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language Models Are Unsupervised Multitask Learners. OpenAI Blog 2019. Available online: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf (accessed on 26 August 2023).
  30. Floridi, L.; Chiriatti, M. GPT-3: Its Nature, Scope, Limits, and Consequences. Minds Mach. 2020, 30, 681–694. [Google Scholar] [CrossRef]
  31. Brown, T.B.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language Models Are Few-Shot Learners. arXiv 2020, arXiv:2005.14165. [Google Scholar]
  32. OpenAI. Model Index for Researchers. 2023. Available online: https://platform.openai.com/docs/model-index-for-researchers (accessed on 26 August 2023).
  33. Ouyang, L.; Wu, J.; Jiang, X.; Almeida, D.; Wainwright, C.L.; Mishkin, P.; Zhang, C.; Agarwal, S.; Slama, K.; Ray, A.; et al. Training Language Models to Follow Instructions with Human Feedback. arXiv 2022, arXiv:2203.02155. [Google Scholar]
  34. Verma, P.; Lerman, R. What Is ChatGPT, the Viral Social Media AI? 2023. Available online: https://www.washingtonpost.com/technology/2022/12/06/what-is-chatgpt-ai/ (accessed on 4 January 2023).
  35. Roose, K. The Brilliance and Weirdness of ChatGPT. 2023. Available online: https://www.nytimes.com/2022/12/05/technology/chatgpt-ai-twitter.html (accessed on 4 January 2023).
  36. OpenAI. GPT-4. 2023. Available online: https://openai.com/research/gpt-4 (accessed on 26 August 2023).
  37. OpenAI. Documentation. 2023. Available online: https://platform.openai.com/docs/models/overview (accessed on 26 August 2023).
  38. Nair, A.; Srinivasan, P.; Blackwell, S.; Alcicek, C.; Fearon, R.; De Maria, A.; Panneershelvam, V.; Suleyman, M.; Beattie, C.; Petersen, S.; et al. Massively Parallel Methods for Deep Reinforcement Learning. arXiv 2015, arXiv:1507.04296. [Google Scholar]
  39. Sutton, R.S.; Barto, A.G. Reinforcement Learning an Introduction, a Bradford Book; MIT Press: Cambridge, UK, 1998. [Google Scholar]
  40. Yang, L.; Nagy, Z.; Goffin, P.; Schlueter, A. Reinforcement Learning for Optimal Control of Low Exergy Buildings. Appl. Energy 2015, 156, 577–586. [Google Scholar] [CrossRef]
  41. Cheng, Z.J.; Zhao, Q.C.; Wang, F.L.; Jiang, Y.; Xia, L.; Ding, J.L. Satisfaction Based Q-Learning for Integrated Lighting and Blind Control. Energy Build. 2016, 127, 43–55. [Google Scholar] [CrossRef]
  42. Mocanu, E.; Nguyen, P.H.; Kling, W.L.; Gibescu, M. Unsupervised Energy Prediction in a Smart Grid Context Using Reinforcement Cross-Building Transfer Learning. Energy Build. 2016, 116, 646–655. [Google Scholar] [CrossRef]
  43. Kumar, R.; Aggarwal, R.K.; Sharma, J.D. Energy Analysis of a Building Using Artificial Neural Network: A Review. Energy Build. 2013, 65, 352–358. [Google Scholar] [CrossRef]
  44. Kingma, D.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  45. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-Level Control Through Deep Reinforcement Learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef] [PubMed]
  46. DOE. Tips and Tricks for Using EnergyPlus. 2019b. Available online: https://energyplus.net/documentation (accessed on 26 August 2023).
  47. Ahn, K.U.; Kim, D.W.; Park, C.S.; de Wilde, P. Predictability of Occupant Presence and Performance Gap in Building Energy Simulation. Appl. Energy 2017, 208, 1639–1652. [Google Scholar] [CrossRef]
  48. OpenAI. Chat Completion API. Available online: https://platform.openai.com/docs/guides/gpt/chat-completions-api (accessed on 26 August 2023).
  49. Bojarski, M.B.; Firner, B.; Flepp, B.; Jackel, L.; Muller, U.; Zieba, K.; Testa, D.D. End-to-end deep learning for self-driving cars. Nvidia, 17 August 2016. Available online: https://devblogs.nvidia.com/parallelforall/deep-learning-self-driving-cars/ (accessed on 26 August 2023).
  50. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; Riedmiller, M. Playing Atari with Deep Reinforcement Learning. arXiv 2013, arXiv:1312.5602. [Google Scholar]
  51. Shaw, R.; Howley, E.; Barrett, E. Applying Reinforcement Learning Towards Automating Energy Efficient Virtual Machine Consolidation in Cloud Data Centers. Inf. Syst. 2022, 107, 101722. [Google Scholar] [CrossRef]
  52. OpenAI. Fine-Tuning. 2023. Available online: https://platform.openai.com/docs/guides/fine-tuning (accessed on 26 August 2023).
Figure 1. Target building.
Figure 2. Co-simulation of EnergyPlus and ChatGPT.
Figure 3. Co-simulation of EnergyPlus and DQN (modified from [4,38]).
Figure 4. DQN return.
Figure 5. Comparison of control results. (a) Set-point chilled water temperature (a1); (b) OA damper opening rate for four AHUs (a2).
Figure 6. Comparison of operation results. (a) Total energy usage; (b) CO2 concentration.
Table 1. Descriptions of the target building.
Category | Description
Total floor area | 46,320 m2
Stories | Basement: one floor; above ground: 12 floors (a 10-times multiplier was applied to the 6th floor)
Occupancy | Basement: 37.16 m2/person; above ground: 18.58 m2/person
Lighting density | 10.76 W/m2
Equipment density | 10.76 W/m2
HVAC systems | 4 AHUs for the basement (20,124 CMH), 1st floor (47,664 CMH), 6th floor (506,304 CMH), and 12th floor (50,220 CMH); 2 electric chillers (504 USRT each); a cooling tower (1007 USRT); two pumps for the chilled water loop (7697 LPM) and condenser loop (10,893 LPM)
Operation hours | 05:00 to 18:00
Table 2. State information on the target building.
Index | State | Unit
s1 | Temperature of outdoor air | °C
s2–s17 | CO2 concentration of 16 zones 1 | ppm
s18, s19 | Electrical energy use of the two chillers | kWh
s20 | Electrical energy use of the cooling tower fan | kWh
s21, s22 | Electrical energy use of the two pumps | kWh
s23 | Temperature of chilled water leaving the two chillers | °C
s24 | Temperature of chilled water entering the two chillers | °C
1 Five zones for each modeled above-ground floor (1st, 6th, and 12th floors) and one zone for the basement floor, giving 16 zones in total.
Table 3. Control variables of the HVAC system.
Index | Control Variable | Unit | Value
a1 | Set-point chilled water temperature | °C | 6, 7, 8, 9, 10
a2 | OA damper opening rate for the 4 AHUs | % | 40, 50, 60, 70
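To make the observation and action definitions in Tables 2 and 3 concrete, the sketch below collects them into simple Python structures. It is an illustrative sketch only: the identifier names (e.g., STATE_KEYS, ACTION_SPACE) and the joint enumeration of the two control variables are our assumptions, not the authors' implementation.

# Illustrative encoding of the observation and action spaces in Tables 2 and 3.
# All names are hypothetical; the study's actual implementation may differ.
from itertools import product

STATE_KEYS = (
    ["outdoor_air_temp_C"]                              # s1
    + [f"co2_zone_{i}_ppm" for i in range(1, 17)]       # s2-s17
    + ["chiller_1_kwh", "chiller_2_kwh",                # s18, s19
       "cooling_tower_fan_kwh",                         # s20
       "chw_pump_kwh", "cw_pump_kwh",                   # s21, s22
       "chw_leaving_temp_C", "chw_entering_temp_C"]     # s23, s24
)

CHW_SETPOINTS_C = [6, 7, 8, 9, 10]      # a1: set-point chilled water temperature (deg C)
OA_DAMPER_RATES_PCT = [40, 50, 60, 70]  # a2: OA damper opening rate for the 4 AHUs (%)

# One possible discrete action set is the Cartesian product of a1 and a2
# (5 x 4 = 20 joint actions), e.g., for a DQN-style controller.
ACTION_SPACE = list(product(CHW_SETPOINTS_C, OA_DAMPER_RATES_PCT))

assert len(STATE_KEYS) == 24 and len(ACTION_SPACE) == 20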
Table 4. Values for 16 variables in message format (Var.1 to Var.9 in Example 1).
(Columns Var.1 to Var.9 list values for the 16 variables sent to GPT.)
Date/Time | Var.1 | Var.2 | Var.3 | Var.4 | Var.5 | Var.6 | Var.7 | Var.8 | Var.9
07/13 06:00 | decreased | −0.3 | 23.2 | decreased | −13.2 | 693.0 | increased | 0.0 | 0.0
07/13 07:00 | decreased | −0.2 | 23.0 | increased | 117.9 | 810.9 | increased | 343.0 | 343.0
07/13 08:00 | increased | 0.0 | 23.0 | decreased | −44.1 | 766.8 | increased | 75.9 | 418.9
07/13 09:00 | decreased | −0.1 | 22.9 | decreased | −77.4 | 689.4 | decreased | −86.4 | 332.5
07/13 10:00 | increased | 0.0 | 22.9 | increased | 68.6 | 758.0 | decreased | −49.5 | 283.0
07/13 11:00 | increased | 0.0 | 22.9 | increased | 66.1 | 824.1 | increased | 163.1 | 446.1
07/13 12:00 | increased | 0.1 | 23.0 | increased | 14.8 | 838.9 | decreased | −142.4 | 303.8
07/13 13:00 | increased | 0.0 | 23.0 | increased | 28.5 | 867.4 | increased | 161.0 | 464.8
07/13 14:00 | decreased | −0.1 | 22.9 | decreased | −15.8 | 851.7 | decreased | −196.5 | 268.2
07/13 15:00 | increased | 0.0 | 22.9 | increased | 21.9 | 873.5 | increased | 171.7 | 439.9
07/13 16:00 | decreased | −0.1 | 22.8 | increased | 26.9 | 900.4 | decreased | −28.2 | 411.7
07/13 17:00 | decreased | −0.1 | 22.7 | increased | 10.0 | 910.4 | increased | 59.5 | 471.3
07/13 18:00 | decreased | −0.2 | 22.5 | increased | 28.5 | 938.9 | decreased | −198.9 | 272.4
07/14 06:00 | increased | 0.1 | 22.4 | decreased | −13.1 | 698.0 | increased | 0.0 | 0.0
07/14 07:00 | increased | 0.2 | 22.6 | increased | 114.7 | 812.7 | increased | 345.2 | 345.2
07/14 08:00 | increased | 0.3 | 22.9 | decreased | −42.8 | 769.9 | increased | 73.2 | 418.4
07/14 09:00 | increased | 0.2 | 23.1 | decreased | −78.8 | 691.1 | decreased | −75.0 | 343.4
07/14 10:00 | increased | 0.3 | 23.4 | increased | 67.5 | 758.6 | increased | 63.9 | 407.2
07/14 11:00 | increased | 0.3 | 23.7 | increased | 60.6 | 819.2 | increased | 11.5 | 418.7
07/14 12:00 | increased | 0.4 | 24.1 | increased | 18.1 | 837.3 | decreased | −81.0 | 337.7
07/14 13:00 | increased | 0.3 | 24.4 | increased | 25.9 | 863.2 | increased | 89.4 | 427.1
07/14 14:00 | increased | 0.8 | 25.2 | decreased | −61.1 | 802.1 | decreased | −79.2 | 347.9
07/14 15:00 | increased | 0.7 | 25.9 | increased | 18.3 | 820.3 | increased | 89.0 | 436.9
07/14 16:00 | increased | 0.8 | 26.7 | increased | 17.7 | 838.0 | decreased | −79.6 | 357.3
07/14 17:00 | decreased | −0.8 | 25.9 | increased | 31.1 | 869.1 | increased | 84.1 | 441.4
07/14 18:00 | decreased | −0.8 | 25.1 | increased | 6.9 | 876.1 | decreased | −92.9 | 348.5
07/15 06:00 | increased | 0.0 | 23.9 | decreased | −12.7 | 686.3 | increased | 0.0 | 0.0
07/15 07:00 | decreased | −0.1 | 23.8 | increased | 114.4 | 800.7 | increased | 346.9 | 346.9
07/15 08:00 | increased | 0.3 | 24.1 | decreased | −42.7 | 758.0 | increased | 77.9 | 424.8
07/15 09:00 | increased | 0.4 | 24.5 | decreased | −41.8 | 716.2 | decreased | −84.7 | 340.2
07/15 10:00 | increased | 0.3 | 24.8 | increased | 85.9 | 802.0 | increased | 80.7 | 420.9
07/15 11:00 | increased | 0.5 | 25.3 | increased | 70.5 | 872.6 | decreased | −3.9 | 416.9
07/15 12:00 | increased | 0.5 | 25.8 | increased | 18.8 | 891.3 | increased | 44.6 | 461.6
07/15 13:00 | increased | 0.5 | 26.3 | decreased | −8.6 | 882.8 | decreased | −112.8 | 348.7
07/15 14:00 | decreased | −0.5 | 25.8 | decreased | −49.8 | 832.9 | increased | 147.0 | 495.7
07/15 15:00 | decreased | −0.5 | 25.3 | increased | 30.0 | 863.0 | decreased | −137.5 | 358.2
07/15 16:00 | decreased | −0.5 | 24.8 | increased | 22.3 | 885.2 | increased | 20.0 | 378.2
07/15 17:00 | decreased | −0.1 | 24.7 | decreased | −2.8 | 882.4 | decreased | −23.8 | 354.4
07/15 18:00 | decreased | −0.1 | 24.6 | increased | 11.5 | 893.9 | increased | 79.1 | 433.5
Table 5. Values for 16 variables in message format (Var.10 to Var.16 in Example 1).
(Columns Var.10 to Var.16 list values for the 16 variables sent to GPT; the last column references the GPT responses quoted as Examples in the text.)
Date/Time | Var.10 | Var.11 | Var.12 | Var.13 | Var.14 | Var.15 | Var.16 | Response of GPT
07/13 06:00 | increased | 0.0 | 13.3 | increased | 0.0 | 50% | 6.0
07/13 07:00 | increased | 0.9 | 14.2 | increased | 3.0 | 40% | 10.0
07/13 08:00 | decreased | −1.5 | 12.7 | increased | 2.2 | 50% | 7.0
07/13 09:00 | increased | 0.3 | 13.0 | decreased | −1.0 | 60% | 8.0
07/13 10:00 | increased | 0.5 | 13.5 | decreased | −1.0 | 50% | 9.0
07/13 11:00 | increased | 0.1 | 13.6 | increased | 1.8 | 60% | 8.0 | Example 2
07/13 12:00 | increased | 0.3 | 13.9 | decreased | −2.5 | 70% | 10.0 | Example 3
07/13 13:00 | decreased | −0.9 | 13.0 | increased | 3.0 | 60% | 7.0
07/13 14:00 | increased | 0.7 | 13.6 | decreased | −0.3 | 50% | 8.0
07/13 15:00 | decreased | −1.0 | 12.7 | decreased | −0.2 | 60% | 7.0
07/13 16:00 | increased | 0.8 | 13.5 | decreased | −0.3 | 60% | 8.0
07/13 17:00 | decreased | 0.0 | 13.4 | increased | 1.7 | 60% | 7.0
07/13 18:00 | increased | 0.3 | 13.7 | decreased | −1.0 | 50% | 8.0
07/14 06:00 | increased | 0.0 | 13.9 | increased | 0.0 | 50% | 6.0
07/14 07:00 | increased | 0.4 | 14.3 | increased | 3.1 | 40% | 10.0
07/14 08:00 | decreased | −1.6 | 12.7 | increased | 2.1 | 50% | 7.0
07/14 09:00 | increased | 0.4 | 13.1 | decreased | −1.0 | 60% | 8.0
07/14 10:00 | decreased | −0.6 | 12.5 | increased | 0.4 | 50% | 7.0
07/14 11:00 | increased | 0.9 | 13.4 | increased | 0.0 | 60% | 8.0
07/14 12:00 | increased | 0.7 | 14.0 | decreased | −1.9 | 70% | 10.0
07/14 13:00 | decreased | −0.4 | 13.7 | increased | 2.3 | 60% | 8.0
07/14 14:00 | increased | 0.4 | 14.1 | decreased | −2.2 | 70% | 10.0 | Example 4
07/14 15:00 | decreased | −0.1 | 14.0 | increased | 2.7 | 60% | 8.0 | Example 5
07/14 16:00 | increased | 0.5 | 14.4 | decreased | −2.2 | 70% | 10.0
07/14 17:00 | decreased | −0.6 | 13.8 | increased | 2.0 | 60% | 8.0
07/14 18:00 | increased | 0.4 | 14.2 | decreased | −2.3 | 70% | 10.0
07/15 06:00 | increased | 0.0 | 13.6 | increased | 0.0 | 50% | 6.0
07/15 07:00 | increased | 0.7 | 14.3 | increased | 3.1 | 40% | 10.0
07/15 08:00 | decreased | −1.5 | 12.8 | increased | 2.2 | 50% | 7.0
07/15 09:00 | increased | 0.3 | 13.1 | decreased | −1.3 | 40% | 8.0
07/15 10:00 | decreased | −0.6 | 12.5 | increased | 0.6 | 40% | 7.0
07/15 11:00 | increased | 0.9 | 13.4 | increased | 0.0 | 50% | 8.0
07/15 12:00 | decreased | −0.3 | 13.1 | increased | 1.0 | 60% | 7.0 | Example 6
07/15 13:00 | increased | 1.1 | 14.1 | decreased | −2.8 | 70% | 10.0 | Example 7
07/15 14:00 | decreased | −0.8 | 13.4 | increased | 3.3 | 60% | 7.0
07/15 15:00 | increased | 0.3 | 13.6 | decreased | −1.3 | 50% | 8.0
07/15 16:00 | decreased | 0.0 | 13.6 | decreased | −1.2 | 60% | 9.0
07/15 17:00 | increased | 0.9 | 14.5 | decreased | −0.2 | 70% | 10.0
07/15 18:00 | decreased | −0.7 | 13.8 | increased | 1.9 | 60% | 8.0
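For readers who want to reproduce the message exchange summarized in Tables 4 and 5, the following minimal sketch shows one way the 16 variables for a single hour could be serialized into a prompt and sent through OpenAI's Chat Completions API [48]. The prompt wording, the helper names (build_user_message, ask_chatgpt), and the model identifier are hypothetical placeholders rather than the exact messages used in this study, and the pre-1.0 interface of the openai Python package is assumed.

# Hypothetical sketch of one hourly request to ChatGPT via the Chat Completions API [48].
# Prompt text, function names, and the model identifier are illustrative assumptions.
import openai  # assumes the pre-1.0 interface of the openai package

openai.api_key = "YOUR_API_KEY"  # placeholder

def build_user_message(variables: dict) -> str:
    # 'variables' maps "Var.1" ... "Var.16" to the hourly values listed in Tables 4 and 5.
    lines = [f"{name}: {value}" for name, value in variables.items()]
    lines.append(
        "Select the set-point chilled water temperature (6-10 deg C) and the OA damper "
        "opening rate (40-70%) that minimize energy use while keeping indoor CO2 below 1000 ppm."
    )
    return "\n".join(lines)

def ask_chatgpt(variables: dict) -> str:
    response = openai.ChatCompletion.create(
        model="gpt-4",  # the model named here is an assumption
        messages=[
            {"role": "system",
             "content": "You operate the HVAC systems of an office building."},
            {"role": "user", "content": build_user_message(variables)},
        ],
        temperature=0,
    )
    return response["choices"][0]["message"]["content"]

In the co-simulation loop of Figure 2, the returned text would then be parsed into the two control values (a1 and a2) and written back to EnergyPlus for the next hourly time step.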
Table 6. Comparison of the ChatGPT pre-trained LLM control and the DQN model-free control.
 | Baseline Operation | ChatGPT Pre-Trained LLM Control | DQN Model-Free Control
Total energy use (kWh) | 17,961 | 14,944 | 13,635
Saving rate compared with baseline operation (%) | - | 16.8 | 24.1
During building operating hours, duration when CO2 exceeded 1000 ppm * (hours) | 0 | 0 | 0
* The results are based on operating hours from 05:00 to 18:00, excluding other times.
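The saving rates in Table 6 follow directly from the energy totals: (17,961 − 14,944)/17,961 × 100 ≈ 16.8% for the ChatGPT pre-trained LLM control, and (17,961 − 13,635)/17,961 × 100 ≈ 24.1% for the DQN model-free control.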
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
