Autonomous Energy Management by Applying Deep Q-Learning to Enhance Sustainability in Smart Tourism Cities

Abstract: Autonomous energy management is becoming a significant mechanism for attaining sustainability in energy management. This paper therefore aimed to apply deep reinforcement learning algorithms to an autonomous energy management system for a microgrid. It proposed a novel microgrid model consisting of a combination of a household load, renewable energy, an energy storage system, and a generator, all connected to the main grid. The proposed autonomous energy management system was designed to coordinate the various flexible sources and loads by defining the priority of resources, loads, and electricity prices. The system was implemented using deep reinforcement learning algorithms that effectively controlled the power storage, solar panels, generator, and main grid. The system model could reach near-optimal policies. As a result, this method could save 13.19% in cost compared to manual control of energy management. This study focused on applying Q-learning to microgrids in the tourism industry in remote areas that can produce and store energy, and therefore proposed an autonomous energy management system for effective energy management. In future work, the system could be improved by applying deep learning to energy price data to predict the future energy price, so that when the system produces more energy than the demand it can store the surplus and sell it at the most appropriate price; this would make the autonomous energy management system smarter and provide better benefits for the tourism industry. The proposed autonomous energy management could also be applied to other industries, for example businesses or factories that need effective energy management to maintain microgrid stability and save energy.


Introduction
With the advancement of information technology in the era of digital disruption, tourism businesses are being transformed by adopting new technology to support their operations and move them toward sustainable development [1][2][3][4]. Sustainable Development Goal 7 (SDG7), promoted by the United Nations Environment Programme (UNEP), aims for "cheap, reliable, sustainable, and modern energy for all" by 2030. Its three main targets serve as the cornerstone for the researchers' efforts: (1) ensure that everyone has access to energy services that are affordable, reliable, and contemporary; (2) significantly increase the share of renewable energy in the global energy mix; (3) double the global rate of energy efficiency improvement [5]. In this regard, advanced technology is emerging for sustaining and managing energy in order to go green and preserve the environment, moving toward sustainability [6]. To reduce any potential mismatch between supply and demand, an autonomous energy management system combines various energy sources, both renewable and non-renewable, with energy storage systems (ESS) to meet the demand of the loads. Such a system can be connected to the main grid at the point of common coupling (PCC) or operated off-grid, where the microgrid's operating system can support green energy. When a fault develops in the linked power system, autonomous energy management operates in an isolated mode. Hence, microgrids provide a number of advantages, including reducing greenhouse gas emissions, supporting reactive power to raise the voltage profile, decentralizing the energy supply, and responding to the demand. By 2024, the global deployment of microgrids is estimated to reach 8.8 GW.
Moreover, microgrids have been installed in rural places, towns, and a variety of industries, including commercial, industrial, and military, based on their goals, load types, and geographical and climatic conditions [7].
Autonomous energy management has also been applied in the tourism industry because energy demand is growing at an accelerated pace due to internationalization and the development of civilization [8]. Hotels and resorts are a very important accommodation service business in the Thai tourism industry, which generated an income of 3.076 trillion Thai Baht for the country in 2019 (reference). Additionally, the energy demand depends on many factors, such as the nature and style of the building, the usage of the guests, the number of rooms, outdoor temperature maintenance, etc. Therefore, if a hotel or resort applies energy management to its operations by using energy effectively in each section and reducing unnecessary energy use, it can save costs and electricity and reduce the waste of natural resources. Today, many hotels and resorts are able to generate electricity in several ways, including solar cells or gasoline generators, which adds complexity to the power system of the tourism business [9]. Therefore, smart energy management is a necessary system for the tourism business.
With the advent of technology in the digital disruption era, energy conservation systems have been widely implemented in the tourism industry, and they have become an important aspect of driving an attraction toward becoming a smart tourism city [6]. A smart city is a sustainable and efficient urban center that delivers a high quality of life for a large number of people, while also requiring effective resource management. As such, energy management is one of the most essential concerns in such urban centers, where the energy networks are complex, and smart energy management is key to solving this problem. As a consequence, modeling and simulation would be applied to find smart solutions, as well as to plan the most appropriate ways to change from existing cities to smarter ones [10]. In general, energy planning and operation models of a smart city consist of generation, storage, infrastructure, facilities, and transport, which together can form a complex power system that calls for an autonomous energy management system. Thus, deep Q-learning is a topic of interest for research in the energy field [11].
Many tourism locations are situated in remote areas that are far from the main power grids. Consequently, these grids are unable to support the tourism attractions effectively, and many attractions do not have sufficient energy to operate their businesses [12]. Each microgrid has a different objective and capacity, and coordinates various power resources with a large number of loads. A microgrid is a cluster of loads, distributed generation units, and energy storage systems that are operated in coordination to reliably supply electricity, and that are connected to the host power system at the distribution level at a single point of connection, the PCC [13]. Microgrids can also be totally self-contained and independent of the grid (off-grid).
The purpose of this study was to investigate a deep Q-learning artificial intelligence (AI) model for automatically regulating an energy management system (EMS) that would preserve the energy reserve, maximize the overall system's efficiency, and optimize the dispatch of local resources [2][3][4][5][6]. The EMS faced significant hurdles as a result of the microgrid's structure, including the small size, volatility, uncertainty, and intermittency of the distributed energy resources (DER), as well as demand unpredictability and dynamic power market prices. Further advancements in microgrid construction and control would be necessary to overcome these obstacles. To mitigate the significant volatility of the DER, additional sources of flexibility would need to be used at the architectural level. In addition, to improve the energy dispatch and overcome the uncertainties of the microgrid's components, new control mechanisms and intelligent control approaches would be required.

Autonomous Energy Management
The term "autonomous energy management" still has no specific definition that covers the concept. However, the literature review found that high demand for energy production is the main underlying problem. In response, many scholars have studied the new paradigm of energy autonomy in several countries. In [13,14], the authors studied the problem of local energy organization management in the European Union and proposed autonomous regional energy organizations. Furthermore, a solution was proposed for the preparation and implementation of grid services as part of the local public autonomous energy system [15]. In [16,17], the authors proposed another idea in terms of technology development to support renewable energy sources. They proposed a multipurpose optimization method for sizing an autonomous energy system consisting of diesel, wind, battery storage, and photovoltaic systems, as well as a load switching system [17].
Currently, power systems are principally based on large-scale power plants like coal, hydro, natural gas, and nuclear. However, those forms of energy are based on nonrenewable energy sources. In addition, every country around the world has been concerned about the environment and energy resources. As a result, renewable resources like solar and wind were investigated and integrated into the system [14,18]. At present, the power system is controlled centrally, and the power is synchronously generated in the power plants and flows from the central power station to the customers in a single direction [16]. With this centralized energy management, the customers would depend on the central power station as the only source. However, in the event that the power station experienced some problems, this would affect the customers.
With the worldwide growth in the digital era, the demand for energy is further increasing. At the same time, many new technologies are being integrated into future power systems. First of all, there are more kinds of distributed energy source technologies, such as solar, wind, and combined heat and power generators [18,19]. These technologies differ from the conventional models in that they typically have a power inverter interface to connect to the grid. There are also new distributed technologies, such as distributed storage, flexible loads, and electric vehicles (EVs). Together, these have led to a highly complex situation in the control and operation of the energy system.
Nowadays, microgrid technology has provided a solution for autonomous energy management from distributed energy sources [8]. A microgrid is a small low-voltage or medium-voltage system, which has integrated the power generator, electric load, information technology, and communication systems. The combined energy storage and automatic control system are able to work together as a single system. Typically, microgrid systems are connected to the main grid.
From the literature review, we can summarize the definition of autonomous energy management as the deconstruction of the control of a centralized, large-scale power grid into smaller grids, with decentralized control and autonomy for every load and energy resource, supported by data communication, in order to contribute more savings and stability to the energy system.
The advantage of the microgrid system is the reliability of a self-sufficient energy system, which can detect problems in the main grid and switch to its own supply automatically. It can therefore keep critical activities operating continuously, such as in hospitals, university laboratories, hotels, factories, electric vehicle charging stations, etc. Moreover, the existing literature showed that some energy management systems could decrease the energy costs for business owners. Kapiki [20] found that efficient energy management systems could save up to 65% of energy costs for hotel owners. Furthermore, the smart grid and the latest technologies could provide a solid solution to control complex distributed energy systems, such as an autonomous energy management system for green buildings [8]. Additionally, Basit et al. [12] proposed an autonomous energy management system for smart houses, which reduced the cost at peak load times in the home environment. In the study from Raju et al. [21], a multi-agent system (MAS) was implemented for the autonomous energy management of a solar microgrid consisting of two solar photovoltaic (PV) systems. Each component of the microgrid was modeled as an agent, and the agents worked together toward optimal energy management [22].

Smart City
The term "smart tourism cities" is gaining renown [2], but there is still no specific definition that fully covers this concept. Chung et al. [23] stated "smart tourism cities are indistinct boundaries between tourists and residents in geospatial locations (e.g., urban or destination)." However, behind this concept, most researchers have referred to the terms "smart city" and "smart tourism". According to [24], "a smart city is using all available resources and technologies to grow to be integrated, habitable, and sustainable in an intelligent and corresponding manner". Harrison et al. [25] also defined that "a smart city means a city that connects with social substructure, physical substructure, business substructure and IT substructure to take advantage of the city's collective intelligence." For the concept of smart tourism, Li et al. [26] gave the definition "it is a tourist information service that tourists receive throughout the travel process." Gretzel et al. [27] indicated that "smart tourism is tourism maintained by a combined endeavor to collect data from the social connection, physical infrastructure, and government with the use of innovative technology to transform that data to on-site experiences and business value schemes with emphasis on efficiency, sustainability, and experience enhancement." Moreover, Chung et al. [23] introduced the integration of a "smart city" and "smart tourism," from which the "smart tourist city" was born. Furthermore, "smart tourism towns are sophisticated tourist destinations that provide sustainable growth that simplifies and increases visitor contact with the destination experience and, as a result, improves the quality of life for the locals" [28].

Deep Q-Learning
In reinforcement learning, Q-learning is a familiar algorithm that produces a Q-table, which an agent can use to find the most appropriate action in each state [29]. In deep Q-learning, neural networks (NN) are used to approximate the Q-value function instead of storing it in a table. The state is given as the input, and the Q-values of all possible actions are generated as the output [30]. Additionally, the deep Q-learning algorithm has many benefits for control systems.
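The input/output contract described above (state in, one Q-value per action out) can be illustrated with a minimal sketch. The layer sizes, the random weights, and the helper names `dense`, `init_layer`, and `q_values` are assumptions made for illustration only; they are not the network used in this study, and no training is shown.

```python
import random

def relu(values):
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """Affine layer: `weights` holds one row of input weights per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialization)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def q_values(state, layers):
    """Forward pass: ReLU on hidden layers, linear output with one Q-value per action."""
    hidden = state
    for weights, biases in layers[:-1]:
        hidden = relu(dense(hidden, weights, biases))
    out_weights, out_biases = layers[-1]
    return dense(hidden, out_weights, out_biases)

rng = random.Random(0)
STATE_DIM, N_ACTIONS = 4, 3              # hypothetical sizes
layers = [init_layer(STATE_DIM, 8, rng), init_layer(8, N_ACTIONS, rng)]
state = [0.5, 0.2, 0.0, 1.0]             # hypothetical normalized observations
q = q_values(state, layers)
greedy_action = max(range(N_ACTIONS), key=lambda a: q[a])
```

The greedy action is simply the index of the largest output, which is how a trained network would be queried at decision time.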
The following literature demonstrates the existing research. James and Johns [31] presented an approach that used deep Q-learning to train seven robotic arms in a controlled task without any prior knowledge. Rahman et al. [32] also applied the deep Q-network (DQN) for a self-balancing robot to make the robot model learn the best actions for staying balanced in an environment. Additionally, Qiao et al. [33] proposed handwritten digit recognition using an adaptive deep Q-learning strategy. Furthermore, Zhu et al. [34] studied a deep-Q-learning-based transmission scheduling mechanism for the cognitive Internet of Things (IoT). Moreover, Bui et al. [35] controlled a battery energy storage system by using a double deep-Q-learning-based approach.

Materials and Methods
In this paper, the researchers developed a prototype of smart microgrids for tourism cities, based on a virtual microgrid environment built with an open-source Python tool. Reinforcement learning (RL) allowed the machine to learn how to perform actions: the machine acted on its environment in order to optimize a reward signal. In the context of a microgrid, that reward signal could comprise the energy cost, peak load, or safety, depending on which behavior needed to be incentivized. A Markov decision process (MDP) was used to formalize how the agent should respond in an RL scenario. However, because the state space of modern power grids is so large, a standard RL algorithm would be unable to solve the problem. Therefore, a deep NN could be used to model the desired policies and value functions, an approach known as deep RL.
To apply solutions for sequential decision making based on deep RL, the optimal operation of an MG could be described as a partially observable MDP, in which the MG would be viewed as an agent interacting with its surroundings. To approach the Markov property, the state of the system $s_t \in \mathcal{S}$ was made up of a history of features of observations $O_t^i$, $i \in \{1, \ldots, N_f\}$, where $N_f \in \mathbb{N}$ would be the total number of features. Each $O_t^i$ would be represented by a series of punctual observations over a predetermined history length $h^i$: $O_t^i = [o_{t-h^i+1}^i, \ldots, o_t^i]$ (the history length may depend on the feature). At each time step, the agent would observe a state variable $s_t$, take an action $a_t \in \mathcal{A}$, and move into a state $s_{t+1} \sim P(\cdot \mid s_t, a_t)$. The transition $(s_t, a_t, s_{t+1})$ would be coupled with a reward signal $r_t = \rho(s_t, a_t, s_{t+1})$, where $\rho: \mathcal{S} \times \mathcal{A} \times \mathcal{S} \to \mathbb{R}$ would be the reward function. Then, the $\gamma$-discounted optimal Q-value function would be defined as:
$$Q^*(s, a) = \max_{\pi} \mathbb{E}\left[\sum_{k=0}^{\infty} \gamma^k r_{t+k} \,\middle|\, s_t = s,\ a_t = a,\ \pi\right] \quad (1)$$
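As an illustration of how such a state could be assembled from per-feature histories, the following sketch keeps a fixed-length window for each feature and concatenates the windows. The class name `FeatureHistory`, the feature names, and the history lengths are hypothetical and chosen only for the example.

```python
from collections import deque

class FeatureHistory:
    """Keeps the last h^i punctual observations of each feature i."""

    def __init__(self, history_lengths):
        # One bounded buffer per feature; old observations fall out automatically.
        self.buffers = {name: deque(maxlen=h) for name, h in history_lengths.items()}

    def observe(self, observation):
        """Append the newest punctual observation of every feature."""
        for name, value in observation.items():
            self.buffers[name].append(value)

    def ready(self):
        """True once every feature has a full history window."""
        return all(len(b) == b.maxlen for b in self.buffers.values())

    def state(self):
        """Concatenate the windows O_t^i in a fixed (alphabetical) feature order."""
        flat = []
        for name in sorted(self.buffers):
            flat.extend(self.buffers[name])
        return flat

# Hypothetical features: load history of 3 steps, PV history of 2 steps.
history = FeatureHistory({"load_kw": 3, "pv_kw": 2})
for load, pv in [(1.0, 0.0), (1.5, 0.2), (2.0, 0.6), (1.8, 0.9)]:
    history.observe({"load_kw": load, "pv_kw": pv})
state = history.state()  # [1.5, 2.0, 1.8, 0.6, 0.9]
```

Allowing a different window length per feature mirrors the remark that the history length may depend on the feature.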

Value-Based Deep Reinforcement Learning Methods
The Q-function would be represented by an NN approximator $Q(s, a; \Theta)$ with parameters $\Theta$, using the notation of the MDP formulation. The deep Q-network (DQN) algorithm is one of the parameter-tuning techniques most often used with the goal of directly approximating the optimal Q-function. In one-step DQN, the parameters are learned by iteratively minimizing a sequence of loss functions, and the Q-function is then updated toward the one-step return. The researchers also implemented an experience replay mechanism to make more efficient use of previously gained experience. With experience replay, the learning phase was conceptually separated from the experience-gathering phase, and randomly sampled batches of transitions from an experience dataset were used for learning. Moreover, this technique allowed the NN to overcome the limitations of non-stationary data distributions, thus resulting in improved algorithm convergence. It is also worth mentioning that this algorithm did not employ a purely greedy strategy, because the search space continued to be explored at random during training.
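The experience replay mechanism and the one-step return can be sketched as follows. This is a minimal illustration under assumptions: the class name `ReplayBuffer`, its capacity, and the helper `td_target` are hypothetical, and the target $r + \gamma \max_{a'} Q(s', a')$ is the commonly used one-step DQN target rather than a form confirmed by the text.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size memory of transitions, sampled uniformly at random."""

    def __init__(self, capacity, seed=0):
        self.memory = deque(maxlen=capacity)   # oldest transitions are discarded
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Random sampling breaks the temporal correlation between transitions.
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)

def td_target(reward, next_q_values, gamma, done):
    """One-step target: r + gamma * max_a' Q(s', a'), or just r at a terminal state."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)

buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.add(state=t, action=0, reward=float(t), next_state=t + 1, done=(t == 4))
batch = buffer.sample(2)
target = td_target(reward=1.0, next_q_values=[0.5, 2.0], gamma=0.9, done=False)
```

In a full training loop, the squared difference between such a target and the network's current estimate for the sampled action would serve as the loss to minimize.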
The researchers used the storage operating state of the microgrid, $s_t^{MG} \in \mathcal{S}^{MG}$, which describes the quantity of energy held in the storage devices. The quantity of energy stored in the battery was denoted by $s_t^{B}$ [Wh] $\in \mathcal{S}^{B}$, and the energy density of a diesel generator was represented by $s_t^{DG}$ [Wh] $\in \mathcal{S}^{DG}$ [Wh/kg]. The battery storage capacity was denoted by $x^{B}$ [Wh] (resp. $x^{H_2}$ [Wp] for the hydrogen storage), and the generator output by $x^{DG}$ [W]. The variable $\eta^{B}$ (resp. $\zeta^{B}$) denoted the charge (resp. discharge) efficiency of the battery. Likewise, the efficiencies of the electrolysis and fuel cells were given by $\eta^{H_2}$ (when storing energy) and $\zeta^{H_2}$ (when delivering energy), and $\zeta^{DG}$ was the efficiency of the diesel generator. At each time step, an action $a_t = [a_t^{H_2}, a_t^{DG}, a_t^{B}] \in \mathcal{A}_t$ was applied to the system, where $a_t^{H_2}$ was the amount of energy moved into (if positive) or out of (if negative) the hydrogen storage device; similarly, $a_t^{B}$ was the amount of energy transferred into (if positive) or out of (if negative) the battery, and $a_t^{DG}$ was the quantity of energy emitted by the diesel generator (always negative). The dynamics of the battery were determined by $s_{t+1}^{B} = s_t^{B} + \eta^{B} a_t^{B}$ if $a_t^{B} \ge 0$ and $s_{t+1}^{B} = s_t^{B} - |a_t^{B}| / \zeta^{B}$ otherwise. Similarly, the dynamics of hydrogen were described by $s_{t+1}^{H_2} = s_t^{H_2} + \eta^{H_2} a_t^{H_2}$ if $a_t^{H_2} \ge 0$ and $s_{t+1}^{H_2} = s_t^{H_2} - |a_t^{H_2}| / \zeta^{H_2}$ otherwise. Figure 1 shows the deep reinforcement learning design of the study. The instantaneous reward signal $r_t$ was calculated by adding the earnings from the generation of hydrogen, $r^{H_2}$, to the penalties $r^{-}$ due to the value of the lost load: $r_t = r(a_t, d_t) = r^{H_2}(a_t, d_t) + r^{-}(a_t, d_t)$.
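The battery dynamics can be sketched as a single update function. The function name `battery_step`, the clamping of the result to the physical range between empty and full, and the efficiency values in the example are assumptions added for illustration; the text does not state how out-of-range actions were handled.

```python
def battery_step(stored_wh, action_wh, capacity_wh, eta_charge, zeta_discharge):
    """Advance the stored battery energy by one time step.

    action_wh >= 0 charges the battery (losses reduce what is stored);
    action_wh < 0 discharges it (losses increase what must be drawn),
    i.e. s - |a| / zeta, written here as s + a / zeta with a negative a.
    """
    if action_wh >= 0:
        next_wh = stored_wh + eta_charge * action_wh
    else:
        next_wh = stored_wh + action_wh / zeta_discharge
    # Assumption: the stored energy is clamped to [0, capacity].
    return min(max(next_wh, 0.0), capacity_wh)

CAPACITY_WH = 384_000  # 384 kWh battery, as in the experimental setup
after_charge = battery_step(100_000, 10_000, CAPACITY_WH, 0.9, 0.9)
after_discharge = battery_step(100_000, -9_000, CAPACITY_WH, 0.9, 0.9)
```

With these hypothetical 90% efficiencies, charging 10 kWh stores only 9 kWh, while delivering 9 kWh drains 10 kWh from the battery.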
The penalty $r^{-}$ was proportional to the total quantity of energy not delivered to meet the demand: $r^{-}(a_t, d_t) = k\,\delta_t$ when $\delta_t < 0$ and zero otherwise ($k$ was the cost endured per Wh not supplied within the microgrid), while $r^{H_2}$ was given by $r^{H_2}(a_t, d_t) = k^{H_2} a_t^{H_2}$ ($k^{H_2}$ was the revenue/cost per Wh of hydrogen produced/used). According to the description of the problem, there was no means to supply energy from outside the system (for example, from the public grid), and the system was not rewarded for it. The operational revenue for year $y$ was calculated from the series of rewards $r_t$ as $M_y = \sum_{t \in \tau_y} r_t$, where $\tau_y$ was the set of time steps belonging to year $y$. The optimal operation of the MG necessitated the development of a sequential decision-making method that led to the maximization of $M_y$ (Algorithm 1).
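The reward terms and the yearly operational revenue can be written out as a short sketch. The function names and the numeric values of `k` and `k_h2` are hypothetical; `delta_wh` is assumed to be the net energy balance, negative when part of the demand is not served.

```python
def loss_load_penalty(delta_wh, k_per_wh):
    """r^-: k * delta when demand is unmet (delta < 0), zero otherwise."""
    return k_per_wh * delta_wh if delta_wh < 0 else 0.0

def hydrogen_reward(a_h2_wh, k_h2_per_wh):
    """r^H2: revenue (a > 0) or cost (a < 0) of hydrogen produced or used."""
    return k_h2_per_wh * a_h2_wh

def step_reward(a_h2_wh, delta_wh, k_per_wh, k_h2_per_wh):
    """Instantaneous reward r_t = r^H2 + r^-."""
    return hydrogen_reward(a_h2_wh, k_h2_per_wh) + loss_load_penalty(delta_wh, k_per_wh)

def operational_revenue(rewards):
    """M_y: sum of the rewards over the time steps of one year."""
    return sum(rewards)

# Two hypothetical time steps: (hydrogen action in Wh, net balance in Wh).
steps = [(100.0, 0.0), (-50.0, -20.0)]
rewards = [step_reward(a, d, k_per_wh=2.0, k_h2_per_wh=0.1) for a, d in steps]
revenue = operational_revenue(rewards)
```

In the second step the agent both consumes hydrogen and fails to cover 20 Wh of demand, so the penalty dominates and the reward is negative.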

Algorithm 1
Repeat (for each episode):
    Initialize s
    Repeat (for each step of the episode):
        Choose a from s using the policy derived from Q (ε-greedy)
        Take action a
        Update building states (s')
        Calculate reward (r)
        Q(s, a) ← r + γ max_a' Q(s', a')
        s ← s'
    until s is terminal
end

The researchers' experiment replicated the operation of an actual microgrid with PV panels, batteries, and a generator that was not linked to the main utility grid (off-grid).
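Algorithm 1 can be made runnable on a toy problem. The sketch below uses a five-state chain as a stand-in environment (it is not the microgrid simulator), and the learning rate `alpha` is an added assumption; with `alpha = 1.0` the update reduces to the assignment form of the listing.

```python
import random

N_STATES, ACTIONS = 5, (0, 1)   # toy chain: action 0 = move left, 1 = move right
TERMINAL = N_STATES - 1

def step(state, action):
    """Deterministic toy dynamics; reaching the last state pays a reward of 1."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == TERMINAL else 0.0
    return next_state, reward

def epsilon_greedy(q, state, epsilon, rng):
    """Random action with probability epsilon, otherwise a greedy one (ties broken at random)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])

def train(episodes=200, gamma=0.9, alpha=1.0, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0                                       # initialize s
        while s != TERMINAL:                        # until s is terminal
            a = epsilon_greedy(q, s, epsilon, rng)  # choose a from s
            s_next, r = step(s, a)                  # take action, observe s' and r
            target = r + gamma * max(q[s_next])     # one-step Q-learning target
            q[s][a] += alpha * (target - q[s][a])   # move Q(s, a) toward the target
            s = s_next
    return q

q_table = train()
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(TERMINAL)]
```

After training, the greedy policy moves right in every non-terminal state, which is the shortest path to the rewarded state.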
The researchers developed a DQN architecture in which the state vector provided the inputs, and the Q-value of each discretized action was represented by a separate output. The time series inputs to the DQN were processed by a 16-filter convolution with stride 1 followed by a 16-filter convolution with stride 2. The output of the convolutions, together with the other inputs, was then passed through two fully connected layers of 50 and 20 neurons, respectively, and then the output layer. The rectified linear unit (ReLU) was used as the activation function everywhere except the output layer, where no activation function was employed. Starting from a randomly initialized DQN, the researchers updated the Q-network at each time step. Simultaneously, the agent filled a replay memory with all the observations, actions, and rewards. The agent followed an ε-greedy policy, in which the policy $\pi(s) = \arg\max_{a \in \mathcal{A}} Q(s, a; \Theta_k)$ was selected with probability $1 - \epsilon$, and a random action (drawn uniformly over the actions) was chosen with probability $\epsilon$. The researchers also decreased the value of $\epsilon$ over time. During the validation and test phases, the policy $\pi(s) = \arg\max_{a \in \mathcal{A}} Q(s, a; \Theta_k)$ was applied (with $\epsilon = 0$).
Figure 2 shows the microgrid diagram of this study. The researchers assumed a household power consumer in a holiday village with an off-grid MG (average consumption of 48 kWh/day). As a starting point, historical data on total solar radiation, measured at a meteorological station in this town, were employed. The electrical load was calculated using real-time data from typical days in each month. The battery had a capacity of $x^{B}$ = 384 kWh, the diesel generator had a power of $x^{DG}$ = 100 kW, and the peak PV power generation was $x^{PV}$ = 75 kWp. The price of energy consumed outside of the MG was fixed at 2.16 Thai Baht/kWh.
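The exploration rule with a decreasing exploration rate can be sketched as below. The linear schedule, its start and end values, and the function names `epsilon_at` and `select_action` are assumptions for illustration; the text states only that the rate decreased over time and was set to zero for validation and testing.

```python
import random

def epsilon_at(step, eps_start=1.0, eps_end=0.05, decay_steps=1000):
    """Linearly decreasing exploration rate, held at eps_end after decay_steps."""
    fraction = min(step / decay_steps, 1.0)
    return eps_start + fraction * (eps_end - eps_start)

def select_action(q_values, epsilon, rng):
    """Uniform random action with probability epsilon, greedy action otherwise."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)

rng = random.Random(0)
q_estimates = [0.1, 0.7, 0.3]            # hypothetical Q-values for three actions
training_action = select_action(q_estimates, epsilon_at(200), rng)
test_action = select_action(q_estimates, 0.0, rng)   # epsilon = 0: purely greedy
```

Setting the exploration rate to zero at evaluation time makes the policy deterministic, so test results reflect only what the network has learned.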
The main goal was to minimize the electrical costs, and the reward function was created to maximize the economic profit from the activities. The incentive was based on the gross margin from the operations, which was the money generated by selling electricity to the microgrid and to the external grid minus the costs of the power generation, purchases, and transmission from the external grid.
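The gross-margin incentive can be expressed as revenue minus cost. In the sketch below, the function name `gross_margin`, the grouping of the cost terms, and all energy amounts and prices are hypothetical values chosen only to show the arithmetic.

```python
def gross_margin(sales, costs):
    """Revenue minus cost; both arguments are lists of (energy_kwh, price_per_kwh)."""
    revenue = sum(kwh * price for kwh, price in sales)
    expense = sum(kwh * price for kwh, price in costs)
    return revenue - expense

# Hypothetical hour of operation (prices in Thai Baht per kWh).
sales = [(30.0, 4.0),    # electricity sold to loads inside the microgrid
         (10.0, 2.0)]    # electricity sold to the external grid
costs = [(20.0, 1.5),    # local power generation
         (5.0, 3.0)]     # purchases from the external grid, transmission included
reward = gross_margin(sales, costs)   # 140.0 - 45.0
```

Maximizing this quantity is equivalent to minimizing net electricity cost whenever the sales volume is fixed by demand.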

Results
The deep RL algorithms were run in a simulated environment for 50 h, and both the training performance and the daily rewards were recorded. The results are shown in Figures 3-8 and Table 1, which depict the learning processes for each of the RL algorithms. In the simple one-step DQN, the learning curves showed a large amount of instability, whereas the remaining algorithms displayed a positive learning process that resulted in reasonable convergence. Figure 3 shows the load (in kilowatts, y-axis) over 9000 h (x-axis). Figure 4 shows the photovoltaic generation over 9000 h. Figure 5 shows the load (orange) and photovoltaic generation (blue) over 24 h. Figure 6 shows the load (orange) and photovoltaic generation (blue) over 168 h. Figure 7 shows the load, photovoltaic generation, battery (charge and discharge), and grid (in and out) over 9000 h. Figure 8 shows the relation between episodes and reward.

Discussion and Conclusions
In this study we applied Q-learning for autonomous energy management. After running the simulation, the results showed this proposed method could save 13.19% in the cost, compared to conducting manual control for energy management. The results showed the average cost of manual control was 1637.32 baht in 24 h; the average cost of control by applying Q-learning was 1431.36 baht in 24 h. An autonomous energy management system for a residential microgrid for a hotel or resort with multiple sources of flexibility was investigated in this study. The suggested microgrid model took into account the demand flexibility that price-responsive loads could provide. To achieve the effective management of the local resources, the suggested autonomous energy management systems were coordinated between the ESS, the main grid, the loads, and the price-responsive loads. The high dimensionality of the variables in the microgrid components encouraged the employment of intelligent learning-based methods in autonomous energy management systems, such as deep reinforcement learning (RL) algorithms. The numerical findings revealed that varied levels of convergence were attained by the deep RL methods. The findings were compared to a theoretical optimal controller with perfect knowledge of the system's variables and dynamics for the entire day, as well as an electricity retailer who purchased electricity on the day-ahead market and met the same demand without using a microgrid. The results suggested that the proposed microgrid paradigm had a substantial advantage in terms of financial prosperity and resilience in the face of adversity. Because of the high complexity and uncertainty of the microgrid components, designing and implementing an effective autonomous energy management system for future microgrids would be a difficult undertaking. 
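As a quick arithmetic check, the relative saving implied by the two reported daily averages can be computed directly. The function name `relative_saving_percent` is hypothetical. Applied to the two averages alone, the ratio comes to roughly 12.6%, so the reported 13.19% presumably rests on a different aggregation (for example, averaging per-day savings), which the text does not specify.

```python
def relative_saving_percent(baseline_cost, controlled_cost):
    """Cost reduction relative to the baseline, in percent."""
    return 100.0 * (baseline_cost - controlled_cost) / baseline_cost

manual_cost_thb = 1637.32       # reported average cost of manual control per 24 h
q_learning_cost_thb = 1431.36   # reported average cost with Q-learning per 24 h
saving = relative_saving_percent(manual_cost_thb, q_learning_cost_thb)
absolute_saving_thb = manual_cost_thb - q_learning_cost_thb
```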
Although deep RL approaches have been shown to be successful in simulations, they are far from ideal: because of data inefficiency, instability, and slow convergence, they face implementation challenges in real-world energy management systems [36][37][38]. As a consequence, the researchers are now working on improving the performance of the deep RL algorithms and expanding their applicability to real-world energy management problems [15,38,39].
From the study, we found that the proposed system applying Q-learning for autonomous energy management could reduce energy costs by 13.19%, and that applying reinforcement learning could reduce energy costs by 9.74%, compared to manual control.
In future work, the autonomous energy management system could be improved in several ways. Firstly, when the microgrid produces more energy than the demand, the system could control the energy storage system to charge or discharge electricity to be sold to the main grid or neighboring microgrids. Secondly, deep RL could be applied to energy planning for selling or buying at the real-time price in energy markets. Finally, the performance of deep Q-learning could be studied in order to transfer the knowledge gained in simulations to a real application for microgrids in the tourism industry.
Additionally, our proposed autonomous energy management could be applied to other industries, for example businesses or factories that need effective energy management. Although these industries can produce energy, they still need to connect to the main grid. Therefore, our proposed autonomous energy management can help to maintain microgrid stability and also save energy.
Author Contributions: The research conceptualization was by P.S. and P.J.; research methodology by P.S. and P.J.; software and system implementation by P.S. and P.J.; validation by P.J.; formal analysis by P.S. and P.J.; investigation, P.S. and P.J.; resources by P.S. and P.J.; data curation by P.J.; writing-original draft preparation by P.S., P.J. and K.J.; writing-review and editing, P.S., P.J., P.K. and K.J. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.