Dynamic Topology Model of Q-Learning LEACH Using Disposable Sensors in Autonomous Things Environment

Low-Energy Adaptive Clustering Hierarchy (LEACH) is a typical routing protocol that effectively reduces transmission energy consumption by forming a hierarchical structure between nodes. LEACH on Wireless Sensor Networks (WSNs) has been widely studied in the recent decade as a key technique for the Internet of Things (IoT). The main aim of autonomous things, one of the advanced forms of IoT, is to create a flexible environment that enables movement and communication between objects anytime, anywhere, by saving computing power and utilizing efficient wireless communication. However, the existing LEACH method assumes a static topology, whereas an autonomous-things environment also includes disposable sensors. With the increasing interest in disposable sensors, which constantly change their locations during operation, dynamic topology changes should be considered in LEACH. This study suggests a probing model for randomly moving nodes, implementing changes in node position caused by the environment, such as strong winds. In addition, as methods to adapt quickly to changes in node location and construct a new topology, we propose Q-learning LEACH (Q-LEACH), based on Q-table reinforcement learning, and Fuzzy LEACH (F-LEACH), based on a fuzzifier method. We then compared the dynamic and static topology models with existing LEACH in terms of energy loss, number of alive nodes, and throughput. All types of LEACH were sensitive to the dynamic location of each node, while Q-LEACH showed the best performance of all.


Introduction
One of the most potent and influential technologies of the 21st century is the Wireless Sensor Network (WSN). It is a key technology in the ubiquitous network and is becoming more important as sensors can connect everything to the Internet and process important information. A WSN collects information about the surrounding environment (temperature, humidity, etc.) for a specific purpose and transmits it to the base station through various processes. With the advancements of WSNs, disposable sensors show enhanced performance in fields that the user cannot access or that cover large areas, such as forests or the deep sea [1]. Because sensor nodes run on batteries that cannot be charged or replaced, and because a large number of sensors are deployed to form a wireless network, the sensor nodes must be small.
In addition, the price should be low, so the nodes are limited in data-processing capability and in the amount of power that can be supplied [2]. The latest WSN technology enables two-way communication to control sensor activity. WSNs can be roughly divided into homogeneous [6,7] and heterogeneous networks [8,9]. Previous studies using wireless sensor networks faced limited energy because sensor nodes operate on batteries; therefore, for efficient use of the battery, clustering techniques that prevent network overlapping were developed [8]. Several monitoring tasks have been built on these techniques [9][10][11]. Beyond medical use, WSNs serve civil areas, such as health monitoring and smart agriculture, as well as military areas, such as enemy tracking. Remote sensing is greatly simplified by inexpensive sensors and essential software packages, so the user can monitor and control the environment from a remote location, with further uses in safety-accident prevention and training [12,13].

Disposable IoT Sensors
Currently, various technologies have been fused and evolved into IoT technologies. Existing IoT sensors have limitations in price, size, and weight, so it is difficult to precisely monitor a wide range of physical spaces. To solve these problems, disposable IoT technology with reduced size, price, and weight has been constantly studied and developed. In the late 1990s, the University of California developed a very small sensor, 1-2 mm in size, called "Smart Dust" [14]. Through disposable IoT sensors (micro sensors), surrounding physical information (temperature, humidity, acceleration, etc.) can be detected and managed over a wireless network. Therefore, it is possible to sense an entire space accurately using low-cost micro sensors compared to existing sensors. In a smart-dust system using disposable sensors, LEACH is one of the key technologies for effective data collection. In addition, disposable sensors broadcast collected data without a polling process in order to reduce energy consumption, and the send/receive payloads are relatively small. Disposable IoT sensors can be applied to a wide range of fields, such as weather, national defense, risk safety, and forest-fire detection, as they can cover areas that are difficult for users to access [15].
With the advancements of wireless sensor networks, it is anticipated that real-time forest fire detection systems with high precision and accuracy can be developed using wireless sensor data. Thousands of disposable sensors can be densely scattered over a disaster-prone area to form a wireless sensor network in a forest [16].
For monitoring the environment, disposable sensors are distributed over large areas by aerial dispersion using airplanes or by dispersion in water streams. Distributed disposable sensors can be dislocated from their original points by environmental conditions, such as wind or the movement of animals, since their size is at most 1 cm², as shown in Figure 2.


LEACH (Low-Energy Adaptive Clustering Hierarchy)
The structure of a network can be classified according to node uniformity into Flat Network Routing Protocols (FNRP) and Hierarchical Network Routing Protocols (HNRP) [17].
LEACH is an HNRP for sensor networks proposed by Wendi Heinzelman [18]. It is a typical routing protocol that effectively reduces transmission energy consumption by forming a hierarchical structure between nodes. As shown in Figure 3, in LEACH the cluster head (CH) collects and processes data from the member nodes of its cluster and delivers them directly to the base station (BS). The cluster heads are selected as a fixed ratio of all sensor nodes, and each node decides by a calculation performed locally, so the protocol is distributed. The operation of LEACH is divided into rounds; each round begins with a setup phase, in which the clusters are formed, followed by a steady-state phase, in which data are transmitted to the base station. To minimize overhead, the steady-state phase is longer than the setup phase, as shown in Equation (1) [19]. Setup: each node decides whether to become the CH of the current round. The decision is based on choosing a random number between 0 and 1; if the number is less than the threshold T(n), the node becomes a cluster head for the current round. The threshold is set as follows. If a node becomes a cluster head, it advertises its status using the Carrier Sense Multiple Access (CSMA) Medium Access Control (MAC) protocol. The remaining nodes choose their cluster head for the current round according to the received signal strength of the advertisement messages. Time Division Multiple Access (TDMA) scheduling is applied so that all members of a cluster send their messages to the CH, and the CH forwards them to the base station. As soon as cluster heads have been selected, the steady-state phase begins. Steady state: once the TDMA schedule is fixed, the cluster is established and data transfer can begin. Whenever a node has data to send, it sends them to the cluster head during its allotted transmission time.
The radio of each non-cluster-head node can be turned off until the node's allotted transmission time, thus minimizing energy loss in these nodes [20].

The threshold in Equation (1) is T(n) = ρ / (1 − ρ (r mod 1/ρ)) if n ∈ G, and T(n) = 0 otherwise, where r is the current round, G is the set of nodes that have not been CH in the last 1/ρ rounds, and ρ is the percentage of cluster heads.
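The round-by-round CH election described above can be sketched as follows (a minimal sketch; the epoch bookkeeping for G is simplified):

```python
import random

def leach_threshold(r: int, p: float) -> float:
    """LEACH Eq. (1): CH-election threshold for a node still in G.

    r -- current round, p -- desired fraction of cluster heads (rho).
    """
    return p / (1 - p * (r % int(1 / p)))

def elect_cluster_heads(node_ids, G, r, p, rng=random.random):
    """Each node in G draws a number in [0, 1); below T(n) -> cluster head."""
    T = leach_threshold(r, p)
    heads = [n for n in node_ids if n in G and rng() < T]
    G -= set(heads)        # heads sit out until every node has served once
    if not G:              # epoch finished: all nodes eligible again
        G |= set(node_ids)
    return heads
```

Note that T(n) grows as the round advances within an epoch, so by round 1/ρ − 1 every remaining node in G is elected with certainty.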

Enhancements of LEACH
LEACH has several drawbacks. The setup phase is non-deterministic due to randomness and may be unstable depending on the density of sensor nodes. It is not applicable to large networks, as it uses single-hop communication; a CH located far from the BS consumes a huge amount of energy. It does not guarantee a good CH distribution and assumes uniform energy consumption of CHs during the setup phase. The main problems with LEACH thus lie in the random selection of CHs. Various studies have improved the LEACH protocol with respect to network lifetime, energy savings, performance, and stability [21][22][23][24][25][26][27][28]. The LEACH-Centralized (LEACH-C) method improves network lifetime in the setup stage: each node transmits its location and energy level to the BS, and the BS determines the clusters, the CHs, and the member nodes, extending network lifetime and improving energy savings [21]. In S-LEACH, solar power is used to improve WSN lifetime. The remaining energy is used to select nodes with surplus power, and each node transmits its solar state to the BS along with its energy level; nodes with higher energy are selected as CHs, and the more such aware nodes there are, the better the sensor network performs. According to simulation results, S-LEACH significantly extends the lifetime of WSNs [23]. Another LEACH variant introduces a new node concept that reduces dissipated energy and distributes it uniformly by dividing routing work and data aggregation, and supports CHs in multi-hop routing; energy-efficient multi-hop paths to the BS extend the lifetime of the entire network and minimize energy dissipation [28].
As an energy-saving method, H-LEACH was developed with algorithms that minimize the distance between data [21]; it adds to LEACH a Master Cluster Head (MCH) that passes the data to the BS. I-LEACH was also proposed to save energy while communicating within the network. To increase network stability, Optimized LEACH (O-LEACH) optimizes CH selection [29]: the CH is selected based only on the remaining dynamic energy. If the energy of a node is greater than 10% of the minimum residual energy, the node is selected as CH; otherwise, the existing CH is maintained. To deal with networks whose diameter grows beyond certain limits, D-LEACH randomly places pairs of nodes with a high probability of being located close to each other, called twin nodes, and keeps one node asleep until the energy of the other depletes. D-LEACH therefore yields a uniform distribution of CHs so that nodes do not run out of energy when longer-distance transmission takes place [21]. The D-LEACH method ensures that the selected cluster heads are distributed across the network, so it is unlikely that all cluster heads will be concentrated in one part of it [25].

Clustering in LEACH
Clustering means defining data groups by considering the characteristics of the given data and finding a representative point for each group. In LEACH, the sensors detect events and then send the collected data to a faraway BS. Because the cost of information transmission is higher than that of computation, nodes are clustered into groups to save energy. Data are communicated only to the CH, and the CH routes the aggregated data to the BS. The CH, which is periodically elected using a clustering algorithm, aggregates the data collected from cluster members and sends them to the BS, where they can be used by the end user. To transmit data over long distances in this way, only a few nodes are required, and most nodes need to complete only short-distance transmissions, so more energy is saved and the WSN lifetime is extended. The main idea of a hierarchical routing protocol is to divide the entire network into two or more hierarchical levels, each of which performs different tasks [20]. In creating these hierarchical levels, clustering plays a critical role in LEACH.
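The cluster-joining step, in which member nodes pick the CH with the strongest advertisement, can be sketched by approximating received signal strength with Euclidean distance (the dictionary layout below is illustrative, not from the paper):

```python
import math

def assign_to_clusters(nodes, heads):
    """Each member node joins the CH whose advertisement is strongest,
    approximated here by the shortest Euclidean distance (signal strength
    falls off with distance). `nodes` and `heads` are {id: (x, y)} dicts."""
    clusters = {h: [] for h in heads}
    for nid, pos in nodes.items():
        if nid in heads:
            continue  # cluster heads do not join other clusters
        nearest = min(heads, key=lambda h: math.dist(pos, heads[h]))
        clusters[nearest].append(nid)
    return clusters
```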

Interval Type-2 Possibilistic Fuzzy C-Means (IT2-PFCM)
It is known that the synthesis of Fuzzy C-Means (FCM) and type-2 fuzzy sets (T2FS) gives more room to handle the uncertainty of belongingness caused by noisy environments. In addition, the Possibilistic C-Means (PCM) clustering algorithm was presented, which allocates typicality using the absolute distance between a pattern and a center value. However, the PCM algorithm has the problem that clustering results respond sensitively to the initial parameter values. To address this sensitivity, the PFCM algorithm, combining FCM and PCM by a weighted sum, was proposed. However, the PFCM method still has the problem of determining the value of the fuzzifier constant. Controlling the uncertainty of the fuzzifier value is an important issue because it plays a decisive role in obtaining the membership function. To control this uncertainty, hybrid algorithms have been suggested, including the general type-2 FCM [30][31][32], Interval Type-2 FCM (IT2-FCM) [33], kernelized IT2-FCM [34], interval type-2 fuzzy c-regression clustering [35], interval type-2 possibilistic c-means clustering [36,37], interval type-2 relative entropy FCM [38], particle-swarm-optimization-based IT2-FCM [39], and interval-valued fuzzy-set-based collaborative fuzzy clustering [40], among others. These T2FS-based algorithms have been successfully applied to areas such as image processing and time-series prediction. Interval Type-2 FCM (IT2-FCM): in fuzzy clustering algorithms such as FCM, the fuzzification coefficient m plays an important role in determining the uncertainty of the partition matrix, but its value is usually hard to decide upon in advance. IT2-FCM treats the fuzzification coefficient as an interval (m1, m2) and solves two optimization problems [41].
Fuzzifier value: IT2-PFCM is expressed as the weighted sum of the FCM and PCM methods. Therefore, clustering proceeds in the direction of minimizing the PFCM objective function, as in Equation (2).
In Equation (3), u_ik represents the membership value with which input pattern k belongs to cluster i. x_k is the k-th input pattern, and v_i is the center value of the i-th cluster. m is a constant representing the degree of fuzziness and satisfies m ∈ (1, ∞). t_ik represents the typicality with which input pattern k belongs to cluster i, which is the feature of the PFCM method using absolute distance. γ_i is a scale parameter defining the point where the typicality of the i-th cluster is 0.5, and its value is an arbitrary number.
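For reference, the general PFCM objective of Pal et al., which Equation (2) in the text presumably instantiates, can be written as below; note that the balancing weights a, b and the typicality exponent η come from that general formulation, not from this paper, which uses a single fuzzifier m:

```latex
J_{\mathrm{PFCM}} \;=\; \sum_{i=1}^{c}\sum_{k=1}^{n}
  \bigl(a\,u_{ik}^{m} + b\,t_{ik}^{\eta}\bigr)\,\lVert x_k - v_i \rVert^{2}
  \;+\; \sum_{i=1}^{c} \gamma_i \sum_{k=1}^{n} \bigl(1 - t_{ik}\bigr)^{\eta},
\qquad \text{subject to}\ \sum_{i=1}^{c} u_{ik} = 1 \ \ \forall k .
```

The first sum is the FCM-style weighted distance term; the second, governed by γ_i, penalizes low typicality and is what makes the method possibilistic.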
For the PFCM clustering method, the objective function in Equation (5) should be minimized with respect to the membership function u_ik, with the membership obtained by Equation (2). To derive m, the lowest and highest membership functions must be created from the primary membership function. The highest and lowest membership functions of PFCM according to m are as follows.
As shown in Equation (6), with the lowest and highest membership values, where m1 and m2 parameterize the objective function, the value γ_i also changes according to the lowest and highest membership functions, and using γ_i the lowest and highest typicalities are obtained. For updating the center value, as shown in Equation (7), a type-reduction process converts the type-2 fuzzy set to type-1 using the K-means algorithm, and the center value of each cluster is updated.
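The interval bounds driven by the two fuzzifiers can be sketched as follows. This is a common simplification of IT2-FCM, in which the type-1 FCM membership is evaluated at both m1 and m2 and the elementwise envelope is taken; the paper's exact Equations (5)-(6) are not reproduced here:

```python
import numpy as np

def fcm_membership(X, V, m):
    """Type-1 FCM membership matrix u[i, k] for centers V (c, d) and data X (n, d)."""
    d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)   # (c, n) distances
    d = np.fmax(d, 1e-12)                                       # avoid divide-by-zero
    ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1))  # (d_ik / d_jk)^(2/(m-1))
    return 1.0 / ratio.sum(axis=1)                              # sum over j -> u_ik

def it2_membership_bounds(X, V, m1, m2):
    """Interval Type-2 bounds: evaluate FCM with both fuzzifiers (m1, m2)
    and take the elementwise envelope as lower/upper memberships."""
    u1, u2 = fcm_membership(X, V, m1), fcm_membership(X, V, m2)
    return np.minimum(u1, u2), np.maximum(u1, u2)   # (lower, upper)
```

Type reduction then collapses each interval [lower, upper] back to a single membership before the centers are updated, as the text describes for Equation (7).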

Reinforcement Learning (RL)
RL is a machine learning method in which an agent tries to find policies from interaction with the environment; the agent must learn behavior through trial-and-error interactions with a dynamic environment. It has been applied successfully in many agent systems [42][43][44]. The aim of reinforcement learning is to find useful behavior by evaluating the reward. Q-learning is an RL method well suited to learning from interaction, where the learning is performed through a reward mechanism; this method has been applied to WSN optimization problems [45]. Research into applying intelligent and machine learning methods to power management was considered in Reference [46], with Reference [47] among those specifically targeting dynamic power management for energy-harvesting embedded systems: a learner interacted with the environment, autonomously determined the required actions [47], and was rewarded by the reward function in response to different environmental states. An RL-based Sleep Scheduling for Coverage (RLSSC) algorithm provides sustainable time-slotted operation in rechargeable sensor networks [48]. RL can be applied to both single- and multi-agent systems [49]. In previous studies, the efficiency of RL was improved by developing RL applications in WSNs [50], and an independent RL approach for resource management in WSNs was proposed [51]. Selecting WSN tasks at random in this framework can provide better performance in terms of cumulative reward over time and residual network energy [52].

Q-Learning
The most well-known reinforcement learning technique is Q-learning, a representative RL algorithm proposed by Watkins [53]. As shown in Figure 5, data are collected directly by the agent acting in a given environment: the agent takes an action (a) in the current state (s) and receives a reward (r) for it.

In Equation (9), after Q(s_t, a_t) is initialized to an arbitrary value, it is repeatedly updated as learning progresses. R_{t+1} is the reward obtained in moving from the current state (s_t) to the next state (s_{t+1}). γ is a discount factor, applied to the largest action-value obtainable in the next state (s_{t+1}) [54].
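The simplified update of Equation (9), which carries no learning rate and overwrites the entry directly, can be sketched with a dictionary-based Q-table:

```python
from collections import defaultdict

GAMMA = 0.9  # discount factor (illustrative value)

# Q-table: Q[state][action], initialized to an arbitrary value (here 0.0)
Q = defaultdict(lambda: defaultdict(float))

def q_update(s, a, reward, s_next, actions):
    """Equation (9): Q(s_t, a_t) = R_{t+1} + gamma * max_a' Q(s_{t+1}, a')."""
    best_next = max(Q[s_next][a2] for a2 in actions) if actions else 0.0
    Q[s][a] = reward + GAMMA * best_next
    return Q[s][a]
```

A standard Q-learning formulation would blend this target with the old value through a learning rate α; the paper's Equation (9) as described assigns the target directly.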


Proposed Modification in LEACH
Equation (9) gives the update rule:

Q(s_t, a_t) = R_{t+1} + γ max_{a′} Q(s_{t+1}, a′). (9)

Figure 6 shows the workflow of this study. First, we define the deployment of sensor nodes in the matrix and then choose the model according to sensor movement: if sensor movement is not considered, as in traditional LEACH modeling, the static topology model is chosen; otherwise, the dynamic topology model is applied to reflect the movement of disposable sensors. In the chosen model, a LEACH protocol algorithm is applied; each LEACH protocol selects the Cluster Head (CH) in its own way, and the calculation is repeated until the system terminates as the nodes die.
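The workflow just described can be sketched as a main simulation loop. The `protocol` interface and the `move_nodes` hook below are illustrative names, not defined in the paper:

```python
def run_simulation(nodes, protocol, move_nodes=None, max_rounds=1000):
    """Schematic of the Figure 6 workflow: deploy nodes, choose static or
    dynamic topology, then run the chosen LEACH variant round by round
    until every node has died. `protocol` and `move_nodes` are
    illustrative interfaces, not from the paper."""
    for r in range(max_rounds):
        if move_nodes is not None:
            move_nodes(nodes)                           # dynamic topology model
        heads = protocol.select_cluster_heads(nodes, r) # setup phase
        protocol.transmit_round(nodes, heads)           # steady-state phase
        if all(n.energy <= 0 for n in nodes):           # network is dead
            return r
    return max_rounds
```

Passing `move_nodes=None` reproduces the static topology model; supplying a displacement function switches the same loop to the dynamic model.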


Dynamic Topology Modeling
The simulation of the dynamic topology model is characterized by node locations that change on the field during operation. This reflects the phenomenon that micro sensors exposed to the real environment move. As the round r in Equation (10) increases, every node at position (X_r, Y_r) updates its location to (X_{r+1}, Y_{r+1}), where R_random is a random number between −m_random and +m_random. In the proposed method, one step is inserted before the network moves to the steady-state phase. After cluster formation is complete, the cluster topology is improved using one of two methods (Q-LEACH: reinforcement learning; F-LEACH: IT2-PFCM). The aim of the improvement is to maintain network connectivity with minimal energy loss. All sensors set their transmit power to the minimum level while keeping the received signal power at the CH above the E_b/I_o threshold, so that each sensor transmits with an acceptable error rate. To improve the cluster topology, the simulation is set up as in Equation (11): E_t is the total energy consumed in the network, E_tx,i is the energy of sensor i transmitting data to the CH, and k is the energy of the CH receiving data from the sensors. In Equation (12), the Signal-to-Interference-plus-Noise Ratio (SINR) is the received signal quality, where P is the transmit power with which the sensor sends synchronization data to the CH, I is the interference power, and N is the noise density. As the distance between nodes decreases, the I and P values increase; as the distance increases, they decrease. The energy a sensor consumes to send data to the cluster head depends on the distance between the sensor and the CH: if the distance is less than d_0, the energy follows the free-space model; if it is greater than d_0, the energy depends on multipath fading.
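The per-round position update of Equation (10) can be sketched as a bounded random walk. The field bounds and the clipping to them are an assumption for illustration; the paper specifies only the uniform random offset:

```python
import random

def move_nodes(positions, m_random, field=(0.0, 100.0)):
    """Dynamic topology, Eq. (10): each round, every node's position
    (X_r, Y_r) is shifted on each axis by R_random drawn uniformly from
    [-m_random, +m_random], then clipped to the field bounds (assumed)."""
    lo, hi = field
    return [
        (min(hi, max(lo, x + random.uniform(-m_random, m_random))),
         min(hi, max(lo, y + random.uniform(-m_random, m_random))))
        for (x, y) in positions
    ]
```

Setting m_random = 0 recovers the static topology model, since every node keeps its position.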
In addition to the transmission energy, the sensor consumes energy (E_elec) to process data inside the hardware, as in Equation (13).
The CH receives L-bit data from each sensor with energy E_RX, processes the data inside the hardware with energy E_elec, aggregates the data with energy E_DA, and transmits the data to the sink with energy E_MP, as in Equation (14).
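The free-space/multipath energy model described around Equations (11)–(14) is the standard first-order radio model; a sketch follows. The constants are illustrative values commonly used in LEACH simulations, not the exact ones from Table 2.

```python
# First-order radio model: free-space below the crossover distance d0,
# multipath above it. Constants are illustrative, not from Table 2.
E_ELEC = 50e-9       # J/bit, electronics energy
EPS_FS = 10e-12      # J/bit/m^2, free-space amplifier
EPS_MP = 0.0013e-12  # J/bit/m^4, multipath amplifier
E_DA   = 5e-9        # J/bit, data aggregation at the CH

D0 = (EPS_FS / EPS_MP) ** 0.5  # crossover distance d0

def e_tx(L, d):
    """Energy for a sensor to transmit L bits over distance d."""
    if d < D0:
        return L * E_ELEC + L * EPS_FS * d ** 2  # free-space, d^2 loss
    return L * E_ELEC + L * EPS_MP * d ** 4      # multipath, d^4 loss

def e_rx(L):
    """Energy for a CH to receive and aggregate L bits from one sensor."""
    return L * E_ELEC + L * E_DA
```

The two branches meet at d0 (EPS_FS · d0² = EPS_MP · d0⁴), so transmit energy is continuous in distance.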

F-LEACH
A disadvantage of the existing D-LEACH is that its distribution method is overly simple (1/N). The F-LEACH proposed in this paper therefore provides more effective distributed clustering than the D-LEACH protocol. To do this, the distribution is performed by introducing Interval Type-2 FCM (IT2-FCM). The entire sensor field is divided into sub-areas by the fuzzifier centroid. Fuzzifiers (m_1i, m_2i) are calculated for each data point as functions of the upper and lower memberships. Because the fuzzifier value affects the cluster center position, clustering proceeds by automatically calculating the fuzzifier value at each cluster center position, as shown in the algorithm of the F-LEACH protocol (Algorithm 1). Q-LEACH, in contrast, is based on a Q-table: SINR is used as the reward, and actions are selected using the ε-greedy method.
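The interval-membership idea behind IT2-FCM can be illustrated as follows. This is a minimal sketch, not the authors' IT2 PFCM implementation: it only shows how two fuzzifiers (m1, m2) produce upper and lower membership bounds for one point against a set of cluster centers, given the point's distances to them.

```python
def fcm_membership(d, centers_d, m):
    """Standard FCM membership of a point to one cluster: d is its distance
    to that center, centers_d its distances to all centers, m the fuzzifier."""
    return 1.0 / sum((d / dk) ** (2.0 / (m - 1.0)) for dk in centers_d)

def it2_membership(d, centers_d, m1, m2):
    """Interval Type-2 membership: the two fuzzifiers give an upper and a
    lower membership bound (the footprint of uncertainty)."""
    u1 = fcm_membership(d, centers_d, m1)
    u2 = fcm_membership(d, centers_d, m2)
    return max(u1, u2), min(u1, u2)
```

The gap between the upper and lower bounds is what the full IT2-FCM algorithm uses when updating cluster centers.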

Reward
The proposed algorithm sets the SINR as the state and the transmit power as the action. To evaluate SINR and transmission power efficiently, two independent cases are distinguished: SINR less than the threshold (T) and SINR greater than the threshold (T). The reward function r_t+1 is then calculated, where ω is a weighting factor, a_max is the maximum transmission power of node i, and a_t is the power of node i at the present time. Depending on T, the reward value can be positive or negative, as in Equation (15).
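A sketch of such a reward follows. The exact functional form of Equation (15) is not recoverable from the text, so the formula below is an assumption: it rewards power savings when the SINR threshold T is met and penalizes power spent when it is not, using the symbols ω, a_t, and a_max defined above.

```python
def reward(sinr, T, a_t, a_max, omega=0.5):
    """Hedged sketch of Equation (15): positive when SINR meets the
    threshold T (favoring low transmit power), negative otherwise.
    The paper's exact form may differ."""
    if sinr >= T:
        return omega * (1.0 - a_t / a_max)  # reward unused power headroom
    return -omega * (a_t / a_max)           # penalize power spent below T
```

As in the text, the sign of the reward flips around the threshold T.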

Action Selection
The operation performed in each state is selected using the ε-greedy method in Equation (16).
ε is the exploration factor in [0, 1]. With probability ε, the sensor starts exploration, in which an action (a) is chosen at random to discover how the environment reacts to the other available actions. Otherwise, the system enters exploitation and selects the action with the maximum Q-value.
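The ε-greedy rule of Equation (16) can be sketched directly; `q_values` is assumed to be the list of Q-values for the available actions in the current state.

```python
import random

def epsilon_greedy(q_values, epsilon):
    """Equation (16): with probability epsilon explore a random action,
    otherwise exploit the action with the maximum Q-value."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))  # exploration
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploitation
```

With ε = 0 the rule is purely greedy; with ε = 1 it is purely random.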
One of the reinforcement learning models, Temporal Difference (TD), is used to estimate the Q-value in Equation (9). The Q-value is updated as in Equation (17):

Q_t+1(s_t, a_t) = Q_t(s_t, a_t) + β(r_t+1 + γ max_a Q_t(s_t+1, a) − Q_t(s_t, a_t)). (17)

β is a learning factor that determines how fast learning occurs (usually set to a value between 0 and 1), and γ is a discount factor that controls the importance of future value. The Q-value function is calculated using the existing estimate rather than the final return: the TD error is the reward r_t+1 plus the discounted maximum action value in state s_t+1, minus the current estimate Q_t(s_t, a_t); this error, multiplied by the learning factor, is added to the current estimate. The closer the discount factor is to 1, the higher the importance of future value; the closer it is to 0, the more immediate rewards dominate. Q-LEACH is given as Algorithm 2. Declaring the target matrix from the original topology proceeds as shown in Figure 7. Before applying the Q-table, the total field is divided into several unit dimensions (the same method as D-LEACH), and a Q-table is calculated for each unit dimension in the space of the target matrix, defined in this study as 15 × 15. The Q-probability, drawn in four directions at the node position on the target matrix, gives the direction of the node (toward the direction with the highest probability). In the developed Q-table, the state holds the Q-probability of the node approaching the CH in each of four directions (up, down, left, right), as holes and other nodes in the dimension can lower the probability. A one-step action occurs according to the state direction with the highest probability. Eventually, the system earns a reward of 1 when the node arrives at the CH.
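The TD update of Equation (17) can be sketched as follows, assuming the Q-table is stored as a nested state × action list with four actions (up, down, left, right); the β and γ values are illustrative defaults, not the paper's tuned parameters.

```python
def q_update(Q, s, a, r, s_next, beta=0.1, gamma=0.9):
    """Equation (17): Q(s,a) += beta * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    Q is a nested list indexed as Q[state][action]."""
    td_target = r + gamma * max(Q[s_next])   # reward plus discounted best future value
    Q[s][a] += beta * (td_target - Q[s][a])  # move estimate toward the TD target
```

Repeated over the rounds, this drives each cell's direction probabilities toward the path that reaches the CH.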
The detailed parameters were chosen according to previous studies that analyzed WSN parameters in order to select the optimal values for the experiment [52][53][54][55]. The simulation parameters of the proposed LEACH protocols are shown in Table 2.

Results
As shown in Figure 8b, the node locations in the static topology model are immovable, while in the dynamic topology model the nodes were observed to move random distances in random directions constantly, as shown in Figure 8a. The original node locations are marked in red, and, over three rounds of operation, the nodes move in the directions marked by green arrows. As shown in Figure 9, we calculate energy loss, number of alive nodes, and data throughput as the evaluation indices of the LEACH variants. D-LEACH, Q-LEACH, and F-LEACH, each of which sets the CH in its own way, perform better than LEACH, which is characterized by random CH selection. In the static topology model, the performance of D-LEACH and F-LEACH was similar, but with moving nodes in the dynamic topology model, the performance of F-LEACH is significantly lower than that of D-LEACH. In particular, the significant reduction in node lifetime in F-LEACH at the beginning increases energy consumption and decreases throughput significantly.
Comparing Q-LEACH and F-LEACH with the existing LEACH variants (LEACH, D-LEACH) on the aspects of energy loss, number of alive nodes, and throughput, Q-LEACH showed the best performance both when nodes were immobile and when they were mobile. Q-LEACH on the static node model shows consistently better performance; on the other hand, at the beginning of operation, Q-LEACH and D-LEACH in the dynamic node model show a similar tendency, but over time the performance of Q-LEACH is greatly improved by the reinforcement learning effect. Figure 10 compares the various LEACH protocols, marking the node lifetime of each system as a percentage of the original LEACH [56,57].

Discussion
Existing research presupposes a static topology. However, in the real-world environments where LEACH is applied, node positions change dynamically over time. Thus, in this study, the efficiency of LEACH within a dynamic topology was increased through Q-LEACH.
Reinforcement learning is an alternative solution for optimizing routing: it can achieve low latency, high throughput, and adaptive routing [60]. Various types of reinforcement learning methods have been applied to routing [61]. Among these RL methods, a Q-table was applied in this paper to perform Q-learning in an OpenAI environment. Existing research required extra effort to build its own environment, but we contribute to performance improvement by configuring an optimized situation with an open-source environment based on OpenAI's gym. A Q-table is applicable in this study because the table holds only 4,000,000 entries (1000 (x grid) × 1000 (y grid) × 4 (up, down, left, right)), whereas in most applications, such as invader games, the overall calculation is determined by the number of pixels. Related studies comparing RL algorithms show that their performance is environment dependent [58]. The environment of this study has the same structure as the OpenAI gym environment [59], which allows developers to implement the environment easily and quickly by sharing it. As can be seen in the pseudocode of Q-LEACH, we tried to improve performance by customizing the logic of the OpenAI environment to our situation. In addition, our experimental results show that the Q-table performed better than other RL methods: a Q-network suits situations with a vast number of cases, while a Q-table is more suitable for this study because it gives better results in finite situations. As mentioned above, performance improvement is observed in Q-LEACH compared to the other LEACH methods.

To optimize routing in each network, this study needs to be developed further by detailing the specific tasks defined for existing networking problems. In addition, recently developed reinforcement learning methods (Deep Deterministic Policy Gradient, Advantage Actor-Critic, etc.) should be introduced into LEACH so that it can be applied in real life.
In the static topology model, there is a need to consider LEACH with multi-hop, whereas, in the dynamic topology model, a multi-hop system is not efficient in terms of energy consumption because the nodes composing the topology are constantly moving. It also seems necessary to evaluate LEACH through more well-defined dynamic modeling by applying the various variables that occur in actual situations.

Conclusions
Extending network lifetime is still an important issue in WSNs for Autonomous Things environments. Our study aimed to extend the network lifetime even when the node topology changes. To simulate and evaluate this, the basic LEACH protocol and implementations of the proposed algorithms were used in the static and dynamic topology models. The rationality of the dynamic node modeling is shown by the similar tendency of energy consumption in both models. The difference between the static topology model and the dynamic topology model proves the limitations of LEACH on a fixed topology, as shown in Figure 9a against Figure 9d.
In detail, in the dynamic node model the initial node lifetime decreased significantly overall as the distances between nodes continuously increased, and the energy consumption, which is calculated from the distance between a node and its CH, also increased. In other words, selecting cluster heads efficiently is more important in the dynamic node model than in the static model. As the nodes move continuously, the locations of all nodes tend to diverge. D-LEACH divides the given field into constant divisions, so node locations are not significantly affected by this divergence; therefore, D-LEACH shows similar results in both models, while F-LEACH falls behind in the dynamic model. In the diverging case with F-LEACH, an uncertainty problem emerges because the CH is defined from the fuzzifier values and the elimination of outliers. This is seen as a result of the uncertainty of the fuzzy constant, which makes it difficult to select the CH because the fuzzy constant plays a decisive role in the membership function. While F-LEACH calculates an approach to the CH, Q-LEACH reflects the random location of a node by calculating the probability of all directions through SINR rewards. As a result, Q-LEACH eventually achieved the best throughput with the least energy.