Green MAC Protocol: A Q-Learning-Based Mechanism to Enhance Channel Reliability for WLAN Energy Savings

We have seen promising adoption of wireless local area networks (WLANs) in everyday communication devices such as handheld smartphones, tablets, and laptops. Energy preservation plays a vital role in WLAN communication networks, and the efficient use of energy remains one of the most substantial challenges for WLAN devices. Industrial and institutional researchers have proposed several approaches to save energy and reduce the overall power consumption of WLAN devices, focusing on static or adaptive energy-saving methods. However, most of these approaches save energy at the cost of throughput degradation, due either to increased sleep time or to a reduced number of transmissions. In this paper, we recognize the potential of reinforcement learning (RL) techniques, such as the Q-learning (QL) model, to enhance the channel reliability of WLANs for energy saving. QL is an RL technique that exploits the accumulated reward of the actions performed in a state-action model. We propose a QL-based energy-saving MAC protocol, named the greenMAC protocol. The greenMAC protocol reduces energy consumption by using the accumulated reward value to optimize channel reliability, which reduces the channel collision probability of the network. We use the degree of channel congestion, measured as the observed collision probability, as the reward function of our QL-based greenMAC protocol. Comparative results show that the greenMAC protocol achieves higher system throughput along with additional energy savings compared to existing energy-saving mechanisms in WLANs.


Introduction
Recently, energy harvesting and energy saving have become vital subjects of interest for researchers working on wireless communication technologies. A wireless local area network (WLAN) device (also referred to as a WLAN station, or STA) is expected to save energy while performing its most important tasks, such as medium access control (MAC) layer channel access and resource allocation. Such techniques are especially important when STAs are low-power and energy-constrained. Energy-saving techniques extend the lifetime of an STA and make it self-sustainable. Moreover, these techniques help lower carbon dioxide emissions in the fight against climate change, and can therefore be classified as green technology [1].
The WLAN radio interface is the main source of energy consumption in an STA; for example, a Wi-Fi radio (such as IEEE 802.11n [2]) consumes over 70% of the total energy of an STA in the screen-off state [3]. With the power saving mode (PSM) implemented by WLAN technologies, this share drops to 44.5% in the screen-off state and 50% in the screen-on state [4]. The radio interface of a WLAN STA is always in one of the following states: transmit (TX), receive (RX), idle (IDL), or sleep (SLP). Most WLAN devices consume the most energy in their active states (TX and RX) and the least in the SLP and IDL states [2]. However, in the IDL state, an STA must sense the channel continuously for the availability of resources, so a considerable amount of energy is consumed even while idle. This is the case in the carrier sense multiple access with collision avoidance (CSMA/CA) mechanism on which the distributed coordination function (DCF) of IEEE 802.11 is based: each STA in the network must continually sense the channel to contend for it. PSM mechanisms [5][6][7] allow an STA to enter sleep mode by powering off its wireless radio interface when it is not engaged in transmission.
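To make the state-dependent cost concrete, the sketch below accounts for radio energy as power multiplied by time spent in each state. The power values are hypothetical placeholders chosen only to reflect the qualitative point above (idle listening costs far more than sleeping); they are not taken from [2], [3], or [4].

```python
# Hypothetical per-state radio power draw in watts. These are illustrative
# placeholders, NOT measurements from the studies cited above.
POWER_W = {"TX": 2.0, "RX": 1.0, "IDL": 1.0, "SLP": 0.1}


def radio_energy(seconds_in_state, power_w=POWER_W):
    """Energy in joules: sum over radio states of power (W) x time (s)."""
    return sum(power_w[state] * t for state, t in seconds_in_state.items())


# Same traffic (1 s TX, 1 s RX) in a 10 s window; only the remaining 8 s differ.
idle_listening = radio_energy({"TX": 1, "RX": 1, "IDL": 8, "SLP": 0})
sleeping = radio_energy({"TX": 1, "RX": 1, "IDL": 0, "SLP": 8})
print(idle_listening, sleeping)
```

The only difference between the two cases is whether the radio idles or sleeps while there is nothing to send, which is exactly the gap a PSM mechanism tries to exploit.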
The MAC layer decides how STAs share the transmission medium in a WLAN and controls the activity of their radio interfaces. It therefore plays a significant role in achieving high throughput, low delay, and energy efficiency. Current WLAN MAC layers are classified as either contention-free (CF) or contention-based (CB) [8]. CF schemes use predefined transmission slots so that STAs can transmit without contention, whereas in CB schemes an STA uses CSMA/CA to contend for the channel with the other STAs in the network. CB schemes are more adaptable and manage channel resources efficiently in a distributed way for sparse networks (low STA density). In highly dense networks, however, the increased channel contention leads to a high chance of collisions [9]. In WLANs, a collision is assumed when no acknowledgment (ACK) is received in response to a data packet. After every collision, the STA must sense the channel again and retransmit, unnecessarily consuming channel bandwidth and the energy of both sender and receiver. An efficient and intelligent MAC layer channel access mechanism can therefore limit transmission collisions, reducing both channel access delay and power consumption.
Motivated by Q-learning (QL), one of the prevailing reinforcement learning (RL) models [10], we propose a channel observation-based, energy-efficient PSM mechanism for WLANs, named the greenMAC protocol. QL is a behaviorist learning technique that learns from its environment through iterative interactions and exploits the accumulated experience. Figure 1 shows a typical QL-based intelligent STA that interacts with the wireless medium to learn its optimal actions. The observed channel collision probability reflects the density of the WLAN: the more contenders there are, the higher the collision probability. The key contribution of the greenMAC protocol is that the decision to enter SLP mode is based on this channel collision probability.
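Why collision probability tracks density can be illustrated with a common textbook simplification, the same form as the conditional collision probability in Bianchi's analytical model of the 802.11 DCF, here with the per-slot transmission probability τ simply held fixed: if each of the other n − 1 contenders transmits independently with probability τ, a tagged STA's transmission collides with probability 1 − (1 − τ)^(n−1). This is an illustration of the trend, not the greenMAC estimator, and the value of τ below is hypothetical.

```python
def collision_probability(n_stations, tau):
    """Probability that a tagged STA's transmission collides in a slot.

    Simplifying assumption: each of the other n_stations - 1 contenders
    transmits independently in any slot with the same fixed probability tau.
    """
    if n_stations < 1 or not 0.0 <= tau <= 1.0:
        raise ValueError("need n_stations >= 1 and 0 <= tau <= 1")
    return 1.0 - (1.0 - tau) ** (n_stations - 1)


for n in (1, 5, 10, 20, 40):
    print(n, round(collision_probability(n, tau=0.05), 3))
```

The probability grows monotonically with the number of contenders, which is why an STA can use an observed collision (or busy-slot) probability as a proxy for WLAN density.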
The rest of the paper is organized as follows. Section 2 discusses related research work. In Section 3, we present our proposed QL-based PSM mechanism and a brief description of the QL model and its elements. Section 4 includes a performance evaluation of the proposed greenMAC protocol. Finally, we present our conclusion and future work in Section 5.

Related Research
The PSM mechanism has been enhanced for WLAN energy saving by distinguishing delay-sensitive traffic, delay-tolerant traffic [11], and priority-based traffic [12]. Many researchers have investigated active/SLP mode scheduling that meets low-delay requirements [13][14][15][16], and several techniques for improving the power efficiency of WLANs have been suggested. Vukadinovic et al. [17] proposed a traffic alert approach that updates the standard PSM of ad hoc WLANs. In standard PSM, when a data frame is forwarded over several hops, only the next-hop STA is informed of the pending frame; STAs on the other hops remain in doze mode, which causes a long end-to-end (E2E) delay. Their scheme requires each STA along the routing path to forward a traffic alert to its downstream neighbor, so the data frames can traverse several hops within a single beacon interval, significantly reducing the E2E delay of multi-hop transfers. Radwan et al. [18] proposed a solution in which STAs cooperate to exploit the high channel capacity of short-range (SR) links in order to minimize transmission time and conserve energy. Neighboring STAs form a cluster and choose a cluster head to relay data traffic. Instead of transmitting data directly to the AP over the long-range (LR) communication protocol, the STAs send their data to the cluster head over SR links, and the cluster head relays the traffic to the AP over the LR link on their behalf. Tang et al. [19] provided a power-saving protocol that reduces the power consumption of APs. This strategy lets an AP enter the SLP state when there has been no traffic for a while, and additional wake-up transceivers relay wake-up signals to the AP. It reduces the AP's power consumption by reducing the time spent in the IDL state.
However, their strategy requires installing new radios on the STAs, and additional mechanisms are often required to manage the operation of these radios. Lin et al. [20] point out that WLAN STAs also waste power in the IDL state, because they constantly sense the channel and overhear the ongoing transmissions of other STAs. Additionally, collisions among STAs that wake up to retrieve data at the same time may waste power. The authors proposed a DeepSleep scheme for energy-harvesting systems to improve the WLAN PSM, in which STAs that are short of power enter a long-term IDL state and access the channel only with a higher priority. In [21], He et al. present a TDMA-based MAC protocol to decrease contention among WLAN STAs. An AP divides a BI into many equal time slices and allocates the slices to individual STAs or groups of STAs, so each STA wakes up in its allocated slice to retrieve data instead of contending for channel access. By eliminating channel contention, this approach effectively decreases the energy consumption of PSM devices. However, if a PSM device does not wake up in its time slot, the allocated channel time is wasted. Moreover, because all time slots have the same length regardless of frame length or traffic load, the allocated slots may be used inefficiently when frames are short or traffic is light. Eun-Sun et al. [22] proposed an improved PSM (IPSM) that dynamically adjusts the size of the announcement traffic indication message (ATIM) window. During the predefined ATIM window, if a certain number of idle slots are sensed, STAs may terminate the ATIM window early and start transmitting data frames; conversely, if the current ATIM window is too short, STAs expand it by a given step. Although this protocol can efficiently enhance WLAN PSM performance, it does not address the hidden terminal problem. Lei et al. [23] suggest a reservation scheme for back-off counters (BC) paired with a neighbor polling solution. The authors propose a BC reservation system that lowers the risk of STAs randomly selecting the same BC: a device that has successfully transmitted a control frame (an ATIM frame) reserves a BC through that frame and uses the reserved BC for its subsequent data transmissions. Furthermore, the authors present a neighbor polling scheme to mitigate the hidden STA problem. Because wireless transmission is broadcast in nature, STAs within the transmission range of the transmitter can overhear an ongoing transmission that collides with a transmission from a hidden STA, and one of these neighbors can then poll the transmitter again. However, a BC reservation scheme behaves similarly to a fixed channel access mechanism such as TDMA.
From the above discussion of related research, we see that PSM-based MAC protocols can provide high performance with low energy consumption. However, this performance is strongly affected by the number of STAs in the WLAN: more STAs increase contention among them, which in turn increases their channel sensing time.

Existing PSM
In the PSM mechanism, channel access and transmission time are divided into beacon intervals (BIs). At the start of each BI, the access point (AP) broadcasts a beacon carrying a traffic indication message (TIM) to inform PSM STAs whether the AP has buffered data for them. The STAs for which the AP has buffered data remain awake during the BI and request these data packets from the AP. Likewise, the STAs that have queued data packets for the AP remain awake and send them to the AP during the BI [2]. Figure 2a shows the operation of a PSM mechanism in a WLAN, where the AP initiates a BI by transmitting a TIM beacon. Once the TIM is successfully received, the STAs around the AP contend for the channel to transmit a PS-poll message after observing an idle DCF interframe space (DIFS) period. During contention, the standard binary exponential back-off (BEB) mechanism is used. An STA considers its transmission successful if it receives an acknowledgment (ACK) message from the AP. After a successful PS-poll transmission, the STA stays awake to receive data on the channel, as shown in the figure. Once the data are successfully received, the STA switches to SLP mode to save energy, while the AP may remain busy with other tasks, such as transmitting to other STAs in the WLAN.
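The BEB contention step that precedes each PS-poll can be sketched as contention-window bookkeeping for a single STA. The window bounds below are the typical IEEE 802.11 OFDM PHY defaults and are an assumption here; the class and method names are hypothetical, and the sketch omits DIFS timing and counter freezing.

```python
import random

# Typical IEEE 802.11 OFDM PHY contention-window bounds (aCWmin = 15,
# aCWmax = 1023). Assumed here; actual values depend on PHY and access category.
CW_MIN, CW_MAX = 15, 1023


class BinaryExponentialBackoff:
    """Contention-window bookkeeping for one STA under standard BEB."""

    def __init__(self, rng=None):
        self.cw = CW_MIN
        self.rng = rng if rng is not None else random.Random()

    def draw_backoff(self):
        # Back-off counter drawn uniformly from the integers 0..CW (in slots).
        return self.rng.randint(0, self.cw)

    def on_ack_timeout(self):
        # Missing ACK (e.g. for a PS-poll) is treated as a collision:
        # roughly double the window, capped at CW_MAX.
        self.cw = min(2 * (self.cw + 1) - 1, CW_MAX)

    def on_ack_received(self):
        # A successful exchange resets the window to its minimum.
        self.cw = CW_MIN


beb = BinaryExponentialBackoff(random.Random(1))
for _ in range(3):
    beb.on_ack_timeout()
print(beb.cw)  # 127  (15 -> 31 -> 63 -> 127)
beb.on_ack_received()
print(beb.cw, beb.draw_backoff())
```

Each missed ACK roughly doubles the expected back-off, so in a dense network an STA spends longer sensing the channel, which is the energy cost the greenMAC protocol targets.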

Q Learning Model: Environment and Elements
A QL model has three primary elements related to its environment: a strategy, a reward function, and a Q-value function.

Strategy/Policy
A strategy (also known as a policy) defines the way the learning agent (an STA in our case) takes actions at a given time. A policy is a mapping from perceived states to the actions to be taken in those states, capturing a set of action-response relationships. A policy may be a simple function or lookup table, or it may involve extensive computation, such as a search process. The policy is the core of a QL-enabled STA in the sense that it alone is sufficient to determine the STA's behavior.
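As a minimal sketch of the lookup-table case, a policy can be read off a table of Q-values indexed by state and action. The sketch borrows the sleep (0) / awake (1) action set that the greenMAC protocol adopts; the table layout, the function names, and the tie-breaking rule are illustrative assumptions.

```python
SLEEP, AWAKE = 0, 1          # sleep/awake action set used by greenMAC
ACTIONS = (SLEEP, AWAKE)


def make_q_table(n_states):
    """Lookup table Q[s][a] for states s = 1..n_states, initialised to 0."""
    return {s: {a: 0.0 for a in ACTIONS} for s in range(1, n_states + 1)}


def greedy_policy(q_table, state):
    """The policy as a table lookup: the action with the highest Q-value.

    Ties go to AWAKE; this tie-breaking rule is a hypothetical choice.
    """
    q = q_table[state]
    return max(ACTIONS, key=lambda a: (q[a], a))


q = make_q_table(3)
q[2][SLEEP] = 0.4
print(greedy_policy(q, 1), greedy_policy(q, 2))  # 1 0
```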

Reward
A reward defines the goal of the QL agent (STA). At each time step, the environment returns a feedback value, called the reward, for the action taken. The STA's core goal is to maximize the total reward collected from the environment. The reward is therefore the primary basis for changing the policy in any state: if an action chosen by the policy yields a small reward, the policy may later be changed to select some other action in that state.

Q-Value Function
The reward indicates the immediate response to an action in a state, whereas the Q-value function indicates what is good in the long run, by accumulating the rewards obtained over time for taking that action in that specific state. A state may have a high Q-value even though it yields a low immediate reward, if it is regularly followed by states that yield high rewards. Thus, the agent seeks to move to the highest-value states, not the highest-reward states.
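The difference between immediate reward and long-run value can be seen with a discounted sum of rewards, the quantity a Q-value estimates in standard Q-learning [10]. The reward sequences and the discount factor γ below are hypothetical: the first choice earns a low immediate reward but leads to high rewards later, so its value exceeds that of the second.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * r_k: the long-run quantity a Q-value estimates."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))


gamma = 0.9
# Hypothetical reward sequences following two different choices.
low_now_high_later = [0.1, 0.9, 0.9]
high_now_low_later = [0.5, 0.0, 0.0]
print(discounted_return(low_now_high_later, gamma))  # about 1.639
print(discounted_return(high_now_low_later, gamma))  # 0.5
```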
QL seeks an optimal action-selection policy for any finite Markov decision process (MDP), and is particularly suited to environments whose model is unknown [10].

GreenMAC Protocol
An intelligent QL-based PSM mechanism based on channel observation is used to resolve the energy waste caused by high contention in WLANs under the CSMA/CA of the conventional PSM mechanism. The proposed greenMAC protocol aims to save energy while preserving network throughput. In the greenMAC protocol, each contending STA observes the number of busy slots $S^i_{busy}$ within the $B^i_{obs}$ slots observed during its $i$-th back-off, where $B^i_{obs} = S^i_{busy} + S^i_{idle}$, as shown in Figure 2b. As Figure 2a shows, an STA that wants to transmit or receive data performs back-off at least twice: once for the PS-poll and again for the data packet transmission or reception. Therefore, an STA observes and measures the channel density probability $p_d$ every time it performs a back-off, as shown in Figure 2b. Hence, $p_d$ is determined as follows:

$$p_d = \frac{\sum_{i=1}^{n} S^i_{busy}}{\sum_{i=1}^{n} B^i_{obs}} = \frac{S_{busy}}{B_{obs}}, \tag{1}$$

where $n$ is the number of times the channel is observed (that is, the number of times the back-off procedure is performed). For example, as shown in Figure 2b, the STA randomly selects the back-off value $B = 12$ for its first back-off stage (that is, twelve idle slots, $S^i_{idle} = 12$) and $B = 9$ for its second stage (that is, nine idle slots, $S^{i+1}_{idle} = 9$). The STA observes three busy slots during the first back-off ($S^i_{busy} = 3$) and two busy slots during the second ($S^{i+1}_{busy} = 2$). The total number of observed slots is the sum of all idle and busy slots over these two back-off stages, $B_{obs} = S^i_{idle} + S^{i+1}_{idle} + S^i_{busy} + S^{i+1}_{busy}$. Thus, according to Equation (1), with $B_{obs} = 12 + 9 + 3 + 2 = 26$ and $S_{busy} = 3 + 2 = 5$, we obtain $p_d = 5/26 \approx 0.192$. The proposed QL-based mechanism uses the number of back-offs as its set of states, $S = \{1, 2, 3, \ldots, n\}$, and the decision to sleep or stay awake as its action set, $A = \{0, 1\}$, where 0 means sleep mode and 1 denotes awake mode.
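The channel density probability is a ratio of slot counts accumulated over the observed back-off stages. The sketch below reproduces the worked example from the text (B = 12 with three busy slots, then B = 9 with two busy slots); representing each back-off stage as an (idle, busy) pair and the function name are assumptions made for illustration.

```python
def channel_density_probability(stages):
    """p_d = total busy slots / total observed slots over n back-off stages.

    `stages` is a list of (idle_slots, busy_slots) pairs, one per back-off
    stage, so the slots observed in stage i are S_idle^i + S_busy^i.
    """
    busy = sum(b for _, b in stages)
    observed = sum(i + b for i, b in stages)
    if observed == 0:
        raise ValueError("no slots observed")
    return busy / observed


# Worked example from the text: B = 12 with 3 busy slots, then B = 9 with 2.
p_d = channel_density_probability([(12, 3), (9, 2)])
print(round(p_d, 3))  # 0.192
```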
At time $t$, an action $a_t$ is performed in a particular state $s_t$ to obtain a reward $R_t(s_t, a_t)$, with the aim of exploiting the accumulated Q-value function $Q_t(s_t, a_t)$. This Q-value is updated every time an STA performs an action and perceives the resulting reward. With action $a_t$, the STA moves from state $s_t$ to $s_{t+1}$. The QL-based mechanism aims to discover an optimal policy that exploits the accumulated reward. The Q-value function $Q_t(s_t, a_t)$ is updated as follows [10]:

$$Q_t(s_t, a_t) \leftarrow (1 - \alpha)\, Q_t(s_t, a_t) + \alpha\, \Delta Q_t(s_t, a_t), \tag{2}$$

where $\alpha$ is the learning rate, $0 < \alpha < 1$. The convergence of the QL-based algorithm depends on the learning estimate $\Delta Q_t(s_t, a_t)$, given by

$$\Delta Q_t(s_t, a_t) = R_t(s_t, a_t) + \gamma \max_{a} Q_t(s_{t+1}, a), \tag{3}$$

where $\gamma$ is the discount factor ($0 < \gamma < 1$) that determines the importance of the future Q-value of the next state $s_{t+1}$. In Equation (3), $\max_a$ denotes exploitation of the Q-value function with respect to $a$. A QL agent can exploit its current knowledge by always choosing the action with the highest Q-value (known as exploitation, or the greedy strategy). It is reasonable to exploit most of the time; however, the learning STA must sometimes explore the environment to detect changes (known as exploration, or the non-greedy strategy). The QL algorithm combines the greedy and non-greedy strategies probabilistically through the ε-greedy mechanism [10], which explores with probability ε and exploits with probability 1 − ε.
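Assuming Equations (2) and (3) take the standard Q-learning form from [10], with the learning estimate defined as the reward plus the discounted best Q-value of the next state, one update step and the ε-greedy action choice can be sketched as follows. The dictionary-based Q-table, the function names, and the numbers in the usage lines are hypothetical.

```python
import random

SLEEP, AWAKE = 0, 1
ACTIONS = (SLEEP, AWAKE)


def learning_estimate(q, r, s_next, gamma):
    """Delta Q = R_t(s_t, a_t) + gamma * max_a Q_t(s_{t+1}, a)."""
    return r + gamma * max(q[s_next][a] for a in ACTIONS)


def q_update(q, s, a, r, s_next, alpha, gamma):
    """Q(s, a) <- (1 - alpha) * Q(s, a) + alpha * Delta Q; returns Delta Q."""
    delta = learning_estimate(q, r, s_next, gamma)
    q[s][a] = (1 - alpha) * q[s][a] + alpha * delta
    return delta


def epsilon_greedy(q, s, epsilon, rng):
    """Explore (random action) with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[s][a])


# One step: state 1 -> state 2 with a reward equal to an observed p_d of 0.192.
q = {s: {a: 0.0 for a in ACTIONS} for s in (1, 2)}
delta = q_update(q, s=1, a=SLEEP, r=0.192, s_next=2, alpha=0.5, gamma=0.9)
print(delta, q[1][SLEEP])
```

Larger α moves the Q-value further toward the new estimate on each step, while larger γ gives more weight to the Q-values of later states.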
Finally, we use $p_d$, estimated over the back-offs performed, as the reward of an action. Thus, the reward for action $a_t$ taken in state $s_t$ is given by

$$R_t(s_t, a_t) = p_d. \tag{4}$$

The proposed greenMAC protocol defines a threshold value $T_{value}$. During exploitation, an STA checks whether the Q-value is higher than $T_{value}$ before going to sleep. A higher Q-value indicates a high density of STAs in the WLAN; thus, the STA chooses to enter SLP mode more often to avoid collisions in the network. This decreases network collisions and also reduces energy consumption. Figure 3 shows the flowchart of the proposed greenMAC protocol.
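The threshold check can be sketched as a single decision function. The text does not specify which Q-value is compared with the threshold, so this sketch assumes the largest Q-value of the current state; the exploration branch mirrors the ε-greedy mechanism described above, and the function name and the numbers in the usage lines are hypothetical.

```python
import random

SLEEP, AWAKE = 0, 1


def decide(q_row, t_value, epsilon, rng):
    """Return SLEEP or AWAKE for the current state.

    q_row maps action -> Q-value for the current state. Assumption: the
    threshold is compared against the largest Q-value in that state.
    """
    if rng.random() < epsilon:          # exploration: random action
        return rng.choice((SLEEP, AWAKE))
    # Exploitation: a high Q-value signals a dense, collision-prone WLAN.
    return SLEEP if max(q_row.values()) > t_value else AWAKE


rng = random.Random(0)
print(decide({SLEEP: 0.30, AWAKE: 0.25}, t_value=0.2, epsilon=0.0, rng=rng))  # 0
print(decide({SLEEP: 0.05, AWAKE: 0.10}, t_value=0.2, epsilon=0.0, rng=rng))  # 1
```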

Performance Evaluation
In this section, we evaluate the performance of the proposed greenMAC protocol using ns-3 (network simulator 3) simulations [24] with an IEEE 802.11ax WLAN model. Table 1 lists some of the important simulation parameters used in the evaluation. In our simulation environment, every STA measures the channel density probability $p_d$ by counting the transmissions on the channel during the back-off procedure: an STA senses other STAs' transmissions and increments the $S_{busy}$ counter for each busy slot, while its back-off counter is decremented for each idle slot ($S_{idle}$). Since most 802.11 WLANs have limited mobility, the wireless channel in our experiments is considered stationary. Network dynamics, such as mobility and STAs joining or leaving the WLAN, are also of interest, and we leave them for future work.
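A simplified, slot-level version of this counting can be sketched as follows. It is not the ns-3 implementation used in the evaluation: modeling each slot as independently busy with a fixed probability is a hypothetical stand-in for other STAs' transmissions, and a real busy period spans a whole frame exchange plus DIFS rather than a single slot.

```python
import random


def observe_backoff(backoff, busy_prob, rng):
    """Count idle and busy slots while one back-off counter runs down.

    Simplifying assumptions (hypothetical, not the ns-3 model): each slot is
    busy independently with probability busy_prob; a busy slot increments
    S_busy and freezes the counter, an idle slot decrements the counter.
    """
    if backoff < 0 or not 0.0 <= busy_prob < 1.0:
        raise ValueError("need backoff >= 0 and 0 <= busy_prob < 1")
    s_idle = s_busy = 0
    counter = backoff
    while counter > 0:
        if rng.random() < busy_prob:
            s_busy += 1          # another STA is transmitting
        else:
            s_idle += 1          # idle slot: counter moves toward zero
            counter -= 1
    return s_idle, s_busy


rng = random.Random(42)
stages = [observe_backoff(b, busy_prob=0.2, rng=rng) for b in (12, 9)]
p_d = sum(busy for _, busy in stages) / sum(i + b for i, b in stages)
print(stages, round(p_d, 3))
```

Because the counter only moves on idle slots, the idle count of each stage always equals the drawn back-off value, matching the worked example in which B = 12 yields twelve idle slots.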
We simulated 10 contending STAs for 20 learning episodes, setting α and γ to a small value (0.2), a medium value (0.5), or a large value (0.9). For balanced exploration and exploitation, ε was set to 0.5. Figure 4a shows the convergence of the learning estimate (ΔQ) from Equation (3) for different learning rates (α) with the discount factor fixed at γ = 0.9. Similarly, Figure 4b shows the convergence of ΔQ for different discount factors (γ) with the learning rate fixed at α = 0.9. The figures show that larger values of α and γ make ΔQ converge faster. Figure 5 compares the throughput (Mbps) of the existing PSM mechanism and the proposed greenMAC protocol in a WLAN. As expected, the throughput of the WLAN decreases as the number of contending STAs increases, owing to increased contention. The greenMAC protocol also improves the throughput of the WLAN. Although this throughput gain is very slight, it indicates that the proposed mechanism does not sacrifice performance for energy saving. We compare the energy performance of the greenMAC protocol with the existing PSM mechanism and with two approaches from the related research: DeepSleep [20] and the BC-counter-based scheme [23]. The energy consumption of the STAs in a WLAN increases with the number of contending STAs because of the intense contention for data frame transmission. Figure 6 therefore compares the total energy consumed in the network. The figure illustrates that the existing PSM approach consumes more total energy than the other approaches: the proposed greenMAC, DeepSleep, and BC-counter.
The additional energy savings of the greenMAC protocol in this figure show that a machine learning-based approach can learn the network environment and optimize the power-saving procedure. From Figures 5 and 6, we observe that the existing PSM mechanism sleeps so often to save energy that its throughput decreases. In contrast, the greenMAC protocol chooses to stay awake longer when the density of STAs in the WLAN is low and to sleep more often when the network is highly dense. The decision to stay awake or sleep is made by the QL mechanism, which lets an STA converge on its channel observation-based collision probability. This ultimately results in lower energy consumption in the network.

Conclusions and Future Work
The use of WLAN-enabled devices has recently increased dramatically. Despite the substantial and growing adoption of WLANs, high energy consumption by WLAN-enabled devices (STAs) remains a highly critical challenge. Researchers from institutes and industry have highlighted numerous weaknesses of the existing PSM mechanism and proposed many approaches to address this issue. Most of these approaches to saving energy and reducing the overall energy consumption of WLAN STAs focus on static or adaptive PSM mechanisms, and they address several issues and limitations concerning energy utilization and network performance degradation. In this paper, we recognize the potential of machine learning techniques, such as the QL algorithm, to enhance the channel reliability of WLANs for energy saving, and we propose a QL-based energy-saving MAC protocol, the greenMAC protocol, for this purpose. The greenMAC protocol relies mainly on the density of the WLAN environment, which it estimates from the channel observation-based collision probability and uses as the reward function of QL. The protocol turns sleep mode on or off based on the channel density probability, which reduces the channel collision probability of the network and helps save the energy of the STAs. Comparative simulation results show that the greenMAC protocol achieves higher system throughput with additional energy savings compared to the existing PSM mechanism of WLANs and other related approaches.
For future work, we aim to extend our current QL-based protocol to dynamic WLAN environments and to enhance its potential. Specifically, we will focus on evaluating the proposed mechanism and comparing it with other PSM mechanisms proposed by researchers. We also aim to evaluate the performance of the QL-based PSM mechanism in delay-sensitive WLAN environments with QoS requirements.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: