FQ-AGO: Fuzzy Logic Q-Learning Based Asymmetric Link Aware and Geographic Opportunistic Routing Scheme for MANETs

: The proliferation of mobile and IoT devices, coupled with the advances in the wireless communication capabilities of these devices, have urged the need for novel communication paradigms for such heterogeneous hybrid networks. Researchers have proposed opportunistic routing as a means to leverage the potentials offered by such heterogeneous networks. While several proposals for multiple opportunistic routing protocols exist, only a few have explored fuzzy logic to evaluate wireless links status in the network to construct stable and faster paths towards the destinations. We propose FQ-AGO, a novel Fuzzy Logic Q-learning Based Asymmetric Link Aware and Geographic Opportunistic Routing scheme that leverages the presence of long-range transmission links to assign forwarding candidates towards a given destination. The proposed routing scheme utilizes fuzzy logic to evaluate whether a wireless link is useful or not by capturing multiple network metrics, the available bandwidth, link quality, node transmission power, and distance progress. Based on the fuzzy logic evaluation, the proposed routing scheme employs a Q-learning algorithm to select the best candidate set toward the destination. We implemented FQ-AGO on the ns-3 simulator and compared the performance of the proposed routing scheme with three other relevant protocols: AODV, DSDV, and GOR. For precise analysis, we considered various network metrics to compare the performance of the routing protocols. The simulation result validates our analysis and demonstrates remarkable performance improvements in terms of total network throughput, packet delivery ration, and end-to-end delay. FQ-AGO achieves up to 15%, 50%, and 45% higher throughput compared to DSDV, AODV, and GOR, respectively. Meanwhile, FQ-AGO reduces by 50% the end-to-end latency and the average number of hop-count. the average number of hop-count. The simulation results show that FQ-AGO improves network connectivity and routing performances, particularly in the case of increasing the network size and higher nodes mobility. However, the performance of relevant routing protocols failed to scale and cope whenever the network size increased, or network topology changed.


Introduction
Mobile ad hoc networks (MANETs) have been attracting great interest in the past twenty years. The routing problem of establishing an efficient stationary path between a source and a destination through multiple mobile intermediate nodes is very challenging due to the movement of the intermediate nodes, limited wireless resources, the heterogeneity of the transmission power of the nodes, and the lossy characteristics of a wireless channel. The efficiency of a route depends on the cooperation of all intermediate nodes participating in data transmissions. These restrictions make the route selection information. The proposed routing scheme is scalable since there are no overwhelming control packets required to move the data packets towards the desired destination. By modifying the fuzzy logic membership functions and rules, various MANETs scenarios such as vehicular ad hoc networks (VANETs), or wireless sensor networks (WSNs) can apply FQ-AGO seamlessly.
In addition, our approach addresses another challenge in MANETs, namely the high cost of distributing network global routing information. Although the capacity of wireless networks has been studied extensively, little attention has been paid to how practical routing protocols may be best implemented in MANETs. It is impossible to ignore network dynamics in large-scale networks: nodes joining, leaving, or moving, and channels fading. However, the distribution of global routing information incurs high costs. In a network of n nodes, the overhead of a routing protocol, such as the link-state protocol, scales as n 2 , whereas the total network capacity scales as n/log n [3,4], which may be overwhelmed by the routing overhead as the number of nodes increases. We demonstrate that our distributed opportunistic routing scheme is independent of network global routing information and rely on the local routing information to make routing decisions using our approach.
In this paper, we make the following contributions: 1.
We propose FQ-AGO fuzzy Q-learning opportunistic routing scheme that learns and makes the route decisions on-the-fly by considering one-hop performance.

2.
The proposed scheme takes into account the available throughput, distance progress, asymmetric link, and link quality in the candidate selection process.

3.
The proposed protocol is flexible because we can easily tune the protocol to work for different MANETs networks by modifying the fuzzy membership functions and fuzzy rules.

4.
We implemented and tested the proposed routing scheme in the ns-3 simulator.

5.
We evaluated the performance of the proposed routing scheme and compared it with three state-of-the-art MANETs routing protocols: destination sequenced distance vector routing (DSDV) [8], ad hoc on-demand distance vector (AODV) [9], and geographic opportunistic routing protocol (GOR) [10].
The remainder of this paper is organized as follows. In Section 2, we review the related literature, focusing mostly on the opportunistic routing on MANETs. In Section 3, we describe the system model and the assumptions. A detailed description of the proposed scheme and its objective function and features is in Section 4. In Section 5, we discuss the routing scheme implementation as well as the simulation setup and performance evaluation. Finally, Section 6 concludes the paper.

Background and Related Works
Opportunistic Routing has been proposed in the context of multiple applications and scenarios including mobile ad-hoc networks ( Figure 1) [11][12][13], wireless mesh and sensor networks [14][15][16], Delay Tolerant Network (DTN) [17], and for smart grids applications [18]. While traditional MANETs protocols fail with high mobility of the nodes and their heterogeneity, OR exploits the broadcast nature of the wireless medium and pre-selects a set of candidate nodes and delegates the packet forwarding, hop-by-hop and on the fly, to these relays [3]. Thus, the packet can traverse from the source node to the destination node through a set of potential routes. This dynamic selection of route improves the transmission reliability and throughput of the network.
While the MANETs algorithm is concerned with the end-to-end path between the source and the destination and deals with frequent link failures due to connections and disconnections by applying route management and recovery strategies OR constructs the route online by pre-selecting a set of neighbor's relays on a routing protocol metric within the radio frequency transmission range of the source node. These neighbor nodes are called relay nodes or candidate sets.
Extensive research and surveys have been conducted on OR aspects. The OR reduces the number of transmissions to a destination by taking advantage of the broadcast nature of the wireless medium. Success on reducing transmission number to a packet between source and destination can improve network performance by reducing end-to-end delay, and the energy consumption of network nodes [19]. The broadcast nature of the wireless medium replicates a packet to multiple nodes per transmission, and the choice of relay nodes affects the efficiency of OR. The source nodes utilize the available network information to calculate the OR metric, which is ordering and assigning a probability to the potential candidate set. The node with a higher probability becomes the new custodian node. One can classify the network information to be utilized to compute the OR metrics as local or global. Local network information mainly depends on the information of neighbors, such as the geographic location of nodes or the link delivery probability. On the other hand, the former one mainly relies on the end-to-end information of the network topology by taking into consideration the number of hops from each neighbor to the given destination [19,20]. The OR algorithms that consider the network topology information to select and prioritize the candidate set outperform OR algorithms that depend only on the local information of the neighbors. However, the computation cost of the global network information could impact the network performance, especially, and network scalability because of the significant overhead.
Sharma et al. [21,22] proposed an opportunistic routing approach called (MLProph), which utilizes machine learning algorithms, namely decision tree and neural networks. MLProph is built on top of the opportunistic routing in PROPHET. The routing approach of MLProph is trained by adopting PROPHET routing data performance metrics, such as node power, speed, and location. Based on the evaluation and analysis of the PROPHET routing data, the machine learning algorithm calculates the transition probability to a neighbor node to check whether the node will be able to deliver the message to the corresponding destination eventually. Accordingly, the likelihood of successful deliveries of data packets is predicted. Thomas et al. [23] introduced Q 2 − Routing, which is a hybrid routing protocol that integrates on-demand route discovery technique with proactive updates of the available routes. Their routing scheme depends on Multi-Agent Reinforcement Learning (MARL). The optimal candidate selection of the next-hop forwarder process learns by trial-and-error, without the need for any prior knowledge about the topology of the network environment. Accordingly, the neighbor node that reveals the optimal Q-value for a given destination is selected as the next-hop forwarder. Their simulation results demonstrate a significant improvement in routing performance compared to the AODV routing protocol. QGeo [24] proposed a distributed machine-learning-based geographic routing scheme for ad hoc networks to reduce overhead in a high-mobility scenario. The proposed routing scheme considers a model-free reinforcement learning algorithm in which the global network information is not required to make a routing decision. Instead, nodes make geographic routing decisions distributively relying on the exchange of local location information by periodic Hello messages that include location, current Q-value, and link condition. QGeo routing scheme introduces a local maximum avoiding algorithm in the reward function in which the forwarder node gets a negative reward value to prevent detours. Recently, fuzzy logic became popular in the design of routing protocols. Routing schemes that utilized fuzzy logic for Vehicular ad hoc networks (VANETs) have been proposed [25][26][27]. Alzamzami et al. [25] proposed a routing protocol (FL-DGR) which utilizes fuzzy logic to combine and find a trade-off between different routing metrics intelligently without deriving a mathematical model. In FL-DGR, the communication between nodes takes place through a unicast transmission in which the forwarder node selects one next-hop candidate only using fuzzy logic evaluation. FL-DGR demonstrates improvement in network connectivity and routing performance in relatively dense urban environments compared to some relevant VANETs routing protocols. Wu et al. [26] proposed a vehicle-to-roadside routing scheme. Their scheme utilizes distributed clustering, where a coalitional game model stimulates the vehicles joining a cluster. At the same time, they utilized a fuzzy logic algorithm to select stable clusters heads where the authors evaluated multiple metrics of mobile vehicles. Their evaluation uses the performance of multi-hop routes between nodes and roadside assistants. The scheme relies on reinforcement learning.
With the popularity of global positioning systems (GPS), the geographic location information of the neighbors has been considered as a metric to select and sort the candidate set nodes [28,29]. Accordingly, depending on the Distance Progress (DP), the forwarder node computes the Euclidean distance between potential candidate nodes and the given destination; the closest node to the geographic location of the destination has higher priority to become the potential next-hop forwarder. Utilizing the DP metric, source nodes can estimate the distance from each potential candidate node to reach the destination. Expected Distance Progress (EDP) [19] improved the performance of DP by including the link delivery probability between the current custodian node and the candidate set nodes. Selecting the closest nodes to the destination is not always the optimal choice, especially when the link delivery probability ratio is low. In such a case, multiple re-transmissions of the same packet will be attempted until it is received by one of the candidate nodes. When the DP metric fails to select reliable candidate nodes, EDP will overcome the DP shortcoming by selecting reliable candidate nodes by taking into consideration the same delivery probability, and that will reduce the number of re-transmissions of a packet significantly.
While multiple OR protocols have been proposed, most of these solutions consider a homogeneous network consisting of similar wireless nodes equipped with the same wireless capabilities, moving at the same speeds. In this paper, however, we consider OR in a hybrid scenario, where we leverage, detect, and select nodes equipped with high transmission range wireless antenna, from which they are allowed to construct eventual shortcut paths towards the destination.

System Model and Assumption
The network model simulates a typical multi-hop urban environment composed of a large number of mobile nodes N. The topology of a network is a directed graph G(V, E), where V is the set of all mobile nodes and E is the set of wireless links in the network. Among these mobile nodes, few are equipped with long-range (N L ) transmission capabilities, and others use the common short-range (N S ). Each node is equipped with a wireless radio transceiver. A wireless link exists between any two nodes A and B, if and only if B locates within the transmission range of A. A link is bidirectional if A → B ∈ E and unidirectional if B → A / ∈ E. Each node is equipped with a GPS device that provides information about its position and speed. In addition, every node can obtain the destination position when needed using a location service such as Region-based Location Service Management Protocol (RLSMP) [30] and Mobile Group based Location Service Management (MG-LSM) [31]. Network nodes obtain neighbors information through broadcasting a Hello message periodically at predefined intervals (τ + jitter), to control traffic storm, and to avoid potential collisions. Table 1 shows the format of a Hello message used in FQ-AGO, which includes the sender's ID, which identifies each node in the network, a unique sequence number that reflects the freshness of the neighbors' information, the current position, and node velocity also included in the Hello messages. Each network node maintains a local view of its neighbors by constructing a neighbors' table to manage neighbor's information obtained from exchanging Hello packets. We define the set of neighbor's nodes from which forwarder node can select next-hop to relay its packet as the Candidates Set denoted by C n . A candidate node from C n is selected to relay the forwarder packet to its corresponding destination according to the neighbor evaluation strategy of our proposed fuzzy logic-based scheme. To avoid situations where a selected candidate might be out of the forwarder communication range, Hello packet transmission interval can be tuned based on the network environment requirements. However, the Hello packet transmission interval should be small, to get the current location information about C n . Accordingly, there is a trade-off between the routing protocol performance and the network overhead. After conducting a set of initial simulation experiments, we found that setting the Hello message transmission interval in our scheme to 1 s is sufficient in most cases because the mobile node relative movement during this amount of time is much smaller than the transmission range [25].

Proposed Scheme
The design of the FQ-AGO uses a distributed multi-hop ad hoc wireless communication without relying on any network infrastructure units. The proposed scheme is an opportunistic routing that employs a fuzzy logic-based approach ( Figure 2) to evaluate each direct wireless link of neighbors in the candidate set. Each node evaluates its candidate set members based on their distance to destination, transmission power, link quality, and available throughput. These metrics are jointly evaluated, utilizing a fuzzy logic analysis system. The wireless links evaluation values of the candidate set nodes are stored in the neighbor table ( Figure 2) and these evaluation values are kept up-to-date. Whenever a forwarder node has a packet for forwarding, based on the destination's current location, the forwarder applies a Q-learning algorithm to the candidate set, and the candidate that reveals the highest Q-learning value is considered the next-hop forwarder.
For each packet that the forwarder node has to forward, the candidate forwarding node with the maximum Q-learning value is selected as a next-hop forwarder. The membership functions and fuzzy rules used by the fuzzy logic system are easy to modify and tune to make the protocol more suitable for a particular network environment and condition. Therefore, the proposed protocol can provide a practical and intelligent solution for routing in varieties of ad hoc networks, such as VANETs, and Wireless Sensor Network (WSN). In FQ-AGO, if a local minima problem is detected, the forwarder node will select a new next-hop forwarder, and a re-transmission of the packet is required to progress the transmission packet operation.

Neighbors Evaluation Criteria
The proposed model is an opportunistic routing scheme that relies on fuzzy logic and Q-learning for routing decision making. The proposed routing scheme is designed to work in a variety of mobile wireless networks. Several configuration parameters of FQ-AGO are adjustable so that it is adaptive to different network environments. One of the characteristics of the proposed routing scheme is that the routing decisions depend on the proposed metric value, which simplifies network setting-up operations where there is no need for any network layer system management. Consequently, this will alleviate the network's overall end-to-end delay. In this section, we define multiple network metrics concerning mobile ad hoc networks to improve online routing decisions [32].

Link Quality Estimation Using ETX
Link instability has a significant impact on network connectivity, routing performance, packet delivery ratio, network throughput, and end-to-end delay. Wireless links are directly affected by signal attenuation and fading. Thus, to elevate forwarding reliability and minimize packet re-transmissions and delay, we consider the link quality between the current forwarding node and the potential next-hop candidates. To estimate link quality, we utilize the well-known Expected Transmission Count (ETX) module [25,33] to select the most reliable wireless links. Mainly ETX measurement depends on dedicated probe packets. Instead, we consider Hello packets, which is a fundamental factor of an opportunistic cooperative awareness scheme, and it is also used by our routing protocol to discover direct neighbors.
We define the probability of successfully receiving data packet as d packet and d ACK is the probability of receiving an ACK packet. ETX can be defined as: Every node estimates link quality for its direct neighbors by calculating d packet and d ACK using the ratios of Hello packets that have been successfully received (HRR). The forwarder maintains a counter of each neighbor to calculate the number of received Hello messages within a sliding window size of w seconds. Therefore, the ratios of the successfully received Hello packets (HRR) is calculated at any given time instance (t) by using the following equation: where count(t − w, t) is the number of Hello packets that have been delivered at time t during the sliding window w, and w/τ indicate to the total number of Hello packets that should have been received during time t. For the calculation of the HRR(t), ETX model relys on the sequence number that exists in every received Hello packet, which is used to determine the number of received and lost Hello packets. The packet is either received or lost in the current window size, w, based on a Hello packet sequence number. This determination happens through a comparison process between the sequence number of the freshly received packet and the sequence number of the last packet sequence number recorded in the neighbors' table ( Figure 2). Accordingly, a node can detect any lost Hello packet, and the HRR value is calculated by updating the numerator and denominator in Equation (2).

Available Throughput Estimation
Recruiting a subset of neighbors as a potential candidate without taking into consideration their available throughput would highly impact routing performance and might result in high packet delays and loss. Therefore, we define the Available Throughput Estimation (ATE) as a mean to predict the maximum amount of data that could be transmitted successfully by each candidate node C i during a specified time period [25]: where NBits indicates the number of bits that are successfully transmitted and DTT (Data Transmission Time) is the total amount of time a forwarder node requires to transmit packet. DTT includes total time of both successful and failed transmission attempts that a forwarder node conducts: The number of attempts a forwarder takes to transmit a packet k successfully is N Tx (k), and we define the time duration that it takes a packet to be successfully transmitted as: where N Tx (k) is the number of transmissions attempts it takes a packet to be transmitted successfully, CW i is the estimate of contention window size for the xth transmission attempt, CW N Tx (k) is the estimate contention window size for the last successful attempt of transmitting that packet, and T ϕ and T s are defined as follows: T SIFS indicates the short inter-frame space, T AIFS AC is the arbitration inter-frame space and tuned based on Access Categories (AC), and time slot T Slot duration is defined as follows: T Header+Data is the transmission time of the payload of a packet, T ACK is the transmission time of the ACK frame, δ is the propagation delay, and T ACKTout is the ACK timeout duration: Similarly, the time spent in re-transmitting a packet ((NT xmax )) before it gets dropped, which is considered a failed transmission attempt, is defined as: Assuming that the selected contention window size is uniformly distributed in the range [0, CW i ], where CW i is the current window size for the ith transmission attempt, the average contention time for any transmission attempt can be defined using the following equation: where CW max and CW m in are the maximum and minimum contention window sizes IEEE 802.11p access defined. The dynamic topology in MANETs due to rapid nodes mobility and the wireless signal impairments, such as fading, interference, and obstacles, causes instabilities in node's data transmission rates. To cope with wireless communication impairments in dynamic environment, we use the Exponentially Moving Average (EMA) to calculate the available bandwidth estimation accuracy as follows: where α is a simulation parameter applied to estimate the relative movement of a neighboring node to balance the accuracy and responsiveness of a mobile node link status, due to the rapid changes in network topology and fluctuation of the wireless communication conditions. Based on our simulation experiments, we found that the best value for α is 0.7.

Asymmetric Link
There are two different models for considering the extra wireless transmission capabilities of heterogeneous nodes in MANETs routing [34]. One is trying to use a long-range asymmetric link, and the other avoids it. The former exploits the long-range wireless link to transmit data. There have been many attempts to accomplish this [35][36][37]. There are several benefits to considering transmission over long-range links, such as reducing the number of hops on the path and increasing the lifetime of network nodes with limited energy. This reduces latency and improves network connectivity and routing efficiency, which makes the path more robust. The latter model tries to avoid the use of a remote connection to transmit data (see, e.g., [38,39]). These models consider that the use of long-range transport may increase energy consumption and reduce overall network throughput.
In conventional homogeneous wireless ad hoc networks, a network node can only communicate with neighbor nodes in its transmission range. In contrast, in heterogeneous wireless ad hoc networks, some network nodes can reach other nodes within a broader transmission range. Therefore, the network topology and routing mechanisms are quite different from those in homogeneous networks. As an example, a homogeneous network topology without asymmetric links is depicted in Figure 3a, in which all wireless links are symmetric and labeled with transmission costs in both directions. Node A has high transmission power and has the ability to reach much farther area in the network, thus a wireless asymmetric link is added in Figure 3b. We label asymmetric links from node A to its neighbors with cost 0 to represent the high transmission power of node A [40]. Consequently, a forwarding node should forward the packet to a candidate C i that has higher transmission power, and avoid relying on candidate nodes with short transmission, if possible ( Figure 4). For these purposes, it is desirable to allow a candidate with high transmission capability to relay the forwarded packet to cover a larger transmission area, which can statistically reduce the number of intermediate nodes involved in packet forwarding.

Evaluation of One-Hop Neighbor Wireless Link Status Based on Fuzzy Logic Model
Fuzzy logic-based systems have attracted considerable interest and are widely accepted in many industrial systems and research communities to develop communication applications [6,25,41], due to their ability to simulate human brains reasoning in interpretation of uncertain information. The fuzzy logic model handles imprecise and uncertain information well when the boundaries are not clear. Uncertainty in a classical set theory arises when an attempt is made to define a measurement value as either member or non-member; instead, values in the fuzzy logic set theory can have different degrees of membership [6,25]. Accordingly, the fuzzy logic model can handle imprecise and uncertain information by approximating data rather than precise information analysis. Therefore, fuzzy logic can perform well in decision making and analyzing different metrics.
In the opportunistic network, wireless communication links are vulnerable due to nodes mobility and wireless signal impairment. Accordingly, an evaluation of neighbors' wireless links status is required to improve network connectivity and routing performance. The neighbor's wireless links status depends on multiple factors, such as transmission range capability, available throughput, link stability, neighbor mobility, etc. Therefore, the attempt to derive any mathematical model facilitating the selection of the forwarding candidate set by taking into consideration multiple network metrics is a complex and challenging task, and the solution would be inflexible.
For routing in MANETs, a fuzzy logic model can cope with unreliable and uncertain information about candidate nodes due to rapid nodes mobility and wireless channel conditions. Whenever a forwarder node has a packet to forward, a fuzzy logic model is used to evaluate each candidate's wireless link status and select the most suitable next-hop for forwarding the packet. The proposed fuzzy logic model is developed to combine all considered network metrics discussed in Section 3.3 to contribute to improve the efficiency and reliability of selecting a next-hop forwarder from the potential next-hop candidate. Figure 2 illustrates the fuzzy logic structure that evaluates the neighbors' wireless link status. It consists of three phases: fuzzification, fuzzy inference, and defuzzification. In the first phase, numerical values accepted as inputs are converted into nonnumerical linguistic variables to express the facts. Then, the defined fuzzy rules are used to drive a linguistic output. The final phase interprets the linguistic output into a numerical value that represents the fuzzy weight of each neighbor. The phases that illustrate the calculation operation of the fuzzy logic weight for candidate nodes C i are elaborated in the following subsections.

Fuzzification
Fuzzification is the process of accepting and converting crisp numerical input values into fuzzy linguistic values by employing a predefined membership function to each network metric. For the sake of accurate evaluation of the considered metrics, their values are normalized by considering Min-Max feature scaling normalization, which improves the sensitivity of the metrics to every negligible variation on the neighbor link status. The following equation shows the normalization of the expected distance progress metric: where DP c x ,d c i is the normalized distance value of the potential candidate to destination and DPmax and DP min are the maximum and minimum measured distance span values to destination of all potential candidates C i . Figure 5a illustrates the membership function of candidate distance progress evaluation. The forwarder node employs the developed membership function to map the normalized metric value into the predefined linguistic fuzzy set and measures the degrees of belonging to the linguistic variables. Thus, the fuzzy set degree values for the distance progress metric input belongs to {VeryClose, Close, Far}. The fuzzy membership function for the node transmission power is shown in Figure 5b. The forwarder node uses the membership function to evaluate the neighbor transmission range based on the information included in the exchanged hello packets. Thus, the forwarder node calculates which degree the transmission range of a candidate forwarder belongs to {Long, Short}. The membership function of the ATM metric is depicted in Figure 5c. The forwarder node uses the membership function and the normalized metric to calculate which degree the available throughput value of a candidate node belongs to {Low, Medium, High}, where low represents low achievable throughput and high represents high achievable throughput. Figure 5d depicts the fuzzy membership function of the wireless link quality metric. The metric defined in Equation (1) is normalized using min-max normalization as in Equation (15). The current forwarding node uses the fuzzy membership function and the normalized value to calculate which degree the metric value belongs to {Good, Average, Bad}.

Rule-Based and Inference Procedure
The intelligence of evaluating wireless link status in this work comes from the developed fuzzy inference engine [25]. After mapping the numerical values of the input metrics into a fuzzy set variable, the forwarder node employs the predefined combination of IF-THEN rules to convert the fuzzy values into fuzzy output, which rank the wireless link between the forwarder node and each potential candidate. The linguistic variables of the rank are defined as {Perfect, Good, Acceptable, Not Preferred, Bad, Very Bad}. The fuzzy rule-base construction requires a clear and profound understanding of the nodes' behavior in dynamic environments and the ways input metrics values can affect the ranking output. The fuzzy rule-base consists of a set of IF-THEN rules applied to infer output ranking values. The fuzzy rule-base of FQ-AGO composes of 54 rules. A few sample rules are shown in Table 2. The first four columns show the four input variables considered in FQ-AGO fuzzy analysis system, and the last column contains the output fuzzy ranking values. Each rule consists of an IF component and a THEN component. The IF component is applied to constructs conditions by employing predicates and logical connections, while the THEN component gives the degree of membership that the fuzzy variable corresponds to. For example, in Table 2, the first rule defines the best candidate for relaying the forwarder packet as follows: IF ATM is High, LQ is High, DP is VeryClose, and TP is Long THEN , the candidate rank is Perfect. In contrast, the worst candidate node for forwarding can be defined using the following rule: IF ATM is Low , LQ is Low, DP is Far, and TP is Short THEN Rank is Very Bad.

Defuzzification
Defuzzification is the process of generating a crisp numerical result based on fuzzy membership functions. For the corresponding output membership function, we consider the center-of-gravity (COG) method, since it is the most common defuzzification method that is widely used in real applications to produce the fuzzy output ( Figure 6). It ranges 0-1, where the highest value represents the best candidate node in C n . In the situation where multiple candidate nodes have the same rank value, the candidate with the highest transmission range is selected as the next-hop forwarder.

Q-Learning Model
Q-learning [7] is one of the early breakthroughs of a reinforcement learning algorithm that does not rely on a model of its environment and learns by directly interacting with its dynamic environment ( Figure 2). Q-learning works by trial-and-error for estimating the values of state-action pairs. The Q-value Q(s, C i ) (s ∈ S, a ∈ A) in Q-learning is utilized to directly approximates the optimal action-value function (Q*) for the value of future rewards if the agent performs a particular action (a) when it is in a particular state (s) [42,43]. By exploring the environment, the agents build a table of Q-values for each environment state and each possible action. The Q-learning control algorithm in FQ-AGO is defined as follows. The opportunistic network is the entire environment. The mobile nodes are the learning agents, where they observe and learn the dynamic environment by exchanging Hello messages with neighbor agents. The set of all nodes in the network is the state space. The action that a forwarder node can perform is to select the best next-hop candidate to relay the data packet. Therefore, the set of one-hop neighbors is the possible actions that are allowed at each forwarder [26,42]. A state transition is the process of delivering a packet from the forwarder node to the selected candidate node. There is a set of state transition P(s|s, a) that model the probabilities of transiting from state (s) to state (s') by performing the action (a). Every node maintains a Q-table, which consists of Q-value, which range 0-1.

Updates of the Q-Values
Each node maintains a wireless link status fuzzy logic evaluation [ Fuzzy(s, C i ) in (14)] for candidate nodes. These evaluation values are used as an input for the Q-learning to select the next-hop forwarder online. Accordingly, the fuzzy evaluation value must be fresh enough at each node to reflect the up-to-date topology information in its vicinity. Thus, it frequently updates upon the reception of Hello messages and eradicates stale neighbor information in the absence of new advertisements of neighbors [26,42]. In an Asymmetric fading environment, a node may receive Hello packet from a node that is not in its one-hop neighbor list. Since each node is only required to maintain the information about its vicinity and traffic source nodes, the proposed protocol is scalable. Each entry of Q-values is initialized to 0. When a forwarder node has a data packet to send, it selects the best next-hop forwarder candidate by applying the Q-learning algorithm as follow: where Fuzzy C i is the Fuzzy logic evaluation value of the best candidate wireless link between the forwarder and the next-hop forwarder. We set the learning rate α to 0.8, and the discount-rate parameter γ to 0.9. The possible reward values R n is calculated as follows:

Simulation and Evaluation
In this section, we detail the simulation setup of our proposed techniques and evaluate our techniques compared to those that exist in the literature.

Simulation Setup
To evaluate the performance of FQ-AGO, we implemented our scheme in the ns-3 simulator. We simulated a network with 100 nodes randomly deployed in a 1500 m × 1500 m area. In our simulations, all network nodes were capable of roaming randomly in the network according to the modified Random Way Point (RWP) mobility model [44] at a predefined uniformly distributed speed (V min , V max ), where V min was fixed to be 5 m/s, and V max assumed various values to reflect various network mobility levels. No group mobility models were considered in this simulation. The simulation was carried out using the IEEE 802.11 p standard for Physical and MAC layers. A Lognormal shadowing propagation channel model was adopted to simulate the channel fading. We set the values of path-loss exponent equal to β = 2.7 and standard deviation δ dB = 6 dB as in [45,46].
In the simulation, we considered constant bit rate (CBR) data sessions between a randomly selected source and destination pairs. The source nodes generated a constant bit rate (CBR) traffic at a constant time interval with a packet payload size of 512 bps at a rate of λ packets per second.
We studied networks with 20% of the nodes with long transmission range. These nodes were randomly deployed. By varying the number of nodes, the maximum moving velocity, and the CBR source rate, we could study the performance of FQ-AGO under various network configurations. We performed a set of 15 runs for each experiment to plot the average result for each simulation configuration, and each run was executed for 500 s of simulation time. The remaining simulation parameters are listed in Table 3.

Evaluation Metrics
We evaluated the performance of FQ-AGO by measuring different performance metrics: • Throughput defines as the rate of successfully delivered data packets per second over the communication channel from source to destination. • Packet delivery ratio defines as the ratio of packets successfully delivered to the corresponding destination compared to the total number of data packets transmitted by the sender. This can be easily calculated by dividing the number of received data packets at the destination by the number of data packets that were transmitted by the sender. • Average hop-count refers to the number of intermediate nodes that participate in the transmission process to deliver data packets from the source node to a given destination.

•
Average packet delay defines as the average delay that elapses between sending a data packet from a source and its arrival at a given destination, which may include delays due to route discovery, queuing, and re-transmissions. Figure 7 illustrates the evaluation of the routing protocols' performance, measuring the average number of hop-count as a function of the nodes' speed. In AODV, the results offer higher resolution and a clear relation between the average hop-count and nodes' speed. The AODV does not perform well in a highly dynamic wireless environment, thus the average hop-count increases as nodes mobility increases due to the requirement of re-computation of the route when the original route is lost. Furthermore, AODV overlooks the presence of asymmetric link, and it repeatedly performs route discoveries. On the other hand, in DSDV, the hop-count stays constant compared to AODV and GOR. DSDV constructs a path to a destination that relies on the shortest path metric with periodic route maintenance for accurate routing and link stability. GOR routing makes the routing decision on-the-fly based only on candidates' current location to maximize transmission distance. Intuitively, node mobility has no direct impact on GOR. However, candidate node selection without concern for link condition leads to frequent re-transmission due to interference or obstacles, which leads to an increase in the average hop-count relay the transmission. Differently, FQ-AGO performs the best due to its utilization of an asymmetric link wherever and whenever possible. In FQ-AGO, the candidate selection to relay the data packet has a substantial impact on reducing the total number of intermediate nodes.

Impact of Nodes Mobility
The impact of nodes' speed on network throughput is depicted in Figure 8. Overall, the movement speed of the nodes has a direct effect on the network throughput due to the frequent loss of wireless communication between neighbor nodes, notably in routing schemes that rely on the pre-constructed end-to-end path. Routing protocols such as DSDV and AODV face a severe reduction on throughput as node mobility increases due to frequent path loss to the destination and increased of routing overhead to find a new path to the destination. On the other hand, the GOR achieves better throughput than the AODV due to less path loss since the routing decision does not rely on a single fixed path to the destination. However, FQ-AGO always measures one-hop throughput as a means to elevate routing performance; thus, FQ-AGO shows better achievable throughput than DSDV, AODV, and GOR. This is due to selecting the best wireless communication link that has high throughput along the path to the destination. While the throughput in DSDV and AODV drops when the nodes mobility increase, FQ-AGO exhibits stable throughput under high node mobility.   Figure 9 shows the average packet delay as a function of node speed. We observe that, for AODV, the delay grows more significantly as the mobility of nodes increases; node mobility remarkably affects the routing protocol performance. As a consequence, the route to the destination may be lost since the AODV relies on one next-hop candidate only. Therefore, once the relay node moves out of the sender's transmission range, the sender has to initiate a search action by broadcasting a Route Request packet to find a new route to the destination at the cost of additional delay. On the other hand, DSDV performs better than AODV; the rapid wireless link maintenance updates benefit the overall DSDV routing performance at the cost of extensive consumption of network resources. However, as the speed increases, the DSDV gradually starts losing the available routes to the destination; in some situations, a packet travels through a long path to the destination. In the worst case, the current packet custodian drops the packet. The GOR protocol slightly outperforms AODV in terms of average packet delay, and it is not affected directly by increase nodes mobility since it relies on its neighbors' local geographic position for routing decisions.
It makes the routing decision on-the-fly when a network node has a packet to send. It instantaneously selects the neighbor with the highest distance progress toward a given destination to forward the packet. The main factor that has a direct impact on the packet delay performance metric is the local minima phenomenon, which occurs when a sender becomes the closest node to the destination, and no further progress towards the destination can be made. In such a scenario, the packet is dropped, which increases the packet delay. Finally, FQ-AGO algorithm outperforms all the routing protocols in this study, by taking advantage of the broadcast medium and involving all available next-hop nodes in the candidate set to choose the most reliable as the next-hop forwarder. Furthermore, selecting the candidate node with a high transmission range has a direct impact on minimizing the packet delay transmission due to the long-distance progress it can achieve.  Figure 10 illustrates the impact of the independent factor nodes speed on the packet delivery ratio. We observe that the packet delivery ratio decreases drastically at higher node mobility in DSDV and AODV. The frequent breaking of links requires initiating a route search action to update the routing table, which increases the control message overhead and creates congestion in the network; the time-out it takes to update the routing table increases the end-to-end packet delay and decreases the network throughput. In a real-time application, the source node generates a high constant bit rate traffic, which increases the number of packets injected into the network. Hence, DSDV and AODV are unable to tolerate the packets streams with limited node memory resources. Consequently, the ratio of the dropped packets is high due to high node mobility. On the other hand, the proposed protocol is more resilient to frequent topology changes, thanks to the candidate selection mechanism, which allows the construction of the path on-the-fly, which makes it independent of node mobility. Moreover, the presence of asymmetric links creates shorter paths toward the destination, which increases the probability of reaching the destination with a minimum number of intermediate nodes. Overall, the FQ-AGO outperforms all other protocols due to link stability along the path, and its independents from any network topology change.

Impact of Network Size
The network throughput is the critical metric that illustrates the network scalability. For the scalability of a network, its capacity should grow linearly with the number of nodes. Furthermore, increasing the network size should not lead to performance degradation [3,4]. The main reason for the lack of scalability in routing protocols that run on multi-hop wireless networks is that the packet has to travel long distances, roughly n 1/2 hops, causing the bandwidth consumption per packet to rise rather quickly as the number of nodes increases [32].  Figure 11 shows the throughput performance evaluation as a function of the number of nodes. The FQ-AGO shows a considerable advantage over other routing protocols. FQ-AGO considers the distance progress, available throughput, transmission power, and node-link quality in the candidate selection process. Accordingly, it enhances the network performance by maximizing the one-hop node performance along the path, as one hop performance improvement contributes to the end-to-end performance. On the other hand, the throughput on AODV is low due to the rapid loss of wireless connections with neighbors as the network topology changed and network size grows, the routing control messages drastically increase, which generates considerable overhead in DSDV and AODV. The requirement of pre-computed routes in DSDV and AODV improves network connectivity at the expense of routing performance. Therefore, the throughput decreases linearly as the number of nodes increases. Regarding the packet delivery ratio, as shown in Figure 12, FQ-AGO demonstrates consistent performance in all network environment configurations due to selecting the most stable and effective candidate along the path to the destination, which leads to high packet delivery ratio. By contrast, for DSDV and AODV, the packet delivery ratio decreases drastically, because of the frequent link breakage, which requires initiating a route search to update the routing table. That increases the control messages' overhead and congests the network; the time-out it takes to update the routing table increases the end-to-end packet delay and remarkably decreases the packet delivery ratio. Figure 13 shows the average hop-count as a function of increasing the network size. Intuitively, the hop-count relies on the number of intermediate nodes participating in the packet relaying process. The results offer higher resolution and a clear relation between the number of nodes and the average hop-count. FQ-AGO takes advantage and makes full use of candidates that can transmit the packet for long distances. In other words, nodes with broad transmission range can reach the destination in fewer hops. On the other hand, DSDV and AODV both show an increase in the number of hop-counts to relay the transmitted packet. For example, DSDV calculates the shortest path to the destination; however, DSDV overlooks the presence of asymmetric links and relies on homogeneous candidates with the same transmission capability. In AODV, there is a noticeable routing performance degradation due to the rapid change in the network topology, a massive overhead initiated by intermediate nodes along the path to find a new path to the destination. The recovery process from losing end-to-end path comes with an extra cost of increasing delays and launching high overhead, which decays the overall network performance and, in the worst-case scenario, the percentage of dropped packets increases consequently.

Conclusions
In this paper, we propose FQ-AGO, a novel Fuzzy Logic Q-learning based Asymmetric Link Aware and Geographic Opportunistic Routing scheme. FQ-AGO handles asymmetric links that frequently arise in heterogeneous mobile ad hoc networks. FQ-AGO uses a distributed multi-hop ad hoc wireless communication without relying on any network infrastructure units. FQ-AGO utilizes a fuzzy logic approach to evaluate each direct wireless link of its neighbors and employs the Q-learning algorithm to select the most stable and reliable wireless communication link. Following the opportunistic paradigm, the proposed routing scheme novelties are presented concerning the candidate selection algorithm, where the fuzzy logic model captures the following performance metrics: one-hop throughput, expected distance progress, link quality to the potential candidate, and transmission power. FQ-AGO utilizes fuzzy logic to find the device heterogeneity inherent in MANETs, and exploit the presence of asymmetric links in the candidate set to increase the communication reachability substantially. The candidate selection algorithm improves network connectivity and routing performance by leveraging the presence of asymmetric links in the routing paths and their costs, which minimizes the average distance that each data packet must travel before reaching the final destination. The awareness of the presence of asymmetric links and leveraging asymmetric links opportunistically are the main factors in providing a scalable large-scale network, with the absence of network topology information.
We studied the performance of FQ-AGO via simulation while comparing it with the existing MANETs routing protocols DSDV, AODV, and GOR. For precise simulation analysis, we considered various network metrics to compare the performance of the routing protocols. We performed ns-3 simulation and considered the following performance metrics: hop-count, the available throughput, end-to-end latency, and packet delivery ratio for performance comparison. Simulation results demonstrate that FQ-AGO substantially outperforms the other protocols. FQ-AGO achieves up to 15%, 50%, and 45% higher throughput compared to DSDV, AODV, and GOR, respectively, while reducing by 50% end-to-end latency and the average number of hop-count. The simulation results show that FQ-AGO improves network connectivity and routing performances, particularly in the case of increasing the network size and higher nodes mobility. However, the performance of relevant routing protocols failed to scale and cope whenever the network size increased, or network topology changed.