Mobility, Residual Energy, and Link Quality Aware Multipath Routing in MANETs with Q-learning Algorithm

Abstract: In areas affected by natural disasters and in remote locations with intermittent cellular service and/or no Wi-Fi coverage, the easiest way to establish communication infrastructure and provide internet connectivity is to deploy end-to-end connections over Mobile Ad-hoc Networks (MANETs). However, the potential of MANETs is yet to be fully realized, as existing MANET routing protocols still suffer from major technical drawbacks related to the mobility, link quality, and battery constraints of the mobile nodes between the overlay connections. To address these problems, a routing scheme named Mobility, Residual energy, and Link quality Aware Multipath (MRLAM) is proposed for routing in MANETs. The proposed scheme makes routing decisions by determining the optimal route through energy-efficient nodes to maintain the stability, reliability, and lifetime of the network over a sustained period of time. The MRLAM scheme uses a Q-learning algorithm to select optimal intermediate nodes based on their current energy level, mobility, and link quality, and assigns positive and negative reward values accordingly. Compared with the well-known Multipath Optimized Link State Routing protocol (MP-OLSR) and MP-OLSRv2, the proposed routing scheme reduces energy cost by approximately 33% and 23%, end-to-end delay by 15% and 10%, packet loss ratio by 30.76% and 24.59%, and convergence time by 16.49% and 11.34%, respectively. Overall, the results indicate that the proposed MRLAM routing scheme significantly improves the performance of the network.


Introduction
In recent years, advances in wireless communication and internet service technologies have greatly heightened the prospects of wireless mobile computing applications [1]. Several wireless communication technologies, such as MANETs [2], Vehicular Ad-hoc Networks (VANETs) [3], Cognitive Radio (CR) [4], Wireless Sensor Networks (WSN) [5], Device to Device Communication (D2D) [6], Coordinated Multipoint (CoMP) [7], Internet of Things (IoT) [8], Carrier Aggregation (CA) [9], Passive Optical Network (PON) [10], Massive Multiple Input Multiple Output (MIMO) [11], Non-Orthogonal Multiple Access (NOMA) [12,13], and Fog Computing [14], are becoming increasingly popular due to their compelling performance and broad applicability in wireless communications. MANETs and VANETs are prominent among the wireless applications related to D2D networked systems. Smart devices in fifth generation (5G) wireless systems can be equipped not only with a variety of sensors, but also with different means of communication that enable access to the Internet. Technologies such as Wi-Fi, Bluetooth, NFC, and more recently Wi-Fi P2P (Wi-Fi Direct) can be used to enrich the smart side of these devices by enabling D2D communication and providing more functionality to enhance the user experience [15,16] and facilitate interactivity with the surrounding environment [17][18][19][20][21][22]. However, this level of user experience and interactivity may be seriously reduced in remote areas with intermittent cellular service and/or no Wi-Fi coverage. During calamitous events such as earthquakes, communication infrastructure may become dysfunctional or degraded due to failure or damage. In such situations, the MANET paradigm is the most convenient way to provide rapidly deployable, self-managed wireless links to end users. These features of MANETs have opened broader possibilities for extensive research in building the future 5G wireless network community [23][24][25].
A MANET is a self-forming and self-structured wireless network that does not rely on a centralized infrastructure for its operation. In other words, all mobile nodes/devices in a MANET collaborate with each other and act as routers for one another, thereby providing robust and effective operation throughout the whole network. Mobile nodes incorporate routing functionality, and each node can join and/or leave the network at will, depending on the capacity of its energy resources and the nature of its network topology [26]. A MANET is characterized by constant change in network topology, which often leads to frequent link failure, degraded transmission quality, and reduced network throughput [27]. To overcome these issues, it is imperative to design, develop, and implement a new generation of routing protocols that support robust and efficient routing in MANETs.
Routing protocols in MANETs are classified into three categories, namely proactive, reactive, and hybrid routing protocols, depending on the nature of the underlying routing information and the update mechanism they employ. Proactive routing protocols are also known as table-driven routing protocols because they store up-to-date routing information in the form of tables. Notable examples of proactive routing protocols include destination-sequenced distance-vector (DSDV) [28] and optimized link state routing (OLSR) [29]. Each node maintains and stores the network topology information in tabular form in order to keep a consistent view of the network as the topology changes, which is the very nature of ad-hoc wireless networks. Nodes that run a proactive routing protocol therefore periodically exchange up-to-date routing tables among themselves. Whenever a node needs to transmit packets, it first extracts routing information from the maintained table and uses it to route the packets. Thus, little time is required for route discovery, which leads to reduced end-to-end delay for data transmission between the source and destination nodes. However, this periodic exchange of routing information leads to high control packet overhead throughout the network.
Reactive routing protocols, on the other hand, are also known as on-demand routing protocols because routes are selected on demand. Good examples of reactive routing protocols are dynamic source routing (DSR) [30] and ad hoc on-demand distance vector (AODV) [31]. Nodes running a reactive routing protocol do not need to maintain any prior routing information; they exchange routing information only when there is a need for communication. Thus, a source node running a reactive routing protocol simply discovers routes on demand to establish a connection to a destination node; hence, little control packet overhead is generated for the maintenance of network topology information. Conversely, during route discovery in reactive protocols, nodes exchange topology information with one another, which typically involves flooding packets through the network. The reactive protocol then takes time to gather and analyze the network topology information. This process tends to prolong the end-to-end delay for data transmission between source and destination nodes [32].
Hybrid routing protocols such as the zone routing protocol (ZRP) [33] and the Secure Link State routing Protocol (SLSP) [34] handle routing by dividing the network into zones while combining the best features of reactive and proactive routing protocols to make timely and informed routing decisions. For instance, when a node wants to transmit packets to a destination node within the geographical zone of its immediate neighborhood, it uses the proactive approach. If the destination node is outside that zone, it uses the reactive approach instead. However, this approach increases the complexity, computational cost, and energy consumption of mobile nodes [35] during route selection, which degrades network performance due to repeatedly changing network zones and constant switching between routing protocols.

Contribution
Reactive routing protocols for MANETs are more practical due to their ability to maintain a reasonable network convergence time and minimize the flooding of packets in the route establishment process, especially in real-time communication scenarios. In practical terms, MANETs do not rely on a fixed infrastructure or any centralized entity for coordination and control between the mobile nodes. Thus, mobile nodes simultaneously act as hosts and as routers when establishing a connection between end users in the network. Therefore, whenever a route needs to be established between a pair of mobile nodes, the status of the intermediate nodes must be considered before determining the route. Routing in MANETs involves establishing a stable and reliable route between source and destination nodes. However, factors such as channel link quality, mobility, and the energy resource constraints of mobile nodes remain major challenges. These challenges are briefly discussed as follows:
Link Quality: The dynamic, time-varying characteristics of MANETs cause link failures. To overcome this, it is of paramount importance to consider the reliability of link quality, as it interacts with the Media Access Control (MAC) layer, in the selection of the optimal route.
Mobility: The network topology in MANETs is highly dynamic and changes in an unpredictable manner due to high displacement and the random addition/removal of intermediate nodes in the network, which can degrade the overall performance and disrupt the Quality of Service (QoS) of the network.
Energy constraints: The limited energy resources and processing power of mobile nodes are major constraints in the design of an optimal route, and they have a severe impact on route stability and the lifetime of the network.
Although a number of multiobjective routing metrics have been suggested in the literature, as discussed in Section 3, they do not combine multiple parameter metrics with decision-making techniques such as the Q-learning algorithm to simplify information exchange throughout the network and reduce routing overhead. To the best of our knowledge, none of these studies addresses the mobility, link quality, and energy efficiency of nodes simultaneously with the multipath concept in the selection of the optimal route. Therefore, in this paper, the proposed scheme considers the link quality, mobility, and energy resource constraints of mobile nodes together.
The rest of this paper is organized as follows. Section 3 reviews the existing literature on OLSR-based routing schemes. Section 4 describes the proposed MRLAM routing scheme in detail and presents the simulation parameters and the scenario used to validate the proposed scheme. Section 5 presents the results and discussion. Finally, the conclusion and future work are presented in Sections 6 and 7, respectively.

Related Works
Multipath routing protocols have been extensively researched in the last few decades, and researchers have designed several MANET paradigms that aim to provide reliable communication, ensure load balancing, and improve QoS [36]. Multipath routing is an additional heuristic dimension introduced into routing paradigms with the intention of improving end-to-end delay, reducing packet overhead, and maximizing network lifetime. In the first version of the OLSR proactive routing protocol introduced in [37], specific nodes called Multi-Point Relays (MPRs) are selected during routing to minimize traffic overhead and cope with the message flooding problem in the network. Each node selects its MPRs among its one-hop neighbors so that they cover its two-hop neighbors, and the MPRs are used to forward packets toward the destination. However, the excessive load induced on MPR nodes contributes to the rapid depletion of their batteries and increases the chance of link failure. Kots and Kumar [38] proposed fuzzy logic based routing schemes, using neuro-fuzzy and fuzzy-genetic techniques, to evaluate the quality attributes used in MPR node selection. They applied Soft Computing (SC) techniques to select qualified MPR nodes and thereby improve energy efficiency and network lifetime for reliable data transmission in MANETs. Novel strategies for selecting MPR nodes with a simplified OLSR routing protocol were proposed in [39], without adding signaling messages to forwarded packets. The main aim is to optimize network performance by reducing the number of signaling messages, collisions and traffic congestion, and energy consumption in the network. In [40], the authors proposed a proactive Multipath Optimized Link State Routing protocol (MOLSR), in which a new cross-layer metric and node discovery algorithm are employed to build two disjoint paths between the source and destination nodes. The first disjoint path is used for data communication, whereas the second acts as an alternate path in case a link failure occurs on the first. Moreover, the MOLSR routing protocol broadcasts fewer topology messages than OLSR for the discovery of multiple paths between source and destination. The reported results indicate that it provides a better route selection mechanism and reduces packet overhead in the network. In [41], a modified OLSR routing scheme called OLSRM is proposed for reliable routing and energy conservation; it selects mobile nodes based on energy awareness and drain rate. OLSRM aims to maximize MPR lifetime and minimize routing overhead in the network.
To address the scalability problem of flat routing schemes in ad-hoc networks, the authors in [42] presented the Heterogeneous OLSR routing (HOLSR) scheme, which is more suitable for large-scale heterogeneous networks. The HOLSR scheme utilizes multiple interfaces and reduces control message overhead to improve the performance of the routing mechanism. The authors in [43] introduced a multipath routing scheme called Quality OLSR routing (QOLSR), in which the path selection criteria are based on multiple QoS routing metrics such as bandwidth and delay. The paths are loop-free and node-disjoint, and are computed using the shortest-widest path algorithm. Moreover, a correlation factor is presented that defines the number of links between each pair of disjoint paths, in order to minimize interference between the paths and achieve better QoS with improved network resource utilization. Sarkar et al. [44] proposed a point-based mobility factor computing technique for route selection, which provides better routing performance in MANET scenarios with frequent topology changes. The point-based mobility factor is estimated from the pause time, speed, and moving direction of the mobile nodes for static and dynamic scenarios. The Mobility Aware Routing (MAR) technique then uses this mobility factor in the route selection process. It provides better QoS and performance for both static and highly dynamic networks. An energy-aware routing model named enhanced intellects masses optimizer energy-efficient and secure optimized link state routing (EIMO-ESOLSR) is proposed in [45] for energy-efficient and secure MPR selection in the OLSR routing scheme. MPR selection in the route discovery process is based on the willingness and the Composite Eligibility Index (CEI) of each node in the network. The willingness value is estimated from the available bandwidth, lifetime, and queue occupancy of the network nodes, whereas the CEI value is calculated from their power factor, misbehaving probability, and forwarding behavior. The results show that the EIMO-ESOLSR routing scheme reduces energy consumption, provides a secure selection of MPR nodes, and minimizes the flooding of packets in the network.
One of the limitations of MANETs is traffic congestion, which occurs when an excessive flow of packets is injected into a single mobile node that then carries most of the network traffic, quickly depleting the node's battery. As a result, routing approaches struggle to balance the load among the intermediate nodes of the network, which can significantly degrade network performance. The authors in [46] addressed a network utility maximization (NUM)-based congestion control paradigm for cross-layer designs in wireless networks. They considered that the problem of cross-layer congestion control arises along several dimensions, specifically elastic or inelastic traffic, same or multiple timescales, and single or multipath transmission. They also give a comprehensive discussion of the mathematical techniques and solutions used for cross-layer congestion control problems, including Successive Convex Approximation (SCA) methods, which depend strongly on initial conditions and network scenarios. Overall, they concluded that cross-layer congestion control in wireless networks can be solved by utilizing next-generation networks such as SDN and Cloud-RAN with an efficient central controller and network statistics. In [47], the authors introduced the Proactive Source Routing protocol (PSR) to facilitate opportunistic data forwarding in MANETs. Each node has full information about the network topology and periodically exchanges messages with its neighbor nodes. The exchanged messages contain link cost information based on the number of nodes along the path from source to destination. PSR maintains more network topology information than distance vector routing, making it well suited to source routing, since it incurs less routing overhead with improved data transmission capability. In [48], a routing protocol called Least Common Multiple based Routing (LCMR) is proposed for MANETs, which balances load among the different possible paths. LCMR distributes packets along multiple paths according to the computed routes in order to minimize the overall routing time.
To deal with the link instability, energy resource constraints, and node mobility of networks whose topology changes frequently, [49] introduced the Multipath-ChaMeLeon (M-CML) routing protocol to stabilize and enhance the QoS and performance of the network. The protocol helps to minimize routing overhead and the energy consumption of mobile nodes by cutting down the number of duplicate packets generated in the network. In a dynamically changing network topology, link quality is badly affected, resulting in frequent link failures. To solve this, a Smooth Mobility and Link Reliability-based OLSR (SMLR-OLSR) routing scheme is proposed in [50]. The Semi-Markov Smooth and Complexity Restricted mobility model (SMS-CR) is used to provide sufficient smoothness and low complexity for reliability-enhanced MPR selection, which facilitates longer MPR lifetime and less routing overhead in the network. In [51], the authors proposed the Ant-based Energy Aware Disjoint Multipath Routing Algorithm (AEADMRA), which considers a node's energy consumption as one of the node selection criteria. The algorithm belongs to the family of swarm intelligence methods, which exploit the ability of ants to solve complex problems by cooperating among themselves. The results show that it delivers better performance in terms of reducing routing complexity and improving the route discovery process. A scalable routing protocol named Link-stability and Energy Aware Routing protocol (LAER) has been proposed in [52], based on a joint metric of link stability and energy drain rate. A Biobjective Integer Programming (BIP) model is adopted to achieve an optimal solution in the route selection process. The BIP model selects as next hop the neighbor node that has better link stability and a lower energy consumption rate. The LAER scheme prolongs the network lifetime and maintains the robustness of the network. A parallel Disjointed Multipoint routing algorithm based on OLSR, called DMP-EOLSR, is proposed in [53]; it is a hybrid routing protocol that involves both reactive and proactive routing. During route computation, DMP-EOLSR considers the living time of nodes as well as the living time of links between nodes, based on the energy consumption and moving state of the mobile nodes. The living time of a node depends on its residual energy, whereas the living time of a link between nodes depends on the speed and direction of their movement. Moreover, in order to find multiple parallel node-disjoint or link-disjoint paths, assessment factors of node energy and an iterative procedure based on a modified Dijkstra algorithm are used. Dijkstra-based algorithms are well-known schemes for exploring multiple disjoint paths between source and destination nodes based on the topology information of the network. The proposed algorithm not only improves the stability of the selected paths with better parallel transmission capacity, but also reduces the computation cost of intermediate nodes in the route selection and route recovery mechanisms.
Another approach, the MP-OLSR routing protocol, is proposed in [54], where the Dijkstra algorithm is modified in order to find multiple routes in sparse and dense networks. It utilizes two cost functions to generate node-disjoint and link-disjoint paths. The results showed that the proposed algorithm achieves great flexibility by using different link cost metrics and cost functions. In addition, two new concepts were introduced, namely route recovery and loop detection, in order to improve the packet delivery ratio and reduce the chance of link failure on the optimal route when the network topology changes frequently. MP-OLSR improves overall network performance, especially in high-mobility and heavily loaded network scenarios. Yi et al. [55] proposed a multipath extension of MP-OLSR, called MP-OLSRv2, which belongs to the category of hybrid multipath routing protocols that combine proactive and reactive routing concepts for data transmission. MP-OLSRv2 uses two incremental cost functions on the link cost between nodes to generate multiple node-disjoint and link-disjoint paths. Using the multipath Dijkstra algorithm, it discovers multiple disjoint paths, instead of a single path, for data transmission in a frequently changing network topology. The MP-OLSRv2 mechanism is divided into two phases (topology sensing and route computation) to maintain multiple routes between source and destination pairs. It delivers reliable communication and distributes the traffic load over multiple paths, thereby ensuring load balancing among the nodes of the network. Multipath data transmission is a way of increasing network throughput while maintaining robust links with reliable transmission in the network. The selection of highly qualified intermediate nodes in the route mapping process remains one of the most important and critical issues for efficient data transmission in MANETs. Thus, it is one of the key factors kept in mind in the design and implementation of the MRLAM routing protocol proposed in this paper. However, the MP-OLSR and MP-OLSRv2 routing schemes do not consider the residual energy, mobility, and link quality status of the intermediate nodes when selecting the route from source to destination. In the MRLAM scheme, a route is established between source and destination nodes with the active participation of intermediate nodes that have a sufficient amount of energy, low mobility, and better link quality.
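The multipath Dijkstra idea described above for MP-OLSR and MP-OLSRv2 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the graph representation, and the concrete cost functions `f_p` (penalty for links on a path already found) and `f_e` (penalty for links touching an intermediate node of that path) are assumptions chosen for illustration. After each shortest-path run the penalties are applied, so later runs tend to avoid links and nodes that are already in use.

```python
import heapq


def dijkstra(graph, src, dst):
    """Return the cheapest src->dst path as a list of nodes, or None."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == dst:
            break
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def multipath_dijkstra(graph, src, dst, n_paths,
                       f_p=lambda c: 3 * c, f_e=lambda c: 2 * c):
    """Find up to n_paths routes, penalizing reused links and nodes.

    f_p and f_e are illustrative cost functions, not values from the paper.
    """
    # Work on a copy so the caller's link costs stay untouched.
    g = {u: dict(nbrs) for u, nbrs in graph.items()}
    paths = []
    for _ in range(n_paths):
        path = dijkstra(g, src, dst)
        if path is None:
            break
        paths.append(path)
        used_links = set(zip(path, path[1:]))
        inner = set(path[1:-1])  # intermediate nodes of the path
        for u in g:
            for v in g[u]:
                if (u, v) in used_links or (v, u) in used_links:
                    g[u][v] = f_p(g[u][v])   # discourage link reuse
                elif u in inner or v in inner:
                    g[u][v] = f_e(g[u][v])   # discourage node reuse
    return paths


# Toy topology: two node-disjoint routes from s to d.
topology = {
    "s": {"a": 1.0, "b": 1.0},
    "a": {"s": 1.0, "d": 1.0},
    "b": {"s": 1.0, "d": 1.0},
    "d": {"a": 1.0, "b": 1.0},
}
routes = multipath_dijkstra(topology, "s", "d", n_paths=2)
```

Larger penalties push the search toward strictly disjoint paths, while penalties close to 1 allow paths to share links when few alternatives exist.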

System Model
A communication network in a mobile ad-hoc network environment can be represented as a graph G = (V, L), where V is the set of nodes and L is the set of links between the nodes. The link between a source node s and a destination node d is denoted by (s, d), with s, d ∈ V. Whenever node s is a neighbor of node d, the two can communicate directly with each other. In a centralized network, the network topology information is maintained by a network controller. In a decentralized network, the network topology information is maintained through information exchange between each node and its neighbors. The proposed routing scheme is designed for a decentralized network, in which all mobile nodes work as hosts as well as routers for other nodes to build the network infrastructure. In the proposed scheme, route selection and data transmission are based on collaboration among the network nodes, which exchange topological information with each other. Due to the dynamic movement of nodes, the network topology changes continuously in an unpredictable manner. In this scenario, if two nodes do not share a direct link, they use routing protocols to establish a connection, relying on intermediate nodes between them to transfer data. The proposed MRLAM routing scheme uses several key factors in the route computation, namely the energy consumption, the link quality of the channel, and the mobility of nodes, to enhance the energy efficiency and QoS of the communicating nodes, as discussed below:

Energy Consumption Estimation of Mobile Nodes
The energy consumption of nodes plays a significant role in the selection of robust and qualified intermediate nodes, which collaborate to improve the overall network performance. The mobile nodes in MANETs are battery operated and their energy is limited. Therefore, the battery life of the nodes should be taken into account when selecting a set of intermediate nodes for route establishment to a destination. Mobile nodes operate in four states: transmission, receiving, idle, and sleep. The energy consumption in the idle and sleep states is low; hence, only the transmission and receiving states of nodes are considered in the design of the routing algorithm of the proposed scheme. In the proposed scheme, the energy of each node is linearly discharged as a function of the current load based on the Coulomb counting technique [56], which accumulates the dissipated Coulombs from the beginning of the discharge cycle. It estimates the residual energy from the difference between the accumulated value and the prerecorded full-charge value of the battery capacity. The proposed scheme uses a generic radio energy model [57], where $E^{i}_{Consumption}(t+\tau)$ refers to the energy consumption of node i over a period τ, that is, the energy consumed during transmission, receiving, exchanging routing information control packets, and the internal operation time of the node. Energy consumption is estimated based on the circuitry power consumption, the number of packets exchanged, and the time spent in each state. Thus, the energy consumption of node i in the transmission state for transmitting n packets can be calculated as follows:
where $P^{i}_{Transmission}(t+\tau)$ denotes the power, in watts, consumed during the transmission of n data packets and the exchange of routing information control packets with neighbor nodes over the period τ. Similarly, the energy consumed by node i in the receiving state for receiving m packets, including control packets, over the period τ is calculated as follows:
where $P^{i}_{Receive}(t+\tau)$ represents the power consumed by node i while exchanging control messages and receiving m packets over the period τ. Moreover, all nodes consume energy when they perform internal operations such as connecting, managing, caching, and updating their database during the period τ, which is denoted by $E^{i}_{operation}(t+\tau)$. Therefore, the total energy consumption of node i over the period τ in the transmission, reception, and operation states is the sum $E^{i}_{Consumption}(t+\tau) = E^{i}_{Transmission}(t+\tau) + E^{i}_{Receive}(t+\tau) + E^{i}_{operation}(t+\tau)$. Finally, the residual energy of node i updated at time τ is denoted as follows:
The above expression provides information on the amount of remaining energy and the amount of energy that will be consumed by node i at time t + τ. Based on this, the MRLAM routing scheme gives higher priority to nodes that have higher residual energy and a lower energy consumption rate when establishing a connection between end users. Overall, this approach maximizes the network lifetime and decreases the chance of link failure in the network.
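To make the energy bookkeeping in this section concrete, the following Python sketch tracks one node's residual energy. It rests on assumptions that are not taken from the paper: energy in each state is modelled as packets × power × per-packet airtime, the residual energy is the previous residual minus the total consumption, and all names (`NodeEnergy`, `consume`, `pick_next_hop`, `t_pkt_s`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeEnergy:
    """Hypothetical per-node battery bookkeeping (names are illustrative)."""
    residual_j: float  # remaining battery energy, in joules

    def consume(self, n_tx, m_rx, p_tx_w, p_rx_w, t_pkt_s, e_op_j=0.0):
        """Subtract the energy used over one period and return it.

        Assumed model: energy = packets * power (W) * airtime per packet (s).
        """
        e_tx = n_tx * p_tx_w * t_pkt_s   # transmission state
        e_rx = m_rx * p_rx_w * t_pkt_s   # receiving state
        total = e_tx + e_rx + e_op_j     # plus internal operations
        self.residual_j = max(0.0, self.residual_j - total)
        return total


def pick_next_hop(neighbors):
    """Return the id of the neighbor with the highest residual energy."""
    return max(neighbors, key=lambda nid: neighbors[nid].residual_j)


node = NodeEnergy(residual_j=100.0)
spent = node.consume(n_tx=10, m_rx=5, p_tx_w=2.0, p_rx_w=1.0,
                     t_pkt_s=0.01, e_op_j=0.05)
best = pick_next_hop({"a": NodeEnergy(5.0), "b": NodeEnergy(9.0)})
```

Ranking by residual energy alone is the simplest possible policy; the scheme described above also takes the consumption rate into account.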

Node Mobility Estimation
Node mobility estimation characterizes the unpredictable movement of a node in a network, which helps to estimate the probability of link failure and link instability in advance. The random waypoint (RWP) model is extensively used for the estimation of node mobility and for simulation analysis of mobile wireless networks due to its simplicity [58]. Several parameters, such as the speed, move time, pause time, and separation distance of the mobile nodes, are configured for the estimation of node mobility. Based on these parameters, the RWP model periodically estimates the mobility of nodes and selects a random destination point ("waypoint") in the network. Once a node reaches its selected waypoint, it pauses for a defined duration (the pause time), and the process is repeated. The waypoints are chosen uniformly at random in the network area. In the proposed scheme, nodes can move randomly in an area of operation and pause for an arbitrary period of time. The maximum and minimum velocities ($v_{max}$ and $v_{min}$) of each node are configured based on the RWP model, along with its pause time duration, so as to simulate real deployment scenarios. In the RWP model, node speeds are uniformly distributed over the range $[v_{min}, v_{max}]$, where $v_{min}$ and $v_{max}$ are the minimum and maximum velocities of the nodes, respectively. The node speed distribution $f_{(s,d)}(v)$ in the dynamic network is described as below:
where the velocity v lies in the range $v \in [v_{min}, v_{max}]$, and $p_{pause}$ and $p_{mov} = (1 - p_{pause})$ are the probabilities that a node is in the pause and moving states, respectively. $\delta(v)$ denotes the Dirac delta function, whose value is updated according to the velocity of the node: it varies from 0 to 1, taking the value 0 when the velocity of a node is maximum and 1 when the velocity is minimum. The pause time of nodes is taken to be $t_p \geq 0$, and the probability $p_{pause}$ can be calculated as follows:
where E[D] denotes the expected distance covered by the node's movement over the whole trip. If a node starts in the pause state, its pause time is set to $t_p$. If the node starts in the moving state, it selects a speed in the range $v \in [v_{min}, v_{max}]$. Recall that $p_{pause}$ is the probability that a mobile node is in the pause state, while $t_p$ is the time spent by the node in the pause state. The mobility of nodes in a distributed dynamic network is calculated as follows:
The stability of an established path between the source and destination nodes depends on the mobility of the intermediate nodes that form the path. Therefore, a path will be unstable if the mobility of its constituent nodes is high. Based on the above mobility value, the MRLAM routing scheme prioritizes nodes with lower mobility, which increases route stability and network lifetime and reduces the probability of link failure.
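As a rough illustration of the pause probability discussed in this section, the sketch below uses the standard steady-state result for the random waypoint model rather than the paper's own expression, which may differ. The assumptions are a fixed pause time $t_p$, speeds drawn uniformly from $[v_{min}, v_{max}]$ with $v_{min} > 0$, and a known expected trip length E[D]; the function names are hypothetical.

```python
import math


def expected_move_time(mean_distance, v_min, v_max):
    """Expected duration of one movement leg.

    For V ~ Uniform(v_min, v_max), E[1/V] = ln(v_max / v_min) / (v_max - v_min),
    so the expected leg time is E[D] * E[1/V] when D and V are independent.
    """
    if not 0 < v_min < v_max:
        raise ValueError("require 0 < v_min < v_max")
    return mean_distance * math.log(v_max / v_min) / (v_max - v_min)


def pause_probability(t_pause, mean_distance, v_min, v_max):
    """Long-run fraction of time a node spends paused."""
    t_move = expected_move_time(mean_distance, v_min, v_max)
    return t_pause / (t_pause + t_move)


# Example: 10 s pauses, 500 m average leg, speeds between 1 and 20 m/s.
p_pause = pause_probability(10.0, 500.0, 1.0, 20.0)
p_mov = 1.0 - p_pause
```

A longer pause time or a shorter average leg raises `p_pause`, which corresponds to a less mobile and therefore more stable intermediate node.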

Link Quality Estimation of Mobile Nodes
The link quality of the channel is one of the key factors in selecting a reliable route, and it is estimated from the Expected Number of Transmissions (ETX) and Expected Any-path Count (EAX) parameters [59]. The ETX is the expected number of transmissions required before a packet is successfully delivered to the next hop; it reflects link quality and packet loss in both directions of a link. Node $i$ is an intermediate node whose probabilities of receiving and forwarding data packets are $\lambda_i$ and $\mu_i$, respectively. These values are measured through probe messages that are transmitted over a dedicated link before the actual data transmission. Each node broadcasts a probe message to all of its neighbor nodes and maintains records of the transmitted probe messages for a designated period of $\tau$ seconds. The packet delivery probability from a source node to its neighbor nodes is then calculated from $n_w$, the number of probe messages delivered in a period of $w$ seconds, subject to the condition $w > \tau$. The probability that a neighbor node $i$ receives a packet from at least one forward node is $p_{for} = 1 - \prod_{i>s}(1 - \mu_s \lambda_i)$, and a corresponding probability describes node $i$ successfully delivering a packet to at least one backward node. From these two probabilities, the expected number of transmissions that node $i$ needs to make is obtained using Equation (10).

The ETX gives the expected number of transmissions needed to successfully deliver a packet from source to destination through the forward node set of node $i$. The ETX metric selects all forwarding candidates and prioritizes them based on Equation (10). However, the sets it selects also include forwarders that increase packet overhead in the network. To address this issue, the EAX metric is employed to select only the optimal forwarding sets and to prioritize them based on opportunistic routing. This decreases interference with neighbor nodes and, to some extent, reduces packet overhead and the number of transmissions. The hop-count parameter is useful for optimizing routing performance with a minimum hop-count value. In addition, EAX minimizes the number of intermediate nodes and the number of transmissions required for reliable packet delivery on the optimal route without adversely affecting network performance. $C^{s,d}$ denotes the forwarded set of candidates from node $s$ to node $d$, and $C_i^{s,d}$ denotes the forwarded set of node $i$, with a priority ranging from 0 to 1. Therefore, the delivery probability between the source and $C_i^{s,d}$ is $p_{for}$, and vice versa (considering both the forward data and the backward ACK transmissions). The EAX value from source to destination through the forwarded set is then calculated using Equation (11).

In that expression, the EAX metric selects and enables the potential candidate pairs of intermediate nodes based on the packet delivery probability. The potential candidates on the optimal route from source to destination are selected as follows. First, the set of potential candidates $C^{s,d}_{potential}$ is determined with the ETX metric based on the best path: if $ETX(s, d) > ETX(j, d)$, node $j$ is added to the potential candidates for the route. Second, the subset of $C^{s,d}_{potential}$ having the smallest ETX value to the destination is selected as the actual candidate set $C^{s,d}$. A further potential node is added to the forwarded set $C^{s,d}$ when it decreases the value of $EAX(s, d)$ by a factor $\delta$, which is a configurable parameter. This step is repeated until no further potential node can be added.
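The two-stage candidate selection described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: `eax` uses the commonly cited form of EAX for a prioritized candidate list, each candidate's ETX to the destination stands in for its remaining cost, and $\delta$ is read as a minimum absolute decrease in EAX. The function names and all numeric values are hypothetical.

```python
from math import prod


def eax(cands):
    """Expected any-path count for a prioritized candidate list.

    cands: list of (f, e) pairs in priority order, where f is the delivery
    probability from the sender to the candidate and e is the candidate's
    own expected cost to the destination (0 for the destination itself).
    Assumed form: (1 + sum_i e_i f_i prod_{j<i}(1 - f_j)) / (1 - prod_i(1 - f_i)).
    """
    miss_all = prod(1.0 - f for f, _ in cands)
    if not cands or miss_all >= 1.0:
        return float("inf")  # no candidate can ever receive the packet
    total, miss_higher = 1.0, 1.0
    for f, e in cands:
        total += e * f * miss_higher   # candidate forwards only if higher ones missed
        miss_higher *= 1.0 - f
    return total / (1.0 - miss_all)


def select_candidates(etx_src, neighbors, delta):
    """neighbors maps a node name to (delivery probability, ETX to destination)."""
    # Stage 1: ETX filter, keep neighbors closer to the destination than the source.
    potential = sorted(
        (n for n, (_, etx) in neighbors.items() if etx < etx_src),
        key=lambda n: neighbors[n][1],
    )
    # Stage 2: add a node only if it lowers EAX by at least delta (assumption).
    chosen, best = [], float("inf")
    for n in potential:
        trial = sorted(chosen + [n], key=lambda m: neighbors[m][1])
        value = eax([neighbors[m] for m in trial])
        if best - value >= delta:
            chosen, best = trial, value
    return chosen, best


if __name__ == "__main__":
    nbrs = {"b": (0.6, 1.5), "e": (0.5, 2.0), "f": (0.2, 4.0)}
    print(select_candidates(5.0, nbrs, delta=0.05))
```

With these illustrative values all three neighbors pass the ETX filter, but the third is rejected in the EAX stage because adding it does not lower the EAX of the set.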
The selection of intermediate potential candidate nodes with the ETX and EAX metrics is shown in Figure 1. The source node s, the destination node d, and the next-hop neighbor nodes b, e, and f of node c have different delivery probabilities. The ETX metric selects all three neighbor nodes b, e, and f because the paths from these nodes to d have smaller ETX values than the path from s to d. These three neighbor nodes are sorted in ascending order of the ETX value obtained using Equation (10). The EAX metric, on the other hand, selects only two nodes, b and e: with these two candidates, the EAX from s to d is already less than the EAX from f to d, so adding f to the candidate set does not decrease the EAX between s and d. Furthermore, EAX prioritizes node b over node e because b has the smaller EAX value based on Equation (11).
Q-Learning is a prevalent reinforcement learning algorithm applied in wireless multipath networks as a decision-making policy to improve network performance. Its simplicity and model-free nature are the salient attributes that have made it widely adopted. Moreover, Q-Learning generates an optimal reward value over the state-action pairs of environments that are described by Markov decision processes [60]. The Q-learning agent learns through trial-and-error interactions with a dynamic environment, described by a tuple (S, A, R) of states, actions, and rewards, as shown in Figure 2. The objective of the agent is to maximize its long-term reward in the selection procedure, which is obtained from the immediate and discounted rewards used to estimate the Q-value $Q_t^i(s_t^i, a_t^i)$, $(s \in S, a \in A)$, for taking a given action $a_t^i$ in the given state $s_t^i$. The Q-Learning algorithm learns from this interaction and adopts the best policy to solve routing problems in a distributed manner by utilizing the Q-value update equation.
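For reference, the textbook tabular Q-learning update, which is consistent with the roles of the learning rate $\alpha$, the discount factor $\gamma$, and the maximization over next-state actions discussed in this section, can be written as follows. The per-node index notation is adapted to the symbols used here and is an assumption, not necessarily the authors' exact expression.

```latex
Q^{i}_{t+1}\left(s^{i}_{t}, a^{i}_{t}\right)
  = (1-\alpha)\, Q^{i}_{t}\left(s^{i}_{t}, a^{i}_{t}\right)
  + \alpha \left[ R^{i}_{t+1}
  + \gamma \max_{a^{i} \in A} Q^{i}_{t}\left(s^{i}_{t+1}, a^{i}\right) \right]
```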
In the Q-value update, $\alpha$ is the learning rate, with $0 \le \alpha \le 1$; it depends on how quickly the Q-value varies with the dynamic topology of the network. If the Q-value changes very fast, the learning rate approaches 1 and the agent takes only the new reward value into account. The parameter $\gamma \in [0, 1]$ is the discount factor, which determines the importance of future rewards. A low value, i.e., $\gamma = 0$, indicates that the system is myopic and takes only the result of the current action into account. By contrast, as $\gamma$ approaches 1, future rewards play an important role in determining the optimal actions. In addition, the term $\max_{a^i} Q_t^i(s_{t+1}^i, a^i)$ models the maximum expected future reward over the possible actions $a^i$ in the next state $s_{t+1}^i$.

Based on the decision factors described above, i.e., link quality, mobility, and energy consumption of nodes, a new reward function for node $i$, updated at each instant $t + 1$, is formulated. In this reward function, $w_1$, $w_2$, and $w_3$ are the weights assigned to each criterion based on the available status of the nodes (i.e., mobility, link quality, and residual energy); each weight ranges from 0 to 1, and the weights sum to 1. The values of the criteria and their weights determine how sensitive the future reward value $R_{t+1}^i$ is to each decision factor. The Q-value of node $i$ is therefore updated from the previously stored value and the new reward $R_{t+1}^i$. Lastly, based on the updated expression, the next-hop node with the highest Q-value is selected for the optimal route.

The flowchart of the MRLAM routing scheme in Figure 3 describes the process of selecting the optimal route from source to destination based on the Q-learning algorithm. MRLAM selects the optimal route by applying a decision policy based on Q-learning, which can effectively reduce redundant information in the network topology and enhance overall network performance. The route selection based on mobility, residual energy, and link quality with the Q-Learning algorithm is described below.

At every edge and node between the source and destination nodes:
1. Choose action $a_t^i$ from the state $s_t^i$ using the best policy derived from $Q_{t+1}^i$.
2. Compute the Q-value at that time step.
3. Compute the new reward function based on mobility, residual energy, and link quality.
4. Update the Q-value with the new reward function.
5. Select the optimal route with the updated Q-value.
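The steps above can be sketched as a small Python routine. It is a minimal sketch, not the authors' implementation: the reward is assumed to be a weighted sum of the three criteria, each criterion is assumed to be pre-normalized to [0, 1] with larger values being better, and the update uses the standard tabular Q-learning form. The names `NodeStatus`, `reward`, `q_update`, `best_next_hop`, and `route_step`, and the default values of `alpha` and `gamma`, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    """Per-neighbor inputs, assumed normalized to [0, 1], larger is better."""
    mobility: float         # e.g. 1.0 = static node (assumed scaling)
    link_quality: float
    residual_energy: float


def reward(status, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted-sum reward (assumed form); weights follow the order w1, w2, w3."""
    w1, w2, w3 = weights
    if min(weights) < 0 or abs(w1 + w2 + w3 - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return (w1 * status.mobility
            + w2 * status.link_quality
            + w3 * status.residual_energy)


def q_update(q_old, r_new, q_next_max, alpha=0.5, gamma=0.8):
    """Standard tabular Q-learning update (assumed to match the text)."""
    return (1.0 - alpha) * q_old + alpha * (r_new + gamma * q_next_max)


def best_next_hop(q_values):
    """Return the neighbor with the highest Q-value."""
    return max(q_values, key=q_values.get)


def route_step(q_values, statuses, q_next_max, weights, alpha=0.5, gamma=0.8):
    """Refresh the Q-value of every neighbor, then pick the next hop."""
    for hop, status in statuses.items():
        r = reward(status, weights)
        q_values[hop] = q_update(q_values.get(hop, 0.0), r,
                                 q_next_max.get(hop, 0.0), alpha, gamma)
    return best_next_hop(q_values)


if __name__ == "__main__":
    neighbors = {"b": NodeStatus(0.9, 0.8, 0.7), "e": NodeStatus(0.2, 0.5, 0.4)}
    print(route_step({}, neighbors, {}, weights=(0.2, 0.5, 0.3)))
```

Setting `alpha=1.0` makes the update keep only the new reward term, and `gamma=0.0` makes it ignore future rewards, which corresponds to the two limiting cases discussed in the text.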

Simulation Setup
Extensive simulations were conducted in MATLAB 2018a to evaluate the performance of the MRLAM routing scheme, which is compared with MP-OLSR and its extended version MP-OLSRv2 under scenarios with different node speeds. Random topologies with a maximum of 49 nodes are generated over a field of 1000 m × 1000 m. The nodes are placed in the middle of the simulation area, and seven data sources are randomly chosen for each scenario; all sources transmit their data to the destination node. Constant Bit Rate (CBR) sources generate the network traffic with a CBR size of 512 bps, and the simulation is run 200 times in total. The 802.11a standard and wireless physical layer modules are used in the simulation to provide a high level of accuracy. The User Datagram Protocol (UDP) is used as the transport layer protocol, which, in contrast to the Transmission Control Protocol (TCP), provides a simple transmission model. The Random Waypoint (RWP) mobility model is used, with node speeds varied from 10 m/s to 60 m/s. The other performance evaluation parameters are presented in Table 2.
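The stated parameters and the Random Waypoint movement pattern can be summarized in a short Python sketch. The original simulations were run in MATLAB, so this is only an illustration: the 10 m/s speed steps, the zero pause time, and the names `SimConfig` and `rwp_legs` are assumptions rather than values taken from the simulation setup.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SimConfig:
    num_nodes: int = 49            # maximum number of nodes
    field_m: float = 1000.0        # side length of the 1000 m x 1000 m field
    num_sources: int = 7           # randomly chosen data sources
    runs: int = 200                # number of simulation runs
    speeds_mps: tuple = (10, 20, 30, 40, 50, 60)  # assumed 10 m/s steps
    pause_s: float = 0.0           # hypothetical pause time, not stated


def rwp_legs(cfg, speed_mps, n_legs, rng):
    """Generate Random Waypoint legs for one node.

    Each leg is (start, end, travel_time_s, pause_s). The node starts in the
    middle of the field, moves in a straight line to a uniformly chosen
    waypoint at the given speed, pauses, and repeats.
    """
    if speed_mps <= 0:
        raise ValueError("speed must be positive")
    x, y = cfg.field_m / 2, cfg.field_m / 2
    legs = []
    for _ in range(n_legs):
        nx, ny = rng.uniform(0, cfg.field_m), rng.uniform(0, cfg.field_m)
        travel_time = math.hypot(nx - x, ny - y) / speed_mps
        legs.append(((x, y), (nx, ny), travel_time, cfg.pause_s))
        x, y = nx, ny
    return legs


if __name__ == "__main__":
    cfg = SimConfig()
    for leg in rwp_legs(cfg, speed_mps=10.0, n_legs=3, rng=random.Random(1)):
        print(leg)
```

Passing an explicit `random.Random` instance keeps each run reproducible, which is convenient when the same topology has to be replayed for every routing scheme.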

Evaluation Criteria
The objective of this extensive simulation is to evaluate the performance of the MRLAM routing scheme, and the following metrics are adopted for the performance evaluation:

(i) Throughput: The network throughput represents the average amount of data delivered between the source and destination node pairs during a period of network operation time. It is expressed in Kbps.

(ii) Average End-to-End Delay (Avg. EED): The ratio of the total time taken for data transmission between the source and destination nodes to the number of packets received at the destination node, calculated as follows:

$$\text{Avg. EED} = \frac{\text{Total time taken for packet transmission}}{\text{Number of packets received}} \qquad (16)$$

(iii) Packet Loss Ratio (PLR): The percentage ratio of the number of packets dropped by the malicious nodes selected in the route to the total number of packets sent during the transmission. It is calculated from $N_s$ and $N_r$, which denote the number of bits transmitted from the source and received at the destination node, respectively.

(iv) Energy Consumption: The total amount of energy consumed by all nodes for data transmission throughout the duration of the simulation. The energy consumption of each node is obtained at the end of each simulation, factoring in the initial energy of each node. In the energy consumption formula for transmitting data, $E_{Total}$ is the total energy of a node, which is 3600 mAh in the simulation model.

(v) Energy Cost: The energy cost metric represents the energy efficiency of the operational model over the period of data transmission between end node pairs. It is evaluated as the ratio of the energy consumption of the nodes to the total number of successfully received packets in the network.
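The metric definitions above translate directly into small helper functions. The sketch below follows the verbal definitions; the PLR expression, 100 × (N_s − N_r) / N_s, is an assumption inferred from the description of $N_s$ and $N_r$, and all function names and sample values are hypothetical.

```python
def throughput_kbps(bits_received, duration_s):
    """Average delivered data rate in Kbps over the observation period."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return bits_received / duration_s / 1000.0


def avg_eed(total_time_s, packets_received):
    """Equation (16): total transmission time per received packet."""
    if packets_received <= 0:
        raise ValueError("no packets received")
    return total_time_s / packets_received


def plr_percent(n_sent, n_received):
    """Packet loss ratio in percent (assumed form: 100 * (Ns - Nr) / Ns)."""
    if n_sent <= 0:
        raise ValueError("nothing was sent")
    return 100.0 * (n_sent - n_received) / n_sent


def energy_cost(energy_consumed, packets_received):
    """Energy consumed per successfully received packet."""
    if packets_received <= 0:
        raise ValueError("no packets received")
    return energy_consumed / packets_received


if __name__ == "__main__":
    print(throughput_kbps(512_000, 10.0))
    print(avg_eed(12.0, 600))
    print(plr_percent(1000, 750))
    print(energy_cost(56.0, 700))
```

Raising an error on a zero denominator is a design choice of this sketch; a simulation harness might instead prefer to report such runs as missing values.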

Results and Discussion
In this section, results acquired from the simulation of MRLAM, MP-OLSR, and MP-OLSRv2 routing schemes under different node speed scenarios are presented.The results of mentioned routing schemes are compared on the following metric, i.e., throughput, average end-to-end delay, packet loss ratio, energy consumption, energy cost, and convergence time.Furthermore, critical analysis of acquired results is also conducted in this section.

Throughput
The throughput of the MRLAM, MP-OLSR, and MP-OLSRv2 routing schemes with increasing node speed is shown in Figure 4. MRLAM consistently provides better throughput than MP-OLSR and MP-OLSRv2 in all node speed scenarios. The higher throughput of MRLAM reflects the mobility awareness factored into its optimal route selection process, whereas the other two routing schemes do not consider mobility, which matters especially when a link failure occurs as a result of high node mobility. MP-OLSR employs route recovery and loop detection, while MP-OLSRv2 employs topology sensing and route computation mechanisms to select the intermediate nodes in a route. These mechanisms need to transmit extra packets in the network when a route is established, which reduces the number of packets received at the destination in a given time period. The route selected by MRLAM usually has lower mobility or a higher energy level than other routes; the link is therefore more stable and experiences very few packet drops, which in turn maximizes the throughput. Link failure in MRLAM routing is estimated using the ETX and EAX parameters. Based on the values of these parameters, the proposed routing scheme diverts the packet flow toward the intermediate nodes that have better link quality. As the node speed increases, the number of link breaks with MRLAM remains lower than with MP-OLSR and MP-OLSRv2, although the throughput of every scheme decreases steadily. When the node speed varies from 10 m/s to 60 m/s, throughput decreases from 47.39 kbps to 39.87 kbps for MP-OLSR, from 49.17 kbps to 40.87 kbps for MP-OLSRv2, and from 50.86 kbps to 42.63 kbps for MRLAM. The proposed routing scheme selects the route based on the status of the link quality metric, which reduces the number of transmissions and increases channel utilization efficiency, so that a reliable link is selected. This, in turn, increases the throughput of the whole network. Compared with the other schemes, the simulation results demonstrate that MRLAM performs better at selecting intermediate nodes that are more qualified in terms of low mobility, energy level, and link quality. As a result, link failures are less frequent and the route recovery process is invoked less often, which improves the overall network throughput.
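The endpoint values quoted above can be turned into relative throughput gains with a few lines of arithmetic. The sketch uses only the six figures stated in the text; the dictionary layout and the function name `relative_gain_percent` are illustrative.

```python
# Throughput in kbps at the two endpoint node speeds (10 m/s and 60 m/s),
# copied from the values quoted in the text.
THROUGHPUT_KBPS = {
    "MP-OLSR": {10: 47.39, 60: 39.87},
    "MP-OLSRv2": {10: 49.17, 60: 40.87},
    "MRLAM": {10: 50.86, 60: 42.63},
}


def relative_gain_percent(proposed, baseline):
    """Percentage by which `proposed` exceeds `baseline`."""
    return 100.0 * (proposed - baseline) / baseline


def mrlam_gains(table=THROUGHPUT_KBPS):
    """Gain of MRLAM over each baseline at each quoted node speed."""
    gains = {}
    for scheme, by_speed in table.items():
        if scheme == "MRLAM":
            continue
        gains[scheme] = {
            speed: relative_gain_percent(table["MRLAM"][speed], value)
            for speed, value in by_speed.items()
        }
    return gains


if __name__ == "__main__":
    for scheme, by_speed in mrlam_gains().items():
        for speed, gain in by_speed.items():
            print(f"MRLAM vs {scheme} at {speed} m/s: {gain:.2f}%")
```

On these figures, the throughput advantage of MRLAM is roughly 7% over MP-OLSR and between 3% and 4.5% over MP-OLSRv2 at both ends of the speed range.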

Average End-to-End Delay (Avg. EED)
Figure 5 compares the end-to-end delay of the MRLAM, MP-OLSRv2, and MP-OLSR routing schemes as the node speed varies between 10 m/s and 60 m/s. All the schemes show almost similar results for node speeds up to 30 m/s. From 40 m/s onward, the end-to-end delay of the conventional MP-OLSR and MP-OLSRv2 routing schemes increases substantially. This is because neither scheme considers the mobility of intermediate nodes when selecting an optimal route, which leads to increased delay in high-speed scenarios. MP-OLSR and MP-OLSRv2 select the intermediate nodes based on multiple node-disjoint and link-disjoint metric cost functions, respectively. These routing schemes therefore forward packets through a longer route from source to destination, which induces propagation and transmission delays in the network. MRLAM, in contrast, controls the end-to-end delay by exploiting the EAX metric, which minimizes the number of intermediate nodes and the number of retransmissions required for reliable packet delivery on the optimal route. MRLAM uses the pause time and moving time factors of the RWP model to estimate node mobility, which stabilizes the dynamic network so that packets are delivered in less time. In addition, using the Q-learning algorithm, MRLAM selects intermediate nodes that have low mobility and better link quality when determining the best path among the available alternate paths for forwarding data packets toward the destination, which minimizes the end-to-end delay. Overall, MRLAM reduces the end-to-end delay by approximately 15% and 10% compared to the MP-OLSR and MP-OLSRv2 routing schemes, respectively, at a node speed of 60 m/s.

Packet Loss Ratio (PLR)
The MRLAM routing scheme maintains a PLR of less than 30% in all node speed scenarios, which is a significant performance improvement for efficient data transmission in a MANET environment. This result is attributed to the selection of the optimal route in step with the Q-learning process over multiple parameters during route establishment. The Q-learning algorithm gives higher rewards to a node that has better link quality with its neighbor node, which reduces the chance of frequent link failure and increases the probability of successful packet delivery. MRLAM exploits EAX to decrease the number of intermediate nodes on the selected route, which reduces the number of packets dropped during data transmission. In addition, EAX decreases the number of retransmission attempts and reduces the link failure probability resulting from the dynamic nature of the network, thereby keeping the PLR lower than that of the other multipath schemes. In establishing a reliable route, MRLAM avoids nodes that change their positions frequently and quickly exhaust their battery power. For high-speed node scenarios, Figure 6 shows that MRLAM achieves better performance than both routing schemes, with improvements of approximately 30.76% and 24.59% over MP-OLSR and MP-OLSRv2, respectively.

Energy Consumption
A comparison of node energy consumption during the network operation time is shown in Figure 7. The result shows that MRLAM performs better than the other routing schemes in terms of energy consumption during path establishment and the exchange of routing topology information, because it selects intermediate nodes based on a low EAX value for probe messages. Moreover, as established above, MRLAM selects the path with a lower chance of link failure, which results in lower energy consumption as it forwards data packets toward the destination node. In effect, the Q-learning technique gives high reward values to the nodes that have lower energy consumption, thereby enhancing energy utilization during data transmission. However, the energy consumption of the nodes increases with node speed. When the node speed increases from 10 m/s to 60 m/s, energy consumption increases from approximately 56.67 mAh to 57.26 mAh for MP-OLSR, from 56.43 mAh to 56.926 mAh for MP-OLSRv2, and from 56.15 mAh to 56.72 mAh for MRLAM. Overall, MRLAM consumes less energy than the MP-OLSR and MP-OLSRv2 schemes because the source node forwards traffic toward the intermediate nodes that have the highest energy level, unlike its counterpart routing schemes.

Energy Cost
Simulation results for energy cost per packet are shown in Figure 8. The proposed routing scheme attains a lower energy cost because its node selection mechanism selects only the nodes with the highest energy level. The displacement of intermediate nodes along the path causes the energy levels of the nodes to vary as mobility increases. MRLAM also selects nodes with better link quality and lower speed to decrease energy consumption and packet loss ratio; this awareness is not used in the other routing protocols. Increased node movement makes the network topology more complex for the sensing and route computation processes. However, the proposed MRLAM routing scheme uses the Q-learning algorithm together with the nodes of minimum energy consumption for packet transmission, which helps to reduce the energy cost as well as the complexity of the network. The Q-learning algorithm periodically updates the state-action value with learning rate α based on the energy consumption of the intermediate nodes and the number of packets transmitted and received. Moreover, MRLAM reduces network packet flooding and the energy cost of packets forwarded to the destination compared with the other schemes. Across the full range of node speeds, the simulation results show a decrease in energy cost of 33% and 23% compared to the MP-OLSR and MP-OLSRv2 routing schemes, respectively.

Convergence Time
The convergence time for all routing schemes is depicted in Figure 9. All the schemes show similar results for node speeds up to 30 m/s. As the node speed increases to 40 m/s, the convergence time of the conventional MP-OLSR and MP-OLSRv2 routing schemes increases substantially. This is because the other routing schemes select a longer path from the source to the destination node based on the link cost function and multiple disjoint metrics, while MRLAM chooses the shortest path with more stable nodes and better links, which reduces the data packet transmission delay. In addition, MRLAM uses the mobility and link quality status of the mobile nodes, which minimizes frequent changes of the network topology. It also uses the Q-learning algorithm, which quickly updates the routing information whenever a change occurs in the network, and this reduces the convergence time of the network. In the MRLAM routing scheme, fewer intermediate nodes need to converge, so the load on any given node or communication link is minimized. This reduces the computation cost at the intermediate nodes of the paths and improves communication efficiency. Overall, at a node speed of 60 m/s, the proposed MRLAM routing scheme recorded the lowest convergence time, approximately 16.49% and 11.34% lower than that of the MP-OLSR and MP-OLSRv2 routing schemes, respectively.

Conclusion
This paper presents an evaluation and comparative study of three routing protocols in a MANET environment, conducted through a series of simulations with varying node speeds. A multipath routing scheme called MRLAM is proposed, in which the routing decision is based on the residual energy, link quality, and mobility status of the network nodes. The MRLAM scheme aggregates these parameters into a single metric using the Q-learning process in order to make an optimal routing decision. It evaluates the status of nodes during route computation and topology sensing to determine the optimal routes in a converged network. Moreover, the proposed scheme copes with link failure and sustains the network lifetime by avoiding nodes with lower residual energy and higher mobility when selecting optimal and stable paths between all pairs of source and destination nodes. The simulation results show that MRLAM has approximately 33% and 23% lower energy cost per packet than the MP-OLSR and MP-OLSRv2 routing schemes, respectively. The results also corroborate the premise that MRLAM attains better throughput of successfully delivered packets in all node-speed scenarios. In addition, frequent changes in the network topology increase the computational complexity of the existing MP-OLSR and MP-OLSRv2 routing processes, because the route computation needed to discover new routes becomes more intricate. In the proposed routing scheme, Q-learning aggregates the parameters related to energy, mobility, and link quality into one comprehensive metric, which reduces this complexity and avoids the control overhead caused by broadcasting multiple parameters separately. Furthermore, the results show that the MRLAM scheme decreases the packet loss ratio by up to approximately 30.76% and 24.59% compared with the MP-OLSR and MP-OLSRv2 routing schemes, respectively. Overall, the proposed MRLAM scheme outperforms the conventional MP-OLSR and its extended version MP-OLSRv2 in terms of throughput, average end-to-end delay, packet loss ratio, and energy consumption rate.
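
As an illustrative sketch only, and not the implementation evaluated in this paper, the aggregation of residual energy, mobility, and link quality into a single Q-learning metric could be expressed as follows. The weights, the reward threshold, the learning rate `ALPHA`, the discount factor `GAMMA`, and all function names are assumptions introduced for illustration; the three inputs are assumed to be normalized to [0, 1].

```python
# Hypothetical parameters, chosen for illustration only.
ALPHA = 0.5   # learning rate (assumed)
GAMMA = 0.8   # discount factor (assumed)


def node_reward(residual_energy, mobility, link_quality,
                weights=(1 / 3, 1 / 3, 1 / 3), threshold=0.5):
    """Return +1.0 for a favourable candidate node and -1.0 otherwise.

    High residual energy, low mobility, and high link quality are treated
    as favourable. The weighted score and threshold are assumptions.
    """
    w_e, w_m, w_l = weights
    score = w_e * residual_energy + w_m * (1.0 - mobility) + w_l * link_quality
    return 1.0 if score >= threshold else -1.0


def q_update(q, node, next_hop, reward, next_hop_q_values):
    """Standard Q-learning update for the (node, next_hop) pair."""
    best_next = max(next_hop_q_values, default=0.0)
    old = q.get((node, next_hop), 0.0)
    q[(node, next_hop)] = (1 - ALPHA) * old + ALPHA * (reward + GAMMA * best_next)
    return q[(node, next_hop)]


def best_next_hop(q, node, neighbors):
    """Pick the neighbor with the highest learned Q-value."""
    return max(neighbors, key=lambda n: q.get((node, n), 0.0))


# Usage: node "b" is stable and well charged, node "f" is fast and depleted.
q_table = {}
q_update(q_table, "s", "b", node_reward(0.9, 0.1, 0.8), [])
q_update(q_table, "s", "f", node_reward(0.2, 0.9, 0.3), [])
chosen = best_next_hop(q_table, "s", ["b", "f"])
```

Because a single Q-value per neighbor summarizes all three parameters, a node in this sketch compares one number per candidate rather than three separately advertised metrics.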

Future Works and Challenges
The MRLAM routing scheme, reinforced by the Q-learning algorithm, can be used to improve QoS not only for MANETs but also for scenarios with highly mobile nodes, such as drones and remote-controlled vehicles. Because Q-learning algorithms carry a large computation cost, reducing the energy consumption for large numbers of mobile nodes should be considered in future research. The MRLAM routing scheme can be further extended to other large-scale network deployments and popular multihop wireless network scenarios, including typical WSN, VANET, and MANET-IoT scenarios. Moreover, the queue length of the network nodes should be considered in route selection to minimize traffic congestion and packet overhead in the network. This will result in a new scheme that suits more scenarios and meets the requirements of various applications.

Figure 1. The general scenario for potential node selection.

Figure 2. Q-learning approach for route selection.

Figure 3. Route selection procedure of the Mobility, Residual energy and Link quality Aware Multipath (MRLAM) routing scheme.

Figure 4. Network throughput with different speed of nodes.

Figure 5. End-to-end delay with different speed of nodes.

Figure 6. Packet loss ratio with different speed of nodes.

Figure 7. Node energy consumption with different speed of nodes.

Figure 8. Node energy cost with different speed of nodes.

Figure 9. Convergence time of routing schemes with different speed of nodes.

to estimate the consumed energy for each state of a node, such as $E^{i}_{Initial}(t)$, $E^{i}_{Residual}$, $E^{i}_{Consumption}$, and $E^{i}_{operation}(t)$. The energy consumed in the different node states and the time spent in each state affect the performance of the proposed routing scheme. The initial energy level of intermediate node $i$ at time $t$ is denoted by $E^{i}_{Initial}(t)$, and the residual energy $E^{i}_{Residual}(t+\tau)$ of node $i$ after a period of time $\tau$ is calculated as follows:
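
For illustration, assume simple bookkeeping in which the energy consumed over $\tau$ is the sum, across node states, of power draw multiplied by the time spent in that state, and the residual energy is the initial energy minus that sum. Under that assumption the calculation could be sketched as below. The state names, power values, and function names are hypothetical and are not taken from the paper.

```python
def consumed_energy(state_power_w, state_duration_s):
    """Energy in joules consumed over the period: sum of power * dwell time.

    state_power_w:    mapping state name -> power draw in watts (assumed values)
    state_duration_s: mapping state name -> seconds spent in that state
    """
    return sum(power * state_duration_s.get(state, 0.0)
               for state, power in state_power_w.items())


def residual_energy(initial_j, state_power_w, state_duration_s):
    """Residual energy after the period, clamped at zero for a depleted node."""
    return max(0.0, initial_j - consumed_energy(state_power_w, state_duration_s))


# Hypothetical node: 100 J initial energy, 70 s observation period.
power = {"transmit": 2.0, "receive": 1.0, "idle": 0.5}
durations = {"transmit": 10.0, "receive": 20.0, "idle": 40.0}
remaining = residual_energy(100.0, power, durations)
```

In this example each of the three states consumes 20 J, so the node is left with 40 J of its initial 100 J.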

Table 1. Selection of potential intermediate nodes in the optimal route based on the Expected Number of Transmissions (ETX) and Expected Any-path Count (EAX) metrics.
On the other hand, EAX selects only two nodes, b and e. With these two candidates, the EAX from s to d is already less than the EAX from f to d, so adding f to the candidate set does not decrease the EAX between s and d. Furthermore, EAX prioritizes node b over node e because b has the smaller EAX value according to Equation (11). Table 1 compares how ETX and EAX select and prioritize the potential intermediate candidates from the source node to the destination node.
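
The candidate selection described above can be sketched as follows. This is a minimal illustration that assumes the commonly used recursive definition of EAX and a simple greedy rule (a neighbor is added only if it lowers the sender's EAX). The delivery probabilities and per-neighbor EAX values are hypothetical and do not reproduce the values in Table 1.

```python
def etx(p_forward, p_reverse=1.0):
    """Expected transmission count of a single link."""
    return 1.0 / (p_forward * p_reverse)


def eax(candidates):
    """EAX of a sender for a prioritized candidate list.

    candidates: list of (delivery_probability, eax_to_destination),
    ordered from highest to lowest priority.
    """
    expected_rest = 0.0   # expected remaining cost after one successful send
    higher_missed = 1.0   # probability that all higher-priority nodes missed
    for p, eax_c in candidates:
        expected_rest += eax_c * p * higher_missed
        higher_missed *= 1.0 - p
    if higher_missed >= 1.0:          # no candidate can receive the packet
        return float("inf")
    return (1.0 + expected_rest) / (1.0 - higher_missed)


def select_candidates(neighbors):
    """Greedy selection: try neighbors in order of their own EAX to the
    destination and keep one only if it lowers the sender's EAX."""
    ordered = sorted(neighbors.items(), key=lambda item: item[1][1])
    chosen, best = [], float("inf")
    for name, metrics in ordered:
        trial = eax([m for _, m in chosen] + [metrics])
        if trial < best:
            chosen.append((name, metrics))
            best = trial
    return [name for name, _ in chosen], best


# Hypothetical neighbors of source s: name -> (delivery prob., EAX to d).
neighbors_of_s = {"b": (0.8, 1.0), "e": (0.6, 1.5), "f": (0.5, 4.0)}
candidate_set, eax_s_d = select_candidates(neighbors_of_s)
```

With these assumed values, the sketch keeps b and e in that priority order and rejects f, since the EAX from f to d (4.0) exceeds the EAX that s already achieves with b and e.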

4.4. Node Selection in the Optimal Route Based on Q-Learning Algorithms