An Overview of Machine Learning-Based Energy-Efficient Routing Algorithms in Wireless Sensor Networks

Machine learning (ML) technology has shown unique advantages in many fields and performs excellently in applications such as image recognition, speech recognition, recommendation systems, and natural language processing. Recently, the applicability of ML to wireless sensor networks (WSNs) has attracted much attention. Because resources in WSNs are limited, improving resource utilization and achieving power-efficient load balancing have become critical issues. Traditional green routing algorithms pursue these goals by reducing energy consumption and prolonging network lifetime through optimized routing schemes. However, they usually suffer from poor flexibility, consideration of only a single factor, and reliance on accurate mathematical models. ML techniques can quickly adapt to environmental changes and integrate multiple factors into routing decisions, which provides new ideas for intelligent energy-efficient routing in WSNs. In this paper, we survey this area and propose a hypothetical theoretical model that formulates ML as an effective method for building power-efficient green routing schemes that overcome the limitations of traditional green routing methods. The study also reviews past, present, and future progress in green routing schemes for WSNs. The contents of this paper will appeal to a wide range of audiences interested in ML-based WSNs.


Introduction
Wireless sensor networks (WSNs) are an important technology that enables sensors to acquire and collect various sensing data in the monitoring area [1][2][3] and to support intelligent data processing and decision-making [4,5]. In a typical WSN, a large number of sensor nodes self-organize to sense and process data and communicate with the sink [6]; the sensor nodes transmit the collected data to the sink, which integrates and processes the data and uploads them to a server [7,8]. WSNs have many advantages, such as easy deployment [9], high reliability [10,11], and low power consumption [12][13][14], which is why they are widely used in environmental monitoring [15][16][17], medical care [18][19][20], industrial monitoring, and other fields [21][22][23].
However, the limited power and processing capacity of sensor nodes constrain the applications and shorten the lifetime of WSNs [24][25][26]. Generally, sensors in WSNs have limited power supplies that cannot be replaced once the network is deployed [27][28][29]. Energy is therefore the most precious resource in WSNs, and power-efficient schemes can prolong network lifetime [30,31]. How to use the limited resources efficiently, balance the load among nodes [32], and extend the network lifetime as far as possible is a critical issue in WSNs [33][34][35]; power-efficient routing algorithms in particular can greatly reduce energy consumption and extend the lifetime of WSNs [36][37][38].
Traditional green routing schemes focus on clustering and on selecting special nodes to control the data flow and extend the lifetime of WSNs [39][40][41]. The low-energy adaptive clustering hierarchy (LEACH) routing algorithm [42] divides sensor nodes into multiple clusters and uses higher-level cluster head nodes to receive and process the sensing data of each cluster, so member nodes avoid the energy cost of forwarding. In [43], data forwarding was separated from data transmission by deploying assistant nodes, which prevents some nodes from dying prematurely due to frequent forwarding. However, these methods have several drawbacks. First, they depend on precise mathematical models that are very difficult to formulate, and they consume considerable energy. Second, they do not adapt to varying network topologies and scales, which makes WSNs prone to congestion and limits their practicality. New methods are therefore needed to alleviate these problems [44].
Machine learning (ML)-related techniques have recently helped to address the limitations of traditional green routing in WSNs [45][46][47]. ML provides a versatile and flexible paradigm for using data and computation to solve complex problems, which matches the requirements of designing efficient routing algorithms in WSNs [48][49][50].
Despite the increasing interest in ML in the WSN domain, a comprehensive overview focusing on ML for green routing algorithms in WSNs is lacking. To promote the application of ML-based routing algorithms in WSNs, this paper presents a comprehensive overview of recent research in this field. It aims to bridge ML and routing algorithms in WSNs by offering interested practitioners an up-to-date overview that can further the development of this field.
Since it is difficult to recharge sensors after deployment, it is necessary to consider how to use the limited energy of each node rationally and extend the lifetime of WSNs. ML offers a generic and flexible paradigm for solving complex problems, which fits the requirements of energy-efficient routing in WSNs well. The novel contributions of this paper are as follows: (1) we present a comprehensive overview of advanced research on green routing algorithms in WSNs, focusing specifically on the use of ML; (2) we propose a hypothetical theoretical model of ML-based green routing algorithms; because ML can automatically learn features from data in a given environment without prior knowledge of the underlying distribution, we envision that the proposed model will outperform traditional routing algorithms; (3) we present the challenges of implementing ML for green routing algorithms in WSNs and identify future research directions around valuable unresolved issues.
The rest of this paper is structured as follows. Section 2 provides an overview of traditional green routing algorithms in WSNs and classifies and compares them from a novel perspective. Section 3 proposes our theoretical model and describes existing work on ML for green routing in WSNs. Section 4 introduces applications of ML-based routing algorithms in WSNs. Section 5 analyzes the challenges of ML-based routing algorithms in WSNs. Section 6 concludes the paper and highlights future research directions.

Green Routing Algorithms in WSNs
Among the factors that consume energy in WSNs, communication among sensors is the largest. The routing scheme determines the forwarding path between sender and receiver, so an efficient routing algorithm can substantially reduce communication cost and extend the lifetime of WSNs.
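To make the link between communication and energy concrete, the sketch below uses the first-order radio model that is widely used in WSN routing studies (it was introduced together with LEACH). The parameter values (E_elec = 50 nJ/bit, ε_fs = 10 pJ/bit/m², ε_mp = 0.0013 pJ/bit/m⁴) are common simulation defaults assumed here for illustration; they are not taken from this survey.

```python
import math

# Assumed first-order radio model parameters (common simulation defaults).
E_ELEC = 50e-9        # J/bit, electronics energy per bit (TX or RX)
EPS_FS = 10e-12       # J/bit/m^2, free-space amplifier coefficient
EPS_MP = 0.0013e-12   # J/bit/m^4, multipath amplifier coefficient
D0 = math.sqrt(EPS_FS / EPS_MP)  # crossover distance between the two regimes (~87.7 m)


def tx_energy(bits: int, distance: float) -> float:
    """Energy (J) to transmit `bits` over `distance` metres."""
    if distance < D0:
        return bits * (E_ELEC + EPS_FS * distance ** 2)   # free-space: grows with d^2
    return bits * (E_ELEC + EPS_MP * distance ** 4)       # multipath: grows with d^4


def rx_energy(bits: int) -> float:
    """Energy (J) to receive `bits`; independent of distance."""
    return bits * E_ELEC


if __name__ == "__main__":
    for d in (25, 50, 100, 200):
        print(f"{d:>4} m: {tx_energy(4000, d) * 1e3:.3f} mJ")
```

Because the amplifier term grows with the square or fourth power of distance, long single-hop links dominate the energy budget, which is the effect the routing schemes below try to control.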
In most existing surveys of green routing protocols in WSNs, routing algorithms are categorized by network structure, topology, and similar criteria; few classifications are based on the approaches used to improve the energy efficiency of WSNs. In this paper, a novel classification is proposed, as shown in Figure 1, which divides existing energy-efficient routing algorithms into three categories according to their energy-saving scheme: setting special nodes, energy-efficient scheduling, and optimizing data flow. Specifically, setting special node-based routing algorithms include setting hierarchical nodes and setting special functional nodes. Energy-efficient scheduling-based routing algorithms can be classified into static node scheduling and mobile node scheduling. Optimizing data flow-based routing algorithms can be classified into single-path and multi-path routing schemes.

Setting Special Node-Based Routing Algorithms
In WSNs, each sensor node not only transmits its own sensing data but also forwards data from other sensor nodes. Additionally, nodes closer to the sink must relay more packets than nodes farther away, which unbalances network energy and shortens the WSN lifetime. Setting special nodes can relieve or even solve these issues. Special node-based routing algorithms are classified into two categories: hierarchical node-based routing algorithms [51,52] and special function node-based routing algorithms [53,54].

Hierarchical Node-Based Routing Schemes
In hierarchical node-based routing algorithms, high-level nodes act as an intermediate layer between low-level nodes and the sink and are responsible for receiving, fusing, and delivering data from low-level nodes to the sink. By taking on additional communication tasks, high-level nodes reduce the energy that low-level nodes spend on forwarding data.
LEACH [51] is a classic routing algorithm based on hierarchical nodes. Its system model is shown in Figure 2. Nodes in LEACH are divided into cluster heads (CHs) and cluster members according to the roles they play. Cluster members collect data from their surrounding area and send them to the CH of their cluster; the CH then processes the data and transmits them to the sink. LEACH balances energy consumption within a cluster by adaptively electing appropriate CHs. However, direct communication between CHs and the sink leads to unbalanced energy consumption between clusters, because transmission energy increases dramatically with communication distance. CHs far from the sink therefore consume more energy, which causes load imbalance among clusters and prevents LEACH from being applied to large-scale WSNs.
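The adaptive CH election mentioned above can be sketched with the threshold used in the original LEACH proposal: in round r, each eligible node draws a random number and becomes a CH if it falls below T(n) = p / (1 - p·(r mod 1/p)), where p is the desired fraction of CHs; nodes that have served as CH within the last 1/p rounds are not eligible. The function names, the history dictionary, and the choice p = 0.05 in the usage lines are illustrative assumptions.

```python
import random


def leach_threshold(p: float, r: int) -> float:
    """LEACH threshold T(n) for a node that is still eligible in round r."""
    period = round(1 / p)                      # every node serves once per 1/p rounds
    return p / (1 - p * (r % period))


def elect_cluster_heads(nodes, last_ch_round, p, r, rng=random):
    """Return the set of nodes that elect themselves CH in round r.

    last_ch_round maps node id -> last round in which it was CH (absent if never);
    it is updated in place for the newly elected heads.
    """
    period = round(1 / p)
    heads = set()
    for n in nodes:
        last = last_ch_round.get(n)
        if last is not None and r - last < period:
            continue                           # served recently: not in the eligible set G
        if rng.random() < leach_threshold(p, r):
            heads.add(n)
            last_ch_round[n] = r
    return heads


if __name__ == "__main__":
    rng = random.Random(42)
    history = {}
    for r in range(3):
        print(r, sorted(elect_cluster_heads(range(100), history, p=0.05, r=r, rng=rng)))
```

The threshold rises to 1 in the last round of each 1/p-round epoch, so every node serves as CH exactly once per epoch, which is how LEACH rotates the CH energy burden.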
To overcome this defect, the energy-efficient concentric clustering scheme (EECCS) [52] was proposed, which constructs a multi-hop path among CHs according to their distance from the sink node. This path eases the imbalance of the energy load among clusters. EECCS selects cluster heads based on a node weight value w, computed from E_Remain, the remaining energy of the sensor node; E_Average, the remaining total energy of the cluster; and d, the distance. EECCS assumes that clusters contain different numbers of nodes: the cluster farthest from the sink should have the fewest nodes and the nearest cluster the most, because the farthest cluster will consume the least energy.

Special Function Node-Based Schemes
In special function node-based schemes, special function nodes are typically more capable nodes equipped with additional hardware modules. Unlike general sensing nodes, they can assist with positioning, handle communication between clusters, and help forward data. Like hierarchical nodes, they reduce the communication cost of other nodes by taking on additional tasks themselves.
Topology analysis based on node spatial distribution (NSD) [53] introduces helping nodes, which do not originate data packets of their own but only help other nodes relay theirs. NSD has two forwarding modes: user mode and helping mode. In user mode, packets are forwarded only among user nodes. In helping mode, a node transmits its data to the nearest helping node within its communication range, and the data are then transmitted to the sink via the helping network. By separating data forwarding from data transmission through helping nodes, NSD prevents some user nodes from dying prematurely due to frequent forwarding. However, the nodes in the helping network also consume energy, and NSD provides no mechanism to balance the load within the helping network.
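A minimal sketch of the mode decision described for NSD: a node forwards to the nearest helping node within its communication range (helping mode) and otherwise relays among user nodes (user mode). The 2-D coordinates, the `comm_range` parameter, and the user-mode policy of picking the in-range user neighbor closest to the sink are illustrative assumptions, not details taken from [53].

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def choose_next_hop(node: Point,
                    helpers: Sequence[Point],
                    user_neighbors: Sequence[Point],
                    sink: Point,
                    comm_range: float) -> Tuple[str, Optional[Point]]:
    """Return (mode, next_hop) for one packet leaving `node`."""
    in_range = [h for h in helpers if math.dist(node, h) <= comm_range]
    if in_range:
        # Helping mode: hand the packet to the nearest helping node.
        return "helping", min(in_range, key=lambda h: math.dist(node, h))
    # User mode (assumed policy): relay via the in-range user neighbour closest to the sink.
    candidates = [u for u in user_neighbors if math.dist(node, u) <= comm_range]
    if not candidates:
        return "user", None            # no neighbour reachable
    return "user", min(candidates, key=lambda u: math.dist(u, sink))
```

Preferring a helping node whenever one is reachable is what shifts relaying work away from user nodes; the load inside the helping network is left unbalanced, as noted above.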
The color-theory-based energy-efficient routing algorithm (CEER) [54] deploys four anchor nodes equipped with global positioning system (GPS) devices to locate the other nodes. These anchor nodes play a role similar to that of CHs in LEACH, collecting data from nodes and forwarding them directly to the sink. As shown in Figure 3, CEER builds a database of geographic location information in which different RGB values correspond to different positions; the farther a node is from the sink, the greater its RGB value. Each node computes its RGB value with the help of the anchor nodes, the anchor nodes upload this value to the sink, and the sink looks up the node's geographic location in the database. CEER has three stages: route construction, data forwarding, and improvement. CEER lowers the cost of positioning by using four anchor nodes and balances the energy load by selecting CHs. However, CEER sets additional time slots in the data forwarding stage to check whether cluster members have packets to send, which increases network delay. Moreover, anchor nodes not only help position other nodes but also collect and relay their data packets, which makes anchor nodes costly.
Hierarchical nodes and special function nodes are both designed to relieve the burden on other nodes and to schedule power consumption among nodes. The difference is that hierarchical nodes manage the lower-level nodes, while special function nodes provide services to the sensor nodes. A comparison of setting special node-based routing algorithms is shown in Table 1.
Table 1. Comparison of setting special node-based routing algorithms.

Scheme | Energy Consumption | Energy Load Balance | Scalability
LEACH [51] | Saves energy in data gathering; consumes energy in single-hop communication between CHs and the sink | Balanced within clusters | Low
EECCS [52] | Saves energy in data gathering and through multi-hop communication between CHs and the sink | Good | Low
NSD [53] | Reduces energy consumption through helping nodes | Good | High
CEER [54] | Anchor nodes bear the relaying energy consumption | Good | Medium

As shown in Table 1, among hierarchical node-based routing algorithms, high-level nodes that communicate with the sink directly consume more energy than those that use multi-hop communication. In terms of energy load balance, LEACH balances the energy load within each cluster by electing appropriate CHs, but the load among clusters remains imbalanced. EECCS can balance the energy load among clusters, but it is difficult to implement in practice. Routing algorithms based on hierarchical nodes share a common weakness: poor scalability. In special function node-based routing algorithms, the special nodes save energy effectively, achieve good energy load balance, and scale better.

Energy-Efficient Scheduling-Based Routing Algorithms
In WSNs, sensor nodes not only transmit sensing data but also relay packets from other nodes to the sink. Energy-efficient scheduling-based routing algorithms dynamically adjust nodes' communication modes according to a predefined policy to optimize performance and prolong network lifetime. In this paper, these algorithms are classified into two groups. The first is based on static node scheduling [55,56]; schemes in this group mostly use sleep scheduling to reduce the energy consumption of WSNs. The second is based on mobile node scheduling [57,58], which extends network lifetime by moving nodes purposefully.

Static Node Scheduling-Based Schemes
In static node scheduling-based schemes, all nodes are fixed once they have been deployed. The problem that static scheduling addresses is how to minimize energy cost while keeping the network operating properly. These schemes mostly use sleep scheduling: some nodes stay awake to keep the network running, and the rest enter a dormant state in which they neither transmit nor receive data packets. Because fewer nodes are working, fewer paths exist between sensor nodes and the sink, so improper scheduling can greatly increase the number of hops between a source node and the sink and thus the delay.
Electronics 2021, 10, 1539 6 of 24
The connected-K neighborhood scheduling algorithm (CKN) was proposed in [55]; it assigns each node a random ranking. Figure 4a shows node C and its neighbor nodes. After node C receives the rankings of its neighbors, it compares them with its own. If every awake neighbor has a lower ranking than C and C has more than K awake neighbors, node C can sleep. Assuming that K is 3, as shown in Figure 4b, no awake neighbor has a higher ranking and more than 3 neighbors are awake, so node C goes to sleep. To ensure that every node has a chance to sleep, the rankings change randomly in every scheduling cycle. CKN saves energy by letting some nodes sleep and reduces end-to-end delay by maintaining a ranking list. However, random ranking cannot guarantee that nodes with low remaining energy will be the ones that sleep.
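The sleep test as it is described above can be sketched as follows. This follows the simplified description in this section rather than the full CKN algorithm of [55], and the helper names `new_ranks` and `should_sleep` are hypothetical. Ranks are redrawn at the start of every scheduling cycle so that each node gets a chance to sleep.

```python
import random
from typing import Dict, Hashable, Iterable


def new_ranks(nodes: Iterable[Hashable], rng: random.Random) -> Dict[Hashable, float]:
    """Draw a fresh random rank for every node at the start of a scheduling cycle."""
    return {n: rng.random() for n in nodes}


def should_sleep(node, awake_neighbors, ranks, k) -> bool:
    """Simplified CKN-style test as described in the text (not the full rule of [55])."""
    awake = list(awake_neighbors)
    # Condition 1: no awake neighbour has a higher rank than this node.
    none_higher = all(ranks[v] < ranks[node] for v in awake)
    # Condition 2: more than k neighbours remain awake around this node.
    return none_higher and len(awake) > k
```

Because the ranks are random, the test is blind to residual energy, which is the weakness noted above: a nearly depleted node may keep drawing low ranks and stay awake.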
The geographic routing-oriented sleep scheduling algorithm (GSS) [56] is a trade-off routing protocol. Before the data forwarding phase, GSS explores all possible paths from the source nodes to the sink, keeps the paths separate, and optimizes them by making them as short as possible. In the data forwarding phase, the source node or forwarding node chooses the neighbor nearest to the sink as its next hop, based on the received geographic location information. In addition, GSS requires that the nodes nearest to the sink never sleep. As shown in Figure 4c, node C could sleep under CKN, but it cannot sleep under GSS because of its distance from the sink. These never-sleeping nodes are a bottleneck in GSS: they consume energy much faster than nodes that can rest, which leads to the premature death of the WSN.
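The next-hop rule of GSS's data forwarding phase (choose the neighbor nearest to the sink) is greedy geographic forwarding. A minimal sketch, assuming every node knows its neighbors' coordinates; returning `None` at a local minimum, where no neighbor is closer to the sink than the current node, is an assumption made for illustration, since how GSS handles that case is not described here.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def greedy_next_hop(current: str, neighbors: Dict[str, List[str]],
                    pos: Dict[str, Point], sink: str) -> Optional[str]:
    """Pick the neighbour closest to the sink, provided it makes geographic progress."""
    here = math.dist(pos[current], pos[sink])
    best = min(neighbors[current], key=lambda v: math.dist(pos[v], pos[sink]), default=None)
    if best is None or math.dist(pos[best], pos[sink]) >= here:
        return None  # local minimum (assumed: report failure rather than recover)
    return best


def greedy_route(src: str, neighbors: Dict[str, List[str]],
                 pos: Dict[str, Point], sink: str) -> Optional[List[str]]:
    """Follow greedy hops until the sink; distance strictly shrinks, so no loops occur."""
    path = [src]
    while path[-1] != sink:
        nxt = greedy_next_hop(path[-1], neighbors, pos, sink)
        if nxt is None:
            return None
        path.append(nxt)
    return path
```

Every route funnels through the few neighbors of the sink, which is why GSS keeps them awake and why they become the energy bottleneck.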

Mobile Node Scheduling-Based Schemes
In WSNs with static sensor nodes, energy distribution becomes uneven as sensor nodes keep communicating. The distance from the source node to the sink directly affects communication energy cost: the greater the distance, the more energy is consumed. To reduce power consumption, mobile node scheduling-based routing schemes move the source nodes or the sink toward the target area. Mobile nodes can optimize the transmission path between the source node and the destination node and reduce the energy consumption of the network.
To save power efficiently, a global energy balancing routing protocol (GEBRP) was proposed [57], which adopts a virtual grid-based network model. In GEBRP, node mobility scheduling has two phases: the diffusion phase and the supplementary phase. In the diffusion phase, as shown in Figure 5, nodes in high-coverage areas are scheduled to move to low-coverage areas to balance network coverage. In the supplementary phase, nodes that have failed or have low residual energy are replaced by nodes with high residual energy. GEBRP thus balances node distribution in the diffusion phase and improves network robustness in the supplementary phase. Meanwhile, GEBRP assumes that only a few nodes need to stay awake while the others can switch to sleep mode, so when a virtual grid contains multiple nodes, GEBRP significantly reduces energy consumption.
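A toy sketch of the diffusion idea on a virtual grid: nodes are moved one at a time from the most crowded cell to the emptiest one until cell occupancy differs by at most one. Representing the grid as a cell-to-count map and moving a single node per step are simplifying assumptions; GEBRP's actual movement rules in [57] (which node moves, and along what path) are not reproduced here.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def diffuse(counts: Dict[Cell, int]) -> List[Tuple[Cell, Cell]]:
    """Greedily move nodes from dense to sparse cells; return the moves made."""
    counts = dict(counts)          # work on a copy; the caller's map is untouched
    moves = []
    while True:
        dense = max(counts, key=counts.get)
        sparse = min(counts, key=counts.get)
        if counts[dense] - counts[sparse] <= 1:
            return moves           # coverage is as even as whole nodes allow
        counts[dense] -= 1
        counts[sparse] += 1
        moves.append((dense, sparse))
```

Each move strictly lowers the sum of squared cell counts, so the loop always terminates; in a fuller model the cost of each move (travel distance, energy) would also be weighed.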
Figure 5. The diffusing phase of GEBRP [57].
To balance the energy load, a reliable and energy-efficient routing protocol for mobile sinks (REEMS) was proposed [58], which schedules the mobile sink group to improve power efficiency. As shown in Figure 6, REEMS constructs a virtual block containing the sink group, and the virtual grid is considered the boundary of the mobile sink group.
Figure 6. The system model of REEMS [58] (square virtual grid centered at (x_r, y_r) with corners A(x_r - R, y_r + R), B(x_r - R, y_r - R), C(x_r + R, y_r - R), and D(x_r + R, y_r + R); boundary nodes; sink moving direction).
Packets are first transmitted to the boundary nodes of the virtual grid, and when the sinks move out of the virtual grid, they receive the data from the boundary nodes. Collecting packets in this way ensures reliable data forwarding and avoids mismatches between the registered area and the sinks' actual area. Because the sinks are mobile, a source only needs to send data to the area through which the sink group may pass. REEMS thus reduces energy consumption and increases the forwarding ratio of data packets.
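Following the geometry labeled in Figure 6 (a square virtual grid with corners (x_r ± R, y_r ± R)), the sketch below treats as boundary nodes those inside the grid within a band of width `band` of its edge, and checks whether a sink has left the grid. The `band` parameter and all three helper functions are hypothetical; [58] may define boundary nodes differently.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def inside_grid(p: Point, center: Point, R: float) -> bool:
    """True if p lies in the square [x_r - R, x_r + R] x [y_r - R, y_r + R]."""
    return abs(p[0] - center[0]) <= R and abs(p[1] - center[1]) <= R


def boundary_nodes(nodes: Iterable[Point], center: Point, R: float, band: float) -> List[Point]:
    """Nodes inside the grid whose distance to the nearest edge is at most `band`."""
    result = []
    for p in nodes:
        if not inside_grid(p, center, R):
            continue
        # Distance from p to the nearest edge of the axis-aligned square.
        edge_gap = R - max(abs(p[0] - center[0]), abs(p[1] - center[1]))
        if edge_gap <= band:
            result.append(p)
    return result


def sink_has_left(sink: Point, center: Point, R: float) -> bool:
    """Sinks collect buffered packets from boundary nodes once they leave the grid."""
    return not inside_grid(sink, center, R)
```

Sources only need to reach the nearest boundary node, so their transmission distance is bounded by the grid geometry rather than by where the sinks currently are.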
A comparison of different energy-efficient scheduling-based routing algorithms is shown in Table 2, which considers energy saving, energy load balance, and delay. Reducing energy consumption and balancing the energy load both prolong the lifetime of WSNs. Static node scheduling-based schemes excel at saving energy, but because many nodes sleep, they incur additional delay; CKN and GSS take this into account and try to decrease transmission delay. Mobile node scheduling-based schemes are better at balancing energy and load in WSNs: GEBRP optimizes the energy distribution by moving nodes to balance node density across areas, and REEMS uses a mobile sink group to balance energy and load.
Table 2. Comparison of energy-efficient scheduling-based routing algorithms.

Scheme | Energy Saving | Energy Load Balance | Delay
CKN [55] | | Random node ranking balances energy and load | K neighbors awake to ensure connectivity and increase delay
GSS [56] | Reduces energy cost by switching some nodes to sleep mode | Random node ranking balances energy and load | Bottleneck nodes degrade delay
GEBRP [57] | A few nodes stay awake while the others sleep to reduce energy cost | Moving nodes balance the node energy distribution | Moving nodes are scheduled to decrease delay
REEMS [58] | Reduces energy consumption with moving sinks | Mitigates energy load imbalance according to distance | Sinks are scheduled to decrease delay

Optimizing Data Flow-Based Routing Algorithms
For energy-limited applications, WSNs should perform well in energy saving and load balancing. For real-time applications, WSNs should transmit data with low delay. Reliability, which guards against data loss and node failure, is another important metric for WSNs. To meet these different application requirements, optimizing data flow-based routing algorithms provide efficient solutions. They are classified into two categories by the number of paths from the source node to the destination node: single path-based routing algorithms [59,60] and multi-path-based routing algorithms [61,62].

Single Path-Based Schemes
In single path-based routing schemes, routing algorithms try to find a near-optimal or optimal path to the sink through local optimization. Inspired by the concept of potential in physics, Ren et al. [59] proposed the energy-balanced routing protocol (EBRP). EBRP sets up a hybrid field consisting of three virtual fields: a depth field, an energy density field, and a residual energy field. The potential difference U_m(i, j, t) from node i to node j at time t is defined as:
$$U_m(i,j,t) = (1-\alpha-\beta)\,U_d(i,j) + \alpha\,U_{ed}(i,j,t) + \beta\,U_e(i,j,t)$$
where U_d, U_ed, and U_e are the potential differences from i to j in the depth, energy density, and residual energy fields, respectively; 0 ≤ α ≤ 1, 0 ≤ β ≤ 1, and 0 ≤ α + β ≤ 1; and the weights α and β determine how much the energy density field and the residual energy field, respectively, influence the routing decision. The depth field is defined by each node's minimum hop count to the sink. Under the hybrid field, packets are forwarded to the sink through energy-dense areas, which protects low-energy nodes. EBRP thus finds an energy-balanced path for forwarding packets while guaranteeing delay performance.
The energy-efficient speed routing protocol (EESPEED) [60] is an energy-efficient improvement of SPEED [63]. In SPEED, each node records the location and forwarding speed of all its neighbors and sets a threshold. When a node receives a packet, it takes as its candidate relay set the neighbors that are closer to the destination than the node itself, and it selects the candidate with the highest forwarding rate as the relay node. Routing restarts when no node has a forwarding rate above the threshold. Unlike SPEED, EESPEED considers three factors, namely forwarding delay, residual energy, and relay speed, to choose the next hop. The weight function f_j of the jth sensor combines these factors, where α + β = 1, E_n is the residual energy ratio of node j, De is the delay ratio of node j, Sp is the relay speed of a packet transferred from the present node to node j, and α and β are the weighting coefficients.
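A sketch of the two next-hop rules discussed above, under stated assumptions. For EBRP, it assumes a node's mixed potential is (1 - α - β)·depth - α·energy_density - β·residual_energy, so that low depth and high energy mean low potential, and it takes the potential difference as U(i) - U(j); normalizing the energy terms is left to the caller. For SPEED, relay speed is the progress toward the destination divided by the link delay, and only neighbors closer to the destination than the current node are candidates. The `Node` dataclass, field names, and threshold handling are illustrative rather than the definitions in [59,60,63].

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    depth: int                 # minimum hop count to the sink
    energy_density: float      # assumed normalised to [0, 1]
    residual_energy: float     # assumed normalised to [0, 1]
    pos: Tuple[float, float] = (0.0, 0.0)


def mixed_potential(n: Node, alpha: float, beta: float) -> float:
    """Assumed EBRP-style potential: low depth and high energy give low potential."""
    return (1 - alpha - beta) * n.depth - alpha * n.energy_density - beta * n.residual_energy


def ebrp_next_hop(i: str, nbrs: List[str], nodes: Dict[str, Node],
                  alpha: float, beta: float) -> Optional[str]:
    """Neighbour with the largest positive potential difference U(i) - U(j)."""
    u_i = mixed_potential(nodes[i], alpha, beta)
    best, best_diff = None, 0.0
    for j in nbrs:
        diff = u_i - mixed_potential(nodes[j], alpha, beta)
        if diff > best_diff:
            best, best_diff = j, diff
    return best


def speed_next_hop(i: str, nbrs: List[str], nodes: Dict[str, Node], dest: Tuple[float, float],
                   hop_delay: Dict[Tuple[str, str], float], threshold: float) -> Optional[str]:
    """SPEED-style choice: fastest candidate that makes progress and beats the threshold."""
    here = math.dist(nodes[i].pos, dest)
    best, best_speed = None, threshold
    for j in nbrs:
        progress = here - math.dist(nodes[j].pos, dest)
        if progress <= 0:
            continue                               # not closer to the destination
        speed = progress / hop_delay[(i, j)]       # relay speed: distance per unit time
        if speed > best_speed:
            best, best_speed = j, speed
    return best                                    # None: no candidate above the threshold
```

With α = β = 0 the EBRP rule reduces to shortest-hop forwarding; raising α and β steers packets through energy-rich neighbors even at equal depth.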

Multi-Path-Based Schemes
Multi-path-based schemes provide reliable data transmission because there are multiple routing paths from the source nodes to the destination nodes: even if some nodes on one path fail or run out of energy, packets can be forwarded through other paths.
The gradient-based multi-path routing protocol (GMRP) [61] constructs multiple path segments according to the gradient information of nodes. Each node maintains its own gradient and information about some path segments. A node finds each path segment that starts from itself and ends when it encounters a node with a lower gradient. It then chooses the segment that can forward packets toward the sink fastest, i.e., the minimum-delay segment, as its next segment, and the first node after it on that segment becomes its next hop. This segment selection is repeated until the source node obtains the minimum-delay path. GMRP thus finds the minimum-delay path from the source to the sink with low complexity.
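A minimal sketch of the segment-by-segment selection described for GMRP, under assumptions made for illustration: the gradient is the hop count to the sink (computed by breadth-first search on a connected graph), a segment from node u may pass through neighbors with the same gradient as u and ends at the first node with a lower gradient, segments are capped at `max_len` hops, and a segment's delay is the sum of its per-link delays. These choices and the function names are hypothetical rather than GMRP's exact definitions in [61]; each step is a local choice.

```python
from collections import deque
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]


def hop_gradients(g: Graph, sink: str) -> Dict[str, int]:
    """Gradient of each node = minimum hop count to the sink (BFS)."""
    grad, queue = {sink: 0}, deque([sink])
    while queue:
        u = queue.popleft()
        for v in g[u]:
            if v not in grad:
                grad[v] = grad[u] + 1
                queue.append(v)
    return grad


def segments_from(u: str, g: Graph, grad: Dict[str, int], max_len: int) -> List[List[str]]:
    """Paths that start at u, stay at u's gradient, and end at the first lower-gradient node."""
    found, stack = [], [[u]]
    while stack:
        path = stack.pop()
        for v in g[path[-1]]:
            if v in path:
                continue
            if grad[v] < grad[u]:
                found.append(path + [v])          # segment ends here
            elif grad[v] == grad[u] and len(path) < max_len:
                stack.append(path + [v])          # keep exploring sideways
    return found


def gmrp_path(src: str, sink: str, g: Graph,
              delay: Dict[Tuple[str, str], float], max_len: int = 3) -> List[str]:
    """Repeatedly append the minimum-delay segment until the sink is reached."""
    grad = hop_gradients(g, sink)
    path = [src]
    while path[-1] != sink:
        segs = segments_from(path[-1], g, grad, max_len)
        best = min(segs, key=lambda s: sum(delay[(a, b)] for a, b in zip(s, s[1:])))
        path.extend(best[1:])                     # best[1] is the next hop
    return path
```

Every segment ends at a strictly lower gradient, so the loop always reaches the sink; allowing sideways moves is what lets a node bypass a slow direct link, as in a detour through a same-gradient neighbor.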
Randomized switching for maximizing lifetime (RSML) [62] is a tree-based routing protocol that balances the energy load by changing the routing tree dynamically. In RSML, a node's energy load is the ratio of its energy consumption to its energy. A parameter x represents the tolerated degree of network imbalance. If the difference between a path's energy load and the minimum path load exceeds x, the child node with the heavier load switches to a neighbor whose load is lighter than that of its current parent. The process repeats until no path's load exceeds the minimum path load by more than x, at which point the optimal balanced tree has been created.
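A deterministic, greedy simplification of the parent-switching idea, not RSML's randomized algorithm. Assumptions made here for illustration: each node's energy load is fixed (the change in relay load caused by a switch is ignored), a node's path load is the largest energy load on its path to the sink, and while the imbalance exceeds x, the node on the heaviest path that can improve switches to the neighbor that minimizes its new path load. The function names and tree representation (parent map, sink has parent `None`) are hypothetical.

```python
from typing import Dict, List, Optional


def path_load(v: str, parent: Dict[str, Optional[str]], load: Dict[str, float]) -> float:
    """Largest energy load on v's path to the sink (the sink has parent None)."""
    worst = 0.0
    while parent[v] is not None:
        worst = max(worst, load[v])
        v = parent[v]
    return worst


def rebalance(parent: Dict[str, Optional[str]], nbrs: Dict[str, List[str]],
              load: Dict[str, float], x: float) -> Dict[str, Optional[str]]:
    """Switch parents until max - min path load <= x or no single switch helps."""
    parent = dict(parent)                                    # do not mutate the input tree
    while True:
        nodes = [v for v in parent if parent[v] is not None]
        pl = {v: path_load(v, parent, load) for v in nodes}
        if max(pl.values()) - min(pl.values()) <= x:
            return parent
        improved = False
        for v in sorted(nodes, key=pl.get, reverse=True):    # heaviest paths first
            best = min(nbrs[v], key=lambda q: path_load(q, parent, load))
            new = max(load[v], path_load(best, parent, load))
            if new < pl[v]:                                  # descendants can never qualify
                parent[v] = best
                improved = True
                break
        if not improved:
            return parent                                    # imbalance remains but is stuck
```

A switch is accepted only if it strictly lowers the node's path load, which rules out choosing a descendant (that would create a cycle) and guarantees termination; RSML's randomization instead aims to escape such stuck configurations.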
Table 3 compares the data-flow-optimizing routing algorithms in terms of the parameters optimized by the single-path-based and multi-path-based schemes.

Table 3. Comparison of different data-flow-optimizing routing algorithms.

Scheme | Energy consumption (optimized parameters)
EBRP [59] | Delay, residual energy, energy density
EESPEED [60] | Delay, residual energy, forwarding rate
GMRP [61] | Delay
RSML [62] | Routing quality, supporting time

As shown in Table 3, delay and residual energy are the most frequently optimized parameters. The delay metric is related to the number of hops from the sending node to the destination node: in general, the fewer hops between the two ends of a communication, the lower the delay and the energy consumption. The residual energy of a node is related to the network lifetime: if a node's residual energy is too low to complete its communication task, the node fails, which shortens the lifetime of the WSN. Single-path-based and multi-path-based schemes differ in the number of paths from source nodes to destination nodes, and multi-path routing is generally more reliable than single-path routing. In terms of how the data flow is optimized, single-path-based protocols mainly optimize the next hop and achieve global optimization of the network through local optimization, whereas multi-path-based protocols optimize the next path segment or the parent node.

Discussions
Routing algorithms based on special nodes can relieve the burden on ordinary nodes. LEACH and EECCS reduce the forwarding burden through hierarchical nodes. CEER uses anchor nodes to help other nodes localize themselves, which minimizes the power consumption of sensor nodes. NSD uses helper nodes to relay the packets of other nodes. Static node scheduling-based methods minimize the number of active nodes to decrease energy consumption; however, delay constraints must be considered because sleep scheduling introduces latency, and CKN and GSS try to reduce this communication delay. Mobile node scheduling-based routing methods balance the energy load by moving nodes dynamically according to energy consumption. GEBRP and REEMS optimize power consumption by scheduling mobile nodes: GEBRP schedules selected sensors to balance node density among the monitoring areas, and REEMS optimizes the sink group to balance the energy load in WSNs. Single-path-based routing schemes optimize the next hop, and multi-path-based schemes make the network more reliable.

ML-Based Routing Algorithms in WSNs
ML is a field of science and technology that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence. It uses learning algorithms to analyze data, learn continuously, and make judgments and predictions. ML techniques have been widely used for classification, regression, and density estimation in fields such as bioinformatics, natural language processing, computer vision, and graphics processing. The algorithms and techniques themselves draw on many fields, including statistics, probability, neuroscience, bionics, and computer science.
ML-based routing is an important direction for routing design in WSNs for the following reasons: (1) WSNs typically monitor dynamic environments that change rapidly over time, and node locations may change, so routing algorithms must adapt and make decisions rapidly. (2) WSNs can collect information in hazardous locations that are inaccessible to humans (e.g., volcanic eruptions, sewage treatment, and nuclear leaks). Unanticipated events in such situations can paralyze some nodes and affect routing decisions, so the system needs strong learning and self-adaptive capabilities to ensure accurate information collection and transmission. ML is therefore well suited to routing protocol design in WSNs. (3) Routing design in WSNs must handle multiple influencing factors, such as energy consumption, fault tolerance, scalability, and data coverage. ML enables nodes and sinks to learn from past experience and choose optimized routing paths that adapt to dynamically changing environments.
The advantages of using ML to design routing algorithms can be summarized as follows. ML can learn the best routing path, reducing energy consumption and extending the lifetime of WSNs. It can reduce computational complexity by dividing a routing problem into simpler sub-problems; in each sub-problem, nodes build graph structures by considering only their local neighbors, resulting in low-cost, efficient, real-time routing. Additionally, the quality of service (QoS) requirements of routing can be met with relatively simple calculation methods and classifiers, which reduces cost and saves power.

System Model
In the proposed system model, the WSN is treated as an agent in the ML sense: it uses feedback signals to identify the state of its environment, learns from them, and tries to reach the best decisions by using ML algorithms to design optimal policies. Hence, the routing decision problem can be represented efficiently as a Markov decision process (MDP). A hypothetical cooperative routing model formulated as a multi-agent MDP is proposed, as shown in Figure 7. Any of the ML algorithms discussed below can be used to obtain the optimal solution. The study considers a WSN consisting of N sensors that observe the routing actions.
The proposed system model operates in a time-slotted manner. At the beginning of each time slot, one of the agents, acting as the fusion center (FC), activates an ML-based distributed cooperative routing decision and combines the local decisions of all agents. Each sensor node senses the link state information of its neighbors (e.g., energy efficiency, overhead, packet loss rate, and bit error rate) and makes a decision locally. The nodes continue to sense the link state information and report their local decisions to the FC. Typical routing problems, such as energy inefficiency and QoS degradation, can be mitigated by exploiting spatial diversity through cooperative, distributed routing decisions. In this case, a sensor broadcasts a request for cooperative routing to all its neighbors, and the FC combines all the independent local decisions into the final routing decision.
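The model does not fix how the FC combines the local decisions. One simple possibility, shown here purely as an assumption, is a weighted majority vote over the next hop proposed by each sensor; the function name `fuse_decisions`, the decision labels, and the weighting scheme are hypothetical.

```python
from typing import Dict, Optional


def fuse_decisions(local_decisions: Dict[int, str],
                   weights: Optional[Dict[int, float]] = None) -> str:
    """Weighted majority vote over the next hop proposed by each sensor (assumed rule)."""
    scores: Dict[str, float] = {}
    for sensor, decision in local_decisions.items():
        # Unweighted voting gives every sensor one vote; missing weights count as zero.
        weight = 1.0 if weights is None else weights.get(sensor, 0.0)
        scores[decision] = scores.get(decision, 0.0) + weight
    if not scores:
        raise ValueError("no local decisions to fuse")
    return max(scores, key=scores.get)  # ties go to the decision seen first
```

Weights could, for instance, reflect each sensor's observed link quality, so that a single sensor with a reliable view of the channel can outweigh several sensors with poor links.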

Distributed-Regression-Based Routing Schemes
Distributed regression aims to find a fit function $\hat{f}(t)$ that best fits the real sensor measurement function $f(t)$. The fit function $\hat{f}(t)$ can be expressed as:

$$\hat{f}(t) = \sum_{i=1}^{k} w_i h_i(t)$$

where $h_i(t)$ is the $i$th of $k$ basis functions and $w_i$ is its coefficient. The weight vector $w^*$ is chosen to minimize the squared error over the sample set $S = \{t_1, \dots, t_s\}$:

$$w^* = \arg\min_{w} \frac{1}{s} \sum_{j=1}^{s} \left( f(t_j) - \sum_{i=1}^{k} w_i h_i(t_j) \right)^2$$

where $s$ is the size of the whole sample set $S$.

To improve routing performance, an efficient sensor data modeling framework (ESDFM) was presented in [64]. In this distributed framework, the nodes collaboratively fit a global function to their local measurements, and the model performs kernel linear regression in the form of a weighted sum of local basis functions, which map training samples into a feature space to facilitate data manipulation. Once the model has been fitted, every node can answer queries about its local region, and users on an external network can efficiently obtain the model parameters from the nodes. The framework exploits the high correlation among the measurements of multiple sensors, which minimizes the communication cost of detecting structure in sensor data. This distributed regression framework includes three layers: the routing layer, which builds a spanning tree so that neighboring nodes have high-quality communication links; the junction tree layer, which sends messages between neighboring nodes in the routing tree to enforce the running intersection property; and the regression layer, which optimizes the regression estimate by exchanging information about the coefficients of the basis functions. Distributed regression achieves energy efficiency by optimizing both the transmitted data and the routing, thus extending the lifetime of the WSN.
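The least-squares fit described above can be checked with a small centralized sketch that builds the normal equations $H^\top H w = H^\top f$ and solves them by Gaussian elimination. This covers only the regression step: ESDFM computes its fit in-network through the layered message passing described above, which is not reproduced here, and the basis functions in the example are arbitrary choices.

```python
from typing import Callable, List, Sequence

Basis = Sequence[Callable[[float], float]]


def fit_weights(basis: Basis, times: Sequence[float], values: Sequence[float]) -> List[float]:
    """Least-squares w* for f_hat(t) = sum_i w_i h_i(t) via the normal equations."""
    k = len(basis)
    H = [[h(t) for h in basis] for t in times]  # s x k design matrix
    A = [[sum(row[p] * row[q] for row in H) for q in range(k)] for p in range(k)]  # H^T H
    b = [sum(row[p] * y for row, y in zip(H, values)) for p in range(k)]  # H^T f
    # Gaussian elimination with partial pivoting on A w = b.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("basis functions are linearly dependent on these samples")
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    w = [0.0] * k
    for r in range(k - 1, -1, -1):
        w[r] = (b[r] - sum(A[r][c] * w[c] for c in range(r + 1, k))) / A[r][r]
    return w


def predict(basis: Basis, w: Sequence[float], t: float) -> float:
    """Evaluate the fitted model f_hat(t)."""
    return sum(wi * h(t) for wi, h in zip(w, basis))
```

With the basis {1, t} and samples of f(t) = 1 + 2t at t = 0, 1, 2, 3, the recovered weights are 1 and 2, so the model predicts 9 at t = 4.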
ESDFM was a key inspiration for using ML in routing design in WSNs, achieving good fitting results with low overhead in the learning phase. However, it still faces challenges, such as the inability to learn nonlinear and complex functions.

Artificial Neural Network-Based Routing Methods
An artificial neural network (ANN) is an information processing system that mimics the structure and function of neural networks in the brain. It learns to process information and to model the relationship between inputs and outputs by training repeatedly on known data and gradually adjusting the connection weights between neurons. ANNs are widely used in image recognition, speech recognition, weather forecasting, and facial feature recognition because of their parallelism, efficiency, and flexibility.
The self-organizing map (SOM) is a typical ANN algorithm that is trained via unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space, called a map. To exploit the strength of cluster-based routing algorithms in increasing the lifetime of WSNs, the energy-based clustering self-organizing map (EBC-S) was proposed in [65]; it creates clusters with the same energy level, allowing a balanced energy load across the network. In the cluster creation process, SOM and K-means are used together. First, a vector V representing the X-Y coordinates and the energy of each node is input to the SOM. Each feature a is normalized as:

$$V_1 = \frac{V_a - \min_a}{\max_a - \min_a}$$

where $V_a$ is the value of feature a, and $\max_a$ and $\min_a$ are the largest and smallest values of feature a. Then, the base station selects the n nodes with the highest energy, whose vectors are used as the weight vectors of the SOM. Data are transmitted from ordinary nodes to the cluster head, aggregated there, and then forwarded by the cluster head to the base station. The energy consumption $E_{Tx}(k, d)$ of transmitting k bits of data over distance d is calculated as:

$$E_{Tx}(k, d) = E_{elec}(k, d) + E_{Tx\_amp}(k, d)$$

where $E_{elec}(k, d)$ is the electronics energy for transmitting or receiving k bits of data and $E_{Tx\_amp}(k, d)$ is the amplifier energy for transmitting k bits of data over distance d.
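The normalization and transmission-energy terms can be sketched as follows. The code instantiates the electronics and amplifier terms with the first-order radio model that is common in clustering studies (electronics energy proportional to k; amplifier energy proportional to $kd^2$ below a crossover distance and to $kd^4$ above it). The constants are typical textbook values assumed for illustration, not parameters reported for EBC-S, and in this model the electronics term depends on k only.

```python
import math
from typing import List

# Assumed first-order radio-model constants (typical values in clustering studies,
# not parameters reported for EBC-S).
E_ELEC = 50e-9          # J/bit, transmitter/receiver electronics
EPS_FS = 10e-12         # J/bit/m^2, free-space amplifier
EPS_MP = 0.0013e-12     # J/bit/m^4, multipath amplifier
D0 = math.sqrt(EPS_FS / EPS_MP)  # crossover distance, roughly 87.7 m


def min_max_normalize(values: List[float]) -> List[float]:
    """Scale one feature (x, y, or energy) to [0, 1]: (v - min_a) / (max_a - min_a)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def tx_energy(k_bits: int, d: float) -> float:
    """E_Tx(k, d) = electronics term + amplifier term under the assumed radio model."""
    electronics = E_ELEC * k_bits
    if d < D0:
        amplifier = EPS_FS * k_bits * d ** 2
    else:
        amplifier = EPS_MP * k_bits * d ** 4
    return electronics + amplifier


def rx_energy(k_bits: int) -> float:
    """Receiving k bits costs only the electronics term."""
    return E_ELEC * k_bits
```

Under these constants, sending a 4000-bit packet over 50 m costs about 0.3 mJ, two thirds of which is electronics energy; beyond roughly 88 m the $d^4$ term dominates quickly, which is why relaying through a nearby cluster head can pay off.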
The weight matrix consists of the X-Y coordinates and residual energy of the nodes. Nodes with less energy move closer to high-energy nodes and form clusters aggregated by the SOM. The output of the SOM is then used as the input to the K-means algorithm. Cluster heads are selected based on the maximum energy of the nodes, their distance from the base station, and their proximity to the cluster center. EBC-S runs centrally at the base station, which assigns each node its role as either a cluster head or an ordinary member node. By considering energy consumption, this hybrid scheme extends the lifetime of WSNs.
To guarantee network QoS, an ANN-based routing (ANNR) algorithm [66] was proposed that detects the best path using SOM-based unsupervised learning; the ANNR model is shown in Figure 8. ANNR adopts a revised Dijkstra algorithm to form a backbone network and compute minimum-cost paths from the sink to each node in the WSN, using QoS parameters, including delay, throughput, error rate, and duty cycle, to define the edge weight coefficients. Because of the distributed nature of WSNs, ANNR defines the QoS level in a diffuse way. Each node periodically tests the quality of each neighboring link with Ping packets to obtain input samples. In the training phase, the neurons of the second (competitive) layer compete with each other, and the weights of the winning neuron and its neighboring neurons are updated through the neighborhood function to better match the input patterns and obtain the QoS value. Node $v_j$ then uses the obtained QoS value to calculate its distance $d[v_j, v_r]$ to the base station $v_r$ through a neighbor node $v_i$, so as to avoid the regions with the worst QoS levels; in the expression for $d[v_j, v_r]$ given in [66], QoS denotes the corresponding channel quality. Because sensor nodes have limited processing capability and energy, the computationally expensive training of ANNR is carried out on a central data processing unit, whereas sensor nodes perform only the lightweight execution phase. This hybrid algorithm takes QoS requirements into account when defining the edge weight coefficients and achieves the best route with low average energy dissipation and low average delay.
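The following Python sketch shows the Dijkstra step with QoS-aware edge weights, computing each node's path cost to the sink and its next hop. The mapping from a link's QoS level to an edge cost (here simply 1/QoS, so that poor links are expensive) and the function names are hypothetical stand-ins for the SOM-derived QoS values and the revised Dijkstra algorithm of ANNR.

```python
import heapq
import math
from typing import Dict, Tuple

Graph = Dict[str, Dict[str, float]]  # node -> {neighbor: link QoS level in (0, 1]}


def qos_cost(qos: float) -> float:
    """Hypothetical edge weight: the poorer the link QoS, the higher the cost."""
    if not 0.0 < qos <= 1.0:
        raise ValueError("QoS level must lie in (0, 1]")
    return 1.0 / qos


def routes_to_sink(graph: Graph, sink: str) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Dijkstra from the sink over symmetric links; returns each node's cost and next hop."""
    dist: Dict[str, float] = {sink: 0.0}
    next_hop: Dict[str, str] = {}
    heap = [(0.0, sink)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, qos in graph.get(u, {}).items():
            nd = d + qos_cost(qos)
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                next_hop[v] = u  # v forwards toward the sink through u
                heapq.heappush(heap, (nd, v))
    return dist, next_hop
```

In a three-node example where B's direct link to the sink has QoS 0.25 while the links B-A and A-sink are perfect, B is routed through A (cost 2) instead of over its poor direct link (cost 4), which is the "avoid the worst QoS region" behavior described above.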
To design an energy-efficient routing scheme, a resilient routing algorithm was proposed [67] that considers link reliability along with traditional routing metrics and uses a deep-learning-based link prediction model. To improve the self-learning capability for mining topological features, this model combines the Weisfeiler-Lehman kernel with a dual convolutional neural network (WL-DCNN) for lightweight subgraph extraction and labeling. The link state information is then used to design a resilient routing mechanism that combines the current shortest path with a link reliability metric, achieving energy efficiency and extending network lifetime by ensuring data integrity and shortening transmission paths.
To ensure energy efficiency and QoS, a secure deep learning (SecDL) approach was proposed [68], in which the topology is formulated as a biconcentric hexagonal network with a mobile sink to improve energy efficiency. The hexagonal network is first divided into six sectors, and dynamic clusters are formed in a self-organizing manner. Each cluster performs cluster head selection, redundant data minimization, and data aggregation. For data transmission, SecDL selects the optimal route by combining trust, QoS, and energy-efficiency parameters.
In general, ANNs can give routing protocols adaptive capability under dynamic WSN topologies. Their main drawbacks, however, are algorithmic complexity and the learning overhead incurred when the topology changes.

Reinforcement-Learning-Based Routing Methods
Reinforcement learning enables a system to learn and improve from experience. It relies on an agent that continually acts in an environment and receives rewards or punishments based on the results of its actions.
Q-learning is the most typical and widely used reinforcement learning algorithm; it updates each action value toward the immediate reward plus the maximum estimated action value of the next state. Q-learning converges through repeated iterations to the optimal value matrix $Q(s_t, a_t)$, and each update has relatively low computational complexity:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r(s_t, a_t) + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$

where $Q(s_t, a_t)$ is the action-value matrix, $s_t$ is the state observed at time t, $a_t$ is the action selected at time t, α is the learning rate, γ is the discount factor, and $r(s_t, a_t)$ is the reward for choosing action $a_t$ in state $s_t$. In [69], a Q-learning reliable routing with a weighting agent approach (QLRR-WA) was proposed. To construct the routing graph, QLRR-WA uses a weighted cost equation to select nodes and neighbors. The agent's states are sets of weights, and the agent continuously acts in the environment to learn the set of weights that optimizes network performance. Based on the average network latency and the expected network lifetime, the agent receives a reward or punishment that changes its next action policy. Meanwhile, QLRR-WA improves network reliability by constructing an uplink graph in which each node has at least two neighbors for forwarding data to the gateway.
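A minimal tabular Q-learning sketch for next-hop selection, using the standard update with learning rate α and discount factor γ: states are nodes, the action "forward to neighbor j" moves the packet to j, and the reward is the negative link cost. This generic formulation, the cost values, and the deterministic sweep over all (node, neighbor) pairs in place of sampled packet episodes are assumptions made for illustration; QLRR-WA itself learns sets of weights rather than next hops.

```python
from typing import Dict, List

Links = Dict[str, Dict[str, float]]  # node -> {neighbor: link cost, e.g. energy per packet}


def q_learning_routes(links: Links, sink: str, alpha: float = 0.5,
                      gamma: float = 0.9, sweeps: int = 500) -> Dict[str, Dict[str, float]]:
    """Tabular Q-learning where the action 'forward to j' moves the packet to node j."""
    Q = {s: {a: 0.0 for a in nbrs} for s, nbrs in links.items() if s != sink}
    for _ in range(sweeps):
        for s in Q:
            for a, cost in links[s].items():
                reward = -cost  # assumed r(s_t, a_t): cheaper links earn more
                future = 0.0 if a == sink else max(Q[a].values())  # max_a Q(s_{t+1}, a)
                Q[s][a] += alpha * (reward + gamma * future - Q[s][a])
    return Q


def greedy_path(Q: Dict[str, Dict[str, float]], source: str, sink: str,
                max_hops: int = 50) -> List[str]:
    """Follow the highest-valued action from each node until the sink is reached."""
    path = [source]
    while path[-1] != sink and len(path) <= max_hops:
        node = path[-1]
        path.append(max(Q[node], key=Q[node].get))
    return path
```

If node A's direct link to the sink is expensive (cost 5) while B's is cheap (cost 1), the learned values steer packets from S through B, and A learns to detour through B as well.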
In [70], a Q-probabilistic routing (Q-PR) scheme was proposed based on reinforcement learning and a Bayesian decision model; it learns from previous routing decisions and local interactions among neighbor nodes so that routing adapts to future conditions. Q-PR calculates optimal routes based on power constraints, message importance, and the expected delivery rate. The Bayesian approach decides whether to send a packet to the set of candidate neighboring nodes by considering data importance, node profiles, and the expected transmission and reception energy. However, RL-based routing algorithms such as Q-PR have limited knowledge of the environment, and they are not suitable for highly dynamic environments because learning from historical information takes a long time.
To address the routing challenges caused by intermittent connections between devices and the lack of fixed message transmission paths in opportunistic Internet of Things networks (OppIoTs), a reinforcement-learning-based routing protocol, RLProph, was proposed [71]. The algorithm models the routing process as a Markov decision process (MDP) and reduces energy consumption through optimal routing. Compared with context-independent routing, RLProph automatically makes the decision that maximizes the reward based on context and achieves better information transmission performance.
To maximize message delivery and minimize network overhead, an MDP is formulated as $(S, A, P_a(s, s'), R_a, \gamma)$, where S is a finite set of states describing the environment, A is a finite set of actions, $P_a(s, s') = P(s_{t+1} = s' \mid s_t = s, a_t = a)$ is the probability that action $a \in A$ taken in state $s \in S$ at time t leads to state $s'$ at time t + 1, $R_a$ is the immediate reward received after transitioning from state s to state s' due to action a, and γ is the discount factor, which weights future rewards relative to present rewards. The state set S is described by node characteristics or node-specific context information. The rest of the MDP is structured to improve message delivery through both the reward function and the state transition probabilities. Policy iteration is used to find the optimal policy, which specifies how routing should be performed in the network.
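Policy iteration over a finite MDP of this form can be sketched as below: evaluate the current policy, improve it greedily, and stop when no state changes its action. The toy two-state MDP (a message carrier that can keep a message or forward it at a contact), its probabilities, and its rewards are invented for illustration and are not RLProph's actual state features or reward function.

```python
from typing import Dict, List, Tuple

# mdp[s][a] -> list of (probability, next_state, reward); a state with no actions is terminal.
MDP = Dict[str, Dict[str, List[Tuple[float, str, float]]]]


def q_value(mdp: MDP, V: Dict[str, float], s: str, a: str, gamma: float) -> float:
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in mdp[s][a])


def policy_iteration(mdp: MDP, gamma: float = 0.9, tol: float = 1e-9):
    V = {s: 0.0 for s in mdp}
    policy = {s: next(iter(acts)) for s, acts in mdp.items() if acts}  # arbitrary start
    while True:
        # Policy evaluation: sweep the Bellman expectation equation until it stabilizes.
        while True:
            delta = 0.0
            for s, a in policy.items():
                v_new = q_value(mdp, V, s, a, gamma)
                delta = max(delta, abs(v_new - V[s]))
                V[s] = v_new
            if delta < tol:
                break
        # Policy improvement: switch to a strictly better action where one exists.
        stable = True
        for s in policy:
            best = max(mdp[s], key=lambda a: q_value(mdp, V, s, a, gamma))
            if q_value(mdp, V, s, best, gamma) > q_value(mdp, V, s, policy[s], gamma) + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return policy, V
```

In the toy MDP where forwarding delivers the message with probability 0.8 (reward 1) and keeping it costs 0.1 per step, the procedure starts from "keep", switches to "forward" after one improvement step, and then stops.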
RLProph uses a planning-based reinforcement learning approach for routing. The routing process uses optimal strategies to make intelligent and effective routing decisions, thereby increasing the probability of message delivery and reducing overhead.
Because IoT applications have stricter requirements, such as more flexible and efficient routing, reinforcement-learning-based routing decisions are favored for their high flexibility and accuracy. Based on reinforcement learning, two routing schemes, one centralized and one distributed, were designed and implemented in WSNs [72]. The strong learning and generalization abilities of artificial intelligence, combined with software-defined networking (SDN) and programmable routing equipment, make reinforcement-learning-based intelligent routing feasible. A deep reinforcement-learning-based single-optimality routing (DRLSOR) algorithm was implemented in both centralized and distributed modes, and its convergence and re-convergence times were analyzed. A multi-optimality routing scheme was also implemented through model fusion, which improves routing efficiency and reduces energy consumption. Experimental results show that centralized routing converges faster, while distributed routing scales better.

Ensemble-Learning-Based Routing Methods
Ensemble learning is a powerful ML tool that builds and combines multiple learners to accomplish a learning task: it generates a set of individual learners and then combines them according to a specific strategy. The individual learners are generally common ML algorithms, such as decision trees and neural networks. There are two types of ensembles: homogeneous and heterogeneous. In a homogeneous ensemble, all individual learners are of the same type; they are also called base learners. In a heterogeneous ensemble, the individual learners use different types of learning algorithms, such as decision trees and neural networks at the same time.
To predict the best routing protocol for a WSN, R. Arroyo-Valles et al. [73] used the technique for order of preference by similarity to ideal solution (TOPSIS), a multi-criteria decision-making method, to optimize ensemble learners, and presented a multi-criteria TOPSIS-based ensemble framework (MCTOPE). MCTOPE considers two routing protocols: ad hoc on-demand distance vector (AODV) and dynamic source routing (DSR). The goal of the prediction is to correctly determine the appropriate routing protocol for the data collected by the WSN. Selected training samples are first constructed to train base classifiers drawn from a pool of ML models. Each routing training sample is represented as a set P(SP, RP, DP, RA, RO, PDR, APL, TP, DFR, NN), where SP is the number of sent packets, RP the received packets, DP the dropped packets, RA the routing agent, RO the routing overhead, PDR the packet delivery ratio, APL the average path length, TP the throughput, DFR the data flow rate, and NN the number of nodes.
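TOPSIS itself is a standard procedure: normalize the decision matrix, weight it, and rank alternatives by their relative closeness to the ideal solution. A compact Python version follows; the candidate models, the criteria (accuracy as a benefit criterion, training time as a cost criterion), the scores, and the weights are invented for illustration and are not MCTOPE's actual configuration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def topsis(scores: Dict[str, Sequence[float]], weights: Sequence[float],
           benefit: Sequence[bool]) -> List[Tuple[str, float]]:
    """Rank alternatives by relative closeness to the ideal solution (higher is better)."""
    names = list(scores)
    m = len(weights)
    # Vector-normalize each criterion column, then apply the criterion weights.
    norms = [math.sqrt(sum(scores[n][j] ** 2 for n in names)) or 1.0 for j in range(m)]
    v = {n: [weights[j] * scores[n][j] / norms[j] for j in range(m)] for n in names}

    def column(j: int) -> List[float]:
        return [v[n][j] for n in names]

    # Ideal: best value per criterion; anti-ideal: worst value per criterion.
    ideal = [max(column(j)) if benefit[j] else min(column(j)) for j in range(m)]
    anti = [min(column(j)) if benefit[j] else max(column(j)) for j in range(m)]
    ranking = []
    for n in names:
        d_plus = math.dist(v[n], ideal)   # distance to the ideal solution
        d_minus = math.dist(v[n], anti)   # distance to the anti-ideal solution
        total = d_plus + d_minus
        ranking.append((n, d_minus / total if total > 0 else 0.0))
    return sorted(ranking, key=lambda item: item[1], reverse=True)
```

With these invented numbers and weights of 0.7 for accuracy and 0.3 for training time, the fast NB model edges out RF, which illustrates how the chosen criteria weights, not accuracy alone, drive the final selection.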
The ML models in the candidate pool include decision trees (DT), AdaBoost (AB), random forests (RF), support vector machines (SVMs), probit linear models (PLM), neural networks (NNs), decision stump models (DSMs), and naive Bayes (NB). MCTOPE iteratively combines different models, compares their scores, and retains the highest-scoring model for routing protocol selection. MCTOPE ranks features by the mean decrease in Gini (MDG), which is based on the Gini index:

$$Gini = 1 - \sum_{i=1}^{c_n} p_i^2$$

where $c_n$ is the number of classes in the target variable and $p_i$ is the proportion of class i.
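The Gini index can be computed directly, together with the sample-weighted impurity decrease of a single split, which is the quantity that MDG accumulates over all splits on a feature and averages across the trees of an ensemble. The class labels in the example are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def gini(labels: Sequence[Hashable]) -> float:
    """Gini index 1 - sum_i p_i^2 over the c_n classes present in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent: Sequence[Hashable], left: Sequence[Hashable],
                  right: Sequence[Hashable]) -> float:
    """Sample-weighted impurity decrease produced by splitting `parent` into two children."""
    n = len(parent)
    return gini(parent) - len(left) / n * gini(left) - len(right) / n * gini(right)
```

A balanced AODV/DSR sample has Gini index 0.5, and a split that separates the two protocols perfectly removes all of that impurity, so the feature used for that split would earn a large MDG contribution.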
In general, MCTOPE divides routing protocols into types and predicts the best routing protocol by considering various factors, such as energy efficiency, which can effectively extend the lifetime of WSNs. However, the learning phase has a high overhead because the framework is implemented centrally and must continuously select appropriate ensemble models from the candidate pool.

Discussions
Table 4 compares the different ML-based routing algorithms in terms of routing type, overhead, delay, QoS, scalability, and energy efficiency. As shown in Table 4, to implement energy-efficient routing in WSNs, ML-based routing schemes adopt different learning algorithms, including kernel linear regression, SOM, Q-learning, reinforcement learning (RL), convolutional neural networks (CNNs), deep neural networks (DNNs), and hybrid learning algorithms, which achieve different performance. For example, ESDFM [64] uses kernel linear regression to learn information, which incurs low overhead in WSNs. The Q-learning-based QLRR-WA achieves both low delay and low overhead, which makes it a promising way to optimize power consumption in WSNs, and other RL-based schemes likewise achieve low delay and low overhead. CNNs and DNNs incur high overhead but offer good scalability and low delay. Meanwhile, ensemble learning (EL) based on hybrid learning algorithms has better scalability in WSNs.
ML-based routing algorithms achieve energy efficiency by reducing redundant data, optimizing paths, or optimizing protocols. Most schemes save power by optimizing routing paths [65][66][67][68][69][70][71][72]. ESDFM [64] reduces redundant data to save power, at the cost of high delay. Based on hybrid ML, EL optimizes the routing protocol to balance energy consumption and overhead [73]. On the one hand, the complex learning procedure incurs additional overhead because of its heavy computation. On the other hand, ML-based prediction and optimal routing reduce the forwarding and packet transmission costs, which prolongs the overall lifetime of WSNs to some extent. Designing energy-efficient ML models that balance the overhead of training against the energy saved by optimal routing remains a challenging issue.

Applications of ML-Based Routing Algorithms in WSNs
WSNs integrate microelectronics, embedded computing, mobile networks, wireless communication, and distributed information processing technologies, and they can monitor, sense, and collect information in real time under diverse environmental conditions and locations through the cooperation of various integrated miniature sensors. ML-based algorithms make WSNs more powerful and intelligent. ML-based WSNs have therefore become a highly interdisciplinary research frontier that is attracting international attention. Because of their low cost, easy deployment, and high adaptability, ML-based WSNs have in recent years been widely used in fields such as intelligent healthcare [74,75], industrial applications [69], underwater sensing [76,77], intelligent transportation systems [78][79][80], and smart homes [81].

Intelligent Healthcare
In recent years, ML-based schemes have been widely used in intelligent medical systems to collect patients' physiological and vital signals to assist in health management. Healthcare-aware WSNs (HWSNs) have several advantages: high QoS, low cost, low power consumption, easy integration, and low complexity. Unlike other application fields, smart medical care is closely tied to patients' lives and health, so the loss or delay of any important information may cause irreparable harm. HWSNs therefore impose stricter QoS and delay requirements.
In healthcare applications, medical emergencies may cause problems such as traffic bursts, network congestion, premature death of overloaded nodes, and dynamic changes in routing paths, so doctors may not obtain the latest patient information in time. To address these issues, a distributed congestion control and routing algorithm was proposed for HWSNs [74]. In this scheme, deployed sensor nodes are divided into different levels and transmit their sensing information through congestion-free paths, which minimizes total energy consumption and improves QoS. Vital signals carrying sensitive information (such as heart rate, respiratory status, and blood sugar) are assigned high priority to ensure reliable, low-latency delivery of critical information.
To reduce the energy consumed in collecting patients' physiological data in medical scenarios and to extend the lifetime of wireless body sensor networks (WBSNs), a cluster-based routing protocol was proposed [75] that combines Q-learning with clustering to optimize the route between nodes and the telemedicine station and to reduce packet transmission delay. The coverage of cluster head nodes is planned, and the distance between two cluster heads is controlled to ensure better cluster area coverage. Q-learning is used to obtain an optimized route for forwarding each data packet from the source node to the destination. A node selects a route according to a strategy that tries to maximize the cumulative reward the packet may collect in subsequent transitions from the current node onward, yielding the shortest path from start to end. Let $r(X_{z_i}^{j}, a_j)$ be the immediate reward the packet acquires by executing action $a_j$ at node location $X_{z_i}^{j}$. The Q-value $Q(X_{z_i}^{j}, a_j)$ at node $X_{z_i}^{j}$ with route $a_j$ is given by:

$$Q(X_{z_i}^{j}, a_j) = r(X_{z_i}^{j}, a_j) + \gamma \max_{a} Q(\delta(X_{z_i}^{j}, a_j), a)$$

where $\delta(X_{z_i}^{j}, a_j)$ is the next node location reached by selecting route $a_j$ at node $X_{z_i}^{j}$. If the next node is $X_{z_j}^{j}$, then $Q(\delta(X_{z_i}^{j}, a_j), a) = Q(X_{z_j}^{j}, a)$. γ is a constant between 0 and 1 that ensures convergence of the discounted sum: when γ is close to 0, mainly the immediate reward is considered, whereas when γ is close to 1, future rewards receive greater weight.
Computing the Q values of all possible routing nodes is very time-consuming. By considering only paths that lead closer to the destination, the Q-learning algorithm is turned into an enhanced Q-learning algorithm (QL-CLUSTER), which speeds up computation by reducing the number of candidate paths and minimizes memory requirements. This effectively reduces the computational cost of finding the best route.
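The recursion Q(X, a) = r(X, a) + γ max Q(δ(X, a), a') and the QL-CLUSTER pruning can be illustrated with the Python sketch below. It assumes known 2-D node coordinates, a hypothetical reward of 100 for the hop that reaches the destination and 0 otherwise, and our reading of the enhancement as "only neighbors strictly closer to the destination are admissible actions". It evaluates the recursion by repeated sweeps rather than learning from sampled packet transitions, and the topology and names are invented for the example.

```python
import math
from typing import Dict, List, Tuple

Pos = Dict[str, Tuple[float, float]]


def closer_neighbors(node: str, nbrs: Dict[str, List[str]], pos: Pos, dest: str) -> List[str]:
    """Pruning step (our reading of QL-CLUSTER): keep neighbors strictly nearer to dest."""
    here = math.dist(pos[node], pos[dest])
    return [j for j in nbrs[node] if math.dist(pos[j], pos[dest]) < here]


def deterministic_q(nbrs: Dict[str, List[str]], pos: Pos, dest: str,
                    gamma: float = 0.8, sweeps: int = 50, prune: bool = True):
    """Iterate Q(X, a) = r(X, a) + gamma * max_a' Q(delta(X, a), a') to a fixed point."""
    actions = {x: closer_neighbors(x, nbrs, pos, dest) if prune else list(nbrs[x])
               for x in nbrs if x != dest}
    Q = {x: {a: 0.0 for a in acts} for x, acts in actions.items()}
    for _ in range(sweeps):
        for x, acts in actions.items():
            for a in acts:  # route a moves the packet to node a, i.e. delta(X, a) = a
                reward = 100.0 if a == dest else 0.0  # hypothetical reward
                future = 0.0 if a == dest else max(Q[a].values(), default=0.0)
                Q[x][a] = reward + gamma * future
    return Q


def greedy_route(Q: Dict[str, Dict[str, float]], source: str, dest: str,
                 max_hops: int = 20) -> List[str]:
    """Follow the highest Q value from each node toward dest."""
    route = [source]
    while route[-1] != dest and len(route) <= max_hops:
        options = Q.get(route[-1], {})
        if not options:
            break  # dead end: pruning left no admissible next hop
        route.append(max(options, key=options.get))
    return route
```

On a five-node example, pruning shrinks the Q table from 9 to 5 entries while the greedy route S, A, C, D is unchanged, which mirrors the claimed reduction in computation and memory.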

Industrial Applications
Industrial WSNs (IWSNs) offer flexibility, mobility, scalability, and low maintenance. They help collect environmental parameters such as temperature and humidity in factories and can be used to monitor and maintain the operating status of plant equipment. Besides saving manpower and financial resources, they make it convenient to provide accurate remote monitoring and correction and to improve the output efficiency and quality of products.
To provide reliable, low-latency, and low-energy communication services in IWSNs, the Q-learning reliable routing with a weighting agent (QLRR-WA) algorithm was proposed [69]. The algorithm combines RL's ability to adjust weights in real time according to current operating conditions with the strong reliability of graph routing. Unlike traditional graph routing, which requires static weights and parameters to be adjusted manually, this RL-based method needs only modest computing resources and workload to adjust the optimal route flexibly according to network status. The algorithm's effectiveness was validated in terms of reliability and power consumption.

Underwater Sensing
Underwater WSNs (UWSNs) can be widely used in marine resource surveying, pollution monitoring, aided navigation, and tactical surveillance. The most important feature that distinguishes the underwater environment from other environments is the difference in the propagation, which also leads to the difference in its communication methods. UWSNs are mainly divided into two categories: underwater acoustic WSNs (UAWSNs) and underwater optical WSNs (UOWSNs). Although there are many similarities between UWSNs and traditional WSNs, some characteristics of UWSNs mean that traditional routing concepts cannot be directly applied to the design of routing algorithms in UWSNs.
Acoustic routing in UAWSNs faces challenges from dynamic topology, high energy consumption, long delay, and narrow bandwidth, and traditional routing protocols adapt poorly to the underwater environment. A routing model based on RL and game theory was proposed in [76]. It provides an effective packet-forwarding mechanism for the acoustic transmission routing problem in UAWSNs. The scheme uses Q-learning-based routing, which successively improves its estimates of the quality of particular actions in particular states. During operation, sensor nodes make routing decisions by considering end-to-end delay, hop count, packet error rate, and energy efficiency. Under dynamic acoustic conditions, each sensor node attempts to maximize its own profit in a distributed, online manner based on Q-learning. This lets each node learn the situation and determine the best routing path through a step-by-step interactive feedback process. During routing, each node periodically updates its routing information, reevaluates its current strategy, and selects one of its neighbors to forward the packet. This method effectively maximizes the expected benefit and improves energy efficiency.
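The neighbor-selection loop described above can be sketched as tabular Q-learning at a single node. This is a minimal illustration, not the RL and game-theoretic model of [76]. The reward is a hypothetical weighted sum of the four metrics the text lists (delay, hop count, packet error rate, residual energy). The weights W_DELAY, W_HOPS, W_PER, and W_ENERGY are invented for illustration. Each neighbor is also assumed to advertise its own best Q value (nbr_best_q).

```python
import random
from dataclasses import dataclass


@dataclass
class LinkObs:
    """What node i observes about forwarding via one neighbor (hypothetical units)."""
    delay: float   # end-to-end delay estimate via this neighbor, seconds
    hops: int      # hop count from the neighbor to the sink
    per: float     # packet error rate on the link, 0..1
    energy: float  # neighbor's normalized residual energy, 0..1


# Hypothetical weights; not values taken from [76].
W_DELAY, W_HOPS, W_PER, W_ENERGY = 1.0, 0.5, 2.0, 1.0


def reward(o: LinkObs) -> float:
    """Penalize delay, hop count, and errors; reward residual energy."""
    return -W_DELAY * o.delay - W_HOPS * o.hops - W_PER * o.per + W_ENERGY * o.energy


class QNode:
    """One sensor node's Q-table over its neighbors (the possible next hops)."""

    def __init__(self, neighbors, alpha=0.3, gamma=0.8, eps=0.1, seed=0):
        self.q = {n: 0.0 for n in neighbors}
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.rng = random.Random(seed)

    def choose(self):
        """Epsilon-greedy choice of next hop."""
        if self.rng.random() < self.eps:
            return self.rng.choice(list(self.q))
        return max(self.q, key=self.q.get)

    def update(self, nbr, obs, nbr_best_q):
        """Q-learning update; nbr_best_q is the max Q value advertised by nbr."""
        target = reward(obs) + self.gamma * nbr_best_q
        self.q[nbr] += self.alpha * (target - self.q[nbr])


# Usage: two neighbors with fixed (hypothetical) link observations.
obs = {"n1": LinkObs(0.5, 2, 0.05, 0.9), "n2": LinkObs(1.2, 3, 0.2, 0.4)}
node = QNode(["n1", "n2"])
for _ in range(200):
    nbr = node.choose()
    node.update(nbr, obs[nbr], nbr_best_q=0.0)
print(max(node.q, key=node.q.get))  # n1
```

In a real deployment the observations would change over time, and the epsilon-greedy exploration keeps the node re-testing the alternative next hop so that it can track those changes.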
Unlike UAWSNs, UOWSNs have attracted wide attention for their high transmission rate, ultra-wide bandwidth, and low latency. However, their communication range is limited, so nodes must transmit data over multiple hops. This makes connections prone to instability and places higher demands on link quality. To adapt to the dynamic environment, a distributed multi-agent RL routing (DMARL) scheme was designed [77]. Because agents exchange information, multi-agent RL helps each agent learn about the environment from a global perspective. A distributed network model is established to support multi-hop data transmission. Energy, link stability, and data transmission quality are used to shape the reward function of the routing algorithm. To prolong network lifetime, a distributed value function (DVF) based on the iterative Q-learning formula is used to update the Q value of each node. The DVF $Q_i^{t+1}(s_i^t, a_i^t)$ is built from the following quantities:
- $V_j^t(s_j^{t+1})$ and $V_i^t(s_i^t)$ are state value functions that estimate the next state $s_j^{t+1}$ and the state $s_i^t$ of $i$'s other neighbors, respectively.
- $r^{t+1}(s_j^{t+1})$ is the direct reward that node $i$ (in its current state) receives at time $t+1$.
- $\alpha$ is a learning rate that depends on link stability.
- $I$ is the set of $i$'s neighbors.
- $a_f(j)$ is the weight applied to the received global reward (GR), which indicates the transmission direction of the data packet (i.e., the quality of the action performed).

The weight is

$$a_f(j) = \frac{E_j^{res}}{\sum_{k \in I} E_k^{res}} \qquad (15)$$

where $E_j^{res}$ is the residual energy of node $j$. According to this weight, GR is distributed hierarchically within the local neighborhood of node $i$. This encourages nodes with more energy to serve as the next hop and balances network energy.
GR is determined from the ID of the previous node stored in the data packet, and the distribution weight is calculated from the Q table of node $i$. $w_1$ and $w_2$ are the weights of the long-term rewards that node $i$ receives from the selected node $j$ and from $i$'s other neighbors, respectively.
In addition, the residual energy and the link quality between communicating nodes are incorporated into the local reward function. If the previous hop is farther from the sink than the current node, the packet is moving in a favorable direction (closer to the sink) and the node receives positive feedback. Otherwise, it receives a negative reward. By accounting for link quality and residual energy in this way, DMARL adapts better to the dynamic environment and extends network lifetime.
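The energy-proportional weight of Eq. (15) and the sign rule of the local reward can be illustrated as follows. The weights follow Eq. (15) directly. How residual energy and link quality enter the local reward is not specified here, so the additive bonus terms and the constants R_POS, R_NEG, BETA_E, and BETA_L are assumptions. They are chosen so that a hop toward the sink always earns a positive reward and any other hop a negative one. The helper share_global_reward is a hypothetical simplification of the hierarchical GR distribution, and the full DVF update is not reproduced.

```python
def allocation_weights(residual):
    """Eq. (15): a_f(j) = E_j^res / (sum of E_k^res over node i's neighbors k)."""
    total = sum(residual.values())
    if total <= 0:
        raise ValueError("neighbors must have positive total residual energy")
    return {j: e / total for j, e in residual.items()}


def share_global_reward(gr, residual):
    """Split a global reward GR among neighbors in proportion to a_f(j)."""
    return {j: gr * w for j, w in allocation_weights(residual).items()}


# Hypothetical constants: with BETA_E + BETA_L < |R_NEG|, a hop away from the
# sink stays negative even when energy and link quality are both at 1.0.
R_POS, R_NEG, BETA_E, BETA_L = 1.0, -1.0, 0.4, 0.4


def local_reward(d_prev_to_sink, d_curr_to_sink, residual_energy, link_quality):
    """Positive if the packet moved closer to the sink, negative otherwise.

    residual_energy and link_quality are assumed normalized to [0, 1] and are
    added as bonuses (a hypothetical way of incorporating them).
    """
    base = R_POS if d_prev_to_sink > d_curr_to_sink else R_NEG
    return base + BETA_E * residual_energy + BETA_L * link_quality


print(allocation_weights({"a": 2.0, "b": 6.0}))  # {'a': 0.25, 'b': 0.75}
print(local_reward(120.0, 80.0, 0.6, 0.9))
```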

Intelligent Transportation Systems
The Internet of Vehicles (IoV) is a typical application in transportation systems that can effectively improve both traffic efficiency and traffic safety. Wireless multimedia sensor networks (WMSNs) bring rich sensing and driving capabilities to the sensing layer of IoV. Complex task processing and frequent data communication in IoV require energy to be distributed effectively while QoS is maintained. This is challenging for heterogeneous WMSNs with uneven energy distribution.
Based on RL, an energy-saving distributed adaptive cooperative routing (EDACR) scheme was proposed [78]. Compared with other ML algorithms, RL is lightweight and has low computing requirements. It also provides high flexibility and adaptability for the dynamic topology of heterogeneous WMSNs. In EDACR, RL is used to maintain the routing table. Only a subset of suitable relay nodes, rather than all nodes, is selected to maintain it, which distributes energy efficiently. Depending on their residual energy, nodes adaptively decide whether to use lightweight RL to learn the expected reliability and delay that candidate relay nodes can provide. EDACR combines cooperative routing and RL to reduce system energy consumption while ensuring QoS.
To guarantee real-time transmission in IoV, a routing algorithm based on collaborative learning automata was proposed [79]. A learning automaton (LA) located at the nearest access point (AP) learns from past experience and quickly makes routing decisions. The LA uses the vehicle density and the distance and delay between the AP and the nearest service unit as inputs for selecting an optimized path. Vehicles share information about these variables collaboratively and intelligently choose the best route. Experimental results show that the algorithm quickly plans the optimal path and effectively reduces delay and energy consumption.
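A learning automaton of the kind described above can be sketched with the classic linear reward-inaction (L_R-I) update. The automaton keeps one probability per candidate route. When a route earns a reward, its probability rises; on a penalty, all probabilities stay unchanged. This is a single-automaton sketch, not the collaborative scheme of [79]. Vehicle density, distance, and delay are collapsed into a hypothetical per-route probability of meeting the delay deadline (success_prob), and the step size is an assumption.

```python
import random


class LinearRewardInaction:
    """Learning automaton with the linear reward-inaction (L_R-I) update.

    Each action is a candidate route; p holds the action-selection probabilities.
    """

    def __init__(self, n_actions, step=0.05, seed=0):
        self.p = [1.0 / n_actions] * n_actions
        self.step = step
        self.rng = random.Random(seed)

    def choose(self):
        """Sample a route index according to the current probabilities."""
        return self.rng.choices(range(len(self.p)), weights=self.p)[0]

    def feedback(self, action, rewarded):
        if not rewarded:
            return  # inaction: probabilities stay unchanged on a penalty
        for i in range(len(self.p)):
            if i == action:
                self.p[i] += self.step * (1.0 - self.p[i])
            else:
                self.p[i] *= 1.0 - self.step


def run(success_prob, steps=2000, seed=0):
    """success_prob[k]: hypothetical chance that route k meets the delay deadline."""
    env = random.Random(seed + 1)
    la = LinearRewardInaction(len(success_prob), seed=seed)
    for _ in range(steps):
        route = la.choose()
        la.feedback(route, env.random() < success_prob[route])
    return la.p


print([round(x, 3) for x in run([0.9, 0.5, 0.2])])
```

With a small step size, L_R-I tends to concentrate probability on the route with the highest success rate. However, it can occasionally lock onto a suboptimal route, and the step parameter governs this trade-off between learning speed and accuracy.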
Reference [80] considers the energy transfer problem in a vehicle energy network (VEN) composed of renewable energy sources, charging stations, and electric vehicles. It targets the joint optimization of renewable energy routing and dynamic storage allocation in a VEN with time-varying point-to-point traffic flow and adjustable energy storage capacity at stations. A joint energy-storage-capacity and routing planning method based on linear programming is explored. A long short-term memory (LSTM) model is used to predict traffic patterns, and RL is used to iteratively improve the prediction accuracy. Experiments verify that the proposed method optimizes routing and reduces energy consumption.

Smart Home
A smart home is built on intelligent living systems that help people improve their quality of life. Sensors are embedded in home appliances and furniture and connected to the Internet through wireless networks. Sensors deployed in a room can sense humidity, temperature, light, air composition, and other data. Doors, windows, air conditioners, and other appliances can then be controlled automatically, providing an intelligent and comfortable living environment. Remote monitoring systems enable remote control of home appliances. Meanwhile, WSNs can monitor potentially dangerous behavior of infants and of elderly people living alone and provide early warnings to prevent tragedies.
The home healthcare scheduling and routing problem (HHCSRP) was explored in [81], and an offline learning method was proposed that can process all the information needed to make decisions for HHCSRP. Because neural networks are widely used for complex information processing in decision-making algorithms, shallow neural networks were adopted to approximate the action-value function of an offline model-free RL algorithm. This yields a more intelligent routing decision algorithm for the home care routing problem. Experimental results show that the neural network approximator enables the agent to make effective routing decisions based on complete information about the environment, reducing unnecessary energy loss.
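Approximating the action-value function with a shallow network in an offline, model-free setting can be sketched with fitted Q-iteration. A fixed batch of logged transitions is replayed, bootstrapped targets are recomputed once per iteration, and a one-hidden-layer network is regressed onto them. This is a generic illustration, not the algorithm of [81]. The state encoding, the two actions, the transitions, and the hyperparameters are all hypothetical.

```python
import math
import random


class ShallowQNet:
    """One hidden tanh layer mapping a state vector to one Q value per action."""

    def __init__(self, n_in, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_actions)]
        self.b2 = [0.0] * n_actions

    def _hidden(self, s):
        return [math.tanh(sum(w * x for w, x in zip(row, s)) + b)
                for row, b in zip(self.w1, self.b1)]

    def q_values(self, s):
        h = self._hidden(s)
        return [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(self.w2, self.b2)]

    def sgd_step(self, s, a, target, lr):
        """One gradient step on 0.5 * (Q(s, a) - target)^2 (only action a's head)."""
        h = self._hidden(s)
        err = sum(w * x for w, x in zip(self.w2[a], h)) + self.b2[a] - target
        for k, hk in enumerate(h):
            g = err * self.w2[a][k] * (1.0 - hk * hk)  # backprop through tanh, pre-update weight
            self.w2[a][k] -= lr * err * hk
            self.b1[k] -= lr * g
            for m, x in enumerate(s):
                self.w1[k][m] -= lr * g * x
        self.b2[a] -= lr * err


def fitted_q(net, batch, iters=30, epochs=20, gamma=0.9, lr=0.05):
    """Offline fitted Q-iteration over a fixed batch of (s, a, r, s_next, done)."""
    for _ in range(iters):
        # Bootstrapped targets are frozen for the whole iteration.
        targets = [r if done else r + gamma * max(net.q_values(s2))
                   for (_s, _a, r, s2, done) in batch]
        for _ in range(epochs):
            for (s, a, _r, _s2, _d), y in zip(batch, targets):
                net.sgd_step(s, a, y, lr)
    return net


def td_loss(net, batch, gamma=0.9):
    """Mean squared TD error of the current network on the batch."""
    errs = [(net.q_values(s)[a] - (r if done else r + gamma * max(net.q_values(s2)))) ** 2
            for (s, a, r, s2, done) in batch]
    return sum(errs) / len(errs)


# Hypothetical logged transitions: state = one-hot caregiver location,
# action = index of the next patient to visit.
batch = [([1.0, 0.0], 0, 1.0, [0.0, 1.0], False),
         ([1.0, 0.0], 1, -1.0, [0.0, 1.0], True),
         ([0.0, 1.0], 0, -1.0, [1.0, 0.0], True),
         ([0.0, 1.0], 1, 1.0, [1.0, 0.0], True)]
net = ShallowQNet(n_in=2, n_hidden=8, n_actions=2)
before = td_loss(net, batch)
fitted_q(net, batch)
print(before, td_loss(net, batch))
```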

Challenges for ML-Based Routing Algorithms in WSNs
Despite their great promise, there are many challenges to implementing ML-based routing algorithms in WSNs, which should be addressed in future research. The key challenges come from the requirements around learning performance, energy efficiency, real-time performance, and maintaining the security of the application and privacy of the users.
Most current studies treat routing transmission and decision making within a single ML framework to obtain the best policy. In practice, however, collecting network information is extremely costly, owing to high energy consumption, long delays, asynchronous information preprocessing, and reduced learning speed. The open problem is therefore to find the best balance between learning performance and information quality, so that agents do not consume excessive resources for a minimal, negligible gain in learning performance.
As highlighted earlier, sensor node batteries cannot be replaced after deployment, and the lifetime of the sensors determines the lifetime of the WSN. The routing protocol must therefore be highly energy efficient. However, ML algorithms, especially deep neural networks, need a large amount of computation to achieve accuracy, which inevitably consumes a lot of energy and puts the two requirements in conflict. Identifying the point at which the benefits of ML algorithms are balanced against energy efficiency is one of the core challenges for future applications of ML in WSNs.
In some application scenarios, WSNs must respond in real time after detecting events. Some ML algorithms periodically collect network conditions and then recalculate and distribute updated network dispatch information, which cannot meet real-time requirements in WSNs. Efficient, application-specific solutions are therefore needed to ensure that relevant information is communicated quickly and accurately.
In addition to energy efficiency and real-time performance, security is a key issue for WSNs. A WSN is a resource-constrained communication network, and communication between sensors relies mainly on low-bandwidth channels. Hence, traditional protection mechanisms are too complex to run on WSNs. Identifying how to ensure the security and privacy of sensing data through ML algorithms in WSNs is one of the most challenging research topics.
Another challenge is that most existing ML-based routing algorithms are evaluated only in simulation, which differs considerably from real WSN environments. Building large-scale testbeds to evaluate existing algorithms and refine ML-based routing schemes is an exciting research area in WSNs.

Conclusions and Future Research Directions
This paper presents a comprehensive overview of routing algorithms in WSNs. We elaborate on traditional and ML-based approaches to green routing algorithm design in WSNs. Based on detailed comparison and analysis, we propose a mathematical hypothesis model of an ML-based routing algorithm for extending the lifetime of WSNs. In addition, this study reviews the basic principles and characteristics of various routing algorithms in WSNs. We also discuss the merits and demerits of the different techniques that can be used to improve the performance of routing algorithms in WSNs. Finally, this study presents the challenges of applying ML to routing algorithm design in WSNs and identifies future research directions worth exploring with ML. This discussion will be of interest to a broad audience in ML and WSNs.
Future research may include the following directions. Owing to the computational bottleneck and energy limitations of WSNs, ML algorithms cannot be deployed at scale on sensors with little computing power and limited energy. However, distributed learning methods require less computing capacity, less energy, and smaller memory footprints than centralized learning algorithms, because they do not need information about the entire network. Distributed cooperative learning can break the computational bottleneck and achieve ML-based green routing with lower energy consumption, which makes it well suited to WSNs.
On the other hand, ML algorithms require substantial computation and energy to learn their parameters properly during training, which makes deployment in WSNs extremely difficult. Moreover, nodes differ greatly in computational capability (e.g., the sink is computationally strong, while the other nodes are weak), which suggests ways to apply transfer learning in WSNs. Since sink nodes have more power and computing capacity, they can train the model in a distributed manner with other sensor nodes and then transmit the trained model parameters to the sensor nodes. After receiving model parameters from different nodes, each sensor node multiplies them by the corresponding weights and takes a weighted average to obtain its final model parameters. This reduces the training cost at the sensor nodes and improves model accuracy in WSNs. QoS-aware routing also remains quite challenging. Developing a routing protocol that satisfies delay and bandwidth constraints while incorporating ML techniques, especially hybrid ML techniques, is an exciting research direction for WSNs.
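The weighted-averaging step in the transfer-learning scheme described above is simple to express. In this sketch, each received parameter vector is a flat list and the caller supplies the weights. How the weights are chosen (for example, by data volume or link reliability) is an assumption, and the function name weighted_average is hypothetical. The operation resembles the aggregation step of federated averaging.

```python
def weighted_average(param_sets, weights):
    """Weighted average of parameter vectors received from several nodes.

    param_sets: list of equal-length parameter lists; weights: one non-negative
    weight per parameter set (how weights are chosen is an assumption here).
    """
    if not param_sets or len(param_sets) != len(weights):
        raise ValueError("need one weight per parameter set")
    dim = len(param_sets[0])
    if any(len(p) != dim for p in param_sets):
        raise ValueError("parameter sets must have the same length")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [sum(w * p[i] for p, w in zip(param_sets, weights)) / total
            for i in range(dim)]


# Parameters from two nodes; the second is trusted three times as much (hypothetical).
print(weighted_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.5]
```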