Modeling and Solution of the Routing Problem in Vehicular Delay-Tolerant Networks: A Dual, Deep Learning Perspective

Abstract: The exponential growth of cities has brought important challenges such as waste management, pollution, overpopulation, and the administration of transportation. To mitigate these problems, the idea of the smart city was born, seeking to provide robust solutions integrating sensors and electronics, information technologies, and communication networks. More particularly, to face transportation challenges, intelligent transportation systems are a vital component in this quest, helped by vehicular communication networks, which offer a communication framework for vehicles, road infrastructure, and pedestrians. The extreme conditions of vehicular environments, nonetheless, make communication between nodes that may be moving at very high speeds very difficult to achieve, so non-deterministic approaches are necessary to maximize the chances of packet delivery. In this paper, we address this problem using artificial intelligence from a hybrid perspective, focusing on both the best next message to replicate and the best next hop in its path. Furthermore, we propose a deep learning-based router (DLR+), a router with a prioritized type of message scheduler and a routing algorithm based on deep learning. Simulations done to assess the router performance show important gains in terms of network overhead and hop count, while maintaining an acceptable packet delivery ratio and delivery delays, with respect to other popular routing protocols in vehicular networks.


Introduction
As urban environments grow exponentially, the smart city (SC) has emerged as the technological paradigm that aims to provide comprehensive solutions to urban development in wide areas such as social management, security and safety, health and medical care, smart living, tourism, and transportation, with the aid of sensors and electronics, communication networks, and information technologies [1,2]. Among the essential needs and key components of a smart city are intelligent transportation systems, which seek to provide a solution to transportation-related problems, such as pollution, traffic congestion, and accidents [3,4]. In this sense, vehicular networks play a key role by providing a communication framework for moving vehicles, road infrastructure, and pedestrians [5]. The main goal of vehicular networks is to provide seamless wireless communication between cars (vehicle to vehicle, or V2V), infrastructure (vehicle to infrastructure, or V2I), pedestrians (vehicle to pedestrian, or V2P), and virtually any object (vehicle to anything, or V2X), which would allow important improvements to transportation services as we know them as well as the creation of new ones [6,7].

Related Work
Over the past years, several approaches have been proposed to address the routing problem in vehicular delay-tolerant networks (VDTN). Due to the particular characteristics of vehicular environments, and especially the lack of an end-to-end connection between nodes, non-deterministic approaches must be used [10,11].
Some routers for delay-disruption tolerant networks, like the epidemic router [15] and the spray and wait router [16], use a flooding-based principle of spreading copies of the messages to newly discovered contacts. The epidemic router is one of the most popular routers in this category [7,15]. Its approach is to distribute messages to other hosts within connected portions of the network, relying on such carriers coming into contact with another connected portion of the network through node mobility, in the hope that through this transitive transmission of data, messages will eventually reach their destination. This routing protocol provides an acceptable delivery rate and delay, but at the expense of using too many resources in the network. The spray and wait router [16] uses a similar (flooding-based) but more controlled mechanism, "spraying" a number of copies into the network, and "waiting" until one of these nodes meets the destination. More particularly, this router passes L copies from the source node (phase 1, spray), and then each of the L copies waits in its temporary host until there is a contact, if any, with the destination (phase 2, wait), to which it is only then forwarded.
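The two phases of spray and wait described above can be illustrated with a minimal Python sketch. It assumes the simplest variant, in which the source itself hands out every copy; the `Node` class and the function names are hypothetical and do not come from [16] or from any simulator.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node that only tracks which message ids it carries."""
    name: str
    buffer: set = field(default_factory=set)


def spray(source, relays, msg_id, copies):
    """Phase 1 (spray): the source gives one copy to each of the first
    `copies` relays it meets. Returns the relays that now carry the message."""
    source.buffer.add(msg_id)
    carriers = relays[:copies]
    for relay in carriers:
        relay.buffer.add(msg_id)
    return carriers


def wait_and_deliver(carrier, contact, msg_id, destination):
    """Phase 2 (wait): a carrier forwards its copy only when the node it
    meets is the destination itself."""
    if msg_id in carrier.buffer and contact is destination:
        destination.buffer.add(msg_id)
        return True
    return False


src = Node("source")
dst = Node("destination")
relays = [Node(f"relay{i}") for i in range(5)]
carriers = spray(src, relays, "m1", copies=3)
```

In this sketch a carrier that meets another relay keeps its copy, which is what limits the overhead with respect to epidemic routing.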
Other routers use probabilistic approaches to increase the chances of packet delivery. MaxProp [22] is one of the first routers proposed in this category. This router uses what the authors call an estimated delivery likelihood for each node in the network, updated through incremental averaging, so nodes that are seen infrequently obtain lower values over time. Packets ranked with the highest priority are the first to be transmitted during a transfer opportunity, whereas those ranked with the lowest priority are the first to be deleted to make room for incoming packets. The PRoPHET router [17], on the other hand, is perhaps the most popular router in the probabilistic routing category. Based on the history of encounters between the nodes, this router introduces a metric called delivery predictability, a set of probabilities of successful delivery that each node maintains for every known destination in the network. When nodes meet, they exchange their delivery predictabilities and update their own information accordingly, and the decision of whether or not to pass the current message to a particular node is made based on these values.
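As an illustration of how a delivery predictability can evolve, the following Python sketch applies the three update rules commonly described for PRoPHET (reinforcement on encounter, aging, and transitivity). The constant values and the function names are assumptions chosen for illustration; they are not taken from this paper.

```python
# Assumed illustrative constants (not values reported in this paper).
P_INIT = 0.75   # reinforcement applied when two nodes meet
BETA = 0.25     # weight of the transitive update
GAMMA = 0.98    # aging factor per elapsed time unit


def on_encounter(p_old, p_init=P_INIT):
    """Increase the predictability for a node that has just been met."""
    return p_old + (1.0 - p_old) * p_init


def age(p_old, elapsed_units, gamma=GAMMA):
    """Decay the predictability for a node not seen for `elapsed_units`."""
    return p_old * gamma ** elapsed_units


def transitive(p_ac_old, p_ab, p_bc, beta=BETA):
    """Update P(a, c) after a meets b, using b's predictability for c."""
    return p_ac_old + (1.0 - p_ac_old) * p_ab * p_bc * beta


def should_forward(p_self_to_dest, p_peer_to_dest):
    """Forward only if the peer is more likely to deliver than the carrier."""
    return p_peer_to_dest > p_self_to_dest
```

For example, two consecutive encounters starting from zero give 0.75 and then 0.9375, while a node that is not seen again has its value multiplied by the aging factor at every time unit.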
In recent years, artificial intelligence techniques have gained tremendous popularity because of their successful application to many practical optimization, prediction, and classification problems, including image processing (facial recognition, cancer detection, etc.) and forecasting (stock prediction, weather forecasting, etc.), among others [23,24]. The application of AI-based algorithms to the routing problem in VDTN, however, is still not fully explored, although some efforts have been made in this direction. In this category, SeeR is one of the most efficient routers [18]. This router uses the simulated annealing algorithm to evaluate which messages are best to transfer in each contact opportunity. Each message is associated with a cost function in terms of the hop count and the average intercontact time of the current node, and one node transfers a message to another node if the second node offers a lower cost value. Otherwise, the messages are still forwarded, but with a decreased probability. Their experimental results show considerable gains in the average delivery ratio and improvements in delivery delays with respect to flooding algorithms like epidemic routing and spray and wait. Another router in this category is KNNR, a router based on the KNN classification algorithm, proposed in [25]. The authors use six parameters (available buffer space, time-out ratio, hop count, neighbor node distance from destination, interaction probability, and neighbor speed) to decide on the final label. The class used during the training stage (which is done offline) is based on the interaction probability, which is the same as the one used in PRoPHET. Like SeeR, this router addresses the routing problem from the best next message perspective. Their results show a better average delivery ratio and an acceptable delay with respect to the Epidemic and PRoPHET routers.
Appl. Sci. 2019, 9, 5254
Also, the authors in [26] propose MLProph, a machine learning model as a routing protocol. They use the PRoPHET router as the base and expand its capabilities by adding some other features to the equation, and the result is an improved router with respect to the base. Although they use a neural network model as well, they use a different algorithm than the one proposed here. Furthermore, their router makes calculations for each connected router, which increases the use of computational resources such as time and CPU, and transfers sensitive information from the connected nodes, increasing the risk of security leaks. In [27], the authors presented CRPO (cognitive routing protocol for opportunistic networks), which also uses a neural network as the core, although the decision parameter is the probability of encounter defined in PRoPHET; hence, CRPO is similar in nature to MLProph, since both of them use PRoPHET's probability as their main decision parameter. Although the authors state that the training stage is run for X units of time every Y units of time, they do not provide further detail on this. Finally, in [28], the authors explore the possibility of removing the routing protocol from a wireless network using deep learning techniques. The problem statement, however, is formulated as a classical optimization problem to find the shortest path in a connected graph. That is, the scenario is different from that of a vehicular network, since one of the main characteristics of VDTN is precisely the lack of a fixed topology with pre-defined paths.

Formulation of the Routing Problem
Let $N = \{N_i \mid 1 \le i \le L_N\}$ be the set of available nodes in a vehicular network with constant disruptions and non-fixed topology, and let $A \in N$ be a given node in that set (Figure 1). Given that there are no predefined paths and the connections are intermittent, the nodes in the network must act opportunistically, taking advantage of any node that gets into their communication range, because whenever these encounters happen, the opportunity of replicating a message arises. In those situations, $A$ has to decide on a node to start a transfer. Several criteria can be used for this decision, but ultimately, $A$ would like to choose the node with the best capabilities of further spreading the messages until, hopefully, they get to their destination. Following this approach, the routing problem can be expressed as finding the best next hop (BNH) for the messages. That is, from all $k$ nodes that $A$ is connected to at a given moment, the one, $N_x$, with the best fitness $f_x$ must be determined, in terms of its current features $x_1, \ldots, x_n$. Furthermore, in order to optimize the communication conditions, not only must the best next hop be selected, but we can also detect the best next message (BNM) to be transferred. That is, based on its current attributes $y_1, y_2, \ldots, y_m$, we must be able to select from the message queue $M = \{M_i \mid 1 \le i \le L_M\}$ the message $M_y \in M$ with the best fitness $f_y$. Because neural networks have the power to learn very complex non-linear patterns, they are well suited for what we are trying to achieve here. We can therefore model both optimization scenarios as binary classification tasks that allow us to quantify the capabilities of the nodes $N_i$ as a function $F$ of some of their characteristics $x_i$, as $f_x = F(x_1, x_2, \ldots, x_n)$, and the capabilities of the messages $M_i$ as a function $G$ of some of their characteristics $y_i$, as $f_y = G(y_1, y_2, \ldots, y_m)$.
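One compact way to restate the two selections described above is as a pair of maximizations. The notation $C_A$ for the set of $k$ nodes currently connected to $A$, and the superscripts that tie a feature vector to its node or message, are assumptions introduced here for illustration; they do not appear in the original formulation.

```latex
% C_A: assumed name for the set of k nodes currently connected to A
N_x = \arg\max_{N_j \in C_A} F\bigl(x_1^{(j)}, x_2^{(j)}, \ldots, x_n^{(j)}\bigr),
\qquad
M_y = \arg\max_{M_i \in M} G\bigl(y_1^{(i)}, y_2^{(i)}, \ldots, y_m^{(i)}\bigr)
```

Under this reading, the best next hop and the best next message are simply the candidates whose fitness values $f_x$ and $f_y$ are largest at the time of the contact.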

DLR+ Router Overview
In this section, we describe in more detail the fundamental principle and architecture of DLR+, the router in the proposed solution. The main idea is to have a router capable of learning from the conditions of its environment and use such information to make smart forwarding decisions. In order to achieve that, the router uses two pre-trained feed forward neural networks to process the information from both its neighbors and the messages in their queues in real time and select from them the best next hop for the best next message, according to their current fitness. More details are given in the following subsections.

Router Architecture
The core of the router has two fundamental modules that allow the router, upon a connection-up event, to choose the best next hop from its current connections and the best next message to send from its queue, but also to share information with other nodes (upon request), so they can decide whether or not to pass a packet to it. These modules are called, respectively, the connections manager and the fitness center, which in turn has two independent modules for the messages and for the host itself (Figure 2).

The Fitness Center
This part of the router has two pre-trained deep feed forward neural networks that use the available local information to compute the router's current fitness $f_x$, defined as the value that determines its ability to correctly deliver data packets to the final destination, and the fitness $f_y$ for each message in the queue, with $f_x, f_y \in \mathbb{R}$, $0 \le f_x, f_y \le 1$. The closer these values are to 1, the fitter their owners are. More details on how to get these numbers are given in Section 4.2. These values are automatically updated in each router right after a connection is ended and right after a new message has been received, so they are available and ready to be used at any moment.

The Connections Manager
This module plays a vital role in the selection of the best next message for the best next hop. It manages the incoming connections, requesting their $f_x$ values in order to select the fittest node. After this, if available, the message scheduler will send the fittest message to that node.

The Neural Networks
We treat the problem of finding the BNH and BNM as binary classification problems, given that we would like to know whether or not the node and the messages are in the best conditions (i.e., fit) to carry and deliver the messages. Thus, the neural networks used in the fitness center are feed forward neural networks, whose general architecture is presented in Figure 3. Here, $X \in \mathbb{R}^{n}$ is the set of $n$ input values $x_i$, $\forall i \in \{1, 2, \ldots, n\}$, that reflect some of the characteristics of the host at that moment, such as its speed and buffer occupancy; $H_i \in \mathbb{R}^{n_{h_i}}$ is the vector that contains the values $h_i$ (computed according to Equation (3)) of the $n_{h_i}$ neurons in hidden layer number $i$, $\forall i \in \{1, \ldots, K\}$, where $K$ is the number of hidden layers in the network; and $f$ is the resulting fitness value of the host in the given conditions. The set of weights (synapses) of the neural network, without its bias values, is given by $S_{N0} \in \mathbb{R}^{n \times n_{h_1}}$ for the connections between the input layer and hidden layer 1, and $S_{Ni} \in \mathbb{R}^{n_{h_i}}$ for the connections between hidden layer $i$ and the next hidden layer $i + 1$, for all $1 \le i \le K$, including the connections from the last hidden layer to the output layer. Finally, the bias values of each synapse are given by $B_{Ni} \in \mathbb{R}^{n_{h_i}}$, $\forall i \in (0, K)$. Similarly, for the message network, $S_{M0} \in \mathbb{R}^{m \times m_{h_1}}$ is the synapse vector for the connections from the input layer to the first hidden layer, and $S_{Mi} \in \mathbb{R}^{m_{h_i}}$ are the synapses for the connections from the $i$-th hidden layer to the next one, including the connections from the last hidden layer to the output layer, and the bias values of each synapse are given by $B_{Mi} \in \mathbb{R}^{n_{h_i}}$, $\forall i \in (0, K]$.
As for the number $K$ of hidden layers, the universal approximation theorem [29] establishes that "a neural network with a single hidden layer with a finite number of neurons can approximate any continuous function on compact subsets in $\mathbb{R}^{n}$"; this implies that, given the appropriate parameters, a neural network with one single hidden layer is enough to represent a great number of systems. Nonetheless, the width of such a layer might become exponentially big. Indeed, Ian Goodfellow, a pioneer researcher on deep learning, holds that "a neural network with a single layer is enough to represent any function, but the layer can become infeasibly large and fail to learn and generalize correctly" [30]. On the other hand, while a neural network without hidden layers can only represent linearly separable functions, one hidden layer can approximate functions with a continuous mapping from one finite space to another, and two layers can represent an arbitrary decision boundary with any level of accuracy [31]. In summary, one hidden layer helps to capture non-linear aspects of a complex function, but two layers help to generalize and learn better. In fact, the authors hold that one rarely needs more than two hidden layers to represent a complex non-linear model. For the number $n_{h_i}$ of neurons in each hidden layer $H_i$, there is no formula that gives an exact number, although some empirical rules can be used [32]. The most common assumption is that the optimal size of the hidden layers is, in general, between the size of the input layer and the size of the output layer. For this module in DLR+, this would mean that $n \ge n_{h_i} \ge 1$. Another suggestion is to keep this number as the mean between the number of neurons in the input and output layers and from there start decreasing the number of neurons in each subsequent layer without falling below 2 neurons in the last hidden layer. In DLR+, this would imply that $n_{h_1} = n/2 \ge n_{h_2} \ge 2$. One last suggestion to avoid overfitting during the training process (which would mean that the neural network would have great memory capacity, but no prediction capabilities over unseen data) is to keep the number of neurons in the hidden layers as $n_{h_i} < \ldots$
Finally, the rectified linear unit (ReLU, for short) was used as the activation function for the neurons in the hidden layers (Equation (1)), and the sigmoid function $\sigma(z)$ (defined in Equation (2)) as the activation function for the neuron in the output layer, because we want this value to reflect the fitness of the hosts, and this function returns values between 0 and 1. This way, the fitness value for the host is computed by taking the current set of features $X$ of the host and making a forward pass through the neural network, as shown mathematically by Equations (3) and (4), where $P \cdot Q$ denotes the dot product between $P$ and $Q$. Given the nature of the sigmoid function, the closer a value $f$ is to 1, the fitter the host will be, and vice versa.
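The forward pass described above can be sketched in plain Python. The sketch uses the standard definitions ReLU(z) = max(0, z) and σ(z) = 1 / (1 + e^(-z)); the layer layout (a list of weight and bias pairs, with weights stored as one row per input neuron) and all function names are assumptions made for illustration, and the toy weights below are not trained values from the paper.

```python
import math


def relu(z):
    """Rectified linear unit, used in the hidden layers."""
    return max(0.0, z)


def sigmoid(z):
    """Logistic function, used in the output neuron; returns values in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One layer: activation(inputs . weights + biases).
    `weights[i][j]` connects input neuron i to output neuron j."""
    return [
        activation(
            sum(inputs[i] * weights[i][j] for i in range(len(inputs))) + biases[j]
        )
        for j in range(len(biases))
    ]


def fitness(features, layers):
    """Forward pass: ReLU on every hidden layer, sigmoid on the output layer.
    `layers` is a list of (weights, biases) pairs, the last one being the
    output layer with a single neuron."""
    h = features
    for weights, biases in layers[:-1]:
        h = dense(h, weights, biases, relu)
    out_weights, out_biases = layers[-1]
    return dense(h, out_weights, out_biases, sigmoid)[0]


# Toy network (assumed values): 2 inputs -> 1 hidden neuron -> 1 output neuron.
toy_layers = [
    ([[1.0], [1.0]], [0.0]),  # hidden layer
    ([[1.0]], [0.0]),         # output layer
]
toy_fitness = fitness([1.0, 2.0], toy_layers)
```

With these toy weights the hidden neuron outputs ReLU(3) = 3, so the fitness is σ(3), a value close to 1; negative pre-activations are clipped to 0 by ReLU and yield σ(0) = 0.5.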

The Routing Algorithm
To have some sensitivity with respect to other nodes' fitness, DLR+ uses the parameter α, with 0 ≤ α ≤ 1, named the host fitness threshold, which determines the fitness limit below which the incoming connections may be directly ignored. This value is a key component of the routing protocol in DLR+, because different threshold values result in different dynamics in the opportunistic environment. In a similar way, we introduce β, the message fitness threshold, which determines a fitness limit for the messages in the queue, below which they can be directly ignored by the message dispatcher.

f-Value Update
This first stage takes place each time a connection between the host and another node in the vehicular network has ended. Since some of the host's features may have changed (such as buffer occupancy, dropping rate, and others), its fitness value has to be recomputed as well. For this, the considered features $x_i$ are obtained in the fitness center and normalized according to Equation (5), where $x$ is the feature being transformed, and $x_m$ and $x_M$ are the minimum and maximum registered values of that feature.
This gives final input values $x_i$, with $0 \le x_i \le 1$, which in turn makes the prediction process more reliable. These normalized values are forward passed through the network, according to Equations (3) and (4), to get the final updated $f$ value.
A similar process is executed each time a message is received by the host. Whenever this happens, the $f$ value of the incoming message is computed according to Equations (3) and (4) in its corresponding neural network. Finally, the message is placed in the queue according to its fitness. This way, the message queue is always ready, with the fittest message first.
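The normalization and the fitness-ordered queue described in this subsection can be sketched as follows. The sketch assumes that the normalization is the usual min-max scaling, (x − x_m) / (x_M − x_m), which matches the stated range of 0 to 1; the clipping of out-of-range values, the handling of a zero-width range, and the `FitnessQueue` class are additional assumptions for illustration.

```python
import bisect


def min_max(x, x_min, x_max):
    """Scale a feature to [0, 1] from its registered minimum and maximum.
    Assumption: values outside the registered range are clipped, and a
    zero-width range maps to 0.0."""
    if x_max == x_min:
        return 0.0
    clipped = min(max(x, x_min), x_max)
    return (clipped - x_min) / (x_max - x_min)


class FitnessQueue:
    """Hypothetical message queue kept sorted with the fittest message first."""

    def __init__(self):
        self._keys = []      # negated fitness values, in ascending order
        self._messages = []  # message ids, aligned with self._keys

    def push(self, message_id, fitness):
        # Negating the fitness lets bisect keep a descending order by fitness.
        idx = bisect.bisect_right(self._keys, -fitness)
        self._keys.insert(idx, -fitness)
        self._messages.insert(idx, message_id)

    def ordered(self):
        return list(self._messages)


queue = FitnessQueue()
queue.push("a", 0.2)
queue.push("b", 0.9)
queue.push("c", 0.5)
```

Inserting at the right position on arrival keeps the queue ready at all times, so no sorting is needed when a contact opportunity appears.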

BNH Selection and Packet Forwarding
The second stage of the routing process occurs when a link is established between the current host and some of its neighbor nodes. At that moment, the router will attempt to exchange deliverable messages (i.e., messages whose final destination is among the current connections), if any. Then, the host router asks the connected nodes for their fitness values (which, thanks to their fitness center, are always up to date). After that, before entering the final selection, the router directly discards those connections whose $f$ value is not at least the fitness threshold α, and sorts the remaining connections in descending order of fitness. With a complete list of fit candidates, the selection process is straightforward: The best next hop will be the fittest node (the one with the highest $f$ value), so the router will attempt to replicate a data packet to the nodes in that order. Algorithm 1 summarizes the routing protocol, as explained in the previous subsections.
Algorithm 1. Routing protocol in DLR+.
C: the set of current connections
M: the message queue
C_o: the set of connection tuples ordered by fitness
Steps:
1. Exchange messages whose final destination is in C
2. Do: for each c_i ∈ C: get f_i; if f_i ≥ α: add (c_i, f_i) to C_o
3. Sort C_o in descending order
4. Do: for each m_i ∈ M: get f_i; if f_i ≥ β: for each c_i ∈ C_o: replicate m_i to c_i
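The steps of Algorithm 1 can be sketched in Python as shown below. This is a minimal illustration rather than the router's implementation: the data layout (a dictionary of fitness values per connected node and a list of message tuples) and the function names are assumptions, and the transfers are only recorded instead of being sent over a link.

```python
def select_next_hops(connection_fitness, alpha):
    """Steps 2 and 3: keep connections with f >= alpha, fittest first.
    `connection_fitness` maps a node id to the fitness it reported."""
    fit = [(f, node) for node, f in connection_fitness.items() if f >= alpha]
    fit.sort(key=lambda pair: pair[0], reverse=True)
    return [node for _, node in fit]


def route(queue, connection_fitness, alpha, beta):
    """Return the planned transfers as (message_id, node_id) pairs.
    `queue` holds (message_id, message_fitness, destination) tuples."""
    transfers = []
    pending = []
    # Step 1: deliver messages whose final destination is directly connected.
    for msg_id, f_y, dest in queue:
        if dest in connection_fitness:
            transfers.append((msg_id, dest))
        else:
            pending.append((msg_id, f_y))
    hops = select_next_hops(connection_fitness, alpha)
    # Step 4: replicate fit messages, fittest first, to the fit hops in order.
    for msg_id, f_y in sorted(pending, key=lambda m: m[1], reverse=True):
        if f_y >= beta:
            for node in hops:
                transfers.append((msg_id, node))
    return transfers


connections = {"n1": 0.9, "n2": 0.3, "n3": 0.6}
messages = [("m1", 0.8, "z"), ("m2", 0.2, "z"), ("m3", 0.5, "n2")]
plan = route(messages, connections, alpha=0.5, beta=0.4)
```

In this example, message m3 is delivered directly to n2 even though n2 is below the host threshold, because direct delivery happens before the filtering; m1 is replicated to n1 and then n3, and m2 is skipped because its fitness is below β.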

Experiment
In this section, we describe the design and execution of the experiment to validate the proposed solution. First, we explain the general setup, and then go to the router and neural networks tuning as well as the evaluation metrics considered in this experiment.

Simulation Setup
We used The ONE simulator, which is a virtual environment designed to test opportunistic networks [33]. The test scenario, delimited by a 1000 m by 1200 m rectangular terrain (Figure 4), was a portion of Queretaro City, the capital of Queretaro, a medium-sized state in Mexico with a little over 2 million inhabitants. The main simulation was done with DLR+, and we tested against four popular routing protocols: The epidemic and spray and wait routers from the flooding-based category, the PRoPHET router from the probabilistic category, and the SeeR router from the AI-based category, as explained in Section 2. The simulation period was 43,200 s (12 h).

Mobility Model
One of the features that makes the simulation more realistic is the model that governs the movement of the nodes in the vehicular network, providing coordinates, speeds, and pause times for the nodes. Popular models include random waypoint, map-based movement, and shortest path map-based movement [34]. We used the last of these for the simulation, which constrains node movement to predefined paths and uses Dijkstra's shortest path algorithm to find a route through the map area. Under this model, once a node has reached its destination, it waits for a pause time, another random map node is chosen, and the node moves there, repeating the process.
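The movement rule described above can be illustrated with a small, self-contained sketch. It is not the simulator's implementation: the adjacency-dictionary map, the node names, and the helper names are assumptions made for the example, and pause times and speeds are omitted.

```python
import heapq
import random
from typing import Dict, List, Tuple

# Hypothetical road map: node -> {neighbor: segment length in meters}
Graph = Dict[str, Dict[str, float]]


def dijkstra_path(graph: Graph, source: str, target: str) -> List[str]:
    """Return the shortest path from source to target, or [] if unreachable."""
    dist: Dict[str, float] = {source: 0.0}
    prev: Dict[str, str] = {}
    queue: List[Tuple[float, str]] = [(0.0, source)]
    visited = set()
    while queue:
        d, node = heapq.heappop(queue)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for neighbor, length in graph[node].items():
            nd = d + length
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                prev[neighbor] = node
                heapq.heappush(queue, (nd, neighbor))
    if target not in dist:
        return []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def next_trip(graph: Graph, current: str, rng: random.Random) -> List[str]:
    """Pick a random map node other than the current one and route to it."""
    destination = rng.choice([n for n in graph if n != current])
    return dijkstra_path(graph, current, destination)


if __name__ == "__main__":
    road_map: Graph = {
        "A": {"B": 1.0, "C": 5.0},
        "B": {"A": 1.0, "C": 1.0},
        "C": {"A": 5.0, "B": 1.0, "D": 1.0},
        "D": {"C": 1.0},
    }
    print(dijkstra_path(road_map, "A", "D"))
```

A node following this model would call `next_trip` each time its pause time expires, using the end of the previous path as the new starting point.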

Host Groups
For this simulation, there was a total of 85 nodes, divided into eight different groups, each with particular characteristics. The wireless access for vehicular environment (WAVE) IEEE 802.11p Standard [35] establishes a minimum of 3 Mbps and a maximum of 27 Mbps for wireless communications. Thus, we decided to include connections at 6 Mbps, 12 Mbps, and 24 Mbps. We also included some Bluetooth connections at 2 Mbps. The buffer size, maximum node speed, and number of nodes of each type are shown in Table 1. The time to live of the messages (TTL, in seconds) was iterated over the list TTL = {0, 25, 50, 75, 100, 150, 200, 300} to gain a broader understanding of the behavior of the router.

Design and Training of the Neural Networks in DLR+
The general architecture of the neural networks used in DLR+ was presented in detail in Section 4.2. As noted, all of the parameters were left as variables, meaning that they can be further adjusted in future versions as desired. The neural networks considered in this work are deep feed-forward neural networks with two hidden layers, which provide the capability to capture complex nonlinearities in the system. Thus, each network consists of an input layer, two hidden layers, and an output layer. As explained in Section 4.2, the number of neurons in the input layer is the number n of features to process from each sample in the classification process. For this version of DLR+, for the host's fitness, eight different features x_i were considered, plus an additional eight features x_j = x_i^2, 1 ≤ i ≤ 8, to help capture nonlinearities, for a total of n = 16 input features, listed in Table 2. For the second neural network (the one that handles the message fitness), we used a total of m = 3 different features, described in Table 3. We also tried including the squared features during the training process, but did not notice any gains in accuracy, so we removed them. As for the number of neurons in the hidden layers, following the suggestions in Section 4.2 and seeking a short computational time, we opted for n_h1 = 14 and n_h2 = 10. Similarly, we used m_h1 = 5 and m_h2 = 3 for the messages' neural network.
Finally, the output layer in both neural networks (the one for the host fitness and the one for the messages) has a single neuron that, according to Equation (2) and as explained in Section 4.2, takes a value between 0 and 1. During the training process, this value is further converted to a binary value, so each sample has a unique label l ∈ {0, 1}, given by Equation (6), where f is the value returned by the sigmoid function in the last part of the forward pass.
This labeling process is used to compare and evaluate the prediction class during training. However, we have to remember that during runtime in the VDTN environment this labeling process must not be done, because we are only interested in identifying the samples with the best fitness (that is, the samples with the highest f value), which are directly obtained after the forward pass by the sigmoid function (see Equations (3) and (4)).
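To make the difference between training-time labeling and runtime scoring concrete, the following sketch runs a forward pass through a network with the host-fitness layer sizes given above (16, 14, 10, 1). Several details are assumptions for illustration only: the hidden layers use a sigmoid activation, the weights are random placeholders rather than trained values, and the 0.5 cutoff in `training_label` is hypothetical, since Equation (6) is not reproduced here.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = List[List[float]]  # synapse matrix S_i, indexed [input][output]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def expand_features(x: Sequence[float]) -> List[float]:
    """Append the squared features x_i^2 to the raw features x_i."""
    return list(x) + [v * v for v in x]


def init_network(sizes: Sequence[int],
                 rng: random.Random) -> Tuple[List[Layer], List[List[float]]]:
    """Random placeholder weights; trained S_i and B_i would be loaded instead."""
    synapses = [[[rng.uniform(-1.0, 1.0) for _ in range(n_out)]
                 for _ in range(n_in)]
                for n_in, n_out in zip(sizes, sizes[1:])]
    biases = [[0.0] * n_out for n_out in sizes[1:]]
    return synapses, biases


def forward(x: Sequence[float], synapses: List[Layer],
            biases: List[List[float]]) -> float:
    """Return the fitness f in (0, 1) produced by the single output neuron."""
    activation = list(x)
    for S, B in zip(synapses, biases):
        activation = [
            sigmoid(sum(a * S[i][j] for i, a in enumerate(activation)) + B[j])
            for j in range(len(B))
        ]
    return activation[0]


def training_label(f: float, cutoff: float = 0.5) -> int:
    """Binary label used only during training (cutoff is an assumption)."""
    return 1 if f >= cutoff else 0


if __name__ == "__main__":
    rng = random.Random(1)
    S, B = init_network([16, 14, 10, 1], rng)
    raw = [rng.random() for _ in range(8)]  # eight normalized host features
    f = forward(expand_features(raw), S, B)
    print(f, training_label(f))
```

At runtime only `forward` would be used: candidates are ranked by their raw f values, and `training_label` is never called.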
For the training stage, DLR+ uses K + 1 synapse matrices S_i with their corresponding bias vectors B_i, with i ∈ {0, ..., K}, where K is the number of hidden layers of the deep neural networks, as introduced in Section 4.2. These matrices are obtained during the training process by using a dataset with samples obtained from a simulation scenario with the conditions defined in Section 6.1. More particularly, the hosts were configured to be one of the three popular routers PRoPHET, Spray and Wait, or SeeR, and a total of 11,016,000 sample vectors X = [x_1, x_2, ..., x_8] were obtained from simulations with a simulation time of 43,200 s (12 h), gathering the current features x_i of each of the 85 hosts each second. The labels l for each sample were obtained directly from the feature final delivery rate (FDR), considering that the more messages a host delivers to a final destination, the closer to a fit node it must be. For this, the samples were passed through a standardization process, and the ones that got a positive z-score were considered "fit" (l = 1) according to Equation (7), where x is the value of the aforementioned feature FDR, x̄ is the mean of all FDR values in the dataset, and σ is the sample standard deviation.
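The labeling rule described above (a sample is "fit" when its FDR has a positive z-score) can be sketched as follows. The function name and the handling of a zero standard deviation are assumptions added for the example.

```python
import statistics
from typing import List, Sequence


def label_by_fdr(fdr_values: Sequence[float]) -> List[int]:
    """Label a sample 1 ("fit") when its FDR z-score is positive, else 0."""
    mean = statistics.mean(fdr_values)
    sigma = statistics.stdev(fdr_values)  # sample standard deviation
    if sigma == 0:
        # All FDR values are equal, so no sample is above the mean.
        return [0 for _ in fdr_values]
    return [1 if (x - mean) / sigma > 0 else 0 for x in fdr_values]


if __name__ == "__main__":
    print(label_by_fdr([0.1, 0.2, 0.3, 0.8]))
```

Because σ is positive whenever the values differ, a positive z-score is equivalent to the FDR being above the dataset mean.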
In preprocessing, all duplicated records were deleted from the original dataset, and all remaining values were normalized for each feature x_i/y_i, according to Equation (5), to obtain a better mapping and faster convergence during training; finally, the resulting dataset was randomly permuted. It was then split into two subsets, one for training (80% of the data) and one for validation (20%), to assess the learning process and generalization. Other hyperparameters of the neural networks were the ADAM optimizer (faster than traditional stochastic gradient descent [36]) and binary cross-entropy as the error function. In this way, we obtained 90.12% accuracy on the training set and 90.55% on the validation set. This is how the synapse matrices S_i and bias vectors B_i used in DLR+ were obtained.
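The preprocessing pipeline above (deduplicate, normalize, permute, split 80/20) can be sketched with the standard library alone. Equation (5) is not reproduced in this section, so the per-feature min-max scaling used here is an assumption, as are the function name and the tuple-based sample format.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[float, ...]


def preprocess(samples: Sequence[Sample], rng: random.Random,
               train_fraction: float = 0.8) -> Tuple[List[Sample], List[Sample]]:
    """Deduplicate, scale each feature to [0, 1], shuffle, and split."""
    unique = list(dict.fromkeys(samples))  # drops duplicates, keeps order
    lows = [min(col) for col in zip(*unique)]
    highs = [max(col) for col in zip(*unique)]
    scaled = [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in unique
    ]
    rng.shuffle(scaled)  # random permutation of the dataset
    cut = int(len(scaled) * train_fraction)
    return scaled[:cut], scaled[cut:]


if __name__ == "__main__":
    data = [(float(i), 2.0 * i) for i in range(10)] + [(0.0, 0.0)]
    train, val = preprocess(data, random.Random(42))
    print(len(train), len(val))
```

Fixing the seed of the random generator keeps the split reproducible between runs.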

The Fitness Thresholds in DLR+
As described at the beginning of Section 5, the fitness threshold α is a router parameter used to discriminate "bad" from "good" nodes, as explained in the routing algorithm definition. This value can be any real number between 0 and 1, each choice resulting in a different router performance, as can be seen in the results section (Section 7). We found that α = 0.65 offered the best performance, so that is the default value for this parameter in DLR+. As for the β value, we did not notice any significant differences for values different from 0, so we decided to use β = 0 as the default value.

Evaluation Metrics
The following key evaluation metrics were considered to assess the performance of DLR+ during the simulation.

Packet Delivery Ratio
We will call this metric PDR, for short. It is defined in Equation (8) and should be maximized: a high value means that a large share of the created messages were successfully delivered to their destinations.
Ideally, we would like this number to be 1, but in practice this is rarely achievable, since other constraints in the network, such as buffer size and message TTL, lead to dropping or deletion policies that prevent some messages from reaching their destinations. Because the resources in the network are limited, their use must be optimized. This parameter shows the fraction of created messages that reached their destination.
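Written out, the verbal definition above corresponds to the following ratio. The symbols N_delivered and N_created are notation assumed here for the number of delivered and created messages, respectively.

```latex
\mathrm{PDR} = \frac{N_{\mathrm{delivered}}}{N_{\mathrm{created}}},
\qquad 0 \le \mathrm{PDR} \le 1
```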

Average Delivery Delay
Also known as latency, this parameter is the average time elapsed from the creation of a message until it reaches its destination. In other words, this number shows how long it takes for a message to be delivered. Ideally, we would like this value to be 0, but this is obviously impossible. Instead, the minimization of this parameter is pursued. We will call this parameter ADD, for short.

Network Overhead Ratio
This parameter (which we will call OVH, for short) is the ratio of the messages that were relayed into the network but did not reach their destination to the number of messages that did, as defined in Equation (9). OVH directly affects resource usage across the entire network. Ideally, this value should be minimized to reduce the problems related to poor bandwidth allocation, such as network congestion and the consequent delays and disruptions.
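The verbal definition above can be written as the following ratio, where N_relayed and N_delivered are notation assumed here for the number of relayed and delivered messages, respectively.

```latex
\mathrm{OVH} = \frac{N_{\mathrm{relayed}} - N_{\mathrm{delivered}}}{N_{\mathrm{delivered}}}
```

For example, if 10 messages are relayed and 2 of them are delivered, OVH = (10 - 2)/2 = 4.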

Hop Count
HOP for short, this parameter is the number of nodes that a message traversed to reach its final destination. The smaller this parameter is, the less administrative overhead the message may have caused in the previous hosts, so it is desirable to keep this value low.
All of the metrics described above should be optimized, since each of them offers some advantage in the overall performance of the network, which can be critical in particular environments. For instance, a low OVH is desirable in networks whose hosts have low buffer capacity, such as sensor networks.
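As a worked example, the four metrics can be computed from a list of per-message records. The record layout and names below are hypothetical and do not correspond to the report format of The ONE simulator; ADD and HOP are assumed to be averages over the delivered messages.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MessageRecord:
    created_at: float              # creation time in seconds
    delivered_at: Optional[float]  # delivery time, or None if never delivered
    hops: int                      # hops used by the delivered copy


def compute_metrics(records: List[MessageRecord], relayed: int) -> Dict[str, float]:
    """Return PDR, ADD, OVH, and HOP for one simulation run."""
    delivered = [r for r in records if r.delivered_at is not None]
    n = len(delivered)
    if not records or n == 0:
        nan = float("nan")
        return {"PDR": 0.0 if records else nan, "ADD": nan, "OVH": nan, "HOP": nan}
    return {
        "PDR": n / len(records),
        "ADD": sum(r.delivered_at - r.created_at for r in delivered) / n,
        "OVH": (relayed - n) / n,
        "HOP": sum(r.hops for r in delivered) / n,
    }


if __name__ == "__main__":
    run = [
        MessageRecord(0.0, 10.0, 2),
        MessageRecord(5.0, 25.0, 4),
        MessageRecord(0.0, None, 0),
        MessageRecord(0.0, None, 0),
    ]
    print(compute_metrics(run, relayed=10))
```

In this toy run, half of the messages are delivered, each delivery costs four extra relays on average, and the delivered copies use three hops on average.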

Results
In this section, we describe and comment on the simulation results.

Effect of TTL
As can be seen in the subsequent plots, the time to live of the messages has a significant impact on the metrics up to a certain point, since the longer a message exists, the higher its probability of being delivered. Every metric, however, tends to plateau as more TTL is granted. We found that the metrics begin to settle noticeably at around TTL = 300 s, which means that adding more time to live to the messages will not normally yield further improvements. Also, some routers exhibit better performance when the TTL is smaller than the settling point. Therefore, a TTL of at least 300 s is advised when evaluating router performance, to capture the complete behavior.

Effect of the Fitness Thresholds
As described in Section 5, the α parameter determines to what extent connections are immediately discarded as next-hop candidates. Intuitively, a very small value means that only a small portion of the current connections are discarded, so most of them have a chance to be chosen (although in descending order of their fitness values). The limit is α = 0: since 0 ≤ f ≤ 1, the condition f ≥ α then means that all of the connections are considered as potential candidates. Similarly, a very large value of α results in a strong limiting condition, meaning that only the very best hosts (the ones with considerably large fitness) are considered as possible next hops. As we can infer from this explanation, the dynamics of the environment are strongly influenced by the α value. To better understand the effect of this fitness threshold, we ran simulations changing this parameter, with α = {0, ...}, and varied β in the simulations as well. We distinguished two main cases for each of the α and β values: α = 0 versus α > 0, and β = 0 versus β > 0. In the first case, with α = 0, the cases β = 0 and β > 0 resulted in noticeably different dynamics (see Figures 5 and 6). For α = 0 and TTL values smaller than 60, the PDR of DLR+ is better with β = 0. For ADD, in turn, β = 0 is the choice, as it showed better results than other β values.
In any case, however, for OVH and HOP (Figure 6), the choice is any value of β different from 0. As we can see, there is a tradeoff mainly between network overhead and delivery ratio or delivery delay, and the final choice of the parameters ultimately depends on the final application of the router in delay-tolerant networks (e.g., whether we are interested in minimizing latency at the expense of some overhead, or we have limited resources, such as in mobile sensor networks).
For α > 0, we did not notice any significant difference across the values of β. Finally, for α > 0.5, there was a slight improvement in overhead and number of hops. For this version of DLR+, we decided to use α = 0.65 and β = 0.

Performance of DLR+
In this subsection, we discuss the final performance of DLR+ (with α = 0.65 and with α = 0; β = 0 in both cases) and compare it against other well-known routers (Figures 7 and 8).
As can be seen in Figure 7a, DLR+ (α = 0.65) offers a greater PDR than the epidemic router and PRoPHET for TTL greater than 60 and 130, respectively. Although its performance on this metric is not the best, it is very close to the routers that offer the best values, only about 6.07% below them. On the other hand, with α = 0, DLR+ outperforms all routers in PDR for TTL < 25.
This reflects an interesting dynamic in the response of DLR+ for this case, in contrast with other routers: the more TTL is provided, the more inefficient the router becomes; as the TTL decreases, however, the performance of the proposed router improves, outperforming the other routers in this metric. There is a tradeoff, nonetheless, in this range of operation, because here DLR+ (α = 0) does not have the best performance in network overhead and hop count (Figure 8), although it shows acceptable values, very close to the ones generated by other routers.
As for delays, in the long run, DLR+ does not provide the best performance on average delivery delay (Figure 7b). We can see that as the TTL increases, so do the delivery delay values, and although they tend to stabilize at some point, there are significant differences with respect to the other routers' performance. The proposed router, however, performs fairly well for small TTL values, lying very close to its counterparts, with roughly the same ADD values as those of the other routers for TTL ≤ 25.
In network overhead (Figure 8a), DLR+ (α = 0) did not have the best results, with significant differences with respect to its counterparts, closely resembling epidemic routing. For α = 0.65, however, DLR+ had the best performance, with nearly zero overhead, which means extremely efficient resource usage, well below the OVH values returned by other routers.
In hop count, on the other hand, with α = 0 the number of hops used by DLR+ is very close to a constant 1.6 in the long run, which is better than the values of other routers. Indeed, for TTL > 50, the proposed router (α = 0) outperforms all other routers in the experiment, and even for TTL values smaller than 50, the number of hops used by DLR+ is between 2.2 and 2.8, a range in which all other routers lie as well. For α = 0.65, however, the proposed router shows an impressive HOP of nearly 1, which is a very significant difference with respect to the rest, confirming the highly efficient usage of network resources.

Conclusions and Future Work
The integration of vehicular networks in intelligent transportation systems will bring a vast set of new services in areas such as traffic management, security and safety, e-commerce, and entertainment, resulting in a global evolution of cities as we know them. The deployment of this kind of network, however, is slowed down by the intrinsically severe conditions of its environment. Among other issues, routing in vehicular delay-tolerant networks is a research challenge that requires special attention, since its efficiency will ultimately dictate when these networks become real-life implementations. In this paper, we have modeled a solution to the routing problem in VDTN and presented a router based on deep learning, which uses an algorithm that leverages the power of neural networks to learn from local and global information to make smart forwarding decisions on the best next hop and the best next message. As discussed in the previous section, the proposed router presents improvements in network overhead and hop count over some popular routers, while maintaining an acceptable delivery rate and delivery delay. For TTL ≤ 25, if resources are not a problem, it is recommended to use DLR+ with α = β = 0, as it provides the highest delivery ratio. On the contrary, if network resources are a concern, it is recommended to use the proposed router with α = 0.65 and to set the message scheduler to β = 0, so that it achieves the highest performance despite the resource limitation.
In the future, the DLR+ router can be further developed, including the full integration of the neural network to work in real time and automatic online parameter tuning to increase the overall performance. Also, more features of the host and messages can be added to the paradigm, so the router gets an even better understanding of its environment.
As discussed earlier, there has to be a trade-off between some of the metrics that are sought to be optimized to achieve an overall better performance in the VDTN, and the quest for this continues. Ultimately, the corresponding trade-offs depend on the particular application of the network; for instance, in mobile sensor networks, delays may not be important, but the limited resources might be, whereas in VDTN there can be a certain level of flexibility depending on even more specific applications, such as e-commerce transactions versus entertainment applications. All in all, the DLR+ router provides an insight into how deep neural networks can be used to make smarter routers, and this work provides a framework that can serve as a starting point to build more intelligent routing algorithms.

Funding: This research received no external funding.

Conflicts of Interest: The authors declare no conflict of interest.