Ndist2vec: Node with Landmark and New Distance to Vector Method for Predicting Shortest Path Distance along Road Networks

The ability to quickly calculate or query the shortest path distance between nodes on a road network is essential for many real-world applications. However, traditional graph traversal algorithms for shortest paths, such as Dijkstra and Floyd–Warshall, do not scale well to large road networks: traversal on large networks is slow and is computationally and memory intensive. Researchers have therefore developed many approximate methods, such as landmark methods and embedding methods, to speed up graph processing and shortest path queries. This study proposes a new method based on landmarks and embedding, together with a multilayer neural network model, to solve this problem. On the one hand, we generate a distance-preserving embedding for each node; on the other hand, we predict the shortest path distance between two nodes given their embeddings. Our approach significantly reduces training time and approximates the real distance with a relatively low Mean Absolute Error (MAE). Experimental results on real road networks confirm these advantages.


Introduction
In a large-scale road network [1,2] where many users send distance queries at the same time, providing timely responses to such a large number of queries is an important research question for many navigation applications. The problem of efficiently and accurately predicting the shortest path distance between nodes in a road network [3,4] has therefore attracted researchers' attention, and a number of exact methods [5–9] that compute distances without error have been proposed, as well as approximate methods [10–14] that sacrifice some accuracy to reduce computation and memory costs.
The traditional Dijkstra algorithm [5] has a time complexity of $O(m + n \log n)$ (with a Fibonacci-heap priority queue) and a space complexity of $O(n + m)$ in big O notation [15], where $m$ and $n$ are the number of edges and the number of nodes in the graph, respectively. The Floyd–Warshall algorithm [6] has a time complexity of $O(n^3)$ and a space complexity of $O(n^2)$. Such complexities are acceptable for small graphs, but for graphs with millions of nodes, even a single distance computation requires massive computational resources and time [16]. To speed up queries compared with these traditional methods, a number of labeling methods have been proposed [7,8,17], all of which use distance labels. The basic idea is to precompute, in the data preprocessing stage, the distance from each node to other nodes (in the extreme case, to all remaining nodes) and to store these distances as a tuple that serves as the node's distance label. The distance between any two nodes can then be computed from their distance labels in $O(1)$ time. The difficulty of these labeling methods, however, is finding the minimum set of nodes whose distances must be computed and stored so that all shortest path distances can be computed exactly; finding this optimal node set has been proved to be NP-hard. Moreover, the memory cost of these methods is still $O(n^2)$ [18].
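The extreme case of the labeling idea above, where each node's label stores its distance to every other node, amounts to an all-pairs distance table. The following minimal Python sketch, assuming nodes are numbered 0..n−1 and using a hypothetical four-node graph (not taken from the paper), precomputes that table with Floyd–Warshall in $O(n^3)$ time and $O(n^2)$ memory, after which every query is a constant-time lookup:

```python
import math

def floyd_warshall(n, edges):
    """All-pairs shortest path distances for an undirected weighted graph.

    n: number of nodes, labelled 0..n-1; edges: iterable of (u, v, w).
    Uses O(n^3) time and an n x n table, i.e. O(n^2) memory.
    """
    dist = [[math.inf] * n for _ in range(n)]
    for v in range(n):
        dist[v][v] = 0.0
    for u, v, w in edges:
        # undirected: store both directions, keeping the lighter parallel edge
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    for k in range(n):                 # allow node k as an intermediate stop
        row_k = dist[k]
        for i in range(n):
            d_ik = dist[i][k]
            if d_ik == math.inf:
                continue
            row_i = dist[i]
            for j in range(n):
                if d_ik + row_k[j] < row_i[j]:
                    row_i[j] = d_ik + row_k[j]
    return dist


# Hypothetical graph: 0-1 (2), 1-2 (3), 0-2 (6), 2-3 (1)
table = floyd_warshall(4, [(0, 1, 2.0), (1, 2, 3.0), (0, 2, 6.0), (2, 3, 1.0)])
print(table[0][3])  # 6.0, via 0-1-2-3; each later query is a constant-time lookup
```

The quadratic table is exactly what labeling methods try to avoid by storing only a well-chosen subset of these distances.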
To reduce memory costs, researchers proposed approximate shortest path distance methods [12,13], which further reduce memory and computation costs by sacrificing some prediction accuracy. The sacrifice is worthwhile because in many practical applications, or for some special graphs such as large road networks, exact distances are not necessary and an approximate distance between nodes is sufficient. A typical approximate method is the landmark-based method [16,19–21]. This method usually selects $k$ nodes as landmarks and then, similar to the labeling methods, assigns each node a distance label containing its distances to these landmark nodes. To answer a query between two nodes $s$ and $t$, it returns the minimum, over all landmarks $l$, of the distance from $s$ to $l$ plus the distance from $l$ to $t$: $\hat{\delta}(s, t) = \min_{l} \left[\delta(s, l) + \delta(l, t)\right]$. Although landmark-based methods reduce the memory cost to $O(kn)$, they cannot guarantee the approximation quality in theory [22], and the accuracy of the predicted distance depends largely on the choice of landmarks. Landmark selection is therefore critical to improving landmark-based methods, but selecting the best landmarks is NP-hard [16].
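The landmark query rule above fits in a few lines. This Python sketch assumes each node's label is a list of its distances to the $k$ landmarks in a fixed order; the path graph, the landmark choice and the function name `landmark_estimate` are hypothetical. The second query shows how a poorly placed landmark set inflates the estimate, which is why landmark selection matters:

```python
def landmark_estimate(labels, s, t):
    """Landmark-based approximation of the shortest path distance delta(s, t).

    labels[v][i] holds the exact distance from node v to landmark i, so storing
    the labels of all n nodes for k landmarks takes O(kn) memory. By the
    triangle inequality, delta(s, l) + delta(l, t) >= delta(s, t) for every
    landmark l, so the estimate never underestimates the true distance.
    """
    return min(d_s + d_t for d_s, d_t in zip(labels[s], labels[t]))


# Hypothetical path graph 0-1-2-3 with unit weights and landmarks {0, 3}.
labels = {0: [0, 3], 1: [1, 2], 2: [2, 1], 3: [3, 0]}
print(landmark_estimate(labels, 0, 2))  # 2 (exact: landmark 0 lies on the path)
print(landmark_estimate(labels, 1, 2))  # 3 (true distance is 1)
```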
The embedding method [13,23–25] is another representative approximate shortest path distance method. In the data preprocessing stage, it learns a vector embedding for each node through embedding techniques [26–29] that preserve shortest path distances; that is, each node is embedded into a $d$-dimensional space, such as Euclidean space [30] or hyperbolic space [31], in which the shortest path distance between nodes is computed. Each node therefore has a corresponding $d$-dimensional embedding vector. At query time, the embedding method uses a more efficient distance approximation function, such as directly computing the $\ell_p$-norm of the difference between the embedding vectors, or a neural network trained to predict the distance from the embedding vectors and the precomputed real shortest path distances. This is why embedding methods answer queries faster than other approximate methods. However, the choice of embedding technique, embedding dimension and embedding space has a great impact on the accuracy of the predicted distance, so selecting them appropriately for different graph data is also a challenging problem.
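For the simplest query function mentioned above, the $\ell_p$-norm of the difference between two embedding vectors, a query costs $O(d)$ regardless of the graph size. A minimal Python sketch with hypothetical two-dimensional embeddings (in a real system they would be learned during preprocessing):

```python
def lp_distance(x, y, p=2.0):
    """l_p-norm of the difference between two embedding vectors."""
    if len(x) != len(y):
        raise ValueError("embeddings must have the same dimension")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


emb = {"a": [0.0, 0.0], "b": [3.0, 4.0]}     # hypothetical node embeddings
print(lp_distance(emb["a"], emb["b"]))       # 5.0 (Euclidean, p = 2)
print(lp_distance(emb["a"], emb["b"], p=1))  # 7.0 (Manhattan, p = 1)
```

Learned-function variants replace `lp_distance` with a trained network, trading a slightly higher per-query cost for better accuracy.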
Inspired by these approximate shortest path distance methods, in particular the landmark-based method and the embedding method, we propose a new approximate shortest path distance model based on neural networks. The model integrates the landmark-based method, the embedding method and a neural network model, which greatly reduces training time.
The remainder of the paper is structured as follows. Section 2 reviews preliminary knowledge and related work. Section 3 describes our model, ndist2vec, in detail. Section 4 presents the experiments, including the experimental datasets, evaluation metrics, experimental parameters and results. Finally, Section 5 draws conclusions.

Preliminary Knowledge
Let $G = (V, E, W)$ be an undirected road network graph with $n = |V|$ nodes and $m = |E|$ edges. Each node $v \in V$ (a road intersection) has a pair of geographic coordinates, and each edge $e_{ij} \in E$ (a road) connects nodes $v_i$ and $v_j$, indicating that they are adjacent, with a weight $w_{ij} \in W$ that represents the length of the edge, i.e., the distance between the two road intersections. Figure 1 shows an example with 5 nodes and 6 edges; for instance, two of its nodes joined by an edge of weight 2 are adjacent, and the distance between them is 2.
Given nodes $s$ and $t$, there exists at least one path $p$ from $s$ to $t$, i.e., a sequence of adjacent nodes leading from $s$ to $t$. The path length is the sum of the edge weights along this sequence. For example, in Figure 1 there are three paths between the two nodes under consideration (two with two intermediate nodes and one with a single intermediate node), whose lengths are 8, 9 and 10, respectively. We denote the shortest path between $s$ and $t$ by $p^*$ and the real shortest path length by $\delta(s, t) = |p^*|$; in this example, the shortest path length is therefore 8.
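The shortest path length $\delta(s, t)$ defined above is what Dijkstra's algorithm computes. Below is a minimal Python sketch of the common binary-heap variant, run on a hypothetical four-node graph (not the graph of Figure 1) in which the two-edge routes are longer than a three-edge detour:

```python
import heapq
import math

def dijkstra(adj, source):
    """Shortest path lengths from source to every reachable node.

    adj maps each node to a list of (neighbour, weight) pairs with
    non-negative weights. This binary-heap version runs in O((m + n) log n).
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale entry: u was already reached by a shorter path
        for v, w in adj.get(u, ()):
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


# Hypothetical graph: s-a 2, a-t 6, s-b 5, b-t 4, a-b 1
adj = {}
for u, v, w in [("s", "a", 2), ("a", "t", 6), ("s", "b", 5), ("b", "t", 4), ("a", "b", 1)]:
    adj.setdefault(u, []).append((v, w))   # undirected: add both directions
    adj.setdefault(v, []).append((u, w))
print(dijkstra(adj, "s")["t"])  # 7.0, via s-a-b-t (s-a-t = 8, s-b-t = 9)
```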

Related Work
Researchers at the University of Passau proposed a landmark-based method [13] for approximating the shortest path distance between two nodes in a social graph; using simple neural networks with node2vec [27] or Poincaré [28] embeddings, they obtained better results than Orion and Rigel on a social graph dataset. For convenience, we call this method node2vec-Sg. In the first step, they used a node embedding technique to learn a vector embedding $\phi(v) \in \mathbb{R}^d$ for each node $v \in V$. In the second step, they extracted training sample pairs from the entire graph $G$: they randomly selected $k$ ($k \ll n$) nodes of $V$ as landmarks and applied the breadth-first search (BFS) algorithm to compute the true distance from each landmark to the remaining nodes, obtaining $k(n - k)$ sample pairs of the form $((l, v), \delta(l, v))$. To approximate the distance between two nodes $u$ and $v$, they combined the embeddings $\phi(u)$ and $\phi(v)$ through a binary operation $\odot$ (subtraction, averaging, multiplication or concatenation of the vectors) to form a training sample $(\phi(u) \odot \phi(v), \delta(u, v))$. Finally, a feedforward neural network with a single hidden layer was trained on these samples and output a real-valued prediction $\hat{\delta}(u, v)$ of the shortest path distance. The process is shown in Figure 2. Because the neural network performs a regression task, the Mean Square Error (MSE) is used as the loss function and Stochastic Gradient Descent (SGD) [32] as the optimizer. In all their experiments, node2vec embeddings gave better results than Poincaré embeddings; however, the authors noted that node2vec cannot learn the structural features of distant nodes and is therefore unsuitable for graphs with many distant nodes, such as road networks.
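The sample-generation step of node2vec-Sg can be sketched as follows. This is our own simplified Python illustration, not the authors' code: random vectors stand in for node2vec embeddings, BFS hop counts serve as distances (as in unweighted social graphs), and the names `make_samples` and `OPS` are hypothetical:

```python
import random
from collections import deque

def bfs_hops(adj, source):
    """Hop distances from source in an unweighted graph (adjacency dict)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Binary operators for combining phi(u) and phi(v); all but concat are element-wise.
OPS = {
    "subtract": lambda x, y: [a - b for a, b in zip(x, y)],
    "average": lambda x, y: [(a + b) / 2 for a, b in zip(x, y)],
    "multiply": lambda x, y: [a * b for a, b in zip(x, y)],
    "concat": lambda x, y: list(x) + list(y),
}

def make_samples(adj, emb, k, op="concat", seed=0):
    """Build (phi(l) op phi(v), hops(l, v)) samples for k random landmarks l."""
    rng = random.Random(seed)
    landmarks = rng.sample(sorted(adj), k)
    landmark_set = set(landmarks)
    samples = []
    for l in landmarks:
        for v, hops in bfs_hops(adj, l).items():
            if v not in landmark_set:  # k(n - k) pairs on a connected graph
                samples.append((OPS[op](emb[l], emb[v]), hops))
    return samples


# Hypothetical 5-node path graph; random vectors stand in for node2vec output.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
rng = random.Random(1)
emb = {v: [rng.random() for _ in range(3)] for v in adj}
samples = make_samples(adj, emb, k=2)
print(len(samples), len(samples[0][0]))  # 6 6
```

Concatenation doubles the input width, while the element-wise operators keep it at $d$; this is the only structural difference the choice of $\odot$ makes to the downstream network.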
Researchers have proposed a learning-based model called vdist2vec [25], which can predict the shortest path distance between two nodes in a road network efficiently and accurately; its distance prediction time and storage space are $O(d)$ and $O(dn)$, respectively, where $d$ is the node embedding dimension. As shown in Figure 3, vdist2vec takes as input, in a $2|V|$-dimensional one-hot layer, the $n$-dimensional one-hot vectors $h_i$ and $h_j$ of nodes $v_i$ and $v_j$, followed by an embedding layer. This layer learns a weight matrix $W \in \mathbb{R}^{n \times d}$, and the embedding of each node is obtained as $v_i = h_i W$. The embeddings $v_i$ and $v_j$ are then concatenated and used as the input of a multilayer perceptron (MLP) that predicts the shortest path distance for the two given node embeddings. The researchers also proposed two variants of the base model: vdist2vec-L uses the Huber loss instead of the Mean Square Error (MSE) used by vdist2vec, which yields lower errors than the base model, and vdist2vec-S is driven by ensemble learning: the last hidden layer of vdist2vec is replaced with four independent MLPs, each focusing on distances in a different range, and their outputs are summed to obtain the final distance prediction.
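The embedding layer described above is equivalent to a table lookup: multiplying a one-hot vector by $W$ selects one row of $W$. A small NumPy sketch with hypothetical sizes ($n = 6$, $d = 4$) checks this and forms the $2d$-dimensional MLP input; it illustrates the idea only and is not vdist2vec's implementation:

```python
import numpy as np

n, d = 6, 4                         # hypothetical number of nodes and embedding size
rng = np.random.default_rng(0)
W = rng.normal(size=(n, d))         # embedding-layer weights: one row per node

def one_hot(i, n):
    h = np.zeros(n)
    h[i] = 1.0
    return h

i, j = 1, 4
v_i = one_hot(i, n) @ W             # (n,) @ (n, d) -> (d,)
v_j = one_hot(j, n) @ W
assert np.allclose(v_i, W[i]) and np.allclose(v_j, W[j])  # same as a row lookup

mlp_input = np.concatenate([v_i, v_j])  # concatenated pair embedding
print(mlp_input.shape)                  # (8,)
```

In practice, implementations index the row directly instead of forming the one-hot product, which avoids an $O(nd)$ multiplication per query.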
The above models achieve fast distance prediction without increasing the space cost. Their advantages have been confirmed in experiments on several real road networks, with vdist2vec-S performing best of the three. However, to learn node embeddings well and achieve high prediction accuracy, these models use all $n(n - 1)$ node pairs as training samples, which significantly increases training time. Inspired by the use of embedding methods and landmark-based methods for the approximate shortest path distance problem, we propose a new approximate shortest path distance prediction model, ndist2vec. Its goal is to maintain relatively high prediction accuracy and fast queries while greatly reducing training time. The next section describes ndist2vec in detail.

Ndist2vec
In this section, we present the ndist2vec model. As shown in Figure 4, landmarks are selected at random to divide the node set $V$ into a set of landmark nodes $L$ and a set of remaining nodes $R$. The model then takes two nodes, $l_i \in L$ ($i = 1, \dots, k$) and $r_j \in R$ ($j = 1, \dots, n - k$), obtains their embedding vectors $e_{l_i}$ and $e_{r_j}$ from the vector embedding matrix $M \in \mathbb{R}^{n \times d}$ (one row per node), and concatenates them as the input of a multilayer neural network that outputs the distance between the two nodes. Our goal is to extract training pairs from the entire graph $G$ to train this network and then to predict the shortest path distance between any two nodes $s$ and $t$ in $G$. The procedure has three stages. In the first stage, we select $k$ (where $k \ll n$) nodes of $V$ as landmarks to form $L$, and the remaining nodes form $R$; i.e., $V$ is divided into $L$ and $R$, with $|L| = k$ and $|R| = n - k$. In the second stage, we randomly initialize $M$ so that each node $v \in V$ has an embedding vector $e_v \in \mathbb{R}^d$. Because the embeddings are guided directly by distance prediction, $M$ (and hence every $e_v$) is updated by back-propagating the prediction loss in each epoch. In the third stage, for a landmark $l \in L$ and a remaining node $r \in R$, we look up $e_l$ and $e_r$ in $M$, concatenate them as a training sample, and compute the actual shortest path distance $\delta(l, r)$ as its supervision signal (label). Traversing every landmark and every remaining node in this way yields $k(n - k)$ training samples with their labels.
Finally, the training samples were used as input to a multilayer neural network. The neural network maps the input training samples to real-valued distances.
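The three stages above can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' released code: the graph is an adjacency dict of (neighbour, weight) lists, labels come from Dijkstra's algorithm run once per landmark (the text does not name the exact-distance algorithm), and `landmark_samples` is a hypothetical name:

```python
import heapq
import math
import random

def dijkstra(adj, source):
    """Exact shortest path distances from source (non-negative edge weights)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def landmark_samples(adj, k, rng):
    """Randomly split V into landmarks L (|L| = k) and the rest R (|R| = n - k).

    Returns k(n - k) triples (l, r, delta(l, r)) for a connected graph. Only
    node ids are stored: the embeddings e_l and e_r are looked up from the
    trainable matrix M inside the model, so back-propagation can update M.
    """
    nodes = sorted(adj)
    landmarks = rng.sample(nodes, k)
    landmark_set = set(landmarks)
    rest = [v for v in nodes if v not in landmark_set]
    samples = []
    for l in landmarks:
        dist = dijkstra(adj, l)  # one single-source search per landmark
        samples.extend((l, r, dist[r]) for r in rest)
    return samples


# Hypothetical 6-node ring road with unit-length edges.
adj = {v: [((v - 1) % 6, 1.0), ((v + 1) % 6, 1.0)] for v in range(6)}
samples = landmark_samples(adj, k=2, rng=random.Random(0))
print(len(samples))  # 8 = k * (n - k)
```

Keeping node ids rather than fixed vectors in the samples is the design choice that lets the embedding matrix be learned jointly with the network.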
As shown in Figure 5, we designed a four-layer neural network consisting of an input layer, two hidden layers and an output layer. The size of the input layer depends on the embedding dimension: because two embedding vectors are concatenated, $2d$ neurons are required. Since the ReLU [33] function is effective in training neural networks, we use ReLU as the activation function of the first three layers. In the output layer, distances in different ranges are learned in the form of ensemble learning, and the outputs are summed to obtain the final distance prediction. Because the network performs a regression task, we measure the quality of the predictor with the Mean Square Error (MSE) between the actual node distance $\delta_i$ and the predicted distance $\hat{\delta}_i$, which serves as the training loss function $\mathcal{L}$:
$$\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} \left(\delta_i - \hat{\delta}_i\right)^2,$$
where $N$ is the number of training samples. Finally, we use adaptive moment estimation (Adam) [34] as the optimizer. Adam uses bias-corrected estimates of the first and second moments of the gradients, which keeps the effective learning rate of each parameter within a bounded range at every iteration and makes parameter updates relatively smooth; its effectiveness has been verified in a large number of deep neural network experiments. During training, all parameters of the network are randomly initialized, the training samples are fed into the network in batches, and the training loss is back-propagated to optimize all parameters.
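The forward pass and loss described above can be sketched in NumPy. The hidden sizes, the number of ensemble heads and the head structure (one independent second hidden layer per head, modelled on vdist2vec-S) are our assumptions, not values given in the text; only $d = 50$ follows the experiments:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 50              # hypothetical graph size; d = 50 as in the experiments
h1, h2, heads = 64, 32, 4   # hypothetical hidden sizes and number of output heads

params = {
    "M": rng.normal(scale=0.1, size=(n, d)),           # trainable embedding matrix
    "W1": rng.normal(scale=0.1, size=(2 * d, h1)), "b1": np.zeros(h1),
    "W2": rng.normal(scale=0.1, size=(heads, h1, h2)), "b2": np.zeros((heads, h2)),
    "w3": rng.normal(scale=0.1, size=(heads, h2)), "b3": np.zeros(heads),
}

def relu(x):
    return np.maximum(x, 0.0)

def predict(p, u, v):
    """Predicted distances for batches of node indices u and v (1-D int arrays)."""
    x = np.concatenate([p["M"][u], p["M"][v]], axis=1)  # (batch, 2d) input layer
    a1 = relu(x @ p["W1"] + p["b1"])                     # shared hidden layer
    # assumed: one independent hidden layer per head -> (heads, batch, h2)
    a2 = relu(np.einsum("bi,hij->hbj", a1, p["W2"]) + p["b2"][:, None, :])
    out = np.einsum("hbj,hj->hb", a2, p["w3"]) + p["b3"][:, None]  # (heads, batch)
    return out.sum(axis=0)                               # sum of head outputs

def mse_loss(p, u, v, target):
    """Mean Square Error between true and predicted distances."""
    return float(np.mean((target - predict(p, u, v)) ** 2))


u, v = np.array([0, 1, 2]), np.array([5, 6, 7])
print(predict(params, u, v).shape)                             # (3,)
print(mse_loss(params, u, v, np.array([1.0, 2.0, 3.0])) >= 0)  # True
```

Training would back-propagate this loss into every parameter, including `M`, with Adam; a framework with automatic differentiation (for example PyTorch with `torch.optim.Adam`) is the practical choice, so the sketch stops at the forward pass and loss.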

Experiment
In this section, we test our proposed ndist2vec on four road network datasets and compare it with node2vec-Sg and vdist2vec-S. All methods were implemented in Python 3.7 and run on a PC with an Intel Core Duo processor (dual 4.2 GHz) and 16 GB of RAM. We first describe the datasets, parameter settings and evaluation metrics; we then describe the experiments and present the results; finally, we draw conclusions from the results and explain why they turned out as they did. The source code and the experimental data are archived on figshare [35].

Datasets and Hyperparameters
We used four different road network graphs [2] for the experiments. From each road graph, we extracted the largest connected component, renamed the nodes and assigned their coordinates, so every dataset contains weighted edges and unique map coordinates for each node. All graphs are undirected; the number of nodes $n = |V|$, the number of edges $m = |E|$, and the maximum ($\delta_{\max}$), minimum ($\delta_{\min}$) and average ($\bar{\delta}$) distances between nodes are summarized in Table 1, and Figure 6 shows the original road networks.
We used two types of metrics to measure the accuracy and speed of our method. First, we used the Mean Absolute Error (MAE) and the Mean Relative Error (MRE) to measure the difference between the predicted value $\hat{\delta}_i$ and the real value $\delta_i$:
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left|\delta_i - \hat{\delta}_i\right|, \qquad \mathrm{MRE} = \frac{1}{N} \sum_{i=1}^{N} \frac{\left|\delta_i - \hat{\delta}_i\right|}{\delta_i},$$
where smaller values indicate higher accuracy. Second, we used the training time (PT) and the average distance prediction time (AT) to measure the speed of the model. In our experiments, the prediction set consists of all $n(n - 1)$ node pairs.

Overall Results
Table 2 shows the prediction error and time cost of ndist2vec, vdist2vec-S and node2vec-Sg on the four road network datasets (bold numbers indicate the best performance). In terms of prediction error, vdist2vec-S has the smallest MAE and MRE on every dataset, because it learns from all node pairs and retains their distance information in the node embeddings. ndist2vec has a larger MAE and MRE than vdist2vec-S, but the gap is limited: because it uses the landmark-based method, it does not learn from all node pairs, yet it still retains the learned distance information in the vector embedding matrix. Node2vec-Sg is an approximate method designed for predicting shortest path distances in social networks; we changed its edge weights to the real distances and ran it on the road networks, but it did not obtain good results.
The reason is that node2vec-Sg sometimes oversamples and sometimes undersamples when generating training samples; that is, the samples are not evenly distributed, and not all node pair information is learned. In addition, the node2vec embedding is not suitable for road network nodes, as our subsequent experiments show. In terms of training time PT, ndist2vec performs best. Our model sacrifices some accuracy to greatly reduce training time, and we consider this trade-off worthwhile. Specifically, on the SU, AH and DH datasets, the MAEs of ndist2vec are 1.14, 1.44 and 1.20 times those of vdist2vec-S, respectively, whereas its training time PT is at most one-quarter of that of vdist2vec-S. Moreover, the gap in training time grows as the number of nodes $n$ increases. Although node2vec-Sg is also a landmark-based method, it is very time consuming because of the way it selects training samples.
The average prediction time AT is the total prediction time for all node pairs divided by the number of pairs, $n(n - 1)$, and is reported in microseconds (μs). To predict a distance, each of the three models concatenates the embedding vectors of the two nodes and feeds them forward through its neural network (the network structures of the three methods are similar). ndist2vec has the smallest AT on all four datasets, because its embedding dimension is always 50, whereas vdist2vec-S uses $0.02n$ dimensions and node2vec uses $d = 128$; more dimensions increase both the training time PT and the average prediction time AT.
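The error metrics reported in Table 2 (MAE and MRE) and the average prediction time AT described above can be computed with a few helper functions. A minimal Python sketch with hypothetical function names and toy distances; timing a Python loop this way also counts loop overhead, so the AT it reports is only indicative:

```python
import time

def mae(true, pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(true, pred)) / len(true)

def mre(true, pred):
    """Mean Relative Error; assumes every true distance is positive (s != t)."""
    return sum(abs(t - p) / t for t, p in zip(true, pred)) / len(true)

def average_time_us(predict_fn, pairs):
    """AT: total prediction time over all pairs divided by the number of pairs, in μs."""
    start = time.perf_counter()
    for s, t in pairs:
        predict_fn(s, t)
    return (time.perf_counter() - start) / len(pairs) * 1e6


true = [100.0, 200.0, 400.0]   # hypothetical distances in metres
pred = [110.0, 190.0, 400.0]
print(round(mae(true, pred), 2), round(mre(true, pred), 3))  # 6.67 0.05
```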
ndist2vec has its worst prediction error on the Dongguan dataset. Although this dataset has only 7658 nodes (about 59 million sample pairs in total), its average distance between nodes is the largest, $\bar{\delta}$ = 35,096 m, which suggests that the node pairs in the Dongguan dataset are dominated by large distances. Because our method is landmark based and does not learn from all sample pairs during training, it learns relatively little about large-distance pairs. Our method may therefore be less suitable for datasets with large distances between nodes, and this is a direction for our future work. Our training strategy runs 30 epochs: the first epoch trains on all $n(n - 1)$ sample pairs, the remaining 29 epochs train on $k(n - k)$ landmark sample pairs, and the landmarks are re-selected at random in each epoch. In addition, the vector embedding matrix $M$ is updated according to the prediction results. Following the control variable method, we define four models, summarized in Table 3, and compare them in Table 4 to verify the effectiveness of this training strategy.
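The 30-epoch schedule above can be sketched as a function that returns the node pairs used in each epoch. This Python illustration is ours, with a hypothetical name (`epoch_pairs`) and a toy 10-node graph; it also checks the pair count quoted for Dongguan:

```python
import random

def epoch_pairs(nodes, epoch, k, rng):
    """Node pairs trained in a given epoch (1-based) under the ndist2vec strategy.

    Epoch 1 uses all n(n - 1) ordered pairs; every later epoch draws k fresh
    random landmarks and uses the k(n - k) landmark/remaining-node pairs.
    """
    if epoch == 1:
        return [(s, t) for s in nodes for t in nodes if s != t]
    landmarks = rng.sample(nodes, k)
    landmark_set = set(landmarks)
    return [(l, r) for l in landmarks for r in nodes if r not in landmark_set]


nodes = list(range(10))  # hypothetical 10-node graph
rng = random.Random(0)
sizes = [len(epoch_pairs(nodes, e, k=2, rng=rng)) for e in range(1, 31)]
print(sizes[0], sizes[1], sum(sizes))  # 90 16 554

# All ordered pairs in the Dongguan dataset (n = 7658), "about 59 million":
print(7658 * 7657)  # 58637306
```

Only the first epoch pays the quadratic cost; each of the other 29 epochs costs $k(n - k)$ samples, which is where the training-time saving comes from.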

Table 3 shows the training settings of the different models. The Embedding column gives the form of the vector embedding matrix $M$: L indicates that $M$ is updated from the prediction results (our method), and N indicates that $M$ is learned in advance with the node2vec embedding technique and does not change with the prediction results. The Epoch column has two options: Epoch1 means that the first epoch trains on all sample pairs and the remaining 29 epochs train on landmark sample pairs, while Epoch2 means that all 30 epochs train on landmark sample pairs. In the Landmark column, S indicates that each epoch generates new landmark sample pairs for training, and F indicates that the landmarks are generated once and kept fixed, so every epoch trains on the same landmark sample pairs.
Table 4 shows the experimental results of ndist2vec and its variants. Comparing ndist2vec with ndist2vec-1, ndist2vec performs better on all four datasets, which shows that updating the vector embedding matrix from the prediction results is better suited to predicting shortest path distances on road networks than directly using node2vec embeddings. node2vec focuses on capturing the similarity between nodes, whereas shortest path prediction on road networks depends on the distance relationships between nodes. Updating the vector embedding matrix from the prediction results therefore captures the characteristics of the road network better, because it measures the distance between nodes rather than their similarity.
ndist2vec trains on all $n(n - 1)$ sample pairs in the first epoch, whereas ndist2vec-2 uses the landmark-based method already in the first epoch and trains on only $k(n - k)$ sample pairs. As a result, the training time PT of ndist2vec is higher than that of ndist2vec-2 (the difference comes only from the first epoch), but its MAE and MRE are lower. The reason is that, although the landmark-based method reduces training time, randomly selected landmarks may not produce good sample pairs for training in the first epoch, which is when the vector embedding matrix $M$ is first updated. Training on all sample pairs in the first epoch lets the model learn and update $M$ more thoroughly and provides a good foundation for the remaining 29 epochs. Comparing ndist2vec with ndist2vec-3: in the landmark-based epochs (the remaining 29), ndist2vec-3 learns the same $k(n - k)$ sample pairs 29 times, so it only ever sees the information in those pairs, and when the random landmark selection is poor, its performance deteriorates. In contrast, ndist2vec selects new random landmarks and generates new sample pairs in every landmark-based epoch, so it can additionally learn from $\left|\bigcup_e S_e\right| - \left|\bigcap_e S_e\right|$ sample pairs, where $S_e$ is the set of sample pairs generated by combining the landmark set $L_e$ and the remaining node set $R_e$ of epoch $e$. For regression tasks, learning from more diverse samples generally yields a better fit, which is why ndist2vec performs better than ndist2vec-3 in Table 4.
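The coverage argument above can be checked numerically. In this Python sketch (our illustration; the function names and graph size are hypothetical, and pairs are treated as unordered because the graphs are undirected), `sets` holds the per-epoch pair sets $S_e$, and the quantity $\left|\bigcup_e S_e\right| - \left|\bigcap_e S_e\right|$ is zero for fixed landmarks and positive when landmarks are re-drawn every epoch:

```python
import random

def landmark_pair_set(nodes, landmarks):
    """Unordered (landmark, remaining-node) pairs produced by one landmark set."""
    landmark_set = set(landmarks)
    return {frozenset((l, r)) for l in landmarks for r in nodes if r not in landmark_set}

def extra_pairs(nodes, k, epochs, resample, seed=0):
    """|union of S_e| - |intersection of S_e| over the landmark-based epochs."""
    rng = random.Random(seed)
    fixed = rng.sample(nodes, k)
    sets = [landmark_pair_set(nodes, rng.sample(nodes, k) if resample else fixed)
            for _ in range(epochs)]
    return len(set().union(*sets)) - len(set.intersection(*sets))


nodes = list(range(200))  # hypothetical graph size
print(extra_pairs(nodes, k=5, epochs=29, resample=False))  # 0 (ndist2vec-3 style)
print(extra_pairs(nodes, k=5, epochs=29, resample=True))   # > 0 (ndist2vec style)
```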

Discussion
In this paper, we studied undirected road networks. In a directed road network, whether our model is feasible depends on whether all nodes are bidirectionally connected. When they are, a feasible approach is to swap the concatenation order of the two node embedding vectors and train two prediction models, one for each direction of travel. We have not yet found a way to apply the model when only some nodes are bidirectionally connected. In addition, our model uses random landmark selection; i.e., landmark nodes are chosen at random from the node set.
Random landmark selection may not be the best choice, and a better selection method could substantially improve the results. We also tried other landmark selection methods, such as using the k-medoids algorithm to select medoid nodes and using the concave hull algorithm to select all boundary nodes, but neither worked as well as selecting nodes at random. We have not been able to determine the specific reasons for this.

Conclusions and Future Work
This paper presents ndist2vec, a model based on embedding and landmark techniques that uses a multilayer neural network to obtain an approximate solution to the shortest path distance problem. ndist2vec learns the distance information between nodes through embedding; i.e., it learns and updates the vector embedding matrix $M$ to maintain prediction accuracy, and only $O(50n)$ space is required to store $M$. Adding the landmark method to ndist2vec greatly reduces training time. In particular, the model selects new landmarks in each training epoch, which lets it learn from more node pairs without increasing training time and facilitates updating the node embeddings. The experimental results show that, compared with the benchmark method, the prediction error increases (by up to 20%) while the training time is significantly reduced (by at least 75%).
In future work, we plan to adapt our method to road network graphs with large distances between nodes and to extend it to other types of graph data. Studying better landmark selection methods for our model and exploring the impact of different embedding techniques and embedding dimensions are also worthwhile research directions. We will also use geospatial big data computing frameworks [36,37] to improve the performance of the deep learning model on large datasets.