# Subgraph Learning for Topological Geolocalization with Graph Neural Networks


## Abstract


## 1. Introduction

- Introduce a novel motion trajectory-based topological geolocalization method using a graph neural network, which combines the benefits of vector-based navigation and the graph representation of a map.
- Design two different subgraph representations for motion trajectories: one encoding direction only, and the other encoding both direction and distance by inserting virtual nodes.
- Demonstrate an affordable data collection setup used to generate a visual-inertial navigation dataset, which validates the effectiveness of the proposed method in a practical setting.

## 2. Related Work

**Visual Localization**. A major category of work in the literature is dedicated to the use of images for localization, referred to as visual localization. These methods can be classified into photogrammetric localization [20,21,22,23] and retrieval-based localization [24,25]. The first set of approaches assumes the scene is represented by sparse 3D point clouds, commonly generated by structure from motion [26]; the camera pose for a given input image is then directly estimated. The training dataset consists of pairs of images and corresponding camera poses, where the camera pose is usually represented by a 6-DoF position and orientation. Despite their performance, the photogrammetric pipeline for generating and storing large 3D maps is not trivial and requires a large memory footprint. Another set of methods works by matching a given image to a database of location-tagged images or location-tagged image features. From hand-crafted features such as SIFT [27], bag-of-visual-words [28], Fisher Vectors [29], and VLAD [30] to learned features [31,32], all of these approaches struggle to find a representation robust to changes in viewpoint, appearance, and scale, a requirement that is hard to fulfill in practice. Furthermore, creating an up-to-date image/feature database is at best costly, if not impossible, and storing visual descriptors in a database raises potential privacy issues. Our approach mitigates these deficiencies by using open-sourced 2D maps.

**Probabilistic Localization.** A common form of the localization problem is to use sensory readings to estimate the absolute coordinates of an object on the map using Bayesian filtering [33,34,35,36,37]. The authors of [33] presented a Bayesian approach to model the posterior distribution of the position given the prior map, which is considered a classic method commonly adopted in the robotics field. However, this method requires GPS readings and relies on a rigorous mathematical model. In more recent studies [34,35], the authors proposed probabilistic self-localization methods using OpenStreetMap and visual odometry, where the location is determined by matching against the road topology. The authors of [36,37] presented a localization approach based on stochastic trajectory matching using brute-force search. However, all of these methods require the generation and maintenance of posterior distributions, which leads to complicated inference and high computational costs. For interested readers, a more comprehensive reference on probabilistic approaches is given in [38]. In contrast to the above methods, we avoid the complicated probabilistic inference process and propose an intuitive, learning-based approach.

**Topological Localization.** A small number of studies closely related to ours use topological maps and deep learning. Traditional approaches utilize topological road structures and try to match features onto the map using Chamfer distance and Hamming distance [39,40]. Chen et al. [7] proposed a topological approach to achieve localization and visual navigation using several different deep neural networks; however, the method targets visual navigation problems and was only investigated in a small indoor environment. Wei et al. [41] proposed a sequence-to-sequence labeling method for trajectory matching using a neural machine translation network. This approach was shown to work well only in synthetic scenarios where the input trajectory was synthetically generated with a known sequence of nodes from the map. In [42], the authors presented a variable-length sequence classification method for motion trajectory localization using a recurrent neural network, which largely inspired us to employ motion-based data for localization. Zha et al. [43] introduced a topological map-based trajectory learning method and utilized hypothesis generation and pruning strategies to achieve consistent geolocalization of moving platforms, where the problem was formulated as conditional sequence prediction. In contrast, this paper focuses on the node localization problem on a topological map based on motion trajectories and develops a subgraph embedding classification model using a graph neural network, which generalizes the sequence representation to a graph representation and naturally fits the graph-based map structure.

**Vector-Based Navigation.** In neuroscience, much of the literature focuses on studying the mechanisms of animals' ability to learn maps, as well as self-localization and navigation [2,11,44]. These studies have shown that one typical mechanism used in animals, such as desert ants, is path integration, in which neurons calculate location by integrating self-motion. Self-motion includes the direction and speed of movement, which inspired us to utilize turning and distance information in this paper. In [5], the authors elaborated, from a biological perspective, on a topological strategy for navigation using place cells [44,45] and metric vector navigation using grid cells [12]. Our work can be considered a mixture of the topological and vector strategies: the map is a graph representation, while navigation on the map is vector-based and includes direction and distance.
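The path-integration idea above can be sketched numerically: accumulate (turn, distance) self-motion cues into an absolute 2D position estimate. This is an illustrative dead-reckoning sketch, not part of the proposed method; the coordinate conventions are assumptions.

```python
import math

def integrate_path(start, steps):
    """Dead-reckoning path integration: accumulate (turn, distance)
    self-motion cues into an absolute 2D position estimate.
    start: (x, y); steps: iterable of (turn_radians, distance)."""
    x, y = start
    heading = 0.0  # radians, 0 = +x axis
    for turn, dist in steps:
        heading += turn               # update heading by the turn made
        x += dist * math.cos(heading) # advance along the new heading
        y += dist * math.sin(heading)
    return x, y

# Walk 10 m straight ahead, turn left 90 degrees, walk 5 m.
pos = integrate_path((0.0, 0.0), [(0.0, 10.0), (math.pi / 2, 5.0)])
# pos is approximately (10.0, 5.0)
```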

**GNN on Spatial Data.** The idea of a GNN is to generate representations of nodes, edges, or whole graphs that depend on the structure of the graph, as well as any feature information it carries. The basic GNN model can be motivated in a variety of ways, either from the perspective of the spatial domain [15,46] or the spectral domain [47,48]; further comprehensive reviews can be found in [13,14,49]. In recent years, GNNs have been extended to geospatial data due to their powerful ability to model irregular data structures. For example, the authors of [50] combined a convolutional neural network and a GNN to infer road attributes, overcoming the limitation of capturing long-term spatial propagation of features; the authors of [51] presented a graph neural network estimator for estimated time of arrival (ETA), which accounts for complex spatiotemporal interactions and has been deployed in production at Google Maps; and the authors of [52] improved the generalization ability of GNNs through a sampling technique and demonstrated its performance on real-world street networks. Ref. [53] proposed a GNN architecture to extract road graphs from satellite images.

## 3. Proposed Method

#### 3.1. Problem Formulation

- Input subgraph: ${\mathcal{G}}_{s}=({\mathcal{V}}_{s},{\mathcal{E}}_{s})$ with node attributes ${x}_{s}\in {\mathbb{R}}^{|{\mathcal{V}}_{s}|\times d}$, where $|{\mathcal{V}}_{s}|$ is the number of nodes in the subgraph and $d$ is the dimension of the node attributes;
- Embedding stage: ${\mathcal{Z}}_{s}$ is the embedding of subgraph ${\mathcal{G}}_{s}$ obtained from the graph neural network;
- Classification stage: the subgraph embedding ${\mathcal{Z}}_{s}$ is classified into a label $y={v}_{i}$, ${v}_{i}\in \mathcal{V}$, through a fully connected neural network, where $\mathcal{V}=\{{v}_{1},{v}_{2},\cdots ,{v}_{n}\}$ is the output label space and $n$ is the number of nodes in the topological map.
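The two-stage formulation above can be sketched end to end. The mean-pooling stand-in for the GNN and the toy weight matrix `W` below are assumptions for illustration only, not the trained model:

```python
# Minimal sketch of the two-stage pipeline (hypothetical shapes, no real GNN):
# a subgraph with |V_s| nodes and d-dim attributes is reduced to a single
# embedding z_s, which a linear classifier maps to one of n map-node labels.

def embed_subgraph(x_s):
    """Stand-in for the GNN embedding stage: mean-pool the |V_s| x d
    node-attribute matrix into one d-dim subgraph embedding."""
    d = len(x_s[0])
    return [sum(row[j] for row in x_s) / len(x_s) for j in range(d)]

def classify(z_s, W):
    """Linear scores over the n map nodes; return the argmax label index."""
    scores = [sum(w_j * z_j for w_j, z_j in zip(w, z_s)) for w in W]
    return max(range(len(scores)), key=scores.__getitem__)

x_s = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 nodes, d = 2
W = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]   # n = 3 output map nodes
z_s = embed_subgraph(x_s)                     # [2/3, 2/3]
label = classify(z_s, W)                      # index of the predicted node
```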

#### 3.2. Subgraph Representation

#### 3.3. Embedding Stage

The embedding stage takes as input a path subgraph ${\mathcal{G}}_{s}=({\mathcal{V}}_{s},{\mathcal{E}}_{s})\in \mathcal{G}$, along with a set of respective node attributes ${x}_{s}\in {\mathbb{R}}^{|{\mathcal{V}}_{s}|\times d}$, and first generates node embeddings that are then transformed into a subgraph embedding. During each message-passing iteration in the GNN, as shown in Figure 5, a hidden embedding ${h}_{v}^{k}$ representing node $v$ at layer $k$ is updated according to the information aggregated from its previous self-embedding and the embeddings of its neighborhood $\mathcal{N}(v)$. In the standard message-passing form, the update and aggregate operations are expressed as follows:

${h}_{v}^{k}=\mathrm{UPDATE}\left({h}_{v}^{k-1},\mathrm{AGGREGATE}\left(\left\{{h}_{u}^{k-1},\forall u\in \mathcal{N}(v)\right\}\right)\right)$
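A single message-passing iteration of this kind can be sketched in plain Python. The mean aggregator and the averaging update below are illustrative stand-ins for the learned AGGREGATE and UPDATE operators of an actual GNN layer:

```python
# One message-passing iteration over a small path subgraph.
# h: {node: embedding list}; adj: {node: list of neighbors}.

def message_pass(h, adj):
    """Return layer-k embeddings computed from layer-(k-1) embeddings."""
    h_next = {}
    for v, neighbors in adj.items():
        d = len(h[v])
        # AGGREGATE: elementwise mean of the neighbors' embeddings
        agg = [sum(h[u][j] for u in neighbors) / len(neighbors)
               for j in range(d)]
        # UPDATE: average the self-embedding with the aggregated message
        h_next[v] = [(h[v][j] + agg[j]) / 2 for j in range(d)]
    return h_next

adj = {0: [1], 1: [0, 2], 2: [1]}   # a 3-node path subgraph
h0 = {0: [1.0], 1: [0.0], 2: [2.0]}
h1 = message_pass(h0, adj)          # {0: [0.5], 1: [0.75], 2: [1.0]}
```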

#### 3.4. Classification Stage

## 4. Experiments

#### 4.1. Dataset

#### 4.1.1. Map Generation

The map of the area of interest is specified by a bounding box **b** in terms of longitude and latitude, **b** = $(lo{n}_{min},lo{n}_{max},la{t}_{min},la{t}_{max})$. The obtained map is given in XML format, from which we abstract a directed graph structure where each node represents a place in the map, with its geographic coordinates as attributes, and each edge denotes a road segment. An agent can thus navigate freely on such a map as a graph traversal process, forming different graph paths, which are used as the training dataset in this paper.
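The abstraction of the map into a directed graph and the traversal-based path generation can be sketched as follows; the node/edge layout and helper names are hypothetical, and a real pipeline would parse the XML instead of hard-coding the lists:

```python
# Sketch: abstract a map into a directed graph whose nodes carry (lon, lat)
# attributes, then enumerate fixed-length simple paths by depth-first
# traversal, as used for generating training trajectories.

def build_graph(nodes, edges):
    """nodes: {id: (lon, lat)}; edges: [(u, v), ...] directed road segments."""
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
    return adj

def paths_of_length(adj, k):
    """All directed simple paths visiting exactly k distinct nodes."""
    out = []
    def walk(path):
        if len(path) == k:
            out.append(tuple(path))
            return
        for nxt in adj[path[-1]]:
            if nxt not in path:      # keep the path simple (no revisits)
                walk(path + [nxt])
    for start in adj:
        walk([start])
    return out

nodes = {0: (-83.01, 40.00), 1: (-83.01, 40.01), 2: (-83.00, 40.01)}
adj = build_graph(nodes, [(0, 1), (1, 2), (1, 0)])
paths = paths_of_length(adj, 3)      # the single 3-node path 0 -> 1 -> 2
```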

#### 4.1.2. Map-Based Trajectory Generation

#### 4.1.3. Generating Real Trajectory Data for Testing

#### 4.2. Training Process

## 5. Results and Analyses

#### 5.1. Comparisons with Existing Methods

#### 5.2. Ablation Study

#### 5.3. Discussions

#### 5.3.1. Manhattan-World Ambiguity

#### 5.3.2. Scalability

#### 5.3.3. Image as Complementary Data

## 6. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Conflicts of Interest

## References

- El-Rabbany, A. Introduction to GPS: The Global Positioning System; Artech House: New York, NY, USA, 2002.
- Tolman, E.C. Cognitive maps in rats and men. Psychol. Rev. **1948**, 55, 189.
- Erdem, U.M.; Hasselmo, M. A goal-directed spatial navigation model using forward trajectory planning based on grid cells. Eur. J. Neurosci. **2012**, 35, 916–931.
- Banino, A.; Barry, C.; Uria, B.; Blundell, C.; Lillicrap, T.; Mirowski, P.; Pritzel, A.; Chadwick, M.J.; Degris, T.; Modayil, J.; et al. Vector-based navigation using grid-like representations in artificial agents. Nature **2018**, 557, 429–433.
- Edvardsen, V.; Bicanski, A.; Burgess, N. Navigating with grid and place cells in cluttered environments. Hippocampus **2020**, 30, 220–232.
- Dolgov, D.; Thrun, S.; Montemerlo, M.; Diebel, J. Path planning for autonomous vehicles in unknown semi-structured environments. Int. J. Robot. Res. **2010**, 29, 485–501.
- Chen, K.; de Vicente, J.P.; Sepulveda, G.; Xia, F.; Soto, A.; Vázquez, M.; Savarese, S. A Behavioral Approach to Visual Navigation with Graph Localization Networks. In Proceedings of the Robotics: Science and Systems, Freiburg im Breisgau, Germany, 22–26 June 2019.
- Reid, T.G.; Chan, B.; Goel, A.; Gunning, K.; Manning, B.; Martin, J.; Neish, A.; Perkins, A.; Tarantino, P. Satellite navigation for the age of autonomy. In Proceedings of the 2020 IEEE/ION Position, Location and Navigation Symposium (PLANS), Portland, OR, USA, 20–23 April 2020; pp. 342–352.
- Cadena, C.; Carlone, L.; Carrillo, H.; Latif, Y.; Scaramuzza, D.; Neira, J.; Reid, I.; Leonard, J.J. Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age. IEEE Trans. Robot. **2016**, 32, 1309–1332.
- McNaughton, B.L.; Battaglia, F.P.; Jensen, O.; Moser, E.I.; Moser, M.B. Path integration and the neural basis of the 'cognitive map'. Nat. Rev. Neurosci. **2006**, 7, 663–678.
- Bush, D.; Barry, C.; Manson, D.; Burgess, N. Using grid cells for navigation. Neuron **2015**, 87, 507–520.
- Hafting, T.; Fyhn, M.; Molden, S.; Moser, M.B.; Moser, E.I. Microstructure of a spatial map in the entorhinal cortex. Nature **2005**, 436, 801–806.
- Bronstein, M.M.; Bruna, J.; LeCun, Y.; Szlam, A.; Vandergheynst, P. Geometric deep learning: Going beyond Euclidean data. IEEE Signal Process. Mag. **2017**, 34, 18–42.
- Battaglia, P.W.; Hamrick, J.B.; Bapst, V.; Sanchez-Gonzalez, A.; Zambaldi, V.; Malinowski, M.; Tacchetti, A.; Raposo, D.; Santoro, A.; Faulkner, R.; et al. Relational inductive biases, deep learning, and graph networks. arXiv **2018**, arXiv:1806.01261.
- Hamilton, W.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 1024–1034.
- Xu, K.; Hu, W.; Leskovec, J.; Jegelka, S. How powerful are graph neural networks? arXiv **2018**, arXiv:1810.00826.
- Sarlin, P.E.; DeTone, D.; Malisiewicz, T.; Rabinovich, A. SuperGlue: Learning feature matching with graph neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 4938–4947.
- Shi, W.; Rajkumar, R. Point-GNN: Graph neural network for 3D object detection in a point cloud. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1711–1719.
- Qin, T.; Li, P.; Shen, S. VINS-Mono: A robust and versatile monocular visual-inertial state estimator. IEEE Trans. Robot. **2018**, 34, 1004–1020.
- Kendall, A.; Cipolla, R. Geometric loss functions for camera pose regression with deep learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5974–5983.
- Sattler, T.; Leibe, B.; Kobbelt, L. Fast image-based localization using direct 2D-to-3D matching. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 667–674.
- Sattler, T.; Zhou, Q.; Pollefeys, M.; Leal-Taixe, L. Understanding the limitations of CNN-based absolute camera pose regression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3302–3312.
- Weyand, T.; Kostrikov, I.; Philbin, J. PlaNet: Photo geolocation with convolutional neural networks. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 37–55.
- Hays, J.; Efros, A.A. IM2GPS: Estimating geographic information from a single image. In Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008; pp. 1–8.
- Walch, F.; Hazirbas, C.; Leal-Taixe, L.; Sattler, T.; Hilsenbeck, S.; Cremers, D. Image-based localization using LSTMs for structured feature correlation. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 627–637.
- Schonberger, J.L.; Frahm, J.M. Structure-from-motion revisited. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4104–4113.
- Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. **2004**, 60, 91–110.
- Philbin, J.; Chum, O.; Isard, M.; Sivic, J.; Zisserman, A. Object retrieval with large vocabularies and fast spatial matching. In Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA, 18–23 June 2007; pp. 1–8.
- Perronnin, F.; Liu, Y.; Sánchez, J.; Poirier, H. Large-scale image retrieval with compressed Fisher vectors. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 3384–3391.
- Jégou, H.; Douze, M.; Schmid, C.; Pérez, P. Aggregating local descriptors into a compact image representation. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 3304–3311.
- Arandjelovic, R.; Gronat, P.; Torii, A.; Pajdla, T.; Sivic, J. NetVLAD: CNN architecture for weakly supervised place recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 5297–5307.
- Lin, T.Y.; Cui, Y.; Belongie, S.; Hays, J. Learning deep representations for ground-to-aerial geolocalization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 5007–5015.
- Oh, S.M.; Tariq, S.; Walker, B.N.; Dellaert, F. Map-based priors for localization. In Proceedings of the 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Sendai, Japan, 28 September–2 October 2004; Volume 3, pp. 2179–2184.
- Brubaker, M.A.; Geiger, A.; Urtasun, R. Map-based probabilistic visual self-localization. IEEE Trans. Pattern Anal. Mach. Intell. **2015**, 38, 652–665.
- Floros, G.; Van Der Zander, B.; Leibe, B. OpenStreetSLAM: Global vehicle localization using OpenStreetMaps. In Proceedings of the 2013 IEEE International Conference on Robotics and Automation, Karlsruhe, Germany, 6–10 May 2013; pp. 1054–1059.
- Gupta, A.; Chang, H.; Yilmaz, A. GPS-denied geo-localisation using visual odometry. In Proceedings of the ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, Prague, Czech Republic, 12–19 July 2016; pp. 263–270.
- Gupta, A.; Yilmaz, A. Ubiquitous real-time geo-spatial localization. In Proceedings of the Eighth ACM SIGSPATIAL International Workshop on Indoor Spatial Awareness, Burlingame, CA, USA, 31 October 2016; pp. 1–10.
- Thrun, S. Probabilistic robotics. Commun. ACM **2002**, 45, 52–57.
- Costea, D.; Leordeanu, M. Aerial image geolocalization from recognition and matching of roads and intersections. arXiv **2016**, arXiv:1605.08323.
- Panphattarasap, P.; Calway, A. Automated map reading: Image based localisation in 2-D maps using binary semantic descriptors. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 6341–6348.
- Wei, J.; Koroglu, M.T.; Zha, B.; Yilmaz, A. Pedestrian localization on topological maps with neural machine translation network. In Proceedings of the 2019 IEEE Sensors, Montreal, QC, Canada, 27–30 October 2019; pp. 1–4.
- Zha, B.; Koroglu, M.T.; Yilmaz, A. Trajectory mining for localization using recurrent neural network. In Proceedings of the 2019 International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, 5–7 December 2019; pp. 1329–1332.
- Zha, B.; Yilmaz, A. Learning maps for object localization using visual-inertial odometry. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. **2020**, 1, 343–350.
- O'Keefe, J. Place units in the hippocampus of the freely moving rat. Exp. Neurol. **1976**, 51, 78–109.
- O'Keefe, J.; Dostrovsky, J. The hippocampus as a spatial map: Preliminary evidence from unit activity in the freely-moving rat. Brain Res. **1971**, 34, 171–175.
- Fey, M.; Lenssen, J.E.; Weichert, F.; Müller, H. SplineCNN: Fast geometric deep learning with continuous B-spline kernels. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 869–877.
- Henaff, M.; Bruna, J.; LeCun, Y. Deep convolutional networks on graph-structured data. arXiv **2015**, arXiv:1506.05163.
- Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv **2016**, arXiv:1609.02907.
- Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Philip, S.Y. A comprehensive survey on graph neural networks. IEEE Trans. Neural Netw. Learn. Syst. **2020**, 32, 4–24.
- He, S.; Bastani, F.; Jagwani, S.; Park, E.; Abbar, S.; Alizadeh, M.; Balakrishnan, H.; Chawla, S.; Madden, S.; Sadeghi, M.A. RoadTagger: Robust road attribute inference with graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 10965–10972.
- Derrow-Pinion, A.; She, J.; Wong, D.; Lange, O.; Hester, T.; Perez, L.; Nunkesser, M.; Lee, S.; Guo, X.; Wiltshire, B.; et al. ETA prediction with graph neural networks in Google Maps. arXiv **2021**, arXiv:2108.11482.
- Iddianozie, C.; McArdle, G. Improved graph neural networks for spatial networks using structure-aware sampling. ISPRS Int. J. Geo-Inf. **2020**, 9, 674.
- Bahl, G.; Bahri, M.; Lafarge, F. Road extraction from overhead images with graph neural networks. arXiv **2021**, arXiv:2112.05215.
- Rowland, D.C.; Roudi, Y.; Moser, M.B.; Moser, E.I. Ten years of grid cells. Annu. Rev. Neurosci. **2016**, 39, 19–40.
- Klatzky, R.; Freksa, C.; Habel, C.; Wender, K. Spatial Cognition: An Interdisciplinary Approach to Representing and Processing Spatial Knowledge; Springer: Berlin/Heidelberg, Germany, 1998.
- Lou, Z.; You, J.; Wen, C.; Canedo, A.; Leskovec, J. Neural subgraph matching. arXiv **2020**, arXiv:2007.03092.
- Gilmer, J.; Schoenholz, S.S.; Riley, P.F.; Vinyals, O.; Dahl, G.E. Neural message passing for quantum chemistry. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 1263–1272.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv **2014**, arXiv:1412.6980.
- Sedgewick, R. Algorithms in C, Part 5: Graph Algorithms, 3rd ed.; Addison-Wesley Professional: Boston, MA, USA, 2001.
- Hua, J.; Zhang, Y.; Yilmaz, A. The mobile AR sensor logger for Android and iOS devices. In Proceedings of the 2019 IEEE Sensors, Montreal, QC, Canada, 27–30 October 2019; pp. 1–4.
- Samano, N.; Zhou, M.; Calway, A. You are here: Geolocation by embedding maps and images. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 502–518.
- Vojir, T.; Budvytis, I.; Cipolla, R. Efficient large-scale semantic visual localization in 2D maps. In Proceedings of the Asian Conference on Computer Vision, Kyoto, Japan, 30 November–4 December 2020.
- Amini, A.; Rosman, G.; Karaman, S.; Rus, D. Variational end-to-end navigation and localization. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 8958–8964.
- Zha, B.; Yilmaz, A. Map-based temporally consistent geolocalization through learning motion trajectories. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 31–36.

**Figure 1.** Key idea: A graph representation of a map is composed of places and their connections, on which an object navigates from one place to another. Object navigation is usually guided by instructions, including turns made and distances traversed, from which a motion trajectory is formed. We are inspired by this observation to generate a possible set of such trajectories and their respective node locations, used as a dataset to train a graph neural network. At test time, a path subgraph is fed into the trained model, which in turn outputs the object's node location on the map.

**Figure 2.** Illustration of the proposed method to achieve topological localization. A forward pass consists of (**a**) acquisition of the raw trajectory from visual and/or inertial data sources; (**b**) construction of a trajectory graph or an augmented trajectory graph by identifying significant turns in the raw trajectory, where the augmented trajectory graph encodes both turns and distances by inserting virtual nodes; (**c**) computation of each subgraph embedding by a trained graph neural network; and (**d**) classification of the subgraph embedding into a node label that indicates the final location on the learned map. Note that training and inference share an identical pipeline except for the subgraph embedding part.

**Figure 3.** Encoding of the original trajectory into a subgraph using two different representations: the filtered trajectory graph encodes turning information, while the augmented trajectory graph encodes both turning and distance information.
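The virtual-node augmentation in Figure 3 can be sketched as follows; the 10 m unit and the `'v'` placeholder label are assumptions for illustration, not values from the paper:

```python
# Sketch of the augmented-trajectory idea: encode distance by inserting one
# virtual node per `unit` of segment length between consecutive turn nodes,
# so the subgraph carries both turns and distances.

def augment(turn_nodes, distances, unit=10.0):
    """turn_nodes: labels of significant turns (length n);
    distances[i]: length of the segment between turn i and turn i+1
    (length n-1). Returns the node sequence with 'v' virtual nodes
    interleaved in proportion to segment length."""
    seq = [turn_nodes[0]]
    for node, dist in zip(turn_nodes[1:], distances):
        seq.extend(['v'] * int(dist // unit))  # virtual distance markers
        seq.append(node)
    return seq

# 25 m segment -> 2 virtual nodes; 10 m segment -> 1 virtual node.
seq = augment(['A', 'B', 'C'], [25.0, 10.0])
# seq == ['A', 'v', 'v', 'B', 'v', 'C']
```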

**Figure 4.**Egocentric coordinate system for angle computation and quantization into discrete angle representation. The illustrated figure uses 20 bins.
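The binning in Figure 4 can be sketched as follows; the exact bin layout (offset and orientation) used by the authors is not specified here, so this particular mapping is an assumption:

```python
import math

# Quantize an egocentric turn angle into one of `bins` discrete classes.
# Angles in [-pi, pi) are shifted to [0, 2*pi) and split into equal bins.

def quantize_angle(angle_rad, bins=20):
    """Map an angle in radians to a bin index in [0, bins)."""
    a = (angle_rad + math.pi) % (2 * math.pi)  # shift to [0, 2*pi)
    return int(a / (2 * math.pi / bins)) % bins

b_straight = quantize_angle(0.05)          # slightly right of straight ahead
b_left = quantize_angle(math.pi / 2 + 0.05)  # roughly a 90-degree left turn
```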

**Figure 5.** Illustration of embedding a trajectory subgraph with a graph neural network layer and a fully connected layer. The GNN layer embeds each node's attributes and integrates them into a single subgraph embedding through a graph pooling operation. The fully connected and softmax layers serve as a classifier that maps the subgraph embedding into the node space ${v}_{i}\in \mathcal{V}=\{{v}_{1},{v}_{2},\cdots ,{v}_{n}\}$.

**Figure 7.** Three ways to collect real trajectory data for testing: the left and middle setups are used to collect trajectories through visual-inertial odometry in the small- and medium-sized maps; the right one uses Google Maps to collect trajectory data in the large-sized map.

**Figure 8.**Training performance on the original, filtered, and augmented dataset for different numbers of layers in GNN. The first row is for the small-sized map where the best accuracies are reported to be 99.1%, 83.0%, and 94.0%, respectively; the second row is for the medium-sized map where the best accuracies are 98.9%, 82.7%, and 96.1%, respectively; and the bottom row is for the large-sized map, where best accuracies are 96.1%, 51.0%, and 87.5%, respectively.

**Table 1.**The graph statistics for three different sizes of the map. The average degree centrality here is used to indicate the structure complexity of areas when generating possible paths.

| | Location | Node | Edge | Map Size | Avg. Centrality |
|---|---|---|---|---|---|
| Small-sized map (S) | OSU Oval | 91 | 155 | 0.16 km × 0.5 km | 3.5 |
| Medium-sized map (M) | OSU Campus | 115 | 147 | 2.5 km × 2.5 km | 2.54 |
| Large-sized map (L) | Washington DC | 3038 | 8211 | 10 km × 10 km | 2.66 |

**Table 2.**Trajectory dataset statistics. Three different training datasets correspond to the three subgraph representations in Section 3.2. Here, Num. represents the total number of path subgraphs and Cls. represents the node classes.

| Map | Original (Num. / Cls.) | Filtered (Num. / Cls.) | Augmented (Num. / Cls.) |
|---|---|---|---|
| S | 235,132 / 29 | 231,967 / 29 | 231,967 / 29 |
| M | 10,574 / 72 | 8551 / 72 | 8551 / 72 |
| L | 644,088 / 1000 | 644,088 / 1000 | 644,088 / 1000 |

**Table 3.**Real visual-inertial odometry trajectory testing result, including 20 trajectories for walking map and 10 trajectories for driving map.

| Map: # Trajectories | Filtered Case | Augmented Case |
|---|---|---|
| S: 20 | 14 (70%) | 17 (85%) |
| M: 10 | 7 (70%) | 9 (90%) |
| L: 50 | 25 (50%) | 42 (84%) |

**Table 4.** Descriptive and limited quantitative comparison with state-of-the-art methods for localization on a driving map. Our method achieves better results with a topological representation that exploits graph neural networks. Note that "Metric" and "Non-metric" indicate whether the location is given as a numerical representation in a Cartesian coordinate system or as a non-numerical representation, such as a node or edge in a graph-structured map.

| Method | Model | Map | Localization | Initial Position | NN | Input | Accuracy |
|---|---|---|---|---|---|---|---|
| 2013 OpenStreetSLAM [35] | MCL | Graph | Metric | ✓ | ✗ | Image | ∼5 m |
| 2015 Brubaker et al. [34] | State-Space | Graph | Metric | ✓ | ✗ | Image | ∼4 m |
| 2017 Gupta et al. [36] | Graph Search | Graph | Metric | ✗ | ✗ | Image/IMU | ∼5 m |
| 2019 Amini et al. [63] | Variational NN | Tile | Metric | ✓ | ✓ | Image | − |
| 2019 Chen et al. [7] | CNN+GNN | Graph | Non-metric | ✓ | ✓ | RGBD | − |
| 2020 Wei et al. [41] | Seq2Seq | Graph | Non-metric | ✗ | ✓ | Motion | 95% |
| 2020 Zha et al. [43,64] | RNN | Graph | Non-metric | ✗ | ✓ | Motion | 93% |
| 2020 Samano et al. [61] | CNN | Tile | Non-metric | ✗ | ✓ | Image | 90% |
| Ours | GNN | Graph (S) | Non-metric | ✗ | ✓ | Motion | 93.61% |
| Ours | GNN | Graph (M) | Non-metric | ✗ | ✓ | Motion | 95.53% |
| Ours | GNN | Graph (L) | Non-metric | ✗ | ✓ | Motion | 87.56% |

**Table 5.**The ablation study on training performance on different nodes of path subgraph in six layers using the GNN-SAGE model. It can be seen that the augmented dataset outperforms the filtered dataset and that the medium-sized map achieves the best accuracy.

| Nodes | S Filtered | S Augmented | M Filtered | M Augmented | L Filtered | L Augmented |
|---|---|---|---|---|---|---|
| 4 | 47.18% | 66.71% | 55.56% | 81.48% | - | - |
| 5 | 46.56% | 67.72% | 69.71% | 88.24% | 2.40% | 10.70% |
| 6 | 53.15% | 72.82% | 78.95% | 89.65% | 6.40% | 24.70% |
| 7 | 58.05% | 77.05% | 85.28% | 91.48% | 13.01% | 40.12% |
| 8 | 68.52% | 89.47% | 86.57% | 93.20% | 21.90% | 58.42% |
| 10 | 83.54% | 93.61% | 88.61% | 95.33% | 51.00% | 87.50% |

**Table 6.**The ablation study on training performance on different GNN models. As can be seen, the GNN-SAGE model outperforms the other models tested.

| Model | S Filtered | S Augmented | M Filtered | M Augmented | L Filtered | L Augmented |
|---|---|---|---|---|---|---|
| GNN-GCN | 75.31% | 86.56% | 82.04% | 85.42% | 49.91% | 78.72% |
| GNN-GAT | 71.39% | 86.81% | 82.63% | 87.44% | 49.85% | 79.22% |
| GNN-SAGE | 83.54% | 93.61% | 88.61% | 95.33% | 51.20% | 87.55% |


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Zha, B.; Yilmaz, A.
Subgraph Learning for Topological Geolocalization with Graph Neural Networks. *Sensors* **2023**, *23*, 5098.
https://doi.org/10.3390/s23115098
