Using Diffusion Map for Visual Navigation of a Ground Robot

Abstract: This paper presents a visual navigation method for determining the position and orientation of a ground robot using a diffusion map of robot images (obtained from a camera in an upper position, e.g., a tower or a drone) and for investigating robot stability with respect to desirable paths and control with time delay. The time delay appears because of the image processing required for visual navigation. We consider a diffusion map as a possible alternative to the currently popular deep learning, comparing the possibilities of these two methods for visual navigation of ground robots. The diffusion map projects an image (described by a point in multidimensional space) to a low-dimensional manifold preserving the mutual relationships between the data. We find the ground robot's position and orientation as a function of the coordinates of the robot image on the low-dimensional manifold obtained from the diffusion map. We compare these coordinates with coordinates obtained from deep learning. The diffusion map algorithm has higher accuracy and is not sensitive to changes in lighting, the appearance of external moving objects, and other phenomena. However, the diffusion map needs a larger calculation time than deep learning. We consider possible future steps for reducing this calculation time.


Introduction
Currently, deep learning (based on artificial neural networks) [1][2][3][4][5][6][7] is a very popular and powerful instrument for solving complex problems of classification and function regression. The main advantage of this method is that we do not need to develop complex features describing the group of investigated objects; these features are obtained automatically. In addition, deep learning is a very effective method for function regression. For example, we can not only recognize a ground robot but also find its position and orientation.
"PoseNet is [1] is based on the GoogLeNet architecture. It processes RGB-images and is modified so that all three softmax and fully connected layers are removed from the original model and replaced by regressors in the training phase. In the testing phase the other two regressors of the lower layers are removed and the prediction is done solely based on the regressor on the top of the whole network.
Bayesian PoseNet. Kendall et al. [2] propose a Bayesian convolutional neural network to estimate the uncertainty in the global camera pose, which leads to improved localization accuracy. The Bayesian convolutional neural network is based on the PoseNet architecture, adding dropout after the fully connected layers in the pose regressor and after one of the inception layers (layer 9) of the GoogLeNet architecture.

The rest of the paper is organized as follows. Part 2 is the Materials and Methods section. In this part, we describe the methods used for the solution of the ground robot navigation using a diffusion map and provide a block diagram describing the solution flowchart.

Part 3 is the Results. In this part, we introduce the diffusion map using four steps. First, we define the function for comparing similar images based on the Lucas-Kanade method [18] for finding optical flow. In the second step, we define the diffusion space and its eigenbasis. In the third step, we define how an arbitrary image can be expanded using the eigenbasis. However, finding the ground robot position and orientation is only half of the problem of navigation. Hence, in the fourth step, we consider the problem of controlling the robot's motion. The paper presents a method for stabilization of the moving robot controlled by an autopilot with time delay, using results developed herein [8]. Indeed, image processing for visual navigation demands much time and results in a time delay. However, the proposed method allows us to achieve stable control in the presence of this time delay.

Part 4 is the Discussion section. In this part, we describe the results of the diffusion map algorithm and the deep learning algorithm, drawing conclusions on the efficiency of the diffusion map method.

Part 5 is the Conclusion. In this part, we conclude with the main results of our paper, compare the diffusion map with deep learning, describe the advantages and disadvantages of the methods, and also discuss future plans to improve the diffusion map.

Materials and Methods
In this paper, we use two sets of images: a training set (900 images) and a validation set (5000 images). The training set is used for training the diffusion map algorithm, and the validation set is used for verification of the diffusion map algorithm. The diffusion map algorithm allows us to find the coordinates of the ground robot on an image in the diffusion space.
We also know the usual space coordinates of the ground robot with respect to the camera. Hence, using 1800 images (900 from the training set and 900 from the validation set) and some interpolation functions, we can find values of the usual space coordinates for any values of the diffusion space coordinates.
The rest of the images from the validation set are used for finding error values for two coordinates and the angle describing the position of the ground robot with respect to the camera.
The diffusion map, as a data analysis tool, was introduced in 2006 [11,12]. The authors showed that the eigenfunctions of Markov matrices can be used to construct embedded low-dimensional manifold coordinates, obtained by the diffusion map, that create effective representations for complex geometric structures. The method is very useful for dimensionality reduction and data parameterization.
In many cases, a data sample is represented by a set of numeric attributes. In this situation, whether two nodes can be connected, and the strength of this connection, are calculated on the basis of the closeness of the corresponding data points in the feature space.
The basic idea is to treat the eigenvectors of the Markov matrices (the matrix of transition probabilities for all node connections) as coordinates on the dataset. Therefore, data initially considered as a graph can instead be considered as a point cloud.
This algorithm demonstrates two main advantages with respect to classical dimensionality reduction methods (for example, classical multidimensional scaling or principal component analysis): it is nonlinear and preserves local structures.
Let us briefly describe the diffusion map algorithm. Suppose that the images x_1, x_2, …, x_n ∈ M, where M is a manifold in R^N. Let us define on M a measure µ such that:
1. ∀x, y: µ(x, y) ≥ 0, and µ(x, y) = 0 if and only if x = y;
2. ∀x, y: µ(x, y) = µ(y, x);
3. ∀x, y, z: µ(x, z) ≤ µ(x, y) + µ(y, z).
Let us build from x_1, x_2, …, x_n a weighted graph, where x_1, x_2, …, x_n will be the nodes of the graph and µ(x_i, x_k) will be the graph distances connecting the nodes i and k. These distances create the distance matrix M. At the next step, from the distance matrix M we define the weight matrix W, which is called the Laplace matrix. Then, from the weight matrix W, we create the Markov matrix P. The first m ≤ n eigenvectors of the Markov matrix may be considered as coordinates in the m-dimensional diffusion space D^m_{x_1,x_2,…,x_n} produced by x_1, x_2, …, x_n. If we get a new point y ∉ {x_1, x_2, …, x_n}, then by using the vector (µ(y, x_1), µ(y, x_2), …, µ(y, x_n)) we can find the point y in the diffusion space D^m_{x_1,x_2,…,x_n}. These coordinates provide us with essential information about the new point y.
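To make the construction concrete, the following is a minimal NumPy sketch of this pipeline, assuming a precomputed distance matrix and a Gaussian kernel W_ij = exp(−M_ij²/ε) (the kernel form anticipates the weight matrix defined later in the paper; the function and variable names are ours, not the authors' code):

import numpy as np

def diffusion_map(M_dist, eps, m):
    # Weight (kernel) matrix from the pairwise distances mu(x_i, x_k).
    W = np.exp(-M_dist**2 / eps)
    # Diagonal normalization matrix D (row sums of W).
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    # Symmetric matrix P' = D^(-1/2) W D^(-1/2); same eigenvalues as P = D^(-1) W.
    P_sym = D_inv_sqrt @ W @ D_inv_sqrt
    lam, v = np.linalg.eigh(P_sym)   # eigendecomposition, ascending order
    order = np.argsort(lam)[::-1]    # re-sort by eigenvalue, descending
    lam, v = lam[order], v[:, order]
    # Eigenvectors of P are recovered up to multiplication by D^(-1/2).
    nu = D_inv_sqrt @ v
    # Diffusion coordinates psi_j(x_i) = lambda_j * nu_j(x_i), first m vectors.
    return lam[:m], nu[:, :m] * lam[:m]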
We can build the diffusion space on the basic set of ground robot images x_1, x_2, …, x_n. This diffusion space will allow us to find, with high precision, the coordinates in the diffusion space of any additional image y that is not included in the set x_1, x_2, …, x_n.

For finding the diffusion map, we must define the function of similarity µ(x_1, x_2) for two images x_1, x_2. To do this, we look for intrinsic points with the Harris-Stephens corner detector [17] and then use the Lucas-Kanade algorithm [18] to find the corresponding intrinsic points.

Finally, we describe the robot control with time delay, based on the measurements of the diffusion map algorithm, using the results of the papers [19,20].
The flowchart for the full ground robot navigation and control algorithm using a diffusion map is the following:
1. Start;
2. We define the similarity of two robot images using the Harris-Stephens corner detector [17] and then use the Lucas-Kanade algorithm [18] to find the corresponding intrinsic points;
3. We choose the set of n robot images describing its possible rotations and translations with some small steps;
4. Using the set of n images and the similarity function, we find the diffusion map with the corresponding n eigenvalues and eigenvectors;
5. Using these eigenvalues and eigenvectors, we find the corresponding reduced m-dimensional (m ≪ n) diffusion space;
6. We describe the method for finding the coordinates of an arbitrary robot image in the diffusion space;
7. We add n new images and find their coordinates in the diffusion space;
8. Using these two sets of n images and any interpolation method, we can find the correspondence between the position and orientation of the robot and its image coordinates in the diffusion space;
9. Using the found correspondence, we can find the position and orientation of the robot in any image;
10. Using the theory of robot control with time delay, we can find the control signals for decreasing the difference between the found and desirable robot position and orientation;
11. Finish.

Basic and Additional Sets of Images
Images of a ground robot on a neutral background with a pixel size of 100 × 100 were created in the Unity program (Figure 1). Because of the properties of our image processing algorithm, we chose a special pattern for the upper surface of the robot, with a large number of intrinsic points and nonuniform environments of these points, and a black strip bordering the upper robot surface. We need this strip to separate the upper robot surface from the surrounding background.

The set of the basic robot images corresponds to a table with the robot coordinates and the values sin(α) and cos(α) of the robot angle, corresponding to the robot position and orientation in the images. We use sin(α) and cos(α) to prevent a gap (for α) at the point between −180° and 180°.
The set of additional images (the validation set) includes 5000 images with random angles and coordinates in the same range.
The images x_1, x_2, …, x_n can be considered as points of the manifold M ⊂ R^N, where N is equal to the pixel number of an image multiplied by the number of layers, i.e., N = 100 × 100 × 3 = 30,000. Since the images x_1, x_2, …, x_n ∈ M differ only in the coordinates and rotation angle of the robot, it is evident that M ≅ R^2 × S^1 and dim M = 3. However, for the description of the robot position and orientation, we will use 4 values, u, v, cos α, sin α, to prevent mistakes for rotation angles close to the values −180° or 180°.

Intrinsic Points
On the image of the robot with pixel coordinates u = 0, v = 0 and rotation angle α = 0°, we found 25 intrinsic points with the Harris-Stephens corner detector [17] (Figure 3). These points can be easily found on any image of the robot with a different position and orientation. We also found the coordinates of all these points on all images of the basic set (900 images).
These points have the following properties [17]:
• distinctness: the intrinsic point must be clearly different from the background and have a special (one-of-a-kind) environment;
• invariance: the recognition of an intrinsic point must be independent with respect to affine transformations;
• stability: the recognition of an intrinsic point must be independent with respect to noise and errors;
• uniqueness: besides local distinctness (already described above), the intrinsic point must also have global uniqueness for the distinction of repeating patterns;
• interpretability: intrinsic points must be defined in such a way that they can be used for the analysis of correspondences and for finding interpretive information from images.
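As an illustration of this step, the following is a minimal OpenCV sketch of the Harris-Stephens detection; the file name and the detector thresholds are assumptions for demonstration, not the paper's values:

import cv2

# Load the reference robot image (u = 0, v = 0, alpha = 0 degrees) in grayscale.
img = cv2.imread("robot_reference.png", cv2.IMREAD_GRAYSCALE)

# Detect up to 25 intrinsic points with the Harris-Stephens criterion.
corners = cv2.goodFeaturesToTrack(
    img,
    maxCorners=25,         # the paper uses 25 intrinsic points
    qualityLevel=0.01,     # assumed quality threshold
    minDistance=5,         # assumed minimal distance between points
    useHarrisDetector=True,
    k=0.04,                # standard Harris detector parameter
)
print(corners.reshape(-1, 2))  # pixel coordinates of the intrinsic points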


Measure Definition
We define µ(x_1, x_2) as a measure of similarity between images x_1 (one of the basic images) and x_2 by the following:
1. We transform both images x_1 and x_2 to grayscale;
2. We take the twenty-five intrinsic points of the first image from the prepared data of the basic images, whose intrinsic points were found previously;
3. Using the Lucas-Kanade algorithm [18], we find the corresponding points on the second image (Figure 4);
4. We find the distances between the corresponding points. If no corresponding point is found for some intrinsic point of the first image, we set the distance equal to 100;
5. The Lucas-Kanade algorithm gives us information about the error value of the found correspondence. If this error value is larger than the threshold value of 30, we set the distance equal to 100. An example of such a correspondence is denoted by a red arrow in Figure 4;
6. All 25 found distances are arranged in increasing order;
7. The measure of similarity between the images x_1 and x_2 is the median value of these distances.
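Under the penalty (100) and error-threshold (30) rules stated above, this measure can be sketched as follows; the Lucas-Kanade call relies on OpenCV's default window parameters, which is our assumption:

import cv2
import numpy as np

PENALTY = 100.0       # distance assigned to lost or unreliable points
ERR_THRESHOLD = 30.0  # Lucas-Kanade error threshold from the text

def similarity(img1_gray, img2_gray, points1):
    # points1: float32 array of shape (25, 1, 2) with the precomputed
    # intrinsic points of the basic image img1_gray.
    points2, status, err = cv2.calcOpticalFlowPyrLK(
        img1_gray, img2_gray, points1, None)
    dists = np.full(len(points1), PENALTY)
    for i in range(len(points1)):
        # Keep the real distance only for points tracked with acceptable error.
        if status[i, 0] == 1 and err[i, 0] <= ERR_THRESHOLD:
            dists[i] = np.linalg.norm(points2[i, 0] - points1[i, 0])
    return float(np.median(dists))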

Definition of Weight Matrix
1. Using the measure described above, based on the Lucas-Kanade algorithm, we can define the distance matrix M of size 900 × 900, where the matrix element (i, j) is defined by the measure µ(x_i, x_j) between the i-th and j-th images.
2. In the next step, we define the weight matrix W: W_ij = exp(−M_ij²/ε), where ε is a correctly chosen scale coefficient.
To choose ε, we use the recommendation described in the paper (Bah, 2008), specifically:
• we calculate the value L(ε) = Σ_{i,j} W_ij(ε);
• we draw the function L(ε) on a logarithmic scale (Figure 5); this graph has two asymptotes, for ε → 0 and for ε → +∞;
• we choose the final value of ε in the middle of the linear part of the graph, at the inflection point.
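A minimal sketch of this ε selection, scanning L(ε) over a logarithmic grid (the grid bounds are assumptions):

import numpy as np

def scan_epsilon(M_dist, eps_grid=None):
    # L(eps) = sum_ij W_ij(eps) for each candidate scale eps.
    if eps_grid is None:
        eps_grid = np.logspace(-2, 6, 80)  # assumed search range
    L = np.array([np.exp(-M_dist**2 / eps).sum() for eps in eps_grid])
    return eps_grid, L

# Plot log L(eps) against log eps and pick eps in the middle of the linear
# part of the curve, between the asymptotes L -> n (eps -> 0, W -> I) and
# L -> n^2 (eps -> +inf, all W_ij -> 1).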

Creation of Diffusion Space
For the creation of the diffusion space D^m_{x_1,x_2,…,x_n}, we need to find the eigenvectors of the Markov matrix. The Markov matrix P is created from the matrix W by normalization of its rows. To do this, we need to:
1. Create the diagonal matrix D using the following formula: D_ii = Σ_j W_ij.
2. Find the eigenvectors of the Markov matrix P = D^(−1)W, for which we first need to create the symmetric matrix P′ = D^(1/2)PD^(−1/2) = D^(−1/2)WD^(−1/2) instead of P = D^(−1)W (the element P_ij of the Markov matrix P can be interpreted as the probability of transition from node i to node j of the graph). It was demonstrated in the paper [13] that the symmetric matrix P′ has the same eigenvalues as P and eigenvectors that coincide with those of P up to multiplication by D^(−1/2). Specifically, if ν′_j are the orthonormal eigenvectors (⟨ν′_i, ν′_j⟩ = 1 for i = j and 0 for i ≠ j) and λ′_j the eigenvalues of the matrix P′, then the eigenvectors and eigenvalues of the matrix P will be, correspondingly, ν_j = D^(−1/2)ν′_j and λ_j = λ′_j. The first eigenvector ν_1 is trivial and equal to ν_1 = (1, 1, 1, …, 1)^T with eigenvalue 1.
3. The image coordinates of x_i in the diffusion space D^m_{x_1,x_2,…,x_n} can be defined as the following: ψ_j(x_i) = λ_j ν_{j,i}, j = 1, …, m, where ν_{j,i} = ν_j(x_i) is the i-th element of the j-th eigenvector.
As demonstrated in [13], the distance between the points x_i and x_j in the diffusion space is equal to the diffusion distance D(x_i, x_j)² = Σ_k λ_k² (ν_{k,i} − ν_{k,j})². Using only the first m = 20 ≤ n = 900 eigenvectors, we get a low-dimensional representation of the initial set of basic images in the corresponding diffusion space D^m_{x_1,x_2,…,x_n}. In this connection, the information about the position and orientation of the robot in the images is included in the first eigenvectors (Figure 6). In Figure 6, we can see that the second eigenvector is correlated with sin(α), the third vector is correlated with cos(α), the sixth vector is correlated with u, and the seventh vector is correlated with v.

Finding the Diffusion Coordinate for an Arbitrary Image Not Included in the Basic Set
Currently, we need to find the diffusion coordinate for an arbitrary image x not included in the basic set:
1. Using the above-described measure µ for the arbitrary image x, we can create the vector m(x) = (µ(x, x_1), µ(x, x_2), …, µ(x, x_n)). We can rewrite the vector m in weight form, similar to the elements of the matrix W (see Equation (2)): w_i(x) = exp(−µ(x, x_i)²/ε).
2. To find the diffusion coordinate for an arbitrary image not included in the basic set, it is necessary to use the method of geometric harmonics based on the Nyström extension [12,15]. According to the definition of the eigenvectors and eigenvalues ν^W_s, λ^W_s of the matrix W from Equation (2), we can write the following equations: λ^W_s ν^W_s(x_i) = Σ_j W_ij ν^W_s(x_j). Using the Nyström extension, we can approximate ν^W_s(x) for images x not included in the basic set: ν^W_s(x) ≈ (1/λ^W_s) Σ_i w_i(x) ν^W_s(x_i). The eigenvectors {ν^W_s} form an orthonormal basis in R^n. Consequently, any function f(x), defined on the basic set of images, can be approximated as a linear combination of the basis eigenvectors {ν^W_s}: f(x_i) = Σ_s a_s ν^W_s(x_i), where a_s = ⟨f, ν^W_s⟩. Using the Nyström extension, we can approximate f(x) for an arbitrary value x: f(x) ≈ Σ_s a_s ν^W_s(x). Applying the last equation to the eigenvectors ν_j of the Markov matrix P instead of the function f, we derive from Equations (14) and (17) the values ν_j(x). These values can be used for the calculation of the diffusion coordinates of the image x.
3. The diffusion coordinates of the image x can then be found as follows: ψ_j(x) = λ_j ν_j(x), j = 1, …, m.
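The following is a simplified sketch of this out-of-sample step. For brevity, it extends the eigenvectors of P directly through the row-normalized kernel, a common shortcut, instead of reproducing the full geometric-harmonics expansion in the {ν^W_s} basis; the names reuse those of the earlier diffusion_map sketch and are our assumptions:

import numpy as np

def nystrom_coords(mu_new, nu, lam, eps, m):
    # mu_new: array (n,) with the measures mu(x, x_i) to the n basic images;
    # nu, lam: eigenvectors (columns of nu) and eigenvalues of the Markov matrix P.
    w = np.exp(-mu_new**2 / eps)          # weight form of the measure vector
    p = w / w.sum()                       # row-normalized, as in the Markov matrix
    nu_at_x = (p @ nu[:, :m]) / lam[:m]   # Nystrom extension of nu_j to x
    return lam[:m] * nu_at_x              # psi_j(x) = lambda_j * nu_j(x)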

Finding the Robot Coordinates and Rotation Angle from Image Coordinates in the Diffusion Space
We can form the learning set, including the 900 basic images and 900 images (not included in the basic set) with the robot's two known coordinates and the sin(α) and cos(α) of the rotation angle, and find the coordinates of these images in the diffusion space. In the next step, we can consider the robot's two coordinates and the sin(α) and cos(α) of the rotation angle as functions of its image coordinates in the diffusion space. Indeed, we can find these 4 functions using the 1800 images described above with the help of any known interpolation method (for example, an artificial neural network, inverse distance weighting, and so on); a minimal sketch of one such interpolation is given below.

We can see from [19] that rotation and forward movement can be described using the following system of equations: dx/dt = v(t − τ) cos α(t), dy/dt = v(t − τ) sin α(t), dα/dt = ω(t − τ). As a result of the system's nonlinearity, it is too difficult to use these equations for stability analysis. It is thus necessary to linearize them. The parameters x(t), y(t), α(t), v(t), ω(t) correspond to a steady-state motion (x_0(t), y_0(t), α_0(t), v_0(t), ω_0(t)) perturbed by the small increments δx(t), δy(t), δα(t), δv(t − τ), δω(t − τ).
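As one concrete choice among the interpolation methods mentioned above, the following is a minimal inverse distance weighting sketch; the power parameter and the regularizing constant are assumptions:

import numpy as np

def idw_predict(psi_query, psi_train, targets, power=2.0, tiny=1e-12):
    # psi_train: (1800, m) diffusion coordinates of the learning images;
    # targets:   (1800, 4) known values (u, v, cos(alpha), sin(alpha)).
    d = np.linalg.norm(psi_train - psi_query, axis=1)
    wts = 1.0 / (d**power + tiny)         # closer images get larger weights
    pred = wts @ targets / wts.sum()      # weighted average of the known values
    u, v, c, s = pred
    alpha = np.degrees(np.arctan2(s, c))  # recover alpha without the ±180° gap
    return u, v, alpha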
The robot trajectory can be approximated by a polygonal chain path. This path is a set of rotations in vertices with zero translational velocity and constant angular velocity (rotation), and linear motion along straight-line segments with zero angular velocity and constant translational velocity (linear motion).
In case the stationary parameters themselves cannot guarantee stability for the desirable steady-state trajectory, an autopilot is necessary (see Figure 8). This autopilot must set the control parameters δv(t − τ), δω(t − τ) as functions of the output parameters (δx(t), δy(t), δα(t)), which are perturbations with respect to the desirable steady-state path. The autopilot gets the output parameter values from navigation: vision-based navigation, satellite navigation, inertial navigation, and so on. Using these measurements, the autopilot can find control signals for decreasing undesirable perturbations. Unfortunately, for any measurement, some delay always exists in getting the output parameters used for control. As a result, we are faced with a problem because we lack information for control. From [20], we can see that even in such conditions with the time delay, we can generate a control signal that guarantees a stable trajectory.
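To illustrate the effect of the delay, the following is a small simulation sketch of delayed proportional feedback on the linearized heading dynamics; the gain, delay, and step values are illustrative assumptions, not the paper's tuned parameters:

# Discrete simulation of the heading error delta_alpha(t) driven by the
# delayed control delta_omega(t - tau).
dt, tau, k_gain, steps = 0.01, 0.3, 2.0, 2000   # assumed values
delay = int(tau / dt)
alpha_err = 0.5                 # initial heading error, rad
history = [0.0] * delay         # control commands still "in flight"

for _ in range(steps):
    omega_cmd = -k_gain * alpha_err     # proportional autopilot law
    history.append(omega_cmd)
    alpha_err += history.pop(0) * dt    # the delayed command acts only now
print(f"final heading error: {alpha_err:.4f} rad")  # decays while k*tau < pi/2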

Figure 8. Automatic control: the ground robot has output parameters (output of block 1) describing its position, orientation, and velocity. These parameters are measured and calculated by the measurement system with some time delay (output of block 2). These measured parameters and their desirable values, calculated from the desirable trajectory (inputs of block 3), can be compared, and the deviation from the desirable trajectory can be calculated (output of block 3). The automatic pilot receives these deviations and calculates the control parameters for the ground robot to decrease these deviations. Then the ground robot changes its output parameters (output of block 1). This cycle repeats for the duration of the motion.


Discussion
In [8], we used a deep learning network for the solution of the same task: finding the orientation and position of a ground robot. The deep learning network was developed using AlexNet with small changes in training mode and structure. AlexNet [7] is a popular CNN (convolutional neural network) developed by Alex Krizhevsky together with Ilya Sutskever and Geoffrey Hinton. On 30 September 2012, AlexNet won the ImageNet Large Scale Visual Recognition Challenge. Initially, AlexNet had eight layers (five convolutional layers and three fully connected layers). In our case, we replaced the last two fully connected layers with one fully connected layer having four outputs (the trigonometric functions sin(α) and cos(α), where α is the angle of rotation, and the x, y coordinates) or seven outputs (sin(α), cos(α), x, y, the angle error, the coordinate errors, and the total error) and also one regression layer. The pre-trained modified AlexNet network was used for the solution of the regression problem: finding the course angle and two coordinates of the ground robot (Figure 2 in [8]).
With this deep learning network, it is possible to find the ground robot angle with an accuracy of 4° and the ground robot position with an accuracy of 2.6 pixels (6 cm) (Table 1) for any 227 × 227 pixel-sized image. For training, we used a set of 10,000 images and the "adam" solver, with training carried out over 200 epochs with a batch size of 500. Using Unity modeling, we programmatically generated the dataset. At every step, the robot rotation angle, the robot position, and the locations of the working platform elements were selected randomly. We simulated lighting as sunlight falling from various directions and angles. We used 50 different textures for the background to be sure that the ground robot coordinates would be independent of the background. In addition, 12,000 RGB images of size 3036 × 3036 pixels containing the ground robot were generated. There were also images where the robot was completely or partially occluded by the camera tower or some objects of the environment. We also prepared a file with data about the angle of rotation and the coordinates of the ground robot. From these images, we prepared the set of reduced 227 × 227 pixel-sized images.
We have the training set (used for training the artificial neural network, ANN) and the validation set for the verification of the pre-trained ANN.
All errors were found for the validation set of images as differences between positions and orientations, obtained from the pre-trained artificial neural network, and known positions and orientations used for the creation of images by the Unity program. The root-mean-squares of these errors are written in the second row of Table 1.
Let us consider the visual navigation again with the help of the diffusion map algorithm. In Figure 6, we can see that the second eigenvector is correlated with sin(α), the third vector is correlated with cos(α), the sixth vector is correlated with u, and the seventh vector is correlated with v.
We can see that the basis of the diffusion space, in fact, corresponds to the robot position and orientation. The first seven eigenvectors give us full information about these parameters.
Our experiment demonstrates that, for optimal results, the number of the first eigenvectors of the diffusion space should be m = 40 (out of the maximal value n = 900).
In Table 2, we can see the error values for four variables describing the ground robot position: the coordinates (u and v) and the rotation angle (cos(α), sin(α), α). We give the results for the training set (used for the creation of the diffusion space and the interpolation function) and for the independent validation set.

Table 2. Errors of the coordinates and rotation angle of the ground robot for navigation by the diffusion map; the size of the robot is 50 cm × 50 cm; the pixel size corresponds to 1.47 cm on the ground.