Research on an Optimal Path Planning Method Based on A* Algorithm for Multi-View Recognition

: In order to obtain the optimal perspectives of the recognition target, this paper combines the motion path of the manipulator arm and camera. A path planning method to ﬁnd the optimal perspectives based on an A* algorithm is proposed. The quality of perspectives is represented by means of multi-view recognition. A binary multi-view 2D kernel principal component analysis network (BM2DKPCANet) is built to extract features. The multi-view angles classiﬁer based on BM2DKPCANet + Softmax is established, which outputs category posterior probability to represent the perspective recognition performance function. The path planning problem is transformed into a multi-objective optimization problem by taking the optimal view recognition and the shortest path distance as the objective functions. In order to reduce the calculation, the multi-objective optimization problem is transformed into a single optimization problem by fusing the objective functions based on the established perspective observation directed graph model. An A* algorithm is used to solve the single source shortest path problem of the fused directed graph. The path planning experiments with different numbers of view angles and different starting points demonstrate that the method can guide the camera to reach the viewpoint with higher recognition accuracy and complete the optimal observation path planning.


Introduction
With the rapid development of artificial intelligence, it is an inevitable trend to realize intelligence for robots that integrate machine vision technology.Different images observed from different perspectives provide information on different features.Information in a certain direction from a fixed perspective cannot satisfy recognition requirements.Dynamic observation from high-quality multi-view angles can obtain more comprehensive features of the target, which can improve the recognition accuracy greatly.Series linkage robots are widely used in the industrial field with simple structures and flexible operations.It depends on the path planning of the manipulator whether the mechanical arm can complete tasks efficiently and accurately in the workspace.Therefore, in order to improve the target recognition accuracy of the robot in different environments, it is of great significance to plan the motion of the manipulator to obtain the optimal multi-view observation path of the target.
Selection of perspectives, motion of cameras and path planning of mechanical arms have been researched separately in general.For the selection of perspectives, the traditional selection methods can be divided into two types, based on geometric features and semantic features according to the different measurement criteria.Viewpoint metrics based on geometric features include profile curvature [1], depth map [2], region of interest [3], curvature variation of target projection on viewpoint-related imaging plane [4], etc. Local geometric features reflect details which cannot represent the main content of the target.Scholars have proposed some methods based on information theory, such as viewpoint interaction information [5], relative entropy [6], viewpoint visibility KL distance [7], and surrogate information [8].These methods can be used to further measure global features based on geometrical property measurement.For features such as texture and functionality, which are called semantic features, the proposed viewpoint metrics methods include discernibility region [9], upright direction [10], implication segmentation [11], Schelling Points [12], etc.The viewpoint metrics method based on geometric feature detection is simple and fast, but the measurement quality is far from expected.The method based on semantic feature detection can generally obtain good results, but its operation is complex and inefficient.With the development of artificial intelligence, deep learning is also gradually applied to the study of viewpoint selection.With the introduction of long short-term memory networks (LSTM) [13], multi-loop-view convolutional neural networks (MLVCNN) [14], the hierarchical multi-view context modeling method (HMVCM) [15] and others combined with compact 3D shape descriptors, Wei et al. [16] presented a view-based convolution neural network (view-GCN), and aggregated the multi-view features as a global shape descriptor.Zhou et al. [17] proposed a multi-view saliency guided depth neural network, which consisted of a model projection-rendering module, visual context learning module and multi-view learning module.In the visual context learning module, convolutional neural network was used to extract the visual features of a single view, and then the salient LSTM was used to select the representative view adaptively.Multi-view learning modules generated compiling three-dimensional object descriptors using the designed LSTM classification.The method utilized a multi-view context to select representative views and measure similarity.Meng et al. [18] proposed a target detection model based on a convolution neural network, which was combined with a three-dimensional CAD model to select viewpoints.The above viewpoints were preset, all views were entered into the network, and the best performance could be achieved by using all views.In order to reduce calculation, RotationNet [19] only needed a few images.The perspective variable was used as the optimized latent variable in the training process, which could obtain the ability to establish the corresponding viewpoint for the input views.The pose and the category label of the object were estimated.Then, the base axis of the object determined in an unsupervised mean based on the appearance was added to achieve alignment, and the views were enhanced [20].However, this method mainly finds the corresponding viewpoint for the input view, and still needs to contain a predefined complete multi-view image set.Inspired by the attention mechanism, Chen et al. [21] proposed a view-enhanced recurrent attention model (VERAM), which was a real active viewpoint selection mechanism.It contained a camera module, observation sub-network, cyclic subnet, viewpoint evaluation subnet and classification subnet.The information flow of the reverse gradient was added to balance the perspective prediction network and classification sub-network.The reward and loss functions were designed to improve the performance of view classification and perspective selection.The network can select the shortest view sequence independently and input it into a three-dimensional shape classification network with the active camera viewpoint selection ability.However, the problems appear easily in this mixed system such as too many parameters lead to fit, and difficult to balance the subnets.It is not difficult to find that perspectives selection methods based on deep learning describe the features of the observed objects more comprehensively and the selection results are more representative.
For the study of camera motion planning, Sokolov et al. [22] used a greedy algorithm to obtain the optimal viewpoint set.A method of camera auto-roaming guidance based on a good viewpoint set was proposed, and it was optimized based on semantic distance.Alam et al. [23] planned a virtual camera in an all-round way using heuristic methods to solve the viewpoint coverage problem.Yao et al. [24] fixed the viewpoint height, optimized the camera layout, and converted the three-dimensional problem into a two-dimensional discrete coverage problem.Amriki et al. [25] proposed a SmarkMax algorithm, which achieved maximum coverage with a minimum number of viewpoints.Marchand et al. [26] proposed a camera relocation method based on multiple output learning.Valentin et al. [27] studied the uncertainty in forest regression to achieve accurate camera positioning.Lan et al. [28] proposed a viewpoint selection evaluation method based on the Kullback-Leibler Divergence method.The aesthetic evaluation method was interpreted by the three-point method and diagonal composition method, and the probability density formula with the existence of the target was defined.The optimal viewpoint for searching scenes in the robot photography system was constructed.Qiu et al. [29] proposed a framework for answering questions in human-computer interaction by iteratively observing scenes.Yokomatsu et al. [30] proposed a spatial search method combining Gaussian process and particle swarm optimization (PSO) algorithm for indoor UAV photography, whose objective was to effectively predict the heuristic optimal viewpoint with short time.With regard to camera motion planning, it can be found that some of the research results do not consider the choice of the optimal perspective, and camera motion planning considering the optimal perspective do not consider factors such as camera motion time and efficiency.
For the path planning of the manipulator, the research achievements are mainly focused on the path planning of the manipulator end in Cartesian space and the trajectory optimization of joints in joints space.The purpose of this paper is to optimize the view angle of the camera driven by the end of the manipulator; therefore, only the path planning of the manipulator in Cartesian space is studied in this paper.Currently, the path planning algorithms proposed by domestic and foreign scholars can be divided into sample-based planning algorithms and search-based combined planning algorithms.Sample-based planning algorithms capture the connectivity properties in an infinite number of free spaces approximately according to the conditions to be met using a countable set of sampling points or sampling sequences.Some classical methods include the Probabilistic Road Map (PRM), Rapid-Exploring Random Tree (RRT), Artificial Potential Field (APF), etc. PRM [31] is a typical global planning algorithm, and map information (including all barrier information) is fully known.The PRM algorithm can be directly used in the path planning of different tasks under the constant obstacle environment with simple operation and probability completeness.It is suitable for static offline planning.However, the PRM algorithm requires a lot of time and memory.The more degrees of freedom the manipulator has, the more complex random sampling and collision detection are.RRT, as a one-way query global path planning algorithm uses a tree structure for node-to-target planning [32,33].The algorithm satisfies probability completeness and has lower planning efficiency for high-dimensional sampling space.The methods based on APF [34,35] calculate the force of gravity and repulsion to drive the robot to the target point.They are suitable for low-dimensional and high-dimensional space planning with simple models.Meanwhile, deficiencies also exist, such as to the danger of falling into local optimum and insensitivity to dynamic obstacle avoidance.
The search-based combinatorial programming algorithm searches a path after constructing a topological graph in free space using a graph search algorithm.The commonly used algorithms include Dijkstra [36] and A* algorithms [37].The Dijkstra algorithm generally solves the shortest path problem in directed graph.It works by using greedy algorithms.Each iteration, it traverses a vertex nearest to the starting point, and each iteration traverses adjacent vertices from vertices that have not been visited, until it extends to the target point.The A* algorithm combines information from the Dijkstra algorithm with the Breadth-First Search (BFS) algorithm for path searching.The A* algorithm is also often used to solve the shortest path in a static road network, which is one of the commonly used heuristic algorithms.
In recent years, with the rise of in-depth learning, some scholars have applied machine learning methods in the path planning of mechanical arms, and have accomplished more achievements.Qureshi et al. [38] proposed motion planning networks (MPNets) based on deep neural networks, which achieve end-to-end planning.However, the data depended on the RRT algorithm and had some limitations.Li et al. [39] transformed the shortest path problem of the manipulator path planning into a linear programming problem, which was solved using a zeroing neural network (ZNN).A zero-return neurodynamic method was constructed, and a ZNN model was established for the shortest path problem of the manipulator.Then, the stability of the ZNN model was proved by the Lyapunov method.Finally, the ZNN model was applied in the path planning of the manipulator to generate the optimal planning path.Zhang et al. [40] combined reinforcement learning with an intelligent optimization algorithm, and proposed a deep deterministic policy gradient-adaptive ant colony optimization algorithm (DDPG-AACO) for complex parts of welding manipulators.Deep deterministic policy gradient (DDPG) training was used to learn local path planning strategies to control the safe movement of a welding manipulator between two weld endpoints.Then, global path planning, with the shortest path length and traversing all seams under the constraint of the welding direction, was achieved using an adaptive ant colony optimization algorithm (AACO) that manually alters the pheromone value.The simulation results showed the effectiveness of this method.Hua et al. [41] proposed a deep learning training method for motion decoupling solution of redundant mechanical arm.Zou et al. [42] set up a multi-objective decision optimization function for route planning based on the idea of reinforcement learning.Wang et al. [43] proposed a hybrid bidirectional rapidly exploring random tree H-BRRT algorithm based on reinforcement learning to solve problems such as slow random generation and convergence generated by the rapidly exploring random tree (RRT) algorithm.Target gravity strategy was introduced to model the random exploration process.Enhanced learning was applied to improved target gravity exploration strategies, and adaptive switch operations were carried out based on cumulative performance.Experiments showed that this strategy improved the search efficiency and shortened the generated path.Santos et al. [44] established a multi-objective optimization formula considering the maneuverability of the manipulator and the maximum torque.The trajectory of the space manipulator was optimized by the machine learning.Fang et al. [45] presented a snake tongue algorithm based on a slope potential field, which was combined with the genetic algorithm (GA) to complete a shorter path search.The reinforcement learning (RL) was further fused to reduce the number of path points, which achieved efficient collision-free path planning for the robot.
Sampling-based methods, such as PRM and RRT algorithms, have the characteristic of probability completeness; that is, the solution can be guaranteed, but not necessarily the optimal solution in the presence of solutions as long as the planning or searching took.Compared with the machine learning method, the traditional planning methods can directly plan the route and find the near-optimal path more effectively, but they also face the problems that complex calculation and poor planning timeliness.The A* algorithm has the advantages of rapid response to the environment and direct path search.As a direct search algorithm, it is widely used in path planning problems.In order to ensure the real-time performance in the process of path planning and reducing the speed of planning, this paper established a directed-graph path-planning model.The nodes search method based on directed graph was selected.The A* algorithm has both the features of the fast speed in the depth and obtaining the optimal solution in the width by adding the evaluation function.It has the maximum probability to find the optimal solution and can reduce the redundancy time.Therefore, the A* algorithm was selected for path planning.
From the above analysis, it is clear that the current research does not combine the optimal viewpoint selection, camera motion and path planning of mechanical arms.The current path planning studies are generally conducted on the premise of setting the path and obstacle avoidance functions given the starting point and end point, which have no awareness of autonomous end point selection.In order to make the most widely used linkage robots in the industrial field complete the accurate identification of the operated target and have the awareness of autonomous viewpoint selection with the help of machine vision technology, this paper integrates the camera and the end of the robot arm together, and combines the optimal viewpoint selection, camera motion and path planning of the mechanical arm to conduct the research on viewpoint-seeking path planning technology.Specifically, the main contributions of this paper are as follows: ( The remainder of this paper is structured as follow.Section 2 presents the related research on multi-view target recognition.Section 3 establishes the target observation model, determines the objective function characterizing the viewpoint recognition performance and shortest path, and studies in detail the A* algorithm-based multi-viewpoint recognition optimal path planning.Different path starting points are selected for perspective-seeking path planning experiments in Section 4. The research content of this paper is summarized in Section 5.

Multi-View Target Recognition
The view-based approach is widely used in the 3D object recognition, that is, 3D object recognition problem is transformed into the recognition of multiple 2D images.Features are captured from multiple viewpoints and fused to improve the recognition rate.Due to the dimensionality reduction, 2D images only need 2D convolution filters for feature learning, which greatly reduces the computation.The images acquired from different viewpoints can also complement the detailed features of the target.Therefore, the view-based multiview target recognition method with excellent adaptability and flexibility has become a common method for three-dimensional object recognition and detection, and has gained the attention of many research scholars.
The model, based on the multi-view convolutional neural network (MVCNN) [46], is a milestone in multi-view target recognition.It plays a pioneering role by fusing features from multiple views through a maximum pooling layer to generate a simple and efficient descriptor.Gao et al. [47] unified feature extraction and target recognition into a CNN, and proposed an end-to-end pairwise multi-view convolutional neural network.Slice and Concat layers were added and a multi-view pairwise recognition model [48] was established, which could discriminate multiple batches of inputs to improve the robustness of feature extraction.Lin et al. [49] built an end-to-end multi-view three-dimensional fingerprint learning model based on full convolutional networks and Siamese networks.Han et al. [50] developed a SeqViews2SeqLabels model.An attention module was used to assign weights for each view.Sequential views were further aggregated to improve the recognition accuracy.Xu et al. [51] proposed a lightweight multi-view network (PVLNet) based on a parametric view learning mechanism.Views are directly constructed as parameters of PVLNet under the guidance of the parametric view learning mechanism.The gradient descent method was used for automatic optimization.A simplified microscopic deep graph generator was used to ensure the gradient propagation of the images, and the global maximum pooling was used to aggregate the multi-view features extracted by the CNN.The recognition efficiency was significantly improved.
In addition, the fusion of view features is one of the research trends used to reduce the number of operations.Zhang et al. [52] proposed a classification method based on semi-supervised learning, which included CNN-LP (label propagation) with label prop-agation capability and multi-view fusion strategy based on expectation maximization.Zhang et al. [53] designed a multi-view based local information fusion network (MV-LFN) for a 3D shape recognition task.A local relevance attention mechanism was introduced to generate more efficient view descriptors.Local information was effectively extracted and maintained by hierarchical aggregation of multi-view feature maps, which significantly improved the recognition capability.Fei et al. [54] coded the multi-view feature projections as binary codes, and assembled them into block statistical features as recognition descriptors.Liang et al. [55] proposed a new multi-view based hierarchical fusion pooling method (MHFP) for 3D model identification, which hierarchically fused features from multiple views into a compact descriptor.A 3D attention module was designed to mine the correlation between views.Redundant information was removed effectively, and a large amount of essential information was retained.Considering the complex structure and long training period of target recognition algorithms based on deep learning, we proposed the binary multi-view kernel principal component analysis network (BMKP-CANet) model [56] and binary multi-view two-dimensional kernel principal component analysis network (BM2DKPCANet) model [57] successively based on the principal component analysis network (PCANet) model [58].The BM2DKPCANet model can directly perform feature extraction on two-dimensional images.Features are clustered and fused through binary encoding, obtaining better recognition and computational efficiency.This paper conducts a characterization study of viewpoint recognition performance based on BM2DKPCANet model.

Feature Extraction Based on BM2DKPCANet Model
The BM2DKPCANet model for feature extraction of the input sequence images is divided into three stages.As shown in Figure 1, the output features are binary hashed and clustered after two-layer 2DKPCA convolutional filtering, followed by histogram features output.
A simplified microscopic deep graph generator was used to ensure the gradient propagation of the images, and the global maximum pooling was used to aggregate the multi-view features extracted by the CNN.The recognition efficiency was significantly improved.
In addition, the fusion of view features is one of the research trends used to reduce the number of operations.Zhang et al. [52] proposed a classification method based on semi-supervised learning, which included CNN-LP (label propagation) with label propagation capability and multi-view fusion strategy based on expectation maximization.Zhang et al. [53] designed a multi-view based local information fusion network (MV-LFN) for a 3D shape recognition task.A local relevance attention mechanism was introduced to generate more efficient view descriptors.Local information was effectively extracted and maintained by hierarchical aggregation of multi-view feature maps, which significantly improved the recognition capability.Fei et al. [54] coded the multi-view feature projections as binary codes, and assembled them into block statistical features as recognition descriptors.Liang et al. [55] proposed a new multi-view based hierarchical fusion pooling method (MHFP) for 3D model identification, which hierarchically fused features from multiple views into a compact descriptor.A 3D attention module was designed to mine the correlation between views.Redundant information was removed effectively, and a large amount of essential information was retained.Considering the complex structure and long training period of target recognition algorithms based on deep learning, we proposed the binary multi-view kernel principal component analysis network (BMKPCANet) model [56] and binary multi-view two-dimensional kernel principal component analysis network (BM2DKPCANet) model [57] successively based on the principal component analysis network (PCANet) model [58].The BM2DKPCANet model can directly perform feature extraction on two-dimensional images.Features are clustered and fused through binary encoding, obtaining better recognition and computational efficiency.This paper conducts a characterization study of viewpoint recognition performance based on BM2DKPCANet model.

Feature Extraction Based on BM2DKPCANet Model
The BM2DKPCANet model for feature extraction of the input sequence images is divided into three stages.As shown in Figure 1, the output features are binary hashed and clustered after two-layer 2DKPCA convolutional filtering, followed by histogram features output.The process of 2DKPCA convolutional filtering for the first filtering layer is described as follows.Suppose that there are S multi-view image sequences Ii (i = 1, 2, …, S) ∈ R m×n as samples for training.S local feature matrices Ti sized k 2 × mn are obtained after k × k windows for block sliding sampling.Then, the column direction kernel matrix di- The process of 2DKPCA convolutional filtering for the first filtering layer is described as follows.Suppose that there are S multi-view image sequences I i (i = 1, 2, . . ., S) ∈ R m × n as samples for training.S local feature matrices T i sized k 2 × mn are obtained after k × k windows for block sliding sampling.Then, the column direction kernel matrix dimension is Smn × Smn when performing centralized learning training.To reduce the computation, the average column vector is computed instead of the original column vector to obtain a nonlinear mapping sample T i , which is mapped to the kernel feature space as follow: The kernel matrix can be expressed as: The dimensionality is reduced to S × S. The kernel matrix is centralized such that where 1 is the s-order matrix with all 1 components.The eigenvectors corresponding to the first L 1 largest eigenvalues are taken as 2DKPCA filter banks W l 1 = [w 1 , w 2 , . . ., w L1 ], which are convolved with the training sample I i after the zero-filled boundary and then output the first-layer filtering: where * is two-dimensional convolution.
It is also used as input for the second stage of convolutional filtering to further explore the effective classification features.The same steps as described previously are used to obtain the second stage filter set convolution kernel W 2 = [w 1 , w 2 , . . ., w L2 ], The second layer filtering output is obtained by convolving the training samples I l i with the zeropadding boundary.
In order to reduce the data redundancy caused in the process of multi-view image acquisition and reduce the computation time and storage space to solve problems such as large data storage and long-time operation of multiple views, the binary multi-view clustering algorithm [45] is used to optimize the binary encoding and cluster of multiple views jointly.The total optimization function is written as: . . ,g n ] and λ are the regularization parameters.Each of the clustering characteristics after the above optimization is transformed into decimal: ) is an integer, and the range is [0, 2 L 2 −1 ].Each T l m i is divided into Z blocks.For each block, histogram statistics are conducted.Full connection is taken to obtain: f m i is the BM2DKPCANet feature of the ith sample from the m th perspective.

Geometric Model for Target Observation
To search for the best path for accurate identification of multi-view targets, a geometry model for target observation is first built.Then, the best path search problem is transformed into a solving optimization problem.
Within the working space of the robot arm, the observation geometry model is constructed as large as possible to obtain a better observation path.For meeting the requirements, an outer envelope sphere of the target is constructed.The radius range of the observation model is 3-5 times the radius of the outer envelope sphere.The hemispherical ring observation model shown in Figure 2 is constructed based on the target center position.The minimum and maximum observation circle radii of the observation area boundary are R min and R max , respectively.Suppose that the target centroid is located at the origin of coordinates O(0,0,0) and R is the distance of the camera viewpoint from the coordinate origin, that is, arc radius of perspective.Due to the specificity of the target observation model, a spherical coordinate system is used for position representation.Then, the viewpoint imaging coordinates of point P are denoted as P (R, λ, θ,), λ is the angle between the positive direction of z-axis and the directed line segment OP, 0 ≤ λ ≤ π/2, and θ (0 ≤ θ ≤ 2π) is the angle between the positive direction of the x-axis and the projection of the directed line segment OP on the xOy plane.The right-angle coordinate system is converted as: Algorithms 2022, 15, x FOR PEER REVIEW 9 of 28 The location of the optimal viewpoint is optimized by MHWOA, and the camera observation path passes through a series of imaging points and is planned as a polynomial function for interpolation to form a smooth curve.The camera observation path can be expressed as a sequence of coordinates as: where P v p = (Rp, θp, λp), p = 1, 2, ..., k, are the coordinates of each imaging point, Qq = (θq, λq, Rq) (q = 1, 2, …, nk−1) are coordinates of the interpolation points between each imaging point.
The coordinate set of obstacles in the observation scene are set as B = {B1,B2,..., BE}, where Be = (Re, λe, Re), (e = 1, 2, ..., E) is the position coordinate of the eth surface obstacle.As one of the important measurement parameters, the influence range of obstacles The conversion relation from a right-angle coordinate system to a spherical coordinate system is The location of the optimal viewpoint is optimized by MHWOA, and the camera observation path passes through a series of imaging points and is planned as a polynomial function for interpolation to form a smooth curve.The camera observation path can be expressed as a sequence of coordinates as: where P v p = (Rp, θp, λp), p = 1, 2, ..., k, are the coordinates of each imaging point, Q q = (θ q , λ q , R q ) (q = 1, 2, . . ., n k−1 ) are coordinates of the interpolation points between each imaging point.
The coordinate set of obstacles in the observation scene are set as B = {B1,B2,..., BE}, where Be = (Re, λe, Re), (e = 1, 2, ..., E) is the position coordinate of the eth surface obstacle.As one of the important measurement parameters, the influence range of obstacles on the path of manipulator can be expressed as parameter vector β (β 1 , β 2 , . . ., β E ).
The observation area is uniformly divided according to the latitude and longitude regions.Assuming that the longitude direction is uniformly divided into M lo regions, and the dimensional direction is uniformly divided into M la regions, then the observation model is divided into M lo × M la arc cone regions, and at most one imaging point is defined to exist in each arc region.

Objective Functions and Constraints
The path planning of viewpoints only needs to consider the shortest movement route and high efficiency of viewpoint recognition.The objective function of the mathematical model that transforms the viewpoint seeking path planning problem into a multi-objective optimization problem at this point can be expressed as: max R(X) min D(P c , P 1 ) (12) where the monotonically increasing function R(X) represents the recognition performance of the multi-view target recognition system.The larger the function value is, the higher the recognition accuracy is.The information contained in the images obtained from different viewpoints at different locations is different, and the function R(X) is also different.D(P c , P 1 ) denotes the distance from the starting point P 1 of the observation path to the target point P c .
It is very difficult to find a specific analytic function to represent to characterize the recognition performance of the system theoretically.Based on the previous study of multiview recognition of targets, this section selects the Softmax classifier for classification and outputs the category posterior probabilities to characterize the recognition performance.For more accurate identification, while the design of the classifier is further improved, combined with integrated learning theory, the single classifier is effectively combined, as shown in Figure 3 for the basic structure of BM2DKPCANet network-integrated multi-view classification.x i is the original image acquired by the camera, and the imaging viewpoint of the pair of factors is P(x i ) = (R p , θ p , λ p ).The multi-viewpoint image is decomposed into image pairs and input to the two-viewpoint base classifier for classification and recognition, and then all the reco_gnition results are fused using an integrated learning method.In order to reduce the planning complexity and the number of divided intervals, this paper sets that the subsequent planning is carried out along the sphere with R as the radius when the viewpoint enters within the allowed boundary region, i.e., R is between Rl and Ru.According to the region division of the target observation model, the sphere is homogenized into Mlo•Mla regions, and the set of view intervals can be expressed as: ij φ is used to express the above perspective range.The set of perspective range can be further expressed as: In order to reduce the planning complexity and the number of divided intervals, this paper sets that the subsequent planning is carried out along the sphere with R as the radius when the viewpoint enters within the allowed boundary region, i.e., R is between R l and R u .According to the region division of the target observation model, the sphere is homogenized into M lo •M la regions, and the set of view intervals can be expressed as: Φ ij is used to express the above perspective range.The set of perspective range can be further expressed as: Each original image x i belongs to only one of the view intervals.The classifier classifies only the image pairs within the viewpoint interval, and the original data set is reorganized into a series of image pairs as: where d θ is perspective interval difference in the dimensional direction and d λ is perspective interval difference in longitude.Some of these image pairs are randomly selected as training samples, and other image pairs are used as test samples to evaluate the recognition performance of the constructed BM2DKPCANet + Softmax classifier.
Based on the above approach, the final recognition results of the multi-view classifier after integrated learning can be expressed by the following equation: where {x 1 , x 2 , . . ., x k } is a sequence of input multi-view images.∏(•) is the index of the position of the maximum element of the input vector.p i (y|x i , x i+1 ) is the C-dimensional posterior probability vector derived from the image classification by the base classifier, expressed as where p i (y h |x i , x i+1 ) (h = 1, 2, . . .,C) is the category posterior probability recognized of the input image by the basic classifier.It can be expressed as: where z is the input of the classifier of the i-th identification network.
The constraint of viewpoint seeking path planning for robotic arms requires that the planned path points are in the target observation space, and also belong to the reachable workspace of the robotic arm.The difference in viewpoints between adjacent path points cannot exceed the number of base classifiers.The arm joint angles are within the joint limit angles.The speed, acceleration and pulse speed of each joint cannot exceed the corresponding maximum value.In summary, the mathematical model for transforming the viewpoint seeking path planning problem into a multi-objective optimization problem can be expressed as: where Ω is the viewpoint position search space, W is the accessible workspace of the manipulator, Φ ij is the range of view angle, d θ is the view angle difference of latitude, d λ is the view angle difference of longitude, k is the number of view angles in path planning, C l is the number of basic classifiers, θ s is the joint angle of the sth joint, θ d s and θ u s are the upper and lower boundary of the angle movement for the sth joint, respectively.t time, respectively.V s max , A s max and J s max are respectively the maximum velocity, maximum acceleration and maximum pulse of the sth joint of the manipulator.

Optimal Path Planning Based on A* Algorithm for Multi-View Recognition
This section abstracts the recognition performance distribution of the base classifier as a directed graph model for planning.The defined recognition rate of the base classifier for image pairs with inputs from the set of view intervals {Φ ij , Φ (i+d θ )(j+d λ ) } is denoted as The adjacency matrix corresponding to the recognition accuracy of the directed graph is: where When i = j, the matrix Aij = Aii is shown in Equation (21).Cl is defined as the number of basic classifiers.d  = 1,2..., Cl.When j = i + d  ≤ Mla, the specific elements of matrix Aij = Ai(i+ d  ) is shown in Equation (22).When i ≥ Mla − d  +1 and j = i + d  − Mla, the specific elements of matrix Aij =  The adjacency matrix corresponding to the recognition accuracy of the directed graph is: where 21).C l is defined as the number of basic classifiers.
A i(i+ Given the starting imaging point, the multi-view target viewpoint optimization path planning in barrier-free environment is equivalent to the directed graph path search problem with a given starting point.The number of nodes to be searched is k − 1, and the maximum path corresponding to the sum of edge weights is the search path.In order to facilitate the graph search, the problem of maximizing the sum of edge weights is transformed into the problem of minimizing the sum of search path weights by finding the reciprocal and normalized to (0,1): where R ˆemax ,R ˆemin are the maximum and minimum values of the reciprocal of edge weights, respectively.ε is the parameter to ensure the continuity of directed edges.The transformed matrix R e becomes a new recognition performance matrix.In addition to recognition performance, it also ensures the shortest end running distance to improve operation efficiency.When the end enters the set observation radius range, the fixed radius does not move, and the path planning is carried out on the semicircular sphere with this radius as the radius.Similarly, viewpoint intervals are set as the vertexes of the directed graph, and the distances between two viewpoints are defined as the edge weights of the directed graph.The adjacency matrix D R is constructed, which can be expressed as: where . .,M la , j = 1,2, . . .,M la .
When i = j, the specific elements of D ij = D ii are shown in Equation ( 26).
Considering the two optimization problems of optimal recognition performance and shortest planning path, based on the same directed graph model, the weight ω is introduced to adjust the recognition performance and planning path distance to further transform the problem into a single optimization problem, which is the shortest path search on the directed graph.The directed graph is defined as G = (Φ ij ,A), then the fused adjacency matrix can be expressed as: As a typical heuristic search algorithm in artificial intelligence, the A* algorithm has been widely used in path solving.The algorithm introduces the path evaluation formula when selecting nodes: where g(n) is actual cost from initial state to state n, h(n) is the estimation cost of the optimal path from the state n to the target state.In this section, the A* algorithm is used to solve the single-source shortest path problem of the fused directed graph, and the sum of the edge weights of the fused directed graph is defined as the cost function to complete the viewpoint optimization path planning in a barrier-free environment.The specific search process is shown in Figure 6.The A* algorithm uses two lists of openList and closeList to place the nodes to be examined and the nodes that have been examined, and sets the subsequent node table to store the subsequent nodes.Through the above operations, k viewpoints with the minimum sum of edge weights in P can be obtained.However, since the path searched by the A* algorithm is a spatial straight-line segment, in order to prevent the jitter and vibration of the manipulator during operation, the B-spline curve is used for interpolation and smoothing between the viewpoints of the end path to obtain the continuous and stable optimal observation path of the target, and the target recognition results are obtained through the classifier mentioned above.

Construction of Experimental Platform
The experimental platform include the UR5 six degree of freedom industrial robot developed by Universal Robots, the stereo vision Bumblebee2 binocular camera developed by Point Grey Research of Canada, the Fastec IL5 high-speed camera, the 2F-85 ROBOTIQ double finger electric gripper, the image acquisition card matching the UR5  UR5 manipulator is a cooperative manipulator with small shape, simple programming, flexible action and safe use.It can complete loading, unloading, sorting, assembly and other work in production.Its effective load is 5 ± 0.5 kg, the maximum working range can reach the sphere space with 850 mm radius, the motion range of each joint is (−2π, 2π), and the maximum motion speed is 180°/s.Two kinds of cameras are used in this experiment.Fastec IL5 high-speed camera is fixed and used as a global camera for monitoring.Combined with the experimental content, it will not be introduced here.The Bumblebee2 binocular camera is fixed at the end of the manipulator to obtain the local features of the target and the depth information of the point cloud, and complete the viewpoint optimization path.

Camera Calibration
The binocular camera is connected to the end of the manipulator by a fixed clip to ensure that there is no relative motion between the manipulator and the camera during the motion.The calibration plate is fixed in front of the manipulator to calibrate the 'eye on hand'.Under the premise of ensuring that the left and right cameras can observe a complete chessboard, the manipulator is controlled to reach 25 different poses, and the images are collected.The images collected by the left and right cameras are saved respectively, and calibrated based on Matlab.Some calibration results are shown in Figure 8.After removing the image pairs with large errors in the calibration process, the average calibration error is 0.15 pixels.Camera parameters are then exported.The internal parameter matrix of camera 1 is:  UR5 manipulator is a cooperative manipulator with small shape, simple programming, flexible action and safe use.It can complete loading, unloading, sorting, assembly and other work in production.Its effective load is 5 ± 0.5 kg, the maximum working range can reach the sphere space with 850 mm radius, the motion range of each joint is (−2π, 2π), and the maximum motion speed is 180 • /s.Two kinds of cameras are used in this experiment.Fastec IL5 high-speed camera is fixed and used as a global camera for monitoring.Combined with the experimental content, it will not be introduced here.The Bumblebee2 binocular camera is fixed at the end of the manipulator to obtain the local features of the target and the depth information of the point cloud, and complete the viewpoint optimization path.

Camera Calibration and Motion Control 4.2.1. Camera Calibration
The binocular camera is connected to the end of the manipulator by a fixed clip to ensure that there is no relative motion between the manipulator and the camera during the motion.The calibration plate is fixed in front of the manipulator to calibrate the 'eye on hand'.Under the premise of ensuring that the left and right cameras can observe a complete chessboard, the manipulator is controlled to reach 25 different poses, and the images are collected.The images collected by the left and right cameras are saved respectively, and calibrated based on Matlab.Some calibration results are shown in Figure 8.After removing the image pairs with large errors in the calibration process, the average calibration error is 0.15 pixels.Camera parameters are then exported.The internal parameter matrix of camera 1 is:

Control of Manipulator
The UR5 manipulator is controlled based on the ROS system in Linux environme The system includes motion planning, 3D perception and other functional modules.T MoveIt! module for manipulator controlling can realize modeling, kinematics calcu tion, path obstacle avoidance planning, etc.The structure is shown in Figure 9.The c node is move_group, which connect to plug-ins to implement various functions.T Ubuntu version used in this experiment is 16.04, and the ROS version is Kinetic.T used toolkits are listed as follows.The ros_kinetic_ur_description contains the mo information of UR series robots.The ur5_ros_controllers are connected with the ro hardware interface to obtain the robot information data in real time, which are sen the manipulator driving part for path planning.The ROS Param Server is a parame server which is used to define setting parameters, model files and configuration fi The Robot 3D sensors can obtain the image data from the camera, which are used identify location, positioning, grasping, obstacle avoidance, etc.The User interface p vides a series of API for function customization.The ur_ros_ sensors can obtain the s sor data of the manipulator and calculate the motion state of the robot.

Control of Manipulator
The UR5 manipulator is controlled based on the ROS system in Linux environment.The system includes motion planning, 3D perception and other functional modules.The MoveIt! module for manipulator controlling can realize modeling, kinematics calculation, path obstacle avoidance planning, etc.The structure is shown in Figure 9.The core node is move_group, which connect to plug-ins to implement various functions.The Ubuntu version used in this experiment is 16.04, and the ROS version is Kinetic.The used toolkits are listed as follows.The ros_kinetic_ur_description contains the model information of UR series robots.The ur5_ros_controllers are connected with the robot hardware interface to obtain the robot information data in real time, which are sent to the manipulator driving part for path planning.The ROS Param Server is a parameter server which is used to define setting parameters, model files and configuration files.The Robot 3D sensors can obtain the image data from the camera, which are used to identify location, positioning, grasping, obstacle avoidance, etc.The User interface provides a series of API for function customization.The ur_ros_ sensors can obtain the sensor data of the manipulator and calculate the motion state of the robot.

Optimization Path Planning for Multi-View Recognition
In order to test the performance of the proposed multi-view target optimal view path planning method in barrier free environment, this section takes the medicine box as the recognition object and generates the training set and verification set.Different starting points and different numbers of view angles are set to verify.

Optimization Path Planning for Multi-View Recognition
In order to test the performance of the proposed multi-view target optimal view path planning method in barrier free environment, this section takes the medicine box as the recognition object and generates the training set and verification set.Different starting points and different numbers of view angles are set to verify.

Path Planning with Different Perspectives
The viewing angle interval is divided into 36 parts in the latitude direction and nine parts in the longitude direction.The number of divided viewing angle intervals of the target observation model is 324.The images of the medicine box are collected, the data set corresponding to the viewing angle interval Φ ij (i = 1, 2, . . ., 36; j = 1, 2, . . ., 9) is reorganized, obtaining 324 groups of data.There are 15 original images in each group of data, and any two groups of data can generate 15 × 15 image pairs, which are used for classification training and verification of basic classifiers.Set the number of basic classifiers C l = 3.The input image size is 128 × 128.The sampling block size is 5 × 5.The size of histogram block is 8 × 8.The block overlap rate is 0.5.Gaussian kernel function is selected.In adjacency matrix D i , dˆ1 = 0.3, dˆ2 = 0.6, dˆ3 = 1, ω = 0.5, ε = 1 × 10 −6 .Firstly, the starting point is set to interval Φ 11 , and the number of viewing angles k = 3, 4, 5 are taken respectively.The observed viewpoint optimization path is shown in Figure 10.The first line is the starting viewpoint.When the number of viewing angles is selected as three, the planned camera will reach the position of lines 2, 3 and 4 from the position of line 1 one by one.Similarly, increasing the number of viewing angles, the manipulator adds the planning to the position of line 5 and line 6.According to the image recognition results, when the number of viewing angles is three, the recognition accuracy has reached 94.63%.Increasing the number of viewing angles, although the recognition accuracy has increased, it has little change.When the number of viewing angles is 5, the planned path is shown in Figure 11.It can be seen that the initial path is mainly the elevation in the Z-axis direction, and then move in the X-axis and Y-axis.The target is observed in multiple directions to improve the recognition accuracy.Different starting points are selected to determine the optimal viewpoint path planning.The top views of the observation models are shown in Figures 12-14.Figure 12 shows the optimal path when the number of viewing angles is three.The starting point of the red real line track is φ11, the starting point of the green dotted line track is φ(11)2, and the starting point of the blue dotted line track is φ(25)3.

Path Planning with Different Starting Points
Different starting points are selected to determine the optimal viewpoint path planning.The top views of the observation models are shown in Figures 12-14.Figure 12 shows the optimal path when the number of viewing angles is three.The starting point of the red real line track is φ11, the starting point of the green dotted line track is φ(11)2, and the starting point of the blue dotted line track is φ(25)3.tion is about 50°, and then the trend is mainly planned along the latitude direction.When the angle of view is 3, the path planning can complete the identification of two sides and top surfaces.Increasing the viewpoint angles, the movement is mainly along the latitude direction.It can be concluded that this method can well complete the viewpoint optimization path planning under different starting points and different viewing angle constraints in the barrier free environment, and effectively complete the optimal balance between recognition performance and search speed.When the angle of view is 3, the path planning can complete the identification of two sides and top surfaces.Increasing the viewpoint angles, the movement is mainly along the latitude direction.It can be concluded that this method can well complete the viewpoint optimization path planning under different starting points and different viewing angle constraints in the barrier free environment, and effectively complete the optimal balance between recognition performance and search speed.Figure 13 is the optimal path with viewing angles = 4, Figure 14 is the optimal path with viewing angles = 5, and the color and track starting point is consistent with Figure 12.By comparing the three figures, it can be seen that the planned path can quickly reach the better viewing point.With the bottom as the benchmark, the longitude direction is about 50 • , and then the trend is mainly planned along the latitude direction.When the angle of view is 3, the path planning can complete the identification of two sides and top surfaces.Increasing the viewpoint angles, the movement is mainly along the latitude direction.It can be concluded that this method can well complete the viewpoint optimization path planning under different starting points and different viewing angle constraints in the barrier free environment, and effectively complete the optimal balance between recognition performance and search speed.
The performance of target recognition is compared with different starting viewpoints and different viewing angles.When the number of viewing angles is 3, the recognition accuracy of different starting viewpoints is shown in Table 1.The data is the identification accuracy starting from the intersection area of longitude and latitude.It can be seen that the recognition accuracy is different at different starting points.As can be seen from the average, when the longitude is 5, that is, 75 • from the horizontal bottom, the average recognition rate is the highest.Taking it as the boundary, the longitude value decreases gradually, and the recognition accuracy decreases gradually.When the longitude value is greater than 5, the higher the longitude and the higher the recognition angles, the recognition accuracy gradually decreases.Representative starting points of different regions are selected to carry out the influence of different viewing angles on the recognition effect.As shown in Figure 15, the abscissa is the interval where the starting viewpoint is located, and the ordinate is the corresponding recognition rate.The dotted line with diamond mark represents the recognition rate with 3 viewing angles, the dotted line with * mark represents the recognition rate with 4 viewing angles, and the solid line with triangle mark represents the recognition rate with 5 view angles.
fluence of different viewing angles on the recognition effect.As shown in Figure 15, the abscissa is the interval where the starting viewpoint is located, and the ordinate is the corresponding recognition rate.The dotted line with diamond mark represents the recognition rate with 3 viewing angles, the dotted line with * mark represents the recognition rate with 4 viewing angles, and the solid line with triangle mark represents the recognition rate with 5 view angles.It can be seen from the Figure 15 that the recognition accuracy has a gradual downward trend, tending to the vertex of the hemisphere of the starting point.The starting point is at the 8th and 9th layers of longitude especially, because the set directed graph model is in a single direction.The more it tends to the vertex, the less the observation surface is, and the less the extracted features are.Even if the perspective is increased, the increase in recognition effect is not obvious due to the particularity of its location.Therefore, in practical application, try to avoid a starting point close to the top position.However, on the whole, the average recognition rate is 91.51% in the case of three viewing angles, 92.25% in the case of four viewing angles and 93.13% in the case of five viewing angles.The results show that the more viewing angles, the higher the target recognition rate.This method can obtain a higher recognition rate in an obstacle-free environment.
Through the above experimental analysis, it can be concluded that the proposed directed graph perspective optimization path planning method based on the A* algorithm considering the performance and efficiency of target recognition is feasible, and can obtain more accurate recognition results in a shorter path.

Conclusions
In order to enhance the robustness of vision-based robots in natural environment, a dynamic recognition method was used to improve the recognition performance.For the common linkage robots in industry, the end-camera was driven by a manipulator to complete the target observation with high accuracy efficiently in this paper.The perspective optimization path planning of the manipulator was studied, and the path planning method for finding optimal view angles based on the A* algorithm was proposed.An observation geometry model was established and the viewing angle intervals were divided.Based on the active viewpoint selection mechanism, a multi-view classifier It can be seen from the Figure 15 that the recognition accuracy has a gradual downward trend, tending to the vertex of the hemisphere of the starting point.The starting point is at the 8th and 9th layers of longitude especially, because the set directed graph model is in a single direction.The more it tends to the vertex, the less the observation surface is, and the less the extracted features are.Even if the perspective is increased, the increase in recognition effect is not obvious due to the particularity of its location.Therefore, in practical application, try to avoid a starting point close to the top position.However, on the whole, the average recognition rate is 91.51% in the case of three viewing angles, 92.25% in the case of four viewing angles and 93.13% in the case of five viewing angles.The results show that the more viewing angles, the higher the target recognition rate.This method can obtain a higher recognition rate in an obstacle-free environment.
Through the above experimental analysis, it can be concluded that the proposed directed graph perspective optimization path planning method based on the A* algorithm considering the performance and efficiency of target recognition is feasible, and can obtain more accurate recognition results in a shorter path.

Conclusions
In order to enhance the robustness of vision-based robots in natural environment, a dynamic recognition method was used to improve the recognition performance.For the common linkage robots in industry, the end-camera was driven by a manipulator to complete the target observation with high accuracy efficiently in this paper.The perspective optimization path planning of the manipulator was studied, and the path planning method for finding optimal view angles based on the A* algorithm was proposed.An observation geometry model was established and the viewing angle intervals were divided.Based on the active viewpoint selection mechanism, a multi-view classifier based on BM2DKPCANet and Softmax was constructed.The output class probability was set as the viewpoint measure.The best identification performance and the shortest distances were taken as the objective functions of path planning.The directed graph programming model was constructed, and the directed graph adjacency matrix based on the planning distance was determined.The multi-objective optimization problem was transformed into the singleobjective optimization problem by fusing the two adjacent matrices.The shortest path was planned based on the A* algorithm.The experiments were performed with different view count and different starting viewpoints.The results demonstrated that the method could plan the optimal path with higher recognition accuracy and efficiency.
where α m is the weight of the m th perspective of some fruit box, m = 1, ..., M. Different perspectives have different weights.r > 1 is the scalar controlling the weight.B = [b 1 , . . . ,b n ], b i is the collaborative binary code for the ith instance.Each code b is represented by the product of clustering centroids C and indicator vector g.U m is the mapping matrix under the m th perspective.Φ(O l m ) is the kernel function based on nonlinear RBF mapping of the output features after the second layer KPCA and randomly selected sample points from the m th perspective.γ is a non-negative constant.G = [g 1 , .

Algorithms 2022 ,
15,  x FOR PEER REVIEW 10 of 28 formance.For more accurate identification, while the design of the classifier is further improved, combined with integrated learning theory, the single classifier is effectively combined, as shown in Figure3for the basic structure of BM2DKPCANet network-integrated multi-view classification.xi is the original image acquired by the camera, and the imaging viewpoint of the pair of factors is P(xi) = (Rp, θp, λp).The multi-viewpoint image is decomposed into image pairs and input to the two-viewpoint base classifier for classification and recognition, and then all the reco_gnition results are fused using an integrated learning method.

Figure 3 .
Figure 3. Multi-view classifier.The structure of the two-view classifier network is shown in Figure 4. To better describe the recognition performance optimization function, the Softmax classifier is chosen to calculate the posterior probability to classify the image attributes.

Figure 3 .
Figure 3. Multi-view classifier.The structure of the two-view classifier network is shown in Figure4.To better describe the recognition performance optimization function, the Softmax classifier is chosen to calculate the posterior probability to classify the image attributes.

.
are the actual velocity, acceleration and pulse of the sth joint of the manipulator at
Equation (23).Aij of the other cases is the 0 matrix f Mlo × Mlo.

. d = 1 ,
2..., Cl.When j = i + .d ≤ M la , the specific elements of matrix A ij = A i (i+ .d) is shown in Equation (22).When i ≥ M la − .d + 1 and j = i + .d − M la , the specific elements of matrix A ij = A i(i+ .d−M la ) are shown in Equation (23).A ij of the other cases is the 0 matrix f M lo × M lo .
When i ≥ M la − .d + 1 and j = i + .d − M la , specific elements of matrix D i(i+ .d−M la ) are listed in Equation (27).The difference of two non-0 elements in each row is .d columns.D ij of the other cases are M lo × M lo 0 matrix.

Figure 6 .
Figure 6.Path planning flow chart based on A* algorithm.

4. 3 . 1 .
Path Planning with Different PerspectivesThe viewing angle interval is divided into 36 parts in the latitude direction and nine parts in the longitude direction.The number of divided viewing angle intervals of the target observation model is 324.The images of the medicine box are collected, the data set corresponding to the viewing angle interval φij(i = 1, 2, …, 36; j = 1, 2, …, 9) is reorganized, obtaining 324 groups of data.There are 15 original images in each group of data, and any two groups of data can generate 15 × 15 image pairs, which are used for classification training and verification of basic classifiers.Set the number of basic classifiers Cl = 3.The input image size is 128 × 128.The sampling block size is 5 × 5.The size of histogram block is 8 × 8.The block overlap rate is 0.5.Gaussian kernel function is selected.In adjacency matrix Di, d^1 = 0.3, d^2 = 0.6, d^3 = 1, ω = 0.5, ε = 1 × 10 −6 .Firstly, the starting point is set to interval φ11, and the number of viewing angles k = 3,4,5 are taken respectively.The observed viewpoint optimization path is shown in Figure10.

Figure 10
Figure 10 lists the top view, front view and camera recognition view that can characterize the position relationship with the target object at different viewpoints in path planning.The first line is the starting viewpoint.When the number of viewing angles is selected as three, the planned camera will reach the position of lines 2, 3 and 4 from the position of line 1 one by one.Similarly, increasing the number of viewing angles, the

Figure 10
Figure 10 lists the top view, front view and camera recognition view that can characterize the position relationship with the target object at different viewpoints in path planning.The first line is the starting viewpoint.When the number of viewing angles is selected as three, the planned camera will reach the position of lines 2, 3 and 4 from the position of line

Algorithms 2022 ,
15,  x FOR PEER REVIEW 22 of 28 manipulator adds the planning to the position of line 5 and line 6.According to the image recognition results, when the number of viewing angles is three, the recognition accuracy has reached 94.63%.Increasing the number of viewing angles, although the recognition accuracy has increased, it has little change.When the number of viewing angles is 5, the planned path is shown in Figure11.It can be seen that the initial path is mainly the elevation in the Z-axis direction, and then move in the X-axis and Y-axis.The target is observed in multiple directions to improve the recognition accuracy.

Figure 11 .
Figure 11.End three-dimensional path map.4.3.2.Path Planning with Different Starting PointsDifferent starting points are selected to determine the optimal viewpoint path planning.The top views of the observation models are shown in Figures12-14.Figure12shows the optimal path when the number of viewing angles is three.The starting point of the red real line track is Φ 11 , the starting point of the green dotted line track is Φ (11)2 , and the starting point of the blue dotted line track is Φ (25)3 .

Figure 15 .
Figure 15.Recognition performance with different start points and view angles.

Figure 15 .
Figure 15.Recognition performance with different start points and view angles.
iM lo ]

Table 1 .
Recognition with different starting points with 3 viewing angles.