Automatic Positioning of Street Objects Based on Self-Adaptive Constrained Line of Bearing from Street-View Images

Abstract: In order to realize the management of various street objects in smart cities and smart transportation, it is very important to determine their geolocation. Current positioning methods for street-view images based on mobile mapping systems (MMSs) mainly rely on depth data or image feature matching. However, auxiliary data increase the cost of data acquisition, and image features are difficult to apply to MMS data with low overlap. A positioning method based on a threshold-constrained line of bearing (LOB) overcomes the above problems, but threshold selection depends on the specific data and scene and is not universal. In this paper, we introduce a divide-and-conquer strategy into the LOB-based positioning method. The area to be calculated is adaptively divided according to the driving trajectory of the MMS, which constrains the effective range of the LOB and reduces unnecessary calculation cost. This method achieves reasonable screening of the positioning results within range without introducing other auxiliary data, which improves computing efficiency and geographic positioning accuracy. Yincun town, Changzhou City, China, was used as the experimental area, and pole-like objects were used as research objects to test the proposed method. The results show that the 6104 pole-like objects obtained through deep-learning-based object detection are mapped as LOBs, and high-precision geographic positioning of pole-like objects is realized through region division and self-adaptive constraints (recall rate, 93%; accuracy rate, 96%). Compared with existing LOB-based positioning methods, the positioning accuracy of the proposed method is higher, and the threshold value is self-adaptive to various road scenes.


Introduction
Roads are important parts of cities. Street objects on both sides of the road are an important part of urban infrastructure management, intelligent transportation system construction, and high-precision maps for automated driving [1,2]. Achieving fast and accurate collection of street objects has become an important task for the digital construction of cities and traffic, as well as for the realization of automatic driving. The geolocation and attributes of street objects are important indicators for collection, especially location information, which is one of the most fundamental attributes of a street object. The location of street objects can assist in road asset management [3] and in calculating the safety risk index used to evaluate road safety [4]. The presence of appropriate street objects in the right places can effectively reduce traffic risks, for example, by placing easy-to-read guide signs on curves [5]. Therefore, it is very important to perform effective geolocation and attribute acquisition of street objects. In order to obtain the geolocation and attribute information of the street objects on both sides of a road, an effective data collection and extraction method is required. Current data collection methods include manual field measurement, aerial remote sensing images, and mobile mapping systems (MMSs). Manual field measurement requires a large number of professionals to conduct external measurements, and the labor cost is relatively high. Aerial remote sensing images observe the road surface from a top-down perspective, which is well suited to collecting large-area objects, such as road markings [6]. However, the orthographic projection area of narrow and vertical objects on both sides of the road is small; such objects are therefore difficult to capture, lack local details [7], and are affected by high-rise buildings and trees on both sides of the road.
MMSs can observe street objects on both sides of the road from a side view; the observation results are more in line with the visual psychology of people observing the objects and are more easily extracted [8]. MMS-based measurement relies on the system being equipped with a high-precision global navigation satellite system (GNSS) for positioning, a high-frequency inertial measurement unit (IMU) for attitude determination, a high-resolution camera for street-view image shooting, and a high-speed laser for distance measurement of street objects [9]. As a measurement device, LIDAR can extract the precise location of street objects from the point cloud data obtained by scanning, although the cost is high [10,11]. Compared with images, semantic segmentation of point clouds is more challenging, especially for complex scenes, where the technology is immature. Therefore, based on a camera mounted on the system, with the help of currently relatively mature image semantic analysis [12] and object-detection methods [13], the localization of street objects based on multi-view street-view images has become a lower-cost alternative [14,15].
The traditional object-positioning method based on multi-view street-view images acquired by MMS relies on visual matching. Corresponding points are matched according to image features, and positioning is realized by the geometric constraints generated by the corresponding points. Chang et al. demonstrated the feasibility of object positioning based on multi-view street-view images by manually matching corresponding points [16]. Nassar takes the camera location, camera spacing, and the heading angle of the target obtained by the MMS as input parameters, applies geometric soft constraints to a Siamese convolutional neural network, and relies on matching objects across multiple views to achieve triangulation positioning [17]. Ogawa proposed using a map and the locations of buildings in images to correct the camera-position parameters of the captured image, thereby improving the object-recognition accuracy and geographic positioning accuracy of the image [18]. Zhu et al. performed street-to-aerial image matching based on an improved Siamese convolutional neural network to estimate the geolocation and orientation of targets in street-view images [19]. When attempting to automatically match corresponding points, it is difficult to distinguish similar objects against the same background due to the similarity of objects and backgrounds in multi-view images [20]. Multi-view images acquired by MMS have little overlap, and it is difficult to achieve satisfactory visual matching using image key points or descriptors [21,22]. In MMS images, different instances of the same object type share similar visual features, while the same instance can look different from different perspectives, which makes multi-view visual matching difficult.
In order to resolve the difficulty of visual matching of multi-view images acquired by MMS, scholars have tried to transform the visual matching positioning problem into a passive positioning problem [23][24][25]. Firstly, the object of interest is detected from the multi-view images, and then the orientation of the object of interest relative to the shooting position is calculated according to the pose parameters. The orientation is then represented using a line of bearing (LOB), and finally, the possible location of the object of interest is calculated by LOB intersection. Chu et al. proposed a deep-learning-based method for determining the orientation of objects in images [26]. This method provides a new form of LOB orientation acquisition and correction, as well as an improvement in the positioning accuracy of the LOB-based method. Hazelhoff et al. used the intersection and aggregation centers of object LOBs in multiple views as the positioning result, although the result may contain a large number of ghost nodes [27,28]. Based on a Markov random field optimization method, Krylov et al. introduced object distance data into the decision making to reduce ghost nodes, although the object-positioning results often had considerable randomness [21]. Nassar integrates object detection and depth estimation into an end-to-end graph neural network and relies on the estimated depth information to determine the operating distance of the LOB generated by the object-detection result to achieve geographic positioning [20]. Similarly, Lumnitz et al. applied a monocular depth-estimation algorithm and triangulation to Google Street View and Mapillary imagery and gathered adjacent LOB intersections into clusters to realize meter-level geographic positioning of urban trees [29]. These methods all introduce depth information, and their positioning accuracy depends on the accuracy of that depth information.
Zhang proposed a method of modified brute-force-based LOB measurement, which reduces the influence of ghost nodes on the positioning results of utility poles without introducing other data and can obtain stable positioning results [30]. Khan realized the geographic positioning of eucalyptus trees on both sides of a road based on the modified brute-force-based LOB measurement, verifying the feasibility of this method once again [31]. However, most current LOB-based positioning methods require a large number of threshold constraints, and it is difficult to select appropriate threshold values. For example, the modified brute-force-based LOB measurement needs to consider the width of the road and other factors, which makes it difficult to apply to object extraction along roads covering a wide range of different widths.
To address the applicability of threshold selection across different road scenes in LOB-based object positioning, in this paper, we propose an automatic positioning method for street objects based on LOB with self-adaptive constraints. This method automatically divides the calculation area into grids based on the calibrated effective collection distance and the driving trajectory, which constrains the effective range of the LOB intersection calculation. According to the relationship between the LOBs and the intersection points, two threshold-independent constrained rules are proposed to further eliminate the ghost nodes generated by LOB intersection. The retained calculation results are the automatic positioning results of the street objects. The algorithm presented in this paper is not affected by factors such as road width, and it is suitable for different road scenes. The adaptive selection of the LOB constraint threshold is realized through the effective shooting distance of the MMS images and the driving trajectory, which gives the method universal applicability and generalizability.
The remainder of this paper is arranged as follows. Section 2 introduces the implementation of the proposed method and expounds the basic principle. Section 3 introduces Yincun town, Changzhou City, Jiangsu Province, China, as the experimental area, with street-view images of the area collected by MMS as the experimental data, and we introduce the experimental process and results in detail. Then, we compare and analyze the automatic positioning method of street objects based on the self-adaptive constraint LOB proposed in this paper and the modified brute-force-based LOB positioning method. Different thresholds are illustrated and discussed. Finally, the full text is summarized.

Methodology
In this paper, we propose an automatic positioning method for objects in street-view images with self-adaptive constrained LOB, which mainly includes two parts: LOB mapping based on object-detection results and LOB-based geographic positioning. The specific process is shown in Figure 1. LOB mapping based on object-detection results includes object detection marked by a bounding box and simulation of the line of sight based on LOBs. LOB-based geographic positioning includes grid division based on the driving trajectory, acquisition of intersection points based on LOBs, and elimination of ghost nodes by constrained rules. In the LOB-based geographic positioning process, the grid division based on the driving trajectory proposed in this paper improves computational efficiency. It also provides self-adaptive constraints for eliminating ghost nodes. It works with other constraints to improve the positioning accuracy of street objects.

LOB Mapping Based on Object-Detection Results
In order to realize image-based object positioning, it is necessary to detect the object in the image. There are relatively mature image object-detection algorithms that have achieved good detection results in street object detection [11,32,33]. In this paper, a cascade region convolutional neural network (cascade R-CNN) is used to realize object detection. As shown in Figure 2, the model extracts the features of the input image through backbone convolutions. It uses the region proposal network (RPN) to obtain a series of rough rectangular proposals of the object. A series of end-to-end subdetectors is cascaded, and the bounding boxes output by the previous stage detector are input into the latter stage detector. This gradually increases the threshold of intersection over union (IoU) between the candidate bound and ground-truth bound to improve detection results [34]. Compared with other R-CNN series models, cascade R-CNN introduces detection subnetworks with different frameworks. It overcomes the overfitting of regression at a specific IoU threshold and can achieve relatively good recognition accuracy [34]. The objects to be detected are marked in the images with bounding boxes, and cascade R-CNN is used for training and learning so that the same type of objects in other images can be automatically marked with bounding boxes.

Figure 2. Cascade R-CNN schematic: "Input" represents input image, "Conv" represents backbone convolutions, "RPN" represents region proposal network, "Pool" represents region-wise feature extraction, "H" represents network head, "C" represents classification, and "B" represents bounding box.

As shown in Figure 3, the image space coordinates (x_c, y_c, z_c) of the center pixel of the box are obtained according to the bounding box of the recognized object in the image and the imaging law [8,35]. Combined with the spatial coordinates recorded by GNSS and the Euler angles recorded by the IMU, the absolute location and attitude of the vehicle are obtained, the mapping relationship between the bounding box and the observation orientation is calculated, and the LOB is constructed [30,31,36]. The projection of the pixel in the world coordinate system (x_w, y_w, z_w) can be obtained through Equation (1), where s represents the depth coefficient, which is offset in subsequent calculations; R represents the rotation matrix from the image-space coordinate system, estimated from the vehicle attitude parameters, to the world coordinate system; and (x_cam, y_cam, z_cam) represents the camera world coordinates calculated from the GNSS parameters and calibration parameters. The orientation, b, corresponding to the pixel can be expressed by Equation (2), and the LOB, represented by l, is given by Equation (3).
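Equations (1)–(3) did not survive extraction. Under a standard back-projection model, and using the symbols defined in the surrounding text, a plausible reconstruction (the exact form in the original paper may differ) is:

```latex
% Eq. (1): projection of the bounding-box center pixel into world coordinates
\begin{pmatrix} x_w \\ y_w \\ z_w \end{pmatrix}
  = s\,R \begin{pmatrix} x_c \\ y_c \\ z_c \end{pmatrix}
  + \begin{pmatrix} x_{cam} \\ y_{cam} \\ z_{cam} \end{pmatrix}

% Eq. (2): horizontal bearing from the camera to the projected pixel
% (the depth coefficient s cancels in the ratio)
b = \arctan \frac{x_w - x_{cam}}{y_w - y_{cam}}

% Eq. (3): the LOB as the horizontal ray from the camera position along b
l = \left\{ \left( x_{cam} + d \sin b,\; y_{cam} + d \cos b \right) \;\middle|\; d \ge 0 \right\}
```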

Geographic Positioning Based on Self-Adaptive Constrained LOB
After obtaining the LOB of the detected object mapping in the street-view image, association matching of the detection object in the multi-view images is realized by the spatial aggregation of the LOB. However, a large number of false associations may be generated, forming ghost nodes. To reduce the influence of ghost nodes, the grid division range is first automatically calculated according to the driving trajectory, the effective distance of the LOB is limited, and the ghost nodes are preliminarily eliminated. Then, the intersection of LOBs in each grid is expressed by a relation matrix, and ghost nodes are further eliminated based on the proposed self-adaptive constrained rules. Next, the process of the LOB-based positioning method is described in detail.

LOB Measurement
As shown in Figure 4, when the same object is captured by multi-view images, the bounding box of the object is mapped to an LOB, which will generate a geometric intersection, that is, the LOB intersection. However, due to the existence of observation errors, these intersections often do not overlap completely in space but are aggregated into a cluster within a certain range. The centroid of the intersections within this cluster represents the geolocation of the street object.

As shown in Figure 5, the bounding boxes detected from different images in complex scenes are not all the same object. The intersection points can be generated between any two non-parallel LOBs, which includes the real geolocation of the identified object and also contains a large number of ghost nodes. Therefore, certain constrained rules are required to eliminate ghost nodes.


Division of Grid
Due to the large range of the driving trajectory, each trajectory point within the range can generate several LOBs, and only the LOB intersection generated by the trajectory points with a short distance can determine the geolocation of the object. The range of the driving trajectory is divided into grids, and LOB intersection calculation is only performed in the adjacent grid each time, which indirectly restricts the effective length of the LOB, potentially reducing unnecessary calculations and removing the ghost nodes outside the effective range of the LOB.
As shown in Figure 6, a driving trajectory including n records is recorded as {T_i(x_i, y_i, z_i) | i = 1, 2, ..., n}, and the average baseline length of triangulations of adjacent views can be estimated by Equation (4), denoted by bl.
According to the two-dimensional range of the driving trajectory (min_x, min_y, max_x, max_y), it is evenly divided into several square grids, taking k times the distance bl as the size of the unit grid. The number of columns, n_col, and the number of rows, n_row, can be obtained by Equation (5), where Ceil represents the rounding-up function.
In each calculation process, the intersection points are calculated only for the LOBs mapped by the captured images within the range of Equation (6), denoted by Grid_calculation, and only the geographic positioning results within the range of Equation (7) are recorded, denoted by Grid_recorded.
where col represents the column number of the current Grid_recorded, and the value is an integer within the range of [0, n_col); and row represents the row number of the current Grid_recorded, which is an integer within the range of [0, n_row).
During grid division, the threshold, k, is used to ensure that at least two trajectory points (k > 1) for mapping LOBs are included in the grid range to be calculated. The grid is a regular square. Because only the results within the central grid are recorded each time, the maximum effective intersection distance of the LOB is 2√2·k·bl. However, objects at a long distance tend to occupy a small number of pixels in the image and have poor positioning accuracy. Therefore, the effective shooting distance, V, of the equipment can be estimated according to the equipment conditions. As shown in Equation (8), according to 2√2·k·bl ≤ V, the value range of the threshold, k, is 1 < k ≤ V/(2√2·bl).
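The grid-division step can be sketched as follows (a minimal sketch; the function and variable names are my own, `trajectory` is assumed to be a list of planar (x, y) trajectory points, and `V` the calibrated effective shooting distance):

```python
import math

def divide_grid(trajectory, V):
    """Divide the trajectory's bounding box into square grids of side k*bl.

    bl is the average baseline length between adjacent trajectory points
    (Eq. (4)); k is chosen self-adaptively from 1 < k <= V / (2*sqrt(2)*bl)
    (Eq. (8)). Returns the unit-grid size and the column/row counts (Eq. (5)).
    """
    xs = [p[0] for p in trajectory]
    ys = [p[1] for p in trajectory]
    # Eq. (4): average baseline length of adjacent views
    bl = sum(math.dist(a, b) for a, b in zip(trajectory, trajectory[1:])) / (len(trajectory) - 1)
    # Eq. (8): largest admissible k, so the max intersection distance 2*sqrt(2)*k*bl <= V
    k = V / (2 * math.sqrt(2) * bl)
    assert k > 1, "effective shooting distance too short for this baseline"
    cell = k * bl
    # Eq. (5): number of columns and rows (Ceil = rounding-up function)
    n_col = math.ceil((max(xs) - min(xs)) / cell)
    n_row = math.ceil((max(ys) - min(ys)) / cell)
    return cell, n_col, n_row
```

For example, a straight 100 m trajectory sampled every 5 m with V = 50 m gives bl = 5 m and a self-adaptive k ≈ 3.54.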


Relationship Matrix Construction
A set, L, is used to represent n LOBs, where the i-th LOB is represented by l_i: L = {l_1, l_2, ..., l_i, ..., l_j, ..., l_n}. Then, the intersection points generated by the intersection of LOBs in the set, L, can be represented by an n × n intersection matrix, M_P, in which the element p_{i×j} represents the intersection of l_i and l_j. As shown in Figure 7a, Object1, Object2, and Object3 are observed at four observation points: a, b, c, and d, respectively. In order to facilitate understanding, a combination of letters and numbers is used to record the observation position and object of the LOB. For example, l_a1 is the LOB of Object1 observed from position a. We use three different colored LOBs to simulate the line of sight when observing three different objects. The object location is the aggregation location of the LOB intersections of the same color, and LOB intersections of different colors are ghost nodes. As shown in Figure 7b, the intersection matrix is used to describe the intersection relationship between LOBs. Because the matrix is symmetric, it is only necessary to record the upper triangular matrix. Two LOBs that do not have an intersection relationship are recorded as "-".
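The upper-triangular intersection matrix can be sketched as below. Here an LOB is modeled as a 2-D ray with an origin o (the camera position) and a direction d; the names and the ray representation are illustrative assumptions, not the paper's exact formulation:

```python
def ray_intersection(o1, d1, o2, d2, eps=1e-9):
    """Intersection point of two 2-D rays, or None if parallel or behind an origin."""
    # Solve o1 + t1*d1 = o2 + t2*d2 using the 2x2 cross-product form.
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < eps:          # parallel LOBs never intersect
        return None
    dx, dy = o2[0] - o1[0], o2[1] - o1[1]
    t1 = (dx * d2[1] - dy * d2[0]) / denom
    t2 = (dx * d1[1] - dy * d1[0]) / denom
    if t1 < 0 or t2 < 0:          # must lie forward along both lines of sight
        return None
    return (o1[0] + t1 * d1[0], o1[1] + t1 * d1[1])

def intersection_matrix(lobs):
    """Upper-triangular n x n matrix M with M[i][j] = p_{i x j} ('-' if none)."""
    n = len(lobs)
    M = [["-"] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = ray_intersection(lobs[i][0], lobs[i][1], lobs[j][0], lobs[j][1])
            if p is not None:
                M[i][j] = p
    return M
```

Only the upper triangle is filled, matching Figure 7b, and non-intersecting pairs keep the "-" marker.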

Elimination of Ghost Nodes Based on Constrained Rules
In this study, we introduce two constrained rules without setting dynamic thresholds. By recursively executing the constrained rules until the number of clusters no longer changes, the effective elimination of ghost nodes is achieved as far as possible, which reduces the impact of ghost nodes on the positioning results.
As shown in Figure 7a, the intersection points, p_i, are traversed. If two intersection points are close to each other (the threshold is set to t), they are classified into a cluster, recorded as c_k, and the centroid of the cluster is recorded as O(c_k). If there are no other intersections nearby, the point is recorded as a cluster alone. In Figure 7b, the matrix elements in the same cluster are marked with the same color, and each cluster can be expressed as c_k = {p_i | dist(p_i, O(c_k)) ≤ t}, where dist represents the function used to calculate the distance between two points.
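The clustering of intersections described above can be sketched as a simple single-pass agglomeration (names are illustrative; a production version would likely use a spatial index rather than this quadratic scan):

```python
import math

def cluster_intersections(points, t):
    """Group intersection points closer than t to a cluster centroid O(c_k)."""
    clusters = []
    for p in points:
        placed = False
        for c in clusters:
            # join the first cluster whose centroid lies within the threshold t
            cx = sum(q[0] for q in c) / len(c)
            cy = sum(q[1] for q in c) / len(c)
            if math.dist(p, (cx, cy)) <= t:
                c.append(p)
                placed = True
                break
        if not placed:
            clusters.append([p])   # an isolated point forms a cluster alone
    centroids = [(sum(q[0] for q in c) / len(c),
                  sum(q[1] for q in c) / len(c)) for c in clusters]
    return clusters, centroids
```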

1. Constrained rules based on the minimum number of intersections in the cluster

When the number of LOBs of the observed object is greater than 2, the number of intersections in the cluster should be greater than 1 [27,28,30,31]. The number of intersections contained in each cluster is counted. If there is only one intersection, the intersection contained in the cluster is determined to be a ghost node. As shown in Figure 8a, all clusters (marked with "×") that are determined to be ghost nodes are deleted, and the intersections within these clusters are disassociated from their LOBs in the intersection matrix, as shown in Figure 8b. At this point, most of the ghost nodes have been eliminated, and only the clusters (marked with "?") whose candidate points are to be further judged are retained.

2. Constrained rules based on the uniqueness of LOB association

Each LOB is a line-of-sight simulation of the observation object. If an LOB is associated with intersections within only one cluster, that cluster must be the object location. The LOBs associated with this cluster should then be disassociated from other clusters to ensure the uniqueness of the LOB association.
For example, as shown in Figure 9, la1 has an intersecting relationship only with the intersection points in one cluster, so that cluster must be the object location, that is, Object1.
Other LOBs associated with Object1 should also be uniquely associated with Object1. In addition to being associated with Object1, ld1 is also associated with other clusters; therefore, it is necessary to remove the associations between ld1 and the intersections in those other clusters. At this point, the number of intersections in the red cluster is 1, so it will be eliminated as a ghost node in the next iteration.
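The interaction of the two constrained rules above can be sketched as an iteration over the intersection matrix until no further change occurs. The data model below (clusters as sets of intersection ids, each intersection recording the two LOBs whose crossing produced it) is our illustration in Python, not the authors' C# implementation:

```python
def eliminate_ghost_nodes(clusters, intersection_lobs):
    """clusters: {cluster_id: set(intersection_id)}
    intersection_lobs: {intersection_id: set(lob_id)} -- the two LOBs
    whose crossing produced each intersection.
    Returns the set of confirmed object clusters; ghost-node clusters
    are deleted from `clusters` in place."""

    def clusters_of(lob):
        # All clusters containing an intersection produced by this LOB.
        return {c for c, pts in clusters.items()
                if any(lob in intersection_lobs[i] for i in pts)}

    confirmed, changed = set(), True
    while changed:
        changed = False
        # Rule 1: a cluster left with a single intersection is a ghost node.
        for cid in [c for c, pts in clusters.items()
                    if len(pts) <= 1 and c not in confirmed]:
            del clusters[cid]
            changed = True
        # Rule 2: an LOB associated with exactly one cluster confirms it;
        # the LOBs of that cluster are then disassociated from all others.
        lobs = {l for ls in intersection_lobs.values() for l in ls}
        for lob in lobs:
            cs = clusters_of(lob)
            if len(cs) != 1:
                continue
            winner = next(iter(cs))
            if winner in confirmed:
                continue
            confirmed.add(winner)
            winner_lobs = {l for i in clusters[winner]
                           for l in intersection_lobs[i]}
            for c, pts in clusters.items():
                if c == winner:
                    continue
                drop = {i for i in pts if intersection_lobs[i] & winner_lobs}
                if drop:
                    pts.difference_update(drop)
                    changed = True
    return confirmed
```

In a Figure 9-like configuration, disassociating ld1 from a spurious cluster shrinks that cluster until Rule 1 removes it on the next iteration, exactly as described above.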

Data Collection and Selection of Research Area
In this study, we used the data collected by the Alpha 3D vehicle-mounted laser-scanning measurement system produced by CHC NAVIGATION for method validation. The system is equipped with a Ladybug panoramic camera, GNSS, IMU, and LIDAR. The original data obtained by the system are a series of binary stream data that cannot be directly used by users, so a series of preprocessing operations is required. Through the CoPre software developed by CHC NAVIGATION, the image stream data output by the Ladybug panoramic camera are read and stitched to form a panoramic image with a 360° viewing angle, stored in a general picture format with 8192 × 4096-pixel resolution. Through the Inertial Explorer software, the IMU and GNSS data are jointly processed to obtain the high-precision driving trajectory (latitude and longitude coordinates, regional projection coordinates, and elevation), speed, attitude (roll, pitch, and heading), and other information in the specified coordinate system, which are output one-by-one to form a structured, readable text. The output trajectory data have a horizontal accuracy of 0.010 m and a vertical accuracy of 0.020 m. For the acquired attitude data, the roll/pitch accuracy is 0.005° and the heading accuracy is 0.017° [37]. The high-resolution imagery provides geometric texture and semantic information of street-side objects for the experiments, and the high-precision position and attitude data provide sufficient measurement precision to support method verification.
As shown in Figure 10, the Alpha 3D vehicle-mounted laser-scanning measurement system was used to collect data from the Yanziji and Mufushan areas of Nanjing City, Jiangsu Province, China, and the Yincun Town area of Changzhou City, Jiangsu Province, China. Nanjing City and Changzhou City are located in the same province of China, and the Yincun area is only about 120 km from the Mufushan and Yanziji areas, with similar street-layout styles.

Object Detection and LOB Mapping
In this study, 6367 street-view images collected from the Yanziji area and 6920 street-view images collected from the Mufushan area, with a resolution of 8192 × 4096 pixels, were used as annotation data. Three classifications of pole-like objects (utility poles, street lamps, and signboards), which are widely and numerously distributed, were used as acquisition objects, and a workstation equipped with an Intel Xeon E5-2698 V4 CPU and a Tesla V100 GPU was used for training.
As shown in Figure 11, because most of the labeled data are distributed in the vertical center area of the image, where the image distortion is relatively small, for the convenience of training, the original images were cut into 53,148 subimages of 2048 × 2048 pixels that only contain the middle area for object labeling. These pole-like objects were divided into rod parts and top parts for labeling, avoiding overlapping of bounding boxes as much as possible. The labeling results include 48,162 rod parts, 7435 top parts of utility poles, 26,751 top parts of street lamps, and 5695 top parts of signboards. Taking this as the sample data, the samples were randomly divided into a training set and a test set according to a 7:3 ratio and put into the Cascade R-CNN classifier [34] for training. The average precision of training was 0.880, and the recall rate was 0.929 (IoU > 0.5).

A total of 4892 street-view images collected in Yincun Town were used for object detection. Because the location of the rod part is used as the object geolocation during positioning, the rod parts, top parts of utility poles, top parts of street lamps, and top parts of signboards were detected separately in the detection results. The classification of the pole-like object to which each rod part belongs is given by the classification of the closest top part. This method relies on the orientation of the rod parts to map LOBs, which provides more accurate orientation parameters for the subsequent geographic positioning of the pole-like objects. In order to reduce the impact of recognition errors on the subsequent matching process, the classification results were manually checked, and a total of 6104 LOBs of pole-like objects were mapped, including 3325 utility poles, 1814 street lamps, and 965 signboards.
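The nearest-top-part assignment described above can be sketched as follows. This is a hypothetical Python illustration, assuming detections are reduced to (x, y) box centers; it is not the authors' implementation:

```python
import math

def classify_rods(rod_centers, top_parts):
    """Assign each rod part the classification of the closest top part.
    rod_centers: list of (x, y) box centers of detected rod parts.
    top_parts: list of ((x, y), class_name) for detected top parts."""
    labels = []
    for rx, ry in rod_centers:
        # Closest top part by Euclidean distance in image coordinates.
        nearest = min(top_parts,
                      key=lambda t: math.hypot(t[0][0] - rx, t[0][1] - ry))
        labels.append(nearest[1])
    return labels
```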

Geographic Positioning Based on Self-Adaptive Constrained LOB
The region division and geographic positioning algorithms were programmed in C# and run under the Windows 10 operating system on a personal computer with an Intel Core i7-7700 CPU and 8 GB of RAM.
The calculation results of the proposed method were compared with the 1409 pole-like objects collected manually in this area, and the calculation results were evaluated using three indicators: time consumption, recall rate, and precision rate. The closer the recall rate and precision rate are to 1, the better the positioning effect of the algorithm; the closer the time consumption is to 0, the more efficient the algorithm. The time consumption is based on the actual running time (the calculation time of the LOB-based geographic positioning). The recall rate and precision rate were calculated using Equation (12):

Recall = N_rc / N_rf, Precision = N_co / N_cal, (12)

where N_rf represents the number of reference points, N_cal represents the number of calculated results, N_rc represents the number of reference points within a 1 m buffer range of all calculation results, and N_co represents the number of calculation results within the 1 m buffer range of the reference points.

The LOB-based positioning method proposed in this study is affected by two thresholds: the enlargement coefficient, k, for dividing the grid, and the distance, t, for aggregating the cluster points. In order to study the influence of the threshold parameters on the method, grid division is carried out according to the threshold parameter, and the LOB intersection and ghost-node elimination are performed grid-by-grid and classification-by-classification. According to the driving trajectory corresponding to the 4892 street-view images, the average baseline length of triangulation between adjacent views is calculated to be 7.21 m. The device can effectively capture pole-like objects within 100 m, so the effective shooting distance, V, is set to 100 m. According to Equation (8), the value range of k is calculated as (1, 4.9]. For the convenience of calculation, an integer near this range is taken as the value of the threshold, k.
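The buffer-based evaluation of Equation (12) amounts to counting points that have a counterpart within 1 m on the other side. A minimal Python sketch (planar coordinates in metres are assumed; this is an illustration, not the evaluation code used in the paper):

```python
import math

def evaluate(references, results, buffer_m=1.0):
    """Recall = N_rc / N_rf, Precision = N_co / N_cal, using a 1 m buffer.
    references, results: lists of (x, y) positions in metres."""
    def near(p, points):
        # True if any point lies within the buffer distance of p.
        return any(math.dist(p, q) <= buffer_m for q in points)

    n_rc = sum(near(r, results) for r in references)   # matched references
    n_co = sum(near(c, references) for c in results)   # matched results
    return n_rc / len(references), n_co / len(results)
```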
Taking into account the error of the recording parameters during acquisition, the distortion of street-view images, and the influence of the curvature of the earth on geodetic triangulation, the clustering distance, t, is set to range from 0.1 m to 1 m, sampled every 0.1 m. Within the estimated 100 m maximum effective range of the device, k is set to 4, making the maximum line-of-sight range 81.6 m, and t is set to 0.2 m according to the output accuracy of the device. The calculation results obtained with this combination of thresholds are shown in Figure 12. The results prove the effectiveness of the proposed method for the positioning of large-scale pole-like objects. The positioning results have high accuracy and are associated with the corresponding object images, which can be imported into the database as the final result.
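Equation (8) itself is not reproduced in this excerpt, but the reported figures are mutually consistent if the maximum line-of-sight range is taken as the diagonal of the k-enlarged grid, i.e. 2√2·k·b for average baseline b. That reading of Equation (8) is an assumption on our part; the arithmetic below only checks the consistency of the quoted numbers:

```python
import math

b = 7.21    # average triangulation baseline between adjacent views (m)
V = 100.0   # effective shooting distance of the device (m)

# Assumed form of Equation (8): max line of sight = 2*sqrt(2)*k*b <= V
k_max = V / (2 * math.sqrt(2) * b)
print(round(k_max, 1))                      # 4.9, matching the (1, 4.9] range
print(round(2 * math.sqrt(2) * 4 * b, 1))   # 81.6 m line of sight for k = 4
```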

Comparative Analysis and Discussions with Existing Methods
In order to achieve a comparative analysis between the method proposed in this study and existing methods, we reproduced the modified brute-force-based LOB algorithm proposed by Zhang [30,31]. The algorithm is affected by three thresholds: the number of views, the angle, and the distance to the center of the selected road. The thresholds were set according to the parameters provided in the original paper and the actual characteristics of the data presented in this paper. The evaluation results are as follows.
With the modified brute-force-based LOB algorithm, there is some uncertainty in the selection of the threshold values. Multiple attempts are needed to select empirical values in combination with the data; in particular, when the road width is unknown or varies greatly across a large-scale area, the threshold for the distance to the center of the selected road is often difficult to determine. It is easy to see from Table 2 that the modified brute-force-based LOB algorithm relies on expanding the threshold range to increase the number of candidate points, which takes more time. Although this can slightly improve the recall rate, it often leads to a decline in the precision of the results.

Table 2. Evaluation results of the geographic positioning method based on modified brute-force-based LOB.

Geographic positioning based on the self-adaptive constrained LOB proposed in this study can automatically calculate the range of the effective threshold k according to the driving trajectory, and the threshold t is a fixed value as long as the equipment is unchanged. Values of k within the effective shooting range achieved good recall and precision rates with a short calculation time. As the value of k increases, the range of the unit grid becomes larger, the calculation time gradually increases, and the number of ghost nodes generated when the effective distance of the LOB exceeds the actual distance also increases. The recall rate and the precision rate both decreased but still remained at a high level.

Due to the influence of the acquisition equipment, the calculation results often cannot achieve high accuracy and produce offsets. If the cluster aggregation distance, t, is too small, the intersection points near an object location cannot be aggregated into a single cluster and are instead output as repeated acquisitions of the same object. If the value of t is too large, adjacent objects of the same type merge into one cluster, resulting in missed positioning and a lower recall rate. Because the distances between street lamps and between utility poles are large, a high value of t has little impact on them, although it has a considerable impact on signboards that are close to each other.
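The sensitivity to t described above stems from distance-based aggregation of the intersection points. A minimal greedy single-linkage sketch in Python (a hypothetical illustration of the aggregation step, not the authors' implementation):

```python
import math

def aggregate(points, t):
    """Greedy single-linkage clustering: a point joins the first cluster
    containing any member within distance t; otherwise it starts a new one."""
    clusters = []
    for p in points:
        for c in clusters:
            if any(math.dist(p, q) <= t for q in c):
                c.append(p)
                break
        else:
            clusters.append([p])
    return clusters
```

With a small t, the intersections around one pole split into several clusters (duplicate results); with a large t, two nearby signboards merge into one cluster (missed detections), mirroring the behavior reported above.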

Compared with other LOB-based positioning methods, for which threshold selection is difficult, the method proposed in this study is self-adaptive in threshold selection. The appropriate range of the k value can be automatically calculated from the driving trajectory, and the t value can take a fixed value according to the output accuracy of the equipment. Compared with the modified brute-force-based LOB algorithm, the proposed method limits the effective range of the LOB by dividing the grid and does not need to rely on thresholds such as the number of adjacent viewpoints and the distance to the center of the selected road, which are affected by changes in road scenes.

Conclusions
In this study, we proposed a method for the automatic positioning of objects in street-view images based on an MMS. Aiming to reduce the difficulty of image feature matching caused by the long baseline in street views, a geographic positioning method based on self-adaptive constrained LOB is implemented by referring to object-matching algorithms that combine object detection and LOB positioning. In order to overcome the time consumption and the difficulty of threshold selection in LOB-based positioning algorithms, the idea of "divide and conquer" is introduced, and the calculation area is divided into grids according to the driving trajectory. The calculations in each grid are independent and do not interfere with each other, which greatly improves computing efficiency. In order to make the algorithm universal, a ghost-node elimination algorithm based on self-adaptive constrained rules is proposed according to the line-of-sight rules used when observing an object, which realizes non-image-feature matching of the same object in multi-view images.
Taking the signboards, utility poles, and street lamps of multiple road sections in Yincun Town, Changzhou City, Jiangsu Province, as the experimental objects, experiments were carried out using multiple thresholds and compared with previous LOB-based object-positioning methods. The results show that the proposed method has higher efficiency and accuracy than previous methods, and the threshold selection range is clear and easy to generalize. The method can perform automatic and accurate geographic positioning and image acquisition for a large range of street objects based on a high-precision MMS, verifying its feasibility.
The results of this study are applicable in the acquisition of geolocation information for street objects, which can be used to draw high-precision maps required for autonomous driving and to provide data support for autonomous driving positioning, path planning, and traffic warning. Geolocation information on street-side objects can also assist in road safety detection and can help government departments to better manage and maintain urban living facilities and transportation facilities.
Street-side objects are easily blocked by vehicles, resulting in missed detection of targets. When the number of detections of the same object in multi-view images is less than three, the LOB-based positioning method cannot perform effective target positioning, and repeat acquisition is required for the road section in question. The data collection and positioning methods presented in this paper are not synchronized. With the support of software and hardware, the data stream of an MMS can be converted into image and driving-trajectory data in real time. If the object-detection model used in this study were replaced with a lightweight model with higher detection efficiency, combined with trajectory information obtained at short distances, it would be possible to realize real-time image-based street-object positioning. This suggests the possibility of real-time online updating and sharing of high-precision maps in the future.

Data Availability Statement: Data sharing is not applicable to this article.

Conflicts of Interest: The authors declare no conflict of interest.