Camera Geolocation Using Digital Elevation Models in Hilly Area

The geolocation of skyline provides an important application in unmanned vehicles, unmanned aerial vehicles, and other fields. However, the existing methods are not effective in hilly areas. In this paper, we analyze the difficulties to locate in hilly areas and propose a new geolocation method. According to the vegetation in hilly area, two new skyline features, enhanced angle chain code and lapel point, are proposed. In order to deal with the skyline being close to the camera, we also propose a matching method which incorporates skyline distance heatmap and skyline pyramid. The experimental results show that the proposed method is highly effective in hilly area and has a robust performance against noise and rotation effects.


Introduction
Visual-based geolocalization is an important research area of computer vision. By extracting features from images or videos, and comparing them with the feature databases, the location of the observation can be determined. Different types of visual geolocation methods have been proposed for different environments [1]. For example, in urban areas, image and 3D point cloud are generally used for locating. However, since the obvious characteristics are relatively sparse in the natural environments, it is difficult to directly adopt the geolocation method for urban areas.
In areas with no Global Navigation Satellite System signal, or in outer space with no positioning satellite, locating with skyline is an economic and important method for navigation [2]. For example, by extracting the skyline information from a scene with mountain [3][4][5] and comparing it with the skyline generated from Digital Elevation Model (DEM) data, the geolocation of the image location can be determined. Locating with skyline has thus become an active research area, which has important applications in unmanned vehicles [6][7][8][9], unmanned aerial vehicles [10][11][12][13][14][15], and other fields.
There are some differences between a photo skyline and a DEM skyline. Any snow cover and vegetation on a mountain interferes with the extraction of skyline features and degrades the contour details of the mountain. Furthermore, some DEM datasets generated from satellite-based data have low resolution and poor precision, which affect the matching of its skyline with a photo skyline. Furthermore, existing methods of locating skyline have been used in an environment where the skyline is far away from mountains and deserts, and the area has few vegetation [16]. However, research has shown that vegetation in hilly areas has a great impact on the feature extraction of the skyline, destroying the original contour details [17,18]. Furthermore, dense hills make the distance between the A curve characterization method based on contour word is simple and effective, which transforms curve matching into binary code matching. Its disadvantage is that it is affected by any rotation of the curve, since a little rotation will result in a significant change to the contour word. Its robustness of this method is poor, especially when the curve is relatively flat and the sampled points has similar height. This is because a little error or noise will have a significant effect on the coding.
The curve characterization method described by the authors of [23] uses the CSS descriptors to represent a skyline. It extracts the skyline and generates the CSS map of the skyline as illustrated in Figure 2. Although this method is robust against rotation of the skyline, the amount of information it provides is less than those provided by other methods. This is because after the length normalization and Gaussian kernel filtering with different widths, only the number and position of the stagnation A curve characterization method based on contour word is simple and effective, which transforms curve matching into binary code matching. Its disadvantage is that it is affected by any rotation of the curve, since a little rotation will result in a significant change to the contour word. Its robustness of this method is poor, especially when the curve is relatively flat and the sampled points has similar height. This is because a little error or noise will have a significant effect on the coding.
The curve characterization method described by the authors of [23] uses the CSS descriptors to represent a skyline. It extracts the skyline and generates the CSS map of the skyline as illustrated in Figure 2. Although this method is robust against rotation of the skyline, the amount of information it provides is less than those provided by other methods. This is because after the length normalization and Gaussian kernel filtering with different widths, only the number and position of the stagnation points of the skyline are recorded. The method only uses the CSS features of skyline to determine whether the skyline is generated by buildings or natural scenes.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 3 of 24 points of the skyline are recorded. The method only uses the CSS features of skyline to determine whether the skyline is generated by buildings or natural scenes. There have been methods [24] for extracting the concavity of skyline as the feature (see Figure  3). One method is as follows. First, the extreme points of curvature in the skyline are found, which are usually in the valley of the skyline. Second, two candidate points are randomly selected on the left and right of each extreme point as the prediction endpoints of the concave curve of the skyline. The positions of the two prediction points are then updated iteratively, until the slopes of the line formed by the two prediction points and the curvature extreme point reached the local maximum value. Finally, the skyline is cut from the positions of each of two endpoints with curvelets, and the two endpoints of each curvelet are normalized to (−1,0) and (1,0) of the coordinates by rotation and scaling. A curvelet is then divided into N equal intervals, and the area between the x-axis and the curvelet is calculated to obtain a n-dimensional feature vector, which is robust to scaling and rotation.   There have been methods [24] for extracting the concavity of skyline as the feature (see Figure 3). One method is as follows. First, the extreme points of curvature in the skyline are found, which are usually in the valley of the skyline. Second, two candidate points are randomly selected on the left and right of each extreme point as the prediction endpoints of the concave curve of the skyline. The positions of the two prediction points are then updated iteratively, until the slopes of the line formed by the two prediction points and the curvature extreme point reached the local maximum value. Finally, the skyline is cut from the positions of each of two endpoints with curvelets, and the two endpoints of each curvelet are normalized to (−1,0) and (1,0) of the coordinates by rotation and scaling. A curvelet is then divided into N equal intervals, and the area between the x-axis and the curvelet is calculated to obtain a n-dimensional feature vector, which is robust to scaling and rotation.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 3 of 24 points of the skyline are recorded. The method only uses the CSS features of skyline to determine whether the skyline is generated by buildings or natural scenes. There have been methods [24] for extracting the concavity of skyline as the feature (see Figure  3). One method is as follows. First, the extreme points of curvature in the skyline are found, which are usually in the valley of the skyline. Second, two candidate points are randomly selected on the left and right of each extreme point as the prediction endpoints of the concave curve of the skyline. The positions of the two prediction points are then updated iteratively, until the slopes of the line formed by the two prediction points and the curvature extreme point reached the local maximum value. Finally, the skyline is cut from the positions of each of two endpoints with curvelets, and the two endpoints of each curvelet are normalized to (−1,0) and (1,0) of the coordinates by rotation and scaling. A curvelet is then divided into N equal intervals, and the area between the x-axis and the curvelet is calculated to obtain a n-dimensional feature vector, which is robust to scaling and rotation.

The Dataset of DEM
It is important to build the skyline dataset of DEM, i.e., to save the skylines of area to be located. Although some papers [21,24] have proposed that dynamic adjusting the sample density of the DEM Appl. Sci. 2020, 10, 6661 4 of 24 dataset will improve the accuracy of locating, all researchers have used the method of uniform grid sampling to build the DEM dataset, since this method is simpler than dynamic sampling.
The location area of the dataset described by the authors [24] was 10,000 km 2 , and the sampling point of the dataset was 1 km. In order to improve the detection speed, the 360 • skyline of each sampling point was divided into 24 segments, where each segment had a field of view (FoV) of 30 • and the overlapping area between the two adjacent segments was 15 • .
The authors of [21] presented experiments in five regions on three continents, each of which used a dataset sampling point of 250 m. The accuracy and efficiency of different sampling density were calculated. The authors of [17] achieved the positioning within 40,000 km 2 in Switzerland. The sampling density of the dataset was 0.001 • (111 m) in the north-south direction, and 0.0015 • (115 m) in the east-west direction. A total of 3.5 million skylines were produced.

Skyline Retrieval
Skyline retrieval refers to the extraction of image skyline and finding the most similar skyline in the DEM dataset. Since the retrieval of each skyline in the dataset is very time-consuming, many methods to improve retrieval efficiency are applied to skyline location.
The matching method utilized by the authors of [24] included endpoint matching and shape matching that, respectively, used geometric hash table and k-d tree to improve the matching efficiency. The methods utilized by the authors of [17] and [21] use contour words to represent skyline features. Contour words of all skylines were extracted from DEM dataset, and then sorted by the method term frequency-inverse document frequency (TF-IDF).

Difficulties in Locating Skyline Geolocation in Hilly Areas
Compared with mountainous areas and deserts, skyline geolocalization in hilly areas has its own difficulties.

Detailed Information of Skyline Affected by Dense Vegetation
In hilly area of low altitude, the vegetation on the surface is mainly composed of trees and shrubs. The vegetation has seasonal variations, which cause great interference to the feature of the skyline. Figure 4 illustrates the effects of vegetation on a terrain surface. Although the vegetation does not affect the general trend of the skyline, the vegetation has different effects on the details of the skyline.

The Dataset of DEM
It is important to build the skyline dataset of DEM, i.e., to save the skylines of area to be located. Although some papers [21,24] have proposed that dynamic adjusting the sample density of the DEM dataset will improve the accuracy of locating, all researchers have used the method of uniform grid sampling to build the DEM dataset, since this method is simpler than dynamic sampling.
The location area of the dataset described by the authors [24] was 10,000 km 2 , and the sampling point of the dataset was 1 km. In order to improve the detection speed, the 360° skyline of each sampling point was divided into 24 segments, where each segment had a field of view (FoV) of 30° and the overlapping area between the two adjacent segments was 15°.
The authors of [21] presented experiments in five regions on three continents, each of which used a dataset sampling point of 250 m. The accuracy and efficiency of different sampling density were calculated. The authors of [17] achieved the positioning within 40,000 km 2 in Switzerland. The sampling density of the dataset was 0.001° (111 m) in the north-south direction, and 0.0015° (115 m) in the east-west direction. A total of 3.5 million skylines were produced.

Skyline Retrieval
Skyline retrieval refers to the extraction of image skyline and finding the most similar skyline in the DEM dataset. Since the retrieval of each skyline in the dataset is very time-consuming, many methods to improve retrieval efficiency are applied to skyline location.
The matching method utilized by the authors of [24] included endpoint matching and shape matching that, respectively, used geometric hash table and k-d tree to improve the matching efficiency. The methods utilized by the authors of [17] and [21] use contour words to represent skyline features. Contour words of all skylines were extracted from DEM dataset, and then sorted by the method term frequency-inverse document frequency (TF-IDF).

Difficulties in Locating Skyline Geolocation in Hilly Areas
Compared with mountainous areas and deserts, skyline geolocalization in hilly areas has its own difficulties.

Detailed Information of Skyline Affected by Dense Vegetation
In hilly area of low altitude, the vegetation on the surface is mainly composed of trees and shrubs. The vegetation has seasonal variations, which cause great interference to the feature of the skyline. Figure 4 illustrates the effects of vegetation on a terrain surface. Although the vegetation does not affect the general trend of the skyline, the vegetation has different effects on the details of the skyline.

Proximity of Camera to Skyline
For dense hills, the skyline is close to the camera, which leads to locating failure. In skyline geolocation, the nearest sampling point of the DEM dataset is often used as the ground truth of the

Proximity of Camera to Skyline
For dense hills, the skyline is close to the camera, which leads to locating failure. In skyline geolocation, the nearest sampling point of the DEM dataset is often used as the ground truth of the image taken (as illustrated in Figure 5). Although there is an offset between the ground truth point and image taken position, the skyline features of the ground truth and image are also the most similar. But when the skyline is close to the camera, the difference between the two skylines becomes significant.
image taken (as illustrated in Figure 5). Although there is an offset between the ground truth point and image taken position, the skyline features of the ground truth and image are also the most similar. But when the skyline is close to the camera, the difference between the two skylines becomes significant. Denoting the distance between two adjacent sampling points as Dsample, and the distance between them as ΔD, Since ΔD may not be 0, there will be a difference between the image skyline and the ground truth skyline. The difference in skyline caused by this offset is illustrated in the Figure 6. When the offset direction is orthogonal to the image viewing direction as in Figure 6a, it is easy to determine that the skyline moves by Δr, and this Δr is equal to ΔD. Since the distance between the camera and the skyline is D, and the FoV of the camera is α, the length of the skyline is (2) Denoting the distance between two adjacent sampling points as D sample , and the distance between them as ∆D, Since ∆D may not be 0, there will be a difference between the image skyline and the ground truth skyline. The difference in skyline caused by this offset is illustrated in the Figure 6.
image taken (as illustrated in Figure 5). Although there is an offset between the ground truth point and image taken position, the skyline features of the ground truth and image are also the most similar. But when the skyline is close to the camera, the difference between the two skylines becomes significant. Denoting the distance between two adjacent sampling points as Dsample, and the distance between them as ΔD, Since ΔD may not be 0, there will be a difference between the image skyline and the ground truth skyline. The difference in skyline caused by this offset is illustrated in the Figure 6. When the offset direction is orthogonal to the image viewing direction as in Figure 6a, it is easy to determine that the skyline moves by Δr, and this Δr is equal to ΔD. Since the distance between the camera and the skyline is D, and the FoV of the camera is α, the length of the skyline is (2) When the offset direction is orthogonal to the image viewing direction as in Figure 6a, it is easy to determine that the skyline moves by ∆r, and this ∆r is equal to ∆D. Since the distance between the camera and the skyline is D, and the FoV of the camera is α, the length of the skyline is The percentage of skyline movement due to offset is Appl. Sci. 2020, 10, 6661 6 of 24 When the offset direction is parallel to the image viewing direction as in Figure 6b, the change in the length of the skyline is 2∆r, where When ∆D << D (e.g., in Alps area or desert area), the influence of offset between the image taken point and its ground truth point on the skyline can be ignored. However, when the camera is close to the mountain, such as in a hilly area, the skyline location will be greatly affected.

Propose Method
An overview of the proposed geolocalization method for a skyline with dense vegetation in hilly area is shown in Figure 7.
The percentage of skyline movement due to offset is When the offset direction is parallel to the image viewing direction as in Figure 6b, the change in the length of the skyline is 2Δr, where When ΔD << D (e.g., in Alps area or desert area), the influence of offset between the image taken point and its ground truth point on the skyline can be ignored. However, when the camera is close to the mountain, such as in a hilly area, the skyline location will be greatly affected.

Propose Method
An overview of the proposed geolocalization method for a skyline with dense vegetation in hilly area is shown in Figure 7. In this method, two new features of skyline are first introduced, which are used for coarse matching and refined matching. Second, a new DEM dataset method is proposed to deal with the situation that the skyline is in hilly area. Finally, the matching method for skylines is introduced, which includes coarse matching and refined matching.

Enhanced Angle Chain Code
Chain code [25] and BAS (Beam Angle Statistics) [26] are common methods for curve feature, but both methods have certain shortcomings. The use of chain code to represent a skyline is unsatisfactory, because the chain code method is sensitive to noise and is not rotation invariant. When a curve is rotated by a small angle, the encoding of the curve will have a certain change, and any noise on in the curve will also have a large interference to the resulting chain code. The BAS method, on the other hand, samples n points on the closed curve, and calculates the angle between each point and the k-order adjacent points on the left and right sides of a point. Since determining the geolocation of a photo skyline is essentially a matching between a partial curve (an open curve in the In this method, two new features of skyline are first introduced, which are used for coarse matching and refined matching. Second, a new DEM dataset method is proposed to deal with the situation that the skyline is in hilly area. Finally, the matching method for skylines is introduced, which includes coarse matching and refined matching.

Enhanced Angle Chain Code
Chain code [25] and BAS (Beam Angle Statistics) [26] are common methods for curve feature, but both methods have certain shortcomings. The use of chain code to represent a skyline is unsatisfactory, because the chain code method is sensitive to noise and is not rotation invariant. When a curve is rotated by a small angle, the encoding of the curve will have a certain change, and any noise on in the curve will also have a large interference to the resulting chain code. The BAS method, on the other hand, samples n points on the closed curve, and calculates the angle between each point and the k-order adjacent points on the left and right sides of a point. Since determining the geolocation of a photo skyline is essentially a matching between a partial curve (an open curve in the photo skyline with width related to FoV, e.g., 24 • as illustrated in Figure 8) and an overall curve (360 • in DEM), it cannot be applied directly.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 7 of 24 photo skyline with width related to FoV, e.g., 24° as illustrated in Figure 8) and an overall curve (360° in DEM), it cannot be applied directly. In this paper, a new method of representing skyline, named Enhanced Angle Chain code, was proposed based on the combination of chain code and BAS, as follows. First, the 360° DEM skyline is regarded as the standard length, and the skyline of the image is normalized and scaled according to its FoV. Second, set the position of sampled points in the curve (e.g., four sampled points per 1°). Finally, the angle between a sampled point and an adjacent sampled point is calculated. For example, the 1°-order angle is the angle between the current sampled point and the sampled points to its left and right that differ by 1°.
Let the current point be Pcentre, the reference point on its left be Pleft, and the reference point on its right be Pright (as illustrated in Figure 9). The length of the three sides of a triangle, i.e., a, b, and c, are given by the Euclidean distance. According to the cosine theorem, the angle θ between Pcentre and the left and right reference points is (5) Considering the triangle parameters corresponding to mountain and valley are the same, the θ value can be calculated and limited according to Table 1. In this paper, a new method of representing skyline, named Enhanced Angle Chain code, was proposed based on the combination of chain code and BAS, as follows. First, the 360 • DEM skyline is regarded as the standard length, and the skyline of the image is normalized and scaled according to its FoV. Second, set the position of sampled points in the curve (e.g., four sampled points per 1 • ). Finally, the angle between a sampled point and an adjacent sampled point is calculated. For example, the 1 • -order angle is the angle between the current sampled point and the sampled points to its left and right that differ by 1 • .
Let the current point be P centre , the reference point on its left be P left , and the reference point on its right be P right (as illustrated in Figure 9). The length of the three sides of a triangle, i.e., a, b, and c, are given by the Euclidean distance. According to the cosine theorem, the angle θ between P centre and the left and right reference points is Appl. Sci. 2020, 10, x FOR PEER REVIEW 7 of 24 photo skyline with width related to FoV, e.g., 24° as illustrated in Figure 8) and an overall curve (360° in DEM), it cannot be applied directly. In this paper, a new method of representing skyline, named Enhanced Angle Chain code, was proposed based on the combination of chain code and BAS, as follows. First, the 360° DEM skyline is regarded as the standard length, and the skyline of the image is normalized and scaled according to its FoV. Second, set the position of sampled points in the curve (e.g., four sampled points per 1°). Finally, the angle between a sampled point and an adjacent sampled point is calculated. For example, the 1°-order angle is the angle between the current sampled point and the sampled points to its left and right that differ by 1°.
Let the current point be Pcentre, the reference point on its left be Pleft, and the reference point on its right be Pright (as illustrated in Figure 9). The length of the three sides of a triangle, i.e., a, b, and c, are given by the Euclidean distance. According to the cosine theorem, the angle θ between Pcentre and the left and right reference points is (5) Considering the triangle parameters corresponding to mountain and valley are the same, the θ value can be calculated and limited according to Table 1. Considering the triangle parameters corresponding to mountain and valley are the same, the θ value can be calculated and limited according to Table 1. Peak 2ab P le f t .y+P right .y 2 < P center .y 1 P.y means the y value of the point P * .
In order to facilitate the comparison, we used the 32-based chain code to encode the feature angles of skylines and transformed the curve matching into a string matching. A 32-neighbour representation was proposed by dividing the 360 • into 32 subintervals (as illustrated in Figure 10) that are represented by "0~9" and "A~V", and each subinterval corresponds to 11.25 • . The enhanced angle chain code is a more accurate descriptor.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 8 of 24 In order to facilitate the comparison, we used the 32-based chain code to encode the feature angles of skylines and transformed the curve matching into a string matching. A 32-neighbour representation was proposed by dividing the 360° into 32 subintervals (as illustrated in Figure 10) that are represented by "0~9" and "A~V", and each subinterval corresponds to 11.25°. The enhanced angle chain code is a more accurate descriptor. Since the 32-base encoding was adopted, the encoding corresponding to the θ angle at Pcentre is given by In order to reduce the likelihood of coding different tilted versions of the same skyline, the normal vector of the curve was recorded at each sampled point of skyline (as illustrated in Figure 11), and 32-bit chain code was used for coding. Given the slope of and Since the 32-base encoding was adopted, the encoding corresponding to the θ angle at P centre is given by In order to reduce the likelihood of coding different tilted versions of the same skyline, the normal vector of the curve was recorded at each sampled point of skyline (as illustrated in Figure 11), and 32-bit chain code was used for coding. Given the slope of → P left · P right , and  Therefore, after extracting the skyline using supervised learning [27] and saving it as a list (as illustrated in Figure 12), a skyline comprises two Enhanced Angle Chain codes named curvature Enhanced Angle Chain code and normal Enhanced Angle Chain code to describe its features (see Table 2). Curvature enhanced chain code records the angle between the sampled point of the curve and the surrounding sampled points. Normal Enhanced Angle Chain code records the normal vector angle at each sampled point of the curve.

Subfeature
Name Subfeature Content curvature enhanced angle chain code Therefore, after extracting the skyline using supervised learning [27] and saving it as a list (as illustrated in Figure 12), a skyline comprises two Enhanced Angle Chain codes named curvature Enhanced Angle Chain code and normal Enhanced Angle Chain code to describe its features (see Table 2). Curvature enhanced chain code records the angle between the sampled point of the curve and the surrounding sampled points. Normal Enhanced Angle Chain code records the normal vector angle at each sampled point of the curve. Therefore, after extracting the skyline using supervised learning [27] and saving it as a list (as illustrated in Figure 12), a skyline comprises two Enhanced Angle Chain codes named curvature Enhanced Angle Chain code and normal Enhanced Angle Chain code to describe its features (see Table 2). Curvature enhanced chain code records the angle between the sampled point of the curve and the surrounding sampled points. Normal Enhanced Angle Chain code records the normal vector angle at each sampled point of the curve.

Lapel Point
Lapel refers to the direction in which the two collars overlap. In different periods of ancient China, due to the different requirements of different dynasties, there were different requirements for left lapel and right lapel (see Figure 13).

Lapel Point
Lapel refers to the direction in which the two collars overlap. In different periods of ancient China, due to the different requirements of different dynasties, there were different requirements for left lapel and right lapel (see Figure 13). Right lapel means the left collar is covering the right collar, where its front looks like a small letter y at the edge of the collar. The direction of left lapel is opposite to the right lapel, and the collar opening is left (as illustrated in Figure 14). Right lapel means the left collar is covering the right collar, where its front looks like a small letter y at the edge of the collar. The direction of left lapel is opposite to the right lapel, and the collar opening is left (as illustrated in Figure 14).

Lapel Point
Lapel refers to the direction in which the two collars overlap. In different periods of ancient China, due to the different requirements of different dynasties, there were different requirements for left lapel and right lapel (see Figure 13). Right lapel means the left collar is covering the right collar, where its front looks like a small letter y at the edge of the collar. The direction of left lapel is opposite to the right lapel, and the collar opening is left (as illustrated in Figure 14). The lapel point of skyline refers to the intersection of ridge line and skyline and belong to both skyline and ridge line (as illustrated in Figure 15).

Lapel Point
Lapel refers to the direction in which the two collars overlap. In different periods of ancient China, due to the different requirements of different dynasties, there were different requirements for left lapel and right lapel (see Figure 13). Right lapel means the left collar is covering the right collar, where its front looks like a small letter y at the edge of the collar. The direction of left lapel is opposite to the right lapel, and the collar opening is left (as illustrated in Figure 14).

Extracting Lapels from Images
Since the image and DEM are of different data types, the methods of extracting the lapel point are also different, as follows. First, skyline and ridge line are extracted from the image using methods of semantic segmentation, such as U-Net. Second, the sliding window is used to traverse each pixel on the skyline and extract its immediate eight neighbor. Then, each ridge point in the eight neighbors are marked as candidate points, and the trend of the ridge line to which the candidate point of the ridge line belongs is found through the dynamic programming algorithm. Finally, by comparing the x-axis coordinates of the candidate points with the midpoint of the ridge line, the type of the lapel points is determined, as in Table 3. Unlike image data, it is easy to use DEM to obtain the depth information of skyline while extracting the skyline. The jumping point of DEM skyline refers to two adjacent points, and the distances between the two skyline points and the observation point are quite different. The reason is similar to that of the image skyline lapel point. The distances between the skyline of the same mountain range and the observation point are relatively small, while the distances between different mountains and the observation point are quite different due to the different spatial positions. The depth information of the DEM skyline can be obtained from the coordinate points corresponding to the observation points and the skyline points (see Figure 16). The depth jump points of the DEM skyline were obtained by calculating the first-order derivative of the depth value of the skyline, i.e.,

Extracting Lapels from Images
Since the image and DEM are of different data types, the methods of extracting the lapel point are also different, as follows. First, skyline and ridge line are extracted from the image using methods of semantic segmentation, such as U-Net. Second, the sliding window is used to traverse each pixel on the skyline and extract its immediate eight neighbor. Then, each ridge point in the eight neighbors are marked as candidate points, and the trend of the ridge line to which the candidate point of the ridge line belongs is found through the dynamic programming algorithm. Finally, by comparing the x-axis coordinates of the candidate points with the midpoint of the ridge line, the type of the lapel points is determined, as in Table 3.

Extracting Lapels from DEM
Unlike image data, it is easy to use DEM to obtain the depth information of skyline while extracting the skyline. The jumping point of DEM skyline refers to two adjacent points, and the distances between the two skyline points and the observation point are quite different. The reason is similar to that of the image skyline lapel point. The distances between the skyline of the same mountain range and the observation point are relatively small, while the distances between different mountains and the observation point are quite different due to the different spatial positions. The depth information of the DEM skyline can be obtained from the coordinate points corresponding to the observation points and the skyline points (see Figure 16). The depth jump points of the DEM skyline were obtained by calculating the first-order derivative of the depth value of the skyline, i.e., where the threshold value corresponds to points with great change in the depth of the skyline (see Figure 17). The left and right sides of these jumping points belong to different mountains, corresponding to the lapel points in the same position in the image skyline.
where the threshold value corresponds to points with great change in the depth of the skyline (see Figure 17). The left and right sides of these jumping points belong to different mountains, corresponding to the lapel points in the same position in the image skyline.  There are differences between DEM and image. When the distance between two lapels of DEM is less than the width threshold, it indicates that only a small part of a mountain range appears in the skyline, and thus may not be recognized. When the two lapels are of the same type, they will be merged into one lapel point. When the two lapels are of different types, the two lapels will be offset. The method of saving jump points in DEM is where Index is the x-axis coordinate of the lapel point in the DEM rendering image, and Type is the type of the lapel point.

Building DEM Skyline Feature Database
The DEM skyline database was used to record DEM skyline features. Through the efficient management of features, the efficiency of image retrieval in the database can be greatly improved. For close skyline in hilly area, the heatmap of skyline distance was added to improve the locating accuracy. An overview of the database building method is shown in Figure 18.
where the threshold value corresponds to points with great change in the depth of the skyline (see Figure 17). The left and right sides of these jumping points belong to different mountains, corresponding to the lapel points in the same position in the image skyline.  There are differences between DEM and image. When the distance between two lapels of DEM is less than the width threshold, it indicates that only a small part of a mountain range appears in the skyline, and thus may not be recognized. When the two lapels are of the same type, they will be merged into one lapel point. When the two lapels are of different types, the two lapels will be offset. The method of saving jump points in DEM is where Index is the x-axis coordinate of the lapel point in the DEM rendering image, and Type is the type of the lapel point.

Building DEM Skyline Feature Database
The DEM skyline database was used to record DEM skyline features. Through the efficient management of features, the efficiency of image retrieval in the database can be greatly improved. For close skyline in hilly area, the heatmap of skyline distance was added to improve the locating accuracy. An overview of the database building method is shown in Figure 18. Example of first-order derivative of skyline depth (left) and the effect after thresholding (right).
There are differences between DEM and image. When the distance between two lapels of DEM is less than the width threshold, it indicates that only a small part of a mountain range appears in the skyline, and thus may not be recognized. When the two lapels are of the same type, they will be merged into one lapel point. When the two lapels are of different types, the two lapels will be offset. The method of saving jump points in DEM is where Index is the x-axis coordinate of the lapel point in the DEM rendering image, and Type is the type of the lapel point.

Building DEM Skyline Feature Database
The DEM skyline database was used to record DEM skyline features. Through the efficient management of features, the efficiency of image retrieval in the database can be greatly improved. For close skyline in hilly area, the heatmap of skyline distance was added to improve the locating accuracy. An overview of the database building method is shown in Figure 18. Appl. Sci. 2020, 10, x FOR PEER REVIEW 13 of 24 Figure 18. Flowchart for constructing the DEM database.

Sampling and Generating 3D Rendering Image
The DEM data sampling density used in this dataset was 10 m, and we obtained the DEM skylines for every 0.002° in latitude and longitude and from 1.80 m above the ground, giving a total of 71 × 71 DEM skylines. Figure 19 illustrates the rendering of a DEM sampled point.
(a) (b) Figure 19. 360° rendering generated from a DEM sampled point. (a) Original greyscale image. The leftmost end of the image is in the north direction, which is unfolded in a clockwise direction. The middle of the picture is due south, and the far right is due north. The white part of the image is the sky, and the lower part shows the distance between the mountain and the observation point displayed with different grey levels. The closer the mountain is, the higher the grey value. The lower boundary of the white area is the skyline, and where the grey level of the lower half changes is the ridge line. (b) The pseudo color image was generated from the original greyscale image, which is convenient for manual observation and judgment.

Divide into 24 Subsections
Similar to the method described by the authors of [25], at each sample point, we rendered twenty-four 30° images at 15° offsets, covering the full 360° panorama at the location. Thus, the DEM skyline database was divided into 24 subdatabases according to angles, thereby accelerating the speed of retrieving and matching of the corresponding skyline.

Get the Features of the Skylines
The skyline features were extracted from each subdatabase, including the enhanced angle chain codes and skyline lapel points.

Sampling and Generating 3D Rendering Image
The DEM data sampling density used in this dataset was 10 m, and we obtained the DEM skylines for every 0.002 • in latitude and longitude and from 1.80 m above the ground, giving a total of 71 × 71 DEM skylines. Figure 19 illustrates the rendering of a DEM sampled point.

Sampling and Generating 3D Rendering Image
The DEM data sampling density used in this dataset was 10 m, and we obtained the DEM skylines for every 0.002° in latitude and longitude and from 1.80 m above the ground, giving a total of 71 × 71 DEM skylines. Figure 19 illustrates the rendering of a DEM sampled point. The leftmost end of the image is in the north direction, which is unfolded in a clockwise direction. The middle of the picture is due south, and the far right is due north. The white part of the image is the sky, and the lower part shows the distance between the mountain and the observation point displayed with different grey levels. The closer the mountain is, the higher the grey value. The lower boundary of the white area is the skyline, and where the grey level of the lower half changes is the ridge line. (b) The pseudo color image was generated from the original greyscale image, which is convenient for manual observation and judgment.

Divide into 24 Subsections
Similar to the method described by the authors of [25], at each sample point, we rendered twenty-four 30° images at 15° offsets, covering the full 360° panorama at the location. Thus, the DEM skyline database was divided into 24 subdatabases according to angles, thereby accelerating the speed of retrieving and matching of the corresponding skyline.

Get the Features of the Skylines
The skyline features were extracted from each subdatabase, including the enhanced angle chain codes and skyline lapel points. The leftmost end of the image is in the north direction, which is unfolded in a clockwise direction. The middle of the picture is due south, and the far right is due north. The white part of the image is the sky, and the lower part shows the distance between the mountain and the observation point displayed with different grey levels. The closer the mountain is, the higher the grey value. The lower boundary of the white area is the skyline, and where the grey level of the lower half changes is the ridge line. (b) The pseudo color image was generated from the original greyscale image, which is convenient for manual observation and judgment.

Divide into 24 Subsections
Similar to the method described by the authors of [25], at each sample point, we rendered twenty-four 30 • images at 15 • offsets, covering the full 360 • panorama at the location. Thus, the DEM skyline database was divided into 24 subdatabases according to angles, thereby accelerating the speed of retrieving and matching of the corresponding skyline.

Get the Features of the Skylines
The skyline features were extracted from each subdatabase, including the enhanced angle chain codes and skyline lapel points.

Get Heatmap of the Skyline Distance
Unlike other methods, the heatmap of DEM skyline distance was the first to be proposed for hilly area. For each subdatabase, the minimum distance of each sampling point skyline was extracted, and the distance heatmap was generated (as illustrated in Appendix A Figure A1).
As mentioned in Section 3.2, as the distance between the skyline and the camera becomes closer, the error between the location where the photo was taken and the ground truth will have an increasing impact on locating. Therefore, the distance in the heatmap was classified (as illustrated in Table 4 and Figure 20), and different locating strategies were adopted for different categories.

Get Heatmap of the Skyline Distance
Unlike other methods, the heatmap of DEM skyline distance was the first to be proposed for hilly area. For each subdatabase, the minimum distance of each sampling point skyline was extracted, and the distance heatmap was generated (as illustrated in Appendix A Figure A1).
As mentioned in Section 3.2, as the distance between the skyline and the camera becomes closer, the error between the location where the photo was taken and the ground truth will have an increasing impact on locating. Therefore, the distance in the heatmap was classified (as illustrated in Table 4 and Figure 20), and different locating strategies were adopted for different categories.

Coarse Matching
In the proposed coarse matching, skyline features are extracted from the image, and the corresponding DEM subdataset is selected according to the heading angle (obtained by compass, etc.). For hilly areas and the heatmap of skyline, the image skyline pyramid was proposed to improve the geolocalization accuracy of the skyline. According to the distance between adjacent skyline in DEM database and the distance between camera and skyline, the skyline of the original image was enlarged and reduced by 10% and 20%, respectively (as illustrated in Figure 21).

Coarse Matching
In the proposed coarse matching, skyline features are extracted from the image, and the corresponding DEM subdataset is selected according to the heading angle (obtained by compass, etc.). For hilly areas and the heatmap of skyline, the image skyline pyramid was proposed to improve the geolocalization accuracy of the skyline. According to the distance between adjacent skyline in DEM database and the distance between camera and skyline, the skyline of the original image was enlarged and reduced by 10% and 20%, respectively (as illustrated in Figure 21).

Get Heatmap of the Skyline Distance
Unlike other methods, the heatmap of DEM skyline distance was the first to be proposed for hilly area. For each subdatabase, the minimum distance of each sampling point skyline was extracted, and the distance heatmap was generated (as illustrated in Appendix A Figure A1).
As mentioned in Section 3.2, as the distance between the skyline and the camera becomes closer, the error between the location where the photo was taken and the ground truth will have an increasing impact on locating. Therefore, the distance in the heatmap was classified (as illustrated in Table 4 and Figure 20), and different locating strategies were adopted for different categories.

Coarse Matching
In the proposed coarse matching, skyline features are extracted from the image, and the corresponding DEM subdataset is selected according to the heading angle (obtained by compass, etc.). For hilly areas and the heatmap of skyline, the image skyline pyramid was proposed to improve the geolocalization accuracy of the skyline. According to the distance between adjacent skyline in DEM database and the distance between camera and skyline, the skyline of the original image was enlarged and reduced by 10% and 20%, respectively (as illustrated in Figure 21). When extracting the Enhanced Angle chain code, the original camera FoV is used as the skyline of the original image, and the zoomed FoV is used for the skyline of other sizes. When extracting the Enhanced Angle chain code, the original camera FoV is used as the skyline of the original image, and the zoomed FoV is used for the skyline of other sizes.
The common edit distance is used to determine the similarity of two skylines. The magnitude of edit difference and the mean of edit distance are used to measure the difference between two skylines. Unlike the traditional edit distance, the difference in distance of skyline matching is not only related to the number of different characters, but also related to the similarity of different characters. If the two strings to be compared are str1 and str2, and a i and b i are, respectively, the ith character on str1 and str2, then the average distance between the two strings is where computes the degree of difference between two characters, and the range of value is [0, 1). A value of 0 means the two characters are exactly the same, and a value of 1 means the difference between the two characters is very large. The image skyline features are compared with the features of each point in the subdataset, and the minimum error between the image skyline and each sampling point in the subdataset is recorded. For different types of heatmaps, different comparison strategies are adopted.
When the skyline is far away from the sampling point, the features of the original image skyline and the skyline features of the sampling point are used for comparison, and the error is recorded. When the distance between the skyline and the sampling point is moderate, the skyline of the original image and the skyline compressed and expanded by 10% are compared with the skyline features of the sampling points respectively. The minimum of the three error values is recorded. When the skyline is close to the sample point, the skyline of the original image and the skyline compressed and expanded by 10% and 20%, are compared with the feature of the sampling point. The minimum value of the five error values is recorded. The process of coarse location is shown in the Figure 22

Refined Matching
Coarse matching can obtain a location result which usually has about 200 candidates (as illustrated in Figure 23). Refined matching was used to improve the coarse matching results and eliminate any unreasonable candidate points (as illustrated in Figure 24). Coarse matching only found similar points in the DEM skyline database through skyline contour features, and refined matching eliminates interference results through location, type, and sequence of lapel points. Appl. Sci. 2020, 10, x FOR PEER REVIEW 16 of 24     Figure 24. Coarse matching (a) and refined matching (b) The blue dots indicate the candidate points for localization, and the dark color means that the coarse matching error was small. lapel points can greatly improve the localization accuracy.

Experiment and Analysis
In order to demonstrate the effectiveness of the proposed method, we first performed experiments on the effects of noise, pitch, rotation, and other factors on the proposed method. We then compared the performance between the proposed method and the art-of-state methods on our dataset.

Dataset
The dataset was collected in a hilly area (as shown in Figure 25), which was about 200 km 2 (28.08° N, 112.69° E~28.22° N, 112.83° E), located in the suburb of Changsha City, China. In this region of interest (ROI), the skylines change greatly, and the vegetation is dense. However, the manmade buildings are few and have little interference on the trend of the skylines. The DEM dataset with an accuracy of 10 m was used to establish the DEM skyline database. Meanwhile, more than 200 photos were captured with a camera (focal length was 8 mm, FoV was 24°, and the resolution was 1920 × 1080), which was fixed on the platform about 180 cm above the ground. Figure 26 shows an example photo and Table 5 shows the longitude and latitude, attitude angle, and other information during the photoshoot. Figure 24. Coarse matching (a) and refined matching (b) The blue dots indicate the candidate points for localization, and the dark color means that the coarse matching error was small. lapel points can greatly improve the localization accuracy.

Experiment and Analysis
In order to demonstrate the effectiveness of the proposed method, we first performed experiments on the effects of noise, pitch, rotation, and other factors on the proposed method. We then compared the performance between the proposed method and the art-of-state methods on our dataset.

Dataset
The dataset was collected in a hilly area (as shown in Figure 25 Figure 24. Coarse matching (a) and refined matching (b) The blue dots indicate the candidate points for localization, and the dark color means that the coarse matching error was small. lapel points can greatly improve the localization accuracy.

Experiment and Analysis
In order to demonstrate the effectiveness of the proposed method, we first performed experiments on the effects of noise, pitch, rotation, and other factors on the proposed method. We then compared the performance between the proposed method and the art-of-state methods on our dataset.

Dataset
The dataset was collected in a hilly area (as shown in Figure 25), which was about 200 km 2 (28.08° N, 112.69° E~28.22° N, 112.83° E), located in the suburb of Changsha City, China. In this region of interest (ROI), the skylines change greatly, and the vegetation is dense. However, the manmade buildings are few and have little interference on the trend of the skylines. The DEM dataset with an accuracy of 10 m was used to establish the DEM skyline database. Meanwhile, more than 200 photos were captured with a camera (focal length was 8 mm, FoV was 24°, and the resolution was 1920 × 1080), which was fixed on the platform about 180 cm above the ground. Figure 26 shows an example photo and Table 5 shows the longitude and latitude, attitude angle, and other information during the photoshoot. The DEM dataset with an accuracy of 10 m was used to establish the DEM skyline database. Meanwhile, more than 200 photos were captured with a camera (focal length was 8 mm, FoV was 24 • , and the resolution was 1920 × 1080), which was fixed on the platform about 180 cm above the ground. Figure 26 shows an example photo and Table 5 shows the longitude and latitude, attitude angle, and other information during the photoshoot.

Localization Performance
The experiment involved 284 photo skylines and 5041 DEM skylines in an area of about 200 km 2 . The values of Top_1 to Top_100 items in the candidate list of correct locations were calculated. The photo skylines and the ground truth of DEM skylines were different in some details due to vegetation effects and error between locations (as illustrated in Figure 27). The skyline features of each image were extracted in turn, and then retrieved in the DEM skyline database, so that each image obtained a candidate list of DEM location points according to the matching score. Similar to the method described by the authors of [18], we measured the performance

Localization Performance
The experiment involved 284 photo skylines and 5041 DEM skylines in an area of about 200 km 2 . The values of Top_1 to Top_100 items in the candidate list of correct locations were calculated. The photo skylines and the ground truth of DEM skylines were different in some details due to vegetation effects and error between locations (as illustrated in Figure 27).

Localization Performance
The experiment involved 284 photo skylines and 5041 DEM skylines in an area of about 200 km 2 . The values of Top_1 to Top_100 items in the candidate list of correct locations were calculated. The photo skylines and the ground truth of DEM skylines were different in some details due to vegetation effects and error between locations (as illustrated in Figure 27). The skyline features of each image were extracted in turn, and then retrieved in the DEM skyline database, so that each image obtained a candidate list of DEM location points according to the matching score. Similar to the method described by the authors of [18], we measured the performance The skyline features of each image were extracted in turn, and then retrieved in the DEM skyline database, so that each image obtained a candidate list of DEM location points according to the matching score. Similar to the method described by the authors of [18], we measured the performance of several localization methods by counting the proportion of the Top_n items in the candidate list containing the correct location. Figure 28 shows that the new algorithm had better performance.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 19 of 24 of several localization methods by counting the proportion of the Top_n items in the candidate list containing the correct location. Figure 28 shows that the new algorithm had better performance. Although the proposed method performed better, the method failed on some images with a skyline which was only tens of meters away from the camera (as illustrated in Figure 29).

Robustness against Noise
In order to verify that the proposed method has better localization effect in the case of vegetation, the experiment added noise to photo skylines by randomly adding 1~5 pixels of noise on each image skyline point (as shown in Table 6). Then, the robustness of the two algorithms was calculated. Figure  30 shows the effects of added noise on a skyline. Figure 31 shows that the proposed method and concavity-based method have stronger anti-noise ability.  Although the proposed method performed better, the method failed on some images with a skyline which was only tens of meters away from the camera (as illustrated in Figure 29).
Appl. Sci. 2020, 10, x FOR PEER REVIEW 19 of 24 of several localization methods by counting the proportion of the Top_n items in the candidate list containing the correct location. Figure 28 shows that the new algorithm had better performance. Although the proposed method performed better, the method failed on some images with a skyline which was only tens of meters away from the camera (as illustrated in Figure 29).

Robustness against Noise
In order to verify that the proposed method has better localization effect in the case of vegetation, the experiment added noise to photo skylines by randomly adding 1~5 pixels of noise on each image skyline point (as shown in Table 6). Then, the robustness of the two algorithms was calculated. Figure  30 shows the effects of added noise on a skyline. Figure 31 shows that the proposed method and concavity-based method have stronger anti-noise ability.

Robustness against Noise
In order to verify that the proposed method has better localization effect in the case of vegetation, the experiment added noise to photo skylines by randomly adding 1~5 pixels of noise on each image skyline point (as shown in Table 6). Then, the robustness of the two algorithms was calculated. Figure 30 shows the effects of added noise on a skyline. Figure 31 shows that the proposed method and concavity-based method have stronger anti-noise ability.

Rotation Invariance
In the experiment, in order to test the effect of camera roll angle on skyline geolocation, the image skyline was rotated by 1~4° in turn along the Z axis, and the matching skyline in the DEM database with the rotated curve was found. This experiment determined the robustness of the proposed method when a skyline was subjected to rotation (see Figure 32). Figure 33 shows that the proposal method has a strong adaptability to rotation, and a rotation of 4° had few effects on its accuracy. But with the increase in the degree of rotation, the accuracy of contour word decreased rapidly.

Rotation Invariance
In the experiment, in order to test the effect of camera roll angle on skyline geolocation, the image skyline was rotated by 1~4° in turn along the Z axis, and the matching skyline in the DEM database with the rotated curve was found. This experiment determined the robustness of the proposed method when a skyline was subjected to rotation (see Figure 32). Figure 33 shows that the proposal method has a strong adaptability to rotation, and a rotation of 4° had few effects on its accuracy. But with the increase in the degree of rotation, the accuracy of contour word decreased rapidly.

Rotation Invariance
In the experiment, in order to test the effect of camera roll angle on skyline geolocation, the image skyline was rotated by 1~4 • in turn along the Z axis, and the matching skyline in the DEM database with the rotated curve was found. This experiment determined the robustness of the proposed method when a skyline was subjected to rotation (see Figure 32). Appl. Sci. 2020, 10, x FOR PEER REVIEW 21 of 24

Conclusions and Future Work
Skyline localization is an important branch of visual geolocation, especially in the field where the feature points are sparse. When a GPS signal is unavailable or disturbed, skyline localization is an important auxiliary localization method in the field environment, but there is little research on skyline localization in hilly areas. This paper analyzed the difficulties of skyline locating in hilly areas, including lush vegetation and closeness to skyline. Therefore, a new method of skyline localization was proposed in this paper, which includes two new skyline features: Enhanced Angle chain code and skyline lapel point, and a new matching method which incorporates the skyline pyramid and the skyline distance heatmap. The experimental results show that this method has high localization accuracy in hilly areas. The proposed method also has some limitations. When the nearby trees significantly block the skyline, or when the mountain in front of the camera is too close, the accuracy of the method is low.
In the future, we will investigate enhanced lapel point, which adds the ridge line lapel point (the lapel point formed between ridge lines) to existing skyline lapels, so as to improve the accuracy of geolocation. Although the interference of atmospheric environment is not within the scope of this paper, the skyline geolocation under multiple visibilities is important, because the photo will change due to haze and other reasons.
Author Contributions: J.T., X.X. and Z.W. contributed to the conception of the study; Z.P. performed the experiment; Z.P. and J.T. contributed significantly to analysis and manuscript preparation; Z.P., J.T. and T.T. performed the data analyses and wrote the manuscript; Z.P. and J.T. helped perform the analysis with constructive discussions. All authors have read and agreed to the published version of the manuscript.  Figure 33 shows that the proposal method has a strong adaptability to rotation, and a rotation of 4 • had few effects on its accuracy. But with the increase in the degree of rotation, the accuracy of contour word decreased rapidly.

Conclusions and Future Work
Skyline localization is an important branch of visual geolocation, especially in the field where the feature points are sparse. When a GPS signal is unavailable or disturbed, skyline localization is an important auxiliary localization method in the field environment, but there is little research on skyline localization in hilly areas. This paper analyzed the difficulties of skyline locating in hilly areas, including lush vegetation and closeness to skyline. Therefore, a new method of skyline localization was proposed in this paper, which includes two new skyline features: Enhanced Angle chain code and skyline lapel point, and a new matching method which incorporates the skyline pyramid and the skyline distance heatmap. The experimental results show that this method has high localization accuracy in hilly areas. The proposed method also has some limitations. When the nearby trees significantly block the skyline, or when the mountain in front of the camera is too close, the accuracy of the method is low.
In the future, we will investigate enhanced lapel point, which adds the ridge line lapel point (the lapel point formed between ridge lines) to existing skyline lapels, so as to improve the accuracy of geolocation. Although the interference of atmospheric environment is not within the scope of this paper, the skyline geolocation under multiple visibilities is important, because the photo will change due to haze and other reasons.
Author Contributions: J.T., X.X. and Z.W. contributed to the conception of the study; Z.P. performed the experiment; Z.P. and J.T. contributed significantly to analysis and manuscript preparation; Z.P., J.T. and T.T. performed the data analyses and wrote the manuscript; Z.P. and J.T. helped perform the analysis with constructive discussions. All authors have read and agreed to the published version of the manuscript.

Conclusions and Future Work
Skyline localization is an important branch of visual geolocation, especially in the field where the feature points are sparse. When a GPS signal is unavailable or disturbed, skyline localization is an important auxiliary localization method in the field environment, but there is little research on skyline localization in hilly areas. This paper analyzed the difficulties of skyline locating in hilly areas, including lush vegetation and closeness to skyline. Therefore, a new method of skyline localization was proposed in this paper, which includes two new skyline features: Enhanced Angle chain code and skyline lapel point, and a new matching method which incorporates the skyline pyramid and the skyline distance heatmap. The experimental results show that this method has high localization accuracy in hilly areas. The proposed method also has some limitations. When the nearby trees significantly block the skyline, or when the mountain in front of the camera is too close, the accuracy of the method is low.
In the future, we will investigate enhanced lapel point, which adds the ridge line lapel point (the lapel point formed between ridge lines) to existing skyline lapels, so as to improve the accuracy of geolocation. Although the interference of atmospheric environment is not within the scope of this paper, the skyline geolocation under multiple visibilities is important, because the photo will change due to haze and other reasons.