A Vector Line Simplification Algorithm Based on the Douglas–Peucker Algorithm, Monotonic Chains and Dichotomy

Abstract: Simplifying linear objects with the traditional Douglas–Peucker (D–P) algorithm can easily produce results that contain self-intersection errors, which limits the application of the D–P algorithm. To solve the self-intersection problem, this paper proposes a new vector line simplification algorithm based on the D–P algorithm, monotonic chains and dichotomy. First, the traditional D–P algorithm is used to simplify the original lines, and the simplified lines are then divided into several monotonic chains. Second, dichotomy is used to locate the intersection positions between monotonic chains efficiently, and the intersecting monotonic chains are processed to remove the self-intersections. Two groups of experimental data are selected from large data sets. The results demonstrate that the proposed method has advantages in algorithmic efficiency and accuracy compared with the D–P algorithm and the Star-shaped algorithm.


Introduction
With the development of remote-sensing technology, sensor technology, and Web 2.0, the large amounts of spatial vector data being obtained pose great challenges for data storage, processing, and transmission. To enhance the capability to process massive spatial vector data, efficient and robust vector data simplification algorithms are urgently needed.
There are many classical methods for simplifying vector data, including the Douglas-Peucker algorithm (D-P algorithm) [1], the Ramer algorithm [2], and other algorithms [3][4][5][6][7][8][9]. The D-P algorithm [1] and the Ramer algorithm [2] use a given distance tolerance to determine which vertices on a line are eliminated or retained. Lang [3] used a perpendicular distance tolerance to filter data, but the method was too time consuming [1]. McMaster [4] presented a conceptual model, based on a sequential set of five procedures, for processing linear digital data; it used the perpendicular distance tolerance proposed by Lang [3] to simplify lines and smoothing techniques to produce the most aesthetically acceptable results. Li [5] developed an algorithm for compressing digital contour data based on selecting local minima and maxima; it was more efficient than the D-P algorithm but produced the same results. Visvalingam and Whyatt [6] used the "effective area" to simplify line features and discussed the influence of rounding errors on a version of the Ramer-Douglas-Peucker algorithm [1,2] for line simplification. Other studies have shown how to make complex geographic information system processing steps robust, precise, and reproducible. To solve the self-intersection problems that arise when the D-P algorithm is used to simplify linear objects, and to improve the efficiency of the algorithm, this paper proposes a new vector line simplification algorithm that combines the D-P algorithm, monotonic chains and dichotomy. The method has four main stages: first, the D-P algorithm is used to process the original lines; second, if the simplified lines have self-intersection problems, the monotonic chain method is used to divide them into monotonic chains; third, dichotomy is used to locate the self-intersection positions of the simplified lines quickly and accurately, resolve the self-intersection problems, and obtain the final result; finally, experiments are conducted, and their results show that the proposed method is more effective and performs better.
The remainder of this paper is organized as follows. The basic theories, methods and steps of the new algorithm are introduced in Section 2. Experimental results and analysis are reported in Section 3. Conclusions are drawn in Section 4.

Methodology
In this section, we first introduce the basic theories of the D-P algorithm, monotonic chains and the dichotomy method, and then describe the basic steps of the improved algorithm in further detail. A flow chart of the proposed method is shown in Figure 1.

Basic Theory of the Douglas-Peucker (D-P) Algorithm
The D-P algorithm is a classic algorithm for curve compression. It simplifies a polyline by deleting non-feature vertices and retaining feature vertices. The basic theory and computational steps of the D-P algorithm are as follows [1,24]:
Step 1: For a curve L composed of N coordinate vertices, write the vertex set as $V = \{v_1, v_2, \ldots, v_i, \ldots, v_N\}$ $(i = 1, 2, \ldots, N)$. First, connect the first vertex $v_1$ and the last vertex $v_N$ to obtain a straight line $L_{v_1 v_N}$. Second, calculate the shortest distances between the remaining vertices $\{v_2, \ldots, v_{N-1}\}$ and the line $L_{v_1 v_N}$, giving the distance set $D = \{D_2, \ldots, D_k, \ldots, D_{N-1}\}$, where $D_k$ is the shortest distance between vertex $v_k$ and $L_{v_1 v_N}$.
Step 2: Select the maximum distance $D_{max} = D_k$ from D. Given a distance threshold $\varepsilon$, if $D_{max} < \varepsilon$, the remaining vertices $\{v_2, \ldots, v_{N-1}\}$ are deleted from $V = \{v_1, v_2, \ldots, v_N\}$, the curve L is compressed into the straight line $L_{v_1 v_N}$, and the D-P algorithm ends. Otherwise ($D_{max} \ge \varepsilon$), the vertex $v_k$ is retained, and V is split at $v_k$ into two subsets $V_t = \{v_1, \ldots, v_k\}$ and $V_s = \{v_k, \ldots, v_N\}$.
Step 3: Repeat Steps 1 and 2 recursively for each of the subsets $V_t$ and $V_s$. The D-P algorithm ends when all of the calculated shortest distances are less than the given distance threshold $\varepsilon$.
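The three steps above can be sketched in Python as one recursive function. This is a minimal illustration, not the authors' implementation. It assumes vertices are `(x, y)` tuples and measures distances to the infinite straight line $L_{v_1 v_N}$ as in Step 1. The names `douglas_peucker` and `point_line_distance` are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def point_line_distance(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the straight line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        # Degenerate line (a == b): fall back to the point-to-point distance.
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # |cross(b - a, p - a)| / |b - a|
    return abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / length


def douglas_peucker(points: List[Point], eps: float) -> List[Point]:
    """Return the vertices kept by the D-P algorithm for distance threshold eps."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Step 1: distances D_k of the interior vertices to the line v_1 v_N.
    d_max, k = -1.0, 0
    for idx in range(1, len(points) - 1):
        d = point_line_distance(points[idx], first, last)
        if d > d_max:
            d_max, k = d, idx
    # Step 2: if D_max < eps, every interior vertex is deleted.
    if d_max < eps:
        return [first, last]
    # Step 3: keep v_k and recurse on V_t = v_1..v_k and V_s = v_k..v_N.
    left = douglas_peucker(points[: k + 1], eps)
    right = douglas_peucker(points[k:], eps)
    return left[:-1] + right  # v_k ends V_t and starts V_s; keep it once
```

For example, `douglas_peucker([(0, 0), (1, 0.1), (2, 3), (3, 0.1), (4, 0)], 1.0)` keeps only the peak: `[(0, 0), (2, 3), (4, 0)]`.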
Dichotomy: Dichotomy (binary search) is one of the most commonly used search algorithms for ordered sequences and has high search efficiency [12,13]. Given a target element t and an ordered sequence $K = \{k_1, k_2, \ldots, k_i, \ldots, k_U\}$ $(i = 1, 2, \ldots, U)$ with $t \in K$, the basic theory of searching for t in K by dichotomy is as follows:
Step 1: Compare t with the intermediate element $k_{\lceil U/2 \rceil}$ of K. If $t = k_{\lceil U/2 \rceil}$, the search ends. Otherwise, K is divided into two parts: $K_1$, the elements before $k_{\lceil U/2 \rceil}$, and $K_2$, the elements after it.
Step 2: If $t > k_{\lceil U/2 \rceil}$, execute Step 1 on $K_2$ until t is found in the ordered $K_2$. If $t < k_{\lceil U/2 \rceil}$, execute Step 1 on $K_1$ until t is found in the ordered $K_1$.
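A minimal iterative sketch of the dichotomy search follows. It assumes an ascending Python sequence and 0-based indices, and the function name `dichotomy_search` is hypothetical. Python's standard `bisect` module provides the same halving search in practice.

```python
from typing import Optional, Sequence


def dichotomy_search(seq: Sequence[float], target: float) -> Optional[int]:
    """Return the 0-based index of target in the ascending sequence seq, or None."""
    lo, hi = 0, len(seq) - 1
    while lo <= hi:
        mid = (lo + hi) // 2      # the intermediate element of the current range
        if seq[mid] == target:
            return mid
        if target > seq[mid]:
            lo = mid + 1          # continue in the upper part K_2
        else:
            hi = mid - 1          # continue in the lower part K_1
    return None                   # only reached if target is not in seq
```

Each comparison halves the remaining range, so at most about $\log_2 U + 1$ comparisons are needed.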
Monotonic chain: In vector spatial data structures, a simple curve is composed of a number of line segments. A curve L with N coordinate vertices $P = \{p_1, p_2, \ldots, p_i, \ldots, p_N\}$ $(i = 1, 2, \ldots, N)$ is composed of the line segments $p_1p_2, p_2p_3, \ldots, p_{N-1}p_N$. Figure 2 shows a curve L composed of 26 coordinate vertices (0, 1, 2, ..., 25). In the Gauss-Krueger plane rectangular coordinate system, the horizontal axis is the Y-axis and the vertical axis is the X-axis. Along the Y-axis, L can be divided into two monotonic chains, $L'_i$ $(i = 0, 1, 2, \ldots, 13)$ and $L'_j$ $(j = 13, 14, \ldots, 25)$. Along the Y-axis, vertex $p_0$ of $L'_i$ is the smallest and vertex $p_{13}$ is the largest, so $L'_i$ is a monotonically increasing chain. For $L'_j$, vertex $p_{13}$ is the largest and vertex $p_{25}$ is the smallest, so $L'_j$ is a monotonically decreasing chain. A chain that is strictly monotonic along one axis cannot intersect itself. Therefore, any self-intersection produced when the D-P algorithm processes curve L must occur between different monotonic chains, here $L'_i$ and $L'_j$.
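The division of a curve into monotonic chains can be sketched as a single pass over consecutive vertices. The sketch makes three assumptions. Vertices are `(x, y)` tuples whose first coordinate lies on the horizontal axis (the Y-axis in the Gauss-Krueger system). Consecutive chains share their turning vertex, as $p_{13}$ does in Figure 2. A zero change along the axis continues the current direction. The names `split_monotonic_chains` and `_sign` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _sign(v: float) -> int:
    """Return +1, -1 or 0 according to the sign of v."""
    return (v > 0) - (v < 0)


def split_monotonic_chains(points: List[Point]) -> List[List[Point]]:
    """Split a polyline into chains monotonic along the horizontal axis (p[0])."""
    if len(points) < 2:
        return [list(points)]
    chains: List[List[Point]] = []
    current: List[Point] = [points[0]]
    direction = 0  # +1 increasing, -1 decreasing, 0 not yet known
    for a, b in zip(points, points[1:]):
        step = _sign(b[0] - a[0])
        if direction == 0:
            direction = step
        elif step != 0 and step != direction:
            # a is a turning vertex: it closes this chain and opens the next one.
            chains.append(current)
            current = [a]
            direction = step
        current.append(b)
    chains.append(current)
    return chains
```

For example, `split_monotonic_chains([(0, 0), (1, 1), (2, 0), (1, -1), (0, 0.5)])` returns an increasing chain `[(0, 0), (1, 1), (2, 0)]` and a decreasing chain `[(2, 0), (1, -1), (0, 0.5)]` that share vertex `(2, 0)`.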

The New Vector Line Simplification Algorithm Based on the D-P Algorithm, Monotonic Chains and Dichotomy
This paper uses monotonic chains and dichotomy to solve the self-intersection problem that arises when spatial lines are simplified by the D-P algorithm. In the proposed method, we first use the D-P algorithm to simplify the original polyline M and obtain the simplified polyline T. Second, we check T for self-intersections. If T has none, the method ends; otherwise, we use the monotonic chain technique to divide T quickly into several sequential monotonic chains. Third, dichotomy, the MER (minimum-area enclosing rectangle, i.e., the rectangle with the smallest area that encloses the polyline) and geometric calculation are used to process the sequential monotonic chains. This quickly locates the self-intersections between them, which are then resolved to obtain the final result.
This strategy not only takes the curve characteristics of a polyline into account but also reduces the time consumption of the method. The main steps of the proposed method are as follows.
Step 1: Use the D-P algorithm to process a curve M (M itself contains no self-intersection errors) and obtain a new curve T.
Step 2: Check whether T has self-intersection errors. If it does, go to Step 3; otherwise, T is the final result of line simplification.
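Step 2 needs a self-intersection test, which the text does not specify. The following is an assumed brute-force sketch that checks every pair of non-adjacent segments of T with the standard orientation test. It counts touching and collinear overlap as intersections. Its $O(N^2)$ cost for N vertices is acceptable only for illustration. All function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _orient(a: Point, b: Point, c: Point) -> float:
    """Twice the signed area of triangle abc (> 0 for a counter-clockwise turn)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _within_box(a: Point, b: Point, p: Point) -> bool:
    """For p collinear with ab: True if p lies on the segment ab."""
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def segments_intersect(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """Orientation test for closed segments p1p2 and q1q2."""
    d1, d2 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    d3, d4 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True  # proper crossing
    # Collinear or touching cases: an endpoint lies on the other segment.
    return ((d1 == 0 and _within_box(q1, q2, p1))
            or (d2 == 0 and _within_box(q1, q2, p2))
            or (d3 == 0 and _within_box(p1, p2, q1))
            or (d4 == 0 and _within_box(p1, p2, q2)))


def has_self_intersection(points: List[Point]) -> bool:
    """Check all pairs of non-adjacent segments of a polyline."""
    n_seg = len(points) - 1
    closed = n_seg >= 3 and points[0] == points[-1]
    for i in range(n_seg):
        for j in range(i + 2, n_seg):  # segment i + 1 shares a vertex with i
            if closed and i == 0 and j == n_seg - 1:
                continue  # first and last segments share the closing vertex
            if segments_intersect(points[i], points[i + 1], points[j], points[j + 1]):
                return True
    return False
```

For example, the "bow tie" `[(0, 0), (2, 2), (2, 0), (0, 2)]` is reported as self-intersecting, while the open square `[(0, 0), (1, 0), (1, 1), (0, 1)]` is not.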
Step 3: If Step 2 finds self-intersection errors in T, use the monotonic chain technique (described in Section 2.2) to divide T, following the order of its coordinate vertices, into several sequential monotonic chains $T_1, T_2, \ldots, T_i, \ldots, T_j, \ldots, T_r$ $(i, j \in [1, r])$.
Step 4: Let the monotonic chains $T_i$ and $T_j$ contain n and m coordinate vertices, respectively. If $n \ge m$, use dichotomy to divide $T_i$ into two monotonic chains $L_{1,t}$ and $L_{t,n}$, where $t = n/2$ when n is even and $t = \lfloor n/2 \rfloor + 1$ when n is odd (n is an integer and $n > 1$). Similarly, if $n < m$, divide $T_j$ into two monotonic chains $S_{1,t}$ and $S_{t,m}$, where $t = m/2$ when m is even and $t = \lfloor m/2 \rfloor + 1$ when m is odd (m is an integer and $m > 1$). Every part of a monotonic chain is itself monotonic, so $L_{1,t}$, $L_{t,n}$, $S_{1,t}$ and $S_{t,m}$ are also monotonic chains.
Step 5: If $n \ge m$, calculate the MERs of $L_{1,t}$, $L_{t,n}$ and $T_j$, denoted $R_{L_{1,t}}$, $R_{L_{t,n}}$ and $R_{T_j}$, respectively. Similarly, if $n < m$, calculate the MERs of $S_{1,t}$, $S_{t,m}$ and $T_i$, denoted $R_{S_{1,t}}$, $R_{S_{t,m}}$ and $R_{T_i}$, respectively.
Step 6: Compare $R_{L_{1,t}}$ and $R_{L_{t,n}}$ with $R_{T_j}$. A chain lies inside its MER, so disjoint MERs guarantee disjoint chains. If $R_{L_{1,t}} \cap R_{T_j} = R_{L_{t,n}} \cap R_{T_j} = \emptyset$, then $T_i$ and $T_j$ do not intersect. If $R_{L_{1,t}} \cap R_{T_j} \neq \emptyset$ and $R_{L_{t,n}} \cap R_{T_j} = \emptyset$, then $L_{1,t}$ may intersect $T_j$, and $L_{t,n}$ does not intersect $T_j$. If $R_{L_{1,t}} \cap R_{T_j} = \emptyset$ and $R_{L_{t,n}} \cap R_{T_j} \neq \emptyset$, then $L_{t,n}$ may intersect $T_j$, and $L_{1,t}$ does not intersect $T_j$. If $R_{L_{1,t}} \cap R_{T_j} \neq \emptyset$ and $R_{L_{t,n}} \cap R_{T_j} \neq \emptyset$, then both $L_{1,t}$ and $L_{t,n}$ may intersect $T_j$. The same method applied to $R_{S_{1,t}}$, $R_{S_{t,m}}$ and $R_{T_i}$ determines whether $S_{1,t}$, $S_{t,m}$ and $T_i$ may intersect.
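Steps 4 to 6 can be sketched as a recursive filter. It halves the chain with more vertices and discards any half whose rectangle does not overlap the other chain's rectangle. The sketch rests on two assumptions not stated in the text. First, the MER is replaced by the axis-aligned enclosing rectangle, which is cheap to compute and still contains the chain. Second, the halving is repeated until single segments remain, since the text does not give Step 7. The sketch returns only candidate segment pairs that may intersect; the exact test is left to the geometric calculation. All names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
Segment = Tuple[Point, Point]


def mer(chain: List[Point]) -> Rect:
    """Axis-aligned enclosing rectangle of a chain (assumed stand-in for the MER)."""
    xs = [p[0] for p in chain]
    ys = [p[1] for p in chain]
    return min(xs), min(ys), max(xs), max(ys)


def rects_overlap(r: Rect, s: Rect) -> bool:
    """False only when the two rectangles are disjoint."""
    return not (r[2] < s[0] or s[2] < r[0] or r[3] < s[1] or s[3] < r[1])


def halve(chain: List[Point]) -> Tuple[List[Point], List[Point]]:
    """Step 4: split an n-vertex chain (n > 2) at the 1-based vertex t.

    t = n/2 for even n and floor(n/2) + 1 for odd n; both halves keep vertex t.
    """
    n = len(chain)
    t = n // 2 if n % 2 == 0 else n // 2 + 1
    return chain[:t], chain[t - 1:]


def candidate_pairs(ti: List[Point], tj: List[Point]) -> List[Tuple[Segment, Segment]]:
    """Steps 4-6 applied recursively; both chains need at least two vertices."""
    # Step 6: disjoint rectangles mean the chains cannot intersect.
    if not rects_overlap(mer(ti), mer(tj)):
        return []
    if len(ti) == 2 and len(tj) == 2:
        return [((ti[0], ti[1]), (tj[0], tj[1]))]
    # Step 4: bisect T_i when n >= m, otherwise bisect T_j.
    if len(ti) >= len(tj):
        first, second = halve(ti)
        return candidate_pairs(first, tj) + candidate_pairs(second, tj)
    first, second = halve(tj)
    return candidate_pairs(ti, first) + candidate_pairs(ti, second)
```

For example, with `ti = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]` and `tj = [(2.5, -1), (2.5, 1)]`, only the segment `((2, 0), (3, 0))` survives the rectangle tests. Because most halves are discarded early, far fewer segment pairs need the exact intersection test.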
Step 8: After Steps 1 to 7, all the intersection problems between the monotonic chains have been found. The following example shows how the proposed method resolves them.
Figure 3a shows a curve T with 25 coordinate vertices, obtained by the D-P algorithm. Steps 2 and 3 yield three monotonic chains, $T_1$, $T_2$ and $T_3$ (Figure 3b). $T_1$ contains six coordinate vertices ($P_1, \ldots, P_6$), with end vertices $P_1$ and $P_6$. $T_2$ contains nine coordinate vertices ($P_6, \ldots, P_{14}$), with end vertices $P_6$ and $P_{14}$. $T_3$ contains 12 coordinate vertices ($P_{14}, \ldots, P_{25}$), with end vertices $P_{14}$ and $P_{25}$. Steps 4 to 7 find one intersection problem between $T_1$ and $T_3$ and another between $T_2$ and $T_3$.
Take Figure 3c as an example, and assume that Step 6 has found one intersection problem between $T_1$ and $T_3$. Geometric calculation [12,25] gives the intersecting line segments $K_{5,6}$ and $K_{17,18}$ and their coordinate vertices $P_5$, $P_6$ and $P_{17}$, $P_{18}$. Suppose the original curve M has coordinate vertices $v_p, v_{p+1}, \ldots, v_i, \ldots, v_q$ $(i \in [p, q]$, where p and q are integers) between $P_{17}$ and $P_{18}$. Calculate the shortest distance from each of these vertices to the line segment $K_{17,18}$, and find the maximum value $D_{max}$ and the corresponding vertex $P_i$.
Connect $P_{17}P_i$ and $P_iP_{18}$ to obtain two new monotonic chains, $T_{17i}$ and $T_{i18}$, and check whether either of them intersects the monotonic chain $T_1$. If neither does, this repair ends, and the segment $K_{17,18}$ of $T_3$ is replaced by the two new monotonic chains $T_{17i}$ and $T_{i18}$. If intersection errors remain, re-execute Steps 8 and 9 until there is no intersection error between $T_1$ and $T_3$.
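The repair of $K_{17,18}$ can be sketched as a recursion. It assumes that the original vertices of M lying between $P_{17}$ and $P_{18}$ are available in order. For brevity it also counts only proper crossings, not touching endpoints, as intersections. The sketch inserts the farthest original vertex $P_i$ and repairs each half again while it still crosses the other chain. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def dist_to_segment(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from p to the line segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    denom = dx * dx + dy * dy
    if denom == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Project p onto ab and clamp the projection parameter to the segment.
    u = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / denom))
    return math.hypot(p[0] - (a[0] + u * dx), p[1] - (a[1] + u * dy))


def _orient(a: Point, b: Point, c: Point) -> float:
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def properly_cross(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True if the segments cross at a point interior to both."""
    return (_orient(q1, q2, p1) * _orient(q1, q2, p2) < 0
            and _orient(p1, p2, q1) * _orient(p1, p2, q2) < 0)


def crosses_chain(path: List[Point], chain: List[Point]) -> bool:
    """True if any segment of path properly crosses any segment of chain."""
    return any(properly_cross(path[i], path[i + 1], chain[j], chain[j + 1])
               for i in range(len(path) - 1)
               for j in range(len(chain) - 1))


def repair_segment(a: Point, b: Point, between: List[Point],
                   other: List[Point]) -> List[Point]:
    """Replace segment ab (e.g. K_{17,18}) by a path through original vertices.

    `between` lists, in order, the vertices of M lying between a and b;
    `other` is the chain that ab intersects (e.g. T_1).
    """
    if not between or not crosses_chain([a, b], other):
        return [a, b]
    # Original vertex P_i with the maximum distance D_max to segment ab.
    k = max(range(len(between)), key=lambda i: dist_to_segment(between[i], a, b))
    p_i = between[k]
    left = repair_segment(a, p_i, between[:k], other)        # like T_{17i}
    right = repair_segment(p_i, b, between[k + 1:], other)   # like T_{i18}
    return left[:-1] + right  # P_i ends left and starts right; keep it once
```

For example, `repair_segment((0, 0), (4, 0), [(1, 1), (2, 3), (3, 1)], [(2, -1), (2, 1)])` inserts the farthest vertex `(2, 3)` and returns `[(0, 0), (2, 3), (4, 0)]`, which no longer crosses the vertical chain.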
Similarly, if the original curve M has coordinate vertices between $P_5$ and $P_6$, re-execute Steps 8 and 9 until there is no intersection error between $T_1$ and $T_3$.
Steps 4 to 8 are executed until no intersection errors remain between any of the monotonic chains, giving the final result $T'$. Figure 3d shows the final result; $k_i$ and $k_j$ are two coordinate vertices from the original curve M.
Step 9: Once Steps 1 to 8 have resolved all of the intersection problems, the proposed method ends and outputs the final result.

Experiments and Analysis
We selected two groups of experimental data to verify the validity of the proposed algorithm. The first group is the road line data of Jiangxi Province, China. Its total length is approximately $1.56 \times 10^5$ km, and its data volume is approximately 92,000 bytes, including approximately $5.13 \times 10^6$ vertices. The second group is the land use line data of Dingnan County, Jiangxi Province, China. Its total length is approximately $1.41 \times 10^4$ km, and its data volume is approximately 26,000 bytes, including approximately $1.24 \times 10^6$ vertices.

Assessment
In this study, we simplified the two groups of data with several methods to evaluate the performance of the proposed method. The ST algorithm [23] is also based on the D-P algorithm and can solve self-intersection problems, so we compared the proposed method with both the ST algorithm and the D-P algorithm. The scale of the experimental data is 1:10,000, and the target proportions of the original vertices are 60% and 70%, respectively. Both groups of data are large, which makes it difficult to show further details of the complete results. Instead, we selected six self-intersection problems from the two groups of data, all processed in the same experimental environment. The simplified results for these six cases are shown in Figure 4.

As shown in Figure 4, the D-P algorithm produced self-intersection problems in each group of data, whereas the proposed method resolved them as well as the ST algorithm did. Four metrics were selected to compare the performance of the methods: time consumption, mean vector displacement [3,26], Hausdorff distance (HD) [27], and standardized measure of displacement (SMD) [28]. Time consumption indicates how much time an algorithm takes. Mean vector displacement is the average displacement between each original vertex and its counterpart in the simplified version.
The Hausdorff distance (HD) between two geometric objects is the greatest distance from a point on either object to the nearest point on the other object [27].
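The two distance-based metrics can be sketched as follows, under stated assumptions. Mean vector displacement is taken here as the average distance from each original vertex to the simplified polyline. HD is estimated from vertex-to-polyline distances in both directions, which gives a lower bound on the exact polyline Hausdorff distance. Libraries such as Shapely also provide a Hausdorff distance. Both polylines need at least two vertices, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def dist_to_segment(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from p to the line segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    denom = dx * dx + dy * dy
    if denom == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    u = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / denom))
    return math.hypot(p[0] - (a[0] + u * dx), p[1] - (a[1] + u * dy))


def dist_to_polyline(p: Point, line: List[Point]) -> float:
    """Shortest distance from p to any segment of a polyline."""
    return min(dist_to_segment(p, line[i], line[i + 1]) for i in range(len(line) - 1))


def mean_displacement(original: List[Point], simplified: List[Point]) -> float:
    """Assumed definition: mean distance of original vertices to the simplified line."""
    return sum(dist_to_polyline(p, simplified) for p in original) / len(original)


def hausdorff_vertices(a: List[Point], b: List[Point]) -> float:
    """Symmetric HD estimated from vertices only (a lower bound on the exact value)."""
    h_ab = max(dist_to_polyline(p, b) for p in a)
    h_ba = max(dist_to_polyline(q, a) for q in b)
    return max(h_ab, h_ba)
```

For example, simplifying `[(0, 0), (1, 1), (2, 0)]` to `[(0, 0), (2, 0)]` gives a mean displacement of 1/3 and an HD of 1.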
Standardized measure of displacement (SMD) is defined by Joao [28]. Its calculation formula uses two quantities. W is the distance from the coordinate vertex with the maximum displacement between the original and simplified polylines to the straight line connecting the first and last nodes of the polyline. O is the actual displacement of the coordinate vertices between the original polyline and the simplified polyline.
For the first group of data, the mean vector displacements of the D-P algorithm, the proposed method and the ST algorithm are 6.79 m, 6.57 m, and 7.83 m, respectively. For the second group of data, they are 4.92 m, 4.63 m, and 5.81 m, respectively. For each group of data, the mean vector displacement of the proposed method is similar to that of the D-P algorithm and clearly lower than that of the ST algorithm.
(4) Figure 7 shows the Hausdorff distances of the three methods for the two groups of data. For the first group of data, the Hausdorff distances of the D-P algorithm, the proposed method and the ST algorithm are 6.25 m, 6.08 m, and 6.85 m, respectively. For the second group of data, they are 4.75 m, 4.68 m, and 5.23 m, respectively. For each group of data, the Hausdorff distance of the proposed method is similar to those of the D-P and ST algorithms and is the smallest of the three.
(5) We also used the standardized measure of displacement (SMD) to measure location accuracy. As shown in Figure 8, for the first group of data, the SMDs of the D-P algorithm, the proposed method and the ST algorithm are 3.46%, 3.58%, and 4.25%, respectively. For the second group of data, they are 1.83%, 1.79%, and 2.81%, respectively. For each group of data, the SMD of the proposed method is similar to that of the D-P algorithm and lower than that of the ST algorithm.

Figure 1. The flowchart of the proposed method.


Figure 2. A schematic chart of the monotonic chain.


Figure 3. (a) The curve T obtained by the D-P algorithm; (b) the three monotonic chains $T_1$, $T_2$ and $T_3$ obtained by the monotonic chain technique; (c) the minimum-area enclosing rectangles (MERs) of $T_1$ and $T_3$; (d) the final result $T'$.


Figure 4. The six simplified results of the three methods for the two groups of data. (a) The three self-intersection problems identified in the first group of data; (b) the three self-intersection problems identified in the second group of data. Notes: DP algorithm is the Douglas-Peucker algorithm proposed by Douglas and Peucker [1]; ST algorithm is the star-shaped algorithm proposed by Wu and Marquez [23].