A Novel Parallel Algorithm with Map Segmentation for Multiple Geographical Feature Label Placement Problem

: Multiple geographical feature label placement (MGFLP) is an NP-hard problem that can negatively inﬂuence label position accuracy and the computational time of the algorithm. The complexity of such a problem is compounded as the number of features for labeling increases, causing the execution time of the algorithms to grow exponentially. Additionally, in large-scale solutions, the algorithm possibly gets trapped in local minima, which imposes signiﬁcant challenges in automatic label placement. To address the mentioned challenges, this paper proposes a novel parallel algorithm with the concept of map segmentation which decomposes the problem of multiple geographical feature label placement (MGFLP) to achieve a more intuitive solution. Parallel computing is then utilized to handle each decomposed problem simultaneously on a separate central processing unit (CPU) to speed up the process of label placement. The optimization component of the proposed algorithm is designed based on the hybrid of discrete differential evolution and genetic algorithms. Our results based on real-world datasets conﬁrm the usability and scalability of the algorithm and illustrate its excellent performance. Moreover, the algorithm gained superlinear speedup compared to the previous studies that applied this hybrid algorithm.


Introduction
Feature label placement is a fundamental step in map production that can precisely visualize the information and has a significant influence on reader perception [1]. Geographical features are classified into three categories: point features, line features, and area features [2]. Hence, map labels are required to be clear, aesthetic, and legible to present geographical information accurately [3]. For this objective, a set of rules are assembled, including aesthetic criteria, prevention of overlapping or obscuring other features or labels, legibility, and avoidance of ambiguity between a label and its corresponding feature [4]. Therefore, high-quality feature labeling is a long-standing aim of map visualization that requires a comprehensive algorithm to place feature labels according to cartographic standardization [5].
Studies have determined that label placement takes over 50% of map production time overall [6]. However, shifting from manual labeling to automatic labeling decreased the overall time of map labeling significantly. Even with the great reduction in processing time, cartographers still encounter serious difficulties in the sense of good positioning of labels of features in a lower computational time [7]. In addition, feature label placement is an NP-hard problem. By adding to the number of features, the complexity of the label placement problem increases, which causes an exponential increase in the execution time of the algorithm [8,9]. Therefore, the probability of finding the optimal solution is almost zero in polynomial time due to the complexity of the automatic label placement. However, extensive algorithms are able to exploit part of the optimal solution in some cases [2].
In the present study, a novel parallel algorithm with the concept of map segmentation is introduced for automatic label placement, which aims to reduce the computational complexity of automatic label placement and enhance the quality of feature label placement. The concept of map segmentation is to decompose the input map into several subdomains to decrease the complexity of the problem. Then, each subdomain is assigned on a separate CPU to speed up the process of label positioning by handling each decomposed problem concurrently. Following, to find the ideal solution from generated positions, the hybrid of discrete differential evolution and genetic algorithms is applied in this proposed algorithm. The obtained results indicate that the proposed algorithm has great performance in label placement accuracy; moreover, the algorithm achieved superlinear speedup compared to the most recent studies that applied the same hybrid algorithm to label multiple geographical features.

Multiple Geographical Feature Label Placement
The procedure of automatic label placement of a 2D map is generally divided into three steps: position generation, position evaluation, and selection of the optimal positions. In the candidate position generation step, a set of candidate positions are generated for each feature label. For instance, four and eight positions are commonly used for point features; however, in this study, the number of candidate positions selected is 24. In the evaluation step, each generated position is evaluated based on the spatial relationship of the label with the corresponding feature. In the last step, an optimal position is selected for each feature label, as determined by the evaluation step [10].
Automatically generating candidate positions for feature labels requires highly complex computation; thus, an advanced algorithm can greatly improve the procedure of automatic label placement. In point feature cartographic label placement (PFCLP), most of the point features must be labeled if overlapping is not allowed. If the overlapping is allowed, all point features must be labeled [11]. Two standard models of positioning labels for PFCLP have been employed. The first model is the fixed-position model, in which each point has a number of predefined positions. To illustrate, a mathematical approach was proposed based on integer programming to address PFCLP [12]. Stochastic optimization of simulated annealing was applied on a dataset with 120 points and resulted in 42 label conflicts [13]. In addition, a handful of heuristic search algorithms have been introduced based on the idea of maximal independent vertex set problem (MIVSP) and mathematical formulation [14][15][16]. These approaches enhanced the quality of map labeling considerably. On the other hand, several studies have been implemented based on the concept of conflict graphs in addition to heuristics and metaheuristics. A conflict graph consists of nodes (V) and edges (E). The nodes were allocated for the candidate positions and edges were allocated for the possible conflict between labels. To eliminate label conflict, an additional step was added to cut the edges of the conflict graph [17,18]. Furthermore, a multicriteria optimization model for a high-quality point feature labeling was put forward that comprehensively considered all cartographic requirements [19]. Consequently, to better consider the avoidance of ambiguity of label association on the map with high label density, a new approach was presented for PFCLP [20]. The second model of point label positioning is the sliding model, where the labels move around point features until finding an optimal location for label position [21].
Apart from point features, line features illustrate linear infrastructure on the map (roads, railways, rivers, etc.). Two general rules have been considered for line features: placing the labels alongside the line features [22] and positioning the labels within the line features such as a city street [23]. The rules are defined for better readability of the labels and to place the labels more effectively and with higher label quality. Three evaluation metrics for labels of line features, additionally, were introduced: the relationship of labels to the swath line, the flatness that measures the degree of curvature of the swath line, and centeredness and aboveness that measure the relationship of labels with their corresponding features [24][25][26]. Along with point and line features, another challenging problem of cartographic visualization is the label placement of area features. Labels of area features are positioned within or outside the features. If the length and width of area features are large enough, the labels are placed within the features. Otherwise, the labels are placed outside the boundary of features, as is done for the point features [27][28][29]. Moreover, an artificial neural network was implemented for labeling area features, and in the optimal configuration, 80% of the labels were positioned correctly [30].
Even though many attempts have been made on each type of geographic feature label placement problem independently and well studied, in contrast, fewer studies have been carried out for labeling multiple geographical features as a cross-layer. A new approach for multiple geographical feature label placement problems was put forward based on the hybrid of discrete differential evolution and genetic algorithms (DDEGA) for labeling features as a cross-layer [31]. The algorithm was applied on three different datasets, with multiple geographical features. The first map includes 71 features, 39 area features, 17 lines, and 15 point features, and the second map contains 56 points and 10 line features. The achieved result based on the first map has 33 conflicts in total and 8 label-feature conflicts on the second dataset. However, the ideal solution cannot be achieved in less computational time by the DDEGA algorithm. More recently, the concept of two degrees of freedom space for label placement was introduced (DDEGA-NM) that improved the procedure of candidate position generation extensively for multiple geographical features [32]. The algorithm was implemented on the same multiple geographical feature dataset as was used by the DDEGA algorithm and obtained a result with 22 label-feature conflicts, an increase of 11 labels free of conflict. Nevertheless, the main challenges of the algorithm remain unaddressed, including crucially having long computational time and trapping in local minima due to a large number of solutions, and it is extremely difficult to achieve the ideal solution in a smaller iteration cycle. Hence, determining the optimal label position and resolving conflicts require intensive computational processing.
In view of the above studies, it is clearly evident from the literature review that most previous studies have been carried out on point feature label placement problems. In addition, some studies exist on label positioning of multiple geographical features; however, obtaining the solution close to the optimal is extremely difficult due to a set of a large combination of solutions, and most importantly, the execution times of the algorithms are crucially long. Therefore, this paper aims to reduce the complexity of automatic label placement by decomposing the given problem into a set of subproblems and then assigning each decomposed problem on a separate CPU to run the algorithm in parallel, which enables the algorithm to identify the ideal solution with less computational time. The proposed algorithm is applied on the same datasets that were used by DDEGA and DDEGA-NM algorithms to clearly analyze the usability and scalability of the designed algorithm; however, an additional map is added with a larger number of multiple geographical features, consisting of 88 polygons, 20 lines, and 36 point features.

Genetic Algorithm
The genetic algorithm is a search heuristic that is inspired by Charles Darwin, based on the principles of natural selection and genetics. A genetic algorithm consists of genetic operators which form the essential part of the algorithm as a problem-solving strategy. The operators are crossover and recombination, mutation, and selection. This concept has increasingly been applied in a wide range of research in the past decades to solve the complex optimization problem [33,34]. One of the major advantages of the GA is that it can be hybridized with other optimization algorithms to enhance its performance. Despite several benefits, there are some challenges to be considered while designing the algorithm, for instance, selection of initial populations, premature convergence, and selection of proper variants for the operators [35].
The process of natural selection starts with the selection of the fittest individuals from a population. They produce offspring which inherit the characteristics of the parents and will be added to the next generation. If parents have better fitness, their offspring will be better than parents and have a better chance at surviving. This process keeps on iterating and in the end, a generation with the fittest individuals will be found. The fitness function determines how fit an individual is and gives a score to each individual. The probability that an individual will be selected for reproduction is based on its fitness score. Figure 1 illustrates genetic operators. ISPRS Int. J. Geo-Inf. 2021, 10, x FOR PEER REVIEW 4 of 26 several benefits, there are some challenges to be considered while designing the algorithm, for instance, selection of initial populations, premature convergence, and selection of proper variants for the operators [35]. The process of natural selection starts with the selection of the fittest individuals from a population. They produce offspring which inherit the characteristics of the parents and will be added to the next generation. If parents have better fitness, their offspring will be better than parents and have a better chance at surviving. This process keeps on iterating and in the end, a generation with the fittest individuals will be found. The fitness function determines how fit an individual is and gives a score to each individual. The probability that an individual will be selected for reproduction is based on its fitness score. Figure 1 illustrates genetic operators.

Materials and Methods
Map labeling is a combinatorial optimization problem, which is proven to be an NPhard problem, and its computational time is (2 ). The complexity of the correspondent problems is compounded as the number of features for labeling increases, causing the execution time of the algorithms to grow exponentially. Therefore, the optimal solution cannot be found in polynomial time, and considerable computation time is required. However, an essential trade-off between problem complexity and feature label placement quality is needed while designing a comprehensive algorithm. Thereby, to minimize the computational complexity and obtain esthetic results, the concept of map segmentation is proposed in this study to decompose the problem of feature label placement into a set of subproblems, and then each decomposed problem is assigned to a separate CPU to speed up by running the algorithm concurrently. Parallel processing exploits high-performance computing capabilities by utilizing more than one CPU to manage separate parts of an overall task of label positioning simultaneously. A series of studies have been reported to implement parallel processing effectively [36][37][38]. To simplify the notation of map segmentation, the authors present Figures 2 and 3.

Materials and Methods
Map labeling is a combinatorial optimization problem, which is proven to be an NPhard problem, and its computational time is O(2 n ). The complexity of the correspondent problems is compounded as the number of features for labeling increases, causing the execution time of the algorithms to grow exponentially. Therefore, the optimal solution cannot be found in polynomial time, and considerable computation time is required. However, an essential trade-off between problem complexity and feature label placement quality is needed while designing a comprehensive algorithm. Thereby, to minimize the computational complexity and obtain esthetic results, the concept of map segmentation is proposed in this study to decompose the problem of feature label placement into a set of subproblems, and then each decomposed problem is assigned to a separate CPU to speed up by running the algorithm concurrently. Parallel processing exploits high-performance computing capabilities by utilizing more than one CPU to manage separate parts of an overall task of label positioning simultaneously. A series of studies have been reported to implement parallel processing effectively [36][37][38]. To simplify the notation of map segmentation, the authors present Figures 2 and 3.     Based on cartographic convention, labels at the centroids of area features and the midpoints of line features are preferred; in this research, these positions are given higher priority by assigning greater weight values. The candidate positions for area features are generated based on the skeleton lines in a procedure that is similar to the procedure of line features. Therefore, it is presumed that labels are placed at the midpoints of skeleton line and line features during the process of map segmentation, and these two types of features are presented as point features by their centroid and midpoint coordinates. In Figure 4a, there are four features with four candidate positions each, and it can be observed that features 1 and 2 have no direct connection with feature 4. In the traditional model of automatic label placement, during the process of evaluation in the optimization phase, the label conflict between all labels is investigated even though if there is no relationship between some labels, and this leads to longer computational times. However, if the map is split into smaller segments, as shown in Figure 4b, identifying the ideal solution is potentially more instinctive for the algorithm because each segment can be handled independently, which prevents unnecessary computation in the optimization phase of au- Based on cartographic convention, labels at the centroids of area features and the midpoints of line features are preferred; in this research, these positions are given higher priority by assigning greater weight values. The candidate positions for area features are generated based on the skeleton lines in a procedure that is similar to the procedure of line features. Therefore, it is presumed that labels are placed at the midpoints of skeleton line and line features during the process of map segmentation, and these two types of features are presented as point features by their centroid and midpoint coordinates. In Figure 4a, there are four features with four candidate positions each, and it can be observed that features 1 and 2 have no direct connection with feature 4. In the traditional model of automatic label placement, during the process of evaluation in the optimization phase, the label conflict between all labels is investigated even though if there is no relationship between some labels, and this leads to longer computational times. However, if the map is split into smaller segments, as shown in Figure 4b, identifying the ideal solution is potentially more instinctive for the algorithm because each segment can be handled independently, which prevents unnecessary computation in the optimization phase of automatic label placement. priority by assigning greater weight values. The candidate positions for area features are generated based on the skeleton lines in a procedure that is similar to the procedure of line features. Therefore, it is presumed that labels are placed at the midpoints of skeleton line and line features during the process of map segmentation, and these two types of features are presented as point features by their centroid and midpoint coordinates. In Figure 4a, there are four features with four candidate positions each, and it can be observed that features 1 and 2 have no direct connection with feature 4. In the traditional model of automatic label placement, during the process of evaluation in the optimization phase, the label conflict between all labels is investigated even though if there is no relationship between some labels, and this leads to longer computational times. However, if the map is split into smaller segments, as shown in Figure 4b, identifying the ideal solution is potentially more instinctive for the algorithm because each segment can be handled independently, which prevents unnecessary computation in the optimization phase of automatic label placement.   (b) presents that the input map is divided into two segments.

Map Segmentation
While segmenting the map, several criteria must be considered based on the existing problem. In the case of map feature label placement, two influential variables are viable: label conflict and label-feature conflict, which are fundamentally substantial in automatic label placement. All types of geographical features are connected; regardless of whether they are closer or farther, there is a spatial correlation among them [39]. Thereby, it is essential to consider a proper model of map segmentation to partition the input features uniformly and proportionately. Many studies have been conducted on decomposing large-scale data and then assigning each decomposed problem on a separate CPU to gain high-performance computing [40][41][42]. During the process of map segmentation, two fundamental issues have arisen, namely un-uniform and unequal distribution of features in each segment, which cause unload balancing on separate central processing units (CPUs). To cope with the aforementioned issues, map segmentation is the compromise of the initial map segmentation and adjusting phase, in this study.

Initial Map Segmentation
The given map contains a set of features (N) within a specific domain R ⊂ R 2 , noting that the domain refers to the segment. The features are then partitioned in a set of n subdomains R = {R 1 , R 2 , . . . , R n } of N into n subset such that no feature is located more than once in the subdomain. Let n ∈ R be the number of subdomains n < |N|, for each feature f ∈ N and subdomain i with ∀ i = 1, 2, . . . n. A binary variable, then, is defined: 1], where f represents the feature and i is one of the subdomains. If x f ,i = 1, the feature f is located in subdomain i; otherwise, it is not. The equations are defined as follows: Equation (1) is defined to make sure that each feature is positioned only once in one subdomain. Thus, a set of subdomains is defined as R = {R 1 , R 2 , . . . , R n } by Equation (2), and a variable w f is introduced to count the number of features in each segment, as formulated in Equation (3). In some cases, due to the complexity of feature structure, presumably the number of features distributed disproportionately in the initial map segmentation phase is high, which leads to a reduction in the overall performance of the algorithm. One of the substantial steps in parallel processing, nonetheless, is to distribute the amount of workload on each CPU proportionately due to the dependency of processors. Therefore, a tremendous potential of parallel processing can be achieved by balancing the workload among all parallel processing units, greatly improving the performance of the algorithm.

Adjusting the Number of Features
Parallel processing provides high-performance computing capabilities if the input features are divided proportionately in each segment. In a system with multiple processes, there is a very high chance that some processes be idle while others are overloaded. Therefore, the aim of the load balancing is to maintain the load on each processing element such that all the processing elements become neither overloaded nor idle and each processing element ideally has equal load during execution, as shown in Equations (4) and (5). Thus, the proper design of load balancing is crucial in accelerating the process of automatic label placement. To illustrate the correlation between computational time and the workload, the speedup of each segment is formulated as Equation (4), where t P (R i ) is the execution time of the segment R i on processor P, and the workload in each segment is formulated as Equation (5), where S m is the execution time for the segment R n .
To address the issue of unequal distribution of features, the adjusting phase is implemented, which removes some features from one segment and adds them to others. However, deciding which features are better to move from one segment to other segments requires high consideration, including examining which segment has enough space and also which is more suitable to accept these features [43]. To ensure that all segments have an equal number of features after the initial map segmentation, the number of distributed features is counted in every segment based on the maximum and minimum values that are defined; these values define the range value of the workload in each segment and are termed as max v and min v , respectively. Therefore, after the initial map segmentation, the number of distributed features must be adjusted on the basis of load balancing strategy if any segment breaches the defined range value. The adjusting steps are defined as follows: Equation (6) ensures that the number of features in subdomain i does not violate the range value; otherwise, this segment violates the principle of load balancing, and in such a case, some features should be removed from this segment and added to others based on the maximum and minimum values of subdomains. The following equations are set to appoint which features are more appropriate to be moved from one segment to others: In Equation (7), z i − µ j is the Euclidean distance between feature z i , which is a random feature, and other features (µ j ); this procedure continues for all features in order to find a feature that presents the centroid. k is the number of features. The feature that stands as the centroid (C i ) of the considered segment is the feature that has the lowest value of d average(i) . C i − µ j is the Euclidean distance between the feature that is selected as the centroid (C i ) and other features (µ j ), which is used to find the features with longer distances from the extracted centroid of features, using Equation (8). The visualization of Equation (7) is shown in Figure 5. Then, it must be determined which neighboring segments are appropriate to accept the new features.
. .} is a set of features that are identified to be moved from one segment to other segments, as illustrated in Figure 5b. This set of features can be different types of features, as the area features are presented by their centroids and the lines are presented by their midpoints in the map segmentation step.
Equation (6) ensures that the number of features in subdomain does not violate the range value; otherwise, this segment violates the principle of load balancing, and in such a case, some features should be removed from this segment and added to others based on the maximum and minimum values of subdomains. The following equations are set to appoint which features are more appropriate to be moved from one segment to others: In Equation (7), − is the Euclidean distance between feature , which is a random feature, and other features ( ); this procedure continues for all features in order to find a feature that presents the centroid. is the number of features. The feature that stands as the centroid ( ) of the considered segment is the feature that has the lowest value of ( ) . − is the Euclidean distance between the feature that is selected as the centroid ( ) and other features ( ), which is used to find the features with longer distances from the extracted centroid of features, using Equation (8). The visualization of Equation (7) is shown in Figure 5. Then, it must be determined which neighboring segments are appropriate to accept the new features. Suppose = { , , , . . . . } is a set of features that are identified to be moved from one segment to other segments, as illustrated in Figure 5b. This set of features can be different types of features, as the area features are presented by their centroids and the lines are presented by their midpoints in the map segmentation step.  (10); (c) presents, feature has relationship with features ( and ); (d) shows that all three features ( , , and ) have relationships and the connection is evaluated based on angle ( > 45°), (see Equation (11)); (e) illustrates that all three features ( , , and ) have relationships and the connection is evaluated based on angle ( < 45°), (see Equation (12)); (f) shows a condition where the angle ( ) can be greater than or less and equal to 45°, (see Equation (13)).
As shown in Figure 5, three subdomains exist; however, the segment with the centroid has more features in contrast to the two neighboring segments with the centroids and . Some features have to be removed from the segment having the centroid and added to the neighboring segments accordingly. To adjust the distribution of the features, feature types are applied as the basis for devising in the adjusting phase. According to the cartographic standardization for feature labeling, the labels of line and area features are placed at various angles, horizontal, vertical, or along the direction of features. While the label of point features is often positioned horizontally, in this case, the probability of label conflict and label-feature conflicts increases during the process of feature label placement.
In Figure 5, assume the segment with the centroid violated Equation (6), and three features with indices , , and are identified as the features to be removed from the segment with centroid and added to other segments based on Equation (8). To investigate the relationship between features in set = { , , , . . . . }, five factors are considered: the distance between the features, the length of the label box for point feature, the height of the label box, the angle between features, and the buffer distance for the area and line features. Since the position of each label has an influence on the position of neigh- (see Equation (12)); (f) shows a condition where the angle (α) can be greater than or less and equal to 45 • , (see Equation (13)).
As shown in Figure 5, three subdomains exist; however, the segment with the centroid C 1 has more features in contrast to the two neighboring segments with the centroids C 2 and C 3 . Some features have to be removed from the segment having the centroid C 1 and added to the neighboring segments accordingly. To adjust the distribution of the features, feature types are applied as the basis for devising in the adjusting phase. According to the cartographic standardization for feature labeling, the labels of line and area features are placed at various angles, horizontal, vertical, or along the direction of features. While the label of point features is often positioned horizontally, in this case, the probability of label conflict and label-feature conflicts increases during the process of feature label placement.
In Figure 5, assume the segment with the centroid C 1 violated Equation (6), and three features with indices f 1 , f 2 , and f 3 are identified as the features to be removed from the segment with centroid C 1 and added to other segments based on Equation (8). To investigate the relationship between features in set A = { f 1 , f 2 , f 3 , . . .}, five factors are considered: the distance between the features, the length of the label box for point feature, the height of the label box, the angle between features, and the buffer distance for the area and line features. Since the position of each label has an influence on the position of neighboring labels, the aforementioned factors are set to minimize the probability of label conflict. As shown in Figure 5a To study the relationship between features in set (10) is formulized, where L stands for the length of the label box. Note that the length of the label box is calculated according to the number of letters of labels. First, the dimension of each letter is measured based on the scale of the dataset and then multiplied by the number of letters; in this case, the label box generated is approximately the same size as the actual label. d i,j stands for the Euclidean distance between features i and j. If the distance is less than the length of the label box of any of these two features, the features are assumed to be dependent; otherwise, they are assumed to be two independent features. Figure 5b shows that there is no possible relationship between features in set A = { f 1 , f 2 , f 3 , . . .}, according to Equation (10); in such a case, each feature is added in different neighboring segments regardless of other features in set A = { f 1 , f 2 , f 3 , . . .}. However, if there is any dependency between features in set A, as shown in Figure 5c, the feature with index f 3 has relationships with the two other features. On the other hand, none of the neighboring segments can accept all three features. The following equations are defined to identify which segment is appropriate to accept the features from set A by assigning a weight value based on the condition of feature relationships: In Equation (11), if the angle α is greater than 45 degrees, the height of the label box and vertical distance are considered to assign weight values, as presented in Figure 5d. h stands for the height of the label box; V i,j and V i,k are the vertical distances between features, as shown with blue dash line in Figure 5d; and ϕ 1 and ϕ 1 are the weight values. If ϕ 1 > ϕ 1 , the features i and j are grouped as one category and can be moved in the same segment, and feature k is assumed to be an independent feature and can be added to another segment.
In Equation (12), if the angle α is less than or equal to 45 degrees, the length of the label box for point features, buffer distance for the area and line features, and horizontal distances are considered to assign weight values. l presents the length of label box for point feature and buffer distance for the line and area features; H i,j and H j,k are the horizontal distances between features, as shown with a black dash line in Figure 5e; and ϕ 1 and ϕ 1 are the weight values. The greater weight value shows that the features must be grouped as one category and the lower weight value indicates that the dependency of features can be regarded as independent.
Equation (13) describes the condition where the features in set A = { f 1 , f 2 , f 3 , . . .} are located in such a way that the angles α between them are less and greater than or equal to 45 degrees, as shown in Figure 5f. Therefore, these metrics are considered to assign weight values according to feature relationship to minimize the possibility of labels being positioned in conflict, which are the height (h) and length (l) of the label box and the horizontal (H) and vertical (V) distances between features. Ultimately, based on the assigned weight values, the features in set A will be categorized into different groups. Algorithm 1 presents the procedure of map segmentation.

Optimization Algorithm
The optimization algorithm of the proposed method is the hybrid of differential evolution (DDE) and genetic algorithm (GA) (see Section 2.2). This phase of the algorithm is composed of three phases: candidate position generation, finding the ideal solution from a large set of candidate position combinations, and the recombination result of each segment for the final output.

Candidate Position Generation and Quality Evaluation Function
In conformity with general rules of cartographic standardization in automatic label placement, several requirements for candidate position generation should be satisfied. The number of overlaps between labels and between label and feature must be minimized, and the orientation between labels and the corresponding feature must also be clear to maintain the highest probability of readability. Thereby, to realize MGFLP labels more effectively and with higher label quality, the quality evaluation function must consider esthetic properties, legibility, and clearness. Four fundamental quality factors are considered in the evaluation function of the proposed algorithm: • Label-feature conflict factor; • Label conflict factor; • Label position priority factor; • Ambiguity factor.
The input features are denoted as N, and the candidate positions are denoted as ps. So, based on four evaluation factors, the quality function is defined as follows: Equation (14) is the general formula for quality function; ρ 1 (γ) presents the percentage of features being labeled, and ρ 2 (γ) presents the sum of scores of all four quality metrics. In Equation (15), F is the total number of features that are labeled and N is the total number of input features. Equation (16) ensures that the features are labeled. If the value is zero, the feature is labeled; otherwise, the feature is not labeled.
In Equation (17), n is the number of segments, F Over i,j presents label conflict and labelfeature conflict, F Pos i,j presents label position priority, and F Amb i,j presents the ambiguity factor. Our objective is to minimize the value of ρ 2 (γ) function to ensure that the optimal positions are quantitatively evaluated. In this study, the considered quality factors are briefly explained; however, they are discussed in detail in [32].
Equation (18) illustrates label-label and label-feature conflicts, where zero shows no conflict occurring and greater than zero presents conflict. From the perspective of feature importance, the weight of label conflict with point feature is set greater compared to the weights of label conflict with line and area features, as it can be observed in the finding that no labels are in conflict with point features.
Equations (19) and (20) present the label position priority, and it is only considered for point and area features. Equation (19) is defined for label position priority of point features, where all positions in the first quadrant have the highest priority and its priority declines anti-clockwise, which is presented by angle β. Equation (20) is defined for label position priority of area features, where is the distance from the center of the label box to the midpoint of the approximate skeleton line of the area feature. min and max present the minimum and maximum distances of all candidate positions to the feature skeleton line, respectively. The smaller the value F Pos i,j is, the higher the label position priority is.
Equation (21) measures the ambiguity of the label and its corresponding feature; this quality factor was only taken into account for line features, and it calculates the distance of the selected label to the midpoint of the corresponding line feature. D is the distance of the center of the label box to the line feature, and D min and D max are the minimum and maximum distances of all candidate positions to the corresponding feature, respectively.
Algorithm 2 reports that the algorithm starts with the generation of candidate positions for the given features. When the generation of candidate positions is completed, the optimization phase begins with the selection of random initial populations, and then the fitness values of these populations are calculated and sorted. The selection of the population for each iteration is being exerted 30% by the DDE algorithm and 70% by the GA based on the fitness value. For terminating the iteration cycle, two conditions are set. First, if the fitness value of the obtained result is less than a threshold value, the loop iteration is terminated; otherwise, continue. Second, an iteration number is set; the optimization phase terminates after this number of iterations if the fitness value does not meet the first condition. Algorithm 2. Optimization algorithm.
1: Generating candidate Positions 2: set the parameters: mutation and crossover probability, termination (num-gen (t)) 3: initialize the population with pop-size (p) 4: while termination is note satisfied (iter < z) do 5: for the genotype of each individual g i (z) ∈ p(t) do 6: evaluate the fitness of g i (z) 7

Removing Label Conflict after Recombining the Result of Subdomain
To investigate label conflicts after recombining the result of each individual segment, an additional supplementary step is added in the proposed algorithm. Once the labeling process is completed in the second phase of the algorithm in each subdomain, the steps of recombination are implemented according to the two neighboring subdomains. First, the results of two neighbor subdomains are combined, and the rest of them wait until the combination of the two neighboring subdomains is completed; then, the two combined results recombine in the next step with the result of another subdomain. These procedures continue until all subdomains are combined. Figure 6 illustrates the steps of recombining the results of all segments after the optimization phase. until all subdomains are combined. Figure 6 illustrates the steps of recombining the results of all segments after the optimization phase. As it can be seen in Figure 6, on the left side of the figure, small colored boxes represent the final result of each subdomain, and the combination of results is carried out step by step as two neighboring segments. However, after the recombination phase, the presumption of label conflict is potentially high, especially near the boundary of segments, which negatively affects label placement quality, as shown by and in Figure 6. Therefore, after recombination of each two neighboring segments, the following steps (see Equations (22)-(25)) are defined to be implemented in order to eliminate or minimize the number of label conflicts: Equation (22) ensures that label conflict does not occur if the value is zero. Otherwise, label conflict occurs after the recombination of the results of two neighboring segments.
, is the weight value, and Χ , ∈ [0,1] is a variable. , denotes a set of features with labels that are in conflict after recombination with the label of feature . In Equation (23), is the recombination of label positions of two segments. Equation (24) shows that , k t C will be removed from the label list ( ).
presents the positions of labels, and , represents the position of feature having a conflict with another label, and it necessitates the identification of an alternative position that can remove the label conflict. The final equation is formulated as follows: As it can be seen in Figure 6, on the left side of the figure, small colored boxes represent the final result of each subdomain, and the combination of results is carried out step by step as two neighboring segments. However, after the recombination phase, the presumption of label conflict is potentially high, especially near the boundary of segments, which negatively affects label placement quality, as shown by f n and f m in Figure 6. Therefore, after recombination of each two neighboring segments, the following steps (see Equations (22)- (25)) are defined to be implemented in order to eliminate or minimize the number of label conflicts: S l ⇒ X − C k,t = x 1,j , x 2,j , . . . , x N−C,j − y k,t , . . . , y C,t ∀ j,t = 1, 2, 3, . . . ps (24) Equation (22) ensures that label conflict does not occur if the value is zero. Otherwise, label conflict occurs after the recombination of the results of two neighboring segments. ω i,j is the weight value, and X i,j ∈ [0, 1] is a variable. C k,t denotes a set of features with labels that are in conflict after recombination with the label j of feature i. In Equation (23), S l is the recombination of label positions of two segments. Equation (24) shows that C k,t will be removed from the label list (S l ). S l presents the positions of labels, and y k,t represents the position t of feature k having a conflict with another label, and it necessitates the identification of an alternative position that can remove the label conflict. The final equation is formulated as follows: where S LU is the label list updated with new label positions, and S l now presents the label list after removing C k,t . This step is continued in

Computational Results
The experiments were implemented on a machine with Intel Core i5-8265U CPU @1.60 GHz 1.8 GHz, running in Windows 10 Professional × 64 with 16.00 GB RAM installed. The code was written in Python language, using Pycharm framework based on Arcpy template using ArcGIS Pro. Message Passing Interface (MPI) library was used for parallel processing [44,45].
To evaluate the robustness and stability of the proposed algorithm (Parallel-MS), the experiments were implemented on three different real-world datasets, each containing multiple geographical features, as shown in Table 1. The experimental parameters were set as follows: the number of iterations was set to 10,000; crossover rate and mutation rate were 0.5 and 0.01, respectively; and 24 candidate positions were generated for each feature in two degrees of freedom space [32]. The obtained results are compared with the DDEGA algorithm, DDEGA-NM algorithm, and Maplex Label Engine to evaluate the applicability of the Parallel-MS.

Investigating the Performance of the Algorithm Based on Different Numbers of Segmentations
Map segmentation allows us to decompose the problem of map labeling into subtasks that can be managed concurrently. The aim of this experiment is to examine how the number of segments affects label placement quality and the acceleration performance of the algorithm. Five groups of experiments were implemented by varying the number of segments from 2 to 10 with an increment of 2. Each experiment ran five times, and then the best results after five runs for each group of the experiment were selected for comparison, as shown in Tables 2 and 3. The results were analyzed based on four metrics, namely the number of label conflicts, label-feature conflicts, score value, and computation time. The results of the first and second datasets are shown in Tables 2 and 3, respectively.  Tables 2 and 3 show that the computational complexity of the algorithm is reduced as the number of segments increases, enabling the algorithm to find the ideal solution instinctively. However, the number of segments is not infinite; thus, a suitable number must be set for segmentation. Otherwise, the probability of label conflict increases after the recombination of the result of each segment. This occurs when some labels are positioned near the boundary of segmentation, and the position of those labels can cause label conflict, particularly in feature-intensive areas. Due to the dependency of labels of features, the position of one label can affect neighboring labels, which leads to the labels being placed in conflict. As both tables present, label conflict occurred when the number of segments was set to 10. Therefore, increasing the number of segments leads to improvement in the computational performance of the algorithm; however, the score value is negatively influenced after 10 segments, indicating that the number of segments is not infinite. Hence, it is fundamentally important to select an appropriate number for segmentation, and its potential effect should be necessarily considered and evaluated.

Performance Analysis of Parallel Version vs. Serial Version of Map Segmentation with Different Segments
To examine how the concept of map segmentation can reduce the computational complexity, the experimental results were compared based on serial and parallel versions after applying the map segmentation. In the serial version of the algorithm, one CPU was employed to complete the procedure of automatic label placement of each segment in turn. The running times of the parallel version and serial version of the algorithm are shown in Figure 7a. As the figure shows, the execution time of the algorithm is significantly reduced when the number of segments is increased in both versions of the algorithm. Applying parallel computation significantly reduced the overall computation time. There was a 239 min reduction in computational time when the number of segments increased from 2 to 10 in the serial version and a 159 min reduction in the parallel version of the first dataset. On the second dataset, the same performance was achieved, 221 and 153.07 min in serial and parallel versions, respectively. Furthermore, the amount of speed and the ratio of speedup were measured to analyze the performance of the algorithm as well, as shown in Figure 7b. The speedup metric evaluates the amount of speed gained by increasing the number of segments, and the ratio of speedup describes the amount of speed that each CPU can achieve. The results illustrate that decomposing a complex problem into subproblems reduces the computation steps and further leads the algorithm to identify the ideal solution in a small iteration cycle. score value of the algorithm with two segments is the highest; in contrast, 10 segments have the lowest score value. The score has a dramatic decreasing trend from the first iteration to 1000 iterations, while between 1000 and 9000 iterations, the score values show a gradual decline; on the contrary, from 9000 to 10,000, all segments maintained the same value apart from 10 segments, which showed a slight rise. Although 10 segments initially had the lowest score, they outraced all segments at the end of the iteration due to the combination of the result of 10 segments as final output, which caused one label-label conflict. Therefore, this certifies that identifying the ideal solution for map labeling is better facilitated by decomposing the problem of label placement into subproblems. To sum up, the figures demonstrate that applying map segmentation enables the algorithm to reduce the computational complexity of multiple geographical feature label placement; even without parallel computation, the execution time of the algorithm is dramatically decreased. Additionally, the utilization of parallel computing sped up the algorithm linearly. Whereas the computational cost is influenced by increasing the number of features, the advantages of map segmentation and parallel processing become increasingly obvious in large-scale solutions, as they substantially reduce the computational complexity and obtain a solution with high quality.

Comparison Analysis of the Proposed Algorithm with Previous Studies
To quantitatively evaluate the practicality and effectiveness of the Parallel-MS algorithm, the obtained results are compared with previous studies. It should be noted that the basis of candidate position generation of the developed algorithm is the DDEGA-NM algorithm. Three experimental datasets were selected based on large and small scales in order to study the practical application of the Parallel-MS algorithm, and then the results were analyzed extensively in comparison with previous approaches. To begin with, the final result of the first dataset that was obtained by the DDEGA-NM algorithm is shown Apart from the aforementioned factors, the performance of the algorithm has been extensively analyzed in terms of label position accuracy based on the score value of the quality function. The aim is to minimize the value of the quality function in order to retrieve a solution with high quality (see Section 3.2). This experiment includes two metrics, the score values of serial and parallel versions of all three datasets. The figure depicts that the score values are reduced in both versions of the algorithm in response to the increase in segments. However, the score values increased when the number of segments was set to 10, as shown in Figure 7c, which is potentially caused by label conflict after the recombination of individual segments. Thus, 8 was selected as the best number of segments for the current datasets according to the experimental analysis. Furthermore, Figure 7d illustrates the speed of the convergence of score values during the iteration cycle on one dataset. As it can be observed, the higher the number of segments is, the lower the score value is in each iteration. Values along the y-axis show the score values and values along the x-axis represent the number of iterations. At the starting of the iteration, the score value of the algorithm with two segments is the highest; in contrast, 10 segments have the lowest score value. The score has a dramatic decreasing trend from the first iteration to 1000 iterations, while between 1000 and 9000 iterations, the score values show a gradual decline; on the contrary, from 9000 to 10,000, all segments maintained the same value apart from 10 segments, which showed a slight rise. Although 10 segments initially had the lowest score, they outraced all segments at the end of the iteration due to the combination of the result of 10 segments as final output, which caused one label-label conflict.
Therefore, this certifies that identifying the ideal solution for map labeling is better facilitated by decomposing the problem of label placement into subproblems. To sum up, the figures demonstrate that applying map segmentation enables the algorithm to reduce the computational complexity of multiple geographical feature label placement; even without parallel computation, the execution time of the algorithm is dramatically decreased. Additionally, the utilization of parallel computing sped up the algorithm linearly. Whereas the computational cost is influenced by increasing the number of features, the advantages of map segmentation and parallel processing become increasingly obvious in large-scale solutions, as they substantially reduce the computational complexity and obtain a solution with high quality.

Comparison Analysis of the Proposed Algorithm with Previous Studies
To quantitatively evaluate the practicality and effectiveness of the Parallel-MS algorithm, the obtained results are compared with previous studies. It should be noted that the basis of candidate position generation of the developed algorithm is the DDEGA-NM algorithm. Three experimental datasets were selected based on large and small scales in order to study the practical application of the Parallel-MS algorithm, and then the results were analyzed extensively in comparison with previous approaches. To begin with, the final result of the first dataset that was obtained by the DDEGA-NM algorithm is shown in Figure 8, and the result achieved by Parallel-MS algorithm is shown in Figure 9. Figure 8 shows that despite having more overlap, many labels of area features are positioned near the boundary of the area feature, which can cause more ambiguity for the readers, and some labels are placed closely together additionally. Moreover, two point features are covered by their labels. On the contrary, as Figure 9 shows, the result of label placement of the Parallel-MS algorithm is more precise, almost all features labels are positioned at the center of the area features, and no labels have conflicts with point features. Furthermore, the number of label-feature conflicts decreased from 23 to 12, and labels are placed at the most desirable position. Following, the result of the second dataset obtained by the DDEGA-NM algorithm is shown in Figure 10, and the Parallel-MS result is presented in Figure 11. By the same token, some labels are positioned far from their corresponding features, and they even are closer to the other features (see Figure 10), namely "Windy City Diamonds" and "Pizanos Pizza & Pasta". Those labels can cause misunderstanding for readers. In addition, three labels have conflicts with line features, and some labels of point features cover a small portion of their features; apart from this, one line feature is left without a label. The results of the proposed algorithm, conversely, in both parallel and serial versions (Serial-MS) show that most of those issues are solved, labels are placed near their related features, and the number of labels that covered their corresponding features is decreased considerably, as shown in Figure 11. Only one label is in conflict with the line feature. The results of Parallel-MS were also compared with other algorithms, and the statistical results are shown in Tables 4-6.       Figure 11. Result of the second dataset using the proposed algorithm. Figure 11. Result of the second dataset using the proposed algorithm.   Table 5 presents that the comparison begins not only with the DDEGA-NM algorithm but also with other algorithms by examining the performance of each algorithm on the same datasets. GA stands for the result of genetic algorithm, DDE represents discrete differential evolution result, and DDEGA is the hybrid of GA and DDE evolutionary algorithms. The Maplex Label Engine optimization was also implemented, in addition to the mentioned algorithms. The "%" column illustrates the percentage of features being labeled in the final result. In addition, in both tables, some rows are filled with the "//" symbol due to the lack of score value since those algorithms used different quality functions. First and foremost, the DDEGA algorithm iterated 300 iterations in 3060 min with 33 label-feature conflicts, and the DDEGA-NM algorithm iterated 10,000 iterations in 2166 min and gained the result with 22 label-feature conflicts (see Table 4). However, on the same dataset, the serial version of the proposed algorithm iterated 10,000 iterations in 225 minutes, and the parallel version finished the same number of iterations in 20 min with 12 label-feature conflicts.
Compared to previous approaches, the obtained results are much better relative to the results of other methods. Following, the number of label-feature conflicts and label conflicts in GA and DDE is 40, and Maplex achieved a result with 37 label-feature conflicts. The comparison on the second dataset was only implemented between the DDEGA algorithm, DDEGA-NM algorithm, and the proposed algorithm, and the results are presented in Table 4. Similar to the first dataset, the number of iterations was set to 300 for the DDEGA algorithm, the algorithm iterated in 2887 min, and eight labels conflicted with the features. The DDEGA-NM algorithm iterated 10,000 iterations in 2354 min with eight label-feature conflicts, which accelerated the execution time dramatically. The serial and parallel versions of the proposed algorithm, on the other hand, iterated 10,000 iterations in 214 and 16.03 min, respectively; two labels are in conflict with features in the serial version and three labels are in conflict with features in the parallel version. The obtained result of the third dataset by the proposed algorithm is presented in Figure 12, and its statistical analysis is reported in Table 6. Figure 13 is the result achieved by Maplex Label Engine, which has 65 label-feature conflicts. However, the numbers of label-feature conflicts in our approach are 28 and 29 in serial and parallel versions, respectively. Noting that the number of iterations was set to 10,000 for all experiments, the running time of the serial version of our algorithm is 486 min which declined to just 45 min in the parallel version of the algorithm. Among 144 features, 14 features were left without labels using Maplex Label Engine, and 4 and 6 features were left without labels in our serial and parallel algorithms, respectively. Therefore, the tables report that the previous approaches, including Maplex Label Engine, achieved a result in which the number of label-feature conflicts is much higher compared to the parallel and serial versions of map segmentation, the quality of label positioning is much lower compared to the standard labeling quality, and the computational time of other algorithms is crucially longer. Consequently, the proposed algorithm is well improved to solve the problem of MGFLP. On the whole, map segmentation strengthens the algorithm to explore most of the search solutions intuitively. Moreover, taking the approach of parallel computing sped up the algorithm significantly; it gained superlinear speedup compared to the mentioned algorithms.

Discussion
Experimental results demonstrate the usability and scalability of the proposed algorithm and substantially improved geographic feature label placement quality in both of the following aspects: (1) reduced the computational time of the problem by decomposing into smaller scale; (2) most of the labels are positioned at their desired positions. However, all pre-existing studies used the serial version of the algorithm to address the problem of automatic label placement. In this research, the authors extended the procedure of label placement beyond the traditional approaches by considering the concept of map segmentation and the utilization of parallel processing. To illustrate the notion of map segmentation (see Figure 4 in Section 3), four candidate positions are defined for each feature; without map segmentation, 4 4 combinations are required to check all the possible candidate positions combinations, despite there being no direct dependency between features (1, 4), (1,3), and (2,4). In contrast, if the map is segmented into two subdomains, as shown in Figure 4b, the number of steps that check all the solutions for both subdomains is reduced from 4 4 = 256 to 4 2 + 4 2 = 32 combinations. Thus, segmenting the map into several subdomains enables the algorithms to refrain from taking nonessential computational steps, which causes the algorithm to avoid being trapped in local minima and to not iterate through repetitive solutions, as well as reducing the computational time. These are the fundamental components while designing a comprehensive algorithm to cope with a combinatorial optimization problem. To substantially accelerate the speed of the algorithm, each segment is designated on a discrete CPU to facilitate the algorithm to run concurrently, which gained superlinear speedup compared to the previous works, as shown by the results presented in Tables 4-6.
Further, each feature label has 24 candidate positions in the current study; thus, the total number of generated solutions is N 24 , where N is the number of features. Identifying the ideal solution from such a large number of solutions is extremely difficult as one search domain within a reasonable time span and is a fundamental challenge in cartography. However, if the search domain is decomposed into several independent subdomains, the solution can be identified more intuitively with less computational time. While decomposing the problem of map labeling, a key challenge has arisen: label conflict after the recombination of subdomain results. To cope with the mentioned issue, an additional supplementary step is added to the algorithm that investigates the potential label conflict after each step of results recombination and identifies alternative positions for the labels with conflict. Note that all label conflicts are eliminated in all experimental datasets. Conversely, as the results demonstrate, there are still some label-feature conflicts, as shown in Figure 9, namely Seattle, Bremerton, Tacoma, Olympia, Aberdeen, and line I182. The elimination of these conflicts is almost impossible even manually due to the lack of enough space for labels of those features because of intense feature density, and there is no free space around the features to place the labels without label-feature conflict. As the third dataset illustrates, the number of features is 144, which also includes multiple geographical features, and the introduced approach obtained a good result. This result was not compared with DDEGA and DDEGA-NM algorithms, because they would take several days to finish the iteration cycle of 10,000.
There exists a pronounced trend to identify the ideal solution with HPC if a proper number of segments is chosen for the map. Several experiments were implemented to find a proper number for segmentation. As shown in Tables 2 and 3, while the number of segmentation increases, the computational time decreases in serial and parallel versions of the algorithm because the number of computational steps is reduced and the algorithm seeks the solution concurrently. In addition, the probability of finding the new solution in each iteration cycle is higher compared to a complex problem without decomposing, as presented in Tables 2 and 3. Therefore, special attention should be paid to the limited number of segments, as exceeding this limitation might cause more label conflicts in the final result, which are costly to eliminate and negatively influence the algorithm performance.

Conclusions
Multiple geographical feature label placement is a combinatorial optimization problem. Despite a long history of research to solve this problem, there is not any approach to verify the optimal solution to this problem. Several heuristics and metaheuristics have been introduced, and some considered the notion of conflict graph to address the problem of feature label placement. However, most of the heuristic approaches that were applied struggle with serious challenges, the algorithms become trapped in local minima, and the computational time is crucially long. This paper has presented a novel parallel algorithm with the concept of map segmentation to address the automatic label placement problem. To begin with, map segmentation enables the algorithm to decompose the problem of label placement into a smaller scale for coping with the computational intensity of labeling and identifying the ideal solution intuitively; it also improves the scalability and optimizes the performance of the algorithm. Next, parallel computing is applied to handle each segment concurrently on a separate processor to exploit the computational advantages of parallel processing. To generate appropriate candidate positions around points and line features and around or within area features, four quality factors are considered: label conflict, label-feature conflict, label ambiguity, and label position priority. For verifying the efficiency of the proposed algorithm, three datasets were selected for the experimental analysis, and each contained multiple geographical features. The experimental results demonstrate that 14.084% and 7.576% improvements have been achieved in terms of the number of label-feature conflicts for the first and second datasets, respectively, compared to the most recent study [32]. Moreover, 108.3% and 146.288% improvements have been achieved in computational time for the first and second datasets, respectively. By the same token, the number of label-feature conflicts decreased from 65 to 29 compared to Maplex Label Engine. Thus, by adopting the presented algorithm, the cartographers can consider large-scale data for feature label placement.
Furthermore, it must be noted that although the proposed algorithm improved the quality of label placement and accelerated the speed of labeling, it is important to address several limitations in future studies. First and foremost, further study is needed on the application of the proposed algorithm on maps with a high degree of feature density. Following, there is a need for more research on finding a correlation between the number of segments and the input features automatically. In addition, adding more quality metrics in the quality evaluation function for candidate generation can improve the procedure of candidate generation, which decreases the probability of label conflict and label-feature conflict, likewise helping the algorithm to find a better combination of solutions in fewer iterations.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
The data used in this study are available from the authors upon readers request.