Urban Parcel Grouping Method Based on Urban Form and Functional Connectivity Characterisation

The grouping of parcel data based on proximity is a pre-processing step in GIS and a key link in urban structure recognition for regional function discovery and urban planning. Currently, most studies abstract parcels into points and cluster them based on attribute similarity, which produces either a large number of coarse-grained functional regions or a discrete distribution of parcels that is inconsistent with human cognition. In this paper, we propose a novel parcel grouping method to address this issue, which considers both urban morphology and urban functional connectivity. Infiltration behaviours of urban components provide a basis for exploring the correlation between the morphological mechanism and the functional connectivity of urban areas. We measured the infiltration behaviours among adjacent parcels and found that infiltration behaviours often occur in groups, which indicates the practical significance of parcel grouping. Our method employs two parcel morphology indicators: the similarity of line segments and the compactness of the distribution. Line-segment similarity is used to establish the adjacency relationships among parcels, and compactness is used to optimise the grouping result to obtain a satisfactory visual expression. Constrained Delaunay triangulation, the Hausdorff distance, and graph theory are employed to construct the proximity, delineate the parcel adjacency matrix, and implement the grouping of parcels, respectively. We applied this method to group urban parcel data of Beijing and verified the rationality of the grouping results based on the quantified infiltration behaviours. Compared with a k-means++ method, our method accounts well for infiltration behaviours and is consistent with human cognition. We also present a case study of Xicheng District in Beijing to demonstrate the practicability of the method. The results show that our method obtains fine-grained groups while preserving the integrity of functional regions.


Introduction
Urban parcels, which have a certain scale and spatial size and are bounded by the urban road network, are the basic spatial units for fine-scale urban modelling, urban studies, and spatial planning [1]. Robert Krier [2] described a parcel as the original cell of urban design structures, which determines the form of the surrounding road network and the structure of the internal buildings. Land parcel data are one of the cornerstones of contemporary urban planning [3], and researchers have conducted empirical studies based on parcel data [4,5]. Normative planning and policies are implemented as the boundaries and spatial interactions of urban areas expand. Parcel grouping is not only a means of organising data for urban planning but also a concrete way of addressing the modifiable areal unit problem (MAUP).
In this paper, we propose a method for grouping urban parcels that considers both functional connectivity and urban morphology. To analyse the potential link between urban function and parcel form, the infiltration behaviours of components (dominance, functional complementarity, and imitation) were measured using the POIs within parcels. The results showed that infiltration behaviours often appear in groups, which reveals a rule of functional interaction within neighbourhoods and further indicates the practical significance of parcel grouping. Based on this finding, we designed a parcel grouping method with three main steps. First, following Tobler's first law and Gestalt theory, we analysed the adjacency relationships among parcels and measured proximity based on line-segment similarity. Second, the grouping result was obtained by traversing the parcel adjacency matrix with a suitable distance threshold. Last, we quantified the compactness of the grouping result and constructed a compactness curve, which was used to optimise the grouping result to obtain a reasonable visual expression. The proposed method was applied to a case study of Xicheng District in Beijing, and its feasibility was verified.

Related Work
A grouping of urban elements can reflect urban form and distribution characteristics, and it has numerous applications, such as urban planning, geovisualisation, transportation, and praxeology. Existing research has mainly adopted grouping for map generalisation, where substantial effort has been devoted to grouping buildings [43][44][45][46][47]. Regnauld [43] detected and organised building relations using the minimum spanning tree (MST). Li [48] presented an integrated methodology for the fully automated generalisation of buildings, including the automated grouping of buildings. Boffet and Serra [49] classified different types of settlement patterns and presented methods to characterise urban blocks and buildings. Rainsford and Mackness [45] focused on simplifying the shape of individual buildings using a template matching technique for grouping buildings. Cetinkaya, Basaraner and Burghardt [50] compared grouping algorithms for buildings in urban blocks and found that DBSCAN and adaptive spatial clustering based on Delaunay triangulation (ASCDT) were superior to CHAMELEON and MST.
The construction of proximity relationships among urban polygon elements is the primary task of grouping. From a cartographic viewpoint, individuals tend to visually perceive close objects in graphic representations as groups, because objects in proximity can be more closely associated with each other than objects further away. In some studies, Gestalt principles are the theoretical basis for forming groups and are usually introduced to identify urban morphology and spatial distribution patterns of urban elements. Li determined the direct alignments between neighbouring buildings based on Gestalt theory [48]. Yan and Weibel [51] considered the directional relations among buildings, and presented three rules and six parameters based on Gestalt theory to achieve a more comprehensive approach. Wang [52] applied multiple Gestalt rules and a graph-cut method to cluster similar buildings into the same group.
Common to all the methods discussed above is a focus on urban buildings [43][44][45][46][47][48][49][50][51][52]; few studies address polygon grouping at the parcel (block) level. Urban parcels are carriers of buildings. Compared with buildings, parcels are larger and more compactly distributed. Because parcels vary in size and are irregular in shape, building grouping methods, which rely on size, shape and orientation, struggle to produce accurate grouping results for parcels. Two main questions need to be addressed for parcel grouping: (1) How should the proximity between parcels be measured? A major problem with traditional distance measures (e.g., maximum distance, minimum distance, and centroid distance) is that they only measure the distance between two points or between parcel centroids [43], and cannot adequately describe the similarity of form between adjacent parcels or the compactness of discrete parcels. (2) How should the quality of a grouping be measured? Evaluating parcel groups merely by geometric indicators [50] is not comprehensive enough, since parcels carry abundant semantic information. From the viewpoint of area aggregation, some methods have been proposed based on mesh density and on the geometric and semantic properties of blocks [53][54][55]. To ensure suitable forms of aggregation results, Haunert [56] presented an area aggregation method based on mixed-integer programming, which treats aggregation as an optimisation problem. Luan [57] built a model for the aggregation of urban blocks based on Haunert's method to maintain the grid pattern after road selection. However, unlike grouping methods, the results of aggregation approaches are not sufficiently flexible and cannot be further disaggregated. For these reasons, we propose a new, comprehensive approach for parcel grouping that integrates urban morphology and urban functional connectivity.
In the grouping process, constrained Delaunay triangulation, the Hausdorff distance, and graph theory were employed as supporting techniques to construct the proximity, delineate the parcel adjacency matrix, and implement the grouping of parcels, respectively. Two parcel morphology indicators, line-segment similarity and compactness, were used to construct spatial proximity relationships and to optimise the visual expression of the grouping results. The grouping results were evaluated using Gestalt theory and quantified urban behaviours.

Study Area
Xicheng District, in the urban area of Beijing, China, was chosen as the study area (Figure 1). Located in the western part of the urban core area, Xicheng District is the political and cultural centre of Beijing. As the central district of Beijing, it hosts the offices of state agencies and more than 80 governmental ministries and agencies. Xicheng District has a modern financial industry, a booming cultural innovation industry, a high-tech industry with great potential, and a wealth of commercial activities. Its historical and cultural legacies form the basis of a unique tourism experience for millions of foreign and domestic visitors each year.

A point of interest (POI) is a point feature that represents a real geographic entity, with spatial information (such as latitude, longitude, and address) and attribute information (such as name and category). The planned land use of the 516 parcels in the study area was obtained from the Institute of Geographic Sciences and Natural Resources Research, CAS, while actual land use is measured by 23,123 geotagged POIs synthesised from Baidu Map, a leading online business catalogue in China that catalogues business establishments and housing options throughout Xicheng District. The initial 25 POI types are reclassified into eleven general categories (refer to Table 1).

Experimental Urban Parcel Datasets and Environment
The two experimental parcel datasets shown in Figure 2 represent two different morphologies in Beijing (in terms of parcel group size and parcel components). Table 2 presents details of the prepared data, including the number of parcels, the number of POIs, and the main components. The two datasets were used mainly to analyse urban infiltration behaviours and to test the parcel grouping method and analyse its results. The experiments were conducted on an Intel Core i7-6700 CPU running at 3.4 GHz, with 8 GB of RAM and a 1024 GB solid-state disk, under Windows 7 (64-bit). The proposed algorithms were implemented in Python.
Our proposed method for urban parcel grouping is composed of three parts (Figure 3): (1) an analysis of the infiltration behaviour of adjacent parcels (refer to Section 3.2), in which three indicators are used to discover the infiltration behaviours of components among parcels; (2) the construction of adjacency relationships among parcels and the calculation of the proximity between two adjacent parcels (refer to Section 3.3); and (3) the urban parcel grouping method (refer to Section 3.4), which uses a graph algorithm to form parcel groups and obtains the optimum grouping result by analysing the compactness among parcels.

Infiltration Behaviours of Components among Urban Parcels
Parcels are land areas with multiple, mixed functions. As functional mixing increases, parcels promote the mixing of users, which in turn promotes urban diversity. Cooperative relationships exist between adjacent parcels, which use urban public space (i.e., streets) as a transmission medium to share social resources.
The compact distribution of parcels causes frequent interactions among urban elements and the population; such interaction also causes components to infiltrate one another without distinct boundaries [58]. This mutual infiltration forms the functional connectivity among parcels, which ensures the stability of their shape and structure. Infiltration behaviour takes two main modes: (1) functional complementarity among parcels, in which adjacent parcels form a multi-functional place where a combination of functions (living, working, communicating, cultural and sporting activities) provides mutual support for various human activities; and (2) component imitation between adjacent parcels, in which two adjacent parcels have similar combinations of land use types. Imitation behaviour is consistent with Tobler's first law: everything is related to everything else, but near things are more related than distant things. The two modes do not exist in isolation: when functional complementarity occurs frequently between two parcels, imitation will eventually follow from the long-term infiltration of components.
The potential distribution patterns of parcel components and urban behaviours within parcels are discussed based on an analysis of the infiltration behaviours of parcels. To explore the infiltration behaviour of components among parcels effectively, we quantify three indicators: the parcel's dominant function [1], the mixed land-use (MLU) index [29,59], and the Jaccard similarity coefficient [60].

Definition 1. Dominant function:
The urban function of an individual parcel is identified by examining the dominant POI types within it. The dominant function of a parcel is defined as the POI type that accounts for more than 50% of all POIs within the parcel.
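Definition 1 can be illustrated with a short Python sketch. It is not the authors' code; the function name `dominant_function` and the category labels are hypothetical, and the only rule taken from the text is the strict "more than 50%" share.

```python
from collections import Counter
from typing import Iterable, Optional


def dominant_function(poi_types: Iterable[str], share: float = 0.5) -> Optional[str]:
    """Return the POI category holding more than `share` of a parcel's POIs, else None.

    Hypothetical helper illustrating Definition 1; not the authors' implementation.
    """
    counts = Counter(poi_types)
    total = sum(counts.values())
    if total == 0:
        return None  # a parcel without POIs has no dominant function
    category, n = counts.most_common(1)[0]
    # Definition 1 requires strictly more than 50% of all POIs in the parcel
    return category if n / total > share else None


print(dominant_function(["catering", "catering", "catering", "shopping"]))  # catering
print(dominant_function(["catering", "shopping"]))  # None: exactly 50% is not dominant
```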

Definition 2. Functional complementarity behaviour:
The types of functions and the degree of functional mixing usually differ between parcels, which creates functional demand between adjacent parcels. A fine-grained mix of residential, commercial, and recreational land use may enable local residents to walk or cycle to desired destinations, which increases the frequency of interaction among parcels. Over time, functional complementarity behaviour forms between adjacent parcels.
As a supplementary measure to the dominant function, we computed a mixed index to denote the degree of mixed land use (MLU). The mixed index $M_j$ of parcel j is an entropy index, expressed as Equation (1):
$$M_j = -\frac{\sum_{i=1}^{N_j} P_{ij} \ln P_{ij}}{\ln N_j} \tag{1}$$
where $P_{ij}$ is the proportion of land use i in parcel j, and $N_j$ is the number of represented land uses in all parcels. According to maximum entropy theory, when all data types in a dataset are evenly distributed, the information entropy of the dataset attains its maximum value; this is known as the principle of 'equal probability maximum entropy'. When parcels are logically grouped, the number of land use types in the parcel group increases, and so does the MLU index of the group. If the MLU index increases after parcel grouping, we consider that functional complementarity behaviours arise among the parcels.
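A minimal sketch of the entropy-based mixing index follows, assuming $M_j = -\sum_i P_{ij}\ln P_{ij} / \ln N_j$ with $P_{ij}$ taken as a proportion. Following the text, $N_j$ is passed in as the number of land use types represented across all parcels (for example, the eleven general categories of Table 1), so the normaliser stays fixed when parcels are merged. The function name and sample labels are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def mixed_land_use_index(land_uses: Iterable[str], n_types: int) -> float:
    """Normalised entropy index M = sum(P_i * ln(1 / P_i)) / ln(n_types).

    `land_uses` holds one land-use label per POI; `n_types` plays the role of N_j.
    Hypothetical helper, not the authors' implementation.
    """
    counts = Counter(land_uses)
    total = sum(counts.values())
    if len(counts) > n_types:
        raise ValueError("more land-use types observed than n_types allows")
    if total == 0 or n_types < 2:
        return 0.0  # no POIs, or ln(1) = 0 would leave the index undefined
    # P_i * ln(1 / P_i) equals -P_i * ln(P_i) and avoids printing -0.0
    entropy = sum((c / total) * math.log(total / c) for c in counts.values())
    return entropy / math.log(n_types)


parcel_a = ["residential"] * 4
parcel_b = ["catering"] * 2 + ["shopping"] * 2
print(mixed_land_use_index(parcel_a, 11))             # 0.0: a single land use
print(mixed_land_use_index(parcel_b, 11))             # ≈ 0.29
print(mixed_land_use_index(parcel_a + parcel_b, 11))  # ≈ 0.43: the merged group is more mixed
```

With a fixed normaliser, merging parcels that contribute different land uses raises the index, which is the signal the text uses for functional complementarity.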

Definition 3. Imitation behaviour:
This kind of behaviour describes the similarity of the components between adjacent parcels. For example, when two parcels are adjacent and both are composed of the same three parts (residential area, shopping centre, and catering service), we conclude that there are imitation behaviours between the two parcels. Thus, we can construct a vector to describe the composition of a parcel. Each attribute (dimension) of the vector corresponds to a land use type, and its value is 1 or 0, where "1" indicates that the land use type exists in the current parcel and "0" indicates that it does not (Figure 4). We introduced the Jaccard coefficient to measure the degree of imitation between adjacent parcels (the similarity between two vectors). The Jaccard index is a statistic used to compare the similarity and diversity of sample sets: it measures the similarity between two finite sample sets and is defined as the size of their intersection divided by the size of their union, as in Equation (2):
$$J(A,B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|} \tag{2}$$
where A and B are the composition vectors of two parcels, treated as the sets of land use types whose value is 1. If both A and B are empty, we define J(A, B) = 1; in all cases, 0 ≤ J(A, B) ≤ 1.
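The Jaccard coefficient on binary composition vectors can be sketched as follows. The helper name and the category order in the example are hypothetical; the empty-set convention J = 1 follows the text.

```python
from typing import Sequence


def jaccard(a: Sequence[int], b: Sequence[int]) -> float:
    """Jaccard coefficient of two binary land-use composition vectors.

    Hypothetical helper: position k is 1 if land-use type k occurs in the parcel.
    """
    if len(a) != len(b):
        raise ValueError("composition vectors must have the same length")
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 if union == 0 else intersection / union  # J = 1 when both sets are empty


# hypothetical category order: residential, shopping, catering, education, medical
p1 = [1, 1, 1, 0, 0]
p2 = [1, 1, 0, 1, 0]
print(jaccard(p1, p2))  # 0.5: two shared types out of four present in either parcel
```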

Identification of Adjacent Relationships among Parcels
Adjacent objects tend to be more spatially dependent or associated, which can be explained geographically by Tobler's first law and cartographically by Gestalt principles. The grouping process is performed gradually to avoid producing extremely small and meaningless groups, as would result from applying all measures simultaneously. The first step of grouping is to analyse proximity relationships [50]. Constrained Delaunay triangulation (CDT) is often used to extract skeletons from map patches, as it possesses several highly desirable traits, such as adjacency and regional character [48,51,55].
The relationship between two parcels was expressed using a graph-based method. First, we constructed the CDT of the parcels using a point-by-point insertion algorithm. Second, we classified the triangles according to the parcels to which their vertices belonged. A triangle with vertices belonging to two different parcels was called a connecting triangle; this was determined by checking whether the three vertices of a triangle belong to the same polygon boundary. Two parcels linked by connecting triangles were called conflicting parcels and were considered to have an adjacent relationship. By applying this test to the CDT constructed for all parcels, we determined whether an adjacent relationship existed between any two parcels. Because long and thin triangles can easily lead to an incorrect assessment of the adjacency between two parcels (refer to Figure 5), such triangles had to be eliminated from the connecting triangles.
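The classification of triangles can be sketched in pure Python, assuming the CDT has already been computed by a triangulation library and is supplied as vertex-index triples, with `vertex_parcel[v]` giving the parcel that owns boundary vertex v. All names are hypothetical, and triangles whose vertices fall on three different parcels are simply skipped, since the text defines connecting triangles only for two parcels.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Sequence, Tuple

Triangle = Tuple[int, int, int]


def connecting_triangles(
    triangles: Sequence[Triangle], vertex_parcel: Sequence[int]
) -> Dict[FrozenSet[int], List[Triangle]]:
    """Map each pair of conflicting parcels to the connecting triangles between them."""
    pairs: Dict[FrozenSet[int], List[Triangle]] = defaultdict(list)
    for tri in triangles:
        owners = {vertex_parcel[v] for v in tri}
        if len(owners) == 2:  # vertices on exactly two parcels: a connecting triangle
            pairs[frozenset(owners)].append(tri)
        # one owner: the triangle lies on a single parcel's boundary; three owners: skipped
    return dict(pairs)


# vertices 0-1 lie on parcel 10, vertices 2-3 on parcel 20, vertex 4 on parcel 30
vertex_parcel = [10, 10, 20, 20, 30]
tris = [(0, 1, 2), (1, 2, 3), (2, 3, 4), (0, 2, 4)]
for pair, ts in connecting_triangles(tris, vertex_parcel).items():
    print(sorted(pair), ts)
# [10, 20] [(0, 1, 2), (1, 2, 3)]
# [20, 30] [(2, 3, 4)]
```

The keys of the returned dictionary are the conflicting parcel pairs; before computing proximity, long and thin triangles would still have to be filtered out of each list.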
We mark long and thin triangles as redundant using an area-based method [61], which relates the area of a triangle to the sum of the squares of its side lengths, normalised so that an equilateral triangle scores 1:
$$R_w = \frac{4\sqrt{3}\,w}{l_1^2 + l_2^2 + l_3^2}$$
where $l_1$, $l_2$, $l_3$ are the lengths of the three sides and w is the (positive) area of the triangle. $R_w$ is the regularity, $0 \le R_w \le 1$; the regularity of an equilateral triangle is 1, and $R_w$ tends towards zero as the triangle becomes long and thin.
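The regularity test can be sketched as follows, assuming the normalisation $R_w = 4\sqrt{3}\,w/(l_1^2+l_2^2+l_3^2)$, which equals 1 for an equilateral triangle and tends to 0 for slivers. The text does not give the cut-off below which a connecting triangle counts as long and thin, so `min_regularity` and its default value are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def regularity(p1: Point, p2: Point, p3: Point) -> float:
    """R_w = 4*sqrt(3)*area / (l1^2 + l2^2 + l3^2); 1.0 for an equilateral triangle."""
    def sq(a: Point, b: Point) -> float:
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    # shoelace formula for the (positive) triangle area w
    area = abs((p2[0] - p1[0]) * (p3[1] - p1[1]) - (p3[0] - p1[0]) * (p2[1] - p1[1])) / 2.0
    sq_sum = sq(p1, p2) + sq(p2, p3) + sq(p3, p1)
    return 0.0 if sq_sum == 0 else 4.0 * math.sqrt(3.0) * area / sq_sum


def is_long_and_thin(p1: Point, p2: Point, p3: Point, min_regularity: float = 0.2) -> bool:
    """Hypothetical filter: flag a triangle as redundant when R_w falls below the cut-off."""
    return regularity(p1, p2, p3) < min_regularity


print(round(regularity((0, 0), (1, 0), (0.5, math.sqrt(3) / 2)), 6))  # 1.0
print(is_long_and_thin((0, 0), (10, 0), (5, 0.01)))                   # True
```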
The triangles in the CDT connect adjacent parcels (refer to Figure 6). The adjacent relationship between P1 and P2 is obtained from the edges of the connecting triangles, such as edge AB of triangle ABC. The heights (h) of all connecting triangles between two adjacent parcels are calculated and used to compute the proximity between the two parcels.
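One way to obtain the height h of a connecting triangle is sketched below. The text does not specify which altitude is used; this sketch assumes it is the altitude from the single vertex on one parcel to the opposite edge whose endpoints lie on the other parcel (edge AB in triangle ABC). All names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def altitude(apex: Point, base_a: Point, base_b: Point) -> float:
    """Distance from `apex` to the line through the base edge: h = 2 * area / |base|."""
    (ax, ay), (bx, by), (cx, cy) = base_a, base_b, apex
    twice_area = abs((bx - ax) * (cy - ay) - (cx - ax) * (by - ay))
    base = math.hypot(bx - ax, by - ay)
    if base == 0:
        raise ValueError("base edge has zero length")
    return twice_area / base


def connecting_triangle_height(
    tri: Tuple[int, int, int], coords: Sequence[Point], vertex_parcel: Sequence[int]
) -> float:
    """Height of a two-parcel connecting triangle, measured from its lone vertex."""
    owners = [vertex_parcel[v] for v in tri]
    if len(set(owners)) != 2:
        raise ValueError("not a connecting triangle between two parcels")
    for k, v in enumerate(tri):
        if owners.count(owners[k]) == 1:  # the apex is the vertex on the "other" parcel
            a, b = (u for u in tri if u != v)
            return altitude(coords[v], coords[a], coords[b])
    raise AssertionError("unreachable for a two-parcel triangle")


coords = [(0.0, 0.0), (4.0, 0.0), (1.0, 3.0)]  # A, B on parcel 1; C on parcel 2
print(connecting_triangle_height((0, 1, 2), coords, [1, 1, 2]))  # 3.0
```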

Method for Measuring the Proximity of Parcels
In spatial clustering or pattern recognition, the distance between geographic elements is often calculated as the Euclidean distance, and elements are classified according to this distance (i.e., nearby elements are assigned to the same class). Some studies [52,62,63] have used the centroid to represent a line, a polygon or a group of elements, assuming that the influence of the elements' size, direction, layout and other factors on the pattern structure can be disregarded during recognition. For polygon elements such as parcels, however, traditional distance measures (e.g., maximum distance, minimum distance, and centroid distance) only measure the distance between two points or between parcel centroids, and thus cannot adequately describe the similarity of form between adjacent parcels or the compactness of discrete parcels.
An analysis of parcel characteristics indicates that when two parcels are adjacent and topologically connected, their adjacent edges exhibit line-segment similarity, i.e., the natural extension orientations of the adjacent edges are nearly identical. The two sets of skeleton points that make up the line segments share the same central tendency and a fuzzy matching relationship, so the distance between the two line segments is nearly equal everywhere. To express this kind of similarity and further extract the proximity between two parcels, we built the proximity calculation on a Hausdorff-like distance (HD), since the Hausdorff distance measures the distance between two non-empty subsets of a metric space [64,65].
The directed HD from set X to set Y is a max-min function defined as follows:
$$h(X,Y) = \max_{x \in X} \min_{y \in Y} \lVert x - y \rVert \tag{4}$$
where x and y are points of sets X and Y, respectively, and $\lVert \cdot \rVert$ is a distance norm between feature points x and y, such as the sum norm, the Euclidean norm, or the maximum norm. A more general definition of the HD is:
$$H(X,Y) = \max\{h(X,Y),\, h(Y,X)\} \tag{5}$$
which defines the HD between X and Y, whereas Equation (4) gives the HD from X to Y (also referred to as the directed Hausdorff distance). The two distances h(X, Y) and h(Y, X) are sometimes termed the forward HD and backward HD of X to Y, respectively. In this paper, the heights of the connecting triangles between two parcels were taken as the distance norm of the HD (refer to Equation (6)). Because the HD is very sensitive to noise points (i.e., outliers), the maximum of these heights in Equation (5) is replaced by the median of the sequence of heights, which both avoids noise interference and reflects the central tendency of the set of parcel boundary points:
$$\mathrm{proximity}(X,Y) = \operatorname{median}\big(\mathrm{heightList}(\mathrm{triangles}(X,Y))\big) \tag{6}$$
where triangles(X, Y) represents the connecting triangles between two adjacent parcels, and heightList(·) stores the heights of the connecting triangles.
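For comparison, the classic Hausdorff distance of Equations (4) and (5) and the median-based proximity can be sketched as follows, using the Euclidean norm and point sets given as coordinate lists; the function names are hypothetical. The example shows why the median is preferred: a single outlying height dominates the maximum but barely moves the median.

```python
import math
from statistics import median
from typing import Sequence, Tuple

Point = Tuple[float, float]


def directed_hausdorff(X: Sequence[Point], Y: Sequence[Point]) -> float:
    """h(X, Y) = max over x in X of the distance from x to its nearest point in Y."""
    return max(min(math.dist(x, y) for y in Y) for x in X)


def hausdorff(X: Sequence[Point], Y: Sequence[Point]) -> float:
    """H(X, Y) = max(h(X, Y), h(Y, X))."""
    return max(directed_hausdorff(X, Y), directed_hausdorff(Y, X))


def parcel_proximity(heights: Sequence[float]) -> float:
    """Median of the connecting-triangle heights between two parcels (robust to outliers)."""
    if not heights:
        raise ValueError("the two parcels share no connecting triangles")
    return median(heights)


heights = [2.0, 2.1, 1.9, 30.0]   # one outlying height
print(max(heights))               # 30.0
print(parcel_proximity(heights))  # ≈ 2.05
print(hausdorff([(0, 0), (1, 0)], [(0, 1), (1, 1), (5, 1)]))  # sqrt(17) ≈ 4.123
```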

Description of Urban Parcel Grouping (UPG) Algorithm
Graph-based methods are the most common approach to grouping polygons [43,45,[48][49][50]55]. Because the proposed parcel proximity is a binary operation that is commutative (i.e., the adjacency relationship is bidirectional), the proximities between parcels can be stored in the adjacency matrix of an undirected graph. The parcel adjacency matrix was filled with three types of values (refer to Figure 7a): (1) for truly adjacent parcels, the proximity between the two parcels; (2) for non-adjacent parcels, the centroid distance; and (3) for parcel pairs connected by long and thin triangles, an arbitrary constant larger than the maximum proximity.
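Filling the matrix with the three types of values might look like the sketch below. Parcels are indexed 0..n-1, `proximities` holds the proximity of truly adjacent pairs, and `sliver_pairs` holds pairs linked by long and thin triangles; all names are hypothetical, a pair with a valid proximity takes precedence over a sliver link, and the constant for sliver pairs is chosen deterministically as the maximum proximity plus one rather than at random.

```python
import math
from typing import Dict, FrozenSet, List, Optional, Sequence, Set, Tuple

Point = Tuple[float, float]


def build_adjacency_matrix(
    centroids: Sequence[Point],
    proximities: Dict[FrozenSet[int], float],
    sliver_pairs: Set[FrozenSet[int]],
    penalty: Optional[float] = None,
) -> List[List[float]]:
    """Symmetric matrix: proximity for adjacent pairs, a large constant for sliver
    pairs, and the centroid distance for all other (non-adjacent) pairs."""
    n = len(centroids)
    if penalty is None:
        # any constant larger than the maximum proximity works; max + 1 is one simple choice
        penalty = max(proximities.values(), default=0.0) + 1.0
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            key = frozenset((i, j))
            if key in proximities:       # (1) truly adjacent parcels
                value = proximities[key]
            elif key in sliver_pairs:    # (3) connected by long and thin triangles
                value = penalty
            else:                        # (2) non-adjacent parcels
                value = math.dist(centroids[i], centroids[j])
            matrix[i][j] = matrix[j][i] = value  # undirected graph: keep it symmetric
    return matrix


m = build_adjacency_matrix(
    centroids=[(0.0, 0.0), (3.0, 4.0), (10.0, 0.0)],
    proximities={frozenset({0, 1}): 1.5},
    sliver_pairs={frozenset({1, 2})},
)
print(m)  # [[0.0, 1.5, 10.0], [1.5, 0.0, 2.5], [10.0, 2.5, 0.0]]
```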
Parcel grouping merges compactly distributed parcels. Under the constraint of the distance threshold, the proximity of parcels is transitive, which ensures parcel connectivity within a certain range. The parcel grouping process is therefore an adjacency search, as illustrated in Figure 7b. Accordingly, we build parcel grouping trees by depth-first traversal; together these trees form a grouping forest, which constitutes the result of the parcel grouping.
The proposed grouping algorithm involves three main steps (refer to Figure 7c): (1) extract the keys of a hash table that stores the conflicting parcels and save them as a new list named ParcelID; (2) iterate over ParcelID: in each loop, pop the value at the top of ParcelID (named idCurrent) and pass the distance threshold, idCurrent, ParcelID, and the parcel adjacency matrix to the depth-first traversal function to obtain the list of visited parcels; (3) add the visited list to a hash table named Group_Result, remove the visited parcels from ParcelID, and enter the next loop. The algorithm terminates when ParcelID is empty.
With regard to the distance threshold, a proximity sequence was constructed, i.e., the set of proximity values among all conflicting parcels. Using a numerical series hierarchical model, the entire proximity sequence was divided into several levels, and the minimum value in each level was selected as the distance threshold for that level. Level-by-level tuning is achieved by setting the distance threshold hierarchically.
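The three steps can be sketched in Python as follows. Parcel IDs are assumed to double as indices into the adjacency matrix, and two parcels are linked when their proximity is strictly below the distance threshold (the intra-group condition used later in the text). The traversal uses an explicit stack rather than recursion; `parcel_ids` and `group_result` mirror ParcelID and Group_Result, while the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence


def depth_first_group(
    start: int, threshold: float, candidates: Sequence[int], matrix: Sequence[Sequence[float]]
) -> List[int]:
    """Collect every candidate parcel reachable from `start` via proximities < threshold."""
    visited: List[int] = []
    seen = {start}
    stack = [start]
    while stack:
        i = stack.pop()
        visited.append(i)
        for j in candidates:
            if j not in seen and matrix[i][j] < threshold:
                seen.add(j)
                stack.append(j)
    return visited


def group_parcels(
    conflicting_parcels: Iterable[int], matrix: Sequence[Sequence[float]], threshold: float
) -> Dict[int, List[int]]:
    parcel_ids = list(conflicting_parcels)  # step (1): keys of the conflicting-parcel table
    group_result: Dict[int, List[int]] = {}
    while parcel_ids:                       # step (2): loop until ParcelID is empty
        id_current = parcel_ids.pop(0)
        visited = depth_first_group(id_current, threshold, parcel_ids, matrix)
        group_result[len(group_result)] = visited  # step (3): store the group ...
        members = set(visited)
        parcel_ids = [p for p in parcel_ids if p not in members]  # ... and drop its parcels
    return group_result


M = [
    [0.0, 1.0, 3.0, 8.0],
    [1.0, 0.0, 1.2, 6.0],
    [3.0, 1.2, 0.0, 5.0],
    [8.0, 6.0, 5.0, 0.0],
]
print(group_parcels(dict.fromkeys([0, 1, 2, 3]), M, threshold=2.0))
# {0: [0, 1, 2], 1: [3]}
```

Passing a dictionary works because iterating over it yields its keys, matching step (1).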
The numerical series hierarchical model is given by Equations (7) and (8), where $S$ represents the proximity list: $S_1$ to $S_i$ belong to the first level and $S_{i+1}$ belongs to the second level; likewise, $S_{i+1}$ to $S_{i+j}$ belong to the second level and $S_{i+1+j}$ belongs to the third level. Continuing in this way yields a hierarchical proximity sequence. The result of level-by-level tuning is illustrated in Figure 8.
The results show that the number of groups changed readily with the level. For example, at level 1 each parcel constituted its own group (see Figure 8a), whereas at level 13 all parcels formed a single group (see Figure 8c). According to Gestalt theory, neither extreme is a good grouping, because neither conveys the compactness of the parcels or the proximity mutation. Therefore, an optimum number of groups should be determined, which makes the grouping process more automated and yields a reasonable grouping result.
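How the number of groups falls as the threshold rises level by level can be sketched with a union-find structure, swapped in here for the depth-first traversal because it updates the groups incrementally as each new threshold is applied. This is an illustrative assumption: Equations (7) and (8) are not reproduced in this section, so the per-level thresholds are passed in directly, and the function name `groups_per_threshold` is hypothetical. As in the grouping sketch, a pair is linked only when its proximity is strictly below the threshold.

```python
from typing import List, Sequence, Tuple

Edge = Tuple[int, int, float]  # (parcel i, parcel j, proximity) for conflicting parcels


def groups_per_threshold(n: int, edges: Sequence[Edge], thresholds: Sequence[float]) -> List[int]:
    """Number of parcel groups for each threshold, in ascending threshold order."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    ordered = sorted(edges, key=lambda e: e[2])
    counts: List[int] = []
    components, k = n, 0
    for t in sorted(thresholds):
        # Link every pair whose proximity is strictly below the current threshold.
        while k < len(ordered) and ordered[k][2] < t:
            i, j, _ = ordered[k]
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                components -= 1
            k += 1
        counts.append(components)
    return counts


# Hypothetical thresholds, one per level (the minimum proximity of each level).
print(groups_per_threshold(4, [(0, 1, 2.0), (1, 2, 3.0), (2, 3, 8.0)], [2.0, 3.0, 8.0, 8.5]))
# [4, 3, 2, 1]
```

The output runs from one group per parcel at the lowest level to a single group at the highest, the two extremes shown in Figures 8a and 8c.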

Method of Obtaining the Optimum Grouping Result
We used cluster validation indices to assess whether the structure identified by the grouping analysis is adequate and how appropriately each parcel is clustered. Among the current indices, the silhouette criterion [66] is the most widely used for determining an appropriate number of clusters. The silhouette value ranges from −1 to 1; a high value (near 1) indicates that an object is well matched to its own cluster and well separated from other clusters.
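For reference, the standard silhouette value on which the compactness index is modelled is defined as follows, where $a(i)$ is the mean distance from object $i$ to the other members of its own cluster and $b(i)$ is the smallest mean distance from $i$ to the members of any other cluster:

$$
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad -1 \le s(i) \le 1
$$

Thus $s(i)$ approaches 1 when $a(i)$ is much smaller than $b(i)$, and it is negative when $i$ is, on average, closer to another cluster than to its own.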
Using the silhouette value as a reference, we propose an index for the intra-group compactness of parcels, based on the parcel grouping method (refer to Section 3.4.1) and the proximity calculation method (refer to Section 3.3.2). We consider only parcels that are directly adjacent to another parcel (i.e., conflicting parcels) and use the proximity between two such parcels as the input to the compactness model. Given a distance threshold (DT), the proximity between parcel i and a conflicting parcel is referred to as an intra-group distance (INDIS) if it is less than DT, and as an inter-group distance (OUTDIS) if it is greater than DT.
For each parcel i, let Avg_INDIS(i) be the average INDIS between parcel i and the other parcels in its own group, and let Avg_OUTDIS(i) be the average OUTDIS between parcel i and the parcels of any other group, of which i is not a member. We define the compactness by Equation (9), which can equivalently be written as Equation (10). Although this definition follows the form of the silhouette criterion, Compactness(i) is not interpreted in the same way. Compactness(i) equals 1 only when Avg_INDIS(i) is 0. Because Avg_INDIS(i) measures how dissimilar i is to its own group, a small value indicates that i is well matched to that group, whereas a large Avg_OUTDIS(i) indicates that i is poorly matched to the neighbouring group. Thus, when the compactness equals 1, every parcel forms a separate group (refer to Figures 8a and 9b). When Avg_OUTDIS(i) is 0, the compactness equals −1 and all parcels are aggregated into one group (refer to Figures 8c and 9a). Apart from these two special cases, Avg_OUTDIS(i) is always greater than Avg_INDIS(i), because every INDIS lies below the threshold and every OUTDIS above it (refer to Figure 9c). When a parcel is completely surrounded by parcels of its own group and is therefore not directly adjacent to any parcel of another group, its Avg_OUTDIS is replaced by the minimum OUTDIS of all other parcels within the same group. The average Compactness(i) over all parcels of a group measures how tightly the parcels in that group are grouped, and the average Compactness(i) over the entire dataset measures how appropriately the data have been grouped.
To illustrate the relationship between the proximity of parcels and the compactness of parcel groups, we simulated the parcel grouping process (refer to Figure 10). Figure 10a shows five parcels, and the proximity relationship between these parcels is d1 < d2 < d3 < d4 < d5.
To clearly identify the compactness of parcel groups, we assumed that d1 was slightly less than d2, d3 was approximately 3 times d2, d4 was approximately 2 times d3 and d5 was approximately 1.5 times d4. The grouping process started from ungrouped parcels (refer to Figure 10a) and gradually increased the proximity threshold to obtain different grouping results (refer to Figures 10b to 10f).
On the basis of Equation (10), we calculated Compactness(Pi) for each parcel (refer to Table 3) and plotted the average Compactness(Pi) against the proximity threshold (refer to Figure 10g). The curve exhibited an overall declining trend: (1) when its value was 1 or close to 1, the corresponding grouping result was ungrouped, that is, most parcels formed separate groups; (2) when its value was −1, all parcels were aggregated into one group. The curve has a mutation value, which corresponds to the grouping result in Figure 10c. Among all grouping results, Figure 10c showed inter-group distances larger than the intra-group distances, which is consistent with human cognition and satisfies the requirements of compactness and continuity in Gestalt theory. We found that the mutation value arises because Avg_OUTDIS(i) is markedly larger than Avg_INDIS(i) at this threshold. Therefore, to obtain a reasonable parcel grouping result, the proximity threshold should be set at the mutation value of the compactness curve. Table 3 also shows that the compactness of internal parcels, such as P1, increased gradually; if the number of such parcels in the grouping result also increases, the value of the curve may rise.
Table 3. Description of relationship between proximity threshold and compactness.
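The per-parcel compactness and its dataset average can be sketched as below. Equations (9) and (10) are not reproduced in this section, so the sketch assumes the silhouette-style form Compactness(i) = (Avg_OUTDIS(i) − Avg_INDIS(i)) / max{Avg_INDIS(i), Avg_OUTDIS(i)}; this form is chosen because it reproduces the behaviour described in the text (1 when Avg_INDIS(i) is 0, −1 when Avg_OUTDIS(i) is 0, and close to 0 when the two averages are close). Groups are formed from pairs whose proximity is below DT, and a parcel with no OUTDIS takes the minimum OUTDIS of its group, as stated above. The function names and the toy proximities are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int, float]  # (parcel i, parcel j, proximity) for conflicting parcels


def compactness_value(avg_indis: float, avg_outdis: float) -> float:
    """Assumed silhouette-style Compactness(i); 0.0 if both averages are 0."""
    denom = max(avg_indis, avg_outdis)
    return 0.0 if denom == 0 else (avg_outdis - avg_indis) / denom


def mean_compactness(n: int, edges: List[Edge], threshold: float) -> float:
    """Average Compactness(i) over n parcels for one distance threshold (DT)."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    indis: Dict[int, List[float]] = {i: [] for i in range(n)}
    outdis: Dict[int, List[float]] = {i: [] for i in range(n)}
    for i, j, d in edges:
        if d < threshold:  # INDIS: the pair falls inside one group
            indis[i].append(d)
            indis[j].append(d)
            parent[find(i)] = find(j)
        else:              # OUTDIS: the pair is split by the threshold
            outdis[i].append(d)
            outdis[j].append(d)

    # Smallest OUTDIS within each group, used for fully surrounded parcels.
    group_min_out: Dict[int, float] = {}
    for i in range(n):
        if outdis[i]:
            root = find(i)
            group_min_out[root] = min(group_min_out.get(root, float("inf")), min(outdis[i]))

    scores = []
    for i in range(n):
        a = sum(indis[i]) / len(indis[i]) if indis[i] else 0.0
        if outdis[i]:
            b = sum(outdis[i]) / len(outdis[i])
        else:  # surrounded parcel, or the one-group case where no OUTDIS exists
            b = group_min_out.get(find(i), 0.0)
        scores.append(compactness_value(a, b))
    return sum(scores) / n


chain = [(0, 1, 2.0), (1, 2, 3.0), (2, 3, 8.0)]
for dt in (1.0, 3.5, 9.0):
    print(dt, mean_compactness(4, chain, dt))  # averages: 1.0, 0.765625, -1.0
```

The lowest threshold gives the ungrouped case (average 1), the highest gives a single group (average −1), and the intermediate threshold, which separates the large 8.0 gap, sits between them.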

Analysis of Infiltration Behaviour
Following Section 3.2, we used the metrics in Definitions 1 to 3 to analyse the two parcel datasets (Parcel 1 and Parcel 2, refer to Section 3.1.2) in three aspects: the MLU index, component similarity, and urban dominant function.
MLU measures the degree of mixed land use. Because land use in the city centre is complicated, both datasets showed a high mixed land use index (refer to Figure 11), which is consistent with the research of Long [29]. We also found that, in most cases, single-function regions were adjacent to mixed-function areas. Agglomerated development was observed in rapidly growing financial districts, retail malls, and technology centres, which generated more mixed development than the government had planned [67].
Component similarity is a quantitative expression of the imitation behaviours among urban parcels. Following Definition 3 (refer to Section 3.2), values greater than 0.5 represent a high degree of similarity (red lines in Figure 12), values from 0.3 to 0.5 indicate moderate similarity (blue lines), and values below 0.3 represent low similarity or dissimilarity (grey lines). According to the results (refer to Figure 12), Parcel 1 has 33 instances of high-degree imitation behaviour, and Parcel 2 has 192. From a global perspective, high similarity primarily occurred in clusters, and a large proportion occurred between adjacent parcels, whereas dissimilarity or low similarity occurred between parcels that were not directly adjacent.
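The three similarity classes used in Figure 12 can be expressed as a small classifier. The similarity values themselves come from Definition 3, which is not reproduced here, so this sketch takes them as given input; the function names and the sample pairs are hypothetical. Following the text, 0.5 itself falls in the moderate class and 0.3 is the lower bound of that class.

```python
from typing import Dict, Iterable, Tuple


def similarity_degree(value: float) -> str:
    """Classify a component-similarity value with the thresholds in the text."""
    if value > 0.5:
        return "high"      # red lines in Figure 12
    if value >= 0.3:
        return "moderate"  # blue lines
    return "low"           # grey lines: low similarity or dissimilarity


def count_by_degree(pairs: Iterable[Tuple[int, int, float]]) -> Dict[str, int]:
    """Count parcel pairs (i, j, similarity) in each similarity class."""
    counts = {"high": 0, "moderate": 0, "low": 0}
    for _, _, similarity in pairs:
        counts[similarity_degree(similarity)] += 1
    return counts


print(count_by_degree([(0, 1, 0.8), (1, 2, 0.4), (2, 3, 0.1), (0, 3, 0.6)]))
# {'high': 2, 'moderate': 1, 'low': 1}
```

Counting the "high" class in this way is what yields the number of high-degree imitation behaviours per dataset.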
The dominant function results from the joint action of functional complementarity behaviour and imitation behaviour. As the size of a parcel group increased, the functional combination mode became more complicated (refer to Figure 13). To some extent, adjacent parcels had complementary dominant functions, and most cases involved imitation of the dominant function.
This analysis shows that infiltration behaviour occurs primarily between adjacent parcels and follows a trend from discrete to aggregated distribution. Combining adjacent parcels can effectively increase the MLU index and make their functions complementary. From the perspective of urban form, grouping parcels based on their adjacency is the macroscopic manifestation of infiltration behaviour. In addition, imitation behaviour and functional complementarity behaviour do not exist in isolation. Long-term functional complementarity among parcels has produced frequent interactions between people and frequent purchasing activity, and as people seek convenience within parcels, imitation behaviour gradually forms in neighbourhoods. Therefore, infiltration behaviour motivates parcel grouping.
Furthermore, infiltration behaviour can be used to evaluate the quality of grouping results. An acceptable grouping result should take the influence of infiltration behaviours into account as far as possible. Specifically, in this study, a good grouping increases the degree of mixed land use within groups, does not separate parcels with the same dominant function, and ignores as few high-degree imitation behaviours as possible.
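The third criterion, the ratio of ignored high-degree imitation behaviours, can be sketched as follows. The sketch assumes that an imitation behaviour counts as "ignored" when its two parcels end up in different groups; that reading, the function name `ignored_imitation_ratio`, and the toy data are illustrative assumptions rather than the paper's stated procedure.

```python
from typing import Dict, Sequence, Tuple


def ignored_imitation_ratio(
    group_of: Dict[int, int],
    pairs: Sequence[Tuple[int, int, float]],
    high: float = 0.5,
) -> float:
    """Share of high-degree imitation pairs (similarity > high) split across groups."""
    high_pairs = [(i, j) for i, j, similarity in pairs if similarity > high]
    if not high_pairs:
        return 0.0
    ignored = sum(1 for i, j in high_pairs if group_of[i] != group_of[j])
    return ignored / len(high_pairs)


groups = {0: 0, 1: 0, 2: 1, 3: 1}
similarities = [(0, 1, 0.8), (1, 2, 0.7), (2, 3, 0.9), (0, 3, 0.2)]
print(ignored_imitation_ratio(groups, similarities))  # one of three high pairs is split
```

A lower ratio means the grouping keeps more of the clustered imitation behaviour inside groups.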

Parcel Grouping Method Based on the Centroid Proximity
Unlike other urban elements, urban parcels are usually not distinctly separated. The k-means++ algorithm can cluster this kind of distribution and chooses better initial cluster centres than k-means. Because the centroid distance can be measured by the Euclidean distance, this study applied k-means++ as a comparative algorithm for parcel grouping and optimised the number of groups using the elbow method, based on the sum of squared errors (SSE), together with the silhouette coefficient. Considering the SSE and silhouette coefficient curves together, we set 8 clusters for Parcel 1 (K = 8) and 10 clusters for Parcel 2 (K = 10). The results are illustrated in Figure 14.
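A minimal pure-Python sketch of the comparison method is shown below: k-means++ seeding (each new centre sampled with probability proportional to its squared distance to the nearest existing centre), Lloyd iterations on parcel centroids, and the SSE used for the elbow curve. It is a generic illustration, not the study's implementation; the function names, iteration limit, random seed, and toy centroids are assumptions, and the silhouette coefficient is omitted for brevity.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # parcel centroid (x, y)


def _sq_dist(p: Point, q: Point) -> float:
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def _assign(points: Sequence[Point], centres: List[Point]) -> List[int]:
    # Index of the nearest centre for every point (squared Euclidean distance).
    return [min(range(len(centres)), key=lambda c: _sq_dist(p, centres[c])) for p in points]


def kmeans_pp(points: Sequence[Point], k: int, iterations: int = 100, seed: int = 0) -> Tuple[List[int], float]:
    """k-means with k-means++ seeding; returns (labels, SSE)."""
    rng = random.Random(seed)
    centres = [rng.choice(points)]
    while len(centres) < k:
        # k-means++: sample the next centre with probability proportional to D(x)^2.
        d2 = [min(_sq_dist(p, c) for c in centres) for p in points]
        if sum(d2) == 0:
            centres.append(rng.choice(points))
        else:
            centres.append(rng.choices(points, weights=d2, k=1)[0])
    for _ in range(iterations):  # Lloyd iterations
        labels = _assign(points, centres)
        new_centres: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centres.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centres.append(centres[c])  # keep an empty cluster's centre
        if new_centres == centres:
            break
        centres = new_centres
    labels = _assign(points, centres)
    sse = sum(_sq_dist(p, centres[lab]) for p, lab in zip(points, labels))
    return labels, sse


centroids = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
sse_curve = [kmeans_pp(centroids, k)[1] for k in range(1, 5)]
print(sse_curve)  # the elbow appears at k = 2 for this toy data
```

Because the objective only sees centroid distances, nothing in this procedure penalises a group that spans a large gap between parcel boundaries, which is the weakness discussed below.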
With reference to the results in Section 4.1, this step evaluated the grouping results in three aspects: (1) the average increase in MLU within groups, (2) whether parcels with the same dominant function were separated, and (3) the ratio of ignored imitation behaviours to all high-degree imitation behaviours. The evaluation results are presented in Table 4. We found that grouping the parcels increased the MLU index and achieved functional complementarity. However, this grouping mode disregarded the functional connectivity between parcels: some clustered high-degree imitation behaviours (refer to Figure 15) and some parcels with the same dominant function were separated into different groups.
According to the grouping results, although parcels in the same group may be close in terms of centroid distance, large gaps were observed within groups (refer to Figures 14 and 15). These gaps not only interrupt proximity and continuity, as described in Gestalt theory, but also disregard the potential semantic relatedness of urban parcels and further weaken the spatial correlation among parcels. Because the morphology of parcels reflects semantic relatedness in urban space, grouping parcels based on the centroid distance alone is not sufficient.

Analysis of the UPG Method
First, we calculated the proximity between each pair of conflicting parcels using the method proposed in Section 3.3. The proximity sequence and the parcel adjacency matrix were formed by constructing the CDT and calculating proximity. To obtain a better expression of the parcel grouping results, we built the distance threshold sequence from the proximity sequence according to Equations (7) and (8) (refer to Section 3.4.1). Each value in the distance threshold sequence was input into the parcel grouping algorithm to obtain the grouping result for the corresponding threshold level. The compactness at each level was also calculated using Equations (9) and (10) to form a compactness curve. Figure 16 shows the compactness curves of the two datasets.
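The method uses Hausdorff distance, together with CDT and graph theory, when delineating the parcel adjacency matrix, but the proximity procedure of Section 3.3 is not reproduced in this section. As an illustration of the measure alone, the sketch below computes the symmetric Hausdorff distance between two finite point sets, for example sampled vertices of two facing parcel boundary segments. Applying it to vertex samples and the function names are assumptions, not the paper's procedure.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def directed_hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Largest distance from a point of a to its nearest point of b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


edge_a = [(0.0, 0.0), (1.0, 0.0)]              # vertices along one parcel boundary
edge_b = [(0.0, 1.0), (1.0, 1.0), (3.0, 1.0)]  # vertices along the facing boundary
print(hausdorff(edge_a, edge_b))  # sqrt(5): the vertex (3, 1) has no close counterpart
```

Taking the maximum of both directions keeps the measure symmetric, so it satisfies the commutative property required for the undirected adjacency matrix.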
As Figure 16 illustrates, curve (a) declines overall, whereas curve (b) first declines and then rises. Both curves begin with a steep decline, which we call the "unstable region". According to Equation (10), when Avg_INDIS(i) is close to Avg_OUTDIS(i), the compactness of a parcel is close to 0; the steep decline arises because many parcels are in this state at this stage. In the unstable region, most parcels still form groups on their own and only a few parcels are grouped. Such grouping results are inconsistent with Gestalt theory and meaningless for our study, so we ignored the unstable region at the start of each curve. Curve (b) rises because the number of internal parcels increases, and the Compactness(i) of these parcels increases with the proximity level (refer to Equation (10)). Because internal parcels have already been grouped, they do not affect the grouping results.
Sudden changes in compactness (such as at level 16 in both curves) indicate that the compactness differs significantly between two adjacent levels. Extracting these mutation characteristics makes it possible to identify a reasonable grouping that satisfies Gestalt theory (refer to Section 3.4.2). Once the curves became steady, we chose the point at which the compactness changed distinctly. Both datasets obtained good visual expressions at level 16 (refer to Figure 17), which effectively conveyed compactness and adjacency. Using the evaluation criteria of Section 4.1, the average MLU indices of the two datasets increased by 0.24 and 2.45 (refer to Table 5), indicating that the functional configuration within groups improved. Our method preserved the imitation behaviours among parcel components (refer to Figure 18), with a distinct decrease in the ratio of ignored imitation behaviours compared with the k-means++ algorithm (refer to Table 4). Based on this analysis, our method not only achieves a good visual expression of parcel grouping but also preserves functional connectivity.
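The level selection described above can be sketched as a search for the largest drop between consecutive levels once the unstable region is excluded. This is a hypothetical rule rather than the exact criterion of Section 3.4.2: the length of the unstable region (`skip`) is assumed to be chosen by inspection, and the level returned is assumed to be the one just before the drop, on the reasoning that it is the last grouping before a large inter-group gap is absorbed. The function name and the sample curve are illustrative.

```python
from typing import Sequence


def mutation_level(compactness: Sequence[float], skip: int = 0) -> int:
    """Return the 1-based level just before the largest drop, ignoring the first `skip` levels."""
    best_level, best_drop = None, float("-inf")
    for k in range(skip, len(compactness) - 1):
        drop = compactness[k] - compactness[k + 1]
        if drop > best_drop:
            best_drop, best_level = drop, k + 1
    if best_level is None:
        raise ValueError("need at least two levels after the unstable region")
    return best_level


curve = [1.0, 0.6, 0.4, 0.38, 0.37, 0.1, 0.09]  # hypothetical average compactness per level
print(mutation_level(curve))          # 1: the unstable region dominates if not excluded
print(mutation_level(curve, skip=3))  # 5: the drop after the curve has levelled off
```

The first call shows why the unstable region has to be ignored: otherwise its steep initial decline is mistaken for the mutation.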

Practical Application of the UPG
In Section 4.3, we further verified the validity of the proposed method using Xicheng District in Beijing. CDT was used to construct the spatial proximity relationships; the result is shown in Figure 19b. The compactness curve showed a sudden change (refer to Figure 19d), which we refer to as a proximity mutation. Following our algorithm, we used level 22, at which the proximity mutation occurred, as the input and obtained the ideal grouping result (refer to Figure 19c), which consisted of 51 groups. Compared with the MLU index of each parcel before grouping (refer to Figure 19a), the MLU index after grouping increased by 1.68 on average, with a total increase of 85.71, indicating that the parcels achieved functional complementarity.
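The two reported MLU figures can be cross-checked. Assuming the average increase is taken over the 51 groups (the text does not state the denominator explicitly), the total and the average agree:

$$
\frac{85.71}{51} \approx 1.681 \approx 1.68
$$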
As discussed previously, narrow roads are more suitable for walking or cycling, which not only improves residents' mobility but also increases the frequency of interaction between parcels. Grouping results should therefore ensure that roads within a group are narrower and roads between groups are wider, so that parcels within a group interact frequently and their spatial correlation is enhanced. Following the 'Technical code for urban road engineering (GB51286-2018)' and approximate distance measurements on Baidu Map, we found that the widths of roads within parcel groups ranged from 8 m to 20 m, which agrees with the width range of branch roads, i.e., the fourth-level roads in the road grade system. Branch roads serve both traffic and service functions and primarily provide suitable living space, parking space, and necessary public space. The grouping result is thus consistent with the original intention of our design.
The grouping result enables groups to be merged upward into larger functional regions (refer to Figure 20a). In this paper, the case in which a functional region is partitioned into several groups is called "hard segmentation". Hard segmentation disperses the dominant function of a functional region, which may weaken the dominant role of this function after grouping. As illustrated in Figure 20b, one functional region was partitioned, with one part assigned to Group 1 and the other to Group 2. The dominant function of this region (park and plaza), however, was not dominant in either group: the dominant function of Group 1 is residential area and that of Group 2 is catering. Therefore, hard segmentation should be avoided as much as possible. The number of groups that partition a functional region is called the "number of hard segmentations"; we counted these and list them in Table 6.
To understand how well the grouping result matches urban land use, we measured hard segmentation by the overlap between the grouping results (refer to Figure 19c and Figure 21a,b) and Beijing's Urban Master Plan (2016-2035) (http://ghzrzyw.beijing.gov.cn/art/2018/1/9/art_5096_544304.html). To ensure a fair comparison, we ran the k-means++ method with two cluster-number settings: (1) the optimised cluster number (k = 7) and (2) the same number of clusters as UPG groups (k = 51).
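Counting hard segmentations amounts to checking, for each functional region of the master plan, how many parcel groups its parcels fall into. A minimal sketch, assuming each parcel has already received one functional-region label from the plan overlay and one group label; the function name and data layout are hypothetical, and since the denominator of the hard segmentation ratio in Table 6 is not stated in this section, the sketch stops at the counts.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set


def hard_segmentations(region_of: Dict[int, Hashable], group_of: Dict[int, int]) -> Dict[Hashable, int]:
    """Map each functional region split across several groups to its number of groups."""
    groups_per_region: Dict[Hashable, Set[int]] = defaultdict(set)
    for parcel, region in region_of.items():
        groups_per_region[region].add(group_of[parcel])
    return {region: len(groups) for region, groups in groups_per_region.items() if len(groups) > 1}


regions = {0: "park and plaza", 1: "park and plaza", 2: "residential", 3: "residential", 4: "catering"}
groups = {0: 1, 1: 2, 2: 2, 3: 2, 4: 3}
print(hard_segmentations(regions, groups))  # {'park and plaza': 2}
```

Regions whose parcels all share one group do not appear in the result, so an empty dictionary means no hard segmentation occurred.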
According to the evaluation results in Table 6, the hard segmentation ratio of our result is 9.80%, considerably lower than those of the two k-means++ settings (57.14% and 39.22%). By fully considering the spatial interaction patterns between pairs of parcels, the UPG method organises urban parcel data effectively and rarely destroys the latent semantics of urban morphology. Given sufficient semantic information, our method can infer urban structure and urban function more reasonably than the other methods. This evaluation illustrates that our methodology, using a finer division, can yield reasonable results, identify functional connectivity in urban space, and support the development of new methods for characterising urban behaviour.
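The exact overlap rule behind the hard segmentation ratio is not spelled out in this section. As a minimal sketch, assume every parcel carries one group label and one functional-region label taken from the master plan, and say that a group "cuts" a region when the two share parcels but neither contains the other, so the group can neither be merged upward into that region nor hold it whole. The ratio is then taken as the number of such groups divided by the total number of groups. This denominator is our assumption, although it is consistent with the reported 9.80% for 51 groups (5/51 rounds to 9.80%). The function name `hard_segmentation_ratio` is hypothetical.

```python
from collections import defaultdict


def hard_segmentation_ratio(group_of, region_of):
    """Share of groups whose boundaries cut across a functional region.

    group_of[i]: group label of parcel i (e.g. from UPG or k-means++).
    region_of[i]: functional-region label of parcel i (e.g. from a master plan).

    Assumption (not from the paper): a group "cuts" a region when they share
    parcels but neither contains the other.
    """
    if len(group_of) != len(region_of):
        raise ValueError("each parcel needs exactly one group and one region label")

    # Collect the parcel indices belonging to each group and to each region.
    groups = defaultdict(set)
    regions = defaultdict(set)
    for parcel, (group, region) in enumerate(zip(group_of, region_of)):
        groups[group].add(parcel)
        regions[region].add(parcel)

    if not groups:
        return 0.0, []

    cutting_groups = set()
    for group, g_parcels in groups.items():
        for r_parcels in regions.values():
            overlaps = bool(g_parcels & r_parcels)
            nested = g_parcels <= r_parcels or r_parcels <= g_parcels
            if overlaps and not nested:
                cutting_groups.add(group)
                break  # one cut region is enough to flag this group

    return len(cutting_groups) / len(groups), sorted(cutting_groups)
```

For example, with six parcels labelled into groups `[0, 0, 1, 1, 2, 2]` and regions `["A", "A", "A", "B", "B", "B"]`, only group 1 straddles regions A and B, so the ratio is 1/3. Groups nested inside a single region, or regions held whole by one group, are not penalised, which mirrors the "merge upward" ideal described for Figure 20a.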

Discussion
Some existing urban planning theories, such as functional zoning and the neighbourhood unit, have promoted the development of parcels. However, these approaches tend to keep enlarging the size of regions.
Fine-grained open parcels are usually more realistic than coarse-grained regions, and they offer greater capacity for sustainable development and better city vitality. Jan Gehl [68] contended that street life becomes dull and desolate when large units replace small, vivid ones. Thus, parcel grain size must be considered in future urban planning. This research contributes a new methodology for constructing urban spatial structure from fine-grained parcel data, which addresses deficiencies in other parcel grouping methods. The proposed parcel grouping method is a method of urban data organisation. Unlike urban function zoning [26,27,37] and block aggregation methods [56,57], it departs from the traditional top-down planning scheme and does not merge parcels. Zoning can therefore be realised by first grouping parcels and then establishing functional zones within each group, which reduces the grain size of regions and yields better zoning results. The bottom-up parcel grouping method designed in this paper was guided by urban morphology, the Gestalt principles, and the MAUP. The adjacency relationship of parcels was constructed on the basis of CDT and the Hausdorff distance, which enabled the grouping results to account for both spatial planning and graphical representation. To detect abrupt changes in proximity among parcels, we also redefined the silhouette coefficient as a compactness measure (Compactness) and further optimised the grouping results by analysing the compactness curve. Compared with other studies on grouping urban elements [48,50,55], our method requires fewer parameters and selects the number of groups more automatically, without extensive manual intervention.
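The Hausdorff distance mentioned above measures how far two shapes are from matching. The authors' line segment similarity is defined in an earlier section and is not reproduced here. As a minimal sketch, the exact Hausdorff distance between two straight edges (for example, two facing parcel edges across a CDT triangle) can be computed from endpoints alone: the distance from a point to a segment is a convex function of position, so along the other segment it peaks at an endpoint. The function names are hypothetical, and coordinates are assumed to be in a projected (planar) system.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment ab (all 2D tuples)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, then clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def segment_hausdorff(seg1, seg2):
    """Exact (symmetric) Hausdorff distance between two straight segments.

    Because point-to-segment distance is convex along a segment, each
    directed distance is attained at an endpoint of the source segment.
    """
    (a0, a1), (b0, b1) = seg1, seg2
    d_12 = max(point_segment_distance(a0, b0, b1), point_segment_distance(a1, b0, b1))
    d_21 = max(point_segment_distance(b0, a0, a1), point_segment_distance(b1, a0, a1))
    return max(d_12, d_21)
```

Two parallel 10 m edges 3 m apart give a distance of 3, whereas a 10 m edge facing a 5 m edge 2 m away gives about 5.39 (the square root of 29), because the far end of the longer edge has no close counterpart. For whole parcel boundaries rather than single edges, discrete variants computed on sampled vertices (for instance SciPy's `directed_hausdorff`) are a common alternative.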
A city is a complex system. As the basic unit of a city, a parcel reflects the characteristics of the city and is a microcosm of the larger system. The functional connectivity among parcels is the source of parcel vitality and parcel diversity. This paper described parcel diversity through the infiltration behaviours of components between pairs of parcels, which were modelled in terms of three main aspects: dominant function, functional complementarity, and imitation behaviour. The infiltration behaviour test results show that our grouping method can not only improve the degree of mixed land use within groups but also ensure that imitation behaviour is not disrupted. To further test validity, we compared our method with the k-means++ algorithm on these indicators, and the hard segmentation ratio in Section 4.3 provided further evidence. According to the results, our method can both protect the functional region layout under the government's macro-control and obtain fine-grained grouping results at the human scale.
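The dominant function, functional complementarity, and imitation behaviour indicators are defined earlier in the paper and are not reproduced here. To make "dominant function" and "degree of mixed land use" concrete, the sketch below assumes that each parcel in a group carries a single function label and an area, takes the dominant function to be the one with the largest area share, and uses normalised Shannon entropy, a widely used land-use mix index, as a stand-in for the paper's own mixing measure. The function names and the area weighting are assumptions.

```python
import math
from collections import defaultdict


def function_shares(parcels):
    """Area share of each function in a group given as (function, area) pairs.

    Assumes all areas are positive.
    """
    totals = defaultdict(float)
    for function, area in parcels:
        totals[function] += area
    grand_total = sum(totals.values())
    return {function: area / grand_total for function, area in totals.items()}


def dominant_function(parcels):
    """Function with the largest area share, together with that share."""
    shares = function_shares(parcels)
    function = max(shares, key=shares.get)
    return function, shares[function]


def mix_entropy(parcels):
    """Shannon entropy of function shares, normalised to [0, 1].

    0 means a single-function group; 1 means all functions present have
    equal area. Normalising by the number of functions present is one
    common choice; some studies divide by the total number of categories.
    """
    shares = function_shares(parcels)
    if len(shares) <= 1:
        return 0.0
    entropy = -sum(p * math.log(p) for p in shares.values() if p > 0)
    return entropy / math.log(len(shares))
```

For a group with 4 ha of residential parcels and 1 ha of catering, `dominant_function` returns residential with a share of 0.8 and `mix_entropy` is about 0.72. Splitting a region as in Figure 20b would show up as its function losing dominance in both resulting groups.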
Roads are the main space for urban public activities and the dividing lines of urban space. Street width not only reflects the compactness of the parcel distribution but also potentially indicates residents' travel mode (e.g., the narrower a road is, the more likely travel will be by bicycle or on foot). Narrow roads are one of the main drivers of component infiltration. They reduce car speeds and create a better environment for walking and cycling, which makes urban areas more accessible, mitigates urban heat islands, and lowers the cost of infrastructure development, among other benefits. Narrow roads also increase the density of gaps and passages in a city and provide space for various urban systems, connecting the city's elements into a whole and optimising the urban structure. Our method fully accounts for the role of narrow road space. In the grouping result, the road spaces between groups were sufficiently wide and consisted primarily of urban trunk roads and expressways, whereas the roads inside each group were primarily branch roads 8 m to 20 m wide.

Conclusions and Further Research
This work has presented a method for urban parcel grouping. The method draws on the concepts of urban morphology and urban functional connectivity to analyse urban infiltration behaviours and form parcel groups. It addressed the two main research questions raised in the related work section: (1) the proposed proximity calculation effectively measured the adjacency relationship and compactness between parcels, which ensured a good visual expression of the grouping results; (2) the quantified infiltration behaviours were used to evaluate the grouping results, which, unlike purely geometric evaluation, takes urban semantics into account. The effectiveness and practicability of the proposed method were validated on actual urban parcel data from Beijing and through comparison with the k-means++ method. The results show that our method not only achieves fine-grained grouping results but is also consistent with human cognition. It also accounts for the infiltration behaviours of urban parcels and preserves urban functions more completely than other methods.
Although we obtained reasonable parcel grouping results, we found that our method can become trapped in local optima (refer to Figure 19c, where some parcel groups are too small and should be combined with adjacent ones), which limits further improvement. In future research, we therefore plan to use the gravitational search algorithm (GSA) or particle swarm optimisation to improve the current algorithm and approach the global optimum. Another important issue for future research is that urban design should leverage human behavioural and location data, which offer a direct way to capture the interaction between residents and cities. We therefore plan to investigate additional sensor data, e.g., public bicycle-sharing data, heat island data, and other semantic signatures, to further quantify problems in urban design and help shift urban design from 'space oriented' to 'human oriented'. We will build a 'city portrayal' by combining the parcel grouping method in this study with a top-down knowledge engineering approach based on urban form, human geography, and urban planning.
Author Contributions: Shuqing Zhang and Peng Wu conceived the original idea for the study, and Shuqing Zhang and Huapeng Li provided the financial support. Peng Wu was responsible for the design of the study, setting up the experiments, and writing the initial draft of the manuscript. Peng Wu, Xiaohui Ding, and Yuanbing Lu conducted the processing and analysis of the data. Patricia Dale polished the language; Shuqing Zhang, Patricia Dale, and Huapeng Li revised the manuscript critically. All authors read and approved the final manuscript.