Extracting Land Use Change Patterns of Rural Town Settlements with Sequence Alignment Method

Abstract: Understanding land use change patterns of rural town settlements (RTSs) is crucial for rural and small-town planning; however, few studies have explored pattern mining approaches to RTS trajectory analysis. In this study, we adopted a novel approach based on the sequence alignment method (SAM) to detect representative trajectory clusters of land use change for 1158 RTSs across seven waves from 1980 to 2015 in Guangdong, China. The results suggest that there are 10 clusters of RTSs with varying trajectories of land use change, implying differences in their development processes and in the underlying socioeconomic, demographic, and institutional factors. A spatial distribution map of RTSs shows that stable cultivated ecological and stable ecologically dominant RTSs are distributed in the northern, eastern, and western parts of Guangdong, whereas stable rural construction and stable mixed construction RTSs are mostly located around the provincial boundary. Notably, 73% of the RTSs that have undergone changes in land use types are located in the Pearl River Delta (PRD), including urbanized and agricultural upgraded RTSs. The analysis presented here summarizes the driving forces of the spatial evolution of RTSs, including location, landforms, industries, and policy factors. This study provides dynamic policy implications for understanding longitudinal and sequential spatial restructuring and regional coordinated development in the fast-growing PRD area.


Introduction
Land use change is an important clue that indicates the development pattern of a region [1] and has long been the analytical focus in the fields of geography, natural resource, and urban and rural planning. Understanding a region's development pattern is critical in tracing the pathway of spatial restructuring that results from socioeconomic transformation. This helps planners and decision-makers diagnose the growth or decline issues of specific regions, designate optimal development paths, and formulate plans for sustainable regional development [2]. Although an increasing number of empirical studies have tracked urban neighborhood-level land use changes in urban areas [3][4][5], only a few of them have investigated the trajectories of land use changes in rural town settlements (RTSs), particularly in China. In general, the term "rural town settlement" refers to the broad concept of township settlements in rural areas, excluding those in urban areas. Currently, China is at a transitional stage of promoting urban-rural integration under the context of rapid urbanization, with the urbanization rates sharply growing from 19.39% in 1980 to 63.89% in 2020 [6]. Hence, developing or revitalizing RTSs has become one of the most important national development agendas. Therefore, more studies are needed to investigate the land use change patterns of RTSs.
In general, a major challenge of such studies is the lack of longitudinal data at fine-grained space-time resolutions and of methods for pattern mining among land use trajectories [7]. Most current studies on land use change focus on the macro scale, including research at the global [8][9][10], national [11][12][13][14], regional [15][16][17][18][19], and city scales [20][21][22][23]. Generally, there are different types of trajectory mining approaches, including hierarchical cluster analysis [33], k-means [34][35][36], and the sequence alignment method (SAM) [3,5,37]. Delmelle [3] examined the longitudinal trajectories of neighborhood change in Chicago and Los Angeles from 1970 to 2010. She applied SAM to mine patterns in the evolution of neighborhood types and found that the two cities exhibit different processes of neighborhood upgrading. Overall, she found 10 patterns in Chicago and 9 patterns in Los Angeles. Patterns in Chicago are marked by a process of center city revitalization, whereas neighborhood upgrading in Los Angeles occurred in the form of suburban improvement. Although SAM has been increasingly applied in urban neighborhood change studies, it is less often used to analyze land use change patterns of RTSs. Urban neighborhood SAM studies often focus on spatial evolutionary patterns of neighborhoods' socioeconomic components, such as ethnicity and poverty [34,38], whereas RTS analysis emphasizes land use change patterns. The aim of this study was, therefore, to fill this gap, with a particular focus on identifying the varying development patterns in rural town areas in China.
In this study, we adopted long-term remote sensing data from 1980 to 2015 (seven waves) to extract the land use types of 1158 RTSs in Guangdong, China, and then constructed a seven-wave sequence of land use trajectories for each individual RTS. Relying on these trajectory data, we adopted a SAM based on progressive alignment to cluster RTSs with similar land use trajectories into groups. The aim of these analyses is to answer the following research questions: (1) How many types of development patterns can be extracted from the trajectory mining of RTS land use sequences? (2) What are these development patterns of land use changes and the underlying socioeconomic and institutional factors resulting in such patterns?
This study contributes to the literature in two ways. First, the results obtained in this study can enrich the development pattern-mining studies of RTSs while focusing on settlements' land use changes. This can help researchers and practitioners better understand the different types of evolution processes of spatial restructuring of RTSs in Guangdong, China. Given that RTSs are under-researched, it is important to comprehend their development patterns before formulating national or regional policies to revitalize their growth and moderate village declines, according to the local development conditions. Second, unlike most of the current studies only highlighting urban neighborhood changes, this study is considered one of the first studies applying the SAM approach to detect patterns of RTS changes. Given that the development of RTSs has become increasingly critical worldwide, particularly in rapid-urbanization countries, such as China, the novel SAM approach provides a dynamic perspective to understand longitudinal and sequential spatial restructuring and regional coordinated development.
The rest of the paper is organized as follows: The next section describes the data for the analysis and introduces the method. An analysis of the pattern mining results comes next. Discussions and conclusions are offered in the final section.

Data
In this study, the measures of land use change rely on a large set of remote sensing data extracted from Landsat TM imagery through manual visual interpretation by the Data Center for Resources and Environmental Sciences of the Chinese Academy of Sciences. The dataset contains seven waves of land use data, from 1980, 1990, 1995, 2000, 2005, 2010, and 2015, with a spatial resolution of 30 m. Overall, it distinguishes six primary types of land use (cultivated land, forest land, grassland, water, residential land, and unused land) and 25 secondary types. Because this study focuses on the land use changes that result from the socioeconomic development of RTSs affected by human activities, rather than on natural land cover change, we aggregated land parcels for ecological uses (forest land, grassland, water, and unused land) into a single type, ecological land. At the same time, to observe the impact of urban and rural activities on land use more accurately, we divided residential land into three categories: urban residential land, rural residential land, and nonresidential construction land. Finally, we considered five types of land use in this study: cultivated land, urban residential land, rural residential land, nonresidential construction land, and ecological land. The study used the areal size of these five land use types as input variables.
We selected the Guangdong Province, China as our research area. Guangdong is a fast-developing megaregion with widely distributed rural areas. Although many rural areas have developed into urban areas during the rapid urbanization phase, there are still many villages and towns at the urban fringes. Given the uneven development processes of different cities and regions, the trajectories of land use change in Guangdong are expected to be complicated and diverse. Thus, Guangdong is regarded as a representative case for development pattern mining analysis. Guangdong contains 21 cities, in which there are 1158 RTSs in the rural areas and 310 neighborhood subdistricts in the urban areas, according to the 2008 official census. Because the boundaries of several administrative RTSs have changed over time, to perform a consistent comparison, we used the official boundary data of 2008 as a reference and computed the land use shares of all seven waves according to this boundary reference. Unlike the case in urban neighborhoods, this study focuses more on RTSs. Hence, we selected 1158 RTSs as analytical units.

Methods
We applied a SAM to cluster sequences of land use change in RTSs into groups and visualized and extracted varying development patterns of RTSs in Guangdong from 1980 to 2015. Generally, the SAM approach was sourced from the Needleman-Wunsch algorithm, which was proposed by Needleman and Wunsch in 1970 and used in the field of bioinformatics [39]. This method was first used to compare the similarity among protein or DNA sequences and detect the homology between the different sequences. Notably, early SAM algorithms were only able to compare two DNA sequences, but then they gradually became capable of multiple sequence alignments [40]. In the 1980s, SAM was introduced into the realm of social science [41], first in the field of sociology [42] and then in geography, to analyze neighborhood socioeconomic changes, trip-chain pattern mining, and behavioral trajectory patterns [3,42,43].
SAM is essentially a string editing technique that calculates the similarity between two sequences as a function of the number of steps required to convert one sequence into the other. The editing operations of the algorithm are deletion, insertion, and substitution. The larger the number of steps needed to make two sequences equal, the greater the difference between the two. The Needleman-Wunsch algorithm is a classical dynamic programming algorithm for the global alignment of two sequences. Most existing pattern clustering techniques operate on numerical values or individual distances [43,44]. By contrast, SAM captures RTSs with similar evolutionary trajectories by considering not only type or value but also the chronological order of RTS evolution [45,46]. SAM first needs to define the similarity between two sequences by adopting an alignment and matching method. As shown in Table 1, given the land use sequences (X and Y) of two RTSs, the analytical difficulty lies in how to quantitatively calculate the similarity between these two sequences. It is assumed that the elements (i.e., the land use type in each wave) of the two sequences compose the trajectory of land use change. SAM then computes the similarity score on the basis of an alignment process between the two sequences. As an example, the Needleman-Wunsch algorithm consists of three computational steps, outlined as follows.
(a) The first step is to construct a two-dimensional score matrix F and initialize it according to the scoring rules. In this study, we set several constants as scoring rules as follows:
\[
s(x_i, y_j) =
\begin{cases}
2, & x_i = y_j \\
-5, & x_i \neq y_j
\end{cases}
\qquad
s(-, y_j) = s(x_i, -) = -5
\tag{1}
\]
where s(x_i, y_j) is the substitution score for elements x_i and y_j. When the elements x_i and y_j match, the score is 2, whereas when they do not match, the score is −5. Here, the symbol − denotes a gap. When aligning sequences, introducing gaps in the sequences can allow an alignment algorithm to match more terms than a gap-less alignment can. In this case, we set the gap penalty as −5.
(b) The second step is to compute the scores and fill in the matrix. The similarity score of each cell F(i, j) of the scoring matrix is calculated using the scoring rule and the following formula (Figure 1):
\[
F(i, j) = \max
\begin{cases}
F(i-1, j-1) + s(x_i, y_j) \\
F(i-1, j) + s(-, y_j) \\
F(i, j-1) + s(x_i, -)
\end{cases}
\tag{2}
\]
where s(x_i, y_j) is the substitution score for elements x_i and y_j, s(−, y_j) is the gap penalty applied when moving from F(i − 1, j) in the vertical direction, and s(x_i, −) is the gap penalty applied when moving from F(i, j − 1) in the horizontal direction.
The score matrix F(i, j) is filled in by recursive computation; the three paths for computing F(i, j) are given in Equation (2). As can be observed from Figure 1, the score of F(i, j) is the maximum over the three paths.
(c) The third step is to search for the optimal alignment from the matrix and to estimate the optimal alignment by tracing backward from the diagonal element to the previous highest value. In this example, sequences X and Y represent the spatial evolution sequences of two RTSs. The sequence alignment results obtained by path backtracking are shown in Figure 2. According to the sequence alignment results, the final score is −6.
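The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MATLAB implementation used in the study: the function name `needleman_wunsch` and the example sequences are hypothetical, the letters stand for the RTS type codes (C, E, R, M, U), and the traceback starts from the bottom-right cell of F. Because Table 1 is not reproduced here, the example does not attempt to recover the score of −6 reported for X and Y.

```python
def needleman_wunsch(x, y, match=2, mismatch=-5, gap=-5):
    """Globally align x and y; return (score, aligned_x, aligned_y)."""
    n, m = len(x), len(y)
    # Step (a): F[i][j] is the best score for aligning x[:i] with y[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    # Step (b): fill each cell with the maximum of the three paths.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    # Step (c): trace back from the bottom-right cell to recover one optimal alignment.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if x[i - 1] == y[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:
                ax.append(x[i - 1])
                ay.append(y[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] + gap:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


# Hypothetical seven-wave trajectories, not taken from the study data.
score, aligned_x, aligned_y = needleman_wunsch("CCCRRMM", "CCRRMMU")
```

With these scoring constants a single mismatch costs as much as a single gap, so the algorithm shifts one sequence against the other only when doing so recovers enough matches to offset the two gaps it introduces.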
In this research, we studied the spatial evolution sequence of multiple RTSs, and therefore we needed to use a technique for multiple sequence alignment. We analyzed the evolutionary process of RTSs using progressive alignment. The main steps of progressive alignment are as follows: (1) pairwise align all sequences and calculate the distance matrix on the basis of the results of pairwise sequence alignment; (2) construct a guide tree to reflect the evolutionary relationships between sequences according to the distance matrix; (3) guide the two sequences with the closest evolutionary relationship to be aligned using the global dynamic programming algorithm; and (4) progressively add the other sequences in pairwise sequence alignment. After all sequences are added to the alignment, the multiple sequence alignment is considered to be completed.
We applied the unweighted pair-group method using arithmetic averages (UPGMA) to construct a guide tree. UPGMA is a cluster analysis method, proposed by Michener and Sokal in 1957 [47], that uses arithmetic averages. This method first constructs a distance matrix M of pairwise distances between sequences according to the results of pairwise sequence alignment. Next, the minimum distance value D_qp is sought in the distance matrix, the two sequences q and p are clustered into a new class r, and the distances between r and the other sequences are calculated. This is followed by finding the sequence with the closest distance for one more clustering, until all sequences are clustered into one class, and then constructing a phylogenetic tree on the basis of the clustering results.
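The UPGMA procedure described above can be illustrated with a short Python sketch. It is a hedged illustration rather than the routine used in the study: the function name `upgma`, the nested-tuple tree representation, and the three-sequence distance matrix are assumptions made for the example, and the distances are taken as given because the text does not state how alignment scores are converted to distances.

```python
def upgma(dist, labels):
    """Build a guide tree (nested tuples) from a symmetric distance matrix."""
    n = len(labels)
    # Active clusters: id -> (subtree, number of sequences in the subtree).
    clusters = {i: (labels[i], 1) for i in range(n)}
    # Pairwise distances keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # Find the closest pair (q, p) and merge it into a new class r.
        q, p = min(d, key=d.get)
        tree_q, size_q = clusters.pop(q)
        tree_p, size_p = clusters.pop(p)
        del d[(q, p)]
        # Distance from r to every other cluster k is the size-weighted
        # arithmetic average of the distances from q and from p.
        for k in clusters:
            d_qk = d.pop((min(q, k), max(q, k)))
            d_pk = d.pop((min(p, k), max(p, k)))
            d[(k, next_id)] = (size_q * d_qk + size_p * d_pk) / (size_q + size_p)
        clusters[next_id] = ((tree_q, tree_p), size_q + size_p)
        next_id += 1
    (tree, _size), = clusters.values()
    return tree


# Hypothetical distances between three sequences A, B, and C.
example_dist = [
    [0.0, 1.0, 4.0],
    [1.0, 0.0, 6.0],
    [4.0, 6.0, 0.0],
]
guide_tree = upgma(example_dist, ["A", "B", "C"])
```

Here A and B are merged first because their distance is the smallest, and C joins the tree at the averaged distance (4 + 6) / 2 = 5.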
We used the sequence alignment and phylogenetic analysis module in MATLAB to run SAM. Specifically, the following steps were used to analyze land use changes in this study. (1) We standardized land use data to reduce the impact of land parcel size on the data analysis. (2) We used the k-means algorithm to cluster the land use data of 1158 RTSs in the seven waves and classify RTSs with different land use spatial structures. (3) We constructed a DNA sequence of the spatial evolution trajectory of each RTS, calculated the distance between two sequences, constructed a distance matrix of all sequences, and on the basis of the distance matrix, established a phylogenetic tree for multiple sequences. (4) We compared the DNA sequences of the spatial evolution trajectory of RTSs through a SAM, and compared the similarities, differences, and trajectory trends among the sequences to explore the spatial evolution patterns of RTSs in Guangdong.

Clustering Rural Town Settlements by Land Use Patterns
In this study, we first used the k-means clustering approach to partition RTSs in Guangdong into groups according to their land use compositions for all seven waves from 1980 to 2015. The variables selected for the clustering analysis were the standardized areal sizes of the five land use types in RTSs: cultivated land, urban residential land, rural residential land, nonresidential construction land, and ecological land. Several indices were used to determine the number of clusters k. The Calinski-Harabasz index (CH index) is the ratio of between-group dispersion to within-group dispersion; a higher value indicates a better clustering result [48]. The silhouette coefficient is another widely used index for evaluating the effectiveness of clustering [49]. It combines two factors, cohesion and separation: the closer the value is to 1, the better the clustering. The inertia index can be considered a measure of intra-class aggregation: the smaller the inertia value, the better the partition. We computed the CH index, silhouette coefficient, and inertia index and identified the optimal number of clusters as 5, as shown in Figure 3. Thus, the five types of RTSs detected by the k-means analysis can be defined by their features of land use composition (Table 2) as follows:

(a) Type-1: Cultivated ecological RTSs (C). RTSs identified as Type-1 have relatively large shares of cultivated land (28.40%) and ecological land (68.34%) compared to other RTS types, whereas the share of construction land is relatively small (0.47% urban residential land, 2.29% rural residential land, and 0.50% nonresidential land). Type-1 RTSs also have relatively small areas, with an average of 100.55 km². This type of RTS primarily represents traditional primitive rural areas serving agricultural production.
(b) Type-2: Ecologically dominant RTSs (E). Type-2 RTSs are often large and are dominated by ecological land, with an average share of 84.61% among all types of land. Here, ecological land includes forest land, grassland, water, and unused land. In these RTSs, cultivated land is the second largest land use type, averaging around 14.40%. This implies that Type-2 RTSs are typical rural town areas for ecological conservation.
(c) Type-3: Rural construction RTSs (R). In this type of settlement, the average area of rural residential land is the largest, at 12.39 km², whereas the areas of urban residential land and nonresidential construction land are relatively trivial. The area of cultivated land in this type of RTS is also the largest, but that of ecological land is relatively small. This demonstrates that Type-3 RTSs are agricultural upgrading and development RTSs, which promote the expansion of rural residential land.
(d) Type-4: Mixed construction RTSs (M). The proportion of nonresidential construction land in this type of RTS is much higher than in the others, with an average of 8.35%. The areal sizes of urban and rural residential land also rank among the highest. This type of RTS also has a relatively high proportion of cultivated and ecological land, and it contains a mixture of construction land types, indicating that it is a transformational development rural town area.
(e) Type-5: Urban construction RTSs (U). These RTSs contain the highest proportion of urban construction land (37.98%) compared to other types, whereas the share of cultivated land is relatively low. The proportion of ecological land, at 31.89%, is the lowest among all RTS types. A large amount of ecological and agricultural land has been transformed into urban construction land. This is a typical urbanized rural town area that is often located near the fringes of urban areas.
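Two of the indices used to choose the number of clusters, inertia and the CH index, can be computed directly from a labeled partition. The sketch below is only an illustration of the definitions: the function name `inertia_and_ch` and the four toy points are hypothetical, and it does not reproduce the values in Figure 3. It assumes at least two clusters, more points than clusters, and nonzero within-group dispersion.

```python
def inertia_and_ch(points, labels):
    """Return (inertia, Calinski-Harabasz index) for a labeled partition."""
    n = len(points)
    dim = len(points[0])

    def mean(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    # Group the points by cluster label.
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    k = len(groups)

    overall = mean(points)
    within = 0.0   # within-group dispersion (this is the inertia)
    between = 0.0  # between-group dispersion
    for rows in groups.values():
        centroid = mean(rows)
        within += sum(sqdist(r, centroid) for r in rows)
        between += len(rows) * sqdist(centroid, overall)

    # CH index: between-group dispersion over within-group dispersion,
    # each divided by its degrees of freedom.
    ch = (between / (k - 1)) / (within / (n - k))
    return within, ch


# Hypothetical standardized observations forming two obvious groups.
toy_points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good_inertia, good_ch = inertia_and_ch(toy_points, [0, 0, 1, 1])
bad_inertia, bad_ch = inertia_and_ch(toy_points, [0, 1, 0, 1])
```

In this toy case the partition that follows the two visible groups has the lower inertia and the higher CH index, which is the behavior the selection of k relies on.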

Description of the Land Use Trajectories of Rural Town Settlements
According to the clustering results of RTSs, we can obtain the land use category of an RTS in each year and construct its land use change trajectory from 1980 to 2015 (with seven waves). For example, by analyzing the trajectories of the 138 RTSs that were urban construction (Type-5) or mixed construction (Type-4) RTSs in 2015 (Figure 4), we can observe that in 1980, most of these 138 RTSs were categorized as Type-1 or Type-3. This demonstrates the rapid urbanization process in RTS areas, where most urbanized RTSs have evolved from traditional rural and agricultural areas. Such an evolutionary process shows the transformation of land use functions from mainly agricultural land and rural construction land to, eventually, urban construction land.
Specifically, Figure 4a shows the change of land use types of the 138 RTSs over seven waves. Before 2005, only a small number of RTSs showed a significant change in land use type to the urbanized types (i.e., Type-4 and Type-5). However, immediately after 2005, significant RTS evolution was witnessed, with most of the RTSs shifting from Type-1 and Type-3 to Type-4 or Type-5. Since 2000, the expansion of urban residential land and nonresidential construction land has entered an accelerated period. The amount of cultivated land declined rapidly, and even the areas of rural residential land started to decrease. It can also be observed that a certain amount of agricultural and ecological land was converted into construction land from 2000 to 2015. In addition, Figure 4b shows the development process of the 104 RTSs that were rural construction (Type-3) RTSs in 2015. Unlike urbanized RTSs, 89% of the rural upgraded RTSs became rural construction RTSs before 2005. This indicates that rural residential land expanded rapidly from 1980 to 2005, and a large amount of cultivated and ecological land was converted into rural residential land for agricultural production and housing.
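The construction of trajectories described in this section can be sketched as follows. The letter codes follow the type abbreviations used above (C, E, R, M, U); the function names `to_sequence` and `origins_of` and the four example rows are hypothetical and do not come from the study data.

```python
from collections import Counter

# Letter codes for Type-1 to Type-5, following the abbreviations in the text.
TYPE_LETTERS = {1: "C", 2: "E", 3: "R", 4: "M", 5: "U"}


def to_sequence(types_by_wave):
    """Turn one RTS's per-wave type numbers (1 to 5) into a letter string."""
    return "".join(TYPE_LETTERS[t] for t in types_by_wave)


def origins_of(sequences, final_letters):
    """Count first-wave types among sequences whose last wave is in final_letters."""
    return Counter(seq[0] for seq in sequences if seq[-1] in final_letters)


# Hypothetical per-wave labels for four RTSs (1980 to 2015), not real data.
example_labels = [
    [1, 1, 1, 1, 4, 4, 5],
    [3, 3, 3, 3, 3, 4, 4],
    [1, 1, 1, 3, 3, 3, 3],
    [2, 2, 2, 2, 2, 2, 2],
]
sequences = [to_sequence(row) for row in example_labels]
# Where did the RTSs that ended as mixed (M) or urban (U) construction start?
urbanized_origins = origins_of(sequences, {"M", "U"})
```

Tabulating the first-wave type of RTSs that end in a given type is the kind of summary that underlies the observation that most urbanized RTSs started as Type-1 or Type-3.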

Extracting the Land Use Change Patterns of Rural Town Settlements with Sequence Alignment Methods
In this study, we applied the SAM approach to partition the land use sequences of 1158 RTSs in Guangdong into groups by calculating the similarity in the trend of land use change among the sequences. The modeling results revealed 10 types of land use sequences of RTSs (i.e., 10 types of 'DNA' of RTSs). Figure 5 shows these 10 types of RTSs, grouped by the sequential similarity of their land use change. In general, these 10 types of RTS sequences can be divided into two categories: RTSs with stable land use patterns (hereafter, stable RTSs) and RTSs with changing land use patterns (hereafter, changing RTSs).

Stable RTSs mostly have unchanged land use categories from 1980 to 2015, or only slight changes in one or two occasional time slots. There are four types of stable RTSs: Type-1 (cultivated ecological), Type-2 (ecologically dominant), Type-3 (rural construction), and Type-4 (mixed construction). Interestingly, we found no stable Type-5 RTSs. In other words, hardly any rural town areas have been constantly dominated by urban construction land since 1980. This is reasonable because immediately after 1980, at the beginning of the reform and opening-up policies, most rural town areas in Guangdong, and in most regions of China, were dominated by cultivated land.
On the other hand, changing RTSs represent rural town areas with significant land use changes from 1980 to 2015, including six changing patterns: from Type-1 (cultivated ecological) to Type-3 (rural construction), from Type-1 to Type-4 (mixed construction), from Type-1 to Type-5 (urban construction), from Type-2 (ecologically dominant) to Type-4, from Type-3 to Type-4, and from Type-4 to Type-5. These changes demonstrate the general development patterns of land use of RTSs over a period of more than 30 years.
Most RTSs in Guangdong have stable land use patterns, accounting for 88% of all RTSs. Stable RTSs are widely distributed in the eastern, northern, and western parts of Guangdong, outside the Pearl River Delta (PRD) area (Figure 6). Thus, most stable RTSs are in less wealthy regions, whereas the changing RTSs are mainly distributed in the well-developed PRD region. Because of rapid urbanization, the PRD has experienced a significant increase in the proportion of construction land, resulting in a dramatic change in the land use patterns of its RTSs. Some RTSs outside the PRD area have also undergone a transformation in their land use structure, as shown in Figure 6.

Discussion: Driving Forces Formulating Land Use Sequences of Rural Town Settlements
In general, the analysis of driving forces is an important reference for formulating development strategies for less-developed RTSs. Here, we combine the spatial distribution of the evolution patterns with the internal and external conditions of the RTSs (Table 3) to discuss the driving factors of the spatial evolution patterns.
Most of the stable Type-2 RTSs are located in Northern Guangdong, which is a mountainous area. Because of topographical and geomorphological constraints, it is difficult to convert cultivated land into construction land there, which limits the spatial evolution of these RTSs. By contrast, RTSs in plain areas can rely on topographic advantages to expand construction land. For example, in the PRD plains, rural industrialization driven by rural and urban expansion, urbanization fueled by immigration from inside and outside Guangdong Province, and regional urban-rural integration have all proceeded rapidly, and a large number of rural areas have developed into urban areas.
Location is also an important driving force. On the one hand, RTSs adjacent to the city center can be influenced by the city planning. Not only do they have easy access to the city's basic services, but they can also receive industries that have moved away from the city. Therefore, RTSs close to the city center are well developed. In addition, RTSs located on the borders of two cities can be developed with the help of transportation roads between the cities by provincial or regional transportation planning schemes. On the other hand, RTSs in remote mountainous areas are difficult to develop because of their remote location and poor transportation services.
Overall, the 10 types of RTSs by sequence, as shown in Figure 5, were found to exhibit varying evolution features from 1980 to 2015. Figure 6b shows the specific spatial distribution of these 10 types of RTSs. The land use change details and geographical distribution characteristics of these 10 types of RTSs are summarized as follows:
Only 14 RTSs (1%) constitute this type, the fewest of all RTS DNA sequences. They are scattered in the central part of Guangdong, including a small number of RTSs in the PRD and Northern Guangdong. A typical example of this pattern is Genghe Town in Foshan, which has achieved rapid expansion of construction land through the optimization of its industrial structure and investment attraction;
(i) DNA-9 (changing from Type-3 to Type-4): These 15 RTSs, constituting 1% of the RTSs in the study area, follow a DNA sequence transition from Type-3 (rural construction) to Type-4 (mixed construction) over the study period. The spatial distribution map of this type of RTS shows that most of them (93%) are located in the core cities of the PRD, including Guangzhou, Foshan, Dongguan, and Huizhou;
(j) DNA-10 (changing from Type-3 to Type-5): The 19 RTSs (2%) in this group exhibit a changing DNA sequence over the 35 years of the study and follow an expansion trend of urban residential land. Located in the core of the PRD, these RTSs have a superior location (e.g., Xintang town in Guangzhou and Humen town in Dongguan). Both of these towns received spillover industries from urban areas, resulting in land use changes.

Discussion: Driving Forces Formulating Land Use Sequences of Rural Town Settlements
In general, the analysis of driving forces is an important reference for formulating development strategies for less developed RTSs. The spatial evolution patterns identified above show how the various patterns are distributed in space. Combining this distribution with the internal and external conditions of the RTSs (Table 3), we discuss the driving factors of the spatial evolution patterns.
Most of the stable Type-2 RTSs are located in Northern Guangdong, which is a mountainous area. Because of topographical and geomorphological constraints, it is difficult to convert cultivated land into construction land there, which limits the spatial evolution of these RTSs. By contrast, RTSs in plain areas can rely on their topographic advantages to expand construction land. For example, in the PRD plains, rural industrialization driven by rural and urban expansion, urbanization fueled by immigration from inside and outside Guangdong Province, and regional urban-rural integration have all proceeded rapidly, and a large number of rural areas have developed into urban areas.
Location is also an important driving force. On the one hand, RTSs adjacent to a city center are influenced by city planning. Not only do they have easy access to the city's basic services, but they can also receive industries that have moved away from the city. Therefore, RTSs close to the city center are well developed. In addition, RTSs located on the border between two cities can develop with the help of intercity roads planned under provincial or regional transportation schemes. On the other hand, RTSs in remote mountainous areas are difficult to develop because of their remote location and poor transportation services.
As a driving force of urban development, industries have also played an important role in spatial evolution. The economic development of RTSs with stable land use patterns relies on agriculture, and these RTSs lack the endogenous dynamics and external driving forces needed to promote their development. RTSs with changing land use patterns are guided by industrialization. Meanwhile, some RTSs have better landforms and location conditions, which attract foreign investment and the construction of large-scale industrial parks; these developments are also guided by dynamic city master plans.
During the spatial evolution of the RTSs in Guangdong, policies have driven land use changes in rural areas [50]. The emergence of the Household Responsibility System liberated labor in RTSs and helped rural residents pursue rural industrialization. A large number of rural laborers moved to the large cities, causing an exodus from remote rural areas and leaving the land use structure of those RTSs stable for a long time. Finally, it is worth noting that the development of rural areas in the PRD has relied on economic reform and opening up to attract foreign investment, build factories, and sell products through foreign trade, in addition to the population influx.
In summary, the study found that the RTSs of Guangdong Province have diverse land use evolutionary patterns and are highly polarized in terms of spatial distribution, with a large difference in land use sequences between the PRD and non-PRD regions. These findings are consistent with existing studies [51]. For example, Ye [52] revealed the differences in land use structure within Guangdong Province, and Cao [53] confirmed the economic differences among the four regions of Guangdong Province. In addition, the evolution of the spatial structure of land use in RTSs varies with the driving forces, which are themselves likely to differ by context across countries. In a study on land management, agriculture, and forestry in developing countries, Lambin [54] found that economic globalization, land use regulations, and policies can help developing countries achieve a sustainable land use transition. Uisso [55] analyzed the changes in the amount of arable land in Tanzania and their influencing factors, and found that economic and demographic factors play more important roles in land use changes. Thus, the findings of this study can enrich worldwide research on land development trajectories in rural town areas by adding China's development experience.

Conclusions
In this study, we investigated the spatial evolution process of RTSs from the perspective of long-term longitudinal land use changes. We applied a SAM to mine the spatial evolution patterns of the RTSs in Guangdong Province and then analyzed the influence of basic socioeconomic attributes and policy factors. This study helps fill the gap in research on the longitudinal and continuous spatial evolution of RTSs, and it extends the method of neighborhood development pattern mining to the study of rural areas. The results show that, according to the spatial structure of land use, the RTSs in Guangdong can be clustered into five types. Moreover, 10 development patterns of RTSs were observed in Guangdong, which can be divided into RTSs with stable land use patterns and RTSs with changing land use patterns. Notably, most of the RTSs with changing land use patterns (around 73%) are located in the PRD. These findings emphasize the importance of longitudinal data analysis of RTSs and of pattern mining studies of land use trajectories. As shown here, if the land use development patterns of RTSs are not scrutinized, the development of less developed areas may be overlooked, exacerbating the polarization between rural areas in the PRD and those in other regions.
We also found that location, landforms, industries, and policy factors influence the spatial evolution patterns of RTSs. In other words, RTSs located in plains and close to developed cities are more likely to develop rapidly and transform into other types of RTSs. These findings coincide with several recent studies performed in the PRD [56][57][58][59]. However, those studies mostly focused on the PRD region and used cross-sectional data. Therefore, we believe that more attention should be paid to the development process of less developed regions outside the PRD, using longitudinal data combined with pattern mining methods, in order to coordinate regional development and revitalize rural areas.
In addition, the results of spatial evolution patterns can help guide rural town area planning for rural revitalization. This study reveals the significance of the analysis based on the town scale and its spatial evolution trajectories. In particular, it provides a method for comparing the similarities and differences of the spatial evolution process and patterns of RTSs.
The empirical analysis also reflects several advantages of the SAM applied to spatial evolution pattern mining, such as the high efficiency of the algorithm, its consideration of time series factors, and its ability to mine patterns over long time series. However, the algorithm has some shortcomings. First, progressive SAM is a greedy algorithm, so the resulting sequence alignment is an approximate solution and an optimal solution is difficult to obtain. Second, multiple sequence alignment is a nondeterministic polynomial-time (NP) hard problem [60]. The evaluation of sequence alignment results still lacks a perfect objective function, and none of the present objective functions covers all aspects of the problem. Hence, converting sequence alignment results into network data and using complex network analysis algorithms for clustering is considered a potential solution [61][62][63][64].
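To make the pairwise step that underlies progressive alignment concrete, the following minimal sketch computes a global alignment cost between two land use type sequences by dynamic programming (Needleman-Wunsch style). The unit gap and mismatch costs, the function names, and the encoding of each wave as a single type code character are assumptions for illustration; they are not the parameters used in this study.

```python
def alignment_distance(a, b, gap=1, mismatch=1):
    """Minimal global alignment cost between sequences a and b.

    gap and mismatch are hypothetical unit costs chosen for illustration.
    """
    n, m = len(a), len(b)
    # dp[i][j] holds the minimal cost of aligning a[:i] with b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap  # align a[:i] against gaps only
    for j in range(1, m + 1):
        dp[0][j] = j * gap  # align b[:j] against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitute = dp[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else mismatch)
            delete = dp[i - 1][j] + gap
            insert = dp[i][j - 1] + gap
            dp[i][j] = min(substitute, delete, insert)
    return dp[n][m]


def distance_matrix(seqs):
    """Pairwise alignment costs for a list of sequences."""
    return [[alignment_distance(s, t) for t in seqs] for s in seqs]


# Hypothetical seven-wave sequences encoded as strings of type codes
sequences = ["1111111", "1111333", "4444455"]
matrix = distance_matrix(sequences)
```

With unit costs this reduces to the edit distance between two sequences. A progressive method typically merges sequences greedily in an order derived from such a pairwise matrix, which is why the final multiple alignment is approximate rather than guaranteed optimal.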
Some other shortcomings also require further exploration. For example, the spatial evolution pattern mining of RTSs in this study is based only on the spatial structure of land use: different spatial structure types of land use constitute the evolutionary sequences and enable evolutionary pattern mining. Therefore, future studies should consider other spatial elements, such as the road network structure, spatial pattern of RTSs, and settlement structure. In addition, the multiple driving factors that promote spatial evolution and spatial differentiation are worthy of in-depth study. Future sequence alignment studies should also consider both driving factors and multiple spatial evolution sequences to explore the mapping between multiple driving forces and the spatial evolution patterns of RTSs, so as to realize a coupling analysis of multiple driving forces and spatial patterns in rural areas.