Czekanowski’s Diagram and Spatial Data Cluster Analysis for Planning Sustainable Development of Rural Areas

Defects in the spatial structure of agricultural land resulting from the common phenomenon of land fragmentation constitute one of the most important factors that contribute to the lack of rational land management. Reconstruction of the spatial structure of rural areas is essential for their sustainable development. The process of land consolidation is a tool that can arrange space and lead to the desired structural changes. It is reasonable to select objects for land consolidation in such a way as to obtain the best possible effect. This article presents an algorithm for grouping areas with a concentration of the external land ownership patchwork using Czekanowski’s method of cluster analysis. The clusters determined in this way can be treated as whole objects subjected to land consolidation, for which the process will bring the greatest benefits in terms of the elimination of the external land ownership patchwork. The described algorithm is relatively simple to use and its final graphical form is easy to interpret. It allows for multi-variant examination of the analyzed phenomenon and can be applied wherever there is access to reliable information from land registry, cadastral, and GIS databases that are used to obtain a complete picture of the spatial and ownership structure of the analyzed areas.


Introduction
Agriculture plays an important role in economic development [1,2]. Currently, various challenges and transformations affecting agriculture such as the growth of the human population and the degree of urbanization are globally observed [1,3]. The problem of excessive land development and farmland consumption resulting from unsustainable urban growth processes has been intensively discussed as well [4][5][6][7]. Despite land degradation, the increased demand for food has become a major task, hence agricultural production has been recognized as directly related to poverty reduction and sustainable development [1,8].
The following factors are of great importance for achieving sustainable performance in agriculture: soil properties, water resources, new plant breeding technologies, and, most of all, the ability to sustain crop growth and productivity [1,9]. Therefore, transformation of agricultural arable land from land fragmentation to large-scale management is regarded as an essential transition path. In the 21st century, each society and country needs a modern tool of compulsory land readjustment in order to be able to provide for sustainable use of land and public infrastructural equipment [10].
Land fragmentation with single farms having numerous land plots is a common feature of agriculture in many countries. This situation is caused by the processes of long-term transformations in politics and agricultural management taking place in areas with different social, demographic, and economic situations. The excessive fragmentation of plots belonging to a single farm is one of the major factors adversely affecting the profitability of agricultural production [1]. Fragmentation of farmland is a complex concept including several dimensions such as: type of land use, form of ownership, and the geometric structure of plots (e.g., area, number of plots per farm, plot shape, distance from the farmer's dwelling) [11][12][13]. One of the dimensions is a patchwork of land ownership.
Because of administrative boundaries, the land ownership patchwork is divided into internal (i.e., existing inside the village) and external. The latter can be found between villages, communes, counties, voivodships, and even between countries. Take, for instance, the patchwork of land between Poland and Slovakia and between Poland and the Czech Republic, which came into being during the partitions, when there was no border between them [14]. The problem associated with excessive fragmentation of land and the patchwork of land ownership exists in many countries in Europe [15][16][17] and all over the world [18][19][20], but few authors deal with the elimination of the external patchwork of land ownership. In Poland, Rabczuk [21] defined the owners of land in the patchwork of land ownership as non-resident owners. The concept was then clarified by Noga [22]. Owing to their different nature, the owners of land in the patchwork of land ownership can be described as either non-resident or local owners. Owners who live in the examined village and own land outside it are called local out-of-village owners. Non-resident out-of-village owners, on the other hand, are those who have land in the studied village and live in other localities.
The defective spatial structure of individual villages may be improved with land consolidation and land exchange tasks, which additionally contribute to the sustainable development of rural areas, including the creation of more favorable conditions for farming and forestry by improving farm layout and equipping the areas being restructured with technical and social infrastructure. Land consolidation is a complex project involving agricultural landscape and natural resources, restoring the spatial structure of villages and transferring land as well as the improvement in the road network and farmland irrigation systems [23,24]. It also has an impact on the proper determination of land ownership, the transformation of agricultural production methods, and the transfer of rural labor [24,25].
The land consolidation and exchange process mainly occurs within the administrative boundaries of villages. The land of local out-of-village owners is then usually situated at the outer boundary of the village, which does not completely improve the existing plot patchwork of their farms. Conversely, non-resident owners might not be interested in consolidation, having little land in the village to be consolidated. Therefore, carrying out land consolidation and exchange tasks in a single village improves the spatial structure of this village, but it does not eliminate the external patchwork of land ownership, which reduces the efficiency of the consolidation process. Moreover, discussions with experts who carry out land consolidation and land exchange processes and with local authorities indicate that intensification of the external patchwork of land ownership may make it impossible to carry out the consolidation process at all. Out-of-village owners with little land in the village are often not interested in the land consolidation process, which makes it difficult to obtain the required consent of the majority of landowners to carry out land consolidation in particular villages. Most often these are the villages with the most urgent need to improve the spatial structure by consolidation, and without the consent of the majority of private owners, such action is impossible. As a result, the funds allocated to these villages will be used elsewhere, exacerbating the degradation of their potential. It would be advisable to carry out land consolidation and exchange in a manner ensuring the possibility of a land exchange between local non-resident owners [26].
This paper describes an algorithm that searches for areas with the external patchwork of land ownership and groups them into clusters. The clusters aggregate villages that share the problem of the external patchwork of land ownership and in which the greatest number of non-resident owners are those who own land in these villages and live in other villages of the same cluster. The clusters determined in this way can be treated as whole objects subjected to land consolidation and exchange works, for which the process will bring the greatest benefits in terms of the elimination of the external land ownership patchwork.
Sustainability 2021, 13, 11404
Cluster analysis methods group objects according to their similarity ascertained from the information on their features. The basis for the analysis is a symmetric matrix of the object distances from each other. The distance matrix includes measures of the object similarity expressed as the distances between sets of characteristic features in multidimensional space. In practice, at least a few types of distance measures are used [27]. The basic classification distinguishes between hierarchical and non-hierarchical methods. In hierarchical methods, there are separate levels on which individual objects are clustered, while in the case of non-hierarchical methods, the order of clustering is not taken into account. Objects that are in one cluster do not necessarily have to stay together and they can move from one cluster to another [28].
At the beginning of the 20th century, Jan Czekanowski, a Polish anthropologist, proposed a simple and effective method of clustering objects in a multidimensional space of characteristics [29,30]. Originally, it was designed for examining the structure of a series of human remains, but it has since been applied in other fields as well (biological sciences, archeological typology). Czekanowski's method can basically be classified as non-hierarchical; however, its advantage over other non-hierarchical methods is that the number of clusters does not need to be predefined. In addition, it provides a clearer graphical presentation of results than the dendrograms used in hierarchical methods.

Materials and Methods
In cluster analysis, the input data have the form of an array with rows containing the values of individual features corresponding to the selected objects. The number of rows corresponds to the number of objects being analyzed and the number of columns to the number of features that characterize them. The first step is to create the distance matrix of the objects according to the distance measure used. The smaller the distance, the more similar the objects are to each other, and vice versa. Therefore, this matrix is sometimes called the matrix of similarity. It is a symmetric matrix containing zeros on the main diagonal (the distance of an object from itself is zero). This matrix is the subject of further cluster analysis according to the selected algorithm.
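The construction of a distance matrix from a feature array can be sketched as follows. This is only an illustration: the Euclidean distance is an assumed choice of distance measure (the text does not fix one at this point), and the function name `distance_matrix` and the sample values are hypothetical.

```python
import math

def distance_matrix(features):
    """Return the symmetric matrix of Euclidean distances between rows.

    Each row of `features` holds the feature values of one object.
    The Euclidean measure is an assumption made for this sketch.
    """
    n = len(features)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(features[i], features[j])
            dist[i][j] = dist[j][i] = d  # symmetric by construction
    return dist

# Hypothetical input: three objects described by two features each.
features = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
dist = distance_matrix(features)
for row in dist:
    print(row)
```

The diagonal stays at zero and the matrix is symmetric, which matches the properties listed in the paragraph above.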
The input data in the proposed algorithm refer to the three main features that adequately describe the analyzed objects in terms of external land ownership patchwork [22,31,32]. Due to the nature of the problem under consideration, the values included in the initial arrays refer directly to the relationship between the examined objects and express their mutual dependence in terms of the analyzed feature, which is intentional in this case. In the given example, the three following matrices have been used and they express, respectively:
- $A = [a_{ij}]$: the percentage of non-resident owners in the total number of private plot owners,
- $B = [b_{ij}]$: the percentage of non-resident owners' plots in the total number of private owners' plots within a cadastral unit (village),
- $C = [c_{ij}]$: the percentage of the area of non-resident owners' plots in the total area of a cadastral unit (village).
All the values included in the cells of the matrices are based on the data obtained from land registry and GIS databases. The adoption of percentage shares is a way to create a uniform and comparable scale for the entire studied area. The matrices A to C have dimensions $n \times n$, where $n$ is the number of analyzed villages. The rows of each matrix refer to individual villages from the analyzed area and contain the percentage shares of out-of-village owners from the other villages (the columns follow the same order as the rows) with respect to the considered feature. Therefore, despite the symmetrical arrangement of rows and columns, the values of the matrix cells themselves are not symmetrically arranged. The sum of the values in an entire row corresponds to the percentage share of a given feature for all the people who are out-of-village owners but live in the area under consideration.
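As an illustration of how a matrix such as A could be filled, the sketch below counts distinct owners per village from a flat list of records. The record layout `(owner_id, land_village, home_village)`, the function name `owner_share_matrix`, and all sample data are hypothetical; real land registry exports will look different.

```python
def owner_share_matrix(records, villages):
    """a[i][j]: percentage of private owners of land in village i
    who live in village j (j != i).

    `records` is a hypothetical list of (owner_id, land_village,
    home_village) tuples, one per plot.
    """
    index = {name: k for k, name in enumerate(villages)}
    n = len(villages)
    # Distinct owners of land located in each village.
    owners = [set() for _ in range(n)]
    for owner_id, land_village, home_village in records:
        owners[index[land_village]].add((owner_id, home_village))
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        total = len(owners[i])
        if total == 0:
            continue
        for _, home in owners[i]:
            j = index.get(home)  # None if the owner lives outside the area
            if j is not None and j != i:
                a[i][j] += 100.0 / total
    return a

villages = ["V1", "V2", "V3"]
records = [
    (1, "V1", "V1"), (2, "V1", "V2"), (2, "V1", "V2"),  # owner 2 has two plots
    (3, "V1", "V2"), (4, "V1", "V3"),
    (2, "V2", "V2"), (1, "V2", "V1"), (6, "V2", "V2"), (7, "V2", "V2"),
    (5, "V3", "V3"),
]
a = owner_share_matrix(records, villages)
for row in a:
    print(row)
```

In this toy data the cell for (V1, V2) differs from the cell for (V2, V1), which shows why the matrix is not symmetric even though rows and columns list the same villages.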
Moreover, a spatial neighborhood matrix (spatial weight matrix) is also used in the analyses of spatial statistics, which makes it possible to assign appropriate weights to the relationships of individual objects to each other. These kinds of matrices have also found their application in the analyses of land consolidation and exchange works [33]. Typical spatial neighborhood matrices can be divided into neighborhood-based matrices (e.g., according to the common border criterion) and distance-based matrices (e.g., according to the reciprocal distance criterion) [34]. Adopting a spatial weight matrix based on the common border criterion avoids the undesirable cases in which the distances between the centroids of directly adjacent villages are greater than the distances between one village of the considered pair and other, non-adjacent villages. For example, as Figure 1 shows, the distance from the village of Radzice Duże to the villages of Radzice Małe or Krzczonów, with which it borders, is greater than to the villages of Trzebinia or Strzyżów, with which it does not have a common border.
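A neighborhood-based weight matrix built on the common border criterion might be sketched as below. The binary 0/1 weighting, the function name `border_weight_matrix`, and the village labels and border pairs are assumptions made for illustration only; they do not describe the actual study area.

```python
def border_weight_matrix(villages, shared_borders):
    """Binary spatial weight matrix: 1 if two villages share a border.

    `shared_borders` is a hypothetical list of village-name pairs,
    for example derived from a GIS adjacency query.
    """
    index = {name: k for k, name in enumerate(villages)}
    n = len(villages)
    d = [[0] * n for _ in range(n)]
    for u, v in shared_borders:
        i, j = index[u], index[v]
        d[i][j] = d[j][i] = 1  # adjacency is mutual
    return d

villages = ["V1", "V2", "V3", "V4"]
shared_borders = [("V1", "V2"), ("V2", "V3"), ("V3", "V4")]
weights = border_weight_matrix(villages, shared_borders)
for row in weights:
    print(row)
```

Because the weights depend only on adjacency, two bordering villages with distant centroids still receive the same weight as two bordering villages with close centroids.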
The similarity matrix, which constitutes the basis of cluster analysis, is symmetrical. For this reason, each of the three matrices A to C is summed with its transpose. In this way, we obtain symmetrical matrices with cell values corresponding to the mutual impact of each pair of villages in the analyzed area with regard to the selected feature. In principle, each of these arrays may already serve as a similarity matrix in the sense of cluster analysis. However, it expresses not the similarity of objects but their mutual dependence, which is intended in this case.
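Summing a matrix with its transpose can be sketched in a few lines. The function name `symmetrize` and the sample values are hypothetical.

```python
def symmetrize(m):
    """Return m + m^T for a square matrix given as a list of lists."""
    n = len(m)
    return [[m[i][j] + m[j][i] for j in range(n)] for i in range(n)]

# Hypothetical asymmetric matrix of percentage shares for three villages.
a = [
    [0.0, 50.0, 25.0],
    [25.0, 0.0, 0.0],
    [0.0, 10.0, 0.0],
]
a_sym = symmetrize(a)
for row in a_sym:
    print(row)
```

Each off-diagonal cell of the result adds the share in both directions, so the pair (1, 2) receives 50 + 25 = 75 regardless of which village is taken as the row.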
In order to analyze all the features at once, it is necessary to aggregate the features corresponding to all three matrices into one symmetrical similarity matrix. Due to the common scale of matrices A, B, and C, the aggregation of all arrays is performed by summing cell values. The spatial weight matrix D is also taken into account.
The arrangement of objects in the diagram is performed by changing the order of rows and the corresponding columns so that the graphical symbols representing the interdependencies are concentrated along the main diagonal, while the symbols corresponding to the declining relationships lie farther from it. The arrangement of the diagram refers to the numerical values included in the matrix and it is carried out according to the algorithm involving the optimization function $U_m$ proposed by Sołtysiak [35,36], where $U_m$ indicates the factor for matrix arrangement, $n$ is the size of the matrix (the number of examined villages), and $i$ and $j$ are the numbers of the matrix column and row, respectively. Defined in this way, the optimization function takes the lowest value if the lowest values of the cells are closest to the diagonal (i.e., the most similar objects adjoin one another). This is essential in clustering.
The aggregated S matrix, however, does not indicate the closest distance (in the sense of the similarity of features) but the mutual dependence between villages (the greater the dependence within a pair of villages, the greater the value of a matrix cell). Therefore, one more transformation of the S matrix is needed before the final matrix $S = [s_{ij}]$ is arranged. This operation maintains the relationship between individual cells. Changing the original zero values to numbers an order of magnitude greater than the rest of the values practically eliminates these cells from the formation of clusters.
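The aggregation and the final transformation can be illustrated with the sketch below. Several points are assumptions, since the text does not fix them here: the weight matrix D is applied by element-wise multiplication, the dependence values are turned into distance-like values by a linear reversal (`top - s + 1`), and zero cells are replaced by ten times the largest value. The function names and sample matrices are hypothetical.

```python
def aggregate(sym_matrices, weights):
    """Sum symmetric matrices cell by cell and apply the weights.

    Element-wise multiplication by `weights` is an assumption.
    """
    n = len(weights)
    return [[weights[i][j] * sum(m[i][j] for m in sym_matrices)
             for j in range(n)] for i in range(n)]

def to_distance_like(s):
    """Reverse the scale so that strong dependence becomes a small value.

    Assumed rule: positive cells map to top - s + 1, zero cells map to a
    value an order of magnitude larger, the diagonal stays at zero.
    """
    n = len(s)
    top = max(max(row) for row in s)
    far = 10.0 * (top + 1.0)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                out[i][j] = top - s[i][j] + 1.0 if s[i][j] > 0 else far
    return out

# Hypothetical symmetrized matrices for three villages.
sa = [[0.0, 75.0, 25.0], [75.0, 0.0, 10.0], [25.0, 10.0, 0.0]]
sb = [[0.0, 60.0, 20.0], [60.0, 0.0, 5.0], [20.0, 5.0, 0.0]]
sc = [[0.0, 30.0, 10.0], [30.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
d = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # hypothetical common-border weights

s = aggregate([sa, sb, sc], d)
s_final = to_distance_like(s)
for row in s_final:
    print(row)
```

With these toy values the strongest pair receives the smallest value, and the pair without any dependence is pushed far away from all others, so it cannot pull villages into a common cluster.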
The final result is a graphical image of the clusters represented by the above-mentioned graphic symbols, which is the subject of further visual evaluation. The organization of the whole process is schematically shown in Figure 3.
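The reordering step can be illustrated on a very small matrix. The sketch below is not the optimization function from [35,36]: it uses a stand-in criterion, chosen only because it shares the stated property of being lowest when the smallest values lie closest to the diagonal, and it searches all permutations, which is feasible only for a handful of objects. The function names, the threshold `near`, and the sample matrix are hypothetical.

```python
from itertools import permutations

def arrangement_cost(dist, order):
    """Stand-in criterion (an assumption): small distances far from the
    diagonal are penalized by the squared offset between positions."""
    n = len(order)
    return sum((i - j) ** 2 / (dist[order[i]][order[j]] + 1.0)
               for i in range(n) for j in range(i + 1, n))

def best_order(dist):
    """Exhaustive search over all orderings; only viable for tiny n."""
    return min(permutations(range(len(dist))),
               key=lambda p: arrangement_cost(dist, p))

def render(dist, order, near):
    """Text diagram: '#' marks cells with a value not exceeding `near`."""
    return ["".join("#" if dist[a][b] <= near else "." for b in order)
            for a in order]

# Hypothetical distance-like matrix: objects 0 and 2 are close, as are 1 and 3.
dist = [
    [0.0, 100.0, 1.0, 100.0],
    [100.0, 0.0, 100.0, 1.0],
    [1.0, 100.0, 0.0, 100.0],
    [100.0, 1.0, 100.0, 0.0],
]
order = best_order(dist)
diagram = render(dist, order, near=1.0)
print(order)
for line in diagram:
    print(line)
```

After reordering, the marked cells form two blocks along the main diagonal, which is the visual signature of two clusters in a diagram of this kind. For realistic numbers of villages a heuristic search would be needed instead of the exhaustive one.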

Results and Discussion
An example of the algorithm implementation is presented for two areas. The smaller one is Drzewica commune with an area of 118.2 km², where there are 17 analyzed villages. The larger one is Brzozów district with an area of 540.4 km² and 44 villages. The areas are located in the central and south-eastern parts of Poland (Figure 4).
It is the area of central, eastern, and south-eastern Poland that is characterized by the most unfavorable spatial structure of rural areas [37][38][39]. This also concerns the analyzed test areas, where even more than 30% of the total number of plots belong to out-of-village non-resident owners (Figures 5 and 6).
Three main clusters can be distinguished in the villages of Drzewica commune. They include nine villages and are marked in orange, green, and blue in the diagram. As can be seen in the rows and columns of the diagram, there are few connections between the distinguished clusters and the other analyzed villages. For example, over 60% of the non-resident owners from each village grouped in the "blue" cluster live in other villages in this cluster. They also own over 60% of the total area of private non-resident owners' plots. In the villages of the "green" cluster, the respective figures are 80% of non-resident owners, who own over 80% of non-resident owners' plots.
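The kind of within-cluster share quoted above can be computed directly from a row of an ownership matrix. The sketch below assumes that the share is taken relative to the non-resident owners living inside the analyzed area (the row sum); the function name `within_cluster_share` and the sample matrix are hypothetical and do not reproduce the study data.

```python
def within_cluster_share(a, cluster):
    """For each village in `cluster`, the percentage of its non-resident
    owners (row total) who live in other villages of the same cluster."""
    shares = {}
    for i in cluster:
        row_total = sum(a[i][j] for j in range(len(a)) if j != i)
        inside = sum(a[i][j] for j in cluster if j != i)
        shares[i] = 100.0 * inside / row_total if row_total else 0.0
    return shares

# Hypothetical matrix of percentage shares for four villages.
a = [
    [0.0, 30.0, 10.0, 10.0],
    [20.0, 0.0, 5.0, 0.0],
    [4.0, 4.0, 0.0, 8.0],
    [0.0, 0.0, 10.0, 0.0],
]
shares = within_cluster_share(a, cluster=[0, 1])
print(shares)
```

A high share for every village of a candidate cluster indicates that most of the external patchwork could be resolved by exchanges inside that cluster.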
All the remaining villages are more closely connected with one another and show a higher level of dependency. The individual possible clusters are not separate and partially overlap. Therefore, all the remaining villages are grouped into one larger aggregation consisting of several possible cluster accumulations. This group of villages may be the subject of further analysis in terms of various additional factors (funds allocation, other aspects of land fragmentation, or degradation of spatial structure) allowing for the identification of alternative clusters.
The spatial distribution of clusters in the analyzed area is shown in Figure 8.
The result of applying the described algorithm in the area of Brzozów district is shown in Figure 9.
As it can be seen, all the formed clusters are separated. There are not many connections between villages from each particular cluster and other villages in the diagram. There are few "stand alone" villages as well. The spatial distribution of clusters in Brzozów district is shown in Figure 10.
The use of Czekanowski's clustering method has made it possible to determine clusters of villages in both tested areas. Each of the distinguished clusters could be considered a separate object on which to conduct land consolidation and exchange works. The land consolidation process carried out on this type of object would give better results in terms of eliminating the scattering of land ownership in the villages constituting the selected objects.
Czekanowski's method is quite a simple tool for clustering objects in a multidimensional space of characteristics. However, it is important to remember that the main limitation of this method is that a very large amount of data is needed to develop the selected matrices. It is crucial to validate the data obtained from land registry and cadastral or GIS databases before clustering, particularly when automated procedures of data acquisition are implemented. Some records in the databases that are used to calculate cell values are not consistent: the same value may be written in different ways (e.g., names, addresses, etc.). Therefore, this part of the algorithm is still under consideration; it is not fully automated and often requires manual corrections.
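One possible pre-processing step for such inconsistent records is to compare values through a normalized key. The sketch below is an assumption about how this could be done, not the procedure used in the study: it folds case, strips combining diacritics, drops periods and commas, and collapses whitespace. The function names and sample names are hypothetical.

```python
import unicodedata

def normalize(text):
    """Build a comparison key: no combining accents, no case,
    no periods or commas, single spaces."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed
                       if not unicodedata.combining(ch))
    cleaned = stripped.casefold().replace(".", " ").replace(",", " ")
    return " ".join(cleaned.split())

def group_variants(values):
    """Group raw values that share the same normalized key."""
    groups = {}
    for value in values:
        groups.setdefault(normalize(value), []).append(value)
    return groups

# Hypothetical owner names as they might appear in different records.
raw = ["Jan Kowalski", " jan  KOWALSKI", "Anna Nowak"]
groups = group_variants(raw)
for key, variants in groups.items():
    print(key, variants)
```

Groups with more than one variant are candidates for manual review rather than automatic merging, since two different people can share a name.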
In contrast to the checkerboard matrices presented by Noga [22], the whole algorithm takes into account many features of the studied areas, which is an advantage. The S matrix, which is the subject of the final analysis, may include additional matrices containing numerical values of other defined indicators characterizing the studied areas in terms of the analyzed phenomenon. In particular, the final S matrix may include values reflecting the indicators proposed in [26] or the characteristics of rural areas considered in [40]. The use of the proposed solution allows for the identification of mutually similar areas, where consolidation works should take place at the same time. Although the clusters themselves do not indicate the required sequence of consolidation and exchange of land, they may be one of the variables in analyzing the hierarchization of such works [41,42].

Conclusions
The described algorithm is relatively simple to use and its final graphical form is easy to interpret. It allows for multi-variant examination of the external patchwork of land ownership and can be easily extended with additional dependency matrices created according to the selected features.
In practice, the presented methodology of calculations can be applied wherever there is access to reliable information from the land registry, cadastral, and GIS databases that are used to obtain a complete picture of the spatial and ownership structure of the analyzed areas. If the data are digital and consistent, the algorithm can be fully automated.
This algorithm provides important information for the process of planning consolidation works at the local government level. The distinguished clusters could be considered as separate objects on which to conduct land consolidation and exchange works. A common issue of the external patchwork of land ownership among villages grouped into clusters could encourage non-resident owners from the area to participate in the consolidation process and make it possible to start land consolidation and exchange works. Land consolidation works for the whole cluster should provide better results than for particular villages separately. Land exchange will considerably decrease the distance between the farm homestead and the land belonging to out-of-village owners from the analyzed cluster, which enables more effective organization of private farms. Therefore, searching for areas with an excessive concentration of the external patchwork of land ownership will allow for better consolidation effects through its elimination and thus for more effective spending of public funds allocated to land consolidation and exchange works. Land consolidation carried out for the rural areas with the greatest degradation of the spatial structure will ensure competitive conditions for their development.