1. Introduction
In China, apart from the villages included in the national protection system as “traditional villages”, a large number of villages can be classified as neither “traditional villages” nor “modern villages” because they contain both traditional and modern landscapes; they are called “atypical traditional villages” [1]. Village landscapes can refer to individual elements (e.g., buildings, vegetation, roads, mountains, water bodies, farmland, crops, and production and living facilities), as well as the overall patterns and sense of place formed by the coexistence and interaction of these elements. Protecting the traditional landscapes in atypical traditional villages is an important measure for preserving the historical traces and cultural memories of Chinese villages, passing on fine traditional Chinese culture, cultivating healthy moral ideas, and promoting harmonious social development [2]. It is also a crucial subject under the “Beautiful Countryside” program. While comprehensive exploration and utilization of traditional village landscapes as cultural heritage are considered necessary for reshaping rural cultures and developing rural tourism, precise capture of the spatial information of landscape elements (e.g., location, quantity, area, percentage, aggregation, and dispersion), especially traditional structures, is an important prerequisite for a scientific, reasonable, and feasible planning and design scheme for conserving and developing atypical traditional villages.
Currently, the primary methods for acquiring the spatial information of village landscapes are manual field surveys and visual interpretation of unmanned aerial vehicle (UAV) remote sensing (RS) images, which are time- and labor-intensive and suffer from insufficient precision and efficiency. Therefore, rapid and precise capture of landscapes, especially traditional structures, is the key to shortening the planning and design cycle, controlling the cost of information acquisition, improving the planning and design quality, and achieving the expected effect.
Aerial photography, laser scanning, and other RS technologies can provide high-resolution datasets for building analysis and identification [3]. Although the resolution of satellite RS data has increased considerably, automatic building identification is still subject to several restrictions, including scenario complexity, building variability, and sensor resolution [4]. Typical satellite RS systems, such as Landsat 8 OLI/TIRS, ASTER, and Sentinel-2, offer abundant spectral information and time-series multi-spectral images for long-term observation, but their coarse resolutions (10–30 m) [5] yield only fuzzy spatial information on landscapes. These systems cannot clearly outline the boundaries and contours of small structures in a complex scenario. Given the coexistence and interlacement of various surface features (e.g., houses, vegetation, rivers, and roads) in the village context, and the influence of factors such as illumination, season, and time of day, the phenomena of “the same object with different spectrums” and “different objects with the same spectrum” can be serious and frequent.
In recent years, UAV RS has been applied to vegetation monitoring, environmental management, building information identification, and other fields because of its high spatial resolution and temporal flexibility. The sensors carried by UAVs are capable of precisely capturing the spatial information of surface features (e.g., roof tiles, automobiles, structures, and individual trees) and outputting sub-meter RS images [6,7]. UAV RS makes up for the shortcomings of satellite imagery (e.g., low spatial resolution and long acquisition cycles) and increases the possibility of identifying different roofs in a highly complex rural scenario. Although UAVs provide high-resolution RGB images, the three-band spectral information alone is too limited to distinguish the structures in a complex village scenario.
Building extraction from RS images relies on high-resolution imagery combined with classification methods such as edge detection, image segmentation, mathematical morphology, and object-oriented classification to extract building information according to shape, spectrum, texture [8], other building features [9], and context [10]. Compared with pixel-based classification, object-oriented segmentation draws on spatial and spectral features and takes objects composed of homogeneous pixels as the analytical units, which effectively avoids the fragmented distribution of the classified surface features. With the rapid development of artificial intelligence (AI), scholars have attempted to extract building information from high-resolution images using deep learning [9,11,12,13], but it is almost impracticable to acquire the massive high-quality sample data that deep learning requires in a complex village scenario. This is exactly the case for atypical traditional village landscapes, where the difficulty of collecting high-quality data rules out the direct use of deep learning models. Supervised and unsupervised classification rely on deterministic memberships (also known as “hard classification”) to assign each pixel to exactly one class according to the training sample points and the statistical criteria; a pixel either belongs to a class or it does not. The two techniques are especially applicable to scenarios where the surface features are externally distinct yet internally homogeneous [14]. However, not all objects are equally representative in a complex village scenario, and such a decisive “yes or no” approach may lead the classifier to output unreliable classification results [15]. In contrast, the fuzzy logic membership model allows for fuzzy classification, where a membership can be 0, 1, or any value between 0 and 1, and it has been widely applied in crop classification, image classification [14,16], medical science, and other fields.
Based only on the UAV RGB images, this research proposes an atypical traditional village landscape classification model based on UAV imagery (ATVLUI) by virtue of object-oriented image segmentation and fuzzy logic membership classification, which is highly capable of identifying traditional structures in a complex scenario of atypical traditional villages according to their shapes, spectrums, and textures. After the classification procedure, the spatial information and the landscape pattern indexes of each class of surface features are calculated for analyzing the spatial distribution characteristics of traditional structures, so as to provide scientific references for the protection and overall planning of traditional structures.
2. Data and Methodology
The ATVLUI addresses the problem of low-precision automatic extraction of traditional structures’ roofs in a complex scenario when UAV imagery serves as the only data source (Figure 1). For an atypical traditional village, the surface features in UAV images include traditional structures, modern structures (e.g., modern roofs, vegetable greenhouses, and impermeable roads), vegetation (e.g., crops, trees, shrubs, and building greenery), bare land (e.g., unvegetated farmland and permeable roads), and shadows. The interpretation signs are shown in Table 1. According to the UAV images, the structures are concentrated, with large, similar roofs, whereas the vegetation, bare land, and shadows are unevenly distributed as fragments of varying sizes. Therefore, the surface features can be grouped into “Structures”, which include traditional and modern structures, and “Non-Structures”, which include vegetation, bare land, and shadows. The classification involves four steps: (1) use a small segmentation scale, with the spectral information combined, to extract the non-structures by fuzzy logic membership classification, which involves masking the non-structure information in the original images; (2) use a large scale to segment the structure layers and extract the traditional and modern structures via membership classification according to the shapes, spectrums, textures, and context relationships; (3) use the confusion matrix to verify the classification results and compare them with the results output by the K-nearest neighbors (KNN), decision tree (DT), and random forest (RF) methods; (4) use ArcGIS’s Spatial Analyst toolbox and Fragstats’ landscape pattern analysis program to explore the spatial distribution characteristics of the landscapes in Qianfeng Village.
2.1. Survey Region
The research subject is Qianfeng Village, Fengshan Town, Fengqing County, Lincang City, Yunnan Province, China. Located in the southwest of Fengqing County, 15 km from the county center, Qianfeng Village covers an area of 7.28 square kilometers, with an elevation of 1900.00 m and an average annual temperature of 16.60 °C. Facing east, the village sits on a gentle slope at the bottom of the valley and is surrounded by farmland and woodland, which form a beautiful environment (Figure 2A). In 2023, Qianfeng Village was listed by the local government as one of the 21 villages requiring key attention under the “Beautiful Countryside” program. In this atypical traditional village, most of the old houses have been replaced with modern structures (Figure 2C), and a small number of traditional structures of high value have been preserved (Figure 2B). Qianfeng Village also has some rural tourism resources, as its traditional tea-making and wine-brewing cultures have been preserved.
2.2. Data Acquisition and Processing
In this research, a DJI Phantom 4 Pro equipped with an RGB digital camera was used to acquire high-spatial-resolution UAV image data on 17 August 2020. Clouds and sun angles can produce excessive shadows that degrade the image quality and hinder the subsequent extraction of information. Thus, it is essential to schedule the flight according to the weather conditions to ensure the quality of the UAV imagery. On the day of data acquisition, the flight was performed between 10:30 and 14:30 local time, when the sun’s rays were nearly perpendicular to the ground, the sky was clear and cloudless, and the wind speed was low. To generate orthoimages covering the entire research area, the forward and side overlap rates were set to 80% and 70%, respectively. The data were acquired along a preset flight path.
The Pix4Dmapper 4.5.6 software was used to process the highly overlapping images acquired by the UAV. The procedure included adding images, aligning images, matching feature points, generating dense point clouds, generating the digital surface model, and outputting orthoimages. Subsequently, the orthoimages were imported into the ENVI 5.6 software, where the blurred parts along the boundaries were removed with its crop tool. A linear 2% stretch was applied to enhance the spectral contrast of the pixels.
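The linear 2% stretch mentioned above can be illustrated with a minimal sketch. It assumes the usual meaning of the term (clip a band at its 2nd and 98th percentiles, then rescale linearly to the display range); the function name, the nearest-rank percentile lookup, and the sample values are hypothetical and are not taken from ENVI.

```python
def linear_percent_stretch(values, percent=2.0, out_max=255):
    """Clip a band at the `percent` and `100 - percent` percentiles,
    then rescale linearly to the range 0..out_max."""
    ordered = sorted(values)
    n = len(ordered)

    def percentile(p):
        # Simple nearest-rank lookup (an assumption of this sketch).
        idx = int(round((p / 100.0) * (n - 1)))
        return ordered[idx]

    low, high = percentile(percent), percentile(100.0 - percent)
    if high == low:
        # A constant band has no contrast to stretch.
        return [0 for _ in values]

    stretched = []
    for v in values:
        clipped = min(max(v, low), high)
        stretched.append(round((clipped - low) / (high - low) * out_max))
    return stretched


# Hypothetical gray values of one band (50..150).
band = list(range(50, 151))
result = linear_percent_stretch(band)
print(result[0], result[-1])
```

Values below the lower percentile saturate at 0 and values above the upper percentile saturate at `out_max`, so the bulk of the histogram occupies the full output range.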
2.3. Atypical Traditional Village Landscape Classification Based on UAV RGB Images
2.3.1. Determination of Thresholds for Object-Oriented Segmentation
The multiresolution segmentation (MRS) algorithm provided in the eCognition 9.0 software is used for image segmentation. MRS is a bottom-up technique that merges adjacent image regions into larger objects until no further merging is possible at the specified scale, so that regions are merged with minimum heterogeneity.
The segmentation scale is one of the key parameters in extracting structure and non-structure information from RS images [17]. The ESP2 tool in eCognition is applied to calculate the scale parameters for structure and non-structure information. It determines the optimal scale from the statistics of the local variances (LVs) at different scales [18], because the LV increases as the scale rises and reaches its peak when the scale matches the actual surface feature. At this point, the internal homogeneity within the same class of objects is maximum, and so is the heterogeneity between different object classes. The scale at which this occurs can be considered the optimal segmentation scale. Moreover, according to the local rate of change (ROC; Equation (1)), an ROC-LV curve can be generated, where a peak of the curve corresponds to an optimal segmentation scale.
$$\mathrm{ROC} = \frac{L - (L - 1)}{L - 1} \times 100 \tag{1}$$
where
ROC is the object’s local rate of change;
L is the local variance of the target layer; and
L − 1 is the local variance of the layer next to (immediately below) the target layer; it denotes a variance value, not a subtraction.
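As an illustration of how the ROC-LV curve is read, the following sketch computes the rate of change between consecutive scale levels and returns the scales at which it peaks. It assumes that the ROC is the percentage change of the local variance relative to the next lower level, consistent with the definitions above; the function names and the sample variances are hypothetical, and the sketch does not reproduce the ESP2 tool itself.

```python
def rate_of_change(local_variances):
    """ROC (%) of each scale level relative to the level below it."""
    roc = []
    for lower, current in zip(local_variances, local_variances[1:]):
        roc.append((current - lower) / lower * 100.0)
    return roc


def roc_peaks(scales, local_variances):
    """Scales whose ROC is a local maximum (candidate optimal scales)."""
    roc = rate_of_change(local_variances)
    roc_scales = scales[1:]  # the first level has no lower neighbour
    peaks = []
    for i in range(1, len(roc) - 1):
        if roc[i] > roc[i - 1] and roc[i] > roc[i + 1]:
            peaks.append(roc_scales[i])
    return peaks


# Hypothetical local variances measured at six scale levels.
scales = [10, 20, 30, 40, 50, 60]
lv = [4.0, 5.0, 5.5, 7.7, 8.0, 8.1]
print(roc_peaks(scales, lv))
```

In practice, several peaks may appear; a smaller one would suit the fragmented non-structures and a larger one the structures, in line with the two scale solutions used in this research.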
The scale solutions for structures and non-structures are different. The proportional weights of the three bands are obtained by analyzing the band information of the surface features. Meanwhile, the control variable method is used, according to the class of objects being segmented, to determine the shape weight, color weight, and compactness. Accordingly, with ESP2 taken as the tool for determining segmentation scales and the ROC-LV curve peak taken as the optimal scale, the optimal compactness is then determined to arrive at the best scale solution.
2.3.2. Fuzzy Logic Membership Classification
Building on object-oriented segmentation, fuzzy logic classification uses membership functions to establish rules for object classification. When extracting surface features in a complex scenario, setting only one threshold for an object class is prone to misclassification and, thus, unreliable results. For instance, there are significant differences between the spectrums of uncovered vegetation, partly covered vegetation, and fully covered vegetation. Thus, in threshold selection, it is recommended to locate the critical sample point between the target and the non-target and to replace this ambiguous point with a critical interval through a fuzzy logic membership function.
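A minimal sketch of the idea of replacing a single threshold with a critical interval is given below. It assumes a linear (ramp) membership function between a lower and an upper bound; the actual function shapes and bounds used in this research are not stated here, so the names and numbers are hypothetical.

```python
def linear_membership(value, lower, upper):
    """Membership rises linearly from 0 at `lower` to 1 at `upper`."""
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    if value <= lower:
        return 0.0
    if value >= upper:
        return 1.0
    return (value - lower) / (upper - lower)


def belongs_to_class(value, lower, upper, cutoff=0.5):
    """Accept an object when its membership reaches the cutoff."""
    return linear_membership(value, lower, upper) >= cutoff


# Hypothetical feature values and critical interval [0.1, 0.3].
for feature in (0.05, 0.15, 0.25, 0.40):
    print(feature, round(linear_membership(feature, 0.1, 0.3), 2))
```

Objects inside the interval receive a graded membership instead of a hard "yes or no", which is what allows ambiguous objects near the class boundary to be handled explicitly.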
2.3.3. Extraction of Non-Structures
The non-structures were vegetation, bare land, and shadows. Shadows referred to the parts of structures, vegetation, and bare land that were not exposed to sunlight because of height differences, such as gaps between eaves and structures and voids within vegetation. The shadows were extracted using the mean spectral value (brightness) of the RGB bands and fuzzy logic classification (Equation (2)). To extract the vegetation, the shadows were first removed from the images, and the vegetation index (Equation (3)) was then used to extract the vegetation in the shadow-free regions. The principle was that the spectral reflectance of vegetation peaks in the green band, which renders vegetation objects green in the RGB images. Bare land was extracted using the blue–green difference index (Equation (4)).
where the symbols denote, in order of appearance, the number of spectral bands; the gray value of the mth band that corresponds to the object; and the gray values of the red, green, and blue bands.
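The layered order described above (shadows first, then vegetation in the shadow-free objects) can be sketched as follows. Only the brightness, taken as the mean of the three band values, follows directly from the text. The vegetation index is replaced here by the excess green index (2G − R − B) as a stand-in, and the blue–green difference is assumed to be B − G; both forms, all thresholds, and the function names are assumptions of this sketch rather than the indices used in this research. Hard thresholds are used for brevity where the research applies fuzzy memberships.

```python
def object_features(red, green, blue):
    """Per-object features from the mean gray values of the RGB bands."""
    return {
        # Brightness: mean of the band values.
        "brightness": (red + green + blue) / 3.0,
        # Stand-in vegetation index (excess green); an assumption.
        "excess_green": 2 * green - red - blue,
        # Assumed form of the blue-green difference.
        "blue_green_diff": blue - green,
    }


def label_non_structure(red, green, blue, shadow_max=60.0, vegetation_min=20.0):
    """Layered labelling: shadows first, then vegetation.
    Thresholds are hypothetical placeholders."""
    features = object_features(red, green, blue)
    if features["brightness"] <= shadow_max:
        return "shadow"
    if features["excess_green"] >= vegetation_min:
        return "vegetation"
    return "unlabeled"


# Hypothetical mean RGB values of three image objects.
for rgb in ((30, 35, 40), (80, 150, 70), (150, 140, 130)):
    print(rgb, label_non_structure(*rgb))
```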
2.3.4. Extraction of Structures
After the non-structure objects are removed from the UAV RS images, the patches that remain are structures. Subsequently, the differences between the traditional and modern structures in terms of spectrum, texture, and geometrical shape were explored one at a time, with different criteria selected or built and proper threshold parameters adopted, to separate the traditional structures from the modern ones using layered extraction. Because the problem of “different objects with the same spectrum” was observed for the traditional and modern structures, the classification proceeded sequentially from texture to shape and then spectrum. Finally, the classification results were refined based on the context relationships.
The first step is the structure extraction based on the texture features. As for texture differences, the roofs of traditional structures are mainly composed of antique tiles with uneven surfaces, detailed and regular textures, and low homogeneity (Figure 2A), whereas the modern structures (e.g., impermeable roads, vegetable greenhouses, and concrete roofs) tend to have smooth, flat surfaces without significant textures and with high homogeneity (Figure 3b,c). In this step, the gray level co-occurrence matrix (GLCM) dissimilarity and the spectral standard deviation (StdDev) were adopted to eliminate the modern structures without a significant texture. The GLCM reflects the comprehensive information of the image grayscale in terms of direction, inter-pixel distance, and amplitude of variation (the higher the dissimilarity, the rougher the texture), and the StdDev is calculated from the layer values of all n pixels that make up an object (the smoother and flatter the surface and the more even the color, the smaller the deviation).
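To make the two texture measures concrete, the sketch below computes the GLCM dissimilarity, taken here as the sum of P(i, j)·|i − j| over a symmetric, normalized co-occurrence matrix for one pixel offset, and the standard deviation of the pixel values of a patch. The single horizontal offset, the use of the population standard deviation, and the toy patches are assumptions of this sketch; the software may average several directions and quantize the gray levels differently.

```python
from collections import Counter
from statistics import pstdev


def glcm_dissimilarity(patch, dx=1, dy=0):
    """Dissimilarity = sum over (i, j) of P(i, j) * |i - j| for pixel
    pairs separated by the offset (dx, dy), counted symmetrically."""
    pairs = Counter()
    rows, cols = len(patch), len(patch[0])
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                a, b = patch[y][x], patch[ny][nx]
                pairs[(a, b)] += 1
                pairs[(b, a)] += 1  # symmetric co-occurrence
    total = sum(pairs.values())
    if total == 0:
        return 0.0
    return sum(n * abs(i - j) for (i, j), n in pairs.items()) / total


def patch_stddev(patch):
    """Standard deviation of all pixel values in the patch."""
    return pstdev([v for row in patch for v in row])


# Hypothetical gray-level patches: a smooth roof and a tiled roof.
smooth = [[5, 5, 5], [5, 5, 5]]
tiled = [[0, 3, 0], [3, 0, 3]]
print(glcm_dissimilarity(smooth), patch_stddev(smooth))
print(glcm_dissimilarity(tiled), patch_stddev(tiled))
```

The smooth patch scores zero on both measures, whereas the alternating patch scores high on both, which mirrors the contrast between flat modern surfaces and tiled traditional roofs.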
The second step is the structure extraction based on the shape features. An object’s shape feature is equivalent to the spatial distribution statistics of the pixels constituting the object, whose geometric information can be captured by approximating the object with its bounding box. As for shape differences, the traditional structures have a rectangular roof with a ridge along the center line and two slopes. The resulting segments are regular and mostly appear as wide rectangles (Figure 3d,f). In contrast, the modern roofs differ markedly in style from the traditional ones; most consist of a central platform surrounded by a thin, vertical brickwork parapet wall (Figure 3e). In image segmentation, the brickwork can be segmented into independent strip-like objects because its color and shape differ greatly from those of the platform. Notably, these parapet walls have a texture extremely similar to that of traditional roofs. Therefore, the shape differences between the traditional and the modern structures must be analyzed for the structures that remain in the images after the texture-based extraction. The non-rectangular structures and a part of the fragmented modern structures can be effectively removed according to their length–width ratio, rectangular fit, roundness, and area.
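The shape criteria can be illustrated with a simplified sketch that derives the area, length–width ratio, and rectangular fit of an object from its pixel coordinates. It assumes an axis-aligned bounding box, which is a simplification: the software's own definitions of these features (and of roundness, which is omitted here) differ in detail. All names and sample objects are hypothetical.

```python
def shape_features(pixels):
    """Area, length-width ratio, and rectangular fit of one object,
    using its axis-aligned bounding box (a simplifying assumption)."""
    cells = set(pixels)
    xs = [x for x, _ in cells]
    ys = [y for _, y in cells]
    width = max(xs) - min(xs) + 1
    height = max(ys) - min(ys) + 1
    area = len(cells)
    return {
        "area": area,
        "length_width_ratio": max(width, height) / min(width, height),
        # Share of the bounding box that the object actually fills.
        "rectangular_fit": area / (width * height),
    }


# Hypothetical objects: a wide roof, a parapet strip, an L-shaped patch.
roof = [(x, y) for x in range(6) for y in range(3)]
strip = [(x, 0) for x in range(10)]
l_shape = [(x, 0) for x in range(4)] + [(0, y) for y in range(1, 4)]
for name, obj in (("roof", roof), ("strip", strip), ("l_shape", l_shape)):
    print(name, shape_features(obj))
```

A wide rectangle with a rectangular fit close to 1 is retained as a candidate traditional roof, whereas strips with a very high length–width ratio and irregular patches with a low rectangular fit are candidates for removal.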
The third step is the structure extraction based on the spectrum features. After the two procedures above, some structures can still remain ambiguous; these are mainly rectangular, textured roofs in red, blue, and white, with high reflectance (Figure 3g–i). Based on the spectrum differences between the remaining structures and the traditional structures, white roofs with high reflectance can be removed using the brightness and the sum of the gray values of the RGB bands (Equation (5)), red roofs can be removed using the red–green difference index (Equation (6)) and the red mean difference index (Equation (7)), and blue roofs can be removed using the blue–green difference index (Equation (4)).
where the symbols denote the red, green, and blue bands, respectively.
The final step is the refined structure extraction based on the context information. At this stage, some of the traditional structures might have been mistakenly classified as “modern structures” because they satisfy a certain condition. For instance, some traditional structures have a convex central beam, which can be segmented into an independent strip-like object because sunlight makes its brightness differ significantly from that of the surrounding objects. Therefore, these traditional beams may be classified as modern structures in the second, shape-based extraction step. According to the “relative border to class” rule of context information, for objects on the same layer, an object’s relative border to a class is calculated by dividing the length of the border that it shares with objects assigned to that class by its total border length. The relative border ranges from 0 to 1, where 1 indicates that the object is fully surrounded by objects assigned to that class. In other words, when a “modern” object is surrounded by traditional objects, it has a high “shared border to total border” ratio (i.e., rel. border to class) with respect to the traditional class, suggesting that it should be considered a traditional structure. The extraction process for traditional structures is shown in Figure 4.
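The “relative border to class” rule can be sketched on a small raster of object identifiers. The sketch assumes that border length is counted in pixel edges between 4-connected neighbors and that image edges are ignored; the grids, class labels, and function name are hypothetical, and the software's own border computation may differ.

```python
def relative_border_to_class(object_ids, classes, target_object, target_class):
    """Share of `target_object`'s border (in pixel edges) that touches
    objects assigned to `target_class`. Image edges are ignored."""
    rows, cols = len(object_ids), len(object_ids[0])
    shared = 0
    total = 0
    for y in range(rows):
        for x in range(cols):
            if object_ids[y][x] != target_object:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < rows and 0 <= nx < cols):
                    continue
                neighbour = object_ids[ny][nx]
                if neighbour == target_object:
                    continue
                total += 1
                if classes[neighbour] == target_class:
                    shared += 1
    return shared / total if total else 0.0


# Hypothetical scene: object 2 (a roof beam labelled "modern") lies
# inside object 1 (a traditional roof); object 3 is a modern roof.
enclosed = [[1, 1, 1], [1, 2, 1], [1, 1, 1]]
partly = [[1, 1, 3], [1, 2, 3], [1, 1, 3]]
classes = {1: "traditional", 2: "modern", 3: "modern"}
print(relative_border_to_class(enclosed, classes, 2, "traditional"))
print(relative_border_to_class(partly, classes, 2, "traditional"))
```

An object whose ratio to the traditional class exceeds a chosen threshold would be reassigned to that class; the threshold itself is a design choice that is not specified here.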
2.4. Evaluation of the Classification Results of Atypical Traditional Village Landscapes
The confusion matrix is an important tool for evaluating the performance of classification models: it compares the classification results with the reference data and tabulates the numbers of correct and incorrect results. To verify the performance of this layered multi-feature model in classifying traditional structures, the research compared its results with those from the conventional machine learning (ML)-based supervised classification methods KNN, DT, and RF. The confusion matrix was used to evaluate the classification accuracy of the model’s results [19], and the model was compared with KNN, DT, and RF in terms of single-class accuracy, overall accuracy (OA), and Kappa coefficient. Random sampling was used to acquire sample points for accuracy validation, and a total of 1579 sample points were collected through field verification and human–computer interactive interpretation.
$$\mathrm{OA} = \frac{1}{N}\sum_{i=1}^{n} x_{ii}$$
where
OA is the overall classification accuracy, indicating the ratio of the number of correctly classified samples to the total number of verification samples;
N is the total number of verification samples; n is the number of classes; and
$x_{ii}$ is the number of samples of the ith class that are correctly classified.
$$\mathrm{Kappa} = \frac{N\sum_{i=1}^{n} x_{ii} - \sum_{i=1}^{n} x_{i+}\,x_{+i}}{N^{2} - \sum_{i=1}^{n} x_{i+}\,x_{+i}}$$
where
Kappa is the Kappa coefficient, which represents the proportion of errors that the current model avoids relative to a fully random classifier;
$x_{ii}$ is the number of correctly classified samples of the ith class;
$x_{i+}$ is the number of samples assigned to the ith class in the classification results;
$x_{+i}$ is the number of reference samples of the ith class;
N is the total number of verification samples; and
n is the number of classes.
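The two accuracy measures can be computed from a confusion matrix as in the following sketch. It uses the standard definitions of overall accuracy (correct samples divided by all samples) and Cohen's Kappa (agreement corrected for chance, from the row and column totals); the two-class matrix is a hypothetical example, not a result of this research.

```python
def overall_accuracy(matrix):
    """Correctly classified samples divided by all samples."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def kappa_coefficient(matrix):
    """Cohen's Kappa from a square confusion matrix
    (rows: classified, columns: reference)."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(size))
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[r][c] for r in range(size)) for c in range(size)]
    chance = sum(r * c for r, c in zip(row_totals, col_totals))
    return (total * correct - chance) / (total ** 2 - chance)


# Hypothetical two-class confusion matrix with 100 samples.
matrix = [[40, 10],
          [5, 45]]
print(overall_accuracy(matrix), kappa_coefficient(matrix))
```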
2.5. Calculation of Landscape Pattern Indexes
The landscape pattern indexes allow for quantitative analysis of the composition and spatial distribution characteristics of a landscape structure at three levels: patch, patch type (class), and landscape [20,21]. Based on the classification results extracted from the RGB UAV images, and considering the research objectives and the ecological significance of these indexes, the research used Fragstats 4.2 to calculate the aggregation index (AI), interspersion and juxtaposition index (IJI), mean nearest-neighbor distance (MNN), and contagion index (CONTAG), so as to explore Qianfeng Village’s landscape pattern and spatial heterogeneity. The four landscape pattern indexes and their ecological significance are shown in Table 2.
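As an illustration of one of these indexes, the sketch below computes a class-level aggregation index on a small raster. It assumes the commonly documented definition, in which the number of like adjacencies (each pair of neighboring cells of the same class counted once) is divided by the maximum possible number for that class area and expressed as a percentage. The grids are hypothetical, returning 0 for a single-cell class is a choice of this sketch, and the result is not guaranteed to match Fragstats output in every edge case.

```python
import math


def like_adjacencies(grid, cls):
    """Number of 4-neighbour joins between cells of class `cls`,
    each join counted once."""
    rows, cols = len(grid), len(grid[0])
    joins = 0
    for y in range(rows):
        for x in range(cols):
            if grid[y][x] != cls:
                continue
            if x + 1 < cols and grid[y][x + 1] == cls:
                joins += 1
            if y + 1 < rows and grid[y + 1][x] == cls:
                joins += 1
    return joins


def max_like_adjacencies(area):
    """Maximum joins for `area` cells packed as compactly as possible."""
    n = math.isqrt(area)       # side of the largest full square
    m = area - n * n           # cells left over
    base = 2 * n * (n - 1)
    if m == 0:
        return base
    if m <= n:
        return base + 2 * m - 1
    return base + 2 * m - 2


def aggregation_index(grid, cls):
    """Aggregation index (%) of one class."""
    area = sum(row.count(cls) for row in grid)
    max_joins = max_like_adjacencies(area) if area else 0
    if max_joins == 0:
        return 0.0  # undefined for a single cell; 0 is this sketch's choice
    return like_adjacencies(grid, cls) / max_joins * 100.0


# Hypothetical rasters: class 1 clumped versus scattered.
clumped = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
scattered = [[1, 0, 1, 0], [0, 0, 0, 0], [1, 0, 1, 0], [0, 0, 0, 0]]
print(aggregation_index(clumped, 1), aggregation_index(scattered, 1))
```

A value near 100 indicates that the cells of a class form compact clumps, whereas a value near 0 indicates maximal disaggregation, which is how the index helps to describe whether traditional structures are clustered or dispersed.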
5. Conclusions
This atypical traditional village landscape classification model based on UAV images (ATVLUI) is capable of extracting the spatial information of village landscapes, especially traditional structures, in a rapid, efficient, and accurate manner. As a rational quantitative analysis for rural planning, it can shorten the planning and design cycle, reduce the costs of information acquisition, and improve the objectivity and accuracy of the information obtained. In this case, the ATVLUI has provided detailed, precise data to support the subsequent planning and construction of Qianfeng Village. At the same time, it is a new technological solution for spatial information acquisition, which can be used to improve the early investigation and later planning and construction of atypical traditional villages to provide an optimal scheme for village planning and design. So far, the following conclusions are suggested:
First, the integration of the automatic landscape classification algorithm with centimeter-level UAV RS imagery provides an opportunity to capture the information of the landscapes in atypical traditional villages rapidly and accurately. The ATVLUI proposed in this research can precisely classify village landscapes in a complex scenario based on UAV RGB images, achieving a classification accuracy for traditional structures of 84%, an overall accuracy of 93%, and a Kappa coefficient of 0.89, which are superior to those of KNN, DT, and RF.
Second, the high-precision classification map for atypical traditional village landscapes provides us with the spatial distribution characteristics as basic, precise quantitative information for protecting atypical traditional villages. According to the calculations of the area and proportion of each class of landscape objects, the structures account for 33.94% of Qianfeng Village’s total area, in which 29.69% and 4.25% are modern and traditional structures, respectively. The number of traditional structures is 202, accounting for 13% of the total number of structures.
Third, the modern structures in Qianfeng Village, an atypical traditional village, are connected and expanding, which demonstrates a trajectory in which the traditional structures are being gradually replaced by modern ones. At the periphery of the village, the vegetation coverage is high. Within the village, the aggregation of all the classes of surface features is high, the structures share long common boundaries, the modern structures are densely distributed, and the discretely distributed traditional structures gather in small clusters. Overall, the distribution pattern is highly fragmented, with different structures highly interlaced.