Massive rural–urban migration has accelerated urbanization and industrialization in China over the last few decades. From 2000 to 2016, China’s rural resident population decreased from 808 million to 589 million, a decline of 27.1% [1]. However, the area under rural homesteads has increased rather than decreased because farmers who have newly migrated to cities prefer to keep their rural homes [2]; it expanded from 14.5 to 19.9 million hectares, an increase of 37.2% [1]. A vast number of farmers treat their homesteads as inherited wealth and not just as land for construction. At the same time, when farmers settle in cities, the transfer of homesteads to others is restricted [5]. Many challenges persist regarding the use and management of rural homesteads. On the one hand, a rural homestead serves as housing security for the farmer [6]. On the other hand, this sense of security has given rise to irrational land use, such as over-occupation, idle land, and under-utilization [7]. To promote rural development, the Chinese government’s proposal for the construction of beautiful villages focuses on preparing village plans according to local conditions, in-depth surveys of farmers, and the rational layout and conservation of land. The use and management of homesteads is a key part of this exercise; thus, data and field surveys beyond those currently available are required. Household surveying is a common method of collecting relevant socio-economic and thematic information, with homestead and floor areas forming the core of this information [8]. However, a small proportion of farmers have an incentive to misrepresent data to receive higher government subsidies or to avoid exposing over-occupied land [9]. Moreover, most villages in China are densely packed with homesteads, so surveys are extensive and time-consuming. Therefore, additional approaches are needed to collect more accurate spatial data to properly monitor the condition of rural homesteads.
The use of unmanned aerial vehicles (UAVs) offers new opportunities for monitoring rural homesteads, as they facilitate real-time and high-resolution data collection [10]. Because they resolve ground texture at the centimeter scale, UAV images are well suited to the visual interpretation of rural homesteads [11]. Yang et al. measured the building density and floor area ratio of rural settlements using a Dajiang UAV with visual interpretation [12]. However, visual interpretation is inadequate to support rural surveys in China, which usually cover thousands of villages. Building height can be estimated in different ways from remotely sensed imagery. Li et al. proposed a method for estimating building heights using Sentinel-1 data, which focused on the urban scale [13]. Wang et al. reconstructed a 3D building based on UAV tilt photography [14]; this approach is not suitable for dense rural homestead communities because the calculations were based on a single building.
In recent years, deep learning methods have also been used to identify rural buildings [15]. Li et al. employed AlexNet and support vector machine algorithms to detect hollow village buildings based on high-resolution remote sensing images [16]. These approaches are based on object detection methods [17], whose primary task is to find all the objects of interest in the image and determine their locations [18]. Object detection techniques use rectangular frames to locate objects, but both the distribution and the roof shapes of rural buildings are irregular; thus, the identification accuracy of these methods is limited [19]. Furthermore, in the homestead identification task, the desired output should include homestead building boundaries, and each pixel should be assigned a class label [20]. In pixel-based techniques, the network learns to provide a prediction for each pixel [21]. U-net is a convolutional autoencoder widely used in the medical field and other industries; it performs high-precision pixel-based segmentation on images [20]. However, the use of U-net to recognize rural homesteads is still uncommon.
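The limitation of rectangular frames for area estimation can be illustrated with a toy example. The sketch below is not part of the study: the L-shaped mask, the 0.1 m pixel size, and the helper names are all assumptions chosen for illustration. It compares the area obtained from a pixel mask with the area of the tightest rectangle around the same roof.

```python
# Toy comparison: bounding-box area vs. pixel-mask area for an L-shaped roof.
# The mask, the 0.1 m pixel size, and the helper names are illustrative assumptions.

def mask_area(mask, pixel_size_m):
    """Area covered by pixels labelled 1, in square metres."""
    return sum(sum(row) for row in mask) * pixel_size_m ** 2

def bounding_box_area(mask, pixel_size_m):
    """Area of the tightest axis-aligned rectangle around the labelled pixels."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        return 0.0
    height = rows[-1] - rows[0] + 1
    width = cols[-1] - cols[0] + 1
    return height * width * pixel_size_m ** 2

# 1 = roof pixel, 0 = background; an L-shaped footprint.
l_shaped_roof = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]

pixel_area = mask_area(l_shaped_roof, 0.1)         # 12 labelled pixels
box_area = bounding_box_area(l_shaped_roof, 0.1)   # 4 x 4 pixel rectangle
```

In this toy case the rectangle covers 16 pixels while the roof covers 12, so a frame-based estimate would overstate the footprint by one third; a per-pixel label avoids this for irregular shapes.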
Estimating homestead and building floor areas at the village level remains a challenge: a method applicable to rural China should combine real-time image acquisition, pixel-based identification, and 3D modeling of rural buildings, and thereby provide a potential alternative to time-consuming and laborious household surveys. In this study, the objectives were: (1) to extract the spatial distribution of homesteads from UAV images, mainly relying on a pixel-based image classification using the U-net algorithm; (2) to develop and validate a building height model (BHM) to determine the number of floors and the floor area of rural buildings based on 3D modeling; and (3) to develop and test a village-level method to estimate homestead and floor areas in a rapid and low-cost manner, which is useful for rural surveys in China and other developing countries.
A method based on UAV imagery and the U-net algorithm was developed for the estimation of village-level homestead and floor areas, with the advantage of real-time image acquisition, pixel-based identification, and 3D modeling recognition. The overall resulting accuracies were 0.92 and 0.89 for the homestead area and the number of building floors, respectively. Thus, our experience of using a combination of UAV and U-net technologies to identify village-level objects provides a potential alternative to time-consuming and laborious household surveys, which has important implications for the ongoing homestead use and management reform in China, especially for homestead ownership confirmation.
As shown in Table 1, U-net identified the buildings in this study with high accuracy. Many attempts have been made to use convolutional neural networks (CNNs) to improve the performance of building detection based on object detection technology [15]. However, object detection techniques use rectangular frames to locate objects, and the distribution of homestead buildings and the irregular shapes of the roof planes limit the identification accuracy of these methods [19]. Konstantinidis et al. proposed a modular CNN architecture to identify buildings with pixel-based detection technology [29], wherein the network learns to provide dense predictions for each pixel [21]. The pixel-based architecture is fully convolutional; therefore, in this work, we employed the commonly used pixel-based architecture. Papadomanolaki et al. compared multiple methods based on CNN architecture and enforced pixels that belonged to the same object to be classified under the same semantic category [21]. Therefore, the results of this study demonstrate the advantage of U-net and a pixel-based architecture for estimating the area of rural homesteads.
However, some error sources remain. The BHM estimates are the key to determining the floor areas of the rural buildings. In theory, the height of an object can be calculated from UAV images using photogrammetry, by subtracting the DTM from the DSM [30]. However, it is difficult to extract a BHM from a UAV-derived DTM because the terrain surface is obscured by roofs [31]. Furthermore, the DSM was estimated using the elevation control points located within the range of the country trails around the homesteads. Since rural trails in the southern part of China are generally narrow, the DSM interpolation surface errors were slightly higher for the narrow trails than for other open areas. Figure 4b shows a comparison of the interpolated ground control points and the UAV-derived DSM; the two showed excellent agreement (R² = 0.99). However, as explained previously, the DSM elevation values generated from the UAV images were usually overestimated for the narrow roads surrounded by buildings. Therefore, we set the DSM elevation surface threshold to less than 1 m for the ground surface area.
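The height calculation discussed above can be made concrete with a minimal sketch, in which the BHM is the cell-wise difference DSM − DTM and the floor area is the footprint area multiplied by the number of floors. The 3.0 m storey height, the 0.5 m pixel size, the use of the median height over the footprint, and all function names are assumptions for illustration; they are not taken from the study.

```python
# Sketch: building height model (BHM) and floor-area estimate for one building.
# ASSUMPTIONS (not from the study): 3.0 m storey height, 0.5 m pixel size,
# median height as the building height, and all helper names.
from statistics import median

def building_height_model(dsm, dtm):
    """BHM = DSM - DTM, cell by cell (all rasters in metres)."""
    return [[s - t for s, t in zip(dsm_row, dtm_row)]
            for dsm_row, dtm_row in zip(dsm, dtm)]

def estimate_floors(bhm, footprint, storey_height_m=3.0):
    """Round the median height over a non-empty footprint to whole floors."""
    heights = [h for bhm_row, fp_row in zip(bhm, footprint)
               for h, inside in zip(bhm_row, fp_row) if inside]
    return max(1, round(median(heights) / storey_height_m))

def floor_area(footprint, floors, pixel_size_m):
    """Total floor area = footprint area x number of floors."""
    footprint_area = sum(sum(row) for row in footprint) * pixel_size_m ** 2
    return footprint_area * floors

# Tiny synthetic rasters: a roughly 6 m building next to a narrow trail.
dsm = [[105.9, 106.1, 100.2],
       [106.0, 106.2, 100.1]]
dtm = [[100.0, 100.0, 100.0],
       [100.0, 100.0, 100.0]]
footprint = [[1, 1, 0],
             [1, 1, 0]]   # e.g. a segmentation mask for one building

bhm = building_height_model(dsm, dtm)
floors = estimate_floors(bhm, footprint)                    # two floors here
area_m2 = floor_area(footprint, floors, pixel_size_m=0.5)   # 4 pixels x 0.25 m2 x 2
```

Using the median rather than the maximum height is one way to reduce sensitivity to isolated DSM errors, such as the overestimation near narrow roads noted above; other summary statistics would be equally plausible.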
Moreover, the average slope of the area is approximately 30°, which increases the difficulty of accurate interpolation. If the study area were located on a plain, the uncertainty caused by the slope would be relatively small. However, in complex terrain, such as that in this study, improving accuracy will require more surveyed and measured sampling points.
U-net provides an advantage in terms of the number of training samples, as the algorithm requires only a small amount of data to train the model [20]. Because the range of the UAV flights was limited, the 188 image pairs were augmented to a total of 4324 image pairs. The overall accuracy of the U-net deep learning network recognition was good (0.92), and rapid image segmentation was possible with the established model. Data augmentation played a vital role, allowing image recognition to be completed with few annotated images and a reasonable training time. However, the question remains whether there is a lower limit on the number of annotated images needed for U-net to work accurately. In subsequent studies, we plan to reduce the number of training image pairs to test the robustness of U-net.
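The reported totals imply 4324 / 188 = 23 samples per original image pair. The transformations used are not listed here, so the sketch below is only an illustration of the general idea: it generates the eight rotation and flip variants of a tile, a common but assumed choice, and applies the same transform to the image and its label mask so that the annotation stays aligned. All names are hypothetical.

```python
# Sketch: paired augmentation of an image tile and its label mask.
# ASSUMPTION: the study's transforms are not specified; the 8 rotation/flip
# variants below are a common choice, used here only for illustration.

def rotate90(grid):
    """Rotate a 2D list 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def flip_horizontal(grid):
    """Mirror a 2D list left to right."""
    return [row[::-1] for row in grid]

def augment_pair(image, mask):
    """Yield 8 (image, mask) variants, transforming both identically."""
    for flipped in (False, True):
        img = flip_horizontal(image) if flipped else [row[:] for row in image]
        msk = flip_horizontal(mask) if flipped else [row[:] for row in mask]
        for _ in range(4):
            yield img, msk
            img, msk = rotate90(img), rotate90(msk)

image = [[10, 20], [30, 40]]   # stand-in for a UAV image tile
mask = [[1, 0], [0, 0]]        # 1 marks the homestead pixel
variants = list(augment_pair(image, mask))

# Ratio implied by the reported totals: 4324 / 188 = 23 samples per original.
samples_per_original = 4324 // 188
```

Rotations and flips alone give only eight variants per tile, so reaching 23 per original would require further transforms (for example crops or brightness changes); which ones were used is not stated.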
In addition, this study referred to 19 ground survey sites, and the proposed technique provided a consistency of 89.47%. This figure may over- or underestimate the true consistency: during the household surveys, most farmers reported their true situation and evidence of homestead ownership was confirmed, but some of the descriptions may have been biased.
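For reference, a consistency of 89.47% over 19 sites corresponds to 17 matching sites (17/19 ≈ 0.8947). This split is inferred from the reported percentage rather than stated in the text, and the floor counts and function name below are placeholders used only to show the calculation.

```python
# Sketch: share of survey sites where the estimate matches the ground survey.
# The 17-of-19 split is inferred from the reported 89.47%; the site values
# and the function name are hypothetical.

def consistency_rate(estimated, surveyed):
    """Fraction of sites whose estimated value equals the surveyed value."""
    matches = sum(e == s for e, s in zip(estimated, surveyed))
    return matches / len(surveyed)

surveyed = [2] * 19               # placeholder floor counts for 19 sites
estimated = [2] * 17 + [3] * 2    # two sites disagree
rate = consistency_rate(estimated, surveyed)
print(f"{rate:.2%}")              # 89.47%
```

With only 19 sites, each additional agreement or disagreement shifts the rate by about 5 percentage points, which is one reason to treat the figure cautiously.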
In this study, a method based on UAV imagery and the U-net algorithm was developed for village-level homestead and building floor area estimation, with the advantage of real-time image acquisition, pixel-based identification, and 3D modeling recognition. The resulting overall accuracies for the estimation of the homestead area and the number of building floors were 0.92 and 0.89, respectively. This method is a potential alternative to time-consuming and costly household surveys and is, thus, of great significance not only for the use and management of homesteads, but also for the ongoing homestead ownership confirmation in China. The combination of UAV imagery and the U-net algorithm may also have broader applications beyond homestead use and management. For instance, the numbers of greenhouses, irrigation facilities, and even agricultural machinery are important components of rural household surveys. The proposed method can help decision-makers grasp the current state of the rural socio-economic environment and make policy recommendations accordingly. In future work, we will further improve the accuracy of the model in areas with complex topography and dense housing.