1. Introduction
In recent years, advances in the miniaturization of electronic devices and in composite materials have enabled the development of lightweight unmanned aerial vehicles (UAVs), which offer high portability, flexibility, low operational costs, simple and efficient workflows, and fewer legal constraints [
1,
2]. In the fields of remote sensing and earth sciences, lightweight UAVs have a broad range of applications [
3,
4], enabling high-quality observation tasks. The collected data exhibit high temporal resolution and very high spatial resolution (VHSR) [
5], capabilities that are gradually replacing traditional measurement methods and platforms [
6,
7]. Many countries and international organizations are now developing laws and regulations to establish the legality, limitations, and operational standards for UAV flights [
8,
9]. For instance, China’s “Interim Regulations on the Flight Management of Unmanned Aerial Vehicles” classify UAVs by weight and size and specify different regulatory measures for each class. To further promote the use of lightweight UAVs, legal frameworks often set lower entry barriers, such as waiving the pilot’s license requirement and simplifying flight permission procedures so that airspace authorization is not needed [
10]. These advantages offer significant convenience for practical surveying tasks.
Despite their various advantages, the reliability of data collected by lightweight UAVs remains uncertain, and there is limited understanding of their performance in remote sensing applications [
11]. A notable issue for lightweight foldable UAVs is whether they can balance smaller size with high data quality. This question warrants further investigation.
To evaluate the performance of specific UAV models in practical applications, many studies have designed tailored case studies to assess hardware usability as well as data accuracy and quality. For lightweight foldable UAVs, there have been evaluation articles on various models, such as DJI Spark [
12], DJI Mini 2 [
13], and DJI Mavic 2 Pro [
14]. These studies primarily focus on consumer-grade UAVs, evaluating the applicability of their captured photos and videos across different fields. Compared to consumer-grade UAVs, mapping-grade UAVs require additional parameter configurations during operation, and their data must undergo digital image processing to produce usable remote sensing datasets. Therefore, this study emphasizes that the evaluation of mapping-grade UAVs should focus on the practical workflows of surveying, considering whether the generated data can support subsequent analyses. In remote sensing, most existing research uses ground control points (GCPs) within the study area as accuracy reference points to compare the photogrammetric performance of different UAV systems [
15,
16,
17,
18]. While such evaluations are meticulously designed (e.g., flight path planning and GCP setup), they typically aim to obtain specific metrics or accuracy parameters, and some further compare these metrics across different devices [
17,
18]. However, these studies often lack comprehensive assessments of UAVs in real-world and challenging remote sensing scenarios. Regrettably, to date, only a very limited number of case-based evaluations have been conducted for professional-grade UAVs designed for surveying. Without validation in practical applications, it is challenging to accurately assess the performance of new devices, which may hinder the broader adoption of such UAVs. This is precisely the issue we aim to explore in this study.
To assess the reliability of low-altitude remote sensing data from lightweight foldable drones, this study used the extraction of dense buildings in urban villages as a case study. Urban villages refer to rural areas that have been absorbed into urban spaces or administrative boundaries [
19]; their formation is closely related to China’s unique urban–rural system, resulting in a mixed spatial structure and landscape [
20]. These areas are often associated with issues such as unhealthy living environments, inefficient land use, crowded and disordered physical landscapes, poor housing quality, and severe infrastructure deficiencies [
21]. Urban villages are characterized by narrow roads, buildings facing each other, and minimal open sky between structures [
22], which imposes high demands on the quality of remote sensing data. Satellite remote sensing, with its lower resolution and complex background interference, struggles to accurately delineate the boundaries of urban villages [
23], and it is even more challenging to capture the internal structural features of these areas. In contrast, drones equipped with imaging sensors can operate closer to the target, providing higher-resolution imagery, which serves as a valuable technical means for obtaining VHSR images of urban villages.
In terms of image analysis, Object-Based Image Analysis (OBIA) has become a significant component of remote sensing research related to land cover mapping [
24,
25]. Distinguished from traditional pixel-based image analysis methods [
26], OBIA first segments the input image into local objects, which then serve as spatial units for subsequent analysis, classification, and accuracy assessment [
27,
28]. Classifier methods are then applied to categorize the spectral characteristics of these segments, identifying the types of ground targets [
29], and allowing for the definition of more complex classes based on spatial and hierarchical relationships during the classification process [
26]. Integrating UAV systems with OBIA technology offers a flexible and cost-effective means to enhance ground-truth data [
30]. Previous studies have shown promising results using this approach in various applications, such as mangrove mapping [
31], classification of ecologically sensitive marine habitats [
32], extraction of solar photovoltaic panels [
33], olive tree canopies [
34], weed detection [
35], and detecting skips in sugarcane fields [
36]. In the context of building extraction, previous studies have employed satellite remote sensing imagery, such as Sentinel-2B [
37] and QuickBird [
38], to extract buildings. Most of these studies focused on areas with large building separations, where distinguishing between individual buildings was straightforward. However, the lower spatial resolution of satellite imagery makes it challenging to extract buildings in densely populated environments. Conversely, there are relatively few case studies utilizing OBIA to extract buildings from VHSR UAV imagery, especially those employing lightweight foldable drones [
39]. Overall, while OBIA has been widely applied to various types of land cover mapping, the differences between study areas imply that the methods and parameter settings effective in one area may yield inconsistent results in another [
40]. This variability motivates a dedicated case study of the approach.
The extraction of dense buildings in urban villages places high demands on both data reliability and data processing techniques, making it a test of the capabilities of lightweight foldable drone data and a new application of OBIA. In this study, the lightweight foldable DJI Mavic 3 Enterprise drone was used to capture VHSR imagery of urban villages, and OBIA techniques were used to extract densely packed buildings separated by small gaps. This study aims to evaluate both the advantages and limitations of this UAV system across the entire workflow of remote sensing data acquisition, processing, and analysis; accordingly, the usability of the drone’s remote sensing data was assessed based on building classification accuracy and extraction results. To achieve this, the following main tasks were carried out:
- (1) VHSR images of urban villages were captured using the Mavic 3 Enterprise drone, and airborne laser point cloud data were collected for evaluating segmentation performance;
- (2) Multi-Resolution Segmentation (MRS) was applied to segment the VHSR images, with a detailed exploration of the segmentation parameters for this case, and the segmentation results were then visually compared and evaluated against the airborne laser point cloud data;
- (3) Image objects were classified using machine learning algorithms, the classification accuracies of three algorithms, namely K-Nearest Neighbor (KNN), Bayes, and Decision Tree, were compared, and high-density buildings were extracted;
- (4) The advantages of this UAV across the above workflow were discussed, and the challenging scenarios encountered in the urban village case were examined together with future prospects.
This study is structured as follows: Section 2 introduces the specifications and advantages of the drone, as well as the photogrammetry and data processing workflows. Section 3 describes the object-based methodology and procedures, explaining how the study evaluated the effectiveness and accuracy of the results. Section 4 presents the study’s results, including data quality, comparative results of feature segmentation, and the performance of building extraction. Section 5 discusses the findings of our study, and lastly, Section 6 summarizes the work and presents the study’s conclusions. To clearly illustrate each major step and the associated analytical methods, the framework presented in this paper is depicted in Figure 1.
5. Discussion
5.1. Advantages and Development Prospects of Lightweight Surveying and Mapping UAVs
This study highlights the significant improvements in mobility and flexibility that lightweight UAVs bring to fieldwork. Traditional UAV systems, such as the M300 small UAV used for airborne LiDAR point cloud collection, require carrying four batteries (20 kg), the UAV itself (8 kg), and airborne sensors (2 kg), about 30 kg in total; lightweight foldable UAVs greatly reduce the associated transportation and labor costs. Besides their compactness, the development of lightweight airborne hardware modules also improves fieldwork efficiency. For example, the traditional GCP method involves carrying GNSS receivers and manually marking control points in the images, which is inefficient. In contrast, the RTK module used in this study weighs only 24 g and eliminates the need for ground control points, meeting 1:500 mapping accuracy requirements [
44] and significantly reducing GCP setup workload [
15,
16,
17,
18]. Furthermore, the weight of sensors is crucial to UAV design and aerodynamic performance [
65]. While consumer-grade digital cameras were previously used for capturing very high spatial resolution (VHSR) images [
66], resulting in heavy takeoff weights, the Mavic 3 Enterprise in this study features a lightweight visible light sensor, integrated with the gimbal and UAV as a single system. This integrated solution, first applied to consumer-grade UAVs, has since been successfully implemented on foldable platforms like the DJI Mavic 3M and DJI M30T, offering high portability in professional applications. Sensor miniaturization is key to achieving lightweight UAV designs. However, some sensors, such as LiDAR, hyperspectral, and synthetic aperture radar, remain large and heavy, requiring larger UAV platforms with sufficient takeoff weight [
6]. As sensor technology advances, we expect UAVs to become even more portable, expanding their applications across various industries.
Regarding regulatory restrictions, lightweight UAVs are subject to fewer controls, reducing the time flight operators need to secure airspace and coordinate with authorities. To clearly demonstrate the advantages of lightweight foldable drones in practical operations, Table 9 presents the operational costs of the Mavic 3 Enterprise in this study, with the M300 RTK, also used in the case study, serving as a comparison group. The methodology for quantifying these costs is detailed in Table 10.
The results indicate that lightweight foldable drones offer substantial economic benefits, particularly in terms of equipment procurement and associated hardware. Unlike the M300 RTK, which requires additional purchases such as onboard sensors and battery boxes, the Mavic 3 Enterprise integrates the camera with the aircraft body, significantly reducing procurement expenses. Operationally, the depreciation cost of batteries also highlights a major difference. Larger drones rely on more expensive batteries and often require multiple units per flight, leading to higher overall depreciation costs. For instance, the M300 RTK necessitates the simultaneous use of two high-cost batteries, whereas the Mavic 3 Enterprise operates efficiently with a single, more affordable battery. In practice, the cost of transporting equipment to the job site is a significant expense. The compact and lightweight design of the Mavic 3 Enterprise allows it to be transported in a single handheld carrying case, as shown in
Figure 3. In contrast, the M300 RTK requires the transport of around 20 kg of batteries and battery boxes, as well as additional on-board sensors, resulting in a six-fold difference in transport costs between the two in this case. Additionally, lightweight drones demonstrate lower costs in equipment deployment, operational maintenance, and personnel training. They also offer advantages in terms of reduced legal and regulatory burdens, which are often harder to quantify but significantly enhance the efficiency and feasibility of their application.
In the previous sections, several references and policy regulations were cited to explain that lightweight UAVs have a low operational threshold, which enhances the flexibility of flight operations. However, this does not imply that the operators can ignore regulatory constraints when using lightweight UAVs. For example, in China, all civilian UAVs must be registered with the Civil UAV Comprehensive Management Platform through their serial number (SN). In the event of a violation of airspace regulations, such as unauthorized entry into controlled airspace or exceeding altitude limits, the registered information will be used to contact the responsible party. It is important to note that while lightweight UAVs offer more flexible flight capabilities, entering sensitive airspace can pose serious safety risks [
67]. Special no-fly zones, such as airports and military facilities, must be observed, and operators should thoroughly understand the regulations governing the flight area and submit flight applications to relevant authorities before flying. In this study, the flight altitude was set to 120 m, which is the maximum height that does not require airspace approval according to the “Interim Provisions on the Administration of UAV Flights.” Nevertheless, the researchers still contacted the management committee of Beiting Village and the local police station before the flight to inform them of the flight time and area. In the future, the threshold for low-altitude airspace use should be lowered and the approval process simplified, which would require policy reforms from relevant authorities. At the same time, UAV operators should strictly adhere to regulations and establish communication with management authorities prior to flights, even if some application procedures are not strictly necessary.
In terms of data evaluation, previous studies have already assessed the positional accuracy of the data [
44]. Therefore, this research primarily focuses on the usability of the data from a remote sensing analysis perspective, evaluated through a case study. The DOM imagery generated in this study and the building extraction results show that, despite its small size, the Mavic 3 Enterprise can capture high-precision, high-quality remote sensing data. The imagery clearly displays detailed textures in urban villages, such as canopies, water heaters, and solar panels. Although ground control points were not used to validate the positioning accuracy, it was observed that, with the support of the “Qianxun” service, the spatial location data collected by the two UAVs (DOM and LAS) exhibited a high degree of consistency. This consistency provided a solid foundation for subsequent data fusion and analysis. The efficiency of data processing was also a focus of the study. Compared with software like Pix4D and PhotoScan, DJI Terra exhibited higher compatibility for processing and analyzing the data, with automatic image recognition and simplified workflows. This suggests improved processing efficiency [
44] and a gentler learning curve for users. It is worth noting that these images were collected under favorable weather conditions, and the impact of environmental variations on the data has yet to be studied. For instance, further research is needed to determine whether high wind speeds could cause deviations in flight paths, leading to data that do not meet preprocessing requirements, or whether poor lighting conditions could degrade image quality. Moreover, the consistency of data under varying environmental conditions remains to be explored.
5.2. Challenges and Future Prospects in MRS for UAV Data
In terms of OBIA technology, MRS is one of the most widely used and successful remote sensing image segmentation algorithms [
68], with a large body of literature available for reference, and it can be conveniently implemented in eCognition software. This study not only validates the data but also provides a complete and practical workflow as a reference. Since there are no previous studies on lightweight foldable UAV data, we conducted 24 experiments adjusting the scale, shape, and compactness parameters to determine the optimal segmentation parameters for this study’s specific context. However, most existing papers do not provide detailed guidelines for setting MRS parameters [
26,
28,
40], which is considered an oversight in studies evaluating new types of data. Therefore, the results of the MRS parameter experiments are reported in Section 4. Additionally, a scientific and objective quantitative method for determining the MRS band weights remains to be explored. Moreover, the optimal values for MRS parameters are often determined through repeated trials [
54,
69,
70]. While these operations are easy to implement, repetitive work can be time-consuming. To address this, the ESP2 tool was used to minimize subjective human influence and reduce the number of repetitive experiments, although fully automated segmentation without human intervention was not achieved in this study. Regarding the evaluation of segmentation results, while many studies rely on quantitative methods to assess segmentation performance [
55,
70,
71,
72], no single evaluation method applies universally across all segmentation scenarios [
69,
73,
74]. Therefore, many studies resort to a visual subjective assessment of segmentation results [
75]. To this end, efforts were made to fully utilize the intensity and height features from the LiDAR point cloud to highlight the differences between dense buildings. Based on the PCV results, this approach more effectively emphasizes building contours. Compared to relying solely on visual subjective evaluation, this is considered a more intuitive and efficient method for visually assessing segmentation results.
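The repeated-trial parameter search described above can be organized as an explicit grid. The sketch below is only an illustration under stated assumptions: MRS in eCognition is controlled by a scale parameter plus shape and compactness weights, with color = 1 - shape and smoothness = 1 - compactness. The candidate values are hypothetical placeholders chosen so that the grid contains 24 combinations, and they are not the settings tested in this study.

```python
from itertools import product

# Hypothetical candidate values (NOT the study's settings); 4 * 3 * 2 = 24 runs.
SCALES = [50, 100, 150, 200]
SHAPES = [0.1, 0.3, 0.5]
COMPACTNESSES = [0.3, 0.7]


def build_mrs_grid(scales, shapes, compactnesses):
    """Return one record per MRS parameter combination.

    The color and smoothness weights are the complements of the shape and
    compactness weights, so they are derived rather than set independently.
    """
    grid = []
    for scale, shape, compactness in product(scales, shapes, compactnesses):
        grid.append({
            "scale": scale,
            "shape": shape,
            "color": round(1.0 - shape, 2),
            "compactness": compactness,
            "smoothness": round(1.0 - compactness, 2),
        })
    return grid


experiments = build_mrs_grid(SCALES, SHAPES, COMPACTNESSES)
print(len(experiments))  # 24
print(experiments[0])
```

Each record could then be passed to the segmentation software in turn, with the resulting object boundaries compared against the point cloud reference; a tool such as ESP2 would narrow the scale candidates before such a grid is run.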
In the practical case of extracting urban village buildings, this study demonstrates, on one hand, the high-resolution capabilities of UAV data, clearly highlighting the dense spacing between buildings and the narrow roads characteristic of urban villages [
22], which are challenging to capture using satellite remote sensing data [
39]. On the other hand, while the data proved sufficiently reliable, the increased surface texture details posed significant challenges for data analysis. It was found that the difficulty of extracting urban village buildings lies not only in their narrow spacing but also in the complex texture features within individual buildings. Previous studies [
33,
36,
37,
38] have successfully extracted features in areas with low building density, achieving high classification accuracy and providing precise surface feature counts. However, replicating such results in the case study of this research proved challenging, particularly in areas with complex surface conditions. These include buildings completely covered by multiple canopies, collapsed structures, or cases where the spacing between buildings is effectively zero. In such scenarios, even visual interpretation struggles to determine whether the features represent a single building or multiple structures.
Overall, this study demonstrates the potential application of the lightweight UAV + OBIA solution in complex land cover environments. As UAVs continue to become lighter and OBIA methods mature in high-resolution image processing, this solution is expected to be applied in more complex scenarios, including fine land cover extraction in mountainous terrains, urban renewal, and monitoring of aging urban areas, as well as post-disaster rapid assessments and emergency responses. These scenarios may include environments where large UAVs could not previously operate, tasks requiring high flexibility and immediacy, or cases where pixel-based analysis is inadequate.
5.3. Defects and Prospects: Addressing the Complex Urban Village Scenario
As mentioned in Section 5.2, the complex and heterogeneous surface features of urban villages make detailed building extraction a significant challenge. To investigate the key difficulties affecting building extraction in urban villages, a field survey was conducted. To complement the low-altitude perspective of drones, the complex scenes of urban villages were captured from ground-based viewpoints using georeferenced control points. Handheld LiDAR scanners and panoramic cameras were employed to collect ground point clouds and 360-degree street view images, as shown in
Figure 16a, and these were synthesized into true-color ground point clouds. During the data collection process, unlike the previous network RTK, Ground Control Points (GCPs) with absolute coordinates were used, and the data were processed in Lidar360 software. Both airborne point clouds, collected with network RTK services, and ground-based point clouds controlled by GCPs were loaded into the software, and as shown in
Figure 16b, the two datasets aligned closely with negligible horizontal and vertical offsets, validating the reliability of using airborne LiDAR point clouds as a benchmark for evaluating segmentation results.
It was found that airborne LiDAR effectively captures data from building tops, while ground-based LiDAR provides more detailed information from oblique directions (e.g., building sides), as shown in
Figure 17. However, phenomena such as tightly spaced buildings (“handshake buildings”) and narrow passageways (“line of sky”) that appear on narrow village roads (highlighted by the red rectangle in
Figure 17) still pose challenges. Despite using handheld devices to enter these narrow pathways, a significant portion of the data could not be fully captured, or even when data were collected, some crucial information was lost. This is attributed to the narrow spacing between tall objects, which limits visibility. Specifically, airborne LiDAR captures data from building tops, while ground-based LiDAR captures data from the bases of buildings, which are less affected by visibility constraints. This results in data loss from the middle sections of buildings. Therefore, in very narrow village roads (and similar scenarios), obtaining complete image and point cloud data remains a challenging task. The incompleteness of the data hinders the correct recognition and analysis of such scenes. This is viewed as an unresolved challenge at present and a potential source of error in this study, which may be addressed through more refined air-ground data fusion techniques in future work.
As a preliminary evaluation, this study achieved satisfactory accuracy and results, but the case study is limited to a single urban village scenario. As mentioned in Section 5.1, the reliability of this equipment remains to be validated under other complex operational conditions. Following this study, it is anticipated that lightweight drones will see broader applications in various fields such as photogrammetry, agriculture, and forestry, including areas where drones have not previously been used for data collection or where larger, less portable UAV systems have been employed. At the same time, future research is expected to assess the performance of the OBIA workflow proposed in this study across additional scenarios and evaluate its transferability. This will rely on the collection of further datasets using novel drone technologies. Although airborne point cloud data in this study were primarily used for comparative evaluation, LiDAR, in contrast to visible light, offers the advantage of capturing three-dimensional geographic information and multiple return signals [
76]. This capability makes LiDAR more suitable for extracting information in complex environments. Further miniaturization of LiDAR technology and its broader adoption in more complex and challenging environments are anticipated.
In terms of algorithm selection, the focus of this study was to evaluate novel drone-based data. This study therefore employed well-established processing methods that have been widely applied to various remote sensing datasets, for two main reasons: first, to assess the reliability of the new data using these classical methods, and second, to evaluate the applicability of these methods to novel datasets. These two objectives are considered complementary. Classical machine learning algorithms tend to perform well in single case studies because they typically require few training samples, are less prone to overfitting, and generalize well. In this study, three machine learning classifiers were selected to classify image objects. Among them, the KNN algorithm classifies each image object based solely on its nearest training samples in feature space. It is particularly sensitive to local decision boundaries and adapts well to complex boundary characteristics when processing different types of land cover, which is why it performs well in extracting areas with high-density buildings. However, this sensitivity can lead to misclassification when dealing with land cover types that have similar spectral characteristics, such as roads, narrow alleys, and vacant land. Consequently, the classification accuracy for roads is lower than for buildings. The Bayes algorithm, a probabilistic model, shows strong stability in areas where land cover categories are clearly distinguishable. However, because it assumes that features are independent, it tends to misclassify roads and vacant land in regions with overlapping spectral features. The decision tree may make incorrect splits at certain nodes when land cover types have ambiguous boundaries, which is common in urban village scenarios, and a deeper tree may lead to overfitting.
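To make the KNN behaviour described above concrete, the following is a minimal sketch of nearest-neighbour voting over image objects. It is not the classifier configuration used in this study: the two features (mean brightness and a rectangular-fit score), the sample values, and the choice of k = 3 are all hypothetical and serve only to show the mechanism.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training objects.

    Distances are Euclidean in feature space, so features should be on
    comparable scales before this function is used.
    """
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training objects: (mean brightness, rectangular fit), both in [0, 1].
train_features = [
    (0.80, 0.90), (0.75, 0.85), (0.78, 0.80),  # building
    (0.60, 0.30), (0.55, 0.35), (0.58, 0.25),  # road
    (0.20, 0.40), (0.25, 0.45), (0.22, 0.50),  # vegetation
]
train_labels = ["building"] * 3 + ["road"] * 3 + ["vegetation"] * 3

print(knn_predict(train_features, train_labels, (0.77, 0.88)))  # building
print(knn_predict(train_features, train_labels, (0.57, 0.30)))  # road
```

Because the vote depends only on the closest samples, two classes whose feature values overlap (for example, concrete rooftops and concrete roads) will share neighbours, which is the source of the confusion discussed in this section.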
From the perspective of classification accuracy, the machine learning algorithms selected in this study meet the classification requirements for the urban village scenario. However, the current algorithms may have limitations when sample sizes and datasets are small. Specifically, both the Bayes and Decision Tree algorithms exhibited user accuracies below 0.6. As indicated by the confusion matrix, the primary cause of this issue is the misclassification between buildings and roads. The narrow spacing between buildings in urban villages results in the occlusion of roads, and the similarity in appearance between concrete rooftops and concrete roads in visible light imagery affects road classification accuracy. Furthermore, the complex road structures in urban villages pose challenges for classification, as these roads do not have consistent widths like main roads and often lack clear boundaries. The variability of road features (e.g., varying pavement materials and widths) makes accurate classification challenging for algorithms.
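The accuracy measures referred to above can be derived from a confusion matrix as sketched below. The counts are invented for illustration and are not the results of this study; they are chosen so that commission errors from buildings pull the road user's accuracy below 0.6. The sketch assumes that rows hold reference classes and columns hold predicted classes.

```python
def accuracy_report(matrix, classes):
    """Compute overall, user's, and producer's accuracy.

    matrix[i][j] is the number of reference objects of class i that were
    classified as class j (rows = reference, columns = prediction).
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(classes)))
    users, producers = {}, {}
    for i, name in enumerate(classes):
        column_total = sum(row[i] for row in matrix)  # everything predicted as class i
        row_total = sum(matrix[i])                    # everything that truly is class i
        users[name] = matrix[i][i] / column_total if column_total else 0.0
        producers[name] = matrix[i][i] / row_total if row_total else 0.0
    return {"overall": correct / total, "user": users, "producer": producers}


# Hypothetical counts (NOT the study's confusion matrix).
classes = ["building", "road", "vegetation"]
matrix = [
    [40, 10, 0],  # reference building
    [5, 14, 1],   # reference road
    [1, 1, 28],   # reference vegetation
]

report = accuracy_report(matrix, classes)
print(report["overall"])           # 0.82
print(report["user"]["road"])      # 0.56
print(report["producer"]["road"])  # 0.7
```

In this toy matrix, 10 of the 25 objects labelled as road are in fact buildings, so a user relying on the road class is right only 56% of the time even though 70% of the true road objects were found.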
As the reliability of lightweight UAV data is increasingly proven and the application of similar devices becomes more widespread, further improvements in OBIA algorithms will promote the broader adoption of the OBIA + lightweight UAV solution. It is anticipated that deep learning methods will offer significant advantages for image classification in future research. Deep learning outperforms traditional machine learning in feature representation, enabling the automatic extraction of complex image features using multi-layer neural networks. It excels at classifying multi-scale, multi-shaped, and heterogeneous regions in images. With the increasing popularity of drones and the convenience of acquiring VHSR imagery, the integration of deep reinforcement learning into remote sensing image segmentation is expected. This will minimize human intervention and enable fully automated data processing. Additionally, developing evaluation methods for segmentation accuracy across various scenarios will greatly enhance the reliability and feasibility of related research. This is likely to become a key direction for future research in image analysis and processing in drone applications.
6. Conclusions
In this study, the data reliability of lightweight foldable UAVs in complex land-cover scenarios was demonstrated. Specifically, a case study focusing on the extraction of high-density buildings in urban villages was designed. Using the DJI Mavic 3 Enterprise equipped with network RTK, DOM imagery of the study area was collected and processed using OBIA techniques to extract urban village buildings.
The generated images clearly revealed the chaotic surface features typical of urban villages. Even small, fragmented objects, such as water heaters and canopies, were distinctly visible, providing reliable data support for practical application needs. Regarding OBIA, the contours of image objects generated by MRS closely aligned with the actual land features, and the machine learning classification algorithms achieved relatively high accuracy. These results validated the feasibility of combining lightweight UAVs with OBIA techniques.
Considering data collection, data quality, and case results, this UAV not only demonstrated high portability, a simple workflow, and minimal regulatory constraints but also ensured data reliability, making it well-suited for this case study. As a practical reference, this study has the potential to promote the widespread application of similar UAVs across various fields.