Article

Detection of Greenhouse and Typical Rural Buildings with Efficient Weighted YOLOv8 in Hebei Province, China

1 College of Geological Engineering and Geomatics, Chang’an University, Xi’an 710054, China
2 The State Key Laboratory of Loess, Xi’an 710054, China
3 Northwest Engineering Corporation Limited, Power China Group, Xi’an 710065, China
4 The Xi’an Key Laboratory of Clean Energy Digital Technology, Xi’an 710065, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Remote Sens. 2025, 17(11), 1883; https://doi.org/10.3390/rs17111883
Submission received: 8 April 2025 / Revised: 7 May 2025 / Accepted: 22 May 2025 / Published: 28 May 2025

Abstract
The large-scale detection of greenhouses and rural buildings is important for natural resource surveys and farmland protection. However, in rural and mountainous areas, the resolution and accessibility of remote sensing satellite images from a single source are poor, making it difficult to detect greenhouses and rural buildings effectively and automatically. In this paper, a wide-area greenhouse and rural building (GH-RB) detection dataset is constructed as a benchmark by using high-resolution remote sensing images of Hebei Province, China, collected from the 91-satellite-image platform. Then, Efficient Weighted YOLOv8 (EW-YOLOv8) is proposed and trained on this dataset, which contains unbalanced and small samples of greenhouses and rural buildings. The improvements include the following: (1) replacing the traditional up-sampler with DySample in the up-sampling part of the neck of the model to recover the details lost after multiple down-sampling operations; (2) replacing the bounding box regression loss with the NWD loss to compensate for the sensitivity of the IoU to the position deviation of small objects; and (3) introducing a weight function named Slide to resolve the data imbalance between easy and difficult samples. The experimental results show that the proposed method achieves excellent object detection performance on the RSOD dataset compared with state-of-the-art methods, proving the effectiveness of the proposed EW-YOLOv8. The results on the constructed GH-RB dataset show that the proposed method can detect greenhouses and rural buildings quickly and accurately, which could help improve the efficiency of investigating farmland usage and performing natural resource surveys.

1. Introduction

In the urbanization process of human society, artificial constructions gradually replace natural land surfaces such as agricultural land, grassland, and mountainous land, which reduces the available arable land and damages the ecology. Therefore, it is important to obtain site information about greenhouses and other buildings and to create reasonable regulations and land usage plans [1]. However, traditional manual surveys have many limitations; large-scale ground object surveys, in particular, are very time-consuming and labor-intensive. To obtain real-time and accurate information about greenhouses and rural buildings, a more efficient detection method is urgently needed.
Remote sensing can provide high-spatial-resolution, all-weather, and large-scale data and is widely used to identify buildings, but the versatility of a single data source is limited by imaging conditions such as scale, spectral range, and temporal resolution. It is noteworthy that methods based on LiDAR [2,3,4,5,6,7] require 3D lidar data that are not easy to obtain and that methods based on unmanned aerial vehicles (UAVs) [8,9,10,11] are inefficient at acquiring images for large-scale surveys; neither is therefore suitable for large-scale building surveys. However, if high-resolution remote sensing satellite images from multiple data sources are used, the versatility of remote sensing images is greatly improved, and the acquisition of construction site information becomes very efficient [12].
In high-resolution images, the textural features, shape, and size of different greenhouses and buildings vary significantly [13]. For example, the same class of greenhouses from different areas in China can have different textural features and geometric characteristics. The roofs of different buildings in rural environments show different spectral variations, even on the same roof. In addition, some greenhouses and small buildings may be obscured by trees and other objects. All these facts further cause difficulties in building detection in different areas in China.
Visual interpretation is the conventional method of identifying objects from remote sensing images with high accuracy, but it has the following limitations: (1) interpreters are required to have rich knowledge of and experience in object interpretation; (2) it requires substantial human resources yet has low efficiency; (3) interpreters exert subjective influence on the results and are prone to misjudgment; and (4) there are no clear evaluation standards for visual interpretation results and quantitative analysis. Therefore, a large number of automated methods combining satellite imagery and machine learning models have emerged in the past few years [14,15,16], but it is still a challenge to detect buildings in remote sensing images automatically and efficiently.
Traditional target detection algorithms focus on feature extraction and feature classification by using the AdaBoost algorithm framework [17], HOG features [18], decision trees [19], support vector machines [20], etc. In recent years, deep learning has made great progress in the recognition and classification of remote sensing imagery [21], with target detection models being mainly divided into two-stage and one-stage models. In two-stage algorithms, a series of sample candidate frames are generated by the algorithm; then, the samples are detected by convolutional neural networks, such as SPP-NET [22], Faster-RCNN [23], and Mask R-CNN [24]. One-stage algorithms directly regress the class probability and location information of the object through the backbone network, such as the YOLO [25,26,27,28,29,30,31] series and SSD [32].
However, these CNN-based methods often have difficulty in processing complex and dense small targets in remote sensing images, so they cannot be applied directly to obtain building site information. These methods generally show different learning behaviors on simple and difficult samples: simple samples are easy to learn but may lead to overfitting, while difficult samples are learned more slowly and less stably and require more training. The imbalance between simple and difficult samples greatly limits the performance of the model. At the same time, the up-sampling operations in these methods are often too coarse, resulting in the loss of fine-grained information during feature recovery. With the development of algorithms, some advanced up-samplers have been proposed, such as CARAFE [33], DySample [34], FADE [35], and SAPA [36], along with better loss functions, such as SlideLoss [37] and NWD [38]. However, existing research on greenhouse and rural building target detection still has some limitations.
The current research on automatic greenhouse recognition is relatively limited. M. B. Amri et al. [39] used Mask R-CNN to detect greenhouses. However, in the dataset they used, the greenhouse samples were relatively simple and more widely distributed, which limited their practical significance. J. Feng et al. [40] and A. Ma et al. [41] proposed the Dual-Task method, which detects greenhouses better than single-task methods, but it lacks real-time performance and is not suitable for large-scale rapid detection. J. Lin et al. [42] proposed a rapid large-scale greenhouse mapping method using ensemble learning and Google Earth Engine; however, the accuracy of the classifiers they selected still lags behind that of deep learning models. The 1D-CNN classifier proposed by H. Sun et al. [43] also has certain limitations. X. Liu et al. [44] proposed an improved YOLOX, which made progress in accuracy but requires substantial computational resources. Similarly, some existing effective improvements in object detection for rural buildings [45,46,47,48,49] often cause excessive computational stress when computing resources are limited.
The greenhouses and buildings in rural areas of Hebei Province, China, the research area of this study, have typical optical characteristics in remote sensing images. To enable the large-scale and effective detection of these greenhouses and typical rural buildings, this paper first constructs a dataset of rural areas of Hebei Province, China, for training and testing. Then, the Efficient Weighted YOLOv8 (EW-YOLOv8) model is proposed for the detection of the two types of objects. The main contributions of this paper are as follows:
(1) A dataset of greenhouses and rural buildings (GH-RB) in Hebei Province, China, was constructed. It includes greenhouses and typical buildings with multi-sensor, multi-phase, and multi-resolution characteristics. These greenhouses and rural buildings are usually dense and small, so detecting them with high performance is very challenging.
(2) The up-sampler DySample and the proposed loss function NWD-SlideLoss were combined to effectively extract the features of greenhouses and rural buildings while reducing the parameters of the learning model. The lightweight up-sampler DySample keeps the model lightweight while improving its up-sampling ability, and the classification loss and bounding box regression loss are redefined by using the Slide and NWD functions, respectively.
(3) With the improved up-sampler and loss function, Efficient Weighted YOLOv8 (EW-YOLOv8) is proposed. It achieves the detection balance between easy samples and hard samples and reduces the loss of information by using an efficient up-sampler, which can detect small and dense objects well. EW-YOLOv8 has stronger feature learning ability, which is suitable for the efficient detection of large-scale ground objects. At the same time, the lightweight EW-YOLOv8 model can be deployed on devices with limited computing resources.
The rest of this paper is organized as follows: The study area, the greenhouse and rural building dataset, and the RSOD dataset are described in detail in Section 2. Section 3 introduces the EW-YOLOv8 target detection method. Section 4 shows the experimental setup and results. In Section 5, the model and its performance on the greenhouse and rural building datasets are discussed. The conclusion is given in Section 6.

2. Data and Study Areas

2.1. Wide-Area Greenhouses and Rural Building Detection Dataset

The constructed greenhouse and rural building (GH-RB) dataset from Hebei Province, China, has multi-sensor, multi-phase, and multi-resolution properties. Even images of the same area differ in resolution, size, and color because of varying sensors and acquisition times. The data are divided into two categories:
Greenhouses: Greenhouses are closely connected and similar in shape but have different sizes, orientations, spectral emissivities, and roof materials [50].
Buildings: Buildings have different forms and large scale differences, including villas, villages, factories, and tall buildings, with multiple sub-types within the single class [51]. Villas are typically built in suburbs or scenic spots and appear small and inconspicuous in remote sensing images, making them difficult to distinguish.
The objects in this dataset are highly dense. As shown in Figure 1, the rural building category contains a very rich variety of buildings, and most of them are small objects in remote sensing images. It is worth noting that in some rural areas and their surroundings, there are usually small buildings with ultra-high density; Figure 2 shows an enlarged view of such an area from Figure 1. It is difficult to distinguish the boundaries of each building in the remote sensing image, which places high demands on the detection ability of the model and makes automatic detection significantly harder in practice. The dataset consists of 485 images covering two types of objects, rural buildings and greenhouses, as shown in Table 1.
Each sample in the dataset has been calibrated and reviewed by professionals. Because of the large image sizes, individual samples may contain human calibration errors; however, the large number of samples ensures sufficient overall quality.

2.2. Study Area

Hebei Province, which has a large area of farmland and a large number of greenhouses and rural buildings, was chosen as the research area. A total of 1500 sample points were manually selected from the 91-satellite-image platform. An in-depth analysis of building expansion revealed that in some areas, the greenhouses built on farmland were misplaced and did not make effective use of the land. In some mountainous areas, buildings are constructed haphazardly, which has a negative impact on the mountain ecology. The remote sensing images are too large to be directly input into the deep learning model, so they were clipped with an overlap rate of 25% by using a sliding window, as shown in Figure 3.
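The clipping step can be illustrated as follows. This is a minimal sketch assuming OpenCV for image I/O, with function and argument names of our own choosing; the 1024-pixel tile size follows Table 1.

```python
import cv2

def sliding_window_crop(image, tile_size=1024, overlap=0.25):
    """Clip a large remote sensing scene into fixed-size tiles.

    With a 25% overlap the window stride is 768 px. Edge remainders
    smaller than a full tile are skipped in this simplified version.
    """
    h, w = image.shape[:2]
    stride = int(tile_size * (1 - overlap))  # 768 px for 25% overlap
    tiles = []
    for top in range(0, h - tile_size + 1, stride):
        for left in range(0, w - tile_size + 1, stride):
            tiles.append(image[top:top + tile_size, left:left + tile_size])
    return tiles

# Usage (file name is a placeholder): tiles = sliding_window_crop(cv2.imread("scene.tif"))
```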

2.3. Dataset Preprocessing

After confirmation by experts, the quality of the original images was good enough for direct use. However, to prevent the overfitting of deep learning models, we adopted data augmentation: we applied random stretching and rotation to the images (a minimal sketch follows the list below) and also utilized the following strategies.
(1) Multi-temporal remote sensing image data from different regions and seasons were selected to improve the robustness and generalizability of the model.
(2) The same category contains multiple types of samples, so the number of samples of under-represented types was increased within the category: data with few samples were cropped with multiple sliding strides to balance the number of samples across categories.
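The sketch below illustrates the random stretching and rotation step mentioned above. The scale and angle ranges are illustrative assumptions, and in a detection pipeline the bounding-box labels must be transformed with the same parameters (omitted here for brevity).

```python
import cv2
import numpy as np

def random_stretch_rotate(image, rng=None):
    """Random anisotropic stretching and rotation for augmentation."""
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    # Anisotropic stretch: independent x/y scale factors in [0.8, 1.2] (assumed range).
    sx, sy = rng.uniform(0.8, 1.2, size=2)
    image = cv2.resize(image, (int(w * sx), int(h * sy)))
    # Rotation about the image center by a random angle in [-30, 30] degrees (assumed range).
    angle = float(rng.uniform(-30.0, 30.0))
    h2, w2 = image.shape[:2]
    matrix = cv2.getRotationMatrix2D((w2 / 2.0, h2 / 2.0), angle, 1.0)
    return cv2.warpAffine(image, matrix, (w2, h2))
```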

2.4. RSOD Dataset

To validate the proposed approach, we conducted an experiment by using the challenging RSOD [52] dataset in addition to the constructed GH-RB dataset.
RSOD is a remote sensing image dataset released by Wuhan University consisting of 976 images in four categories: aircraft (4399 instances), playgrounds (191), overpasses (180), and oil tanks (1586), as shown in Figure 4. For remote sensing target detection, a target occupying less than 0.12% of the pixels in the entire image is defined as a small target, a target occupying 0.12% to 0.5% as a medium target, and a target occupying more than 0.5% as a large target. In the RSOD dataset, aircraft and oil tanks are usually small and medium targets, while playgrounds and overpasses are larger targets. The samples are diverse in size and were obtained under different lighting conditions, making this dataset challenging and representative.
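These thresholds translate directly into a pixel-share rule; the helper below is a hypothetical illustration. For a 1024 × 1024 image (1,048,576 pixels), 0.12% corresponds to about 1258 pixels, i.e., roughly a 35 × 35 bounding box.

```python
def target_size_class(box_area_px, image_area_px):
    """Classify a target by its share of image pixels, per the thresholds above."""
    ratio = box_area_px / image_area_px
    if ratio < 0.0012:    # below 0.12%: small target
        return "small"
    if ratio <= 0.005:    # 0.12% to 0.5%: medium target
        return "medium"
    return "large"        # above 0.5%: large target
```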

3. Methods

It is a challenging problem to detect greenhouses and rural buildings automatically and effectively using remote sensing images because (1) it is difficult to extract useful features of greenhouses and rural buildings due to the low resolution of optical remote sensing images in rural and mountainous areas, along with information loss caused by rough up-sampling; (2) there is an imbalance between simple and difficult samples because of different lighting, seasons, shading, etc.; and (3) it is easy to miss or incorrectly detect objects because greenhouses and rural buildings appear dense and small in the poor-resolution remote sensing images of rural and mountainous areas. To address these problems, we propose Efficient Weighted YOLOv8 (EW-YOLOv8), which is better suited to remote sensing object detection even when the data contain a large number of imbalanced samples and small objects. Figure 5 shows the architecture of the proposed EW-YOLOv8. First, it introduces an ultralightweight and efficient dynamic up-sampler, DySample [34], to reduce information loss compared with YOLOv8. Second, we propose a new loss function, NWD-SlideLoss, which introduces NWD [38] to compute the bounding box regression loss and the Slide [37] weight function to improve the computation of the classification loss.

3.1. DySample

Before feature fusion, the sizes of the feature maps must be unified, which inevitably requires up-sampling. Traditional up-sampling methods recover feature maps inefficiently and can therefore lose feature map information. To mitigate this loss, we introduce DySample.
Compared with CARAFE, FADE, and SAPA, DySample does not use time-consuming dynamic convolution or additional subnetworks for generating dynamic kernels but instead performs up-sampling by point sampling, which makes DySample simple enough to be used in diverse application scenarios.
The overall operation of DySample is as follows: given an input feature map $\chi_1$ of size $C_1 \times H_1 \times W_1$, the sampling point generator produces a set of sampling points $S$ of size $2 \times H_2 \times W_2$, where 2 is the coordinate dimension. Then, the grid sample function is used to resample $\chi_1$, yielding the up-sampled feature map $\chi_2$, as shown in Figure 5.
The sampling point generator works in two ways, primarily reflected in the different methods of generating offset. First, as shown in Figure 6a, to satisfy the theoretical edge condition between overlapping and non-overlapping, the offset is multiplied by 0.25, and the formula is as follows:
$$\sigma = 0.25 \cdot \mathrm{linear}(\chi_1),$$
where σ indicates the offset and 0.25 is the static range factor. Second, as shown in Figure 6b, a dynamic range with 0.25 as the center and a value range of [0, 0.5] can be obtained by using the sigmoid function and a static factor of 0.5. The formula is as follows:
$$\sigma = 0.5 \cdot \mathrm{sigmoid}(\mathrm{linear}_1(\chi_1)) \cdot \mathrm{linear}_2(\chi_1)$$
In addition, as shown in Figure 5, the sampling point S is denoted by
$$S = \varsigma + \sigma$$
where ς is the original sampling grid.
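Under these definitions, the static-range branch can be sketched in PyTorch as follows. This is a simplified illustration of the equations above built on F.grid_sample; the layer shapes, the 1 × 1 convolution standing in for the linear layer, and the offset normalization are our assumptions and differ from the official DySample implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DySampleSketch(nn.Module):
    """Static-range point sampler: sigma = 0.25 * linear(x), S = grid + sigma."""

    def __init__(self, channels, scale=2):
        super().__init__()
        self.scale = scale
        # One 2D offset per up-sampled position (2 * scale**2 channels).
        self.offset = nn.Conv2d(channels, 2 * scale ** 2, kernel_size=1)

    def forward(self, x):
        b, _, h, w = x.shape
        s = self.scale
        # sigma = 0.25 * linear(x); pixel_shuffle rearranges the offsets
        # from (B, 2*s*s, H, W) to (B, 2, s*H, s*W).
        sigma = 0.25 * F.pixel_shuffle(self.offset(x), s)
        # Original sampling grid in normalized [-1, 1] coordinates.
        ys = torch.linspace(-1, 1, s * h, device=x.device)
        xs = torch.linspace(-1, 1, s * w, device=x.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        grid = torch.stack((gx, gy), dim=-1).expand(b, -1, -1, -1)
        # S = grid + sigma, converting pixel offsets to normalized units.
        norm = torch.tensor([w, h], dtype=x.dtype, device=x.device)
        grid = grid + sigma.permute(0, 2, 3, 1) * 2 / norm
        # Resample the input at the dynamic points to get the up-sampled map.
        return F.grid_sample(x, grid, align_corners=True)

# Usage: up = DySampleSketch(channels=256); y = up(torch.randn(1, 256, 20, 20))
```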

3.2. Normalized Wasserstein Distance

For small objects, the IoU is overly sensitive: small positional deviations greatly reduce the IoU, resulting in incorrect label assignments and similar features between positive and negative samples, which makes it difficult for the network to converge. Unlike other advanced IoU variants [53,54,55], NWD is not sensitive to objects of different scales and can measure the similarity between tiny objects.
NWD uses the Wasserstein distance to measure the similarity of bounding boxes in two steps: first, each bounding box is modeled as a two-dimensional Gaussian distribution; then, the Normalized Wasserstein Distance (NWD) is used to measure the similarity of the derived Gaussian distributions.
Specifically, given two 2D Gaussian distributions $\mu_1 = \mathcal{N}(m_1, \Sigma_1)$ and $\mu_2 = \mathcal{N}(m_2, \Sigma_2)$, the second-order Wasserstein distance between them is defined as
$$W_2^2(\mu_1, \mu_2) = \lVert m_1 - m_2 \rVert_2^2 + \left\lVert \Sigma_1^{1/2} - \Sigma_2^{1/2} \right\rVert_F^2$$
where $\lVert \cdot \rVert_F$ is the Frobenius norm.
To convert this distance into a similarity measure between 0 and 1, the exponentially normalized NWD is defined as
$$\mathrm{NWD}(\mathcal{N}_1, \mathcal{N}_2) = \exp\left(-\frac{\sqrt{W_2^2(\mathcal{N}_1, \mathcal{N}_2)}}{C}\right)$$
where $C$ is a constant associated with the dataset, usually set to the average absolute size of the targets in the detection dataset.
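As a concrete illustration, the two equations above can be computed for axis-aligned boxes as below. Following the original NWD formulation, each box (cx, cy, w, h) is modeled as a Gaussian with mean (cx, cy) and covariance diag(w²/4, h²/4); the function name and the placeholder constant are ours.

```python
import torch

def nwd(boxes1, boxes2, c=32.0):
    """NWD between (cx, cy, w, h) boxes of shape (N, 4); c is a placeholder."""
    m1, m2 = boxes1[:, :2], boxes2[:, :2]
    s1, s2 = boxes1[:, 2:] / 2.0, boxes2[:, 2:] / 2.0   # Sigma^(1/2) diagonals
    # W_2^2 = ||m1 - m2||_2^2 + ||Sigma1^(1/2) - Sigma2^(1/2)||_F^2
    w2 = ((m1 - m2) ** 2).sum(dim=1) + ((s1 - s2) ** 2).sum(dim=1)
    return torch.exp(-torch.sqrt(w2) / c)
```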

3.3. SlideLoss

In an optical image dataset, there are different types of objects within the same category due to varying light conditions, seasons, and shadows. Different types of objects present varying levels of learning difficulty for deep learning, which means the samples are imbalanced for learning. Therefore, the model may not learn enough about individual types of objects, leading to missed detections.
Sample imbalance is a common problem in remote sensing images, which we address by introducing SlideLoss. First, SlideLoss distinguishes easy and difficult samples by the IoU between the predicted and ground-truth boxes. To reduce the number of hyperparameters, the average IoU value of all boxes is used as the threshold µ: samples with an IoU below µ are regarded as negative samples, and samples above µ as positive samples. However, samples close to the boundary often suffer larger losses due to unclear classification, so higher weights are given to these difficult samples. After the samples are divided into positive and negative based on µ, samples near the boundary are emphasized with a weighting function, Slide, which is denoted as
$$f(x) = \begin{cases} 1, & x \le \mu - 0.1 \\ e^{1-\mu}, & \mu - 0.1 < x < \mu \\ e^{1-x}, & x \ge \mu \end{cases}$$
where µ is the positive–negative sample threshold parameter.
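A direct transcription of this piecewise function in PyTorch might look as follows; this is a hypothetical sketch for illustration, not the authors' implementation.

```python
import math
import torch

def slide_weight(iou, mu):
    """Slide weighting: 1 for iou <= mu - 0.1, a constant boost e^(1-mu)
    near the boundary, and e^(1-x) for iou >= mu."""
    w = torch.ones_like(iou)
    boundary = (iou > mu - 0.1) & (iou < mu)
    w = torch.where(boundary, torch.full_like(iou, math.exp(1.0 - mu)), w)
    w = torch.where(iou >= mu, torch.exp(1.0 - iou), w)
    return w

# Usage: weights = slide_weight(pred_ious, mu=pred_ious.mean().item())
```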

3.4. EW-YOLOv8

EW-YOLOv8 retains the original architecture of the YOLOv8 model and improves the loss function and the up-sampling module to make the model more efficient.
We propose a loss function denoted as NWD-SlideLoss, which introduces the NWD loss and the weight function Slide simultaneously. The NWD loss reduces the sensitivity of the model to the position deviation of small objects and makes the predicted boxes more consistent with the ground-truth boxes. The weight function Slide resolves the imbalance between simple and difficult samples.
It is important to note that the roles of the NWD loss and weight function Slide in the loss function are not isolated. When the weight function Slide defines the positive–negative sample threshold parameter µ, it is necessary to compute the average IoU value of all boxes, and the boundary box regression loss improved by NWD can provide more reasonable IoU values, which enable the weight function Slide to identify simple and difficult samples more easily and achieve more reasonable weight allocation.
Due to the limited feature information of small objects and difficult samples, it is necessary to improve the efficiency of up-sampling to avoid information loss in the up-sampling process. DySample, an up-sampling method based on point sampling, is effective and does not result in significant information loss.
Therefore, EW-YOLOv8 improves the model by refining the loss function and increasing the efficiency of up-sampling, strengthening the ability of the learning model to detect small objects and other difficult samples.

4. Experimental Results

In this section, we describe the experimental setup in Section 4.1. Then, the evaluation indices of the experiment are described in Section 4.2. Finally, we verify the universality of application and overall performance of the proposed EW-YOLOv8 in Section 4.3 and Section 4.4, respectively.

4.1. Experimental Setting

The wide-area building dataset was divided into training, validation, and test sets in the ratio of 7:2:1. In the RSOD dataset, we used 75% of the images for training, and the remaining 25% were used for testing. In addition, the RSOD dataset was enhanced by performing horizontal flipping, scaling, and random cropping.
The experimental software environment used Python 3.10, Torch 2.4.1, and other related toolkits. All experiments were conducted on a 12th Gen Intel(R) Core(TM) i7-12700H 2.30 GHz CPU with 32 GB of memory. We used stochastic gradient descent (SGD) to optimize the parameters of the model, with an initial learning rate of 0.01. The batch size was 1, and the number of epochs was set to 200 for both datasets. We mainly tuned the IoU and conf hyperparameters, as they are related to the computation of the loss function; after extensive parameter tuning experiments, the final IoU and conf values were 0.5 and 0.0, respectively. We used the YOLOv8n pre-trained weights provided by Ultralytics for fine-tuning; YOLOv8n was chosen as the network structure to verify that the model is efficient enough to achieve good performance with few parameters. In the comparative experiments, we compared our method with YOLO series models of similar size, namely YOLOv9t, YOLOv10n, YOLOv11n, and YOLOv12n.
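For reference, the baseline fine-tuning configuration described above can be reproduced with the Ultralytics API roughly as follows; the dataset configuration file name is a placeholder, and the DySample and NWD-SlideLoss modifications are custom changes that are not part of the stock API.

```python
from ultralytics import YOLO

# Fine-tune from the official YOLOv8n weights with the settings reported above.
model = YOLO("yolov8n.pt")
model.train(
    data="gh_rb.yaml",   # hypothetical dataset configuration file
    epochs=200,
    batch=1,
    optimizer="SGD",
    lr0=0.01,            # initial learning rate for SGD
)
metrics = model.val(iou=0.5)  # evaluate with the tuned IoU threshold
```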

4.2. Experimental Indices

The wide-area greenhouse and rural building detection dataset includes objects with different morphologies, spectra, and diverse scales, which makes it difficult for the model to learn sufficient features for each type.
In this paper, average precision (AP), mean average precision (mAP), and P-R (precision–recall) curves were used to evaluate the performance of the model. The calculation formulas of precision and recall are as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
where TP is the number of correctly detected objects, FP is the number of erroneously detected objects, and FN is the number of missed objects. The AP value is the area enclosed by the P-R curve, whose calculation formula is as follows:
$$AP = \int_0^1 P \, \mathrm{d}R$$
where P is precision and R is recall.
mAP is the mean of the average precision (AP) values over all categories. We report mAP@50 and mAP@50-95, which denote the mAP calculated at an IoU threshold of 0.5 and the mAP averaged over IoU thresholds from 0.5 to 0.95 (step size 0.05), respectively. The calculation formula of mAP is as follows:
$$mAP = \frac{1}{M} \sum_{i=1}^{M} AP_i$$
where $M$ is the number of classes and $AP_i$ is the AP of the $i$-th class.
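Numerically, AP can be approximated from a sampled P-R curve by all-point interpolation, as in the generic sketch below; this illustrates the definitions above rather than the exact evaluator used in our experiments.

```python
import numpy as np

def average_precision(precision, recall):
    """Area under the P-R curve via all-point interpolation.

    precision/recall: arrays sorted by increasing recall, produced by
    sweeping the detection confidence threshold.
    """
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([1.0], precision, [0.0]))
    # Make precision monotonically non-increasing (the interpolated envelope).
    p = np.maximum.accumulate(p[::-1])[::-1]
    # AP = integral of P over R, here a finite sum over recall steps.
    return float(np.sum(np.diff(r) * p[1:]))

def mean_average_precision(ap_per_class):
    """mAP = mean of per-class AP values."""
    return float(np.mean(ap_per_class))
```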

4.3. Validation of EW-YOLOv8 on RSOD Dataset

To validate the performance of EW-YOLOv8, we compared our method with recent methods on the RSOD dataset, and the results are shown in Table 2. The proposed EW-YOLOv8 achieved the highest mAP of 98.6%. For specific categories, other methods performed noticeably worse than ours in detecting small targets, such as aircraft (YOLOv8: 96.8%; SOD-YOLOv10: 97.0%) and oil tanks (YOLOv8: 98.1%; SCDFMixer: 98.1%); in contrast, our method achieved the highest APs of 97.9% and 99.5%, respectively. In addition, for the playground category, our AP reached 99.5%, only 0.5% lower than that of CF2PN, whereas the overall mAP of CF2PN was 5.8% lower than ours. Our method also performed well in detecting overpasses, with an AP 3.5% higher than that of CANet and significantly higher than those of other models. Due to the large number of occlusions and the small number of overpasses in remote sensing images, overpasses are difficult samples in this dataset, indicating that our model can detect difficult samples well.
Figure 7 visualizes the test results of our method on the RSOD dataset. As shown in the first two rows, the aircraft and oil tanks at the edges of the image or in the distance are accurately identified. As shown in the third row, there are trees around the overpass, and its background is similar to the target; however, the model can still detect it successfully. In addition, as shown in the fourth row, the model performs well on playground samples without error detection, which demonstrates the robustness of our method.

4.4. Accuracy Analysis of Detection Results Based on EW-YOLOv8

The YOLO series models are lightweight and fast, making them suitable for building detection when computing resources are limited. Based on the GH-RB dataset, we further validated the performance of the EW-YOLOv8 model and evaluated the effectiveness of each newly added module. Table 3 shows the experimental results of the latest YOLO models on the wide-area greenhouse and rural building detection dataset. The proposed EW-YOLOv8 achieved the best results. Specifically, introducing the NWD loss alone improved the average precision for greenhouses by 0.9%, and introducing NWD-SlideLoss further strengthened the average precision for greenhouses while improving the detection ability of the model for all categories. The introduced DySample greatly improved the average precision and the overall detection performance across all categories. Compared with the YOLOv8 method, the AP (rural buildings), AP (greenhouses), and mAP@50-95 indices increased by 0.4%, 1.3%, and 1.1%, respectively. It is worth noting that the latest release, YOLOv12, achieved the best greenhouse result, reaching 83.3%. However, in terms of overall performance, YOLOv12 was closer to the traditional YOLOv8 model and inferior to the proposed EW-YOLOv8: in terms of mAP@50 and mAP@50-95, EW-YOLOv8 outperformed YOLOv12 by 0.6% and 0.9%, respectively.
As shown in Figure 8a,b, YOLOv8 is prone to omissions and false detections when the optical features of targets are similar. Figure 8c shows that the performance of YOLOv8 is severely affected when the object is obstructed by trees or other ground objects, and Figure 8d shows its limitations in detecting dense and small objects. EW-YOLOv8 addresses these issues in a targeted manner. In summary, the adopted up-sampler and the improved loss function make EW-YOLOv8 fully competitive overall.

5. Discussion

The GH-RB dataset is multi-source, multi-temporal, and high-resolution and presents rich attribute information, which interferes with the convergence of the learning model during training. This is related to our selection of images with varying reflectivity at different times when building the dataset; this construction method accurately reflects the optical characteristics of land cover in the research area throughout the year. As shown in Figure 9, images of the same area have different optical characteristics due to variations in time and reflectivity; therefore, the EW-YOLOv8 detection results for such images often differ. The method proposed on this dataset can not only detect dense, small targets but also learn more complex optical features. EW-YOLOv8 thus breaks through the single-time-period limitation of traditional methods and has broader application potential, as verified on the RSOD dataset.
As shown in Table 4, the proposed EW-YOLOv8 improves model efficiency by substantially reducing the numbers of parameters and gradients while maintaining accuracy. This indicates that, although EW-YOLOv8 is lightweight, it can improve its processing ability for computation-intensive tasks, reduce the risk of overfitting, and enhance the generalization ability of the model, making it easier to deploy on resource-constrained devices. Specifically, the parameters and gradients of the original YOLOv8 are 2.94 times those of EW-YOLOv8, while the GFLOPs of EW-YOLOv8 are 1.36 times those of YOLOv8. EW-YOLOv8 also outperforms the newer YOLO versions in this respect.
As shown in Figure 10, we present the precision–recall curves of our model during training on the GH-RB and RSOD datasets. Unlike Figure 10b, Figure 10a shows a sharp point on the curve representing buildings. This means that when the recall metric reaches around 50%, the accuracy of model detection for small buildings is significantly affected. In future work, we will further refine the precise positioning of small targets.

6. Conclusions

This paper aims to overcome several challenges in greenhouse detection, such as difficulties in obtaining image data, low resolution, and dense and small targets. To tackle these issues, we first construct a high-resolution dataset named GH-RB by using images from Hebei Province, China, which serves as a benchmark for the wide-area detection of greenhouses and rural buildings. Then, we propose the Efficient Weighted YOLOv8 (EW-YOLOv8) model, tailored for this task and focused on handling the imbalance in sample difficulty and the small sample sizes of greenhouses and rural buildings. The key improvements in EW-YOLOv8 include (1) replacing the traditional up-sampler with DySample and (2) effectively combining the Slide weight function and the NWD loss. The experimental results show that the proposed EW-YOLOv8 achieves excellent detection performance on the RSOD dataset. Furthermore, the proposed EW-YOLOv8 method was applied to the constructed GH-RB dataset, and the results demonstrated its effectiveness and efficiency in detecting dense and small greenhouses and rural buildings over wide areas in remote sensing images. Due to its lightweight property, the proposed method can be implemented on multiple advanced remote sensing platforms, enabling the large-scale real-time monitoring of changes in greenhouses and buildings.

Author Contributions

Conceptualization, B.W. and J.X.; methodology, B.W. and Z.L.; validation, B.W. and H.S.; formal analysis, M.C.; investigation, B.W.; resources, J.X.; data curation, B.W.; writing—original draft preparation, B.W. and S.G.; writing—review and editing, Z.L. and J.X.; visualization, B.W.; supervision, J.X. and Z.L.; project administration, J.X.; funding acquisition, J.X. and H.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Key R&D Program of China (2023YFC3008300, 2023YFC3008304, and 2022YFC3004302), in part by the Major Program of National Natural Science Foundation of China (41941019), in part by National Natural Science Foundation of China (42371356, 42171348, and 41929001), in part by Shaanxi Province Science and Technology Innovation Team (2021TD-51) and Shaanxi Province Geoscience Big Data and Geohazard Prevention Innovation Team (2022), in part by Fundamental Research Funds for the Central Universities (300102262202, 300102260301/087, 300102264915, 300102260404/087, 300102262902, 300102269103, 300102269304, 300102269205, and 300102262712), in part by Northwest Engineering Corporation Limited Major Science and Technology Projects (XBY-YBKJ-2023-23), and in part by The Key Scientific and Technological Project of Power China Corporation (KJ-2023-022).

Data Availability Statement

The original data presented in the study are openly available in the RSOD dataset at https://github.com/RSIA-LIESMARS-WHU/RSOD-Dataset- (accessed on 7 April 2025). The traditional YOLO series models and corresponding pre-training results of Ultralytics can be obtained from https://github.com/ultralytics/ultralytics (accessed on 7 April 2025).

Conflicts of Interest

Author Haixing Shang was employed by the company Northwest Engineering Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Choi, H.O. An evolutionary approach to technology innovation of cadastre for smart land management policy. Land 2020, 9, 50. [Google Scholar] [CrossRef]
  2. Ji, Y.; Wu, W.; Wan, G.; Zhao, Y.; Wang, W.; Yin, H.; Tian, Z.; Liu, S. Segment Anything Model-Based Building Footprint Extraction for Residential Complex Spatial Assessment Using LiDAR Data and Very High-Resolution Imagery. Remote Sens. 2024, 16, 2661. [Google Scholar] [CrossRef]
  3. Nahhas, F.H.; Shafri, H.Z.; Sameen, M.I.; Pradhan, B.; Mansor, S. Deep learning approach for building detection using lidar–orthophoto fusion. J. Sens. 2018, 2018, 7212307. [Google Scholar] [CrossRef]
  4. Hermosilla, T.; Ruiz, L.A.; Recio, J.A.; Estornell, J. Evaluation of automatic building detection approaches combining high resolution images and LiDAR data. Remote Sens. 2011, 3, 1188–1210. [Google Scholar] [CrossRef]
  5. Wierzbicki, D.; Matuk, O.; Bielecka, E. Polish cadastre modernization with remotely extracted buildings from high-resolution aerial orthoimagery and airborne LiDAR. Remote Sens. 2021, 13, 611. [Google Scholar] [CrossRef]
  6. Sun, Y.; Fu, Z.; Sun, C.; Hu, Y.; Zhang, S. Deep multimodal fusion network for semantic segmentation using remote sensing image and LiDAR data. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5404418. [Google Scholar] [CrossRef]
  7. Liu, Z.; Gao, X.; Yang, Y.; Xu, L.; Wang, S.; Chen, N.; Wang, Z.; Kou, Y. EDT-Net: A Lightweight Tunnel Water Leakage Detection Network Based on LiDAR Point Clouds Intensity Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 7334–7346. [Google Scholar] [CrossRef]
  8. Zheheng, L.; Peng, D.; Fuquan, J.; Sen, S.H.; Rulan, W.E.; Gangsheng, X.I. The application of illegal building detection from VHR UAV remote sensing images based on convolutional neural network. Bull. Surv. Mapp. 2021, 4, 111. [Google Scholar]
  9. Shi, L.; Zhang, Q.; Pan, B.; Zhang, J.; Su, Y. Global-local and occlusion awareness network for object tracking in UAVs. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 8834–8844. [Google Scholar] [CrossRef]
  10. Hou, W.; Wu, H.; Wu, D.; Shen, Y.; Liu, Z.; Zhang, L.; Li, J. Small Object Detection Method for UAV Remote Sensing Images Based on αS-YOLO. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 8984–8994. [Google Scholar] [CrossRef]
  11. Niu, K.; Wang, C.; Xu, J.; Liang, J.; Zhou, X.; Wen, K.; Lu, M.; Yang, C. Early Forest Fire Detection with UAV Image Fusion: A Novel Deep Learning Method Using Visible and Infrared Sensors. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 6617–6629. [Google Scholar] [CrossRef]
  12. Huang, X.; Chen, K.; Tang, D.; Liu, C.; Ren, L.; Sun, Z.; Hänsch, R.; Schmitt, M.; Sun, X.; Huang, H.; et al. Urban building classification (UBC) V2—A benchmark for global building detection and fine-grained classification from satellite imagery. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5620116. [Google Scholar] [CrossRef]
  13. Konstantinidis, D.; Stathaki, T.; Argyriou, V.; Grammalidis, N. Building detection using enhanced HOG–LBP features and region refinement processes. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 10, 888–905. [Google Scholar] [CrossRef]
  14. Chen, Y.; He, C.; Guo, W.; Zheng, S.; Wu, B. Mapping urban functional areas using multisource remote sensing images and open big data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 7919–7931. [Google Scholar] [CrossRef]
  15. Ball, J.E.; Anderson, D.T.; Chan, C.S. Comprehensive survey of deep learning in remote sensing: Theories, tools, and challenges for the community. J. Appl. Remote Sens. 2017, 11, 042609. [Google Scholar] [CrossRef]
  16. Zhu, X.X.; Tuia, D.; Mou, L.; Xia, G.S.; Zhang, L.; Xu, F.; Fraundorfer, F. Deep learning in remote sensing: A comprehensive review and list of resources. IEEE Geosci. Remote Sens. Mag. 2017, 5, 8–36. [Google Scholar] [CrossRef]
  17. Freund, Y.; Schapire, R.E. Experiments with a new boosting algorithm. In Proceedings of the ICML’96: Proceedings of the Thirteenth International Conference on International Conference on Machine Learning, Bari, Italy, 3–6 July 1996; Volume 96, pp. 148–156. [Google Scholar]
  18. Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 886–893. [Google Scholar]
  19. Tsangaratos, P.; Ilia, I. Landslide susceptibility mapping using a modified decision tree classifier in the Xanthi Perfection, Greece. Landslides 2016, 13, 305–320. [Google Scholar] [CrossRef]
  20. Vladimir, V.N.; Vapnik, V. The Nature of Statistical Learning Theory; Springer: Berlin/Heidelberg, Germany, 1995. [Google Scholar]
  21. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
  22. Jeon, J.; Jeong, B.; Baek, S.; Jeong, Y.S. Hybrid malware detection based on Bi-LSTM and SPP-Net for smart IoT. IEEE Trans. Ind. Inform. 2021, 18, 4830–4837. [Google Scholar] [CrossRef]
  23. Jiang, D.; Li, G.; Tan, C.; Huang, L.; Sun, Y.; Kong, J. Semantic segmentation for multiscale target based on object recognition using the improved Faster-RCNN model. Future Gener. Comput. Syst. 2021, 123, 94–104. [Google Scholar] [CrossRef]
  24. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
  25. Fang, W.; Wang, L.; Ren, P. Tinier-YOLO: A real-time object detection method for constrained environments. IEEE Access 2019, 8, 1935–1944. [Google Scholar] [CrossRef]
  26. Huang, R.; Pedoeem, J.; Chen, C. YOLO-LITE: A real-time object detection algorithm optimized for non-GPU computers. In Proceedings of the 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 10–13 December 2018; pp. 2503–2510. [Google Scholar]
  27. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. Yolox: Exceeding yolo series in 2021. arXiv 2021, arXiv:2107.08430. [Google Scholar]
  28. Lin, Y.; Li, J.; Shen, S.; Wang, H.; Zhou, H. GDRS-YOLO: More efficient multiscale features fusion object detector for remote sensing images. IEEE Geosci. Remote Sens. Lett. 2024, 21, 6008505. [Google Scholar] [CrossRef]
  29. Zhang, H.; Xu, D.; Cheng, D.; Meng, X.; Xu, G.; Liu, W.; Wang, T. An improved lightweight yolo-fastest V2 for engineering vehicle recognition fusing location enhancement and adaptive label assignment. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 2450–2461. [Google Scholar] [CrossRef]
  30. Zhao, T.; Feng, R.; Wang, L. SCENE-YOLO: A One-stage Remote Sensing Object Detection Network with Scene Supervision. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5401515. [Google Scholar] [CrossRef]
  31. Meng, S.; Shi, Z.; Pirasteh, S.; Ullo, S.L.; Peng, M.; Zhou, C.; Gonçalves, W.N.; Zhang, L. TLSTMF-YOLO: Transfer Learning and Feature Fusion Network for Earthquake-Induced Landslide Detection in Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5610712. [Google Scholar] [CrossRef]
  32. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference 2016, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I 14. Springer International Publishing: Berlin/Heidelberg, Germany, 2016; pp. 21–37. [Google Scholar]
  33. Wang, J.; Chen, K.; Xu, R.; Liu, Z.; Loy, C.C.; Lin, D. Carafe: Content-aware reassembly of features. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3007–3016. [Google Scholar]
  34. Liu, W.; Lu, H.; Fu, H.; Cao, Z. Learning to upsample by learning to sample. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 6027–6037. [Google Scholar]
  35. Lu, H.; Liu, W.; Fu, H.; Cao, Z. FADE: Fusing the assets of decoder and encoder for task-agnostic upsampling. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer Nature: Cham, Switzerland; pp. 231–247. [Google Scholar]
  36. Lu, H.; Liu, W.; Ye, Z.; Fu, H.; Liu, Y.; Cao, Z. SAPA: Similarity-aware point affiliation for feature upsampling. Adv. Neural Inf. Process. Syst. 2022, 35, 20889–20901. [Google Scholar]
  37. Yu, Z.; Huang, H.; Chen, W.; Su, Y.; Liu, Y.; Wang, X. Yolo-facev2: A scale and occlusion aware face detector. Pattern Recognit. 2024, 155, 110714. [Google Scholar] [CrossRef]
  38. Wang, J.; Xu, C.; Yang, W.; Yu, L. A normalized Gaussian Wasserstein distance for tiny object detection. arXiv 2021, arXiv:2110.13389. [Google Scholar]
  39. Amri, M.B.; Larabi, M.E.A.; Bakhti, K.; Meroufel, H. Plastic Greenhouses Detection from Alsat-2A Satellite Data Using Mask R-CNN. In Proceedings of the 2022 IEEE Mediterranean and Middle-East Geoscience and Remote Sensing Symposium (M2GARSS), Istanbul, Turkey, 7–9 March 2022; pp. 82–85. [Google Scholar]
  40. Feng, J.; Wang, D.; Yang, F.; Huang, J.; Wang, M.; Tao, M.; Chen, W. PODD: A dual-task detection for greenhouse extraction based on deep learning. Remote Sens. 2022, 14, 5064. [Google Scholar] [CrossRef]
  41. Ma, A.; Chen, D.; Zhong, Y.; Zheng, Z.; Zhang, L. National-scale greenhouse mapping for high spatial resolution remote sensing imagery using a dense object dual-task deep learning framework: A case study of China. ISPRS J. Photogramm. Remote Sens. 2021, 181, 279–294. [Google Scholar] [CrossRef]
  42. Lin, J.; Jin, X.; Ren, J.; Liu, J.; Liang, X.; Zhou, Y. Rapid mapping of large-scale greenhouse based on integrated learning algorithm and Google Earth Engine. Remote Sens. 2021, 13, 1245. [Google Scholar] [CrossRef]
  43. Sun, H.; Wang, L.; Lin, R.; Zhang, Z.; Zhang, B. Mapping plastic greenhouses with two-temporal sentinel-2 images and 1d-cnn deep learning. Remote Sens. 2021, 13, 2820. [Google Scholar] [CrossRef]
  44. Liu, X.; Xiao, B.; Jiao, J.; Hong, R.; Li, Y.; Liu, P. Remote sensing detection and mapping of plastic greenhouses based on YOLOX+: A case study in Weifang, China. Comput. Electron. Agric. 2024, 218, 108702. [Google Scholar] [CrossRef]
  45. Liu, J.; Huang, H.; Sun, H.; Wu, Z.; Luo, R. LRAD-Net: An improved lightweight network for building extraction from remote sensing images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 16, 675–687. [Google Scholar] [CrossRef]
  46. Chen, X.; Xiao, P.; Zhang, X.; Muhtar, D.; Wang, L. A cascaded network with coupled high-low frequency features for building extraction. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 10390–10406. [Google Scholar] [CrossRef]
  47. Wang, Q.; Zhang, M.; Ren, J.; Li, Q. Exploring Context Alignment and Structure Perception for Building Change Detection. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5609910. [Google Scholar] [CrossRef]
  48. Zhang, H.; Zhang, Y.; Wang, D.; Ma, G. Damaged Building Object Detection From Bi-temporal Remote Sensing Imagery: A Cross-task Integration Network and Five Datasets. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5648827. [Google Scholar] [CrossRef]
  49. Gao, W.; Sun, Y.; Han, X.; Zhang, Y.; Zhang, L.; Hu, Y. AMIO-Net: An attention-based multiscale input–output network for building change detection in high-resolution remote sensing images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 2079–2093. [Google Scholar] [CrossRef]
  50. Liu, L.; Bai, Y.; Li, Y. Locality-aware rotated ship detection in high-resolution remote sensing imagery based on multiscale convolutional network. IEEE Geosci. Remote Sens. Lett. 2021, 19, 3502805. [Google Scholar] [CrossRef]
  51. Luo, M.; Ji, S.; Wei, S. A diverse large-scale building dataset and a novel plug-and-play domain generalization method for building extraction. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 4122–4138. [Google Scholar] [CrossRef]
  52. Long, Y.; Gong, Y.; Xiao, Z.; Liu, Q. Accurate object localization in remote sensing images based on convolutional neural networks. IEEE Trans. Geosci. Remote Sens. 2017, 55, 2486–2498. [Google Scholar] [CrossRef]
  53. Zhang, Y.; Li, H.; Wang, R.; Zhang, M.; Hu, X. Constrained-SIoU: A metric for horizontal candidates in multi-oriented object detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 15, 956–967. [Google Scholar] [CrossRef]
  54. Zhang, S.; Li, C.; Jia, Z.; Liu, L.; Zhang, Z.; Wang, L. Diag-IoU loss for object detection. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 7671–7683. [Google Scholar] [CrossRef]
  55. Ni, X.; Ma, Z.; Liu, J.; Shi, B.; Liu, H. Attention network for rail surface defect detection via consistency of intersection-over-union (IoU)-guided center-point estimation. IEEE Trans. Ind. Inform. 2021, 18, 1694–1705. [Google Scholar] [CrossRef]
  56. Liang, B.; Su, J.; Feng, K.; Liu, Y.; Zhang, D.; Hou, W. SCDFMixer: Spatial–channel dual-frequency mixer based on satellite optical sensors for remote sensing multiobject detection. IEEE Sens. J. 2024, 24, 5383–5398. [Google Scholar] [CrossRef]
  57. Sun, H.; Yao, G.; Zhu, S.; Zhang, L.; Xu, H.; Kong, J. SOD-YOLOv10: Small Object Detection in Remote Sensing Images Based on YOLOv10. IEEE Geosci. Remote Sens. Lett. 2025, 22, 8000705. [Google Scholar] [CrossRef]
  58. Zhao, B.; Qin, Z.; Wu, Y.; Song, Y.; Yu, H.; Gao, L. A Fast Target Detection Model for Remote Sensing Images Leveraging Roofline Analysis on Edge Computing Devices. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 19343–19360. [Google Scholar] [CrossRef]
  59. Yang, X.; Zhang, S.; Duan, S.; Yang, W. An effective and lightweight hybrid network for object detection in remote sensing images. IEEE Trans. Geosci. Remote Sens. 2023, 62, 5600711. [Google Scholar] [CrossRef]
  60. Yin, H.; Wang, H.; Zhu, Z. Progressive Dynamic Queries Reformation Based DETR for Remote Sensing Object Detection. IEEE Geosci. Remote Sens. Lett. 2025, 22, 6003705. [Google Scholar] [CrossRef]
Figure 1. Samples of greenhouses and rural buildings in the constructed dataset. (a,b) The optical and distribution characteristics of greenhouses in remote sensing images. (c,d) A scene where greenhouses and buildings appear simultaneously in remote sensing images. (e–h) The optical and distribution characteristics of some buildings. The target boxes for greenhouses are shown in red, and the target boxes for typical buildings are shown in green.
Figure 2. Sample of high-density buildings in rural areas and their surrounding areas.
Figure 3. Sliding clipping of remote sensing image. Yellow square represents sliding window, and white arrow represents main direction of window sliding.
Figure 4. Samples from RSOD dataset.
Figure 5. The architecture of the proposed EW-YOLOv8 model. The up-sampling part is replaced with the lightweight up-sampler DySample, and the loss function is replaced with NWD-SlideLoss. In NWD-SlideLoss, NWD is used to improve the regression performance of boundary box loss, and the weight function Slide is proposed to optimize the classification loss. The EW-YOLOv8 model in the detection stage is consistent with the traditional YOLOv8 model. χ1 and χ2 represent input features and up-sampling features, respectively.
Figure 6. Sampling point generator in DySample. (a,b) represent two different sampling point generators, respectively.
Figure 7. The detection results of EW-YOLOv8 on the RSOD dataset. (a–d) The aircraft category, (e–h) the oil tank category, (i–l) the overpass category, and (m–p) the playground category.
Figure 8. Comparison of YOLOv8 and EW-YOLOv8 detection results of greenhouses and rural buildings in our dataset. (A) Original image, (B) YOLOv8 detection result, (C) EW-YOLOv8 detection result, and (D) ground truth. (a–d) Each column represents the same image. Greenhouses are represented by blue frames and buildings by yellow frames.
Figure 9. Detection results of images with different colors and textures. (a,b) show the same area, (c,d) show the same area, and (a–d) are the detection results of the EW-YOLOv8 model. Greenhouses are represented by green frames and buildings by red frames.
Figure 10. Precision–recall curves. (a) GH-RB dataset; (b) RSOD dataset.
Table 1. Dataset partition.

| Category | Data Source | Data Resolution | Actual Resolution | Number of Images | Image Size | Number of Objects |
|---|---|---|---|---|---|---|
| Building | Google map | 1 m | >1 m | 458 | 1024 × 1024 | About 68,130 |
| Greenhouse | Google map | 1 m | >1 m | 170 | 1024 × 1024 | About 32,080 |
Table 2. Performance of different detection models on RSOD dataset.

| Method | Aircraft | Oil Tank | Playground | Overpass | mAP |
|---|---|---|---|---|---|
| Faster R-CNN [56] | 81.3 | 96.7 | 95.4 | 85.4 | 89.7 |
| YOLOv10 [57] | 95.8 | 96.9 | 98.2 | 75.0 | 91.5 |
| CF2PN [56] | 89.7 | 96.5 | 100 | 85.0 | 92.8 |
| Cascade R-CNN [56] | 92.2 | 97.2 | 99.7 | 87.4 | 94.1 |
| FTD-RLE [58] | 92.0 | 98.3 | 99.9 | 90.5 | 95.2 |
| CANet [56] | 91.7 | 97.0 | 97.9 | 94.1 | 95.2 |
| BAC-FSAR [59] | - | - | - | - | 95.6 |
| YOLOv8 [56] | 96.8 | 98.1 | 99.3 | 88.6 | 95.7 |
| SOD-YOLOv10 [57] | 97.0 | 98.6 | 99.0 | 89.0 | 95.9 |
| SCDFMixer [56] | 97.8 | 98.1 | 99.5 | 89.4 | 96.2 |
| Relation DETR [60] | - | - | - | - | 97.4 |
| EW-YOLOv8 | 97.9 | 99.5 | 99.5 | 97.6 | 98.6 |
Table 3. Performance comparison of different models.

| Method | AP (Buildings) | AP (Greenhouses) | mAP | mAP@50-95 |
|---|---|---|---|---|
| YOLOv8 | 54.2% | 81.6% | 67.9% | 40.1% |
| YOLOv9 | 44.3% | 77.0% | 60.6% | 33.1% |
| YOLOv10 | 40.7% | 69.8% | 55.2% | 29.9% |
| YOLOv11 | 49.3% | 82.3% | 65.8% | 37.7% |
| YOLOv12 | 53.0% | 83.3% | 68.2% | 40.3% |
| YOLOv8 + NWD | 54.2% | 82.5% | 68.4% | 40.1% |
| YOLOv8 + NWD + DySample | 54.6% | 82.7% | 68.7% | 41.1% |
| EW-YOLOv8 | 54.6% | 82.9% | 68.8% | 41.2% |
Table 4. Comparison of model complexity and operation efficiency.

| Model | Parameters | Gradients | GFLOPs |
|---|---|---|---|
| YOLOv8 | 2,690,598 | 2,690,582 | 6.9 |
| YOLOv9 | 2,005,798 | 2,005,782 | 7.8 |
| YOLOv10 | 2,707,820 | 2,707,804 | 8.4 |
| YOLOv11 | 2,590,230 | 2,590,214 | 6.4 |
| YOLOv12 | 2,538,486 | 2,538,470 | 6.0 |
| EW-YOLOv8 | 914,870 | 914,854 | 9.4 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wang, B.; Liu, Z.; Xi, J.; Gao, S.; Cong, M.; Shang, H. Detection of Greenhouse and Typical Rural Buildings with Efficient Weighted YOLOv8 in Hebei Province, China. Remote Sens. 2025, 17, 1883. https://doi.org/10.3390/rs17111883


