A Novel Approach for Farmland Size Estimation in Small-Scale Agriculture Using Edge Counting and Remote Sensing

Du, Jingnan; Xu, Sucheng; Li, Jinshan; Duan, Jiakun; Xiao, Wu

doi:10.3390/rs16162981

Open Access Article

Jingnan Du †, Sucheng Xu †, Jinshan Li, Jiakun Duan and Wu Xiao *

School of Public Affairs, Zhejiang University, Hangzhou 310058, China

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Remote Sens. 2024, 16(16), 2981; https://doi.org/10.3390/rs16162981

Submission received: 16 July 2024 / Revised: 8 August 2024 / Accepted: 13 August 2024 / Published: 14 August 2024

(This article belongs to the Special Issue Advances in Remote Sensing for Crop Monitoring and Food Security)


Abstract

Accurate and timely information on farmland size is crucial for agricultural development, resource management, and related fields. However, there is currently no mature method for estimating farmland size in smallholder farming areas. Farmland plots in these areas are small, so their boundaries are unclear in medium- and high-resolution satellite imagery. Their irregular shapes also make it difficult to extract complete boundaries using morphological rules. As a result, automatic farmland mapping algorithms based on remote sensing data perform poorly in small-scale farming areas. To address this issue, this study proposes a farmland size evaluation index based on edge frequency, the Edge Cropland Ratio (ECR). The algorithm exploits the high temporal resolution of Sentinel-2 imagery to compensate for its limited spatial resolution. First, all Sentinel-2 images from one year are used to calculate edge frequencies. These frequencies divide farmland areas into low-value farmland interiors, medium-value non-permanent edges, and high-value permanent edges (PE). Next, Otsu’s thresholding algorithm is applied twice in succession to the edge frequencies, first to extract edges and then to extract permanent edges. The ratio of PE to cropland (ECR) is then calculated. The North China Plain and the Northeast China Plain served as study areas, and the results were compared with existing farmland size datasets. On this basis, the appropriate estimation radius for ECR was determined to be 1600 m. The peak ECR value was 0.085 for the Northeast China Plain and 0.105 for the North China Plain, and the overall distribution was consistent with the reference dataset.

Keywords:

agriculture; farmland size; smallholder

1. Introduction

Farmland size has a significant impact on food security, environmental sustainability, and socio-economic development. Developing countries such as those in South Asia and Africa, where smallholder farmers dominate, face a series of challenges in agricultural development. An aging population reduces labor productivity [1,2]; limited arable land necessitates reliance on other agricultural inputs (such as pesticides and fertilizers) to increase land productivity. However, excessive application of fertilizers, herbicides, and pesticides can lead to soil compaction, disruption of soil microbial communities, and other detrimental effects, ultimately resulting in soil degradation [3]. Smallholder farmers’ limited capital accumulation and education often hinder their access to credit and other financial support, slowing the adoption of new production technologies and impeding the modernization of agriculture [4].

Although information on farmland size is crucial, accurate estimation is not easy. In recent years, the increased availability of very high-resolution (VHR) satellite imagery has made it possible to assess farmland size at a given location by visual interpretation. In 2015, Fritz et al. initiated the Geo-Wiki field size campaign, mobilizing volunteers to interpret remote sensing images and label farmland sizes in sample areas [5]. They sampled 13,000 points globally and applied inverse distance weighting interpolation, producing the first estimate of global farmland size at 1 km resolution. However, the sample points were insufficient and the interpolation method had limitations, so the interpolated results were not accurate enough. Lesiv et al. addressed these shortcomings by launching another campaign that sampled ten times as many points (130,000). They also designed more meticulous and rigorous labeling specifications to improve accuracy [6]. Their research contributed significantly to understanding global farmland size. It could not, however, serve as a routine monitoring tool, because manual interpretation involves unavoidable subjectivity and high labor costs.

Designing automatic farmland size mapping algorithms using image processing and computer vision principles can overcome the shortcomings of manual interpretation and has achieved satisfactory results in areas dominated by large farms, such as the United States [7] and Europe [8]. Research on automatic farmland size mapping using remote sensing images can be classified into three categories based on principles: edge-based [7,9], region-based [10,11], and hybrid methods [8]. Edge-based algorithms identify farmland by recognizing farmland edges, using edge detection operators for convolution operations on images to extract edges with sudden changes in gray values. Region-based algorithms identify spatially similar pixels as the same region, or the same farmland, through region growing from random seed points. Hybrid algorithms combine the characteristics of the first two algorithms. The remote sensing images used in these studies can be divided into two categories based on spatial resolution: VHR images with meter-level or higher resolution and 10 m-level data such as the Sentinel and Landsat series. VHR images contain rich semantic information and often use a large number of labeled samples to train deep learning models to extract plot boundaries and complete plot identification. The latter relies more on expert experience to establish rules for extraction.
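The region-based idea can be made concrete with a minimal, generic region-growing sketch in Python. It is not taken from the cited works [10,11]. It assumes a single-band image (for example NDVI) and 4-connectivity. A pixel joins the region when its value lies within a hypothetical tolerance `tol` of the seed value.

```python
from collections import deque

import numpy as np


def region_grow(img: np.ndarray, seed: tuple, tol: float) -> np.ndarray:
    """Grow a region from `seed`, adding 4-connected neighbours whose value is
    within `tol` of the seed value. Returns a boolean mask of the region."""
    h, w = img.shape
    seed_val = float(img[seed])
    region = np.zeros((h, w), dtype=bool)
    region[seed] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < h and 0 <= nc < w and not region[nr, nc]
                    and abs(float(img[nr, nc]) - seed_val) <= tol):
                region[nr, nc] = True
                queue.append((nr, nc))
    return region


# Two fields (NDVI 0.8) separated by a one-pixel road (NDVI 0.2) in column 5.
ndvi = np.full((10, 10), 0.8)
ndvi[:, 5] = 0.2
field = region_grow(ndvi, seed=(2, 2), tol=0.1)
print(field.sum())  # 50: columns 0-4 of all 10 rows
```

Growing stops at the low-NDVI road, so the region recovers one field. When the separating road is narrower than a pixel, as in smallholder areas, there is no such contrast to stop the growth.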

Survey-based methods are a third approach, distinct from crowdsourcing and automatic farmland size mapping, but the temporal and spatial resolution of statistical data is very coarse [12]. Because of their high cost, systematic censuses are often conducted only once every several years. Moreover, data released at the smallest administrative unit cannot depict spatial differences in detail. Differences in statistical definitions have also led to large discrepancies among estimates of global smallholder farming [13,14]. Examples include whether cultivated land includes permanent grassland and whether size refers to farm size or farmland size. According to a study published in World Development, 84% of the world’s 608 million farms operate on less than 2 hectares of farmland [12]. These farms use 12% of the world’s agricultural land and produce about 35% of the world’s food. In China, 98% of farmers operate small farms of less than 2 hectares, accounting for almost half of the world’s small farms [15]. Smallholder farming practices are poorly suited to modern agricultural development. For example, fragmented farmland hinders the use of large agricultural machinery, and scattered smallholders find it difficult to accumulate the capital needed for modern agriculture. To address this, the Chinese government encourages land transfer and consolidation [16]. Small plots are merged into larger ones, and farmland is concentrated in the hands of fewer farmers for large-scale agricultural operations. This government-promoted concentration of farmland is proceeding rapidly. However, there is no suitable method to track it in a timely, accurate, and cost-effective way.

Automatic farmland size mapping algorithms based on remote sensing images struggle to achieve satisfactory results in small-scale farming areas with fragmented farmland. Census data on farmers have coarse temporal and spatial resolution and therefore cannot depict spatiotemporal dynamics in detail. Crowdsourcing, in which online volunteers interpret remote sensing images and label field sizes, can provide relatively accurate estimates. Organizing such campaigns, however, requires considerable human and material resources. To fill this gap in farmland size measurement for small-scale farming areas, this study develops a farmland size measurement algorithm based on medium- and high-resolution satellite imagery. Meter-level VHR images can clearly depict farmland boundaries, but data and computational costs limit their large-scale application. Open-access data policies make 10 m-level medium- and high-resolution imagery a better choice for scientific research and practical applications. In small-scale farming areas, however, plots are smaller and more fragmented. The boundaries between them are also narrower, often less than 10 m, which is below the spatial resolution of the imagery and therefore hard to see in medium- and high-resolution images. This makes farmland size measurement in small-scale farming areas very difficult. Our innovation is to strengthen the boundaries by overlaying multiple edge images and calculating how often each pixel appears as an edge, which allows the boundaries to be classified into different types. Boundaries in farmland can be divided into two categories: permanent edges (PE) and temporary edges. The former, such as wider paved roads, can be identified in images throughout the year. The latter, such as traces of strip-planted crops or of agricultural machinery, can only be identified in images from certain seasons.
Generally, the larger the farmland, the smaller the proportion of PE per unit of cropland area; conversely, the smaller the farmland, the larger the proportion. Therefore, we can use the proportion of PE per unit area of farmland (Edge Cropland Ratio, ECR) to reflect farmland size.
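The inverse relation between field size and PE share can be checked with an idealized model. The model is an assumption made for illustration and is not the paper's data: square fields of side *a* pixels, each separated by a 1-pixel-wide boundary that is always detected as PE. Under these assumptions the boundary share is 1 − ((a − 1)/a)², which is roughly 2/a for large *a*. Only the downward trend, not the absolute values, carries over to real imagery. The function name is hypothetical.

```python
import numpy as np


def synthetic_ecr(field_px: int, size_px: int = 240) -> float:
    """Share of pixels on 1-pixel-wide boundaries in an idealized grid of
    square fields whose boundaries repeat every `field_px` pixels."""
    idx = np.arange(size_px)
    on_line = (idx % field_px) == 0                   # boundary rows/columns
    boundary = on_line[:, None] | on_line[None, :]    # union of both directions
    return float(boundary.mean())


# At 10 m pixels, sides of 8, 16 and 40 pixels correspond to square fields of
# 0.64 ha, 2.56 ha and 16 ha.
for side in (8, 16, 40):
    closed_form = 1 - ((side - 1) / side) ** 2        # exact when size_px % side == 0
    print(side, synthetic_ecr(side), closed_form)
```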

We selected China’s main grain-producing provinces as the study area, used Sentinel-2 images from 2019, and calculated the ECR for sample points from the Geo-Wiki field size campaign. We compared the results with manual labels to verify the feasibility and scientific validity of the method. The imagery and computing resources of the Google Earth Engine cloud computing platform make large-scale application of the method possible. Specifically, this study addresses two questions. Can the proposed ECR reflect farmland size? What is an appropriate radius around the sample points?

2. Materials

2.1. Data

The remote sensing data used in this study are Sentinel-2 surface reflectance data from the European Space Agency (ESA, Paris, France). The 10 m red and near-infrared bands are used to calculate the Normalized Difference Vegetation Index (NDVI). Vegetation indices effectively highlight the differences between vegetated and non-vegetated areas. NDVI is a long-standing and widely used index. Because it is calculated from the 10 m red and near-infrared bands, it preserves the high spatial resolution of the data. Many other vegetation indices require the 20 m or 60 m bands, which reduces spatial resolution. The data are acquired by the Sentinel-2A/B satellites, which achieve a revisit cycle of 3–5 days, and even 1–2 days in high-latitude regions. Data from 2019 were selected for consistency with the year of Lesiv’s data.
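NDVI is computed as (NIR − Red) / (NIR + Red). For Sentinel-2, red is band B4 and near-infrared is band B8, both at 10 m. The NumPy sketch below assumes the two bands are already loaded as arrays of the same shape. A common scale factor applied to both bands cancels in the ratio. Pixels with a zero denominator are set to NaN, a choice made here for illustration.

```python
import numpy as np


def ndvi(red: np.ndarray, nir: np.ndarray) -> np.ndarray:
    """NDVI = (NIR - Red) / (NIR + Red); NaN where NIR + Red == 0."""
    red = red.astype(np.float64)
    nir = nir.astype(np.float64)
    denom = nir + red
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom != 0, (nir - red) / denom, np.nan)


# Scaled surface reflectance values (B4 = red, B8 = near-infrared).
b4 = np.array([[1000, 3000], [0, 500]])
b8 = np.array([[3000, 1000], [0, 4500]])
print(ndvi(b4, b8))
```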

We also utilized the 10 m global land cover product WorldCover provided by ESA to determine cropland areas [17]. The nominal year of this product is 2020, and it is produced using Sentinel-1/2 data, representing one of the most advanced land cover products currently available.

The farmland size annotation dataset comes from the research of Lesiv [6]. Lesiv’s Geo-Wiki Field Size campaign collected a total of 130,000 sample points globally. Through crowdsourcing, volunteers were mobilized to label the size of the largest farmland appearing around the sample points, referencing high-resolution remote sensing satellite images from providers such as Bing and Google, as well as medium and high-resolution satellite images from Landsat and Sentinel.

The data were preprocessed with the Google Earth Engine Python API [18], and NDVI image tiles were downloaded to a local computer. Python libraries such as scikit-learn, rasterio, and numpy were then used to process the tiles and calculate the ECR.

2.2. Study Region

We selected six major grain-producing provinces in China, located in the Northeast China Plain and the North China Plain, as our study area (Figure 1): Inner Mongolia, Liaoning, Shandong, Anhui, Hubei, and Henan. The study area covers important grain production regions of China with diverse crop types and cropping systems. Statistical yearbooks (https://data.cnki.net/yearData (accessed on 15 July 2024)) show the following figures for 2020. The study area contained 40.8 million hectares of cultivated land and produced 84.668 million tons of summer grain and 85.669 million tons of wheat. These figures account for 32% of China’s cultivated land, 59% of its summer grain production, and 64% of its wheat production, respectively.

The Northeast China Plain has a temperate continental monsoon climate with cold, long winters and warm, short summers, with precipitation concentrated in the summer. The main crops grown are spring wheat, corn, soybeans, sorghum, and rice. The cropping system is single cropping, with sowing mainly in the spring and harvesting in autumn. Due to the short growing season, most crops are early-maturing varieties. The Northeast China Plain has fertile soil, with black soil being widely distributed, suitable for large-scale mechanized farming. However, it is prone to natural disasters such as drought and sandstorms in spring.

The North China Plain has a temperate monsoon climate with four distinct seasons and precipitation concentrated in the summer. The main crops grown are winter wheat, corn, soybeans, peanuts, cotton, vegetables, and fruits. The cropping system is double or triple cropping. Winter wheat is sown in autumn and harvested in the summer; corn, soybeans, and other crops are sown in spring or summer and harvested in autumn. Some areas can also grow a season of vegetables or other cash crops. The North China Plain has flat terrain, fertile soil, good irrigation conditions, and a long history of agricultural production. However, water resources are unevenly distributed, and some areas experience drought and flooding.

We overlaid the farmland size dataset on the study area, using color and point size to represent the different farmland size classes. According to Lesiv’s estimates, large farmland (L, XL) is mainly located in the Northeast China Plain, whereas the North China Plain is dominated by very small farmland (XS).

3. Methods

The algorithm proposed in this study utilizes one year of Sentinel-2 imagery to calculate the ECR index, which reflects farmland size. The entire process, as illustrated in Figure 2, can be divided into three parts: (1) In the preprocessing stage, image tiles of the sample area are acquired, and vegetation indices are calculated. (2) In the edge extraction stage, edge detection algorithms are applied to identify edges from the vegetation indices. The frequency of edge occurrences is calculated, and the Otsu automatic thresholding algorithm [19] is iteratively applied twice to identify permanent edges. (3) In the analysis stage, we compare the estimated ECR values at different radii and select the most suitable radius.

3.1. Preprocessing

We selected the samples from the Geo-Wiki Global Field Size estimation dataset that fall within our study area. For each sample, we obtained Sentinel-2 image chips covering the estimation radius around it (Section 3.3). The QA band of the images indicates pixel quality, such as whether a pixel is contaminated by cloud. Based on the QA band, we calculated the percentage of clean pixels in each tile. Tiles with more than 99% clean pixels were retained, and contaminated tiles were discarded. The NDVI was calculated from the 10 m red and near-infrared bands, preserving the original 10 m spatial resolution. The NDVI highlights the difference between vegetated and non-vegetated areas more strongly than the original spectral bands. Its values range from −1 to 1, with denser vegetation having higher values; vegetated areas typically have values above 0.2.
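The clean-tile rule can be sketched as follows. The text does not say which QA band or flags were used. This sketch assumes the Sentinel-2 `QA60` band, in which bit 10 flags opaque clouds and bit 11 flags cirrus, and treats a pixel as clean when neither bit is set. As in the text, a tile is kept only if its clean fraction is strictly greater than 99%. The helper names are hypothetical.

```python
import numpy as np

# Assumed QA60 bit positions (Sentinel-2): 10 = opaque clouds, 11 = cirrus.
OPAQUE_CLOUD_BIT = 10
CIRRUS_BIT = 11


def clean_mask_from_qa60(qa: np.ndarray) -> np.ndarray:
    """True where neither the opaque-cloud nor the cirrus bit is set."""
    qa = np.asarray(qa, dtype=np.int64)
    cloud = (qa >> OPAQUE_CLOUD_BIT) & 1
    cirrus = (qa >> CIRRUS_BIT) & 1
    return (cloud == 0) & (cirrus == 0)


def keep_clean_tiles(qa_tiles, min_clean_fraction: float = 0.99) -> list:
    """Indices of tiles whose clean-pixel fraction is strictly above the limit."""
    return [i for i, qa in enumerate(qa_tiles)
            if clean_mask_from_qa60(qa).mean() > min_clean_fraction]


clear = np.zeros((10, 10), dtype=np.uint16)   # 100% clean
cloudy = clear.copy()
cloudy[0, :2] = 1 << OPAQUE_CLOUD_BIT         # 98% clean -> discarded
cirrus = np.zeros((20, 20), dtype=np.uint16)
cirrus[0, 0] = 1 << CIRRUS_BIT                # 99.75% clean -> kept
print(keep_clean_tiles([clear, cloudy, cirrus]))  # [0, 2]
```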

3.2. Edge Extraction

Farmland interiors contain dense crops, whereas the edges are roads or other dividers with sparser vegetation. The NDVI image therefore shows a gradient from the farmland to its edges. The floating-point NDVI image was first converted to integer data by multiplying it by 10,000. The Canny edge detection operator was then applied to extract areas with abrupt grayscale changes, i.e., farmland edges. Canny performed better for this purpose than other edge detection operators such as Sobel and Laplacian [20]. Its parameters were set by trial and error to a threshold of 100 and a sigma of 1.5; with these settings, more edges can be identified.
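The sketch below mirrors this step under simplifying assumptions. NDVI is scaled by 10,000 and rounded to integers as in the text, then smoothed with a Gaussian of sigma 1.5. Edges are marked where the gradient magnitude exceeds 100. This is gradient-magnitude thresholding, a simplified stand-in for Canny. It omits Canny’s non-maximum suppression and hysteresis, so its edges are several pixels wide. How the single threshold of 100 maps onto a particular Canny implementation is also an assumption. A full implementation such as `skimage.feature.canny` could be substituted. The helper names are hypothetical.

```python
import numpy as np


def gaussian_smooth(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian smoothing with edge-replicated borders."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def detect_edges(ndvi: np.ndarray, sigma: float = 1.5, threshold: float = 100.0) -> np.ndarray:
    """Binary edge map: NDVI x 10,000 -> smooth -> gradient magnitude > threshold.
    Simplified stand-in for Canny (no non-maximum suppression or hysteresis)."""
    scaled = np.round(ndvi * 10_000).astype(np.int32).astype(np.float64)
    smooth = gaussian_smooth(scaled, sigma)
    gy, gx = np.gradient(smooth)
    return np.hypot(gx, gy) > threshold


def edge_count(ndvi_stack) -> np.ndarray:
    """Number of images in which each pixel is marked as an edge."""
    return np.sum([detect_edges(img) for img in ndvi_stack], axis=0)


ndvi = np.full((40, 40), 0.8)
ndvi[:, 20:] = 0.2                   # field/road transition between columns 19 and 20
edges = detect_edges(ndvi)
counts = edge_count([ndvi, ndvi, np.full((40, 40), 0.8)])  # uniform image has no edges
print(counts[:, 19].max(), counts[:, 5].max())
```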

Binary edges were extracted from each NDVI image and summed to obtain the number of times each pixel was marked as an edge over the year. The typical probability density distribution of this edge count is shown in Figure 2. From left to right, it can be divided into three parts: farmland interiors, temporary edge areas, and permanent edge areas. Temporary edges appear in only some NDVI images of the year. Strip-planting marks within the farmland are an example; they disappear once the crops grow vigorously. Permanent edges (PE), by contrast, are visible in almost all images throughout the year. We used the WorldCover land cover product to extract cropland in the sample area. We then applied Otsu’s automatic thresholding algorithm [19] twice in succession to separate the three types of farmland area. The first threshold (otsu_1) divides the farmland area into edges and non-edges. The second threshold (otsu_2) further divides the edge area into PE and non-PE. The PE ratio of the farmland area, i.e., the ECR, corresponds to the area to the right of otsu_2 in the probability density distribution. Ideally, the larger the farmland, the smaller the proportion of PE in the farmland, i.e., the lower the ECR.
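The double thresholding can be sketched with a histogram-based Otsu implementation in NumPy. The sketch assumes that the edge counts are non-negative integers. It also assumes that a boolean cropland mask (for example, WorldCover cropland pixels) is already aligned with them. Classes are split as values ≤ t versus values > t. `otsu_threshold` and `edge_cropland_ratio` are hypothetical helper names.

```python
import numpy as np


def otsu_threshold(values: np.ndarray) -> int:
    """Otsu threshold t for non-negative integers; classes are <= t and > t."""
    values = np.asarray(values, dtype=np.int64).ravel()
    if np.unique(values).size < 2:
        raise ValueError("need at least two distinct values to threshold")
    prob = np.bincount(values) / values.size
    levels = np.arange(prob.size)
    omega = np.cumsum(prob)                 # weight of the lower class
    mu = np.cumsum(prob * levels)           # first moment of the lower class
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        between = np.where(denom > 0, (mu_total * omega - mu) ** 2 / denom, 0.0)
    return int(np.argmax(between))          # maximizes between-class variance


def edge_cropland_ratio(edge_count: np.ndarray, cropland: np.ndarray):
    """Return (ECR, otsu_1, otsu_2) for the cropland pixels of one sample."""
    counts = edge_count[cropland]
    otsu_1 = otsu_threshold(counts)                   # edges vs. farmland interior
    otsu_2 = otsu_threshold(counts[counts > otsu_1])  # permanent vs. temporary edges
    ecr = float(np.mean(counts > otsu_2))             # PE share of cropland pixels
    return ecr, otsu_1, otsu_2


# Synthetic sample: 800 interior pixels (count 10), 150 temporary-edge pixels (22),
# 50 permanent-edge pixels (40), plus 200 non-cropland pixels that must be ignored.
values = np.concatenate([np.full(800, 10), np.full(150, 22), np.full(50, 40), np.full(200, 60)])
cropland = np.concatenate([np.ones(1000, dtype=bool), np.zeros(200, dtype=bool)])
counts_img = values.reshape(30, 40)
ecr, otsu_1, otsu_2 = edge_cropland_ratio(counts_img, cropland.reshape(30, 40))
print(ecr, otsu_1, otsu_2)  # 0.05 10 22
```

The `ValueError` guard reflects the small-radius problem described in Section 3.3. With too few distinct edge counts, a histogram-based threshold is not meaningful.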

3.3. Radius Analysis

The ECR is affected not only by the actual farmland size but also by the estimation radius around each sample point. We therefore tested the robustness of the ECR to the radius to determine the optimal value. Preserving spatial heterogeneity calls for a small radius. However, the smaller the radius, the less farmland the sample contains. When the radius is too small, the statistical distribution of edge counts is no longer typical. The histogram-based Otsu algorithm then cannot find a suitable threshold, which produces more outliers in the ECR estimates. Because the Geo-Wiki dataset was interpreted using a 200 m radius, we started from 200 m and doubled the radius at each step, testing 200 m, 400 m, 800 m, 1600 m, and 3200 m.
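The pixel budget behind this trade-off is easy to compute. At 10 m resolution, a square window of side 2r contains (2r/10)² pixels. This gives the 40 × 40 = 1600 pixels quoted for 200 m in Section 4.2. A circular buffer holds roughly πr²/100 pixels. The sketch assumes the sample point lies on a pixel corner, and its function name is hypothetical.

```python
import numpy as np


def pixels_in_buffer(radius_m: float, pixel_m: float = 10.0, shape: str = "circle") -> int:
    """Number of pixels in a square window or circular buffer around a sample."""
    n = int(round(radius_m / pixel_m))            # radius in pixels
    if shape == "square":
        return (2 * n) ** 2                        # (2r x 2r) window
    # Pixel-centre offsets, assuming the sample point lies on a pixel corner.
    offsets = np.arange(-n, n) + 0.5
    dx, dy = np.meshgrid(offsets, offsets)
    return int(np.count_nonzero(dx ** 2 + dy ** 2 <= n ** 2))


for r in (200, 400, 800, 1600, 3200):
    print(r, pixels_in_buffer(r, shape="square"), pixels_in_buffer(r))
```

Each doubling of the radius quadruples the pixel count. For the square window, this takes the count from 1600 pixels at 200 m to 409,600 pixels at 3200 m.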

3.4. Validation and Assessment

The Geo-Wiki Field Size dataset is currently the most accurate publicly available global farmland size labeling product [6], and we compared our ECR estimates with it to validate the algorithm. The dataset contains 1792 sample points in the study area. For each sample point, three volunteers interpreted high-resolution imagery within a 200 m-radius grid around the point and determined the dominant farmland size. Five classes were used: XL (>100 ha), L (16–100 ha), M (2.56–16 ha), S (0.64–2.56 ha), and XS (<0.64 ha). In addition, “No field” indicates no farmland within the area. “NA” indicates no label, either because the three volunteers disagreed or because no satellite imagery was available for interpretation. Note that labeling the dominant rather than the average farmland size overestimates farmland size in areas where large and small fields are mixed.
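The class bounds have a simple geometric reading. Each bound is four times the previous one in area, i.e., twice the side length of a square field. At 10 m resolution, a square XS field (<0.64 ha) is therefore smaller than 8 × 8 pixels. The sketch below maps an area to its class and converts a bound to a square side in pixels. It assumes that a value exactly on a bound goes to the larger class, which the text leaves ambiguous at 100 ha. The function names are hypothetical.

```python
import math

# Upper bounds (ha) of the Geo-Wiki classes listed above; values exactly on a
# bound are assigned to the larger class here, which is an assumption.
UPPER_BOUNDS_HA = [("XS", 0.64), ("S", 2.56), ("M", 16.0), ("L", 100.0)]


def size_class(area_ha: float) -> str:
    for label, upper in UPPER_BOUNDS_HA:
        if area_ha < upper:
            return label
    return "XL"


def square_side_pixels(area_ha: float, pixel_m: float = 10.0) -> float:
    """Side length, in pixels, of a square field with the given area."""
    return math.sqrt(area_ha * 10_000) / pixel_m


for label, upper in UPPER_BOUNDS_HA:
    print(label, upper, square_side_pixels(upper))
```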

The reference data are an ordinal variable, whereas our ECR is continuous, so we made three comparisons. First, box plots show the ECR distribution for each parcel size class, illustrating visually whether the ECR reflects parcel size differences. Second, we discretized the continuous ECR into an ordinal variable consistent with the Geo-Wiki labels. The thresholds were determined from correlation coefficients and expert experience. A confusion matrix then shows the agreement between the ECR-predicted and manually labeled parcel sizes, from which we calculated the user’s accuracy, producer’s accuracy, and overall accuracy. Finally, we applied a Spearman test to assess the correlation between the two variables. The Spearman rank correlation coefficient (Equation (1)) measures the association between two ordinal variables [21]. It ranges from −1 to 1. Values greater than zero indicate a positive correlation, values less than zero indicate a negative correlation, and zero indicates no monotonic association. The closer the absolute value is to 1, the stronger the correlation.

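$$\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^{2}}{n\,(n^{2} - 1)} \qquad (1)$$

where $d_i$ is the difference between the ranks of the $i$-th observation in the two variables and $n$ is the number of observations.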
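Equation (1) is exact only when there are no tied ranks. The parcel size classes are ordinal and contain many ties, so a tie-aware version is usually preferred: the Pearson correlation of average ranks, which is what, for example, `scipy.stats.spearmanr` computes. The NumPy sketch below implements both forms so that they can be compared. The function names are hypothetical.

```python
import numpy as np


def average_ranks(x) -> np.ndarray:
    """1-based ranks; tied values share the average of their positions."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(x.size)
    i = 0
    while i < x.size:
        j = i
        while j + 1 < x.size and sorted_x[j + 1] == sorted_x[i]:
            j += 1                                   # extend the tie group
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # mean of positions i..j, 1-based
        i = j + 1
    return ranks


def spearman_eq1(x, y) -> float:
    """Equation (1); exact only when neither variable has ties."""
    d = average_ranks(x) - average_ranks(y)
    n = d.size
    return float(1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1)))


def spearman_ranks(x, y) -> float:
    """Pearson correlation of average ranks; valid with ties."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


print(spearman_eq1([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]), spearman_ranks([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
print(spearman_eq1([1, 1, 2, 2], [1, 2, 3, 4]), spearman_ranks([1, 1, 2, 2], [1, 2, 3, 4]))
```

Without ties, both forms give 0.8, up to floating-point rounding. With ties in the first variable, Equation (1) gives 0.9, while the tie-aware value is about 0.894.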

4. Results

4.1. Edge Extraction

The main output of this study is a set of ECR estimates at different radii. Figure 3 illustrates the edge count calculation for a sample (sample ID: 962700) in the Inner Mongolia Autonomous Region, using a 3200 m radius. The southern part of the sample area is a village settlement, and the rest is farmland divided by a northwest-oriented river. The image was acquired in early spring, before the crops in the fields had greened up. The plants in the riverbed, nourished by snowmelt, had already turned green.

For 2019, 145 Sentinel-2 image tiles were available, and the percentage of clear pixels within the tile extent was calculated for each. Based on this, 80 contaminated tiles (clear pixel percentage below 99%) were removed, leaving 65 clean tiles. NDVI tiles were calculated from the red and near-infrared bands. The Canny edge detection operator was then applied to each, yielding a series of binary farmland boundary maps. Overlaying these edge maps gave the number of times each pixel was marked as an edge. Non-farmland areas were masked out using the WorldCover land cover dataset, which clearly reveals the farmland boundaries and the strip-planting traces within them.

Figure 4a shows the buffer zones of the sample for radii from 3200 m down to 200 m. Each time the radius is halved, the buffer area shrinks to one quarter of the next larger buffer, and the smaller the buffer, the less farmland it contains. Figure 4b shows the edge count images obtained at the different radii. For clarity, only the 3200 m result is used in Figure 4c–f. Figure 4f shows the distribution of its edge counts, in which a peak around 10 corresponds to the rarely marked interior areas of farmland. Based on this distribution, Otsu’s algorithm found a first threshold of 17. The area above this threshold is the edge area shown in Figure 4d. The edge area is large. It includes not only the outer boundaries of the farmland but also strip-planting traces that form internal boundaries. Applying Otsu’s method again to the edge area gave a second threshold of 28. The area above this threshold is the PE (Figure 4e), which mainly consists of the outer boundaries of the farmland. In Figure 4e, the ECR is the proportion of the white area (PE) within the non-transparent area (farmland). In Figure 4f, it is the fraction of the area under the curve that lies above the second threshold. For each sample, we calculated the ECR at each radius. The next section analyzes the choice of estimation radius.

4.2. Radius Analysis

The distribution of the ECR estimated with different radii is shown in the boxplot in Figure 5, with summary statistics in the table on the right. As the radius increases, the ECR distribution gradually converges. The standard deviation decreases from 0.063 at 200 m to 0.019 at 800 m and then stabilizes, and the quantiles and mean also remain essentially unchanged beyond 800 m. At a radius of 3200 m, valid ECR values were obtained for all 1792 sample points. As the radius decreased, the number of valid estimates decreased. The Geo-Wiki campaign labeled each sample by manually interpreting VHR imagery over the 200 m-radius area around the sample point. In 10 m-resolution imagery, however, this area contains at most 1600 (40 × 40) farmland pixels. With so few farmland pixels, the edge count distribution is not representative, and the Otsu algorithm has difficulty finding a reasonable threshold. At a radius of 200 m, only 1677 values could be estimated, and these include outliers (max = 1). Therefore, only radii of 400 m and above are considered in the subsequent analysis.

Figure 6 groups the ECR by parcel size label, with one panel per estimation radius. At every radius, the ECR distribution rises from group L to group XS as farmland size decreases. This matches our expectation. The smaller the farmland, the more finely each unit area is divided and the more outer field boundaries it contains, so the ECR is higher. The ECR of the XL group is not lower than that of the L group. This is because the manual labels are based on the size of the largest farmland within 200 m of the sample point rather than on the average farmland size. They are therefore systematically larger than the average size that our method reflects. Near the XL sample points, the dominant farmland exceeds 100 ha, far more than the area of the 200 m-radius window. Human interpreters can rely on context outside this window, whereas our automatic algorithm cannot. Consistent with this, the ECR of the XL group declined as the radius increased from 400 m to 3200 m. This suggests that larger farmland requires a larger estimation radius.

4.3. Validation and Assessment

The ECR captures differences in parcel size. In Figure 7, the first two rows show Inner Mongolia and Liaoning Province, located in the Northeast China Plain. Historically, the Inner Mongolia Autonomous Region was the sparsely populated territory of the nomadic Mongolian people. Farmland there was mainly used as large pastures, resulting in relatively large field sizes. Overall, the ECR distributions of these two provinces lie further to the left (lower values) than those of the four North China Plain provinces below. This is consistent with the label distribution, in which farmland larger than 16 ha is mainly concentrated in the Northeast China Plain.

A radius of 1600 m is the most suitable estimation radius for this study area. In the line chart on the left, the blue line (400 m radius) deviates markedly from the results for larger radii. In the four North China Plain provinces (the bottom four rows of Figure 7), the 800 m results are close to those for larger radii. In the Northeast China Plain, however, they still deviate substantially. At 1600 m, the estimates in every province stabilize and are essentially consistent with those at 3200 m. To retain regional heterogeneity, the radius should be as small as possible while keeping the estimates robust. We therefore consider 1600 m the optimal ECR estimation radius for this study area, and the subsequent analysis is based on the 1600 m results.

Figure 8 shows the ECR estimation results (1600 m radius) for typical samples. Columns a and b show an extra-large and a medium farmland sample from the Northeast China Plain, and column c shows an XS sample from the North China Plain. Their ECRs are 0.0652, 0.0814, and 0.1267, respectively. The edge count images in the second row clearly capture both the strip-planting traces within the farmland and the outer edges. The former have lower values (bluish) and the latter higher values (reddish). The second threshold, obtained by applying Otsu’s method twice to the edge count, extracts the latter, i.e., the permanent edges. Overall, the PE correspond closely to the outer edges of the farmland.

As a continuous variable, the ECR can reflect differences in parcel size more finely than the ordinal field size labels. For comparison, however, we converted the ECR into an ordinal variable so that a confusion matrix and accuracy metrics could be computed. Discretizing the ECR into the five label categories requires four thresholds. The three thresholds separating L, M, S, and XS were chosen to maximize Spearman’s rank correlation coefficient between the predicted and reference classes. The maximum was reached with thresholds of 0.074, 0.085, and 0.099. The threshold for XL was set to 0.065 based on expert experience. A different procedure was needed for this threshold because, according to our results, the XL group behaves as an outlier (Figure 6). With these four thresholds, the ECR was converted into predicted parcel size classes and compared with the reference values. The Spearman correlation coefficient between the two variables was 0.315, and the p-value was much smaller than 0.001. The null hypothesis of no monotonic association between the two variables is therefore rejected.
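The discretization and threshold search can be sketched as follows. `np.digitize` maps an ECR to a class index using the four thresholds given in the text. Values equal to a threshold go to the smaller-field class, which is an assumption. For the search, the sketch fixes the XL threshold at 0.065 and tries every increasing triple from a hypothetical candidate grid. It keeps the triple with the highest Spearman coefficient, computed on average ranks because the classes contain ties. The grid and the synthetic data are illustrative only, and the helper names are hypothetical.

```python
import itertools

import numpy as np

LABELS = ["XL", "L", "M", "S", "XS"]          # larger fields -> lower ECR
THRESHOLDS = [0.065, 0.074, 0.085, 0.099]      # values reported in the text


def ecr_to_code(ecr, thresholds=THRESHOLDS) -> np.ndarray:
    """0 = XL for ECR < t0, ..., 4 = XS for ECR >= t3."""
    return np.digitize(ecr, thresholds)


def average_ranks(a) -> np.ndarray:
    """1-based ranks with ties replaced by their average rank."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(a, kind="mergesort")
    sorted_a = a[order]
    starts = np.r_[True, sorted_a[1:] != sorted_a[:-1]]   # first item of each tie group
    group = np.cumsum(starts) - 1                          # tie-group index per sorted item
    bounds = np.r_[np.flatnonzero(starts), a.size]         # group start/end positions
    avg = (bounds[:-1] + bounds[1:] - 1) / 2.0 + 1.0       # mean 1-based rank per group
    ranks = np.empty(a.size)
    ranks[order] = avg[group]
    return ranks


def spearman(x, y) -> float:
    rx, ry = average_ranks(x), average_ranks(y)
    if rx.std() == 0 or ry.std() == 0:
        return float("nan")                                # undefined for constant input
    return float(np.corrcoef(rx, ry)[0, 1])


def search_thresholds(ecr, ref_code, candidates, xl_threshold=0.065):
    """Try all increasing triples from `candidates`; keep the highest Spearman."""
    best_rho, best = -np.inf, None
    for t in itertools.combinations(sorted(candidates), 3):
        rho = spearman(ecr_to_code(ecr, [xl_threshold, *t]), ref_code)
        if rho > best_rho:                                 # NaN never wins
            best_rho, best = rho, t
    return best, best_rho


rng = np.random.default_rng(0)
ref_code = np.repeat([1, 2, 3, 4], 10)                     # L, M, S, XS reference classes
lo = np.array([0.067, 0.076, 0.088, 0.100])[ref_code - 1]
ecr = lo + rng.uniform(0.0, 0.003, ref_code.size)          # well-separated synthetic ECRs
best, rho = search_thresholds(ecr, ref_code, np.linspace(0.066, 0.108, 15))
print(best, rho)
print([LABELS[i] for i in ecr_to_code([0.0652, 0.0814, 0.1267])])
```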

In the confusion matrix in Figure 9, cells on the diagonal from upper left to lower right indicate agreement between predicted and reference values. Agreement is concentrated in the small-farmland classes. Off the diagonal, the upper-right side is denser than the lower-left side, indicating that our predictions tend to underestimate farmland size. This is expected: the ECR reflects the average size of farmland near the sample points, whereas the reference values label the size of the largest farmland. The Spearman correlation coefficient between the ECR-based predictions and the manual labels of the Geo-Wiki Field Size campaign was 0.315, with a p-value much smaller than 0.001. This indicates a statistically significant, though moderate, positive correlation.
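The accuracy metrics named in Section 3.4 can be computed from a confusion matrix as sketched below. The layout used here, with rows as predicted classes and columns as reference classes, is an assumption; the orientation of Figure 9 is not stated. User’s accuracy divides the correct count by the row total. Producer’s accuracy divides it by the column total. Classes with no samples yield NaN. The function names are hypothetical.

```python
import numpy as np

LABELS = ["XL", "L", "M", "S", "XS"]


def confusion_matrix(reference, predicted, labels=LABELS) -> np.ndarray:
    """Counts with rows = predicted class and columns = reference class (assumed layout)."""
    index = {label: i for i, label in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)), dtype=int)
    for ref, pred in zip(reference, predicted):
        cm[index[pred], index[ref]] += 1
    return cm


def accuracy_metrics(cm: np.ndarray):
    """User's, producer's (per class) and overall accuracy; NaN for empty classes."""
    correct = np.diag(cm).astype(float)
    with np.errstate(divide="ignore", invalid="ignore"):
        users = correct / cm.sum(axis=1)       # correct / all samples predicted as the class
        producers = correct / cm.sum(axis=0)   # correct / all reference samples of the class
    overall = correct.sum() / cm.sum()
    return users, producers, overall


reference = ["XS", "XS", "S", "M"]
predicted = ["XS", "S", "S", "S"]
users, producers, overall = accuracy_metrics(confusion_matrix(reference, predicted))
print(dict(zip(LABELS, users)), dict(zip(LABELS, producers)), overall)
```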

5. Conclusions and Discussion

Estimating farmland size in smallholder farming areas from publicly available medium- and high-resolution satellite data remains a challenge. This study developed a farmland size estimation method suited to smallholder farming areas using Sentinel-2 imagery. Using the Northeast China Plain and the North China Plain as study areas, we tested the robustness of the results to the sample radius. We recommend a radius of 1600 m for this study area. At this radius, the ECR was concentrated around 0.085 in the Northeast China Plain and around 0.105 in the North China Plain. This is consistent with the farmland size distribution revealed by the Geo-Wiki field size annotation dataset. The continuous ECR was converted into an ordinal variable using the thresholds 0.065, 0.074, 0.085, and 0.099. Its Spearman correlation coefficient with the reference labels then reached 0.315, with a p-value much less than 0.001, which supports the feasibility of the method. The method uses publicly available remote sensing data and has low computational cost. Its Otsu thresholds adapt automatically to each sample, which should favor spatial generalization. In the future, the algorithm could be applied to large-scale farmland size estimation.

5.1. Comparison with Crowdsourcing-Based Visual Interpretation

The farmland size predicted from the ECR agreed with the visually interpreted labels in areas of small farmland but differed markedly in areas of large farmland. This discrepancy stems not from a flaw in the algorithm but from different interpretation rules. Visual interpretation labels each sample by the size of the largest field within the sample area, and when that field extends beyond the sample area, interpreters can judge its size from the surrounding context. Consequently, although a 200 m radius theoretically limits the largest field that fits inside the sample area to 16 ha (400 m × 400 m), fields larger than 100 ha can still be labeled. Our ECR algorithm, by contrast, reflects the average size of farmland within the sample area rather than the maximum, and it is computed strictly within the sample area without using any information from outside it.
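The 16 ha ceiling is the area of the square whose side is twice the 200 m radius. A short sketch can check this arithmetic for any radius and show how far an average-based summary and a largest-field label can diverge within a single sample. The field areas are made up for illustration, the function name is hypothetical, and treating the ECR as tracking a plain mean field size is a simplification of the statement that it reflects average size.

```python
def max_square_field_ha(radius_m: float) -> float:
    """Area in hectares of the square (side = 2 * radius) bounding the sample."""
    side_m = 2 * radius_m
    return side_m * side_m / 10_000  # 1 ha = 10,000 m^2


print(max_square_field_ha(200))   # 16.0
print(max_square_field_ha(1600))  # 1024.0

# Hypothetical field areas (ha) within one sample, for illustration only.
fields_ha = [1, 1, 2, 2, 14]
mean_ha = sum(fields_ha) / len(fields_ha)  # what an average-based index tracks
max_ha = max(fields_ha)                    # what a largest-field label tracks
print(mean_ha, max_ha)  # 4.0 14
```

A single large field dominates the largest-field label but barely moves the mean, which is consistent with the underestimation seen for large-farmland samples.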

Crowdsourcing (or citizen science) involves public participation in non-profit academic research, allowing researchers to collect information at a lower cost. Nevertheless, participants' efforts still represent a substantial human cost, and the quality of the collected information depends on their competence, which may conflict with the reproducibility requirements of scientific research. Therefore, even after reference data on global farmland size have been obtained through citizen science, automated methods for farmland size estimation are still needed.

5.2. Comparison with Automatic Field Boundary Delineation

Automated farmland boundary identification algorithms estimate farmland size from accurately identified farmland boundaries. Accurate identification, however, requires that the boundaries be clearly visible in remote sensing images, which in turn requires imagery of sufficiently high spatial resolution. After the boundaries are initially identified, post-processing often involves complex morphological operations to correct topological errors, and these morphological rules assume that fields have regular shapes, such as circles or rectangles. As a result, automated farmland boundary identification algorithms face many challenges in areas of small farmland.

5.3. Prospect and Limitations

Large-scale, regular monitoring must take into account both data costs and the generalization ability of algorithms [22,23]. In terms of data, Earth observation technology can now image the Earth's surface at sub-meter resolution, and visual interpretation through crowdsourcing has been used to assess global farmland size, but its high labor cost rules it out as a routine monitoring method [5,6]. Automated farmland boundary identification algorithms are not yet mature enough for small farmland areas and do not yet strike a good balance between cost and efficiency [10,24]. In terms of algorithms, many options exist, such as the object-oriented classification algorithms provided by Trimble eCognition software (version 9) and deep learning models represented by the Segment Anything Model (SAM) [25], but they have limitations. For example, the multi-resolution segmentation algorithm commonly used in eCognition v.9 has many parameters that must be tuned [26], and deep learning models require high computational costs and large amounts of training data [24].

We used publicly available 10 m Sentinel-2 satellite data and designed an algorithm with adaptive parameters that can reflect farmland size in small-farm areas. Factors that may affect the ECR, such as topography, illumination conditions, climate, and farming systems, will be considered in future research. Although public satellite data such as Sentinel-2 cannot match commercial satellites in spatial resolution, their short revisit interval, more accurate radiometric calibration, and zero data cost make them well suited to large-scale, long-term monitoring. In the future, the algorithm could also be transferred to Planet's PlanetScope satellite constellation [27].
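The adaptive parameters are the Otsu thresholds. As summarized in the Abstract, Otsu's method is applied twice to the annual edge counts, first to separate edges from field interiors and then to isolate permanent edges, and the ECR is the ratio of permanent-edge pixels to cropland pixels. Because both thresholds come from each sample's own histogram, none has to be tuned by hand. The pure-Python sketch below illustrates this two-pass logic on flattened pixel lists; the function names, the integer-histogram form of Otsu's method, and the toy data are assumptions, and the per-image edge detection that produces the counts is not shown. With NumPy rasters, `skimage.filters.threshold_otsu` is an equivalent building block, although its histogram binning can shift the threshold slightly.

```python
from collections import Counter


def otsu_threshold(values):
    """Otsu's method on integer values: return the level t that maximizes the
    between-class variance when splitting into (value <= t) and (value > t)."""
    if not values:
        raise ValueError("Otsu's method needs at least one value")
    hist = Counter(values)
    levels = sorted(hist)
    total = len(values)
    sum_all = sum(level * count for level, count in hist.items())
    best_t, best_between = levels[0], -1.0
    w0, sum0 = 0, 0
    # The highest level is skipped: splitting there would leave class 1 empty.
    for t in levels[:-1]:
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        # Between-class variance up to the constant factor 1 / total**2.
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def edge_cropland_ratio(edge_count, cropland):
    """ECR = permanent-edge pixels / cropland pixels, via two Otsu passes.

    edge_count: per-pixel count of how often the pixel was marked as an edge.
    cropland:   per-pixel boolean mask; non-cropland pixels are ignored.
    """
    crop_counts = [c for c, is_crop in zip(edge_count, cropland) if is_crop]
    if not crop_counts:
        raise ValueError("no cropland pixels in the sample area")
    # Pass 1: separate low-count field interiors from edges.
    t_edge = otsu_threshold(crop_counts)
    edge_counts = [c for c in crop_counts if c > t_edge]
    if not edge_counts:
        return 0.0
    # Pass 2: among edges, separate non-permanent from permanent edges.
    t_perm = otsu_threshold(edge_counts)
    permanent = sum(1 for c in edge_counts if c > t_perm)
    return permanent / len(crop_counts)


# Hypothetical sample: interiors (0-1), non-permanent edges (8-9),
# permanent edges (18-19), plus five non-cropland pixels that are ignored.
edge_count = [0] * 20 + [1] * 20 + [8] * 10 + [9] * 10 + [18] * 5 + [19] * 5 + [25] * 5
cropland = [True] * 70 + [False] * 5
print(round(edge_cropland_ratio(edge_count, cropland), 3))  # 0.143
```

In the toy sample, the first pass splits at a count of 1 and the second at 9, leaving 10 permanent-edge pixels out of 70 cropland pixels.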

Author Contributions

Conceptualization, J.D. (Jingnan Du) and S.X.; methodology, S.X.; software, S.X.; validation, S.X.; formal analysis, S.X.; investigation, J.L.; resources, J.D. (Jiakun Duan); data curation, S.X.; writing—original draft preparation, S.X.; writing—review and editing, S.X.; visualization, S.X.; supervision, W.X. and J.L.; project administration, W.X.; funding acquisition, W.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Open Fund of Key Laboratory of Natural Resources Monitoring and Supervision in Southern Hilly Region, Ministry of Natural Resources, grant number NRMSSHR2023Y18; the Hunan Provincial Natural Science Foundation of China, grant number 2024JJ8351; and the Fundamental Research Funds for the Central Universities, grant number S20230127.

Data Availability Statement

Data available in a publicly accessible repository at https://doi.org/10.6084/m9.figshare.26339845.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Ren, C.; Zhou, X.; Wang, C.; Guo, Y.; Diao, Y.; Shen, S.; Reis, S.; Li, W.; Xu, J.; Gu, B. Ageing threatens sustainability of smallholder farming in China. Nature 2023, 616, 96–103.
2. Harper, S. Economic and social implications of aging societies. Science 2014, 346, 587–591.
3. Looga, J.; Jürgenson, E.; Sikk, K.; Matveev, E.; Maasikamäe, S. Land fragmentation and other determinants of agricultural farm productivity: The case of Estonia. Land Use Policy 2018, 79, 285–292.
4. Wang, Y.; Li, X.; Lu, D.; Yan, J. Evaluating the impact of land fragmentation on the cost of agricultural operation in the southwest mountainous areas of China. Land Use Policy 2020, 99, 105099.
5. Fritz, S.; See, L.; McCallum, I.; You, L.; Bun, A.; Moltchanova, E.; Duerauer, M.; Albrecht, F.; Schill, C.; Perger, C.; et al. Mapping global cropland and field size. Glob. Chang. Biol. 2015, 21, 1980–1992.
6. Lesiv, M.; Laso Bayas, J.C.; See, L.; Duerauer, M.; Dahlia, D.; Durando, N.; Hazarika, R.; Kumar Sahariah, P.; Vakolyuk, M.; Blyshchyk, V.; et al. Estimating the global distribution of field size using crowdsourcing. Glob. Chang. Biol. 2019, 25, 174–186.
7. Yan, L.; Roy, D.P. Conterminous United States crop field size quantification from multi-temporal Landsat data. Remote Sens. Environ. 2016, 172, 67–86.
8. Weissteiner, C.J.; García-Feced, C.; Paracchini, M.L. A new view on EU agricultural landscapes: Quantifying patchiness to assess farmland heterogeneity. Ecol. Indic. 2016, 61, 317–327.
9. Graesser, J.; Ramankutty, N. Detection of cropland field parcels from Landsat imagery. Remote Sens. Environ. 2017, 201, 165–180.
10. Cheng, T.; Ji, X.; Yang, G.; Zheng, H.; Ma, J.; Yao, X.; Zhu, Y.; Cao, W. DESTIN: A new method for delineating the boundaries of crop fields by fusing spatial and temporal information from WorldView and Planet satellite imagery. Comput. Electron. Agric. 2020, 178, 105787.
11. Zhang, P.; Hu, S.; Li, W.; Zhang, C. Parcel-level mapping of crops in a smallholder agricultural area: A case of central China using single-temporal VHSR imagery. Comput. Electron. Agric. 2020, 175, 105581.
12. Lowder, S.K.; Sánchez, M.V.; Bertini, R. Which farms feed the world and has farmland become more concentrated? World Dev. 2021, 142, 105455.
13. Ricciardi, V.; Ramankutty, N.; Mehrabi, Z.; Jarvis, L.; Chookolingo, B. How much of the world’s food do smallholders produce? Glob. Food Secur. 2018, 17, 64–72.
14. Lowder, S.K.; Skoet, J.; Raney, T. The Number, Size, and Distribution of Farms, Smallholder Farms, and Family Farms Worldwide. World Dev. 2016, 87, 16–29.
15. Rapsomanikis, G. The Economic Lives of Smallholder Farmers: An Analysis Based on Household Data from Nine Countries; FAO: Rome, Italy, 2015. Available online: http://www.fao.org/3/a-i5251e.pdf (accessed on 15 July 2024).
16. Rogers, S.; Wilmsen, B.; Han, X.; Wang, Z.J.-H.; Duan, Y.; He, J.; Li, J.; Lin, W.; Wong, C. Scaling up agriculture? The dynamics of land transfer in inland China. World Dev. 2021, 146, 105563.
17. Zanaga, D.; van de Kerchove, R.; de Keersmaecker, W.; Souverijns, N.; Brockmann, C.; Quast, R.; Wevers, J.; Grosu, A.; Paccini, A.; Vergnaud, S.; et al. ESA WorldCover 10 m 2020 v100. 2021. Available online: https://worldcover2020.esa.int/download (accessed on 15 July 2024).
18. Gorelick, N.; Hancher, M.; Dixon, M.; Ilyushchenko, S.; Thau, D.; Moore, R. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 2017, 202, 18–27.
19. Otsu, N. A Threshold Selection Method from Gray-Level Histograms. IEEE Trans. Syst. Man Cybern. 1979, 9, 62–66.
20. Watkins, B.; van Niekerk, A. A comparison of object-based image analysis approaches for field boundary delineation using multi-temporal Sentinel-2 imagery. Comput. Electron. Agric. 2019, 158, 294–302.
21. Spearman, C. The Proof and Measurement of Association between Two Things. Am. J. Psychol. 1904, 15, 72.
22. Xu, S.; Xiao, W.; Yu, C.; Chen, H.; Tan, Y. Mapping Cropland Abandonment in Mountainous Areas in China Using the Google Earth Engine Platform. Remote Sens. 2023, 15, 1145.
23. Xiao, W.; Xu, S.; He, T. Mapping Paddy Rice with Sentinel-1/2 and Phenology-, Object-Based Algorithm—A Implementation in Hangjiahu Plain in China Using GEE Platform. Remote Sens. 2021, 13, 990.
24. Wang, S.; Zhou, Y.; Yang, X.; Feng, L.; Wu, T.; Luo, J. BSNet: Boundary-semantic-fusion network for farmland parcel mapping in high-resolution satellite images. Comput. Electron. Agric. 2023, 206, 107683.
25. Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.-Y.; et al. Segment Anything. 2023. Available online: http://arxiv.org/pdf/2304.02643 (accessed on 15 July 2024).
26. Xu, S.; Xiao, W.; Ruan, L.; Chen, W.; Du, J. Assessment of ensemble learning for object-based land cover mapping using multi-temporal Sentinel-1/2 images. Geocarto Int. 2023, 38, 2195832.
27. Rufin, P.; Bey, A.; Picoli, M.; Meyfroidt, P. Large-area mapping of active cropland and short-term fallows in smallholder landscapes using PlanetScope data. Int. J. Appl. Earth Obs. Geoinf. 2022, 112, 102937.

Figure 1. The red borders outline the six major grain-producing provinces in China that form our study area. From top to bottom and left to right, they are Inner Mongolia and Liaoning Province in the Northeast China Plain, and Shandong, Henan, Hubei, and Anhui Provinces in the North China Plain. A total of 1792 sample points from the Geo-Wiki plot size dataset fall within the study area; points are colored and sized by plot size for better visualization.

Figure 2. Technical flowchart of this study. Abbreviations: NDVI, normalized difference vegetation index; Otsu, Otsu's thresholding algorithm; ECR, edge cropland ratio [6].

Figure 3. The edge count generation process within a 3200 m radius of a sample point (Sample ID: 962700, Latitude: 45.417702, Longitude: 121.545998) located in the Inner Mongolia Autonomous Region. The edge count is the number of times each pixel is marked as an edge. In the grayscale image, brighter areas indicate higher counts, and pure black areas represent non-farmland regions.

Figure 4. (a) shows the buffer zones of different radii for the sample point, ranging from 200 m to 3200 m. (b) displays the edge count images obtained at different radii. The edge count result for the 3200 m radius is shown in (c), with non-farmland areas set as transparent. (d–f), respectively, represent the extracted edges, permanent edges, and edge frequency distribution of (c).

Figure 5. Boxplots of the estimated ECR values at different radii; the corresponding numerical statistics are shown in the right panel.

Figure 6. Distribution of the ECR for each parcel size group, with each radius shown in a separate panel.

Figure 7. The line chart on the left shows the probability density distribution of the ECR at different radii, and the bar chart on the right shows the number of field size labels at each level, by province.

Figure 8. The figure presents three samples of different farmland sizes in separate columns, from left to right: XL, M, and XS. The three rows from top to bottom are: Google Satellite basemap and sample’s metadata, edge count image and grayscale histogram (upper right corner), and identified permanent edges (in red).

Figure 9. The probability density curves of the 1600 m ECRs are displayed, grouped by parcel size. The three vertical dashed lines in the left figure represent the optimal thresholds for classifying farmland size based on ECRs, determined using Spearman’s rank correlation coefficient. The confusion matrix on the right shows the comparison between our ECR-predicted farmland sizes and the manually interpreted labeling results.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Du, J.; Xu, S.; Li, J.; Duan, J.; Xiao, W. A Novel Approach for Farmland Size Estimation in Small-Scale Agriculture Using Edge Counting and Remote Sensing. Remote Sens. 2024, 16, 2981. https://doi.org/10.3390/rs16162981

