Class-Balanced Random Patch Training to Address Class Imbalance in Tiling-Based Farmland Classification

Bae, Yeongung; Ban, Yuseok

doi:10.3390/app15137056

Open AccessArticle

Class-Balanced Random Patch Training to Address Class Imbalance in Tiling-Based Farmland Classification

by

Yeongung Bae

^1,2 and

Yuseok Ban

^1,*

¹

Department of Electronics Engineering, Chungbuk National University, 1 Chungdae-ro, Seowon-gu, Cheongju 28644, Republic of Korea

²

OptAI, 45 Yangcheong 4-gil, Cheongwon-gu, Cheongju 28116, Republic of Korea

^*

Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(13), 7056; https://doi.org/10.3390/app15137056

Submission received: 7 May 2025 / Revised: 9 June 2025 / Accepted: 19 June 2025 / Published: 23 June 2025

(This article belongs to the Section Computing and Artificial Intelligence)

Download

Browse Figures

Versions Notes

Abstract

Satellite image-based farmland classification plays an essential role in agricultural monitoring. However, typical tiling-based classification approaches, which extract patches at fixed offsets within each image during training, often suffer from structural issues such as patch duplication, limiting training diversity. Additionally, farmland classification frequently exhibits class imbalance due to uneven cultivation areas, resulting in biased training toward majority classes and poorer performance on minority classes. To overcome these issues, we propose Class-Balanced Random Patch Training, which combines Random Patch Extraction (RPE) and Class-Balanced Sampling (CBS). This method improves patch-level diversity and ensures balanced class representation during training. We evaluated our method on the FarmMap dataset, using a validation set from the same region and year as the training data, and a test set from a different year and region to simulate domain shifts. Our approach improved the F1 scores of minority classes and overall performance. Furthermore, our analysis across varying levels of class difficulty showed that the method consistently outperformed other configurations, regardless of minority-class difficulty. These results demonstrate that the proposed method offers a practical and generalizable solution for addressing class imbalance in tiling-based remote sensing classification, particularly under real-world conditions with spatial and temporal variability.

Keywords:

remote sensing; farmland classification; tiling-based classification; class imbalance; random patch extraction; class-balanced sampling

1. Introduction

Farmland classification, including crop classification, is essential for agricultural monitoring, yield estimation, and policy-making. Advances in satellite-based remote sensing have enabled accurate mapping of farmland distribution and characteristics over extensive areas, greatly enhancing resource management and decision-making [1].

Early remote sensing-based farmland classification mainly employed pixel-wise classification methods [2]. However, these methods have notable drawbacks, including “salt-and-pepper” noise resulting from isolated misclassified pixels and difficulty handling mixed pixels [3]. To overcome these limitations, Object-Based Image Analysis (OBIA) emerged as an alternative. OBIA segments satellite images into meaningful objects (e.g., parcels) and classifies these based on statistical features such as the mean Normalized Difference Vegetation Index (NDVI), texture, and area, typically using traditional machine learning algorithms [4,5].

Recently, convolutional neural network (CNN)-based deep learning techniques have increasingly been applied to farmland classification. These approaches can be broadly categorized into two types: (i) methods that first extract field boundaries and subsequently classify each parcel using CNNs [6,7,8,9] and (ii) end-to-end segmentation models that directly learn from pixel-level data to predict land cover classes for entire images [10,11,12]. In parallel, research focusing on precise field-boundary extraction has gained traction [13,14,15,16], aiming to improve object definitions within classification pipelines.

Alongside these technical advances, numerous studies have reported that not only the design of model architectures but also non-architectural factors—such as the composition of training samples, data sampling methods, and pretraining strategies—can play a significant role in improving classification performance [17,18,19,20]. Based on this perspective, the present study aims to improve performance by refining the method of training for data extraction while preserving the model architecture.

Farmland classification is also particularly sensitive to temporal and regional variability. Even identical crop types can exhibit significant spectral and textural differences, driven by variations in growth stages, soil properties, and climate conditions across regions [21]. These factors critically impact the generalization ability of classification models, emphasizing the need for methods that maintain robust performance across spatio-temporal variations. To address this need, the present study aims to develop a farmland classification model capable of strong generalization—both spatially and temporally.

Class imbalance is among the most significant challenges in farmland classification. Due to variations in crop cultivation areas influenced by region, season, and agricultural policies, farmland datasets often exhibit highly imbalanced class distributions. This trend has also been observed in previous studies, where farmland data were found to be heavily skewed toward a few dominant crop types, while rare crops appeared only sparsely [22]. For instance, staple crops such as rice and barley typically occupy large portions of agricultural datasets, whereas specialty crops like ginseng or nursery fields appear far less frequently—a pattern observed in our experimental dataset as well.

This imbalance frequently leads models to overfit majority classes and poorly classify minority classes [23]. Models trained with cross-entropy loss are particularly vulnerable to this issue, as the loss is inherently biased toward well-represented classes due to its weighting based on sample frequency. Consequently, accuracy for under-represented classes decreases, diminishing the overall robustness and fairness of the models.

In the field of remote sensing, approaches to address class imbalance can be broadly categorized into two groups. The first is data-level oversampling strategies, and the second includes algorithmic approaches such as loss-function adjustment using focal loss or weighted loss.

Oversampling methods aim to balance the class distribution by increasing the number of samples for minority classes. The most straightforward technique is simple replication of minority-class samples. However, this approach can lead to overfitting due to repeated exposure to identical data [24,25]. To alleviate this, techniques such as SMOTE [26] and ADASYN [27] have been proposed, which generate synthetic samples by interpolating between minority-class instances. These techniques have also been successfully applied to remote sensing classification problems, demonstrating strong performance in various applications [24,25,28,29]. However, such methods have been criticized for potentially generating noise or distorted spectral information that deviates from the true data distribution in high-dimensional remote sensing imagery [30,31]. To overcome these issues, recent studies have explored GAN-based oversampling, which offers the advantage of generating more realistic samples [32,33]. However, when the number of original minority-class samples is extremely limited, there is a risk of generating repetitive or unrealistic synthetic images, as has been reported in the literature [34,35].

In contrast, algorithmic approaches adjust the loss function to compensate for class imbalance by assigning greater weights to errors in minority-class predictions during training. For example, focal loss [36] and weighted cross-entropy modify the loss values to emphasize under-represented classes. These methods have the advantage of enhancing minority-class learning without altering the original data itself [37,38]. However, they are generally highly sensitive to hyperparameter settings, and if not carefully tuned, they may lead to excessive overfitting to the minority class or instability in the training process [23,39].

In this study, we adopt a data-level oversampling strategy, and accordingly, avoiding redundant learning and ensuring diversity in the training data become key objectives.

While addressing class imbalance is important, it is equally critical to consider the geometric variability of objects in remote sensing imagery, including farmland data. Even at a fixed spatial resolution, objects in such imagery can vary significantly in scale, shape, and aspect ratio, making it challenging to process them in a consistent manner [40,41,42,43]. This variability poses a fundamental obstacle to the accuracy of CNN-based analysis tasks in remote sensing, such as classification and detection. To address this issue, previous studies have proposed various approaches, including multi-scale feature pyramid frameworks [40,41] and patch-based prediction methods that divide images into fixed-size regions [42,43]. Among these, our study explicitly adopts the patch-based prediction strategy known as tiling, which effectively handles geometric variability while satisfying the fixed input-size requirements of CNN models.

One straightforward method is resizing each parcel image to a predetermined input size. However, because farmland parcels vary significantly in shape and size, applying a uniform output size introduces inconsistent scaling factors across samples. Figure 1 illustrates this issue by comparing two example parcels and how they are transformed when applying different input preparation strategies. As shown in Figure 1d, resizing causes images with different native resolutions to be rescaled disproportionately, distorting their original textures and spatial proportions. This inconsistency undermines the uniformity of image preprocessing and hampers the model’s ability to learn meaningful visual features. As a result, the visual distinction among classes is reduced, especially for crop types with fine-grained structural differences, making classification more difficult. In contrast, fixed-size patch extraction avoids this problem by directly sampling from the original image without altering its resolution, as shown in Figure 1c. This method preserves the local spatial scale and textural characteristics of each parcel, enabling the model to capture more reliable and class-specific visual cues, thereby improving its ability to distinguish between visually similar classes.

To systematically implement this fixed-size patch approach, tiling-based classification frameworks have been introduced [42,43]. Instead of resizing the entire parcel, this approach divides each parcel into overlapping fixed-size patches. Each patch is individually classified by a CNN, and parcel-level predictions are aggregated from patch-level results—typically using a class-wise product. This strategy maintains spatial resolution, avoids global scaling distortions, and satisfies CNN input requirements, making it a widely adopted approach in tiling-based classification.

Despite these advantages, typical tiling-based training methods have limitations. First, the Fixed Patch Extraction (FPE) method crops patches from fixed spatial offsets, which matches inference-time behavior. However, this fixed sampling strategy restricts spatial diversity in training data, as the same patch locations are repeatedly used during training epochs, thereby limiting variation in spatial contexts [44,45]. Such structural limitations can lead to overfitting to specific spatial patterns and degrade generalization performance for unseen domains [44]. Although repeated patch exposure is somewhat reduced by randomly sampling only 40% of patches when more than three patches are generated per parcel, fixed offsets still cause patch duplication across epochs. This redundancy limits training diversity and heightens the risk of overfitting, as shown in Figure 2.

Second, most tiling-based pipelines preserve the original class distribution without applying balancing mechanisms—a scenario we define as No Class Balancing (NCB). In imbalanced farmland datasets, NCB results in skewed training exposure: majority classes dominate the training process, while rare classes (e.g., ginseng or tree nursery) become severely under-represented. Consequently, models develop biases toward frequent classes, performing poorly on minority classes.

In short, patch duplication from FPE and class imbalance due to NCB degrade the generalization capability of models, underscoring the need for more robust training strategies tailored to parcel-based tiling classification.

To address these challenges, we adopt two complementary techniques: random patch extraction (RPE) and class-balanced sampling (CBS).

RPE improves overall classification performance by ensuring spatial diversity in training patches and also enhances performance across most minority classes. Under the 1× resolution setting, overall accuracy increased by +0.0176 on the validation set and +0.0574 on the test set. For an easy minority class, F1 scores improved by +0.0696/+0.0790 (Val/Test), respectively. However, when used alone, RPE does not resolve class-level training imbalance. In very difficult minority classes, the spatial diversity provided by RPE hindered generalization under limited training exposure, and the model failed to make any correct predictions, with F1 scores dropping to 0 on both the validation and test sets.

CBS mitigates majority-class bias by equalizing class-level training opportunities and facilitates better inter-class discrimination, improving both overall and minority-class performance. In the 1× setting, overall accuracy improved by +0.0172 (Val) and +0.0334 (Test), and for a normal-difficulty minority class, F1 scores increased significantly by +0.4000/+0.4332 (Val/Test). However, in a very difficult minority class, the F1 score increased by +0.4086 in validation but only +0.0349 in test, indicating limited generalization. An easy minority class also showed signs of overfitting, with an F1 gain of +0.0600 in validation but only +0.0123 in test.

By combining the two methods, we simultaneously achieve balanced training opportunities through CBS and spatial patch diversity through RPE, resulting in the following complementary effects. CBS compensates for the insufficient training exposure of minority classes under RPE, especially in high-difficulty cases, by ensuring sufficient repetition for learning. Meanwhile, RPE introduces spatial diversity at the patch level, alleviating the structural limitation of repeated patch sampling inherent in CBS and helping prevent overfitting. In practice, the proposed combined method achieved consistent performance improvements across minority classes of varying difficulty. Notably, for a very difficult minority class where both individual methods struggled, the combined strategy yielded substantial F1 gains of +0.6171 (Val) and +0.4044 (Test). Furthermore, overall accuracy improved significantly, with gains of +0.0443 (Val) and +0.0876 (Test), demonstrating enhanced general classification performance.

The key contributions of this study are summarized as follows:

Class-Balanced Random Patch Training
We integrate data augmentation via random patch extraction (RPE) and over/undersampling through class-balanced sampling (CBS) in a unified training pipeline for tiling-based classification. RPE and CBS independently address patch duplication and class imbalance, respectively, but their combination demonstrates complementary effects on performance. This combined strategy significantly improves the F1 score for minority crop classes, achieving gains of up to +0.6171 (Validation) and +0.4044 (Test) compared to the baseline. Unlike many conventional methods that rely on synthetic data to address class imbalance, our method leverages only real data to ensure training diversity and achieves stable performance without complex hyper-parameter tuning.
Robustness to Context Loss and Aggregation Effects under Upscaling
We perform 2× upscaling of parcels to reduce the context per patch while increasing the number of patches per parcel. As a result, our method demonstrates robustness to context loss, maintaining stable patch-level performance, even when context is reduced to one-fourth. Furthermore, we quantitatively confirm that the increased number of patches enhances the aggregation effect of the softmax product, leading to improved overall classification accuracy.

2. Materials and Methods

2.1. Dataset and Preprocessing

In this study, experiments were performed using the FarmMap dataset compiled by the Korea Agency of Education, Promotion, and Information Service in Food, Agriculture, Forestry, and Fisheries (EPIS, Sejong-si, Republic of Korea) [46]. The dataset offers high-resolution orthophotos in GeoTIFF format as inputs and farmland polygon boundaries with farmland labels in SHP format. Each orthophoto comprises three RGB channels, and the ground sampling distance is fixed at 1 m per pixel.

As summarized in Table 1, the data were partitioned into training, validation, and test subsets. All parcels were extracted from the designated 2021 imagery of the same target regions (94,241 parcels in total), then randomly split in a 9:1 ratio into training and validation subsets; as a result, both subsets belong to an identical spatio-temporal domain. In contrast, the test subset consists of 15,227 parcels collected in 2020 from different target regions, providing a spatio-temporal domain that differs from the training imagery. Representative parcel-level orthophoto crops from each domain are shown in Figure 3.

FarmMap includes 11 farmland classes whose parcel counts are highly imbalanced. In the combined Train + Val distribution (Figure 4a), orchard and growing field each exceed 10,000 parcels, whereas ginseng and tree nursery are limited to 563 and 694 parcels, respectively. Such disparity reveals a severe class imbalance that can bias a classifier toward majority classes during training.

The marked shift in imbalance patterns in the test subset (Figure 4b) indicates that the observed distributional changes cannot be explained by class-ratio differences alone. Instead, these shifts point to an underlying domain shift driven by collection year and region. Therefore, farmland classification models must maintain robust performance under domain shifts.

Because the FarmMap dataset is supplied with polygon coordinates and farmland labels in SHP format and orthophotos in GeoTIFF format, parcel-level images had to be created before classification. The Coordinate Reference System (CRS) values of the SHP file and of the GeoTIFF orthophotos were first harmonized. Using the transformed polygon coordinates, the orthophoto was clipped so that only the pixels inside the parcel were retained, while all exterior pixels were set to RGB (0, 0, 0) to mask the background. The resulting parcel image was then paired with its farmland label (e.g., plowed rice paddy), and all subsequent training and inference was performed exclusively on these parcel-level images.

2.2. Proposed Method

This section presents two complementary techniques aimed at solving the severe class-imbalance inherent to farmland classification and the patch-duplication issue that emerges when the tiling-based scheme is used during training. Section 2.2.1 introduces the core ideas of class-balanced sampling (CBS) and random patch extraction (RPE), Section 2.2.2 describes how these techniques are combined in the training pipeline, and Section 2.2.3 details the corresponding inference procedure.

2.2.1. Methodology

This section introduces two key techniques designed to address the training limitations discussed in Section 1: random patch extraction (RPE) and class-balanced sampling (CBS). CBS guarantees that each class—including minority classes—is sampled at an equal frequency during training. RPE enhances patch-level diversity by randomly selecting spatial offsets, reducing redundancy caused by fixed patch positions. Figure 5 and Figure 6 provide visual comparisons of these methods.

Class-Balanced Sampling

CBS mitigates class imbalance by selecting a farmland class with uniform probability every time the data loader requests the next parcel sample, regardless of the class’s natural frequency. After a class is chosen, one parcel from that class’s list is read sequentially, and the list wraps around when it reaches the end. Repeating this per-sample selection oversamples minority classes and undersamples majority classes, eventually exposing the model to all classes in nearly equal proportions. Figure 5 provides a visual comparison between CBS and the natural, imbalanced sampling scheme. For reference, this “natural, imbalanced sampling” baseline—i.e., preserving the original class distribution without any balancing—is referred to as no class balancing (NCB). Therefore, CBS is evaluated against this NCB baseline throughout this paper.

Figure 5. Sampling strategy comparison between NCB and CBS.

Random Patch Extraction

We introduce RPE to enrich patch diversity within each farmland parcel, thereby improving generalization and reducing overfitting. This design is inspired by previous findings that exposure to multiple spatial offsets enhances model robustness [47,48]. The conventional fixed-offset tiling-based scheme—hereafter referred to as fixed patch extraction (FPE)—extracts patches at the same coordinates across epochs, often leading to redundant information. In contrast, RPE samples each patch using a random spatial offset such that the resulting patch includes a sufficient portion of the parcel. Consequently, even within a single parcel, RPE exposes the model to multiple views that differ in background clutter, illumination, and planting layout. This diversity mitigates duplication in training inputs, improves generalization, and reduces the risk of overfitting. As shown in Figure 6, RPE provides significantly more spatial variation across training iterations compared to FPE. Throughout the experiments, we quantitatively assessed the benefits of RPE against the fixed-baseline FPE.

Neither RPE or CBS is tied to any specific network architecture; they are broadly applicable to any classification pipeline that operates on fixed-size patches, a common paradigm in tiling-based parcel classification. Our approach focuses on dataset extraction and sampling strategies rather than on model design itself, making it applicable to a wide range of systems that employ fixed-size patch-based training. To assess the effectiveness of our training-stage techniques in isolation, we conducted experiments on a simplified parcel-level framework originally proposed as an early form of tiling-based parcel classification [42,43]. This baseline system serves as a foundational structure that remains representative and widely applicable, providing a neutral ground for evaluating patch-level learning strategies. By avoiding the complexity of modern architectures, we minimized confounding factors, ensuring that the improvements stem directly from the proposed training methods.

Figure 6. Comparison of patch extraction patterns used in FPE and RPE. In RPE (b), patch positions vary randomly at each epoch, as indicated by the differently colored boxes.

2.2.2. Training Pipeline

In this study, we combined CBS with RPE to alleviate class imbalance and enhance patch diversity during training. Whenever we loaded a parcel image for training, we first used CBS to select a farmland class at uniform probability, then retrieve the parcels belonging to that class in sequence. Because every class is effectively chosen at the same rate, minority classes receive greater sampling opportunities, thereby mitigating skewed class distributions.

Next, we applied RPE to the chosen parcel image, extracting multiple patches by drawing random

(x, y)

offsets. If the valid area of a patch fell below a user-defined threshold, we discarded that attempt and simply drew a new

(x, y)

offset to try again. Even for the same parcel, these random offsets capture subtle variations in texture, background masking, or illumination, allowing the network to see multiple different views during training. The flowchart shown in Figure 7, represents the overall training pipeline, including class sampling, patch extraction, and model input stages.

Finally, the extracted patches were passed to a CNN for softmax probability calculation and loss computation, followed by weight updates via back-propagation. Repeating this procedure over multiple epochs progressively reduces bias toward majority classes and increases patch diversity. Detailed parameters—such as the total patch count, valid-area cutoff, and batch size—are provided later in Section 2.3.

2.2.3. Inference Pipeline

During inference, we employed the same tiling procedure as the baseline (FPE + NCB), ensuring that any performance difference stemmed solely from the training scheme. Specifically, each parcel image was divided into overlapping patches, covering the entire farmland region. Any patch whose valid area was below 10% was discarded to avoid low-quality inputs. As illustrated in Figure 8, the trained network then computed softmax probabilities (

p (c ∣ {patch}_{i})

) for each retained patch.

We aggregated these patch-level predictions by multiplying the class-wise probabilities and selecting the class that maximized the product:

{\hat{y}}_{parcel} = \arg \max_{c} \{\prod_{i = 1}^{N} p (c ∣ {patch}_{i})\} .

Hence, the inference process remains identical to that of FPE + NCB, ensuring a fair comparison that isolates the effects of CBS and RPE on final performance. In this manner, the final parcel label (

{\hat{y}}_{parcel}

) reflects the combined evidence across all retained patches. Details of the inference configuration and parameter choices are provided in Appendix A.

2.3. Implementation and Evaluation

The experiments were conducted in Python (3.10.15) with PyTorch (2.4.1). We used the Adam optimizer throughout training, with an initial learning rate of

1 \times 10^{- 3}

; whenever the validation performance plateaued for several epochs, the PyTorch ReduceLROnPlateau scheduler reduced the learning rate by a factor of 0.1. The batch size (32) and the number of epochs (100) were, likewise, fixed. Data augmentation, including rotations at 0°, 90°, 180°, and 270°, as well as horizontal and vertical flips, was applied after patch extraction, exposing the network to multiple orientations.

To compare training schemes rather than network architectures, we fixed the model architecture to EfficientNet-B0 [49]. EfficientNet has demonstrated strong performance across various remote sensing tasks, making it a suitable choice for such applications [50,51,52]. The model was initialized using ImageNet-1k pretrained weights, which have proven effective in remote sensing classification tasks in prior studies [53]. After introducing CBS, we maintained a total number of training iterations identical to that of the baseline, ensuring the same training steps for each approach. Similarly, RPE was aligned with the baseline settings for the number of patches extracted from each parcel and the same 10% valid-area threshold as FPE. If a patch’s valid area fell below that threshold, we simply drew a new

(x, y)

offset (up to ten attempts) and, if still invalid, defaulted to a parcel-centered patch to preserve the required patch count. This measure ensures that no confounding factors, aside from the training scheme itself, could influence the results.

Model performance was primarily evaluated using accuracy, macro F1, and the Kappa coefficient [54]. Accuracy reflects how many predictions are correct, and it is primarily influenced by the majority classes in the evaluated dataset. Macro F1 computes the unweighted average of F1 scores across all classes, enabling a balanced evaluation, regardless of class distribution. Unlike simple accuracy, the Kappa coefficient provides a chance-corrected measure of classification performance, offering a more precise assessment of agreement between predicted and true labels. In addition, we supplemented the evaluation with class-wise F1 scores for each class. Particular attention was given to minority classes such as ginseng and tree nursery. To account for computational efficiency during training, we also tracked the average training time per epoch and monitored memory usage under each configuration. We also employed a validation set (Val) from the same domain (year/region) as the training set and a test set (Test) from a different domain to assess both in-domain and out-of-domain generalization. Since tiling-based methods are only applicable to models with fixed input sizes, we varied only CBS and RPE while keeping the EfficientNet-B0 backbone fixed. This approach isolated the effect of the proposed patch-based methods, ensuring that performance changes were due to the data processing techniques rather than modifications to the network architecture.

Additionally, we investigated how enlarging the input parcels influences classification performance. In this study, we define the concept of patch coverage as the spatial area that each individual patch actually contains. Tiling is often adopted in resource-constrained environments where entire images cannot be processed at once due to memory limitations, resulting in a reduced patch-level context [55]. In tiling-based classification, this leads to decreased contextual information per patch while increasing the number of patches, thereby enhancing the aggregation effect. To simulate such practical conditions without introducing architectural confounders, we applied bilinear upscaling to the input parcels while keeping both the patch size and model architecture fixed. This allowed us to isolate the effects of patch coverage on performance. To systematically examine these effects, we applied bilinear interpolation to upscale parcel images by different scaling factors (1×, 2×). When each parcel image is up-scaled by a fixed factor and patches of the original resolution are then extracted, two opposing effects arise: (i) context reduction—every patch covers a smaller ground area, so the information available to the network per patch decreases; and (ii) patch-count increase—a larger parcel yields more patches, giving the model a greater number of predictions to aggregate.

Because these two effects work in opposite directions, upscaling presents a clear trade-off between the loss of contextual detail and the benefit of improved aggregation from an increased number of patches. Patch predictions are combined by multiplying their softmax probabilities class-wise so that consistent classes are reinforced while noisy predictions are attenuated; this aggregation provides a compensatory effect at the parcel level.

Our experiment measured the drop in patch-level performance caused by context reduction and the extent to which parcel-level aggregation recovers that loss. We also compared the robustness of RPE on its own, CBS on its own, and the combined RPE + CBS scheme under reduced-context conditions. Finally, we verified whether the additional patches generated by up-scaling translated into higher parcel-level accuracy, thereby strengthening the case for tiling-based inference.

As illustrated in Figure 9, the same farmland parcel is processed at 1× (no upscaling) and 2× (bilinearly upscaled) by our inference pipeline. At the 2× scale, patch coverage decreases to approximately 1/4 compared to the 1× scale, reducing context information per patch. However, the total number of extracted patches increases, allowing the model to aggregate more patch-level predictions. This visual comparison underscores the trade-off between reduced contextual information per patch and an increase in the number of patches contributing to parcel-level aggregation.

3. Results

This section presents the results of experiments carried out to evaluate the effectiveness of our proposed method. In Section 2.2.1, we introduced the core techniques, random patch extraction (RPE) and class-balanced sampling (CBS). Here, we compare them against two baselines, fixed patch extraction (FPE), which uses a fixed offset per parcel, and no class balancing (NCB), which preserves the original class distribution, by performing an ablation study that involves gradually adds RPE, CBS, and the combined RPE + CBS on top of the baselines.

Specifically, we (i) measure model performance using accuracy, the macro F1 score, and Kappa to determine how our techniques affect overall classification performance; (ii) examine class-wise F1 scores to check whether minority classes (e.g., ginseng and tree nursery) benefit from RPE and CBS; (iii) evaluate patch-level predictions to gauge the model’s direct performance, as well as parcel-level predictions to see how aggregation (i.e., combining patches) enhances the final outcome; and (iv) analyze computational efficiency in terms of training time and memory usage across different configurations.

As noted in Section 2.3, all experiments were performed under both 1× (no upscaling) and 2× (bilinearly upscaled) settings. In the 2× environment, bilinear interpolation enlarges the image, reducing each patch’s spatial context but increasing the number of overlapping patches per parcel, thereby compensating for the loss at the patch level via aggregated predictions.

3.1. Overall Performance

The ablation study’s overall performance (accuracy, macro f1 score, and kappa) at 1× and 2× is summarized in Table 2a (validation set) and Table 2b (test set).

Based on these results, we observe that the baseline configuration—NCB + FPE—exhibits limited performance due to two key issues: FPE restricts data diversity by applying a fixed offset, and NCB amplifies class imbalance by preserving the natural class distribution. In contrast, RPE, when replacing FPE, enhances training diversity through random offsets. CBS, when substituting NCB, mitigates majority-class bias by enforcing uniform class sampling. Both modifications lead to individual performance gains over the baseline. When combined, the full RPE + CBS scheme achieves the greatest improvements across all evaluation settings.

In particular, the test set under the 2× condition is the most challenging, given not only domain differences but also reduced patch context. Nevertheless, RPE + CBS raises accuracy from 0.4857 (baseline) to 0.8033 (+31.76%p), the macro F1 score from 0.4394 to 0.7166 (+0.2772), and Kappa from 0.4227 to 0.7687 (+0.3410). Such a large improvement indicates that random patch extraction and class-balanced sampling remain highly effective, even when the patch context is reduced and the domain differs substantially.

3.2. Per-Class Performance

Table 3 and Table 4 summarize the per-class F1 scores for the validation (Validation) and test (Test) sets, respectively. In this section, we primarily focus on the minority classes—namely, ginseng and tree nursery—while also briefly noting changes in major classes. We examine how each method’s F1 score changes (

∆

Val/

∆

Test) compared to the baseline (NCB + FPE).

RPE enhances training diversity by introducing random offsets during patch selection. While this approach generally improves performance across various classes, including several major ones, our analysis focuses on its effects on minority classes. For instance, ginseng (

1 \times

) showed substantial gains (+0.0696/+0.0788). However, tree nursery (

2 \times

) exhibited a performance drop (−0.2025/−0.0795), indicating that RPE does not consistently benefit all minority classes.

CBS addresses class imbalance by ensuring equal sampling probability across all classes, thereby providing more consistent learning opportunities for minority classes. While the number of training examples for most major classes is moderately reduced by undersampling, their performance sometimes improves—likely because balancing helps the model learn clearer class boundaries. Ginseng (

2 \times

) shows a large improvement of

(+ 0.4000 / + 0.4332)

. In contrast, ginseng (

1 \times

) sees an improvement of

(+ 0.0600 / + 0.0123)

, which is on par with other methods on the validation set but significantly lower on the test set. Tree nursery (

2 \times

) also improves to

+ 0.4086

on the validation set, but the gain is limited to

+ 0.0349

on the test set. These results suggest that the use of CBS alone may lead to performance gaps between validation and test sets in certain minority classes.

When RPE and CBS are combined, the proposed method achieves top or near-top F1 scores for minority classes under both

1 \times

/

2 \times

conditions on the validation and test sets, and it also frequently attains the highest scores among the major classes. Overall, this indicates stable and consistent improvement for minority classes. In particular, tree nursery(

2 \times

) rises to

(+ 0.6171 / + 0.4044)

, yielding the highest observed performance gain. Further interpretation of these results is presented in Section 4.1.

3.3. Tiling Aggregation Experiment

As described in Section 2.2.3, parcel-level classification is achieved by subdividing each parcel into multiple overlapping patches and aggregating their individual predictions. Although our final objective is parcel-level prediction, we separately evaluate patch-level performance to assess the model’s robustness under reduced contextual information and to verify how aggregation improves final accuracy. Therefore, we first evaluate the model at the patch level, then compare it to parcel-level results after aggregation, as summarized in Table 5.

Under the 2× environment, each patch covers less contextual information compared to the 1× environment, while the total number of overlapping patches per parcel increases. In the 1× setting, parcel-level accuracy improved by an average of +0.0114 over patch-level accuracy (with a maximum of +0.0238), the macro F1 score rose by an average of +0.0197 (up to +0.0293), and the Kappa improved by an average of +0.0179 (up to +0.0296). In contrast, under the 2× setting, parcel-level performance improved by an average of +0.0345 in accuracy (up to +0.0518), +0.0417 in macro F1 score (up to +0.0542) and +0.0458 in Kappa (up to +0.0669), highlighting how an increased number of patches per parcel can enhance aggregation effectiveness.

Because each patch necessarily covers less context in the 2× setting, every method exhibited lower patch-level performance compared to the 1× setting. In particular, for the baseline approach, switching from 1× to 2× caused patch-level accuracy to drop by 0.2015, the macro F1 score to decrease by 0.1835, and the Kappa to fall by 0.2151, underscoring that the loss of contextual information makes the model’s prediction task more challenging. A more detailed analysis of these results is provided in Section 4.1.2.

Summarizing all experimental findings in the Results section, our combined approach (RPE + CBS) consistently surpassed both the baseline and single-method variants in in-domain experiments (same year/region) and demonstrated robust performance, even in out-of-domain experiments (different year/region). These findings underscore the method’s effectiveness for farmland classification, especially for minority classes such as ginseng and tree nursery. In the following Discussion section, we address potential limitations and propose future directions.

3.4. Computational Efficiency

To assess potential increases in computational overhead due to additional preprocessing required by RPE and CBS compared to the baseline, we measured the average training time and memory usage over 100 epochs for each experimental configuration in the 1× setting; results are summarized in Table 6.

Using RPE increased training time by only 1.3% compared to FPE due to the additional step of verifying valid regions within each patch. In contrast, CBS introduced negligible computational overhead, as it merely involves selecting the class from which patches are extracted, exhibiting computational efficiency similar to that of the baseline. Overall, the combined approach of RPE and CBS effectively enhanced classification performance at the cost of a marginal increase in computation.

4. Discussion

Building on the experimental results presented above, this section offers a deeper examination of the proposed method’s effectiveness and limitations.

The discussion proceeds as follows. First, focusing on the observed performance patterns among minority classes, we analyze how random patch extraction (RPE) and class-balanced sampling (CBS) produce distinct effects depending on each class’s relative difficulty. Second, we comprehensively analyze the trade-off arising from changes in patch coverage, where intra-patch context and the number of patches vary inversely. Specifically, when patch coverage increases, intra-patch context grows, but the number of patches decreases, weakening the aggregation effect. Conversely, when patch coverage decreases, intra-patch context diminishes, but the number of patches increases, strengthening the aggregation effect. In this section, we examine how this trade-off impacts overall performance. Finally, we summarize the key limitations revealed by the experimental results and propose future research directions and potential expansions. By exploring these issues from multiple perspectives, we clarify why the proposed approach remains stable, even under severe class imbalance and reduced resolution; verify the impact of aggregation; and discuss its practical applicability and constraints in real-world remote sensing scenarios.

4.1. Difficulty-Based Performance Analysis on Minority Classes

In order to interpret the different performance variations observed in minority classes (Section 3.2) more thoroughly, this section presents a performance analysis based on the difficulty level of each class. Specifically, we first assess how challenging each minority class is to predict, then revisit the method-specific performance changes to clarify why improvements (or declines) arise, as reported in Section 3.2.

4.1.1. Class Difficulty Assessment

Figure 10 shows representative parcel images of ginseng and tree nursery, making it straightforward to observe the textural differences between these classes. Ginseng parcels typically maintain uniform coloration and vegetation patterns, whereas tree nursery parcels feature uneven textures; the color and shape can vary considerably according to each seedling’s growth stage and cultivar, and multiple distinct traits are often intermingled at the parcel level.

To quantitatively support this visual assessment, Table 7 summarizes the validation (same domain) metrics obtained under a baseline method in the 1× and 2× environments. In particular, we compare the difficulty of classifying ginseng and tree nursery using the F1 score, partial confusion matrices, and average confidence, excluding the test set because the domain discrepancy made it unsuitable for difficulty analysis.

Ginseng (1×) recorded an F1 score of 0.9043, a correct-prediction rate of about 89.7%, and an average confidence of 0.8837, signifying high predictive performance overall. Its parcel interior remains relatively uniform, which likely facilitated easy training and inference, making Ginseng (1×) an easy class to classify.

Ginseng (2×) is still the same ginseng class, but due to 2× upscaling, the patch coverage area was effectively reduced to less than half, leading to an F1 score of 0.5500, a correct prediction rate of 37.9%, and an average confidence of 0.7680. These results indicate a noticeable drop compared to ginseng (1×), with most misclassifications going to majority classes such as mulched field. Consequently, ginseng (2×) can be deemed a normal difficulty level.

Meanwhile, tree nursery (1×) obtains an F1 score of 0.4694, a correct prediction rate of 39.1%, and an average confidence of 0.6562, reflecting generally low predictive performance. Most misclassifications occurred as orchard. Within a single parcel, the color and shape are highly variable, depending on each seedling’s growth stage and cultivar, forming an uneven texture that makes it difficult for the model to recognize tree nursery as a single class. Hence, tree nursery (1×) is considered difficult.

Tree nursery (2×) undergoes further context loss from 2× upscaling, yielding a correct prediction rate of only 11.6%, an F1 score of 0.2025, and an average confidence of 0.6170. Already irregular textures fragment even more, resulting in frequent misclassifications as orchard or other classes, placing tree nursery (2×) in a very difficult class.

In summary, combining these validation metrics with the visual observations, we distinguish the following difficulty levels: ginseng (1×) as easy, ginseng (2×) as normal, tree nursery (1×) as difficult, and tree nursery (2×) as very difficult. Ginseng’s relatively uniform internal texture supports stable training and inference, while tree nursery’s more complex vegetation distribution within a single parcel ultimately increases the model’s prediction difficulty.

4.1.2. Effects of Each Method According to Class Difficulty

Based on the class difficulties defined in Section 4.1.1, we compare the effects of RPE and CBS in Table 8. All performance values are presented in the form of (

∆

Val/

∆

Test).

Ginseng (1×) is classified as an easy class. When using CBS alone, the validation F1 score improved by (+0.0600), but the test F1 score increased by only (+0.0123), indicating a relatively modest improvement on the test set. This supports the observation in [56] that excessive oversampling for a low-difficulty class can induce model bias and degrade generalization. In contrast, applying RPE alone—via random patch extraction at various positions—enhanced data diversity and yielded balanced performance gains of (+0.0696/+0.0790), improving the model’s generalization ability. When RPE and CBS were combined, the data diversity from RPE offset the overfitting tendency sometimes seen with CBS alone, yielding the highest domain adaptation gains (+0.0581/+0.1241).

Under 2× upscaling, ginseng (2×) suffers reduced context information and is considered a normal-difficulty class. Even with CBS alone, it achieved (+0.4000/+0.4332), which falls only slightly short (

- 0.0500

/

- 0.0042

) of the RPE + CBS combination. Although RPE alone produced stable gains (+0.3015/+0.3051), it was more limited compared to CBS. Combining RPE and CBS led to the highest performance (+0.4500/+0.4374). These results imply that the class is neither extremely low in difficulty nor lacking distinct structural features, so oversampling can be applied without causing overfitting—unlike the oversampling tendency observed in ginseng (1×). Hence, for oversampling to work effectively, a sufficiently moderate difficulty level and clear class texture are both required.

Tree nursery (1×) is deemed a difficult class. CBS alone improved performance by (+0.3185/+0.2750), whereas RPE alone provided a somewhat limited gain (+0.3119/+0.1579). This indicates that as class difficulty increases, merely increasing data diversity (RPE) without additional training cycles is insufficient, and repeated sampling for minority classes (CBS) becomes more vital. Combining RPE and CBS offers both the stability of repeated sampling and enhanced data diversity, producing the greatest overall improvement (+0.4409/+0.3239).

Tree nursery (2×) is considered very difficult. It remains the same tree nursery class but, under the 2× condition, suffers a significant loss of context information. Applying RPE alone further intensified the model’s confusion, failing to make any correct predictions for this class, while CBS alone produced a substantial improvement on the validation set (+0.4086) but had limited effects on the test set (+0.0349), consistent with [57], which notes that oversampling of highly complex minority classes may be restricted or counterproductive. These findings show that neither method alone can overcome both context loss and class uncertainty. However, combining the two methods secures both repeated training exposure and greater data diversity simultaneously, resulting in the highest improvement (+0.6171/+0.4044).

In this study, we observed that the classification performance of RPE and CBS varied according to the difficulty level of minority classes, indicating that factors such as texture complexity and available context strongly influence the effectiveness of these methods for minority-class classification. While RPE alone enriches the training set’s diversity and can enhance generalization, high difficulty and complex texture structures may exacerbate confusion. CBS alone ensures repeated sampling for minority classes but risks overfitting if the class is already too easy. Hence, combining RPE and CBS compensates for each approach’s limitations, delivering more consistent and stable performance improvements across varying difficulty levels and structural class characteristics.

4.2. Analysis of Trade-Off According to Patch Coverage Change

As discussed previously in Section 2.3, patch coverage has two opposing effects on model performance. When patch coverage decreases, the contextual information per patch diminishes, whereas the number of patches extracted per parcel increases, enhancing the aggregation effect. In this section, we comprehensively analyze the trade-off between these two effects and examine how varying levels of patch coverage impact model performance, including under conditions of severely reduced coverage.

4.2.1. Patch-Count Increase and Aggregation Benefits

As previously observed in Section 3.3, upscaling allows more patches to be extracted per parcel, thereby increasing the total number of patches for parcel-level inference. This configuration makes parcel-level aggregation more informative, providing a strong rationale for tiling-based approaches.

Table 5 indicates that under the

1 \times

condition, the gain from patch level to parcel level was modest (accuracy up to +0.0238, macro F1 score up to +0.0293, and Kappa up to +0.0296), whereas under the

2 \times

condition, the same aggregation offered more pronounced improvements (accuracy up to +0.0518, macro F1 score up to +0.0542, and Kappa up to +0.0669). Thus, increasing the patch count is not simply a matter of having more predictions but also of combining different patch-level outputs for more stable parcel-level decisions.

However, FPE + CBS showed a slight accuracy drop of 0.0019 after aggregation. Product-based aggregation can yield robust performance when patches exhibit uniformly high confidence, effectively capturing a consensus among similar texture patterns, yet it also demands consistent outputs across all patches, so a single outlier patch that assigns extremely low confidence to the correct class can substantially diminish the final aggregated score. This trade-off helps explain why, in most configurations, aggregation provided noticeable gains—particularly for classes with relatively uniform texture—while FPE + CBS encountered a drop in this scenario due to one or more outlier patches. Despite this limitation, under upscaling, tiling-based parcel-level prediction continued to demonstrate its structural advantage overall, compensating for potential instabilities at the patch level in most cases.

4.2.2. Patch-Level Robustness Under Reduced Context

Meanwhile, as also discussed in Section 3.3, upscaling causes each patch to cover a smaller real-world area, thereby reducing the available contextual information. In this section, we further analyze how each method responds when patch-level context is diminished. Upscaling enlarges the input parcel by a certain factor but extracts patches of the same size, so effectively, the patch coverage shrinks. This leads to a general decline in model performance, though each method exhibits different levels of robustness to this loss of information.

According to the patch-level metrics presented in Table 5 (Section 3.3), switching from

1 \times

to

2 \times

caused the baseline (FPE + NCB) to drop by

0.2015

in accuracy, by

0.1835

in macro F1 score, and by

0.2151

in Kappa. In contrast, RPE + CBS led to a declines of only 0.0068 (accuracy), 0.0111 (macro F1), and 0.0082 (Kappa), indicating the smallest performance drop and maintaining high predictive stability, even under reduced context conditions. This RPE + CBS combination exhibited the least performance degradation at the patch level, thanks to its complementary approach of increased data diversity and balanced class sampling. This confirms that this strategy remains robust in scenarios where patch-level context is limited.

4.2.3. Robustness Under Extremely Reduced Patch Coverage

To further examine how each method performs under extremely reduced patch coverage, we conducted additional experiments at a higher scaling factor (3× scaling), where patch coverage is approximately 1/9 of the original 1× condition. Table 9 summarizes the model performance results under these conditions, where the context information per patch is drastically reduced but the number of extracted patches significantly increases, enhancing the aggregation effect.

As shown in Table 9, all methods failed to achieve their peak performance under the 3× scaling condition. This indicates that when patch coverage becomes excessively small, the context information within each patch is severely limited, causing a sharp decrease in individual patch-level prediction accuracy. Consequently, even the increased aggregation effect due to a higher patch count is insufficient to fully compensate for this decline. However, even under these extreme conditions of reduced patch coverage, the proposed CBS+RPE method exhibited the smallest performance degradation compared to the other methods, maintaining relatively high accuracy and stability despite the limited information available. This result suggests that CBS + RPE can provide comparatively robust performance, even in situations where patch coverage is extremely reduced.

In conclusion, these supplementary experiments highlight the inherent trade-off between patch-level contextual information and the aggregation effect across patches. While reduced patch coverage can enhance aggregation, it simultaneously limits contextual cues crucial for accurate prediction.

Despite this trade-off, our results demonstrate that the proposed CBS + RPE method maintains high performance, even under severely reduced patch coverage, suggesting strong robustness in resource-constrained environments. This indicates that the method is well-suited for real-world tiling-based classification systems, particularly in scenarios involving limited hardware capacity or varying spatial resolutions.

4.3. Limitations

4.3.1. No External Context Consideration

In some cases, plowed rice paddy and plowed field share very similar internal cultivation patterns, as shown in Figure 11, making them difficult to distinguish based solely on parcel-internal textures. In reality, a rice paddy is typically situated on lower ground to retain water, whereas a field is often located on slightly higher terrain to facilitate drainage. However, in this study’s approach, all external background information is masked out so that the model trains only on the parcel’s interior textures. As a result, crucial contextual cues—such as ground elevation and drainage conditions—are excluded, preventing the model from adequately separating plowed rice paddy and plowed field. This frequent confusion between the two classes suggests that supplementary strategies incorporating “external background” data may be necessary.

4.3.2. Vulnerability to Heterogeneous Internal Textures

In the mulched field class, certain parcels may not be uniformly mulched across their entire area, leaving some portions mulched and others unmulched (see Figure 12). Under these partial mulching conditions, specific patches may capture only unmulched regions and, thus, become prone to misclassification as a different class (for example, plowed field), even though they are genuinely part of the mulched field parcel. Furthermore, in a tiling-based approach, multiple patch-level softmax probabilities are multiplied (product) to obtain the final score; therefore, if even one patch assigns an extremely low probability to the correct class, the overall product drops sharply, causing the final prediction to deteriorate. These observations indicate that tiling-based methods can be vulnerable when textures are not uniform, highlighting the need for additional measures to address such heterogeneous scenarios.

5. Conclusions

In this study, we introduced a class-balanced random patch training strategy to improve tiled crop-type classification under class imbalance. By integrating random patch extraction (RPE) with class-balanced sampling (CBS), our method enriches patch diversity and ensures that minority crop classes receive equitable training attention. We demonstrated that this combined approach substantially boosts classification performance: on the FarmMap dataset, our models achieved large gains in minority-class F1 scores (up to +0.62 on a validation set and +0.40 on an out-of-domain test set) while maintaining high overall accuracy and macro F1 scores. Notably, these benefits persisted across different difficulty levels of minority classes, indicating that RPE + CBS reliably aids in even the most challenging categories. The proposed pipeline also proved robust to domain shifts (using data from different years and regions), suggesting good generalizability. These results confirm that the method effectively addresses the structural issues of class imbalance and patch duplication in tiling-based farmland classification. Despite its effectiveness, the method has a few limitations. It does not utilize external contextual cues and may be sensitive to within-parcel textural variability. Addressing these aspects could further enhance reliability in real-world settings. Future work should incorporate information on the surrounding area as additional input and explore aggregation methods that are more tolerant to local textural inconsistency.

Author Contributions

Conceptualization, Y.B. (Yeongung Bae); methodology, Y.B. (Yeongung Bae); software, Y.B. (Yeongung Bae); validation, Y.B. (Yeongung Bae) and Y.B. (Yuseok Ban); formal analysis, Y.B. (Yeongung Bae) and Y.B. (Yuseok Ban); writing—original draft preparation, Y.B. (Yeongung Bae); writing—review and editing, Y.B. (Yeongung Bae) and Y.B. (Yuseok Ban); visualization, Y.B. (Yeongung Bae); supervision, Y.B. (Yuseok Ban); project administration, Y.B. (Yuseok Ban). All authors have read and agreed to the published version of the manuscript.

Funding

This work was partly supported by a National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2022R1A5A8026986), a Korea Institute for Advancement of Technology(KIAT) grant funded by the Korea Government (MOTIE) (P0020536, HRD Program for Industrial Innovation), and the Institute of Information & Communications Technology Planning & Evaluation (IITP)—Innovative Human Resource Development for Local Intellectualization program grant funded (20%) by the Korea government (MSIT)(IITP-2025-RS-2020-II201462).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Restrictions apply to the availability of these data. The data were obtained from the Korea Agency of Education, Promotion, and Information Service in Food, Agriculture, Forestry, and Fisheries (EPIS) and are not publicly available. The authors do not have permission to redistribute the data.

Conflicts of Interest

Author Yeongung Bae was employed by the company OptAI. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

CNN	Convolutional Neural Network
OBIA	Object-Based Image Analysis
FPE	Fixed Patch Extraction
RPE	Random Patch Extraction
CBS	Class-Balanced Sampling
NCB	No Class Balancing

Appendix A. Experimental Basis for Configuration Selection

This appendix presents the experimental analysis and final selection of configuration strategies applied in this study. Specifically, we investigated three design components: (1) the patch overlap strategy, (2) the valid-area threshold, and (3) the aggregation method. Each component was empirically evaluated in terms of its impact on classification performance and computational efficiency, and a final decision was made based on the experimental results.

Appendix A.1. Patch Overlap Strategy

The patch overlap strategy adjusts the degree of spatial redundancy between adjacent patches during inference. In semantic segmentation, such an overlap has been shown to help preserve boundary information [58,59], but its effectiveness in classification tasks has not been clearly validated. Motivated by these findings, we examined whether a similar effect could be observed in patch-level classification tasks. To this end, we compared three minimum overlap ratio settings: 0%, 5%, and 10%. While a higher minimum overlap increased the number of inference patches and the overall computation time, it did not lead to meaningful performance gains and, in some cases, even caused performance degradation. Therefore, to maintain computational efficiency and design simplicity, we decided not to apply a minimum overlap constraint and fixed the overlap ratio at 0% for all subsequent experiments.

Appendix A.2. Valid-Area Threshold

The valid-area threshold defines the minimum proportion of farmland pixels within a patch required for it to be considered valid during training and inference. If the threshold is set too low, patches may contain mostly background, introducing noise during training and providing insufficient visual cues during inference. This aligns with previous findings that a low proportion of the target class within a patch can reduce classification accuracy [60].

Conversely, if the threshold is set too high, boundary patches may be filtered out more than necessary, resulting in reduced data diversity and degraded generalization performance. This also corresponds to observations in prior studies showing that removing patches with a high proportion of background can lead to the loss of boundary information and cause overfitting to interior-dominant patterns [60].

To identify a balanced setting, we compared valid-area thresholds of 5%, 10%, and 15%. At 5%, many low-quality patches were included, degrading performance due to insufficient contextual information. At 15%, boundary patches were filtered out more than necessary, limiting spatial variation. In contrast, a threshold of 10% provided the best trade-off between maintaining contextual integrity and ensuring patch diversity. Accordingly, we adopted 10% as the default valid-area threshold throughout both the training and inference pipelines.

Appendix A.3. Aggregation Strategy

In the inference phase, multiple patch-level predictions must be aggregated to produce a single parcel-level classification result. We compared three aggregation methods: majority voting, class-wise summation, and class-wise product of softmax probabilities.

Among the three, the product method consistently yielded the most stable and accurate results. This appears to be particularly effective in our setting, where parcels often have uniform internal textures, leading to consistent patch-level predictions that can be reinforced through multiplicative aggregation.

To ensure a fair and consistent comparison across all experiments, we adopted product-based aggregation as the unified strategy for final prediction.

To summarize the effects of all configuration settings discussed above, we provide a comparative overview in the following table. The experimental results discussed above are summarized in Table A1, which provides a consolidated comparison of performance across all configuration combinations—including patch overlap settings, valid-area thresholds, and aggregation methods—based on accuracy, macro F1, and the Kappa coefficient.

Table A2 presents the classification performance under domain-shift conditions across various configurations. The results indicate that our proposed method consistently maintained higher and more stable performance compared to the baseline, demonstrating its robustness to spatio-temporal variability.

Table A1. Parcel-level classification performance (accuracy, macro F1, and Kappa) on the validation set, comparing the baseline (NCB + FPE) and our method (CBS + RPE) under different combinations of minimum overlap, valid-area thresholds, and aggregation strategies.

Minimum Overlap	Valid-Area Threshold	Aggregation	NCB + FPE (Baseline)			CBS + RPE (Ours)
Minimum Overlap	Valid-Area Threshold	Aggregation	Accuracy	Macro F1	Kappa	Accuracy	Macro F1	Kappa
0%	5%	vote	0.8763	0.7896	0.8593	0.9093	0.8510	0.8970
		sum	0.8770	0.7925	0.8601	0.9132	0.8591	0.9014
		product	0.8762	0.7909	0.8591	0.9123	0.8587	0.9004
	10%	vote	0.9643	0.9146	0.9595	0.9910	0.9815	0.9898
		sum	0.9643	0.9120	0.9595	0.9909	0.9814	0.9897
		product	0.9645	0.9130	0.9598	0.9910	0.9815	0.9898
	15%	vote	0.9019	0.8482	0.8885	0.9330	0.8587	0.9241
		sum	0.9039	0.8508	0.8908	0.9346	0.8639	0.9259
		product	0.9037	0.8512	0.8905	0.9353	0.8632	0.9268
5%	5%	vote	0.8762	0.7895	0.8591	0.9101	0.8516	0.8979
		sum	0.8769	0.7923	0.8600	0.9137	0.8595	0.9020
		product	0.8765	0.7930	0.8595	0.9131	0.8592	0.9012
	10%	vote	0.9643	0.9146	0.9595	0.9910	0.9815	0.9898
		sum	0.9644	0.9121	0.9596	0.9909	0.9814	0.9897
		product	0.9647	0.9132	0.9600	0.9910	0.9815	0.9898
	15%	vote	0.9020	0.8478	0.8886	0.9931	0.8591	0.9243
		sum	0.9038	0.8502	0.8906	0.9345	0.8639	0.9258
		product	0.9035	0.8510	0.8903	0.9353	0.8632	0.9268
10%	5%	vote	0.8762	0.7895	0.8591	0.9106	0.8520	0.8985
		sum	0.8768	0.7922	0.8599	0.9139	0.8597	0.9022
		product	0.8759	0.7924	0.8589	0.9133	0.8594	0.9015
	10%	vote	0.9643	0.9146	0.9595	0.9910	0.9815	0.9898
		sum	0.9643	0.9121	0.9595	0.9909	0.9814	0.9897
		product	0.9648	0.9131	0.9600	0.9910	0.9815	0.9898
	15%	vote	0.9034	0.8509	0.8902	0.9330	0.8590	0.9241
		sum	0.9037	0.8501	0.8905	0.9345	0.8639	0.9258
		product	0.9034	0.8506	0.8902	0.9353	0.8632	0.9268

Table A2. Parcel-level classification performance (accuracy, macro F1, and Kappa) on the test set under the same parameter combinations as Table A1. These results verify the robustness and generalization ability of the proposed method under domain shift.

Minimum Overlap	Valid-Area Threshold	Aggregation	NCB + FPE (Baseline)			CBS + RPE (Ours)
Minimum Overlap	Valid-Area Threshold	Aggregation	Accuracy	Macro F1	Kappa	Accuracy	Macro F1	Kappa
0%	5%	vote	0.5656	0.5390	0.5129	0.7419	0.6423	0.6962
		sum	0.5673	0.5377	0.5144	0.7436	0.6438	0.6981
		product	0.5677	0.5393	0.5148	0.7441	0.6445	0.6987
	10%	vote	0.6958	0.6169	0.6508	0.7802	0.7033	0.7435
		sum	0.6950	0.6156	0.6500	0.7813	0.7049	0.7446
		product	0.6957	0.6168	0.6506	0.7811	0.7043	0.7444
	15%	vote	0.5681	0.5467	0.5181	0.7799	0.6770	0.7410
		sum	0.5710	0.5483	0.5210	0.7819	0.6802	0.7433
		product	0.5698	0.5485	0.5199	0.7815	0.6799	0.7431
5%	5%	vote	0.5654	0.5387	0.5127	0.7419	0.6421	0.6962
		sum	0.5675	0.5381	0.5146	0.7438	0.6439	0.6983
		product	0.5677	0.5393	0.5149	0.7442	0.6446	0.6988
	10%	vote	0.6957	0.6167	0.6507	0.7803	0.7033	0.7435
		sum	0.6949	0.6156	0.6498	0.7813	0.7048	0.7446
		product	0.6956	0.6168	0.6506	0.7811	0.7043	0.7444
	15%	vote	0.5680	0.5466	0.5180	0.7799	0.6770	0.7410
		sum	0.5707	0.5482	0.5207	0.7818	0.6801	0.7432
		product	0.5697	0.5483	0.5198	0.7816	0.6799	0.7431
10%	5%	vote	0.5651	0.5382	0.5123	0.7421	0.6423	0.6964
		sum	0.5673	0.5378	0.5144	0.7437	0.6438	0.6982
		product	0.5678	0.5394	0.5149	0.7439	0.6443	0.6985
	10%	vote	0.6956	0.6165	0.6506	0.7803	0.7034	0.7435
		sum	0.6948	0.6154	0.6498	0.7811	0.7045	0.7443
		product	0.6956	0.6167	0.6505	0.7809	0.7040	0.7441
	15%	vote	0.5676	0.5464	0.5177	0.7797	0.6767	0.7407
		sum	0.5704	0.5478	0.5204	0.7817	0.6798	0.7431
		product	0.5693	0.5477	0.5193	0.7814	0.6797	0.7429

References

Xue, H.; Fan, Y.; Dong, G.; He, S.; Lian, Y.; Luan, W. Crop classification in the middle reaches of the Hei River based on model transfer. Sci. Rep. 2024, 14, 28964. [Google Scholar] [CrossRef] [PubMed]
Foody, G.M.; Mathur, A. Toward intelligent training of supervised image classifications: Directing training data acquisition for SVM classification. Remote Sens. Environ. 2004, 93, 107–117. [Google Scholar] [CrossRef]
Phiri, D.; Morgenroth, J. Developments in Landsat Land Cover Classification Methods: A Review. Remote Sens. 2017, 9, 967. [Google Scholar] [CrossRef]
Li, Q.; Wang, C.; Zhang, B.; Lu, L. Object-Based Crop Classification with Landsat-MODIS Enhanced Time-Series Data. Remote Sens. 2015, 7, 16091–16107. [Google Scholar] [CrossRef]
Peña, J.M.; Gutiérrez, P.A.; Hervás-Martínez, C.; Six, J.; Plant, R.E.; López-Granados, F. Object-Based Image Classification of Summer Crops with Machine Learning Methods. Remote Sens. 2014, 6, 5019–5041. [Google Scholar] [CrossRef]
Li, H.; Tian, Y.; Zhang, C.; Zhang, S.; Atkinson, P.M. Temporal Sequence Object-based CNN (TS-OCNN) for crop classification from fine resolution remote sensing image time-series. Crop J. 2022, 10, 1507–1516. [Google Scholar] [CrossRef]
Li, H.; Zhang, C.; Zhang, Y.; Zhang, S.; Ding, X.; Atkinson, P.M. A Scale Sequence Object-based Convolutional Neural Network (SS-OCNN) for crop classification from fine spatial resolution remotely sensed imagery. Int. J. Digit. Earth 2021, 14, 1528–1546. [Google Scholar] [CrossRef]
Li, H.; Zhang, C.; Zhang, S.; Atkinson, P.M. A hybrid OSVM-OCNN Method for Crop Classification from Fine Spatial Resolution Remotely Sensed Imagery. Remote Sens. 2019, 11, 2370. [Google Scholar] [CrossRef]
Altun, M.; Turker, M. Integration of convolutional neural networks with parcel-based image analysis for crop type mapping from time-series images. Earth Sci. Inform. 2025, 18, 303. [Google Scholar] [CrossRef]
Nair, S.; Sharifzadeh, S.; Palade, V. Farmland Segmentation in Landsat 8 Satellite Images Using Deep Learning and Conditional Generative Adversarial Networks. Remote Sens. 2024, 16, 823. [Google Scholar] [CrossRef]
Zhang, K.; Yuan, D.; Yang, H.; Zhao, J.; Li, N. Synergy of Sentinel-1 and Sentinel-2 Imagery for Crop Classification Based on DC-CNN. Remote Sens. 2023, 15, 2727. [Google Scholar] [CrossRef]
Carvalho, O.L.F.d.; de Carvalho Júnior, O.A.; Albuquerque, A.O.d.; Bem, P.P.d.; Silva, C.R.; Ferreira, P.H.G.; Moura, R.d.S.d.; Gomes, R.A.T.; Guimarães, R.F.; Borges, D.L. Instance Segmentation for Large, Multi-Channel Remote Sensing Imagery Using Mask-RCNN and a Mosaicking Approach. Remote Sens. 2021, 13, 39. [Google Scholar] [CrossRef]
Gafurov, A. Automated Mapping of Cropland Boundaries Using Deep Neural Networks. AgriEngineering 2023, 5, 1568–1580. [Google Scholar] [CrossRef]
Kim, H.S.; Song, H.; Jung, J. Cadastral-to-Agricultural: A Study on the Feasibility of Using Cadastral Parcels for Agricultural Land Parcel Delineation. Remote Sens. 2024, 16, 3568. [Google Scholar] [CrossRef]
Persello, C.; Tolpekin, V.A.; Bergado, J.R.; de By, R.A. Delineation of agricultural fields in smallholder farms from satellite images using fully convolutional networks and combinatorial grouping. Remote Sens. Environ. 2019, 231, 111253. [Google Scholar] [CrossRef]
Vlachopoulos, O.; Leblon, B.; Wang, J.; Haddadi, A.; LaRocque, A.; Patterson, G. Delineation of Crop Field Areas and Boundaries from UAS Imagery Using PBIA and GEOBIA with Random Forest Classification. Remote Sens. 2020, 12, 2640. [Google Scholar] [CrossRef]
Shenoy, J.; Zhang, X.D.; Tao, B.; Mehrotra, S.; Yang, R.; Zhao, H.; Vasisht, D. Self-Supervised Learning across the Spectrum. Remote Sens. 2024, 16, 3470. [Google Scholar] [CrossRef]
Zhang, C.; Liu, T.; Xiao, J.; Lam, K.M.; Wang, Q. Boosting Object Detectors via Strong-Classification Weak-Localization Pretraining in Remote Sensing Imagery. IEEE Trans. Instrum. Meas. 2023, 72, 1–20. [Google Scholar] [CrossRef]
Zhang, C.Q.; Deng, Y.; Chong, M.Z.; Zhang, Z.W.; Tan, Y.H. Entropy-Based re-sampling method on SAR class imbalance target detection. ISPRS J. Photogramm. Remote Sens. 2024, 209, 432–447. [Google Scholar] [CrossRef]
Xu, T.; Zhao, Z.; Wu, J. Breaking the ImageNet Pretraining Paradigm: A General Framework for Training Using Only Remote Sensing Scene Images. Appl. Sci. 2023, 13, 11374. [Google Scholar] [CrossRef]
Aneece, I.; Thenkabail, P.S. Classifying Crop Types Using Two Generations of Hyperspectral Sensors (Hyperion and DESIS) with Machine Learning on the Cloud. Remote Sens. 2021, 13, 4704. [Google Scholar] [CrossRef]
Waldner, F.; Chen, Y.; Lawes, R.; Hochman, Z. Needle in a haystack: Mapping rare and infrequent crops using satellite imagery and data balancing methods. Remote Sens. Environ. 2019, 233, 111375. [Google Scholar] [CrossRef]
Zhang, Y.; Kang, B.; Hooi, B.; Yan, S.; Feng, J. Deep long-tailed learning: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 10795–10816. [Google Scholar] [CrossRef] [PubMed]
Islam, M.T.; Islam, M.R.; Uddin, M.P.; Ulhaq, A. A Deep Learning-Based Hyperspectral Object Classification Approach via Imbalanced Training Samples Handling. Remote Sens. 2023, 15, 3532. [Google Scholar] [CrossRef]
Douzas, G.; Bacao, F.; Fonseca, J.; Khudinyan, M. Imbalanced Learning in Land Cover Classification: Improving Minority Classes’ Prediction Accuracy Using the Geometric SMOTE Algorithm. Remote Sens. 2019, 11, 3040. [Google Scholar] [CrossRef]
Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357. [Google Scholar] [CrossRef]
He, H.; Bai, Y.; Garcia, E.A.; Li, S. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, 1–8 June 2008; pp. 1322–1328. [Google Scholar]
Yuan, J.; Zheng, Z.; Chu, C.; Wang, W.; Guo, L. A Hybrid Synthetic Minority Oversampling Technique and Deep Neural Network Framework for Improving Rice Yield Estimation in an Open Environment. Agronomy 2024, 14, 1890. [Google Scholar] [CrossRef]
Rendón, E.; Alejo, R.; Castorena, C.; Isidro-Ortega, F.J.; Granda-Gutiérrez, E.E. Data Sampling Methods to Deal with the Big Data Multi-Class Imbalance Problem. Appl. Sci. 2020, 10, 1276. [Google Scholar] [CrossRef]
Datta, D.; Mallick, P.K.; Reddy, A.V.N.; Mohammed, M.A.; Jaber, M.M.; Alghawli, A.S.; Al-qaness, M.A.A. A Hybrid Classification of Imbalanced Hyperspectral Images Using ADASYN and Enhanced Deep Subsampled Multi-Grained Cascaded Forest. Remote Sens. 2022, 14, 4853. [Google Scholar] [CrossRef]
Quan, Y.; Zhong, X.; Feng, W.; Chan, J.C.W.; Li, Q.; Xing, M. SMOTE-Based Weighted Deep Rotation Forest for the Imbalanced Hyperspectral Data Classification. Remote Sens. 2021, 13, 464. [Google Scholar] [CrossRef]
Roy, S.K.; Haut, J.M.; Paoletti, M.E.; Dubey, S.R.; Plaza, A. Generative Adversarial Minority Oversampling for Spectral–Spatial Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5500615. [Google Scholar] [CrossRef]
Al-Najjar, H.A.H.; Pradhan, B.; Sarkar, R.; Beydoun, G.; Alamri, A. A New Integrated Approach for Landslide Data Balancing and Spatial Prediction Based on Generative Adversarial Networks (GAN). Remote Sens. 2021, 13, 4011. [Google Scholar] [CrossRef]
Miftahushudur, T.; Grieve, B.; Yin, H. Permuted KPCA and SMOTE to Guide GAN-Based Oversampling for Imbalanced HSI Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 489–505. [Google Scholar] [CrossRef]
Yuan, W.; Li, P. Lightweight GAN-Assisted Class Imbalance Mitigation for Apple Flower Bud Detection. Big Data Cogn. Comput. 2025, 9, 28. [Google Scholar] [CrossRef]
Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollar, P. Focal Loss for Dense Object Detection. In Proceedings of the Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017. [Google Scholar]
Ding, X.; Wang, Z.; Peng, S.; Shao, X.; Deng, R. Research on Land Use and Land Cover Information Extraction Methods for Remote Sensing Images Based on Improved Convolutional Neural Networks. ISPRS Int. J. Geo-Inf. 2024, 13, 386. [Google Scholar] [CrossRef]
Farhadpour, S.; Warner, T.A.; Maxwell, A.E. Selecting and Interpreting Multiclass Loss and Accuracy Assessment Metrics for Classifications with Class Imbalance: Guidance and Best Practices. Remote Sens. 2024, 16, 533. [Google Scholar] [CrossRef]
Zhou, Z.; Zheng, C.; Liu, X.; Tian, Y.; Chen, X.; Chen, X.; Dong, Z. A Dynamic Effective Class Balanced Approach for Remote Sensing Imagery Semantic Segmentation of Imbalanced Data. Remote Sens. 2023, 15, 1768. [Google Scholar] [CrossRef]
Zhang, X.; Zhu, K.; Chen, G.; Tan, X.; Zhang, L.; Dai, F.; Liao, P.; Gong, Y. Geospatial Object Detection on High Resolution Remote Sensing Imagery Based on Double Multi-Scale Feature Pyramid Network. Remote Sens. 2019, 11, 755. [Google Scholar] [CrossRef]
Zhang, C.; Xiao, J.; Yang, C.; Zhou, J.; Lam, K.M.; Wang, Q. Integrally Mixing Pyramid Representations for Anchor-Free Object Detection in Aerial Imagery. IEEE Geosci. Remote Sens. Lett. 2024, 21, 6009905. [Google Scholar] [CrossRef]
Yang, C.; Rottensteiner, F.; Heipke, C. Classification of Land Cover and Land Use Based on Convolutional Neural Networks. Isprs Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2018, IV-3, 251–258. [Google Scholar] [CrossRef]
Yang, C.; Rottensteiner, F.; Heipke, C. Towards Better Classification of Land Cover and Land Use Based on Convolutional Neural Networks. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2019, XLII-2/W13, 139–146. [Google Scholar] [CrossRef]
Sun, Y.; Tian, Y.; Xu, Y. Problems of encoder-decoder frameworks for high-resolution remote sensing image segmentation: Structural stereotype and insufficient learning. Neurocomputing 2019, 330, 297–304. [Google Scholar] [CrossRef]
Zhao, Q.; Liu, J.; Li, Y.; Zhang, H. Semantic Segmentation with Attention Mechanism for Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5403913. [Google Scholar] [CrossRef]
Korea Agency of Education, Promotion and Information Service in Food, Agriculture, Forestry and Fisheries (EPIS). Farm-Map Service. Available online: https://agis.epis.or.kr/ (accessed on 7 February 2023).
Cira, C.I.; Manso-Callejo, M.Á.; Yokoya, N.; Sălăgean, T.; Badea, A.C. Impact of Tile Size and Tile Overlap on the Prediction Performance of Convolutional Neural Networks Trained for Road Classification. Remote Sens. 2024, 16, 2818. [Google Scholar] [CrossRef]
Yang, R.; Wang, R.; Deng, Y.; Jia, X.; Zhang, H. Rethinking the Random Cropping Data Augmentation Method Used in the Training of CNN-Based SAR Image Ship Detector. Remote Sens. 2021, 13, 34. [Google Scholar] [CrossRef]
Tan, M.; Le, Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; Volume 97, pp. 6105–6114. [Google Scholar]
Yin, H.; Yang, C.; Lu, J. Research on Remote Sensing Image Classification Algorithm Based on EfficientNet. In Proceedings of the 2022 7th International Conference on Intelligent Computing and Signal Processing (ICSP), Xi’an, China, 15–17 April 2022; pp. 1757–1761. [Google Scholar] [CrossRef]
Charoenchittang, P.; Boonserm, P.; Kobayashi, K.; Cooharojananone, N. Airport Buildings Classification through Remote Sensing Images Using EfficientNet. In Proceedings of the 2021 18th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), Chiang Mai, Thailand, 19–22 May 2021; pp. 127–130. [Google Scholar] [CrossRef]
Chai, J.; Nan, Y.; Guo, R.; Lin, Y.; Liu, Y. Recognition Method of Landslide Remote Sensing Image based on EfficientNet. In Proceedings of the 2022 IEEE 2nd International Conference on Electronic Technology, Communication and Information (ICETCI), Changchun, China, 27–29 May 2022; pp. 1224–1228. [Google Scholar] [CrossRef]
Zhang, D.; Liu, Z.; Shi, X. Transfer Learning on EfficientNet for Remote Sensing image Classification. In Proceedings of the 2020 5th International Conference on Mechanical, Control and Computer Engineering (ICMCCE), Harbin, China, 25–27 December 2020; pp. 2255–2258. [Google Scholar] [CrossRef]
Cohen, J. A Coefficient of Agreement for Nominal Scales. Educ. Psychol. Meas. 1960, 20, 37–46. [Google Scholar] [CrossRef]
Reina, G.A.; Panchumarthy, R.; Thakur, S.P.; Bastidas, A.; Bakas, S. Systematic evaluation of image tiling adverse effects on deep learning semantic segmentation. Front. Neurosci. 2020, 14, 65. [Google Scholar] [CrossRef]
Shi, J.X.; Wei, T.; Xiang, Y.; Li, Y.F. How Re-sampling Helps for Long-Tail Learning? In Advances in Neural Information Processing Systems; Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2023; Volume 36, pp. 75669–75687. [Google Scholar]
Sinha, S.; Ohashi, H.; Nakamura, K. Class-difficulty based methods for long-tailed visual recognition. Int. J. Comput. Vis. 2022, 130, 2517–2531. [Google Scholar] [CrossRef]
Hu, X.; Tang, C.; Chen, H.; Li, X.; Li, J.; Zhang, Z. Improving Image Segmentation with Boundary Patch Refinement. Int. J. Comput. Vis. 2022, 130, 2571–2589. [Google Scholar] [CrossRef]
Liu, Y.; Ren, Q.; Geng, J.; Ding, M.; Li, J. Efficient Patch-Wise Semantic Segmentation for Large-Scale Remote Sensing Images. Sensors 2018, 18, 3232. [Google Scholar] [CrossRef]
Park, S.; Park, N.W. Effects of Class Purity of Training Patch on Classification Performance of Crop Classification with Convolutional Neural Network. Appl. Sci. 2020, 10, 3773. [Google Scholar] [CrossRef]

Figure 1. Comparison between fixed-size patch extraction and resizing as input preparation strategies. (a) Orchard sample (1 m/pixel). (b) Growing field sample (1 m/pixel). (c) Fixed-size patch comparison (1 m/pixel). (d) Resized image comparison (orchard: 4.5 m/pixel; growing field: 1.6 m/pixel) (e) Zoomed-in view of fixed-size patch. (f) Zoomed-in view of resized image.

Figure 2. Limitations of FPE due to repeated patch duplication across epochs.

Figure 3. Parcel-level farmland images showing spatio-temporal domain differences across dataset splits.

Figure 4. Class distribution of farmland parcels for the training/validation subsets and the test set.

Figure 7. Training flowchart of our proposed method. CBS selects parcels according to class balance, and RPE extracts patches from randomly sampled positions (red boxes) that satisfy predefined conditions.

Figure 8. Inference pipeline using patch-wise softmax prediction and product-based aggregation.As described in the text, patches are extracted in the same way as FPE, and the patch regions are indicated by orange boxes.

Figure 9. Inference -phase patch coverage comparison between original (1×) and upscaled (2×) input images, showing how upscaling affects patch layout (orange boxes).

Figure 10. Parcel images from two visually distinct minority farmland classes.

Figure 11. Visual confusion between similar farmland types due to overlapping texture patterns.

Figure 12. A parcel partially covered with mulch, where unmulched areas (red outlines) result in texture heterogeneity.

Table 1. Summary of dataset composition by year, region, and usage purpose.

Dataset	Year	Region	Parcels	Purpose
Train Dataset	2021	Gyeongsang, Daegu, Jeolla, Jeju, Seosan, Taean	84,810	Train
Val Dataset	2021	Gyeongsang, Daegu, Jeolla, Jeju, Seosan, Taean	9431	Same-domain eval
Test Dataset	2020	Geumsan, Namyangju	15,227	Diff-domain eval

Table 2. Overall ablation performance metrics: accuracy, macro F1 score, and Kappa.

(a) Validation Set
Method	Scale: 1×			Scale: 2×
Method	Accuracy	Macro F1	Kappa	Accuracy	Macro F1	Kappa
NCB + FPE (baseline)	0.9645	0.9130	0.9598	0.8392	0.7097	0.8162
NCB + RPE	0.9821	0.9632	0.9797	0.9659	0.8694	0.9613
CBS + FPE	0.9817	0.9621	0.9792	0.9502	0.9169	0.9415
CBS + RPE (Ours)	0.9910	0.9815	0.9898	0.9794	0.9663	0.9782
(b) Test Set
Method	Scale: 1×			Scale: 2×
Method	Accuracy	Macro-F1	Kappa	Accuracy	Macro-F1	Kappa
NCB + FPE (baseline)	0.6957	0.6168	0.6506	0.4857	0.4394	0.4277
NCB + RPE	0.7531	0.6775	0.7130	0.6638	0.5672	0.5594
CBS + FPE	0.7291	0.6798	0.6891	0.7814	0.6707	0.7434
CBS + RPE (Ours)	0.7811	0.7043	0.7451	0.8033	0.7166	0.7687

Table 3. F1 scores on the validation set for all farmland classes under each ablation setting.

(a) Original Image (Scale: 1×)
Method	Rice Paddy				Field
Method	Plowed	Transplanted	Growing	Harvested	Plowed	Mulched	Growing	Tree Nursery *	Orchard	Greenhouse	Ginseng *
NCB + FPE (baseline)	0.9664	0.9801	0.9964	0.9912	0.9350	0.9095	0.9793	0.4694	0.9667	0.9461	0.9043
NCB + RPE	0.9717	0.9817	0.9984	0.9902	0.9639	0.9599	0.9850	0.7813	0.9892	1.0000	0.9739
CBS + FPE	0.9783	0.9886	0.9974	0.9953	0.9542	0.9483	0.9822	0.7879	0.9868	0.9994	0.9643
CBS + RPE (Ours)	0.9897	0.9926	0.9979	0.9975	0.9868	0.9832	0.9901	0.9103	0.9924	0.9932	0.9624
(b) Enlarged Image (Scale: 2×)
Method	Rice Paddy				Field
Method	Plowed	Transplanted	Growing	Harvested	Plowed	Mulched	Growing	Tree Nursery *	Orchard	Greenhouse	Ginseng *
NCB + FPE (baseline)	0.7051	0.9237	0.9815	0.9351	0.7269	0.8405	0.9523	0.2025	0.9577	0.0298	0.5500
NCB + RPE	0.9746	0.9865	0.9867	0.9922	0.9538	0.9039	0.9483	0.0000	0.9752	0.9912	0.8515
CBS + FPE	0.9315	0.9446	0.9881	0.9828	0.9014	0.8710	0.9446	0.6111	0.9741	0.9867	0.9500
CBS + RPE (Ours)	0.9696	0.9845	1.0000	0.9932	0.9622	0.9515	0.9782	0.8196	0.9834	0.9872	1.0000

* Minority classes.

Table 4. F1 scores on the test set for all farmland classes under each ablation setting.

(a) Original Image (Scale: 1×)
Method	Rice Paddy				Field
Method	Plowed	Transplanted	Growing	Harvested	Plowed	Mulched	Growing	Tree Nursery *	Orchard	Greenhouse	Ginseng *
NCB + FPE (baseline)	0.6353	0.7430	0.8479	0.8497	0.5936	0.3420	0.4548	0.0980	0.4749	0.9604	0.7859
NCB + RPE	0.5692	0.8204	0.8791	0.8249	0.6221	0.5404	0.5876	0.2559	0.5139	0.9746	0.8647
CBS + FPE	0.6671	0.7739	0.8598	0.8669	0.6335	0.5042	0.5825	0.3730	0.4554	0.9633	0.7982
CBS + RPE (Ours)	0.6040	0.8087	0.9126	0.8478	0.6552	0.5112	0.5882	0.4219	0.5252	0.9626	0.9100
(b) Enlarged Image (Scale: 2×)
Method	Rice Paddy				Field
Method	Plowed	Transplanted	Growing	Harvested	Plowed	Mulched	Growing	Tree Nursery *	Orchard	Greenhouse	Ginseng *
NCB + FPE (baseline)	0.4628	0.6129	0.8150	0.6983	0.2092	0.4938	0.3288	0.0795	0.3651	0.2544	0.5135
NCB + RPE	0.4020	0.7591	0.8059	0.6154	0.4988	0.4157	0.4279	0.0000	0.5543	0.9415	0.8186
CBS + FPE	0.5499	0.7903	0.8910	0.7696	0.6091	0.5786	0.6146	0.1144	0.5433	0.9703	0.9467
CBS + RPE(Ours)	0.5446	0.7894	0.8487	0.8591	0.6387	0.5605	0.5985	0.4839	0.6293	0.9804	0.9509

* Minority classes.

Table 5. Comparison between patch-level and parcel-level performance on the test set: accuracy, macro F1, and Kappa (difference = parcel level − patch level).

(a) Original Image (Scale: 1×)
Method	Patch Level			Parcel Level			Difference
Method	Accuracy	Macro F1	Kappa	Accuracy	Macro F1	Kappa	Accuracy	Macro F1	Kappa
NCB + FPE (baseline)	0.6809	0.5956	0.6290	0.6957	0.6168	0.6506	+0.0148	+0.0212	+0.0216
NCB + RPE	0.7323	0.6482	0.6834	0.7531	0.6775	0.7130	+0.0238	+0.0293	+0.0296
CBS + FPE	0.7310	0.6695	0.6852	0.7291	0.6798	0.6891	−0.0019	+0.0103	+0.0039
CBS + RPE (Ours)	0.7724	0.6862	0.7280	0.7811	0.7043	0.7444	+0.0087	+0.0181	+0.0164
(b) Enlarged Image (Scale: 2×)
Method	Patch Level			Parcel Level			Difference
Method	Accuracy	Macro F1	Kappa	Accuracy	Macro F1	Kappa	Accuracy	Macro F1	Kappa
NCB + FPE (baseline)	0.4794	0.4121	0.4139	0.4857	0.4394	0.4277	+0.0063	+0.0273	+0.0138
NCB + RPE	0.6217	0.5235	0.5594	0.6638	0.5672	0.6129	+0.0421	+0.0437	+0.0535
CBS + FPE	0.7296	0.6165	0.6765	0.7814	0.6707	0.7434	+0.0518	+0.0542	+0.0669
CBS + RPE (Ours)	0.7656	0.6751	0.7198	0.8033	0.7166	0.7687	+0.0377	+0.0415	+0.0489

Table 6. Memory usage and training time for different configurations.

Method	Peak CPU Memory (MB)	Peak GPU Memory (MB)	Time per Epoch (s)
NCB + FPE (baseline)	1238.5	3010.8	1564.3
NCB + RPE	1237.1	3010.1	1584.8
CBS + FPE	1239.3	3005.4	1565.7
CBS + RPE (Ours)	1238.4	3005.2	1582.3

Table 7. Partial confusion matrix for minority classes under 1× and 2× conditions, with F1 scores and average correct confidence shown below.

GT	Ginseng (1×)	Ginseng (2×)	Tree Nursery (1×)	Tree Nursery (2×)
Pred	Ginseng (1×)	Ginseng (2×)	Tree Nursery (1×)	Tree Nursery (2×)
Plowed Rice Paddy	0	0	0	0
Transplanted Rice Paddy	0	0	0	0
Growing Rice Paddy	0	0	0	0
Harvested Rice Paddy	0	0	0	0
Plowed Field	4	0	0	0
Mulched Field	2	30	0	1
Growing Field	0	4	0	1
Tree Nursery	0	0	27	8
Orchard	0	2	41	59
Greenhouse	0	0	0	0
Ginseng	52	22	1	0
Validation F1 score	0.9043	0.5500	0.4694	0.2025
Average Correct Confidence	0.8837	0.7680	0.6562	0.6170

Green cells = correct predictions. Red cells = incorrect predictions. Color intensity = percentage of parcels predicted as that class.

Table 8. Baseline F1 score accuracy and performance improvements (

∆

Val/

∆

Test) for RPE, CBS, and RPE + CBS on minority classes, categorized by class difficulty.

Table 8. Baseline F1 score accuracy and performance improvements (

∆

Val/

∆

Test) for RPE, CBS, and RPE + CBS on minority classes, categorized by class difficulty.

Class	Difficulty	Baseline (Val/Test)	RPE ( $∆$ Val/ $∆$ Test)	CBS ( $∆$ Val/ $∆$ Test)	RPE + CBS ( $∆$ Val/ $∆$ Test)
Ginseng (1×)	Easy	0.9043/0.7859	+0.0696/+0.0790	+0.0600/+0.0123	+0.0581/+0.1241
Ginseng (2×)	Normal	0.5500/0.5135	+0.3015/+0.3051	+0.4000/+0.4332	+0.4500/+0.4374
Tree Nursery (1×)	Difficult	0.4694/0.0980	+0.3119/+0.1579	+0.3185/+0.2750	+0.4409/+0.3239
Tree Nursery (2×)	Very Difficult	0.2025/0.0795	−0.2025/−0.0795	+0.4086/+0.0349	+0.6171/+0.4044

Table 9. 3× Overall ablation performance metrics: accuracy and macro F1 score.

Method	Validation			Test
Method	Accuracy	Macro F1	Kappa	Accuracy	Macro F1	Kappa
NCB + FPE (baseline)	0.5817	0.4054	0.5165	0.2968	0.2783	0.2392
NCB + RPE	0.7721	0.6531	0.7401	0.3249	0.3216	0.2692
CBS + FPE	0.8894	0.8119	0.8744	0.7129	0.5946	0.6605
CBS + RPE (Ours)	0.9823	0.9584	0.9799	0.7861	0.6852	0.7481

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Bae, Y.; Ban, Y. Class-Balanced Random Patch Training to Address Class Imbalance in Tiling-Based Farmland Classification. Appl. Sci. 2025, 15, 7056. https://doi.org/10.3390/app15137056

AMA Style

Bae Y, Ban Y. Class-Balanced Random Patch Training to Address Class Imbalance in Tiling-Based Farmland Classification. Applied Sciences. 2025; 15(13):7056. https://doi.org/10.3390/app15137056

Chicago/Turabian Style

Bae, Yeongung, and Yuseok Ban. 2025. "Class-Balanced Random Patch Training to Address Class Imbalance in Tiling-Based Farmland Classification" Applied Sciences 15, no. 13: 7056. https://doi.org/10.3390/app15137056

APA Style

Bae, Y., & Ban, Y. (2025). Class-Balanced Random Patch Training to Address Class Imbalance in Tiling-Based Farmland Classification. Applied Sciences, 15(13), 7056. https://doi.org/10.3390/app15137056

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Class-Balanced Random Patch Training to Address Class Imbalance in Tiling-Based Farmland Classification

Abstract

1. Introduction

2. Materials and Methods

2.1. Dataset and Preprocessing

2.2. Proposed Method

2.2.1. Methodology

Class-Balanced Sampling

Random Patch Extraction

2.2.2. Training Pipeline

2.2.3. Inference Pipeline

2.3. Implementation and Evaluation

3. Results

3.1. Overall Performance

3.2. Per-Class Performance

3.3. Tiling Aggregation Experiment

3.4. Computational Efficiency

4. Discussion

4.1. Difficulty-Based Performance Analysis on Minority Classes

4.1.1. Class Difficulty Assessment

4.1.2. Effects of Each Method According to Class Difficulty

4.2. Analysis of Trade-Off According to Patch Coverage Change

4.2.1. Patch-Count Increase and Aggregation Benefits

4.2.2. Patch-Level Robustness Under Reduced Context

4.2.3. Robustness Under Extremely Reduced Patch Coverage

4.3. Limitations

4.3.1. No External Context Consideration

4.3.2. Vulnerability to Heterogeneous Internal Textures

5. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

Abbreviations

Appendix A. Experimental Basis for Configuration Selection

Appendix A.1. Patch Overlap Strategy

Appendix A.2. Valid-Area Threshold

Appendix A.3. Aggregation Strategy

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI