Article

Training-Free Lightweight Transfer Learning for Land Cover Segmentation Using Multispectral Calibration

by Hye-Jung Moon 1 and Nam-Wook Cho 2,*
1 Graduate School of Public Policy and IT, Seoul National University of Science and Technology, Seoul 01811, Republic of Korea
2 Department of Industrial and Information Systems Engineering, Seoul National University of Science and Technology, Seoul 01811, Republic of Korea
* Author to whom correspondence should be addressed.
Remote Sens. 2026, 18(2), 205; https://doi.org/10.3390/rs18020205
Submission received: 14 November 2025 / Revised: 25 December 2025 / Accepted: 5 January 2026 / Published: 8 January 2026

Highlights

What are the main findings?
  • Response Surface Methodology-based channel calibration achieves an IoU improvement of up to 67.86 percentage points for coniferous forest, a class with low baseline performance, while also yielding a 59.92-percentage-point IoU gain in the non-targeted agricultural land class, demonstrating cross-class benefits without GPU-based retraining.
  • Class-wise optimal hyperparameters transfer across domains via proportional mapping, proving that generalization is possible between French coastal/mountainous areas and Korean data.
What are the implications of the main findings?
  • The proposed training-free approach enables practical transfer learning in resource-constrained environments using only 30–150 labeled tiles instead of thousands required for conventional fine-tuning, with minimal cost and effort.
  • The reproducible relationship between RGB channel statistics and segmentation performance suggests that CNN internal representations form structured manifolds proportional to input spectral characteristics, advancing the interpretability beyond traditional “black box” paradigms.

Abstract

This study proposes a lightweight framework for transferring pretrained land cover classification architectures without additional training. The system utilizes French IGN imagery and Korean UAV and aerial imagery. It employs FLAIR U-Net models with ResNet34 and MiTB5 backbones, along with the AI-HUB U-Net. The implementation consists of four sequential stages. First, we perform class mapping between heterogeneous schemes and unify coordinate systems. Second, a quadratic polynomial regression equation is constructed. This formula uses multispectral band statistics as hyperparameters and class-wise IoU as the dependent variable. Third, optimal parameters are identified using the stationary point condition of Response Surface Methodology (RSM). Fourth, the final land cover map is generated by fusing class-wise optimal results at the pixel level. Experimental results show that optimization is typically completed within 60 inferences. This procedure achieves IoU improvements of up to 67.86 percentage points compared to the baseline. For automated application, these optimized values from a source domain are successfully transferred to target areas. This includes transfers between high-altitude mountainous and low-lying coastal territories via proportional mapping. This capability demonstrates cross-regional and cross-platform generalization between ResNet34 and MiTB5. Statistical validation confirmed that the performance surface followed a systematic quadratic response. Adjusted R² values ranged from 0.706 to 0.999, with all p-values below 0.001. Consequently, the performance function is universally applicable across diverse geographic zones, spectral distributions, spatial resolutions, sensors, neural networks, and land cover classes. This approach achieves more than a 4000-fold reduction in computational resources compared to full model training, using only 32 to 150 tiles. Furthermore, the proposed technique demonstrates 10–74× superior resource efficiency (resource consumption per unit error reduction) over prior transfer learning schemes. Finally, this study presents a practical solution for inference and performance optimization of land cover semantic segmentation on standard commodity CPUs, while maintaining equivalent or superior IoU.

1. Introduction

Climate-driven environmental transitions are accelerating in conjunction with global warming. Extreme meteorological disasters such as the 2025 wildfires in Los Angeles, USA, and South Korea [1,2], and floods in northern Nigeria [3] are occurring with increasing frequency. Approximately 89,000 km² of permafrost thaws annually, transforming into thermokarst terrain [4,5,6,7]. Between 1990 and 2020, approximately 4.3 million km² of Earth, equivalent to the area of the EU, underwent conversion to drylands [8], while economic losses attributed to coastal erosion have been steadily increasing [8,9]. To quantitatively track these rapid shifts, international monitoring frameworks have been established, including the ESA Copernicus Programme [10], the UNCCD’s Land Degradation Neutrality (LDN) [11], and the IPCC guidelines for land use and land cover change (LUCC) [12,13]. Land use land cover (LULC) classification and segmentation using satellite, aerial, and UAV imagery have emerged as key technologies for ecological monitoring [14,15].
Recent deep learning-based topographic classification models have demonstrated high performance; however, they require large-scale labeled datasets and substantial computing resources. This resource-intensive characteristic presents a practical barrier in data-constrained environments, leading to increased efforts to apply existing architectures to new regions through transfer learning [16,17,18]. However, cross-country transfer learning for surface mapping faces lower accuracy and class mismatches caused by heterogeneity in local spectral characteristics, sensor conditions, and taxonomic schemes. Furthermore, existing adaptation methods require thousands of labeled samples during retraining, limiting their real-world utility.
To address these issues, this paper presents a technique that enhances cross-domain performance using only dozens of labeled tiles without retraining. The approach aims to optimize accuracy by calibrating input channel statistics at inference time. Section 2 identifies the limitations and improvements of the relevant literature on land-cover segmentation. Section 3 describes data preprocessing, performance metrics, implementation of the objective function, integration of results, algorithms, and the workflow. It also explains the class mapping methodology for addressing heterogeneous labeling schemes across geographic zones. Section 4 validates the effectiveness through three aspects: response surface construction for performance optimization, proportional transfer across domains and architectures, and statistical validation with cost analysis. Section 5 compares our approach with existing methodologies, discusses its contributions and limitations, and suggests future investigation directions.

2. Related Works

Table 1 provides a comprehensive summary of transfer learning research on land-cover classification, organized by research field, model architecture, dataset characteristics, and performance metrics. Existing studies can be classified into four categories: simple transfer learning (sTL), benchmarking models (BM), domain adaptation (DA), and research methodology development (MTH).
The transfer learning techniques applied at each stage of deep learning for land-cover classification in previous studies are as follows. The main technique in the architecture design stage was backbone reconstruction [19,20,21,22,23,24,25]. The main techniques in the preprocessing stage were data augmentation [19,22,23,25,26,27,28,29,30,31,32], reference label generation [22,23,28,31,32,33,34,35,36], label calibration [26,29] and resolution adaptation [25,26,28,29,30,32,34]. The techniques in the training stage were weight fine-tuning [19,20,22,23,24,26,27,29,30,31,32,33,34,37,38,39,40] and feature extraction [19,21,24,25,31,34]. In the optimization stage, learning rate optimization [19,20,23,24,25,31,34,40] and early stopping [19,20,21,24,39,40] were employed as the primary techniques. In the inference stage, hyperparameter adjustment [19,21,22,23,24,25,26,27,28,29,30,31,33,34,39,40,41] and backbone reuse [21,27,28,29,30,32,39,40] were primarily employed.
Most transfer learning research has focused on image-level classification, with limited cases applied to pixel-level segmentation. Ref. [31] performed pixel-level segmentation of historical land cover in the Sagalassos area of Turkey into 14 classes, and ref. [32] segmented UAV imagery in arid ground crack environments into four classes. These represent a small number of studies that applied transfer learning to segmentation, demonstrating the difficulty and necessity of pixel-level analysis that requires spatial precision.
However, existing land cover classification studies demand large-scale data and high-performance computing resources and remain confined to simple patch-based classification or intra-regional benchmarking. Hyperparameter optimization has focused on training pipelines (e.g., learning rate, batch size) and data augmentation [20,24,27], exhibiting limited performance gains without fine-tuning pre-trained weights. To address these limitations, this study proposes the following distinctive approaches.
First, this study presents a resource-efficient transfer learning strategy. While existing studies have relied on backbone reconstruction, weight fine-tuning, and performance optimization, this research proposes a lightweight approach that achieves performance solely by optimizing the preprocessing, inference, and post-processing stages. Second is the scalability of cross-country transfer learning. Whereas prior studies have been confined to intra-regional performance comparisons, this study implements transfer learning between France and Korea, mitigating domain shift through class mapping and channel calibration, without relying on data augmentation or spectral normalization. Third, a data-driven optimization technique is employed. The relationship between spectral statistics (channel-wise mean and standard deviation) of input images and performance metrics is mathematically modeled to explore optimal channel correction ratios. This approach minimizes manual intervention while dynamically calculating class-specific correction amounts based on data statistics. Fourth, this study validates geographic generalization by applying the performance function to geographically distinct districts (D004, D067) within the French FLAIR dataset and extending the same proportional formula to Korean AI-HUB imagery. Fifth is cross-architecture transferability. Unlike existing methods that require architecture-specific retraining, this approach validates the architecture-agnostic nature of spectral optimization by enabling hyperparameter transfer across different model backbones (e.g., ResNet34 to MiTB5).
Table 1. Previous research on transfer learning for land cover classification.
| Field | Task | Year | Application | Architecture (Backbone) | Patch², Ch. | Data (GSD, #Classes): Volume | Accuracy (%) (Δp.p.) of Best Model | Base |
|---|---|---|---|---|---|---|---|---|
| sTL | CD | 2022 | Detection of spectral differences between Sejong and Goseong regions, Korea [37] | U-Net | 572², 3 | Sentinel-2 (10): NA | Visual verification | OA |
| sTL | CD | 2022 | Detection of changes across four land-cover types in Pyeongtaek, Korea [26] | CNN (MobileNet) | 224², 3 | K-NLIF (0.25): 4000 | 100 patches/image: 97 | OA |
| sTL | DET | 2023 | Object (paddy fields, fields, and greenhouses) detection in Pyeongtaek [38] | CNN (YOLOv5) | 400², 3 | K-NLIF (0.25): 1598 | 80 (+7.9) | OA |
| sTL | DET | 2023 | Aircraft detection in complex images [27] | TLH2TD (YOLOv5x) | 128², 3 | RarePlanes (0.3): 8525 (train); HIS (7.1~3.3): 10 (test) | 95.66 (+26.5%) | OA |
| BM | DET | 2021 | Vehicle detection from aerial images [28] | CNN (Faster R-CNN, YOLOv3, YOLOv4) | 608², 3 | UAV (3DR Solo): Stanford 8506; PSU 270 | Stanford: 91.62 (+8.91); PSU: 89.47 (+8.42) | IoU |
| BM | DET | 2023 | Building extraction using semantic segmentation [20] | U-Net, DeepLabV3+ (MobileNet, ResNet50, ResNet101) | 512², 3 | AiHub (0.51): 53,000 | 84.65 (−0.18) | f1 |
| BM | CLS | 2021 | Deep transfer learning for LULC classification of Europe [19] | VGG16, W-ResNet50 | 224², 3 | EuroSAT (10, 10): 27,000 | 99.17 (+0.13) | OA |
| BM | CLS | 2022 | LULC classification in remote sensing images of the USA [21] | CNN (ResNet50V2, InceptionV3, VGG19) | 256², 3 | UCM (0.3, 21): 2100 | 99.64 (+5.34) | OA |
| BM | CD | 2023 | Change detection of LULC classification in urban areas [22] | DeepLabV3+, U-Net (EfficientNetV2T, YoloX, ResNest, VGG19) | 128², 6 | Sentinel-2 (10): 2040; OSCD: 2475 | 97.66 | OA |
| BM | CLS | 2024 | Transformer-based LULC classification of Europe [39] | ResNet50/101, InceptionV3, DenseNet161, GoogLeNet, DeiT-Base, SwinT-Small/Large, ViT-Base/Large | 224², 3 | EuroSAT (10, 10): 27,000; PatternNet (0.3, 38): 30,400 | EuroSAT: 99.07; PatternNet: 99.59 | OA |
| BM | CLS | 2024 | Classification of maize straw types (cob, stalk, and pile) [29] | CNN (DenseNet201, ResNet50, GoogLeNet) | 300², 3 | UAV (0.015, 4): 6318 | 95.51 | OA |
| BM | CLS | 2024 | Land cover classification of California, USA [30] | CNN (Inception-v3, DenseNet121, ResNet-50) | 224², 3 | UCM (0.3, 18 ← 21): 18,000 | 92.00 | OA |
| DA | CLS | 2022 | Transfer learning from EU to Russia [23] | CNN (ResNet50, ResNet152, VGG16, VGG19) | 64², 3 | EuroSAT (10, 10): 27,000; Sentinel-2 (10): 2000 | 96.83 (−1.74) | OA |
| DA | DET | 2020 | Cloud detection from Taean to Ulsan-Busan, Korea [33] | DeepLab-V3+ (Xception) | 512², 4 | Landsat8 (15–100): 4032; PlanetScope (4): 3296 | 94.96 (+10) | OA |
| DA | SEG | 2022 | Multiclass land cover mapping from historical orthophotos in Sagalassos [31] | U-Net, FPN, DeepLabV3+ (EfficientNetB5) | 512², 3 | Sagalassos (0.84~0.30, 14): 460; MiniFrance (0.5, 15): 117,832 | Pre-train: 29.2 (+13.0); Fine-tuning: 27.8 (+27.2) | mIoU |
| DA | CLS | 2023 | Inter-region transfer learning for LULC classification in Europe [24] | CNN (EfficientNet-B5) | 64², 3 | BigEarthNet (10, 43): 590,326 | 65.5 (+11.8) | recall |
| MTH | CLS | 2023 | Assessing the impact of sampling intensity on LULC estimation [34] | CNN (VGG16) | 224², 3 | K-NLIF (0.51): 5000 | 91.1 | OA |
| MTH | CLS | 2025 | Coastal and LULC recognition from high-resolution images [25] | SR-RAN5 (M2IAN) | 227², 3 | Mixed Coastal: 9206; NWPU_RESIS45: 10,500 | 91.8 (+16.46~6.8) | OA |
| BM, MTH | CLS | 2024 | Classification of green space in a smart city [40] | CNN (Inception, LSTM, VGG-16, MobileNet, GoogleNet, Efficient, ResNet50, Dilated, AlexNet, proposed MAFDN) | 90², 3 | NWPU (0.3~0.2, 15): 10,500; EuroSAT (10, 10): 27,000 | NWPU: 99.01; EuroSAT: 99.00 | OA |
| BM, MTH | SEG | 2024 | Auto training for vegetation detection in a dry thermal valley [32] | Seg-Res-Net50, U-Net, Seg-Net, FCN (ResNet-50) | 250², 3 | UAV (DJI Phantom4v2): 300 → 30,000 (augmentation) | 90.88 (+16.00) | mIoU |

Notes: Field: sTL = simple transfer learning; BM = benchmarking models; DA = domain adaptation; MTH = research methodology development. Task: CLS = Classification; DET = Object Detection; SEG = Semantic Segmentation; CD = Change Detection. Accuracy: OA = Overall Accuracy; IoU = Intersection over Union. Patch sizes are squared (e.g., 512² = 512 × 512 pixels).

3. Research Design

3.1. Datasets and Pre-Trained Models

Both France and Korea are located in the mid-latitudes of the Northern Hemisphere, exhibit distinct four seasons, and share complex land cover structures where coastal, agricultural, forest, and urban areas coexist. However, significant variations in vegetation, soil, and land cover arise from environmental differences between France (a mix of maritime, Mediterranean, and continental climates) and Korea (a temperate monsoon climate). For instance, the same ‘coniferous forest’ class exhibits different chlorophyll activity and RGB statistics due to Korea’s hot and humid summer conditions versus France’s mild oceanic climate. Nevertheless, publicly available labeled datasets (FLAIR with 15 classes, AI-HUB with 12 classes) enable controlled comparison. Consequently, applying a France-trained model to South Korea serves as an appropriate experimental setting for validating the generalization performance of lightweight transfer learning.
Table 2 summarizes the datasets used in this study. Korean data from AI-HUB includes UAV (Unmanned Aerial Vehicle) and aerial imagery with labels and pretrained models. French data from IGN at https://geoservices.ign.fr/ (accessed on 25 December 2025) provides aerial imagery, labels, and models through the FLAIR project [41].
In addition to raw imagery, this study uses specialized, pre-trained semantic segmentation models provided by each data platform. Specifically, the FLAIR U-Net models with ResNet34 and MiTB5 backbones, optimized for land-cover classification on French aerial imagery, and the AI-HUB U-Net, trained on Korean land-cover datasets, serve as baselines for experiments E1–E5 (Section 3.5).

3.2. Performance Evaluation Criteria

Performance evaluation metrics include Accuracy, IoU, F1 score, Precision, and Recall, which are computed from the following parameters: TP (True Positive), FP (False Positive), FN (False Negative), and TN (True Negative). Accuracy represents the proportion of correct predictions among all predictions. IoU (Intersection over Union) is the ratio of the intersection to the union of predicted and actual regions, serving as a metric for evaluating area-based prediction performance. The F1 score is the harmonic mean of Precision and Recall, indicating the balance between these two metrics. Precision is the proportion of actual correct results among the model’s predictions, while Recall is the proportion of correct predictions among the actual ground truth. The calculation formulas for each metric are as follows:
Accuracy = (TP + TN)/(TP + TN + FP + FN)
IoU = TP/(TP + FP + FN)
F1 score = 2 × (Precision × Recall)/(Precision + Recall)
Precision = TP/(TP + FP)
Recall = TP/(TP + FN)
Accuracy and IoU both represent prediction performance, but they differ in their evaluation perspectives. Accuracy is the proportion of correctly predicted pixels among all pixels, while IoU evaluates how accurately the predicted extent matches the actual region of each class. Therefore, when certain classes occupy small area proportions, or when large areas remain unclassified, Accuracy can be high even if the model performs well only on the majority classes, whereas IoU strictly evaluates the actual prediction performance of each class and may show lower values. Because the AI-HUB dataset has a high proportion of unclassified pixels, IoU is used as the primary evaluation metric for land cover classification to enable more accurate performance comparisons.
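For concreteness, the per-class metrics above can be computed directly from label maps. The following NumPy sketch (the function name is ours, not from the paper's implementation) mirrors the five formulas:

```python
import numpy as np

def segmentation_metrics(pred, gt, cls):
    """Compute per-class Accuracy, IoU, F1, Precision, and Recall
    from predicted and ground-truth label maps for one class."""
    p = (pred == cls)
    g = (gt == cls)
    tp = int(np.sum(p & g))
    fp = int(np.sum(p & ~g))
    fn = int(np.sum(~p & g))
    tn = int(np.sum(~p & ~g))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"Accuracy": accuracy, "IoU": iou, "F1": f1,
            "Precision": precision, "Recall": recall}
```

Running this per class makes the contrast concrete: a class covering few pixels can leave Accuracy high while its IoU collapses, which is why IoU serves as the primary metric here.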

3.3. Hyperparameter Optimization

A key approach to improving deep learning model performance is hyperparameter tuning. In land cover classification, the mean and standard deviation of multispectral bands serve as critical hyperparameters that normalize input distributions. Preliminary experiments with the FLAIR model revealed that IoU exhibits a smooth, unimodal, and convex pattern with respect to RGB channel statistics at the optimal region. This observed pattern aligns precisely with the theoretical foundation of Response Surface Methodology (RSM). RSM has been proven effective when the underlying process exhibits stable quality characteristics or follows natural statistical distributions [42,43]. Land cover, as a natural phenomenon, inherently follows normal distributions—natural variables such as temperature, soil properties, and vegetation coverage do not change abruptly across space but transition smoothly. Therefore, the observed unimodal convex response surface is not coincidental but a natural consequence of the physical properties being modeled. Based on this rationale, this study adopted RSM as the approach for modeling the performance function.
Since optimal hyperparameter combinations differ across classes, RSM was applied on a class-wise basis to implement the performance function. An RSM-based performance function was constructed with RGB statistics as independent variables and IoU as the dependent variable to identify the RGB mean and standard deviation values that maximize IoU for each class.
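In code terms, the search space is simply the pair of per-channel normalization vectors applied to the input. A minimal sketch (our illustration, assuming the common (x − mean)/std preprocessing; the function name is hypothetical):

```python
import numpy as np

def normalize_with_theta(img, theta_mean, theta_std):
    """Per-channel input normalization in which the mean and standard
    deviation are tunable hyperparameters (theta) rather than fixed
    dataset statistics. img: H x W x 3 array scaled to [0, 1]."""
    theta_mean = np.asarray(theta_mean, dtype=np.float64)
    theta_std = np.asarray(theta_std, dtype=np.float64)
    # Broadcasting applies each channel's candidate statistics pixel-wise.
    return (img.astype(np.float64) - theta_mean) / theta_std
```

Varying theta shifts the model's effective input distribution at inference time, and the class-wise IoU measured afterwards becomes the dependent variable of the RSM fit.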
f(x_1, \ldots, x_d) = \beta_0 + \sum_{i=1}^{d} \beta_i x_i + \sum_{i=1}^{d} \sum_{j=i}^{d} \beta_{ij} x_i x_j \quad (1)
The quadratic response surface equation includes one intercept, d linear coefficients, and d(d + 1)/2 interaction coefficients, where d = 2b (b: number of spectral bands). The minimum sample size N_min is given by Equation (2). For RGB imagery (b = 3), d = 6 variables (mean and standard deviation for each band), requiring N_min = 28 observations.
N_{\min} = p(d) = 1 + d + \frac{d(d+1)}{2} \quad (2)
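Equation (2) is easy to sanity-check numerically (an illustrative helper, not the authors' code):

```python
def n_min(bands):
    """Minimum number of observations for a full quadratic response
    surface with d = 2 * bands variables (mean and std per band)."""
    d = 2 * bands
    return 1 + d + d * (d + 1) // 2
```

For RGB imagery, n_min(3) evaluates the count for d = 6 variables and returns 28, matching the text.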
Figure 1 illustrates an example of optimizing land cover segmentation in urban areas with high impervious surface coverage, such as buildings and asphalt. The inputs are a pretrained segmentation model M, a set of inference tiles T, and initial parameters θ₀. The outputs include the optimized RGB spectral statistics (mean and standard deviation) θ_opt^c, the best IoU_best^c, and the predicted results P_best^c for each class c (Figure 1a).
Sample data for constructing the performance function are collected through a two-stage sequential design (Figure 1h): first, approximately 30 inferences with broad parameter variations (adj0 ≈ 0.5) ensure diverse sampling across the response surface; second, iterative refinement with narrowed ranges (adj0: 0.1–0.2) continues until R² ≥ 0.9 or stabilizes with p < 0.001, typically requiring only 10–20 additional inferences to locate the optimum. Total inference counts range from 40 to 50 per class, approximately 1.5–2 times the theoretical minimum of 28 observations for a six-variable quadratic model.
The proposed algorithm systematically manages sensitivity through convergence conditions (d1), (d2), (e1), and (e2) in Figure 1. Condition (d1) validates the difference between performance function predictions and actual IoU, while (e1) and (e2) verify whether IoU improvements converge below εstop. After identifying the optimal region through the two-stage design, sensitivity was experimentally validated by further reducing adj to ±0.05 and ±0.01. For well-trained models such as E1, E4, and E5, where R² exceeds 0.97, reducing adj below ±0.1 yielded no further IoU improvements, demonstrating intrinsic robustness near the optimum. Conversely, for insufficiently trained models such as E2 and E3b with baseline IoU below 65%, reducing adj to ±0.01 produced unstable results. This indicates that the response surface fails to satisfy RSM assumptions due to labeling errors or insufficient training, suggesting that response surface fitness serves as a reliable diagnostic indicator of data and model quality.
A quadratic regression function f(θ) is fitted to derive θc at its stationary point. These optimized values serve as input for the next iteration. Convergence is achieved when either (i) IoU no longer increases or (ii) the difference between predicted and actual values falls below the threshold εstop. In the urban example, IoU improved from 52.87% to above 79%.
There are two main methods for finding the optimal solution. The analytical method involves differentiating the regression equation and solving simultaneous equations, working stably with small datasets when the solution is concentrated at a single point. Optimization algorithm-based search utilizes Newton-Raphson or Bayesian techniques, requiring more data but handling complex forms with distributed near-optimal solutions. On the response surface, a stationary point satisfies ∇f(x) = 0. Accordingly, the necessary conditions for stationarity with respect to each variable are:
\frac{\partial f}{\partial x_k} = \beta_k + 2\beta_{kk} x_k + \sum_{j=1}^{k-1} \beta_{jk} x_j + \sum_{j=k+1}^{d} \beta_{kj} x_j = 0, \quad k = 1, \ldots, d \quad (3)
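The analytical route can be sketched in a few lines of NumPy: fit the full quadratic by least squares, then solve the linear system that makes every partial derivative vanish (illustrative code; the function names are ours):

```python
import numpy as np
from itertools import combinations_with_replacement

def fit_quadratic(X, y):
    """Least-squares fit of f(x) = b0 + sum_i b_i x_i
    + sum_{i<=j} b_ij x_i x_j. X: (n, d) samples; y: (n,) IoU values."""
    n, d = X.shape
    pairs = list(combinations_with_replacement(range(d), 2))
    cols = [np.ones(n)] + [X[:, i] for i in range(d)]
    cols += [X[:, i] * X[:, j] for i, j in pairs]
    beta, *_ = np.linalg.lstsq(np.column_stack(cols), y, rcond=None)
    lin, quad = beta[1:1 + d], beta[1 + d:]
    # Symmetric matrix B with B_ii = b_ii and B_ij = b_ij / 2 (i != j),
    # so that grad f = lin + 2 B x.
    B = np.zeros((d, d))
    for (i, j), b in zip(pairs, quad):
        if i == j:
            B[i, i] = b
        else:
            B[i, j] = B[j, i] = b / 2.0
    return beta[0], lin, B

def stationary_point(lin, B):
    """Solve grad f = lin + 2 B x = 0 for the candidate optimum."""
    return np.linalg.solve(2.0 * B, -lin)
```

Checking the eigenvalues of B (negative definite at a maximum) distinguishes a true IoU peak from a saddle before the calibrated parameters are accepted.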
The algorithm operates independently for each class, deriving class-specific optimal hyperparameters. Since optimal configurations differ across classes, decision fusion is applied to integrate class-wise predictions, with target classes (e.g., coniferous forests) optimized sequentially and combined into the final output [44,45].

3.4. Experimental Design

The experimental design for the proposed technique is shown in Table 3. Instead of utilizing the entire dataset for calibration, this study distinguishes between Total Tiles and Target Tiles to ensure optimization efficiency. While Total Tiles serve as the basis for calculating the model’s baseline performance (Base IoU), the channel calibration process is performed exclusively on Target Tiles, a subset in which the target class is present. This targeted sampling strategy is designed to focus optimization on relevant spectral features and prevent interference from tiles that do not contain the class of interest. The rationale for selecting different target classes for each experiment, based on the specific characteristics of the datasets and sensors, is detailed in Section 3.5. By applying this approach across various sensors (UAV and Aerial) and countries (Korea and France), we evaluate the generalizability of the proposed technique in improving the performance of the lowest-performing classes.
This study selected the ResNet-based U-Net, which demonstrated the highest initial inference performance on 3-channel Korean imagery, as the baseline model among the pre-trained models provided by FLAIR. Considering the disparate classification schemes between the FLAIR and AI-HUB datasets, class-wise IoU was employed as the evaluation metric, limited to standard classes that are semantically correspondent.
Experiments E4 and E5 use aerial images from the FLAIR project, targeting areas D004 and D067, respectively. These two datasets were selected because (i) they contain significant proportions of the classes for which the FLAIR pretrained model performs the worst—coniferous and deciduous—and (ii) they represent geographically and scene-wise markedly different areas. This choice aligns with the study’s objective of exploring class-wise performance gains via hyperparameter search. In addition to the baseline U-Net (ResNet34) model, these experiments also evaluate the U-Net (MiTB5) to assess the transferability of optimized hyperparameters across different model architectures.
Geographically, D004 (43.7578°N, 5.7481°E) in southern France represents Mediterranean climate conditions with a rural landscape mosaic of croplands, shrublands, and forests. In contrast, D067 (48.6560°N, 7.7656°E) in northeastern France exhibits a humid continental climate with urban-industrial-agricultural land use on a low-relief plain. We quantified inter-domain dissimilarity by computing the Jensen–Shannon divergence (JSD) between normalized histograms for each RGB channel and reporting the Jensen–Shannon distance (√JSD) [46,47]. With log base 2, the JSD is a bounded (0–1), symmetric measure, and √JSD satisfies the metric properties. The resulting √JSD averaged 0.6447 overall (R = 0.6159, G = 0.6379, B = 0.6803), indicating pronounced spectral heterogeneity across channels.
Accordingly, if applying the response function (and proportional mapping) derived from one region to the other yields a significant improvement in segmentation accuracy, this supports that the proposed method is not merely overfit to a particular dataset but possesses transferable generalization capability.

3.5. Class Mapping and Semantic Equivalence

While France and Korea share geographic similarities and commonalities in land cover composition, the AI-HUB and FLAIR datasets exhibit fundamental differences in their land cover classification schemes from Land Cover and Land Use perspectives. FLAIR is strictly defined by Land Cover (LC), identifying physical surface materials, whereas AI-HUB utilizes a hybrid of Land Use (LU)—functional or legal designations—and LC. This distinction is critical for establishing the scope of Experiment E1. Semantic equivalence is guaranteed only for the five classes (buildings, greenhouses, deciduous/coniferous forests, and water bodies) defined by their physical properties in both datasets. Despite differences in the characteristics and distribution of land, these classes maintain consistent spectral signatures and physical forms, enabling reliable 1:1 mapping.
In contrast, LU-based categories such as roads, parking lots, agricultural fields, and bare land were excluded from E1 due to their physical heterogeneity. For instance, “Road,” “Parking lot,” and “Bare land” in AI-HUB are legal designations that may physically consist of asphalt, cement, or bare soil, whereas FLAIR classifies these as distinct LC types (e.g., ‘Impervious’ or ‘Bare soil’). Similarly, paddy and dry fields undergo extreme seasonal spectral variance—shifting between water, herbaceous vegetation, and bare soil—making it impossible for a static pre-trained model to maintain semantic consistency.
Accordingly, experimental targets were designated based on these taxonomic constraints (see Table 4). Experiment E1 utilizes only the five semantically equivalent classes for rigorous cross-dataset calibration, while E3 evaluates all 12 AI-HUB classes, and E2/E4/E5 employ all 15 FLAIR classes within their native taxonomies.

4. Experimental Results and Validation of Performance Functions

4.1. Channel Calibration and Performance Optimization

This subsection presents the performance improvements achieved through RGB channel calibration across three model-dataset combinations. Experiments E1-E3 demonstrate the effectiveness of the proposed method by targeting the lowest-performing class in each scenario, with detailed IoU improvements reported for all evaluated classes.

4.1.1. FLAIR Model on Korea Dataset (E1)

Table 5 summarizes the results of applying the FLAIR model to the AI-HUB Korean land-cover dataset, with channel calibration centered on coniferous forests. Among the five classes commonly defined between the Korean and FLAIR datasets, the initial inference (b) showed significant performance gaps for coniferous and deciduous forests compared to the FLAIR model, reflecting domain-specific spectral differences.
After applying channel correction–based hyperparameter calibration (c), IoU for coniferous increased sharply from 13.57% to 81.43%, demonstrating a +67.86 p.p. gain over the uncalibrated inference. The optimization also produced secondary improvements in deciduous forests and other non-target classes, such as water and greenhouse, indicating that spectral alignment effects extended beyond the primary target. These results confirm that the proposed calibration effectively compensates for spectral mismatch between French and Korean datasets while maintaining the model’s generalization capacity.
Figure 2 shows the procedure for performing channel correction-based class-specific optimization by applying the FLAIR U-Net (ResNet-34) to AI-HUB Korean land cover imagery (a) and combining the results through decision fusion. The AI-HUB ground truth leaves some areas unclassified (black) (b), while FLAIR predicts classes that are not present in the AI-HUB standard classification, such as pervious/impervious surface, bare soil, herbaceous, agricultural land, and plowed land, according to its own system (c,d).
Decision fusion is a post-processing strategy that combines class-specific inference results, whereas Diffusion Feature Fusion integrates feature-level representations during the encoding stage [48]. In Figure 2, class-specific optimization was performed targeting coniferous forests (d) and compared with the baseline inference results obtained prior to optimization (c). Specifically, decision fusion was performed to generate result (e) by replacing pixels at the same locations in the basic inference (c) with coniferous forest, based on the segmentation (inference) results (d) obtained by applying the optimal hyperparameters found for coniferous forests.
While this example demonstrates single-class optimization (coniferous), the framework supports independent optimization across multiple classes simultaneously. Each class can be optimized with class-specific parameters (θ_opt^c), and their predictions can be combined through decision fusion to produce the final segmentation map. In this study, “decision fusion” denotes a post-processing procedure that combines multiple segmentation results obtained via class-specific channel correction. Critically, the fusion process prioritizes classes that showed greater IoU improvements after optimization—rather than selecting the class with the highest probability at each pixel—even if their absolute IoU values remain relatively low. For instance, when both coniferous forest and street tree classes are optimized, pixels are preferentially assigned to these improved classes over higher-confidence but unoptimized classes. This IoU improvement-based priority approach ensures that successfully calibrated classes contribute more to the final result, enabling comprehensive performance improvement across all classes without trade-offs between individual class accuracies [49]. The original images, ground truth, class-specific optimization results, and final combined results for the Gimpo area, South Korea (37.594334°N, 126.696809°E) are available in the Supplementary Materials.
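The IoU-improvement-priority rule described above can be sketched as follows (an illustrative reconstruction, not the authors' code; all names are ours):

```python
import numpy as np

def decision_fusion(base_pred, class_preds, iou_gains):
    """Overlay class-specific optimized predictions onto the baseline map.
    Classes with larger post-calibration IoU gains take priority: they are
    written last, so they overwrite lower-priority assignments.
    base_pred: H x W baseline label map.
    class_preds: {class_id: H x W map inferred with that class's theta}.
    iou_gains: {class_id: IoU improvement (p.p.) after optimization}."""
    fused = base_pred.copy()
    # Sort ascending by gain so the highest-gain class is applied last.
    for cls in sorted(class_preds, key=lambda c: iou_gains[c]):
        fused[class_preds[cls] == cls] = cls
    return fused
```

Note the design choice this encodes: pixel assignment follows calibration benefit, not per-pixel confidence, so a well-calibrated minority class is not drowned out by confident but uncalibrated majority classes.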

4.1.2. FLAIR Model on FLAIR Dataset (E2)

Table 6 presents the results of applying the FLAIR model U-Net (ResNet-34) to the FLAIR toy dataset. Among the 15 land-cover classes, the IoU for building, impervious surface, water, brushwood, herbaceous, and plowed land remained close to the baseline, whereas the IoU for swimming pool improved notably (+17.42 p.p.). In contrast, pervious surface, bare soil, coniferous, deciduous, and greenhouse showed declines, with coniferous recording the lowest IoU (13.36%).
To address this, class-specific hyperparameter calibration was performed targeting the coniferous class. After calibration, its IoU increased sharply to 75.24% (+61.88 p.p.), and deciduous also improved by +19.46 p.p., reaching FLAIR-level performance. Herbaceous and swimming pool classes exhibited secondary gains, suggesting that spectral calibration enhanced discrimination beyond the primary target.
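Concretely, the calibration searches over the channel-wise normalization statistics (2b scalars for b spectral bands) while the pre-trained weights stay frozen. A minimal sketch of this input-stage standardization, assuming an (H, W, C) float image and an illustrative function name:

```python
import numpy as np

def normalize_input(image, ch_mean, ch_std):
    """Standardize each band as x' = (x - mu_c) / sigma_c, using the
    calibrated per-channel hyperparameters (mu, sigma) in place of the
    image's own moments; the model weights are never touched."""
    mean = np.asarray(ch_mean, dtype=np.float64)
    std = np.asarray(ch_std, dtype=np.float64)
    return (np.asarray(image, dtype=np.float64) - mean) / std
```

Because only these 2b scalars vary during the search, each candidate setting costs a single forward pass rather than a training run.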
The relatively low baseline IoUs stem from the dataset’s heterogeneous composition—samples collected from various locations and times—causing inconsistent spectral statistics that limit model consistency. Specifically, coniferous was selected as the optimization focus because it is empirically one of the most challenging classes for segmentation due to its sharp leaf geometry and high intra-class variance. Furthermore, the proposed calibration showed consistent improvement in the Korean dataset (+67.86 p.p. for coniferous), confirming its cross-zonal robustness while preserving the spectral identity of each image.
Figure 3 shows the procedure for performing channel correction-based class-specific optimization by applying the FLAIR model to the toy dataset provided by FLAIR and combining results through decision fusion. While AI-HUB ground truth has approximately 20% unclassified areas, FLAIR classifies land cover into 15 categories with very small unclassified areas. The initial IoU obtained by using the image’s mean and standard deviation as hyperparameters was lowest for coniferous forests, as shown in Table 6. The low IoU in both AI-HUB and FLAIR appears to be due to the pointed, elongated coniferous leaves, which are particularly difficult to detect in images. In Figure 3, class-specific optimization targeting coniferous forests (d) is compared with the basic inference results before optimization (c). Decision fusion was performed to generate result (e) by replacing pixels at the same locations in the basic inference (c) with the coniferous forest class, based on the segmentation results (d) obtained by applying the optimal hyperparameters searched for coniferous forests.
Both E1 (Korea) and E2 (France) exhibited similarly low baseline IoU for coniferous forests (13.57% and 13.36%, respectively), reflecting severe inter-regional spectral variation: the Korean data cover the metropolitan area and its vicinity, while the French data span the whole country, both acquired under heterogeneous conditions. However, the Korean dataset achieved a larger improvement (+67.86 p.p.) than the French dataset (+61.88 p.p.), primarily attributable to its higher image resolution and quality. The E1 dataset utilizes ultra-high-resolution UAV imagery at 12 cm, whereas the E2 dataset consists of 25 cm aerial imagery. This higher spatial resolution, combined with significantly higher RGB standard deviations in the Korean dataset (67, 62, 64) compared to the French dataset (52, 44, 43)—a difference of 22.4% to 32.8%—indicates sharper contrast and richer spectral information. Consequently, when channel statistics were optimized via RSM calibration, the superior clarity and fine-grained spatial details of the 12 cm Korean imagery enabled the model to define more distinct class boundaries, resulting in improved performance.

4.1.3. AI_HUB Model on Korea Dataset (E3a, E3b)

This experiment applies optimization algorithms to South Korean neural network models to assess whether the proposed technique generalizes to deep learning architectures beyond the FLAIR project and the ResNet framework. Table 7 presents the results for UAV (E3a) and aerial (E3b) imagery from AI-HUB. In this setting, the street tree class—which exhibited the lowest baseline IoU (80.86% and 64.27% for UAV and aerial, respectively)—was selected as the primary optimization target.
While the primary target, the street tree, showed only marginal improvements (+1.79 p.p. in E3a, +2.19 p.p. in E3b), substantial gains were observed in other classes. Specifically, UAV calibration (E3a) showed marked improvements in parking lot (+9.91 p.p.), dry field (+6.27 p.p.), and building (+4.28 p.p.), whereas the aerial calibration (E3b) gains were modest across all categories.
This cross-class improvement pattern suggests that the algorithm does not merely adjust the morphological shapes of a specific category, but rather focuses on calibrating the overall spectral balance—such as brightness and color consistency—across the entire image. Although these secondary benefits were not the primary objective, hyperparameters optimized for a single target class inherently enhance the accuracy of related classes. For instance, optimizing for coniferous forests naturally improves the performance of other green spaces, such as deciduous forests and grasslands. Similarly, because street trees are geographically associated with road and parking infrastructure, calibrating for street trees yields a synergistic effect that simultaneously boosts the classification accuracy of roads and parking lots.
Furthermore, the effectiveness of calibration is strongly modulated by spatial resolution; the higher resolution of UAV imagery (12 cm GSD) enables significantly more effective channel calibration than the lower resolution of aerial imagery (25 cm GSD).
Experiments using the FLAIR model (Table 5 and Table 6) showed rapid IoU gains from low initial baselines, but the coniferous class plateaued at 81.43% (E1) and 75.24% (E2), both below the 85% threshold. In contrast, experiments using the AI-HUB model (E3a) achieved substantial improvements despite starting from a much higher baseline IoU (80.86–96.23% in UAV imagery). In Experiment E3a, while most classes reached IoU levels exceeding 90% after calibration, the street tree class showed only a marginal improvement of +1.79 p.p. (reaching 82.65%). Investigation into the ground truth samples revealed that this limited gain was not due to a lack of model fitness, but rather to incomplete reference labeling. Many roads and parking lots—where street trees are typically planted—were classified as “other” in the ground-truth dataset, without finer classification. Consequently, street trees located along this unlabeled infrastructure could not be properly identified as the street tree class and were instead lumped into the generic “other” category. The model accurately captured the spectral characteristics of these trees—identifying them as deciduous vegetation—but was penalized for failing to match the coarse “other” label in the ground truth. This result underscores that the proposed algorithm accurately captures transferable spectral features even when the measured performance appears suppressed by incomplete annotation.

4.2. Automatic Ratio-Based Transfer Across Regions and Models

This section examines whether the derived optimal hyperparameters can be transferred across geographic zones and model architectures without additional calibration. Section 4.2.1 first analyzes the response surface characteristics that enable such transferability, followed by cross-district validation experiments (E4–E5).

4.2.1. Performance Function Properties and Ratio Derivation

To assess whether the hyperparameter auto-search performance function designed to improve land-cover segmentation accuracy can be applied across disparate locations, we fitted the function separately to D004 and D067 and then applied the proportional mapping of the stationary (optimal) solution from D004 to D067. We first compared the fitted response surface with the empirical IoU distribution obtained from actual inference. Figure 4 presents the predicted response surface for the coniferous class in D067, where two RGB statistics (mean or standard deviation) are varied along the axes while the remaining factors are fixed at their optimal values. Across all six panels—three mean pairs (R_mean–G_mean, G_mean–B_mean, R_mean–B_mean) and three standard-deviation pairs (R_std–G_std, G_std–B_std, R_std–B_std)—the high-IoU zone exhibits local concentration near the stationary point (focusing).
Empirically observed IoU distributions also exhibited local concentration within specific ranges of RGB statistics (Figure 5). Furthermore, coniferous forest classification investigations conducted on French FLAIR test districts (toy dataset) and Korean AI-HUB imagery revealed that, despite significant differences in geographic coordinates and acquisition conditions, all trials consistently showed the same optimal hyperparameter ratio pattern across channels (R↓, G↑, B↓). Based on these reproducible observations, we hypothesize that the weight distribution of neural networks trained on RGB spectral properties is structured as a manifold within a specific region of feature space. Consequently, we infer that the ideal solution derived for each dataset maintains a proportional relationship with the RGB statistical attributes of that dataset, and that this relationship is transferable between distinct territories.
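The surfaces in Figure 4 correspond to a full second-order model fitted to sampled (hyperparameter, IoU) pairs, whose optimum is found by solving the zero-gradient condition, the standard RSM procedure. A generic sketch under that assumption (function names are illustrative; the authors' exact fitting routine is not specified here):

```python
import numpy as np
from itertools import combinations

def fit_quadratic_surface(X, y):
    """Least-squares fit of a full second-order model:
    y ~ b0 + sum_i b_i x_i + sum_i c_ii x_i^2 + sum_{i<j} c_ij x_i x_j.
    X : (n, k) sampled hyperparameter settings; y : (n,) observed IoU."""
    n, k = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(k)]                                 # linear
    cols += [X[:, i] ** 2 for i in range(k)]                            # pure quadratic
    cols += [X[:, i] * X[:, j] for i, j in combinations(range(k), 2)]   # interactions
    A = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def stationary_point(coef, k):
    """Solve grad f = 0 for f(x) = b0 + b.x + x'Bx (B symmetric): x* = -B^{-1} b / 2."""
    b = coef[1:1 + k]
    B = np.zeros((k, k))
    np.fill_diagonal(B, coef[1 + k:1 + 2 * k])
    for idx, (i, j) in enumerate(combinations(range(k), 2)):
        B[i, j] = B[j, i] = coef[1 + 2 * k + idx] / 2.0
    return np.linalg.solve(-2.0 * B, b)
```

A negative-definite B confirms that the fitted surface is concave, i.e., the stationary point is a maximum rather than a saddle.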
Based on this reasoning, a proportional-transfer method was developed to estimate optimal parameters across heterogeneous areas by applying the same ratio between the optimal solution and the baseline distribution obtained from the performance function. First, for D004, the hyperparameter values that yielded the highest IoU were identified, and the ratio of these optimal parameters to the domain’s baseline RGB statistics (mean and standard deviation) was computed. This ratio was then applied to the baseline RGB distribution of another region, D067, to construct a new hyperparameter set for inference. The hyperparameter vector θ is defined as the set of RGB mean and standard-deviation values: θ = (R_mean, G_mean, B_mean, R_std, G_std, B_std). Let θ_{A,base}, θ_{B,base}, and θ_{B,best} denote the baseline of region A, the baseline of region B, and the optimal parameters of B, respectively. Then the predicted optimum for A, obtained by applying the proportional ratio of B’s optimum, θ_{A,pred}^{B}, is defined as follows (⊙: element-wise multiplication; ⊘: element-wise division):
θ_{A,pred}^{B} = θ_{A,base} ⊙ (θ_{B,best} ⊘ θ_{B,base})
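The transfer rule above is plain element-wise arithmetic over the six statistics; the values below are illustrative round numbers, not entries from Table 8:

```python
import numpy as np

def proportional_transfer(theta_A_base, theta_B_base, theta_B_best):
    """theta_{A,pred}^{B} = theta_{A,base} * (theta_{B,best} / theta_{B,base}),
    element-wise over (R_mean, G_mean, B_mean, R_std, G_std, B_std)."""
    return np.asarray(theta_A_base, dtype=float) * (
        np.asarray(theta_B_best, dtype=float) / np.asarray(theta_B_base, dtype=float)
    )

# Illustrative example: region B's optimum scales its baseline by
# (0.9, 1.1, 0.9, 1.1, 0.9, 1.1); the same ratios are applied to region A.
pred = proportional_transfer(
    [100, 100, 100, 50, 50, 50],   # theta_A_base
    [90, 100, 110, 40, 50, 60],    # theta_B_base
    [81, 110, 99, 44, 45, 66],     # theta_B_best
)
```

Because only ratios are transferred, the method adapts to region A's own intercept (its baseline statistics) rather than copying region B's absolute optimum.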

4.2.2. Applying D004 Optimization to D067 (E4)

The first column of Table 8 (a) presents the land-cover classification results obtained by substituting the baseline RGB mean and standard deviation (θ_{D004,base}) of the D004 image. The IoU values for coniferous and plowed land were 25.77% and 19.00%, respectively, approximately 20 p.p. lower than those of the FLAIR model (56.60% and 49.01%) (Table 6), while the other classes reached the baseline IoU levels provided by FLAIR. The second column (b) presents the results optimized for the D004 data, focusing on the coniferous class (θ_{D004,best}). The optimal RGB mean values were 90.39, 103.62, and 86.23, with standard deviations of 46.91, 38.01, and 33.23, respectively. Using θ_{D004,best}, the IoU values for coniferous and plowed land increased to 42.80% and 58.83%, corresponding to improvements of +17.03 p.p. and +39.82 p.p. over the baseline. Subsequently, the predicted optimum θ_{D004,pred}^{D067}, computed by applying the optimized ratio obtained from D067 (θ_{D004,base} : θ_{D004,pred}^{D067} = θ_{D067,base} : θ_{D067,best}), yielded RGB mean values of 88.01, 101.49, and 85.71, and standard deviations of 46.36, 38.30, and 33.13 (c). Substituting θ_{D004,pred}^{D067} produced IoU values of 42.61% for coniferous and 66.98% for plowed land (c).
The numerical discrepancy between the local optimum (b) and the predicted optimum (c) in Table 8 stems from differing territorial intercepts. Since D004 has lower baseline RGB means and standard deviations than D067, the starting point of its unimodal response surface is fundamentally shifted. While the geometric curvature and peak reflect the intrinsic spectral properties of the objects, the absolute optimal coordinates vary across regions. Our proportional transfer strategy addresses this distributional discrepancy by transferring relative ratios rather than absolute values. This validates our technique as a robust method for domain adaptation without requiring retraining or fine-tuning.
Relative to the local optimum θ_{D004,best} (b), the predicted parameters (c) slightly decreased the coniferous IoU by 0.19 p.p., whereas plowed land increased significantly by +12.15 p.p. These results indicate that the optimized ratio derived from D067’s performance function contributed to a greater improvement in the plowed land class. When the optimized hyperparameters of U-Net (ResNet34) were applied to the U-Net (MiT-B5) model, similar performance improvements were observed. Despite architectural differences, coniferous IoU improved by up to +4.37 p.p., and herbaceous IoU additionally increased by +5.88 p.p. These results demonstrate that the proposed method is applicable across different architectures.
Figure 6 compares the land-cover classification maps of D004, generated from the predefined class assignments (a) and from the application of different hyperparameter settings. The map produced using the baseline parameters θ_{D004,base} (b) shows that many coniferous areas were misclassified as deciduous regions. In contrast, the map produced using the optimized parameters θ_{D004,best} (c) demonstrates clearer class boundaries. Furthermore, when the performance function derived from D067 was used to infer the optimized parameters θ_{D004,pred}^{D067} (d), coniferous areas were slightly over-classified, whereas the area of agricultural land decreased, and the IoU declined from 54% to 34%. These results indicate that the performance function, implemented under an identical model structure, maintains a certain level of calibration capability and applicability even in domains that differ from the training region.

4.2.3. Applying D067 Optimization to D004 (E5)

Table 9 presents the land-cover classification results for D067 obtained using the baseline RGB parameters θ_{D067,base} (a). The IoU for the agricultural land class was 10.51%, significantly lower than that achieved by the FLAIR-published model (Table 6a). In Experiments E1 and E2 (Table 5 and Table 6), coniferous IoUs were notably low (13.57% and 13.36%) owing to heterogeneous spectral characteristics from geographically dispersed samples. By contrast, Experiment E5 (Table 9) focused on district D067, where consistent elevation and concentrated geographic coverage resulted in a base coniferous IoU of 75.93%—exceeding even the original FLAIR model performance of 56.60%. Nevertheless, because the coniferous IoU was the lowest in most other regions, it was retained as a target class for improvement. The brushwood class was excluded from the improvement targets because, although its IoU remained low, its total area was relatively small. Based on the agricultural land and coniferous classes, the optimal parameters θ_{D067,best} that achieved the highest IoU were derived using the performance function. The RGB means were 98.05, 114.46, and 103.70, and the corresponding standard deviations were 62.32, 56.19, and 53.94 (b). Applying these optimized parameters increased the IoU values for coniferous and agricultural land to 80.35% and 63.87%, improvements of +4.47 p.p. and +53.36 p.p., respectively, over the baseline. Subsequently, using the same model structure, the optimal ratio obtained from the D004 region (θ_{D067,base} : θ_{D067,pred}^{D004} = θ_{D004,base} : θ_{D004,best}) was applied to D067 as θ_{D067,pred}^{D004}. The resulting RGB means were 100.69, 115.87, and 104.32, and the standard deviations were 63.06, 55.77, and 54.11 (c). With this parameter set, the IoU values for coniferous and agricultural land increased to 80.41% and 70.43%, corresponding to further improvements of +0.06 p.p. and +6.56 p.p. over the results obtained with θ_{D067,best}.
Similar results were observed for D067. When U-Net (ResNet34)’s optimized hyperparameters (θ_{D067,best}) and the D004 proportional-transfer results (θ_{D067,pred}^{D004}) were applied to U-Net (MiT-B5), coniferous IoU improved by up to +7.06 p.p. and herbaceous IoU increased by +16.35 p.p. These findings demonstrate that the proposed method can be applied automatically across different architectures and regions without additional training. These results indicate that, in both D004 and D067, the performance function maintained its generalization capability across diverse zones. In other words, the optimized ratio obtained from one district reproduced similar performance improvements when transferred to another site with different spectral characteristics.
Figure 7 compares the land-cover classification maps of D067, generated from the predefined class assignments (a) and from the application of different hyperparameter settings. The map using the baseline parameters θ_{D067,base} (b) shows that a large portion of herbaceous areas was misclassified as agricultural land. In contrast, the map using the optimized parameters θ_{D067,best} (c) demonstrates that the IoU of herbaceous increased from 56.12% to 74.68%. When applying the proportional function derived from D004, the map inferred with θ_{D067,pred}^{D004} (d) also showed improvements in both coniferous and herbaceous classes. The IoU of herbaceous reached a level comparable to θ_{D067,best}, while the IoU of agricultural land increased to 70.43%, higher than with both θ_{D067,base} and θ_{D067,best}. These results indicate that the optimal ratios derived from the performance functions of D004 and D067 are transferable across regions. In other words, even when applied to areas with different spectral characteristics, the model maintained a consistent level of calibration and generalization performance.

4.3. Statistical and Economic Validation of RSM-Based Performance Function

This section validates statistical significance, cross-domain generalizability, and economic advantages over conventional fine-tuning.

4.3.1. Statistical Validity and Multi-Dimensional Generalizability

The unimodal, convex IoU response surface assumption was statistically validated across all six experiments (p < 1.54 × 10⁻¹⁴⁷). Experiments E1, E4, and E5 showed high model fitness (R² = 0.973–1.000), confirming that performance follows a well-defined mathematical model rather than a random distribution.
The relatively lower R² values observed in some experiments demonstrate that the proposed methodology accurately reflects the uncertainties of the actual system. In cases such as E2 (Adj. R² = 0.706) and E3b (Adj. R² = 0.771), where baseline class-specific performance was low (E2 coniferous IoU: 13.36%, E3b street tree accuracy: 61.68%; Table 10), the model’s response to spectral changes was inconsistent, resulting in an irregular surface that was accurately captured by the decline in R². Additionally, cases such as street trees in E3a, where improvement was constrained despite a high R², were analyzed and attributed to labeling noise, such as misclassifying adjacent roads or parking lots as the “Other” class. These results indicate that the response surface model serves as a reliable indicator of data quality and class-specific characteristics, rather than merely reflecting numerical overfitting.
Beyond fine-tuning specific objects, this methodology ensures broad generalizability by capturing fundamental spectral-spatial interactions of ground surfaces. Consistent performance improvements were confirmed across varying sensor and resolution conditions, from 12 cm UAV imagery to 25 cm aerial imagery. The synergistic effect observed in E3a, where calibration derived for the street tree class simultaneously improved performance for the non-targeted parking lot class, provides empirical evidence that this method utilizes the inherent physical reflection principles of ground objects rather than being limited to specific object geometries.
The proposed response surface model exhibits characteristics independent of specific neural network architectures. As confirmed in experiments E4 and E5, optimal hyperparameter settings derived from the ResNet34 architecture transferred successfully to the MiT-B5 model without recalibration, yielding similar performance improvements. This cross-architecture transferability confirms that the optimization method captures intrinsic spectral properties of ground objects rather than model-specific internal characteristics.
As demonstrated in experiments E1 through E5, the algorithm exhibits exceptional sample efficiency, constructing a robust response surface with only 32 to 150 tiles, corresponding to areas of 12 to 157 hectares. By generating sample data from an average of 50 to 60 model inferences, sufficient statistical significance is achieved within data ranges typically collected in operational environments. Notably, a robust surface was constructed even with a per-class coverage as small as 0.445 hectares (E3b). Although the spatial extent was very limited, it provided sufficient spectral information to generate 50–60 independent performance samples via repeated inference. This number of samples is approximately twice the theoretical minimum of 28 measurements required for a six-variable quadratic model, ensuring the high reliability and statistical significance of the resulting response surface.
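The minimum of 28 measurements follows directly from counting the coefficients of a full second-order model in k factors; the helper below is just that count:

```python
def min_quadratic_samples(k: int) -> int:
    """Coefficient count of a full quadratic model in k factors:
    1 intercept + k linear + k pure-quadratic + k(k-1)/2 interaction terms,
    which equals (k + 1)(k + 2) / 2."""
    return 1 + 2 * k + k * (k - 1) // 2
```

For the six RGB statistics used here (k = 6) this gives 28, so the 50–60 inference samples provide roughly a twofold margin over the identifiability minimum.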

4.3.2. Cost–Benefit Analysis

A key advantage of our approach lies in class-specific spatial selection. Rather than processing the entire dataset, our method optimizes only tiles where the target class actually exists, exploiting the spatial distribution of land cover classes. For instance, experiment E1 optimized only 32 out of 100 UAV tiles containing coniferous forests (32% selection rate), while E4 optimized 50 tiles in northern coniferous areas (50% selection rate). This strategy selectively calibrates low-performing classes and improves overall accuracy via decision fusion, distinct from fine-tuning, which requires retraining across all tiles regardless of class distribution. To quantify this advantage, we compare implementation costs with those of full model training (Table 11) and with prior studies [31,32] (Table 12).
As seen in the cases of AI-HUB and FLAIR, training a full model requires large-scale datasets and high-performance GPU clusters (Table 11). In contrast, the proposed channel calibration operates with a minimum of 32–150 tiles, completing optimization within 2–11 h on commodity CPU-based environments (Intel i9-13900). Despite these minimal costs, experimental results demonstrate exceptional performance gains. This study used the Error Reduction Rate (ERR) to account for the fact that performance improvements become increasingly difficult as baseline accuracy increases. Notably, Base IoU reflects actual inference performance on the target area, which often drops significantly due to cross-regional transfer despite high official benchmark scores (e.g., 13.57 in E1). In experiment E1, the proposed method improved Base IoU to 80.35 (77.27% ERR), resolving over three-quarters of real-world errors encountered during deployment. Cloud costs were reduced by a minimum of 23.5× to a maximum of over 4000× compared to full training.
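The ERR figure quoted for E1 is reproduced by the definition below (IoU on a 0–100 scale; the function name is illustrative):

```python
def error_reduction_rate(base_iou: float, calibrated_iou: float) -> float:
    """ERR (%): fraction of the remaining error (100 - base IoU)
    eliminated by calibration."""
    return 100.0 * (calibrated_iou - base_iou) / (100.0 - base_iou)
```

With E1's values, error_reduction_rate(13.57, 80.35) is consistent with the reported 77.27% ERR within rounding; normalizing by the remaining error is what makes gains from already-high baselines comparable to gains from low ones.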
A direct comparison with prior lightweight transfer learning research targeting labeling cost reduction demonstrated the superiority of the proposed channel calibration across all aspects (Table 12): data efficiency, hardware accessibility, execution time, and cost-effectiveness. While the methods in [31] and MTPI [32] require thousands to tens of thousands of tiles and tens to hundreds of hours of training on high-performance GPUs (Tesla V100, RTX 4090), this study completes optimization within hours using only tens to hundreds of tiles on commodity CPU environments. Performance-wise, it achieved approximately 2.6× higher IoU than [31] while maintaining accuracy comparable to MTPI, yet demonstrating 75× and 10× superior cost-effectiveness in the Cost/ERR metric, respectively. Cost/ERR denotes the cost per unit error reduction; lower values indicate more economically efficient performance improvements. This study recorded 0.04–0.81 (average 0.43), proving significantly superior cost-efficiency compared to 31.87 for [31] and 4.21 for MTPI. Notably, this study achieved methodological distinctiveness by adapting to multiple regions without retraining, using a training-free approach that optimizes only input-normalization hyperparameters—channel-wise mean and standard deviation (2b parameters across b spectral bands)—while keeping the pre-trained model weights frozen.
The proposed channel calibration demonstrates itself as a superior architectural choice rather than a compromise. It achieves cost reductions of up to thousands of times relative to full model training and tens of times the cost-effectiveness of prior transfer learning studies, while maintaining equivalent or superior performance. Optimal hyperparameters derived using response surface methodology are transferable across domains and models, enabling automated performance enhancement. This allows research budgets previously allocated to redundant retraining to be redistributed toward expanded geographic coverage, increased monitoring frequency, or multi-temporal analysis. Particularly, the capability to operate on commodity CPUs alone establishes a democratic environment where all researchers can implement state-of-the-art deep learning-based land cover classification technologies, even in settings with limited access to specialized AI infrastructure. This presents a versatile and practical framework that is scalable to arbitrary multispectral sensor configurations and is expected to lower the economic barriers to satellite-based environmental monitoring substantially.

5. Conclusions

Our approach achieved substantial performance improvements of up to 67.86 p.p. in IoU through inference-stage hyperparameter calibration while maintaining the original network structure. This method demonstrated applicability across various areas, including different locations, periods, and environments (e.g., Korea and France), as well as different models.
Transfer learning techniques can be categorized into four types based on the extent of architecture modification and the requirements for training on new data (Table 13) [17]. These include traditional transfer learning, which improves the architecture by training on new data; data expansion, which preserves the architecture while training on new data; weight fine-tuning, which improves the architecture without training on new data; and lightweight transfer learning, which modifies neither the architecture nor the training. This study falls into the fourth category, employing hyperparameter adjustment and label refinement through decision fusion.
The contributions of this study can be summarized in four aspects. First, the proposed method demonstrates resource efficiency by operating without requiring GPUs or large-scale datasets. Second, it enhances accessibility by enabling researchers without complex technical backgrounds to apply the technique. Third, it provides practical value by enabling immediate reuse of existing models and data without retraining. Fourth, it confirms the methodological significance by demonstrating the potential for performance optimization through the utilization of expert knowledge.
Beyond practical contributions, this study reveals a fundamental pattern through iterative experiments: CNN weight spaces exhibit structural order corresponding to input data’s spectral and spatial characteristics. Weight distributions stabilize as low-order polynomials of spectral statistics (channel-wise mean and standard deviation), enabling inference in unseen regions without retraining. Neural networks have traditionally been perceived as “black boxes” due to the opacity of their internal representations [50,51].
Addressing such opacity, the history of science demonstrates that similar challenges have been resolved through inductive research, as exemplified by Newton’s falling apple [52] and Durkheim’s suicide rates [53]. Like these precedents, this study suggests the existence of a ‘Consistent Structural Ordering,’ where CNN weights are systematically organized according to domain-specific statistical characteristics. This empirical observation may provide insights into the underlying principles of deep learning architectures across diverse domains, including medical imaging and natural language processing.

Supplementary Materials

The interactive pilot service for visual verification of this study is available at http://moondb.iptime.org/. The Algorithm Foundation (E0) consists of the pilot experiment results for the Gimpo area, which served as the Proof-of-Concept (PoC) to establish the fundamental response surface optimization logic described in Figure 1. For the Main Experiments (E1, E2, E4, and E5), the service provides a curated selection of representative target tiles where the performance gains from channel calibration and decision fusion are most significant. To ensure server stability and efficient data transfer, E1 and E4 present 32 and 50 representative tiles, respectively. Experiments E3a and E3b, despite the high statistical accuracy reported in Table 7, were excluded from the pilot service due to the high computational load and bandwidth requirements of hosting the extensive street-tree datasets in a personal server environment.

Author Contributions

Conceptualization, H.-J.M. and N.-W.C.; methodology, H.-J.M.; validation, H.-J.M. and N.-W.C.; formal analysis, H.-J.M.; investigation, N.-W.C.; resources, H.-J.M.; data curation, H.-J.M.; writing—original draft preparation, H.-J.M.; writing—review and editing, N.-W.C.; visualization, H.-J.M.; supervision, N.-W.C.; project administration, N.-W.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The datasets analyzed in this study are publicly available. The FLAIR dataset can be accessed at https://ignf.github.io/FLAIR/, and the AI-HUB Land Cover dataset is available at https://aihub.or.kr/ (registration required). Additional processed data, including class-wise channel calibration parameters and optimization files generated in this study, are not publicly available due to ongoing intellectual property (patent) considerations.

Acknowledgments

This study was supported by the Research Program funded by the SeoulTech (Seoul National University of Science and Technology).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI-HUB: Artificial Intelligence Hub
FLAIR: French Land cover from Aerospace ImageRy
GSD: Ground Sampling Distance
K-NLIF: Korea National Land Information Platform

  13. Intergovernmental Panel on Climate Change (IPCC). 2019 Refinement to the 2006 IPCC Guidelines for National Greenhouse Gas Inventories; Intergovernmental Panel on Climate Change: Geneva, Switzerland, 2019. [Google Scholar]
  14. Gomez-Ossa, L.F.; Sanchez-Torres, G.; Branch-Bedoya, J.W. Land Cover Classification in the Antioquia Region of the Tropical Andes Using NICFI Satellite Data Program Imagery and Semantic Segmentation Techniques. Data 2023, 8, 185. [Google Scholar] [CrossRef]
  15. Chroni, A.; Vasilakos, C.; Christaki, M.; Soulakellis, N. Fusing Multispectral and LiDAR Data for CNN-Based Semantic Segmentation in Semi-Arid Mediterranean Environments: Land Cover Classification and Analysis. Remote Sens. 2024, 16, 2729. [Google Scholar] [CrossRef]
  16. Caye Daudt, R.; Le Saux, B.; Boulch, A.; Gousseau, Y. Multitask learning for large-scale semantic change detection. Comput. Vis. Image Underst. 2019, 187, 102783. [Google Scholar] [CrossRef]
  17. Dastour, H.; Hassan, Q.K. A Comparison of Deep Transfer Learning Methods for Land Use and Land Cover Classification. Sustainability 2023, 15, 7854. [Google Scholar] [CrossRef]
  18. Ma, Y.; Chen, S.; Ermon, S.; Lobell, D.B. Transfer learning in environmental remote sensing. Remote Sens. Environ. 2024, 301, 113924. [Google Scholar] [CrossRef]
  19. Naushad, R.; Kaur, T.; Ghaderpour, E. Deep Transfer Learning for Land Use and Land Cover Classification: A Comparative Study. Sensors 2021, 21, 8083. [Google Scholar] [CrossRef]
  20. Yoo, S.; Sohn, H.-G. Automatic Building Extraction Using Deep Learning-Based Semantic Segmentation Technique: Focusing on changes in accuracy according to the weight of the model and transfer learning. J. Korean Soc. Surv. Geod. Photogramm. Cartogr. 2023, 41, 605–615. [Google Scholar] [CrossRef]
  21. Alem, A.; Kumar, S. Transfer Learning Models for Land Cover and Land Use Classification in Remote Sensing Image. Appl. Artif. Intell. 2022, 36, 2014192. [Google Scholar] [CrossRef]
  22. Gomroki, M.; Hasanlou, M.; Reinartz, P. STCD-EffV2T Unet: Semi Transfer Learning EfficientNetV2 T-Unet Network for Urban/Land Cover Change Detection Using Sentinel-2 Satellite Images. Remote Sens. 2023, 15, 1232. [Google Scholar] [CrossRef]
  23. Yifter, T.; Razoumny, Y.N.; Lobanov, V.K. Deep Transfer Learning of Satellite Imagery for Land Use and Land Cover Classification. Inform. Autom. 2022, 21, 963–982. [Google Scholar] [CrossRef]
  24. Siddamsetty, J.; Stricker, M.; Charfuelan, M.; Nuske, M.; Dengel, A. Inter-Region Transfer Learning for Land Use Land Cover Classification. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2023, X-1/W1-2023, 881–888. [Google Scholar] [CrossRef]
  25. Khan, M.A.; Hamza, A.; Ibrar, W.; Jamel, L.; Alasiry, A.; Marzougui, M.; Kumari, S.; Nam, Y. Coastal and Land Use Land Cover Area Recognition From High-Resolution Remote Sensing Images Using a Novel Multimodal Attention Inception Residual Deep Network. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 17460–17475. [Google Scholar] [CrossRef]
  26. Hong, I.; Lee, G. Detection of Land Use Change Using Transfer Learning and Aerial Photography. J. Korean Cartogr. Assoc. 2022, 22, 15–24. [Google Scholar] [CrossRef]
  27. Wu, Y.; Li, Z.; Zhao, B.; Song, Y.; Zhang, B. Transfer Learning of Spatial Features From High-Resolution RGB Images for Large-Scale and Robust Hyperspectral Remote Sensing Target Detection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–32. [Google Scholar] [CrossRef]
  28. Ammar, A.; Koubaa, A.; Ahmed, M.; Saad, A.; Benjdira, B. Vehicle Detection from Aerial Images Using Deep Learning: A Comparative Study. Electronics 2021, 10, 820. [Google Scholar] [CrossRef]
  29. Zhou, J.; Gu, X.; Gong, H.; Yang, X.; Sun, Q.; Guo, L.; Pan, Y. Intelligent classification of maize straw types from UAV remote sensing images using DenseNet201 deep transfer learning algorithm. Ecol. Indic. 2024, 166, 112331. [Google Scholar] [CrossRef]
  30. Fayaz, M.; Nam, J.; Dang, L.M.; Song, H.-K.; Moon, H. Land-Cover Classification Using Deep Learning with High-Resolution Remote-Sensing Imagery. Appl. Sci. 2024, 14, 1844. [Google Scholar] [CrossRef]
  31. Van den Broeck, W.A.J.; Goedemé, T.; Loopmans, M. Multiclass Land Cover Mapping from Historical Orthophotos Using Domain Adaptation and Spatio-Temporal Transfer Learning. Remote Sens. 2022, 14, 5911. [Google Scholar] [CrossRef]
  32. Chen, Y.; Zhou, B.; Xiaopeng, C.; Ma, C.; Cui, L.; Lei, F.; Han, X.; Chen, L.; Wu, S.; Ye, D. A method of deep network auto-training based on the MTPI auto-transfer learning and a reinforcement learning algorithm for vegetation detection in a dry thermal valley environment. Front. Plant Sci. 2024, 15, 1448669. [Google Scholar] [CrossRef]
  33. Seong, S.K.; Choi, S.K.; Choi, J.W. Cloud Detection of PlanetScope Imagery Based on Deeplab-V3+ by Using Transfer Learning. J. Korean Soc. Geospat. Inf. Sci. 2020, 28, 25–32. [Google Scholar] [CrossRef]
  34. Lee, Y.-k.; Sim, W.-d.; Lee, J.-s. Assessing the Impact of Sampling Intensity on Land Use and Land Cover Estimation Using High-Resolution Aerial Images and Deep Learning Algorithms. J. Korean Soc. For. Sci. 2023, 112, 267–279. [Google Scholar] [CrossRef]
  35. Boston, T.; Van Dijk, A.; Larraondo, P.; Thackway, R. Comparing CNNs and Random Forests for Landsat Image Segmentation Trained on a Large Proxy Land Cover Dataset. Remote Sens. 2022, 14, 3396. [Google Scholar] [CrossRef]
  36. Sertel, E.; Ekim, B.; Ettehadi Osgouei, P.; Kabadayi, M.E. Land Use and Land Cover Mapping Using Deep Learning Based Segmentation Approaches and VHR Worldview-3 Images. Remote Sens. 2022, 14, 4558. [Google Scholar] [CrossRef]
  37. Jo, W.; Park, K.-H. Deep learning based Land Cover Change Detection Using U-Net. J. Korean Geogr. Soc. 2022, 57, 297–306. [Google Scholar] [CrossRef]
  38. Hong, I. Land Cover Object Detection Using Aerial Photography and YOLOv5. J. Korean Cartogr. Assoc. 2023, 23, 37–48. [Google Scholar] [CrossRef]
  39. Khan, M.; Hanan, A.; Kenzhebay, M.; Gazzea, M.; Arghandeh, R. Transformer-based land use and land cover classification with explainability using satellite imagery. Sci. Rep. 2024, 14, 16744. [Google Scholar] [CrossRef] [PubMed]
  40. Sahu, M.; Dash, R.; Kumar Mishra, S.; Humayun, M.; Alfayad, M.; Assiri, M. A deep transfer learning model for green environment security analysis in smart city. J. King Saud Univ.-Comput. Inf. Sci. 2024, 36, 101921. [Google Scholar] [CrossRef]
  41. Garioud, A.; Gonthier, N.; Landrieu, L.; De Wit, A.; Valette, M.; Poupée, M.; Giordano, S. FLAIR: A country-scale land cover semantic segmentation dataset from multi-source optical imagery. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023; pp. 16456–16482. [Google Scholar]
  42. Myers, R.H.; Montgomery, D.C.; Anderson-Cook, C.M. Response Surface Methodology: Process and Product Optimization Using Designed Experiments; John Wiley & Sons: Hoboken, NJ, USA, 2016. [Google Scholar]
  43. Torkashvand, A.; Ramezanipour Penchah, H.; Ghaemi, A. Exploring of CO2 adsorption behavior by Carbazole-based hypercrosslinked polymeric adsorbent using deep learning and response surface methodology. Int. J. Environ. Sci. Technol. 2022, 19, 8835–8856. [Google Scholar] [CrossRef]
  44. Liao, W.; Bellens, R.; Pižurica, A.; Gautama, S.; Philips, W. Combining feature fusion and decision fusion for classification of hyperspectral and LiDAR data. In Proceedings of the 2014 IEEE Geoscience and Remote Sensing Symposium, Quebec City, QC, Canada, 13–18 July 2014; pp. 1241–1244. [Google Scholar]
  45. Zhong, Y.; Cao, Q.; Zhao, J.; Ma, A.; Zhao, B.; Zhang, L. Optimal Decision Fusion for Urban Land-Use/Land-Cover Classification Based on Adaptive Differential Evolution Using Hyperspectral and LiDAR Data. Remote Sens. 2017, 9, 868. [Google Scholar] [CrossRef]
  46. Lin, J. Divergence measures based on the Shannon entropy. IEEE Trans. Inf. Theory 1991, 37, 145–151. [Google Scholar] [CrossRef]
  47. Endres, D.M.; Schindelin, J.E. A new metric for probability distributions. IEEE Trans. Inf. Theory 2003, 49, 1858–1860. [Google Scholar] [CrossRef]
  48. Shi, Z.; Fan, J.; Du, Y.; Zhou, Y.; Zhang, Y. LULC-SegNet: Enhancing Land Use and Land Cover Semantic Segmentation with Denoising Diffusion Feature Fusion. Remote Sens. 2024, 16, 4573. [Google Scholar] [CrossRef]
  49. Scott, G.J.; Marcum, R.A.; Davis, C.H.; Nivin, T.W. Fusion of Deep Convolutional Neural Networks for Land Cover Classification of High-Resolution Imagery. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1638–1642. [Google Scholar] [CrossRef]
  50. Sussillo, D.; Barak, O. Opening the black box: Low-dimensional dynamics in high-dimensional recurrent neural networks. Neural Comput. 2013, 25, 626–649. [Google Scholar] [CrossRef]
  51. Olden, J.D.; Jackson, D.A. Illuminating the “black box”: A randomization approach for understanding variable contributions in artificial neural networks. Ecol. Model. 2002, 154, 135–150. [Google Scholar] [CrossRef]
  52. Fara, P. Catch a falling apple: Isaac Newton and myths of genius. Endeavour 1999, 23, 167–170. [Google Scholar] [CrossRef]
  53. Danigelis, N.; Pope, W. Durkheim’s Theory of Suicide as Applied to the Family: An Empirical Test. Soc. Forces 1979, 57, 1081–1106. [Google Scholar] [CrossRef]
Figure 1. Workflow and Algorithm of Hyperparameter Optimization for IoU Improvement.
Figure 2. Class-wise Channel Calibration on AI-HUB datasets with Decision Fusion. (c) baseline prediction on total tiles (n = 100), (d) optimized calibration focusing on target tiles containing the specific class (n = 32), and (e) final decision-fusion results applied back to total tiles (n = 100) to verify domain-wide improvement.
Figure 3. Coniferous-centered Optimization on FLAIR datasets with Decision Fusion.
Figure 4. IoU Response Surfaces with Respect to RGB Means and Standard Deviations (D067). Red dots indicate the optimum points with maximum predicted IoU for each response surface combination.
Figure 5. IoU vs. RGB Means of D004 (left), D067 (right) domain.
Figure 6. Land-cover segmentation map for region D004 under different hyperparameters.
Figure 7. Land-cover segmentation map for region D067 under different hyperparameters.
Table 2. Data collection and Pre-trained Models from AI-HUB and FLAIR.

| Institution | Type | GSD (m) | Patch (ea) | Multispectral Bands | CRS | Model (Backbone) |
|---|---|---|---|---|---|---|
| IGN (https://github.com/IGNF/) | Aerial | 0.20 | 77,412 | 5 (RGB, NIR, Elevation) | EPSG:2154 | U-Net (ResNet34), U-Net (MiTB5) |
| AI-HUB (https://www.aihub.or.kr/) | UAV | 0.12 | 2900 | 3 (RGB) | EPSG:5186 | U-Net |
| AI-HUB (https://www.aihub.or.kr/) | Aerial | 0.25 | 45,000 | 3 (RGB) | EPSG:5186 | U-Net |

Note: All URLs were accessed on 25 December 2025.
Table 3. Experimental Design with Class-Specific Channel Calibration.

| Experiment | Region | Image Type | Purpose | Total Classes | Total Tiles | Target Class | Target Tiles | Model (Backbone) |
|---|---|---|---|---|---|---|---|---|
| E1 | Korea | UAV | Hyperparameter Optimization | 5 | 100 | coniferous | 32 | U-Net (ResNet34) |
| E2 | France | Aerial | Hyperparameter Optimization | 15 | 47 ¹ | coniferous | 47 | U-Net (ResNet34) |
| E3a | Korea | UAV | Hyperparameter Optimization | 12 | 100 | street tree | 100 | U-Net |
| E3b | Korea | Aerial | Hyperparameter Optimization | 12 | 44 ² | street tree | 44 | U-Net |
| E4 | France (D004) | Aerial | Domain Generalization | 15 | 100 | coniferous | 50 | U-Net (ResNet34), U-Net (MiTB5) |
| E5 | France (D067) | Aerial | Domain Generalization | 15 | 150 | coniferous | 150 | U-Net (ResNet34), U-Net (MiTB5) |

Notes: Total Tiles: the number of tiles of the experimental target class that serves as the basis for Base IoU; Target Tiles: the number of calibration target tiles in which the target class exists. D004 and D067 indicate specific district IDs. ¹ Excluding 3 mislabeled tiles from the 50 toy-dataset tiles provided by FLAIR; ² Selected only tiles with street trees from aerial photos provided by AI-HUB.
Table 4. Semantic Mapping between AI-HUB and FLAIR for Experimental Coverage E1–E5.
[Table content provided as an image in the original article.]
Notes: ‘O’ indicates classes included in experiments (E1–E5); ‘★’ denotes target classes used for deriving and validating the Performance Function. Empty cells represent excluded classes due to taxonomic mapping discrepancies.
Table 5. Coniferous-centered Optimization for AI-HUB Dataset Using FLAIR Model.

| Classes | (a) Model IoU (%) | (b) Base IoU (%) | (c) Best IoU (%) | (b) − (a) (p.p.) | (c) − (b) (p.p.) | (c) − (a) (p.p.) |
|---|---|---|---|---|---|---|
| building | 78.58 | 79.83 | 81.34 | 1.25 | 1.51 | 2.76 |
| water | 84.56 | 86.69 | 93.05 | 2.13 | 6.36 | 8.49 |
| coniferous | 56.07 | 13.57 | 81.43 | −42.50 | 67.86 | 25.36 |
| deciduous | 69.53 | 26.04 | 54.53 | −43.49 | 28.49 | −15.00 |
| greenhouse | 60.69 | 87.61 | 91.52 | 26.92 | 3.91 | 30.83 |

Notes: (a) model = official performance reported by the model provider; (b) base = inference using image-wise RGB mean and standard deviation; (c) best = optimized after hyperparameter calibration.
Table 6. Coniferous-centered Optimization for FLAIR Toy Dataset Using FLAIR Model.

| Classes | (a) Model IoU (%) | (b) Base IoU (%) | (c) Best IoU (%) | (b) − (a) (p.p.) | (c) − (b) (p.p.) | (c) − (a) (p.p.) |
|---|---|---|---|---|---|---|
| building | 78.58 | 79.39 | 79.39 | 0.81 | 0.00 | 0.81 |
| pervious surface | 53.00 | 37.28 | 37.28 | −15.72 | 0.00 | −15.72 |
| impervious surface | 71.92 | 70.72 | 70.86 | −1.20 | 0.14 | −1.06 |
| bare soil | 60.60 | 33.44 | 33.44 | −27.16 | 0.00 | −27.16 |
| water | 84.56 | 76.70 | 77.53 | −7.86 | 0.83 | −7.03 |
| coniferous | 56.07 | 13.36 | 75.24 | −42.71 | 61.88 | 19.17 |
| deciduous | 69.53 | 49.58 | 69.04 | −19.95 | 19.46 | −0.49 |
| brushwood | 27.87 | 22.33 | 22.19 | −5.54 | −0.14 | −5.68 |
| vineyard | 75.72 | 62.58 | 62.29 | −13.14 | −0.29 | −13.43 |
| herbaceous | 51.94 | 54.61 | 56.04 | 2.67 | 1.43 | 4.10 |
| agricultural land | 57.73 | 52.22 | 52.14 | −5.51 | −0.08 | −5.59 |
| plowed land | 40.91 | 40.71 | 40.71 | −0.20 | 0.00 | −0.20 |
| swimming | 37.08 | 54.50 | 54.50 | 17.42 | 0.00 | 17.42 |
| greenhouse | 60.69 | 0.38 | 0.38 | −60.31 | 0.00 | −60.31 |
Table 7. Street tree-centered Optimization for AI-HUB datasets with the AI-HUB model.

| Classes | a. Base (E3a, UAV) | b. Best (E3a, UAV) | b − a | c. Base (E3b, Aerial) | d. Best (E3b, Aerial) | d − c |
|---|---|---|---|---|---|---|
| building | 93.08 | 97.36 | 4.28 | 94.82 | 94.26 | −0.56 |
| parking lot | 87.97 | 97.88 | 9.91 | 87.53 | 86.87 | −0.66 |
| road | 95.21 | 97.86 | 2.65 | 95.42 | 94.64 | −0.78 |
| street tree | 80.86 | 82.65 | 1.79 | 64.27 | 66.46 | 2.19 |
| paddy field | 97.23 | 99.09 | 1.86 | 97.54 | 96.98 | −0.56 |
| greenhouse | 96.23 | 95.44 | −0.79 | 92.67 | 91.96 | −0.71 |
| dry field | 91.33 | 97.60 | 6.27 | 95.31 | 93.96 | −1.35 |
| deciduous | 92.29 | 92.26 | −0.03 | 87.26 | 87.36 | 0.10 |
| coniferous | 93.12 | 95.48 | 2.36 | 88.88 | 88.62 | −0.26 |
| bare land | 94.05 | 94.48 | 0.43 | 93.03 | 92.53 | −0.50 |
| water | 94.20 | 97.68 | 3.48 | 95.58 | 95.16 | −0.42 |

Notes: Unlike Table 5 and Table 6, AI-HUB reports accuracy, not IoU. Street tree accuracy: E3a = 79.68%, E3b = 61.68%. Base and best show class-wise IoU after calibration.
Table 8. IoU of D004 Inference: Coniferous-centered RSM Optimum.

| Basis | (a) θ_{D004, base} | (b) θ_{D004, best} | (c) X_{D004, pred}^{D067} |
|---|---|---|---|
| RGB_mean | 87.32, 96.64, 90.50 | 90.39, 103.62, 86.23 | 88.01, 101.49, 85.71 |
| RGB_std | 46.47, 38.61, 33.39 | 46.91, 38.01, 33.23 | 46.36, 38.30, 33.13 |

| Classes | a1. ResNet | a2. MiTB5 | b1. ResNet | b2. MiTB5 | c1. ResNet | c2. MiTB5 | b1 − a1 | b2 − a2 | c1 − a1 | c2 − a2 |
|---|---|---|---|---|---|---|---|---|---|---|
| building | 76.22 | 76.74 | 75.36 | 76.09 | 75.41 | 76.14 | −0.86 | −0.65 | −0.81 | −0.60 |
| pervious surface | 38.73 | 31.97 | 36.03 | 30.53 | 35.46 | 30.68 | −2.70 | −1.44 | −3.27 | −1.29 |
| impervious surface | 56.81 | 59.54 | 57.70 | 59.38 | 58.13 | 59.13 | 0.89 | −0.16 | 1.32 | −0.41 |
| coniferous | 25.77 | 29.15 | 42.80 | 33.48 | 42.61 | 33.52 | 17.03 | 4.33 | 16.84 | 4.37 |
| deciduous | 58.94 | 69.42 | 61.74 | 61.13 | 61.71 | 62.04 | 2.80 | −8.29 | 2.77 | −7.38 |
| brushwood | 40.19 | 47.65 | 40.68 | 47.02 | 40.08 | 47.08 | 0.49 | −0.63 | −0.11 | −0.57 |
| vineyard | 86.40 | 85.49 | 82.90 | 86.40 | 81.59 | 86.50 | −3.50 | 0.91 | −4.81 | 1.01 |
| herbaceous | 50.26 | 48.79 | 48.63 | 51.64 | 48.13 | 51.38 | −1.63 | 2.85 | −2.13 | 2.59 |
| agricultural land | 54.14 | 56.90 | 34.40 | 62.78 | 34.27 | 59.98 | −19.74 | 5.88 | −19.87 | 3.08 |
| plowed land | 19.00 | 36.87 | 58.83 | 37.94 | 66.98 | 38.22 | 39.83 | 1.07 | 47.98 | 1.35 |
| swimming | 24.98 | 72.21 | 25.07 | 72.51 | 25.12 | 72.44 | 0.09 | 0.30 | 0.14 | 0.23 |
| greenhouse | 78.07 | 76.74 | 79.28 | 76.09 | 80.35 | 76.14 | 1.21 | −0.65 | 2.28 | −0.60 |
Table 9. IoU of D067 Inference: Coniferous-centered RSM Optimum.

| Basis | (a) θ_{D067, base} | (b) θ_{D067, best} | (c) X_{D067, pred}^{D004} |
|---|---|---|---|
| RGB_mean | 97.28, 108.99, 109.48 | 98.05, 114.46, 103.70 | 100.69, 116.87, 104.32 |
| RGB_std | 62.47, 56.64, 54.35 | 62.32, 56.19, 53.94 | 63.06, 55.77, 54.11 |

| Classes | a1. ResNet | a2. MiTB5 | b1. ResNet | b2. MiTB5 | c1. ResNet | c2. MiTB5 | b1 − a1 | b2 − a2 | c1 − a1 | c2 − a2 |
|---|---|---|---|---|---|---|---|---|---|---|
| building | 78.32 | 69.13 | 78.04 | 72.62 | 77.52 | 72.29 | −0.28 | 3.49 | −0.80 | 3.16 |
| pervious surface | 68.73 | 67.35 | 67.41 | 63.75 | 67.06 | 63.80 | −1.32 | −3.60 | −1.67 | −3.55 |
| impervious surface | 42.68 | 39.41 | 42.15 | 39.06 | 42.22 | 39.56 | −0.53 | −0.35 | −0.46 | 0.15 |
| water | 81.09 | 84.15 | 84.32 | 83.39 | 84.24 | 83.86 | 3.23 | −0.76 | 3.15 | −0.29 |
| coniferous | 75.93 | 55.61 | 80.35 | 62.19 | 80.41 | 62.67 | 4.42 | 6.58 | 4.48 | 7.06 |
| deciduous | 72.19 | 69.30 | 77.90 | 72.74 | 78.47 | 72.90 | 5.71 | 3.44 | 6.28 | 3.60 |
| brushwood | 6.15 | 1.88 | 4.39 | 3.74 | 4.81 | 4.01 | −1.76 | 1.86 | −1.34 | 2.13 |
| herbaceous | 56.12 | 51.83 | 74.68 | 63.84 | 74.95 | 68.18 | 18.56 | 12.01 | 18.83 | 16.35 |
| agricultural land | 10.51 | 7.85 | 63.87 | 13.95 | 70.43 | 18.80 | 53.36 | 6.10 | 59.92 | 10.95 |
| plowed land | 97.21 | 93.77 | 98.09 | 95.22 | 97.94 | 95.44 | 0.88 | 1.45 | 0.73 | 1.67 |
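The transferred parameters in column (c) of Tables 8 and 9 are consistent with scaling the target domain's baseline channel statistics by the source domain's best-to-base ratio, channel by channel. A minimal sketch of that proportional mapping (the function name is ours, not the paper's code), using the RGB means reported in the two tables:

```python
def proportional_transfer(theta_target_base, theta_source_base, theta_source_best):
    """Predict the target-domain optimum by scaling each target baseline
    channel statistic with the source-domain best/base ratio."""
    return [t * (best / base)
            for t, base, best in zip(theta_target_base, theta_source_base, theta_source_best)]

# RGB means: D004 baseline (Table 8) scaled by the D067 best/base ratio (Table 9).
d004_base = [87.32, 96.64, 90.50]
d067_base = [97.28, 108.99, 109.48]
d067_best = [98.05, 114.46, 103.70]

pred = proportional_transfer(d004_base, d067_base, d067_best)
print([round(v, 2) for v in pred])  # close to the reported 88.01, 101.49, 85.71
```

The same rule applied in the other direction (D067 baseline scaled by the D004 ratio) reproduces the 100.69, 116.87, 104.32 row of Table 9 to within rounding.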
Table 10. Generalizability and Statistical Rationale of the Response Surface Assumption.

| Type | GSD (cm) | Exp. | Model (Backbone) | Target Tiles (ea) | Target Area (ha) | Geographic Coverage for Total Tiles (ha) | Target Class (Coverage per Class, ha) | Model IoU (%) | ΔIoU (Effect of RSM Function, Effect of Proportional Transfer) | R² | Adj. R² | F-Stat. | Prob |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| UAV | 12 | E1 | U-Net (ResNet34) | 32 | 12.08 | ≈1.23 M (Gyeonggi region) | coniferous (1.845) | 56.07 | coniferous (67.86), deciduous (28.49), water (6.36), greenhouse (3.91) | 1.000 | 0.999 | 985.9 | 1.01 × 10⁻³ |
| Aerial | 20 | E2 | U-Net (ResNet34) | 47 | 49.28 | ≈64.54 M (France) | coniferous (2.224) | 56.07 | coniferous (61.88), deciduous (19.46) | 0.770 | 0.706 | 12.03 | 1.55 × 10⁻²⁰ |
| UAV | 12 | E3a | U-Net | 100 | 37.75 | ≈1.23 M (Gyeonggi region) | street tree (0.6956) | 79.68 (accuracy) | street tree (1.79), parking lot (9.91), dry field (6.27), building (4.28) | 0.972 | 0.968 | 269.9 | 1.54 × 10⁻¹⁴⁷ |
| Aerial | 25 | E3b | U-Net | 44 | 72.09 | ≈10.05 M (Korea) | street tree (0.445) | 61.68 (accuracy) | street tree (2.09) | 0.792 | 0.771 | 38.48 | 2.55 × 10⁻⁷⁷ |
| Aerial | 20 | E4 | U-Net (ResNet34) | 50 | 52.43 | 52.43 (D004) | coniferous (6.568) | 56.07 | coniferous (17.03, 16.84), plowed land (39.83, 47.98) | 0.984 | 0.973 | 83.99 | 3.44 × 10⁻²⁵ |
| | | | U-Net (MiTB5) | | | | | 58.91 | coniferous (4.33, 4.37), agricultural land (5.88, 3.08) | | | | |
| Aerial | 20 | E5 | U-Net (ResNet34) | 150 | 157.29 | 157.29 (D067) | coniferous (16.365) | 56.07 | coniferous (4.42, 4.48), herbaceous (18.56, 18.83), agricultural land (53.36, 59.92) | 0.992 | 0.986 | 156.7 | 4.24 × 10⁻²⁷ |
| | | | U-Net (MiTB5) | | | | | 58.91 | coniferous (6.58, 7.06), herbaceous (12.01, 16.35), agricultural land (6.10, 10.95) | | | | |

Notes: E1–E3: single ΔIoU value = RSM-based direct optimization. E4–E5: first value = RSM-based optimization on the target region; second value = proportional transfer from the source region without recalibration.
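The R², adjusted R², F-statistic, and p-value columns of Table 10 are standard diagnostics of a least-squares response-surface fit [42]. A minimal one-variable sketch on synthetic data (illustrative only: the variable names, the quadratic form, and the data below are our assumptions, not the paper's actual multi-channel design):

```python
import numpy as np

# Synthetic peaked response: IoU as a quadratic function of a channel mean, plus noise.
rng = np.random.default_rng(0)
x = np.linspace(80, 120, 30)                                   # hypothetical R-channel means
y = -0.05 * (x - 100) ** 2 + 75 + rng.normal(0, 1.0, x.size)   # noisy IoU observations

# Second-order polynomial fit: the one-dimensional "response surface".
coef = np.polyfit(x, y, deg=2)
y_hat = np.polyval(coef, x)

# Goodness-of-fit statistics of the kind reported in Table 10.
n, p = x.size, 2                              # observations, model terms (excl. intercept)
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
f_stat = (r2 / p) / ((1 - r2) / (n - p - 1))

# Stationary point of the fitted quadratic: the candidate optimum channel mean.
x_opt = -coef[1] / (2 * coef[0])
print(round(r2, 3), round(adj_r2, 3), round(x_opt, 1))
```

With a well-behaved quadratic response, R² stays close to 1 and the stationary point recovers the true optimum near 100, which mirrors how the red optimum points in Figure 4 are obtained from the fitted surfaces.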
Table 11. Cost–Benefit Analysis: Full Model Training vs. Training-Free RSM Calibration.

Labeled data (target tiles and region, by experiment and imagery type):

| Experiment–Imagery | AI-HUB Training | IGN (FLAIR) Training | RSM-Based Optimization | Proportional Transfer (E4, E5) |
|---|---|---|---|---|
| E1–UAV | 800 (Korea) | - | 32 (Gyeonggi region) | - |
| E2–Aerial | - | 218,400 (France) | 47 (Korea) | - |
| E3a–UAV | 800 (Korea) | - | 100 (Gyeonggi region) | - |
| E3b–Aerial | 40,000 (Korea) | - | 44 (Korea) | - |
| E4–Aerial | - | 218,400 (France) | 50 (E4, D004) | 50 (E4, D004) |
| E5–Aerial | - | 218,400 (France) | 150 (E5, D067) | 150 (E5, D067) |

Computing resources and costs:

| Item | AI-HUB Resource | FLAIR Resource | Optimization Resource | Transfer Resource |
|---|---|---|---|---|
| Main Processor | Intel Xeon Gold 6230, NVIDIA Tesla T4 | NVIDIA V100 × 4 GPUs × 4 nodes | Intel i9-13900 CPU × 1 | Intel i9-13900 CPU × 1 |
| Cost of Main Processor (US$) | ≈4400 = 1900 + 2500 (2018 MSRP) | ≈112,000 (2017 MSRP; 7000 × 4 GPUs × 4 nodes) | ≈500–600 (2022 MSRP) | ≈500–600 (2022 MSRP) |
| Execution Time (min) | NA | 432 (7.2 h; 72 epochs × 6 min/epoch) | 140 (E1), 200 (E2), 300 (E3a), 190 (E3b), 220 (E4), 660 (E5) | 3.6 (E4), 11 (E5) |
| Cloud Cost (US$) * | NA | 352.51 = 12.24 ($/h) × 7.2 (h) × 4 (nodes) | ≈3.2–15 = 1.36 ($/h) × (2.3–11 h) | ≈0.08–0.25 |

Performance:

| Experiment. Target Class–Model | AI-HUB Model IoU | FLAIR Model IoU | RSM: Base → Best IoU (ERR) | Transfer: Base → Best IoU (ERR) |
|---|---|---|---|---|
| E1. coniferous–U-Net (ResNet34) | - | 56.07 | 13.57 → 80.35 (77.27) | - |
| E2. coniferous–U-Net (ResNet34) | - | 56.07 | 13.36 → 75.24 (71.42) | - |
| E3a. street tree–U-Net | 84.89 | - | 80.86 → 82.65 (9.35) | - |
| E3b. street tree–U-Net | 55.67 | - | 64.27 → 66.46 (6.13) | - |
| E4. coniferous–U-Net (ResNet34) | - | 56.07 | 25.77 → 42.80 (22.94) | 25.77 → 42.61 (22.69) |
| E4. coniferous–U-Net (MiTB5) | - | 58.91 | - | 29.15 → 33.52 (6.17) |
| E5. coniferous–U-Net (ResNet34) | - | 56.07 | 75.93 → 80.35 (18.36) | 75.93 → 80.41 (18.61) |
| E5. coniferous–U-Net (MiTB5) | - | 58.91 | - | 55.61 → 62.67 (15.90) |

* Cloud costs calculated using 2025 pricing: FLAIR training on AWS p3.8xlarge ($12.24/hour, 4× V100 GPUs per node); optimization on Google Cloud n2-standard-32 ($1.36/hour). Actual costs may vary by region and commitment type. Notes: Error Reduction Rate (ERR) normalizes performance improvement as (Base_Error − Best_Error)/Base_Error, where Error = 100 − IoU, quantifying the proportion of remaining error eliminated. Original Model Benchmark IoU based on official documentation: AI-HUB, IGN FLAIR.
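The ERR definition in the notes to Table 11 can be checked directly against the tabulated values; for example, the E2 improvement 13.36 → 75.24 yields the reported 71.42. A small sketch (the helper name is ours, not the paper's code):

```python
def error_reduction_rate(base_iou, best_iou):
    """ERR = (Base_Error - Best_Error) / Base_Error, with Error = 100 - IoU,
    expressed as the percentage of the baseline error that was eliminated."""
    base_err = 100.0 - base_iou
    best_err = 100.0 - best_iou
    return 100.0 * (base_err - best_err) / base_err

# E2 coniferous (Table 11): base 13.36 -> best 75.24
print(round(error_reduction_rate(13.36, 75.24), 2))  # -> 71.42
# E4 coniferous, RSM optimization: base 25.77 -> best 42.80
print(round(error_reduction_rate(25.77, 42.80), 2))  # -> 22.94
```

Because ERR is normalized by the remaining error, it rewards gains on hard classes: the same absolute ΔIoU counts for more when the baseline error is small.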
Table 12. Cost–Benefit Analysis: Light Transfer Learning vs. Training-Free RSM Calibration.

| Method | Model (Backbone) | Processor (Cloud Service) | Price (US$/h) | Patch Size | #Classes | Target Tiles (GSD, m) | Execution Time (h) | IoU (ERR) | Cloud Cost (US$) | Cost per 1% ERR (US$) |
|---|---|---|---|---|---|---|---|---|---|---|
| Channel calibration [this paper] | U-Net (ResNet34) | Intel i9-13900 (n2-standard-32) | 1.36 | 512² | 12–15 | 32 (0.12) | 2.33 (E1) | 80.35 (77.27) | 3.17 | 0.04 |
| | U-Net (ResNet34) | | | | | 47 (0.25) | 3.33 (E2) | 75.24 (71.42) | 4.53 | 0.06 |
| | U-Net | | | | | 100 (0.12) | 5.00 (E3a) | 82.65 (9.35) | 6.80 | 0.73 |
| | U-Net | | | | | 44 (0.25) | 3.17 (E3b) | 66.46 (6.13) | 4.31 | 0.70 |
| | U-Net (ResNet34) | | | | | 50 (0.20) | 3.67 (E4) | 42.80 (22.94) | 4.99 | 0.22 |
| | U-Net (ResNet34) | | | | | 150 (0.20) | 11.00 (E5) | 80.35 (18.36) | 14.96 | 0.81 |
| Domain adaptation [31] | EfficientNetB5 | Tesla V100 16 GB (n1-standard-8) | 2.48 | 512² | 14 | 4320 (0.30) | 169.6 | 27.80 (27.36) | 871.97 | 31.87 |
| MTPI [32] | Seg-Res-Net50 | RTX 4090 × 1 (a2-highgpu-1g) | 3.67 | 250² | 4 | 30,000 (0.12) | 73 | 90.88 (63.7) | 267.91 | 4.21 |
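The final column of Table 12 is simply the cloud cost divided by the error reduction rate, i.e., US$ spent per percentage point of error removed. A one-line sketch (function name is ours) reproducing two of the tabulated values:

```python
def cost_per_err_point(cloud_cost_usd, err_percent):
    """Cost-efficiency metric of Table 12: US$ per 1% of baseline error eliminated."""
    return cloud_cost_usd / err_percent

print(round(cost_per_err_point(3.17, 77.27), 2))    # channel calibration, E1 row -> 0.04
print(round(cost_per_err_point(871.97, 27.36), 2))  # domain adaptation [31] -> 31.87
```

On this metric the proposed calibration is two to three orders of magnitude cheaper than the retraining-based baselines.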
Table 13. Classification of Transfer Learning Approaches and Position of This Study.

| | Architecture Improvements: O | Architecture Improvements: X |
|---|---|---|
| Learning new datasets: O | <Traditional Transfer Learning> Backbone reconstruction [19,20,21,22,23,24,25]; weight fine-tuning [19,20,22,23,24,26,27,29,30,31,32,33,34,37,38,39,40]; feature extraction [19,21,24,25,31,34] | <Data Expansion> Domain shift [23,24]; domain adaptation [24,31,33]; reference-label generation [22,23,28,31,32,33,34] |
| Learning new datasets: X | <Structure Optimization> Learning rate optimization [19,20,23,24,25,31,34,40]; early stopping [19,20,21,24,39,40]; decoder reconstruction [20] | <Lightweight Transfer Learning (This Study)> Hyperparameter adjustment [19,21,22,23,24,25,26,27,28,29,30,31,33,34,39,40,41]; backbone reuse [21,27,28,29,30,32,39,40]; label calibration [26,29] |

Note: The reference numbers indicate prior studies in which each technique was applied. A single study may be classified into multiple categories.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Moon, H.-J.; Cho, N.-W. Training-Free Lightweight Transfer Learning for Land Cover Segmentation Using Multispectral Calibration. Remote Sens. 2026, 18, 205. https://doi.org/10.3390/rs18020205

