Article

Improved Mask R-CNN Multimodal Framework for Simultaneous Soil Horizon Delineation, Soil Group Identification and SOM Prediction from Soil Profile Images

1 School of Earth and Environment, Anhui University of Science and Technology, Huainan 232001, China
2 State Key Laboratory of Soil and Sustainable Agriculture, Institute of Soil Science, Chinese Academy of Sciences, Nanjing 211135, China
3 University of Chinese Academy of Sciences, Beijing 100049, China
4 Nanjing Institute of Environmental Science, Ministry of Ecology and Environment, Nanjing 210042, China
5 State Environmental Protection Key Laboratory of Soil Environmental Management and Pollution Control, Nanjing 210042, China
6 School of Environmental and Chemical Engineering, Nanchang Hangkong University, Nanchang 330063, China
* Authors to whom correspondence should be addressed.
Soil Syst. 2026, 10(3), 39; https://doi.org/10.3390/soilsystems10030039
Submission received: 27 November 2025 / Revised: 5 March 2026 / Accepted: 5 March 2026 / Published: 9 March 2026

Abstract

Comprehensive soil surveys necessitate the integration of multidimensional pedological information, ranging from the morphological delineation of horizons and the taxonomic identification of soil groups to the quantitative assessment of soil organic matter (SOM). These attributes collectively constitute the basis for interpreting pedogenesis and guiding sustainable soil management. However, conventional methods are limited by the subjectivity of expert judgment for horizon and soil group identification, and by the time-consuming nature of laboratory analyses for SOM quantification. We developed a novel multimodal deep learning framework based on an improved Mask R-CNN architecture that integrates soil profile images with auxiliary soil property data to concurrently delineate soil horizons, classify soil groups, and quantify SOM. The model was trained on high-resolution soil profile images from 451 soil survey sampling sites spanning ten soil groups across Anhui Province, China. Data augmentation and transfer learning with pre-training on large general image datasets were employed to address the dataset size limitations and improve model generalization. In addition to accurately delineating master horizons, we evaluated three schemes for classifying transitional horizons, which are often ambiguously determined by expert assessment: (i) assigning the transitional horizon to one adjacent master horizon; (ii) assigning it to both neighboring master horizons as an overlapping section; and (iii) treating the transitional horizon as an independent layer. Scheme (iii) achieved the best overall performance, e.g., horizon delineation with precision = 0.925, recall = 0.933, F1-score = 0.929, and segmentation mean average precision (seg-mAP) = 0.918; soil group classification accuracy = 0.717; and SOM prediction with R2 = 0.565. These results demonstrate that treating transitional horizons as independent layers yields superior segmentation.
Consequently, this integrated framework provides a robust, automated solution for high-throughput soil resource assessment.

1. Introduction

Soil horizons, as distinct layers formed from long-term weathering and biogeochemical formation processes, characterized by variation in color, texture and properties, are essential indicators of soil formation and functions [1]. Accurate delineation of soil horizons, coupled with soil group classification and key property assessment (e.g., soil organic matter, SOM), underpins soil taxonomy, quality evaluation and sustainable management [2]. SOM, in particular, drives soil structure formation, nutrient supply and overall soil quality, making its precise quantification integral to soil horizon and soil group interpretations as key components in soil surveys [3]. However, these tasks rely on experienced soil scientists and ex situ laboratory analyses, which may introduce subjectivity and inconsistency across different observers and biases from evolving instrumental procedures [4,5].
Recently, soil proximal sensors, such as visible near-infrared (vis-NIR) spectroscopy and portable X-ray fluorescence, have enabled rapid quantitative characterization of soil horizons and properties. However, these methods provide only indirect morphological insights via quantification of element or functional group contents, and they require additional sample preparation and costly instruments, limiting their field application [6,7]. In contrast, widely available optical sensing devices, e.g., commercial cameras, can capture color and texture variation directly in soil profile images, which can more straightforwardly mimic expert delineation via advanced vision recognition techniques [8]. Traditional image recognition approaches, such as image segmentation based on color space models (e.g., CIE L*a*b*, HSV), effectively utilize soil color variations to detect horizon boundaries. However, these techniques are unstable when handling transitional horizons characterized by gradual color changes and often fail on soil groups with indistinct color variations [8]. Deep learning methods for image recognition and segmentation have rapidly advanced in recent years. Jiang et al. successfully applied a nested U-Net model for automatic delineation of master horizons (A, B and C horizons) from soil profile images, with a mobile application for field soil surveys [5]. Yang et al. developed an improved UNet++ model to identify diagnostic horizons for better soil group classification [9]. Furthermore, the integration of smartphone-captured soil images with modern deep learning frameworks has significantly advanced in situ soil organic matter prediction, demonstrating the viability of image-based proximal sensing for rapid soil assessment [10]. Recent applications of Mask R-CNN have also demonstrated its efficacy in segmenting complex soil features, highlighting the potential of instance segmentation for automated soil image understanding [11].
However, the delineation of transitional horizons such as AB, AC, and BC as discrete layers remains challenging due to gradual property changes and intra-profile variations [8,12]. Moreover, small-scale datasets, such as typical soil profile studies, severely constrain the performance of deep learning models by hindering comprehensive feature learning, thereby reducing predictive accuracy and stability [13]. Pre-training on large general datasets followed by transfer learning to the typically small specialized datasets can dramatically improve generalization and accuracy in small-data settings [14]. The broad, diverse features extracted from massive image datasets (e.g., Common Objects in Context dataset, COCO, with 330K images for object detection and segmentation) provide robust priors that adapt efficiently to fine-grained image patterns (e.g., soil horizon–profile segmentation), mitigating overfitting, accelerating convergence, and boosting predictive robustness [13].
Despite these advancements, several major challenges remain. First, most current models prioritize master horizons, often failing to accurately delineate transitional horizons, which are characterized by diffuse or indistinct boundaries. Second, existing frameworks are predominantly single-task in nature, emphasizing geometric segmentation while neglecting simultaneous prediction of functionally related soil attributes. This decoupling ignores the intrinsic correlations between profile morphology and soil properties, thereby hindering the development of comprehensive automated soil surveys. Multi-task and multimodal deep learning models, fueled by recent computing advances, have been used increasingly across a wide range of areas by integrating diverse inputs across domains [15]. Advanced instance segmentation models such as Mask R-CNN leverage pre-training on large general segmentation datasets followed by transfer learning; they can incorporate disparate features alongside images, predict segmentation masks accurately at the pixel level, and capture complementary information from multiple modalities [16]. This integrative approach holds strong promise for addressing the challenges that traditional image processing and deep learning methods face in recognizing complex soil images.
To address these limitations, this study presents a novel multimodal deep learning framework based on an improved Mask R-CNN architecture. The contribution of this study is threefold: (i) we implement a multi-task learning strategy for the end-to-end joint execution of soil horizon delineation, soil group identification, and SOM prediction, utilizing a shared backbone to extract synergistic visual features; (ii) we systematically evaluate three labeling schemes to validate the efficacy of explicitly modeling transitional horizons as independent layers, thereby reducing the ambiguity inherent in traditional expert delineation; and (iii) we integrate transfer learning with advanced data augmentation to ensure robust generalization on small-scale soil datasets. This integrative approach not only mitigates the subjectivity of manual methods but also provides a scalable solution for precision agriculture and environmental monitoring.

2. Materials and Methods

The integrated research workflow is shown in Figure 1A. The process encompasses the following primary stages: soil profiles were documented and imaged; horizon masks were annotated by experts; a model based on an improved Mask R-CNN architecture was trained with data augmentation and transfer learning; and the trained model was evaluated simultaneously for soil horizon segmentation, soil group classification and soil organic matter prediction.

2.1. Study Area and Soil Profiles

The study area is located in Anhui Province, China, which lies in the transitional zone of the middle and lower reaches of the Yangtze River, featuring diverse geomorphological conditions. Geographically, it falls within the typical East Asian monsoon climate (29°41′ N to 34°38′ N, 114°54′ E to 119°37′ E). Elevation ranges from <10 m in the Huaibei Plain to >1800 m in southern mountainous regions, with annual mean temperatures of 14–17 °C and annual precipitation ranging from 800 to 1800 mm, fostering varied soil development [17]. The interaction of diverse parent materials, geomorphological patterns, vegetation cover, and human activities results in significant soil heterogeneity [18], dividing the province into five geomorphic regions: the Huaibei Plain (dominated by Fluvo-aquic soils and Shajiang black soils); the Jianghuai Hills (Lithosols, Skeletal soils, Yellow-cinnamon soils, Yellow-brown earths, and Paddy soils); the Yangtze Plain (Paddy soils and Fluvo-aquic soils); and the Dabie Mountains in western Anhui together with the southern hilly regions (Yellow-brown earths, Yellow-cinnamon soils, Skeletal soils, Meadow soils, Fluvo-aquic soils, Paddy soils, and Purplish soils).
To capture adequate coverage of soil variability and parent material diversity, a total of 451 soil profiles were sampled during October 2021 to June 2022 via recommended grid spacing, and adjustments were made considering nearby pollution sources (Figure 2A). Each soil profile had dimensions of approximately 1.5 m in length, 0.8 m in width, and 1.2 m in depth. Soil horizons were delineated by experts based on visual criteria, such as color, root distribution, gravel content, and lime reaction intensity, and tactile criteria, including soil structure, compaction, and moisture condition, accompanied by photographic documentation. Soil profile morphological descriptions and horizon boundary delineations were performed using standardized terminology and procedures consistent with international guidelines, including the FAO Guidelines for Soil Description and the USDA Soil Survey Manual [19,20]. High-resolution soil profile photographs (no less than 8 megapixels, 300 dpi) were taken using Canon digital cameras (e.g., Canon EOS 7D and Canon EOS 90D, Canon Inc., Tokyo, Japan) at a distance of approximately 1.0–1.5 m from the soil profile, ensuring consistent perspective and detail across the dataset. Images were captured under uniform natural light to ensure true color representation and to avoid overexposure and shadows. Each photograph included a scale bar and profile ID, covering the entire excavated profile from the surface to the bottom of the pit. All images were stored in JPEG format without post-processing. For each horizon, a composite soil sample was collected, air-dried, and sieved (2 mm) for laboratory analyses. SOM was derived by determining soil organic carbon (SOC) using the potassium dichromate–sulfuric acid oxidation method with external heating, multiplied by the conventional Van Bemmelen factor of 1.724 in accordance with the Agricultural Industry Standard of the People’s Republic of China (NY/T 1121.6-2006) [21]. 
While it is noted that the SOC-to-SOM ratio can vary with soil type, land use and depth [22], this standardized conversion was maintained to ensure consistency with China’s soil fertility assessment guidelines. All samples were analyzed in duplicate with procedural blanks and certified reference soils for quality control, and calcareous horizons were acid-pretreated to remove inorganic carbon prior to determination. Soil groups were classified according to the Genetic Soil Classification of China (GSCC), a hierarchical system comprising six levels, including order, suborder, great group, subgroup, family, and series [23]. We adopted the GSCC because it is the foundational taxonomy for national soil surveys and related soil databases in China, ensuring consistency with our original field records. Here, ‘soil group’ specifically refers to the GSCC great group level. The dataset comprises ten distinct soil groups, and their distribution across the 451 sampled profiles is summarized in Table 1. Representative profile photographs for each soil group are shown in Figure 2B.
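The fixed-factor SOC-to-SOM conversion described above can be written out in a few lines; this is a minimal sketch (the function name is ours, not from the paper):

```python
# Convert soil organic carbon (SOC) to soil organic matter (SOM) using the
# conventional Van Bemmelen factor of 1.724 (per NY/T 1121.6-2006).
VAN_BEMMELEN = 1.724

def soc_to_som(soc_g_per_kg: float) -> float:
    """Return SOM (g/kg) from SOC (g/kg) via the fixed-factor conversion."""
    return soc_g_per_kg * VAN_BEMMELEN

# Example: a horizon with 10 g/kg SOC corresponds to 17.24 g/kg SOM.
print(round(soc_to_som(10.0), 2))  # 17.24
```

As the text notes, the true SOC-to-SOM ratio varies with soil type, land use and depth; the fixed factor is retained here only for consistency with the national standard.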

2.2. Labeling Schemes

The genetic master horizons (A, B and C horizons) of soil profiles in the dataset were identified following field assignment by experts [1]. Horizon symbols and transitional horizon notation used in this study strictly follow the standard horizon nomenclature in these international guidelines to ensure methodological transparency and reproducibility [19,20]. Specifically, to ensure model robustness and annotation consistency under the constraint of limited sample size, detailed sub-horizons recorded in field surveys were aggregated into their respective master horizons (e.g., Ap was classified as A horizon, and Bt was classified as B horizon). However, transitional horizons were generally difficult to assign since they often appear in profile images as gradual variations in color, texture, and structure, leading to blurred boundaries that might result in inconsistent delineation among experts [8]. We therefore evaluated three distinct labeling schemes, which handled the unique characteristics of transitional zones in different ways: in Scheme 1, transitional zones such as AB, AC, and BC were assigned to the dominant neighboring master horizon based on visible differences in color or texture; in Scheme 2, transitional zones were assigned overlapping labels (e.g., the AB horizon was labeled as both A and B horizons); and in Scheme 3, transitional layers were treated as independent classes, resulting in a total of six labels: A, B, C, AB, AC, and BC. The comparison of the three strategies is shown in Figure 1C.
The annotation process was implemented using Labelme (version 5.2.1, Massachusetts Institute of Technology, Cambridge, MA, USA) [24]. Each labeling scheme was applied to all soil profile images in the dataset to generate the corresponding mask images. These annotated images collectively constituted the foundational dataset for soil horizon segmentation.

2.3. Data Augmentation

Deep learning requires large-scale and high-quality training datasets to achieve optimal performance [25]. However, soil profile annotation is costly and expertise-intensive, limiting labeled data availability [5,8]. To address this, data augmentation was applied to expand the original training set (400 high-quality images selected out of 451 soil profile images) to 1200, enhancing variability and generalization capability [26,27]. Specifically, augmentation operations included brightness adjustment (scaling factor 0.35–1.0), Gaussian noise addition (randomly generated noise patterns with varying intensity across images), random pixel perturbation (~3% of pixels set to 0 or 255), image translation (horizontal and vertical shifts of up to one-third of the maximum allowable distance without displacing bounding boxes outside the image), and random flipping (horizontal, vertical, or diagonal), as illustrated in Figure 1D. For each original image, at least two out of the five augmentation methods were randomly selected and combined, with the original annotations preserved, ensuring the semantic integrity of the soil horizon masks. As a result, the dataset size was tripled, thus enhancing the Mask R-CNN model's ability to recognize soil horizon structures under various image transformation conditions.
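The augmentation strategy can be sketched on a toy grayscale image using only the standard library (translation is omitted for brevity, and the real pipeline applies the same geometric transforms to the horizon masks; the helper names and noise range are ours):

```python
import random

def clamp(v):
    """Keep a pixel value in the valid 8-bit range."""
    return max(0, min(255, int(v)))

def brightness(img, rng):       # scaling factor in [0.35, 1.0], as in the paper
    f = rng.uniform(0.35, 1.0)
    return [[clamp(p * f) for p in row] for row in img]

def gaussian_noise(img, rng):   # additive noise with random intensity
    sigma = rng.uniform(1, 20)  # illustrative range; not specified in the text
    return [[clamp(p + rng.gauss(0, sigma)) for p in row] for row in img]

def pixel_perturb(img, rng):    # ~3% of pixels forced to 0 or 255
    return [[rng.choice((0, 255)) if rng.random() < 0.03 else p
             for p in row] for row in img]

def hflip(img, rng):            # horizontal flip (one of the random flips)
    return [row[::-1] for row in img]

def augment(img, seed=0, k=2):
    """Apply k randomly chosen operations (the paper combines >= 2 of 5)."""
    rng = random.Random(seed)
    for op in rng.sample([brightness, gaussian_noise, pixel_perturb, hflip], k):
        img = op(img, rng)
    return img

img = [[100, 150], [200, 250]]
out = augment(img)
print(len(out), len(out[0]))  # 2 2 (geometry preserved)
```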

2.4. The Improved Mask R-CNN Model Architecture

The Mask R-CNN model processed the image input through a backbone network to generate feature maps, which were passed to a Region Proposal Network (RPN), and candidate Regions of Interest (RoIs) were then aligned with RoIAlign and passed to three heads for classification, bounding box regression, and mask prediction [16]. To capture subtle transitions between soil horizons, we adopted a ResNet backbone to mitigate vanishing gradients and accelerate convergence [28]. ResNet-50 was selected instead of ResNet-101 for the optimal balance of performance and efficiency. A Feature Pyramid Network (FPN) performed hierarchical multi-scale fusion to robustly represent fine stratification patterns and objects of different sizes [29]. The resulting multi-scale features were consumed by the RPN, which slid small convolutions over feature maps to generate anchors, performed foreground–background classification, and refined candidates via bounding box regression, yielding RoIs likely to coincide with horizon boundaries [30].
In addition to the standard detection and segmentation heads, we introduced a soil-attribute branch to the Mask R-CNN architecture to predict soil attributes at the image level. This branch processed the multi-scale features from the FPN by applying Global Average Pooling (GAP), followed by concatenation into a shared global vector. The vector was then passed through a lightweight MLP (Multi-Layer Perceptron) to create two parallel outputs: one head for soil group classification to predict category probabilities, and another head for SOM regression, which predicted a single scalar of SOM content. During training, SOM values were normalized to stabilize learning, and during inference, they were de-normalized back to their original scale. This soil-attribute branch shared the backbone and FPN with the detection and segmentation heads and was trained using a multi-task joint loss, combining soil group and SOM prediction with the standard losses for classification, bounding box, and mask prediction. Importantly, the addition of this branch did not interfere with the instance-level paths through RPN/ROI but augmented the network’s ability to predict soil attributes with minimal computational overhead.
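A minimal PyTorch sketch of this soil-attribute branch is given below. Layer widths, the number of FPN levels, and the MLP depth are illustrative assumptions; only the overall flow (per-level GAP, concatenation, shared MLP, two parallel heads) follows the description above:

```python
import torch
import torch.nn as nn

class SoilAttributeBranch(nn.Module):
    """GAP over each FPN level -> concat -> shared MLP -> two parallel heads."""

    def __init__(self, fpn_channels=256, num_levels=4, num_groups=10, hidden=256):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(fpn_channels * num_levels, hidden), nn.ReLU())
        self.group_head = nn.Linear(hidden, num_groups)  # soil group logits
        self.som_head = nn.Linear(hidden, 1)             # normalized SOM scalar

    def forward(self, fpn_feats):
        # Global Average Pooling on each pyramid level, then concatenation
        pooled = [f.mean(dim=(2, 3)) for f in fpn_feats]
        shared = self.shared(torch.cat(pooled, dim=1))
        return self.group_head(shared), self.som_head(shared).squeeze(1)

branch = SoilAttributeBranch()
feats = [torch.randn(2, 256, s, s) for s in (64, 32, 16, 8)]  # dummy FPN maps
group_logits, som = branch(feats)
print(group_logits.shape, som.shape)
```

Because the branch only reads the shared FPN features, it leaves the RPN/RoI instance paths untouched, matching the paper's claim of minimal computational overhead.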
After RoIAlign, per-RoI features were dispatched to three parallel output heads. The classifier assigned a horizon category; the regressor refined box coordinates; and the mask head, implemented as a small fully convolutional sub-network, produced a binary mask for each detected instance. The soil-attribute branch contributed two further outputs for soil group and SOM prediction. Crucially, masks were class-specific, which alleviated inter-class conflicts typical of conventional FCN-style multi-class segmentation and improved the delineation of horizons and transitional layers in complex soil profiles [31]. An overview of the improved Mask R-CNN architecture is depicted in Figure 1E.

2.5. Loss Function

The loss function of the improved Mask R-CNN model comprised five components: classification loss ($L_{cls}$), bounding box regression loss ($L_{bbox}$), mask segmentation loss ($L_{mask}$), soil group classification loss ($L_{group}$), and SOM regression loss ($L_{som}$). The classification and bounding box regression losses were consistent with those used in the Faster R-CNN framework [16]. The soil group classification loss employed a cross-entropy loss function, while the SOM regression loss adopted a mean squared error loss function. During training, these two additional loss terms were incorporated into the original Mask R-CNN multi-task loss function with adjustable weights, thereby achieving end-to-end joint optimization. The complete formulation of the loss function was as follows:

$$L = L_{cls} + L_{bbox} + L_{mask} + L_{group} + L_{som}$$

$$L_{cls} = -\sum_{i} y_i \log(p_i)$$

$$L_{bbox} = \frac{1}{N_{labels}} \sum_{i} L_{smooth\_L1}\!\left(box\_regression_{pred}^{\,i} - box\_regression_{gt}^{\,i}\right)$$

$$L_{smooth\_L1}(x) = \begin{cases} 0.5x^2, & \text{if } |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$

$$L_{group} = \alpha_{group} \times \left(-\frac{1}{N} \sum_{j}^{N} W_j \log(p_j)\right)$$

$$L_{som} = \alpha_{som} \times \frac{1}{N} \sum_{k}^{N} (y_k - s_k)^2$$

Here, $L_{cls}$ represents the classification loss, which computes the probability that each Region of Interest (RoI) belongs to a specific class; it is calculated with the cross-entropy function, where $y_i$ denotes the ground-truth class label and $p_i$ the predicted class probability. $L_{bbox}$ denotes the bounding box regression loss, which measures the discrepancy between the predicted and ground-truth bounding boxes using the smooth L1 function $L_{smooth\_L1}(x)$, where $N_{labels}$ indicates the number of labeled instances in the current sample, $box\_regression_{pred}^{\,i}$ refers to the $i$-th predicted bounding box parameter, and $box\_regression_{gt}^{\,i}$ to the $i$-th ground-truth bounding box parameter. $L_{mask}$ is the mask segmentation loss, which evaluates the difference between the predicted and ground-truth masks for each RoI; it is computed using binary cross-entropy, where $y_i$ is the true label of the $i$-th pixel (either 0 or 1) and $p_i$ the predicted probability that the pixel belongs to the target class. $L_{group}$ is the soil group classification loss, with $\alpha_{group}$ as its weight coefficient and $N$ as the number of samples processed in a single forward/backward pass; $W_j$ is the weight of the true class, and $p_j$ is the post-softmax probability assigned to the true class. $L_{som}$ is the SOM regression loss, where $\alpha_{som}$ is the loss weight coefficient, $y_k$ is the model's prediction, and $s_k$ is the normalized ground-truth value.
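The two added loss terms and the smooth L1 function can be written out directly; a stdlib-only sketch (inputs and weights below are illustrative):

```python
import math

def smooth_l1(x):
    """Smooth L1 loss used for bounding box regression."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def group_loss(true_class_probs, class_weights, alpha=1.0):
    """Weighted cross-entropy over the post-softmax true-class probabilities."""
    n = len(true_class_probs)
    return -alpha / n * sum(w * math.log(p)
                            for w, p in zip(class_weights, true_class_probs))

def som_loss(preds, targets, alpha=1.0):
    """Mean squared error on normalized SOM values."""
    n = len(preds)
    return alpha / n * sum((y - s) ** 2 for y, s in zip(preds, targets))

print(smooth_l1(0.5))                         # 0.125 (quadratic region)
print(smooth_l1(2.0))                         # 1.5   (linear region)
print(round(som_loss([0.2, 0.4], [0.1, 0.5]), 3))  # 0.01
```

In the full model these terms are simply summed with the standard classification, box, and mask losses, weighted by the adjustable coefficients.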
To quantitatively evaluate the model’s performance in soil horizon segmentation, soil group classification, and SOM prediction under different labeling schemes, four metrics were selected to assess soil horizon segmentation performance: precision, recall, F1-score, and segmentation mean average precision (seg-mAP). For soil group classification, accuracy and the F1-score (denoted as F1_group) were used as evaluation metrics. For SOM regression, the coefficient of determination (R2), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE) were used as evaluation metrics.
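Two of the headline metrics are easy to state explicitly; the sketch below also shows that the reported precision (0.925) and recall (0.933) imply the reported F1-score:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def r2(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean) ** 2 for o in obs)
    return 1 - ss_res / ss_tot

# Scheme 3 horizon-delineation metrics reported in the text:
print(round(f1_score(0.925, 0.933), 3))  # 0.929
```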

2.6. Model Training

A transfer learning strategy was adopted by initializing the Mask R-CNN model with weights pre-trained on the COCO dataset. The model was implemented in PyTorch (version 2.4.1, Meta Platforms, Inc., Menlo Park, CA, USA) and trained in parallel on four NVIDIA A6000 GPUs (NVIDIA Corporation, Santa Clara, CA, USA) using a distributed training setup. All experiments followed a unified configuration for training parameters and involved comparative training across the three labeling schemes. For each scheme, the augmented dataset was split into training and validation sets in an 8:2 ratio. A Stochastic Gradient Descent (SGD) optimizer was used with an initial learning rate of 0.02 and a weight decay coefficient of 1 × 10−4. Training was carried out for a total of 200 epochs with a batch size of 16. To enhance I/O efficiency during data loading, four parallel data-loading threads were employed.
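The 8:2 split of the 1200 augmented images can be sketched as follows (the seed and helper are ours; the paper does not specify the randomization procedure):

```python
import random

def split_dataset(n_images, train_frac=0.8, seed=42):
    """Shuffle image indices and split them into training/validation sets."""
    idx = list(range(n_images))
    random.Random(seed).shuffle(idx)
    cut = int(n_images * train_frac)
    return idx[:cut], idx[cut:]

# 1200 augmented images -> 960 for training, 240 for validation
train_idx, val_idx = split_dataset(1200)
print(len(train_idx), len(val_idx))  # 960 240
```

Note that when splitting after augmentation, augmented copies of the same profile can land on both sides of the split; splitting by original profile before augmentation would give a stricter validation protocol.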

3. Results and Discussion

3.1. Contributions of Data Augmentation and Transfer Learning in Soil Horizon Segmentation Models

The training loss and segmentation accuracy were compared between models trained with and without data augmentation (NDA), as shown in Figure 3A. Models trained with augmentation exhibited faster loss reduction during the early and mid-training phases, converging to significantly lower stabilized values well before the 200-epoch limit. NDA curves declined more slowly and stabilized later at higher terminal losses. Throughout late-stage training, the augmented models consistently maintained lower loss values across all schemes, indicating not merely transient gains but superior convergence and optimization outcomes. Additionally, the application of data augmentation resulted in marked improvements in all key metrics, including precision and seg-mAP on the validation set (Table 2). Under Scheme 3, precision increased from 0.627 to 0.925 (approximately 47.5% improvement), and seg-mAP increased from 0.525 to 0.918 (approximately 74.9% improvement), indicating that data augmentation effectively enhanced the model's adaptability to variations in image characteristics. These findings are consistent with the study by Shorten and Khoshgoftaar, who highlighted image data augmentation as a critical strategy for boosting performance and generalization capability in vision tasks [32].

3.2. Model Training Performance and Evaluation

The training objective is a composite loss that combines a cross-entropy term for RoI classification, a smooth L1 term for bounding box regression, a per-pixel binary cross-entropy term for mask prediction within each positive RoI, an image-level cross-entropy term for soil group classification, and a mean-squared error term for SOM regression. Consequently, the loss curves in Figure 3B reflect joint optimization of recognition, localization, mask quality, soil group classification and SOM regression. Training and validation losses were monitored over 200 epochs to characterize optimization dynamics across the three labeling schemes. All schemes showed a monotonic decrease followed by stable convergence, indicating effective gradient propagation and stable parameter updates. Specifically, Scheme 3 achieved the fastest convergence and lowest final losses on both training and validation sets, followed by Scheme 1; Scheme 2 performed the worst. In terms of evaluation metrics, Scheme 3 attained a precision of 0.925 (2.4% and 2.9% improvement over Scheme 1 and Scheme 2, respectively), a recall of 0.933 (1.4% and 10.7% improvement), an F1-score of 0.929 (1.9% and 6.9% gain), and a seg-mAP of 0.918 (6.4% and 11.6% gain) (Table 2). These improvements suggest that explicitly labeling transitional zones (AB, AC, BC) provides more stable supervision during optimization, reducing ambiguity near fuzzy boundaries, improving convergence efficiency, and enhancing segmentation performance. Existing soil profile segmentation studies have mainly adopted semantic segmentation networks and addressed either master horizons using a nested U-Net, which reported a test pixel accuracy of approximately 0.83 [5], or diagnostic horizons using UNet++, which achieved a pixel accuracy of 82.66% [9]. In contrast, our Mask R-CNN framework achieves higher precision (0.925) while extending beyond segmentation to a unified multi-task formulation.

3.3. Performance Evaluation of the Soil-Attribute Branch

Given the expanded parameter scale of the multi-task framework and the added soil-attribute branches, overfitting or unstable optimization is likely to occur without data augmentation. Therefore, we conducted end-to-end joint training of the soil group classification and SOM regression branches exclusively on the augmented dataset. Under Scheme 3, the loss curves for the soil-attribute branches (Figure 3C) decreased rapidly and stabilized within 80 epochs. During epochs 180–200, the fluctuation amplitude was approximately 0.01–0.02, with no evident oscillation or overfitting, indicating robust and reliable convergence.
The confusion matrix for soil group classification is presented in Figure 3E, where class-wise recall ranged from 0.48 to 0.86, with overall accuracy = 0.717 and F1_group = 0.677 (Table 2). Paddy soil, Limestone soil, and Shajiang black soil were identified most robustly, while Fluvo-aquic soil, Red soil, and Skeletal soil also performed well. Major misclassifications concentrate in two closely related pairs: Yellow-brown earth versus Yellow soil, and Purple soil versus Yellow-brown earth. Specifically, Yellow soil is misclassified as Yellow-brown earth at a rate of approximately 0.31; Yellow-brown earth is misclassified as Yellow soil at rates of approximately 0.19 and 0.14, yielding a relatively low recall (0.48). Purple soil is misclassified as Yellow-brown earth at about 0.38, with a recall of 0.50. These errors could be plausibly explained as follows: (i) the hue of yellowish soils is predominantly governed by iron-oxide content, producing intrinsically similar color tones and only subtle differences in visual and texture cues, which lowers image-based separability; (ii) Purple soil is a parent-material-controlled soil whose profile can exhibit purple-brown to brownish-yellow tones under varying degrees of weathering and moisture, thereby overlapping in appearance with Yellow-brown earth [33]; and (iii) variations in illumination and moisture strongly affect visible reflectance and color expression, further exacerbating confusion among visually similar classes [34]. By contrast, Limestone soil is a typical calcareous soil, and Shajiang black soil exhibits salient morphological and physicochemical traits (heavy clay texture, poor permeability, and susceptibility to waterlogging/drought). These features yield more distinctive image and texture signatures, leading to fewer confusions [35].
SOM regression on the validation set yielded a linear fit in the ln(SOM) space as y = 0.76x + 0.79, R2 = 0.565, MAE = 0.227, and RMSE = 0.327 (Figure 3D). The results indicate that image-based features have moderate explanatory power for SOM due to the complex interplay of pedogenic and environmental factors. Specifically, variations in soil moisture cause a reduction in reflectance across all wavelengths, a darkening effect that the model may misinterpret as humified organic matter accumulation [36]. Furthermore, parent material and texture define the soil’s primary mineral matrix, where high concentrations of Fe/Mn oxides or carbonates can impose dominant baseline colors that can mask the subtle chromatic shifts associated with SOM concentrations, thereby limiting prediction accuracy [37]. Additionally, illumination conditions also play a pivotal role in the accuracy of image-derived features. Intense ambient light can lead to overexposure and glare, which may compress the dynamic range of effective pixel intensities, effectively weakening the sensitivity of color-based indices [38]. Such illumination-induced artifacts may also introduce significant uncertainty and potential bias in the model’s SOM prediction.

3.4. Quantitative and Qualitative Evaluation of Labeling Schemes for Soil Horizon Segmentation

To further investigate the impact of the labeling schemes on segmentation accuracy for individual soil horizons, we calculated the mean average precision (mAP) for each horizon class at an Intersection over Union (IoU) threshold of 0.5. As shown in Figure 3F, under Scheme 1 the mAP values for horizons A and B were 0.89 and 0.92, respectively, while horizon C reached only 0.75. Under Scheme 2, the mAP for horizon C increased to 0.85, the mAP for horizon A dropped to 0.64, and the mAP for horizon B remained largely unchanged at 0.91. Scheme 3 yielded the highest mAP across all three master horizons (0.91 for A, 0.94 for B, and 0.85 for C). In addition, the transitional horizons AB, AC, and BC achieved mAP scores of 0.97, 0.91, and 0.93, respectively, indicating that the model accurately captured the texture and morphological features of transitional layers. Overall, Scheme 3 demonstrated superior segmentation accuracy for both master and transitional horizons.
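The mAP@0.5 criterion counts a predicted horizon mask as a true positive only when its Intersection over Union with the annotated mask exceeds 0.5. A toy sketch using one-dimensional depth bands (hypothetical horizon boundaries, not real annotations) makes the criterion concrete:

```python
import numpy as np

def iou(pred: np.ndarray, true: np.ndarray) -> float:
    """IoU of two boolean masks: |intersection| / |union|."""
    union = np.logical_or(pred, true).sum()
    return np.logical_and(pred, true).sum() / union if union else 0.0

# Toy profile discretized into 1 cm depth bins.
depth = np.arange(100)
true_b = (depth >= 20) & (depth < 60)   # annotated B horizon: 20-60 cm
pred_b = (depth >= 25) & (depth < 65)   # predicted B horizon: 25-65 cm

score = iou(pred_b, true_b)             # intersection 35 bins, union 45 bins
is_tp = score >= 0.5                    # 35/45 exceeds the 0.5 threshold
```

In the actual evaluation the masks are two-dimensional profile regions, but the matching rule at the 0.5 threshold is the same.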
Qualitative analysis (Figure 4) further compared model predictions against expert delineations on a selected test set. Each row presents a side-by-side comparison between expert-delineated genetic horizons (column 2) and the predictions of the Mask R-CNN model (column 3), along with the predicted soil group probability distribution (column 4) and SOM prediction (column 5). Under Scheme 1, the model's predictions were generally consistent with expert delineations, but the extent of the A horizon was overestimated and the boundary between horizons B and C was misaligned, primarily because forcing transitional zones into master horizons induces boundary drift. Under Scheme 2, the predicted A horizon covered a smaller extent than the expert-delineated A horizon, allowing B horizon encroachment, while the C horizon extended into the B horizon because of the overlapping multi-labels. Such multi-label supervision can trigger label conflicts and gradient interference within the same region, producing uncertain predictions. This is particularly problematic in profiles with narrow or poorly textured transitional zones, where the model's discrimination capability diminishes, leading to under-segmentation or discontinuous outputs [39,40,41], and it is consistent with the drop in mAP for the A horizon to 0.64 and the rise in mAP for the C horizon to 0.85. In contrast, Scheme 3 produced the most accurate and coherent segmentation, together with the best predictions of both soil group and SOM content, closely matching expert delineations. The transitional layers were sharply segmented, and all horizons achieved their highest mAP values. From a semantic modeling perspective, Scheme 3 explicitly assigns distinct labels to transitional zones, increasing the model's sensitivity to these regions and preventing them from becoming ambiguous classification boundaries.
This approach effectively mitigated label conflicts and gradient interference, while also improving the model’s generalization capacity and boundary consistency when dealing with complex soil profile structures and properties [42,43].

4. Conclusions

This study developed an improved Mask R-CNN multi-task framework to delineate soil horizons from profile photographs while simultaneously predicting soil groups and soil organic matter (SOM). Our findings demonstrate that treating transitional horizons (AB/AC/BC) as independent classes significantly reduces boundary ambiguity and improves multi-task stability. This approach provides a robust computational solution for capturing the inherent continuity of soil profiles.
Unlike previous studies that relied on semantic segmentation for single-task horizon identification [5,9], our Mask R-CNN framework integrates morphological, taxonomic and functional soil properties into a unified deep learning pipeline. By achieving high precision across multiple objectives, this method reduces observer subjectivity and offers a scalable tool for high-throughput soil surveys and digital soil description. The systematic evaluation of labeling strategies further confirms that explicit modeling of transitional layers is essential for improving both boundary clarity and the robustness of soil property predictions.
Despite these advancements, the current dataset is geographically concentrated and some soil groups remain visually similar, which may limit the model's generalization. Future research will focus on expanding the dataset across diverse pedogenic regions to support the identification of finer horizons and sub-horizons, including diagnostic horizon classification and more detailed taxonomic hierarchies. Furthermore, combining additional environmental and physicochemical covariates with image-based features will be critical for refining SOM predictions and accounting for the confounding effects of soil moisture and illumination.

Author Contributions

Conceptualization, G.F., M.Y. and C.L.; methodology, Q.L., X.S. and W.S.; software, Q.L. and J.L.; formal analysis, Q.L. and C.P.; investigation, Q.L. and J.S.; resources, M.Y. and S.W.; writing—original draft preparation, Q.L. and C.L.; writing—review and editing, Q.L., N.Z., G.F. and C.L.; supervision, M.Y. and K.Y.; funding acquisition, Y.W., G.F. and C.L. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the National Natural Science Foundation of China (42225701, 42177015, 41977027 and 41671239).

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to restrictions associated with the ongoing funded projects and the privacy of precise geospatial sampling locations.

Acknowledgments

Thanks to the National Engineering Laboratory of Soil Pollution Control and Remediation Technologies for providing computing resources.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Hartemink, A.; Zhang, Y.; Bockheim, J.; Curi, N.; Silva, S.; Grauer-Gray, J.; Lowe, D.J.; Krasilnikov, P. Soil horizon variation: A review. Adv. Agron. 2020, 160, 125–185.
2. Raiesi, F. A minimum data set and soil quality index to quantify the effect of land use conversion on soil quality and degradation in native rangelands of upland arid and semiarid regions. Ecol. Indic. 2017, 75, 307–320.
3. Mirzaee, S.; Ghorbani-Dashtaki, S.; Mohammadi, J.; Asadi, H.; Asadzadeh, F. Spatial variability of soil organic matter using remote sensing data. Catena 2016, 145, 118–127.
4. Lehmann, J.; Bossio, D.A.; Kögel-Knabner, I.; Rillig, M.C. The concept and future prospects of soil health. Nat. Rev. Earth Environ. 2020, 1, 544–553.
5. Jiang, Z.-D.; Owens, P.R.; Zhang, C.-L.; Brye, K.R.; Weindorf, D.C.; Adhikari, K.; Sun, Z.-X.; Sun, F.-J.; Wang, Q.-B. Towards a dynamic soil survey: Identifying and delineating soil horizons in-situ using deep learning. Geoderma 2021, 401, 115341.
6. Zhang, Y.; Hartemink, A.E. Soil horizon delineation using vis-NIR and pXRF data. Catena 2019, 180, 298–308.
7. Sun, F.; Bakr, N.; Dang, T.; Pham, V.; Weindorf, D.C.; Jiang, Z.; Li, H.; Wang, Q.-B. Enhanced soil profile visualization using portable X-ray fluorescence (PXRF) spectrometry. Geoderma 2020, 358, 113997.
8. Zhang, Y.; Hartemink, A.E. A method for automated soil horizon delineation using digital images. Geoderma 2019, 343, 97–115.
9. Yang, R.; Chen, J.; Wang, J.; Liu, S. Toward field soil surveys: Identifying and delineating soil diagnostic horizons based on deep learning and RGB image. Agronomy 2022, 12, 2664.
10. Naeimi, M.; Porwal, V.; Scott, S.; Krzic, M.; Daggupati, P.; Vasava, H.; Saurette, D.; Biswas, A.; Roul, A.; Biswas, A. Deep metric learning for soil organic matter prediction: A novel similarity-based approach using smartphone-captured images. Comput. Electron. Agric. 2025, 237, 110728.
11. Chen, Y.; Liu, K.; Xin, Y.; Zhao, X. Soil image segmentation based on Mask R-CNN. In Proceedings of the 2023 3rd International Conference on Consumer Electronics and Computer Engineering (ICCECE), Guangzhou, China, 6–8 January 2023; pp. 507–510.
12. Viscarra Rossel, R.; Webster, R. Discrimination of Australian soil horizons and classes from their visible–near infrared spectra. Eur. J. Soil Sci. 2011, 62, 637–647.
13. Yosinski, J.; Clune, J.; Bengio, Y.; Lipson, H. How transferable are features in deep neural networks? Adv. Neural Inf. Process. Syst. 2014, 27.
14. Pan, S.J.; Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 2009, 22, 1345–1359.
15. Bayoudh, K.; Knani, R.; Hamdaoui, F.; Mtibaa, A. A survey on deep multimodal learning for computer vision: Advances, trends, applications, and datasets. Vis. Comput. 2022, 38, 2939–2970.
16. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
17. Lu, L.; Wang, J. High-resolution mapping of soil texture at various depths in Anhui Province, China. Earth Sci. Inform. 2025, 18, 174.
18. Yabuki, T.; Matsumura, Y.; Nakatani, Y. Evaluation of pedodiversity and land use diversity in terms of the Shannon entropy. arXiv 2009, arXiv:0905.2821.
19. Ditzler, C.; Scheffe, K.; Monger, H. Soil Survey Manual. In USDA Handbook 18; United States Department of Agriculture: Washington, DC, USA, 2017.
20. Jahn, R.; Blume, H.-P.; Asio, V.; Spaargaren, O.; Schad, P. Guidelines for Soil Description; FAO: Rome, Italy, 2006.
21. NY/T 1121.6–2006; Soil Testing-Part 6: Method for Determination of Soil Organic Matter. Ministry of Agriculture of the People’s Republic of China: Beijing, China, 2006.
22. Pribyl, D.W. A critical review of the conventional SOC to SOM conversion factor. Geoderma 2010, 156, 75–83.
23. GB/T 17296–2009; Classification and Codes for Chinese Soil. Standardization Administration of the People’s Republic of China: Beijing, China, 2009.
24. Russell, B.C.; Torralba, A.; Murphy, K.P.; Freeman, W.T. LabelMe: A database and web-based tool for image annotation. Int. J. Comput. Vis. 2008, 77, 157–173.
25. Ahmed, S.F.; Alam, M.S.B.; Hassan, M.; Rozbu, M.R.; Ishtiak, T.; Rafa, N.; Mofijur, M.; Shawkat Ali, A.; Gandomi, A.H. Deep learning modelling techniques: Current progress, applications, advantages, and challenges. Artif. Intell. Rev. 2023, 56, 13521–13617.
26. Wong, S.C.; Gatt, A.; Stamatescu, V.; McDonnell, M.D. Understanding data augmentation for classification: When to warp? In Proceedings of the 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Gold Coast, Australia, 30 November–2 December 2016; pp. 1–6.
27. Pham, V.; Weindorf, D.C.; Dang, T. Soil profile analysis using interactive visualizations, machine learning, and deep learning. Comput. Electron. Agric. 2021, 191, 106539.
28. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
29. Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
30. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149.
31. Birodkar, V.; Lu, Z.; Li, S.; Rathod, V.; Huang, J. The surprising impact of mask-head architecture on novel class segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 7015–7025.
32. Shorten, C.; Khoshgoftaar, T.M. A survey on image data augmentation for deep learning. J. Big Data 2019, 6, 60.
33. Zhong, S.; Han, Z.; Du, J.; Ci, E.; Ni, J.; Xie, D.; Wei, C. Relationships between the lithology of purple rocks and the pedogenesis of purple soils in the Sichuan Basin, China. Sci. Rep. 2019, 9, 13272.
34. Weidong, L.; Baret, F.; Xingfa, G.; Qingxi, T.; Lanfen, Z.; Bing, Z. Relating soil surface moisture to reflectance. Remote Sens. Environ. 2002, 81, 238–246.
35. Guo, Z.; Zhang, Z.; Zhou, H.; Wang, D.; Peng, X. The effect of 34-year continuous fertilization on the SOC physical fractions and its chemical composition in a Vertisol. Sci. Rep. 2019, 9, 2505.
36. Lin, L.; Gao, Z.; Liu, X.; Sun, Y. A new method for multicolor determination of organic matter in moist soil. Catena 2021, 207, 105611.
37. Qiu, N.-X.; Xie, X.-L.; Guan, L.; Li, A.-B.; Liu, J.; Liu, M.; Zhao, Y.-G. Eliminating the influence of free iron oxides on the prediction of organic matter in red soils using Vis-NIR reflectance spectroscopy. Geoderma 2025, 464, 117639.
38. Gozukara, G.; Hartemink, A.E.; Zhang, Y. Illumination levels affect the prediction of soil organic carbon using smartphone-based digital images. Comput. Electron. Agric. 2023, 204, 107524.
39. Islam, M.; Glocker, B. Spatially varying label smoothing: Capturing uncertainty from expert annotations. In Proceedings of the International Conference on Information Processing in Medical Imaging, Virtual Event, 28–30 June 2021; pp. 677–688.
40. Kim, D.; Tsai, Y.-H.; Suh, Y.; Faraki, M.; Garg, S.; Chandraker, M.; Han, B. Learning semantic segmentation from multiple datasets with label shifts. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 20–36.
41. Reiß, S.; Seibold, C.; Freytag, A.; Rodner, E.; Stiefelhagen, R. Every annotation counts: Multi-label deep supervision for medical image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 9532–9542.
42. Marmanis, D.; Schindler, K.; Wegner, J.D.; Galliani, S.; Datcu, M.; Stilla, U. Classification with an edge: Improving semantic image segmentation with boundary detection. ISPRS J. Photogramm. Remote Sens. 2018, 135, 158–172.
43. Ding, H.; Jiang, X.; Liu, A.Q.; Thalmann, N.M.; Wang, G. Boundary-aware feature propagation for scene segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27–28 October 2019; pp. 6819–6829.
Figure 1. Schematic of the improved Mask R-CNN model for soil profile segmentation and attribute prediction. (A) The overall prediction process of the model. (B) Profile image of a soil sample showing the original soil profile. (C) Three different annotation schemes. (D) Data augmentation process. (E) Schematic architecture of the improved Mask R-CNN model.
Figure 2. Soil sampling locations and profile images. (A) Distribution of sampling points across different regions of Anhui Province, with its location in China indicated. Sample points are distributed across the Huabei Plain (blue), Jianghuai Hills (yellow), Dabie Mountains in western Anhui (green), the plain along the Yangtze River (orange), and mountainous and hilly regions of southern Anhui (purple). Dots are color-coded for soil profile depth from 25 cm to 120 cm, as indicated by the scale bar on the left. (B) Representative profile photographs for the ten soil groups (GSCC great group level) used in this study, including Purplish soils, Limestone soils, Red earths, Skeletal soils, Yellow-brown earths, Paddy soils, Shajiang black soils, Fluvo-aquic soils, and Yellow earths.
Figure 3. Improved Mask R-CNN: training dynamics and soil-attribute prediction performance. (A) Training loss trends across epochs under different labeling schemes with and without data augmentation (NDA). (B) Training and validation loss curves for three labeling schemes. (C) Training loss of the added soil-attribute branch (Scheme 3). (D) SOM regression performance in log space, where green dots represent individual soil samples. (E) Confusion matrix for soil group classification (normalized). (F) Comparison of segmentation performance (mAP@0.5) across soil layers under different labeling schemes.
Figure 4. Qualitative comparison across labeling schemes: horizon delineation, soil group probability, SOM prediction and distribution. Each row corresponds to one labeling scheme (Scheme 1, Scheme 2 and Scheme 3). Column 1: original soil profile; Column 2: expert-delineated genetic horizons; Column 3: predictions by the improved Mask R-CNN; Column 4: probability distribution of soil groups for the whole profile; Column 5: SOM prediction for the soil profile and its dataset-wide distribution across all samples. The letters (e.g., A, B, C, AB) overlaid on the images represent different soil horizons.
Table 1. Descriptive distribution of the dataset across soil groups and horizon occurrence (n = 451 profiles). Count indicates the number of profiles in which the corresponding horizon class appears at least once. A single profile can contribute to multiple horizon classes.
Section              Category               Count (n)   Percentage (%)
Soil group           Purple soils           19          4.21
                     Limestone soils        20          4.43
                     Red soils              65          14.4
                     Skeletal soils         35          7.76
                     Yellow-brown earths    45          9.98
                     Paddy soils            101         22.39
                     Shajiang black soils   70          15.52
                     Fluvo-aquic soils      55          12.21
                     Yellow-brown soils     28          6.21
                     Yellow soils           13          2.89
                     Total                  451         100
Horizon occurrence   A                      446         98.89
                     B                      347         76.94
                     C                      293         64.97
                     AB                     306         67.85
                     AC                     104         23.06
                     BC                     168         37.25
Table 2. Performance metrics of Mask R-CNN under three annotation schemes.
               Soil Horizon Segmentation             Soil Group Classification   SOM Prediction
               Precision  Recall  F1      Seg mAP    Accuracy   F1_Group         R2      MAE     RMSE
NDA-Scheme 1   0.694      0.545   0.611   0.666      -          -                -       -       -
NDA-Scheme 2   0.543      0.397   0.459   0.440      -          -                -       -       -
NDA-Scheme 3   0.627      0.464   0.533   0.525      -          -                -       -       -
Scheme 1       0.901      0.919   0.910   0.854      0.689      0.624            0.514   0.253   0.367
Scheme 2       0.896      0.826   0.860   0.802      0.674      0.601            0.486   0.274   0.389
Scheme 3       0.925*     0.933*  0.929*  0.918*     0.717*     0.677*           0.565*  0.227*  0.327*
Asterisks (*) mark the best value in each column.

Share and Cite

Liu, Q.; Fang, G.; Zhang, N.; Pei, C.; Wu, S.; Yang, M.; Shen, J.; Yu, K.; Shi, X.; Sun, W.; et al. Improved Mask R-CNN Multimodal Framework for Simultaneous Soil Horizon Delineation, Soil Group Identification and SOM Prediction from Soil Profile Images. Soil Syst. 2026, 10, 39. https://doi.org/10.3390/soilsystems10030039
