Article

Assessing Soil Organic Carbon in Semi-Arid Agricultural Soils Using UAVs and Machine Learning: A Pathway to Sustainable Water and Soil Resource Management

by Imad El-Jamaoui *, María José Delgado-Iniesta *, Maria José Martínez Sánchez, Carmen Pérez Sirvent and Salvadora Martínez López
Department of Agricultural Chemistry, Geology and Pedology, Faculty of Chemistry, University of Murcia, 30100 Murcia, Spain
* Authors to whom correspondence should be addressed.
Sustainability 2025, 17(8), 3440; https://doi.org/10.3390/su17083440
Submission received: 17 February 2025 / Revised: 4 April 2025 / Accepted: 10 April 2025 / Published: 12 April 2025

Abstract

The global effort to combat climate change highlights the critical role of storing organic carbon in soil to reduce greenhouse gas emissions. Traditional methods of mapping soil organic carbon (SOC) are labour-intensive and costly because they rely on extensive laboratory analyses. Recent advancements in unmanned aerial vehicles (UAVs) offer a promising alternative for efficiently and affordably mapping SOC at the field level. This study focused on developing a method to accurately predict topsoil SOC at high resolution using spectral data from low-altitude UAV multispectral imagery, complemented by laboratory data from the Nogalte farm in Murcia, Spain, as part of the LIFE AMDRYC4 project. To this end, Python version 3.10 was used to implement several machine learning techniques, including partial least squares (PLS) regression, random forest (RF), and support vector machine (SVM). Among these, the random forest algorithm demonstrated superior performance, achieving an R² value of 0.92, RMSE of 0.22, MAE of 0.19, MSE of 0.05, and EVE of 0.71 in estimating SOC. The results of the RF model were then visualised spatially using GIS and compared with simple spatial interpolations of soil analyses. The findings suggest that UAV-based multispectral modelling and mapping of SOC can provide valuable insights for farmers, offering a practical means to monitor SOC levels and enhance precision agriculture systems. This approach reduces the time and cost associated with traditional SOC mapping methods and supports sustainable agricultural practices by enabling more precise management of soil resources.

1. Introduction

Soil organic matter (SOM) plays a vital role in supporting plant growth, enhancing soil fertility, and sequestering carbon dioxide (CO2) [1,2]. The primary component of SOM, soil organic carbon (SOC), is crucial to various soil functions and environmental benefits [3]. Assessment of SOC levels is, therefore, recommended, since it is an indication of improved soil health, resulting in positive influences on physical, chemical, and biological characteristics [4]. This evaluation enables the implementation of better soil management practices that enhance soil structure, nutrient availability, water retention, and microbial activity. Consequently, these enhancements have been shown to increase soil fertility, boost agricultural productivity, and enhance resilience against environmental stresses [5]. Additionally, the capacity of the soil to support plant productivity, maintain water and air quality, and facilitate nutrient cycling is sustained by SOC [6]. Furthermore, it contributes to the mitigation of greenhouse gas emissions [7]. It is evident that variations in land use and agricultural practices have the capacity to significantly influence SOC levels and the balance between its formation and decomposition processes [8,9].
Although laboratory methods for measuring SOC are well established, they can be challenging, costly and time-consuming. The execution of these methods necessitates the utilisation of specialised equipment, the implementation of precise procedures, and the expertise of skilled technicians to ensure the accuracy of SOC level measurements [10,11]. This often necessitates extensive sample preparation, the use of chemical reagents, and advanced instrumentation, which can limit their accessibility and practicality for large-scale or frequent assessments. Furthermore, the necessity for numerous samples to be sent to a laboratory for analysis adds to the overall expense and time required for comprehensive SOC monitoring [12]. The labour-intensive and costly nature of these methods, coupled with the high volume of samples needed, underscores the significant need for rapid analysis techniques to enable timely and efficient monitoring of SOC on a large scale.
Although remote sensing employing multispectral satellite imagery has been demonstrated to be capable of predicting the spatial distribution of soil organic carbon (SOC) at small to medium scales, its effectiveness is hindered by factors such as low spatial and temporal resolution and weather interference, particularly in mountainous regions [13]. UAV-based remote sensing has emerged as a promising alternative for high-resolution SOC monitoring [14]. Although hyperspectral imaging can improve SOC prediction, its high cost and complex data handling limit its practical application [15]. Conversely, remote sensing offers a more efficient, less labour-intensive approach for large-scale SOC assessment by analysing soil reflectance, which varies with surface properties such as moisture, texture, and organic matter [16].
The analysis of soil spectral signatures offers a means of examining these characteristics both qualitatively and quantitatively [17,18]. The utilisation of remote sensing sensors mounted on unmanned aerial vehicle-based platforms facilitates the acquisition of detailed information regarding various soil parameters, including soil organic carbon (SOC) content, through the analysis of soil reflectance across a range of wavelength bands at field or landscape scales [19]. These data are often integrated with soil classification, geostatistical methods like Kriging, and properties of unvisited areas to facilitate accurate soil mapping and prediction [20,21].
Regression modelling techniques, especially support vector machines (SVM) based on the radial basis function (RBF) method, are among the most recommended for mapping SOC and exploring its spatial patterns, underscoring the necessity of precise estimation [22,23]. Another study corroborated the efficacy of SVM models based on RBF-driven data from NIR reflectance spectroscopy, highlighting their ability to deliver rapid, reliable, and interpretable results [24,25]. The selection of spectral bands is frequently accomplished through the utilisation of regression techniques, such as stepwise multiple linear regression or partial least squares, to identify the bands most pertinent to the compound of interest [26,27]. These methods help to reduce the extensive spectral data to a manageable set of informative bands, enabling accurate estimation of the target chemical constituent [28].
Machine learning algorithms, such as support vector machine (SVM), random forest (RF), and neural networks (ANN), have demonstrated superior performance compared to other existing classifiers and regression methods, offering enhanced accuracy in smart agriculture systems [29,30]. These algorithms have the capacity to deliver high precision and accuracy in plant diagnostics and soil organic carbon (SOC) [31,32]. Machine learning algorithms are regarded as a promising approach for high-precision remote sensing mapping due to their intelligent learning capabilities, with great interest being expressed in their potential as valuable tools for achieving reliable remote sensing applications [33,34,35].
This study proposes an improved, scalable approach to increase the accuracy and efficiency of soil organic carbon (SOC) mapping, building on techniques previously applied at larger scales [36,37]. However, at the plot level, there remains a need for soil-specific controls and higher resolution prediction methods [38,39]. To address this gap, multispectral UAV imagery and field-collected soil sample data were used. Three machine learning (ML) algorithms—random forest (RF), support vector machine (SVM), and partial least squares regression (PLSR)—were used to construct SOC prediction models. The objectives of this study are: (1) to evaluate the correlation between different spectral indices and SOC content, (2) to conduct a comprehensive comparison of several ML algorithms to identify the most effective model for SOC prediction at the local scale, and (3) to evaluate the SOC prediction performance of the selected algorithms. Focusing on the Nogalte area, the study maps the spatial distribution of SOC content and provides an accurate and adaptable methodology suitable for both large- and small-scale agricultural applications.

2. Materials and Methods

To provide an overview of the methods used for SOC mapping, we have developed Figure 1. We present a reproducible and cost-effective approach to estimate and map soil organic carbon (SOC) in Mediterranean rainfed agricultural soils using multispectral remote sensing data. This methodology was tested in several study areas by integrating remotely sensed data from unmanned aerial vehicles (UAVs) with field data and applying different predictive models. Regression techniques used included stepwise multiple linear regression, partial least squares (PLS), random forest (RF) and support vector regression (SVR) algorithms.
During soil data collection, UAV-based multispectral imagery was simultaneously acquired using a DJI Matrice 210 quadcopter manufactured by DJI (Dà-Jiāng Innovations Science and Technology Co., Ltd.), based in Shenzhen, Guangdong Province, China. The UAV was equipped with MS600 agricultural multispectral sensors, consisting of an RGB sensor for visible light imaging and additional monochrome sensors to capture multispectral data, including near-infrared and two red-edge bands. Image acquisition for the four Nogalte fields was performed daily between 10:00 and 14:00 (Spain local time) under clear, windless conditions, which are ideal for consistent data quality. Prior to each flight, a standard white reference panel was imaged to facilitate radiometric calibration. The UAV operated at an altitude of 60 m, achieving a spatial resolution of 4 cm, with an image overlap ratio of 80% to ensure high-quality image mosaicking and analysis.
Stepwise multiple linear regression was used to iteratively select the most significant variables, improving model simplicity and interpretability. Partial least squares (PLS) regression accounted for multicollinearity among predictor variables, providing robust estimates even when predictors were highly correlated. Random forest (RF) used ensemble learning to improve prediction accuracy and robustness by averaging the results of many decision trees. Support vector regression (SVR) was used for its ability to model complex non-linear relationships between predictor variables and SOC. Each of these techniques was carefully evaluated to determine the most effective method for predicting SOC in Mediterranean rainfed agricultural soils, contributing to a comprehensive and reliable mapping approach.
The predictor variables consisted of UAV remote sensing data, while the response variable was SOC, measured in the laboratory from field samples. After scanning the surface with the UAV system, the surface reflectance of the available bands was processed and used as predictor variables. The soil samples were divided into two groups: 80% of the samples were used for model training and the remaining 20% were used for model validation [40,41]. To reduce the number of independent variables, principal component analysis (PCA) was used to identify the most influential variables [42,43]. To build the SOC prediction models, we applied the RF, PLS and SVR algorithms to the training dataset and the selected variables. We then evaluated the performance of the models on the validation dataset to identify the best model for high-resolution SOC mapping using drone data.
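The 80/20 partition described above can be sketched as follows. This is a minimal illustration rather than the authors' script: the function name `split_samples`, the fixed seed, and the use of integer sample IDs are assumptions made for the example.

```python
import random


def split_samples(sample_ids, train_fraction=0.8, seed=42):
    """Shuffle sample IDs and split them into training and validation lists."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)          # seeded so the split is reproducible
    cut = round(train_fraction * len(ids))    # number of training samples
    return ids[:cut], ids[cut:]


# 76 topsoil samples, identified here by hypothetical integer IDs 0..75
train_ids, valid_ids = split_samples(range(76))
print(len(train_ids), len(valid_ids))
```

With 76 samples, rounding 80% gives 61 training and 15 validation samples; a different rounding rule would shift one sample between the two groups.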

2.1. Study Area and Field Sampling

The geographical location of the study area is in the Region of Murcia, in a district of Lorca (Nogalte), Spain (see Figure 2). The climate of the region is semi-arid and temperate, with an average annual temperature of 17.6 °C and annual rainfall of approximately 312 mm. The topography of the region is characterised by a relatively rugged geomorphology, with higher elevations in the northeast and southwest and lower elevations in the northwest, ranging from 600 to 800 m. The region’s moderately fertile soil is well suited to almond cultivation.
The cultivated areas are mainly composed of soil types such as calcisols and regosols. Soil samples were collected during the bare soil period, which extends from early April to late August. This approach was adopted to minimise interference from vegetation. A preliminary field survey was carried out to identify accessible and representative sampling sites within the cultivated areas of the Nogalte plain. To achieve systematic and uniform spatial coverage, a 25 m grid sampling strategy was adopted across the study area. This grid size was chosen to balance spatial resolution, operational feasibility, and expected variability in soil characteristics at the field scale.
The grid ensured consistent coverage across the study area while minimising potential edge effects by including buffer zones near field boundaries. A total of 76 topsoil samples (0–20 cm) were collected using this grid, which provided sufficient density to capture spatial heterogeneity in soil organic carbon (SOC) across the landscape. Sample locations were geo-referenced using a portable GPS device, and the time of collection was meticulously recorded. Soil samples were stored in polyethylene plastic bags at 5 °C and transported to the laboratory for analysis [44]. After air drying, all samples were cleaned of plant debris and stones, sieved through a 0.20 mm mesh and analysed for SOC using the volumetric potassium dichromate method [45].

2.2. Remote Sensing Image

To obtain multispectral images, a DJI Matrice 210 drone, manufactured by DJI (Dà-Jiāng Innovations Science and Technology Co., Ltd.), Shenzhen, Guangdong, China, was equipped with a MicaSense RedEdge-MX multispectral camera, a product of MicaSense Inc. of Seattle, Washington, USA. Semi-automated flights were performed using the Pix4Dcapture photogrammetric acquisition application (version 4.10). In addition, a DJI Matrice 210 drone equipped with a high-resolution RGB camera was used to generate 3D photogrammetric models. Flights were performed at an altitude of 65 m above ground level, enabling the acquisition of multispectral mosaics with a spatial resolution of less than 4 cm and photogrammetric mosaics (orthophotos and elevation models) with a resolution of more than 2 cm. The MicaSense RedEdge-MX captures five spectral bands: blue (475 nm centre, 20 nm bandwidth), green (560 nm centre, 20 nm bandwidth), red (668 nm centre, 10 nm bandwidth), red edge (717 nm centre, 10 nm bandwidth), and near-infrared (840 nm centre, 40 nm bandwidth). To ensure radiometric accuracy and consistency between flights, a calibrated reflectance panel was imaged before and after each flight to perform a radiometric calibration in accordance with MicaSense guidelines. This calibration process allows raw digital numbers to be converted into reflectance values, minimising the influence of changing lighting conditions.
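The idea behind panel-based calibration can be illustrated with a deliberately simplified sketch. It assumes a single linear scale factor per band (known panel reflectance divided by the mean digital number over the panel); the full MicaSense workflow includes sensor-specific corrections that are not reproduced here. The function names, the panel reflectance of 0.5, and all digital numbers are hypothetical.

```python
from statistics import mean


def panel_scale_factor(panel_pixels_dn, panel_reflectance):
    """Scale factor that maps digital numbers (DN) to reflectance for one band."""
    return panel_reflectance / mean(panel_pixels_dn)


def dn_to_reflectance(band_dn, factor):
    """Apply the scale factor to every pixel of a band stored as nested lists."""
    return [[dn * factor for dn in row] for row in band_dn]


# Hypothetical DNs sampled over the panel, and a hypothetical 1 x 2 pixel band
factor = panel_scale_factor([20000, 20400, 19600], panel_reflectance=0.5)
reflectance = dn_to_reflectance([[10000, 20000]], factor)
print(reflectance)
```

Because the factor is recomputed from the panel image for each flight, a change in illumination between flights changes the factor rather than the resulting reflectance values.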
Temporal consistency between UAV image acquisition and soil sampling was carefully maintained. All UAV flights were synchronised with field sampling activities and completed within the same day to ensure that reflectance data accurately represented surface conditions at the time of sampling. Photogrammetric models were generated in Pix4D for each spectral band and corrected for geometric and radiometric accuracy. The GeoTIFF outputs were then processed using the ReadAsArray function from the GDAL library to convert the data into a two-dimensional format for further analysis (Table 1).

2.3. Development of Spectral Characteristic Indices

For precise, high-accuracy remote sensing estimation of soil organic carbon (SOC), it is essential to identify the influential bands and associated indices that form the basis of the remote sensing SOC estimation model. We first investigated the relationship between SOC and reflectance in five bands. Multispectral UAV imagery was used to extract pixel spectral values from 76 soil samples collected in the field, resulting in 5-band soil spectral curves (B1–B5). These curves were then used to assess the spectral response of soil samples with different organic matter content, as shown in Figure 3.
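Extracting a five-band spectrum at each sampling point amounts to converting the point's map coordinates into a raster row and column and reading that pixel in every band. The sketch below assumes a north-up raster described by a GDAL-style geotransform tuple; the coordinates, the 3 × 3 band values, and the function names are hypothetical.

```python
import math


def world_to_pixel(x, y, geotransform):
    """Convert map coordinates to (row, col) for a north-up raster.

    geotransform follows the GDAL convention:
    (x_origin, pixel_width, 0, y_origin, 0, -pixel_height).
    """
    x0, px_w, _, y0, _, px_h = geotransform
    col = math.floor((x - x0) / px_w)
    row = math.floor((y - y0) / px_h)   # px_h is negative for north-up rasters
    return row, col


def sample_spectrum(bands, row, col):
    """Return the pixel value of every band (B1..B5) at one location."""
    return [band[row][col] for band in bands]


# Hypothetical 4 cm raster whose top-left corner is at (1000.0, 2000.0)
gt = (1000.0, 0.04, 0.0, 2000.0, 0.0, -0.04)
# Five hypothetical 3 x 3 bands; band k holds the constant value 0.1 * k
bands = [[[0.1 * k] * 3 for _ in range(3)] for k in range(1, 6)]

row, col = world_to_pixel(1000.10, 1999.94, gt)
print(row, col, sample_spectrum(bands, row, col))
```

In practice a small window around each point is often averaged instead of reading a single 4 cm pixel, since handheld GPS positions are less precise than the pixel size; the source does not say which approach was used.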

2.4. Model Performance Evaluation

In this study, the SOC prediction accuracy of the models [46,47] was evaluated and compared. We used three common indices: the root mean square error (RMSE), the mean absolute error (MAE), and the mean absolute percentage error (MAPE), calculated using Equations (1)–(3):
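$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n}\left(X_{p,i} - X_{a,i}\right)^{2}}{n}}, \tag{1}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|X_{p,i} - X_{a,i}\right|, \tag{2}$$
$$\mathrm{MAPE} = \frac{100}{n}\sum_{i=1}^{n}\left|\frac{X_{p,i} - X_{a,i}}{X_{a,i}}\right|, \tag{3}$$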
Here, n is the number of observations, Xp is the predicted value of the i-th observation, and Xa is the actual value of the i-th observation.
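The three error indices can be computed directly from paired lists of predicted and actual values. The sketch below uses the standard definitions of RMSE, MAE, and MAPE; the function names and the three example values are hypothetical.

```python
import math


def rmse(pred, actual):
    """Root mean square error."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))


def mae(pred, actual):
    """Mean absolute error."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)


def mape(pred, actual):
    """Mean absolute percentage error (actual values must be non-zero)."""
    return 100 / len(actual) * sum(abs((p - a) / a) for p, a in zip(pred, actual))


# Hypothetical SOC values (%)
actual = [1.0, 2.0, 4.0]
pred = [1.5, 2.0, 3.0]
print(rmse(pred, actual), mae(pred, actual), mape(pred, actual))
```

RMSE penalises large errors more heavily than MAE, while MAPE expresses the error relative to the measured value, so the three indices can rank models differently.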

2.5. Modelling of the Organic Carbon Content

The organic carbon (OC) prediction model was developed by regressing OC analyses on multiple spectral bands of remotely sensed data. The PLS module of the R software was used for partial least squares regression. To assess the accuracy of the model, cross-validation was performed using the Leave-One-Out (LOO) technique, which yields the Root-Mean-Square Error of Cross-Validation (RMSECV). The LOO method makes n different splits of a sample of n measurements. Each split uses (n − 1) measurements to build a model, which is then applied to the single remaining measurement for validation. The prediction error is estimated as the root mean square error (RMSE) over the n held-out predictions [48]. This method was chosen because it remains effective when the number of measurements is limited. In general, the predictive value of a regression model is based on independent variables. Regression is primarily used to discover linear relationships between variables and as a forecasting tool due to its simplicity and interpretability [49].
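The Leave-One-Out procedure can be sketched in a few lines. To keep the example self-contained, a single-predictor least-squares line stands in for the PLS model used in the study; the function names and data values are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def loo_rmsecv(x, y):
    """Leave-one-out RMSECV: refit on n - 1 points, predict the held-out one."""
    squared_errors = []
    for i in range(len(x)):
        x_train, y_train = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        slope, intercept = fit_line(x_train, y_train)
        prediction = slope * x[i] + intercept
        squared_errors.append((prediction - y[i]) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical reflectance-like predictor and SOC-like response
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1]
print(loo_rmsecv(x, y))
```

Every observation is predicted by a model that never saw it, which is why LOO is attractive for small sample sizes, at the cost of fitting n separate models.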

2.6. Multiple Linear Regression Models

We used multiple linear regression models to predict soil organic carbon (SOC) because of their ability to manage multiple predictors simultaneously. This approach allows us to consider the combined influence of different spectral bands (NIR, green, red, red edge, blue) on SOC levels. Multiple linear regression is particularly suitable for our study, as it can model the relationship between these multiple independent variables and the dependent variable (SOC) more accurately than simple linear regression, which only considers a single predictor.

2.7. Calculation of the Variance Inflation Factor (VIF)

The variance inflation factor (VIF) was calculated to identify issues of multicollinearity. In general, VIF values greater than 4 require further investigation, and values greater than 10 indicate severe multicollinearity that necessitates model adjustment [50].
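The VIF of the k-th predictor is defined by Equation (4):
$$\mathrm{VIF}_k = \frac{1}{1 - R_k^{2}}, \tag{4}$$
where R_k² is the coefficient of determination obtained by regressing the k-th independent variable on the remaining independent variables.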
The selection of the stepwise regression model was performed using R version 4.2.2, followed by the calculation of VIF values. When a VIF exceeded 4, Principal Component Regression (PCR) and Ridge Regression (RR) were used to reduce the sum of squared errors. To compare the predictive performance of the models, cross-validation was performed using the Leave-One-Out method. This approach assesses the relationship between spatial variables, taking into account the instability that spatial variance heterogeneity can cause.
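A VIF calculation can be sketched without external libraries, using the standard definition VIF_k = 1 / (1 − R_k²), where R_k² comes from regressing predictor k on all other predictors. The study itself used R; the small solver, the function names, and the toy predictor values below are assumptions for illustration only.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def r_squared(columns, target):
    """R² of a least-squares fit (with intercept) of target on the columns."""
    n = len(target)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * t for r, t in zip(rows, target)) for a in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    mean_t = sum(target) / n
    ss_res = sum((t - f) ** 2 for t, f in zip(target, fitted))
    ss_tot = sum((t - mean_t) ** 2 for t in target)
    return 1 - ss_res / ss_tot


def vif(predictors):
    """Map each predictor name to its variance inflation factor."""
    result = {}
    for name, column in predictors.items():
        others = [c for k, c in predictors.items() if k != name]
        result[name] = 1 / (1 - r_squared(others, column))
    return result


# Hypothetical predictors: "a" and "b" are nearly collinear, "c" is not
toy = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "b": [1.1, 1.9, 3.2, 3.8, 5.1, 6.0],
    "c": [3.0, 1.0, 2.0, 3.0, 1.0, 2.0],
}
print(vif(toy))
```

In this toy data the two nearly collinear predictors receive VIF values far above 10, while the unrelated predictor stays close to 1.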

2.8. Random Forest

The random forest (RF) algorithm was used in this study to predict soil organic carbon (SOC) due to its robustness in managing high-dimensional datasets and complex variable interactions, particularly when working with UAV (unmanned aerial vehicle) data [51,52]. The use of RF in this context is particularly valuable because it effectively manages large spatial datasets, such as multi-spectral imagery from UAVs, which increases the accuracy of SOC predictions. RF’s ability to resist overfitting and handle missing data makes it well suited to SOC estimation, enabling the generation of more localised and accurate SOC estimates that support better soil management and agricultural practices [53].
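The ensemble idea behind RF, averaging many trees that are each fitted to a bootstrap resample, can be illustrated with a heavily simplified stand-in. The sketch below bags one-split regression "stumps", each using one randomly chosen feature; it is not a full random forest and not the implementation used in the study. All names and data values are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Best single-threshold split on one feature, minimising squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:          # candidate thresholds
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:                        # all feature values identical
        m = sum(ys) / len(ys)
        return (None, m, m)
    return best[1:]


def fit_bagged_stumps(rows, ys, n_trees=50, seed=0):
    """Fit stumps on bootstrap resamples, each with a randomly chosen feature."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    ensemble = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        f = rng.randrange(n_features)                # random feature
        stump = fit_stump([rows[i][f] for i in idx], [ys[i] for i in idx])
        ensemble.append((f,) + stump)
    return ensemble


def predict(ensemble, row):
    """Average the predictions of all stumps."""
    preds = [ml if t is None or row[f] <= t else mr for f, t, ml, mr in ensemble]
    return sum(preds) / len(preds)


# Hypothetical two-band reflectances and SOC values
rows = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.4], [0.4, 0.3], [0.5, 0.6], [0.6, 0.5]]
soc = [1.0, 1.1, 1.5, 1.6, 2.0, 2.1]
model = fit_bagged_stumps(rows, soc)
print(predict(model, [0.1, 0.1]), predict(model, [0.6, 0.6]))
```

Averaging over resamples is what dampens the variance of individual trees; a real random forest grows much deeper trees and draws a random feature subset at every split.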

2.9. Support Vector Regression

We use SVR in this study because of its proven effectiveness in handling non-linear relationships between predictors and the response variable, which is critical for accurate SOC predictions [54,55]. In addition, SVR’s flexibility in parameter tuning and kernel function selection makes it an ideal choice for complex regression tasks such as this.
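The kernel choice mentioned above determines how similarity between two spectra is measured. As a small illustration, the radial basis function (RBF) kernel referred to in the Introduction is k(x, z) = exp(−γ‖x − z‖²); the function name, the value of γ, and the spectra below are hypothetical.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel: 1.0 for identical inputs, decaying with squared distance."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


# Hypothetical five-band reflectance spectra
s1 = [0.08, 0.12, 0.18, 0.22, 0.30]
s2 = [0.09, 0.13, 0.19, 0.23, 0.31]
s3 = [0.20, 0.25, 0.35, 0.40, 0.50]

print(rbf_kernel(s1, s2, gamma=10.0), rbf_kernel(s1, s3, gamma=10.0))
```

A larger γ makes the kernel more local, so only very similar spectra influence each other's predictions; this is one of the parameters that typically needs tuning in SVR.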

3. Results

3.1. Relationship Between SOC and Multispectral Sensor Data

To establish a relationship between the soil organic carbon (SOC) content and the drone image bands, we used machine learning statistical analysis (Table 1) based on the correlation matrix (Figure 4). This analysis allowed us to determine the relationship between SOC measured in the laboratory and the drone image bands, including NIR, red, red edge, blue and green, as well as vegetation indices such as NDVI, EVI, GCI, SAVI and NDRE. Pearson correlation coefficients (r) were calculated to assess the degree of association between the independent variables and SOC content. The correlation matrix, as shown in the heatmap, indicates significant relationships between the spectral bands, vegetation indices, and SOC content. The heatmap visually highlights the strength and direction of these correlations, with warm colours representing strong positive correlations and cool colours representing negative correlations. Red (r = 0.814), Red_edge (r = 0.812), and NIR (r = 0.794) show strong positive correlations with SOC. These bands are highly associated with variations in soil organic carbon due to their sensitivity to surface properties, including organic matter. Blue (r = 0.690) and green (r = 0.770) show moderate to strong correlations with SOC. Although these bands are less sensitive to organic matter than NIR or Red_edge, they still contribute to the detection of soil characteristics.
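The vegetation indices named above can be computed per pixel from band reflectances. The source does not state the exact formulas used, so the sketch below assumes the commonly used definitions (with the usual soil adjustment factor L = 0.5 for SAVI and the usual coefficients for EVI); the reflectance values are hypothetical.

```python
def ndvi(nir, red):
    """Normalised Difference Vegetation Index."""
    return (nir - red) / (nir + red)


def ndre(nir, red_edge):
    """Normalised Difference Red Edge index."""
    return (nir - red_edge) / (nir + red_edge)


def gci(nir, green):
    """Green Chlorophyll Index."""
    return nir / green - 1


def savi(nir, red, soil_factor=0.5):
    """Soil-Adjusted Vegetation Index (soil_factor is the usual L term)."""
    return (1 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def evi(nir, red, blue):
    """Enhanced Vegetation Index with the commonly used coefficients."""
    return 2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1)


# Hypothetical bare-soil reflectances
blue, green, red, red_edge, nir = 0.08, 0.12, 0.18, 0.22, 0.30
print(ndvi(nir, red), ndre(nir, red_edge), gci(nir, green),
      savi(nir, red), evi(nir, red, blue))
```

Because NDVI, NDRE, and SAVI are all normalised differences involving the NIR band, they tend to be strongly correlated with one another.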
NDVI (r = −0.453), GCI (r = −0.424), and NDRE (r = −0.476) showed moderate negative correlations with SOC. This inverse relationship is likely due to the bare soil conditions at the time of data collection, where higher SOC levels are associated with darker soil surfaces that reflect less light, particularly in the NIR and red edge wavelengths. In addition, soils with higher SOC tend to retain more moisture and have a darker colour, both of which further reduce surface reflectance. EVI (r = 0.127) and SAVI (r = −0.0815) show weak correlations with SOC, indicating that these indices are not effective direct indicators of soil organic carbon under the conditions of this study.
The correlation matrix also shows very strong interdependencies between spectral bands: Red and Red_edge (r = 0.994), Red and NIR (r = 0.971), and Red_edge and NIR (r = 0.986). This multicollinearity justifies the use of robust models such as partial least squares regression (PLSR) to predict SOC. Similarly, blue and green show a strong correlation (r = 0.968), reflecting their common sensitivity to surface reflectance and soil characteristics. Among the vegetation indices, NDVI and NDRE (r = 0.937), NDVI and SAVI (r = 0.874), and EVI and SAVI (r = 0.903) show very strong correlations, indicating that these indices have similar mathematical bases and similar responses to vegetation cover. GCI shows negative correlations with most spectral bands, especially blue (r = −0.837) and green (r = −0.747), suggesting that GCI, which is more sensitive to chlorophyll content, reflects differences in surface chemical composition.
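A correlation matrix of this kind is built from pairwise Pearson coefficients. The following sketch computes r for every pair of named columns; the function names and the small data table are hypothetical and do not reproduce the study's values.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Nested dict of Pearson r for every pair of named columns."""
    return {a: {b: pearson_r(columns[a], columns[b]) for b in columns}
            for a in columns}


# Hypothetical values for four samples
table = {
    "SOC": [0.8, 1.1, 1.5, 1.9],
    "Red": [0.15, 0.18, 0.22, 0.27],
    "NDVI": [0.30, 0.27, 0.26, 0.21],
}
matrix = correlation_matrix(table)
print(matrix["SOC"]["Red"], matrix["SOC"]["NDVI"])
```

Pairs of predictors with |r| close to 1, such as the red and red-edge bands reported above, are the ones that cause multicollinearity in ordinary regression.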
The Red, Red_edge, and NIR bands are the best predictors for SOC modelling due to their strong correlations. PLSR is suitable as it effectively handles the observed multicollinearity between spectral bands and provides robust predictions in complex, high-dimensional data. Vegetation indices such as NDVI and NDRE show inverse correlations with SOC, suggesting that they are not the best direct indicators for predicting soil organic carbon. These significant correlations justify the choice of PLSR over other methods such as multiple linear regression, random forest (RF), or support vector machines (SVMs). PLSR’s ability to exploit these relationships improves the accuracy of SOC prediction. Pearson correlation and simple linear regression were used to demonstrate the correlations between SOC and spatial data collected by the drone. All analyses were performed at the 95% confidence level (α = 0.05, p < 0.05). A correlogram was generated to illustrate the relationships between drone bands, vegetation indices, and SOC. The map correlogram highlighted the drone image spectral bands that had the most statistically significant relationships with SOC prediction in our study area. Based on this analysis, we focused our model on the bands with the highest correlation to improve model accuracy and reliability.
The blue (12.2) and green (13.25) bands demonstrate exceedingly elevated VIF values, significantly surpassing the conventional threshold of 4 or 5, thereby signifying a pronounced multicollinearity concern (Table 2). This finding suggests that these bands are highly correlated with other predictors in the model, which can distort the estimated coefficients and reduce model interpretability. Conversely, the red (1.101), NIR (1.069), and Red_edge (1.201) bands exhibited low VIF values, suggesting minimal to no multicollinearity and the presence of independent information contributions to the model. To address the issue of multicollinearity, we implemented Ridge Regression (RR) and Principal Component Regression (PCR), both of which are known to be robust to multicollinearity through the use of either regularisation or predictor transformation. Furthermore, a stepwise selection procedure was employed to retain only the most significant and non-redundant predictors in the final model.
To ensure robust assessment of model performance, we performed 10-fold cross-validation and calculated performance metrics, including R2, RMSE, and MAE for each model. In addition to the mean performance values, we also calculated 95% confidence intervals using bootstrapping (n = 1000 resamples) to assess the statistical reliability of the models. To compare the models, we used paired t-tests between the prediction errors of each algorithm to determine whether the observed differences were statistically significant. The Ridge Regression and random forest models had significantly lower prediction errors than the baseline linear regression model (p < 0.01). These statistical comparisons confirm that our modelling choices are not only appropriate to mitigate multicollinearity but also result in significantly improved predictive performance.
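The bootstrap confidence interval and the paired comparison of prediction errors can be sketched as follows. This assumes a percentile bootstrap of the RMSE and reports only the paired t statistic (a p-value would additionally require the t distribution, for example from SciPy); the function names and error values are hypothetical.

```python
import math
import random
from statistics import mean, stdev


def rmse_of(errors):
    """RMSE computed from a list of prediction errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def bootstrap_ci(errors, stat, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic of the errors."""
    rng = random.Random(seed)
    n = len(errors)
    stats = sorted(
        stat([errors[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lower = stats[int(n_resamples * alpha / 2)]
    upper = stats[int(n_resamples * (1 - alpha / 2)) - 1]
    return lower, upper


def paired_t_statistic(errors_a, errors_b):
    """t statistic of the paired differences between two models' errors."""
    d = [a - b for a, b in zip(errors_a, errors_b)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))


# Hypothetical prediction errors of one model on ten samples
errors = [0.12, -0.30, 0.05, 0.22, -0.18, 0.09, -0.02, 0.27, -0.11, 0.15]
print(rmse_of(errors), bootstrap_ci(errors, rmse_of))
```

The interval width reflects how much the RMSE would vary if a different set of samples had been drawn, which matters with a dataset of only 76 points.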

3.2. Development of the Inversion Model for SOC

Table 3 shows the results of the Ordinary Least Squares (OLS) model for predicting SOC using drone imagery data. Several models were considered in this study, including OLS, random forest (RF), support vector regression (SVR) and partial least squares regression (PLSR). OLS served as a baseline, while RF dealt effectively with complex non-linear relationships. SVR required extensive parameter tuning and struggled with high-dimensional data. PLSR was chosen for its ability to manage multicollinearity between spectral bands, which is common in remote sensing data, by reducing the dimensionality of the predictor while keeping relevant information for SOC prediction. The PLSR model used three components to minimise the Error Sum of Mean Residuals (ESMR), thereby optimising predictive performance. This transition from traditional models, such as OLS, to advanced machine learning techniques, such as PLSR, highlights the evolution of SOC prediction methodologies. By leveraging the strengths of each model and ultimately selecting PLSR, the study aims to provide a robust and accurate approach to SOC mapping, which is essential for precision agriculture and sustainable soil management.
The R2 value of 0.69 indicates that approximately 69% of the variation in the response variable can be explained by the predictor variables in the model. This value is considered acceptable in many areas of research as it indicates that the model has a good level of explanatory power. However, it is known that R2 tends to increase as more predictor variables are added to the model, regardless of their significance. The adjusted R2, which takes into account the number of predictors, is 0.66, which is only 0.03 lower than the R2, indicating a slight adjustment. This suggests that the additional predictors make a meaningful contribution to the model. The mean square error (MSE) is 0.12, which is excellent given the range of data, indicating a low mean squared difference between observed and predicted values. The performance of the model, as measured by cross-validation (CV), gave a root mean square error (RMSE) of 0.15, further supporting the reliability of the model. Given these results, it is not necessary to apply Principal Component Regression (PCR) or Ridge Regression (RR) to the entire model, as the current model demonstrates a strong and efficient fit without overfitting. Therefore, we can conclude that 73% of the variance in soil organic carbon (SOC) distribution can be estimated using drone spatial data, as shown in Equation (5):
$$\mathrm{SOC} = 0.06336\,\mathrm{Red} + 12.4869\,\mathrm{RedEdge} - 5.2746\,\mathrm{NIR} + 0.210 \tag{5}$$
The equation was obtained using three-component PLSR with LOO cross-validation.
Figure 5 illustrates the comparison between the laboratory-measured SOC and the SOC predicted from the drone image data. The laboratory-measured SOC data were obtained using standard soil sampling and analysis procedures, involving the collection of soil samples from different locations, followed by laboratory analysis to determine SOC content. The method showed strong agreement between the measured SOC values and those predicted from the imagery, validating the accuracy and reliability of the multispectral sensor-based SOC estimation approach.
Random forest regression, support vector regression (SVR), and traditional linear regression on the spectral bands were used to fit soil organic carbon (SOC) in the study area. SOC models were constructed with each fitting method and compared to identify the optimal one. Because the red, red edge, and NIR bands showed the highest correlations with SOC, a multiple regression model was constructed using partial least squares regression to account for multicollinearity among these variables. The models were trained using Python version 3.10, with the calibration dataset consisting of 10 samples and the validation dataset containing one sample. The accuracy of the models was assessed using the coefficient of determination (R2), root mean square error (RMSE), mean absolute error (MAE), and explained variance error (EVE).
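Two of these metrics are easy to confuse, so a short sketch may help. It assumes that EVE corresponds to the usual explained variance score, 1 − Var(residuals) / Var(actual), and uses the standard definition of R²; the function names and values are hypothetical.

```python
def r2_score(actual, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, pred))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def explained_variance(actual, pred):
    """Explained variance score: 1 - Var(residuals) / Var(actual)."""
    def variance(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / len(values)

    residuals = [a - p for a, p in zip(actual, pred)]
    return 1 - variance(residuals) / variance(actual)


# Hypothetical predictions with a constant bias of +0.5
actual = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 2.5, 3.5, 4.5]
print(r2_score(actual, pred), explained_variance(actual, pred))
```

Under these definitions the two scores coincide only when the mean residual is zero: a constant bias lowers R² but leaves the explained variance untouched, as the example shows.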
The accuracy of the Leave-One-Out Cross-Validation (LOO-CV) estimation is shown in Table 4. LOO-CV was chosen for this study because it is a robust validation technique that provides an unbiased estimate of model performance. In LOO-CV, each data point in the dataset is used once as the validation set, while the remaining data points are used as the training set. This process is repeated for every data point in the dataset, ensuring that every observation is used for both training and validation. Among the methods evaluated, random forest regression showed the best fit to the remote sensing inversion of SOC, as evidenced by its high R2 value of 0.92, low RMSE of 0.22, low MAE of 0.19, low MSE of 0.05, and high EVE of 0.71. In contrast, the SVR provided a moderate fit with R2, RMSE, MAE, and EVE values of 0.76, 0.21, 0.07 and 0.27, respectively. Considering these evaluation metrics, random forest regression outperformed SVR in predicting SOC from remote sensing data. Therefore, random forest regression was selected as the optimal approach to develop a model for drone imagery of SOC in the study area.

3.3. Spatial Map of Predicted SOC

The equation of the OC prediction model was implemented in Python, using libraries such as NumPy, Pandas, and Rasterio for spatial data manipulation. The Python script applied arithmetic formulae directly to the multispectral imagery datasets, allowing for efficient and reproducible processing. This approach facilitated the spatialisation of the prediction model by automating the application of mathematical expressions to raster data.
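A minimal sketch of applying an arithmetic prediction formula to raster bands is given below. The article does not state the equation, so a linear formula with made-up coefficients is assumed. The function name `apply_linear_model`, the band names, and all values are hypothetical. In-memory NumPy arrays stand in for bands that would in practice be read from the multispectral imagery, for example with Rasterio.

```python
import numpy as np

def apply_linear_model(bands, coefficients, intercept, nodata=np.nan):
    """Evaluate intercept + sum(coef_b * band_b) for every pixel."""
    first = next(iter(bands.values()))
    prediction = np.full(first.shape, intercept, dtype="float64")
    for name, coef in coefficients.items():
        prediction += coef * bands[name].astype("float64")
    # Mask pixels where any input band used by the model is missing.
    invalid = np.zeros(first.shape, dtype=bool)
    for name in coefficients:
        invalid |= np.isnan(bands[name])
    prediction[invalid] = nodata
    return prediction

# Hypothetical 2x2 reflectance rasters and coefficients, for illustration only.
bands = {
    "red": np.array([[0.10, 0.12], [0.15, np.nan]]),
    "nir": np.array([[0.12, 0.15], [0.20, 0.18]]),
}
coefficients = {"red": -4.0, "nir": -6.0}
soc_map = apply_linear_model(bands, coefficients, intercept=3.5)
```

Because the formula is evaluated on whole arrays, the same few lines apply unchanged to a full-resolution orthomosaic, and the output array can be written back to a georeferenced raster for display in GIS.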
The random forest (RF) model was applied to the UAV multispectral imagery datasets to map predicted soil organic carbon (SOC) content across the study area, with Python streamlining the processing workflow and improving the scalability and reproducibility of the model. The mapping results, shown in Figure 6, indicate higher levels of SOC in the central region of the study area and lower levels in the north-eastern part. These findings are consistent with previous studies [56,57]. The higher levels of SOC in the central region may be attributed to site-specific factors such as local topography, reduced disturbance from agricultural activities, and the presence of soil types such as calcisols and regosols, which can accumulate more organic matter under certain conditions. In addition, microclimatic variations and historical land use patterns may have contributed to the spatial variability of SOC within the study area. The lower SOC content in the north-eastern part is probably due to stoniness. Soil movement, influenced by external factors such as rainfall, also contributes to SOC accumulation in low-lying areas where regosols are common. In summary, SOC mapping in the Nogalte region shows higher values in the central part and lower values in the north-eastern part, reflecting the distribution of soil types and the prevailing environmental conditions. The successful prediction of SOC content from UAV multispectral imagery highlights the effectiveness of the RF algorithm in this study.

4. Discussion

The research demonstrated the potential of UAV imagery for precise soil management, enabling detailed SOC monitoring and sustainable agricultural practices. Achieving precise SOC inversion using multispectral UAV imagery offers significant advantages over traditional satellite imagery for precision agriculture. UAVs provide high spatial resolution and capture detailed imagery for accurate, localised assessments. They offer temporal flexibility, can be deployed at optimal times, and, despite high initial costs, have lower ongoing costs than high-resolution satellite imagery. UAVs allow for customizable flight plans and sensor payloads, rapid data collection, and minimal atmospheric interference due to lower altitude operations [58]. They facilitate detailed canopy and soil assessments, access to remote or small areas, and rapid decision-making. These benefits enhance agricultural productivity and sustainability, making UAV multispectral imagery a superior tool for SOC estimation [59].
Historically, remote sensing inversion of SOC has relied primarily on multispectral satellite imagery to monitor large areas. However, the main limitation was the low spatial resolution of satellite imagery, typically around 10 m for SOC estimation. While some researchers have explored hyperspectral data for SOC remote sensing inversion, obtaining such data has been challenging [60]. Fortunately, UAVs offer several advantages, including flexibility, cost-effectiveness, and the ability to acquire high-resolution remote sensing images with centimetre-level accuracy [61,62]. In this study, the DJI Matrice 210 multispectral UAV was used to capture multispectral images spanning the blue, green, red, red-edge, and near-infrared bands during the ground shadow period. These multispectral images are cost-effective, easy to process, and contain valuable information, particularly in the SOC-sensitive near-infrared and red bands.
Following extensive research and analysis of machine learning algorithms and their application to SOC remote sensing inversion, it is evident that random forest regression and support vector regression (SVR) have superior performance in this domain [63]. To compare the inversion results of different models, this study also included linear regression models as a baseline. Ultimately, the random forest (RF) model was found to be the most effective approach for accurately estimating and mapping soil organic carbon in the study area [37].
Other examples of successful implementation of RF algorithms for SOM estimation can be found in the literature. For example, one study used multitemporal Sentinel-2A imagery and RF to improve the accuracy of SOM estimates in the plough layer for arable land at a regional scale. This study found that the RF model outperformed partial least squares (PLS) and geographically weighted regression (GWR) in estimating SOM content [64]. In another study, RF, cubist and gradient-boosted models were used to identify the most significant factor associated with SOM, with the amount of straw return found to have the highest importance (31.46%) in modelling SOM [65,66].
To conduct remote sensing inversion of SOC in different regions, a comprehensive understanding of the specific local conditions in the study area is essential. Accurate identification and mitigation of potential confounding factors that may affect the results are of paramount importance. Once these factors have been addressed and minimised, the research methods outlined in this study can be effectively applied to remote sensing SOC inversion. Future SOC modelling research should consider the influence of various environmental and soil-related factors, such as soil type, land cover, soil moisture and climate, either individually or in combination. Incorporating these variables will enhance the robustness and generalizability of the proposed inversion model. Furthermore, exploring the use of advanced learning techniques—such as ensemble learning and deep learning algorithms—could significantly improve the accuracy of SOC estimation and mapping.
Continuous refinement of SOC inversion models is essential to improve their accuracy and efficiency, and ultimately to provide optimal technical support for precision agriculture and smart farming systems. However, the limitations of the present study should be acknowledged. This research was carried out in a semi-arid region (Nogalte area), which may limit the generalisability of the results to areas with different soil types, climatic conditions or land use practices.

5. Conclusions

This study has demonstrated the feasibility and effectiveness of using UAV multispectral imagery for detailed topsoil mapping, particularly for assessing soil organic carbon (SOC) at high spatial resolution. The integration of machine learning models, combined with careful management of multicollinearity and synchronisation of data acquisition, significantly improves prediction accuracy. These results provide valuable tools for advancing sustainable soil management and supporting precision agriculture practices.
Future research should aim to validate this approach in different regions and soil types, increase model accuracy by incorporating additional remote sensing indices and environmental data, and explore advanced deep learning methods such as CNNs or transformers to improve the scalability and robustness of SOC mapping in precision agriculture.

Author Contributions

I.E.-J.: conceived the ideas, performed the experiments, analysed the data, and wrote the manuscript. M.J.D.-I.: performed the experiments and revised the manuscript. M.J.M.S.: conceived the ideas, designed the research, and acquired funding. C.P.S.: conceived the ideas, designed the research, and revised the manuscript. S.M.L.: performed the experiments, analysed the data, and revised the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by LIFE16 CCA/ES/000123—LIFE AMDRYC4 PROJECT.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data will be made available on request.

Acknowledgments

This research is part of the LIFE AMDRYC4 project, an initiative funded by the European Commission (LIFE16 CCA/ES/000123).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
SOC    Soil organic carbon
UAVs   Unmanned aerial vehicles
PLS    Partial least squares regression
RF     Random forest
SVM    Support vector machine
RMSE   Root-mean-square error
VIF    Variance inflation factor
MAE    Mean absolute error
R2     Coefficient of determination
OC     Organic carbon
PCR    Principal component regression
RR     Ridge regression

References

  1. Hemamali, D.D.A.E.; Vitharana, U.W.A.; Balasooriya, B.L.W.K.; Attanayake, C.P.; Dandeniya, W.S.; Nimanthi, S.I. Impact of agricultural land use on soil organic carbon sequestration at sub-catchment scale. Trop. Agric. Res. 2020, 31, 13.
  2. Lal, R. Soil Carbon Sequestration Impacts on Global Climate Change and Food Security. Science 2004, 304, 1623–1627.
  3. Lienin, P.; Kleyer, M. Plant trait responses to the environment and effects on ecosystem properties. Basic Appl. Ecol. 2012, 13, 301–311.
  4. Kumari, V.; Laik, R.; Poonia, S.; Nath, D. Regulation of soil organic carbon stock with physical properties in alluvial soils of Bihar. Environ. Conserv. J. 2022, 23, 309–314.
  5. Komolafe, A.A.; Olorunfemi, I.E.; Oloruntoba, C.; Akinluyi, F.O. Spatial prediction of soil nutrients from soil, topography and environmental attributes in the northern part of Ekiti State, Nigeria. Remote Sens. Appl. Soc. Environ. 2020, 21, 100450.
  6. Ngatia, L.W.; Moriasi, D.; Iii, J.M.G.; Fu, R.; Gardner, C.S.; Taylor, R.W. Land Use Change Affects Soil Organic Carbon: An Indicator of Soil Health. Available online: www.intechopen.com (accessed on 12 May 2024).
  7. Wang, Y.; Xu, Y.; Pei, J.; Li, M.; Shan, T.; Zhang, W.; Wang, J. Below ground residues were more conducive to soil organic carbon accumulation than above ground ones. Appl. Soil Ecol. 2020, 148, 103509.
  8. Huang, X.; Ibrahim, M.M.; Luo, Y.; Jiang, L.; Chen, J.; Hou, E. Land Use Change Alters Soil Organic Carbon: Constrained Global Patterns and Predictors. Earth’s Future 2024, 12, e2023EF004254.
  9. Ramesh, T.; Bolan, N.S.; Kirkham, M.B.; Wijesekara, H.; Kanchikerimath, M.; Rao, C.S.; Sandeep, S.; Rinklebe, J.; Ok, Y.S.; Choudhury, B.U.; et al. Soil organic carbon dynamics: Impact of land use changes and management practices: A review. Adv. Agron. 2019, 156, 1–107.
  10. Xiao, J.; Chevallier, F.; Gomez, C.; Guanter, L.; Hicke, J.A.; Huete, A.R.; Ichii, K.; Ni, W.; Pang, Y.; Rahman, A.F.; et al. Remote sensing of the terrestrial carbon cycle: A review of advances over 50 years. Remote Sens. Environ. 2019, 233, 111383.
  11. Wang, G.; Zhang, X.; Liu, X.; Gao, H.; Pan, Y. Stabilization of micaceous residual soil with industrial and agricultural byproducts: Perspectives from hydrophobicity, water stability, and durability enhancement. Constr. Build. Mater. 2024, 430, 136450.
  12. Taylor, J.A.; Jacob, F.; Galleguillos, M.; Prevot, L.; Guix, N.; Lagacherie, P. The utility of remotely-sensed vegetative and terrain covariates at different spatial resolutions in modelling soil and watertable depth (for digital soil mapping). Geoderma 2013, 193, 83–93.
  13. Wang, K.; Qi, Y.; Guo, W.; Zhang, J.; Chang, Q. Retrieval and mapping of soil organic carbon using sentinel-2A spectral images from bare cropland in autumn. Remote Sens. 2021, 13, 1072.
  14. Rossel, R.A.V.; Taylor, H.J.; McBratney, A.B. Multivariate calibration of hyperspectral γ-ray energy spectra for proximal soil sensing. Eur. J. Soil Sci. 2007, 58, 343–353.
  15. Zhu, Y.L.; Wang, D.Y.; Zhang, H.; Shi, P. Soil organic carbon content retrieved by UAV-borne high resolution spectrometer. Trans. Chin. Soc. Agric. Eng. 2021, 37, 66–72.
  16. Odebiri, O.; Mutanga, O.; Odindi, J.; Peerbhay, K.; Dovey, S.; Ismail, R. Estimating soil organic carbon stocks under commercial forestry using topo-climate variables in KwaZulu-Natal, South Africa. S. Afr. J. Sci. 2020, 116, 71–78.
  17. Vaudour, E.; Gilliot, J.M.; Bel, L.; Lefevre, J.; Chehdi, K. Regional prediction of soil organic carbon content over temperate croplands using visible near-infrared airborne hyperspectral imagery and synchronous field spectra. Int. J. Appl. Earth Obs. Geoinf. 2016, 49, 24–38.
  18. Crucil, G.; Castaldi, F.; Aldana-Jague, E.; van Wesemael, B.; Macdonald, A.; Van Oost, K. Assessing the Performance of UAS-Compatible Multispectral and Hyperspectral Sensors for Soil Organic Carbon Prediction. Sustainability 2019, 11, 1889.
  19. Van Huynh, C.; Pham, T.G.; Nguyen, L.H.K.; Nguyen, H.T.; Nguyen, P.T.; Le, Q.N.P.; Tran, P.T.; Nguyen, M.T.H.; Tran, T.T.A. Application GIS and remote sensing for soil organic carbon mapping in a farm-scale in the hilly area of central Vietnam. Air, Soil Water Res. 2022, 15, 11786221221114777.
  20. Xu, S.; Zhao, Y.; Wang, M.; Shi, X. Comparison of multivariate methods for estimating selected soil properties from intact soil cores of paddy fields by Vis–NIR spectroscopy. Geoderma 2018, 310, 29–43.
  21. Faria, O.C.O.; Torres, G.N.; Di Raimo, L.A.D.L.; Couto, E.G. Estimate of carbon stock in the soil via diffuse reflectance spectroscopy (vis/nir) air and orbital remote sensing. Rev. Caatinga 2023, 36, 675–689.
  22. Odebiri, O.; Mutanga, O.; Odindi, J. Deep learning-based national scale soil organic carbon mapping with Sentinel-3 data. Geoderma 2022, 411, 115695.
  23. Sarmadian, F.; Keshavarzi, A.; Amiri, G.Z.; Javadikia, H. Mapping of Spatial Variability of Soil Organic Carbon Based on Radial Basis Functions Method. Available online: https://www.researchgate.net/publication/262327712 (accessed on 18 December 2024).
  24. Chan, T.; Gomez, C.A.; Kothikar, A.; Baiz, P. Joint Study of Above Ground Biomass and Soil Organic Carbon for Total Carbon Estimation using Satellite Imagery in Scotland. May 2022. Available online: http://arxiv.org/abs/2205.04870 (accessed on 5 October 2023).
  25. Mondal, B.P.; Sekhon, B.S.; Sahoo, R.N.; Paul, P. ViS-NIR reflectance spectroscopy for assessment of soil organic carbon in a rice-wheat field of Ludhiana district of Punjab. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2019, XLII-3/W6, 417–422.
  26. Feilhauer, H.; Asner, G.P.; Martin, R.E. Multi-method ensemble selection of spectral bands related to leaf biochemistry. Remote Sens. Environ. 2015, 164, 57–65.
  27. Jin, Y.; Yang, X.; Qiu, J.; Li, J.; Gao, T.; Wu, Q.; Zhao, F.; Ma, H.; Yu, H.; Xu, B. Remote sensing-based biomass estimation and its spatio-temporal variations in temperate Grassland, Northern China. Remote Sens. 2014, 6, 1496–1513.
  28. Zhang, H.; Wu, P.; Yin, A.; Yang, X.; Zhang, M.; Gao, C. Prediction of soil organic carbon in an intensively managed reclamation zone of eastern China: A comparison of multiple linear regressions and the random forest model. Sci. Total Environ. 2017, 592, 704–713.
  29. Parsaie, F.; Firouzi, A.F.; Mousavi, S.R.; Rahmani, A.; Sedri, M.H.; Homaee, M. Large-scale digital mapping of topsoil total nitrogen using machine learning models and associated uncertainty map. Environ. Monit. Assess. 2021, 193, 1–15.
  30. Rostaminia, M.; Rahmani, A.; Mousavi, S.R.; Taghizadeh-Mehrjardi, R.; Maghsodi, Z. Spatial prediction of soil organic carbon stocks in an arid rangeland using machine learning algorithms. Environ. Monit. Assess. 2021, 193, 1–17.
  31. El Jamaoui, I.; Sánchez, M.J.M.; Sirvent, C.P.; Mana, A.A.; López, S.M. Machine learning-driven modeling for soil organic carbon estimation from multispectral drone imaging: A case study in Corvera, Murcia (Spain). Model. Earth Syst. Environ. 2024, 10, 3473–3494.
  32. Shan, J.; Zhao, J.; Liu, L.; Zhang, Y.; Wang, X.; Wu, F. A novel way to rapidly monitor microplastics in soil by hyperspectral imaging technology and chemometrics. Environ. Pollut. 2018, 238, 121–129.
  33. Forkuor, G.; Hounkpatin, O.K.L.; Welp, G.; Thiel, M. High Resolution Mapping of Soil Properties Using Remote Sensing Variables in South-Western Burkina Faso: A Comparison of Machine Learning and Multiple Linear Regression Models. PLoS ONE 2017, 12, e0170478.
  34. Zhu, C.; Ding, J.; Zhang, Z.; Wang, Z. Exploring the potential of UAV hyperspectral image for estimating soil salinity: Effects of optimal band combination algorithm and random forest. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2022, 279, 121416.
  35. Santos, E.P.D.; Moreira, M.C.; Fernandes-Filho, E.I.; Demattê, J.A.M.; Dionizio, E.A.; Silva, D.D.D.; Cruz, R.R.P.; Moura-Bueno, J.M.; Santos, U.J.D.; Costa, M.H. Sentinel-1 imagery used for estimation of soil organic carbon by dual-polarization SAR vegetation indices. Remote Sens. 2023, 15, 5464.
  36. Takata, Y.; Funakawa, S.; Akshalov, K.; Ishida, N.; Kosaki, T. Regional evaluation of the spatio-temporal variation in soil organic carbon dynamics for rainfed cereal farming in northern Kazakhstan. Soil Sci. Plant Nutr. 2008, 54, 794–806.
  37. Zhang, Y.; Yu, J.; Dong, X.; Zhong, P. Multi-task support vector machine with pinball loss. Eng. Appl. Artif. Intell. 2021, 106, 104458.
  38. Song, B.; Park, K. Detection of Aquatic Plants Using Multispectral UAV Imagery and Vegetation Index. Remote Sens. 2020, 12, 387.
  39. Xu, Y.; Shrestha, V.; Piasecki, C.; Wolfe, B.; Hamilton, L.; Millwood, R.J.; Mazarei, M.; Stewart, C.N. Sustainability Trait Modeling of Field-Grown Switchgrass (Panicum virgatum) Using UAV-Based Imagery. Plants 2021, 10, 2726.
  40. Dawson, C.; Abrahart, R.; See, L. HydroTest: A web-based toolbox of evaluation metrics for the standardised assessment of hydrological forecasts. Environ. Model. Softw. 2007, 22, 1034–1052.
  41. Mohammed, S.A.; Solomatine, D.P.; Hrachowitz, M.; Hamouda, M.A. Impact of Dataset Size on the Signature-Based Calibration of a Hydrological Model. Water 2021, 13, 970.
  42. Boafo, D.K.; Kraisornpornson, B.; Panphon, S.; Owusu, B.E.; Amaniampong, P.N. Effect of organic soil amendments on soil quality in oil palm production. Appl. Soil Ecol. 2020, 147, 103358.
  43. Raiter, K.G.; Hawlena, D. Managing multiple uncertainties in species distribution modelling. Divers. Distrib. 2024, 30, e13857.
  44. Liu, X.; Zhu, C.; Yu, K.; Li, W.; Luo, Y.; Dai, Y.; Wang, H. Accurate Determination of Moisture Content in Flavor Microcapsules Using Headspace Gas Chromatography. Polymers 2022, 14, 3002.
  45. Schumacher, B.A. Methods for the Determination of Total Organic Carbon (TOC) in Soils and Sediments. April 2002. Available online: https://www.researchgate.net/publication/292706836 (accessed on 4 November 2024).
  46. Were, K.; Bui, D.T.; Dick, Ø.B.; Singh, B.R. A comparative assessment of support vector regression, artificial neural networks, and random forests for predicting and mapping soil organic carbon stocks across an Afromontane landscape. Ecol. Indic. 2015, 52, 394–403.
  47. Chen, F.; Feng, P.; Harrison, M.T.; Wang, B.; Liu, K.; Zhang, C.; Hu, K. Cropland carbon stocks driven by soil characteristics, rainfall and elevation. Sci. Total Environ. 2022, 862, 160602.
  48. Zhang, Z.; Ding, J.; Wang, J.; Ge, X. Prediction of soil organic matter in northwestern China using fractional-order derivative spectroscopy and modified normalized difference indices. CATENA 2020, 185, 104257.
  49. Biney, J.K.M.; Saberioon, M.; Borůvka, L.; Houška, J.; Vašát, R.; Agyeman, P.C.; Coblinski, J.A.; Klement, A. Exploring the suitability of uas-based multispectral images for estimating soil organic carbon: Comparison with proximal soil sensing and spaceborne imagery. Remote Sens. 2021, 13, 308.
  50. Kim, J.H. Multicollinearity and misleading statistical results. Korean J. Anesthesiol. 2019, 72, 558–569.
  51. Strobl, C.; Malley, J.; Tutz, G. An Introduction to Recursive Partitioning: Rationale, Application, and Characteristics of Classification and Regression Trees, Bagging, and Random Forests. Psychol. Methods 2009, 14, 323–348.
  52. Tziachris, P.; Aschonitis, V.; Chatzistathis, T.; Papadopoulou, M. Assessment of spatial hybrid methods for predicting soil organic matter using DEM derivatives and soil parameters. CATENA 2019, 174, 206–216.
  53. Cutler, A. Remembering Leo Breiman. Ann. Appl. Stat. 2010, 4, 1621–1633.
  54. Meng, X.; Bao, Y.; Liu, J.; Liu, H.; Zhang, X.; Zhang, Y.; Wang, P.; Tang, H.; Kong, F. Regional soil organic carbon prediction model based on a discrete wavelet analysis of hyperspectral satellite data. Int. J. Appl. Earth Obs. Geoinf. 2020, 89, 102111.
  55. Reyna Bowen, J.L.; Vera Montenegro, L.; Delgado Moreira, M.I. Optimizing soil analysis in precision agriculture: Evaluating alternative methods for SOC prediction. J. Ecol. Eng. 2025, 26, 322–331.
  56. Dou, X.; Wang, X.; Liu, H.; Zhang, X.; Meng, L.; Pan, Y.; Yu, Z.; Cui, Y. Prediction of soil organic matter using multi-temporal satellite images in the Songnen Plain, China. Geoderma 2019, 356, 113896.
  57. Liu, Y.; Jiang, Q.; Fei, T.; Wang, J.; Shi, T.; Guo, K.; Li, X.; Chen, Y. Transferability of a Visible and Near-Infrared Model for Soil Organic Matter Estimation in Riparian Landscapes. Remote Sens. 2014, 6, 4305–4322.
  58. Li, Z.; Yang, Y.; Gu, S.; Tang, B.; Zhang, J. Research on the prediction of several soil properties in heihe river basin based on remote sensing images. Sustainability 2021, 13, 13930.
  59. Jia, S.; Li, H.; Wang, Y.; Tong, R.; Li, Q. Hyperspectral imaging analysis for the classification of soil types and the determination of soil total nitrogen. Sensors 2017, 17, 2252.
  60. De Morais, C.P.; McMeekin, K.; Nault, C. Scalable solution for agricultural soil organic carbon measurements using laser-induced breakdown spectroscopy. Sci. Rep. 2024, 14, 15272.
  61. Zhu, S.-P.; Huang, H.-Z.; Peng, W.; Wang, H.-K.; Mahadevan, S. Probabilistic Fatigue Life Updating for Railway Bridges Based on Local Inspection and Repair. Sensors 2017, 17, 936.
  62. Zhang, Z.; Zhu, L. A Review on Unmanned Aerial Vehicle Remote Sensing: Platforms, Sensors, Data Processing Methods, and Applications. Drones 2023, 7, 398.
  63. Zhou, J.; Zhang, J.; Chen, Y.; Qin, G.; Cui, B.; Lu, Z.; Wu, J.; Huang, X.; Thapa, P.; Li, H.; et al. Blue carbon gain by plant invasion in saltmarsh overcompensated carbon loss by land reclamation. Carbon Res. 2023, 2, 39.
  64. Wang, S.; Wang, Z.; Heinonsalo, J.; Zhang, Y.; Liu, G. Soil organic carbon stocks and dynamics in a mollisol region: A 1980s–2010s study. Sci. Total Environ. 2022, 807, 150910.
  65. Jin, H.; Xie, X.; Pu, L.; Jia, Z.; Xu, F. Mapping Soil Organic Matter Using Different Modeling Techniques in the Dryland Agroecosystem of Huang-Huai-Hai Plain, Eastern China. Remote Sens. 2023, 15, 4945.
  66. Hu, X.; Li, X. Information extraction of subsided cultivated land in high-groundwater-level coal mines based on unmanned aerial vehicle visible bands. Environ. Earth Sci. 2019, 78, 413.
Figure 1. Workflow of the proposed methodology.
Figure 2. Location map with soil sampling sites in the study area.
Figure 3. Spectral reflectance and soil organic matter (SOM) curve of soil samples.
Figure 4. Correlation matrix.
Figure 5. Graph of SOC predicted by PLSR from spectral measurements as a function of SOC in the laboratory. Predicted versus measured Soil Organic Carbon (SOC) values. The blue points represent individual sample observations, and the dotted line indicates the linear regression model fitted to the data (y = 0.9816x, R2 = 0.9816), confirming the model’s high predictive performance.
Figure 6. Predicted SOC map from drone imagery.
Table 1. Descriptive statistics of the data on drones and organic carbon in the laboratory.
Statistic  Blue      Red       Green     Red Edge  NIR       OC
Count      76        76        76        76        76        76
Std        0.030954  0.029485  0.029714  0.028618  0.030962  0.535191
Min        0.029856  0.078989  0.053327  0.062899  0.102499  1.062703
25%        0.051323  0.107848  0.076627  0.099781  0.129896  1.866545
50%        0.068521  0.130638  0.090413  0.123422  0.150972  2.256125
75%        0.099709  0.151322  0.118497  0.140109  0.172126  2.565112
Max        0.144179  0.213147  0.170175  0.197603  0.24138   3.256402
Table 2. Variance inflation factor (VIF) values of drone bands.
Variable  VIF
Red       1.101
Blue      12.2
NIR       1.069
Red Edge  1.201
Green     13.25
VIF values > 4 indicate possible multicollinearity problems.
Table 3. Ordinary Least Squares (OLS) model output of SOC.
Variable  R-Squared  Coefficient  Std. Error  t-Statistic  Probability
Blue      0.66       −0.586961    13.0199     −0.450817    0.65365
Red Edge  0.82       15.5247      30.6159     0.00130      0.61384
Green     0.77       12.4869      21.706      0.575275     0.56712
Red       0.83       0.0636       27.8802     0.001282     0.99845
NIR       0.82       −5.2746      11.9071     0.00103      0.65927
Table 4. Soil organic carbon (SOC) monitoring regression model and precision test.
Model  Dataset   R2    MAE   MSE   RMSE  EVE
SVR    Training  0.76  0.21  0.07  0.27  0.76
SVR    Test      0.38  0.25  0.12  0.77  0.57
RFR    Training  0.92  0.19  0.05  0.22  0.71
RFR    Test      0.89  0.22  0.1   0.31  0.67
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
