Field-Scale Winter Wheat Growth Prediction Applying Machine Learning Methods with Unmanned Aerial Vehicle Imagery and Soil Properties

: Monitoring crop growth conditions during the growing season provides information on available soil nutrients and crop health status, which are important for agricultural management practices. Crop growth frequently varies due to site-specific climate and farm management practices. These variations might arise from sub-field-scale heterogeneities in soil composition, moisture levels, sunlight, and diseases. Therefore, soil properties and crop biophysical data are useful to predict field-scale crop development. This study investigates soil data and spectral indices derived from multispectral Unmanned Aerial Vehicle (UAV) imagery to predict crop height at two winter wheat farms. The datasets were investigated using Gaussian Process Regression (GPR), Ensemble Regression (ER), Decision tree (DT), and Support Vector Machine (SVM) machine learning regression algorithms. The findings showed that GPR (R 2 = 0.69 to 0.74, RMSE = 15.95 to 17.91 cm) has superior accuracy in all models when using vegetation indices (VIs) to predict crop growth for both wheat farms. Furthermore, the variable importance generated using the GRP model showed that the RedEdge Normalized Difference Vegetation Index (RENDVI) had the most influence in predicting wheat crop height compared to the other predictor variables. The clay, calcium (Ca), magnesium (Mg), and potassium (K) soil properties have a moderate positive correlation with crop height. The findings from this study showed that the integration of vegetation indices and soil properties predicts crop height accurately. However, using the vegetation indices independently was more accurate at predicting crop height. The outcomes from this study are beneficial for improving agronomic management within the season based on crop height trends. Hence, farmers can focus on using cost-effective VIs for monitoring particular areas experiencing crop stress.


Introduction
Wheat is one of the most widely grown cereal crops, covering about 220.62 million hectares (ha) worldwide in 2022/23 [1][2][3].During this period, the yield increased steadily, providing about 789.02 million metric tons globally [4].Wheat provides between 20% and 36% of calories for the world's population [5,6].The Food and Agriculture Organization (FAO) emphasises that the rapidly growing population and escalating demands for cereal production would require a 70% increase in cereal supply by 2050 [1,7,8].Monitoring wheat growth is essential for meeting future food demands and ensuring food security, which promotes sustainable agricultural management and enhances yields.However, variations in soil properties, agro-ecosystems, topography, and crop growth conditions within fields impact crop growth [1,7].Furthermore, wheat growth is affected by variations in the intra-field soil properties; biological, physical, and chemical factors; and management practices [7].Accurate in situ measurements and establishing the distribution of soil properties within planted areas are crucial for understanding their impact on intra-field crop growth and promoting sustainable agricultural management [9][10][11].
Soil physical and chemical properties regulate soil productivity, which influences crop development [12].Concurrently, infertile acidic soils are detrimental to crop development.Infertile soils are characterised by high aluminium (Al) toxicity, low pH (acidic), low microbial activity, low soil organic carbon, and a lack of essential chemical properties that hinder wheat growth at the early development stages [5,13].These soil conditions and characteristics result in problems such as reduced root branching, deformed root tips, lodging, and the discolouration of leaf tissue with shades of yellow and purple [13,14].Furthermore, wheat cultivated within infertile acidic soil experiences a reduced protein content and growth rate and lower yields, which result in reduced profits.Soil elements such as phosphorus (P), potassium (K), magnesium (Mg), calcium (Ca), sodium (Na), nitrogen (N), and pH are vital for crop growth and often exist in low concentrations in arid and semi-arid environments [15][16][17][18].Deficiencies of N, P, and K in soil affect wheat growth and yield drastically [19].The influence of intra-field soil physiochemical properties' variation and meteorological conditions are key factors on crop development across various crop stages.Other detrimental effects on wheat growth include abiotic stresses such as droughts, frost, waterlogging, salinity, high temperatures, and other natural calamities [20,21].The biotic factors, which include the infestation of diseases, competing weeds, and pests, are common challenges for crop development [22][23][24][25].
There are various vegetation indices derived from the red and near-infrared (NIR) bands, which aid in the understanding of vegetation absorption and reflectance properties.These vegetation indices are commonly used in monitoring crop development, growth, and associated stresses during various phenological stages of the crop for timely interventions in farm management [26][27][28].In addition to vegetation indices derived from satellite products, UAV-derived indices can also aid in detecting the intra-field spatial variability of wheat crop growth with a higher spatial resolution and accuracy compared to most satellite products.The existing conventional methods (i.e., scouting and automated observation systems using computer vision) to monitor crop growth variation do not accommodate vegetation indices to model and predict intra-field crop growth.Furthermore, traditional methods are time-consuming, labour-intensive, and unrealistic for time-series modelling required by large-scale farms.They usually result in many forms of inaccuracies associated with human survey errors [29].Recent developments of UAVs in remote sensing provide an efficient, non-destructive, and rapid alternative approach that can provide cost-effective time-series data of vegetation indices for modelling crop growth variability [29,30].However, the reflectance can be greatly affected by the surface temperature, atmospheric distortions, water content, saturation, landscape heterogeneity, and vegetation type, which can affect the modelling accuracies of actual crop growth [27,31].Moreover, coarse spatial resolution satellite imagery limits the regression model estimation accuracy due to spectral mixing of different classes [32].Combining high-resolution UAV-derived vegetation indices with in situ soil properties' data can enhance crop growth modelling.For example, a Belgian case study confirmed that soil properties account for 15 to 26% of the wheat growth variance using machine learning methods with UAV imagery [7].A case study of Southwest Montana in the USA has successfully predicted accurate soil properties and wheat growth variation using machine learning algorithms and vegetation indices derived from UAV imagery [13].The integration of UAV-based imagery and elevation data improved modelling accuracies based on machine learning methods for wheat height growth and above-ground biomass in Fengling Reservoir fields in China [33].Nevertheless, there is still a lack of methods, which integrate the multiple factors influencing plant growth as well as quantifying their importance in modelling.The common modelling approaches include parametric and non-parametric regression for crop biophysical parameter estimation.These include partial least squares regression (PLSR), random forest (RF), support vector machine (SVM), extreme gradient boosting (Xgboost), conditional inference forest (CI-forest), artificial neural network (ANN), least squares linear regression (LSLR), multiple linear regression (LR), neural network (NN), decision tree (DT), regression tree (RegT), K-nearest neighbour (KNN), boost tree (BST), and bagging tree (BagT) ensemble learning algorithms [7,18,30,[34][35][36].PLSR provides a high level of interpretability and can overcome problems of collinearity in modelling, enhancing the accuracy of the model [9,17].However, other studies suggest that PLSR is not always adequate for modelling the relationship between soil properties and crop height because this relationship is not always linear [37][38][39].This limitation has contributed to the rising need for exploring the use of nonlinear machine learning algorithm (MLA) methods and other models.RF has the capabilities to classify and handle complex data with continuous values, but it is not robust and sensitive to outliers, which can cause overfitting or poor generalisation, and it does not address collinearity when applied with large or small input data [40][41][42].SVM has similar merits and demerits to RF, except that it uses kernel-based functions for mapping input features at higher dimensional space and exploits support vectors for fixing regression fitting [43].In general, several MLAs such as NNs, RF, SVM, KNN, RegT, and Xgboost often experience black box problems, among others [44][45][46].Meanwhile, GPR has the capability of overcoming the black box challenges by employing kernel functions, which offer uncertainty estimates for model predictions across a spectrum of data inputs, ranging from simple to highly complex [45,46].Kernel-based regression algorithms such as GPR are superior to several MLAs in retrieving modelling accuracy [47,48].Few studies have reported the feasibility of kernel-based methods in modelling wheat biophysical variables such as crop height using time-series vegetation indices' data for an entire season [49].A multispectral sentinel-2 dataset has shown a potential estimation of crop biophysical variables such as the plant height, leaf area index (LAI), leaf chlorophyll content (LCC), fraction of absorbed photosynthetically active radiation (FAPAR), fraction of vegetation cover (FVC), and canopy chlorophyll content (CCC) using random forest tree bagger (RFTB), BagT, LSLR, PLSR, and GPR [46,[49][50][51].However, studies focusing on soil properties and UAV datasets that have a high spatial, spectral, and temporal resolution are lacking.UAVs have a high potential for estimating field-scale wheat growth.
This study addresses a gap in the existing literature by focusing on the integration of high-resolution UAV-derived vegetation indices with in situ soil properties' measurements [7,13].This could contribute to a more comprehensive understanding of the factors influencing wheat growth and enhance the modelling accuracy.Furthermore, this study aims to investigate machine learning regressions such as GPR, ER, DT, and SVM for predicting wheat crop height using a combination of UAV-derived vegetation indices and soil properties.By considering multiple factors simultaneously, the research aims to fill a gap related to the absence of holistic approaches in previous studies that often focused on individual aspects of wheat growth.Additionally, while previous studies have explored UAV imagery and soil properties, this study specifically aims to address the gap in research focusing on field-scale wheat growth variability [7,14,35].This may involve considering the spatial, spectral, and temporal resolution of data to provide more detailed insights into wheat height variability patterns.The main objectives of this study were to (1) investigate and understand the contribution of soil properties and vegetation indices in modelling crop height of heterogeneous winter wheat planted in a dryland environment, and (2) assess the prediction accuracy changes when using the vegetation-index-only scenario and combined vegetation indices with soil properties scenario for wheat crop height.Although experiments were conducted in South Africa, the techniques developed in this study can be tested in other semi-arid regions as well.An example is Australia, which is a major producer of wheat and is also facing a decline in wheat production [52].India is also a significant wheat-producing country with diverse agro-climatic zones and unique Land 2024, 13, 299 4 of 25 challenges related to smallholder farming, decreasing soil nutrients, and issues in water resource management [53].

Materials and Methods
Figure 1 provides an overview of the methodology used in this study to investigate the contribution of vegetation indices and soil physical and chemical properties to wheat crop height.The physical and chemical properties of the soil samples were used for generating kriging spatial interpolation maps.UAV data bands were used to calculate the vegetation indices' map.The vegetation indices' data were first used separately and secondly stacked with kriging soil properties for model prediction.The datasets were divided into 80% for the training set and 20% for the testing set for GRP, ER, DT, and SVM models.Thereafter, model evaluation accuracy was generated for all the evaluated models.
major producer of wheat and is also facing a decline in wheat production [52].India is also a significant wheat-producing country with diverse agro-climatic zones and unique challenges related to smallholder farming, decreasing soil nutrients, and issues in water resource management [53].

Materials and Methods
Figure 1 provides an overview of the methodology used in this study to investigate the contribution of vegetation indices and soil physical and chemical properties to wheat crop height.The physical and chemical properties of the soil samples were used for generating kriging spatial interpolation maps.UAV data bands were used to calculate the vegetation indices' map.The vegetation indices' data were first used separately and secondly stacked with kriging soil properties for model prediction.The datasets were divided into 80% for the training set and 20% for the testing set for GRP, ER, DT, and SVM models.Thereafter, model evaluation accuracy was generated for all the evaluated models.

Study Area
The Clarens experiment wheat farms cover about 30 ha (Farm A) and 17 ha (Farm B).The two farms were prepared using Cireun 100 kg/ha fertilizer with a ratio of N:55:P:15:K:8 for cultivar PAN: 3161.This wheat cultivar is suitable for sowing in the dryland production areas of the Free State province.The two farms are located at the Dihlabeng Local Municipality (DLM) within the Thabo Mofutsanyane district in the northern part of the Free State province in South Africa (Figure 2).The municipality receives an annual average rainfall of 688 mm, a minimum of 7.8 °C in the summer season, and a maximum of 20.7 °C (average temperatures) during the winter and summer season

Study Area
The Clarens experiment wheat farms cover about 30 ha (Farm A) and 17 ha (Farm B).The two farms were prepared using Cireun 100 kg/ha fertilizer with a ratio of N:55:P:15:K:8 for cultivar PAN: 3161.This wheat cultivar is suitable for sowing in the dryland production areas of the Free State province.The two farms are located at the Dihlabeng Local Municipality (DLM) within the Thabo Mofutsanyane district in the northern part of the Free State province in South Africa (Figure 2).The municipality receives an annual average rainfall of 688 mm, a minimum of 7.8 • C in the summer season, and a maximum of 20.7 • C (average temperatures) during the winter and summer season [54].Most rainfalls occur in summer with hot days and cold dry winter seasons [55,56].The predominant soil type is sandy loam with Avalon and Pinedene characteristics that indicate moderately permeable soils [57].The Thabo Mofutsanyane district is characterised by dryland production areas and is one of the main rainfed winter wheat producers in the Free State province [58,59].However, long dry spells, droughts, and frost occurrence are prominent climatic drivers that affect crop yields and agricultural production in this region [60][61][62][63].The reliance of winter wheat on rainfall in the Free State province makes it susceptible to the risk of altered rainfall distribution patterns and declining rainfall amounts, which affect the rate of growth and anticipated yields [61][62][63].The selected case study locations cultivate wheat consistently.Nevertheless, the farmers have been experiencing declining yields in recent years, which could be linked to several factors such as soil properties' variability and changing climate conditions that cause crop stress.A previous study based on this region only focused on the characterisation of wheat nematodes from cultivars [58].There is currently a lack of studies that focus on the biophysical properties of both soil and crops.
nd 2024, 13, x FOR PEER REVIEW 5 of 26 indicate moderately permeable soils [57].The Thabo Mofutsanyane district is characterised by dryland production areas and is one of the main rainfed winter wheat producers in the Free State province [58,59].However, long dry spells, droughts, and frost occurrence are prominent climatic drivers that affect crop yields and agricultural production in this region [60][61][62][63].The reliance of winter wheat on rainfall in the Free State province makes it susceptible to the risk of altered rainfall distribution patterns and declining rainfall amounts, which affect the rate of growth and anticipated yields [61][62][63].The selected case study locations cultivate wheat consistently.Nevertheless, the farmers have been experiencing declining yields in recent years, which could be linked to several factors such as soil properties' variability and changing climate conditions that cause crop stress.A previous study based on this region only focused on the characterisation of wheat nematodes from cultivars [58].There is currently a lack of studies that focus on the biophysical properties of both soil and crops.

Analytical Analysis of Soil Samples
The soil samples collected amounted to 97 (Farm A) and 76 (Farm B) collected within a 0-20 cm depth in the topsoil layer, during the dry month of August 2021.A handheld Global Positioning System (GPS) was used to capture the spatial position of each sampled point.All the soil-sample-processing procedures were administered by the Agricultural Research Council Institute for Soil, Climate and Water Analytical Laboratory.These procedures include air-drying the soil samples at 25 °C and crushing and sieving them into a less than 2 mm size to remove gravel stones and plant residues.All soil samples were mostly characterised by sandy loam and clay soil textures.Multiple analytical processing methods were used to classify soil physical properties, chemical nutrients, and texture (as described in Table 1 below).Between August and November 2021, UAV flight missions and in situ measurements were conducted during early tillering and heading stages of winter wheat growth on both farms.Table 2 displays the results.Ground crop height measurements were performed using a metal tape measure in centimetres (cm).All flight missions were planned at a 120 m height above the surface and the UAV imagery overlap was set at 75% for both frontal and lateral overlaps.Thus, a spatial resolution with a pixel size of 8 cm UAV imagery was obtained under clear sky conditions and moderate wind speeds.Figure 4a

Field Data Collection 2.2.1. Analytical Analysis of Soil Samples
The soil samples collected amounted to 97 (Farm A) and 76 (Farm B) collected within a 0-20 cm depth in the topsoil layer, during the dry month of August 2021.A handheld Global Positioning System (GPS) was used to capture the spatial position of each sampled point.All the soil-sample-processing procedures were administered by the Agricultural Research Council Institute for Soil, Climate and Water Analytical Laboratory.These procedures include air-drying the soil samples at 25 • C and crushing and sieving them into a less than 2 mm size to remove gravel stones and plant residues.All soil samples were mostly characterised by sandy loam and clay soil textures.Multiple analytical processing methods were used to classify soil physical properties, chemical nutrients, and texture (as described in Table 1 below).

UAV Data Collection and Crop Height Measurements
Between August and November 2021, UAV flight missions and in situ measurements were conducted during early tillering and heading stages of winter wheat growth on both farms.Table 2 displays the results.Ground crop height measurements were performed using a metal tape measure in centimetres (cm).All flight missions were planned at a 120 m height above the surface and the UAV imagery overlap was set at 75% for both frontal and lateral overlaps.Thus, a spatial resolution with a pixel size of 8 cm UAV imagery was obtained under clear sky conditions and moderate wind speeds.Figure 4a depicts the multirotor UAV DJI-Matrice 600 Pro system with a MicaSense RedEdge-MX multispectral sensor.Figure 4b shows the calibration reflectance panel (CPR) used to calibrate the acquired UAV tiles during data processing.multispectral sensor.Figure 4b shows the calibration reflectance panel (CPR) used to calibrate the acquired UAV tiles during data processing.The information in Table 3 presents spectral information about MicaSense RedEdge-MX camera wavelength (475-840 nm), bandwidth (20-40 nm), and constant laboratorycalibrated reflectance panel (CRP) values ranging from 0.532 to 0.536, respectively.The integration of sensor measurements' irradiance of a Downwelling Light Sensor (DLS) and CRP is vital during the calibration process to construct accurate surface reflectance in all spectral bands.

UAV Data Processing
Generally, the process of UAV image processing involves (1) aerial triangulation, (2) Digital Surface Model (DSM) generation, (3) the rectification of individual images, and (4) an orthomosaic [70].Radiometric, geometric-corrected, vignette-corrected, and mosaicking of UAV imagery collections from different surveys on winter wheat fields were carried out using Pix4Dmapper software 4.8.0 version (Pix4D SA, Lausanne, Switzerland) to produce accurate orthorectified surface reflectance images.Before each flight, pictures from the radiometrically calibrated target, the position of the sun, and incoming radiance were The information in Table 3 presents spectral information about MicaSense RedEdge-MX camera wavelength (475-840 nm), bandwidth (20-40 nm), and constant laboratorycalibrated reflectance panel (CRP) values ranging from 0.532 to 0.536, respectively.The integration of sensor measurements' irradiance of a Downwelling Light Sensor (DLS) and CRP is vital during the calibration process to construct accurate surface reflectance in all spectral bands.

UAV Data Processing
Generally, the process of UAV image processing involves (1) aerial triangulation, (2) Digital Surface Model (DSM) generation, (3) the rectification of individual images, and (4) an orthomosaic [70].Radiometric, geometric-corrected, vignette-corrected, and mosaicking of UAV imagery collections from different surveys on winter wheat fields were carried out using Pix4Dmapper software 4.8.0 version (Pix4D SA, Lausanne, Switzerland) to produce accurate orthorectified surface reflectance images.Before each flight, pictures from the radiometrically calibrated target, the position of the sun, and incoming radiance were simultaneously measured.The data captured are used to generate surface reflectance imagery.The UAV onboard Global Positioning System (GPS) sensor data are used in the bundle block adjustment process by applying the Structure from Motion (SfM) algorithm to compute the relative locations of the sensors during the flight and to simultaneously calculate the sensor parameters of each image [71].A DSM was generated using the dense point cloud by applying multi-view stereo matching [72] and grid interpolation.Orthomosaicked individual images are combined into five multispectral bands.This process was also followed by similar studies such as [73,74].

Wheat Crop Growth Band Spectral Response
The earth's surface features have different spectral reflectance (spectral signatures) in the electromagnetic spectrum [75].Figures 5 and 6 in Farm A and Farm B (A-E) present five different spectral bands of reflectance generated after UAV processing from tillering to ripening wheat growth stages.Generally, the reflectance from the five spectral bands varied from August 2021 to November 2021.The Blue (A), Green (B), and Red (C) spectral bands were less sensitive to surface reflectance of wheat growth canopies because of chlorophyll absorption in visible light of the electromagnetic spectrum.However, the RedEdge (D) and NIR (E) spectral bands showed a substantial wheat surface reflectance variation at different stages.Overall, both RedEdge and NIR are very important spectral bands in detecting intra-season crop growth changes.
simultaneously measured.The data captured are used to generate surface reflectance imagery.The UAV onboard Global Positioning System (GPS) sensor data are used in the bundle block adjustment process by applying the Structure from Motion (SfM) algorithm to compute the relative locations of the sensors during the flight and to simultaneously calculate the sensor parameters of each image [71].A DSM was generated using the dense point cloud by applying multi-view stereo matching [72] and grid interpolation.Orthomosaicked individual images are combined into five multispectral bands.This process was also followed by similar studies such as [73,74].

Wheat Crop Growth Band Spectral Response
The earth's surface features have different spectral reflectance (spectral signatures) in the electromagnetic spectrum [75].Figures 5 and 6 in Farm A and Farm B (A-E) present five different spectral bands of reflectance generated after UAV processing from tillering to ripening wheat growth stages.Generally, the reflectance from the five spectral bands varied from August 2021 to November 2021.The Blue (A), Green (B), and Red (C) spectral bands were less sensitive to surface reflectance of wheat growth canopies because of chlorophyll absorption in visible light of the electromagnetic spectrum.However, the RedEdge (D) and NIR (E) spectral bands showed a substantial wheat surface reflectance variation at different stages.Overall, both RedEdge and NIR are very important spectral bands in detecting intra-season crop growth changes.

Derived Vegetation Spectral Indices
The list of vegetation indices in Table 4 was computed using different spectral bands from the UAV imagery.This study used the following vegetation indices: Normalized Difference Vegetation Index (NDVI), RedEdge Normalized Difference Vegetation Index (RENDVI), Normalized Difference Index (NDI), and Ratio Vegetation Index 2 (RVI 2).The selection of the above indices was based on the previous literature of similar studies that showed their capacity to characterise crop growth heterogeneity, reduce saturation, and improve model predictions [7,76].

Intra-Field Crop Growth Modelling Using Different Machine Learning Regressions
In this study, the four models including GPR, ER, DT, and SVM were selected for intra-field crop growth modelling and mapping.These regression models are explained in the following sub-sections.

Derived Vegetation Spectral Indices
The list of vegetation indices in Table 4 was computed using different spectral bands from the UAV imagery.This study used the following vegetation indices: Normalized Difference Vegetation Index (NDVI), RedEdge Normalized Difference Vegetation Index (RENDVI), Normalized Difference Index (NDI), and Ratio Vegetation Index 2 (RVI 2).The selection of the above indices was based on the previous literature of similar studies that showed their capacity to characterise crop growth heterogeneity, reduce saturation, and improve model predictions [7,76].

Intra-Field Crop Growth Modelling Using Different Machine Learning Regressions
In this study, the four models including GPR, ER, DT, and SVM were selected for intra-field crop growth modelling and mapping.These regression models are explained in the following sub-sections.

Gaussian Process Regression (GPR)
The GPR is a non-parametric kernel-based MLA, which can learn the relationship between the dependent and independent variables by fitting Bayesian statistics [78,79].
GPR generally uses simple parameter optimisation in comparison to other machine learning regression methods [80,81].However, it can be automatically completed by maximising the marginal likelihood in the training dataset [5].The GPR used in this study was applied in MATLAB software R2019b version.

Ensemble Regression (ER)
ER consists of least squares boosting trees (LSboost) and bagging trees (BGTs).Ensemble approaches construct a baseline group of learning (classifiers) procedures that are combined by voting on their estimations [81,82].Bagging creates baseline learners by producing simulated bootstrap data and boosting the weights of the training set samples [83].However, there are several differences in bagging and boosting learning algorithms [84].Bagging selects the training samples randomly and autonomously while boosting has succession relation to the previous learning.Furthermore, bagging has equal weights and boosting has different weights for all base learners.Bagging generates parallel base learners and boosting chronologically.Both the bagging and boosting regression approaches often perform better than a single classifier.This occurs because of the generation of classifiers with higher accuracy through combining diverse classifiers with lower accuracy, which is often applied to learn complex and nonlinear data in solving practical problems [44].This study applied ensemble learning algorithms using MATLAB software [83,85].

Decision Trees (DTs)
DT belongs to non-parametric algorithms, which are used for both regression and classification tasks [86,87].DT is a supervised machine learning method and its principle relies on using probability trees to facilitate the decision-making process and estimate the value of a target feature.Additionally, DT is a built model to learn all the decision rules inferred from the input data variables and later can be used to make decisions and estimations.Optimisable DT uses optimal parameters and hyperparameters to create a system that can define search space for distinct hyperparameters.This study used MATLAB software to train the DT model.

Support Vector Machines (SVMs)
The SVM is a nonlinear and non-parametric method that relies on kernel functions (mathematical functions) [88][89][90].The kernel functions transform the input data into the required format using SVM algorithms [91].The principle of kernel functions is to help translate input data into higher-dimensional space for receiving it linearly and separately by a hyperplane in solving quadratic optimisation problems [92].The SVM algorithms use different types of kernel functions, which consist of the radial basis function (RBF) and sigmoid, Gaussian polynomial, linear, and sinusoidal functions [93,94].The nonlinear kernels such as RBF usually perform better than linear kernels, while the linear kernels have efficient computation [95].To improve performance in RBF kernels, both the sigma parameter and complexity C (regularisation) parameter need to be enhanced in the prediction process [96].Furthermore, SVM has several hyperparameters that influence its performance, such as the choice of kernel, regularisation parameter, and kernel-specific parameters.These hyperparameters need to be carefully tuned using techniques like crossvalidation to obtain the best model performance [90,97].SVM is a common regression method for modelling different crop biophysical parameters.This study applied SVM using MATLAB software.

Kriging
Kriging is the geostatistical algorithm that predicts the value of unsampled points using the procedure of weighing neighbouring point values [98,99].The ordinary kriging (OK) procedure was applied to interpolate the value of unsampled points and generate maps of soil physical and chemical properties with Equation (1).The soil sampled input data were used to compute spatial variation structure and evaluated using a semi-variogram [100,101].
where Z(S i ) is the measured value at the ith location; λ i is an unknown weight for the measured value at the ith location; Ẑ(S 0 ) is the prediction location; n is the number of measured values separated by the distance h.The values of c o , c, and a are derived on estimated standard error (SE) parameters fitted to semi-variograms.

Experiments
The study investigated the contribution of soil properties and vegetation indices to improve the modelling accuracy of intra-field crop growth variability for two winter wheat farms.The first experiment used vegetation-index-only datasets as predictor variables for wheat crop height (Table 5).Furthermore, the second experiment used a combination of vegetation indices and soil properties as predictor variables.The K-10-fold cross-validation was applied to divide datasets into 80% (400 points) for the training set and 20% (100 points) for the testing set for Farm A, while datasets for Farm B were split into 80% (304 points) for the training set and 20% (76 points) for the testing set.MATLAB software was used to run the four machine learning regressions consisting of GPR, ER, DT, and SVM.

Validation and Accuracy of the Models
To monitor wheat growth using different datasets, including soil properties and vegetation indices, data were used individually and synergistically with applying machine learning.The predictive model accuracy performances of GPR, ER, DT, and SVM were evaluated using the RMSE, mean absolute error (MAE), and R 2 presented in the equations below, Equations ( 7) to (9).Both RMSE and MAE are non-negative metrics with lower values indicating better model performance.R 2 ranges from 0 to 1, where 0 indicates the model and explains none of the variance in the dependent variable, and 1 signifies a perfect fit, explaining all the variance.
where n in the equations represents the number of sample points; P i and O i represent the estimated and observed crop height.The i and σ represent the standard deviations [102].
The crop heights measured in the field using a tape measure were compared to the values predicted by the UAV imagery to assess the validity of the models.This study used a k-fold strategy, where k is the number of folds with a value of 10, which repeats the data split 10 times during the process for both the training and validation of models to avoid overfitting.All samples are split into 80% and 20% for both training and testing, respectively.At least for each time, random split sub-datasets were used iteratively for training and the remaining sub-subset was used for validation.Repeating the training procedure multiple times resulted in all observations for both training and validation with each observation being used for validation once [103].This study performed assumptions' diagnosis to assess residual normality distribution in Figure A1 (Appendix A).A random distribution of the residuals was observed, which indicates that the linear model was suitable to fit wheat growth with vegetation indices and soil data measurements.The histograms showed a positive skew to the right of residuals and more residuals were closer to the straight line.In contrast, the QQ plot revealed variations in terms of distribution around the diagonal line.

Descriptive Statistical Analysis for Soil Physical and Chemical Properties
The descriptive statistics for collected soil physical and chemical properties' measurements are presented in Table 6 for both winter wheat farms.The pH values ranged from 3.94 to 6.94, which is classified under acidic soils for both farms.However, sand was the predominant soil physical property ranging from 62 to 92%, followed by smaller amounts of clay (8-22%) and silt (0-18%).These physical properties are characteristics of loamy soils suitable for wheat growth [104].Other soil chemical properties such as Ca, K, P, Na, and Mg show high intra-field variation in both farms, individually.

Ordinary Kriging Semi-Variogram and Residuals for Soil Physical and Chemical Properties
This study computed the ordinary kriging spatial interpolation and evaluated the spherical, Gaussian, stable, and exponential models for experimental semi-variograms based on cross-validation.The lowest RMSE was the criteria to select optimal models in Table 7.The summary of semi-variogram model parameters included the Nugget, range (m), sill, number of lags (nlag), lag size, and Nugget/Sill ratio.The Nugget/Sill ratio (spatial dependencies) was 0-5.35 in Farm A and 0.001-6.43 in Farm B, which shows a high spatial correlation of soil physical and chemical properties in both farms [104], and the range (effective spatial dependence distance) was 100.31-575.68 and 0.001-0.009m, which means beyond this distance there is little or no autocorrelation in the soil physical and chemical properties.The soil physical and chemical properties' spatial interpolation maps generated by ordinary kriging are presented in Figures A2 and A3 (Appendix A).These interpolation maps show a high intra-field variation of all measured soil physical and chemical properties in both winter wheat farms.

Correlation Matrix
The Pearson correlation matrix shows that soil properties-particularly Ca, Mg, K, and clay-have a moderate positive correlation with crop height compared to the vegetation indices in both winter wheat farms appearing in Figures A4 and A5, respectively.However, there was a high variability for all collected soil properties in Farm A and Farm B. For instance, Farm B had a higher correlation with actual crop height with Mg (r = 0.7), K (r = 0.61), and clay (r = 0.49) than Farm A with Mg (r = 0.34), k (r = 0.33), and clay (r = 0.18).The difference in both soil chemical and physical properties' correlation can be attributed to an imbalance in the fertilisation rate and the availability of nutrients.

Model Validation
The performance of the GPR, ER, DT, and SVM models' accuracy statistics is summarised in Table 8 (Farm A) and Table 9 (Farm B).The GPR (R 2 = 0.69 to 0.74, RMSE = 15.95 to 17.91 cm) model performed better than ER (R 2 = 0.67 to 0.70 and RMSE = 17.13 to 18.68 cm) and other models for both farms using vegetation indices as input features, respectively.The UAV-derived vegetation indices' input features showed a slight improvement in model accuracies compared to the data fusion of vegetation indices and soil properties scenario.Overall, the evaluated GPR, ER, DT, and SVM models achieved a satisfactory accuracy result with training datasets.However, the minimal difference between the training and testing sets shows that sufficient data were used to reduce model overfitting.The difference between the training and testing datasets was minimal with an R 2 of 0.62-0.78for Farm A and an R 2 of 0.5-0.69 for Farm B. This validates the model and indicates the robustness of the model in handling variations.

GPR Model Variable Importance
The GPR robust performance model was used to rank the importance of predictor variables using the training data for different farms, respectively (Figure 9).The most important input features for GPR were RENDVI and NDI in Farm A, while RENDVI, RVI2, and NDVI ranked highly for the GPR model in Farm B. pH is the only soil chemical property that had a lower ranking in Farm A, while other soil properties had no contribution in both farm experiments.

GPR Model Variable Importance
The GPR robust performance model was used to rank the importance of predictor variables using the training data for different farms, respectively (Figure 9).The most important input features for GPR were RENDVI and NDI in Farm A, while RENDVI, RVI2, and NDVI ranked highly for the GPR model in Farm B. pH is the only soil chemical property that had a lower ranking in Farm A, while other soil properties had no contribution in both farm experiments.

Discussion
According to descriptive statistics, the soil pH of both farms is acidic, ranging from 3.5 to 6.94.The range of the pH conforms with previous findings that indicate a pH of about 5.5 in the study area [105].This scenario is anticipated in dryland wheat production within the study area.Low pH is detrimental to wheat growth [106,107].Furthermore, results revealed that soil properties, particularly Ca, Mg, K, and clay, have a moderate positive correlation with wheat crop height.Similar findings from other studies have re-

Discussion
According to descriptive statistics, the soil pH of both farms is acidic, ranging from 3.5 to 6.94.The range of the pH conforms with previous findings that indicate a pH of about 5.5 in the study area [105].This scenario is anticipated in dryland wheat production within the study area.Low pH is detrimental to wheat growth [106,107].Furthermore, results revealed that soil properties, particularly Ca, Mg, K, and clay, have a moderate positive correlation with wheat crop height.Similar findings from other studies have revealed that an abundance of soil chemical properties such as K, Mg, and Ca have an influence on the wheat crop height throughout the growth period [7,21,59].Other studies also demonstrated that the clay content, silt%, and pH values are more significant factors influencing plant growth [8,104].Vegetation indices showed a weak positive correlation with wheat crop height.In contrast, other studies found a strong correlation between vegetation indices and wheat height and yields [108,109].Additionally, the correlation increases as the winter wheat grows [108].Ordinary kriging is widely known for its ability to generate spatial interpolation maps in precision agriculture applications [110][111][112].This study confirmed that ordinary kriging is a robust method to produce soil property maps for both farms based on the low cross-validation RSME of semi-variogram models.
Four predictive machine learning models were evaluated.Results show that the GPR model outperformed ER, DT, and SVM models when predicting crop height at the wheat farms.The GPR model prediction accuracy results ranged between 65% and 75% for wheat height in the entire season.These results are better than findings from previous studies that obtained a 13% to 84% prediction accuracy for monitoring winter wheat growth using the PLSR model during the entire growing season [113].Other studies have found 68%, 88%, and 90% prediction accuracy of field-scale wheat biophysical variables, wheat yield, and wheat plant nitrogen density using the GPR model [44,45,114].These findings are similar to previous studies that showed the higher capabilities of GPR modelling performance compared to algorithms such as LR, RF, PLSR, LSLR, BagT, KNN, DT, NNs, ANN, and RegT when estimating different crop biophysical parameters [34,44,46,114].Furthermore, the result showed that the GPR model has a lower prediction accuracy with soil and UAV imagery derived from data fusion compared with the UAV vegetation indices scenario.In contrast, previous research showed that hyperspectral UAV and soil data fusion improve GPR modelling precision while providing more accurate results with vegetation indices for estimating wheat above-ground biomass [42,115].The improved performance of GPR can be linked to its use of kernel functions when dealing with input [46,47,82].Furthermore, GPR is flexible and reduces the potential of overfitting with highly dimensional observations in crop parameter estimation [42,50,116,117].In contrast, other studies show that PLSR and SVM achieved the highest prediction modelling accuracy compared to GPR for wheat crop height, above-ground biomass, and wheat yield [35,44].Additionally, ANN and RF have outperformed the GPR model for plant height and biomass estimation in previous studies [118,119].However, the robustness of the MLA model depends on the amount of input data and its features to calibrate nonlinear and complex data structures [42, 45,46].Despite the advantages of the GPR model such as the kernel function when dealing with the input training data, it cannot be generalised that GPR always performs better than other machine learning models.
The GPR model variable importance analysis indicates that RENDVI is vital for predicting wheat crop height.A similar study revealed that vegetation indices such as the enhanced vegetation index (EVI) performed better than soil properties in modelling crop height [44,113].These findings showed that the vegetation indices, especially those using the red-edge band, are superior for forecasting crop growth.Several studies have concluded that red-edge bands are anticipated to have a higher-ranking variable of importance in predicting crop growth because of their higher sensitivity in crop changes [31,120].Moreover, the wheat crop height changes throughout the season could have influenced the top ranking of RENDVI computed with red-edge bands in the current study.Meanwhile, this study showed that soil properties play a lesser role when estimating wheat crop height.pH had a lower ranking in all soil properties used to estimate crop growth.All other soil properties such as sand, clay, Na, Mg, Ca, K, and P showed no contribution to the GPR variable importance.However, previous studies highlighted contrasting findings that pH and K are top-ranking soil properties [7,14].In addition, random forest variable importance has revealed that the Ca_Mg ratio ranked highly compared to other soil properties and vegetation indices in soil organic carbon content [121].It is worth noting that clay plays a very important role in growing crops, whereas sand is not an ideal environment for growing crops [104].The changes within findings of variable importance are attributed to differences in the model input predictor variables.Understanding the different growth stages helps farmers plan and implement appropriate agricultural practices, such as timing irrigation, fertilization, and harvesting.The techniques developed in this study can be used in other semi-arid regions facing challenges related to optimising crop yield, resource management, and sustainable agriculture practices [52].
This study highlights the importance of vegetation indices and soil properties to predict crop height, which provides valuable information about basic crop management.However, the limitation of this research includes high fieldwork costs that resulted in one visit per month for data collection at different crop development stages.This study focused on time-series modelling but may not fully capture the temporal dynamics of wheat growth.The effects of short-term environmental fluctuations and seasonal variations on crop growth may not be adequately addressed.This study acknowledges that vegetation indices' reflectance can be affected by various factors such as surface temperature, atmospheric distortions, ambient light, water content, and vegetation type.These factors could introduce uncertainties in the accuracy of the models.We recommend incorporating climate data, soil indices, and environmental variables for a holistic understanding of crop growth while optimising model estimation accuracy.Furthermore, we recommend to investigate the benefits of fusing data from multiple sensors, such as thermal imaging, LiDAR, and hyperspectral sensors.This can provide a more comprehensive characterisation of crop health and growth status.

Conclusions
This study evaluates UAV-derived vegetation indices and soil properties to predict winter wheat growth at two identical farms.Vegetation indices and soil properties' predictor variables were related to crop height.The red-edge and NIR bands were highly sensitive to the surface reflectance of wheat growth.Clay, Ca, Mg, and K soil properties were related to wheat crop height with a positive correlation between 0.18 and 0.7.All the evaluated machine learning models including GPR, ER, DT, and SVM produced reasonable accuracies for crop height prediction.Additionally, model performance findings show that GPR (R 2 = 0.69 to 0.74, RMSE = 15.95 to 17.91 cm) has a high predictive capacity for crop height in both wheat farms.Variable importance highlighted RENDVI as the most influential predictor variable in the GPR model.The methodology developed in this study can help farmers improve farm management practices such as timing irrigation, fertilization, and harvesting.Extension services can also benefit from recommending site-specific crop management decisions to increase expected yields while improving food security.The research prospects may include seasonal datasets to understand variations and identify appropriate windows for early production assessment.Additionally, they can examine wheat physiological stresses and yield data to predict crop productivity.

Figure 1 .
Figure 1.Methodology flowchart for intra-field crop growth modelling used in this study.

Figure 1 .
Figure 1.Methodology flowchart for intra-field crop growth modelling used in this study.

Figure 2 .
Figure 2. Map showing the location of the Clarence wheat farms in the Thabo Mofutsanyane district in the Free State province of South Africa.

Figure 3
Figure 3 displays the monthly rainfall and temperatures received by Clarens wheat farms throughout 2021.The rainfall and temperatures were downloaded from NASA POWER (https://power.larc.nasa.gov/,accessed on 12 January 2024).NASA POWER has a spatial resolution of 0.5° latitude by 0.5° longitude to provide daily temperatures at 2 m and precipitation (mm/day) among other climate variables [64].Despite the low rainfall amount and relatively moderate temperatures received during wheat-growing months, an upward trend in average temperatures and rainfall was recorded between August and November 2021.

Figure 2 .
Figure 2. Map showing the location of the Clarence wheat farms in the Thabo Mofutsanyane district in the Free State province of South Africa.

Figure 3
Figure 3 displays the monthly rainfall and temperatures received by Clarens wheat farms throughout 2021.The rainfall and temperatures were downloaded from NASA POWER (https://power.larc.nasa.gov/,accessed on 12 January 2024).NASA POWER has a spatial resolution of 0.5 • latitude by 0.5 • longitude to provide daily temperatures at 2 m and precipitation (mm/day) among other climate variables [64].Despite the low rainfall amount and relatively moderate temperatures received during wheat-growing months, an upward trend in average temperatures and rainfall was recorded between August and November 2021.

Figure 3 .
Figure 3. Average monthly meteorological rainfall and temperature data from January to December 2021.

Figure 3 .
Figure 3. Average monthly meteorological rainfall and temperature data from January to December 2021.
The best-performing GPR model was used to produce scatterplots using training datasets.The data points close to the diagonal line show a good agreement between measured and predicted crop height values.The GPR model produced with vegetation indices' training data had a slightly better performance (R 2 = 0.74, RMSE = 15.95, and MAE = 11.59)compared to the GPR model (R 2 = 0.73, RMSE = 16.41, and MAE = 11.69)generated using a soil properties and vegetation indices data fusion for Farm A (Figure7).Similar results were observed in Farm B when the GPR model achieved a coefficient of determination of 0.4 with vegetation indices' input features (Figure8).

d
2024, 13, x FOR PEER REVIEW 16 of

Figure 7 .
Figure 7. Farm A scatterplots showing the relationship between predicted and observed crop heig using the GPR model with (a) vegetation indices and soil properties data fusion and (b) vegetatio index-only scenario.

Figure 7 .
Figure 7. Farm A scatterplots showing the relationship between predicted and observed crop height using the GPR model with (a) vegetation indices and soil properties data fusion and (b) vegetationindex-only scenario.

Figure 7 .
Figure 7. Farm A scatterplots showing the relationship between predicted and observed crop height using the GPR model with (a) vegetation indices and soil properties data fusion and (b) vegetationindex-only scenario.

Figure 8 .
Figure 8. Farm B scatterplots showing the relationship between predicted and observed crop height using the GPR model with (a) vegetation indices and soil properties data fusion and (b) vegetationindex-only scenario.

Figure 8 .
Figure 8. Farm B scatterplots showing the relationship between predicted and observed crop height using the GPR model with (a) vegetation indices and soil properties data fusion and (b) vegetationindex-only scenario.

Figure 9 .
Figure 9. Variable importance feature performance of vegetation indices and soil properties within (a) Farm A and (b) Farm B in the GPR algorithm.

Figure 9 .
Figure 9. Variable importance feature performance of vegetation indices and soil properties within (a) Farm A and (b) Farm B in the GPR algorithm.

Author Contributions:
Conceptualization, L.N. and C.M.; methodology, L.N. and C.M.; software and data pre-processing, L.N. and C.M.; writing-original draft preparation, L.N.; writing-review and editing, C.M., J.G.C., A.M.K., Z.M.-M., W.M. and P.E.R.; supervision, J.G.C., C.M., A.M.K., Z.M.-M.and W.M. All authors have read and agreed to the published version of the manuscript.Funding: This research was funded by the Council for Scientific and Industrial Research (CSIR) and the Department of Science and Innovation (DSI).Research support from the Agricultural Research Council-Natural Resources and Engineering (ARC-NRE), National Research Foundation (NRF) (grant number: TTK200221506319), and the South African National Space Agency (SANSA) is acknowledged.

Figure A2 .
Figure A2.Farm A ordinary kriging spatial interpolation maps of soil physical and chemical properties of sampled points.

Figure A3 .
Figure A3.Farm B ordinary kriging spatial interpolation maps of soil physical and chemical properties of sampled points.Figure A3.Farm B ordinary kriging spatial interpolation maps of soil physical and chemical properties of sampled points.

Figure A3 .
Figure A3.Farm B ordinary kriging spatial interpolation maps of soil physical and chemical properties of sampled points.Figure A3.Farm B ordinary kriging spatial interpolation maps of soil physical and chemical properties of sampled points.Land 2024, 13, x FOR PEER REVIEW 21 of 26

Figure A4 .
Figure A4.Farm A: Pearson correlation matrix of vegetation indices and soil physical and chemical properties.

Figure A4 .
Figure A4.Farm A: Pearson correlation matrix of vegetation indices and soil physical and chemical properties.

Figure A4 .
Figure A4.Farm A: Pearson correlation matrix of vegetation indices and soil physical and chemical properties.

Figure A5 .
Figure A5.Farm B: Pearson correlation matrix of vegetation indices and soil physical and chemical properties.Figure A5.Farm B: Pearson correlation matrix of vegetation indices and soil physical and chemical properties.

Figure A5 .
Figure A5.Farm B: Pearson correlation matrix of vegetation indices and soil physical and chemical properties.Figure A5.Farm B: Pearson correlation matrix of vegetation indices and soil physical and chemical properties.

Table 1 .
Summary of analytical soil physical and chemical properties.

Table 1 .
Summary of analytical soil physical and chemical properties.

Table 2 .
UAV-Survey Scheduled and ground-based measurements at different winter wheat growth stages across Farm A and Farm B.

Table 2 .
UAV-Survey Scheduled and ground-based measurements at different winter wheat growth stages across Farm A and Farm B.

Table 3 .
Properties of the MicaSense RedEdge-MX series sensor.

Table 3 .
Properties of the MicaSense RedEdge-MX series sensor.

Table 4 .
Vegetation spectral indices were used in this study.

Table 4 .
Vegetation spectral indices were used in this study.

Table 5 .
Experimental dataset for training and testing GPR, ER, DT, and SVM models.

Table 6 .
Statistical description of soil parameters across Farm A and Farm B.
Note: The top nine variables of Clay%; Sand%; Silt%; pH; Ca mg/kg; K mg/kg; P mg/kg; Na mg/kg; Mg mg/kg are derived from Farm A. The bottom nine variables of Clay%; Sand%; Silt%; pH; Ca mg/kg; K mg/kg; P mg/kg; Na mg/kg; Mg mg/kg are derived from Farm B.

Table 7 .
OK best-fitted semi-variogram and residuals of model parameters for soil physical and chemical properties.

Table 8 .
GPR, ER, DT, and SVM model evaluation statistics for Farm A.

Table 9 .
GPR, ER, DT, and SVM model evaluation statistics for Farm B.