National Scale 3D Mapping of Soil pH Using a Data Augmentation Approach

: Understanding the spatial variation of soil pH is critical for many different stakeholders across different ﬁelds of science, because it is a master variable that plays a central role in many soil processes. This study documents the ﬁrst attempt to map soil pH (1:5 H 2 O) at high resolution (100 m) in New Zealand. The regression framework used follows the paradigm of digital soil mapping, and a limited number of environmental covariates were selected using variable selection, before calibration of a quantile regression forest model. In order to adapt the outcomes of this work to a wide range of different depth supports, a new approach, which includes depth of sampling as a covariate, is proposed. It relies on data augmentation, a process where virtual observations are drawn from statistical populations constructed using the observed data, based on the top and bottom depth of sampling, and including the uncertainty surrounding the soil pH measurement. A single model can then be calibrated and deployed to estimate pH a various depths. Results showed that the data augmentation routine had a beneﬁcial effect on prediction uncertainties, in particular when reference measurement uncertainties are taken into account. Further testing found that the optimal rate of augmentation for this dataset was 3-fold. Inspection of the ﬁnal model revealed that the most important variables for predicting soil pH distribution in New Zealand were related to land cover and climate, in particular to soil water balance. The evaluation of this approach on those validation sites set aside before modelling showed very good results ( R 2 = 0.65, CCC = 0.79, RMSE = 0.54), that signiﬁcantly out-performed existing soil pH information for the country.


Introduction
Soil pH indicates the relative acidity or alkalinity of the soil, and is a master variable in soil science, both in managed and un-managed landscapes [1]. It plays a central role in numerous soil functions, soil quality, and fertility processes, impacting on physical structure, carbon, nitrogen, and phosphorus cycling, biological activity and regulation, bioavailibility of a range of nutrients, mobility and uptake of some trace elements such as cadmium [2,3]. This translates to soil pH being a parameter that is critical for a wide range of applications and stakeholders, such as the fertiliser industry, the soil across very distinct horizons, or the inclusion of uncertainties affecting the observations. The other advantage of a 3D approach is that the final predictions can be dynamically re-aggregated in different soil depth supports (for example, to serve the needs of different stakeholders) without having to re-fit a different model. In the case of New Zealand, this is an important advantage, as the soil grids produced using DSM need to address the needs of a variety of stakeholders who use different soil depths in their analyses.
The objectives of this study were to (1) calibrate a soil pH DSM model for New Zealand that uses the best set of observations available for the country; (2) test the potential of a novel 3D DSM approach that augments the amount of depth and attribute information using a resampling strategy; (3) evaluate the results of this model with those of a more traditional 2.5D approach; and (4) compare the accuracy of our model against existing soil pH products available for New Zealand.

Study Area
The study area is the terrestrial land mass of New Zealand. Three main islands make up most New Zealand: North Island/Te Ika-a-Māui, South Island/Te Waipounamu, and Stewart Island/Rakiura. The latter represents a significant area (1683 km 2 ); however, it is very sparsely populated, and is mainly covered in conservation land. Due to the lack of data coverage, smaller islands and archipelagoes, such as the Chatham Islands Group, located 800 km east of South Island/Te Waipounamu, and the sub-Antarctic islands such as the Auckland Islands, were excluded from this study.

Data
The approach used to map soil pH at national level follows the paradigm of digital soil mapping (DSM [27]), which relates point observations of soil properties with environmental variables representing soil-forming factors [18]. DSM has become a major framework in the generation of new soil information grids, often underpinned by national or international initiatives such as GlobalSoilMap [15].

Soil pH Data
Soil pH data were collated by merging a variety of disparate datasets from a wide range of soil surveys, soil quality programmes, wetland monitoring programmes, and soil biodiversity research programmes. The primary dataset was sourced from the National Soil Data Repository (NSDR, n = 5874, [28]). Soil pH in the NSDR was measured indifferent solutions: in calcium chloride (CaCl 2 ), in water at a 1:5 ratio, and in water at a 1:1 ratio. All pH measurements were converted to soil pH measured in a 1:5 water solution using a pedo-transfer function based on a simple linear model [29]. The spatial distribution of samples in the NSDR is biased towards productive land, so additional soil pH data were sourced from a plot-based national survey of forests and shrublands [30], and a series of site-specific studies in forest ecosystems around New Zealand. These datasets are hereafter collectively referred to as the "Forests" dataset (n = 3576). Additional points were sourced from the wetlands soils monitoring program ("Wetlands", n = 850). Finally, soil pH data were also collated from Topoclimate South, an intensive soil survey program that took place across the Southland region (n = 1644). The statistical distribution of soil pH was inspected in order to detect potential outliers. Samples with pH lower than 2 or higher than 11 were removed from the dataset, as such values were considered unrealistic for New Zealand conditions.
The spatial coverage of the point dataset for 1:5 H 2 O soil pH ( Figure 1) shows that most regions of New Zealand were well covered by the collated dataset. There were several exceptions: the region of Canterbury, on the East Coast of the South Island, was poorly covered. However, these are mostly recent soils, formed on Quaternary gravels, which typically show a lower magnitude of spatial variation. In the North Island, the most notable data gaps were located in the lower East Coast, in the regions of Hawke's Bay and Wairarapa. For the deeper layers, the bias against conservation land was evident, because samples from the "Forests" and "Wetlands" were collected only in the first 10 cm. A suite of 498 soil profiles (representing 1,520 individual samples) were set aside from the soil pH dataset before any model calibration, in order to evaluate the predictions of the different models tested in this study. They are hereafter referred to as the validation set, as opposed to the calibration set. Entire soil profiles were set aside for validation, in order to account for the autocorrelation of soil pH observations within soil profiles. Soil profiles in the validation set were selected by using spatially balanced sampling, as implemented by the pivotal method [31] using the BalancedSampling R package [32]. All available sites were given the same probability to be included in the validation set. The selected samples were balanced spatially, but also against the average pH value within each soil profile.

Environmental Covariates
A large number of spatial layers (n = 29) were pre-selected as potential covariates for soil pH modelling. They were initially selected on expert basis, following the assumption that all these layers could provide a certain degree of control of soil pH at the national scale. The different layers are listed on Table 1, and are organised following the common conceptual frameworks that are SCORPAN [27] and CLORPT [18]. Some of the covariates, such as rainfall or erosion yield, represent direct observations of a specific soil-forming factor, while others, such as latitude or northerness, represent a proxy for a wider range of soil-forming factors. The covariate data were processed in SAGA GIS [33] and GRASS GIS [34], then collated and interpolated to a common grid in GRASS GIS. The final resolution of the prediction grid was 100 m, and uses the New Zealand Transverse Mercator 2000 projection system. Finally, as no reliable model of soil depth is available for New Zealand, the SoilGrids layer "depth to bedrock" [13] was used to censor the depth of soil pH predictions.

Depth as a Covariate Using Data Augmentation
In statistics, data augmentation refers to the artificial augmentation of the number of observed data points in order to improve the analysis of the dataset [46]. The augmented dataset is a simulated dataset, and the augmentation procedure makes use of the characteristics of the observed dataset used to generate it. This concept has been recently adapted to machine learning, when a suite of subtle variations to the original dataset are used to increase the size of the calibration set, and improve the robustness of the generated models [47]. Data augmentation is typically used when the calibration set presents some deficiencies that might otherwise impair the analysis (missing values, few samples), or when it does not represent enough variability, and could therefore produce biased results. Here we propose a method using data augmentation to more rigorously include (i) soil depth as an explanatory variable, and (ii) soil pH measurement uncertainties in the pH prediction model.

Augmenting Depth Information
In soil databases, soil depth is usually recorded using the top and the bottom of the sampled horizon. Most 3D DSM models published to date take the horizon mid-point as a depth measurement [13,26]. Here, we improve on this by explicitly acknowledging that these measurements correspond to material sampled over a depth interval, as opposed to a specific point. The soil pH measurements in our dataset have been made on samples extracted using a variety of methods, including coring, augering, or pit sampling, and most have been collected from pedological horizons-either pedogenetic or functional horizons. An implicit assumption in those cases is that the sample collected for a given horizon (along with its associated measurements) is representative of all depths contained in this horizon. Thus, we propose that for a given horizon bounded by the depths top and bottom, a k-fold data augmentation of the soil depth can be achieved by randomly sampling k values from a uniform distribution U (min, max), where min and max are the top and bottom depths of the horizon (Figure 2a).

Augmenting Attribute Information
Data augmentation can also aim at improving the representativeness of the dataset used to calibrate a predictive model, and better represent the subtle variations that exist in the real world phenomena that are being modelled. In our case, we propose to include information about the uncertainty in soil pH measurement in the way the soil pH values are included in the different augmented datasets. In the absence of a more accurate alternative, we propose that the soil pH measurement associated with a given horizon can be described by a normal distribution N (µ, σ), where µ is the measured pH value for that horizon, and σ is a standard deviation describing the uncertainty of the pH measurement method (Figure 2b). Replication analysis at the laboratory responsible for most of our measurements (Manaaki Whenua-Landcare Research Environmental Chemistry Laboratory, Palmerston North, New Zealand) suggest that this standard deviation would be around 0.1.
Combining the augmentation of depth and soil pH information allows to convert the original dataset ( Figure 2c, left) into a k-augmented, virtual dataset (Figure 2c, right). While Figure 2c shows a 4-fold data augmentation, the effects of using different values of k have been tested in this study.

Modelling Framework
In this study, random forest (RF [48]) was the main tool used at different steps of the modelling, i.e., for feature selection of the final soil pH predictors, and then to calibrate the soil pH prediction model. A RF is an ensemble of regression trees that have been generated through bootstrapping. For a regression problem, a large number of decision trees are generated (the "forest", with a number of trees ntrees between 500 and 2000 typically being grown for most RF applications). Each tree is generated from an independent bootstrap of the calibration dataset, until the size of the nodes reaches a minimum size limit (node size). Unlike other similar algorithms, no pruning is done on the trees, however, at each node of the trees, a random subset of the available predictors (of cardinality mtry, where mtry ≤ n and n is the number of predictors) is chosen. The default value of mtry is arbitrarily chosen as √ n or n/3 by most RF implementations, but it is a parameter which, like node size, needs hyper-parameterisation [49]. Finally, every tree generates an individual prediction, and the RF prediction is an average of all individual tree predictions. In that sense, RF is considered an ensemble model. Samples left out by the bootstrap sampling are used to evaluate the predictions of each tree, and are referred to as "out-of-the-bag" (OOB).   The RF implementation used throughout the analysis was ranger [50], which is computationally efficient and easily parallelisable. To generate the final pH predictions, the quantile regression forest method was used (QRF [51]). QRF infers conditional quantiles by generalising the RF algorithm to conditional distributions (as opposed to just the mean), and therefore can derive prediction intervals, along with a median prediction [52]. All modelling was done in R version 3.6.3 [53].

Variable Selection
The number of environmental covariates selected for soil pH modelling was substantial, and numerous variables presented significant levels of correlation. While there was discussions on the ability of RF to handle a large number of correlated variables, and while the environmental covariates have been pre-selected using expert knowledge about the soil-forming factors associated with soil pH differences across New Zealand, the choice was made in this study to use variable selection to restrict the number of input variables, and combine the performance of RF with the interpretability of a parsimonious model [54,55].
The variable selection approach used was Variable Selection Using Random Forests (VSURF) [56], as implemented in the eponymous package for R [57]. The VSURF approach works in two steps, corresponding to two different, but complementary, goals. The first step aims at finding all the important variables for interpretation, even where there is redundancy within the selected set of variables. The second step reduces the number of variables further, and aims to restrict the predictors to a parsimonious suite of important variables, avoiding redundancy so as to get the best prediction model possible. The VSURF algorithm uses an iterative procedure with ranking and permutation of the variables in a large number of RF runs. The ranking itself is done based on the mean OOB error rate, defined as the sum of squared errors: whereŷ i is the predicted value of y i by trees belonging to the OOB samples. We used repeated 10-fold cross-validation (with 30 repeats) to find the optimal parameters for the RF with all the potential environmental predictors in order to determine which parameters to use with VSURF. The set of parameters minimising the root mean squared error (RMSE) were mtry = 6 and node size = 5. Following Genuer et al. [56], the number of trees in the RF was set to ntree = 2000, since using RF for variable selection requires more trees than the number typically used in regression so to give stable results [49].

Calibration of the Soil pH Model
A predictive model for soil pH from the selected variables was calibrated for different values of the augmentation factor (k ∈ {2, 3, 4, 5, 6, 7, 8, 10}) and different levels of uncertainty associated with the reference soil pH measurement (σ ∈ {0, 0.01, 0.05, 0.1, 0.15, 0.20, 0.25, 0.30}), in order to test the effects of these two factors. Additionally, strategies of (i) taking the mid-point of the horizon depths, and (ii) using a random depth between the top and the bottom of each horizon (which corresponds to k = 1) were also tested.
The hyper-parameterisation of each model (mtry and node size) was done on the calibration set using repeated spatial cross-validation using the CAST package [58] with 10 folds and 50 repeats. Spatial cross-validation was used in order to avoid the impact of spatial auto-correlation during the cross-validation [55]. The number of trees per forest was set to ntree = 500, since using more trees did not improve the accuracy of the model (results not shown), and following the suggestion that using too many trees might be detrimental in some cases [49]. Optimal hyper-parameters of the models were chosen to minimise the root mean square error (RMSE).
Soil pH predictions were then generated for the 100-m resolution grid supporting the environmental covariates. The optimal prediction model was used to develop predictions for each centimetre of soil depth between 0 and 200 cm (depth ∈ {0, 1, . . . , 200 } cm). The median predictions of soil pH were generated alongside the 5th, 25th, 75th, and 95th quantiles, so that the 50% and 90% prediction intervals could be calculated. Finally, maximum soil depth was used to prevent producing any values deeper than the depth to bedrock. The maximum soil depth estimates from SoilGrids 250 m [13] were used as there is currently no such national estimates for New Zealand.

Evaluation and Comparison to Existing Products
The generated soil pH maps were inspected visually for consistency with what is known in terms of soil pH distribution across New Zealand soils [59,60]. Then, the validation sites were used to evaluate the performance of the pH predictions, and compared with the other existing information products covering New Zealand: the soil pH layer from the Fundamental Soil Layers (FSL [12]), and SoilGrids 250 m [13]. The proposed approach was also compared with the more common 2.5D DSM approach, which involves (i) harmonising the observed soil data to predefined depth intervals using a mass-preserving spline, then (ii) fitting an independent prediction model for each of these depth intervals. The GlobalSoilMap intervals were used as the predefined depth intervals, since they are the de facto standard for this type of soil information product, and cover the entire soil profile [15]. Our calibration soil pH data were splined using the methodology originally proposed by Bishop et al. [16] to fit the six standard GlobalSoilMap depths. Then, at each depth interval, a predictive model was calibrated using the methodology presented above-but excluding soil depth from the covariates.
The respective predictions of both methodologies were aggregated back to the original depth support of the validation samples. To do so, soil pH estimates were re-aggregated on the depth support used by the different validation profiles. The integration method detailed in the following section was used to adapt the soil pH of our approach, while results from the 2.5D DSM approach, FSL and SoilGrids predictions were adapted using weighted averaging based on 1 cm slices, using the method of Beaudette et al. [61]. Performance metrics were then computed by comparing the predicted and observed values for the different approaches and products: R 2 , RMSE, bias, and Lin's Concordance Correlation Coefficient (CCC).

Adaptation of the Predictions to Suit Different End-Users
The dynamic re-aggregation of predictions to tailor the needs of different stakeholders was demonstrated for different end-users: (i) the GlobalSoilMap project, that uses a set of six standard depth intervals (depth intervals {0-5, 5-15, 15-30, 30-60, 60-100, 100-200} cm [15]); (ii) the New Zealand fertiliser industry, which uses a single {0-7.5} cm depth interval; and (iii) both the soil ecology community and the New Zealand soil quality reporting system, that uses a {0-10} cm depth interval. To aggregate the pH value Q k , corresponding to the k th quantile estimate for a given horizon H, bounded by the depths d top and d bottom , the following formula was used: where q k (d top , d bottom ) is the depth profile of predicted soil pH, generated using a linear interpolation of the predicted pH values estimated every centimetre between d top and d bottom . This approach was used to aggregate the {5th, 25th, 50th, 75th, 95th} quantiles predicted using QRF.

Soil Data
The summary statistics of the data collated across New Zealand (n = 11,944) for the study are presented in Table 2. In order to showcase more clearly the impact of soil depth on soil pH, horizon depths were classified using the GlobalSoilMap depth intervals using a majority rule. Generally, soil pH increases with depth, with the 0-15 cm interval (corresponding to topsoil in most NZ regions) being more acidic (mean soil pH < 5 in the upper 15 cm, and >5 below 15 cm). Many samples (almost half the total number of samples) were collected in the top 15 cm, which again is explained by past studies that have focused on the uppermost horizons (the forest and wetlands data in particular). The number of samples available past 100 cm depth was much smaller (884 samples, representing 7.4% of the total number of samples), and the deepest samples (past 200 cm) were very few (52 samples, representing 0.43% of the total number of samples). The observed variability, as assessed by the inter-quartile range (IQR), shows the 100-200 cm horizon to be the most variable, which can be explained by the differences in parent material that contribute to variations deeper in the soil profile. The topsoil depths (0-15 cm) also show a pronounced variability. Skewness of the data was small, and the data did not require transformation prior to modelling.

Selected Predictors and Variable Importance
After the two steps of the VSURF variable selection algorithm, 13 covariates (out of 29) were retained for modelling. Figure 3 shows the variables selected by the algorithm, sorted by their respective average OOB error rate. The most important covariate for prediction was land cover, followed by covariates related to moisture status (potential evapotranspiration, annual water deficit, monthly water balance, mean annual rainfall). This was not surprising, because the relationships between both land cover and soil-water balance with soil pH are well documented [9,62,63]. Then, soil depth and latitude were also selected. Position within the soil profile has an obvious impact on soil pH (Table 2), with the impact of water flows and various organisms being more pronounced near the surface. Latitude, on the other hand, is a compounding factor: due to the shape of New Zealand, which spans from sub-tropical latitudes (32 • ), down to latitudes nearing 50 • on Rakiura/Stewart Island, latitude encompasses a range of climatic effects. The last group of selected variables is composed of terrain variables (elevation, SAGA wetness index), climatic variables (solar radiation, mean annual temperature, observed evapotranspiration), and biological input (net primary production).
The absence of parent material information in the final model is surprising, but the quality of the main source of parent material information (the "top rock" field of the New Zealand Land Resource Inventory, [44]) is limited, and work is currently underway to better represent the pattern of parent material across New Zealand. Moreover, other variables, such as latitude, and climatic factors, such as rainfall and temperature, might be already capturing the spatial patterns of parent material (e.g., volcanic material in the north of the country, Quaternary gravel deposits in the south).
Further inspection of the covariates shows strong correlations between parent material and land cover, or potential evapotranspiration, for example. As for all data-driven DSM exercises, care should be taken when trying to infer knowledge about the drivers of soil properties solely from an analysis of variable importance [64]. Variables were selected and used by the model in order to produce the best possible prediction, and the model does not reflect their relative importance in terms of soil processes. It also does not distinguish whether selected variables are impacting soil pH, or affected by soil pH.  Figure 4 shows the impact of (i) the data augmentation strategy and (ii) the magnitude of the uncertainty of the pH measurement technique on the performance of the model. Results are shown both on the OOB samples, which are set aside during the RF training, and on the validation samples. On the OOB samples, and with low uncertainty around the pH values used for modelling, data augmentation was beneficial, up to about 5-fold, from which there was no marginal gain. However, when uncertainties around pH values in the calibration set increase over 0.1, data augmentation was optimal around a 3-fold augmentation rate, after which the benefits of augmentation on RMSE were more limited, but still yielded better results than no augmentation. There was no difference between the mid-point and the random depth strategies in terms of OOB RMSE. On the validation set, data augmentation was not beneficial compared with no augmentation when pH uncertainties were either very low or ignored. For pH uncertainties over 0.1 however, it was beneficial on validation RMSE, up to a level of 3-fold, from which no marginal gains were observed. Following these observations, an augmentation rate of k = 3 was used for the rest of the modelling, along with the standard deviation of pH values σ = 0.1 that was provided by the laboratory processing our datasets.

Model Evaluation
Results for the 498 soil profiles set aside for validation (n = 1520 samples) were used to assess the pH prediction performance. Overall, the validation statistics showed that the model performed well (R 2 = 0.65, CCC = 0.79). The RMSE of the model across all depths was 0.54. Moreover, the model showed negligible bias (bias = −0.03). The distributions for predicted and observed pH values (compared in Figure 5a) showed very minor differences. Most of the divergences between both distributions occurred around the 90% quantile, and suggest some under-prediction of the highest pH values. However, the 95% confidence intervals of the distributions were still overlapping, showing this had minor effects on the overall distribution.
Analysis of model residuals showed that they were normally distributed for most validation sites, and no significant trend could be detected (red line, Figure 5b). Heavy tails on the Q-Q plot (presented Figure 5c) and a slight deviation from normality can be noted on the Q-Q plot of those residuals, though. This confirms what was observed in Figure 5a, and this deviation from normality corresponds to the most alkaline soils (pH > 7). Such alkaline soils, however, are rare in New Zealand, and represent less than 2% of the measurements that were collated for this study.

Depth Profiles
Depth profiles were generated for 20 validation sites, at a 1-cm resolution from the soil surface, down to 2 m, or maximum recorded soil depth-whichever occured first ( Figure 6). These profiles were picked at random from validation sites that had more than six horizons sampled, so as to better study how the model performed throughout the whole profile. Uncertainty of predictions is shown using the 50% (dark blue band) and 90% (light blue band) prediction intervals, which surround the median prediction (dark blue line). The reference laboratory measurements are shown in red, and are surrounded by their 90% confidence interval (red band). Some profiles showed excellent results, and the prediction does follow the measured pH values, with narrow prediction intervals surrounding the predictions (e.g., SB08337, SB09355, SB09799). Other examples do follow the reference pattern of values as they evolve with depth, but present much wider prediction intervals (SB07653, SB09446). For some profiles, prediction intervals get to a width that make it difficult to use, such as SB09897, although these do carry useful information from a modelling perspective. Predictions for profiles such as SB08730 and SB09377 are unsatisfactory: in these specific cases, they correspond to profiles located in dry climate regions that are less represented in the original datasets. However, most of the inspected profiles did present a good result. Figure 7 shows the pH product aggregated for three different depth intervals that are often used for topsoil: 0-5 cm (first GlobalSoilMap depth interval), 0-7.5 cm (widely used by the fertiliser industry in New Zealand), and 0-10 cm (used for national reporting on soil quality in New Zealand, and widely used by the ecological research community). The differences between these maps are small, with absolute differences mostly <0.1, but these illustrate the potential to adapt the DSM outputs to different stakeholders, demanding different depth supports, without having to re-train a dedicated model for each of them.

Maps
The pH maps generated for each of the six GlobalSoilMap depth intervals are shown in Figure 8. These depth intervals are spanning the entire soil profile, and allow us to explore the variability of soil pH, laterally and vertically. Grey areas represent parts of the landscape without soil. Like most temperate countries, soils in New Zealand typically become more alkaline with soil depth [59]. The topsoil horizons (0-5 cm and 5-15 cm) are showing substantially lower pH, which is a feature of temperate, humid climates such as New Zealand's. This illustrates the impact of vegetation on soil pH. The darkest colours, associated with the most acidic soils, are located mostly over conservation land, where native forest dominates (Ruahine and Tararua ranges in the North Island), regions exposed to very large amount of rainfall (West Coast of the South Island), and peatlands. The other main soil landscapes of the country are also reflected in the successive maps. The highest pH values are found in Central Otago (in the southern part of the South Island), which hosts the only semi-arid soils in the country, in the Hawke's Bay (on the East Coast of the North Island), which has soils evolved on loess and a dry, warm climate, and in the Far North of the North Island, which has coastal, sandy soils.

Uncertainty of the Predictions Across the Country
The uncertainties associated with the soil pH predictions were assessed using the ratio between the inter-quartile range (IQR, i.e., the 50% prediction interval) and the median, and are mapped for each GlobalSoilMap depth interval (Figure 9). Higher values of that ratio correspond to wider prediction intervals (so to predictions that are more uncertain). The spatial distribution of these uncertainties is not random across two dimensional space and depth. Overall, prediction uncertainties increase with soil depth: this is an expected outcome, given the number of calibration points dramatically decreases with soil depth (Table 2). These maps also outline specific areas where the model is not performing well. For the topsoil depths (above 30 cm), the most uncertain predictions are encountered for the peatlands in the Waikato, in hill country under native vegetation (Northland forests, Central Plateau, Urewera and Kaimanawa ranges), and in the main mountain ranges (Ruahine, Tararua, and Southern Alps). The prediction intervals for the semi-arid soils of Central Otago-arguably the most alkaline soils in the country [60]-are widening very substantially for subsoil depths (below 30 cm). This is also the case for other dry areas of the country, such as the southern East Coast of the North Island, or South Canterbury.

Comparison with 2.5D DSM
The comparison between the results of our approach and of the 2.5D DSM models shows that they both yield very similar results (Table 3). Validation R 2 and CCC were slightly higher for the 2.5D models (R 2 = 0.65 vs. 0.68 and CCC = 0.79 vs. 0.8 respectively). The RMSE between both approaches was almost identical, and much smaller than the standard deviation of the reference pH laboratory method (RMSE = 0.54 vs. 0.53 respectively). Finally, the 2.5D models produced predictions that were slightly more biased (bias = −0.03 vs. −0.05 for our approach), although arguably that amount of bias could be considered negligible. To compare the uncertainties associated with both 2.5D and 3D DSM models, Nauman and Duniway [26] introduced the Relative Prediction Interval (RPI), expressed as the ratio between the 95% prediction interval and the median of the training data. It was adapted to our study, where the 90% prediction interval was used to express model uncertainty, and calculated as the ratio between the 90% prediction interval and the median of pH in the training dataset. The statistical distribution of RPI values calculated on the validation set for both approaches are compared in Figure 10. In contrast to the results reported by Nauman and Duniway [26] for soil organic carbon, we did not observe higher RPI values for our 3D approach, and overall the prediction widths for both methods were very similar. approach and the 3D approach tested in this study.

Comparison with Other Soil pH Products for New Zealand
Validation samples were also used to assess our approach against the currently available soil pH products for New Zealand, i.e., the Fundamental Soil Layers (FSL), a national legacy soil information product, and SoilGrids, the main global soil information product currently available. The results are presented in Table 3, and show that our model and the 2.5D approach significantly out-performs both existing products. The FSL and SoilGrids showed similar level of prediction performance (R 2 = 0.31 and 0.37, CCC = 0.49 and 0.51, RMSE = 0.76 and 0.79 respectively). The scatterplots of predicted vs. observed values for our model, FSL, and SoilGrids are compared in Figure 11, and outline in more detail the differences between the three pH products. The FSL is a polygon-based, expert-driven soil information product, as evidenced by the binning effects in the predicted values. Low values of pH, in particular, are poorly predicted by the FSL, possibly because these values correspond to regions of New Zealand where expertise is less developed (i.e., soils under native vegetation). Predictions for SoilGrids also suffer from censoring extreme values of pH, at both ends of the spectrum (i.e., for observed pH < 4.8 or pH > 7). The predictions from the approach used in this study produce more accurate predictions, with points closer to the grey 1:1 line. The more acidic soils are well predicted, unlike the FSL and SoilGrids; however, as suggested by the residuals plots ( Figure 5), the alkaline soils seem to be systematically under-predicted. New Zealand soils are generally quite acidic, a general phenomenon of southern hemisphere, temperate soils [65], so strong model performance in the low pH range is reassuring. However, exceptional soils with high pH, which include salt pans, some recent, sandy soils, and those derived from schist in arid climates, can foster distinctive biological communities in New Zealand [66,67] and greater precision in predicting these sites would be desirable for biodiversity modelling and conservation planning.
In absolute terms, the performance of our approach was satisfactory, and shows levels of uncertainties comparable or better than published DSM-based pH maps at regional or national scale in other parts of the world [24,26,52,68,69]. Most importantly, it represents a significant improvement on existing soil pH maps for New Zealand.

Limitations and Potential Improvements
The data augmentation step allowed for the inclusion of uncertainty about the soil pH measurements: in our case, we represented this uncertainty by the standard deviation σ = 0.1, which was derived by the laboratory using replication experiments over the recent years. There is potential to use this mechanism to include uncertainties associated with the large range of soil attributes measurement techniques that are now available; not only reference laboratory techniques, but also estimates made using pedo-transfer functions, or using proximal soil sensing (such as visible near-infrared and mid-infrared spectroscopy, portable X-ray fluorescence). In some cases, it may require further work to derive appropriate uncertainty estimates alongside those measurements. Furthermore, it does assume that the soil attribute value affected to the horizon is representative: this may only be the case when sampling has been done based on pedogenetic horizons, and when the soil property is reflected in soil morphology.
There are several ways both the prediction results and the conceptual modelling framework could be improved. Some covariates were not available for the study, but will soon be available for New Zealand landscape, in particular a better parent material map that includes pumice and tephra deposits, which are critical to many soil-forming processes, in particular in the North Island. Other covariates that could potentially improve the model are related to the management effects that are affecting soil pH distribution: better land use information could improve estimates on productive land, where liming is practised. Similarly, the collection of more observation data to compensate for imbalance in the calibration dataset should be aimed towards alkaline soils and non-productive land, in particular for deeper horizons.
Other issues are inherent to the method itself: in particular, using data augmentation inflates the number of data points to process, so computational requirements to calibrate the model are often higher than for a 2.5D approach, unless the latter uses a very large number of horizons. The computational cost is also very large when using the trained model to generate products, as predictions are (i) generated for every centimetre between the soil surface and the maximum soil depth, and then (ii) an integral is calculated from this suite of estimates to calculate values for specific depth intervals. Kriging of residuals was not tested in this study for this reason, but should be tested in the future. Nevertheless, we found the benefits associated with a 3D model (in particular the ability to adapt to different depth supports depending on the stakeholder's needs) to outweigh these issues: the use of high-performance computing will allow to by-pass most, if not all, the computational challenges.

Conclusions
This study presented the first mapping of a soil attribute for New Zealand-in this instance, soil pH-at 100 m resolution. To do so, we introduced a novel prediction method that draws virtual samples from the actual soil observations recorded from a depth interval, which artificially augment the training dataset. This allowed us to represent a wider range of depths associated with any given pH measurement, but also the uncertainties associated with that measurement. The augmented dataset was then used to train a single, 3D prediction model based on a quantile regression forest. This 3D DSM approach to soil pH modelling had similar performance to the more established 2.5D methods, and out-performed significantly all the existing soil pH information products currently available for New Zealand. However, compared with the 2.5D approaches, our model was also able to cater for the needs of a wide range of stakeholders, and we demonstrated the generation of soil pH grids for different depth intervals.