Development of Pedotransfer Functions to Predict Soil Physical Properties in Southern Quebec (Canada)

: Pedotransfer functions (PTFs) are empirical ﬁts to soil property data and have been used as an alternative tool to in situ measurements for estimating soil hydraulic properties for the last few decades. PTFs of Saxton and Rawls, 2006 (PTFs’S&R.2006) are some of the most widely used because of their global aspect. However, empirical functions yield more accurate results when trained locally. This study proposes a set of agricultural PTFs developed for southern Quebec, Canada for three horizons (A, B, and C). Four response variables (bulk density ( ρ b ), saturated hydraulic conductivity (K sat ), volumetric water content at ﬁeld capacity ( θ 33 ), and permanent wilting point ( θ 1500 )) and four predictors (clay, silt, organic carbon, and coarse fragment percentages) were used in this modeling process. The new PTFs were trained using the stepwise forward regression (SFR) and canonical correlation analysis (CCA) algorithms. The CCA- and SFR-PTFs were in most cases more accurate. Θ 1500 and at θ 33 estimates were improved with the SFR. The ρ b in the A horizon was moderately estimated by the PTFs’S&R.2006, while the CCA- and SFR-PTFs performed equally well for the B and C horizons, yet qualiﬁed weak. However, for all PTFs for all horizons, K sat estimates were unacceptable. Estimation of ρ b and K sat could be improved by considering other morphological predictors (soil structure, drainage information, etc.).


Introduction
A thorough knowledge of soil physical properties is important for crop production, water resource management, erosion risk prevention, contaminant discharge, and flooding interventions. Measurement of soil physical properties such as porosity and saturated hydraulic conductivity can be expensive and time-consuming. In order to avoid laborious measurements, Pedotransfer functions (PTFs) are used as predictors to estimate the physical characteristics of soil by using soil properties that are abundant, easy to measure, and inexpensive. PTFs are frequently developed to estimate volumetric water content for any given matric potential, porosity, saturated hydraulic conductivity, or bulk density. PTFs are also used to estimate plant available water [1,2], to model physical properties of soil during seasonal evapotranspiration [3], or to characterize the parameters of water retention curve models [4,5]. The most common predictors of soil physical properties are soil particle size distribution, organic matter content, coarse fragment content, and sometimes bulk density. Some authors also used texture class [6], moisture class [7], and soil morphological data such as soil structure [8] and color [9].
According to Patil and Singh [10], there exist two methods of PTF development: mechanistic and empirical approaches. Mechanistic approaches translate easily measured soil

Study Area
The study was conducted in the Monteregie agricultural area, located southeast of Montréal, Quebec, Canada. The climate of this region is temperate, with an average air temperature of −10.2 • C in January and 20.4 • C in July [34]. In terms of yearly averages, the duration of the frost-free period is 206.5 days, total rainfall is 931.7 mm, and total snowfall is 224.5 cm. Monteregie is one of the largest and most productive agricultural areas in Quebec [35]. Analyses conducted on both ground and surface water using bacteriological and physicochemical indices revealed that water quality is poor at many sampling points [36]. Soil quality was also affected by nutrient leaching, erosion, and overfertilization [37]. Greater use of beneficial management practices is, therefore, needed to ensure soil and water conservation in this region. Development of a set of appropriate PTFs to estimate secondary soil properties would be helpful for planning beneficial management practices implementation.
A high degree of pedodiversity and soil texture variability is perceived in the study area ( Figure 1). The soils in this region are gravelly, sandy, loamy, clayey, and organic soils [38]. Many soil taxonomic orders (as defined by the US system of soil taxonomy) are present, including spodosols, inceptisols, and histosols [39]. The soils of the region tend to have poor natural drainage; however, after artificial drainage, they become mostly moderately well drained [40]. Several soil surveys of southern Quebec have been updated since 1975 and are available on the Canadian Soil Information Service website [41].

Soil Data
Agriculture and Agri-Food Canada (AAFC) has maintained an analytical soil database for southern Quebec since 1975. This database contains a set of georeferenced samples from A, B, and C horizons. Soil physical data (primary and secondary properties) from the analytical soil database were used to develop and assess the proposed PTFs.
Primary soil properties are the first and second components of particle size distribution, such as clay, silt, organic carbon (OC), and coarse fragment (CF) percentages. These properties, which define soil pore space, have an impact on soil water-holding properties, hydraulic conductivity, and soil bulk density. That is why they were chosen as PTF predictors. To avoid multicollinearity problems, sand percentage was not considered. This choice implies the exclusion of organic soils, since they have no mineral content. Soil texture (clay and silt) was determined by the hydrometer method [42], CF (>2 mm) content was determined by sieving [42], and OC content was determined by the Walkley-Black method [43].

Statistical Methods
For a given soil profile, the distribution of soil properties changes with depth. For instance, OC tends to decrease with increasing depth [47]. In this paper, most of the studied soils are tilled, which can also influence physical properties (ρ b , θ, and K sat ) of the A horizon [48]. Consequently, a PTF solely based on A horizon properties is not suitable to estimate soil physical properties at other depths. The dataset used for PTF development in this paper was stratified according to soil horizons (A, B, and C).
Before developing new PTFs, a preliminary study of predictors and response variables is essential [49]. This preliminary study was conducted on each soil dataset corresponding to a soil horizon. The first step was outlier detection. Values larger than three standard deviations from the mean value were regarded outliers and removed from the dataset. Two development approaches were tested: one based on stepwise forward regression (SFR) and the other based on canonical correlation analysis (CCA). The CCA method requires that each soil sample contains all four predictors (clay, silt, OC, and CF) along with the four response variables (θ 1500 , θ 33 , ρ b , and K sat ). To be consistent in both CCA and SFR development methods, only the soil samples having these four predictors and four response variables were kept. In statistical regressions, predictors and response variables must be normally distributed. In order to respect the normality hypothesis, some variables were transformed using the Box-Cox algorithm. The Box-Cox applies a power transformation λ, which maximizes a log-likelihood function (Equation (1)). When the value of λ is 1, a logarithm transformation is applied (Equation (2)).
A correlation study was performed on both predictors and response variables. A strong correlation between predictors indicates that the information is redundant. Adding highly correlated predictors to a PTF will not improve its prediction potential. By contrast, a strong correlation between a predictor and a response variable indicates that selection of the predictor will improve the PTF. Correlation between response variables indicates the linearity of these secondary soil properties.

Stepwise Forward Regression (SFR)
SFR is a commonly used method in statistical regression to identify the most significant predictors when estimating a response variable. The selection of predictors is based on entrance and exit thresholds, which are set according to the p-value of regression coefficients. The p-value is computed to decide whether a given input variable should be considered as a predictor or not. This p-value must be lower than the entrance significance threshold; otherwise, the predictor is rejected. Each time a new predictor is accepted in the regression, the p-values of all previously accepted predictors are recomputed; predictors are retained only if their new values are lower than the exit threshold. Significance levels are fixed at 5% for entrance and exit from the regression model. The method is applied to each response variable using the training dataset.

Canonical Correlation Analysis (CCA)
The objective of a CCA is to transform predictor (x) and response (y) variables using linear combinations in two sets of canonical variables, U j and V j respectively. The parameters are calculated such that the correlation between canonical variables U j and V j is maximal. However, the internal correlation of the different components of each canonical variable is minimal. Canonical variables are generated using canonical coefficients (a ij and b ij ) and mean-centered variables (see Equations (3) and (4)).
where x i is an observed value of a predictor, x i is the average value of the predictor, y i is an observed value of a response variable, and y i is the average value of the response variable.
One of the attractive features of CCA is that canonical variables enable grouping redundant properties into a single component. Each canonical variable related to predictors (U j ) contains the maximum amount of information available having an optimal correlation with the canonical variable related to the response variables (V j ) [50]. However, in the context of the PTF development, the goal is not to generate canonical response variables V j , but rather to predict response variables. Nevertheless, if there are interconnections between secondary and primary soil properties, it can be useful to perform a CCA with predictor and response variables and a multiple regression using U j as predictors of the response variables. In doing so, a maximum of information derived from the predictors is translated into U j canonical variables, which are then optimized according to the V j canonical variables computed from the response variables. It has been demonstrated that performing a regression on a system of canonical variables gives satisfactory results when used in an estimation process in the presence of collinearity [51].
The CCA-PTF development method is depicted in Figure 2. The first step of this method is to conduct a CCA with predictors x i (clay, silt, OC, and CF) and response variables y i (θ 1500 , θ 33 , ρ b , and K sat ) according to a training dataset for a given soil horizon. The next step is to keep the U j canonical variables and their canonical coefficients a ij . V j canonical variables and their canonical coefficients b ij are not used. As mentioned before, each U j has a maximum correlation with its corresponding V j calculated from the response variables, y i . A multiple regression is then performed for each of the response variables and U j variables from the training dataset. Only U j variables with a significant regression coefficient are retained in the regression (the coefficient must differ from 0 with a significance p-value of 5%). Regressions are then recomputed for all other response variables from the training dataset. The next step is to produce a set of canonical variables using predictors x i from a validation dataset, as well as the previously generated mean predictors and canonical coefficients a ij . Regression parameters and previously determined intercepts (from the training dataset) are then used with the new canonical variables to estimate the response variable y i . Finally, accuracy of the results is assessed by comparing the estimated response variables and response variables from the validation dataset.

Accuracy Assessment
To evaluate the estimation quality of the developed PTFs, the determination coefficient (R 2 ), root-mean-square error (RMSE), bias, and Nash-Sutcliffe efficiency (NSE) index [52] were calculated for each secondary soil property during the accuracy assessment procedure. The RMSE (Equation (5)) quantifies the contribution of systematic and random errors expressed in measurement units, and the bias (Equation (6)) quantifies the systematic error (over-or underestimation), also expressed in measurement units [53]. The NSE (Equation (7)) is used to characterize the goodness of fit of a model. NSE values can range from −∞ to 1. An NSE value equal to 1 indicates perfect modeling, while a value below 0 means that the average of measured values is a better predictor than the model predictions. When the NSE value is equal to 0, the performance of both predictors is similar. A classification of the NSE applied to the PTFs is detailed in Table 1. A cross-validation procedure was conducted on the developed PTFs. Each horizon dataset was randomly split into two datasets: one for training and one for validation [54]. A ratio of 1: 4 was used to randomly split each horizon dataset into a training dataset (75%) and a validation dataset (25%). Minimum and maximum values for each response variable were assigned to the training dataset to prevent extrapolation. This procedure was repeated 10,000 times from the original horizon dataset. At each loop, a PTF was calculated from the training dataset and then applied on the validation dataset. Each metric (R 2 , RMSE, bias, and NSE) was calculated from the estimated and the measured soil physical property of the validation dataset. At the end of the final iteration, the average and confidence intervals of each metric were calculated (probability = 95%). The existing PTFs were also evaluated at each horizon using the corresponding dataset. The same metrics were calculated but in a context of independent validation.
where E is the estimated value, M is the measured value, and N is the number of samples.

Data Exploration
Means and coefficients of variation (CV) were calculated for each soil parameter and horizon dataset ( Table 2). The mean percentage of OC, θ 1500 , and K sat decreased with soil depth (from the A to the C horizon), while ρ b increased from the A to the C horizon. OC is the result of plant decomposition and manure application, which take place within the top 20 cm soil layer. Soil compaction increases with depth ρ b and reduces the porosity, thus determining the available space for water. It should be noted that CVs were generally lower than 1, except for CF and K sat , which often show very high spatial variability at the field scale [55]. Statistical analysis of soil parameters was conducted on normally distributed data. The data transformations applied to each variable (when required) are shown in Table 3. Distributions, scatter plots, and Pearson's correlation coefficients of both primary and secondary soil properties are illustrated in a matrix form for each soil horizon in Figure 3. The cross-correlation coefficients obtained between predictors were below 0.5, absolute value, at all soil horizons. However, these correlation coefficients were often significant, which means that certain predictors were collinear (i.e., silt, clay, etc.). Table 3. Normal distribution transformations of soil properties.

Soil Properties A Horizon B Horizon C Horizon
Primary Most of the highest correlation coefficients, in absolute values, were obtained between predictors and response variables. θ 33 was positively correlated with silt (0.60, 0.49, and 0.50) and clay (0.52, 0.80, and 0.80), at the A, B, and C horizons, respectively. θ 1500 was also positively correlated with silt (0.60, 0.46, and 0.51) and clay (0.76, 0.86, and 0.89), at the A, B, and C horizons, respectively. Clay content has an impact on water retention properties, as water content tends to be greater in soils that have a high clay percentage. A negative correlation was observed between ρ b and OC percentage (−0.56, −0.52, and −0.55) at the A, B, and C horizons, respectively. In the C horizon, clay percentage was negatively correlated to ρ b (−0.52). K sat had weak correlation coefficients with all predictors for the A horizon. However, negative correlations of −0.44 and −0.56 with silt percentage were observed at the B and C horizons, respectively. The effect of CF on K sat can be either negative or positive, depending on CF location. When CFs are on the soil surface, they increase infiltration by preventing the soil from sealing, but they reduce infiltration when contained within the soil layer [56][57][58]. The relationship between OC and K sat can also vary. Some authors argue that OC increases K sat by improving soil structure [59], while others conclude that it decreases K sat because OC retains water, its aggregates increase tortuosity, and the quality/kind of organic matter may affect hydraulic properties [23]. Table 4 shows the regression coefficients obtained using the SFR method. Standardized coefficients show the weight accorded to predictors in the developed PTF (coefficients without unit effect measurement). Table 4. Standardized (St. *) and non-standardized (Non st. **) regression coefficients obtained by stepwise forward regression.

Response
Variables For the estimation of θ 33 , the A horizon PTF used the following predictors: OC, silt, and clay percentages, listed by decreasing weight. The PTFs developed for the B and C horizons had the highest weight of clay, followed by CF percentage, and small weights of silt and OC. The standardized regression coefficients of OC were stronger for the A horizon, which is explained by the abundance of OC on the soil surface.
In the case of the estimation of θ 1500 , clay and silt, followed by OC, were retained as predictors for the A horizon. Coefficient values were positive, indicating that increases in these properties correspond to increases in θ 1500 . At the B and C horizons, clay percentage coefficients were broadly higher, followed by silt percentage. As shown in Figure 3, the correlation between θ 1500 and clay was the highest at each soil horizon. Increasing clay content increases the soil water-holding capacity.
In the case of the estimation of ρ b , a negative weight was given to OC and clay content for all horizons. As mentioned above, a negative correlation was observed between ρ b and these predictors. At the B horizon a moderate positive weight was also given to silt content. At the C horizon, moderate positive weights were given to CF and silt percentages.
The predictors selected for the estimation of K sat were different for each horizon (Table 4). At the A horizon, only CF and OC were selected with a positive weight. OC was also given a positive weight at the B horizon, followed by silt with a negative weight, and CF with a positive weight. At the C horizon, silt was selected with a negative weight, followed by CF (positive weight) and OC (negative weight).

Canonical Correlation Analysis-Based Pedotransfer Functions
Correlations between canonical variables U and V are presented in Table 5. As expected, the correlation decreased from the first to the fourth canonical variable. The weights accorded by the PTF to a predictor were determined using a combination of the weight of the predictors related to the canonical variables (Table 6) and the weight given to the canonical variables obtained by regression (Table 7) between response variables. Because of the scale effect, it is recommended to use correlation coefficients to describe the contribution of a predictor to a canonical variable, instead of interpreting canonical coefficients a ij [60]. Canonical coefficients are only used to calculate canonical variables.

Contribution of Predictors to Canonical Variables
At the A horizon, the predictors making the highest contribution to U 1 were OC and clay percentages, followed by silt with a moderate correlation coefficient ( Table 6). The main contributions to U 1 at the B and C horizons came from the clay percentage followed by the silt percentage and OC. The OC contribution to the B and C horizons was weak compared to its contribution to the A horizon. The dominant contributions to U 2 at the A horizon (highest negative correlation) came from OC and clay content, with a similar contribution but with positive value (Table 6). OC was also the predictor with the highest correlation at the B horizon, but with a slightly lower and positive value. Silt percentage made a moderate negative contribution. At the C horizon, the contributor with the highest correlation was silt, followed by OC, with a similar absolute contribution. Thus, U 2 is essentially explained by the OC contribution, although, depending on the horizon, clay and silt contribute to that canonical variable. For all soil horizons, the predictor with the highest correlation with U 3 was CF (Table 6). It should be noted that a negative correlation was observed with both the A and B horizons. It was not possible, regarding U 4 , to identify one or two predictors that were applicable to all three horizons. In terms of predictive power, the first canonical variable U 1 , showed the highest correlation with most response variables, in all horizons. The strength of correlation then decreased from U 2 to U 4 with most response variables.

Regressions Using Canonical Variables as Predictors
For the estimation of θ 33 , U 1 was the only canonical variable selected for all horizons, with correlation coefficients varying between 0.75 and 0.84 (Table 6). U 1 was essentially defined by clay and silt percentages in these horizons, but OC also made a strong contribution in the A horizon. The weights given to the predictors were similar to those obtained with the SFR approach. The results of the regression used to estimate θ 1500 were similar to those estimating θ 33 , in terms of predictor weight (Table 6) and regression coefficients of the canonical variables (Table 7). A strong correlation between θ 1500 and θ 33 was previously observed (Figure 3). U 1 was strongly correlated with θ 1500 (Table 6), which explains why it was once again retained in the regression (Table 7). Clay also had a large impact on these functions. U 2 was selected for the A horizon, which was mostly correlated with clay and OC contents. Again, the resulting PTF was similar to the one developed with the SFR approach.
For each horizon, U 1 and U 2 were selected to estimate ρ b ( Table 7). As previously mentioned, U 1 was explained by clay, silt, and OC percentages, while U 2 was mostly explained by OC, followed by silt percentage for B and C horizons. OC was more important in the A horizon and decreased with increasing soil depth. Multiplication of the correlation coefficient (Table 6) by the regression coefficients (Table 7) gives the effect of a predictor on a response variable. Negative weights were given to clay and OC percentages. Silt had a negative effect at the A horizon and a positive effect at the B and C horizons. Organic matter has a lower ρ b than mineral material; thus, the overall ρ b is reduced when the organic matter percentage in mineral soil increases. This could explain the negative effect of OC in both CCA-and SFR-derived PTFs for ρ b prediction. The negative contribution of clay in PTFs predicting ρ b was also observed by Jones [61].
In the case of K sat estimation, the correlation with canonical variables was different for each horizon. For the A horizon, K sat was slightly correlated to U 3 (negatively; Table 6). The regression between K sat and its canonical variables gave a negative weight to U 3 ( Table 7). CF was the main predictor for U 3 (negative weight), followed by silt (positive weight) and OC (negative weight). The resulting combination of correlation and weight showed the positive effects of OC and CF, and the negative effect of silt on K sat . These results are consistent with the previously conducted cross-correlation analysis (Figure 3). For the B horizon, the regression selected U 2 and U 1 with positive weights (Table 7). U 2 was correlated with K sat , followed by U 1 (moderately). As a result, OC was the most contributing predictor (positive effect), followed by silt and clay (negative), and CF (small positive). These results are consistent with those of the cross-correlation analysis (Figure 3), where K sat was positively correlated with OC and CF, and negatively correlated with silt and clay percentages. For the C horizon, U 3 was selected as a predictor with positive weight, while U 2 and U 1 were selected with negative weights. The correlation between U 1 and K sat was similar to that obtained with U 2 in absolute terms. However, the correlation obtained with U 3 was weak ( Table 6). The negative coefficients for U 1 and U 2 indicate that clay and silt made a negative contribution to this PTF, while the effect was negative for OC. The positive correlation coefficient for U 3 was explained by a positive effect of CF.
Neither the SFR nor the CCA method selected identical predictors at each horizon. However, it is well known that higher clay-to-silt percentages increase soil water retention and that higher CF reduces this property [62]. In this study, increased OC content led to increased K sat . However, it has been demonstrated that the effect of clay is positive when its proportion is less than 30%, but varied and more complex when the proportion is higher [63].  Accuracy assessment plots of the PTFs developed to estimate θ 33 are presented in Figure 4a. PTFs'S&R.2006 for A and C horizons had a lower R 2 than both the SFR-and the CCA-derived PTFs. However, they also had the best R 2 for the B horizon. Nevertheless, PTFs'S&R.2006 were more erroneous in terms of RMSE and bias. Estimation quality for θ 33 was very similar with both the SFR and the CCA methods (R 2 values ranging from 0.53 to 0.70). In terms of NSE, the performance of PTFs'S&R.2006 was unsatisfactory with a negative value for the A horizon, satisfactory for the B horizon (moderately), and unsatisfactory for the C horizon (weak). The SFR-and the CCA-derived PTFs performed equally well with a moderate NSE for the A horizon and a good NSE for the B and C horizons. Accuracy assessment plots of the PTFs developed to estimate θ 1500 are illustrated in Figure 4b. The same pattern as that observed with PTFs'S&R.2006 used to predict θ 33 was found with θ 1500 . In comparison with θ 33 , the performance of θ 1500 PTFs was slightly higher. The R 2 values were good, especially for the B and C horizons. In terms of NSE, the performance of PTFs'S&R.2006 was unsatisfactory for the A horizon (weak) and satisfactory for the B and C horizons (moderate). The SFR-derived PTFs outperformed the CCA-derived PTFs; however, for both methods, the NSE was qualified as good for all horizons.

Validation and Comparison with the Saxton and Rawls's PTFs
Accuracy assessment plots of the PTFs developed to estimate ρ b are illustrated in Figure 4c. PTFs'S&R.2006 gave the best ρ b prediction for the A horizon, with the highest R 2 and lowest RMSE (0.48 and 0.15 g·cm −3 ). For this horizon, the SFR-and CCA-derived PTFs led to the same R 2 and RMSE values (0.28 and 0.15 g·cm −3 , respectively). Both methods also generated the same R 2 and RMSE values at the B and C horizons. In terms of NSE, the performance of PTFs'S&R.2006 was satisfactory (moderate) for the A horizon and unsatisfactory (weak) for the B and C horizons. The SFR-and the CCA-derived PTFs performed equally well with a weak NSE for the A and B horizons and a moderate NSE for the C horizon.
Accuracy assessment plots of the PTFs generated to estimate K sat are illustrated in Figure 4d. At the A horizon, the SFR method performed better than the CCA method. In comparison to PTFs'S&R.2006 and the CCA-derived PTF, the SFR-derived PTF had a higher R 2 and a lower RMSE. However, its estimation quality remained weak (R 2 of 0.15 and RMSE of 30.7 cm·h −1 ). The estimation quality of PTFs'S&R.2006 was poor for the B horizon. For this horizon, both CCA-and SFR-derived PTFs were slightly more accurate (R 2 of 0.42 and RMSE of 22.9 cm·h −1 for SFR-PTFs). At the C horizon, results were similar to those of the B horizon, but with lower RMSE (12.3 cm·h −1 ). In terms of NSE, PTFs'S&R.2006 were unsatisfactory for all horizons (negative values). The same conclusion was noted for both SFR-and CCA-derived PTFs, but with weak positive values for the B horizon. The performance remained unsatisfactory. The SFR-derived PTFs slightly outperformed the CCA-derived PTFs; however, for both methods, the NSE was qualified as unacceptable for the A and C horizons and weak for the B horizon.

Discussion
The above results demonstrate the potential of the primary soil properties (clay, silt, OC, and CF percentages) to estimate the secondary soil properties (θ 1500 , θ 33 , ρ b , and K sat ) for different horizons (A, B, and C) with varying accuracy rates. The hypothesis tested in this study was the ability of locally trained PTFs, using SFR and CCA algorithms, to produce more accurate estimates than the PTFs'S&R.2006 that were trained with global soil data. As expected, our results highlighted that, in most cases, locally trained PTFs achieved the best accuracy in estimating secondary soil properties.
In the case of θ 33 , PTFs'S&R.2006 results were systematically underestimated. This underestimation increased in the deeper horizons (soil depth), which is unsurprising, since the PTFs'S&R.2006 were developed using A horizon soil samples. Both the SFR-and CCA-derived PTFs had moderate performances for the A horizon and good performances for the B and C horizons. Pollacco [64] evaluated the performance of eight different PTFs to predict θ 33 . The RMSE values for these PTFs ranged from 5.7% to 11.1%, whereas they ranged from 4.3% to 7.4% for the PTFs developed in the present study, which is considered acceptable. These results are comparable to the θ 33 accuracy assessment results found in the literature [2,28,65].
In the case of θ 1500 , the PTFs'S&R.2006 were again less accurate with a lower R 2 and higher RMSE and bias than the SFR-and CCA-derived PTFs for the A horizon. Methods developed using SFR and CCA showed similar performance. The SFR method produced slightly higher R 2 and lower RMSE and bias at the A and C horizons. The performance of the two methods was identical at the B horizon. In fact, the clay percentage was strongly correlated with θ 1500 (Figure 3). The reason is that the SFR method selected clay as a predictor, while the CCA method selected the canonical variable to which clay was the major contributor. Both SFR-and CCA-derived PTFs were satisfactory (good) for all horizons. Again, RMSE values obtained in this study (ranging from 3.2% to 5.0%) outperformed the RMSE values obtained by Pollacco [64] (ranging from 4.7% to 7.5%). These results are acceptable and similar to the θ 1500 results found in the literature [2,28,65].
In the case of ρ b estimation, PTFs'S&R.2006 were well adapted to the A horizon. This is likely because the PTFs'S&R.2006 use volumetric water content at saturation (obtained by a PTF) to generate normal density, which is then used with CF to predict ρ b , and they were originally calibrated using the A horizon data. At the B horizon, in terms of estimation errors, our PTFs were less erroneous than the PTFs'S&R.2006. The latter produced the highest R 2 (0.47) at the B horizon, but it also showed an important RMSE value (0.21 g·cm −3 ). Both the SFR-and CCA-derived PTFs outperformed PTFs'S&R.2006 at the C horizon in terms of R 2 and RMSE (0.53 vs. 0.40 and 0.18 vs. 0.28 g·cm −3 ). A comparison with PTFs available in the literature [2,66,67] suggests that RMSE should range between 0.13 and 0.23 g·cm −3 , which was the case.
In the case of K sat , the estimation quality was poor for all horizons. The SFR method slightly outperformed the CAA method at the B horizon, and both methods performed poorly for the A and C horizons yet showed higher accuracy than PTFs'S&R.2006 estimates. In fact, the soil of the A horizon is frequently disturbed by tillage, plant root penetration, and field alterations that modify soil structure. This variation in soil structure could explain the high variability observed in saturated soil hydraulic conductivity (Table 2), which is not explained by soil texture and OC alone. In addition, K sat and CF showed great variability; consequently, averages were less representative for these properties than for other soil properties. Jorda et al. [26] found that the most influential predictor for K sat was land use. They noted a difference between samples in conventional agricultural sites and non-tilled sites. These results demonstrate the importance of soil structure to K sat estimations. The lack of selected predictors that relate to soil structure might explain the difficulty to estimate K sat . Prediction of this physical property could, however, be improved by adding morphological predictors such as soil structure and drainage information.
Overall, the SFR-derived PTFs were equal to or more accurate than the CCA-derived PTFs and were more accurate than the PTFs'S&R.2006. In fact, the major difference between CCA and SFR methods is that CCA always uses all available predictors to develop its canonical variables, whereas the SFR method uses a selection strategy to determine explanatory predictors. The use of all predictors may introduce noise, which explains the observed difference in performance between the two developed PTFs. Even in the few cases where the CCA-derived PTFs outperformed the SFR-derived PTFs (θ 33 for horizon C, for example (Table 8)), we still recommend using the SFR-derived PTF because of the small significant difference in results and its easier execution.

Conclusions
In this study, four secondary soil properties-bulk density (ρ b ), saturated hydraulic conductivity (K sat ), and volumetric water content (θ) measured at two matric potentials, −33 kPa (field capacity (θ 33 )) and −1500 kPa (permanent wilting point (θ 1500 ))-were estimated for A, B, and C horizons for agricultural areas of southern Quebec, Canada. Estimates were performed using existing functions from Saxton and Rawls's PTFs (2006) and new PTFs trained using the stepwise forward regression (SFR) and canonical correlation analysis (CCA) algorithms. Primary soil properties (clay, silt, organic carbon, and coarse fragment percentages) were used as inputs for estimating the secondary soil properties. All PTFs (equations are available in Appendix A) were assessed using the cross-validation technique from which the R 2 , Nash-Sutcliffe efficiency (NSE), root-mean-square error (RMSE), and bias were generated. Except for ρ b for A and B horizons that showed higher accuracy in terms of NSE and R 2 and equal accuracy in terms of RMSE and bias, all the other physical secondary soil properties estimated using either SFR-or CCA-derived PTFs were more accurate, particularly for θ 33 and θ 1500 . According to the NSE index, θ 1500 showed the best performance (qualified as good), followed by θ 33 (qualified as moderate to good) for the different horizons. The NSE index for ρ b was qualified as low, while the NSE index for K sat was qualified as unacceptable to low for the best performances for the different horizons. It is, thus, recommended to retrain the last two soil properties using other morphological predictors such as soil structure and drainage information before considering their use. Overall, the SFR method showed equal to or better performance than the CCA method.