Linear Multi-Task Learning for Predicting Soil Properties Using Field Spectroscopy

: Field spectroscopy has been suggested to be an efﬁcient method for predicting soil properties using quantitative mathematical models in a rapid and non-destructive manner. Traditional multivariate regression algorithms usually regard the modeling of each soil property as a single task, which means only one response variable is considered as the output during modeling. Therefore, these algorithms are less suitable for the prediction of several key soil properties with low concentrations or unobvious spectral absorption signals. In the current study, we investigated the performance of a linear multi-task learning (LMTL) algorithm based on a regularized dirty model for modeling and predicting several key soil properties using ﬁeld spectroscopy (350–2500 nm) as an integrated approach. We tested seven key soil properties including available nitrogen (N), phosphorus (P) and potassium (K), pH, water content (WC), organic matter (OM), and electrical conductivity (EC) in drylands. The model performances of LMTL models were compared with the commonly used single-task algorithm of the partial least squares regression (PLS-R). Our results show that the LMTL models outperformed the PLS-R models with the advantage of shared features; the ratio of performance to deviation (RPD) values in the validation set improved by 10.24%, 4.93%, 25.77%, 11.76%, 6.74%, 53.13%, and 3.15% for N, P, K, pH, WC, OM, and EC, respectively. The best prediction was obtained for OM with RPD = 2.29, indicating high accuracy (RPD > 2). The prediction results of N, P, WC, and pH were categorized as of moderate accuracy (1.4 < RPD < 2), while K and EC were categorized as of poor accuracy (RPD < 1.4). However, the explanatory power of the LMTL models was moderate due to fewer features being selected by the regularization algorithm of the LMTL approach, which should be further studied in the soil spectral analysis. Our results highlight the use of LMTL in ﬁeld spectroscopy analysis that can improve the generalization performance of regression models for predicting soil properties.


Introduction
The assessment and monitoring key soil properties are important processes for quantifying soil quality and developing tools for soil management in general and precision agriculture in particular. Conventional laboratory methods for detecting soil properties and quality are expensive and time-consuming. An alternative approach, namely reflectance spectroscopy, has been proposed as The overarching aim of the study was to explore the performance of (linear multi-task learning) LMTL algorithms based on a regularized dirty model (composed of shared and non-shared features) for modeling and predicting several key soil properties using field spectroscopy (350-2500 nm). The specific objectives are three-fold: (1) investigate the predictive ability of field spectroscopy to predict soil N, P, K, WC, pH, EC, and OM in dryland regions; (2) compare the prediction performance of LMTL via a regularized dirty model with traditional single-task learning via the PLS-R; and (3) study the shared features between different soil properties in LMTL as compared to single-task learning.

Study Area
The soil samples were collected from four small watersheds located in the central Negev Highlands of Israel (30 • 54 N, 34 • 49 E). The mean annual rainfall in this area is 95 mm and is limited to the winter season, with a high annual variability (ranging from 20-180 mm). The study site included watersheds containing runoff harvesting systems (RHSs), which are used for increasing agricultural production or for developing afforestation systems in drylands. RHSs are designed to collect runoff water and nutrients from small rocky watersheds into ponds bounded by soil dikes (termed limans) that are used as afforestation groves. Geologically, the area is composed of limestone and chalk of the Turonian age. The hillslopes are relatively steep (up to 29 degrees) and subdivided into two distinct sections: (1) the upper parts are mainly barren, with steep limestone rocky outcrops and shallow patches of soil cover; and (2) the lower parts consist of colluvium embedded with unconsolidated rocks [34,35]. A similar subdivision is also observed along the channels. The upper part of the channel is rocky while the lower part is covered with an alluvial fill [34]. Lithology across the study site consists of limestone dominantly covering the area, frequently mixed with dolomite, chalk and marl. The stream channels characterized by loessial soils, and are composed of clay, silt, and gravel alluvial soil. In general, the RHSs are located in the downstream area of the watershed where there is a relatively high volume of alluvial soil. In the current study, the differences in soil properties at different locations resulted from the proximity to the RHSs (upstream, downstream, and RHS).

Soil Field Spectroscopy Measurements
The soil spectra were acquired under field conditions (undisturbed samples) before soil sampling with the portable analytical spectral device (ASD) Field Spec ® Pro spectrometer, with a 25 • field of view. During the field campaigns, the skies were clear, and the soil spectral samples were taken on exposed bare soil. The field spectral measurement was vertical in relation to the soil surface using a bare fiber with a height interval of about 1 m, and exposed to the sun as illumination. The instrument was repeatedly calibrated to spectral reflectance using a standard white reference panel (Spectralon Labsphere Inc., North Sutton, NH, USA). To reduce spectral noise and the effects of micro-topography shadowing, four spectral readings for each soil sample were measured and averaged to a final value representing the field sample. The ASD covers a spectral range of 350-2500 nm with a spectral resolution that varies from 3 nm in the VIS-NIR range to 8-10 nm in the SWIR range. We resampled the ASD's spectral band to 5 nm uniformly along the entire spectral region.

Soil Sampling and Physicochemical Lab Analysis
The soil sampling included 10 replicates in each sampling location in the watershed in the upstream, downstream and RHS sites (n = 30 replicates in a watershed) with a total of 120 soil samples collected in September 2015, which is at the end of the dry season, at a depth of 0-0.15 m. All soil samples were transferred to the laboratory and were stored unopened at room temperature until they were analyzed. Soil was air dried, passed through a 2-mm sieve, and analyzed for soil physiochemical properties. The soil properties included the following: N was measured by potassium chloride extractions [36]; P was measured by the Olsen method [37]; K was measured by a flame photometer [38]; WC was measured by drying the soil in 105 • C; pH was measured in a saturation paste using a handheld portable probe; EC was measured in the extracts from the saturation paste by a handheld portable probe; and OM was measured by drying the soil for two hours at 500 • C. The results of the soil physiochemical properties were tested for their statistical variation. The outliers were determined by boxplot [39], and 12 samples with extremely large N or K concentrations beyond the upper whisker were removed. Thus, the remaining 108 samples were used for further study in this work.

Spectral Preprocessing and Transformations
We removed the low signal-to-noise ratio wavebands at both ends and the atmospheric water absorption wavebands ranging from 1350-1420 and 1800-1960 nm. The resultant reflectance spectrum of 355 wavebands (400-2400 nm) was henceforth used. In addition, several commonly used preprocessing methods and transformations were sequentially applied to the soil spectral reflectance in this study, including: the Savitzky-Golay filtering algorithm (SG) [40] with a second-order polynomial that was selected to smooth spectral reflectance; a standard normal variate (SNV) [41,42] that was performed to correct additive and multiplicative effects; and a first derivative (FD) that was conducted to remove the baseline and improve the linear trend [43,44].

Learning Algorithms
To compare the performance of multi-task and single-task algorithms, the regression models were built upon the same dataset. As a commonly used single-task algorithm, the PLS-R is particularly useful for predicting a set of dependent variables from a large set of independent variables [45]. To overcome the problem of collinearity between predictors, the PLS-R decomposed independent variables and dependent variables by linear combinations to extract latent variables (LVs, or components) and built the regression model based on the LVs instead of the original training variables [15]. To avoid overfitting or underfitting, a leave-one-out cross-validation was used to determine the number of LVs with the smallest mean squared error in calibration. The variable importance for projection (VIP) scores [46] obtained by PLS-R has been recognized as a useful measure to identify important wavelengths when the score is more than 1 [47,48].
Regularization is commonly used as a shrinkage method in least square linear regression modeling to avoid overfitting [49]. The main idea of multi-task learning in multiple linear regressions is to take advantage of the shared feature structure (block-sparse) between each task and model all the tasks simultaneously with L 1 /L q norm block-sparse regularization, particularly with L 1 /L ∞ [50,51]. Since non-shared features might have existed for several specific tasks, the regression coefficient matrix W (features × tasks) cannot fall cleanly into any one structural bracket. To overcome this, a dirty model was proposed to decompose the matrix W into a block-sparse matrix W b (corresponding to the shared features) and an elementwise sparse matrix W e (corresponding to the non-shared features); details can be found in [31]. The object function is: subject to : where X i is the spectral matrix of task i that is the same for each task, Y i is the predicted variable of task i, and λ b and λ e are regularization parameters to control the degree of penalty on W b and W e , respectively. For the calibration and validation of the regression models (multi-task learning and single-task learning), 108 soil samples were divided into two parts with a split ratio of 0.7 to 0.3, respectively, based on the Kennard-Stone algorithm [52], conducted on the preprocessed spectral matrix. Thus, 76 samples were selected as the calibration set (also used for cross-validation during training) and the Remote Sens. 2017, 9, 1099 5 of 19 remaining 32 as the validation set (independent testing for the established model). To avoid attributes in the higher numerical ranges dominating those in the lower numerical ranges, both the calibration and validation sets were standardized by mapping their mean and standard deviations to 0 and 1, respectively, before calibration and validation [53].

Accuracy Comparison
In this study, the prediction accuracy of the regression models was validated and compared with the ratio of performance to deviation (RPD) of the validation set that was calculated as: where m is the number of testing samples in the validation set, y i is the real value of sample i, f (X i ) is the predicted value of sample i, and y is the average value of y. According to [54], the following three categories of predictability were adopted: Category A (RPD > 2.0) with good accuracy; Category B (1.4 < RPD < 2.0) with moderate accuracy; and Category C (RPD < 1.4) with poor accuracy. The ratio between the interpretable sum squared deviation and the real sum squared deviation (SSR/SST), which was calculated as: was recognized as the proportion of the variability of the dependent variable explained by the regression model [55][56][57][58][59]. A good model should have both high RPD and SSR/SST [13]. Usually, the SSR/SST should be greater than 0.5 to ensure the model stability. All mathematical analysis methods mentioned above were conducted in MATLAB (MathWorks, Natick, MA, USA). The process of single-task learning and multi-task learning were carried out with MALSAR Version 1.1 [60]. Table 1 shows the descriptive statistics of the measured concentrations of the seven soil properties in the 108 samples used for calibrating and validating the regression models. The results showed high variation in the soil properties, indicating that the data could be used for the regression analysis. Table 2 shows the Pearson correlation coefficients (R) between different soil properties. The highest positive correlation was found between OM and WC (R = 0.74). Moderately positive correlations were found between N and P (R = 0.69), N and K (R = 0.58), P and K (R = 0.57), and OM and K (R = 0.51). In addition, it was found that pH has a low negative correlation with every other soil property.  Abbreviations used: available nitrogen (N), available phosphorous (P), available potassium (K), water content (WC), pH, electrical conductivity (EC), and organic matter (OM). Figure 1 shows the field spectral reflectance of 108 soil samples. The spectra show large variations between different samples that are caused by soil color, soil composition, water content, particle size, and the coverings on the soil surface, such as residual dry vegetation, rock particles, and mineral deposits [61]. The obvious spectral signatures near 930 nm may be related to the absorption of hydroxyl in Fe oxides [25], the ones near 940 nm, 1150 nm and 1450 nm may be influenced by the absorption of the atmospheric water content [62]; the ones near 1765 nm may be related to the signal jump of the spectral instrument; the ones near 2205 nm may be related to the absorption of Al-OH [25]. The spectral signatures ranging from 2300-2400 nm may be related to several other clay minerals and soil organics [63].  Abbreviations used: available nitrogen (N), available phosphorous (P), available potassium (K), water content (WC), pH, electrical conductivity (EC), and organic matter (OM). Figure 1 shows the field spectral reflectance of 108 soil samples. The spectra show large variations between different samples that are caused by soil color, soil composition, water content, particle size, and the coverings on the soil surface, such as residual dry vegetation, rock particles, and mineral deposits [61]. The obvious spectral signatures near 930 nm may be related to the absorption of hydroxyl in Fe oxides [25], the ones near 940 nm, 1150 nm and 1450 nm may be influenced by the absorption of the atmospheric water content [62]; the ones near 1765 nm may be related to the signal jump of the spectral instrument; the ones near 2205 nm may be related to the absorption of Al-OH [25]. The spectral signatures ranging from 2300-2400 nm may be related to several other clay minerals and soil organics [63].  Table 3 shows the prediction results of the PLS-R models for the seven soil properties. The numbers of LVs were determined by leave-one-out cross-validations, which are also shown in Table  3. Among all soil properties, OM was the most accurately predicted with a RPD = 2.22, and with a  Table 3 shows the prediction results of the PLS-R models for the seven soil properties. The numbers of LVs were determined by leave-one-out cross-validations, which are also shown in Table 3. Among all soil properties, OM was the most accurately predicted with a RPD = 2.22, and with a model prediction accuracy categorized as A (RPD > 2). The prediction accuracies of P, WC, and pH were categorized as B (1.4 < RPD < 2) with RPD = 1.42, 1.53, and 1.78, respectively, indicating moderate prediction ability. All properties in Category A or B achieved SSR/SST values that were more than 0.5, confirming the model stability for predicting P, WC, pH, and OM. However, the prediction accuracies of N, K, and EC were categorized as C (RPD < 1.4) with RPD = 1.27, 0.97, and 0.64, respectively, demonstrating poor prediction ability. In addition, it is worth mentioning that the SSR/SST of N was 0.66, but the SSR/SSTs of K and EC were over 1 and related to low prediction accuracy and overfitting of the models. Table 3. The used parameters and prediction results of the partial least squares regression (PLS-R) and linear multi-task learning (LMTL) models for predicting seven soil properties.  3.2.2. Feature Importance in PLS-R Figure 2 shows the distribution of the VIP scores of different soil properties over the entire wavelength range (400-2400 nm). We used all of the 355 wavebands in the VNIR/SWIR region in the PLS-R models, which was necessary to evaluate the feature importance in the prediction of each soil property. We identified four feature-block regions with high importance existing in the entire VNIR/SWIR region whose properties were associated with soil OM, WC, clay minerals and Fe oxides [64]. The first region, ranging from 410 to 650 nm, is mostly related to the Fe oxides [19,25], with the 560 nm waveband related to OM [24,65]. The second region ranges from 850 to 1075 nm; of this, the 850-930 nm range is related to hydroxyl in Fe oxides [19,25], the 970 nm waveband is related to the soil water absorption waveband [66], the 1010 nm waveband is a hydrate-related absorption feature [67], and the 1025-1075 nm range is mostly related to the electronic transition bands of Fe 2+ or Fe 3+ [19]. The third region, ranging from 1530 to 1770 nm, is almost entirely related to soil organics [19,25]. The fourth region, ranging from 2005 to 2400 nm, may be connected to water, organics and clay minerals [19,25,68].

Prediction Results
Besides the four feature-block regions, several important individual wavebands with significantly high VIP scores still could be seen in Figure 2. The 730 nm waveband in the prediction of K, WC and EC may be related to the sensitive absorption of soil salinity [67]. The 805 nm waveband is related to the red-edge, which is known to be sensitive to biomass [69], and the predictions of P, K, pH and OM were found to be correlated. The 1120 and 1155 nm wavebands may be related to the v S-O stretching bands of sulfate [70], which can be used to predict P and K. The 1225 nm [71] and 1315 nm [72] wavebands were found to be important bands for predicting soil organics; here, several properties are correlated. The 1450 nm waveband in the prediction of OM may be related to the carboxylic acids in organics [25].
Remote Sens. 2017, 9, 1099 8 of 19 pH and OM were found to be correlated. The 1120 and 1155 nm wavebands may be related to the v S-O stretching bands of sulfate [70], which can be used to predict P and K. The 1225 nm [71] and 1315 nm [72] wavebands were found to be important bands for predicting soil organics; here, several properties are correlated. The 1450 nm waveband in the prediction of OM may be related to the carboxylic acids in organics [25].

Effects of Regularization Parameters on Modeling
The range of λ b was set to 0-200 with a gradient of 10, and the range of λ e was set to 0-50 with a gradient of 1. Figure 3 shows the sparsity (the number of non-zero elements in the regression coefficients) of the block-sparse matrix (W b ), the elementwise sparse matrix (W e ), and the combined regression coefficients matrix (W) of the regression model generated from LMTL for predicting N when changing λ b and λ e ; other properties had similar characteristics (Supplementary Material, Figure S1) and can be seen in the Appendix. According to Figure 3, the sparsity of W b decreased significantly with increasing λ b , especially in the low λ b values ranging from 0 to 75. The sparsity of W e decreased significantly with increasing λ e , especially in the low λ e values ranging from 0 to 3.5. However, the sparsity of W b was also affected by λ e in the range of 0-20, and the sparsity of W e was also affected by λ b in the range of 0-10. Therefore, λ b and λ e controlled the degree of penalty on W b and W e , respectively, but also interacted with each other. Both regularization parameters greatly affected the sparsity of W. The degree of effects gradually weakened with the increase of λ b or λ e .

Effects of Regularization Parameters on Modeling
The range of was set to 0-200 with a gradient of 10, and the range of was set to 0-50 with a gradient of 1. Figure 3 shows the sparsity (the number of non-zero elements in the regression coefficients) of the block-sparse matrix ( ), the elementwise sparse matrix ( ), and the combined regression coefficients matrix ( ) of the regression model generated from LMTL for predicting N when changing and ; other properties had similar characteristics (Supplementary Material, Figure S1)and can be seen in the Appendix. According to Figure 3, the sparsity of decreased significantly with increasing , especially in the low values ranging from 0 to 75. The sparsity of decreased significantly with increasing , especially in the low values ranging from 0 to 3.5. However, the sparsity of was also affected by in the range of 0-20, and the sparsity of was also affected by in the range of 0-10. Therefore, and controlled the degree of penalty on and , respectively, but also interacted with each other. Both regularization parameters greatly affected the sparsity of . The degree of effects gradually weakened with the increase of or .
(a) (b) (c)  Figure 4 shows the RPD and SSR/SST performance of the LMTL prediction models for predicting the seven soil properties of the validation set when changing and . According to Figure 4, the SSR/SSTs of all the properties showed an obviously decreasing trend with increasing and , which is similar to the changing characteristics of model sparsity shown in Figure 3c. The RPD performances generally increased with and in the low-value range and decreased in the highvalue range. The aim of the object function of LMTL is to minimize the overall squared error of the seven soil properties (see Equation (2)). Thus, obtaining the individual maximum RPD for every soil property simultaneously is a difficult task. The locations of the high RPDs and the decreasing trends of N, P, and OM are similar and are dependent on , representing the shared features. The locations of the high RPDs of WC, pH and EC are determined by both and , representing both the shared and non-shared features. The location of the high RPD of K is almost on the border of the regularization parameters, indicating that few features are used. This means that the best prediction of different properties may depend on the special combination of shared and non-shared features; thus, different combinations of and were selected to obtain the best RPDs for different properties under the condition of SSR/SST values higher than 0.5 (Table 3).  Figure 4 shows the RPD and SSR/SST performance of the LMTL prediction models for predicting the seven soil properties of the validation set when changing λ b and λ e . According to Figure 4, the SSR/SSTs of all the properties showed an obviously decreasing trend with increasing λ b and λ e , which is similar to the changing characteristics of model sparsity shown in Figure 3c. The RPD performances generally increased with λ b and λ e in the low-value range and decreased in the high-value range. The aim of the object function of LMTL is to minimize the overall squared error of the seven soil properties (see Equation (2)). Thus, obtaining the individual maximum RPD for every soil property simultaneously is a difficult task. The locations of the high RPDs and the decreasing trends of N, P, and OM are similar and are dependent on λ b , representing the shared features. The locations of the high RPDs of WC, pH and EC are determined by both λ b and λ e , representing both the shared and non-shared features. The location of the high RPD of K is almost on the border of the regularization parameters, indicating that few features are used. This means that the best prediction of different properties may depend on the special combination of shared and non-shared features; thus, different combinations of λ b and λ e were selected to obtain the best RPDs for different properties under the condition of SSR/SST values higher than 0.5 (Table 3).  (f) (g) Figure 4. The ratio of performance to deviation (RPD) (left) and the ratio between the interpretable sum squared deviation and the real sum squared deviation (SSR/SST) (right) performance of linear multi-task learning prediction models for predicting: available nitrogen (a); available phosphorous (b); available potassium (c); water content (d); pH (e); electrical conductivity (f); and organic matter (g) in a validation set when changing the regularization parameters ( and ). Table 3 shows the selected regularization parameters, the numbers of used features and the prediction results of LMTL models for predicting seven soil properties. Figure 5 shows the distributions of the features that were used in the models with different combinations of and (The distributions of the features with specific = 40 and = 10 could be seen in Supplementary Material, Figure S2). The regularization algorithm reduced the numbers of used features to a large extent and selected a few wavebands as optimal features, which made the distributions of the used features sparse. Nevertheless, the RPD performances of all the soil properties increased compared to the single-task models built by PLS-R (Table 3): N increased to 1.40 from 1.27; P increased to 1.49 from 1.42; K increased to 1.22 from 0.97; WC increased to 1.71 from 1.53; pH increased to 1.90 from 1.78; EC increased to 0.98 from 0.64; OM increased to 2.29 from 2.22. Consequently, the prediction accuracy of OM was categorized as A; Category B included pH, WC, P, and N; and K and EC were categorized as C. However, the SSR/SSTs of all properties expect P, decreased due to regularization (Table 3). Despite this, the SSR/SST results were still more than 0.5, which can confirm the model stability, except EC which had an inappropriately large value.

Prediction Results and Used Features
According to Figure 5, the used features of N, P, and OM were all in the block-sparse matrix, indicating shared features. Some of the used features of WC, pH, and EC were in the block-sparse matrix, and others were in the elementwise sparse matrix, indicating that both shared features and non-shared features were used for modeling. The used features of K were all in the elementwise sparse matrix, indicating non-shared features. These results agree with the analysis results in Figure  4. Shared features can be seen in Figure 5a, with most of them in the four feature-block regions. Out of these four regions, important features, such as 805, 1225, and 1315 nm, with large VIP scores that were used in the PLS-R were also recognized as shared features. Some less important features with small VIP scores, such as 1105, 1275, and 1965 nm (Figure 2), were also shared to provide more information. The distribution positions of the non-shared features used by K, WC, pH, and EC are Figure 4. The ratio of performance to deviation (RPD) (left) and the ratio between the interpretable sum squared deviation and the real sum squared deviation (SSR/SST) (right) performance of linear multi-task learning prediction models for predicting: available nitrogen (a); available phosphorous (b); available potassium (c); water content (d); pH (e); electrical conductivity (f); and organic matter (g) in a validation set when changing the regularization parameters (λ b and λ e ). Table 3 shows the selected regularization parameters, the numbers of used features and the prediction results of LMTL models for predicting seven soil properties. Figure 5 shows the distributions of the features that were used in the models with different combinations of λ b and λ e (The distributions of the features with specific λ b = 40 and λ e = 10 could be seen in Supplementary Material, Figure S2). The regularization algorithm reduced the numbers of used features to a large extent and selected a few wavebands as optimal features, which made the distributions of the used features sparse. Nevertheless, the RPD performances of all the soil properties increased compared to the single-task models built by PLS-R (Table 3): N increased to 1.40 from 1.27; P increased to 1.49 from 1.42; K increased to 1.22 from 0.97; WC increased to 1.71 from 1.53; pH increased to 1.90 from 1.78; EC increased to 0.98 from 0.64; OM increased to 2.29 from 2.22. Consequently, the prediction accuracy of OM was categorized as A; Category B included pH, WC, P, and N; and K and EC were categorized as C. However, the SSR/SSTs of all properties expect P, decreased due to regularization (Table 3). Despite this, the SSR/SST results were still more than 0.5, which can confirm the model stability, except EC which had an inappropriately large value.

Prediction Results and Used Features
According to Figure 5, the used features of N, P, and OM were all in the block-sparse matrix, indicating shared features. Some of the used features of WC, pH, and EC were in the block-sparse matrix, and others were in the elementwise sparse matrix, indicating that both shared features and non-shared features were used for modeling. The used features of K were all in the elementwise sparse matrix, indicating non-shared features. These results agree with the analysis results in Figure 4. Shared features can be seen in Figure 5a, with most of them in the four feature-block regions. Out of these four regions, important features, such as 805, 1225, and 1315 nm, with large VIP scores that were used in the PLS-R were also recognized as shared features. Some less important features with small VIP scores, such as 1105, 1275, and 1965 nm (Figure 2), were also shared to provide more information.
The distribution positions of the non-shared features used by K, WC, pH, and EC are shown in Figure 5b and are similar to the shared features in Figure 5a. These indicated that the predictions of the seven soil properties were correlated. However, because different regularization degrees of W b and W e were implemented for different models, the features were divided into either the block-sparse matrix or the elementwise sparse matrix.
Remote Sens. 2017, 9, 1099 12 of 19 shown in Figure 5b and are similar to the shared features in Figure 5a. These indicated that the predictions of the seven soil properties were correlated. However, because different regularization degrees of and were implemented for different models, the features were divided into either the block-sparse matrix or the elementwise sparse matrix.

Discussion
We investigated the performances of the LMTL algorithm for modeling several soil properties simultaneously using field VNIR/SWIR spectroscopy. To the best of our knowledge, most studies dealing with quantitative soil spectroscopy analysis focus on building regression models via individual single-task learning algorithms with only one response variable as the output. PLS-R could also work in a multiple-response mode, but features could not be shared during modeling. In this study, we used PLS-R to represent single-task learning algorithms for comparing the performance of LMTL. Our results show that the use of LMTL algorithms with multiple response variables as output improves the RPD in all of the seven tested soil properties. We found that shared features can be used to improve the generalization performance of regression models for predicting these seven key soil properties. In addition, we found that low concentration soil properties such as K and EC, with few spectral absorption signals are usually difficult to predict with current optical methods.

Comparison of Two Algorithms
The PLS-R models explained most of the variance in the dependent variable with several latent variables obtained from the spectral matrix, and they recognized the important wavebands for each soil property efficiently. Therefore, the SSR/SST values of the PLS-R models were very high, Figure 5. Used features (non-zero items in the transpose of block-sparse matrix W b (a); and elementwise sparse matrix W e (b)) of the linear multi-task learning models for predicting available nitrogen (N), available phosphorous (P), available potassium (K), water content (WC), pH, electrical conductivity (EC), and organic matter (OM), respectively. (1), (2), (3), and (4) represent four feature-block regions.

Discussion
We investigated the performances of the LMTL algorithm for modeling several soil properties simultaneously using field VNIR/SWIR spectroscopy. To the best of our knowledge, most studies dealing with quantitative soil spectroscopy analysis focus on building regression models via individual single-task learning algorithms with only one response variable as the output. PLS-R could also work in a multiple-response mode, but features could not be shared during modeling. In this study, we used PLS-R to represent single-task learning algorithms for comparing the performance of LMTL. Our results show that the use of LMTL algorithms with multiple response variables as output improves the RPD in all of the seven tested soil properties. We found that shared features can be used to improve the generalization performance of regression models for predicting these seven key soil properties. In addition, we found that low concentration soil properties such as K and EC, with few spectral absorption signals are usually difficult to predict with current optical methods.

Comparison of Two Algorithms
The PLS-R models explained most of the variance in the dependent variable with several latent variables obtained from the spectral matrix, and they recognized the important wavebands for each soil property efficiently. Therefore, the SSR/SST values of the PLS-R models were very high, indicating strong explanatory power (Table 2). However, the low RPD values in the validation set indicating prediction accuracy were not satisfactory due to the large number of used features and the high complexity of the PLS-R models. In comparison, the LMTL based on the dirty model algorithm regularized the overall wavebands and built the regression models with selected shared and non-shared features ( Figure 5), which led to certain degrees of improvement of the RPD values in the validation set, indicating higher prediction accuracy and stronger generalization performance. We found that the RPD values improved by 10.24%, 4.93%, 25.77%, 11.76%, 6.74%, 53.13%, and 3.15% for N, P, K, pH, WC, OM, and EC, respectively. It is worth mentioning that the prediction accuracy category of N improved from C to B, and the prediction ability of K was highly improved. Regrettably, the explanatory power of the LMTL models correspondingly decreased as fewer features were used, increasing the model sparsity ( Table 2). The SSR/SST of pH increased with the LMTL model, this result might be because the redundant information was successfully removed by the regularization and the useful information was kept.

The Shared Features
Correlations between the modeling of different soil properties have been suggested by various studies (e.g., [25,28]). Our study also showed that the distributions of the important wavebands with high VIP scores for the seven key soil properties were quite similar, especially in the four feature-block regions ( Figure 2). In addition, the features used in the LMTL models illustrated the existence of correlations between the different soil properties ( Figure 5). The correlations were mostly attributed to the soil Fe oxides, water content, organics and clay minerals, which constitute the basis of the feature-block regions and shared features in soil spectroscopy. With the advantage of shared features, the prediction accuracies of several soil properties with low concentrations or unobvious spectral absorption signals improved. For example, the 1105, 1275, and 1965 nm wavebands obtained low VIP scores in the prediction of N with the PLS-R model, but were useful with the LMTL model as they were shared by P, WC and OM. The 650 nm waveband in the prediction of N, P and OM was shared by WC, the 1195 nm waveband in the prediction of P was shared by OM, the 1500 nm and 1985 nm wavebands in the prediction of P were shared by pH, and so on ( Figure 5).

Assessing the Performance of Field VNIR/SWIR Spectroscopy
The application of field VNIR/SWIR spectroscopy for quantitative soil property prediction is not a new idea, and great effort has been made toward the goal of improving the prediction accuracy of regression models, especially during the past two decades [3]. Table 4 summarizes the results of past studies for predicting seven key soil properties using the field spectroscopy analysis approach. Many studies conducted field spectral measurements using either a contact probe to touch the soil surface (or soil profile) (e.g., [16]) or a mobile subsoiler to penetrate the soil (e.g., [7]), both with a built-in light source. The performance results of non-contact field VNIR/SWIR spectroscopy were usually not as accurate as those obtained using contact spectroscopy due to the effects of atmospheric water absorption, residual coverings on the soil surface, and scattering [73].  It is well-known that soil OM can be successfully predicted by VNIR/SWIR spectroscopy because of its sensitivity to broad overtones and combination absorptions, such as O-H, C-H, and N-H [75]. Our results show that the prediction performance of OM had the highest accuracy among all of the seven soil properties, with RPD = 2.29, which is categorized as good accuracy (RPD > 2) and is comparable to previous studies that used contact spectroscopy. The prediction accuracy of pH varies among different studies, possibly due to the fact that pH is related to many soil properties but has no direct spectral absorptions [25]. The RPD of pH was 1.90, indicating moderate accuracy (1.4 < RPD < 2). The removal of the atmospheric water absorption wavebands (350-1420 and 1800-1960 nm) caused the loss of several useful features, especially for soil WC, which has significant absorption bands at around 1400 and 1900 nm [76]. Therefore, the prediction performance of WC in this study is not high with an RPD = 1.71, which can only be categorized as moderate accuracy. We found that among the three soil available nutrients, P has been relatively well studied with different levels of predication accuracy; this could be attributed to the chemical measurement method of P, which can greatly affect its prediction [3]. The prediction accuracies of N, P and K in this study (RPD = 1.40, 1.49, and 1.22, respectively) were low, perhaps because of their extremely low concentrations in dryland soils. The prediction of EC in this study was unreliable possibly due to the wavebands sensitive to several key soil salinities, such as NaCl at 1930 nm and Na 2 SO 4 at 1825 nm [77], which were removed along with the atmospheric water absorption wavebands.

Next Steps
We have shown that using a LMTL algorithm based on a regularized dirty model can improve the prediction accuracies of seven key soil properties with the advantage of shared features and regularization. A larger dataset of soil samples may improve the performances of LMTL algorithms and enhance the shared features. The concentrations of soil Fe oxides and clay minerals should also be considered as outputs to enhance the learning ability of the LMTL models. Previous studies have argued that a nonlinear correlation exists between soil properties and spectral features (e.g., [16][17][18][19][20]). A future study should be conducted to apply nonlinear multi-task learning algorithms, such as a deep neural network [78,79], focusing on optimizing these algorithms to improve both the prediction accuracy and the explanatory power. In addition, airborne and satellite-based hyperspectral remote sensing should also be an important research area in which LMTL could be used to develop large-scale soil property monitoring and mapping.

Conclusions
Our study illustrates that LMTL algorithms can improve the prediction accuracies of seven key soil properties by field VNIR/SWIR spectroscopy in the drylands. Our results demonstrate that: (1) The used features for predicting different soil properties are correlated and most of them are attributed to soil Fe oxides, WC, OM and clay minerals. (2) In the current study, OM was predicted with good accuracy (RPD > 2); N, P, WC and pH were predicted with moderate accuracy (1.4 < RPD < 2); K and EC were predicted with poor accuracy (RPD < 1.4). (3) Compared to the PLS-R, LMTL models based on regularization algorithms usually have slightly higher prediction accuracy (with respect to the RPD values) and lower explanatory power (with respect to the SSR/SST values) as the used features are sparse. (4) LMTL could use the advantages of the shared features in the soil spectroscopy of different soil properties and improve the model generalization performance. Our study provides a novel analysis method for in-deep research on the underlying correlations in the soil spectroscopy of different soil properties, which can be used in future studies of soil property prediction and soil quality assessment based on spectroscopy.
Supplementary Materials: The following are available online at www.mdpi.com/2072-4292/9/11/1099/s1. Figure S1: Sparsity (the number of non-zero elements) of: the block-sparse matrix W b (1); the elementwise sparse matrix W e (2); and the combined regression coefficients matrix W (3), of the model generated from linear multi-task learning for predicting available nitrogen (a); available phosphorous (b); available potassium (c); water content (d); pH (e); electrical conductivity (f); and organic matter (g). Figure S2: Used features (non-zero items in the transpose of the block-sparse matrix W b (a); the elementwise sparse matrix W e (b); and the combined regression coefficients matrix W (c)) of linear multi-task learning models with λ b = 40 and λ e = 10.