Retrieval of Soil Water Content in Saline Soils from Emitted Thermal Infrared Spectra Using Partial Least Squares Regression

Timely information on soil water content is urgently needed for monitoring ecosystem processes and functions at various scales. Although remote sensing already offers many practical applications for retrieving soil moisture, these are largely limited to the visible/near infrared or microwave domains, and few studies have addressed the thermal infrared. In addition, soil salinization in arid land further complicates the retrieval of soil moisture from emitted spectra. In this study, we attempt to fill this knowledge gap by retrieving the soil moisture of saline soils with various salt contents, based on lab-controlled spectroscopy experiments using a Fourier Transform Spectrometer (2–16 μm). Partial least squares regression (PLSR) was applied to either the original measured spectra or their first-order derivatives. Judged by several statistical criteria, the PLSR model using first-order derivative spectra, with a determination coefficient (R²) of 0.71 and a root mean square error (RMSE) of 3.3%, is recommended for soil moisture estimation. As the thermal infrared wavelengths identified in this study are covered by several currently available satellite sensors, the PLSR models should have great potential for large-scale application, although extensive validation is needed in future studies.
OPEN ACCESS Remote Sens. 2015, 7, 14647


Introduction
Soil moisture is a key parameter in the exchange of mass and energy between the atmosphere and the terrestrial surface, in monitoring ecosystem function [1] and in scheduling irrigation [2]. It is also widely recognized as an important variable for vegetation growth [3], especially in arid regions [4]. Timely monitoring of soil moisture content is therefore highly desirable.
Compared with traditional approaches, which are generally time-consuming and laborious and provide only point data, remote sensing offers a novel way of estimating soil moisture [5,6]. Numerous studies have estimated soil moisture using various remote sensing data at regional and global scales [7–12], most of them using spectral data from the optical to microwave domains at various spatial scales [11,13–18].
However, although soil moisture has been widely studied in visible-near infrared [19–21] and microwave bands [4,22–24], few researchers have examined the thermal infrared domain [25,26], mainly because of the sophisticated technology, complicated design [27] and difficult-to-operate equipment it requires [28]. Even so, a few laboratory and field studies have revealed that soil moisture content has a significant effect on land surface emissivity, especially in the 8–9.5 μm range [29,30], which suggests that soil moisture could be retrieved by inversion. With the rapid progress of thermal infrared spectroscopy, emitted spectra are becoming an alternative data source for retrieving soil moisture.
Salt-affected soils are common in arid land, where soil salt and water jointly affect spectra and cause large anomalies when estimating soil salinity or moisture content [31]. Despite the fairly satisfactory results of Wang et al. [20], who estimated various levels of soil salt content from hyperspectral reflectance, Farifteh [32] emphasized the difficulty of estimating soil moisture in complex soils, especially those mixed with salt. Whereas visible-near infrared bands express the reflective characteristics of an object, thermal infrared bands mainly express its emissive properties, which are governed largely by temperature [33,34]. Because soil water strongly influences soil surface temperature [35], thermal infrared bands should in turn have the potential to retrieve soil water content.
Moreover, Fourier Transform Infrared (FTIR) spectroscopy is commonly used in many fields to distinguish chemical components through the vibrational characteristics of their chemical bonds [36]. Haberhauer et al. [37] used FTIR data to determine the composition of soil organic matter quantitatively, and Tatzber et al. [38] quantified carbonate in soil samples using FTIR spectroscopy. Thermal infrared spectra might therefore have great potential for measuring soil water content, owing to the distinctive vibrational characteristics of water molecules. However, to the best of our knowledge, only a few studies on soil water content have been performed in the thermal infrared domain based on spectral emissivity [25,39], and retrieving the soil moisture content of saline soils in arid land from the thermal infrared remains challenging.
Many data-mining approaches have been applied in spectral analysis, e.g., principal component regression (PCR), artificial neural networks (ANN) and partial least squares regression (PLSR) [40–44]. Among these, PLSR is an effective and frequently used statistical method in chemometrics for mining intricate data, owing to its ability to analyze data with many noisy, collinear and even incomplete variables [42]. Previous research has shown that PLSR can model linear relationships between spectra and soil properties, and it has been used successfully for mapping soil properties [45,46]. Furthermore, Huang et al. [14] found it helpful for handling highly correlated datasets, avoiding the overfitting problem that commonly occurs with multiple linear regression. Spectral data, which generally comprise hundreds of bands, are very likely to contain correlated variables; PLSR was therefore applied to estimate soil moisture content in this study.
In addition, deriving information from spectral data generally involves spectral processing such as spectral unmixing [47,48], band ratios [49], absorption feature analysis [50] and derivative spectra [20]. As first-order derivative spectra can effectively highlight the signal [20], they were also employed in this study.
The main objective of this study was therefore to develop an effective model for predicting soil moisture in complex soil mixtures by explicitly exploring the response of thermal infrared spectra to different soil moisture conditions in typical saline soils of arid land. The specific aims were: (1) to evaluate the potential of thermal infrared bands for estimating soil moisture content; (2) to develop a predictive model based on thermal infrared bands using PLSR; and (3) to explore the possibility of monitoring and mapping soil moisture over arid land at large scales.

Experimental Designs
Two sets of soil were collected in the field for lab-controlled experiments. One set was saline desert soil from a typical inland river basin on the western edge of China (44.29°N, 87.94°E), where the mean annual precipitation is 184 mm, the mean annual temperature is 6.6 °C [51] and the estimated evapotranspiration is ca. 1840 mm [52]. Topsoil (0–20 cm) was collected in the field and transported to the laboratory, where it was air-dried for several weeks before sieving. Two sieves (0.2 mm and 2 mm) were used to create three groups of soil samples with different particle sizes (>2 mm, 0.2–2 mm and 0–0.2 mm) to simulate different soil surface roughnesses. Open-topped cylindrical containers 15 cm in diameter were filled to heights of 1 to 10 cm in 1 cm increments, so that the different evaporation rates of soil columns of different heights would produce a range of soil moisture contents. Together, particle size and column height were used to reproduce as many complex field conditions as possible.
The second set of soils were Aeolian sandy soils sampled from the Gurbantunggut Desert (44.43°N, 87.90°E), where the mean annual precipitation ranges from 80 to 160 mm and pan evaporation is 2000 mm [53]. Haloxylon ammodendron is the dominant species in this region. Because this soil contained less salt, it was used as the base soil for creating artificial samples with different salt contents. As with the first set, the topsoil (0–20 cm) was collected and air-dried for several weeks before being sieved through a 2 mm sieve, and all debris, such as dry branches and fallen leaves, was removed.
The physical and chemical properties of both soil sets were determined before artificial samples with different salt contents and moisture levels were prepared (Table 1). The first set of soils was a sandy loam with an estimated clay content of 12.95%, while the second set contained more sand. As the field-collected soils contained predominantly Na2SO4-type salts, this salt was used to create artificial samples at seven salt levels (0%, 1%, 3%, 5%, 10%, 15% and 20%), with three replicates per level. The salt was ground into powder and mixed with the base soil.
Note to Table 1: >2 mm, 0.2–2 mm and 0–0.2 mm represent the particle-size classes of the saline desert soil.
The gravimetric method was used to monitor the temporal change in soil water content (SW) of all samples. Distilled water was added gently and evenly to each sample with a watering can, taking care not to damage the soil surface, until the soil water content reached 20%. The samples were then left to dry by natural evaporation, and the soil water content and soil-emitted radiance of each sample were recorded simultaneously five times in chronological sequence.
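The gravimetric method expresses soil water content as the mass of water relative to the mass of dry soil. The sketch below assumes a dry-mass basis and that the container tare and the dry soil mass of each sample are known; the source does not state these details, and the function and parameter names are hypothetical.

```python
def gravimetric_water_content(wet_mass_g, dry_soil_mass_g, tare_g=0.0):
    """Gravimetric soil water content (%) on a dry-mass basis.

    wet_mass_g: mass of container plus moist soil at one measurement time.
    dry_soil_mass_g: mass of the dry soil alone.
    tare_g: mass of the empty container (hypothetical bookkeeping parameter).
    """
    if dry_soil_mass_g <= 0:
        raise ValueError("dry soil mass must be positive")
    water_g = wet_mass_g - tare_g - dry_soil_mass_g
    return 100.0 * water_g / dry_soil_mass_g


def water_to_add(dry_soil_mass_g, target_percent=20.0):
    """Mass of distilled water (g) needed to bring dry soil to a target content."""
    return dry_soil_mass_g * target_percent / 100.0
```

For example, 1000 g of dry soil needs 200 g of water to reach 20%; if the same sample in a 150 g container later weighs 1300 g, its water content has fallen to 15%.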

Spectral Emission Recording
A Design and Prototypes (D&P) Inc. Model 102 Fourier Transform Infrared (FTIR) spectrometer [54] was used to record soil emitted radiance. The spectrometer is equipped with indium antimonide (InSb) and mercury cadmium telluride (MCT) detectors and covers the 2–16 μm domain; the thermal infrared region (8–14 μm, 188 contiguous spectral bands) was analyzed here [55]. The input optic is 2.54 cm in diameter with a 4.8° field of view, which gives a spot about 7 cm in diameter at the measuring height of about 70 cm used in this study. The instrument was set to a spectral resolution of 4 cm⁻¹ with eight co-added scans. Each measurement yielded three types of raw data: warm blackbody (WBB), cold blackbody (CBB) and sample intensity. The sample radiance, in W/(m²·μm·sr), was obtained by calibration against the WBB and CBB measurements, and the sample temperature was calculated with the "Fit Planck to Radiance" function of the instrument software. Soil samples were heated to ca. 303 K with a quartz tungsten halogen lamp, a practical compromise between accurate emissivity measurement and minimal water loss through evaporation. To suppress environmental and instrumental noise, a five-point moving average was applied [56].
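The calibration and smoothing steps can be illustrated with a short NumPy sketch. The two-point linear calibration below is the generic scheme for emission spectrometers with warm and cold reference blackbodies; it is an assumption here, since the internal algorithm of the D&P software is not described. The five-point moving average is a plain centered running mean whose window shrinks at the spectrum edges (also an assumption). All function names are hypothetical.

```python
import numpy as np

C1 = 1.191042e8   # first radiation constant 2hc^2, W um^4 m^-2 sr^-1
C2 = 1.4387769e4  # second radiation constant hc/k, um K


def planck(wl_um, temp_k):
    """Blackbody spectral radiance in W/(m^2 sr um); wavelength in micrometres."""
    wl = np.asarray(wl_um, dtype=float)
    return C1 / (wl ** 5 * np.expm1(C2 / (wl * temp_k)))


def calibrate_two_point(s_sample, s_warm, s_cold, wl_um, t_warm, t_cold):
    """Linear two-point radiometric calibration (assumed generic scheme).

    Assumes a linear detector response and ideal (emissivity = 1) warm and
    cold reference blackbodies at known temperatures t_warm and t_cold (K).
    """
    s_sample, s_warm, s_cold = (np.asarray(a, dtype=float)
                                for a in (s_sample, s_warm, s_cold))
    b_warm, b_cold = planck(wl_um, t_warm), planck(wl_um, t_cold)
    gain = (s_warm - s_cold) / (b_warm - b_cold)  # raw signal per unit radiance
    return b_cold + (s_sample - s_cold) / gain


def moving_average(x, window=5):
    """Centered running mean; windows shrink at the spectrum edges."""
    x = np.asarray(x, dtype=float)
    half = window // 2
    out = np.empty_like(x)
    for i in range(x.size):
        lo, hi = max(0, i - half), min(x.size, i + half + 1)
        out[i] = x[lo:hi].mean()
    return out
```

Because the gain and offset cancel in the ratio, any linear instrument response is removed, leaving radiance in physical units that can then be smoothed band by band.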
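The blackbody radiance at the sample temperature is governed by Planck's law:

$$B(\lambda, T) = \frac{2hc^{2}}{\lambda^{5}}\,\frac{1}{\exp\!\left(\dfrac{hc}{\lambda k T}\right) - 1}$$

where λ is wavelength, T is the sample temperature (K), h is Planck's constant, c is the speed of light and k is the Boltzmann constant; the unit is mW/m… Relative emissivity (denoted Er in this study), further normalized by the emitted radiance at 7–7.5 μm (a region where emissivity approximates 1), was used in order to further eliminate measurement noise.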
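Emissivity follows from dividing the calibrated sample radiance by the blackbody radiance at the sample temperature, ε(λ) = L(λ)/B(λ, T). One plausible reading of the normalization described above is to divide the emissivity spectrum by its mean over 7–7.5 μm, so that this reference region has a relative emissivity of 1. The exact normalization used in the study is not given, so this sketch (NumPy, hypothetical names) is an assumption.

```python
import numpy as np

C1 = 1.191042e8   # 2hc^2, W um^4 m^-2 sr^-1
C2 = 1.4387769e4  # hc/k, um K


def planck(wl_um, temp_k):
    """Blackbody spectral radiance in W/(m^2 sr um); wavelength in micrometres."""
    wl = np.asarray(wl_um, dtype=float)
    return C1 / (wl ** 5 * np.expm1(C2 / (wl * temp_k)))


def relative_emissivity(wl_um, radiance, temp_k, ref=(7.0, 7.5)):
    """Emissivity L/B normalized to a mean of 1 inside the reference window (um)."""
    wl = np.asarray(wl_um, dtype=float)
    emis = np.asarray(radiance, dtype=float) / planck(wl, temp_k)
    mask = (wl >= ref[0]) & (wl <= ref[1])
    if not mask.any():
        raise ValueError("no spectral samples inside the reference window")
    return emis / emis[mask].mean()
```

Normalizing to a region assumed to be nearly black also absorbs small errors in the fitted sample temperature, since such errors scale the whole spectrum similarly over a narrow wavelength range.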
Based on the relative emission spectra, the first-order derivative spectrum, denoted DE, was calculated from Er.
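A minimal sketch of the first-order derivative uses central differences via `numpy.gradient`, which accepts the wavelength array as sample coordinates (and therefore handles uneven spacing) and falls back to one-sided differences at the two end points. Whether the study used central or forward differences is not stated, so this choice is an assumption.

```python
import numpy as np


def first_derivative(wl_um, er):
    """First-order derivative DE = dEr/dλ (per micrometre).

    Interior points use central differences; end points use one-sided ones.
    """
    return np.gradient(np.asarray(er, dtype=float), np.asarray(wl_um, dtype=float))
```

A constant offset in Er drops out of DE, which is one reason derivative spectra can suppress baseline effects; the flip side is that differencing amplifies high-frequency noise, so smoothing is usually applied first.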

Partial Least Squares Regression
PLSR is a mature method that is widely used in chemometrics [57,58]. Combining features of multiple linear regression (MLR) and principal components analysis (PCA), PLSR specifies a linear relationship between one or more dependent variables (Y) and numerous predictor variables (X) by compressing the many collinear spectral variables into a few uncorrelated components, referred to here as principal components (PCs) [57,59]. Stepwise regression was first applied to select the relevant spectral bands, and cross-validation was then carried out to determine the number of components and to calculate the spectral loadings of each component [42]. Band selection was needed because the spectra contain highly redundant information: stepwise regression adds bands to (or removes them from) a multiple linear model according to their statistical significance with respect to the dependent variable, using the P value of an F-statistic to test each candidate before it is accepted.
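Band selection by stepwise regression can be sketched as follows. For brevity this is a forward-only variant (bands are never removed once entered), and it uses a fixed partial F-to-enter threshold of 4.0, roughly equivalent to p ≈ 0.05 for large samples, in place of an explicit P value; with SciPy the P value could be computed as `scipy.stats.f.sf(F, 1, df)`. The threshold and function names are illustrative assumptions, not the study's settings.

```python
import numpy as np


def ols_rss(X, y):
    """Residual sum of squares of an OLS fit with an intercept column."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def forward_stepwise(X, y, f_enter=4.0, max_bands=None):
    """Greedy forward selection of bands (columns of X) by partial F-test."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    n, p = X.shape
    max_bands = p if max_bands is None else max_bands
    selected = []
    rss_current = float(((y - y.mean()) ** 2).sum())  # intercept-only model
    while len(selected) < max_bands:
        best = None  # (F statistic, band index, RSS)
        for j in range(p):
            if j in selected:
                continue
            candidate = selected + [j]
            df_resid = n - len(candidate) - 1
            if df_resid <= 0:
                continue
            rss_new = ols_rss(X[:, candidate], y)
            f_stat = (np.inf if rss_new <= 0
                      else (rss_current - rss_new) / (rss_new / df_resid))
            if best is None or f_stat > best[0]:
                best = (f_stat, j, rss_new)
        if best is None or best[0] < f_enter:
            break  # no remaining band is significant enough to enter
        selected.append(best[1])
        rss_current = best[2]
    return selected
```

The returned column indices would then define the reduced spectral matrix passed to PLSR.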
Cross-validation is a frequently used model diagnostic tool for assessing the results of a statistical analysis. It was employed here to test the predictive significance of each PLSR component and to determine the optimal number of components by minimizing the predicted residual sum of squares (PRESS) [60]. Following previous studies on partitioning data into training and test sets [61,62], 85% of all data were used for calibration and the remaining 15% for validation.
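Component selection by PRESS can be illustrated with a compact NumPy implementation of single-response PLS (the NIPALS PLS1 algorithm) and a k-fold cross-validation loop, plus a random 85/15 calibration/validation split. The fold count, random seeds and function names are assumptions for illustration; the study's software and cross-validation scheme are not specified, and in practice a library implementation such as scikit-learn's `PLSRegression` would normally be used.

```python
import numpy as np


def pls1_fit(X, y, n_comp):
    """NIPALS PLS1; returns (coef, intercept) for y ≈ X @ coef + intercept."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean            # deflated residual matrices
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)                 # loading weights
        t = Xr @ w                             # component scores
        tt = t @ t
        p_load = Xr.T @ t / tt                 # X loadings
        q = (yr @ t) / tt                      # y loading
        Xr = Xr - np.outer(t, p_load)          # deflate X
        yr = yr - q * t                        # deflate y
        W.append(w); P.append(p_load); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)     # regression vector in X space
    return coef, y_mean - x_mean @ coef


def press(X, y, n_comp, n_folds=5, seed=0):
    """Predicted residual sum of squares from k-fold cross-validation."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    idx = np.random.default_rng(seed).permutation(len(y))
    total = 0.0
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)
        coef, b0 = pls1_fit(X[train], y[train], n_comp)
        resid = y[fold] - (X[fold] @ coef + b0)
        total += float(resid @ resid)
    return total


def choose_n_components(X, y, max_comp):
    """Component count in 1..max_comp with the smallest PRESS, plus all scores."""
    scores = [press(X, y, k) for k in range(1, max_comp + 1)]
    return int(np.argmin(scores)) + 1, scores


def split_85_15(n, seed=0):
    """Random calibration (85%) and validation (15%) index sets."""
    idx = np.random.default_rng(seed).permutation(n)
    n_cal = int(round(0.85 * n))
    return idx[:n_cal], idx[n_cal:]
```

In this sketch PRESS would be computed on the calibration subset only, keeping the 15% validation subset untouched for the final R², RMSE and RPD.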
The performance of the PLSR models was evaluated using the Akaike Information Criterion (AIC), which is commonly used to examine model effectiveness [63,64]. Because of the small sample size [65], the corrected AIC (AICc) was applied as the criterion for model selection. It was calculated as follows:

$$\mathrm{AICc} = -2L + 2k + \frac{2k(k+1)}{n-k-1}$$

where n is the number of samples, L is the log-likelihood value computed from the residual vector r, and k is the number of features used in the prediction. The best model has the smallest AICc.
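A sketch of the AICc computation, with k taken as the number of features in the model as defined above. It assumes normally distributed residuals, so that the maximized log-likelihood is L = −(n/2)[ln(2π) + ln(rᵀr/n) + 1]; the source does not spell out its likelihood, and the function names are hypothetical.

```python
import math

import numpy as np


def gaussian_log_likelihood(resid):
    """Maximized Gaussian log-likelihood of a residual vector (assumption)."""
    r = np.asarray(resid, dtype=float)
    n = r.size
    return -0.5 * n * (math.log(2 * math.pi) + math.log(float(r @ r) / n) + 1.0)


def aicc(resid, k):
    """Corrected Akaike Information Criterion; smaller is better."""
    n = np.asarray(resid).size
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return (-2.0 * gaussian_log_likelihood(resid)
            + 2 * k + 2 * k * (k + 1) / (n - k - 1))
```

With equal residuals, a seven-band model always scores worse than a three-band one, so the extra bands must reduce the residual sum of squares enough to pay for the larger penalty.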
In addition, the PLSR models were evaluated with the determination coefficient (R²), root mean square error (RMSE) and ratio of performance to deviation (RPD). Models with RPD < 1.4 are considered to have no prediction ability [66,67].
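The three validation statistics can be computed as below. R² is taken as 1 − SS_res/SS_tot and RPD as the standard deviation of the observed values divided by the RMSE; the source gives no formulas, so these common definitions, and the use of the sample standard deviation (ddof = 1), are assumptions.

```python
import numpy as np


def validation_stats(observed, predicted):
    """R², RMSE and RPD of predictions against observed soil water content."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    resid = obs - pred
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    r2 = 1.0 - float(resid @ resid) / float(((obs - obs.mean()) ** 2).sum())
    rpd = float(obs.std(ddof=1)) / rmse
    return {"R2": r2, "RMSE": rmse, "RPD": rpd}
```

Under the threshold quoted above, a model would be flagged as lacking prediction ability when `validation_stats(obs, pred)["RPD"] < 1.4`.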

General Patterns of Emissivity at Different Salts and Water Conditions
The two soil sets produced diverse emissivity patterns under various conditions, and representative examples are shown in Figure 1. For the saline desert soil, samples with medium particle size and 8 cm soil columns were chosen to analyze the effects of particle size and soil water content on soil emissivity (Figure 1a,b). Emissivity in the 8–10.5 μm region generally decreased with decreasing soil water content; the exception was the spectrum with the highest soil water content, which fell below that of the second highest level at longer wavelengths, while the others remained generally in order. Figure 1b shows that the emissivity of fine particles followed a pattern very similar to that of medium particles but was much higher than that of coarse particles (fine, medium and coarse denote the three particle-size classes). Emissivity spectra of Aeolian sandy soil samples at various salt levels and of CK (samples without added salt) are shown in Figure 1c,d, respectively. Emissivity over 9.2–14 μm decreased with increasing salt content, although the pattern over 8–9.2 μm was inconsistent. In addition, Figure 1d shows emissivity decreasing with decreasing soil water content over 8.8–14 μm, with complex features over 8–8.8 μm.

Correlations between Soil Water Contents and Original or First Derivative Spectra
A total of 255 data pairs, each relating soil water content to either the original emissivity or the first-order derivative spectra, were used for the correlation analyses: 150 pairs collected under different surface roughness conditions and 105 pairs with different salt types and contents.
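The 150 and 105 data pairs are consistent with a full factorial reading of the design described under Experimental Designs (every particle-size class at every column height, and every salt level with three replicates), with each sample measured five times. This reading is an inference, checked here with a few lines of Python:

```python
from itertools import product

MEASUREMENTS = 5                               # recordings per sample while drying

particle_sizes = ["fine", "medium", "coarse"]  # three sieved groups
column_heights_cm = range(1, 11)               # 1-10 cm in 1 cm steps
salt_levels_pct = [0, 1, 3, 5, 10, 15, 20]     # Na2SO4 added to the Aeolian sand
replicates = range(3)

roughness_samples = list(product(particle_sizes, column_heights_cm))
salt_samples = list(product(salt_levels_pct, replicates))

roughness_pairs = len(roughness_samples) * MEASUREMENTS
salt_pairs = len(salt_samples) * MEASUREMENTS
print(roughness_pairs, salt_pairs, roughness_pairs + salt_pairs)  # 150 105 255
```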
For the original emissivity spectra, the highest correlation with SW was found at 8.835 μm (r = 0.502, P < 0.01), whereas for the first-order derivative spectra the largest absolute correlation (r = −0.549, P < 0.01) occurred at 10.673 μm (Figure 2). Original emissivity was correlated only positively with SW, while both positive and negative correlations were found for the first-order derivative spectra. Although many more bands appeared sensitive to soil moisture content in the original emissivity spectra than in the first-derivative spectra, the derivative spectra still contained many sensitive bands, suggesting that both spectral forms could capture soil water content at many characteristic wavelengths.
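Band-wise Pearson correlations between soil water content and every spectral band, and the band with the largest absolute coefficient, can be obtained with vectorized NumPy. The array layout and function names are placeholders, and the significance test behind the P < 0.01 levels is omitted.

```python
import numpy as np


def bandwise_correlation(spectra, sw):
    """Pearson r between soil water content and every band.

    spectra: (n_samples, n_bands) array of Er or DE values.
    sw: (n_samples,) soil water contents.
    """
    X = np.asarray(spectra, dtype=float)
    y = np.asarray(sw, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return (Xc.T @ yc) / (np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt(yc @ yc))


def strongest_band(wavelengths_um, r):
    """Wavelength and r of the band with the largest |r| (sign preserved)."""
    i = int(np.argmax(np.abs(r)))
    return float(wavelengths_um[i]), float(r[i])
```

Keeping the sign matters here, since the derivative spectra showed both positive and negative correlations.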

Partial Least Squares Regression Models
Three PLSR models were set up for the original spectra and six for the first-order derivative spectra. The best PLSR model was selected on the basis of AICc, and the results are presented in Table 2 and Figure 3. The coefficients and contributions of each band in the finally identified PLSR model were then determined to assess the role of each band. Table 3 shows the intercept and the coefficient of each band, and Table 4 shows the contribution of each band to each principal component; larger values indicate a greater contribution of a band to a given principal component [68]. All three bands selected for the PLSR model had large loading weights on the three principal components, indicating that they were critical to the model. The final PLSR model had a determination coefficient (R²) of 0.53 and an RMSE of 4.24%, and can be written as a linear equation of the selected bands using the intercept and coefficients listed in Table 3.
Tables 5 and 6 present the results obtained from the first-order derivative spectra using the same procedure. Seven bands were selected by stepwise regression, and five principal components were retained for the PLSR model on the basis of PRESS (Figure 3c). Table 6 shows the importance of the seven bands to each principal component; the bands with the largest contributions were spread evenly from 8.097 to 8.769 μm. This model had a higher determination coefficient (R² = 0.71) and a lower RMSE (3.3%) than the model based on the original spectra, but a higher AICc. Even so, its RPD was above 1.4, whereas that of the model based on the original spectra was below 1.4, indicating much better prediction ability. Accordingly, the first-order derivative spectra were judged to model soil water content effectively.

Thermal Infrared vs. Visible-Near Infrared and Microwave
Visible-near infrared and microwave bands dominate soil moisture estimation, owing to the hyperspectral information available at visible-near infrared wavelengths and the penetration capability of microwaves [4,19]. Nevertheless, the thermal infrared has its own advantages. In comparison with the absorption depth at 1.4 and 1.9 μm used for soil moisture content [19], Lesaignoux et al. [69] found that soil moisture content had an important impact between 8 and 11 μm, and provided a polynomial function based on reflectance in the thermal domain. Mira et al. [70] likewise found that soil moisture content influenced soil emissivity more strongly in 8.2–9.2 μm than in other domains, which is consistent with this study. Furthermore, previous studies found that the thermal infrared has the potential to distinguish soil organic content, soil texture, and clay and sand content [70–73]. However, because of their longer wavelengths, thermal infrared bands carry less energy than visible-near infrared bands [74], so certain detailed background information on saline soils may be missed. Even so, compared with previous work on hyperspectral remote sensing [20], the current study obtained satisfactory results.
Like thermal infrared sensors, passive microwave sensors record the radiation emitted by objects; however, the two differ in their ability to penetrate the soil surface. Thermal infrared observations are sensitive to the top surface layer, while microwaves are sensitive to deeper layers [75]. Although microwave remote sensing has successfully predicted soil moisture content, especially in the C and L bands [23,76], its long wavelengths prevent fine spatial resolution at large scales. Thermal infrared sensors, operating at shorter wavelengths, offer finer resolution because sufficient energy can be received from a smaller area. With remote sensing products and applications now abundant, integrating multisource data is another important option: combining microwave and thermal infrared data has been shown to have the potential to generate high-resolution, high-accuracy soil moisture data [77].

Original vs. First-Order Derivative
From the above results, we conclude that the PLSR model based on first-order derivative spectra estimates soil moisture content more accurately than the model based on raw spectra. Similarly, many studies have reported that features and attributes of soil [20], plants [78] and other terrestrial objects are captured more effectively by derivative spectra than by raw spectra, possibly because derivative spectra can eliminate interfering factors present in raw spectra [79]. However, building a model with favorable performance requires considering accuracy and effectiveness simultaneously [64]. Within the band range related to soil moisture content, correlations between soil moisture content and raw spectra were generally higher than those for first-order derivative spectra (Figure 2). This may explain why the model based on original spectra performed better on some criteria than the model based on first-order derivative spectra.
To explain this, we analyzed the correlations between the bands involved in each model and soil moisture content for both raw and derivative spectra (Tables 7 and 8). For raw spectra, the correlations between soil water and the involved bands were high; for derivative spectra, they were much lower. Nevertheless, the derivative-spectra model had seven contributing bands (far more than the raw-spectra model), which allowed it to model soil moisture content more accurately. Notably, the correlations among the bands of the raw spectra were higher than 0.96, whereas those among the bands of the derivative spectra were below 0.35. High inter-band correlations indicate that the bands carry similar, and therefore redundant, information for soil moisture estimation [42]. Derivative spectra therefore have the advantage of supplying more information with a limited number of bands. However, in the current study most of the involved derivative bands were only weakly related to soil moisture content, so the first-order derivative spectra required more bands to model soil moisture content accurately; this reduced the model's effectiveness and resulted in a slightly higher AICc than that of the PLSR model with raw spectra. Further analysis using RPD nevertheless revealed that the PLSR model based on derivative spectra had higher prediction ability, and it is therefore recommended for soil moisture estimation.
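The redundancy argument can be checked by computing the largest absolute correlation among the bands selected for a model; values near 1, as reported for the raw-spectra bands, signal near-duplicate information. A minimal sketch with hypothetical names, where `band_idx` lists the selected column indices (at least two):

```python
import numpy as np


def max_interband_correlation(spectra, band_idx):
    """Largest |r| between any two selected bands (columns of spectra)."""
    sub = np.asarray(spectra, dtype=float)[:, list(band_idx)]
    corr = np.corrcoef(sub, rowvar=False)
    off_diag = corr[~np.eye(len(band_idx), dtype=bool)]
    return float(np.max(np.abs(off_diag)))
```

Applied to the Er and DE matrices with each model's selected bands, this single number summarizes the contrast between the two correlation matrices.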

Potential Applications at Large Scale
Current thermal infrared sensors, such as ASTER, AVHRR, MODIS and Landsat ETM+, have coarser spectral and spatial resolution (Table 9). Most of these sensors do not include the band region identified above, but two of them (ASTER and MODIS) cover several bands that performed well for soil moisture estimation in this study. Applying this estimation at large scales in practice raises two problems. First, the bandwidths used in the laboratory tests differ from those of satellite sensors. This may be solved by converting narrowband emissivity to broadband emissivity through weighted summation, following Cheng et al. [80] and Wang et al. [25]. According to the chosen model, ASTER has two channels corresponding to the sensitive bands for predicting soil moisture content in this study, whereas MODIS has only one. Moreover, the spatial resolution of ASTER is much finer than that of MODIS, indicating that ASTER has great potential for monitoring and mapping soil moisture content [30].
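Converting laboratory narrowband emissivity to a sensor channel value by weighted summation can be sketched as a spectral-response-weighted mean, ε_ch = Σ ε(λ_i) f(λ_i) Δλ_i / Σ f(λ_i) Δλ_i, where the bin widths Δλ_i handle a non-uniform wavelength grid. The Gaussian response function and its 8.3 μm centre and 0.35 μm full width at half maximum are hypothetical placeholders, not an actual ASTER or MODIS response; the exact weighting should be taken from Cheng et al. [80] (some formulations also weight by the Planck function).

```python
import numpy as np


def gaussian_srf(wl_um, center_um, fwhm_um):
    """Hypothetical Gaussian spectral response function with peak 1."""
    sigma = fwhm_um / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-0.5 * ((np.asarray(wl_um, dtype=float) - center_um) / sigma) ** 2)


def channel_emissivity(wl_um, emissivity, srf):
    """Response-weighted mean of narrowband emissivity over one channel."""
    wl = np.asarray(wl_um, dtype=float)
    weights = np.asarray(srf, dtype=float) * np.gradient(wl)  # response x bin width
    return float(np.sum(np.asarray(emissivity, dtype=float) * weights) / np.sum(weights))
```

Feeding such channel-equivalent emissivities into the PLSR model in place of narrowband values would be one way to test how much predictive skill survives at sensor resolution.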
Second, atmospheric water vapor and other factors inevitably affect emissivity and introduce errors [81]. The water vapor scaling method was introduced to minimize atmospheric uncertainties [82], and on this basis Wang et al. [25] studied the effect of soil moisture on thermal infrared emissivity with ASTER data. In addition, the results can inform the band requirements of future thermal sensors. These findings highlight the potential to characterize the soil water content of salt-affected soils with thermal infrared bands in arid land at various scales.

Conclusions
This study is among the few to use thermal infrared bands for estimating soil moisture content under various mixed conditions. Within its scope, the empirical PLSR approach was examined with two spectral forms: raw and first-order derivative spectra. The PLSR model built with first-order derivative spectra is recommended for soil moisture estimation. A comparison with the channels of existing sensors shows the potential of thermal infrared data for estimating soil moisture content. The laboratory results should be evaluated in the field with airborne and space-borne data, where additional confounding factors are present, in particular atmospheric effects and spectral resolution. This study can also serve as a reference for specifying the wavebands of future thermal sensors.

Figure 1. Spectral characteristics of the various sample settings: (a) emissivity at various soil water contents for the medium soil particle size (0.2–2 mm); (b) emissivity for different soil particle sizes at the third measurement; (c) emissivity at various soil salt levels at the second measurement; and (d) emissivity at various soil water contents for the sandy soil without added salt. Note: 1st, 2nd, 3rd, 4th and 5th denote the successive measurements of soil water content and emission.

Figure 2. Correlation between soil moisture content and the two spectral forms.

Figure 3. Cross-validation results for determining the number of PCs, and soil water content estimates from the PLSR model, with raw spectra ((a), (b)) and first-order derivative spectra ((c), (d)).

Table 1. Soil chemical properties of the different sets of field-collected soils.

Table 2. Parameters for model selection. PC refers to the number of principal components.

Table 3. Coefficients of the selected bands in the PLSR model with original spectra.

Table 4. Loading weight matrix of each band for the PLSR model based on the Er spectra to estimate soil water content.

Table 5. Coefficients of the bands involved in the PLSR model with derivative spectra.

Table 6. Loading weight matrix (LW×100) of each band for the PLSR model based on the DE spectra to estimate soil water content.

Table 7. Correlation matrix between soil moisture content and the raw-spectra bands involved in the PLSR model.

Table 8. Correlation matrix between soil moisture content and the first-order derivative spectral bands involved in the PLSR model.