Montmorillonite Estimation in Clay–Quartz–Calcite Samples from Laboratory SWIR Imaging Spectroscopy: A Comparative Study of Spectral Preprocessings and Unmixing Methods

Clay minerals play an important role in shrinking–swelling of soils and off–road vehicle mobility mainly due to the presence of smectites including montmorillonites. Since soils are composed of different minerals intimately mixed, an accurate estimation of its abundance is challenging. Imaging spectroscopy in the short wave infrared spectral region (SWIR) combined with unmixing methods is a good candidate to estimate clay mineral abundance. However, the performance of unmixing methods is mineral-dependent and may be enhanced by using appropriate spectral preprocessings. The objective of this paper is to carry out a comparative study in order to determine the best couple spectral preprocessing/unmixing method to quantify montmorillonite in intimate mixtures with clays, such as montmorillonite, kaolinite and illite, and no-clay minerals, such as calcite and quartz. To this end, a spectral database is built with laboratory hyperspectral imagery from 51 dry pure mineral samples and intimate mineral mixtures of controlled abundances. Six spectral preprocessings, standard normal variate (SNV), continuum removal (CR), continuous wavelet transform (CWT), Hapke model, first derivative (1st SGD) and pseudo–absorbance (Log(1/R)), are applied and compared with reflectance spectra. Two linear unmixing methods, fully constrained least square method (FCLS) and multiple endmember spectral mixture analysis (MESMA), and two non-linear unmixing methods, generalized bilinear method (GBM) and multi-linear model (MLM), are compared. Global results showed that the benefit of spectral preprocessings occurs when spectral absorption features of minerals overlap for SNV, CR, CWT and 1st SGD, whereas the use of reflectance spectra performs the best when no overlap is present. With one mineral having no spectral feature (quartz), montmorillonite abundance estimation is difficult and gives RMSE higher than 50%. For the other mixtures, performances of linear and non-linear unmixing methods are similar. Consequently, the recommended couple spectral preprocessing/unmixing method based on the trade-off between its simplicity and performance is 1st SGD/FCLS for clay binary and ternary mixtures (RMSE of 9.2% for montmorillonite–illite mixtures, 13.9% for montmorillonite–kaolinite mixtures and 10.8% for montmorillonite–illite–kaolinite mixtures) and reflectance/FCLS for binary mixtures with calcite (RMSE of 8.8% for montmorillonite–calcite mixtures). These performances open the way to improve the classification of expansive soils.


Introduction
Soil shrinking and swelling due to expansive clays have consequences on urban planning for decision makers since they cause billions of dollars in building damages every year [1][2][3]. They are also responsible for off-road vehicle mobility, the latter being impacted by the sinking and stickiness of wet soils [4][5][6]. Aluminum-rich smectite (montmorillonite) has been demonstrated to be the clay mineral having the most expansive potential hazard [7]. Thus to monitor soil shrinking and swelling, the characterization of clay mineralogy is needed, both in terms of detection and quantification. In mid-latitude regions, smectite (whose montmorillonite represents the most common variety), illite and kaolinite are the main clay minerals found in soils affected by swelling risk [8]. For soil swell-shrinking application, Dufréchou et al. and Chassagneux et al. [9,10] defined four classes linking shrink-swell potential and montmorillonite content: low swelling potential (< 10%), moderate swelling potential (between 10% and 50%), high swelling potential (between 50% and 70%) and very high swelling potential (> 70%). Many well-used techniques, either qualitative or quantitative, can assess clay minerals characterization, but they only provide little and local information due to a low number of soil samplings along a coarse grid such as: ground geotechnical engineering techniques measuring soil swelling potential [11] and X-ray diffraction (XRD) for mineral identification/quantification, and infrastructure damage reports for expansive clay qualitative assessment. However, the use of these methods mainly remains expensive and time-consuming.
An alternative to overcome these limitations is the use of hyperspectral imaging spectroscopy, which is able to discriminate clay minerals using their specific spectral absorption features in the short wave infrared region (SWIR) 2100-2500 nm [12][13][14]. However, clay minerals are rarely pure in soils and are usually intimately mixed with other minerals, water and organic matter. Spectral preprocessings are used to overcome intraclass spectral variability, remove some non-linear illumination effects and enhance shallow absorption features [15][16][17][18][19][20]. For example, the identification of soil minerals using spectral similarity distance increases up to 18% using continuous wavelet transform [20]. Using spectral preprocessings could be a first preliminary step to further discrimination/quantification methods [16,20,21].
Two classes of methods exist in order to map mineral clays in soils: the first is based on spectral features and the second on unmixing. The first class includes the use of spectral indices (Dufréchou et al., [12] obtained a root mean square error (RMSE) of 15.4% for montmorillonite, 25.2% for kaolinite and 29.8% for illite), and linear regression methods using clay spectral feature characteristics (e.g., depth, width and position close to 2200 nm; [22][23][24][25]), and decision tree methods (Mulder et al., [26] obtained a RMSE of 9% for clay-calcite intimate mixtures). Expert systems such as Tetracorder use spectral features in order to detect and discriminate minerals, such as clays [27], for example to map clay alteration of Hawaii volcanoes [28]. These characteristics can also be automatically extracted using partial least square regression (Viscarra Rossel et al. [29,30] obtained a RMSE of 3% for illite, kaolinite and smectite in clay-quartz-organic mixtures). Neural networks have been tested for mineral sub-pixel classification [31], on a Hyperion dataset. Compared to ground reference, fraction prediction provides a correlation coefficient r² value of 0.62 for illite/montmorillonite class and 0.6 for kaolinite class.
The second class of methods using unmixing relies on the knowledge of pure mineral spectra or endmembers [32][33][34][35]. Linear unmixing methods only consider materials at the surface that are contiguously distributed (named in the following "aeral mixture", [36] Figure 1a) while non linear unmixing methods consider mixed pixels several soil elements of volumetric distribution (also called "intimate mixture", [32,36] Figure 1b). On one side, the most used linear unmixing methods considering only one spectrum per endmember class are the fully constrained least square (FCLS; [37]) and mixture tuned matched filtering (MTMF; [38]). On contrary, several algorithms account for the endmember spectral intraclass variability such as multiple endmember spectral mixture analysis (MESMA; [39]), the spectral assistant, (TSA; [40]) and the unmixing within a multi-task Gaussian process framework (UMTGP; [41]). For example, Chabrillat and Bedini [42,43] used MTMF and MESMA to discriminate illite/smectite from kaolinite in soils with airborne data. On the other side, Heylen [44] gave a review of performance of nonlinear unmixing methods, including the Hapke model [45], generalized bilinear model (GBM; [46]) and multi-linear model (MLM; [34]). For example, [34] showed with laboratory spectroscopy datasets of quartz-alunite intimate mixtures that the Hapke model delivers the best result (bias of 1%-2% between estimated and measured abundances), then MLM (bias of 10%-20%) and at last both GBM and FCLS have a similar performance (bias 30%-40%). Robertson et al. [47] compared a version of the Shkuratov model [48] with the Hapke model to quantify laboratory mineral mixtures of montmorillonite-gypsum. In these samples, minerals fractions are estimated with an error less than 10% for both models. Unmixing methods can also be embedded in expert systems such as Tetracorder in order to increase the accuracy of mineral discrimination [27]. The advantage of using unmixing methods is to be not site-dependent and they do not require a learning stage with a representative training dataset, which is the case with the first class of methods. The difficulty to choose the most appropriate unmixing method is that their performance differs from the mineral mixture composition of soils. Few studies have used unmixing methods to estimate clay minerals fraction, but unmixing is more used for mineral discrimination. Moreover, very few studies have combined spectral preprocessings with unmixing methods, but they demonstrated the ability to decrease mineral fraction estimation errors using spectral preprocessing [16,20,21].
Then, by using airborne and satellite hyperspectral imaging spectroscopy, the sensor spatial resolution and the possible presence of vegetation overlaid on soil within a pixel degrade the performances of mineral estimation [42,43,49]. A first step commonly practiced is to study spectroscopy data under laboratory conditions to avoid environmental factors such as atmospheric and soil water content and soil/vegetation mixtures [12,26,29,[50][51][52].
The main objective of this study is to compare the performance of both linear and non-linear unmixing methods combined with spectral preprocessings to estimate montmorillonite abundance in mineral mixtures. For this purpose, this pioneer work is based on spectroscopic measurements of intimate mixtures composed of different mineral types (clays, calcite and quartz) and controlled abundances, manually generated in the laboratory with dry conditions and further used as proxies for soil samples. Performances of the spectral preprocessings and unmixing methods will be assessed from the spectral database deriving from them. Another scenario considering synthetic mineral aeral mixtures computed from spectral measurements of pure minerals will be also analyzed in order to compare the performances of spectral preprocessings and unmixing methods between aeral and intimate mixtures. Thus, the spectral database of intimate and aeral mixtures and the laboratory imaging spectroscopy setup are presented in Section 2. The methodology including the spectral preprocessing and selected unmixing methods are described in Section 3. Results are presented in Section 4 and discussed in Section 5, with the conclusions exposed in Section 6.

Laboratory Imaging Spectroscopy Setup
The laboratory setup ( Figure 2) was similar to the one used in [50] under dry conditions. Two halogen lamps light the sample with an incidence zenith angle of 35°. The reflected signal was acquired by a hyperspectral camera HySpex SWIR-320me [53] (NorskElektroOptikk) whose characteristics are summarized in Table 1. Each dry mineral mixture was placed in a plastic container lying on a tray that moved horizontally during the acquisition process to obtain a hyperspectral image Radsample ( Figure 2). The final image was the average of 10 acquisitions, reducing the noise by √10. Further, an image, Radspec, of a standard white Spectralon® panel was recorded in order to convert the acquired sample images expressed in radiance unit to reflectance unit such as: With Rad(λ, i, j) the radiance of image pixel (i, j) with i the line number, j the column number and λ the wavelength, ρ(λ)spec the Spectralon ® reflectance and ρ(λ,i,j)sample the sample reflectance.
An area of 150 pixels × 200 pixels (11.25 × 15.0 cm²) was selected over each image to avoid possible stray light from the container borders ( Figure 2b). Saturated pixels were removed, corresponding to 2% in the average for each image. For each mineral mixture, a mean reflectance over the area of interest of the image was computed and stored in a spectral database.

Spectral Database
The spectral database was built from five pure clay mineral samples (illite, kaolinite, montmorillonite, calcite and quartz) and two mixture types were considered, aeral and intimate, for a total of 46 mixtures. In one hand, the spectral reflectance of aeral mixtures was computed from the linearly mixed combination of pure mineral samples pondered by their respective abundance. On the other hand, the spectral reflectance of intimate mixtures was measured from the facilities presented in Section 2.1. This last scenario is a proxy to consider soil mixtures, without any organic matter and soil water content affecting spectral signatures [54]. Investigating unmixing performance with aeral mixtures enables to (i) highlight the benefit or not of spectral preprocessings over reflectance spectra, and (ii) compare the performance of linear unmixing methods independently of non-linear effects occurring in intimate mixtures. For this purpose, the mineral abundance should be as close as possible between aeral and intimate mixtures.
Each sample was labeled as a suite of capital initials of the mineral's name followed by its abundance in weight percentage (wt %), and the same for quartz and calcite. For example, I20K80 refers to illite and kaolinite with an abundance of 20 wt % and 80 wt %, respectively. M20S70C10 corresponds to a mixture with a 20 wt % abundance of montmorillonite, 70 wt % of quartz and 10 wt % of calcite, respectively.

Pure Clay Mineral Samples
The following clay minerals are: illite from Saint Paulien (France) commercialized by Argiles du Velay, kaolinite from Clérac (France) commercialized by Imerys, montmorillonite from Villanova Tulo (Sardinia, Italy) commercialized by Argiles du Bassin Méditerranéen (ABM). Clays are provided as a powder with a grain size less than 80 µm. Quartz is collected from a sand mine in Fontainebleau (France) commercialized by Sifraco and calcite comes from Orgon (France). Quartz has a grain size less than 300 µm, and calcite less than 70 µm. Collected minerals were oven-dried and have been stored in dry conditions before use.
Clay spectral features are present in the SWIR domain, at 1400 nm, 1900 nm and more specifically, between 2100 and 2500 nm [12][13][14] (Figure 3). As atmospheric water vapor content has low transmittance at 1400 and 1900 nm [14,55], only the reduced spectral domain of 2100 -2425 nm will be further considered in this study. Clays exhibit several spectral features, which result from the presence of vibrational hydroxyl processes:  The absorption feature around 2200 nm is due to the Al-OH vibrational mode. Its accurate location depends on the clay type: 2208 nm for illite, 2212 nm for montmorillonite and 2206 nm for kaolinite. Kaolinite also has a double absorption feature (2160 nm and 2206 nm), which is leftward asymmetric. The typical absorption bandwidth is around 100 nm whatever the clay type.  The absorption features due to OH-stretching bands combined with lattice vibrations at approximately 2360 nm is shallow for illite and sharp for kaolinite. Kaolinite has also in addition two more absorption features at 2320 nm and 2380 nm. Calcite and quartz were very reflective with a maximum reflectance of 0.94 for calcite and 0.8 for quartz ( Figure 3). Calcite had strong absorption bands corresponding to the vibrational modes of CO ions between 2300 and 2350 nm, and a weak absorption feature close to 2140 nm. Quartz remained spectrally flat in this spectral domain.

Synthetic Mineral Aeral Mixtures
Aeral mixtures are synthetically generated using spectral data of pure minerals ( Figure 3). In order to reproduce variability of pure mineral spectra, 1000 pixel spectra have been randomly selected in each hyperspectral image of pure minerals. Then, using these spectra, a linear mixing was applied to these spectra for each mineral following the equation below: With the abundance of mineral i and the spectral reflectance. As the focus of the study is on the discrimination of the montmorillonite, only binary aeral mixtures combined with montmorillonite are considered ( Figure 4): illite-montmorillonite, kaolinite-montmorillonite, calcite-montmorillonite and quartz-montmorillonite. Each mineral abundance of a given mixture was measured by weight percentage (wt %). The abundances vary from 20 to 80 wt % by a step of 20%.

Mineral Intimate Mixtures
Twelve binary and 9 ternary mixtures of clay minerals ( Figure 4a) and 8 binary mixtures of montmorillonite with either quartz or calcite ( Figure 4b) were considered in order to uniformly cover as much as possible the area of the ternary diagrams. Each mixture had a volume of 300 mL (corresponding to a sample thickness of 2 cm in a plastic container of surface 22 × 26 cm²). This abundance was converted from volume to weight percentage in order to be compared with its aeral counterpart. As it can be seen in Figure 4, this measurement protocol did not allow us to necessarily have a superposition in terms of mineral abundance between the intimate and patchwork like mixture.
Then, the minerals were manually and homogeneously mixed in the plastic container. The surface of each sample was leveled to avoid non-linear interactions due to the roughness of the sample. The mixture homogeneity was assessed by deriving the standard deviation of the spectral reflectance measured over the image area of interest (cf. Figure 2, Section 2.1)). The latter ranged between 1% to 2.5% reflectance for each mixture, which proved good uniformity within the sample at the surface.

Methods
The methodology consists of the application of spectral preprocessing (Section 3.1), then unmixing methods (Section 3.2), and the accuracy of the retrieved abundance maps was further assessed with evaluation criteria (Section 3.3; Figure 5).

Spectral Preprocessings
Six spectral preprocessings already used with unmixing methods or commonly used in the soil scientific community were selected ( Table 2). Among them, a first group conserving the reflectance continuum includes pseudo-absorbance (Log(1/R)), the Hapke model and standard normal variate (SNV), while a second group suppressing it includes continuum removal (CR), continuous wavelet transform (CWT) and first Savitzky-Golay derivative (1st SGD). The chosen Hapke model considers the soil medium as an isotropic mixture with the same granulometry for all components [45]. The acquisition conditions were fixed to 35° for the zenithal incident angle (i.e., µ0 =cos (35)) and 0° for the zenithal viewing angle (i.e., µ =cos(0)).
CWT deconvolves the signal into a linear sum of a wavelet-base function (ψ) defined as the second order derivative of Gaussian. Ten wavelets were computed from the original spectrum. Consequently, only the second to the fifth scale wavelets were kept in order to remove the contribution of the noise and continuum as recommended by Feng et al. [56].
The 1st SGD is the result of the convolution of the reflectance spectra by the Savitzky-Golay filter [57]. It was performed with a second order polynomial fit on a window size of seven bands. Higher orders of derivative were not tested since Heydley et al., [58] showed that they did not improve unmixing performance.

Unmixing Methods
Four unmixing methods were chosen: two linear methods, FCLS (the most commonly used) and MESMA (accounts for intraclass variability), and two non-linear methods, GBM (able to considered first and second order interactions) and MLM (multiple interactions; Table 3). In MESMA, each endmember was modeled by six spectra (mean spectrum, mean spectrum ± standard deviation, mean spectrum ± 3 × standard deviation and median spectrum). The non-linear contribution was estimated by GBM with its γij parameter and by MLM with the P parameter.
The implementation of the FCLS algorithm comes from the pysptools python's package. We coded our own version of MESMA based on [39]. GBM and MLM implementation from [44] were used. Table 3. Unmixing methods. Reflectance: ρ, number of endmembers: p, endmember abundance: f , endmembers spectra: ρ and ρ , residual term: ε, free additional parameter γ , probability to undergo multiple interactions: P.

Evaluation Criteria
Three evaluation criteria expressed in percentage of weight abundances (wt %) will assess the performance of the couple spectral preprocessing-unmixing model.
The mean bias (MB) will evaluate the mean error between the estimated abundance and the measured one for a pixel n among the region of interest (ROI) image N pixels: The standard deviation around the mean bias (STDB) is as follows: The root mean square error (RMSE) is expressed as:

Clay Mixtures
Reflectance spectra for illite-montmorillonite (IM) and montmorillonite-kaolinite (MK) intimate mixtures show that the reflectance level decreased when the montmorillonite abundance increased. For IM mixtures, a spectral shift of 6 nm was observed from I100 to M100 spectra, simultaneously with a slope change in the range 2220-2250nm (Figure 6b). For MK mixtures, the double absorption feature at both 2160 nm and 2200 nm disappeared when montmorillonite abundance was more than 50% (data not shown).
Intrasample spectral variations were more important for reflectance, Hapke and Log(1/R), than for SNV, CR, CWT and 1st SGD whatever IM ( Figure 6) and MK mixtures (data not shown). For instance, these variations for reflectance spectra had an amplitude around 2% for illite and 5% for montmorillonite, while for IM mixtures they were around 5% (Figure 6a,b).
The non-linear behavior of intimate mixtures relative to montmorillonite abundance was compared with the linear behavior of aeral ones. For IM mixtures, the mean spectral difference between the two mixture types was not centered to zero, thus highlighting this non-linear behavior ( Figure 6). Then, the latter was the most emphasized in clay absorption spectral features for SNV, CR, CWT and 1st SGD suppressing the continuum (Figure 6o,r,u). For reflectance spectra, the non-linearity was quantified up to 1.5% for M20I80 (Figure 6c).
For clay ternary mixtures, same conclusions as before were derived from the analysis of the spectral preprocessing (data not shown). Figure 6. Montmorillonite-illite mixtures: from (a) to (c) reflectance spectra and from (d) to (u) application of the 6 spectral preprocessings for aeral mixtures (first column) and intimate mixture (second column), difference between mean spectra of aeral and intimate mixtures (third column; line: mean reflectance, ribbon: mean ± one standard deviation).

Montmorillonite-calcite/quartz Binary Mixtures
For MC intimate mixtures, the montmorillonite spectral absorption feature started disappearing when calcite abundance was comprised between 80% and 100% (Figure 7b).
The same conclusions as for clay binary mixtures applied for the intrasample spectral variations (Figure 7). For reflectance spectra, they accounted for less than or about 5% (Figure 7b). However contrary to these previous clay mixtures, a better intersample discrimination was noticed with reflectance, Hapke and Log(1/R).
The study of the non-linear behavior of intimate mixtures led to the same conclusions as previously. However, the non-linear effect tended to be higher when calcite abundance dominated, and achieved 4%-6% for M20C80 (Figure 7c).
For MQ reflectance spectra, all spectra seemed to overlap for M abundance superior to 11% (Figure 8b). For this reason, the use of spectral preprocessing brought no benefit for spectra separability (Figure 8). MQ aeral mixtures were not shown because of abundances values between aeral and intimate mixtures did not match (cf. Figure 4).

Clay Mixtures
Linear unmixing method performances depend on the intrinsic errors due to the unmixing algorithms and the errors due to the non-linearity of intimate mixtures. The first was estimated by applying the unmixing methods on aeral mixtures, and the second would be deduced from the comparison of these results to those obtained with intimate mixtures. These two steps were detailed for clay binary mixtures, while for clay ternary ones; only the global error was given, which accounted for both intrinsic and non-linearity errors (no aeral mixtures available).
For IM aeral mixtures (respectively MK), the best RMSE performance is shown for FCLS applied on SNV, CR, CWT and 1st SGD with RMSE from 2.4% to 3.8% (respectively from 3.2% to 6.2%) ( Table 4). Moreover, CWT and 1st SGD biases were close to zero. Oppositely, FCLS applied on reflectance, Hapke or Log(1/R) led to a large range in M abundance estimates with STDB > 14% (respectively > 16%) with an underestimated mean bias (highest bias for Hapke with -5.3%, respectively -3.9%; Figure 9a). For MESMA and both IM and MK aeral mixtures, the best RMSE performance was achieved with whatever reflectance and spectral preprocessing except for Hapke and ranges between 2.6% and 5.3%. However, the range in montmorillonite abundance estimates was strongly reduced compared to FCLS for reflectance, Hapke or Log(1/R) by a maximum factor of 4, but this reduction was not significant for SNV, CR, CWT and 1st SGD (Figure 9b).
For IM intimate mixtures (respectively MK), FCLS methods delivered similar best results with SNV, CR, CWT and 1st SGD with RMSE ranging from 8.4% to 12.7% (respectively from 13.6% to 15.7%; Table 4). Their mean bias in M estimated abundances increase compared to the aeral mixture, from 7.1% to 12.2% for IM intimate mixtures (respectively from 12.5% to 14.8%) and below 2.1% for aeral ones whatever the IM/MK mixtures. For MESMA and IM intimate mixtures, the best results were achieved with reflectance and whatever spectral preprocessing except Log(1/R) with RMSE less than 13% (respectively 15.7% for MK). For IM and MK, a degradation of performances compared to aeral ones was observed around a factor of 2, except for Log(1/R) (degradation by a factor of 5) and Hapke (improvement by a factor of 2 only for IM).
For illite-montmorillonite-kaolinite (IMK) intimate mixtures, better results with FCLS were with SNV, CR, CWT and 1st SGD for RMSE between 9.1% and 14.8% (Table 5). However, the use of SNV led to higher absolute mean biases (MB of around 13.9%) than the others (MB between 7.4% and 9.7%) while the variability in estimates was comparable among them (STDB between 4.8% and 5.2%). For the MESMA method, results performed the same for reflectance and all spectral preprocessing with RMSE ranging from 7.1% (Hapke) to 14.5% (SNV) and similar values of STDB from 5% to 9.5%.    Table 6). For MC/MQ mixtures and both FLCS and MESMA, the use of reflectance, CWT and 1st SGD introduced almost no bias in the estimates (≤ 0.4 %).
For MC intimate mixtures, FCLS and MESMA with reflectance or Log(1/R) delivered the best performance with RMSE less than 10.8% and STDB less than 8%. In a least extent, the use of CR with MESMA gave a RMSE of 16% while RMSE accounted for more than 21% for the other spectral preprocessings with both FCLS and MESMA. Furthermore, a large increase in mean bias was observed for FCLS and MESMA coupled to SNV, CR, CWT, Hapke or 1st SGD (MB between 14.3% and 36.6%). Compared to aeral mixtures, poorer performances were obtained with these spectral preprocessing-unmixing method couples by a factor worst than 2, excepted for Log(1/R)-MESMA (no explanation found).
For MQ intimate mixtures, unmixing performance is very low for all linear unmixing methods, with and without spectral preprocessing, with a RMSE higher than 50%. Thus compared to aeral mixtures, performance are the worst for intimate mixtures by a factor of more than 7.

Clay Binary Mixtures
For GBM, the best results were obtained with CR, CWT and 1st SGD, leading to RMSE ranging between 13.4% and 13.9% for MK intimate mixtures, and between 7.7% and 9.2% for IM ones ( Table  7). The worst performance was achieved for reflectance and Log(1/R). Comparing MK with IM mixtures, mean biases were higher with MB of around 12.4% for the first and 7% for the second. The non-linear contribution estimated by GBM was examined through the parameter γ and only for MK mixtures (same conclusions for IM ones; Figure 10). Histograms of γ had narrow peaks close to zero for all spectral preprocessing and reflectance, with CR and 1st SGD having a mean value almost null.
Oppositely, performance of MLM was less dependent on the use or not of spectral preprocessing with RMSE in the range 12.4%-14.8% for MK mixtures and 7.0%-12.9% for IM ones. Histograms of P, the non-linear contribution of MLM, were centered around zero with a wider distribution covering both positive and negative values ( Figure 11). The wider ones were observed for reflectance and Log(1/R).    (Table 8). In the least extent, GBM with CR gave a RMSE of 17.9% while using the other spectral preprocessings led to RMSE more than 17.9% due to higher values of MB. Associated histograms of γ were close to zero (Figure 12). MLM results brought more degradation in global performance with RMSE superior to 27%, except for Log(1/R) with RMSE equal to 14.7%. Associated histograms of P were centered to zero for CWT and 1st SGD with narrow peaks, while the histograms of reflectance, CR and Log(1/R) were more spread with negative and positive values (Figure 13). At last for SNV, the histogram was a narrow peak around a mean negative value.
For MQ intimate mixtures, both GBM and MLM presented poor performance with RMSE more than 50% (Table 8).

Clay Ternary Mixtures
For the GBM method, the best results were obtained with CR, CWT and 1st SGD, RMSE respectively of 7.5%, 10.2% and 10.8%, and MB and STDB of the same order of magnitude (Table 9). SNV delivered higher absolute mean biases of around 20.5% and RMSE of 21%. For the MLM method, performances were similar whatever the spectral preprocessings (RMSE between 9.9% and 12.7%) except for CR with RMSE of 17.1%. The main errors were driven by MB (15.6%) in comparison with STDB (6.8%). Table 9. Performances of non-linear unmixing methods to estimate montmorillonite abundances for IMK intimate mixtures with different combinations of spectral preprocessings (best results in grey cells with bold font).

Non-linearity of Intimate Mixtures Depending on the Mineralogical Composition
Following the increase in montmorillonite abundance, the non-linearity of intimate mixtures was put into evidence compared to aeral mixtures. For two minerals having the same granulometry (around 80 µm), the main contribution of this non-linearity was due to the mean level of reflectance if the minerals have close spectral absorption features such as illite and montmorillonite. Then, with two minerals having no overlapping spectral features such as calcite and montmorillonite, the spectral variations in the absorption features were added as a second contributor to the non-linearity. At last, in the presence of a mineral without any absorption feature such as quartz, the spectral effect of the non-linearity increased drastically. Actually, the reflectance could be decomposed into two components, the volume and the surface reflectances, whose relative contribution depended on the granulometry. First, due to the high reflectance of quartz, the volume reflectance was favored by the multiple scattering inside the mixture and as such increased the chance for a photon to interact with montmorillonite mineral. As a consequence, the non-linearity was observed in the absorption band of montmorillonite, even if its abundance was low as it had already been mentioned by Clark [62]. Second, the surface reflectance depended on the reflective value and the abundance of each mineral. The higher the granulometry, the higher is the surface reflectance relative contribution. Thus for quartz/montmorillonite mixtures, the non-linearity contribution might be partially compensated by the quartz particle size (300 µm).

Spectral Variability Reduction with Spectral Preprocessings
Application of spectral preprocessings on intimate mixtures spectra showed that 1st SGD, CWT, SNV and CR significantly reduced the intrasample spectral variability while Log(1/R) and Hapke gave an intrasample spectral variability with the same order of magnitude as with reflectance spectra, but sometimes much worse.
Actually, two classes of spectral preprocessings could be distinguished based on their spectral transformation: quasi-linear (1st SGD, CWT) and non-linear (SNV, CR, Log(1/R), Hapke). These two spectral preprocessing classes led to different spectra and introduced some artifacts. CWT decomposes the spectrum into a sum of linear signals and preserves additive properties of the spectra [20] while the 1st SGD is only sensitive to the local spectral variations. SNV is based on a centered-reduced spectrum normalization [17,19] corresponding to a multiplicative correction causing a loss of any additive contribution. Clark et al. [60,62] showed that CR correctly identifies the location of spectral absorption features with spectra having a steep slope continuum but to the detriment of the deformation of the spectral feature [20,62]. Log(1/R) transforms data into the logarithmic space, which, in relationship with Lambert-Beer's law increases linearity between spectral data and constituent abundance [17,19]. However, additive and multiplicative properties of the spectrum are not suppressed, so that the variability is not reduced. Hapke transforms the data into the albedo space, leading to the same conclusions as Log(1/R).
As a result, the weak intrasample spectral variability of spectra deduced from quasi-linear and non-linear spectral preprocessings 1st SGD, CWT, SNV and CR, was explained by their low sensitivity to the reflectance mean level. On the contrary, the non-linear Hapke and Log(1/R) preserved the intrasample spectral variability of the reflectance. However, the reduction of the latter did not lead to a better discrimination between the different mixture spectra because our samples were homogeneous and thus had low intraclass variability.

Performance of Linear Unmixing Methods with Spectral Preprocessings
For clay binary aeral mixtures (IM and MK) and reflectance spectra, MESMA is expected to have better performances than FCLS, as it takes into account the intrasample variability of endmember [39]. Improved estimation results were particularly noticeable with reflectance spectra and Log(1/R) having a large intrasample spectral variability, but improvements were very small with the other spectral preprocessings already having a reduced intrasample spectral variability. Overall, FCLS and MESMA had comparable performances with SNV, CR, CWT and 1st SGD. However, SNV and CR carried higher biases in montmorillonite abundance estimation. For calcite and quartz aeral mixtures (MC and MQ), the same conclusions as before could be done in the main lines, with the best results achieved with 1st SGD, CWT and CR.
For clay binary intimate mixtures, the best spectral preprocessings were SNV, CR, CWT and 1st SGD for FCLS whereas for MESMA, Hapke and reflectance spectra were added. MK mixtures had larger montmorillonite estimation errors than for IM ones, oppositely to the clay aeral mixtures. Possible reasons might be that bias values were more important for intimate mixtures than aeral ones. Then for the latter, since there was a strong spectral overlapping of IM mixtures within the range of the intrasample spectral variability (Figure 6a), confusions might occur in finding the best unmixing solution, whereas, confusions might be added due to the non-linear effects for intimate mixtures (more important for MK than IM mixtures). Among the best couples spectral preprocessing/unmixing method, CWT or 1st SGD/FCLS or MESMA were selected. By comparing their results between the aeral and intimate mixtures, the main source of error based on the RMSE came from the non-linearity of the intimate mixtures with an increase by a factor of 2-6. For calcite/montmorillonite intimate mixtures, the best couples spectral preprocessing/unmixing method were REF/FCLS or MESMA. Performances were similar with IM intimate mixtures but better than MK ones. The impact of the non-linearity within the intimate mixtures compared to the aeral mixtures increased by a factor of 2-4. For quartz/montmorillonite intimate mixtures, the non-linearity effect due to the presence of quartz was too strong to be taken into account by the linear unmixing methods and the performances were very low. For ternary clay intimate mixtures, CR or CWT/FCLS or /MESMA were the best couples and their performance had the same order of magnitude as those of clay binary mixtures.
As a first conclusion, when two minerals present overlapped absorption features (clay mixtures), the best spectral preprocessings were those enhancing the slight spectral local changes, such as SNV, CR, CWT and 1st SGD. However, SNV was not recommended since it might bring higher errors on mineral abundance estimation in some cases. Oppositely, when no overlap occurred (mixtures with calcite), Hapke, and reflectance spectra performed the best because they rely on the global variation of the spectrum (i.e., continuum), which varied linearly with the addition of minerals.

Performance of Allunmixing Methods with Spectral Preprocessings
For clay binary intimate mixtures and whatever linear or non-linear unmixing method, results of montmorillonite abundance estimation were in the range 10.8%-25.4% (RMSE) for MK mixtures and 7.0%-29.5% for IM ones. MESMA, FCLS and GBM had comparable performances when using SNV, CR, CWT and 1st SGD, the same as MLM combined with whatever spectral preprocessing. MLM algorithm might compensate the higher intrasample spectral variability with the P non-linear parameter, because this spectral variability had a larger amplitude than the non-linearity of the intimate mixture for similar grain size in clay mixtures [34]. Revel et al. [63] have also observed the prevalence of intraclass variability over non-linear effects on the performance of unmixing methods in another context.
For calcite/montmorillonite intimate mixtures, RMSE ranged between 8.2% and 38.6%. GBM had a very close performance with reflectance and Log(1/R) compared to FLCS and MESMA with reflectance, while MLM performed the worst. Compared to clay binary intimate mixtures, this last case may be explained because the P parameter of MLM had negative values. Heylen et al. [34] mentioned that the use of MLM could lead to errors in the case of high reflective materials, like calcite.
For quartz/montmorillonite intimate mixtures, results of montmorillonite abundance estimation were very poor with RMSE around 50.4%-65.3%. The difficulties raised in this study with mixtures containing quartz are also noticed by Asadzadeh et al., Viscarra Rossel et al. and Mulder et al. [15,26,29] since detecting and quantifying quartz are still major limitations for soil mineralogy spectroscopy in the range 400-2500 nm. Indeed, some studies chose to neglect the impact of quartz abundance [26]. However, Debba et al. [16] demonstrated that taking into account quartz abundance as an endmember for unmixing produces more accurate results.
For clay ternary intimate mixtures, besides the increasing impact of non-linear effects from clay binary to clay ternary intimate mixtures, the performance was globally similar whatever the unmixing method with montmorillonite abundance estimation in the range 7.5%-24.3% (RMSE). The best performance was obtained for CWT or 1st SGD/FCLS or MESMA or GBM or MLM.
Globally several trends could be deduced, except for the quartz/montmorillonite mixtures: 1. For clay intimate mixtures either binary or ternary and whatever unmixing methods, the best spectral preprocessings were CWT and 1st SGD while for calcite/montmorillonite intimate mixtures, the best results were obtained without the use of spectral preprocessing. Consequently, it was recommended to use quasi-linear spectral preprocessings (CWT or 1st SGD) when the absorption bands of minerals overlap and keep the reflectance in the other cases (calcite). Attention should be paid with the use of SNV leading to higher biases in montmorillonite abundance estimations, and Log(1/R)/Hapke leading to non robust results dependent on the mixture type and the unmixing method. The only best results for Hapke were obtained with MESMA for clay intimate mixtures (IM: RMSE of 6.6%, MK: RMSE of 12.1% and IMK: RMSE of 7.1%) with minerals of similar granulometry (80 µm), which is in agreement with Heylen and Scheunders [31] (RMSE less than 2% for alunite-quartz mixtures having similar grain size). However, with minerals of different granulometry (calcite: 70 µm, quartz: 300 µm), performance was poor because it violated the main assumption of the Hapke algorithm used in this paper [44]. 2. Similar performances in terms of RMSE were noticed between the use of linear and non-linear unmixing methods. As a result, the use of the simplest linear unmixing method, FCLS, was advised, coupled with the simplest spectral preprocessing, 1 st SGD, for clay intimate mixtures and without the use of spectral preprocessing with calcite/montmorillonite. 3. The error in montmorillonite abundance estimation achieved for the best couple mentioned in Section 2, a RMSE of 9.2% for IM mixtures, 13.9% in MK mixtures, 10.8% in clay ternary mixtures and 8.8% for MC mixtures. These results were better than those obtained by [12] using a geometrical analysis for montmorillonite-illite-kaolinite mixtures (RMSE 15.5%). For more complex mixtures, the performance gave an RMSE of around 8% using a regression tree for smectite-kaolinite-muscovite-calcite-quartz mixtures [26] and an RMSE of around 3.4% using a multivariate analysis for smectite-illlite-kaolinite-carbonate-quartz mixtures [29].

Conclusions
A comparative study was carried out in order to assess the performance of combining unmixing methods (two linear and two non-linear methods, FCLS and MESMA, GBM and MLM, respectively), with and without the use of six spectral preprocessings (SNV, CR, CWT, Hapke, 1st SGD and Log(1/R)), for the estimation of montmorillonite abundance. The objective of this work was to analyze the sources of non-linearities as a function of the mixture type (aeral versus intimate), the number of minerals (binary and ternary) and their nature (clay, calcite and quartz).
The major results included the following: (i) spectral preprocessing SNV, CR, CWT and 1st SGD reduced the spectral intrasample variability, (ii) the benefit of the spectral preprocessings CWT and 1st SGD occurred when spectral absorption features of minerals overlapped, whereas the reflectance spectra without spectral preprocessing performed the best when no overlap occurred, (iii) SNV carried non-linear effects, which led to biases for montmorillonite estimation and the use of Log(1/R) and Hapke sometimes led to non robust results, (iv) unmixing on mixtures with quartz achieved the worst performance with RMSE higher than 50%, (v) linear and non-linear unmixing methods had similar performance and so the use of FCLS, the simplest method, might be recommended and (vi) the most robust couple of spectral preprocessing and unmixing method was 1st SGD and FCLS for the clay binary mixtures and reflectance and FCLS for the mixtures with calcite, with RMSE ranging between 8.8% and 13.9%. For soil swell-shrinking application, our results with ternary and binary clay and clay-calcite mixtures gave an RMSE less than 15%, the use of unmixing methods could be of interest in order to improve the classification of expansive soils.
This study was only carried out with mineral mixtures manually generated in laboratory conditions, in order to study non-linear interactions due to mineralogy alone. However, these mixtures did not represent real soil composition, mainly because they did not include the presence of soil organic matter and water content and because of the homogeneous mineral content (little variability in a mixture, with a controlled grain size). Thus, future work will be to carry out the same comparative study applied on real soils with very high resolution hyperspectral imaging in order to take into account the impact of soil composition like mineralogy and, water content and organic matter content [7,29], as well as environmental factors like soil surface roughness [64] and shadow effects [64][65][66][67] and the presence of either sparse green or dry vegetation with soil within a pixel [49,[68][69][70]. As quartz and carbonates minerals possess distinct features in long wave infrared (LWIR, 7.5-14 µm) [56,71], we propose in a further perspective to use the combination of SWIR and LWIR spectral domains in order to improve clay estimation in soil.