A Probabilistic Approach to Phosphorus Speciation of Soils Using P K-edge XANES Spectroscopy with Linear Combination Fitting

A common technique to quantitatively estimate P speciation in soil samples is to apply linear combination fitting (LCF) to normalized P K-edge X-ray absorption near-edge structure (XANES) spectra. Despite the rapid growth of such applications, the uncertainties of the fitted weights are still poorly known. Further, there are few reports to what extent the LCF standards represent unique end-members. Here, the co-variance between 34 standards was determined and their significance for LCF was discussed. We present a probabilistic approach for refining the calculation of LCF weights based on Latin hypercube sampling of normalized XANES spectra, where the contributions of energy calibration and normalization to fit uncertainty were considered. Many of the LCF standards, particularly within the same standard groups, were strongly correlated. This supports an approach in which the LCF standards are grouped. Moreover, adsorbed phytates and monetite were well described by other standards, which puts into question their use as end-members in LCF. Use of the probabilistic method resulted in uncertainties ranging from 2 to 11 percentage units. Uncertainties in the calibrated energy were important for the LCF weights, particularly for organic P, which changed with up to 2.7 percentage units per 0.01 eV error in energy. These results highlight the necessity of careful energy calibration and the use of frequent calibration checks. The probabilistic approach, in which at least 100 spectral variants are analyzed, improves our ability to identify the most likely P compounds present in a soil sample, and a procedure for this is suggested in the paper.


Introduction
To understand the biogeochemical cycling of phosphorus (P) in nature and the risks of, e.g., P leaching and subsequent eutrophication, knowledge on the speciation of P in soils is crucial. The soil chemistry of P is rather complex and involves several different inorganic and organic phases, which have different reactivity. One of the few available methods for directly estimating P speciation in soils is X-ray absorption spectroscopy, which may include both extended X-ray absorption fine structure (EXAFS) and X-ray absorption near-edge structure (XANES). Both involve exposing a soil sample to X-ray radiation over an energy range close to and including an absorption edge, and quantification of the X-ray absorbance over this energy range, usually (in the case of P) by fluorescence detectors [1]. For a light element such as P, however, the use of EXAFS spectroscopy is difficult for at least two reasons: (i) the presence of sulfur, which is ubiquitous in the environment, limits the usable k range severely; and (ii) the adequate identification of second-shell scattering paths requires much higher P concentrations than those commonly present in soils to obtain a sufficiently clean signal. As regards XANES spectroscopy, data can be collected both at the K-edge, at near 2145 eV, and at the L2,3 edge, at 135 eV. However, for the latter, concentrations of several thousand mg P kg −1 are required for acceptable spectral quality [2], which makes P K-edge XANES spectroscopy a more useful alternative for most soils. Use of P K-edge XANES spectroscopy involves collecting spectra from samples and comparing them with a number of standards thought to represent the most common P species in the sample.
In two early papers, Franke and Hormes [3] and Hesterberg et al. [4] presented P K-edge XANES spectra for a large number of P compounds relevant for soils. A few years later, Beauchemin et al. [5] suggested the use of linear combination fitting (LCF) to estimate the P phase composition of a soil sample from normalized XANES spectra. Since then, the literature has exploded with hundreds of applications of P K-edge XANES to study the P speciation in soils, sediments, by-products, and other environmental materials. No doubt this development has been possible thanks to computer codes such as Athena [6], which greatly facilitate the processing of XANES spectra and their subsequent analysis by LCF. In LCF, a sample spectrum is modelled with a linear combination of spectra from standards that have a known structure and composition.
An important consideration to make when selecting appropriate standards for LCF is to what extent they have unique features that allow a clear discrimination between different P species [1,5,7]. This is not always easy, as some important P-containing compounds, such as organic P, have rather featureless spectra [7,8]. By contrast, when the P speciation is dominated by a Ca phosphate such as hydroxyapatite, the LCF result is less uncertain due to its relative richness in features [9], specifically a distinct shoulder on the high-energy side of the white-line peak. A related problem is that the XANES spectra of many P standards are highly correlated, particularly within the same group of compounds [5]. For example, adsorbed PO4 on ferrihydrite has very similar spectral features as adsorbed PO4 on goethite, making it difficult or even pointless to try to estimate the contribution from each using LCF [5]. This is particularly the case in a matrix such as soil, which contains many other P-containing phases that may blur the small differences in spectral features further. Therefore, a common practice is to sort the standards into different groups such as Fe-or Al-(hydr)oxide-adsorbed phosphate [5,10].
So far, the actual uncertainty of LCF has seldom been studied, although different crude estimates have been made on typical uncertainties of LCF-fitted percentages, e.g., 10% [1,11] or 15% [12]. To shed more light on the uncertainty of LCF, Ajiboye et al. [9] prepared binary mixtures of Ca, Al and Fe phosphates, collected their XANES spectra, and then performed LCF. It was found that the LCFfitted percentages of standards differed by between 0.8% and 17% from the known mixtures, with the lowest percentages for those systems where Ca phosphates made up a large proportion of the P. Although this can serve as an indicative estimate of the magnitude of the errors associated with LCFs, they likely do not provide the whole picture, as the identity of the P-containing phases in a 'real' soil is unknown at the outset of the LCF analysis.
If the uncertainty of LCFs was more precisely known, it may also allow proper statistical treatment to, e.g., figure out statistically significant differences in the P composition between different samples. One important source of uncertainty is normalization, i.e., the procedure by which the P Kedge XANES spectra are normalized to a unit edge jump, where the energy ranges for baseline correction and normalization are somewhat arbitrarily set. To deal with this uncertainty, Werner and Prietzel [11] suggested a method by which a large number (i.e., 65,000) of normalized spectra were produced using different combinations of normalization parameters, within certain limits. These spectra were then subject to LCF using a small number of selected standards. Of the resulting LCFs for which the sum of the individual fractions ("sum of weights", SOW) was very close to 1, the fit with the lowest R factor was chosen. One may argue that this method requires that the standards used constitute a perfect representation of the species found in a sample, since if the P phases in the sample were slightly different as regards crystallinity, impurities, mixed phases, etc., a SOW of 1 for the best fit may not necessarily be expected.
However, normalization is not the only factor that gives rise to uncertainties in LCF. Calvin [13] listed a number of additional sources of uncertainty, e.g., the acquired X-ray energy (resulting from misalignment or from poor energy resolution), self-absorption, glitches, and noise-some of which can be hard to quantify. All these uncertainties led Giguet-Covex et al. [14] to conclude that P K-edge XANES results should be treated with caution, and they recommended the use of complementary techniques (e.g., sequential extractions) to constrain P speciation results.
Among the uncertainties, errors in calibrated energy can be expected to be particularly important on the P K-edge, as a typical LCF analysis makes use of the very small differences in the energy position of the main absorption edge (<1 eV) that differentiates organic P and Ca phosphates from Fe-and Al-associated P [15]. Further, self-absorption, where fluorescence X-rays are reabsorbed by absorber atoms in the sample before they are detected, can be an issue, particularly at high P concentrations. Hurtarte et al. [16] recently showed that self-absorption effects appear to be insignificant at P concentrations < 100 mmol P kg −1 , at least if the samples are well prepared. This would mean that for most soil samples, self-absorption would not be a problem. However, many standards have been prepared at much higher P concentrations (e.g., 300-400 mmol kg −1 ) by dilution in boron nitride or other low-Z, inert material to obtain low-noise spectra for fitting. However, significant self-absorption effects still cannot be ruled out, because of the higher concentrations and because some materials are difficult to disperse successfully in boron nitride.
The main purpose of this work was to address the uncertainty of LCF using a probabilistic approach, in which a large number of normalized spectra were produced from assumed uncertainties in the calibrated energy and normalization parameters. In addition, the library of standards was analyzed to exclude those standards that might be subject to substantial self-absorption and those that were highly correlated with other standards. Based on the findings, we suggest a procedure for how to fine-tune LCF results that allows for proper uncertainty analysis.

Soil Samples
A total of 11 samples were analyzed in this work. These were taken from different projects to represent a wide variety of soil chemical properties. Four of the soil samples-203 FYM 0-28 cm, 203 FYM 35-45 cm, 302 BIO 0-28 cm, and 302 BIO 35-45 cm-were from the long-term field experiment Qualiagro at Feucherolles in the Paris Basin, France. The soils are Glossic Luvisols developed in silt loam. More details on these soils and their P chemistry are provided elsewhere [17]. Four additional samples-Ekebo A3, Ekebo D3, Fors A3, and Fors D3-were taken from the Swedish long-term soil fertility experiments. The Ekebo soil was a loamy Haplic Phaeozem, whereas the Fors soil was a siltyloamy Calcaric Phaeozem [10,15,18]. The samples were taken from the A horizon (0-20 cm), and are from plots receiving different amounts of PK fertilizer, where the A3 plots received no P or K, whereas the D3 plots received 30 kg P and 80 kg K ha −1 yr −1 , plus replacement of harvested P and K. Tärnsjö Oe and Tärnsjö Bs are from a sandy Haplic Podzol 60 km NW of Uppsala, Sweden, where the Oe horizon is from the mor layer and the Bs horizon is from the uppermost part of the B horizon at 10-20 cm depth. Rödålund E is from an Albic Podzol developed in a wave-washed sand deposit 40 km NW of Umeå, Sweden. Basic soil properties of all soils are presented in Table 1.

Phosphorus K-Edge XANES Spectroscopy
Prior to spectroscopic measurements, the soils were air-dried, sieved (<2 mm) and then finely milled to a powder (<0.1 mm). All P K-edge XANES measurements were carried out at BL-8 of the Synchrotron Light Research Institute (SLRI), Nakhon Ratchasima, Thailand [19]. The synchrotron facility was equipped with a 1.3 GeV beam storage ring, with a beam current of 80-150 mA. An InSb (111) double crystal monochromator was used, giving a beam flux of 1.3 × 10 9 to 3 × 10 11 photons s −1 (100 mA) −1 in a 17.7 × 0.9 mm 2 beam. The beamline was operated in the fluorescence mode. Fluorescence emitted from the sample was measured by a solid-state 13-element Ge detector. Data were collected across an interval of 2100-2320 eV.
The powdered sample was applied as a thin layer on P-free Kapton tape, supported by a small metal frame, which was sealed with polypropylene X-ray film and mounted on the sample holder in the beamline. The sample compartment was filled with He gas to avoid photon absorption by air. The intensity of the incoming beam was monitored with a mixed N2/He gas-filled ion chamber. The energy step between each measurement ranged from 0.2 to 5.0 eV, with the smallest steps taken near the edge (2140-2155 eV). A dwell time of 3 s was used. The number of scans per sample ranged from 3 to 8, depending on the P content of the sample. The energy was calibrated by setting the maximum of the first derivative of the spectrum (E0) for elemental P powder (black phosphorus) to 2145.5 eV, which was collected in transmission mode to avoid self-absorption after applying a very thin layer of finely ground P powder on P-free polypropylene tape. If necessary, re-calibration took place after each electron fill of the storage ring at 12 h intervals. During each session, P K-edge XANES data were collected periodically for a variscite standard sample (E0 = 2154.05 eV) to correct for any shifts occurring during the beam time.
The initial data treatment was carried out with Athena, version 0.9.025 [6]. This included energy calibration, merging and normalization of sample scans, and LCF analysis. The normalization procedure included baseline correction by subtracting a linear function from the pre-edge region at −30 to −10 eV relative to E0 from the spectra, where E0 was defined as the maximum in the first derivative of variscite at 2154.05 eV. As a starting point, the edge step used for the normalization was determined with a linear function across the post-edge region between 30 and 45 eV relative to E0. P K-edge XANES spectra for 34 standards were available for use in the LCF (Table S1 and S2). The standard spectra had been collected at the same beamline and under the same experimental conditions as the samples and were diluted with BN to avoid self-absorption [5,20]. However, not all standards are relevant for soils. Further, a number of the standards had a low normalized white-line intensity compared to other similar compounds, which may indicate self-absorption despite the dilution with BN (Table S2). Moreover, as it was suspected that some of the standards may be highly correlated with one another, or could be described with a combination of other standards, the initial stage of the work consisted of an analysis in which the uniqueness of the standards used for LCF was analyzed, as outlined in Section 3.1.
The LCF analysis of the P K-edge XANES spectra of the soils was then carried out using 14 or 15 selected standards (see Table S2 and Results section). The LCF was performed for the energy range between −10 and +30 eV relative to E0, where the latter was kept constant at 2154.05 eV, i.e., at the E0 of variscite. Moreover, up to 4 standards were allowed [20,21]. In the fitting process, energy shifts were not permitted, and the SOW was not forced to one. However, if the SOW of the preliminary LCF was <0.95 or >1.05, the normalization range was modified, usually by adjusting the upper bound of the post-edge region, and then LCF was performed again. For the final LCF, the weights were renormalized to a sum of one. The final normalized spectrum, as well as the corresponding LCF fit, were saved for later uncertainty analysis.

Uncertainty Analysis
In the LCF analysis of Athena, the 1σ (i.e.,±1 standard deviation) uncertainty of each calculated weight is provided in the output. However, this uncertainty only applies to the methodological uncertainty of the LCF itself and is related to the reduced χ 2 (chi-squared) value of the overall fit. In other words, uncertainties caused by other factors such as errors in normalization or calibrated energy are not addressed. In the approach used here, we estimated uncertainties in the calibrated energy and in the normalization procedure by a Monte Carlo sampling approach, in which 100 or 1000 variants of each normalized spectrum were obtained by Latin hypercube sampling (LHS) [22]. For both the normalization and energy errors, we used Beta(α,β) distributions, with α = β = 1.5. These distributions are less peaked than Gaussian distributions but still have the highest probability density at the 50th percentile. In the case of the error in calibrated energy, the 0th and 100th percentiles were set to −0.05 and 0.05 eV (see Figure S1). This implies a slightly smaller error than that associated with energy calibration itself (±0.11 eV, [18]), but represents the error range observed when variscite was used as an internal calibration check. For the normalization errors, we assumed a maximum error of 7% from the initial LCF at E0 and below. The error was then assumed to decrease linearly to 0 at the lower end of the post-edge normalization range, i.e., at an energy of +30 eV relative to E0 with the standard parameters given above. The value of 7% was obtained by systematically varying the normalization range to get SOW values ranging between 0.95 and 1.05 in the LCF. An example of the resulting LHS-generated spectral variants is shown in Figure 1 for the 203 FYM 0-28 cm sample. For this sample, the normalized white-line intensity varied by 14% between 5.76 and 6.59. Each of these sampled spectral variants were then subject to LCF analysis using all the selected standards for the soils. The code AthenaAut was written in Visual Basic NET to produce the LHSgenerated spectral variants, to automate Athena ver. 0.9.025 for multiple runs, to retrieve the best fits for all LCFs and to import them to Excel for further data handling. In this work, we used the so-called R factor [23] as the goodness-of-fit value, where R is defined in the following way: Equation (1) dictates that the lower the R factor, the better the fit. Whether the R factor is an ideal goodness-of-fit parameter for LCF is certainly up for discussion. For example, the risk for parameter overfitting is not addressed in this approach. Other ways of assessing the goodness of fit such as Akaike's or Bayesian information criteria would probably be preferred. However, we chose to extract the R factor of the best fit for two reasons. Firstly, in current usage, it is the most commonly used output from XANES-LCF, and second, it is the goodness-of-fit parameter that is most easily extracted from Athena with the coding approach used in this work. Thus, a new 1σ uncertainty was obtained from the standard deviation of the spectral weights in the outputs from all the fits with the lowest R factor. At the same time, we assumed that the 'methodological' uncertainty was automatically incorporated in the overall uncertainty. Normalized absorption

Energy / eV
Preliminary runs showed, however, that a large number of standards occurred in the outputs, and that few standards (if any) occurred consistently in all the best fits (see Table S3). This was probably due to the strong correlation between many standards, particularly if they represented the same type of compounds. When standards did not appear in the outputs from individual LCFs, they were assigned a weight of 0 in the statistical evaluation, which caused high calculated uncertainties. To deal with this problem, we developed a procedure consisting of the following steps. First, the selected standards were divided into six standard groups in line with Eriksson et al. [ Although this reduced the problems with many weights of 0 in the outputs, it did not eliminate them. For this reason, we applied the following procedure for eliminating the number of zero weights as much as possible (see also examples in the Supplementary Materials): 1. The AthenaAut program was executed to calculate linear combination (LC) fits for 100 spectral variants obtained by LHS, using the following LCF settings in Athena: all standards were marked; at most, 4 standards were used; all combinations were fit. For each best fit of the 100 outputs, standards with non-zero weights were grouped into the appropriate standard group as defined above. Thus, for each output, there was a certain combination of max. 4 standard groups in the best fit. 2. If the same combination of standard groups was present in > 50% of the LC fits, this combination was assumed to represent the P compounds present in the soil. For those spectral variants for which the best LC fit contained other combinations, AthenaAut was rerun, excluding those standards that were not included in the best combination, and then the results from all 100 outputs could be compiled. 3. In cases where no combination of standard groups was present in 50% or more of the LC fits, it was tested whether 50% could be obtained if: all standards in the FeP and PO4 ads Fe groups were regrouped into a single group Fe-bound P, or if all standards in the AlP and PO4 ads Al groups were regrouped into a single group Al-bound P.
If >50% of the LC fits contained the same combination, AthenaAut was rerun for spectral variants with other combinations, etc., as in point 2. If 50% could still not be obtained, the next step was to test whether the use of both Fe-bound P and Al-bound P resulted in >50% for the best combination. 4. When it was still not possible to obtain the same combination of standard groups in >50% of the LC fits, all standards in the Fe-bound P and Al-bound P groups were regrouped into a single group Fe-and Al-bound P.
The whole data treatment process, from data collection to the final results with uncertainties, is summarized in Figure S2. Noise was treated as an inherent property of the normalized spectrum, and therefore we did not assign uncertainties to this parameter. Although it can be discussed whether noise should be included in an uncertainty analysis of an LCF, the derivation of a 'noise-free' spectrum, which would be important for this exercise, is non-trivial and uncertain in itself, mainly because of the rather small number of data points in regions where features occur (i.e., at the preedge or at the main absorption edge). Therefore, the impact of noise on the quality of the LCF is addressed here only using the goodness-of-fit parameter (i.e., the R factor).

Uniqueness of Standards Used for LCF
To study the covariance between the 34 different standards, Pearson's correlation coefficients were calculated for normalized intensity data in the region of the LCF (−10 to +30 eV relative to E0), and the results are presented in correlation matrix form in Figure 2. The correlation coefficients were often very high, particularly between standards in the same group of compounds, e.g., between different Ca phosphates and between Fe-and Al-bound PO4 phases. This suggests that only small variations in the P K-edge XANES spectra can lead to other standards being selected in the overall best fit of an LCF, which provides support for the idea to group the standards into different compound groups when interpreting LCF results in a complex matrix such as soils. Correlation matrix for the P K-edge XANES spectra of the 34 standard samples considered. Data points ranging from −10 to +30 eV relative to E0 were included in the analysis. For abbreviations, see Table S1.
In addition, some standards were also strongly correlated with standards in other compound groups. Monetite is one example of such a standard. This can be attributed to the relatively featureless spectrum of monetite, as it (contrary to the other Ca phosphates) does not show a marked post-edge shoulder at ~2157 eV [24]. For this reason, monetite was not selected as a standard in the library for fitting soil samples, as used in this paper.
Another way to study the uniqueness of the standards used for LCF is to investigate whether a standard spectrum can be described with a combination of other standards. Therefore, each standard was subject to LCF in which all the other standards were included, using the same methodology as for soils; however, in this case, a maximum of three standards were allowed in the fit to avoid excessively long execution times, and the results were not subject to uncertainty analysis. The best fits are summarized in Table 2. Overall, the results reinforce the results from the correlation matrix in Figure 1, i.e., that many Ca phosphates are similar to one another and consequently they can be described with a combination of other Ca phosphates with excellent, i.e., low, R factors. The similar appearance of many Ca phosphate spectra is in accordance with previous findings [25,26], and this makes it difficult to distinguish between different Ca phosphates when they appear together with other P-containing compounds, which is a common situation in soils. A similar case holds for Fe-and Al-bound PO4. Substantial amounts of Al-bound PO4 often appear in the LCF for Fe-bound PO4 1 2 3 4 5 6 7 8 9 10 11 12 13 1 4 1 5 1 6 1 7 1 8 1 9 2 0 21 22 23 24 25 26 27 2 8 2 9 3 0 3 1 3 2 3 3   species and vice versa, showing that the differentiation between Fe-and Al-bound P may not always be an easy task, particularly not when these phases constitute a low proportion of the total P. Remarkably, the standards in which phytate was adsorbed to ferrihydrite, Al(OH)3 and allophane could be excellently well described with a combination of other standards. All of these LC fits contained approximately 50% organic P (as lecithin) and 50% Fe-or Al-bound PO4, depending on the standard, and the R factor was better than 0.001 (an example of a fit is shown in Figure S3). Adsorbed phytates have recently been used in XANES-LCF for soils, often suggesting a strong contribution of adsorbed phytates to the overall P speciation [27][28][29][30]. However, our results suggest that any contribution of adsorbed phytates might alternatively be described with a combination of organic P and Fe-and Al-associated PO4. This is also in line with recent work showing that the whiteline energy and intensity, upon the adsorption of an organic P species to an Fe and Al (hydr)oxide, change to values intermediate to those of pure organic P and of inorganic PO4 adsorbed to the same (hydr)oxides [31,32]. For these reasons, we decided to exclude the adsorbed phytates from our database for soils. We acknowledge that this does not mean that these phases are not present, only that their contribution to soil P probably cannot be determined with confidence by P K-edge XANES alone. This means that to an unknown extent, the modelled proportions of organic P and Fe-and Albound P may include adsorbed phytates.
The final list of selected standards for soils, and the rationale for not including the other standards, is summarized in Table S2. In all, 15 standards were selected-of which one (the ACP standard) is only used at pH values above 7.5, as ACP is not stable below this pH value and will quickly dissolve [33]. The XANES spectra of the selected standards are shown in Figure S4.

Uncertainty Analysis and Probabilistic Estimation of Soil Phosphorus Speciation
A first consideration to make when evaluating a Monte Carlo sampling approach for uncertainty analysis is to decide on the number of samples generated by LHS. The larger the number of samples, the better is the precision of the method. Ideally, one would allow the LHS to generate 10,000 samples or more for the best possible precision. However, this is not practical, as this would lead to excessively long execution times. As Athena required approximately 10 min to complete a single LCF with 14 standards (1456 combinations) on the PC used in this work, LCF on 10,000 spectral variants would require approximately 70 days. To be a realistic option, it seems likely that LCF on a maximum of 100 spectral variants, taking 16 h to complete, would be preferred. For two soil samples, 203 FYM 0-28 cm and Tärnsjö Oe, it was tested to what extent the LCF results differed depending on whether 100 or 1000 spectral variants were used. The differences obtained were much smaller than the overall uncertainty calculated by the method ( Table 3), suggesting that the use of 100 spectral variants should be sufficient. Because of the small differences, no effort was made to incorporate the uncertainty as resulting by the suboptimal number of spectral variants in the overall uncertainty.
When comparing the output from traditional LCF using only one normalized spectrum (here referred to as the "single-fit" method) with the average weights calculated from all outputs generated from LHS-generated normalized spectra (the "probabilistic" method), it was found that they gave very similar results both in terms of weights and goodness of fit (R factors), see Table 3. It should be noted that for the probabilistic method, the R factors in Table 3 refer to the average of all outputs. This means that for some outputs, the R factors were much better than that resulting from applying the single-fit method. Table 3. Relative phosphorus speciation in soil samples (%) as evidenced from LCF using the single fit (sf) and probabilistic (prob) methods. Standard deviations are shown within parenthesis. For each standard group included in the fit, the standard with the highest average weight is shown. With the prob method, results for 100 or 1000 LHS-generated spectra are shown. The standard deviations are the ones reported by Athena (sf method) or the ones calculated from the uncertainty analysis (prob method). For the prob method the R factors represent the average of all outputs. P-Goeth = PO4 adsorbed to goethite, P-Gibbs = PO4 adsorbed to gibbsite, P-Allo = PO4 adsorbed to allophane, HAp = hydroxyapatite, Apat T = Apatite Taiba, and Brush = Brushite.
However, because the probabilistic method generated outputs that were not consistent as regards the standard groups included in the LCF, some groups had to be combined into larger groups (c.f. Examples in Supplementary Materials). The results for Rödålund E and Tärnsjö Bs illustrate this: in both cases, the AlP and PO4 ads Al groups had to be combined into a single Al-bound P group for >50% of the best fits to be described. This can be interpreted as evidence that the LCF cannot distinguish well between Al phosphates and PO4 adsorbed to Al hydroxides. For the Tärnsjö Oe and Fors soil samples, all the Fe-and Al-bound P standards even had to be regrouped into a single group, Fe-and Al-bound P.
In most cases, the obtained uncertainties were below the 10% as suggested by previous authors [1,11], with one exception: for organic P in the Fors D3 soil (10.8% uncertainty). Usually the weight of the Ca phosphate had the lowest uncertainties, commonly below 5%, again with the exception of the Fors soil. Whereas errors in normalization did not appear to lead to any consistent differences in the obtained P speciation, the errors in calibrated energy were very influential for the obtained weights, in particular for organic P (see examples in Figure 3). This was particularly true in those cases for which organic P was rather low. For the 203 FYM 0-28 cm soil, for example, the slope of the regression line means that the weight of soil organic P increased with 2.7 percentage units for every 0.01 eV the calibrated energy was displaced towards lower energies. For Tärnsjö Oe, however, where the P speciation was dominated by soil organic P, the corresponding increase was only 0.8 percentage units. Further, the weights of Ca phosphates were sensitive to the uncertainty in calibrated energy, but the trend was not consistent in this case (Figure 4). In the 203 FYM 0-28 cm soil, the weight of Ca phosphate decreased with 1.2% percentage units for every lower 0.01 eV in energy, but the weight increased at lower energy for the two other soil samples shown in Figure 4.  Table 3.   Table 3.
The outputs from the probabilistic method differed in their distributions. An example of this is seen in Figure 5 for the 203 FYM 0-28 cm sample. While the results for Ca phosphates appeared to be fairly well normally distributed, the distributions of Fe-and Al-bound P often exhibited tailing effects-some of which can be seen in Figure 5. Again, this is probably due to the strong correlation between the standards in these groups. In addition, the distribution of organic P was far from being normally distributed. Thus, any statistical treatment of such LCF outputs will probably require tests that do not rely on a particular distribution to fit the data.

Discussion
The traditional single-fit method for LCF has a number of deficiencies that have been highlighted in past research. Apart from the poorly constrained estimates of uncertainty as discussed above, it has often been observed that a large number of different combinations of fitting standards may result in a similar goodness of fit. One way to deal with this problem has been to present results of the five best fits to check whether they are consistent [10,21]. Calvin [13] presented another, more sophisticated, method, i.e., the statistically based Hamilton test for determining whether the best fit could be differentiated from other fits. This test has since been used in a number of publications dealing with P K-edge XANES spectroscopy [12,34,35]. However, although the Hamilton test addresses a fundamental issue with single-fit LCF, it still does not address uncertainties other than those associated with the LCF method itself. A Monte Carlo-based uncertainty analysis approach should be better suited for identifying the likely standard groups present in a sample and will constrain their weights more accurately. This does not mean that the selection of standard groups is without problems in a probabilistic LCF. As mentioned above (and as seen in Table S3), because many standards appear in the best fits of 100 otherwise very similar spectra, some kind of selection criteria, not firmly based in statistics, need to be used. The criteria used here, i.e., that, if possible, 50% of the fits should be consistent with a certain combination of standard groups, can certainly be discussed and should be evaluated in more detail. For example, one way to constrain the selection criteria could be to consider the chemical mass balance of elements composing the standards (e.g., Ca/P ratios relative to percentages of different Ca-phosphates) [36] or other soil chemical properties.
To what extent the speciation of individual P-containing compounds can be determined in soils with XANES spectroscopy is a matter of ongoing discussion [37,38]. Although this issue was not specifically addressed in this paper, it may be hypothesized that standards that show up often in the outputs from the probabilistic method, and are present in high concentrations, are likely to provide a good representation of a soil P phase. For each standard group, Table 3 identifies the standard that was present in the highest concentration. This information, together with Table S3, which shows the number of occurrences of standards in the fits, leads to the conclusion that apatite was usually the most common Ca-P mineral phase in these soils, although brushite was also present. For Fe-and Albound P, the results were less consistent and suggested that the predominant species is different in different soils. An important outcome of P speciation analysis is to predict behavior such as P mobilization under various geochemical conditions. Thus, tying XANES speciation results to soil P behavior, e.g., including an analysis of how fitted species change after leaching treatments, would be one indication of the suitability of the fitted standards for modelling soil P species.
The problem observed with including adsorbed phytates as standards in the LCF illustrates that for successful use of the LCF, the standards should be regarded as end-members. Standards that can be well described using a combination of other standards are not statistically meaningful and should, in general, be avoided, as they can give a misleading picture of the actual speciation. This, of course, calls for caution when interpreting results from LCF, because sometimes (as in the case of adsorbed phytates) such phases may represent an own specific group that is distinctly different from other groups and therefore of interest to identify. Complementary studies with other methods are necessary to clarify to what extent these kinds of phases are present.
An important observation is that the P speciation as obtained by P K-edge XANES spectroscopy is very sensitive to errors in the calibrated energy; this is especially true for organic P. This effect was significant, even for energy shifts of only ±0.05 eV ( Figure 3; Figure 4). Thus, researchers working with P K-edge XANES speciation of natural materials need to make every effort to ensure that the calibrated energy is as precisely known as possible. The use of a standard with a precisely known E0 as an internal calibration check (variscite in our case) is one possibility that reduces this uncertainty [39,40]. Improved energy resolution at the beamline might be another.
In summary, the results shown here demonstrate that it appears possible to use a Monte Carlo approach not only to calculate uncertainties, but also to calculate LCF weights that considers in a better way uncertainties in the results caused by, for example, the non-uniqueness in the standards used for LCF. Such a probabilistic approach allows proper discretization of identifiable standard groups. It should also allow statistical treatment of the output distributions to estimate significant differences in weights between different samples.

Supplementary Materials:
The following are available online at www.mdpi.com/2571-8789/4/2/26/s1, Figure S1: Probability density function of the Beta distribution used for linear hypercube sampling of the misalignment of the energy calibration, Figure S2: Flow chart illustrating the data treatment process as described in the paper, Figure S3: Results from linear combination fitting of phytate adsorbed to ferrihydrite, Figure S4: Stacked P Kedge XANES spectra of the 15 standards used for routine LCF analyses of soil samples, Figure S5: Stacked P Kedge XANES spectra of the 11 soils analyzed in this study, Table S1: Origin/synthesis method of the 34 standards used for P K-edge XANES spectroscopy, Table S2: Prominent features of P K-edge XANES spectra of the 34 standards considered, Table S3