Revealing Topsoil Behavior to Compaction from Mining Field Observations

: Soils are a finite resource that is under threat, mainly due to human pressure. Therefore, there is an urgent need to produce maps of soil properties, functions and behaviors that can support land management and various stakeholders’ decisions. Compaction is a major threat to soil functions, such as water infiltration and storage, and crops’ root growth. However, there is no general agreement on a universal and easy-to-implement indicator of soil susceptibility to compaction. The proposed indicators of soil compaction require numerous analytical determinations (mainly bulk density measurements) that are cost prohibitive to implement. In this study, we used data collected in numerous in situ topsoil observations during conventional soil survey and compared field observations to usual indicators of soil compactness. We unraveled the relationships between field estimates of soil compactness and measured soil properties. Most of the quantitative indicators proposed by the literature were rather consistent with the ordering of soil compactness classes observed in the field. The best relationship was obtained with an indicator using bulk density and clay (BDr 2 ) to define three classes of rooting limitation. We distinguished six clusters of topsoil behaviors using hierarchical clustering. These clusters exhibited different soil behaviors to compaction that were related to soil properties, such as particle-size fractions, pH, CaCO 3 and organic carbon content, cation exchange capacity, and some BDr 2 threshold values. We demonstrate and discuss the usefulness of field observations to assess topsoil behavior to compaction. The main novelty of this study is the use of large numbers of qualitative field observations of soil profiles and clustering to identify contrasting behavior. To our knowledge, this approach has almost never been implemented. Overall, analysis of qualitative and quantitative information collected in numerous profiles offers a new way to discriminate some broad categories of soil behavior that could be used to support land management and stakeholders’ decisions.


Introduction
Soil degradation by compaction continues to be a concern due to the increasing use of heavy machinery in intensive agriculture and forestry.Compaction decreases the pore space in soils.As a result, it has important consequences for germination rates, seedling emergence, plant rooting and crop production [1][2][3].Soil compaction also reduces oxygen diffusion and water movement, thus leading to increased N 2 O emissions and runoff Land 2024, 13, 909 2 of 23 risks [4,5].Soil compaction processes have been extensively studied for a long time [6,7] and several models have been proposed and evaluated [5].Larson et al. [8] determined compression curves from 36 agricultural soil samples globally (belonging to 8 different soil orders based on soil taxonomic classification).They applied a range of stresses on these samples under different water content, measured the resulting changes in bulk density (BD) and concluded that the compression curves were dependent on clay content, clay mineralogy and water content.De Lima et al. [9] developed an R package soilphysics for simulation of soil compaction induced by agricultural field traffic.In Table 2 of their article, De Lima et al. [9] listed a series of existing pedo-transfer functions (PTFs) for estimation of soil precompression (σp), compressive property stress, and compressive parameters from "readily available soil properties".However, the very large number and types of input parameters required to estimate σp and compressive properties raises the question of their availability.Conversely, some very basic soil properties in the input parameters (e.g., particle-size fractions, soil organic carbon content) are often missing.In addition, the soils to which these PTFs apply are poorly defined, thus the domain validity of these PTFs is unknown.We fully acknowledge the usefulness of such PTFs and the need for a mechanistic modeling of soil compaction processes to predict the distribution of stress in the soil induced by farm vehicles and resulting changes in soil structure [5].However, the very large number of input parameters on the one hand, and the oversimplification of soil parameters inputs on the other hand, do not favor their practical use as generic tools to predict susceptibility to compactness over large areas.This is why there is a need for developing simple and easy-to-implement indicators of soil compactness and/or of susceptibility to compaction.Some of the commonly proposed indicators of soil compactness are soil BD and some of its derivative indicators, such as packing density (PD), soil compaction degree (SCD) and root restrictive BD (BDr 1 and BDr 2 ).The PD was first defined by Renger [10] as a function of BD and clay content.It was used by Jones et al. [11] and Panagos et al. [12] at the EU scale.Jones [13] proposed threshold values of BD (BDr 1 and BDr 2 ) for restricting root depth, calculated from linear equations using clay content (BDr 1 ) or (clay + silt) content (BDr 2 ).Stânga [14] proposed the soil compaction degree (SCD) based on the concept of minimum required porosity (MNP).Stânga first defined MNP as a linear function of clay content.Then, he defined the SCD as the ratio (MNP-Total Porosity)/MNP, expressed in %.Other proposed compactness indicators used various ratios between the observed BD and BD measured in the lab after application of various compaction stresses [15][16][17].A limitation of these latter ratios is that they require numerous laboratory measurements.
Indeed, most indicators of compactness or susceptibility to compaction require measurements of BD for comparison against reference values.Moreover, some of these indicators, such as BD, PD, BDr 1 and BDr 2 , rely on over-simplistic hypotheses and equations.
If we simplify reality by separating mineral constituents, soil organic matter constituents, and roots, the maximum compactness of soil depends on: (i) the relative proportion within the soil of various mineral and organic matter particles, including their morphological characteristics (sizes, shapes, density and rheological properties); and (ii) the relative proportion of roots and their morphological characteristics, similar to those for mineral and organic particles.
To establish a general relationship between these variables, it would be necessary to describe the arrangement of all constituents according to their morphological characteristics, proportion, and properties.In practice, this approach is not feasible given the number of parameters to consider and the difficulty in evaluating all of them.Therefore, in a first approximation, it seems logical to explore if simple properties, such as soil texture (ST) and soil organic carbon content (SOC), would discriminate some typical soil behaviors regarding their porosity and susceptibility to compaction [18].Conversely, the "natural" resilience of soils to compaction depends on swelling/shrinkage properties, biological activity (e.g., bioturbation), and soil climate.As the intensity of compaction strongly Land 2024, 13, 909 3 of 23 depends on the type of mechanical stresses applied to soil and on other various factors, there are a myriad of combinations of situations leading to an observed compactness.
Visual field assessment of soil structure quality has been used for a long time by agronomists and soil scientists to assess the effects of different soil management practices on soil structure [19][20][21][22][23][24][25][26][27][28].Two examples of widely used procedures are field Visual Soil Assessment (VSA [25]) and Visual Evaluation of Soil Structure (VESS [21,28]).VSA and VESS often involve qualitative physical tests, such as aggregate strength, assessed by applying progressive force using fingers and hands [21].Most of the studies on VSA and VESS have been conducted to compare soil structure under different land use, soil types, and agricultural practices, often based on long-term field experiments.Some studies attempted to use VSA or VESS to build indicators of soil structure quality, such as the SOC/Clay ratio [27,[29][30][31][32][33].The SOC/Clay indicator was criticized by Poeplau and Don, [34] who demonstrated that this ratio could lead to inconsistent values.Using a large set of observations in France, Rabot et al. [35] also questioned the relevance of SOC/Clay as a national soil health indicator.Poeplau and Don [34] proposed another ratio between actual and expected SOC (SOC/SOC exp ) as an easy-to-use alternative, where expected SOC is derived from a regression between SOC and clay content.Overall, the literature shows that it is hard to derive generic and commonly accepted thresholds for soil structure indicators.Indeed, as proposed by Rabot et al. [35], some indicators might be more relevant when calibrated locally.In addition, some other indicators are highly variable temporally and spatially (especially when they involve BD measurements).
Considering these variations, one may wonder if a large number of observations could support classification, not only of structural state, but also classes of soil response to compaction.Indeed, a major drawback is that most of the studies, and even meta-analyses (e.g., [36]), included a rather low number of in situ observations and were often restricted to specific soils or textural ranges.This is why it is tempting to test the potential of large databases with in situ soil observations in order to cover a maximum range of scenarios and their combinations.
Soil scientists and soil surveyors conduct many soil descriptions during their field work.These descriptions use national instructions and standard protocols described in detail in field books [37][38][39][40][41].However, the qualitative information gathered by soil surveyors and/or stored in soil databases is seldom used, especially for estimating soil physical properties and behaviors and relating them to quantitative soil properties, such as texture, soil organic carbon content, etc.A noticeable exception is the study from Bondi et al. [42] who used machine learning to predict soil bulk density from "visual parameters".
Our aim is to test if multivariate analysis of in situ observations conducted in the framework of conventional soil surveys may provide interesting insights into inherent soil responses to compaction over a large geographic area.Our specific objectives are to (i) unravel relationships between a simple field estimate of topsoil compactness degree and measured properties/indicators, (ii) assess if some clusters of topsoil behavior to compaction can be extracted from these observations, and (iii) provide a first interpretation of these clusters in terms of topsoil behavior to compaction.

Soil Horizon Selection
Topsoil data come from the French national database DoneSol [43,44].We first selected horizons with a 0-30 cm depth from the database.Several horizons per observation site could thus be selected.We checked the names of the horizons to keep only topsoil horizons and we removed O horizons.From DoneSol, we selected data from all cultivated topsoil horizons.Most of the remaining topsoil horizons were A horizons that had been subjected to soil tillage at least once during the past 20 years.The selected horizons included topsoil converted to no-till, and many other tillage management practices.They also included soils under a large variety of crop rotations, some of which included non-permanent grassland.Soils under permanent grasslands were excluded from the study.
We used only the observations performed at a soil moisture slightly lower than field capacity and far from the permanent wilting point.This selection was based on soil moisture classes allocated to horizons by the soil surveyors at the time of the profile description and on-field tests.In total, we retained 27,775 horizons from mainland France.

Soil Attributes
In DoneSol, the compactness of a horizon is an ordered categorical variable having four classes.These soil compactness classes (SCCs) correspond to a knife penetrability test performed on each horizon when describing soil profiles.The SCCs are described in Table 1.The SCCs correspond to a soil surveyor's assessment of soil knife penetrability recorded after five attempts and avoiding large voids that may have come from recent tillage operations.The SCCs reflect "the ability of soil in a confined (field) state to resist penetration by a rigid object" as defined by the USDA [41], but are less precise than the USDA definition, which adds "of specified size", after "object".We removed horizons with missing SCC observations.Filtering missing SCCs resulted in 14,572 horizons with a description of compactness.
Quantitative variables corresponding to laboratory analysis for these horizons were also extracted (Table 2), and box plots were drawn according to each SCC (Table 1).For SOC, we eliminated values greater than 80 g kg −1 in order to remove holorganic horizons [45].We also removed values of BD ≥ 2 because they were considered as errors, or as topsoils containing too many hard and within-site dense coarse elements (gravels, stones, rocks), which usually increase BD and variability and add noise to the relationships between BD and soil compactness.Bulk Density (BD) Soil BD has been often used as a compactness indicator (e.g., [52][53][54][55][56][57]). Thus, we selected this indicator.However, it is well-known that BD values and thresholds must be interpreted in the context of other soil properties, e.g., particle-size distribution, clay content, SOC content, coarse fragment volume (e.g., [10,11,[58][59][60][61][62]).Therefore, we also calculated eight other indicators taken from the literature.
Complexed Organic Carbon (COC) and Non-Complexed Organic Carbon (NCOC) Dexter et al. [63] defined the concept of Complexed organic carbon (COC) as the ratio of the clay content divided by SOC (Clay/SOC).For French and Polish topsoils, Dexter et al. [63] stated that, for a couple of soil physical properties related to soil structure, the optimal value of Clay/SOC was equal to 10.They defined COC and NCOC as follows.
In this case, we can also possibly deduce the unsaturated clay: ClayNsat = Clay − (SOCx10).
In this case, all the clay is saturated (ClayNsat = 0).

SOC/Clay Index
We calculated the SOC/Clay ratio.Then, we applied the classification proposed by Johannes et al. [29], used by Prout et al. [31,33] and tested by Pulley et al. [32], to transform SOC/Clay ratio into four classes (Table 3).The SOC/SOC exp index proposed by Poeplau and Don [34] was based on a regression between observed SOC and Clay contents in Germany.Here, we kept the original concept from Poeplau and Don but we adopted a similar yet slightly modified method to alleviate the effect of extreme SOC values.To estimate expected SOC content (SOC exp ), we transformed clay content (%) in 25 classes of 4% clay content range.For each class of clay, we considered SOC exp as equal to the SOC median value.Then, as proposed by Poeplau and Don [34], we used quartiles of SOC/SOC exp .These quartiles were calculated for clay content classes to generate SOC/SOC exp index classes as shown in Table 4.
Table 4. SOC/SOC exp ratio classes of soil structure quality (SSQ) and their threshold values.SOC: soil organic carbon content (g kg −1 ); SOC exp : expected soil organic carbon content (g kg −1 ).SOC exp is estimated by the SOC median value for classes of increasing clay content (%).Clay content intra-class range is 4%.q1, q2, q3, q4: 1st, 2nd, 3rd and 4th quartile of SOC content statistical distributions in every clay content class, respectively.Packing density (PD) was defined by Renger [10] as a function of BD (Mg m −3 ) and clay content (Clay, <0.002 mm size, in %).Packing density is calculated as:

SOC/SOC
where PD et BD are expressed in Mg m −3 and Clay is expressed in % (i.e., g 100 g −1 ).Interpretation limits for PD classes are shown in Table 5. Stângă [14] proposed the soil compaction degree (SCD, in Romanian: Gradul de Tasare, GT) term, which was then used in the soil literature from Romania [64].The SCD is an index arising from bulk density (BD) and the clay content (Clay).Stângă [14] first defined the Minimum Needed Porosity (MNP) by the following equation: where Clay is expressed in weight % (g 100 g −1 ), and MNP is expressed in % volume.
The Total Porosity (TP) is defined by: where BD is bulk density expressed as Mg m −3 , D is the density of solid particles assumed to be 2.65 Mg m −3 , and TP is expressed in % volume.
Then the SCD is calculated as: where SCD is expressed in % vol.Interpretation limits for six SCD classes are shown in Table 6.Jones [13] estimated threshold values of BD (BDr 1 and BDr 2 ) for restricting root depth by using relationships between BD and clay (BDr 1 ) or BD and clay + silt (BDr 2 ).He estimated classes of root restrictive bulk density (Table 7).We compared raw analytical data and indicators described in Section 2.1.3. to field observations of soil compactness classes (SCCs).To this aim, we calculated the relative proportion of the ordered qualitative indicators of soil structural quality/soil strength according to the four observed SCCs.We also plotted box plots of quantitative raw data and various derived data (e.g., COC, NCOC, CEC, CaCO 3 , pH, SOC/Clay, SOC/SOC exp ) according to each SCC.In order to project the data into a ST triangle, we selected the Aisne equilateral ST triangle [65], which is one of the two most widely used ST triangles in France [66].These projections were performed in R (version 4.2.2) with the soiltexture package [67].

Discretizing the Soil Texture Triangle
In an orthonormal space of silt% on the abscissa and clay% on the ordinate, we divided the space into squares of 2% × 2%, called hereafter soil texture cells (STcells).In order to calculate robust statistics for each STcell, we only kept STcells with ≥10 observations of SCC (Figure 1).This filtering resulted in 422 STcells.
Land 2024, 13, x FOR PEER REVIEW 7 of 24 Jones [13] estimated threshold values of BD (BDr1 and BDr2) for restricting root depth by using relationships between BD and clay (BDr1) or BD and clay + silt (BDr2).He estimated classes of root restrictive bulk density (Table 7).We compared raw analytical data and indicators described in Section 2.1.3. to field observations of soil compactness classes (SCCs).To this aim, we calculated the relative proportion of the ordered qualitative indicators of soil structural quality/soil strength according to the four observed SCCs.We also plotted box plots of quantitative raw data and various derived data (e.g., COC, NCOC, CEC, CaCO3, pH, SOC/Clay, SOC/SOCexp) according to each SCC.In order to project the data into a ST triangle, we selected the Aisne equilateral ST triangle [65], which is one of the two most widely used ST triangles in France [66].These projections were performed in R (version 4.2.2) with the soiltexture package [67].

Discretizing the Soil Texture Triangle
In an orthonormal space of silt% on the abscissa and clay% on the ordinate, we divided the space into squares of 2% × 2%, called hereafter soil texture cells (STcells).In order to calculate robust statistics for each STcell, we only kept STcells with ≥ 10 observations of SCC (Figure 1).This filtering resulted in 422 STcells.

Clustering
Within each STcell, we calculated the frequency of each SCC and we derived four quantitative variables (SCC-1, SCC-2, SCC-3 and SCC-4), each being equal to the frequency of a given SCC within a STcell.
We then performed a clustering of STcells, using these four quantitative variables.Hierarchical clustering was carried out in R with the cluster package [68].We ran the dissimilarity matrix calculation with the Gower's distance [69].For the aggregation method, to construct the dendrogram, we used the Ward method [70].We performed a principal component analysis (PCA) to visualize the clustering results, using the fviz_cluster function of R package factoextra (version 1.0.7.[71]).
To optimize the number of clusters, we used three methods.The methods were sum of squared errors (SSE), goodness of clustering measure or the "gap" statistic, and minimal number of clusters' members described below.
We first studied changes in the SSE with an increasing number of clusters.The SSE is defined as the sum of the squared distance between each member of a cluster and its cluster centroid.The formula for SSE is: where y i represents the observed value of a point, y represents the predicted value of the cluster centroïd, and Σ represents the sum over all data points.
The SSE quantifies the overall magnitude of the residuals or errors in the clustering model.Thus, SSE can be considered as an overall measure of error.In general, as the number of clusters increases, SSE should decrease because clusters are, by definition, smaller.Therefore, studying changes in SSE with increasing numbers of clusters resembles a trade-off approach between reducing errors while keeping a reasonable number of clusters.Figure S1 displays the SEE vs. the number of clusters.The SSE exhibited a sharp decrease until 6 clusters and stabilized between 6 and 7 clusters suggesting that increasing the number of clusters to >6 does not have a substantial impact on total SSE.
The second method for choosing the number of clusters k consisted of calculating a goodness of clustering measure, i.e., the "gap" statistic [72].The gap statistic compares the total variation within the cluster for different values of k with their expected values under a zero-reference distribution of the data.For a detailed description of this method, we refer to [72].The estimate of the optimal number of clusters is the value that maximizes the gap statistic.This value indicates a clustering structure, being the farthest from the random and uniform distributions of points.We performed this calculation with the clusGap function of the R cluster package [68] by testing a number of clusters ranging from 2 to 10. Figure S2 shows that applying the gap statistic resulted in an optimal cluster number of 6, similar to the SSE method.
Finally, we wanted to have a minimum number of individuals (i.e., STcells) in each cluster to be able to run robust statistics.In particular, we aimed to avoid clusters with only very few STcells to prevent a small number of STcells with similar SCCs from generating random clusters.Figure S3 displays the number of STcells by cluster with the increasing number of clusters.
When the number of clusters was >7, all clustering resulted in one cluster containing only one STcell.From 6 clusters to 7 clusters, the number of clusters having rather few STcells (≤40) doubled from 2 to 4, and represented 17.5 and 34.5% of the overall STcells' population, respectively.In addition, using 7 clusters led to one cluster having less than 30 STcells.
Overall, the results from the 3 tests performed led to similar results.The optimal number of clusters was 6, as shown in Figures S1-S3 and their detailed captions.
We projected the clusters onto the STcells.We also assigned clusters to each observation point, which made it possible to analyze their relationships with quantitative data.For each Land 2024, 13, 909 9 of 23 cluster, we plotted box plots of SCC frequency, selected quantitative raw data, indicators' raw values, and indicator classes.The distributions of SCCs 1 and 2 were strongly linked to the density of all observation points.These distributions were rather similar to other distributions found for all depths at national or subnational scales in mainland France (e.g., [73]), though they were denser where both low clay contents (from about 10 to 25%) and high silt contents (from about 40 up to 80%) were present.A noticeable difference between SCC-1 and SCC-2 was the highest density of points with clay >30% for SCC-2 than for SCC-1.The SCC-2 distribution exhibited a high density of points located very close to the silt-to-clay side of the ST triangle.For SCC-3, the effect of point density began to fade and became almost null for SSC-4.Note also that for SSC-4 only two points were observed for the nearly pure sand ST class.

Observed Soil Compactness Classes
Land 2024, 13, x FOR PEER REVIEW 9 of 24 We projected the clusters onto the STcells.We also assigned clusters to each observation point, which made it possible to analyze their relationships with quantitative data.For each cluster, we plotted box plots of SCC frequency, selected quantitative raw data, indicators' raw values, and indicator classes.

Observed Soil Compactness Classes
Figure 2 displays the distribution of all SCCs' topsoil observations in the ST triangle.The distributions of SCCs 1 and 2 were strongly linked to the density of all observation points.These distributions were rather similar to other distributions found for all depths at national or subnational scales in mainland France (e.g., [73]), though they were denser where both low clay contents (from about 10 to 25%) and high silt contents (from about 40 up to 80%) were present.A noticeable difference between SCC-1 and SCC-2 was the highest density of points with clay >30% for SCC-2 than for SCC-1.The SCC-2 distribution exhibited a high density of points located very close to the silt-to-clay side of the ST triangle.For SCC-3, the effect of point density began to fade and became almost null for SSC-4.Note also that for SSC-4 only two points were observed for the nearly pure sand ST class.The SCC-1 frequency exhibited gradients linked to ST. High SCC-1 densities were mainly observed for the sandiest STcells, showing that non-compact topsoils dominated in these ST.For other STcells, a rather fuzzy gradient linked to clay content was observed from low (less than 0.1) to medium (about 0.5) SCC-1 frequencies.A small cluster of rather low SCC-1 frequencies was observed for the few siltiest STcells.Gradients were less clear for SCC-2 density, though the rather low densities close to the sand corner were consistent with SCC-1 results.Rather high SCC-2 frequencies (0.5 to 0.7) dominated for the few siltiest ST.Some high SCC-2 frequencies were also observed for the STcells, which were the closest to the silt-to-clay side of the ST triangle, having clay contents ranging from 35 to 50% (mainly pure silty clays with almost no sand).For SCC-3 a fuzzy gradient was observed, with a scattering roughly looking like a "mirror" of the SCC-1 plot.Finally, the SCC-4 frequency exhibited zero or less than 0.1 values nearly everywhere.The STCells with SCC-4 observations were scarce, which was consistent with the low number of SCC-4 observations (cf.Table 1).They were evenly scattered in the triangle.The SCC-1 frequency exhibited gradients linked to ST. High SCC-1 densities were mainly observed for the sandiest STcells, showing that non-compact topsoils dominated in these ST.For other STcells, a rather fuzzy gradient linked to clay content was observed from low (less than 0.1) to medium (about 0.5) SCC-1 frequencies.A small cluster of rather low SCC-1 frequencies was observed for the few siltiest STcells.Gradients were less clear for SCC-2 density, though the rather low densities close to the sand corner were consistent with SCC-1 results.Rather high SCC-2 frequencies (0.5 to 0.7) dominated for the few siltiest ST.Some high SCC-2 frequencies were also observed for the STcells, which were the closest to the silt-to-clay side of the ST triangle, having clay contents ranging from 35 to 50% (mainly pure silty clays with almost no sand).For SCC-3 a fuzzy gradient was observed, with a scattering roughly looking like a "mirror" of the SCC-1 plot.Finally, the SCC-4 frequency exhibited zero or less than 0.1 values nearly everywhere.The STCells with SCC-4 observations were scarce, which was consistent with the low number of SCC-4 observations (cf.Table 1).They were evenly scattered in the triangle.

Observed Compactness vs. Indicators of Compactness and Soil Structural Quality
In this section, we test if some commonly used indicators of structural quality/soil strength are related to the observed SCCs.Our aim is to explore to which extent the information carried out by these indicators and by the in situ SCC observations are consistent.

Ordered Qualitative Classes Based on Quantitative Threshold Values
Figure 4 shows the relative proportion of six ordered qualitative indicators of soil structural quality/soil strength according to the four observed SCCs.Note that the number of observations is highly diverse, as it depends on the availability of attributes used to

Observed Compactness vs. Indicators of Compactness and Soil Structural Quality
In this section, we test if some commonly used indicators of structural quality/soil strength are related to the observed SCCs.Our aim is to explore to which extent the information carried out by these indicators and by the in situ SCC observations are consistent.

Ordered Qualitative Classes Based on Quantitative Threshold Values
Figure 4 shows the relative proportion of six ordered qualitative indicators of soil structural quality/soil strength according to the four observed SCCs.Note that the number of observations is highly diverse, as it depends on the availability of attributes used to calculate the indicators, i.e., in decreasing order, clay and silt content, SOC, and BD (see Table 1).
As explained in Sections SOC/Clay index and SOC/SOC exp index, two indicators (SOC/SOC exp and SOC/Clay) were calculated using SOC and clay.The "very good" class proportion of both indicators decreased from SCC-1 to SCC-4, while the "degraded" class proportion increased concomitantly.The SOC/Clay index systematically indicated an increasing trend of dominant proportions of degraded soils from SCC-1 to SCC-4.These degraded soil proportions were always higher than the degraded soil proportions calculated using the SOC/SOC exp index.Almost all the observations classified as very compact (SCC-4) were also classified as degraded by the SOC/Clay indicator.
The PD and SCD indicators gave rather consistent results for the most favorable predictions (PDClass1 vs. SCDClasses 1 and 2).However, the results of PD and SCD were often not consistent with on-field observations of SCCs and even sometimes contradictory to SCC (see in particular SCC-3).
Land 2024, 13, x FOR PEER REVIEW 11 of 24 calculate the indicators, i.e., in decreasing order, clay and silt content, SOC, and BD (see Table 1).As explained in Sections SOC/Clay index and SOC/SOCexp index, two indicators (SOC/SOCexp and SOC/Clay) were calculated using SOC and clay.The "very good" class proportion of both indicators decreased from SCC-1 to SCC-4, while the "degraded" class proportion increased concomitantly.The SOC/Clay index systematically indicated an increasing trend of dominant proportions of degraded soils from SCC-1 to SCC-4.These degraded soil proportions were always higher than the degraded soil proportions calculated using the SOC/SOCexp index.Almost all the observations classified as very compact (SCC-4) were also classified as degraded by the SOC/Clay indicator.
The PD and SCD indicators gave rather consistent results for the most favorable predictions (PDClass1 vs. SCDClasses 1 and 2).However, the results of PD and SCD were often not consistent with on-field observations of SCCs and even sometimes contradictory to SCC (see in particular SCC-3).
The BDr1 classes exhibited rather erratic relative proportions with increasingly compact SCC observations, especially for BDr1-Class1 and 3.In addition, SCC-4 observations exhibited BDr1 class proportions that were inconsistent with the highest compactness class.On the contrary, BDr2 class relative proportions showed changes that were consistent with an increase in compactness: from SCC-1 to SCC-4, we observed a regular increase in BDr2-Class3 proportion (rooting highly restricted) together with a regular decrease in BDr2-Class1 (rooting not restricted), whereas the BDr2-Class2 proportion remained nearly constant.

Soil Compactness Classes and Quantitative Soil Properties Values and Ratios
In this section, we present some selected box plots of quantitative soil properties values and ratios according to SCCs. Figure 5 displays box plots of raw data (SOC, CEC, The BDr 1 classes exhibited rather erratic relative proportions with increasingly compact SCC observations, especially for BDr 1 -Class1 and 3.In addition, SCC-4 observations exhibited BDr 1 class proportions that were inconsistent with the highest compactness class.On the contrary, BDr 2 class relative proportions showed changes that were consistent with an increase in compactness: from SCC-1 to SCC-4, we observed a regular increase in BDr 2 -Class3 proportion (rooting highly restricted) together with a regular decrease in BDr 2 -Class1 (rooting not restricted), whereas the BDr 2 -Class2 proportion remained nearly constant.

Soil Compactness Classes and Quantitative Soil Properties Values and Ratios
In this section, we present some selected box plots of quantitative soil properties values and ratios according to SCCs. Figure 5 displays box plots of raw data (SOC, CEC, CaCO 3 , pH) content, and of several properties related to relative SOC and clay proportions (COC, NCOC, SOC/Clay, SOC/SOC exp ).
All box plots showed a high variability and some high outlier values, except for pH.The SOC content progressively decreased from SCC-1 to SCC-4.The median and interquartile range of CEC increased from SCC-1 to SCC-4.This last result is consistent with the different distributions of SCC observations in the ST triangle from SCC-1 to SCC-4 (see Figure 1).The median value of CaCO 3 was close to zero, though very slightly positive for SCC-3 and SCC-4.Upper quartiles of CaCO 3 were rather similar for all SCCs.Therefore, at a first look, CaCO 3 appeared not relevant to discriminating against SCCs.Conversely, pH slightly increased with increasing compactness.The COC values slightly decreased from SCC-2 to SCC-4, whereas NCOC median values were all null, and even the upper quartile values were equal to zero for SCC-2, -3 and -4.Nevertheless, SCC-1 exhibited a substantially higher proportion of positive NCOC values, as shown by the upper quartiles.These larger values of NCOC were consistent with the dominant sandy STcells observed for SCC-1 in Figure 3 (Section 3.1).Both SOC/SOC exp and SOC/Clay decreased with increasing observed compactness.Note, however, that from SCC-1 to SCC-3, SOC/Clay exhibited very high upper quartile and outlier values, when compared to medians.Overall, very low values of SOC/Clay dominated for most of the SCCs.All box plots showed a high variability and some high outlier values, except for pH.The SOC content progressively decreased from SCC-1 to SCC-4.The median and interquartile range of CEC increased from SCC-1 to SCC-4.This last result is consistent with the different distributions of SCC observations in the ST triangle from SCC-1 to SCC-4 (see Figure 1).The median value of CaCO3 was close to zero, though very slightly positive for SCC-3 and SCC-4.Upper quartiles of CaCO3 were rather similar for all SCCs.Therefore, at a first look, CaCO3 appeared not relevant to discriminating against SCCs.Conversely, pH slightly increased with increasing compactness.The COC values slightly decreased from SCC-2 to SCC-4, whereas NCOC median values were all null, and even the upper quartile values were equal to zero for SCC-2, -3 and -4.Nevertheless, SCC-1 exhibited a substantially higher proportion of positive NCOC values, as shown by the upper quartiles.These larger values of NCOC were consistent with the dominant sandy STcells observed for SCC-1 in Figure 3 (Section 3.1).Both SOC/SOCexp and SOC/Clay decreased with increasing observed compactness.Note, however, that from SCC-1 to SCC-3, SOC/Clay exhibited very high upper quartile and outlier values, when compared to medians.Overall, very low values of SOC/Clay dominated for most of the SCCs.The clusters exhibited rather fuzzy, yet striking, distributions of locations in the ST.Cluster 1 dominated in the sandiest parts of the ST triangle, cluster 3 showed a rather compact geometry mainly located in the denser SCC observations area shown in Figure 1 (Section 2.2.2) and Figure 2 (Section 3.1), whereas cluster 2 was more clayey and cluster 6 was mainly located in heavy clay and clay ST.Cluster 4 had a scattered distribution.Cluster 5, though rather scattered, showed a substantial number of STcells very close to the clay-to-silt side of the ST triangle.Figure 7 displays selected cluster factor maps produced from the hierarchical clustering-based PCA.All together, the three first principal dimensions explained 93.6% of the inertia.

Clustering
On the dimension 1 vs. dimension 2 plan (Figure 7a), clusters 1 and 6 were clearly separated, similar to STcell cluster projections observed in Figure 6a whereas other clusters more or less overlapped with each other.Only cluster 2 intersected with all other clusters.Note that cluster 2 exhibited a large and rounded shape and was centered close to (0; 0).Cluster 3 was the most compact.On the dimension 2 vs. dimension 3 plan (Figure 7b), cluster 1 was separated from clusters 4, 5 and 6, underlining its particularity.Similarly, cluster 5 was distinct from clusters 1, 2, and 3. Clusters 2 and 3 intersected with all other clusters, except for cluster 5.However, the intersection between clusters 3 and 6 was very small and contained only one STcell.Interestingly, this latter STcell was the only one in the very small intersection resulting from the overlap of clusters 2, 3, 4 and 6.Cluster 3 exhibited a large and rounded shape and was centered rather close to (0; 0).The clusters exhibited rather fuzzy, yet striking, distributions of locations in the ST.Cluster 1 dominated in the sandiest parts of the ST triangle, 3 showed a rather compact geometry mainly located in the denser SCC observations area shown in Figures 1 (Section 2.2.2) and 2 (Section 3.1), whereas cluster 2 was more clayey and cluster 6 was mainly located in heavy clay and clay ST.Cluster 4 had a scattered distribution.Cluster 5, though rather scattered, showed a substantial number of STcells very close to the clay-tosilt side of the ST triangle.Figure 7 displays selected cluster factor maps produced from the hierarchical clustering-based PCA.All together, the three first principal dimensions explained 93.6% of the inertia.On the dimension 1 vs. dimension 2 plan (Figure 7a), clusters 1 and 6 were clearly separated, similar to STcell cluster projections observed in Figure 6a whereas other clusters more or less overlapped with each other.Only cluster 2 intersected with all other clusters.Note that cluster 2 exhibited a large and rounded shape and was centered close to (0; 0).Cluster 3 was the most compact.On the dimension 2 vs. dimension 3 plan (Figure 7b), cluster 1 was separated from clusters 4, 5 and 6, underlining its particularity.Similarly, cluster 5 was distinct from clusters 1, 2, and 3. Clusters 2 and 3 intersected with all other clusters, except for cluster 5.However, the intersection between clusters 3 and 6 was very small and contained only one STcell.Interestingly, this latter STcell was the only one  The clusters exhibited rather fuzzy, yet striking, distributions of locations in the ST.Cluster 1 dominated in the sandiest parts of the ST triangle, cluster 3 showed a rather compact geometry mainly located in the denser SCC observations area shown in Figures 1 (Section 2.2.2) and 2 (Section 3.1), whereas cluster 2 was more clayey and cluster 6 was mainly located in heavy clay and clay ST.Cluster 4 had a scattered distribution.Cluster 5, though rather scattered, showed a substantial number of STcells very close to the clay-tosilt side of the ST triangle.Figure 7 displays selected cluster factor maps produced from the hierarchical clustering-based PCA.All together, the three first principal dimensions explained 93.6% of the inertia.On the dimension 1 vs. dimension 2 plan (Figure 7a), clusters 1 and 6 were clearly separated, similar to STcell cluster projections observed in Figure 6a whereas other clusters more or less overlapped with each other.Only cluster 2 intersected with all other clusters.Note that cluster 2 exhibited a large and rounded shape and was centered close to (0; 0).Cluster 3 was the most compact.On the dimension 2 vs. dimension 3 plan (Figure 7b), cluster 1 was separated from clusters 4, 5 and 6, underlining its particularity.Similarly, cluster 5 was distinct from clusters 1, 2, and 3. Clusters 2 and 3 intersected with all other clusters, except for cluster 5.However, the intersection between clusters 3 and 6 was very small and contained only one STcell.Interestingly, this latter STcell was the only one If we consider now both plans (i.e., three dimensions), cluster 4 is the only cluster which intersected all dimensions except dimension 1, which is consistent with its scattering in the ST triangle.

Cluster and Soil Compactness Class Frequencies
Figure 8 displays the frequency of observed SCCs of the six clusters.Cluster 1 showed a regular and strong decrease in SCC frequency from SCC-1 to SCC-4.Overall, this cluster had the largest frequency of non-compact values (SCC-1) and included almost no very compact observations (SCC-4).Cluster 2 exhibited rather similar frequencies for SCC-1, 2 and 3, and a very small proportion of SCC-4.Cluster 3 exhibited high and nearly equal frequencies of the two less compact classes (SCC-1 and SCC-2) whereas SCC-3 was less observed, together with a very small frequency of SCC-4.Cluster 4 was dominated by slightly compact observations (median = 50% of SCC-2), then followed by SCC-1 and SCC-2 (medians around 25%).Though the median value for SCC-4 frequency was close to zero, cluster 4 exhibited the highest first quartile for SSC-4 (very compact).Cluster 5 had the highest frequency of SCC-2, which largely dominated.Finally, cluster 6 exhibited a very different behavior, where SCC-2 and SCC-3 were largely dominant.Note also that, among all clusters, cluster 6 had the highest SCC-3 frequency (compact), and the lowest SCC-1.
compact observations (SCC-4).Cluster 2 exhibited rather similar frequencies for SCC-1, 2 and 3, and a very small proportion of SCC-4.Cluster 3 exhibited high and nearly equal frequencies of the two less compact classes (SCC-1 and SCC-2) whereas SCC-3 was less observed, together with a very small frequency of SCC-4.Cluster 4 was dominated by slightly compact observations (median = 50% of SCC-2), then followed by SCC-1 and SCC-2 (medians around 25%).Though the median value for SCC-4 frequency was close to zero, cluster 4 exhibited the highest first quartile for SSC-4 (very compact).Cluster 5 had the highest frequency of SCC-2, which largely dominated.Finally, cluster 6 exhibited a very different behavior, where SCC-2 and SCC-3 were largely dominant.Note also that, among all clusters, cluster 6 had the highest SCC-3 frequency (compact), and the lowest SCC-1.The SOC/Clay class frequencies showed a much larger variability among clusters (Figure 9b).They exhibited a large percentage of soils classified as degraded (nearly 90% for cluster 6).Cluster 1 had the highest proportion of "very good" and "good" index, whereas cluster 6, 5 and 2 had the largest percentage of "degraded".From cluster 1 to 4, the SCD index (Figure 9c) showed rather slight variations whereas SCDClass6 exhibited The SOC/Clay class frequencies showed a much larger variability among clusters (Figure 9b).They exhibited a large percentage of soils classified as degraded (nearly 90% for cluster 6).Cluster 1 had the highest proportion of "very good" and "good" index, whereas cluster 6, 5 and 2 had the largest percentage of "degraded".From cluster 1 to 4, the SCD index (Figure 9c) showed rather slight variations whereas SCDClass6 exhibited very low and rather high values for cluster 5 and 6, respectively.Though much less contrasted, the distribution of PD classes (Figure 9d) was rather consistent with SOC/Clay for clusters 2 and 6.Finally, BDr 1 and BDr 2 (Figure 9e,f) gave rather inconsistent results, though cluster 6 remained the highest in terms of worst class frequency and BDr 1 -Class1 frequency could be seen as a good criterion to discriminate against some categories of clusters.

Cluster and Quantitative Soil Variables' Relationships
We explored quantitative relationships between clusters and some soil properties.They are available in the form of box plots in the Supplementary Material (Figures S4 and S5).SOC, SOC/SOC exp , COC, BD, Fine silt, and C/N did not give convincing results.Interesting contrasted values or gradients among clusters were found with particle-size fractions, pH, CaCO 3 , SOC/Clay, CEC, NCOC and some BDr threshold values.Figure 10 displays selected box plots of these properties per cluster.As expected from Figure 6 (Section 3.3.1),the clusters exhibited noticeable differences in clay, silt and sand distributions (Figure 10a-c).Substantial amounts of NCOC were likely and partly linked to low clay contents in clusters 1 and 3 (Figure 10d).Among clusters, the CaCO3 box plots were rather contrasted (Figure 10e), whereas pH box plots exhibited smaller differences (Figure 10f).

Consistency of the Results and Main Outputs
In this article, we show that utilizing very simple information gathered in the field supports clustering of various topsoil responses to compaction stress.These clusters are interpretable when compared to other soil properties and indicators.This attempt could have been viewed as rather risky, considering that an inherent and large variability in SCC classification should come from the different soil surveyors.As the SCCs do not come from physical standard devices and measurements, they are highly dependent on the observer's physical strength and on the shape and mechanical properties of the knife used.However, all soil surveyors were trained with similar standards and had solid experience in the field.Consequently, we believe that, regardless of individual strength and differ- As expected from Figure 6 (Section 3.3.1),the clusters exhibited noticeable differences in clay, silt and sand distributions (Figure 10a-c).Substantial amounts of NCOC were likely and partly linked to low clay contents in clusters 1 and 3 (Figure 10d).Among clusters, the CaCO 3 box plots were rather contrasted (Figure 10e), whereas pH box plots exhibited smaller differences (Figure 10f).

Consistency of the Results and Main Outputs
In this article, we show that utilizing very simple information gathered in the field supports clustering of various topsoil responses to compaction stress.These clusters are interpretable when compared to other soil properties and indicators.This attempt could have been viewed as rather risky, considering that an inherent and large variability in SCC classification should come from the different soil surveyors.As the SCCs do not come from physical standard devices and measurements, they are highly dependent on the observer's physical strength and on the shape and mechanical properties of the knife used.However, all soil surveyors were trained with similar standards and had solid experience in the field.Consequently, we believe that, regardless of individual strength and differences in the tools used, the field surveyors succeeded in forging an incredibly consistent description of soil morphology.The SCCs appeared to be consistent with some indicators of soil compaction and soil structural quality.Though they are conceptually different, SOC/Clay and SOC/SOC exp exhibited consistent orderings with SCCs.However, very low values of SOC/Clay dominated for most of the SCCs, either suggesting that the SOC/Clay statistical distributions and the observed SCCs were inconsistent, or that the SOC/Clay indicator of structure is biased, as suggested by Poeplau and Don [34], or should be adapted more locally, as suggested by Rabot et al. [35].If we consider the SCCs as the "truth", the SOC/Clay indicator resulted in much more "pessimistic" predictions than SOC/SOC exp .
The SCC gradients within the ST triangle were more or less expected, confirming the predominant role of ST in inherent soil susceptibility to compaction.However, the evenly scattered SCC-4 points in the triangle suggest that extreme soil compactness resulted from factors other than ST alone (e.g., extreme loads, compaction under specific soil moisture conditions).
We aimed to capture several possible soil responses to compaction susceptibility, using a single observation per each combination of location and time.Therefore, we hypothesized that ST was one of the major controlling factors of soil behavior to compaction susceptibility.Indeed, a basic and strong assumption that we made to build STcells is that the SCC statistical distribution for topsoil at least partly depends on ST.This assumption seems reasonable if we consider that a large expected variability in the sizes of particles in a soil allows a priori optimization of their perfect stacking and reduced porosity (see the pioneering work from Hénin et al. [18]).It is also consistent with the fact that certain soil particles, in particular clay minerals, have particular rheological properties (plasticity, shrinkage-swelling, etc.).This enabled us to gather enough information to calculate distributions of SCCs among STcells and to run the clustering according to these distributions.The distribution of SCCs by ST cells in the ST triangle confirmed our initial hypothesis, thus allowing for partial interpretation of the clusters.
Cluster 1 corresponds to sandy soils that are the least sensitive to compaction.Cluster 2 is mainly composed of silty clays, which can reach rather high compactness states, but they are reversible by tillage, climate, and/or by favorable pH for biological activity.Cluster 4 still remains to be explored and is not correlated to soil texture; we still need further interpretation of this cluster.Cluster 5 is mainly characterized by large numbers of fine but not deformable particles (silt).As particles are fine, the effect of compaction may lead to decreases in soil microporosity.However, the dominant SCC of cluster 5 is class 2 (slightly compact), indicating that this cluster does not often reach very high compactness.Cluster 6 is clearly linked to some clayey soils that can reach high levels of compactness.Cluster 3 is more challenging to interpret.

Comparison with Other Studies
To our knowledge, such a study using a large number of field observations has almost never been conducted at the national scale.The only attempt was made by Bondi et al. [42], who used machine learning to predict soil BD from "visual parameters" in Ireland.Briefly, they selected eleven descriptors, which they considered the most important for the qualitative assessment of soil structure.Each descriptor was detailed and recorded based on a set of predefined categories.They ran a decision tree-learning algorithm, which allowed discrimination among three BD ranges.They also provided a linear equation model, which predicted the numerical value of soil BD.A noticeable difference between the study of Bondi et al. and our study is that they aimed to predict a state of compactness, not a behavior or response to compaction.In addition, they considered raw BD as a suitable estimator of compactness, which is perhaps an acceptable assumption in Ireland, but is far from being generally accepted elsewhere.Using BD alone may indeed not be relevant for other conditions.Păltineanu et al. [74] compared BD, PD and SCD in a set of about 400 samples taken from a large spectrum of soils in Romania.Their aim was to test if their interpretations lead to similar results in characterizing soil compaction.They suggested that the soil physical state can be synthetically described by BD, PD and SCD.A highly significant correlation expressed by a curvilinear regression equation was found between SCD and PD.However, they did not use field observations as we did in our study.
In a systematic review and meta-analysis, Franco et al. [36] compiled 158 observations from 31 visual soil assessment studies around the world and classified samples into five broad ST classes.They concluded that higher structure quality scores were observed in clayey/silty soils compared to sandy soils, regardless of climate zone.Importantly, this meta-analysis was not able to detect differences induced by soil management and cropping systems.Nevertheless, these authors suggested that "the visual assessment of soil structure (VESS) is an on-farm, practical and reliable tool for evaluating the structural quality of soils globally".
Indeed, most national and continental-scale studies on compaction used measurements and rather simple indicators such as PD, SCD, BDr 1 and BDr 2 , thresholds of porosity, and/or PTFs or pedo-transfer rules to derive them (e.g., [11,12,[75][76][77]), or use models that require many properties that are most often derived from PTFs (e.g., [5,[78][79][80][81]) rather than measured or in situ observed.Though an interpretation of indicator values is subject to many precautions, this result suggests that SOC and Clay relationships may partly help to discriminate some clusters.Finally, except for SOC/SOC exp , all indicators were able to discriminate at least one cluster from the others; however, their threshold values did not seem to reflect reality.
We explored to which extent the information contained in a series of indicators and by the in situ SCC observations were consistent.We showed both some consistencies and inconsistencies; we also demonstrated that some indicators were sensitive but did not reflect a plausible reality of soil compactness.Such an assessment might be seen as considering two sides of the same coin.Do the indicator values afford the confidence that we may have on using SCC information to characterize the soil mechanical behavior and resilience to compaction?Or do the large amount of SCC data enable us to estimate the robustness and generality of indicators?Though it is not an easy task with the data that we have, we tried to look at these two sides in a plausible way.
From a practical perspective, the clusters that we identified exhibited various susceptibilities to soil compaction and some of them are clearly related to some broadly available soil properties.Therefore, we believe that this kind of clustering and the relationships that we found between clusters and some inherent soil properties will be useful for practitioners and advisers to formulate recommendations on soil management practices.

Limitations 4.3.1. Basic Assumption and Limiting Number of On-Field Information
We used only one test for in situ compactness assessment.Strictly speaking, the data we used can thus be considered as proxies of a test of penetration resistance [40,41,82] rather than a test of compaction.However, we choose this on-field test because of its simplicity and the numerous available information.Besides, Canarache [83] compared several factors and indices regarding compactness of agricultural soils and found very good relationships between various compactness indices and resistance to penetration at field capacity (R 2 ranging 0.92-0.99)for a large range of soil clay content (0-70%).In addition, Van Orsouw et al. [82] recently assessed soil compactness using penetrologgers that measure the penetration resistance of the soil.Though they also stressed that penetration resistance "is therefore a measure of soil strength rather than the density", they provided a list of 33 references of previous articles using penetration resistance threshold values to evaluate the state of soil compactness.We could have added other tests that are routinely performed to assess compactness.However, adding other tests would certainly have reduced the number of observations gathering all tests together.It might also make the interpretation of the results overly complex.Though there is an interest in adding more information for exploring data-based knowledge discovery tools mixing many field indicators, we think that the use of a single indicator already demonstrates the interest in mining such qualitative observations.Therefore, we reached our main goal of an enhanced use of qualitative field observations conducted in the framework of conventional soil surveys, to provide interesting insights and information related to soil response to compaction at a broad scale.

Clusters' Size and Interpretation
Cluster 3 largely dominated the others, whereas cluster 6 was rather small.This has statistical implications for the robustness of our results.One can easily understand that one STcell and/or cluster can contain points having different behaviors, as shown by the variability expressed in cluster box plots.More importantly, it calls into question the real occurrence of some clusters.Is cluster 3 simply what remains once extreme situations have been identified and removed?Does it actually reflect a particular behavior or is it a mixture of behaviors that could potentially be subdivided or stratified?We have not answered these questions yet.Different questions arise for other clusters.For example, for cluster 4, the absence of a relationship with ST still questions the factors controlling its behavior.Though cluster 6 is overall the most clayey cluster, it is spread among other clay STcells belonging to other clusters, therefore further work is needed to better interpret and characterize clusters 4 and 6.Subsoil compaction is a severe problem mainly due to its persistence and its effects may be even permanent [84][85][86].Moreover, soil compaction has been a rising concern since the adoption of reduced tillage and/or no-till agricultural practices (e.g., [86][87][88][89][90][91]). Therefore, it would be worth testing our approach for subsoils.One may even wonder why we did not begin by exploring subsoil compaction.The main reason is the discrepancy that may exist between soil surveyors, who mainly sample by pedogenetic horizons (except for the tilled layers), and agronomists or specialists of compaction who systematically take into account differences in soil strength and agricultural induced compaction to describe and sample soil layers.A careful exploration of the database is thus necessary before deciding if it is worth investigating this topic.

Considering Compaction Impacts Using a Loss-Function Approach
We classified clusters without giving a weight to the consequences of misclassification.From a practical point of view, we could have used weights in order to take into account the losses due to misclassification.Classifying a soil as rather resilient to compaction when it is not may have serious consequences.Indeed, this misclassification may induce inappropriate decisions for agricultural practices and loading.In other words, we should relate the misclassification errors to cost benefit and/or risk assessment approaches.A loss-function approach could be used to differentiate different kinds of errors and their consequences on various soil functions or services [82,92], such as infiltration, crop production, erosion, etc. Adopting this approach would require some in-depth studies, which may come from long-term experiments, information on yields, or from expert judgment.However, many of these loss-function approaches will also require additional variables, such as relief and climate for erosion, soil capacity and condition for yields, and deep soil saturated water conductivity for infiltration.

Digital Soil Mapping of Clusters
Digital Soil Mapping (DSM, [93]) of soil behavior to compaction could be a way to (i) further interpret clusters (from a soil geography perspective), (ii) discover patterns of clusters that may provide additional explanation on their behaviors to compaction, and (iii) check if the clusters are operational for decision making and at what scale.This will necessitate identification of relevant covariates.These selected covariates could be particlesize distribution fractions, some indicators of sensitivity to compaction, and potentially relevant covariates that were not used in this study (e.g., climate, relief, soil parent material).This is tempting and an exciting challenge as it could bring both new knowledge on controlling factors of soil behavior and spatial predictions that can improve soil connectivity with stakeholders and farmers [94].

Conclusions
Incorporating cost-effective data is one of the ways forward to improve the number of observations and the relevance of a soil behavior assessment capturing complex soil responses to various pressures.In this article, we used about 10,000 field observations of topsoil compactness classes (SCCs) collected by qualified soil surveyors in mainland France.We demonstrated that mining numerous in situ observations conducted in the framework of conventional soil surveys provides interesting insights into inherent soil behavior to compaction at a broad-scale.This novel approach has been seldom explored and paves the way to research exploring the potential of field observations to provide new insights into soil function and responses to management practices.
The main outputs of this study are the following.

1.
The SCC field observations were interpretable when compared to some compactness quantities or sensitivity to compaction indicators derived from analytical measurements.2.
Conversely, some quantitative indicators did not capture large variations in SCCs and/or over-or underestimated topsoil compactness or topsoil sensitivity to compaction.

3.
A clustering based on these SCCs provided useful insights into specific behaviors to soil compaction.4.
Though soil texture and SOC affected some clusters of topsoil behavior to compaction, other controlling factors still remain to be elucidated.This is a limitation of the present study and we believe that adding a DSM approach may help to unravel other factors controlling soil compaction. 5.
Approaches that include rescuing and gathering observations that are routinely conducted by field soil surveyors may offer a great potential for assessing and classifying complex soil properties and behaviors.6.
This classification of complex soil behavior to compaction can be used by soil practitioners and advisers to adapt practical recommendations according to inherent soil susceptibility to compaction.7.
Increasing the number of field indicators to classify various soil behaviors, and conducting digital soil mapping and assessment studies based on such field observations and data-driven approaches would further increase the utility of existing data.
Indeed, the encouraging results that we obtained using only one field indicator advocate for testing other ones and for exploring data-based knowledge discovery tools mixing many field indicators together-or without-quantitative measurements of soil properties.Such an approach is worth testing and we plan to tackle this question in the near future.

Figure 1 .
Figure 1.Number of soil compactness class (SCC) observations per STcell containing at least 10 observations of SCC.

Figure 2
Figure 2 displays the distribution of all SCCs' topsoil observations in the ST triangle.The distributions of SCCs 1 and 2 were strongly linked to the density of all observation points.These distributions were rather similar to other distributions found for all depths at national or subnational scales in mainland France (e.g.,[73]), though they were denser where both low clay contents (from about 10 to 25%) and high silt contents (from about 40 up to 80%) were present.A noticeable difference between SCC-1 and SCC-2 was the highest density of points with clay >30% for SCC-2 than for SCC-1.The SCC-2 distribution exhibited a high density of points located very close to the silt-to-clay side of the ST triangle.For SCC-3, the effect of point density began to fade and became almost null for SSC-4.Note also that for SSC-4 only two points were observed for the nearly pure sand ST class.

Figure 2 .
Figure 2. Projection of the soil compactness class (SCC) observation points in the texture triangle.(a): SCC-1 class; (b): SCC-2 class; (c): SCC-3 class and (d): SCC-4 class Figure 3 displays the frequency of each SCC within the STcells, projected in the ST triangle.

Figure 3
Figure 3 displays the frequency of each SCC within the STcells, projected in the ST triangle.The SCC-1 frequency exhibited gradients linked to ST. High SCC-1 densities were mainly observed for the sandiest STcells, showing that non-compact topsoils dominated in these ST.For other STcells, a rather fuzzy gradient linked to clay content was observed from low (less than 0.1) to medium (about 0.5) SCC-1 frequencies.A small cluster of rather low SCC-1 frequencies was observed for the few siltiest STcells.Gradients were less clear for SCC-2 density, though the rather low densities close to the sand corner were consistent with SCC-1 results.Rather high SCC-2 frequencies (0.5 to 0.7) dominated for the few siltiest ST.Some high SCC-2 frequencies were also observed for the STcells, which were the closest to the silt-to-clay side of the ST triangle, having clay contents ranging from 35 to 50% (mainly pure silty clays with almost no sand).For SCC-3 a fuzzy gradient was observed, with a scattering roughly looking like a "mirror" of the SCC-1 plot.Finally, the SCC-4

3. 3 . 1 .
Figure 6 displays the projection of the STcell clusters into the ST triangle and the number of SCC observation points by cluster.Cluster 3 had more SCC observations (about 5600) than the others.Conversely, cluster 5 had much fewer SCC observations (about 900),

3. 3 .
Figure 6 displays the projection of the STcell clusters into the ST triangle and the number of SCC observation points by cluster.Cluster 3 had more SCC observations (about 5600) than the others.Conversely, cluster 5 had much fewer SCC observations (about 900), cluster 6 being the least populated (about 500 observations).The other clusters (1, 3 and 4) had a very similar number of observations, close to 2000.The clusters exhibited rather fuzzy, yet striking, distributions of locations in the ST.Cluster 1 dominated in the sandiest parts of the ST triangle, cluster 3 showed a rather compact geometry mainly located in the denser SCC observations area shown in Figure1(Section 2.2.2) and Figure2(Section 3.1), whereas cluster 2 was more clayey and cluster 6 was mainly located in heavy clay and clay ST.Cluster 4 had a scattered distribution.Cluster 5, though rather scattered, showed a substantial number of STcells very close to the clay-to-silt side of the ST triangle.Figure7displays selected cluster factor maps produced from the hierarchical clustering-based PCA.All together, the three first principal dimensions explained 93.6% of the inertia.On the dimension 1 vs. dimension 2 plan (Figure7a), clusters 1 and 6 were clearly separated, similar to STcell cluster projections observed in Figure6awhereas other clusters more or less overlapped with each other.Only cluster 2 intersected with all other clusters.Note that cluster 2 exhibited a large and rounded shape and was centered close to (0; 0).Cluster 3 was the most compact.On the dimension 2 vs. dimension 3 plan (Figure7b),

Land 2024 ,
13,  x FOR PEER REVIEW 13 of 24 cluster 6 being the least populated (about 500 observations).The other clusters (1, 3 and 4) had a very similar number of observations, close to 2000.

Figure 6 .
Figure 6.(a): Projection of the soil texture cell (STcell) clusters into the ST triangle; (b): number of soil compactness class (SCC) observation points by cluster.

Figure 6 .
Figure 6.(a): Projection of the soil texture cell (STcell) clusters into the ST triangle; (b): number of soil compactness class (SCC) observation points by cluster.

Figure 6 .
Figure 6.(a): Projection of the soil texture cell (STcell) clusters into the ST triangle; (b): number of soil compactness class (SCC) observation points by cluster.

Figure 8 .
Figure 8. Frequency box plots of soil compactness classes (SCCs) for the 6 clusters.

3. 3 . 3 .
Figure 9 displays the relative proportion of six ordered qualitative indicators of soil structural quality, soil compactness or soil strength, according to the six clusters described in Section 3.3.1.The SOC/SOCexp class distribution was nearly the same for all the clusters, except cluster 5.The latter cluster exhibited a substantially higher percentage of degraded and moderate classes, whereas other clusters had very similar and equilibrated populations among SOC/SOCexp classes.

Figure 8 .
Figure 8. Frequency box plots of soil compactness classes (SCCs) for the 6 clusters.

3. 3 . 3 .Figure 9 24 Figure 9 .
Figure 9 displays the relative proportion of six ordered qualitative indicators of soil structural quality, soil compactness or soil strength, according to the six clusters described in Section 3.3.1.The SOC/SOC exp class distribution was nearly the same for all the clusters, except cluster 5.The latter cluster exhibited a substantially higher percentage of degraded and moderate classes, whereas other clusters had very similar and equilibrated populations among SOC/SOC exp classes.Land 2024, 13, x FOR PEER REVIEW 15 of 24

Figure 9 .
Figure 9. Relative proportions of six ordered classes of indicators of soil structural quality, soil compactness or soil strength, according to six observed clusters.A cumulated bar shows the relative proportions of classes of an indicator for one cluster.The numbering under the x-axis corresponds to the cluster number.(a): SOC/SOC exp index; (b): SOC/Clay index; (c): SCD index; (d): PD index; (e): BDr 1 class; (f): BDr 2 class.

24 Figure 10 .
Figure 10.Box plots of selected topsoil properties per cluster.Axis x legend is the cluster number.For NCOC, the y axis has been truncated and the number of outliers not present in the figure is indicated in red.(a): clay; (b): silt; (c): sand; (d): NCOC; (e): CaCO3; (f): pH.

Figure 10 .
Figure 10.Box plots of selected topsoil properties per cluster.Axis x legend is the cluster number.For NCOC, the y axis has been truncated and the number of outliers not present in the figure is indicated in red.(a): clay; (b): silt; (c): sand; (d): NCOC; (e): CaCO 3 ; (f): pH.

4. 4 .
Possible Improvements, Perspectives and Prospects 4.4.1.Moving to Deeper Horizons Figure S1.Plot of sum of squared errors (SSE) as a function of the number of clusters.Figure S2.Plot of the optimal number of clusters by gap statistic method.Figure S3.Number of STcells per cluster depending on the number of clusters created.Figure S4.Box plots of different topsoil properties per cluster.Figure S5.Box plots of different topsoil properties per cluster (continue).

Table 1 .
Definition of each soil compactness class (SCC).

Table 2 .
List of quantitative variables used in this study.

Table 5 .
Packing density (PD) classes and threshold values.

Table 6 .
Soil compactness degree (SCD) classes and threshold values.

Table 7 .
Root restrictive bulk density (BDr 1 and BDr 2 ) classes and their threshold values.BDr 1 Classes and BDr 2 Classes are classes of limiting root depth bulk density as g cm −3 .Clay and Silt content are both in % (g 100 g −1 ).

Table 7 .
Root restrictive bulk density (BDr1 and BDr2) classes and their threshold values.BDr1Classes and BDr2Classes are classes of limiting root depth bulk density as g cm −3 .Clay and Silt content are both in % (g 100 g −1 ).
2.2.1.Comparing Field Observations with Raw Quantitative Data and Indicators