Quantifying Soil Complexity Using Fisher Shannon Method on 3D X-ray Computed Tomography Scans

The conversion of native forest into agricultural land, which is common in many parts of the world, poses important questions regarding soil degradation, demanding further efforts to better understand the effect of land use change on soil functions. With the advent of 3D computed tomography techniques and computing power, new methods are becoming available to address this question. In this direction, in the current work we implement a modification of the Fisher–Shannon method, borrowed from information theory, to quantify the complexity of twelve 3D CT soil samples from a sugarcane plantation and twelve samples from a nearby native Atlantic forest in northeastern Brazil. The distinction found between the samples from the sugar plantation and the Atlantic forest site is quite pronounced. The results at the level of 91.7% accuracy were obtained considering the complexity in the Fisher–Shannon plane. Atlantic forest samples are found to be generally more complex than those from the sugar plantation.


Introduction
The degradation of soils due to land use changes driven by economic factors represents a major concern for the foreseeable future in many parts of the world.More precisely, land use change may adversely affect fundamental soil functions, such as nutrient storage, diffusion and cycling, carbon storage and greenhouse gas emissions, erosion resistance, water storage, drainage, and filtration [1][2][3][4][5].Moreover, the biodiversity of forests may also be unfavorably affected by systematic land use change [6].On the other hand, poverty and population growth lead to an ever-increasing demand for indiscriminate natural resources in developing countries.The demand for pasture, timber, firewood, and crops drives the conversion of tropical forests into agricultural land at an alarming rate.This situation dictates comprehensive studies on the impact of deforestation and land use conversion on soil quality in general.More precisely, the outstanding question is whether the cultivation of deforested land may lead to the permanent degradation of land productivity.The ecologically sensitive components of the tropical ecosystem may not buffer the effects of agricultural practices (see, e.g., [7] and references therein).Therefore, comprehensive assessment of soil properties is fundamental for the early detection and mitigation of adverse soil change effects.
The effects of land use change have been addressed mainly focusing on physical, chemical, and biological properties [7][8][9], while far fewer studies have been devoted to changes in soil structure [10,11].The latter governs its functions [12], and quantification of soil architecture can be seen as a key to better understanding the complex dynamical phenomena that govern these functions.Therefore, a comprehensive description and quantification of soil functions rely on an in-depth understanding of characteristics such as the three-dimensional distribution of constituents, connectedness, hierarchical organization, and complexity.
While X-ray computed tomography (CT) has been advancing at an impressive rate over the last decades, it has also become a widespread tool for non-destructive 3D soil visualization and quantification, shedding new light on soil functions [13].The diverse properties of soil that have not been previously amenable to analyses can now be assessed through CT scans, providing novel fundamental insights into soil functions [13].These properties include isotropy, homogeneity, complexity, and the hierarchical fractal (or multifractal) organization of soil constituents, contributing to a deeper understanding of soil's physical, chemical, and biological processes [14].X-ray CT scans have already been studied to characterize the pores' spatial distribution, revealing extraordinary complexity of the pore space [15][16][17][18].The complexity of soil structure has also been addressed through methods based on concepts from statistical physics and information theory, such as fractals and multifractals [19][20][21], information content [22] and complex networks [23,24].As a consensus has not yet been reached on the adequate threshold for separating pores from solid in CT scans [25], it has also been suggested that rather than thresholding, grayscale soil images should be used for the multifractal characterization of the soil structure [26][27][28][29][30].
Between 2000 and 2018, Brazil suffered a total reduction of 489,877 km 2 in the natural area of its six terrestrial biomes.Among them, the Atlantic Forest biome has the one with the highest percentage of degradation over time, as it covers the most industrialized and productive areas, in addition to having the highest demographic density in the national territory, housing about 49.3% of the urban areas of the country [31].One of the crops that most stands out in the region of the Atlantic Forest biome is sugarcane, especially in the northeast region of the country, where cultivation is present in eight of the nine states in the region.For the 2020/2021 harvest, an increase of 1.6% in the planted area and 4.1% in production were estimated compared to the previous sugarcane harvest in the northeastern region of Brazil [32].The replacement of the native vegetation of the Atlantic Forest with sugarcane cultivation generates negative impacts on the physical attributes of the soil [33][34][35][36].These attributes control many soil functions, such as water retention and infiltration, gas exchange, resistance to erosion, nutrient dynamics, and root penetration [12], and directly influence ecosystem services.
In this work, we investigate how land use change affects soil structure by using information theory to quantify the complexity of soil 3D X-ray CT soil samples.For the first time, the Fisher-Shannon method [37], introduced to jointly quantify the local and global properties of the probability density function of unidimensional signals, is applied in the context of soil complexity.In the current study, the "signals" are represented by a 790 × 790 set of 1d vertical lines of 790 greyscale values in X-ray CT scan images of soil samples from a sugarcane field, and a nearby Atlantic forest site, in northeastern Brazil.For each image and each of these sequences, we calculate Shannon entropy power (SEP) and Fisher information measure (FIM) that quantify the disorder and structural organization of a signal's variation [38].The joint FIM/SEP analysis is then performed through the Fisher-Shannon information plane (FS) via an innovative normalization procedure to achieve a 91.7% level of accuracy of distinction between the sugar plantation and the Atlantic forest samples.

Soil Samples
Twenty-four soil samples analyzed in this work were collected from sugarcane cultivation and a nearby native Atlantic forest at a location in the northeastern Brazilian region, the state of Pernambuco, between latitudes −7.84836 and −7.83519 and longitudes −34.9973 and −34.9935, as shown in Figure 1.Two samples were collected at each site: one at 10 cm and another at 20 cm depth.

Soil Samples
Twenty-four soil samples analyzed in this work were collected from sugarca vation and a nearby native Atlantic forest at a location in the northeastern Brazilian the state of Pernambuco, between latitudes −7.84836 and −7.83519 and longitudes and −34.9935, as shown in Figure 1.Two samples were collected at each site: one and another at 20 cm depth.The samples were collected using a soil auger with an internal PVC cylind cm height and 7.5 cm diameter and excavated by careful penetration with a cylin pled with a blade.After the insertion of the auger in the soil, the cylinders were c extracted to ensure the preservation of the original structure of the environment in PVC cylinders.The samples were then dried at 40 °C to remove the water conten the scanning tomography of the samples.
The CT tomography was performed using a third-generation Nikon XT H 2 ray microtomograph with 150 kV voltage, 180 µA current, 500 ms exposure time, µm resolution for voxels.A copper filter with a thickness of 0.5 mm was used to m low-intensity photons.After scanning the total cylinder volume in the initial acq a subvolume of interest was defined and reconstructed using CTPro 3D XT 3.0.3Metrology NV, Brighton, MI, USA) software.The central part of the cylinder w lighted to avoid edge influence.The reconstructed 2D axial projections mainta spatial resolution of the acquisition of 45 µm and were saved at a radiometric re (grayscale level) of 16 bits.The final volume was 790 stacks with 790 × 790 pixels, end volume of 790 3 = 493,039,000 voxels.
The voxel values of the CT scan images correspond to local sample density on field unit (HU) scale (a linear transformation of the linear attenuation coefficient m ment such that the radiodensity of water is defined as 0 HU, and the radiodensity defined as −1000 HU).The minimum observed HU value is 0, the maximum i mean is 16,327, and the standard deviation is 402.
Considering the vertical (gravity) direction as naturally preferential from a p enological point of view, in what follows, we perform calculations on 790 × 790 = vertical lines of 790 grey-level values each for every sample.The samples were collected using a soil auger with an internal PVC cylinder of 7.5 cm height and 7.5 cm diameter and excavated by careful penetration with a cylinder coupled with a blade.After the insertion of the auger in the soil, the cylinders were carefully extracted to ensure the preservation of the original structure of the environment inside the PVC cylinders.The samples were then dried at 40 • C to remove the water content before the scanning tomography of the samples.
The CT tomography was performed using a third-generation Nikon XT H 225 ST X-ray microtomograph with 150 kV voltage, 180 µA current, 500 ms exposure time, and a 45 µm resolution for voxels.A copper filter with a thickness of 0.5 mm was used to minimize low-intensity photons.After scanning the total cylinder volume in the initial acquisition, a subvolume of interest was defined and reconstructed using CTPro 3D XT 3.0.3(Nikon Metrology NV, Brighton, MI, USA) software.The central part of the cylinder was highlighted to avoid edge influence.The reconstructed 2D axial projections maintained the spatial resolution of the acquisition of 45 µm and were saved at a radiometric resolution (grayscale level) of 16 bits.The final volume was 790 stacks with 790 × 790 pixels, with an end volume of 790 3 = 493,039,000 voxels.
The voxel values of the CT scan images correspond to local sample density on Hounsfield unit (HU) scale (a linear transformation of the linear attenuation coefficient measurement such that the radiodensity of water is defined as 0 HU, and the radiodensity of air is defined as −1000 HU).The minimum observed HU value is 0, the maximum is 32,492, mean is 16,327, and the standard deviation is 402.
Considering the vertical (gravity) direction as naturally preferential from a phenomenological point of view, in what follows, we perform calculations on 790 × 790 = 624,100 vertical lines of 790 grey-level values each for every sample.

The Fisher-Shannon Method
The Fisher-Shannon method consists of a joint analysis of Fisher information measure (FIM), which quantifies the amount of organization (or order) in a signal, and Shannon entropy (SE) which quantifies the amount of disorder [37].Fisher introduced the FIM concept in the statistical estimation theory [39], and it was subsequently used to describe physical systems [40,41], as well as for time series analysis in geophysics [42,43], ecology [44], astrophysics [45], meteorology [46,47], hydrology [48], and social science [49].
For a univariate distribution of a continuous variable X with probability density function (PDF) f (x), the Fisher information measure I X is defined as [49] and Shannon entropy H X as The Fisher information measure thus describes the local properties of the PDF, while the Shannon entropy describes its global properties [49].The shape of the PDF is reflected on these measures, as the FIM assumes high values if the PDF is narrow and low values if the PDF is broad, while SE attains high values for a broad PDF and low values for a narrow PDF.
Instead of Shannon entropy, it is often more convenient [50] to use the quantity called Shannon entropy power (SEP) defined by The product C X = N X I X satisfies "isoperimetric inequality" N X I X ≥ 1 (where equality holds for the Normal distribution), demonstrating that FIM and Shannon entropy are intrinsically related and can be jointly used to characterize the non-stationary behavior of complex signals.The product N X I X is called Fisher-Shannon complexity (FSC) and can be used as a statistical measure of the complexity of the signal under study [51].The joint FIM/SEP analysis is performed through the Fisher-Shannon information plane (FS), where Shannon entropy power N X is used for the horizontal axes, and Fisher information measure I X is taken for the vertical axis variable [37].The signal is mapped to the point with coordinates (N X , I X ), which can lie anywhere in the FS plane where the "isoperimetric inequality" N X I X ≥ 1 is satisfied.The distance from the "isocomplexity" line N X I X = 1 can be used as a measure of the complexity of the signal [49].
As the above measures depend only on the PDF, Fisher-Shannon analysis can be implemented for real-world datasets corresponding to complex systems through nonparametric density estimation, avoiding parametric assumptions on the distribution.One possibility is using histograms with the discretized version of (1) and (2).However, in this work, we implement kernel density estimation, which is more reliable [52] in the current case.The Kernel density estimator of the PDF is given by [53] fM where b > 0 is the so-called bandwidth parameter, N is the length of the signal, and K(u) is the kernel function, which is a continuous symmetric function that satisfies K(u) ≥ 0 and The most widely used is the Gaussian kernel The term "nonparametric density estimation" here refers to avoiding a choice of the functional form of the distribution and the corresponding parameter estimation.On the other hand the bandwidth parameter b > 0 is used here to control the smoothness of the PDF and is determined here through Silverman's rule [54] where n is the sample size, σ is standard deviation, and IQR is the interquartile range.

Results and Discussion
The I x , N x , and C x values were calculated for each of the 624,100 vertical lines of length 790 for all the 24 CT scan images, as well as for the 790 horizontal planes of 790 × 790 = 624,100 voxels each, and for the full 790 3 = 493,039,000 voxel images.The PDF f (x) in the 1d case corresponds to 790 voxels for each of the 624,100 vertical lines; in the 2d case, f (x) corresponds to 624,100 voxels for each of the 790 vertical planes; and in the 3D case, there is a single PDF f (x) corresponding to all the 493,039,000 voxels.
After extensive testing with different combinations of quantities and distribution measures that can be extracted from these calculations, we have found that the first option of considering the set of vertical lines for each sample yields the best distinction between the sugar cane and the Atlantic forest samples.The descriptive statistics (minimum, maximum, quartiles, mean, and standard deviation) of the I x , N x , and C x values of vertical lines, obtained with bandwidth b = 94.85 (average Silverman's rule value for all the vertical strips of all the images), are presented in Tables A1-A3 in the Appendix A, respectively, and the distribution of the values in the 790 × 790 plane for all samples are presented in Figures 2-4.
As seen in Figures 2-4, the spatial distribution of I x demonstrates higher values for sugarcane (SC) samples compared to the Atlantic forest (AF), while the N x distribution demonstrates the opposite, i.e., lower values are found for SC in comparison with the AF samples.On the other hand, these differences are mostly canceled out when the complexity C X = N X I X is calculated; the SC samples C X values are found to be rather homogeneous except for the sample SC2-10, while inhomogeneity is rather pronounced for the AF samples, more so for the 10 cm than for the 20 cm depth samples.These findings are purely phenomenological, and the observations may not hold in a general scenario of native versus cultivated land samples.
In order to address the distance from the isocomplexity line N X I X = 1 as a measure of complexity of the vertical sample lines [49], it should be noted that the scales of Shannon entropy power with an average of N x = 1.527 × 10 5 and Fisher information with an average of I = 9.278 × 10 −6 differ in orders of magnitude, while the average of the Fisher-Shannon complexity C x = 1.352 is of the order of unity.If the projection of a point (N X , I X ) in the FS plane presented in Figure 5a to the nearest point on the "iso- complexity" line is denoted by (N X0 , I X0 ), then the displacement of Shannon information ∆I X ≡ I X − I X0 turns negligible in comparison to the displacement of Shannon entropy power ∆N X ≡ N X − N X0 , and by minimizing the distance to the isocomplexity line, the points are projected vertically down to the isocomplexity line, with the Euclidean distance being practically reduced to ∆N X and there being no influence of ∆I X .
which is solved numerically for all samples.The results of this novel procedure are presented in Figure 5b, and the distances scatterplot from the isocomplexity plane is presented in Figure 6.The sugarcane sample's complexity, quantified by the distance from the isocomplexity line (Figures 4 and 5) is generally lower than those of the Atlantic forest samples, which also exhibit large fluctuations between the alternative values from samples taken at depths of 10 cm and 20 cm.The complexity values are presented in Table 1.To mitigate this fact, here we introduce a novel normalization procedure for the variables N X and I X .More precisely, first we identify the maximum value N Xmax among all the samples, and we scale all the sample values as N X = N X /N Xmax and I X = I X N Xmax , thus preserving the C x ≡ N X I X complexity values.Distance from a point (N X , I X ) to a projection point (N XI , I XI ) on the isocomplexity plane is now given by and setting the derivative of d with respect to N XI to zero to find the closest point N X0 , I X0 yields the fourth-order polynomial expression for x which is solved numerically for all samples.The results of this novel procedure are presented in Figure 5b, and the distances scatterplot from the isocomplexity plane is presented in Figure 6.To demonstrate the validity of this novel approach, in Figure 7, we show the 3D images of the two samples with the lowest and highest complexity, respectively.Figure 6.The distances' scatterplot from the isocomplexity line in the order of samples from south to north (same as in Table 1).
The sugarcane sample's complexity, quantified by the distance from the isocomplexity line (Figures 4 and 5) is generally lower than those of the Atlantic forest samples, which also exhibit large fluctuations between the alternative values from samples taken at depths of 10 cm and 20 cm.The complexity values are presented in Table 1.
To demonstrate the validity of this novel approach, in Figure 7, we show the 3D images of the two samples with the lowest and highest complexity, respectively.It should be stressed here that the current approach, without using any arbitrary parameters (such as, e.g., threshold), not only yields results that agree well with common sense (as can be seen in Figure 5) but also provides a precise quantitative measure of the complexity of the samples.The usefulness of this approach for quantifying soil degradation should be tested in future studies in the natural environment and controlled laboratory experiments.
Finally, to test the discriminative power of the current approach, we implement here the fitting of values from Table 1 to a logistic function, where a categorical variable of zero value is attributed to the sugarcane samples and unit value to the Atlantic forest samples, with results shown in Figure 8.It should be stressed here that the current approach, without using any arbitrary parameters (such as, e.g., threshold), not only yields results that agree well with common sense (as can be seen in Figure 5) but also provides a precise quantitative measure of the complexity of the samples.The usefulness of this approach for quantifying soil degradation should be tested in future studies in the natural environment and controlled laboratory experiments.
Finally, to test the discriminative power of the current approach, we implement here the fitting of values from Table 1 to a logistic function, where a categorical variable of zero value is attributed to the sugarcane samples and unit value to the Atlantic forest samples, with results shown in Figure 8.
sense (as can be seen in Figure 5) but also provides a precise quantitative measure of the complexity of the samples.The usefulness of this approach for quantifying soil degradation should be tested in future studies in the natural environment and controlled laboratory experiments.
Finally, to test the discriminative power of the current approach, we implement here the fitting of values from Table 1 to a logistic function, where a categorical variable of zero value is attributed to the sugarcane samples and unit value to the Atlantic forest samples, with results shown in Figure 8.  Considering logistic regression as a binary classifier, the threshold of d = 0.0952 separates the two groups of samples with only two samples (SC2-10 and SC4-20) falling into the wrong category.It should be noted here that the k-means method does not produce meaningful results in this case because of the difference in the variance of the d values of the two groups, which is much smaller for the sugarcane samples than for the Atlantic forest samples.
Observing the images of I x , N x , and C x of spatial distribution of these samples in Figures 2-4 reveals that the origin of the strikingly high value of d = 0.290 for sample SC2-10 stems from the pronounced values of Shannon entropy power N X , which, in combination with Information measure values I X , yield a Fisher-Shannon complexity C X for each of the 624,100 vertical lines that was rather similar to those of the Atlantic forest samples.Therefore, the correct grouping of 22 out of 24 samples (91.7%) is attained, demonstrating the power of the current novel nonparametric approach.Nevertheless, considering the fact that the samples are geographically very close to each other so that correlation among them could have resulted in overfitting, this obtained accuracy represents the training error, and the regressor is yet to be tested in an independent, uncorrelated dataset.

Conclusions
Overall, we can claim that the Fisher-Shannon complexity captures the morphological changes induced by land-use change rather well.More precisely, the sugarcane field sites lie in the area that has been converted from Atlantic forest to plantation, and the subsequent cultivation activities have brought about changes in the soil morphology.While the results are not entirely consistent in terms of depth and/or position, the 91.7% grouping success rate may be considered quite high, where the discrepancies may be attributed to some yet unknown particularities of these sites.
The approach introduced in the current work does not use arbitrary parameters.It provides a rather precise quantitative complexity measure, which may be seen as a quantifier of soil degradation level.Finally, the novel normalization procedure of variables for representation in the Fisher-Shannon information plane, i.e., preserving the Fisher-Shannon complexity, may be useful for time series and image analysis in general.

Figure 1 .
Figure 1.Spatial distribution of sample sites.Two samples were taken at each site: one another at 20 cm depth.

Figure 1 .
Figure 1.Spatial distribution of sample sites.Two samples were taken at each site: one at 10 and another at 20 cm depth.

Figure 2 .
Figure 2. Spatial distribution of the  values in the horizontal projection plane for each sample.Pixels are color-coded in blue for values below two standard deviations from the global average and red for values above two standard deviations.For values in between, a spectrum of colors is used, as per the color bar.

Figure 2 .
Figure 2. Spatial distribution of the I x values in the horizontal projection plane for each sample.Pixels are color-coded in blue for values below two standard deviations from the global average and red for values above two standard deviations.For values in between, a spectrum of colors is used, as per the color bar.

Figure 3 .
Figure 3. Spatial distribution of the  values in the horizontal projection plane for each sample.Pixels are color-coded in blue for values below two standard deviations from the global average and red for values above two standard deviations.For values in between, a spectrum of colors is used, as per the color bar.

Figure 3 .
Figure 3. Spatial distribution of the N x values in the horizontal projection plane for each sample.Pixels are color-coded in blue for values below two standard deviations from the global average and red for values above two standard deviations.For values in between, a spectrum of colors is used, as per the color bar.

Figure 4 .
Figure 4. Spatial distribution of the  values in the horizontal projection plane for each sample.Pixels are color-coded in blue for values below two standard deviations from the global average and red for values above two standard deviations.For values in between, a spectrum of colors is used, as per the color bar.

Figure 4 .
Figure 4. Spatial distribution of the C x values in the horizontal projection plane for each sample.Pixels are color-coded in blue for values below two standard deviations from the global average and red for values above two standard deviations.For values in between, a spectrum of colors is used, as per the color bar.

Figure 5 .
Figure 5. (a) Scatterplot of the original ( ,  ) value pairs, and (b) scatterplot of the normalized values ( ,  ), together with projections ( ,  ) to the closest point on the isocomplexity line.

Figure 5 .
Figure 5. (a) Scatterplot of the original (N X , I X ) value pairs, and (b) scatterplot of the normalized values N X , I X , together with projections N X0 , I X0 to the closest point on the isocomplexity line.

Figure 8 .
Figure 8. Fit to a logistic function.The vertical dashed line represents the threshold d = 0.0952.

Figure 8 .
Figure 8. Fit to a logistic function.The vertical dashed line represents the threshold d = 0.0952.
Figure 6.The distances' scatterplot from the isocomplexity line in the order of samples from south to north (same as in Table1

Table A1 .
Descriptive statistics of Fisher information measure I x .

Table A3 .
Descriptive statistics of Fisher-Shannon complexity C x .