Soil Structure Analysis with Attention: A Deep-Learning-Based Method for 3D Pore Segmentation and Characterization

Silva, Italo Francyles Santos da; Araújo, Alan de Carvalho; Almeida, João Dallyson Sousa de; Paiva, Anselmo Cardoso de; Silva, Aristófanes Corrêa; Roehl, Deane

doi:10.3390/agriengineering7020027


by Italo Francyles Santos da Silva ^1,*,†, Alan de Carvalho Araújo ^1,†, João Dallyson Sousa de Almeida ^1,*,†, Anselmo Cardoso de Paiva ^1,†, Aristófanes Corrêa Silva ^1,† and Deane Roehl ^2,†

¹ Applied Computing Group (NCA), Federal University of Maranhão, São Luís 65085-580, MA, Brazil

² Tecgraf Institute, Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro 22453-900, RJ, Brazil

* Authors to whom correspondence should be addressed.

† These authors contributed equally to this work.

AgriEngineering 2025, 7(2), 27; https://doi.org/10.3390/agriengineering7020027

Submission received: 16 December 2024 / Revised: 15 January 2025 / Accepted: 16 January 2025 / Published: 27 January 2025


Abstract

The pore structure plays a crucial role in soil systems. It affects a range of processes essential for soil ecological functions, such as the transport and retention of water and nutrients, as well as gas exchanges. The mechanical and hydrological characteristics of soil are predominantly determined by the three-dimensional pore-space structure. A precise analysis of pore structure can help specialists understand how these shapes impact plant root activity, leading to better cultivation practices. X-ray computed tomography provides detailed information without destroying the sample. However, manually delineating pore structure and estimating porosity are challenging tasks. This work proposes an automated method for 3D pore segmentation and characterization using convolutional neural networks with attention mechanisms. The method introduces a novel approach that combines attention at both channel and spatial levels, enhancing the segmentation and property estimation and providing valuable insights for a more detailed study of soil conditions. In experiments conducted with a private dataset, the segmentation results achieved mean Dice values of 99.10% ± 0.0004 and mean IoU values of 98.23% ± 0.0008. Additionally, in tests with Phaeozem Albic, the automatic method provided porosity estimates comparable to those obtained by a method based on integral geometry and morphology.

Keywords: 3D pore segmentation; soil characterization; porosity estimation; convolutional neural networks; attention mechanisms; computed tomography

1. Introduction

The analysis of soil pore structure, including its connectivity and spatial relationships, is of great importance to various segments of the economy, such as agriculture, soil remediation, and the oil industry. These structures can define soil characteristics related to ecological functions, such as the transportation and retention of water and nutrients and gas exchange with the environment, and also provide clues about the storage capacity of energy resources [1,2].

Those characteristics also affect root growth. Roots mainly grow in macropores and thus progressively fill them [3]. Roots are also extremely sensitive to changes in water distribution in the medium, especially if this leads to hydraulic stress [4]. Soils with high porosity retain more water (in tiny pores) and allow drainage (in large pores), both of which are crucial for the water balance of roots. In this context, soil analysis is important for planning soil management and identifying structural changes that should be avoided so as not to compromise soil productivity. Changes caused by inadequate practices can lead to greater compaction, negatively affecting the soil’s mechanical resistance to penetration, density, and porosity, consequently harming crop productivity [5].

Recently, methods based on imaging technologies have been widely used in soil analysis, allowing the acquisition of high-resolution three-dimensional representations of soil samples. Among these methods, high-resolution X-ray micro-computed tomography (micro-CT) is used to capture and visualize the three-dimensional geometric structure of pores [6]. A 3D digital soil sample can be used to extract information about its mineral composition and microstructural parameters that are challenging to estimate through observational experience. In this context, image processing methods are important, as they can help visualize and quantify the available pore space [7].

One step in this analysis is segmentation, which consists of extracting from the images only the region of interest to the study, such as the pore structure. This process can be done using free software that implements image processing techniques, such as ImageJ [8]. However, such tools require specialists to configure their parameters manually, which means running several tests to find the best configuration for each analyzed sample and makes the analysis more time-consuming and complex. Commercial options such as Dragonfly [9] may offer more features that make the task easier, but they add financial costs. In both cases, it may be necessary to use other tools to carry out complementary analyses, such as property estimation, thus increasing operational complexity.

As an alternative, methods based on the combination of image processing and deep learning can promote greater efficiency in the treatment of these data and, in an integrated solution, assist specialists in various scenarios, such as feasibility studies and activity planning based on the structural behavior of the soil studied.

There are works in the literature with good results aimed at classifying soil type [10,11] and automatically estimating porosity [12,13] and permeability [14]. As for segmentation, the methods mostly use conventional convolutional neural network (CNN) architectures without including more robust mechanisms. Da Wang et al. [15] developed a method for multi-class segmentation of mineral components using a hybrid architecture of U-Net [16] and ResNet [17]. Yu et al. [18] used a fully convolutional network (FCN) to segment sandstone images and analyze the characteristics of the pore space. Marques et al. [19] performed a comparative analysis between U-Net and SegNet [20], the latter producing better results for samples extracted from a borehole.

Attention mechanisms have recently been used to compose convolutional neural network architectures to improve their results [21,22]. In the context of soil analysis, Koeshidayatullah et al. [23] have developed an approach to super-resolution as a step prior to the petrophysical analysis of the sample. Segmentation applications are addressed by Song et al. [24], who used a model combining Transformer and CNN with concentrated-fusion attention to identify different pore types. However, among the studies cited, this is the only one that uses a 3D approach to image processing and analysis. The others are applied to 2D images individually.

Considering the lack of work on 3D soil analysis, this paper introduces a method for 3D segmentation of pore structures in soil micro-CT samples and estimating properties for characterization. This method uses a CNN integrated with a proposed attention mechanism that combines the advantages of two other mechanisms (CBAM [25] and GSAU [26]) to produce more accurate segmentations. In contrast to the aforementioned approaches, segmentation is not done by slice but by 3D subgroups of slices extracted from the original volume. The segmentations generated are also used to estimate porosity, porosity curves, and tortuosity, which are helpful in the soil characterization process.

Therefore, this work presents the following contributions: (1) a method for 3D pore segmentation and estimation of properties related to the study of soil samples; (2) a proposed attention mechanism, which combines the channel-attention and spatial-attention modules of the CBAM and GSAU mechanisms, respectively; (3) a comparative analysis between the results of the proposed method and an approach based on integral geometry, topology, and mathematical morphology applied to a sample of Phaeozem Albic, a soil widely used in agriculture.

This paper is organized as follows: Section 2 explains the phases of the proposed method for soil segmentation and characterization; Section 3 shows the results obtained in the experiments performed; in Section 4, the strengths and limitations of the proposed method are addressed; finally, conclusions and future work are presented in Section 5.

2. Proposed Method

In this section, the proposed method is presented. It is composed of two main phases, each consisting of specific steps. They are: (1) Pore Segmentation, and (2) Soil Characterization. Each phase is described in the following sections.

2.1. Pore Segmentation

The Pore Segmentation phase comprises three main steps, as shown in Figure 1. The CT soil volume is passed as input. In the first step, the volume is divided into groups of contiguous slices. In the second step, each group is passed to the CNN for segmentation. The segmented groups are merged, thus recomposing the volume and generating the final segmentation as output. Each step of this phase is explained as follows.

2.1.1. Slice Group Splitting

In the context of soil CT scans, the generated image volumes usually have thousands of slices with high width and height dimensions. Considering this, in this step, the input volume is divided into groups of eight contiguous slices in order to process each of them, reducing computational costs compared to processing the entire volume. The choice of contiguous slices in a group was made to maintain the continuity of the pore structure, which would be incorrectly characterized if random slices were grouped together. In addition, the number eight was chosen due to the limitations of the CNNs used in the segmentation step, which will be explained in the next section.

It should be noted that when a volume has a total number of slices that is not divisible by eight, a zero padding strategy is applied. In this case, slices with zero pixels will be added to the group to complete the eight slices. This process is shown in Figure 2.

Lastly, each slice of the created groups is resized to 256 × 256. Min-max normalization is also applied to scale pixel values to a range of 0 to 1. Then, all slice groups are passed as input to the second step: the segmentation process.
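The splitting step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the volume is assumed to be a NumPy array shaped (slices, height, width), normalization is applied to the whole volume before splitting so that padded slices stay at zero, and the resize to 256 × 256 is omitted.

```python
import numpy as np

GROUP_SIZE = 8  # slices per group, as stated in the text


def split_into_groups(volume, group_size=GROUP_SIZE):
    """Split a (slices, height, width) volume into groups of contiguous slices.

    If the slice count is not divisible by group_size, all-zero slices are
    appended so that the last group is complete.
    """
    n_slices = volume.shape[0]
    pad = (-n_slices) % group_size  # number of zero slices to append
    if pad:
        zeros = np.zeros((pad,) + volume.shape[1:], dtype=volume.dtype)
        volume = np.concatenate([volume, zeros], axis=0)
    return volume.reshape(-1, group_size, *volume.shape[1:])


def min_max_normalize(volume):
    """Scale intensities to [0, 1]; a constant volume maps to all zeros."""
    volume = volume.astype(np.float32)
    lo, hi = volume.min(), volume.max()
    if hi == lo:
        return np.zeros_like(volume)
    return (volume - lo) / (hi - lo)


# Toy volume: 19 slices of 4 x 4 pixels (hypothetical sizes for illustration).
toy = np.arange(19 * 4 * 4, dtype=np.float32).reshape(19, 4, 4)
groups = split_into_groups(min_max_normalize(toy))
print(groups.shape)  # (3, 8, 4, 4): 19 slices padded to 24
```

Reshaping after padding keeps the slices of each group contiguous along the z-axis, which is the property the text requires for preserving pore continuity.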

2.1.2. Segmentation

In this step, all slice groups are fed into the CNN to generate the segmentation masks. The CNN receives volumes with 256 × 256 × 8 dimensions and yields volumes with the same dimensions whose voxel values are classified as 0 (background) or 1 (soil).

In this work, a network based on the 3D U-Net architecture [16] is proposed. An overview of the main architecture can be seen in Figure 3. This network has four levels of 3D downsampling and upsampling, with the number of filters increasing progressively at each level: 16, 32, 64, and 128. As the downsampling levels progress, the dimensions of the feature maps are halved. This constraint explains why the number eight was chosen in the preceding section: the number of slices in a group must be a power of two. The ReLU activation function [27] was used in these processes. The attention module is used before each upsampling layer and receives as input two sets of feature maps: those coming from the output of the previous layer and those coming from the skip connection. This approach was inspired by the works of Oktay et al. [28] and Li et al. [29].

The proposed attention module comprises a combination of two approaches: Convolutional Block Attention Module (CBAM) [25] and Gated Spatial-Attention Unit (GSAU) [26]. The CBAM approach is known for weighting feature map information in both channel (C) and spatial (S) dimensions, multiplying these generated maps (C and S) with the original input feature map to produce the final feature map in the output [29,30]. According to Woo et al. [25], this overall process is divided into the channel-attention and spatial-attention parts. They can be described by Equations (1) and (2) explained below.

F_{c} = A_{c} (I) \otimes I

(1)

F_{s} = A_{s} (F_{c}) \otimes F_{c}

(2)

where I is the input feature map; $A_{c}$ and $A_{s}$ are, respectively, the channel-level and spatial-level attention weights; $F_{c}$ and $F_{s}$ are weighted feature maps.

The channel-attention part is focused on identifying the most important features. First, average pooling and max pooling are applied separately to the input feature map ($I \in \mathbb{R}^{C \times W \times H}$). These pooled features are fed into a multi-layer perceptron (MLP). The results are combined to produce the attention weights for each channel ($A_{c} \in \mathbb{R}^{C \times 1 \times 1}$). This channel attention and the input are multiplied, generating the weighted feature map $F_{c}$. In the spatial-attention part, max and average pooling of $F_{c}$ are performed. These pooled features are concatenated and passed through a convolutional layer to generate the spatial-attention weights ($A_{s} \in \mathbb{R}^{1 \times W \times H}$). Then, this spatial attention is multiplied by $F_{c}$, generating the refined features $F_{s}$.

GSAU, in turn, focuses only on spatial-level attention. Based on gated convolutional networks [31] and transformers [32,33], Wang et al. [26] combined spatial attention with a gated linear unit (GLU) to introduce an adaptive gating mechanism, reducing parameters and calculations. For an intermediate input I, GSAU extracts spatial information as described by Equation (3).

F_{G S A U} = f_{d w} (f_{p w} (I)) \otimes f_{p w} (I)

(3)

where $f_{pw}(\cdot)$ and $f_{dw}(\cdot)$ indicate pointwise and depthwise convolution, respectively; ⊗ represents element-wise multiplication; and $F_{GSAU}$ is the refined feature map.

As mentioned above, an attention module combining the CBAM and GSAU approaches is proposed in this work. This module is also composed of two parts, one focused on channel attention and the other on spatial attention. An input I is first passed to CBAM’s channel attention, generating the weighted feature map $F_{c}^{p}$. This result is then used as input to the GSAU, which generates the refined feature map $F_{GSAU}^{p}$. The combination of these approaches is based on the hypothesis that the pooling operations applied in the spatial part of CBAM can lead to the loss of relevant spatial information, while the combination of pointwise and depthwise convolutions preserves more important details. This process is shown in Equations (4) and (5).

F_{c}^{p} = A_{c} (I) \otimes I

(4)

F_{G S A U}^{p} = f_{d w} (f_{p w} (F_{c}^{p})) \otimes f_{p w} (F_{c}^{p})

(5)

where $A_{c} \in \mathbb{R}^{C \times 1 \times 1}$ are the attention weights generated for each channel; $F_{c}^{p}$ and $F_{GSAU}^{p}$ are the weighted feature maps generated by the proposed attention module, the latter being passed to the subsequent layer.
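To make the data flow of Equations (4) and (5) concrete, the following NumPy sketch applies channel attention followed by the gated pointwise/depthwise step to a single feature map. It is an illustration under several assumptions rather than the network used in this work: it operates on a 2D map of shape C × W × H (matching the dimensions in the equations, whereas the network itself is 3D), all weights are random and untrained, the MLP reduction ratio of 2 and the 3 × 3 depthwise kernel are arbitrary choices, a sigmoid is used to combine the pooled MLP outputs as in the original CBAM, and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """CBAM-style channel weights A_c(I) with shape (C, 1, 1)."""
    def mlp(v):
        # Shared two-layer perceptron with a ReLU in between.
        return w2 @ np.maximum(w1 @ v, 0.0)

    avg = feat.mean(axis=(1, 2))  # average pooling over space -> (C,)
    mx = feat.max(axis=(1, 2))    # max pooling over space -> (C,)
    return sigmoid(mlp(avg) + mlp(mx))[:, None, None]


def pointwise_conv(feat, weight):
    """1 x 1 convolution: mixes channels independently at each position."""
    return np.einsum("oc,cwh->owh", weight, feat)


def depthwise_conv(feat, kernels):
    """3 x 3 depthwise convolution with zero padding (one kernel per channel)."""
    _, w, h = feat.shape
    padded = np.pad(feat, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(feat)
    for dx in range(3):
        for dy in range(3):
            out += kernels[:, dx, dy, None, None] * padded[:, dx:dx + w, dy:dy + h]
    return out


def proposed_attention(feat, w1, w2, w_pw, k_dw):
    f_c = channel_attention(feat, w1, w2) * feat  # Equation (4)
    pw = pointwise_conv(f_c, w_pw)                # f_pw(F_c^p), reused in both factors
    return depthwise_conv(pw, k_dw) * pw          # Equation (5)


C, W, H = 4, 6, 6
feat = rng.normal(size=(C, W, H))
w1 = rng.normal(size=(C // 2, C))  # MLP reduction layer (ratio 2 is an assumption)
w2 = rng.normal(size=(C, C // 2))
w_pw = rng.normal(size=(C, C))
k_dw = rng.normal(size=(C, 3, 3))
out = proposed_attention(feat, w1, w2, w_pw, k_dw)
print(out.shape)  # (4, 6, 6)
```

Note that no pooling over the spatial axes occurs after the channel-attention step, which reflects the hypothesis stated above: the gate is computed by convolutions that keep the full W × H resolution.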

In the last layer of the CNN, the sigmoid activation function is used to classify voxel values as 0 or 1. Segmentation masks will be generated for each volume that will be passed to the next step.

2.1.3. Slice Group Merging and Output

The segmented groups generated in the second step are passed to this step to be merged, i.e., reorganized in the z-axis to prepare the output of the segmentation process. After that, the generated segmentation can be used in the next phase of the proposed method for the analysis of soil and its pore structure.
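A minimal sketch of the merging step is given below. It assumes the segmented groups are stored in a NumPy array shaped (groups, 8, height, width) and that slices added as zero padding are discarded after merging; the latter is an assumption, since the text does not state how padded slices are handled. The function name is hypothetical.

```python
import numpy as np


def merge_groups(groups, n_original_slices):
    """Stack segmented groups back along the z-axis and drop padded slices."""
    n_groups, group_size = groups.shape[:2]
    volume = groups.reshape(n_groups * group_size, *groups.shape[2:])
    return volume[:n_original_slices]


# Three segmented groups of 8 slices that came from a 19-slice volume.
segmented = np.ones((3, 8, 4, 4), dtype=np.uint8)
merged = merge_groups(segmented, n_original_slices=19)
print(merged.shape)  # (19, 4, 4)
```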

2.2. Soil Characterization

Soil characterization is the final phase of the proposed method. In this phase, the previously generated segmentation serves as a basis for extracting detailed properties of soil composition and pore structure. These properties can be used as indicators that support specialists in making decisions about soil management, such as planting preparation or other agricultural activities, aiming at more efficient and sustainable soil utilization practices.

The soil analysis performed in this phase is based on computational petrophysics metrics. According to Fernandes et al. [34], computational petrophysics is a fast and low-cost methodology for predicting properties such as porosity, intrinsic permeability, relative permeability, and electrical parameters such as formation factor and resistivity index, based solely on knowledge of the microstructure. In this work, three properties are calculated: porosity, porosity curve, and tortuosity.

Before these calculations are performed, a circular sub-area with a radius of 100 pixels is delimited from the center of the sample (Figure 4). Thus, possible damage to the sample edges caused by the capture process is disregarded. According to Dantas et al. [12], the porosity (P) of a volume can be measured according to Equations (6) and (7):

P_{s} = \frac{n_{p}}{n_{p} + n_{s}}

(6)

P_{v} = \frac{\sum_{i = 1}^{n} (P_{s} (S_{i}))}{n}

(7)

where $n_{p}$ is the number of pore pixels (corresponding to 0 in the segmented volume), $n_{s}$ is the number of soil pixels (equal to 1 in the segmented volume), and n is the number of slices of a segmented volume. The porosity $P_{s}$ is calculated for each slice S of the segmented volume, and the total volume porosity ($P_{v}$) is calculated by averaging the porosities $P_{s}$.

With the porosities calculated per slice, it is possible to create a porosity curve that represents the variation in porosity throughout the volume analyzed. This curve allows patterns to be identified, such as regions with higher or lower pore density, providing insights into the structural heterogeneity of the material. This information can be fundamental for studies on fluid dynamics, mechanical resistance, or other properties associated with soil behavior.
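Equations (6) and (7), together with the circular sub-area and the per-slice porosity curve, can be sketched as follows. This is an illustrative version under stated assumptions: the segmentation is a NumPy array shaped (slices, height, width) with 0 for pore and 1 for soil, the disc is centred on the geometric centre of each slice, and the function names are hypothetical.

```python
import numpy as np


def circular_mask(height, width, radius):
    """Boolean mask of a disc centred on the slice."""
    yy, xx = np.ogrid[:height, :width]
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    return (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2


def slice_porosity(seg_slice, mask):
    """Equation (6): pore pixels (0) over all pixels inside the mask."""
    inside = seg_slice[mask]
    n_pore = np.count_nonzero(inside == 0)
    n_soil = np.count_nonzero(inside == 1)
    return n_pore / (n_pore + n_soil)


def volume_porosity(segmentation, radius=100):
    """Equation (7): mean of the per-slice porosities; also returns the curve."""
    mask = circular_mask(segmentation.shape[1], segmentation.shape[2], radius)
    curve = np.array([slice_porosity(s, mask) for s in segmentation])
    return curve.mean(), curve


# Toy volume: one slice that is entirely soil and one that is entirely pore.
seg = np.stack([np.ones((8, 8), dtype=np.uint8), np.zeros((8, 8), dtype=np.uint8)])
p_v, curve = volume_porosity(seg, radius=3)
print(p_v, curve)  # 0.5 [0. 1.]
```

The `curve` array holds one porosity value per slice and is what would be plotted against the slice index to obtain the porosity curve.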

In this work, tortuosity is also calculated. This property describes the geometric complexity of the pore paths through which fluids and solutes move in a porous medium. It also indicates how tortuous a path through a 3D binarized image is along the z-axis [35]. According to Dantas et al. [12], the tortuosity (T) is defined as follows: first, the centroid of each slice is calculated; then, the length of the path through these centroids is divided by the number of slices along the z-axis, resulting in the average length ($L_{c}$). Tortuosity is then the ratio between the average length of the curve ($L_{c}$) and the distance between its endpoints, as shown in Equation (8).

T = \frac{L_{c}}{d (c_{0}, c_{n})}

(8)

where $c_{0}$ and $c_{n}$ are the endpoints of a volume (the first and last centroid, respectively), and $d(\cdot)$ is the Euclidean distance between two points.
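The ratio in Equation (8) can be illustrated with a short sketch. It rests on a simplifying assumption that should be read carefully: $L_{c}$ is taken here as the total length of the polyline through the slice centroids, without the normalization by the number of slices mentioned above, so that a perfectly straight path yields exactly 1. Centroids are assumed to be given as (x, y, z) tuples, and the function name is hypothetical.

```python
import math


def centroid_path_tortuosity(centroids):
    """Length of the polyline through the centroids divided by the
    straight-line distance between the first and last centroid."""
    path_length = sum(math.dist(a, b) for a, b in zip(centroids, centroids[1:]))
    chord = math.dist(centroids[0], centroids[-1])
    if chord == 0:
        raise ValueError("first and last centroid coincide")
    return path_length / chord


# A straight path along z and a zigzag path with the same endpoints.
straight = [(0.0, 0.0, float(z)) for z in range(5)]
zigzag = [(float(z % 2), 0.0, float(z)) for z in range(5)]
print(centroid_path_tortuosity(straight))  # 1.0
print(centroid_path_tortuosity(zigzag))    # about 1.414
```

Under this convention, the more the centroid path deviates from a straight line, the larger the value becomes.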

According to Zhang et al. [35], tortuosity is a parameter that quantifies the influence of soil pore geometry on the movement of fluids and solutes, reflecting the interaction between the medium’s structure and transport processes. Therefore, based on Equation (8), a value greater than 1 means that the flow paths are elongated due to obstacles or complexities within the medium, such as irregular or constricted pores, which increase the resistance to flow and affect overall transport efficiency.

3. Experiments and Results

In this section, the experiments carried out during the development of the proposed method are presented, as well as the results achieved and considerations about the applied approach.

3.1. Datasets

In this work, the experiments were conducted with two datasets. The first is a private dataset containing micro-CT images of a soil sample composed of coarse-grained, poorly consolidated particles, consisting predominantly of shell fragments and other organic remains. The volume has 3771 slices with dimensions of $2000 \times 2048$, acquired with a resolution of 19 micrometers. This dataset also contains the ground-truth with the segmentation mask of the soil sample, as shown in Figure 5. The specialists obtained these segmentations using the Dragonfly Software’s thresholding option. Porosity and tortuosity properties were calculated using the approaches mentioned in Section 2.2.

This sample was taken from an outcrop area in Alagoas, Brazil. Its other properties include 7.86 cm in height and 3.76 cm in diameter, a grain density of 2.71 g/cm³, 76.76 cm³ of solids volume, an air permeability of 10.60, and a confining pressure of 500 psi.

The Silty Loam Phaeozem Albic (SLPA) dataset [2] contains the same sample in two conditions: dry and wet (Figure 6). After the first microtomography in the dry condition, the sample was removed and moistened with excess water through a base of several filter papers. This moistening process continued for 7 full days. Then, the sample was free-drained on a sandy base until it reached constant weight without evaporation. Finally, the sample was imaged with the microtomograph for 2–3 h. Both volumes have 1001 slices with dimensions of $500 \times 500$. The ground truth, however, was not made available. In the experiment carried out with this dataset, the masks were generated automatically by the proposed method.

The experiments performed with the private dataset served to develop and consolidate the proposed method with a focus on the segmentation phase. Complementarily, the experiments with the SLPA dataset served to demonstrate the application of the proposed method with another type of sample, and the possibility of achieving results similar to those presented by the dataset developers, which are based on integral geometry, topology, and morphological analysis methods.

3.2. Experimental Setup

The proposed method was developed using the following hardware and software specifications: Intel Core i7-6700HQ CPU, 16 GB RAM, and an NVIDIA GeForce GTX 1070 GPU. For the image processing and machine learning tasks, Python 3.6 was used with the Keras [36] and TensorFlow frameworks.

In order to evaluate the segmentation results, the Dice coefficient and Jaccard index (or Intersection over Union—IoU) were used. The scientific community widely uses and recommends these metrics to evaluate segmentation methods [37].
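For reference, both metrics can be computed from binary masks as in the sketch below. It assumes NumPy arrays in which nonzero values mark the foreground class; the function name and the convention of returning 1.0 when both masks are empty are choices made for this illustration.

```python
import numpy as np


def dice_and_iou(pred, truth):
    """Return (Dice, IoU) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    intersection = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    if total == 0:  # both masks empty: treat as perfect agreement
        return 1.0, 1.0
    union = total - intersection
    return 2.0 * intersection / total, intersection / union


# One true positive, one false positive, no false negatives.
dice, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 0, 0])
print(dice, iou)  # Dice = 2/3, IoU = 1/2
```

The two metrics are monotonically related through IoU = Dice / (2 − Dice), so they always rank a pair of segmentations of the same image in the same order.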

3.3. Experiments with the Private Dataset

The soil sample was divided into groups of 8 slices, a limitation imposed by the CNN’s architecture, which requires dimensions to be powers of two, as stated in Section 2.1.2. In this first experiment, a hold-out approach was used. Thus, the dataset was divided into 70% for training (of which 20% was reserved for validation) and 30% for testing. Data augmentation was used for the training set, including rotations in the range of [−30°, 30°], as well as vertical and horizontal flips. The training set increased from 264 groups of 8 slices to 792 groups. All slices were resized to $256 \times 256$ due to computational limitations.

For the training process, the following hyperparameters were used: 50 epochs, the Dice Loss function, and the Adam optimizer; the learning rate was set at 0.001, with a decay of 0.1 every 10 epochs. Early stopping was also used with a patience threshold of 30 epochs to prevent overfitting. The segmentation results can be seen in Table 1.
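The learning-rate schedule can be written as a small function. This sketch assumes that "a decay of 0.1 every 10 epochs" means multiplying the rate by 0.1 at epochs 10, 20, 30, and so on, with epochs counted from zero; the function name is hypothetical.

```python
import math


def learning_rate(epoch, initial_lr=0.001, factor=0.1, step=10):
    """Step decay: multiply the rate by `factor` once every `step` epochs."""
    return initial_lr * factor ** (epoch // step)


for epoch in (0, 9, 10, 25, 49):
    print(epoch, learning_rate(epoch))

# Under this reading, the last of the 50 epochs runs at about 1e-7.
assert math.isclose(learning_rate(49), 1e-7, rel_tol=1e-9)
```

A function with this signature can typically be passed to a per-epoch scheduler callback of the training framework.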

The results obtained by U-Net in its conventional architecture were the lowest found in this comparative analysis. It can be seen that the use of an attention mechanism such as GSAU was important to improve the results of Dice and IoU. However, among the mechanisms tested, CBAM and the proposed method were the ones that stood out, the latter slightly outperforming the others with an IoU of 0.9825, rivaling CBAM in Dice values.

Based on this result, the two best approaches were subjected to a 3-fold cross-validation (with the same training and testing proportions used in the previous experiment) in order to evaluate the consistency of their performance when using different training and test sets. Table 2 shows these results. The proposed method outperformed CBAM, with higher average Dice and IoU values and lower standard deviation values, indicating a more consistent performance.

To analyze the statistical difference between the presented methods, a two-tailed paired t-test was used. The p-values obtained for Dice and IoU were 0.26 and 0.32, respectively. Since both are higher than the significance level ($\alpha = 0.05$), the difference between the methods is not statistically significant.
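With three folds, the paired t-test has two degrees of freedom, for which the Student t distribution has a closed-form CDF, so the test can be sketched without a statistics library. The per-fold scores below are hypothetical placeholder numbers chosen only to exercise the function; they are not the per-fold results of this work. The function name is likewise hypothetical.

```python
import math
import statistics


def paired_t_test_three_folds(a, b):
    """Two-tailed paired t-test for exactly three paired scores.

    With 2 degrees of freedom, the two-tailed p-value is
    1 - |t| / sqrt(t^2 + 2).
    """
    if len(a) != 3 or len(b) != 3:
        raise ValueError("this closed form only covers three pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences are constant; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    p = 1.0 - abs(t) / math.sqrt(t * t + 2.0)
    return t, p


# Placeholder scores for two methods over three folds (not real results).
method_a = [0.90, 0.92, 0.94]
method_b = [0.89, 0.92, 0.92]
t_stat, p_value = paired_t_test_three_folds(method_a, method_b)
print(t_stat, p_value)
```

With so few folds the test has little power: under this formula, |t| must exceed roughly 4.3 before the p-value drops below 0.05.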

However, in addition to the proposed method slightly outperforming CBAM, its qualitative results were also better. Figure 7 shows the qualitative results of the tested methods. In that figure, the last slice group is shown. For all the tested methods, segmentation errors were observed in the last slice group.

The U-Net failed to produce a segmentation for the last slice, indicating a tendency toward false negatives. The inclusion of GSAU addressed this issue, successfully recovering predictions for the affected slice. Meanwhile, CBAM showed a tendency toward false positives, as this method generated predictions for an empty slice that had been added as zero padding, as explained in Section 2.1.1. The proposed method, which combines CBAM and GSAU, effectively mitigated these errors, balancing the strengths of both components to achieve more accurate segmentation results.

The segmentation generated for the test subvolume is used in the soil characterization phase, in which properties are extracted that can be used to support soil analysis. Table 3 shows the porosity and tortuosity results obtained by the segmentations generated by the proposed method compared to ground-truth.

Since porosity is estimated at the pixel level, it can be seen that even though the Dice was above 99.10%, which indicates a segmentation very similar to the ground-truth, the number of pixels misclassified by the proposed method meant that the values obtained were lower than those extracted with ground-truth. This is also reflected in the tortuosity results.

As mentioned, the results of the proposed method presented a lower porosity value compared to ground-truth, but showed greater tortuosity. According to Dantas et al. [12] and Zhang et al. [35], the higher the porosity of a soil, the lower the tortuosity value. Therefore, the findings of this study are in line with the literature regarding the relationship between these properties.

Lastly, Figure 8 shows a comparison of the porosity curves estimated by the proposed method and by ground-truth. Despite the difference between the porosity averages, the behavior of the curves is very similar. This shows that the method has achieved promising results, as it provides an analysis comparable to the real scenario.

3.4. Experiments with the SLPA Dataset

Experiments were also conducted with the SLPA dataset, which contains dry and wet soil samples with Phaeozem Albic characteristics, as explained in Section 3.1. These experiments aim to verify in a case study whether the proposed method provides results similar to those found by Ivonin et al. [2], who are the authors responsible for the dataset.

According to these authors, the average porosity value estimated for the Phaeozem Albic sample in the dry condition is higher than that estimated in the wet condition, given that wetting the soil leads to a reduction in the size of the pores and, consequently, the closure of a large part of them. However, this analysis was based on integral geometry, topology, and mathematical morphology methods to calculate the porosity curve from the images. This approach was carried out using various software programs and requires knowledge of advanced mathematics. On the other hand, the proposed method uses a unified approach based on a CNN, reducing operational complexity and the need for integration between different tools.

As the SLPA dataset lacks ground-truth, it was not feasible to train the network with these images. Therefore, a model was used with the weights trained with the private dataset in these experiments. As the datasets have different contrast characteristics, it was necessary to use histogram matching. The image processing community often uses this technique to achieve consistent illumination or contrast across two datasets (i.e., the reference and the target images) [38]. Thus, a slice from the private base used in the network’s training set was randomly chosen as a reference for histogram matching. As a result, the brightness and contrast levels of the SLPA dataset slices were improved and standardized, as shown in Figure 9.
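Histogram matching maps each intensity of a source image to the reference intensity that sits at the same position in the cumulative distribution. The sketch below shows one common NumPy formulation of this idea; it is not necessarily the implementation used in this work, the function name is hypothetical, and both images are assumed to be single-channel arrays.

```python
import numpy as np


def match_histogram(source, reference):
    """Remap `source` intensities so that their CDF follows that of `reference`."""
    src_values, src_index, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True
    )
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)

    # Empirical cumulative distributions of both images.
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size

    # For each source quantile, look up the reference intensity at that quantile.
    mapped = np.interp(src_cdf, ref_cdf, ref_values)
    return mapped[src_index].reshape(source.shape)


# Toy example: a dark source slice matched to a brighter reference slice.
src = np.array([[0, 1], [2, 3]], dtype=float)
ref = np.array([[10, 20], [30, 40]], dtype=float)
print(match_histogram(src, ref))  # [[10. 20.] [30. 40.]]
```

Because the mapping is monotonic, the relative ordering of intensities within each slice is preserved; only the brightness and contrast distribution changes.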

After this pre-processing, the proposed method was applied to the dry and wet samples. These samples then passed through the slice group splitting, segmentation, and slice group merging stages, thus obtaining the segmented volumes. Figure 10 shows the difference between the segmentations obtained with and without histogram matching. Notably, applying this technique was a determining factor in achieving better results. From the masks generated, the following properties were calculated: porosity, tortuosity, and porosity curve. The results for the first two can be seen in Table 4.

A behavior similar to that observed in the previous experiments with the private dataset can be seen in the tests with the SLPA dataset. In this case, it can be seen that when the porosity values decrease, the tortuosity values increase, indicating that the decrease in pore space leads to complications in the passage of fluids and nutrients, making the path more tortuous. Furthermore, the analysis provided by the proposed method corroborates the findings of Ivonin et al. [2] since the dry sample has a higher porosity than the wet sample.

This is also confirmed by comparing the porosity curves, as shown in Figure 11. The drastic reduction in porosity can be seen at various points on the chart. The maximum porosity value observed in the dry sample was 14.38%. In the wet sample, this same slice showed 11.23%. Regarding the lowest porosity values observed, the dry sample had a minimum of 9.23% while the wet sample had 8.82%.

The achieved results show that wetting has caused soil compaction or the obstruction of pores, thus reducing the transport capacity of water, nutrients, and gases. This negatively affects soil aeration and the growth of plant roots. This analysis obtained from the proposed method’s results is similar to that made by the creators of the SLPA dataset.

Another piece of information presented by the proposed method is the increase in tortuosity after wetting, suggesting that the pathways for the flow of fluids and solutes have become more complex, hindering the efficient movement of these essential elements for biological activity.

In their work, the creators of the SLPA dataset use thresholds and morphological operations to generate the segmentations, which makes this process more time-consuming due to the manual search for the best parameters. On the other hand, the proposed method uses an automatic and efficient approach based on deep learning to generate the automatic segmentation of the volumetric soil sample, thus reducing the analysis time.

4. Discussion

The proposed method shows promising results by automatically segmenting soil samples using convolutional neural networks and an attention mechanism that combines the advantages of CBAM and GSAU. In addition, the method also enables the estimation of porosity and tortuosity properties, which are important to support detailed analysis and guide practical applications related to soil structural behavior.

The comparative study of attention mechanisms performed during the development of the proposed method shows that these techniques can improve the results when incorporated into neural networks. This is observed through the segmentation evaluation metrics (Dice and IoU), which showed better results in the experiments with attention mechanisms, with the proposed mechanism outperforming the others.

The proposed attention mechanism is based on how CBAM works, using its channel-level attention stage. However, spatial attention is replaced by the GSAU, which includes pointwise and depthwise convolutions. These types of convolutions are characterized by the preservation of spatial information and the ability to extract more complex representations, unlike the pooling operations used in CBAM’s spatial attention, which reduce the dimensions of the feature maps by computing the average or maximum value within a window, leading to the loss of spatial details.
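The combination described above can be sketched in plain NumPy as follows. This is a simplified 2D illustration under several assumptions, not the network used in this work: the ordering (channel attention first, then the gated spatial unit), the 3 × 3 depthwise kernel, the reduction ratio, the residual connection, the omission of normalization layers, and all function names are choices made only for the example, and the weights are random rather than learned.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """CBAM-style channel attention on a feature map x shaped (H, W, C)."""
    def shared_mlp(v):
        return np.maximum(v @ w1, 0.0) @ w2      # one ReLU hidden layer
    avg_desc = x.mean(axis=(0, 1))               # global average pooling -> (C,)
    max_desc = x.max(axis=(0, 1))                # global max pooling -> (C,)
    weights = sigmoid(shared_mlp(avg_desc) + shared_mlp(max_desc))
    return x * weights, weights

def depthwise_conv_same(x, kernels):
    """Depthwise convolution with zero padding; output keeps the (H, W, C) shape."""
    kh, kw, _ = kernels.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))
    h, w, _ = x.shape
    out = np.zeros_like(x)
    for i in range(kh):
        for j in range(kw):
            out += padded[i:i + h, j:j + w, :] * kernels[i, j, :]
    return out

def gated_spatial_unit(x, w_in, dw_kernels, w_out):
    """GSAU-like gate: pointwise expansion, depthwise gating, pointwise projection."""
    expanded = x @ w_in                          # pointwise conv: C -> 2C channels
    gate, value = np.split(expanded, 2, axis=-1)
    gated = value * depthwise_conv_same(gate, dw_kernels)
    return gated @ w_out + x                     # projection plus residual (assumed)

rng = np.random.default_rng(0)
H, W, C, r = 8, 8, 4, 2                          # r: assumed reduction ratio
x = rng.standard_normal((H, W, C))
w1, w2 = rng.standard_normal((C, C // r)), rng.standard_normal((C // r, C))
w_in, w_out = rng.standard_normal((C, 2 * C)), rng.standard_normal((C, C))
dw_kernels = rng.standard_normal((3, 3, C))

refined, channel_weights = channel_attention(x, w1, w2)
out = gated_spatial_unit(refined, w_in, dw_kernels, w_out)
print(out.shape)  # (8, 8, 4)
```

The point of the sketch is the shape bookkeeping: the gated spatial stage never pools over spatial positions, so every location of the input feature map keeps its own response in the output.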

However, the proposed method has some limitations. Despite obtaining 99.15% Dice in the experiment with the private dataset, which indicates a high level of similarity with the ground-truth, the estimated porosity differed considerably from the reference value (13.21% versus 17.65%, Table 3). As this calculation is based on the count of soil and pore pixels, it is directly affected by errors in the classification of these pixels. The difference between the tortuosity values is smaller (0.1509 versus 0.1473), as this calculation depends on the centroids of each slice of the sample and is less sensitive to errors in the classification of pore pixels.
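The different sensitivity of the two quantities can be illustrated with a small synthetic experiment. The sketch below is an assumption-laden toy, not a reproduction of the study: the slice is random noise, the prediction misses a quarter of the pore pixels uniformly at random (spatially biased errors would move the centroid more), and the helper names are hypothetical. The tortuosity formula itself is defined in Section 2.2 and is not reproduced here; only the slice centroid it relies on is computed.

```python
import numpy as np

def porosity_percent(mask):
    """Porosity (%) as the share of pore pixels (True) in a binary slice."""
    mask = np.asarray(mask, dtype=bool)
    return 100.0 * mask.sum() / mask.size

def pore_centroid(mask):
    """Centroid (row, col) of the pore pixels of a binary slice."""
    rows, cols = np.nonzero(mask)
    return float(rows.mean()), float(cols.mean())

rng = np.random.default_rng(42)
truth = rng.random((256, 256)) < 0.18        # synthetic slice, about 18% pore pixels

# Hypothetical prediction: about 25% of the pore pixels are missed at random.
missed = rng.random(truth.shape) < 0.25
pred = truth & ~missed

porosity_gap = porosity_percent(truth) - porosity_percent(pred)
centroid_shift = np.hypot(*np.subtract(pore_centroid(truth), pore_centroid(pred)))

print(f"porosity gap: {porosity_gap:.2f} percentage points")
print(f"centroid shift: {centroid_shift:.2f} pixels")
```

Every missed pore pixel lowers the porosity directly, whereas the centroid is an average over all pore positions and moves only if the errors are concentrated on one side of the slice.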

Despite these limitations, the proposed method showed its adaptability to different soil analysis scenarios, as in the experiments with the SLPA dataset. The conclusions about the porosity curves of dry and wet samples calculated based on the automatic segmentation generated by the proposed method were similar to the findings of Ivonin et al. [2]. Their approach, however, requires advanced mathematical knowledge to implement and interpret metrics such as Minkowski functionals and Betti numbers. In contrast, based on neural networks with an attention mechanism, the proposed method automates feature extraction and offers operational simplicity, making the analysis more accessible to users without specialized mathematical expertise.

5. Conclusions and Future Works

This work introduced a method for 3D pore segmentation and soil characterization based on CNNs with attention mechanisms. Different mechanisms integrated with U-Net were tested: CBAM, GSAU, and a proposed mechanism that combines their advantages. The proposed mechanism achieved the best segmentation results, surpassing the other mechanisms as well as the plain U-Net, a reference architecture for segmentation tasks in the literature.

Regarding soil characterization, the proposed method also estimates important properties, such as porosity and tortuosity, widely used in computational petrophysics analyses. These properties allow for detailed analyses that can support practices related to the structural behavior of the soil. In the experiments with the private dataset, the estimated tortuosity was close to the ground-truth, while porosity showed a larger difference (Table 3). In the experiments with the SLPA dataset, the results obtained by the method were consistent with those generated by approaches based on advanced mathematical concepts. This compatibility reinforces the robustness of the method, its adaptability to different soil samples, and the operational simplicity characteristic of approaches using convolutional neural networks.

Future work includes acquiring new soil samples, exploring different attention mechanisms [39,40] and more robust 3D backbones, such as those of the EfficientNet family [41], and conducting experiments with vision transformers [42] and Mamba [43] to improve segmentation. In addition, the aim is to estimate additional properties, such as soil permeability, density, pore volume, and larger pore connectivity, expanding the applications and structural understanding provided by the method.

Author Contributions

Conceptualization, I.F.S.d.S., A.d.C.A., J.D.S.d.A., A.C.S., A.C.d.P. and D.R.; Methodology, I.F.S.d.S.; Software, I.F.S.d.S. and A.d.C.A.; Writing—Original Draft, I.F.S.d.S.; Investigation, I.F.S.d.S.; Resources, D.R.; Supervision, J.D.S.d.A., A.C.S., A.C.d.P. and D.R.; Funding acquisition, J.D.S.d.A., A.C.S., A.C.d.P. and D.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Brazilian funding agencies Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)—Finance Code 001, Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), and Fundação de Amparo à Pesquisa Desenvolvimento Científico e Tecnológico do Maranhão (FAPEMA)—Finance Code BPD-03248/24.

Data Availability Statement

The private dataset supporting the conclusions of this article will be made available by the authors on request. The Silty Loam Phaeozem Albic dataset is openly available at https://www.digitalrocksportal.org/projects/302, accessed on 23 December 2024.

Acknowledgments

The authors acknowledge the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Brazil—Finance Code 001, Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Brazil, Fundação de Amparo à Pesquisa Desenvolvimento Científico e Tecnológico do Maranhão (FAPEMA), Brazil, for the financial support; and Institute of Technical-Scientific Software Development of PUC-Rio (Tecgraf/PUC-Rio) for data resources and technical support.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Houston, A.N.; Otten, W.; Falconer, R.; Monga, O.; Baveye, P.C.; Hapca, S.M. Quantification of the pore size distribution of soils: Assessment of existing software using tomographic and synthetic 3D images. Geoderma 2017, 299, 73–82.
2. Ivonin, D.; Kalnin, T.; Grachev, E.; Shein, E. Quantitative analysis of pore space structure in dry and wet soil by integral geometry methods. Geosciences 2020, 10, 365.
3. Kerloch, E.; Michel, J.C. Pore tortuosity and wettability as main characteristics of the evolution of hydraulic properties of organic growing media during cultivation. Vadose Zone J. 2015, 14, 1–7.
4. Cejas, C.M.; Hough, L.A.; Beaufret, R.; Castaing, J.C.; Frétigny, C.; Dreyfus, R. Preferential Root Tropisms in 2D Wet Granular Media with Structural Inhomogeneities. Sci. Rep. 2019, 9, 14195.
5. Freddi, O.d.S.; Centurion, J.F.; Beutler, A.N.; Aratani, R.G.; Leonel, C.L. Compactação do solo no crescimento radicular e produtividade da cultura do milho. Rev. Bras. Ciênc. Solo 2007, 31, 627–636.
6. Al-Marzouqi, H. Digital rock physics: Using CT scans to compute rock properties. IEEE Signal Process. Mag. 2018, 35, 121–131.
7. Andrä, H.; Combaret, N.; Dvorkin, J.; Glatt, E.; Han, J.; Kabel, M.; Keehm, Y.; Krzikalla, F.; Lee, M.; Madonna, C.; et al. Digital rock physics benchmarks—Part I: Imaging and segmentation. Comput. Geosci. 2013, 50, 25–32.
8. Schindelin, J.; Rueden, C.T.; Hiner, M.C.; Eliceiri, K.W. The ImageJ ecosystem: An open platform for biomedical image analysis. Mol. Reprod. Dev. 2015, 82, 518–529.
9. Comet Technologies Canada Inc. Dragonfly 2022.2 [Computer Software]. 2022. Available online: https://dragonfly.comet.tech/ (accessed on 23 December 2024).
10. Mandal, P.P.; Rezaee, R. Facies classification with different machine learning algorithm–An efficient artificial intelligence technique for improved classification. ASEG Ext. Abstr. 2019, 2019, 1–6.
11. Liu, X.; Chandra, V.; Ramdani, A.I.; Zuhlke, R.; Vahrenkamp, V. Using deep-learning to predict Dunham textures and depositional facies of carbonate rocks from thin sections. Geoenergy Sci. Eng. 2023, 227, 211906.
12. Dantas, A.; Vidal, A.; Soares, J.; Medeiros, L. Petrofísica Computacional Aplicada a Analise da Tortuosidade de Rochas Carbonáticas. In Proceedings of the VII Simpósio Brasileiro de Geofísica, SBGf, Ouro Preto, MG, Brazil, 25–27 October 2016.
13. Chawshin, K.; Berg, C.F.; Varagnolo, D.; Lopez, O. Automated porosity estimation using CT-scans of extracted core data. Comput. Geosci. 2022, 26, 595–612.
14. dos Anjos, C.E.M.; de Matos, T.F.; Avila, M.R.V.; Fernandes, J.d.C.V.; Surmas, R.; Evsukoff, A.G. Permeability estimation on raw micro-CT of carbonate rock samples using deep learning. Geoenergy Sci. Eng. 2023, 222, 211335.
15. Da Wang, Y.; Shabaninejad, M.; Armstrong, R.T.; Mostaghimi, P. Deep neural networks for improving physical accuracy of 2D and 3D multi-mineral segmentation of rock micro-CT images. Appl. Soft Comput. 2021, 104, 107185.
16. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; pp. 234–241.
17. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
18. Yu, Q.; Xiong, Z.; Du, C.; Dai, Z.; Soltanian, M.R.; Soltanian, M.; Yin, S.; Liu, W.; Liu, C.; Wang, C.; et al. Identification of rock pore structures and permeabilities using electron microscopy experiments and deep learning interpretations. Fuel 2020, 268, 117416.
19. Marques, V.G.; da Silva, L.R.; Carvalho, B.M.; de Lucena, L.R.; Vieira, M.M. Deep learning-based pore segmentation of thin rock sections for aquifer characterization using color space reduction. In Proceedings of the 2019 International Conference on Systems, Signals and Image Processing (IWSSIP), Osijek, Croatia, 5–7 June 2019; pp. 235–240.
20. Badrinarayanan, V.; Kendall, A.; Cipolla, R. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
21. Zhao, H.; Jia, J.; Koltun, V. Exploring self-attention for image recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10076–10085.
22. Gonçalves, T.; Rio-Torto, I.; Teixeira, L.F.; Cardoso, J.S. A survey on attention mechanisms for medical applications: Are we moving toward better Algorithms? IEEE Access 2022, 10, 98909–98935.
23. Koeshidayatullah, A.; Ferreira-Chacua, I.; Li, W. Is attention all geosciences need? Advancing quantitative petrography with attention-based deep learning. Comput. Geosci. 2023, 181, 105466.
24. Song, M.; Zhao, Y.; Zhao, Y.; Han, Q. ACFTransUNet: A new multi-category soil pores 3D segmentation model combining Transformer and CNN with concentrated-fusion attention. Comput. Electron. Agric. 2024, 225, 109312.
25. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
26. Wang, Y.; Li, Y.; Wang, G.; Liu, X. Multi-scale attention network for single image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 5950–5960.
27. Hara, K.; Saito, D.; Shouno, H. Analysis of function of rectified linear unit used in deep learning. In Proceedings of the 2015 International Joint Conference on Neural Networks (IJCNN), Killarney, Ireland, 12–17 July 2015; pp. 1–8.
28. Oktay, O.; Schlemper, J.; Folgoc, L.L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; McDonagh, S.; Hammerla, N.Y.; Kainz, B.; et al. Attention u-net: Learning where to look for the pancreas. arXiv 2018, arXiv:1804.03999.
29. Li, W.; Wu, J.; Chen, H.; Wang, Y.; Jia, Y.; Gui, G. Unet combined with attention mechanism method for extracting flood submerged range. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 6588–6597.
30. Su, H.; Wang, X.; Han, T.; Wang, Z.; Zhao, Z.; Zhang, P. Research on a U-Net bridge crack identification and feature-calculation methods based on a CBAM attention mechanism. Buildings 2022, 12, 1561.
31. Dauphin, Y.N.; Fan, A.; Auli, M.; Grangier, D. Language modeling with gated convolutional networks. In Proceedings of the International Conference on Machine Learning, PMLR, Sydney, Australia, 6–11 August 2017; pp. 933–941.
32. Wang, W.; Xie, E.; Li, X.; Fan, D.P.; Song, K.; Liang, D.; Lu, T.; Luo, P.; Shao, L. Pvt v2: Improved baselines with pyramid vision transformer. Comput. Vis. Media 2022, 8, 415–424.
33. Hua, W.; Dai, Z.; Liu, H.; Le, Q. Transformer quality in linear time. In Proceedings of the International Conference on Machine Learning, PMLR, Baltimore, MD, USA, 17–23 July 2022; pp. 9099–9117.
34. Fernandes, C.; Santos, L.; Philippi, P.; Bueno, A.; Rodrigues, C.; Gaspari, H. Predição de propriedades petrofísicas de rochas reservatório de petróleo a partir de análise de imagens. In Proceedings of the 1º Congresso Brasileiro de P&D em Petróleo e Gás, UFRN–SBQ Regional RN, Natal, RN, Brazil, 25–28 November 2001.
35. Zhang, Y.; Yang, Z.; Wang, F.; Zhang, X. Comparison of soil tortuosity calculated by different methods. Geoderma 2021, 402, 115358.
36. Chollet, F. Keras. 2015. Available online: https://keras.io (accessed on 8 December 2024).
37. Müller, D.; Soto-Rey, I.; Kramer, F. Towards a guideline for evaluation metrics in medical image segmentation. BMC Res. Notes 2022, 15, 210.
38. Javaheri, I.; Sundararaghavan, V. Polycrystalline microstructure reconstruction using Markov random fields and histogram matching. Comput.-Aided Des. 2020, 120, 102806.
39. Sinha, A.; Dolz, J. Multi-scale self-guided attention for medical image segmentation. IEEE J. Biomed. Health Inform. 2020, 25, 121–130.
40. Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13713–13722.
41. Tan, M.; Le, Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the 36th International Conference on Machine Learning (ICML 2019), Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114.
42. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
43. Yue, Y.; Li, Z. Medmamba: Vision mamba for medical image classification. arXiv 2024, arXiv:2403.03849.

Figure 1. An overview of the proposed method for 3D pore segmentation.

Figure 2. Example of application of the zero padding strategy in the slice group splitting step.

Figure 3. The architecture of the proposed network for 3D pore segmentation.

Figure 4. Circular sub-area defined for the calculations.

Figure 5. Examples from the private dataset: (A) a soil micro-CT image, and (B) the respective ground-truth.

Figure 6. Examples from the SLPA dataset showing the two sample conditions: (A) dry and (B) wet.

Figure 7. Qualitative results of the tested approaches and the proposed method.

Figure 8. Porosity estimated for each slice from the test soil sample. Comparison between the results obtained by the proposed method and those extracted from the ground-truth.

Figure 9. Histogram matching results on the SLPA dataset. The reference slice is from the private dataset.

Figure 10. Segmentation results using the proposed method applied to the dry and wet samples from the SLPA dataset: (A) without histogram matching, and (B) with histogram matching.

Figure 11. Porosity estimated for each slice from the SLPA dataset. Comparison between the results obtained for the dry and wet samples.

Table 1. Results obtained in experiments for the segmentation process.

Method	Dice	IoU
U-Net	0.9896	0.9818
GSAU	0.9907	0.9819
CBAM	0.9910	0.9824
Proposed Method	0.9910	0.9825

Table 2. Results of 3-fold cross-validation. Comparison between CBAM and the proposed method.

Method	Dice	IoU	Best Dice	Worst Dice
CBAM	0.9907 ± 0.0006	0.9819 ± 0.0014	0.9913	0.9900
Proposed Method	0.9910 ± 0.0004	0.9823 ± 0.0008	0.9915	0.9906

Table 3. Porosity and Tortuosity values: a comparison between the proposed method and the ground-truth.

Sample	Porosity (%)	Tortuosity
Ground-truth	17.65	0.1473
Proposed Method	13.21	0.1509

Table 4. Porosity and Tortuosity values estimated for the dry and wet samples from the SLPA dataset.

SLPA Sample	Porosity (%)	Tortuosity
Dry	11.55	0.1224
Wet	11.35	0.1388

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
