Research on Soil Pore Segmentation of CT Images Based on MMLFR-UNet Hybrid Network
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The authors correctly establish that accurate segmentation of soil pore structure is crucial for studying soil water migration, nutrient cycling, and gas exchange. However, we know that low-contrast, high-noise images in complex soil environments leave traditional segmentation methods with obvious shortcomings in accuracy and robustness.
In this regard, they propose a hybrid model that combines a multimodal low-frequency reconstruction (MMLFR) algorithm and UNet (MMLFR-UNet). This proposal is important, but for proper publication, some considerations need to be taken into account:
Carefully review the wording of the document, for example, line 43.
The structure and writing of the introduction should be improved; establish a logical order; I believe it is disorganized.
The outline included in Figure 1 of the introduction should be transferred to the Methodology section.
At the end of the introduction, the research questions or, failing that, the objective of this work should be clearly and precisely stated.
In item 2.1. Soil Sampling, consider including references that support the information used.
In the methodology section, a methodological outline of this work is required, including a flowchart that allows for interpretation of the methodology used.
The discussion of results requires references that allow for proper analysis and comparison of the results. This is important to validate the results, preferably using results from other researchers from the last five years if possible.
It would be advisable to improve the writing and presentation of the work. The resulting product is undoubtedly good, but a scientific article has presentation standards.
Author Response
Response to Reviewer 1:
We are very grateful for your comments on the manuscript. Thank you for your advice. All of your suggestions are very important and provide valuable guidance for our paper and our research. We have revised the manuscript according to your comments. Our response to each comment is listed as follows:
Comment 1
Carefully review the wording of the document, for example, line 43.
Response:
Thanks for your suggestion.
We have carefully reviewed the wording and sentence structure throughout the manuscript, including the originally mentioned line 43. In particular, we have restructured and refined the entire Introduction section to enhance clarity, logical flow, and language accuracy. The revised version eliminates previous ambiguities, improves transitions between paragraphs, and ensures more precise expressions. We sincerely appreciate your valuable suggestion.
Comment 2
The structure and writing of the introduction should be improved; establish a logical order; I believe it is disorganized.
Response:
Thanks for your suggestion.
We agree that a clear and logical organization is essential for readers to understand the background and motivation of our study. In the revised manuscript, we have reorganized the introduction into a more coherent logical progression that covers:
- The scientific significance of soil pore structures and their role in soil physical and biological processes;
- The limitations of conventional models based on REV and the advantages brought by CT imaging;
- A critical review of traditional segmentation methods and their challenges;
- The development and performance of deep learning–based models, particularly U-Net, along with its limitations;
- The rationale for introducing our proposed MMLFR-UNet model, including its theoretical basis and anticipated improvements.
We believe that the revised structure presents a clearer and more logical progression of the research background, technical challenges, methodological motivation, and study objectives, thereby providing a stronger foundation for the rest of the paper. We sincerely hope that these improvements meet your expectations.
The revised manuscript, with the corrections marked in red from Line 39, Page 1 to Line 116, Page 3, is attached as supplemental material entitled “Revised Manuscript”.
Comment 3
The outline included in Figure 1 of the introduction should be transferred to the Methodology section.
Response:
Thanks for your suggestion.
In accordance with your comment, we have moved the outline originally presented in Figure 1 from the Introduction to the Methodology section in the revised manuscript.
We believe this adjustment improves the logical flow of the manuscript and better aligns with academic conventions.
“
Figure 1. Framework for MMLFR-UNet.
”
The revised manuscript, with the correction marked in red at Line 157, Page 4, is attached as supplemental material entitled “Revised Manuscript”.
Comment 4
At the end of the introduction, the research questions or, failing that, the objective of this work should be clearly and precisely stated.
Response:
Thanks for your suggestion.
In response to your comment, we have revised the last paragraph of the introduction to clearly and explicitly state the research objectives of this study. The revised version now explicitly outlines the purpose of proposing the MMLFR-UNet model and the specific goals of our work, which include comparative performance evaluation with existing segmentation methods and supporting the quantitative characterization of soil pore structures.
We have added relevant content to the text:
“In this study, we propose a hybrid segmentation model, MMLFR-UNet, which integrates Multi-Modal Low-Frequency Reconstruction (MMLFR) based on two-dimensional variational mode decomposition (2D-VMD) with the UNet architecture to enable precise segmentation of heterogeneous soil CT images. The objectives of this study are: (1) to compare the performance of MMLFR-UNet with traditional pore identification methods (e.g., Otsu, Fuzzy C-Means and UNet) through both quantitative and qualitative experiments; and (2) to provide technical support for the subsequent quantitative characterization of soil pore structures.”
The revised manuscript, with the corrections marked in red at Lines 109-116, Page 3, is attached as supplemental material entitled “Revised Manuscript”.
Comment 5
In item 2.1. Soil Sampling, consider including references that support the information used.
Response:
Thanks for your suggestion.
We agree with the reviewer’s suggestion and have supplemented Section 2.1 “Soil Sampling” with relevant references to support the methodology described. The soil sampling method adopted in this study follows classical agricultural and soil science protocols. To enhance clarity and academic rigor, we have cited the following three references:
(1) Lu, R.K. Analytical Methods of Soil Agrochemistry; Chinese Agriculture Science and Technology Press: Beijing, China, 1999.
(2) Han, Q., Liu, L., Zhao, Y., & Zhao, Y. (2021). A neighborhood median weighted fuzzy c-means method for soil pore identification. Pedosphere, 31(5), 746–760.
(3) Liu, L., et al. A Simplified Convolutional Network for Soil Pore Identification Based on Computed Tomography Imagery.
The revised manuscript, with the correction marked in red at Line 129, Page 3, is attached as supplemental material entitled “Revised Manuscript”.
Comment 6
In the methodology section, a methodological outline of this work is required, including a flowchart that allows for interpretation of the methodology used.
Response:
Thanks for your suggestion.
In response, we have added a clear methodological outline to the methodology section, as shown in Figure 1. This flowchart presents the overall workflow of the proposed MMLFR-UNet model, including image preprocessing, frequency-domain decomposition and reconstruction using MMLFR, UNet-based segmentation, and performance evaluation. We believe this visual representation significantly improves the clarity and interpretability of the methodology used in our study.
“
Figure 1. Framework for MMLFR-UNet.
The overall workflow of the proposed MMLFR-UNet is depicted in Figure 1, outlining the methodology used in this study. Initially, raw CT soil images undergo conventional preprocessing techniques to normalize the data and reduce basic noise. Subsequent to this, the Multi-Modal Low-Frequency Reconstruction (MMLFR) algorithm is implemented in order to decompose each image into multiple sub-modalities, with the purpose of capturing distinct frequency components and enhancing the representation of structural details. Thereafter, an FFT-based frequency domain metric is utilised to identify and suppress noise components automatically. Subsequently, a selection of sub-modes are recombined in order to reconstruct a denoised version of the input image, thereby enhancing critical features while minimising interference. The reconstructed image is then fed into a UNet-based encoder–decoder architecture. During the testing phase, the trained MMLFR-UNet model generates segmentation predictions, which are then compared to ground truth labels and benchmarked against traditional methods. In order to perform a quantitative evaluation of performance, the standard metrics are adopted, including Intersection over Union (IoU), Pixel Accuracy (PA), Dice Similarity Coefficient (DSC) and Boundary Similarity (Boundary F1-Score). The effectiveness and robustness of the proposed approach are collectively validated by these metrics [39].
”
The revised manuscript, with the corrections marked in red at Lines 157-176, Page 4, is attached as supplemental material entitled “Revised Manuscript”.
Comment 7
The discussion of results requires references that allow for proper analysis and comparison of the results. This is important to validate the results, preferably using results from other researchers from the last five years if possible.
Response:
Thanks for your suggestion.
In response, we have expanded the discussion section to include recent literature from the past five years for better validation and comparison of the results. Specifically, we benchmarked the performance of the proposed MMLFR-UNet model against three recent methods: (1) Liu et al. (2024), who proposed a hybrid U-Net and LSTM model with a reported maximum accuracy of 96.49%; (2) Han et al. (2024), who developed an improved UNet-VAE model for multi-type soil pore segmentation, achieving an average accuracy of 93.83%; and (3) Song et al. (2024), who introduced ACFTransUNet, combining Transformer and CNN with attention mechanisms, reporting an average accuracy of 94.12% across four pore categories. In comparison, our MMLFR-UNet model achieves a pixel accuracy of 98.83% on 2D images, with relatively better performance in small-pore detection under noisy conditions and more complete boundary preservation. These additions support a more thorough analysis of the method's effectiveness.
We have added relevant content to the text:
“Furthermore, to demonstrate the comparative advantage of MMLFR-UNet, we benchmark it against two recent studies. Liu[27] et al. (2024) proposed a 3D pore segmentation method based on a hybrid U-Net and LSTM network, reporting a highest test accuracy of 96.49%. Han et al. (2024) developed an improved UNet-VAE model for multi-type pore segmentation, with an average accuracy of 93.83% across four pore categories. We further compared the segmentation performance of our method with the recent ACFTransUNet model proposed by Song[42] et al. (2024), which integrates Transformer and CNN with concentrated-fusion attention for multi-category 3D soil pore segmentation. According to their results, the highest reported average accuracy across four pore types was 94.12%. In comparison, the MMLFR-UNet in this study achieves a pixel accuracy of 98.83% on 2D images, with significantly better results in detecting small pores under noisy conditions and preserving more complete boundary morphology.”
The revised manuscript, with the corrections marked in red at Lines 486-498, Page 16, is attached as supplemental material entitled “Revised Manuscript”.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
Dear Authors,
you have presented an interesting paper entitled Research on Soil Pore Segmentation of CT Images Based on MMLFR-UNet Hybrid Network. In my opinion: in the section of introduction, research methodology, and results the paper was relatively correctly prepared. Big doubts are raised by the chapter entitled discussion of results, which should be a comparison of the results obtained by your research with the results of other Authors who deal with issues similar to the one presented by you. A polemic is needed. I suggest that at this point you withdraw the article, prepare the discussion correctly, anew, and then submit the paper under the same name.
Author Response
Response to Reviewer 2:
We are very grateful for your comments on the manuscript. Thank you for your advice. All of your suggestions are very important and provide valuable guidance for our paper and our research. We have revised the manuscript according to your comments. Our response to each comment is listed as follows:
Comment 1
Big doubts are raised by the chapter entitled discussion of results, which should be a comparison of the results obtained by your research with the results of other Authors who deal with issues similar to the one presented by you. A polemic is needed.
Response:
Thanks for your suggestion.
We sincerely thank the reviewer for raising this important point. Following your suggestion, we have thoroughly revised the “Discussion of Results” section (now integrated into Section 4.2) to incorporate comprehensive comparisons with multiple representative studies published in recent years. Specifically, we compared our proposed MMLFR-UNet model with the methods reported by Liu et al. (2024), Han et al. (2024), and Song et al. (2024), which include advanced architectures such as U-Net–LSTM hybrids and Transformer-based models like ACFTransUNet. Through quantitative analysis of segmentation accuracy (e.g., pixel accuracy), our results demonstrate consistent advantages in complex scenarios, particularly in small pore recognition and boundary integrity. This addition directly addresses the need for critical comparison and scientific dialogue (“a polemic”) with other works in the field, as the reviewer kindly recommended. We are truly grateful for your constructive feedback, which greatly improved the depth and relevance of our discussion.
We have added relevant content to the text:
“Furthermore, to demonstrate the comparative advantage of MMLFR-UNet, we benchmark it against two recent studies. Liu[25] et al. (2024) proposed a 3D pore segmentation method based on a hybrid U-Net and LSTM network, reporting a highest test accuracy of 96.49%. Han et al. (2024) developed an improved UNet-VAE model for multi-type pore segmentation, with an average accuracy of 93.83% across four pore categories. We further compared the segmentation performance of our method with the recent ACFTransUNet model proposed by Song[38] et al. (2024), which integrates Transformer and CNN with concentrated-fusion attention for multi-category 3D soil pore segmentation. According to their results, the highest reported average accuracy across four pore types was 94.12%. In comparison, the MMLFR-UNet in this study achieves a pixel accuracy of 98.83% on 2D images, with significantly better results in detecting small pores under noisy conditions and preserving more complete boundary morphology.”
The revised manuscript, with the corrections marked in red at Lines 487-499, Page 16, is attached as supplemental material entitled “Revised Manuscript”.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
- The importance of the pore spaces of a porous media as well as the necessity of conducting such studies should be explained clearer at the first parts of the introduction.
- When the authors deal with traditional methods of pore identification, they should mention some physical or empirical equations, from which those are based on selecting representative elementary volume; then make a bridge and discuss CT scan methods.
- Information presented for the soil samples are inadequate. Please give more detailed information, including (if applicable), a map of the region, various classifications, physical properties (in average), hydraulic characteristics, total number of samples (rather than slices), the area of the region (if all data have been obtained from a unique region), etc. All of such information should be given in section 3.1.
- In the methods section, more information on the adopted CNN-based model should be presented: tuning parameters, convergence speed, etc.
- In Figure 10, more information should be given on how the index was obtained.
Author Response
Response to Reviewer 3:
We are very grateful for your comments on the manuscript. Thank you for your advice. All of your suggestions are very important and provide valuable guidance for our paper and our research. We have revised the manuscript according to your comments. Our response to each comment is listed as follows:
Comment 1
The importance of the pore spaces of a porous media as well as the necessity of conducting such studies should be explained clearer at the first parts of the introduction.
Response:
Thanks for your suggestion.
In response, we have revised the beginning of the introduction to more clearly emphasize the importance of soil pore spaces and the necessity of conducting related research. Specifically, we added a new paragraph that defines soil as a porous medium and highlights how the geometry and connectivity of pore structures critically influence key physical processes such as water retention, gas exchange, and biogeochemical cycling. This addition also clarifies the significance of quantitative characterization of pore structures for understanding and modeling soil behavior. We believe this enhancement provides stronger context and motivation for the study.
We have added relevant content to the text:
"Soil is defined as a porous medium formed by the geometric arrangement of solid particles (mineral and organic matter), water, air, and microorganisms. The soil pore structure is defined by the interconnected void spaces between these solid phases. The intricate characteristics of soil pores, encompassing their geometry, spatial configuration, and connectivity, are pivotal in determining soil gas permeability and water-retention properties [1, 2]. In the meantime, it has been established that the structure of soil pores is a pivotal factor in the maintenance of essential soil biogeochemical and biophysical processes[3, 4]. The quantitative characterisation of soil pore structure is imperative for comprehending the mechanisms of water and gas transport. This, in turn, is essential for accurate modelling and prediction of physical and chemical processes in soils."
The revised manuscript, with the corrections marked in red from Line 39, Page 1 to Line 48, Page 2, is attached as supplemental material entitled “Revised Manuscript”.
Comment 2
When the authors deal with traditional methods of pore identification, they should mention some physical or empirical equations, from which those are based on selecting representative elementary volume; then make a bridge and discuss CT scan methods.
Response:
Thanks for your suggestion.
In response, we have revised the introduction to include a description of traditional soil pore identification approaches based on Representative Elementary Volume (REV) theory. Specifically, we now discuss how classical soil physics has long employed empirical and physically-based models—such as Darcy’s law and other percolation equations—for estimating porosity, permeability, and specific surface area under the assumption of structural homogeneity at a given scale. We further explain the limitations of REV-based models in capturing the microstructural complexity and heterogeneity of soil pore networks. To bridge this discussion with CT-based approaches, we highlight how the emergence of computed tomography (CT) imaging has enabled micrometer-scale visualization of internal soil structures, supporting more accurate morphological characterization and spatial analysis. This addition strengthens the logical connection between traditional REV-based methods and advanced image-based segmentation techniques used in our study.
We have added relevant content to the text:
“Conventional soil physics has historically relied on empirical or physically-based models based on Representative Elementary Volume (REV) assumptions to achieve such quantitative characterisations [5, 6]. These models are frequently employed to estimate parameters such as porosity, permeability, and specific surface area, often utilising classical flow equations like Darcy's law and presuming a homogeneous medium at a defined spatial scale. Nevertheless, REV-based approaches are constrained in their capacity to capture the microstructural characteristics, connectivity, and heterogeneity of complex soil pore networks. With the advent of computed tomography (CT) imaging, it is now possible to visualize the internal three-dimensional pore architecture of soil samples at micrometer-scale resolution, allowing direct analysis of spatial variability and structural complexity. This capability enhances our understanding of fluid dynamics in soil systems and provides essential data for investigating microbial community distributions and plant–root interactions [7-9]. As a non-destructive and high-resolution technique, X-ray CT has become a powerful tool in soil science, enabling both two- and three-dimensional visualization of pore structures and supporting accurate quantification of their morphological characteristics.”
The revised manuscript, with the corrections marked in red at Lines 49-64, Page 2, is attached as supplemental material entitled “Revised Manuscript”.
Comment 3
Information presented for the soil samples are inadequate. Please give more detailed information, including (if applicable), a map of the region, various classifications, physical properties (in average), hydraulic characteristics, total number of samples (rather than slices), the area of the region (if all data have been obtained from a unique region), etc. All of such information should be given in section 3.1.
Response:
Thanks for your suggestion.
In response, we have revised Section 2.1 "Soil Sampling" to include more detailed information about the soil samples. Specifically, we have provided the precise sampling location (19°49′N, 110°5′E) in Hainan Province, China, and described the regional climate and soil classification (Ferralsols according to FAO/UNESCO, corresponding to Oxisols under USDA taxonomy). We also included physical and chemical properties of the samples, such as bulk density, texture composition, pH, organic matter, total phosphorus, and total nitrogen. Two undisturbed cylindrical samples were collected, and all data were obtained from a single agroforestry observation station. This information helps clarify the environmental context and sampling conditions of our study.
We have added relevant content to the text:
“Soil sampling [3, 35, 36] was conducted at the Chengmai Meiting Agroforestry Complex Ecosystem Hainan Observation and Research Station, Hainan Province, China (19°49′N, 110°5′E). Two undisturbed cylindrical soil columns (10 cm in diameter and 10 cm in height) were randomly collected from a well-structured A horizon of Ferralsols (FAO/UNESCO), which correspond to Oxisols (USDA). Hainan Island, situated in the tropics, features a warm, humid climate with abundant rainfall and deep, fertile red-weathered soils ideal for tropical agriculture. The bulk density at a depth of 0 to 20 cm is 1.37 g cm⁻³. The soil texture is classified as silt loam, composed of 29.33% sand, 44.17% silt, and 26.50% clay. The chemical properties of the soil samples were as follows: pH (5.67), soil organic matter (11.13 g·kg⁻¹), total phosphorus (0.58 g·kg⁻¹), and total nitrogen (0.89 g·kg⁻¹).”
The revised manuscript, with the corrections marked in red at Lines 119-129, Page 3, is attached as supplemental material entitled “Revised Manuscript”.
Comment 4
In the methods section, information on the adopted CNN-based model should be presented more. Tuning parameters, convergence speed, etc.
Response:
Thanks for your suggestion.
In response to your suggestion, we have substantially expanded Section 2.4 "MMLFR-UNet hybrid model" by including a detailed description of the CNN-based U-Net structure used in our study. Specifically, we describe the network architecture including the number of encoding and decoding layers, kernel size, activation function, and upsampling method. Furthermore, we have provided a complete configuration table (Table 1) summarizing all key training parameters such as learning rate, batch size, optimizer, initialization method, loss function, and learning rate scheduler. These additions aim to enhance the reproducibility and clarity of our model implementation.
We have added relevant content to the text:
“
In this model, the MMLFR first decomposes each input image into five Intrinsic Mode Functions (IMFs) using 2D-VMD. These IMFs capture different frequency components, and a low-frequency energy threshold (set as 0.3) is applied to preserve meaningful information while suppressing noise. The selected IMFs are Fourier-transformed and merged to form a denoised and enhanced image. This processed image is then fed into a U-Net network, which consists of 4 encoding and 4 decoding layers, each with 3×3 convolution, ReLU activation, and bilinear upsampling. Skip connections are used to preserve spatial detail. The training process is optimized using RMSprop, BCEWithLogitsLoss, and a ReduceLROnPlateau scheduler. The complete training-related parameter configuration is summarized in Table 1.
Table 1. Training Parameters Configuration.
Parameter Name | Value/Setting | Parameter Name | Value/Setting
alpha | 5000 | epochs | 200
tau | 0.25 | batch_size | 64
K | 5 | Learning Rate | 0.0001
DC | 1 | scale | 0.5
init | 1 | Initialization | Kaiming Normal
tol | K × 10⁻⁶ | optimizer | RMSprop
eps | 2.2204e-16 | loss | BCEWithLogitsLoss
low_freq_threshold | 0.3 | LR scheduler | ReduceLROnPlateau
”
The revised manuscript, with the corrections marked in red from Line 211, Page 6 to Line 221, Page 7, is attached as supplemental material entitled “Revised Manuscript”.
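The mode-selection step described in this response can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the modes are assumed to have been produced already by 2D-VMD, and "low-frequency energy" is assumed to mean the share of a mode's spectral energy that falls inside a centred disc of the shifted FFT spectrum. The function names and the `radius_frac` parameter are hypothetical; only the 0.3 threshold is taken from Table 1.

```python
import numpy as np


def low_freq_energy_ratio(mode: np.ndarray, radius_frac: float = 0.25) -> float:
    """Share of a mode's spectral energy inside a centred low-frequency disc.

    radius_frac is a hypothetical parameter: the disc radius as a fraction
    of the Nyquist frequency (0.5 cycles per pixel).
    """
    spectrum = np.fft.fftshift(np.fft.fft2(mode))
    energy = np.abs(spectrum) ** 2
    rows, cols = mode.shape
    # After fftshift the zero frequency sits at index n // 2 on each axis.
    fy = (np.arange(rows) - rows // 2) / rows  # cycles per pixel
    fx = (np.arange(cols) - cols // 2) / cols
    radial = np.hypot(fy[:, None], fx[None, :])
    low = radial <= radius_frac * 0.5
    total = energy.sum()
    return float(energy[low].sum() / total) if total > 0 else 0.0


def reconstruct_low_frequency(modes, threshold: float = 0.3) -> np.ndarray:
    """Sum the modes whose low-frequency energy share reaches the threshold."""
    modes = [np.asarray(m, dtype=float) for m in modes]
    kept = [m for m in modes if low_freq_energy_ratio(m) >= threshold]
    if not kept:
        # Assumption: return an empty image when every mode is rejected.
        return np.zeros_like(modes[0])
    return np.sum(kept, axis=0)
```

Because the Fourier transform is linear, summing the kept modes in the spatial domain yields the same image as merging their spectra and applying the inverse transform.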
Comment 5
In Figure 10, more information should be given on how the index was obtained.
Response:
Thanks for your suggestion.
In response to Comment 5, we have updated Section 4.2 “Evaluation Indicators” by providing the mathematical formulas for all four evaluation metrics (IoU, PA, Dice, and BF1). Additionally, we have included clear definitions for all relevant parameters such as TP, FP, FN, TN, as well as the explanation of Boundary Precision and Recall used in the BF1 calculation. These improvements aim to clarify how the metrics presented in Figure 10 were obtained and to enhance the transparency and reproducibility of the evaluation process.
We have added relevant content to the text:
“To comprehensively assess the segmentation performance of different methods on soil CT images, this study adopts four commonly used metrics: Intersection over Union (IoU), Pixel Accuracy (PA), Dice Similarity Coefficient (DSC), and Boundary F1-Score (BF1). These indicators evaluate segmentation quality from multiple dimensions, including region overlap, pixel-wise classification, structural similarity, and boundary delineation accuracy. The formulas for these metrics are as follows:
In the aforementioned equation, TP denotes the number of correctly predicted pore pixels, FP and FN represent the number of solid or pore pixels misclassified as the opposite class, respectively, and TN is the number of correctly identified solid pixels. Within the BF1 metric, Boundary Precision is defined as the proportion of correctly predicted boundary pixels among all predicted boundaries, whereas Boundary Recall is the proportion of true boundary pixels that were correctly detected.”
The revised manuscript, with the corrections marked in red at Lines 424-435, Page 14, is attached as supplemental material entitled “Revised Manuscript”.
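The four metrics named in this response can be sketched in code from the TP, FP, FN, and TN definitions quoted above. The sketch assumes the standard forms: IoU = TP / (TP + FP + FN), PA = (TP + TN) / (TP + TN + FP + FN), DSC = 2TP / (2TP + FP + FN), and BF1 as the harmonic mean of Boundary Precision and Boundary Recall. The function names are hypothetical, and the boundary match here is exact (zero pixel tolerance), which is a simplification; the manuscript's boundary tolerance is not stated in this record.

```python
import numpy as np


def region_metrics(pred, truth):
    """IoU, pixel accuracy and Dice for binary masks (True = pore)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))    # pore predicted as pore
    fp = int(np.sum(pred & ~truth))   # solid predicted as pore
    fn = int(np.sum(~pred & truth))   # pore predicted as solid
    tn = int(np.sum(~pred & ~truth))  # solid predicted as solid
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 1.0
    pa = (tp + tn) / (tp + tn + fp + fn)
    dice = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 1.0
    return {"IoU": iou, "PA": pa, "DSC": dice}


def boundary(mask):
    """Foreground pixels with at least one 4-connected background neighbour."""
    mask = np.asarray(mask, dtype=bool)
    p = np.pad(mask, 1, mode="edge")
    interior = p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return mask & ~interior


def boundary_f1(pred, truth):
    """BF1 with exact boundary matching (no distance tolerance)."""
    bp, bt = boundary(pred), boundary(truth)
    hits = float(np.sum(bp & bt))
    precision = hits / bp.sum() if bp.sum() else 0.0
    recall = hits / bt.sum() if bt.sum() else 0.0
    if precision + recall == 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))
```

For example, a 4 × 4 ground truth with a 2 × 2 pore and a prediction that adds one false pore pixel gives TP = 4, FP = 1, FN = 0, TN = 11, so IoU = 0.8 and PA = 0.9375.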
Author Response File: Author Response.pdf
Reviewer 4 Report
Comments and Suggestions for Authors
The article needs further work.
Comments for author File: Comments.pdf
Author Response
Response to Reviewer 4:
We are very grateful for your comments on the manuscript. Thank you for your advice. All of your suggestions are very important and provide valuable guidance for our paper and our research. We have revised the manuscript according to your comments. Our response to each comment is listed as follows:
Comment 1
A structural diagram (Fig. 1) of image processing using the MMLFR algorithm is provided, but the limitations of this method are not mentioned.
Response:
Thanks for your suggestion.
In response to the comment regarding Figure 1, we have addressed the limitation of the MMLFR algorithm directly in the revised manuscript. Specifically, we added a statement in the Conclusion section acknowledging that although the MMLFR algorithm significantly improves segmentation performance, it also introduces additional computational overhead due to the Fourier and inverse Fourier transform steps involved. This trade-off between processing cost and segmentation quality has now been explicitly discussed to provide a more balanced and transparent assessment of the method. We sincerely appreciate your suggestion, which helped us present the strengths and limitations of the proposed model more comprehensively.
We have added relevant content to the text:
“Although MMLFR introduces additional Fourier transform and inverse transform operations, which increase the computational overhead compared with the traditional preprocessing methods, the segmentation effect is significantly improved.”
The revised manuscript, with the corrections marked in red at Lines 509-511, Page 16, is attached as supplemental material entitled “Revised Manuscript”.
Comment 2
The article lacks some details regarding the specific parameters and settings of MMLFR and UNet. For example, the values of the MMLFR algorithm parameters (number of modes, bandwidth parameters, low-frequency energy threshold) and the UNet architecture (number of layers, filter sizes, activation functions) are not specified. Providing this information would enhance the reproducibility of the study.
Response:
Thanks for your suggestion.
In response to your comment, we have updated Section 2.4 to include a comprehensive description of the parameter settings and architectural configurations used in both the MMLFR algorithm and the UNet model. Specifically, we provided details such as the number of decomposition modes (K=5), bandwidth constraint (alpha=5000), low-frequency energy threshold (0.3), and other relevant hyperparameters for the 2D-VMD-based MMLFR module. For the UNet architecture, we have clarified that it consists of 4 encoding and 4 decoding layers, each employing 3×3 convolution, ReLU activation, and bilinear upsampling. Training settings including optimizer (RMSprop), learning rate (0.0001), batch size (64), and loss function (BCEWithLogitsLoss) are also summarized in Table 1. These additions aim to enhance the transparency and reproducibility of our methodology.
We have added relevant content to the text:
“In this model, the MMLFR first decomposes each input image into five Intrinsic Mode Functions (IMFs) using 2D-VMD. These IMFs capture different frequency components, and a low-frequency energy threshold (set as 0.3) is applied to preserve meaningful information while suppressing noise. The selected IMFs are Fourier-transformed and merged to form a denoised and enhanced image. This processed image is then fed into a U-Net network, which consists of 4 encoding and 4 decoding layers, each with 3×3 convolution, ReLU activation, and bilinear upsampling. Skip connections are used to preserve spatial detail. The training process is optimized using RMSprop, BCEWithLogitsLoss, and a ReduceLROnPlateau scheduler. The complete training-related parameter configuration is summarized in Table 1.
Table 1. Training Parameters Configuration.
Parameter Name | Value/Setting | Parameter Name | Value/Setting
alpha | 5000 | epochs | 200
tau | 0.25 | batch_size | 64
K | 5 | Learning Rate | 0.0001
DC | 1 | scale | 0.5
init | 1 | Initialization | Kaiming Normal
tol | K × 10⁻⁶ | optimizer | RMSprop
eps | 2.2204e-16 | loss | BCEWithLogitsLoss
low_freq_threshold | 0.3 | LR scheduler | ReduceLROnPlateau
”
The revised manuscript, with the corrections marked in red from Line 211, Page 6 to Line 221, Page 7, is attached as supplementary material entitled “Revised Manuscript”.
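For readers who want the Table 1 settings in machine-readable form, the following is a minimal sketch that records them as two Python dataclasses. The class names, the field names, and the split into a decomposition group and a training group are assumptions made for illustration; they are not taken from the authors' code. The comments on `tau`, `DC`, `init`, and `tol` follow the usual meaning of these parameters in common VMD implementations, which is likewise an assumption about how the authors use them.

```python
import math
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class VMDConfig:
    """Left-hand columns of Table 1 (assumed to configure 2D-VMD / MMLFR)."""
    alpha: float = 5000.0             # bandwidth constraint (stated in the response)
    tau: float = 0.25                 # dual-ascent step in common VMD code (assumption)
    K: int = 5                        # number of decomposition modes (stated)
    DC: int = 1                       # DC-mode flag in common VMD code (assumption)
    init: int = 1                     # centre-frequency init mode (assumption)
    eps: float = 2.2204e-16           # small constant, as listed in Table 1
    low_freq_threshold: float = 0.3   # low-frequency energy threshold (stated)

    @property
    def tol(self) -> float:
        # Table 1 lists the tolerance as K x 10^-6.
        return self.K * 1e-6


@dataclass(frozen=True)
class TrainConfig:
    """Right-hand columns of Table 1 (assumed to configure UNet training)."""
    epochs: int = 200
    batch_size: int = 64
    learning_rate: float = 1e-4
    scale: float = 0.5
    initialization: str = "Kaiming Normal"
    optimizer: str = "RMSprop"
    loss: str = "BCEWithLogitsLoss"
    lr_scheduler: str = "ReduceLROnPlateau"


vmd = VMDConfig()
train = TrainConfig()

print("tol =", vmd.tol)
# The listed eps matches double-precision machine epsilon to the digits given.
print("eps ~ machine epsilon:",
      math.isclose(vmd.eps, sys.float_info.epsilon, rel_tol=1e-4))
```

Keeping the values in frozen dataclasses is only one possible design; it makes the configuration immutable and easy to log alongside results.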
Comment 3
The authors do not discuss the computational costs associated with the developed method. An analysis of processing time, memory requirements, and parallelization capabilities would be beneficial for assessing the method's practicality.
Response:
Thanks for your suggestion.
In response, we have added a new subsection (Section 2.5. Experimental Environment and Computational Efficiency) to provide a detailed analysis of the computational costs associated with the proposed MMLFR-UNet method. Specifically, we report the execution time and memory usage of the MMLFR module for both high-resolution (1700 × 1700) and experimental-standard (283 × 283) images. Additionally, we present the total training time and peak GPU memory usage during UNet training over 50 epochs. This information is intended to offer a clearer understanding of the method’s practicality. Although our model introduces moderate computational overhead, it remains efficient and applicable to large-scale soil CT image segmentation tasks. We appreciate your suggestion, which helped us improve the completeness of the manuscript.
We have added relevant content to the text:
“2.5. Experimental Environment and Computational Efficiency
To evaluate the computational efficiency and practicality of the proposed MMLFR-UNet model, all experiments were conducted on a workstation running Windows 10, configured with Python 3.10.13, PyTorch 2.1.1+cu118, and an NVIDIA RTX 4090 GPU with 24 GB of VRAM. When applied to high-resolution grayscale images of 1700 × 1700, the MMLFR preprocessing step took approximately 260.2 seconds and consumed 926.1 MB of memory, reflecting considerable computational demand. In contrast, processing 283 × 283 images required only 6.84 seconds with a peak memory usage of 25.71 MB, demonstrating a substantial improvement in efficiency. Therefore, to balance computational cost and segmentation performance, we selected 283 × 283 as the standard image size for subsequent experiments. For the UNet training stage, using this image size and training for 50 epochs, the total training time was 365.7 seconds, with a peak GPU memory usage of 2019.34 MB. These results indicate that while the proposed model introduces moderate computational overhead, it remains practical for large-scale soil image segmentation tasks.”
The revised manuscript, with the corrections marked in red in Lines 222-236, Page 7, is attached as supplementary material entitled “Revised Manuscript”.
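The figures quoted from Section 2.5 can be put side by side with a short calculation. The sketch below uses only the numbers reported above; the variable names are illustrative, and with just two image sizes the comparison is indicative rather than a scaling law.

```python
# Reported MMLFR preprocessing cost at the two image sizes (from Section 2.5).
hi_res = {"side": 1700, "seconds": 260.2, "memory_mb": 926.1}
lo_res = {"side": 283, "seconds": 6.84, "memory_mb": 25.71}

# How many times more pixels the large image contains.
pixel_ratio = hi_res["side"] ** 2 / lo_res["side"] ** 2
# How many times longer and how much more memory the large image needs.
time_ratio = hi_res["seconds"] / lo_res["seconds"]
memory_ratio = hi_res["memory_mb"] / lo_res["memory_mb"]

# Reported UNet training cost: 365.7 s in total over 50 epochs.
seconds_per_epoch = 365.7 / 50

print(f"pixel ratio:       {pixel_ratio:.2f}")
print(f"time ratio:        {time_ratio:.2f}")
print(f"memory ratio:      {memory_ratio:.2f}")
print(f"seconds per epoch: {seconds_per_epoch:.3f}")
```

The large image has about 36 times as many pixels, takes about 38 times as long, and uses about 36 times as much memory, so for these two measurements the preprocessing cost grows roughly in proportion to pixel count.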
Comment 4 | Comment 9 | Comment 10
Comment 4: The study focuses on one soil type (Ferralsols) from a specific region. Further research could aim to evaluate the method's effectiveness on different soil types and under various imaging conditions to verify its generalization ability.
Comment 9: The physicochemical and structural aspects of tomographic sensing of the soil sample and the effectiveness of the sampling method depending on the soil condition and research objectives are not disclosed.
Comment 10: The article does not specify whether this method is suitable for different soil types or only for a specific territory. If it is for a specific territory, then the article has very limited scientific novelty.
Response:
Thanks for your suggestion.
Thank you for your attention to the soil sample representativeness. In this study, soil sampling was conducted in the Chengmai Meiting Agroforestry Complex Ecosystem Hainan Observation and Research Station, Hainan Province, China (19°49′N, 110°5′E). Two undisturbed cylindrical soil columns (10 cm in diameter and 10 cm in height) were randomly collected from a well-structured A horizon of Ferralsols (FAO/UNESCO), corresponding to Oxisols (USDA). The site is located in a tropical region characterized by a warm, humid climate with abundant rainfall and deeply weathered red soils that are highly suitable for tropical agriculture. The bulk density of the 0–20 cm layer was 1.37 g·cm⁻³. The soil texture was classified as silt loam, consisting of 29.33% sand, 44.17% silt, and 26.50% clay. Chemical properties included a pH of 5.67, soil organic matter content of 11.13 g·kg⁻¹, total phosphorus of 0.58 g·kg⁻¹, and total nitrogen of 0.89 g·kg⁻¹.
Although the experimental data in this study were collected from a specific soil type and region, our proposed segmentation method—MMLFR-UNet—is designed based on the common structural and frequency characteristics present in soil CT images. Therefore, the model has theoretical potential for generalization. In future work, we plan to incorporate CT images from various soil types and regions to further verify the adaptability and generalizability of our method, thereby enhancing the scientific significance and application scope of the research.
Comment 5
While the authors compare their method with traditional approaches and UNet, it would be interesting to compare it with other state-of-the-art image segmentation methods such as DeepLab, Mask R-CNN, or Transformer-based models.
Response:
Thanks for your suggestion.
In response, we have expanded the comparative discussion by incorporating recent state-of-the-art image segmentation methods into our evaluation. Specifically, we benchmarked the proposed MMLFR-UNet against three recent and representative models: (1) a hybrid U-Net–LSTM method by Liu et al. (2024); (2) an improved UNet-VAE model developed by Han et al. (2024); and (3) the Transformer-CNN fusion model ACFTransUNet proposed by Song et al. (2024). These models represent current advancements in deep learning–based soil pore segmentation, including Transformer-based architectures as you kindly suggested. Results show that our model achieved a higher pixel accuracy of 98.83% on 2D CT images and better handled small-pore detection and boundary preservation under noisy conditions. This comparative analysis has been added to the discussion section to highlight the effectiveness and potential advantages of our method.
We have added relevant content to the text:
“Furthermore, to demonstrate the comparative advantage of MMLFR-UNet, we benchmark it against two recent studies. Liu et al. [27] (2024) proposed a 3D pore segmentation method based on a hybrid U-Net and LSTM network, reporting a highest test accuracy of 96.49%. Han et al. (2024) developed an improved UNet-VAE model for multi-type pore segmentation, with an average accuracy of 93.83% across four pore categories. We further compared the segmentation performance of our method with the recent ACFTransUNet model proposed by Song et al. [42] (2024), which integrates Transformer and CNN with concentrated-fusion attention for multi-category 3D soil pore segmentation. According to their results, the highest reported average accuracy across four pore types was 94.12%. In comparison, the MMLFR-UNet in this study achieves a pixel accuracy of 98.83% on 2D images, with significantly better results in detecting small pores under noisy conditions and preserving more complete boundary morphology.”
The revised manuscript, with the corrections marked in red in Lines 487-499, Page 16, is attached as supplementary material entitled “Revised Manuscript”.
Comment 6
No assessment of the risks of incorrect analysis and recognition of the sample structure is conducted; the focus is on identifying the mechanical porous structure of the soil according to the characteristics of crop cultivation and the ability to retain groundwater.
Response:
Thanks for your suggestion.
We sincerely thank the reviewer for this valuable and forward-looking comment. We acknowledge that the current study focuses primarily on improving the segmentation accuracy of soil pore structures from a technical and image-processing perspective. The assessment of potential risks related to misidentification of structural features—and their implications for crop cultivation, groundwater retention, or other agronomic interpretations—is indeed an important and meaningful direction. Although such analysis falls outside the scope of the current work, we fully recognize its significance and will actively consider integrating such risk evaluations and agronomic context into future research. Thank you again for your thoughtful guidance.
Comment 7
Existing methodologies used over the past 50 years are not considered, and no analysis of them is provided.
Response:
Thanks for your suggestion.
We fully acknowledge the importance of reviewing classical methodologies to contextualize recent advances. In response, we have expanded the Introduction section to include a discussion of traditional empirical and physically-based approaches that have been developed over the past decades. Specifically, we have outlined how soil physics has historically relied on Representative Elementary Volume (REV) models and classical equations such as Darcy's law to estimate pore-related parameters like porosity and permeability. However, we also discuss their limitations in capturing the heterogeneity and connectivity of complex pore structures. Furthermore, in Section 4.1, we conducted a detailed comparison of four segmentation methods—Otsu, Fuzzy C-Means (FCM), UNet, and our proposed MMLFR-UNet—covering both classical and contemporary techniques. This extended analysis addresses the evolution of soil pore segmentation methods and highlights the advantages of the proposed approach.
We have added relevant content to the text:
“Conventional soil physics has historically relied on empirical or physically-based models based on Representative Elementary Volume (REV) assumptions to achieve such quantitative characterisations [5, 6]. These models are frequently employed to estimate parameters such as porosity, permeability, and specific surface area, often utilising classical flow equations like Darcy's law and presuming a homogeneous medium at a defined spatial scale. Nevertheless, REV-based approaches are constrained in their capacity to capture the microstructural characteristics, connectivity, and heterogeneity of complex soil pore networks. With the advent of computed tomography (CT) imaging, it is now possible to visualize the internal three-dimensional pore architecture of soil samples at micrometer-scale resolution, allowing direct analysis of spatial variability and structural complexity. This capability enhances our understanding of fluid dynamics in soil systems and provides essential data for investigating microbial community distributions and plant–root interactions [7-9]. As a non-destructive and high-resolution technique, X-ray CT has become a powerful tool in soil science, enabling both two- and three-dimensional visualization of pore structures and supporting accurate quantification of their morphological characteristics.”
The revised manuscript, with the corrections marked in red in Lines 49-64, Page 2, is attached as supplementary material entitled “Revised Manuscript”.
Comment 8
No classification of soils based on basic structures, chemical and agricultural properties is performed, and the target task for the agricultural complex of problems is not formulated.
Response:
Thanks for your suggestion.
We appreciate the reviewer’s concern regarding the classification of soils and the clarification of the target application. In the revised version, we have supplemented the Introduction with a clear articulation of the objectives of this study, which emphasize not only the comparison of segmentation methods but also the support this work aims to provide for the quantitative characterization of soil pore structures—an essential aspect of agricultural and ecological soil management. Moreover, detailed soil classification based on structure, texture, and chemical properties has been provided in Section 2.1 (Soil Sampling), where we describe the Ferralsols (FAO/UNESCO classification), their physical properties (bulk density, texture), and chemical indicators (e.g., pH, organic matter, nitrogen, phosphorus). These additions help frame the relevance of the proposed method within the context of agricultural soil analysis and reinforce the practical implications for land use and crop-root interaction studies.
We have added relevant content to the text:
“In this study, we propose a hybrid segmentation model, MMLFR-UNet, which integrates Multi-Modal Low-Frequency Reconstruction (MMLFR) based on two-dimensional variational mode decomposition (2D-VMD) with the UNet architecture to enable precise segmentation of heterogeneous soil CT images. The objectives of this study are: (1) to compare the performance of MMLFR-UNet with existing pore identification methods (e.g., Otsu, Fuzzy C-Means, and UNet) through both quantitative and qualitative experiments; and (2) to provide technical support for the subsequent quantitative characterization of soil pore structures.”
The revised manuscript, with the corrections marked in red in Lines 109-116, Page 3, is attached as supplementary material entitled “Revised Manuscript”.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The authors' effort is appreciated. The recommendations made have been incorporated, and I consider this work publishable.
Author Response
We sincerely appreciate your positive evaluation and kind recognition of our efforts. Your constructive feedback greatly contributed to improving the clarity and quality of our manuscript. Thank you for recommending the publication of this work.
Reviewer 2 Report
Comments and Suggestions for Authors
Dear Authors,
Unfortunately, I get the impression that the authors still have not understood the principles of creating a good scientific article. I appreciate your contribution to improving the manuscript, but the discussion of the results still does not look professional. I would avoid pasting figures, diagrams, and tables in this part of the paper, and focus only on comparing the results of our research with those of other authors. Certainly, the subsection “Evaluation indicators” should be moved to another place, preferably before the results, in the methodological part. Similarly with Table 2: it is a typical table for the results section.
Please do a solid rewrite of the discussion.
Author Response
Response to Reviewer 2:
We are very grateful for your comments on the manuscript. Thank you for your advice. All your suggestions are very important and provide valuable guidance for our paper and our research work. We have revised the manuscript according to your comments. The response to each comment is listed as follows:
Comment 1
Unfortunately, I get the impression that the authors still have not understood the principles of creating a good scientific article. I appreciate your contribution to improving the manuscript, but the discussion of the results still does not look professional. I would avoid pasting figures, diagrams, and tables in this part of the paper, and focus only on comparing the results of our research with those of other authors. Certainly, the subsection “Evaluation indicators” should be moved to another place, preferably before the results, in the methodological part. Similarly with Table 2: it is a typical table for the results section.
Response:
Thanks for your suggestion.
We carefully reviewed the structure of our manuscript in light of your comment and made the following revisions accordingly:
The "Evaluation Indicators" subsection has been moved from the original Discussion section to the Methodology part (now Section 2.6, Evaluation Indicators), as per your suggestion.
Figure 10 and Table 2, along with their corresponding analysis, have also been removed from the Discussion section and integrated into the Results section (now Sections 3.2.3 and 3.2.4), where they are more appropriately placed.
The Discussion section has been fully rewritten and is now focused exclusively on comparing the performance of our method with those of recent advanced models reported in the literature, such as UNet-VAE, MFHSformer, and ACFTransUNet. This restructuring ensures a more professional and focused comparative discussion, consistent with academic standards.
We sincerely hope that these revisions align better with your expectations for a high-quality scientific article.
We have added relevant content to the text:
“4. Discussion
4.1 Comparative Performance with Recent Segmentation Models
To evaluate the performance and applicability of the proposed MMLFR-UNet model, we conducted a comparative analysis with several recent advanced segmentation frameworks. These include the UNet-VAE model by Han et al. (2024), the MFHSformer model by Bai et al. (2025), and the ACFTransUNet by Song et al. (2024). The following discourse aims to elucidate the merits and deficiencies of our methodology in relation to contemporary state-of-the-art techniques.
On 2D soil CT datasets, MMLFR-UNet achieved a pixel accuracy of 98.83%, a Dice coefficient of 0.8714, an IoU of 0.7790, and a Boundary F1-score of 0.5236. These results reflect the model's superior ability to identify small pores, preserve boundary integrity, and resist noise interference—key challenges in soil image segmentation.
In comparison with the work of Han et al. (2024), who developed a UNet-VAE[48] model achieving an average accuracy of 93.83% across four pore types, MMLFR-UNet offers higher overall accuracy and better adaptability to small, irregular pores. However, the UNet-VAE may offer enhanced performance in the context of multi-class segmentation, a domain not directly addressed by our current binary segmentation framework. In 2025, Bai et al. presented MFHSformer[49], a transformer-based architecture that achieved a reported F1-score of 84.51% and an accuracy of 99.40%. While the model demonstrates excellent precision, it is computationally intensive and may require larger datasets for optimal performance. Conversely, MMLFR-UNet demonstrates superior boundary preservation in 2D CT images and requires a reduced amount of training data. Song et al. (2024) proposed ACFTransUNet[50] for multi-category 3D pore segmentation and reported an average accuracy of 94.12%. While the model demonstrates excellent capabilities for volumetric analysis, MMLFR-UNet is specifically tailored for 2D binary segmentation and achieves higher pixel-level accuracy with fewer parameters and lower computational cost.
4.2 Applicability and Limitations of MMLFR-UNet
The MMLFR-UNet model demonstrates strong adaptability in small-sample learning scenarios, making it particularly effective in applications where annotated datasets are limited. Its integration of 2D-VMD-based low-frequency reconstruction significantly enhances structural feature preservation, allowing accurate segmentation of small pores and complex boundaries even under conditions of low contrast and high noise. These characteristics make the model highly suitable for soil CT image analysis, where such challenges are prevalent. However, the MMLFR preprocessing—especially the Fourier decomposition and recombination stages—introduces additional computational overhead compared to traditional workflows. Furthermore, while the current work focuses on 2D binary segmentation, the generalizability of the model to 3D volumetric data and multi-class segmentation tasks remains to be further validated through expanded experimentation.”
The revised manuscript, with the corrections marked in red in Lines 281-520, Page 16, is attached as supplementary material entitled “Revised Manuscript”.
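The revised Discussion reports pixel accuracy, Dice, and IoU together. The manuscript's own definitions (Section 2.6) are not reproduced in this record, so the sketch below assumes the standard confusion-matrix definitions for binary masks; the function names and the toy masks are hypothetical and are not the authors' evaluation code. The Boundary F1-score is omitted because its tolerance settings are not given here.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN for two binary masks given as nested lists (1 = pore)."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif not p and t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def segmentation_metrics(pred, truth):
    """Pixel accuracy, Dice coefficient, and IoU under the standard definitions."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    total = tp + fp + fn + tn
    overlap_denominator = tp + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        # Both overlap scores are defined as 1.0 when neither mask has pore pixels.
        "dice": 2 * tp / (2 * tp + fp + fn) if overlap_denominator else 1.0,
        "iou": tp / overlap_denominator if overlap_denominator else 1.0,
    }


# Toy 4 x 4 example: three true pore pixels, one missed and one false alarm.
truth = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 1, 0, 0],
         [0, 0, 0, 0]]
pred = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 0]]

metrics = segmentation_metrics(pred, truth)
print(metrics)
```

In this toy case the pore pixels are a small minority, so pixel accuracy is 0.875 while IoU is only 0.5. This is why overlap metrics are usually read alongside pixel accuracy when the foreground class is sparse.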
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
No further comments.
Author Response
Thank you for your time and review. We appreciate your evaluation and are pleased that no further revisions are required. Your constructive feedback greatly contributed to improving the clarity and quality of our manuscript. Thank you for recommending the publication of this work.
Reviewer 4 Report
Comments and Suggestions for Authors
The article has been sufficiently revised and can be published.
Author Response
Thank you very much for your positive comments and recognition of our work. We sincerely appreciate your careful review and constructive feedback throughout the revision process, which has greatly helped us improve the clarity and quality of the manuscript. Thank you for recommending this manuscript for publication.
Round 3
Reviewer 2 Report
Comments and Suggestions for Authors
I accept for publication.