Non-Linear Models for Assessing Soil Moisture Estimation
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The article presents an important contribution to science. The development of tools to assist in soil moisture forecasting supports decision-making under different field management practices. However, the presentation of the results has many problems. Some information currently in the supplementary material is too important to be left there and should be presented in the main article. There is room for improvement in the presentation of the results. Figures should be presented after the text that first cites them.
The Discussion contains no citations. A complete revision of the article is essential.
Finally, I recommend changing the title of the article. In addition, this journal may not be the most suitable venue for publication, since the tool is not directly connected to the monitoring of horticultural areas; the authors present a general model for soil moisture forecasting.
Comments for author File: Comments.pdf
Author Response
Reviewer #1: The article presents an important contribution to science. The development of tools to assist in soil moisture forecasting supports decision-making under different field management practices. However, the presentation of the results has many problems.
Comment 1: Some information currently in the supplementary material is too important to be left there and should be presented in the main article.
Response 1:
We agree that some of the supplementary material is essential to the core narrative of the paper. Accordingly, we have moved Figure S6 into the main manuscript, in Section 3.1 (now Figure 2).
Comment 2: There is room for improvement in the presentation of the results. Figures should be presented after the text that first cites them.
Response 2: All figures have now been repositioned to appear immediately after the paragraphs where they are first mentioned, in accordance with journal formatting guidelines. This reordering improves the logical flow and readability of the results section.
Comment 3: The Discussion contains no citations. A complete revision of the article is essential.
Response 3: We have thoroughly revised the Discussion section to include recent and relevant citations (references 39 to 49) that support our interpretations and compare our results with the existing literature. This improves the scientific depth and rigor of the discussion.
Comment 4: Finally, I recommend changing the title of the article. In addition, this journal may not be the most suitable venue for publication, since the tool is not directly connected to the monitoring of horticultural areas; the authors present a general model for soil moisture forecasting.
Response 4: We understand the concern regarding the scope of the article. Our motivation is to support diverse agricultural applications, including but not limited to precision agriculture and ecological monitoring, which aligns well with the journal’s scope on the standardization of environmental monitoring technologies. To clarify the broader applicability of our tool, we have revised the title to better reflect the generalized modeling framework we propose. Revised title: “Non-linear models for assessing soil moisture estimation”.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The paper is generally well written, but there are some shortcomings that the authors should address. The Materials and Methods section should be improved; the input parameters are not clearly described. The Discussion is weak, lacks supporting references, and has to be improved. There are many typos. Abbreviations are not explained, and the Table contains some parameters that are not used in the main text. Some specific remarks are given directly in the text.
Comments for author File: Comments.pdf
Author Response
Reviewer #2: The paper is generally well written, but there are some shortcomings that the authors should address.
Comment 1: The Materials and Methods section should be improved. Input parameters are not clearly described.
Response 1: The revised Materials and Methods section includes the following improvements:
- We have added a description of the measurement methods, sensor specifications, calibration procedures, and data quality control protocols. The revised sections are highlighted in yellow in the updated manuscript.
The soil moisture (SM) dataset used in this study was obtained from the National Ecological Science Data Center and spans the period from 2003 to 2010 at the Changbai Mountain Station [1]. SM was measured at depths of 5 cm, 20 cm, and 50 cm using CS616 frequency domain reflectometry (FDR) sensors (Campbell Scientific Inc., USA), which recorded volumetric water content (VWC, m³·m⁻³) at hourly intervals. To ensure measurement reliability, a two-step calibration process was employed. Factory calibration was first conducted by the sensor manufacturer. In addition, field calibration was periodically performed by comparing sensor readings with gravimetric soil moisture measurements obtained via the oven-drying method, following the Chinese national standard GB/T 22314-2008. Meteorological variables were collected using a series of high-precision instruments. Air temperature and relative humidity were measured by HMP45C sensors (VAISALA, Finland), while wind speed was recorded by A100R anemometers (Vector Instruments, UK). Precipitation was measured by a Model 52203 tipping-bucket rain gauge (RM YOUNG, USA). Global solar radiation was monitored using a CM11 pyranometer (KIPP & ZONEN, Netherlands). Soil temperature was measured by vertically embedded 105T thermistor probes (Campbell Scientific Inc., USA) at five depths: 0 cm (surface), 5 cm, 20 cm, 50 cm, and 100 cm. A complete list of meteorological variables is provided in Table S1, with wind speed and direction summarized in Table S2.
- The basis for selecting the key model parameters is now explained in the revised manuscript (see Sections 2.4.2 to 2.4.4).
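The periodic field calibration against oven-dried samples can be illustrated with a minimal sketch. It assumes that (a) gravimetric water content is converted to volumetric water content using a known dry bulk density and (b) the correction is a simple linear (ordinary least-squares) fit of the gravimetric reference on the CS616 readings. The manuscript does not state the form of the calibration equation, so the linear form, the function names, and the sample values below are hypothetical.

```python
from statistics import mean

WATER_DENSITY = 1.0  # g/cm^3


def gravimetric_to_volumetric(wet_mass, dry_mass, bulk_density):
    """Convert oven-drying masses (g) to volumetric water content (m^3/m^3).

    Gravimetric content w = (wet - dry) / dry; VWC = w * bulk_density / water density.
    bulk_density is the dry bulk density in g/cm^3 (assumed known for the site).
    """
    w = (wet_mass - dry_mass) / dry_mass
    return w * bulk_density / WATER_DENSITY


def fit_linear_calibration(sensor_vwc, reference_vwc):
    """Assumption: linear correction, reference = slope * sensor + intercept, fitted by OLS."""
    xm, ym = mean(sensor_vwc), mean(reference_vwc)
    sxy = sum((x - xm) * (y - ym) for x, y in zip(sensor_vwc, reference_vwc))
    sxx = sum((x - xm) ** 2 for x in sensor_vwc)
    slope = sxy / sxx
    return slope, ym - slope * xm


def apply_calibration(sensor_vwc, slope, intercept):
    """Map raw sensor VWC readings onto the gravimetric reference scale."""
    return [slope * x + intercept for x in sensor_vwc]


# Hypothetical paired samples: oven-drying masses (g) and co-located CS616 readings
samples = [(120.0, 100.0), (130.0, 100.0), (140.0, 100.0)]
sensor = [0.25, 0.38, 0.51]
reference = [gravimetric_to_volumetric(wet, dry, bulk_density=1.3) for wet, dry in samples]
slope, intercept = fit_linear_calibration(sensor, reference)
corrected = apply_calibration(sensor, slope, intercept)
```

A linear correction is the simplest choice; if the residuals showed curvature, a polynomial fit could be substituted at the same step.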
2.4.2. K-means Clustering
The K-means clustering algorithm partitions data into k clusters by calculating the Euclidean distance (Eq. S6) between data points and cluster centroids. The flow of the algorithm is illustrated in Fig. 2. The optimal number of clusters (k) was determined using the elbow method, which tracks the Sum of Squared Errors (SSE) (Eq. S7) as k increases. The SSE dropped sharply up to k = 3, after which its rate of decrease flattened, indicating that further increasing k yields diminishing improvements in clustering performance. Based on the elbow method, we selected k = 3 as the optimal number of clusters and combined the three clusters with the MLP neural network to improve the accuracy of soil moisture estimation.
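The elbow procedure above can be sketched in plain Python. This is an illustrative implementation of Lloyd's k-means with k-means++ seeding and several restarts, applied to synthetic two-dimensional data; the seeding scheme, the restart count, and the data are assumptions, not details taken from the manuscript.

```python
import math
import random


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: each new centroid is drawn with probability proportional to squared distance."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        d2 = [min(math.dist(p, c) for c in centroids) ** 2 for p in points]
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids


def kmeans(points, k, n_iter=100, n_init=5, seed=0):
    """Lloyd's algorithm; returns (centroids, labels, sse) of the best of n_init restarts."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        centroids = kmeans_pp_init(points, k, rng)
        for _ in range(n_iter):
            # Assignment step: each point goes to its nearest centroid (Euclidean distance)
            labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
            # Update step: each centroid moves to the mean of its assigned points
            new = []
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                new.append(tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[c])
            if new == centroids:
                break
            centroids = new
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        sse = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
        if best is None or sse < best[2]:
            best = (centroids, labels, sse)
    return best


def elbow_curve(points, k_max=6):
    """SSE for k = 1..k_max; the 'elbow' is where further increases in k stop paying off."""
    return {k: kmeans(points, k)[2] for k in range(1, k_max + 1)}


# Synthetic example: three well-separated 2-D groups (hypothetical data, not the station dataset)
rng = random.Random(42)
centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)) for cx, cy in centers for _ in range(50)]
sse = elbow_curve(points)
```

On such data the SSE drops steeply up to k = 3 and only marginally afterwards, which is the pattern the elbow method looks for. In practice a library implementation (for example scikit-learn's `KMeans`, whose `inertia_` attribute is the SSE) would normally be used instead.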
2.4.3. MLP
The MLP neural network is a feedforward network capable of mapping multiple inputs to corresponding outputs, as illustrated in Fig. S2. It is trained with the backpropagation algorithm, which iteratively adjusts the weights to minimize the error between predicted and actual outputs until the error falls below a pre-specified threshold.
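The backpropagation loop with a stopping threshold can be illustrated as follows. This minimal sketch assumes a single hidden layer with tanh activations, a linear output unit, per-sample gradient descent on the squared error, and a toy linear target; the layer size, learning rate, threshold, and data are hypothetical and do not reproduce the network configuration used in the study.

```python
import math
import random


def train_mlp(X, y, n_hidden=4, lr=0.05, tol=1e-3, max_epochs=2000, seed=0):
    """One-hidden-layer MLP (tanh hidden units, linear output) trained by backpropagation.

    Weights are updated after every sample; training stops when the mean squared error
    accumulated over an epoch falls below `tol` (the pre-specified threshold) or after
    `max_epochs`. Returns a predict function and the per-epoch error history.
    """
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
        return h, sum(w * hj for w, hj in zip(W2, h)) + b2

    history = []
    for _ in range(max_epochs):
        sq_err = 0.0
        for x, target in zip(X, y):
            h, out = forward(x)
            err = out - target  # derivative of 0.5 * err**2 with respect to the output
            sq_err += err ** 2
            # Error signal at each hidden unit, through tanh'(a) = 1 - tanh(a)**2
            delta = [err * W2[j] * (1 - h[j] ** 2) for j in range(n_hidden)]
            for j in range(n_hidden):
                W2[j] -= lr * err * h[j]
                b1[j] -= lr * delta[j]
                for i in range(n_in):
                    W1[j][i] -= lr * delta[j] * x[i]
            b2 -= lr * err
        history.append(sq_err / len(X))
        if history[-1] < tol:
            break
    return (lambda x: forward(x)[1]), history


# Hypothetical toy data: two inputs in [-1, 1] and a linear target
rng = random.Random(1)
X = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(50)]
y = [0.5 * a - 0.3 * b + 0.2 for a, b in X]
predict, history = train_mlp(X, y)
```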
2.4.4. ARIMA
The ARIMA model is commonly used for forecasting time-series data. It combines three components, autoregressive (AR), integrated (I), and moving average (MA), and is denoted ARIMA(p, d, q). Fig. S3 illustrates the structure of the ARIMA model, and its general form is given in Eq. S8. To predict soil moisture (SM) at various depths, the optimal ARIMA parameters were determined systematically as follows:
(1) Parameter Determination
First, the stationarity of each SM time series was evaluated using the Augmented Dickey-Fuller (ADF) test. For non-stationary series (p > 0.01), first-order differencing (d = 1) was applied; higher-order differencing was considered only if stationarity was still not achieved. Next, a grid search over p and q was performed, guided by the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). For each soil depth, the model with the lowest AIC and BIC values was selected.
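The order-selection step can be sketched as below. To keep the example self-contained it is deliberately simplified: the ADF test is omitted (d = 1 is fixed, as applied to the non-stationary series), and only ARIMA(p,1,0) candidates are compared, because estimating MA terms requires iterative maximum-likelihood fitting. Each candidate is fitted by ordinary least squares on the differenced series, and AIC = 2K - 2 ln L and BIC = K ln n - 2 ln L are computed from the conditional Gaussian log-likelihood. The simulated series and all function names are hypothetical; in practice a library such as statsmodels (`adfuller` and `ARIMA`) would handle the full (p, q) grid.

```python
import math
import random


def difference(series, d=1):
    """Apply d-th order differencing (the 'I' step of ARIMA)."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ar_log_likelihood(z, p, max_p):
    """OLS fit of an AR(p) model with intercept; returns (conditional Gaussian log-likelihood, n).

    Every candidate conditions on the first max_p values, so all orders use the same sample.
    """
    rows = [[1.0] + [z[t - i] for i in range(1, p + 1)] for t in range(max_p, len(z))]
    targets = z[max_p:]
    k = p + 1
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yt for r, yt in zip(rows, targets)) for i in range(k)]
    beta = solve(XtX, Xty)
    rss = sum((yt - sum(bi * ri for bi, ri in zip(beta, r))) ** 2 for r, yt in zip(rows, targets))
    n = len(targets)
    sigma2 = rss / n  # maximum-likelihood estimate of the noise variance
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1), n


def select_order(series, d=1, max_p=4):
    """Grid search over p for ARIMA(p, d, 0); returns (best p by AIC, best p by BIC, aic, bic)."""
    z = difference(series, d)
    aic, bic = {}, {}
    for p in range(max_p + 1):
        log_l, n = ar_log_likelihood(z, p, max_p)
        K = p + 2  # AR coefficients + intercept + noise variance
        aic[p] = 2 * K - 2 * log_l
        bic[p] = K * math.log(n) - 2 * log_l
    return min(aic, key=aic.get), min(bic, key=bic.get), aic, bic


# Hypothetical series whose first differences follow an AR(1) process (phi = 0.7)
rng = random.Random(7)
level, increment, series = 0.0, 0.0, []
for _ in range(500):
    increment = 0.7 * increment + rng.gauss(0, 1)
    level += increment
    series.append(level)
best_aic, best_bic, aic, bic = select_order(series)
```

Conditioning every candidate on the same first max_p values matters: otherwise higher orders would be scored on fewer observations and their criteria would not be directly comparable.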
(2) Model Order Identification by Depth
5 cm depth: The autocorrelation function (ACF) and partial autocorrelation function (PACF) revealed strong short-term dependencies. In the grid search, ARIMA(0,1,6) yielded the lowest AIC (-412.3) and BIC (-398.7) and was identified as the optimal model.
20 cm depth: The PACF exhibited a sharp cutoff at lag 1, while the ACF showed a tailing pattern up to lag 2 (Fig. S3), supporting the selection of ARIMA(1,1,2).
50 cm depth: The ACF decayed gradually and the PACF cut off after lag 2 (Fig. S7), indicating ARIMA(2,1,1) as the suitable model.
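The ACF and PACF diagnostics used above can be computed directly. The sketch below uses the sample autocorrelation and the Durbin-Levinson recursion for the partial autocorrelation, applied to a simulated AR(1) series (hypothetical data). For such a series the PACF cuts off after lag 1 while the ACF decays gradually, which is the kind of pattern used to read off the AR order.

```python
import random


def acf(x, nlags):
    """Sample autocorrelation for lags 0..nlags (biased estimator, normalised by n)."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x) / n
    return [sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / (n * c0) for k in range(nlags + 1)]


def pacf(x, nlags):
    """Partial autocorrelation for lags 0..nlags via the Durbin-Levinson recursion."""
    r = acf(x, nlags)
    result = [1.0]
    phi = []  # phi[j] holds the lag-(j+1) coefficient of the current AR(k-1) fit
    for k in range(1, nlags + 1):
        num = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
    return result


# Hypothetical AR(1) series with coefficient 0.8
rng = random.Random(3)
x, prev = [], 0.0
for _ in range(2000):
    prev = 0.8 * prev + rng.gauss(0, 1)
    x.append(prev)
r = acf(x, 5)
partial = pacf(x, 5)
```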
(3) Model Performance Evaluation
Model performance was assessed using the coefficient of determination (R²) and root mean square error (RMSE). The results are as follows:
Depth | Model structure | R² | RMSE (m³·m⁻³)
5 cm | ARIMA(0,1,6) | 0.9677 | 0.020
20 cm | ARIMA(1,1,2) | 0.9853 | 0.015
50 cm | ARIMA(2,1,1) | 0.9684 | 0.006
These findings indicate that the ARIMA model provides reliable and accurate soil moisture predictions across depths. The highest R² was obtained at 20 cm, whereas the lowest RMSE was obtained at 50 cm.
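For reference, the two metrics in the table can be computed as follows. This sketch assumes R² is defined as 1 - SS_res/SS_tot; the manuscript does not say whether it uses this definition or the squared Pearson correlation, which can differ for biased forecasts. The observation values are hypothetical.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error, in the units of the data (m^3/m^3 for VWC)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


# Hypothetical observed and forecast VWC values (m^3/m^3)
obs = [0.30, 0.32, 0.35, 0.33]
pred = [0.31, 0.32, 0.34, 0.33]
print(round(r_squared(obs, pred), 4), round(rmse(obs, pred), 4))
```

Because RMSE is in absolute units while R² is relative to the variance of the observations, a layer with little moisture variation (such as 50 cm) can show the lowest RMSE without the highest R².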
The Akaike Information Criterion (AIC) is formulated as:
$$\mathrm{AIC} = 2K - 2\ln L$$
where L is the maximized value of the model's likelihood function and K is the number of model parameters.
However, AIC is not a consistent criterion: even as the sample size grows, it does not necessarily converge to the true model and tends to favor overparameterized ones. The Bayesian Information Criterion (BIC) is defined as:
$$\mathrm{BIC} = K\ln n - 2\ln L$$
where n is the sample size. Because its penalty grows with n, BIC provides more consistent model selection in such cases. In this study, both AIC and BIC were minimized via grid search to determine the optimal ARIMA configurations. The ARIMA model is widely used for its effectiveness in forecasting time-series data with dynamic, random patterns. It combines autoregressive (AR), integrated (I), and moving average (MA) components, denoted ARIMA(p, d, q). The ARIMA structure is illustrated in Fig. S3, and Eq. S8 presents the general ARIMA formula [2].
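For readers without access to the supplementary material, the standard textbook form of ARIMA(p, d, q) in backshift-operator notation is given below. We assume, without being able to confirm it here, that Eq. S8 is equivalent:

$$\left(1 - \sum_{i=1}^{p} \phi_i B^i\right)(1 - B)^d X_t = c + \left(1 + \sum_{j=1}^{q} \theta_j B^j\right)\varepsilon_t$$

where $B$ is the backshift operator ($B X_t = X_{t-1}$), $\phi_i$ are the autoregressive coefficients, $\theta_j$ are the moving-average coefficients, $c$ is an optional constant, and $\varepsilon_t$ is white noise. For example, the ARIMA(1,1,2) model selected at 20 cm expands, with $\Delta X_t = X_t - X_{t-1}$, to:

$$\Delta X_t = c + \phi_1 \Delta X_{t-1} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2}$$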
Comment 2: Discussion is poor, without giving support by references and have to be improved.
Response 2: We have thoroughly revised the Discussion section to include recent and relevant citations (references 39 to 49) that support our interpretations and compare our results with the existing literature. This improves the scientific depth and rigor of the discussion.
Comment 3: There are many typos.
Response 3: The manuscript has undergone a thorough language revision. All typographical and grammatical errors have been corrected to ensure linguistic clarity and professionalism.
Comment 4: Abbreviations are not explained, and the Table contains some parameters that are not used in the main text. Some specific remarks are given directly in the text.
Response 4: We have revised the Abbreviations and Nomenclature section. All abbreviations mentioned in the main text are now listed alphabetically with clear definitions. A list of abbreviations has been added to the end of the manuscript for reference. The revised sections are highlighted in yellow.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
The study, titled “Assessing Soil Moisture Using ARIMA for One-Day Forecasting and K-MLP for Real-Time Estimation”, presents a comparative analysis of soil moisture prediction using ARIMA and a Kernel-based Multilayer Perceptron (K-MLP) model in the Changbai Mountain region. It assesses model performance at varying soil depths, with a primary focus on forecasting accuracy.
Specific Comments and Suggested Improvements:
- The manuscript does not provide information on how soil moisture content was measured or whether any calibration procedures were applied. This information is essential to assess the reliability of the ground truth data used for model development and evaluation. I recommend including a clear description of the measurement method, the type of sensors (if applicable), calibration approach, and any data quality control procedures.
- In Section 2.4.2. K-means Clustering, please correct the reference from “(Eq. S5)” to “(Eq. S6)” in the second line.
- The manuscript does not currently provide a justification for the choice of ARIMA and K-MLP models over more recent deep learning approaches such as LSTM or GRU. A brief explanation of the rationale behind choosing these models, along with a mention of more complex alternatives in the discussion or future work section, would strengthen the study.
- The method for outlier detection is vague. Please provide additional details on how outliers were identified and removed. For example, were there specific statistical criteria, thresholds, or algorithms used in conjunction with the visual methods mentioned (time series plots and block diagrams)?
- The manuscript mentions that ARIMA achieved high accuracy at 20 cm, but it would be beneficial to include a theoretical explanation or interpretation of why this depth yielded better results. Expanding on this point would provide more context and insight into the findings.
Author Response
Reviewer #3:
Comment 1: The manuscript does not provide information on how soil moisture content was measured or whether any calibration procedures were applied. This information is essential to assess the reliability of the ground truth data used for model development and evaluation. I recommend including a clear description of the measurement method, the type of sensors (if applicable), calibration approach, and any data quality control procedures.
Response 1: In the revised Section 2.3, we have added a description of the measurement methods, sensor specifications, calibration procedures, and data quality control protocols. The revised sections are highlighted in yellow in the updated manuscript.
The soil moisture (SM) dataset used in this study was obtained from the National Ecological Science Data Center and spans the period from 2003 to 2010 at the Changbai Mountain Station. SM was measured at depths of 5 cm, 20 cm, and 50 cm using CS616 frequency domain reflectometry (FDR) sensors (Campbell Scientific Inc., USA), which recorded volumetric water content (VWC, m³·m⁻³) at hourly intervals. To ensure measurement reliability, a two-step calibration process was employed. Factory calibration was first conducted by the sensor manufacturer. In addition, field calibration was periodically performed by comparing sensor readings with gravimetric soil moisture measurements obtained via the oven-drying method, following the Chinese national standard GB/T 22314-2008. Meteorological variables were collected using a series of high-precision instruments. Air temperature and relative humidity were measured by HMP45C sensors (VAISALA, Finland), while wind speed was recorded by A100R anemometers (Vector Instruments, UK). Precipitation was measured by a Model 52203 tipping-bucket rain gauge (RM YOUNG, USA). Global solar radiation was monitored using a CM11 pyranometer (KIPP & ZONEN, Netherlands). Soil temperature was measured by vertically embedded 105T thermistor probes (Campbell Scientific Inc., USA) at five depths: 0 cm (surface), 5 cm, 20 cm, 50 cm, and 100 cm. A complete list of meteorological variables is provided in Table S1, with wind speed and direction summarized in Table S2. To ensure data quality, preprocessing included outlier removal, missing-value filling, and standardization. Outliers were identified in the hourly aggregated data using Tukey’s fences: the first (Q1) and third (Q3) quartiles were computed for each variable, and any data point below Q1 - 1.5×IQR or above Q3 + 1.5×IQR, where IQR = Q3 - Q1 is the interquartile range, was flagged as an outlier.
Time series plots and boxplots were also inspected visually to identify and interpret outliers. For missing values, linear interpolation was applied to gaps shorter than 2 hours. Longer gaps in the meteorological data were preferentially filled using observations from nearby meteorological stations; where such data were unavailable, the average diurnal variation method was used. Finally, to remove the influence of differing variable units and scales, all data were standardized using Z-scores.
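The preprocessing chain described above can be sketched for a single hourly variable. This is a minimal illustration under stated assumptions: outliers are treated as missing values so that they are filled like any other gap, "gaps shorter than 2 hours" is read as at most one missing hourly value, longer gaps are left unfilled (the nearby-station and average diurnal variation steps are not reproduced), and quartiles use Python's inclusive method. The function names and example values are hypothetical.

```python
from statistics import mean, pstdev, quantiles


def tukey_fences(values, k=1.5):
    """Lower and upper Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")  # assumption: inclusive quartiles
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def interpolate_short_gaps(values, max_gap=1):
    """Linearly interpolate interior runs of None no longer than max_gap; longer runs stay None."""
    out = list(values)
    i = 0
    while i < len(out):
        if out[i] is not None:
            i += 1
            continue
        start = i
        while i < len(out) and out[i] is None:
            i += 1
        gap = i - start
        # Only gaps with known values on both sides and short enough are filled
        if start > 0 and i < len(out) and gap <= max_gap:
            left, right = out[start - 1], out[i]
            for j in range(gap):
                out[start + j] = left + (right - left) * (j + 1) / (gap + 1)
    return out


def zscore(values):
    """Standardise non-missing values to zero mean and unit population standard deviation."""
    present = [v for v in values if v is not None]
    m, s = mean(present), pstdev(present)
    return [None if v is None else (v - m) / s for v in values]


def preprocess(hourly, max_gap=1):
    """Outlier masking, then short-gap interpolation, then Z-score, for one hourly variable."""
    lo, hi = tukey_fences([v for v in hourly if v is not None])
    # Assumption: flagged outliers become missing values and are filled like other gaps
    masked = [None if v is None or v < lo or v > hi else v for v in hourly]
    return zscore(interpolate_short_gaps(masked, max_gap))


# Hypothetical hourly VWC series with one spike (0.90) and one missing hour
hourly = [0.30, 0.31, 0.32, 0.90, 0.33, 0.34, None, 0.35, 0.36, 0.37]
z = preprocess(hourly)
```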
Comment 2: In Section 2.4.2. K-means Clustering, please correct the reference from “(Eq. S5)” to “(Eq. S6)” in the second line.
Response 2: The equation reference in Section 2.4.2 has been corrected from “(Eq. S5)” to “(Eq. S6)”.
Comment 3: The manuscript does not currently provide a justification for the choice of ARIMA and K-MLP models over more recent deep learning approaches such as LSTM or GRU. A brief explanation of the rationale behind choosing these models, along with a mention of more complex alternatives in the discussion or future work section, would strengthen the study.
Response 3: Numerous studies have explored a wide range of strategies for SM prediction, including physically based models, water balance approaches, and advanced machine learning techniques. LSTM networks have demonstrated strong performance in capturing long-term temporal dependencies across various soil profiles. However, LSTM typically requires large amounts of training data and considerable computational resources. When applied to small datasets, LSTM is prone to overfitting, which limits its generalizability and practical use in data-scarce environments. In contrast, the MLP- and ARIMA-based models presented here strike a more practical balance between accuracy, computational efficiency, and interpretability, particularly when historical observations are limited. This comparison highlights the robustness and feasibility of our modeling framework. The competitiveness of this trade-off is supported by the models' high predictive accuracy; for example, ARIMA achieved an R² of 0.9853 at the 20 cm depth.
Comment 4: The method for outlier detection is vague. Please provide additional details on how outliers were identified and removed. For example, were there specific statistical criteria, thresholds, or algorithms used in conjunction with the visual methods mentioned (time series plots and block diagrams)?
Response 4:
The revised description has been incorporated into the updated Section “2.3.2 Data Preprocessing”. To ensure data quality, preprocessing included outlier removal, missing-value filling, and standardization. Outliers were identified in the hourly aggregated data using Tukey’s fences: the first (Q1) and third (Q3) quartiles were computed for each variable, and any data point below Q1 - 1.5×IQR or above Q3 + 1.5×IQR, where IQR = Q3 - Q1 is the interquartile range, was flagged as an outlier. Time series plots and boxplots were also inspected visually to identify and interpret outliers. For missing values, linear interpolation was applied to gaps shorter than 2 hours. Longer gaps in the meteorological data were preferentially filled using observations from nearby meteorological stations; where such data were unavailable, the average diurnal variation method was used. Finally, to remove the influence of differing variable units and scales, all data were standardized using Z-scores.
Comment 5: The manuscript mentions that ARIMA achieved high accuracy at 20 cm, but it would be beneficial to include a theoretical explanation or interpretation of why this depth yielded better results. Expanding on this point would provide more context and insight into the findings.
Response 5: The ARIMA model’s autoregressive component captures the correlation between past and future observations, making it well suited for short-term SM forecasting. In this study, the optimized ARIMA models performed consistently across all soil depths. The best overall fit in terms of R² was observed at 20 cm (R² = 0.9853, RMSE = 0.015 m³·m⁻³), while the weakest performance occurred at 5 cm. The variation in accuracy across depths stems not only from lagged precipitation effects but also from inherent soil properties that affect infiltration rates and water retention. The shallow layer (5 cm) responds quickly to short-term meteorological fluctuations such as precipitation and evaporation, which introduces high-frequency noise from surface perturbations. Capturing this variability required a moving-average term of order six (q = 6), but fitting so many parameters on limited data increases uncertainty. The 20 cm layer lies between the dynamic surface layer and the more stable deeper layer. Here, an ARIMA(1,1,2) structure, combining one autoregressive term and two moving-average terms, balances the “memory” of past moisture against stochastic shocks. This configuration fits the model’s linear framework and yields strong autocorrelation, which explains its superior performance at this depth.
Author Response File: Author Response.pdf
Round 2
Reviewer 3 Report
Comments and Suggestions for Authors
My suggestions from the initial review have been thoroughly addressed by the author, leading to substantial improvements in the manuscript's quality. I now believe it meets the standards for publication in this journal.