Non-Linear Models for Assessing Soil Moisture Estimation

Li, Rui; Wang, Susu; Wu, Han; Dong, Hao; Kong, Dezhi; Li, Hanxue; Zhang, Dorothy S.; Chen, Haitao

doi:10.3390/horticulturae11050492

by Rui Li ¹,², Susu Wang ¹, Han Wu ³, Hao Dong ⁴, Dezhi Kong ¹, Hanxue Li ⁵, Dorothy S. Zhang ⁶ and Haitao Chen ¹,²,*

¹ College of Engineering, Northeast Agricultural University, Harbin 150030, China
² Heilongjiang Provincial Engineering Research Center for Mechanization and Materialization of Major Crops Production, Harbin 150030, China
³ School of Computer and Internet of Things Engineering, Chongqing Institute of Engineering, Chongqing 400900, China
⁴ School of Public Administration, Dongbei University of Finance and Economics, Dalian 116025, China
⁵ College of Mechanical and Electrical Information, Shangqiu University, Shangqiu 476000, China
⁶ Computing Technology and Information Systems Department, Guilford College, Greensboro, NC 27410, USA
* Author to whom correspondence should be addressed.

Horticulturae 2025, 11(5), 492; https://doi.org/10.3390/horticulturae11050492

Submission received: 27 March 2025 / Revised: 21 April 2025 / Accepted: 28 April 2025 / Published: 30 April 2025

(This article belongs to the Special Issue Applied Artificial Intelligence in Digital Horticulture: Practices and Innovations)

Abstract

Accurately estimating soil moisture (SM) without direct measurements poses significant challenges due to nonlinear interactions among meteorological variables and the lagged response of SM to precipitation. This study evaluates two approaches: the auto-regressive integrated moving average (ARIMA) model for one-day-ahead SM forecasting and a K-means clustering-based multilayer perceptron (K-MLP) for real-time SM estimation at depths of 5 cm, 20 cm, and 50 cm in the Changbai Mountain region. Although the K-MLP model outperformed the MLP model, achieving a maximum R² of 0.728, its estimation accuracy remains suboptimal. By contrast, the ARIMA model effectively leveraged SM persistence, achieving high accuracy in one-day-ahead forecasting. Specifically, the ARIMA (0, 1, 6), ARIMA (1, 1, 2), and ARIMA (2, 1, 1) models yield R² values of 0.9677, 0.9853, and 0.9684 and RMSE values of 0.02 m³·m⁻³, 0.015 m³·m⁻³, and 0.006 m³·m⁻³ at depths of 5 cm, 20 cm, and 50 cm, respectively. This study explores ARIMA’s robustness in short-term SM forecasting and its adaptability to dynamic meteorological conditions, offering potential applications in agricultural water management and ecological monitoring.

Keywords: soil moisture estimation; auto-regressive integrated moving average; multilayer perceptron; K-means clustering

1. Introduction

Soil moisture (SM) is a pivotal parameter governing soil properties, crop growth, ecosystem stability, and irrigation efficiency [1,2]. One-day-ahead SM forecasting plays a crucial role in agricultural irrigation management, flood risk assessment, and drought monitoring, particularly under rapidly changing meteorological conditions [3]. Reliable short-term forecasting can optimize irrigation schedules, minimize water waste, and enhance crop yield [4]. In flood-prone regions, sudden increases in soil moisture can trigger surface runoff, underscoring the importance of early warning systems for flash floods [5]. Similarly, in arid and semi-arid areas, real-time SM fluctuations affect drought monitoring and short-term mitigation strategies, helping to improve water-use efficiency by alleviating short-term soil moisture stress [6]. Because SM is highly dynamic, continuous monitoring is essential for capturing global trends [7]. Although SM exhibits persistence over multiple days, immediate predictive information remains critical for informed decision-making.

Traditional in-situ monitoring methods provide direct and accurate SM measurements but are costly and geographically limited, restricting large-scale deployment, especially in remote and mountainous regions [7]. Ground-based observations often suffer from sparse spatial coverage and temporal discontinuities, particularly in arid and semi-arid zones where SM variability is pronounced. These limitations necessitate alternative approaches for comprehensive SM assessments at regional or global scales.

Meteorological data offer a promising solution due to their wide availability, real-time accessibility, and compatibility with indirect SM estimation models. Variables such as precipitation, temperature, and solar radiation serve as proxies for SM dynamics, particularly when integrated into data-driven frameworks [8]. However, the nonlinear relationships between meteorological factors and SM introduce significant modeling challenges. For example, factors such as precipitation effects [9], the temperature–evapotranspiration relationship [10], and spatial heterogeneity [11] complicate the development of generalized models, requiring innovative approaches to disentangle these complexities.

Traditional regression models, such as linear and multiple regression, often struggle to capture these nonlinear trends, as they fail to account for the time-series characteristics [12]. Linear regression is too simplistic for complex environmental systems, while multiple linear regression suffers from weak correlations, unstable regression weights, and poor reproducibility. Although polynomial regression can accommodate some nonlinearities by incorporating higher-order terms, it is prone to overfitting, particularly with small datasets [13]. These limitations make traditional regression approaches inadequate for accurate SM forecasting.

Numerous prediction methods have been developed, including empirical equations, water-balance approaches, recession index methods, soil dynamics modeling [14], time-series analysis (TSA), sensor measurements [15,16], and artificial neural networks (ANNs) [17]. Among these, ANNs offer significant advantages for handling nonlinear relationships and complex data structures. Their multi-layered architecture enables them to learn high-level features from large datasets, capturing patterns that traditional regression models fail to recognize [18].

Various models have been employed for soil moisture prediction, including data-driven approaches such as ANNs, support vector machines (SVMs), and random forests, as well as statistical methods such as the autoregressive integrated moving average (ARIMA) model [19]. These models vary in their capacity to handle nonlinear relationships, capture temporal dependencies, and respond to spatial or climatic variability.

Among these approaches, ANNs offer notable advantages. They automatically adjust internal parameters based on data characteristics, eliminating the need for predefined assumptions about input–output relationships [20]. Due to their flexibility for handling time-series, image-based, and multidimensional data, ANNs have been widely adopted in agricultural applications [21], weather forecasting [22], and soil moisture estimation. To meet the diverse demands of SM modeling, different ANN architectures have been developed. For example, long short-term memory (LSTM) networks effectively capture temporal dependencies across heterogeneous soil layers [23], convolutional neural networks (CNNs) extract spatial features from remote sensing imagery [24], and transformer-based models support long-range sequence prediction under variable meteorological conditions [25]. The choice of architecture typically involves a balance between predictive accuracy, computational cost, and interpretability and should reflect both data availability and application-specific constraints.

Multilayer perceptron (MLP), a widely used ANN structure, effectively models nonlinear interactions between meteorological inputs and soil moisture. However, its performance tends to decline at deeper soil layers due to reduced responsiveness to surface conditions and greater influence from subsurface processes, such as groundwater dynamics [26]. To address this limitation, our study integrates a K-means clustering algorithm into the MLP framework. This preprocessing step reduces variability within input groups, improving the training efficiency and predictive performance of the model. Despite these enhancements, the K-MLP model still faces challenges [27]. In particular, soil moisture responds to precipitation with a time lag, and this delay is further complicated by soil infiltration properties and hydraulic conductivity [28,29].

The ARIMA (autoregressive integrated moving average) model is a well-established tool for time-series forecasting that addresses lag effects in the relationship between meteorological factors and SM [30]. Studies have shown a strong correlation between precipitation and SM, and the ARIMA model effectively captures both trends and periodic variations in time-series data [31]. It is well-suited for stationary time-series data or those that can be made stationary through differencing. With its interpretable parameters (p, d, q), ARIMA provides a structured approach to modeling temporal dependencies, making it highly effective for short-term SM forecasting [31,32].

TSA (time-series analysis) of SM reveals a high degree of autocorrelation across soil layers, which diminishes with increasing time lag, indicating that SM is strongly influenced by its prior values. Previous research confirms that SM exhibits significant autocorrelation, with a gradual decline over time [33]. TSA methods therefore offer advantages for simulating and predicting these time-dependent changes [34,35]. Although precipitation is a key driver of soil moisture variation, its effects are time-lagged and vary with soil depth. In this study, the ARIMA model is employed to leverage SM’s strong temporal persistence, enhancing the accuracy of short-term forecasting.

This study investigates sensor-free methods for estimating SM using two algorithmic approaches (K-MLP and ARIMA), providing cost-effective solutions for agricultural and ecological management. Utilizing meteorological data from Changbai Mountain, we developed a real-time K-MLP model (with no lead time) for SM estimation and an ARIMA model for one-day-ahead forecasting. We systematically explored their performance at soil depths of 5 cm, 20 cm, and 50 cm.

2. Materials and Methods

2.1. Model Design

The design process for the K-MLP and ARIMA models is shown in Figure 1. This study introduces the K-MLP model. In parallel, an ARIMA model is developed to address lag effects in SM forecasting. The predictive performance of both models is evaluated and compared to assess their accuracy.

2.2. Study Area

The SM data were collected from Changbai Mountain Station, located in Jilin Province, China (128°05′45″ E, 42°24′9″ N), at an elevation of 738 m above sea level. This region has a temperate continental climate with substantial monsoon influences and mid-latitude mountain weather characteristics [36]. This region experiences an annual average temperature of 3.6 °C and annual precipitation of 713 mm, with the majority of rainfall occurring from June to August. The area receives 6–7 h of sunshine daily, with an annual frost-free period of approximately 109 to 141 days (“The Chinese Academy of Sciences Research Station of Baekdu Mountain forest ecosystem”, 2019).

2.3. Dataset

2.3.1. Data Acquisition

The SM dataset used in this study was obtained from the National Ecological Science Data Center and spans the period from 2003 to 2010 at the Changbai Mountain Station [37]. SM was measured at depths of 5 cm, 20 cm, and 50 cm using CS616 frequency domain reflectometry (FDR) sensors (Campbell Scientific Inc., Logan, UT, USA), which recorded volumetric water content (VWC, m³·m⁻³) at hourly intervals. To ensure measurement reliability, a two-step calibration process was employed. Factory calibration was first conducted by the sensor manufacturer. Additionally, field calibration was periodically performed by comparing sensor readings with gravimetric soil moisture measurements obtained via the oven-drying method, following Chinese national standard GB/T 33705-2017 [38]. Meteorological variables were collected using a series of high-precision instruments. Air temperature and relative humidity were measured by HMP45C sensors (VAISALA, Vantaa, Finland), while wind speed was recorded by A100R anemometers (Vector Instruments, North Wales, UK). Precipitation was measured by a Model 52203 tipping-bucket rain gauge (RM YOUNG, Traverse City, MI, USA). Global solar radiation was monitored using a CM11 pyranometer (KIPP & ZONEN, Delft, The Netherlands). Soil temperature was measured by vertically embedded 105T thermistor probes (Campbell Scientific Inc., Logan, UT, USA) at five depths: 0 cm (surface), 5 cm, 20 cm, 50 cm, and 100 cm. A complete list of meteorological variables is provided in Table S1, with wind speed and direction summarized in Table S2.

2.3.2. Data Preprocessing

To ensure data quality, preprocessing steps included outlier removal, missing value filling, and standardization. Outliers were identified using the Tukey’s fences method, which flags values lying more than 1.5 times the interquartile range (IQR) below the first quartile (Q1) or above the third quartile (Q3). Specifically, hourly-aggregated data were used to compute Q1 and Q3 for each variable, and any data points below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR were flagged as outliers. Visual inspection using time-series plots and boxplots was conducted to identify and interpret outliers. For missing value treatment, linear interpolation was applied to data segments with gaps shorter than 2 h. For longer-term missing meteorological data, imputation was preferentially carried out using observations from nearby meteorological stations. If such data were unavailable, the missing values were filled using the average diurnal variation method. To eliminate the influence of variable dimensions, the data were uniformly standardized with a Z score [39]. The dataset was divided into a training set (2003–2008) and a validation set (2009–2010) to prevent temporal leakage. Additionally, 70% of the processed data were randomly selected for training and 30% for validation to ensure model robustness.
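The three preprocessing steps described above can be sketched in plain Python. This is an illustrative sketch, not the authors' code: the function names are hypothetical, quartiles are computed with the standard library's "inclusive" method (the paper does not state which quartile convention was used), a gap "shorter than 2 h" is read as at most one missing hourly sample, and the Z score uses the population standard deviation.

```python
from statistics import mean, pstdev, quantiles


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Return one boolean per value; True marks a point outside the fences."""
    lower, upper = tukey_fences(values, k)
    return [v < lower or v > upper for v in values]


def interpolate_short_gaps(series, max_gap=1):
    """Linearly fill runs of None of length <= max_gap that have a valid
    neighbour on both sides; longer runs are left untouched."""
    filled = list(series)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i  # index of the first valid value after the gap (or n)
        gap = end - start
        if start > 0 and end < n and gap <= max_gap:
            left, right = filled[start - 1], filled[end]
            for j in range(start, end):
                frac = (j - start + 1) / (gap + 1)
                filled[j] = left + frac * (right - left)
    return filled


def z_score(values):
    """Standardize to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]
```

For example, `flag_outliers([0.30, 0.31, 0.32, 0.33, 0.34, 0.95])` marks only the last value, and `interpolate_short_gaps([0.30, None, 0.34, None, None, 0.40])` fills the single-sample gap while leaving the two-sample gap for a different imputation method.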

2.4. Research Methods

2.4.1. Correlation Analysis

The Pearson correlation coefficient, calculated using Equation (S2), was applied to quantify the linear relationship between pairs of variables. The coefficient value ranges from −1 to 1, where a larger absolute value indicates a stronger relationship. Table S3 shows the corresponding strength of linearity for various correlation coefficient values [40].
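As a reminder of what the coefficient measures, the standard Pearson formula (covariance divided by the product of the standard deviations) can be written in a few lines. Equation (S2) is in the supplement and not reproduced here, so this sketch assumes it is the usual definition; the function name `pearson_r` is illustrative.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of centred cross-products and squares
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)
```

A perfectly increasing linear pair gives a value of 1 and a perfectly decreasing pair gives −1, which matches the −1 to 1 range stated above.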

2.4.2. K-Means Clustering

The K-means clustering algorithm partitions data into k clusters by calculating the Euclidean distance (Equation (S6)) between data points and cluster centroids. The flow of the algorithm is illustrated in Figure 2. The optimal number of clusters (k) was determined using the elbow method, which assesses the reduction in the sum of squared errors (SSE) (Equation (S7)). As k increases, the rate of decrease in SSE begins to flatten, with a significant reduction in SSE observed at k = 3. This indicates that further increasing k offers diminishing improvements in clustering performance. Based on the elbow method, we selected k = 3 as the optimal number of clusters and combined the three clusters with the MLP neural network, thereby enhancing the accuracy of soil moisture estimation.
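The clustering and elbow procedure can be illustrated with a small, dependency-free implementation of Lloyd's algorithm. This is a sketch under stated assumptions: the deterministic farthest-point initialization is chosen here only to make the example reproducible and is not described in the paper, and the function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Pick the first point, then repeatedly the point farthest from all
    centroids chosen so far (an assumption, not the authors' method)."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points,
                  key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points, k, max_iter=100):
    """Return (labels, centroids, sse) after Lloyd iterations."""
    centroids = farthest_point_init(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid moves to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    sse = sum(squared_distance(p, centroids[lab])
              for p, lab in zip(points, labels))
    return labels, centroids, sse


def elbow_curve(points, k_values):
    """SSE for each candidate k; the 'elbow' is where the drop flattens."""
    return {k: kmeans(points, k)[2] for k in k_values}
```

Calling `elbow_curve(points, range(1, 7))` on standardized predictors and plotting the result gives the kind of curve from which k = 3 was read off in the study.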

2.4.3. MLP

The MLP neural network is a type of feedforward network structure capable of mapping multiple inputs to corresponding outputs, as illustrated in Figure S2. The MLP employs a backpropagation algorithm, iteratively adjusting weights to minimize the error between predicted and actual outputs until the error converges below a pre-specified threshold.
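To make the backpropagation loop concrete, the sketch below trains a one-hidden-layer network with full-batch gradient descent on a mean-squared-error loss and stops once the loss falls below a threshold. The architecture, activation (tanh), learning rate, and stopping threshold are all illustrative assumptions; the paper does not report the configuration of its MLP.

```python
import math
import random


def init_mlp(n_in, n_hidden, seed=0):
    """Small random weights; a single linear output unit."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(params, x):
    """Return (hidden activations, scalar output) for one input vector."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(params["w1"], params["b1"])]
    output = sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"]
    return hidden, output


def mse(params, X, y):
    return sum((forward(params, x)[1] - t) ** 2 for x, t in zip(X, y)) / len(X)


def train_step(params, X, y, lr=0.05):
    """One full-batch gradient-descent update via backpropagation."""
    n, n_hidden, n_in = len(X), len(params["w2"]), len(params["w1"][0])
    g_w1 = [[0.0] * n_in for _ in range(n_hidden)]
    g_b1 = [0.0] * n_hidden
    g_w2 = [0.0] * n_hidden
    g_b2 = 0.0
    for x, t in zip(X, y):
        hidden, out = forward(params, x)
        d_out = 2.0 * (out - t) / n  # derivative of the MSE w.r.t. the output
        g_b2 += d_out
        for j in range(n_hidden):
            g_w2[j] += d_out * hidden[j]
            # Chain rule through tanh: d tanh(u)/du = 1 - tanh(u)^2
            d_pre = d_out * params["w2"][j] * (1.0 - hidden[j] ** 2)
            g_b1[j] += d_pre
            for i in range(n_in):
                g_w1[j][i] += d_pre * x[i]
    # Apply the update only after all gradients have been accumulated
    params["b2"] -= lr * g_b2
    for j in range(n_hidden):
        params["w2"][j] -= lr * g_w2[j]
        params["b1"][j] -= lr * g_b1[j]
        for i in range(n_in):
            params["w1"][j][i] -= lr * g_w1[j][i]


def train(params, X, y, lr=0.05, tol=1e-4, max_epochs=2000):
    """Iterate until the loss drops below tol or max_epochs is reached."""
    for _ in range(max_epochs):
        if mse(params, X, y) < tol:
            break
        train_step(params, X, y, lr)
    return mse(params, X, y)
```

In practice a library implementation with mini-batches and an adaptive optimizer would replace this loop; the sketch only shows where the error signal enters and how it propagates back to the first-layer weights.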

2.4.4. ARIMA

The ARIMA model is commonly used for its efficacy in forecasting time-series data with dynamic, random patterns. ARIMA models combine autoregressive (AR), integrated (I), and moving average (MA) components, denoted as ARIMA (p, d, q). The ARIMA structure is illustrated in Figure S3, while Equation (S8) presents the general ARIMA formula [31]. To predict SM at various depths, a systematic approach was employed to determine the optimal ARIMA parameters: First, the stationarity of each SM time series was evaluated using the augmented Dickey–Fuller (ADF) test. For non-stationary series (p > 0.01), first-order differencing (d = 1) was applied. If stationarity was not achieved, higher-order differencing was considered. Next, a grid search was performed to identify the optimal combinations of p and q, guided by Akaike’s information criterion (AIC) and the Bayesian information criterion (BIC). The model with the lowest AIC and BIC values was selected for each soil depth.
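Two pieces of this procedure are simple enough to show directly: differencing a series d times, and ranking candidate (p, q) orders by information criteria. The sketch below uses the textbook definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L. The parameter count k = p + q + 1 (the extra one for the innovation variance) and the rule "lowest AIC, with BIC as tie-breaker" are assumptions made for illustration, since the paper does not say how a disagreement between AIC and BIC was resolved. Fitting the models and running the ADF test would normally be delegated to a statistics library and are not shown.

```python
import math


def difference(series, d=1):
    """Apply first-order differencing d times: y'_t = y_t - y_{t-1}."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def undifference_forecast(last_observed, predicted_change):
    """For d = 1, a forecast of the differenced series is added back
    to the last observed level to obtain the one-step-ahead value."""
    return last_observed + predicted_change


def aic(log_likelihood, k):
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    return k * math.log(n) - 2 * log_likelihood


def select_order(candidates, n):
    """candidates maps (p, q) -> maximized log-likelihood of that fit.
    Returns ((p, q), aic, bic) for the best-ranked candidate."""
    scored = []
    for (p, q), log_lik in candidates.items():
        k = p + q + 1  # assumed parameter count, see lead-in
        scored.append(((p, q), aic(log_lik, k), bic(log_lik, k, n)))
    return min(scored, key=lambda row: (row[1], row[2]))
```

With made-up log-likelihoods such as `{(1, 1): -100.0, (2, 2): -99.5}` and n = 200, the smaller model wins because the gain in likelihood does not offset the penalty for two extra parameters.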

2.5. Evaluation Metrics

The predictive accuracy of each model was assessed using several statistical metrics: the R², RMSE, MAE, and RRMSE. Calculations are detailed in Equations (S9)–(S12).
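Since Equations (S9)–(S12) are only available in the supplement, the sketch below uses the common textbook forms of these four metrics. In particular, RRMSE is taken here as RMSE divided by the mean of the observations, which is one of several conventions and may differ from the definition used in the study.

```python
from math import sqrt


def regression_metrics(observed, predicted):
    """Return R2, RMSE, MAE and RRMSE for paired observations/predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    rmse = sqrt(ss_res / n)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": rmse,
        "MAE": sum(abs(r) for r in residuals) / n,
        "RRMSE": rmse / mean_obs,  # assumed normalization, see lead-in
    }
```

RMSE and MAE carry the units of SM (m³·m⁻³), while R² and RRMSE are dimensionless, which is why the latter two are convenient for comparing depths with different mean moisture levels.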

3. Results

3.1. Data Processing

Figure 3a displays the sequence diagram of SM and precipitation at the Changbai Mountain Station for soil depths of 5, 20, and 50 cm. Figure 3b provides the corresponding box plots of SM values at each depth. The integrated SM dataset initially included several standard missing value codes of −99999, representing placeholders defined by the Changbai Mountain meteorological station to indicate sensor malfunctions.

These gaps, both random and continuous, were determined to have minimal impact on the overall analysis and were removed directly. Missing values for outliers were filled with the mean. Figure 3a highlights the presence of numerous outliers at the 5 cm and 20 cm depths, especially following significant precipitation events. However, since SM values typically range between 0 and 0.7 m³·m⁻³, and given the context of these outliers, they were deemed reasonable and retained. After data cleaning, the dataset comprised 2717 samples. The 23 predictors for SM at Changbai Mountain Station span eight orders of magnitude; Z-score normalization was therefore applied to make the variables comparable. Table S4 summarizes the maximum, minimum, mean, and standard deviation of each SM variable.
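The handling of the −99999 placeholder and the plausibility check on SM values can be sketched as follows. The record layout (tuples of numeric fields) and the function names are hypothetical; only the sentinel value and the 0 to 0.7 m³·m⁻³ range come from the text.

```python
MISSING_CODE = -99999  # placeholder used by the station for sensor malfunctions


def mask_missing(values, missing_code=MISSING_CODE):
    """Replace the sentinel with None so it cannot distort statistics."""
    return [None if v == missing_code else v for v in values]


def drop_incomplete_rows(rows):
    """Keep only records in which every field is present."""
    return [row for row in rows if all(v is not None for v in row)]


def within_physical_range(sm_value, low=0.0, high=0.7):
    """True if a volumetric water content lies in the typical range."""
    return low <= sm_value <= high
```

Masking before computing quartiles or means matters because a single −99999 entry would otherwise dominate any summary statistic of a variable whose values are below 1.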

3.2. Analysis of Factors Influencing SM

The relationships between SM at depths of 5, 20, and 50 cm and various predictors were examined using Pearson correlation coefficients, presented in Table S5. Overall, the correlations between predictors and SM were moderate, with minimum air temperature (Ta) emerging as the strongest predictor. Specifically, the correlation coefficients for minimum Ta with SM were 0.415 at 5 cm, 0.616 at 20 cm, and 0.555 at 50 cm. These values indicate a significant correlation between minimum Ta and SM at 20 and 50 cm, though the relationship is weaker at 5 cm. While air pressure was negatively correlated with SM, other variables, including mean air temperature, maximum air temperature, mean surface temperature, minimum surface temperature, 0 cm ST, and 5 cm ST displayed significant positive correlations with SM at 20 and 50 cm but weaker correlations at 5 cm.

3.3. SM Estimation Based on MLP Model and K-MLP Model

The SM dataset was utilized to train an MLP neural network. As shown in Figure S5, the model identified the maximum surface air temperature at 2 m as the most influential predictor, followed by soil temperature (ST) at the first (0 cm), second (5 cm), and fifth (100 cm) layers. Other significant variables included average surface temperature, barometric pressure metrics, month, and evapotranspiration, while variables such as precipitation, wind speed, and relative humidity contributed less. Model evaluation was conducted using a randomly selected set of 1000 samples, with observed versus predicted SM values shown in Figure 4a–c. These graphs indicate discrepancies between model predictions and actual SM values, highlighting areas where model performance could be improved.

The K-MLP model enhances SM estimation by incorporating data clustering in preprocessing to identify underlying patterns, which improves the dataset’s suitability for machine-learning algorithms. The optimal number of clusters was determined to be three, as shown in Figure 2.
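One plausible reading of how clustering and the MLP are combined is that each sample is routed to its nearest cluster centroid and then passed to a model trained only on that cluster. The sketch below shows that routing logic. It is an assumption about the architecture rather than a description of the authors' implementation, and the per-cluster models are stand-in callables with placeholder return values, not trained networks.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(x, centroids):
    """Index of the centroid closest to x in Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(x, centroids[j]))


class ClusterRoutedRegressor:
    """Dispatch each input to the model that belongs to its cluster."""

    def __init__(self, centroids, models):
        if len(centroids) != len(models):
            raise ValueError("need exactly one model per centroid")
        self.centroids = centroids
        self.models = models

    def predict(self, x):
        cluster = nearest_centroid(x, self.centroids)
        return self.models[cluster](x)


# Hypothetical centroids in standardized predictor space and placeholder
# per-cluster predictors (each would be a trained MLP in a real pipeline).
example_centroids = [(-1.0, -1.0), (0.0, 0.0), (1.0, 1.0)]
example_models = [lambda x: 0.25, lambda x: 0.35, lambda x: 0.45]
router = ClusterRoutedRegressor(example_centroids, example_models)
```

The design keeps clustering and regression decoupled: the centroids are fixed after K-means, so prediction costs one nearest-centroid lookup plus one forward pass through a single small model.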

3.4. SM Forecasting Using TSA

The ARIMA model was applied to predict SM at depths of 5 cm, 20 cm, and 50 cm. After training, optimal configurations were identified as ARIMA (0, 1, 6) for 5 cm, ARIMA (1, 1, 2) for 20 cm, and ARIMA (2, 1, 1) for 50 cm. A sample of 1000 data points was used to compare observed and predicted daily SM values, as illustrated in Figure 5. The predicted SM values closely aligned with observed data across all depths, indicating a high degree of model accuracy. The R² values for each depth—0.9677 for 5 cm, 0.9853 for 20 cm, and 0.9684 for 50 cm—demonstrate ARIMA’s strong predictive capabilities, with R² exceeding 0.96 at each level. Forecasting accuracy was highest at 20 cm, followed by 50 cm and 5 cm, suggesting that SM at 20 cm is less susceptible to rapid climatic shifts compared to shallower depths, leading to more stable and predictable trends.

Figure 6a–f present the fitting plots of observed vs. predicted SM values using both uncorrected and corrected ARIMA models across all three depths. At 5 cm depth, the uncorrected model achieved an R² of 0.9164 and an RMSE of 0.0321 m³·m⁻³; post-correction, R² improved to 0.9677, and RMSE decreased to 0.0198 m³·m⁻³. For 20 cm, correction raised R² from 0.9604 to 0.9853 and reduced RMSE from 0.0242 m³·m⁻³ to 0.0146 m³·m⁻³. At 50 cm, the uncorrected model yielded the lowest R² at 0.8978, which significantly improved to 0.9684 post-correction.

3.5. Comparison of SM Forecasting Models

The performance of the ARIMA model was compared with that of the K-MLP model for predicting SM dynamics using meteorological data, as shown in Figure 7. The K-MLP model achieved a maximum R² of 0.728, which was lower than that of the ARIMA model. This lower R² is primarily due to the delayed response of SM to precipitation, where rainfall effects are influenced by soil infiltration rates [27]. The ARIMA model, by contrast, yielded higher forecasting accuracy, achieving a maximum R² of 0.985 and demonstrating a superior fit. The ARIMA model consistently produced lower error values across all evaluation metrics (RMSE, MAE, and RRMSE), highlighting its stronger predictive capability.

4. Discussion

Observations show that the standard deviation of ST decreases with increasing depth, while SM at 20 cm exhibits the lowest standard deviation [41]. Conversely, the mean SM value increases with increasing depth. SM at 50 cm remains relatively stable, typically ranging between 0.30 and 0.45 m³·m⁻³, which reflects the soil’s water-holding capacity in the study area. Following heavy rainfall, SM at 5 cm and 20 cm increases sharply, while deeper layers at 50 cm respond more slowly. During dry periods, deeper soil layers retain more moisture than surface layers [42].

Monthly SM trends for four selected years (2003, 2005, 2007, and 2008) in Figure S4 demonstrate that SM at 50 cm varies slowly, with levels peaking between April and August. At shallower depths, SM peaks in May, declines, and then rises again in July, reflecting rainfall variability during the summer months. The MLP model’s performance varied with depth, as demonstrated by model evaluation using 1000 randomly selected samples in Figure 4a–c. Key predictors—including surface air temperature, ST, barometric pressure, month, and evapotranspiration—were strongly associated with SM, while precipitation, wind speed, and relative humidity had weaker effects [43].

The K-MLP model, which incorporates K-means clustering into the MLP framework, yielded improved predictive performance compared to the original MLP [44]. Figure S6 displays the relative importance of independent variables in each of the three clustered MLP models. In Cluster 1, four STs emerged as the most influential variables, followed by evapotranspiration and month. In Cluster 2, ST and air temperature were dominant. In Cluster 3, these variables remained relevant but had diminished importance, indicating a weaker influence on SM dynamics in that group. In Figure 4d–f, the K-MLP model achieved the highest predictive accuracy at 20 cm (R² = 0.728) and the lowest at 5 cm (R² = 0.608).

The relatively low accuracy at 5 cm (R² = 0.608) is attributed to the rapid response of surface soil to precipitation. Near-surface layers (0–10 cm) have higher hydraulic conductivity, making them more sensitive to rainfall and evaporation [45]. Following precipitation, moisture at 5 cm increases quickly due to shallow depth, but is rapidly lost through surface evaporation and lateral drainage. This results in short-term variability and instability, which complicates prediction. By contrast, soil moisture variations in deeper layers exhibit a clear lag effect [46]. At 20 cm, the predictive accuracy of the K-MLP model improved substantially (R² = 0.728), primarily due to the slower rate of moisture movement at greater depths and the reduced gradient of soil matric potential. At 50 cm, the soil is further buffered by slower infiltration, resulting in minimal variation. This aligns with the K-MLP model’s low sensitivity to deep soil moisture dynamics at 50 cm (Figure S6, Cluster 3). The RMSE values support this depth-based pattern: the smallest error was recorded at 50 cm (RMSE = 0.006 m³·m⁻³), followed by 20 cm (RMSE = 0.015 m³·m⁻³), and the highest at 5 cm (RMSE = 0.02 m³·m⁻³), confirming that model performance improves with depth.

Numerous studies have explored a wide range of strategies for SM prediction, including physically based models, water-balance approaches, and advanced machine-learning techniques. LSTM networks have demonstrated strong performance in capturing long-term temporal dependencies across various soil profiles [47]. However, LSTM typically demands large amounts of training data and considerable computational resources. When applied to small datasets, LSTM is prone to overfitting, which limits its generalizability and practical use in data-scarce environments [48]. By contrast, the models we present here based on MLP and ARIMA strike a more practical balance between accuracy, computational efficiency, and model interpretability, particularly in scenarios with limited historical observational data. This comparison highlights the robustness and feasibility of our modeling framework. The competitiveness of this trade-off is supported by the model’s high predictive accuracy: ARIMA achieved an R² of 0.9853 at the 20 cm depth.

The ARIMA model’s autoregressive component effectively captures the correlations between past and future observations, making it well-suited for short-term SM forecasting [4]. In this study, the optimized ARIMA models demonstrated consistent predictive performance across all soil depths. The best overall model fit was observed at the 20 cm depth (R² = 0.9853, RMSE = 0.015 m³·m⁻³), while the weakest performance occurred at 5 cm. The variation in model accuracy across depths stems not only from lagged precipitation effects but also from inherent soil properties that affect infiltration rates and water retention [49]. The shallow layer (5 cm) responds quickly to short-term meteorological fluctuations such as precipitation and evaporation, introducing high-frequency noise from surface perturbations [50]. Capturing this variability required a moving-average term of order six (q = 6), but fitting so many parameters on limited data raises uncertainty. The 20 cm layer lies between the dynamic surface layer and the more stable deeper layer. Here, an ARIMA (1,1,2) structure—combining one autoregressive term and two moving-average terms—balances the “memory” of past moisture and stochastic shocks [51]. This configuration aligns with the model’s linear framework and yields strong autocorrelation, explaining its superior performance at this depth.

5. Conclusions

This study examined SM at the Changbai Mountain Station, utilizing a robust SM dataset enhanced through preprocessing techniques such as rejection, interpolation, and normalization. Pearson correlation analysis assessed the relationship between SM and predictors. Predictive models were developed using the K-MLP model and ARIMA model, yielding the following key insights:

(1): SM distribution: deeper soil layers have higher mean SM values and less variability compared to shallower layers. Seasonal rainfall patterns lead to two peaks in SM at the 5 cm and 20 cm depths, typically around May and July.
(2): Correlation with predictors: SM exhibits positive correlations with air temperature, relative humidity, evaporation, surface temperature, and ST and a negative correlation with air pressure. The strongest correlation between SM and predictors occurs at 20 cm depth, while the weakest is at 5 cm.
(3): The K-MLP model outperformed the MLP model, achieving a maximum R² of 0.728. The ARIMA model achieved high accuracy with R² values exceeding 0.96, particularly at 20 cm.

In conclusion, the ARIMA model proved to be a more robust tool for short-term SM forecasting.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/horticulturae11050492/s1, Figure S1: Flowchart of K-means clustering algorithm; Figure S2: MLP neural network architecture; Figure S3: Flowchart of ARIMA model; Figure S4: Monthly average variation curves of ST in 2003 (a), 2005 (b), 2007 (c), and 2008 (d); Figure S5: Importance chart of MLP model independent variables for predicting SM; Figure S6: The Elbow method; Figure S7: Importance diagram of independent variables of MLP for the first cluster (a), for the second cluster (b), and the third cluster (c); Table S1: Brief introduction of meteorological variables; Table S2: The encoded information of wind speed and direction; Table S3: The correspondence between the absolute value of the correlation coefficient and the degree of a linear relationship; Table S4: The statistics of the variables in the Changbai Mountain dataset; Table S5: Correlation coefficients between prediction factor and SM at different depths.

Author Contributions

Conceptualization, R.L. and S.W.; methodology, R.L.; software, H.W. and H.D.; validation, R.L., S.W. and H.C.; formal analysis, R.L.; investigation, D.K.; resources, H.L.; data curation, H.C.; writing—original draft preparation, R.L.; writing—review and editing, S.W. and D.S.Z.; visualization, S.W.; supervision, H.C.; project administration, R.L.; funding acquisition, R.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by Heilongjiang Province “Double First Class” Discipline Collaborative Innovation Achievement Project (LJGXCG2023-066) and (LJGXCG2024-F03).

Data Availability Statement

The original contributions presented in this study are included in the article/supplementary material. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

Ta	Air temperature
GEP	Gene expression programming
GMDH	Group method of data handling
GRNN	Generalized regression neural network
K-MLP	Integration of K-means clustering and multilayer perceptron model
MAE	Mean absolute error
ML	Machine learning
MLP	Multilayer perceptron
NLR	Nonlinear regression
PCA	Principal component analysis
R²	Coefficient of determination
RBNN	Radial basis functions neural network
RMSE	Root mean square error
RRMSE	Relative root mean square error
Sd	Standard deviation
SM	Soil moisture
SMR	Stepwise multiple regression
SSE	Sum of squares of errors
ST	Soil temperature
SVM	Support vector machine
TSA	Time-series analysis

Figure 1. Design flowchart of K-MLP and ARIMA models.

Figure 2. The elbow method.

Figure 3. Sequence diagram of SM and precipitation at Changbai Mountain Station (a) and box plot of SM at various depths (b).

Figure 4. SM estimation using an MLP model at depths of 5 cm (a), 20 cm (b), and 50 cm (c) and a K-MLP model at depths of 5 cm (d), 20 cm (e), and 50 cm (f).

Figure 5. Time-series model forecasting data for SM at depths of (a) 5 cm, (b) 20 cm, and (c) 50 cm.

Figure 6. Comparison of observed and predicted SM values using both uncorrected and corrected ARIMA models for depths of 5 cm (a,b), 20 cm (c,d), and 50 cm (e,f).

Figure 7. Evaluation indicators for SM forecasting models: R² (a), RMSE (b), MAE (c), and RRMSE (d) for MLP, K-MLP, and ARIMA models.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).