A Predictive Method for Greenhouse Soil Pore Water Electrical Conductivity Based on Multi-Model Fusion and Variable Weight Combination

Zhao, Jiawei; Tian, Peng; Sun, Jihong; Wang, Xinrui; Deng, Changjun; Yang, Yunlei; Zhang, Haokai; Qian, Ye

doi:10.3390/agronomy15051180


by Jiawei Zhao ¹, Peng Tian ¹,², Jihong Sun ¹,², Xinrui Wang ¹,², Changjun Deng ³, Yunlei Yang ³, Haokai Zhang ⁴ and Ye Qian ¹,²,*

¹ College of Big Data, Yunnan Agricultural University, Kunming 650201, China

² The Key Laboratory for Crop Production and Smart Agriculture of Yunnan Province, Yunnan Agricultural University, Kunming 650201, China

³ Yunnan Hanzhe Science & Technology Co., Ltd., Kunming 650101, China

⁴ Engineering College, China Agricultural University, Beijing 100083, China

* Author to whom correspondence should be addressed.

Agronomy 2025, 15(5), 1180; https://doi.org/10.3390/agronomy15051180

Submission received: 5 April 2025 / Revised: 5 May 2025 / Accepted: 10 May 2025 / Published: 13 May 2025

(This article belongs to the Section Water Use and Irrigation)


Abstract:

Soil pore water electrical conductivity (EC), as a comprehensive indicator of soil nutrient status, is closely linked to crop growth and development. Accurate prediction of pore water EC is therefore essential for informed and scientific crop management. This study focuses on a greenhouse rose cultivation site in Jiangchuan District, Yuxi City, Yunnan Province, China. Leveraging multi-parameter sensors deployed within the facility, we collected continuous soil data (temperature, moisture, EC, and pore water EC) and meteorological data (air temperature, humidity, and vapor pressure deficit) from January to December of 2024. We propose a hybrid prediction model—PSO–CNN–LSTM–BOA–XGBoost (PCLBX)—that integrates a particle swarm optimization (PSO)-enhanced convolutional LSTM (CNN–LSTM) with a Bayesian optimization algorithm-tuned XGBoost (BOA–XGBoost). The model utilizes highly correlated environmental variables to forecast soil pore water EC. The experimental results demonstrate that the PCLBX model achieves a mean square error (MSE) of 0.0016, a mean absolute error (MAE) of 0.0288, and a coefficient of determination (R²) of 0.9778. Compared to the CNN–LSTM model, MSE and MAE are reduced by 0.0001 and 0.0014, respectively, with an R² increase of 0.0015. Against the BOA–XGBoost model, PCLBX yields a reduction of 0.0006 in MSE and 0.0061 in MAE, alongside a 0.0077 improvement in R². Furthermore, relative to an equal-weight ensemble of CNN–LSTM and BOA–XGBoost, the PCLBX model shows improved performance, with MSE and MAE decreased by 0.0001 and 0.0005, respectively, and R² increased by 0.0007. These results underscore the superior predictive capability of the PCLBX model over individual and ensemble baselines. By enhancing the accuracy and robustness of soil pore water EC prediction, this model contributes to a deeper understanding of soil physicochemical dynamics and offers a scalable tool for intelligent perception and forecasting. 
Importantly, it provides agricultural researchers and greenhouse managers with a deployable and generalizable framework for digital, precise, and intelligent management of soil water and nutrients in protected horticulture systems.

Keywords:

LSTM; XGBoost; PSO; PCLBX model; soil pore water EC prediction

1. Introduction

Soil health is a complex and multidimensional indicator that cannot be directly measured using a single approach. Instead, it requires a comprehensive assessment of various structural attributes, including soil temperature, moisture, pH, and electrical conductivity (EC). These properties are not only critical to the intrinsic health of the soil but also play a pivotal role in explaining the intricate processes that ultimately lead to biodiversity loss and soil and water degradation [1,2,3]. Located in southwestern China, Yunnan Province benefits from abundant solar radiation and favorable climatic conditions, making it a key hub for the national floriculture industry [4]. Within Yunnan, Yuxi City stands out as a core area for protected floriculture, characterized by widespread greenhouse cultivation of high-value ornamental crops such as roses. However, the intensive irrigation and fertilization practices commonly employed in greenhouses often result in the accumulation of soluble salts within the root zone, impairing root function and plant growth. Soil electrical conductivity (EC), as a proxy for ionic concentration in soil solution, reflects salt accumulation and nutrient availability. It is therefore a crucial indicator for assessing soil salinization, nutrient leaching, and the impact of irrigation water quality [5]. Without timely monitoring and forecasting of soil EC dynamics, the risks of growth inhibition and yield loss are significantly increased [6]. Consequently, there is an urgent need among agricultural producers and greenhouse managers for efficient EC prediction tools to enable precision irrigation and fertilization, improve water and nutrient use efficiency, and mitigate salinity-related risks.

Soil EC, as a key electrochemical property, provides critical insights into soil salinity, nutrient content, and moisture levels, making it a subject of increasing interest in recent research. Studies have shown that soil EC is positively correlated with salt content, serving as a comprehensive indicator of soil nutrient status [7]. Among the various EC measurements, pore water EC offers a more direct representation of the concentration of water-soluble salts in the soil, playing a crucial role in plant water and nutrient uptake. Consequently, it exerts a significant influence on crop growth, yield, and quality. Maintaining an optimal pore water EC is therefore essential for ensuring healthy crop development. Accurate prediction of pore water EC dynamics is of great significance for advancing precision agriculture, optimizing irrigation strategies, and enhancing both crop productivity and crop quality [8,9].

The rapid advancement of machine learning and deep learning techniques has facilitated the development of data-driven soil-moisture prediction models. Hosseinpour-Zarnaq et al. [10] employed a convolutional neural network (CNN) model to predict soil properties, comparing its performance with partial least squares (PLS) regression and first-order derivative preprocessing. Their findings demonstrated the feasibility of deep learning-based models in soil property assessment. Liang et al. [11] utilized the LightGBM model for soil nutrient status evaluation, achieving high reliability. Their approach not only accurately monitored soil nutrient levels but also provided insights into the correlation and significance of nutrient factors. Taşan et al. [12] conducted a correlation analysis to identify key soil variables (EC, pH, Na, K, N, etc.) and applied multiple machine learning models (ANN, DNN, RF, KNN, and AB) for comparative analysis in heavy metal content prediction in paddy soils. Sankalp et al. [13] employed a simple recurrent neural network (SimpleRNN) and a gated recurrent unit (GRU) to predict surface soil–moisture dynamics. Their results indicated that the GRU model outperformed SimpleRNN, demonstrating robust performance even with a reduced number of training parameters, thereby enabling efficient data training with minimal memory consumption. These studies collectively demonstrate that machine learning and deep learning algorithms have been widely applied in soil-moisture prediction tasks. The diverse strengths of different models—in feature-extraction capability, interpretability, and computational efficiency—offer robust methodological support for developing more accurate and resource-efficient models for agricultural environmental monitoring.

The existing studies have predominantly focused on the prediction of soil nutrients and moisture content, while relatively limited attention has been given to forecasting soil electrical conductivity (EC) in greenhouse environments. In the field of soil EC prediction, Samadianfard et al. [14] developed a hybrid machine learning model combining a multilayer perceptron (MLP) and the grey wolf optimizer (GWO) to predict soil EC, facilitating the monitoring of crop health, agricultural productivity, and adverse effects in farm management. Gao et al. [15] proposed a deep bidirectional long short-term memory (Bid–LSTM) network to predict soil EC and soil moisture (SM) in citrus orchards, providing insights for irrigation and fertilization strategies. Feng et al. [16] utilized unmanned aerial vehicles (UAVs) to monitor cotton growth, measuring and calibrating apparent field soil EC, and further quantifying soil texture and water-holding capacity based on soil characteristics. Additionally, convolutional neural networks (CNNs) were employed to analyze soil properties at different depths along with historical meteorological data, while a gated recurrent unit (GRU) model was used to predict cotton yield, exploring the ways in which UAV imagery and deep learning techniques can quantify the impacts of soil texture and climatic conditions on crop production. Although notable progress has been made in soil EC prediction, most existing studies have focused on open-field cultivation or surface soil layers. Limited attention has been directed toward the root-zone EC dynamics under greenhouse conditions, which constrains their applicability in precision management within protected agriculture systems.

In the domain of optimization algorithms, Fu et al. [17] employed machine learning techniques to predict toxic heavy metal concentrations in soil, proposing a whale optimization algorithm (WOA)-based spectral feature-extraction method. A comparative analysis with genetic algorithms (GA) and particle swarm optimization (PSO) demonstrated that WOA was the most effective approach for spectral feature selection. Rahman et al. [18] investigated the performance of PSO and GA, introducing a hybrid PSO–GA technique to optimize a soil classification-based crop planting decision tree model. Gao et al. [19] developed a convolutional neural network regression model (CNNR–IEGA) optimized by genetic algorithms, effectively integrating environmental data and chlorophyll fluorescence parameters to predict photosynthetic rate, significantly enhancing prediction accuracy and advancing precision agriculture. To address the challenges of large sample requirements, hyperparameter tuning, and limited interpretability in machine learning models, recent studies have shown that the extreme gradient boosting (XGBoost) algorithm offers high predictive accuracy and strong generalization capacity when applied to high-dimensional and nonlinear agricultural environmental data. However, its limitations include relatively weak performance in modeling time-series data, reliance on empirical or manual tuning of hyperparameters, and challenges in model interpretability [20,21]. Various optimization algorithms have demonstrated distinct advantages in agricultural data modeling, contributing to enhanced predictive performance and adaptability. The emerging research trends increasingly favor the integration of multiple optimization strategies to overcome the limitations of individual algorithms and to improve model robustness and practical utility.

Previous research has primarily focused on constructing and optimizing individual predictive models, yet their predictive accuracy remains to be further enhanced. To develop a more robust forecasting framework, researchers have explored ensemble models that integrate multiple individual models, leveraging their respective strengths in feature extraction and predictive performance. These ensemble approaches have demonstrated significant improvements in forecasting accuracy, particularly in addressing time-series prediction challenges. In the domain of ensemble modeling, prior studies have investigated various hybrid architectures. For instance, Yu et al. [22] combined convolutional neural networks (CNN) with gated recurrent units (GRU) to predict soil moisture in the root zone of maize, demonstrating that the CNN–GRU hybrid outperformed standalone CNN and GRU models in both predictive accuracy and convergence speed. Zhang et al. [23] introduced an EMD–LSTM model that integrates empirical mode decomposition (EMD) with long short-term memory (LSTM) networks, significantly enhancing the accuracy and sustainability of reagent-free water quality detection. Similarly, Alshingiti et al. [24] validated the computational efficiency and complexity-handling advantages of a CNN–LSTM hybrid model. Bagheri et al. [25] proposed a hybrid prediction framework that integrates data-driven machine learning with hydrological models, employing uncertainty quantification (UQ) and physics-informed neural networks (PINN). This approach effectively incorporates residual learning with physical constraints, markedly improving soil-moisture simulation accuracy. Additionally, Wu et al. [26] developed a PSO–LSTM hybrid model by integrating particle swarm optimization (PSO) with LSTM networks to predict soil moisture at varying depths in citrus orchards. The incorporation of PSO enhanced the predictive accuracy of the LSTM model. 
These studies collectively underscore the potential of hybrid modeling approaches in advancing predictive performance and addressing the limitations of individual models in time-series forecasting. Ensemble models, by integrating the strengths of multiple algorithms, have demonstrated superior accuracy and stability in handling complex agricultural time-series data. However, most ensembles adopt static weighting schemes that overlook the dynamic performance variations of constituent models over time, lacking a flexible and adaptive integration mechanism. The emerging research suggests that coupling ensemble models with optimization algorithms represents a promising direction for enhancing both predictive performance and practical applicability.

This study focuses on a rose greenhouse located in Jiangchuan District, Yuxi City, Yunnan Province, China, a representative region for protected floriculture with a solid foundation of available data. Utilizing soil and meteorological data collected throughout 2024, we developed a data-driven, multi-model ensemble prediction framework, the PCLBX model. The predictive performance of an ensemble depends on how effectively its constituent models are integrated. Particle swarm optimization (PSO), known for its fast convergence and minimal parameter requirements, is a widely adopted metaheuristic for neural network training and parameter tuning [27,28]. The PCLBX model leverages CNN–LSTM to extract deep temporal features, while the BOA–XGBoost component enhances nonlinear modeling capabilities. PSO is employed to dynamically assign time-dependent weights, enabling an adaptive integration of the CNN–LSTM and BOA–XGBoost sub-models. This architecture aims to achieve high-precision prediction of soil pore water electrical conductivity (EC). The principal contributions of this study are as follows:

(1): It addresses a practical challenge in protected agriculture by enabling precise forecasting of soil pore water EC;
(2): It develops a hybrid model that integrates deep learning and machine learning algorithms with adaptive optimization capability, achieving a balance between predictive accuracy and generalization performance;
(3): It integrates feature extraction, hyperparameter optimization, and dynamic ensemble control into a unified framework, offering both theoretical insights and practical guidance for intelligent irrigation and fertilization management in greenhouse systems.

2. Materials and Methods

2.1. Data Sources and Processing

2.1.1. Data Sources

Data collection for this study was conducted in a rose greenhouse operated by Yuxi Heyun Horticulture Co., Ltd., located in Jiangchuan District, Yuxi City, Yunnan Province, China. The deployed sensors (model Escher2023-1), manufactured by Kunming Escher Technology Co., Ltd., Kunming City, China, are equipped with 4G IoT communication capabilities and were installed in a greenhouse primarily used for rose cultivation. These sensors are capable of real-time monitoring of multiple environmental variables, including air temperature, air humidity, VPD, soil temperature, soil moisture, soil EC, and soil pore water EC. Data were collected from 1 January to 19 December 2024, at 10 min intervals, using three synchronized sensor nodes. A total of 50,894 data records were obtained and transmitted in real time to the monitoring platform via a 4G network, using the MQTT protocol. High-frequency data acquisition facilitates the precise capture of rapid fluctuations in soil pore water EC driven by plant transpiration, water uptake, and salt transport processes, thereby enhancing the model’s responsiveness to abrupt changes and improving predictive accuracy. A representative subset of the soil and meteorological data is presented in Table 1.

2.1.2. Data Processing and Correlation Analysis

During data acquisition, sensor readings were subject to potential inaccuracies due to calibration drift and environmental influences such as ambient humidity and temperature fluctuations, resulting in a certain level of noise within the raw dataset. To mitigate the impact of inter-node sensor variability on model training, the soil and meteorological data collected from the three monitoring nodes were resampled by computing hourly averages. This downsampling approach ensured temporal consistency and smoothed short-term fluctuations. Using this procedure, a total of 8482 hourly records of environmental features were obtained. Data screening revealed a small number of missing values: 5 records each for air temperature, air humidity, soil temperature, soil moisture, soil EC, and soil pore water EC, and 8 records for VPD, yielding a total of 38 missing entries. To maintain consistency in data distribution between training and testing sets and to prevent the introduction of bias by uneven preprocessing, all missing values were imputed using the mean of the corresponding feature. This strategy preserved the overall statistical characteristics of the dataset and supported robust model development. Given the differences in units and scales among the variables, all features were normalized using Min–Max scaling for both the training and testing sets [29]. The processed soil and meteorological data are presented in Table 2.
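The preprocessing chain described above (hourly averaging across nodes, mean imputation, Min–Max scaling) can be illustrated with a minimal sketch that uses only the Python standard library. The function names `hourly_means`, `impute_mean`, and `min_max_scale` are hypothetical, and the option to pass training-set bounds into the scaler is an assumption, since the text does not state how the scaling parameters were fitted.

```python
from datetime import datetime
from statistics import mean


def hourly_means(records):
    """Average (timestamp, value) readings from all nodes into hourly bins.

    `records` is an iterable of (datetime, float-or-None) pairs; None marks a
    missing reading. Hours without any valid reading map to None.
    """
    bins = {}
    for ts, value in records:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        bucket = bins.setdefault(hour, [])
        if value is not None:
            bucket.append(value)
    return {h: (mean(v) if v else None) for h, v in sorted(bins.items())}


def impute_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values, lo=None, hi=None):
    """Scale values to [0, 1]; lo/hi may be supplied (assumed: training-set bounds)."""
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]
```

Applying the three functions in this order to each variable reproduces the sequence of steps in the text; a constant series is mapped to zeros to avoid division by zero.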

The processed dataset comprises the seven distinct variables used to investigate the potential relationships between soil pore water EC and other environmental factors. Incorporating these variables into the predictive model is expected to enhance the accuracy of EC forecasting. To quantify these relationships, Pearson correlation analysis was employed to assess the associations between soil pore water EC and the other variables. Pearson’s correlation coefficient is a statistical measure that evaluates the degree of linear association between two variables. Given that soil and atmospheric variables are not entirely independent, their interdependencies can be effectively quantified using Pearson correlation analysis [30,31]. The Pearson correlation coefficient between the random variables x and y is computed using the following formula:

ρ_{p} = \frac{\sum_{i = 1}^{N} (x_{i} - \bar{x}) (y_{i} - \bar{y})}{\sqrt{\sum_{i = 1}^{N} (x_{i} - \bar{x})^{2} \sum_{i = 1}^{N} (y_{i} - \bar{y})^{2}}}

(1)

In the equation, N represents the number of samples used for calculation, \bar{x} and \bar{y} are the sample means of x and y, and ρ_p takes values in [−1, 1]. A larger absolute value of ρ_p indicates a stronger correlation between the two variables, while a negative value signifies a negative correlation. The standard interpretation of Pearson’s correlation coefficient is presented in Table 3.
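As an illustration of Equation (1), the coefficient can be computed directly from its definition. The following is a minimal sketch; the function name `pearson` is hypothetical and the sample values in the usage note are made up for demonstration only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences (Equation (1))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator: square root of the product of the summed squared deviations.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(ss_x * ss_y)
```

For example, `pearson([1, 2, 3, 4], [1, 3, 2, 4])` evaluates to 0.8, while two perfectly linearly related sequences give 1 or −1 depending on the sign of the slope.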

As shown in the correlation heatmap (Figure 1), soil pore water EC exhibits a strong negative correlation with soil moisture (ρ_p = −0.6) and a strong positive correlation with soil EC (ρ_p = 0.93). It is moderately positively correlated with soil temperature (ρ_p = 0.49) and weakly positively correlated with air temperature (ρ_p = 0.34), while its correlations with VPD and air humidity are negligible (ρ_p = 0.12 and ρ_p = −0.06, respectively). Although air temperature exhibits only a weak positive correlation with soil pore water EC, it remains a critical variable within greenhouse environments due to its strong regulatory influence on crop physiological processes. As a key controllable environmental factor, temperature modulation in greenhouses significantly affects transpiration rates and root activity by altering thermal stability and diurnal temperature variation. These physiological responses, in turn, influence water and nutrient uptake, indirectly driving the accumulation and redistribution of salts within the soil solution, thereby affecting pore water EC dynamics. The extremely weak correlations observed between pore water EC and both VPD and air humidity may be attributed to the highly regulated nature of greenhouse systems, in which interventions such as ventilation, shading, and irrigation disrupt the natural variability of these meteorological parameters and attenuate their influence on soil pore water EC. Overall, soil moisture, soil temperature, soil EC, and air temperature were identified as the relevant predictors for model training.

2.2. CNN–LSTM Predictive Model

2.2.1. CNN Feature Extraction

Convolutional neural networks (CNN) are feature-extraction methods commonly used in deep learning. The basic structure of a CNN includes convolutional layers, pooling layers, and fully connected layers [32,33]. The convolutional layer enhances the features of the input data through convolution operations, facilitating the extraction of local information. The pooling layer performs down-sampling operations to achieve secondary feature extraction, while simultaneously reducing the number of parameters in the network, thereby improving computational efficiency. The fully connected layer integrates local features into global features via the Softmax function. In the field of image recognition, two-dimensional convolutional neural networks (2D CNN) are widely used, while for time-series prediction tasks, one-dimensional convolutional neural networks (1D CNN) are typically employed [34]. Therefore, in this study, a 1D CNN is chosen to extract temporal features from soil and meteorological datasets. During the feature-extraction process, the data undergo convolution and pooling operations, progressively enhancing the feature information, which effectively reveals the changing patterns of soil and meteorological factors. Finally, the extracted features are flattened, converting the multi-dimensional features (such as the 2D feature maps from the CNN output) with spatial dimensions into a one-dimensional vector format. This transformation is a common data preprocessing step used to bridge the CNN and LSTM models, enabling the prepared features to be used as inputs for the time-series model.

2.2.2. CNN–LSTM Model Architecture

LSTM, proposed by Hochreiter and Schmidhuber in 1997, is a specialized type of recurrent neural network (RNN) [35]. The structure of an LSTM cell is illustrated in Figure 2. By incorporating a gating mechanism with three gates, namely the forget gate, the input gate, and the output gate, an LSTM regulates the forgetting and updating of information, thereby alleviating the vanishing-gradient problem commonly encountered in traditional RNNs during long-sequence learning. An LSTM is capable of retaining useful information over extended time spans while selectively forgetting irrelevant parts, providing a significant advantage when addressing long-term dependency issues [36]. Because it is designed for long sequences, however, an LSTM may overfit when applied to short sequences.

The forget gate selectively discards a portion of the input data received from the previous node, regulating the extent to which past states are retained. The specific calculation formula is as follows:

z_{f} = σ (w_{f} [h_{t - 1}, x_{t}] + b_{f})

(2)

The input gate regulates which information is incorporated into the current unit, based on the values of x_t and h_{t-1}. The specific calculation formulas are as follows:

z_{i} = σ (w_{i} [h_{t - 1}, x_{t}] + b_{i})

(3)

z = \tanh (w [h_{t - 1}, x_{t}] + b)

(4)

The output gate regulates the current state’s output through z₀, and scales the cell state c_t from the previous stage, using the tanh activation function. The specific calculation formula is as follows:

z_{0} = σ (w_{0} [h_{t - 1}, x_{t}] + b_{0})

(5)

h_{t} = z_{0} \tanh (c_{t})

(6)

In the above equations, w_f, w_i, w, and w₀ represent the weight coefficients corresponding to each module, while b_f, b_i, b, and b₀ denote the respective bias terms. Here, σ refers to the sigmoid activation function, and tanh represents the hyperbolic tangent activation function. The specific calculation formulas are as follows:

σ (x) = \frac{1}{1 + e^{- x}}

(7)

\tanh (x) = \frac{e^{x} - e^{- x}}{e^{x} + e^{- x}}

(8)
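To make Equations (2)–(8) concrete, the sketch below performs one LSTM time step in plain Python. It is an illustration under stated assumptions rather than the implementation used in the study: the function names and the `params` dictionary layout are hypothetical, the output-gate parameters (subscript 0 in the equations) are stored under the keys `w_o` and `b_o`, and the cell-state update c_t = z_f · c_{t−1} + z_i · z is the standard LSTM formulation, which the text does not write out explicitly.

```python
from math import exp, tanh


def sigmoid(x):
    """Logistic function, Equation (7)."""
    return 1.0 / (1.0 + exp(-x))


def affine(w, b, v):
    """Return w @ v + b, where w is a list of rows and b a list of biases."""
    return [sum(wi * vi for wi, vi in zip(row, v)) + bi for row, bi in zip(w, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; returns the new hidden state h_t and cell state c_t."""
    v = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, x_t]
    z_f = [sigmoid(a) for a in affine(params["w_f"], params["b_f"], v)]  # Eq. (2)
    z_i = [sigmoid(a) for a in affine(params["w_i"], params["b_i"], v)]  # Eq. (3)
    z = [tanh(a) for a in affine(params["w"], params["b"], v)]           # Eq. (4)
    z_o = [sigmoid(a) for a in affine(params["w_o"], params["b_o"], v)]  # Eq. (5)
    # Assumed standard cell-state update (not stated explicitly in the text).
    c_t = [f * c + i * g for f, c, i, g in zip(z_f, c_prev, z_i, z)]
    h_t = [o * tanh(c) for o, c in zip(z_o, c_t)]                        # Eq. (6)
    return h_t, c_t
```

With all weights and biases set to zero, every gate evaluates to 0.5 and the candidate z to 0, so the cell state is simply halved at each step; this gives a quick sanity check of the gating arithmetic.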

The final predicted value is obtained through the transformation of h_t. The specific calculation formula is as follows:

y_{t} = σ (w_{0} [h_{t - 1}, x_{t}])

(9)

The input data were first processed by a convolutional neural network (CNN) to extract spatial features. These extracted features were then flattened into a one-dimensional vector and fed into the input layer of a long short-term memory (LSTM) network to capture temporal dependencies. The combined CNN–LSTM architecture consisted of two convolutional layers, two LSTM layers, and a dropout layer. The first convolutional layer included 64 filters with a kernel size of 3 × 3; it was followed by a second convolutional layer comprising 128 filters of the same size. Temporal modeling was achieved using two sequential LSTM layers with 60 and 50 units, respectively. A dropout layer with a rate of 0.2 was applied to prevent overfitting. Model training was conducted using the Adam optimizer, with a learning rate of 0.001 over 100 epochs. The architecture of the CNN–LSTM model is illustrated in Figure 3.

2.3. BOA–XGBoost Predictive Model

2.3.1. XGBoost Model

XGBoost, proposed by Chen et al. [37], is an end-to-end gradient-boosting tree system. As an efficient implementation of the gradient-boosting algorithm, it has been widely applied to tasks such as classification, regression, and ranking. Building upon traditional gradient-boosting methods, XGBoost incorporates a range of innovative techniques, such as regularization, parallel computing, and missing value handling, significantly enhancing computational efficiency and predictive accuracy. In this study, the XGBoost model is configured with 100 trees, a maximum tree depth of 5, and a learning rate of 0.1, while all other parameters remain at their default values. The core principle of XGBoost is to construct a strong learner by integrating multiple weak learners, typically decision trees [38].

2.3.2. BOA–XGBoost Model Architecture

Probabilistic models have become a mainstream approach in artificial intelligence and machine learning. The Bayesian optimization algorithm (BOA) is an optimization method based on Bayes’ theorem [39]. It is a global optimization technique that relies on a probabilistic surrogate model, typically a Gaussian process, and an acquisition function. The acquisition function guides the selection of the next evaluation point, balancing exploration of unknown regions against exploitation of known promising areas. By updating the surrogate model after each evaluation, Bayesian optimization progressively refines the search and converges toward the global optimum.

BOA is employed to determine the optimal values of key hyperparameters in the XGBoost model. The primary process of BOA optimization involves defining an objective function, iteratively adding new data points to update the posterior distribution of the objective function, and selecting the next set of hyperparameters based on this posterior distribution [40]. In this study, the Bayesian optimization process is performed over 400 iterations. The optimized model parameters include 58 trees, a maximum tree depth of 8, and a learning rate of 0.068. The advantage of this algorithm lies in its ability to leverage previous evaluation points to guide the selection of subsequent hyperparameter configurations, thereby improving optimization efficiency. The workflow of the BOA–XGBoost model is illustrated in Figure 4.

2.4. Hybrid Model Based on CNN–LSTM and BOA–XGBoost

2.4.1. Particle Swarm Optimization Algorithm

The particle swarm optimization (PSO) algorithm was proposed by Dr. Eberhart and Dr. Kennedy in 1995 and is classified as a swarm intelligence optimization algorithm [41]. Inspired by the foraging behavior of bird flocks, PSO represents individual birds or physical entities as candidate solutions to an optimization problem. By leveraging information exchange among members of the swarm, the algorithm guides the entire population toward convergence to the optimal solution. This convergence is achieved through an iterative updating process. In the PSO framework, each individual in the swarm is modeled as a particle characterized by two attributes: position and velocity. Assuming an N-dimensional search space with X particles, each particle follows the movement of the current best-performing particle in the space. The velocity and position of each particle are iteratively updated according to the following equations.

v_{id}(t + 1) = w v_{id}(t) + c_{1} r_{1} (p_{id}(t) - x_{id}(t)) + c_{2} r_{2} (p_{gd}(t) - x_{id}(t))

(10)

x_{id}(t + 1) = x_{id}(t) + v_{id}(t + 1)

(11)

In these equations, w represents the inertia weight, c₁ is the individual learning factor, and c₂ is the social learning factor; r₁ and r₂ are random numbers uniformly distributed between 0 and 1. The terms v_id(t), x_id(t), p_id(t), and p_gd(t) denote the velocity, position, individual best solution, and global best solution of the particle at time t, respectively.
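The update rule can be sketched as a single in-place step over the whole swarm. This is a minimal illustration, assuming the standard form in which the new position equals the old position plus the new velocity; the function name `pso_step` and its argument layout (lists of per-particle coordinate lists) are hypothetical. The default values of w, c₁, and c₂ follow the settings reported for this study in Section 2.4.2.

```python
import random


def pso_step(positions, velocities, personal_best, global_best,
             w=0.7, c1=1.5, c2=1.5, rng=random):
    """Apply one velocity and position update to every particle, in place.

    positions, velocities, personal_best: one coordinate list per particle.
    global_best: coordinate list of the best position found by the swarm.
    """
    for i, (x, v) in enumerate(zip(positions, velocities)):
        for d in range(len(x)):
            r1, r2 = rng.random(), rng.random()
            # Velocity: inertia term + pull toward personal best + pull toward global best.
            v[d] = (w * v[d]
                    + c1 * r1 * (personal_best[i][d] - x[d])
                    + c2 * r2 * (global_best[d] - x[d]))
            # Position: old position plus the freshly updated velocity.
            x[d] += v[d]
```

A particle that already sits on both its personal best and the global best keeps moving only through the inertia term, so its velocity shrinks by the factor w at each step.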

2.4.2. PCLBX Hybrid Model

The predictive performance of the hybrid model is closely related to the distribution of weight coefficients assigned to individual models. Therefore, selecting an appropriate weighting strategy is crucial [42,43]. To improve the prediction accuracy of the hybrid model, this study employs the PSO algorithm to determine the weights for the CNN–LSTM and BOA–XGBoost models. By leveraging the optimization capability of PSO, the weight parameters for the predicted values of the CNN–LSTM and BOA–XGBoost models at different time points are determined with the objective of minimizing the error across all time points. This approach constructs the PCLBX hybrid model used for predicting the EC of soil pore water. The parameters of the PSO algorithm were set as follows: the swarm size was 100, the inertia weight w was 0.7, both the cognitive learning factor c₁ and the social learning factor c₂ were 1.5, and the maximum number of iterations was 500.

First, the PCLBX framework incorporates a CNN module to enhance the feature-extraction capability of the LSTM network, thereby constructing the CNN–LSTM predictive component used for forecasting soil pore water electrical conductivity (EC). Second, BOA is employed to automatically search for the optimal hyperparameter configuration within the XGBoost model, forming the BOA–XGBoost predictive component. Finally, PSO is utilized to dynamically adjust the weights between the CNN–LSTM and BOA–XGBoost models over time. The PSO optimization procedure involves randomly initializing the position and velocity of particles within the search space, evaluating the fitness of each particle, and updating each particle’s individual best position when a better fitness is achieved. The global best position is then determined from among all particles. Subsequently, particles iteratively update their velocities and positions based on both individual experience and collective intelligence. This process continues until a predefined number of iterations is reached or convergence criteria are met. By enabling adaptive weight adjustment at different time points, PSO ensures that the ensemble model yields near-optimal predictions throughout the forecasting horizon, thereby improving overall predictive accuracy and robustness. The workflow of the PCLBX model is illustrated in Figure 5, with the specific steps outlined in the following:

(1): The dataset was partitioned into training and testing subsets, in an 8:2 ratio, based on chronological order to preserve the temporal dependencies inherent in time-series data.
(2): The model parameters were initialized, and the CNN–LSTM and BOA–XGBoost models were constructed based on the training set. The prediction results were used to generate the weights W_CNN-LSTM and W_BOA-XGBoost.
(3): The particle swarm was initialized and individual fitness evaluated. The specific calculation formula was as follows:

$f(t) = \left( y_{A}(t)\, W_{\text{CNN-LSTM}} + y_{B}(t)\, W_{\text{BOA-XGBoost}} - y(t) \right)^{2}$

(12)

where y(t) represents the actual value, y_A(t) and W_CNN-LSTM represent the predicted value and weight of the CNN–LSTM model, and y_B(t) and W_BOA-XGBoost represent the predicted value and weight of the BOA–XGBoost model. The fitness function was designed based on the multi-model weighting strategy proposed in [44], wherein particle-specific and global best positions are iteratively updated according to their respective fitness values. Once the maximum number of iterations is reached, the optimal values of W_CNN-LSTM and W_BOA-XGBoost are output, with the CNN–LSTM model weight being 0.7093 and the BOA–XGBoost model weight being 0.2907.
(4): The hybrid prediction model based on CNN–LSTM and BOA–XGBoost was constructed and applied to the testing set to obtain the final prediction values of the combined model.
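The weight search in steps (3) and (4) can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: it assumes the two weights sum to one (consistent with the reported 0.7093 and 0.2907), so a single value w is searched in [0, 1], and it averages Equation (12) over all time points as the fitness. The function names `combined_mse` and `pso_weight`, the velocity initialization range, and the clamping of positions are assumptions made for this sketch. The swarm size, inertia weight, learning factors, and iteration count default to the settings stated above.

```python
import random


def combined_mse(w, y_true, pred_a, pred_b):
    """Average of Eq. (12) over all time points, using weights w and 1 - w."""
    n = len(y_true)
    return sum((w * a + (1.0 - w) * b - y) ** 2
               for y, a, b in zip(y_true, pred_a, pred_b)) / n


def pso_weight(y_true, pred_a, pred_b, swarm_size=100, inertia=0.7,
               c1=1.5, c2=1.5, max_iter=500, seed=0):
    """Search w in [0, 1] that minimises combined_mse with a basic PSO.

    Assumption: the second weight is 1 - w, so the search is one-dimensional.
    """
    rng = random.Random(seed)
    pos = [rng.random() for _ in range(swarm_size)]            # positions
    vel = [rng.uniform(-0.1, 0.1) for _ in range(swarm_size)]  # velocities
    best_pos = pos[:]                                          # personal bests
    best_fit = [combined_mse(p, y_true, pred_a, pred_b) for p in pos]
    g = min(range(swarm_size), key=best_fit.__getitem__)
    g_pos, g_fit = best_pos[g], best_fit[g]                    # global best

    for _ in range(max_iter):
        for i in range(swarm_size):
            r1, r2 = rng.random(), rng.random()
            # Velocity update: inertia + cognitive term + social term.
            vel[i] = (inertia * vel[i]
                      + c1 * r1 * (best_pos[i] - pos[i])
                      + c2 * r2 * (g_pos - pos[i]))
            # Keep the weight inside [0, 1].
            pos[i] = min(1.0, max(0.0, pos[i] + vel[i]))
            fit = combined_mse(pos[i], y_true, pred_a, pred_b)
            if fit < best_fit[i]:
                best_pos[i], best_fit[i] = pos[i], fit
                if fit < g_fit:
                    g_pos, g_fit = pos[i], fit
    return g_pos, 1.0 - g_pos, g_fit
```

Calling `pso_weight(y, y_cnn_lstm, y_boa_xgb)` with the observed series and the two base-model predictions (hypothetical variable names) returns the pair of weights and the corresponding fitness. Because this sketch returns one weight pair for the whole series, a time-varying scheme would need to run the search per window or per time point.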

2.5. Model Execution Environment and Evaluation Metrics

The experiments were conducted on a computing platform equipped with an Intel(R) Core(TM) i5-10300H CPU @ 2.50 GHz, with 16 GB RAM and an NVIDIA GeForce GTX 1650 GPU. The operating system was Windows 11, and the model development and implementation were carried out using PyCharm 2023. The deep learning framework employed in this study was TensorFlow.

To evaluate the predictive performance of the PCLBX model, three objective metrics were selected to assess its prediction accuracy: mean square error (MSE), mean absolute error (MAE), and coefficient of determination (R²). The mathematical formulations of these metrics are presented as follows:

$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_{i} - \hat{y}_{i} \right)^{2}$

(13)

$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_{i} - \hat{y}_{i} \right|$

(14)

$R^{2} = 1 - \frac{\sum_{i=1}^{n} \left( \hat{y}_{i} - y_{i} \right)^{2}}{\sum_{i=1}^{n} \left( \bar{y} - y_{i} \right)^{2}}$

(15)

Here, n represents the number of prediction samples, yᵢ denotes the actual value, ŷᵢ is the predicted value, and ȳ refers to the mean of the actual values.
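As an illustration of Equations (13) to (15), the three metrics can be computed directly from paired lists of observed and predicted values. The sketch below uses only the Python standard library. The function names and the four sample values are hypothetical and are not taken from the study's data.

```python
def mse(y_true, y_pred):
    """Mean square error, Eq. (13)."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error, Eq. (14)."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination, Eq. (15)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((p - y) ** 2 for y, p in zip(y_true, y_pred))  # residual sum
    ss_tot = sum((mean_y - y) ** 2 for y in y_true)             # total sum
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(mse(observed, predicted), mae(observed, predicted), r2(observed, predicted))
```

Note that R² is undefined when all observed values are identical, because the denominator of Equation (15) is then zero; the sketch does not guard against that case.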

3. Results and Analysis

In this study, Pearson correlation analysis was employed to identify relevant input variables for model development, resulting in the selection of soil moisture, soil temperature, soil EC, and air temperature as predictors. To preserve the temporal dependencies inherent in the time-series data, the dataset was partitioned chronologically into training and testing sets at an 8:2 ratio. For model construction, a CNN–LSTM architecture was first developed in which one-dimensional convolutional layers extract local temporal features and a subsequent LSTM network captures long-range dependencies. In parallel, a BOA–XGBoost model was constructed by applying Bayesian optimization to fine-tune the hyperparameters of the XGBoost algorithm. To further enhance prediction accuracy, PSO was employed to dynamically optimize the weighting between the CNN–LSTM and BOA–XGBoost outputs, thereby forming the integrated PCLBX ensemble model.
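The two data-preparation steps described above, correlation-based variable screening and the chronological 8:2 split, can be sketched as follows. This is a hedged illustration in plain Python rather than the study's pipeline: the function names are invented for the sketch, and the correlation threshold of 0.5 is a hypothetical value, since no selection cutoff is stated here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, threshold=0.5):
    """Keep the columns whose |r| with the target reaches the threshold.

    The threshold is a hypothetical value, not one reported in the study.
    """
    return [name for name, values in columns.items()
            if abs(pearson_r(values, target)) >= threshold]


def chronological_split(rows, train_fraction=0.8):
    """Split time-ordered rows without shuffling: first 80% train, rest test."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]
```

Splitting by position rather than by random sampling keeps every test observation later in time than every training observation, which is the property the 8:2 chronological partition is meant to preserve.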

To validate the performance of the proposed PSO-based ensemble model PCLBX, we constructed and compared several models, including GRU, LSTM, CNN–LSTM, LightGBM, XGBoost, BOA–XGBoost, and CNN–LSTM–BOA–XGBoost. The predictive performance metrics of each model are presented in Table 4.

In Table 4, under the same conditions, the LSTM model exhibits higher prediction accuracy for soil pore water EC compared to the GRU model. The CNN–LSTM model achieves an MSE, MAE, and R² of 0.0017, 0.0302, and 0.9763, respectively. Compared to the LSTM model, the CNN–LSTM model reduces MSE and MAE by 0.0004 and 0.0049, respectively, and improves R² by 0.005. Although the improvement in R² is relatively modest, even slight gains in high-frequency environmental time-series data signify a substantial enhancement in the model’s capacity to capture complex nonlinear relationships. Under the same conditions, the XGBoost model demonstrates higher prediction accuracy for soil pore water EC than does LightGBM. The BOA–XGBoost model has an MSE, MAE, and R² of 0.0022, 0.0349, and 0.9701, respectively. Compared to XGBoost, the BOA–XGBoost model reduces MSE and MAE by 0.0008 and 0.0063, respectively, and improves R² by 0.0108. The PCLBX model achieves an MSE, MAE, and R² of 0.0016, 0.0288, and 0.9778, respectively. Compared to the CNN–LSTM model, the PCLBX model reduces MSE and MAE by 0.0001 and 0.0014, respectively, and improves R² by 0.0015. Compared to BOA–XGBoost, the PCLBX model reduces MSE and MAE by 0.0006 and 0.0061, respectively, and improves R² by 0.0077. The PCLBX model also reduces MSE and MAE by 0.0001 and 0.0005, respectively, and improves R² by 0.0007 when compared to the CNN–LSTM–BOA–XGBoost model. The reductions in both MSE and MAE achieved by the PCLBX model indicate a marked improvement in predictive performance. In the context of protected agriculture, even minor fluctuations in soil pore water EC can influence threshold-based decisions related to irrigation and fertilization. Therefore, enhancing prediction accuracy facilitates more timely adjustments to irrigation frequency and nutrient application rates, supporting the objectives of precision agriculture, namely water conservation, salinity reduction, and fertilizer optimization. This, in turn, promotes more efficient resource utilization and contributes to improvements in crop growth conditions.
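The differences quoted above can be re-derived from the three sets of metrics stated in this paragraph, and expressing them relative to each baseline helps put their size in context. The short Python sketch below does this arithmetic. The dictionary and function names are illustrative, and the percentages are approximate because they are computed from metrics already rounded to four decimal places.

```python
# Metrics as stated in the text (MSE, MAE, R²) for three of the models.
reported = {
    "CNN-LSTM": {"MSE": 0.0017, "MAE": 0.0302, "R2": 0.9763},
    "BOA-XGBoost": {"MSE": 0.0022, "MAE": 0.0349, "R2": 0.9701},
    "PCLBX": {"MSE": 0.0016, "MAE": 0.0288, "R2": 0.9778},
}


def improvement(baseline, model="PCLBX"):
    """Absolute and relative changes of `model` with respect to `baseline`."""
    b, m = reported[baseline], reported[model]
    return {
        "MSE_drop": round(b["MSE"] - m["MSE"], 4),
        "MAE_drop": round(b["MAE"] - m["MAE"], 4),
        "R2_gain": round(m["R2"] - b["R2"], 4),
        # Relative reductions in percent of the baseline error.
        "MSE_drop_pct": round(100 * (b["MSE"] - m["MSE"]) / b["MSE"], 1),
        "MAE_drop_pct": round(100 * (b["MAE"] - m["MAE"]) / b["MAE"], 1),
    }


for name in ("CNN-LSTM", "BOA-XGBoost"):
    print(name, improvement(name))
```

Under these rounded inputs, the MSE reduction corresponds to roughly 6% of the CNN–LSTM error and roughly 27% of the BOA–XGBoost error, so most of the gain of the combined model is relative to the tree-based component.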

As shown in Figure 6, the PCLBX ensemble model demonstrates superior predictive accuracy for soil pore water EC, compared to single models and the equal-weight ensemble model. The equal-weight ensemble model, CNN–LSTM–BOA–XGBoost, captures the variations in the original values more effectively than do the individual models, thereby enhancing prediction performance to some extent. However, since the CNN–LSTM–BOA–XGBoost model does not fully leverage the advantages of each constituent model at different time points, a dynamically weighted ensemble model, PCLBX, was proposed. Compared to the equal-weight ensemble, PCLBX reduces MSE and MAE by 0.0001 and 0.0005, respectively, and increases R² by 0.0007 (Table 4). This indicates that PCLBX more effectively captures the variation patterns in the data, enabling more precise prediction of soil pore water EC values at different time points. These findings highlight that the PSO-based dynamically weighted ensemble model can adjust the weight parameters of individual models dynamically over time, ensuring that the predicted EC values are closer to the actual values. This adaptability makes the model particularly effective for short-term predictions. The improved performance also enhances the model’s capacity to characterize the dynamic behavior of soil salinity, thereby providing more accurate and reliable data support for irrigation and fertilization decision-making in greenhouse cultivation.

A comparative analysis of predicted versus observed soil pore water EC values derived from the four models is provided in Figure 7. The blue curve represents the actual soil pore water EC values, while the red curve denotes the predicted values. As can be observed in the plot, the PCLBX model demonstrates superior accuracy in capturing both the overall trend and the fluctuations in the data, highlighting its effectiveness in handling complex time-series predictions. In contrast, the BOA–XGBoost, CNN–LSTM, and CNN–LSTM–BOA–XGBoost models exhibit larger discrepancies between predicted and actual values, particularly during the period from June to August, when the data exhibit more pronounced variations. Notably, this time frame coincides with a critical crop management phase in the greenhouse, characterized by high ambient temperatures, increased irrigation frequency, and concurrent fertilization and environmental regulation interventions. The stable performance of the PCLBX model under such conditions further confirms its robustness and reliability in generating accurate predictions in complex and highly dynamic environments, underscoring its practical relevance for precision greenhouse management.

4. Discussion

With the growing application of deep learning techniques, LSTM networks have demonstrated remarkable advantages in various forecasting tasks due to their superior temporal modeling capabilities. However, studies have shown that an LSTM alone often struggles to effectively capture spatial features in complex data environments. Consequently, integrating additional modules to enhance spatiotemporal feature extraction has become a common strategy for improving model performance [45,46]. CNNs have been shown to augment the predictive capabilities of LSTMs by extracting localized patterns, and CNN–LSTM architectures have consistently outperformed traditional machine learning models such as random forests (RF) and support vector regression (SVR) in terms of prediction accuracy [47]. In the present study, the LSTM leveraged its gated mechanisms to effectively regulate the flow of long-term temporal information, capturing dynamic variations in soil and meteorological parameters over time. By incorporating a CNN, the model was further able to identify several characteristic temporal patterns in soil pore water EC evolution, including short-term diurnal fluctuations following irrigation events and seasonal trends driven by long-term variations in meteorological factors such as temperature and humidity. These temporally dependent features reflect potential linkages between soil water-salinity dynamics and controlled agricultural management practices, such as irrigation frequency, fertilization schedules, and environmental control strategies (e.g., timed ventilation). The effective capture of such patterns not only enhanced the model’s capability of representing complex temporal sequences, but also provided data-driven insights and theoretical support critical for optimizing crop water and nutrient management. 
Meanwhile, the XGBoost model, known for its robustness in handling structured data and executing efficient feature engineering, exhibited excellent performance in modeling nonlinear relationships with strong generalization ability. Zhang et al. [48] employed grid search to tune XGBoost hyperparameters, which improved prediction accuracy; however, grid search is limited in high-dimensional parameter spaces by low efficiency and susceptibility to local optima. In this study, Bayesian optimization was adopted for adaptive hyperparameter tuning of XGBoost, resulting in reductions of 0.0008 and 0.0063 in MSE and MAE, respectively, and an R² improvement of 0.0108 relative to the untuned XGBoost model (Table 4). The dynamics of soil pore water EC in greenhouse systems are influenced by multiple interacting variables such as soil moisture, soil temperature, and air temperature, which exhibit nonlinear coupling. While XGBoost inherently possesses strong nonlinear modeling capacity, the application of Bayesian optimization enhanced the specificity of its hyperparameter configuration, enabling the model to better capture the complex distributions of real-world data. Furthermore, greenhouse cultivation systems are subject to frequent external perturbations, including irrigation scheduling and ventilation control, that introduce greater variability in environmental conditions. This optimization strategy markedly improved the nonlinear learning performance of XGBoost, thereby enabling it to more effectively identify intricate patterns in the data and, ultimately, enhance predictive accuracy.

In this study, PSO was employed to dynamically determine the predictive weight contributions of the CNN–LSTM and BOA–XGBoost models at different time steps, thereby enhancing the overall accuracy of the PCLBX ensemble framework. The optimization process used the prediction error between the observed and estimated values as the fitness function and iteratively searched for the optimal weight configuration to construct a weighted fusion model. Compared to the equal-weight CNN–LSTM–BOA–XGBoost ensemble, the PSO-optimized PCLBX model more effectively integrates the strengths of temporal sequence modeling and nonlinear feature extraction by dynamically adjusting component weights. This led to improved prediction accuracy for soil pore water EC under complex environmental conditions. Moreover, the approach enhanced the model’s robustness and practical applicability in real-world agricultural monitoring scenarios. Nonetheless, the ensemble strategy also introduces certain limitations. Specifically, it requires the training and optimization of multiple base learners, resulting in substantially higher computational costs relative to single-model approaches. Additionally, the increased structural complexity may reduce model interpretability, making it more challenging to disentangle and attribute the contributions of individual components to the final predictions [49,50].
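A simple way to see why an optimized weighting can outperform equal weighting is to compare both against the closed-form least-squares weight for two models. The sketch below is a sanity check under stated assumptions, not the PSO procedure used in this study: it assumes a single fixed weight w for the first model and 1 − w for the second, in which case setting the derivative of the summed squared error to zero gives w* = Σ(a − b)(y − b) / Σ(a − b)². The function names and sample values are hypothetical.

```python
def closed_form_weight(y_true, pred_a, pred_b):
    """Least-squares weight for pred_a when the weights are w and 1 - w."""
    num = sum((a - b) * (y - b) for y, a, b in zip(y_true, pred_a, pred_b))
    den = sum((a - b) ** 2 for a, b in zip(pred_a, pred_b))
    if den == 0.0:
        return 0.5  # identical predictions: every weight gives the same error
    return min(1.0, max(0.0, num / den))  # restrict the weight to [0, 1]


def blend_mse(w, y_true, pred_a, pred_b):
    """Mean square error of the blend w * pred_a + (1 - w) * pred_b."""
    return sum((w * a + (1.0 - w) * b - y) ** 2
               for y, a, b in zip(y_true, pred_a, pred_b)) / len(y_true)


# Hypothetical series: model A over-predicts by 0.2, model B under-predicts by 0.1.
y = [1.0, 2.0, 3.0, 4.0]
a = [1.2, 2.2, 3.2, 4.2]
b = [0.9, 1.9, 2.9, 3.9]
w = closed_form_weight(y, a, b)
print(w, blend_mse(w, y, a, b), blend_mse(0.5, y, a, b))
```

In this toy case the equal-weight blend keeps a residual bias, whereas the optimized weight cancels the two opposite biases. A metaheuristic such as PSO becomes more useful than the closed form when the weights vary over time or when the fitness is not a simple quadratic.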

Previous studies have applied machine learning and deep learning methods to predict soil salinity. In the realm of machine learning, Zhao et al. [21] employed extreme gradient boosting (XGBoost), backpropagation neural networks (BPNN), and random forest (RF) algorithms for modeling. The results indicated that XGBoost outperformed both RF and BPNN in prediction accuracy, achieving an R² as high as 0.82. In terms of deep learning, Arshad et al. [51] utilized feedforward neural networks (FFNN), recurrent neural networks (RNN), and long short-term memory (LSTM) networks to predict soil EC. Their findings showed that a simple FFNN reached the highest R² of 0.88 during model training. However, an improved ensemble model combining FFNN and LSTM demonstrated better performance on the test dataset, achieving a maximum R² of 0.84. In contrast, the PCLBX model proposed in this study achieved an R² of 0.9778, significantly improving prediction accuracy. This underscores the potential of optimizing algorithms to combine deep learning and machine learning techniques, thereby overcoming the performance limitations of traditional methods. Additionally, this study employed the Bayesian optimization algorithm (BOA) and particle swarm optimization (PSO) for automatic hyperparameter tuning and model weighting, addressing the common issues of low grid search efficiency and susceptibility to local optima seen in prior research.

Despite the superior predictive performance of the PCLBX dynamic-weight ensemble model in forecasting soil pore water EC, several limitations remain. First, the implementation of PSO requires extensive iterative computation, which can result in reduced efficiency when applied to large-scale datasets. Moreover, the performance of PSO is highly sensitive to its hyperparameters—such as inertia weight and cognitive learning coefficients—necessitating meticulous empirical tuning, thereby increasing the complexity of model optimization [52]. Second, as the model was trained under specific greenhouse conditions, it lacks validation across different geographic regions and crop systems, which may constrain its generalizability and scenario adaptability. Although the dynamic weighting mechanism enhances model interpretability to some extent, the overall architecture remains largely “black-box”, limiting the capacity to mechanistically elucidate the contributions of individual input variables to the final predictions. Notably, the current model relies primarily on ground-based sensor data with relatively limited feature dimensions. Future research could explore the integration of remote sensing datasets—such as the Normalized Difference Vegetation Index (NDVI), Enhanced Vegetation Index (EVI), and Land Surface Temperature (LST)—to develop multi-source, ground–air collaborative predictive frameworks. Such enhancements may significantly improve the spatial generalizability and practical applicability of the model. In addition, incorporating lightweight neural architectures or embedding domain-specific physical knowledge may further increase deployment efficiency and interpretability, while maintaining high prediction accuracy. These advancements would provide a more robust technical foundation for intelligent irrigation scheduling and precision fertilization in data-driven agricultural management.

From the perspectives of agronomy and soil science, the soil pore water EC serves as a critical indicator of salt stress, nutrient availability, and water movement within the soil profile. Elevated EC levels can inhibit nutrient uptake, adversely affect plant physiological processes, and, ultimately, reduce crop productivity. Accurate monitoring of dynamic changes in soil pore water EC is therefore essential for optimizing irrigation and fertilization strategies and enhancing resource use efficiency, and holds substantial promise for precision agriculture applications [53,54,55]. In greenhouse systems, environmental parameters such as air and soil conditions tend to be more stable than in open-field settings but are also subject to intensive human regulation. Temperature and humidity, for example, are commonly managed through heating, ventilation, and supplemental lighting systems to maintain the optimal conditions for plant growth, which in turn affects the EC of soil pore water. Among individual environmental factors, soil moisture plays a pivotal role in modulating solution concentration. Moderate soil moisture facilitates the dissolution and transport of soluble salts, thus increasing EC. However, excessive moisture can dilute ion concentrations, leading to reduced EC values. Soil temperature influences molecular mobility and ionic diffusion, thereby enhancing ion transport and elevating EC. Air temperature affects EC indirectly by altering evapotranspiration rates and, consequently, soil temperature. Under high-temperature conditions, increased evaporation results in reduced soil moisture and greater solute concentration, thus raising EC levels. Accordingly, a positive correlation exists between soil temperature and pore water EC, while air temperature shows an indirect positive association. In contrast, a negative correlation is typically observed between soil moisture and pore water EC, due to the moisture’s diluting effects upon soluble ions.

5. Conclusions

To address the complex interactions between soil and atmospheric conditions, soil moisture, soil temperature, soil EC, and air temperature were selected as the key input variables for model training. The selection was guided by Pearson correlation analysis, aiming to evaluate the strength of the association between each variable and soil pore water EC, thereby enabling data-driven modeling of EC dynamics. This study introduces a highly integrated predictive framework—PCLBX—by synergistically combining PSO, CNN-based feature extraction, LSTM networks for temporal sequence learning, and the predictive efficiency of Bayesian-optimized XGBoost. Through a comprehensive performance evaluation of the model, the following conclusions were drawn:

(1): The CNN–LSTM model exhibited lower MSE and MAE than the LSTM and GRU models, indicating that the LSTM-based approach was more suitable for forecasting soil pore water EC in this study. Furthermore, the convolutional layers effectively enhanced the feature-extraction process, thereby improving the predictive capability of the LSTM model. Bayesian optimization successfully identified the optimal hyperparameter combination for the BOA–XGBoost model, further enhancing its predictive performance. Notably, in this study, the BOA–XGBoost model demonstrated a significant advantage in predicting soil pore water EC, providing robust support for soil-moisture forecasting. In time-series tasks, even marginal gains in model performance can improve the predictive accuracy for soil pore water EC, thereby providing more precise guidance for irrigation and fertilization decisions in greenhouse cultivation. This is particularly crucial in rose greenhouses, in which optimal resource allocation and yield optimization are key. Even slight improvements in prediction accuracy can have significant impacts on plant health and yield.
(2): The PCLBX model, optimized through PSO and employing a dynamic weighting strategy, successfully integrates the advantages of time-series analysis and nonlinear learning. The results indicate that, compared to the equally weighted CNN–LSTM–BOA–XGBoost model, the dynamically weighted PCLBX model achieves superior predictive accuracy. The model’s predictions exhibit a high degree of agreement with actual values, demonstrating its precision in forecasting soil pore water EC. Moreover, its strong performance across multiple evaluation metrics highlights the practical value of this approach.

In summary, this study developed a hybrid predictive framework—PCLBX—that enables high-precision forecasting of soil pore water EC in a rose greenhouse environment. The proposed model demonstrated robust adaptability to complex, nonlinear spatiotemporal data, offering a technical reference for real-time monitoring of soil water and salinity dynamics within agricultural Internet of Things (IoT) systems. Nevertheless, the study presents certain limitations. The integration of multiple sub-models increases system complexity, and the experimental validation was conducted solely within a single greenhouse setting, leaving the model’s generalizability yet to be fully established. Given the strong capacity of the PCLBX model for capturing temporal dependencies and nonlinear relationships, it holds promising scalability. Future research may explore its transferability across diverse cropping systems and soil types to further assess its universality and adaptability. For researchers engaged in agricultural environmental monitoring, precision irrigation, and greenhouse climate regulation, the modeling strategy and optimization pipeline proposed herein provide valuable methodological guidance. Moreover, this work contributes a reproducible, data-driven technical pathway toward the advancement of intelligent and adaptive agricultural management.

Author Contributions

Conceptualization, J.Z., J.S. and Y.Q.; Methodology, J.Z. and J.S.; Investigation, J.Z.; Formal analysis, J.S. and Y.Q.; Validation, J.Z., P.T., X.W. and H.Z.; Data curation, C.D. and Y.Y.; Resources, C.D., Y.Y. and Y.Q.; Writing—original draft, J.Z., P.T., X.W. and J.S.; Writing—review and editing, H.Z., J.S. and Y.Q.; Supervision, J.S. and Y.Q.; Project administration, Y.Q.; Funding acquisition, Y.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Yunnan Provincial Major Science and Technology Project (No. 202402AE090018-1), the Young and Middle-Aged Academic and Technical Leaders Reserve Talent Project (No. 202405AC350108), and the Yunnan Provincial Science and Technology Talent and Platform Program (Academician Expert Workstation Project, No. 202405AF140077).

Data Availability Statement

The datasets generated and analyzed during the current study are not publicly available due to the necessity for further research but are available from the corresponding author upon reasonable request.

Conflicts of Interest

Authors Changjun Deng and Yunlei Yang were employed by the company Yunnan Hanzhe Science & Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Gürsoy, S.; Türk, Z. Effects of land rolling on soil properties and plant growth in chickpea production. Soil Tillage Res. 2019, 195, 104425. [Google Scholar] [CrossRef]
Chen, S.; Zhang, G.; Zhu, P.; Wang, C.; Wan, Y. Impact of land use type on soil erodibility in a small watershed of rolling hill northeast China. Soil Tillage Res. 2023, 227, 105597. [Google Scholar] [CrossRef]
Liu, Z.; Ma, D.; Hu, W.; Li, X. Land use dependent variation of soil water infiltration characteristics and their scale-specific controls. Soil Tillage Res. 2018, 178, 139–149. [Google Scholar] [CrossRef]
Yin, X.; Chen, J.; Ye, Y.; Zhu, H.; Li, J.; Zhang, L.; Zhang, H.; He, S.; Wu, H. Optimizing bent branch numbers improves transpiration and crop water productivity of cut rose (Rosa hybrida) in greenhouse. Agric. Water Manag. 2024, 296, 108795. [Google Scholar] [CrossRef]
Sui, Y.; Jiang, R.; Liu, Y.; Zhang, X.; Lin, N.; Zheng, X.; Li, B.; Yu, H. Predicting the spatial distribution of soil salinity based on multi-temporal multispectral images and environmental covariates. Comput. Electron. Agric. 2025, 231, 109970. [Google Scholar] [CrossRef]
Ibraheem, A.H.A.; Lana, S.; Sergey, S. Understanding the mechanistic basis of adaptation of perennial Sarcocornia quinqueflora species to soil salinity. Physiol. Plant. 2021, 172, 1997–2010. [Google Scholar]
Vyavahare, G.D.; Lee, Y.; Seok, Y.J.; Kim, H.N.; Sung, J.; Park, J.H. Monitoring of soil nutrient levels by an ec sensor during spring onion (Allium fistulosum) cultivation under different fertilizer treatments. Agronomy 2023, 13, 2156. [Google Scholar] [CrossRef]
Ferrarezi, R.S.; Alem, P.O.; van Iersel, M.W. Prediction of pore water electrical conductivity using real dielectric and bulk electrical conductivity in soilless substrates. HortScience 2014, 49, S165–S166. [Google Scholar]
Sodini, M.; Cacini, S.; Navarro, A.; Traversari, S.; Massa, D. Estimation of pore-water electrical conductivity in soilless tomatoes cultivation using an interpretable machine learning model. Comput. Electron. Agric. 2024, 218, 108746. [Google Scholar] [CrossRef]
Hosseinpour-Zarnaq, M.; Omid, M.; Sarmadian, F.; Ghasemi-Mobtaker, H. A CNN model for predicting soil properties using VIS–NIR spectral data. Environ. Earth Sci. 2023, 82, 382. [Google Scholar] [CrossRef]
Liang, Z.; Zou, T.; Gong, J.; Zhou, M.; Shen, W.; Zhang, J.; Fan, D.; Lu, Y. Evaluation of soil nutrient status based on LightGBM model: An example of tobacco planting soil in Debao county, Guangxi. Appl. Sci. 2022, 12, 12354. [Google Scholar] [CrossRef]
Taşan, M.; Demir, Y.; Taşan, S.; Öztürk, E. Comparative analysis of different machine learning algorithms for predicting trace metal concentrations in soils under intensive paddy cultivation. Comput. Electron. Agric. 2024, 219, 108772. [Google Scholar] [CrossRef]
Sankalp, S.; Rao, U.M.; Patra, K.C.; Sahoo, S.N. Modeling gated recurrent unit (GRU) neural network in forecasting surface soil wetness for drought districts of Odisha. Dev. Environ. Sci. 2023, 14, 217–229. [Google Scholar]
Samadianfard, S.; Darbandi, S.; Salwana, E.; Nabipour, N.; Mosavi, A. Predicting soil electrical conductivity using multi-layer perceptron integrated with Grey Wolf Optimizer. J. Geochem. Explor 2020, 220, 106639. [Google Scholar]
Gao, P.; Xie, J.; Yang, M.; Zhou, P.; Chen, W.; Liang, G.; Chen, Y.; Han, X.; Wang, W. Improved soil moisture and electrical conductivity prediction of citrus orchards based on IoT using deep bidirectional LSTM. Agriculture 2021, 11, 635. [Google Scholar] [CrossRef]
Feng, A.; Zhou, J.; Vories, E.; Sudduth, K.A. Prediction of cotton yield based on soil texture, weather conditions and UAV imagery using deep learning. Precis. Agric. 2024, 25, 303–326. [Google Scholar] [CrossRef]
Fu, C.; Feng, X.; Tian, A. Whale optimization algorithm coupled with machine learning models for quantitative prediction of soil Ni content. Microchem. J. 2025, 209, 112709. [Google Scholar] [CrossRef]
Rahman, F.; Khan, M.A.R.; Alam, M. Hybrid PSO-GA Optimization for Enhancing Decision Tree Performance in Soil Classification and Crop Cultivation Prediction. Evol. Intell. 2025, 18, 30. [Google Scholar] [CrossRef]
Gao, P.; Lu, M.; Yang, Y.; Li, H.; Tian, S.; Hu, J. A predictive model of photosynthetic rates for eggplants: Integrating physiological and environmental parameters. Comput. Electron. Agric. 2025, 234, 110241. [Google Scholar] [CrossRef]
Liu, X.; Hu, Y.; Li, X.; Du, R.; Xiang, Y.; Zhang, F. An Interpretable Model for Salinity Inversion Assessment of the South Bank of the Yellow River Based on Optuna Hyperparameter Optimization and XGBoost. Agronomy 2024, 15, 18. [Google Scholar] [CrossRef]
Zhao, W.; Li, Z.; Li, H.; Li, X.; Yang, P. Soil Salinity Prediction in an Arid Area Based on Long Time-Series Multispectral Imaging. Agriculture 2024, 14, 1539. [Google Scholar] [CrossRef]
Yu, J.; Zhang, X.; Xu, L.; Dong, J.; Zhangzhong, L. A hybrid CNN-GRU model for predicting soil moisture in maize root zone. Agric. Water Manag. 2021, 245, 106649. [Google Scholar] [CrossRef]
Zhang, Y.; Li, C.; Jiang, Y.; Sun, L.; Zhao, R.; Yan, K.; Wang, W. Accurate prediction of water quality in urban drainage network with integrated EMD-LSTM model. J. Clean. Prod. 2022, 354, 131724. [Google Scholar] [CrossRef]
Alshingiti, Z.; Alaqel, R.; Al-Muhtadi, J.; Haq, Q.E.U.; Saleem, K.; Faheem, M.H. A deep learning-based phishing detection system using CNN, LSTM, and LSTM-CNN. Electronics 2023, 12, 232. [Google Scholar] [CrossRef]
Bagheri, A.; Patrignani, A.; Ghanbarian, B.; Pourkargar, D.B. A hybrid time series and physics-informed machine learning framework to predict soil water content. Eng. Appl. Artif. Intell. 2025, 144, 110105. [Google Scholar] [CrossRef]
Wu, Z.; Cui, N.; Zhang, W.; Liu, C.; Jin, X.; Gong, D.; Xing, L.; Zhao, L.; Wen, S.; Yang, Y. Estimating soil moisture content in citrus orchards using multi-temporal sentinel-1A data-based LSTM and PSO-LSTM models. J. Hydrol. 2024, 637, 131336. [Google Scholar] [CrossRef]
Liu, Z.; Qin, Z.; Zhu, P.; Li, H. An adaptive switchover hybrid particle swarm optimization algorithm with local search strategy for constrained optimization problems. Eng. Appl. Artif. Intell. 2020, 95, 103771. [Google Scholar] [CrossRef]
Han, L.; Zhang, H.; An, N. A Continuous Space Path Planning Method for Unmanned Aerial Vehicle Based on Particle Swarm Optimization-Enhanced Deep Q-Network. Drones 2025, 9, 122. [Google Scholar] [CrossRef]
Passalis, N.; Kanniainen, J.; Gabbouj, M.; Iosifidis, A.; Tefas, A. Forecasting financial time series using robust deep adaptive input normalization. J. Signal Process. Syst. 2021, 93, 1235–1251. [Google Scholar] [CrossRef]
Dufera, A.G.; Liu, T.; Xu, J. Regression models of Pearson correlation coefficient. Stat. Theory Relat. Fields 2023, 7, 97–106. [Google Scholar] [CrossRef]
Sheng, H.; Yin, Z.; Zhou, P.; Thompson, M.L. Soil C:N:P ratio in subtropical paddy fields: Variation and correlation with environmental controls. J. Soils Sediments 2022, 22, 21–31. [Google Scholar] [CrossRef]
Latif, G.; Abdelhamid, S.E.; Mallouhy, R.E.; Alghazo, J.; Kazimi, Z.A. Deep Learning Utilization in Agriculture: Detection of Rice Plant Diseases Using an Improved CNN Model. Plants 2022, 11, 2230. [Google Scholar] [CrossRef] [PubMed]
Lu, J.; Li, J.; Fu, H.; Tang, X.; Liu, Z.; Chen, H.; Sun, Y.; Ning, X. Deep Learning for Multi-Source Data-Driven Crop Yield Prediction in Northeast China. Agriculture 2024, 14, 794. [Google Scholar] [CrossRef]
Zhao, J.; Mao, X.; Chen, L. Speech emotion recognition using deep 1D & 2D CNN LSTM networks. Biomed. Signal Process. Control 2019, 47, 312–323. [Google Scholar]
Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
Huang, R.; Wei, C.; Wang, B.; Yang, J.; Xu, X.; Wu, S.; Huang, S. Well performance prediction based on Long Short-Term Memory (LSTM) neural network. J. Pet. Sci. Eng. 2022, 208, 109686.
Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
Sahin, E.K. Assessing the predictive capability of ensemble tree methods for landslide susceptibility mapping using XGBoost, gradient boosting machine, and random forest. SN Appl. Sci. 2020, 2, 1308.
Shahriari, B.; Swersky, K.; Wang, Z.; Adams, R.P.; De Freitas, N. Taking the human out of the loop: A review of Bayesian optimization. Proc. IEEE 2015, 104, 148–175.
Wang, M.; Lai, W.; Sun, P.; Li, H.; Song, Q. Severity Estimation of Inter-Turn Short-Circuit Fault in PMSM for Agricultural Machinery Using Bayesian Optimization and Enhanced Convolutional Neural Network Architecture. Agriculture 2024, 14, 2214.
Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of the ICNN′95—International Conference on Neural Networks, Perth, WA, Australia, 27 November–1 December 1995; pp. 1942–1948.
Shahhosseini, M.; Hu, G.; Archontoulis, S.V. Forecasting corn yield with machine learning ensembles. Front. Plant Sci. 2020, 11, 1120.
Cao, J.; Wang, H.; Li, J.; Tian, Q.; Niyogi, D. Improving the forecasting of winter wheat yields in Northern China with machine learning–dynamical hybrid subseasonal-to-seasonal ensemble prediction. Remote Sens. 2022, 14, 1707.
Mao, X.; Ren, N.; Dai, P.; Jin, J.; Wang, B.; Kang, R.; Li, D. A variable weight combination prediction model for climate in a greenhouse based on BiGRU-Attention and LightGBM. Comput. Electron. Agric. 2024, 219, 108818.
Sagheer, A.; Kotb, M. Time series forecasting of petroleum production using deep LSTM recurrent networks. Neurocomputing 2019, 323, 203–213.
Gandh, D.R.; Harigovindan, V.; Haq, K.; Bhide, A. Attention-driven LSTM and GRU deep learning techniques for precise water quality prediction in smart aquaculture. Aquac. Int. 2024, 32, 8455–8478.
Farhangmehr, V.; Imanian, H.; Mohammadian, A.; Cobo, J.H.; Shirkhani, H.; Payeur, P. A spatiotemporal CNN-LSTM deep learning model for predicting soil temperature in diverse large-scale regional climates. Sci. Total Environ. 2025, 968, 178901.
Zhang, Y.; Haghani, A. A gradient boosting method to improve travel time prediction. Transp. Res. Part C Emerg. Technol. 2015, 58, 308–324.
Ahn, J.M.; Kim, J.; Kim, K. Ensemble machine learning of gradient boosting (XGBoost, LightGBM, CatBoost) and attention-based CNN-LSTM for harmful algal blooms forecasting. Toxins 2023, 15, 608.
Zhou, S.; Song, C.; Zhang, J.; Chang, W.; Hou, W.; Yang, L. A hybrid prediction framework for water quality with integrated W-ARIMA-GRU and LightGBM methods. Water 2022, 14, 1322.
Arshad, S.; Kazmi, J.H.; Harsányi, E.; Nazli, F.; Hassan, W.; Shaikh, S.; Dalahmeh, M.A.; Mohammed, S. Predictive Modeling of soil salinity integrating remote sensing and soil variables: An ensembled deep learning approach. Energy Nexus 2025, 17, 100374.
Shixin, H.; Kedao, Z.; Hongmei, L.; Xiangjian, C. Application of Improved Monarch Butterfly Optimization for Parameters’ Optimization. Math. Probl. Eng. 2023, 2023, 1348624.
Ko, H.; Choo, H.; Ji, K. Effect of temperature on electrical conductivity of soils–Role of surface conduction. Eng. Geol. 2023, 321, 107147.
Wang, D.; Yang, W.; Meng, C.; Cao, Y.; Li, M. Research on vehicle-mounted soil electrical conductivity and moisture content detection system based on current–voltage six-terminal method and spectroscopy. Comput. Electron. Agric. 2023, 205, 107640.
Rhoades, J.; Chanduvi, F.; Lesch, S. Soil Salinity Assessment: Methods and Interpretation of Electrical Conductivity Measurements; Food & Agriculture Org.: Rome, Italy, 1999.

Figure 1. Heatmap of the correlations between various data indicators and soil pore water EC.

Figure 2. Schematic diagram of the LSTM cell structure.

Figure 3. CNN–LSTM model architecture.

Figure 4. BOA–XGBoost model workflow diagram.

Figure 5. PCLBX model workflow diagram.

Figure 6. Comparison of model evaluation metrics.

Figure 7. Comparative plots for the observed and predicted soil pore water EC values across the four modeling approaches.

Table 1. Examples of soil and meteorological data.

Node	Time	Soil Temperature	Soil Moisture	EC	Pore Water EC	Air Temperature	Air Humidity	VPD
Node 1	1 January 2024 0:00	12.21	19.06	0.178	1.796	7.72	84.82	0.12
	1 January 2024 0:10	12.13	19.06	0.178	1.796	7.34	86.91	0.09
	......	......	......	......	......	......	......	......
	19 December 2024 10:01	11.68	27.7	0.421	2.526	13.77	80.11	0.26
	19 December 2024 10:10	11.69	27.53	0.413	2.499	13.23	82.91	0.20
Node 2	1 January 2024 0:00	12.16	28.52	0.622	3.583	7.72	84.82	0.12
	1 January 2024 0:10	12.07	28.52	0.623	3.589	7.34	86.91	0.09
	......	......	......	......	......	......	......	......
	19 December 2024 10:01	11.22	31.79	0.608	3.012	13.77	80.11	0.26
	19 December 2024 10:10	11.22	31.79	0.605	2.997	13.23	82.91	0.20
Node 3	1 January 2024 0:00	12.16	30.97	0.596	3.062	7.72	84.82	0.12
	1 January 2024 0:10	12.11	30.96	0.595	3.058	7.34	86.91	0.09
	......	......	......	......	......	......	......	......
	19 December 2024 10:01	12.31	32.24	0.559	2.716	13.77	80.11	0.26
	19 December 2024 10:10	12.31	32.07	0.551	2.697	13.23	82.91	0.20

Table 2. Processed soil and meteorological data example.

Time	Soil Temperature	Soil Moisture	EC	Pore Water EC	Air Temperature	Air Humidity	VPD
1 January 2024 00:00:00	12.054	26.184	0.465	2.813	7.197	87.365	0.087
1 January 2024 01:00:00	11.764	26.183	0.465	2.810	6.595	89.218	0.065
......	......	......	......	......	......	......	......
19 December 2024 08:00:00	11.582	28.499	0.467	2.659	9.858	88.703	0.090
19 December 2024 09:00:00	11.603	29.613	0.500	2.707	12.520	83.682	0.183
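
The step from the 10-minute, per-node readings of Table 1 to the hourly rows of Table 2 can be illustrated with a minimal sketch. It assumes that each processed value is the mean of all readings from the three nodes that fall within the same hour; the sample rows are consistent with this reading, but the tables themselves do not state the rule. The function name `hourly_mean` and the `sample` list are hypothetical and serve only this illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean


def hourly_mean(records):
    """Average (timestamp, value) readings from all nodes into hourly bins.

    Assumption: plain arithmetic mean over every reading in the hour,
    regardless of which node produced it.
    """
    buckets = defaultdict(list)
    for ts, value in records:
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: fmean(values) for hour, values in sorted(buckets.items())}


# Soil moisture readings copied from the first two rows of each node in Table 1.
sample = [
    (datetime(2024, 1, 1, 0, 0), 19.06), (datetime(2024, 1, 1, 0, 10), 19.06),  # Node 1
    (datetime(2024, 1, 1, 0, 0), 28.52), (datetime(2024, 1, 1, 0, 10), 28.52),  # Node 2
    (datetime(2024, 1, 1, 0, 0), 30.97), (datetime(2024, 1, 1, 0, 10), 30.96),  # Node 3
]

result = hourly_mean(sample)
for hour, value in result.items():
    print(hour, round(value, 3))
```

With only the two visible timestamps per node, the sketch gives a soil moisture of about 26.18 for the first hour, close to the 26.184 reported in Table 2 for the full hour.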

Table 3. Pearson correlation coefficient standards.

The Range of Values for ρ_p	Degree of Correlation
[−1, −0.6]	Strong Negative Correlation
(−0.6, −0.4]	Moderate Negative Correlation
(−0.4, −0.2]	Weak Negative Correlation
(−0.2, 0.2)	Very Weak Correlation
[0.2, 0.4)	Weak Positive Correlation
[0.4, 0.6)	Moderate Positive Correlation
[0.6, 1]	Strong Positive Correlation
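
The bands in Table 3 translate directly into a lookup. The following is a minimal sketch that computes the sample Pearson coefficient with the standard formula and maps it to the labels of Table 3; the function names `pearson` and `correlation_degree` are hypothetical, and the sketch is not the authors' implementation.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    # Clamp to [-1, 1] to absorb floating-point round-off.
    return max(-1.0, min(1.0, cov / (norm_x * norm_y)))


def correlation_degree(rho):
    """Map a coefficient to the interval labels of Table 3."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    if rho <= -0.6:
        return "Strong Negative Correlation"    # [-1, -0.6]
    if rho <= -0.4:
        return "Moderate Negative Correlation"  # (-0.6, -0.4]
    if rho <= -0.2:
        return "Weak Negative Correlation"      # (-0.4, -0.2]
    if rho < 0.2:
        return "Very Weak Correlation"          # (-0.2, 0.2)
    if rho < 0.4:
        return "Weak Positive Correlation"      # [0.2, 0.4)
    if rho < 0.6:
        return "Moderate Positive Correlation"  # [0.4, 0.6)
    return "Strong Positive Correlation"        # [0.6, 1]
```

The comparison operators follow the open and closed ends of each interval in Table 3, so a coefficient of exactly 0.2 is labeled weak positive and a coefficient of exactly −0.6 is labeled strong negative.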

Table 4. Model performance metrics.

Model	MSE	MAE	R²
GRU	0.0040	0.0505	0.9456
LSTM	0.0021	0.0351	0.9708
CNN–LSTM	0.0017	0.0302	0.9763
LightGBM	0.0031	0.0416	0.9572
XGBoost	0.0030	0.0412	0.9593
BOA–XGBoost	0.0022	0.0349	0.9701
CNN–LSTM–BOA–XGBoost	0.0017	0.0293	0.9771
PCLBX	0.0016	0.0288	0.9778
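
For readers who wish to check these figures, the sketch below gives the standard definitions of MSE, MAE, and R² and recomputes the improvements of PCLBX over three baselines from the values in Table 4. The function names and the `TABLE_4` dictionary are hypothetical helpers for this illustration; the metric formulas are the textbook ones and are assumed, not confirmed, to match the authors' evaluation code.

```python
def mse(y_true, y_pred):
    """Mean square error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# (MSE, MAE, R²) copied from Table 4.
TABLE_4 = {
    "CNN–LSTM": (0.0017, 0.0302, 0.9763),
    "BOA–XGBoost": (0.0022, 0.0349, 0.9701),
    "CNN–LSTM–BOA–XGBoost": (0.0017, 0.0293, 0.9771),
    "PCLBX": (0.0016, 0.0288, 0.9778),
}


def improvement_over(baseline, model="PCLBX"):
    """Return (MSE reduction, MAE reduction, R² gain) of model over baseline."""
    b_mse, b_mae, b_r2 = TABLE_4[baseline]
    m_mse, m_mae, m_r2 = TABLE_4[model]
    # Round to the four decimals reported in Table 4.
    return (round(b_mse - m_mse, 4), round(b_mae - m_mae, 4), round(m_r2 - b_r2, 4))


for name in ("CNN–LSTM", "BOA–XGBoost", "CNN–LSTM–BOA–XGBoost"):
    print(name, improvement_over(name))
```

The three differences agree with the reductions in MSE and MAE and the gains in R² quoted in the abstract, where CNN–LSTM–BOA–XGBoost is the equal-weight combination.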

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

