Next Article in Journal
Prehistoric Plant Exploitation and Domestication: An Inspiration for the Science of De Novo Domestication in Present Times
Next Article in Special Issue
Enhancing the Adaptability of Tea Plants (Camellia sinensis L.) to High-Temperature Stress with Small Peptides and Biosurfactants
Previous Article in Journal
Supplementation of the Plant Conditioner ELICE Vakcina® Product with β-Aminobutyric Acid and Salicylic Acid May Lead to Trans-Priming Signaling in Barley (Hordeum vulgare)
Previous Article in Special Issue
Effects of Magnesium on Transcriptome and Physicochemical Index of Tea Leaves
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Improved SVM-Based Soil-Moisture-Content Prediction Model for Tea Plantation

1
Electronic Information School, Wuhan University, Wuhan 430072, China
2
School of Automatic Control, Liuzhou Railway Vocational Technical College, Liuzhou 545616, China
Plants 2023, 12(12), 2309; https://doi.org/10.3390/plants12122309
Submission received: 17 May 2023 / Revised: 4 June 2023 / Accepted: 12 June 2023 / Published: 14 June 2023
(This article belongs to the Special Issue Tea Plants Response to Abiotic Stress)

Abstract

:
Accurate prediction of soil moisture content in tea plantations plays a crucial role in optimizing irrigation practices and improving crop productivity. Traditional methods for SMC prediction are difficult to implement due to high costs and labor requirements. While machine learning models have been applied, their performance is often limited by the lack of sufficient data. To address the challenges of inaccurate and inefficient soil moisture prediction in tea plantations and enhance predictive performance, an improved support-vector-machine- (SVM) based model was developed to predict the SMC in a tea plantation. The proposed model addresses several limitations of existing approaches by incorporating novel features and enhancing the SVM algorithm’s performance, which was improved with the Bald Eagle Search algorithm (BES) method for hyper-parameter optimization. The study utilized a comprehensive dataset comprising soil moisture measurements and relevant environmental variables collected from a tea plantation. Feature selection techniques were applied to identify the most informative variables, including rainfall, temperature, humidity, and soil type. The selected features were then used to train and optimize the SVM model. The proposed model was applied to prediction of soil water moisture in a tea plantation in Guangxi State-owned Fuhu Overseas Chinese Farm. Experimental results demonstrated the superior performance of the improved SVM model in predicting soil moisture content compared to traditional SVM approaches and other machine-learning algorithms. The model exhibited high accuracy, robustness, and generalization capabilities across different time periods and geographical locations with R2, MSE, and RMSE of 0.9435, 0.0194 and 0.1392, respectively, which helps to enhance the prediction performance, especially when limited real data are available. The proposed SVM-based model offers several advantages for tea plantation management. It provides timely and accurate soil moisture predictions, enabling farmers to make informed decisions regarding irrigation scheduling and water resource management. By optimizing irrigation practices, the model helps enhance tea crop yield, minimize water usage, and reduce environmental impact.

1. Introduction

Plant science encompasses a wide range of disciplines aimed at understanding the biology, ecology, and interactions of plants. Over the years, advancements in technology have provided researchers with new tools and techniques to explore the intricate world of plants. In recent times, machine learning (ML) and deep learning (DL) have emerged as powerful computational approaches with the potential to transform plant science research.
ML and DL are subsets of artificial intelligence (AI) that involve the development of algorithms and models capable of learning from and making predictions or decisions based on data. ML algorithms use statistical techniques to enable computers to learn patterns and relationships in data without being explicitly programmed, while DL, inspired by the structure of the human brain, utilizes artificial neural networks to model complex patterns and hierarchies in data.
The availability of large-scale datasets, coupled with advancements in computational power and algorithmic improvements, has made ML and DL highly effective in various fields. In plant science, these techniques have shown great potential in addressing challenges related to plant phenotyping, disease diagnosis, yield prediction, genomics, and crop improvement.
Sunny et al. proposed a new method for plant leaf recognition based on CapsNet, which consists of four parts: image sampling, image processing, feature extraction, and feature recognition [1].
Horng et al. proposed a harvesting system based on the Internet of Things technology and smart image recognition. The method uses an intelligent recognition model trained by a neural network, and then the system confirms whether the crop can be harvested, and finally uses a robotic arm to harvest the crop [2].
Disease diagnosis and pest management are critical aspects of plant science, as they directly impact crop health and productivity. Disease detection is conducted by experts using the naked eye, which is both time-consuming and troublesome [3].
ML and DL algorithms can analyze various data sources, including images, genomic data, and environmental parameters, to detect and classify plant diseases, assess disease severity, and optimize pest control strategies. These techniques provide valuable insights for early detection and timely intervention, leading to improved disease-management practices. K-Means [4], SVM [5], neural network [5], and Otsu’s thresholding algorithm [6] are the methods most used to detect and classify disease based on the features extracted from the image.
Convolution Neural Network Approaches (CNNS) is performed to select the dataset for training. Some architectures such as AlexNet [7,8], GoogLeNet [7,8], Inception v3 [9], CafeNet [10], and deep meta-architectures [11] are used to detect plant disease based on factors such as error rate, accuracy, and training time.
Van Dijk et al. summarized the application of machine learning in the context of plant science and plant breeding, which offers a suite of methods that enable researchers to find meaningful patterns in relevant plant data [12].
Esami et al. discussed and highlighted the application of AI-OA models in different steps of plant tissue culture, which provides a new view for future study objectives [13].
Singh et al. gave an overview of machine learning in the field of plant stress phenotyping for identification, classification, quantification, and prediction, which enable further exploration of these tools for facilitating practical use in plant breeding [14].
Jafari et al. presented a modeling and predicting morphological response of citrus to drought stress with different Artificial Neural Networks (ANNs), including Generalized Regression Neural Network (GRNN), Radial Basis Function (RBF), and Multilayer Perceptron (MLP), which are considered to be an effective tool for predicting plant morphological and physiological responses to drought stress [15].
Hesami et al. provided in-depth insight into plant system biology using machine learning, which is a good solution for studying plant system biology [16].
Grinblat and colleagues introduced a plant identification approach that concentrates on the vein patterns of leaves. They utilized deep convolutional neural networks to improve the recognition accuracy of legume species. This technique was reported to enhance the current state-of the art [17].
Mishra et al. proposed a hybrid approach in plant immune systems using Machine Learning, which provides new insights for understanding plant–microbe interactions [18].
Soil moisture content is one of the key parameters of plant growth. Machine learning and deep learning have also shown advantages in predicting soil water content. Soil moisture vastly affects tea growth, yield, and quality. Variations in soil moisture conditions within tea plantations directly affect the fertility of tea trees, and as a result, tea yield and quality. Appropriate soil moisture enhances tea yield and quality. Conversely, insufficient moisture retards tea tree growth, causes diseases, and even halts growth. Optimal soil moisture not only improves the root system’s capacity for nutrient absorption and utilization but also improves soil nutrient exploitation [19]. Therefore, precise soil moisture monitoring and management are imperative farming practices for high yield and excellent quality tea. Technological advances such as moisture sensors and computer modeling tools present constructive solutions for more effective irrigation management in tea plantations.
Various modeling techniques have been applied to predict SMC, including multiple linear regression (MLR), artificial neural network (ANN) models, and support vector machines (SVM) [20]. Among these techniques, SVM has shown superior performance due to its ability to handle small datasets and nonlinear complex relationships [21]. In addition, many scholars have proposed some solutions in soil water content prediction methods. Zhang, X., et al. presented that Conventional soil moisture measurement techniques like the weighing method, the tensiometer method, the resistance method, the neutron method, the standing wave ratio method, and the time domain reflection method are limited in measurement depth and scale [22].
Xu Qiao et al. proposes a soil moisture prediction approach based on multiple regression analysis to determine influential GPR indicators and radial basis function (RBF) neural network modeling using these indicators and measured soil moisture content [23]. Eftychia Taktikou et al. evaluates the ability of remotely sensed soil moisture to indicate in situ soil and canopy water content at fine spatial and temporal scales. Daily MODIS NDVI and diurnal land surface temperature difference were used to retrieve daily soil moisture, which was compared with in situ measurements; the results show the retrieved soil moisture can satisfactorily predict measured soil moisture and inform irrigation management [24]. Sankhadeep Chatterjee et al. developed a rapid, non-contact machine vision and AI-based method to estimate soil moisture from a single image. Three soil textures and organic matter levels were tested. Color features were extracted and used in ANFIS and multiple regression models to predict soil moisture, allowing soil moisture estimation from a single image [25]. Liang-hong C et al. used discrete wavelet transformation to decompose spectra, competitive adaptive reweighted sampling (CARS) to select key wavelengths, and partial least squares regression (PLSR) to model SMC, and then ELM to construct a nonlinear model for soil water content prediction, which provided an accurate SMC estimation model for arid soils [26]. Prakash S et al. used multiple linear regression, support vector regression and recurrent neural networks for prediction of soil moisture, which provided a good performance [27]. Data mining techniques and multiple linear regression were used to estimate tea yield analysis of four regions of Assam [28]. A multiple regression model was used to predict the average soil water content in the root zone, using soil water content at 5 cm depth, effective rainfall, and evapotranspiration as predictive variables [29]. L Esmaeelnejad et.al applied Multiple Linear Regression (MLR), Artificial Neural Network (ANN) and the Rosetta model to predict soil moisture [30]. T Aslan used SVR derived by Chaotic approach for the short term forecast of the soil water content [31]. Rf A et al. presented water management parameters prediction for irrigation management with RF (random forest), cubist (cubist regression), and GBM (gradient boosting machine) [32]. GAO Peng et.al establish prediction models of soil moisture content with long short-term memory (LSTM) and general regression neural network (GRNN), which has high accuracy and can be helpful for guiding irrigation and fertilization management [33].
Although the aforementioned method can improve the precision of soil moisture content assessment, it has limitations that need consideration.
The effectiveness of machine learning models depends on an adequate amount of high-quality training data. The caliber of the training data significantly resonates with the prediction outcomes of the model. Frequently, the training data may not satisfy the demands of machine learning models due to lack of information. Since inadequate training data can prevent models from precisely distinguishing distributional roles of data, it can cause poor generalization, overfitting, and unsatisfactory prediction outcomes.
Moreover, hyper-parameters’ choices in machine learning models significantly affect prediction outcomes. Incorrect hyper-parameter selection, such as learning rate, regularization coefficient, neural network number, network depth, etc., can lead to unsatisfactory prediction outcomes. The hyper-parameter selection remains an arduous and crucial concern in machine learning studies.
Therefore, this research aims to: 1. Develop an improved SVM model by incorporating meteorological factors for SMC prediction in tea plantations; 2. Investigate the use of population intelligence optimization algorithms to optimize machine learning hyper-parameters; and 3. Compare the performance of the improved SVM model with other existing models.

2. Research Object

2.1. Study Area Description

The Liucheng County is strategically situated in the subtropical monsoon zone of Guangxi, China—an area known for its remarkable climatic conditions. The county enjoys moderate temperatures and abundant light and rainfall, which are favorable for plant growth. However, the rainfall distribution across seasons tends to be uneven, leading to occasional seasonal water shortages. With an annual rainfall of 1300–1500 mm and an average temperature of 20 °C, the climate of the area is well-suited for the cultivation of various plants. The research area for this paper is the 58.4-hectare tea plantation located in Guangxi State-owned Fuhu Overseas Chinese Farm (N 109°21′, E 24°82′). The plantation is situated within Liucheng County, Guangxi, a region characterized by a subtropical humid climate. The area experiences an annual average temperature of 20.4 °C, with the hottest month having an average temperature of 28.2 °C. The annual average relative humidity is 77%, and the average annual sunshine duration is 1820 h. Despite adequate light, drought conditions still arise occasionally, particularly between March and May, due to irregular temporal rainfall distribution [34].

2.2. Soil Information Data System (SIDS)

To collect soil data from the tea plantation for this study, soil temperature sensors, soil moisture sensors, and soil electrical conductivity sensors were installed within the plantation. However, tea plantations are usually situated in hilly or sloping regions, which can create obstacles that affect wireless sensor node communication. To address this issue, wireless transmission and direct current (DC) line communication technology were utilized in tandem for dual data transmission. For this study, MEC20 sensors, manufactured by Dalian Zheqin Technology Co., were used to collect soil temperature, soil moisture content (SMC), and soil electrical conductivity (SEC) data. The sensors were selected based on their voltage output, cable length, and measuring range of SMC and SEC, all of which satisfy the necessary data collection requirements for the tea plantation. Voltage output sensors with a 6 m cable were chosen based on the measuring range of 0–100% for SMC and 0–5000 μS/cm for SEC. To ensure comprehensive and effective monitoring, the number and deployment of sensors were based on the actual plantation area. To adequately cover the Liuzhou tea plantation’s actual area, five sensor nodes were strategically deployed in a distributed pattern, with each node responsible for obtaining soil indicators for its assigned subarea.
First, divide the tea plantation into smaller zones based on factors such as topography, soil type, and any other relevant characteristics that may affect soil conditions. The number of zones will depend on the variability of the plantation.
Secondly, select sensor locations: Identify key representative locations within each zone where soil conditions can be measured effectively. The goal is to capture the spatial variability within each zone while minimizing the number of sensors. Consider factors such as slope, elevation, soil texture, and known variations in soil properties.
Lastly, decide on the depth at which you want to measure soil conditions. To monitor soil moisture in the root zone of tea trees accurately, sensors were buried at a depth of 10–15 cm. The suggestion of using sensors at varying depths enables the average soil moisture conditions to be obtained at different soil layers in the root zone [34].
It’s important to note that while five sensors provide valuable information, they may not capture the entire complexity of soil conditions across the entire tea plantation. However, this approach can help identify trends, variations, and patterns within the selected zones, guiding management decisions related to irrigation, fertilizer application, and overall soil health maintenance.
In addition, an integrated field weather monitor provided by Guangzhou Hairui Intelligent Technology Co., Ltd. is also installed, which is responsible for collecting environmental parameters related to this project, mainly including natural environmental parameters such as air temperature, air humidity, light intensity, rainfall, air pressure, wind speed, and wind direction [19].

3. Methods

The proposed model is presented in Figure 1.
Figure 1 shows the proposed model, which consists of sensors, data processing, feature input, support vector machine, swarm intelligence optimization, and result output. Firstly, the sensor deployed in the monitored area is used to collect data from the monitored area, and after data processing, the features are extracted and input to the crowd intelligence algorithm optimization after the support vector machine model is optimized to predict and obtain the prediction results. Data inputs include air temperature, relative humidity, rainfall, soil temperature and soil electrical conductivity, and the output of the data is soil water moisture.

3.1. Support Vector Machines

Support Vector Machines (SVMs) are a type of supervised machine learning algorithm utilized for classification and regression tasks. SVMs construct optimal hyperplanes that separate classes while simultaneously maximizing margins. With the application of kernel tricks, SVMs can handle complex, nonlinear problems and adapt to various classification tasks [19]. As a result, they are highly accurate and do not suffer from overfitting issues. Consequently, SVMs are an adaptable tool for many classification scenarios, providing precise solutions with minimal overfitting. To put it concisely, SVMs are a versatile and accurate machine learning algorithm used for classification and regression tasks.
Assume that the input sample is defined as Equation (1).
D = { ( x 1 , y 1 ) , ( x 1 , y 1 ) , , ( x 1 , y 1 ) } y i R
Therefore, the goal is to find a function f ( x ) whose value is as close to y as possible, defined as in Equation (2).
f ( x ) = w T x + b
The weight and bias values of the SVM model are represented by the variables “w” and “b”, respectively.
Equations (3) and (4) express the optimization problem:
min w , b 1 2 w 2 + C i = 1 m ( ξ i + ξ ^ i )
s t . { f ( x i ) y i ε + ξ i y i f ( x i ) ε + ξ ^ i ξ i 0 , ξ ^ i 0 , i = 1 , 2 , , m
where C represents the penalty factor, while ξ i and ξ ^ i represent the introduced relaxation in the problem.
Equation (5) illustrates the pairwise problem from the above-mentioned problem
Max [ 1 2 i , j = 1 m ( a i a ^ i ) ( a j a ^ j ) K ( x i , x j ) ε i = 1 m ( a i + a ^ i ) + i = 1 m y i ( a i a ^ i ) ] s . t { i = 1 m ( a i a ^ i ) = 0 , 0 a i c , 0 a ^ i c
where a i and a ^ j are Lagrangian multipliers. The Lagrangian multipliers are represented by the variables a i and a ^ j , respectively. K x i , x j is a kernel function.
Suppose the optimal solution is a = a 1 , a 2 , , a m , a ^ = a ^ 1 , a ^ 2 , , a ^ m . By adjusting the optimal solution, w * and b * are defined as Equations (6) and (7)
b * = 1 N n 0 < α i < C   y i x i   α i α i * K x i , x j ε + 0 < α i < C C   y i x i   α i α i * K x i , x j + ε
w * = i = 1 m a i a ^ i φ x i
where N n is sample data of SVM.
Therefore, the decision function for SVR is defined as Equation (8).
f x = w * · x + b * = i = 1 m a i a ^ i φ x i φ x j + b * = i = 1 m a i a ^ i K x i , x j + b *
The performance of the SVR model depends on the degree of match of the kernel function and the selection of the C value. Proper selection of the kernel function can effectively deal with nonlinear regression problems. However, if the value of C is too large, overfitting can easily occur, which reduces the generalization ability of the model. On the other hand, if the value of C is too small, underfitting occurs, and the model’s performance deteriorates. The selection of the C value achieves a balance between controlling the complexity of the model and generalization. Both factors determine the effectiveness of SVR. Therefore, optimizing these two parameters is necessary to obtain the best performance.

3.2. Bald Eagle Search Algorithm

The Bald Eagle Search algorithm, commonly known as BES, is renowned for its exceptional ability to conduct a global search and its effectiveness in solving multifaceted numerical optimization problems. An algorithm that has powerful global search capabilities can effectively solve various complex numerical optimization problems. The algorithm comprises of three phases, namely the selection, search, and swooping phases [19,35,36,37].
During the selection phase, the bald eagle initiates the optimization process by randomly selecting an area and evaluating the prey population to determine the best possible position to locate prey. The primary factors considered in determining the bald eagle’s position at this stage are experience and position change parameters (refer to Equation (9)).
P i , n e w = P b e s t + α × γ × ( P m e a n P i )
The parameter α signifies the control for changing the position and has a value between 1.5 and 2. The variable γ represents a random number ranging from 0 to 1. During the optimization process, P b e s t denotes the bald eagle’s previously identified best search position, P m e a n indicates the prominence of information derived from the previous points, and P i refers to the i-th bald eagle’s position.
During the search phase, the bald eagle intensifies its search for potential prey to expedite the optimization process in the form of a spiral fly and identify the optimal swooping location. The mathematical model of spiral flight is represented by polar equations. The key parameters in the polar equation are polar angle θ i and polar diameter r i . Equation (10) plays a crucial role in this phase by determining the most suitable location for the eagle based on various parameters.
P i , n e w = P i + x ( i ) × ( P i P m e a n ) + y ( i ) × ( P i P i + 1 ) x ( i ) = x r ( i ) max ( | x r | )    y ( i ) = y r ( i ) max ( | y r | ) xr ( i ) = r ( i ) × sin ( θ ( i ) )    yr ( i ) = r ( i ) × cos ( θ ( i ) ) r ( i ) = θ ( i ) + R × r a n d    θ ( i ) = a × π × r a n d
In the above equation, θ i and r i represent the polar angle and polar diameter of the spiral equation, respectively. The parameter φ determines the angle between the search point and the central point, taking a value between 5 and 10. The parameter R determines the number of search cycles and takes a value between 0.5 and 2. Rand is a random number ranging from 0 to 1. Additionally, x i and y i indicate the bald eagle’s position in polar coordinates, and P i + 1 refers to the i-th bald eagle’s next updated position.
During the swooping stage, bald eagles swiftly swoop from the best spots in the search space towards the prey, while other species move towards the optimal spots and execute their attack on the prey. The position of the bald eagle during this stage is evaluated using Equation (11).
P i , n e w = rand × P b e s t + x 1 ( i ) × ( P i c 1 × P m e a n ) + y 1 ( i ) × ( P i c 2 × P b e s t ) x 1 ( i ) = x r ( i ) max ( | x r | )    y 1 ( i ) = y r ( i ) max ( | y r | ) xr ( i ) = r ( i ) × sin h ( θ ( i ) )    yr ( i ) = r ( i ) × cos h ( θ ( i ) ) θ ( i ) = a × π × r a n d r ( i ) = θ ( i )
The variables c1 and c2 denote the bald eagle’s exercise intensity concerning the optimal and central positions. The exercise intensity can have a value ranging between 1 and 2.

3.3. Modifications Bald Eagle Search Algorithm (MBESMBES)

In order to improve the convergence and convergence rate of BES during the optimization process, two optimizations have been made to the BES algorithm. The first optimization is for the initialization of the BES population positions, while the second is for the update of positions during the prey process.
(1)
Optimization of Population Initialization Positions
The initialization of population positions is an important step towards swarm intelligence algorithms. The quality of the initialized positions has a significant effect on the convergence, convergence rate, and solution quality of the genetic algorithm. If the initial population positions are too centralized, the algorithm is prone to getting stuck in local optima, causing a decrease in accuracy at the cost of faster convergence. On the other hand, if the initial population positions are too dispersed, it is difficult for individuals to exchange information, making it challenging to find the global optimum direction, and requiring more time to find the optimal solution. This phenomenon can affect the convergence and convergence rate. Therefore, the choice of initial population positions should consider both convergence and convergence rate. Reasonable initialized population positions can balance the convergence rate with accuracy to find the global optimum direction gradually and achieve the optimal convergence effect. The more evenly distributed the population positions, the higher the coverage rate of the area, increasing the likelihood of finding optimal solutions. This enhances the probability of finding the optimal solution and speeds up the algorithm’s convergence.
In order to evenly distribute the initialization position of the vulture in the search area, the Tent chaotic mapping function is introduced to adjust the vulture group to enhance the diversification of the vulture group, avoid repetition, and help improve the global search ability of the algorithm. The introduction of random numbers can solve the problem of tent mapping in small cycle periods and fixed points. The mathematical expression for improving the tent map is shown in Equation (12)
z n + 1 = 2 z n + r a n d / N 0 z n < 0.5 2 1 z n + r a n d / N 0.5 z n 1
where rand is a random number of the [0, 1] range, z represents the iteration value, and N is the total number of particles in the chaotic sequence.
For the specified search area, when the number of optimization parameters has only one parameter, the initialization position is shown in the Formula (13)
P o s i t o n = L b + z ( U b L b )
where L b is the lower limit of the search area, and U b is the upper limit of the search area.
When the number of optimized parameters is two or more, the initialization of parameters is shown in Equation (14).
P o s i t o n = X l b + z . ( X u b X l b )
where
X l b = r e p m a t ( L b , S e a r c h A g e n t s _ n o , 1 )
X u b = r e p m a t ( U b , S e a r c h A g e n t s _ n o , 1 )
(2)
Optimization of optimal position update
When the vulture selects the search area in the selection stage, it uses a random number γ to perform a spatial search, and the global search ability is limited. To solve this problem, the Fuch chaos map in the chaos map is introduced to optimize the random number γ . Fuch chaotic mapping can balance the search area, convergence is fast, and is not sensitive to initial values. Considering that the Fuchs chaos map generates negative numbers, the resulting chaotic sequences must be processed with absolute values. The specific definition is shown in Equation (17)
f i = c o s 1 f i 1 2
where f (1) is a random number of [0, 1].
The update of the position of vultures in the hunting stage has a certain influence on the global optimal position and the average distribution position of all vulture groups. Therefore, a weight parameter ω is introduced, defined in Equation (18).
ω = 0.9 cos π 2 · 1 t T · f i t i G B e s t a v g f i t G B e s t t T   f i t i < a v g f i t 0.9 cos π 2 · 1 t T f i t n e s s i < a v g f i t
From the characteristics of the cos function, it is known that at first the weight is small, the optimization speed is fast, and with the increase of the number of weights, the optimization speed is slower, so as to have a good convergence balance. At the same time, considering the individual adaptation value of the vulture, the weight value is small for the less adaptation value, and the weight value of the large adaptation value is large, which ensures that the vulture has a good performance to approach the search area.
Thus, the vulture position at this time is calculated by the Formula (19).
P i ,   new   = r a n d × P best   + ω × x 1 ( i ) × P i c 1 × P mean   + ω × y 1 ( i ) × P i c 2 × P best  

3.4. MBESMBES-SVM Model

The improved vulture-search algorithm is used to optimize the kernel function and penalty factor in SVM. In the optimization process of the SVM model, MSE is used as the fitness function of MBES algorithm, which is defined in Equation (20) [19,35]
f i t n e s s = arg min ( M S E p ) = 1 M n = 1 M ( y p y r ) 2
where y p represents the predicted value, y r represents the measured value, and M is the total number of data samples.
The MBES–SVM algorithm uses MBES to automatically search for the optimal parameters of an SVR model. The BES algorithm performs iterations and updates the parameters based on the mean squared error (MSE) calculations on the test set. After a certain number of iterations, it finds a parameter configuration that yields the lowest MSE and highest predictive accuracy of the SVR model. The optimized SVR model with the selected parameters can then be used for new prediction tasks.
The steps of the algorithm are as follows:
Step 1: The data is divided into training and test sets.
Step 2: Define the Hyper-Parameter Space: Determine the hyper-parameters that need to be tuned and their respective ranges or discrete values.
Step 3: Set the control parameters of BES and initialize the Bald Eagle Population: Generate an initial population of bald eagles. Each bald eagle represents a candidate solution, which is a set of hyper-parameters.
Step 4: Evaluate Fitness: Evaluate the fitness of each bald eagle in the population. Fitness is determined by training and evaluating the model using the corresponding set of hyper-parameters. The fitness function should be defined by Equation (20).
Step 5: Update the position and speed of the bald eagle. Update the position (hyperparameter configuration) and speed (search direction) of each condor according to its current position, speed, and adaptability.
Step 6: Perform Exploration and Exploitation: The BES algorithm balances exploration and exploitation to search for better solutions. After the exploration and exploitation phases, reassess the fitness of the bald eagle and update it if any bald eagle achieves better performance.
Step 7: Repeat the Steps: Repeat steps 5 and 6 for a fixed number of iterations or until a termination criterion is met. The termination criterion can be a maximum number of iterations or reaching a desired level of performance.
Step 8: Extract the Best Solution: Once the algorithm terminates, extract the hyper-parameter set from the elite bald eagle, as it represents the best-performing solution obtained by the BES algorithm.
Step 9: Train and Evaluate the Model: Use the best hyper-parameter set obtained from the BES algorithm to train the model on the entire training dataset. Finally, evaluate the performance of the trained model on the test dataset or through cross-validation.

4. Experimental Analysis

4.1. Experimental Data

The data in this paper were obtained by SIDS mentioned in Section 2.2. The data include the ST (soil temperature), SMC (soil moisture content), and SEC (soil electrical conductivity), air humidity, rainfall, air humidity and light intensity of the tea plantation, which were measured from 20 August 2020 to 27 March 2021, with a total of 18,345 sets of data. After processing the data daily, 229 sets of data samples were generated and 216 sets of data were left after some abnormal data was clear. Some of the data are shown in Table 1.

4.2. Experimental Environment

The performance implementation of the algorithm was completed under Matlab 2014. Two experimental platforms were executed on a computer system operating under Windows 11. The computer’s CPU is an Intel(R) Core (TM) i7-9750H with a frequency of 2.60 GHz and 2.59 GHz, and a memory of 32 G.
The specific parameters of MBES-SVM, SVM, particle swarm optimization-support vector machines (PSO-SVM), and genetic algorithm-optimization support vector machine (GA-SVM) are shown in Table 2.

4.3. Evaluation Indicators

The model’s performance evaluation utilizes four distinct indicators, namely the root mean square error (MSE), root mean square error (RMSE), and R-square (R2). These factors can be calculated using Equations (21)–(23).
M S E = 1 n i = 1 n ( y i y ^ i ) 2
RMSE = 1 n i = 1 n ( y i y ^ i ) 2
R 2 = i = 1 n ω i ( y ^ i y ¯ i ) 2 i = 1 n ω i ( y i y ¯ i ) 2
In the aforementioned equations, n represents the number of samples in a given dataset. The variable y ^ indicates the predicted value, while y represents the mean of the measured value of a particular feature. Finally, the variable y represents the measured value of a given feature.

4.4. Experimental Analysis

The initial 70% chronologically organized data points were designated as the training dataset, while the subsequent 30% data points were identified as the test dataset. The training dataset was then utilized to train the MBES-SVM network, SVM, PSO-SVM, and GA-SVM models. In PSO-SVM, each particle represents a candidate solution, and the optimal SVM model parameters are found by iteratively updating the position and velocity of the particles. In GA-SVM, the SVM model parameters are optimized through coding and genetic manipulation, such as crossover and mutation, and individuals with high fitness are selected to generate next-generation solutions. The MBES-SVM, PSO-SVM, and GA-SVM models differ in the optimization process. Differences in these algorithms can lead to different performances on different problems and datasets. Which model to use is depended on the specific application scenario and problem requirements.
Upon completion of training, the test dataset was applied to evaluate the performance of the mentioned model in order to generate the results illustrated in Table 3. To rigorously assess the performance of the model, repeated experiments are necessitated, of which the mean value is adopted as the conclusive outcome.
As can be seen from the results in Table 3, MBES-SVM exhibits superior performance, with R2, MSE and RMSE of 0.8675, 0.0178 and 0.1333, SVM 0.8675, 0.0178, 0.1333, PSO-SVM 0.5662, 0.0711, 0.2666, and GA-SVM 0.5627, 0.0578 and 0.24.4, respectively. The PSO-SVM and GA-SVM algorithms are similar, SVMs are slightly better than both, and the MBES-SVM algorithm achieves the highest performance. The MBES-SVM model posited within this examination demonstrates the optimal predictive capacity, intimating that the model possesses profoundly robust prognostic ability, and the predictive efficacy is conclusively viable.
The graphical illustrations in Figure 2 and Figure 3 delineate the outcomes of the prognostic test for the MBES-SVM model as well as the dispersal of prognostic errors in the MBES-SVM model test. From the figures, it is conspicuous that the anticipated values yielded by the MBES-SVM model are congruous with the authentic values, in which the margin of errors is regulated within ±0.2 with trivial deviations, thereby validating the viability and potency of this framework.
Using a very small number of sets as training and measurement sets, we conducted experiments and recorded the results. We randomly selected 10 sets of data and used 1, 2, 3, 4 and 5 data as the training set and the rest as the test set. The results are given in the following Table 4.
Although the training data is small, the prediction of the algorithm is still relatively satisfactory.
We did further experiments with a small training set to predict the effect of more test data. We randomly selected 216 sets of data and used 1, 2, 3, 4 and 5 data as the training set and the rest as the test set. The results are given in the following Table 5.
From the results of the experiment, the prediction is also satisfactory.
The MBES-SVM model proposed in this manuscript has the best prediction performance, which indicates that the model has good prediction ability, and the prediction performance is effective.

5. Conclusions

In summary, the study developed an improved SVM model by incorporating meteorological factors such as air temperature, relative humidity, rainfall, and strength of illumination as input, along with the traditional input parameters of soil properties such as soil temperature and soil electrical conductivity to predict the soil moisture content (SMC) in a tea plantation. The model was trained and tested using data from the tea plantation. Compared to other existing models, the improved model achieved a higher accuracy in SMC prediction with R2, MSE, and RMSE of 0.8675, 0.0178 and 0.1333, respectively.
The kernel function and penalty parameter embedded in the SVM model were fine-tuned by means of the bald eagle search algorithm; thereafter the enhanced MBES-SVM model was adopted for anticipating the soil moisture content in tea farms. The experiments substantiated that this model showcases sterling proficiency and exceptional generalization aptitude. It also investigated the viability of applying this model to forecast soil moisture content.
An advanced SVM technique for SMC modeling developed in this paper utilizes both soil properties and meteorological inputs. The improved model demonstrated promising potential for real-time SMC monitoring in tea plantations. Further enhancements in model parameterization and validation across plantations would solidify its practical utility as a comprehensive tool to support sustainable water resource planning for agriculture.
Although the model shows the best performance compared to previous approaches, there are also limitations for this research.
Limited Generalization: One limitation of the proposed SVM model may be its ability to generalize to different tea plantations or regions. It is important to investigate whether the model performs well across diverse geographical locations or if it is specific to a particular dataset. A follow-up test could involve evaluating the model’s performance on soil moisture data from multiple tea plantations across different regions.
Data Availability and Quality: The availability and quality of data can significantly impact the performance of machine learning models. It is important to assess whether the proposed model is sensitive to variations in data availability and quality. A follow-up test could involve evaluating the model’s performance using different datasets with varying levels of data availability and quality.
Feature Selection: The paper may mention the features used for predicting soil moisture content. It is important to investigate the impact of feature selection on the model’s performance. A follow-up test could involve exploring different combinations of features or using feature selection techniques to identify the most relevant variables for predicting soil moisture content in tea plantations.
Model Comparison: Comparing the proposed SVM model with other machine learning algorithms commonly used for soil moisture prediction can provide insights into its effectiveness. A follow-up test could involve comparing the SVM model with algorithms like Random Forest, Neural Networks, or Gaussian Processes to determine which model performs better in terms of accuracy and computational efficiency.
Robustness to Environmental Factors: Tea plantations are subject to various environmental factors such as rainfall, temperature, and humidity. It is essential to assess the robustness of the proposed model under different environmental conditions. A follow-up test could involve evaluating the model’s performance during different seasons or under varying weather conditions to determine its reliability and stability.
Long-Term Predictability: Tea plantations require long-term soil moisture predictions for effective irrigation management. It would be valuable to assess the long-term predictability of the proposed SVM model. A follow-up test could involve training the model on historical data and assessing its performance in predicting soil moisture content for future time periods.
Scalability: The scalability of the model is an important consideration, especially if it is intended for real-world applications. It would be beneficial to evaluate the model’s performance when trained on larger datasets or when predicting soil moisture content for larger tea plantation areas. A follow-up test could involve scaling up the dataset and assessing the model’s computational efficiency and prediction accuracy.

Funding

This research was funded by 2018 Guangxi University High-Level Innovation Team and Excellence Scholars Program (Guijaioren (2018) No. 35).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Sunny, N.M. A Review on Deep Learning for Plant Species Classification using Leaf Vein. Int. J. Eng. Res. Technol. 2020, 9, 796–801. [Google Scholar]
  2. Horng, G.J.; Liu, M.X.; Chen, C.C. The smart image recognition mechanism for crop harvesting system in intelligent agriculture. IEEE Sens. J. 2019, 20, 2766–2781. [Google Scholar] [CrossRef]
  3. Hungilo, G.G.; Emmanuel, G.; Emanuel, A.W.R. Image processing techniques for detecting and classification of plant disease: A review. In Proceedings of the 2019 International Conference on Intelligent Medicine and Image Processing, Bali, Indonesia, 19–22 April 2019; pp. 48–52. [Google Scholar]
  4. Habib, M.T.; Majumder, A.; Jakaria, A.Z.M.; Akter, M.; Uddin, M.S.; Ahmed, F. Machine vision-based papaya diseaserecognition. J. King Saud Univ. Comput. Inf. Sci. 2020, 32, 300–309. [Google Scholar]
  5. Rothe, P.R.; Kshirsagar, R.V. Adaptive neuro-fuzzy inference system for recognition of cotton leaf diseases. In Proceedings of the 2014 Innovative Applications of Computational Intelligence on Power, Energy and Controls with Their Impact on Humanity (CIPECH), Ghaziabad, India, 28–29 November 2014; pp. 12–17. [Google Scholar]
  6. Kumar, A.; Sharma, M.; Meshram, M.R. Image processing based leaf rot disease, detection of betel vine (Piper betle L.). Procedia Comput. Sci. 2016, 85, 748–754. [Google Scholar]
  7. Zhang, X.; Qiao, Y.; Meng, F.; Fan, C.; Zhang, M. Identification of maize leaf diseases using improved deep convolutional neural networks. IEEE Access 2018, 6, 30370–30377. [Google Scholar] [CrossRef]
  8. Ma, J.; Du, K.; Zheng, F.; Zhang, L.; Gong, Z.; Sun, Z. Original papers A recognition method for cucumber diseases using leaf symptom images based on deep convolutional neural network. Comput. Electron. Agric. 2018, 154, 18–24. [Google Scholar] [CrossRef]
  9. Ramcharan, A.; Baranowski, K.; McCloskey, P.; Ahmed, B.; Legg, J.; Hughes, D. Using transfer learning for imagebased cassava disease detection. Front. Plant Sci. 2017, 8, 1852. [Google Scholar] [CrossRef]
  10. Mohanty, S.P.; Hughes, D.P.; Salathé, M. Using deep learning for image-based plant disease detection. Front. Plant Sci. 2016, 7, 1419. [Google Scholar] [CrossRef]
  11. Sladojevic, S.; Arsenovic, M.; Anderla, A.; Culibrk, D.; Stefanovic, D. Deep neural networks based recognition of plant diseases by leaf image classification. Comput. Intell. Neurosci. 2016, 2016, 613–623. [Google Scholar] [CrossRef]
  12. van Dijk, A.D.J.; Kootstra, G.; Kruijer, W.; de Ridder, D. Machine learning in plant science and plant breeding. Iscience 2021, 24, 101890. [Google Scholar] [CrossRef]
  13. Hesami, M.; Jones, A.M.P. Application of artificial intelligence models and optimization algorithms in plant cell and tissue culture. Appl. Microbiol. Biotechnol. 2020, 104, 9449–9485. [Google Scholar] [CrossRef] [PubMed]
  14. Singh, A.; Ganapathysubramanian, B.; Singh, A.K.; Sarkar, S. Machine learning for high-throughput stress phenotyping in plants. Trends Plant Sci. 2016, 21, 110–124. [Google Scholar] [CrossRef] [PubMed]
  15. Jafari, M.; Shahsavar, A. The application of artificial neural networks in modeling and predicting the effects of melatonin on morphological responses of citrus to drought stress. PLoS ONE 2020, 15, e0240427. [Google Scholar] [CrossRef] [PubMed]
  16. Hesami, M.; Alizadeh, M.; Jones, A.M.P.; Torkamaneh, D. Machine learning: Its challenges and opportunities in plant system biology. Appl. Microbiol. Biotechnol. 2022, 106, 3507–3530. [Google Scholar] [CrossRef] [PubMed]
  17. Grinblat, G.L.; Uzal, L.C.; Larese, M.G.; Granitto, P.M. Deep learning for plant identification using vein morphological patterns. Comput. Electron. Agric. 2016, 127, 418–424. [Google Scholar] [CrossRef]
  18. Mishra, B.; Kumar, N.; Mukhtar, M.S. Systems biology and machine learning in plant–pathogen interactions. Mol. Plant-Microbe Interact. 2019, 32, 45–55. [Google Scholar] [CrossRef]
  19. Huang, Y.; Jiang, H.; Wang, W.F.; Wang, W.; Sun, D. Soil moisture content prediction model for tea plantations based on SVM optimised by the bald eagle search algorithm. Cogn. Comput. Syst. 2021, 3, 351–360. [Google Scholar] [CrossRef]
  20. Kashif Gill, M.; Asefa, T.; Kemblowski, M.W.; McKee, M. Soil moisture prediction using support vector machines. J. Am. Water Resour. Assoc. 2006, 42, 1033–1046. [Google Scholar] [CrossRef]
  21. Kavzoglu, T.; Colkesen, I. A kernel functions analysis for support vector machines for land cover classification. Int. J. Appl. Earth Obs. Geoinf. 2009, 11, 352–359. [Google Scholar] [CrossRef]
  22. Zhang, X.; Tortel, H.; Ruy, S.; Litman, A. Microwave Imaging of Soil Water Diffusion Using the Linear Sampling Method. IEEE Geosci. Remote Sens. Lett. 2010, 8, 421–425. [Google Scholar] [CrossRef]
  23. Qiao, X.; Yang, F.; Xu, X. The prediction method of soil moisture content based on multiple regression and RBF neural network. In Proceedings of the 15th International Conference on Ground Penetrating Radar, Brussels, Belgium, 30 June–4 July 2014; pp. 140–143. [Google Scholar] [CrossRef]
  24. Taktikou, E.; Bourazanis, G.; Papaioannou, G.; Kerkides, P. Prediction of soil moisture from remote sensing data. Procedia Eng. 2016, 162, 309–316. [Google Scholar] [CrossRef]
  25. Chatterjee, S.; Dey, N.; Sen, S. Soil moisture quantity prediction using optimized neural supported model for sustainable agricultural applications. Sustain. Comput. Inform. Syst. 2020, 28, 100279. [Google Scholar] [CrossRef]
  26. Liang-hong, C.; Jian-li, D. Prediction for soil water content based on variable preferred and extreme learning machine algorithm. Spectrosc. Spectr. Anal. 2018, 38, 2209–2214. [Google Scholar]
  27. Prakash, S.; Sharma, A.; Sahu, S.S. Soil moisture prediction using machine learning. In Proceedings of the 2018 Second International Conference on Inventive Communication and Computational Technologies (ICICCT), Coimbatore, India, 20–21 April 2018; pp. 1–6. [Google Scholar]
  28. RNivetha, R.Y.; Dhaya, C. Developing a Prediction Model for Stock Analysis. In Proceedings of the Technical Advancements in Computers and Communications (ICTACC) 2017 International Conference on IEEE, Melmaurvathur, India, 10–11 April 2017. [Google Scholar]
  29. Moroizumi, T. Prediction of Soil Water Content Using Multiple Regression Model with Time Series Data. J. Agric. Eng. Soc. Jpn. 2014, 82, 47–52. [Google Scholar]
  30. Esmaeelnejad, L.; Ramezanpour, H.; Seyedmohammadi, J.; Shabanpour, M. Selection of a suitable model for the prediction of soil water content in north of Iran. Span. J. Agric. Res. 2015, 13, e1202. [Google Scholar] [CrossRef]
  31. Aslan, T. Forecasting of Soil Water Content using Support Vector Regression with the Purpose of Agricultural Drought. Am. Meteorol. Soc. 2015, 2. Available online: https://ams.confex.com/ams/27WAF23NWP/webprogram/Paper273510.html (accessed on 16 May 2023).
  32. Filgueiras, R.; Almeida, T.S.; Mantovani, E.C.; Dias, S.H.B.; Fernandes-Filho, E.I.; da Cunha, F.F.; Venancio, L.P. Soil water content and actual evapotranspiration predictions using regression algorithms and remote sensing data. Agric. Water Manag. 2020, 241, 106346. [Google Scholar] [CrossRef]
  33. Gao, P.; Xie, J.; Sun, D.; Chen, W.; Yang, M.; Zhou, P.; Wang, W. Prediction models of soil moisture content and electrical conductivity in citrus orchard based on internet of things and LSTM. J. South China Agric. Univ. 2020, 41, 134–144. [Google Scholar]
  34. Huang, Y.; Jiang, H.; Wang, W. Research on tea tree growth monitoring model using soil information. Plants 2022, 11, 262. [Google Scholar] [CrossRef]
  35. Huang, Y.; Jiang, H.; Wang, W.; Sun, D. Prediction model of soil electrical conductivity based on ELM optimized by Bald Eagle search algorithm. Electronics 2021, 25, 50–56. [Google Scholar] [CrossRef]
  36. Alsattar, H.A.; Zaidan, A.A.; Zaidan, B.B. Novel meta-heuristic bald eagle search optimisation algorithm. Artif. Intell. Rev. Int. Sci. Eng. J. 2020, 53, 2237–2264. [Google Scholar] [CrossRef]
  37. Jia, H.; Jiang, Z.; Li, Y. Simultaneous optimization feature selection based on improved condor search algorithm. Control Decis. 2020, 37, 445–454. [Google Scholar] [CrossRef]
Figure 1. Proposed System.
Figure 1. Proposed System.
Plants 12 02309 g001
Figure 2. Outcome of the MBES-SVM Model.
Figure 2. Outcome of the MBES-SVM Model.
Plants 12 02309 g002
Figure 3. Error of the MBES-SVM Model.
Figure 3. Error of the MBES-SVM Model.
Plants 12 02309 g003
Table 1. Partial raw data.
Table 1. Partial raw data.
Air TemperatureHumidityLight IntensityRainfallSTSMCSECTime
21.85 100.00 1860.00 411.20 19.05 26.94 120.32 2021/3/27
15.70 100.00 134.00 537.60 18.87 26.88 118.56 2021/3/25
17.36 94.15 3492.50 503.45 17.63 26.89 118.76 2021/3/24
17.36 99.40 3888.29 536.37 17.53 27.06 119.16 2021/3/23
16.60 86.40 6658.00 537.60 16.98 27.18 120.27 2021/3/22
26.67 100.00 7702.67 409.60 21.14 27.06 116.31 2021/3/19
26.60 100.00 6266.00 409.60 20.67 27.10 116.17 2021/3/18
24.98 100.00 5482.75 410.00 19.75 27.15 116.15 2021/3/17
23.87 100.00 6968.00 420.93 18.77 27.24 115.75 2021/3/16
Table 2. Parameters of the experiment.
Table 2. Parameters of the experiment.
ModesItemValueModesItemValue
SVM-c50BESα1.5
-g0.1a10
-s2R1.5
-p0.01C12
PSOc11.5C22
c21.7N100
maxgen200Max_iter100
k0.6Dim2
wP1GAmaxgen30
v3sizepop20
popcmax100cbound[0, 100]
popcmin0.1gbound[0, 100]
popgmax1000v5
popgmin0.01ggap0.9
Table 3. Evaluation of model.
Table 3. Evaluation of model.
Model R 2 RMSEMSE
SVM0.86750.13330.0178
PSO-SVM0.56620.26660.0711
GA-SVM0.56270.24040.0578
MBES-SVM0.94350.13920.0194
Table 4. Evaluation of model in smaller samples.
Table 4. Evaluation of model in smaller samples.
Number of Training SetsMSER2
1294.996 −3.35862 × 10−16
20.0625 0.6018
30.10880.8465
40.05600.7176
50.01060.9636
Table 5. Evaluation of model.
Table 5. Evaluation of model.
Number of Training SetsMSER2
1698.959 −2.28567 × 10−16
20.11680.0385
30.02984 0.7252
40.02299 0.8540
50.0058 0.9448
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Huang, Y. Improved SVM-Based Soil-Moisture-Content Prediction Model for Tea Plantation. Plants 2023, 12, 2309. https://doi.org/10.3390/plants12122309

AMA Style

Huang Y. Improved SVM-Based Soil-Moisture-Content Prediction Model for Tea Plantation. Plants. 2023; 12(12):2309. https://doi.org/10.3390/plants12122309

Chicago/Turabian Style

Huang, Ying. 2023. "Improved SVM-Based Soil-Moisture-Content Prediction Model for Tea Plantation" Plants 12, no. 12: 2309. https://doi.org/10.3390/plants12122309

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop