Forecasting Helianthus annuus Seed Quality Based on Soil Chemical Properties Using Radial Basis Function Neural Networks

Forecasting crop chemical characteristics from soil properties can reduce the need for supplementary sampling and testing, and it is also a potential way to guide cultivation planning based on regional soil surveys. In this paper, using the data of a regional agricultural geological survey on Helianthus annuus sources in the western part of Jilin Province as a case study, radial basis function neural networks (RBFNNs) were used to forecast the quality indexes of Helianthus annuus seeds based on the non-linear relationship between soil and crop. The results indicate the following: (1) The mean relative errors of the vitamin E, protein, fat, and total amino acid (TAA) concentration forecasting neural networks are 2.63%, 2.19%, 2.19%, and 2.80%, respectively, and the root mean square errors are 1.7 mg/100 g, 0.59%, 1.09%, and 0.77%. The forecasting RBFNNs achieve high prediction accuracy, which provides an empirical case of forecasting crop quality based on a systematic soil environmental quality investigation combined with a sampling survey of the crops. To set up a proper model, the interrelation between the selected indexes of the input layer and the output layer needs to be confirmed first, and a low setting of spread can improve the accuracy; (2) Soil in the studied area is severely salinized, and most soil chemical properties show evident regional differences between the three experimental fields. However, the vitamin E, protein, and TAA concentrations of Helianthus annuus seeds all stay within a certain range despite the different soil environments. The mean fat concentration of Helianthus annuus seeds collected from Nongan and Daan exceeds that of seeds from Tongyu by approximately 5 percentage points, which shows a relatively evident regional difference.


Introduction
In most regional agricultural geological surveys, only part of the soil or crop samples, chosen at random, is tested to represent the chemical characteristics of the entire region, in particular for the crop concerned. However, due to spatial heterogeneity [1], some local properties are concealed, and this representation can be inaccurate. In addition, follow-up research mostly requires a complete set of tested field data. Nevertheless, supplementary sampling usually cannot reproduce the conditions of the original sampling [2][3][4]. Thus, how to accurately forecast the untested crop data of a field survey from the currently available data has become an important issue.
The soil-plant system is a sophisticated combination of complex non-linear relationships between the chemical characteristics of soil and plants [5]. Artificial neural networks are effective tools for simulating such relationships [6]. A radial basis function neural network (RBFNN) is an artificial neural network with a simple structure and a fast learning speed, and it can approximate any continuous function to any degree of accuracy [7,8]. In addition, soil chemical properties and the bio-available transformation of soil elements are the main indexes of soil fertility [9][10][11], and they have great impacts on crop yield and quality. This makes it possible to forecast crop quality based on soil chemical properties using an RBFNN. A member of the research team has already used an RBFNN to forecast the carbon, nitrogen, and phosphorus concentrations of Leymus chinensis based on soil chemical properties [12].
The western part of Jilin Province is one of the main soil salinization regions in China, with obvious grassland degradation and poor crop-planting conditions [13]. However, the strong tolerance of Helianthus annuus to harsh environments has made it a major cash crop in the region [14]. To provide an empirical case of forecasting crop quality based on soil properties at the micro level, this paper uses the data of a regional agricultural geological survey on Helianthus annuus sources in the western part of Jilin as a case study. The trained RBFNNs are applied to forecast the concentrations of vitamin E, protein, fat, and total amino acid (TAA) of the untested Helianthus annuus seeds. This suggests an empirical method of forecasting crop quality based on soil chemical properties using an RBFNN. Moreover, it provides a potential approach to cultivation planning that reduces the waste of cultivated land resources and supports the sustainable use of cultivated land.

Study Area
The three experimental fields are located in Bajilei Town of Nongan County, Longzhao Town of Daan City, and Zhanyu Town of Tongyu City, respectively, in the western part of Jilin Province. This region is part of the Songliao Plain, which is flat and has a typical middle-temperate continental semi-arid monsoon climate. The annual average temperature is 4.4-4.7 degrees Celsius, the annual precipitation is 400-500 mm, and the relative air humidity is generally 56%-64%. The study area has long sunshine hours and sufficient heat, which lead to strong evaporation, concentration of salts, and high soil salinity. The soil types are mainly chernozem, light chernozem, salinized meadow soil, saline-alkali soil, and aeolian sandy soil. Off-white calcareous sediments can be found under the surface soil. The soil is alkaline.

Sampling and Testing
According to the actual root length of Helianthus annuus, the soil sampling depth was set to 60 cm. A total of 120 soil samples were collected from the three experimental fields (Figure 1), 40 from each. Soil samples were air-dried, ground, and passed through a 2-mm sieve prior to analysis. The Helianthus annuus seed sampling sites coincided with the soil sampling sites; seeds of 5-10 normal plants around each site were collected as one sample. The Helianthus annuus varieties in the three experimental fields are the same. A total of 120 Helianthus annuus seed samples were collected from the three experimental fields, 40 from each. All soil and seed samples were collected in October 2006.
In consideration of the availability of research data and the indicator selection of an earlier study [12], soil pH, soil organic matter, soil hydrolytic nitrogen, soil Olsen-K, and the total concentrations of soil Fe2O3, CaO, MgO, Na2O, Se, Mo, and B were selected as the input indexes of the neural networks. Soil pH was determined by the glass electrode method. The total concentrations of Fe, Ca, Mg, and Na were determined by X-ray fluorescence spectrometry. The total concentration of Mo was determined by inductively coupled plasma mass spectrometry. The total concentration of Se was determined by atomic fluorescence spectrometry. The total concentration of B was determined by emission spectrometry. The soil hydrolytic nitrogen concentration was determined by the alkaline hydrolysis diffusion method. The soil Olsen-K concentration was determined by flame photometry. Vitamin E, total amino acid (TAA), protein, and fat of Helianthus annuus seeds were selected as the output indexes of the neural networks to represent crop quality. The concentrations of vitamin E and TAA were determined by liquid chromatography. The concentration of protein was determined by the semi-micro Macro Kjeldahl method. The concentration of fat was determined by Soxhlet extraction.

Radial Basis Function Neural Networks
Artificial neural networks (ANNs) are an important branch of artificial intelligence with wide application in many disciplines. A radial basis function neural network (RBFNN) is a type of ANN that uses radial basis functions as activation functions; it is used in prediction, estimation, system control, etc. [15]. The topological structure of the RBFNN is displayed in Figure 2. An RBFNN is a feed-forward artificial neural network with three layers (Figure 2): the input layer, the hidden layer, and the output layer [7,8,12]. The input layer is composed of signal source nodes, and the input vector is mapped directly into the hidden space in a non-linear form. In this paper, the input layer is composed of soil pH, soil organic matter, soil hydrolytic nitrogen, soil Olsen-K, and the total concentrations of Fe2O3, CaO, MgO, Na2O, Se, Mo, and B. The hidden layer is composed of hidden radial basis function units. The number of hidden units depends on the requirements of the problem being described, and the hidden layer converts problems that are linearly inseparable in the low-dimensional space into problems that are linearly separable in a high-dimensional space by modeling the hidden space with radial basis function units. The transfer function of the hidden units is normally set to a Gaussian function. The mapping from the hidden layer to the output layer is linear: the output responds to the input pattern through a linearly weighted combination of the hidden unit outputs [16]. The output layer is composed of vitamin E, total amino acid (TAA), protein, and fat of Helianthus annuus seeds. The mapping formula is:

$$y = \sum_{i=1}^{m} w_i \exp\left(-\frac{\left\|x - C_i\right\|^2}{2\sigma^2}\right)$$

where $\|x - C_i\|$ is the Euclidean norm, which denotes the distance from the input vector ($x$) to the center of the Gaussian function ($C_i$); $\sigma^2$ is the variance of the Gaussian function; $w_i$ is the link weight between the hidden layer and the output layer; $m$ is the number of hidden layer nodes; and $y$ is the actual output.
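The mapping from input vector to output described in this section can be illustrated with a minimal Python sketch. It assumes the common Gaussian form with a factor of 2 in the denominator and a single shared width for all hidden units; the function name `rbf_output` and the toy centers and weights are hypothetical and are not taken from the study.

```python
import math

def rbf_output(x, centers, weights, sigma):
    """Evaluate y = sum_i w_i * exp(-||x - C_i||^2 / (2 * sigma^2)).

    Assumption: every hidden unit shares the same width sigma.
    """
    y = 0.0
    for c, w in zip(centers, weights):
        # Squared Euclidean distance from the input vector to the center C_i
        sq_dist = sum((xj - cj) ** 2 for xj, cj in zip(x, c))
        # Gaussian hidden-unit response, weighted linearly into the output
        y += w * math.exp(-sq_dist / (2.0 * sigma ** 2))
    return y

# Hypothetical toy network: two hidden units in a 2-D input space
centers = [[0.0, 0.0], [1.0, 1.0]]
weights = [2.0, -1.0]
y_at_first_center = rbf_output([0.0, 0.0], centers, weights, sigma=0.5)
```

At the first center the first unit responds with exactly 1, so the output is dominated by its weight; the second unit contributes only a small correction because its center is far away relative to sigma.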

Training of Neural Networks
Concentrations of vitamin E, total amino acid (TAA), protein, and fat of Helianthus annuus seeds were forecasted based on 11 soil chemical indexes using RBFNNs. To avoid the impact of abnormal values and to accelerate convergence, the mapminmax function was invoked to normalize the input values, and the normalization was reversed after the output. The newrb function in the Matlab neural network toolbox was invoked to build the RBFNNs. During the simulation of the non-linear relationship, the number of neurons in the hidden layer increases automatically until the mean square error drops below the target value. The format of the newrb invocation is: net = newrb(P, T, goal, spread, mn, df), where P is the input vector; T is the output vector; goal is the target mean square error, which is set to $10^{-8}$; spread is the expansion constant, which represents the smoothness of the fitted curve; mn is the upper limit of the number of neurons, which is set to 100; and df is the display frequency of the training process, that is, the number of neurons added between two successive displays, which is set to 1. As the training of ANNs is mostly empirical [12,15], spread should be modified repeatedly during the training process for better simulation. There are 60 groups of matched tested soil and Helianthus annuus seed data, 20 groups from each experimental field. For each network, 55 groups were selected randomly to train the neural network, and the other 5 groups were used to validate the accuracy of the trained network. The parameter settings of the four RBFNNs forecasting Helianthus annuus seed quality are as follows:
(1) Parameters of the vitamin E forecasting RBFNN: goal is set to $10^{-8}$, spread is set to 0.0397, mn is set to 100, and df is set to 1. The NA-1, DA-6, DA-27, TY-12, and TY-34 samples were selected as the validation groups, and the other 55 as the training groups.
(2) Parameters of the protein forecasting RBFNN: goal is set to $10^{-8}$, spread is set to 0.0594, mn is set to 100, and df is set to 1. The NA-15, NA-21, DA-27, TY-34, and TY-40 samples were selected as the validation groups, and the other 55 as the training groups.
(3) Parameters of the fat forecasting RBFNN: goal is set to $10^{-8}$, spread is set to 0.0478, mn is set to 100, and df is set to 1. The NA-11, DA-19, DA-30, TY-3, and TY-5 samples were selected as the validation groups, and the other 55 as the training groups.
(4) Parameters of the TAA forecasting RBFNN: goal is set to $10^{-8}$, spread is set to 0.0593, mn is set to 100, and df is set to 1. The NA-6, DA-23, DA-36, TY-22, and TY-24 samples were selected as the validation groups, and the other 55 as the training groups.
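The normalization step and the 55/5 split described in this section can be sketched in Python as follows. This is an illustration rather than the Matlab code used in the study: it assumes a target range of [-1, 1], which is the documented default of mapminmax, and the helper names (`minmax_fit`, `minmax_apply`, `minmax_reverse`, `holdout_split`), the random seed, and the sample pH values are all hypothetical.

```python
import random

def minmax_fit(column, lo=-1.0, hi=1.0):
    """Record the column range and the target range [lo, hi] of a linear rescaling."""
    return (min(column), max(column), lo, hi)

def minmax_apply(value, params):
    """Map a raw value linearly onto [lo, hi]."""
    xmin, xmax, lo, hi = params
    if xmax == xmin:
        return lo  # constant column: convention chosen for this sketch
    return (hi - lo) * (value - xmin) / (xmax - xmin) + lo

def minmax_reverse(scaled, params):
    """Undo minmax_apply, e.g. to bring network outputs back to original units."""
    xmin, xmax, lo, hi = params
    if xmax == xmin:
        return xmin
    return (scaled - lo) * (xmax - xmin) / (hi - lo) + xmin

def holdout_split(n_groups, n_validation, seed=0):
    """Randomly choose validation indices; the remaining indices are used for training."""
    rng = random.Random(seed)
    indices = list(range(n_groups))
    rng.shuffle(indices)
    return sorted(indices[n_validation:]), sorted(indices[:n_validation])

# Hypothetical soil pH values for three samples
ph_values = [8.0, 8.2, 8.5]
ph_params = minmax_fit(ph_values)
ph_scaled = [minmax_apply(v, ph_params) for v in ph_values]

# 60 matched groups: 55 for training, 5 for validation
train_idx, validation_idx = holdout_split(60, 5)
```

Keeping the fitted range in `ph_params` is what allows the same scaling to be applied to new inputs and reversed on the forecasted outputs.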
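The mean relative error (MRE) of the validation groups is calculated as

$$\mathrm{MRE} = \frac{1}{n}\sum_{i=1}^{n}\frac{\left|x_{i,f} - x_{i,m}\right|}{x_{i,m}} \times 100\%,$$

where $x_{i,f}$ is the forecasted value; $x_{i,m}$ is the measured value; and $n$ is the number of validation groups.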
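The two error measures reported for the validation groups, the mean relative error and the root mean square error, can be computed as in the following Python sketch. It assumes the relative error uses the absolute deviation and that the root mean square error follows its standard definition; the function names and the five measured and forecasted values are hypothetical and are not data from the study.

```python
import math

def mean_relative_error(forecast, measured):
    """Mean of |x_f - x_m| / x_m over all validation groups, in percent."""
    n = len(measured)
    return 100.0 * sum(abs(f - m) / m for f, m in zip(forecast, measured)) / n

def root_mean_square_error(forecast, measured):
    """Square root of the mean squared deviation, in the units of the data."""
    n = len(measured)
    return math.sqrt(sum((f - m) ** 2 for f, m in zip(forecast, measured)) / n)

# Hypothetical values for five validation groups
measured = [20.0, 25.0, 40.0, 50.0, 10.0]
forecast = [21.0, 24.0, 42.0, 49.0, 10.0]

mre = mean_relative_error(forecast, measured)      # percent
rmse = root_mean_square_error(forecast, measured)  # same units as the data
```

The mean relative error is dimensionless, whereas the root mean square error keeps the unit of the index, which is why the latter is reported in mg/100 g for vitamin E and in percent for the other indexes.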

Forecasted Outcomes of Helianthus annuus Seed Quality Indexes
The validation results indicate a high accuracy of simulation and a reasonable setting of the neural network parameters. Thus, the trained RBFNNs can be applied to forecast the remaining 60 groups, for which the Helianthus annuus seed quality indexes are untested. Due to the partial data loss of the DA-39 and TY-31 samples, the trained RBFNNs were used to forecast the Helianthus annuus seed quality of the other 58 groups (Table 2).
The Helianthus annuus seeds in the western part of Jilin are of relatively high quality. Statistical data of the four quality indexes are shown in Table 3. The mean vitamin E, protein, and TAA concentrations of the three experimental fields show little difference. However, the mean fat concentration of the seeds collected from Nongan and Daan (45.39% and 46.41%) is about 5 percentage points higher than that of the seeds from Tongyu (40.70%). The values forecasted by the RBFNNs are in line with the tested data of the 60 groups with known values. All quality index values of every experimental field fluctuate little, with a coefficient of variation below 20%, which indicates low variability.
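The coefficient of variation used above to describe variability can be computed as in the following Python sketch. It assumes the sample standard deviation is used, which the text does not specify; the function name and the fat concentrations are hypothetical and are not data from the study.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, in percent."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

# Hypothetical fat concentrations (%) of five seed samples from one field
fat_concentrations = [44.0, 46.0, 45.0, 47.0, 43.0]
cv_percent = coefficient_of_variation(fat_concentrations)

# Threshold quoted in the text: below 20% is read as low variability
is_low_variability = cv_percent < 20.0
```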

Training of the RBFNN
The RBFNN is an effective tool for simulating non-linear relationships. It avoids the disadvantages of back propagation neural networks (BPNNs), which converge slowly and are liable to fall into locally optimal solutions [12,15]. The setting-up of neural networks is based on the non-linear relationship between the input vector and the output vector. An attempt to forecast the moisture concentration of Helianthus annuus seeds in the same way failed, possibly because soil water concentration, which is one of the main factors affecting the moisture concentration of crops [19], is not included in the input index system. This indicates that the interrelation between the selected indexes of the input layer and the output layer needs to be confirmed first to set up a proper model. The goal values of the four RBFNNs in this paper are all set to $10^{-8}$, which means that the iterative computation stops when the mean square error drops below $10^{-8}$. Among all the parameters of the RBFNNs, spread has the largest impact on the forecasted outcome. The parameter adjustment of RBFNNs mainly depends on the modification of spread to approach the relationship curve. The higher spread is, the smoother the fitted curve is [20]. In this paper, the spread of the vitamin E forecasting RBFNN is set to 0.0397, the spread of the protein forecasting RBFNN is set to 0.0594, the spread of the fat forecasting RBFNN is set to 0.0478, and the spread of the TAA concentration forecasting RBFNN is set to 0.0593. To ensure accuracy, all settings of spread in the four RBFNNs are low. The mean relative errors of the four RBFNNs are 2.63%, 2.19%, 2.19%, and 2.80%, and the root mean square errors are 1.7 mg/100 g, 0.59%, 1.09%, and 0.77% (Table 1), respectively, which shows a high accuracy of forecasting. The forecasted values of the RBFNNs are in line with the tested data of the 60 tested groups, and no abnormal values are forecasted. This suggests an empirical method of forecasting and supplementing the missing part of vegetation data based on the interrelationship between soil properties and crop quality using RBFNNs. Meanwhile, high-accuracy forecasting of crop quality based on soil properties using a proper mathematical model has the potential to guide agricultural planning and realize the intensive use of cultivated land.
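How spread controls the width of each hidden unit can be illustrated with a short Python sketch. It assumes the convention documented for Matlab's newrb, in which the hidden neuron computes exp(-(b*d)^2) with bias b = sqrt(ln 2)/spread (about 0.8326/spread), so that the response falls to 0.5 at a distance equal to spread. The function name `radbas_response`, the probe distance 0.1, and the comparison spread of 0.5 are hypothetical choices for illustration.

```python
import math

def radbas_response(distance, spread):
    """Hidden-unit response exp(-(b*d)^2) with b = sqrt(ln 2) / spread.

    Assumption: this mirrors the newrb convention, where the response
    equals 0.5 when the distance to the center equals spread.
    """
    b = math.sqrt(math.log(2.0)) / spread
    return math.exp(-(b * distance) ** 2)

# Two spread values used in this paper plus a hypothetical larger one
spreads = (0.0397, 0.0594, 0.5)

# Response at the center, at distance == spread, and at distance 0.1
responses = {
    s: [radbas_response(d, s) for d in (0.0, s, 0.1)]
    for s in spreads
}
```

With a spread of 0.0397 the response at distance 0.1 is already close to zero, so each hidden unit affects only samples very near its center; with a spread of 0.5 the same distance still gives a response close to 1, which yields a smoother fitted curve.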

Analysis of Soil Chemical Properties and Helianthus annuus Seed Quality in the Western Part of Jilin
The western part of Jilin suffers from severe soil salinization and significant soil degradation [21,22]. The statistical data of soil chemical properties show that the average soil pH of the samples from the three experimental fields ranges from 8.0 to 8.5 and that the Na2O concentrations cluster around 2%, which indicates obvious soil salinization [23]. In addition, the nine other selected indexes show high regional differences (Figure 3), especially the concentrations of soil organic matter, soil total Se, soil total Fe2O3, and soil hydrolytic nitrogen. However, the statistical data of Helianthus annuus seed quality indexes show little regional difference between the three experimental fields, except that the mean fat concentration of the seeds collected from Nongan and Daan is about 5 percentage points higher than that of the seeds from Tongyu. According to the homeostasis theory and ecological stoichiometry [24][25][26][27], plants have a self-protection mechanism for adapting to tolerable changes in the growing environment, which keeps their element concentrations within a certain range. Helianthus annuus is strongly resistant to severe environments, which enables it to survive in the poor planting environments of Western Jilin [14]. This also helps to explain the stability of the vitamin E, protein, and TAA concentrations of Helianthus annuus seeds despite the evident regional variation in soil chemical properties.

Conclusions
(1) The mean relative errors of the vitamin E, protein, fat, and TAA concentration forecasting neural networks are 2.63%, 2.19%, 2.19%, and 2.80%, respectively. The root mean square errors are 1.7 mg/100 g, 0.59%, 1.09%, and 0.77%. The neural networks achieve high prediction accuracy, which provides an empirical case of forecasting crop quality based on soil properties using RBFNNs. The interrelation between the selected indexes of the input layer and the output layer needs to be confirmed first, and a low setting of the spread of the RBFNN can help improve the accuracy.
(2) Soil in the studied area is severely salinized, and the soil chemical properties of the three experimental fields show high regional differences. However, the vitamin E, protein, and TAA concentrations of Helianthus annuus seeds all stay within a certain range despite the different soil environments. The mean fat concentration of the seeds collected from Nongan and Daan (45.39% and 46.41%) is about 5 percentage points higher than that of the seeds from Tongyu (40.70%).

Figure 1. Distribution of the experimental fields.

Figure 3. Comparison of soil chemical properties between the three experimental fields.

Table 1. Validation results of all radial basis function neural networks.

Table 2. Forecasted outcomes of untested Helianthus annuus seed quality.

Table 3. Statistics of Helianthus annuus seed quality in the studied area.