Accuracy Analysis Mechanism for Agriculture Data Using the Ensemble Neural Network Method

Abstract: With the rise and development of information technology (IT) services, the amount of data generated is rapidly increasing. Data from many different sources are inconsistent, and data capture, storage and analysis pose major challenges. Most data analysis methods are unable to handle such large amounts of data. Many studies employ neural networks, mostly specifying the number of hidden layers and neurons according to experience or a formula. Different network topologies yield different results, and the best network model is selected. This investigation proposes a system based on the ensemble neural network (ENN). It creates multiple network models, each with different numbers of hidden layers and neurons. A model that does not achieve the required accuracy rate is discarded. The proposed system derives the weighted average of all remaining network models to improve the accuracy of the prediction. This study applies the proposed method to generate agricultural yield predictions. The agricultural production process in Taiwan is more complex than those of manufacturing or other industries. The Council of Agriculture provides agricultural forecasting primarily based on the planted area and experience to predict the yield, but without consideration of the overall planting environment. This work applies the proposed data analysis method to agriculture. The method based on ENN has a much lower error rate than traditional back-propagation neural networks, while multiple regression analysis has an error rate of 12.4%. Experimental results reveal that the ENN method is better than traditional back-propagation neural networks and multiple regression analysis.


Introduction
Crop production is important for people in Taiwan, and agricultural production faces more issues than manufacturing industries. Issues in agricultural production include climatic factors, pests, diseases and the treatment process. Hence, farmers engaged in production, or those indirectly involved through agricultural agencies, need to predict their crop yield accurately to avoid imbalances in market supply and demand caused or hastened by poor harvest quality and results. The agricultural forecasting provided by the Council of Agriculture is mainly based on the planted area and experience to predict the yield, but does not consider the impact of the plant environment on yield.
To understand the effect of important meteorological parameters, and to predict crop yields effectively, this work adopts stepwise regression and an ensemble neural network (ENN) method for analysis with the aim of improving the accuracy of crop yield prediction.
The rest of this study is organized as follows. The research backgrounds and related works of data mining methods, agricultural production forecasting, stepwise regression, and back-propagation neural networks (BPNs) are presented in Section 2. Section 3 proposes an ENN method to analyze agriculture data. The experimental results and discussions are illustrated in Section 4. Section 5 gives conclusions and future work.

Research Backgrounds and Related Works
The literature review of data mining methods, agricultural production forecasting, stepwise regression, and BPNs is discussed in the following subsections.

Data Mining Methods
Data mining is a part of database knowledge discovery. As the name suggests, it involves accumulating large amounts of data and extracting useful information from them. However, with the current development of information technology, the increasing amount of data, and the diversification of data types and sources, big data has become a major research topic in recent years for governments and industries. Big data technology is still based on traditional data mining methods. The objective of data mining or big data analysis is to identify implicit information from data, and thus enhance the value of information. Data analysis can be conducted using many approaches, such as cluster analysis, classification and statistical analysis.

Cluster Analysis
Fahad et al. [1] divided cluster analysis methods into five types, namely segmentation-based, hierarchical-based, density-based, grid-based and model-based methods, as listed in Table 1.

Classification
A classification model is generated from property values of existing data, then employed to predict the category of new data. The main goal of classification is to analyze the influence of each factor or variable on forecast data values. Classification is a form of supervised learning, and its methods include neural networks and decision trees [2][3][4].

Statistical Analysis
Statistical analysis is based on mathematical principles, and can be categorized as descriptive statistics and inferential statistics [5][6][7].
Data mining creates high value for enterprises in sectors such as health and medical care, personal location information, retail and manufacturing [8]. The proportion of US health care spending is very high. Analyzing the massive amount of health care data would significantly reduce capital costs. The retail sector has employed data mining analysis techniques for a long time: customer purchase records are applied to predict a future purchases list, and to adjust marketing strategies or merchandise display modes. The manufacturing sector, which is the backbone of the global trading industry, has a complex and widely dispersed value chain. Analyzing the available data would enable increased productivity, process improvements and reduced product delivery times.

Agricultural Production Forecasting
Many factors, mainly meteorological and environmental factors, influence crop yield. The variables covering changes in the weather include temperature, the amount of sunlight, and rain. Some studies concluded that temperatures and rainfall affect the growth of crops, thus affecting the final yield. Environmental factors that affect crop growth include latitude and soil. Chen et al. (2008) [9] accumulated data about crop damage, the economic growth rate, pesticide sales, the rate of change in agricultural production, the index of agricultural production and the gross national product to determine the effect of these economic variables on the market output of fresh fruits and vegetables. Other investigations have observed that the usage of fertilizer and the mechanization of production are factors that affect crop yield. Some agricultural prediction algorithms utilize neural networks. Zhang et al. (2010) [10] accumulated meteorological and crop growth data, and employed these to compare the performance of artificial neural networks, the k-nearest neighbors algorithm (kNN) and regression methods in predicting soybean growth and flowering stages. Their results show that artificial neural networks predicted the soybean growth and flowering stages more accurately than the two other models. Tsai et al. (2004) [11] constructed a production forecast model based on meteorological factors and growth trait factors, and analyzed it using the back-propagation network and other methods. Their analytical results demonstrated that the BPN forecasting performed better than the others. Ma et al. employed regression analysis, the genetic algorithm, the back-propagation neural network, and regression analysis combined with genetic algorithms to predict sales of pineapple, grapes and wax apples. According to their experimental data, the BPN best predicted wax apple sales, while regression analysis combined with genetic algorithms was most accurate for predicting pineapple and grape sales.

Stepwise Regression
Regression analysis measures the degree of correlation between a dependent variable and one or more independent variables in order to understand the influence of each independent variable. The methods of regression analysis are enter, forward, backward and stepwise regression.
Stepwise regression analysis combines the characteristics of forward and backward regression, beginning with the independent variable that is most strongly related to the dependent variable. Variables are successively removed from the regression equation, then added back, to determine whether they should be included in the equation. Thus, forward and backward regression are used together to obtain the best regression model [12,13].
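The alternation of forward and backward steps can be sketched as follows. This is a minimal illustration, not the procedure used in this study: it scores candidate models with the Akaike information criterion (AIC) instead of the significance tests usually used in stepwise regression, and the function names `aic` and `stepwise_select` as well as the synthetic data are assumptions made for the example.

```python
import numpy as np


def aic(X, y, cols):
    """AIC of an ordinary least-squares fit of y on the chosen columns plus an intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    return n * np.log(rss / n) + 2 * A.shape[1]


def stepwise_select(X, y):
    """Alternate forward additions and backward removals until the AIC stops improving."""
    selected = []
    best = aic(X, y, selected)
    improved = True
    while improved:
        improved = False
        # Forward step: try adding each variable that is not yet in the model.
        trials = [(aic(X, y, selected + [j]), j)
                  for j in range(X.shape[1]) if j not in selected]
        if trials and min(trials)[0] < best:
            best, j = min(trials)
            selected.append(j)
            improved = True
        # Backward step: try removing each variable that is in the model.
        trials = [(aic(X, y, [k for k in selected if k != j]), j) for j in selected]
        if trials and min(trials)[0] < best:
            best, j = min(trials)
            selected.remove(j)
            improved = True
    return sorted(selected)


# Synthetic data (hypothetical): only columns 0 and 2 influence y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.1, size=200)
chosen = stepwise_select(X, y)
```

Because every accepted step strictly lowers the criterion, the loop terminates; the two informative columns are always retained on this data, while an uninformative column may occasionally slip in.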

Back-Propagation Neural Network
An artificial neural network (ANN) simulates messaging between neurons in a biological neural network. It comprises a plurality of neurons, as depicted in Figure 1. Figure 2 illustrates the network structure, also called the network topology [14][15][16]. The traditional BPN, which is a supervised learning network, can be used for classification and prediction. In the learning stage, the BPN can update the weights among neurons in accordance with the error rate between the predicted output and the actual output in each iteration, and the error rate can be minimized after several iterations. The steps of the BPN method are described as follows [14][15][16].
(1) Setting the parameters (e.g., neural network structure, learning rate, etc.) of the BPN.
(2) Setting the weights (e.g., Wi,j in Figure 1) among neurons in the BPN.
(3) Setting the input neurons (e.g., Xi in Figure 1) and the output neurons (e.g., Yj in Figure 1).
(4) Calculating the output value of each neuron in the hidden layer in accordance with inputs and the output value of the neuron (e.g., Yj in Figure 1) in the output layer.
(5) Evaluating the error rate between the predicted output and actual output.
(6) Evaluating the error rate among the value of the output neuron, the output value of each neuron in the hidden layer, and the value of the input neurons.
(7) Updating the weights of neurons in accordance with error rates.
(8) Repeating Steps (4)-(7) until convergence.
While the BPN can analyze data and optimize the weights of the neural network, it may converge to a locally optimal solution. Therefore, this study proposes an ENN to combine multiple BPNs with several compositions of data.
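The eight BPN steps can be illustrated with a small NumPy sketch. It is a simplified example under stated assumptions rather than the implementation used in this study: it uses a single hidden layer with sigmoid activation, a linear output, mean squared error, full-batch gradient descent, and a fixed number of epochs in place of a convergence test. The function `train_bpn` and the toy data are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_bpn(X, y, hidden=3, lr=0.1, epochs=2000, seed=0):
    """Train a one-hidden-layer network with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Steps (1)-(2): fix the structure and initialise the weights.
    W1 = rng.normal(0.0, 0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    losses = []
    # Step (8): a fixed epoch budget stands in for a convergence test.
    for _ in range(epochs):
        # Steps (3)-(4): feed the inputs forward through the hidden and output layers.
        h = sigmoid(X @ W1 + b1)
        out = h @ W2 + b2
        # Step (5): error between the predicted and the actual output.
        err = out - y
        losses.append(float(np.mean(err ** 2)))
        # Step (6): propagate the error back to the hidden layer.
        d_out = 2.0 * err / n
        d_h = (d_out @ W2.T) * h * (1.0 - h)
        # Step (7): move each weight against its gradient.
        W2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_h)
        b1 -= lr * d_h.sum(axis=0)
    return (W1, b1, W2, b2), losses


# Toy regression target (hypothetical): y = x^2 on [0, 1].
X = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
y = X ** 2
_, losses = train_bpn(X, y)
```

Changing `seed` or `hidden` generally leads to a different final loss, which is the sensitivity to initialization and topology that motivates combining several networks.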

Materials and Methods
This investigation designs an accuracy analysis mechanism for agriculture data using the ENN method. The designed mechanism is employed for agricultural applications. Figure 3 shows the architecture of this mechanism.

Data Collection Mechanism
This is the underlying data analysis layer. It accumulates meteorological factors (e.g., relative humidity, precipitation, and air temperature), environmental factors (e.g., planting area, harvested area, harvest and harvest per unit volume), and economic factors (e.g., the cost of production and the market trading price), which are shown in Table 2, from many different open data sources. Figure 4 illustrates the data preprocessing stage, which involves data integration, data cleaning and data transformation. Each step is presented below.
(1) Data integration: The data from different databases (e.g., the Agriculture and Food Agency of the Council of Agriculture in Taiwan) are collected and stored into a database.
(2) Data cleaning: Due to the wide range of sources of information, information may be incomplete, non-conformant or noisy. Therefore, the data are cleaned to ensure the integrity and accuracy of the information.
(3) Data transformation: Data transformation is performed to normalize the data by using Equations (1)-(3). For instance, the average of the relative humidity during the j-th month can be defined as $a_{1,j}$, and the mean $\bar{a}_1$ and standard deviation $\sigma_1$ of the relative humidity in the historical dataset can be calculated by Equations (1) and (2), respectively. Then the normalized average of the relative humidity during the j-th month can be expressed as $x_{1,j}$ by Equation (3).
$$x_{i,j} = \frac{a_{i,j} - \bar{a}_i}{\sigma_i} \qquad (3)$$
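The normalization in Equation (3) can be sketched in a few lines. This is an illustrative example only: it assumes the population standard deviation (dividing by the number of observations), which the text does not specify, and the function name `zscore` and the humidity values are hypothetical.

```python
from statistics import fmean, pstdev


def zscore(values):
    """Subtract the mean and divide by the standard deviation of the series."""
    mean = fmean(values)
    std = pstdev(values)  # assumption: population standard deviation
    if std == 0:
        raise ValueError("cannot normalise a constant series")
    return [(v - mean) / std for v in values]


# Hypothetical monthly averages of relative humidity (%).
humidity = [70.0, 75.0, 80.0, 85.0, 90.0]
normalized = zscore(humidity)
```

After the transformation each factor has zero mean and unit standard deviation, so factors measured in different units (percent, millimetres, degrees) enter the network on a comparable scale.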

Stepwise Multiple Regression Mechanism
Selecting the input variables of the neural network is a very important issue. Irrelevant input variables may lead to high network error, and indirectly reduce the network model reliability. To discover the relationship between meteorological factors and yields, this work treats the yield as the dependent variable and the monthly average temperature, relative humidity, sunshine and precipitation as independent variables.

Ensemble Neural Network Analysis Mechanism
The ENN method is based on BPNs. The ENN mechanism randomly generates a plurality of neural networks, each with a different architecture. For instance, the numbers of hidden layers and hidden layer neurons are generated randomly. Figure 5 illustrates the main process, which is divided into three stages, namely learning, recall and forecast.
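The random generation of architectures can be sketched as below. This is a hedged illustration: the function name `random_architectures` is hypothetical, each architecture is represented as a tuple of hidden-layer sizes, and the default limits of five layers and five neurons per layer are taken from the experimental settings reported later in this paper.

```python
import random


def random_architectures(m, max_layers=5, max_neurons=5, seed=None):
    """Return m distinct architectures, each a tuple of hidden-layer sizes."""
    possible = sum(max_neurons ** depth for depth in range(1, max_layers + 1))
    if m > possible:
        raise ValueError("not enough distinct architectures for the given limits")
    rng = random.Random(seed)
    architectures = []
    seen = set()
    while len(architectures) < m:
        depth = rng.randint(1, max_layers)
        arch = tuple(rng.randint(1, max_neurons) for _ in range(depth))
        if arch not in seen:  # keep every architecture different
            seen.add(arch)
            architectures.append(arch)
    return architectures


archs = random_architectures(5, seed=1)
```

Each tuple would then be used to build and train one BPN in the learning stage.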

Learning Stage
This algorithm generates M neural networks, each with different numbers of hidden layers and neurons in each hidden layer. In the learning stage, the learning data set is input into the networks. The input parameters comprise the meteorological parameters identified as important for regional yield in the previous stage, together with environmental factors and economic factors. A neural network is a supervised learning network. In the learning stage, each input maps to a known target in the output layer. Table 3 depicts the group summary. Hence, the main objective of the learning stage is to construct the coupling model between neurons by constantly modifying the weights of neurons, in order to establish a correspondence between the input and output data in the study sample through learning.

| Parameter | Status |
|---|---|
| Neurons of input layer | Known (learning data set) |
| Weight | Unknown (learned through constant learning and revision) |
| Neurons of output layer | Known |

Recall Stage
Each network model constructs its network architecture model based on the preceding learning stage. The testing data set is entered into each network model, which is then reconstructed based on the best correspondence. Table 4 presents the group summary. The actual output value is then obtained and compared with the target output value to obtain the accuracy for each network model. This accuracy is reused as the weight in the prediction stage. Furthermore, a threshold is adopted as a heuristic design: any model that does not reach the accuracy threshold is eliminated.

| Parameter | Status |
|---|---|
| Neurons of input layer | Known (testing data set) |
| Weight | Known (learned through learning stage) |
| Neurons of output layer | Unknown (to verify the accuracy of the model output) |
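The elimination rule of the recall stage can be sketched as a simple filter. The function name `select_models` is hypothetical; the accuracy values are those reported for the first experiment in the results section of this paper, and the 90% threshold is the one used there.

```python
def select_models(accuracies, threshold=90.0):
    """Keep only the models whose recall-stage accuracy reaches the threshold."""
    return {model: acc for model, acc in accuracies.items() if acc >= threshold}


# Recall-stage accuracy (%) of network models 1-5 in the first experiment.
first_experiment = {1: 90.81, 2: 86.70, 3: 88.10, 4: 89.87, 5: 93.30}
kept = select_models(first_experiment)
# kept == {1: 90.81, 5: 93.30}
```

The retained accuracies are carried forward and serve as the weights of the surviving models in the prediction stage.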

Prediction Stage
Any new data to be analyzed are entered into the remaining network models. Each network model determines its output based on its learning results. Network models with higher accuracy have a greater impact on the overall result.
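The text does not write out the combination rule explicitly. One reading consistent with the description above, offered here as an assumption, is an accuracy-weighted average over the set $S$ of models that passed the threshold, where $\hat{y}_m$ is the output of model $m$ and $w_m$ is its recall-stage accuracy:

```latex
\hat{y}_{\mathrm{ENN}} = \frac{\sum_{m \in S} w_m \, \hat{y}_m}{\sum_{m \in S} w_m}
```

Under this reading, a model with a higher accuracy pulls the ensemble output closer to its own prediction, and a model that failed the threshold contributes nothing.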

Analyses of Experimental Results
This section presents the experimental environments and performs traditional BPNs and ENN to predict agricultural production.

Experimental Environments
All tomato data, meteorology data, environment data and economic data were accumulated. The total data set had 9953 records from the Agriculture and Food Agency of the Council of Agriculture in Taiwan from 1997 to 2014. The meteorological factors included the average air temperature, relative humidity, and precipitation; the environmental factors included the planting area, harvested area, harvest and harvest per unit volume; and the economic factors included the cost of production and the market trading price. In this study, the input parameters include the average air temperature, relative humidity, precipitation, planting area, cost of production, and market trading price; the output is harvest. The tools used in the experimental environments are listed in Table 5.

Experimental Results and Discussions
This study randomly generated five neural network models. Each network model had up to five randomly generated hidden layers, with up to five neurons per hidden layer. Each network model used 60% of the available data for the learning data set, and the remaining 40% for the testing data set. The accuracy threshold was set as 90%, and the learning rate of each neural network was set as 0.1. That is, any model with accuracy below 90% was eliminated. Five tests were run in the learning stage. Table 6 shows the network model and network infrastructure for each test run.
Sustainability 2016, 8, 735
In the first experiment, the accuracy rates of network models 1-5 were 90.81%, 86.70%, 88.10%, 89.87% and 93.30%, respectively. Only network models 1 and 5 had accuracy above 90%. In back-propagation neural network research and analysis, a network model is only adopted if its accuracy rate has reached a threshold value. In this case, the model is used for later analysis to verify its prediction accuracy. The experimental conditions and parameters are fixed in this stage. A datum is randomly selected from the data cluster. The traditional BPN model is then run, and its prediction is compared with that of multiple regression analysis. The regression equation based on regression analysis is defined as Equation (4). Each parameter in Table 2 was adopted into the regression model to predict harvests. This study used the root mean squared error (RMSE) to evaluate the error rate of the prediction method. The error rate of the regression method is about 12.4%, which is higher than the error rates of traditional BPNs and ENNs.
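For reference, the RMSE named above can be computed as in the following sketch. The function name `rmse` is hypothetical, and the result is in the units of the target (kg for harvest); how the RMSE was converted into the percentage error rates quoted in this section is not stated in the text, so no such conversion is attempted here.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

For example, `rmse([0, 0], [3, 4])` evaluates to the square root of 12.5, since the squared errors 9 and 16 average to 12.5.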

Experimental Results of Traditional Back-Propagation Neural Network Analysis
The same accuracy threshold of 90% was applied to the models of the first experiment. The actual production forecast was obtained by Model 1, which has the neural network structure {1,3,1,2,1}. The output of Model 1 was 179,582 kg, and the actual yield was 191,500 kg. The result from Model 1 was thus 11,918 kg less than the actual production, an error of 6.64%. The production forecast with Model 5 (i.e., neural network structure {1,3,2}) was 202,587 kg, which is 11,087 kg greater than the actual yield, giving a network model error of 5.47%.
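The arithmetic behind these percentages can be checked with a short sketch. The text does not state which value is used as the denominator; the reported 6.64% and 5.47% are reproduced when the absolute difference is divided by the forecast value, whereas dividing by the actual yield gives slightly different figures. The function name `relative_error_pct` is hypothetical.

```python
def relative_error_pct(forecast, actual, base):
    """Absolute forecast error as a percentage of `base`, rounded to two decimals."""
    return round(100.0 * abs(forecast - actual) / base, 2)


ACTUAL = 191_500   # actual yield (kg)
MODEL_1 = 179_582  # forecast of Model 1 (kg)
MODEL_5 = 202_587  # forecast of Model 5 (kg)

m1_by_forecast = relative_error_pct(MODEL_1, ACTUAL, base=MODEL_1)  # 6.64
m5_by_forecast = relative_error_pct(MODEL_5, ACTUAL, base=MODEL_5)  # 5.47
m1_by_actual = relative_error_pct(MODEL_1, ACTUAL, base=ACTUAL)     # 6.22
m5_by_actual = relative_error_pct(MODEL_5, ACTUAL, base=ACTUAL)     # 5.79
```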

Ensemble Neural Network Analysis of Experimental Results
The merit of this method is that it also takes into account the accuracy of each network model that passes the threshold. In the first experiment, ENNs were run to obtain the output values of Models 1 and 5. The weighted average yield was found to be 191,240 kg. The error rate of the ENN in Experiment 1 was 1.30%, which is smaller than the error rate of traditional BPNs. The error rate was under 2% in Experiments 1, 3 and 4. The individual models in Experiments 2 and 5 had higher error rates, so the ENN error rates in those experiments were also higher. However, considering the weighted average significantly reduced the error rate. Figure 6 depicts the error rate for each experiment, and Figure 7 shows the error rate comparisons of BPNs and the ENN.
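The reported weighted average can be reproduced with the sketch below. It assumes that each retained model is weighted by its recall-stage accuracy (90.81% for Model 1 and 93.30% for Model 5), as described for the prediction stage; the function name `weighted_average` is hypothetical.

```python
def weighted_average(values, weights):
    """Average of `values` in which each value counts in proportion to its weight."""
    if len(values) != len(weights) or not values:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


outputs = [179_582, 202_587]  # forecasts of Models 1 and 5 (kg)
accuracies = [90.81, 93.30]   # recall-stage accuracies used as weights (%)
ensemble = weighted_average(outputs, accuracies)
# round(ensemble) == 191240
```

Because Model 1 underestimates and Model 5 overestimates the yield, their errors partly cancel in the average, which is why the ensemble output lies closer to the actual yield of 191,500 kg than either model alone.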

Conclusions and Future Work
With the advancement of information technology in various fields and the daily growth rate in data, neural networks are being widely adopted in industry, business, science and finance. However, the optimal number of hidden layers and neurons is mostly determined by experience or a formula. Considering a variety of analytical models is not possible. This study utilized stepwise regression analysis and ENN for the design guidelines to use in agriculture forecast analysis. The ENN method randomly creates a plurality of networks for analysis and forecasting and analyzes the results of all network models in order to improve the accuracy of the analysis. Experimental results reveal that the ENNs have the lowest error rate and highest accuracy, followed by traditional BPNs and multiple regression analysis.


Figure 3. Architecture of accuracy analysis mechanism for agricultural data based on the ENN method.

Figure 4. The process of data preprocessing.

Figure 6. ENN results compared to experimental results of each experiment.

Figure 7. The error rates from BPNs and the ENN.

Table 3. ENN learning stage group summary.

Table 4. Summary of recall stage group of ENN.

Table 5. Tools in experimental environments.