Application of Vegetation Indices for Agricultural Crop Yield Prediction Using Neural Network Techniques

Spatial variability in a crop field creates a need for precision agriculture. Geotechnology (remotely sensed images of the crop field, image processing, GIS modeling, and GPS usage), combined with data mining techniques for model development, offers an economical and rapid means of identifying spatial variability. Higher-end image processing techniques are used to improve precision. The goal of this paper was to investigate the strength of key spectral vegetation indices for agricultural crop yield prediction using neural network techniques. Four widely used spectral indices were investigated in a study of irrigated corn crop yields in the Oakes Irrigation Test Area research site of North Dakota, USA. These indices were: (a) red and near-infrared (NIR) based normalized difference vegetation index (NDVI), (b) green and NIR based green vegetation index (GVI), (c) red and NIR based soil adjusted vegetation index (SAVI), and (d) red and NIR based perpendicular vegetation index (PVI). These four indices were investigated for corn yield during 3 years (1998, 1999, and 2001) and for the pooled data of these 3 years. Initially, 16 back-propagation neural network (BPNN) models (4 indices × 4 datasets: the three individual years plus the pooled data) were developed to test the efficiency of the four vegetation indices in corn crop yield prediction. The corn yield was best predicted using BPNN models that used the means and standard deviations of PVI grid images. In all three years, these models provided higher prediction accuracies and coefficients of determination (r²) and lower standard errors of prediction than the models involving GVI, NDVI, and SAVI image information.
The GVI, NDVI, and SAVI models for all three years provided average testing prediction accuracies of 24.26% to 94.85%, 19.36% to 95.04%, and 19.24% to 95.04%, respectively, while the PVI models for all three years provided average testing prediction accuracies of 83.50% to 96.04%. The PVI pool model provided a better average testing prediction accuracy of 94%, compared with 89–93% for the other vegetation index pool models. Similarly, the PVI pool model provided a coefficient of determination (r²) of 0.45, compared with 0.31–0.37 for the other index models. Because PVI was chosen as the preferred index, a log10 data transformation was used to enhance the prediction ability of the PVI models for 1998, 1999, and 2001. Another model (Transformed PVI (Pool)) was developed using the log10-transformed PVI image information to show its global application. The transformed PVI models provided average corn yield prediction accuracies of 90%, 97%, and 98% for 1998, 1999, and 2001, respectively. The transformed PVI pool model provided an average testing accuracy of 93%, with an r² of 0.72 and a standard error of prediction of 0.05 t/ha.


Introduction
Achieving maximum crop yield at minimum cost is one of the goals of agricultural production. Early detection and management of problems associated with crop yield indicators can help increase yield and subsequent profit. Remote sensing and global positioning systems (GPS) can be used to assess spatial variability in crop yield [1]. The visible red, green, and blue bands and the near-infrared (NIR) region of the electromagnetic spectrum have been used successfully to monitor crop cover, crop health, soil moisture, nitrogen stress, and crop yield [2-11].
More recently, aerial images have been widely used for crop yield prediction before harvest [1,12,13]. These images can provide cloud-free information on the crop's spectral characteristics at high spatial resolution. Analysis of vegetation and detection of changes in vegetation patterns are important for natural resource management and monitoring, such as crop vigor analysis [14]. Healthy crops are characterized by strong absorption of red energy and strong reflectance of NIR energy [1]. The strong contrast between absorption in the red band and scattering in the near-infrared band can be combined into different quantitative indices of vegetation condition. These mathematical combinations are known as vegetation indices. Since the late 1980s, numerous studies (e.g., Funk and Budde) [1,3,8,12,13,15,16] have been conducted on crop growth analysis using the normalized difference vegetation index (NDVI) to support precision agriculture. Presently, site-specific crop management (SSCM), an important component of precision agriculture, is being pursued vigorously to increase production; it involves five main processes: spatial referencing, crop and climate monitoring, attribute mapping, decision support systems, and differential action. NDVI analysis is appropriate for large-area crop management, but precise SSCM warrants advanced image processing approaches such as higher-end vegetation indices. As noted above, slope-based vegetation indices (e.g., NDVI) are widely used in crop yield estimation [1,3,8,12,13,15,16]. Baez-Gonzalez et al. [3] used Landsat ETM+ (Enhanced Thematic Mapper Plus) data in an NDVI model to predict corn yield in Sinaloa, Mexico. They obtained an average error of 9.2% in corn yield prediction. Yang et al. [5] used the United States Department of Agriculture (USDA) EPIC model to predict crop yield for a part of China. They found that the difference between the statistical and simulated crop yields (using the EPIC model) was under 10%. Baez-Gonzalez et al. [8] modeled corn yield in Mexico with NDVI derived from NOAA Advanced Very High Resolution Radiometer (AVHRR) images. Their model accounted for 89% of the variability in yields under irrigated conditions and 76% under non-irrigated conditions.
Gopalapillai and Tian [12] obtained correlation coefficients ranging from 0.13 to 0.98 for predicting corn yield in a study of nine different fields in two different years. They used aerial images of corn plots under agriculturally controlled conditions and computed NDVI to model yield. The average correlation coefficient (r) between NDVI and yield across all nine fields was 0.54. Under agriculturally controlled conditions, crop production parameters such as fertilizer, irrigation, and pesticide application are tracked in test plots maintained by researchers. However, such controlled conditions are not to be expected under real farming scenarios. Senay et al. [13] obtained a very high coefficient of determination (0.99) between non-discrete corn yield values (five classes) and spectral information from the NIR band (800-890 nm) of an aerial image of a 9 ha field crop under controlled conditions. Plant et al. [9] obtained an R² of 0.65 when correlating cotton yield from a small research plot with NDVI. However, in general, farmers' crop fields are not under controlled conditions. It is essential to develop precision crop yield models using general field conditions and discrete crop yield information. Therefore, this study attempted to develop a corn crop yield estimation modeling technique using spectral information from the field. The models were developed using crop spectral data over several years. This SSCM precision agriculture study not only involves the widely used NDVI analysis but also explores the advantages of other vegetation indices, including a green vegetation index (GVI), a perpendicular vegetation index (PVI), and a soil adjusted vegetation index (SAVI).
NDVI [17,18] is determined using the red (R) and near-infrared (NIR) bands of a given image and is expressed as
$$\mathrm{NDVI} = \frac{\rho_{ir} - \rho_{r}}{\rho_{ir} + \rho_{r}} \qquad (1)$$
where $\rho_{r}$ and $\rho_{ir}$ are the spectral reflectances from the R- and NIR-band images, respectively. The green vegetation index (GVI) was determined using
$$\mathrm{GVI} = \frac{\rho_{ir} - \rho_{g}}{\rho_{ir} + \rho_{g}} \qquad (2)$$
where $\rho_{g}$ and $\rho_{ir}$ are the spectral reflectances from the G- and NIR-band images, respectively. Lecain et al. [19] established a direct relationship between GVI and grazed pasture green-up with the progression of the season. Todd and Hoffer [20] used GVI and NDVI to evaluate the effects of variations in soil texture and soil water content on vegetation cover and varying soil backgrounds. Therefore, based on their studies, we hypothesized that GVI could help in crop yield estimation and subsequent SSCM.
The main function of vegetation indices other than NDVI is to compensate for the effects of disturbing factors on the relationship between vegetation spectral reflectance and measured crop characteristics, such as crop type, leaf area index (LAI), or canopy biomass [21]. Undesirable disturbing factors include soil background and atmospheric conditions. Distance-based vegetation indices cancel or diminish the effect of soil brightness in cases where vegetation is sparse, i.e., where the pixels in the image combine vegetation and soil information [22]. PVI and SAVI are examples of distance-based vegetation indices [14].
SAVI tends to minimize the effect of soil brightness, as has been demonstrated by many researchers [23-25]. Huete [26] introduced a soil calibration factor into the NDVI equation to account for first-order soil-vegetation optical interactions. SAVI is a compromise between NDVI and PVI and is defined as
$$\mathrm{SAVI} = \frac{\rho_{ir} - \rho_{r}}{\rho_{ir} + \rho_{r} + L}\,(1 + L) \qquad (3)$$
where L is a constant that is a surrogate for LAI. Huete [26] defined the optimal adjustment factor as L = 0.25 for high vegetation density in the field, L = 0.5 for intermediate vegetation density, and L = 1 for low vegetation density. He suggested that SAVI (L = 0.5) successfully minimized the effect of soil variations in green vegetation compared to NDVI.
Casanova et al. [27] reiterated that PVI, which corrects for soil reflectance, had a more linear and less scattered relationship with the fraction of intercepted photosynthetically active radiation (fPAR) than NDVI. The PVI equation is expressed in terms of the vegetation reflectance in the R and NIR bands and the soil reflectance in the R and NIR bands, the latter obtained from the slope and intercept of the soil line [14]:
$$\mathrm{PVI} = \sqrt{(\rho_{r,s} - \rho_{r})^{2} + (\rho_{ir,s} - \rho_{ir})^{2}} \qquad (4)$$
where $\rho_{r,s}$ and $\rho_{ir,s}$ are the reflectances of the soil background in the R and NIR bands, respectively, and $\rho_{r}$ and $\rho_{ir}$ are the reflectances of vegetation in the R and NIR bands, respectively. PVI is thus the distance, obtained by the Pythagorean theorem, between the intersection point ($\rho_{ir,s}$, $\rho_{r,s}$) and the vegetation image pixel coordinate ($\rho_{ir}$, $\rho_{r}$).
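The four indices described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the IDRISI implementation used in the study: the function names are hypothetical, GVI is assumed to take the normalized-difference form in green and NIR, the soil line is assumed to be given as `nir = slope * red + intercept`, and denominators are assumed to be nonzero.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized difference of NIR and red reflectance."""
    nir, red = np.asarray(nir, dtype=float), np.asarray(red, dtype=float)
    return (nir - red) / (nir + red)

def gvi(nir, green):
    """Green vegetation index (ASSUMED normalized-difference form)."""
    nir, green = np.asarray(nir, dtype=float), np.asarray(green, dtype=float)
    return (nir - green) / (nir + green)

def savi(nir, red, L=0.5):
    """Soil adjusted vegetation index; L = 0.5 for intermediate canopy density."""
    nir, red = np.asarray(nir, dtype=float), np.asarray(red, dtype=float)
    return (1.0 + L) * (nir - red) / (nir + red + L)

def soil_line_foot(nir, red, slope, intercept):
    """Closest point on the soil line nir = slope*red + intercept to each pixel."""
    nir, red = np.asarray(nir, dtype=float), np.asarray(red, dtype=float)
    red_s = (red + slope * (nir - intercept)) / (1.0 + slope ** 2)
    nir_s = slope * red_s + intercept
    return nir_s, red_s

def pvi(nir, red, slope, intercept):
    """Perpendicular distance from each pixel to the soil line (Pythagorean form)."""
    nir, red = np.asarray(nir, dtype=float), np.asarray(red, dtype=float)
    nir_s, red_s = soil_line_foot(nir, red, slope, intercept)
    return np.sqrt((red_s - red) ** 2 + (nir_s - nir) ** 2)
```

For example, `pvi(0.5, 0.1, slope=1.0, intercept=0.0)` gives the distance of the pixel (red = 0.1, NIR = 0.5) from a hypothetical 1:1 soil line. The same functions accept whole band images as arrays.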
Each vegetation index (VI) provides information on vegetation vigor in the field. However, a comparison of broad-band and narrow-band red and NIR vegetation indices by Elvidge and Chen [28] suggested that distance-based vegetation indices have advantages over slope-based vegetation indices. They found that the predictive powers of SAVI and PVI were better than that of NDVI for LAI prediction and for estimation of the green cover percentage of vegetation. SAVI and PVI provided R² values of 0.87 and 0.86, respectively, for green cover estimation using narrow bands, while NDVI provided an R² of 0.59. With the AVHRR platform, SAVI and PVI provided R² values of 0.58 and 0.68, respectively, for green cover prediction, while NDVI provided an R² of 0.42. They also used Landsat MSS and Thematic Mapper (TM) broad-band images for vegetation green cover and LAI prediction using distance-based as well as slope-based vegetation indices. The distance-based vegetation indices (SAVI and PVI) performed better than NDVI. This finding was further supported by other researchers [29-31].
However, these studies were performed with low-resolution satellite images. Few studies have used distance-based vegetation indices (PVI and SAVI) for processing aerial images and differentiating vegetation vigor to estimate crop yield. Moreover, the above-mentioned study on the prediction of LAI and green cover using vegetation indices showed the potential of VIs (from an aerial image platform) for crop yield prediction. Thus, this study was designed to evaluate the use of slope-based vegetation indices (NDVI and GVI) and distance-based vegetation indices (SAVI and PVI) in corn crop yield estimation.
Vegetation index information is data intensive and correlates nonlinearly with spatially based crop yield. Therefore, a proper model building technique for crop yield prediction is essential. A general method previously used by researchers was statistical model building [1,12,13]. Black [32] suggested that the complexity of the total environment affects crop production. The dataset relating to crop yield is essentially nonlinear. Considering the nature of the data used for crop yield modeling, neural network (NN) modeling techniques, which are analogous to the information processing methods used in the human brain, can be a better substitute. NNs have the ability to compute, process, predict, and classify data. The approach has the advantages of nonlinearity, input-output mapping, adaptivity, generalization, and fault tolerance [33]. An NN functions as a massively parallel distributed processor that has a natural propensity for storing experiential knowledge, acquired through a process known as training, and making it available for use in prediction [33]. Through learning procedures, artificial neural networks (ANNs) have the power to approximate any nonlinear relationship that exists between a set of inputs and their corresponding set of outputs [34]. Zhuang and Engel and Ranaweera et al. [35,36] provided evidence of the advantages of NN modeling techniques over statistical methods for nonlinear data modeling, including reduced collinearity problems and greater model flexibility. The back-propagation NN (BPNN) modeling technique was selected for yield prediction in our study. Moshou et al. [37] used self-organizing map (SOM) NNs to classify crops and different types of weeds using spectral reflectance measurements. They demonstrated its superiority over results obtained with a statistical optimal Bayesian classifier.
In this study, we used NDVI, GVI, SAVI, and PVI measurements from aerial images to develop models for predicting corn crop yield before harvest that can be used in general crop production. We used BPNN and other data mining techniques to create the yield prediction models. The models could help farmers estimate spatial variability in crop yield using aerial images from the mid-crop season.
The objective of this study is to develop and evaluate BPNN models for predicting crop yield using vegetation index information.

Study Area and Aerial Images Acquisition
The research site is located in Oakes, North Dakota, USA. The Oakes Irrigation Test Area (OITA) consists of several quadrants. Each quadrant of the research site was irrigated using a center-pivot irrigation system. Aerial images of the test area were acquired from several quadrants under differing cropping patterns (Figure 1, Table 1) (65 ha with corn) of the OITA site for three different years: 1998, 1999, and 2001. We used the best-date images from four different quarters (NW29, SW16, SW03, and NW15) for 1998, five different quarters (NW15, SW03, NW22, SW16, and SE16) for 1999, and three different quarters (NW29 and NW22) for 2001 (Table 1 and Figure 1). These sites, with diversified crop production features (different nitrogen application rates, irrigation amounts, soil moisture and texture conditions, etc.), were selected to allow development of the crop yield prediction models. The quarters (65 ha) chosen for training and testing the crop yield prediction models were different from each other and separated by several kilometers (Figure 1). Each quarter (65 ha) measures 805 m × 805 m. The training and testing samples were selected to include the most variation in field conditions and crop production parameters (Table 1), so that the model could be used under different crop production conditions. Figure 1 shows the field names and their positions.
The imaging system used for aerial image acquisition was a 35 mm SLR camera with either 100 or 200 ASA Ektachrome slide film. The slides were scanned with a Nikon scanner (Nikon Inc., Digital Imaging, Melville, NY, USA) at a resolution of 2,800 dpi (dots per inch). The aerial images were saved in 8-bit TIFF format. The nominal ground resolution of the images was 60 cm × 60 cm. The raw images acquired from the airplane were not geometrically corrected. The images were georeferenced in ArcGIS 9.2 (ESRI, Redlands, CA, USA) with reference to ground control points acquired with a Trimble (Sunnyvale, CA, USA) GeoXT GPS instrument. The geographical coordinates of each square quadrant corner were used for georeferencing. The images covered a broad range of the visible and NIR spectrum, from 400 to 900 nm. Images of the study area were acquired within 1 hour of solar noon (Central Standard Time). Color calibration of the aerial images from different years was performed with respect to red, green, white, and black sheets placed in the field at the time of image acquisition. The camera was calibrated to an ideal standard in the laboratory before each use to reduce aberrations in image gray levels. All images were acquired under cloud-free conditions. Other image acquisition parameters, such as flight height and camera settings (aperture and speed), were kept the same for all image acquisition dates. The images in all three years (1998, 1999, and 2001) were acquired in a narrow window during the growing season. Therefore, it was assumed that the effect of radiometric aberrations in the images was minimal for this study.

Extraction of Vegetation Indices
Our approach was to use various vegetation indices as input parameters for crop yield prediction models. Slope-based vegetation indices (NDVI and GVI) and distance-based vegetation indices (SAVI and PVI) were used. SAVI was also selected for determination of vegetation vigor. Unlike slope-based vegetation indices, PVI and SAVI minimize the soil brightness effect on the image [38]. Our objective was to minimize the effect of bare soil on the image of the study area. Based on our observations, we considered the canopy cover of the corn crop in the field to be intermediately dense during the aerial image acquisition period in 1998, 1999, and 2001. Thus, L = 0.5 was used, following the Huete [25] strategy for selecting the L factor, which was also supported by Thiam and Eastman [14]. PVI, NDVI, GVI, and SAVI for each image were calculated for the three years (1998, 1999, and 2001) using the IDRISI (IDRISI Production, Worcester, MA, USA) environment.
Initially, the NIR- and R-band images were processed using IDRISI 32.2 software. Separate NDVI, GVI, SAVI, and PVI images were then obtained using Equations 1, 2, 3, and 4, respectively. Each approximately 65-hectare (805 m × 805 m) image was divided into 100 grid images of 75 m × 75 m for analysis of the crop yield pattern, based on the different imagery information. These grid images were obtained after discarding a 25 m wide strip along the edges of the entire quarter (65 ha) image, for which crop yield information was unavailable. The individual grid images were used as independent observations for building the crop yield estimation models. In some quarters, yield information was available for only a few of the grid images, and these were used in the study. Figure 2 shows the extracted grid image positions in the aerial image of one of the quarters (NW29) as an example.
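The grid extraction described above can be sketched as follows. This is a hedged illustration, not the authors' IDRISI workflow: it assumes the VI image is a NumPy array at the nominal 0.6 m resolution, so that a 75 m cell is 125 pixels and the 25 m edge strip is about 42 pixels. The function name `grid_stats`, its parameters, and the use of the population standard deviation are assumptions made for the example.

```python
import numpy as np

def grid_stats(vi_image, n_grid=10, cell_px=125, border_px=42):
    """Return (means, stds), each n_grid x n_grid, for square cells of a VI image.

    border_px pixels are skipped at the top and left edges before the
    n_grid * cell_px core is cut out; anything beyond the core is ignored.
    """
    img = np.asarray(vi_image, dtype=float)
    side = n_grid * cell_px
    core = img[border_px:border_px + side, border_px:border_px + side]
    if core.shape != (side, side):
        raise ValueError("image too small for the requested grid layout")
    # Reshape so that each cell gets its own pair of trailing axes.
    cells = core.reshape(n_grid, cell_px, n_grid, cell_px).swapaxes(1, 2)
    means = cells.mean(axis=(2, 3))
    stds = cells.std(axis=(2, 3))  # population std (ddof=0), an assumption
    return means, stds
```

Each (mean, std) pair then corresponds to the two inputs of one grid observation in the yield models.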

Yield Data Collection
The actual grid-based yield was recorded, along with longitude and latitude, by the yield monitor at 6 m intervals using the 'Microtrack' program (Micro Track Software Corporation, Wyomissing, PA, USA). A combine harvester running the 'Microtrack' program measured the corn yield at each 6 m interval during harvest. The yield was recorded in a spatial format and stored in the memory of the software. A Trimble differential global positioning system (DGPS) was used while recording the yield data, which kept the GPS positioning error to a few centimeters. The resulting file was downloaded into the Surfer 7.0 program (Scientific Software Corp., Sandy, UT, USA). Using the 3-dimensional information (longitude, latitude, and yield), the dataset was interpolated to crop yield values at 3 m intervals using the kriging geostatistics option. An MS Visual Basic (Microsoft Corporation, Washington) program was written to calculate the average yield for each 75 m × 75 m plot (grid image) using the average sampling algorithm, expressed as
$$Y_{GP} = \frac{1}{n}\sum_{i=1}^{n} X_{i} \qquad (5)$$
where $Y_{GP}$ is the average crop yield from the individual plot (corresponding to a grid image), $X_{i}$ is the yield from each individual 3 m grid within the 75 m × 75 m plot, and n is the total number of individual 3 m grids in the plot. Spatial coordinates of the grid-based yield were used to match the yield to the grid images. Actual yield from the field was used as the output neuron for the back-propagation neural network (BPNN) model.
Data mining was an important aspect of dataset preparation for crop yield estimation. Sometimes, due to inadequate data collection, data input errors, and/or equipment malfunction, some values are not recorded in a dataset. These missing values can be filled using the 'attribute mean', the most probable value, or a global constant [39]. In a classification problem, a missing value is filled with the mean of the group to which it belongs. In our study, the yield matrix collected from the field had some missing data points. We therefore used the average 8-neighborhood technique, in which each missing value was filled with the mean of its eight adjacent grid values. If the missing value was on an edge, where one row or column of three values was unavailable, we used the mean of the five adjacent values.
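The plot averaging and the 8-neighborhood fill can be illustrated with a short NumPy sketch. It is not the authors' Visual Basic program: the function names are hypothetical, missing values are assumed to be stored as NaN, and neighbors that are themselves missing are assumed to be skipped when averaging.

```python
import numpy as np

def plot_average(fine_grid, block=25):
    """Average a fine yield grid over block x block cells.

    With 3 m cells, block=25 corresponds to one 75 m x 75 m plot.
    Rows and columns that do not fill a whole block are dropped.
    """
    g = np.asarray(fine_grid, dtype=float)
    rows, cols = g.shape[0] // block, g.shape[1] // block
    g = g[:rows * block, :cols * block]
    return g.reshape(rows, block, cols, block).mean(axis=(1, 3))

def fill_missing_with_neighbors(yield_grid):
    """Replace each NaN with the mean of its available adjacent cells.

    Interior cells have up to eight neighbors, edge cells five and
    corner cells three, because the 3 x 3 window is clipped at the border.
    """
    grid = np.asarray(yield_grid, dtype=float)
    filled = grid.copy()
    for r, c in zip(*np.where(np.isnan(grid))):
        window = grid[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
        if np.isnan(window).all():
            continue  # no usable neighbor; leave the gap as it is
        filled[r, c] = np.nanmean(window)  # the NaN center is ignored
    return filled
```

Neighbors are always read from the original grid, so the result does not depend on the order in which gaps are visited.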

Input-output architecture of the NN model
The BPNN architecture was chosen to develop a predictive model for yield estimation. For four vegetation indices (NDVI, GVI, PVI, and SAVI) and three separate years (1998, 1999, and 2001), a total of 12 NN models were developed. These models used the mean and standard deviation of each VI grid image as inputs and the actual yield as the output. In these models, the input and output parameters were not transformed, i.e., the actual gray value statistics obtained from image processing were used.

Data preprocessing/mining of input parameters
The original input parameters had different ranges in different VI-NN models. For example, the SAVI mean values ranged from −1.5 to +1.5. NDVI and GVI mean values ranged from −1 to +1, while their corresponding standard deviations were less than one. However, the PVI mean values covered a much wider range (0.05 to 250), while their standard deviations were low. Therefore, data preprocessing was essential to standardize the digital information. For the initial evaluation of the individual VI models, however, the data were used in their original form without any preprocessing.
The best individual VI model for corn crop yield estimation was then chosen. The individual PVI model performed better than the other individual VI models when built on the unprocessed dataset. Therefore, a data transformation technique was employed to further enhance its yield prediction results. Because the PVI mean values ranged from 0.05 to 250, they were transformed using the log10 transformation. The transformed PVI mean values ranged from −1.3 to 2.4. Thus, three new models (Processed PVI) using the log10 transformation data preprocessing technique were obtained for 1998, 1999, and 2001. Moreover, the min-max scaling algorithm (Equation 6) was used in the Neural Works Professional II Plus software (Neural Ware, Carnegie, PA, USA) to bring the data to a range of −1 to 1. The min-max equation is
$$X_{n} = 2\,\frac{X_{i} - X_{min}}{X_{max} - X_{min}} - 1 \qquad (6)$$
where $X_{n}$ is the scaled value of X, $X_{i}$ is the input variable X for the i-th training case, and $X_{max}$ and $X_{min}$ are the maximum and minimum values of the input variable, respectively.
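As a rough illustration of the two preprocessing steps, the sketch below applies a log10 transform and then a linear min-max scaling to the range −1 to 1. The function names are hypothetical, and the linear form of the scaling is an assumption based on the stated target range, not a transcription of the Neural Works implementation.

```python
import numpy as np

def log10_transform(values):
    """Base-10 logarithm of strictly positive inputs (e.g., PVI means)."""
    return np.log10(np.asarray(values, dtype=float))

def minmax_scale(values, low=-1.0, high=1.0):
    """Linearly map values so the minimum becomes `low` and the maximum `high`."""
    x = np.asarray(values, dtype=float)
    x_min, x_max = x.min(), x.max()
    return low + (high - low) * (x - x_min) / (x_max - x_min)

# The reported PVI mean range of 0.05 to 250 maps to roughly -1.3 to 2.4.
pvi_range_log = log10_transform([0.05, 250.0])
```

In practice the minimum and maximum would be taken from the training cases and reused for the testing cases, so that both sets share one scale.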

Processing of output parameters
The actual grid plot corn yields used as the output neurons for the BPNN models ranged from 2.52 t/ha to 18.77 t/ha. It was postulated that the model prediction accuracy might improve if the range of the actual yield data were reduced using a data transformation technique. The same log10 data transformation was therefore also applied to the output dataset in the PVI model in anticipation of increased prediction ability. This output transformation was not carried out during the preliminary model development process.
Thus, initially twelve BPNN yield prediction models (Table 2) were created, one for each vegetation index in each of the years 1998, 1999, and 2001, using the original VI information as input and the actual non-transformed yield data as output. Four more BPNN yield prediction models (Table 2) were later created using randomly selected training and testing data from the combined pool of 362 grid plots for all three years. The training data comprised 287 grid plots and the testing data comprised 75 grid plots. Individual models were compared with each other based on their prediction ability, using the yearly model information. In the best resulting VI model (the PVI model), where the output yield was transformed using the log10 technique, the predicted yield was obtained in the same log10 form. Therefore, it was necessary to transform the predicted output back to its original units (t/ha). This was done by taking the antilog (10^x) of the predicted output yield for each grid plot image.
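The random split of the pooled grid plots and the antilog back-transformation can be sketched as below. The paper does not describe how the random selection was drawn, so the seeded NumPy permutation and the function names are assumptions for illustration only.

```python
import numpy as np

def split_pool(n_total=362, n_test=75, seed=0):
    """Randomly split plot indices into training and testing sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_total)
    return order[n_test:], order[:n_test]  # (train indices, test indices)

def back_transform(log_yield):
    """Undo a log10 transform: return 10**x in the original units (t/ha)."""
    return np.power(10.0, np.asarray(log_yield, dtype=float))

train_idx, test_idx = split_pool()
```

With the default arguments this yields 287 training and 75 testing indices, matching the counts reported above.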

Neural network model development and evaluation
BPNN is a multilayer perceptron (MLP) network that takes its name from the way it handles errors. Back-propagation solves the problem of assigning the prediction error to the responsible inputs by propagating the output error backward through the network [40]. This process is repeated until the input layer is reached with a model of minimum possible error [40].
A typical BPNN model consists of an input layer, hidden layer(s), and an output layer, as shown in Figure 4. The BPNN algorithm is well known and widely published in books, journal papers, and neural network software manuals [33,42-44]. For this study, the Neural Works Professional II Plus (Neural Ware, Carnegie, PA, USA) software was used to develop and test the BPNN. In the NN software, the error is back-propagated to the hidden layer and subsequently to the input layer with each iteration. The weights connecting the input neurons (processing elements in the NN) to the hidden neurons are then adjusted to establish a better correlation between the input neurons and the actual output. Training stops at the point where the lowest error is obtained. The comparison of model prediction ability was performed on optimized models that had optimal hidden layers, hidden nodes, learning rate, momentum rate, and epochs. The step-by-step optimization procedure [41] was used to optimize the models.
The BPNN model performances were evaluated based on root mean square error (RMSE), prediction accuracy, and standard error of prediction (SEP). In addition, the correlation coefficient (r) between the actual and predicted output, along with the slope and intercept of the linear regression model, was used. The equation for RMSE is
$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{\frac{\mathrm{SSE}}{n - p}} \qquad (7)$$
where n is the number of observations, p is the number of parameters to be estimated, and SSE and MSE are the sum of squared errors and the mean square error, respectively. Average test prediction accuracy is calculated based on Equation 8:
$$\text{Accuracy (\%)} = \frac{100}{N}\sum_{i=1}^{N}\left(1 - \frac{\left|OP_{A,i} - OP_{P,i}\right|}{OP_{A,i}}\right) \qquad (8)$$
where N is the total number of observations and $OP_{A}$ and $OP_{P}$ are the actual and predicted output, respectively.
A C++ program (MS Visual C++, Microsoft Corporation, Bellevue, WA, USA) was developed to determine, from the back-propagation neural network results, the predicted yield accuracy; the correlation coefficient (r), intercept (a), and slope (β) between actual and predicted yield; and the SEP. The regression of actual on predicted output used the linear equation
$$Y = a + \beta X \qquad (9)$$
where X and Y are the predicted and actual output, respectively, β is the slope, and a is the intercept. The SEP of the predictive model is calculated using the following equation [42]:
$$\mathrm{SEP} = \sqrt{\frac{\sum_{i=1}^{n}\left(d_{i} - d_{m}\right)^{2}}{n - 1}} \qquad (10)$$
where $d_{i} = Y_{i} - X_{i}$ is the difference between the actual and predicted values for the i-th observation, $d_{m}$ is the mean of these differences, and n is the total number of observations.
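The evaluation statistics can be sketched in Python as follows. This is not the authors' C++ program. It assumes RMSE = sqrt(SSE / (n − p)), accuracy as 100 times the mean of (1 − |actual − predicted| / actual), a least-squares regression of actual on predicted values, and SEP as the sample standard deviation of the differences. The function name `evaluate` and its `n_params` argument are hypothetical.

```python
import numpy as np

def evaluate(actual, predicted, n_params=1):
    """Return a dict of RMSE, accuracy (%), r, slope, intercept and SEP."""
    y = np.asarray(actual, dtype=float)      # Y: actual output
    x = np.asarray(predicted, dtype=float)   # X: predicted output
    n = y.size
    d = y - x                                # per-observation difference
    sse = np.sum(d ** 2)
    rmse = np.sqrt(sse / (n - n_params))     # ASSUMED denominator n - p
    accuracy = 100.0 * np.mean(1.0 - np.abs(d) / y)  # requires actual != 0
    slope, intercept = np.polyfit(x, y, 1)   # fits Y = intercept + slope * X
    r = np.corrcoef(x, y)[0, 1]
    sep = np.sqrt(np.sum((d - d.mean()) ** 2) / (n - 1))
    return {"rmse": rmse, "accuracy": accuracy, "r": r,
            "slope": slope, "intercept": intercept, "sep": sep}
```

A call such as `evaluate(actual_yield, predicted_yield)["accuracy"]` would then give the average prediction accuracy in percent under these assumptions.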

Best Date Image Selection for the Study
Images from the middle of the cropping season (second half of July or first half of August) of corn in each year correlated best with yield. This was based on our earlier study [43] on the temporal analysis of images from different cropping periods with respect to their ability to predict crop yield. Thus, we used visible and infrared band aerial images acquired on July 30, July 24, and August 2 for 1998, 1999, and 2001, respectively.

VI image Analyses and Soil Line Information for PVI Analysis
Four vegetation index images were created using the grid images of the quarters for 1998, 1999, and 2001. The VI images were created from the R-, G- (only in the case of GVI), and IR-band image information using IDRISI. Examples of the four vegetation index images of a single quarter (SW16) for 1999 are shown in Figures 5, 6, 7, and 8. The vegetation vigor (strength) of the quarter could be established from these VI images by visual analysis. The non-irrigated corners of the quadrants outside the irrigation pivot line were well differentiated from the irrigated area. However, in the irrigated portion of the field there was less variation in vegetation vigor, since a corn crop with similar crop production inputs shows uniform vigor during the period considered in the study. There were only a few roads, walking tracks, and a farmstead (in the bottom left of the quadrant) in the crop field. These features were well differentiated in the vegetation index color images. The change in vegetation density (refer to the index scale at the side) could also be observed visually in the color images.

Performance Evaluation of VI-BPNN Models
Sixteen models using individual VI data (the means and standard deviations of NDVI, GVI, PVI, and SAVI) for 1998, 1999, 2001, and the pooled data were optimized using a step-by-step approach. For the optimization of the model architecture, a fan-in approach was followed, i.e., the number of hidden nodes was less than the number of input neurons. For each individual VI model, the neural network architecture was 2-1-1 for all three years. The 2-1-1 architecture means that the back-propagation neural network had two input neurons, one hidden node in a single hidden layer, and one output neuron (corn yield). The modeling simulation was started with initial network parameters of learning rate 0.5, momentum term 0.5, and 20,000 epochs (iterations). The optimum network parameters for each network are provided in Table 2. Table 2 also compares the prediction ability of each model using individual VI information for the three years and the pool.
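To make the 2-1-1 architecture concrete, the sketch below trains a network with two inputs, one hidden node, and one output by back-propagation with a momentum term. It is an illustrative NumPy version, not the Neural Works Professional II Plus implementation: the tanh hidden activation, linear output, full-batch updates, weight initialization, and function names are all assumptions, and inputs and outputs are assumed to be pre-scaled to roughly −1 to 1.

```python
import numpy as np

def train_bpnn(X, y, lr=0.5, momentum=0.5, epochs=20000, seed=0):
    """Train a 2-1-1 style network; returns [W1, b1, W2, b2]."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    n_in, n_hidden = X.shape[1], 1
    W1 = rng.uniform(-0.1, 0.1, (n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.uniform(-0.1, 0.1, (n_hidden, 1))
    b2 = np.zeros(1)
    params = [W1, b1, W2, b2]
    velocity = [np.zeros_like(p) for p in params]
    for _ in range(epochs):
        # Forward pass
        h = np.tanh(X @ W1 + b1)
        out = h @ W2 + b2
        # Backward pass: gradients of 0.5 * mean squared error
        g_out = (out - y) / len(X)
        g_W2 = h.T @ g_out
        g_b2 = g_out.sum(axis=0)
        g_h = (g_out @ W2.T) * (1.0 - h ** 2)
        g_W1 = X.T @ g_h
        g_b1 = g_h.sum(axis=0)
        # Momentum update, applied in place so W1..b2 stay current
        for p, v, g in zip(params, velocity, [g_W1, g_b1, g_W2, g_b2]):
            v *= momentum
            v -= lr * g
            p += v
    return params

def predict(params, X):
    """Forward pass of the trained network; returns a 1-D array."""
    W1, b1, W2, b2 = params
    h = np.tanh(np.asarray(X, dtype=float) @ W1 + b1)
    return (h @ W2 + b2).ravel()
```

Here `X` would hold the scaled mean and standard deviation of each VI grid image, and `y` the scaled yield of the corresponding plot.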

Individual VI Models
For 1998, the model with PVI 98 data performed better than the other VI models, with a yield prediction accuracy of 83% and a coefficient of determination (r²) between actual and predicted output of 0.69 (Table 2). For 1999, however, the SAVI 99 model provided the highest average prediction accuracy of 95%, with an r² of 0.53 between actual and predicted corn yield. The PVI 99 model predicted corn yield with an average accuracy of 93% and an r² of 0.25. The PVI 01 model was the best among the individual VI models for 2001, with an average prediction accuracy of 96% (Table 2). The other individual VI models for 2001 (SAVI 01, NDVI 01, and GVI 01) predicted with comparable average accuracies (94%-95%). Among the pool models, the PVI (Pool) model was the best, providing a 94% average testing prediction accuracy and an r² of 0.45.
The results showed that the use of a distance-based vegetation index, especially PVI, which diminishes the bare soil reflectance factor in the image, was the better procedure for crop yield prediction. It should be noted that the individual PVI models of each year had a very high range of input variability (mean ranging from 0.05 to 250) compared to the small-range digital data used in the other individual VI models. Despite this, the individual PVI models were found to be the better models because of their consistent results. SAVI, being a distance-based vegetation index, also provided high yield prediction ability for 1999 and 2001. For 1998, however, its average prediction accuracy was very low. Therefore, the SAVI model could not provide consistent yield prediction. One reason for this could be our assumption of a fixed L = 0.5 in the calculation of SAVI. Both distance-based vegetation indices, SAVI and PVI, were found to be better inputs for the yield prediction models than the slope-based vegetation indices (NDVI and GVI). GVI models provided high corn yield prediction accuracies of 93% and 94% in 1999 and 2001, respectively. The r² values were 0.53 and 0.30 for the respective models. The GVI (Pool) model also provided a high testing prediction accuracy of 92% and an r² of 0.37. However, in 1998, the GVI model performed poorly, with an average yield prediction accuracy of only 24%. Therefore, GVI models did not show consistently good performance over these three years. On the other hand, the PVI models showed consistency and comparatively better prediction accuracies. This finding supports the findings of Elvidge and Chen [28], who compared vegetation indices on the basis of predicting LAI and green cover. Their findings indicated that predictions of LAI and green cover using SAVI and PVI, based on any platform and band range, were equal to each other and better than those using NDVI.
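To make the role of the fixed L = 0.5 and of the soil line concrete, the sketch below uses the commonly cited textbook forms of NDVI, SAVI, and PVI. These forms, the function names, and the soil-line parameters (soil_slope, soil_intercept) are assumptions for illustration; the exact expressions and soil-line values used in this study are not restated in this section, and GVI is omitted for that reason. The helper grid_stats mirrors the two model inputs (mean and standard deviation of a grid image).

```python
import numpy as np


def ndvi(nir, red):
    """Normalized difference vegetation index (standard form)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red)


def savi(nir, red, L=0.5):
    """Soil adjusted vegetation index with soil adjustment factor L."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (1.0 + L) * (nir - red) / (nir + red + L)


def pvi(nir, red, soil_slope, soil_intercept):
    """Perpendicular distance of a pixel from the soil line
    NIR = soil_slope * red + soil_intercept (hypothetical parameters)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - soil_slope * red - soil_intercept) / np.sqrt(1.0 + soil_slope**2)


def grid_stats(index_image):
    """Mean and standard deviation of one grid image, i.e. the two BPNN inputs."""
    values = np.asarray(index_image, dtype=float)
    return float(values.mean()), float(values.std())


if __name__ == "__main__":
    # Synthetic band values only; these are not data from the study.
    nir = np.array([[0.50, 0.45], [0.40, 0.55]])
    red = np.array([[0.10, 0.12], [0.15, 0.08]])
    print(grid_stats(ndvi(nir, red)))
    print(grid_stats(savi(nir, red)))
    print(grid_stats(pvi(nir, red, soil_slope=1.0, soil_intercept=0.0)))
```

In this form, a pixel lying exactly on the soil line has a PVI of zero, which is how the index suppresses the bare soil contribution. SAVI with L = 0 reduces to NDVI, so the choice of L determines how much soil correction is applied.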
Graphical comparisons of the predicted and actual yields using different VI models are shown in Figures 9, 10, 11, and 12 for 1998, 1999, 2001, and pool, respectively. The PVI models predicted better in two different years (1998 and 2001) than all other individual VI models, based on higher average prediction accuracies. The PVI model did not outperform the others for 1999, but its prediction ability was almost the same as that of the other individual VI models. It is important to mention that crop yield was erratic in 1998 across different quadrants due to a late-season pest attack. Most of the other VI models, unlike the PVI model, could not predict the crop yield competently. Hence, the superiority of the PVI model was clearly established for that year. Therefore, the PVI model for each year was chosen for additional performance enhancement using the log10 transformation technique. Again, the inconsistency in corn yield predictability between years arose because crop yield is a very complex phenomenon that depends on many other non-imagery factors, such as late-season pest attack, nitrogen stress, or water scarcity. These factors were not reflected in the aerial image taken in the later part of July in each year, nor were they considered in developing the models reported in this paper. It is postulated that very high performance in crop yield prediction could be obtained if the crop production parameters remained consistent throughout the season until the crop is harvested. The year 1998 is the perfect example of this.

Transformed BPNN PVI Models
The 1998 model (transformed PVI 98) was optimized with a 2-1-1 network architecture and network parameters of learning rate (0.1), momentum term (0.1), and 50,000 epochs. An average yield prediction accuracy of 90% (on the testing dataset) was obtained. It was almost 7% higher than the prediction accuracy (83%) of the earlier model that used the data in their original form (PVI 98 model) (Table 3). In both cases, the obtained r² was 0.69 (Tables 2 and 3). This result showed the usefulness of data transformation in NN modeling for increasing the prediction accuracy over that of the earlier models, which used the data in their original form. An average yield prediction accuracy of 97% was obtained for the transformed PVI 99 model. This accuracy was 1-6% higher than those obtained by all individual VI models of the same year, including the PVI 99 model. The r² obtained for the transformed PVI 99 model was only 0.20 (Table 3).
The transformed PVI 01 model provided an average yield prediction accuracy of 98%, which was a 2% increase over that given by the earlier PVI 01 model. The model r² was 0.21 (Table 3), an increase of 0.12 from the non-transformed PVI 01 model (Table 2). Average yield prediction accuracies from the transformed PVI models were greater than those obtained from the other individual VI models that used the original data without transformation.
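The log10 transformation and its inverse can be sketched as follows. This is a minimal illustration, assuming strictly positive PVI means and yields; the function names are hypothetical, and the only numbers used are the 0.05 and 250 endpoints of the PVI mean range reported above.

```python
import numpy as np


def to_log10(values):
    """Log10-transform strictly positive values (e.g. PVI means or yields)."""
    values = np.asarray(values, dtype=float)
    if np.any(values <= 0):
        raise ValueError("log10 transformation requires strictly positive values")
    return np.log10(values)


def from_log10(log_values):
    """Anti-log: return log10-scale values to the original units."""
    return np.power(10.0, np.asarray(log_values, dtype=float))


if __name__ == "__main__":
    pvi_range = np.array([0.05, 250.0])      # endpoints quoted in the text
    print(to_log10(pvi_range))               # compressed range on the log scale
    print(from_log10(to_log10(pvi_range)))   # round trip to original units
```

On the log scale, the reported PVI mean range of 0.05 to 250 shrinks to roughly -1.3 to 2.4, which places the network input on a scale much closer to that of the other indices.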
The linear regression analysis of the anti-log predicted (original format) and actual test plot yields provided the same linear regression parameters as the log10-transformed models (1998, 1999, and 2001) (Table 3). Figures 13, 14, and 15 compare the predicted versus actual yield using the anti-log transformation for 1998, 1999, and 2001. The errors of prediction are also shown in these figures. In the transformed PVI 98 model, five discrete yield data points out of 32 testing data had absolute variations of approximately 4 t/ha. Predicted yield variations for the other grid plots were from 0-2 t/ha. However, the variations of yield prediction from actual yield were very low (0-1 t/ha) for most of the test grid plots (21) used in the transformed PVI 99 model. The errors in corn crop yield prediction were −2 to −3 t/ha. In 2001, the transformed PVI model had generally lower predicted yield variation than the actual yield (0-1 t/ha). Out of the 30 test grid plots, the yield prediction varied from the actual (corn) yield by −2 t/ha for only one individual grid plot. The pool model, which used randomly selected training and testing data over three years (1998, 1999, and 2001), provided an r² value of 0.72 with an average prediction accuracy of 93.05%, which was lower than those of the transformed PVI models of 1999 and 2001 but better than that of 1998 (Table 3). The SEP obtained from the model was only 0.05 t/ha (Table 3). Figure 16 provides a comparative graph of actual and predicted yields for those 75 testing data points. It includes the error in prediction values in t/ha. Again, the difference in prediction ability in various years using the BPNN could be attributed to the varying crop management factors (non-imagery), such as soil quality, soil and air temperature, applied nitrogen quantity, ground elevation, available volumetric water content, diseases, etc. Many of these factors were not represented in the mid-crop season image information used for the development of the model.
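The evaluation quantities reported above (average prediction accuracy, r², slope and intercept of the linear fit, and SEP) can be computed along the following lines. The definitions here are assumptions, since this section does not restate them: accuracy is taken as 100 × (1 − |actual − predicted| / actual) averaged over the test plots, the linear fit regresses predicted on actual yield, and SEP is taken as the root mean squared prediction error. The function name evaluate_predictions is hypothetical.

```python
import numpy as np


def evaluate_predictions(actual, predicted):
    """Summary statistics for a set of test plots (assumed definitions)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)

    # Average prediction accuracy in percent (assumed definition).
    accuracy = 100.0 * np.mean(1.0 - np.abs(actual - predicted) / actual)

    # Linear fit predicted = slope * actual + intercept (assumed direction).
    slope, intercept = np.polyfit(actual, predicted, 1)

    # Coefficient of determination of that linear fit.
    r = np.corrcoef(actual, predicted)[0, 1]
    r2 = r**2

    # Standard error of prediction, taken here as root mean squared error.
    sep = np.sqrt(np.mean((actual - predicted) ** 2))

    return {
        "accuracy_pct": float(accuracy),
        "slope": float(slope),
        "intercept": float(intercept),
        "r2": float(r2),
        "sep": float(sep),
    }


if __name__ == "__main__":
    # Synthetic yields in t/ha; these are not data from the study.
    stats = evaluate_predictions([10.0, 8.0, 12.0], [9.0, 8.0, 12.6])
    for name, value in stats.items():
        print(name, round(value, 3))
```

Under these definitions, a high average accuracy can coexist with a low r², as reported for the 1999 and 2001 models: accuracy rewards predictions that are close to the actual yield on average, whereas r² measures how well the predictions track plot-to-plot variation.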

Conclusions
Four widely used spectral indices (GVI, NDVI, PVI, and SAVI) were investigated in the study of irrigated corn crop yield estimation. PVI, a distance-based VI technique, was found to be better than the other individual VI techniques for yield prediction of corn, as it reduced the interference caused by the bare soil information present in the aerial image. PVI-based models provided average corn yield prediction accuracies of 83.5%, 93%, and 96% in 1998, 1999, and 2001, respectively. These accuracies were about 59 to 64% higher, −2 to 2% higher, and 1 to 2% higher than those of the other prediction models in 1998, 1999, and 2001, respectively. Data transformation using the log10 procedure with PVI mean and actual corn yield increased the prediction accuracy by almost 7% for 1998. With this technique, testing prediction accuracies of 97% and 98% with the log10-transformed yield were obtained for 1999 and 2001, respectively. The PVI pool model was developed using randomly selected training and testing data from all three years. The transformed PVI pool model provided an average testing accuracy of 93% along with a coefficient of determination (r²) of 0.72 and a standard error of prediction of 0.05 t/ha. The study supports the use of crop images for yield estimation. The main and unique contribution of this study to precision agriculture was to show that distance-based vegetation indices like PVI can help improve crop yield prediction results. This research also verified the utility of NN application and data transformation as tools for crop yield prediction with high accuracies.

Figure 1. The false color composite (FCC) aerial image of the Oakes area in 1998 showing quarter pattern used for this study.

Figure 2. Aerial image of the OITA NW29 quarter on July 30, 1998, showing grid image pattern.

Figure 3. Schematic of working procedure for imagery input parameter extraction for BPNN model.

Figure 6. Normalized difference vegetation index (NDVI) image of the OITA SW16 quarter of 1999.

Figure 7. Soil adjusted vegetation index (SAVI) image of the OITA SW16 quarter of 1999.

Figure 9. Comparison of VI-NN model prediction ability for the year 1998.

Figure 10. Comparison of VI-NN model prediction ability for the year 1999.

Figure 11. Comparison of VI-NN model prediction ability for the year 2001.

Figure 12. Comparison of VI-NN model prediction ability for pool data.

Figure 13. Comparison of actual versus predicted corn yield using testing dataset of the optimal prediction model of the year 1998 (Transformed PVI 98).

Figure 14. Comparison of actual versus predicted corn yield using testing dataset of the optimal prediction model of the year 1999 (Transformed PVI 99).

Figure 15. Comparison of actual versus predicted corn yield using testing dataset of the optimal prediction model of the year 2001 (Transformed PVI 01).

Figure 16. Comparison of actual versus predicted corn yield using testing dataset selected from all three years (1998, 1999, and 2001) pool data (Transformed PVI (Pool)).

Table 1. List of datasets used for hybrid BPNN yield prediction models of various years.
* 29NW is the name of the quadrant which is in the NW corner of the Oakes Irrigation Test Area plot number 29.

Table 2. The performance of hybrid VI-BPNN models using original data.
a Learning rate used in neural network simulation; b Momentum term used in neural network simulation; c Intercept of the linear fit equation; d Slope of the linear fit equation between the actual and predicted crop yield; e Correlation coefficient of the linear fit model; f Architecture (BPNN); g The individual VI models used 100 grid plots for training and 31 grid plots for testing in 1998; 80 grid plots for training and 21 for testing in 1999; and 100 grid plots for training and 30 for testing in 2001; h The pool models used vegetation index information from 287 randomly selected grid plots as training data and 75 as testing data. They were chosen from the pool of 362 total grid plots of 1998, 1999, and 2001.

Table 3. The performance evaluation of the optimal PVI models using log10-transformed input (mean only) and output.
a Coefficient of determination; b Intercept; c Slope of the linear fit model; d Standard error of prediction of the model based on log10-transformed yield; e Transformed PVI BPNN model; f The pool models used training and testing data randomly selected from all three years.