
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

Hyperspectral imaging in the visible and near infrared (VIS-NIR) region was used to develop a novel method for discriminating different varieties of commodity maize seeds. Firstly, hyperspectral images of 330 samples of six varieties of maize seeds were acquired using a hyperspectral imaging system in the 380–1,030 nm wavelength range. Secondly, principal component analysis (PCA) and kernel principal component analysis (KPCA) were used to explore the internal structure of the spectral data. Thirdly, three optimal wavelengths (523, 579 and 863 nm) were selected by implementing PCA directly on each image. Then four textural variables (contrast, homogeneity, energy and correlation) were extracted from the gray level co-occurrence matrix (GLCM) of each monochromatic image at the optimal wavelengths. Finally, several models for maize seed identification were established by least squares-support vector machine (LS-SVM) and back propagation neural network (BPNN), using four different combinations of principal components (PCs), kernel principal components (KPCs) and textural features as input variables. The PCA-GLCM-LS-SVM model achieved the highest recognition accuracy (98.89%). We conclude that hyperspectral imaging combined with texture analysis can be implemented for fast classification of different varieties of maize seeds.

Effective variety discrimination of maize seeds is increasingly vital for the growing food industry, owing to the appearance on the market in recent years of more and more new maize varieties such as Sweet maize, Waxy maize, Popcorn, Dent maize and Amylomaize. Different varieties of maize seeds have different characteristics and qualities. Types of maize are commonly classified according to quality parameters such as oil content, sweetness, and degree of waxiness. Selecting and recommending an appropriate maize variety that meets the varietal purity standards of target markets is a serious problem faced by variety breeders, farmers, bulk handlers, marketers, and others. However, the traditional and prevailing methods for seed cultivar identification, such as grain morphology, fluorescent scanning, protein electrophoresis and DNA molecular markers, are time consuming, expensive, complex to use and subject to human error and inconsistency. To overcome these shortcomings, an approach for quickly and reliably identifying maize seed varieties would be highly desirable from both technical and economic points of view. Thus, in this work, automatic variety identification based on hyperspectral imaging was investigated.

Hyperspectral imaging is an emerging platform technology that integrates spatial information, as in regular imaging systems, with spectral information for each pixel in the image. Compared with conventional RGB imaging, NIR spectroscopy and multispectral imaging, hyperspectral imaging has many advantages, such as providing spatial, spectral and multi-constituent information and being sensitive to minor components [

Regarding the classification of agricultural products, the technique has been successfully applied in detection on apples [

Although many studies have been focused on wheat and rice variety identification and quality inspection, no research endeavours using hyperspectral imaging have been reported for maize seeds. Therefore, it is our interest to implement this technology to aid visual inspection and replace human judgement in the discrimination of different seeds. The aim of this study was to investigate the feasibility of using hyperspectral imaging in the 380–1,030 nm visible and near infrared spectral region for the variety discrimination of maize seeds. The specific objectives were to: (1) extract spectral features from the average reflectance spectrum of hyperspectral images using principal component analysis (PCA) and kernel principal component analysis (KPCA); (2) extract texture features from hyperspectral images using PCA and Gray-level co-occurrence matrix (GLCM); (3) develop several classification models using least squares-support vector machine (LS-SVM) and back propagation neural network (BPNN) based on different combinations of spectral features and texture features, respectively, and (4) obtain an optimal calibration model after comparing the performance of different algorithms.

A total of 330 samples of six maize seed varieties were collected from the Seed Company of Zhejiang Province in China, including Heinuo (I), Huyunuo (II), Sukehuanuo (III), Jinyin (IV), Meiyu (V) and Suyu (VI). All six varieties were produced in Zhejiang Province in 2010. The varieties carry different registered cultivar codes according to Maize GB1353-2009, State Standard of the People’s Republic of China, a classification based mainly on testa colour. Maize seeds were evenly distributed in glass dishes of the same size (∅120 mm × 10 mm), and the surface of each sample was smoothed. Each dish was then imaged individually in the hyperspectral imaging system as explained below.

A laboratory visible and near infrared (VIS-NIR) hyperspectral imaging system was assembled to acquire hyperspectral images for maize seeds. As shown in

Each glass dish filled with seeds was placed on the mobile platform and then moved at a speed of 4.5 mm/s to be scanned using 0.06 s exposure time to build a hyperspectral image with dimensions (

For calculating the reflectance spectrum, the spectral raw images (_{0}
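A common way to turn raw hyperspectral intensities into relative reflectance is to correct them with a white reference and a dark reference image. The sketch below assumes that convention, which the surviving text does not spell out; the function name `calibrate_reflectance`, the argument names `white` and `dark`, and the toy pixel values are all hypothetical.

```python
import numpy as np

def calibrate_reflectance(raw, white, dark, eps=1e-12):
    """Relative reflectance (raw - dark) / (white - dark), element by element.

    ASSUMPTION: a white/dark reference correction; the exact formula and
    symbols used in the study are not recoverable from the text.
    """
    raw = np.asarray(raw, dtype=float)
    white = np.asarray(white, dtype=float)
    dark = np.asarray(dark, dtype=float)
    denom = white - dark
    # Guard against division by zero where white and dark references coincide.
    denom = np.where(np.abs(denom) < eps, eps, denom)
    return (raw - dark) / denom

# Toy image: 1 row x 2 pixels x 3 bands (made-up intensity counts).
raw = np.array([[[60.0, 110.0, 35.0], [20.0, 200.0, 10.0]]])
white = np.full((1, 2, 3), 210.0)
dark = np.full((1, 2, 3), 10.0)
reflectance = calibrate_reflectance(raw, white, dark)
```

With these toy numbers the first pixel has reflectance 0.25, 0.5 and 0.125 in the three bands, since each value is (raw − 10) / 200.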

PCA is a multivariate statistical tool developed primarily to obtain a parsimonious representation of multivariate data. Orthogonal transformation by PCA results in fewer independent variables but maximum representation of original variables [

PCA was also directly employed on the selected ROI images to create the PC images using ENVI software. Each PC image is a linear sum of the original images at individual wavelengths multiplied by corresponding (spectral) weighting coefficients [
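The idea that each PC image is a weighted sum of the single-wavelength images can be illustrated with a few lines of NumPy. This is only a sketch of the general technique, not a reproduction of the ENVI workflow; the function name `pc_images`, the cube size and the random data are hypothetical.

```python
import numpy as np

def pc_images(cube, n_components=3):
    """Return PC score images and band weights for a (rows, cols, bands) cube."""
    rows, cols, bands = cube.shape
    # Treat every pixel as one observation with `bands` variables.
    pixels = cube.reshape(-1, bands).astype(float)
    centered = pixels - pixels.mean(axis=0)
    # Rows of vt are the spectral weighting coefficients (loadings).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    weights = vt[:n_components]
    # Each PC image is a weighted sum of the mean-centered band images.
    images = (centered @ weights.T).reshape(rows, cols, n_components)
    return images, weights

rng = np.random.default_rng(0)
cube = rng.random((8, 10, 5))  # hypothetical ROI: 8 x 10 pixels, 5 bands
images, weights = pc_images(cube)
```

Here `weights[k]` holds one coefficient per wavelength for the k-th PC, which is the quantity inspected when looking for wavelengths with large loadings.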

For comparison with PCA, another dimensionality reduction approach, kernel principal component analysis (KPCA), was implemented to extract the spectral features. KPCA extends PCA to nonlinear cases by performing PCA in a higher or even infinite dimensional feature space, which is nonlinearly transformed from the input space and implicitly defined by a kernel function. For input samples $[x_{1}, \ldots, x_{n}]$ in $\mathbb{R}^{P}$, a nonlinear map $\phi: \mathbb{R}^{P} \rightarrow F$ sends the data to the feature space $F$ as $(\phi(x_{1}), \ldots, \phi(x_{n}))$ [

GLCM analysis was carried out to extract second-order statistical textural features from the PC images at each of the selected dominant wavelengths. GLCM is a statistical technique for texture analysis, and extracting textural features from a GLCM is probably the most frequently cited method for texture analysis. A general procedure for extracting textural features of an image in the spatial domain was presented by Haralick

LS-SVM is a state-of-the-art statistical algorithm capable of learning in high-dimensional characteristic space with fewer training variables or samples [

To provide a comparison with the LS-SVM models, BPNN was also applied in this study. BPNN is a type of nonlinear neural network used to solve several types of classification and regression problems. The eigenvectors obtained from compressing the raw spectra were processed by the neural network, and the network output expresses how closely an object resembles a training pattern [

The actual optical sensitivity of this system ranges from 380 to 1,030 nm but only the range of 500–900 nm was used to avoid low signal-to-noise ratio. The average reflectance spectra of each variety of seeds in the spectral range of 500–900 nm are shown in

PCA was applied to all spectral data (500–900 nm) acquired from all samples to reduce the high dimensionality and to check for qualitative discrimination among the maize seed spectra. The first three principal components explained 95%, 3% and 1% of the total variance, respectively. Their cumulative contribution therefore accounted for 99% of the total information, so they could be used in place of the 315 spectral variables for classification of maize seeds. PCA results are usually interpreted by visualizing the PC scores.
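The reduction from 315 spectral variables to three scores per sample, together with the explained-variance bookkeeping, can be sketched as follows. The data here are synthetic and only mimic the dimensions mentioned in the text (330 samples, 315 variables); `pca_scores` is a hypothetical helper name, and the SVD route is one of several equivalent ways to compute PCA.

```python
import numpy as np

def pca_scores(spectra, n_components=3):
    """PCA via SVD: returns sample scores and explained-variance ratios."""
    x = np.asarray(spectra, dtype=float)
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    # Squared singular values are proportional to the variance of each PC.
    explained = s ** 2 / np.sum(s ** 2)
    scores = u[:, :n_components] * s[:n_components]
    return scores, explained[:n_components]

# Synthetic stand-in for the spectra: three latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(330, 3)) * np.array([10.0, 2.0, 1.0])
mixing = rng.normal(size=(3, 315))
spectra = latent @ mixing + 0.01 * rng.normal(size=(330, 315))

scores, ratio = pca_scores(spectra)
cumulative = np.cumsum(ratio)  # running total, e.g. 95% + 3% + 1% = 99% in the study
```

The `scores` array (one row per sample, one column per PC) is what would be passed on to a classifier or drawn in a score plot.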

Similarly, KPCA was applied to the spectral region between 500 nm and 900 nm. The top three KPCs were extracted and explained 99.63% of the variance of all features, compared with the cumulative variance of 99% for the first three PCs from PCA. This indicated that the first three KPCs could also express the total spectral information and that, in terms of explained variance, KPCA feature extraction was slightly superior to traditional PCA.
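A minimal KPCA sketch is given below to make the extraction of kernel principal components concrete. It assumes an RBF kernel of the form exp(−‖x − y‖² / (2σ²)); the kernel and its parameter value used in the study are not stated in the surviving text, so `sigma2`, the helper names and the random data are all assumptions.

```python
import numpy as np

def rbf_kernel(a, b, sigma2):
    """RBF kernel matrix; the 2*sigma2 scaling is an assumed convention."""
    sq = np.sum(a ** 2, axis=1)[:, None] + np.sum(b ** 2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma2))

def kpca_scores(x, n_components=3, sigma2=1.0):
    """Kernel PCA scores of the training samples and their variance ratios."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    k = rbf_kernel(x, x, sigma2)
    # Center the kernel matrix, i.e. center the data in feature space.
    one = np.full((n, n), 1.0 / n)
    kc = k - one @ k - k @ one + one @ k @ one
    eigval, eigvec = np.linalg.eigh(kc)
    order = np.argsort(eigval)[::-1]
    eigval = np.clip(eigval[order], 0.0, None)
    eigvec = eigvec[:, order]
    # Projection of sample i on component k is sqrt(lambda_k) * v_k[i].
    scores = eigvec[:, :n_components] * np.sqrt(eigval[:n_components])
    ratio = eigval[:n_components] / eigval.sum()
    return scores, ratio

rng = np.random.default_rng(0)
x = rng.normal(size=(40, 6))  # hypothetical: 40 samples, 6 variables
kpc_scores, kpc_ratio = kpca_scores(x, n_components=3, sigma2=6.0)
```

The ratio returned here is taken relative to the total variance in the kernel feature space, so it is not directly comparable, variable for variable, with the PCA ratio computed in the input space.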

As stated above, PCA implemented directly on each ROI image using ENVI was used to identify the optimal wavelengths. The PC loadings can be used to identify sensitive wavelengths that are highly correlated with each PC.

The x-loading weights of the top three PCs were used to select wavelengths across the entire spectral range. The wavelengths corresponding to peaks (maxima) and valleys (minima) of these principal components were selected as optimum wavelengths (
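Picking wavelengths at the peaks and valleys of a loading curve can be sketched as a simple local-extremum search. The loading values and the coarse 50 nm wavelength grid below are invented for illustration, and `loading_extrema` and `strongest_extremum` are hypothetical helper names; in practice the choice among candidate extrema may also involve visual inspection.

```python
import numpy as np

def loading_extrema(wavelengths, loading):
    """Wavelengths at strict local maxima or minima of a loading curve."""
    w = np.asarray(wavelengths, dtype=float)
    y = np.asarray(loading, dtype=float)
    left, mid, right = y[:-2], y[1:-1], y[2:]
    peaks = (mid > left) & (mid > right)
    valleys = (mid < left) & (mid < right)
    idx = np.where(peaks | valleys)[0] + 1  # +1: mid starts at index 1
    return w[idx]

def strongest_extremum(wavelengths, loading):
    """Among the local extrema, return the wavelength with the largest |loading|."""
    w = np.asarray(wavelengths, dtype=float)
    y = np.asarray(loading, dtype=float)
    candidates = loading_extrema(w, y)
    mask = np.isin(w, candidates)
    return w[mask][np.argmax(np.abs(y[mask]))]

# Made-up loading curve for one PC on a coarse wavelength grid (nm).
wavelengths = np.arange(500, 901, 50)
loading = np.array([0.0, 0.2, 0.5, 0.1, -0.3, -0.6, -0.2, 0.1, 0.05])
extrema = loading_extrema(wavelengths, loading)
best = strongest_extremum(wavelengths, loading)
```

For this toy curve the extrema fall at 600, 750 and 850 nm, and the valley at 750 nm has the largest absolute loading.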

The selected wavelengths may reflect differences in colour and in the content of constituents among maize seeds. Thus, the monochromatic images at the effective wavelengths were selected as the optimal images, representing the most significant variance and loading weights within the whole region. Four textural features, namely contrast, homogeneity, energy and correlation, were calculated from the GLCM of each monochromatic image. There were three monochromatic images for each sample, corresponding to the optimal wavelengths 540 nm, 670 nm, and 800 nm, so 12 textural features were generated per sample through GLCM feature extraction.
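The path from three monochromatic images to a 12-element texture vector can be sketched in plain NumPy. The feature formulas below follow one widely used set of definitions (homogeneity with 1 + |i − j| in the denominator, energy as the sum of squared GLCM entries); the text does not say which definitions, gray-level count or pixel offset were used, so `levels`, `offset`, the helper names and the random images are assumptions.

```python
import numpy as np

def glcm(image, levels=8, offset=(0, 1)):
    """Symmetric, normalized GLCM of an integer image with values in [0, levels)."""
    img = np.asarray(image)
    dr, dc = offset  # this sketch supports non-negative offsets only
    rows, cols = img.shape
    src = img[:rows - dr, :cols - dc].ravel()
    dst = img[dr:, dc:].ravel()
    counts = np.zeros((levels, levels), dtype=float)
    np.add.at(counts, (src, dst), 1.0)  # count co-occurring gray-level pairs
    counts = counts + counts.T          # make the matrix symmetric
    return counts / counts.sum()

def glcm_features(p):
    """Contrast, homogeneity, energy, correlation of a normalized GLCM."""
    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2)
    homogeneity = np.sum(p / (1.0 + np.abs(i - j)))
    energy = np.sum(p ** 2)
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum((i - mu_i) ** 2 * p))
    sd_j = np.sqrt(np.sum((j - mu_j) ** 2 * p))
    # Undefined (NaN) for a perfectly constant image, where sd_i = sd_j = 0.
    correlation = np.sum((i - mu_i) * (j - mu_j) * p) / (sd_i * sd_j)
    return np.array([contrast, homogeneity, energy, correlation])

# Sanity check on a 4 x 4 checkerboard with two gray levels.
checker = np.indices((4, 4)).sum(axis=0) % 2
checker_features = glcm_features(glcm(checker, levels=2))

# Three hypothetical quantized band images -> 3 x 4 = 12 features per sample.
rng = np.random.default_rng(0)
band_images = [rng.integers(0, 8, size=(16, 16)) for _ in range(3)]
sample_features = np.concatenate([glcm_features(glcm(b)) for b in band_images])
```

On the checkerboard every horizontal neighbour pair differs by one gray level, so contrast is 1, homogeneity and energy are both 0.5, and correlation is −1, which is a convenient hand-checkable case.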

For the LS-SVM models, the regularization parameter γ and the RBF (radial basis function) kernel parameter σ^{2} were optimized by a grid-search technique over the ranges 2^{−1}–2^{10} and 2–2^{15}, respectively. For each combination of γ and σ^{2}, the root mean square error of cross-validation (RMSECV) was calculated, and the combination producing the smallest RMSECV was selected as optimal. The ninety samples in the prediction set were then classified by the LS-SVM model with the optimal combination of (γ, σ^{2}).
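The grid search over (γ, σ²) can be sketched with a small self-written LS-SVM classifier. Several choices here are assumptions not confirmed by the text: the one-vs-rest coding of the classes, the kernel form exp(−‖x − y‖² / σ²), five-fold cross-validation, and the coarse grid; all function names and the toy data are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, sigma2):
    sq = np.sum(a ** 2, axis=1)[:, None] + np.sum(b ** 2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / sigma2)  # assumed kernel scaling

def one_vs_rest_targets(labels, classes):
    """+1 for the sample's own class, -1 for every other class."""
    return np.where(labels[:, None] == classes[None, :], 1.0, -1.0)

def lssvm_fit(x, targets, gamma, sigma2):
    """Solve the LS-SVM classifier linear system once per target column."""
    n = x.shape[0]
    k = rbf_kernel(x, x, sigma2)
    coefs, biases = [], []
    for y in targets.T:
        a = np.zeros((n + 1, n + 1))
        a[0, 1:] = y
        a[1:, 0] = y
        a[1:, 1:] = np.outer(y, y) * k + np.eye(n) / gamma
        sol = np.linalg.solve(a, np.concatenate(([0.0], np.ones(n))))
        biases.append(sol[0])
        coefs.append(sol[1:] * y)  # store alpha_i * y_i for prediction
    return np.array(coefs).T, np.array(biases)

def lssvm_decision(x_train, model, x_new, sigma2):
    coef, bias = model
    return rbf_kernel(x_new, x_train, sigma2) @ coef + bias

def rmsecv(x, labels, gamma, sigma2, n_folds=5, seed=0):
    """RMSE between held-out decision values and their +1/-1 targets."""
    targets = one_vs_rest_targets(labels, np.unique(labels))
    order = np.random.default_rng(seed).permutation(len(labels))
    sq_err = 0.0
    for fold in np.array_split(order, n_folds):
        train = np.setdiff1d(order, fold)
        model = lssvm_fit(x[train], targets[train], gamma, sigma2)
        out = lssvm_decision(x[train], model, x[fold], sigma2)
        sq_err += np.sum((out - targets[fold]) ** 2)
    return np.sqrt(sq_err / targets.size)

def grid_search(x, labels, gammas, sigma2s):
    return min((rmsecv(x, labels, g, s), g, s) for g in gammas for s in sigma2s)

# Toy data: three well-separated 2-D clusters standing in for feature vectors.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
labels = np.repeat(np.arange(3), 20)
x = centers[labels] + 0.5 * rng.normal(size=(60, 2))

gammas = 2.0 ** np.arange(-1, 11, 3)   # coarse grid inside 2^-1 .. 2^10
sigma2s = 2.0 ** np.arange(1, 16, 4)   # coarse grid inside 2^1 .. 2^15
best_rmse, best_gamma, best_sigma2 = grid_search(x, labels, gammas, sigma2s)

targets = one_vs_rest_targets(labels, np.unique(labels))
model = lssvm_fit(x, targets, best_gamma, best_sigma2)
pred = np.argmax(lssvm_decision(x, model, x, best_sigma2), axis=1)
accuracy = np.mean(pred == labels)
```

A finer grid, or a second pass around the best coarse point, would be the natural refinement; the coarse grid is used here only to keep the example quick to run.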

For the BPNN models, the network parameters were set as follows after tuning. The number of hidden layers, the dynamic parameter, the goal error and the number of training iterations were set to 9, 0.6, 0.00001 and 1,000, respectively. The recognition error threshold was set to ±0.5.
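The ±0.5 recognition threshold can be illustrated with a short scoring function. The sketch assumes that the network has a single numeric output and that each variety is coded by an integer (1 to 6, mirroring the labels I to VI); this coding is not confirmed by the text, and the outputs below are invented.

```python
import numpy as np

def recognition_accuracy(outputs, labels, tolerance=0.5):
    """Percentage of samples whose output lies within +/- tolerance of its label."""
    outputs = np.asarray(outputs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    correct = np.abs(outputs - labels) < tolerance
    return 100.0 * np.mean(correct), correct

# Hypothetical network outputs for nine samples with assumed codes 1..6.
labels = np.array([1, 2, 3, 4, 5, 6, 1, 2, 3])
outputs = np.array([1.1, 2.4, 2.3, 4.0, 5.6, 6.2, 0.8, 2.6, 3.49])
accuracy, correct = recognition_accuracy(outputs, labels)
```

In this toy case six of the nine outputs fall inside the tolerance band, so the recognition accuracy is 66.67%.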

For comparison, several LS-SVM and BPNN models were established using the selected PCs, KPCs and the textural variables as different inputs, respectively.

The discrimination results above suggest that the VIS-NIR hyperspectral imaging technique combined with PCA-GLCM feature extraction and LS-SVM can be successfully applied to fast variety identification of commercial maize seeds. Three wavelengths (523, 579 and 863 nm) were selected as the optimum wavelengths according to the loading weights of the first three PCs. Based on four textural features calculated from the GLCM of each monochromatic image at the optimal wavelengths, a prediction accuracy of 98.89% was achieved using the LS-SVM calibration model, which was higher than that of the KPCA-based and BPNN calibration models. This increased accuracy is very important for the discrimination of multiple varieties of maize seeds in large-scale practical applications. Combining spectral features and texture features to establish LS-SVM discrimination models proved to be an effective way to classify images with high accuracy. This finding will assist future research on hyperspectral imaging analysis. Future studies should devote more effort to expanding the number of varieties and optimizing the image processing algorithm, in order to validate the repeatability of the algorithms for real-time online use. In addition, more effective wavelengths could be identified, which might also be important for on-line inspection and for portable instruments in commercial applications such as adulteration detection.

This work was supported by the National Science and Technology Support Program of China (2011BAD20B12), Zhejiang Provincial Natural Science Foundation of China (Z3090295), Agricultural Science and Technology Achievements Transformation Fund Programs (2009GB23600517) and the Fundamental Research Funds for the Central Universities (2012FZA6005).

Schematic diagram of hyperspectral imaging system.

Images acquired from six varieties of maize seeds.

Vis/NIR reflectance of six different maize seeds extracted from the ROI pixels of hyperspectral images.

Score cluster plot with PC1 × PC2 × PC3 of each maize variety.

Loading weights of the first three PCs from PCA on ROI images for selecting optimal wavelengths.

Monochrome images obtained using three selected optimal wavelengths.

Statistical results of the discrimination models for prediction.

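Input features | LS-SVM calibration (%) | LS-SVM prediction (%) | BPNN calibration (%) | BPNN prediction (%) |

PCA | 95.00 | 93.33 | 94.58 | 91.11 |

PCA-GLCM | 100 | 98.89 | 97.50 | 91.11 |

KPCA | 93.75 | 93.33 | 93.33 | 91.11 |

KPCA-GLCM | 99.58 | 96.67 | 98.33 | 90.00 |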