Early Detection of Aspergillus parasiticus Infection in Maize Kernels Using Near-Infrared Hyperspectral Imaging and Multivariate Data Analysis

Fungal infection of maize kernels is a major concern worldwide because of the toxic metabolites, such as mycotoxins, that the fungi produce; appropriate techniques for early detection of fungal infection in maize kernels are therefore needed. On each of seven consecutive days, 36 sterilised maize kernels were inoculated with Aspergillus parasiticus, giving seven groups (D1, D2, D3, D4, D5, D6, D7) defined by incubation time (one to seven days). Another 36 sterilised kernels that were not inoculated served as the control (DC). Hyperspectral images of all kernels were acquired over the spectral range 921–2529 nm. Background, labels and bad pixels were removed using principal component analysis (PCA) and masking. Separability computation for discriminating fungal contamination levels indicated that models based on data from the germ region of individual kernels performed better than models based on whole-kernel data. Moreover, samples separated by a two-day interval were separable. Thus, four groups, DC, D1–2 (D1 and D2), D3–4 (D3 and D4), and D5–7 (D5, D6, and D7), were defined for subsequent classification. Two sample sets were prepared to examine the influence of germ orientation on the classification model: kernels with the germ up, and a 1:1 mixture of kernels with the germ up and down. Two smoothing methods (Savitzky-Golay smoothing and moving average smoothing) and three scatter-correction methods (normalization, standard normal variate, and multiplicative scatter correction) were compared according to the performance of classification models built with support vector machines (SVM). The best model for kernels with the germ up gave promising results, with accuracies of 97.92% and 91.67% for the calibration and validation sets, respectively, while the best model for the mixed kernels gave accuracies of 95.83% and 84.38%. Moreover, five wavelengths (1145, 1408, 1935, 2103, and 2383 nm) were selected as key wavelengths for discriminating fungal contamination levels. Overall, near-infrared hyperspectral imaging can be used for early detection of fungal contamination in maize kernels.


Introduction
Maize (Zea mays L.) is a major cereal crop that is cultivated worldwide for human consumption and animal feed. However, maize is susceptible to infection by toxigenic fungi, such as Fusarium spp. (F. verticillioides, Gibberella moniliformis, F. graminearum) and Aspergillus spp. (A. niger, A. flavus, A. parasiticus), both in the field and under post-harvest conditions. Different toxigenic fungi produce different toxic metabolites, for example, aflatoxins by A. flavus and A. parasiticus, ochratoxins by A. niger, and fumonisins by Fusarium spp. [1,2]. Among these toxins, aflatoxin is highly toxic and carcinogenic to animals and humans [3,4]. Therefore, many studies have focused on detecting aflatoxins and fungi on maize to prevent the health threats that result from consumption of infected kernels.
Traditional methods for detecting fungal contamination of cereals are microbiological methods and diagnostic media for identifying toxigenic fungi, and immunological methods for detecting toxins. However, these methods are usually time-consuming, expensive and labour-intensive, require trained personnel, and introduce harmful chemical reagents [2,5–8].
In recent years, hyperspectral imaging, which was originally developed for remote sensing applications, has emerged as one of the most promising techniques for fast, non-destructive assessment of food quality and safety [9–11]. Hyperspectral imaging integrates conventional imaging and spectroscopy to obtain both spatial and spectral information about an object [12,13]. It allows interactive analysis of the spectral and surface properties of samples at each pixel of images acquired over a wide spectral range, such as UV-visible, near infrared (NIR) or infrared (IR) [2]. Hyperspectral image data are three-dimensional arrays of the form X (m × k × λ), where m and k are the two spatial axes of the image and λ is the number of wavelengths in the spectra.
Hyperspectral imaging has also become a promising non-destructive technique for detecting toxin and fungal contamination of cereals. Yao [14] studied hyperspectral bright green-yellow fluorescence (BGYF) imaging to detect single corn kernels contaminated with aflatoxin. Firrao et al. [15] used multispectral imaging (720–940 nm) to predict the fumonisin content of milled maize. Wang et al. [16–18] studied the use of hyperspectral imaging, in the ranges 400–1000 nm and 1000–2500 nm, to detect aflatoxin B1 on maize kernels that were artificially titrated on the kernel surface in the laboratory and inoculated with Aspergillus flavus conidia in the field, respectively. Kandpal et al. [19] verified the feasibility of short wave infrared (SWIR) hyperspectroscopy for detecting aflatoxin on the surface of three different corn varieties (yellow, white, and purple). These results showed that hyperspectral imaging can be used to detect aflatoxin on or in maize kernels. However, research has also found that it is hard to detect mycotoxins directly and non-destructively. Therefore, some studies have turned to detecting the contamination caused by fungi or mycotoxins, which affects other chemical and optical properties of whole kernels; these changes can be detected with NIR hyperspectral imaging or spectroscopy [17,20–22].
Moreover, compared with toxin detection, detection of toxigenic fungi on cereals is important for early detection, which can prevent mycotoxin contamination from entering the food chain. Singh et al. and Zhang et al. [23,24] studied the application of near-infrared reflectance hyperspectral imaging to the detection of wheat kernels infected with different fungi. Bauriegel et al. [25] studied wheat plants using hyperspectral imaging (400–1000 nm) for early detection of Fusarium infection, and their results demonstrated that Fusarium infestation could not be detected by spectral analysis immediately after infection. Williams et al. [26] used hyperspectral imaging with a spectral range of 1000–2498 nm to study maize kernels inoculated with F. verticillioides during incubation from 0 to 90 h. Their results showed that principal component analysis (PCA) models without pre-processing could discriminate among images of the control, kernels 17 h after inoculation, and kernels 20–90 h after inoculation; infection-related changes in starch structure or content were also noted. Del Fiore et al. [2] studied early detection of maize kernels inoculated with different toxigenic fungi using visible-near infrared (Vis/NIR) hyperspectral imaging (400–1000 nm), and their results indicated that the analytical method allowed early detection of fungal contaminants on maize, starting at 48 h from inoculation and incubation at 30 °C for A. flavus and A. niger. All of these studies showed that hyperspectral imaging can be applied to fungal detection on cereals. Although some studies have successfully detected toxin and fungal contamination on maize at an early stage [2,19], studies on the detection of fungal contamination on maize are limited, and they use different wavelength bands, such as fluorescence (ultraviolet, UV), 400–1000 nm (Vis/NIR), and 1000–2500 nm (NIR), to detect infection by different fungi, such as Fusarium spp. and Aspergillus spp. [2,26,27]. In particular, few studies have addressed the detection of Aspergillus parasiticus infection in maize kernels using NIR hyperspectral imaging.
The objective of this work was to study early detection of maize kernels infected with Aspergillus parasiticus using NIR hyperspectral imaging. To that end, the specific objectives were to: (1) detect fungal contamination of maize kernels before the onset of visual symptoms; (2) discriminate contamination and/or damage levels; and (3) determine optimal wavelengths for subsequent instrument development.

Maize Kernel Preparation
Maize kernels (Jingke 968) harvested from Hefei City, Anhui Province, China in 2015 were used for the experiment. First, all kernels were immersed in distilled water for 24 h. Then, to remove the natural fungi on both the surface and the inside of the kernels, the samples were sterilised by rinsing in 75% ethanol and in 1% NaClO solution separately. After that, the kernels were washed with sterile distilled water and left to dry under laminar flow for 2 h. A total of 288 kernels were randomly selected, 36 of which were used as the control (DC, that is, not inoculated with fungi).

Fungal Inoculation
The Aspergillus parasiticus strain (3.6155) was obtained from the China General Microbiological Culture Collection Center (CGMCC, Beijing, China). The strain was first transferred onto potato dextrose agar (Baiaolaibo, Beijing, China) and incubated at 28 °C for 7 days. Spores were obtained from the agar surface by gently rubbing the colonies with a sterile loop. The spore suspension was shaken and then poured through two layers of sterile cheesecloth to remove mycelium. Thereafter, the inoculum concentration was adjusted to 10⁶ spores·mL⁻¹ by microscopic enumeration with a haemocytometer (Qiujing, Shanghai, China). Thirty-six sterilised maize kernels were inoculated with Aspergillus parasiticus on each of seven consecutive days. Kernels were inoculated by dipping them into the spore suspension for 30 s and were then incubated at 30 °C. The contaminated kernels were divided into seven groups according to incubation time, that is, incubated for 1 to 7 days, referred to as D1, D2, D3, D4, D5, D6 and D7, respectively. The thirty-six kernels of each group were placed on two Petri dishes (18 kernels per Petri dish). On one Petri dish, the kernels were orientated with the germ up; on the other, the kernels were orientated with the germ down. After the images were collected, two kernels were taken from each group to verify the presence of the fungus using common fungal isolation and culture methods. The procedure was as follows: the kernels were washed with sterile distilled water, and the wash water was then dropped onto rose Bengal agar and incubated at 30 °C for three days.

NIR Hyperspectral System and Image Calibration
Hyperspectral images were collected with a spectral camera (Specim, Spectral Imaging Ltd., Oulu, Finland). The camera was equipped with a cryogenically cooled Mercury-Cadmium-Telluride (MCT) detector. Individual images consisted of 288 spectral bands covering the range from 921 to 2529 nm at 5.6 nm resolution. The matched Lumo software (Specim, Spectral Imaging Ltd., Oulu, Finland) was used for data acquisition and for setting the camera parameters. Eight images (DC, D1, D2, D3, D4, D5, D6, D7) were collected at the same time (on the eighth day from the first inoculation; the control group DC samples were prepared on the seventh day), after all inoculations from 1 to 7 days had been completed. Each image contained the two uncovered Petri dishes described above. To reduce the influence of illumination and of the dark current of the charge-coupled-device (CCD) detectors, white and dark references were captured prior to the sample images for image correction and calibration. The white reference was acquired from a standard Teflon white board, and the dark reference was obtained by closing the shutter. The calibrated images were calculated according to Equation (1):
$$R = \frac{R_0 - D}{W - D} \tag{1}$$
where $R$ is the calibrated image, $R_0$ the raw image, $D$ the dark reference, and $W$ the white reference.
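The white/dark reference correction can be illustrated with a short Python sketch. It assumes the usual flat-field form R = (R0 − D)/(W − D) applied element-wise to arrays of identical shape; the function name, the `eps` guard for dead pixels and the toy values are illustrative assumptions, not part of the acquisition software used in the study.

```python
import numpy as np

def calibrate_reflectance(raw, white, dark, eps=1e-12):
    """Return relative reflectance (raw - dark) / (white - dark), element-wise."""
    raw = np.asarray(raw, dtype=float)
    white = np.asarray(white, dtype=float)
    dark = np.asarray(dark, dtype=float)
    denom = white - dark
    # Assumption: pixels where white and dark readings coincide are marked NaN
    # instead of producing a division by zero.
    denom = np.where(np.abs(denom) < eps, np.nan, denom)
    return (raw - dark) / denom

# Toy cube of 2 x 2 pixels and 3 bands; all values are made up for illustration.
dark = np.full((2, 2, 3), 100.0)
white = np.full((2, 2, 3), 1100.0)
raw = np.full((2, 2, 3), 600.0)
R = calibrate_reflectance(raw, white, dark)
```

In this toy case every raw value lies halfway between the dark and white readings, so every calibrated value is 0.5.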

Data Analysis
Hyperspectral images were analyzed using ENVI v.5.1 (Exelis Visual Information Solutions Inc., Redlands, CA, USA). Before analysis, the spectral range was reduced to 1000–2500 nm for noise reduction. Regions of interest (ROIs) covering the maize kernels in each image were identified and selected in order to mask the background, Petri dishes and labels. Principal component analysis (PCA), as an unsupervised classification algorithm, can separate data with distinct differences. Thus, PCA was applied to each image to select the ROIs of the maize kernels [18]. After the selection, inverse PCA was used to reduce noise by removing the PCs that represented noise and transforming the remaining PCs back to wavelengths for subsequent analysis. Then, a mask built from the ROIs was applied to obtain a clean image without irrelevant information, such as labels and background.
After that, each clean image was resized to remove the blank edges around the kernels, and all eight resized images were then combined into a mosaic. It is well known that PCA can reduce data dimensionality by transforming a large number of inter-correlated raw variables into fewer, orthogonal variables, which are referred to as PCs. PCs are calculated to capture the maximum variance of the data and are ordered by decreasing percentage of variance. Generally, the last PCs represent noise and can be removed. For hyperspectral image data, PC loading line plots and PC score images can be displayed and examined interactively to explore spectral and image information simultaneously. In this research, PCA was applied to the mosaic to reduce data dimensionality and noise before the classification of fungal contamination levels.
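As an illustration of PCA followed by an inverse transform that keeps only the leading PCs, the following Python sketch unfolds a cube of shape (m, k, λ) into a pixels × wavelengths matrix, computes the PCs by singular value decomposition, and reconstructs the spectra from the first `n_keep` components. The study used ENVI for this step; the function name, the synthetic cube and the choice of two retained PCs are assumptions made only for this example.

```python
import numpy as np

def pca_denoise(cube, n_keep):
    """PCA on an (m, k, bands) cube; rebuild spectra from the first n_keep PCs."""
    m, k, n_bands = cube.shape
    X = cube.reshape(m * k, n_bands)           # unfold: pixels x wavelengths
    mean = X.mean(axis=0)
    Xc = X - mean                              # mean-centre each wavelength
    # Rows of Vt are the PC loadings, ordered by decreasing variance.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)            # fraction of variance per PC
    scores = Xc @ Vt[:n_keep].T                # PC scores for every pixel
    X_hat = scores @ Vt[:n_keep] + mean        # inverse PCA with noise PCs dropped
    return (X_hat.reshape(m, k, n_bands),
            scores.reshape(m, k, n_keep),
            explained)

# Synthetic cube: a rank-2 signal plus small random noise (illustrative only).
rng = np.random.default_rng(0)
m, k, n_bands = 20, 15, 30
abundances = rng.random((m * k, 2))
basis = rng.random((2, n_bands))
clean = (abundances @ basis).reshape(m, k, n_bands)
noisy = clean + 0.01 * rng.standard_normal(clean.shape)

denoised, score_images, explained = pca_denoise(noisy, n_keep=2)
```

Each slice `score_images[:, :, i]` plays the role of a PC score image, and each row of the loading matrix corresponds to a loading line plot over wavelength.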
Computation of ROI separability in ENVI software is usually applied to obtain a preliminary view of how well the data can be classified before a classification modeling algorithm is applied. The separability is calculated for every pair of classes, represents how separable the two classes are, and is reported as a list of values for all pairs of classes. The separability values range from 0 to 2. The closer the value is to 2, the better the two classes are separated. In this work, the separability of the eight groups was calculated for two types of data sets: data from the whole kernel and data from the germ region. The kernels from the Petri dish on which kernels were orientated with the germ up were used. Elliptical ROIs of the germ region on each kernel were hand-digitized using ENVI software.
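A pairwise separability measure with the 0 to 2 range described above can be sketched in Python. The example assumes the Jeffries-Matusita distance under a Gaussian class model, JM = 2(1 − e^(−B)) with B the Bhattacharyya distance, which is one of the measures ENVI reports for ROI separability; whether this particular measure was the one used in the study is an assumption, and the synthetic classes are purely illustrative.

```python
from itertools import combinations

import numpy as np

def jeffries_matusita(x1, x2):
    """JM distance (0 to 2) between two classes given as samples x features."""
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    c1 = np.atleast_2d(np.cov(x1, rowvar=False))
    c2 = np.atleast_2d(np.cov(x2, rowvar=False))
    c = 0.5 * (c1 + c2)
    d = m1 - m2
    # Bhattacharyya distance: mean-difference term plus covariance term.
    mean_term = 0.125 * d @ np.linalg.solve(c, d)
    _, logdet_c = np.linalg.slogdet(c)
    _, logdet_1 = np.linalg.slogdet(c1)
    _, logdet_2 = np.linalg.slogdet(c2)
    cov_term = 0.5 * (logdet_c - 0.5 * (logdet_1 + logdet_2))
    return 2.0 * (1.0 - np.exp(-(mean_term + cov_term)))

# Synthetic classes with 3 features each (illustrative values only).
rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, size=(200, 3))
b = rng.normal(0.0, 1.0, size=(200, 3))    # same distribution as a
c = rng.normal(10.0, 1.0, size=(200, 3))   # far from a
jm_close = jeffries_matusita(a, b)
jm_far = jeffries_matusita(a, c)

# Eight groups give C(8, 2) = 28 class pairs to evaluate.
groups = ["DC", "D1", "D2", "D3", "D4", "D5", "D6", "D7"]
pairs = list(combinations(groups, 2))
```

With full spectra the number of features can exceed the number of pixels per class, in which case the covariance matrices become singular; computing the measure on a few PC scores is one common workaround, again an assumption rather than a description of the original workflow.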
The mean reflectance of each germ-region ROI was calculated for the kernels on both Petri dishes. Together, the samples from the two Petri dishes formed a data set with a 1:1 ratio of kernels orientated with the germ up and the germ down, which was used to examine the influence of germ orientation on the classification models. For the samples orientated with the germ down, the germ region was defined as the area symmetrical to the germ portion. Spectral data were analyzed using MATLAB software (The MathWorks, Natick, MA, USA).
The raw spectral data (from the ROIs) contained scattering noise generated by the camera, random noise, baseline offset, non-uniformity and surface scattering from the samples [28]. Therefore, pretreatment techniques were applied to highlight the useful information. In this study, the mathematical pretreatment techniques used were Savitzky-Golay smoothing (SGS), moving average smoothing (MAS), normalization (NMZ), standard normal variate (SNV), and multiplicative scatter correction (MSC), as well as their combinations. SGS and MAS are moving-window methods used to smooth out spurious peaks caused by random noise [29,30]. NMZ, SNV and MSC are scatter-correction methods designed to remove the slope variation in the spectra caused by scatter and particle size variation and to adjust baseline shifts between samples [31–33]. PCA can reduce the dimensionality of data by contracting the data into a few descriptive dimensions, denoted principal components, which represent the main variation in the data [34,35]. It is also commonly used as an exploratory tool for data analysis because each PC can be displayed graphically [36]. The Kennard-Stone (KS) algorithm was used in this study to partition the samples into calibration and validation subsets. It is based on maximizing the Euclidean distances between the spectral data of the selected samples so as to cover the multidimensional space [37]. Support vector machines (SVMs) were used for classification modeling. SVMs have been shown to give good generalization performance and to be able to model complex nonlinear boundaries through the use of suitable kernel functions [38]. In this work, the radial basis function kernel was used, and particle swarm optimization (PSO) based on fivefold cross-validation was adopted to optimize the SVM parameters [39–41].
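To make the pretreatment steps concrete, the following Python sketch implements moving average smoothing, SNV and MSC on a samples × wavelengths matrix. The study performed these steps in MATLAB; the window length, the edge padding and the use of the mean spectrum as the MSC reference are assumptions chosen for the example, and the synthetic spectra are made up.

```python
import numpy as np

def moving_average(spectra, window=5):
    """Smooth each spectrum (row) with a centred moving average; window is odd."""
    kernel = np.ones(window) / window
    pad = window // 2
    # Assumption: edges are padded by repeating the first and last values.
    padded = np.pad(spectra, ((0, 0), (pad, pad)), mode="edge")
    return np.array([np.convolve(row, kernel, mode="valid") for row in padded])

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum individually."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum."""
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra, dtype=float)
    for i, row in enumerate(spectra):
        # Fit row ~ intercept + slope * ref, then undo offset and scaling.
        slope, intercept = np.polyfit(ref, row, 1)
        corrected[i] = (row - intercept) / slope
    return corrected

# Synthetic spectra: one base curve with different scatter-like gains and offsets.
base = np.sin(np.linspace(0.0, 3.0, 50)) + 2.0
scales = np.array([0.8, 1.0, 1.3])
offsets = np.array([0.05, 0.0, -0.1])
spectra = scales[:, None] * base + offsets[:, None]

smoothed = moving_average(spectra, window=5)
snv_out = snv(spectra)
msc_out = msc(spectra)
```

Because the three synthetic spectra differ only by a gain and an offset, both SNV and MSC collapse them onto a single curve, which is the behaviour these corrections are designed to produce.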

ROIs Extraction Based on PCA and Masking
Processing of the D7 image data is shown as an example. After PCA was applied, the first three PCs accounted for 99.57% of the total variance, with individual variance percentages of 92.31%, 6.34%, and 0.92% for PC1, PC2 and PC3, respectively. Comparison of the score images of the first PCs showed that, although PC1 had the largest variance percentage, the maize kernels were more distinct from the background in the PC2 and PC3 score images. Therefore, PC2 and PC3 data were used to make a scatter plot (Figure 1a) to extract the maize kernels. Five classes were selected based on density slice imagery (Figure 1b), and the corresponding pixels of each class are shown in a score image (Figure 1c), where the red class represents the whiteboard, the magenta class the background, the purple class the label, the sea green class most parts of the Petri dishes, and the yellow class the kernels and smaller parts of the Petri dishes. Some parts of the labels produced specular reflection and were therefore grouped with the whiteboard in the scatter plot. Since the kernels and parts of the Petri dishes were grouped together, it was difficult to separate them clearly in this step. The yellow class was then exported to the ROI tools and used to make a mask to remove the information of the other classes. After the mask was applied, the retained data still contained parts of the Petri dishes. A scatter plot of PC3 vs. PC4 (Figure 2a) was drawn from the masked data. As shown in Figure 2a, two distinct clusters were apparent. The green cluster corresponded to the parts of the Petri dishes (Figure 2b) and was exported to make an inverse mask to remove them. After that, a clean image (Figure 2c) containing only maize kernels was obtained. The image data of the other treatments (D1, D2, D3, D4, D5, D6, and DC) were processed in the same way.

Further Processing of Bad Pixels and Separability Computation of Contamination Levels
The mosaic image containing the eight clean images was further resized to remove the blank edges around the kernels. Then, PCA was applied to the mosaic. The first four PCs accounted for 99.90% of the total variance, while PC5 to PC8 together accounted for 0.08%. However, as shown in Figure 3, the loading line plots of PC5 to PC8 had several sharp peaks. The wavelengths at the locations of the sharp peaks were 1016, 1067, 1100, 1111, 1117, 1229, and 1263 nm. Correspondingly, the images at these individual wavelengths were found to contain several bad pixel lines, which the images at other wavelengths did not have. For example, images at 1111 and 1240 nm are shown in Figure 4. Therefore, the sharp peaks in the loading line plots of PC5 to PC8 were produced by noise induced by the bad pixel lines. Thus, the data at these wavelengths need to be removed before subsequent analysis.
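Removing the affected bands can be sketched as follows in Python. The wavelength axis here is hypothetical (evenly spaced at 5.6 nm from 1000 nm), since the exact band centres of the camera are not listed; the function name and the nearest-band matching rule are likewise assumptions for illustration.

```python
import numpy as np

def drop_nearest_bands(cube, wavelengths, bad_nm):
    """Remove, for each listed wavelength, the band closest to it."""
    wavelengths = np.asarray(wavelengths, dtype=float)
    bad_idx = {int(np.argmin(np.abs(wavelengths - nm))) for nm in bad_nm}
    keep = np.ones(wavelengths.size, dtype=bool)
    keep[list(bad_idx)] = False
    return cube[..., keep], wavelengths[keep], sorted(bad_idx)

# Hypothetical band centres and an empty placeholder cube (4 x 5 pixels).
wavelengths = np.arange(1000.0, 2500.0, 5.6)
cube = np.zeros((4, 5, wavelengths.size))

# Wavelengths of the sharp loading peaks reported in the text.
bad_nm = [1016, 1067, 1100, 1111, 1117, 1229, 1263]
clean_cube, clean_wl, removed = drop_nearest_bands(cube, wavelengths, bad_nm)
```

Matching by nearest band avoids requiring the listed wavelengths to coincide exactly with band centres; a tolerance-based match would be an equally reasonable choice.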

In our experiment, only a few kernels of D6 and D7 had obvious hyphae on the surface, and in almost all of these kernels the symptoms of fungal infection first appeared in the germ region, which is consistent with common experience. Pseudo-color images, composited from the 1095, 1291 and 1543 nm bands, of kernels with symptomatic appearance (obvious hyphae on the kernel surface) and asymptomatic appearance are shown in Figure 5. It was therefore proposed that germ region data might be better than whole kernel data for classifying contamination levels and extracting features of fungal infection.
Appl. Sci. 2017, 7, 90
Separability computation in ENVI software was applied to investigate the classification effects of whole kernel data and germ region data. The results for the 28 ($C_8^2$) pairwise comparisons among the eight treatments were plotted as distribution histograms (Figure 6). Compared with the values based on whole kernel data (Figure 6a), the separability values calculated from germ region data (Figure 6b) were more concentrated at the value of 2. This indicated that germ region data were more effective for classification, which might be because a greater proportion of fungi start growing from, and gather more in, the germ region than in the rest of the kernel. Therefore, spectral data of the germ regions of all kernel samples were extracted first and used for later spectral analysis.

Spectral Analysis
Average spectral curves of the germ region of the control kernels and of kernels infected for different numbers of days are shown in Figure 7. According to the separability computation, some separability values were less than 1.4, which meant that those groups were not easily separable from each other. Hence, four new groups were defined, that is, DC, D1–2, D3–4 and D5–7 (the D1–2 group consisted of D1 and D2; the D3–4 group consisted of D3 and D4; the D5–7 group consisted of D5, D6 and D7). The influence of germ orientation on the performance of the classification model was further investigated. For that purpose, two data sets were extracted for modeling. One set was the ROI data of kernels orientated with the germ up, while the other was the ROI data of a 1:1 mixture of kernels orientated with the germ up and down.

Classification Based on Data of Maize Kernels with Germ Up
Samples from the Petri dish with kernels orientated with germ up were used. A total of 144 samples made up the data set, with 18 samples from each of the eight treatments. Ninety-six of the 144 kernels were selected as the calibration set using the Kennard-Stone (KS) algorithm, while the other 48 kernels were used as the validation set. Two smoothing methods, SGS and MAS, were first used to reduce random noise in the raw data. In addition, to remove the scattering effects caused by differences in surface roughness and kernel shape, three scatter-correction methods, NMZ, SNV, and MSC, were applied individually. The discrimination performance of the different pretreatment methods and their combinations was shown in Table 1. The performance of SGS and MAS was similar, whether or not they were combined with other methods. NMZ gave the worst results, with validation accuracy (72.92%) much lower than calibration accuracy (100%). This might be because the algorithm uses only the maximum and minimum values of the data set. MAS combined with SNV gave the highest classification accuracies, 97.92% and 91.67% for the calibration and validation sets, respectively. The corresponding confusion matrix with its characteristics (i.e., sensitivity, specificity, and overall accuracy) was shown in Table 2.
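
The best-performing pretreatment (MAS followed by SNV) and the KS split can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the smoothing window of 5 bands, the 60-band synthetic spectra, the use of Euclidean distance in KS, and the choice to run KS on the pretreated spectra are all assumptions that the text does not specify.

```python
import numpy as np


def moving_average(spectra, window=5):
    """Moving average smoothing along the band axis (window size assumed)."""
    half = window // 2
    padded = np.pad(spectra, ((0, 0), (half, half)), mode="edge")
    kernel = np.ones(window) / window
    return np.array([np.convolve(row, kernel, mode="valid") for row in padded])


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum individually."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def kennard_stone(x, n_select):
    """Return (selected, remaining) sample indices by the Kennard-Stone rule."""
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    # Start from the two samples that are farthest apart
    first, second = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(first), int(second)]
    remaining = [i for i in range(len(x)) if i not in selected]
    while len(selected) < n_select:
        # Distance from each remaining sample to its nearest selected sample
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining


rng = np.random.default_rng(1)
spectra = rng.normal(loc=1.0, scale=0.1, size=(144, 60))  # synthetic stand-in

pretreated = snv(moving_average(spectra))
cal_idx, val_idx = kennard_stone(pretreated, n_select=96)
print(len(cal_idx), "calibration /", len(val_idx), "validation")
```

KS picks samples that cover the spectral space as evenly as possible, so the validation set tends to contain spectra lying between the calibration samples.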
Generally, the results were promising, with precision, sensitivity, and specificity substantially above 80%, except for the low sensitivity (57.14%) for DC in validation. This might be caused by the smaller total number of samples in the DC group and by the large proportion of DC samples falling in the validation set relative to the calibration set. Moreover, the D1-2 group showed the highest separability, which was counterintuitive. The reason may be related to the water content of the maize kernels. In the experiment, kernels were immersed in water to create a humid environment for mold growth and dipped into the spore suspension for inoculation. A portion of the water in the kernels evaporated rapidly during the initial period after inoculation, which caused a relatively larger difference in water content between the D1-2 group and the other groups (the DC group was not dipped into the spore suspension). The influence of water content could be verified in Figure 8 (loading line plot for PC4-PC6), where several predominant peaks and valleys at 1935 nm (associated with water) were shown. Misclassification might be caused by the size, texture, and shape of the germ of the samples.
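
The per-class characteristics mentioned above follow directly from a confusion matrix. The sketch below uses an invented 4 x 4 validation matrix: its counts are hypothetical and are NOT the values of Table 2; they were chosen only so that the DC sensitivity (4 of 7) and the overall accuracy (44 of 48) match the two figures quoted in the text.

```python
classes = ["DC", "D1-2", "D3-4", "D5-7"]

# Hypothetical validation confusion matrix (rows = actual, columns = predicted).
# These counts are illustrative only and do not reproduce Table 2.
confusion = [
    [4, 1, 1, 1],
    [0, 12, 0, 0],
    [0, 0, 12, 0],
    [0, 0, 1, 16],
]


def class_metrics(matrix, k):
    """One-vs-rest sensitivity, specificity, and precision for class index k."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp
    fp = sum(row[k] for row in matrix) - tp
    tn = total - tp - fn - fp
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


metrics = {c: class_metrics(confusion, i) for i, c in enumerate(classes)}
correct = sum(confusion[i][i] for i in range(len(classes)))
overall_accuracy = correct / sum(sum(row) for row in confusion)

for name, m in metrics.items():
    print(name, {k: round(100 * v, 2) for k, v in m.items()})
print("overall accuracy:", round(100 * overall_accuracy, 2))
```

Because sensitivity is computed per class, a small class such as DC can show a low value even when overall accuracy is high: three misclassified kernels out of seven are enough.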

Classification Based on Data of the Mixture of Maize Kernels Orientated Germ Up and Down
For practical application, classification performance should be similar regardless of kernel orientation. Data of kernels from the two Petri dishes were therefore analyzed and discussed. A total of 288 samples made up the data set, with 36 from each of the eight treatments. Of these, 192 kernels were selected as the calibration set, while the other 96 kernels were used as the validation set. Results obtained with the same chemometric algorithms were shown in Table 3. As in Table 1, NMZ showed the worst performance among the pretreatment methods, and MAS combined with SNV had the highest accuracies. With the same pretreatment methods, the results in Table 3 were worse than those in Table 1, which indicated that germ orientation does influence classification performance. Maize kernels have different characteristics on their two sides, i.e., the germ side and the endosperm side, and published literature showed that oil-rich germs are colonized by aflatoxin-producing fungi before endosperm tissue [42,43], which leads to a difference in the depth distribution of the fungi. Moreover, most of the information in reflectance spectra originates from the surface, and from several millimeters below the surface, of the side facing the camera. Hence, data of kernels with germ up performed more effectively than data of kernels with germ down for detecting early growth of fungi. Despite this, accuracies of the best model were 95.83% and 84.38% for calibration and validation, respectively. Both were above 84%, which is still meaningful for application. The corresponding confusion matrix with its characteristics (i.e., sensitivity, specificity, and overall accuracy) was shown in Table 4. The results were similar to those in Table 2, though slightly worse. Variance percentages of each of the first six PCs used for the best classification model were listed in Table 5. Together they accounted for 99.40% of the total variance. Then, in order to select the characteristic wavelengths, the loading line plots of the first six PCs were analysed. According to Manley et al. [44], PC1 to PC3 reflected the variation due to the effect of grain topography, such as kernel curvature, shape, and texture, on the spectral variation of NIR hyperspectral images. The loading line plot for PC4 to PC6 was shown in Figure 8.
Five key wavelengths, ascribed to meaningful chemical functional groups, were identified: 1145, 1408, 1935, 2103, and 2383 nm. The wavelength of 1146 nm, ascribed to the second overtone (OT) of the aromatic -CH- group, indicated the presence of the benzene ring, which can be attributed to a few amino acids such as tyrosine [16]. The wavelength of 1405 nm, with an O-H stretch first overtone attributed to an ROH structure, was associated with starch [26,29]. Furthermore, 1940 nm was ascribed to the O-H bend second overtone of water [29]. A wavelength of 2100 nm was associated with carbohydrate [29]. In addition, 2382 nm was ascribed to the second overtone of C-H stretching from aliphatic compounds [16,29]. These assignments indicated that variation among the groups was mainly caused by changes in the ingredient contents of the maize germ due to fungal growth.
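
Reading key wavelengths from a loading line plot can be approximated numerically. The sketch below computes PCA loadings by singular value decomposition and reports, for PC4 to PC6, the wavelength with the largest absolute loading. It is a simplification under stated assumptions: the spectra are synthetic, the band count of 120 is assumed, and the authors inspected peaks and valleys visually rather than taking a single extreme per component.

```python
import numpy as np


def pca_loadings(spectra, n_components=6):
    """Return (loadings, explained variance ratio) for the leading components."""
    centered = spectra - spectra.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variance = s ** 2 / (spectra.shape[0] - 1)
    ratio = variance / variance.sum()
    return vt[:n_components], ratio[:n_components]


rng = np.random.default_rng(2)
wavelengths = np.linspace(921, 2529, 120)  # nm; band count is an assumption
# Synthetic stand-in spectra: cumulative sums give smooth, correlated bands
spectra = rng.normal(size=(96, 120)).cumsum(axis=1)

loadings, explained = pca_loadings(spectra)
print("variance captured by six PCs: %.2f%%" % (100 * explained.sum()))

key_wavelengths = {}
for pc in (4, 5, 6):
    row = loadings[pc - 1]
    # Largest-magnitude loading marks the band that drives this component most
    key_wavelengths[f"PC{pc}"] = float(wavelengths[np.argmax(np.abs(row))])
print(key_wavelengths)
```

On real data, each candidate wavelength would still need to be checked against known absorption bands before being assigned to a functional group, as done in the text.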

Conclusions
The feasibility of early detection of Aspergillus parasiticus infection in maize kernels was verified by using NIR hyperspectral imaging. PCA was applied to the image data to separate the ROIs from the background, labels, and other objects. With the loading line plot, the wavelengths at which images contained bad pixels were recognized. Germ region data were more effective in discriminating contamination levels than whole kernel data, which was confirmed by separability computation.

The classification model based on germ region data (of all kernels orientated with germ up) showed the best results with the combined MAS and SNV pretreatment. The accuracies were 97.92% and 91.67% for calibration and validation, respectively, which demonstrated the ability to separate sound kernels from kernels at the contamination levels D1-2 (the group consisting of D1 and D2), D3-4 (D3 and D4), and D5-7 (D5, D6, and D7). Germ orientation was found to affect the performance of the classification models. Despite this, accuracies of the model built on data of the mixture of kernels with germ up and down were 95.83% and 84.38% for calibration and validation, respectively, which still showed the potential of the presented method for application.
Five key wavelengths (1145, 1408, 1935, 2103, and 2383 nm) were selected according to the loading line plot of the PCs used in the model, and were associated with amino acid, starch, water, carbohydrate, and aliphatic compounds, respectively. These are the main ingredients in maize germ, which indicated that variation among contamination levels was mainly caused by changes in the ingredient contents of the maize germ due to fungal growth. The evaporation of water from the maize kernels after inoculation adversely affected the experimental results. Therefore, this process should be taken into account in future studies. Furthermore, a larger data set of kernels needs to be prepared for further analysis of feature extraction of fungal infection, and different varieties of maize kernels need to be tested to verify the practical application of the presented method.

Figure 1. (a) PCA scatter plot of PC2 vs. PC3 (6.34% and 0.92%) of the image of D7; (b) density scatter plot of PC2 vs. PC3; and (c) corresponding score image, where the red class represented the whiteboard and lighting sections of labels, magenta the background, purple the label, sea green most parts of the Petri dishes, and yellow the kernels and lesser parts of the Petri dishes.

Figure 2. (a) PCA scatter plot of PC3 vs. PC4 (0.92% and 0.2%) of the masked image of D7; (b) corresponding score image where the green class represented the Petri dishes; and (c) a clean image.

Figure 3. Loading line plot of PC5-PC8 of the mosaic image.

Figure 6. Distribution plots of separability values calculated based on (a) whole kernel data and (b) germ region data.

Figure 7. Average spectra of the germ region of all kernels under 8 different treatments.

D1-2: the group consisted of D1 and D2; D3-4: the group consisted of D3 and D4; D5-7: the group consisted of D5, D6, and D7.

Figure 8. Loading line plot for PC4-PC6 used in the best classification model.

Table 1. Comparison of classification models with different pretreatment methods based on kernel samples orientated with germ up.

Table 2. Confusion matrix of calibration and validation set of the best model based on kernels orientated with germ up.

Table 3. Comparison of classification models with different pretreatment methods based on the mixture of kernel samples orientated with germ up and down.

Table 4. Confusion matrix of calibration and validation set of the best model based on the mixture of kernels orientated with germ up and down.

Table 5. Variance percentages of the first six PCs.
