The Application of Hyperspectral Images in the Classification of Fresh Leaves' Maturity for Flue-Curing Tobacco

The maturity of tobacco leaves directly affects their curing quality. However, no effective method has been developed for determining their maturity during production. Assessment of tobacco maturity for flue curing has long depended on production experience, leading to considerable variation. In this study, hyperspectral imaging (HSI) combined with a novel algorithm was used to develop a classification model that could accurately determine the maturity of tobacco leaves. First, tobacco leaves of four maturity levels (unripe, under-ripe, ripe, and over-ripe) were collected. ENVI software was used to remove the background of the hyperspectral images, and 11 groups of filtered images were obtained using Python 3.7. Next, a full-band partial least-squares discriminant analysis (PLS-DA) classification model was established to identify the maturity of the tobacco leaves. In the calibration set, the model accuracy of the original spectrum was 87.84%, and the accuracy of the de-trending, multiplicative scatter correction (MSC), and standard normal variate (SNV) treatments was 91.89%, 95.27%, and 92.57%, respectively. In the prediction set, the model accuracy of the de-trending, MSC, and SNV treatments was 93.85%, 96.92%, and 93.85%, respectively. The experimental results indicate that these three filtering treatments gave a higher model accuracy than the original spectrum. Because of their higher accuracy, the de-trending, MSC, and SNV spectra were selected as candidates for characteristic spectral band screening, and the successive projection algorithm (SPA), competitive adaptive reweighted sampling (CARS), and particle swarm optimization (PSO) were used as the screening methods. Finally, genetic algorithm (GA), PLS-DA, linear support vector machine (LSVM), and back-propagation neural network (BPNN) classification and discrimination models were established. The combined SNV-SPA-PLS-DA model provided the best accuracy in the calibration and prediction sets (99.32% and 98.46%, respectively). Our findings highlight the efficacy of using visible/near-infrared (Vis/NIR) hyperspectral imaging for detecting the maturity of tobacco leaves, providing a theoretical basis for improving tobacco production.


Introduction
The flue curing of tobacco (Nicotiana tabacum L.) plants has greatly improved tobacco farmers' income [1][2][3][4][5]. Tobacco production comprises many steps, including breeding, cultivation, harvesting, curing, and grading [6][7][8][9]. For the harvesting step, maturity is the most important indicator for determining whether tobacco leaves are ready for flue curing [10,11]. Using immature leaves or leaves at various maturity levels increases the difficulty of curing and reduces the quality of flue-cured tobacco leaves, resulting in great economic losses [12][13][14]. However, the assessment of tobacco maturity depends mainly on individual experience, and errors are introduced due to variations in levels of experience [5]. Consequently, establishing a harvest maturity standard has been challenging.
Therefore, developing an objective evaluation method to determine the maturity of tobacco leaves is highly desirable to improve tobacco production.
Hyperspectral imaging (HSI) technology has been utilized by an increasing number of agricultural experts in recent years to resolve problems in crop production, food safety, crop maturity, meat freshness, and other fields [15][16][17][18]. Yuan et al. [19] reported a visible/near-infrared (Vis/NIR) HSI system for classifying jujube internal bruises over time, and showed that this rapid and nondestructive technique could identify bruising time with 100% accuracy. Ke et al. [20] developed an HSI method for the nondestructive determination of volatile oil and moisture content and the discrimination of geographical origins of Zanthoxylum bungeanum Maxim. Yang et al. [21] reported a classification accuracy of 75.50% for an integrated convolutional neural network (CNN) model for corn yield classification based on HSI and RGB images. Liu et al. [22] proposed a novel framework that combines HSI and deep learning techniques for plant image classification. The best accuracy measured for the models was 0.834.
Despite the progress made, most previous studies used only the original hyperspectral image without modifications. On the one hand, considerable noise is introduced during the acquisition of hyperspectral images by the instrument itself, the external environment, and light scattering [23]. On the other hand, original hyperspectral images comprise many narrow spectral bands, leading to a large amount of repeated information in adjacent bands. Hyperspectral image data therefore have high dimensionality and collinearity [19]. When an original hyperspectral image is applied in experimental or production practice, hundreds of bands of data must be processed, which increases the computational burden of the model-building process and seriously hinders the popularization and application of the model [24,25]. Therefore, it is necessary to eliminate collinearity between variables, extract useful information from overlapping information, and minimize the number of variables and the complexity of models to improve the efficiency and speed of modeling. Filtering transformation and characteristic wavelength screening are effective at reducing the high dimensionality and collinearity of hyperspectral data, which can improve the speed of the model [26][27][28]. In many studies, several filtering methods, such as multiplicative scatter correction (MSC), standard normal variate (SNV), and Gaussian filter (GF), have been used to filter the original hyperspectral image, while particle swarm optimization (PSO), competitive adaptive reweighted sampling (CARS), and the successive projection algorithm (SPA) have been used to select characteristic wavelengths.
Concerning tobacco, researchers have used HSI to analyze the color and location features of tobacco [29], to rapidly and nondestructively predict nicotine and chlorophyll contents [30] and photosynthetic capacity [31], and to quantify tobacco chemical composition [25]. Focusing on the classification of tobacco maturity, Chen et al. [5] successfully used near-infrared spectroscopy to discriminate between the different maturity levels of fresh tobacco leaves. However, this technique requires a high computational effort, which increases the processing workload. Therefore, we postulate that an efficient and accurate maturity recognition model for flue-cured tobacco can be established by filtering and transforming the original spectra and screening the characteristic spectra. In the present study, we developed a new methodology to determine the maturity of tobacco leaves using hyperspectral technology. This new method has a lower computational burden than current methods while producing more accurate results. The proposed method is expected to replace the traditional method of classifying tobacco maturity grade, providing an auxiliary means for objective classification of tobacco maturity and reducing the difficulty of tobacco curing.

Sample Preparation
Tobacco leaves were obtained from Luoyang, Henan Province, China, in 2021. A total of 213 tobacco samples were collected for this study. Experts who had been engaged in flue-cured tobacco leaf production for a long time were invited to evaluate the maturity level of all samples and label them according to four maturity stages (unripe, under-ripe, ripe, and over-ripe), as shown in Table 1. There were 36 unripe, 43 under-ripe, 91 ripe, and 53 over-ripe tobacco leaves. Hyperspectral images of all samples were obtained within 2 h.

| Maturity level | Number of samples | Characteristics |
| --- | --- | --- |
| Unripe | 36 | The leaves are dark green, the main veins and branches are green, the pubescence is obvious, and the angle between the stalk and leaves is less than 60°. |
| Under-ripe | 43 | The leaves are light green, the main vein is light green, the branch veins are all white, the pubescence is partially shed, and the angle between the stem and leaves is 60°. |
| Ripe | 91 | The leaf surface is light yellow, the main vein and the branch veins are all white, the pubescence has fallen off, the leaf tip and leaf edge are whitish and rolled down, and the angle between the stem and leaves is 75°. |
| Over-ripe | 53 | The complete leaf surface is yellow, the main veins and branches are white, the pubescence is completely shed, and the angle between the stem and leaves is greater than 85°. |

Acquisition of Hyperspectral Images
Hyperspectral images were acquired using an HSI system (GaiaField-V10E, Jiangsu Dualix Spectral Imaging Technology Co., Ltd., Wuxi City, Jiangsu, China). The spectral range was 400-1000 nm. The number of bands in the hyperspectral images was 256. Data acquisition and management were conducted using SpecView software 5.3.1 (Jiangsu Dualix Spectral Imaging Technology Co., Ltd., Wuxi City, Jiangsu, China). A clear image was captured using an exposure time of 12 ms. The distance between the sample and the camera was 120 cm. Four 200 W/220 V halogen lamps were used as the light source. A schematic is shown in Figure 1.

Hyperspectral Image Correction
During hyperspectral image acquisition, dark currents and uneven illumination lead to noise in the collected images [32]. Therefore, the hyperspectral image must be corrected using the white reference image I_white (obtained from the standard white plate scan) and the black reference image I_dark (obtained by covering the lens without illumination). The corrected image R was transformed from the raw hyperspectral image I based on Equation (1):

R = (I − I_dark) / (I_white − I_dark)    (1)
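The black/white correction described above can be illustrated for a single pixel spectrum. This is a minimal, dependency-free sketch that assumes the standard form R = (I − I_dark) / (I_white − I_dark); the function name `correct_reflectance` and the digital numbers are hypothetical and are not taken from the study.

```python
def correct_reflectance(raw, white, dark):
    """Band-by-band black/white correction: R = (I - I_dark) / (I_white - I_dark)."""
    corrected = []
    for i, w, d in zip(raw, white, dark):
        span = w - d
        # Guard against dead bands where the white and dark references coincide.
        corrected.append((i - d) / span if span != 0 else 0.0)
    return corrected


# Hypothetical digital numbers for one pixel in three bands (illustrative only).
raw_pixel = [1200.0, 2100.0, 900.0]
white_ref = [4000.0, 4100.0, 3900.0]
dark_ref = [100.0, 100.0, 100.0]

reflectance = correct_reflectance(raw_pixel, white_ref, dark_ref)
```

For a full image cube, the same expression would typically be applied to whole arrays at once rather than pixel by pixel.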

Extraction of Spectral and Image Data
To effectively extract the spectral and image data from the object, ENVI 5.3.1 (Research System Inc., Boulder, CO, USA) software was used to mask all sample images and select the region of interest (ROI). The ROI was used to remove invalid backgrounds from the hyperspectral image. The mean spectral values of the background-removed images were then calculated. Spectral data were extracted and preprocessed using Python 3.7.
To reduce the high dimensionality and collinearity of the hyperspectral image for further data analysis and modeling, it needed to be preprocessed. In this experiment, baseline, de-trending, MSC, normalization, orthogonal signal correction (OSC), Savitzky-Golay (SG), SNV, spectroscopic transformation (ST), GF, moving average (MA), and median filter (MF) methods were used to filter the original hyperspectral images. Baseline and ST can effectively block image noise [23]; de-trending is primarily used to eliminate baseline drift of diffuse reflections [17]; normalization unifies the spectral values of each image element to the overall average luminance level to reduce luminance differences [23]; MSC and SNV are mainly used for light scattering correction and aim to improve the accuracy of spectral data [23]; OSC is used to filter out noise from the raw data and to ensure that the information filtered out is not relevant to the information to be measured [19]; SG is mainly used to eliminate random noise [23]; GF eliminates Gaussian noise and is widely used in image processing for noise reduction [17]; MA can filter out high-frequency noise [17]; MF effectively blocks image noise, greatly reduces the blurring of image edges, and is better at handling slightly dense noise points [17].
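Two of the scatter-correction filters named above, SNV and MSC, are simple enough to sketch directly. The following is an illustrative standard-library version, not the authors' code: `snv` centers and scales each spectrum by its own mean and sample standard deviation, and `msc` regresses each spectrum on the mean spectrum and removes the fitted offset and slope. The function names and the two toy spectra are assumptions made for the example.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own statistics."""
    m = mean(spectrum)
    s = stdev(spectrum)  # sample standard deviation (n - 1 in the denominator)
    return [(x - m) / s for x in spectrum]


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum of the set."""
    n_bands = len(spectra[0])
    ref = [mean([s[j] for s in spectra]) for j in range(n_bands)]
    ref_mean = mean(ref)
    ref_ss = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = mean(s)
        # Least-squares fit of s = intercept + slope * ref.
        slope = sum((r - ref_mean) * (x - s_mean) for r, x in zip(ref, s)) / ref_ss
        intercept = s_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in s])
    return corrected


# Two hypothetical spectra; the second is the first scaled by 2 and shifted by 0.1,
# which mimics a pure scatter effect.
spectra = [[0.1, 0.2, 0.4, 0.3], [0.3, 0.5, 0.9, 0.7]]
snv_spectra = [snv(s) for s in spectra]
msc_spectra = msc(spectra)
```

Because the two toy spectra differ only by a scale and an offset, both filters map them onto the same curve, which is the behavior that motivates using them before classification.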

Data Analysis
PLS-DA models [33] were first used to analyze the original and filtered hyperspectral images. The filtered data sets that gave high model accuracy were then selected. Next, to effectively eliminate collinearity between variables, minimize the complexity of the model variables, and improve the efficiency and speed of modeling, the SPA [34], PSO [35], and CARS [36] algorithms were used to obtain the characteristic wavelengths of the data sets selected in the second step. Finally, the GA [37], PLS-DA, LSVM [38], and BPNN [39] models were used to analyze the characteristic wavelengths of the hyperspectral images.
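To make the wavelength-screening step more concrete, the core of the SPA can be sketched as a chain of projections: starting from one band, each remaining band is projected onto the subspace orthogonal to the band chosen last, and the band with the largest remaining norm is chosen next. This is a simplified illustration only. A complete SPA also compares different starting bands and subset sizes using a validation error, which is omitted here, and the name `spa_select` and the toy data are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def spa_select(columns, start, n_select):
    """Projection step of the SPA.

    columns  : list of column vectors, one per wavelength (samples along each vector)
    start    : index of the starting wavelength
    n_select : number of wavelengths to return (must not exceed the data rank)
    """
    work = [list(c) for c in columns]  # working copy, modified in place
    selected = [start]
    while len(selected) < n_select:
        ref = work[selected[-1]]
        ref_norm2 = dot(ref, ref)
        best, best_norm2 = None, -1.0
        for j, col in enumerate(work):
            if j in selected:
                continue
            # Remove the component of this band that lies along the last selected band.
            coef = dot(col, ref) / ref_norm2
            work[j] = [c - coef * r for c, r in zip(col, ref)]
            norm2 = dot(work[j], work[j])
            if norm2 > best_norm2:
                best, best_norm2 = j, norm2
        selected.append(best)
    return selected


# Hypothetical data: 4 bands measured on 3 samples. Band 1 is almost collinear with band 0.
bands = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.1], [0.0, 1.0, 0.0], [0.0, 0.0, 3.0]]
chosen = spa_select(bands, start=0, n_select=3)
```

In the toy data, the nearly collinear band is never chosen, which is the property that lets the SPA reduce redundancy between adjacent spectral bands.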

Analytical Software
The invalid background of the hyperspectral images was removed using ENVI 5.3.1. Image filtering, characteristic wavelength selection, and construction of the analytical model were carried out in Python 3.7. The graphs were created using Excel 2010.

Image Analysis
The ENVI software was used to mask the images of tobacco and remove the image background. As shown in Figure 2, the background can be easily removed from the ROI. Principal component analysis (PCA) was performed on the masked images, and the first five of the 256 principal component images were retained. The results show that the color and principal components of tobacco leaves clearly differed among maturity levels and that the tobacco leaves changed from green to yellow as they matured.

Spectra Analysis
The spectral curves of tobacco leaves at different maturity levels are shown in Figure 3. Leaves at all maturity levels showed the same trend and had an absorption peak at 680 nm. In the range of 450-750 nm, the spectral reflectance of fresh tobacco leaves at different maturity levels followed the order over-ripe > ripe > under-ripe > unripe.

Filter Processing of Images
The original spectra were pretreated using the baseline, de-trending, MSC, normalization, OSC, SG, SNV, spectroscopic transformation, GF, MA, and MF methods, and the spectral curves obtained with the different pretreatments are shown in Figure 4. The trend of the original spectral curves differed from that of the curves filtered via the de-trending, MSC, normalization, OSC, SNV, and spectroscopic transformation methods. However, the trends of the other filtered spectral data were more similar to those of the original spectra. Furthermore, with the exception of the OSC filtering treatment, the spectral trends of fresh tobacco leaves in the over-ripe grade differed significantly from those in the other maturity grades.

PLS-DA of Tobacco Leaf Images
Before establishing the PLS-DA classification model, the tobacco leaf samples were randomly assigned to two groups at a ratio of 7:3: 70% of the samples were used as the calibration set, and the remaining 30% were used as the prediction set. The allocation results are listed in Table 2, and the classification results of PLS-DA for the original and filtered spectra are shown in Table 3. In the calibration set, the model accuracy of the original spectra was 87.84%. The model accuracies of the de-trending, MSC, and SNV spectra were 91.89%, 95.27%, and 92.57%, respectively, which are 4.05, 7.43, and 4.73 percentage points higher than that of the original spectra. The model accuracy of the other filtered spectra was lower than that of the original spectra. In the prediction set, the model accuracy of the original spectra was 92.31%, and that of the de-trending, MSC, and SNV spectra was 93.85%, 96.92%, and 93.85%, respectively, which is 1.54, 4.61, and 1.54 percentage points higher. However, the model accuracy of the other eight filtering methods was lower than that of the original spectra. In both the calibration and prediction sets, the model accuracies of the de-trending, MSC, and SNV spectra were always higher than that of the original spectra. Therefore, these spectra can be used for screening characteristic spectral bands.
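The accuracy gains quoted above follow directly from the reported percentages, and the short check below reproduces the arithmetic. The accuracy values are those stated in the text; the dictionary layout and the helper name `gains_over_original` are illustrative assumptions.

```python
# Accuracies (%) as reported in the text for the full-band PLS-DA models.
original = {"calibration": 87.84, "prediction": 92.31}
filtered = {
    "de-trending": {"calibration": 91.89, "prediction": 93.85},
    "MSC": {"calibration": 95.27, "prediction": 96.92},
    "SNV": {"calibration": 92.57, "prediction": 93.85},
}


def gains_over_original(filtered_acc, original_acc):
    """Return the gain in percentage points for each filter and data set."""
    return {
        name: {subset: round(acc - original_acc[subset], 2) for subset, acc in sets.items()}
        for name, sets in filtered_acc.items()
    }


gains = gains_over_original(filtered, original)
```

The gains are differences between percentages, so they are most clearly read as percentage points rather than relative improvements.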

De-Trending
Seven characteristic spectral bands were selected using the SPA. These were 410.1, 510.0, 517.1, 531.3, 555.0, 610.1, and 941.0 nm (Figure 5). When the PSO algorithm was used to screen the characteristic spectral bands, 70 bands were selected. These were 403.7).

Classification Results Using the Characteristic Wavelengths
The GA, PLS-DA, LSVM, and BP model results for classifying the characteristic wavelengths are shown in Table 4. The SNV-SPA-PLS-DA model provided the highest accuracy (99.32%) in the calibration set. The SNV-PSO-PLS-DA, MSC-SPA-PLS-DA, and MSC-PSO-PLS-DA models all achieved accuracies of 96% or more, slightly lower than that of SNV-SPA-PLS-DA. The accuracy of the other models was relatively low. In the prediction set, SNV-SPA-PLS-DA again provided the highest accuracy at 98.46%, while the accuracies of the SNV-PSO-PLS-DA, MSC-SPA-PLS-DA, and MSC-PSO-PLS-DA models were all less than 96%, and the accuracy of the other models was relatively low. Therefore, the SNV-SPA-PLS-DA model had the highest accuracy and could quickly and accurately identify the maturity of fresh tobacco leaves. In Table 4, the data in bold indicate the best method and the corresponding model performance.

Classification Results of Tobacco Leaves of Different Maturities Using SNV-SPA-PLS-DA
Table 5 shows the confusion matrices of tobacco leaves at various levels of maturity obtained using the SNV-SPA-PLS-DA model. The classification accuracy at the over-ripe level was 97.14% and 95% in the calibration and prediction sets, respectively, and one tobacco leaf at this level was misclassified as ripe in each set. The remaining levels were classified with 100% accuracy in both sets.
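The relationship between a confusion matrix and the accuracies reported above can be sketched as follows. Only the over-ripe row is inferred from the text (one leaf misclassified as ripe at 97.14% accuracy implies 34 of 35 correct). The counts for the other three classes are placeholders chosen so that the total is 148, a set size inferred from the reported 99.32% calibration accuracy with a single error; they are not the values in Table 5. The function names are hypothetical.

```python
LEVELS = ["unripe", "under-ripe", "ripe", "over-ripe"]


def per_class_accuracy(matrix):
    """Fraction of each true class (row) that was assigned to the correct column."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Correctly classified samples (diagonal) divided by all samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Rows are true classes, columns are predicted classes, both in LEVELS order.
# The first three rows are placeholders (NOT from Table 5); the last row is inferred.
calibration = [
    [25, 0, 0, 0],
    [0, 30, 0, 0],
    [0, 0, 58, 0],
    [0, 0, 1, 34],  # one over-ripe leaf predicted as ripe
]

class_acc = per_class_accuracy(calibration)
total_acc = overall_accuracy(calibration)
```

With a single misclassified leaf, the over-ripe accuracy and the overall accuracy round to the 97.14% and 99.32% values quoted in the text.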

Discussion
Tobacco leaf maturity is an important factor during the production of tobacco [5,13]. Currently, two main strategies are used to ensure the maturity of tobacco leaves in large tobacco-producing countries: one is to change agronomic methods by studying the ripening mechanism of tobacco leaves [40], and the other is to accurately assess the maturity of tobacco leaves and develop efficient curing technologies [5,10]. However, because the growth of tobacco is greatly affected by the environment, the first strategy is somewhat passive. The method developed in this study is efficient and accurate and is not limited by the ecological environment of tobacco fields.
In this study, except for the OSC filtering, both the original spectral data and the other filtered spectral data showed a large difference between the over-ripe and ripe grades and the under-ripe and unripe grades, while the trends of under-ripe, ripe, and unripe were basically the same. This is mainly because, in the visible-light range, the chlorophyll of plant leaves has a strong light absorption capacity [41]; as fresh tobacco leaves gradually age, the chlorophyll content continues to decrease, and their light absorption capacity gradually weakens.
One over-ripe sample in each set was incorrectly classified as ripe by the SNV-SPA-PLS-DA model. The main reason is that the metabolism of the tobacco leaf is a continuous process; therefore, the changes in the properties of the tobacco leaf are also continuous. Ripe and over-ripe are two adjacent maturity levels of fresh tobacco leaves that may be similar in color, moisture, and other components, and therefore their spectral characteristics are similar and harder to distinguish. This provides some reference for further optimization of our model.
Chen et al. directly used 512 spectral bands of the original spectrum to model and identify the maturity of fresh tobacco leaves, with a maximum accuracy of 97.31% [5], whereas we used only 22 spectral bands to construct a fresh tobacco leaf maturity recognition model, and its accuracy reached 98.46%. This is mainly because we reduced the collinearity of the hyperspectral data through filtering transformation, extracted more effective information, and reduced the amount of computation required by the model by screening the characteristic spectrum [42]. This technique reduces the computational requirements of the model, improves the recognition efficiency, and increases the probability of the technology being popularized and applied. However, since the number of samples in this study was only 213, the accuracy of the model could be further improved by obtaining more sample data.

Conclusions
We have successfully established a novel algorithm for identifying the maturity of tobacco leaves. De-trending, MSC, and SNV were selected as the optimized filtering methods owing to their high accuracy. SPA, PSO, and CARS were used to select the characteristic spectral bands of these three methods. GA, PLS-DA, LSVM, and BPNN classification and discrimination models were established. The SNV-SPA-PLS-DA model exhibited the highest accuracy among the tested models, with 99.32% and 98.46% accuracy in the calibration and prediction sets, respectively. Although the constructed model can effectively identify the maturity of flue-cured tobacco, it is currently limited to the samples collected during the experimental process. Because flue-cured tobacco is planted over a large area in China and its growth and development are greatly affected by the ecological environment, the maturity characteristics of flue-cured tobacco may vary between regions and between growing locations. Therefore, in future research, we need to increase the number of samples and further improve the stability and accuracy of the model in different flue-cured tobacco planting areas.

Figure 1. Hyperspectral image acquisition of tobacco leaves.

Figure 2. Principal component analysis images of tobacco leaves at different maturity levels. PC1, PC2, PC3, PC4, and PC5 are the first, second, third, fourth, and fifth principal components.

Figure 3. Original spectral curves of tobacco leaves at different maturity levels.

Figure 5. Characteristic wavelengths of the de-trending spectral curve selected using the SPA.

Figure 6. Characteristic wavelengths of the de-trending spectral curve selected using the PSO algorithm.

Figure 7. Characteristic wavelengths of the de-trending spectral curve selected using the CARS algorithm.

Figure 8. Characteristic wavelengths of the SNV spectral curve selected using the SPA.

Figure 10. Characteristic wavelengths of the SNV spectral curve selected using the CARS algorithm.

Figure 11. Characteristic wavelengths of the MSC spectral curve selected using the SPA.

Figure 13. Characteristic wavelengths of the MSC spectral curve selected using the CARS algorithm.

Table 1. The maturity characteristics of different flue-cured tobacco samples.

Table 2. Calibration and prediction sets of samples.

Table 3. Classification results of PLS-DA for the original and filtered spectra. The data in bold indicate filtering methods whose classification results are better than those of the original spectral data.

Table 4. The results of GA, PLS-DA, LSVM, and BP classification based on characteristic wavelengths.

Table 5. Confusion matrices of tobacco leaves with various maturities obtained using SNV-SPA-PLS-DA.