A Long Short-Term Memory Neural Network Based Simultaneous Quantitative Analysis of Multiple Tobacco Chemical Components by Near-Infrared Hyperspectroscopy Images

Abstract: Near-infrared (NIR) spectroscopy has been widely used in agricultural operations to obtain various crop parameters, such as water content, sugar content, and different indicators of ripeness, as well as other information about crops that cannot be obtained directly by human observation. The chemical composition of tobacco plays an important role in the quality of cigarettes, and NIR spectroscopy-based chemical composition analysis has recently become one of the most effective methods in tobacco quality analysis. Existing NIR spectroscopy-related solutions either have relatively low analysis accuracy or are only able to analyze one or two chemical components. Thus, a precise prediction model is needed to improve the analysis accuracy of NIR data. This paper proposes a tobacco chemical component analysis method based on a neural network (TCCANN) that uses NIR spectroscopy to quantitatively analyze seven chemical properties of tobacco leaves: nicotine, total sugar, reducing sugar, total nitrogen, potassium, chlorine, and pH value. The proposed TCCANN consists of a residual network (ResNet) and a long short-term memory (LSTM) neural network. ResNet is applied to the feature extraction of high-dimension NIR spectra and effectively avoids the vanishing-gradient issue caused by increasing network depth. LSTM is used to quantitatively analyze the multiple chemical components of tobacco leaves simultaneously. LSTM selectively allows information to pass through gated units, thereby comprehensively analyzing the correlation between multiple chemical components and the corresponding spectra. The experimental results confirm that the proposed TCCANN not only predicts the values of seven chemical components simultaneously, but also achieves better prediction performance than other existing machine learning methods.


Introduction
Tobacco is an important economic plant: its total global economic cost is estimated at around USD 1.85 trillion, or around 1.8% of global GDP, and continues to increase [1]. Tobacco contains a variety of chemical components, some of which contribute to the flavor and aroma of tobacco [2], and some of which affect its organoleptic quality [3].

• A ResNet module is used to directly extract the features of high-dimension NIR spectra. Traditional methods cannot process high-dimension NIR spectra without preprocessing, and part of the original spectral information may be lost during that preprocessing. In the proposed TCCANN, the internal residual blocks of the ResNet network provide identity mappings between different layers of the deep network, so that shallow-layer data can be identity mapped to deeper layers. The ResNet network can therefore fully extract the features of spectra and improve the prediction accuracy of the proposed TCCANN.
• The proposed ResNet network uses a fully convolutional structure to avoid the loss of spectral information in the feature extraction process and to extract spectral features more effectively. The tobacco features at each wavelength point in NIR spectra are correlated with each other. A pooling layer can reduce the number of network parameters by decreasing the feature dimensions, but partial feature information may be lost in the pooling process, which affects the performance of the model. TCCANN therefore replaces each pooling layer with a convolutional layer with a stride of two. In the feature extraction process, the relevance in the feature information of spectra can be retained, the loss of spectral information can be reduced, and the prediction accuracy of the proposed model can be improved.
• The proposed TCCANN uses an LSTM network to achieve the simultaneous quantitative analysis of multiple chemical components of tobacco. Existing methods need to consider the correlation between each chemical component and the NIR spectra separately, and can only analyze each chemical component individually. The LSTM network can keep data information in internal gate units and achieve selective information transmission. Thus, the comprehensive correlation analysis between multiple chemical components and spectra, and the simultaneous quantitative analysis of multiple chemical components of tobacco, can be achieved. Compared with existing models, the proposed TCCANN is simple to operate, requires less running time, and achieves better prediction performance.
The remainder of this paper is organized as follows. Section 2 discusses related work; Section 3 describes the proposed TCCANN in detail; Section 4 evaluates the proposed TCCANN and compares the results with existing solutions; Section 5 concludes this paper.

Related Work
Many studies have analyzed the chemical compositions of various plants and foods by using NIR spectroscopy. Machine learning methods, such as partial least squares (PLS) [45], support vector machine (SVM) [11], and least-squares support vector machine (LS-SVM) [44], have been widely used to analyze chemical compositions. In recent studies, researchers have used spectral preprocessing, various modelling procedures, and the optimization of model parameters to improve the determination accuracy. Chen [46] used fractional-calculus-augmented NIR spectra to detect the nitrogen content of rubber trees. In this method, fractional calculus was used to extract additional information from the original spectra, and the derivatives of different orders were analyzed. Then, the selected wavelengths were used by a PLS regression method to develop the estimation model. Olarewaju [47] developed a multivariate calibration model based on the PLS regression algorithm to determine the rind biochemical properties of citrus fruit from visible to near-infrared (Vis/NIR) spectra. Some mathematical preprocessing methods were introduced to develop regression models for NIR spectra analysis. Ting [45] developed a synergy interval partial least squares (Si-PLS) method for the quantitative analysis of total flavonoid content (TFC) in Goji berries. The Si-PLS method first split the full-spectrum region into a number of subintervals (variable-wise), and then calculated PLS models for all possible combinations of subintervals. The optimal Si-PLS model was then established from the combination of subinterval spectra that yielded the lowest loss. Han [44] developed an MCUVE-LSSVM model to determine total phenolics (TPC) and p-coumaric acid (PA) contents in barley grain. This model first used Monte Carlo-Uninformative Variable Elimination (MC-UVE) to select the informative wavelengths and then obtained the best calibration specificity of different components. Following this, the analysis model was established by using the least-squares support vector machine (LS-SVM) with optimized spectra. Jin [48] applied a stepwise-PLS approach to estimate leaf chlorophyll contents of various species from NIR reflected hyperspectral information. This method defined both maximum and minimum values of spectral bands that were used to explain the variations of dependent variables in PLS regression. Different informative spectral bands from hyperspectral reflectance were selected and evaluated for consistency at different spectral resolutions to identify PLS regression models. Modlitbovab [49] used Laser-Induced Breakdown Spectroscopy (LIBS) as an element of a bio-imaging technique to analyze the nutrient contents of plant samples. This method used LIBS to analyze both the element distributions and the contents of various nutrients in plants, and highlighted the value of assessing spatial element distribution in phytotoxicity testing.
Many studies have addressed the determination of routine chemical constituents of tobacco by using NIR spectroscopy, and machine learning methods have been widely used in them. Zhang [11] proposed a WT-SVM method that integrated SVM with wavelet transformation (WT) to analyze chemical constituents of tobacco. They first employed the WT method to preprocess the spectra as inputs. This model then used SVM regression with a radial basis function (RBF) kernel to analyze the chemical compositions. Tan [4] proposed a boosting partial least squares (boosting-PLS) method to determine the nicotine content in tobacco. The boosting method was first used to optimize the training sets, and then PLS was employed as the regression algorithm to determine nicotine content. However, Tan highlighted that these results were valid only for their own dataset. Jing [50] applied the multiblock partial least squares (MB-PLS) method to determine the moisture in corn as well as both nicotine and sugar in tobacco leaves. In this method, the spectra were first separated into sub-blocks along the wavenumber axis, then PLS was used to build the corresponding model for each sub-block, and finally a determination model was built from the sub-block models. Duan [2] established a quantitative correction model by using PLS regression to analyze four different categories of chemical compositions in tobacco: routine chemicals, primary aromatic constituents, inorganic nutrients, and heavy metals. They first used the Savitzky-Golay 9-point algorithm to smooth the original data, and then used the first derivative to eliminate the spectral differences from the baseline. They also established different PLS regression models to analyze different types of chemical compositions. Tan [23] proposed a multivariate calibration method based on WT and mutual information (MI) to analyze the total sugar in tobacco. In this method, the spectra of the training set were transformed into a set of wavelet representations by WT. Then, the reconstructed training set that retained the higher MI value was obtained by calculation, and a PLS model was constructed and optimized. However, this method only analyzed the total sugar among the tobacco components. Li [51] applied both PLS and nonlinear least-squares support vector machine (LS-SVM) to the development of calibration models to estimate the constituents of tobacco seed. They used four preprocessing methods to optimize the original spectra before establishing the calibration models, and compared the prediction performance of the two models. However, the approach was cumbersome because each constituent had to be modelled individually.
Li [18] proposed a variable adaptive boosting partial least squares (VAB-PLS) method to establish the quantitative analysis model of tobacco NIR spectra. According to ARS theory, this method integrated a variable adaptive strategy into the BPLS algorithm to analyze the spectra and re-weighted all the samples and variables that appeared.
The methods reviewed above for analyzing the chemical compositions of tobacco leaves are conventional machine learning methods, which are limited by the dimension of the input and need to preprocess the high-dimension original spectral data. In the preprocessing, part of the important information in the original spectra may be lost, which reduces the prediction accuracy of these models. Existing methods cannot comprehensively analyze the correlation between multiple chemical components and spectra, and can only predict each chemical component individually, which increases the difficulty of analyzing multiple chemical components.
In order to overcome the above shortcomings and achieve higher prediction accuracy, the proposed TCCANN adopts a feature extraction network with residual structure, which can reduce the loss of important information in the NIR spectrum when performing nonlinear transformation. At the same time, the use of LSTM network to extract long-range dependencies in NIR spectra overcomes the lack of residual network capability in this regard. Therefore, the features extracted by TCCANN can contain important information in the original NIR spectrum. TCCANN is able to achieve excellent prediction results by establishing a complex mapping between NIR spectra and prediction sequences.

The Framework of The Proposed TCCANN
This paper proposes TCCANN for the simultaneous quantitative analysis of multiple chemical compositions of tobacco from NIR spectra. Figure 1 shows the overall architecture for the analysis of the chemical composition of tobacco using the TCCANN model. The TCCANN model can directly analyze the high-dimensional original spectrum. The tobacco samples were collected from different regions of Guizhou Province, China. A series of different analytical methods was used to measure the chemical values of tobacco, and the spectra of tobacco leaves were acquired with NIR sensors. The prepared tobacco samples were then divided into training sets and testing sets. TCCANN first uses the ResNet network to directly extract features of NIR spectra, and then applies the LSTM network to the quantitative analysis of multiple chemical compositions. The values of chemical compositions obtained by the proposed TCCANN are evaluated and analyzed with several error metrics. Finally, the network structure and parameters are adjusted to further optimize TCCANN, and the trained TCCANN is tested.

The Structure of TCCANN
As shown in Figure 2, the proposed TCCANN has two key components, ResNet and LSTM. The ResNet model mainly consists of five convolutional layers and two residual blocks. The LSTM model contains three hidden layers. The spectra of tobacco leaves are the input of the ResNet model, and the features of spectra are extracted by using one-dimension convolutional layers. The outputs of the ResNet network are used as the input of the LSTM network. The LSTM model has many memory cells, which selectively transmit information through the gate structure within a unit [35], and generates the prediction values of the various chemical compositions.

ResNet
The ResNet structure is illustrated in Figure 2. The blue squares represent the convolution layer, the gray squares represent the residual blocks, and the orange squares represent identity shortcut connection. ResNet mainly consists of five convolutional layers and two residual blocks. A residual block has three convolution layers and an identity skip connection. The first layer of ResNet network is a convolutional layer, in which the stride is 2 and the number of convolutional kernels is 32. The second layer as a residual block includes three convolutional layers that set the stride and the number of convolutional kernels to 1 and 64, respectively. As a convolutional layer, the stride and the channel number are 2 and 128, respectively, in the third layer. The fourth layer is also a residual block, including three convolutional layers with the stride of 1 and the number of convolutional kernels is 256. The last layer of ResNet is composed of three convolution layers, in which the stride is 2 and the number of convolutional kernels is 512. ResNet network is a full convolutional network, in which each max-pooling layer is replaced by a convolutional layer with the stride of 2. In the feature extraction process, the pooling layer is used to reduce the dimension of data, but it may cause the loss of internal information [41,52]. The forward calculation formula of this convolutional layer is shown as follows.
where i indexes the i-th convolutional kernel of layer l, j indexes the j-th convolutional kernel of layer l − 1, $x_j^{(l-1)}$ is the feature map of the j-th convolutional kernel of layer l − 1, and c represents the set of input feature maps in layer l − 1. $b_i^{(l)}$ is the bias of the i-th convolutional kernel of convolutional layer l, $y_i^{(l)}$ is the output of convolutional layer l, and $w_{j,i}^{(l)}$ is the convolution kernel. ⊗ is a convolutional product with inverted weights. In Equation (2), $x_i^{(l)}$ is the feature map of convolutional layer l, and ReLU is the activation function. Rectified linear units (ReLU) [53] are used as the activation function, and the formula is shown as follows.
where BN represents batch normalization (BN) [43]. Before the activation function of each layer, batch normalization is applied so that the pre-activation outputs follow a distribution with a mean of 0 and a variance of 1. Then, the normalized results are scaled and shifted to restore the original input characteristics [54]. This process preserves the network capacity, accelerates network training, and improves the generalization ability of the network [55].
ResNet is built by stacking multiple basic structure elements called residual blocks [24]. ResNet introduces an "identity shortcut connection" to residual blocks. The "identity shortcut connection" skips multiple layers in the network and uses the output of the previous layer as a partial input of the subsequent layer [33,56]. The structure of ResNet ensures the identity mapping in the training process and enables shallow-layer data to be identity mapped to the deep network. Thus, the related data features can be effectively extracted from the high-dimension NIR spectra. The residual block structure of the proposed TCCANN is shown in Figure 2, where x is the input of a convolutional layer, F(x) is the output of a convolutional layer, and the activation function of the convolutional layers is set to ReLU. By adding the input x directly to the output of the convolutional layers, the output of the residual module changes from the original F(x) to H(x) = F(x) + x [57]. In the proposed network, the gradients received at H(x) flow back equally into x and F(x) [33] during back propagation. This simple addition can significantly improve the training effect without adding any extra parameters.
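To make the stride-2 convolution and the shortcut H(x) = F(x) + x concrete, the following is a minimal pure-Python sketch. It is not the authors' implementation: the function names, the zero "same" padding, the single-channel input, and the use of a single convolution inside the block (the paper stacks three) are illustrative assumptions, and batch normalization is omitted for brevity.

```python
import math


def conv1d_same(x, w, b=0.0, stride=1):
    """1-D convolution with zero 'same' padding (assumed padding scheme).

    The kernel is flipped before the sliding dot product, matching the
    'inverted weights' wording used for the convolutional product.
    """
    k = len(w)
    n_out = math.ceil(len(x) / stride)
    pad_total = max((n_out - 1) * stride + k - len(x), 0)
    pad_left = pad_total // 2
    padded = [0.0] * pad_left + list(x) + [0.0] * (pad_total - pad_left)
    flipped = w[::-1]
    return [
        sum(flipped[j] * padded[i * stride + j] for j in range(k)) + b
        for i in range(n_out)
    ]


def relu(values):
    """ReLU(v) = max(0, v), applied element-wise."""
    return [max(0.0, v) for v in values]


def residual_block(x, w, b=0.0):
    """H(x) = F(x) + x, with F a stride-1 convolution followed by ReLU."""
    fx = relu(conv1d_same(x, w, b, stride=1))
    return [f + xi for f, xi in zip(fx, x)]


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
identity_kernel = [0.0, 1.0, 0.0]  # hypothetical kernel chosen for easy checking
downsampled = conv1d_same(signal, identity_kernel, stride=2)
block_out = residual_block(signal, identity_kernel)
```

With the identity kernel, the stride-2 layer simply keeps every second sample, which shows how it halves the feature length in place of a pooling layer, and the residual block returns F(x) + x without introducing any extra parameters for the shortcut.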

LSTM
The Long Short-Term Memory (LSTM) model can capture sequential patterns by learning how to store or ignore certain information from the input data [35,58]. The key of LSTM is the cell state (memory cell), which is also shown in Figure 2. It runs straight down the entire chain, and information can be added to or removed from the cell state under the regulation of gates [59]. Gates are used as the optional inlets of information. The gate mechanism includes the forget gate, the input gate, and the output gate. At time step t, the input $x_t$ and the hidden state from the previous time step, $h_{t-1}$, are fed into the LSTM cell. The forward pass of an LSTM memory cell proceeds as follows.
1. The first step decides what information is going to be removed from the cell state. This decision is made by the following forget gate $f_t$.
2. The following step decides which new information is going to be stored in the cell state. First, the input gate layer $i_t$ decides which values are to be updated. Second, a tanh layer [35] creates a vector of new candidate values $g_t$.
3. Then, the old cell state $c_{t-1}$ is updated into the new cell state $c_t$ as follows.
4. Finally, the output gate $o_t$ decides which parts of the cell state are going to be output. The cell state first goes through a tanh layer (to push the values to be between -1 and 1), and then it is multiplied by the output gate as follows.
In Equations (4)-(9) and Figure 2, $f_t$, $i_t$, and $o_t$ are the forget gate, input gate, and output gate, respectively, $g_t$ is the candidate value, * indicates element-wise multiplication, and σ and tanh are non-linear functions. σ is the sigmoid activation function, which compresses its inputs to the range (0, 1), whereas tanh compresses its inputs to the range (−1, 1). $W_{f,i,g,o}$ and $b_{f,i,g,o}$ are the weight matrices and bias vectors, respectively. t denotes the t-th time step, and the input at time step t is $x_t$. $h_{t-1}$ and $h_t$ indicate the hidden state at time t − 1 and t, while $c_t$ is the cell state at time t. The term $h_{t-1}$ contains the critical features of tobacco as the output of a pooling layer at time t − 1 and is used as the input of the LSTM memory cell.
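Since the gate equations themselves are not reproduced here, the following pure-Python sketch shows one LSTM step under the standard formulation, which is an assumption: each gate applies its weight matrix to the concatenation of $h_{t-1}$ and $x_t$. The paper's exact parameterization may differ, and `lstm_step`, `affine`, and the parameter dictionary layout are hypothetical names introduced only for illustration.

```python
import math


def sigmoid(z):
    """Logistic function; output lies strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Compute W v + b for a list-of-rows matrix W."""
    return [sum(w * a for w, a in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One forward step of an LSTM cell (standard form, assumed)."""
    v = h_prev + x_t  # list concatenation [h_{t-1}, x_t]
    f_t = [sigmoid(z) for z in affine(p["W_f"], p["b_f"], v)]    # forget gate
    i_t = [sigmoid(z) for z in affine(p["W_i"], p["b_i"], v)]    # input gate
    g_t = [math.tanh(z) for z in affine(p["W_g"], p["b_g"], v)]  # candidate
    o_t = [sigmoid(z) for z in affine(p["W_o"], p["b_o"], v)]    # output gate
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# Toy check with one hidden unit, one input feature, and all-zero parameters:
# every gate equals 0.5 and the candidate equals 0.
params = {f"W_{name}": [[0.0, 0.0]] for name in "figo"}
params.update({f"b_{name}": [0.0] for name in "figo"})
h_t, c_t = lstm_step([1.0], [0.0], [2.0], params)
```

In this toy case the forget gate halves the previous cell state and the output gate halves tanh of the new cell state, which illustrates how the gates scale information rather than passing or blocking it outright.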

The Specific Architecture of TCCANN
The proposed TCCANN consists of ResNet and LSTM. ResNet model has five convolutional layers and two residual blocks, and LSTM model contains three hidden layers. In network training, the number of network training epochs is set to 250,000, and the batch size of training data is set to 16. The initial learning rate is set to 0.001, the learning rate attenuation coefficient is set to 0.99, and the learning rate is updated every 6000 rounds.
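The learning-rate settings above can be read as a staircase exponential decay. The sketch below assumes that reading (the rate is multiplied by 0.99 once every 6000 training rounds); the function name and the staircase interpretation are assumptions, not details confirmed by the text.

```python
def learning_rate(step, base_lr=0.001, decay=0.99, decay_every=6000):
    """Staircase exponential decay: multiply by `decay` every `decay_every` steps."""
    return base_lr * decay ** (step // decay_every)


lr_start = learning_rate(0)
lr_after_first_update = learning_rate(6000)
lr_final = learning_rate(250_000)  # 41 updates have been applied by this step
```

Under this assumption the rate stays constant within each block of 6000 rounds and has been reduced 41 times by round 250,000.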
The detailed parameters of ResNet and LSTM are shown in Tables 1 and 2, where Conv is short for convolutional layer and FullConnection is short for fully connected layer. The original data are one-dimension NIR spectra, and each spectrum has 1609 features. Therefore, the convolutional kernel in all convolutional layers is set to a one-dimension vector. As shown in Table 1, the upper layer of TCCANN consists of ResNet. The input layer of the ResNet network is conv1, in which the number of convolutional kernels is 32, the size of the convolutional kernel is 9, and the stride is 2. The input size is 1 × 1609 × 1, and the output size of conv1 is 1 × 805 × 32. The next layer of ResNet is residual block1, in which the number of convolutional kernels is 64, the size of the convolutional kernel is 3, and the stride is 1. The input size is 1 × 805 × 32, and the output size of residual block1 is 1 × 805 × 64. The next layer of ResNet is conv2, which has 128 convolutional channels, a convolutional kernel of size 9, and a stride of 2. The input size is 1 × 805 × 64, and the output size of conv2 is 1 × 403 × 128. The features extracted by conv2 are the input of residual block2, in which the number of inner convolutional channels is 256, the stride is 1, and the output size is 1 × 403 × 256. The output stage of the ResNet network is composed of three convolutional layers, conv3, conv4, and conv5. In each of these three layers, the number of convolutional channels is 512, the size of the convolutional kernel is 3, and the stride is 2. The input size is 1 × 403 × 256, and the output size is 1 × 51 × 512.
As shown in Table 2, the lower layer of TCCANN is LSTM. The LSTM model contains three hidden layers, each of which has 100 LSTM units. The output layer of the LSTM model is a fully connected layer, which has 100 units and 7 outputs. The high-level features extracted by ResNet are used as the input of LSTM, and the corresponding input size is 1 × 51 × 512. The final output of TCCANN is the predicted seven chemical compositions of tobacco leaves. During the training process, mean square error (MSE) is chosen as the overall loss function evaluated at the end of each forward iteration, as shown in Equation (10).
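The feature-map sizes quoted for the ResNet stage can be checked with a few lines of arithmetic. The sketch below assumes "same"-style padding, so that a layer with stride s maps a length n to ceil(n / s); this padding choice is an assumption, although it is the one that reproduces the reported 1609 → 805 step. The channel counts follow the reported output sizes, and the layer list and helper name are illustrative.

```python
import math

# (layer name, output channels, stride), following the sizes reported in the text
LAYERS = [
    ("conv1", 32, 2),
    ("residual block1", 64, 1),
    ("conv2", 128, 2),
    ("residual block2", 256, 1),
    ("conv3", 512, 2),
    ("conv4", 512, 2),
    ("conv5", 512, 2),
]


def trace_shapes(length, layers=LAYERS):
    """Return (name, output length, channels) for each layer in order."""
    shapes = []
    for name, channels, stride in layers:
        length = math.ceil(length / stride)  # 'same' padding assumption
        shapes.append((name, length, channels))
    return shapes


shapes = trace_shapes(1609)
for name, length, channels in shapes:
    print(f"{name}: 1 x {length} x {channels}")
```

The trace passes through lengths 805, 403, 202, 101, and 51, so the three final stride-2 layers together account for the reduction from 403 to 51 features that is fed to the LSTM.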
The Adam optimizer [60] is selected to minimize the total loss, which updates the network weights and biases based on the gradient of the loss function. In each convolutional layer, ReLU activation function is employed, and each convolutional layer uses the "msra" method proposed by He for weight initialization [61].
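The "msra" scheme of He et al. draws weights from a zero-mean Gaussian whose standard deviation is sqrt(2 / fan_in). As a hedged illustration, the sketch below applies it to a one-dimension kernel, taking fan_in as kernel size times input channels; the helper names and the nested-list weight layout are assumptions made for this example.

```python
import math
import random


def he_std(kernel_size, in_channels):
    """Standard deviation sqrt(2 / fan_in) with fan_in = kernel_size * in_channels."""
    return math.sqrt(2.0 / (kernel_size * in_channels))


def he_init_kernel(kernel_size, in_channels, rng):
    """Sample one 1-D kernel as a kernel_size x in_channels nested list."""
    std = he_std(kernel_size, in_channels)
    return [[rng.gauss(0.0, std) for _ in range(in_channels)]
            for _ in range(kernel_size)]


rng = random.Random(0)  # fixed seed only to make the example repeatable
conv1_std = he_std(kernel_size=9, in_channels=1)  # conv1: kernel size 9, 1 input channel
conv1_kernel = he_init_kernel(9, 1, rng)
```

Scaling the variance by the fan-in keeps the magnitude of ReLU activations roughly stable from layer to layer, which is the motivation usually given for pairing this initializer with ReLU.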
The parameter updating in the back propagation training of TCCANN is summarized in Algorithm 1. First, both the network structure and the parameters of TCCANN are initialized, including the parameters of each network layer, the initial learning rate, and the iteration number. Before model training, the network weights are initialized. During training, the network weights and the learning rate are updated through the loss function and back propagation. Adam [60] is used to optimize the loss function.
Algorithm 1
4: Extract NIR spectroscopy features by using ResNet model
5: Analyze chemical composition value by using LSTM model
6: Calculate total loss value
7: Minimize total loss by using Adam [60]
8: learning_rate_lr ← update_lr(s)
10: s ← s + 1
11: end while
12: Calculate average loss to determine various hyperparameters
14: return the trained model
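The loop structure of Algorithm 1 (forward pass, loss, Adam update, learning-rate update, step counter) can be illustrated on a toy problem. The sketch below is not the TCCANN training code: it fits a single weight w in y ≈ w·x with an MSE loss, uses the standard Adam update rule, and applies a staircase learning-rate decay; the data, the hyperparameter values, and all names are illustrative assumptions.

```python
import math


def adam_fit(xs, ys, steps=2000, base_lr=0.05, decay=0.99, decay_every=100,
             beta1=0.9, beta2=0.999, eps=1e-8):
    """Fit y = w * x by minimizing MSE with Adam and a decaying learning rate."""
    w, m, v = 0.0, 0.0, 0.0
    losses = []
    s = 0  # step counter, as in Algorithm 1
    while s < steps:
        preds = [w * x for x in xs]                                   # forward pass
        loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)  # MSE
        grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
        t = s + 1
        m = beta1 * m + (1 - beta1) * grad        # first-moment estimate
        v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
        m_hat = m / (1 - beta1 ** t)              # bias correction
        v_hat = v / (1 - beta2 ** t)
        lr = base_lr * decay ** (s // decay_every)  # update_lr(s), staircase form
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
        losses.append(loss)
        s += 1
    return w, losses


w_fit, losses = adam_fit([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

In the real model the forward pass would be the ResNet feature extractor followed by the LSTM, and the gradient would come from back propagation through both; only the bookkeeping around the optimizer carries over from this toy loop.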

The Overall Algorithm of TCCANN
The overall flow of TCCANN is summarized in Algorithm 2. In Algorithm 2, ĉ represents the predicted value of the network on a validation set, ĉ_i represents the predicted value of the network on a testing set, and Model represents the trained TCCANN.

Experiment Preparation
In this study, a total of 4000 standard samples of tobacco leaves were collected from different regions of Guizhou Province and measured by the Guizhou Tobacco Science Research Institute of China. For the determination of the standard values of tobacco chemical compositions, all tobacco samples were first dried in an oven at 60 °C under normal pressure for half an hour, and then ground to a certain granularity with a whirlwind grinding instrument. Next, the sample powders were sieved through a mesh. The sieved powders were then processed and analyzed by a San+ Automated Wet Chemical Analyzer (Skalar, Holland), a continuous-flow injection analysis instrument. The analyzer can accurately measure the values of routine chemical compositions, including nicotine, total sugar, reducing sugar, total nitrogen, potassium, chlorine, and pH, using a range of different analytical methods [13].
The obtained values are used as the standard values for the experimental analysis of chemical compositions. Statistical values of seven tobacco compositions from 4000 standard samples of tobacco leaves are shown in Table 3. [...] 55, and 4.53-6.01, respectively. This means that the samples are well distributed and cover a wide range of values. For example, nicotine has a strong effect on both the aroma and taste of tobacco products, and nicotine intake can have some side effects on the human body [4]. The contents of reducing sugar and total sugar correlate with aftertaste, irritation, and aroma quality, and the amount of total nitrogen correlates with the smoke concentration and smoking strength [5,9]. The pH value of tobacco is the determinant factor in acute toxicity and is also correlated with the total nitrogen, total alkaloid, and total volatile alkali bases of tobacco [6]. The potassium amount has a positive relationship with the flavor and the degree of wetness [8]. The chemical compositions of tobacco leaves jointly affect the quality of tobacco, and the various chemical compositions are coupled and closely related to each other.
NIR spectra were collected by a Thermo Antaris 2 with multiple sensors (Thermo Fisher Scientific Inc., Waltham, MA, USA). The NIR chemical detector is shown in Figure 3. The collected spectra have a resolution of 8 cm⁻¹ and were acquired with 64 scans.
As shown in Figure 4, the NIR range is from 3800 cm⁻¹ to 10,000 cm⁻¹. There are significant fluctuations from 3800 cm⁻¹ to 6500 cm⁻¹. The 3800 cm⁻¹ to 4870 cm⁻¹ region is associated with the main tobacco compositions, such as nicotine, total sugar, reducing sugar, and total nitrogen. In addition, potassium has a sensitive band in the spectra, and chlorine participates in photosynthesis [8]. The characteristics of potassium and chlorine are bound up with the absorption of C-H, O-H, and N-H, which provides a theoretical foundation for determining potassium and chlorine by NIR spectra. There is a certain correlation between NIR spectral data and tobacco chemical constituents.
The detailed division of tobacco data is shown in Figure 5. In this study, there are a total of 4000 standard tobacco samples, which were randomly divided into training and testing sets at a 1:1 ratio (2000 samples each). In the model training, the root-mean-squared error of 5-fold cross-validation is used to evaluate the network model. According to a ratio of 4:1, the spectra of routine chemical constituents of tobacco were divided into calibration and validation sets.
Both training and testing were performed using an NVIDIA GeForce RTX 2080 GPU and an Intel Core(TM) i7-8700 CPU with 24 GB of memory. The neural network was built with the deep learning framework TensorFlow 1.15.0 on the Windows 10 operating system, and the training and testing of the proposed TCCANN were run on Python 3.6.
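The data division described here (a random 1:1 split into training and testing sets, followed by 5-fold cross-validation that yields a 4:1 calibration-to-validation ratio) can be sketched with index lists. The helper names, the fixed seed, and the interleaved fold assignment are assumptions for illustration; the paper does not state how the folds were drawn.

```python
import random


def split_half(n_samples, seed=0):
    """Randomly split sample indices into two equal halves (train, test)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    half = n_samples // 2
    return indices[:half], indices[half:]


def k_fold(indices, k=5):
    """Yield (calibration, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        calibration = [j for n, fold in enumerate(folds) if n != i for j in fold]
        yield calibration, validation


train_idx, test_idx = split_half(4000)
fold_sizes = [(len(cal), len(val)) for cal, val in k_fold(train_idx)]
```

With 2000 training samples, each of the five folds uses 1600 samples for calibration and 400 for validation, which matches the stated 4:1 ratio.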

Evaluation Metrics
In this paper, both root mean square error (RMSE) and mean absolute error (MAE) are used to evaluate the performance of the proposed model. The corresponding equations of RMSE and MAE are shown in Equations (11) and (12), respectively. As shown in Equation (13), the determination coefficient R² is also used to evaluate the performance of the proposed model.
where $y_t$ is the true chemical composition value, $\hat{y}_t$ is the predicted value, $\bar{y}$ is the mean of all the actual samples, and n is the number of $y_t$. RMSE and MAE indicate the measurement precision; values of RMSE and MAE close to 0 indicate a good fit. In contrast, R² measures how successful the fit is in explaining the data variation; a value of R² close to 1 indicates a good fit [11]. In short, a useful model should have a high R² value and low RMSE and MAE values.
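The three metrics can be computed directly from their usual definitions, which are assumed here to match Equations (11)-(13): RMSE is the square root of the mean squared residual, MAE is the mean absolute residual, and R² is one minus the ratio of the residual sum of squares to the total sum of squares around the mean. The function names and the toy values below are illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Determination coefficient: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


measured = [1.0, 2.0, 3.0]   # toy values, not data from the paper
predicted = [1.0, 2.0, 4.0]
scores = (rmse(measured, predicted), mae(measured, predicted), r2(measured, predicted))
```

For a multi-output model such as TCCANN, these functions would be applied per chemical component and then averaged, which is how the mean values reported later can be read.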

Parameters of TCCANN
In this experiment, the total training rounds of TCCANN are set to 250,000. The loss value of the training set is shown in Figure 6. The ordinate represents the loss value, and the abscissa represents the number of training rounds. The red curve represents the trend of the loss value as the network training progresses.
As shown in Figure 6, in the initial stage of network training, the loss value of the network shows a rapid downward trend, and then decreases slowly until it stabilizes. After 100,000 rounds of training, the training loss value is 0.00307. After 150,000 rounds, the training loss value is 7.681 × 10⁻⁴. When the network training reaches 250,000 rounds, the loss value eventually decreases to 1.38 × 10⁻⁴. The loss of the proposed TCCANN fluctuates considerably from 0 to 200,000 rounds of training.
Finally, the loss of TCCANN stabilizes at a small value, which indicates that TCCANN fits the training data well. During the training process, 5-fold cross-validation is used in the proposed model, and the training and validation samples are randomly split according to a 4:1 ratio. In order to accurately measure both the generalization ability and the prediction accuracy of the proposed model, RMSE, R², and MAE are used as evaluation indexes for both the validation and test sets.
On the validation set, the mean RMSE over the chemical constituents is 0.03864, and the mean MAE is 0.02190. On the test set, the mean RMSE over the chemical constituents is 0.04134, and the mean MAE is 0.02501. Neither the RMSE values nor the MAE values differ significantly between the validation and test sets. The determination coefficient R² of all seven chemical compositions on both the validation and test sets is greater than 0.99 and close to 1. The loss values of the seven chemical compositions on the validation and test sets show a good linear relationship. Thus, the proposed network shows no sign of overfitting or underfitting, and has a good generalization ability.
In order to test the analytical efficiency of the proposed TCCANN, we recorded the training and testing times. With the help of CUDA and a GPU, training on 1928 samples for 250,000 steps cost 44.35 s on average, and testing 690 samples cost 0.83 s on average, which corresponds to about 1.2 milliseconds per sample. The results show that, given suitable equipment and data, TCCANN is simpler to operate than the traditional chemical method, greatly improves the analysis speed, and obtains good analysis results.

Comparison with Existing Methods
The proposed TCCANN is used to analyze the complex NIR spectra and determine the chemical compositions of tobacco leaves, including nicotine, total sugar, reducing sugar, total nitrogen, potassium, chlorine, and pH value. In order to demonstrate the performance of the proposed model, TCCANN is compared with existing methods, including PLS regression [2], wavelet transformation support vector machine (WT-SVM) [11], LS-SVM [51], and variable adaptive boosting partial least squares (VAB-PLS) [18]. PLS, WT-SVM, LS-SVM, and VAB-PLS were implemented according to the original papers to carry out comparative experiments. The four baseline models use the same settings, platform, and evaluation indicators as the proposed model. The specific data division is shown in Figure 5.
Figure 7 shows the correlations between the predicted values and the measured values of 2000 tobacco samples for the five models on the testing set. Figure 7 has seven rows and five columns (35 scatter plots in total). Each scatter plot represents the correlation between the predicted value and the measured value of a chemical composition. The abscissa of each scatter plot represents the measured value of a chemical composition, and the ordinate represents the value predicted by the corresponding model. The first to fifth columns show the values of the seven chemical compositions obtained by PLS, WT-SVM, VAB-PLS, LS-SVM, and TCCANN, respectively. The first to seventh rows show the results for nicotine, total sugar, reducing sugar, total nitrogen, potassium, chlorine, and pH, plotted as red, light green, blue, sage, orange, purple, and dark green points, respectively.
In Figure 7, the closer the points lie to the diagonal y = x, the better the model fits. For PLS and WT-SVM, the predicted values differ greatly from the measured values, and most of the points are not evenly distributed around the diagonal. For LS-SVM and VAB-PLS, some differences between the predicted and measured values remain, and a number of points are scattered around the diagonal. For TCCANN, the predicted and measured values differ only slightly, and most of the points are evenly and compactly distributed along the diagonal. There is a significant linear relationship between the predicted values and the measured values of the seven chemical compositions for the proposed TCCANN.
Figure 8 shows the results of the three evaluation indicators obtained by the five analysis models on the test set. The three sub-figures, from left to right, show the values of RMSE, MAE, and R² obtained by the five models, respectively. The abscissa of each line chart represents the seven chemical compositions, and the ordinate represents the value of the corresponding indicator. NIC, TS, RS, TN, PO, CL, and PH represent nicotine, total sugar, reducing sugar, total nitrogen, potassium, chlorine, and pH value, respectively. The grey line represents the PLS model, the red line the WT-SVM model, the green line the VAB-PLS model, the blue line the LS-SVM model, and the orange line the TCCANN model. Table 4 shows the results obtained by the five analysis models on the test set.
In Table 4, CC denotes chemical composition, and NIC, TS, RS, TN, PO, CL, and PH represent nicotine, total sugar, reducing sugar, total nitrogen, potassium, chlorine, and pH value, respectively. For PLS, the mean RMSE over the seven chemical compositions is 0.08961, and the minimum RMSE, obtained for total nitrogen, is 0.08201. The mean MAE is 0.07857, and the minimum MAE, obtained for chlorine, is 0.06393. The mean R² is 0.72428. For WT-SVM, the mean RMSE is 0.08706, and the minimum RMSE, obtained for total nitrogen, is 0.07069. The mean MAE is 0.06764, and the minimum MAE, also obtained for total nitrogen, is 0.05401. The mean R² is 0.78283. For VAB-PLS, the mean RMSE is 0.07191, the mean MAE is 0.05107, and the mean R² is 0.94358. For LS-SVM, the mean RMSE is 0.06507, the mean MAE is 0.05031, and the mean R² is 0.97301. As shown in Table 4, the mean RMSE and mean MAE of LS-SVM are lower than those obtained by PLS, WT-SVM, and VAB-PLS, and its mean R² is higher than theirs. Therefore, among the four existing methods, LS-SVM achieves the best overall performance.
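The mean values quoted above are averages of per-component indicators. As a minimal sketch of how such indicators can be computed, the following code assumes the standard textbook definitions of RMSE, MAE, and R² (coefficient of determination); the definitions given earlier in this paper take precedence if they differ. The function names and the four-point toy data are hypothetical and are not taken from the tobacco dataset.

```python
import math


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def mae(measured, predicted):
    """Mean absolute error between measured and predicted values."""
    n = len(measured)
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / n


def r2(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def mean_over_components(metric, measured_by_cc, predicted_by_cc):
    """Average one metric over all chemical compositions (dict keys)."""
    scores = [metric(measured_by_cc[cc], predicted_by_cc[cc]) for cc in measured_by_cc]
    return sum(scores) / len(scores)


# Toy values for a single component (illustration only, not real data).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(rmse(measured, predicted), mae(measured, predicted), r2(measured, predicted))
```

Lower RMSE and MAE and an R² closer to 1 indicate a better fit, which is the sense in which the models are ranked in Table 4.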
For TCCANN, the mean values of RMSE and MAE are 0.04134 and 0.02501, respectively. The RMSE and MAE values of all seven chemical compositions are lower than the corresponding ones obtained by the other four models, which indicates good overall performance and high accuracy. The R² values of all seven chemical compositions are greater than 0.99, higher than those of the other four models. As shown in Figure 8 and Table 4, TCCANN performs significantly better than the other four machine learning methods on the tobacco dataset. In the comparative experiments, the evaluation indicators show that the generalization ability and prediction accuracy of TCCANN are superior to those of the other four methods. Thus, TCCANN is a powerful solution to the problem of determining the chemical compositions of tobacco leaves.
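To put the reported mean errors on a common scale, the sketch below computes the relative error reduction of TCCANN with respect to each baseline. The mean RMSE and MAE values are those reported in the text; the percentages are derived here for illustration and are not stated in the original results, and the helper `relative_reduction` is a hypothetical name.

```python
# Mean RMSE and MAE over the seven chemical compositions, as reported in the text.
reported_means = {
    "PLS": {"RMSE": 0.08961, "MAE": 0.07857},
    "WT-SVM": {"RMSE": 0.08706, "MAE": 0.06764},
    "VAB-PLS": {"RMSE": 0.07191, "MAE": 0.05107},
    "LS-SVM": {"RMSE": 0.06507, "MAE": 0.05031},
    "TCCANN": {"RMSE": 0.04134, "MAE": 0.02501},
}


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return (baseline - proposed) / baseline * 100.0


reductions = {
    model: {
        metric: relative_reduction(values[metric], reported_means["TCCANN"][metric])
        for metric in ("RMSE", "MAE")
    }
    for model, values in reported_means.items()
    if model != "TCCANN"
}

for model, scores in reductions.items():
    print(f"{model}: RMSE -{scores['RMSE']:.1f}%, MAE -{scores['MAE']:.1f}%")
```

Against LS-SVM, the strongest baseline, this gives a reduction of about 36% in mean RMSE and about 50% in mean MAE.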

Conclusions
Near-infrared spectroscopy has become an important research topic in the determination of routine chemical compositions of tobacco. This paper proposes TCCANN to perform simultaneous quantitative analysis of multiple chemical compositions of tobacco by using NIR hyperspectroscopy imagery. TCCANN adopts a fully convolutional network in which max-pooling is replaced by a convolutional layer with a stride of two, which effectively avoids the loss of spectroscopy information in the feature extraction process. TCCANN uses ResNet to directly extract features of NIR spectroscopy data, and applies LSTM to the simultaneous quantitative analysis of multiple chemical compositions. Through its internal residual blocks, ResNet allows data in the deep network to follow an identity mapping between different network layers, so that advanced data features can be fully extracted. LSTM stores information in its internal gated units and uses these units to transmit information selectively. Thus, LSTM can comprehensively analyze the correlation between chemical compositions and spectra, and achieve the simultaneous quantitative analysis of multiple chemical compositions. This paper uses RMSE, R², and MAE as the evaluation indexes to assess the performance of the proposed network. The proposed model is compared with four other machine learning methods (PLS, WT-SVM, VAB-PLS, and LS-SVM), and it achieves better results than all four on these indicators. The results demonstrate the superiority of TCCANN and show that the deep learning framework can be applied to NIR spectra to achieve rapid and accurate analysis of routine chemical compositions of tobacco. However, the proposed TCCANN cannot yet determine the chemical compositions of tobacco with complete accuracy. In future work, a more effective analysis model of chemical compositions will be explored.
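Two of the architectural ideas summarized above, downsampling by a stride-two convolution instead of max-pooling and the identity mapping of a residual block, can be illustrated on a one-dimensional signal. The following is a toy sketch in plain Python under our own assumptions: the functions `conv1d` and `residual_block`, the eight-point "spectrum", and the smoothing kernel are hypothetical and do not reproduce the actual TCCANN layers, which operate on learned multi-channel feature maps.

```python
def conv1d(signal, kernel, stride=1):
    """Zero-padded 1-D cross-correlation, evaluated every `stride` points.

    Assumes an odd-length kernel so that each output aligns with an input point.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(x * k for x, k in zip(padded[start:start + len(kernel)], kernel))
        for start in range(0, len(signal), stride)
    ]


def residual_block(signal, kernel):
    """Return signal + F(signal), where F is a single convolution in this sketch."""
    transformed = conv1d(signal, kernel)
    return [x + f for x, f in zip(signal, transformed)]


# Hypothetical 8-point "spectrum" and smoothing kernel, for illustration only.
spectrum = [0.1, 0.4, 0.9, 0.4, 0.1, 0.0, 0.2, 0.3]
smooth = [0.25, 0.5, 0.25]

# A stride of two halves the length, as max-pooling would, but every output
# is a weighted sum of its neighbours rather than a single retained maximum.
downsampled = conv1d(spectrum, smooth, stride=2)

# When the residual branch F contributes nothing, the block is an identity map.
identity_out = residual_block(spectrum, [0.0, 0.0, 0.0])

print(len(spectrum), "->", len(downsampled))
print(identity_out == spectrum)
```

In a trained network the kernel weights are learned, so the stride-two layer can decide which spectral information to keep, and the skip connection lets a block fall back to the identity when additional depth does not help.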