Rapid Test for Adulteration of Fritillaria Thunbergii in Fritillaria Cirrhosa by Laser-Induced Breakdown Spectroscopy

Fritillaria has a long history in China, and it can be consumed as medicine and food. Owing to the high cost of Fritillaria cirrhosa, traders sometimes mix it with the cheaper Fritillaria thunbergii powder to make profit. Herein, we proposed a laser-induced breakdown spectroscopy (LIBS) technique to test the adulteration present in the sample of Fritillaria cirrhosa powder. Experimental samples with different adulteration levels were prepared, and their LIBS spectra were obtained. Partial least squares regression (PLSR) was adopted as the quantitative analysis model to compare the effects of four data standardization methods, namely, mean centring, normalization by total area, standard normal variable, and normalization by the maximum, on the performance of the PLSR model. Principal component analysis and least absolute shrinkage and selection operator (LASSO) were utilized for feature extraction and feature selection, and the performance of the PLSR model was determined based on its quantitative analysis. Subsequently, the optimal number of features was determined. The residuals were corrected using support vector regression (SVR). The mean absolute error and root mean square error of prediction obtained from the quantitative analysis results of the combined LASSO-PLSR-SVR model for the test set data were 5.0396% and 7.2491%, respectively, and the coefficient of determination R2 was 0.9983. The results showed that the LIBS technique can be adopted to test adulteration in the sample of Fritillaria cirrhosa powder and has potential applications in drug quality control.


Introduction
Fritillaria has high medicinal value because of its ability to moisten the lungs, relieve cough, reduce phlegm, and it can also be consumed as a food [1,2]. Owing to the difference in efficacy and supply and demand, the price of Fritillaria cirrhosa is several times higher than other Fritillaria varieties [3]. In the market, unscrupulous merchants sometimes mix cheaper Fritillaria powders with Fritillaria cirrhosa powder and attempt to sell them off as pure Fritillaria cirrhosa. Therefore, it is necessary to develop a rapid and convenient method to test the level of adulteration of other Fritillaria powders in Fritillaria cirrhosa powder samples [4]. In this paper, the adulteration of Fritillaria thunbergii powder in Fritillaria cirrhosa powder was tested.
Currently, these techniques, such as atomic absorption spectrometry (AAS), inductivelycoupled plasma-mass spectrometry (ICP-MS), and inductively-coupled plasma optical The experimental setup employed in this study was the same as that used in a previous study [3]. The adopted Fritillaria cirrhosa and Fritillaria thunbergii powder samples were purchased from Anhui Tong Huatang Chinese Herbal Beverage Technology Co. (Bozhou, China) and Sichuan Haorui Gallium Biotechnology Co. (Chengdu, China). In the experimental procedure, the powdered Fritillaria cirrhosa and Fritillaria thunbergii were first mixed into experimental samples with 21 different adulteration levels, as listed in Table 1. Subsequently, for each sample, the powdered samples were continuously and vigorously vibrated at 1000 rpm for 10 h using a multifunctional vortex mixer (VM-500Pro, Joanlab, Huzhou, China). Next, the powdered samples were retrieved, placed on weighing paper, and weighed with an electronic balance (BSA124S-CW, Sartorius, Shanghai, China); accordingly, a 0.5-g weight was obtained. The samples were then pressed for 5 min at a 20-MPa pressure using a tablet press (HY-12 Tianjin, Tianguang Optical Instruments Co., Ltd., Tianjin, China) to obtain disc shaped samples with a diameter and thickness of 13 and 2 mm, respectively, and each sample with different doping levels was pressed into two slices, as illustrated in Figure 1. Finally, the pressed samples were placed on a three-dimensional motorized translation table, and 100 LIBS spectra were obtained at different positions for each sample; accordingly, 200 spectra were obtained for two tablets of each sample, and then two spectra were averaged into one spectrum. Finally, 100 LIBS spectra were obtained for each experimental sample with a different doping level.

LIBS Experiments
The main LIBS setup of our experiments has been shown in the previous work [3]. A homemade Q-switched Nd: YAG laser (1064 nm, 30 mJ, 1 Hz, 10 ns) is used as the excitation source. The laser beam is reflected by three mirrors and then focused onto the sample surface with a NIR-corrected microscopic objective (Mitutoyo, 10X, working distance is 30.5 mm). The plasma radiation is collected and focused into a 600 μm fiber using convex lenses. Then, the LIBS spectra are analyzed by the fiber spectrometer (AvaSpec 2048-2-USB2, Avantes) with a range of 190~1100 nm and a resolution of 0.2~0.3 nm.

LIBS Spectra of the Samples
The typical LIBS spectra of 21 randomly selected experimental samples with different doping levels are presented in Figure S1. The Ca, Na, K, H, O, and N elements, including the CN and C2 molecular bands, were identified in the experimental samples and labelled in the LIBS spectra of Sample 1. From Figure S1, the intensities of H, O, N, CN, and C2 in the LIBS spectra of the experimental samples with different doping levels are independent of the doping amount of Fritillaria thunbergii, which is primarily influenced by air. To eliminate the interference of airborne components on the intensities of the LIBS spectra of the experimental samples, non-metallic elements were not used in subsequent analyses. The wavelength of 300-800 nm and intensities greater than 1000 counts of Ca (393.3 nm, 396.8 nm, and 422.6 nm), Na (588.9 nm and 589.5 nm), and K (766.4 nm and 769.8 nm) for three metallic elements are presented in Table 2 for the seven LIBS spectral lines.

Quantitative Analysis Modelling
In this study, the data point intensities of seven spectral lines for each spectrum of three metal elements were adopted as input variables (66 variables in total), 100 LIBS spectra from each of Samples 4,6,8,14,16,and 18 were used as the test set, and 100 LIBS spectra from each of the remaining 15 samples were utilized as the training set to develop the PLSR model.

Data Standardization
Data normalization can reduce the fluctuations between the LIBS spectra of Fritillaria and improve the performance of quantitative analysis models. The effects of the MC, NA, SNV, and NM methods on the performance of the quantitative models were discussed.

LIBS Experiments
The main LIBS setup of our experiments has been shown in the previous work [3]. A homemade Q-switched Nd: YAG laser (1064 nm, 30 mJ, 1 Hz, 10 ns) is used as the excitation source. The laser beam is reflected by three mirrors and then focused onto the sample surface with a NIR-corrected microscopic objective (Mitutoyo, 10X, working distance is 30.5 mm). The plasma radiation is collected and focused into a 600 µm fiber using convex lenses. Then, the LIBS spectra are analyzed by the fiber spectrometer (AvaSpec 2048-2-USB2, Avantes) with a range of 190~1100 nm and a resolution of 0.2~0.3 nm.

LIBS Spectra of the Samples
The typical LIBS spectra of 21 randomly selected experimental samples with different doping levels are presented in Figure S1. The Ca, Na, K, H, O, and N elements, including the CN and C 2 molecular bands, were identified in the experimental samples and labelled in the LIBS spectra of Sample 1. From Figure S1, the intensities of H, O, N, CN, and C 2 in the LIBS spectra of the experimental samples with different doping levels are independent of the doping amount of Fritillaria thunbergii, which is primarily influenced by air. To eliminate the interference of airborne components on the intensities of the LIBS spectra of the experimental samples, non-metallic elements were not used in subsequent analyses. The wavelength of 300-800 nm and intensities greater than 1000 counts of Ca (393.3 nm, 396.8 nm, and 422.6 nm), Na (588.9 nm and 589.5 nm), and K (766.4 nm and 769.8 nm) for three metallic elements are presented in Table 2 for the seven LIBS spectral lines.

Quantitative Analysis Modelling
In this study, the data point intensities of seven spectral lines for each spectrum of three metal elements were adopted as input variables (66 variables in total), 100 LIBS spectra from each of Samples 4,6,8,14,16, and 18 were used as the test set, and 100 LIBS spectra from each of the remaining 15 samples were utilized as the training set to develop the PLSR model.

Data Standardization
Data normalization can reduce the fluctuations between the LIBS spectra of Fritillaria and improve the performance of quantitative analysis models. The effects of the MC, NA, SNV, and NM methods on the performance of the quantitative models were discussed. The mean absolute error (MAE) and RMSEP of the test set obtained using the PLSR model under the four data normalization methods are listed in Table 3.  10.8970%, and 10.8760%, respectively. Among the four data normalization methods, the MAE and RMSEP values of the PLSR model's test set were the lowest by pre-processing the data using the NM method. The relationship between the predicted and standard values of the test set obtained by the PLSR quantitative analysis method under NM data normalization is illustrated in Figure 2; evidently, the MAE and RMSEP values of the test set are 8.6111% and 10.8760%, respectively, and the coefficient of determination, R 2 , of the test set's fitted straight line is 0.9651. The error bars in Figure 3 represents the standard deviation (SD) of the predicted value based on measurements. We calculated the SD value of Equation (1) in all the figures of this work.
where the x represent the average value of sample x i , i is the integer index, and n is the number of samples. The mean absolute error (MAE) and RMSEP of the test set obtained using the PLSR model under the four data normalization methods are listed in Table 3.  10.8970%, and 10.8760%, respectively. Among the four data normalization methods, the MAE and RMSEP values of the PLSR model's test set were the lowest by pre-processing the data using the NM method. The relationship between the predicted and standard values of the test set obtained by the PLSR quantitative analysis method under NM data normalization is illustrated in Figure 2; evidently, the MAE and RMSEP values of the test set are 8.6111% and 10.8760%, respectively, and the coefficient of determination, R 2 , of the test set's fitted straight line is 0.9651. The error bars in Figure 3 represents the standard deviation (SD) of the predicted value based on measurements. We calculated the SD value of Equation (1) in all the figures of this work.
where the x represent the average value of sample i x , i is the integer index, and n is the number of samples.   more refined model by constructing a penalty function, so that it compresses some coefficients and sets some coefficients to zero. Therefore, Lasso retains the advantages of subset contraction, and it is an estimation of processing with compound linear data. After normalizing the data via the NM method, the feature variables were extracted from the LIBS spectral data using the PCA method. Figure 3 presents the variance ratio of each principal component (PC) and the cumulative variance ratio obtained after the feature extraction using the PCA method. As illustrated in Figure 3, the percentage of variance of each individual PC gradually decreased, and the percentage of cumulative variance gradually increased as the number PC increased. The variance ratios of the first three PCs were 89.77%, 7.53%, and 1.17%. The first PC contained most of the original information, and the cumulative variance ratio of the first three PCs reached 98.47%.
To investigate the effect of the number of PCs on the performance of the PLSR quantitative analysis model, the PC scores were adopted as the inputs to the PLSR model for modelling, and the relationship between the RMSEP of the training set and the number of PCs is illustrated in Figure 4.

Feature Variable Selection
To reduce the amount of data input, discard noise and unimportant features, and to achieve the objective of improving the speed of data processing, it is necessary to select feature variables. Feature variable selection includes two types of feature extraction and feature selection. Additionally, in this study, we adopted PCA and LASSO for feature extraction and feature selection, respectively. LASSO is a compressed estimate. It obtains a more refined model by constructing a penalty function, so that it compresses some coefficients and sets some coefficients to zero. Therefore, Lasso retains the advantages of subset contraction, and it is an estimation of processing with compound linear data.
After normalizing the data via the NM method, the feature variables were extracted from the LIBS spectral data using the PCA method. Figure 3 presents the variance ratio of each principal component (PC) and the cumulative variance ratio obtained after the feature extraction using the PCA method.
As illustrated in Figure 3, the percentage of variance of each individual PC gradually decreased, and the percentage of cumulative variance gradually increased as the number PC increased. The variance ratios of the first three PCs were 89.77%, 7.53%, and 1.17%. The first PC contained most of the original information, and the cumulative variance ratio of the first three PCs reached 98.47%.
To investigate the effect of the number of PCs on the performance of the PLSR quantitative analysis model, the PC scores were adopted as the inputs to the PLSR model for modelling, and the relationship between the RMSEP of the training set and the number of PCs is illustrated in Figure 4.
From Figure 4, the RMSEP of the training set fluctuates with the increase in PC number, exhibiting a general decreasing trend. When the number of PC is 64, the lowest RMSEP value is obtained via the PLSR quantitative analysis model. Therefore, the first 64 PCs are chosen for the test set data analysis. The relationship between the predicted and standard values of the test set obtained by using the PLSR quantitative analysis method for the first 64 PCs as input data is illustrated in Figure 5. From Figure 5, the MAE and RMSEP values of the test set are 7.1585% and 9.1709%, respectively, and the coefficient of determination, R 2 , of the test set's fitted straight line is 0.9920.
To investigate the effect on the performance of the PLSR model after adopting LASSO feature selection for quantitative analysis, the LIBS spectral data of the samples were first standardized using the NM method; subsequently, the importance of the data after standardizing the LIBS spectra was evaluated using the LASSO method, and the importance weight values of 66 variables were obtained. Finally, the importance weight values of the 66 variables were normalized, and the obtained normalized weight values based on the training set data are presented in Figure 6.  From Figure 4, the RMSEP of the training set fluctuates with the increase in PC number, exhibiting a general decreasing trend. When the number of PC is 64, the lowest RMSEP value is obtained via the PLSR quantitative analysis model. Therefore, the first 64 PCs are chosen for the test set data analysis. The relationship between the predicted and standard values of the test set obtained by using the PLSR quantitative analysis method for the first 64 PCs as input data is illustrated in Figure 5. From Figure 5, the MAE and RMSEP values of the test set are 7.1585% and 9.1709%, respectively, and the coefficient of determination, R 2 , of the test set's fitted straight line is 0.9920. To investigate the effect on the performance of the PLSR model after adopting LASSO feature selection for quantitative analysis, the LIBS spectral data of the samples were first standardized using the NM method; subsequently, the importance of the data after standardizing the LIBS spectra was evaluated using the LASSO method, and the importance weight values of 66 variables were obtained. Finally, the importance weight values of the 66 variables were normalized, and the obtained normalized weight values based on the training set data are presented in Figure 6.   Figure 4, the RMSEP of the training set fluctuates with the increase in PC number, exhibiting a general decreasing trend. When the number of PC is 64, the lowest RMSEP value is obtained via the PLSR quantitative analysis model. Therefore, the first 64 PCs are chosen for the test set data analysis. The relationship between the predicted and standard values of the test set obtained by using the PLSR quantitative analysis method for the first 64 PCs as input data is illustrated in Figure 5. From Figure 5, the MAE and RMSEP values of the test set are 7.1585% and 9.1709%, respectively, and the coefficient of determination, R 2 , of the test set's fitted straight line is 0.9920. To investigate the effect on the performance of the PLSR model after adopting LASSO feature selection for quantitative analysis, the LIBS spectral data of the samples were first standardized using the NM method; subsequently, the importance of the data after standardizing the LIBS spectra was evaluated using the LASSO method, and the importance weight values of 66 variables were obtained. Finally, the importance weight values of the 66 variables were normalized, and the obtained normalized weight values based on the training set data are presented in Figure 6.      Figure 6 shows the normalized weight values of 66 variables obtained for the importance assessment of the LIBS spectral lines using the LASSO method. Among the 66 spectral features, nine of them have normalized weight values of 0, thus indicating that these nine features do not play a role in the quantitative analysis. Meanwhile, the elements corresponding to the top 57 wavelengths in importance ranking, and their importance weights are listed in Table 4. To investigate the effect of the number of features on the performance of the PLSR quantitative analysis model, the 57 spectral features with non-zero normalized weight values were ranked according to the normalized weight values from the largest to the smallest. The different numbers of spectral features, corresponding to the top 1-57 normalized weight values, were adopted as inputs to the PLSR model, respectively, and the relationship between the RMSEP of training set and the number of spectral features is illustrated in Figure 7.
As illustrated in Figure 7, the RMSEP of the training set fluctuates with the increase in the number of spectral features, exhibiting a general decreasing trend. The RMSEP value obtained using the PLSR quantitative analysis model is lowest when the number of input spectral features is 57. Then, these 57 features are used in the test data analysis. The relationship between the predicted and standard values of the test set using the PLSR quantitative analysis model for the first 57 spectral features as input data is illustrated in Figure 8. From Figure 8, it can be seen that the MAE and RMSEP values of the test set are 7.1038% and 9.1523%, respectively, and the coefficient of determination, R 2 , of the test set's fitted straight line is 0.9728.
To investigate the effect of the number of features on the performance of the PLSR quantitative analysis model, the 57 spectral features with non-zero normalized weight values were ranked according to the normalized weight values from the largest to the smallest. The different numbers of spectral features, corresponding to the top 1-57 normalized weight values, were adopted as inputs to the PLSR model, respectively, and the relationship between the RMSEP of training set and the number of spectral features is illustrated in Figure 7.  Figure 7, the RMSEP of the training set fluctuates with the increase in the number of spectral features, exhibiting a general decreasing trend. The RMSEP value obtained using the PLSR quantitative analysis model is lowest when the number of input spectral features is 57. Then, these 57 features are used in the test data analysis. The relationship between the predicted and standard values of the test set using the PLSR quantitative analysis model for the first 57 spectral features as input data is illustrated in Figure 8. From Figure 8, it can be seen that the MAE and RMSEP values of the test set are 7.1038% and 9.1523%, respectively, and the coefficient of determination, R 2 , of the test set's fitted straight line is 0.9728.  Compared with the PCA-PLSR model, the MAE and RMSEP of the predicted values of the LASSO-PLSR model's test set were relatively better, decreasing from 7.1585% to 7.1038% and from 9.1709% to 9.1523%, respectively.

Residual Correction
The residual values were obtained by subtracting the predicted values of the LASSO-PLSR model from the standard values and were corrected using SVR. The SVR is a common kind of traditional machine learning method, which has extraordinary performance in the small sample training. In SVR, the straight line required for fitting data becomes a hyperplane. The SVR creates an "interval band" on both sides of the linear function, which does not calculate the loss for all samples falling into the interval zone. The samples outside the interval zone are recognized as support vectors. That is, only the support vector will affect its function model. Finally, the optimized model is obtained by minimizing total loss and maximum interval.
The relationship between the mean value of the residuals of the 100 spectra of each modelled sample and the mean value of the intensity of the most important spectral line (Na Ⅰ 589.6 nm) is illustrated in Figure 9. Compared with the PCA-PLSR model, the MAE and RMSEP of the predicted values of the LASSO-PLSR model's test set were relatively better, decreasing from 7.1585% to 7.1038% and from 9.1709% to 9.1523%, respectively.

Residual Correction
The residual values were obtained by subtracting the predicted values of the LASSO-PLSR model from the standard values and were corrected using SVR. The SVR is a common kind of traditional machine learning method, which has extraordinary performance in the small sample training. In SVR, the straight line required for fitting data becomes a hyperplane. The SVR creates an "interval band" on both sides of the linear function, which does not calculate the loss for all samples falling into the interval zone. The samples outside the interval zone are recognized as support vectors. That is, only the support vector will affect its function model. Finally, the optimized model is obtained by minimizing total loss and maximum interval.
The relationship between the mean value of the residuals of the 100 spectra of each modelled sample and the mean value of the intensity of the most important spectral line (Na I 589.6 nm) is illustrated in Figure 9.
does not calculate the loss for all samples falling into the interval zone. The samples outside the interval zone are recognized as support vectors. That is, only the support vector will affect its function model. Finally, the optimized model is obtained by minimizing total loss and maximum interval.
The relationship between the mean value of the residuals of the 100 spectra of each modelled sample and the mean value of the intensity of the most important spectral line (Na Ⅰ 589.6 nm) is illustrated in Figure 9.  As illustrated in Figure 9, the data points comprising the average of the residuals of 100 spectra of each modelled sample and the average of the intensity of the most important spectral line (Na I 589.6 nm) are randomly distributed and exhibit a nonlinear relationship overall. To further improve the accuracy of the quantitative analysis model, the most prevalent nonlinear kernel function in SVR (radial basis kernel function) was adopted to develop a SVR-based residual correction model with optimal parameters (c = 76.7312, g = 9.3966), and the predicted value of the PLSR model plus the predicted value of the SVR correction model was considered as the final predicted value. The relationship between the predicted and standard values of the test set, based on the LASSO-PLSR-SVR residual correction model, is illustrated in Figure 10; from this figure, it can be seen that the MAE and RMSEP values of the test set are 5.0396% and 7.2491%, respectively, and the coefficient of determination, R 2 , of the test set's fitted straight line is 0.9983. As illustrated in Figure 9, the data points comprising the average of the residuals of 100 spectra of each modelled sample and the average of the intensity of the most important spectral line (Na Ⅰ 589.6 nm) are randomly distributed and exhibit a nonlinear relationship overall. To further improve the accuracy of the quantitative analysis model, the most prevalent nonlinear kernel function in SVR (radial basis kernel function) was adopted to develop a SVR-based residual correction model with optimal parameters (c = 76.7312, g = 9.3966), and the predicted value of the PLSR model plus the predicted value of the SVR correction model was considered as the final predicted value. The relationship between the predicted and standard values of the test set, based on the LASSO-PLSR-SVR residual correction model, is illustrated in Figure 10; from this figure, it can be seen that the MAE and RMSEP values of the test set are 5.0396% and 7.2491%, respectively, and the coefficient of determination, R 2 , of the test set's fitted straight line is 0.9983. From Figures 8 and 10, it can be seen that the performance of the LASSO-PLSR-SVR residual correction-based model improved better than the LASSO-PLSR-based model in the quantitative analysis of the Fritillaria LIBS spectral data. The MAE value of the test set decreased from 7.1038% to 5.0396%, the RMSEP value decreased from 9.1523% to 7.2491%, and the coefficient of determination R 2 of the test set's fitted straight line increased from 0.9728 to 0.9983. The obtained results indicate that the residual correction model can compensate for the limitations of the conventional PLSR model in elucidating nonlinear characteristic variables and ultimately improve the prediction accuracy of the quantitative analysis model. From Figures 8 and 10, it can be seen that the performance of the LASSO-PLSR-SVR residual correction-based model improved better than the LASSO-PLSR-based model in the quantitative analysis of the Fritillaria LIBS spectral data. The MAE value of the test set decreased from 7.1038% to 5.0396%, the RMSEP value decreased from 9.1523% to 7.2491%, and the coefficient of determination R 2 of the test set's fitted straight line increased from 0.9728 to 0.9983. The obtained results indicate that the residual correction model can compensate for the limitations of the conventional PLSR model in elucidating nonlinear characteristic variables and ultimately improve the prediction accuracy of the quantitative analysis model.

Conclusions
In this study, we proposed the adoption of the LIBS technique, combined with machine learning methods in the rapid test of adulteration levels in Fritillaria cirrhosa powder. Using PLSR as a quantitative analysis model, the effects of four data normalization methods, MC, NA, SNV, and NM, on the quantitative analysis performance of the PLSR model, were compared, among which the NM data normalization method exhibited the best performance. Accordingly, the effects of feature extraction and feature selection on the quantitative analysis performance of the PLSR model using PCA and LASSO were compared and analysed, and the performance of the LASSO-PLSR-based model was to be relatively better. A correction model for the residuals was further developed using SVR to compensate for the limitations of the conventional PLSR model in elucidating the nonlinear feature variables. The final LASSO-PLSR-SVR-based residual correction model improved the accuracy of the quantitative analysis relative to the LASSO-PLSR model, and the MAE of the test set decreased from 7.1038% to 5.0396%, the RMSEP decreased from 9.1523% to 7.2491%, and the coefficient of determination R 2 of the test set's fitted straight line increased from 0.9782 to 0.9983. The experimental results demonstrated the effectiveness of adopting the LIBS technique combined with machine learning in the rapid test of adulteration levels in Fritillaria cirrhosa powder. Furthermore, this study has important application value for drug testing, regulation, and quality control.