Rapid Determination of Saponins in the Honey-Fried Processing of Rhizoma Cimicifugae by Near Infrared Diffuse Reflectance Spectroscopy

Objective: A Near Infrared Diffuse Reflectance Spectroscopy (NIR-DRS) model was established for the first time to determine the content of Shengmaxinside I in honey-fried Rhizoma Cimicifugae. Methods: Shengmaxinside I content was determined by high-performance liquid chromatography (HPLC), and NIR-DRS data for honey-fried Rhizoma Cimicifugae samples from different batches and different origins were collected and processed with TQ Analyst 8.0. Partial Least Squares (PLS) analysis was used to establish a near-infrared quantitative model. Results: The determination coefficient R2 was 0.9878, and the Cross-Validation Root Mean Square Error (RMSECV) was 0.0193%. When the model was validated with a validation set, the Root Mean Square Error of Prediction (RMSEP) was 0.1064%. The ratio of the standard deviation for the validation samples to the standard error of prediction (RPD) was 5.5130. Conclusion: This method is convenient and efficient, the experimentally established model has good prediction ability, and it can be used for the rapid determination of Shengmaxinside I content in honey-fried Rhizoma Cimicifugae.


Introduction
Rhizoma Cimicifugae is primarily derived from Cimicifuga heracleifolia Kom., Cimicifuga dahurica (Turcz.) Maxim., or Cimicifuga foetida L. [1]. It is a kind of cool and deconstructive drug commonly used in traditional Chinese medicine. It was first recorded in "Sheng Nong's herbal classic", where it appeared as a "top grade product" drug. It has the effects of clearing rash and heat, detoxifying, and lifting yang [1]. Cimicifugae is suited to growing in mountains, forests, and roadside grasslands about 2000 m above sea level, and its main producing areas are in the three northeastern provinces of China [2]. Cimicifuga simplex Wormsk. is a perennial herb suited to warm, humid climates and slightly acidic humus. In the cultivation of Cimicifugae, it is necessary to keep the soil from drying out.
Rhizoma Cimicifugae samples of different batches were purchased from different production areas around China. A total of 150 batches were collected from Bozhou (Anhui), Datong (Shanxi), Gansu, Yunnan, Shanxi, Sichuan, Henan, Northeast China, and the other places shown in Table 1. The reference standard Shengmaxinside I (16,17-didehydro-24S-O-acetyl hydroshengmanol-3-O-β-D-galactopyranoside) was prepared in the laboratory, and its purity reached 95%. The structure is shown in Figure 1. The 1H-NMR spectrum shows cyclopropane signals at δH 0.16 (1H, d, J = 3.8 Hz) and δH 0.59 (1H, d, J = 3.8 Hz), and a methylene proton signal can be seen in the high-field region. J = 6.8 Hz, 6 methyl proton signals, 4 oxo-methyl proton signals, 1 acetyl signal, and 1 sugar anomeric proton signal suggest that the backbone is a 9,19-Cyclopentane triterpenoid saponin. In the 13C-NMR spectrum, the carbon signals at δC 121.4 and 151.0 show that the double bond is located at the C-16 and C-17 positions, that is, the D ring of 25 is dehydrated at the C-16 and C-17 positions to form a double bond, which is a hydroshengmanol-type ring of pineapple beesin triterpene saponins.


Preparation of the Test Solution
We precisely weighed 1.000 g of Rhizoma Cimicifugae into a round-bottom flask, added 40 mL of 70% ethanol, extracted it under heated reflux for 2 h, and filtered the extract. Then we added another 40 mL of 70% ethanol and repeated the above operation two times. The filtrates were combined and evaporated. The resulting residue was dissolved in 50 mL of 70% ethanol and filtered through a 0.22 µm microporous membrane to obtain the test solution.

Investigation of Linear Relationship
Aliquots of 4.0, 6.0, 8.0, 10.0, 12.0, and 14.0 µL of the Shengmaxinside I standard solution were accurately measured, 20 µL of sample was injected, and the peak area (Y) was plotted against the concentration of the reference solution (X). The standard curve of Shengmaxinside I was obtained as y = 1.5609x + 5.1509 (r = 0.999), showing a good linear relationship between concentration and peak area in the range of 0.0306 mg to 0.1071 mg.
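The standard curve above can be sketched as a least-squares line that is then inverted to convert a measured peak area into an amount. This is a minimal illustration only: the function names are hypothetical, the demo points are generated from the reported equation rather than taken from measurements, and the units of x are whatever units the reported curve uses.

```python
import numpy as np

# Calibration line reported in the text: y = 1.5609x + 5.1509
SLOPE, INTERCEPT = 1.5609, 5.1509

def fit_line(x, y):
    """Least-squares line through calibration points; returns slope, intercept, r."""
    slope, intercept = np.polyfit(x, y, 1)
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r

def amount_from_area(area, slope=SLOPE, intercept=INTERCEPT):
    """Invert the calibration line to recover x from a measured peak area."""
    return (area - intercept) / slope

# Placeholder x values with responses generated from the reported line
# (synthetic points for illustration, NOT measured data).
x_demo = np.array([4.0, 6.0, 8.0, 10.0, 12.0, 14.0])
y_demo = SLOPE * x_demo + INTERCEPT
slope, intercept, r = fit_line(x_demo, y_demo)
```

With real calibration data, `fit_line` would be applied to the measured peak areas, and `amount_from_area` would only be trusted inside the validated linear range.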

Precision Experiment
Twenty µL of the reference solution was precisely pipetted and continuously injected 5 times according to the chromatographic conditions above. The RSD value of Shengmaxinside I was 0.91%, calculated from the peak area, indicating that the precision of the instrument was good.
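The RSD values quoted in the precision, stability, and repeatability experiments follow the usual definition (sample standard deviation divided by the mean, in percent). A minimal sketch, with a hypothetical function name and made-up demo values:

```python
import numpy as np

def rsd_percent(values):
    """Relative standard deviation in percent, using the sample SD (ddof=1)."""
    values = np.asarray(values, dtype=float)
    return 100.0 * values.std(ddof=1) / values.mean()

# Illustrative peak areas (not the measured ones)
demo_rsd = rsd_percent([9.0, 10.0, 11.0])
```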

Stability Experiment
For the same test solution, the peak areas were measured at 0, 4, 8, 12, and 24 h, respectively, according to the chromatographic conditions above, and the RSD was 1.79%, indicating that the test solution was stable within 24 h.

Repeatability Experiment
Five samples from the same batch were accurately weighed and prepared according to the method for the test solution. Under the chromatographic conditions above, the average content was determined to be 0.425%, and the RSD was 0.89%, indicating that the method had good repeatability.

Sample Recovery Experiment
We precisely pipetted the sample solution, added Shengmaxinside I standard solution at high, middle, and low concentration levels, and then injected 20 µL of each sample to determine the recovery of the corresponding component under the above chromatographic conditions. The average recovery rate was 99.14% and the RSD was 1.75%.
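Spike recovery is conventionally computed as the amount found in the spiked sample minus the amount originally present, divided by the amount added. The text does not spell out its formula, so the sketch below assumes this conventional definition; the function name and demo numbers are hypothetical.

```python
def recovery_percent(found, original, added):
    """Percent recovery of a spike: (found - original) / added * 100."""
    if added <= 0:
        raise ValueError("added amount must be positive")
    return 100.0 * (found - original) / added

# Illustrative amounts in arbitrary but consistent units (not measured data)
demo_recovery = recovery_percent(found=1.98, original=1.00, added=1.00)
```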

Spectral Acquisition
The spectra were scanned over the range 12,000-4000 cm−1 with a resolution of 0.5 cm−1. Two grams of the medicinal powder was placed in a quartz sample cup. Each of the 150 batches was reloaded and scanned 3 times, and the average was taken. The detected spectra were superimposed (150 in all), as shown in Figure 2.
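The averaging of the three reloaded scans per batch can be sketched as below. This assumes the scans are stored as rows of an array, ordered batch by batch (three consecutive rows per batch); that layout and the function name are assumptions for illustration, not details from the text.

```python
import numpy as np

def average_replicates(scans, n_batches, n_reps=3):
    """Average consecutive replicate scans so each batch yields one spectrum.

    scans: array of shape (n_batches * n_reps, n_points), ordered batch by batch.
    """
    scans = np.asarray(scans, dtype=float)
    if scans.shape[0] != n_batches * n_reps:
        raise ValueError("expected n_batches * n_reps rows")
    return scans.reshape(n_batches, n_reps, -1).mean(axis=1)

# Tiny illustrative input: 2 batches x 3 scans x 2 spectral points
demo_scans = np.arange(12, dtype=float).reshape(6, 2)
demo_mean = average_replicates(demo_scans, n_batches=2)
```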

Near-Infrared Model Evaluation Index and Establishment of Modeling Methods
The quantitative NIR-DRS analysis model was established with the TQ Analyst 8.0 data processing software. Multiple linear regression (MLR), stepwise multiple linear regression (SMLR), principal component regression (PCR), partial least squares (PLS), and other methods can be used to establish the model. In the past, MLR was used for calibration. However, PCR and PLS have become widely used because the new generation of NIR spectrometers can collect spectra across all NIR bands. For the determination of complex sample systems, both PCR and PLS can be applied. Compared with MLR, PCR is slower and its model is less intuitive to interpret; its most important advantages are that it can identify the main factors that affect the system and can solve the problems of collinearity and of the number of variables in linear regression analysis. Relative to PCR, PLS establishes a more robust model with the widest range of applications, and the resulting eigenvectors are directly related to sample properties [20]. Therefore, PLS was chosen as the modeling method after comprehensive comparison.
Once a quantitative analysis model is established, its performance has to be evaluated. In NIR-DRS analysis, two indicators are commonly used to assess quantitative model results. One is the Cross-Validation Root Mean Square Error (RMSECV), and the other is the determination coefficient (R2). The closer the R2 value is to 1, the better the correlation between the value predicted by the model and the measured value of the sample; the smaller the RMSECV, the more stable the model performance and the higher the accuracy [21].
Verification samples are used to verify the NIR-DRS quantitative model, with the Root Mean Square Error of Prediction (RMSEP) as the evaluation index. The RPD, the ratio of the standard deviation of the validation samples to the standard error of prediction, is a further evaluation index.
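The evaluation indices described above can be written out as a short sketch. The same root-mean-square error function gives RMSECV when fed cross-validated predictions and RMSEP when fed validation-set predictions. The function names are hypothetical, and the bias-corrected form of the standard error of prediction used for the RPD is an assumption, since the text does not state which variant was computed.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error (RMSECV or RMSEP, depending on the predictions)."""
    err = np.asarray(y_true, float) - np.asarray(y_pred, float)
    return float(np.sqrt(np.mean(err ** 2)))

def r_squared(y_true, y_pred):
    """Determination coefficient: 1 - SS_residual / SS_total."""
    y_true = np.asarray(y_true, float)
    err = y_true - np.asarray(y_pred, float)
    return float(1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2))

def rpd(y_true, y_pred):
    """SD of the reference values divided by the standard error of prediction.

    Assumption: SEP is the bias-corrected standard deviation of the residuals.
    """
    y_true = np.asarray(y_true, float)
    err = y_true - np.asarray(y_pred, float)
    sep = np.sqrt(np.sum((err - err.mean()) ** 2) / (len(err) - 1))
    return float(y_true.std(ddof=1) / sep)

# Illustrative values only (not data from the study)
y_ref = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.1, 3.9]
demo = (rmse(y_ref, y_hat), r_squared(y_ref, y_hat), rpd(y_ref, y_hat))
```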

Correction Set and Verification Set Division
One hundred and fifty spectra were obtained with the Antaris II Fourier Transform Near Infrared Spectrometer. The TQ Analyst 8.0 data processing software randomly selected 120 representative spectra as the calibration set and 30 as the validation set. The content range of the calibration set is 0.12%-0.52% (w/w), and that of the validation set is 0.14%-0.50% (w/w). Because the content of Shengmaxinside I in the validation set lies within the content range of the calibration set, this calibration set and validation set can be used for modeling.
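The 120/30 division and the range check can be illustrated as follows. This is a plain random split, not the selection procedure of TQ Analyst 8.0 (which the text does not describe); the function names and the seed are hypothetical.

```python
import numpy as np

def split_calibration_validation(n_samples, n_validation, seed=0):
    """Randomly split sample indices into calibration and validation sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    return np.sort(order[n_validation:]), np.sort(order[:n_validation])

def validation_within_calibration(y, cal_idx, val_idx):
    """True if the validation contents lie inside the calibration content range."""
    y = np.asarray(y, dtype=float)
    cal, val = y[cal_idx], y[val_idx]
    return bool(cal.min() <= val.min() and val.max() <= cal.max())

cal_idx, val_idx = split_calibration_validation(150, 30)
```

If the range check fails for a given split, the split would be redrawn or the extreme samples moved into the calibration set, so that the model is never asked to extrapolate.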


Content of Shengmaxinside I in Rhizoma Cimicifugae Extract
According to the above method, the content of Shengmaxinside I in the Rhizoma Cimicifugae samples was determined. The chromatogram is shown in Figure 3 (A, reference substance; B, sample; peak 1, Shengmaxinside I). The content of Shengmaxinside I in Rhizoma Cimicifugae is calculated on a dry-weight basis. Each batch of samples was measured in parallel and averaged. The results showed that the content in the 150 samples ranged from 0.12% to 0.52% (w/w), as shown in Table 2.

Investigation of Near-Infrared Method
We weighed 2 g of Cimicifugae powder and collected the spectrum 9 times under the same spectral conditions to calculate the precision. The RSD was 1.09%.
Figure 2 gives the following information. There are few characteristic absorption peaks in the wavenumber band of 8500-12,000 cm−1. The 4000-4200 cm−1 band, dominated by fiber absorption, is not suitable for modeling because it contains more noise. There are distinct characteristic absorption peaks within 4500-8500 cm−1, and therefore the modeling range was selected within this band.
In the raw NIR spectra (Figure 2) of the 150 samples at wavenumbers ranging from 4000 to 10,000 cm−1, several characteristic absorption peaks can be seen. For example, 4250 cm−1 is the C-H stretch/C-H deformation, 4357 cm−1 is the stretching and bending combination of -CH2, 4762 cm−1 reflects the stretching vibration of C-C and C=C bonds, 5168 cm−1 is due to the second overtone of the C=O stretching bands of acetyl and possibly also the stretching and deformation of O-H bonds, 5776 cm−1 results from the first overtone of C-H stretching, and 6848 cm−1 is the first overtone of O-H stretching. In addition, the second overtone of C-H stretching arises around 8248 cm−1 [22][23][24][25]. These signals could reflect the chemical information of Shengmaxinside I.

Selection of Spectral Pretreatment Methods
During spectral collection, the sample's grain size and color and the instrument response introduce factors into the raw near-infrared spectra that are unrelated to the property being measured, resulting in interference such as spectral shift or drift. Thus, it is necessary to carry out pretreatment [2]. Using RMSECV, RMSEP, and R2 as indicators, we examined the raw spectra, first derivative (FD), second derivative (SD), multiplicative scatter correction (MSC), standard normal variate (SNV), classic Savitzky-Golay (SG) filtering, Norris derivative filtering, and other preprocessing methods. As shown in Table 3, the pretreatment method selected for the Shengmaxinside I model was MSC + SD + SG.
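Two of the scatter corrections named above, MSC and SNV, can be sketched with NumPy alone. This is an illustration of the general techniques, not the TQ Analyst implementation; the function names are hypothetical, and MSC here uses the mean spectrum as the reference unless one is supplied. The second-derivative and Savitzky-Golay steps are not shown; in Python they are commonly done with `scipy.signal.savgol_filter` using its `deriv` argument.

```python
import numpy as np

def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum.

    Each spectrum is regressed on the reference (x = a + b * ref) and
    corrected as (x - a) / b. The default reference is the mean spectrum.
    """
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, float)
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        slope, intercept = np.polyfit(ref, row, 1)
        corrected[i] = (row - intercept) / slope
    return corrected

def snv(spectra):
    """Standard normal variate: center and scale each spectrum individually."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

# Synthetic spectra: one base shape with different offsets and gains
base = np.sin(np.linspace(0.0, 3.0, 50))
raw = np.array([0.1 + 1.0 * base, 0.3 + 1.5 * base, -0.2 + 0.8 * base])
msc_out, snv_out = msc(raw), snv(raw)
```

In this synthetic case the spectra differ only by offset and gain, so MSC collapses them onto the mean spectrum; real spectra retain their chemical differences after correction.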

Spectral Range Selection
The content of a component in the samples is obtained from the absorption of its hydrogen-containing groups. However, different spectral ranges contain different information, so a more accurate quantitative model can be obtained by selecting an appropriate spectral interval [26]. In this research, the spectrum of the Shengmaxinside I reference standard was compared over multiple ranges, and R2, RMSECV, RPD, and RMSEP were used together as indicators. As shown in Table 4, the intervals finally used for modeling are 5200-6700 cm−1 and 7700-8800 cm−1. There are few characteristic absorption peaks in the wavenumber band of 8500-12,000 cm−1. When establishing a near-infrared model, it is particularly important to determine the best number of principal factors. If the factors are too few, much useful information in the original spectrum is lost and the model is underfitted, which reduces its prediction accuracy; if they are too many, measurement noise is modeled as well, and the resulting overfitting also reduces the predictive ability of the model [23]. The number of PLS factors can be determined from the Prediction Residual Sum of Squares (PRESS) and RMSECV. Here, the number of PLS factors was determined to be 6.
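Choosing the number of PLS factors from PRESS can be illustrated with a small NumPy-only sketch. It uses a textbook PLS1 (NIPALS) fit and k-fold cross-validation on synthetic data; it is not the TQ Analyst procedure, all function names are hypothetical, and the demo does not reproduce the six factors reported in the text.

```python
import numpy as np

def pls1_fit(X, y, n_factors):
    """Fit PLS1 by NIPALS; returns (x_mean, y_mean, regression coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_factors):
        w = Xr.T @ yr                 # weight vector: covariance direction
        w /= np.linalg.norm(w)
        t = Xr @ w                    # scores
        tt = t @ t
        p = Xr.T @ t / tt             # X loadings
        W.append(w); P.append(p); q.append(yr @ t / tt)
        Xr = Xr - np.outer(t, p)      # deflate X and y
        yr = yr - q[-1] * t
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, coef

def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean

def press_by_factor(X, y, max_factors, n_folds=5, seed=0):
    """Cross-validated PRESS for 1..max_factors PLS factors."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    press = np.zeros(max_factors)
    for k in range(1, max_factors + 1):
        for held_out in folds:
            train = np.setdiff1d(np.arange(len(y)), held_out)
            model = pls1_fit(X[train], y[train], k)
            resid = y[held_out] - pls1_predict(model, X[held_out])
            press[k - 1] += resid @ resid
    return press

# Synthetic data with three latent factors (NOT the study's spectra)
rng = np.random.default_rng(1)
T = rng.normal(size=(60, 3))
L = np.diag([5.0, 1.0, 0.2]) @ rng.normal(size=(3, 40))
X = T @ L + 0.01 * rng.normal(size=(60, 40))
y = T @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=60)

press = press_by_factor(X, y, max_factors=6)
rmsecv = np.sqrt(press / len(y))      # RMSECV per factor count
best = int(np.argmin(press)) + 1      # factor count with minimum PRESS
```

Taking the minimum of PRESS is the simplest rule; in practice a smaller factor count is often preferred when its PRESS is not meaningfully worse, which guards against the overfitting described above.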

Verification of the Model
The model was established by the PLS method with the TQ Analyst 8.0 software, with spectral preprocessing by MSC + SD + SG. The spectral range was 5200-6700 cm−1 and 7700-8800 cm−1, and the number of factors was 6. The determination coefficient of the Shengmaxinside I model was 0.9878, the cross-validation root mean square error (RMSECV) was 0.0193%, and the RPD was 5.5130, as shown in Table 5.
The experimental samples were selected from 500 samples by the WinISI 4.3 software to obtain 150 samples that obey the "boxcar" distribution, and the content of each sample is shown in Table 2. The correlation between the predicted content and the authentic content is shown in Figure 4.
Figure 4. Correlation between predicted content and authentic content (calibration and validation samples; + represents validation).
One hundred and fifty NIR-DRS datasets belonging to the verification set were substituted into the model, and the error distribution map and relative trend comparison chart between the NIR-DRS predicted values for the validation set and the values actually measured by the reference method were obtained. The model has a root mean square error of prediction (RMSEP) of 0.1064%, as shown in Figure 5, and has good predictive power.
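Restricting the spectra to the two modeling intervals amounts to a boolean mask over the wavenumber axis. A minimal sketch, in which the function name and the demo axis are hypothetical and the interval limits are the ones stated in the text:

```python
import numpy as np

# Modeling intervals stated in the text, in cm^-1
MODEL_BANDS = ((5200.0, 6700.0), (7700.0, 8800.0))

def band_mask(wavenumbers, bands=MODEL_BANDS):
    """Boolean mask that is True where a wavenumber falls inside any band."""
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    mask = np.zeros(wavenumbers.shape, dtype=bool)
    for low, high in bands:
        mask |= (wavenumbers >= low) & (wavenumbers <= high)
    return mask

# Illustrative axis points (not the instrument's actual grid)
demo_axis = np.array([4000.0, 5200.0, 6000.0, 6700.0, 7000.0, 7700.0, 8800.0, 9000.0])
demo_mask = band_mask(demo_axis)
```

Given a spectra matrix with one column per wavenumber, `spectra[:, demo_mask]` would keep only the columns used for modeling.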

Discussion
During the early stage of this experiment, we explored an HPLC method for determining Shengmaxinside I in Cimicifugae, which has good precision and accuracy. Based on this, we extracted the honey-fried medicinal materials from different batches and different origins and determined the content of Shengmaxinside I. The results showed that regional differences affected the content of this component, which provided the basis for the broad applicability of the subsequent quantitative model. This is the first time a model has been built to determine the content of a component of a honey-processed traditional Chinese medicine by NIR-DRS, so this work can serve as an example for other research on Chinese medicine processing.
Using near-infrared spectroscopy combined with chemometric methods, a quantitative model of Shengmaxinside I in Cimicifugae was established. Model building relies mostly on analysis software and statistical methods, which reduces the error caused by human operation. To a certain extent, this supports the reliability and accuracy of the results and improves the efficiency of sample measurement. The experimental results show that the established model has good predictive ability. In actual production and analysis, a faster method is needed, and the model meets this need: once the NIR spectra are obtained by scanning powder samples, the established near-infrared quantitative analysis model can quickly predict the content of Shengmaxinside I in the Cimicifugae samples. Thus, in this research, a near-infrared quantitative model of Shengmaxinside I was established by NIR-DRS combined with chemometric methods. Comparing the two methods of content determination, NIR-DRS is more convenient and faster than HPLC. It is suitable for the determination of large batches of medicinal materials without damaging the sample, and it is safe and environmentally friendly. However, a limitation is that it requires the establishment of a quantitative model with an established chemical measurement method as its basis, so it is not suitable for the determination of a small number of samples or of a small dose of an unmodeled drug. Furthermore, the Cimicifuga sources used to establish this model are all from China, so there may be limitations in the analysis of Cimicifuga from other regions.
Although near-infrared spectroscopy is convenient and quick, establishing the model in the early stage is complex and must be based on traditional chemical methods, which it cannot completely replace. Therefore, in order to fully exploit the strengths of the NIR-DRS method, follow-up research will be devoted to establishing an extensive library of near-infrared models for traditional Chinese medicines.