Quantitative Characterization of Arnicae flos by RP-HPLC-UV and NIR Spectroscopy.

The possibility of applying near-infrared (NIR) spectroscopy to monitor 13 active components (phenolic acids, flavonoids, and sesquiterpene lactones) in Arnicae flos was studied. The spectra were preprocessed using the conventional Golay-Savitzky procedure and the newly developed step-by-step filter. The results show that the step-by-step filter derivatives provide a better signal-to-noise ratio at a smaller convolution window. Better calibration for the content of protocatechuic acid, chlorogenic acid, caffeic acid, p-coumaric acid, ferulic acid, isoquercitrin, and quercetin was obtained with the step-by-step filter derivatives than with direct processing of the raw spectra or the Golay-Savitzky approach. Although the step-by-step filter substantially reduces the spectral distortion, the convolution procedure leads to a loss of spectral points at the red end of the spectral curve. This is probably why the approach gives better calibration for only seven of the 13 monitored active components.


Introduction
The use of herbal medicinal products (HMP) is becoming increasingly relevant for modern healthcare as an alternative to conventional medicine [1]. Plant substances are typically characterized by a diverse composition, which can vary greatly according to the conditions and manner of cultivation, harvesting, processing, and storage [2]. To achieve reproducible quality and safety of the HMP, the raw materials from which they are produced should be subjected to a comprehensive qualitative and quantitative analysis to ensure their authenticity and compliance with the pharmacopoeial requirements [3].
The subject of this paper is the plant substance Arnicae flos obtained from the species Arnica montana L. (mountain arnica) and Arnica chamissonis Less. (Asteraceae). It is widely used in herbal and homeopathic medicine as an anti-inflammatory agent for external use on sprains and hematomas and for arthritic pain [4]. The pharmacological effects of the substance are due to a complex of active ingredients, the most important of which are the sesquiterpene lactones (STL) [5,6] and the phenolic compounds [5,7,8]. In 1998, Lange et al. [9] estimated the use of Arnicae flos in Europe at over 50,000 kg of dry substance, and according to Franke et al. [10], over 20,000 kg of dried flowers are required to cover the annual needs of the German market alone. Today, the raw material is harvested both from wild populations and from cultivation.

HPLC-Analysis
In the current study, HPLC analysis was used for the quantitation of phenolic compounds and sesquiterpene lactones. Alternative methods have also been described in the literature [14,15].
The chromatographic analysis was conducted on an HPLC system produced by Varian (Varian, Inc., Walnut Creek, CA, USA) comprising a ternary pump (model 9012), a Rheodyne manual injector with an injection volume of 10 µL, and a UV-vis detector (model 9050). The chromatographic columns used are as follows:
The chromatographic data were recorded and processed using Varian Star Chromatography Work Station software (Version 4.5, Varian, Palo Alto, CA, USA).
The chromatograms for each sample were recorded at wavelengths consistent with the absorption maxima of the studied compounds: 310 nm for the phenolic acids, 360 nm for the flavonoids, and 225 nm for the sesquiterpene lactones.

Spectral Measurements
Spectral data were recorded using a double-beam JASCO V-570 UV-Vis-NIR (200-2500 nm) spectrophotometer (JASCO International Co, Tokyo, Japan), equipped with an ILN-470 (JASCO International Co, Tokyo, Japan) integrating sphere (200-2000 nm) for the measurement of the reflectance spectra of solid and powdered substances. For each sample, three replicates were measured after homogenizing at optimal instrumental conditions (scan speed: 100 nm/min; detector response: slow; resolution: 1 nm).

Data Processing
The trial version of the Unscrambler software (v 9.7, Camo, Trondheim, Norway) was used to obtain regression models of the components.
The small number of samples available for building the model meant that all of them had to be included in the calibration set. The model was validated using the leave-one-out cross-validation method. The precision of the final model was evaluated by the squared correlation coefficient (R²) and the RMSECV (root mean square error of cross-validation). The RMSECV was determined by removing one sample from the calibration set, recalculating the model on the remaining samples, and testing it on the sample that was left out. This was repeated for all samples and the results were averaged. RMSECV was calculated using Equation (1):
$$\mathrm{RMSECV} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \qquad (1)$$
where n is the number of samples included in the calibration set; y_i is the reference value of the concentration for the i-th sample; and ŷ_i is the predicted concentration for the i-th sample when the i-th sample is left out of the model.
One of the crucial parameters in the construction of the calibration model is the number of principal components (PCs). In a simple system of one substance, the number of PCs would reflect the concentration of the substance and the influence of external factors (temperature, instrument conditions, impurities, etc.). In a complex system such as plant material, which contains hundreds of individual substances, it can prove difficult to predict all the relevant influences and their impact on the resulting spectrum. Nevertheless, it is essential to choose the optimal number of PCs that reflect the majority of variations in the composition of the samples, excluding those resulting from random errors and fluctuations.
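The leave-one-out procedure and the RMSECV definition described above can be sketched as follows. This is only an illustration: the study built its models in the Unscrambler software, whereas the sketch uses a one-variable least-squares line as a stand-in model, and the function names (`fit_line`, `rmsecv_loo`) and the numbers are hypothetical.

```python
import math


def fit_line(x, y):
    """Least-squares line y = a + b*x (assumed stand-in for the real model)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def rmsecv_loo(x, y):
    """Leave-one-out RMSECV: sqrt of the mean squared prediction error."""
    n = len(y)
    squared_error = 0.0
    for i in range(n):
        # Rebuild the model without sample i ...
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        intercept, slope = fit_line(x_train, y_train)
        # ... and predict the sample that was left out.
        y_hat = intercept + slope * x[i]
        squared_error += (y[i] - y_hat) ** 2
    return math.sqrt(squared_error / n)


# Made-up illustrative values, not data from the study.
signal = [0.0, 1.0, 2.0]
concentration = [0.0, 1.0, 5.0]
print(rmsecv_loo(signal, concentration))
```

With a multivariate model, only `fit_line` and the prediction step would change; the loop over left-out samples and the final square root stay the same.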
As a rule, determining the optimal number of PCs can be done in two different ways:
• choosing the lowest value of RMSECV, or
• selecting the lowest number of PCs by which the largest percentage of variation can be described.
The best result was obtained by combining both of these approaches, with the RMSECV value being the decisive factor for choosing the optimal number of PCs. The influence of the number of PCs on the slope and the offset of the calibration curve was also evaluated.
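One possible way to combine the two criteria is sketched below: take the smallest number of PCs whose RMSECV lies within a small relative tolerance of the minimum. The tolerance parameter, the function name `choose_n_pcs`, and the example values are assumptions made for illustration; the text does not state how the two criteria were weighed numerically.

```python
def choose_n_pcs(rmsecv_by_pc, tolerance=0.0):
    """Pick the number of PCs from a list of RMSECV values.

    rmsecv_by_pc[k] is the RMSECV of the model built with k + 1 PCs.
    With tolerance=0.0 this is simply the model with the minimum RMSECV;
    a positive tolerance prefers fewer PCs when the error is nearly as low.
    """
    if not rmsecv_by_pc:
        raise ValueError("rmsecv_by_pc must not be empty")
    limit = min(rmsecv_by_pc) * (1.0 + tolerance)
    for k, value in enumerate(rmsecv_by_pc):
        if value <= limit:
            return k + 1


# Made-up illustrative values, not results from the study.
errors = [0.90, 0.52, 0.41, 0.40, 0.43]
print(choose_n_pcs(errors))                  # strict minimum RMSECV
print(choose_n_pcs(errors, tolerance=0.05))  # fewer PCs, error within 5 %
```

Keeping RMSECV as the primary quantity and using the tolerance only as a tie-breaker mirrors the statement that RMSECV was the decisive factor.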
When selecting samples to be included in the calibration set, the variation along the y-axis, which reflects the concentration of the components, should also be taken into account. The calibration set should not only fully cover the range of values measured for the whole set of samples, but should also exclude any outliers, which introduce a deviation in the linearity of the model and make further application to samples of unknown composition unreliable. Therefore, it is very important that outliers are detected before the construction of the model and removed from the calibration set. In the set of samples described in this paper, outliers were observed for the kaempferol and quercetin concentrations in samples (G) and (H) (Table 1), originating from Poland and Central America, respectively. This can be clearly seen in the graphical representation of the quercetin (a) and kaempferol (b) concentrations in the available samples (Figure 1).
Before the construction of the calibration model, the spectral data were mean-centered: the mean reflectance value at each wavelength, calculated over all samples, was subtracted from the reflectance value at that wavelength in every individual spectrum. This procedure is beneficial in cases where the relative variation between samples is more important than the absolute variation.
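The centering step can be illustrated with a minimal sketch, assuming all spectra share the same wavelength grid. The function name `mean_center` and the numbers are hypothetical.

```python
def mean_center(spectra):
    """Subtract the per-wavelength mean (over all samples) from each spectrum.

    spectra: list of spectra, each a list of reflectance values
             recorded on a common wavelength grid (assumption).
    Returns the centered spectra and the mean spectrum.
    """
    n_samples = len(spectra)
    n_points = len(spectra[0])
    mean_spectrum = [
        sum(spectrum[j] for spectrum in spectra) / n_samples
        for j in range(n_points)
    ]
    centered = [
        [spectrum[j] - mean_spectrum[j] for j in range(n_points)]
        for spectrum in spectra
    ]
    return centered, mean_spectrum


# Made-up illustrative values, not data from the study.
centered, mean_spectrum = mean_center([[1.0, 2.0], [3.0, 6.0]])
print(mean_spectrum)
print(centered)
```

After centering, the values at each wavelength sum to zero over the samples, so the model describes deviations from the average spectrum rather than absolute reflectance.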
The final calibration models for all quantitatively determined compounds contained in Arnicae flos were evaluated according to the minimum RMSECV value. The RMSECV values, R², and number of principal components are shown in Table 3.

Results and Discussion
The raw spectra of the Arnicae flos samples in the range 250-2000 nm are shown in Figure 2a. There are three broad peaks at 1920, 1720 and 1450 nm in the NIR region. Two peaks, corresponding to the orange color (650 nm) of the petals and the green color (500 nm) of the involucral bracts, are observed in the visible region. The visible region exhibits greater variation between the individual spectra than the near-infrared region, owing to the non-homogeneous distribution of the powder particles of the samples and the presence of different plant parts. These variations were reduced substantially by measuring three replicates of each sample. In the region around 900 nm there is a sharp peak due to the change of detector. This part of the spectra was removed to avoid spikes in the derivative curves. The same was done for the range 250-270 nm, where the intensity of the sample led to saturation of the signal and to increased stray light. The corresponding curves are given in Figure 2b. The removal of these regions from the spectra significantly increased the quality of the derivative spectra and reduced errors in the model validation.
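Removing the two problematic regions amounts to masking wavelength intervals before any derivative is calculated. In the sketch below, the 250-270 nm interval is taken from the text, while the limits used for the detector change "around 900 nm" (880-920 nm) are a hypothetical choice, as the exact range is not given. The function name `drop_regions` is likewise an assumption.

```python
def drop_regions(wavelengths, values, excluded):
    """Remove all points whose wavelength falls inside an excluded interval.

    excluded: list of (low, high) wavelength limits in nm, both inclusive.
    """
    kept = [
        (w, v)
        for w, v in zip(wavelengths, values)
        if not any(low <= w <= high for low, high in excluded)
    ]
    return [w for w, _ in kept], [v for _, v in kept]


EXCLUDED_NM = [
    (250, 270),  # signal saturation and stray light (range given in the text)
    (880, 920),  # detector change near 900 nm (limits are an assumption)
]

# Made-up illustrative values, not data from the study.
wl = [250, 260, 270, 271, 900, 1000]
refl = [0.90, 0.91, 0.92, 0.50, 0.99, 0.40]
print(drop_regions(wl, refl, EXCLUDED_NM))
```

Note that masking leaves gaps in the wavelength axis, so a convolution filter should be applied to each remaining segment separately to avoid artificial jumps at the joins.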

The raw spectra were preprocessed using either the Golay-Savitzky (GS) method or the step-by-step filter (SBSF). The parameters used (polynomial degree and number of points included in the filter window) were determined empirically according to the observed signal-to-noise ratio. It should also be taken into account that the convolution procedure leads to the loss of (filter window − 1)/2 spectral points at the beginning and at the end of each spectral curve.
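The loss of points at both ends follows directly from how a convolution filter works, as the sketch below illustrates for a Golay-Savitzky first derivative. It assumes a quadratic local polynomial and an odd window of 2m + 1 equally spaced points, for which the derivative weights reduce to k / Σk²; the polynomial degree used in the study is not stated, and the SBSF is not reproduced here because its algorithm is not described in this section. The function name `gs_first_derivative` is hypothetical.

```python
def gs_first_derivative(y, half_width, step=1.0):
    """Golay-Savitzky first derivative for a quadratic local fit (assumption).

    The window has 2*half_width + 1 points. Only positions where the full
    window fits are returned, so half_width points are lost at each end.
    """
    m = half_width
    offsets = range(-m, m + 1)
    norm = sum(k * k for k in offsets) * step
    weights = [k / norm for k in offsets]
    return [
        sum(w * y[i + k] for w, k in zip(weights, offsets))
        for i in range(m, len(y) - m)
    ]


# Check on y = x^2 sampled at x = 0..6: the exact derivative is 2x.
y = [float(x * x) for x in range(7)]
derivative = gs_first_derivative(y, half_width=2)
print(len(y), len(derivative))  # 7 input points, 3 output points
print(derivative)               # derivative at x = 2, 3, 4
```

For a 5-point window, (5 − 1)/2 = 2 points are dropped at each end, and the loss grows with the window size, which is why wide windows cost more of the spectrum at the long-wavelength end.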
The effect of the filter window size is shown in Figure 3. When the GS method is applied with a filter window of 5 points, sharp and narrow peaks characteristic of noise are evident. A filter window of 15 points gave a good level of smoothing but also decreased the peak intensity, resulting in a loss of information. Consequently, a filter window of 10 points was selected as a compromise that preserved as much of the useful information in the spectra as possible while lowering the noise level.
The calculation of the first derivative spectra using the SBSF requires a considerably smaller filter window, which allows satisfactory smoothing of the spectrum while retaining its information content. When comparing two derivative spectra (Figure 4) produced with filter windows of 2 and 5 points, respectively, a significant improvement in the signal-to-noise ratio can already be observed with the smaller filter window. Increasing the window leads to a loss of spectral information in the long-wavelength region. For that reason, a filter window of two points was selected as optimal: no decrease in peak intensity in the long-wavelength region is observed, which, compared to the GS-derived derivative spectra, is a key advantage in the subsequent construction of the regression models. The direct comparison of the two preprocessing methods in Figure 4 clearly shows that SBSF causes no attenuation in the near-infrared region and maintains better sensitivity in this region than GS. The first derivative spectra obtained by both methods are shown in Figure 5.
The statistical parameters of the calibration models obtained using both raw and first derivative spectra are collected in Table 3. As already described, owing to the limited number of samples, leave-one-out cross-validation was used for model validation. It is evident from Table 3 that nearly all tested compounds tend to have an equal or lower number of principal components in models built on the derivative spectra than in those built on the raw spectra. This is most likely due to the reduced influence of the noise and of the baseline, achieved through the smoothing and derivatization of the spectra. Figure 6 shows a graphical comparison of the RMSECV values obtained (Table 3) for the individual methods with different types of spectral data processing. The smaller the value of RMSECV, the better the model describes the available data set.
Foods 2018, 7, x FOR PEER REVIEW
Figure 6. RMSECV values (Table 4) obtained from the different methods of spectral data processing: zero-order curve, grey box; first derivative with GS method, striped box; first derivative with SBSF, black box. RMSECV: root mean square error of cross-validation.
Furthermore, for the sesquiterpene lactones (6), astragalin (10) and isorhamnetin-3-glu (11), a significantly higher error was observed for the raw spectra than for the derivative ones. Presumably, this can be explained by the relatively strong effect of the baseline in regions that contain a large portion of the useful information about these compounds.
For some of the components (1, 2, 3, 7, 8 and 9), better results were obtained with the raw spectra than with the spectra processed with the GS method. This was probably due to the attenuation of the peak intensity at the longest wavelengths of the spectrum. Since these peaks contain much of the useful information in the spectral data, it is essential that they are retained as fully as possible in the derivative spectrum. In such cases, therefore, the SBSF proves more suitable and offers an advantage over the GS method.

Conclusions
A comparative study of the applicability of two preprocessing techniques in UV-Vis-NIR spectroscopy, namely Golay-Savitzky (GS) smoothing/differentiation and the step-by-step filter (SBSF), for monitoring 13 active compounds in Arnicae flos was performed. Although SBSF shows some obvious advantages (a better signal-to-noise ratio and lower spectral distortion at a small convolution window), it loses a substantial number of spectral points at the red end of the spectral curve. This is probably why the approach gives better calibration for only 7 of the 13 active components monitored.