1. Introduction
The use of herbal medicinal products (HMP) is becoming increasingly relevant for modern healthcare as an alternative to conventional medicine [
1]. Plant substances are typically characterized by a diverse composition, which can vary greatly with the conditions and manner of cultivation, harvesting, processing, and storage [
2]. To achieve reproducible quality and safety of HMP, the raw materials from which they are produced should be subjected to comprehensive qualitative and quantitative analysis to ensure their authenticity and compliance with pharmacopoeial requirements [
3].
The subject of this paper is the plant substance
Arnicae flos obtained from the species
Arnica montana L. (mountain arnica) and
Arnica chamissonis Less. (Asteraceae). It is widely used in herbal and homeopathic medicine as an anti-inflammatory agent for external use on sprains, hematomas and for arthritic pain [
4]. The pharmacological effects of the substance are due to a complex of active ingredients, the most important of which are the sesquiterpene lactones (STL) [
5,
6] and the phenolic compounds [
5,
7,
8]. In 1998, Lange et al. [
9] estimated the use of
Arnicae flos in Europe at over 50,000 kg of dry substance, and according to Franke et al. [
10], over 20,000 kg of dried flowers are required to cover the annual needs of the German market alone. Today, the raw material is harvested both from wild populations and from cultivation.
The great importance of mountain arnica for the pharmaceutical market, together with the variations in its composition and pharmacological activity, requires reliable analytical techniques for qualitative and quantitative characterization and rapid monitoring of the raw material. In the present paper, we investigate the prospect of applying a non-destructive near-infrared (NIR) method for the rapid quantification of pharmacologically relevant components (sesquiterpene lactones and phenolic compounds) of
Arnicae flos using high performance liquid chromatography (HPLC) as a reference method. In the chemometric processing of the spectral data, along with the traditionally used Golay-Savitzky differentiation procedure [
11,
12], we use a newly developed “step by step” filter [
13] that substantially reduces spectral distortion. To the best of our knowledge, this is the first comparative study of these two pre-processing approaches in the investigation of dried medicinal plants.
2. Materials and Methods
2.1. Sample Material
The experiment encompasses samples of the plant substance
Arnicae flos with diverse origins: two Bulgarian and one Polish cultivated collections, three cultivars, three botanical garden collections, one sample purchased from a pharmacy store and one from a wild population (
Table 1). The extracts, as well as the NIR-spectra, were prepared using whole inflorescences including the involucral bract, which were shade-dried at room temperature and ground to a particle size of 2 mm.
2.2. HPLC-Analysis
In the current study, HPLC analysis was used for the quantitation of phenolic compounds and sesquiterpene lactones, although alternative methods are described in the literature [
14,
15].
The chromatographic analysis was conducted on an HPLC system produced by Varian (Varian, Inc., Walnut Creek, CA, USA) comprising a ternary pump (model 9012), a Rheodyne manual injector with an injection volume of 10 µL, and a UV-vis detector (model 9050). The chromatographic columns used were as follows:
For the phenolic compounds: Hypersil ODS C18, 5 μm, 250 × 4.6 mm I.D. (Shandon, Runcorn, England), with a precolumn 30 × 4.6 mm (Interchim, Montluçon, France) packed with the same adsorbent.
For the sesquiterpene lactones: Luna 5 μm C18 100 Å, 150 × 4.6 mm (Phenomenex, Torrance, CA, USA), with a precolumn 30 × 4.6 mm (Interchim, Montluçon, France) packed with the same adsorbent.
The acquisition and processing of the chromatographic data were carried out using Varian Star Chromatography Work Station software (Version 4.5, Varian, Palo Alto, CA, USA).
The chromatograms for each sample were recorded at wavelengths corresponding to the absorption maxima of the studied compounds: 310 nm for the phenolic acids, 360 nm for the flavonoids and 225 nm for the sesquiterpene lactones.
The HPLC separation of the phenolic compounds followed an optimized method described previously [
16]. The analysis of the sesquiterpene lactones was conducted under the following conditions: flow rate of 1.0 mL/min; temperature 35 °C. The mobile phase consisted of water (A) and methanol (B), and the linear gradient, expressed as the percentage of A, was as follows: 0 min, 55%; 22 min, 50%; 35 min, 40%; 37 min, 35%; 40 min, 15%; and 45 min, 40%.
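The gradient program for the sesquiterpene lactones can be written down as a small lookup with linear interpolation between the listed time points. This is only an illustrative sketch: the names `GRADIENT_A`, `percent_a` and `percent_b` are hypothetical, and the assumption that methanol makes up the remainder of the mixture follows from the text naming only the two solvents A and B.

```python
from bisect import bisect_right

# Gradient table from the text: (time in min, percentage of solvent A = water).
GRADIENT_A = [(0, 55.0), (22, 50.0), (35, 40.0), (37, 35.0), (40, 15.0), (45, 40.0)]


def percent_a(t_min, table=GRADIENT_A):
    """Linearly interpolate the percentage of solvent A at time t_min."""
    times = [t for t, _ in table]
    if not times[0] <= t_min <= times[-1]:
        raise ValueError("time outside the gradient program")
    # Index of the segment that contains t_min (the last point uses the last segment).
    i = min(bisect_right(times, t_min) - 1, len(table) - 2)
    (t0, a0), (t1, a1) = table[i], table[i + 1]
    return a0 + (a1 - a0) * (t_min - t0) / (t1 - t0)


def percent_b(t_min):
    """Assumption: solvent B (methanol) is the remainder of a binary mixture."""
    return 100.0 - percent_a(t_min)
```

For example, `percent_a(11)` falls halfway along the first segment (55% to 50% over 22 min) and evaluates to 52.5.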
The quantitative analysis of the phenolic compounds was performed using the external standard method. The content of the flavonoid glycosides was calculated as isoquercitrin, that of the flavonoid aglycones as quercetin, and that of the phenolic acids using standard solutions of each corresponding acid. For the quantification of the sesquiterpene lactones, a standard solution of santonin (1 mg/mL) was used as an internal standard. The data obtained are summarized in
Table 2.
2.3. Spectral Measurements
Spectral data were recorded using a double beam JASCO V-570 UV-Vis-NIR (200–2500 nm) spectrophotometer (JASCO International Co, Tokyo, Japan), equipped with an ILN-470 (JASCO International Co, Tokyo, Japan) integrating sphere (200–2000 nm) for the measurement of the reflectance spectra of solid and powdered substances. For each sample, three replicates were measured after homogenization under optimal instrumental conditions (scan speed: 100 nm/min; detector response: slow; resolution: 1 nm).
The first derivative spectra were calculated alternatively using Golay-Savitzky (GS) differentiation (filter window = 10 points, polynomial degree = 2) [
11] and the step-by-step filter (SBSF) [
17] (filter window = 2 points, polynomial degree = 3) [
12]. The software used for this purpose is described elsewhere [
17].
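To illustrate the idea behind GS differentiation, the sketch below computes a first derivative by fitting a quadratic polynomial in a sliding window. It is not the software used in this study: the function name `sg_first_derivative` is hypothetical, and the sketch assumes an odd, symmetric window of 2 × half_width + 1 equally spaced points, so it does not reproduce the 10-point window reported above. The step-by-step filter is not shown, since its algorithm is described only in the cited reference.

```python
def sg_first_derivative(y, half_width, step=1.0):
    """First derivative from a least-squares quadratic fit in a sliding window.

    The window holds 2 * half_width + 1 equally spaced points. For a
    polynomial of degree 1 or 2, the fitted slope at the window centre is
    sum(k * y[i + k]) / (step * sum(k * k)) with k = -half_width..half_width.
    Only points with a full window are returned, so half_width points are
    lost at each end of the input.
    """
    m = half_width
    if m < 1 or len(y) < 2 * m + 1:
        raise ValueError("window does not fit the data")
    norm = step * sum(k * k for k in range(-m, m + 1))
    return [
        sum(k * y[i + k] for k in range(-m, m + 1)) / norm
        for i in range(m, len(y) - m)
    ]
```

Because the fit is quadratic, the sketch returns the exact derivative for any linear or quadratic input, which makes it easy to check against known functions before applying it to spectra.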
Spectral data from all samples were used in the intervals:
270–850; 907–2000 nm for non-derivative spectra,
270–850; 935–1920 nm for spectra treated with GS,
270–850; 935–1980 nm for spectra treated with SBSF,
in order to eliminate the artifact caused by the change of the detector from UV-Vis to NIR.
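The interval selection listed above amounts to masking each spectrum by wavelength. A minimal sketch follows; the names `INTERVALS` and `select_intervals` are hypothetical, and treating the interval limits as inclusive is an assumption, since the text does not say how the end points were handled.

```python
# Wavelength windows (nm) kept for each type of spectrum, as listed in the text.
INTERVALS = {
    "raw": [(270, 850), (907, 2000)],
    "GS": [(270, 850), (935, 1920)],
    "SBSF": [(270, 850), (935, 1980)],
}


def select_intervals(wavelengths, values, kind):
    """Return the (wavelength, value) pairs that fall inside the kept windows."""
    windows = INTERVALS[kind]
    return [
        (w, v)
        for w, v in zip(wavelengths, values)
        if any(lo <= w <= hi for lo, hi in windows)  # assumption: inclusive limits
    ]
```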
2.4. Data Processing
The trial version of the Unscrambler software (v 9.7, Camo, Trondheim, Norway) was used to obtain regression models of the components.
The small number of samples available for building the model made it necessary to include all of them in the calibration set. The model was validated using the leave-one-out cross-validation method. The precision of the final model was evaluated by the squared correlation coefficient (R²) and the RMSECV (root mean square error of cross-validation). The RMSECV was determined by removing one sample from the calibration set, recalculating the model on the remaining samples, and testing it on the sample that was left out. This was repeated for all samples and the results were averaged. RMSECV was calculated using Equation (1):
$$\mathrm{RMSECV} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \qquad (1)$$
where $n$ is the number of samples included in the calibration set; $y_i$ is the reference value of the concentration for the i-th sample; and $\hat{y}_i$ is the predicted concentration value for the i-th sample when the i-th sample is left out of the model.
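The leave-one-out procedure and the RMSECV calculation can be sketched as follows. This is an illustration under stated assumptions, not the Unscrambler workflow: a simple least-squares line with a single predictor stands in for the multivariate regression model, and the names `fit_line`, `loo_predictions` and `rmsecv` are hypothetical. The inputs are plain Python lists.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least-squares line y = a + b * x (stand-in for the real model)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def loo_predictions(x, y):
    """Predict each sample from a model fitted on all the other samples."""
    preds = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        a, b = fit_line(x_train, y_train)
        preds.append(a + b * x[i])
    return preds


def rmsecv(y, preds):
    """Root mean square of the differences between reference and predicted values."""
    n = len(y)
    return sqrt(sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / n)
```

With real spectra, `fit_line` would be replaced by the chosen multivariate regression, while the loop over left-out samples and the final error calculation stay the same.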
One of the crucial parameters in the construction of the calibration model is the number of principal components (PCs). In a simple system of one substance, the number of PCs would reflect the concentration of the substance and the influence of external factors—temperature, instrument conditions, impurities, etc. In a complex system, such as plant material, which contains hundreds of individual substances, it can prove difficult to predict all the relevant influences and their impact on the resulting spectrum. Nevertheless, it is essential to choose the optimal number of PCs that reflect the majority of variations in the composition of the samples, excluding those resulting from random errors and fluctuations. As a rule, determining the optimal number of PCs can be done in two different ways:
The best result was obtained by combining both of these approaches, with the RMSECV value being the decisive factor for choosing an optimal number of PCs. The influence of the number of PCs on the slope and the offset of the calibration curve was also evaluated.
When selecting samples to be included in the calibration set, the variation along the y-axis, which reflects the concentration of the components, should also be taken into account. The calibration set should not only fully cover the range of values measured for the whole set of samples, but also exclude any outliers, which introduce a deviation in the linearity of the model and make further applications to samples of unknown composition unreliable. Therefore, it is very important that outliers be detected before the construction of the model and removed from the calibration set. In the set of samples described in this paper, outliers were observed for the kaempferol and quercetin concentrations in samples (G) and (H) (
Table 1) originating from Poland and Central America, respectively. This can be clearly seen in the graphical representation of the quercetin (a) and kaempferol (b) concentrations in the available samples (
Figure 1).
Before the construction of the calibration model, the spectral data were mean-centered: the mean reflectance value at each wavelength, calculated over all samples, was subtracted from the reflectance value at that wavelength in every individual spectrum. This procedure is beneficial in cases where the relative variation between samples is more important than the absolute variation.
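Mean centering in its conventional form, in which the per-wavelength mean over all samples is subtracted from every spectrum, can be sketched in a few lines. The function name `mean_center` and the list-of-lists layout (one row per sample, one column per wavelength) are assumptions made for illustration.

```python
def mean_center(spectra):
    """Subtract the per-wavelength mean (taken over all samples) from each spectrum.

    spectra is a list of equally long lists: one row per sample, one column
    per wavelength. Returns the centred rows and the column means.
    """
    n = len(spectra)
    means = [sum(col) / n for col in zip(*spectra)]
    centred = [[v - m for v, m in zip(row, means)] for row in spectra]
    return centred, means
```

After centering, every wavelength column averages to zero, so the regression describes the variation between samples rather than the absolute reflectance level.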
The final calibration models for all quantitatively determined compounds contained in
Arnicae flos were evaluated according to the minimum RMSECV value. The RMSECV values, R² and the number of principal components are shown in
Table 3.
3. Results and Discussion
The raw spectra of the samples of
Arnicae flos in the range 250–2000 nm are shown in
Figure 2a. As can be seen, there are three broad peaks at 1920, 1720 and 1450 nm in the NIR region. Two peaks corresponding to the orange color (650 nm) of the petals and the green color (500 nm) of the involucral bracts are observed in the visible region. The visible region exhibits a relatively greater variation between the individual spectra than the near-infrared one, due to the inhomogeneous distribution of the powder particles in the samples and the presence of different plant parts. These variations were reduced substantially by measuring three replicates of each sample. In the region around 900 nm there is a sharp peak due to the change of the detector. This part of the spectra was removed to avoid spikes in the derivative curves. The same was done with the range 250–270 nm, where the intensity of the sample led to saturation of the signal and to increased stray light. The corresponding curves are given in
Figure 2b. It should be noted that the removal of these areas from the spectra significantly increased the quality of the derivative spectra and reduced errors in the model validation.
The raw spectra were preprocessed using, alternatively, the Golay-Savitzky (GS) method and the step-by-step filter (SBSF). The parameters used (polynomial degree and number of points in the filter window) were determined empirically according to the observed signal-to-noise ratio. It should also be taken into account that the convolution procedure leads to the loss of (filter window − 1)/2 data points at the beginning and at the end of each spectral curve.
When the size of the filter window is varied (
Figure 3), the GS method applied with a filter window of 5 points leaves sharp and narrow peaks characteristic of noise. A filter window of 15 points gave a good level of smoothing, but it also decreased the peak intensity, resulting in a loss of information. Consequently, a filter window of 10 points was selected as the optimal compromise that preserved the maximum amount of useful information contained in the spectra while lowering the noise level.
The calculation of the first derivative spectra using the SBSF requires a considerably smaller filter window, which allows satisfactory smoothing of the spectrum while retaining its information content. When comparing two derivative spectra (
Figure 4) produced with filter windows of 2 and 5 points, respectively, a significant improvement in the signal-to-noise ratio can already be observed with the smaller filter window. Increasing the window leads to a loss of spectral information in the long-wavelength region. For that reason, a filter window of two points was selected as optimal: no decrease in peak intensity is observed in the long-wavelength region, which is a key advantage over the GS-derived derivative spectra in the further construction of the regression models.
The direct comparison of the two preprocessing methods in
Figure 4 clearly shows that SBSF causes no attenuation in the near-infrared region and maintains better sensitivity in this region than GS. The first derivative spectra obtained by both methods are shown in
Figure 5.
The statistical parameters of the calibration models obtained by using both raw and first derivative spectra are collected in
Table 3. As already described, due to the limited number of samples, leave-one-out cross-validation was used for model validation. It is evident from
Table 3 that nearly all tested compounds tend to have an equal or lower number of principal components in models built on the derivative spectra than in those built on the raw ones. This is most likely due to the reduced influence of noise and of the baseline, achieved through the smoothing and differentiation of the spectra.
Figure 6 shows a graphical comparison of the obtained RMSECV values (
Table 3) for the individual methods with different types of spectral data processing. The smaller the value of RMSECV, the better the model describes the available data set.
The lowest values of RMSECV were obtained from the models built using SBSF for the following compounds: protocatechuic acid (1), chlorogenic acid (2), caffeic acid (3), p-coumaric acid (4), ferulic acid (5), isoquercitrin (8), and quercetin (12). Similar values were observed for the models built using SBSF and GS for the sesquiterpene lactones (6), apigenin-7-glucoside (9) and kaempferol (13). The best results for astragalin (10) and isorhamnetin-3-glucoside (11) were obtained from the GS differentiation of the spectra.
Furthermore, for the sesquiterpene lactones (6), astragalin (10) and isorhamnetin-3-glucoside (11), a significantly higher error was observed for the raw spectra than for the derivative ones. Presumably, this can be explained by the relatively strong effect of the baseline in regions that contain a large portion of the useful information about these compounds.
For some of the components (1, 2, 3, 7, 8 and 9), better results were obtained with the raw spectra than with the spectra processed with the GS method. This was probably due to the attenuation of the peak intensity at the longest wavelengths of the spectrum. Since these peaks contain much of the useful information in the spectral data, it is essential that they are retained as fully as possible in the derivative spectrum. Therefore, in such cases, the SBSF proves more suitable and offers an advantage over the GS method.