In 2004, the U.S. Food and Drug Administration issued its “Process Analytical Technology (PAT) Industry Guide,” which encourages pharmaceutical companies to develop innovative drugs and ensure quality during manufacturing [1]. PAT is considered a quality system. Its purpose is to collect real-time information on all aspects of critical processes and to guide the process towards its desired state, thereby ensuring the quality of the final product. Online technology is mentioned several times in the guide. Online near-infrared (NIR) sensors have proven to be among the most efficient and advanced tools available for monitoring and controlling the production and processing of food, agricultural products, pharmaceuticals and petroleum. Killner et al. used a NIR sensor for online monitoring of the catalyzed transesterification of soybean oil to produce biodiesel [2]. Collell et al. used three different NIR sensors to predict superficial water activity and moisture content in two types of fermented sausages [3]. Marín-González et al. evaluated the performance and accuracy of online measurement of soil properties using the indirect spectral response in the NIR range [4]. All of these studies confirmed that online NIR sensors can be used to monitor and control the quality of production and processing, but each relied on a single quantitative partial least squares (PLS) model.
NIR has recently come to be regarded as an excellent sensing technique for process monitoring in Chinese Herbal Medicine (CHM), and the use of online NIR sensors for quality control in CHM is increasing. First, online NIR sensors can record spectra for liquid CHM samples, e.g., the solutions from extraction, concentration, alcohol precipitation and purification processes. Wu et al. monitored the extraction of chlorogenic acid from Lonicera japonica using online NIR sensors and established multivariate models, including PLS and interval partial least squares (iPLS) models, which produced encouraging results regarding the reliability of online NIR sensors for monitoring extraction processes in CHM [5]. Qu et al. showed that the concentration of red ginseng alcohol extract can be monitored online using a NIR sensor coupled with a PLS regression model [6]. Jin et al. investigated a NIR sensor combined with particle swarm optimization-based (PSO-based) least squares support vector machine (LS-SVM) regression and PLS regression for quantitative online monitoring of the alcohol precipitation of the Danhong injection formulation [7]. Liu et al. used NIR sensors with PLS for online monitoring of the column separation and purification of madecassoside and asiaticoside from Centella asiatica L. Urban.
Second, online NIR sensors can record spectra for solid CHM samples, such as tablets, capsules, plasters, and pills. Wan et al. combined a NIR sensor with PLS for rapid, nondestructive analysis of the content and moisture levels of semi-manufactured dried granules produced in the preparation of ginkgo leaf dispersible tablets [9]. Geng et al. established a method for online analysis of the paeoniflorin content in the Chuanhong Huoxue capsule extraction process with a NIR sensor [10]. Jiang et al. investigated NIR combined with PLS regression for online monitoring of the baicalin content of Shang Jie plaster extract solutions [11]. Jin et al. established a simple and rapid method for online monitoring of the blending of Zhongsheng pill powder based on diffuse reflectance NIR spectra, using the moving block standard deviation (MBSD) method to identify the endpoint of the blending process [12].
CHMs have their own characteristics, including a complex chemical composition and low concentrations of active pharmaceutical ingredients (APIs). For this reason, any NIR model used to quantify APIs should be validated. PLS is a popular multivariate calibration technique for quantitative analysis of NIR spectral data [13]. It is a dimension reduction technique that finds a set of latent variables relating two blocks of variables. Although it is very useful for solving calibration problems, a PLS model is sensitive to uninformative and collinear spectral variables [14]. Recently, increasing attention has been paid to ensemble methods in multivariate regression, such as bagging combined with partial least squares regression (bagging-PLS) [15]. Bagging (bootstrap aggregating) was proposed by Breiman. It reduces the variance of a predictor by fitting several models to resampled versions of the calibration set and combining their predictions into one. The aggregated prediction generally performs better than a single model and is less prone to overfitting [16]. The robustness of PLS regression (PLSR) models and their predictions can therefore be improved by combining them with ensemble techniques. For example, Viscarra Rossel implemented bagging with PLSR on vis-NIR and mid-IR diffuse reflectance spectra to predict soil organic carbon and found bagging-PLSR to be more robust than PLSR alone [17].
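The bagging idea described above can be sketched in a few lines. The following is a minimal illustration, not the implementation used in this study: it assumes a single-response PLS model fitted with the NIPALS algorithm, and the function names (`pls1_fit`, `bagging_pls_fit`, `bagging_pls_predict`), the number of bootstrap models and the random seed are all hypothetical choices.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model (NIPALS); return (coefficients, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        norm = np.linalg.norm(w)
        if norm < 1e-12:              # nothing left in y to explain
            break
        w = w / norm
        t = Xk @ w                    # scores
        tt = t @ t
        p = Xk.T @ t / tt             # X loadings
        c = yk @ t / tt               # y loading
        Xk = Xk - np.outer(t, p)      # deflate X
        yk = yk - c * t               # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    if not W:                         # degenerate case: constant response
        return np.zeros(X.shape[1]), y_mean
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)
    return beta, y_mean - x_mean @ beta


def bagging_pls_fit(X, y, n_components, n_models=50, seed=0):
    """Fit one PLS model per bootstrap resample of the calibration set."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)   # sample rows with replacement
        models.append(pls1_fit(X[idx], y[idx], n_components))
    return models


def bagging_pls_predict(models, X):
    """Aggregate the ensemble by averaging the individual predictions."""
    return np.mean([X @ beta + b0 for beta, b0 in models], axis=0)
```

Averaging is the usual aggregation rule for regression; other rules (for example, the median) would be an equally valid assumption here.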
Fructus aurantii (Zhiqiao), the dried, immature fruit of Citrus aurantium L., is well known in CHM [18]. Flavonoids derived from Fructus aurantii have been shown in pharmacological studies and clinical trials to have anti-oxidative, anti-inflammatory, antiviral, and anti-dyspeptic effects [19].
According to the literature, the use of bagging in the multivariate calibration of online NIR sensors for CHM has rarely been reported. The purpose of the present paper is to compare the performance of PLS with that of bagging-PLS for online NIR monitoring of the pilot-scale Fructus aurantii extraction process.
2. Materials and Methods
Fructus aurantii was purchased from Ben Cao Fang Yuan (Beijing, China). Naringin reference standard (No. 110722-201312), hesperidin reference standard (No. 110721-201316), and neohesperidin reference standard (No. 111857-201102) were supplied by the National Institutes for Food and Drug Control (Beijing, China). Acetonitrile (Fisher Scientific, Fair Lawn, NJ, USA) was HPLC grade. Acetic acid (Beijing Chemical Works, Beijing, China) was analytical grade. Deionized water was prepared by a Milli-Q water system (Millipore Corp., Bedford, MA, USA).
2.2. Preparation of Samples
Fructus aurantii (6.5 kg) was extracted three times with a 10-fold amount of deionized water in a multi-functional extractor (100 L), with each extraction lasting 1.5 h. The speed of the stirring paddle was set to 50 rpm. During the extraction process, the NIR spectra were scanned periodically. The sampling interval was designed according to the content of the three ingredients (Table 1). During the initial heating and boiling phase, the levels of the components varied rapidly, so a short sampling interval was used. In the second and third extraction stages, the levels of the components varied less than during the first stage, so the interval was lengthened to reduce the workload.
Table 1. Sampling intervals during the extraction process.
| Process | Heating | 0–1 h | 1–1.5 h |
|---|---|---|---|
| 1st extraction | 3 min | 4 min | 4 min |
| 2nd extraction | 5 min | 5 min | 5 min |
| 3rd extraction | 5 min | 6 min | 10 min |
The process system comprised the online NIR scanning sensor and the extraction equipment, which contained a sampling device (Figure 1). The whole process can be described as follows: the CHM was added to the tank and extracted with deionized water. The extraction solution was circulated through the bypass by a pump. Bubbles and suspended solids can interfere considerably with the spectra. For this reason, 80 μm and 100 μm filters in the bypass were used to remove them from the extraction solution [24].
Temperature was recorded in real time using thermometers. Throughout the extraction process, spectra were collected using online NIR instruments with optical fibers. After each NIR scan was completed, the switch was opened, and about 10 mL of extraction solution was collected for HPLC determination.
Figure 1. Platform of extraction.
2.3. NIR Equipment and Measurement
The NIR spectra were collected online using fiber optic probes. NIR radiation was applied through a 2 mm optical path using an XDS process analyzer and VISION software (Foss NIR System, Foss, Silver Spring, MD, USA). The spectra covered the wavelength range from 800 nm to 2200 nm. Each spectrum was the average of 32 scans, and the wavelength increment was 0.5 nm.
2.4. HPLC Method for Fructus aurantii
Naringin, hesperidin and neohesperidin standards were accurately weighed using an XS205DU electronic balance (Mettler Toledo, Greifensee, Switzerland). The standards were dissolved in methanol to give hesperidin, naringin and neohesperidin concentrations of 0.1392 mg/mL, 0.1380 mg/mL and 0.1044 mg/mL, respectively. An HPLC assay was used to determine the levels of hesperidin, naringin and neohesperidin [19].
The concentration of the solution obtained from the extraction process was too high for the HPLC assay. For this reason, during the heating and boiling phases of the first extraction, 5 mL and 1 mL of the extraction solution, respectively, were transferred into 25 mL volumetric flasks and diluted to volume with 20% aqueous methanol. During the heating and boiling phases of the second extraction, 2.5 mL and 2 mL of the extraction solution, respectively, were transferred into 25 mL volumetric flasks and diluted to volume with 20% aqueous methanol. During the third extraction stage, the solution was used directly without dilution. All solutions were filtered through a 0.45 μm membrane filter before analysis.
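Because the diluted solutions are assayed, each HPLC result has to be multiplied by its dilution factor to recover the concentration in the extraction tank. The short sketch below works through that arithmetic. It assumes that the first volume in each pair refers to the heating phase and the second to the boiling phase; the dictionary keys and function names are hypothetical.

```python
FLASK_ML = 25.0  # volume of the volumetric flask stated in the text

# Aliquot volumes (mL) taken per extraction and phase. The assignment of the
# two volumes to "heating" and "boiling" is an assumed reading of the text.
ALIQUOT_ML = {
    ("1st", "heating"): 5.0,
    ("1st", "boiling"): 1.0,
    ("2nd", "heating"): 2.5,
    ("2nd", "boiling"): 2.0,
}


def dilution_factor(aliquot_ml, flask_ml=FLASK_ML):
    """Factor by which an aliquot is diluted when made up to the flask volume."""
    return flask_ml / aliquot_ml


def original_concentration(measured_mg_per_ml, aliquot_ml=None):
    """Back-calculate the undiluted concentration; None means no dilution."""
    if aliquot_ml is None:            # third extraction: analyzed directly
        return measured_mg_per_ml
    return measured_mg_per_ml * dilution_factor(aliquot_ml)
```

Under these assumptions the dilution factors are 5 and 25 for the first extraction and 10 and 12.5 for the second.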
The chromatographic analysis of Fructus aurantii was carried out using a Waters 2695 HPLC system with a Waters 2996 diode array detector (Waters Technologies, Milford, MA, USA). The sample solutions were analyzed by reverse-phase chromatography on a Diamonsil C18 column (250 mm × 4.6 mm, Dikma, Beijing, China) with gradient elution. The mobile phase consisted of acetonitrile and deionized water containing 0.1% acetic acid (v/v), at a flow rate of 1.0 mL/min. The column temperature was 30 °C and the detection wavelength was set to 283 nm. A 10 μL volume of the sample solution was injected into the HPLC system for analysis.
2.5. Preprocessing and Variable Selection Methods
To improve model performance, first (1D) and second (2D) derivatives were used to reduce baseline variation and to enhance spectral features [25]. A Savitzky-Golay smoothing filter with an 11-point window was then applied to suppress the background noise amplified by differentiation [26]. Standard normal variate (SNV) and multiplicative scatter correction (MSC) were used to reduce the influence of small particles in the extraction solution [27]. Normalization was also applied to the raw spectra in order to produce an accurate model [29].
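The preprocessing steps named above are simple row-wise transformations of the spectral matrix (one spectrum per row). The sketch below shows one common formulation of each. The 11-point window and the 0.5 nm increment come from the text, whereas the polynomial order of the Savitzky-Golay filter, the use of the mean spectrum as the MSC reference and all function names are assumptions.

```python
import math

import numpy as np


def snv(spectra):
    """Standard normal variate: center and scale each spectrum individually."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum."""
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra, dtype=float)
    for i, s in enumerate(spectra):
        slope, intercept = np.polyfit(ref, s, 1)   # s ≈ intercept + slope * ref
        corrected[i] = (s - intercept) / slope
    return corrected


def savgol(spectra, window=11, polyorder=2, deriv=1, delta=0.5):
    """Savitzky-Golay smoothing (deriv=0) or derivative via local polynomial fits.

    Only positions with a full window are returned, so each output row is
    shorter than the input by window - 1 points.
    """
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    design = np.vander(offsets, polyorder + 1, increasing=True)
    # Row `deriv` of the pseudo-inverse gives the weights of that coefficient.
    coeffs = np.linalg.pinv(design)[deriv] * math.factorial(deriv) / delta ** deriv
    # np.convolve flips its kernel, so reverse the weights first.
    return np.array([np.convolve(s, coeffs[::-1], mode="valid") for s in spectra])
```

Calling `savgol(spectra, deriv=1)` or `savgol(spectra, deriv=2)` would correspond to the 1D and 2D treatments; whether smoothing and differentiation were applied in one step or two is not stated in the text.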
The chemometric methods synergy interval partial least squares (SiPLS) and moving window partial least squares (MWPLS) were used to select variables [30]. In the SiPLS algorithm, the full spectrum is split into a number of smaller intervals, joint PLS models are built on combinations of several intervals, and the combination with the lowest root mean square error of cross-validation (RMSECV) is selected as the best sub-interval combination. MWPLS was used to identify the informative regions and to approximate the latent factors. In effect, a window of size H is moved across the whole spectrum and a PLS model is built at each window position. The RMSECV value calculated at each position is used to find the best spectral regions of size H. Regions whose RMSECV is lower than that of the full-spectrum PLS model are regarded as informative.
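The moving-window search can be illustrated as follows. This is a minimal sketch rather than the toolbox used in this study: it assumes a NIPALS single-response PLS, interleaved k-fold cross-validation and a fixed number of latent variables for every window, and all function names are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model (NIPALS); return (coefficients, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        norm = np.linalg.norm(w)
        if norm < 1e-12:
            break
        w = w / norm
        t = Xk @ w
        tt = t @ t
        p = Xk.T @ t / tt
        c = yk @ t / tt
        Xk = Xk - np.outer(t, p)
        yk = yk - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    if not W:
        return np.zeros(X.shape[1]), y_mean
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)
    return beta, y_mean - x_mean @ beta


def rmsecv(X, y, n_components, n_folds=5):
    """Root mean square error of k-fold cross-validation for a PLS model."""
    folds = np.arange(len(y)) % n_folds      # interleaved folds (an assumption)
    press = 0.0
    for f in range(n_folds):
        train, test = folds != f, folds == f
        beta, b0 = pls1_fit(X[train], y[train], n_components)
        press += np.sum((y[test] - (X[test] @ beta + b0)) ** 2)
    return float(np.sqrt(press / len(y)))


def moving_window_pls(X, y, window, n_components):
    """RMSECV of a PLS model built at every position of a window of size H."""
    starts = range(X.shape[1] - window + 1)
    return np.array([rmsecv(X[:, s:s + window], y, n_components) for s in starts])
```

Plotting the returned values against the window position, together with the RMSECV of the full-spectrum model as a horizontal line, gives the usual MWPLS diagram from which informative regions are read off.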
2.6. Software and Data Analysis
Data analysis was performed using the Unscrambler 9.6 software package (CAMO Software AS, Trondheim, Norway) and in-house routines written in MATLAB (MATLAB v7.0, The MathWorks, Natick, MA, USA). The SiPLS and MWPLS toolboxes used to select the informative variables were downloaded from the Internet [32]. Other algorithms were written by the current team on the basis of the Norgaard algorithms. Using the Kennard-Stone (KS) algorithm, the 75 samples were divided into 50 calibration samples and 25 validation samples. The coefficient of determination in calibration (R²cal), the coefficient of determination in cross-validation (R²val), the coefficient of determination in prediction (R²pre), the root mean square error of calibration (RMSEC), the root mean square error of cross-validation (RMSECV) and the root mean square error of prediction (RMSEP) were used to evaluate the PLS and SiPLS models. The MWPLS and bagging-PLS models were evaluated according to the RMSECV value.
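For illustration, the sample split and the evaluation statistics can be sketched as below. This is an assumed, textbook form of the Kennard-Stone algorithm (Euclidean distance, starting from the two most distant samples) together with the usual definitions of RMSE and the coefficient of determination; the function names are hypothetical and the study's own routines may differ in detail.

```python
import numpy as np


def kennard_stone(X, n_select):
    """Return (selected, remaining) row indices chosen by Kennard-Stone."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]              # start with the two farthest samples
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_select:
        # Distance from each candidate to its nearest already-selected sample.
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining


def rmse(y_true, y_pred):
    """Root mean square error (RMSEC, RMSECV or RMSEP, depending on the data)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SSE / SST."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    sse = np.sum((y_true - y_pred) ** 2)
    sst = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - sse / sst)
```

With a 75-row spectral matrix, `kennard_stone(X, 50)` would return 50 calibration indices and the 25 remaining validation indices, matching the split sizes given in the text.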