Application of Transmission Raman Spectroscopy in Combination with Partial Least-Squares (PLS) for the Fast Quantification of Paracetamol

In recent years, transmission Raman spectroscopy (TRS) has emerged as a potent tool for rapid, nondestructive quantitation in pharmaceutical manufacturing. To expand the applicability of TRS and enhance its use in product quality monitoring during drug production, we applied partial least-squares (PLS) regression to build a calibration model from 150 handmade tablets covering 15 concentration levels, designed using a multifactor orthogonal design of experiment (DOE). The model was used to predict the concentrations of handmade validation tablets, and the differences between the HPLC and TRS results were negligible. The model was then used to predict the active pharmaceutical ingredient (API) content of four randomly selected commercial paracetamol tablet brands, and was corrected with the spectra of the commercial tablets to obtain four corresponding models. The relative error of the predicted content after correction with commercially available tablets was significantly lower than that before correction. The corrected model was used to make predictions for 20 tablets of the brand Panadol. Compared with the HPLC results, the relative error of prediction was generally less than 4.00%, and the relative standard deviation (RSD) of the content was 0.86%.


Introduction
In pharmaceutical manufacturing and finished product testing, determining the content of drugs using high-performance liquid chromatography (HPLC) testing is not only time-consuming but also destructive. In recent years, transmission Raman spectroscopy (TRS) has been widely used in the quantification of API and excipients in drugs [1][2][3] and the quantification of polymorphs in pharmaceutical formulations [4,5]. It is a fast and practical technique and also has the ability to obtain highly chemical-specific information and quantitative volumetric data from thick and highly turbid samples [6][7][8].
Compared to HPLC, TRS requires no sample preprocessing, does not damage the sample, and is fast [9]. In addition, unlike the backscatter mode, the transmission Raman geometry can reduce the difference between the results of TRS and HPLC. It offers much improved accuracy and precision by maximizing the sampling volume: the laser beam is directed onto the sample from one side and the Raman signal is collected from the other side, so the laser photons travel through the entire body of the sample and convey molecular spectroscopic information on its volumetric content [10][11][12][13].
An additional benefit of this method over conventional backscattering Raman spectroscopy is the ability to suppress Raman and fluorescence signals from a tablet coating or capsule shell [14,15]. TRS typically exhibits excellent specificity, with many sharp and distinct features that can be assigned readily to individual components [16], while near-infrared spectroscopy (NIRS) results often contain broader and overlapping features [17,18]. This makes it easier to interpret TRS results and visualize changes in composition.
TRS spectra contain multiple peaks from the various Raman-active compounds in a sample, and these peaks overlap with each other, so the spectra are information-rich but complex. Chemometrics (multivariate analysis) allows us to analyze these complex data. The main multivariate analysis methods are partial least-squares (PLS), principal component analysis (PCA), partial least-squares discriminant analysis (PLS-DA), and constrained regularization (CR) [19]. Among these, PLS is a multivariate data analysis method that builds on principal component analysis and principal component regression. It is one of the most widely used multivariate calibration methods; it has good selectivity and prediction accuracy, and is suitable for complex multicomponent spectra. PLS can handle collinearity in the data and effectively reduce the dimensionality of spectral data.
Paracetamol is a nonsteroidal antipyretic and analgesic mainly used to treat fever, headache, joint pain, and other symptoms caused by the common cold or influenza [20]. Currently, in the manufacturing of paracetamol tablets and the quantitation of the final product, HPLC is usually used to measure the API content, presenting disadvantages such as the consumption of the chromatographic column and solvent, complicated preprocessing, and deviations in results obtained by different operators. In addition, electroanalytical [21], capillary electrophoretic [22,23], and spectrophotometric methods [24] have also been applied to the determination of paracetamol content [25].
The use of TRS for quantification has been previously reported [26,27]; for example, Griffen et al. [1] used this method to quantify all the constituents in a set of tablets consisting of five components (including paracetamol). Their work was a proof-of-concept study, which provided sufficient theoretical support for our experiments. The authors demonstrated the feasibility of the technology using compound handmade tablets, but commercially available tablets are often not suitable for evaluation with such a model because of differences in the composition, proportions, and shape of the tablets. Additionally, modeling parameters such as acquisition time, laser power, and wavelength were not optimized in that study. In this study, we used the spectra of commercially available tablets to correct the established model, which made the model more widely applicable and reduced the time required for modeling, so that it could be used for high-throughput analysis, online batch quality control, and nondestructive analysis of continuous production processes.
The purpose of this research was to develop a method for determining the content of paracetamol tablets using transmission Raman spectroscopy in combination with PLS. The model was optimized by changing the type of signal collector, wavelength, preprocessing method, and other parameters, and was corrected using HPLC results in order to predict the contents of paracetamol tablets. The API contents of currently marketed paracetamol tablets were predicted by the model and compared with the HPLC results; the comparison suggests that the model can be used in pharmaceutical production processes. The specific process is shown in Figure 1.

Method Feasibility
In the feasibility stage, we assessed the viability of a TRS application without venturing into a complete method development effort; we compared the API and excipients using a principal component analysis (PCA), a dose-response analysis, or another appropriate assessment of method feasibility [28,29]. The powdered API and the mixture of excipients were dispensed into small, clear 7.5 cm² plastic bags and scanned by TRS with the best acquisition parameters. Because of the large quantity of powder and the high volumetric sensing capability of TRS, the signal contribution of the thin plastic bag can be considered negligible.
The original spectrum in Figure 2 shows that, compared to the other components, paracetamol has unique characteristic Raman peaks at 840 cm⁻¹, 1170 cm⁻¹, 1240 cm⁻¹, 1324 cm⁻¹, and 1550-1670 cm⁻¹. The peaks at 840 cm⁻¹ and 1550-1670 cm⁻¹ originate from out-of-plane C-H bending and from the amide I and amide II bands, respectively [30]. The peaks at 1170 cm⁻¹, 1240 cm⁻¹, and 1324 cm⁻¹ are derived from the symmetric stretching vibration of C-N-C, the stretching vibration of the benzene-ring -OH, and the symmetric deformation of CH₃, respectively. These features remained recognizable by TRS after the API was mixed with other substances, so TRS could be used to quantify the API content of paracetamol tablets.

Development of PLS Calibration Model
PLS is a calibration algorithm, a multivariate analysis (MVA) method used for the analysis of mixtures [31]. In the development of the PLS calibration model, seven levels were initially selected for calibration (* in Table 1). We explored acquisition and processing parameters, including the type of signal collector, laser power, acquisition time, and preprocessing method, in order to build the most suitable model. Each model was assessed by its root mean square error of calibration (RMSEC), root mean square error of cross-validation (RMSECV), root mean square error of prediction (RMSEP), and linearity (R²). In a suitable calibration model, there should be no significant difference between RMSEC and RMSECV; a significant difference means that the samples are not representative or that the model has not extracted sufficient information. RMSEP was used as the index of prediction accuracy. The most suitable models were those that combined low RMSECV and RMSEC with good linearity. The quality of the model depends strongly on the instrument parameters and data processing methods. Comparing the results obtained with different acquisition parameters (Figure 3), we concluded that the most suitable model used a 4 mm laser illumination spot diameter at 0.5 W laser power with an M-type signal collector and a 10.5 s acquisition time (0.35 s × 30 accumulations), and that the optimum spectral processing method was derivative, multiplicative scatter correction, and mean center (DMM) rather than baselined, standard normal variate, and mean center (BSM) (data not shown). To make the model more widely applicable, we increased the number of levels from 7 to 15 (all levels in Table 1) and the number of tablets at each level from 5 to 10 (data not shown).
Additionally, we selected three latent variables instead of four to avoid interference from other non-characteristic peaks, although the RMSEC and RMSECV values obtained with four latent variables were closer to each other.
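The comparison of RMSEC and RMSECV across latent variable numbers can be illustrated with a minimal sketch. It is not the Solo implementation used in this study: the NIPALS PLS1 routine, the 5-fold cross-validation scheme, and the simulated two-band spectra (with hypothetical API contents of 60-90%) are all assumptions made for illustration only.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with the NIPALS algorithm."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean              # mean-centred copies, deflated below
    W = np.zeros((X.shape[1], n_components))     # weight vectors
    P = np.zeros((X.shape[1], n_components))     # X loadings
    q = np.zeros(n_components)                   # y loadings
    for a in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w                               # scores of this latent variable
        tt = t @ t
        p = Xr.T @ t / tt
        q[a] = yr @ t / tt
        Xr = Xr - np.outer(t, p)                 # deflate X
        yr = yr - q[a] * t                       # deflate y
        W[:, a], P[:, a] = w, p
    coef = W @ np.linalg.solve(P.T @ W, q)       # regression vector in spectral space
    return coef, x_mean, y_mean

def pls1_predict(model, X):
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean

def rmse(predicted, reference):
    return float(np.sqrt(np.mean((predicted - reference) ** 2)))

def rmsecv(X, y, n_components, n_folds=5, seed=0):
    """Root mean square error of k-fold cross-validation (fold scheme is an assumption)."""
    order = np.random.default_rng(seed).permutation(len(y))
    squared_error = 0.0
    for fold in np.array_split(order, n_folds):
        train = np.setdiff1d(order, fold)
        model = pls1_fit(X[train], y[train], n_components)
        squared_error += np.sum((pls1_predict(model, X[fold]) - y[fold]) ** 2)
    return float(np.sqrt(squared_error / len(y)))

# Hypothetical data: 15 levels x 10 tablets, one API band and one excipient band.
rng = np.random.default_rng(1)
channel = np.linspace(0.0, 1.0, 200)
api_profile = np.exp(-((channel - 0.3) / 0.02) ** 2)
excipient_profile = np.exp(-((channel - 0.5) / 0.05) ** 2)
levels = np.repeat(np.linspace(60.0, 90.0, 15), 10)       # assumed API content, %
spectra = (np.outer(levels, api_profile)
           + np.outer(100.0 - levels, excipient_profile)
           + rng.normal(0.0, 0.5, (levels.size, channel.size)))

results = {}
for n_lv in (1, 2, 3, 4):
    model = pls1_fit(spectra, levels, n_lv)
    results[n_lv] = (rmse(pls1_predict(model, spectra), levels),
                     rmsecv(spectra, levels, n_lv))
    print(f"LV={n_lv}  RMSEC={results[n_lv][0]:.3f}  RMSECV={results[n_lv][1]:.3f}")
```

In such a table, RMSEC can only decrease as latent variables are added, whereas RMSECV stops improving once the extra variables start fitting noise, which is why the gap between the two is used as a selection criterion.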
As shown in Figure 4, when the acquisition range was 1700-170 cm⁻¹, the RMSEP was the lowest and the RMSEC and RMSECV were the closest, with no significant difference between them, so the model had good prediction accuracy. Furthermore, the excipients had a very distinctive peak at about 180 cm⁻¹ (as shown in Figure 2), which the acquisition range should contain so that the relative intensities of the API and excipient signals could be compared during the modeling process.
Table 1 notes: * indicates calibration samples; • indicates samples removed from calibration for use as independent validation samples; ♦ indicates the mixture of crospovidone (23.41%), sodium propyl p-hydroxybenzoate (2.34%), povidone k25 (9.93%), alginic acid (59.48%), silica (1.63%), and magnesium stearate (3.21%).
In order to make the results predicted by the model closer to the true values, the API theoretical concentration (in Table 1) in the original model was replaced with the actual concentration measured by HPLC and the model was corrected after optimizing parameters.
The scores of the model built using 15 levels are exhibited in Figures 5 and 6. As shown in Figure 5, we assessed the degree of dispersion of the data from different perspectives, examining whether there were particularly extreme points in the data. Hotelling's T² and Q-residuals are further model statistics that can be used to judge model performance and sample quality within a calibration sample set. The two statistics describe the similarity of samples within the calibration space. From Figure 5A, it can be seen that most of the samples sat within the reduced statistic threshold, but the red samples (level 15 in Table 1) were outliers and sat away from the other samples. It may be that unknown compounds produced noise interference during mixing with the API. The Residuals vs. Leverage plot in Figure 5B was used to judge whether there were extreme points, and we found one yellow point (one sample of level 10 in Table 1) that sat away from the other samples of the same level. In the manual tableting process, mixing time affects the similarity of tablets at the same concentration level. However, Figure 5C shows that all 150 samples were located within the 95% confidence level and that samples of the same color were close to each other, so outliers did not need to be excluded and the model score was acceptable.
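As an illustration of how the two statistics are obtained, the sketch below computes Hotelling's T² and Q-residuals from a PCA decomposition of simulated spectra. PCA is used here as a simple stand-in for the PLS score space of the actual model, the spectra and the off-model peak added to one sample are hypothetical, and the confidence thresholds shown in Figure 5 are not reproduced.

```python
import numpy as np

def t2_and_q(X, n_components):
    """Hotelling's T2 and Q-residuals of each row of X for a PCA model."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    score_var = s[:n_components] ** 2 / (X.shape[0] - 1)
    t2 = np.sum(scores ** 2 / score_var, axis=1)       # distance inside the model plane
    residuals = Xc - scores @ loadings.T
    q = np.sum(residuals ** 2, axis=1)                 # distance from the model plane
    return t2, q

# Hypothetical spectra: one band whose height follows the API content.
rng = np.random.default_rng(0)
channel = np.linspace(0.0, 1.0, 200)
profile = np.exp(-((channel - 0.3) / 0.05) ** 2)
content = np.repeat(np.linspace(60.0, 90.0, 15), 10)
X = np.outer(content, profile) + rng.normal(0.0, 0.1, (150, 200))
X[42] += 5.0 * np.exp(-((channel - 0.8) / 0.02) ** 2)  # sample with a peak the others lack

t2, q = t2_and_q(X, n_components=1)
print("sample with largest Q-residual:", int(np.argmax(q)))
```

A sample with a high T² is extreme along the directions the model already describes, while a high Q-residual indicates spectral variation the model does not describe, such as an unexpected compound.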

The RMSEC (1.0144), RMSECV (1.0972), and R² (0.881, 0.861) values are shown in Figure 6. The closeness of the RMSEC and RMSECV values indicates that the model scores were acceptable. The 150 points were somewhat scattered, which may have been caused by insufficient mixing of materials during sample preparation; increasing the mixing time could reduce the scatter and improve R². The quality of the model should not be judged only by the model score, but also by the relative error between the predicted value and the true value during the validation process, which is discussed in Section 2.1.3.

Model Validation
In order to evaluate whether the model was established successfully, it was necessary to use the model to predict results for actual tablets. The established model was validated by taking the HPLC result as the true value and the TRS result as the predicted value. The model built with 15 levels was used to predict the contents of 15 samples (3 levels (• in Table 1) × 5 tablets), which were measured by both TRS and HPLC, in order to validate the feasibility of the model. The results in Figure 7 show that the relative errors between TRS and HPLC were generally within 2%, which indicates that the model predicted the API content accurately. Three samples exceeded 2%; because of the high concentration of API used in this study, the API and excipients may not have been uniformly mixed during manual tableting, which may have caused differences between tablets. Mixing by machine or increasing the mixing time could eliminate this difference. Additionally, as shown in Table 2, the linear regression equation was y = 0.7672x + 16.79 (where x is the content measured by HPLC and y is the content measured by TRS), with an R² of 0.9099. These results show that TRS provided accurate and precise measurements at high speed.
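The two validation measures used here, the relative error of each TRS prediction against HPLC and the regression line of TRS on HPLC, can be computed as in the following sketch. The function names and the five content values are hypothetical and are not data from Table 2.

```python
import numpy as np

def relative_error_percent(predicted, reference):
    """Signed relative error of the prediction, in percent of the reference."""
    return (predicted - reference) / reference * 100.0

def fit_line(x, y):
    """Least-squares line y = slope * x + intercept and its R2."""
    slope, intercept = np.polyfit(x, y, 1)
    fitted = slope * x + intercept
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(slope), float(intercept), float(1.0 - ss_res / ss_tot)

# Hypothetical API contents (%) of five validation tablets.
hplc = np.array([72.1, 74.0, 76.3, 78.8, 80.2])   # reference values
trs = np.array([71.5, 74.9, 75.8, 79.6, 80.9])    # model predictions

errors = relative_error_percent(trs, hplc)
slope, intercept, r2 = fit_line(hplc, trs)
print("relative errors (%):", np.round(errors, 2))
print("tablets above 2 %:", int(np.sum(np.abs(errors) > 2.0)))
print(f"y = {slope:.4f}x + {intercept:.2f}, R2 = {r2:.4f}")
```

When reading such a fit, a slope near 1 with an intercept near 0 indicates agreement without proportional or constant bias between the two methods.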


Quantification of Marketed Paracetamol Tablets
The composition and proportions of commercially available tablets often differ from those of the handmade tablets used to build the model. To test the applicability of the model, four commercially available paracetamol tablet brands (Panadol, Anlipai, Guike, and Jinlu) were randomly selected (four brands × five tablets), and two tablets of each brand were scanned using TRS to obtain eight spectra (four brands × two tablets). The two spectra of each brand were added to the existing model to correct it, so that each brand had its own corrected model, which was used to predict the contents of the other three tablets of that brand. The predicted contents were then compared with the HPLC results (Table 3). As shown in Figure 8, the models established using two tablets of the commercial drug together with the 15 levels (152 spectra in total) predicted the API content of the commercial tablets more accurately. Compared with the model built with only the 15 levels (150 spectra in total), the relative error was greatly reduced. In the four corrected models, the relative errors of the content predicted by TRS were less than 5% compared with the HPLC results. Although the compositions and proportions of the four branded tablets differed from those of the handmade tablets, the relative errors of prediction were within the acceptable range, which shows that the corrected model is suitable for determining the API content of commercially available paracetamol tablets. For a different manufacturer, that manufacturer's tablets can be used to correct the model and make it suitable for the manufacturer's paracetamol tablets, which makes the approach widely applicable.
Repeatability is one of the most important factors in quantitative assays using TRS. A total of 20 paracetamol tablets sold under the brand Panadol (17.60 × 7.42 × 5.00 mm³, with white coating) were selected, and their content was measured by TRS (Table 4 and Figure 9). The RSD of the content measured by TRS was 0.86%, and the relative error between the HPLC and TRS results was generally within 4.00%. TRS completed the determination of the API content of the 20 tablets rapidly. Compared with HPLC, it greatly reduces analysis time and has good accuracy and repeatability.
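For reference, the RSD reported above is the sample standard deviation expressed as a percentage of the mean. A minimal sketch follows; the function name and the six content values are hypothetical, and the use of the n - 1 denominator is an assumption, since the text does not state which form was used.

```python
import numpy as np

def rsd_percent(values):
    """Relative standard deviation in percent (sample standard deviation, n - 1)."""
    values = np.asarray(values, dtype=float)
    return float(values.std(ddof=1) / values.mean() * 100.0)

# Hypothetical TRS-predicted API contents (%) for tablets of one brand.
predicted_content = [84.2, 85.1, 83.9, 84.7, 85.4, 84.0]
print(f"RSD = {rsd_percent(predicted_content):.2f} %")
```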

Preparation of Samples
The formulations were determined according to the preparation instructions, and powder mixtures were prepared according to the DOE to make the samples. A total of 150 samples (15 levels × 10 tablets) were prepared according to the design shown in Table 1. The multi-ingredient powder blend prepared for each level was sufficient to make 10 tablets. This allowed us to cover the whole calibration space while minimizing the number of samples to be prepared. The mixtures of API and excipients were mixed and compressed into 10 tablets per level using a 17.5 × 7.5 × 5.5 mm³ flat-surface tablet die in a DP30A single-punch tablet machine. The tablets weighed approximately 660 mg on average, with a range between 650 and 670 mg.

Transmission Raman Spectroscopy Conditions
The TRS spectra of the API and the mixture of excipients were collected using a TRS instrument and ContentQC software. The acquisition parameters were a 4 mm laser illumination spot diameter at 0.5 W laser power with an M-type signal collector, and the acquisition time was 10.5 s (0.35 s × 30 accumulations). The spectral processing method was derivative, multiplicative scatter correction, and mean center (DMM). All TRS spectra were recorded from 1700 cm⁻¹ to 170 cm⁻¹ and imported into the Solo software with the corresponding concentrations (%) to build a calibration model.
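The DMM preprocessing sequence can be sketched as follows. This is an illustration rather than the Solo implementation: the derivative order and any smoothing window are not stated in the text, so a plain first derivative is assumed, and the three test spectra (one band shape with different gains and offsets) are hypothetical.

```python
import numpy as np

def first_derivative(spectra, shift):
    """First derivative along the Raman-shift axis (removes constant offsets)."""
    return np.gradient(spectra, shift, axis=1)

def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference (default: mean spectrum)."""
    reference = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty(spectra.shape, dtype=float)
    for i, spectrum in enumerate(spectra):
        slope, intercept = np.polyfit(reference, spectrum, 1)
        corrected[i] = (spectrum - intercept) / slope
    return corrected, reference

def mean_center(spectra, mean=None):
    """Subtract the column mean (default: mean of the given spectra)."""
    mean = spectra.mean(axis=0) if mean is None else mean
    return spectra - mean, mean

def dmm(spectra, shift):
    """Derivative, then MSC, then mean centring, in the order named in the text."""
    derived = first_derivative(spectra, shift)
    scattered, reference = msc(derived)
    centered, mean = mean_center(scattered)
    return centered, reference, mean

# Hypothetical spectra: same band shape, different gain and baseline offset.
shift = np.linspace(1700.0, 170.0, 400)                      # cm^-1
band = (np.exp(-((shift - 840.0) / 15.0) ** 2)
        + 0.6 * np.exp(-((shift - 1324.0) / 20.0) ** 2))
gains = np.array([0.8, 1.0, 1.3])
offsets = np.array([5.0, -2.0, 0.5])
raw = gains[:, None] * band + offsets[:, None]

processed, msc_reference, column_mean = dmm(raw, shift)
print("largest value after DMM:", np.abs(processed).max())
```

Because the three test spectra differ only in gain and offset, the processed values are essentially zero: the derivative removes the offsets and MSC removes the gains. The reference and column mean returned by `dmm` would have to be reused when preprocessing new tablets, so that calibration and prediction spectra are treated identically.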

Chromatographic Conditions of HPLC-UV
The API contents of the tablets were evaluated using an HPLC-UV method. The separation was carried out at 30 °C using an MGII C18 column (250 mm × 4.6 mm, 5 µm; Capcell Pak, Shiseido, Japan), with a mobile phase of methanol:water (1:3) at a flow rate of 1.5 mL min⁻¹ for 6 min. Detection was carried out at 243 nm. Under these conditions, paracetamol had a retention time of 3.95 min. The API contents measured by HPLC were entered into the model to replace the original theoretical contents in order to calibrate the model.

Conclusions
This study demonstrated the feasibility of quantifying the content of pharmaceutical tablets noninvasively using TRS. The model was established and optimized by changing the parameters, then used to measure the contents of four commercially available paracetamol tablets. For the quantification of active ingredients, TRS was found to be suitable for multivariate regression model development, resulting in models with increased predictive capacity. The method can greatly reduce the analysis period and sample consumption, achieve online analysis on the production line, and realize the quality control of drugs during the production process. The ability to yield spectrum-specific information and rapidly predict the content of API will unlock a range of new applications in pharmaceutical settings.