A Novel Validated Injectable Colistimethate Sodium Analysis Combining Advanced Chemometrics and Design of Experiments

Colistimethate sodium (CMS) is widely administered for the treatment of life-threatening infections caused by multidrug-resistant Gram-negative bacteria. Until now, the quality control of CMS formulations has been based on microbiological assays. Herein, an ultra-high-performance liquid chromatography method with ultraviolet detection (UPLC-UV) was developed for the quantitation of CMS in injectable formulations. Design of experiments was used to optimize the chromatographic parameters. The chromatographic separation was achieved on a Waters Acquity BEH C8 column by gradient elution with a mobile phase consisting of (A) 0.001 M aqueous ammonium formate and (B) methanol/acetonitrile 79/21 (v/v). CMS compounds were detected at 214 nm. In all, 23 univariate linear-regression models were constructed to measure the CMS compounds separately, and one partial least-squares regression (PLSr) model was constructed to assess the total CMS amount in the formulations. The method was validated over the range 100–220 μg mL−1. The developed methodology was employed to analyze several batches of CMS injectable formulations, which were also compared against a reference batch using Principal Component Analysis, similarity and distance measures, heatmaps and the structural similarity index. The methodology was based on freely available software so that it would be readily available to the pharmaceutical industry.


Introduction
The increasing resistance of Gram-negative bacteria (GNB) to all known antibiotics (e.g., penicillins, aminoglycosides and β-lactams) and the absence of new, effective drugs against multidrug-resistant (MDR) bacteria have led to the reconsideration of old-generation antibiotics such as the polymyxins [1][2][3][4]. Colistin (polymyxin E) re-emerged in the 1990s to treat MDR-GNB infections for which the available therapies had become ineffective [5,6]. It is considered a last-resort antibiotic in the era of antibiotic resistance; however, it induces high neuro- and nephrotoxicity, so its administration requires careful consideration of the patient's condition, adjustment of the dosing regimen and assessment of the risk-benefit balance. Colistin can be administered as colistin sulfate (CS), either orally or topically, and as colistimethate sodium (CMS), parenterally or by inhalation. CMS is the inactive, less-toxic prodrug of colistin, which is hydrolyzed in vivo to the active colistin [7].
The knowledge of CMS pharmacokinetics is important for determining the dosing regimen to minimize the drug's toxicity and increase its therapeutic window. Hence, several analytical methodologies have been reported for the indirect determination of CMS in biological fluids after its hydrolysis to colistin, based mainly on liquid chromatography coupled to mass spectrometry (LC-MS) [8][9][10][11][12][13].
CMS is produced by sulfomethylation of the five free primary amine groups of colistin through reductive amination, by the sequential addition of formaldehyde and sodium hydrogen sulfite. Nevertheless, the actual structure of CMS has not been clarified. Barnett et al. reported that the CMS structure involves the mono-sulfomethylation of one to five amino groups [14], whereas the EMA reported that the amino groups are either unsubstituted or bis-sulfomethylated [15]. Thus, the evidence regarding the degree of sulfomethylation and the multiplicity of substitution of the amine groups is conflicting. He et al. [16] demonstrated that CMS is in fact a mixture of various methanesulfonate derivatives. Differences have been observed in the content of four CMS brands, which led, after intravenous administration in rats, to different plasma concentration-time profiles of the formed active colistin. It is worth mentioning that, although these products were standardized by in vitro microbiological assays, the exposure to active colistin differed in vivo, emphasizing the need for standardization of the CMS content to achieve precise control over the drug's bioavailability [17,18]. It should be noted that the pharmacopoeia contains no analytical methodology for the quality control of CMS pharmaceutical products, partly because these products were approved by the FDA in the late 1950s, when control procedures were much less strict.
Our laboratory previously reported an ultra-high-performance liquid chromatography-mass spectrometry (UPLC-MS) methodology for the chemical characterization of the CMS content of injectable formulations [19]. The aim of the present study was to develop a liquid chromatography method with ultraviolet detection (UPLC-UV) that can be applied as a quality-control procedure for CMS injectable formulations. This method differs essentially from the previously described UPLC-MS method because the experimental design encompasses a different variable: the detection wavelength instead of the mass spectrometric peak. Furthermore, the mobile phase needs to be optimized for the UV signal, because UV detection is affected by the mobile phase in a very different way from mass spectrometry. The data treatment also follows a completely different pipeline, as the UV data are univariate and are collected by a low-resolution technique. Another step forward compared with UPLC-MS is the use of CMS peak ratios and their statistical treatment, which yields a single metric that can be employed to characterize the quality of the tested CMS batch.
Pharmaceutical companies mainly employ UV-based instrumentation, so such a method was deemed useful for filling the gaps left by the currently used microbiological assays, which are generally considered to be of rather limited accuracy compared with chemical analysis methods. Metcalf et al. [20] reported an HPLC-UV method for the CMS assay in pharmaceutical aerosol samples, but only two chromatographic peaks of CMS were taken into account for the quantitation. In this study, all the chromatographic peaks corresponding to CMS compounds were taken into consideration for the assay of CMS in injectable formulations. The method was fully validated and applied to commercial batches to assess their content consistency.
An effort was made to use freely available software so that the scientific community and pharmaceutical companies could have unrestricted access to the presented method for use as a CMS quality-control procedure. The corresponding code is provided in the Supplementary Materials so that it can be used readily.

Design of Experiments
CMS is an extremely complex mixture, so an optimized separation of its compounds was considered critical for revealing the maximum possible number of underlying peaks and minimizing matrix effects and interferences. Therefore, a Design-of-Experiments (DoE) strategy targeting the separation of the mixture components was followed. It should be noted that Design-Expert 11 software (StatEase) is not freely available; however, it is not needed for routine use: the optimized parameters are fixed, so the method can be applied without repeating the DoE before each analysis.
In our previously published UPLC-MS method, an extensive optimization of both the chromatographic and the mass-spectrometric parameters was performed. As the same UPLC instrument and analytical column were used for the current UPLC-UV method, it was deemed unnecessary to re-optimize the ratio of the organic modifiers and the gradient elution program, because a very good separation of the CMS components had already been achieved with these parameters. Because the detector was changed from a mass spectrometer to a UV detector, it was necessary to optimize the UV-related parameters. Thus, the ammonium formate concentration in the aqueous mobile phase was optimized, as it was found to affect the intensity of the CMS chromatographic peaks, presumably because ammonium formate itself absorbs UV light in the wavelength range used for detection. The wavelength was also optimized, since the baseline drift and the retention times changed with the buffer concentration, and the CMS peaks eluted at different ratios of the two elution solvents. For the same reasons, the column temperature was deemed crucial for optimization. The goals were to obtain the highest possible resolution between the chromatographic peaks along with the highest sensitivity. The Box-Behnken design was preferred over other designs, such as central composite and three-level full-factorial designs, because it is more efficient at accurately determining the interactions between the variables [21]. Seventeen experiments, including five center points, were performed. Ammonium formate was tested at concentrations of 1 × 10⁻³, 2 × 10⁻³ and 3 × 10⁻³ M, the wavelength at 214, 217 and 220 nm, and the column temperature at 29, 34.5 and 40 °C. Two responses were evaluated: (1) the sum of the resolutions between the peaks in each chromatogram and (2) the sum of the peak areas.
Both responses were to be maximized, since larger resolution values indicate less peak overlap and a larger sum of areas indicates higher sensitivity. These values were calculated automatically by the Empower software. Analysis of variance (ANOVA) showed that reduced cubic models (excluding all uninformative interactions) fitted the data best, with p-values of 0.0003 and 0.0002 for responses 1 and 2, respectively, while the lack of fit was non-significant for both models. The repeatability (% relative standard deviation, %RSD, n = 5) was excellent, with values of 2.56% and 5.00% for the sum of resolutions and the sum of areas, respectively. The optimized conditions suggested by the models were 1 × 10⁻³ M ammonium formate, 214 nm and 29 °C column temperature, with a desirability value of 0.753. These conditions revealed 29 chromatographic peaks, compared with 15 before the DoE experiments.
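The desirability value quoted above comes from the multi-response optimization in Design-Expert. As a sketch of how such a value is commonly formed, the snippet below uses a one-sided Derringer-Suich desirability for responses to be maximized and combines the individual values by a geometric mean. The response limits and example values are hypothetical, equal importance and a weight of 1 are assumptions, and the snippet is not expected to reproduce the reported 0.753.

```python
import math


def desirability_maximize(y, low, high, weight=1.0):
    """One-sided Derringer-Suich desirability for a response to be maximized:
    0 at or below `low`, 1 at or above `high`, a power ramp in between."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return ((y - low) / (high - low)) ** weight


def overall_desirability(ds):
    """Geometric mean of the individual desirabilities (equal importance assumed)."""
    if any(d == 0.0 for d in ds):
        return 0.0
    return math.prod(ds) ** (1.0 / len(ds))


# Hypothetical limits and responses for one candidate set of conditions
d_resolution = desirability_maximize(30.0, low=10.0, high=40.0)  # sum of resolutions
d_area = desirability_maximize(8.0e6, low=2.0e6, high=1.0e7)     # sum of peak areas
D = overall_desirability([d_resolution, d_area])
```

The geometric mean means that a single unacceptable response (desirability 0) makes the whole set of conditions unacceptable, which is why it is usually preferred over an arithmetic mean for this purpose.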

Data Pre-Processing

Baseline Correction
Even though the analysis conditions were optimized, the chromatograms presented a high baseline due to the low detection wavelength. After several algorithms from the "baseline" package were applied, the "Iterative baseline correction algorithm based on mean" was selected for the baseline correction. The algorithm and its parameters were chosen on the basis of the linearity of the calibration curve of each resolved CMS compound after correction, expressed by the coefficient of determination (R²) of each calibration curve. It should be noted that peak fitting, as described below, was also performed for the construction of the calibration curves used to select the baseline parameters. The parameters were set to 6 for the primary smoothing (lambda), 10 for the maximum number of iterations and 2000 for the number of buckets. Two values were tested for the half-width of the local windows (hwi), 10 and 30. With hwi = 30, 23 peaks were linear (R² > 0.99), whereas only 15 were linear with hwi = 10. The commands for the baseline correction are presented in Supplementary A.1. The original chromatogram and the chromatogram after baseline correction are shown in Figure 1.
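The parameter names above (lambda, hwi, iterations, buckets) appear to correspond to the `fillPeaks` method of the R "baseline" package. To make the idea concrete, the following is a simplified, hypothetical Python sketch, not the package's implementation: the signal is divided into buckets, each bucket is represented by its minimum, each bucket value is repeatedly replaced by the smaller of itself and its local-window mean while the half-window shrinks from `hwi`, and the result is interpolated back to the original time axis. The primary smoothing step (lambda) is omitted, `hwi` is counted in buckets, and the shrinking schedule is an assumption.

```python
import numpy as np


def fill_peaks_like_baseline(y, hwi=30, it=10, n_buckets=2000):
    """Hypothetical, simplified iterative mean-based baseline estimate
    (no primary smoothing; hwi counted in buckets)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    n_buckets = min(n_buckets, n)
    # 1) Bucket minima, positioned at each bucket's centre index
    edges = np.linspace(0, n, n_buckets + 1).astype(int)
    centres = np.array([(lo + hi - 1) / 2 for lo, hi in zip(edges[:-1], edges[1:])])
    b = np.array([y[lo:hi].min() for lo, hi in zip(edges[:-1], edges[1:])])
    # 2) Suppress peaks: b = min(b, local mean); the half-window shrinks from hwi towards 1
    for k in range(it):
        half = max(1, int(round(hwi * (it - k) / it)))
        padded = np.pad(b, half, mode="edge")
        kernel = np.full(2 * half + 1, 1.0 / (2 * half + 1))
        local_mean = np.convolve(padded, kernel, mode="valid")
        b = np.minimum(b, local_mean)
    # 3) Interpolate the bucket-level baseline back to the full time axis
    return np.interp(np.arange(n), centres, b)


# Synthetic chromatogram: sloping baseline plus one Gaussian peak (hypothetical data)
t = np.arange(3000)
true_baseline = 5.0 + 0.001 * t
signal = true_baseline + 50.0 * np.exp(-0.5 * ((t - 1500) / 20.0) ** 2)
corrected = signal - fill_peaks_like_baseline(signal, hwi=30, it=10, n_buckets=300)
```

A wider half-window lets the local mean "see past" broad peaks, which is consistent with the observation that hwi = 30 gave more linear calibration curves than hwi = 10, although the sketch is not intended to reproduce that result.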

Peak Fitting and Integration
Fityk was used to reveal, accurately describe and integrate the chromatographic peaks after baseline correction. Peaks were added in relative mode until the residuals in the corresponding plot were minimized, indicating a good fit (Supplementary B, Figure S1). Care was taken to reveal and integrate the same peaks in all the acquired chromatograms. After processing, 29 peaks were resolved, as presented in Figure 2. It should be noted that in our previous UPLC-MS work it was not possible to match the chromatographic peaks with individual CMS components, because a plethora of m/z signals were detected under each chromatographic peak, perhaps owing to source-induced dissociation. Therefore, the chromatographic peaks acquired by UPLC-UV in the current study were not matched to the m/z values of CMS components.
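Peak fitting separates overlapping signals into individual peak functions whose areas can then be computed analytically. The sketch below illustrates this under two stated assumptions: Gaussian peak shapes, and centres and widths that are already known, so that only the heights need fitting (a linear least-squares problem). Fityk itself offers several peak functions and refines all parameters nonlinearly; the retention times and heights here are hypothetical.

```python
import numpy as np

SQRT_2PI = np.sqrt(2.0 * np.pi)


def gaussian(t, height, centre, sigma):
    """Gaussian peak profile."""
    return height * np.exp(-0.5 * ((t - centre) / sigma) ** 2)


def fit_heights(t, y, centres, sigmas):
    """Least-squares heights of overlapping Gaussians with fixed centres and widths;
    with the shapes fixed, the model is linear in the heights."""
    basis = np.column_stack([gaussian(t, 1.0, c, s) for c, s in zip(centres, sigmas)])
    heights, *_ = np.linalg.lstsq(basis, y, rcond=None)
    return heights


def gaussian_area(height, sigma):
    """Analytical area under a Gaussian: height * sigma * sqrt(2*pi)."""
    return height * sigma * SQRT_2PI


# Two overlapping, hypothetical peaks (retention time in min, absorbance in AU)
t = np.linspace(8.0, 12.0, 801)
y = gaussian(t, 0.020, 9.30, 0.05) + gaussian(t, 0.010, 9.45, 0.06)
heights = fit_heights(t, y, centres=[9.30, 9.45], sigmas=[0.05, 0.06])
areas = [gaussian_area(h, s) for h, s in zip(heights, [0.05, 0.06])]
```

Integrating the fitted components rather than the raw trace assigns the shared region between overlapping peaks to each peak in proportion to its fitted shape.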

Data Analysis-Application to CMS Commercial Batches
As stressed in our previous publication [19], the establishment of a CMS gold standard for the comparison of batches is necessary for CMS quality control, which until now has been based on microbiological assays rather than on chemical content characterization. An ideal gold standard should exhibit high antimicrobial activity and low toxicity. Because such a standard has not yet been established, the method development and validation presented here were performed with an arbitrarily selected CMS batch as the reference.
The conformity of the CMS content (i.e., the content of the respective CMS forms in the formulation) was tested among different batches (n = 5; b1, b2, b3, b4 and b5) as well as within the same batch (n = 3; b4a, b4b and b4c). It should be noted that one of the tested batches (b5) had expired a year earlier. Stock solutions of 1 mg mL−1 of the reference and the tested CMS batches were prepared, diluted with LC-MS-grade water to a target concentration of 160 µg mL−1 and analyzed.

Univariate Model Construction
Linear models for each CMS peak were constructed using the Linear Regression platform of MEPHAS after importing the areas of the integrated peaks as comma-separated values (CSV) files (Supplementary A.2). After linear regression models were constructed for the 29 chromatographic peaks, 23 of them presented a coefficient of determination R² > 0.99; thus, only these linear models were kept. The best-fit values of the intercept (B0) and of the coefficient of the linear term (B1), as well as the standard errors associated with the coefficients, the R² and the standard error of the estimate (Sy.x) for the 23 univariate models are presented in Table 1. Peaks were named after their retention times. The values of the linear terms cannot be explained on the basis of the CMS structure, as this remains unknown. However, it can be observed that CMS components with UV absorbance values > 0.01 (Figure 2) presented B0 in the range of 10⁻⁵, while peaks with lower absorbance had B0 in the range of 10⁻⁶. The differences in absorbance could mainly be attributed to the amounts of these components in the CMS content. The interpolated values for the tested batches are presented in Table 2. The upper and lower limits of the interpolated values were also calculated at the 95% confidence level (Supplementary B, Table S1).
Table 1. The best-fit values B0 and B1, the standard errors associated with the coefficients, the coefficient of determination (R²) and the standard error of the estimate (Sy.x) of the 23 linear models for the CMS peaks, as calculated by the MEPHAS software.
In the 5-year-old batch (b5), most of the peaks were higher than those in the younger batches. However, some peaks were much lower than in the other tested batches. As the structure of the CMS components is unknown, the reason for this was not clear.
One explanation is that some CMS components underwent degradation (e.g., peak_8.73, peak_10.43, peak_23.13, peak_23.65), resulting in lower intensities, while the resulting products added to components already present in the CMS content, leading to increased intensities (e.g., peak_9.3, peak_12.76). Another possible reason is a different degree of substitution of the CMS components arising during the manufacture of the product 5 years earlier. The presented work is useful precisely for this purpose: to reveal possible inconsistencies between batches owing to different degrees of substitution of CMS components, which may eventually lead to different bioavailability. The stock solution (1 mg mL−1) of batch b5, as of all batches, was freshly and accurately weighed and diluted, so the differences in the content of b5 could not be attributed to improper handling.
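The quantities reported for each univariate model (B0, B1, their standard errors, R² and Sy.x) and the interpolation of a tested batch can be illustrated with ordinary least squares. In this sketch, concentration is taken as x and peak area as y, and an unknown is interpolated by inverting the fitted line; this orientation is an assumption, since the text does not state how the variables were assigned in MEPHAS. The 95% limits of Table S1 are not computed here, and the areas are hypothetical.

```python
import math
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit y = B0 + B1*x with B0, B1, their standard errors,
    R^2 and Sy.x (standard error of the estimate)."""
    n = len(x)
    xm, ym = mean(x), mean(y)
    sxx = sum((xi - xm) ** 2 for xi in x)
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ym - b1 * xm
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - ym) ** 2 for yi in y)
    syx = math.sqrt(ss_res / (n - 2))
    return {
        "B0": b0,
        "B1": b1,
        "SE_B0": syx * math.sqrt(1.0 / n + xm ** 2 / sxx),
        "SE_B1": syx / math.sqrt(sxx),
        "R2": 1.0 - ss_res / ss_tot,
        "Sy.x": syx,
    }


def interpolate(model, y_obs):
    """Inverse prediction: the x at which the fitted line equals y_obs."""
    return (y_obs - model["B0"]) / model["B1"]


# Hypothetical calibration of one peak: concentration (µg mL-1) vs. peak area
levels = [100, 130, 160, 190, 220]
areas = [10150.0, 13020.0, 15990.0, 19100.0, 21950.0]
model = fit_line(levels, areas)
batch_estimate = interpolate(model, 16080.0)  # hypothetical area of a tested batch
```

The R² > 0.99 filter described above would simply be applied to `model["R2"]` for each of the 29 peaks.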

Multivariate Model Construction
A Partial Least Squares regression (PLSr) model was constructed with the Partial Least Squares Regression platform of MEPHAS by importing the file with the areas of the 29 chromatographic peaks (Supplementary A.2). The CMS peak areas were used as the independent variables (X), and the CMS concentration levels as the dependent variable (Y). Leave-one-out cross-validation and the SIMPLS algorithm were selected for the PLSr analysis. The optimal number of components was determined by constructing models with 1, 2, 3 and 4 components. The mean square error of prediction (MSEP) and the root mean square error of prediction (RMSEP) were high for the first component and much lower after the addition of the second component. The R² was 0.997 using the first component. Thus, only the first component was kept for the PLSr model, as it was adequate for describing the model (Supplementary B, Figure S2). The absolute % error of the predicted values was < 1.35%. The areas of the tested batches were also analyzed with the PLSr model to calculate their CMS amount. The interpolated values of the CMS batches obtained with the PLSr model are shown in Table 3.
Table 3. Interpolated values and the confidence intervals of Y (jack-knifed) for the tested batches, as calculated by the developed PLSr model in the free software MEPHAS. The % errors from the theoretical value of 160 µg mL−1 are also presented.

It should be emphasized that the PLSr model provided an estimate of the total CMS amount without taking the ratios of the CMS components into account; that is, if the signals of some components changed in a mutually compensating way (some increasing and others decreasing by equal amounts), the model could not discriminate this.
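For a single response and a single component, the PLS weight vector is proportional to Xcᵀyc (centred data), and NIPALS and SIMPLS give the same first component, so a one-component PLSr model can be sketched in closed form. The snippet below is such a sketch with leave-one-out RMSEP; it does not reproduce MEPHAS's output or its jack-knifed intervals, and the data are synthetic (29 hypothetical peak areas proportional to concentration).

```python
import numpy as np


def pls1_fit(X, y):
    """One-component PLS regression for a single response (closed form)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = Xc.T @ yc
    w = w / np.linalg.norm(w)      # weight vector
    t = Xc @ w                     # scores
    q = (t @ yc) / (t @ t)         # y-loading
    return x_mean, y_mean, w * q   # regression coefficients act on centred X


def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return y_mean + (X - x_mean) @ coef


def loo_rmsep(X, y):
    """Leave-one-out root mean square error of prediction."""
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        model = pls1_fit(X[keep], y[keep])
        errors.append(pls1_predict(model, X[i:i + 1])[0] - y[i])
    return float(np.sqrt(np.mean(np.square(errors))))


# Synthetic data: 5 levels x 3 replicates, 29 peak areas proportional to concentration
rng = np.random.default_rng(0)
conc = np.repeat([100.0, 130.0, 160.0, 190.0, 220.0], 3)
response_factors = rng.uniform(50.0, 500.0, size=29)
X = np.outer(conc, response_factors) + rng.normal(0.0, 1.0, size=(conc.size, 29))
model = pls1_fit(X, conc)
pct_error = 100 * np.abs(pls1_predict(model, X) - conc) / conc
```

Because the single component is essentially a weighted sum of all peak areas, compensating changes among peaks leave the prediction unchanged, which is the limitation noted above.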

Validation of the Univariate and Multivariate Models
Validation was performed for the 23 linear regression models and the PLSr model. Since there are no consensus validation procedures in the literature that regulate the performance characteristics of multivariate models, the ICH Q2(R1) requirements were adopted for both the univariate and the multivariate models in the current study. Linearity was tested over the range of 100–220 µg mL−1. Accuracy, expressed as the % relative error from the nominal value (%E), was assessed by analyzing samples at three concentration levels (130, 160 and 190 µg mL−1) in three analytical runs. All the univariate models exhibited %E values below 2.40% at the three tested levels, whereas for the PLSr model %E was below 2.83%. Repeatability (precision under the same operating conditions over a short interval of time) and intermediate precision (variation between different analytical days, n = 3) were expressed as % relative standard deviation (%RSD). The PLSr model presented better precision than the univariate models. Stability was examined at the same levels by injecting the same sample (n = 3) every three hours (autosampler stability). The results of both the univariate and the multivariate validation showed that the CMS components were stable over the duration of the data acquisition. Robustness was examined at 160 µg mL−1 by making deliberate changes (±5%) to the starting percentage of the organic modifier and to the column temperature. The limit of detection (LOD) and limit of quantitation (LOQ) were calculated only for the univariate models and not for the multivariate model, because a multivariate calibration equation is not necessarily the same for all the contributing variables. The LOD and LOQ were calculated as 3.3 × σ/slope and 10 × σ/slope, respectively, and ranged from 4.12 to 17.98 and from 12.48 to 54.49, respectively.
Before the experiments, instrumental performance, expressed as %RSD, was tested by injecting the reference sample (n = 3) and was found to be < 3.4%. Therefore, the instrument contributed only slightly to the overall variability of the validation procedure. Table 4 summarizes the validation results for the 23 linear regression models and the PLSr model.
Table 4. Accuracy, repeatability, intermediate precision, stability and robustness assessed for the 23 univariate linear regression models and the Partial Least Squares regression (PLSr) model developed for the assay of CMS in pharmaceutical formulations using the proposed UPLC-TUV methodology. The limit of detection (LOD) and limit of quantitation (LOQ) values were calculated only for the univariate models.
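The validation figures of merit described above reduce to a few simple formulas, sketched below. Which σ was used for the LOD and LOQ (for example, the residual standard deviation of the regression or the standard deviation of the intercept) is not stated in the text, so it is left as an input; the replicate values are hypothetical.

```python
from statistics import mean, stdev


def percent_error(measured, nominal):
    """Accuracy as % relative error of the mean measured value from the nominal value."""
    return 100.0 * (mean(measured) - nominal) / nominal


def percent_rsd(values):
    """Precision as % relative standard deviation (sample SD / mean x 100)."""
    return 100.0 * stdev(values) / mean(values)


def lod_loq(sigma, slope):
    """LOD = 3.3*sigma/slope and LOQ = 10*sigma/slope, from the standard deviation
    of the response (sigma) and the slope of the calibration curve."""
    return 3.3 * sigma / slope, 10.0 * sigma / slope


# Hypothetical back-calculated concentrations (µg mL-1) at the 160 µg mL-1 level
replicates = [158.9, 161.4, 160.2]
accuracy = percent_error(replicates, 160.0)
precision = percent_rsd(replicates)
```

The accuracy limits quoted above (2.40% and 2.83%) refer to the magnitude of %E, so in practice `abs(percent_error(...))` would be compared with the acceptance limit.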

Principal Component Analysis
To compare the content of the batches on the basis of the CMS component ratios, all peak areas were divided by the area of a "reference" peak. Peak_14.42 was selected because it showed no overlap in the chromatogram and its linear regression model had the highest coefficient of determination (R² = 0.9996). Thus, the areas of the 23 peaks that exhibited R² > 0.99 were divided by the area of peak_14.42. The same procedure was performed for the tested batches.
A Principal Component Analysis (PCA) model was constructed with the Principal Component Analysis platform of MEPHAS by importing the peak area ratios (calibration points and batches) as a CSV file (Supplementary A.2). PAST software was also used for PCA (Supplementary A.3), because MEPHAS requires the number of variables to be smaller than the number of samples. Both programs gave the same results. Four principal components (PCs) were found to describe the model adequately (Supplementary B, Figure S3), with R2X and Q2 equal to 0.992 and 0.616, respectively. PC1 explained 79.7% of the variance and PC2 12.8%; therefore, the main differences were captured by the first two PCs, which together accounted for more than 92% of the total variability. After 7-fold bootstrapping, the eigenvalues were 83.0% for PC1 and 24.0% for PC2 (Supplementary B, Table S2). The scores scatter plot of PC1 vs. PC2 is presented in Figure 3. The calibration curve (CC) points formed a single cluster, as these samples were prepared from the same reference batch. Batches b2, b3 and b4 were similar to the calibration curve samples, whereas b1 and b5 were markedly different. The projections of the batches on PC1 (which holds the main part of the variability) lay together, while b5 was clearly separate. The projection on PC2 rendered b1 also clearly distinguishable from the other batches. Batch b5 was an outlier, indicating that its content differed from that of the others. It should be noted that the spread among the CC points was similar to that among the triplicate analyses of batch b4, which visually indicates the repeatability of the method. Overall, PCA could discern not only out-of-specification batches (e.g., the expired batch b5) but also batches that had been microbiologically assayed yet differed in their individual CMS components (e.g., batch b1).
PCA involves an initial rotation of the data matrix according to the directions of highest variability, which could lead to less accurate results than the univariate analysis. Therefore, the univariate analysis was also performed.
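The ratio normalization and the PCA described above can be sketched with NumPy: each sample's peak areas are divided by its reference-peak area, and PCA is computed by singular value decomposition of the centred data. Whether MEPHAS and PAST autoscaled the ratios is not stated in the text, so scaling is an assumption exposed as a parameter; the areas below are hypothetical, with one deliberately deviant sample.

```python
import numpy as np


def peak_ratios(areas, ref_index):
    """Divide every peak area of each sample (row) by that sample's reference-peak area."""
    areas = np.asarray(areas, dtype=float)
    return areas / areas[:, [ref_index]]


def pca(X, n_components=2, scale=True):
    """PCA by SVD of the mean-centred (optionally autoscaled) data matrix.
    Returns the scores and the fraction of total variance per component."""
    Xc = X - X.mean(axis=0)
    if scale:
        sd = Xc.std(axis=0, ddof=1)
        sd[sd == 0] = 1.0  # constant columns (e.g. the reference ratio, always 1) stay at 0
        Xc = Xc / sd
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    explained = S ** 2 / np.sum(S ** 2)
    scores = U[:, :n_components] * S[:n_components]
    return scores, explained[:n_components]


rng = np.random.default_rng(1)
typical = np.array([5.0, 2.0, 8.0, 1.0, 3.0])            # hypothetical peak areas
similar = typical * rng.normal(1.0, 0.01, size=(9, 5))   # nine consistent samples
deviant = typical * np.array([1.5, 0.6, 1.0, 1.4, 0.7])  # one sample with altered ratios
ratios = peak_ratios(np.vstack([similar, deviant]), ref_index=2)
scores, explained = pca(ratios, n_components=2)
```

In this toy example the deviant sample has the largest PC1 score, mirroring how batch b5 separated from the other batches along PC1.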

Similarity Tests
The above-mentioned peak ratios of the tested batches were also compared with those of the reference batch using several similarity and distance measures implemented in the "proxy" package. The table was imported into RStudio as a CSV file. The commands used to compute the similarity tests are presented in Supplementary A.4. The similarity measures computed were eJaccard, cosine, eDice, correlation and gower, while the distance measures were bray, canberra, chord, divergence, euclidean, geodesic, hellinger, kullback, manhattan, podani, soergel, supremum, whittaker and bhjattacharyya. The results are presented in Table 5. Batch b5 differed from the reference batch, as it presented the lowest value in all the tests.
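Several of the listed measures have simple closed forms. The sketch below gives the standard definitions of some of them for two peak-ratio vectors; the exact conventions used by the R "proxy" package (for example, whether the Canberra denominator is |x| + |y| or |x + y|, which coincide for the non-negative ratios used here, or any similarity/distance conversions) are not confirmed here and should be checked against its registry of measures.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def cosine(x, y):
    return dot(x, y) / math.sqrt(dot(x, x) * dot(y, y))


def ejaccard(x, y):
    """Extended (Tanimoto) Jaccard similarity for real-valued vectors."""
    xy = dot(x, y)
    return xy / (dot(x, x) + dot(y, y) - xy)


def edice(x, y):
    """Extended Dice similarity for real-valued vectors."""
    return 2 * dot(x, y) / (dot(x, x) + dot(y, y))


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def bray(x, y):
    """Bray-Curtis dissimilarity (intended for non-negative data such as peak ratios)."""
    return sum(abs(a - b) for a, b in zip(x, y)) / sum(a + b for a, b in zip(x, y))


def canberra(x, y):
    """Canberra distance; terms with a zero denominator are skipped."""
    return sum(abs(a - b) / (abs(a) + abs(b)) for a, b in zip(x, y) if abs(a) + abs(b) > 0)


# Hypothetical peak-ratio vectors (peak area / area of peak_14.42)
reference = [0.42, 1.10, 0.35, 2.05, 0.78]
batch = [0.40, 1.25, 0.31, 1.80, 0.80]
summary = {f.__name__: f(reference, batch)
           for f in (cosine, ejaccard, edice, euclidean, manhattan, bray, canberra)}
```

Note that similarities approach 1 and distances approach 0 for identical vectors, so a batch that differs from the reference moves in opposite directions on the two kinds of measure.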

Heatmaps and Structural Similarity Index
Because dividing the peak areas by one particular peak could introduce uncertainty, since a single variable (i.e., peak_14.42) could leverage the analysis, a further step was taken to explore the CMS content. Each peak area was divided by every peak area, resulting in a square matrix for each sample. The resulting matrices could then be compared visually as heatmaps. The heatmaps of the reference batch and the tested batches are presented in Figure 4. As visual inspection suffers from high bias and low accuracy, it was deemed crucial to express the similarity between the reference heatmap and the tested ones with a simple-to-interpret metric. Thus, the heatmaps were compared using the "SPUTNIK" package, which computes the structural similarity index (SSIM). This index is widely used for evaluating image quality [22] and has also been applied to the evaluation of chromatographic procedures [23]. The results are presented in Table 6. Batch b5 had a low SSIM value, indicating the content inconsistency of that batch compared with the reference, in agreement with the PCA results. The commands for the "stats" and "SPUTNIK" packages are presented in Supplementary A.5.
Table 5. Results of the similarity and distance measures performed with the "proxy" package in RStudio. The CMS peak ratios of the tested batches b1, b2, b3, b4 (a, b, c) were compared with those of the reference batch.
Figure 4. Heatmaps of the peak ratios in the reference batch and batches b1, b2, b3, b4 (a, b, c) and b5, produced with the "stats" package in RStudio. Low and high extreme values are emphasized with dark colors, while light colors represent middle values.
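The all-pairs ratio matrix and its comparison by SSIM can be sketched as follows. The SSIM here is the single-window (global) form, SSIM = ((2μxμy + C1)(2σxy + C2)) / ((μx² + μy² + C1)(σx² + σy² + C2)), with the commonly used constants k1 = 0.01 and k2 = 0.03; whether SPUTNIK uses this global form, these constants or a windowed variant is an assumption to be checked in its documentation. The peak areas are hypothetical.

```python
import numpy as np


def ratio_matrix(areas):
    """All pairwise peak-area ratios: element (i, j) = area_i / area_j."""
    a = np.asarray(areas, dtype=float)
    return a[:, None] / a[None, :]


def global_ssim(img1, img2, data_range=None, k1=0.01, k2=0.03):
    """Single-window (global) structural similarity index between two equally sized matrices."""
    x = np.asarray(img1, dtype=float).ravel()
    y = np.asarray(img2, dtype=float).ravel()
    if data_range is None:
        data_range = max(x.max(), y.max()) - min(x.min(), y.min())
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(ddof=1), y.var(ddof=1)
    cov = np.cov(x, y, ddof=1)[0, 1]
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


# Hypothetical peak areas for a reference and a tested batch
reference = ratio_matrix([1.0, 2.0, 4.0, 3.0])
tested = ratio_matrix([1.0, 2.2, 3.5, 3.1])
similarity = global_ssim(reference, tested)
```

Unlike the single-reference ratios used for PCA, the full ratio matrix does not privilege any one peak, which is the motivation given above for this additional step.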

Chemicals
All CMS samples were from the same brand and purchased from a drug store. LC-MS-grade acetonitrile and formic acid were purchased from Carlo Erba reagents (Val de Reuil Cedex, France). LC-MS-grade methanol was obtained from Fisher Scientific (Loughborough, U.K.). Ammonium formate was purchased from Sigma-Aldrich (Steinheim, Germany). Ultrapure water was produced by a Millipore Direct-Q System (Molsheim, France).

Instrumentation
The instrumentation consisted of a Waters ACQUITY UPLC module (Milford, MA, USA) equipped with a binary solvent manager and an autosampler thermostatically controlled at 4 °C, coupled to a Waters ACQUITY UPLC™ Tunable UV (TUV) detector. The analysis was performed on a Waters Acquity BEH C8 analytical column (2.1 mm × 100 mm, 1.7 µm; Milford, MA, USA). After optimization, the chromatographic conditions were: 0.001 M aqueous ammonium formate as solvent A and methanol/acetonitrile 79/21 (v/v) as solvent B. The column temperature was maintained at 29 °C. The gradient program was the same as that previously described by our laboratory [19]. Briefly, elution started at 15% B and was programmed to reach 80% B in 34 min at a flow rate of 0.15 mL min−1. The UV detector was set at 214 nm. The sample injection volume was 10 µL. Instrument settings, data acquisition and processing were controlled via Empower 2 software.

Sample Preparation
Stock solutions (1 mg mL−1) of CMS were prepared in methanol and stored at 30 °C. Calibration standards of CMS were freshly and independently prepared three times from the stock; that is, they were weighed and diluted appropriately with water to reach concentration levels of 100, 130, 160, 190 and 220 µg mL−1, which corresponded to 80–120% of the theoretical amount of 160 mg CMS in formulations.
For the tested batches (b1, b2, b3, b4a, b4b, b4c and b5), a stock solution of 1 mg mL−1 was prepared for each batch by weighing the total CMS, and it was then diluted to a concentration level of 160 µg mL−1.
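The dilutions from the 1 mg mL−1 (1000 µg mL−1) stock to the calibration and test levels follow C1V1 = C2V2. The helper below is a hypothetical illustration; the 1000 µL final volume is an assumed value, since the text does not state the volumes used.

```python
def aliquot_volume(stock_conc, target_conc, final_volume):
    """C1*V1 = C2*V2 solved for V1: volume of stock needed to reach target_conc
    in final_volume (both concentrations in the same units)."""
    if not 0 < target_conc <= stock_conc:
        raise ValueError("target concentration must be positive and not exceed the stock")
    return target_conc * final_volume / stock_conc


STOCK = 1000.0        # µg mL-1 (1 mg mL-1 stock)
FINAL_VOLUME = 1000.0  # µL, hypothetical final volume
for level in (100, 130, 160, 190, 220):  # µg mL-1
    v_stock = aliquot_volume(STOCK, level, FINAL_VOLUME)
    print(f"{level} µg mL-1: {v_stock:.0f} µL stock + {FINAL_VOLUME - v_stock:.0f} µL water")
```

For the 160 µg mL−1 test solutions, this corresponds to a 6.25-fold dilution of the stock.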

Design of Experiments
Design of Experiments (DoE) for the chromatographic conditions of the CMS assay was performed with Design-Expert 11 software (StatEase). A Box-Behnken experimental design for response surface methodology was selected to optimize the column temperature, the ammonium formate concentration in the aqueous mobile phase and the wavelength; the design comprised 17 runs, including 5 center points.
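The 17 runs follow directly from the structure of a three-factor Box-Behnken design: for each pair of factors, the four combinations of their low and high levels are run with the remaining factor at its middle level (3 pairs × 4 = 12 runs), plus the center points (5 here). The sketch below generates the coded design and maps it onto the factor levels given in the text. Run order is not randomized here, and the actual run order used in the study is not reported.

```python
from itertools import combinations, product


def box_behnken(n_factors, n_center):
    """Coded Box-Behnken design: a 2x2 factorial (-1/+1) for each pair of factors
    with all other factors at 0, followed by n_center centre points."""
    runs = []
    for i, j in combinations(range(n_factors), 2):
        for a, b in product((-1, 1), repeat=2):
            point = [0] * n_factors
            point[i], point[j] = a, b
            runs.append(tuple(point))
    runs.extend([tuple([0] * n_factors)] * n_center)
    return runs


def decode(coded, low, high):
    """Map a coded level (-1, 0, +1) onto the real scale [low, high]."""
    return (low + high) / 2 + coded * (high - low) / 2


# Factor ranges from the text: ammonium formate (M), wavelength (nm), temperature (°C)
FACTORS = {
    "ammonium_formate_M": (1e-3, 3e-3),
    "wavelength_nm": (214, 220),
    "temperature_C": (29, 40),
}

design = box_behnken(n_factors=3, n_center=5)
table = [
    {name: decode(level, lo, hi) for (name, (lo, hi)), level in zip(FACTORS.items(), run)}
    for run in design
]
```

Because no run places all three factors at their extremes simultaneously, the design avoids the corners of the experimental region, which is one practical reason it needs fewer runs than a three-level full factorial (27 runs before replication).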

Data Processing
The steps followed in the present study are summarized in Figure 5. The freely available software that was employed for method development is presented in Table 7.

Baseline Correction
The baseline of the chromatograms was corrected in R, a free and open-source programming language, through the RStudio environment, using the "baseline" package. The raw data were exported from Empower as CDF files, converted to CSV files and loaded into RStudio as a matrix whose first row contained the times and whose second row contained the intensities. After baseline correction with the selected algorithm and parameters, a CSV file with the corrected intensities was exported from RStudio.

Peak Fitting and Integration
The free software Fityk (version 1.3.1) was used for peak fitting after baseline correction, by importing the files produced in RStudio. The exported results were converted to CSV files.

Univariate and Multivariate Data Analysis
MEPHAS, a free and open-source web-based interactive GUI [24], was used for the univariate (linear regression) and multivariate (Partial Least Squares regression, PLSr, and Principal Component Analysis, PCA) analyses of the CMS data. PCA was also performed with PAST software [25].

Similarity Measures and Structural Similarity Index
The similarity and distance measures were computed with the "proxy" package in RStudio. Heatmaps were constructed with the "stats" package and compared with the "SPUTNIK" package, which calculates the structural similarity index (SSIM) between two heatmaps.

Conclusions
In the current study, a UPLC-UV methodology was developed for the quality control of CMS in commercial formulations using freely available software. DoE optimization of the chromatographic parameters resulted in an adequate separation of the components of this extremely complex prodrug mixture. A multivariate PLSr model was developed to assess the total amount of CMS in pharmaceutical formulations, and 23 univariate linear regression models were developed to determine each CMS compound separately. All the models were fully validated. The concentration range was narrower than that reported by Metcalf et al. (100–220 µg mL−1 instead of 0.05–7 mg mL−1). There are no calibration standards for the individual CMS components; thus, a CMS batch was selected arbitrarily to serve as the reference batch, and its components served as the calibration standards for the linear regression models that measure the concentration of each CMS component in the batches. The advanced chemometrics employed in the presented work made it possible to assess the ratios of the CMS components in the batches and to flag outlier batches with a single number. The ratios of the peaks were not always the same, which is why every peak was divided by every other peak, resulting in square matrices that were compared as heatmaps using the structural similarity index. Thus, three criteria should be met: the total amount should be similar; the amount of every peak should be at a predefined level; and the ratios among the substances should also be clearly defined. Meeting these criteria ensures that the overall composition is the same, so that the bioavailability and the biological activity are clearly defined.
This is the first simple, low-cost UPLC-UV methodology for the direct chemical quantitation of CMS content based on all chromatographic peaks. The method was applied to determine the constitution of CMS in pharmaceutical formulations by examining different commercially available batches. The batch analysis revealed no major chemical inconsistencies among the tested samples, except for the expired batch, whose chemical content differed from that of the reference batch.
It should be noted that the reference CMS batch was chosen arbitrarily in order to perform the batch comparison. Thus, there is an immediate need to establish a gold standard characterized by high antimicrobial activity and low toxicity. With this method, we aimed to devise a tool that industry and regulatory bodies can use to set up a gold-standard procedure for the proper chemical determination of this substance. In the absence of such a standard, we chose one of the batches to serve as the standard and showed that, once a gold standard is created (which could now easily be done with this method), a validation pipeline could be set up for the producers of the API as well as of the finished pharmaceutical products. In short, one batch was used arbitrarily as a standard (in the future, a standard is expected to be created and provided by the regulatory bodies), and we were able to show that, using this arbitrary standard, the CMS drug product could be properly validated. Wide application of the presented methodology would allow quality control of CMS batches by quantitating each chromatographic peak, thus leading to injectable formulations with better control of the desired bioavailability. Furthermore, the separation and quantification of all major chromatographic peaks could aid the structural elucidation of the compounds that make up this very complex mixture. Finally, shelf life can be estimated on the basis of chemical assessment, which provides reliable information on the homogeneity of the sample content.
Supplementary Materials: The following are available online: Supplementary A: Figure S1: Part of the table uploaded to MEPHAS for the linear regression models; Figure S2: MEPHAS interface for data preparation for the Partial Least Squares regression model; Figure S3: MEPHAS interface for building the Partial Least Squares regression model; Figure S4: Part of the table uploaded to MEPHAS for the principal component analysis; Figure S5: MEPHAS interface for data preparation for the Principal Component Analysis; Figure S6: Data imported into the PAST software; Figure S7: Part of the table imported into RStudio to perform the similarity tests with the "proxy" package. Supplementary B: Figure S1: Peak fitting in the UPLC-UV chromatogram of 100 µg mL−1 CMS with the Fityk (v. 1.3.1) software; the residual plot above the chromatogram indicates a good fit; Figure S2: Mean square error of prediction (MSEP), root mean square error of prediction (RMSEP) and R² obtained with 1–4 components for PLSr model construction, showing that one component can fully describe the model; Figure S3: Scree plot produced by the PAST software; Table S1: Interpolated values (mg mL−1) of the CMS components in the tested batches, as calculated by the linear regression models developed in MEPHAS, with the upper and lower limits of the interpolated values at the 95% confidence level; Table S2: Eigenvalues, % variances, Eig 2.5% and Eig 97.5% of the principal components after 7-fold bootstrapping, as calculated by the PAST software during the Principal Component Analysis.