A Chemometric Survey about the Ability of Voltammetry to Discriminate Pharmaceutical Products from the Evolution of Signals as a Function of pH

Many pharmaceutical products are electroactive and, therefore, can be determined by voltammetry. However, most of these substances produce signals in the same region of oxidative potentials, which makes it difficult to identify them. In this work, chemometric tools are applied to extract characteristic information not only from the peak potential of differential pulse voltammograms (DPV), but also from their evolution as a function of pH. The chemometric approach is based on principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA) and support vector machine discriminant analysis (SVM-DA), yielding promising results for the future discrimination of pharmaceutical products in water samples.


Introduction
In recent years, the increasing use of pharmaceutical products for both clinical and farming purposes and the growth of water recycling practices, urged by climate change, have considerably raised the presence of such substances and their metabolites in natural waters, wastewaters and food products [1][2][3][4][5][6][7][8]. The ecotoxic behavior of many pharmaceutical compounds [9] has led to their inclusion in the Watch List of emerging contaminants recently set by the EU through Decision 2015/495 [10,11]. Thus, the determination of pharmaceutical products in water and food samples is of the highest concern. This is usually achieved by means of chromatographic techniques, especially liquid chromatography combined with fluorescence and mass spectrometric detection [12][13][14][15]. These techniques are highly selective, sensitive and accurate and allow multianalyte determination. However, they are expensive and non-portable and require specifically trained personnel. This is why electrochemical sensors, less selective but still sensitive and much simpler, cheaper and portable, constitute a promising alternative to chromatography for the screening and monitoring of pharmaceutical products [16][17][18]. Moreover, the recent popularization of commercial screen-printed electrodes (SPE) modified with a large diversity of biomolecules and nanomaterials has boosted the capabilities of electrochemical sensing in this field [19][20][21][22].
Nevertheless, a crucial problem in the electroanalytical detection of a large set of pharmaceutical products is that they are reduced or, more frequently, oxidized in similar ranges of potential, which depends far more on the nature of the electroactive functional groups than on the structure of the molecules. Then, without the support of a previous chromatographic separation, the task of simultaneously identifying and quantifying different pharmaceutical products in a sample by voltammetry (the most potential-selective electroanalytical technique) is usually restricted to 2 or 3 substances. For this purpose, bare commercial SPE [23], modified SPE [24,25] and SPE grouped into electronic tongues [26] are used and, in the case of overlapping signals, multivariate analysis methods are applied [26]. In this scenario, we think that it is highly unrealistic to start a competition between SPE sensors and chromatographic methods for the simultaneous determination of dozens of pharmaceutical substances in a water sample. A sounder option would be working in two directions: on the one hand, modifying SPE with highly selective reagents for the accumulation and determination of target compounds (e.g., caffeine or paracetamol) that could act as indicators of contamination with pharmaceutical products; on the other hand, developing methodologies for the qualitative discrimination among large groups of such substances that could help as preliminary screening tools to guide the further analysis by chromatographic methods. The present work lies within the second category, quite unexplored up to now.
It is obvious that the most effective information provided by voltammetry for the qualitative identification of substances is the peak potential, which is characteristic of every compound independently of its concentration. However, when dozens of compounds are considered, many of them will certainly have virtually the same peak potential. We could then define groups of substances with signals appearing in certain potential regions. In this way, the appearance of signals in a specific potential range would reveal the presence of one or more analytes belonging to the group of compounds appearing there, whereas the absence of signals would ensure the absence of all of them.
As peak potentials alone appear to be clearly insufficient for such an ambitious goal, in this work we intend to explore, as a proof of concept, the evolution of peak potentials with pH as a new dimension to help in the discrimination of pharmaceutical products. Fortunately, protons are involved in the oxidation mechanism of many organic substances and this causes a continuous potential shift as pH changes. Indeed, a semi-empirical equation has been used to estimate the ratio between the H+ ions and the electrons involved in an electrochemical reaction from the slope of peak potential vs. pH plots [27][28][29]. When voltammetric signals overlap each other, chemometric methods are required [30,31], but the potential shifts caused by pH changes produce data sets quite far from linearity, which demands specific corrections before the application of typically linear methods like principal component analysis (PCA), partial least squares calibration (PLS) or multivariate curve resolution by alternating least squares (MCR-ALS) [32][33][34].
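The slope-based estimate mentioned above can be illustrated with a short sketch. The source does not reproduce the semi-empirical equation, so the code assumes the common Nernstian form dEp/dpH = -(2.303 RT/F)(m/n), where m is the number of protons and n the number of electrons. The function names and the peak potentials are hypothetical and serve only as an illustration.

```python
# Sketch: estimate the H+/electron ratio (m/n) from the slope of Ep vs. pH.
# Assumption (not from the source): dEp/dpH = -(2.303*R*T/F) * (m/n).
R = 8.314462618          # gas constant, J mol^-1 K^-1
F = 96485.33212          # Faraday constant, C mol^-1
LN10 = 2.302585092994046


def nernst_factor(temp_k=298.15):
    """Return 2.303*R*T/F in volts per pH unit (about 0.059 V at 25 degrees C)."""
    return LN10 * R * temp_k / F


def linear_slope(x, y):
    """Ordinary least-squares slope of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def proton_electron_ratio(ph, ep_volts, temp_k=298.15):
    """Estimate m/n from peak potentials (V) measured at several pH values."""
    return -linear_slope(ph, ep_volts) / nernst_factor(temp_k)


# Hypothetical peak potentials shifting by -59 mV per pH unit.
ph = [2.0, 3.0, 4.0, 5.0, 6.0]
ep = [0.60 - 0.059 * (p - 2.0) for p in ph]
ratio = proton_electron_ratio(ph, ep)
print(f"estimated m/n = {ratio:.2f}")
```

A ratio close to 1 would suggest equal numbers of protons and electrons, whereas a ratio close to 0 would indicate a pH-independent process. Real Ep vs. pH plots are often linear only between pKa values, so the fit should be restricted to one linear segment.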
In this work we have studied by differential pulse voltammetry (DPV) several mixtures of seven characteristic pharmaceutical products that play a crucial role in human life and health: (i) Ascorbic acid (AA): a water-soluble vitamin widely used to complement inadequate dietary intake, to prevent and treat scurvy and also as antioxidant [35]; (ii) Uric acid (UA): an end product of purine metabolism which has been suggested to act as an antioxidant in vivo [36]; (iii) Paracetamol (P): an antipyretic and analgesic drug widely used for fever reduction and pain relief associated with headache, backache, arthritis and postoperative pain [37]; (iv) Acetylsalicylic acid (AS): a drug of the salicylate family used as analgesic, antipyretic and anti-inflammatory; it is also used to treat specific inflammatory conditions such as Kawasaki disease, pericarditis and rheumatic fever, and its long-term use helps prevent further heart attacks, blood clots and ischemic strokes in people with a high level of risk [38]; (v) Trimethoprim (TR): an antibiotic used mainly in the treatment of urinary tract infections; it has also been used for treatment of acute otitis media caused by Streptococcus pneumoniae and Haemophilus influenzae [39]; (vi) Ibuprofen (IB): a nonsteroidal anti-inflammatory drug commonly used for the treatment of inflammation, fever and pain caused by migraines, menstrual cramps and rheumatoid arthritis [37]; and (vii) Caffeine (C): a central nervous system stimulant that is the most commonly used psychoactive substance in the world and whose intake may protect against some diseases such as Parkinson's disease [37]. The mixtures considered contain five or six of these products in the same solution. Voltammetric measurements were made with a commercial screen-printed carbon electrode (SPCE) at three concentration levels in solutions of pH ranging from 2 to 12.
Then, the evolution of the main parameters of the signals (peak current, peak area, half-peak width and, especially, peak potential) was studied in order to furnish a feasible strategy to identify the analytes from these data. For the exploratory data analysis, PCA was used, whereas different models were tested for the discrimination of substances by partial least squares discriminant analysis (PLS-DA) and support vector machine discriminant analysis (SVM-DA). This last approach is based on the support vector machine (SVM) methodology, a non-linear strategy that has evolved in recent years as a valuable tool for the analysis of non-linear data [40][41][42]. In light of these results, some conclusions are drawn about the perspectives of a future methodology for the qualitative sensing of pharmaceutical products, mainly in natural waters and wastewaters.
Stock standard solutions of AA, UA, P, AS and C at 10⁻² mol L⁻¹ were prepared in ultrapure water (Milli-Q plus 185 system, Millipore), whereas 5 × 10⁻³ and 10⁻² mol L⁻¹ stock standard solutions of TR and IB, respectively, were prepared in absolute ethanol. All stock solutions were stored in a refrigerator at 4 °C, protected from light.
Britton-Robinson (BR) aqueous universal buffer solutions in the pH range of 2-12 consisted of a mixture of acetic acid, boric acid and phosphoric acid at a concentration of 0.04 mol L⁻¹ each, adjusted to the desired pH with 1 mol L⁻¹ KOH.

Apparatus
For differential pulse voltammetry (DPV), an Autolab System PGSTAT12 (EcoChemie, The Netherlands) attached to a VA Stand 663 (Metrohm, Switzerland) was employed. For data acquisition, GPES software version 4.9 (EcoChemie) was used.
The working electrode was a carbon screen-printed electrode (SPCE, 4 mm diameter) provided by Metrohm DropSens (Oviedo, Spain) and connected to the Autolab System by means of a flexible cable (ref. CAC, Metrohm DropSens). A Pt wire auxiliary electrode and an Ag/AgCl/KCl (3 mol L⁻¹) reference electrode were purchased from Metrohm (Switzerland).
A Crison micro pH 2000 was used for pH measurements.

Voltammetric measurements
DPV measurements using SPCE were carried out by scanning the potential from −0.2 to 1.6 V using a pulse time of 50 ms, a pulse amplitude of 100 mV, a step potential of 5 mV, and a scan rate of 10 mV s⁻¹. All experiments were performed without any oxygen removal.
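As a quick consistency check on these settings, the following sketch derives the number of potential steps and the approximate scan duration. It assumes the usual staircase convention in which the scan rate equals the step potential divided by the interval time; that convention and the function name are assumptions, not statements from the source.

```python
# Sketch: derive scan length and timing from the DPV settings quoted in the text.
# Assumption (not from the source): scan rate = step potential / interval time.
E_START_V = -0.2        # initial potential, V
E_END_V = 1.6           # final potential, V
STEP_V = 0.005          # step potential, V
SCAN_RATE_V_S = 0.010   # scan rate, V s^-1


def dpv_scan_summary(e_start, e_end, step, scan_rate):
    """Return the number of steps, points, interval time and total duration."""
    n_steps = round((e_end - e_start) / step)
    interval_s = step / scan_rate
    return {
        "n_steps": n_steps,
        "n_points": n_steps + 1,
        "interval_s": interval_s,
        "duration_s": n_steps * interval_s,
    }


summary = dpv_scan_summary(E_START_V, E_END_V, STEP_V, SCAN_RATE_V_S)
print(summary)
```

Under this assumption each voltammogram takes about three minutes, and the 50 ms pulse occupies a tenth of each 0.5 s interval.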
DPV measurements of each considered mixture of pharmaceutical products were carried out at three concentration levels (10, 25 and 50 mg L⁻¹) and at twenty pH values ranging from 2 to 12, using a different SPCE unit for each pH value assessed. In this sense, it should be mentioned that an SPCE unit could be used for a large set of measurements (more than 20), enabling the performance of the DPV measurements involved for a single pH value without loss of sensitivity [23]. In addition, every new SPCE unit was scanned in a 25 mg L⁻¹ control mixture at pH 5.7 to subsequently correct the possible variability derived from the performance of DPV measurements on different days and using different SPCE units.
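The text does not detail how the control-mixture measurement is used for correction. One plausible scheme, given here only as a hypothetical sketch, is to subtract from every peak potential the offset observed for a control peak between the current SPCE unit and a reference unit. The function name and all numerical values are illustrative assumptions.

```python
# Sketch of a possible electrode-to-electrode correction (assumed, not the authors' procedure).
# The offset of a control peak (25 mg/L mixture, pH 5.7) measured with the current
# SPCE unit, relative to a reference unit, is subtracted from all peak potentials.


def correct_peak_potentials(measured_v, control_this_unit_v, control_reference_v):
    """Shift peak potentials (V) by the offset seen in the control mixture."""
    offset = control_this_unit_v - control_reference_v
    return [e - offset for e in measured_v]


# Hypothetical values: this unit reads the control peak 7 mV too positive.
corrected = correct_peak_potentials([0.500, 0.900], 0.412, 0.405)
print([round(e, 3) for e in corrected])
```

A per-analyte offset could be used instead if the control peaks do not all shift by the same amount.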

Data treatment
Characteristic parameters of voltammetric signals (peak potential, peak height, peak area and half-peak width) were obtained from experimental voltammograms by means of GPES program by Autolab and also by using home-made programs developed for Matlab® environment [43]. In the chemometric analysis of data by PCA, PLS-DA and SVM-DA, PLS_Toolbox [44] (a toolbox especially designed for Matlab by Eigenvector) was employed.

Preliminary study
As mentioned in the introduction, seven pharmaceutical products were considered: ascorbic acid (AA), uric acid (UA), paracetamol (P), acetylsalicylic acid (AS), trimethoprim (TR), ibuprofen (IB) and caffeine (C). Three combinations of these substances were considered with the following compositions: (1) AA, P, AS, IB and C; (2) AA, P, AS, TR and C; and (3) AA, UA, P, AS, IB and C. Each mixture was measured at the three concentration levels (10, 25 and 50 mg L⁻¹) and at twenty pH values ranging from 2 to 12. Figure 1 shows the composition of all three mixtures and the position of the corresponding DPV peaks at a concentration and pH value where all signals are visible. Figure 2 shows that, as pH increases, all the peaks move to less positive potentials, but the potential shift can be quite different. Moreover, there is a general decrease of the peak height at extreme pH values. In fact, apart from paracetamol, the signals disappear at too low and, especially, too high pH values. As for the baseline, it increases dramatically at alkaline pH, distorting and hiding the peaks which appear at the most positive potentials (Figure 2). Depending on the pH value, some signals can overlap with each other or with the baseline, but most of the time they are reasonably separated. This allowed us to determine the characteristic parameters of the signals and study their evolution with pH. From the preliminary study of these parameters (not shown) it was clear that only the peak potential exhibited a reproducible variation that was very similar at all three concentration levels and in the three mixtures studied, as shown by Figure 3. The correction using the peak potentials measured with every new electrode in a control mixture at pH 5.7 slightly increased the reproducibility of peak potentials (differences lower than 10 mV in most cases, very low as compared to the evolution of several hundred mV along the pH range).
It was also clear that pH values higher than 9 produced too high baselines and caused the disappearance of most of the signals, so that they were discarded for further analysis.
From Figure 3 it is clear that the evolution of peak potentials with pH is a key parameter that can help to discriminate between signals of unknown substances which appear in the same potential region. In the next section, PCA, PLS-DA and SVM-DA methods will be applied for this purpose.

Discrimination of peaks from their evolution with pH
In order to investigate different discrimination strategies, the peak potentials of the signals were organized in a calibration matrix of 30 samples/peaks (in rows) and 14 pH values from 1.94 to 8.76 (in columns) and a validation matrix of 15 samples (in rows) and 14 pH values (in columns). Every set included data from all seven analytes and all three mixtures at different concentrations. Concerning data pretreatment, some usual strategies such as mean centering and autoscaling were tried, but they did not provide good results. Therefore, the original values of peak potentials were analyzed without any pretreatment. Only a few values at extreme pH had to be extrapolated by PLS_Toolbox for those analytes whose peaks disappeared under these conditions (e.g., IB). Firstly, PCA was used to examine initial patterns in the data. Figure 4 summarizes the results of the application of PCA to the calibration and validation sets. A PCA model was successfully built with two principal components (PC), the first one (PC1) explaining 99.55% of the data variance and the second one (PC2) explaining 0.43% of the variance. The loadings plot (Figure 4c) suggests that the first component represents the 'average' position of the peak in the potential axis, whereas the second component is essentially related to the evolution of the peak potential with pH. When we look at the scores plot (Figure 4a) this intuition is confirmed: the substances are ordered along the PC1 axis according to their 'average' peak potential or, in other words, with the same distribution along the potential axis that could be seen in the voltammograms of the mixtures (Figure 1). As for the second component, it separates the samples according to the signal evolution with pH, with the substances showing smaller potential shifts (e.g., IB and C) on the upper side of the plot and those showing larger potential shifts (e.g., AA, UA and P) on the lower side.
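The behavior described above (a dominant first component for the 'average' potential and a minor second component for the pH trend) can be reproduced with a minimal sketch of PCA without mean centering, computed here by singular value decomposition with NumPy. The data are synthetic and purely illustrative: the intercepts, slopes and noise level are hypothetical and are not the measured values of this work.

```python
# Sketch: PCA without pretreatment on a matrix of peak potentials (rows = peaks,
# columns = pH values). Synthetic, hypothetical data; not the data of this work.
import numpy as np

rng = np.random.default_rng(0)
ph = np.linspace(1.94, 8.76, 14)            # 14 pH values, as in the text

# Hypothetical (potential at pH 0 in V, slope in V per pH unit) for four substances.
params = [(0.60, -0.055), (0.90, -0.060), (1.30, -0.010), (1.45, -0.005)]
X = np.array([
    e0 + slope * ph + rng.normal(0.0, 0.002, ph.size)
    for e0, slope in params
    for _ in range(3)                        # three replicate peaks per substance
])

# Uncentered PCA: X = U S Vt; scores = U*S, loadings = rows of Vt.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
explained = S**2 / np.sum(S**2)
scores = U[:, :2] * S[:2]
loadings = Vt[:2]

print("variance explained by PC1 and PC2:", explained[:2])
```

Because the data are not centered, PC1 mainly reflects the overall magnitude of each row (the 'average' potential), and PC2 captures how strongly the potential drifts with pH. The signs of scores and loadings are arbitrary in an SVD.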
Although PC2 is far less important than PC1 in explaining the overall data variance (0.43% vs. 99.55%), it is very useful to separate along the vertical axis the scores of the substances with similar peak potentials and, hence, very close in the horizontal (PC1) axis (e.g., UA and P or IB and TR). As Figure 4a shows, the PCA model of two components allows a reasonable separation of the scores corresponding to the seven pharmaceutical products in the calibration set. Additionally, the application of the model (i.e., the PCA loadings) to the validation set produced scores for these products (Figure 4b) very close to those achieved by the same substances in the calibration step. The PCA of peak potentials allows a qualitative discrimination of substances, but it lacks a quantitative estimation of the probability of a peak belonging to a certain substance. This is just what PLS-DA can do. Figure 5 shows the results of applying this methodology to the same calibration and validation data sets used in PCA. This has been achieved with a set of seven PLS-DA models with four latent variables (LV) each. Every calibration model is made by assigning the value of 1 to the samples belonging to the class (e.g., peaks of ascorbic acid, AA) and 0 to the samples belonging to other predefined classes (i.e., peaks of other substances). As a result, the model predicts a value for every sample that, if it is close to 1, means that the sample belongs to the class and, if closer to zero, suggests that the sample does not belong to the class. For instance, Figure 5a shows the predictions of the AA model for the samples of the calibration set. The values of the AA samples (in red color) are quite close to 1, while the other samples are closer to zero. The dashed red line shows the threshold value, that clearly separates the samples belonging to the class (above the line) from the samples of other classes (below the line). 
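The dummy-coded regression behind a single PLS-DA class model can be sketched as follows. This is a simplified illustration and not the PLS_Toolbox implementation: it uses a plain NIPALS PLS1 algorithm without pretreatment, only two synthetic substances with hypothetical parameters, two latent variables instead of four, and a fixed threshold of 0.5 (an assumption; the toolbox computes its own threshold).

```python
# Sketch: one PLS-DA class model (class vs. rest) via NIPALS PLS1 on 0/1 targets.
# Synthetic, hypothetical data; threshold 0.5 is an assumption.
import numpy as np


def pls1_fit(X, y, n_components):
    """NIPALS PLS1 without centering; returns b such that y_hat = X @ b."""
    Xk = np.array(X, dtype=float)
    yk = np.array(y, dtype=float)
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)               # weight vector
        t = Xk @ w                           # scores
        tt = t @ t
        p = Xk.T @ t / tt                    # X loadings
        qk = (yk @ t) / tt                   # y loading
        Xk = Xk - np.outer(t, p)             # deflate X
        yk = yk - qk * t                     # deflate y
        W.append(w)
        P.append(p)
        q.append(qk)
    W = np.column_stack(W)
    P = np.column_stack(P)
    return W @ np.linalg.solve(P.T @ W, np.array(q))


def make_peaks(e0, slope, n, ph, rng):
    """Generate n noisy rows of peak potential vs. pH for one hypothetical substance."""
    return np.array([e0 + slope * ph + rng.normal(0.0, 0.002, ph.size) for _ in range(n)])


rng = np.random.default_rng(1)
ph = np.linspace(1.94, 8.76, 14)
# Two hypothetical substances with similar potentials but different pH slopes.
X_cal = np.vstack([make_peaks(0.60, -0.055, 6, ph, rng), make_peaks(0.65, -0.010, 6, ph, rng)])
y_cal = np.array([1.0] * 6 + [0.0] * 6)      # 1 = member of the class, 0 = other
X_val = np.vstack([make_peaks(0.60, -0.055, 3, ph, rng), make_peaks(0.65, -0.010, 3, ph, rng)])

b = pls1_fit(X_cal, y_cal, n_components=2)
pred_val = X_val @ b                         # values near 1 suggest class membership
in_class = pred_val > 0.5
print(np.round(pred_val, 2), in_class)
```

With seven substances, one such model would be built per class, each with its own threshold.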
Figure 5b shows the predictions for the AA samples of the validation set, which are also very good. In both cases, several parameters can be computed to quantify the accuracy of the classification. The most interesting ones are the sensitivity (positive samples identified / total positive samples), the specificity (negative samples identified / total negative samples) and the classification error (percentage of wrongly classified samples as compared to the total number of samples). The corresponding values for the AA model are 1.0, 1.0 and 0%, which are perfect. Table 1 summarizes these and other relevant parameters for the seven substances considered. As can be seen, not all substances are equally well classified by the PLS-DA models. For instance, relatively high classification errors can be found for substances like P, AS or C, whose peaks may occasionally overlap with the baseline or with neighboring peaks. A good evaluation of the overall classification ability of the set of PLS-DA models is the assignment of the samples of a data set to the predefined classes. If the strict assignment criterion is used (Figure 5c,d), only the samples clearly surpassing the threshold are assigned and, hence, several samples remain unassigned. This produces a relatively high overall prediction error (Table 1), especially in the validation set (46.7%). In contrast, the assignment of absolutely all the samples to the most probable class (Figure 5e,f) produces much better results (0% in the calibration set and 20% in the validation set).
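The figures of merit and the two assignment criteria discussed above can be summarized in a short sketch. The function names and example values are hypothetical. The strict rule is implemented here as "exactly one class model exceeds its threshold", which is an assumption about how the criterion operates.

```python
# Sketch: sensitivity, specificity, classification error and the two assignment rules.
# Names and example values are hypothetical.


def class_metrics(is_member, predicted_member):
    """Figures of merit for one class model (needs at least one positive and one negative)."""
    pairs = list(zip(is_member, predicted_member))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "error_percent": 100.0 * (fp + fn) / len(pairs),
    }


def assign_strict(predictions, thresholds):
    """Return the class only if exactly one model exceeds its threshold, else None."""
    above = [c for c, value in predictions.items() if value > thresholds[c]]
    return above[0] if len(above) == 1 else None


def assign_most_probable(predictions):
    """Always return the class with the highest predicted value."""
    return max(predictions, key=predictions.get)


metrics = class_metrics(
    [True, True, False, False, False],
    [True, False, False, False, True],
)
thresholds = {"AA": 0.5, "UA": 0.5, "P": 0.5}
clear = {"AA": 0.93, "UA": 0.10, "P": 0.05}
ambiguous = {"AA": 0.30, "UA": 0.40, "P": 0.20}
print(metrics)
print(assign_strict(clear, thresholds), assign_strict(ambiguous, thresholds))
print(assign_most_probable(ambiguous))
```

The ambiguous sample stays unassigned under the strict rule but is assigned under the most-probable rule, which is why the latter leaves no sample unclassified.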
As already pointed out, PLS-DA is a linear method, which means that it is based on the hypothesis that all the electroactive substances contribute linearly to the signals. Unfortunately, this is not always true in voltammetric measurements. As this could be the source of some weaknesses of the PLS-DA approach, we decided to try a more sophisticated, non-linear method such as SVM-DA, also implemented in PLS-Toolbox. Despite its higher complexity, SVM-DA provides results that can be visualized in a similar way as before. Thus, Figure 6a,b show the probability that the calibration and validation samples belong to the AA class (quite high, in agreement with the high values predicted in Figure 5a   The qualitative comparison of the plots in Figures 5 and 6 and, especially, the values of the error parameters summarized in Table 1, confirm a significantly better classification when SVM-DA is used (0% of error for both calibration and validation samples when the most probable classification is used).

Conclusions
The present work shows that the determination of the peak potentials of DPV signals at different pH values using commercial screen-printed electrodes could be used to discriminate unknown pharmaceutical products, mainly in natural water and wastewater samples but also in food products cultivated using reclaimed water, if the data are processed with a chemometric method of discriminant analysis. For this purpose, PCA is a convenient exploratory tool, PLS-DA provides a reasonable classification and SVM-DA appears to be a very promising strategy for the discrimination or even the identification of pharmaceutical products, as it seems to be less affected by the inherent non-linearity of voltammetric data.
Taking into account the large number of pharmaceutical products that are oxidized in the potential region considered in this work, it does not seem feasible to unequivocally identify individual substances in real samples by using this methodology. Instead, it seems more convenient to determine peak potentials of relevant pharmaceutical products as a function of pH and build an extensive database with them. The use of an internal standard like paracetamol (with a clearly visible signal even at extreme pH values) could help to minimize the variability of peak potentials due to secondary phenomena caused by the presence of other electroactive substances or by matrix effects. Then, the database could be used to build discrimination models (PLS-DA or SVM-DA) to establish groups of substances of similar behavior and assign unknown analytes to these groups. The application of the model to unknown samples would only require the DPV measurement of the sample in the presence of the internal standard at several pH values, which can be achieved with successive additions of a KOH solution to the acidic form of the Britton-Robinson buffer.