In Silico Infrared Spectroscopy as a Benchmark for Identifying Seized Samples Suspected of Being N-Ethylpentylone

Abstract: New psychoactive substances (NPSs) have concerned authorities worldwide, and monitoring them has become increasingly complex. In addition to the frequent emergence of new chemical structures, the composition of adulterants has changed rapidly. Reliable reference data on NPSs are not always available, and identifying them has become an operational problem. In this study, we evaluated the infrared spectral data of 68 seized samples suspected of containing a synthetic cathinone (N-ethylpentylone). We used quantum chemistry tools to simulate infrared spectra as a benchmark and obtained infrared spectra for different cathinones, structurally analogous amphetamines, and possible adulterants. We employed these in silico data to construct different chemometric models and investigated the internal and external validation and classification requirements of the models. We applied the best models to predict the classification of the experimental data, which showed that the seized samples did not have a well-defined profile. Infrared spectra alone did not allow N-ethylpentylone to be distinguished from other substances. This study enabled us to evaluate whether experimental, in silico, and applied statistical techniques help to promote forensic analysis for decision-making. The seized samples required in-depth treatment and evaluation so that they could be correctly analyzed for forensic purposes.


Introduction
New psychoactive substances (NPSs) are compounds that, in their pure forms or in mixtures, are not controlled by the 1961 Single Convention on Narcotic Drugs or the 1971 Convention on Psychotropic Substances. Because they are not controlled, they have become a threat to public health in many countries [1][2][3]. Some NPSs have been known for over 30 years, so the term "new" does not necessarily refer to novel drugs but rather to drugs that have recently become a threat to public health [3,4]. NPSs include different classes of compounds, and synthetic cathinones (SCs) are the second-most detected class. Between 2009 and 2021, the United Nations Office on Drugs and Crime (UNODC) reported 1127 NPSs and identified 17.8% (201) of them as SCs [1,3]. In Brazil, N-ethylpentylone is the most seized SC; therefore, its correct identification is important for law enforcement agencies.
The rapid emergence of these different structures demands techniques that provide information quickly and nondestructively to aid decision-making. Spectroscopy has proven suitable for detecting and identifying NPSs [5]. Spectroscopic techniques (transmission, absorption, or reflection) are based on the interaction of matter with radiant energy [6]. Among them, near- and mid-infrared (NIR and MIR, respectively) spectroscopy stands out: both are easy to implement and have short response times [7][8][9].
The samples were ground and homogenized before being inserted into the attenuated total reflectance (ATR) accessory. The FTIR spectra were recorded on a Thermo Fisher Scientific Nicolet 380 spectrometer (Madison, WI, USA) equipped with a Smart Orbit (Madison, WI, USA) single-reflection diamond ATR crystal. However, the ATR accessory is known to distort the spectrum: bands at higher wavenumbers are "undersized", and bands at lower wavenumbers are "oversized" [8]. Thus, the spectra were ATR-corrected by using OMNIC software version 7.2a (Thermo Electron Corporation, now Thermo Fisher Scientific, Madison, WI, USA). To reduce noise, the average spectrum of each sample, obtained after 32 readings, was evaluated.
A Gaussian distribution was employed to ensure that the experimental data and the simulated data had the same dimension. The spectra ranged from 4000 to 525 cm⁻¹ with a resolution of 2 cm⁻¹. The generated data were organized in a spreadsheet (Excel, Microsoft Office 365, Redmond, WA, USA), and their resolution was resized to 4 cm⁻¹ to be consistent with routine analysis. These data later underwent statistical analysis and prediction. The variables were not pretreated.

Part II-Computational Analyses
Different NPSs were simulated, namely N-ethylpentylone; N-ethylpentylanoamine (an amphetamine homologous to N-ethylpentylone); 21 amphetamines (Figure 2A); 21 cathinones homologous to the amphetamines (Figure 2B), available in Table S1 of Supplementary Materials; and 12 other possible interferents (creatine, phenacetin, lidocaine, LSD, paracetamol, procaine, psilocin, 25H-NBOMe, benzocaine, caffeine, metoclopramide, and theobromine). On the basis of previous results [57], the B3LYP DFT functional [58] with the TZVP basis set [59][60][61], implemented in ORCA software version 4.2.1 [62], was selected for conducting this stage of the work and describing the structures. This combination was used to optimize the structures and to obtain the frequencies. For the analyses, chloroform was considered as an implicit solvent through the conductor-like polarizable continuum model (CPCM) [63,64]. The theoretical data were corrected against the experimental data with the scaling factor for the selected method, 0.9650 [65].
After the calculation, the data referring to the spectra in the infrared region were organized in a spreadsheet (Excel, Microsoft Office 365). As in the case of the experimental data, a Gaussian distribution was applied to fit the simulated data from 4000 to 525 cm⁻¹, with a resolution of 4 cm⁻¹. This approach allowed the spectra to be combined to obtain a set of mixtures that served as input for creating the statistical models.
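The scaling and Gaussian fitting steps described above can be sketched as follows. This is a minimal illustration and not the spreadsheet procedure used in the study: the function name `broaden` and the band width `fwhm` are hypothetical (the text does not report a width), and only the scaling factor 0.9650 and the 4000 to 525 cm⁻¹ range at 4 cm⁻¹ come from the text.

```python
import math

def broaden(freqs, intensities, grid, scale=0.9650, fwhm=20.0):
    """Place one Gaussian band per scaled harmonic frequency on a wavenumber grid.

    scale: scaling factor quoted in the text (0.9650).
    fwhm:  full width at half maximum in cm^-1 (hypothetical value).
    """
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    spectrum = []
    for x in grid:
        total = 0.0
        for nu, inten in zip(freqs, intensities):
            centre = scale * nu  # scaled harmonic wavenumber
            total += inten * math.exp(-((x - centre) ** 2) / (2.0 * sigma ** 2))
        spectrum.append(total)
    return spectrum

# Grid from 4000 cm^-1 downwards in 4 cm^-1 steps; the last point is 528,
# the lowest multiple-of-4 step that is still above 525.
grid = list(range(4000, 524, -4))

# Hypothetical single band at a harmonic frequency of 1750 cm^-1.
spectrum = broaden([1750.0], [100.0], grid)
peak_position = grid[spectrum.index(max(spectrum))]
```

With these inputs the band maximum falls on the grid point closest to 0.9650 × 1750 = 1688.75 cm⁻¹.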

Part III-Statistical Analyses
Statistical analyses and/or data validation were used in each previous step. This approach ensured the practical construction of forecast models that were later applied to the samples of interest. A similarity analysis was carried out by employing a heat map as a tool to understand the possible divergences between the compounds present in the seized samples [66]. The 68 samples that were experimentally analyzed by ATR-FTIR contained heterogeneous errors that were not constant, because the synthesis of the compounds was not standardized. Because of the complexity of these samples, evaluation started with a simple exploratory analysis. In this step, the intensities at each wavenumber were evaluated by using Pearson's correlation coefficient (Equation (1)):

$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}} \quad (1)$$

where x_i and y_i are the intensities of the two compared spectra at the i-th wavenumber, and x̄ and ȳ are their means. This coefficient quantifies the strength of the linear relationship between two variables. If r = 1, the relationship is positive and the variables are directly proportional. If r = −1, the variables are inversely proportional [66]. The responses obtained through this correlation were analyzed by employing a heat map and a histogram to quantify the occurrences of correlation between the spectra and by using separatrices (quartiles). These analyses were performed in a spreadsheet (Excel, Microsoft Office 365).
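As an illustration of this exploratory step, the sketch below computes Pearson's correlation coefficient between two spectra and the pairwise matrix that a heat map would display. The function names and the toy intensity vectors are hypothetical; the study performed these calculations in a spreadsheet.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient between two equally long intensity vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(spectra):
    """Pairwise correlations between all spectra (the values shown in a heat map)."""
    return [[pearson(a, b) for b in spectra] for a in spectra]

# Toy spectra (hypothetical intensities at three wavenumbers).
spectra = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
matrix = correlation_matrix(spectra)
```

In this toy case the first two spectra are directly proportional (r = 1), and the third is inversely proportional to the first (r = −1).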

Models with the Computational Data
For the computational analysis step, the obtained data were used to formulate models to investigate whether N-ethylpentylone was present in the seized samples. The partial least squares discriminant analysis (PLS-DA) technique was employed to develop these models.
PLS-DA performs classification through multivariate regression, in which the dependent variable consists of a vector that divides the samples into target classes (numerically identified as +1 and −1). PLS-DA decomposes the data matrix X, which contains n samples and m variables. In this case, the independent variables are the computationally obtained data corresponding to the vibrations in the infrared region. Decomposition of the data matrix gives rise to new axes (latent variables) that gather information from the original variables [5,33]. This multivariate classification analysis was carried out in Pirouette software version 4.5 (Infometrix, Bothell, WA, USA).
Leave-one-out (LOO) cross-validation was applied in this stage of model construction [67,68]. LOO cross-validation consists of removing a sample and then reassessing the model to predict the sample that was removed. This occurs systematically until all the samples have been subjected to the process. The parameters used to evaluate the PLS-DA results are Q², the internal correlation coefficient of the cross-validation model (Equation (2)); R², the correlation coefficient for calibration (Equation (3)); RMSEV, the root mean square error of validation (Equation (4)); and RMSEC, the root mean square error of calibration (Equation (5)).
Psychoactives 2023, 2
These parameters are evaluated independently and through the relations R² > Q² and RMSEC < RMSEV [69]. More specifically, a high R² value means that the model leaves little of the data variability unexplained, which results in a low root mean square error of calibration (RMSEC).
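Equations (2) to (5) are not reproduced in this text, so the sketch below uses commonly adopted forms of these parameters as an assumption: R² and Q² as one minus the ratio between the residual and total sums of squares (from fitted and cross-validated predictions, respectively), and RMSEC and RMSEV as root mean square errors with n in the denominator. The original equations may differ, for instance in the degrees of freedom. All names and numerical values are hypothetical.

```python
import math

def determination(y, y_hat):
    """1 - SS_res / SS_tot; gives R2 with fitted values and Q2 with cross-validated values."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def rmse(y, y_hat):
    """Root mean square error; RMSEC with fitted values, RMSEV with cross-validated values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

# Hypothetical class vector (+1 / -1) with fitted and cross-validated predictions.
y = [1.0, 1.0, -1.0, -1.0]
y_fit = [0.9, 0.8, -0.9, -0.7]
y_cv = [0.7, 0.6, -0.8, -0.4]

r2, q2 = determination(y, y_fit), determination(y, y_cv)
rmsec, rmsev = rmse(y, y_fit), rmse(y, y_cv)

# Relations required in the text: R2 > Q2 and RMSEC < RMSEV.
relations_hold = r2 > q2 and rmsec < rmsev
```

For these toy values, R² = 0.9625 and Q² = 0.8375, so both relations hold.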
After the models were constructed by using the computationally obtained spectra, the datasets were evaluated through internal and external validation, which started with leave-N-out (LNO) cross-validation, where N varied from 1 to 10 for all the models. The mean (Q²_LNO, N = 10), the variation between LOO and LNO, and the standard deviation were evaluated. For the model to be robust, the difference between Q²_LNO and Q²_LOO must be minimal [70]. For external validation, the computationally obtained datasets were divided into submodels by applying the Kennard-Stone algorithm [71], available in Dataset Division 1.2 software (Jadavpur, Kolkata, West Bengal, India). This method calculates the Euclidean distances within the dataset and first selects the two most distant samples, which are the most dispersed in space; the remaining samples are then selected in relation to them. We assigned 75% of the original data to one subgroup and 25% to another, so the algorithm generated training and test sets. This approach assesses the prediction ability of the models, as well as their stability and robustness [72,73].
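The Kennard-Stone division described above can be sketched as follows. This is a minimal illustration of the classic algorithm under the assumption that Dataset Division 1.2 follows it; the function name and the toy one-variable dataset are hypothetical.

```python
import math

def kennard_stone(samples, n_train):
    """Return (train_indices, test_indices) chosen by the Kennard-Stone algorithm.

    Requires 2 <= n_train <= len(samples).
    """
    n = len(samples)
    dist = [[math.dist(a, b) for b in samples] for a in samples]
    # Start with the two samples that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [k for k in range(n) if k not in selected]
    # Repeatedly add the sample whose nearest selected neighbour is farthest away.
    while len(selected) < n_train:
        candidate = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(candidate)
        remaining.remove(candidate)
    return selected, remaining

# Toy dataset with one variable per sample; 75% of 4 samples go to training.
samples = [[0.0], [1.0], [2.0], [10.0]]
train, test = kennard_stone(samples, n_train=round(0.75 * len(samples)))
```

Here samples 0 and 3 are selected first because they are the most distant pair, sample 2 follows, and sample 1 is left for the test set.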

Assessing the Predictive Capacity of Models
The models built with the computationally obtained data (validated internally and externally) were analyzed to evaluate their performance in terms of sensitivity, specificity, precision, and accuracy. Experimental infrared data from monographs available in the SWGDRUG library version 2.1 (available at https://www.swgdrug.org/ir.htm, JCAMP Format, accessed on 21 March 2022) were used. The monographs of 20 amphetamines and 20 cathinones (available in Table S2 of Supplementary Materials) were selected because these compounds are analogous to the compound of interest in the study. All these new data were submitted to the same procedure of resizing the wavenumber axis to adjust the matrix for the chemometric processes. The models were used to evaluate not only the NPSs but also 20 adulterants with similar structures (available in Table S3 of Supplementary Materials).
In this evaluation, the values related to each class, either +1 or −1, were employed as cutoff limits: samples with a positive predicted value were considered to belong entirely to the class indicated as +1, and samples with a negative predicted value were considered to belong entirely to the class indicated as −1. The cutoff limit was used because it allowed the responses to be evaluated through figures of merit (FOMs).
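The FOMs were sensitivity (Equation (6)), specificity (Equation (7)), precision (Equation (8)), and accuracy (Equation (9)):

$$\text{Sensitivity} = \frac{TP}{TP + FN} \quad (6)$$

$$\text{Specificity} = \frac{TN}{TN + FP} \quad (7)$$

$$\text{Precision} = \frac{TP}{TP + FP} \quad (8)$$

$$\text{Accuracy} = \frac{TN + TP}{TN + FP + TP + FN} \quad (9)$$

where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively. This analysis enables the qualitative methods to be validated and ensures better reliability for the statistical procedure applied to the samples of interest. Thus, this evaluation more safely indicates the probability that a new sample (with characteristics resembling those used to create the model) belongs or not to the classes [5].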
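A minimal sketch of how the figures of merit follow from the sign-based cutoff is given below. The function name, the toy predictions, and the choice to treat a predicted value of exactly zero as negative are assumptions made for illustration.

```python
def figures_of_merit(y_true, y_pred):
    """Sensitivity, specificity, precision, and accuracy for +1 / -1 classes.

    A prediction is assigned to class +1 when its value is positive and to
    class -1 otherwise (zero is treated as negative here, by assumption).
    Raises ZeroDivisionError if one of the denominators is empty.
    """
    tp = fp = tn = fn = 0
    for truth, value in zip(y_true, y_pred):
        predicted_positive = value > 0
        if truth > 0 and predicted_positive:
            tp += 1
        elif truth > 0:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }

# Hypothetical true classes and PLS-DA predicted values.
y_true = [1, 1, 1, -1, -1, -1, -1, 1]
y_pred = [0.8, 0.3, -0.2, -0.9, 0.1, -0.4, -0.7, 0.6]
foms = figures_of_merit(y_true, y_pred)
```

In this toy case there are three true positives, three true negatives, one false positive, and one false negative, so all four figures of merit equal 0.75.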

Predicting the Samples of Interest
After the models were validated and their precision was evaluated, they were applied to the experimental data of the 68 samples seized by the Federal Police and suspected of being N-ethylpentylone. Figure 3 summarizes the entire experimental procedure used here.

Part I-Experimental Analyses
The analyzed samples had been seized by the Brazilian Federal Police in 2017. They were initially identified as possibly being the synthetic cathinone known as N-ethylpentylone because their spectral profile resembled the spectral profile of N-ethylpentylone in the monograph available in the SWGDRUG library (Figure 4) [77]. The seized samples initially had labels on their packages indicating that they were food for dogs and cats (Figure 5), with different delivery destinations.

We analyzed the seized samples by spectroscopy in the infrared region without applying any analytical preparation to obtain the spectra. We ground, homogenized, and inserted the samples into the ATR accessory. We collected spectra from 4000 to 525 cm⁻¹, with a spectral resolution of 4 cm⁻¹. To reduce noise, we used the average spectra obtained after 32 readings, a standard protocol used by the Brazilian Federal Police.
We did not use the 400-525 cm⁻¹ range, because the diamond employed in the equipment absorbs in this region. We analyzed 68 samples and obtained the average spectrum of each of them. Figure 6 overlays these average spectra obtained by the described methodology. We will use the nomenclature indicated in this figure throughout sample processing.
We observed that the samples had similar spectra. We used descriptive statistics, Pearson's correlation coefficient (Equation (1)), to assess the initial similarity of these data [78].
Additionally, we employed the peaks between 800 and 1700 cm⁻¹ and between 2788 and 3480 cm⁻¹ for this evaluation because the spectra had these regions in common. We removed the remaining regions from the data because they could introduce doubt into the responses if we were to use them [79]. Figure 7 illustrates the relationships between the experimentally evaluated samples (available in Table S4 of Supplementary Materials).
Figure 7 shows both very similar and divergent samples. No comparison was less than zero, indicating that all the samples were positively correlated. More objectively, Figure 8 summarizes these comparisons through a histogram. The red line represents the cumulative variation of these occurrences for each segment of the evaluated range. We observed that most samples showed moderate similarity (0.5-0.6), and a few samples, less than 20%, had high likeness.
This analysis showed that the experimental spectra of the samples suspected of containing N-ethylpentylone and the spectrum available for N-ethylpentylone in the SWGDRUG library of monographs had some similarity. However, we did not evaluate the samples for possible interferences or the presence of other compounds that may have the same response. Figure 9 indicates the correlation between the spectra of the 68 samples seized by the Federal Police and the reference spectrum for N-ethylpentylone deposited in the SWGDRUG library of monographs. On the basis of Figure 9, the values ranged from 0.09 to 0.80. More specifically, about 83% (57 of 68) of the samples had values between 0.20 and 0.75. This wide range indicated uncertainties and the need for a detailed analysis of which samples were similar or different [80]. We explored this range (Figure 9) further on the basis of the data presented in Table 1, which gathers the numerical values for the quartiles and allows the dataset to be analyzed more accurately.
From the analysis summarized in Table 1, 50% of the dataset correlated up to 0.6856, and 75% of the evaluated spectra correlated up to 0.7304. Given the need for analytical standards, we used a spectral survey through computational analysis to identify which spectra could contain N-ethylpentylone.

Part II-Computational Analyses
We simulated 21 amphetamines, 21 cathinones [45], and 12 adulterants. We obtained all these results by using chloroform as the implicit solvent and the B3LYP DFT functional with the TZVP basis set. All the optimized structures had only positive vibrational frequencies, which confirmed that each structure was in a conformation of minimum energy [81].
Because the theoretical spectra rely on the harmonic approximation and do not consider anharmonic effects, the wavenumbers obtained in the simulation were shifted [82]. To correct these responses, we used the scaling factor 0.9650, available from NIST (National Institute of Standards and Technology) [65].
We applied the Gaussian distribution [83] to the obtained frequency data. This approach fitted the simulated data to the dimension of the experimental data. For this, we used the wavenumbers from 520 to 1700 cm⁻¹, from 2788 to 3156 cm⁻¹, and from 3316 to 3480 cm⁻¹, with a resolution resembling the resolution of the experimental data, totaling 431 wavenumbers.
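The count of 431 variables can be checked with a short calculation. The sketch assumes a 4 cm⁻¹ step with both endpoints of each window included, which is consistent with the experimental resolution reported earlier; the function name is hypothetical.

```python
def wavenumber_grid(windows, step=4):
    """Concatenate the wavenumbers of each (low, high) window, endpoints included."""
    grid = []
    for low, high in windows:
        grid.extend(range(low, high + 1, step))
    return grid

# Spectral windows quoted in the text, in cm^-1.
windows = [(520, 1700), (2788, 3156), (3316, 3480)]
grid = wavenumber_grid(windows)
points_per_window = [len(range(low, high + 1, 4)) for low, high in windows]
```

The three windows contribute 296, 93, and 42 points, respectively, which add up to the 431 variables used in the models.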
Additionally, we combined the spectra of each pair of simulated compounds to acquire the resulting spectrum of the mixture. The proposal was to simulate the addition of adulterants to the spectra of the different NPSs studied here. This combination allowed us to construct data matrices for use during chemometric analyses. Table 2 shows the resulting mixtures.
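One way to picture the construction of these mixture spectra is sketched below. The text does not state how the two spectra were weighted, so the point-by-point weighted sum and the default equal proportion are assumptions, and all names and intensities are hypothetical.

```python
from itertools import product

def mix(spectrum_a, spectrum_b, fraction_a=0.5):
    """Point-by-point weighted sum of two spectra on the same wavenumber grid.

    fraction_a is a hypothetical mixing proportion; the source does not report one.
    """
    if len(spectrum_a) != len(spectrum_b):
        raise ValueError("spectra must share the same wavenumber grid")
    return [fraction_a * a + (1.0 - fraction_a) * b
            for a, b in zip(spectrum_a, spectrum_b)]

def build_mixtures(nps, adulterants, fraction_nps=0.5):
    """Simulate every NPS/adulterant pair and return a dict keyed by the pair of names."""
    return {
        (nps_name, adulterant_name): mix(nps[nps_name], adulterants[adulterant_name], fraction_nps)
        for nps_name, adulterant_name in product(nps, adulterants)
    }

# Toy simulated spectra (hypothetical intensities at three wavenumbers).
nps = {"nps_1": [1.0, 0.0, 2.0], "nps_2": [0.0, 4.0, 0.0]}
adulterants = {"ad_1": [3.0, 2.0, 0.0], "ad_2": [0.0, 0.0, 2.0], "ad_3": [1.0, 1.0, 1.0]}
mixtures = build_mixtures(nps, adulterants)
```

Each row of a data matrix would then correspond to one simulated pair; two NPSs and three adulterants give six mixtures in this toy case.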

Part III-Statistical Analysis
We used the combinations shown in Table 2 to create the PLS-DA models. More specifically, the data matrices contained 431 variables (wavenumbers), referring to the computationally obtained spectroscopic results. We grouped the values obtained for each of the models in Table 3. On the basis of the results in Table 3, the parameters varied for each model. The number of principal components varied from 3 to 7, with accumulated information from 67.9705% (M5) to 94.6613% (M9). The correlation coefficient for calibration (R²) ranged from 0.7264 (M6) to 0.9624 (M4). The other parameters obeyed the relationships established for Equations (2) to (5). Furthermore, the difference between the values of R² and Q² varied between 0.0170 (M2) and 0.7133 (M5) [84,85].
We applied the leave-N-out (LNO) test to assess the stability of the cross-validation coefficient (Q²) of the models and grouped the results on Q²_LNO variations in Table 4. The responses for Q²_LNO ranged from 0.0712 (M5) to 0.9035 (M4), with standard deviations of 0.0506 and 0.0170, respectively. LOO and LNO differed by 0.0002 (M1 and M12) to 0.0525 (M6) units. This information allowed us to observe the stability of the models, given that a high standard deviation and a large difference between Q²_LOO and Q²_LNO indicate internal problems in the validated dataset. Thus, we conducted external validation to evaluate the robustness of the models and how well adjusted they were. Table 5 gathers the parameters that the submodels obtained by the Kennard-Stone method provided for this validation. The R² values ranged from 0.9196 (M1) to 0.0008 (M6), indicating that the models with the lowest correlation were poorly adjusted to the external dataset. Moreover, the standard error of prediction (SEP) and predicted residual error sum of squares (PRESS) values also provided information about the modeling error: the higher the value, the greater the residual error.
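The LNO stability check described above reduces to three summary values per model, as sketched below. The Q² values are hypothetical, the function name is illustrative, and the use of the sample standard deviation is an assumption because the text does not state which estimator was used.

```python
import statistics

def lno_summary(q2_by_n):
    """Summarize leave-N-out results for one model.

    q2_by_n maps N to the Q2 obtained with leave-N-out; N = 1 is leave-one-out.
    Returns the mean Q2_LNO, its sample standard deviation, and |mean - Q2_LOO|.
    """
    values = list(q2_by_n.values())
    mean_q2 = statistics.mean(values)
    spread = statistics.stdev(values)
    difference = abs(mean_q2 - q2_by_n[1])
    return mean_q2, spread, difference

# Hypothetical Q2 values for N = 1, 2, 3 (the study used N from 1 to 10).
q2_by_n = {1: 0.90, 2: 0.92, 3: 0.88}
mean_q2, spread, difference = lno_summary(q2_by_n)
```

A small spread and a small difference from the LOO value, as in this toy case, would point to a stable cross-validation coefficient.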
We validated the predictions on the basis of the consistency between the evaluated and validated (both internally and externally) models and the dataset obtained by the simulation. For this approach, we used experimental spectra in the infrared region of known structures available on the SWGDRUG platform. The procedure consisted of using the models to predict the classification of known amphetamines (20), cathinones (20), and adulterants (20). We used the figures of merit to evaluate these responses. The logical tests applied to the predictions provided information about the sensitivity, specificity, precision, and accuracy of each model (Equations (6) to (9)). Table 6 gathers the results of these tests and the robustness percentages. We observed that models M1, M2, M3, M4, M7, and M12 presented values above 85% for sensitivity, specificity, precision, and accuracy. The other models proved to be either very specific and not very sensitive or very sensitive and not very specific. Furthermore, these other models did not have adequate accuracy.
Because N-ethylpentylone is the structure of interest in this work, we included its spectrum, made available by the SWGDRUG library of monographs, in the group of cathinones used to evaluate the predictive capacity of the models with known NPS structures. Table 7 combines the values and classes predicted for N-ethylpentylone by the models.

Table 7. Prediction value for the experimental spectrum of N-ethylpentylone in the different models developed in this work and the class indicated as correct (columns: Model, Forecast).

Only two models (M6 and M8) were not able to predict the spectrum of N-ethylpentylone itself or a homologous cathinone correctly. The other models predicted the correct class.
With the same aim of evaluating the predictive capacity of the models, we used the experimental data available in the SWGDRUG library of monographs for adulterants. We expected that the models would not indicate the presence of NPSs, because the data concerned only adulterants. However, the models were not able to classify the 20 infrared spectra of the adulterants correctly, as illustrated in Figure 10.
Figure 10. Prediction of the spectra in the infrared region of 20 adulterants obtained from the SWGDRUG library of monographs. The black bars group together the ratings for the model that contained the 21 amphetamines (+1) and cathinones (−1) mixed with adulterants (model M2). The blue bars gather the predictions by using the models containing only the pure adulterants (+1) and combine them with N-ethylpentylone (−1) (model M5). Finally, the red bars illustrate the predictions by

For the most part, the adulterants were misclassified: the models indicated that they belonged to classes that contained some NPSs and not just individual adulterants. We observed that models M5 and M6 correctly predicted only samples d01, d10, and d17. Model M1 (in black) placed five possible adulterants (d01, d06, d07, d11, and d16) in the class of amphetamines mixed with adulterants (positive values in this model) and 15 possible adulterants in the class of cathinones with adulterants (negative values in this model).
Finally, Table 8 summarizes all the information obtained during the performed validations and evaluations. The symbol indicates that the modeling or validation parameters or the figures of merit were obeyed. If the parameters were not reached, they are shown with the symbol x. Table 8. Summary of evaluations carried out on the basis of the modeling parameters, validations, and figures of merit for the developed models.

PLS-DA Models | Cross-Validation | External Validation | Figure of Merit
Cells in red represent models that did not meet the minimum assessment requirements; those in green indicate the models that satisfactorily met the evaluated parameters.
Models M1 and M2 presented the best responses for all the analyses. We understand that external validation is a crucial step in evaluating the dataset. Therefore, we disregarded models M3 and M4 in the subsequent analyses.

Forecasting Seized Samples by PLS-DA
On the basis of the modeling carried out with the computationally obtained data (Table 3), the internal (Table 4) and external (Table 5) validations, and the analysis of the figures of merit (Table 6), we submitted the experimental data from samples seized by the Brazilian Federal Police to prediction. In the first stage of this work, we analyzed 68 samples suspected of containing N-ethylpentylone (Figure 6). To reduce the rate of false positives and false negatives, we used only models with adequate sensitivity, specificity, precision, and accuracy for the prediction, as indicated in Table 8.
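For reference, the four figures of merit named above follow directly from the counts of a two-class confusion matrix. The sketch below is a minimal illustration that assumes the +1/−1 class coding used for the models; the function name and the example labels are hypothetical and are not data from this study.

```python
import numpy as np

def figures_of_merit(y_true, y_pred, positive=1):
    """Sensitivity, specificity, precision, and accuracy for two classes."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == positive) & (y_pred == positive)))
    tn = int(np.sum((y_true != positive) & (y_pred != positive)))
    fp = int(np.sum((y_true != positive) & (y_pred == positive)))
    fn = int(np.sum((y_true == positive) & (y_pred != positive)))

    def ratio(num, den):
        # Return NaN instead of failing when a class is absent
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }

# Hypothetical labels: +1 = amphetamine class, -1 = cathinone class
y_true = [1, 1, 1, 1, -1, -1, -1, -1]
y_pred = [1, 1, 1, -1, -1, -1, -1, 1]
fom = figures_of_merit(y_true, y_pred)
print(fom)
```

A false positive here is a cathinone-class spectrum assigned to the amphetamine class, and a false negative is the reverse, so the choice of the positive class should be stated whenever these values are reported.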
We predicted the classes of these samples and interpreted the results according to the following principle: the more positive or negative the predicted value, the greater the correspondence with the class indicated in that model. Figure 11 summarizes the ranking results for the two best models (numerical values in Table S5 of the Supplementary Materials).
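The sign rule described above can be illustrated with a small two-class PLS-DA. The sketch below is not the implementation used in this study: it fits a PLS1 regression (NIPALS) on the class codes +1 and −1 with NumPy and reads the sign and magnitude of the predicted value. The synthetic Gaussian-band "spectra", the band positions, and all function names are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components=2):
    """Fit PLS1 by NIPALS; return the means and the regression vector."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)          # weight vector
        t = Xr @ w                      # scores
        tt = t @ t
        p = Xr.T @ t / tt               # X loadings
        qa = yr @ t / tt                # y loading
        Xr = Xr - np.outer(t, p)        # deflate X
        yr = yr - qa * t                # deflate y
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, coef

def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean

def band(x, center, width=30.0):
    return np.exp(-0.5 * ((x - center) / width) ** 2)

rng = np.random.default_rng(0)
wavenumbers = np.linspace(400.0, 4000.0, 300)

def simulate(label, n):
    # Shared band at 1250 cm^-1; class-specific band at 1500 (+1) or 1700 (-1)
    center = 1500.0 if label == 1 else 1700.0
    base = band(wavenumbers, 1250.0) + band(wavenumbers, center)
    return base + 0.02 * rng.standard_normal((n, wavenumbers.size))

X_train = np.vstack([simulate(1, 10), simulate(-1, 10)])
y_train = np.array([1.0] * 10 + [-1.0] * 10)
model = pls1_fit(X_train, y_train, n_components=2)

X_new = simulate(-1, 3)                  # unseen spectra from the -1 class
scores = pls1_predict(model, X_new)
labels = np.where(scores >= 0, 1, -1)    # sign gives the class
print(scores, labels)
```

Predicted values near zero sit at the class boundary, which is why the magnitude of the value, and not only its sign, matters when the results are interpreted.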
The prediction provided by these two models indicated the presence of some structure resembling amphetamines in the seized samples. In Figure 11a, model M1 predicted that 32 of the 68 samples were individual amphetamines and that the others were individual cathinones (36 of 68). As expected, model M2 indicated that all the samples belonged to classes containing amphetamines with adulterants and not cathinones.
We used the percentile function to return the k-th values in a range. As described in Table 9, we explored the percentages from 10% to 100%. On the basis of the responses summarized in Table 9, we observed that model M1 classified 50% of the seized samples as cathinones (negative values) and the remaining samples as amphetamines (positive values). We observed dispersed results regarding model M2, indicating that the samples may have various constitutions.
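The percentile summary can be reproduced with any numerical library. The sketch below assumes that the percentile function corresponds to a standard, linearly interpolated percentile as implemented by `numpy.percentile`; the predicted values are randomly generated placeholders, not the values reported in Table S5.

```python
import numpy as np

rng = np.random.default_rng(1)
# Placeholder predicted class values for 68 samples (hypothetical)
predicted = rng.normal(loc=0.0, scale=0.3, size=68)

levels = np.arange(10, 101, 10)              # 10%, 20%, ..., 100%
cutoffs = np.percentile(predicted, levels)   # k-th percentile values

for level, cutoff in zip(levels, cutoffs):
    print(f"{level:3d}%  k-th value = {cutoff:+.3f}")

# Share of samples with negative predicted values
frac_negative = float(np.mean(predicted < 0))
print(f"negative share = {frac_negative:.3f}")
```

The percentage level at which the cutoff changes sign indicates roughly which share of the samples falls on the negative side of the model.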

Discussion
When we analyzed the experimental data by comparing Figures 4 and 6, we observed that the samples had similar profiles. However, in Figure 7, we observed that some samples were not correlated, indicating that the compositions were not homogeneous. Figure 8 reinforced this observation, as most samples showed similarity of less than 0.6. Fewer than 20% of the samples showed similarity higher than 0.8. Interpreting spectra in the infrared region is not trivial and cannot be done by direct comparison (match), because two spectra might be considerably similar while the samples have different compositions. Indeed, infrared spectroscopy indicates bands of chemical groups, which may belong to different molecules [10]. Moreover, these similarity metrics do not discriminate structural differences and can be affected by instrumental problems or sample impurities [86].
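The similarity screening discussed above can be sketched with a Pearson correlation matrix between spectra. This is a minimal illustration that assumes Pearson correlation as the similarity metric and spectra sampled on a common wavenumber grid; the synthetic spectra and the helper names are hypothetical. The thresholds 0.6 and 0.8 are those quoted in the text.

```python
import numpy as np

def pairwise_similarity(spectra):
    """Pearson correlation between every pair of spectra (one per row)."""
    return np.corrcoef(spectra)

def fraction_above(similarity, threshold):
    """Share of distinct sample pairs whose similarity exceeds the threshold."""
    upper = similarity[np.triu_indices_from(similarity, k=1)]
    return float(np.mean(upper > threshold))

rng = np.random.default_rng(2)
grid = np.linspace(400.0, 4000.0, 200)

def band(center, width=40.0):
    return np.exp(-0.5 * ((grid - center) / width) ** 2)

# Three hypothetical spectra: a and b share bands, c has different bands
a = band(1250.0) + band(1700.0)
b = a + 0.01 * rng.standard_normal(grid.size)
c = band(2900.0) + band(800.0)

sim = pairwise_similarity(np.vstack([a, b, c]))
print(np.round(sim, 3))
print("pairs above 0.8:", fraction_above(sim, 0.8))
print("pairs below 0.6:", 1.0 - fraction_above(sim, 0.6))
```

A high correlation only says that the band patterns overlap; it does not establish that two samples contain the same compound, which is the limitation raised in the paragraph above.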
Given that there is no quality control for drugs, interferents may have a similar infrared profile, possibly generating uncertainties in the analyses. These interfering compounds can be (i) adulterants, compounds added to mimic some of the effects expected for these drugs, or (ii) diluents, compounds that do not produce psychoactive effects but can increase the bulk [87,88].
To reproduce this condition, we also computed the spectra of possible adulterants, 21 amphetamines, 21 other cathinones, N-ethylpentylone, and its homologous amphetamine. We used these data to prepare the training sets that were the basis of the models described in Table 2. Table 3 indicated that the relations R² > Q² and RMSEC < RMSEV were respected in all cases. Moreover, the difference between R² and Q² indicates how likely the models are to be overfitted to the data, so this difference should be as small as possible [85]. Thus, the cutoff criterion used for this evaluation corresponded to a difference lower than 0.1 [84].
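The acceptance criteria above can be written as a short check. The sketch assumes that R² is computed from the calibration fit and Q² from the validation predictions with the usual 1 − SS_res/SS_tot formula, and that the 0.1 cutoff applies to the difference R² − Q²; the function names and all numbers are hypothetical.

```python
import numpy as np

def r_squared(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rmse(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def passes_fit_criteria(y, y_fit, y_val, max_gap=0.1):
    """Check R2 > Q2, RMSEC < RMSEV, and R2 - Q2 < max_gap."""
    stats = {
        "r2": r_squared(y, y_fit),
        "q2": r_squared(y, y_val),
        "rmsec": rmse(y, y_fit),
        "rmsev": rmse(y, y_val),
    }
    ok = (
        stats["r2"] > stats["q2"]
        and stats["rmsec"] < stats["rmsev"]
        and stats["r2"] - stats["q2"] < max_gap
    )
    return bool(ok), stats

# Hypothetical class codes and predictions
y = [1, 1, 1, -1, -1, -1]
y_fit = [0.9, 1.1, 0.95, -0.9, -1.05, -1.0]    # calibration fit
y_val = [0.8, 1.2, 0.7, -0.75, -1.2, -0.9]     # validation predictions
ok, stats = passes_fit_criteria(y, y_fit, y_val)
print(ok, stats)
```

A model that fits the calibration data well but predicts poorly under validation fails the third condition, which is the overfitting signal described above.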
Tables 4 and 5 refer to the internal (cross-) validation and the external validation (using the Kennard-Stone algorithm), respectively. We observed that the models that contained only the adulterants could not specifically indicate the compounds. The responses in Figure 11 reinforced this observation and suggested that the spectroscopic characteristics of the adulterants were diffuse, so this group of compounds may need to be larger or to be modeled as mixtures. This group did not provide cohesion for the adequate modeling of the classes. Furthermore, the responses indicated that other studies to evaluate mixtures by computational means are possible. Table 6 allowed us to evaluate the predictive capacity of the models, as a validation of the prediction, through figures of merit. With this approach, we were able to observe the sensitivity, specificity, precision, and accuracy of detecting samples resembling the study system. By using spectral data from known samples, which are already employed as references in several forensic laboratories, we validated the predictive capacity of the models developed with computationally obtained data. In this evaluation with real experimental data, models M1, M2, M3, M4, M7, and M12 provided the best rates of correct classification. Table 7, which shows the predictions for the N-ethylpentylone spectrum available in the SWGDRUG library of monographs, reinforced this observation. Table 8 allowed us to analyze all the responses together and to make more assertive decisions when choosing the best model. Thus, we conducted the other analyses by using only models M1 and M2, which presented sensitivity, specificity, and accuracy values above 85%. In addition, these models met the other evaluated parameters.
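The Kennard-Stone selection mentioned above for the external validation split can be sketched as follows. This is a generic NumPy version of the algorithm (Euclidean distances, starting from the two most distant samples), not the implementation used in this study; the function name and the toy data are hypothetical.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return (selected, remaining) sample indices; assumes 2 <= n_select <= len(X)."""
    X = np.asarray(X, float)
    # Full Euclidean distance matrix between samples
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Start with the two most distant samples
    first, second = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(first), int(second)]
    remaining = [i for i in range(len(X)) if i not in selected]
    while len(selected) < n_select:
        # Distance from each remaining sample to its nearest selected sample
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        # Pick the remaining sample that is farthest from the selected set
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining

# Toy one-variable data set (hypothetical)
X_toy = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
calibration, validation = kennard_stone(X_toy, 3)
print("calibration:", calibration)
print("validation:", validation)
```

Because the selected samples span the extremes of the data, the samples left over for external validation lie inside the calibration domain, which is the usual motivation for this split.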
Finally, we predicted the classes of the samples suspected of being N-ethylpentylone that were seized by the Brazilian Federal Police. The responses showed that the presence of amphetamines could be identified in most samples. In model M1, 47.1% and 52.9% of the samples belonged to the classes of pure amphetamines and pure cathinones, respectively. When the presence of possible adulterants was considered, as in the case of model M2, all the samples were classified as belonging to the class of amphetamines with adulterants.
Because infrared spectroscopy probes functional groups and some intra- and intermolecular interactions rather than complete structures, the technique alone was not enough to identify the individual structures of amphetamines or cathinones in the samples. In models M1 and M2, most samples presented predicted values very close to the class boundary, around zero. Therefore, the identity of the compound as amphetamine or cathinone must be confirmed by ancillary techniques.
Different studies have applied spectroscopy in the infrared region to identify illicit drugs in seizures or to aid in harm reduction. Dixon et al. evaluated three techniques (FTIR, ¹H NMR, and GC-MS) for obtaining information on the drug composition of seized samples [89]. The authors concluded that both the response acquisition time and the precision must be taken into account. If the objective involves quick responses, FTIR is the most appropriate technique. If accurate analysis is targeted, GC-MS is chosen. GC-MS is widely accepted as the gold standard for monitoring drug supply [90,91]. However, the method cannot provide timely information for consumers to make a decision or for police to seize samples. Thus, other techniques, such as Fourier-transform infrared spectroscopy, ultraviolet-visible spectroscopy, or Raman spectroscopy, would be necessary. In all cases, two (or more) analytical methods are needed to validate the presence or absence of a substance [14,90]. Wallace et al. proposed implementing multiple techniques to indicate the composition of drugs consumed in the context of a party [92]. For infrared analysis, they evaluated fentanyl-like drugs and compared them with the SWGDRUG library. The evaluations were limited to the available database. Green et al. also used different techniques, including FTIR, to analyze fentanyl-type drugs obtained from seizures. However, one of the main criticisms of using this technique is the need for updated sample libraries [93].
According to these results, although SWGDRUG classifies spectroscopy in the infrared region as Category A for pure or high-purity samples [14], specific conditions still need to be discussed. Issues related to spectroscopic analysis can occur because of (i) the resolution, (ii) linearity corrections, (iii) the absence of analytical standards for NPSs, (iv) the type of mathematical corrections used to collect interferograms, and (v) the amount of the sample [7,86], which justifies the use of interpretation through chemometric methods in conjunction with spectroscopic analysis [7].
Developing chemical drug profiles through modeling allows (i) a more assertive comparison of samples to be made, (ii) the geographic origin to be identified, (iii) drug trafficking and distribution to be tracked, (iv) the chemical reagents used in production to be assessed, and (v) interferents (diluents and/or adulterants) to be identified [94]. In addition, structural alterations of psychoactive compounds are frequent in the context of NPSs. Clandestine laboratories lack quality control and are subject to unsafe conditions and possible cross-contamination [4].
The results obtained here are essential because seized NPSs are becoming increasingly diverse and contain countless possible adulterants. Although there is an information gap that needs to be filled for fast decision-making, a new discussion on spectroscopy in the infrared region is needed. This technique has been recently consolidated because of its instrumental simplicity and its potential for providing faster responses than other methods. However, its limitations in differentiating between similar structures and in characterizing contaminated samples must be considered.
The information obtained and analyzed here needs to be combined into a larger context of forensic intelligence, where the data are put into perspective [95][96][97][98][99]. This procedure allows the problem to be understood with alternative information from different methodologies to solve crimes and to support decision-making [100][101][102].

Conclusions
We applied spectroscopy in the infrared region to obtain prompt information in the context of law enforcement and for harm reduction. The proposed computational approach can identify amphetamines and/or cathinones in seized samples. The combination of computational simulation and statistical methods adequately sets a benchmark for new psychoactive substances. PLS-DA can predict the unknown samples in the modeled classes. However, infrared spectroscopy cannot differentiate between N-ethylpentylone and its homologous amphetamine. The reason for this behavior must be assessed so that the real influence of adulterants on the spectra can be determined. A more detailed study of the interferents may also be required. Other techniques, such as GC-MS, must provide more assertive identification in a forensic laboratory. The literature shows similar conclusions [89][90][91][92][93]. Nevertheless, this study can give presumptive information and be a first step toward using computer simulation to compose benchmarks for in situ comparison. A more robust dataset can aid in decision-making in a multidisciplinary and integrated context of NPS forensics.
Creating a comprehensive dataset to follow international surveillance trends can be meaningful. The methodology studied in this work provides essential and adaptable information to identify new substances or harmful contaminants. The main advantage is that these benchmarks dispense with the need for laboratory supplies and regulatory authorizations. Using computational standards associated with chemometrics can be helpful for law enforcement and human rights in that these standards reduce the time needed for conducting the experimental tests.

Supplementary Materials:
The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/psychoactives2010001/s1. Table S1: Name and code of amphetamines and cathinones used in this work for designing the PLS-DA models. Table S2: 20 amphetamines and 20 cathinones available on the SWGDRUG library of monographs. Table S3: 20 adulterants available on the SWGDRUG library of monographs. Table S4: Numerical values obtained by Pearson's correlation for samples evaluated experimentally. Table S5: Numerical values obtained by classification using PLS-DA for samples seized in models M1 and M2.