Experimental Data Based Machine Learning Classification Models with Predictive Ability to Select in Vitro Active Antiviral and Non-Toxic Essential Oils

In the last decade essential oils have attracted growing scientific interest, with an annual increase of more than 7% as witnessed by almost 5000 articles. The most prominent studies investigate essential oils as antibacterial agents, alone or in combination with known drugs. Fewer studies have inspected essential oils as potential anticancer and antiviral natural remedies. In line with the authors' previous reports, the investigation of an in-house library of extracted essential oils as potential blockers of HSV-1 infection is reported herein. A subset of essential oils was experimentally tested in an in vitro model of HSV-1 infection, and the determined IC50 and CC50 values were used in conjunction with the results of gas chromatography/mass spectrometry chemical analysis to derive machine learning based classification models trained with the partial least squares discriminant analysis algorithm. The internally validated models were then applied to untested essential oils to assess their effective predictive ability in selecting both active and low-toxicity samples. Five essential oils were selected from a list of 52 and readily assayed for IC50 and CC50 determination. Interestingly, four out of the five selected samples, compared with the potencies of the training set, proved to be highly active and endowed with low toxicity. In particular, sample CJM1 from Calamintha nepeta was the most potent tested essential oil, with the highest selectivity index (IC50 = 0.063 mg/mL, SI > 47.5). In conclusion, it was herein demonstrated how multidisciplinary applications involving machine learning could represent a valuable tool in predicting the bioactivity of complex mixtures and, in the near future, could enable the design of blended essential oils possibly endowed with higher potency and lower toxicity.


Introduction
Essential oils (EOs) are natural, complex, aromatic-smelling mixtures [1] deriving from plants' secondary metabolism and containing predominantly monoterpenes, sesquiterpenes and their oxygenated derivatives. EOs are known to be biosynthesized in flowers, leaves, fruits and roots [2] and are industrially produced mainly by hydro- [3] or steam-distillation [4].
Interest in the use of EOs in several fields is continuously increasing due to their biological properties, such as antioxidant, antimicrobial and many other activities [5][6][7][8]. Due to the complexity of the EOs' chemical composition, efforts have recently been undertaken to discern the synergistic and anti-synergistic roles of each single constituent and how they could influence the pharmacological activities [9][10][11]. This categorization is further complicated by the 'chemotype' concept, in which the same plant can produce EOs with different chemical composition profiles and therefore different biological properties [12,13]. Holy basil, thyme, lavender and peppermint are examples of plants with several chemotypes [14]. Despite this hurdle, an effort to characterize EOs is currently underway in the medical and pharmaceutical fields, with the goal of obtaining clearer indications for their use in traditional medicine and in chemical or pharmaceutical applications, as witnessed by almost 5000 articles in the last decade with an average increment of more than 7% per year (Figure 1).
The emergence of novel drug-resistant microorganisms motivates the continuous search for new therapeutic agents, also in the natural world [15]. Among these, EOs are continuously under investigation as potential new antiviral agents. EOs derived from Melaleuca alternifolia, Mentha piperita and Thymus vulgaris, as well as isolated essential oil components, were reported to show antiviral properties, specifically against enveloped viruses [9]. In 2014 Civitelli et al. [16] explored the effectiveness of Mentha suaveolens EO (MSEO) against herpes simplex virus type-1 (HSV-1) replication in an in vitro model of infection. MSEO and its main component, piperitenone oxide, were found to reduce HSV-1 replication with IC50s of 0.0051 mg/mL and 0.0014 mg/mL, respectively. Very recently, Toujani et al. [17] demonstrated the antiviral properties of Thymus capitatus against HSV-1 and herpes simplex virus type-2 (HSV-2) by testing three different phytopreparations (aqueous extract (AE), ethanolic extract (EE) and EO). The Thymus capitatus phytopreparations AE, EE and EO were analyzed by gas chromatography/mass spectrometry (GC/MS) [10,15,18,19], identifying β-sitosterol, cinnamaldehyde and carvacrol as the major chemical components. These three molecules were thus tested as pure compounds for their ability to inhibit HSV-2 replication, showing EC50 values of 0.0027, 0.0397 and 0.0519 mg/mL, respectively [17].
Molecules 2020, 25, 2452
Multidisciplinary applications have been reported to successfully confirm the antiviral properties of some medicinal plant extracts, including EOs, as recently reported by Tariq et al. [20]. In this context, the study reported herein was aimed at investigating the potential anti-HSV-1 activity of a series of EOs to improve the knowledge about the antiviral effects of natural chemical mixtures. Hence, a series of EOs derived from three different plants, Calamintha nepeta (CN) [18], Foeniculum vulgare (FV) [19] and Ridolfia segetum (RS) [21], was considered. These EOs, extracted using the protocol by Božović et al. [18] and chemically characterized by GC/MS, were tested herein in an in vitro model of HSV-1 infection. Next, by means of principal component analysis (PCA) [22] and partial least squares discriminant analysis (PLS-DA) [23], quantitative composition-activity relationship (QCAR) models were developed and validated for their ability to select further untested EOs with improved antiviral and cytotoxic profiles, or possibly to design blended EOs [24,25].

EOs' Cytotoxic and Antiviral Effects
First, to check for cytotoxicity, Vero cells were incubated with different EO concentrations (0.001-0.5 mg/mL) for 24 h and cell proliferation was measured by means of the MTT assay (Figure 2). Then the antiviral effect was evaluated in Vero cells infected with 0.1 m.o.i. of HSV-1 and exposed, soon after the virus-adsorption period (1 h), to various concentrations of each EO in a range of 0.0312-0.5 mg/mL for 24 h post-infection (p.i.; a representative ICW analysis is shown in Figure 1, results in Table 1). With the only exceptions of samples 9 (FO24) and 35 (R6), all tested EOs displayed CC50 values higher than their IC50s, thus indicating that their effect on viral replication was not affected by cytotoxicity (Table 2). In particular, EOs from CN displayed the highest antiviral potencies (IC50 range = 0.12-0.44 mg/mL, average = 0.22 mg/mL), the lowest cytotoxicity (CC50 range = 1.10-4.71 mg/mL, average = 2.65 mg/mL) and the most favorable selectivity indexes (SI range = 2.50-34.57, average = 13.87). An intermediate profile was displayed by the FVEO samples, which had average IC50, CC50 and SI values of 0.356, 1.24 and 4.7, respectively. The four RSEO samples displayed the worst profile, with good average IC50 values but associated with high cytotoxicity and thus low SI values.
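The selectivity index (SI) values quoted above can be illustrated with a minimal sketch. It assumes the usual definition SI = CC50/IC50, which the text does not state explicitly; the function name and the example values are hypothetical and are not taken from Table 2.

```python
def selectivity_index(cc50_mg_ml: float, ic50_mg_ml: float) -> float:
    """Return SI = CC50 / IC50 (assumed definition; both values in mg/mL)."""
    if ic50_mg_ml <= 0:
        raise ValueError("IC50 must be positive")
    return cc50_mg_ml / ic50_mg_ml


# Hypothetical EO sample, for illustration only (not a measured value).
example_si = selectivity_index(cc50_mg_ml=2.0, ic50_mg_ml=0.25)
print(f"SI = {example_si:.1f}")
```

Under this definition a higher SI means that cytotoxicity appears only at concentrations well above those needed for the antiviral effect.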

Unsupervised Data Analysis
PCA was used as an unsupervised technique to analyze and compare the 38 selected EOs (Table 2). The first two PCs described a cumulative explained variance of about 76.54%: 63.17% of data variability was contained in the first PC and 13.37% in the second. A cumulative explained variance of about 95% was obtained by also extracting the third and fourth PCs (PC3 = 11.41%, PC4 = 7.04%). As most of the variance was contained in the first two PCs, inspection was focused on the respective score plot, whose analysis revealed the presence of two distinct clusters (Figure 3A). FVEOs and RSEOs were grouped in a first, more populated cluster, whereas the CNEOs constituted a second one. The two clusters clearly indicated differences in the EOs' chemical compositions and at the same time some resemblance between RSEO and FVEO samples. Analysis of the PCA loading plot (Figure 3B) revealed estragole, o-cymene, α-pinene and α-phellandrene as the chemical constituents mainly characterizing the RSEO and FVEO cluster, while pulegone was mainly associated with the CNEO cluster. A further important chemical component emphasized by the PCA loading plot (Figure 3B) was menthone, mainly associated with sample 32 (COM1), which seems to be of peculiar composition, so that in the score plot it is located in a zone belonging to neither of the above clusters.

Supervised Classification Modeling
Optimal cut-off values to divide the dependent data (IC50 and CC50) into active (A) and non-active (NA), and toxic (T) and non-toxic (NT), classes were established starting from the corresponding median values (0.20 for the IC50 and 2.06 for the CC50), which were systematically increased or decreased in steps of 5% to inspect different cut-off boundaries. Each cut-off value was evaluated by running leave-one-out cross validation (LOO-CV) while monitoring the explained variance (EV), the fitting non-error rate (FNER), the cross-validation non-error rate (CVNER) and the accuracy (ACC; Table 1).
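The cut-off scan described above can be sketched as follows. This is only an illustration: it assumes that each step adds or subtracts 5% of the median (an assumption that makes the 0.15 cut-off reachable from the 0.20 median in five steps), and the function name and number of steps are hypothetical.

```python
def candidate_cutoffs(median: float, step: float = 0.05, n_steps: int = 6) -> list:
    """Return cut-offs median * (1 + k * step) for k = -n_steps .. +n_steps.

    Assumption: steps are additive fractions of the median, not compounded.
    """
    return [round(median * (1 + k * step), 4) for k in range(-n_steps, n_steps + 1)]


# Median IC50 and CC50 values as reported in the text.
ic50_cutoffs = candidate_cutoffs(0.20)
cc50_cutoffs = candidate_cutoffs(2.06)
print(ic50_cutoffs)
```

Each candidate would then be used to split the samples into two classes, and the resulting model would be scored by LOO-CV as described in the text.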
For IC50 values, the best PLS-DA classification model (IC50-PLS-DA) was obtained with a cut-off of 0.15, while for CC50 values (CC50-PLS-DA) the optimal cut-off was found to be 2.06 (Table 3). Contiguous-blocks LOO-CV with 19 PCs was applied to a preliminary model in order to select the optimal number of latent variables to be used for the IC50-PLS-DA and CC50-PLS-DA models. Focusing on IC50-PLS-DA, the analysis of the cross-validation error rate (CVER) as a function of the increasing number of PCs showed a minimal difference in explained variance between 7 and 8 PCs, indicating the former as the best number of PCs to be used in the final model (Figure 4). A similar analysis was performed for CC50-PLS-DA, identifying the lowest CVER value at both 1 and 3 PCs. To guarantee a larger explained variance, 3 PCs were set (Figure 4).
The IC50-PLS-DA and CC50-PLS-DA classification models were also inspected by means of the feature importance plot, which summarizes the contribution of each chemical constituent to the biological and toxicological properties, respectively. For the antiviral effects, in the IC50-PLS-DA model β-myrcene, limonene, 3-octanol and chrysanthenone (Figure 7) were characterized by positive PLS coefficients (Figure 8) and could represent those components able to differentiate EOs into active or inactive. On the other hand, the negative PLS-DA coefficients (Figure 8) indicated those compounds likely determining a decreased biological activity. Among those were α-pinene, α-phellandrene, o-cymene, pulegone, thymol and myristicin (Figure 7). Interestingly, some of these were pointed out by the unsupervised analysis (PCA) as the chemical constituents characterizing the RSEO and FVEO cluster (Figure 3). The only exception was pulegone, associated with a negative regression coefficient but characterizing the cluster of CNEO samples labeled as active and non-toxic EOs. 3-Methylcyclohexanone, germacrene D, isopiperitenone, methylisopulegone, p-menthone, p-menthene and trans-p-mentha-2,8-dienol had zero values for the PLS-DA coefficient (Figure 7); these molecules are likely to be neutral for the biological activity.
[Figure caption, partially recovered: (Table 1). Samples at the top of the plot were assigned to the non-toxic class, while those at the bottom were classified as toxic.]
A similar analysis was carried out for the CC50-PLS-DA classification model (Figure 9): positive coefficients were assigned to menthol, menthone, estragole, 3-octanol, pulegone and limonene (Figure 7), indicating that these compounds could be associated with EOs with a low toxicity profile. Among the chemical components characterized by negative PLS-DA coefficients, only chrysanthenone (Figure 7; Figure 9) displayed a highly negative coefficient, indicating that it is mainly associated with toxicity; nevertheless, chrysanthenone displayed a positive coefficient in the IC50-PLS-DA model.

PLS-DA Classification Models Predictive Abilities
As reported, any quantitative model should be assessed for its effective usability [26]. Herein the QCAR classification models were tested for their ability to classify the 52 excluded EOs, used as an external test set. The two models were applied sequentially. As a first filter, the above-described IC50-PLS-DA classification model predicted 21 out of 52 samples as potentially active against HSV-1 (40, 42, 43, 54, 58, 63-69, 72, 74, 75, 78, 79, 83 and 84 of Table 4, Figure 10A). Then, as a second filter, the CC50-PLS-DA classification model was applied to the 21 predicted active EOs and predicted only five of them as potentially endowed with low cytotoxicity (68, 73-75 and 79; Figure 10B). The five EO samples 68, 73-75 and 79 were promptly tested both for their ability to inhibit HSV-1 and for their cytotoxicity. Sample 68 was selected as proof of concept as it was predicted to be toxic. Remarkably, the experimental data confirmed the predictions, revealing four out of five samples (80%) to be endowed with high anti-HSV-1 potency and low cytotoxicity. Indeed, two of the newly tested EOs (73 and 75) displayed even greater potencies, with 73 being the most potent (IC50 = 0.0632 mg/mL) and having an increased selectivity index (SI = 47.5). In agreement with the prediction, sample 68 was indeed found to have modest anti-HSV-1 potency, to be quite toxic and to have a low SI (Table 5).
[Figure 10 caption, partially recovered: panels (A) and (B) [24] for all predicted EOs (test set, Table 1). For comparison, the points relative to the training set are also reported in the plot (green and orange points, Figure 6).]
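The sequential use of the two classifiers described above can be sketched as a two-stage filter. The sample names and predicted labels below are hypothetical placeholders, not the predictions of Table 4; the two predicate functions stand in for the IC50-PLS-DA and CC50-PLS-DA models.

```python
def sequential_filter(samples, predict_active, predict_non_toxic):
    """Keep samples predicted active, then keep those also predicted non-toxic."""
    predicted_active = [s for s in samples if predict_active(s)]
    selected = [s for s in predicted_active if predict_non_toxic(s)]
    return predicted_active, selected


# Hypothetical class predictions for four untested EOs (illustration only).
ic50_pred = {"EO_a": "active", "EO_b": "non-active", "EO_c": "active", "EO_d": "active"}
cc50_pred = {"EO_a": "non-toxic", "EO_b": "non-toxic", "EO_c": "toxic", "EO_d": "non-toxic"}

predicted_active, selected = sequential_filter(
    list(ic50_pred),
    lambda s: ic50_pred[s] == "active",
    lambda s: cc50_pred[s] == "non-toxic",
)
print(predicted_active, selected)
```

Applying the activity filter first means the toxicity model only has to be evaluated on the samples that passed the first stage.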

Plants Materials and EOs Extraction
EOs extracted from Calamintha nepeta (CNEO), Foeniculum vulgare (FVEO) and Ridolfia segetum (RSEO) plants were used in this study. In particular, 38 different EOs were selected from an in-house list of 90 EOs [10] on the basis of their chemical composition, to cover as much of the chemical variability as possible while reducing the experimental work [5,6]. As previously reported [10,27], aerial parts of the three plants were collected during the summer and early autumn of 2015 in a wild area around the city of Tarquinia (Province of Viterbo, Italy). As previously described [27], CNEOs were obtained directly from fresh plant material, while for FVEOs and RSEOs the plant material was air-dried in a shady place for 20 days. EOs were extracted by steam distillation using a 62 L distillation apparatus (Albrigi Luigi E0131, Verona, Italy), following the previously reported protocol [28]. To prevent degradation, EOs were kept frozen at −30 °C until use and routinely checked for their stability.
EOs were dissolved in ethanol and further diluted in medium for cell culture experiments, always resulting in an ethanol concentration below 1%, which has no effect on cells and viruses [29].

Cell Culture, Virus Production
African green monkey kidney Vero cells (ATCC CCL-81) were grown in Roswell Park Memorial Institute (RPMI) 1640 medium (Gibco, Invitrogen Corporation, CA) supplemented with 10% heat-inactivated fetal bovine serum (FBS, Gibco, Invitrogen Corporation, CA), 1% glutamine, 50 U/mL penicillin and 50 µg/mL streptomycin (Sigma-Aldrich, MO, USA). The cells were maintained at 37 °C in humidified air containing 5% CO2. Viability of cells was estimated by the Trypan blue (0.02% final concentration) exclusion assay (Invitrogen Corporation). For virus production, monolayers of Vero cells in 75 cm² tissue culture flasks were infected with HSV-1 strain F at a multiplicity of infection (m.o.i.) of 0.01. After 48 h at 37 °C, infected cells were harvested with 3 freeze-and-thaw cycles, cellular debris was removed by low-speed centrifugation and the virus titer was measured by the standard plaque assay [30]. Similarly, the mock solution consisted of the supernatant of mock-infected Vero cells. The titer of the virus preparation was 5 × 10⁸ plaque forming units (pfu)/mL. The virus was stored at −70 °C until used.
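As a worked example of the m.o.i. arithmetic, the sketch below computes the volume of virus stock needed to infect a given number of cells. The cell number is a hypothetical value chosen for illustration (the text does not report it); the titer and m.o.i. are those stated above.

```python
def inoculum_volume_ul(n_cells: float, moi: float, titer_pfu_per_ml: float) -> float:
    """Volume of virus stock (in microliters) giving `moi` pfu per cell."""
    pfu_needed = n_cells * moi          # total infectious particles required
    volume_ml = pfu_needed / titer_pfu_per_ml
    return volume_ml * 1000.0           # mL -> microliters


# Assumed 1e7 cells per flask (hypothetical); m.o.i. 0.01 and 5e8 pfu/mL from the text.
volume = inoculum_volume_ul(n_cells=1e7, moi=0.01, titer_pfu_per_ml=5e8)
print(f"{volume:.2f} uL of stock")
```

In practice such small volumes are usually obtained by first diluting the stock, a step not covered by this sketch.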

Cellular Toxicity
Cellular toxicity of EOs was tested in vitro, as previously reported [31,32]. Monolayers of Vero cells were incubated with each of the 38 EOs at concentrations from 0.001 to 0.5 mg/mL in RPMI 1640 for 24 h, and 50 µL of a 1 mg/mL solution of MTT (3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyl tetrazolium bromide, Sigma-Aldrich, St. Louis, MO) in RPMI without phenol red (Sigma-Aldrich) was then added to the medium. Cells were incubated at 37 °C for 3 h, and 100 µL of acid-isopropanol (0.1 N HCl in isopropanol) was added to each well. After gentle mixing by pipetting to ensure that all MTT crystals were dissolved, the plates were read using an automatic plate reader with a 570 nm test wavelength and a 690 nm reference wavelength. The drug concentration required to reduce cell viability by 50% (CC50) was determined. Wells containing medium with ethanol at the same concentration as in the samples were also included on each plate as controls.
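The plate-reader readout described above can be turned into a percent viability and a rough CC50 estimate as in the following sketch. It assumes that the 690 nm reference reading is subtracted from the 570 nm reading and that viability is expressed relative to the ethanol vehicle control; the log-linear interpolation is a simple stand-in, not the authors' procedure, and all names are hypothetical.

```python
import math


def percent_viability(a570, a690, ctrl_a570, ctrl_a690):
    """Reference-corrected absorbance of a well as % of the vehicle control."""
    return 100.0 * (a570 - a690) / (ctrl_a570 - ctrl_a690)


def interpolate_cc50(concs, viabilities):
    """Log-linear interpolation of the concentration giving 50% viability.

    Returns None when the tested range does not bracket 50% viability.
    """
    pairs = sorted(zip(concs, viabilities))
    for (c1, v1), (c2, v2) in zip(pairs, pairs[1:]):
        if v1 != v2 and (v1 - 50.0) * (v2 - 50.0) <= 0:
            frac = (v1 - 50.0) / (v1 - v2)
            log_c = math.log10(c1) + frac * (math.log10(c2) - math.log10(c1))
            return 10 ** log_c
    return None


# Hypothetical readings, for illustration only.
print(percent_viability(0.85, 0.05, 1.05, 0.05))
print(interpolate_cc50([0.1, 1.0], [75.0, 25.0]))
```

When viability stays above 50% over the whole tested range the function returns None, signalling that the CC50 lies outside the tested concentrations.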

In Cell Western (ICW) Technique for Antiviral Activity
The antiviral activity of EOs was evaluated using the in cell western (ICW) technique [33]. Briefly, Vero cells were seeded in 96-well plates and, after 24 h, infected with HSV-1 at 0.1 m.o.i. After 1 h of adsorption at 37 °C, the plates were washed with phosphate buffered saline (PBS) and the medium was replaced with RPMI containing 2% FBS, 1% glutamine, 50 U/mL penicillin and 50 µg/mL streptomycin in the presence of EOs at serial concentrations (0.50, 0.25, 0.125, 0.0625 and 0.0312 mg/mL). HSV-1-infected cells cultured in the presence of the EO vehicle were used as the comparative control. Twenty-four hours later, cells were fixed with 4% paraformaldehyde in PBS for 15 min at room temperature (r.t.) and then permeabilized in 0.1% Triton X-100 in PBS for 5 min at r.t. Cells were then incubated with Odyssey Blocking Buffer for 1 h at r.t. and stained for 1 h with a primary antibody raised against glycoprotein B (gB; sc-56987, Santa Cruz, 1:1000 dilution in Odyssey Blocking Buffer), a late HSV-1 protein. They were then washed three times with PBS containing 0.1% Tween-20 and incubated with the secondary antibody IRDye 800 CW Goat Anti-Mouse (926-32210, LI-COR Biosciences, 1:1000 dilution in Odyssey Blocking Buffer; green fluorescence). Finally, cells were stained for 1 h with Cell-Tag 700 (926-41090, LI-COR Biosciences, 1:500), a fluorescent dye that stains cells and allows detection of the cell layer (red fluorescence), in order to normalize the viral protein fluorescence intensity to the number of cells. After four washes with PBS containing 0.1% Tween-20, the plate was scanned on the Odyssey Infrared Imager, and the integrated intensity value of each well was read with the LI-COR Image Studio Software developed for Odyssey analysis. Mock-infected cells were used as controls, and their intensity was used as background. The normalized fluorescence intensity resulting from each staining was used to evaluate viral replication.
Wells containing medium with ethanol at the same concentration as in the samples were also included on each plate as controls.
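The normalization step of the ICW readout can be sketched as below. This is an assumed formulation: the mock-infected signal is subtracted from the gB (800 nm channel) intensity, the result is divided by the Cell-Tag (700 nm channel) intensity, and treated wells are expressed as a percentage of the vehicle control. The exact formula is not given in the text, and all names and numbers are hypothetical.

```python
def normalized_signal(gb_800: float, celltag_700: float, mock_gb_800: float) -> float:
    """Background-subtracted viral gB signal per unit of cell stain."""
    return (gb_800 - mock_gb_800) / celltag_700


def percent_of_control(treated: float, vehicle: float) -> float:
    """Normalized signal of a treated well as % of the vehicle-treated control."""
    return 100.0 * treated / vehicle


# Hypothetical integrated intensities (arbitrary units), illustration only.
vehicle = normalized_signal(gb_800=2200.0, celltag_700=500.0, mock_gb_800=200.0)
treated = normalized_signal(gb_800=1200.0, celltag_700=500.0, mock_gb_800=200.0)
print(percent_of_control(treated, vehicle))
```

Dividing by the cell stain compensates for wells that contain fewer cells, so that a lower gB signal is not mistaken for an antiviral effect.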

Biological Data Analysis
Antiviral activity data obtained by ICW were analyzed using a method based on a Java-based image processing program (ImageJ) that identifies the surface area occupied by fluorescence in each well and then calculates the 'area percentage', i.e., the percentage of the well area covered by fluorescence [34]. The resulting values were fitted by non-linear regression using the model log(EOs) vs. normalized response in GraphPad Prism (version 6.00 for MS Windows, GraphPad Software, La Jolla, California, USA, www.graphpad.com). The IC50 was calculated as the drug concentration required to reduce virus replication by 50%.
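The curve fitting step can be illustrated with a minimal sketch. It assumes the standard-slope normalized response model Y = 100 / (1 + 10^(X − logIC50)), with X the log10 of the concentration, and replaces the GraphPad Prism optimizer with a plain grid search over logIC50. The synthetic responses are generated from a known IC50 purely to show that the procedure recovers it.

```python
import math


def normalized_response(log_conc: float, log_ic50: float) -> float:
    """Normalized response (0-100%) for the assumed standard-slope model."""
    return 100.0 / (1.0 + 10.0 ** (log_conc - log_ic50))


def fit_ic50(concs, responses, lo=-4.0, hi=1.0, steps=5001):
    """Grid search for the logIC50 minimizing the sum of squared errors."""
    best_sse, best_log_ic50 = None, None
    for i in range(steps):
        log_ic50 = lo + (hi - lo) * i / (steps - 1)
        sse = sum(
            (normalized_response(math.log10(c), log_ic50) - r) ** 2
            for c, r in zip(concs, responses)
        )
        if best_sse is None or sse < best_sse:
            best_sse, best_log_ic50 = sse, log_ic50
    return 10.0 ** best_log_ic50


# Serial concentrations from the text (mg/mL); responses are synthetic and noise-free.
concs = [0.50, 0.25, 0.125, 0.0625, 0.0312]
true_ic50 = 0.1
responses = [normalized_response(math.log10(c), math.log10(true_ic50)) for c in concs]
print(fit_ic50(concs, responses))
```

A grid search is used here only to keep the example dependency-free; a least-squares optimizer would normally be preferred for real data.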

Unsupervised Data Analysis
Chemical composition data was organized in an independent data matrix consisting of 38 rows (EOs samples) and 56 columns (chemical components). PCA [20] was initially applied as a preliminary step for exploratory analysis to identify possible outliers. The number of principal components (PCs) was chosen on the basis of a minimal increment (5%) of explained variance.
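The rule for choosing the number of PCs can be sketched as follows, assuming that a component is retained as long as it adds at least 5% of explained variance. The first four percentages are those reported in the Results; the fifth value is a hypothetical placeholder added only to show where the rule stops.

```python
def select_n_components(explained_variance_pct, min_increment=5.0):
    """Count leading PCs whose individual explained variance is >= min_increment."""
    n = 0
    for ev in explained_variance_pct:
        if ev < min_increment:
            break
        n += 1
    return n


def cumulative(values):
    """Running total of the explained variance, rounded to two decimals."""
    total, out = 0.0, []
    for v in values:
        total += v
        out.append(round(total, 2))
    return out


# PC1-PC4 from the text; the last entry (2.10) is hypothetical.
explained = [63.17, 13.37, 11.41, 7.04, 2.10]
n_pcs = select_n_components(explained)
print(n_pcs, cumulative(explained)[:n_pcs])
```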

Supervised Classification Modeling
For the binary classification models [22], built with PLS-DA [37], the EO component concentrations, used as the independent variable X matrix, were pretreated by mean scaling. PLS-DA is a special form of projection to latent structures (PLS, also named partial least squares) commonly used for linear classification [38] that searches for latent variables with maximum covariance with the Y variables. In PLS-DA the Y-block describes which objects belong to the defined classes. In a binary classification application, a continuous variable can easily be split into two classes by a cut-off value, setting Y to 1 for objects with values higher than the cut-off and to 0 for those with lower values [39]. The model returns calculated Y values, in a similar way as for a PLS regression. In analogy with the PLS algorithm, the model is described by the regression coefficients of the variables for each class. PLS coefficients characterized by high absolute values are generally related to variables important for class discrimination; in particular, positive coefficients indicate those variables that contribute most to increasing the calculated response for class 1 [35]. The coefficients were used to elaborate the feature importance plot (see the Results and Discussion Section, Figures 7 and 8).
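The data preparation described above can be sketched as two small helpers. The sketch assumes that "mean scaling" corresponds to column-wise mean centering of the X matrix, and it encodes the Y vector as 1 above the cut-off and 0 otherwise, as stated in the text. The matrix and IC50 values are hypothetical.

```python
def mean_center(X):
    """Subtract the column mean from every entry of X (list of rows)."""
    n_rows = len(X)
    means = [sum(col) / n_rows for col in zip(*X)]
    centered = [[x - m for x, m in zip(row, means)] for row in X]
    return centered, means


def encode_classes(y, cutoff):
    """Return 1 for values above the cut-off and 0 otherwise."""
    return [1 if v > cutoff else 0 for v in y]


# Hypothetical two-sample, two-component composition matrix and IC50 values.
X = [[1.0, 10.0], [3.0, 30.0]]
X_centered, column_means = mean_center(X)
y_classes = encode_classes([0.10, 0.30], cutoff=0.15)
print(X_centered, y_classes)
```

The centered X matrix and the encoded Y vector would then be passed to a PLS-DA routine, which is not reproduced here.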
The experimentally determined antiviral activity (IC50) and toxicity (CC50; Table 2) values were used as dependent variable vectors in two distinct PLS-DA models, in which each dependent variable was divided into two classes (active/non-active and toxic/non-toxic) on the basis of an optimal cut-off value (see Results section) obtained by a systematic search procedure. The final classification models were numerically and graphically evaluated through explained variance, accuracy (ACC) and non-error rate (NER) [40], as calculated from the final model and from leave-one-out internal cross validation.
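The leave-one-out internal validation mentioned above can be sketched generically. In the example, a nearest-centroid classifier is used as a simple stand-in so that the code stays self-contained; it is not PLS-DA, and the data are hypothetical.

```python
def loo_predictions(X, y, fit, predict):
    """Predict each sample with a model trained on all the other samples."""
    preds = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = fit(X_train, y_train)
        preds.append(predict(model, X[i]))
    return preds


# Stand-in classifier (nearest class centroid), NOT the PLS-DA used in the study.
def fit_centroids(X, y):
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroid(centroids, x):
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Hypothetical, well-separated two-class data.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
y = [0, 0, 0, 1, 1, 1]
cv_preds = loo_predictions(X, y, fit_centroids, predict_centroid)
print(cv_preds)
```

The cross-validated predictions can then be compared with the true classes to obtain the cross-validation non-error rate and accuracy.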
The accuracy describes the global predictive ability, counting as correct the true positives (TP) and the true negatives (TN), and is defined as:

$$\mathrm{ACC} = \frac{TP + TN}{n}$$

where n is the total number of samples. Not assigned samples are not considered in the accuracy calculation. The NER [29] was calculated as the arithmetic mean of the sensitivity values of the G classes:

$$\mathrm{NER} = \frac{1}{G}\sum_{g=1}^{G} Sn_g$$

where G is the total number of classes and Sn_g [40] is the sensitivity of the g-th class, also known as the true positive rate, which can be defined as the ability of a given classifier to correctly identify the samples of the g-th class and is calculated as:

$$Sn_g = \frac{c_{gg}}{n_g}$$

where c_gg is the number of samples belonging and correctly assigned to class g and n_g is the number of samples belonging to the g-th class.
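A minimal sketch of how the accuracy and the NER, as defined in words above, can be computed from true and predicted classes. Not assigned samples are represented by None and skipped; excluding them from n_g as well is an assumption of this sketch. The labels are hypothetical.

```python
def confusion_counts(y_true, y_pred, classes):
    """counts[g][h] = number of samples of true class g assigned to class h."""
    counts = {g: {h: 0 for h in classes} for g in classes}
    for t, p in zip(y_true, y_pred):
        if p is None:        # not assigned sample: skipped
            continue
        counts[t][p] += 1
    return counts


def accuracy(counts):
    """Fraction of assigned samples placed in their true class."""
    n = sum(sum(row.values()) for row in counts.values())
    correct = sum(counts[g][g] for g in counts)
    return correct / n


def non_error_rate(counts):
    """Arithmetic mean of the class sensitivities Sn_g = c_gg / n_g."""
    sensitivities = [counts[g][g] / sum(counts[g].values()) for g in counts]
    return sum(sensitivities) / len(sensitivities)


# Hypothetical labels: 1 = active, 0 = non-active.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
counts = confusion_counts(y_true, y_pred, classes=[0, 1])
print(accuracy(counts), non_error_rate(counts))
```

With unbalanced classes, as in this example, the NER differs from the accuracy because each class contributes equally to the average regardless of its size.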

Assessment of the Models' Predictive Ability
An internal library of 52 EOs samples not used to define the PLS-DA model was selected as the external test set. The chemical composition of the test set was known and organized in an independent data matrix similarly as for the training set and consisted of 52 rows (EOs samples) and 56 columns (chemical components) [27].

Conclusions
From an internal library of 90 different EOs a training set of 38 was compiled and tested for antiviral activity and cytotoxicity, and the experimental data were used to develop PLS-DA classification models able to discriminate anti-HSV-1 active from non-active samples and cytotoxic from low-cytotoxicity samples. Two classification models were obtained with satisfactory statistical coefficients. Analysis of the models by means of feature importance indicated β-myrcene, limonene, 3-octanol and chrysanthenone as key chemical components for the EOs' biological effects. The two models were applied to EOs not included in the training set and proved their predictive abilities in selecting five EOs predicted to have high antiviral potency and low cytotoxicity. Four out of five (80%) of the selected EOs indeed proved to be active against HSV-1 and to have low cytotoxicity.
These results, together with those previously reported demonstrating the great antioxidant and antimicrobial properties of EOs [41], confirm the possibility of using these substances in a wide array of applications, such as pharmaceuticals [11], nutraceuticals [42] and food preservatives [43]. Despite the wide potential of EOs, further efforts are needed to better understand crucial chemical information such as the optimum dose and safe limits, as well as aspects related to food uses, such as the impact of these compounds on sensory quality.
Further interesting aspects to be clarified and investigated are how the chemical composition may influence the observed biological effects, whether these result from possible synergistic or antagonistic mechanisms between the single chemical components, and whether an isolated compound preserves the same identified effects. Several reports with this purpose can be found in the literature, often enriched with extensive machine learning approaches [11], in which a potential main chemical compound was identified and investigated for its biological properties [44]. This latter step is crucial for the detection of new molecules able to replace or support those already known and used as antimicrobial and antifungal agents. In this context, further extensive research is important to model blended EOs with enhanced biological profiles and to mix key chemical components for the preparation of mixtures with ad hoc enhanced efficacy and lower toxicity.

Conflicts of Interest:
The authors declare no conflict of interest.