Machine Learning Analyses on Data including Essential Oil Chemical Composition and In Vitro Experimental Antibiofilm Activities against Staphylococcus Species

Biofilm resistance to antimicrobials is a complex phenomenon, driven not only by resistance induced by genetic mutation, but also by increased microbial cell density, which supports horizontal gene transfer across cells. The prevention of biofilm formation and the treatment of existing biofilms are currently difficult challenges; therefore, the search for new multi-targeted or combinatorial therapies is growing. The development of anti-biofilm agents is considered of major interest and represents a key strategy, as non-biocidal molecules are highly valuable to avoid the rapid appearance of escape mutants. Among bacteria, staphylococci are predominant causes of biofilm-associated infections. Among staphylococci, Staphylococcus aureus (S. aureus) is an extraordinarily versatile pathogen that can survive in hostile environmental conditions, colonize mucous membranes and skin, and cause severe, non-purulent, toxin-mediated diseases or invasive pyogenic infections in humans. Staphylococcus epidermidis (S. epidermidis) has also emerged as an important opportunistic pathogen in infections associated with medical devices (such as urinary and intravascular catheters, orthopaedic implants, etc.), causing approximately 30% to 43% of joint prosthesis infections. The scientific community is continuously looking for new agents endowed with anti-biofilm capabilities to fight S. aureus and S. epidermidis infections. Interestingly, several reports have indicated the in vitro efficacy of non-biocidal essential oils (EOs) as a promising treatment to reduce bacterial biofilm production and prevent the induction of drug resistance. In this report, 89 EOs were analyzed with the objective of investigating their ability to modulate bacterial biofilm production of different S. aureus and S. epidermidis strains. Results showed that the assayed EOs modulated biofilm production with unpredictable results for each strain. In particular, many EOs acted mainly as biofilm inhibitors in the case of S. epidermidis strains, while for S. aureus strains, EOs either had no effect or stimulated biofilm production. In order to elucidate the obtained experimental results, machine learning (ML) algorithms were applied to the EOs' chemical compositions and the associated anti-biofilm potencies. Statistically robust ML models were developed, and their analysis in terms of feature importance and partial dependence plots indicated the chemical components mainly responsible for biofilm production inhibition or stimulation for each studied strain.


Introduction
A biofilm is a microbially derived sessile community characterized by cells irreversibly attached to a substrate or interface or to each other, embedded in a self-produced matrix of extracellular polymeric substances, which exhibits an altered phenotype with regard to growth, gene expression and protein production [1]. Biofilm resistance to antimicrobials [2] is a complex phenomenon, driven not only by genetic mutation induced resistance, but also by means of increased microbial cell density that supports resistance by means of horizontal gene transfer across cells [3]. Indeed, other mechanisms are involved, such as: (i) low penetration of antimicrobial agents due to the barrier function exerted by the biofilm matrix, (ii) presence of cells exhibiting a high multidrug tolerance, (iii) reduced susceptibility to antibiotics as a consequence of stress adaptive responses or changes in the chemical biofilm microenvironment [4]. The strategies adopted to treat these challenging infections are rapidly changing due to the increasing understanding of biofilm structure and functions. Nonetheless, the prevention of biofilm formation and the treatment of existing biofilms is currently a difficult challenge; therefore, the discovery of new multi-targeted or combinatorial therapies is increasingly urgent [5].
The development of anti-biofilm agents is therefore considered of major interest and represents an important strategy, since non-biocidal molecules are highly valuable to avoid the rapid appearance of resistant mutants. Among bacteria, staphylococci are prevalent causes of biofilm-associated infections [6]. In particular, Staphylococcus aureus (S. aureus) is an opportunistic pathogen that can cause serious diseases in humans, ranging from skin and soft tissue infections to invasive infections of the bloodstream, heart, lungs and other organs [7]. In 2013, Nicholson et al. reported that 30% of the U.S. population was colonized by S. aureus, while 1.5% was found to carry methicillin-resistant S. aureus (MRSA), a major cause of healthcare-related infections responsible for a significant proportion of nosocomial infections worldwide. Recently, deaths from MRSA infections in the U.S. have exceeded those from many other infectious diseases, including HIV/AIDS [8]. Staphylococcus epidermidis (S. epidermidis), conventionally considered a commensal of human skin, can cause significant problems when breaching the epithelial barrier, especially during biofilm-associated infection of indwelling medical devices [9,10]. Most diseases caused by S. epidermidis exhibit a chronic profile and occur as device-related infections (such as intravascular catheter or prosthetic joint infections) and/or their complications [10]. In view of this scenario, the scientific community is seeking new agents endowed with anti-biofilm capabilities to fight S. aureus and S. epidermidis infections. Recently, several reports have indicated the in vitro efficacy of non-biocidal essential oils (EOs) as a promising treatment to reduce bacterial biofilm production and prevent the induction of drug resistance [11]. In different applications, EOs have shown some efficacy in reducing biofilm production of either S. aureus standard strains or MRSA [12][13][14][15][16][17]. In other reports, EOs and some of their purified chemical components have also been shown to inhibit S. epidermidis biofilm production [18][19][20].
Recently, machine learning (ML) has proved to be a tool able to investigate in depth the modulatory role of EOs' chemical components on Pseudomonas aeruginosa biofilm production [21][22][23][24]. In particular, 89 EOs extracted from three different plants in different periods and with different extraction times were investigated [22]. In line with that study, and with the objective of investigating the EOs' ability to reduce biofilm production in other bacteria as well, an extensive study is reported herein of the 89 EO samples as potential antibacterial and anti-biofilm agents against S. aureus ATCC 6538P, S. aureus ATCC 25923, S. epidermidis RP62A and S. epidermidis O-47. To this purpose, as previously reported [22], ML algorithms were applied to the EOs' chemical compositions and the associated anti-biofilm potencies, with the aim of shedding light on the components most likely responsible for either positive or negative modulation of biofilm production.

Biofilm Production Modulation by EOs at Selected Fixed Concentrations
Preliminarily, the same representative EOs (2 RSEOs, 3 CGEOs and 3 FVEOs) among the reported 89 used on P. aeruginosa [22] were selected, to evaluate the anti-biofilm potency at different concentrations starting from 25 mg/mL, using scalar dilutions (data not shown).
The preliminary data, analyzed in terms of biofilm production modulation and reproducibility, led to the selection of two representative concentrations (3.125 mg/mL and 0.0488 mg/mL). The first concentration is in the range of milligrams per milliliter, while the second is in the range of micrograms per milliliter. All 89 EOs were then tested at the two selected concentrations and biofilm production was measured relative to untreated bacteria (Figures 1-3). At both selected concentrations, EOs modulated biofilm production with unpredictable results for each strain. These results suggested that many EOs may act mainly as biofilm inhibitors in the case of the RP62A and O-47 strains, while for 6538P and 25923 EOs can either have no effect or stimulate biofilm production (Table 4). Table 4 reports the number of EOs able to inhibit (<100%, <80% and <50%, respectively) or stimulate (≥100%, ≥120%, ≥150% and ≥200%, respectively) biofilm formation. It is worth noting that on S. epidermidis strains about 30 EOs inhibited biofilm growth by more than 50% even at the lowest concentration, while almost none of them showed such activity on S. aureus strains.
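The threshold counts summarized in Table 4 can be illustrated with a minimal sketch. The function name `count_by_threshold` and the example percentages are hypothetical and are not taken from the study data; only the cutoffs (<100%, <80%, <50% and ≥100%, ≥120%, ≥150%, ≥200%) come from the text.

```python
import numpy as np


def count_by_threshold(biofilm_pct):
    """Count samples below each inhibition cutoff and at/above each stimulation cutoff.

    biofilm_pct: residual biofilm production as a percentage of the untreated control.
    """
    pct = np.asarray(biofilm_pct, dtype=float)
    inhibition = {f"<{c}%": int((pct < c).sum()) for c in (100, 80, 50)}
    stimulation = {f">={c}%": int((pct >= c).sum()) for c in (100, 120, 150, 200)}
    return inhibition, stimulation


# Hypothetical residual-biofilm percentages for six samples (untreated control = 100%).
example = [35.0, 62.0, 95.0, 100.0, 130.0, 210.0]
inhib, stim = count_by_threshold(example)
print(inhib)
print(stim)
```

Note that the bins are cumulative: a sample counted under <50% is also counted under <80% and <100%.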

Quantitative Analysis of Selected EOs against Different Strains of S. epidermidis
Representative EOs selected among those able to reduce biofilm formation by more than 70% were further analyzed to evaluate a dose-dependent effect against S. epidermidis RP62A and O-47 (Figures 4-6). The inhibition by RSEOs was confirmed at lower concentrations on both strains, despite their different biofilm matrix composition, and the inhibition of biofilm formation was clearly not dose-dependent (Figure 4). Analogous results were obtained with FVEO samples (Figure 5).
In contrast, CGEOs revealed a dose-dependent biofilm inhibition that was more pronounced on the strongest biofilm producer, S. epidermidis O-47, than on S. epidermidis RP62A (Figure 6).

Figure 3.
Percentages of biofilm production after treatment at two concentrations (3.125 mg/mL and 0.0488 mg/mL) for CGEOs against the four strains: S. aureus 6538P (A) and 25923 (B), S. epidermidis RP62A (C) and O-47 (D). The ordinate axis reports the percentage of bacterial biofilm production. The abscissa axis is centered at 100% biofilm production. Data are reported as the percentage of residual biofilm after treatment in comparison with the untreated control. Each data point is composed of 4 independent experiments, each performed with at least three replicates.

General Results
Analogously to the ML application to Pseudomonas aeruginosa (PA) [22], direct application of linear classification methods using algorithms such as Logistic Regression (LR) and Linear Support Vector Machines (SVM) [25] did not lead to satisfactory classifiers (data not shown). Likewise, non-linear algorithms such as random forest (RF) [26], non-linear support vector machine (SVM) [27] and gradient boosting (GB) [28] led to insufficiently robust models (data not shown). Therefore, a mixed approach was used: borrowing the idea of principal component regression (PCR) as an evolution of multiple linear regression (MLR), a number of principal components (PCs) were used in place of the original variables (EOs' chemical component percentages) as input for the sklearn LR implementation (PCLR). As an initial test, the PCLR was run on the PA dataset, leading to results highly overlapping with those obtained with GB (data not shown). As the biofilm production assay profiled EOs as either inhibitors or activators (Table 4), classification models were tentatively built for all four strains, considering either biofilm production inhibition or activation for the biofilm percentages observed at the two concentration levels introduced above, 48.8 µg/mL and 3.125 mg/mL. To this aim, the optimal biofilm production percentage cutoff for the binary classification was first explored by systematically decreasing it from 80% to 60% for the inhibition models, or increasing it from 120% to 140% for the activation models; these ranges were chosen arbitrarily on the basis of the data in Table 4. Model performance was monitored through the Matthews correlation coefficient (MCC) obtained by leave-one-out cross-validation. Following this protocol, for the inhibition models, EO samples with biofilm production percentages above the best-performing cutoff were classified as inactive, while those with lower values were considered active. Conversely, for the biofilm production enhancer models, EO samples with percentages above the cutoff value were classified as active, while those with lower values were considered non-active. For the 6538P inhibition training set, the very low active/inactive ratio at biofilm inhibition below 80% prevented any optimization. The grid search analysis applied to the sixteen starting training sets (four strains by two series of models by two concentrations) therefore afforded seven optimized models for each concentration (Table 5). Inspection of the optimized models' hyperparameters and cutoff values revealed that the 25923/inhibition, RP62A/activation and O-47/activation sets had highly unbalanced ratios of actives to non-actives; these sets were hence not further analyzed. Comparison of the models developed for the two EO concentrations revealed that the 3.125 mg/mL level led to more reliable and robust models (Table 6). Based on these preliminary data, subsequent results and analyses were carried out only on the RP62A/inhibition, O-47/inhibition, 6538P/activation and 25923/activation models derived for the biofilm modulation recorded at 3.125 mg/mL. This is in full agreement with the fact that EO samples acted prevalently as reducers of biofilm production for the RP62A and O-47 strains, while for 6538P and 25923 biofilm production was mainly enhanced (Table 4).

Table footnotes: 1: number of principal components used in the model; 2: number of EOs acting as inhibitors or enhancers of bacterial biofilm production; 3: number of EOs acting as non-inhibitors or non-enhancers of biofilm production; 4: optimal value of bacterial biofilm production percentage for the binary classification as inhibitors/non-inhibitors or enhancers/non-enhancers of bacterial biofilm production.
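The PCLR protocol described above (principal components fed to logistic regression, with the classification cutoff and the number of PCs selected by leave-one-out MCC) might be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the data are synthetic, the function names `loo_mcc` and `search_inhibition_cutoff` are hypothetical, the standardization step is an assumption (the text does not say whether the composition percentages were scaled), and fitting the PCA inside each cross-validation fold is likewise a choice made here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def loo_mcc(X, y, n_components):
    """MCC computed from leave-one-out predictions of a scale -> PCA -> LR pipeline."""
    model = make_pipeline(
        StandardScaler(),  # assumption: scaling is not stated in the text
        PCA(n_components=n_components),
        LogisticRegression(max_iter=1000),
    )
    y_pred = cross_val_predict(model, X, y, cv=LeaveOneOut())
    return matthews_corrcoef(y, y_pred)


def search_inhibition_cutoff(X, biofilm_pct, cutoffs, n_components_grid):
    """Return the cutoff and number of PCs giving the highest leave-one-out MCC."""
    best = {"mcc": -np.inf, "cutoff": None, "n_components": None}
    for cutoff in cutoffs:
        y = (biofilm_pct < cutoff).astype(int)  # 1 = active (inhibitor)
        n_active = int(y.sum())
        if min(n_active, len(y) - n_active) < 2:
            continue  # too unbalanced: a fold could lose a whole class
        for k in n_components_grid:
            mcc = loo_mcc(X, y, k)
            if mcc > best["mcc"]:
                best = {"mcc": mcc, "cutoff": cutoff, "n_components": k}
    return best


# Synthetic stand-in for 89 samples described by 20 component percentages.
rng = np.random.default_rng(0)
latent = rng.normal(size=(89, 3))
X = latent @ rng.normal(size=(3, 20)) + 0.1 * rng.normal(size=(89, 20))
biofilm_pct = 75 + 20 * latent[:, 0] + 5 * rng.normal(size=89)

best = search_inhibition_cutoff(
    X, biofilm_pct, cutoffs=(80, 70, 60), n_components_grid=(3, 5)
)
print(best)
```

For an activation model, the same search would apply with the label defined as `biofilm_pct > cutoff` and cutoffs taken from the 120% to 140% range.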
To assess the models' fitness and robustness, the absence of chance correlation was checked by a Y-scrambling procedure: over 100 runs of cross-validated scrambled sets, the average, standard deviation, maximum and minimum values of the Accuracy_Y-S, MCC_Y-S, Precision-Recall_Y-S and ROC-AUC_Y-S coefficients were always lower than the non-cross-validated and cross-validated ones, thus supporting the validity of all final models.
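A Y-scrambling check of this kind can be sketched as below. The sketch is hedged: it uses synthetic data, only the MCC metric, a fixed number of PCs, and hypothetical helper names (`cv_mcc`, `y_scrambling`); the example call uses 10 runs for speed, whereas the text reports 100.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline


def cv_mcc(X, y, n_components=5):
    """Leave-one-out MCC of a PCA -> logistic regression pipeline."""
    model = make_pipeline(
        PCA(n_components=n_components), LogisticRegression(max_iter=1000)
    )
    return matthews_corrcoef(y, cross_val_predict(model, X, y, cv=LeaveOneOut()))


def y_scrambling(X, y, n_runs=100, seed=0):
    """Refit on randomly permuted labels and summarize the cross-validated MCC."""
    rng = np.random.default_rng(seed)
    scores = np.array([cv_mcc(X, rng.permutation(y)) for _ in range(n_runs)])
    return {
        "mean": scores.mean(),
        "std": scores.std(),
        "max": scores.max(),
        "min": scores.min(),
    }


# Synthetic data in which the label depends on the first feature.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 8))
y = (X[:, 0] + 0.3 * rng.normal(size=60) > 0).astype(int)

true_mcc = cv_mcc(X, y)
scrambled = y_scrambling(X, y, n_runs=10)
print(true_mcc, scrambled)
```

A model is usually considered free of chance correlation when the score obtained with the true labels clearly exceeds the distribution of scrambled scores.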

Binary Classification Model for 6538P Biofilm Production Activation
The optimized 6538P/activation model at 3.125 mg/mL reached its maximum performance at a cutoff of 133%, using ... (Table 8).

Binary Classification Model for RP62A biofilm production inhibition
The grid search on the EOs' chemical compositions and their associated RP62A biofilm production inhibitory potencies at 3.125 mg/mL identified 62% residual biofilm production as the best cutoff value, with only 5 PCs and an actives to non-actives ratio of 31:58 (0.53). The final classification model was characterized by Accuracy, MCC, Precision-Recall and ROC-AUC values of 0.721, 0.455, 0.657 and 0.742, respectively (Table 6). The associated cross-validation coefficients Accuracy_CV, MCC_CV, Precision-Recall_CV and ROC-AUC_CV were 0.687, 0.392, 0.584 and 0.683, respectively.
Inspection of the importance of the EOs' chemical components in the model by means of the Skater algorithm indicated 3-octanol, phellandral, thymol and D-limonene as those mostly influencing biofilm production inhibition (Figure 8 and Table 7); their positive contribution was highlighted by partial dependence plots, which describe the marginal impact of a feature on model prediction (Supplementary Material Figure SM-2).

For the O-47 inhibition model, robustness was assessed by Accuracy_CV, MCC_CV, Precision-Recall_CV and ROC-AUC_CV values of 0.738, 0.517, 0.589 and 0.659, respectively (Table 6). Y-scrambling did not reveal any chance correlation (Table 7). Inspection of feature importance and partial dependence pointed out 3-octanol, o-cymene, D-limonene and β-phellandrene as the compounds more significant for biofilm production inhibition (Figure 9, Supplementary Material Figure SM-3 and Table 8).

For the 25923 activation model (Tables 6 and 7), feature importance and partial dependence pointed out D-limonene, γ-terpinene, 3-octanol and piperitenone as the more important compounds (Table 8).

EOs Biofilm Bioactivity General Consideration
From the results reported above, it can be observed that each EO had a specific effect on biofilm formation, likely depending on its characteristics and unique chemical composition. In particular, for S. aureus strains 6538P and 25923 the EOs mainly enhanced biofilm production. Stimulation of bacterial biofilm production by EOs is not surprising, as it was previously observed, even with isolated chemical components [29][30][31]. On the other hand, and as more commonly reported [32,33], an overall inhibitory effect on biofilm production was observed for S. epidermidis strains RP62A and O-47 upon in vitro EO treatment. Nevertheless, cinnamon EO was reported to stimulate biofilm production in some Staphylococcus epidermidis strains [34].

Bioactivity of RSEOs
The majority of tested RSEO samples did not show inhibitory effects on S. aureus 6538P biofilm formation (a partial inhibitory effect was observed only for the R6 essential oil at 3.125 mg/mL, panel A of Figure 1). On the contrary, some RSEO samples (R6, R12, R24, RM4 and RM6) were shown to enhance biofilm formation by up to 140% at 0.0488 mg/mL. In contrast, several RSEO samples showed a good inhibitory effect on S. epidermidis RP62A biofilm production. In particular, 4 out of 13 EOs (R6, R24, RM4 and RM6) potently inhibited biofilm formation, at a rate of about 80% at both concentrations (panel B of Figure 1); these EOs were therefore selected for further analyses using scalar concentrations of each EO starting from 0.0488 mg/mL. An attempt to determine a direct dose-dependent effect was not successful (Figure 4).
On O-47 biofilm modulation (panel C of Figure 1), most RSEOs had a slight inhibitory effect of up to 40% (60% residual biofilm production) at 0.0488 mg/mL. On the contrary, at the higher concentration RSEOs enhanced biofilm production by up to 130% for most samples. Only RM6 showed a remarkable biofilm production of up to 160% at 3.125 mg/mL. For strain 25923, a profile similar to that of RP62A was observed (panel D of Figure 1); therefore, no further investigations were pursued on the R6, R24, RM4 and RM6 samples, despite their high biofilm inhibitory potency, with residual biofilm production ranging from 20% to 30%.

Bioactivity of FVEOs
Among all tested EOs, those from FV include some of the most active samples able to inhibit biofilm production (Figure 2). In particular, only mild effects (positive or negative modulation of biofilm production) were observed on the 6538P strain (panel A of Figure 2), with a few exceptions at both selected concentrations, including FA24, FS2, FO3 and FO6, which increased biofilm production by about 40-60%, and FOM1, which inhibited biofilm production by about 50% at 3.125 mg/mL. On the contrary, some FVEOs proved to be potent anti-biofilm agents on S. epidermidis RP62A (panel B of Figure 2). In particular, 5 out of 33 FVEOs (FO1, FO3, FO6, FO24 and FOM3) inhibited biofilm formation at a rate of about 80% (panel B of Figure 2) and were selected for further analyses using scalar concentrations of each EO starting from 0.0488 mg/mL. As for the selected potent RSEOs, no direct dose-dependent effect was determined (Figure 5). Interestingly, on O-47 biofilm modulation most FVEOs showed a bioactivity profile almost overlapping that for RP62A, with the FO1, FO3, FO6, FO24 and FOM3 samples able to reduce biofilm formation by about 50-70% (panel C of Figure 2). While on both S. epidermidis strains, RP62A and O-47, some FVEO samples displayed interesting biofilm inhibitory potencies, on the 25923 strain FVEOs displayed an overall bioactivity profile similar to that of RSEOs against O-47 (compare panel C of Figure 1 with panel D of Figure 2).

Bioactivity of CGEOs
EOs from CG were the strongest modulators of biofilm production, in either a positive (activators) or a negative (inhibitors) direction. In particular, in the case of strain 6538P, all samples at both concentrations can be classified as neutral or as biofilm promoters (panel A of Figure 3), with a strong tendency to increase biofilm production by up to 500% (CAM1). Other strong biofilm inducers (percentages over 300%) are the CAM3, CAM5, CS1, CS3, CS6 and CS24 samples. Many other CGEOs, although to a lesser extent, induced a doubling or even tripling of biofilm production. On the contrary, most CGEOs inhibited biofilm production by RP62A by over 50% (panel B of Figure 3). Many CGEO samples were further investigated, and for 6 of them (CO2, CO6, COM5, CS2, CS6 and CSM5) a definite dose-dependent relation was observed (Figure 6). Regarding biofilm modulation for O-47, CGEOs presented, at both tested concentrations, a mixed scenario in which some samples enhanced biofilm production up to 250-350% (CAM5, CS1, CS3, CS6, CSM1 and CSM3), while 15 different samples showed high inhibitory potencies (percentages of residual biofilm lower than 40-50%).

Machine Learning Classification Models
Application of PCA coupled with logistic regression led to the formulation of 4 robust models characterized by fairly good Accuracy, MCC, Precision-Recall and ROC-AUC values (Table 7). Model-agnostic feature importance and partial dependence plots were used to find the marginal effect that each EO chemical component has on the predicted outcome of the binary classification models built on the 3.125 mg/mL response variables. Feature importance measures the increase in the model's prediction error after the feature's values are permuted and highlights the absolute importance of each chemical constituent, while partial dependence plots show whether the relationship between the bioactivity and the chemical component is linear, monotonic or more complex.

Biofilm Activation ML Model on 6538P
Inspection of feature importance for the model derived from 6538P biofilm production percentages and EOs' chemical compositions revealed 3-octanol, D-limonene and pulegone as the chemical components most associated with bacterial biofilm production (Figure 7 and Table 8). Further investigation of their partial dependence plots (Supplementary Material Figure SM-2) indicated that all three chemicals were positively correlated with biofilm enhancement.

Biofilm Activation ML Model on 25923
As for 6538P, an ML model was also built for the 25923 strain to correlate biofilm production enhancement with the EOs' chemical composition. Again, analysis of feature importance identified D-limonene, γ-terpinene, 3-octanol and piperitenone as the most important components (Figure 8). Unlike what was found for 6538P, the main components were not all positively associated with biofilm production enhancement: D-limonene, γ-terpinene and 3-octanol were suggested to negatively modulate the increase of biofilm, while piperitenone was found to be positively correlated (Table 8 and Supplementary Material Figure SM-3).

Biofilm Inhibition ML Model on RP62A
Differently from the previous model, feature importance associated with the EOs' inhibition of biofilm production on the RP62A strain highlighted 3-octanol, phellandral, thymol and D-limonene as the chemical compounds most important in modulating biofilm reduction (Figure 9 and Table 8). Partial dependence plots for 3-octanol, phellandral, thymol and D-limonene indicated that all four components contribute positively to the inhibition of biofilm production (Supplementary Material Figure SM-4).

Biofilm Inhibition ML Model on O-47
Regarding the ML model derived from the EOs' biofilm inhibition capability on O-47, the compounds most responsible for biofilm production modulation were found to be 3-octanol, o-cymene, D-limonene and β-phellandrene (Figure 10 and Table 8). Differently from the analogous RP62A inhibition model above, only 3-octanol and D-limonene were found by partial dependence plots to be positively associated with the EOs' inhibitory ability (Supplementary Material Figure SM-5). On the contrary, o-cymene and β-phellandrene were associated with a negative action on the inhibition. This could be speculated to be a sort of anti-synergic effect that balances the EOs' potencies.

General Consideration on ML Models
According to the four classification models, two compounds, namely 3-octanol and D-limonene, can be considered as those that most influence biofilm production (Table 8). In particular, D-limonene was positively correlated with either inhibition or enhancement of biofilm production in three out of the four models, while it had a negative modulation in the ML model built on the EOs' biofilm enhancement on the 25923 strain. These data indicate a controversial mechanism associated with D-limonene. It could be speculated that, being a highly apolar monoterpene, this compound is indirectly associated with biofilm modulation by altering the bacterial wall [35], allowing other compounds, likely oxygenated ones, to enter the cell and alter some biochemical mechanism that could end in stimulation or inhibition of biofilm production. Nevertheless, the data available in the literature on this topic are controversial: Natcha and Caoili [36] reported that D-limonene is effective in inhibiting the growth of S. epidermidis RP62A when combined with the antibiotic rifampicin, likely due to D-limonene interference with biofilm formation. The effect of D-limonene in inhibiting bacterial biofilm formation was also demonstrated against species of the genus Streptococcus [37], for which a minimal biofilm inhibitory concentration (MBIC) of 400 µg/mL was determined. In a very recent study, D-limonene was also reported as a biofilm inhibitor, although less efficient than an EO containing D-limonene [38]. On the contrary, Kerekes et al. assayed a series of EOs and a list of chemical components against food-related micro-organisms and found that D-limonene was almost devoid of any ability to inhibit biofilm production. In a study by Espina et al., D-limonene at 2000 µL/L was reported to reduce the production of biofilm mass in S. aureus USA300 by 90% after 8 h of incubation, but to increase it by 30% after 40 h of incubation [39].
EOs containing D-limonene and the isolated component were found to stimulate biofilm production in Listeria monocytogenes and antibiotic-resistant Enterococcus faecalis strains [29][30][31]. A similar profile and speculation could also be deduced for 3-octanol. 3-Octanol is a molecule resembling n-octanol, a compound commonly used to evaluate compound membrane permeability and lipophilicity through the determination of the logP parameter, often used in ADME and QSAR studies. Unfortunately, no data are available on the influence of 3-octanol on biofilm production, except for a single report in which the 8-carbon molecules 1-octen-3-ol, 3-octanol and 3-octanone specifically induced conidiation in Trichoderma species colonies placed in the dark [40]. Considering the possible cell wall permeation role of both D-limonene and 3-octanol, on the basis of the ML elaboration pulegone, γ-terpinene and piperitenone could be the main components responsible for the modulation of the EO-augmented biofilm production for strains 6538P and 25923, while in the case of RP62A and O-47, phellandral, thymol, o-cymene and β-phellandrene are mainly responsible for positively (phellandral and thymol) or negatively (o-cymene and β-phellandrene) modulating the EOs' biofilm inhibition. Unfortunately, no specific data are available on these isolated components, and the present discussion, although based on robust ML calculations, is not experimentally supported.

It is worth noting that the four bacterial strains tested here produced biofilms with different characteristics. First, 6538P and 25923 belong to the S. aureus species, while RP62A and O-47 belong to the S. epidermidis species. 25923 is classified as a strong biofilm producer, and 6538P is a medium/strong biofilm producer according to Cafiso and coworkers [41]. Proteins are the major component in the biofilm matrix of 6538P, while in 25923 polysaccharides have a predominant role. The S. epidermidis strains are both strong biofilm producers and produce a biofilm mainly composed of polysaccharides. Moreover, O-47 is a naturally occurring agr mutant [42]. As previously reported [43], the agr-negative genotype enhanced biofilm formation on polymer surfaces through an increased expression of the surface protein AtlE, a bifunctional adhesin/autolysin abundant in the cell wall of S. epidermidis. The amount of AtlE present in the cell envelope is one of the reported differences between RP62A and O-47 [43]. The overexpression of AtlE could induce significant changes in the hydrophobicity of the bacterial surface [44]; this effect could explain the different action of EOs on these two strains. Furthermore, the classification models were developed on the same EOs tested on P. aeruginosa biofilm production [22]. In that case, investigation of the most important components by means of feature importance and partial dependence plots indicated estragole and phellandral as the chemical components most related to biofilm inhibition of P. aeruginosa, while D-limonene, pulegone and chrysanthenone seemed to be related to its biofilm production. Although the use of feature importance and partial dependence plots shed some light on the possible role of some EO components, little is yet known about the synergisms and anti-synergisms within the whole EO mixture. Further studies on isolated EO chemical components and on their simple mixtures are currently under evaluation to develop more refined ML models able to disclose more details on the EOs' mechanism of action.

Essential Oil and Chemical Composition Analysis
EOs and their chemical compositions were available from previously reported studies [22,45,46]. Briefly, EOs were obtained by direct fractionated steam distillation and analyzed by a gas chromatographic/mass spectrometric (GC/MS) protocol [47,48].

Bacterial Strains and Culture Conditions
Bacterial strains used in this work (Table 9) were grown in Brain Heart Infusion broth (BHI, Oxoid, UK). Biofilm formation was assessed in static conditions. Planktonic cultures were grown in flasks under vigorous agitation (180 rpm) at 37 °C. Specifically, S. aureus ATCC 6538P (6538P) and S. aureus ATCC 25923 (25923) are reference strains for antimicrobial testing; S. epidermidis RP62A (RP62A) is a reference strain isolated from an infected catheter, while S. epidermidis O-47 (O-47) is a clinical isolate and strong biofilm producer characterized by a genomic mutation in the agr locus [42].

Determination of Minimal Inhibitory Concentration (MIC)
The MIC was determined as the lowest concentration at which observable bacterial growth was inhibited. MICs were determined according to the guidelines of the Clinical and Laboratory Standards Institute (CLSI) [49]. Each EO was added directly from the mother stock, and test solutions were prepared by two-fold serial dilutions. Mother stock solutions were obtained by solubilizing each EO in DMSO at a final concentration of 1 g/mL. An appropriate dilution (10^6 cfu/mL) of a bacterial culture in exponential phase was used. Ten concentrations were used within the 25-0.045 mg/mL range. Experiments were performed in quadruplicate.
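The two-fold dilution scheme described above can be sketched in a few lines of Python. This is only an illustration: the function name `two_fold_series` is hypothetical, and the top concentration (25 mg/mL) and number of points (ten) are taken from the text. With these inputs the lowest point falls near 0.049 mg/mL, of the same order as the lower bound of the range reported above.

```python
def two_fold_series(top_mg_per_ml, n_points):
    """Return n_points concentrations, halving at each step from the top one."""
    return [top_mg_per_ml / 2 ** step for step in range(n_points)]


# Values taken from the text: 25 mg/mL top concentration, ten points.
series = two_fold_series(25.0, 10)
for conc in series:
    print(f"{conc:.4f} mg/mL")
```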

Biofilm Production Assay
The quantification of biofilm production was based on the microtiter plate (MTP) biofilm assay: an appropriate dilution of a bacterial culture in exponential growth phase was added to the wells of a sterile 96-well flat-bottomed polystyrene plate in the absence and in the presence of each EO. Quantification of in vitro biofilm production followed a previously reported methodology [50]. Briefly, the wells of a sterile 96-well flat-bottomed polystyrene plate were filled with 100 µL of the appropriate medium. A 1/100 dilution of overnight bacterial cultures was added to each well (about 0.5 OD600). As a control, the first row contained bacteria grown in 100 µL of BHI (untreated bacteria). The second row contained BHI supplemented with each EO at a concentration of 3.125 mg/mL or 0.0488 mg/mL. The plates were incubated aerobically for 18 h at 37 °C. Biofilm formation was measured using crystal violet staining. After treatment, planktonic cells were gently removed; each well was washed three times with double-distilled water and patted dry with a piece of paper towel in an inverted position. To quantify biofilm formation, each well was stained with 0.1% crystal violet and incubated for 15 min at room temperature, rinsed twice with double-distilled water, and thoroughly dried. The dye bound to adherent cells was solubilized with 20% (v/v) glacial acetic acid and 80% (v/v) ethanol. After 30 min of incubation at room temperature, OD590 was measured to quantify the total biomass of biofilm formed in each well. Each data point represents four independent experiments, each performed at least in triplicate.
EOs that altered biofilm formation of selected strains were then tested at serially diluted concentrations, as follows. The wells of a sterile 96-well flat-bottomed polystyrene plate were filled with 100 µL of the appropriate medium. A 1/100 dilution of overnight bacterial cultures was added to each well (about 0.5 OD600). As a control, the first row contained bacteria grown in 100 µL of BHI (untreated bacteria). BHI broth was added to the remaining wells starting from the third row. The second row contained BHI supplemented with each EO at a concentration of 0.0488 mg/mL. Starting from this row, samples were serially diluted (1:2 dilutions). The plates were incubated aerobically for 18 h at 37 °C. Biofilm formation was measured using crystal violet staining, as described above.
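As a minimal sketch of how OD590 readings can be turned into a biofilm production percentage, the snippet below expresses the mean absorbance of treated wells relative to the untreated control row. The exact normalization used by the authors is not given in this section, so the formula, the function name `biofilm_percent` and the absorbance values are all assumptions for illustration.

```python
from statistics import mean


def biofilm_percent(treated_od590, control_od590):
    """Biofilm production of treated wells as a percentage of the untreated control."""
    control_mean = mean(control_od590)
    if control_mean <= 0:
        raise ValueError("control absorbance must be positive")
    return 100.0 * mean(treated_od590) / control_mean


# Hypothetical OD590 readings (not data from the study).
control = [1.20, 1.10, 1.30]
treated = [0.60, 0.55, 0.65]
pct = biofilm_percent(treated, control)
print(f"biofilm production: {pct:.1f}% of control")
```

Under this convention, values below 100% indicate reduced biofilm production and values above 100% indicate enhanced production.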

Statistical Analysis of Biological Evaluation
Data were statistically validated by comparing the mean absorbance of treated and untreated samples with a two-tailed Student's t-test. A p value of <0.05 was considered significant.
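The comparison described above can be illustrated with a small, dependency-free sketch of a pooled-variance two-tailed Student's t-test. The helper names and the absorbance values are hypothetical, and the p value is obtained by numerically integrating the t density with Simpson's rule, which is an implementation choice made here for self-containment rather than the authors' procedure.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t_stat, df, steps=2000):
    """Two-tailed p value via Simpson integration of the density from 0 to |t|."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central_area)


def student_t_test(a, b):
    """Pooled-variance Student's t-test for two independent samples."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t_stat = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t_stat, two_tailed_p(t_stat, df)


# Hypothetical OD590 readings (not data from the study).
untreated = [1.20, 1.10, 1.30, 1.25]
treated = [0.60, 0.55, 0.65, 0.70]
t_stat, p_value = student_t_test(untreated, treated)
print(f"t = {t_stat:.2f}, p = {p_value:.2g}, significant = {p_value < 0.05}")
```

In practice the same test is available as `scipy.stats.ttest_ind`, which would normally be preferred over a hand-written integration.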

General Methods
All calculations were performed using the Python programming language (version 3.6, https://www.python.org/) [51] by executing in-house code on the Jupyter Notebook platform (version 5.7) [52]. The datasets were imported and loaded into a Pandas [53] dataframe and pre-processed to obtain four independent data matrices, each consisting of 89 rows (essential oil samples) and 54 columns (chemical components). Two dependent target vectors containing 89 biofilm production percentage observations at 48 µg/mL and 3.125 mg/mL were defined. Machine learning algorithms used in this study were implemented using the sklearn library (version 0.20) [54]. Unsupervised dimensionality reduction was performed with principal component analysis (PCA) [55], while L2-regularized logistic regression was used for the supervised learning analysis. The scores and loadings relative to the first two principal components (PCs) were graphically inspected on plots generated using the matplotlib library (version 3.0) [56]. To build the classification models, 30 PCs were extracted for each dataset. Cross-validation was used to search for the optimal inhibition/activation percentage cut-off values that define active and inactive samples. The optimal cut-off values were then used to obtain the hyper-parameter-optimized classification models. Hyper-parameter optimization was achieved through Bayesian optimization [57] of the number of PCs to be used as features and of the regularization parameter of the L2 logistic regression (the inverse of regularization strength in the sklearn implementation). For each dependent target vector, two types of models were built: one to define the EOs' ability to inhibit biofilm production and another to describe biofilm production enhancement. Percentage ranges of 60-80% and 120-140% biofilm production were chosen as the cut-off search ranges for the inhibition and activation models, respectively.
Finally, the most appropriate cut-offs for the binary classification of EOs as biofilm inhibitors/non-inhibitors or biofilm enhancers/non-enhancers were determined from a supervised learning analysis.
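The workflow above (cut-off based labelling, PCA feature extraction, L2-regularized logistic regression) might be sketched with sklearn as follows. This is a simplified illustration, not the authors' in-house code: the data are random stand-ins with the same shape as the study matrices, the hyper-parameters (`n_pcs=30`, `C=1.0`) are fixed rather than tuned by Bayesian optimization, the cut-offs are scanned on a coarse hypothetical grid, and leave-one-out is assumed as the cross-validation scheme.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Synthetic stand-ins (NOT the study data): 89 EOs x 54 components, and
# biofilm production expressed as a percentage of the untreated control.
X = rng.random((89, 54))
biofilm_pct = rng.uniform(20.0, 180.0, size=89)


def evaluate_cutoff(X, biofilm_pct, cutoff, n_pcs=30, C=1.0):
    """Label EOs as inhibitors (1) when biofilm_pct <= cutoff, then score a
    PCA + L2 logistic regression pipeline by leave-one-out cross-validation."""
    y = (biofilm_pct <= cutoff).astype(int)
    model = make_pipeline(
        PCA(n_components=n_pcs),
        LogisticRegression(penalty="l2", C=C, solver="liblinear"),
    )
    y_pred = cross_val_predict(model, X, y, cv=LeaveOneOut())
    return accuracy_score(y, y_pred), matthews_corrcoef(y, y_pred)


# Scan hypothetical cut-offs inside the 60-80% inhibition range.
results = {cutoff: evaluate_cutoff(X, biofilm_pct, cutoff) for cutoff in (60, 70, 80)}
for cutoff, (acc, mcc) in results.items():
    print(f"cut-off {cutoff}%: ACC={acc:.2f}, MCC={mcc:.2f}")
```

Because the inputs are random, the resulting scores carry no chemical meaning; the point is only the order of operations. Placing PCA inside the pipeline ensures that the components are re-estimated on each training fold, so the left-out sample never influences the projection.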
The binary classification models were numerically and graphically evaluated by accuracy (ACC), Matthews correlation coefficient (MCC), and receiver operating characteristic (ROC) and precision-recall (PR) curves. Finally, the importance of the EOs' chemical components was evaluated individually through "feature importance" and "partial dependence" plots [28], as implemented in the Skater Python library [58,59]. Feature importance is a generic term for the degree to which a predictive model relies on a particular feature. The Skater feature importance implementation is based on an information-theoretic criterion, measuring the entropy in the change of predictions given a perturbation of a given feature [58].
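To make the idea of a partial dependence curve concrete, the sketch below computes one by hand: a single feature is forced to each value of a grid for every sample, and the model predictions are averaged. It does not reproduce the Skater implementation; the function `partial_dependence`, the toy model `toy_predict` and the input rows are hypothetical and serve only to show the mechanics.

```python
from statistics import mean


def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction when one feature is forced to each grid value."""
    curve = []
    for value in grid:
        modified = [
            row[:feature_index] + [value] + row[feature_index + 1:] for row in rows
        ]
        curve.append(mean(predict(row) for row in modified))
    return curve


def toy_predict(row):
    """Hypothetical model: output rises with feature 0 and falls with feature 1."""
    score = 0.8 * row[0] - 0.5 * row[1]
    return min(1.0, max(0.0, 0.5 + score))


# Hypothetical composition rows (fractions of three components per EO).
rows = [[0.1, 0.2, 0.3], [0.4, 0.1, 0.0], [0.0, 0.6, 0.9]]
grid = [0.0, 0.25, 0.5]
curve = partial_dependence(toy_predict, rows, feature_index=0, grid=grid)
for value, avg in zip(grid, curve):
    print(f"feature 0 = {value:.2f} -> mean prediction {avg:.2f}")
```

A rising curve, as obtained here for feature 0, indicates that the model's predicted probability increases with the amount of that component when the other components are left as observed.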

Classification Models' Validation
Validation of each classification model was carried out by leave-one-out cross-validation, taking into account the accuracy (ACC), the precision or positive predictive value (PPV), the recall or sensitivity or true positive rate (TPR), the specificity or true negative rate (TNR), the receiver operating characteristic (ROC) curve and the Matthews correlation coefficient (MCC) (see Supplementary Information) [22,60]. Y-scrambling [61,62] was finally applied to rule out chance correlation and to assess the robustness of the coefficients.
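The scalar metrics listed above follow directly from the confusion matrix, as the following sketch shows using their standard definitions. The function names and the label vectors are hypothetical; class 1 is assumed to denote the active (positive) class, and undefined ratios are reported as 0.0 by convention here.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Compute ACC, PPV, TPR, TNR and MCC from the confusion matrix."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)

    def ratio(num, den):
        return num / den if den else 0.0  # convention for undefined ratios

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": ratio(tp + tn, tp + tn + fp + fn),
        "PPV": ratio(tp, tp + fp),
        "TPR": ratio(tp, tp + fn),
        "TNR": ratio(tn, tn + fp),
        "MCC": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical true labels and leave-one-out predictions (not study data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name} = {value:.2f}")
```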