Article

Core Perturbomes of Escherichia coli and Staphylococcus aureus Using a Machine Learning Approach

by José Fabio Campos-Godínez, Mauricio Villegas-Campos and Jose Arturo Molina-Mora *
Centro de Investigación en Enfermedades Tropicales, Centro de Investigación en Hematología y Trastornos Afines, Facultad de Microbiología, Universidad de Costa Rica, San José 30305, Costa Rica
* Author to whom correspondence should be addressed.
Pathogens 2025, 14(8), 788; https://doi.org/10.3390/pathogens14080788
Submission received: 16 June 2025 / Revised: 24 July 2025 / Accepted: 5 August 2025 / Published: 7 August 2025
(This article belongs to the Collection New Insights into Bacterial Pathogenesis)

Abstract

The core perturbome is defined as a central response to multiple disturbances, functioning as a complex molecular network that counteracts the disruption of homeostasis and thereby promotes tolerance and survival under stress conditions. Based on the biological and clinical relevance of Escherichia coli and Staphylococcus aureus, we characterized their molecular responses to multiple perturbations. Gene expression data from E. coli (8815 target genes, based on a pangenome, across 132 samples) and S. aureus (3312 target genes across 156 samples) were used. Accordingly, this study aimed to identify and functionally describe the core perturbome of these two prokaryotic models using a machine learning approach. For this purpose, feature selection and classification algorithms (KNN, RF and SVM) were implemented to identify a subset of genes as core molecular signatures distinguishing control and perturbation conditions. After verifying effective dimensional reduction (with median accuracies of 82.6% and 85.1% for E. coli and S. aureus, respectively), a model of molecular interactions was built and functional enrichment analyses were performed to characterize the selected genes. The core perturbome was composed of 55 genes (including nine hubs) for E. coli and 46 (eight hubs) for S. aureus. Well-defined interactomes were predicted for each model, which are jointly associated with enriched pathways, including energy and macromolecule metabolism, DNA/RNA and protein synthesis and degradation, transcription regulation, virulence factors, and other signaling processes. Taken together, these results may support the identification of potential therapeutic targets and biomarkers of stress responses in future studies.

1. Introduction

Biological organisms require complex cellular and molecular interactions to ensure homeostasis and survival [1]. Several studies have revealed diverse molecular mechanisms that are coordinated through interaction networks and can explain the response to disturbances in many organisms [1,2,3,4,5]. Thus, developing new strategies for studying these interactions has been fundamental for understanding the biological processes and identifying different genotypic and phenotypic patterns, defined as consistent molecular changes, that serve as biomarkers of stress or biological states [6]. In this context, and given the importance of understanding central responses to cellular stress across organisms, the concept of the perturbome has been coined. Metabolic and signal transduction pathways are modulated following exposure to different perturbations, including a subset of shared or core pathways that are independent of the specific stressor, collectively referred to as the core perturbome (Figure 1), as shown in previous studies [1,7]. In one such study, a human cell model was used to describe diverse stress-response genes active upon drug exposure, suggesting the presence of a central control mechanism. The authors applied a framework to a large-scale imaging screen of cell morphology changes induced by diverse drugs and their combinations, resulting in a network of 242 drugs and 1832 interactions [1]. In prokaryotes, the first perturbome was described for Pseudomonas aeruginosa using a machine learning strategy in which multiple data partition schemas and several classifiers were benchmarked, and genes were selected according to model performance metrics. The analysis identified 46 genes as part of the central response to perturbations, with biological functions related to biosynthesis, binding, metabolism, DNA damage repair, and aerobic respiration in the context of tolerance to stress [7].
At the transcriptional level, multiple studies have shown that distinct molecular responses can be detected within gene networks that are specific to each perturbation. These responses are closely linked to the modulation of various metabolic pathways, ensuring functional redundancy and robustness in the face of diverse stress stimuli [1,3,7]. Key contributors to stress responses include genes related to the SOS system (lexA, recA, dinB, umuDC, etc.) [8,9,10] and the general stress response mediated by RpoS (sigma factor rpoS, RNAP, xthA, etc.) [8,11,12], whose roles have been extensively documented.
Although studies explicitly using the term “perturbome” are still limited, the concept has been successfully applied in cellular models in both eukaryotes and prokaryotes. In eukaryotic systems, direct relationships between functionally similar drugs and a specific cellular response have been established [1]. Other studies have also shown neuronal responses through molecular networks triggered by defined stressors [3]. In prokaryotes, related investigations have been conducted in Escherichia coli [4,5].
On the other hand, therapeutic interventions, such as antibiotics and biocides, are among the most potent stressors acting on bacterial pathogens. These agents disrupt microbial homeostasis and impose strong selective pressures, ultimately threatening the survival of the microorganisms [13].
However, the emergence of pathogens resistant to antibiotics and biocides constitutes a public health concern and is currently among the most significant global challenges [14,15]. In this context, elucidating the central molecular response to perturbations offers a valuable opportunity not only to describe the physiological strategies that bacteria employ to survive under stress conditions but also to identify potential biomarkers and therapeutic targets [7].
In this work, we investigated the molecular determinants of the core perturbome in E. coli and Staphylococcus aureus models. E. coli is a gram-negative, facultative anaerobic bacterium commonly found in various environments and involved in infections across distinct hosts [16,17]. Over several decades, a vast arsenal of resistance genes has been found in E. coli, suggesting that this species serves as a critical reservoir of determinants related to antibiotic resistance [18]. The second model, S. aureus, is a ubiquitous, gram-positive, facultative anaerobe frequently implicated in both nosocomial and community-acquired infections in humans and animals [19]. Of particular concern are methicillin-resistant S. aureus strains (MRSA), which are associated with high morbidity and mortality rates worldwide [20].
Given the public health relevance of these bacterial models and the increasing availability of high-throughput molecular technologies (e.g., microarrays and massively parallel sequencing), there is a pressing need for innovative computational strategies capable of handling and interpreting large-scale, complex datasets. In this regard, artificial intelligence (specifically machine learning) has emerged as a powerful tool for detecting and describing nontrivial patterns in massive molecular datasets [21]. Several studies have used machine learning algorithms to evaluate the impact of stressors on various biological organisms by identifying specific molecular responses, including models based on feature selection and classification tasks for accurate prediction of cellular states and the discovery of potential biomarkers from transcriptomic data [7]. Thus, machine learning has played a crucial role in uncovering patterns within molecular networks, enabling the identification of key and hub genes, pathways, and interactions that underlie complex biological responses in both eukaryotic organisms [22,23,24] and prokaryotes, such as Bacillus subtilis [25] and Listeria monocytogenes [26].
Overall, this study set out to explore the perturbomes of E. coli and S. aureus through machine learning (specifically feature selection and classification) with transcriptomic data. Gene expression data from E. coli (8815 target genes based on a pangenome, across 132 samples) and S. aureus (3312 target genes across 156 samples) were used. We hypothesized that bacteria exposed to various perturbations would exhibit distinct transcriptomic signatures but would also reveal a core molecular response characterized by the enrichment of metabolic and signal transduction pathways involved in stress responses. Based on this hypothesis, the specific goal was to identify and functionally characterize genes commonly associated with different perturbations in two prokaryotic models using machine learning.

2. Materials and Methods

The general strategy followed in this work is presented in Figure 2.

2.1. Selection of Biological Models and Transcriptomic Data

Two biological models, E. coli and S. aureus, were selected for this study. Publicly available transcriptomic datasets were retrieved from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/, accessed on 5 March 2021). For each organism, the high-throughput molecular platforms with the largest number of experiments and samples with available data were chosen.
The inclusion criteria for experiments and samples in each platform were as follows: (i) having information available regarding the type of perturbation to which the bacteria were exposed (antibiotics, detergents, or chemicals), (ii) having similar culture conditions, and (iii) having control conditions (i.e., unexposed to perturbations).
For E. coli, the final gene expression dataset was composed of 9 series with 87 samples for perturbations and 45 controls. For S. aureus, the final dataset was composed of 15 series with 92 perturbation cases and 64 controls. Descriptions of the datasets, including accession numbers, types of perturbations, and numbers of cases, are presented in Table 1.

2.2. Normalization

Transcriptomic data files (TAR format) were retrieved from the GEO database using Bioconductor (https://www.bioconductor.org/) in R software v4.2.2 (https://r-project.org/) with RStudio v2022.12.0 (https://rstudio.com) using classical functions for microarrays. Background correction, normalization, and summarization were performed with the Robust MultiArray Average algorithm (RMA) with the Affy package in Bioconductor [27].
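As an illustration of the normalization step only, the following minimal sketch shows quantile normalization, which is the normalization stage of RMA. It is not the authors' pipeline (which used the Bioconductor implementation in R); it is written in plain Python, omits background correction and median-polish summarization, breaks ties by position, and uses a hypothetical toy matrix of log2 intensities.

```python
from statistics import mean


def quantile_normalize(matrix):
    """Quantile-normalize a genes x samples matrix given as a list of rows.

    Every sample (column) is mapped onto the same reference distribution:
    the mean of the k-th smallest values across all samples.
    Ties are broken by position, which is a simplification.
    """
    n_genes = len(matrix)
    n_samples = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_samples)]

    # Reference distribution: average of the sorted columns, rank by rank.
    sorted_columns = [sorted(col) for col in columns]
    reference = [mean(sc[k] for sc in sorted_columns) for k in range(n_genes)]

    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for j, col in enumerate(columns):
        # Gene indices ordered from lowest to highest intensity in sample j.
        order = sorted(range(n_genes), key=lambda i: col[i])
        for rank, gene_idx in enumerate(order):
            normalized[gene_idx][j] = reference[rank]
    return normalized


# Hypothetical toy data: 4 genes (rows) x 3 samples (columns).
toy = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.5, 6.0],
    [4.0, 2.0, 8.0],
]
normalized = quantile_normalize(toy)
```

After this step every sample shares the same set of values, while the rank of each gene within its own sample is preserved.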

2.3. Machine Learning Algorithms

Machine learning analyses were performed using the Caret package (caret.r-forge.r-project.org/) in RStudio/R software. In the first step, based on the complete dataset for each model, a feature selection approach was used to identify the most relevant genes contributing to each condition (control vs. perturbation). For this purpose, the correlation-based feature selection algorithm (Cfs) was used to reduce dimensionality [28], which identifies the most relevant features (genes) for distinguishing between classes. This approach selects a subset of features that are highly correlated with the target class (perturbations versus controls) but exhibit low intercorrelation among themselves.
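The idea behind correlation-based feature selection can be sketched as follows. This is a simplified illustration in plain Python, not the implementation used in the study: it assumes absolute Pearson correlation as the association measure, a greedy forward search, and the commonly cited merit heuristic, merit = k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean feature–class correlation and r_ff the mean feature–feature correlation. The gene names and binarized expression values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 for constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def cfs_merit(features, labels, subset):
    """High when genes track the class but not each other."""
    k = len(subset)
    r_cf = sum(abs(pearson(features[g], labels)) for g in subset) / k
    if k == 1:
        return r_cf
    pairs = list(combinations(subset, 2))
    r_ff = sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(features, labels):
    """Forward selection: add the gene that most improves merit, stop otherwise."""
    selected, best = [], 0.0
    remaining = set(features)
    while remaining:
        best_gene, best_merit = None, best
        for gene in sorted(remaining):
            merit = cfs_merit(features, labels, selected + [gene])
            if merit > best_merit:
                best_gene, best_merit = gene, merit
        if best_gene is None:
            break
        selected.append(best_gene)
        remaining.remove(best_gene)
        best = best_merit
    return selected, best


# Hypothetical toy data: 0 = control, 1 = perturbation.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "geneA": [0, 0, 0, 1, 1, 1, 1, 0],        # informative
    "geneB": [0.1, 0, 0, 1, 1, 0.9, 1, 0],    # near-copy of geneA (redundant)
    "geneC": [1, 0, 1, 0, 1, 0, 1, 0],        # unrelated to the class
    "geneD": [1, 0, 0, 0, 0, 1, 1, 1],        # informative, independent of geneA
}
selected, merit = greedy_cfs(features, labels)
```

In this toy case the search keeps geneA and geneD, which carry complementary information, and rejects geneB because its high correlation with geneA lowers the merit of any subset containing both.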
In a second analysis, three classification algorithms were used to assess the effectiveness of dimensionality reduction based on the performance of the subset of genes in differentiating between the control and perturbation groups: support vector machine (SVM, kernel = “svmRadial”, epsilon = 0.1, complexity_C = 1.0, tolerance = 0.001) [29], K-nearest neighbors (KNN, algorithm = “LinearNNSearch”, Number_neighbours = 1) [30], and random forest (RF, num_slots = 1, bag% = 100, iterations = 100) [31]. Parameter tuning involved the use of the train() function and the “tuneGrid” option, in which specific model-dependent parameters were selected to be optimized. For RF, mtry (number of variables randomly sampled at each split) was tuned, while k (number of neighbors) was optimized for KNN. For SVM, cost (C) and sigma values were evaluated. Other classifiers were initially tested (logistic regression, rpart, logit-boost, and neural network) but were excluded after comparison (the three best cases at the training stage were selected for further analysis). All these algorithms were trained with 10-fold cross-validation, as in [7]. Because the results depend on data partitioning, three splits were applied for the training and testing steps: 70/30 (70% training and 30% testing), 80/20, and 90/10. These conditions were applied before and after gene selection. Performance metrics, including accuracy, kappa, precision, recall, true positives (TPs), false positives (FPs), and area under the receiver operating characteristic curve (AUC), were calculated. Selected genes were considered the key elements of the central response to multiple perturbations, i.e., the core perturbome members for each bacterial model.
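The evaluation scheme described above (several train/test partitions, a classifier, and metrics such as accuracy and kappa) can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the caret workflow used in the study: the split is stratified by class, the classifier is a 1-nearest-neighbor rule with Euclidean distance, cross-validation and parameter tuning are omitted, and the two-feature samples are hypothetical.

```python
import random
from math import dist


def stratified_split(labels, train_fraction, seed=0):
    """Return (train_indices, test_indices), keeping class proportions."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * train_fraction)
        train += idx[:cut]
        test += idx[cut:]
    return train, test


def predict_1nn(train_x, train_y, query):
    """Label of the closest training sample (Euclidean distance)."""
    nearest = min(range(len(train_x)), key=lambda i: dist(train_x[i], query))
    return train_y[nearest]


def accuracy_and_kappa(truth, predicted):
    """Accuracy and Cohen's kappa (agreement corrected for chance)."""
    n = len(truth)
    observed = sum(t == p for t, p in zip(truth, predicted)) / n
    classes = set(truth) | set(predicted)
    expected = sum((truth.count(c) / n) * (predicted.count(c) / n) for c in classes)
    kappa = 1.0 if expected == 1 else (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical, well-separated toy samples with two features each.
controls = [(1.0 + 0.1 * i, 1.0 - 0.05 * i) for i in range(10)]
perturbed = [(5.0 - 0.1 * i, 5.0 + 0.05 * i) for i in range(10)]
samples = controls + perturbed
labels = ["control"] * 10 + ["perturbation"] * 10

results = {}
for fraction in (0.7, 0.8, 0.9):
    train_idx, test_idx = stratified_split(labels, fraction)
    train_x = [samples[i] for i in train_idx]
    train_y = [labels[i] for i in train_idx]
    truth = [labels[i] for i in test_idx]
    predicted = [predict_1nn(train_x, train_y, samples[i]) for i in test_idx]
    results[fraction] = (len(train_idx), len(test_idx),
                         *accuracy_and_kappa(truth, predicted))
```

Repeating the evaluation over several partitions, as in the text, gives a sense of how much a reported metric depends on which samples happen to fall in the test set; with a 90/10 split the test set is small, so single-split estimates are coarse.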

2.4. Molecular Interactions and Functional Enrichment

Based on the list of candidate genes, corresponding identifiers, biological functions, and protein-level sequences were retrieved from the UniProt database (https://www.uniprot.org/id-mapping, accessed on 5 March 2021). Using a systems biology approach, sequences were employed to construct a model of molecular interactions (interactome) with the Search Tool for the Retrieval of Interacting Genes database (STRINGdb, https://string-db.org/) [32]. The interaction models were generated using default settings, incorporating evidence from experiments, co-expression, gene co-occurrence, text mining, and others, as well as a minimum required interaction score of 0.150.
The resulting graph was exported and visualized using Cytoscape software v 3.7.1 [33]. Hub genes were identified with the Cytohubba plugin [34] based on the top 5 nodes with the best values for degree, betweenness, and bottleneck topological metrics.
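To make the hub criterion concrete, the sketch below ranks nodes by degree after discarding edges under a minimum interaction score. It is only an approximation of the procedure described above: it assumes an edge list of (node, node, score) tuples, covers degree but not betweenness or bottleneck, and uses placeholder node names (g1 to g6) and scores that are purely hypothetical. The actual analysis relied on Cytoscape and the Cytohubba plugin.

```python
from collections import Counter


def top_hubs_by_degree(scored_edges, min_score=0.150, top_n=5):
    """Rank nodes by number of retained interactions.

    scored_edges: iterable of (node_a, node_b, score) tuples, assumed to
    contain each undirected interaction once.
    """
    degree = Counter()
    for a, b, score in scored_edges:
        # Keep only interactions at or above the score threshold; skip self-loops.
        if score >= min_score and a != b:
            degree[a] += 1
            degree[b] += 1
    # Highest degree first; ties resolved alphabetically for reproducibility.
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical interaction list with placeholder node names.
edges = [
    ("g1", "g2", 0.90),
    ("g1", "g3", 0.40),
    ("g1", "g4", 0.20),
    ("g2", "g3", 0.70),
    ("g4", "g5", 0.10),  # below the 0.150 threshold, discarded
    ("g5", "g6", 0.30),
]
hubs = top_hubs_by_degree(edges, min_score=0.150, top_n=3)
```

A low threshold such as 0.150 retains weakly supported interactions, so the resulting degrees are best read as an inclusive view of the network rather than a high-confidence one.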
Finally, to investigate functional enrichment, protein sequences were analyzed using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database and the tool KOALA (KEGG Orthology And Links Annotation, https://www.kegg.jp/blastkoala/, accessed on 5 March 2021, version 3.0) [35], allowing the identification of functional modules among the selected genes.

3. Results

3.1. Core Perturbome Genes of E. coli and S. aureus Can Be Identified Using a Machine Learning Approach

Transcriptomic microarray data comprising 8815 genes from E. coli (based on the pangenome from four strains) and 3312 genes from S. aureus (based on a single genome) were preprocessed for machine learning analyses. The Cfs feature selection algorithm was implemented to identify key elements capable of distinguishing between experimental classes (control vs. perturbations). Following this dimensionality reduction, a substantial decrease in the number of genes was achieved: 55 genes (0.62%) for E. coli and 46 for S. aureus (1.39%).
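The retained fractions quoted above follow directly from the gene counts, as this short check in Python shows (the function name is ours, introduced only for illustration):

```python
def retained_percent(selected, total):
    """Percentage of the original feature set kept after selection."""
    return round(100 * selected / total, 2)


ecoli_retained = retained_percent(55, 8815)    # E. coli: 55 of 8815 genes
saureus_retained = retained_percent(46, 3312)  # S. aureus: 46 of 3312 genes
```

In both models, more than 98% of the measured genes were discarded before classification.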
Three classification algorithms (SVM, KNN, and RF) and three data partitions (70/30, 80/20, and 90/10) were applied before and after dimensionality reduction. The accuracy or percentage of correctly classified instances was determined for each case, as shown in Table 2.
Among all algorithm–partition combinations for E. coli, the median accuracy was 56.52% when using the full dataset (8815 genes), but this value drastically increased to 82.61% when using the selected subset of 55 genes. Similarly, for S. aureus, the median accuracy was 74.5% for the complete dataset (3312 genes), which increased to 85.1% after feature selection (46 genes). Furthermore, under partitioning variations and algorithms, for both biological models, the best combination was the KNN classifier with a 90/10 partition. Nonetheless, other classifiers also performed satisfactorily after dimensionality reduction, with most configurations achieving an accuracy of above 70%. More details are provided in Table 2.
Given that model performance should not rely solely on accuracy, additional evaluation metrics were used to compare the classification models after dimensionality reduction, as shown in Table 3. Depending on the metric, RF outperformed other classifiers in several cases. For instance, RF yielded superior values for the kappa value, TP rate, F score, and AUC across multiple partitions for both biological models. Again, the results for other conditions (partitions and algorithms) also showed acceptable performance after gene selection, reinforcing the robustness of the selected features and classification strategy.

3.2. Biological Functions and Well-Defined Interactions Can Be Recognized for Genes of the Core Perturbome

Following gene selection, biological functions of the corresponding proteins were identified for each candidate gene, as shown in Table 4 and Table 5. Genes related to metabolic processes, transport, transcriptional regulators, and virulence factors were found in each model. More details are presented in the Supplementary Material. Subsequently, a systems biology approach was employed to construct molecular interaction networks for each model.
As shown in Figure 3 and Figure 4 for E. coli and S. aureus, respectively, well-defined interactions were obtained. For E. coli, topological metrics indicated that 42 nodes out of 55 selected genes (76.4%) were connected with 67 edges in total (six nodes were not connected, and six were not mapped to the database); nine hub gene products were recognized in the network, including marR, cueR, ecsC, and ycfJ. For S. aureus, 42 nodes out of 46 genes (91.3%) were connected through 123 edges (four unconnected genes), and the gene products recA, guaA, and sleA were among the eight hub genes in this interactome.
Finally, functional enrichment analysis based on KEGG ontologies revealed a diverse array of biological pathways shared across both models. These included energy and macromolecule metabolism, DNA/RNA and protein synthesis and degradation, transcription regulation, virulence factors, and pathways associated with human diseases (pathogenesis), and others, as depicted in Figure 5 and Table 6. These results were based on 35 (63.6%) annotated genes for E. coli and 41 (89.1%) annotated entries for S. aureus. Despite differences in the identities of the selected genes in the core perturbome in each model, both bacteria exhibited similar patterns of enriched biological modules (Table 6), suggesting conserved strategies in the bacterial response to stress.

4. Discussion

The core perturbome is defined as the central molecular response of an organism to multiple external disturbances [1,7]. This response functions as a complex molecular network that counteracts disruptions in cellular homeostasis, thereby promoting tolerance and survival under stress conditions [36]. In this study, we focused on the core perturbomes of E. coli and S. aureus, two bacterial species frequently implicated in infections across several hosts, including critical cases in humans with multidrug-resistant strains [4,37,38]. Characterizing the molecular responses to perturbations in those pathogens can provide valuable insights into potential therapeutic targets and biomarkers [39].
As in other studies, transcriptomic data served as a powerful resource for determining key elements involved in stress response, based on changes in gene expression and associations with phenotypic outcomes [40,41,42]. Although gene expression patterns have been studied via machine learning in other biological contexts [24,43,44,45,46], reports on the central response to stimuli in prokaryotic models remain limited. Some reports exist for Bacillus subtilis [25] and Listeria monocytogenes [26], while systematic analyses under the core perturbome framework for E. coli and S. aureus are largely absent. Alternative approaches have explored responses to multiple stressors in E. coli [5] and S. aureus [47], but comprehensive perturbome-level investigations have yet to be reported.
In our analysis, machine learning was used to identify core molecular signatures distinguishing control and perturbation conditions. Feature selection was performed using the Cfs algorithm, which removed irrelevant and redundant features, thereby enhancing the classifiers’ performance. This approach yielded a reduced set of genes with high predictive power: 55 (with nine hubs) for E. coli and 46 (with eight hubs) for S. aureus. These results are in line with previous reports in terms of magnitude. For example, network analysis revealed 24 central genes in E. coli [5], 122 genes in the sigmaB regulon of S. aureus [48], and 46 genes in the perturbome of Pseudomonas aeruginosa [7].
The assessment of selected genes using classification algorithms (SVM, RF, and KNN) and data partitions (70/30, 80/20, and 90/10) demonstrated a substantial improvement in model performance after feature selection in both prokaryotic models. The median accuracy was 82.6% for E. coli and 85.1% for S. aureus after dimensional reduction, in contrast to the median accuracies of 56.52% and 74.5%, respectively, for complete datasets. These results suggest that the selected subset of genes not only retained sufficient discriminatory power to classify the samples accurately but also reduced noise effectively, an expected outcome of successful dimensional reduction. In transcriptomic profiling using massive amounts of molecular data, extracting relevant information and reducing noise by selecting a subset of relevant genes are still open problems [49]. Our approach, combining feature selection with robust machine learning classifiers, effectively addressed this challenge. The use of SVM, RF, and KNN, which usually outperform other classifiers in comparative strategies [50,51,52], was key to this success. In contrast, the other four classifiers (logistic regression, rpart, logit-boost, and neural network) demonstrated suboptimal performance and were excluded early in the analysis. For the selected classifiers, performance showed some differences across bacterial species. Based on all the metrics, and although the differences among classifiers were not large, RF outperformed other algorithms for E. coli, while SVM showed superior performance for S. aureus. Such behavior is common and expected for different datasets (from two very different models, distinct microarray platforms, and diverse wet lab experiments used to generate transcriptomic data), as previously reported [30,46,53]. Given the data variability introduced by processing assays across different laboratories (different GEO projects), traditional approaches like differential expression analysis were unsuitable. A more robust method was, therefore, required to account for this variability, and machine learning proved effective in identifying meaningful patterns under these conditions.
Regarding the biological functionality of the selected genes, they appear to act in an orchestrated, synergistic manner through the modulation of metabolic pathways with interrelated genes, as supported by gene annotation, network analysis, and functional enrichment analyses. In the case of hub genes, these are decisive regulators for transferring regulatory information through signaling, functioning as activators or repressors of case-specific operons/genes. Hubs not only have many connections with other genes within the network but also influence the expression and function of many other genes, acting as control centers in the network. Notably, transcription factors played a central role: six in E. coli (ybbI -hub-, caiF, RfaH -hub-, HexR, arsR, and marR -hub-) and two in S. aureus (mtlR/SACOL2147 and MerR/SACOL2193 -hub-). The regulatory functions of these genes are associated with the control of efflux pump activity, porin expression, DNA repair mechanisms, and macromolecule transport, which together counteract the effects of antibiotics and genotoxic agents [54,55,56]. These regulated elements were found in our analysis. Furthermore, genes linked to protein synthesis, modulation of growth, and virulence factors were also identified in both bacterial models. These biological functions are consistent with other studies, indicating that these routes are involved in the physiological and metabolic changes that contribute to tolerance by resisting stress and maintaining only the functions essential for survival under stressful conditions [56,57].
Although no specific genes are directly linked to classical stress responses, such as the SOS or RpoS responses, several key pathways functionally related to these responses were enriched in the core perturbomes of E. coli and S. aureus [5,58,59]. For example, functional and enriched pathways associated with “DNA damage repair”, “energy and macromolecule metabolism”, “DNA/RNA and protein synthesis and degradation”, “transcription regulation”, “virulence factors”, and other signaling processes were consistently enriched in both models. These biological modules are well-documented components of bacterial adaptation under stress conditions [4,54,55,60]. Although part of the transcriptomic data is from non-pathogenic strains (E. coli K12, for example, as well as the consideration of genes from the pangenome with four strains), the selected genes largely belong to the conserved core genome, implying that these determinants likely play similar roles in pathogenic lineages.
Interestingly, the number of enriched pathways was relatively limited, which may reflect redundancy and robustness in the general response to stress [4,7]. This has been reported in other works. For instance, our previous study of the core perturbome of P. aeruginosa revealed 46 genes and a reduced number of pathways associated with biosynthesis, protein binding, and metabolism, many of which are related to DNA damage repair and aerobic respiration in the context of tolerance to stress [7]. In the study by Nagar et al. [5] with E. coli, interactome analysis of 24 central proteins revealed the role of RNA binding, virulence factors, transporters, and DNA repair as important processes during the stress response.
Furthermore, the biological significance of perturbome analysis was underscored in our prior research with P. aeruginosa strain AG1 [61]. There, we identified core genes and those exclusively induced by ciprofloxacin exposure. Most of the genes that were not part of the core perturbome were resident prophages of the genome. These determinants were expressed in response to ciprofloxacin but not to other antibiotics. This observation, validated phenotypically, supports the potential utility of phages as endogenous modulators in therapeutic strategies, including phage therapy [61].
Moreover, a preliminary ortholog comparison among the studied models (P. aeruginosa from the previous study and E. coli and S. aureus from the present work) using OrthoFinder [62] revealed six shared genes (rho, fabF, tdcF, argS, sle1, and gtaB), suggesting a conserved role in the molecular stress response across these species. Because these genes were selected independently in different bacteria, and from transcriptomic data generated under diverse experimental conditions within each species, we consider this a robust selection. We are currently studying them by molecular docking to evaluate their druggability and, eventually, to predict in silico and test in vitro the effects of chemical compounds on the modulation of stress tolerance. These genes are being structurally modeled using PDB or AlphaFold, and in collaboration with a structural biology team, we are working to identify chemical inhibitors. We are also standardizing PCR protocols to quantify gene expression of the core perturbome for E. coli and S. aureus, as was recently established for P. aeruginosa [63].
Regarding the limitations, this study was primarily affected by data availability, since specific platforms had to be used to obtain complete data with comparable information. For example, for E. coli, the microarray data used were from a non-pathogenic strain (K-12 substrain MG1655), and the microarray platform was designed based on the pangenome of four strains; the selection was driven by data availability rather than by relevance as a pathogen or by the use of more recent and advanced technologies such as RNA sequencing. Future analyses would benefit from incorporating RNA-seq data from pathogenic strains, ideally obtained using the same platform and under comparable experimental conditions to enhance consistency and biological relevance. Furthermore, the included transcriptomic datasets were generated under heterogeneous experimental setups, which may introduce variability. Although strict inclusion criteria were applied to minimize this source of bias, the lack of uniform conditions remains a limitation. Finally, while this study provides insights into gene expression responses at the transcriptomic level, additional validation of the identified genes through proteomic analyses and comprehensive phenotypic assays is necessary to confirm their functional roles in bacterial stress responses.

5. Conclusions

In conclusion, this study identified the core perturbomes of E. coli and S. aureus through a machine learning-based approach with transcriptomic data. Feature selection enabled effective dimensionality reduction, improving classification performance to median accuracies of 82.6% and 85.1% for E. coli and S. aureus, respectively, across multiple data partitions (70/30, 80/20, and 90/10). The core perturbomes comprised 55 genes (including nine hubs) for E. coli and 46 genes (including eight hubs) for S. aureus, including both well-known regulators (such as transcription factors) and possible new determinants of the response to stress. Functional and network analyses revealed enrichment in key biological processes, including pathways related to energy and macromolecule metabolism, DNA/RNA and protein synthesis and degradation, transcription regulation, virulence, and other signaling processes. These results provide new insights into the conserved and strain-specific molecular mechanisms that underpin bacterial adaptation to diverse stressors, offering a foundation for future research on antimicrobial targets and stress resilience in prokaryotes.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/pathogens14080788/s1, Tables S1 and S2: Gene annotation for elements of the core perturbome; Tables S3 and S4: Correlation of genes and class (control or perturbation); Tables S5 and S6: Gene annotations for the complete dataset in the microarray.

Author Contributions

J.A.M.-M. participated in the funding, the conception and the design of the study. J.F.C.-G., M.V.-C. and J.A.M.-M. performed the experimental assays and data analysis. J.F.C.-G. and J.A.M.-M. drafted the manuscript. J.F.C.-G., M.V.-C. and J.A.M.-M. were involved in the revision and final approval of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by Vicerrectoría de Investigación, Universidad de Costa Rica with the projects “C1163 pro-NGS 2.0: Protocolos operativos estandarizados de análisis de datos moleculares obtenidos por NGS o afines y de algoritmos de inteligencia artificial en modelos biológicos”, “C4604 iPAT: Plataforma genómica, bioinformática y de inteligencia artificial para la vigilancia de patógenos”, and “C5027 PAM-IA Patrones moleculares y clínico-demográficos en bases de datos masivos del cihata asociadas a tres patologías estudiadas con Inteligencia Artificial”.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The public raw data used in this study can be retrieved from the GEO database (https://www.ncbi.nlm.nih.gov/geo/, accessed on 5 March 2025) using the Series IDs reported in Table 1. The pipelines for data access, normalization, and machine learning, together with the normalized data, are available at: https://github.com/josemolina6/Perturbome.
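
As a small illustration of how the Series IDs in Table 1 map to GEO records, the sketch below builds the accession page URL for each E. coli series. It assumes the standard GEO accession viewer address (`acc.cgi?acc=...`); the helper name `geo_series_url` is hypothetical, and the repository linked above remains the reference for the actual data access pipeline.

```python
from urllib.parse import urlencode

# Standard GEO accession viewer (assumed entry point for manual retrieval).
GEO_ACC_ENDPOINT = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"

# E. coli Series IDs as listed in Table 1.
E_COLI_SERIES = [
    "GSE10159", "GSE10160", "GSE10345", "GSE13982", "GSE34275",
    "GSE37026", "GSE44211", "GSE53140", "GSE56133",
]


def geo_series_url(series_id: str) -> str:
    """Return the GEO accession page URL for a Series ID such as 'GSE10159'."""
    if not (series_id.startswith("GSE") and series_id[3:].isdigit()):
        raise ValueError(f"not a GEO Series ID: {series_id!r}")
    return f"{GEO_ACC_ENDPOINT}?{urlencode({'acc': series_id})}"


urls = [geo_series_url(series_id) for series_id in E_COLI_SERIES]
for url in urls:
    print(url)
```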

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AUC: Area under the receiver operating characteristic curve
Cfs: Correlation-based feature selection
KNN: K-nearest neighbors
RF: Random forest
SVM: Support vector machine

References

  1. Caldera, M.; Müller, F.; Kaltenbrunner, I.; Licciardello, M.P.; Lardeau, C.H.; Kubicek, S.; Menche, J. Mapping the perturbome network of cellular perturbations. Nat. Commun. 2019, 10, 5140. [Google Scholar] [CrossRef]
  2. Bermingham, M.L.; Pong-Wong, R.; Spiliopoulou, A.; Hayward, C.; Rudan, I.; Campbell, H.; Wright, A.F.; Wilson, J.F.; Agakov, F.; Navarro, P. Application of high-dimensional feature selection: Evaluation for genomic prediction in man. Sci. Rep. 2015, 5, 10312. [Google Scholar] [CrossRef]
  3. Sadeh, S.; Clopath, C. Theory of Neuronal Perturbome: Linking Connectivity to Coding via Perturbations. bioRxiv 2020. bioRxiv: 2020.02.20.954222. [Google Scholar] [CrossRef]
  4. Dragosits, M.; Mozhayskiy, V.; Quinones-Soto, S.; Park, J.; Tagkopoulos, I. Evolutionary potential, cross-stress behavior and the genetic basis of acquired stress resistance in Escherichia coli. Mol. Syst. Biol. 2014, 9, 643. [Google Scholar] [CrossRef]
  5. Nagar, S.D.; Aggarwal, B.; Joon, S.; Bhatnagar, R.; Bhatnagar, S. A Network Biology Approach to Decipher Stress Response in Bacteria Using Escherichia coli As a Model. OMICS 2016, 20, 310–324. [Google Scholar] [CrossRef]
  6. KC, K.; Li, R.; Cui, F.; Yu, Q.; Haake, A.R. GNE: A deep learning framework for gene network inference by aggregating biological information. BMC Syst. Biol. 2019, 13, 38. [Google Scholar] [CrossRef] [PubMed]
  7. Mora, J.A.M.; Montero-Manso, P.; García-Batán, R.; Campos-Sánchez, R.; Fernández, J.V.; García, F. A first perturbome of Pseudomonas aeruginosa: Identification of core genes related to multiple perturbations by a machine learning approach. Biosystems 2021, 205, 104411. [Google Scholar] [CrossRef]
  8. Trastoy, R.; Manso, T.; Fernández-García, L.; Blasco, L.; Ambroa, A.; del Molino, M.L.P.; Bou, G.; García-Contreras, R.; Wood, T.K.; Tomás, M. Mechanisms of bacterial tolerance and persistence in the gastrointestinal and respiratory environments. Am. Soc. Microbiol. 2018, 31, e00023-18. [Google Scholar] [CrossRef]
  9. Vollmer, A.C.; Belkin, S.; Smulski, D.R.; Van Dyk, T.K.; Larossa, R.A. Detection of DNA damage by use of Escherichia coli carrying recA’::lux, uvrA’::lux, or alkA’::lux reporter plasmids. Appl. Environ. Microbiol. 1997, 63, 2566–2571. [Google Scholar] [CrossRef] [PubMed]
  10. Valencia, E.Y.; Esposito, F.; Spira, B.; Blázquez, J.; Galhardo, R.S. Ciprofloxacin-mediated mutagenesis is suppressed by subinhibitory concentrations of amikacin in Pseudomonas aeruginosa. Antimicrob. Agents Chemother. 2016, 61, e02107-16. [Google Scholar] [CrossRef] [PubMed]
  11. Weber, H.; Polen, T.; Heuveling, J.; Wendisch, V.F.; Hengge, R. Genome-wide analysis of the general stress response network in Escherichia coli: σS-dependent genes, promoters, and sigma factor selectivity. Society 2005, 187, 1591–1603. [Google Scholar] [CrossRef]
  12. Galhardo, R.S.; Do, R.; Yamada, M.; Friedberg, E.C.; Hastings, P.J.; Nohmi, T.; Rosenberg, S.M. DinB upregulation is the sole role of the SOS response in stress-induced mutagenesis in Escherichia coli. Genetics 2009, 182, 55–68. [Google Scholar] [CrossRef]
  13. Khodaparast, L.; Wu, G.; Khodaparast, L.; Schmidt, B.Z.; Rousseau, F.; Schymkowitz, J. Bacterial Protein Homeostasis Disruption as a Therapeutic Intervention. Front. Mol. Biosci. 2021, 8, 681855. [Google Scholar] [CrossRef]
  14. Nwobodo, D.C.; Ugwu, M.C.; Oliseloke Anie, C.; Al-Ouqaili, M.T.; Chinedu Ikem, J.; Victor Chigozie, U.; Saki, M. Antibiotic resistance: The challenges and some emerging strategies for tackling a global menace. J. Clin. Lab. Anal. 2022, 36, e24655. [Google Scholar] [CrossRef]
  15. Murray, C.J.; Ikuta, K.S.; Sharara, F.; Swetschinski, L.; Aguilar, G.R.; Gray, A.; Han, C.; Bisignano, C.; Rao, P.; Wool, E.; et al. Global burden of bacterial antimicrobial resistance in 2019: A systematic analysis. Lancet 2022, 399, 629–655. [Google Scholar] [CrossRef]
  16. Suay-García, B.; Pérez-Gracia, M.T. Present and Future of Carbapenem-resistant Enterobacteriaceae (CRE) Infections. Antibiotics 2019, 8, 122. [Google Scholar] [CrossRef]
  17. Jenkins, C.; Rentenaar, R.J.; Landraud, L.; Brisse, S. 180—Enterobacteriaceae. In Infectious Diseases; Cohen, J., Powderly, W.G., Opal, S.M., Eds.; Elsevier: Amsterdam, The Netherlands, 2017; pp. 1565–1578.e2. [Google Scholar] [CrossRef]
  18. Poirel, L.; Madec, J.Y.; Lupo, A.; Schink, A.K.; Kieffer, N.; Nordmann, P.; Schwarz, S. Antimicrobial Resistance in Escherichia coli. Microbiol. Spectr. 2018, 6, 10–1128. [Google Scholar] [CrossRef]
  19. Li, L.; Yeaman, M.R.; Bayer, A.S.; Xiong, Y.Q. Phenotypic and Genotypic Characteristics of Methicillin-Resistant Staphylococcus aureus (MRSA) Related to Persistent Endovascular Infection. Antibiotics 2019, 8, 71. [Google Scholar] [CrossRef] [PubMed]
  20. Rağbetli, C.; Parlak, M.; Bayram, Y.; Guducuoglu, H.; Ceylan, N. Evaluation of Antimicrobial Resistance in Staphylococcus aureus Isolates by Years. Interdiscip. Perspect. Infect. Dis. 2016, 2016, 9171395. [Google Scholar] [CrossRef] [PubMed]
  21. Molina-Mora, J.A.; Herrera-Hidalgo, M.L. Inteligencia Artificial en Ciencias de Laboratorio: Conceptos, Aplicaciones y Escenario Actual en Costa Rica. Rev. Del. Col. De. Microbiól. Quím. Clín. 2025, 29, 1–13. Available online: https://revista.microbiologos.cr/wp-content/uploads/2025/01/Articulo-MOLINA-MORA-IA.pdf (accessed on 5 February 2025).
  22. Gupta, C.; Ramegowda, V.; Basu, S.; Pereira, A. Using Network-Based Machine Learning to Predict Transcription Factors Involved in Drought Resistance. Front. Genet. 2021, 12, 652189. [Google Scholar] [CrossRef]
  23. Tahmasebi, A.; Niazi, A.; Akrami, S. Integration of meta-analysis, machine learning and systems biology approach for investigating the transcriptomic response to drought stress in Populus species. Sci. Rep. 2023, 13, 847. [Google Scholar] [CrossRef]
  24. Ma, C.; Xin, M.; Feldmann, K.A.; Wang, X. Machine Learning-Based Differential Network Analysis: A Study of Stress-Responsive Transcriptomes in Arabidopsis. Plant Cell 2014, 26, 520–537. [Google Scholar] [CrossRef]
  25. Huang, Y.; Sinha, N.; Wipat, A.; Bacardit, J. A knowledge integration strategy for the selection of a robust multi-stress biomarkers panel for Bacillus subtilis. Synth. Syst. Biotechnol. 2023, 8, 97–106. [Google Scholar] [CrossRef]
  26. Hanes, R.; Zhang, F.; Huang, Z. Protein Interaction Network Analysis to Investigate Stress Response, Virulence, and Antibiotic Resistance Mechanisms in Listeria monocytogenes. Microorganisms 2023, 11, 930. [Google Scholar] [CrossRef]
  27. Irizarry, R.A.; Hobbs, B.; Collin, F.; Beazer-Barclay, Y.D.; Antonellis, K.J.; Scherf, U.; Speed, T.P. Exploration, Normalization, and Summaries of High Density Oligonucleotide Array Probe Level Data. Biostatistics 2018, 4, 249–264. [Google Scholar]
  28. Hall, M.A. Correlation-Based Feature Selection for Machine Learning. Ph.D. Thesis, University of Waikato, Hamilton, New Zealand, 1999. Available online: https://ml.cms.waikato.ac.nz/publications/1999/99MH-Thesis.pdf (accessed on 5 March 2021).
  29. Vapnik, V. Estimation of Dependences Based on Empirical Data; Springer: Berlin/Heidelberg, Germany, 1982; Available online: https://dl.acm.org/citation.cfm?id=1098680 (accessed on 16 November 2018).
  30. Li, L.; Weinberg, C.R.; Darden, T.A.; Pedersen, L.G. Gene selection for sample classification based on gene expression data: Study of sensitivity to choice of parameters of the GA/KNN method. Bioinformatics 2001, 17, 1131–1142. Available online: http://www.ncbi.nlm.nih.gov/pubmed/11751221 (accessed on 16 November 2018). [CrossRef]
  31. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  32. Szklarczyk, D.; Gable, A.L.; Lyon, D.; Junge, A.; Wyder, S.; Huerta-Cepas, J.; Simonovic, M.; Doncheva, N.T.; Morris, J.H.; Bork, P.; et al. STRING v11: Protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 2019, 47, D607–D613. [Google Scholar] [CrossRef] [PubMed]
  33. Shannon, P.; Markiel, A.; Ozier, O.; Baliga, N.S.; Wang, J.T.; Ramage, D.; Amin, N.; Schwikowski, B.; Ideker, T. Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks. Genome Res. 2003, 13, 2498–2504. [Google Scholar] [CrossRef] [PubMed]
  34. Chin, C.-H.; Chen, S.-H.; Wu, H.-H.; Ho, C.-W.; Ko, M.-T.; Lin, C.-Y. cytoHubba: Identifying hub objects and sub-networks from complex interactome. BMC Syst. Biol. 2014, 8 (Suppl. 4), S11. [Google Scholar] [CrossRef] [PubMed]
  35. Kanehisa, M.; Sato, Y.; Morishima, K. BlastKOALA and GhostKOALA: KEGG Tools for Functional Characterization of Genome and Metagenome Sequences. J. Mol. Biol. 2016, 428, 726–731. [Google Scholar] [CrossRef]
  36. DeLong, E.F. Prokaryotes: Prokaryotic Physiology and Biochemistry; Springer: Berlin/Heidelberg, Germany, 2012. [Google Scholar]
  37. McVicker, G.; Prajsnar, T.K.; Williams, A.; Wagner, N.L.; Boots, M.; Renshaw, S.A.; Foster, S.J. Clonal Expansion during Staphylococcus aureus Infection Dynamics Reveals the Effect of Antibiotic Intervention. PLoS Pathog. 2014, 10, e1003959. [Google Scholar] [CrossRef]
  38. World Health Organization. Guidelines for the Prevention and Control of Carbapenem-Resistant Enterobacteriaceae, Acinetobacter baumannii and Pseudomonas aeruginosa in Health Care Facilities; World Health Organization: Geneva, Switzerland, 2017; Available online: https://apps.who.int/iris/bitstream/handle/10665/259462/9789241550178-eng.pdf?sequence=1&ua=1 (accessed on 21 January 2020).
  39. Pinto, A.C.; de Sá, P.H.C.G.; Ramos, R.T.J.; Barbosa, S.; Barbosa, H.P.M.; Ribeiro, A.C.; Silva, W.M.; Rocha, F.S.; Santana, M.P.; de Paula Castro, T.L.; et al. Differential transcriptional profile of Corynebacterium pseudotuberculosis in response to abiotic stresses. BMC Genom. 2014, 15, 14. [Google Scholar] [CrossRef]
  40. Blasdel, B.G.; Chevallereau, A.; Monot, M.; Lavigne, R.; Debarbieux, L. Comparative transcriptomics analyses reveal the conservation of an ancestral infectious strategy in two bacteriophage genera. ISME J. 2017, 11, 1988–1996. [Google Scholar] [CrossRef]
  41. Chung, M.; Bruno, V.M.; Rasko, D.A.; Cuomo, C.A.; Muñoz, J.F.; Livny, J.; Shetty, A.C.; Mahurkar, A. Best practices on the differential expression analysis of multi-species RNA-seq. Genome Biol. 2021, 22, 121. [Google Scholar] [CrossRef]
  42. Li, L.; Tetu, S.G.; Paulsen, I.T.; Hassan, K.A. A transcriptomic approach to identify novel drug efflux pumps in bacteria. Methods Mol. Biol. 2018, 1700, 221–235. [Google Scholar] [CrossRef]
  43. Zhao, W.; Chen, J.J.; Perkins, R.; Wang, Y.; Liu, Z.; Hong, H.; Tong, W.; Zou, W. A novel procedure on next generation sequencing data analysis using text mining algorithm. BMC Bioinform. 2016, 17, 213. [Google Scholar] [CrossRef] [PubMed]
  44. Cornforth, D.M.; Dees, J.L.; Ibberson, C.B.; Huse, H.K.; Mathiesen, I.H.; Kirketerp-Møller, K.; Wolcott, R.D.; Rumbaugh, K.P.; Bjarnsholt, T.; Whiteley, M. Pseudomonas aeruginosa transcriptome during human infection. Proc. Natl. Acad. Sci. USA 2018, 115, E5125–E5134. [Google Scholar] [CrossRef] [PubMed]
  45. Glaab, E.; Bacardit, J.; Garibaldi, J.M.; Krasnogor, N. Using rule-based machine learning for candidate disease gene prioritization and sample classification of cancer gene expression data. PLoS ONE 2012, 7, e39932. [Google Scholar] [CrossRef]
  46. Raza, K.; Hasan, A. A Comprehensive Evaluation of Machine Learning Techniques for Cancer Class Prediction Based on Microarray Data. Int. J. Bioinform. Res. Appl. 2015, 11, 397–416. [Google Scholar] [CrossRef]
  47. Ranganathan, N.; Johnson, R.; Edwards, A.M. The general stress response of Staphylococcus aureus promotes tolerance of antibiotics and survival in whole human blood. Microbiology 2020, 166, 1088. [Google Scholar] [CrossRef]
  48. Pané-Farré, J.; Jonas, B.; Förstner, K.; Engelmann, S.; Hecker, M. The σB regulon in Staphylococcus aureus and its regulation. Int. J. Med. Microbiol. 2006, 296, 237–258. [Google Scholar] [CrossRef] [PubMed]
  49. Bui, T.T.; Lee, D.; Selvarajoo, K. ScatLay: Utilizing transcriptome-wide noise for identifying and visualizing differentially expressed genes. Sci. Rep. 2020, 10, 17483. [Google Scholar] [CrossRef]
  50. Leung, R.K.K.; Wang, Y.; Ma, R.C.; Luk, A.O.; Lam, V.; Ng, M.; So, W.Y.; Tsui, S.K.; Chan, J.C. Using a multi-staged strategy based on machine learning and mathematical modeling to predict genotype-phenotype risk patterns in diabetic kidney disease: A prospective case-control cohort analysis. BMC Nephrol. 2013, 14, 162. [Google Scholar] [CrossRef] [PubMed]
  51. Noi, P.T.; Kappas, M. Comparison of random forest, k-nearest neighbor, and support vector machine classifiers for land cover classification using sentinel-2 imagery. Sensors 2018, 18, 18. [Google Scholar] [CrossRef]
  52. Park, H.; Shimamura, T.; Imoto, S.; Miyano, S. Adaptive NetworkProfiler for Identifying Cancer Characteristic-Specific Gene Regulatory Networks. J. Comput. Biol. 2017, 25, 130–145. [Google Scholar] [CrossRef]
  53. Tabe-Bordbar, S.; Emad, A.; Zhao, S.D.; Sinha, S. A closer look at cross-validation for assessing the accuracy of gene regulatory networks and models. Sci. Rep. 2018, 8, 6620. [Google Scholar] [CrossRef]
  54. Sharma, P.; Haycocks, J.R.; Middlemiss, A.D.; Kettles, R.A.; Sellars, L.E.; Ricci, V.; Piddock, L.J.; Grainger, D.C. The multiple antibiotic resistance operon of enteric bacteria controls DNA repair and outer membrane integrity. Nat. Commun. 2017, 8, 1444. [Google Scholar] [CrossRef]
  55. Poole, K. Bacterial stress responses as determinants of antimicrobial resistance. J. Antimicrob. Chemother. 2012, 67, 2069–2089. [Google Scholar] [CrossRef]
  56. Andersson, D. The biological cost of mutational antibiotic resistance: Any practical conclusions? Curr. Opin. Microbiol. 2006, 9, 461–465. [Google Scholar] [CrossRef] [PubMed]
  57. Wiesch, P.S.Z.; Engelstädter, J.; Bonhoeffer, S. Compensation of fitness costs and reversibility of antibiotic resistance mutations. Antimicrob. Agents Chemother. 2010, 54, 2085–2095. [Google Scholar] [CrossRef] [PubMed]
  58. Storvik, K.A.M.; Foster, P.L. RpoS, the stress response sigma factor, plays a dual role in the regulation of Escherichia coli’s error-prone DNA polymerase IV. J. Bacteriol. 2010, 192, 3639–3644. [Google Scholar] [CrossRef]
  59. Cirz, R.T.; O’Neill, B.M.; Hammond, J.A.; Head, S.R.; Romesberg, F.E. Defining the Pseudomonas aeruginosa SOS response and its role in the global response to the antibiotic ciprofloxacin. J. Bacteriol. 2006, 188, 7101–7110. [Google Scholar] [CrossRef] [PubMed]
  60. Vihervaara, A.; Duarte, F.M.; Lis, J.T. Molecular mechanisms driving transcriptional stress responses. Nat. Rev. Genet. 2018, 19, 385–397. [Google Scholar] [CrossRef]
  61. Molina-Mora, J.A.; García, F. Molecular Determinants of Antibiotic Resistance in the Costa Rican Pseudomonas aeruginosa AG1 by a Multi-omics Approach: A Review of 10 Years of Study. Phenomics 2021, 1, 3. [Google Scholar] [CrossRef]
  62. Emms, D.M.; Kelly, S. OrthoFinder: Phylogenetic orthology inference for comparative genomics. Genome Biol. 2019, 20, 238. [Google Scholar] [CrossRef]
  63. Molina-Mora, J.A.; Sibaja-Amador, M.; Rivera-Montero, L.; Chacón-Arguedas, D.; Guzmán, C.; García, F. Assessment of Mathematical Approaches for the Estimation and Comparison of Efficiency in qPCR Assays for a Prokaryotic Model. DNA 2024, 4, 189–200. [Google Scholar] [CrossRef]
Figure 1. Conceptualization of the core perturbome for biological systems. After exposure to a stressor, general molecular responses (core) are modulated jointly with other responses that are specific to the perturbation. The gray nodes indicate responses specific to a given perturbation.
Figure 2. General pipeline for identifying the core perturbome in E. coli and S. aureus by a machine learning approach. * For E. coli, the microarray was designed based on a pangenome of four strains.
Figure 3. Interactome of genes in the core perturbome of E. coli. Annotation of each gene product (light blue nodes) was used to model molecular interactions, resulting in 42 connected elements with 9 hub genes (pink nodes) as key determinants of the network. Red-label nodes identify transcription factors.
Figure 4. Interactome based on genes of the core perturbome in S. aureus. Annotation of each gene product (light blue nodes) was used to model molecular interactions, resulting in 42 connected elements with 8 hub genes (pink nodes) as key determinants of the network. Red-label nodes identify transcription factors.
Figure 5. Functional enrichment of genes in the core perturbome of E. coli and S. aureus. Annotation was based on KEGG ontologies, indicating the modulation of pathways related to metabolism, protein synthesis, transcription factors, and others.
Table 1. Description of datasets used to study the perturbomes of E. coli and S. aureus (from the NCBI-GEO platform).
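| Model | GEO ID | Perturbation | Strain | Control samples | Perturbation samples |
|---|---|---|---|---|---|
| E. coli | GSE10159, GSE10160 | Cefsulodin, mecillinam | K12 MG1655 | 20 | 41 |
| E. coli | GSE10345 | Bicyclomycin | K12 MG1655 | 2 | 6 |
| E. coli | GSE13982 | Carbon monoxide | K12 MG1655 | 4 | 4 |
| E. coli | GSE34275 | Glycerol | K12 MG1655 | 6 | 6 |
| E. coli | GSE37026 | Colicin | K12 MG1655 | 4 | 4 |
| E. coli | GSE44211 | PGRP, gentamicin, CCCP | K12 MG1655 | 3 | 9 |
| E. coli | GSE53140 | Octanoic acid | K12 MG1655 | 3 | 2 |
| E. coli | GSE56133 | Ampicillin, gentamicin, kanamycin, norfloxacin, H2O2 | K12 MG1655 | 3 | 15 |
| S. aureus | GSE7944 | Berberine chloride | ATCC25923 | 3 | 3 |
| S. aureus | GSE8135 | Rhein/cassic acid | ATCC25923 | 3 | 3 |
| S. aureus | GSE8861 | Triclosan | NCTC8325 WT | 5 | 10 |
| S. aureus | GSE10605 | Ortho-phenylphenol | NCTC8325 WT | 5 | 10 |
| S. aureus | GSE13203 | Cryptotanshinone | ATCC25923 | 3 | 3 |
| S. aureus | GSE13233 | Sodium houttuyfonate | ATCC25923 | 3 | 3 |
| S. aureus | GSE13236 | Magnolol | ATCC25923 | 3 | 3 |
| S. aureus | GSE14669 | Ramoplanin | NCTC 8325 | 6 | 6 |
| S. aureus | GSE15394 | Fosfomycin | ATCC 29213 | 14 | 23 |
| S. aureus | GSE36231 | Oleic acid | NCTC8325 | 3 | 3 |
| S. aureus | GSE40448 | Ortho-Benzyl-Para-Chloro Phenol | NCTC 8325 | 5 | 10 |
| S. aureus | GSE40449 | Para-Tert-Amylphenol | NCTC 8325 | 4 | 8 |
| S. aureus | GSE58938 | Licochalcone A | ATCC 29213 | 2 | 2 |
| S. aureus | GSE65750 | Nisin | ATCC 29213 | 2 | 2 |
| S. aureus | GSE84485 | Benzimidazole derivative C162 | NCTC8325, ATCC25923 | 3 | 3 |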
Table 2. Assessment of the performance of the classification models before and after dimensionality reduction of the transcriptomic data for E. coli and S. aureus.
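Values are correctly classified instances (%).

| Model | Partition | Gene dataset (number of genes) | KNN | SVM | RF |
|---|---|---|---|---|---|
| E. coli | 70/30 | All (8815) | 67.7 | 67.7 | 44.1 |
| E. coli | 70/30 | Selected genes (55) | 82.5 | 70.6 | 88.2 |
| E. coli | 80/20 | All (8815) | 56.5 | 65.2 | 39.1 |
| E. coli | 80/20 | Selected genes (55) | 82.6 | 91.3 | 91.3 |
| E. coli | 90/10 | All (8815) | 54.6 | 81.8 | 54.6 |
| E. coli | 90/10 | Selected genes (55) | 90.9 | 81.8 | 81.8 |
| S. aureus | 70/30 | All (3312) | 74.5 | 63.8 | 80.9 |
| S. aureus | 70/30 | Selected genes (46) | 85.1 | 91.5 | 78.7 |
| S. aureus | 80/20 | All (3312) | 61.3 | 74.2 | 74.2 |
| S. aureus | 80/20 | Selected genes (46) | 77.4 | 80.6 | 74.2 |
| S. aureus | 90/10 | All (3312) | 87.5 | 87.5 | 100.0 |
| S. aureus | 90/10 | Selected genes (46) | 93.8 | 93.7 | 87.5 |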
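
The median accuracies quoted in the text can be recovered from Table 2. The sketch below assumes that each median pools the nine values available per organism and gene dataset (three classifiers by three partitions); under that assumption the "Selected genes" rows reproduce the reported 82.6% and 85.1%. The dictionary and variable names are illustrative.

```python
from statistics import median

# Correctly classified instances (%) from Table 2, ordered KNN, SVM, RF
# within each partition (70/30, 80/20, 90/10).
accuracies = {
    ("E. coli", "all"): [67.7, 67.7, 44.1, 56.5, 65.2, 39.1, 54.6, 81.8, 54.6],
    ("E. coli", "selected"): [82.5, 70.6, 88.2, 82.6, 91.3, 91.3, 90.9, 81.8, 81.8],
    ("S. aureus", "all"): [74.5, 63.8, 80.9, 61.3, 74.2, 74.2, 87.5, 87.5, 100.0],
    ("S. aureus", "selected"): [85.1, 91.5, 78.7, 77.4, 80.6, 74.2, 93.8, 93.7, 87.5],
}

# Pool classifiers and partitions, then take the median per organism and dataset.
medians = {key: median(values) for key, values in accuracies.items()}

for (organism, dataset), value in medians.items():
    print(f"{organism:10s} {dataset:9s} median accuracy = {value}%")
# E. coli:   all 56.5, selected 82.6
# S. aureus: all 74.5, selected 85.1
```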
Table 3. Assessment of the performance of the classification models after dimensionality reduction of the transcriptomic data for E. coli and S. aureus using different metrics.
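| Model | Metric | KNN 70/30 | KNN 80/20 | KNN 90/10 | SVM 70/30 | SVM 80/20 | SVM 90/10 | RF 70/30 | RF 80/20 | RF 90/10 |
|---|---|---|---|---|---|---|---|---|---|---|
| E. coli | Accuracy | 82.5 | 82.6 | 90.9 | 70.6 | 91.3 | 81.8 | 88.2 | 91.3 | 81.8 |
| E. coli | Kappa | 65.0 | 65.4 | 81.3 | 42.9 | 82.7 | 62.0 | 76.5 | 82.5 | 87.1 |
| E. coli | TP rate | 82.4 | 82.6 | 90.9 | 70.6 | 91.3 | 81.8 | 88.2 | 91.3 | 93.8 |
| E. coli | FP rate | 16.4 | 16.7 | 10.9 | 26.1 | 8.0 | 21.8 | 11.2 | 8.7 | 3.8 |
| E. coli | Precision | 84.5 | 83.7 | 92.2 | 81.9 | 92.6 | 86.4 | 88.8 | 91.3 | 94.6 |
| E. coli | Recall | 82.4 | 82.6 | 90.9 | 70.6 | 91.3 | 81.8 | 88.2 | 91.3 | 93.8 |
| E. coli | F score | 82.2 | 82.5 | 90.8 | 68.4 | 91.3 | 80.8 | 88.2 | 91.3 | 93.8 |
| E. coli | AUC | 83.0 | 83.0 | 90.0 | 72.2 | 91.7 | 80.0 | 96.9 | 98.1 | 100.0 |
| S. aureus | Accuracy | 85.1 | 77.4 | 93.8 | 91.5 | 80.6 | 93.7 | 78.7 | 74.2 | 87.5 |
| S. aureus | Kappa | 70.0 | 55.0 | 87.1 | 82.4 | 58.4 | 87.1 | 55.3 | 44.6 | 73.3 |
| S. aureus | TP rate | 85.1 | 77.4 | 93.8 | 91.5 | 80.6 | 93.8 | 78.7 | 74.2 | 87.5 |
| S. aureus | FP rate | 14.8 | 20.6 | 3.8 | 10.5 | 24.7 | 3.8 | 25.4 | 31.5 | 14.2 |
| S. aureus | Precision | 85.2 | 79.2 | 94.6 | 92.6 | 82.1 | 94.6 | 81.9 | 74.8 | 87.5 |
| S. aureus | Recall | 85.1 | 77.4 | 93.8 | 91.5 | 80.6 | 93.8 | 78.7 | 74.2 | 87.5 |
| S. aureus | F score | 85.1 | 77.6 | 93.8 | 91.3 | 79.9 | 93.8 | 77.6 | 73.1 | 87.5 |
| S. aureus | AUC | 85.2 | 78.4 | 95.0 | 99.1 | 96.2 | 100.0 | 76.6 | 71.4 | 86.7 |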
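
For readers less familiar with the metrics in Table 3, the sketch below shows how accuracy, Cohen's kappa, TP rate (recall), FP rate, precision, and F score follow from a binary confusion matrix. The counts are hypothetical and are not taken from the study. The function treats "perturbation" as the positive class, whereas the table values may be class-weighted averages (TP rate and recall coincide with accuracy in most columns), so individual numbers are not expected to match.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard metrics for a 2x2 confusion matrix (positive = perturbation)."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    tp_rate = tp / (tp + fn)          # recall / sensitivity
    fp_rate = fp / (fp + tn)          # 1 - specificity
    precision = tp / (tp + fp)
    f_score = 2 * precision * tp_rate / (precision + tp_rate)
    # Cohen's kappa: agreement beyond what the class marginals give by chance.
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "tp_rate": tp_rate,
        "fp_rate": fp_rate,
        "precision": precision,
        "recall": tp_rate,
        "f_score": f_score,
    }


# Hypothetical test partition: 21 perturbation and 13 control samples.
metrics = binary_metrics(tp=18, fp=2, fn=3, tn=11)
for name, value in metrics.items():
    print(f"{name:9s} {100 * value:5.1f}")
```

Kappa is usually the most conservative of these figures because it discounts the agreement expected from class imbalance alone, which is relevant here since perturbation samples outnumber controls in several datasets.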
Table 4. Gene annotation for elements of the core perturbome in E. coli.
| ID (Array) | Gene Names | Protein ID | StringID | Annotation |
|---|---|---|---|---|
| c0820 | c0820 | A0A0H2V608 | Not mapped | Uncharacterized protein |
| c1618 | c1618 | A0A0H2V6W5 | 199310.c1618 | YmgI protein |
| c1419 | c1419 | A0A0H2V7D7 | Not mapped | Uncharacterized protein |
| c2755 | c2755 | A0A0H2V917 | Not mapped | Uncharacterized protein |
| c4081 | c4081 | A0A0H2VBA2 | 199310.c4081 | Uncharacterized protein |
| c4088 | c4088 | A0A0H2VBF5 | Not mapped | Uncharacterized protein |
| c4086 | c4086 | A0A0H2VE07 | Not mapped | Uncharacterized protein |
| uhpB | uhpB b3668 JW3643 | P09835 | 511145.b3668 | Sensor histidine protein kinase UhpB |
| dnaT | dnaT b4362 JW4326 | P0A8J2 | 511145.b4362 | Primosomal protein DnaT |
| ybbI (hub) | cueR copR ybbI b0487 JW0476 | P0A9G4 | 511145.b0487 | Transcriptional regulator cueR, transcription factor |
| c1561 | essD ybcR b0554 JW0543 | P0A9R2 | 511145.b0554 | Lysis protein S homolog from lambdoid prophage DLP12 |
| fabF | fabF fabJ b1095 JW1081 | P0AAI5 | 511145.b1095 | 3-oxoacyl-[acyl-carrier-protein] synthase II |
| ycdO | efeO ycdO b1018 JW1003 | P0AB24 | 511145.b1018 | Iron uptake system component EfeO |
| ycfJ (hub) | ycfJ b1110 JW1096 | P0AB35 | 511145.b1110 | Hypothetical protein |
| b1171 | ymgD b1171 JW5177 | P0AB46 | 511145.b1171 | Hypothetical protein ymgD precursor |
| cydA | cydA cyd-1 b0733 JW0722 | P0ABJ9 | 511145.b0733 | Cytochrome d terminal oxidase polypeptide subunit I |
| nirC | nirC b3367 JW3330 | P0AC26 | 511145.b3367 | Nitrite reductase activity |
| sdhA | sdhA b0723 JW0713 | P0AC41 | 511145.b0723 | Succinate dehydrogenase flavoprotein subunit |
| ygaC | ygaC b2671 JW2646 | P0AD53 | 511145.b2671 | Hypothetical protein |
| caiF | caiF b0034 JW0033 | P0AE58 | 511145.b0034 | Transcriptional regulator of cai operon, transcription factor |
| hdeA | hdeA yhhC yhiB b3510 JW3478 | P0AES9 | 511145.b3510 | Acid stress chaperone HdeA (10K-S protein) |
| hycH | hycH hevH b2718 JW2688 | P0AEV7 | 511145.b2718 | Formate hydrogenlyase maturation protein |
| yjbQ | yjbQ b4056 JW4017 | P0AF48 | 511145.b4056 | Hypothetical protein |
| rfaH (hub) | rfaH hlyT sfrB b3842 JW3818 | P0AFW0 | 511145.b3842 | Transcriptional activator RfaH, transcription factor |
| rho | rho nitA psuA rnsC sbaA tsu b3783 JW3756 | P0AG30 | 511145.b3783 | Transcription termination factor Rho |
| rbsC | rbsC b3750 JW3729 | P0AGI1 | 511145.b3750 | D-ribose high-affinity transport system permease protein |
| b3113 | tdcF yhaR b3113 JW5521 | P0AGL2 | 511145.b3113 | Putative reactive intermediate deaminase TdcF |
| b0161 (hub) | degP htrA ptd b0161 JW0157 | P0C0V0 | 511145.b0161 | Serine endoprotease (protease Do), membrane-associated |
| yijP | eptC cptA yijP b3955 JW3927 | P0CB39 | 511145.b3955 | Membrane protein |
| rcsC (hub) | rcsC b2218 JW5917/JW5920 | P0DMC5 | 511145.b2218 | Sensor for ctr capsule biosynthesis |
| nhaA | nhaA ant b0019 JW0018 | P13738 | 511145.b0019 | Na+/H+ antiporter |
| menD | menD b2264 JW5374 | P17109 | 511145.b2264 | 2-oxoglutarate decarboxylase |
| malZ | malZ b0403 JW0393 | P21517 | 511145.b0403 | Maltodextrin glucosidase |
| marR (hub) | marR cfxB inaR soxQ b1530 JW5248 | P27245 | 511145.b1530 | Repressor of mar operon, transcription factor |
| marB | marB b1532 JW1525 | P31121 | 511145.b1532 | Multiple antibiotic resistance protein |
| potF | potF b0854 JW0838 | P31133 | 511145.b0854 | Periplasmic putrescine-binding permease protein |
| chaA (hub) | chaA b1216 JW1207 | P31801 | 511145.b1216 | Sodium-calcium/proton antiporter |
| yihT | yihT b3881 JW3852 | P32141 | 511145.b3881 | Putative aldolase |
| ybbB | selU ybbB b0503 JW0491 | P33667 | 511145.b0503 | Putative capsule anchoring protein |
| arsR | arsR arsE b3501 JW3468 | P37309 | 511145.b3501 | Arsenical resistance operon repressor, transcription factor |
| aldB | aldB yiaX b3588 JW3561 | P37685 | 511145.b3588 | Aldehyde dehydrogenase B |
| yddE | yddE b1464 JW1459 | P37757 | 511145.b1464 | Hypothetical protein |
| ytfG | qorB qor2 ytfG b4211 JW4169 | P39315 | 511145.b4211 | Putative oxidoreductase |
| yjiT | yjiT b4342 JW5787 | P39391 | Not mapped | Hypothetical protein |
| ygjT | alx ygjT b3088 JW5515 | P42601 | 511145.b3088 | Putative membrane-bound redox modulator Alx |
| yraJ | yraJ b3144 JW3113 | P42915 | 511145.b3144 | Outer membrane usher protein YraJ |
| ybcI | ybcI b0527 JW0516 | P45570 | 511145.b0527 | Inner membrane protein YbcI |
| yebK | hexR yebK b1853 JW1842 | P46118 | 511145.b1853 | HTH-type transcriptional regulator HexR (Hex regulon repressor), transcription factor |
| yhcQ (hub) | aaeA yhcQ b3241 JW3210 | P46482 | 511145.b3241 | p-hydroxybenzoic acid efflux pump subunit AaeA (pHBA efflux pump protein A) |
| b1839 | yebY b1839 JW1828 | P64506 | 511145.b1839 | Uncharacterized protein |
| c2390 | ypeC b2390 JW2387 | P64542 | 511145.b2390 | Uncharacterized protein |
| yahM | yahM b0327 JW5044 | P75692 | 511145.b0327 | Uncharacterized protein |
| ycdY | ycdY b1035 JW1018 | P75915 | 511145.b1035 | Chaperone protein YcdY |
| Z4985 | ysaB b4553 JW3532 | Q2M7M3 | 511145.b4553 | Uncharacterized lipoprotein YsaB |
| yqhD (hub) | yqhD b3011 JW2978 | Q46856 | 511145.b3011 | Alcohol dehydrogenase YqhD |
Table 5. Gene annotation for elements of the core perturbome in S. aureus.
| ID (Array) | Gene Names | Protein ID | StringID | Annotation |
|---|---|---|---|---|
| SACOL0995 | ABD30052.1 SACOL0995 | A0A0H2WVP6 | 93061.SAOUHSC_00927 | Oligopeptide ABC transporter, oligopeptide-binding protein |
| SACOL1539 | ABD30669.1 SACOL1539 | A0A0H2WW35 | 93061.SAOUHSC_01590 | Cytosolic protein |
| SACOL1360 | ABD30417.1 SACOL1360 | A0A0H2WW94 | 93061.SAOUHSC_01319 | Aspartokinase |
| SACOL1169 | ABD30229.1 SACOL1169 | A0A0H2WWH1 | 93061.SAOUHSC_01115 | Staphylococcal complement inhibitor |
| SACOL2193 (hub) | ABD31481.1 SACOL2193 | A0A0H2WWP9 | 93061.SAOUHSC_02461 | Transcriptional regulator, MerR family, transcription factor |
| SACOL1033 | ABD30087.1 SACOL1033 | A0A0H2WWU8 | 93061.SAOUHSC_00962 | IDEAL domain-containing protein |
| tcaB | ABD31642.1 tcaB SACOL2350 | A0A0H2WX36 | 93061.SAOUHSC_02633 | Bcr/CflA family efflux transporter |
| SACOL2561 | ABD31859.1 SACOL2561 | A0A0H2WX88 | 93061.SAOUHSC_02860 | Hydroxymethylglutaryl-CoA synthase |
| SACOL2731 | ABD32028.1 SACOL2731 | A0A0H2WXD2 | 93061.SAOUHSC_03045 | Cold shock protein CspA |
| SACOL2330 | ABD31623.1 SACOL2330 | A0A0H2WXH1 | 93061.SAOUHSC_02613 | MOSC domain-containing protein |
| cap5F (hub) | ABD29300.1 cap5F SACOL0141 | A0A0H2WXH2 | 93061.SAOUHSC_00119 | Capsular polysaccharide biosynthesis protein Cap5F |
| SACOL0587 | ABD29671.1 SACOL0587 | A0A0H2WXZ9 | 93061.SAOUHSC_00523 | Methyltransferase small domain-containing protein |
| SACOL2551 | ABD31847.1 SACOL2551 | A0A0H2WY92 | 93061.SAOUHSC_02846 | Acyl-CoA thioesterase |
| SACOL0959 | ABD30018.1 SACOL0959 | A0A0H2WYF8 | 93061.SAOUHSC_00893 | NADH-dependent flavin oxidoreductase, Oye family |
| SACOL2138 | ABD31420.1 SACOL2138 | A0A0H2WZ64 | 93061.SAOUHSC_02389 | Cation efflux family protein |
| SACOL2147 | ABD31430.1 SACOL2147 | A0A0H2WZ69 | 93061.SAOUHSC_02401 | Transcriptional antiterminator, BglG family/DNA-binding protein, transcription factor |
| SACOL1645 | ABD30766.1 SACOL1645 | A0A0H2WZH6 | 93061.SAOUHSC_01692 | ComE operon protein 2 |
| SACOL2624 (hub) | ABD31924.1 SACOL2624 | A0A0H2WZI5 | 93061.SAOUHSC_02929 | Putative long-chain fatty acid-CoA ligase VraA |
| SACOL2452 | ABD31749.1 SACOL2452 | A0A0H2X000 | 93061.SAOUHSC_02743 | Amino acid ABC transporter, permease protein |
| SACOL2566 | ABD31865.1 SACOL2566 | A0A0H2X034 | 93061.SAOUHSC_02866 | MmpL efflux pump, putative |
| SACOL1948 | ABD31154.1 SACOL1948 | A0A0H2X044 | 93061.SAOUHSC_02104 | Uncharacterized protein |
| prmC | prmC SACOL2109 | A0A0H2X056 | 93061.SAOUHSC_02358 | Release factor glutamine methyltransferase PrmC |
| SACOL0102 | sbnC SACOL0102 | A0A0H2X061 | 93061.SAOUHSC_00077 | Siderophore biosynthesis protein, IucC family |
| clpC | clpC SA0483 | Q7A797 | 93061.SAOUHSC_00505 | ATP-dependent Clp protease ATP-binding subunit ClpC |
| def | def def1 pdf1 SAV1091 | P68825 | 93061.SAOUHSC_01038 | Peptide deformylase |
| drp35 | drp35 SACOL2712 | Q5HCK9 | 93061.SAOUHSC_03023 | Lactonase drp35 |
| fdhD | fdhD narQ SAV2280 | P64120 | 93061.SAOUHSC_02550 | Sulfur carrier protein FdhD |
| fmtA | fmtA fmt SACOL1066 | Q5HH27 | 93061.SAOUHSC_00998 | Teichoic acid D-alanine hydrolase |
| glnA (hub) | ABD30386.1 glnA SAV1310 | P60890 | 93061.SAOUHSC_01287 | Glutamine synthetase |
| gtaB | gtaB galU SACOL2508 | Q5HD54 | 93061.SAOUHSC_02801 | UTP--glucose-1-phosphate uridylyltransferase |
| guaA (hub) | guaA SAV0391 | P64296 | 93061.SAOUHSC_00375 | GMP synthase [glutamine-hydrolyzing] |
| guaC (hub) | guaC SAV1337 | P60562 | 93061.SAOUHSC_01330 | GMP reductase |
| SAV1152 | ABD30221.1 SAV1152 | P64309 | 93061.SAOUHSC_01107 | dITP/XTP pyrophosphatase |
| mprF | mprF SACOL1396 | Q5HG59 | 93061.SAOUHSC_01359 | Phosphatidylglycerol lysyltransferase |
| murI | murI SAV1151 | P63637 | 93061.SAOUHSC_01106 | Glutamate racemase |
| SACOL0944 | ABD30003.1 SACOL0944 | Q5HHE4 | 93061.SAOUHSC_00878 | Type II NADH:quinone oxidoreductase |
| SACOL2002 | ABD31208.1 SACOL2002 | Q5HEI2 | 93061.SAOUHSC_02161 | Membrane protein |
| pckA | pckA SAV1791 | P0A0B3 | 93061.SAOUHSC_01910 | Phosphoenolpyruvate carboxykinase |
| purA | purA SAV0017 | P65884 | 93061.SAOUHSC_00019 | Adenylosuccinate synthetase |
| rbsK | rbsK SACOL0253 | A0A0H2WZY4 | 93061.SAOUHSC_00239 | Ribokinase |
| recA (hub) | recA SAV1285 | P68843 | 93061.SAOUHSC_01262 | Protein RecA |
| prfA | prfA SAV2118 | P66018 | 93061.SAOUHSC_02359 | Peptide chain release factor 1 |
| rpmJ | rpmJ SAV2227 | P66298 | 93061.SAOUHSC_02488 | Large ribosomal subunit protein bL36 |
| sle1 (hub) | sle1 aaa SACOL0507 | Q5HIL2 | 93061.SAOUHSC_00427 | N-acetylmuramoyl-L-alanine amidase sle1 |
| argS | argS SACOL0663 | Q5HI60 | 93061.SAOUHSC_00611 | Arginine--tRNA ligase |
| SACOL0974 | ABD30032.1 SACOL0974 | Q5HHB5 | 93061.SAOUHSC_00907 | UPF0344 protein SACOL0974 |
Table 6. Functional enrichment of genes in the core perturbome of E. coli and S. aureus.
E. coli
Orthologs and modules
  • ko00001 KEGG Orthology (KO) (35).
Protein families: metabolism
  • ko01000 Enzymes (12);
  • ko01001 Protein kinases (2);
  • ko01002 Peptidases and inhibitors (1);
  • ko01005 Lipopolysaccharide biosynthesis proteins (1);
  • ko01004 Lipid biosynthesis proteins (1).
Protein families: genetic information processing
  • ko03000 Transcription factors (6);
  • ko03021 Transcription machinery (1);
  • ko03019 Messenger RNA biogenesis (1);
  • ko03016 Transfer RNA biogenesis (1);
  • ko03110 Chaperones and folding catalysts (2);
  • ko03400 DNA repair and recombination proteins (1).
Protein families: signaling and cellular processes
  • ko02000 Transporters (8);
  • ko02044 Secretion system (1);
  • ko02022 Two-component system (2);
  • ko02035 Bacterial motility proteins (1);
  • ko01504 Antimicrobial resistance genes (1).

S. aureus
Orthologs and modules
  • ko00001 KEGG Orthology (KO) (41).
Protein families: metabolism
  • ko01000 Enzymes (23);
  • ko01002 Peptidases and inhibitors (1);
  • ko01011 Peptidoglycan biosynthesis and degradation proteins (1);
  • ko01004 Lipid biosynthesis proteins (1);
  • ko01007 Amino acid-related enzymes (1).
Protein families: genetic information processing
  • ko03000 Transcription factors (2);
  • ko03011 Ribosome (1);
  • ko03009 Ribosome biogenesis (1);
  • ko03016 Transfer RNA biogenesis (1);
  • ko03012 Translation factors (2);
  • ko03110 Chaperones and folding catalysts (1);
  • ko03400 DNA repair and recombination proteins (1);
  • ko03029 Mitochondrial biogenesis (1).
Protein families: signaling and cellular processes
  • ko02000 Transporters (4);
  • ko02044 Secretion system (1);
  • ko04147 Exosome (1);
  • ko01504 Antimicrobial resistance genes (1).
Campos-Godínez, J.F.; Villegas-Campos, M.; Molina-Mora, J.A. Core Perturbomes of Escherichia coli and Staphylococcus aureus Using a Machine Learning Approach. Pathogens 2025, 14, 788. https://doi.org/10.3390/pathogens14080788