Article

Integrating Network Analysis and Machine Learning Identifies Key Autism Spectrum Disorder Genes Linked to Immune Dysregulation and Therapeutic Targets

Pinghu Normal College, Jiaxing University, Jiaxing 314041, China
* Author to whom correspondence should be addressed.
Genes 2025, 16(9), 1109; https://doi.org/10.3390/genes16091109
Submission received: 14 August 2025 / Revised: 8 September 2025 / Accepted: 10 September 2025 / Published: 19 September 2025
(This article belongs to the Section Bioinformatics)

Abstract

Background: Understanding the genetic mechanisms of Autism Spectrum Disorder (ASD) and identifying potential therapeutic targets are essential for clarifying its etiology and improving treatment. This study aims to bridge the gap between basic transcriptomic discoveries and clinical applications in ASD research. Methods: Differentially expressed genes (DEGs) in the GSE18123 dataset were identified, and a protein–protein interaction (PPI) network was constructed. Functional enrichment analysis was performed to link the DEGs to relevant biological pathways, and Connectivity Map (CMap) analysis was used to predict potential drugs. Immune infiltration correlation analysis explored associations between key genes and immune cell subpopulations, and the diagnostic performance of the top genes was evaluated by receiver operating characteristic (ROC) analysis. Results: Functional enrichment analysis revealed biological processes relevant to ASD, and CMap analysis predicted candidate drugs, some of which are consistent with clinical trial results. Random forest analysis selected ten key feature genes (SHANK3, NLRP3, SERAC1, TUBB2A, MGAT4C, TFAP2A, EVC, GABRE, TRAK1, and GPR161) with the highest importance scores for autism prediction. Immune infiltration analysis showed significant correlations between these genes and multiple immune cell types, suggesting complex pleiotropic associations within the immune microenvironment. ROC curve analysis indicated that several top genes had moderate discriminatory power in differentiating ASD from controls, with MGAT4C performing best (AUC = 0.730), highlighting its potential as a candidate biomarker. Conclusions: This study links basic transcriptomic findings to potential clinical applications in ASD research. The findings contribute to a better understanding of ASD etiology and provide potential therapeutic leads.
Future research could focus on validating these potential drugs in clinical studies, as well as further exploring the biological functions of the identified genes to develop more targeted and effective treatments for ASD.

1. Introduction

Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder characterized by high clinical and genetic heterogeneity. Its core features are persistent deficits in social communication and social interaction, alongside restricted, repetitive patterns of behavior, interests, or activities. According to current international diagnostic criteria (e.g., DSM-5 and ICD-11), ASD is defined as a continuous “spectrum” disorder [1,2]. This definition supersedes previous subcategories based on symptom presentation and functional level, emphasizing continuous dimensional variation in symptoms ranging from mild to severe. The diagnosis of ASD is primarily based on behavioral observation and developmental history assessment, requiring comprehensive judgment by specialized clinicians using standardized tools. ASD is typically identifiable in early childhood (ages 2–3 years), although the age of diagnosis and symptom presentation vary considerably between individuals [3]. The present study aims to further explore the potential pathological mechanisms of ASD at the molecular level, thereby providing initial insights that could contribute to the long-term goal of precision medicine.
Recent research has made significant progress in unraveling the etiology and pathophysiology of ASD, with diverse approaches advancing our understanding of this complex condition across multiple dimensions and levels. In molecular genetics, large-scale genome-wide association studies (GWAS), whole-exome sequencing (WES), and copy number variation (CNV) analyses have identified hundreds of genetic loci significantly associated with ASD risk (e.g., SHANK3, NLGN3/4, and CHD8), revealing abnormalities in key biological pathways involved in synaptic function, chromatin remodeling, and transcriptional regulation [4,5]. Neurobiological research increasingly relies on advanced multimodal neuroimaging techniques (fMRI, sMRI, and dMRI) to delineate abnormal early developmental trajectories of brain structure and functional connectivity networks in children with ASD [6].
Cellular and animal model studies use neurons and glia differentiated from induced pluripotent stem cells, as well as genetically engineered animal models, to mimic specific genetic defects in vitro and in vivo, exploring ASD-related cellular and behavioral alterations and identifying potential therapeutic targets [7,8]. Epidemiological and comorbidity studies focus on associations between environmental risk factors and ASD prevalence, while systematically analyzing the impact of common co-occurring disorders on individual functional outcomes [9]. Clinical translation and intervention research is dedicated to developing precision assessment tools and combined behavioral–pharmacological strategies to optimize individualized treatment plans [10]. Although this research has substantially expanded our knowledge of ASD, the mechanisms for integrating and translating findings across these different levels of investigation remain incompletely understood. A critical challenge is establishing causal links from genetic variants and aberrant neural circuitry to observable behaviors within the complex interactions of the biological system.
By bridging the gap between basic transcriptomic discoveries and clinical applications, this work aims to contribute both mechanistic insights into ASD etiology and tangible leads for therapeutic development. Integrating multiple analytical dimensions, from single-gene differential expression to system-level network pharmacology, is intended to move beyond the fragmented approaches of previous ASD transcriptomic studies.

2. Materials and Methods

2.1. Data Acquisition and Preprocessing

The human autism and normal control blood sample microarray dataset GSE18123 was downloaded from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) [11]. The GEO series GSE18123 comprises 285 peripheral blood samples profiled on two platforms (GPL570 and GPL6244) and spans multiple diagnostic categories (ASD, Pervasive Developmental Disorder–Not Otherwise Specified [PDD-NOS], Asperger’s Disorder, and Controls); we restricted analyses to a homogeneous, single-platform subset to minimize diagnostic and technical heterogeneity. Specifically, we selected the GPL570 subset (99 samples) and included only those labeled as ASD and Control (31 ASD, 33 Control), excluding PDD-NOS and Asperger’s Disorder (n = 36) and all samples from GPL6244. This predefined filtering ensured a clear binary classification task (ASD vs. Control) and reduced potential confounding from cross-platform effects and mixed neurodevelopmental diagnoses.
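The subset selection described above amounts to a two-key filter on sample annotations. The sketch below assumes the GEO sample metadata have already been parsed into records with hypothetical `platform` and `diagnosis` fields; these field names and label strings are illustrative assumptions, not the GEO schema.

```python
from collections import Counter

# Hypothetical label strings; real GEO "characteristics" fields must be mapped to these first.
KEEP_PLATFORM = "GPL570"
KEEP_LABELS = {"ASD", "Control"}


def select_binary_subset(samples):
    """Keep single-platform samples labeled ASD or Control.

    PDD-NOS, Asperger's Disorder, and all samples from other platforms are dropped.
    """
    return [
        s for s in samples
        if s["platform"] == KEEP_PLATFORM and s["diagnosis"] in KEEP_LABELS
    ]


if __name__ == "__main__":
    toy = [  # toy records, not real GSE18123 annotations
        {"id": "S1", "platform": "GPL570", "diagnosis": "ASD"},
        {"id": "S2", "platform": "GPL570", "diagnosis": "Control"},
        {"id": "S3", "platform": "GPL570", "diagnosis": "PDD-NOS"},
        {"id": "S4", "platform": "GPL6244", "diagnosis": "ASD"},
    ]
    kept = select_binary_subset(toy)
    print(Counter(s["diagnosis"] for s in kept))
```

Applying the platform and label conditions together in one pass makes the exclusion rule explicit and easy to audit against the reported sample counts.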
The raw data were generated using the Affymetrix Human Genome U133 Plus 2.0 Array (GPL570; Affymetrix, Santa Clara, CA, USA) and included the autism group (n = 31) and the control group (n = 33). R software (version 4.2.2) and relevant Bioconductor packages (e.g., limma, version 3.58.1, and affy, version 1.80.0) were used for background correction, normalization, and batch effect removal of the original expression matrix.
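The text does not state which normalization routine was called. As an illustration only, the sketch below shows quantile normalization, the normalization step used in RMA preprocessing of Affymetrix arrays; treating it as the authors' exact method is an assumption. Each sample's values are replaced by the across-sample mean of the values at the same rank, so all samples end up with an identical distribution. Ties are broken by original order, a simplification.

```python
def quantile_normalize(columns):
    """Quantile-normalize samples given as equal-length lists (one list per array)."""
    n_rows = len(columns[0])
    # Sort each sample, then average across samples at every rank position.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [
        sum(col[i] for col in sorted_cols) / len(columns) for i in range(n_rows)
    ]
    normalized = []
    for col in columns:
        # order[r] = row index holding the r-th smallest value in this sample
        order = sorted(range(n_rows), key=lambda row: col[row])
        out = [0.0] * n_rows
        for rank, row in enumerate(order):
            out[row] = rank_means[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    # Two toy samples with four probes each.
    print(quantile_normalize([[5, 2, 3, 4], [4, 1, 6, 2]]))
    # → [[5.5, 1.5, 2.5, 4.0], [4.0, 1.5, 5.5, 2.5]]
```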

2.2. Differentially Expressed Gene (DEG) Analysis

Differential expression analysis between the autism and control groups was performed using the “limma” R package (version 3.58.1), which fits gene-wise linear models with empirical Bayes moderation of the variance estimates. The screening criteria were |log2FC| > 1.5 and adjusted p-value (FDR) < 0.05. The results were visualized as volcano plots and heatmaps.
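The screening rule above combines an effect-size cutoff with a multiple-testing cutoff. The sketch below takes per-gene p-values as given (limma's moderated t-test is not reimplemented here), applies Benjamini–Hochberg adjustment, and then splits genes into up- and downregulated sets. On log2-scale expression data, log2FC is the difference of the group means. Gene names are hypothetical placeholders.

```python
def log2_fold_change(case_values, control_values):
    """On log2-scale expression data, log2FC is the difference of group means."""
    return sum(case_values) / len(case_values) - sum(control_values) / len(control_values)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, log2fc, pvalues, fc_cutoff=1.5, fdr_cutoff=0.05):
    """Split genes into up/down lists using |log2FC| > fc_cutoff and FDR < fdr_cutoff."""
    fdr = benjamini_hochberg(pvalues)
    up, down = [], []
    for gene, fc, q in zip(genes, log2fc, fdr):
        if q < fdr_cutoff and abs(fc) > fc_cutoff:
            (up if fc > 0 else down).append(gene)
    return up, down


if __name__ == "__main__":
    genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]  # hypothetical genes
    print(select_degs(genes, [2.0, -1.8, 0.5, -3.0], [0.01, 0.04, 0.03, 0.005]))
    # → (['GENE_A'], ['GENE_B', 'GENE_D'])
```

GENE_C passes the FDR cutoff but not the fold-change cutoff, which illustrates why both criteria are reported.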

2.3. DEG Functional Enrichment Analysis

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses [12,13] were conducted on the DEGs using the clusterProfiler R package (version 4.10.1) and related packages, with GO analysis covering the biological process (BP), molecular function (MF), and cellular component (CC) categories. Statistical significance was assessed with the hypergeometric test (pAdjustMethod = “BH”; significance threshold p < 0.05). Enrichment results were displayed as chord diagrams and similar visualizations. The DEG interaction network was obtained from the STRING database (https://string-db.org; combined score threshold ≥ 0.4) and imported into Cytoscape (version 3.10.3) for visualization [14,15,16,17]. The threshold of 0.4 refers to STRING’s combined score, which STRING designates as medium confidence; it was chosen to filter out low-confidence interactions while retaining a sufficient number of relevant connections for analysis.
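The over-representation test mentioned above asks how surprising it is to see k DEGs annotated to a term when n DEGs are drawn from a background of N genes, K of which carry that annotation. A minimal sketch of the one-sided hypergeometric upper tail follows; the choice of background universe (for example, all genes measured on the array) is an assumption the text does not specify. Term p-values computed this way would then be BH-adjusted across terms.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """One-sided over-representation p-value P(X >= k).

    X ~ Hypergeometric: N background genes, K of them annotated to the term,
    n DEGs drawn from the background, k of those DEGs annotated to the term.
    """
    if not 0 <= k <= n <= N or K > N:
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    # math.comb returns 0 when the lower index exceeds the upper one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


if __name__ == "__main__":
    # Toy counts: all 3 DEGs fall in a term covering 3 of 10 background genes.
    print(hypergeom_upper_tail(k=3, n=3, K=3, N=10))  # 1/120, about 0.00833
```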

2.4. CMap Drug Prediction

Upregulated and downregulated DEGs were submitted to the Connectivity Map (CMap) online platform (https://clue.io) for drug reversal prediction [18,19]. The six candidate drugs with the strongest negative (reversal) connectivity scores were identified as potential regulatory molecules.
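To illustrate what "drug reversal" means, the sketch below implements the unweighted Kolmogorov–Smirnov connectivity score of the original (2006) Connectivity Map. This is a swapped-in, simplified technique: clue.io computes its normalized score (norm_cs) with a weighted statistic and normalization across cell lines and perturbation types, which is not reproduced here. The drug ranking and gene names are hypothetical. A negative score means disease-upregulated genes sit near the bottom of the drug-induced ranking and disease-downregulated genes near the top.

```python
def ks_score(ranked_genes, gene_set):
    """Unweighted KS enrichment of gene_set along a ranked list.

    ranked_genes[0] is the gene most up-regulated by the drug and
    ranked_genes[-1] the most down-regulated one.
    """
    position = {g: i + 1 for i, g in enumerate(ranked_genes)}  # 1-based ranks
    v = sorted(position[g] for g in gene_set if g in position)
    n, t = len(ranked_genes), len(v)
    if t == 0:
        return 0.0
    a = max(j / t - v[j - 1] / n for j in range(1, t + 1))
    b = max(v[j - 1] / n - (j - 1) / t for j in range(1, t + 1))
    return a if a > b else -b


def connectivity_score(ranked_genes, disease_up, disease_down):
    """Negative values mean the drug profile opposes (reverses) the disease signature."""
    es_up = ks_score(ranked_genes, disease_up)
    es_down = ks_score(ranked_genes, disease_down)
    if es_up * es_down >= 0:  # same sign (or zero): no consistent connection
        return 0.0
    return es_up - es_down


if __name__ == "__main__":
    drug_ranking = [f"G{i}" for i in range(10)]  # hypothetical drug-induced ranking
    # Disease-up genes at the bottom, disease-down genes at the top: a reversal.
    print(connectivity_score(drug_ranking, {"G8", "G9"}, {"G0", "G1"}))
```

Candidates would then be ranked from the most negative score upward; a cutoff such as norm_cs < −1.5 applies to clue.io’s normalized scores, not to this raw score.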

2.5. GeneCard Disease-Related Gene Retrieval and Downstream Analysis

Genes related to Autism Spectrum Disorder were retrieved from the GeneCards database (https://www.genecards.org/) using a relevance score threshold of > 10 [20,21]. These genes were intersected with the DEGs to yield candidate key genes, which were subjected to GO enrichment and PPI network analysis as described above.
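The retrieval-and-intersection step can be sketched as a score filter followed by a set intersection. The CSV column names `Gene Symbol` and `Relevance score` are assumptions about the GeneCards export format and may need adjusting; the gene symbols and scores below are placeholders, not real GeneCards data.

```python
import csv
import io


def high_relevance_genes(csv_text, min_score=10.0,
                         symbol_col="Gene Symbol", score_col="Relevance score"):
    """Symbols whose relevance score exceeds min_score (column names are assumed)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[symbol_col] for row in reader if float(row[score_col]) > min_score}


def candidate_key_genes(genecards_csv, degs):
    """Intersect high-relevance disease genes with the DEG list."""
    return sorted(high_relevance_genes(genecards_csv) & set(degs))


if __name__ == "__main__":
    export = "Gene Symbol,Relevance score\nGENE_A,42.0\nGENE_B,12.5\nGENE_C,3.1\n"
    print(candidate_key_genes(export, ["GENE_B", "GENE_C", "GENE_D"]))  # ['GENE_B']
```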

2.6. Screening for Feature Genes Using Random Forest

Data were randomly divided into training (70%) and validation (30%) sets. Random forests were trained using the R randomForest package (version 4.7-1.2) with ntree = 500 and nPerm = 5. Out-of-bag (OOB) samples were used to estimate the generalization error, and variable importance was quantified as MeanDecreaseGini, the mean decrease in node impurity attributable to each gene across trees [22]. Genes were ranked by MeanDecreaseGini, and the ten highest-ranked genes were designated as the “Top 10” feature set. Predictive performance was assessed using the OOB error on the training set and held-out validation metrics (confusion matrix and ROC/AUC computed from predictions on the validation set).
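MeanDecreaseGini accumulates, for each variable, the impurity reduction produced at every split on that variable and averages it over trees; OOB samples serve the error estimate rather than this importance. The sketch below shows only that accumulation for splits that have already been chosen. It does not grow trees, and randomForest’s exact internal weighting may differ from the node-size weighting used here. Gene names are hypothetical.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Node-size-weighted impurity decrease produced by one split."""
    return (len(parent) * gini(parent)
            - len(left) * gini(left)
            - len(right) * gini(right))


def mean_decrease_gini(splits, n_trees):
    """Sum decreases per gene over all splits in all trees, then average per tree.

    splits: iterable of (gene, parent_labels, left_labels, right_labels).
    """
    totals = defaultdict(float)
    for gene, parent, left, right in splits:
        totals[gene] += gini_decrease(parent, left, right)
    return {gene: total / n_trees for gene, total in totals.items()}


if __name__ == "__main__":
    pure = (["ASD"] * 3 + ["Control"] * 3, ["ASD"] * 3, ["Control"] * 3)
    mixed = (["ASD", "ASD", "Control", "Control"], ["ASD", "ASD", "Control"], ["Control"])
    scores = mean_decrease_gini(
        [("GENE_A", *pure), ("GENE_A", *pure), ("GENE_B", *mixed)], n_trees=2
    )
    print(sorted(scores, key=scores.get, reverse=True))  # ['GENE_A', 'GENE_B']
```

Ranking genes by these averaged decreases and keeping the first ten mirrors how a “Top 10” panel is read off the importance table.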

2.7. Immune Landscape Analysis

Immune infiltration analysis was performed using the R package “GSVA” (version 1.46.x), which estimates sample-wise enrichment scores for gene signatures of diverse immune cell subtypes from the transcriptomic expression matrix; these scores reflect relative infiltration levels rather than absolute cell proportions. Associations between the ten top-ranked key genes and immune cell infiltration patterns were evaluated by Spearman or Pearson correlation analysis. Statistically significant correlations (p < 0.05) were visualized as correlation heatmaps constructed with the “corrplot” R package (version 0.95).
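Each cell of such a gene-by-cell-type heatmap is one correlation coefficient. A minimal sketch of Spearman’s rho (the Pearson correlation of average ranks, so ties are handled) is shown below on toy numbers; p-values, which would be needed for the p < 0.05 filter, are not computed here.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined if either input is constant


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    gene_expr = [1.0, 2.0, 3.0, 4.0, 5.0]   # toy expression values
    cell_score = [5.0, 6.0, 7.0, 8.0, 7.0]  # toy immune enrichment scores
    print(round(spearman(gene_expr, cell_score), 4))  # 0.8208
```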

2.8. ROC Curve Analysis

The “pROC” R package (version: 1.19.0.1) was used to calculate the area under the curve (AUC) of the top 10 feature genes and to assess their diagnostic value in ASD screening. An AUC greater than 0.7 was considered indicative of good discriminative ability. Visualizations included both single-gene and combined ROC curves.
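The empirical AUC of a single gene equals the probability that a randomly chosen ASD sample has a higher expression value than a randomly chosen control, with ties counted as one half (the Mann–Whitney formulation). The sketch below computes it by pairwise comparison on toy values. The `oriented_auc` helper, which reports max(AUC, 1 − AUC) so that downregulated markers also score at or above 0.5, is a hypothetical convenience loosely analogous to, but not identical with, pROC’s automatic direction choice.

```python
def auc_mann_whitney(case_scores, control_scores):
    """Empirical ROC AUC: P(case > control) + 0.5 * P(tie) over all pairs."""
    if not case_scores or not control_scores:
        raise ValueError("both groups need at least one sample")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def oriented_auc(case_scores, control_scores):
    """Report max(AUC, 1 - AUC) so down-regulated markers also score >= 0.5."""
    auc = auc_mann_whitney(case_scores, control_scores)
    return max(auc, 1.0 - auc)


if __name__ == "__main__":
    asd = [0.9, 0.8, 0.4]  # toy expression values
    control = [0.3, 0.5, 0.2]
    print(auc_mann_whitney(asd, control))  # 8 of 9 pairs ordered correctly
```

Choosing the direction after seeing the data slightly inflates the estimate, which is one reason held-out validation is useful.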

3. Results

3.1. Differentially Expressed Gene (DEG) Screening Results

Differential analysis of the GSE18123 autism and normal control blood microarray data identified a total of 446 differentially expressed genes (DEGs), including 255 upregulated and 191 downregulated genes. The distribution of DEGs is illustrated in the volcano plots (Figure 1A,B). Certain genes, such as HLA and CALCA, showed significantly altered expression between the two groups.

3.2. DEG Functional Enrichment Analysis and PPI Network Construction

GO enrichment analysis revealed that the screened DEGs were mainly involved in biological processes such as regulation of membrane potential, cellular components such as the synaptic membrane, and molecular functions such as gated channel activity. Significant GO terms included GO:0042391, GO:0035637, and GO:0097060, among others (p < 0.05). Detailed enrichment results are shown in Figure 1C–F and Table 1. A DEG protein–protein interaction (PPI) network was constructed using the STRING database and Cytoscape (Figure 1G,H).
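The network construction can be sketched as thresholding STRING-style edges at a combined score of 0.4, building an undirected graph, and ranking proteins by degree as a simple hub measure. The edge list and protein names below are hypothetical; STRING’s downloadable link files store the combined score multiplied by 1000 (so 400 corresponds to 0.4), and proteins whose edges are all filtered out do not appear in the resulting graph.

```python
from collections import defaultdict


def build_ppi(edges, min_score=0.4):
    """Undirected adjacency sets from (protein_a, protein_b, combined_score) tuples.

    Scores are on a 0-1 scale here; STRING's flat files store them x1000.
    """
    adjacency = defaultdict(set)
    for a, b, score in edges:
        if score >= min_score and a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def network_summary(adjacency):
    """Return (node count, edge count) for an undirected adjacency map."""
    n_edges = sum(len(nbrs) for nbrs in adjacency.values()) // 2
    return len(adjacency), n_edges


def top_hubs(adjacency, k=10):
    """Rank proteins by degree, breaking ties alphabetically."""
    return sorted(adjacency, key=lambda p: (-len(adjacency[p]), p))[:k]


if __name__ == "__main__":
    toy_edges = [  # hypothetical interactions and scores
        ("P1", "P2", 0.90), ("P1", "P3", 0.50),
        ("P2", "P3", 0.30), ("P1", "P4", 0.45),
    ]
    ppi = build_ppi(toy_edges)
    print(network_summary(ppi), top_hubs(ppi, k=2))  # (4, 3) ['P1', 'P2']
```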

3.3. CMap Drug Prediction Results and Biological Significance

Upregulated and downregulated DEGs were entered into the CMap website for drug prediction analysis. A total of 18,937 small molecule compounds were predicted to inversely match the DEG expression signatures. The most promising candidate drugs are listed in Table 2 and shown in Figure 2. These compounds may represent potential therapeutic agents for autism.

3.4. GeneCard Intersection Analysis

Retrieval of highly scored (>10) autism-related genes from the GeneCards database yielded 2715 candidate genes. Intersecting these with the DEGs resulted in 70 candidate genes; the screening process is displayed in the Venn diagram (Figure 3A). Figure 3B–E show further enrichment analysis of the intersection genes, including GO BP, CC, and MF terms and KEGG pathways.

3.5. Random Forest Selection of Top 10 Feature Genes

The random forest model identified a compact subset of genes that contributed most strongly to class separation. Based on MeanDecreaseGini importance scores, the highest-impact genes were concentrated in the top decile, from which the “Top 10” feature set emerged: SHANK3, NLRP3, SERAC1, TUBB2A, MGAT4C, TFAP2A, EVC, GABRE, TRAK1, and GPR161. This panel accounted for the largest cumulative decrease in node impurity across trees and dominated split decisions within the forest, indicating that these genes drive the model’s discrimination between ASD and Control. Performance evaluation on held-out data suggested stable generalization, with a validation ROC AUC of 0.762 and error rates consistent with OOB estimates, supporting the robustness of the selected feature set for predictive modeling (Figure 4A–D).

3.6. Immune Infiltration Analysis

The correlation heatmap revealed that the top 10 key genes exhibited significant correlations with various immune cell subpopulations (Figure 5A). The expression levels of the immune-related genes ISR1 and ATP2B2 were closely associated with 13 distinct types of immune cells, showing both significant positive and negative correlations. Such bidirectional associations suggest complex pleiotropic relationships within the immune microenvironment.

3.7. ROC Analysis of Top 10 Genes

ROC curve analysis evaluated the diagnostic potential of the top 10 key genes (Figure 5B). Five genes reached AUC values of at least 0.70: MGAT4C exhibited the highest diagnostic accuracy (AUC = 0.730), followed by EVC (0.720), TRAK1 (0.702), SERAC1 (0.701), and TUBB2A (0.700). GABPB1 (0.699), TFAP2A (0.66), GPR161 (0.655), and NLRP3 (0.625) showed relatively lower discriminatory capacity. Collectively, these results indicate moderate diagnostic utility for several key genes in distinguishing between clinical groups, with MGAT4C representing the most promising biomarker candidate (complete AUC data are visualized in Figure 5B).

4. Discussion

ASD represents a clinically heterogeneous neurodevelopmental condition defined by persistent deficits in social communication, restricted behavioral repertoire, and stereotyped patterns of activity [23,24,25]. Contemporary diagnosis remains predicated on behavioral phenotyping, typically delaying intervention beyond critical neurodevelopmental windows [26,27,28].
Functional enrichment analysis demonstrated significant over-representation of DEGs in biological processes related to the regulation of membrane potential (GO:0042391, p = 2.85 × 10⁻⁷), multicellular organismal signaling (GO:0035637, p = 1.89 × 10⁻⁶), and transmission of nerve impulse (GO:0019226, p = 1 × 10⁻⁴). These findings may help connect genetic susceptibility loci identified by GWAS to their functional consequences at the pathway level. Notably, the convergence of synaptic and immune pathways (e.g., complement activation in synaptic pruning) suggests a potential mechanistic interplay that could help explain the heterogeneity of ASD phenotypes. An exploratory trial reported that transdermal nicotine applied during waking hours may reduce aggression and improve sleep quality in adults with ASD [29], suggesting a possible link to nicotine addiction signaling pathways (hsa05033). The observed enrichment of calcium signaling pathways (hsa04020) may have particular therapeutic relevance, given the known role of calcium channel mutations in ASD [30]. Our GO/KEGG enrichment highlights ion channel and synaptic terms (GO:0042391, GO:0097060, GO:0022836, hsa04727, and hsa04020). These pathways provide a possible mechanistic bridge to evidence that fetal-to-early postnatal zinc–copper rhythmic dysregulation precedes ASD phenotypes [31]. Because Zn and Cu modulate channel gating and postsynaptic receptor function, disrupted elemental rhythms may desynchronize membrane potential control and neurotransmission, linking early-life metallomic dynamics to the synaptic dysfunction observed in ASD.
Our CMap analysis predicted six compounds with significant negative normalized connectivity scores (norm_cs < −1.5), including the beta-adrenergic blocker metoprolol and the tyrosine kinase inhibitor afatinib. These findings are broadly consistent with reports of protein kinase inhibitors as candidate treatments for neurodegenerative and psychiatric disorders [32] and of high-dose propranolol, another beta-blocker, for challenging behaviors in individuals with ASD [33]. The predicted drugs predominantly target neuroinflammatory pathways rather than directly correcting synaptic defects, suggesting that combination therapies may be necessary for optimal outcomes. However, the translational potential of these predictions requires careful evaluation given the limited representation of neural cell types in the CMap reference database [34].
Machine learning models, especially random forests, are used extensively in microbiome research because of their interpretability, strong performance, and built-in feature selection (via estimates of feature importance) [35]. Pipelines combining random forest feature screening with ROC diagnostic analysis have identified diagnostic genes with good efficacy in other diseases [36,37]. Current ASD diagnostics rely solely on the behavioral phenotypes of toddlers, and such approaches are limited by the age at which ASD can be reliably diagnosed. Our ROC curve analysis suggested that several of the signature genes identified here may also have diagnostic value for ASD.
Immune deconvolution analysis revealed significant elevation of immature dendritic cells (iDCs; p = 2.29 × 10⁻⁹) and regulatory T cells (p = 4.65 × 10⁻⁷) in ASD patients, consistent with recent single-cell RNA-seq studies of ASD brains [38]. The strong correlation between SHANK3 expression and iDC scores (r = 0.66, p < 0.001) is consistent with a neuron–immune axis in ASD [39]. These findings should be interpreted in light of methodological limitations, including the inherent noise in estimating immune cell infiltration from bulk microarray data and potential confounding by peripheral inflammatory conditions unrelated to core ASD pathology.
In conclusion, this study analyzed ASD from the perspectives of drug prediction, signature gene screening, ROC diagnostic analysis, and immune infiltration, providing preliminary theoretical support for the diagnosis and treatment of ASD. However, further in-depth clinical research is needed before these findings can be applied in practice.

Author Contributions

H.W. and X.Z. wrote the main manuscript text, and H.Z. and W.C. prepared all figures. All authors reviewed the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

The authors acknowledge no funding in support of this research.

Institutional Review Board Statement

This manuscript does not contain any individual person’s private data (e.g., images, clinical details). All analyzed data were anonymized and publicly accessible through the GEO repository.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original dataset (GSE18123) analyzed during this study is publicly available in the NCBI Gene Expression Omnibus (GEO) repository at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=gse18123 (accessed on 6 August 2024) [11]. GeneCards, the Human Gene Database, is available at https://www.genecards.org/ (accessed on 10 March 2025) [21].

Acknowledgments

During the preparation of this manuscript, the authors used DeepSeek-R1 for the purposes of generating text. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. First, M.B.; Clarke, D.E.; Yousif, L.; Eng, A.M.; Gogtay, N.; Appelbaum, P.S. DSM-5-TR: Rationale, Process, and Overview of Changes. Psychiatr. Serv. 2023, 74, 869–875.
  2. Valle, R. Schizophrenia in ICD-11: Comparison of ICD-10 and DSM-5. Rev. Psiquiatr. Salud Ment. (Engl. Ed.) 2020, 13, 95–104.
  3. Tasnim, A.; Alkislar, I.; Hakim, R.; Turecek, J.; Abdelaziz, A.; Orefice, L.L.; Ginty, D.D. The developmental timing of spinal touch processing alterations predicts behavioral changes in genetic mouse models of autism spectrum disorders. Nat. Neurosci. 2024, 27, 484–496.
  4. Uchino, S.; Waga, C. SHANK3 as an autism spectrum disorder-associated gene. Brain Dev. 2013, 35, 106–110.
  5. Yasuda, Y.; Hashimoto, R.; Yamamori, H.; Ohi, K.; Fukumoto, M.; Umeda-Yano, S.; Mohri, I.; Ito, A.; Taniike, M.; Takeda, M. Gene expression analysis in lymphoblasts derived from patients with autism spectrum disorder. Mol. Autism 2011, 2, 9.
  6. Subtirelu, R.; Writer, M.; Teichner, E.; Patil, S.; Indrakanti, D.; Werner, T.J.; Alavi, A. Potential Neuroimaging Biomarkers for Autism Spectrum Disorder: A Comprehensive Review of MR Imaging, fMR Imaging, and PET Studies. PET Clin. 2025, 20, 25–37.
  7. Lo, L.H.; Lai, K. Dysregulation of protein synthesis and dendritic spine morphogenesis in ASD: Studies in human pluripotent stem cells. Mol. Autism 2020, 11, 40.
  8. Bhandari, R.; Varma, M.; Rana, P.; Dhingra, N.; Kuhad, A. Taurine as a potential therapeutic agent interacting with multiple signaling pathways implicated in autism spectrum disorder (ASD): An in-silico analysis. IBRO Neurosci. Rep. 2023, 15, 170–177.
  9. Schroder, C.M.; Broquere, M.A.; Claustrat, B.; Delorme, R.; Franco, P.; Lecendreux, M.; Tordjman, S. Therapeutic approaches for sleep and rhythms disorders in children with ASD. Encephale 2022, 48, 294–303.
  10. Margari, A.; De Agazio, G.; Marzulli, L.; Piarulli, F.M.; Mandarelli, G.; Catanesi, R.; Carabellese, F.F.; Cortese, S. Autism spectrum disorder (ASD) and sexual offending: A systematic review. Neurosci. Biobehav. Rev. 2024, 162, 105687.
  11. Kong, S.W.; Collins, C.D.; Shimizu-Motohashi, Y.; Holm, I.A.; Campbell, M.G.; Lee, I.; Brewster, S.J.; Hanson, E.; Harris, H.K.; Lowe, K.R.; et al. Characteristics and Predictive Value of Blood Transcriptome Signature in Males with Autism Spectrum Disorders [Data Set]. Gene Expression Omnibus. 2012. Available online: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE18123 (accessed on 6 August 2024).
  12. Kanehisa, M.; Furumichi, M.; Sato, Y.; Ishiguro-Watanabe, M.; Tanabe, M. KEGG: Integrating viruses and cellular organisms. Nucleic Acids Res. 2021, 49, D545–D551.
  13. Laboratories, K. KEGG: Kyoto Encyclopedia of Genes and Genomes. Available online: https://www.kegg.jp (accessed on 5 March 2025).
  14. Shannon, P.; Markiel, A.; Ozier, O.; Baliga, N.S.; Wang, J.T.; Ramage, D.; Amin, N.; Schwikowski, B.; Ideker, T. Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res. 2003, 13, 2498–2504.
  15. Consortium, C. Cytoscape 3.10.3. Available online: https://cytoscape.org (accessed on 25 February 2025).
  16. Huang, Z.; Lyu, Z.; Li, H.; You, H.; Yang, X.; Cha, C. Des-Arg(9) bradykinin as a causal metabolite for autism spectrum disorder. World J. Psychiatry 2024, 14, 88–101.
  17. String. STRING v12.0. 2024. Available online: https://string-db.org/ (accessed on 5 March 2024).
  18. Subramanian, A.; Narayan, R.; Corsello, S.M.; Peck, D.D.; Natoli, T.E.; Lu, X.; Gould, J.; Davis, J.F.; Tubelli, A.A.; Asiedu, J.K.; et al. A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles. Cell 2017, 171, 1437–1452.
  19. Institute, B. Connectivity Map (CMap). Available online: https://clue.io (accessed on 11 January 2024).
  20. Barshir, R.; Fishilevich, S.; Iny-Stein, T.; Zelig, O.; Mazor, Y.; Guan-Golan, Y.; Safran, M.; Lancet, D. GeneCaRNA: A Comprehensive Gene-centric Database of Human Non-coding RNAs in the GeneCards Suite. J. Mol. Biol. 2021, 433, 166913.
  21. Suite, T.G. GeneCards—The Human Gene Database. 2025. Available online: https://www.genecards.org (accessed on 10 March 2025).
  22. Amdouni, J.; Conte, A.; Ippoliti, C.; Candeloro, L.; Tora, S.; Sghaier, S.; Hassine, T.B.; Fakhfekh, E.A.; Savini, G.; Hammami, S. Culex pipiens distribution in Tunisia: Identification of suitable areas through Random Forest and MaxEnt approaches. Vet. Med. Sci. 2022, 8, 2703–2715.
  23. Puglisi, A.; Capri, T.; Pignolo, L.; Gismondo, S.; Chila, P.; Minutoli, R.; Marino, F.; Failla, C.; Arnao, A.A.; Tartarisco, G.; et al. Social Humanoid Robots for Children with Autism Spectrum Disorders: A Review of Modalities, Indications, and Pitfalls. Children 2022, 9, 953.
  24. Rice, C.E.; Carpenter, L.A.; Morrier, M.J.; Lord, C.; Dirienzo, M.; Boan, A.; Skowyra, C.; Fusco, A.; Baio, J.; Esler, A.; et al. Defining in Detail and Evaluating Reliability of DSM-5 Criteria for Autism Spectrum Disorder (ASD) Among Children. J. Autism Dev. Disord. 2022, 52, 5308–5320.
  25. Clarke, E.B.; Lord, C. Social competence as a predictor of adult outcomes in autism spectrum disorder. Dev. Psychopathol. 2024, 36, 1442–1457.
  26. Smith, L.E.; Greenberg, J.S.; Mailick, M.R. The family context of autism spectrum disorders: Influence on the behavioral phenotype and quality of life. Child Adolesc. Psychiatr. Clin. N. Am. 2014, 23, 143–155.
  27. Gordon-Lipkin, E.; Foster, J.; Peacock, G. Whittling Down the Wait Time: Exploring Models to Minimize the Delay from Initial Concern to Diagnosis and Treatment of Autism Spectrum Disorder. Pediatr. Clin. N. Am. 2016, 63, 851–859.
  28. Krol, A.; Feng, G. Windows of opportunity: Timing in neurodevelopmental disorders. Curr. Opin. Neurobiol. 2018, 48, 59–63.
  29. Lewis, A.S.; van Schalkwyk, G.I.; Lopez, M.O.; Volkmar, F.R.; Picciotto, M.R.; Sukhodolsky, D.G. An Exploratory Trial of Transdermal Nicotine for Aggression and Irritability in Adults with Autism Spectrum Disorder. J. Autism Dev. Disord. 2018, 48, 2748–2757.
  30. De Rubeis, S.; He, X.; Goldberg, A.P.; Poultney, C.S.; Samocha, K.; Cicek, A.E.; Kou, Y.; Liu, L.; Fromer, M.; Walker, S.; et al. Synaptic, transcriptional and chromatin genes disrupted in autism. Nature 2014, 515, 209–215.
  31. Curtin, P.; Austin, C.; Curtin, A.; Gennings, C.; Arora, M.; Tammimies, K.; Willfors, C.; Berggren, S.; Siper, P.; Rai, D.; et al. Dynamical features in fetal and postnatal zinc-copper metabolic cycles predict the emergence of autism spectrum disorder. Sci. Adv. 2018, 4, eaat1293.
  32. Asir, R.V.A.; Buzaeva, P.; Michaelevski, I. Unlocking the therapeutic potential of protein kinase inhibitors in neurodegenerative and psychiatric disorders. Explor. Drug Sci. 2025, 3, 100892.
  33. London, E.B.; Yoo, J.H.; Fethke, E.D.; Zimmerman-Bier, B. The Safety and Effectiveness of High-Dose Propranolol as a Treatment for Challenging Behaviors in Individuals With Autism Spectrum Disorders. J. Clin. Psychopharmacol. 2020, 40, 122–129.
  34. Lu, Z.; Chen, M.; Zong, Y.; Li, X.; Zhou, P. A Novel Analysis of CMAP Scans From Perspective of Information Theory: CMAP Distribution Index (CDIX). IEEE Trans. Biomed. Eng. 2023, 70, 1182–1188.
  35. Seyed Tabib, N.S.; Madgwick, M.; Sudhakar, P.; Verstockt, B.; Korcsmaros, T.; Vermeire, S. Big data in IBD: Big progress for clinical practice. Gut 2020, 69, 1520–1532.
  36. Wang, S.; Wang, Q.; Zhao, K.; Zhang, S.; Chen, Z. Exploration of the shared diagnostic genes and mechanisms between periodontitis and primary Sjogren’s syndrome by integrated comprehensive bioinformatics analysis and machine learning. Int. Immunopharmacol. 2024, 141, 112899.
  37. Vaziri-Moghadam, A.; Foroughmand-Araabi, M. Integrating machine learning and bioinformatics approaches for identifying novel diagnostic gene biomarkers in colorectal cancer. Sci. Rep. 2024, 14, 24786.
  38. Velmeshev, D.; Schirmer, L.; Jung, D.; Haeussler, M.; Perez, Y.; Mayer, S.; Bhaduri, A.; Goyal, N.; Rowitch, D.H.; Kriegstein, A.R. Single-cell genomics identifies cell type-specific molecular changes in autism. Science 2019, 364, 685–689.
  39. Greaves-Lord, K.; Skuse, D.; Mandy, W. Innovations of the ICD-11 in the Field of Autism Spectrum Disorder: A Psychological Approach. Clin. Psychol. Eur. 2022, 4, e10005.
Figure 1. Identification and functional analysis of DEGs. (A,B) Volcano plots illustrating the distribution of differentially expressed genes (DEGs) between ASD and normal control samples in the GSE18123 blood microarray dataset. A total of 446 DEGs were identified, comprising 255 upregulated and 191 downregulated genes. (C–F) Gene Ontology (GO) enrichment analysis results of DEGs, including significant biological processes (e.g., regulation of membrane potential), cellular components (e.g., synaptic membrane), and molecular functions (e.g., gated channel activity). Representative enriched GO terms included GO:0042391, GO:0035637, and GO:0097060, among others (p < 0.05). (G) Protein–protein interaction (PPI) network of DEGs generated by the STRING database; PPI enrichment p-value < 1.0 × 10⁻¹⁶. (H) PPI network of DEGs visualized using Cytoscape, illustrating the interaction landscape among key DEGs. Red and green indicate upregulated and downregulated genes, respectively. Number of nodes: 172; number of edges: 1692. DEGs were identified with p < 0.05. Abbreviations: ASD, Autism Spectrum Disorder; DEGs, differentially expressed genes; GO, Gene Ontology; PPI, protein–protein interaction.
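The node and edge counts reported for panel (H) translate directly into simple network summaries. The Python sketch below assumes an undirected PPI network without self-loops or duplicate edges; it computes density and mean degree from the reported counts and ranks nodes by degree, a common hub criterion. The helper names and the toy edge list are hypothetical and illustrative, not the authors' Cytoscape workflow.

```python
from collections import Counter


def network_summary(n_nodes: int, n_edges: int) -> tuple[float, float]:
    """Return (density, mean degree) of a simple undirected graph."""
    max_edges = n_nodes * (n_nodes - 1) / 2  # number of possible node pairs
    density = n_edges / max_edges
    mean_degree = 2 * n_edges / n_nodes  # every edge adds 1 to the degree of both endpoints
    return density, mean_degree


def rank_hubs(edges: list[tuple[str, str]], top: int = 3) -> list[tuple[str, int]]:
    """Rank nodes by degree (number of distinct interaction partners)."""
    # Normalize edge direction, drop duplicates and self-loops, sort for a deterministic order
    unique = sorted({tuple(sorted(e)) for e in edges if e[0] != e[1]})
    degree = Counter()
    for a, b in unique:
        degree[a] += 1
        degree[b] += 1
    return degree.most_common(top)


density, mean_degree = network_summary(172, 1692)
print(f"density = {density:.4f}, mean degree = {mean_degree:.2f}")

# Hypothetical toy network: ("C", "A") duplicates ("A", "C") and is counted once
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("C", "A")]
print(rank_hubs(toy_edges))
```

For the reported counts this gives a density of about 0.115 and a mean degree of about 19.7, i.e., each protein in the network has on average roughly 20 interaction partners.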
Figure 2. CMap-based prediction of small-molecule compounds potentially targeting autism-related gene expression changes. (A) BRD-K42400758: a hydrophobic scaffold with pendant amide and amine groups. (B) BRD-K06328518: an aromatic core with nitro/oxo substituents. (C) BRD-K33663820: a complex heterocyclic framework with multiple heteroatoms. (D) Metoprolol: a β1-selective adrenergic receptor blocker whose expression signature was predicted to oppose the autism-related transcriptional changes in this dataset. (E) Afatinib: an epidermal growth factor receptor (EGFR) tyrosine kinase inhibitor. (F) BRD-K20197338: a linear amide-containing molecule.
Figure 3. Intersection analysis of autism-related genes from GeneCards and DEGs, and functional enrichment. (A) Venn diagram illustrating the overlap between autism-related genes from GeneCards (relevance score > 10, n = 2715) and the DEGs identified in this study; the intersection yielded 70 candidate genes. (B–E) Functional enrichment analysis of the intersecting genes: Gene Ontology (GO) biological process (BP), cellular component (CC), and molecular function (MF) terms, and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Detailed enriched terms and pathways are shown.
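The intersection in panel (A) reduces to a score filter followed by a set intersection. The sketch below assumes the GeneCards relevance scores are available as a gene-to-score mapping and applies the strict "score > 10" cutoff from the caption; the gene names and scores are hypothetical placeholders, not values from the study.

```python
def autism_related(scores: dict[str, float], min_score: float = 10.0) -> set[str]:
    """Genes whose GeneCards score is strictly above the cutoff."""
    return {gene for gene, s in scores.items() if s > min_score}


def venn_counts(a: set[str], b: set[str]) -> tuple[int, int, int]:
    """Sizes of the three regions of a two-set Venn diagram: only a, shared, only b."""
    return len(a - b), len(a & b), len(b - a)


# Hypothetical placeholder data
toy_scores = {"GENE_A": 25.0, "GENE_B": 10.0, "GENE_C": 12.5, "GENE_D": 3.1}
toy_degs = {"GENE_A", "GENE_B", "GENE_D", "GENE_E"}

shared = autism_related(toy_scores) & toy_degs  # candidate genes for enrichment
print(sorted(shared), venn_counts(autism_related(toy_scores), toy_degs))
```

GENE_B, with a score of exactly 10, is excluded by the strict inequality, so only GENE_A survives as a candidate in this toy case.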
Figure 4. Identification of the top 10 feature genes for autism prediction by random forest analysis. (A) Feature importance of all candidate genes ranked by MeanDecreaseGini; the 10 top contributors, highlighted in red, form a compact subset that contributes most to the model's decisions. (B) Confusion matrix summarizing predictions on the evaluation set, showing counts of correct classifications and misclassifications for the Control and Autism classes. (C) ROC curve of the random forest classifier (AUC = 0.762), reflecting moderate discriminative performance. (D) Out-of-bag (OOB) error as a function of the number of trees; the error stabilizes as the forest grows, indicating that the number of trees is sufficient for stable predictions.
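MeanDecreaseGini in panel (A) accumulates, for each gene, how much the splits on that gene reduce Gini impurity across the forest. The sketch below shows that bookkeeping on hypothetical split records. It weights each child node by its share of the parent's samples, whereas implementations such as R's randomForest may scale the decreases differently, so only the idea, not the absolute values, carries over.

```python
from collections import Counter


def gini(labels: list[str]) -> float:
    """Gini impurity 1 - sum_k p_k^2 over the class proportions in a node."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent: list[str], left: list[str], right: list[str]) -> float:
    """Impurity decrease of one binary split, children weighted by sample share."""
    n = len(parent)
    return gini(parent) - len(left) / n * gini(left) - len(right) / n * gini(right)


def mean_decrease_gini(splits, n_trees: int) -> dict[str, float]:
    """Average over trees of the summed impurity decrease per splitting feature.

    splits: iterable of (feature, parent, left, right) records pooled from all trees.
    """
    totals = Counter()
    for feature, parent, left, right in splits:
        totals[feature] += gini_decrease(parent, left, right)
    return {feature: total / n_trees for feature, total in totals.items()}


# Hypothetical split records: a perfect split and a weak split
perfect = (["ASD", "ASD", "Ctrl", "Ctrl"], ["ASD", "ASD"], ["Ctrl", "Ctrl"])
mixed = (["ASD"] * 3 + ["Ctrl"] * 3, ["ASD", "ASD", "Ctrl"], ["ASD", "Ctrl", "Ctrl"])
splits = [("GENE_A", *perfect), ("GENE_B", *mixed), ("GENE_A", *mixed)]

importance = mean_decrease_gini(splits, n_trees=2)
print(sorted(importance.items(), key=lambda kv: kv[1], reverse=True))
```

A gene that repeatedly produces clean splits (GENE_A here) accumulates a larger score than one that only produces weak splits, which is what the ranking in panel (A) reflects.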
Figure 5. Immune infiltration correlation and diagnostic ROC analysis of key genes in autism. (A) Pairwise Spearman correlations between the top candidate genes and immune cell subsets. The heatmap reveals heterogeneous associations across innate and adaptive compartments; significant correlations are marked, suggesting links between specific genes and immune infiltration patterns. (B) Receiver operating characteristic (ROC) curves for the ten marker genes, each assessed individually as a diagnostic classifier. Several genes reach acceptable performance, with areas under the curve (AUCs) around or above 0.70; the strongest performer, MGAT4C, achieves an AUC of 0.730, while the others cluster between approximately 0.62 and 0.71.
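The two panels rest on two standard statistics: Spearman's rank correlation for panel (A) and the area under the empirical ROC curve for panel (B). The Python sketch below implements both from scratch, using average ranks for ties and computing the AUC through its Mann–Whitney interpretation (the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half), which equals the trapezoidal area under the empirical ROC curve. Inputs are hypothetical toy values; the study's immune-cell estimates and expression data are not reproduced.

```python
from math import sqrt


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values receive the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def auc(case_scores: list[float], control_scores: list[float]) -> float:
    """P(random case > random control), ties counted as 1/2."""
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


print(round(spearman([1, 2, 3, 4, 5], [5, 6, 7, 8, 7]), 4))
print(round(auc([0.9, 0.8, 0.4], [0.5, 0.3]), 4))
```

For a gene that is lower in ASD than in controls, negating its expression values before calling `auc` orients the score so that values above 0.5 indicate discrimination.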
Table 1. Identification and functional analysis of DEGs.
| Ontology | ID | Description | Gene Ratio | Bg Ratio | p Value | p Adjust | Z Score |
|---|---|---|---|---|---|---|---|
| BP | GO:0042391 | regulation of membrane potential | 11/67 | 425/18,800 | 2.85 × 10⁻⁷ | 0.0004 | 0.30151 |
| BP | GO:0035637 | multicellular organismal signaling | 7/67 | 164/18,800 | 1.89 × 10⁻⁶ | 0.0015 | 1.8898 |
| BP | GO:0019226 | transmission of nerve impulse | 4/67 | 73/18,800 | 0.0001 | 0.0554 | 2 |
| BP | GO:0001508 | action potential | 5/67 | 143/18,800 | 0.0002 | 0.0554 | 1.3416 |
| BP | GO:0035115 | embryonic forelimb morphogenesis | 3/67 | 31/18,800 | 0.0002 | 0.0554 | −0.57735 |
| CC | GO:0097060 | synaptic membrane | 11/70 | 373/19,594 | 8.13 × 10⁻⁸ | 1.54 × 10⁻⁵ | 0.30151 |
| CC | GO:0045211 | postsynaptic membrane | 9/70 | 271/19,594 | 5.05 × 10⁻⁷ | 4.77 × 10⁻⁵ | −0.33333 |
| CC | GO:1902495 | transmembrane transporter complex | 8/70 | 377/19,594 | 5.82 × 10⁻⁵ | 0.0033 | −0.70711 |
| CC | GO:0034702 | ion channel complex | 7/70 | 294/19,594 | 8.52 × 10⁻⁵ | 0.0033 | −0.37796 |
| CC | GO:1990351 | transporter complex | 8/70 | 399/19,594 | 8.64 × 10⁻⁵ | 0.0033 | −0.70711 |
| MF | GO:0022836 | gated channel activity | 9/65 | 340/18,410 | 2.89 × 10⁻⁶ | 0.0004 | −1 |
| MF | GO:0005216 | ion channel activity | 10/65 | 442/18,410 | 3.16 × 10⁻⁶ | 0.0004 | −0.63246 |
| MF | GO:0015267 | channel activity | 10/65 | 489/18,410 | 7.71 × 10⁻⁶ | 0.0005 | −0.63246 |
| MF | GO:0022803 | passive transmembrane transporter activity | 10/65 | 490/18,410 | 7.84 × 10⁻⁶ | 0.0005 | −0.63246 |
| MF | GO:0004714 | transmembrane receptor protein tyrosine kinase activity | 4/65 | 60/18,410 | 5.95 × 10⁻⁵ | 0.0030 | 0 |
| KEGG | hsa05033 | Nicotine addiction | 4/38 | 40/8164 | 3.23 × 10⁻⁵ | 0.0038 | −1 |
| KEGG | hsa04727 | GABAergic synapse | 4/38 | 89/8164 | 0.0007 | 0.0301 | −1 |
| KEGG | hsa04020 | Calcium signaling pathway | 6/38 | 240/8164 | 0.0008 | 0.0301 | −0.8165 |
| KEGG | hsa04724 | Glutamatergic synapse | 4/38 | 114/8164 | 0.0018 | 0.0454 | −2 |
| KEGG | hsa04726 | Serotonergic synapse | 4/38 | 115/8164 | 0.0019 | 0.0454 | −1 |
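The Gene Ratio and Bg Ratio columns of Table 1 contain everything needed to recompute the p Value column under a one-sided hypergeometric (over-representation) test, the test used by common enrichment tools such as clusterProfiler; that the authors used exactly this test is an assumption. The Z Score values are consistent with the GOplot-style definition z = (up − down)/√count, where up and down count the upregulated and downregulated genes annotated to a term; this too is an inference from the numbers rather than a statement from the study. The helper names below are hypothetical.

```python
from math import comb, sqrt


def parse_ratio(ratio: str) -> tuple[int, int]:
    """Split a ratio such as '425/18,800' into integers, ignoring thousands separators."""
    numerator, denominator = ratio.replace(",", "").split("/")
    return int(numerator), int(denominator)


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n genes are drawn from N background genes, K of which carry the term."""
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, min(n, K) + 1))
    return hits / comb(N, n)  # exact integer arithmetic until the final division


def goplot_zscore(up: int, down: int) -> float:
    """(up - down) / sqrt(count); positive when most genes in the term are upregulated."""
    return (up - down) / sqrt(up + down)


# Two rows of Table 1: (term, Gene Ratio, Bg Ratio)
rows = [("GO:0042391", "11/67", "425/18,800"), ("hsa05033", "4/38", "40/8164")]
for term, gene_ratio, bg_ratio in rows:
    k, n = parse_ratio(gene_ratio)  # genes in term / annotated input genes
    K, N = parse_ratio(bg_ratio)    # term size / annotated background genes
    print(term, f"p = {hypergeom_upper_tail(k, n, K, N):.3g}")

# Under this definition, a Z Score of 1.8898 for a 7-gene term implies 6 up and 1 down
print(round(goplot_zscore(6, 1), 4))
```

The printed p values can be compared with the p Value column. The p Adjust column cannot be recomputed from these rows alone, because a multiple-testing correction depends on the p values of every term tested.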
Table 2. CMap-based prediction of small molecule compounds.
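| No. | pert_id (perturbagen) | pert_idose (dose) | norm_cs (normalized connectivity score) |
|---|---|---|---|
| 1 | BRD-K42400758 | 10 µM | −1.5654 |
| 2 | BRD-K06328518 | 4 µM | −1.5366 |
| 3 | BRD-K33368320 | 4 µM | −1.533 |
| 4 | metoprolol | 10 µM | −1.5314 |
| 5 | afatinib | 0.01 µM | −1.5266 |
| 6 | BRD-K20197338 | 20 µM | −1.5233 |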
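Table 2 ranks compounds by norm_cs, where a more negative value indicates that a compound's expression profile runs opposite to the ASD query signature. To illustrate the idea, the sketch below implements the simpler, unweighted Kolmogorov–Smirnov connectivity score of the original (2006) Connectivity Map. It is a swapped-in, simplified technique: the norm_cs values in the table come from the newer CLUE/L1000 pipeline, which uses a weighted statistic plus further normalization, so this sketch will not reproduce them. The ranked gene list and the up/down tag sets are hypothetical toy inputs.

```python
def ks_score(tag_positions: list[int], n: int) -> float:
    """Unweighted KS statistic for tag genes at 1-based positions in a ranked list of n genes.

    Positive when the tags sit near the top of the list, negative when near the bottom.
    Requires at least one tag position.
    """
    v = sorted(tag_positions)
    t = len(v)
    a = max(j / t - v[j - 1] / n for j in range(1, t + 1))
    b = max(v[j - 1] / n - (j - 1) / t for j in range(1, t + 1))
    return a if a > b else -b


def connectivity_score(ranked_genes: list[str], up_tags: set[str], down_tags: set[str]) -> float:
    """Raw connectivity score of one compound profile against a query signature.

    ranked_genes runs from most up-regulated to most down-regulated by the compound;
    up_tags and down_tags are the genes up- and down-regulated in the disease query.
    """
    position = {gene: i + 1 for i, gene in enumerate(ranked_genes)}
    n = len(ranked_genes)
    ks_up = ks_score([position[g] for g in up_tags if g in position], n)
    ks_down = ks_score([position[g] for g in down_tags if g in position], n)
    if ks_up * ks_down > 0:  # both tag sets shifted the same way: no consistent connection
        return 0.0
    return ks_up - ks_down


# Hypothetical toy profile: G1 is most induced by the compound, G10 most repressed
profile = [f"G{i}" for i in range(1, 11)]
reversal = connectivity_score(profile, up_tags={"G9", "G10"}, down_tags={"G1", "G2"})
mimicry = connectivity_score(profile, up_tags={"G1", "G2"}, down_tags={"G9", "G10"})
print(round(reversal, 3), round(mimicry, 3))
```

A negative score, as in the reversal case, is the pattern that CMap-based repurposing screens look for; all compounds in Table 2 carry negative norm_cs values.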
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wang, H.; Zhu, X.; Zhang, H.; Chen, W. Integrating Network Analysis and Machine Learning Identifies Key Autism Spectrum Disorder Genes Linked to Immune Dysregulation and Therapeutic Targets. Genes 2025, 16, 1109. https://doi.org/10.3390/genes16091109

AMA Style

Wang H, Zhu X, Zhang H, Chen W. Integrating Network Analysis and Machine Learning Identifies Key Autism Spectrum Disorder Genes Linked to Immune Dysregulation and Therapeutic Targets. Genes. 2025; 16(9):1109. https://doi.org/10.3390/genes16091109

Chicago/Turabian Style

Wang, Haitang, Xiaofeng Zhu, Hong Zhang, and Weiwei Chen. 2025. "Integrating Network Analysis and Machine Learning Identifies Key Autism Spectrum Disorder Genes Linked to Immune Dysregulation and Therapeutic Targets" Genes 16, no. 9: 1109. https://doi.org/10.3390/genes16091109

APA Style

Wang, H., Zhu, X., Zhang, H., & Chen, W. (2025). Integrating Network Analysis and Machine Learning Identifies Key Autism Spectrum Disorder Genes Linked to Immune Dysregulation and Therapeutic Targets. Genes, 16(9), 1109. https://doi.org/10.3390/genes16091109

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
