Blood Proteomics Analysis Reveals Potential Biomarkers and Convergent Dysregulated Pathways in Autism Spectrum Disorder: A Pilot Study

Autism spectrum disorder (ASD) is an umbrella term that encompasses several disabling neurodevelopmental conditions. These conditions are characterized by impaired manifestation in social and communication skills with repetitive and restrictive behaviors or interests. Thus far, there are no approved biomarkers for ASD screening and diagnosis; also, the current diagnosis depends heavily on a physician’s assessment and family’s awareness of ASD symptoms. Identifying blood proteomic biomarkers and performing deep blood proteome profiling could highlight common underlying dysfunctions between cases of ASD, given its heterogeneous nature, thus laying the foundation for large-scale blood-based biomarker discovery studies. This study measured the expression of 1196 serum proteins using proximity extension assay (PEA) technology. The screened serum samples included ASD cases (n = 91) and healthy controls (n = 30) between 6 and 15 years of age. Our findings revealed 251 differentially expressed proteins between ASD and healthy controls, of which 237 proteins were significantly upregulated and 14 proteins were significantly downregulated. Machine learning analysis identified 15 proteins that could be biomarkers for ASD with an area under the curve (AUC) = 0.876 using support vector machine (SVM). Gene Ontology (GO) analysis of the top differentially expressed proteins (TopDE) and weighted gene co-expression analysis (WGCNA) revealed dysregulation of SNARE vesicular transport and ErbB pathways in ASD cases. Furthermore, correlation analysis showed that proteins from those pathways correlate with ASD severity. Further validation and verification of the identified biomarkers and pathways are warranted.


Introduction
Autism spectrum disorder (ASD) is an umbrella term that encompasses various disabling neurodevelopmental conditions. These conditions are characterized by impaired manifestation in social and communication skills with repetitive and restrictive behaviors

ASD Proteome Profiling Reveals Altered Protein Expression in ASD Individuals Compared to HCs
The main aim of this study is to identify differentially expressed proteins in peripheral blood between ASD cases and HCs. These proteins could be used for proteomics blood profiling, pathway enrichment analysis, and biomarker discovery. A total of 1124 proteins passed the quality control measures and were further used for downstream analysis. Similarly, sample outliers were removed prior to the downstream analysis, as shown in Figure S1. Differentially expressed proteins were identified using Limma package in R, and the model was adjusted for covariates (age and sex). A total of 251 proteins were significantly differentially expressed between ASD cases and HCs (BH-adjusted p-value < 0.05), of which 237 were upregulated and 14 were downregulated. To establish a list of the top differential expression (TopDE), a protein was listed if it fulfilled the following Figure 1. Study overview. Serum samples from 91 ASD cases and 30 healthy controls (HCs) were collected along with the participants' demographical information. The serum samples were analyzed using 13 Olink target panels. Different bioinformatic and statistical analysis approaches were deployed for blood proteome profiling to identify potential biomarkers for ASD diagnosis and dysregulated pathways. (Created with www.BioRender.com (accessed on 18 March 2023).) Table 1. Participants' demographical information summary.

ASD Proteome Profiling Reveals Altered Protein Expression in ASD Individuals Compared to HCs
The main aim of this study is to identify differentially expressed proteins in peripheral blood between ASD cases and HCs. These proteins could be used for proteomics blood profiling, pathway enrichment analysis, and biomarker discovery. A total of 1124 proteins passed the quality control measures and were further used for downstream analysis. Similarly, sample outliers were removed prior to the downstream analysis, as shown in Figure S1. Differentially expressed proteins were identified using Limma package in R, and the model was adjusted for covariates (age and sex). A total of 251 proteins were significantly differentially expressed between ASD cases and HCs (BH-adjusted p-value < 0.05), of which 237 were upregulated and 14 were downregulated. To establish a list of the top differential expression (TopDE), a protein was listed if it fulfilled the following criteria: (a) it exhibited a BH-adjusted p-value < 0.05, (b) it showed a minimum of 2-fold difference between ASD cases and HCs. As illustrated in Figure 2a, a total of 53 upregulated proteins and one downregulated protein were listed in the TopDE. The results of the TopDE proteins are depicted in a supervised heatmap (Figure 2b). Further details of all the significantly expressed proteins are listed in Table S1a,b. criteria: (a) it exhibited a BH-adjusted p-value < 0.05, (b) it showed a minimum of 2-fold difference between ASD cases and HCs. As illustrated in Figure 2a, a total of 53 upregulated proteins and one downregulated protein were listed in the TopDE. The results of the TopDE proteins are depicted in a supervised heatmap (Figure 2b). Further details of all the significantly expressed proteins are listed in Table S1a,b.

Gene Ontology Enrichment Analysis of the Differentially Expressed Proteins
Gene Ontology (GO) enrichment analysis of the top upregulated proteins (n = 53) indicated an overrepresentation of various biological processes that include response to stress, immune response, signal transduction, protein phosphorylation, cell-cell communication, apoptosis, and vesicular transport. Cell components showed an overrepresentation of a wide range of intra-and extracellular vesicles, as well as sorting endosomes and exosomes. KEGG database analysis showed enrichment in proteins associated with SNARE interactions in vesicular transport and the ErbB signaling pathway. Furthermore, REACTOME database analysis showed an overrepresentation of various signaling pathways such as signaling by non-receptor tyrosine kinase, signaling by PTK6, regulation of signaling by CBL, and signaling by EGFR; these pathways share the following proteins: EGF, EREG, CBL, and PTPN1. Since there is only one downregulated protein (MFAP5) in the TopDE list, we did not perform GO on that protein; however, it is known that MFAP5 is involved in immune system processes and localized extracellularly [26]. A summary of the top GO terms in the upregulated list is shown in Figure 2c and Table S2.

Machine Learning Identifies Potential Biomarkers for ASD Diagnosis
To demonstrate that the proteins in the TopDE list can be used as biomarkers for predicting the diagnosis, we employed machine learning algorithms. We first performed variable selection to obtain a short list of the best predictors and then evaluated this

Gene Ontology Enrichment Analysis of the Differentially Expressed Proteins
Gene Ontology (GO) enrichment analysis of the top upregulated proteins (n = 53) indicated an overrepresentation of various biological processes that include response to stress, immune response, signal transduction, protein phosphorylation, cell-cell communication, apoptosis, and vesicular transport. Cell components showed an overrepresentation of a wide range of intra-and extracellular vesicles, as well as sorting endosomes and exosomes. KEGG database analysis showed enrichment in proteins associated with SNARE interactions in vesicular transport and the ErbB signaling pathway. Furthermore, REAC-TOME database analysis showed an overrepresentation of various signaling pathways such as signaling by non-receptor tyrosine kinase, signaling by PTK6, regulation of signaling by CBL, and signaling by EGFR; these pathways share the following proteins: EGF, EREG, CBL, and PTPN1. Since there is only one downregulated protein (MFAP5) in the TopDE list, we did not perform GO on that protein; however, it is known that MFAP5 is involved in immune system processes and localized extracellularly [26]. A summary of the top GO terms in the upregulated list is shown in Figure 2c and Table S2.

Machine Learning Identifies Potential Biomarkers for ASD Diagnosis
To demonstrate that the proteins in the TopDE list can be used as biomarkers for predicting the diagnosis, we employed machine learning algorithms. We first performed variable selection to obtain a short list of the best predictors and then evaluated this biomarker signature using a variety of classification approaches to distinguish between ASD cases and healthy controls (for further information, refer to Section 4.6).
We applied minimal unbiased variable selection (MUVR) [27] and Boruta [28] for feature selection using the TopDE list; the list of the TopDE proteins resembled a univariant pre-filter for the dataset. MUVR selected all proteins in the TopDE list for optimal classification ( Figure S1a). Furthermore, Boruta selected only 15 proteins as the best predictors for the diagnosis. To minimize the number of biomarkers and to increase our confidence, we selected the overlapping proteins between MUVR and Boruta. As shown in Figure 3a, 15 proteins overlapped between MUVR and Boruta. Interestingly, these 15 overlapping proteins were the top-ranked proteins in the MUVR list (most informative for differentiating ASD cases from HCs). Details of these 15 proteins are listed in Table 2, and these proteins will be referred to as "Panel A". Subsequently, the diagnostic performance of Panel A was tested using multiple multivariant supervised machine learning algorithms (random forest, generalized linear model-net, and support vector machine (SVM)). The Panel A dataset was internally validated with four-fold cross-validations and 250 repeats. The result of the machine learning showed that, overall, SVM slightly outperformed RF and generalized linear model-net models (see Figure S1b,c). Panel A showed the following performance characteristics: AUC = 0.876, accuracy = 91.4%, sensitivity = 99.9%, specificity = 66.2% (Figure 3b). The receiver operating characteristic (ROC) curve of Panel A is shown in Figure 3c. Therefore, Panel A proteins can be used as potential diagnostic biomarkers for ASD. fication ( Figure S1a). Furthermore, Boruta selected only 15 proteins as the best predic for the diagnosis. To minimize the number of biomarkers and to increase our confide we selected the overlapping proteins between MUVR and Boruta. As shown in Figure  15 proteins overlapped between MUVR and Boruta. Interestingly, these 15 overlapp proteins were the top-ranked proteins in the MUVR list (most informative for differe ating ASD cases from HCs). Details of these 15 proteins are listed in Table 2, and th proteins will be referred to as "Panel A". Subsequently, the diagnostic performanc Panel A was tested using multiple multivariant supervised machine learning algorith (random forest, generalized linear model-net, and support vector machine (SVM)). Panel A dataset was internally validated with four-fold cross-validations and 250 repe The result of the machine learning showed that, overall, SVM slightly outperformed and generalized linear model-net models (see Figure S1b,c). Panel A showed the follow performance characteristics: AUC = 0.876, accuracy = 91.4%, sensitivity = 99.9%, specifi = 66.2% (Figure 3b). The receiver operating characteristic (ROC) curve of Panel A is sho in Figure 3c. Therefore, Panel A proteins can be used as potential diagnostic biomar for ASD.

Proteome Co-Expression Network Analysis Uncovers Modules That Correlate with ASD Clinical Traits
Weighted gene co-expression network analysis (WGCNA) was conducted on all the proteins. This algorithm divides the proteins into different modules based on their coexpression pattern [29,30]. Thus, we constructed a co-expression model using all ASD cases and HCs; eight modules were generated with different numbers of proteins and various levels of correlation with the diagnostic status, ASD versus HC, and the ADOS-2 score. These modules are M1 green, M2 pink, M3 magenta, M4 blue, M5 black, M6 brown, M7 red, and M8 grey ( Figure 4a, Table S3). Three modules showed a significant correlation with the diagnostic status (M1, M5, and M7) ( Figure 4b). The magenta module (M3) showed a significant correlation with the ADOS-2 score (Figure 4b). A scatter plot of the protein significance versus the module membership of M1, M3, M5, and M7 (the modules that correlated with the clinical traits) exhibited a significant correlation ( Figure 4c); however, the magenta M3 module did not show a high level of significance.
Int. J. Mol. Sci. 2023, 24, x FOR PEER REVIEW 7 of 16 and brain development [31]. In contrast, the red M7 module was dominated by immunerelated terms such as cytokine-cytokine receptor interaction; interleukin signaling (i.e., IL-17 and IL-10); inflammation; and decreased production of antibodies, autoimmunity, and immune cell proliferation and migration. Unlike M1, M5, and M7 modules, magenta M3 is enriched with semaphorin receptor activity. Some of the significantly downregulated proteins (MET and PLXNB2) that were co-expressed in this module play a crucial role in semaphorin function. Semaphorins are involved in axonal guidance and are key regulators of the motility and morphology of the neurons [32].
Overall, WGCNA uncovered clusters of co-expressed proteins enriched in some terms directly related to neuronal functions (M1, M3, and M5). In addition, WGCNA modules (M1, M5) showed a consistent pattern of GO and pathway enrichments that was also detected in TopDE with a consistent directionality for ASD cases versus HCs. For instance, SNARE vesicular transport and ErbB signaling pathway were upregulated in ASD cases; semaphorins were downregulated in ASD cases. For each module, the eigenprotein and its hub proteins are plotted in Figure 4d. Enrichment of immune-related function was evidenced in all the significant modules and the TopDE list; however, it largely dominated the red M7 module.  . Protein significance versus module membership exhibits a significant correlation. Module membership is each protein's expression level correlated with the module's eigengene/epiprotein (the first principal component). For a protein to be assigned to a particular module, its module membership value must be close to 1 or −1. (d) Boxplots of the eigenprotein (the first principal component value of each sample) between ASD cases and HCs in the significant modules, and the boxes on the right show the hub proteins in each module; t-test was used to assess the significance between the two groups (p-value < 0.05).
To investigate the functions of the significantly correlated modules, GO enrichment analysis was performed for each module (data not shown). Interestingly, the results showed that the green M1 module, which harbors the majority of TopDE proteins, has a similar GO enrichment to that of the TopDE list. It included SNARE vesicular transport, ErbB signaling pathways (as well as other signaling pathways), immune cell activation, and inflammatory markers. Importantly, the green module showed further overrepresentation of neuralrelated ontologies such as neuronal death, FOX-mediated transcription of neuronal genes, NAD+ metabolism, and cell leading edge, a biological process involved in the growth of axons and dendrites. Six proteins from this module are from the TopDE list (CASP2, PLXNA4, CBL, BID, NMNAT1, and NADK). Furthermore, several developmental process themes such as tub, embryo, vasculature, and epithelial development, are enriched in M1. Although the black M5 module showed a similar overall pattern to M1 GO enrichment, it showed overrepresentation in cerebellum proteins in Human Proteome Atlas (HPA) category. The black M5 module exhibited enrichment in other neuronal-related processes, such as neuregulin and cellular response to beta-amyloid, as well as the activation of matrix metalloproteinases, which plays a crucial role in neuroinflammation and brain development [31]. In contrast, the red M7 module was dominated by immune-related terms such as cytokine-cytokine receptor interaction; interleukin signaling (i.e., IL-17 and IL-10); inflammation; and decreased production of antibodies, autoimmunity, and immune cell proliferation and migration.
Unlike M1, M5, and M7 modules, magenta M3 is enriched with semaphorin receptor activity. Some of the significantly downregulated proteins (MET and PLXNB2) that were co-expressed in this module play a crucial role in semaphorin function. Semaphorins are involved in axonal guidance and are key regulators of the motility and morphology of the neurons [32].
Overall, WGCNA uncovered clusters of co-expressed proteins enriched in some terms directly related to neuronal functions (M1, M3, and M5). In addition, WGCNA modules (M1, M5) showed a consistent pattern of GO and pathway enrichments that was also detected in TopDE with a consistent directionality for ASD cases versus HCs. For instance, SNARE vesicular transport and ErbB signaling pathway were upregulated in ASD cases; semaphorins were downregulated in ASD cases. For each module, the eigenprotein and its hub proteins are plotted in Figure 4d. Enrichment of immune-related function was evidenced in all the significant modules and the TopDE list; however, it largely dominated the red M7 module.

SNARE Vesicular Transport Pathway and Axon Regeneration Proteins Correlate with ASD Severity
To identify proteins that correlate with ASD severity, we assessed the correlation between the ADOS-2 score, a tool for evaluating ASD diagnosis and severity across different ages and developmental stages, with the expression level (NPX values) of the proteins. The results (presented in Table S4a) showed that 64 proteins had a significantly moderate positive correlation and 7 proteins had a significantly moderate negative correlation with the ADOS-2 score. Eight proteins showed a correlation coefficient ≥0.4, namely FKBP1B, 8 of 16 ANXA11, HMOX2, CD40, ANGPTL4, CCL14, ABHD14B, and COL18A1, and the first four were significantly upregulated in ASD cases (Figure 5a,b shows the top positive and negative correlated proteins). Subsequently, GO analysis was performed on the proteins with a significant p-value and a correlation coefficient of ≥0.3 or ≤−0.3 in order to recognize steady and relevant pathways between the TopDE and the significantly correlated proteins. GO enrichment analysis revealed an overrepresentation of the SNARE vesicular transport pathway in the KEGG database. Interestingly, all three proteins involved in the SNARE pathway (SNAP29, SNAP23, and STX8) had a significant yet weak to moderate positive correlation with the ADOS-2 scores and a correlation coefficient in the range of 0.29 to 0.34. The same three proteins were listed in the TopDE protein list (FC ≥ 2, BH-adjusted p-value < 0.05) ( Table S1a). Therefore, SNARE vesicular transport was found to be enriched in GO of both the TopDE protein list and the ADOS-2-correlated protein list. Scatter plots of the correlations are shown in Figure 5c.

Discussion
Blood-based biomarkers are urgently needed to objectively diagnose and better understand the pathophysiological mechanisms behind ASD. Our proteomics study offers a deep insight into the blood proteome of Qatari ASD individuals using multiple computational methods, as this study is the first of its kind to comprehensively measure 1196 blood proteins using all the target panels developed by Olink (Uppsala, Sweden). We observed an overall dysregulation of the blood proteome between ASD cases and HCs with more upregulated proteins in ASD (237) compared to their 14 downregulated counterparts. In addition, we identified 15 proteins, summarized in Table 2, that could be potential biomarkers for ASD diagnosis, although further validation studies are needed. These proteins gave a high AUC, accuracy, and sensitivity and a lower specificity, which might be attributed to the small sample size and the heterogeneity of ASD. Some of the differentially expressed proteins identified in this cohort have been previously reported. Two of our upregulated proteins (uPAR and ARSB) and one of the downregulated ones (PTN) have been found to be differentially expressed in the serum of ASD cases using the So-maLogic platform with a similar directionality [24]. Moreover, another study that tested the correlation between serum and CSF proteins using three Olink panels (inflammation, Furthermore, axon and neuron projection regeneration processes were enriched, and they included DAG1, SCARF1, FKBP1B, TN-R, ARHGAP1, PRDX5, TOP2B, and USP8 (Table S4), four of which (FKBP1B, ARHGAP1, PRDX5, and TOP2B) are listed in the TopDE protein list. On the other hand, the negatively correlated proteins, although less in number compared to the positively correlated ones, showed an overrepresentation of the extracellular region and the SH-2 domain (SKAP1 and SIT1), which is crucial for the protein kinase signaling cascade as it binds tyrosine-phosphorylated sequences in the proteins. In addition, the ADOS-2 scores were used to sub-group ASD individuals into three groups based on the severity levels (severe, moderate, and mild), and then the correlation was assessed using these three categories; the results are presented in Table S4b.

Discussion
Blood-based biomarkers are urgently needed to objectively diagnose and better understand the pathophysiological mechanisms behind ASD. Our proteomics study offers a deep insight into the blood proteome of Qatari ASD individuals using multiple computational methods, as this study is the first of its kind to comprehensively measure 1196 blood proteins using all the target panels developed by Olink (Uppsala, Sweden). We observed an overall dysregulation of the blood proteome between ASD cases and HCs with more upregulated proteins in ASD (237) compared to their 14 downregulated counterparts. In addition, we identified 15 proteins, summarized in Table 2, that could be potential biomarkers for ASD diagnosis, although further validation studies are needed. These proteins gave a high AUC, accuracy, and sensitivity and a lower specificity, which might be attributed to the small sample size and the heterogeneity of ASD. Some of the differentially expressed proteins identified in this cohort have been previously reported. Two of our upregulated proteins (uPAR and ARSB) and one of the downregulated ones (PTN) have been found to be differentially expressed in the serum of ASD cases using the SomaLogic platform with a similar directionality [24]. Moreover, another study that tested the correlation between serum and CSF proteins using three Olink panels (inflammation, cardiovascular I, and oncology I) found a total of 32 proteins that were associated with ASD diagnosis or autism score using a social responsiveness scale. Five of these proteins (AXIN1, NEMO, CD244, CASP8, DKK1, and SIRT2) were significantly expressed in our cohort [33]. In addition, IL-8 was found to be significantly upregulated in another study using the MesoScale Discovery platform, which is consistent with our findings [34]. Similarly, another study conducted on the postmortem brain of ASD showed upregulation of proinflammatory response markers IL-6, TNF-alpha, IL-8, and INF-Y [35], and the latter three markers were also significantly upregulated in our cohort. It is noteworthy that our study found global changes in immunerelated markers, which is supported by many proteomics and transcriptomics studies in both the blood and brain tissue [35][36][37][38]. In addition, although ASD is a very heterogeneous condition, this study identified convergent mechanisms between ASD subjects by applying multiple approaches that differ in their essences, such as differential expression analysis, WGCNA, and correlation analysis. These approaches allowed us to identify redundant pathways that seemed to be enriched and may be of particular importance for ASD. Herein, we will discuss these pathways and their relevance to ASD.
We demonstrated that the ErbB (epidermal growth factor receptor signaling family) pathway was enriched in ASD cases compared to HCs ( Figure S2a). ErbB belongs to the receptor tyrosine kinase family, which is important for cell proliferation, differentiation, growth, and migration [39]. Our study revealed that three ligands of this pathway (EGF, EREG, and TNF-alpha) were significantly upregulated in ASD cases; two of these ligands (EGF and EREG) were also found in the diagnostic biomarker list (Panel A). EGF plays a crucial role in the growth of the midbrain during embryogenesis, and it enhances dopamine uptake and dopaminergic neuron survival [40]. A study that measured EGF in the serum of ASD subjects showed that it is significantly elevated in ASD compared to healthy controls [41], which is consistent with our findings. Another study subcutaneously administrated EGF in neonatal mice, and this caused a range of neurological changes that included a reduction in social interaction and motor activity [42]. Although EGF did not correlate with the ADOS-2 score in our cohort, EREG did show a moderate positive correlation. Even though EREG has not been reported previously in the context of ASD, a study showed that overexpression of EREG in the brain could lead to a brain tumor, and it was involved in tumor exacerbation in a glioblastoma cell line [43]. Interestingly, TNF-alpha, which is another ligand for the ErbB signaling pathway, was elevated in the postmortem brain and serum of ASD cases compared to HCs [35,41]. TNF-alpha has a crucial role in initiating inflammation and maintaining tolerance to self-antigens by controlling lymphocyte survival and proliferation [44]. Furthermore, mutation and upregulation of the ErbB signaling receptors (i.e., EGFR) have been associated with different types of cancers, such as breast and colon cancer [45,46]. Thus, the role of the ErbB signaling pathway and its ligands (EGF, EREG, and TNF-alpha) needs further investigation in the context of ASD.
The second pathway highlighted by the proteomic profiling is the SNARE vesicular transport pathway. Three proteins of the SNARE vesicular transport pathway (SNAP29, SNAP23, and STX8) were significantly upregulated in ASD cases compared to the HCs ( Figure S2b). In addition, these proteins showed a moderate correlation with ASD severity score and were co-expressed in the same module in WGCNA. In vertebrates, SNAP29 and SNAP23 belong to the SNAP25 protein subfamily. SNAP23 and SNAP25 proteins regulate exocytosis and can compensate for each other according to a cross-rescue experiment, as SNAP23 overexpression can rescue SNAP25 function in a SNAP25 KO model [47]. Although SNAP25 is almost exclusively expressed in neurons, SNAP23 can be expressed in neurons (neuronal cell body, somatodendritic compartment, and astrocytes) and other tissues of the body [48]. In addition, SNAP23 is localized in the plasma membrane and predominantly expressed in the post-synapse. Given its importance in a wide range of cell types, SNAP23 gene deletion leads to embryonic lethality in mice [49]. On the other hand, SNAP29 is prominently localized in the endosomal system and Golgi and is expressed at lower levels in the synaptic vesicles [50]. Overexpression of SNAP29 in a SNAP25 KO model showed that SNAP29 compensated for SNAP25 function; however, it caused a slower synaptic release and resulted in smaller neurons with fewer synapses [47]. Genetic studies have linked a specific allele polymorphism and copy number variants of SNAP29 to neuropsychiatric disorders such as schizophrenia [51]. SNAP25 subfamily proteins interact with a broad spectrum of syntaxin (STX) proteins to facilitate endosomal fusion [48]. In our cohort, STX8 was in the TopDE protein list. Mutations in SNARopathy genes can lead to diverse neurodevelopmental, cognitive, and motor delays and autistic traits [52].
It is important to mention that for 18 proteins differentially expressed in our cohort, their corresponding genes have been associated with ASD according to the Simons Foundation Autism Research Initiative (SFARI) genetic database. Two of these are from the TopDE list. The first is TOP2B (FC = 2.9). This gene encodes DNA topoisomerase protein, an enzyme that controls the topological landscape of the DNA during transcription; in addition, it catalyzes the breaking and rejoining of the two-strand DNA fragments [53]. According to SFARI, TOP2B was identified as a candidate gene for ASD as it has been reported to involve de novo mutations. It has been shown that knocking down TOP2B resulted in significantly reducing the transcription rate of long genes in neurons [54]. The ramification of TOB2P upregulation needs to be studied in the context of ASD. The second protein is PLXNA4 (FC = 2.4). This protein plays an essential role in mediating semaphorin signals and thus mediating axonal guidance during development [55]. Furthermore, copy number variants have been reported to encompass PLXNA4 in two unrelated cases of ASD [56].
It should be noted that this study was limited by the small sample size, particularly in the HC group, in addition to the male-to-female ratio imbalance between the ASD cases and HCs. The biomarkers identified in the present study showed a high AUC, accuracy, and sensitivity; however, the specificity was low, which might be caused by the small sample size and ASD heterogeneity. This sheds light on the imperative need for the stratification of ASD individuals based on their multi-omics profile, as this may help in identifying diagnostic biomarkers with high performance characteristics [57]. In addition, a validation cohort is needed to confirm our findings. Performing functional studies using animal models, organoids, and neuronal cell culture is essential to confirm and understand the pathophysiological role of these proteins in ASD in different dimensions. Finally, since our cohort was formed from the Qatari population, part of our future plans is to validate our findings using other populations.

Study Cohort
The study was performed according to the guidelines of the Declaration of Helsinki and was ethically approved (QBRI-IRB 2018-024). Written informed consent and assent were given by both parents and children, respectively, for all ASD individuals and healthy controls (HCs). The HCs' communication and social skills were evaluated using the Social Communication Questionnaire (SCQ). ASD individuals were clinically diagnosed using the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5), and their severity levels were assessed using the second edition of the Autism Diagnostic Observation Schedule (ADOS-2) test. Demographical information was also obtained from the participants during the visit and is summarized in Table 1.

Olink Proteomics Analysis and Data Pre-Processing
Human serum samples were collected, processed, and stored (frozen at −80 • C) at Qatar Biomedical Research Institute (QBRI), Qatar. Differential protein discovery was performed using the Olink platform (Uppsala, Sweden), which utilizes proximity extension assay (PEA) technology that combines dual recognition of proteins by pairs of antibodies labeled with a DNA oligonucleotide; when bound in close proximity, the dual antibody recognition allows the hybridization of the DNA probes. This step is followed by DNA amplification for each corresponding protein signal. The signals are in the form of cycle threshold (Ct). The Ct signals were then normalized and reported as normalized protein expression (NPX) values in a log2 scale, which is an arbitrary unit for the relative quantification of proteins across samples. Prior to running the panels, the serum samples were randomized and aliquoted in 96-well plates. The serum samples were analyzed using all 13 Olink target panels (neurology, development, neuro-exploratory, inflammation, immune response, cell regulation, organ damage, metabolism, oncology II, oncology III, cardiometabolic, cardiovascular II, cardiovascular III) available from Olink (Uppsala, Sweden). These panels combined cover a total of 1196 measured proteins. Olink's standard protocol was followed, and the runs were performed at QBRI in an Olink-certified proteomics core facility. Samples that passed the QC assessment were used for downstream analysis. In addition, proteins were excluded from the analysis if ≥30% of cases and controls were below the lower limit of detection (LLOD). Missing values were imputed using missForest package in R. Sample outliers were identified using principal component analysis (PCA) and hierarchal clustering; then they were excluded from the analysis as illustrated in Figure S3.

Statistical Analysis
Differentially expressed proteins were identified using the Limma (Linear Models for Microarray Data) package in R. The p -values of all the proteins were adjusted for multiple testing using the Benjamini-Hochberg (BH) method. In addition, age and sex were considered as covariates in the model. Top differentially expressed proteins (TopDE) were selected based on the fold change (FC ≥ 2) and BH-adjusted p-value (adj p-value ≤ 0.05). Correlation analysis was performed using cor.test() function, and proteins that exhibited a correlation coefficient of ≥0.3 or ≤−0.3 and a p-value < 0.05 were considered for further enrichment analysis. Furthermore, the t-test() function was used to test the significant difference in eigenprotein between ASD cases and HCs in the important WGCNA modules. The heatmap was constructed using Heatmap() function from the ComplexHeatmap package. ROC curve analysis was performed using the pROC package. All analyses were conducted in R Studio (version 1.3.1093).

Weighted Gene Co-Expression Analysis
Signed co-expression networks were built using the WGCNA package in R as described [30]. Weighted gene (protein) co-expression analysis is a method for describing the correlation patterns among genes (proteins) and for finding clusters (modules) of highly correlated genes (proteins). The generated clusters are summarized in the form of the eigengene (eigenprotein), which is the first principal component of each sample's expression data. A similarity correlation matrix was produced from all the proteins; then, an adjacency matrix was calculated using a soft-threshold power of 4, chosen based on the scale-free topology criteria. Subsequently, a topological overlap matrix (TOM) was constructed from the adjacency matrix to build the modules. The hierarchical cluster was built using the topological overlap dissimilarity (dissTOM). Then, the modules were depicted in the clustering tree. The module hierarchical cluster cut was performed at 0.35 to merge the closely related modules. A minimum of 30 proteins per module was set to generate a moderately large module size. Proteins that were not assigned to any modules were listed in the grey module.

Protein Enrichment Analysis
For enrichment analysis, g:Profiler (https://biit.cs.ut.ee/gprofiler (accessed on 27 November 2022)), an online web-server that performs statistical enrichment analysis using the p:GOSt function, was used with the default setting to detect overrepresented terms in Gene Ontology (GO), such as cellular components (CCs), molecular functions (MFs), and biological processes (BPs), as well as overrepresented pathways in KEGG and REACTOME databases. All the significantly enriched terms and their associated proteins were selected based on the Fisher exact test (FET) with an adjusted p-value < 0.05. The hub proteins were identified by extracting the proteins from each WGCNA module. Next, the proteins were plugged into STRING (version 11.5) to generate the interaction network files. To find the hub proteins, the extracted files were uploaded into CytoScape (version 3.9.1).

Feature Selection and ROC Curve Analysis
To identify a minimal set of biomarkers, we applied two feature selection methods. The first is the minimal unbiased variable selection (MUVR) algorithm, a machine learning technique that aims to identify the minimal set of variables that are required for the successful classification of a given dataset [27], and it was applied to the list of top differentially expressed proteins (TopDE). This algorithm employs a variety of techniques, including recursive variable elimination and repeated cross-validation, which help to minimize overfitting and improve the performance of the resulting model. Random forest (RF) was chosen, along with 21 outer and 28 inner cross-validation segments, and a variable ratio of 0.75. The list of minimal optimal subsets was selected.
The second method is Boruta [28], another feature selection method based on RF. Using statistical testing, it iteratively removes the features which are less relevant than randomized features. Boruta was also applied to the TopDE list, and the algorithm was run with maxRun = 1000 and all other parameters set to their default values.
To ensure the robustness of our selected variable set, further validation was performed using a variety of classification methods such as RF, support vector machines (SVMs), and generalized linear models (GLMs). We used the mlr3 (Machine Learning in R, version 3) framework to implement the classification methods. This framework provides a consistent interface for a wide range of machine learning tasks, including classification, regression, and clustering [58]. In addition, it ensures the reliability of the results. A four-fold crossvalidation repeated 250 times was used on the dataset. Various performance evaluation measures were used to assess the performance of the model: accuracy, true positive rate (sensitivity), true negative rate (specificity), and the area under the receiver operating characteristic curve (AUC).

Conclusions
Overall, this study measured more than 1100 blood proteins of ASD using PEA technology. Our differential expression analysis revealed a difference in blood proteome profile between ASD cases and HCs. In addition, 15 proteins were selected by machine learning as potential diagnostic biomarkers for ASD. Our study identified two pathways that were upregulated in ASD (SNARE vesicular transport and ErbB signaling pathway). These pathways were enriched in the ASD blood proteome profile despite the high heterogeneity of ASD. Therefore, they might play an essential role as common underlying pathophysiological mechanisms in ASD. A validation cohort should be tested to confirm these findings, and patient stratification using a multi-omics approach may be the way forward toward better understanding and characterizing ASD. Funding: Areej Mesleh is a PhD student funded by GSRA-QNRF (GSRA6-1-0616-19097). The funders had no role in the design and conduct of the study; in the collection, management, analysis, or interpretation of the data; in the preparation, review, or approval of the article; or in the decision to submit the article for publication.

Institutional Review Board Statement:
The study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board of Qatar Biomedical Research Institute (QBRI) (QBRI-IRB 2018-024).

Informed Consent Statement:
Informed consent and assent were obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.