1. Introduction
Transcription factors (TFs) play an essential role in regulating gene expression, controlling cell fate, proliferation, differentiation and apoptosis. One of the key TFs in hematopoiesis is PU.1, an ETS domain transcription factor encoded by SPI1. PU.1 is critical for the specification, differentiation and maturation of several blood cell lineages, including macrophages, B cells, monocytes, granulocytes and dendritic cells [1]. PU.1 binds purine-rich DNA motifs to regulate transcriptional programs and recruits co-factors that control chromatin accessibility during immune cell development [2].
Dysregulation of SPI1 has been associated with the pathogenesis of hematologic malignancies, especially myelodysplastic syndromes (MDS) and acute myeloid leukemia (AML) [3]. Overexpression of PU.1 can disrupt normal hematopoietic homeostasis, whereas reduced expression blocks differentiation and promotes leukemic stem cell self-renewal [4]. This dosage sensitivity underscores the importance of tightly regulated SPI1 levels for normal hematopoietic development.
Although several functional and structural domains of SPI1, including the C-terminal ETS DNA-binding domain, have been well characterized, how specific missense variants (nonsynonymous single nucleotide polymorphisms, nsSNPs) alter its structure and function remains largely unexplored. Missense mutations can alter protein folding, stability and interaction surfaces, thereby influencing TF activity. Given the role of SPI1 in hematopoietic balance, investigating the consequences of these mutations is crucial for understanding cancer progression and congenital immunodeficiency.
Advances in computational biology now enable large-scale functional assessment of missense variants. The impact of amino acid substitutions on protein stability and function can be predicted with algorithms such as MutationAssessor, PolyPhen-2, SIFT and I-Mutant 3.0. Structure-modeling tools such as Phyre2 and SWISS-MODEL, together with visualization software such as ChimeraX (v1.19), provide insight into how missense mutations may affect protein conformation, particularly in conserved domains. Post-translational modification (PTM) prediction tools such as ModPred and MusiteDeep indicate whether variants fall at sites of regulatory modifications such as methylation, acetylation, ubiquitination and phosphorylation. All of these processes are associated with SPI1 function during leukemogenesis and the immune response [5,6,7].
Evolutionary conservation highlights functionally important residues: residues that are highly conserved across species such as human, zebrafish and mouse are likely to serve essential biological functions. For instance, zebrafish (Danio rerio) is considered an excellent model organism for studying hematopoiesis during embryogenesis because of its ease of genetic manipulation, optical transparency and conserved genetic pathways [8]. Therefore, examining SPI1 mutations across species improves our understanding of their developmental and evolutionary relevance.
Furthermore, because PU.1 does not act in isolation, examining the SPI1 protein–protein interaction (PPI) network is essential. PU.1 cooperates with hematopoietic transcriptional regulators such as CEBPA, GATA1, RUNX1 and IRF4/8, all of which are crucial for the immune response and lineage specification [4,9,10]. Structural mutations may therefore disrupt these interactions, altering cell fate pathways or impairing immune competence.
Transcriptional data from The Cancer Genome Atlas (TCGA) have revealed abnormal SPI1 expression in leukemia. SPI1 is generally regarded as a tumor suppressor; however, because PU.1 acts in a dosage-dependent manner, increased expression may also indicate oncogenic transcriptional reprogramming [5]. Despite growing clinical interest in SPI1, comprehensive in silico analyses of its missense mutations remain limited. Most existing studies have focused on expression patterns or large deletions rather than on amino acid substitutions that could change protein function.
Cinnamic acid was selected as a fragment-like scaffold with known biological activity and favorable drug-like properties, intended for exploratory binding assessment rather than as a validated inhibitor [11]. Cinnamic acid (trans-3-phenyl-2-propenoic acid) is a natural phenylpropanoid derived from L-phenylalanine by phenylalanine ammonia-lyase and is a major intermediate in plant secondary metabolism [11]. Its structure comprises an aromatic ring conjugated to an α,β-unsaturated carboxylic acid group, which confers redox activity and the ability to form hydrogen bonds with biological targets. Cinnamic acid and its derivatives have been shown to exert antioxidant and anti-inflammatory effects, in part by regulating oxidative stress signaling and inflammatory mediators [12]. Cinnamic acid analogs also show antimicrobial activity against various bacterial strains, and in antiproliferative cancer research the phenylpropanoid skeleton is an attractive lead structure for medicinal chemistry optimization [13]. Cinnamic acid complies with Lipinski's rule of five and has physicochemical characteristics that support oral bioavailability [14]. Its low molecular weight and readily modifiable framework make it an appropriate fragment-like scaffold for computer-aided drug design.
In this study, cinnamic acid therefore serves as a fragment-like phenylpropanoid scaffold rather than as a pre-validated inhibitor of the PU.1 ETS domain. Its structural simplicity and documented biological activity make it a suitable candidate for exploratory docking, aimed at assessing initial structural compatibility with the ETS DNA-binding interface and providing a basis for future scaffold optimization. Combining (i) computational evaluation of SPI1 missense mutants, (ii) structural examination of the PU.1 ETS domain and (iii) ADMET-directed docking of cinnamic acid thus provides a framework for relating sequence variation, structural stability and ligandability hypotheses to hematopoietic dysregulation. This paper applies such a combined in silico approach to rank potentially deleterious SPI1 substitutions, map their locations within conserved functional domains, and evaluate cinnamic acid binding to the PU.1 ETS domain as a starting point for subsequent experimental validation and rational optimization.
2. Results
2.1. Identification of Deleterious nsSNPs in SPI1
In this section, nsSNPs that may interfere with the function or structure of the human SPI1 protein were identified using several in silico prediction tools, including PredictSNP, SIFT, MAPP, PolyPhen-1, PolyPhen-2 and PhD-SNP. Of the 825 nsSNPs examined, 9 were consistently predicted to be deleterious by all tools employed (Table 1).
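The unanimity criterion used here can be expressed as a simple filter. The sketch below is a minimal illustration, assuming each tool's output has already been reduced to a "deleterious" or "neutral" label per variant; the dictionary layout and the placeholder variant X100Y are hypothetical and do not reflect PredictSNP's actual output format.

```python
# Tools whose calls must all agree (taken from the text).
TOOLS = ("PredictSNP", "SIFT", "MAPP", "PolyPhen-1", "PolyPhen-2", "PhD-SNP")


def consensus_deleterious(calls: dict[str, dict[str, str]]) -> list[str]:
    """Return the variants labelled 'deleterious' by every tool in TOOLS."""
    hits = []
    for variant, per_tool in calls.items():
        # A missing prediction counts as a failure of the unanimity filter.
        if all(per_tool.get(tool) == "deleterious" for tool in TOOLS):
            hits.append(variant)
    return hits


# Hypothetical labels: X100Y is a placeholder that one tool calls neutral.
example_calls = {
    "V241G": {tool: "deleterious" for tool in TOOLS},
    "X100Y": {**{tool: "deleterious" for tool in TOOLS}, "SIFT": "neutral"},
}
print(consensus_deleterious(example_calls))  # ['V241G']
```

Requiring agreement from every tool trades sensitivity for specificity, which is why only 9 of 825 variants pass.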
2.2. Most of the Identified nsSNPs Are Located Within the Functional Domain of SPI1
All nine identified variants are located within or near the ETS DNA-binding domain of SPI1, specifically in its winged helix–turn–helix DNA-binding motif, as shown by InterPro analysis (Figure 1). This domain is crucial for the role of SPI1 as a transcription factor regulating hematopoietic cell fate. Mutations within it may therefore affect protein folding or DNA binding and, in turn, impair hematopoietic cell differentiation.
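Mapping variants onto a domain reduces to an interval check. The sketch below assumes ETS domain boundaries of residues 171–255 (an 85-residue span, matching the length given in the Discussion) and a 10-residue flanking window for "near"; both values are illustrative assumptions, and the actual coordinates should be taken from the InterPro entry for human PU.1.

```python
# Assumed ETS domain boundaries (1-based, inclusive). These are placeholders
# chosen to give an 85-residue domain; replace them with InterPro coordinates.
ETS_START, ETS_END = 171, 255
NEAR_WINDOW = 10  # hypothetical number of flanking residues counted as "near"


def classify_position(pos: int, start: int = ETS_START, end: int = ETS_END,
                      window: int = NEAR_WINDOW) -> str:
    """Label a residue position as within, near, or outside the domain."""
    if start <= pos <= end:
        return "within"
    distance = start - pos if pos < start else pos - end
    return "near" if distance <= window else "outside"


# The nine variant residues named in the conservation analysis (Section 2.7).
variant_residues = ["G165", "I189", "H211", "R220", "A229",
                    "R230", "V241", "K247", "R259"]
labels = {res: classify_position(int(res[1:])) for res in variant_residues}
print(labels)
```

With these assumed boundaries, seven residues fall within the domain and G165 and R259 fall just outside it.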
2.3. Structural Evaluation of WT and Mutant SPI1 Proteins
Structural comparisons were restricted to the ETS DNA-binding domain to avoid artifacts associated with modeling intrinsically disordered regions of SPI1 and to ensure biologically meaningful interpretation. Structural alignment of WT SPI1 with each mutant revealed localized conformational variations at the mutation positions. The R230L mutation, located within the ETS DNA-binding domain, caused only a slight structural change, with a Cα–Cα distance of 0.020 Å between the WT and mutant residues (Figure 2A). Likewise, the Cα–Cα distance between the WT and A229V residues, located in the same domain, was slightly larger, at 0.074 Å (Figure 2B). In contrast, I189F and H211P displayed more substantial structural alterations, with Cα–Cα distances of 0.302 Å and 0.365 Å, respectively (Figure 2C,D). The conformational alteration in H211P may be attributed to the introduction of proline, a known helix-disrupting residue, which may affect secondary structure within the ETS domain. The Cα–Cα distances between the wild-type and the other five mutant residues ranged from 0.010 Å to 0.060 Å (Figure 2). These results suggest that, although all four of these mutations lie within the ETS domain, I189F and H211P produce the largest local structural perturbations and may therefore have the greatest impact on the structural stability of SPI1.
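The Cα–Cα distances reported here are per-residue Euclidean distances between superposed models. A minimal sketch, assuming the WT and mutant models have already been superposed and their Cα coordinates extracted into dictionaries keyed by residue number; the coordinates below are hypothetical.

```python
import math


def ca_displacement(wt_ca: dict[int, tuple[float, float, float]],
                    mut_ca: dict[int, tuple[float, float, float]],
                    residue: int) -> float:
    """Euclidean distance (Å) between the WT and mutant Cα atoms of one residue.

    Assumes both models are already superposed, so no fitting is done here.
    """
    return math.dist(wt_ca[residue], mut_ca[residue])


# Hypothetical Cα coordinates (Å) for residue 211 in superposed models.
wt = {211: (10.000, 5.000, 2.000)}
mut = {211: (10.300, 5.200, 2.100)}
print(round(ca_displacement(wt, mut, 211), 3))
```

In practice the coordinates could be read from the aligned PDB files, for example with Biopython's `PDBParser`; the sketch deliberately leaves out the superposition step itself.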
2.4. Protein Stability Prediction for SPI1 Mutations
The protein stability analysis revealed that 7 of the 9 missense variants were predicted to decrease protein stability. The most pronounced destabilization was observed for the V241G mutation in both I-Mutant and MUpro, with ΔΔG = −3.35 kcal/mol and −2.41 kcal/mol, respectively (Table 2), indicating a strong destabilizing effect. Similarly, the I189F and R259C mutations were both predicted to have a significant impact on protein stability. The H211P variant showed a partial discrepancy between tools: I-Mutant predicted a slight increase in stability (−0.70 kcal/mol), whereas MUpro predicted strong destabilization (−1.50 kcal/mol). Conversely, R230L was predicted to be stabilizing by MUpro but destabilizing by I-Mutant. Combining the predictions of the two tools therefore provides a more robust assessment of the pathogenicity of these variants, particularly those situated within the ETS domain.
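Combining two stability predictors amounts to comparing the direction each assigns to a substitution. The sketch below shows one simple way to do this, assuming that negative ΔΔG indicates destabilization (as for the V241G values in Table 2) and using an assumed ±0.5 kcal/mol neutral band; the second pair of values is hypothetical.

```python
NEUTRAL_BAND = 0.5  # kcal/mol; assumed cut-off, not taken from either tool


def direction(ddg: float) -> str:
    """Map a ΔΔG value to a qualitative effect (negative = destabilizing)."""
    if ddg < -NEUTRAL_BAND:
        return "destabilizing"
    if ddg > NEUTRAL_BAND:
        return "stabilizing"
    return "neutral"


def consensus(ddg_imutant: float, ddg_mupro: float) -> str:
    """Report whether the two predictors agree on the direction of the effect."""
    a, b = direction(ddg_imutant), direction(ddg_mupro)
    return f"concordant ({a})" if a == b else f"discordant ({a} vs {b})"


print(consensus(-3.35, -2.41))  # V241G values from Table 2
print(consensus(-0.80, 0.60))   # hypothetical discordant pair
```

Concordant calls can be weighted more heavily, while discordant ones are flagged for closer structural inspection.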
2.5. Determination of SPI1 Post-Translational Modification Sites
Multiple residues within the SPI1 protein were predicted to be sites of post-translational modification (PTM) (Table 3). Three phosphorylation sites were predicted, at threonine 164 and serines 166 and 188, with scores of 0.192, 0.379 and 0.177, respectively. These modifications may play a role in transcriptional activity and signal transduction.
One ubiquitination site and three acetylation sites were identified at lysine residues 221, 242, 246 and 247, suggesting involvement in chromatin remodeling or protein stability. Among the three acetylation sites, lysine 242 had the highest score (0.413), while the ubiquitination site at lysine 221 scored 0.336. Additionally, four arginine residues (212, 220, 230 and 259) were predicted to be methylated, with R259 showing the highest score (0.93), indicating a potential regulatory function. Finally, a glycosylation site was predicted at asparagine 219 with a score of 0.033. Together, these findings identify several sites through which SPI1 may be modulated by different post-translational processes, which may influence its activity through effects on chromatin remodeling, signal transduction and transcription.
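To compare sites, the predicted scores can be grouped by modification type. The sketch below uses only the scores quoted in this section and keeps the top-scoring site per type. It compares scores only within a type, on the assumption that scores from different modification models are not on a common scale.

```python
# (modification type, site, score) for the scores quoted in the text only.
ptm_sites = [
    ("phosphorylation", "T164", 0.192),
    ("phosphorylation", "S166", 0.379),
    ("phosphorylation", "S188", 0.177),
    ("acetylation", "K242", 0.413),
    ("ubiquitination", "K221", 0.336),
    ("methylation", "R259", 0.93),
    ("glycosylation", "N219", 0.033),
]


def top_site_per_type(sites):
    """Keep the highest-scoring (site, score) pair for each modification type."""
    best = {}
    for ptm, site, score in sites:
        if ptm not in best or score > best[ptm][1]:
            best[ptm] = (site, score)
    return best


for ptm, (site, score) in sorted(top_site_per_type(ptm_sites).items()):
    print(f"{ptm:16s} {site} {score:.3f}")
```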
2.6. Interaction Network of SPI1 Protein
The analysis showed direct functional associations between SPI1 and several key regulators of immune cell development, hematopoiesis and transcriptional control (Figure 3). Notably, SPI1 is directly linked to GATA1, a transcription factor that regulates erythroid and megakaryocytic development. RUNX1, a crucial regulator of hematopoietic lineage commitment, also showed a direct interaction with SPI1. Additionally, CEBPA and CEBPB, transcription factors essential for monocyte and granulocyte differentiation, were present in the interaction network. Finally, IRF4 and IRF8, transcription factors important for dendritic cell and myeloid differentiation, were detected in the SPI1 interaction network.
Table 1 presents information about the predicted functional partners of SPI1.
2.7. Conservation Analysis of Missense Mutations
A multiple sequence alignment (MSA) of human, zebrafish and mouse SPI1 protein sequences was performed to assess the evolutionary conservation of the nine missense mutation sites (Figure 4). Residues G165, I189, R220, A229, R230 and K247 were highly conserved across all three species, as indicated by tall yellow conservation bars. In contrast, residues H211, V241 and R259 were more variable, especially in the zebrafish sequence, as shown by shorter conservation bars. Most of the variant sites, particularly those located within the ETS domain, are therefore highly conserved among the three species, highlighting their functional importance for transcriptional activity and DNA binding.
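Per-column conservation in an MSA can be approximated by the fraction of sequences that share the most common residue. The sketch below is a simplified identity-based proxy, not the property-based score that alignment viewers may plot as conservation bars; the three-sequence toy alignment is hypothetical and is not SPI1 sequence.

```python
from collections import Counter


def column_identity(alignment: list[str], column: int) -> float:
    """Fraction of sequences sharing the most common residue at a 0-based column.

    Gaps ('-') are treated as non-matching residues.
    """
    residues = [seq[column] for seq in alignment]
    counts = Counter(r for r in residues if r != "-")
    if not counts:
        return 0.0
    return counts.most_common(1)[0][1] / len(residues)


# Hypothetical toy alignment (e.g. human, mouse, zebrafish); not real sequence.
toy = ["KRIRLY", "KRIRLY", "KKIRMY"]
print([round(column_identity(toy, i), 2) for i in range(len(toy[0]))])
```

A column scoring 1.0 is invariant across the aligned species, mirroring the tall conservation bars described above.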
2.8. Upregulation of SPI1 Expression in Acute Myeloid Leukemia (LAML)
Analysis of TCGA data showed that SPI1 expression was significantly upregulated in LAML tumor tissue compared with normal tissue (Figure 5). The median expression levels were approximately 7.0 and 5.0 log2(TPM + 1), respectively, and the difference was statistically significant (p < 0.01), suggesting possible transcriptional dysregulation of SPI1 in leukemia.
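Because the values are on a log2(TPM + 1) scale, the reported medians can be back-transformed to estimate the size of the difference. This worked example uses the approximate medians given above; a 2-unit gap corresponds to a fourfold ratio of (TPM + 1) values.

```python
def tpm_from_log2(value: float) -> float:
    """Invert the log2(TPM + 1) transform."""
    return 2 ** value - 1


tumor_log, normal_log = 7.0, 5.0  # approximate medians reported in the text
tumor_tpm, normal_tpm = tpm_from_log2(tumor_log), tpm_from_log2(normal_log)
fold_pseudo = 2 ** (tumor_log - normal_log)  # ratio of (TPM + 1) values
print(tumor_tpm, normal_tpm, fold_pseudo)  # 127.0 31.0 4.0
```

Since the medians are approximate, the fourfold figure should be read as an order-of-magnitude estimate.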
2.9. Molecular Docking Analysis of Cinnamic Acid Against 8eqg_modified.pdb
As shown in Figure 6, AutoDock Vina docking of cinnamic acid (SMILES: C1=CC=C(C=C1)/C=C/C(=O)O) into the binding site of 8eqg_modified.pdb produced twenty ranked binding conformations with affinity scores between −4.270 and −3.421 kcal/mol. The top-scoring pose (Model 1) had a binding energy of −4.270 kcal/mol, and the energy differences between the top-ranking poses were small (0.3 kcal/mol among the first three poses). This narrow distribution suggests a consistent, reproducible binding orientation within the defined docking grid (20 × 20 × 20 Å; exhaustiveness 16) rather than a population of separate, energetically distinct binding modes.
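The pose statistics described here (best score, overall range and spread of the top poses) can be extracted from a Vina output file, which records each pose's affinity on a "REMARK VINA RESULT" line. A minimal sketch, assuming the PDBQT output has been read into a string; in the excerpt only the first score comes from the text, and poses 2 and 3 are hypothetical.

```python
def vina_affinities(pdbqt_text: str) -> list[float]:
    """Collect the affinity (kcal/mol) from each 'REMARK VINA RESULT' line."""
    scores = []
    for line in pdbqt_text.splitlines():
        if line.startswith("REMARK VINA RESULT:"):
            # Fields after the colon: affinity, then RMSD lower/upper bounds.
            scores.append(float(line.split(":", 1)[1].split()[0]))
    return scores


def summarize(scores: list[float], top_n: int = 3) -> dict[str, float]:
    """Best and worst scores, their range, and the spread of the top_n poses."""
    ranked = sorted(scores)  # most negative (best) first
    return {
        "best": ranked[0],
        "worst": ranked[-1],
        "range": ranked[-1] - ranked[0],
        f"top{top_n}_spread": ranked[min(top_n, len(ranked)) - 1] - ranked[0],
    }


# Hypothetical three-pose excerpt; only the first value is reported in the text.
example = """MODEL 1
REMARK VINA RESULT:    -4.270      0.000      0.000
ENDMDL
MODEL 2
REMARK VINA RESULT:    -4.100      1.512      2.037
ENDMDL
MODEL 3
REMARK VINA RESULT:    -3.980      1.804      2.560
ENDMDL"""
print(summarize(vina_affinities(example)))
```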
The calculated binding affinity indicates modest thermodynamic stabilization of the protein–ligand complex. Scores in this range generally reflect weak to moderate non-covalent interactions, governed largely by van der Waals forces and weak hydrogen bonding rather than strong electrostatic anchoring. The absence of poses with substantially lower energies suggests that cinnamic acid does not bind to a deep, highly optimized cavity but rather to a relatively shallow site.
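To put these scores on a more intuitive scale, a docking score can be converted to an approximate dissociation constant through ΔG = RT ln Kd. This is a rough sketch: Vina scores are empirical estimates rather than measured free energies, and 298.15 K is an assumed temperature. For the reported scores it gives values in the sub-millimolar to low-millimolar range, consistent with weak binding.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol·K)


def kd_from_score(delta_g: float, temperature: float = 298.15) -> float:
    """Approximate dissociation constant (M) from a binding free energy.

    Uses ΔG = RT ln(Kd); treat the result as an order-of-magnitude guide only.
    """
    return math.exp(delta_g / (R_KCAL * temperature))


for score in (-4.270, -3.421):  # best and worst poses reported in the text
    print(f"{score:.3f} kcal/mol -> Kd ~ {kd_from_score(score) * 1e3:.2f} mM")
```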
Structurally, cinnamic acid is a rigid, planar molecule in which a phenyl ring is conjugated through a trans-alkene to a carboxylic acid group. The carboxylic acid moiety appears to be the main center of polar interactions, forming one or two directional hydrogen bonds with adjacent polar residues lining the pocket. However, the interaction network is limited, with no indication of strong salt bridges or multi-point hydrogen bonding, which is consistent with the moderate binding energy.
The aromatic ring also contributes to stabilization through hydrophobic contacts and possible π–alkyl and cation–π interactions with nearby residues. These interactions promote enthalpic stabilization, but only modestly, because the ligand's small molecular surface area limits its contact with the pocket. The conjugated alkene spacer is rigid and electronically delocalized; this rigidity limits conformational flexibility, so the ligand cannot achieve optimal steric complementarity with irregular pocket geometries.
Across the docking ensemble, binding energies vary little, suggesting that cinnamic acid adopts a stable preferred orientation dictated by its planar scaffold. Nevertheless, the interaction pattern is dominated by sparse hydrogen bonding and moderate hydrophobic packing, with no indication of deep pocket occupation or an extensive, interconnected network of interactions.
Overall, the docking results indicate that cinnamic acid forms a structurally plausible but loose complex with 8eqg_modified.pdb. Although the binding pose is consistent and thermodynamically favorable, the modest affinity suggests that cinnamic acid is better suited as a scaffold fragment than as an effective inhibitor. Structural modification or extension with additional functional groups would likely be needed to increase interaction density, binding strength and potential biological efficacy.
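Ligand efficiency (binding energy divided by the number of heavy atoms) is a common way to judge whether a weak binder is still a useful fragment. The worked sketch below treats the best Vina score as an approximate ΔG, which is an assumption. A ligand efficiency of roughly 0.3 kcal/mol per heavy atom or higher is often cited as a rule-of-thumb threshold for fragments worth optimizing, and the best pose here gives about 0.39.

```python
# Heavy (non-hydrogen) atoms of cinnamic acid, C9H8O2: 9 C + 2 O.
HEAVY_ATOMS = 9 + 2


def ligand_efficiency(score_kcal: float, heavy_atoms: int) -> float:
    """Binding energy per heavy atom, LE = -ΔG / N_heavy (kcal/mol per atom)."""
    return -score_kcal / heavy_atoms


le = ligand_efficiency(-4.270, HEAVY_ATOMS)
print(f"LE = {le:.2f} kcal/mol per heavy atom")
```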
2.10. Molecular Binding Interaction Analysis of Cinnamic Acid with 8EQG
Figure 7 shows the interaction of cinnamic acid within the binding pocket of 8EQG. The complex is structurally stable yet moderately weak, with a limited number of polar interactions and predominantly hydrophobic contacts. The ligand adopts its typical planar conformation, dictated by the conjugated phenyl–alkene–carboxylic acid backbone, and fits into a shallow cavity lined by both polar and non-polar residues.
The major stabilizing interaction is a conventional hydrogen bond in which the carbonyl oxygen of the cinnamic acid carboxyl group accepts a hydrogen bond from the protonated ε-amino group of Lys217. Because Lys217 is positively charged under physiological conditions, this interaction may be further strengthened by partial electrostatic attraction. This hydrogen bond serves as the main anchor, positioning the ligand in the pocket and restricting its rotational freedom. However, no additional strong hydrogen bonds or salt bridges are formed, which limits the overall binding strength.
The aromatic ring is surrounded by Ala231, Lys227 and Arg230, whose aliphatic side-chain portions form π–alkyl interactions with the π-electron cloud of the phenyl ring. These interactions provide hydrophobic confinement and hold the aromatic moiety in position. Each interaction is weak in isolation compared with a classical hydrogen bond; together, however, they enhance enthalpic stabilization and reduce solvent exposure.
Other residues, such as Met223, Asn219 and Trp213, engage in van der Waals interactions, indicating good steric complementarity but no strong directional interactions. The flexible thioether side chain of Met223 probably adapts to maximize surface packing, and Trp213 may contribute hydrophobic or edge-to-face aromatic contacts. Asn219, a polar residue, may help stabilize the local microenvironment, although it does not appear to form a strong hydrogen bond with the ligand.
Overall, cinnamic acid occupies an outer region of the binding cavity rather than penetrating deeply into it. The interaction network is dominated by a single polar anchor and surrounding hydrophobic packing, which results in modest stabilization consistent with the docking affinity (−4.27 kcal/mol). Its low molecular weight and rigid planar shape restrict the number of interaction points and prevent extensive occupation of the pocket. As a result, cinnamic acid behaves as a fragment-like binder, forming a structurally plausible but energetically moderate complex with 8EQG. Increasing binding density and inhibitory capacity would probably require structural extension or functional group modification.
2.11. Pharmacokinetic and Toxicological Characterization of Cinnamic Acid
As shown in Table 4 and Figure 8, cinnamic acid (MW = 148.16 Da) has a compact molecular structure that strongly favors passive diffusion across membranes. Its low molecular weight and LogP of 1.78 place it within the optimal physicochemical range for oral small-molecule therapeutics, and its balance of hydrophilicity and lipophilicity suggests good partitioning across biological membranes without excessive accumulation in lipid-rich regions. High permeability is further supported by one hydrogen bond donor, one hydrogen bond acceptor and a topological polar surface area (TPSA) of 37.30 Å². Because a TPSA below 70 Å² is usually associated with good intestinal absorption and possible blood–brain barrier (BBB) penetration, cinnamic acid theoretically meets these conditions. Its quantitative estimate of drug-likeness (QED = 0.65) supports its classification as a lead-like molecule rather than a structurally complex drug candidate.
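Rule-of-five compliance and the TPSA criterion can be checked with a few lines of arithmetic. The sketch below recomputes the molecular weight from the formula C9H8O2 using standard atomic masses and takes LogP and TPSA from the predicted values given above. It assumes Lipinski's original acceptor definition (every N and O atom), which gives two acceptors rather than the one reported by the prediction tool; the conclusion is the same either way.

```python
# Standard atomic masses (g/mol).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}


def molecular_weight(formula: dict[str, int]) -> float:
    """Sum of atomic masses for an element -> count mapping."""
    return sum(ATOMIC_MASS[el] * n for el, n in formula.items())


def lipinski_violations(mw: float, logp: float, hbd: int, hba: int) -> int:
    """Count rule-of-five violations (MW > 500, logP > 5, HBD > 5, HBA > 10)."""
    return sum([mw > 500, logp > 5, hbd > 5, hba > 10])


cinnamic_acid = {"C": 9, "H": 8, "O": 2}
mw = molecular_weight(cinnamic_acid)
# logP is the predicted value from the text; HBA counts both O atoms (Lipinski).
violations = lipinski_violations(mw, logp=1.78, hbd=1, hba=2)
tpsa_ok = 37.30 < 70  # TPSA threshold quoted in the text (Å²)
print(f"MW = {mw:.2f} g/mol, Ro5 violations = {violations}, TPSA < 70 Å²: {tpsa_ok}")
```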
With respect to absorption, the predicted intestinal absorption (0.99) and oral bioavailability (0.91) indicate nearly complete uptake after oral administration. These predictions are consistent with its moderate aqueous solubility (−2.36 log mol/L), which suggests that it dissolves adequately in gastrointestinal fluids while remaining membrane-permeable. The estimated cell permeability (−4.58 log10−6/s) supports efficient passive diffusion. Notably, the low predicted P-glycoprotein (P-gp) inhibition (0.004) suggests minimal interference with efflux transporters and hence a low likelihood of transporter-mediated drug–drug interactions.
The predicted BBB penetration probability (0.77) suggests moderate central nervous system accessibility, which may be beneficial or undesirable depending on the therapeutic intent. The high predicted plasma protein binding (92.47) implies a small free fraction in circulation, although high binding may also increase apparent plasma exposure. The calculated volume of distribution (4.61 L/kg) indicates extensive distribution into extravascular tissues, in line with its lipophilic character.
Cinnamic acid has a favorable predicted cytochrome P450 interaction profile. The low predicted inhibition of the major isoforms (CYP1A2, CYP2C19, CYP2C9, CYP2D6 and CYP3A4) indicates a low risk of broad metabolic drug–drug interactions. The intermediate probability of being a CYP2C9 substrate (0.44) suggests that hepatic biotransformation could proceed through this route. The microsomal clearance (3.22 µL/min/mg) supports moderate metabolic turnover, whereas the predicted half-life of 0 h suggests rapid elimination, possibly through efficient hepatic conjugation and renal excretion.
The predicted toxicological profile is largely reassuring, with low risks of cardiotoxicity (hERG inhibition, 0.01) and mutagenicity (0.02), and predicted LD50 values suggest limited acute toxicity. Nevertheless, the predicted probability of drug-induced liver injury is high (0.91) and should be considered. Because cinnamic acid is metabolized in the liver, this signal could reflect possible metabolic stress or the formation of reactive intermediates. The moderate predicted skin irritation potential (0.61) is consistent with moderate irritant effects at elevated concentrations.
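The probability-type endpoints discussed here can be sorted into risk tiers. A minimal sketch using the predicted values quoted in this section; the 0.3 and 0.7 cut-offs are assumptions chosen for illustration, not thresholds taken from the prediction server.

```python
# Assumed cut-offs: low < 0.3 <= medium < 0.7 <= high.
def risk_tier(p: float) -> str:
    """Assign a qualitative risk tier to a predicted probability."""
    if p >= 0.7:
        return "high"
    if p >= 0.3:
        return "medium"
    return "low"


predicted = {
    "hERG inhibition": 0.01,
    "Mutagenicity": 0.02,
    "Drug-induced liver injury": 0.91,
    "Skin irritation": 0.61,
}
for endpoint, p in predicted.items():
    print(f"{endpoint:28s} {p:.2f} {risk_tier(p)}")
```

Under these assumed bands, only drug-induced liver injury falls in the high tier, matching the concern raised in the text.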
Overall, cinnamic acid is predicted to have excellent oral absorption, extensive tissue distribution, rapid clearance, low interaction liability and low acute toxicity. The main concern arising from this in silico analysis is the predicted high risk of hepatotoxicity, which would need to be confirmed by experimental toxicological studies.
2.12. SwissTargetPrediction Profile of Cinnamic Acid
SwissTargetPrediction analysis of cinnamic acid indicated a multi-target interaction pattern involving G protein-coupled receptors (GPCRs), metabolic enzymes, ion channels, nuclear receptors, transporters, proteases, phosphatases and kinases (Table 5 and Figure 9). Hydroxycarboxylic acid receptor 2 (HCAR2, UniProt: Q8TDS4) showed the highest probability (0.8870), indicating a high likelihood of interaction. HCAR2 is a class A GPCR involved in lipid metabolism, anti-inflammatory signaling and immune modulation. Because this receptor is highly expressed in adipocytes and immune cells, cinnamic acid could exert metabolic and immunoregulatory actions through GPCR pathways.
Aldose reductase (AKR1B1, P15121), an enzymatic target, had a lower probability (0.1012). Aldose reductase is an oxidoreductase that plays an important role in the polyol pathway and in diabetic complications. This prediction is consistent with previous reports describing the antioxidant and metabolic regulatory functions of phenylpropanoid derivatives.
Toll-like receptor 4 (TLR4, O00206), a component of inflammatory signaling, was also predicted as a target (0.0918), implying that cinnamic acid may interfere with innate immune activation and NF-κB-mediated signaling.
Three carbonic anhydrase isoforms were also predicted: carbonic anhydrase II (CA2, P00918), carbonic anhydrase I (CA1, P00915) and carbonic anhydrase IX (CA9, Q16790) (probabilities 0.0727–0.0823). These lyases regulate pH homeostasis and are often linked to adaptation of the tumor microenvironment, particularly CA9 in hypoxic tumors. Their presence suggests potential roles in acid–base regulation and metabolic reprogramming in cancer.
Estrogen receptor beta (ESR2, Q92731), a nuclear receptor, was also predicted, suggesting potential endocrine-modulatory activity. Monocarboxylate transporter 1 (SLC16A1, P53985) and transient receptor potential cation channel A1 (TRPA1, O75762) represent transport and sensory pathways, respectively, suggesting possible involvement in metabolic flux and nociceptive signaling.
Enzymes involved in extracellular matrix remodeling, including matrix metalloproteinase 9 (MMP9, P14780) and matrix metalloproteinase 2 (MMP2, P08253), were predicted with probabilities of 0.0535, suggesting possible anti-invasive or anti-metastatic relevance. In addition, the intracellular signaling modulator protein tyrosine phosphatase 1B (PTPN1, P18031) and the epidermal growth factor receptor (EGFR, P00533) were predicted, indicating potential modulation of proliferative and metabolic processes.
Taken together, the predicted targets suggest that cinnamic acid has a pleiotropic pharmacological profile. The predominance of GPCR signaling, together with anti-inflammatory and proliferative pathways, is consistent with a multifunctional bioactive scaffold with metabolic, anti-inflammatory and possible anticancer implications.
3. Discussion
SPI1 encodes PU.1, a member of the ETS family of transcription factors, and is located on chromosome 11 in the human genome. PU.1 plays crucial roles in hematopoiesis and is highly expressed in myeloid cells, lymphocytes and hematopoietic stem cells [4,15]. SPI1 positively regulates multiple genes in granulocytes and monocytes, where its expression is highest, and in B cells, where it is expressed at lower to moderate levels.
Additionally, it is expressed in erythroid precursors and in various hematopoietic progenitors with the potential to develop into lymphoid lineages [16,17,18]. SPI1-deficient mice have demonstrated its pivotal role in hematopoietic lineage commitment, showing defects in the development of lymphocytes (T cells and B cells) and other immune cell types [4,15,19]. The current study was performed to identify and characterize missense nonsynonymous single nucleotide polymorphisms (nsSNPs) within the SPI1 gene. A comprehensive computational analysis was performed to explore the impact of these mutations on protein structure, stability and function. Additionally, the current work investigates the protein–protein interaction (PPI) network of SPI1 in hematopoiesis and its relevance to hematological diseases associated with this gene.
This study identified 9 potentially pathogenic missense variants among 825 nsSNPs in the SPI1 gene, as predicted by the PredictSNP tool. According to the InterPro database, seven of these nine variants are located within the ETS domain, while the other two lie close to it. The ETS domain consists of 85 amino acids located near the C-terminus of the SPI1 protein. It is a highly conserved DNA-binding domain that specifically recognizes the sequence 5′-GGAA/T-3′ and is essential for the function of SPI1 as a regulatory transcription factor [20].
Structural alignment of wild-type and variant SPI1 models (SWISS-MODEL and UCSF Chimera) revealed that several missense mutations, although located within the conserved ETS domain, differ in their impact on the conformational integrity of the protein. Among the nine deleterious mutations, structural modeling indicated that variants such as R230L and A229V generated minor deviations, with Cα–Cα displacements of 0.020 Å and 0.074 Å, respectively. These minimal alterations suggest limited disruption of protein architecture and stability, with DNA-binding activity likely retained. In contrast, I189F and H211P displayed larger conformational shifts, with distances of 0.302 Å and 0.365 Å, respectively, indicating a potential impact on the tertiary structure of the SPI1 protein. In particular, H211P may disrupt protein stability because of the conformational rigidity of proline, a residue known to disrupt α-helical structures, including those in transcription factor domains such as ETS [21]. Although I189F is a hydrophobic substitution, it introduces a bulkier phenylalanine side chain, which may disrupt packing interactions within the ETS domain and impair SPI1 protein function.
Collectively, although all nine mutations are predicted to be deleterious, only the I189F and H211P variants are likely to have a marked impact on SPI1 structure, stability and function. These two mutations may dysregulate gene expression and contribute to impaired hematopoietic development.
In silico predictions using I-Mutant and MUpro revealed that seven of the nine missense nsSNPs are likely to decrease SPI1 protein stability, potentially affecting its DNA-binding activity. Among these, the V241G substitution was predicted to be the most destabilizing, with ΔΔG values of −3.35 kcal/mol (I-Mutant) and −2.41 kcal/mol (MUpro); such large negative ΔΔG values suggest a marked change in the protein-folding environment [22,23]. Similarly, the I189F and R259C mutations were also associated with reduced protein stability. I189F, located within the ETS domain, replaces the non-polar isoleucine with an aromatic phenylalanine, which may disrupt hydrophobic packing in the ETS domain [24]. R259C, although located outside the ETS domain, replaces a positively charged arginine with a thiol-containing cysteine, potentially introducing aberrant disulfide bridges and altering electrostatic interactions, thereby affecting structural integrity.
Interestingly, the two tools disagreed on the H211P substitution, with I-Mutant predicting a slight increase in stability (−0.70 kcal/mol) and MUpro predicting marked destabilization (−1.50 kcal/mol). Such inconsistencies are common among computational predictors and underscore the importance of using multiple algorithms [25]. Because proline residues can disrupt α-helices by introducing conformational kinks, particularly within ETS domains [26], H211P is nevertheless likely to destabilize the protein. R230L also yielded discordant predictions, with I-Mutant indicating destabilization and MUpro indicating stabilization. Replacing the polar, positively charged arginine with a hydrophobic leucine may alter intra-domain interactions and solvent exposure [27]. Although R230L is predicted to cause only a mild change in stability, its location in the ETS domain means that it could still have marked functional consequences. Overall, these findings indicate that most of the variants, especially those located in the ETS domain, affect SPI1 protein stability, which in turn may affect lineage commitment and contribute to immune dysregulation and hematological disease.
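The disagreements noted for H211P and R230L can be made explicit with a simple consensus rule. The sketch below is illustrative rather than the procedure used in this study: it assumes negative ΔΔG means destabilizing for both tools, and it assumes a ±0.5 kcal/mol neutral band (similar in spirit to I-Mutant 3.0's three-class output, but chosen here for illustration). The helper names `classify` and `consensus` are hypothetical. Only the V241G values come from the text; the second input is hypothetical and merely mimics an R230L-style disagreement.

```python
# Assumed neutral band in kcal/mol; an illustrative choice, not a value from this study.
NEUTRAL_BAND = 0.5


def classify(ddg: float, band: float = NEUTRAL_BAND) -> str:
    """Map a predicted ddG (negative = destabilizing) to a coarse label."""
    if ddg < -band:
        return "destabilizing"
    if ddg > band:
        return "stabilizing"
    return "neutral"


def consensus(predictions: dict[str, float]) -> str:
    """Return the shared label if all tools agree, otherwise report the disagreement."""
    labels = {tool: classify(ddg) for tool, ddg in predictions.items()}
    unique = set(labels.values())
    if len(unique) == 1:
        return unique.pop()
    return "discordant: " + ", ".join(f"{tool}={label}" for tool, label in sorted(labels.items()))


# V241G values reported in the text: both tools agree.
print(consensus({"I-Mutant": -3.35, "MUpro": -2.41}))
# Hypothetical values illustrating an R230L-style disagreement.
print(consensus({"I-Mutant": -0.9, "MUpro": 0.7}))
```

Flagging discordant variants, rather than averaging their ΔΔG values, keeps the disagreement visible so that structural reasoning (such as the proline helix-kink argument for H211P) can be applied to those cases specifically.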
Post-translational modifications (PTMs) are crucial for controlling transcription factors such as SPI1, affecting their localization, stability and transcriptional activity. MusiteDeep predicted phosphorylation sites at T164, S166 and S188, which may modulate SPI1 function in lineage-specific regulation and signal transduction. These predictions are consistent with reports that phosphorylation regulates PU.1 activity during hematopoiesis [28]. In addition, the predicted methylation of arginine residues R212, R220, R230 and R259 points to a possible regulatory function, given the established role of arginine methylation in gene expression and TF interactions [29]. Predicted ubiquitination at K221 and acetylation at K142, K246 and K247 suggest further layers of regulation of the SPI1 protein: ubiquitination governs protein degradation, whereas acetylation can enhance DNA-binding and transcriptional activity [30,31].
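A missense change at a residue that is itself a predicted PTM site removes that site outright, so the PTM predictions can be cross-referenced with the variant positions. The sketch below performs this cross-check; it is not part of the reported analysis. The PTM sites are copied from the text, and the variant positions are those named in this discussion (I189, H211, R230, V241 and R259 from the stability analysis; G165, R220, A229 and K247 from the conservation analysis). The helper names are hypothetical. Only exact position matches are considered; effects on neighbouring residues, such as kinase recognition motifs, are ignored.

```python
# Predicted PTM sites as listed in the text.
PREDICTED_PTMS = {
    "phosphorylation": ["T164", "S166", "S188"],
    "methylation": ["R212", "R220", "R230", "R259"],
    "ubiquitination": ["K221"],
    "acetylation": ["K142", "K246", "K247"],
}

# Wild-type residues at the variant positions named in this discussion.
VARIANT_RESIDUES = ["G165", "I189", "H211", "R220", "A229", "R230", "V241", "K247", "R259"]


def residue_number(site: str) -> int:
    """Extract the residue number from a label such as 'R230'."""
    return int(site[1:])


def ptm_sites_hit(ptms: dict[str, list[str]], variants: list[str]) -> dict[str, list[str]]:
    """For each PTM type, list the predicted sites whose residue is itself mutated."""
    variant_set = set(variants)
    hits = {}
    for ptm_type, sites in ptms.items():
        overlap = sorted(variant_set.intersection(sites), key=residue_number)
        if overlap:
            hits[ptm_type] = overlap
    return hits


for ptm_type, sites in ptm_sites_hit(PREDICTED_PTMS, VARIANT_RESIDUES).items():
    print(ptm_type, sites)
```

Under these assumptions, the predicted methylation sites R220, R230 and R259 and the predicted acetylation site K247 coincide with variant positions. Any substitution at these residues would therefore eliminate the corresponding predicted modification, although whether this matters functionally would require experimental testing.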
The protein–protein interaction (PPI) network of SPI1 showed direct interactions with several hematopoietic transcription factors, consistent with its central role in hematopoietic regulation. SPI1 interacts directly with RUNX1, which is crucial for hematopoietic stem cell (HSC) generation and differentiation, and previous studies have shown cooperation between SPI1 and RUNX1 in hematopoietic gene expression [18]. SPI1 was also shown to interact with GATA1, an essential regulator of the fate decision between erythroid and myeloid lineages [32], and with CEBPB, a key regulator of monocyte and granulocyte differentiation [33]. Finally, SPI1 interacted with IRF4 and IRF8, which are essential for macrophage and dendritic cell differentiation, suggesting a direct role for SPI1 in immune cell differentiation [34]. Overall, these interactions highlight the critical role of PU.1 in hematopoietic lineage commitment and in immune cell differentiation and specification.
Cross-species conservation analysis of human, mouse and zebrafish sequences showed that most of the residues affected by the predicted pathogenic variants, in particular G165, I189, R220, A229, R230 and K247, are highly conserved, and their location within the ETS domain underscores their functional importance [35]. In contrast, H211, V241 and R259 were less conserved in zebrafish, which may reflect species-specific adaptation or greater functional flexibility of SPI1 at these positions in non-mammalian species.
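A position-wise conservation check of the kind described here can be sketched as follows. The sketch assumes a precomputed multiple sequence alignment with gaps written as "-". The three short sequences are a hypothetical toy example, not the real SPI1 orthologs, and the helper names are illustrative. Human residue numbers (ungapped, 1-based) are first mapped to alignment columns, because insertions in other species shift column indices.

```python
# Hypothetical toy alignment (NOT the real SPI1 sequences); gaps are "-".
TOY_ALIGNMENT = {
    "human": "MKT-LVRA",
    "mouse": "MKTALVRA",
    "zebrafish": "MRT-LIRA",
}


def ref_pos_to_column(aligned_ref: str, pos: int) -> int:
    """Map a 1-based, ungapped residue number in the reference to a 0-based alignment column."""
    count = 0
    for col, aa in enumerate(aligned_ref):
        if aa != "-":
            count += 1
            if count == pos:
                return col
    raise ValueError(f"position {pos} exceeds the reference sequence length")


def residues_at(alignment: dict[str, str], pos: int, reference: str = "human") -> dict[str, str]:
    """Return the residue (or gap) each species carries at the reference position."""
    col = ref_pos_to_column(alignment[reference], pos)
    return {species: seq[col] for species, seq in alignment.items()}


def is_conserved(alignment: dict[str, str], pos: int, reference: str = "human") -> bool:
    """Treat a position as conserved only if every species carries the identical residue."""
    return len(set(residues_at(alignment, pos, reference).values())) == 1


for pos in (2, 4, 5, 6):
    print(pos, residues_at(TOY_ALIGNMENT, pos), is_conserved(TOY_ALIGNMENT, pos))
```

Strict identity is the simplest criterion. A position such as toy residue 5 (V in human and mouse, I in zebrafish) is flagged as not conserved, which mirrors the pattern described for H211, V241 and R259. Similarity-aware scores based on a substitution matrix would treat such conservative changes more leniently.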
Expression analysis showed that SPI1 is significantly upregulated in LAML tissues compared with normal controls, indicating a potential role in leukemogenesis. This transcription factor is known to regulate lymphoid and myeloid lineage differentiation [1], and dysregulated SPI1 expression may contribute to the differentiation block and aberrant proliferation that characterize LAML [3]. The increased SPI1 expression observed in TCGA LAML tissues may therefore reflect the transcriptional reprogramming of LAML, suggesting that SPI1 could serve as a prognostic biomarker or therapeutic target in myeloid malignancies.
Cinnamic acid has been reported to have antioxidant and anti-inflammatory effects, mediated in part by the regulation of redox-sensitive signaling pathways and inflammatory mediators [12]. These mechanisms may indirectly influence transcriptional regulation in inflammatory or malignant processes. Moreover, cinnamic acid derivatives have shown antimicrobial and antiproliferative effects, supporting the pharmacological relevance of the phenylpropanoid backbone [13,14]. Taken together, the current results are consistent with the literature in suggesting that cinnamic acid is a bioactive, low-complexity scaffold with reasonably good pharmacokinetic properties but weak intrinsic binding affinity, and that it would therefore require structural optimization to achieve better therapeutic outcomes. The docking scores reported in this study represent relative interaction energies and should not be interpreted as experimental binding affinities.
The limitations of this study are as follows:
The study is entirely in silico, lacking experimental validation (e.g., biochemical or cellular assays).
Absence of positive and negative control ligands, limiting the interpretability of docking scores.
Docking energies represent approximate interaction scores, not true binding affinities.
A single ligand (cinnamic acid) was evaluated without comparative ligand benchmarking or a screening library.
Limited exploration of protein flexibility (static docking without extended conformational sampling).
Predicted ADMET profiles are computational estimations and require experimental confirmation.
Future studies should therefore:
Perform in vitro validation (e.g., binding assays, reporter gene assays for PU.1 activity).
Conduct molecular dynamics simulations to assess binding stability and protein flexibility.
Include reference ligands (controls) for more robust docking validation.
Expand to virtual screening of cinnamic acid derivatives or compound libraries.
Investigate structure–activity relationships (SAR) to optimize binding interactions.
Validate ADMET predictions through experimental pharmacokinetic and toxicity studies.
Explore the biological impact of key SPI1 mutations in cellular or animal models.