Article

AHR/NRF2 Dual Agonist Prediction and Natural Compound Screening Based on Machine Learning: A New Strategy for the Treatment of Atopic Dermatitis

1
State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
2
Sino-Danish College & Sino-Danish Centre for Education and Research, University of Chinese Academy of Sciences, Beijing 100190, China
3
Department of Pharmacy, University of Copenhagen, 2100 Copenhagen, Denmark
4
School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
5
University of Chinese Academy of Sciences, Beijing 100049, China
*
Author to whom correspondence should be addressed.
Int. J. Mol. Sci. 2026, 27(8), 3530; https://doi.org/10.3390/ijms27083530
Submission received: 3 March 2026 / Revised: 31 March 2026 / Accepted: 5 April 2026 / Published: 15 April 2026
(This article belongs to the Section Molecular Biology)

Abstract

In the treatment of atopic dermatitis (AD), synergistic activation of the aryl hydrocarbon receptor (AHR)/nuclear factor erythroid 2-related factor 2 (NRF2) pathways represents a promising strategy. However, known dual agonists are limited, and traditional screening methods are inefficient. Therefore, this study developed machine learning models to predict AHR/NRF2 dual agonists using molecular descriptors and fingerprints. All models achieved area under the receiver operating characteristic curve (AUC) values above 0.86, indicating good classification performance. The optimal AHR model showed an accuracy (ACC) of 0.811 and an AUC of 0.878, while the best NRF2 model yielded an ACC of 0.839 and an AUC of 0.907. According to these models, compounds with a low fraction of sp3-hybridized carbons, moderate hydrophobicity, limited alkyl chains, and highly conjugated structures tend to act as AHR/NRF2 dual agonists. Finally, virtual screening identified 1011 potential natural AHR/NRF2 dual agonists suitable for drug development. Among these, 2-arylbenzofurans, alkaloids, phenanthrenes, flavones, and furocoumarins demonstrated particular advantages. In experimental validation, indirubin, imperatorin, and 3′-O-Methylbatatasin III were identified for the first time as AHR/NRF2 dual agonists in HaCaT cells. This work provides a robust predictive tool, clarifies key molecular features of dual agonists, and may support the discovery of anti-AD agents.

1. Introduction

The aryl hydrocarbon receptor (AHR) is an evolutionarily ancient ligand-activated transcription factor highly expressed in all skin cell types [1]. It regulates numerous genes critical for fundamental skin functions, including environmental toxin metabolism [2], keratinocyte differentiation, epidermal barrier function [3,4], melanogenesis [5], and skin immunity and inflammatory responses [6,7]. Consequently, a certain level of AHR activity is essential for maintaining skin integrity and adapting to acute stress conditions. The nuclear factor erythroid 2-related factor 2 (NRF2) signaling pathway is a crucial intracellular antioxidant and stress response pathway, with downstream antioxidant enzymes including NAD(P)H quinone dehydrogenase 1 (NQO1), heme oxygenase-1 (HO-1), glutathione s-transferase (GST), catalase (CAT), superoxide dismutase (SOD), and others [8]. NRF2 activation protects skin cells from H2O2-induced and UV-induced cellular damage, including oxidative stress [9], DNA damage [10], apoptotic injury [11], mitochondrial dysfunction [12], inflammation [12], skin aging [13,14], and even skin cancer [15]. Crosstalk between AHR and NRF2 remains understudied. Although evidence suggests that NRF2 acts as a downstream target of AHR [16] or that AHR may indirectly activate NRF2 via reactive oxygen species generated by cytochrome P450 1A1 (CYP1A1) [17], this regulatory process appears to be species- and cell-dependent.
Given the critical roles of AHR and NRF2 in the skin, recent studies suggest these pathways may serve as therapeutic targets for atopic dermatitis (AD). AD is characterized by a Th2-polarized immune response with elevated levels of interleukin-4 (IL-4) and interleukin-13 (IL-13) [6]. Dysfunction of the skin barrier is highly correlated with AD pathogenesis, and a key downstream target of AHR is the epidermal differentiation complex [18]. For example, topical application of coal tar or glyteer can restore expression of the skin barrier protein filaggrin (FLG) in AD patients’ skin by activating AHR [3,19,20]. Both AHR and NRF2 activation can attenuate Th2 inflammation [21], while NRF2 simultaneously initiates antioxidant pathways to reduce oxidative stress and inhibit signal transducer and activator of transcription 6 (STAT6) activation [20]. Therefore, dual agonists targeting both AHR and NRF2 hold significant potential as novel therapeutic agents for AD. Currently known dual agonists such as tapinarof, coal tar, and WBI-1001 have demonstrated efficacy in treating AD [18]. Furthermore, current treatments such as glucocorticoids and monoclonal antibodies are often limited by management difficulties and adverse side effects, which makes natural compound-derived drugs increasingly attractive due to their greater bioactivity and safety.
However, given the vast diversity of natural compounds, identifying dual agonists of AHR/NRF2 through experimental approaches is time-consuming, labor-intensive, and costly. Fortunately, with advances in computer science, machine learning-supported quantitative structure-activity relationship (QSAR) methods have demonstrated tremendous promise in small-molecule drug discovery [22]; these approaches correlate molecular descriptors of chemical structures with their biological activities or responses [23]. Among various molecular descriptors, molecular fingerprints can effectively and simply represent molecular structures, physicochemical properties, and pharmacophore characteristics in a 2D format [24]. They are widely applied in diverse drug discovery processes, including virtual screening, similarity-based compound searches, and drug ADMET (absorption, distribution, metabolism, excretion, and toxicity) prediction [25]. Currently, multiple algorithms from ensemble learning and deep learning have been applied to predict agonists for various targets, such as gamma-aminobutyric acid type A receptor (GABAA) [23], estrogen receptors [26], G protein-coupled receptors (GPCRs) [27], and peroxisome proliferator-activated receptor delta (PPAR-δ) [28]. However, existing machine learning studies on AHR agonists have yet to reveal structure-activity relationships between molecular structures and biological activities. Furthermore, no relevant literature has been reported in the field of machine learning research for NRF2 agonists.
This study primarily established machine learning models for predicting AHR/NRF2 dual agonists, compared the performance of different models, explored structural and chemical features of dual agonists through feature importance analysis, and screened several unreported natural compounds. This study successfully developed a reliable prediction tool with high accuracy, significantly reducing experimental costs. It not only elucidated the molecular properties of AHR/NRF2 dual agonists but also provided a convenient and robust framework for screening drugs targeting atopic dermatitis.

2. Results

2.1. Statistical Analysis of RDKit Descriptors

After preprocessing, the AHR dataset contained 5192 agonists and 5192 non-agonists, while the NRF2 dataset contained 1242 agonists and 1242 non-agonists. Molecular descriptors for each compound were obtained using RDKit, and point-biserial correlation coefficients were calculated to explore differences between agonists and non-agonists. Figure 1 lists the top five descriptors ranked by point-biserial correlation coefficient. Specifically, for AHR, agonists are characterized by higher values of BCUT2D-CHGLO and BCUT2D-LOGPLOW, whereas non-agonists exhibit higher values of FractionCSP3, SlogP_VSA2, and SlogP_VSA3. For NRF2, agonists are associated with higher BCUT2D-LOGPLOW, while non-agonists tend to show higher values of FractionCSP3, SlogP_VSA2, Chi0n, and Chi1n. FractionCSP3 denotes the proportion of sp3-hybridized carbon atoms in the molecule relative to the total number of carbon atoms. BCUT2D-LOGPLOW characterizes molecular hydrophilicity, while SlogP_VSA2 and SlogP_VSA3 both characterize hydrophobicity within the molecule [29,30]. BCUT2D-CHGLO represents the charge distribution of the molecule, while Chi0n and Chi1n denote the average values of the net charge. Thus, for AHR and NRF2, molecules with a lower proportion of sp3 hybridization and moderate hydrophobicity are more likely to act as agonists. Additionally, AHR agonists are characterized by stronger electrophilicity.
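The descriptor ranking described above can be illustrated with a minimal sketch of the point-biserial correlation between a binary agonist label and a continuous descriptor. The function name, the toy values, and the descriptor called "ToyDescriptor" are assumptions made for illustration; they are not the study's data or code, and the authors may well have used a library routine instead.

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(labels, values):
    """Point-biserial correlation between 0/1 labels and a continuous variable.

    r = (M1 - M0) / s * sqrt(n1 * n0 / n^2), with s the population standard
    deviation of all values. This equals Pearson's r with a binary variable.
    """
    pos = [v for y, v in zip(labels, values) if y == 1]
    neg = [v for y, v in zip(labels, values) if y == 0]
    n1, n0, n = len(pos), len(neg), len(values)
    if n1 == 0 or n0 == 0:
        raise ValueError("both classes must be present")
    s = pstdev(values)
    if s == 0:
        return 0.0  # a constant descriptor carries no class information
    return (mean(pos) - mean(neg)) / s * sqrt(n1 * n0 / n**2)


# Hypothetical toy data: 1 = agonist, 0 = non-agonist.
labels = [1, 1, 1, 0, 0, 0]
descriptors = {
    "FractionCSP3": [0.10, 0.20, 0.15, 0.60, 0.70, 0.55],
    "ToyDescriptor": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
}

# Rank descriptors by the absolute value of the correlation coefficient.
ranked = sorted(
    ((name, point_biserial(labels, vals)) for name, vals in descriptors.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, r in ranked:
    print(f"{name}: r = {r:+.3f}")
```

A negative coefficient means the descriptor tends to be higher in non-agonists, which is the direction reported for FractionCSP3 in the text.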

2.2. Prediction Performance of the Classification Models

Molecular fingerprints, including the extended-connectivity fingerprint with a diameter of 4 (ECFP4) at 2048 bits (E2048) and 1024 bits (E1024), PubChem (PUB), and Molecular ACCess System keys (MACCS), were used to train Random Forest (RF) and Light Gradient Boosting Machine (LGBM) classifiers. Graph convolutional network representations of molecules (abbreviated as Graph) were employed to train four neural network models: Graph Convolutional Network (GCN), Graph Attention Network (GAT), Message Passing Neural Network (MPNN), and AttentiveFP (AFP). In total, 24 models were constructed and compared, 12 for each of AHR and NRF2. Results from fivefold cross-validation on the training set are presented in the Supplementary Materials (Supplementary Table S4). Figure 2a lists the precision (PR), recall (RE), accuracy (ACC), F1, and area under the receiver operating characteristic curve (AUC) of the 24 models on the test set. Notably, all models achieved AUC values above 0.86, indicating strong classification performance. Among the neural network models, GAT and GCN performed the worst. Among the four molecular fingerprints, MACCS consistently yielded the lowest scores, suggesting that the MACCS fingerprint may not adequately represent molecular structures. With ACC as the selection criterion, the best-performing classification model for AHR agonists is AHR_AFP_Graph (ACC = 0.811, AUC = 0.878); for NRF2 agonists, the top model is NRF2_RF_PUB (ACC = 0.839, AUC = 0.907) (the model naming scheme is described in Section 4.2). Figure 2b displays the confusion matrices of these two top models on the test set. They show balanced sample sizes for labels 0 and 1 and consistent performance across agonists and non-agonists, indicating that the models classify both classes comparably well.

2.3. Shapley Additive Explanations of Molecular Fingerprints

To investigate which molecular structures play key roles in distinguishing agonists from non-agonists, the SHAP explainable machine learning library was employed for analysis. Since E2048 provides more comprehensive and fine-grained molecular information, and the LGBM models based on these fingerprints demonstrated superior performance (AHR_LGBM_E2048: ACC = 0.809, AUC = 0.879; NRF2_LGBM_E2048: ACC = 0.825, AUC = 0.896), AHR_LGBM_E2048 and NRF2_LGBM_E2048 were selected for subsequent analysis. Figure 3a,b illustrate molecular fingerprint bits contributing most significantly to the model. Each point represents a sample, with color indicating whether the structure corresponding to the fingerprint bit is present. The SHAP value on the x-axis indicates whether the fingerprint bit exerts a positive or negative influence on the prediction for each sample. A larger absolute value of the SHAP value signifies a greater contribution to the model. For both models, only one color appears on either side of the SHAP value of “0”, indicating that the direction of influence of these features is very clear. This further demonstrates the models’ excellent ability to classify agonists and non-agonists based on these features.
For the AHR model, samples with a “1” at bits 80, 350, 926, 1476, and 1480 tend to be predicted as non-agonists, and the SHAP values of these features are relatively concentrated, indicating a stable effect. In contrast, samples with a “1” at bits 1855, 1357, 725, 843, and 486 are more likely to be predicted as agonists, and the SHAP values of these features are more dispersed, suggesting strong interactions between features—that is, the effect of these substructures depends on the presence of other substructures. For the NRF2 model, samples with a “1” at bits 80, 926, 935, 1152, 1480, and 1722 tend to be predicted as non-agonists, whereas samples with a “1” at bits 474, 694, 656, 675, and 1088 tend to be predicted as agonists. The patterns of concentration and dispersion of SHAP values are similar to those observed in the AHR model. Across the entire dataset, substructures that clearly support non-agonist predictions occur less frequently in agonists than in non-agonists, whereas substructures that favor agonist predictions are markedly more frequent in agonists (Figure 3c,d). However, no single substructure shows an absolute predominance, indicating that predicting agonists requires consideration of a combination of various substructures as well as the complex binding sites of large molecules.
To further clarify the specific structures corresponding to these important fingerprint bits, the top five features with positive and negative contributions were visualized using the RDKit library (Figure 3e,f). Considering that the same ECFP4 fingerprint position may correspond to different substructures in different molecules, we randomly selected 40 samples in which the bit value was “1”. The results indicate that despite variations in individual samples, the vast majority of samples exhibit identical or highly similar structures at the same fingerprint bit. Therefore, the structure with the highest frequency of occurrence was selected as the representative structure for display. For AHR, highly conjugated aromatic structures such as aromatic amides, aromatic amines, imines, and nitrogen-containing heterocycles (pyrrole, pyridine) are more likely to act as agonists. For NRF2, highly conjugated aromatic structures like furan and aromatic heterocycles are more likely to act as agonists. In contrast, hydrophobic substructures formed by nonpolar alkyl side chains are unfavorable for activating AHR and NRF2, consistent with the results presented in Section 2.1.

2.4. Potential Dual Agonists of AHR and NRF2 from the Natural Compound Library

Virtual screening of the COCONUT natural compound library was performed using the two best-performing models to identify dual agonists for AHR and NRF2. Following multiple rounds of screening of 695,142 compounds, 1011 potential natural small-molecule dual agonists with drug development potential were ultimately identified. A literature search via PubMed revealed that among these 1011 compounds, 79 have been reported to activate AHR, 129 to activate NRF2, and 43 to activate both AHR and NRF2. Detailed information on all 1011 compounds is provided in the supplementary table for reference (see Dataset S1: Predicted Dual AHR/NRF2 Agonists). These compounds represent highly promising AHR/NRF2 dual agonists, warranting further experimental validation. Additionally, enrichment analysis was performed on the categories of these 1011 potential natural dual agonists compared to the entire dataset. Results indicate that natural products such as phenanthrenoids, flavonoids, xanthones, polycyclic aromatic polyketides, isoflavonoids, alkaloids, and coumarins show greater potential as dual agonists of AHR/NRF2 (Figure 4a,b). At a more granular level, Figure 4c illustrates the corresponding relationships between categories. Subgroups such as 2-arylbenzofurans among isoflavonoids, carbazole alkaloids and carboline alkaloids among alkaloids, phenanthrenes among phenanthrenoids, flavones among flavonoids, and furocoumarins among coumarins exhibit greater advantages as dual agonists for AHR/NRF2. The molecular structural features of 2-arylbenzofurans, coumarins, alkaloids, and flavones further validate the results in Section 2.3.

2.5. Experimental Validation of Dual Agonists in HaCaT Cells

Considering the availability of compounds and their anti-inflammatory potential in the context of traditional Chinese medicine, five natural compounds that have not been reported as AHR or NRF2 agonists were selected. Among them, isoimperatorin, imperatorin, and 3′-O-Methylbatatasin III have not been reported to activate AHR, while indirubin, bergapten, and 3′-O-Methylbatatasin III have not been reported to activate NRF2. Because human keratinocytes are commonly used in dermatitis research, the five compounds were validated in the HaCaT cell line. First, the effect of each compound at 10 μM on cell viability was tested. Cell viability remained above 90% in all groups, suggesting that these compounds did not exert cytotoxic effects in HaCaT cells (Figure 5a). Regarding pathway activation, indirubin, imperatorin, and 3′-O-Methylbatatasin III significantly activated both AHR and NRF2 and are reported here for the first time as dual agonists. Isoimperatorin was found for the first time to significantly activate AHR. Although the effect of bergapten was not statistically significant, it showed a numerical trend toward activating AHR and NRF2 and remains a potential dual agonist. In summary, these results demonstrate that the machine learning models established in this study have great potential for screening dual agonists.

3. Discussion

In this study, machine learning prediction models for AHR/NRF2 agonists were established, achieving classification performance with AUC values above 0.86. While some studies have reported on machine learning-based screening for AHR agonists, these studies exhibit several limitations. Zhu et al. [31] employed neural networks and RF to construct predictive models (sample size = 8164). However, due to class imbalance in the training data, the models exhibited significantly different prediction accuracies for agonists and non-agonists. Meanwhile, Wojtyło et al. [32] optimized classification algorithms (sample size = 978), but achieved a maximum accuracy of 0.76. Furthermore, neither study thoroughly investigated the influence of molecular structural features. To address these issues, this study employed a dataset with a sufficient sample size and balanced categories (sample size = 10,384, agonist:non-agonist = 1:1). The constructed classification model achieved a maximum ACC of 0.811, significantly outperforming previous studies. Furthermore, by integrating molecular descriptor analysis, this study enhanced model interpretability and achieved the first machine learning-based screening of NRF2 agonists (NRF2_RF_PUB: ACC = 0.839), providing a more reliable predictive tool for computer-aided drug design targeting this pathway.
This study investigated the molecular structural characteristics of AHR/NRF2 agonists from three perspectives: statistical differences in RDKit descriptors, ECFP4 feature importance analysis, and natural compound screening. Results indicate that dual agonists exhibit significant structural commonalities, including a low fraction of sp3-hybridized carbons, moderate hydrophobicity, limited alkyl chains, and highly conjugated structures. Collectively, the presence of saturated carbons or localized alkyl chains within aromatic compounds appears to hinder activation of both pathways. This conclusion is consistent with previous studies suggesting that AHR and NRF2 ligands require planar aromatic rings and extended π-conjugated systems—where planarity essentially reflects the high proportion of sp2-hybridized carbon atoms in the benzene ring side chains and the integrity of the conjugated structure [33,34]. Furthermore, regarding AHR agonists, SHAP analysis revealed the critical role of nitrogen-containing functional groups in their structure: these groups form hydrogen bonds to establish specific binding with the receptor [35,36]. Based on these findings, this study is the first to explicitly propose specific compound categories—aromatic amides, aromatic amines, imines, and nitrogen-containing heterocycles (e.g., pyrrole, pyridine)—as structural guidelines for AHR ligand design, thereby complementing existing theoretical frameworks.
Additionally, given the critical role of the AHR-NRF2 axis in AD treatment, a panel of natural compounds was screened for dual agonists using established models. Currently, aside from the marketed drug tapinarof, few therapeutic agents targeting the AHR/NRF2 pathway for AD have been reported. Coal tar, one of the oldest therapies, restores expression of key skin barrier proteins via AHR and activates NRF2 by dephosphorylating STAT6 to disrupt Th2 cytokine signaling. However, questions remain regarding its safety and potential carcinogenicity [20]. Difamilast treatment inhibits IL-33 activity via the AHR/NRF2 pathway, contributing to improved AD symptoms [37]. Ketoconazole (KCZ) suppresses IL-8 production and exhibits cell-protective effects mediated by the AHR/NRF2 system [38]. The only natural compound reported is triacylglycerol from the cannabis plant, which alleviates nicotinamide adenine dinucleotide phosphate oxidase 2 (NOX2)-dependent mitochondrial dysfunction and repairs the skin barrier via the AHR/NRF2 pathway, making it a promising therapeutic agent for preventing and treating AD [39]. Tapinarof, as a natural compound, demonstrates clear therapeutic efficacy, fully confirming that natural compounds targeting the AHR/NRF2 pathway possess potential therapeutic value for AD while exhibiting favorable safety profiles. This study has identified some new AHR/NRF2 dual agonists, providing more possibilities for the treatment of AD. However, this is only a small part of the screening results, and further validation and exploration of their potential for the treatment of AD are needed in the future. Although numerous research gaps remain in this field, this study proposed 2-arylbenzofurans, carbazole alkaloids, carboline alkaloids, phenanthrenes, flavones, and furocoumarins as potential dual agonists, providing crucial directional guidance and theoretical reference for subsequent research in related fields.
The rapid advancement of artificial intelligence (AI) in recent years has enabled algorithms to shine in the field of life sciences, with virtual drug screening emerging as a prevailing trend [40]. Machine learning has significantly propelled drug discovery [41]. However, the models established in this study still hold room for improvement. Targets often possess multiple binding sites that drive distinct downstream responses. For instance, AHR is commonly recognized as the receptor for dioxins and polycyclic aromatic hydrocarbons (PAHs), leading to its exclusion or abandonment in drug development [42,43]. Nevertheless, certain endogenous ligands of AHR are essential for maintaining normal physiological functions, and activation of AHR by some natural compounds can yield beneficial effects [44]. Therefore, in agonist screening, machine learning should distinguish compounds that produce different downstream effects. This, however, requires further elucidation of the AHR protein structure and its co-crystal complexes with ligands. Beyond machine learning, more intelligent algorithms are emerging with AI advancements, such as self-learning multimodal large models for DNA, RNA, and protein tasks [45]. The screening of agonists or drug candidates can evolve toward more complex and intelligent approaches in the future. This will not only better replace labor-intensive and resource-consuming experiments but also provide insights into the underlying biological mechanisms.

4. Materials and Methods

4.1. Dataset

The datasets for AHR and NRF2 agonists were both obtained from PubChem (https://pubchem.ncbi.nlm.nih.gov/bioassay/2796 (accessed on 25 December 2023) and https://pubchem.ncbi.nlm.nih.gov/bioassay/624171 (accessed on 6 January 2025)) and include the molecular formula and test result of each compound. The data were preprocessed by removing inorganic, duplicate, and inconclusive compounds. Because the focus is on small-molecule drug design, compounds with molecular weights greater than 600 were removed from the dataset. In the original dataset, the number of non-agonist samples was significantly larger than that of agonist samples, which could bias the model towards the majority class and degrade its performance on the minority class. To address this class imbalance, undersampling was performed by randomly selecting as many non-agonist samples as there were agonist samples.
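The molecular weight cutoff and random undersampling described above can be sketched as follows. This is a minimal illustration under stated assumptions: the record fields (`mol_weight`, `label`), the function name, the fixed seed, and the toy records are all hypothetical, and the original text does not say which tool or seed was used.

```python
import random


def balance_by_undersampling(records, max_mw=600.0, seed=0):
    """Drop compounds above the weight cutoff, then balance the two classes.

    The larger class is randomly undersampled to the size of the smaller one
    (in the study, non-agonists are the majority class).
    """
    kept = [r for r in records if r["mol_weight"] <= max_mw]
    agonists = [r for r in kept if r["label"] == 1]
    non_agonists = [r for r in kept if r["label"] == 0]
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_keep = min(len(agonists), len(non_agonists))
    balanced = rng.sample(agonists, n_keep) + rng.sample(non_agonists, n_keep)
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy records, not taken from the PubChem assays.
records = (
    [{"id": f"pos{i}", "mol_weight": 300.0, "label": 1} for i in range(3)]
    + [{"id": "pos_heavy", "mol_weight": 750.0, "label": 1}]
    + [{"id": f"neg{i}", "mol_weight": 250.0 + i, "label": 0} for i in range(10)]
)
balanced = balance_by_undersampling(records)
print(len(balanced), sum(r["label"] for r in balanced))
```

Undersampling discards data from the majority class, so it trades dataset size for a 1:1 class ratio; the text reports that this still left 10,384 AHR and 2484 NRF2 samples.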
After completing all preprocessing steps, the chemical space of the dataset was examined. The molecular weight of the compounds in both datasets spanned from 100 to 600 Da, and the logP values were mainly distributed between −2 and 6. This indicates good chemical diversity, coverage, and balanced distribution (Supplementary Figure S1), meeting the requirements for drug development research. The chiral simplified molecular input line entry system (SMILES) of compounds was obtained using RDKit, and various molecular descriptors were generated using DeepChem, including molecular fingerprints (E2048, E1024, PUB, and MACCS) as well as the featurizer of general graph convolution networks for molecules. To reduce redundant features, columns with zero variance were removed (Table 1). The molecular descriptors were used as features, and agonist (1) or non-agonist (0) labels as target values to build machine learning models.

4.2. Machine Learning Model Algorithms

Agonist prediction is a binary classification problem. The algorithms employed in this study include ensemble learning (RF, LGBM from Scikit-learn 1.5.1) and deep learning (GCN, GAT, MPNN, and AFP from DeepChem 2.8.0). Decision trees are well suited to classification problems, and their ensembles fall into two types: bagging and boosting. RF [46], one of the most classical bagging models, randomly samples a subset of the data and features to train each tree and combines the results by equal-weight voting. LGBM [47] is a boosting framework in which each training iteration learns from the errors of the previous ones, yielding better learners through successive refinement. Additionally, graph representation has recently emerged as a frontier. Compared with molecular fingerprints, graph notation encodes more structural information [48]. Therefore, in addition to relatively classical deep learning models like GCN [49], GAT [50], and MPNN [51], graph-based neural networks suitable for drug discovery, such as AttentiveFP (AFP), have also emerged [52]. In this study, ensemble models were trained using various molecular fingerprints, while neural network models were trained with the featurizer of general graph convolution networks for molecules (abbreviated as Graph). The naming scheme follows the format [Target]_[Algorithm]_[Feature] (e.g., AHR_RF_E2048, NRF2_GCN_Graph). A complete list of model names is provided in Supplementary Table S1.
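As a quick check on the naming scheme, the sketch below enumerates the model names implied by the [Target]_[Algorithm]_[Feature] format and confirms that they add up to 24. The constant and function names are illustrative assumptions; the authoritative list remains Supplementary Table S1.

```python
from itertools import product

TARGETS = ["AHR", "NRF2"]
ENSEMBLE_ALGOS = ["RF", "LGBM"]                    # trained on fingerprints
FINGERPRINTS = ["E2048", "E1024", "PUB", "MACCS"]
GRAPH_ALGOS = ["GCN", "GAT", "MPNN", "AFP"]        # trained on the Graph featurizer


def model_names():
    """Enumerate names in the [Target]_[Algorithm]_[Feature] format."""
    names = []
    for target in TARGETS:
        names += [f"{target}_{algo}_{fp}"
                  for algo, fp in product(ENSEMBLE_ALGOS, FINGERPRINTS)]
        names += [f"{target}_{algo}_Graph" for algo in GRAPH_ALGOS]
    return names


names = model_names()
print(len(names))   # 2 targets x (2 x 4 fingerprint models + 4 graph models)
print(names[:3])
```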

4.3. Cross-Validation and Hyperparameter Search

The dataset was stratified by class and randomly split into training, validation, and test sets at a ratio of 0.64, 0.16, and 0.2, respectively. The training and validation sets were used for hyperparameter optimization by averaging the results of 5-fold cross-validation, while the test set was employed to assess the predictive performance and generalization ability of the models. The hyperparameters of the ensemble models were automatically tuned using Optuna 3.5.0 in Python 3.10.5, while parameter selection for the deep learning models was performed via grid search. Detailed information on the parameters is provided in Supplementary Tables S2 and S3.
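The 0.64/0.16/0.2 ratio follows from holding out 20% of each class as the test set and running 5-fold cross-validation on the remaining 80%, so that each round trains on 64% and validates on 16% of the data. A minimal, dependency-free sketch of such a stratified split is given below; the function name, the seed, and the toy labels are assumptions, and the authors' actual splitting routine is not described in the text.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_frac=0.2, n_folds=5, seed=0):
    """Return (test indices, list of CV folds), stratified by class label."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    test, folds = [], [[] for _ in range(n_folds)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_frac)
        test += idxs[:n_test]
        # Deal the remaining indices of this class round-robin into the folds.
        for j, idx in enumerate(idxs[n_test:]):
            folds[j % n_folds].append(idx)
    return test, folds


# Hypothetical balanced toy labels: 50 agonists and 50 non-agonists.
labels = [1] * 50 + [0] * 50
test_idx, folds = stratified_split(labels)

for k, val_idx in enumerate(folds):
    train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
    print(f"round {k}: train={len(train_idx)} val={len(val_idx)} test={len(test_idx)}")
```

With 100 samples, each round trains on 64, validates on 16, and leaves 20 untouched for the final test, matching the stated proportions.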

4.4. Model Evaluation

For binary classification models, PR, RE, ACC, and F1 are typical indicators [53], which were integrated using the macro-averaging method. In addition, AUC was used to evaluate the classification performance of the models. For agonist classification, the threshold may not necessarily be 0.5, and the AUC is unaffected by the specific threshold, thus providing a comprehensive assessment of model performance. The formula is as follows:
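$$\mathrm{PR} = \frac{1}{n}\sum_{i=1}^{n}\frac{TP_i}{TP_i + FP_i}$$

$$\mathrm{RE} = \frac{1}{n}\sum_{i=1}^{n}\frac{TP_i}{TP_i + FN_i}$$

$$\mathrm{ACC} = \frac{\sum_{i=1}^{n} TP_i}{m}$$

$$\mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$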
where n is the number of classes and m is the total number of samples; true positives (TP) refer to samples for which both the true label and the predicted label are positive; false positives (FP) refer to samples with a negative true label but a positive predicted label; and false negatives (FN) refer to samples with a positive true label but a negative predicted label. For class i, "positive" means belonging to class i.
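The metrics defined above can be sketched in plain Python as follows. Two points are assumptions rather than statements from the text: F1 is computed here from the macro-averaged PR and RE (one reading of the formula, which can differ from averaging per-class F1 scores), and AUC is computed with the pairwise-ranking definition. The function names and toy predictions are hypothetical.

```python
def macro_metrics(y_true, y_pred, classes=(0, 1)):
    """Macro-averaged precision and recall, accuracy, and F1."""
    precisions, recalls, correct = [], [], 0
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
        correct += tp  # summing TP over classes counts all correct predictions
    pr = sum(precisions) / len(classes)
    re = sum(recalls) / len(classes)
    acc = correct / len(y_true)
    f1 = 2 * pr * re / (pr + re) if pr + re else 0.0
    return {"PR": pr, "RE": re, "ACC": acc, "F1": f1}


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy predictions.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]

metrics = macro_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, round(auc, 3))
```

Because `roc_auc` only compares the ordering of scores, it does not change when the decision threshold moves, which is the property the text relies on.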

4.5. Feature Importance Assessment

To gain insights into the structural differences between agonists and non-agonists from the model, a feature importance analysis was conducted on the best-performing model. The SHAP interpretable machine learning library can calculate the SHAP values for each sample and each feature, thereby analyzing the direction in which features influence the model’s prediction results. It is currently a widely used analysis tool [54]. Subsequently, RDKit was used to visualize the most important molecular fingerprints to further analyze the relationship between structure and activation.

4.6. Screening of Natural Compound Libraries

The natural compound library was obtained from https://coconut.naturalproducts.net/ (accessed on 8 January 2025) [55] and contains 695,142 compounds. First, duplicate entries and rows with an empty “name” field were removed. Second, rows with all null values in ‘np_classifier_class’, ‘np_classifier_superclass’, and ‘np_classifier_pathway’ were deleted, as these indicate unknown compound categories. In accordance with the calculation criteria of the COCONUT website, compounds with np-likeness ≤ 0 were removed. Considering the chemical characteristics of drugs, natural compounds with a molecular weight < 500, logP between 0 and 5, hydrogen bond acceptors < 10, hydrogen bond donors < 5, and a CAS number were selected, totaling 17,159. The optimal-performing models were then used to screen these compounds for AHR/NRF2 dual agonists.
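The filtering criteria above can be expressed as a single predicate, sketched below. Apart from the three `np_classifier_*` column names quoted in the text, every field name (`name`, `np_likeness`, `mol_weight`, `logp`, `hba`, `hbd`, `cas`) is a hypothetical stand-in for the corresponding COCONUT column, the logP bounds are assumed to be inclusive, and the toy records are invented for illustration.

```python
CLASSIFIER_FIELDS = (
    "np_classifier_class",
    "np_classifier_superclass",
    "np_classifier_pathway",
)


def passes_filters(c):
    """Apply the annotation, np-likeness, and drug-likeness criteria."""
    has_name = bool(c.get("name"))
    has_category = any(c.get(f) is not None for f in CLASSIFIER_FIELDS)
    return (
        has_name
        and has_category
        and c["np_likeness"] > 0        # compounds with np-likeness <= 0 removed
        and c["mol_weight"] < 500
        and 0 <= c["logp"] <= 5         # inclusive bounds are an assumption
        and c["hba"] < 10               # hydrogen bond acceptors
        and c["hbd"] < 5                # hydrogen bond donors
        and bool(c.get("cas"))          # a CAS number must be present
    )


# Hypothetical toy records, not real COCONUT entries.
compounds = [
    {"name": "compound_A", "np_classifier_class": "Flavones",
     "np_classifier_superclass": None, "np_classifier_pathway": None,
     "np_likeness": 1.2, "mol_weight": 270.2, "logp": 2.6,
     "hba": 5, "hbd": 3, "cas": "0000-00-0"},
    {"name": "compound_B", "np_classifier_class": "Flavones",
     "np_classifier_superclass": None, "np_classifier_pathway": None,
     "np_likeness": 0.8, "mol_weight": 612.5, "logp": 3.1,
     "hba": 8, "hbd": 4, "cas": "0000-00-1"},
    {"name": "compound_C", "np_classifier_class": None,
     "np_classifier_superclass": None, "np_classifier_pathway": None,
     "np_likeness": 1.5, "mol_weight": 310.0, "logp": 1.9,
     "hba": 4, "hbd": 2, "cas": "0000-00-2"},
]

kept = [c["name"] for c in compounds if passes_filters(c)]
print(kept)
```

In this toy set, compound_B fails the molecular weight cutoff and compound_C has no category annotation, so only compound_A would be passed on to the prediction models.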

4.7. Cell Culture and Reagents

The HaCaT cell line was originally purchased from the National Collection of Authenticated Cell Cultures (Beijing, China). Cells were grown in MEM medium (Gibco, Grand Island, NY, USA) supplemented with 10% fetal bovine serum (FBS) (Corning, New York, NY, USA) and 1% penicillin/streptomycin (Gibco, Grand Island, NY, USA) at 37 °C in a humidified atmosphere of 5% CO2. Indirubin was purchased from MedChemExpress (Monmouth Junction, NJ, USA). Imperatorin, isoimperatorin, 3′-O-Methylbatatasin III, and bergapten were purchased from Targetmol (Boston, MA, USA). TBHQ was purchased from Solarbio (Beijing, China).

4.8. Cell Viability Assay

Cell viability was determined using the Cell Counting Kit-8 (CCK-8) assay according to the manufacturer’s instructions (Sangon, Shanghai, China). Briefly, HaCaT cells were seeded in 96-well plates (2 × 10⁴ cells/well) and allowed to adhere overnight. Then, cells were treated with different natural compounds for 24 h. A total of 10 μL of CCK-8 reagent was added to each well, and cells were incubated at 37 °C for 1.5 h. The absorbance was measured at 450 nm using a multimode microplate reader (Tecan, Männedorf, Switzerland). The absorbance of cells in the control group was regarded as 100% cell survival. All experiments were carried out three times in four parallel wells.

4.9. Dual Luciferase Reporter Gene Assay

Cells were seeded in 96-well plates at a density of 2 × 10⁴ cells per well and cultured overnight to allow complete adhesion. The cells were then transfected with the pGL4.43 [luc2P/XRE/Hygro] Vector (Promega, Madison, WI, USA) or the pGL4.37 [luc2P/ARE/Hygro] Vector (Promega, Madison, WI, USA) together with the pRL-TK Vector (Promega, Madison, WI, USA) using Lipofectamine® LTX & PLUS™ Reagent (Invitrogen, Carlsbad, CA, USA) according to the manufacturer’s instructions. Twenty-four hours later, the cells were treated with the natural compounds. After a further 24 h, luminescence was measured in a Promega GloMax-Multi microplate reader (Promega, Madison, WI, USA) using the Dual Luciferase Reporter Assay System (Promega, Madison, WI, USA). All experiments were performed three times, each with four parallel wells.
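The text does not spell out how the two luminescence readings were combined. A common convention for dual luciferase assays, assumed here purely for illustration, is to divide the firefly signal (XRE or ARE reporter) by the Renilla signal (pRL-TK) in each well and then express the result as fold induction over the vehicle control. The function names below are hypothetical.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Reading = Tuple[float, float]  # (firefly, renilla) luminescence for one well


def relative_luciferase(firefly: float, renilla: float) -> float:
    """Firefly signal normalized to the co-transfected Renilla control."""
    if renilla <= 0:
        raise ValueError("Renilla signal must be positive")
    return firefly / renilla


def fold_induction(
    treated: Sequence[Reading], control: Sequence[Reading]
) -> List[float]:
    """Fold change of each treated well over the mean control ratio."""
    control_mean = mean(relative_luciferase(f, r) for f, r in control)
    return [relative_luciferase(f, r) / control_mean for f, r in treated]
```

Normalizing to Renilla in this way is intended to correct for well-to-well differences in transfection efficiency and cell number before treatments are compared.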

4.10. Statistics

All experimental results are shown as the means ± SEMs. Statistical analyses were performed using GraphPad Prism version 9.0 (GraphPad Software, San Diego, CA, USA). Statistical significance among the different groups was tested by one-way analysis of variance, and p < 0.05 indicated statistical significance.
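For readers who want to reproduce the summary statistics outside GraphPad Prism, the sketch below computes the mean ± SEM and the one-way ANOVA F statistic using only the Python standard library. It is an illustration with hypothetical function names, not the authors' workflow; converting F to a p value requires the F-distribution, which is not part of the standard library and is therefore left out.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def mean_sem(values: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, standard error of the mean) using the sample SD."""
    return mean(values), stdev(values) / sqrt(len(values))


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([v for g in groups for v in g])
    group_means = [mean(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

The resulting F statistic and its two degrees of freedom can be looked up in an F table or passed to a statistics package to obtain the p value that is compared against the 0.05 threshold.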

5. Conclusions

This study established machine learning models for predicting dual AHR/NRF2 agonists, which showed good classification performance. Based on these models, we identified key molecular features of dual agonists, namely a low fraction of sp3-hybridized carbons, moderate hydrophobicity, limited alkyl chains, and highly conjugated structures. Using the established models, 1011 potential AHR/NRF2 dual agonists were identified in a natural compound library as candidates for further study. For validation, indirubin, imperatorin, and 3′-O-Methylbatatasin III were shown for the first time to act as AHR/NRF2 dual agonists in HaCaT cells. Overall, this study supports the screening of AD therapeutics. However, as AHR and NRF2 represent key pathways covered by Tox21, further investigation into their ligand characteristics is warranted to better leverage them as disease targets.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms27083530/s1.

Author Contributions

Y.Z.: investigation, methodology, software, data curation, visualization, writing – original draft; Q.L.: conceptualization, investigation; X.H.: validation, investigation; X.L.: formal analysis, investigation; Z.S.: investigation; H.Q.X.: project administration; B.Z.: supervision; L.X.: supervision, funding acquisition, writing – review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Key Research and Development Program of China (2024YFA0918802).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article/Supplementary Materials. Further inquiries can be directed to the corresponding author.

Acknowledgments

We thank Tianlong Wang for his guidance in machine learning and we thank Artemis Biologics Limited for their support.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have influenced the work reported in this paper.

Abbreviations

The following abbreviations are used in this manuscript:
AHR: Aryl Hydrocarbon Receptor
NRF2: Nuclear Factor Erythroid 2-Related Factor 2
NQO1: NAD(P)H Quinone Dehydrogenase 1
HO-1: Heme Oxygenase-1
GST: Glutathione S-Transferase
CAT: Catalase
SOD: Superoxide Dismutase
CYP1A1: Cytochrome P450 Family 1 Subfamily A Member 1
IL-4, 8, 13, 33: Interleukin-4, 8, 13, 33
FLG: Filaggrin
STAT6: Signal Transducer and Activator of Transcription 6
GABAA: Gamma-Aminobutyric Acid Type A Receptor
GPCR: G Protein-Coupled Receptor
PPAR-δ: Peroxisome Proliferator-Activated Receptor Delta
SMILES: Simplified Molecular Input Line Entry System
ECFP4: Extended-Connectivity Fingerprint with a diameter of 4
MACCS: Molecular ACCess System keys
NOX2: Nicotinamide Adenine Dinucleotide Phosphate Oxidase 2

References

  1. Esser, C.; Bargen, I.; Weighardt, H.; Haarmann-Stemmann, T.; Krutmann, J. Functions of the aryl hydrocarbon receptor in the skin. Semin. Immunopathol. 2013, 35, 677–691.
  2. Alalaiwe, A.; Lin, Y.K.; Lin, C.H.; Wang, P.W.; Lin, J.Y.; Fang, J.Y. The absorption of polycyclic aromatic hydrocarbons into the skin to elicit cutaneous inflammation: The establishment of structure-permeation and In Silico-In Vitro-In Vivo relationships. Chemosphere 2020, 255, 126955.
  3. Tsuji, G.; Hashimoto-Hachiya, A.; Kiyomatsu-Oda, M.; Takemura, M.; Ohno, F.; Ito, T.; Morino-Koga, S.; Mitoma, C.; Nakahara, T.; Uchi, H.; et al. Aryl hydrocarbon receptor activation restores filaggrin expression via OVOL1 in atopic dermatitis. Cell Death Dis. 2017, 8, e2931.
  4. Chen, Z.; Dragan, M.; Sun, P.; Haensel, D.; Vu, R.; Cui, L.; Zhu, P.; Yang, N.; Shi, Y.; Dai, X. The AhR-Ovol1-Id1 regulatory axis in keratinocytes promotes epidermal and immune homeostasis in atopic dermatitis-like skin inflammation. Cell. Mol. Immunol. 2025, 22, 300–315.
  5. Abbas, S.; Alam, S.; Singh, K.P.; Kumar, M.; Gupta, S.K.; Ansari, K.M. Aryl Hydrocarbon Receptor Activation Contributes to Benzanthrone-Induced Hyperpigmentation via Modulation of Melanogenic Signaling Pathways. Chem. Res. Toxicol. 2017, 30, 625–634.
  6. Fernández-Gallego, N.; Sánchez-Madrid, F.; Cibrian, D. Role of AHR Ligands in Skin Homeostasis and Cutaneous Inflammation. Cells 2021, 10, 3176.
  7. Dawe, H.R.; Di Meglio, P. The Aryl Hydrocarbon Receptor (AHR): Peacekeeper of the Skin. Int. J. Mol. Sci. 2025, 26, 1618.
  8. Salman, S.; Paulet, V.; Hardonnière, K.; Kerdine-Römer, S. The role of NRF2 transcription factor in inflammatory skin diseases. BioFactors 2025, 51, e70013.
  9. Liu, H.; Hu, Y.; Ji, W.; Wang, S.; Zhu, Y.; Lin, Q.; Zhao, X.; Zhou, H.; Guo, X.; Liu, Y.; et al. Sustained Activation of Nrf2 Antioxidant Pathway by Flexible Liposome Based on Low Phase Transition Temperature to Delay Skin Aging. Adv. Healthc. Mater. 2026, 15, e01696.
  10. Park, C.; Lee, H.; Noh, J.S.; Jin, C.Y.; Kim, G.Y.; Hyun, J.W.; Leem, S.H.; Choi, Y.H. Hemistepsin A protects human keratinocytes against hydrogen peroxide-induced oxidative stress through activation of the Nrf2/HO-1 signaling pathway. Arch. Biochem. Biophys. 2020, 691, 108512.
  11. Lu, Y.; Wei, W.; Li, M.; Chen, D.; Li, W.; Hu, Q.; Dong, S.; Liu, L.; Zhao, Q. The USP11/Nrf2 positive feedback loop promotes colorectal cancer progression by inhibiting mitochondrial apoptosis. Cell Death Dis. 2024, 15, 873.
  12. Liang, J.; Lian, L.; Wang, X.; Li, L. Thymoquinone, extract from Nigella sativa seeds, protects human skin keratinocytes against UVA-irradiated oxidative stress, inflammation and mitochondrial dysfunction. Mol. Immunol. 2021, 135, 21–27.
  13. Ho, C.C.; Ng, S.C.; Chuang, H.L.; Wen, S.Y.; Kuo, C.H.; Mahalakshmi, B.; Huang, C.Y.; Kuo, W.W. Extracts of Jasminum sambac flowers fermented by Lactobacillus rhamnosus inhibit H2O2- and UVB-induced aging in human dermal fibroblasts. Environ. Toxicol. 2021, 36, 607–619.
  14. Li, Q.; Bai, D.; Qin, L.; Shao, M.; Zhang, S.; Yan, C.; Yu, G.; Hao, J. Protective effect of d-tetramannuronic acid tetrasodium salt on UVA-induced photo-aging in HaCaT cells. Biomed. Pharmacother. 2020, 126, 110094.
  15. Zhong, Q.Y.; Luo, Q.H.; Lin, B.; Lin, B.Q.; Su, Z.R.; Zhan, J.Y. Protective effects of andrographolide sodium bisulfate on UV-induced skin carcinogenesis in mice model. Eur. J. Pharm. Sci. 2022, 176, 106232.
  16. Köhle, C.; Bock, K.W. Activation of coupled Ah receptor and Nrf2 gene batteries by dietary phytochemicals in relation to chemoprevention. Biochem. Pharmacol. 2006, 72, 795–805.
  17. Marchand, A.; Barouki, R.; Garlatti, M. Regulation of NAD(P)H:quinone oxidoreductase 1 gene expression by CYP1A1 activity. Mol. Pharmacol. 2004, 65, 1029–1037.
  18. Hwang, J.; Newton, E.M.; Hsiao, J.; Shi, V.Y. Aryl hydrocarbon receptor/nuclear factor E2-related factor 2 (AHR/NRF2) signalling: A novel therapeutic target for atopic dermatitis. Exp. Dermatol. 2022, 31, 485–497.
  19. Furue, M.; Tsuji, G.; Mitoma, C.; Nakahara, T.; Chiba, T.; Morino-Koga, S.; Uchi, H. Gene regulation of filaggrin and other skin barrier proteins via aryl hydrocarbon receptor. J. Dermatol. Sci. 2015, 80, 83–88.
  20. van den Bogaard, E.H.; Bergboer, J.G.; Vonk-Bergers, M.; van Vlijmen-Willems, I.M.; Hato, S.V.; van der Valk, P.G.; Schröder, J.M.; Joosten, I.; Zeeuwen, P.L.; Schalkwijk, J. Coal tar induces AHR-dependent skin barrier repair in atopic dermatitis. J. Clin. Investig. 2013, 123, 917–927.
  21. Furue, M. Regulation of Filaggrin, Loricrin, and Involucrin by IL-4, IL-13, IL-17A, IL-22, AHR, and NRF2: Pathogenic Implications in Atopic Dermatitis. Int. J. Mol. Sci. 2020, 21, 5382.
  22. Nielsen, J.C.; Hjørringgaard, C.; Nygaard, M.M.R.; Wester, A.; Elster, L.; Porsgaard, T.; Mikkelsen, R.B.; Rasmussen, S.; Madsen, A.N.; Schlein, M.; et al. Machine-Learning-Guided Peptide Drug Discovery: Development of GLP-1 Receptor Agonists with Improved Drug Properties. J. Med. Chem. 2024, 67, 11814–11826.
  23. Xiao, F.; Ding, X.; Shi, Y.; Wang, D.; Wang, Y.; Cui, C.; Zhu, T.; Chen, K.; Xiang, P.; Luo, X. Application of ensemble learning for predicting GABA(A) receptor agonists. Comput. Biol. Med. 2024, 169, 107958.
  24. Yang, J.; Cai, Y.; Zhao, K.; Xie, H.; Chen, X. Concepts and applications of chemical fingerprint for hit and lead screening. Drug Discov. Today 2022, 27, 103356.
  25. Li, Z.; Huang, R.; Xia, M.; Patterson, T.A.; Hong, H. Fingerprinting Interactions between Proteins and Ligands for Facilitating Machine Learning in Drug Discovery. Biomolecules 2024, 14, 72.
  26. Zorn, K.M.; Foil, D.H.; Lane, T.R.; Russo, D.P.; Hillwalker, W.; Feifarek, D.J.; Jones, F.; Klaren, W.D.; Brinkman, A.M.; Ekins, S. Machine Learning Models for Estrogen Receptor Bioactivity and Endocrine Disruption Prediction. Environ. Sci. Technol. 2020, 54, 12202–12213.
  27. Jabeen, A.; Ranganathan, S. Applications of machine learning in GPCR bioactive ligand discovery. Curr. Opin. Struct. Biol. 2019, 55, 66–76.
  28. Da’adoosh, B.; Marcus, D.; Rayan, A.; King, F.; Che, J.; Goldblum, A. Discovering highly selective and diverse PPAR-delta agonists by ligand based machine learning and structural modeling. Sci. Rep. 2019, 9, 1106.
  29. Zhang, Z.; Pan, F.; Chen, Q.; Guo, T.; Song, H. Decoding the quantitative structure-activity relationship and astringency formation mechanism of oxygenated aromatic compounds. Food Res. Int. 2025, 210, 116421.
  30. Wu, S.; Wang, L.; Schlenk, D.; Liu, J. Machine Learning-Based Toxicological Modeling for Screening Environmental Obesogens. Environ. Sci. Technol. 2024, 58, 18133–18144.
  31. Zhu, K.; Shen, C.; Tang, C.; Zhou, Y.; He, C.; Zuo, Z. Improvement in the screening performance of potential aryl hydrocarbon receptor ligands by using supervised machine learning. Chemosphere 2021, 265, 129099.
  32. Wojtyło, P.A.; Łapińska, N.; Bellagamba, L.; Camaioni, E.; Mendyk, A.; Giovagnoli, S. Initial development of automated machine learning-assisted prediction tools for aryl hydrocarbon receptor activators. Pharmaceutics 2024, 16, 1456.
  33. Gruszczyk, J.; Grandvuillemin, L.; Lai-Kee-Him, J.; Paloni, M.; Savva, C.G.; Germain, P.; Grimaldi, M.; Boulahtouf, A.; Kwong, H.-S.; Bous, J. Cryo-EM structure of the agonist-bound Hsp90-XAP2-AHR cytosolic complex. Nat. Commun. 2022, 13, 7010.
  34. Jiang, Z.-Y.; Xu, L.L.; Lu, M.-C.; Chen, Z.-Y.; Yuan, Z.-W.; Xu, X.-L.; Guo, X.-K.; Zhang, X.-J.; Sun, H.-P.; You, Q.-D. Structure–activity and structure–property relationship and exploratory in vivo evaluation of the nanomolar Keap1–Nrf2 protein–protein interaction inhibitor. J. Med. Chem. 2015, 58, 6410–6421.
  35. Diao, X.; Shang, Q.; Guo, M.; Huang, Y.; Zhang, M.; Chen, X.; Liang, Y.; Sun, X.; Zhou, F.; Zhuang, J. Structural basis for the ligand-dependent activation of heterodimeric AHR-ARNT complex. Nat. Commun. 2025, 16, 1282.
  36. Kwong, H.-S.; Paloni, M.; Grandvuillemin, L.; Sirounian, S.; Ancelin, A.; Lai-Kee-Him, J.; Grimaldi, M.; Carivenc, C.; Lancey, C.; Ragan, T. Structural insights into the activation of human aryl hydrocarbon receptor by the environmental contaminant Benzo[a]pyrene and structurally related compounds. J. Mol. Biol. 2024, 436, 168411.
  37. Tsuji, G.; Yumine, A.; Kawamura, K.; Takemura, M.; Kido-Nakahara, M.; Yamamura, K.; Nakahara, T. Difamilast, a Topical Phosphodiesterase 4 Inhibitor, Produces Soluble ST2 via the AHR–NRF2 Axis in Human Keratinocytes. Int. J. Mol. Sci. 2024, 25, 7910.
  38. Tsuji, G.; Takahara, M.; Uchi, H.; Matsuda, T.; Chiba, T.; Takeuchi, S.; Yasukawa, F.; Moroi, Y.; Furue, M. Identification of ketoconazole as an AhR-Nrf2 activator in cultured human keratinocytes: The basis of its anti-inflammatory effect. J. Investig. Dermatol. 2012, 132, 59–68.
  39. Wang, Y.; Lu, H.; Cheng, L.; Guo, W.; Hu, Y.; Du, X.; Liu, X.; Xu, M.; Liu, Y.; Zhang, Y. Targeting mitochondrial dysfunction in atopic dermatitis with trilinolein: A triacylglycerol from the medicinal plant Cannabis fructus. Phytomedicine 2024, 132, 155856.
  40. Chen, Y.; Huang, J.-H.; Sun, Y.; Zhang, Y.; Li, Y.; Xu, X. Haplotype-resolved assembly of diploid and polyploid genomes using quantum computing. Cell Rep. Methods 2024, 4, 100754.
  41. Patel, L.; Shukla, T.; Huang, X.; Ussery, D.W.; Wang, S. Machine learning methods in drug discovery. Molecules 2020, 25, 5277.
  42. Zhang, W.; Xie, H.Q.; Li, Y.; Zhou, M.; Zhou, Z.; Wang, R.; Hahn, M.E.; Zhao, B. The aryl hydrocarbon receptor: A predominant mediator for the toxicity of emerging dioxin-like compounds. J. Hazard. Mater. 2022, 426, 128084.
  43. d’Anna, B.; Albinet, A.; Aït-Aïssa, S. In vitro assessment of aryl hydrocarbon, estrogen, and androgen receptor-mediated activities of secondary organic aerosols formed from the oxidation of polycyclic aromatic hydrocarbons and furans. Environ. Res. 2025, 273, 121220.
  44. Polonio, C.M.; McHale, K.A.; Sherr, D.H.; Rubenstein, D.; Quintana, F. The aryl hydrocarbon receptor: A rehabilitated target for therapeutic immune modulation. Nat. Rev. Drug Discov. 2025, 24, 610–630.
  45. de Almeida, B.P.; Richard, G.; Dalla-Torre, H.; Blum, C.; Hexemer, L.; Pandey, P.; Laurent, S.; Rajesh, C.; Lopez, M.; Laterre, A. A multimodal conversational agent for DNA, RNA and protein tasks. Nat. Mach. Intell. 2025, 7, 928–941.
  46. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  47. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. Lightgbm: A highly efficient gradient boosting decision tree. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017.
  48. An, X.; Chen, X.; Yi, D.; Li, H.; Guan, Y. Representation of molecules for drug response prediction. Brief. Bioinf. 2022, 23, bbab393.
  49. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. In Proceedings of the International Conference on Learning Representations, San Juan, Puerto Rico, 2–4 May 2016.
  50. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018.
  51. Gilmer, J.; Schoenholz, S.S.; Riley, P.F.; Vinyals, O.; Dahl, G.E. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning; PMLR: Cambridge, MA, USA, 2017; pp. 1263–1272.
  52. Xiong, Z.; Wang, D.; Liu, X.; Zhong, F.; Wan, X.; Li, X.; Li, Z.; Luo, X.; Chen, K.; Jiang, H.; et al. Pushing the Boundaries of Molecular Representation for Drug Discovery with the Graph Attention Mechanism. J. Med. Chem. 2020, 63, 8749–8760.
  53. Anh, P.T.Q.; Thuyet, D.Q.; Kobayashi, Y. Image classification of root-trimmed garlic using multi-label and multi-class classification with deep convolutional neural network. Postharvest Biol. Technol. 2022, 190, 111956.
  54. Lundberg, S.M.; Lee, S.-I. A unified approach to interpreting model predictions. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017.
  55. Chandrasekhar, V.; Rajan, K.; Kanakam, S.R.S.; Sharma, N.; Weißenborn, V.; Schaub, J.; Steinbeck, C. COCONUT 2.0: A comprehensive overhaul and curation of the collection of open natural products database. Nucleic Acids Res. 2024, 53, D634–D643.
Figure 1. Distribution of the top five RDKit descriptors with the highest point-biserial correlation to AHR and NRF2 agonist activity. (a) Violin plots showing the distribution of the top five descriptors (FractionCSP3, SlogP_VSA3, BCUT2D_LOGPLOW, BCUT2D_CHGLO, SlogP_VSA2) for AHR agonists (red) and non-agonists (blue); (b) Violin plots showing the distribution of the top five descriptors (FractionCSP3, BCUT2D_LOGPLOW, SlogP_VSA2, Chi0n, Chi1n) for NRF2 agonists (red) and non-agonists (blue). The black box inside each violin represents the interquartile range, with the vertical line indicating the median value. The shape of the violin reflects the kernel density estimate of the data distribution, highlighting differences in central tendency and spread between agonist and non-agonist groups.
Figure 2. Evaluation of classification performance of models for AHR and NRF2 agonist prediction. (a) Performance metrics (Accuracy (ACC), Precision (PR), Recall (RE), F1-score (F1), and Area Under the ROC Curve (AUC)) of multiple machine learning models. Numerical values are labeled to compare model performance across metrics. (b) Confusion matrices for the top-performing models (AHR_AFP_Graph and NRF2_RF_PUB), showing the counts of true negatives, false positives, false negatives, and true positives. Color intensity reflects sample size. Models include random forest (RF), light gradient boosting machine (LGBM), and graph neural networks (GCN, GAT, MPNN, AFP) using different molecular fingerprints (E2048, E1024, MACCS, PUB) or graph representations.
Figure 3. SHAP analysis and visualization of key ECFP4 fingerprint bits driving AHR and NRF2 agonist predictions. (a) SHAP summary plot for the AHR_LGBM_E2048 model, showing the impact of the top 10 most influential ECFP4 fingerprint bits on the model output. The x-axis represents SHAP values (positive values promote agonist prediction, negative values promote non-agonist prediction), while the y-axis lists ECFP4 bit indices. Color coding indicates the ECFP4 bit value (red = 1, bit present; blue = 0, bit absent), revealing how each fingerprint feature influences model decisions. (b) SHAP summary plot for the NRF2_LGBM_E2048 model, formatted identically to panel (a), highlighting the top 10 ECFP4 bits driving NRF2 agonist/non-agonist predictions. (c) Bar chart of ECFP4 bit occurrence frequencies in AHR agonists (orange) and non-agonists (grey) for the top 10 influential bits identified in panel (a). (d) Bar chart of ECFP4 bit occurrence frequencies in NRF2 agonists (orange) and non-agonists (grey) for the top 10 influential bits identified in panel (b). (e) 2D molecular visualization of the top five ECFP4 bits supporting non-AHR-agonist predictions (top row) and AHR-agonist predictions (bottom row) in the AHR_LGBM_E2048 model. Atoms are colored by type: blue = central atom, yellow = aromatic atoms, gray = aliphatic atoms. Asterisks (*) denote omitted portions of the molecular substructure. (f) 2D molecular visualization of the top five ECFP4 bits supporting non-NRF2-agonist predictions (top row) and NRF2-agonist predictions (bottom row) in the NRF2_LGBM_E2048 model, using the same atom coloring scheme as panel (e) to highlight relevant chemical substructures.
Figure 4. Enrichment analysis of superclass (a) and class (b) for potential AHR/NRF2 dual-agonist natural compounds, along with their hierarchical correspondence (c). The enrichment ratio represents the proportion of compounds in this category within the 1011 compounds relative to that in the entire dataset (NP: natural products).
Figure 5. Validation of five natural compounds in HaCaT cells. (a) Cell viability test; (b) Detection of NRF2 pathway activation using the dual luciferase reporter gene assay, with TBHQ as a positive control; (c) Detection of AHR pathway activation using the dual luciferase reporter gene assay, with indirubin as a positive control. All compounds were tested at 10 μM. Data are presented as mean ± SEM. Statistical significance is shown as * p < 0.05, ** p < 0.01, and *** p < 0.001, as evaluated by one-way ANOVA.
Table 1. The lengths of molecular fingerprints used in this work.
Molecular Fingerprint | AHR: Length (Bits) | AHR: Length After FS (Bits) | NRF2: Length (Bits) | NRF2: Length After FS (Bits)
E2048 | 2048 | 2048 | 2048 | 2040
E1024 | 1024 | 1024 | 1024 | 1024
MACCS | 167 | 153 | 167 | 154
PUB | 881 | 633 | 881 | 616
Note: FS stands for feature selection.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhen, Y.; Li, Q.; Hu, X.; Liu, X.; Shao, Z.; Xie, H.Q.; Zhao, B.; Xu, L. AHR/NRF2 Dual Agonist Prediction and Natural Compound Screening Based on Machine Learning: A New Strategy for the Treatment of Atopic Dermatitis. Int. J. Mol. Sci. 2026, 27, 3530. https://doi.org/10.3390/ijms27083530

