Article

Uncovering Structure–Conductivity Relationships in Anion Exchange Membranes (AEMs) Using Interpretable Machine Learning

1
Department of Chemical Engineering, Louisiana State University, Baton Rouge, LA 70803, USA
2
Department of Chemistry, Louisiana State University, Baton Rouge, LA 70803, USA
3
Department of Computer Science, Louisiana State University, Baton Rouge, LA 70803, USA
*
Author to whom correspondence should be addressed.
Membranes 2026, 16(1), 12; https://doi.org/10.3390/membranes16010012
Submission received: 31 October 2025 / Revised: 14 December 2025 / Accepted: 17 December 2025 / Published: 31 December 2025
(This article belongs to the Special Issue Design, Synthesis and Applications of Ion Exchange Membranes)

Abstract

Anion exchange membranes (AEMs) play a vital role in the performance of water electrolyzers and fuel cells, yet their discovery and optimization remain challenging due to the complexity of structure–property relationships. In this study, we introduce a machine learning framework that combines graph neural networks, descriptor-based models, and a Hybrid Graph Autoencoder-Regressor Ensemble (HGARE) to predict and interpret ionic conductivity. The descriptor-based pipeline employs principal component analysis (PCA), ablation, and SHAP analysis to identify the factors governing anion conductivity, revealing electronic, topological, and compositional descriptors as key contributors. Beyond prediction, dimensionality reduction and clustering are performed with t-SNE, KMeans, and self-organizing maps (SOMs), which reveal distinct membrane clusters, some of which are enriched in high anion conductivity. Among the graph-based approaches, the graph convolutional network (GCN) achieved strong predictive performance, while HGARE achieved the highest accuracy. Additionally, atom-level saliency maps from the GCN provide spatial explanations for conductive behavior, revealing the importance of polarizable and flexible regions. This work contributes to the accelerated, data-driven design of high-performance AEMs.

1. Introduction

Anion exchange membranes (AEMs) play a crucial role in electrochemical energy systems, such as alkaline fuel cells and water electrolyzers, by facilitating the selective transport of hydroxide ions (OH⁻). Their development offers significant advantages over proton exchange membranes by enabling the use of non-noble metal catalysts and cost-effective cell components [1]. However, limitations such as lower hydroxide ion mobility, instability of cationic groups in alkaline media, and insufficient mechanical durability pose ongoing challenges [2,3,4].
Traditionally, progress in the development of anion exchange membranes for electrochemical energy conversion and separation technologies has been achieved primarily through trial-and-error experimentation. In these workflows, new polymer backbones, side chains, or cationic headgroups are systematically synthesized and subsequently evaluated for their electrochemical performance. Such workflows are time-consuming and inefficient given the vast design space of potential polymers; notably, even minor changes in molecular architecture can influence ion conductivity and membrane swelling [3]. The identification of informative molecular features is therefore critical for predictive accuracy [5]; the Mordred package, for example, generates thousands of physicochemical, topological, and electronic features that capture subtle structural variations [6]. To address these challenges, machine learning (ML) has emerged as a powerful approach for predicting key polymer properties and guiding material discovery. Recent developments demonstrate how deep learning models are successfully applied across domains ranging from polymer chemistry to medical diagnostics [7,8,9].
ML and data-driven modeling are increasingly applied in materials science [10,11,12], enabling the accelerated screening and prediction of polymer properties. Data-driven models can extract complex structure–property relationships from high-dimensional molecular descriptors and enable efficient design of high-performance AEMs [1,2,13]. ML models can learn complex structure–property relationships from existing data, enabling researchers to screen and optimize new AEM candidates virtually before synthesis [14]. Zhai et al. developed a deep learning model to predict OH⁻ conductivity in poly(2,6-dimethyl phenylene oxide)-based AEMs, achieving high accuracy across different functional group chemistries [15]. Similarly, Phua et al. applied explainable ML to jointly predict conductivity and alkaline stability, identifying compositional and topological features that distinguish high-performance materials [16]. In related work, MOF-based membranes with tailored hydrophilic/hydrophobic balance and improved mechanical robustness demonstrate the significance of combining stability with next-generation AEM design [17]. For further discussion of topological design in materials, see Deng et al. [18].
This field is also moving towards sustainability-driven polymer design. Most commercial AEMs rely on fluorinated backbones or unstable cationic groups. To overcome these issues, Schertzer et al. employed ML to design fluorine-free AEMs that balance high OH⁻ conductivity with low swelling and high alkaline stability. Their virtual screening campaign identified over 400 high-performing, fluorine-free candidates from millions of hypothetical polymers [19]. These findings align with the recommendations of Yassin et al., who emphasize that future AEMs must achieve a balance between conductivity, stability, and durability to meet fuel cell operation targets [2]. Together, these works illustrate the growing convergence of polymer chemistry, data science, and explainable ML.
Graph neural networks (GNNs) such as graph convolutional networks (GCNs) [20,21,22], have shown promise in polymer informatics due to their ability to process molecular graphs rather than precomputed descriptors [23,24]. This capability is rooted in advances in message-passing neural networks that directly learn atom-level features from graph topology, which have been widely applied in molecular property prediction [25,26]. This allows models to directly learn from atomic and bonding information in the polymer backbone. Liu et al. used augmented GCN models to screen copolymers for AEM applications, ultimately guiding the synthesis of membranes that achieved high conductivity and favorable water uptake [27].
Despite the success of ML in property prediction, interpretability remains a key challenge. SHAP quantifies the contribution of each input feature to a given prediction, enabling the identification of key descriptors associated with high or low conductivity [28]. In a recent study, Dalal et al. used SHAP and Bayesian optimization to guide the design of gene delivery polymers, showing how model interpretations can uncover actionable chemical insights [29]. In the context of AEMs, SHAP analysis by Phua et al. revealed that some features were strongly associated with enhanced ion transport.
In addition to molecular descriptors and graph representations, we also considered several system-level design parameters that strongly influence anion conductivity but are not directly encoded in SMILES or molecular graphs. Specifically, the Block A fraction represents the hydrophilic, ion-conducting segment that enhances hydroxide ion transport through water uptake, while the Block B fraction corresponds to the hydrophobic, mechanically stabilizing domain that improves structural robustness and mitigates excessive swelling. Their ratio (Block A/B) is a critical trade-off parameter: higher Block A content facilitates conductivity but risks mechanical instability, whereas higher Block B content improves durability but restricts ion mobility. We further distinguished between polymer types (copolymer vs. homopolymer). Finally, all conductivity measurements were taken at 80 °C, a widely accepted benchmarking condition that isolates structure–property relationships and eliminates temperature variability in ion transport.
We further employed saliency mapping [30,31] on GNNs to directly visualize which atoms and substructures contribute most strongly to the predicted hydroxide conductivity. Unsupervised learning techniques are increasingly applied in polymer science to explore chemical space and uncover latent structural patterns. Dimensionality reduction algorithms such as t-SNE [32] and UMAP [33], when combined with clustering algorithms like DBSCAN [34], can reveal subpopulations of structurally or functionally similar membranes. Phua et al. used dimensionality reduction and clustering techniques to map AEM structures into distinct clusters and showed that specific clusters were enriched in high-conductivity polymers [35]. Similarly, Ehiro et al. developed a method to interpret clusters using feature importance analysis, linking each chemical space region to dominant molecular characteristics [36].
Despite recent progress, a unified framework that integrates descriptor-based interpretability with graph-based generalization and unsupervised pattern discovery for AEMs remains underexplored. Building on these foundations, this study introduces a multi-scale interpretable ML framework for predicting and explaining anion conductivity in AEMs. Our approach integrates descriptor-based modeling (using PCA, SHAP, and ablation analysis) with graph-based learning (GCN, GAT) and a novel Hybrid Graph Autoencoder-Regressor Ensemble (HGARE). The framework not only enhances predictive accuracy but also provides chemical interpretability through feature attribution and saliency visualization. Furthermore, unsupervised models (t-SNE, KMeans, and SOM) uncover latent structural families associated with high anion conductivity. Together, these components bridge molecular-level descriptors and graph-level learning, accelerating the data-driven discovery of high-performance AEMs.

2. Methodology

Our proposed ML framework for predicting and interpreting anion conductivity of AEMs is illustrated in Figure 1. Molecular descriptors were computed using Mordred [6].
For unsupervised analysis, t-SNE is combined with KMeans for data clustering. Self-Organizing Map (SOM) plots were used to identify descriptor trends within each cluster [10,37].
For supervised modeling, we trained two separate pipelines. The first is based on deep learning, consisting of a GCN with two layers and ReLU activation, followed by a multi-layer perceptron (MLP) head [38]. In addition, we implemented a graph attention mechanism to weigh the importance of neighboring nodes, providing more expressive graph representations. We also used the HGARE model, which combines graph autoencoder pretraining, supervised fine-tuning, and ensemble averaging for enhanced prediction accuracy. The second pipeline is a group of traditional ML regressors (including XGBoost, CatBoost, Random Forest, LightGBM, ElasticNet, and MLP) trained on precomputed descriptors. The models were evaluated using R², MAE, and RMSE.
Descriptor-based models are advantageous because they are computationally efficient, relatively easy to train, and highly interpretable due to their reliance on chemically meaningful molecular descriptors. However, their performance is constrained by the expressiveness of predefined descriptors, which may not fully capture complex topological or electronic features present in AEM architectures.
Graph-based neural networks (GNNs) automatically learn hierarchical representations directly from molecular graphs, enabling richer characterization of polymer backbones, cross-linking patterns, and functional group connectivity. Their main limitations include higher data requirements and reduced interpretability relative to descriptor-based models.
Unsupervised learning methods (PCA, t-SNE, SOM, KMeans) serve as exploratory tools that reveal intrinsic structures within the descriptor space, helping identify regions of high-performance materials and trends in molecular diversity. These methods do not perform prediction directly but support model interpretation and design rule formulation.

2.1. Unsupervised Learning

Unsupervised learning or knowledge discovery is a machine learning area in which algorithms extract features and identify unlabeled dataset patterns. By leveraging unsupervised learning, researchers can uncover hidden structures and gain a more comprehensive understanding of the data [39,40].
After preprocessing, dimensionality reduction (DR) techniques project the data into a lower-dimensional space, removing redundant, correlated information and combining the remaining variation into lower-dimensional scores. DR may project data into two or three dimensions to enable visualization, or it may simply remove redundant information from the data. In this work, t-SNE was employed for dimensionality reduction: the perplexity of 30 was chosen to balance local neighborhood preservation with global structure, the learning rate was set to 200 to avoid crowding effects, and the algorithm was run for 1000 iterations to ensure convergence.
After DR, clustering algorithms were applied to group the data into meaningful clusters without prior labels. The goal of data clustering is the unsupervised classification of data into groups that are useful and meaningful, so that the key groups within a database can be isolated and connected to a meaningful classification. In this work, we used KMeans [41] with three clusters, a value empirically determined to yield the most stable and chemically meaningful separation of polymers.
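The DR-then-clustering step can be sketched with scikit-learn as follows. The random matrix below is only a stand-in for the real standardized descriptor (or PCA score) matrix, so the resulting cluster assignments are purely illustrative; the t-SNE settings match those stated above (perplexity 30, learning rate 200, with scikit-learn's default of 1000 optimization iterations).

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-in for the standardized descriptor matrix (207 membranes x 8 features);
# the real input would be the PCA scores described in Section 2.1.
X = rng.normal(size=(207, 8))

# t-SNE settings from the text: perplexity 30, learning rate 200.
emb = TSNE(n_components=2, perplexity=30, learning_rate=200,
           init="pca", random_state=0).fit_transform(X)

# KMeans with three clusters applied to the 2-D embedding.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(emb)
```

In practice, the cluster labels are then cross-referenced with conductivity values to identify clusters enriched in high-performing membranes.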
Once clusters are identified, verification of their separation is crucial before using supervised learning. To verify the separation between different clusters and provide a meaningful explanation, alternative methods such as Subspace Greedy Search (SGS) or SHAP analysis can be used to find the highest-contributing combination of variables (descriptors in our application) that cause separation. A complete description of all these methods, hyperparameter tuning, and their applications to chemical processes is provided by Romagnoli et al. [42].
In the predictive phase, PCA was utilized to construct regression models. PCA compressed the original 1432 Mordred descriptors into eight orthogonal principal components, retaining 80.54% of the total variance, effectively minimizing collinearity and overfitting in the regression stage. These principal components were subsequently used as inputs for the descriptor-based models which achieved high R2 values (0.85), demonstrating that PCA-based latent features successfully preserved the essential structural–electronic information required for accurate anion conductivity prediction.

2.2. Descriptor-Based Model

To develop a robust descriptor-based model for predicting the ionic conductivity of AEMs, we utilized Mordred, a widely used open-source molecular descriptor calculator that supports over 1800 two-dimensional and three-dimensional descriptors [6]. Each polymer backbone or repeat unit is represented as a SMILES string, which is then converted into Mordred descriptors. However, certain descriptors may be undefined for specific molecules, resulting in missing values. To ensure data quality, we applied strict data cleaning: descriptors containing missing values across polymers were removed, yielding a high-quality dataset with 207 AEMs and 1432 numeric descriptors, each free of missing or invalid values. Additionally, composition-based features were included to reflect polymer microstructure, alongside categorical information such as polymer type. These descriptors were subsequently used as input features for training supervised machine learning models to predict the anion conductivity of the membranes (Figure 2).
To ensure robust generalization and minimize overfitting due to high dimensionality, PCA [43] was applied prior to model training, reducing the descriptor space to eight principal components explaining 80.54% of the total variance. The resulting features were then used to train and evaluate multiple regression algorithms, including CatBoost [44], XGBoost [45], Random Forest [46], LightGBM [47], ElasticNet [48], and a multi-layer perceptron (MLP) neural network [49].
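The variance-threshold reduction described above can be sketched with scikit-learn. The synthetic 100-column matrix is only a stand-in for the real 207 × 1432 Mordred table, and RandomForestRegressor stands in for the full set of regressors; passing a float to `n_components` keeps the smallest number of components whose cumulative explained variance reaches that threshold, mirroring the 8-component / 80.54% reduction reported in the text.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
# Synthetic stand-in for the 207 x 1432 Mordred descriptor table.
X = rng.normal(size=(207, 100))
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=207)

# n_components=0.80 keeps the fewest PCs explaining >= 80% of the variance.
model = make_pipeline(StandardScaler(),
                      PCA(n_components=0.80, svd_solver="full"),
                      RandomForestRegressor(n_estimators=200, random_state=0))
model.fit(X, y)
n_pcs = model.named_steps["pca"].n_components_
```

Placing the scaler, PCA, and regressor in one pipeline ensures the variance threshold is re-fit only on training folds during cross-validation, avoiding information leakage.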

2.3. Graph Models

In the graph-based models, each polymer was represented as a molecular graph, where atoms served as nodes and bonds as edges. Each atom in the polymer is represented by a 74-dimensional feature vector. This vector encodes basic chemical properties such as atom type, degree (number of bonded neighbors), formal charge, aromaticity, and hybridization state (e.g., sp, sp2, sp3). Additional flags mark whether the atom belongs to a ring, alongside categorical descriptors generated with RDKit. These atom-level features serve as the input to the GNN layers. Graph-level embeddings were obtained through two stacked convolutional layers, followed by mean pooling to aggregate node representations into a single molecular vector. Mean pooling was selected instead of sum or max pooling because it normalizes variations in molecular size, ensuring comparability across polymers with different chain lengths.

2.3.1. Graph Convolutional Network

In this model, we used two layers of graph convolution operations [38], implemented using the DGL library. Each layer applies the following transformation:
$$h_v^{(l+1)} = \mathrm{ReLU}\!\left(\sum_{u \in \mathcal{N}(v)} W^{(l)} h_u^{(l)} + b^{(l)}\right),$$
where $\mathcal{N}(v)$ denotes the neighborhood of node $v$, and $W^{(l)}$, $b^{(l)}$ are the trainable parameters of the $l$-th GCN layer. After two such layers, node embeddings are aggregated using mean pooling to obtain the graph-level representation $h_G$.
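The two-layer sum-aggregation and mean pooling above can be illustrated with a minimal numpy sketch; the toy graph, feature dimensions, and random weights are purely illustrative, not the trained model.

```python
import numpy as np

def gcn_layer(H, A, W, b):
    """One sum-aggregation graph convolution:
    h_v' = ReLU(sum_{u in N(v)} W h_u + b).
    H: (n_nodes, d_in), A: (n_nodes, n_nodes) adjacency, W: (d_in, d_out)."""
    return np.maximum(A @ H @ W + b, 0.0)

# Toy 4-atom path graph with 3-dimensional node features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))
W1, b1 = rng.normal(size=(3, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 8)), np.zeros(8)

# Two stacked layers followed by mean pooling, as described in the text.
h_G = gcn_layer(gcn_layer(H, A, W1, b1), A, W2, b2).mean(axis=0)
```

The adjacency product `A @ H` implements the neighborhood sum, so each row of the output depends only on a node's bonded neighbors.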
The representation is concatenated with the input molecular descriptors and passed through a three-layer MLP with ReLU activation to predict the target property. Formally,
$$\hat{y} = f_{\mathrm{MLP}}\big(\mathrm{Concat}(h_G, \mathbf{x}_{\mathrm{feat}})\big),$$
where $\mathbf{x}_{\mathrm{feat}}$ denotes the additional molecular descriptors and composition features.
Figure 3 illustrates the overall architecture of our graph-based model. In this hybrid framework, molecular structure is encoded via a graph convolutional layer that learns atom-level embeddings through neighborhood aggregation. These node embeddings are pooled into a graph-level vector and concatenated with hand-crafted molecular descriptors and polymer composition features, including the percentage of Block A, percentage of Block B, and the block A/B ratio.
Block A and Block B correspond to the two polymer segments or building blocks within the copolymer structure. The Block A fraction represents the hydrophilic or ion-conducting segment, which facilitates hydroxide ion transport, while Block B represents the hydrophobic or mechanically stable segment that enhances structural integrity. The Block A/B ratio captures the relative proportion of these segments and reflects the balance between ion conductivity and mechanical durability.
The combined representation is passed through a feed-forward layer to predict anion conductivity. This design enables the model to simultaneously leverage structural information from the molecular graph and composition features.

2.3.2. Graph Attention Network

To improve expressive power, the GAT model replaces standard graph convolutions with Graph Attention layers, enabling the network to learn dynamic, edge-aware attention weights across neighbors. Each attention layer computes the following:
$$h_v^{(l+1)} = \big\Vert_{k=1}^{K}\, \mathrm{ReLU}\!\left(\sum_{u \in \mathcal{N}(v)} \alpha_{vu}^{k} W^{k} h_u^{(l)}\right),$$
where $\alpha_{vu}^{k}$ are the attention coefficients learned by the $k$-th attention head, $W^{k}$ are the corresponding projection weights, $K$ is the number of heads, and $\Vert$ denotes concatenation across heads. Two stacked GAT layers are applied, followed by mean node pooling and concatenation as in the GCN. The combined feature vector is passed through an MLP to produce the final prediction. A dropout layer is applied after concatenation for regularization.
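As an illustration, a single attention head can be sketched in numpy following the standard GAT formulation, in which the coefficients are a softmax over neighbors of a LeakyReLU-scored pairwise function; the graph, dimensions, and weights here are illustrative, and the model in this work stacks K such heads.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gat_head(H, A, W, a):
    """Single GAT-style attention head: alpha_vu is a softmax over the
    neighbors u of v applied to LeakyReLU(a . [W h_v || W h_u])."""
    Z = H @ W                                    # projected node features
    out = np.zeros_like(Z)
    for v in range(len(H)):
        nbrs = np.flatnonzero(A[v])
        scores = np.array([np.concatenate([Z[v], Z[u]]) @ a for u in nbrs])
        scores = np.where(scores > 0, scores, 0.2 * scores)   # LeakyReLU
        alpha = softmax(scores)                  # attention coefficients
        out[v] = np.maximum(alpha @ Z[nbrs], 0.0)  # ReLU aggregation
    return out

# Toy fully connected 3-atom graph with 4-dimensional node features.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
H = rng.normal(size=(3, 4))
out = gat_head(H, A, rng.normal(size=(4, 5)), rng.normal(size=10))
```

Unlike the fixed neighborhood sum of the GCN, the attention weights let each node emphasize its most relevant bonded neighbors.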

2.3.3. Hybrid Graph Autoencoder-Regressor Ensemble (HGARE)

A hybrid graph neural network, termed the Hybrid Graph Autoencoder-Regressor Ensemble (HGARE), was developed to predict anion conductivity in AEMs. This framework was designed to first learn chemically meaningful latent representations through unsupervised learning and subsequently refine these embeddings via supervised fine-tuning within an ensemble-learning setup to enhance stability and predictive accuracy.
The HGARE architecture comprises three principal modules as shown in Figure 4: a graph encoder, a node-feature decoder, and a Dense-SE regressor. The encoder was implemented using multiple layers of the Graph Isomorphism Network with Edge Attributes (GINEConv), each followed by batch normalization, ReLU activation, and dropout regularization. Three consecutive layers generated atom-level embeddings that were aggregated by global mean pooling to form a compact graph-level latent representation. To ensure that the encoder captured a chemically consistent latent space, a lightweight feed-forward decoder reconstructed the node features from the latent embeddings, enforcing structural awareness through a combined binary cross-entropy and mean-squared error reconstruction loss.
The Dense-SE regressor received the concatenated vector composed of the learned molecular embeddings together with hand-crafted molecular descriptors and polymer composition features, including the percentage of Block A, the percentage of Block B, and the Block A/B ratio. The regressor consisted of several dense blocks, each comprising linear, batch normalization, ReLU, and dropout layers. To enhance representation learning, a squeeze-and-excitation (SE) attention mechanism was incorporated to adaptively re-weight hidden channels, emphasizing the most informative features. The regressor output a single scalar corresponding to the predicted ionic conductivity.
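The squeeze-and-excitation recalibration can be sketched in numpy as follows; the actual Dense-SE regressor applies this operation to hidden activations inside PyTorch dense blocks, and the dimensions and weights below are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(H, W1, W2):
    """Squeeze-and-excitation recalibration: squeeze channels by averaging,
    excite through a two-layer bottleneck, then rescale each channel."""
    s = H.mean(axis=0)                        # squeeze: (channels,)
    g = sigmoid(W2 @ np.maximum(W1 @ s, 0))   # excite: per-channel gates in (0,1)
    return H * g                              # re-weight hidden channels

rng = np.random.default_rng(0)
H = rng.normal(size=(16, 8))                  # 16 samples x 8 hidden channels
W1 = rng.normal(size=(4, 8))                  # bottleneck reduction 8 -> 4
W2 = rng.normal(size=(8, 4))
H_scaled = se_block(H, W1, W2)
```

Because each gate lies in (0, 1), the block can only attenuate channels, which is how it emphasizes the most informative features relative to the rest.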
Model optimization was performed in two stages. In the first stage, the encoder–decoder pair underwent denoising autoencoder pretraining using only the reconstruction losses to capture intrinsic molecular regularities. In the second stage, the pretrained encoder was coupled with the regressor and fine-tuned jointly. The joint loss function combined supervised and reconstruction components as
$$\mathcal{L} = \lambda_{\mathrm{sup}}\big(\alpha\, \mathcal{L}_{\mathrm{SmoothL1}} + (1-\alpha)\, \mathcal{L}_{\mathrm{MSE}}\big) + (1-\lambda_{\mathrm{sup}})\, \mathcal{L}_{\mathrm{recon}},$$
where $\mathcal{L}_{\mathrm{SmoothL1}}$ and $\mathcal{L}_{\mathrm{MSE}}$ are the Smooth L1 and mean-squared error losses combined with weight $\alpha$, $\mathcal{L}_{\mathrm{recon}}$ denotes the reconstruction loss, and $\lambda_{\mathrm{sup}}$ controls the relative weighting between the supervised and unsupervised objectives.
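The joint objective can be computed directly from its definition; the Smooth L1 threshold and the weighting values below are illustrative placeholders, since the actual values are set during fine-tuning.

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 (Huber-like) loss: quadratic below beta, linear above."""
    d = np.abs(pred - target)
    return np.mean(np.where(d < beta, 0.5 * d**2 / beta, d - 0.5 * beta))

def hgare_loss(pred, target, recon_loss, lam_sup=0.8, alpha=0.5):
    """Joint objective: L = lam_sup*(alpha*SmoothL1 + (1-alpha)*MSE)
    + (1 - lam_sup)*L_recon, matching the fine-tuning loss in the text."""
    mse = np.mean((pred - target) ** 2)
    l_sup = alpha * smooth_l1(pred, target) + (1 - alpha) * mse
    return lam_sup * l_sup + (1 - lam_sup) * recon_loss

pred = np.array([0.5, 1.0, 2.0])
target = np.array([0.0, 1.0, 1.0])
loss = hgare_loss(pred, target, recon_loss=0.3)  # = 0.31 for these values
```

Setting `lam_sup` close to 1 prioritizes the conductivity regression while still regularizing the encoder through the reconstruction term.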
Each component of the HGARE architecture (AE pretraining, SE block, joint reconstruction loss, and ensemble) was later evaluated individually through ablation analysis to quantify its contribution to predictive performance. In contrast to existing hybrid graph neural network autoencoder frameworks, the proposed Hybrid Graph Autoencoder-Regressor Ensemble (HGARE) introduces several architectural and training distinctions. The key differences are as follows:
  • SE-based feature recalibration: HGARE incorporates a squeeze-and-excitation (SE) attention block that adaptively weights node embeddings—an element that is not present in standard GNN-AE hybrids.
  • Joint reconstruction–regression training: Instead of pretraining followed by isolated regression, HGARE employs dual-loss fine-tuning, enabling more stable representation learning.
  • Graph ensemble averaging: HGARE aggregates predictions across multiple random graph initializations, substantially improving robustness (as shown by ablation).
  • Tailored architecture for polymer descriptors: Unlike prior small-molecule GNN-AEs, HGARE handles block–copolymer graphs with repeating-unit expansion and cross-link representations. The HGARE framework effectively integrates the structural interpretability of graph neural networks with the representational flexibility of dense architectures, thereby leveraging both molecular topology and global physicochemical descriptors. The combination of autoencoder pretraining, joint fine-tuning, and ensemble averaging results in stable, accurate, and chemically consistent predictions of anion conductivity in AEMs. Consequently, HGARE represents a robust and generalizable modeling paradigm for polymer informatics and data-driven materials discovery.

3. Dataset

The dataset was further curated through an additional round of manual structural verification, reconstructing polymer structures from the original publications and generating canonical SMILES strings. Each membrane was traced back to its original publication, the polymer structure was redrawn in ChemDraw Suite 23.1.2 [50], and canonical SMILES strings were generated. These SMILES were then carefully checked against the backbone identity, cationic group, and block composition to ensure structural accuracy. In some cases, the original literature [35] did not provide sufficient detail to reconstruct the chemical structure; such entries were excluded to preserve the integrity of the dataset, leaving only samples that could be verified. The outcome of this process is a curated dataset of 207 AEMs from 48 peer-reviewed publications found in Web of Science. This curated resource is not only internally consistent but also extends the original database by providing canonical SMILES that were not previously available. Each record includes the SMILES string of the whole polymer structure, the Block A, Block B, and Block A/B composition, and the experimentally reported hydroxide conductivity at 80 °C. This dataset served as the input for both descriptor-based and graph-based machine learning workflows. Molecular graphs were created from SMILES using RDKit (version 2024.03.5) [51], and descriptors were computed using the Mordred [6] and RDKit [51] libraries. The full dataset, including membrane names, membrane structures, and source references, is provided in Supporting Information to facilitate transparency and reproducibility (Table S1).
Molecular descriptors were computed using RDKit and the Mordred toolkit, which encodes the atomic composition, electronic properties, topological indices, and geometric configuration of each polymer molecule, yielding an initial 1610 descriptors. All descriptors were converted to numeric format, and columns/rows with missing values were dropped to ensure clean inputs, resulting in a dataset of 1432 descriptors per sample. Additionally, engineered features such as the Block A and B compositions, the Block A/B ratio, and polymer type encoded as a binary variable (0 = homopolymer, 1 = copolymer) were appended to each row to reflect microstructural context. These descriptors span a wide range of physicochemical, topological, and electrostatic properties, and they serve as the foundation for both the descriptor-based supervised learning and the unsupervised clustering described later in this work.
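The feature-engineering step of appending block compositions and the binary polymer-type encoding can be sketched with pandas; the descriptor column names and numeric values below are invented for illustration and do not come from the dataset.

```python
import pandas as pd

# Hypothetical descriptor table for three membranes (real table: 207 x 1432).
desc = pd.DataFrame({"desc_1": [1.2, 0.8, 1.5],
                     "desc_2": [12.0, 6.0, 18.0]})
meta = pd.DataFrame({"block_A_pct": [60.0, 50.0, 70.0],
                     "block_B_pct": [40.0, 50.0, 30.0],
                     "polymer_type": ["copolymer", "homopolymer", "copolymer"]})

# Engineered features: Block A/B ratio and the binary polymer-type encoding
# (0 = homopolymer, 1 = copolymer), appended to the descriptor matrix.
meta["block_ratio"] = meta["block_A_pct"] / meta["block_B_pct"]
meta["is_copolymer"] = (meta["polymer_type"] == "copolymer").astype(int)
X = pd.concat([desc, meta.drop(columns="polymer_type")], axis=1)
```

The resulting table mixes chemistry-derived descriptors with system-level design parameters, which is the hybrid input used by the descriptor-based models.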

External Data Curation

In addition to the primary dataset, an external set of 20 anion exchange membranes [19] was assembled to evaluate the generalization of the model under domain shift. The identification of suitable external data was nontrivial, as open-source databases for anion exchange membranes are limited and experimental data are typically scattered across individual studies with heterogeneous reporting standards. Conductivity values were curated from independent experimental reports of hydroxide conductivity measured at comparable temperatures and hydration levels. Only membranes with explicitly reported chemical structures, block compositions, and conductivity measurements were retained. All molecular structures were validated using RDKit, and entries with invalid or non-sanitizable SMILES representations were excluded. Minor inconsistencies in units and data formats were standardized, while no smoothing or imputation was applied in order to preserve the intrinsic experimental noise.
The external dataset was derived primarily from the compilation reported by Ramprasad and co-workers [19], in which several membrane chemistries were measured under multiple experimental conditions. For membranes reported more than once under comparable conditions, conductivity values were averaged to obtain a single representative value per unique chemistry, thereby avoiding overrepresentation while retaining experimental variability. The resulting dataset spans a narrow conductivity range and exhibits higher relative noise than the primary dataset, rendering correlation-based metrics such as R2 inappropriate for performance assessment. Consequently, model evaluation relied on absolute error-based metrics and a regressor-only domain adaptation strategy during inference. Additional details of the dataset curation and numerical distributions are provided in Supporting Information (page 2–4).
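The de-duplication rule for the external set, averaging repeated measurements so each unique chemistry contributes one representative value, can be sketched with a pandas group-by; the SMILES strings and conductivity values here are hypothetical placeholders, not entries from the curated set.

```python
import pandas as pd

# Hypothetical external records: one chemistry measured twice under
# comparable conditions, plus a second chemistry measured once.
ext = pd.DataFrame({
    "smiles": ["C(c1ccccc1)", "C(c1ccccc1)", "CC(N)=O"],
    "conductivity_mS_cm": [98.0, 102.0, 55.0],
})

# Average repeated measurements per unique chemistry to avoid
# overrepresentation while retaining experimental variability.
curated = (ext.groupby("smiles", as_index=False)["conductivity_mS_cm"]
              .mean())
```

Averaging rather than dropping duplicates preserves the experimental spread in the mean while keeping each chemistry's weight equal during evaluation.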

4. Training and Hyperparameter Optimization

For all models, we employed Optuna [52], a Bayesian optimization framework to perform hyperparameter tuning over 20 trials per model. Prior to optimization, the dataset was divided into random 90/10 train–test sets. Within the training data, a 5-fold cross-validation (CV) strategy was implemented to evaluate model generalization and mitigate overfitting during hyperparameter search.
Each Optuna trial involved training the model on four folds and validating on the remaining one. The final model was retrained on the full training set using the best hyperparameters and evaluated on the held-out test set.
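The split-and-validate protocol can be sketched with scikit-learn; here a fixed GradientBoostingRegressor and synthetic data stand in for the tuned models, and an Optuna objective would wrap the cross-validation loop below, suggesting hyperparameters on each trial.

```python
import numpy as np
from sklearn.model_selection import train_test_split, KFold, cross_val_score
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
# Synthetic stand-in for the 207-sample descriptor dataset.
X = rng.normal(size=(207, 8))
y = X[:, 0] + 0.1 * rng.normal(size=207)

# Random 90/10 train-test split, as described in the text.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=0)

# 5-fold CV on the training portion; each fold trains on four folds
# and validates on the remaining one.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(GradientBoostingRegressor(random_state=0),
                         X_tr, y_tr, cv=cv,
                         scoring="neg_root_mean_squared_error")
```

The mean of `scores` is the quantity a tuning trial would maximize (negated RMSE), with the held-out 10% touched only once for the final evaluation.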
To further assess the robustness and interpretability of the trained models, we conducted ablation analysis; in addition, SHAP (Shapley Additive exPlanations) [29] analysis was applied to evaluate feature importance and the stability of the trained models.
For the graph neural network models, including GCN, GAT, and the proposed HGARE, all implementations were developed in PyTorch 2.6 [53] and PyTorch Geometric 2.5. For the graph-based models, each polymer was represented as a molecular graph using RDKit, where atoms were treated as nodes and bonds as edges. Node features were derived from canonical atomic attributes (e.g., atomic number, aromaticity, and formal charge). These graphs were combined with the same composition features (Block A, Block B, Block A/B ratio, polymer type) to form a hybrid input for the graph models. Hyperparameter tuning and model selection were performed using internal validation sets derived from random splits of the training data.
We optimized models with Optuna, minimizing validation RMSE with early stopping to avoid overfitting. During training, metrics including RMSE, mean absolute error (MAE), and the coefficient of determination (R²) were recorded using TensorBoard (version 2.20.0), and the best checkpoint from each trial was saved. The optimal hyperparameters from the best-performing trial were then used to retrain the model on the training set, and final results were reported on the held-out test splits.
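The three evaluation metrics can be computed with scikit-learn; the toy true/predicted values are illustrative.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_pred = np.array([12.0, 18.0, 33.0, 39.0])

mae = mean_absolute_error(y_true, y_pred)           # mean absolute error
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # root-mean-squared error
r2 = r2_score(y_true, y_pred)                       # coefficient of determination
```

RMSE penalizes large errors more heavily than MAE, while R² reports the fraction of conductivity variance explained, which is why all three are tracked together.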

5. Results and Discussion

To evaluate the predictive performance of different machine learning models for estimating the ionic conductivity of AEMs, a set of descriptor-based and graph-based regressors were trained and optimized. These included XGBoost, Random Forest, CatBoost, LightGBM, ElasticNet, and MLP regressors based on Mordred descriptors; two graph neural networks (GCN and GAT); and our HGARE model trained on molecular graphs generated from SMILES. Saliency maps combined with ablation analysis provide multi-scale interpretability, highlighting atom-level contributions and feature-level relevance [28,54]. This approach aligns with recent studies using graph-level explanations in polymer informatics [55]. The clusters discovered through t-SNE and SOM not only corroborate patterns noted in prior unsupervised studies [35,36,37] but also reveal distinct structural groupings associated with conductivity that were not previously characterized.

5.1. Descriptor-Based Models

The Mordred featurization produced 1432 molecular descriptors per membrane after data cleaning, spanning topological and electrostatic properties. However, the number of available samples was small relative to the feature dimensionality, resulting in a high feature-to-sample ratio.
Such high-dimensional spaces are prone to overfitting and collinearity among descriptors, which can obscure true structure–property relationships. To address this, we employed PCA to compress the descriptor space into a set of orthogonal latent features that retain the majority of the dataset’s variance. This level of compression effectively preserved the essential physicochemical information while mitigating redundancy, noise, and potential overfitting in machine learning models.
All features were first standardized to zero mean and unit variance using z-score normalization, ensuring that descriptors with different numerical ranges contributed equally to the PCA decomposition. PCA was then applied with the explained variance threshold set to 80%, automatically selecting the smallest number of components that captured most of the data variance (Figure S2).
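This standardize-then-compress step can be reproduced with scikit-learn, where passing a float to n_components selects the smallest number of components whose cumulative explained variance reaches the threshold (the descriptor matrix below is a random correlated stand-in for the cleaned Mordred table):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Correlated stand-in descriptors (60 membranes x 30 features).
X = rng.normal(size=(60, 30)) @ rng.normal(size=(30, 30))

X_std = StandardScaler().fit_transform(X)  # z-score normalization
pca = PCA(n_components=0.80)               # smallest k explaining >= 80% of variance
X_pca = pca.fit_transform(X_std)

print(X_pca.shape[1], round(pca.explained_variance_ratio_.sum(), 3))
```

On the real descriptor table this procedure returned the eight components discussed below.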
In our case, PCA retained eight PCs, collectively explaining 80.5% of the total variance. PC1 alone explained 42.4%, capturing a strong structural–topological gradient across the membranes and representing descriptors such as ETA, MPC, and ATS linked to molecular size, branching, and topological complexity; PC2 (12.1%) was dominated by FCSP3, HybRatio, and the AATS families, reflecting hybridization and polarity balance.
Subsequent components (PC3-PC8) encoded surface area, dipole correlation, and electronic dispersion effects, providing complementary information about molecular topology and charge distribution (Table 1).
This dimensionality reduction not only reduced overfitting but also improved computational efficiency during model optimization.
To further interpret the compressed feature space, Table 1 summarizes the variance explained by each PCA component along with their dominant descriptor families and physicochemical interpretation.
Using these PCA components as inputs, several descriptor-based machine learning models were trained. As summarized in Table 2, the ensemble models CatBoost (R2 = 0.8566), XGBoost (R2 = 0.8445), and Random Forest (0.8477) substantially outperformed shallow and linear baselines such as ElasticNet (0.4385), confirming the nonlinearity of conductivity–structure relationships. The CatBoost model achieved the lowest prediction error (MAE = 0.0155 S/cm, RMSE = 0.0187 S/cm), indicating robust generalization after dimensionality reduction.
The ablation analysis was performed by systematically removing each PCA component and re-evaluating model performance. This method shows the relevance of each PCA component to model performance (Figure 5, bottom panels). Among the tested algorithms, CatBoost, XGBoost, and Random Forest were selected for visualization because they consistently ranked as the best-performing descriptor-based models. Across these three high-performing descriptor-based models, two complementary conductivity-governing mechanisms consistently emerged:
  • Macromolecular topology and polarity (PC1-PC2): These components describe overall molecular size, branching, complexity, and hydrophobic–hydrophilic balance. Their strong influence, particularly in the XGBoost model, highlights the morphological continuity of hydrophilic domains and the distribution of polar functional groups as primary enablers of ion transport. Well-connected polymer backbones with balanced polarity facilitate continuous ion channels, enhancing charge mobility.
  • Electronic polarization and dipolar correlation (PC4, PC6, and PC7): The CatBoost and Random Forest models revealed sensitivity to these components, which capture distance-weighted dipole moments, charge delocalization, and intramolecular electrostatic coupling. These effects represent the microscopic polarization environment governing ion solvation and dynamic screening within conductive regions. Enhanced electronic flexibility and dipolar alignment promote lower activation barriers for ion hopping and diffusion.
Together, these findings reveal that structural organization and electronic polarizability cooperatively determine anion conductivity.
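A minimal sketch of the leave-one-component-out ablation, using synthetic PC scores and a Random Forest surrogate in place of the tuned models:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 8))  # stand-in for the 8 retained PC scores
y = X[:, 0] + 0.5 * X[:, 3] ** 2 + rng.normal(scale=0.1, size=150)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def fit_score(X_train, X_test):
    """Fit a surrogate model and return its test R2."""
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    return r2_score(y_te, model.fit(X_train, y_tr).predict(X_test))

baseline = fit_score(X_tr, X_te)
for pc in range(X.shape[1]):  # remove one component at a time
    keep = [j for j in range(X.shape[1]) if j != pc]
    drop = baseline - fit_score(X_tr[:, keep], X_te[:, keep])
    print(f"PC{pc + 1}: delta R2 = {drop:+.3f}")
```

Components whose removal produces the largest R2 drop are the ones the model relies on most, which is how the PC1-PC2 and PC4/PC6/PC7 groups above were identified.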

5.2. Descriptor Selection and Correlation Analysis

Figure S1 presents a combined view of SHAP summary plots and parity plots for CatBoost, XGBoost, and Random Forest.
SHAP values revealed that, despite slight variation in feature rankings across models, several descriptors consistently emerged as important. Notably, ETA_shape_y, a topological descriptor related to molecular geometry, ranked highest, followed closely by the Block A/B ratio, which quantifies the relative composition of the polymer blocks. Interestingly, basic compositional parameters such as Block A and Block B also ranked highly, indicating that structural balance in the copolymer plays a crucial role in determining ionic conductivity. Other descriptors such as MATS1p, ATSC1p, and GATS7c reflect molecular autocorrelation and three-dimensional structural information [5], aligning with the expected structure–property relationships in ion-conducting membranes.
To further refine the descriptor space, we examined the Pearson correlation matrix of the top 20 descriptors identified by SHAP across the three best-performing descriptor models, CatBoost, XGBoost, and Random Forest (Figure 6). This triangular matrix highlights pairwise correlations and reveals several strong dependencies among descriptors, indicating potential redundancy.
Notably, MATS1p and AATSC1p displayed near-perfect correlation (r ≈ 0.99), followed by SMR_VSA9 and ATSC4s (r ≈ 0.93), and AATS3i and AATS7i (r ≈ 0.74). Among these, AATSC1p with MATS1p and SMR_VSA9 with ATSC4s were deemed highly redundant, reflecting nearly identical underlying information regarding mass and electrotopological autocorrelation of atomic properties. Therefore, one descriptor from each correlated pair was excluded to reduce feature redundancy without sacrificing chemical interpretability.
For moderately correlated pairs, we retained the descriptor that was both higher in SHAP importance and chemically more interpretable. For example, MATS1p was retained over AATSC1p, as it provided stronger SHAP contributions (Figure S2) while still representing molecular autocorrelation.
Other moderate correlations, such as between ETA_shape_y and AATSC2m (r ≈ 0.61), suggest that shape anisotropy and molecular autocorrelation are partially linked but still capture distinct physical phenomena; specifically, ETA_shape_y describes molecular three-dimensional compactness, while AATSC2m encodes the averaged mass distribution weighted by atom-pair distances. Hence, both remain as they represent complementary structural and topological factors influencing ion transport.
Through this two-step process (SHAP-based ranking followed by correlation-based pruning), we reduced the feature set from the original 20 to a more compact and informative set of 17 descriptors. These retained descriptors balance predictive performance with interpretability, ensuring that the final model captures diverse, non-redundant structural and electronic features relevant to anion conductivity.
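The correlation-based pruning step can be sketched as follows; the descriptor values and SHAP importances here are synthetic placeholders, constructed so that the first two columns reproduce a near-perfect correlation like the MATS1p/AATSC1p pair:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
base = rng.normal(size=n)
# Hypothetical descriptor matrix: columns 0 and 1 are nearly identical (r ~ 0.99).
X = np.column_stack([base, base + 0.05 * rng.normal(size=n),
                     rng.normal(size=n), rng.normal(size=n)])
names = ["MATS1p", "AATSC1p", "ETA_shape_y", "TopoPSA"]
shap_rank = {"MATS1p": 0.9, "AATSC1p": 0.6,      # illustrative SHAP importances,
             "ETA_shape_y": 0.8, "TopoPSA": 0.4}  # not the paper's values

corr = np.corrcoef(X, rowvar=False)
keep = set(names)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        if abs(corr[i, j]) > 0.9 and names[i] in keep and names[j] in keep:
            # Drop the member of the pair with the lower SHAP importance.
            keep.discard(min((names[i], names[j]), key=shap_rank.get))

print(sorted(keep))
```

With these placeholder values, AATSC1p is pruned while its more important twin MATS1p survives, mirroring the selection described above.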
To enhance the clustering analysis and more directly capture the relationship between structure and performance, we incorporated the target variable (membrane anion conductivity (S/cm)) into the feature set used for clustering. Additionally, a derived numeric feature, test, was introduced, defined as follows:
test = 1, if conductivity ≥ 0.05 S/cm; test = −1, otherwise
This formulation provides a coarse but meaningful distinction between high- and low-conductivity membranes in our dataset. Including this binary label in the clustering pipeline allows us to assess whether structure-based clusters correspond to performance categories. The distribution of anion conductivity values is shown in Figure S3. This plot reveals a right-skewed distribution, with a natural cutoff near 0.05 S/cm, supporting our use of this threshold in defining the classification label. After identifying a refined set of informative, minimally correlated descriptors, we sought to explore whether these features could capture latent structural patterns related to ionic conductivity. To this end, we applied unsupervised learning techniques to visualize the chemical space [56] of AEMs and investigate whether distinct performance-relevant subgroups could be identified.
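Applied to a few hypothetical conductivity values, the labeling rule reads:

```python
import numpy as np

# Hypothetical conductivity values (S/cm); threshold at 0.05 S/cm as in the text.
conductivity = np.array([0.012, 0.048, 0.051, 0.090])
test_label = np.where(conductivity >= 0.05, 1, -1)
print(test_label)  # [-1 -1  1  1]
```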

5.3. Clustering of Descriptor Space

To explore the structure–performance relationship from an unsupervised perspective, KMeans clustering was performed in a reduced two-dimensional feature space obtained via t-SNE from the selected molecular descriptors. As shown in Figure 7, the clustering separated the AEMs into three distinct groups. Cluster 1, positioned at the top, contained the majority of high-conductivity membranes, forming a tight and isolated group. In contrast, clusters 2 and 3, situated at the bottom, represented two subtypes of low-conductivity membranes, suggesting that multiple structural motifs can independently contribute to poor performance.
To quantitatively assess the grouping structure suggested by clustering, we computed silhouette coefficients and Davies–Bouldin indices in PCA-reduced descriptor space. Both metrics showed local optima at K = 3, with a positive silhouette score (0.32) and a minimum in the Davies–Bouldin index. These results justify the selection of three clusters by indicating a balanced combination of intracluster compactness and intercluster separation.
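The embedding, clustering, and cluster-validity metrics can be reproduced with scikit-learn; the three-blob input below is a synthetic stand-in for the descriptor matrix:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
from sklearn.metrics import davies_bouldin_score, silhouette_score

rng = np.random.default_rng(4)
# Three synthetic descriptor "clusters" standing in for the membrane data.
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(30, 10)) for c in (-3, 0, 3)])

# Nonlinear 2-D embedding followed by KMeans with K = 3, as in the text.
X_2d = TSNE(n_components=2, perplexity=20, random_state=0).fit_transform(X)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)

# Higher silhouette and lower Davies-Bouldin indicate better-separated clusters.
print(silhouette_score(X_2d, labels), davies_bouldin_score(X_2d, labels))
```

Sweeping n_clusters over a range of K and comparing these two scores is how the local optimum at K = 3 reported above would be located.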
We employed a Self-Organizing Map (SOM) using JMP-Fastman (version 0.6) [57] to visualize the component contributions of the selected descriptors. In this analysis, we included the binary test feature (test = 1 if conductivity > 0.05 S/cm; −1 otherwise), which served as a classification component to highlight differences between high- and low-conductivity membranes. SOMs were trained on a 10 × 10 grid for 500 iterations, with the learning rate decaying from 0.5 to 0.01. The SOM projected the high-dimensional descriptor data onto a two-dimensional map while preserving topological relationships. The resulting SOM U-matrix revealed a clear segmentation of the feature space, with well-defined distance boundaries indicating distinct regions of similarity among the data samples. In this U-matrix, darker areas indicate higher dissimilarity between nodes and stronger cluster boundaries, while lighter areas denote closely related samples. When colored by the test variable, the SOM demonstrated a strong separation between high- and low-conductivity membranes, with high-performance samples forming a compact and isolated region on the map. Figure 8b shows that cluster 1, which corresponds to the high-conductivity membranes, exhibits consistent molecular patterns and physicochemical attributes.
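Since the study used JMP-Fastman for the SOM, the following is only a minimal NumPy re-implementation of the same training recipe (10 × 10 grid, 500 iterations, learning rate decaying from 0.5 to 0.01) on synthetic data:

```python
import numpy as np

def train_som(data, grid=(10, 10), iters=500, lr0=0.5, lr1=0.01, sigma0=3.0, seed=0):
    """Minimal SOM: best-matching-unit search plus Gaussian neighborhood update."""
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.normal(size=(h, w, data.shape[1]))
    yy, xx = np.mgrid[0:h, 0:w]
    for t in range(iters):
        lr = lr0 * (lr1 / lr0) ** (t / iters)           # decay 0.5 -> 0.01
        sigma = sigma0 * (0.5 / sigma0) ** (t / iters)  # shrinking neighborhood
        x = data[rng.integers(len(data))]
        dist = ((weights - x) ** 2).sum(axis=2)
        bi, bj = np.unravel_index(dist.argmin(), dist.shape)  # best-matching unit
        g = np.exp(-((yy - bi) ** 2 + (xx - bj) ** 2) / (2 * sigma ** 2))
        weights += lr * g[..., None] * (x - weights)
    return weights

rng = np.random.default_rng(5)
data = np.vstack([rng.normal(loc=c, size=(40, 5)) for c in (-2, 2)])
weights = train_som(data)
print(weights.shape)  # (10, 10, 5)
```

From the trained weight grid, a U-matrix (mean distance between each node and its neighbors) can be derived to visualize cluster boundaries, as in Figure 8.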
To further interpret the SOM clusters, we examined component (feature) contributions across the map. The Block A/B ratio was substantially higher in the region associated with high conductivity (cluster 1), confirming that membranes with a greater fraction of hydrophilic block are structurally optimized for ion transport. In many AEMs, Block A often corresponds to hydrophilic moieties such as quaternary imidazolium or ammonium functional groups. These positively charged groups not only attract water but also enable efficient transport of hydroxide ions ( O H ) by supporting the formation of ion conduction channels.
TopoPSA, which was elevated in the high-conductivity cluster, highlights the importance of polar functional group exposure in facilitating ion conduction. PEOE_VSA11 also exhibited a notable contribution in cluster 1 compared to cluster 2. This descriptor represents the partial equalization of orbital electronegativity (PEOE)-based van der Waals surface area within a specific charge range, effectively quantifying regions of moderate positive electrostatic potential on the molecular surface.
Bar plots summarizing descriptor values within each SOM-identified cluster reinforced these trends (Figure 9).
Across the PCA, SHAP, and clustering analyses, a common trend emerged: molecular topology, hydrophilic–hydrophobic balance, and electronic polarization consistently differentiate high- from low-conductivity AEMs.
Based on the insights gained from PCA, SHAP, saliency analysis, and SOM, a set of design rules emerges that can guide the selection and engineering of high-conductivity structures.
  • Extended molecular shape (e.g., high values of the descriptor ETA_shape_y), suggesting that elongated polymer architectures may enhance ion transport by facilitating continuous ionic domains.
  • Elevated polar surface area and charge-rich regions (e.g., descriptors such as TopoPSA, PEOE_VSA11, EState_VSA8), indicating that polar/electronic surface features favor the formation of hydrophilic, ion-conducting pathways.
  • Intermediate values of two-dimensional autocorrelation/topological descriptors (e.g., GATS- and MATS-type descriptors), reflecting a balance between rigidity/flexibility and charge delocalization that may optimize microstructure and ionic mobility.
These rules are intended to be heuristic guidance (not hard thresholds) for selecting candidates with high likelihood of good conductivity in future polymer design.

5.4. Graph-Based Models

Graph neural networks were trained to learn directly from the topological structures of the molecules. The GCN model achieved the highest performance among the baseline models, with a test R2 of 0.8807 and MAE of 0.0143 (Table 2). The GAT model yielded a lower R2 of 0.8186, suggesting that attention-based aggregation did not offer a performance advantage in this context. The GAT model's weaker performance may stem from overfitting due to its increased parameter count, or from the learned attention weights providing limited added value over simple aggregation on this dataset.
The parity plot for GCN (Figure 10a) shows close alignment between predicted and true conductivity values for both the test and train sets, validating its robustness. The strong predictive capacity of GCN highlights the effectiveness of graph-based learning in capturing complex structure–property relationships. For GCN, the optimal parameters were input dimension = 74, hidden dimension = 256, learning rate = 8.42 × 10−4, batch size = 6, and 500 training epochs. For the GAT model, the best configuration employed input dimension = 74, hidden dimension = 128, learning rate = 1.19 × 10−3, batch size = 8, and 1500 training epochs.
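The core GCN update that these models rely on is the symmetrically normalized neighborhood aggregation H' = ReLU(D^-1/2 (A + I) D^-1/2 H W). A NumPy sketch of one such propagation step on a toy four-atom graph (the real node features are the 74-dimensional RDKit vectors, and the tuned hidden dimension was 256, not the 8 used here):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])        # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(d ** -0.5)       # symmetric degree normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

# Toy 4-atom molecular graph (a path graph) with 3-dim node features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(6)
H = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 8))

H1 = gcn_layer(A, H, W)
print(H1.shape)  # (4, 8)
```

Stacking such layers and pooling the node embeddings yields the graph-level representation that the regression head maps to conductivity.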
Building on these baselines, the proposed HGARE achieved a substantial improvement by integrating graph autoencoder pretraining, joint supervised fine-tuning, and ensemble averaging. This model achieved a test R2 of 0.9460 and MAE of 0.0070, outperforming all other models. The parity plot in Figure 10c shows an almost perfect diagonal alignment between predicted and true values, confirming the model's accuracy and generalization capability. The best hyperparameter settings obtained via random search were hidden dimension = 224, encoder layers = 4, encoder dropout = 0.08, autoencoder learning rate = 1 × 10−3, AE weight decay = 1 × 10−5, AE epochs = 120, batch size = 48, main learning rate = 7 × 10−4, weight decay = 5 × 10−4, dropout = 0.20, head-hidden = 256, α = 0.60, supervised loss weight = 0.92, reconstruction weight = 0.08, cosine annealing scheduler (T0 = 120, Tmult = 2), and SWA = 0.33.
These results confirm that the HGARE architecture outperformed both descriptor-based models (CatBoost, XGBoost) and conventional GNNs by capturing complex structure–property dependencies through its hybrid unsupervised–supervised training and ensemble strategy. This establishes HGARE as a state-of-the-art framework for molecular anion conductivity prediction.

5.5. Ablation Study of HGARE Model Component

To assess the contribution of each component within HGARE, we conducted a systematic ablation study. Four model variants were evaluated by independently removing (i) the autoencoder (AE) pretraining, (ii) the squeeze-and-excitation (SE) attention block, (iii) the reconstruction loss used during joint fine-tuning, and (iv) the ensemble averaging. A GCN + MLP baseline was also included for comparison. All variants were trained under identical conditions and using the same data split as the full HGARE model to ensure fair comparison.
Table 3 summarizes the results. The full HGARE achieves the highest accuracy (R2 = 0.948), demonstrating its effectiveness for predicting ionic conductivity. Removing AE pretraining produces almost no degradation, which is expected given the small dataset size. Eliminating the SE block reduces performance to R2 = 0.940, confirming that the feature recalibration contributes meaningfully. Removing the reconstruction loss causes only a mild decrease in accuracy (R2 = 0.947), suggesting that reconstruction offers weak regularization. In contrast, removing the ensemble produces a substantial drop (R2 = 0.907), indicating that ensembling is critical for variance reduction and stability. The GCN + MLP baseline (R2 = 0.942) performs well but remains below the full HGARE.
Overall, the ablation results validate the necessity of SE-based feature weighting and ensemble averaging within HGARE, while clarifying that AE pretraining offers minimal benefit for this dataset. These findings demonstrate which components of the architecture meaningfully improve model performance.

5.6. Saliency Maps

To investigate the interpretability of our model and understand how structural features influence anion conductivity predictions, we generated atom-level saliency maps for selected representative molecules using gradient-based attribution, a widely adopted technique for GNN explainability [54]. Figure 11 presents a qualitative comparison of these molecules, grouped by their predicted anion conductivity at 80 °C. The saliency values are visualized as heatmaps overlaid on each molecular structure: red tones represent atoms with a positive contribution to the predicted conductivity, while blue tones indicate negative contributions. All saliency values are normalized per molecule.
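The attribution idea can be illustrated on a tiny differentiable surrogate (one ReLU layer plus a linear readout, with random weights standing in for the trained GNN), differentiating the scalar prediction with respect to each atom's feature vector:

```python
import numpy as np

rng = np.random.default_rng(7)
n_atoms, n_feat, hidden = 5, 4, 6
X = rng.normal(size=(n_atoms, n_feat))   # node (atom) features
W = rng.normal(size=(n_feat, hidden))
v = rng.normal(size=hidden)

Z = X @ W
f = (np.maximum(Z, 0) @ v).sum()         # scalar "conductivity" readout

# Analytic gradient of f with respect to the node features (chain rule
# through the ReLU), standing in for autograd in the real model.
dZ = (Z > 0).astype(float) * v           # df/dZ
dX = dZ @ W.T                            # df/dX, one row per atom
saliency = np.linalg.norm(dX, axis=1)
saliency /= saliency.max()               # normalize per molecule, as in the text
print(np.round(saliency, 3))
```

Each entry measures how strongly the prediction responds to perturbing that atom's features; signed variants of the same gradient give the red/blue polarity shown in Figure 11.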
Across the low-conductivity structures (Figure 11a), nitrogen atoms, particularly those forming quaternary ammonium groups, are consistently assigned negative saliency values (blue), whereas surrounding oxygen atoms (e.g., in ether, ester, or carbonyl functionalities) appear red, indicating a positive contribution. This pattern implies that while cationic groups are required for conductivity, their local structural context, such as spatial isolation, limited hydration, or steric hindrance, may reduce their effectiveness. For example, the imidazolium group in [VMI] Styrene shows localized blue shading, potentially reflecting insufficient hydrophilicity or poor connectivity to conduction pathways.
In contrast, high-conductivity AEMs (Figure 11), such as Im-DFDM-bPES and DIm-CHPAES, display pronounced red saliency near sulfonyl oxygen atoms, ether linkers, and flexible aliphatic segments. Interestingly, the positively charged imidazolium groups in these structures show neutral to slightly blue saliency, implying that their conductive benefits arise more from favorable interactions with surrounding hydrophilic domains than from the charged centers alone. The extended conjugation and high segmental mobility in these molecules likely contribute to efficient ion transport, consistent with known design principles in high-performance AEMs.
These observations align with prior experimental insights, where conductivity is enhanced not solely by the presence of cationic groups but also by their spatial distribution, hydration environment, and polymer chain flexibility. Our model appears to capture these nuanced relationships, confirming that its predictions are grounded in chemically interpretable structure–property associations.

6. Conclusions

We present a hybrid machine learning framework that integrates unsupervised clustering, descriptor-based modeling, and graph neural networks to predict and interpret ionic conductivity in AEMs.
Among all tested models, the Hybrid Graph Autoencoder-Regressor Ensemble (HGARE) achieved the highest accuracy and robustness. By jointly optimizing a denoising graph autoencoder with a dense squeeze–excitation regressor and applying ensemble averaging across multiple seeds, HGARE effectively captures both local atomic interactions and global compositional features.
For descriptor-based models, PCA was applied to reduce dimensionality and mitigate multicollinearity among the 1400+ Mordred descriptors while retaining 80% of the variance. Ablation analysis was then performed on these components to examine how each descriptor group influences ionic conductivity. The results revealed that structural, topological, and electrostatic components were the primary contributors.
To explore structure–performance relationships beyond prediction, we applied unsupervised learning using t-SNE, KMeans, and Self-Organizing Maps (SOMs). This analysis revealed three distinct membrane clusters, including two low-conductivity subtypes and one high-performance group characterized by flexible, polarizable, and hydrophilic-rich architectures. The clustering results are consistent with the ablation analysis and saliency mapping, highlighting how segmental flexibility, local charge distribution, and hydrophilic content collectively enable efficient ion transport. Saliency mapping highlights the gradients of the model prediction with respect to node features, revealing chemically meaningful motifs such as quaternary ammonium groups, imidazolium cations, and aromatic backbones that form ion-conducting domains.
This framework not only enables accurate prediction but also provides chemically interpretable insights, bridging the gap between data-driven predictions and rational AEM design. This combined approach offers a powerful tool for rational AEM design and can be extended to other polymer property predictions. In the future, the framework can be extended to predict multiple properties simultaneously using multitask learning [57], accelerating the discovery of high-performance AEMs. Relevant targets include chemical stability under alkaline conditions, water uptake, swelling ratio, ion exchange capacity (IEC), hydrophilicity distribution, and mechanical durability, each of which contributes to real-world AEM performance. Achieving this requires a curated dataset where these properties are consistently measured across identical or structurally comparable membranes. With such data, the proposed HGARE architecture can be adapted to a multitask setting by sharing the graph-based encoder while introducing separate prediction heads for each property. Experimental synthesis and in-plane conductivity measurements of these candidates are important directions for future collaboration with synthetic groups.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/membranes16010012/s1. Figure S1: SHAP summary plots (right) highlighting top molecular descriptors contributing to conductivity predictions for (a) CatBoost, (b) XGBoost, (c) Random Forest. Figure S2: Explained variance curve for PCA transformation of Mordred descriptors. The first eight components capture approximately 80% of the total variance. Figure S3: Kernel density estimate (KDE) plot showing the distribution of membrane anion conductivity values (target variable). The peak near 0.05 S/cm supports the threshold used to define the binary test variable (test = 1 if conductivity < 0.05; −1 otherwise) for performance-aware clustering. Figure S4: Absolute prediction error as a function of experimental anion conductivity for the external validation dataset following regressor-only domain adaptation. Each point corresponds to a single polymer system. The absence of systematic error growth across the conductivity range indicates stable numerical behavior under domain shift. Table S1: This dataset is used in both Graph models and descriptor-based models SMILES representation of these polymer structures used to produce molecular descriptors [58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105].

Author Contributions

Design of Research: P.N. and R.K.; Data Collection: P.N.; Performing Research: P.N. and D.D.; New Analytics Tools: P.N. and D.D.; Data Analysis Tools: P.N. and J.A.R.; Manuscript writing: P.N. and J.A.R.; Manuscript Editing and Review: P.N., D.D., J.A.R., R.K. and J.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the U.S. Department of Energy, Office of Science, under the Office of Basic Energy Science Separation Science program under Award No. DE-SC0022304.

Data Availability Statement

All datasets used in this study, including the SMILES strings used for producing Mordred descriptors and graph models, are in modified_data_final.csv. The descriptors used for the descriptor models are available in full_descriptor_data.csv. Both files are included in the data/ folder of the repository. All scripts for the models are in the src/ and notebook/ folders. The corresponding source code and data for this work are available online at https://github.com/Pegahnn/Conductivity_AEM/ (accessed on 19 October 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhang, Q.; Yuan, Y.; Zhang, J.; Fang, P.; Pan, J.; Zhang, H.; Zhou, T.; Yu, Q.; Zou, X.; Sun, Z.; et al. Machine Learning-Aided Design of Highly Conductive Anion Exchange Membranes for Fuel Cells and Water Electrolyzers. Adv. Mater. 2024, 36, e2404981. [Google Scholar] [CrossRef]
  2. Yassin, K.; Rasin, I.G.; Brandon, S.; Dekel, D.R. How can we design anion-exchange membranes to achieve longer fuel cell lifetime? J. Membr. Sci. 2024, 690, 122164. [Google Scholar] [CrossRef]
  3. Chen, N.; Lee, Y.M. Anion exchange polyelectrolytes for membranes and ionomers. Prog. Polym. Sci. 2021, 113, 101345. [Google Scholar] [CrossRef]
  4. Abouzari-Lotf, E.; Jacob, M.V.; Ghassemi, H.; Zakeri, M.; Nasef, M.M.; Abdolahi, Y.; Abbasi, A.; Ahmad, A. Highly conductive anion exchange membranes based on polymer networks containing imidazolium functionalised side chains. Sci. Rep. 2021, 11, 3764. [Google Scholar] [CrossRef]
  5. Hutter, M.C. Molecular Descriptors for Chemoinformatics (2nd ed.). By Roberto Todeschini and Viviana Consonni. ChemMedChem 2010, 5, 306–307. [Google Scholar] [CrossRef]
  6. Moriwaki, H.; Tian, Y.-S.; Kawashita, N.; Takagi, T. Mordred: A molecular descriptor calculator. J. Cheminform. 2018, 10, 4. [Google Scholar] [CrossRef] [PubMed]
  7. Mohammadjafari, A.; Lin, M.; Shi, M. Deep Learning-Based Glaucoma Detection Using Clinical Notes: A Comparative Study of Long Short-Term Memory and Convolutional Neural Network Models. Diagnostics 2025, 15, 807. [Google Scholar] [CrossRef] [PubMed]
  8. Das, D.; Chakraborty, D. In-silico identification of a Doxorubicin alternative with reduced cardiotoxicity informed by LLM-assisted modeling. J. Mol. Graph. Model. 2026, 142, 109217. [Google Scholar] [CrossRef] [PubMed]
  9. Das, D.; Teixeira, E.S.; Morales, J.A. Recurrent Neural Network/Machine Learning Predictions of Reactive Channels in H+ + C2H4 at ELab = 30 eV: A Prototype of Ion Cancer Therapy Reactions. J. Comput. Chem. 2025, 46, e70033. [Google Scholar] [CrossRef] [PubMed]
  10. Xie, T.; Grossman, J.C. Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties. Phys. Rev. Lett. 2018, 120, 145301. [Google Scholar] [CrossRef] [PubMed]
  11. Butler, K.T.; Davies, D.W.; Cartwright, H.; Isayev, O.; Walsh, A. Machine learning for molecular and materials science. Nature 2018, 559, 547–555. [Google Scholar] [CrossRef] [PubMed]
  12. Olayiwola, T.; Briceno-Mena, L.A.; Arges, C.G.; Romagnoli, J.A. Synergizing Data-Driven and Knowledge-Based Hybrid Models for Ionic Separations. ACS EST Eng. 2024, 4, 3032–3044. [Google Scholar] [CrossRef]
  13. Ramprasad, R.; Batra, R.; Pilania, G.; Mannodi-Kanakkithodi, A.; Kim, C. Machine learning in materials informatics: Recent applications and prospects. npj Comput. Mater. 2017, 3, 54. [Google Scholar] [CrossRef]
  14. Bradford, G.; Lopez, J.; Ruza, J.; Stolberg, M.A.; Osterude, R.; Johnson, J.A.; Gomez-Bombarelli, R.; Shao-Horn, Y. Chemistry-Informed Machine Learning for Polymer Electrolyte Discovery. ACS Cent. Sci. 2023, 9, 206–216. [Google Scholar] [CrossRef]
  15. Zhai, F.-H.; Zhan, Q.-Q.; Yang, Y.-F.; Ye, N.-Y.; Wan, R.-Y.; Wang, J.; Chen, S.; He, R.-H. A deep learning protocol for analyzing and predicting ionic conductivity of anion exchange membranes. J. Membr. Sci. 2022, 642, 119983. [Google Scholar] [CrossRef]
  16. Phua, Y.K.; Tsuyohiko, F.; Kato, K. Predicting the anion conductivities and alkaline stabilities of anion conducting membrane polymeric materials: Development of explainable machine learning models. Sci. Technol. Adv. Mater. 2023, 24, 2261833. [Google Scholar] [CrossRef] [PubMed]
  17. Shahid, M.U.; Najam, T.; Islam, M.; Hassan, A.M.; Assiri, M.A.; Rauf, A.; Rehman, A.u.; Shah, S.S.A.; Nazir, M.A. Engineering of metal organic framework (MOF) membrane for waste water treatment: Synthesis, applications and future challenges. J. Water Process Eng. 2024, 57, 104676. [Google Scholar] [CrossRef]
  18. Deng, C.-S.; Peng, Z.-X.; Li, B.-X. Ultrahigh Extinction Ratio Topological Polarization Beam Splitter Based on Dual-Polarization Second-Order Topological Photonic Crystals. Adv. Quantum Technol. 2025, 8, 2400637. [Google Scholar] [CrossRef]
  19. William, S.; Shivank, S.; Abhishek, S.; Reanna, R.; Mohammed Al, O.; Janani, S.; Ryan, P.L.; Rampi, R. AI-driven design of fluorine-free polymers for sustainable and high-performance anion exchange membranes. J. Mater. Inform. 2025, 5, 5. [Google Scholar] [CrossRef]
  20. Wang, X.; Zhu, M.; Bo, D.; Cui, P.; Shi, C.; Pei, J. AM-GCN: Adaptive Multi-channel Graph Convolutional Networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual, 6–10 July 2020; pp. 1243–1253. [Google Scholar] [CrossRef]
  21. Das, D.; Chakraborty, D. Machine Learning Prediction of Physicochemical Properties in Lithium-Ion Battery Electrolytes with Active Learning Applied to Graph Neural Networks. J. Comput. Chem. 2025, 46, e70009. [Google Scholar] [CrossRef]
  22. Ye, Z.; Kumar, Y.J.; Sing, G.O.; Song, F.; Wang, J. A Comprehensive Survey of Graph Neural Networks for Knowledge Graphs. IEEE Access 2022, 10, 75729–75741. [Google Scholar] [CrossRef]
  23. Ge, W.; De Silva, R.; Fan, Y.; Sisson, S.A.; Stenzel, M.H. Machine Learning in Polymer Research. Adv. Mater. 2025, 37, 2413695. [Google Scholar] [CrossRef] [PubMed]
  24. Park, J.; Shim, Y.; Lee, F.; Rammohan, A.; Goyal, S.; Shim, M.; Jeong, C.; Kim, D.S. Prediction and Interpretation of Polymer Properties Using the Graph Convolutional Network. ACS Polym. Au 2022, 2, 213–222. [Google Scholar] [CrossRef] [PubMed]
  25. Duvenaud, D.K.; Maclaurin, D.; Iparraguirre, J.; Bombarell, R.; Hirzel, T.; Aspuru-Guzik, A.; Adams, R.P. Convolutional networks on graphs for learning molecular fingerprints. Adv. Neural Inf. Process. Syst. 2015, 28, 2224–2232. [Google Scholar]
  26. Gilmer, J.; Schoenholz, S.S.; Riley, P.F.; Vinyals, O.; Dahl, G.E. Neural message passing for quantum chemistry. In Proceedings of the International Conference on Machine Learning (ICML 2017), Sydney, NSW, Australia, 6–11 August 2017; pp. 1263–1272. [Google Scholar]
  27. Liu, L.; Li, Y.; Zheng, J.; Li, H. Expert-augmented machine learning to accelerate the discovery of copolymers for anion exchange membrane. J. Membr. Sci. 2024, 693, 122327. [Google Scholar] [CrossRef]
  28. Lundberg, S.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.-I. From Local Explanations to Global Understanding with Explainable AI for Trees. Nat. Mach. Intell. 2020, 2, 56–67. [Google Scholar] [CrossRef] [PubMed]
  29. Dalal, R.J.; Oviedo, F.; Leyden, M.C.; Reineke, T.M. Polymer design via SHAP and Bayesian machine learning optimizes pDNA and CRISPR ribonucleoprotein delivery. Chem. Sci. 2024, 15, 7219–7228. [Google Scholar] [CrossRef]
  30. Simonyan, K.; Vedaldi, A.; Zisserman, A. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. arXiv 2013, arXiv:1312.6034. [Google Scholar]
  31. Adebayo, J.; Gilmer, J.; Muelly, M.; Goodfellow, I.; Hardt, M.; Kim, B. Sanity Checks for Saliency Maps. 2018. Available online: https://proceedings.neurips.cc/paper_files/paper/2018/file/294a8ed24b1ad22ec2e7efea049b8737-Paper.pdf (accessed on 19 October 2025).
32. van der Maaten, L.; Hinton, G. Visualizing Data Using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
33. McInnes, L.; Healy, J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv 2018, arXiv:1802.03426. [Google Scholar]
34. Ester, M.; Kriegel, H.-P.; Sander, J.; Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, OR, USA, 2–4 August 1996; pp. 226–231. [Google Scholar]
  35. Phua, Y.K.; Terasoba, N.; Tanaka, M.; Fujigaya, T.; Kato, K. Unsupervised Machine Learning-Derived Anion-Exchange Membrane Polymers Map: A Guideline for Polymers Exploration and Design. ChemElectroChem 2024, 11, e202400252. [Google Scholar] [CrossRef]
  36. Ehiro, T. Feature importance-based interpretation of UMAP-visualized polymer space. Mol. Inform. 2023, 42, 2300061. [Google Scholar] [CrossRef]
  37. Ceriotti, M. Unsupervised machine learning in atomistic simulations, between predictions and understanding. J. Chem. Phys. 2019, 150, 150901. [Google Scholar] [CrossRef]
38. Naghshnejad, P.; Marchan, G.T.; Olayiwola, T.; Kumar, R.; Romagnoli, J.A. Graph-Based Modeling and Molecular Dynamics for Ion Activity Coefficient Prediction in Polymeric Ion-Exchange Membranes. Ind. Eng. Chem. Res. 2024, 64, 599–612. [Google Scholar] [CrossRef]
  39. Seghers, E.E.; Briceno-Mena, L.A.; Romagnoli, J.A. Unsupervised learning: Local and global structure preservation in industrial data. Comput. Chem. Eng. 2023, 178, 108378. [Google Scholar] [CrossRef]
  40. Seghers, E.E.; Romagnoli, J.A. Data-Driven Process Monitoring for Knowledge Discovery: Local and Global Structures. In Computer Aided Chemical Engineering; Kokossis, A.C., Georgiadis, M.C., Pistikopoulos, E., Eds.; Elsevier: Amsterdam, The Netherlands, 2023; Volume 52, pp. 1809–1815. [Google Scholar]
  41. Ikotun, A.M.; Ezugwu, A.E.; Abualigah, L.; Abuhaija, B.; Heming, J. K-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of big data. Inf. Sci. 2023, 622, 178–210. [Google Scholar] [CrossRef]
  42. Romagnoli, J.; Briceno-Mena, L.; Manee, V. AI in Chemical Engineering: Unlocking the Power Within Data; CRC Press: Boca Raton, FL, USA, 2024. [Google Scholar] [CrossRef]
  43. Maćkiewicz, A.; Ratajczak, W. Principal components analysis (PCA). Comput. Geosci. 1993, 19, 303–342. [Google Scholar] [CrossRef]
  44. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018; pp. 6639–6649. [Google Scholar]
45. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar] [CrossRef]
  46. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
47. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A highly efficient gradient boosting decision tree. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 3149–3157. [Google Scholar]
  48. Zou, H.; Hastie, T. Regularization and Variable Selection via the Elastic Net. J. R. Stat. Soc. Ser. B Stat. Methodol. 2005, 67, 301–320. [Google Scholar] [CrossRef]
  49. Popescu, M.-C.; Balas, V.E.; Perescu-Popescu, L.; Mastorakis, N. Multilayer perceptron and neural networks. WSEAS Trans. Cir. Sys. 2009, 8, 579–588. [Google Scholar]
50. Mills, N. ChemDraw Ultra 10.0. J. Am. Chem. Soc. 2006, 128, 13649–13650. [Google Scholar] [CrossRef]
  51. Landrum, G. RDKit: Open-Source Cheminformatics. Available online: http://www.rdkit.org (accessed on 15 October 2025).
52. Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A Next-generation Hyperparameter Optimization Framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2623–2631. [Google Scholar] [CrossRef]
  53. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. Pytorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 2019, 32, 8026–8037. [Google Scholar]
  54. Pope, P.E.; Kolouri, S.; Rostami, M.; Martin, C.E.; Hoffmann, H. Explainability Methods for Graph Convolutional Neural Networks. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 10764–10773. [Google Scholar] [CrossRef]
  55. Gurnani, R.; Kuenneth, C.; Toland, A.; Ramprasad, R. Polymer Informatics at Scale with Multitask Graph Neural Networks. Chem. Mater. 2023, 35, 1560–1567. [Google Scholar] [CrossRef]
  56. Himanen, L.; Geurts, A.; Foster, A.S.; Rinke, P. Data-Driven Materials Science: Status, Challenges, and Perspectives. Adv. Sci. 2019, 6, 1900808. [Google Scholar] [CrossRef]
  57. Territo, K.; Romagnoli, J. FASTMAN-JMP: All-in-one Tool for Data Mining and Model Building. In Computer Aided Chemical Engineering; Manenti, F., Reklaitis, G.V., Eds.; Elsevier: Amsterdam, The Netherlands, 2024; Volume 53, pp. 3421–3426. [Google Scholar]
  58. Kim, S.; Yang, S.; Kim, D. Poly (arylene ether ketone) with pendant pyridinium groups for alkaline fuel cell membranes. Int. J. Hydrogen Energy 2017, 42, 12496–12506. [Google Scholar] [CrossRef]
  59. Kim, D.J.; Lee, B.-N.; Nam, S.Y. Synthesis and characterization of PEEK containing imidazole for anion exchange membrane fuel cell. Int. J. Hydrogen Energy 2017, 42, 23759–23767. [Google Scholar] [CrossRef]
  60. Irfan, M.; Bakangura, E.; Afsar, N.U.; Hossain, M.M.; Ran, J.; Xu, T. Preparation and performance evaluation of novel alkaline stable anion exchange membranes. J. Power Sources 2017, 355, 171–180. [Google Scholar] [CrossRef]
  61. Lin, C.; Huang, X.; Guo, D.; Zhang, Q.; Zhu, A.; Ye, M.; Liu, Q. Side-chain-type anion exchange membranes bearing pendant quaternary ammonium groups via flexible spacer for fuel cells. J. Mater. Chem. A 2016, 4, 13938–13948. [Google Scholar] [CrossRef]
  62. Zhang, X.; Li, S.; Chen, P.; Fang, J.; Shi, Q.; Weng, Q.; Luo, X.; Chen, X.; An, Z. Imidazolium functionalized block copolymer anion exchange membrane with enhanced hydroxide conductivity and alkaline stability via tailoring side chains. Int. J. Hydrogen Energy 2018, 43, 3716–3730. [Google Scholar] [CrossRef]
  63. Lu, D.; Li, D.; Wen, L.; Xue, L. Effects of non-planar hydrophobic cyclohexylidene moiety on the structure and stability of poly (arylene ether sulfone)s based anion exchange membranes. J. Membr. Sci. 2017, 533, 210–219. [Google Scholar] [CrossRef]
  64. Lin, C.X.; Zhuo, Y.Z.; Lai, A.N.; Zhang, Q.G.; Zhu, A.M.; Ye, M.L.; Liu, Q.L. Side-chain-type anion exchange membranes bearing pendent imidazolium-functionalized poly (phenylene oxide) for fuel cells. J. Membr. Sci. 2016, 513, 206–216. [Google Scholar] [CrossRef]
  65. Guo, D.; Lin, C.X.; Hu, E.N.; Shi, L.; Soyekwo, F.; Zhang, Q.G.; Zhu, A.M.; Liu, Q.L. Clustered multi-imidazolium side chains functionalized alkaline anion exchange membranes for fuel cells. J. Membr. Sci. 2017, 541, 214–223. [Google Scholar] [CrossRef]
  66. Lee, J.Y.; Lim, D.-H.; Chae, J.E.; Choi, J.; Kim, B.H.; Lee, S.Y.; Yoon, C.W.; Nam, S.Y.; Jang, J.H.; Henkensmeier, D.; et al. Base tolerant polybenzimidazolium hydroxide membranes for solid alkaline-exchange membrane fuel cells. J. Membr. Sci. 2016, 514, 398–406. [Google Scholar] [CrossRef]
  67. Kwon, S.; Rao, A.; Kim, T.-H. Anion exchange membranes based on terminally crosslinked methyl morpholinium-functionalized poly (arylene ether sulfone)s. J. Power Sources 2018, 375, 421–432. [Google Scholar] [CrossRef]
  68. Lai, A.N.; Zhou, K.; Zhuo, Y.Z.; Zhang, Q.G.; Zhu, A.M.; Ye, M.L.; Liu, Q.L. Anion exchange membranes based on carbazole-containing polyolefin for direct methanol fuel cells. J. Membr. Sci. 2016, 497, 99–107. [Google Scholar] [CrossRef]
  69. Fang, J.; Lyu, M.; Wang, X.; Wu, Y.; Zhao, J. Synthesis and performance of novel anion exchange membranes based on imidazolium ionic liquids for alkaline fuel cell applications. J. Power Sources 2015, 284, 517–523. [Google Scholar] [CrossRef]
  70. Yang, Q.; Lin, C.X.; Liu, F.H.; Li, L.; Zhang, Q.G.; Zhu, A.M.; Liu, Q.L. Poly (2,6-dimethyl-1,4-phenylene oxide)/ionic liquid functionalized graphene oxide anion exchange membranes for fuel cells. J. Membr. Sci. 2018, 552, 367–376. [Google Scholar] [CrossRef]
  71. He, Y.; Si, J.; Wu, L.; Chen, S.; Zhu, Y.; Pan, J.; Ge, X.; Yang, Z.; Xu, T. Dual-cation comb-shaped anion exchange membranes: Structure, morphology and properties. J. Membr. Sci. 2016, 515, 189–195. [Google Scholar] [CrossRef]
  72. He, Y.; Pan, J.; Wu, L.; Zhu, Y.; Ge, X.; Ran, J.; Yang, Z.; Xu, T. A Novel Methodology to Synthesize Highly Conductive Anion Exchange Membranes. Sci. Rep. 2015, 5, 13417. [Google Scholar] [CrossRef] [PubMed]
  73. Guo, D.; Lai, A.N.; Lin, C.X.; Zhang, Q.G.; Zhu, A.M.; Liu, Q.L. Imidazolium-Functionalized Poly (arylene ether sulfone) Anion-Exchange Membranes Densely Grafted with Flexible Side Chains for Fuel Cells. ACS Appl. Mater. Interfaces 2016, 8, 25279–25288. [Google Scholar] [CrossRef] [PubMed]
  74. Zhang, M.; Shan, C.; Liu, L.; Liao, J.; Chen, Q.; Zhu, M.; Wang, Y.; An, L.; Li, N. Facilitating Anion Transport in Polyolefin-Based Anion Exchange Membranes via Bulky Side Chains. ACS Appl. Mater. Interfaces 2016, 8, 23321–23330. [Google Scholar] [CrossRef] [PubMed]
  75. Wang, C.; Xu, C.; Shen, B.; Zhao, X.; Li, J. Stable poly (arylene ether sulfone)s anion exchange membranes containing imidazolium cations on pendant phenyl rings. Electrochim. Acta 2016, 190, 1057–1065. [Google Scholar] [CrossRef]
  76. Ge, Q.; Ran, J.; Miao, J.; Yang, Z.; Xu, T. Click Chemistry Finds Its Way in Constructing an Ionic Highway in Anion-Exchange Membrane. ACS Appl. Mater. Interfaces 2015, 7, 28545–28553. [Google Scholar] [CrossRef] [PubMed]
  77. Wang, C.; Shen, B.; Xu, C.; Zhao, X.; Li, J. Side-chain-type poly (arylene ether sulfone)s containing multiple quaternary ammonium groups as anion exchange membranes. J. Membr. Sci. 2015, 492, 281–288. [Google Scholar] [CrossRef]
  78. Pan, J.; Zhu, L.; Han, J.; Hickner, M.A. Mechanically Tough and Chemically Stable Anion Exchange Membranes from Rigid-Flexible Semi-Interpenetrating Networks. Chem. Mater. 2015, 27, 6689–6698. [Google Scholar] [CrossRef]
  79. Mohanty, A.D.; Ryu, C.Y.; Kim, Y.S.; Bae, C. Stable Elastomeric Anion Exchange Membranes Based on Quaternary Ammonium-Tethered Polystyrene-b-poly (ethylene-co-butylene)-b-polystyrene Triblock Copolymers. Macromolecules 2015, 48, 7085–7095. [Google Scholar] [CrossRef]
  80. Lai, A.N.; Wang, L.S.; Lin, C.X.; Zhuo, Y.Z.; Zhang, Q.G.; Zhu, A.M.; Liu, Q.L. Benzylmethyl-containing poly (arylene ether nitrile) as anion exchange membranes for alkaline fuel cells. J. Membr. Sci. 2015, 481, 9–18. [Google Scholar] [CrossRef]
  81. Lai, A.N.; Wang, L.S.; Lin, C.X.; Zhuo, Y.Z.; Zhang, Q.G.; Zhu, A.M.; Liu, Q.L. Phenolphthalein-based Poly (arylene ether sulfone nitrile)s Multiblock Copolymers As Anion Exchange Membranes for Alkaline Fuel Cells. ACS Appl. Mater. Interfaces 2015, 7, 8284–8292. [Google Scholar] [CrossRef] [PubMed]
  82. Sherazi, T.A.; Zahoor, S.; Raza, R.; Shaikh, A.J.; Naqvi, S.A.R.; Abbas, G.; Khan, Y.; Li, S. Guanidine functionalized radiation induced grafted anion-exchange membranes for solid alkaline fuel cells. Int. J. Hydrogen Energy 2015, 40, 786–796. [Google Scholar] [CrossRef]
  83. Zhang, M.; Liu, J.; Wang, Y.; An, L.; Guiver, M.; Li, N. Highly Stable Anion Exchange Membranes Based on Quaternized Polypropylene. J. Mater. Chem. A 2015, 3, 12284–12296. [Google Scholar] [CrossRef]
  84. Si, J.; Lu, S.; Xu, X.; Peng, S.; Xiu, R.; Xiang, Y. A Gemini Quaternary Ammonium Poly (ether ether ketone) Anion-Exchange Membrane for Alkaline Fuel Cell: Design, Synthesis, and Properties. ChemSusChem 2014, 7, 3389–3395. [Google Scholar] [CrossRef] [PubMed]
  85. Li, X.; Liu, Q.; Yu, Y.; Meng, Y. Synthesis and properties of multiblock ionomers containing densely functionalized hydrophilic blocks for anion exchange membranes. J. Membr. Sci. 2014, 467, 1–12. [Google Scholar] [CrossRef]
  86. Li, X.; Nie, G.; Tao, J.; Wu, W.; Wang, L.; Liao, S. Assessing the Influence of Side-Chain and Main-Chain Aromatic Benzyltrimethyl Ammonium on Anion Exchange Membranes. ACS Appl. Mater. Interfaces 2014, 6, 7585–7595. [Google Scholar] [CrossRef]
  87. Miyake, J.; Fukasawa, K.; Watanabe, M.; Miyatake, K. Effect of ammonium groups on the properties and alkaline stability of poly(arylene ether)-based anion exchange membranes. J. Polym. Sci. Part A Polym. Chem. 2014, 52, 383–389. [Google Scholar] [CrossRef]
  88. Li, X.; Cheng, S.; Wang, L.; Long, Q.; Tao, J.; Nie, G.; Liao, S. Anion exchange membranes by bromination of benzylmethyl-containing poly (arylene ether)s for alkaline membrane fuel cells. RSC Adv. 2014, 4, 29682–29693. [Google Scholar] [CrossRef]
  89. Zhu, L.; Pan, J.; Wang, Y.; Han, J.; Zhuang, L.; Hickner, M.A. Multication Side Chain Anion Exchange Membranes. Macromolecules 2016, 49, 815–824. [Google Scholar] [CrossRef]
  90. Rao, A.H.N.; Nam, S.; Kim, T.-H. Comb-shaped alkyl imidazolium-functionalized poly (arylene ether sulfone)s as high performance anion-exchange membranes. J. Mater. Chem. A 2015, 3, 8571–8580. [Google Scholar] [CrossRef]
  91. Strasser, D.; Graziano, B.; Knauss, D. Base stable poly (diallylpiperidinium hydroxide) multiblock copolymers for anion exchange membranes. J. Mater. Chem. A 2017, 5, 9627–9640. [Google Scholar] [CrossRef]
  92. Zhang, S.; Zhu, X.; Jin, C. Development of a high-performance anion exchange membrane using poly(isatin biphenylene) with flexible heterocyclic quaternary ammonium cations for alkaline fuel cells. J. Mater. Chem. A 2019, 7, 6883–6893. [Google Scholar] [CrossRef]
  93. Wang, J.; Zhao, Y.; Setzler, B.P.; Rojas-Carbonell, S.; Ben Yehuda, C.; Amel, A.; Page, M.; Wang, L.; Hu, K.; Shi, L.; et al. Poly (aryl piperidinium) membranes and ionomers for hydroxide exchange membrane fuel cells. Nat. Energy 2019, 4, 392–398. [Google Scholar] [CrossRef]
  94. Lin, C.X.; Wang, X.Q.; Li, L.; Liu, F.H.; Zhang, Q.G.; Zhu, A.M.; Liu, Q.L. Triblock copolymer anion exchange membranes bearing alkyl-tethered cycloaliphatic quaternary ammonium-head-groups for fuel cells. J. Power Sources 2017, 365, 282–292. [Google Scholar] [CrossRef]
  95. Chu, X.; Shi, Y.; Liu, L.; Huang, Y.; Li, N. Piperidinium-functionalized anion exchange membranes and their application in alkaline fuel cells and water electrolysis. J. Mater. Chem. A 2019, 7, 7717–7727. [Google Scholar] [CrossRef]
  96. Liu, L.; Chu, X.; Liao, J.; Huang, Y.; Li, Y.; Ge, Z.; Hickner, M.A.; Li, N. Tuning the properties of poly (2,6-dimethyl-1,4-phenylene oxide) anion exchange membranes and their performance in H2/O2 fuel cells. Energy Environ. Sci. 2018, 11, 435–446. [Google Scholar] [CrossRef]
  97. Peng, H.; Li, Q.; Hu, M.; Xiao, L.; Lu, J.; Zhuang, L. Alkaline polymer electrolyte fuel cells stably working at 80 °C. J. Power Sources 2018, 390, 165–167. [Google Scholar] [CrossRef]
  98. Chen, N.; Hu, C.; Wang, H.H.; Kim, S.P.; Kim, H.M.; Lee, W.H.; Bae, J.Y.; Park, J.H.; Lee, Y.M. Poly(Alkyl-Terphenyl Piperidinium) Ionomers and Membranes with an Outstanding Alkaline-Membrane Fuel-Cell Performance of 2.58 W cm−2. Angew. Chem. Int. Ed. 2021, 60, 7710–7718. [Google Scholar] [CrossRef]
  99. Allushi, A.; Pham, T.H.; Olsson, J.S.; Jannasch, P. Ether-free polyfluorenes tethered with quinuclidinium cations as hydroxide exchange membranes. J. Mater. Chem. A 2019, 7, 27164–27174. [Google Scholar] [CrossRef]
  100. Liu, R.; Wang, J.; Che, X.; Wang, T.; Aili, D.; Li, Q.; Yang, J. Facile synthesis and properties of poly(ether ketone cardo)s bearing heterocycle groups for high temperature polymer electrolyte membrane fuel cells. J. Membr. Sci. 2021, 636, 119584. [Google Scholar] [CrossRef]
  101. Xue, J.; Liu, X.; Zhang, J.; Yin, Y.; Guiver, M.D. Poly(phenylene oxide)s incorporating N-spirocyclic quaternary ammonium cation/cation strings for anion exchange membranes. J. Membr. Sci. 2020, 595, 117507. [Google Scholar] [CrossRef]
  102. Tian, L.; Ma, W.; Tuo, S.; Wang, F.; Zhu, H. Novel polyaryl isatin polyelectrolytes with flexible monomers for anion exchange membrane fuel cells. J. Membr. Sci. 2024, 690, 122172. [Google Scholar] [CrossRef]
  103. Pham, T.H.; Olsson, J.S.; Jannasch, P. Poly(arylene alkylene)s with pendant N-spirocyclic quaternary ammonium cations for anion exchange membranes. J. Mater. Chem. A 2018, 6, 16537–16547. [Google Scholar] [CrossRef]
  104. Dang, H.-S.; Weiber, E.; Jannasch, P. Poly(phenylene oxide) functionalized with quaternary ammonium groups via flexible alkyl spacers for high-performance anion exchange membranes. J. Mater. Chem. A 2015, 3, 5280–5284. [Google Scholar] [CrossRef]
  105. Pan, D.; Bakvand, P.M.; Pham, T.H.; Jannasch, P. Improving poly(arylene piperidinium) anion exchange membranes by monomer design. J. Mater. Chem. A 2022, 10, 16478–16489. [Google Scholar] [CrossRef]
Figure 1. Workflow of proposed machine learning framework for predicting and interpreting anion conductivity of AEMs.
Figure 2. Schematic of descriptor-based prediction pipeline.
Figure 3. Architecture of the hybrid GCN model.
Figure 4. Schematic of Hybrid Graph Autoencoder-Regressor (HGARE) architecture.
Figure 5. Descriptor-based model performance. Parity plots (top) compare predicted and actual anion conductivity (S/cm); the dotted line represents the ideal parity line (y = x), and circles represent individual data points. Ablation plots (bottom) indicate how sensitive each model's predictive accuracy is to each component space for (a) CatBoost, (b) XGBoost, (c) Random Forest.
Figure 6. Triangle Pearson correlation matrix of top 20 SHAP-selected descriptors contributing to anion conductivity prediction in AEMs.
Figure 7. Clustering of AEM samples using KMeans in two-dimensional descriptor space.
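The partitioning in Figure 7 applies KMeans to the two-dimensional descriptor embedding. A minimal scikit-learn sketch, where the synthetic 2-D coordinates are a hypothetical stand-in for the actual t-SNE-reduced descriptor space of the membrane samples:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical 2-D coordinates for membrane samples, standing in for the
# t-SNE-reduced descriptor space used in the paper (three synthetic blobs).
coords = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.3, size=(40, 2)),
    rng.normal(loc=(4.0, 4.0), scale=0.3, size=(40, 2)),
    rng.normal(loc=(0.0, 4.0), scale=0.3, size=(40, 2)),
])

# KMeans partitions the embedded samples into candidate membrane clusters.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(coords)
labels = km.labels_  # one cluster assignment per membrane sample
```

Cluster enrichment in high-conductivity membranes can then be checked by comparing the conductivity distribution within each `labels` group.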
Figure 8. SOM results: (a) SOM U-Matrix showing topological distances between mapped units; darker regions indicate higher dissimilarity. (b) Cluster 1, (c) cluster 2, and (d) cluster 3 are shown in white circles.
Figure 9. Bar plots comparing average descriptor values between (a) clusters 1 and 2, (b) clusters 2 and 3.
Figure 10. Parity plots for the test set in (a) the GCN model and (b) the GAT model; (c) HGARE achieves near-perfect parity, supporting its high predictive accuracy. The dotted line represents the ideal parity line (y = x), and circles represent individual data points.
Figure 11. Saliency maps of representative anion exchange membranes (AEMs), grouped by predicted anion conductivity at 80 °C. Red and blue colors indicate positive and negative contributions, respectively, to the predicted conductivity, normalized across each molecule. Molecules are grouped into three conductivity classes: (a) low (<0.06 S/cm), (b) moderate (0.06–0.15 S/cm), and (c) high (>0.20 S/cm).
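The atom-level maps in Figure 11 follow the input-gradient saliency recipe of Simonyan et al. [30]: backpropagate the scalar prediction to the atom features and take per-atom gradient magnitudes. A minimal PyTorch sketch, where the toy feed-forward model and random features are hypothetical stand-ins for the trained GCN and a real molecular graph:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy regressor standing in for the trained GCN; random atom features
# standing in for a real molecule.
n_atoms, n_feats = 6, 8
model = nn.Sequential(nn.Linear(n_feats, 16), nn.ReLU(), nn.Linear(16, 1))

x = torch.randn(n_atoms, n_feats, requires_grad=True)  # atom feature matrix
pred = model(x).sum()  # sum-pooled conductivity prediction (toy readout)
pred.backward()        # gradients of the prediction w.r.t. atom features

# Per-atom saliency: gradient magnitude over feature dimensions, normalized
# across the molecule as in the Figure 11 color scale.
saliency = x.grad.norm(dim=1)
saliency = saliency / saliency.max()
```

Mapping these normalized scores back onto atoms (e.g., via RDKit drawing utilities) yields the red/blue coloring shown in the figure.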
Table 1. Variance contribution of the top eight principal components obtained from the Mordred descriptor matrix.

| Principal Component | Explained Variance (%) | Dominant Descriptor Families | Physicochemical Interpretation |
|---|---|---|---|
| PC1 | 42.41 | ETA, MPC, ATS, BertzCT, Xp | Overall molecular size, branching, and topological complexity; the dominant structural factor governing ion transport. |
| PC2 | 12.13 | FCSP3, hybRatio, AATS, GATS, AETA | Degree of hybridization, polarity, and carbon saturation influencing charge delocalization and hydrophobic–hydrophilic balance. |
| PC3 | 6.77 | ATSC, VSA_Estate, AATS | Electronic surface area and atomic electronegativity effects on local charge distribution. |
| PC4 | 5.71 | GATS, AATSC, MATS | Short-range autocorrelation of atomic properties; dipole-induced and intramolecular interaction patterns. |
| PC5 | 4.19 | AATSC, MATS, GATS | Weighted descriptors of surface polarity and dispersion interactions. |
| PC6 | 3.67 | GATS, MATS, AATSC | Distance-weighted dipole descriptors linked to electronic polarization. |
| PC7 | 3.36 | GATS, AATSC, MATS | Medium-range spatial autocorrelation descriptors; topology-dependent polarizability. |
| PC8 | 2.30 | AATSC, MATS, AMID, JGI | Connectivity indices and electronic delocalization parameters associated with charge-transport continuity. |
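The explained-variance column in Table 1 comes from standard PCA on the standardized descriptor matrix. A minimal scikit-learn sketch on a synthetic matrix, where the correlated column block is a hypothetical stand-in for a related Mordred descriptor family:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical descriptor matrix (rows: membranes, columns: descriptors);
# the first ten columns share a latent factor, mimicking a correlated
# Mordred descriptor family.
X = rng.normal(size=(120, 50))
X[:, :10] += 3.0 * rng.normal(size=(120, 1))

# Standardize, then extract the leading principal components.
X = (X - X.mean(axis=0)) / X.std(axis=0)
pca = PCA(n_components=8).fit(X)
explained = pca.explained_variance_ratio_ * 100  # percent, as in Table 1
```

As in Table 1, the first component absorbs most of the shared variance, and contributions decay for higher components.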
Table 2. Performance summary of all models.

| Model | Type | R2 (Test) | RMSE (Test) | MAE (Test) | Notes |
|---|---|---|---|---|---|
| HGARE | Graph-based | 0.9460 | 0.0093 | 0.0070 | Best overall model |
| GCN | Graph-based | 0.8807 | 0.0178 | 0.0143 | Good performance |
| GAT | Graph-based | 0.8186 | 0.0175 | 0.0210 | Weaker than GCN |
| XGBoost | Descriptor-based | 0.8445 | 0.0195 | 0.0158 | Good performance |
| Random Forest | Descriptor-based | 0.8477 | 0.0193 | 0.0161 | Good performance |
| CatBoost | Descriptor-based | 0.8566 | 0.0187 | 0.0155 | Best descriptor model |
| ElasticNet | Descriptor-based | 0.4285 | 0.0373 | 0.0295 | Weak performance |
| LightGBM | Descriptor-based | 0.7815 | 0.0231 | 0.0186 | Weaker performance |
| MLP | Descriptor-based | 0.8340 | 0.0201 | 0.0167 | Good performance |
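The R2, RMSE, and MAE values in Table 2 follow the standard definitions and can be computed with scikit-learn; the conductivity arrays below are hypothetical illustrations, not the paper's test set:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical measured vs. predicted anion conductivities (S/cm), standing
# in for actual test-set model output.
y_true = np.array([0.02, 0.05, 0.08, 0.12, 0.18, 0.25])
y_pred = np.array([0.03, 0.04, 0.09, 0.11, 0.17, 0.24])

r2 = r2_score(y_true, y_pred)                             # goodness of fit
rmse = float(np.sqrt(mean_squared_error(y_true, y_pred))) # S/cm
mae = float(mean_absolute_error(y_true, y_pred))          # S/cm
```

Note that MAE can never exceed RMSE for the same predictions, which makes a row-by-row sanity check of reported metrics straightforward.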
Table 3. Ablation study results for HGARE.

| Variant | R2 | RMSE | MAE |
|---|---|---|---|
| Full HGARE | 0.9475 | 0.00919 | 0.00653 |
| No AE | 0.9483 | 0.00912 | 0.00687 |
| No SE | 0.9403 | 0.00980 | 0.00758 |
| No Recon | 0.9467 | 0.00926 | 0.00691 |
| No Ensemble | 0.9072 | 0.01222 | 0.00976 |
| GCN + MLP Baseline | 0.9419 | 0.00966 | 0.00732 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Naghshnejad, P.; Das, D.; Romagnoli, J.A.; Kumar, R.; Chen, J. Uncovering Structure–Conductivity Relationships in Anion Exchange Membranes (AEMs) Using Interpretable Machine Learning. Membranes 2026, 16, 12. https://doi.org/10.3390/membranes16010012
