Background/Objectives: Carbonic anhydrase I (CAI) is a zinc-dependent metalloenzyme, so inhibitor discovery requires both effective navigation of chemical space and explicit evaluation of whether binding hypotheses are credible with respect to zinc coordination. We aimed to develop an interpretable, reproducible QSAR-to-structure workflow for CAI inhibitor discovery that links potency prediction with zinc-site plausibility and early developability, supporting decision-oriented prioritization of new CAI inhibitor candidates.
Methods: CAI inhibitors were retrieved from ChEMBL (CHEMBL261), and their potencies were modeled as a continuous regression endpoint. AlvaDesc v3.0.8 generated 4224 2D descriptors, which were reduced to a top-100 panel using train-only preprocessing, variance filtering, correlation pruning, and bagged-tree ranking. Five regressors (elastic net, CART, bagging, gradient boosting, and XGBoost) were benchmarked on a held-out test set. Potent ChEMBL seeds (Ki ≤ 10 nM) were used for a PubChem expansion at 90% 2D similarity. Predicted hits were then externally validated against independently available PubChem CAI activity records. Ten novel candidates lacking CAI activity data were docked to CAI (PDB: 1AZM) with AutoDock Vina via SwissDock, in both neutral and relevant anionic states; pose selection was constrained by a Zn-donor filter (a Zn-N/O distance cutoff in Å). SwissADME was used to profile physicochemical space, structural alerts, and absorption/distribution proxies.
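The coordination-aware pose selection described in the Methods can be pictured as a simple geometric check: keep the best-ranked docked pose whose closest ligand N or O atom lies within a distance cutoff of the catalytic Zn. The sketch below is illustrative only and is not the authors' implementation. The `Atom` representation, the function names `zn_anchor` and `select_pose`, and the assumption that poses arrive sorted best-scored first are all hypothetical. The cutoff is left as a parameter because its value is not restated here.

```python
import math
from typing import List, NamedTuple, Optional, Sequence, Tuple


class Atom(NamedTuple):
    """One ligand atom of a docked pose (hypothetical minimal representation)."""
    element: str  # element symbol, e.g. "N", "O", "C"
    x: float
    y: float
    z: float


Point = Tuple[float, float, float]


def zn_anchor(pose: Sequence[Atom], zn: Point, cutoff: float) -> Optional[Tuple[str, float]]:
    """Return (element, distance) of the closest N/O atom within `cutoff` angstroms of Zn, or None."""
    best: Optional[Tuple[str, float]] = None
    for atom in pose:
        if atom.element not in ("N", "O"):
            continue  # only nitrogen and oxygen atoms are treated as Zn donors here
        d = math.dist((atom.x, atom.y, atom.z), zn)
        if d <= cutoff and (best is None or d < best[1]):
            best = (atom.element, d)
    return best


def select_pose(
    poses: List[Sequence[Atom]], zn: Point, cutoff: float
) -> Optional[Tuple[int, Tuple[str, float]]]:
    """Return (pose index, anchor) for the first pose that passes the Zn-donor filter.

    Assumption: `poses` is already ordered best-scored first (e.g. by Vina affinity).
    The cutoff is caller-supplied and is not the value used in the study.
    """
    for i, pose in enumerate(poses):
        anchor = zn_anchor(pose, zn, cutoff)
        if anchor is not None:
            return i, anchor
    return None  # no pose is Zn-anchored under this cutoff
```

The returned element ("N" or "O") is what would let a filter like this tally poses as Zn-N or Zn-O anchored. The check deliberately ignores coordination geometry such as angles, which keeps it simple but coarser than a full geometric validation.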
Results: The bagging model showed the best test-set generalization (RMSE = 0.61; MAE = 0.45). Permutation feature importance (PFI) and SHAP analyses converged on sulfur/heteroatom connectivity and polar–lipophilic organization as the dominant potency drivers. The PubChem expansion yielded 25,315 analogs and 233 candidates meeting the predicted-potency threshold; external validation on the 145 hits with measured CAI activity gave RMSE = 0.456 and MAE = 0.320. Across 20 ligand/protomer docking runs, 12 produced canonical Zn-anchored poses (10 Zn-N; 2 Zn-O). SwissADME indicated consensus logP values from −0.65 to 3.21, no PAINS alerts (0/10), and predominantly favorable drug-likeness (8/10 with zero Lipinski violations), supporting tiered advancement.
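The RMSE and MAE values reported for the test set and the external PubChem set follow standard definitions, and the coefficient of determination (R²) is usually reported alongside them. A minimal sketch, assuming measured and predicted potencies are paired lists on the same log scale; the function name `regression_metrics` is hypothetical:

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """R^2, RMSE and MAE between measured and predicted potencies (hypothetical helper)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)  # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all measured values are equal")
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }
```

If the endpoint is a base-10 log potency such as pKi (an assumption here), an RMSE of 0.61 log units corresponds to a root-mean-square error of roughly fourfold in Ki (10^0.61 ≈ 4.1), and 0.456 to roughly threefold (10^0.456 ≈ 2.9).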
Conclusions: Integrating interpretable QSAR, external PubChem validation, coordination-aware docking, and SwissADME profiling yields a practical triage framework for CAI inhibitor discovery. The resulting tiered shortlist identifies two Zn-N-anchored N-alkyl sulfamides (CIDs 103935964 and 112684680) and one Zn-O-anchored carboxylate control (CID 122367674) as the highest-priority computational hypotheses for staged biochemical evaluation.