Data
  • Data Descriptor
  • Open Access

10 December 2025

Computational Dataset for Polymer–Pharmaceutical Interactions: MD/MM-PBSA and DFT Resources for Molecularly Imprinted Polymer (MIP) Design

1 Department of Molecular and Systemic Biomedicine, Faculty of Biotechnology and Drug Development, Radmile Matejčić 2, 51000 Rijeka, Croatia
2 Institute for Anthropological Research, Gajeva ul. 32, 10000 Zagreb, Croatia
3 Faculty of Food Technology Osijek, Josip Juraj Strossmayer University of Osijek, 31000 Osijek, Croatia
4 Department of Science, Institute for Information Technologies, University of Kragujevac, Jovana Cvijića bb, 34000 Kragujevac, Serbia

Abstract

Molecularly imprinted polymers (MIPs) are promising sorbents for selectively capturing pharmaceutically active compounds (PhACs), but design remains slow because candidate screening is largely experimental or based on computationally expensive methods. We present MIP–PhAC, an open, curated resource of polymer–pharmaceutical interaction energies generated from molecular dynamics (MD) followed by MM/PBSA analysis, with a small DFT subset for cross-method comparison. The resource comprises two complementary datasets: MIP–PhAC-Calibrated, a benchmark set with manually verified pH-7 microstates that reports both monomeric (pre-polymerized) and polymeric (short-chain) MD/MM-PBSA energies and includes a DFT subset; and MIP–PhAC-Screen, a broader, high-throughput collection produced under a uniform automated workflow (including automated protonation) for rapid within-polymer ranking and machine learning development. For each MIP–PhAC pair we provide ΔG* components (electrostatics, van der Waals, polar and non-polar solvation; −TΔS omitted), summary statistics from post-convergence frames, simulation inputs, and chemical metadata. To our knowledge, MIP–PhAC is the largest open, curated dataset of polymer–pharmaceutical interaction energies to date. It enables benchmarking of end-point methods, reproducible protocol evaluation, data-driven ranking of polymer–pharmaceutical combinations, and training/validation of machine learning (ML) models for MIP design on modest compute budgets.
Dataset: DOI: 10.5281/zenodo.17514456
Dataset License: CC BY-NC 4.0

1. Summary

Pharmaceutically active compounds (PhACs) are increasingly detected in surface and drinking waters; even at low concentrations they pose ecological and public health risks [1,2,3]. Molecularly imprinted polymers (MIPs) are promising sorbents for capturing these compounds, but design remains slow and costly because screening is largely experimental or depends on resource-intensive calculations [4,5]. To address this gap, we assembled MIP–PhAC, which spans 20 PhACs and 24 functional monomers and consists of the following: (i) MIP–PhAC-Calibrated, a curated benchmark with manually verified pH-7 microstates that includes monomeric (pre-polymerized; n = 60 systems) and polymeric (short-chain; n = 60 systems) molecular dynamics (MD) simulations plus a DFT subset (n = 12 systems), where n denotes the number of unique MIP–PhAC combinations; (ii) MIP–PhAC-Screen, a high-throughput collection generated by an automated pipeline (including automated protonation) covering 19 × 23 attempted pairs, 434 of which converged.
To the best of our knowledge, MIP–PhAC is the largest open, curated dataset of polymer–pharmaceutical interaction energies to date. We assembled it to enable within-polymer ranking, calibration, and development of machine learning models for predicting binding interactions from pharmaceutical and polymer descriptors, with the goal of accelerating MIP design [6]. The release includes ready-to-use master tables, per-frame energy components, and complete simulation/input archives to support transparent reuse and extension. The systems were built from SMILES, parameterized with GAFF2, packed in 6 × 6 × 6 nm boxes, solvated with SPC water, and simulated in GROMACS [7,8,9,10]. Only the post-convergence frames (defined by a global t95 cutoff) were analyzed with gmx_MMPBSA to report ΔG* means and standard deviations (SDs) (kcal·mol−1; see the Methods section) [11,12]. Representative low-energy conformations were also used to compute a small DFT subset, providing cross-method reference energies.
The dataset supports ongoing projects on ML-based rank prediction, protocol development for longer polymer MD with explicit ions, and application-oriented screens for priority pollutants. We will also perform experimental validation of computed interaction energies via adsorption studies, to be reported separately. In the future, we plan to expand the resource to more complex interaction systems and additional solvents, and to improve our automated pipelines so that users can rapidly generate predictions for new pharmaceutical–polymer combinations. This work was supported by the Croatian Science Foundation (HRZZ-IP-2022-10-4400, “MIPdePharma”). No publications based on this dataset have yet appeared; related manuscripts are in preparation.

2. Data Description

The MIP–PhAC resource contains two complementary MD datasets describing polymer–pharmaceutical interactions intended to aid MIP design. The MIP–PhAC-Calibrated dataset serves as a curated benchmark containing manually verified microstates (charges and protonation adjusted to pH 7) and both monomeric “pre-polymerized” and short-chain polymeric systems (60 unique systems each), along with an additional smaller DFT subset (12 systems) for cross-method validation. The choice of 60 systems reflects the full set of combinations between four representative polymers and the 15 pharmaceuticals currently available in our laboratory, selected to enable future integration of experimentally measured retention data. The MIP–PhAC-Screen expands coverage to 19 pharmaceuticals × 23 polymers (437 attempts); 3 systems failed repeatedly and were excluded, yielding 434 reported systems. The screen was generated with an automated workflow (including automated protonation). All systems were analyzed with MM/PBSA, and we report binding free energies (ΔG*) in kcal·mol−1. A lower (more negative) ΔG* indicates stronger binding.
The datasets collectively include 20 pharmaceuticals and 24 polymeric monomer units (Table 1 and Table 2). The 20 pharmaceuticals included in MIP–PhAC were selected because they represent environmentally relevant, analytically challenging targets for which commercially available MIPs often show inadequate performance. Selection criteria included (i) documented frequent occurrence in environmental monitoring programs, including EU “Watch List” substances; (ii) literature reports of persistence, toxicity, and removal difficulty; and (iii) physicochemical indicators of poor biodegradability and bioaccumulation potential (e.g., BIOWIN < 0.5).
Functional monomers were chosen to reflect the interaction diversity relevant to MIP design. Different interaction types provide distinct selectivity profiles [13], so we selected a set of widely used monomers spanning a broad range of physicochemical properties and interaction mechanisms [14,15].
Together these combinations provide a diverse sampling of pharmaceutical–polymer interactions for benchmarking or model development.
Table 1. Pharmaceutically active compounds included in MIP–PhAC datasets. Each entry lists the compound name, molecular formula, CAS number, molecular weight (Mw), pKa values, logarithm of the octanol/water partition coefficient (log Kow), and dataset inclusion. Physicochemical properties were retrieved from PubChem [16].
| Full PhAC Name | Empirical Formula | CAS | Mw | pKa | log Kow | Dataset Membership (Calibrated / Screen) |
|---|---|---|---|---|---|---|
| Amoxicillin | C16H19N3O5S | 26787-78-0 | 365.4 | 3.2; 11.7 | 0.87 | |
| Atenolol | C14H22N2O3 | 29122-68-7 | 266.34 | 9.16 | 0.16 | |
| Diazepam | C16H13ClN2O | 439-14-5 | 284.74 | 3.4 | 2.82 | |
| Diclofenac | C14H11Cl2NO2 | 15307-86-5 | 296.1 | 4.15 | 4.51 | |
| Carbamazepine | C15H12N2O | 298-46-4 | 236.27 | 13.9 | 2.77 | |
| Procaine | C13H20N2O2 | 59-46-1 | 236.31 | 8.05 | 1.92 | |
| Sulfamethazine | C12H14N4O2S | 57-68-1 | 278.33 | 2.65; 7.65 | 0.89 | |
| Sulfamethoxazole | C10H11N3O3S | 723-46-6 | 253.28 | 1.6; 5.7 | 0.89 | |
| Torasemide | C16H20N4O3S | 56211-40-6 | 348.4 | 7.1 | 3.36 | |
| β-estradiol | C18H24O2 | 50-28-2 | 272.4 | 10.46 | 4.01 | |
| Venlafaxine | C17H27NO2 | 93413-69-5 | 277.4 | 9.5 | 3.2 | |
| O-Desmethylvenlafaxine | C16H25NO2 | 142761-12-4 | 263.37 | 9.45; 10.66 | 2.72 | |
| Hydroxychloroquine | C18H26ClN3O | 118-42-3 | 335.9 | 9.67 | 3.6 | |
| Metoclopramide | C14H22ClN3O2 | 364-62-5 | 299.79 | 9.3 | 2.66 | |
| Trimethoprim | C14H18N4O3 | 738-70-5 | 290.32 | 7.12 | 0.91 | X |
| Amitriptyline | C20H23N | 50-48-6 | 277.4 | 9.4 | 4.9 | X |
| β-Sitosterol | C29H50O | 83-46-5 | 414.7 | - | 9.3 | X |
| Miconazole | C18H14Cl4N2O | 22916-47-8 | 416.1 | 6.91 | 6.1 | X |
| Clotrimazole | C22H17ClN2 | 23593-75-1 | 344.8 | 4.1 | 5.92 | X |
| Dexamethasone | C22H29FO5 | 50-02-2 | 392.5 | 1.18; 3.4 | 1.83 | X |
Each dataset folder (Calibrated/, Screen/) contains a master summary table, raw per-frame data, and simulation archives. The Calibrated master table (master_table_calibrated.xlsx) includes an overview sheet (Info) and three data sheets (Monomer_Energy, Polymer_Energy, DFT_Energy) listing pharmaceutical and polymer identifiers, predicted mean affinity (ΔG* for MM/PBSA and ΔG for DFT), and SD.
We additionally report a PhAC self-interaction metric (mean ΔG* and SD) obtained by running the same monomeric MM/PBSA pipeline on ligand-only boxes; this supports interpretation of concentration-dependent effects in pharmaceutical–polymer retention. The Screen workbook (master_table_screen.xlsx) also includes an overview sheet and a single “Data” sheet with the mean and SD of ΔG* across pharmaceutical–polymer pairs. Raw per-frame MM/PBSA energies are collected in a single CSV file, raw_data/Energy_raw_data_multyterm.csv, with rows as trajectory frames and columns following the naming scheme {PolymerID}_{PhACID}_{term}. Terms include the per-component energy contributions and the total (ΔG*). Molecular metadata in PhAC_data/ and Polymer_data/ (SMILES, formal charge, names/IDs) are provided for descriptor construction. Each simulation is archived in Simulations/ within the main ZIP, which contains the production trajectory (.xtc), topology (topol.top), and final equilibrated structure (.gro). Template inputs for PackMol, GROMACS, and gmx_MMPBSA (.inp, .mdp, .in) are provided in inputs/ to enable full reproducibility [7,11,17]. The primary quantitative output is the predicted binding affinity, obtained with gmx_MMPBSA. As absolute ΔG* values can shift across different polymer environments, comparisons are most reliable within a single polymer series. The DFT subset provides reference energies for cross-method benchmarking. All files are provided in open, machine-readable formats suitable for direct integration into statistical or machine learning pipelines.
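The column naming scheme described above lends itself to a small parsing helper. The following is a minimal sketch, assuming that neither ID contains an underscore and that the file may carry an extra index column; the term labels (VDW, TOTAL), the demo numbers, and the function names are illustrative assumptions, not names confirmed by the archive.

```python
import csv
import io
import statistics
from collections import defaultdict


def split_column(name):
    """Split '{PolymerID}_{PhACID}_{term}' into its three parts.

    Assumes the two IDs contain no underscore, so any further
    underscores are treated as part of the term label.
    """
    polymer_id, phac_id, term = name.split("_", 2)
    return polymer_id, phac_id, term


def summarize(csv_text, first_frame=0):
    """Return {(polymer, phac, term): (mean, sd)} over rows >= first_frame."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    values = defaultdict(list)
    for row in rows[first_frame:]:
        for column, cell in row.items():
            # Skip index/time columns (fewer than two underscores) and empty cells.
            if column is None or column.count("_") < 2 or not cell:
                continue
            values[split_column(column)].append(float(cell))
    return {
        key: (statistics.mean(v), statistics.stdev(v))
        for key, v in values.items()
        if len(v) > 1
    }


# Made-up three-frame table for one pair; real term labels may differ.
demo = (
    "frame,NHEM_TDIA_VDW,NHEM_TDIA_TOTAL\n"
    "0,-15.0,-10.0\n"
    "1,-16.0,-11.0\n"
    "2,-14.0,-12.0\n"
)
stats = summarize(demo, first_frame=1)  # drop the first frame as "burn-in"
mean_total, sd_total = stats[("NHEM", "TDIA", "TOTAL")]
print(f"mean = {mean_total:.2f}, SD = {sd_total:.2f}")
```

Passing the burn-in cutoff as a row index keeps the helper independent of the frame spacing, which differs between monomeric and polymeric runs.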
Table 2. Monomeric units included in MIP–PhAC datasets. Each row lists the monomer name, formula, CAS number, polymer type (neutral/anionic/cationic), number of monomers used in the simulation, and dataset inclusion. Monomer metadata (empirical formula, CAS) were obtained from the PubChem database [16].
| Full Monomer Name | Empirical Formula | CAS | Polymer Type | Simulation Size Monomeric (n) | Simulation Size Polymeric (Chains × Monomers) | Dataset Membership (Calibrated / Screen) |
|---|---|---|---|---|---|---|
| Methacrylic acid | C4H6O2 | 79-41-4 | Anionic | 87 | 20 × 10-mer | |
| 4-Vinylpyridine | C7H7N | 100-43-6 | Neutral | 76 | 16 × 10-mer | |
| 2-Hydroxyethyl methacrylate | C6H10O3 | 868-77-9 | Neutral | 69 | 13 × 10-mer | |
| Oasis HLB *** | NA | NA | Neutral | 21 | 8 × 5-mer | X |
| Acrylic acid | C3H4O2 | 79-10-7 | Anionic | 93 | - | X |
| Itaconic acid | C5H6O4 | 97-65-4 | Anionic | 72 | - | X |
| 2-(Trifluoromethyl)acrylic acid | C4H3F3O2 | 381-98-6 | Anionic | 75 | - | X |
| 4-Vinylbenzoic acid | C9H8O2 | 1075-49-6 | Anionic | 60 | - | X |
| Trans-3-(3-Pyridyl)acrylic acid | C8H7NO2 | 19337-97-4 | Anionic | 61 | - | X |
| Allylamine | C3H7N | 107-11-9 | Cationic | 96 | - | X |
| N-(2-Aminoethyl)acrylamide | C5H10N2O | 23918-29-8 | Cationic | 67 | - | X |
| 2-(Diethylamino)ethyl methacrylate | C10H19NO2 | 105-16-8 | Cationic | 43 | - | X |
| [2-(Trimethylammonio)ethyl] methacrylate | C9H18NO2 | 5039-78-1 | Cationic | 49 | - | X |
| Methacrylamide | C4H7NO | 79-39-0 | Neutral | 86 | - | X |
| Acrylamide | C3H5NO | 79-06-1 | Neutral | 92 | - | X |
| Acrylonitrile | C3H3N | 107-13-1 | Neutral | 97 | - | X |
| Methyl methacrylate | C5H8O2 | 80-62-6 | Neutral | 80 | - | X |
| Ethylstyrene | C10H12 | 3454-07-7 | Neutral | 62 | - | X |
| Styrene | C8H8 | 100-42-5 | Neutral | 75 | - | X |
| N-Vinylpyrrolidone | C6H9NO | 88-12-0 | Neutral | 75 | - | X |
| Ethyl urocanate (ethyl ester) | C8H10N2O2 | 27538-35-8 | Neutral | 55 | - | X |
| 4-Vinylimidazole | C5H6N2 | 25189-76-8 | Neutral | 82 | - | X |
| N-Vinylimidazole | C5H6N2 | 1072-63-5 | Neutral | 82 | - | X |
| 2-Vinylpyridine | C7H7N | 100-69-6 | Neutral | 76 | - | X |
*** commercialized sorbent, supplied by Waters; no single empirical formula; it is a copolymer.

3. Methods

We generated two related MD datasets describing polymer–pharmaceutical interactions (see the Data Description section). Both datasets were produced using an integrated computational workflow for molecular preparation, simulation, and analysis, summarized in Figure 1. A detailed example of the data acquisition workflow is provided in Appendix A, using the 2-hydroxyethyl methacrylate and diazepam system as a representative case.

3.1. Molecule Preparation and Parameterization

Ligand structures were standardized from SMILES using RDKit (version 2023.09.4) and embedded/protonated with Open Babel (version 3.1.0) [8,9,18]. Molecular metadata for monomers and PhACs were retrieved from the PubChem database [16]. In MIP–PhAC-Calibrated, microstates were reviewed in Avogadro (version 1.2) and edited where necessary to reflect pH 7; MIP–PhAC-Screen retained the automated assignments [19]. Polymer environments were represented in two forms. For pre-polymerization, we used pools of a single unique functional monomer surrounding a single PhAC. To approximate post-polymerization packing, we constructed analogous systems composed of several short chains generated from monomer outputs with Winmostar V10 polymerization tools [20]. To avoid simulation instabilities while retaining sufficient functional monomers for meaningful sampling, monomeric systems used 30–100 monomers, scaled inversely with molecular volume to maintain a stable monomer-to-box volume ratio. Owing to their inherently greater stability, polymeric systems were packed at approximately twice the density used for monomeric simulations. All species were parameterized with GAFF2 via acpype (version 2023.10.27), which wraps antechamber from AmberTools [10,21,22].
Figure 1. Schematic overview of the integrated computational workflow used to generate and analyze the MIP–PhAC datasets. Ligand and polymer structures were standardized, protonated, and parameterized (top), assembled into solvated simulation boxes (middle), and simulated in GROMACS for 20 ns (monomeric) or 40 ns (polymeric) production runs. Post-processing included PBC correction, conformer analysis, and MM/PBSA or DFT energy calculations. Colored arrows indicate the three methodological branches used in this work: monomeric MM/PBSA, polymeric MM/PBSA, and the DFT reference subset. RDKit, Open Babel, Avogadro, Winmostar, PackMol, GROMACS, gmx_MMPBSA, and Gaussian were used as indicated. Created with BioRender.com [23].

3.2. System Assembly and Molecular Dynamics

Final topology assembly was performed with ParmEd (version 4.2.2) [24], which was used to merge acpype-generated topologies. Each system was packed in a 6 × 6 × 6 nm cubic box using PackMol (version 20.010) with a single PhAC in the center surrounded by n monomers or polymer chains (Table 2), solvated with SPC water, and neutralized with Na+/Cl− ions [17,25]. This box size was chosen based on preliminary tests showing that 4 nm boxes were unstable during NPT equilibration, while 6 nm and 8 nm boxes showed similar stability and energy profiles; 6 nm therefore provided the best balance between stability and computational efficiency. Simulations were performed in GROMACS 2023.1 using GAFF2 parameters for both pharmaceuticals and monomers/polymers [7]. After steepest-descent energy minimization, systems underwent three equilibration phases at 293.15 K with a 0.5 fs timestep: NVT with the Nose–Hoover thermostat; NPT with the Nose–Hoover thermostat and Berendsen barostat at 1 bar; and NPT with the Nose–Hoover thermostat and Parrinello–Rahman barostat at 1 bar [26,27,28,29]. Polymeric systems (4 unique polymers) exhibited equilibrated densities of 971.84–1076.03 kg·m−3 (lowest for HLB; highest for methacrylic acid). Monomeric simulations (24 unique monomers) ranged from 965.24 to 1040.02 kg·m−3 (lowest for ethylstyrene; highest for itaconic acid). These values fall within the expected range and confirm adequate packing across all simulations.
Production trajectories were then run for 20 ns for monomeric simulations and 40 ns for polymeric simulations using PME electrostatics (cutoff 1.2 nm), the Verlet scheme for nonbonded interactions, and a van der Waals cutoff of 1.2 nm with force-switch from 1.0 nm [30]. Thermal equilibration occurred rapidly under this protocol. Temperature profiles showed that monomeric systems reached thermal stability within ~5 ps, whereas polymeric systems stabilized within ~30 ps. From each trajectory we extracted 1000 frames for further analysis.

3.3. Convergence Assessment and Analysis Window

Trajectories were recentered (-center -pbc mol -ur compact) and aligned (rot + trans) to remove periodic boundary artifacts. We computed an average, scaled monomer/polymer RMSD and applied a 10-frame moving average. The stabilization time tstab was defined as the earliest time at which a subsequent 50-frame window remained within ±5% of its local mean. Across systems (434 monomeric and 60 polymeric), we then defined a global t95 as the 95th percentile of tstab. Monomeric systems stabilized rapidly (median 1.40 ns; t95 = 3.18 ns), whereas polymeric systems required longer sampling (median 6.24 ns; t95 = 19.17 ns). Based on this, we applied a conservative burn-in of 3.18 ns (monomeric) and 19.17 ns (polymeric) and analyzed only post-burn-in frames.
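The stabilization criterion can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' script: the smoothing is taken as a forward 10-frame window, the ±5% band is measured relative to the mean of each 50-frame window, percentiles use linear interpolation, and the RMSD trace and tstab values below are synthetic.

```python
import math


def moving_average(values, window=10):
    """Mean of each consecutive `window`-frame block (forward window)."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def stabilization_index(smoothed, window=50, tolerance=0.05):
    """First index whose next `window` points all stay within
    +/- tolerance of that window's own mean; None if never stable."""
    for start in range(len(smoothed) - window + 1):
        chunk = smoothed[start:start + window]
        mean = sum(chunk) / window
        if all(abs(x - mean) <= tolerance * abs(mean) for x in chunk):
            return start
    return None


def percentile(data, q):
    """Linear-interpolation percentile (q in 0..100) of a non-empty list."""
    ordered = sorted(data)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


# Synthetic scaled RMSD: linear rise over 100 frames, then a plateau.
rmsd = [0.1 + 0.009 * i for i in range(100)] + [1.0] * 200
idx = stabilization_index(moving_average(rmsd))

# 1000 frames over 20 ns (monomeric runs) gives 0.02 ns per frame.
frame_spacing_ns = 20.0 / 1000
t_stab_ns = idx * frame_spacing_ns

# A global cutoff is the 95th percentile over per-system values (made up here).
t95 = percentile([1.0, 2.0, 3.0, 4.0, 5.0], 95)
print(f"t_stab = {t_stab_ns:.2f} ns, t95 of demo list = {t95:.2f}")
```

Using one global t95 per system class, instead of a per-system cutoff, keeps the analysis window identical across all pairs, so each mean ΔG* is computed over the same number of frames.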

3.4. MM/PBSA Calculations

End-point binding energies were computed with gmx_MMPBSA v1.6.2 using
$$\Delta G^{*} = \left(E_{\mathrm{electric}} + E_{\mathrm{vdw}}\right) + \left(G_{\mathrm{polar}} + G_{\mathrm{nonpolar}}\right)$$
with the non-polar contribution modeled as SASA-only and the entropic term −TΔS omitted [11,12]. Poisson–Boltzmann settings followed the tool’s recommended parameters. Component and total energies were averaged over post-burn-in frames.

3.5. DFT Subset

For a subset of neutral ligand–fragment complexes, Gibbs free energies were evaluated using Gaussian 16 [31]. Structures were optimized with the B3LYP functional [32,33] and the 6-31+G(d,p) basis set. The Conductor-like Polarizable Continuum Model (CPCM) was used to account for solvent effects, with water as the solvent [34,35]. The binding energies were calculated from Gibbs free energy differences using the formula
$$\Delta G = G_{\mathrm{complex}} - \left(G_{\mathrm{pharmaceutical}} + n\,G_{\mathrm{monomer}}\right)$$
where Gcomplex is the Gibbs free energy of the pharmaceutical–monomer complex, Gpharmaceutical is the Gibbs free energy of the free pharmaceutical, Gmonomer is the Gibbs free energy of the free monomer, and n is the number of monomer units, which varied depending on the stoichiometric ratio. The ratios for each pharmaceutical–monomer pair were systematically varied to explore the impact of stoichiometry on binding interactions. These calculations were performed to evaluate the stability and strength of the non-covalent interactions between the pharmaceuticals and monomers in aqueous environments.
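As a worked illustration of this difference, the short sketch below evaluates ΔG for a hypothetical 1:2 pharmaceutical–monomer complex. The Gibbs free energies are made-up numbers in Hartree (the unit Gaussian reports), and the conversion to kcal·mol−1 is added only so the result can be read on the same scale as the MM/PBSA values; the units used in the released DFT sheet are not assumed here.

```python
HARTREE_TO_KCAL_PER_MOL = 627.5095  # standard conversion factor


def binding_free_energy(g_complex, g_pharmaceutical, g_monomer, n_monomers):
    """Return G_complex - (G_pharmaceutical + n * G_monomer).

    All inputs must share one unit; the result is in that unit.
    """
    if n_monomers < 1:
        raise ValueError("n_monomers must be at least 1")
    return g_complex - (g_pharmaceutical + n_monomers * g_monomer)


# Made-up Gibbs free energies (Hartree) for illustration only.
dg_hartree = binding_free_energy(
    g_complex=-1500.030,
    g_pharmaceutical=-1000.000,
    g_monomer=-250.010,
    n_monomers=2,
)
dg_kcal = dg_hartree * HARTREE_TO_KCAL_PER_MOL
print(f"dG = {dg_hartree:.3f} Eh = {dg_kcal:.2f} kcal/mol")
```

Because n multiplies the monomer term, values obtained at different stoichiometric ratios are not directly comparable unless the ratio is reported alongside each ΔG.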

4. User Notes and Data Limitations

The datasets are designed for (i) within-polymer ranking of PhACs, (ii) benchmarking end-point methods and workflow variants, and (iii) developing ML models that map ligand/polymer descriptors to MM/PBSA ranks. Monomeric systems approximate pre-polymerization chemistry; short-chain polymeric systems approximate local packing after polymerization. Neither reproduces full network topology, crosslink density, or porosity. Cooperative effects beyond the simulated length scales may be underrepresented. Known limitations are as follows: (i) The Screen dataset uses automated protonation; edge-case microstates (multiprotic/tautomeric species) can be misassigned, so verify SMILES and charges if microstate sensitivity is important. (ii) Reported ΔG* from MM/PBSA excludes −TΔS and uses SASA-only for the non-polar term; treat values as interaction scores, not as absolute binding free energies. (iii) Cross-polymer comparisons can drift due to dielectric and packing differences; prefer within-polymer ranking. (iv) Crosslinkers, porogens, salts, and co-monomers present in experimental MIPs are not modeled; when relating ΔG* to retention/adsorption, note that these missing components can affect selectivity and capacity. Future studies will expand solvent conditions and polymer chemistry, incorporate explicit crosslinkers/ions, refine microstate assignment, and provide replicated trajectories for uncertainty estimation.

Author Contributions

Conceptualization, D.M.P. and M.L.; methodology, D.V., D.M., and R.V.; software, D.V. and D.M.; validation, D.V., M.L., and D.M.; formal analysis, D.V.; investigation, D.V. and D.M.; resources, D.M.P. and M.L.; data curation, D.V. and D.M.; writing—original draft preparation, D.V., M.L., and D.M.; writing—review and editing, D.V., M.L., D.M., R.V., Ž.M., K.T.Č., and D.M.P.; visualization, D.V. and M.L.; supervision, D.M.P., M.L., and R.V.; project administration, D.M.P.; funding acquisition, D.M.P. and M.L. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the Croatian Science Foundation under the project “Development of molecularly imprinted polymers for use in analysis of pharmaceuticals and during advanced water treatment processes” (MIPdePharma) (HRZZ-IP-2022-10-4400). M.L. received funding from Next Generation EU, grant number IA-INT-2024-BioAntroPoP.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The full dataset is available at DOI: 10.5281/zenodo.17743761.

Acknowledgments

During the preparation of this manuscript/study, the authors used ChatGPT 5.1 licensed to M.L. for improving coding pipelines. The authors have reviewed and edited the output and take full responsibility for the content of this publication. M.L., D.M., and R.V. would like to acknowledge the Zagreb University Computing Centre (SRCE) for granting computational resources for the SUPEK cluster.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
| Abbreviation | Definition |
|---|---|
| PhAC | Pharmaceutically active compound |
| MIP | Molecularly imprinted polymer |
| MD | Molecular dynamics |
| DFT | Density functional theory |
| MM/PBSA | Molecular Mechanics/Poisson–Boltzmann Surface Area |
| SPC | Simple Point Charge (water model) |
| GAFF2 | General AMBER Force Field 2 |
| RMSD | Root-mean-square deviation |
| PBC | Periodic boundary conditions |
| PME | Particle Mesh Ewald |
| NVT | Constant number, volume, temperature ensemble |
| NPT | Constant number, pressure, temperature ensemble |
| SASA | Solvent-accessible surface area |
| SD | Standard deviation |
| SMILES | Simplified Molecular Input Line Entry System |
| CPCM | Conductor-like Polarizable Continuum Model |
| MK | Merz–Kollman (electrostatic potential charges) |

Appendix A

Appendix A.1. Worked Example: 2-Hydroxyethyl Methacrylate–Diazepam Workflow

To illustrate the complete workflow shown in Figure 1, we provide an example of how the data were obtained for the monomeric system composed of 2-hydroxyethyl methacrylate (HEMA) and diazepam. All files referred to below are included in the dataset archive under
MIP_PhAC\Example

Appendix A.1.1. Molecular Input and Standardization

  • Specification of species
Pharmaceutical (PhAC)
  • Pharmaceutical: diazepam
  • ID: TDIA
  • SMILES: CN1C(=O)CN=C(C2=C1C=CC(=C2)Cl)C3=CC=CC=C3
Functional monomer:
  • Name: 2-Hydroxyethyl methacrylate (HEMA)
  • ID: NHEM
  • SMILES: CC(=C)C(=O)OCCO
  • Standardization and 3D generation
SMILES were standardized using RDKit (fragment removal, valence normalization).
Protonation states were assigned for pH 7 with Open Babel, and one low-energy conformer per species was generated. Relaxation was performed using the MMFF94 force field.
Output:
  • T_TDIA.mol2
  • P_NHEM.mol2
These MOL2 files serve as the input to the parameterization step.

Appendix A.1.2. Parameterization and Topology Generation

  • GAFF2 parameterization (acpype)
Both molecules were parameterized using acpype (AmberTools/antechamber backend) with GAFF2 and AM1-BCC charges:
acpype -i <molecule>.mol2 -c bcc -a gaff2
Output folders:
  • T_TDIA.acpype/
  • P_NHEM.acpype/
Within each folder, acpype produced the following:
  • <ID>_GMX.gro—GROMACS coordinates
  • <ID>_GMX.itp—GROMACS include topology
  • <ID>_GMX.top—standalone topology
  • posre_<ID>.itp—position restraint file
  • Monomer count selection
The number of HEMA monomers that will be used in the simulation was determined by comparing and scaling the molecular volume of all the functional monomers (using RDKit). For this monomer, n = 69 HEMA monomers.
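The inverse scaling with molecular volume can be illustrated as follows. This is a sketch under explicit assumptions: the exact scaling rule used for the dataset is not given in the text, so the function simply keeps the product of count and volume constant relative to a reference monomer, and both volumes below are made-up numbers (only the count of 69 for HEMA comes from the text).

```python
def monomer_count(volume, reference_volume, reference_count):
    """Scale the monomer count inversely with molecular volume so that
    count * volume (the occupied volume) stays roughly constant."""
    if volume <= 0 or reference_volume <= 0:
        raise ValueError("volumes must be positive")
    return max(1, round(reference_count * reference_volume / volume))


# Hypothetical volumes in cubic angstroms; 69 is the HEMA count from the text.
REFERENCE_VOLUME = 120.0
REFERENCE_COUNT = 69

same_size = monomer_count(120.0, REFERENCE_VOLUME, REFERENCE_COUNT)
bulkier = monomer_count(230.0, REFERENCE_VOLUME, REFERENCE_COUNT)
smaller = monomer_count(60.0, REFERENCE_VOLUME, REFERENCE_COUNT)
print(same_size, bulkier, smaller)
```

A bulkier monomer therefore receives fewer copies and a smaller one more, which matches the trend in Table 2, where compact monomers such as acrylonitrile have the largest n.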

Appendix A.1.3. System Assembly

  • PackMol placement
A 3 × 3 × 3 nm cubic box was constructed with PackMol using the following script:
 
tolerance 2.0
filetype pdb
 
output system.pdb
 
structure TDIA.pdb
number 1
center
add_amber_ter
fixed 0. 0. 0. 0. 0. 0.
end structure
 
structure NHEM.pdb
number 69
inside box -15. -15. -15. 15. 15. 15.
add_amber_ter
outside sphere 0. 0. 0. 2.
end structure
  • Topology merge (ParmEd)
A unified topology was constructed using ParmEd, combining the ligand and monomer .itp files, position restraints, and the water/ion definitions.
Outputs:
  • topol.top
  • system.gro
These constitute the full starting system.

Appendix A.1.4. Molecular Dynamics Simulation

MD was performed with GROMACS 2023.1 using GAFF2. All .mdp files are included in the example folder.
  • Minimization and equilibration
  • Minimization: steepest descent
  • Equilibration:
    NVT (62.5 ps), Nose–Hoover, 293.15 K
    NPT (500 ps), Berendsen, 1 bar
    NPT (500 ps), Parrinello–Rahman, 1 bar
  • Production
  • Duration: 20 ns
  • PME electrostatics, 1.2 nm cutoffs
  • 1000 trajectory frames written
  • Outputs:
  • step5_production_noIon.tpr
  • step5_production_noIon.xtc
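The phase durations above translate into integration step counts once a timestep is fixed. The sketch below does that arithmetic for the equilibration phases using the 0.5 fs timestep stated in the Methods section; the dictionary keys and the function name are illustrative. In a GROMACS .mdp file these quantities correspond to the dt (in ps) and nsteps options, but the values in the provided .mdp files should be taken as authoritative.

```python
def n_steps(duration_ps, timestep_fs):
    """Number of integration steps needed to cover duration_ps."""
    steps = duration_ps * 1000.0 / timestep_fs
    if abs(steps - round(steps)) > 1e-6:
        raise ValueError("duration is not a whole number of timesteps")
    return round(steps)


TIMESTEP_FS = 0.5  # equilibration timestep stated in the Methods section

equilibration_ps = {
    "NVT": 62.5,
    "NPT (Berendsen)": 500.0,
    "NPT (Parrinello-Rahman)": 500.0,
}
step_counts = {
    name: n_steps(duration, TIMESTEP_FS)
    for name, duration in equilibration_ps.items()
}
for name, steps in step_counts.items():
    print(f"{name}: {steps} steps")

# Frame spacing follows from duration and frame count alone:
# 20 ns written as 1000 frames.
frame_interval_ps = 20_000 / 1000
print(f"production frame interval: {frame_interval_ps} ps")
```

The frame interval is independent of the production timestep, which is why it can be stated even though only the equilibration timestep is given explicitly.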

Appendix A.1.5. Trajectory Post-Processing

Trajectory conditioning was performed using the following:
gmx trjconv -center -pbc mol -ur compact
gmx trjconv -fit rot+trans
Output for analysis:
  • cen_energy_fit.xtc
  • Stabilization time inspection
Scaled RMSD with a 10-frame moving average placed the stabilization point of 95% of all the simulations (435 systems) at 3.18 ns (t95); therefore, only frames >3.18 ns were considered valid in the MM/PBSA results.

Appendix A.1.6. MM/PBSA Binding Energy Calculation

Binding energies were computed using gmx_MMPBSA v1.6.2.
Input:
  • topol.top
  • cen_energy_fit.xtc
  • mmpbsa.in
Key parameters:
  • Polar solvation: PB with default gmx_MMPBSA settings
  • Non-polar solvation: SASA-only
  • Entropy (−TΔS): omitted in ΔG*
  • Outputs
  • FINAL_RESULTS_MMPBSA.csv (Per-frame CSV)
Energy results for the HEMA–diazepam system:

| Energy term | Value (kcal·mol−1) |
|---|---|
| Electrostatic | −2.2 ± 2.8 |
| van der Waals | −15.1 ± 7.6 |
| Polar solvation | +9 ± 5.3 |
| Non-polar solvation | −2.2 ± 0.9 |
| ΔG* | −10.5 ± 5.2 |
The corresponding columns (prefix NHEM_TDIA) appear in Energy_raw_data_multyterm_monomeric.csv.
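As a quick consistency check on the example, the reported mean components can be summed according to the ΔG* definition in Section 3.4. The snippet uses only the mean values listed above; the dictionary keys are illustrative labels, not column names from the CSV.

```python
# Mean energy components for the HEMA-diazepam example (kcal/mol).
components = {
    "electrostatic": -2.2,
    "van_der_waals": -15.1,
    "polar_solvation": 9.0,
    "nonpolar_solvation": -2.2,
}
reported_total = -10.5

# dG* = (E_electric + E_vdw) + (G_polar + G_nonpolar)
total = sum(components.values())
difference = abs(total - reported_total)
print(f"sum of components = {total:.1f} kcal/mol (reported {reported_total})")
```

The means add up to the reported total, whereas the standard deviations do not: each SD is computed over frames for its own term, and the terms are correlated, so the SD of ΔG* is not the sum of the component SDs.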

Appendix A.1.7. Files Provided for the Example

Molecule-level inputs
  • <ID>.mol2
  • <ID>_GMX.itp, <ID>_GMX.top, <ID>_GMX.gro, posre_<ID>.itp
Assembled system
  • system.pdb, system.gro, topol.top
Simulation files
  • step5_production_noIon.tpr, step5_production_noIon.xtc
  • step4.{1-3}_equilibration_noIon.mdp
  • step5_production_noIon.mdp
Post-processed data
  • cen_energy_fit.xtc
  • index.ndx
  • mmpbsa.in
Energy outputs
  • FINAL_RESULTS_MMPBSA.csv

References

  1. Duan, L.; Zhang, Y.; Wang, B.; Zhou, Y.; Wang, F.; Sui, Q.; Xu, D.; Yu, G. Seasonal Occurrence and Source Analysis of Pharmaceutically Active Compounds (PhACs) in Aquatic Environment in a Small and Medium-Sized City, China. Sci. Total Environ. 2021, 769, 144272. [Google Scholar] [CrossRef] [PubMed]
  2. Royano, S.; Navarro, I.; de la Torre, A.; Martínez, M.Á. Investigating the Presence, Distribution and Risk of Pharmaceutically Active Compounds (PhACs) in Wastewater Treatment Plants, River Sediments and Fish. Chemosphere 2024, 368, 143759. [Google Scholar] [CrossRef]
  3. Zhao, J.; Wang, Y.; Jiang, Z.; Song, Y.; Deng, N.; Ren, Y.; Ma, R.; Jiang, K. Novel Insights into the Morphological Effects of Trace Organic Contaminants on Water/PhAC Selectivity in Nanofiltration of Sewage. J. Hazard. Mater. 2025, 497, 139672. [Google Scholar] [CrossRef]
  4. Rajpal, S.; Mishra, P.; Mizaikoff, B. Rational In Silico Design of Molecularly Imprinted Polymers: Current Challenges and Future Potential. Int. J. Mol. Sci. 2023, 24, 6785. [Google Scholar] [CrossRef]
  5. Zare, E.N.; Fallah, Z.; Le, V.T.; Doan, V.-D.; Mudhoo, A.; Joo, S.-W.; Vasseghian, Y.; Tajbakhsh, M.; Moradi, O.; Sillanpää, M.; et al. Remediation of Pharmaceuticals from Contaminated Water by Molecularly Imprinted Polymers: A Review. Environ. Chem. Lett. 2022, 20, 2629–2664. [Google Scholar] [CrossRef]
  6. Lowdon, J.W.; Ishikura, H.; Kvernenes, M.K.; Caldara, M.; Cleij, T.J.; van Grinsven, B.; Eersels, K.; Diliën, H. Identifying Potential Machine Learning Algorithms for the Simulation of Binding Affinities to Molecularly Imprinted Polymers. Computation 2021, 9, 103. [Google Scholar] [CrossRef]
  7. Abraham, M.J.; Murtola, T.; Schulz, R.; Páll, S.; Smith, J.C.; Hess, B.; Lindahl, E. GROMACS: High Performance Molecular Simulations through Multi-Level Parallelism from Laptops to Supercomputers. SoftwareX 2015, 1–2, 19–25.
  8. O’Boyle, N.M.; Banck, M.; James, C.A.; Morley, C.; Vandermeersch, T.; Hutchison, G.R. Open Babel: An Open Chemical Toolbox. J. Cheminform. 2011, 3, 33.
  9. RDKit. Available online: https://www.rdkit.org/ (accessed on 2 November 2025).
  10. He, X.; Man, V.H.; Yang, W.; Lee, T.-S.; Wang, J. A Fast and High-Quality Charge Model for the next Generation General AMBER Force Field. J. Chem. Phys. 2020, 153, 114502.
  11. Valdés-Tresanco, M.S.; Valdés-Tresanco, M.E.; Valiente, P.A.; Moreno, E. gmx_MMPBSA: A New Tool to Perform End-State Free Energy Calculations with GROMACS. J. Chem. Theory Comput. 2021, 17, 6281–6291.
  12. Massova, I.; Kollman, P.A. Combined Molecular Mechanical and Continuum Solvent Approach (MM-PBSA/GBSA) to Predict Ligand Binding. Perspect. Drug Discov. Des. 2000, 18, 113–135.
  13. Mutavdžić Pavlović, D.; Nikšić, K.; Livazović, S.; Brnardić, I.; Anžlovar, A. Preparation and Application of Sulfaguanidine-Imprinted Polymer on Solid-Phase Extraction of Pharmaceuticals from Water. Talanta 2015, 131, 99–107.
  14. Yan, H.; Row, K.H. Characteristic and Synthetic Approach of Molecularly Imprinted Polymer. Int. J. Mol. Sci. 2006, 7, 155–178.
  15. Introduction to Molecularly Imprinted Polymer. In Interface Science and Technology; Elsevier: Amsterdam, The Netherlands, 2021; Volume 33, pp. 511–556.
  16. Kim, S.; Thiessen, P.A.; Bolton, E.E.; Chen, J.; Fu, G.; Gindulyte, A.; Han, L.; He, J.; He, S.; Shoemaker, B.A.; et al. PubChem Substance and Compound Databases. Nucleic Acids Res. 2016, 44, D1202–D1213.
  17. Martínez, L.; Andrade, R.; Birgin, E.G.; Martínez, J.M. PACKMOL: A Package for Building Initial Configurations for Molecular Dynamics Simulations. J. Comput. Chem. 2009, 30, 2157–2164.
  18. Riniker, S.; Landrum, G.A. Better Informed Distance Geometry: Using What We Know To Improve Conformation Generation. J. Chem. Inf. Model. 2015, 55, 2562–2574.
  19. Hanwell, M.D.; Curtis, D.E.; Lonie, D.C.; Vandermeersch, T.; Zurek, E.; Hutchison, G.R. Avogadro: An Advanced Semantic Chemical Editor, Visualization, and Analysis Platform. J. Cheminform. 2012, 4, 17.
  20. Themefisher Winmostar (TM). Available online: https://winmostar.com (accessed on 25 November 2025).
  21. Sousa da Silva, A.W.; Vranken, W.F. ACPYPE - AnteChamber PYthon Parser interfacE. BMC Res. Notes 2012, 5, 367.
  22. Case, D.A.; Aktulga, H.M.; Belfon, K.; Cerutti, D.S.; Cisneros, G.A.; Cruzeiro, V.W.D.; Forouzesh, N.; Giese, T.J.; Götz, A.W.; Gohlke, H.; et al. AmberTools. J. Chem. Inf. Model. 2023, 63, 6183–6191.
  23. Scientific Image and Illustration Software|BioRender. Available online: https://www.biorender.com/ (accessed on 27 November 2025).
  24. Shirts, M.R.; Klein, C.; Swails, J.M.; Yin, J.; Gilson, M.K.; Mobley, D.L.; Case, D.A.; Zhong, E.D. Lessons Learned from Comparing Molecular Dynamics Engines on the SAMPL5 Dataset. J. Comput. Aided Mol. Des. 2017, 31, 147–161.
  25. Berendsen, H.J.C.; Grigera, J.R.; Straatsma, T.P. The Missing Term in Effective Pair Potentials. J. Phys. Chem. 1987, 91, 6269–6271.
  26. Parrinello, M.; Rahman, A. Polymorphic Transitions in Single Crystals: A New Molecular Dynamics Method. J. Appl. Phys. 1981, 52, 7182–7190.
  27. Evans, D.J.; Holian, B.L. The Nose–Hoover Thermostat. J. Chem. Phys. 1985, 83, 4069–4074.
  28. Berendsen, H.J.C.; Postma, J.P.M.; van Gunsteren, W.F.; DiNola, A.; Haak, J.R. Molecular Dynamics with Coupling to an External Bath. J. Chem. Phys. 1984, 81, 3684–3690.
  29. Nosé, S. A Molecular Dynamics Method for Simulations in the Canonical Ensemble. Mol. Phys. 1984, 52, 255–268.
  30. Darden, T.; York, D.; Pedersen, L. Particle Mesh Ewald: An N⋅log(N) Method for Ewald Sums in Large Systems. J. Chem. Phys. 1993, 98, 10089–10092.
  31. Frisch, M.J.; Trucks, G.W.; Schlegel, H.B.; Scuseria, G.E.; Robb, M.A.; Cheeseman, J.R.; Scalmani, G.; Barone, V.; Petersson, G.A.; Nakatsuji, H.; et al. Gaussian 16, Revision B.01; Gaussian, Inc.: Wallingford, CT, USA, 2016; GaussView 5.0; Wallingford, CT, USA. Available online: https://gaussian.com/ (accessed on 25 November 2025).
  32. Lee, C.; Yang, W.; Parr, R.G. Development of the Colle-Salvetti Correlation-Energy Formula into a Functional of the Electron Density. Phys. Rev. B 1988, 37, 785–789.
  33. Becke, A.D. Density-functional Thermochemistry. III. The Role of Exact Exchange. J. Chem. Phys. 1993, 98, 5648–5652.
  34. Barone, V.; Cossi, M. Quantum Calculation of Molecular Energies and Energy Gradients in Solution by a Conductor Solvent Model. J. Phys. Chem. A 1998, 102, 1995–2001.
  35. Cossi, M.; Rega, N.; Scalmani, G.; Barone, V. Energies, Structures, and Electronic Properties of Molecules in Solution with the C-PCM Solvation Model. J. Comput. Chem. 2003, 24, 669–681.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
