1. Summary
Graphene oxide (GO) stands out as a practical and versatile alternative to pristine graphene because its surface carries oxygen-containing functional groups such as hydroxyl and epoxy groups [1]. These groups disrupt the perfect sp²-hybridized carbon network of graphene, increasing hydrophilicity, surface charge, and the ability to form hydrogen bonds, which enhances its dispersion in water and its interactions with ions, polymers, and biomolecules [2]. At the molecular scale, the type and distribution of functional groups tune GO’s flexibility, reactivity, electronic properties, and sheet–sheet stacking behavior [3]. Another practical strength of GO is its scalability. Unlike pristine graphene, which is expensive and difficult to produce in large quantities, GO can be synthesized through oxidative intercalation and exfoliation of graphite using well-established methods that operate reliably at mass-production scale [4]. The combination of low-cost synthesis with tunable structure and chemistry supports a wide range of uses, from composite reinforcement and energy storage to pollutant adsorption, drug delivery, and the creation of hybrid materials.
Several automated GO-builder frameworks have recently been developed to address the difficulty of constructing oxidized graphene sheets with controlled functionality. For example, the MakeGraphitics [5] program provides an automated and experimentally validated way to build graphene-based structures by sequentially oxidizing graphitic lattices according to locally predicted reactivity, reproducing the experimentally observed two-phase morphology of oxidized and unoxidized domains rather than relying on random functional-group placement [6]. Released on Zenodo as part of Sinclair and Coveney’s modeling workflow, it generates atomistic structures compatible with LAMMPS [7] and other MD engines using machine learning-derived reactivity rules. However, this package was written in Python 2 and depends on legacy libraries, leading to incompatibilities with newer Python 3 environments and requiring manual patching or containerization to ensure reliable execution.
GOPY [8] is another Python-based tool that rapidly generates 2D graphene-based models in PDB format, including pristine graphene and several graphene derivatives such as graphene oxide, reduced graphene oxide, aminated polyethylene glycol functionalized reduced graphene oxide (rGO-PEG-NH₂), and N-doped graphene (NG). Functional groups are added using simple geometric rules, allowing users to specify the number of carboxyl, epoxy, and hydroxyl groups for GO construction. GOPY generates these structures by placing functional groups onto a geometrically constructed basal lattice, and its implementation becomes limiting when systems become large. In our experience, GOPY’s coordinate-generation scheme can accumulate small numerical deviations across extended lattices, leading to misaligned atoms and, because its PDB files lack CONECT records, incomplete bond recognition in downstream visualization software. In contrast, HierGO [9] employs a modular tiling algorithm that assembles large pristine and defective graphene regions from internally consistent sub-units, ensuring structural regularity even at tens of nanometers in size and enabling the incorporation of vacancies, holes, and topological defects with controlled spatial distributions. Although both GOPY and HierGO rely on geometry-based placement, we did not observe in HierGO the drift or placement inconsistencies that occurred with GOPY, even when generating large sheets from a single tile. HierGO’s implementation maintains consistent lattice geometry across extended domains and outputs structurally coherent PDB files and simulation-ready GROMACS files without the extensive post-processing required in our GOPY workflows.
While these tools lower the barrier to constructing GO models, they ultimately generate their final structures in the PDB format, which is not directly compatible with AMBER and therefore requires substantial post-processing to assign atom types, bonds, residues, and charge models in a force-field-consistent manner. Moreover, existing builders typically target specific force fields (e.g., OPLS or ReaxFF) or provide code without a curated, simulation-ready dataset covering a systematic range of oxidation states, leaving a gap for workflows that demand fully parameterized, immediately usable GO models.
This work presents a reproducible dataset and a complete computational protocol for generating chemically homogeneous GO structures compatible with the AMBER [10] molecular dynamics simulation package. This workflow integrates the HierGO [9] suite for oxidation pattern generation with BIOVIA Discovery Studio (DS) Visualizer [11] and Visual Molecular Dynamics (VMD) [12] for structure refinement, atom typing, and parameterization. The curated dataset comprises 15 GO systems spanning oxidation levels from 0% to 68%, validated through carbon-to-oxygen (C:O) ratio analysis and atom-type counting scripts. The resulting Tripos MOL2 files include General AMBER Force Field (GAFF)-compatible atom types and are ready for simulation setup. The rationale for using GAFF is described in the document titled Building AMBER Compatible Graphene Oxide Tutorial provided with the dataset [13]. The provided Python and Tool Command Language (Tcl) scripts automate key stages such as oxidation generation, topological correction, residue assignment, and validation. Tcl was used for the VMD-based steps because it is VMD’s native scripting interface and provides direct access to atom selections, bonding/connectivity, and file conversion operations in the same environment used for visualization, enabling reliable, repeatable edits on large sheets. All files, scripts, and a complete step-by-step tutorial were made available
2. Data Description
The dataset comprises fifteen GO structures at specified oxidation targets of 0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, and 68%. Each structure is provided as an AMBER-ready Tripos MOL2 file derived from a single, non-periodic graphene sheet approximately 20 × 20 nm² in size. The sheet was generated by repeating the graphene unit cell to an integer number of rings so that no bonds cross a periodic boundary, as AMBER does not support bonds across periodic boundaries. Oxidation was applied directly to this pristine sheet rather than by stitching smaller tiles, ensuring structural continuity and avoiding junction artifacts.
Each oxidation level is represented by a single file named x_GOyy_final.mol2 (where x indicates the file order and yy indicates the specified O%). The complete protocol is detailed in Figure 1 and in the tutorial document provided with the dataset [13].
Figure 1 shows the detailed pipeline from pristine graphene tile creation to oxidation, error handling, and post-processing (cleaning, typing, and composition checks). The decision flow shows (i) tile creation (create-tile.py), (ii) oxidation (decorate-tile.py), (iii) fallback to decorate-tile_modified.py if persistent placement errors occur, and (iv) post-processing in DS Visualizer and VMD with custom Tcl/Python scripts to finalize MOL2 and frcmod files. We generated ∼20 × 20 nm² GRA tiles using create-tile.py, then applied target oxygen coverages (O/C) using decorate-tile.py at a “fresh” hydroxyl:epoxy ratio of 2:1. When epoxy placement failures persisted at higher O% targets, we invoked decorate-tile_modified.py (same inputs) to complete functionalization. Structures were opened in DS Visualizer to remove loose/dangling atoms, then converted to MOL2 in VMD. We assigned atom types/charges and ran structural sanity checks for artifact removal with 1_GO_atom_info_assignment.tcl, validated counts quantitatively with 2_atom_type_counter.tcl and 3_C:O_ratio.tcl, rendered snapshots with 4_snapshot.tcl, and rebuilt the final substructures with 5_mol2_substruct_numpy.py to obtain GRA/GO as an AMBER-compatible MOL2 file.
This final MOL2 file contains the complete atom, bond, and substructure records needed for direct import into topology builders. Intermediate files are also provided for transparency and carry a numeric prefix (x) corresponding to the processing stage (for example, a cleaned pre-typing PDB, an unedited MOL2 after oxidation, or an edited MOL2 before substructure rebuild). The x_GOyy_final.mol2 files follow the Tripos specification with GAFF-compatible atom types and a rebuilt @<TRIPOS>SUBSTRUCTURE section. Atom records include atom name, Cartesian coordinates, atom type, formal charge, and residue assignment; bond records define connectivity and bond orders. The SUBSTRUCTURE block lists residue names and sequential residue IDs that encode the chemical semantics for analysis. Residues are categorized as basal graphene carbons (GRA), hydroxyl groups (OH), and epoxy bridges (EPX). The atom typing scheme employs a minimal GAFF mapping where aromatic sp² carbons are labeled ca; oxidized sp³ carbons are c3; hydroxyl oxygens are oh; epoxy oxygens are os; and hydroxyl hydrogens are ho. This limited vocabulary simplifies comparison between models and maintains consistency across oxidation levels. The final MOL2 files preserve these labels for the efficient selection of atom or residue types.
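As an illustration of how the atom-type and residue labels described above can be read back from a MOL2 file, the following minimal Python sketch parses the @<TRIPOS>ATOM section and tallies both. It is not one of the dataset's scripts; the function names, the five-atom sample, its coordinates, its zero charges, and its residue membership are hypothetical placeholders chosen only to exercise the parser.

```python
from collections import Counter

# Hypothetical five-atom fragment (illustrative values only, not dataset content).
SAMPLE_MOL2 = """\
@<TRIPOS>MOLECULE
GO_fragment
5 4 3
SMALL
USER_CHARGES
@<TRIPOS>ATOM
1 C1 0.000 0.000 0.000 ca 1 GRA 0.0000
2 C2 1.420 0.000 0.000 c3 2 OH 0.0000
3 O1 1.420 0.000 1.430 oh 2 OH 0.0000
4 H1 2.300 0.000 1.800 ho 2 OH 0.0000
5 C3 2.130 1.230 0.000 ca 3 GRA 0.0000
@<TRIPOS>BOND
1 1 2 1
2 2 3 1
3 3 4 1
4 2 5 1
"""


def parse_atoms(mol2_text):
    """Return one dict per record in the @<TRIPOS>ATOM section.

    Tripos ATOM columns: id, name, x, y, z, type, then optionally
    substructure id, substructure name, and charge.
    """
    atoms = []
    in_atom_section = False
    for line in mol2_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("@<TRIPOS>"):
            in_atom_section = stripped == "@<TRIPOS>ATOM"
            continue
        if not in_atom_section or not stripped:
            continue
        f = stripped.split()
        atoms.append({
            "id": int(f[0]),
            "name": f[1],
            "xyz": tuple(float(v) for v in f[2:5]),
            "type": f[5],
            "resid": int(f[6]) if len(f) > 6 else None,
            "resname": f[7] if len(f) > 7 else None,
            "charge": float(f[8]) if len(f) > 8 else None,
        })
    return atoms


def count_by(atoms, key):
    """Tally atoms by an arbitrary field, e.g. 'type' or 'resname'."""
    return Counter(atom[key] for atom in atoms)


atoms = parse_atoms(SAMPLE_MOL2)
type_counts = count_by(atoms, "type")
residue_counts = count_by(atoms, "resname")
```

Selecting by `type` or `resname` in this way mirrors the kind of selection that the limited ca/c3/oh/os/ho vocabulary is meant to make easy; for the released files the same tallies are produced by the provided Tcl scripts.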
Table 1 shows details of the oxidation rates and the corresponding C:O (to have comparisons with experimental elemental analysis studies) and modified-carbon ratios, as well as the total number of atoms per model as it is relevant information for molecular dynamics simulations. The specified oxidation percentage (Overall Modification % in Table 1) is calculated as $100 \times N_{\mathrm{O}}/N_{\mathrm{C}}$, where $N_{\mathrm{O}}$ and $N_{\mathrm{C}}$ are the number of oxygen and carbon atoms in the final structure, respectively. The C:O ratio is $N_{\mathrm{C}}/N_{\mathrm{O}}$ and is expressed as X:1; numerically, this equals 100/O% under the same definition. The “Modified carbon %” is given by $100 \times N_{\mathrm{c3}}/(N_{\mathrm{ca}} + N_{\mathrm{c3}})$ and represents the proportion of carbons that rehybridize to sp³. For the “fresh” 2:1 hydroxyl:epoxy placement, a first-order approximation is Modified carbon % ≈ (4/3) × O%, since each epoxy oxygen converts two carbons while each hydroxyl converts one. At higher oxidation, achievable epoxy placements are limited, leading to an expected hydroxyl bias. The atom-type count report lists $N_{\mathrm{ca}}$, $N_{\mathrm{c3}}$, $N_{\mathrm{oh}}$, $N_{\mathrm{os}}$, and $N_{\mathrm{ho}}$. From these, the composition script computes $N_{\mathrm{C}} = N_{\mathrm{ca}} + N_{\mathrm{c3}}$ and $N_{\mathrm{O}} = N_{\mathrm{oh}} + N_{\mathrm{os}}$, then outputs $N_{\mathrm{C}}/N_{\mathrm{O}}$ as the C:O ratio and $100 \times N_{\mathrm{O}}/N_{\mathrm{C}}$ as the “Overall Modification %”. The modified-carbon percentage is derived as $100 \times N_{\mathrm{c3}}/(N_{\mathrm{ca}} + N_{\mathrm{c3}})$. For example, the GO20 model shows C:O ≈ 5.0:1 and Modified carbon ≈ 26.7%. Minor deviations at high oxidation (GO60 and above) are expected and explicitly reported.
Figure 2 highlights the distribution of GRA, OH, and EPX regions, revealing local clustering relevant to surface wetting and adsorption analyses.
3. Methods
The workflow proceeds in five stages that collectively generate AMBER-compatible GO structures at specified oxidation levels while maintaining a single, coherent sheet topology and consistent atom typing. The first stage constructs a non-periodic graphene tile with lateral dimensions near 20 × 20 nm², chosen to balance realistic interfacial length scales with tractable atom counts for routine simulations. The tile is saved as a PDB for portability and to enable downstream cleaning operations that remove hanging atoms at edges and ensure that the resulting structure reloads as a single fragment.
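To make the first stage concrete, the sketch below builds a small non-periodic graphene patch by repeating a rectangular four-atom cell and then iteratively drops atoms with fewer than two bonded neighbors, which is one simple way to remove hanging edge atoms. It is an illustration under stated assumptions and not the create-tile.py implementation: the function names, the 1.42 Å C–C bond length, and the brute-force neighbor search are choices made here for clarity.

```python
import math

CC_BOND = 1.42  # assumed C-C bond length in angstroms


def cells_for_size(target_x, target_y, bond=CC_BOND):
    """Number of rectangular cells needed to approximate a target size (angstroms)."""
    nx = max(1, round(target_x / (3.0 * bond)))
    ny = max(1, round(target_y / (math.sqrt(3.0) * bond)))
    return nx, ny


def build_sheet(nx, ny, bond=CC_BOND):
    """Replicate a 4-atom rectangular graphene cell nx by ny times (no periodicity)."""
    lx = 3.0 * bond              # cell length along x
    ly = math.sqrt(3.0) * bond   # cell length along y
    basis = [(0.0, 0.0), (bond, 0.0), (1.5 * bond, 0.5 * ly), (2.5 * bond, 0.5 * ly)]
    return [
        (ix * lx + bx, iy * ly + by, 0.0)
        for ix in range(nx)
        for iy in range(ny)
        for bx, by in basis
    ]


def neighbor_counts(coords, bond=CC_BOND, tol=0.05):
    """Count bonded neighbors per atom by distance (O(n^2); fine for small demos)."""
    counts = [0] * len(coords)
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if abs(math.dist(coords[i], coords[j]) - bond) < tol:
                counts[i] += 1
                counts[j] += 1
    return counts


def strip_dangling(coords, bond=CC_BOND, min_neighbors=2):
    """Repeatedly remove atoms with fewer than `min_neighbors` bonded neighbors."""
    kept = list(coords)
    while True:
        counts = neighbor_counts(kept, bond)
        pruned = [c for c, n in zip(kept, counts) if n >= min_neighbors]
        if len(pruned) == len(kept):
            return kept
        kept = pruned


raw_patch = build_sheet(4, 4)
clean_patch = strip_dangling(raw_patch)
cells_20nm = cells_for_size(200.0, 200.0)  # cell counts for a ~20 x 20 nm sheet
```

A sheet of the size used in the dataset contains on the order of ten thousand atoms or more, so a production version would replace the quadratic neighbor search with a cell list or k-d tree.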
The second stage applies oxygen functionalization to reach the specified coverage. Hydroxyl and epoxy groups are placed with an intended “fresh” ratio of approximately two hydroxyls per epoxy, which promotes a realistic mixture of single-carbon and bridging modifications across the basal plane. At moderate oxidation levels, the standard decoration routine achieves the requested placements with high fidelity. As the target coverage increases toward 60% and beyond, epoxy placement attempts can fail because suitable neighboring pairs of unmodified carbons become rare on a finite lattice. To address this, failed epoxy placement attempts are skipped during the decoration step and hydroxyl placement is prioritized, so that the sheet approaches the target coverage instead of remaining under-functionalized. This robustness comes at the cost of a large hydroxyl bias at the highest coverages, which we make explicit during validation.
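The placement logic described in this stage can be sketched on an abstract bonded graph. The code below is a simplified stand-in and not the decorate-tile.py or decorate-tile_modified.py algorithm: the function name, the fixed hydroxyl, hydroxyl, epoxy placement order, and the uniform random site selection are assumptions made for illustration, and geometric details such as which side of the sheet a group occupies are ignored.

```python
import random


def decorate(adjacency, o_fraction, seed=0):
    """Place hydroxyl and epoxy oxygens on a carbon graph at roughly 2:1.

    adjacency: dict mapping carbon id -> list of bonded carbon ids.
    o_fraction: target N_O / N_C (e.g. 0.20 for 20%).
    Returns (hydroxyl_sites, epoxy_pairs). Each carbon is modified at most once.
    """
    rng = random.Random(seed)
    n_target = round(o_fraction * len(adjacency))
    free = set(adjacency)          # carbons not yet functionalized
    hydroxyls, epoxies = [], []
    placed = 0
    while placed < n_target and free:
        wants_epoxy = placed % 3 == 2   # every third oxygen is an epoxide
        if wants_epoxy:
            pairs = [
                (i, j)
                for i in sorted(free)
                for j in adjacency[i]
                if j in free and i < j
            ]
            if pairs:
                i, j = rng.choice(pairs)
                free -= {i, j}
                epoxies.append((i, j))
                placed += 1
                continue
            # No free bonded pair is left: fall through and place a hydroxyl.
        site = rng.choice(sorted(free))
        free.remove(site)
        hydroxyls.append(site)
        placed += 1
    return hydroxyls, epoxies


# Toy 12-carbon ring standing in for a lattice (hypothetical test graph).
ring = {i: [(i - 1) % 12, (i + 1) % 12] for i in range(12)}
oh_sites, epoxy_pairs = decorate(ring, 0.25)
oh_full, epoxy_full = decorate(ring, 1.0)   # unreachable target on this graph
```

The fall-through branch is the point of the sketch: when no pair of adjacent unmodified carbons remains, the oxygen is placed as a hydroxyl instead, which is what produces the hydroxyl bias at high coverage.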
The third stage encompasses cleaning and export. The decorated PDB is inspected to remove dangling fragments and to correct any spurious edge hydrogens or atypical valences introduced during aggressive high-coverage placement. The cleaned sheet is then exported to a MOL2 file so that atom typing and residue assignment can be applied consistently in a single file format widely supported in the AMBER ecosystem. Care is taken to preserve connectivity, so that the entire sheet remains one molecular fragment; this is important for later parameterization and to avoid unintended fragmentation at the topology-building step.
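Whether a cleaned sheet is still one molecular fragment can be verified from the bond list alone. The following sketch counts connected components with a breadth-first search; it is an illustrative check and not part of the released scripts, and the function name and 1-based atom numbering (matching MOL2 atom IDs) are assumptions.

```python
from collections import deque


def count_fragments(n_atoms, bonds):
    """Count connected components given atom IDs 1..n_atoms and bonded ID pairs."""
    adjacency = {i: [] for i in range(1, n_atoms + 1)}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)

    seen = set()
    fragments = 0
    for start in adjacency:
        if start in seen:
            continue
        fragments += 1            # found an atom not reached so far
        seen.add(start)
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbor in adjacency[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
    return fragments


single = count_fragments(4, [(1, 2), (2, 3), (3, 4)])   # one chain
split = count_fragments(4, [(1, 2), (3, 4)])            # two pieces
with_loose_atom = count_fragments(3, [(1, 2)])          # atom 3 is unbonded
```

A result greater than one indicates loose atoms or detached pieces that would need to be removed or reconnected before typing.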
The fourth stage assigns residues and GAFF-compatible atom types. Basal carbons retain an aromatic sp² type, oxidized carbons are converted to sp³ as appropriate, oxygen atoms are labeled according to their role in hydroxyl or epoxy groups, and hydroxyl hydrogens are explicitly typed. At this step, partial charges may be assigned according to a consistent scheme suitable for large sheets. The assignment script also performs sanity checks, such as detecting the rare but possible case in which a carbon has been targeted both by hydroxyl and epoxy routines. The output of this stage is a MOL2 file in which residue names (e.g., GRA, OH, and EPX) and GAFF atom types (ca, c3, oh, os, and ho) encode the chemical semantics needed by the force field.
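The typing rules and the double-targeting check can be expressed directly in terms of connectivity. The sketch below is a hypothetical Python rendering of that logic, not the 1_GO_atom_info_assignment.tcl script: the function name, input layout, and warning messages are assumptions, and charges and residue names are left out.

```python
def assign_gaff_types(elements, bonds):
    """Assign ca/c3/oh/os/ho from connectivity and flag suspicious sites.

    elements: dict atom id -> element symbol ('C', 'O', or 'H').
    bonds: iterable of bonded atom id pairs.
    Returns (types, warnings); atoms that fit no rule get type None.
    """
    neighbors = {i: [] for i in elements}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    types, warnings = {}, []
    for atom_id, element in elements.items():
        bonded = sorted(elements[j] for j in neighbors[atom_id])
        if element == "C":
            n_oxygen = bonded.count("O")
            types[atom_id] = "c3" if n_oxygen else "ca"
            if n_oxygen > 1:
                warnings.append(f"carbon {atom_id} is bonded to {n_oxygen} oxygens")
        elif element == "O" and bonded == ["C", "H"]:
            types[atom_id] = "oh"      # hydroxyl oxygen
        elif element == "O" and bonded == ["C", "C"]:
            types[atom_id] = "os"      # epoxy (bridging) oxygen
        elif element == "H" and bonded == ["O"]:
            types[atom_id] = "ho"      # hydroxyl hydrogen
        else:
            types[atom_id] = None
            warnings.append(f"atom {atom_id} ({element}) has unexpected bonding {bonded}")
    return types, warnings


# Hydroxyl on C2; epoxide bridging C1-C2; and an over-functionalized C1.
hydroxyl = assign_gaff_types({1: "C", 2: "C", 3: "O", 4: "H"}, [(1, 2), (2, 3), (3, 4)])
epoxide = assign_gaff_types({1: "C", 2: "C", 3: "O"}, [(1, 2), (1, 3), (2, 3)])
clash = assign_gaff_types(
    {1: "C", 2: "C", 3: "O", 4: "O", 5: "H"},
    [(1, 2), (1, 3), (2, 3), (1, 4), (4, 5)],
)
```

In the third example the carbon carrying both an epoxide and a hydroxyl is still typed c3 but is reported in the warnings list, so the site can be repaired before the file is finalized.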
The fifth stage validates and finalizes the structures. Automated counting scripts tabulate atom-type populations, from which the oxidation percentage and C:O ratio are computed. A separate report computes the modified-carbon percentage directly from the c3 and ca counts. Snapshot scripts render color-coded images that provide rapid visual assessment of functional-group distribution and help identify any nonuniformities or artifacts that merit a return to cleaning. Finally, a substructure-rebuild utility regenerates the MOL2 SUBSTRUCTURE section to ensure that residues are numbered sequentially and that the file is well-formed for use in topology builders. The final deliverables are saved as x_GOyy_final.mol2, and the corresponding text reports and images are archived with the same base name for traceability.
4. Experimental and Computational Validation
GO models in this dataset use a hydroxyl/epoxide-focused parameterization that has been applied and evaluated in prior peer-reviewed studies. In particular, the partial-charge model for epoxide and hydroxyl functional groups follows Stauffer et al. [14], where charges were derived from ab initio electrostatic potential fitting for epoxide and hydroxyl-based motifs. The same class of hydroxyl/epoxide GO surfaces and parameterization has also been used for studies in our group, including Kim et al. [15], where MD-predicted adsorption structures of single-stranded DNA on GO were compared with AFM observations, and Xiong et al. [16], for which experiments and simulations were combined to evaluate cellulose nanofiber adsorption across different GO oxidation levels using AFM-based comparisons. Together, these published studies provide external support for using hydroxyl/epoxide GO models in interfacial simulations relevant to adsorption and surface interactions.
We are currently conducting additional simulations using the large-sheet structures released here to study aqueous surface topology, water–droplet interfacial behavior, and biomolecule immobilization on GO. These results, including direct comparison to experimental observables, where available, will be reported separately.
5. Limitations and Reasonings
The present workflow and dataset represent graphene oxide functionalization using only pristine basal-plane hydroxyl and epoxide groups. Other oxygen-containing functionalities that may occur in experimental GO, particularly those enriched at edges and defect sites (e.g., carbonyl-, aldehyde-, and carboxyl-containing groups), are not included in the current models. This scope was selected to maintain a parameter-complete and reproducible AMBER-compatible workflow for large GO sheets across a broad oxidation range using a minimal GAFF-compatible atom-typing scheme supported by an existing validated parameterization. Incorporating chemically explicit edge/defect functionalities requires expanding the number of atom types and validating a larger set of bonded parameters (bonds/angles/dihedrals/impropers), both of which lead to a more complex, labor-intensive parameterization process. Accordingly, the dataset is most directly applicable to studies emphasizing basal-plane interfacial behavior (e.g., wetting, adsorption, and surface interactions away from boundaries), where edge contributions can be minimized by sheet size and by selecting regions of interest away from the sheet perimeter. Users requiring explicit edge/defect chemistry are encouraged to parameterize smaller representative fragments with higher specificity typing and bonded terms and then incorporate these validated parameters into their own extended GO models.
Data quality control centers on predictable failure modes and their remedies. At the highest oxidation levels, achievable epoxy placement can be limited, and a bias toward hydroxyl groups is expected. This effect is recorded rather than hidden; each final model comes with its achieved composition so that simulation studies can reference the actual hydroxyl-to-epoxy balance. Additional failure modes include lingering edge artifacts and occasional mis-typed carbons if cleaning is skipped. The curation protocol requires repeating the cleaning and assignment stages until validation reports contain no warnings. Reproducibility is supported by consistent file naming, versioned scripts, and inclusion of representative logs showing command lines and console outputs for low, mid, and high-oxidation cases.
A further limitation arises when these GO surfaces are used alongside biomolecules. Default atom and residue names can cause GO sheets to be misidentified as protein in VMD or AMBER/LEaP, leading to incorrect selections, residue merging, or connect0/connect1 errors. To avoid this, the entire sheet should be assigned a single residue (e.g., GRO) with a unified residue ID, and atom names should be adjusted to avoid overlap with common biomolecular atom names. Although the default models are adequate for surface–water simulations, users integrating GO with biomolecular systems should apply this renaming procedure by reassigning atom and residue names in the final MOL2 file.
6. User Notes
The structures released here are intended as simulation-ready inputs: large, GAFF-compatible GO sheets with assigned atom types and accompanying scripts for reproducible model construction and verification. Validation therefore focuses on two complementary aspects: (i) structure-level correctness and reproducibility of the generated models, and (ii) provenance and prior benchmarking of the hydroxyl/epoxide parameterization used to make the models immediately usable in AMBER workflows. The dataset does not include MD trajectories or simulation outputs.
Each generated surface is accompanied by automated checks that quantify achieved oxidation and confirm chemical consistency. These include carbon-to-oxygen (C:O) ratio and oxidation-level reporting, atom-type counting to confirm the intended GAFF typing distribution, and integrity checks to flag rare decoration artifacts (e.g., chemically invalid over-functionalization at a site observed at 45% GO) that can occur in stochastic surface generation and are difficult to detect by visual inspection on large sheets. When a defect is detected, the workflow documents a correction loop (artifact removal and re-check) to ensure that the final released structures are chemically consistent and traceable through intermediate files.
Further notes on the usage of the dataset will help other researchers quickly begin working with the models. Users are encouraged to consult the accompanying tutorial document, which provides a step-by-step workflow with checklists, figures, and detailed explanations that make the procedures easier to digest and reproduce.