1. Introduction
The accurate ranking of small-molecule binding affinities to protein targets sits at the centre of structure-based drug discovery [1, 2, 3]. Across virtual libraries that now routinely exceed billions of compounds, the difference between a tractable lead-optimisation campaign and a wasted experimental cycle often reduces to the question of whether a scoring function can correctly order candidate molecules by their experimental dissociation or inhibition constants. The classical empirical scoring functions that dominated the field for two decades, exemplified by AutoDock Vina [4] and X-Score [5], combine weighted physicochemical terms in an additive functional form and plateau at Pearson correlations of about 0.60 on the CASF-2016 scoring power benchmark of Su et al. [6]. Their accuracy is bounded by the expressiveness of the analytic form rather than by the data, and the ceiling has resisted incremental refinement. A complementary and more physically grounded family of methods estimates binding free energies from molecular dynamics endpoints rather than from a single docked pose, most prominently the molecular mechanics Poisson–Boltzmann and generalised Born surface area approaches, which provide an explicit account of solvation and configurational averaging at a markedly higher computational cost [7].
Over the past five years, atom-level graph neural networks have displaced these classical scoring functions as the de facto state of the art. By representing the protein–ligand complex as a graph whose nodes are heavy atoms and whose edges are distance-based or contact-based, models such as InteractionGraphNet [8], the geometric interaction network GIGN [9], the interaction-based inductive bias framework EHIGN [10], the multi-objective network PLANET [11], and the physics-informed PIGNet2 [12] have raised Pearson correlations to approximately 0.84 on CASF-2016 and to roughly 0.50 on the leakage-protected LP-PDBBind partitions reported by Li et al. [13]. The field is consolidating around the position that geometric inductive biases, when supplied through three-dimensional message passing, deliver substantial gains over sequence and ligand-only representations [14, 15, 16, 17].
Despite this consolidation, two tensions continue to shape methodological work in protein–ligand affinity scoring. The first is the problem of dataset leakage, which the field has only recently learned to take seriously. The audit by Volkov et al. [18] showed that random partitions of PDBBind overestimate held-out performance by approximately 0.20 in Pearson correlation relative to target-disjoint splits, an inflation attributable to ligand similarity and protein-family memorisation rather than to learned chemistry. Bernett et al. [19] later formalised this into a set of guiding questions that any biological machine learning benchmark must answer before its reported performance can be trusted. In response the community has converged on progressively stricter evaluation protocols, including target clustering at 30% sequence identity, ligand-similarity cutoffs, the LP-PDBBind splits of Li and colleagues, and most recently the optimisation-based DataSAIL splits of Joeres et al. [20], in which a binary linear program directly minimises the residual information leakage between training and test partitions. The second tension concerns the chemical interpretability of the features that drive prediction. Distance-based message passing on atomic coordinates is sufficiently expressive that it is often unclear whether additional, hand-crafted geometric descriptors of the binding pocket carry information that is genuinely complementary to what three-dimensional atomic graphs already encode, and unclear where in the pipeline such descriptors contribute most.
Discrete differential geometry, as developed in the applied mesh processing literature [21, 22], provides a principled vocabulary for describing the binding pocket as a triangulated two-manifold and extracting rotation-invariant, translation-invariant and approximately isometry-invariant descriptors of its shape. Mean and Gaussian curvatures characterise local convexity and topological deviation, the principal curvatures and the derived shape index resolve saddle, ridge and bowl geometries, the leading eigenvalues of the discrete Laplace–Beltrami operator constitute the so-called shape DNA of the surface and encode its global spectrum in the sense of Reuter et al. [23], and the heat-kernel signature of Sun et al. [24] provides a multi-scale descriptor by sampling the diagonal of the heat kernel at several diffusion times. Short diffusion times reveal local curvature, while longer times report on cavity dimension and global enclosure. These descriptors have been profitably used in protein-protein interaction studies through MaSIF and its differentiable extension dMaSIF [25, 26], and surface representations have very recently been combined with multimodal encoders to predict binding affinities [27]. Curvature-aware graph neural networks such as CurvAGN [28] have shown that pocket geometry can be exploited inside three-dimensional message-passing architectures, and the broader programme of topological data analysis has produced a family of persistent homology, persistent Laplacian and Mayer homology descriptors for binding affinity that reach state-of-the-art accuracy on standard benchmarks [29, 30]. What has not been quantified, to our knowledge, is the direct utility of a fixed-size, interpretable discrete differential geometry descriptor of the ligand-aware pocket surface for protein–ligand affinity scoring under the strictest leakage-protected protocols that the community now demands.
The present study addresses this gap. Our first concern is to establish whether a compact discrete differential geometry descriptor of the ligand-aware solvent-excluded pocket surface carries a leakage-robust affinity signal that survives increasingly strict evaluation regimes, up to and including the strictest leak-proof partitions currently available. Our second concern is to localise, through ablation and TreeSHAP attribution, the components of the descriptor that drive that signal, so that the matter-of-fact statement that pocket geometry helps can be refined into a chemically interpretable claim about which geometric modes carry information. Our third concern is to delineate the operational scope of the descriptor by comparing it to representative atom-level graph neural network treatments of the same geometric content. To this end we compute a 59-dimensional descriptor on every PDBBind v2020 refined-set complex with publicly retrievable structures and a parseable mesh ($N = 3285$), benchmark it across four internal split regimes of increasing stringency, including a k-mer Jaccard cluster-disjoint protocol at three thresholds, and evaluate it against two external held-out benchmarks, namely the CASF-2016 core set and the leak-proof LP-PDBBind DataSAIL S2 partition. We report bootstrap confidence intervals over 500 resamples, a 5000-iteration paired permutation test, and a five-seed cluster-split ensemble. As a complementary architectural analysis, we further inject the same descriptor into a SchNet-style distance-based graph neural network [31] through four mechanistically distinct strategies, namely global concatenation at the readout stage, per-atom projection at the embedding stage, geometry-conditioned cross-attention, and curvature and heat-kernel-conditioned edge features in the spirit of CurvAGN.
Our principal finding is that pocket-surface discrete differential geometry behaves as a leakage-robust feature class. In gradient-boosted tree pipelines without direct access to atomic coordinates, the descriptor produces consistent and statistically significant gains that grow with split stringency, reaching 0.111 in Pearson correlation on cluster-disjoint testing, 0.258 on the leak-proof LP-PDBBind DataSAIL S2 partition and 0.365 on CASF-2016, while the descriptor used in isolation matches classical scoring-function performance on both external benchmarks. Component ablation and SHAP attribution identify the heat-kernel signature, particularly at short and mid-range diffusion times, as the dominant carrier of the signal, consistent with the physical interpretation of those time scales as probes of local curvature, cavity dimension and the rigidity modes that govern desolvation and binding entropy. The complementary architectural analysis demonstrates that, within a small SchNet-style distance-based backbone, none of the four injection strategies exceeds a Pearson lift of 0.02 on cluster-disjoint testing, with a mean lift of approximately 0.004. Read together with the strong feature-based effect, this absence of lift is best understood not as evidence that the geometric content is uninformative but as evidence that distance-based message passing on atomic coordinates already captures much of it, leaving an architectural frontier for networks that treat the pocket surface as a first-class object rather than as a collection of atom-anchored projections.
The practical implication is that pocket-surface discrete differential geometry is well suited to the lightweight, interpretable and computationally efficient pipelines used in early-stage virtual screening and in resource-limited settings, where it can recover classical scoring-function performance from a 59-dimensional descriptor while remaining inspectable by a medicinal chemist. At the same time, our results motivate a research direction in which ligand atoms attend directly to the spectral and curvature information of the pocket surface through hybrid mesh-to-atom architectures, extending the surface-based geometric deep learning programme that has so far been most successful in protein-protein interaction prediction to the protein–ligand affinity setting.
3. Discussion
Our analyses position pocket-surface discrete differential geometry as a leakage-robust feature class for protein–ligand binding affinity prediction with a clearly delineated operational scope. In gradient-boosted tree pipelines that operate on fixed-size feature vectors and lack direct access to three-dimensional atomic coordinates, the 59-dimensional descriptor delivers consistent and statistically significant gains across progressively stricter evaluation regimes. The Pearson lift grows monotonically with split stringency, from 0.027 on a random partition to 0.111 on a k-mer Jaccard cluster-disjoint partition, to 0.258 on the leak-proof LP-PDBBind DataSAIL S2 partition and to 0.365 on the CASF-2016 core set, in a profile that confirms the geometric content captures genuine binding determinants rather than memorisation artefacts. The descriptor used in isolation reaches Pearson correlations between 0.456 and 0.594 on external benchmarks, on a par with the long-standing classical scoring functions X-Score and AutoDock Vina. The ablation, the permutation test, the five-seed ensemble, the surface area partial-out control and the TreeSHAP analysis collectively identify the heat-kernel signature, particularly at short and mid-range diffusion times, as the dominant carrier of the signal. These spectral features encode multi-scale information about the pocket, including local curvature, cavity dimensions relevant to desolvation, and the global enclosure and rigidity modes that influence the entropic component of binding.
The complementary architectural analysis sharpens rather than contradicts this picture. The same descriptor injected into a SchNet-style distance-based graph neural network through four mechanistically distinct strategies, namely global concatenation at the readout, per-atom projection at the embedding stage, DDG-conditional cross-attention biased by per-atom geometric similarity, and curvature-conditioned edge features that augment distance-based edge attributes with differences of mean and principal curvature and with heat-kernel cosine distance, produces Pearson lifts that never exceed the 0.02 marginality threshold on cluster-disjoint testing and have a mean of approximately 0.004. The agreement of all four mechanisms, across node-level, attention-level and edge-level injection points, is consistent with the interpretation that distance-based message passing on atomic coordinates within this small backbone already encodes a substantial part of the geometric information that the discrete differential geometry block makes explicit. We emphasise, however, that redundancy of geometric information is not the only possible explanation for the absence of a lift. The result may equally reflect the limited capacity and short training schedule of the deliberately compact backbone, the specific fusion mechanisms we tested rather than the full space of possible ones, or limited statistical power, given that multi-seed confidence intervals on the graph neural network experiments were not collected. The most defensible reading is therefore the narrower one, namely that under the present SchNet-style backbone and the four tested integration strategies, no statistically significant improvement was observed, and that the residual signal does not justify the added architectural complexity in this specific setting.
We emphasise that our SchNet baseline at a Pearson correlation of approximately 0.68 on the random split is small and short-trained relative to published state-of-the-art three-dimensional graph neural networks at approximately 0.84 on CASF-2016, so this conclusion is best stated as architecture dependence within the SchNet family rather than as a universal claim about all three-dimensional message-passing networks. The result is most useful read as a pipeline-design finding rather than as a competitive benchmark on large 3D networks, and it identifies a concrete research direction in the form of hybrid mesh-to-atom attention that we discuss below.
These findings carry direct practical implications for computational drug design. For resource-constrained or early-stage virtual screening pipelines that rely on sequence or ligand-fingerprint features alone, or where full three-dimensional docking and equivariant message passing are prohibitively expensive at the required throughput, the discrete differential geometry block offers a lightweight, interpretable and computationally efficient complement that recovers classical scoring-function performance from a single 59-dimensional surface descriptor. The descriptor adds an information channel that is invariant under sequence distribution shift, as documented by the CASF-2016 lift, and it does so at a cost dominated by mesh construction and Laplace–Beltrami eigendecomposition. To quantify this claim, the full discrete differential geometry batch over the 3285 complexes completed in approximately 36 min on a single NVIDIA T4 graphics processing unit, an average of roughly 0.65 s per complex, including implicit-surface sampling, marching-cubes meshing, Taubin smoothing, quadric-error decimation and the spectral computation, after which each feature-based fit, evaluate and bootstrap cycle required at most five minutes. This descriptor-generation cost is incurred once per complex and is modest relative to the training of large three-dimensional graph neural networks, which typically require many graphics-processing-unit hours; the lightweight characterisation therefore refers to the end-to-end cost of computing and using the descriptor rather than to its dimensionality alone. The interpretability is a non-trivial advantage in medicinal chemistry contexts where post hoc rationalisation of model predictions is required for internal review or regulatory submission. 
The TreeSHAP attributions identify the short-time-scale heat-kernel signature and the low-order Laplace–Beltrami eigenvalues as the dominant features, and both correspond to chemically meaningful properties of the binding pocket, namely local curvature and global enclosure, that medicinal chemists can inspect in the context of structural hypotheses about hot-spots, druggability and selectivity.
For modern three-dimensional graph neural network pipelines, our results suggest that the explicit injection of pocket-surface discrete differential geometry is largely redundant under current distance-based message-passing paradigms, at least within the SchNet family we tested. The natural next direction is the development of hybrid mesh-to-atom architectures, in which ligand atoms attend directly to the full pocket surface rather than to per-atom projections of mesh quantities, weighted by curvature or spectral signatures. Such an architecture would allow the model to dynamically select the geometric modes most relevant to a given ligand-pocket pair, would treat the surface as a first-class object amenable to multi-scale spectral analysis, and would, in principle, preserve the rotational and translational invariance that motivates the use of discrete differential geometry in the first place. The architectural design space this opens up is broad, including, for example, surface attention layers conditioned on heat-kernel signature similarity at multiple diffusion times, or equivariant transformers that operate jointly on atomic point clouds and on mesh patches sampled around the binding cleft.
We acknowledge several limitations of the present study. Our experiments were conducted on PDBBind v2020 with partial mirror coverage, namely 3285 of 3558 refined-set complexes, 253 of 285 CASF-2016 complexes and 626 of 709 LP-PDBBind DataSAIL S2 complexes, with the missing complexes itemised in Appendix A. The complementary architectural analysis used a deliberately compact SchNet-style backbone chosen to enable a controlled feature-injection ablation under a constrained compute budget, and our SchNet baseline at approximately 0.68 on the random split lies below published large-scale models at approximately 0.84 on CASF-2016, so the architectural conclusion is restricted to the SchNet family. Multi-seed confidence intervals on the graph neural network results were not collected due to compute budget, and the reported Pearson-lift values are best-epoch single-seed differences whose uncertainty is bounded only by the four-mechanism consistency we report. The architectural space we examined does not include equivariant networks such as
-NN or NequIP, large-scale distance-based networks at IGN or GIGN scale, or architectures that operate directly on the LA-SES mesh rather than projecting per-atom geometric quantities to atoms. Whether the discrete differential geometry block provides additive value in those architectures remains an open and high-priority question that we identify for future work.
A related limitation concerns the scope of the baseline comparison. Our external comparison in
Table 3 positions the descriptor against classical empirical scoring functions and against published atom-level three-dimensional graph neural networks, but it does not include a like-for-like re-evaluation of stronger modern architectures under the identical leakage-protected protocol, in particular, equivariant networks, large interaction-focused protein–ligand networks, surface-based geometric deep learning models such as MaSIF and dMaSIF, and the persistent-homology, persistent-Laplacian and Mayer-homology topological descriptors that reach state-of-the-art accuracy on standard benchmarks. The literature values we cite were obtained on the full benchmarks rather than on our exact partitions, so they bound the competitive gap only approximately and do not control for leakage in the same way as our internal protocol. Our results therefore establish that the descriptor is a competitive lightweight feature-engineering approach, but they do not establish whether it offers complementary value beyond these stronger three-dimensional and surface-based models. Settling this question would require retraining each such model under the DataSAIL S2 and cluster-disjoint protocols used here, which we regard as an important direction for future work.
The exclusion of the 273 complexes for which a structure could not be retrieved in a parseable form or for which mesh computation failed may introduce a selection effect. The large majority of these exclusions arises from the structure being absent from the public mirror, a cause that is unrelated to pocket geometry or to binding affinity and is therefore expected to act approximately at random, while a smaller fraction reflects rejection by RDKit at the strictest sanitisation level, which may correlate weakly with unusual ligand chemistry. The retained subset spans the same pK dynamic range and distribution as the full labelled set (
Figure 2), which argues against a strong affinity-dependent bias, but we cannot fully exclude a residual selection effect, and confirmation on a complete-coverage structure source would remove this caveat.
4. Materials and Methods
This section describes each stage of the analysis workflow of Figure 1 in detail, beginning with the construction of the ligand-aware pocket surface and the extraction of its geometric descriptors, continuing with the two parallel modelling pipelines we evaluated, and concluding with the leakage-protected splits and external benchmarks used for assessment.
4.1. Datasets and Affinity Labels
PDBBind v2020 protein–ligand structures and their associated binding affinities, expressed throughout this work as pK from $K_d$ or $K_i$ values, were obtained from the public Hugging Face mirror jglaser/pdbbind_complexes containing 16,079 complexes, joined with the BindingDB-derived affinity table jglaser/binding_affinity of 1,836,729 records on the composite key formed by the protein sequence and the canonical ligand SMILES. After filtering for sequence length between 20 and 2000, and uniqueness of PDB identifier, we retained 3558 labelled complexes. Three-dimensional protein structures and ligand SDF files were retrieved from the THU-ATOM/PDBbind mirror, of which 3285 complexes (92.3%) were available in parseable form; the remaining 273 were either absent from the mirror or rejected by RDKit at the strictest sanitisation level. Affinity labels for the CASF-2016 core set and the DataSAIL-curated LP-PDBBind splits were obtained from Zenodo records 8091220 and 17376012, respectively. Structure files for CASF-2016 and the LP-PDBBind DataSAIL S2 test partition were retrieved through the same mirror channel with 88.8% and 88.3% coverage, yielding $N = 253$ for CASF-2016 and $N = 626$ for LP-PDBBind DataSAIL S2. The complete filtration log from the raw mirrors to the final analysis set, and an itemised list of the complexes absent from the mirror, are reported in Table A1 of Appendix A.
4.2. Ligand-Aware Solvent-Excluded Surface
For every complex we constructed a ligand-aware solvent-excluded surface (LA-SES) using a probe of radius 1.4 Å rolled over protein heavy atoms within 10 Å of any ligand heavy atom. The implicit signed distance field was sampled on a grid of resolution 0.8 Å, or 1.12 Å for very large bounding boxes, and converted to a triangle mesh by marching cubes. The largest connected component was retained, smoothed with eight Taubin iterations (, ), and decimated by quadric-error-metric edge collapse to at most 12,000 faces. The resulting mesh sizes were uniform across the dataset, with and .
4.3. Discrete Differential Geometry Descriptors
On every LA-SES mesh we computed five families of per-vertex geometric quantities, summarised by their distributional moments to yield a fixed-size descriptor. The mean curvature was obtained via the cotangent Laplacian as
$$H = \tfrac{1}{2}\,\lVert \Delta \mathbf{x} \rVert,$$
where $\Delta$ is the discrete Laplace–Beltrami operator on the mesh and $\mathbf{x}$ denotes the embedding of a vertex in three-dimensional space. The Gaussian curvature $K$ was computed from the angle deficit divided by the Voronoi mixed area, and the principal curvatures $\kappa_1 \geq \kappa_2$ were extracted using libigl's principal_curvature routine. From the principal curvatures we derived the shape index
$$S = \frac{2}{\pi}\arctan\!\left(\frac{\kappa_1 + \kappa_2}{\kappa_1 - \kappa_2}\right),$$
which maps the local surface geometry to a scalar value in $[-1, 1]$ that distinguishes cup, rut, saddle, ridge and cap configurations. The heat-kernel signatures were evaluated at ten logarithmically spaced diffusion times $t$ as
$$\mathrm{HKS}(v, t) = \sum_{i} e^{-\lambda_i t}\,\phi_i(v)^2,$$
where $\lambda_i$ and $\phi_i$ are the $i$-th eigenvalue and eigenfunction of the discrete Laplace–Beltrami operator and $v$ is a mesh vertex. Each per-vertex distribution was summarised by six moments, namely the mean, standard deviation, the tenth and ninetieth percentiles, skewness and kurtosis. To these moment summaries we appended the leading sixteen Laplace–Beltrami eigenvalues and three mesh statistics, namely the number of vertices, the number of faces and the surface area. The concatenation produced a fixed 59-dimensional per-complex descriptor. The dataset-wide kernel density estimates of the per-vertex curvature features and of the heat-kernel signature spectrum across all 3285 pocket meshes are provided in Figure A4 and Figure A5 of Appendix B. Per-channel rank correlations between individual DDG features and pK on the full set, including LA-SES surface area, mean curvature, principal curvature and the second Laplace–Beltrami eigenvalue, are summarised in Figure A6 of the same appendix.
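The shape index and heat-kernel signature defined in this subsection can be illustrated with a minimal sketch in plain Python. It assumes that the Laplace–Beltrami eigenpairs have already been computed elsewhere; the function names, the diffusion-time range passed to `log_spaced`, the umbilic-point handling, the population-moment convention, the excess-kurtosis convention and the linear percentile interpolation are all illustrative assumptions rather than settings reported in this work.

```python
import math


def shape_index(k1, k2):
    """Shape index in [-1, 1] from two principal curvatures (ordering is enforced here)."""
    hi, lo = max(k1, k2), min(k1, k2)
    if hi == lo:
        # Umbilic point: the arctan argument diverges; use the limiting value.
        # A perfectly flat point has no defined shape index (assumption: return NaN).
        return float("nan") if hi == 0.0 else math.copysign(1.0, hi)
    return (2.0 / math.pi) * math.atan((hi + lo) / (hi - lo))


def log_spaced(t_min, t_max, n):
    """n logarithmically spaced diffusion times between t_min and t_max (hypothetical range)."""
    step = (math.log(t_max) - math.log(t_min)) / (n - 1)
    return [math.exp(math.log(t_min) + i * step) for i in range(n)]


def heat_kernel_signature(eigvals, eigvecs, times):
    """HKS(v, t) = sum_i exp(-lambda_i * t) * phi_i(v)^2.

    eigvecs[i][v] is the value of the i-th eigenfunction at vertex v.
    Returns a list indexed as hks[v][time_index].
    """
    n_vertices = len(eigvecs[0])
    return [
        [
            sum(math.exp(-lam * t) * phi[v] ** 2 for lam, phi in zip(eigvals, eigvecs))
            for t in times
        ]
        for v in range(n_vertices)
    ]


def six_moments(values):
    """Mean, std, 10th and 90th percentiles, skewness and excess kurtosis (assumed conventions)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    ordered = sorted(values)

    def percentile(q):
        # Linear interpolation between the two nearest order statistics.
        pos = q * (n - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

    if std == 0.0:
        skew = kurt = 0.0
    else:
        skew = sum((x - mean) ** 3 for x in values) / (n * std ** 3)
        kurt = sum((x - mean) ** 4 for x in values) / (n * std ** 4) - 3.0
    return [mean, std, percentile(0.10), percentile(0.90), skew, kurt]
```

In a full pipeline the eigenpairs would come from a sparse generalised eigenproblem on the cotangent Laplacian and mass matrix; the sketch only shows how the per-vertex quantities and their moment summaries follow from them.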
4.4. Auxiliary Feature Blocks
For comparative analysis and as baselines, we computed four conventional feature blocks. The first is a one-dimensional ligand size baseline equal to the heavy-atom count of the ligand. The second is a nine-dimensional vector of canonical RDKit descriptors, namely MolWt, HeavyAtoms, RotBonds, LogP, TPSA, HBA, HBD, Rings and AromaticRings. The third is a 1024-bit Morgan circular fingerprint of radius 2. The fourth is a top-1024 protein 3-mer count vector obtained by applying DictVectorizer to all length-three substrings of the protein sequence and selecting the most frequent features by document frequency. The combined fingerprint and 3-mer baseline, referred to throughout as concat, has 2048 dimensions, and the addition of the discrete differential geometry block produces a 2107-dimensional feature vector.
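The protein 3-mer count block can be sketched as follows. This is a pure-Python stand-in for the DictVectorizer-based step described above, not the code used in this work; the helper names are hypothetical, and the alphabetical tie-breaking among equally frequent 3-mers is an assumption made only to keep the vocabulary deterministic.

```python
from collections import Counter


def kmer_counts(seq, k=3):
    """Count every length-k substring of a protein sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def top_kmer_vocabulary(seqs, k=3, top=1024):
    """Select the `top` k-mers with the highest document frequency."""
    doc_freq = Counter()
    for seq in seqs:
        # Each sequence contributes at most one count per distinct k-mer.
        doc_freq.update(set(kmer_counts(seq, k)))
    # Descending document frequency; ties broken alphabetically (assumption).
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [kmer for kmer, _ in ranked[:top]]


def vectorise(seq, vocab, k=3):
    """Map one sequence to its count vector over a fixed vocabulary."""
    counts = kmer_counts(seq, k)
    return [counts.get(kmer, 0) for kmer in vocab]
```

To avoid leaking test-set information, the vocabulary would normally be fitted on training sequences only and then applied unchanged to the test partition.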
4.5. Train–Test Splits
We evaluated four split regimes on the labelled set of 3285 complexes. The random regime used an 80/20 train–test partition with seed 42. The target-disjoint regime forbade any full sequence shared between train and test partitions, with seed 43. The k-mer Jaccard cluster-disjoint regime first clustered unique sequences by greedy single-linkage on amino-acid k-mer Jaccard similarity at three thresholds, and then constructed train and test partitions whose cluster assignments were disjoint, with seed 44. The time-like alphabetical regime sorted complexes by PDB identifier and assigned the alphabetically last 20% to the test partition. For external benchmarking we used the CASF-2016 core set of 285 complexes as defined by Su et al. [6] and the leak-proof LP-PDBBind DataSAIL S2 partition of 709 complexes as distributed by the DataSAIL benchmark [20]. For both external benchmarks all overlapping PDB identifiers were removed from the training partition, yielding a training-set size between 3212 and 3215 depending on the benchmark.
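The cluster-disjoint regime can be illustrated with a small sketch. It treats single-linkage clustering as the connected components of the graph that links any two sequences whose k-mer Jaccard similarity reaches the threshold, which is one plausible reading of the greedy procedure described above; the values of `k` and `threshold` are left as arguments because they are not restated here, and the function names and the cluster-shuffling assignment are hypothetical.

```python
import random


def kmer_set(seq, k):
    """Set of distinct length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def single_linkage_clusters(seqs, k, threshold):
    """Label each sequence with the root of its connected component (union-find)."""
    sets = [kmer_set(s, k) for s in seqs]
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if jaccard(sets[i], sets[j]) >= threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(seqs))]


def cluster_disjoint_split(labels, test_fraction=0.2, seed=44):
    """Assign whole clusters to the test set until the target fraction is reached."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    order = sorted(clusters)
    random.Random(seed).shuffle(order)
    target = test_fraction * len(labels)
    test = []
    for label in order:
        if len(test) >= target:
            break
        test.extend(clusters[label])
    test_set = set(test)
    train = [i for i in range(len(labels)) if i not in test_set]
    return train, sorted(test)
```

Because clusters are moved as indivisible units, the realised test fraction can overshoot the target when a large cluster is drawn last; the pairwise comparison is quadratic in the number of unique sequences, which remains manageable at the scale of a few thousand complexes.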
4.6. Feature-Based Models
All feature-based models were trained with HistGradientBoostingRegressor from scikit-learn 1.4 using a Huber loss, a learning rate of 0.05, a maximum tree depth of six and no early stopping, with for feature counts at or below 100 and for higher feature counts. Linear regression was used for the size-only baseline.
4.7. SchNet-Style Atom-Level Graph Neural Network and DDG Injection Mechanisms
For the complementary architectural analysis, we employed a SchNet-style distance-based graph neural network with four InteractionBlock layers, 96 hidden channels, a 6 Å spatial cutoff, 32 Gaussian radial basis functions, separate ligand and pocket atom-type embeddings, a ligand-restricted global sum-pool and a three-layer multilayer perceptron readout. Training used a Smooth-L1 loss, the AdamW optimiser with , and weight decay , an initial learning rate of with cosine decay to , gradient clipping at norm 5, batch size 8, between 30 and 35 epochs and deterministic seed 42. The backbone was deliberately kept compact to enable a controlled feature-injection ablation under a constrained compute budget, and we therefore stress that the SchNet baseline reported here, which reaches a Pearson correlation of approximately 0.68 on the random split, lies below the published state-of-the-art for atom-level graph neural networks at approximately 0.84 on CASF-2016. Our architectural conclusions are accordingly framed as feature-injection sensitivity within the SchNet family rather than as a generalisation to all three-dimensional atom-level networks.
The first injection mechanism, denoted B.2, performs global concatenation of the 59-dimensional per-complex DDG descriptor with the pooled graph embedding immediately before the multilayer perceptron readout. No information is shared between the geometric descriptor and the per-atom message passing. The second mechanism, B.3.1, performs per-atom projection of geometric information at the embedding stage. For each ligand and pocket atom, the nearest LA-SES mesh vertex is identified by Euclidean distance and a 16-dimensional per-atom geometric slice is gathered, comprising the per-vertex mean and Gaussian curvatures, the principal curvatures, the shape index and the ten-dimensional heat-kernel signature at that vertex. This slice is projected through a two-layer SiLU multilayer perceptron of sizes
and added to the atomic embedding before the first message-passing block; the remainder of the backbone is unchanged. The third mechanism, B.3.2, applies a DDG-conditional cross-attention layer after the four message-passing blocks, in which ligand atom embeddings serve as queries and pocket atom embeddings serve as keys and values within a four-head attention layer of width 96. The pre-softmax attention score between query atom
$i$ and key atom $j$ is biased by an additive geometric term according to
$$\tilde{a}_{ij} = a_{ij} + \beta \cos(\mathbf{g}_i, \mathbf{g}_j),$$
where $a_{ij}$ is the standard scaled dot-product attention score, $\mathbf{g}_i$ and $\mathbf{g}_j$ are the same 16-dimensional per-atom geometric slices used in B.3.1, cos denotes the cosine similarity and $\beta$ is a fixed scalar. The attention output is added back to the ligand embeddings as a residual, ligand atoms are sum-pooled and the readout multilayer perceptron is applied. The fourth mechanism, B.3.3, implements a curvature-conditioned edge-feature scheme in the spirit of CurvAGN. Standard 32-dimensional Gaussian-expanded distances are augmented with three additional geometric channels, each Gaussian-expanded over sixteen centres, namely the difference in mean curvature, the difference in principal curvature and the heat-kernel cosine distance between the two atoms of an edge, where the per-vertex quantities at atomic positions are evaluated at the corresponding nearest mesh vertices. The first
InteractionBlock edge multilayer perceptron is widened to ingest the resulting 80-dimensional edge attribute, while downstream blocks are unchanged. All four mechanisms share the same backbone, training schedule, hyperparameters and per-split atom-only baselines retrained jointly within the same compute session to control for stochastic seed effects. Multi-seed confidence intervals on the GNN results were not collected due to compute budget, and the reported Pearson lifts should accordingly be read as best-epoch single-seed point estimates whose reliability is bounded by the four-fold consistency across mechanistically distinct injection points reported in the Results.
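The attention bias of B.3.2 and the edge attribute of B.3.3 can be sketched in plain Python as follows. The sketch only illustrates the dimensional bookkeeping (32 distance channels plus three geometric channels of 16 centres each gives 80) and the additive cosine bias; the Gaussian widths, the expansion ranges, the use of absolute curvature differences, the default value of `beta` and all function names are assumptions and are not taken from the trained models.

```python
import math


def gaussian_expand(value, start, stop, n_centres):
    """Expand a scalar over evenly spaced Gaussian centres (width tied to spacing, an assumption)."""
    step = (stop - start) / (n_centres - 1)
    gamma = 1.0 / (step * step)
    return [math.exp(-gamma * (value - (start + i * step)) ** 2) for i in range(n_centres)]


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def edge_attribute(distance, h_i, h_j, kappa_i, kappa_j, hks_i, hks_j, cutoff=6.0):
    """80-dimensional edge attribute: 32 distance channels plus 3 x 16 geometric channels."""
    feats = gaussian_expand(distance, 0.0, cutoff, 32)
    # Expansion ranges below are hypothetical placeholders.
    feats += gaussian_expand(abs(h_i - h_j), 0.0, 1.0, 16)          # mean-curvature difference
    feats += gaussian_expand(abs(kappa_i - kappa_j), 0.0, 1.0, 16)  # principal-curvature difference
    feats += gaussian_expand(1.0 - cosine(hks_i, hks_j), 0.0, 2.0, 16)  # heat-kernel cosine distance
    return feats


def biased_attention_score(query, key, g_i, g_j, beta=1.0):
    """Scaled dot-product score plus a fixed-weight cosine bias between geometric slices."""
    scaled_dot = sum(a * b for a, b in zip(query, key)) / math.sqrt(len(query))
    return scaled_dot + beta * cosine(g_i, g_j)
```

The bias is applied before the softmax, so atom pairs with similar local surface geometry receive proportionally more attention weight without any additional learned parameters.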
4.8. Statistical Evaluation
We report the Pearson correlation alongside the Spearman and Kendall rank correlations, together with root-mean-square error and mean absolute error, with 95% confidence intervals derived from 500 bootstrap resamples of the test fold. The two rank correlations are reported because the practical use of a scoring function in virtual screening is the correct ordering of candidate molecules, for which rank-based measures are more directly informative than the value-sensitive Pearson coefficient. Pairwise significance of the lift induced by adding the discrete differential geometry block was assessed by a 5000-iteration paired permutation test, in which for each held-out sample the model assignment, base or with DDG, was randomly swapped, and the resulting null distribution of the Pearson difference was compared against the observed lift. Robustness was further evaluated through a five-seed ensemble that varied both the cluster-split random number generator seed across 44 to 48 and the gradient-booster seed across 0 to 4.
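The bootstrap interval and the paired permutation test can be sketched with the standard library alone. The sketch assumes a percentile bootstrap and a two-sided test with the usual add-one correction to the p-value; neither convention is stated explicitly above, and the function names and default seeds are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(y_true, y_pred, n_resamples=500, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the Pearson correlation."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # resample test items with replacement
        stats.append(pearson([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * (n_resamples - 1))]
    upper = stats[int((1 - alpha / 2) * (n_resamples - 1))]
    return lower, upper


def paired_permutation_test(y_true, pred_base, pred_ddg, n_iter=5000, seed=0):
    """Swap the two models' predictions per sample to build a null for the Pearson difference."""
    rng = random.Random(seed)
    observed = pearson(y_true, pred_ddg) - pearson(y_true, pred_base)
    extreme = 0
    for _ in range(n_iter):
        perm_base, perm_ddg = [], []
        for base, ddg in zip(pred_base, pred_ddg):
            if rng.random() < 0.5:
                base, ddg = ddg, base  # swap the model assignment for this sample
            perm_base.append(base)
            perm_ddg.append(ddg)
        diff = pearson(y_true, perm_ddg) - pearson(y_true, perm_base)
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (extreme + 1) / (n_iter + 1)
```

With 5000 iterations and the add-one correction, the smallest attainable p-value is 1/5001, roughly 0.0002, which sets the resolution of any significance statement derived from this test.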
4.9. Reproducibility
The feature-based experiments were conducted on a single Apple Silicon laptop, while the discrete differential geometry batch processing and the graph neural network training were carried out on Google Colaboratory with an NVIDIA T4 GPU. End-to-end runtime was approximately 36 min for the 3285-complex DDG batch, less than five minutes per feature-based configuration, and approximately two minutes per 30-epoch graph neural network seed. To make the pipeline reproducible without reference to the released code, the mesh-generation parameters, the gradient-boosted tree hyperparameters and the exact random seeds used for every split and ensemble are consolidated in Table A8, and the complete list of software versions used in the analysis pipeline is provided in Table A7, both in Appendix G.
5. Conclusions
We have presented a controlled, leakage-protected and architecturally explicit evaluation of pocket-surface discrete differential geometry as a feature class for protein–ligand binding affinity prediction. The 59-dimensional descriptor, computed from the ligand-aware solvent-excluded surface and combining mean, Gaussian and principal curvature distributions with the leading sixteen Laplace–Beltrami eigenvalues and a ten-point heat-kernel signature, delivers consistent and statistically significant performance gains in gradient-boosted tree pipelines across progressively stricter evaluation regimes, including a Pearson correlation lift of 0.111 on a k-mer Jaccard cluster-disjoint split with permutation , 0.258 on the leak-proof LP-PDBBind DataSAIL S2 partition and 0.365 on CASF-2016. The descriptor used in isolation reaches Pearson correlations between 0.456 and 0.594 on external benchmarks, recovering the performance of long-standing classical scoring functions such as X-Score and AutoDock Vina from a single architecture-light surface descriptor. Component ablation and TreeSHAP attribution localise the dominant contribution to the heat-kernel signature, particularly at short and mid-range diffusion times that probe local curvature, cavity dimension and global enclosure.
The same descriptor, when injected into a SchNet-style distance-based graph neural network through global concatenation, per-atom projection, DDG-conditional cross-attention or curvature-conditioned edge features, produces only marginal and statistically non-significant lifts on cluster-disjoint testing, with a mean lift of approximately 0.004. Under the present compact backbone and these four integration strategies, the explicit injection of the descriptor therefore did not yield a measurable improvement. This negative result is consistent with the geometric content being already captured by distance-based message passing on atomic coordinates, but it may also reflect the limited capacity and training budget of the backbone or the particular fusion schemes tested, and we accordingly frame it as specific to this architectural setting rather than as a general property of three-dimensional networks.
Taken together, these findings define a clear practical and conceptual boundary. Pocket-surface discrete differential geometry is most valuable in the lightweight fingerprint-based or sequence-based pipelines common in early-stage virtual screening and in resource-limited settings, where it functions as an interpretable, leakage-robust and computationally efficient feature class. For the SchNet-style distance-based graph neural network we tested, the residual benefit of explicit injection is limited. The most promising direction we identify is the development of hybrid mesh-to-atom attention mechanisms in which ligand atoms attend directly to the rich spectral and curvature information of the pocket surface, which would allow larger and more diverse geometric deep learning models to exploit the full informational content of the binding cleft as a two-manifold object. All processed data, computed descriptors, trained models and code are openly released to facilitate reproduction and further methodological development.