1. Introduction
Single-cell RNA sequencing (scRNA-seq) has transformed transcriptomics by revealing the rich cell-to-cell heterogeneity masked in bulk averages, uncovering rare subpopulations, transitional states, and lineage hierarchies across tissues and species [1,2]. Yet each cell is still frozen in time—captured under one environmental condition and at a single instant—so most temporal trajectories and stimulus-response behaviors remain invisible. Systematically probing these dimensions demands direct perturbations: transcriptional readouts following targeted genetic edits illuminate how regulatory circuits maintain identity, pinpoint druggable control nodes, and expose synergistic gene pairs that can amplify combination therapies or guide precise cell-engineering strategies [3,4,5,6,7,8,9,10,11,12,13,14,15,16]. CRISPR activation or inhibition screens joined with scRNA-seq (“Perturb-seq”) meet this need by bar-coding guide RNAs so that every sequenced cell records its own perturbation history, enabling thousands of knockouts, knockdowns, or activations to be multiplexed in one experiment [17,18,19]. The resulting high-dimensional maps of genotype-to-phenotype relationships have already sharpened target validation pipelines, suggested rescue strategies for disease mutations, and provided training data for machine-learning models that aspire to predict unseen edits. These advances have converged in ambitious blueprints for an in silico “AI virtual cell” (AIVC) capable of simulating cellular behavior across modalities, contexts, and perturbations [20].
Several recent frameworks aim to predict perturbation responses from single-cell data, including transformer-based models (e.g., scGPT), large foundation models (e.g., CellFM), and graph-based approaches (e.g., GEARS) [21,22,23]. While powerful, these pipelines are typically designed to be pretrained or adapted on very large single-cell corpora and often require multi-GPU hardware, substantial storage, and/or curated pathway priors. Their latent representations can also be difficult to map onto concrete biochemical rate laws, complicating mechanistic interpretation in routine lab settings. Moreover, recent analyses indicate that deep models have not yet been shown to systematically outperform simpler baselines for gene-perturbation prediction in every setting, underscoring the need for methods that are both accurate and practical [24]. Neural-ODE approaches offer improved mechanistic fidelity but often assume fixed functional forms or entail numerically intensive integration over high-dimensional states, which limits tractability at the transcriptome scale [25,26,27,28,29,30,31,32,33,34].
LazyNet addresses this gap with a neural ODE that embeds a log–linear–exp residual block inside a single explicit Euler step. Working in log space compresses multiplicative gene–gene effects into a sparse, directly interpretable rate matrix, avoiding attention-style quadratic costs and heavy ODE solvers. Crucially, the model is designed for CPU-level efficiency (no GPU required), operates directly on count matrices without external pathway priors, and matches the common two-snapshot CRISPR A/I design by treating the pre- and postperturbation profiles as a finite-difference sample of the underlying dynamics.
Many CRISPR A/I studies, however, are small, self-contained projects: a single screen with tens of thousands of cells at most, limited local compute, and sometimes privacy, licensing, or logistical barriers that preclude aggregating data for large-scale pretraining or hosting heavy foundation models. In this regime, practitioners must typically train models from scratch on their own dataset under strict time and hardware budgets, often on CPU-only nodes. Our focus in this work is precisely this from-scratch, low-data setting. In the comparisons that follow, transformer and state-space baselines such as scGPT-style and CellFM/RetNet-style architectures are therefore used as non-pretrained architectural controls trained from random initialization on the same dataset, rather than in their usual large-scale pretraining–fine-tuning mode.
In this study we show that, when all models are trained from scratch on the same neuronal Perturb-seq matrix under a fixed 1 h budget, LazyNet attains competitive or superior predictive accuracy to transformer- and state-space-style baselines while training in hours on a single 24-core CPU node and that its elasticity-based analysis yields compact, biologically coherent subnetworks. These experiments are intended to reflect the practical setting of a stand-alone CRISPR A/I project in which large-scale pretraining of foundation models is not feasible, rather than to evaluate fully pretrained scGPT or CellFM in their natural high-data regime. Applied to ferroptosis, the framework recovers established regulators and nominates a lysosomal–mitochondrial–immune module for experimental follow-up, illustrating how CPU-tractable, interpretable modeling can turn routine perturbation screens into causal hypotheses at the transcriptome scale.
2. Materials and Methods
2.1. LazyNet Architecture
As shown in Figure 1, the full gene-expression vector (INPUT) is propagated through a two-stage transformation composed of a logarithmic layer (LOG) followed by an exponential layer (EXP). These twin layers realize multiplicative, biology-inspired interactions in log space while retaining computational simplicity. The transformed signal is added back to the original input via a residual skip connection (black bar and “+”), producing an updated vector of identical dimensionality (OUTPUT). Stacking such blocks corresponds to explicit Euler integration of an underlying ordinary differential equation, allowing LazyNet to model gene-regulatory dynamics within a ResNet framework while continuously adjusting weights during training [35].
LazyNet has already been applied successfully to both synthetic ODE benchmarks and real-world dynamical systems. Its core idea is to embed a log → linear → exp residual block inside an explicit Euler approximation, so that every weight acts as a direct ODE rate constant and the network learns interpretable dynamics with minimal parameters.
An ODE of the form dx/dt = F(x) can be discretized with forward Euler:

x(t + Δt) ≈ x(t) + Δt · F(x(t)).

LazyNet represents the rate law F with a single log–linear–exp (LLE) block:

F(x) = W₂ exp(W₁ log(x + ε) + b₁) + b₂,

where the log and exp act elementwise. Here W₁, W₂ are learnable weight matrices and b₁, b₂ are biases. Taking the logarithm converts multiplicative gene–gene interactions into additions, allowing a single neuron to encode an entire monomial exactly, while the outer exponential restores the original scale. More details and examples can be seen in Supplementary S4.

During training, each observed state x(t) is provided as input, the network predicts x(t + Δt), and the prediction is compared with ground truth. With sufficiently varied samples x(t) and adequate temporal coverage Δt, the learned mapping approaches the true governing equations: F̂ ≈ F.
Because the ODE is Markovian, every update depends only on the current state, not on the full history; LazyNet therefore learns the system dynamics efficiently and with minimal parameter count.
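To make the update concrete, the explicit Euler step with an LLE rate law can be sketched in a few lines of pure Python. This is a minimal toy under the assumed form F(x) = W₂·exp(W₁·log(x + ε) + b₁) + b₂, not the released implementation; the function name `lle_step` and the toy decay system are hypothetical.

```python
import math

def lle_step(x, W1, b1, W2, b2, dt=0.1, eps=1e-8):
    """One explicit Euler update x <- x + dt * F(x) with the assumed LLE rate
    law F(x) = W2 @ exp(W1 @ log(x + eps) + b1) + b2 (pure-Python matrices)."""
    z = [math.log(xi + eps) for xi in x]                      # LOG layer
    h = [math.exp(sum(w * zi for w, zi in zip(row, z)) + b)   # linear + EXP
         for row, b in zip(W1, b1)]
    F = [sum(w * hi for w, hi in zip(row, h)) + b             # signed readout
         for row, b in zip(W2, b2)]
    return [xi + dt * fi for xi, fi in zip(x, F)]             # residual skip

# Toy rate law dx_i/dt = -0.5 * x_i: each hidden unit encodes the monomial
# x_i (exponent row = unit vector, zero bias); W2 carries the rate constant.
W1 = [[1.0, 0.0], [0.0, 1.0]]
b1 = [0.0, 0.0]
W2 = [[-0.5, 0.0], [0.0, -0.5]]
b2 = [0.0, 0.0]

x_next = lle_step([2.0, 4.0], W1, b1, W2, b2, dt=0.1)
# forward Euler for dx/dt = -0.5x shrinks each coordinate by factor 0.95
```

Because exp is strictly positive, the outer linear readout W₂ is what allows signed (degradation as well as production) terms in this sketch.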
In a CRISPR-A/I screen, the before-and-after expression profiles can be treated as a one-step Euler approximation of the underlying ODE—thereby sampling the true regulatory function directly.
2.2. Theoretical Rationale for Efficiency
Most biochemical rate laws can be written as a finite sum of monomials of the form c·x₁^α₁·x₂^α₂⋯x_d^α_d, with coefficient c and exponent vector α. A representation that reproduces each monomial exactly captures the full reaction space.
LazyNet achieves this with a single log–linear–exp (LLE) block (2), where the log and exp act elementwise. Choosing one row of W equal to the integer exponent vector α and the corresponding bias equal to log c reproduces the monomial exactly. Hence an entire K-term rate law needs only on the order of K(d + 1) trainable scalars.
By contrast, a degree-N Taylor polynomial on the same positive domain requires on the order of C(N + d, d) parameters to reach a uniform error ε. Thus, the parameter ratio grows super-polynomially with both dimension d and desired accuracy. Fewer parameters translate to fewer multiply–add operations per mini-batch and a smaller memory footprint, directly accelerating training.
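A quick numerical check makes the counting argument tangible: one LLE hidden unit reproduces a monomial exactly, while a degree-N Taylor dictionary in d variables needs C(N + d, d) terms. The numbers below (d = 20, K = 5, N = 4) are arbitrary illustrations, not values from the paper.

```python
from math import comb, exp, log

def monomial_via_lle(x, alpha, c):
    # one hidden unit: exp(alpha · log x + log c) = c * prod_i x_i**alpha_i
    return exp(sum(a * log(xi) for a, xi in zip(alpha, x)) + log(c))

y = monomial_via_lle([2.0, 3.0], alpha=[2, 1], c=0.5)   # 0.5 * 2**2 * 3 = 6.0

# parameter counts for a K-term rate law in d genes
d, K, N = 20, 5, 4
lle_params = K * (d + 1)         # one exponent row + one bias per monomial
taylor_params = comb(N + d, d)   # all monomials of total degree <= N
```

With these toy sizes the Taylor dictionary is roughly 100× larger than the LLE parameterization, and the gap widens rapidly with d and N.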
Statistically, concentrating capacity onto a small set of mechanistically meaningful weights improves the effective signal-to-noise ratio: each observation updates compact exponent and coefficient parameters rather than being diffused across a large dictionary. Empirically, this manifests as faster convergence at larger learning rates, better small-sample generalization, and more stable optimization under residual connections.
Importantly, the multiplicative form encodes synergistic (higher-order, multi-locus) interactions as single monomials rather than as emergent products of many additive units. This is not only an efficiency gain; it directly addresses a substantive modeling need by making synergic effects first-class, precisely parameterized objects that can be learned robustly from limited data.
Transformer-based baselines such as scGPT and state-space models such as RetNet avoid the explicit coefficient blow-up of polynomial bases but incur other costs [21,22]. Attention layers scale as O(L²) in sequence length L, while state-space kernels move toward linear time yet introduce millions of additional parameters. Neither family encodes a monomial in closed form; multiplicative gene–gene terms are learned implicitly via gradient descent, typically demanding more data and iterations. In contrast, the LLE map captures multiplicative structure exactly with a forward pass that is linear in the gene dimension d (and in the retained term count K), yielding a favorable accuracy–efficiency–interpretability tradeoff.
For the kind of perturbation-effect prediction considered here, recent benchmarks suggest that deep models, including those inspired by large language models, do not yet consistently outperform simple baselines [24]. This motivates exploring architectures that encode multiplicative biology more explicitly, such as the log–linear–exp block in LazyNet. Neural-ODE approaches can be accurate but usually require a prespecified differential form—e.g., fixed symbolic templates or universal Hill-type non-linearities [28,29,30,31,32,33,34]—which constrains adaptability and can produce models that are numerically correct yet mechanistically misspecified. LazyNet retains an ODE viewpoint without fixing the functional form: placing the LLE block inside an explicit Euler residual update allows approximation of a broad class of interaction terms, so synergy is captured through learned exponents and weights rather than hand-engineered equations, improving both flexibility and mechanistic interpretability.
2.3. scRNA Data Preparation and Gene Selection
We analyze GSE152988, a CRISPR activation/interference screen in 53,495 human iPSC-derived neurons (67,077 genes including non-coding RNAs) with two expression snapshots per cell (a prestate “time-0” and a postperturbation state) [36]. With no intermediate measurements, we treat the poststate as a single explicit update from the baseline, matching the minimal ODE setting used throughout this work. No external dynamics or pathway priors were introduced.
To construct “time-0” without leakage, we do not edit individual cells. Instead, we first compute an averaged baseline cell from library-size-normalized non-targeting controls (per batch/stratum as applicable). For each perturbation targeting gene g, the time-0 input is this averaged baseline vector with a single edit at g to encode the intervention: for CRISPRi/KO we set g to a small floor; for CRISPRa/OE we set g to the observed postperturbation level of g (estimated across the perturbed cells), while all other genes remain at control baselines. Because only the scalar for the targeted gene is updated—and never any multi-gene pattern or any individual test cell—this captures the intended one-gene intervention while preventing multi-gene peeking into the poststate.
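The single-edit rule can be sketched as follows; the function name, mode labels, and the floor value are illustrative assumptions, and the real input would be the averaged non-targeting baseline vector.

```python
def make_time0(baseline, gene_index, mode, observed_post=None, floor=1e-3):
    """Copy the averaged control baseline and edit ONLY the targeted gene,
    so no multi-gene pattern from the poststate can leak into the input."""
    x0 = list(baseline)
    if mode == "CRISPRi":    # knockdown/knockout: push the target to a floor
        x0[gene_index] = floor
    elif mode == "CRISPRa":  # activation: use the observed post level of g
        x0[gene_index] = observed_post
    return x0

baseline = [1.0, 2.0, 0.5, 3.0]          # toy 4-gene control average
x0_kd = make_time0(baseline, 1, "CRISPRi")
x0_oe = make_time0(baseline, 2, "CRISPRa", observed_post=4.2)
# only the targeted coordinate changes; all other genes stay at baseline
```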
After time-0 construction, we restrict to 18,152 Ensembl GRCh38 protein-coding genes for modeling and evaluation. Cells are grouped by (target gene × gRNA) (and by batch when available), and whole groups are assigned to train/validation/test at 80%/10%/10% to avoid guide- or batch-specific leakage across splits. On the working expression scale used for modeling, the held-out neuronal iPSC test targets have a mean of 0.6082 and a standard deviation of 3.212 (min 0, max 1203), which we report to characterize the data landscape.
Because only two snapshots are available, absolute rates are not identifiable; the product of step size and elasticity is what can be learned. Accordingly, downstream evaluation emphasizes one-step prediction errors and relative signals (e.g., rankings) that are invariant to monotone rescaling of the latent time step. When dense time series are available, a multi-step fit resolves Δt and improves identifiability (see Section 4 and [30,31,32]).
We further use a large human T-cell CRISPR guide-capture dataset (10× Genomics RNA with per-cell guide calls) comprising 36,601 genes × 103,805 cells [37]. Expression matrices come from the “Gene Expression” count matrix and guide calls from the corresponding CellRanger export. Minimal QC is applied (genes detected in ≥3 cells; cells with ≥200 detected genes) to preserve a blind and reproducible comparison.
Each barcode is joined to its guide-call summary. Baseline cells (0/unknown guides) are used only to estimate time-0. Singlets (exactly one guide) form the training set; multiplets (≥2 guides) form the test set. After matching and filters, the split yields ~61k training cells and 28,707 test cells.
Counts are library-size-normalized with a small pseudocount and log-transformed. We select the top 25% highly variable genes (HVGs) across cells (~one quarter of 36,601 genes) and carry this fixed HVG panel into all downstream modeling and evaluation. On this normalized log scale, the held-out test targets have mean ≈ 0.19 and SD ≈ 2.30 (min 0, max 2620), which we report to orient error magnitudes in later sections.
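The normalization and HVG-panel steps can be sketched as below. The variance-based HVG proxy and all names are simplifications for illustration (production pipelines typically use dispersion-based HVG selection), shown only to fix the order of operations.

```python
import math

def normalize_log(counts, pseudocount=1.0, scale=1e4):
    """Library-size normalize each cell, then apply log(x + pseudocount)."""
    out = []
    for cell in counts:
        total = sum(cell) or 1.0
        out.append([math.log(c / total * scale + pseudocount) for c in cell])
    return out

def top_quantile_hvgs(logmat, frac=0.25):
    """Indices of the top `frac` genes by cross-cell variance (HVG proxy)."""
    n_cells, n_genes = len(logmat), len(logmat[0])
    var = []
    for g in range(n_genes):
        col = [row[g] for row in logmat]
        mu = sum(col) / n_cells
        var.append(sum((v - mu) ** 2 for v in col) / n_cells)
    k = max(1, int(frac * n_genes))
    return sorted(range(n_genes), key=lambda g: -var[g])[:k]

counts = [[10, 0, 5, 5], [0, 20, 5, 5], [5, 5, 5, 5], [20, 0, 5, 5]]
panel = top_quantile_hvgs(normalize_log(counts), frac=0.5)  # genes 0, 1 vary most
```

The fixed panel returned here would then be carried unchanged into all downstream modeling and evaluation, as in the text.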
2.4. Baseline Models
We compared LazyNet to two public baselines chosen to span the principal inductive biases for single-cell perturbation prediction without curated priors: a transformer (scGPT) and a state-space model (RetNet, representing the CellFM-style state-space family) [21,22,23]. These baselines were selected because they (i) have maintained, open implementations with documented end-to-end pipelines; (ii) do not require curated cell-type labels or pathway priors (in contrast to, e.g., GEARS); (iii) are compatible with two-snapshot Perturb-seq inputs; and (iv) have prior use on closely related single-cell prediction tasks.
Together, scGPT and RetNet provide architectural diversity—attention versus state-space dynamics—yielding an appropriate, apples-to-apples contrast with LazyNet under a common data regime. In our experiments, both scGPT and RetNet are instantiated as randomly initialized architectures trained from scratch on the same dataset as LazyNet, without loading any large-scale pretrained checkpoints or external data. They should therefore be interpreted as non-pretrained architectural controls in a strictly from-scratch, low-data setting, rather than as evaluations of fully pretrained foundation models in their intended pretraining–fine-tuning regime.
2.5. Training Protocol
LazyNet is trained on an Intel Xeon CPU server (Intel Xeon Gold 6126, 24 cores, 768 GB RAM) without accelerators (accessed via the Big Data Computing Center of Southeast University, Nanjing, China).
The main runs use a Huber loss (δ = 0.1) for robustness to outliers and a three-replica LazyNet ensemble (independent random seeds; arithmetic averaging at inference). This ensemble smooths idiosyncratic local minima while preserving the interpretability of the learned elasticities. Complete training hyperparameters, exact command lines, and seeds are provided in the replication package.
In the ablations, the model depth is the most sensitive knob: increasing the log–exp pair beyond one layer frequently led to numerical instabilities (NaNs) or higher error, consistent with overfitting in a setting where a one-step ODE is already a low-complexity mechanism and LazyNet is highly expressive. Width is less sensitive, but a moderate range (several hundred units) performs best; widths that are too small underfit, while very wide models tend to overfit or fail to converge usefully within the time budget. The additive residual path is essential for ODE fitting—removing it substantially degrades accuracy or prevents stable training. We also varied the loss family (Huber/L1/L2 and δ), mini-batch size, and integration step length. Within the ~1 h per-replica budget (3 replicas), batch size and step length had negligible effect on test metrics, whereas expanding LazyNet beyond moderate capacity often undertrained within the time cap, yielding lower accuracy despite more parameters.
scGPT and RetNet are trained on a single NVIDIA V100 GPU with three CPU cores allocated (accessed via the Big Data Computing Center of Southeast University, Nanjing, China). To match LazyNet’s elapsed-time constraint, each baseline is capped at 1 h of wall-clock time per dataset, inclusive of data loading and evaluation, and is trained from random initialization without any external pretraining or pretrained checkpoints. We rely on the authors’ documented defaults where applicable, enable mixed precision and gradient accumulation as needed to respect memory limits, and do not introduce curated priors or external labels. The last checkpoint produced within the time cap is evaluated on the common test split. Regression fidelity is assessed with RMSE, MAE, and Pearson’s r on the modeling gene panel defined in Section 2.3. Threshold-free ranking quality is summarized by ROC-AUC and PR-AUC computed directly from continuous predictions. When reported, F1 (micro and macro) uses a single per-gene operating point learned once on the training data via a fixed quantile rule and held fixed for evaluation. All models use identical train/validation/test partitions, deterministic data loaders, fixed random seeds, and the same evaluation scripts.
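For reference, the three regression metrics reduce to a few lines of code; the toy vectors below are arbitrary, and the ranking metrics (ROC-/PR-AUC) and quantile-thresholded F1 are omitted for brevity.

```python
import math

def rmse(y, p):
    """Root-mean-square error between truth y and prediction p."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)) / len(y))

def mae(y, p):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)

def pearson_r(y, p):
    """Pearson correlation coefficient."""
    n = len(y)
    my, mp = sum(y) / n, sum(p) / n
    num = sum((a - my) * (b - mp) for a, b in zip(y, p))
    den = math.sqrt(sum((a - my) ** 2 for a in y)
                    * sum((b - mp) ** 2 for b in p))
    return num / den

y_true = [0.0, 1.0, 2.0, 3.0]
y_pred = [0.1, 0.9, 2.2, 2.8]
scores = (rmse(y_true, y_pred), mae(y_true, y_pred), pearson_r(y_true, y_pred))
```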
2.6. Inference and Visualization
To produce a directed, dynamical snapshot for network analysis, we averaged ten independently trained LazyNet checkpoints into a single ensemble and evaluated it on the full single-cell matrix (train + validation + test). From this ensemble we computed a Jacobian of absolute elasticities at a common baseline expression vector, restricted to the original 18,152 Ensembl protein-coding genes [38]. For each literature-anchored seed gene (seven total; Supplementary S1A), we ranked both downstream (row) and upstream (column) elasticities, retained neighbors with |∂xᵢ/∂xⱼ| ≥ 0.001 that also exceeded either the 95th or 99th percentile of that seed’s ranked list, and then kept the top-N survivors. Because a single-baseline Jacobian yields few candidates per seed, we set the Benjamini–Hochberg threshold to q = 1 (effectively no FDR filtering) and report the filtered ranked sets. Evaluation proceeds in two parts: first, whether each seed’s neighborhood recovers interactions reported in the literature; second, cross-seed recall, i.e., how often other seed genes appear among a given seed’s ranked neighbors.
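As a schematic of the elasticity-based neighbor search, the following pure-Python sketch builds a finite-difference Jacobian around a baseline and keeps the strongest up-/downstream partners of a seed. The function names, the 10⁻³ floor, and the linear toy system are illustrative assumptions; the actual pipeline evaluates the trained ensemble's Jacobian, not a finite-difference one.

```python
def finite_diff_jacobian(f, x0, h=1e-4):
    """Numerical Jacobian J[i][j] = d f_i / d x_j at the baseline x0."""
    n, f0 = len(x0), f(x0)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp = list(x0)
        xp[j] += h
        fp = f(xp)
        for i in range(n):
            J[i][j] = (fp[i] - f0[i]) / h
    return J

def seed_neighbors(J, seed, top_n=3, floor=1e-3):
    """Rank genes by |elasticity| to/from `seed` (row = downstream,
    column = upstream), drop sub-floor entries, keep the top_n."""
    scored = []
    for g in range(len(J)):
        if g == seed:
            continue
        e = max(abs(J[g][seed]), abs(J[seed][g]))
        if e >= floor:
            scored.append((e, g))
    scored.sort(reverse=True)
    return [g for _, g in scored[:top_n]]

# toy linear dynamics: genes 0 and 1 regulate each other; gene 2 is inert
f = lambda x: [2.0 * x[1], 0.5 * x[0], 0.0]
neighbors = seed_neighbors(finite_diff_jacobian(f, [1.0, 1.0, 1.0]), seed=0)
```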
The goal is to recover directed, dynamical relationships consistent with the ODE-based formulation—edges that reflect functional influence under perturbation, not merely correlation. For this reason, GENIE3/GRNBoost/SCENIC-style pipelines are not used as head-to-head comparators here. They operate as subgraph generators that prioritize local correlation edges and typically require a manual regulator prior (e.g., TF lists) to seed the search [39,40,41]. Their outputs are commonly undirected and tuned to rank local associations, rather than to assemble a globally coherent dynamic network. They also lack the external authority of curated/experimental resources; when a comparator is needed, databases such as STRING and ARCHS4—grounded in empirical evidence—are more appropriate reference points than GENIE3-family predictions.
Even with authoritative databases, edge-level comparisons remain problematic: curated graphs are incomplete, context-dependent, and unevenly sampled, and the true functional wiring can vary substantially even within the same cell type and omic snapshot. We therefore evaluate at the module level, where the signal is more robust: given a set of curated modules (pathways, TF regulons), we quantify module activity/enrichment induced by the inferred network (or by model-predicted responses) and test whether the recovered activity patterns align with known biology. This enrichment-based evaluation emphasizes coherent functional programs over individual edges, respects the dynamical and directed nature of our model, and provides a fairer, more stable basis for comparison across methods and references.
2.7. ARCHS4 Co-Expression Validation
We annotate each directed edge (i → j) with the ARCHS4 Pearson correlation r(i,j). To control for confounding, enrichment is computed against a matched null: for each edge we sample matched pairs (i′ → j′) from strata that jointly bin genes by (i) control-cell detection/mean expression and (ii) out-/in-degree in the LazyNet elasticity graph (≈400 matched pairs per edge; 9860 edges; 3,943,792 null pairs). We report the fractions exceeding |r| ≥ τ and the corresponding fold enrichment (observed/expected). For sign-aware checks, we multiply r by the LazyNet edge sign; because ARCHS4 is undirected/heterogeneous, we emphasize |r| results and treat sign panels as exploratory.
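The enrichment statistic itself is simple; the sketch below uses made-up correlation values (the real computation runs over 9860 edges against roughly 3.9 million matched null pairs).

```python
def fold_enrichment(edge_r, null_r, tau=0.20):
    """Observed vs expected fraction of pairs with |r| >= tau."""
    obs = sum(abs(r) >= tau for r in edge_r) / len(edge_r)
    exp = sum(abs(r) >= tau for r in null_r) / len(null_r)
    return obs, exp, obs / exp if exp > 0 else float("inf")

edges = [0.30, -0.25, 0.05, 0.40, 0.10]   # hypothetical edge correlations
null = [0.05, -0.10, 0.02, 0.21, -0.03, 0.08, 0.0, 0.01, -0.02, 0.04]
obs, expc, fold = fold_enrichment(edges, null)
# 3/5 edges vs 1/10 null pairs pass the threshold: 6-fold enrichment
```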
2.8. STRING Interaction Concordance
STRING functional associations (human, version 12.0) were converted from protein identifiers to HGNC symbols with the official mapping file [42]. Isoform-specific and unmapped entries were removed. The edges of the 32 × 4 LazyNet subgraph were queried for exact symbol matches, and statistical significance of the overlap was assessed with a one-tailed hypergeometric test that treats the total number of possible unordered human gene pairs as the population size. Elasticity scores and STRING combined scores were each dichotomized at the 75th percentile to classify edges as “both-high”, “STRING-only”, or “LazyNet-only”. Degree-preserving Maslov–Sneppen rewiring (1000 iterations) provided a null distribution for network-density comparisons.
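The one-tailed hypergeometric overlap test can be written directly from `math.comb`; the population, reference, and overlap numbers below are toy values, not the paper's 523-edge overlap statistics.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n pairs are drawn without replacement from a population
    of N pairs, K of which are annotated as interacting (one-tailed test)."""
    return sum(comb(K, x) * comb(N - K, n - x)
               for x in range(k, min(K, n) + 1)) / comb(N, n)

# toy example: 1000 candidate pairs, 50 in the reference, 20 predicted, 6 shared
p_value = hypergeom_sf(6, N=1000, K=50, n=20)  # expected overlap is only ~1
```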
2.9. Proteomics Validation
Pulsed-SILAC proteomes were re-analyzed to evaluate the 32 × 4 LazyNet subgraph at the protein layer. For the tamoxifen-inducible GPX4-knockout time course in mouse fibroblasts (PRIDE PXD050979), proteinGroups.txt (PXD050979.txt) was taken as input [43]. Reverse sequences and common contaminants were discarded; gene symbols were extracted and converted to upper-case HGNC format, and duplicate entries were averaged, yielding 1026 proteins. Intensities for the light, medium, and heavy SILAC channels (nine runs) were treated as independent samples; zeros were set to missing and imputed with the MinProb left-shift approach (μ − 1.8 σ; width 0.3 σ) as implemented in Perseus [44]. Values were log₂-transformed and median-centered per run, and proteins quantified in at least three channels with non-zero variance were retained.
For the larger constitutive GPX4-knockout study in Pfa1 mouse embryonic fibroblasts (PRIDE PXD040094) we used the combined_protein.tsv [45]. Gene symbols were cleaned as above and duplicate symbols collapsed. Columns labeled “MaxLFQ Intensity” defined fourteen quantitative channels covering wild-type and knockout states. Missing values were replaced with MinProb, and the matrix was log₂-transformed and median-centered. Proteins present in at least three channels and showing non-zero variance were kept.
For each dataset the cleaned protein matrix was intersected with the RNA-based LazyNet edge list. Gene pairs with at least three paired, finite observations were assigned a Spearman rank correlation coefficient. To control for hub bias, a degree-matched null of 10,000 random edges was generated by stratified sampling across twenty-one degree bins (0 to 20+). Absolute correlation values of LazyNet edges were compared with the null using a two-sided Wilcoxon rank-sum test.
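Spearman's ρ on paired protein profiles is the per-edge statistic here; a tie-free sketch (real data would need midranks for ties, omitted for brevity) looks like:

```python
def spearman(x, y):
    """Spearman rank correlation for tie-free vectors (midranks omitted)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank + 1.0
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    m = (n + 1) / 2.0                          # mean rank
    num = sum((a - m) * (b - m) for a, b in zip(rx, ry))
    den = (sum((a - m) ** 2 for a in rx) * sum((b - m) ** 2 for b in ry)) ** 0.5
    return num / den

rho_up = spearman([1.2, 2.5, 3.1, 4.0], [0.1, 0.4, 0.9, 1.6])   # monotone up
rho_down = spearman([1.0, 2.0, 3.0], [5.0, 4.0, 3.0])           # monotone down
```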
2.10. Experiment Layout
As shown in Figure 2, we start from a single preprocessed neuronal iPSC Perturb-seq matrix (library-size normalization followed by a small-pseudocount transform, z = log(x + ε)) and run two parallel analyses.
The benchmark arm performs a one-time stratified split (8:1:1 train:validation:test) and trains scGPT, RetNet, and three LazyNet replicas on the training fold, tunes on validation, and reports performance on the locked test fold (RMSE, MAE, Pearson r; threshold-free AUCs where noted) [21,22,23]. A T-cell screen is included only in this benchmark arm to test cross-dataset generalization under the identical pipeline; our central goal remains the neuronal dataset. All models are trained from scratch on the study dataset without any external pretraining or pretrained checkpoints.
The discovery arm fits ten independently seeded LazyNet models on the full neuronal matrix (train + validation + test), averages their Jacobians (restricted to 18,152 Ensembl protein-coding genes), and extracts a 32 × 4 breadth-first subgraph as a consensus network for interpretation. This subgraph is evaluated against authoritative resources—STRING v12 and ARCHS4 [42,46]—and cross-omic coherence is assessed in two GPX4-knockout SILAC proteomes from mouse embryonic fibroblasts (PRIDE PXD040094, PXD050979) [43,45]. The validated interactions are then mined for mechanistic hypotheses and used to nominate follow-up perturbations.
3. Results
3.1. Model Benchmark
Table 1 summarizes head-to-head results for a neuronal iPSC Perturb-seq slice (also used for later inference) and a T-cell guide-capture slice (top-quartile HVGs) used solely for the generalization test.
On the neuronal iPSC slice, LazyNet attains the strongest threshold-free ranking quality with the highest ROC-AUC and PR-AUC among the compared methods, while training faster on CPU and using a mid-range parameter count. Its absolute regression errors (RMSE, MAE) and correlation r are slightly worse than the best baseline, and its F1 (micro/macro) is lower at the fixed per-gene operating point—an expected AUC–F1 tradeoff when a single threshold is predeclared per gene. Together, the iPSC results indicate that LazyNet delivers the best ordering of perturbation effects under CPU constraints, with modestly higher regression error than the top baseline.
On the T-cell HVG slice, LazyNet again leads on ranking metrics with the highest ROC- and PR-AUC, and it achieves the best MAE, while it trails the strongest baseline on RMSE and Pearson r and its F1 sits mid-pack at the fixed threshold (Table 1). This pattern—excellent ranking with conservative amplitudes—matches the expected behavior when per-gene thresholds are learned once on singlets and then applied to a broader test distribution of multiplets.
Interpreting AUC and F1 together clarifies these results. ROC-/PR-AUC capture threshold-free ranking and are insensitive to the exact operating point; F1 fixes a single per-gene threshold learned on training singlets, so it can move counter to AUC when calibration shifts or variance broadens between train and test. In our setting, this explains why LazyNet can lead decisively in AUCs yet not maximize F1 on every slice. All definitions and thresholding rules are held constant across models to keep comparisons fair.
Overall, across both slices and under CPU-only execution with a three-run LazyNet ensemble, the model offers the strongest ranking of perturbation effects and a favorable MAE profile on T cells, while ceding some ground to the best baseline on RMSE and r. Because LazyNet instantiates a one-step ODE with log–exp residual dynamics, it exposes per-gene elasticities whose ensemble Jacobian yields directed, mechanism-level edges; we leverage this structure immediately in network inference, where the Jacobian is converted into a consensus subgraph and evaluated via module-level enrichment against curated resources.
It is important to emphasize that these comparisons are performed under a constrained, from-scratch regime: all models are trained only on the study dataset, with a 1 h wall-time cap and no access to any external pretraining or pretrained checkpoints. In contrast, scGPT and CellFM are designed primarily as large-scale foundation models to be pretrained on millions of cells and then adapted. Our results therefore show that, under strict from-scratch, low-data constraints, LazyNet can be competitive with transformer- and state-space-style architectures; they should not be interpreted as a statement that LazyNet outperforms fully pretrained scGPT or CellFM in general.
3.2. Network Inference
Ten independently trained LazyNet replicas were averaged into a single ensemble, and Jacobian elasticities were queried around seven ferroptosis seeds (GPX4, PSAP, ACSL4, WIPI2, MTOR, NFU1, SOD2); selection details appear in Supplementary S1A. A broader parameter sweep, summarized in Supplementary S1B, identified five expansion regimes that were carried forward (Table 2).
In the broadest setting (512 × 1), each seed ballooned into a neighborhood of ~3000 genes—GPX4 alone recruited 1811—and yielded a mean per-seed recall of ~0.08 (≈2–3/27). Constraining the search to 64 × 3 reduced each subgraph to ~1000 genes and lifted the mean per-seed recall to ~0.40 (≈11/27). The most selective regime, 32 × 4, fused all seven seeds into a single graph of 4641 genes and 11,676 edges while preserving a benchmark recall of 15/27 and nudging the hit-rate upward; its data are in Supplementary S1C. Because only 27 of 18,152 genes are treated as positives (base rate ≈0.15%), these fractions are numerically small; viewed as enrichment, the 32 × 4 graph recovers 15 regulators versus ~6.9 expected by chance (~2.17×).
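The chance-expectation arithmetic behind these enrichment figures is a one-line back-of-envelope check, assuming uniform sampling from the 18,152-gene background.

```python
# 27 benchmark regulators among 18,152 genes; the 32 x 4 subgraph keeps 4641
positives, genome, subgraph_genes, recovered = 27, 18152, 4641, 15
expected = positives * subgraph_genes / genome   # hits expected by chance
fold = recovered / expected                      # observed enrichment
```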
Evaluating all gene seeds, per-seed recalls (Supplementary S1D) highlight a practical strength: starting from just one ferroptosis gene, LazyNet typically retrieves ~10 additional benchmarks when run at top = 32, 64, or 128. In other words, minimal prior knowledge suffices to reconstruct much of the ferroptosis module—evidence of the method’s consistency and day-to-day utility.
LazyNet recovers 56% (15/27) of benchmark ferroptosis interactions in <6 h on a single 24-core CPU node (768 GB RAM, no GPUs). By comparison, GENIE3 recovers ~33% of gold-standard E. coli edges after analyzing >100 microarrays and ~48 h on 32 cores [39,47]; GRNBoost recalls ~22% in a 68 k-cell PBMC atlas [40,48]; and DeepFGRN reaches ~40% but requires dense time-series data and multi-GPU training (4 × A100, several days) [41]. Thus, LazyNet achieves state-of-the-art recall with orders-of-magnitude fewer time points and far lighter compute while still yielding biologically interpretable subnetworks from a single seed gene.
The 32 × 4 expansion is chosen for all subsequent analyses because it strikes the right balance between biological depth and practical scope. Limiting each hop to 32 of the strongest elasticity neighbors keeps the growth of the graph contained, while four propagation steps are ample to capture multi-step regulatory cascades. The resulting subgraph (4641 genes and 11,676 edges) remains small enough for reliable enrichment statistics, manual curation, and clear visualization, yet large enough to preserve the key ferroptosis signals that LazyNet uncovers. This setting therefore provides a tractable, biologically informative backbone for downstream validation and interpretation.
3.3. ARCHS4 Co-Expression
Of the 11,676 edges in the 32 × 4 subgraph, 9860 (84.4%) mapped to the ARCHS4 pan-human matrix and received a co-expression value (Pearson r) [46]. The r distribution is broadly centered near zero (mean ≈ 0.053; median ≈ 0.044) and closely matches degree-matched random pairs (Supplementary S2A).
In an unsigned view, LazyNet edges are far more likely than random to show moderate co-expression: 26.6% of edges meet |r| ≥ 0.20 versus ~5% of random pairs (an ≈5× enrichment), indicating that the model links genes that frequently co-vary across thousands of public RNA-seq datasets (
Supplementary S2B).
Sign-aware analysis refines this picture in
Figure 3: the positive, sign-matched tail is enriched (~1.36-fold), the opposite-sign tail is depleted (~0.67-fold), and the overall |r| ≥ 0.20 mass is near expectation (~1.02-fold) once signs are considered (
Supplementary S2C). This is consistent with the biology: LazyNet elasticities encode local, directional responses in a one-step ODE, whereas ARCHS4 aggregates steady-state, undirected correlations across heterogeneous tissues; inhibitory or context-specific links can therefore appear anti-correlated in the aggregate.
Taken together, the strong unsigned enrichment and the direction-consistent positive-tail enrichment indicate that a substantial subset of LazyNet edges is corroborated by large-scale expression evidence while respecting expected sign- and context-dependent effects.
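The unsigned and sign-aware tail statistics above reduce to simple ratios, sketched here on synthetic correlation values. This is schematic only: the real null is built from degree-matched random pairs, and the 0.5 factor in the signed null assumes either sign is equally likely by chance.

```python
import numpy as np

def tail_enrichment(edge_r, null_r, thresh=0.20):
    """Unsigned view: fraction of edges with |r| >= thresh, relative to
    the same fraction among (degree-matched) random pairs."""
    return np.mean(np.abs(edge_r) >= thresh) / np.mean(np.abs(null_r) >= thresh)

def signed_tail_enrichment(edge_r, edge_sign, null_r, thresh=0.20):
    """Sign-aware view: enrichment of strong edges whose co-expression sign
    matches the elasticity sign; under the null, half of the strong random
    pairs would match by chance (hence the 0.5 factor)."""
    matched = (np.sign(edge_r) == np.sign(edge_sign)) & (np.abs(edge_r) >= thresh)
    return np.mean(matched) / (0.5 * np.mean(np.abs(null_r) >= thresh))

# toy example: edge correlations are slightly shifted and broader than the null
rng = np.random.default_rng(1)
edge_r = rng.normal(0.05, 0.15, 10_000)
null_r = rng.normal(0.00, 0.10, 10_000)
```

Running tail_enrichment(edge_r, null_r) on such shifted toy data yields an enrichment well above 1, mirroring the qualitative pattern reported for the real edges.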
3.4. STRING Database and Literature Validation
We evaluated the 32 × 4 LazyNet subgraph against a filtered STRING network containing the 18,152 genes detected in our single-cell atlas [
42]. Of the 4630 genes retained by LazyNet, STRING supplies 411,761 undirected links, giving an observed density of 0.0384. Size-matched random gene sets drawn 1000 times from the same background average 0.0373 ± 0.0008, placing the subgraph 1.5 s.d. above expectation (empirical
p = 0.072), evidence of modest over-connectivity (
Supplementary S3A).
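The density permutation test can be sketched as follows. This is an illustrative version run on a toy adjacency matrix with one artificially dense module; the names and the toy network are ours, not the STRING data.

```python
import numpy as np

def density(adj, genes):
    """Undirected edge density among `genes` (boolean symmetric adjacency,
    zero diagonal); each edge is counted twice by sum, hence n*(n-1)."""
    sub = adj[np.ix_(genes, genes)]
    n = len(genes)
    return sub.sum() / (n * (n - 1))

def density_permutation_test(adj, subset, background, n_perm=1000, seed=0):
    """Compare the density of `subset` against size-matched random gene sets
    drawn from `background`; report empirical one-sided p and a z-score."""
    rng = np.random.default_rng(seed)
    obs = density(adj, subset)
    null = np.array([
        density(adj, rng.choice(background, size=len(subset), replace=False))
        for _ in range(n_perm)
    ])
    p = (1 + np.sum(null >= obs)) / (1 + n_perm)   # empirical one-sided p
    z = (obs - null.mean()) / null.std()           # s.d. above expectation
    return obs, p, z

# toy background network (density ~0.05) with one dense 30-gene module
rng = np.random.default_rng(0)
N = 200
adj = np.triu(rng.random((N, N)) < 0.05, 1)
adj = adj | adj.T
extra = np.triu(rng.random((30, 30)) < 0.3, 1)
adj[:30, :30] |= extra | extra.T
obs, p, z = density_permutation_test(adj, np.arange(30), np.arange(N))
```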
Edgewise, 11,662 LazyNet elasticity links intersect STRING in 523 cases (Jaccard ≈ 1 × 10⁻⁴). A one-tailed hypergeometric test confirms this overlap is highly non-random (p = 1.2 × 10⁻⁵). Although elasticity magnitudes and STRING scores show no global correlation (Pearson r = –0.071,
p = 0.10; Spearman ρ = –0.036), a top-quartile split in each metric separates three informative classes: 27 “both-high” edges, 104 STRING-high only (canonical interactions), and 104 LazyNet-high only—highlighting under-explored relationships for experimental follow-up. Representative literature support for ten both-high edges is provided in
Table 3.
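The one-tailed hypergeometric test used here asks how likely a random draw of edges from all possible gene pairs would capture at least the observed number of known STRING links. A log-space sketch (with toy parameters, not our actual counts) is:

```python
from math import lgamma, exp

def log_comb(n, k):
    """log of the binomial coefficient C(n, k), via log-gamma."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def hypergeom_sf(k_obs, M, K, n):
    """One-tailed P(X >= k_obs) when drawing n items from a population of M
    containing K successes (here: edges drawn from the universe of gene
    pairs, successes = known links). Log-space avoids huge binomials."""
    total = 0.0
    for k in range(k_obs, min(K, n) + 1):
        term = exp(log_comb(K, k) + log_comb(M - K, n - k) - log_comb(M, n))
        total += term
        if k > k_obs and term < 1e-18:   # remaining right tail is negligible
            break
    return total

# toy check: 20 draws from a 1000-item population containing 50 successes;
# the expected overlap is 1, so observing >= 5 should be rare
p = hypergeom_sf(5, M=1000, K=50, n=20)
```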
These moderate enrichments are acceptable, and indeed expected, because STRING confidence and LazyNet elasticity quantify different biological facets. STRING aggregates heterogeneous evidence (co-expression, physical binding, literature co-mentions), whereas elasticity captures the transcriptional impact inferred by our ODE model. Perfect concordance would therefore be unrealistic; instead, the modest density excess and the significant yet limited edge-level overlap demonstrate that LazyNet recovers a meaningful core of known biology while offering a large complement of novel, model-specific interactions.
3.5. Proteomics Corroborates Transcript-Derived Network Structure
Cross-omics robustness was assessed by projecting RNA-derived edges onto two independent mouse fibroblast SILAC proteomes of GPX4 loss-of-function ferroptosis. The inducible GPX4-knockout time course (PRIDE PXD050979) yielded 1026 quantifiable proteins after consolidation; 47 network edges had sufficient protein coverage (≥3 finite values per partner). Their absolute protein co-variation was higher than degree-matched random pairs (median |ρ| = 0.536 vs. 0.400; Wilcoxon
p = 4.9 × 10⁻³). The larger pulsed-SILAC study of constitutive GPX4 knockout (PRIDE PXD040094) provided 5516 proteins across 14 channels; of 563 mapped edges, 507 met the coverage criterion and likewise exceeded the null (median |ρ| = 0.367 vs. 0.336; Wilcoxon p = 6.2 × 10⁻⁴). Representative protein–protein scatters are shown in
Figure 4, including XDH–ANXA8 (ρ = 0.92), RPLP2–BAIAP2 (ρ = 0.89), NRDC–ENO3 (ρ = 0.88), and SFPQ–CKAP5 (ρ = 0.98), illustrating coherent proteomic co-variation across channels. Together with the enrichment tests, these results indicate that a substantial subset of LazyNet edges remains concordant at the protein layer, supporting the biological fidelity of the inferred network despite differences in species and experimental design.
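The per-edge protein test combines a Spearman correlation over shared finite channels with a rank-sum comparison against random pairs. A simplified sketch follows; the real analysis used degree-matched nulls and coverage filters, and the normal-approximation rank-sum below omits tie correction.

```python
import numpy as np
from math import erfc, sqrt

def abs_spearman(x, y):
    """|Spearman rho| between two protein profiles across channels,
    using only channels where both values are finite (no tie handling)."""
    m = np.isfinite(x) & np.isfinite(y)
    rx = np.argsort(np.argsort(x[m]))
    ry = np.argsort(np.argsort(y[m]))
    return abs(np.corrcoef(rx, ry)[0, 1])

def ranksum_one_sided_p(a, b):
    """Normal-approximation Wilcoxon rank-sum: is sample `a` (edge |rho|
    values) stochastically larger than `b` (random-pair |rho| values)?"""
    n1, n2 = len(a), len(b)
    ranks = np.argsort(np.argsort(np.concatenate([a, b]))) + 1
    U = ranks[:n1].sum() - n1 * (n1 + 1) / 2
    mu, sd = n1 * n2 / 2, sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return 0.5 * erfc((U - mu) / (sd * sqrt(2)))

# toy data: network edges co-vary more strongly than random pairs
rng = np.random.default_rng(2)
edge_abs_rho = rng.uniform(0.3, 0.9, 100)
null_abs_rho = rng.uniform(0.1, 0.6, 300)
p = ranksum_one_sided_p(edge_abs_rho, null_abs_rho)
```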
3.6. Biological Insights
The 32 × 4 subgraph seeded on the CRISPR A/I screen recapitulates the ferroptosis core: among the top 1000 LazyNet edges (
Supplementary S1C), 15 of the 27 benchmark regulators reported by Tian et al. cluster near the network center, consistent with the canonical lipid–peroxidation axis [
36].
Several connections align with established mechanisms, supporting the face validity of the wiring. A mitochondrial module links MFN2-mediated mitophagy, which constrains ROS-driven death, with SIRT3 signaling that modulates ferroptosis through GPX4 and AMPK–mTOR pathways [
49,
50]. Immune–redox cross-talk involving TLR4–STAT3 that amplifies mitochondrial ROS is also recovered [
51]. In parallel, ADCY10-driven mitochondrial cAMP has been shown to protect neurons under oxidative stress, matching the directionality of edges observed here [
52]. These concordances indicate that LazyNet recovers known biology across mitochondrial, immune, and metabolic axes.
Beyond literature-supported modules, the subgraph nominates cross-axis links that, to our knowledge, are not reported as direct connections and are presented as testable hypotheses. First, a lysosomal–autophagy route from PSAP to mTORC1 (MTOR/RPTOR/RHEB), potentially via ULK1-associated scaffolding (with PPP4R1 as a putative bridge), is consistent with PSAP-dependent lipofuscin accumulation and neuron-specific ferroptosis. Second, high-elasticity edges suggest immune inputs from TLR6 and CCR10 into autophagy/mTOR nodes (ATG13, MTOR). Third, a metabolic coupling is predicted in which TRPM6 (Mg2+ channel) and ADCY10 interface with the SIRT3–mTORC1 axis.
Taken together, the results point to an integrated immune–mitochondrial–lysosomal circuit in which mitochondrial quality control (MFN2/SIRT3), innate immune signaling (TLRs), and metabolic/cAMP cues (ADCY10/TRPM6) converge on mTOR-linked autophagy to tune the ferroptotic threshold. Edge-level attributes are provided in
Supplementary S1C to facilitate replication and follow-up.
4. Discussion
This study set out to answer a practical question for small labs and individual projects: given a single CRISPR A/I screen with modest sample size, limited compute, and no realistic path to large-scale pretraining, what modeling strategy is worth the effort? We designed LazyNet as a one-step, log–exp neural ODE that directly encodes multiplicative perturbation effects and exposes gene–gene elasticities as interpretable parameters. Under a strictly from-scratch, low-data regime—where all models, including scGPT-style and CellFM/RetNet-style baselines, are trained only on the study dataset under a 1 h time cap—LazyNet achieves competitive or better predictive performance for single-snapshot perturbation prediction while remaining numerically stable and CPU-tractable. These experiments do not evaluate fully pretrained foundation models in their intended high-data regime. When large-scale pretraining and fine-tuning are feasible, we expect scGPT and CellFM-style models to remain strong options, and we view LazyNet as a complementary choice tailored specifically to from-scratch, low-data settings.
A central feature of LazyNet is that its one-step, log–exp residual update exposes elasticities—local, directed sensitivities of genes to one another around a baseline. Averaging Jacobians across replicas produces a consensus, directed interaction matrix that we expand into a compact subgraph for interpretation. This graph recovers a substantial fraction of known ferroptosis regulators and shows coherent support across independent resources: enrichment in large-scale co-expression (ARCHS4), non-random overlap with curated functional associations (STRING), and concordant protein-level co-variation in GPX4 loss-of-function proteomes. The agreement is moderate rather than exhaustive—as expected given that elasticity (local, directional quantitative response) and database edges (heterogeneous, often undirected evidence) measure different facets of biology—but it is sufficient to anchor new, testable hypotheses, including a lysosomal–mitochondrial–immune module that nominates specific links into mTOR-connected autophagy.
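Schematically, the consensus step amounts to averaging numerical log-space Jacobians over replica models. The sketch below uses finite differences and toy replicas of the form x → exp(W log x), for which the elasticity matrix is recovered exactly; the names are illustrative, not LazyNet's internals.

```python
import numpy as np

def log_space_jacobian(f, x0, eps=1e-4):
    """Elasticity matrix J[i, j] = d log f_i / d log x_j at baseline x0:
    the local, directed sensitivity of gene i's predicted level to gene j."""
    G = len(x0)
    base = np.log(f(x0))
    J = np.zeros((G, G))
    for j in range(G):
        xp = x0.copy()
        xp[j] *= np.exp(eps)                 # small multiplicative bump to gene j
        J[:, j] = (np.log(f(xp)) - base) / eps
    return J

def consensus_elasticities(models, x0):
    """Average per-replica Jacobians into one directed interaction matrix."""
    return np.mean([log_space_jacobian(m, x0) for m in models], axis=0)

# toy replicas: log-linear maps whose true elasticities are the entries of W,
# each replica carrying small independent weight noise
rng = np.random.default_rng(0)
W = rng.normal(scale=0.3, size=(5, 5))
models = [
    (lambda x, Wr=W + rng.normal(scale=0.01, size=(5, 5)): np.exp(Wr @ np.log(x)))
    for _ in range(3)
]
J = consensus_elasticities(models, np.full(5, 2.0))
```

Because each toy replica is linear in log space, the averaged Jacobian J recovers the shared weight matrix W up to the injected replica noise.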
Equally important is practicality. LazyNet’s capacity is concentrated into a small set of mechanistically meaningful weights, which helps it converge quickly on CPUs without external priors, while still producing parameters that carry direct interpretive value (rate-like elasticities). The ensemble Jacobian pipeline is deterministic and light enough to run end-to-end on a single node, and the entire workflow—preprocessing, splits, commands, and seeds—is packaged for exact replication. In contrast, graph-inference tools optimized for edge ranking from co-expression can generate high-accuracy local subgraphs but are not designed to provide a globally consistent, directed dynamic map aligned with potential causality [
53]; for comparative purposes we therefore emphasize module-level enrichment against authoritative, curated resources and orthogonal experiments rather than raw edge-overlap counts.
Several limitations follow from design choices. Working in log space requires positive inputs and introduces mild dependence on normalization and pseudocounts. A single explicit Euler step matches two-snapshot designs but leaves the absolute Δt unidentified and cannot represent delays or oscillations; elasticities are local to the chosen baseline and may vary across states. Our enrichment tests rely on incomplete references whose coverage and evidence channels differ from the signal inferred by LazyNet; consequently, we do not expect high global concordance, and we treat overlaps as supportive rather than definitive. Finally, within a fixed wall-time budget, excessively large models can under-train and reduce accuracy despite higher nominal capacity.
Looking ahead, three directions appear most impactful. First, time-resolved perturbation series would allow multi-step fitting that estimates Δt, tests stability, and distinguishes fast from slow modes. Second, structured priors (e.g., motif-based regulator sets or pathway constraints) could be added in a transparent way—regularizing the elasticity matrix without sacrificing interpretability. Third, prospective validation—targeted CRISPR edits chosen from LazyNet’s high-elasticity, database-unsupported edges—will be essential to calibrate precision and to refine the expansion rules that balance depth with tractability. Within these bounds, our results suggest that a one-step neural ODE view can make routine Perturb-seq datasets actionable on commodity hardware, linking accurate prediction with mechanism-level hypotheses that are ready for bench testing.
5. Conclusions
Perturbation datasets remain costly to generate at scale and hard to harmonize across studies, so many groups benchmark on public resources while analyzing their own data with bespoke code. Our goal was a method that operates directly on local measurements, runs under modest compute, and yields parameters with clear mechanistic meaning. LazyNet meets this brief: a one-step neural ODE that trains on CPUs, reports reproducible predictions, and exposes directed elasticities suitable for network interpretation.
Empirically, under a strictly from-scratch, low-data setting with a 1 h training cap per model, LazyNet attains competitive or superior predictive accuracy with substantially fewer parameters and lower wall-time than strong baselines, while producing elasticities that show coherent external concordance at the module level. A central advantage is the log–linear–exp map, which exactly represents multiplicative terms. Synergistic, multi-locus effects therefore appear as explicit components rather than opaque composites, enabling practical design and prioritization of combinatorial perturbations from sparsely sampled CRISPR A/I data.
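This exact multiplicativity can be checked numerically. The toy below uses a generic one-step map x → exp(W log x + b) standing in for, not reproducing, LazyNet's trained parameters: perturbing two genes jointly yields exactly the product of the single-gene fold-changes, so any deviation from that product in a trained model is an explicit interaction signal rather than an approximation artifact.

```python
import numpy as np

rng = np.random.default_rng(0)
G = 6
W = rng.normal(scale=0.2, size=(G, G))
b = rng.normal(scale=0.1, size=G)

def step(x):
    """Generic one-step log-linear-exp update (illustrative, untrained)."""
    return np.exp(W @ np.log(x) + b)

x0 = np.full(G, 2.0)
base = step(x0)

def fold_change(j, delta):
    """Output fold-change when gene j's input level is multiplied by delta."""
    xp = x0.copy()
    xp[j] *= delta
    return step(xp) / base

fc_a = fold_change(0, 3.0)      # activate gene 0 three-fold
fc_b = fold_change(1, 0.5)      # knock gene 1 down two-fold
xp = x0.copy()
xp[0] *= 3.0
xp[1] *= 0.5
fc_ab = step(xp) / base         # joint perturbation of genes 0 and 1
```

Here fc_ab equals fc_a * fc_b to machine precision, because the map is linear in log space.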
Limits follow from design choices: two-snapshot learning cannot identify absolute Δt, delays, or oscillations; log-domain preprocessing (pseudocount, normalization) introduces mild dependencies; the update is Markovian without latent memory; and full transcriptome-scale expansion can be memory-intensive (partially mitigated with block-sparse top-K storage and streaming). Nonetheless, within a fixed 1 h budget, the model’s capacity-to-convergence tradeoff is favorable on CPUs, and its parameters remain directly interpretable.
Future work will extend validation to time-resolved perturbation series and additional cell contexts, incorporate lightweight structured priors (motifs, pathways) and uncertainty quantification, and perform prospective tests of synergy-driven predictions. Within these bounds, LazyNet offers a reproducible, resource-efficient route to synergy-aware mechanistic inference from perturbation data—bridging accurate prediction with directed, testable hypotheses.