A Quantum Strategy for the Simulation of Large Proteins: From Fragmentation in Small Proteins to Scalability in Complex Systems

Atchade-Adelomou, Parfait; Coronas Sala, Laia

doi:10.3390/electronics14132601

by Parfait Atchade-Adelomou ¹,²,* and Laia Coronas Sala ³

¹ Lighthouse Disruptive Innovation Group, LLC, 1 Broadway, 14th Floor, Cambridge, MA 02142, USA
² MIT Media Lab-City Science Group, Cambridge, MA 02139, USA
³ Lighthouse Disruptive Innovation Group Europe, SL., 08830 Barcelona, Spain
* Author to whom correspondence should be addressed.

Electronics 2025, 14(13), 2601; https://doi.org/10.3390/electronics14132601

Submission received: 20 May 2025 / Revised: 19 June 2025 / Accepted: 23 June 2025 / Published: 27 June 2025

(This article belongs to the Special Issue Recent Advances in Quantum Information)

Abstract

We present a scalable and resource-aware framework for the quantum simulation of large proteins, grounded in systematic molecular fragmentation, analytical Toffoli gate modeling, and empirical validation. The ground-state energy of a target biomolecule is reconstructed from capped amino acid fragments, with fixed corrections to account for artificial boundaries. Analytical cost estimates—derived from reduced Hamiltonians—are benchmarked against empirical Toffoli counts using PennyLane’s resource estimation module. Our model maintains predictive accuracy across biologically relevant systems of up to 1852 electrons, capturing consistent patterns across diverse fragments. This framework enables early-stage feasibility assessments for achieving quantum advantage in biochemical simulation pipelines.

Keywords: quantum computing; protein simulation; QMProt; fragmentation; Toffoli optimization; resource estimation; glucagon

1. Introduction

The quantum simulation of biomolecular systems remains a major computational challenge due to the exponential scaling associated with solving the electronic Schrödinger equation. Proteins, in particular, present a formidable case, as they involve a large number of electrons and complex many-body interactions [1,2]. Fragmentation techniques, which decompose macromolecules into smaller subsystems such as amino acids or peptides, have emerged as a viable approach to reducing computational overhead while preserving physical accuracy [3,4,5,6].

Quantum computing offers theoretical advantages in simulating electronic structure problems, where classical methods rapidly become intractable with system size [7,8]. By encoding quantum states directly in qubit-based representations, quantum computers avoid the combinatorial explosion characteristic of classical full-configuration interaction methods. Despite the availability of approximations such as DFT [9] and CAS-CI [10], these approaches face well-known limitations in accuracy or scalability, particularly for systems with strong electronic correlation [11,12].

Exact quantum chemical simulations remain out of reach for full protein systems, even with quantum hardware. As such, methodologically principled reductions in problem size are essential. Our objective is to evaluate a fragmentation-based framework for scalable quantum simulation of proteins, balancing error control with computational feasibility. The methodology builds on our prior work [3], which introduced a reassembly scheme for independently simulated fragments with chemical corrections. That work demonstrated high accuracy on small peptides, achieving relative errors of approximately 0.005% for amino acid-level fragmentation and 0.27% for finer subdivisions.

In this study, we extend the methodology to larger and biologically relevant peptides, including Glucagon, Oxytocin, Vasopressin, and Angiotensin II, which vary in electron counts and structural complexity. Among them, Glucagon stands out as a critical benchmark due to its physiological role and its size, comprising 29 amino acids and over 1800 electrons. Its simulation requires addressing more than 10⁴⁸ coefficients, posing a stringent test for the scalability of our approach.

By systematically evaluating prediction errors and quantum resource estimates across these molecules, we aim to assess the practical applicability of fragmentation-based strategies for quantum chemistry. This work contributes to the broader effort of making accurate electronic structure simulations tractable for systems of biological and chemical relevance. Beyond molecular simulation, the methodology has potential implications in areas such as drug discovery and materials design, where electronic structure accuracy is essential, and classical methods face intrinsic limitations.

The structure of the paper is as follows: Section 2 reviews related work in quantum simulation and fragmentation-based methods. Section 3 introduces our methodology, including regression models, Toffoli gate estimation, and the proposed multi-level fragmentation and reassembly strategy. Section 4 presents experimental validation on a range of peptides and proteins, from small dipeptides to large systems such as Glucagon, along with detailed error and scalability analyses. Section 5 discusses limitations, compares with alternative techniques, and outlines practical implications and future research. Section 6 summarizes our main findings and perspectives. Additional technical derivations and regression diagnostics are provided in the appendices.

2. Related Works

Quantum simulations of biomolecules [5,6,13] have progressed along two converging lines: (i) classical fragment-based electronic-structure methods that exploit locality to curb cost, and (ii) quantum algorithms that aggressively compress qubit and gate resources [14,15]. Our work positions itself at this intersection, extending classical fragmentation ideas into the quantum era and knitting together the most effective resource-optimization tools reported to date.

Fragmentation approaches such as the Fragment Molecular Orbital (FMO) method [16], Our own N-layered Integrated molecular Orbital and molecular Mechanics (ONIOM) [17], and adaptive Quantum Mechanics/Molecular Mechanics (QM/MM) [18] mitigate the exponential scaling bottleneck by decomposing proteins into chemically intuitive fragments. However, the high-level electronic treatment of each block is still limited to density-functional or semi-empirical accuracy.

$$E_{protein} = \sum_{i=1}^{n} E_{f_i} \pm \sum_{j=1}^{k} \Delta E_{coupling,j}, \tag{1}$$

$$\Delta E_{coupling,j} = \begin{cases} E_{am_j}, & \text{capping / missing groups}, \\ \sum_{n=2}^{N} E_{n\text{-body}}, & \text{many-body terms}. \end{cases} \tag{2}$$

In Equation (1), $E_{protein}$ denotes the total ground-state energy of the reassembled molecule, computed from a set of $n$ fragments. Each $E_{f_i}$ is the energy of fragment $i$, typically obtained from an independent quantum simulation. The term $\Delta E_{coupling,j}$ represents an additional correction associated with inter-fragment effects or artificial modifications introduced via fragmentation.

Equation (2) defines the two types of corrections that $\Delta E_{coupling,j}$ may include:

$E_{am_j}$: the energy of a group added or removed during fragmentation (e.g., a capping hydrogen atom to preserve chemical valency);
$\sum_{n=2}^{N} E_{n\text{-body}}$: a higher-order many-body interaction correction involving $n$ fragments simultaneously, as in the fragment molecular orbital (FMO) method [16].

The index $k$ denotes the total number of such coupling corrections considered. The ± sign in Equation (1) reflects the fact that these contributions can either increase or decrease the total energy, depending on whether the correction represents an additive or subtractive effect (e.g., insertion/removal of atoms or stabilizing/destabilizing couplings). By introducing the general symbol $\Delta E_{coupling,j}$, we unify the treatment of both structural and energetic corrections into a single formalism. This allows Equations (1) and (2) to remain valid across a wide range of fragmentation strategies, including those that incorporate post-reassembly many-body expansions (MBE).

Bowling et al. [19] exemplify this approach by combining single-residue fragmentation, minimal hydrogen capping, and screened two-body terms within a convergent many-body expansion (MBE) scheme:

$$E = \sum_{I} E_{I} + \sum_{I<J} \Delta E_{IJ} + \sum_{I<J<K} \Delta E_{IJK} + \dots \tag{3}$$

Each correction term accounts for interactions omitted at lower orders of the expansion. In particular, truncating the series at the n-body level defines the degree of approximation. For instance, the three-body correction is given by the following:

$$\Delta E_{IJK} = E_{IJK} - (E_{IJ} + E_{IK} + E_{JK}) + (E_{I} + E_{J} + E_{K}) \tag{4}$$
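The recursion behind Equations (3) and (4) can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of [19]: the function name `mbe_energy`, the dictionary layout, and the numerical energies are all hypothetical, and no screening of two-body terms is applied.

```python
from itertools import combinations


def mbe_energy(energies, fragments, order=3):
    """Truncated many-body expansion in the spirit of Equations (3)-(4).

    `energies` maps a frozenset of fragment labels to the energy of that
    monomer, dimer, or trimer. All values here are made-up placeholders.
    """
    corrections = {}  # cache: subset -> Delta E of that subset

    def delta(subset):
        # Delta E(S) = E(S) minus the corrections of all proper non-empty subsets.
        if subset not in corrections:
            lower = sum(
                delta(frozenset(c))
                for r in range(1, len(subset))
                for c in combinations(sorted(subset), r)
            )
            corrections[subset] = energies[subset] - lower
        return corrections[subset]

    # Sum every correction up to the requested truncation order.
    return sum(
        delta(frozenset(c))
        for r in range(1, order + 1)
        for c in combinations(sorted(fragments), r)
    )


# Hypothetical energies (hartree) for three fragments A, B, C.
toy = {
    frozenset("A"): -1.0, frozenset("B"): -2.0, frozenset("C"): -3.0,
    frozenset("AB"): -3.1, frozenset("AC"): -4.2, frozenset("BC"): -5.05,
    frozenset("ABC"): -6.4,
}
e_one_body = mbe_energy(toy, "ABC", order=1)
e_two_body = mbe_energy(toy, "ABC", order=2)
e_three_body = mbe_energy(toy, "ABC", order=3)
```

For a three-fragment system the expansion truncated at third order reproduces the full trimer energy, which is a convenient sanity check on the sign pattern of Equation (4).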

Although many-body expansions offer chemically accurate results, their combinatorial scaling limits practical applicability. To address this, our method introduces resource-aware fragmentation, statistical estimation, and circuit-level compression to ensure scalability. Prior works, such as MFCC-MBE(2) [20], have improved classical accuracy by incorporating fragment and cap interactions. Extending these ideas, our framework replaces classical subroutines with quantum solvers while preserving compatibility with post-fragmentation corrections, thus enabling seamless integration into hybrid quantum–classical approaches.

Three breakthroughs underpin our scalable workflow:

Local qubit tapering. Extending the symmetry-based tapering of Bravyi et al. [21], we identify $Z_{2}$ symmetries within each fragment, removing ∼4–6 logical qubits on average.
SelectSwap oracle synthesis. The SelectSwap network of Zhu et al. [22] prepares fragment phase oracles at a cost of $O(\sqrt{2^{n_f} \log(1/\varepsilon)})$ T gates, where $n_f = \lceil \log_2 N_{coeff} \rceil$ is the number of logical qubits required to represent the $N_{coeff}$ diagonal coefficients of the fragment.
Optimal state preparation [23]. Diagonal-unitary synthesis plus exact amplitude amplification reduces the non-Clifford depth by 20–50% in published benchmarks (22% for QAOA, 50% for random diagonals) [24,25].

Together, these optimizations shrink the space–time volume of a 400-orbital active site by nearly two orders of magnitude versus the double-factorized algorithm of von Burg et al. [26].

Most prior quantum-chemistry demonstrations target small peptides ($< 200\, e^{-}$) or model chromophores. To probe genuine scalability, we select four bio-relevant hormones spanning electron counts from 536 to 1852 and roughly 30 orders of magnitude in the Hamiltonian-coefficient space:

Glucagon (29aa, $1852 e^{-}$ ) — $4.33 \times 10^{48}$ coefficients; 2679 logical qubits after tapering.
Oxytocin (9aa, $536 e^{-}$ ) — $8.85 \times 10^{17}$ coefficients; 778 qubits.
Vasopressin (9aa, $1134 e^{-}$ ) — $7.81 \times 10^{31}$ coefficients; 1641 qubits.
Angiotensin II (8aa, $558 e^{-}$ ) — $2.88 \times 10^{18}$ coefficients; 809 qubits.

These systems fill the gap between toy peptides and full enzymes—precisely the scale at which the existing methods begin to break down and where our integrated pipeline is explicitly designed to operate.

Tensor-network simulation methods (e.g., DMRG [27], MPS [28], and TTN [29]) are highly efficient for systems with low or structured entanglement. However, when applied to quantum circuits that generate strong global entanglement—such as GHZ preparation, quantum Fourier transform, or Hamiltonian evolution—the bond dimension grows exponentially, making contraction intractable [29,30]. This limits their applicability to deep circuits or realistic biomolecules.

Most current approaches to quantum simulation remain limited to small peptides with fewer than 150 electrons, where accurate reassembly has been demonstrated with sub-1% errors. However, no previous method integrates fragmentation-aware oracle synthesis and tapering into a scalable pipeline for biomolecules with 500–2000 electrons. Our work addresses this gap by combining these techniques into a unified, resource-efficient framework capable of operating at the hormone scale. While challenges remain—such as chemically informed fragmentation, correlation beyond MP2 [31], and cross-fragment error mitigation—recent advances in entanglement-guided heuristics [32] suggest promising directions to extend the approach.

3. Methodology

Our methodology addresses the challenges of simulating large protein systems on quantum computers by combining fragmentation strategies with advanced quantum algorithms. Below, we outline the key components of our approach, from data modeling and resource estimation to fragmentation and reassembly techniques.

3.1. Fragmentation and Recombination Strategy

Our methodology builds upon the fragmentation protocol introduced in the QMProt framework [33], extending it to enable scalable quantum simulations of large biomolecules. The core idea is to approximate the ground-state energy of a full protein or peptide by summing the energies of chemically meaningful fragments, primarily based on individual amino acid residues. Each fragment is saturated with hydrogen atoms or small chemical groups to preserve valency and molecular stability.

In our framework, the reassembled ground-state energy $E_m$ is given by

$$E_m = \sum_{i=1}^{n} E_{f_i} \pm \sum_{j=1}^{k} E_{am_j}, \tag{5}$$

where the following applies:

n denotes the total number of fragments generated;
$E_{f_{i}}$ is the ground-state energy (GSE) of fragment i;
k is the number of small molecules added or removed;
$E_{a m_{j}}$ is the GSE of each such molecule;
$E_{m}$ is the final GSE of the reassembled molecule.

Here, the index i runs from 1 to n, enumerating each fragment, and j runs from 1 to k, enumerating each added or removed molecule.

To correct the artificial effects introduced at the cutting points, we apply fixed energy corrections based on the capping strategy:

If a methyl group (CH₃) is added to complete valency, we add the energy of a methane molecule as a corrective term.
If a water molecule is implicitly removed during fragmentation (e.g., in peptide bonds), its reference energy is subtracted.
If no group is added or removed, no correction is applied.

These correction values are precomputed and consistently reused. This avoids the need for iterative many-body corrections or self-consistent inter-fragment coupling, drastically reducing computational complexity compared to traditional fragmentation schemes.
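The bookkeeping of Equation (5) and the three capping rules can be sketched as follows. The function names and all numerical energies are hypothetical placeholders chosen for illustration; they are not values from the paper or from QMProt.

```python
def reassemble_energy(fragment_energies, corrections):
    """Equation (5): sum fragment energies, then apply signed fixed corrections.

    `corrections` is a list of (sign, energy) pairs. Following the rules in
    the text, sign is +1 when a reference energy is added (methyl capping,
    using methane) and -1 when one is subtracted (implicitly removed water).
    """
    total = sum(fragment_energies)
    for sign, energy in corrections:
        if sign not in (1, -1):
            raise ValueError("sign must be +1 or -1")
        total += sign * energy
    return total


def relative_error_percent(e_reference, e_reassembled):
    """Relative error in percent between a reference and a reassembled energy."""
    return 100.0 * abs(e_reassembled - e_reference) / abs(e_reference)


# Placeholder reference energies in hartree (illustrative only).
E_CH4 = -40.20
E_H2O = -76.03
fragments = [-280.0, -320.5]  # two hypothetical fragment energies

e_m = reassemble_energy(fragments, [(+1, E_CH4), (-1, E_H2O)])
re_percent = relative_error_percent(-565.0, e_m)  # -565.0 is a made-up reference
```

Because the corrections are fixed lookups, the cost of reassembly is linear in the number of fragments and cuts, which is the simplification the paragraph above relies on.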

We apply this fragmentation strategy at two levels:

Amino acid level: Each amino acid is split into radical and backbone groups. We benchmark the ground-state energy (GSE) and resource estimates (e.g., qubits, Toffoli gates) of the full amino acid against the sum of its fragments, quantifying the reduction factor.
Peptide/protein level: For representative peptides such as Oxytocin, Angiotensin II, and Glucagon, we compute total energy as the sum of fragment energies plus corrective terms, as expressed in Equation (1). We compare this estimate to the full molecular simulation to evaluate accuracy and resource savings.

This approach maintains high computational efficiency while ensuring that GSE estimates remain within acceptable error bounds. The residual recombination error grows with the number of fragments and is influenced by the precision of the energy evaluation for each fragment. In our current implementation, calculations are performed at the Hartree–Fock (HF) level, which introduces some limitations for large systems. However, the modular nature of the method makes it compatible with more accurate solvers (e.g., DFT, MP2, or quantum algorithms), which are expected to further reduce the error as these techniques mature. Thus, our strategy provides a robust foundation for efficient and scalable quantum simulations of biomolecular systems.

3.2. Modeling Based on Experimental Data

To support our modeling and prediction efforts, we use the QMProt dataset [33]. This dataset encompasses 45 carefully selected organic molecules, with a particular focus on the 20 canonical amino acids essential to human biology. Each molecule is decomposed into chemically meaningful subunits, including amino and carboxyl termini, central α-carbon atoms, and characteristic side chains.

The molecules in QMProt are composed primarily of non-hydrogen atoms such as carbon, nitrogen, oxygen, and sulfur, and they contain up to 15 heavy atoms. For each molecular entry, the dataset provides the following:

The total number of electrons and molecular orbitals.
The corresponding number of logical qubits required for simulation.
The full Hamiltonian encoded as a set of quantum coefficients.
Ground-state energy estimates derived from quantum mechanical methods.
Additional physicochemical attributes relevant for simulation benchmarking.

This dataset bridges quantum chemical characterization with quantum resource modeling, offering a representative basis for scaling predictions to larger biomolecules. Using this foundation, we develop regression models to forecast quantum resource needs—including qubits and gate counts—based on fundamental descriptors such as the electron number, enabling extrapolation to peptides and protein fragments well beyond the initial dataset.

3.2.1. Linear Model for Qubits

We assume a linear relationship between the number of qubits and the number of electrons:

$$n_{qubits} = \alpha + \beta \cdot n_{electrons} + \varepsilon, \tag{6}$$

where $\alpha$ is the intercept, $\beta$ is the slope, and $\varepsilon$ is the residual error. The model is fitted using ordinary least squares (OLS), minimizing the squared residuals:

$$\min_{\alpha,\beta} \sum_{i=1}^{n} \left( n_{qubits}^{(i)} - \left(\alpha + \beta\, n_{electrons}^{(i)}\right) \right)^{2}. \tag{7}$$

The analytical solution involves solving the normal equations:

$$\begin{bmatrix} n & \sum n_{electrons} \\ \sum n_{electrons} & \sum n_{electrons}^{2} \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix} = \begin{bmatrix} \sum n_{qubits} \\ \sum n_{electrons}\, n_{qubits} \end{bmatrix}. \tag{8}$$
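A minimal sketch of solving the 2×2 normal equations (8) in closed form is given below. The function name is hypothetical, and the inputs are the electron and qubit counts of the four hormones listed in Section 2, used here purely as illustrative data rather than as the authors' training set.

```python
def fit_linear_ols(x, y):
    """Solve the normal equations (8) for (alpha, beta) by Cramer's rule."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("all x values are identical; slope is undefined")
    alpha = (sxx * sy - sx * sxy) / det
    beta = (n * sxy - sx * sy) / det
    return alpha, beta


# Oxytocin, Angiotensin II, Vasopressin, Glucagon (values quoted in Section 2).
electrons = [536, 558, 1134, 1852]
qubits = [778, 809, 1641, 2679]

alpha, beta = fit_linear_ols(electrons, qubits)
predicted_glucagon = alpha + beta * 1852
```

On these four points the fitted slope is close to 1.44 qubits per electron, which is consistent with the near-linear growth assumed in Equation (6).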

3.2.2. Log-Linear Robust Model for Qubits

While the linear model offers a simple baseline, an empirical evaluation revealed that the relationship between electron count and qubit requirements is more accurately captured using a log-transformed regression. Accordingly, we consider the following:

$$\log(n_{qubits}) = \alpha + \beta \cdot n_{electrons} + \varepsilon, \tag{9}$$

where the parameters $\alpha$ and $\beta$ are estimated using robust methods, such as the Huber Regressor [34] and the Theil–Sen Estimator [35], to mitigate the influence of outliers.

To account for structural diversity, we partition the dataset into three molecular complexity segments: small (electron count ≤ 150), medium (151–500), and large (>500), enabling localized regressions adapted to each scale. Predictions are returned to the original scale via the following:

$$\hat{n}_{qubits} = \exp(\alpha + \beta \cdot n_{electrons}). \tag{10}$$
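To make the robust log-linear fit concrete, the sketch below implements a plain Theil–Sen estimator (median of pairwise slopes) on log-transformed qubit counts, together with the three-segment split. It is a simplified stand-in written with the standard library only; the function names and the synthetic data, including the injected outlier, are assumptions for illustration, and the Huber variant is not shown.

```python
import math
from itertools import combinations
from statistics import median


def segment(n_electrons):
    """Complexity segment used for localized regressions (thresholds from the text)."""
    if n_electrons <= 150:
        return "small"
    if n_electrons <= 500:
        return "medium"
    return "large"


def fit_log_theil_sen(electrons, qubits):
    """Theil-Sen fit of log(n_qubits) = alpha + beta * n_electrons, Equation (9)."""
    log_q = [math.log(q) for q in qubits]
    slopes = [
        (log_q[j] - log_q[i]) / (electrons[j] - electrons[i])
        for i, j in combinations(range(len(electrons)), 2)
        if electrons[j] != electrons[i]
    ]
    beta = median(slopes)
    # Intercept taken as the median residual after removing the slope.
    alpha = median([lq - beta * x for lq, x in zip(log_q, electrons)])
    return alpha, beta


def predict_qubits(alpha, beta, n_electrons):
    """Back-transform to the original scale, Equation (10)."""
    return math.exp(alpha + beta * n_electrons)


# Synthetic data: exact log-linear trend with one deliberate outlier at x = 30.
xs = [10, 20, 30, 40, 50]
qs = [math.exp(2.0 + 0.03 * x) for x in xs]
qs[2] *= 5.0

alpha_ts, beta_ts = fit_log_theil_sen(xs, qs)
```

Because both estimates are medians, the single corrupted point leaves the recovered slope and intercept essentially unchanged, which is the property that motivates robust estimators here.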

3.2.3. Exponential Model for Hamiltonian Coefficients

We hypothesize an exponential growth in the number of coefficients with respect to the electron count:

$$n_{coef} = a \cdot \exp(b \cdot n_{electrons}), \tag{11}$$

with $a$ and $b$ as model parameters. Taking the natural logarithm yields a linearized version:

$$\log(n_{coef}) = \log(a) + b \cdot n_{electrons} + \varepsilon, \tag{12}$$

which is fitted using OLS. The predicted values in the original scale are obtained via exponentiation:

$$\hat{n}_{coef} = \exp(\log(a) + b \cdot n_{electrons}). \tag{13}$$
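As a worked illustration of Equations (11)–(13), the sketch below fits the log-linearized model by OLS to the four coefficient counts quoted for the hormones in Section 2. These inputs are used only as illustrative data (they are not claimed to be the authors' fitting set), and the function names are hypothetical.

```python
import math


def fit_exponential(electrons, coefficients):
    """OLS fit of log(n_coef) = log(a) + b * n_electrons, Equation (12)."""
    logs = [math.log(c) for c in coefficients]
    n = len(electrons)
    x_bar = sum(electrons) / n
    l_bar = sum(logs) / n
    sxx = sum((x - x_bar) ** 2 for x in electrons)
    sxl = sum((x - x_bar) * (l - l_bar) for x, l in zip(electrons, logs))
    b = sxl / sxx
    log_a = l_bar - b * x_bar
    return log_a, b


def predict_coefficients(log_a, b, n_electrons):
    """Back-transform to the original scale, Equation (13)."""
    return math.exp(log_a + b * n_electrons)


# Oxytocin, Angiotensin II, Vasopressin, Glucagon (values quoted in Section 2).
electrons = [536, 558, 1134, 1852]
coefficients = [8.85e17, 2.88e18, 7.81e31, 4.33e48]

log_a, b = fit_exponential(electrons, coefficients)
angiotensin_estimate = predict_coefficients(log_a, b, 558)
```

On these inputs the fitted rate is roughly 0.054 per electron, so each additional electron multiplies the coefficient count by about 1.055 under the model.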

3.2.4. Confidence Intervals

For the linear model, the 95% confidence interval for a prediction $\hat{y}$ is as follows:

$$\hat{y} \pm t_{n-2,\,0.975} \cdot s_{fit}(x), \tag{14}$$

with

$$s_{fit}(x) = \hat{\sigma} \cdot \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^{2}}{\sum_{i} (x_{i} - \bar{x})^{2}}}, \tag{15}$$

where $x$ is the number of electrons for which the confidence interval is being computed, and $\bar{x}$ denotes the sample mean of the observed electron counts. Here, $\hat{\sigma}$ denotes the estimated standard deviation of the residuals, calculated as $\hat{\sigma} = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}$, assuming the model is fitted using ordinary least squares, and $t_{n-2,\,0.975}$ denotes the critical value from the Student's t-distribution with $n - 2$ degrees of freedom, corresponding to a two-tailed confidence level of 95%.
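Equations (14) and (15) translate directly into code. In the sketch below the critical value is passed in by the caller (for example from a t-table), since the Python standard library does not provide Student-t quantiles; the function name and the five data points are hypothetical.

```python
import math


def ols_confidence_interval(x, y, x_new, t_crit):
    """Confidence interval of Equations (14)-(15) at x_new.

    Returns (lower, prediction, upper). `t_crit` is t_{n-2, 0.975},
    supplied by the caller.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    alpha = y_bar - beta * x_bar
    residuals = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]
    sigma_hat = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    s_fit = sigma_hat * math.sqrt(1.0 / n + (x_new - x_bar) ** 2 / sxx)
    y_hat = alpha + beta * x_new
    return y_hat - t_crit * s_fit, y_hat, y_hat + t_crit * s_fit


# Hypothetical (electrons, qubits) pairs; t_{3, 0.975} is approximately 3.182.
xs = [10, 20, 30, 40, 50]
ys = [21, 33, 48, 60, 77]
T_CRIT_DF3 = 3.182

at_mean = ols_confidence_interval(xs, ys, 30, T_CRIT_DF3)
extrapolated = ols_confidence_interval(xs, ys, 100, T_CRIT_DF3)
```

The interval is narrowest at the sample mean of the electron counts and widens with distance from it, which is why extrapolation to large peptides carries larger uncertainty than interpolation within the dataset.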

For the log-linear robust model, the 95% confidence interval is computed using the delta method. Let $\hat{y} = \alpha + \beta x$ denote the log-qubit prediction. The Jacobian vector with respect to the parameters is as follows:

$$J(x) = \begin{bmatrix} \frac{\partial \hat{y}}{\partial \alpha} & \frac{\partial \hat{y}}{\partial \beta} \end{bmatrix} = [1,\ x], \tag{16}$$

and the variance of the prediction is as follows:

$$\mathrm{Var}(\hat{y}) = J(x) \cdot \Sigma \cdot J(x)^{\top}, \tag{17}$$

where $\Sigma$ is the estimated covariance matrix of the fitted parameters. The resulting interval is given by the following:

$$\hat{y} \pm t_{n-2,\,0.975} \cdot \sqrt{\mathrm{Var}(\hat{y})}, \tag{18}$$

and transformed back to the original scale as follows:

$$\hat{n}_{qubits} \in [\exp(\hat{y} - \Delta),\ \exp(\hat{y} + \Delta)], \tag{19}$$

where $\Delta = t_{n-2,\,0.975} \cdot \sqrt{\mathrm{Var}(\hat{y})}$. A complete derivation of these expressions and additional justifications for the segment-wise log-qubit modeling can be found in Appendix D.
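The delta-method interval of Equations (16)–(19) reduces to a small quadratic form. The sketch below assumes the parameter covariance matrix is already available from the fitting routine; the function names, parameter values, and covariance entries are hypothetical.

```python
import math


def prediction_variance(cov, x):
    """Var(y_hat) = J Sigma J^T with J = [1, x], Equations (16)-(17)."""
    jac = (1.0, float(x))
    return sum(jac[i] * cov[i][j] * jac[j] for i in range(2) for j in range(2))


def log_qubit_interval(alpha, beta, cov, x, t_crit):
    """Back-transformed interval of Equations (18)-(19): (lower, point, upper)."""
    y_hat = alpha + beta * x
    delta = t_crit * math.sqrt(prediction_variance(cov, x))
    return math.exp(y_hat - delta), math.exp(y_hat), math.exp(y_hat + delta)


# Hypothetical fitted parameters and covariance matrix (illustrative only).
alpha_hat, beta_hat = 2.0, 0.01
sigma = [[0.04, -0.0001],
         [-0.0001, 1e-6]]

var_100 = prediction_variance(sigma, 100)
low, point, high = log_qubit_interval(alpha_hat, beta_hat, sigma, 100, t_crit=2.0)
```

Note that the interval is symmetric in log space but asymmetric after exponentiation: the point estimate is the geometric mean of the two bounds, not their arithmetic mean.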

3.2.5. Error Metrics

We evaluate model performance using standard regression error metrics commonly employed in predictive modeling. The root mean squared error (RMSE) and the mean absolute error (MAE) are used to quantify the average magnitude of residuals, with RMSE giving greater weight to larger errors and MAE providing a more robust central tendency measure less affected by outliers [36].

To contextualize the errors relative to the scale of the observed values, we also report the mean relative error (MRE), expressed as a percentage. Additionally, we compute the relative standard deviation ($\sigma_{rel}$), which captures the dispersion of predicted values normalized by their mean. These metrics provide a consistent basis for comparing model outputs across different experimental conditions.
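The four metrics can be computed as in the sketch below. The exact conventions are assumptions on our part: MRE is taken as the mean of absolute relative errors in percent, and $\sigma_{rel}$ uses the population standard deviation of the predictions divided by their mean. The sample values are made up.

```python
import math
from statistics import mean, pstdev


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, MRE (percent), and relative standard deviation."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(mean([e * e for e in errors]))
    mae = mean([abs(e) for e in errors])
    # Assumed convention: mean absolute relative error, in percent.
    mre = 100.0 * mean([abs(e) / abs(t) for e, t in zip(errors, y_true)])
    # Assumed convention: population std of predictions over their mean.
    sigma_rel = pstdev(y_pred) / mean(y_pred)
    return {"rmse": rmse, "mae": mae, "mre_percent": mre, "sigma_rel": sigma_rel}


# Made-up observed and predicted qubit counts.
metrics = regression_metrics([100, 200, 400], [110, 190, 400])
```

In this toy case the RMSE exceeds the MAE because squaring weights the two nonzero residuals more heavily than the exact third prediction.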

3.3. Estimation of Toffoli Gate Count

To estimate the quantum cost, we employ a function inspired by the SelectSwap algorithm [15]. This algorithm achieves a quadratic improvement in gate counts by reducing the number of Toffoli gates to a scale of $f \cdot \sqrt{M}$, where $M$ is the number of coefficients and $f$ is a multiplicative factor. Without SelectSwap, the cost grows proportionally to $M$ and requires $\log(M)$ ancillary qubits.

The Toffoli gate count is estimated using the following function:

$$T_{toffoli} = C_{1} \sqrt{2^{n} \cdot \log_{2}\left(\frac{n}{\epsilon}\right)} + C_{2} \log_{2}\left(\frac{n}{\epsilon}\right), \tag{20}$$

where $n = \lceil \log_{2}(n_{coef}) \rceil$ and $C_{1}, C_{2} \in \mathbb{R}$.

We define the target precision $\epsilon$ as a function of fragment size, setting $\epsilon = \frac{n}{2^{b}}$ for a fragment comprising $n$ logical qubits. Here, $b$ denotes the resolution parameter of the quantum hardware, capturing the logarithmic relationship between numerical accuracy and the size of the Hamiltonian coefficient vector. In our experiments, we fix $b = 20$, corresponding to a 20-bit precision level that reflects realistic thresholds for fault-tolerant quantum architectures. This formulation is consistent with practical accuracy requirements and aligns with resource scaling trends reported in prior studies [22].

Equation (20) expresses the asymptotic scaling of the SelectSwap algorithm and serves as a practical estimator for Toffoli gate counts in fault-tolerant quantum simulations. For all experiments, we assume constant factors $C_{1} = C_{2} = 3$ unless otherwise stated. Additional implementation details and derivations are provided in Appendix B.
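Equation (20) is straightforward to evaluate numerically. The sketch below follows the stated definitions literally; note that with $\epsilon = n / 2^{b}$ the logarithmic term $\log_{2}(n/\epsilon)$ simplifies to $b$. The function name and the guard for fewer than two coefficients are our own additions.

```python
import math


def toffoli_estimate(n_coef, b=20, c1=3.0, c2=3.0):
    """Analytical Toffoli estimate of Equation (20).

    n = ceil(log2(n_coef)) and eps = n / 2**b, as defined in the text.
    """
    if n_coef < 2:
        raise ValueError("need at least two coefficients so that n >= 1")
    n = math.ceil(math.log2(n_coef))
    eps = n / 2 ** b
    log_term = math.log2(n / eps)  # equals b under the definition of eps
    return c1 * math.sqrt(2 ** n * log_term) + c2 * log_term


# Coefficient counts quoted in Section 2 for Oxytocin and Glucagon.
t_oxytocin = toffoli_estimate(8.85e17)
t_glucagon = toffoli_estimate(4.33e48)
t_toy = toffoli_estimate(16)  # n = 4, so T = 3 * sqrt(16 * 20) + 3 * 20
```

The first term dominates for any realistic fragment, so the estimate grows roughly as the square root of the number of coefficients, in line with the SelectSwap scaling described above.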

3.4. Experimental Validation

To validate our analytical Toffoli-gate estimates, we employ PennyLane’s resource estimation module, available through the pennylane.labs.resource_estimation interface (Commit 5fc5d02308d25831594ecb41e4acc6f74ce2da30 on GitHub; not yet part of the stable release or fully documented at the time of our study). Using this framework, we instantiate both QROM-based state preparation circuits [37] and qubitization routines [38] for each fragment and benchmark peptide, adapting the templates to match the structure of our reduced Hamiltonians.

For each system, we extract empirical Toffoli counts and compare them against the theoretical predictions obtained from Equations (6)–(20). This experimental procedure serves to verify the internal consistency of our cost model and ensure compatibility with circuit-level resource estimators in current quantum toolchains.

4. Results

We begin by evaluating the accuracy of our fragmentation strategy via regression analysis. Figure 1 shows an exponential fit of the total number of Hamiltonian coefficients versus the electron count, including a 95% confidence interval. In addition, Figure 2 provides a complementary linear interpolation for the smallest molecules in our dataset.

Next, we quantify recombination and estimation errors. Table 1 reports fragment recombination errors across all peptides, confirming minimal error accumulation for small fragments and identifying increased errors for larger systems.

We then examine resource-scaling trends. In Figure 3, we plot both Toffoli gate counts and qubit requirements against molecular size, comparing the original full-Hamiltonian approach to our base-structure method. Table 2 summarizes these benchmarks (number of coefficients, Toffoli estimates, reduction factors, and electron counts) for every fragment.

To validate regression robustness, Table 3 presents performance metrics for robust regression models (Huber with varying $\epsilon$ and Theil–Sen) in predicting log-transformed qubit counts across small, medium, and large molecular complexity segments.

Finally, to ground our theoretical Toffoli formula (Equation (20)) in real circuit data, we measured gate counts using PennyLane’s resource_estimation module [39]. Table 4 reports these empirical gate counts alongside the derived coefficients $C_{1}$ and $C_{2}$ and their relative errors.

Using our regression models—linear for qubits and exponential for coefficients—we predicted the resources required to simulate a variety of proteins and peptides, including both well-known and novel examples. Our models provide the following key insights:

Number of coefficients ( $n_{coef}$ ): The number of Hamiltonian coefficients exhibits exponential growth with the number of electrons, as captured by the exponential model in Equation (11).
Number of qubits ( $n_{qubits}$ ): The number of qubits grows approximately linearly with the number of electrons, as described by the linear model in Equation (6).
Number of Toffoli gates: Fragmentation occasionally introduces overhead for small amino acids, due to duplicated setup costs and additional reassembly steps, but it proves advantageous at the peptide scale, where monolithic encodings become intractable. This trade-off is acceptable given the extremely low recombination errors observed for small systems and the exponential savings in larger ones, so fragmentation remains a viable strategy for reducing resource requirements as system size grows.

These predictions were validated on a diverse set of intermediate systems and then extrapolated to full hormone peptides—such as Glucagon—demonstrating that the model retains predictive power across four orders of magnitude in electron count and 30 orders of Hamiltonian complexity.

To further analyze the scalability of our approach, we generated log-log plots that relate the following quantities:

$n_{coef}$ versus $n_{electrons}$ ,
Toffoli gates versus $n_{electrons}$ ,
Reduction factor versus $n_{electrons}$ ,
Total qubits versus $n_{electrons}$ .

These plots reveal that, although the quantum cost increases with system size, the application of fragmentation and the SelectSwap algorithm significantly mitigates this increase. Specifically, the reduction factor achieved through fragmentation keeps the gate count and qubit requirements within acceptable ranges, even for large systems. Table 2 supports these observations, reporting the reduction in coefficients and Toffoli gates achieved by our approach at both the protein and amino acid fragmentation levels. These results confirm that, while quantum resource scaling remains exponential, targeted fragmentation combined with modern quantum oracles offers a tractable and chemically accurate pathway to simulate biologically meaningful molecules on future quantum hardware.

To extend the strategy of our previous work [3], we assess the accuracy of our fragmentation strategy by comparing reference energies ($E_{GT}$) with calculated energies ($E_{m}$) for a set of larger peptides. Table 1 summarizes the number of electrons, orbitals, theoretical energy, computed energy, and relative error (%RE), starting with some small peptides included in our previous work and continuing with much larger ones to show how accuracy changes as the system scales up. Key observations include the following:

Small peptides: Relative errors of 0.0005–0.0065% in dipeptides (e.g., Gly-Gly, Pro-Gly, Gly-Ala). This confirms the high accuracy of our fragmentation strategy for small systems.
Intermediate peptides: Some (e.g., Aspartame and Phe-Ile) exhibit slightly higher errors (up to 0.065%), confirming again the accuracy of the strategy.
Large systems: In molecules with hundreds of electrons (e.g., Angiotensin II and IV, Oxytocin, Glucagon), the relative error increases (between 2 and 3%), highlighting the need for further optimization strategies, even though the errors remain within acceptable limits for practical applications.

Our combined methodology (statistical modeling, hierarchical fragmentation, and quantum resource optimization) enables the accurate simulation of complex biomolecules at an unprecedented scale. These results lay the groundwork for applications in quantum biochemistry in the near term and beyond, and offer a scalable framework compatible with fault-tolerant quantum computing architectures.

Experimental Validation

Table 4 presents a detailed comparison between theoretical Toffoli gate counts, computed via Equation (20) with fixed parameters $C_{1} = C_{2} = 3$, and empirical counts (Toffoli*) obtained using the pennylane.labs.resource_estimation [40] module for each fragment’s qubitization circuit. Additionally, we perform an ordinary least squares (OLS) fit across all samples to extract empirical scaling coefficients $\langle C_{1} \rangle$ and $\langle C_{2} \rangle$, along with their deviations. Per-fragment parameter variations under different precision settings $(\varepsilon_{1}, \varepsilon_{2})$ are detailed in Appendix C.

5. Discussion

The results presented demonstrate that our fragmentation–reassembly protocol effectively decomposes large protein Hamiltonians into tractable subunits while preserving overall accuracy. By applying Equation (20) with fixed parameters ($C_{1} = C_{2} = 3$) and validating against PennyLane’s resource_estimation [39] (Table 4), we achieve sub-3% relative energy errors alongside reductions of up to 20 orders of magnitude in Toffoli gate counts (Table 2). This dual attainment of precision and scalability is essential for hybrid quantum–classical workflows on near-term devices.

A key strength of our methodology lies in its modular error propagation: the overall simulation error results from the additive contributions of each independently simulated fragment. Consequently, any improvement in fragment accuracy, whether through higher-order quantum algorithms (e.g., quantum phase estimation [41]) or improved state preparation circuits [23], directly results in a lower global error. As logical qubit fidelity advances, we therefore anticipate convergence toward chemical precision without altering the core fragmentation scheme.

Our resource-prediction framework is underpinned by robust regression, with Huber estimators ($ϵ \simeq 2.0$) delivering superior outlier resistance and fit quality across small, medium, and large fragment regimes (Table 3). Compared to Theil–Sen, Huber regression maintains higher $R^{2}$ values and lower MAE in log space, justifying its selection as the primary model for qubit- and gate-count estimation. The trends illustrated in Figure 3 further corroborate the consistency of these predictions across molecular sizes.
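To make the outlier resistance concrete, the following Python sketch fits a straight line with a Huber-type loss. It is not the implementation used in this work: it assumes a plain iteratively reweighted least-squares scheme with a median-based residual scale, and the names `weighted_line_fit` and `huber_line_fit` as well as the synthetic data are illustrative.

```python
import statistics

def weighted_line_fit(xs, ys, ws):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def huber_line_fit(xs, ys, epsilon=2.0, iterations=50):
    """Huber-type fit of y = a + b*x by iteratively reweighted least squares."""
    a, b = weighted_line_fit(xs, ys, [1.0] * len(xs))
    for _ in range(iterations):
        residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
        # Robust scale: median absolute residual, rescaled for Gaussian noise.
        scale = 1.4826 * statistics.median(abs(r) for r in residuals)
        if scale == 0.0:
            break
        cutoff = epsilon * scale
        # Residuals beyond the cutoff are down-weighted, not discarded.
        ws = [1.0 if abs(r) <= cutoff else cutoff / abs(r) for r in residuals]
        a, b = weighted_line_fit(xs, ys, ws)
    return a, b

# Synthetic data on the line y = 1 + 0.01*x with a single injected outlier.
xs = [10.0 * k for k in range(1, 11)]
ys = [1.0 + 0.01 * x for x in xs]
ys[-1] += 5.0
_, b_ols = weighted_line_fit(xs, ys, [1.0] * len(xs))
_, b_huber = huber_line_fit(xs, ys)
```

On this synthetic set, the unweighted fit is pulled toward the outlier, whereas the reweighted fit stays closer to the underlying slope of 0.01.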

Empirical calibration refines our theoretical scaling: fitting Equation (20) to measured data yields global corrections $\langle C_{1} \rangle = 2.64$, $\langle C_{2} \rangle = 3.17$, reducing the mean Toffoli-count error to 11.9%. These adjustments preserve the validity of the analytical model while aligning it with practical hardware characteristics, thereby bridging theory and experiment with minimal overhead.

When benchmarked against classical fragmentation approaches (e.g., ONIOM, many-body expansions), our pipeline offers comparable or improved accuracy for small peptides and more favorable scaling for larger biomolecules. Unlike global symmetry tapering [21], which is limited by the scarcity of protein-wide symmetries, our fragment-wise active-space identification applies universally, enhancing adaptability across diverse protein architectures.

Nonetheless, several limitations remain. Current noise and decoherence on real quantum hardware constrain immediate fault-tolerant implementation. Fragments containing atypical bonds or motifs may introduce systematic deviations; targeted corrections—guided by the QMProt dataset [33]—and refined state-preparation protocols are required to address these edge cases. Moreover, the integration of advanced subroutines will improve fragment precision at the expense of greater circuit complexity, necessitating a careful trade-off between accuracy and resource availability.

In summary, our fragmentation framework provides a scientifically rigorous, modular, and scalable blueprint for quantum simulation of biomolecules. By combining analytical modeling, empirical validation, and error modularity, we propose a scientifically grounded and practically oriented pathway toward achieving quantum advantage in real-world biochemical simulations.

6. Conclusions and Perspectives

We have presented a scalable, resource-efficient protocol for the quantum simulation of large protein systems, based on molecular fragmentation, regression-driven resource estimation, and circuit optimization via the SelectSwap algorithm.

Starting from small peptides and extending to Glucagon, our method maintains relative energy errors below 3% while reducing Toffoli gate counts by up to 20 orders of magnitude. Regression models accurately predict resource requirements and support pre-optimization of quantum workloads for both current and near-term devices.

Future work includes expanding the molecular dataset to improve model robustness, integrating advanced subroutines (e.g., QPE and optimized state preparation) to approach chemical accuracy, and benchmarking on actual hardware to evaluate noise resilience. Comparative analysis with techniques such as Hamiltonian tapering and active-space reduction will further refine the reassembly pipeline, and the modular nature of our framework ensures that hardware improvements map directly to simulation accuracy without altering the core methodology.

Our head-to-head comparison with PennyLane’s resource_estimation module [39] confirms that, once validated, it can serve as a reliable standalone estimator of circuit cost. Simulating each fragment’s qubitization circuit under the chosen precision settings can replace the explicit extraction of empirical coefficients $C_{1}$ and $C_{2}$, simplifying future resource estimates without sacrificing rigor.

Moreover, implementing quantum phase estimation at the fragment level is expected to reduce individual fragment errors to near-chemical precision. Because the total energy error accumulates fragment-wise, improving each fragment’s precision directly lowers the global error. This property is well suited to near-term quantum processors with limited numbers of high-fidelity logical qubits.

Overall, the proposed strategy offers a practical and verifiable route to simulate increasingly large biomolecules, aligning with the current trajectory of quantum hardware development and providing a clear framework for ongoing methodological enhancement.

Author Contributions

Conceptualization, P.A.-A. and L.C.S.; Methodology, P.A.-A. and L.C.S.; Software, P.A.-A.; Validation, P.A.-A. and L.C.S.; Formal Analysis, P.A.-A. and L.C.S.; Investigation, P.A.-A. and L.C.S.; Resources, P.A.-A.; Data Curation, P.A.-A. and L.C.S.; Writing—Original Draft Preparation, P.A.-A.; Writing—Review and Editing, P.A.-A. and L.C.S.; Visualization, P.A.-A. and L.C.S.; Supervision, P.A.-A.; Project Administration, P.A.-A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The QMProt dataset is publicly available on the PennyLane platform for direct integration with quantum computing pipelines. It can be accessed at: https://pennylane.ai/datasets/collection/qmprot (accessed on 19 May 2025).

Acknowledgments

The authors gratefully acknowledge Guillermo Alonso-Linaje and Diego Guala for their valuable comments and discussions during the development of this project. We also extend our special thanks to the PennyLane team for their ongoing work on the resource_estimation module and for providing helpful guidance that supported the validation and implementation phases of this study.

Conflicts of Interest

Authors Parfait Atchade-Adelomou and Laia Coronas Sala were employed by Lighthouse Disruptive Innovation Group (LDIG). The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Appendix A. Source Code

The Python scripts used for model fitting, Toffoli gate estimation, and comparisons between full and fragmented simulations are publicly available at https://github.com/pifparfait/qmprot_strategy (accessed on 19 May 2025).

Appendix B. T-Gate Count from the Big-O Bound

We translate the Big-O bound on the T-gate count from [22] into an explicit formula with constant factors. Specifically, the T-gate count required to implement the quantum lookup table is given by

T_{total} (n, ε) = C_{1} \sqrt{2^{n} ln \frac{1}{ε}} + C_{2} ln \frac{1}{ε},

(A1)

where $n = {log}_{2} (N)$ is the number of address qubits for a lookup table with $N$ entries, and $ε$ denotes the allowable error tolerance in the state preparation. The constants $C_{1}$ and $C_{2}$ depend on the gate decomposition and error-correction protocols [22].

This formula underpins the analysis in Section 3 and is directly used to estimate the quantum resource cost for each molecule (see Table 2).

The asymptotic bound is given by

O (\sqrt{2^{n} ln \frac{1}{ε}} + ln \frac{1}{ε}),

(A2)

which implies that, for sufficiently large $n$ and small $ε$, there exist constants $C_{1}^{'}, C_{2}^{'} > 0$ such that

T_{total} (n, ε) \leq C_{1}^{'} \sqrt{2^{n} ln \frac{1}{ε}} + C_{2}^{'} ln \frac{1}{ε} .

Our goal is to determine these constant factors explicitly. Following [22], we partition the memory into blocks of size

λ = \sqrt{2^{n}},

so that the number of blocks is

\frac{2^{n}}{λ} = \frac{2^{n}}{\sqrt{2^{n}}} = \sqrt{2^{n}} .

(A3)

Each block is processed via a quantum routing tree using non-Clifford operations (e.g., CSWAP or Toffoli gates). Let $K_{1}$ denote the T-gate cost per block, which includes the decomposition cost (e.g., a Toffoli gate typically decomposes into about 7 T-gates [22,42]) and an extra factor of $\sqrt{ln (1 / ε)}$ due to error suppression. Hence, the T-gate cost per block is

T_{block} \approx K_{1} \sqrt{ln \frac{1}{ε}} .

(A4)

After multiplying by the number of blocks in Equation (A3), the primary query cost is

T_{query} \approx K_{1} \sqrt{2^{n}} \sqrt{ln \frac{1}{ε}} = K_{1} \sqrt{2^{n} ln \frac{1}{ε}} .

(A5)

We set $C_{1} = K_{1}$.

Ancillary operations (uncomputing and error correction, e.g., via entanglement distillation [22]) contribute a cost proportional to $ln (1 / ε)$. Let $K_{2}$ be the unit cost for these operations; then,

T_{aux} \approx K_{2} ln \frac{1}{ε} .

(A6)

We define $C_{2} = K_{2}$. Adding Equations (A5) and (A6) yields Equation (A1).

Two key insights justify this formula:

1. An optimal block partitioning ( $λ = \sqrt{2^{n}}$ ) results in $\sqrt{2^{n}}$ iterations, each costing $K_{1} \sqrt{ln (1 / ε)}$ T-gates [22].
2. Ancillary operations add an overhead scaling linearly with $ln (1 / ε)$ [22].
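As a numerical cross-check of this derivation, the Python sketch below evaluates Equation (A1) directly and rebuilds the same count from the block-wise steps (A3) to (A6). The function names and the sample values of $n$ and $ε$ are illustrative assumptions; the default constants of 3 mirror the fixed parameters $C_{1} = C_{2} = 3$ used in the main text.

```python
import math

def t_count_closed_form(n, eps, c1=3.0, c2=3.0):
    """Equation (A1): C1*sqrt(2^n * ln(1/eps)) + C2*ln(1/eps)."""
    log_term = math.log(1.0 / eps)
    return c1 * math.sqrt(2.0**n * log_term) + c2 * log_term

def t_count_by_blocks(n, eps, k1=3.0, k2=3.0):
    """Same count assembled from the block-wise derivation (A3)-(A6)."""
    log_term = math.log(1.0 / eps)
    block_size = math.sqrt(2.0**n)        # lambda = sqrt(2^n)
    num_blocks = 2.0**n / block_size      # Equation (A3)
    t_block = k1 * math.sqrt(log_term)    # Equation (A4)
    t_query = num_blocks * t_block        # Equation (A5)
    t_aux = k2 * log_term                 # Equation (A6)
    return t_query + t_aux

# Illustrative inputs, not taken from the tables of this work.
n, eps = 20, 1e-3
closed = t_count_closed_form(n, eps)
blocks = t_count_by_blocks(n, eps)
```

Both routes agree up to floating-point rounding, which reflects the identification $C_{1} = K_{1}$ and $C_{2} = K_{2}$.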

The value of $C_{1}$ is influenced by the cost of decomposing complex non-Clifford operations. While a raw Toffoli gate might require approximately 7 T-gates [22,42], the unified architecture integrates several optimizations (e.g., reducing SWAP overhead, parallel processing, and gate cancellations) that lower the effective cost per block. Empirical and theoretical analyses in [22] and related literature (including [15,42,43]) support an effective value for $C_{1}$ in the range of 3 to 10. Similarly, $C_{2}$ reflects the cost of auxiliary operations for amplitude amplification and uncomputing. The procedure in [22] applies two such operations per query, each with a low, nearly constant T-gate cost (typically around 1 T-gate), leading to $C_{2} \geq 2$.

The expression in Equation (A1) is essential for precise resource estimation in fault-tolerant quantum implementations of lookup tables and is supported by the detailed analyses in [22].

Appendix C. Per-Fragment Coefficient Determination

This appendix rigorously derives the empirical coefficients $C_{1, i}$ and $C_{2, i}$ employed in Equation (20) and reported in Table 4 of the main text. It supplements the study by formalizing the relation between the measured Toffoli counts and the theoretical scaling parameters.

Let i index each Hamiltonian fragment, with the following:

$m_{i} = # {coefs}_{i}$ , the number of Hamiltonian terms (input).
$n_{i} = ⌈ {log}_{2} m_{i} ⌉$ , the required qubit-index width.

Two precision settings $b_{1}, b_{2}$ are chosen a priori to probe sensitivity:

ε_{k} = \frac{n_{i}}{2^{b_{k}}}, b_{1} = 20, b_{2} = 18 .

(A7)

For each $(i, k)$, compute the auxiliary variables:

X_{1, i}^{(k)} = \sqrt{2^{n_{i}} {log}_{2} (\frac{n_{i}}{ε_{k}})},

(A8)

X_{2, i}^{(k)} = {log}_{2} (\frac{n_{i}}{ε_{k}}) .

(A9)

Denote by $T_{i}^{(*) (k)}$ the Toffoli count measured via PennyLane’s resource_estimation under precision $ε_{k}$ (input). The linear model for each fragment is as follows:

T_{i}^{(*) (k)} = C_{1, i} X_{1, i}^{(k)} + C_{2, i} X_{2, i}^{(k)}, k = 1, 2 .

(A10)

Solving the $2 \times 2$ system yields closed-form expressions:

C_{1, i} = \frac{T_{i}^{(*) (1)} X_{2, i}^{(2)} - T_{i}^{(*) (2)} X_{2, i}^{(1)}}{X_{1, i}^{(1)} X_{2, i}^{(2)} - X_{1, i}^{(2)} X_{2, i}^{(1)}},

(A11)

C_{2, i} = \frac{T_{i}^{(*) (2)} X_{1, i}^{(1)} - T_{i}^{(*) (1)} X_{1, i}^{(2)}}{X_{1, i}^{(1)} X_{2, i}^{(2)} - X_{1, i}^{(2)} X_{2, i}^{(1)}} .

(A12)
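The closed forms (A11) and (A12) can be evaluated with a few lines of Python. This is a minimal sketch that assumes the two Toffoli counts have already been measured; the function names, the term count, and the synthetic counts are hypothetical, and a fragment is assumed to have at least two Hamiltonian terms so that $n_{i} \geq 1$.

```python
import math

def regressors(m_terms, b):
    """Return (X1, X2) of Equations (A8)-(A9) for a fragment with m_terms terms."""
    n = math.ceil(math.log2(m_terms))   # index width n_i
    eps = n / 2.0**b                    # Equation (A7)
    x2 = math.log2(n / eps)             # equals b by construction
    x1 = math.sqrt(2.0**n * x2)
    return x1, x2

def fragment_coefficients(t1, t2, m_terms, b1=20, b2=18):
    """Solve the 2x2 system (A10) using the closed forms (A11)-(A12)."""
    x11, x21 = regressors(m_terms, b1)
    x12, x22 = regressors(m_terms, b2)
    det = x11 * x22 - x12 * x21         # nonzero whenever b1 != b2
    c1 = (t1 * x22 - t2 * x21) / det
    c2 = (t2 * x11 - t1 * x12) / det
    return c1, c2

# Synthetic counts generated from known coefficients (not measured data).
m = 57_900
x11, x21 = regressors(m, 20)
x12, x22 = regressors(m, 18)
t1 = 2.5 * x11 + 3.0 * x21
t2 = 2.5 * x12 + 3.0 * x22
c1, c2 = fragment_coefficients(t1, t2, m)
```

Because only two precision settings are used, the system is exactly determined for each fragment, and the synthetic coefficients are recovered up to rounding.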

In practice, the global constants $(C_{1}, C_{2})$ reported in Table 4 are obtained via a least-squares fit across all fragments:

{({\hat{C}}_{1}, {\hat{C}}_{2})}^{⊤} = arg min_{C_{1}, C_{2}} \sum_{i} \sum_{k = 1}^{2} {[T_{i}^{(*) (k)} - (C_{1} X_{1, i}^{(k)} + C_{2} X_{2, i}^{(k)})]}^{2} .

(A13)
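For two regressors without an intercept, the minimization in Equation (A13) reduces to a pair of normal equations that can be solved in closed form. The Python sketch below shows this under stated assumptions: the function name is illustrative, and the samples are synthetic values generated from the reported averages 2.64 and 3.17 rather than the measured data behind Table 4.

```python
def fit_global_coefficients(samples):
    """Least-squares (C1, C2) for T = C1*X1 + C2*X2 without intercept.

    samples: iterable of (x1, x2, t) triples, one per fragment and
    precision setting.
    """
    s11 = s12 = s22 = s1t = s2t = 0.0
    for x1, x2, t in samples:
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        s1t += x1 * t
        s2t += x2 * t
    det = s11 * s22 - s12 * s12
    if det == 0.0:
        raise ValueError("regressors are collinear; the fit is not unique")
    c1 = (s1t * s22 - s2t * s12) / det
    c2 = (s2t * s11 - s1t * s12) / det
    return c1, c2

# Synthetic check: noiseless counts from known coefficients are recovered.
true_c1, true_c2 = 2.64, 3.17
regressor_pairs = [(1144.9, 20.0), (1086.1, 18.0), (4579.4, 20.0), (4344.5, 18.0)]
samples = [(x1, x2, true_c1 * x1 + true_c2 * x2) for x1, x2 in regressor_pairs]
c1_hat, c2_hat = fit_global_coefficients(samples)
```

With real measurements the residuals are nonzero, and the fitted pair then represents the best global compromise across fragments rather than an exact solution.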

Appendix D. Regression Model Analysis for Qubits

This appendix presents a regression-based analysis to estimate the number of qubits required to simulate molecular systems, based on their electron count. In contrast to classical approaches relying on ordinary least squares (OLS), we adopt robust regression techniques, namely the Huber regressor and the Theil–Sen estimator, combined with molecular segmentation to improve stability and resistance to outliers.

Appendix D.1. Model Formulation

We consider a log-linear model relating the number of logical qubits $n_{qubits}$ to the number of electrons $n_{electrons}$:

log (n_{qubits}) = α + β \cdot n_{electrons} + ε,

(A14)

where $α$ and $β$ are regression coefficients and $ε$ is a residual error term. This formulation captures exponential growth in qubit requirements while allowing for linear estimation in the log domain.

Robust estimators were employed to mitigate the influence of outliers in the training data. Additionally, the dataset was segmented into three groups based on the electron count:

Small: $n_{electrons} \leq 150$
Medium: $151 \leq n_{electrons} \leq 500$
Large: $n_{electrons} > 500$

The predicted qubit counts are then obtained by exponentiating the log prediction:

{\hat{n}}_{qubits} = exp (α + β \cdot n_{electrons}) .

(A15)
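The segmentation rule and the back-transformation of Equation (A15) can be sketched in Python as follows. The function names are illustrative, and the coefficients in `params` are placeholders chosen for readability, not the fitted values of this work.

```python
import math

def segment(n_electrons):
    """Assign a molecule to the Small, Medium, or Large group."""
    if n_electrons <= 150:
        return "small"
    if n_electrons <= 500:
        return "medium"
    return "large"

def predict_qubits(n_electrons, params):
    """Equation (A15): exponentiate the log-domain prediction of the segment."""
    alpha, beta = params[segment(n_electrons)]
    return math.exp(alpha + beta * n_electrons)

# Placeholder (alpha, beta) pairs per segment, for illustration only.
params = {
    "small": (4.0, 0.005),
    "medium": (4.5, 0.002),
    "large": (5.0, 0.001),
}
estimate = predict_qubits(100, params)
```

Keeping one coefficient pair per segment lets each regime be refitted independently when new molecules are added to the dataset.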

Appendix D.2. Evaluation Metrics

To assess model performance, we compute the coefficient of determination $R^{2}$, defined as follows:

R^{2} = 1 - \frac{\sum_{i} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i} {(y_{i} - \bar{y})}^{2}},

(A16)

where $y_{i}$ and ${\hat{y}}_{i}$ represent the observed and predicted $log (n_{qubits})$ values, respectively.

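We further report the mean absolute error (MAE), the root-mean-square error (RMSE), the standard deviation of the residuals ($σ$, with $\bar{ε}$ the mean residual), and the coefficient of variation (CV):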

MAE = \frac{1}{n} \sum_{i = 1}^{n} | y_{i} - {\hat{y}}_{i} |,

(A17)

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}},

(A18)

σ = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i} - \bar{ε})}^{2}},

(A19)

CV = (\frac{σ}{\bar{\hat{y}}}) \cdot 100 .

(A20)
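Equations (A16) to (A20) translate directly into code. The following Python sketch assumes that `y` and `y_hat` hold observed and predicted log-domain values; the function name and the sample data are illustrative.

```python
import math

def regression_metrics(y, y_hat):
    """Return R^2, MAE, RMSE, sigma, and CV as in Equations (A16)-(A20)."""
    n = len(y)
    residuals = [obs - pred for obs, pred in zip(y, y_hat)]
    y_mean = sum(y) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((obs - y_mean) ** 2 for obs in y)
    mean_res = sum(residuals) / n
    # sigma measures the spread of residuals around their mean (A19).
    sigma = math.sqrt(sum((r - mean_res) ** 2 for r in residuals) / n)
    y_hat_mean = sum(y_hat) / n
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(r) for r in residuals) / n,
        "rmse": math.sqrt(ss_res / n),
        "sigma": sigma,
        "cv": 100.0 * sigma / y_hat_mean,
    }

# Illustrative data: every prediction overshoots by the same amount.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
```

In this example the constant bias yields a nonzero RMSE but a vanishing $σ$, which shows that the two quantities separate systematic offset from scatter.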

Appendix D.3. Confidence Interval (95%) via Delta Method

For the log-linear model, 95% confidence intervals are computed using the delta method. Let $\hat{y} = α + β x$ be the predicted $log (n_{qubits})$ for a given electron count $x$. The Jacobian vector is as follows:

J (x) = [\begin{matrix} \frac{\partial \hat{y}}{\partial α} & \frac{\partial \hat{y}}{\partial β} \end{matrix}] = [1, x],

(A21)

and the variance of the prediction is as follows:

Var (\hat{y}) = J (x) \cdot Σ \cdot J {(x)}^{⊤},

(A22)

where $Σ$ is the parameter covariance matrix. The 95% confidence interval in log-space is as follows:

\hat{y} \pm t_{n - 2, 0.975} \cdot \sqrt{Var (\hat{y})},

(A23)

which is back-transformed to yield the following:

{\hat{n}}_{qubits} \in [exp (\hat{y} - Δ), exp (\hat{y} + Δ)],

(A24)

where $Δ = t_{n - 2, 0.975} \cdot \sqrt{Var (\hat{y})}$, and $t_{n - 2, 0.975}$ is the t-quantile with $n - 2$ degrees of freedom for a two-tailed 95% confidence level.
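A compact Python sketch of Equations (A21) to (A24) is given below. It assumes that the covariance matrix of $(α, β)$ and the t-quantile are supplied by the caller, for instance from the fitting routine and a statistics table; the function name and all numerical values are hypothetical.

```python
import math

def qubit_confidence_interval(x, alpha, beta, cov, t_quantile):
    """Back-transformed interval (A24) for the log-linear model.

    cov is the 2x2 covariance matrix of (alpha, beta); t_quantile is
    t_{n-2, 0.975}, supplied by the caller.
    """
    y_hat = alpha + beta * x
    # Var(y_hat) = J cov J^T with J = [1, x]  (Equations (A21)-(A22)).
    var = cov[0][0] + x * (cov[0][1] + cov[1][0]) + x * x * cov[1][1]
    delta = t_quantile * math.sqrt(var)
    return math.exp(y_hat - delta), math.exp(y_hat), math.exp(y_hat + delta)

# Hypothetical coefficients and covariance, for illustration only.
cov = [[0.04, 0.0], [0.0, 0.0]]
lower, center, upper = qubit_confidence_interval(
    x=10.0, alpha=1.0, beta=0.1, cov=cov, t_quantile=2.0
)
```

Because the interval is symmetric in log-space, the back-transformed bounds are asymmetric around the central estimate, with the upper bound lying farther away than the lower one.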

The use of log-linear robust regression models across molecular segments improves the consistency of predictions and their interpretability.

References

1. Bryce, R.A. What Next for Quantum Mechanics in Structure-Based Drug Discovery? Methods Mol. Biol. 2020, 2114, 339–353.
2. Baiardi, A.; Christandl, M.; Reiher, M. Quantum Computing for Molecular Biology. Chembiochem 2023, 24, e202300120.
3. Sala, L.C.; Atchade-Adelomou, P. Efficient Protein Ground State Energy Computation via Fragmentation and Reassembly. arXiv 2025, arXiv:2501.03766.
4. Li, S.; Li, W.; Jiang, Y. Generalized energy-based fragmentation approach for computing the ground-state energies and properties of large molecules. J. Phys. Chem. A 2007, 111, 2193–2199.
5. Deev, V.; Collins, M.A. Approximate ab initio energies by systematic molecular fragmentation. J. Chem. Phys. 2005, 122, 154102.
6. Bettens, R.P.A.; Lee, A.M. Accurately reproducing ab initio electrostatic potentials with multipoles and fragmentation. J. Phys. Chem. A 2006, 110, 8777.
7. Atchade-Adelomou, P. Quantum algorithms for solving hard constrained optimisation problems. arXiv 2022, arXiv:2202.13125.
8. Reiher, M.; Wiebe, N.; Svore, K.M.; Wecker, D.; Troyer, M. Elucidating reaction mechanisms on quantum computers. Proc. Natl. Acad. Sci. USA 2017, 114, 7555–7560.
9. Hohenberg, P.; Kohn, W. Density functional theory (DFT). Phys. Rev. 1964, 136, B864.
10. Evangelisti, S.; Bendazzoli, G.L.; Gagliardi, L. Complete active-space configuration interaction with optimized orbitals: Application to Li2. Int. J. Quantum Chem. 1995, 55, 277–280.
11. McArdle, S.; Endo, S.; Aspuru-Guzik, A.; Benjamin, S.C.; Yuan, X. Quantum computational chemistry. Rev. Mod. Phys. 2020, 92, 015003.
12. Yang, P.J.; Sugiyama, M.; Tsuda, K.; Yanai, T. Artificial neural networks applied as molecular wave function solvers. J. Chem. Theory Comput. 2020, 16, 3513–3529.
13. Cao, Y.; Romero, J.; Olson, J.P.; Degroote, M.; Johnson, P.D.; Kieferová, M.; Kivlichan, I.D.; Menke, T.; Peropadre, B.; Sawaya, N.P.; et al. Quantum chemistry in the age of quantum computing. Chem. Rev. 2019, 119, 10856–10915.
14. Wecker, D.; Bauer, B.; Clark, B.K.; Hastings, M.B.; Troyer, M. Gate-count estimates for performing quantum chemistry on small quantum computers. Phys. Rev. A 2014, 90, 022305.
15. Low, G.H.; Kliuchnikov, V.; Schaeffer, L. Trading T gates for dirty qubits in state preparation and unitary synthesis. Quantum 2024, 8, 1375.
16. Kitaura, K.; Ikeo, E.; Asada, T.; Nakano, T.; Uebayasi, M. Fragment Molecular Orbital Method: An Approximate Computational Method for Large Molecules. Chem. Phys. Lett. 1999, 313, 701–706.
17. Morokuma, M.; collaborators. ONIOM: A Multilayered Integrated MO + MM Method. J. Mol. Struct. 1999, 461–462, 1–21.
18. ApSimon, J.; J. Bearpark, R. Adaptive QM/MM Methods for Chemical Reaction Dynamics. WIREs Comput. Mol. Sci. 2023, 13, e1618.
19. Bowling, P.E.; Broderick, D.R.; Herbert, J.M. Convergent Protocols for Computing Protein–Ligand Interaction Energies Using Fragment-Based Quantum Chemistry. J. Chem. Theory Comput. 2023, 19, 3656–3670.
20. Vornweg, J.R.; Wolter, M.; Jacob, C.R. A simple and consistent quantum-chemical fragmentation scheme for proteins that includes two-body contributions. J. Chem. Theory Comput. 2022, 18, 4516–4527.
21. Bravyi, S.; Gambetta, J.M.; Mezzacapo, A.; Temme, K. Tapering off qubits to simulate fermionic Hamiltonians. arXiv 2017.
22. Zhu, S.; Sundaram, A.; Low, G.H. Unified architecture for a quantum lookup table. arXiv 2024, arXiv:2406.18030.
23. Fomichev, S.; Hejazi, K.; Zini, M.S.; Kiser, M.; Fraxanet, J.; Casares, P.A.M.; Delgado, A.; Huh, J.; Voigt, A.C.; Mueller, J.E.; et al. Initial state preparation for quantum chemistry on quantum computers. PRX Quantum 2024, 5, 040339.
24. Gosset, D.; Kothari, R.; Wu, K. Quantum state preparation with optimal T-count. arXiv 2024, arXiv:2411.04790.
25. Carrera Vazquez, A.; Woerner, S. Efficient state preparation for quantum amplitude estimation. Phys. Rev. Appl. 2021, 15, 034027.
26. von Burg, V.; Low, G.H.; Häner, T.; Steiger, D.S.; Reiher, M.; Roetteler, M.; Troyer, M. Quantum computing enhanced computational catalysis. Phys. Rev. Res. 2021, 3.
27. Schollwöck, U. The density-matrix renormalization group. Rev. Mod. Phys. 2005, 77, 259–315.
28. Perez-Garcia, D.; Verstraete, F.; Wolf, M.M.; Cirac, J.I. Matrix product state representations. arXiv 2006.
29. Orús, R.; Mugel, S.; Lizaso, E. Tensor Networks for Complex Quantum Systems. Nat. Rev. Phys. 2019, 1, 538–550.
30. Chan, G.K.L.; Zgid, D. The Density Matrix Renormalization Group in Quantum Chemistry. Annu. Rep. Comput. Chem. 2009, 5, 149–162.
31. Møller, C.; Plesset, M.S. Note on an Approximation Treatment for Many-Electron Systems. Phys. Rev. 1934, 46, 618–622.
32. Zou, H.; Magnusson, E.; Brunander, H.; Dobrautz, W.; Rahm, M. Multireference error mitigation for quantum computation of chemistry. arXiv 2025.
33. Sala, L.C.; Atchade-Adelomou, P. QMProt: A Comprehensive Dataset of Quantum Properties for Proteins. arXiv 2025, arXiv:2505.08956.
34. Huber, P.J. Robust estimation of a location parameter. In Breakthroughs in Statistics: Methodology and Distribution; Springer: Berlin/Heidelberg, Germany, 1992; pp. 492–518.
35. Theil, H. A rank-invariant method of linear and polynomial regression analysis. Indag. Math. 1950, 12, 173.
36. Hyndman, R.J.; Koehler, A.B. Another look at measures of forecast accuracy. Int. J. Forecast. 2006, 22, 679–688.
37. PennyLaneAI Development Team. test_qrom_state_prep.py: QROM State Preparation Test. Available online: https://github.com/PennyLaneAI/pennylane/blob/qrom_state_prep/tests/templates/test_state_preparations/test_qrom_state_prep.py (accessed on 17 June 2025).
38. PennyLaneAI Development Team. test_resource_qubitization.py: Resource Qubitization Test. 2025. Available online: https://github.com/PennyLaneAI/pennylane/blob/5fc5d02308d25831594ecb41e4acc6f74ce2da30/pennylane/labs/tests/resource_estimation/templates/test_resource_qubitization.py (accessed on 17 June 2025).
39. Xanadu Quantum Technologies Inc. qml.labs.resource_estimation – PennyLane 0.41.1 Documentation. 2024. Available online: https://docs.pennylane.ai/en/stable/code/api/pennylane.labs.resource_estimation.html (accessed on 16 June 2025).
40. PennyLane contributors. pennylane.labs.resource_estimation: Resource Estimation Module. 2025. Available online: https://github.com/PennyLaneAI/pennylane/tree/5fc5d02308d25831594ecb41e4acc6f74ce2da30/pennylane/labs (accessed on 16 June 2025).
41. Kitaev, A.Y. Quantum measurements and the Abelian stabilizer problem. arXiv 1995.
42. Amy, M.; Maslov, D.; Mosca, M.; Roetteler, M. A Meet-in-the-Middle Algorithm for Fast Synthesis of Depth-Optimal Quantum Circuits. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2013, 32, 818–830.
43. Ross, N.J.; Selinger, P. Optimal ancilla-free Clifford+T approximation of z-rotations. arXiv 2014, arXiv:1403.2975.

Figure 1. Scaling of quantum resource requirements with electron count, shown on log–log axes with 95% confidence intervals. (Left) Exponential regression of Hamiltonian coefficient counts versus number of electrons. Experimental data (filled circles), fitted model (solid line), and fragmentation-based predictions (open circles) are plotted. (Right) Linear regression of qubit requirements versus electron count, including measured values (filled circles), regression line (solid), and fragmentation estimates (open). Shaded bands in both panels represent the 95% confidence intervals of the fits.

Figure 2. Linear fit of the required qubits ($n_{qubits}$) as a function of the number of electrons ($n_{electrons}$), using data from QMProt [33]. Blue markers denote empirical values, the orange line represents the fitted model $n_{qubits} = α + β n_{electrons}$, and the shaded band indicates the 95% confidence interval. The nearly linear relationship provides a useful estimate of qubit requirements as system size increases.

Figure 3. Predicted quantum resource requirements as a function of the number of electrons. (Top-left) exponential model of coefficient growth. (Top-right) linear model for qubit requirements. (Bottom-left) estimated Toffoli gate counts based on predicted coefficients. (Bottom-right) total qubit requirements. Shaded regions indicate 95% confidence intervals. These trends support the scalability of the resource model across molecular sizes.

Table 1. Summary of electrons, orbitals, theoretical energy (GT), calculated energy (Em), and relative error (%RE) for different peptides.

Peptides	Electrons	Orbitals	GT	Em	%RE
Gly-Gly	70	53	$- 4.83 \times 10^{2}$	$- 4.83 \times 10^{2}$	$4.00 \times 10^{- 3}$
Gly-Ala	78	60	$- 5.22 \times 10^{2}$	$- 5.22 \times 10^{2}$	$3.16 \times 10^{- 3}$
Glu-Gly	108	82	$- 7.45 \times 10^{2}$	$- 7.45 \times 10^{2}$	$2.52 \times 10^{- 3}$
Ser-Cys	110	81	$- 1.03 \times 10^{3}$	$- 1.03 \times 10^{3}$	$1.58 \times 10^{- 3}$
Carnosine (Ala-His)	120	94	$- 7.81 \times 10^{2}$	$- 7.81 \times 10^{2}$	$1.94 \times 10^{- 3}$
Gly-Ser	86	65	$- 5.96 \times 10^{2}$	$- 5.96 \times 10^{2}$	$2.62 \times 10^{- 3}$
Pro-Gly	92	72	$- 5.98 \times 10^{2}$	$- 5.98 \times 10^{2}$	$3.33 \times 10^{- 3}$
Cystine (Cys-Cys)	126	90	$- 1.42 \times 10^{3}$	$- 1.42 \times 10^{3}$	$5.40 \times 10^{- 4}$
Leu-Thr	126	100	$- 7.89 \times 10^{2}$	$- 7.89 \times 10^{2}$	$1.82 \times 10^{- 3}$
Gly-Val-Ala	132	104	$- 8.42 \times 10^{2}$	$- 8.42 \times 10^{2}$	$3.59 \times 10^{- 3}$
Thr-Lys	134	106	$- 8.43 \times 10^{2}$	$- 8.43 \times 10^{2}$	$1.88 \times 10^{- 3}$
Val-Ala-Ser	148	116	$- 9.54 \times 10^{2}$	$- 9.54 \times 10^{2}$	$3.09 \times 10^{- 3}$
Phe-Ile	150	122	$- 9.03 \times 10^{2}$	$- 9.03 \times 10^{2}$	$1.60 \times 10^{- 3}$
Ser-Gly-Glu	154	117	$- 1.06 \times 10^{3}$	$- 1.06 \times 10^{3}$	$3.40 \times 10^{- 3}$
Aspartame (Asp-Phe)	156	123	$- 1.01 \times 10^{3}$	$- 1.01 \times 10^{3}$	$5.00 \times 10^{- 2}$
Tyr-Asp	156	121	$- 1.05 \times 10^{3}$	$- 1.05 \times 10^{3}$	$1.47 \times 10^{- 3}$
Glutathione (Cys-Glu-Gly)	162	121	$- 1.38 \times 10^{3}$	$- 1.38 \times 10^{3}$	$2.34 \times 10^{- 3}$
Arg-Met	164	127	$- 1.31 \times 10^{3}$	$- 1.31 \times 10^{3}$	$1.02 \times 10^{- 3}$
Val-Asp-Ser	170	131	$- 1.14 \times 10^{3}$	$- 1.14 \times 10^{3}$	$2.71 \times 10^{- 3}$
Gly-His-Lys	182	144	$- 1.16 \times 10^{3}$	$- 1.16 \times 10^{3}$	$2.94 \times 10^{- 3}$
Trp-His	180	144	$- 1.14 \times 10^{3}$	$- 1.14 \times 10^{3}$	$1.84 \times 10^{- 3}$
Tyr-Arg	180	143	$- 1.14 \times 10^{3}$	$- 1.14 \times 10^{3}$	$1.43 \times 10^{- 3}$
His-Arg-Val	220	175	$- 1.38 \times 10^{3}$	$- 1.38 \times 10^{3}$	$2.01 \times 10^{- 3}$
Tuftsin (Thr-Lys-Pro-Arg)	270	215	$- 1.64 \times 10^{3}$	$- 1.68 \times 10^{3}$	$2.84$
Methionine-enkephalin (Tyr-Gly-Gly-Phe-Met)	304	239	$- 2.16 \times 10^{3}$	$- 2.21 \times 10^{3}$	$2.06$
Leucine-enkephalin (Tyr-Gly-Gly-Phe-Leu)	296	237	$- 1.81 \times 10^{3}$	$- 1.85 \times 10^{3}$	$2.59$
Oxytocin (Cys-Tyr-Ile-Gln-Asn-Cys-Pro-Leu-Gly)	536	419	$- 3.84 \times 10^{3}$	$- 3.91 \times 10^{3}$	$1.85$
Opiorphin (Gln-Arg-Phe-Ser-Arg)	558	446	$- 2.29 \times 10^{3}$	$- 2.35 \times 10^{3}$	$2.53$
Bradykinin (Arg-Pro-Pro-Gly-Phe-Ser-Pro-Phe-Arg)	566	453	$- 3.43 \times 10^{3}$	$- 3.53 \times 10^{3}$	$2.92$
Neurotensin (Glu-Leu-Tyr-Glu-Asn-Lys-Pro-Arg-Arg-Pro-Tyr-Ile-Leu)	896	716	$- 5.43 \times 10^{3}$	$- 5.27 \times 10^{3}$	$2.83$
Gastrin-14 (Trp-Leu-Glu-Glu-Glu-Glu-Glu-Ala-Tyr-Gly-Trp-Met-Asp-Phe)	970	763	$- 6.39 \times 10^{3}$	$- 6.25 \times 10^{3}$	$2.31$
Angiotensin IV (Val-Tyr-Ile-His-Pro-Phe)	414	334	$- 2.47 \times 10^{3}$	$- 2.55 \times 10^{3}$	$2.98$
Angiotensin II (Asp-Arg-Val-Tyr-Ile-His-Pro-Phe)	558	446	$- 3.40 \times 10^{3}$	$- 3.49 \times 10^{3}$	$2.83$
Angiotensin I (Asp-Arg-Val-Tyr-Ile-His-Pro-Phe-His-Leu)	692	554	$- 4.19 \times 10^{3}$	$- 4.32 \times 10^{3}$	$2.97$
Glucagon (His-Ser-Gln-Gly-Thr-Phe-Thr-Ser-Asp-Tyr-Ser-Lys-Tyr-Leu-Asp-Ser-Arg-Arg-Ala-Gln-Asp-Phe-Val-Gln-Trp-Leu-Met-Asn-Thr)	1852	1459	$- 1.18 \times 10^{4}$	$- 1.22 \times 10^{4}$	$2.80$

Table 2. Resource comparison for peptide simulations: theoretical Toffoli gate estimates using the full Hamiltonian ($C_{1} = C_{2} = 3$) versus the proposed fragmentation–reassembly (base-structure) approach. For each molecule and version (Ver.: Orig. = Original; Prop. = Proposed), the table lists the number of Hamiltonian coefficients (# Coeffs.), the estimated Toffoli gate count, the Toffoli reduction factor (Red. (Toffoli)), the electron count (when available), and the coefficient reduction factor (Red. (Coeff.)).

Molecules	Ver.	# Coeffs.	Toffoli	Red. (Toffoli)	Electrons	Red. (Coeff.)
Alanine	Orig.	$2.73 \times 10^{6}$	$2.02 \times 10^{4}$	-	$4.80 \times 10^{1}$	-
R_ala + Base Structures	Prop.	$5.79 \times 10^{4}$	$2.59 \times 10^{3}$	$7.80$	–	$4.72 \times 10^{1}$
Histidine	Orig.	$2.38 \times 10^{7}$	$5.67 \times 10^{4}$	-	$8.20 \times 10^{1}$	-
R_his + Base Structures	Prop.	$2.03 \times 10^{6}$	$1.43 \times 10^{4}$	$3.96$	–	$1.17 \times 10^{1}$
Leucine	Orig.	$1.62 \times 10^{7}$	$4.02 \times 10^{4}$	-	$7.20 \times 10^{1}$	-
R_leu + Base Structures	Prop.	$5.76 \times 10^{5}$	$1.02 \times 10^{4}$	$3.96$	–	$2.81 \times 10^{1}$
Isoleucine	Orig.	$1.64 \times 10^{7}$	$4.02 \times 10^{4}$	-	$7.20 \times 10^{1}$	-
R_ile + Base Structures	Prop.	$5.76 \times 10^{5}$	$1.02 \times 10^{4}$	$3.96$	–	$2.84 \times 10^{1}$
Lysine	Orig.	$2.39 \times 10^{7}$	$5.67 \times 10^{4}$	-	$8.00 \times 10^{1}$	-
R_lys + Base Structures	Prop.	$2.25 \times 10^{6}$	$2.02 \times 10^{4}$	$2.82$	–	$1.06 \times 10^{1}$
Methionine	Orig.	$1.78 \times 10^{7}$	$5.67 \times 10^{4}$	-	$8.00 \times 10^{1}$	-
R_met + Base Structures	Prop.	$5.63 \times 10^{5}$	$1.02 \times 10^{4}$	$5.64$	–	$3.16 \times 10^{1}$
Phenylalanine	Orig.	$3.61 \times 10^{7}$	$8.01 \times 10^{4}$	-	$8.80 \times 10^{1}$	-
R_phe + Base Structures	Prop.	$3.78 \times 10^{6}$	$2.02 \times 10^{4}$	$3.99$	–	$9.56$
Threonine	Orig.	$8.36 \times 10^{6}$	$2.58 \times 10^{4}$	-	$6.40 \times 10^{1}$	-
R_thr + Base Structures	Prop.	$1.06 \times 10^{5}$	$3.90 \times 10^{3}$	$7.91$	–	$7.92 \times 10^{1}$
Tryptophan	Orig.	$9.24 \times 10^{7}$	$6.11 \times 10^{4}$	-	$1.08 \times 10^{2}$	-
R_trp + Base Structures	Prop.	$1.49 \times 10^{7}$	$9.54 \times 10^{3}$	$2.83$	–	$6.19$
Valine	Orig.	$9.82 \times 10^{6}$	$2.58 \times 10^{4}$	-	$6.40 \times 10^{1}$	-
R_val + Base Structures	Prop.	$3.98 \times 10^{5}$	$3.89 \times 10^{3}$	$5.63$	–	$2.47 \times 10^{1}$
Arginine	Orig.	$4.36 \times 10^{7}$	$8.01 \times 10^{4}$	-	$9.40 \times 10^{1}$	-
R_arg + Base Structures	Prop.	$5.47 \times 10^{6}$	$3.13 \times 10^{4}$	$2.56$	–	$7.97$
Cysteine	Orig.	$6.19 \times 10^{6}$	$2.38 \times 10^{4}$	-	$6.60 \times 10^{1}$	-
R_cys + Base Structures	Prop.	$1.56 \times 10^{5}$	$4.82 \times 10^{3}$	$5.62$	–	$3.97 \times 10^{1}$
Glutamine	Orig.	$1.83 \times 10^{7}$	$5.67 \times 10^{4}$	-	$7.80 \times 10^{1}$	-
R_gln + Base Structures	Prop.	$8.73 \times 10^{5}$	$1.43 \times 10^{4}$	$5.64$	–	$2.09 \times 10^{1}$
Asparagine	Orig.	$1.13 \times 10^{7}$	$3.79 \times 10^{4}$	-	$7.00 \times 10^{1}$	-
R_asn + Base Structures	Prop.	$3.45 \times 10^{5}$	$6.97 \times 10^{3}$	$5.63$	–	$3.28 \times 10^{1}$
Tyrosine	Orig.	$4.85 \times 10^{7}$	$8.00 \times 10^{4}$	-	$9.60 \times 10^{1}$	-
R_tyr + Base Structures	Prop.	$4.32 \times 10^{6}$	$3.17 \times 10^{4}$	$2.52$	–	$1.12 \times 10^{1}$
Serine	Orig.	$4.53 \times 10^{6}$	$2.00 \times 10^{4}$	-	$5.60 \times 10^{1}$	-
R_ser + Base Structures	Prop.	$9.70 \times 10^{4}$	$4.82 \times 10^{3}$	$7.91$	–	$4.67 \times 10^{1}$
Glycine	Orig.	$1.16 \times 10^{6}$	$1.43 \times 10^{4}$	-	$4.00 \times 10^{1}$	-
R_gly + Base Structures	Prop.	$5.60 \times 10^{4}$	$3.38 \times 10^{3}$	$5.58$	–	$2.08 \times 10^{1}$
Aspartic_Acid	Orig.	$1.05 \times 10^{7}$	$3.79 \times 10^{4}$	-	$7.00 \times 10^{1}$	-
R_asp + Base Structures	Prop.	$4.31 \times 10^{5}$	$6.97 \times 10^{3}$	$5.63$	–	$2.45 \times 10^{1}$
Glutamic_Acid	Orig.	$1.72 \times 10^{7}$	$5.67 \times 10^{4}$	-	$7.80 \times 10^{1}$	-
R_glu + Base Structures	Prop.	$1.22 \times 10^{6}$	$1.43 \times 10^{4}$	$3.99$	–	$1.41 \times 10^{1}$
Proline	Orig.	$8.37 \times 10^{6}$	$2.58 \times 10^{4}$	-	$6.20 \times 10^{1}$	-
R_pro + Base Structures	Prop.	$1.29 \times 10^{5}$	$4.82 \times 10^{3}$	$7.91$	–	$6.48 \times 10^{1}$
Glucagon	Orig.	$4.33 \times 10^{48}$	$3.24 \times 10^{25}$	-	$1.85 \times 10^{3}$	-
Amino acids - Glucagon	Prop.	$5.02 \times 10^{8}$	$3.11 \times 10^{5}$	$1.04 \times 10^{20}$	–	$8.63 \times 10^{39}$
Oxytocin	Orig.	$8.85 \times 10^{17}$	$1.44 \times 10^{10}$	-	$5.36 \times 10^{2}$	-
Amino acids - Oxytocin	Prop.	$1.31 \times 10^{8}$	$1.55 \times 10^{5}$	$9.26 \times 10^{4}$	–	$6.76 \times 10^{9}$
Vasopressin	Orig.	$7.81 \times 10^{31}$	$1.21 \times 10^{17}$	-	$1.13 \times 10^{3}$	-
Amino acids - Vasopressin	Prop.	$1.76 \times 10^{8}$	$2.20 \times 10^{5}$	$5.50 \times 10^{11}$	–	$4.44 \times 10^{23}$
Angiotensin II	Orig.	$2.88 \times 10^{18}$	$2.88 \times 10^{10}$	-	$5.58 \times 10^{2}$	-
Amino acids - Angiotensin II	Prop.	$1.93 \times 10^{8}$	$2.20 \times 10^{5}$	$1.31 \times 10^{5}$	–	$1.49 \times 10^{10}$
Kyotorphin	Orig.	$4.41 \times 10^{9}$	$1.24 \times 10^{6}$	-	$1.80 \times 10^{2}$	-
Amino acids - Kyotorphin	Prop.	$8.84 \times 10^{7}$	$1.55 \times 10^{5}$	$8.00$	–	$5.00 \times 10^{1}$
Methionine-enkephalin	Orig.	$3.44 \times 10^{12}$	$2.81 \times 10^{7}$	-	$3.04 \times 10^{2}$	-
Amino acids - Methionine-enkephalin	Prop.	$1.03 \times 10^{8}$	$1.55 \times 10^{5}$	$1.81 \times 10^{2}$	–	$3.34 \times 10^{4}$
Leucine-enkephalin	Orig.	$2.24 \times 10^{12}$	$2.81 \times 10^{7}$	-	$2.96 \times 10^{2}$	-
Amino acids - Leucine-enkephalin	Prop.	$1.01 \times 10^{8}$	$1.55 \times 10^{5}$	$1.81 \times 10^{2}$	–	$2.21 \times 10^{4}$
Tuftsin	Orig.	$5.54 \times 10^{11}$	$1.41 \times 10^{7}$	-	$2.70 \times 10^{2}$	-
Amino acids - Tuftsin	Prop.	$8.22 \times 10^{7}$	$1.55 \times 10^{5}$	$9.05 \times 10^{1}$	–	$6.74 \times 10^{3}$
Opiorphin	Orig.	$1.19 \times 10^{14}$	$1.59 \times 10^{8}$	-	$3.70 \times 10^{2}$	-
Amino acids - Opiorphin	Prop.	$1.42 \times 10^{8}$	$2.20 \times 10^{5}$	$7.24 \times 10^{2}$	–	$8.37 \times 10^{5}$
Angiotensin IV	Orig.	$1.26 \times 10^{15}$	$6.37 \times 10^{8}$	-	$4.14 \times 10^{2}$	-
Amino acids - Angiotensin IV	Prop.	$1.41 \times 10^{8}$	$2.20 \times 10^{5}$	$2.90 \times 10^{3}$	–	$8.95 \times 10^{6}$
Neurotensin	Orig.	$2.20 \times 10^{26}$	$2.36 \times 10^{14}$	-	$8.96 \times 10^{2}$	-
Amino acids - Neurotensin	Prop.	$3.12 \times 10^{8}$	$3.11 \times 10^{5}$	$7.59 \times 10^{8}$	–	$7.05 \times 10^{17}$
Bradykinin	Orig.	$4.43 \times 10^{18}$	$2.88 \times 10^{10}$	-	$5.66 \times 10^{2}$	-
Amino acids - Bradykinin	Prop.	$1.86 \times 10^{8}$	$2.20 \times 10^{5}$	$1.31 \times 10^{5}$	–	$2.38 \times 10^{10}$
Angiotensin I	Orig.	$3.84 \times 10^{21}$	$9.22 \times 10^{11}$	-	$6.92 \times 10^{2}$	-
Amino acids - Angiotensin I	Prop.	$2.33 \times 10^{8}$	$2.20 \times 10^{5}$	$4.19 \times 10^{6}$	–	$1.64 \times 10^{13}$
Gastrin-14	Orig.	$7.03 \times 10^{23}$	$1.02 \times 10^{13}$	-	$7.89 \times 10^{2}$	-
Amino acids - Gastrin-14	Prop.	$2.17 \times 10^{8}$	$2.20 \times 10^{5}$	$6.37 \times 10^{7}$	–	$3.23 \times 10^{15}$
GLU_CYS_GLY	Orig.	$5.47 \times 10^{9}$	$8.95 \times 10^{5}$	-	$1.62 \times 10^{2}$	-
Amino acids - GLU_CYS_GLY	Prop.	$2.46 \times 10^{7}$	$5.67 \times 10^{4}$	$1.58 \times 10^{1}$	–	$2.23 \times 10^{2}$
ALA_HIS	Orig.	$3.01 \times 10^{8}$	$2.25 \times 10^{5}$	-	$1.20 \times 10^{2}$	-
Amino acids - ALA_HIS	Prop.	$2.66 \times 10^{7}$	$5.67 \times 10^{4}$	$4.00$	–	$1.13 \times 10^{1}$
PRO_GLY_PRO	Orig.	$1.87 \times 10^{9}$	$4.49 \times 10^{5}$	-	$1.82 \times 10^{2}$	-
Amino acids - PRO_GLY_PRO	Prop.	$1.79 \times 10^{7}$	$4.49 \times 10^{4}$	$7.99$	–	$1.04 \times 10^{2}$
GLY_HIS_LYS	Orig.	$1.44 \times 10^{10}$	$1.26 \times 10^{6}$	-	$1.44 \times 10^{2}$	-
Amino acids - GLY_HIS_LYS	Prop.	$4.89 \times 10^{7}$	$8.01 \times 10^{4}$	$1.58 \times 10^{1}$	–	$2.94 \times 10^{2}$
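
As a reading aid, the reduction factors reported in the "Prop." rows can be checked by dividing each original value by its fragmented counterpart. The sketch below uses the two Methionine rows. The column meanings (number of Hamiltonian coefficients and Toffoli count) are inferred from the row values and are an assumption of this example, as are the helper name `reduction_factor` and the dictionary keys. Small deviations from the tabulated factors are expected because the inputs are rounded to three significant figures.

```python
def reduction_factor(original: float, proposed: float) -> float:
    """Return how many times smaller the proposed value is than the original."""
    if proposed <= 0:
        raise ValueError("proposed value must be positive")
    return original / proposed


# Values copied from the Methionine (Orig.) and R_met + Base Structures (Prop.)
# rows; the column interpretation is an assumption of this sketch.
methionine = {"coeffs": 1.78e7, "toffoli": 5.67e4}
r_met = {"coeffs": 5.63e5, "toffoli": 1.02e4}

coeff_reduction = reduction_factor(methionine["coeffs"], r_met["coeffs"])
toffoli_reduction = reduction_factor(methionine["toffoli"], r_met["toffoli"])

print(f"coefficient reduction: {coeff_reduction:.1f}x")
print(f"Toffoli reduction:     {toffoli_reduction:.2f}x")
```

The same helper applies to the peptide rows, where the ratios span many orders of magnitude.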

Table 3. Comparison of regression models (Huber and Theil–Sen) for predicting log-transformed qubit counts across molecular complexity segments. Metrics include $R^{2}$, the cross-validated $R^{2}$ (when applicable), the MAE, the RMSE, the standard deviation, and the coefficient of variation. Segment-specific modeling reveals performance variation by molecular size. Cross-validation scores are omitted (–) where data scarcity limits statistical significance.

Segment	Model	R² Total	R² CV	MAE (log)	RMSE (log)	Std Dev (log)	CV (%)
Small (≤150)	Huber $ϵ = 1.1$	0.922	0.807	0.057	0.081	0.288	6.059
Small (≤150)	Huber $ϵ = 1.35$	0.925	0.824	0.057	0.079	0.288	6.059
Small (≤150)	Huber $ϵ = 1.75$	0.929	0.828	0.056	0.076	0.288	6.059
Small (≤150)	Huber $ϵ = 2.0$	0.931	0.831	0.057	0.076	0.288	6.059
Small (≤150)	Theil–Sen	0.927	0.806	0.056	0.078	0.288	6.059
Medium (151–500)	Huber $ϵ = 1.1$	0.978	–	0.039	0.051	0.347	5.741
Medium (151–500)	Huber $ϵ = 1.35$	0.983	–	0.042	0.045	0.347	5.741
Medium (151–500)	Huber $ϵ = 1.75$	0.983	–	0.042	0.046	0.347	5.741
Medium (151–500)	Huber $ϵ = 2.0$	0.983	–	0.042	0.046	0.347	5.741
Medium (151–500)	Theil–Sen	0.977	–	0.041	0.052	0.347	5.741
Large (>500)	Huber $ϵ = 1.1$	0.946	–	0.078	0.093	0.398	5.591
Large (>500)	Huber $ϵ = 1.35$	0.948	–	0.078	0.091	0.398	5.591
Large (>500)	Huber $ϵ = 1.75$	0.956	–	0.075	0.083	0.398	5.591
Large (>500)	Huber $ϵ = 2.0$	0.956	–	0.075	0.083	0.398	5.591
Large (>500)	Theil–Sen	0.912	–	0.071	0.118	0.398	5.591
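
The metrics in Table 3 can be illustrated with a minimal, standard-library sketch of a Theil–Sen fit on log-transformed qubit counts. This is not the authors' code: the estimator is a textbook median-of-pairwise-slopes implementation, the predictor is assumed to be the electron count, the data points are synthetic (a straight line with one injected outlier), and the standard deviation is taken as the population value of the log targets. The names `theil_sen_fit` and `fit_metrics` are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, median, pstdev


def theil_sen_fit(x, y):
    """Slope = median of pairwise slopes; intercept = median of y - slope * x."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    slope = median(slopes)
    intercept = median([yi - slope * xi for xi, yi in zip(x, y)])
    return slope, intercept


def fit_metrics(y_true, y_pred):
    """R^2, MAE, RMSE of the fit, plus spread statistics of the targets."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    y_bar = mean(y_true)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    spread = pstdev(y_true)  # assumption: population standard deviation
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": mean([abs(r) for r in residuals]),
        "rmse": math.sqrt(ss_res / len(residuals)),
        "std": spread,
        "cv_percent": 100.0 * spread / y_bar,
    }


# Synthetic illustration only: log(qubits) = 4.0 + 0.01 * electrons,
# with one outlier added to show the robustness of the median-based fit.
electrons = [40, 56, 64, 70, 80, 96, 108]
log_qubits = [4.0 + 0.01 * n for n in electrons]
log_qubits[3] += 0.5

slope, intercept = theil_sen_fit(electrons, log_qubits)
predictions = [intercept + slope * n for n in electrons]
report = fit_metrics(log_qubits, predictions)

print(f"slope={slope:.4f}, intercept={intercept:.4f}")
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```

Because the slope and intercept are medians, the single outlier does not shift the fitted line, although it still contributes to the MAE and RMSE. The standard deviation and coefficient of variation depend only on the targets, which is consistent with those two columns being identical across models within each segment of Table 3.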

Table 4. Comparison of analytical Toffoli estimates [Equation (20), $C_{1} = C_{2} = 3$] with empirical counts (Toffoli *) from PennyLane’s resource estimation module [39], applied to each fragment’s qubitization circuit. A global OLS fit gives $\langle C_{1} \rangle = 2.64$, $\langle C_{2} \rangle = 3.17$ ($|\Delta C_{1}| = 0.36$, $|\Delta C_{2}| = 0.17$), with a mean relative error of 11.9%. See Appendix C for the $C_{1,i}$, $C_{2,i}$ derivations at different $\varepsilon$. Ver.: Orig. = Original; Prop. = Proposed; # Coeffs = number of Hamiltonian coefficients.

Molecules	Ver.	# Coeffs	Toffoli	Toffoli *	$C_{1}$	$C_{2}$	$Δ C_{1}$ (%)	$Δ C_{2}$ (%)	$Δ T$ (%)
Alanine	Orig.	$2.73 \times 10^{6}$	$2.02 \times 10^{4}$	$1.77 \times 10^{4}$	$2.62$	$3.27$	$3.78 \times 10^{- 1}$	$2.68 \times 10^{- 1}$	$1.26 \times 10^{1}$
R_ala + Base Structures	Prop.	$5.79 \times 10^{4}$	$2.59 \times 10^{3}$	$2.37 \times 10^{3}$	$2.74$	$3.05$	$2.56 \times 10^{- 1}$	$4.66 \times 10^{- 2}$	$8.41$
Histidine	Orig.	$2.38 \times 10^{7}$	$5.67 \times 10^{4}$	$5.10 \times 10^{4}$	$2.70$	$3.14$	$3.02 \times 10^{- 1}$	$1.37 \times 10^{- 1}$	$1.01 \times 10^{1}$
R_his + Base Structures	Prop.	$2.03 \times 10^{6}$	$1.43 \times 10^{4}$	$1.27 \times 10^{4}$	$2.67$	$3.06$	$3.31 \times 10^{- 1}$	$5.99 \times 10^{- 2}$	$1.10 \times 10^{1}$
Leucine	Orig.	$1.62 \times 10^{7}$	$4.02 \times 10^{4}$	$3.45 \times 10^{4}$	$2.58$	$3.30$	$4.24 \times 10^{- 1}$	$2.97 \times 10^{- 1}$	$1.41 \times 10^{1}$
R_leu + Base Structures	Prop.	$5.76 \times 10^{5}$	$1.02 \times 10^{4}$	$8.73 \times 10^{3}$	$2.58$	$3.22$	$4.24 \times 10^{- 1}$	$2.21 \times 10^{- 1}$	$1.41 \times 10^{1}$
Isoleucine	Orig.	$1.64 \times 10^{7}$	$4.02 \times 10^{4}$	$3.42 \times 10^{4}$	$2.56$	$3.13$	$4.45 \times 10^{- 1}$	$1.28 \times 10^{- 1}$	$1.48 \times 10^{1}$
R_ile + Base Structures	Prop.	$5.76 \times 10^{5}$	$1.02 \times 10^{4}$	$9.24 \times 10^{3}$	$2.73$	$3.04$	$2.74 \times 10^{- 1}$	$4.33 \times 10^{- 2}$	$9.10$
Lysine	Orig.	$2.39 \times 10^{7}$	$5.67 \times 10^{4}$	$5.05 \times 10^{4}$	$2.67$	$3.12$	$3.30 \times 10^{- 1}$	$1.22 \times 10^{- 1}$	$1.10 \times 10^{1}$
R_lys + Base Structures	Prop.	$2.25 \times 10^{6}$	$2.02 \times 10^{4}$	$1.81 \times 10^{4}$	$2.69$	$3.13$	$3.07 \times 10^{- 1}$	$1.26 \times 10^{- 1}$	$1.02 \times 10^{1}$
Methionine	Orig.	$1.78 \times 10^{7}$	$5.67 \times 10^{4}$	$4.82 \times 10^{4}$	$2.55$	$3.26$	$4.53 \times 10^{- 1}$	$2.55 \times 10^{- 1}$	$1.51 \times 10^{1}$
R_met + Base Structures	Prop.	$5.63 \times 10^{5}$	$1.02 \times 10^{4}$	$9.31 \times 10^{3}$	$2.75$	$3.23$	$2.52 \times 10^{- 1}$	$2.26 \times 10^{- 1}$	$8.35$
Phenylalanine	Orig.	$3.61 \times 10^{7}$	$8.01 \times 10^{4}$	$7.26 \times 10^{4}$	$2.72$	$3.31$	$2.81 \times 10^{- 1}$	$3.05 \times 10^{- 1}$	$9.36$
R_phe + Base Structures	Prop.	$3.78 \times 10^{6}$	$2.02 \times 10^{4}$	$1.74 \times 10^{4}$	$2.59$	$3.17$	$4.12 \times 10^{- 1}$	$1.73 \times 10^{- 1}$	$1.37 \times 10^{1}$
Threonine	Orig.	$8.36 \times 10^{6}$	$2.85 \times 10^{4}$	$2.45 \times 10^{4}$	$2.58$	$3.06$	$4.19 \times 10^{- 1}$	$6.11 \times 10^{- 2}$	$1.39 \times 10^{1}$
R_thr + Base Structures	Prop.	$1.06 \times 10^{5}$	$3.64 \times 10^{3}$	$3.14 \times 10^{3}$	$2.58$	$3.25$	$4.18 \times 10^{- 1}$	$2.50 \times 10^{- 1}$	$1.37 \times 10^{1}$
Tryptophan	Orig.	$9.24 \times 10^{7}$	$1.13 \times 10^{5}$	$9.82 \times 10^{4}$	$2.61$	$3.26$	$3.93 \times 10^{- 1}$	$2.65 \times 10^{- 1}$	$1.31 \times 10^{1}$
R_trp + Base Structures	Prop.	$1.49 \times 10^{7}$	$4.02 \times 10^{4}$	$3.56 \times 10^{4}$	$2.65$	$3.20$	$3.46 \times 10^{- 1}$	$2.02 \times 10^{- 1}$	$1.15 \times 10^{1}$
Valine	Orig.	$9.82 \times 10^{6}$	$4.02 \times 10^{4}$	$3.53 \times 10^{4}$	$2.63$	$3.27$	$3.66 \times 10^{- 1}$	$2.68 \times 10^{- 1}$	$1.22 \times 10^{1}$
R_val + Base Structures	Prop.	$3.98 \times 10^{5}$	$7.21 \times 10^{3}$	$6.27 \times 10^{3}$	$2.60$	$3.18$	$3.95 \times 10^{- 1}$	$1.80 \times 10^{- 1}$	$1.31 \times 10^{1}$
Arginine	Orig.	$4.36 \times 10^{7}$	$8.01 \times 10^{4}$	$7.13 \times 10^{4}$	$2.67$	$3.19$	$3.28 \times 10^{- 1}$	$1.89 \times 10^{- 1}$	$1.23 \times 10^{1}$
R_arg + Base Structures	Prop.	$5.47 \times 10^{6}$	$2.85 \times 10^{4}$	$2.44 \times 10^{4}$	$2.57$	$3.16$	$4.28 \times 10^{- 1}$	$1.59 \times 10^{- 1}$	$1.42 \times 10^{1}$
Cysteine	Orig.	$6.19 \times 10^{6}$	$2.85 \times 10^{4}$	$2.47 \times 10^{4}$	$2.60$	$3.03$	$3.95 \times 10^{- 1}$	$3.12 \times 10^{- 2}$	$1.32 \times 10^{1}$
R_cys + Base Structures	Prop.	$1.56 \times 10^{5}$	$5.12 \times 10^{3}$	$4.48 \times 10^{3}$	$2.62$	$3.06$	$3.80 \times 10^{- 1}$	$5.74 \times 10^{- 2}$	$1.26 \times 10^{1}$
Glutamine	Orig.	$1.83 \times 10^{7}$	$5.67 \times 10^{4}$	$4.99 \times 10^{4}$	$2.64$	$3.03$	$3.61 \times 10^{- 1}$	$3.31 \times 10^{- 2}$	$1.20 \times 10^{1}$
R_gln + Base Structures	Prop.	$8.73 \times 10^{5}$	$1.02 \times 10^{4}$	$9.18 \times 10^{3}$	$2.71$	$3.23$	$2.91 \times 10^{- 1}$	$2.25 \times 10^{- 1}$	$9.65$
Asparagine	Orig.	$1.13 \times 10^{7}$	$4.02 \times 10^{4}$	$3.46 \times 10^{4}$	$2.59$	$3.12$	$4.15 \times 10^{- 1}$	$1.23 \times 10^{- 1}$	$1.38 \times 10^{1}$
R_asn + Base Structures	Prop.	$3.45 \times 10^{5}$	$7.21 \times 10^{3}$	$6.38 \times 10^{3}$	$2.65$	$3.18$	$3.48 \times 10^{- 1}$	$1.85 \times 10^{- 1}$	$1.15 \times 10^{1}$
Tyrosine	Orig.	$4.85 \times 10^{7}$	$8.01 \times 10^{4}$	$7.12 \times 10^{4}$	$2.67$	$3.31$	$3.32 \times 10^{- 1}$	$3.12 \times 10^{- 1}$	$1.25 \times 10^{1}$
R_tyr + Base Structures	Prop.	$4.32 \times 10^{6}$	$2.85 \times 10^{4}$	$2.42 \times 10^{4}$	$2.55$	$3.10$	$4.47 \times 10^{- 1}$	$1.02 \times 10^{- 1}$	$1.49 \times 10^{1}$
Serine	Orig.	$4.53 \times 10^{6}$	$2.85 \times 10^{4}$	$2.54 \times 10^{4}$	$2.67$	$3.15$	$3.29 \times 10^{- 1}$	$1.54 \times 10^{- 1}$	$1.09 \times 10^{1}$
R_ser + Base Structures	Prop.	$9.70 \times 10^{4}$	$3.64 \times 10^{3}$	$3.14 \times 10^{3}$	$2.58$	$3.26$	$4.21 \times 10^{- 1}$	$2.63 \times 10^{- 1}$	$1.38 \times 10^{1}$
Glycine	Orig.	$1.16 \times 10^{6}$	$1.43 \times 10^{4}$	$1.22 \times 10^{4}$	$2.56$	$3.10$	$4.43 \times 10^{- 1}$	$9.58 \times 10^{- 2}$	$1.47 \times 10^{1}$
R_gly + Base Structures	Prop.	$5.60 \times 10^{4}$	$2.59 \times 10^{3}$	$2.37 \times 10^{3}$	$2.74$	$3.05$	$2.57 \times 10^{- 1}$	$4.76 \times 10^{- 2}$	$8.42$
Aspartic_acid	Orig.	$1.05 \times 10^{7}$	$4.02 \times 10^{4}$	$3.68 \times 10^{4}$	$2.75$	$3.12$	$2.53 \times 10^{- 1}$	$1.15 \times 10^{- 1}$	$8.42$
R_asp + Base Structures	Prop.	$4.31 \times 10^{5}$	$7.21 \times 10^{3}$	$6.53 \times 10^{3}$	$2.71$	$3.07$	$2.86 \times 10^{- 1}$	$7.44 \times 10^{- 2}$	$9.48$
Glutamic_acid	Orig.	$1.72 \times 10^{7}$	$5.67 \times 10^{4}$	$4.93 \times 10^{4}$	$2.61$	$3.32$	$3.93 \times 10^{- 1}$	$3.19 \times 10^{- 1}$	$1.31 \times 10^{1}$
R_glu + Base Structures	Prop.	$1.22 \times 10^{6}$	$1.43 \times 10^{4}$	$1.22 \times 10^{4}$	$2.56$	$3.28$	$4.36 \times 10^{- 1}$	$2.80 \times 10^{- 1}$	$1.45 \times 10^{1}$
Proline	Orig.	$8.37 \times 10^{6}$	$2.85 \times 10^{4}$	$2.55 \times 10^{4}$	$2.69$	$3.22$	$3.12 \times 10^{- 1}$	$2.24 \times 10^{- 1}$	$1.04 \times 10^{1}$
R_pro + Base Structures	Prop.	$1.29 \times 10^{5}$	$3.64 \times 10^{3}$	$3.21 \times 10^{3}$	$2.64$	$3.30$	$3.64 \times 10^{- 1}$	$3.00 \times 10^{- 1}$	$1.19 \times 10^{1}$
Glucagon	Orig.	$4.33 \times 10^{48}$	$2.15 \times 10^{25}$	$1.84 \times 10^{25}$	$2.57$	$3.28$	$4.31 \times 10^{- 1}$	$2.79 \times 10^{- 1}$	$1.44 \times 10^{1}$
Amino acids – Glucagon	Prop.	$5.02 \times 10^{8}$	$2.25 \times 10^{5}$	$1.99 \times 10^{5}$	$2.65$	$3.08$	$3.52 \times 10^{- 1}$	$8.24 \times 10^{- 2}$	$1.17 \times 10^{1}$
Oxytocin	Orig.	$8.85 \times 10^{17}$	$1.01 \times 10^{10}$	$8.56 \times 10^{9}$	$2.55$	$3.31$	$4.50 \times 10^{- 1}$	$3.07 \times 10^{- 1}$	$1.50 \times 10^{1}$
Amino acids – Oxytocin	Prop.	$1.31 \times 10^{8}$	$1.13 \times 10^{5}$	$1.03 \times 10^{5}$	$2.74$	$3.19$	$2.65 \times 10^{- 1}$	$1.95 \times 10^{- 1}$	$8.82$
Vasopressin	Orig.	$7.81 \times 10^{31}$	$8.20 \times 10^{16}$	$7.10 \times 10^{16}$	$2.60$	$3.28$	$4.02 \times 10^{- 1}$	$2.80 \times 10^{- 1}$	$1.34 \times 10^{1}$
Amino acids – Vasopressin	Prop.	$1.76 \times 10^{8}$	$1.60 \times 10^{5}$	$1.43 \times 10^{5}$	$2.68$	$3.31$	$3.17 \times 10^{- 1}$	$3.08 \times 10^{- 1}$	$1.06 \times 10^{1}$
Angiotensin II	Orig.	$2.88 \times 10^{18}$	$2.01 \times 10^{10}$	$1.75 \times 10^{10}$	$2.61$	$3.12$	$3.91 \times 10^{- 1}$	$1.24 \times 10^{- 1}$	$1.30 \times 10^{1}$
Amino acids – Angiotensin II	Prop.	$1.93 \times 10^{8}$	$1.60 \times 10^{5}$	$1.41 \times 10^{5}$	$2.65$	$3.06$	$3.47 \times 10^{- 1}$	$5.81 \times 10^{- 2}$	$1.16 \times 10^{1}$
Kyotorphin	Orig.	$4.41 \times 10^{9}$	$8.95 \times 10^{5}$	$7.93 \times 10^{5}$	$2.66$	$3.10$	$3.41 \times 10^{- 1}$	$9.56 \times 10^{- 2}$	$1.14 \times 10^{1}$
Amino acids – Kyotorphin	Prop.	$8.84 \times 10^{7}$	$1.13 \times 10^{5}$	$9.73 \times 10^{4}$	$2.58$	$3.16$	$4.18 \times 10^{- 1}$	$1.59 \times 10^{- 1}$	$1.39 \times 10^{1}$
Methionine-enkephalin	Orig.	$3.44 \times 10^{12}$	$2.00 \times 10^{7}$	$1.83 \times 10^{7}$	$2.75$	$3.28$	$2.52 \times 10^{- 1}$	$2.83 \times 10^{- 1}$	$8.40$
Amino acids – Methionine-enkephalin	Prop.	$1.03 \times 10^{8}$	$1.13 \times 10^{5}$	$1.02 \times 10^{5}$	$2.71$	$3.30$	$2.93 \times 10^{- 1}$	$2.97 \times 10^{- 1}$	$9.77$
Leucine-enkephalin	Orig.	$2.24 \times 10^{12}$	$2.00 \times 10^{7}$	$1.83 \times 10^{7}$	$2.74$	$3.03$	$2.58 \times 10^{- 1}$	$2.53 \times 10^{- 2}$	$8.62$
Amino acids – Leucine-enkephalin	Prop.	$1.01 \times 10^{8}$	$1.13 \times 10^{5}$	$1.03 \times 10^{5}$	$2.73$	$3.19$	$2.68 \times 10^{- 1}$	$1.85 \times 10^{- 1}$	$8.93$
Tuftsin	Orig.	$5.54 \times 10^{11}$	$1.00 \times 10^{7}$	$8.93 \times 10^{6}$	$2.67$	$3.16$	$3.31 \times 10^{- 1}$	$1.56 \times 10^{- 1}$	$1.10 \times 10^{1}$
Amino acids – Tuftsin	Prop.	$8.22 \times 10^{7}$	$1.13 \times 10^{5}$	$1.03 \times 10^{5}$	$2.74$	$3.09$	$2.62 \times 10^{- 1}$	$9.37 \times 10^{- 2}$	$8.74$
Opiorphin	Orig.	$1.19 \times 10^{14}$	$1.13 \times 10^{8}$	$9.62 \times 10^{7}$	$2.56$	$3.06$	$4.38 \times 10^{- 1}$	$6.12 \times 10^{- 2}$	$1.46 \times 10^{1}$
Amino acids – Opiorphin	Prop.	$1.42 \times 10^{8}$	$1.60 \times 10^{5}$	$1.37 \times 10^{5}$	$2.58$	$3.13$	$4.16 \times 10^{- 1}$	$1.30 \times 10^{- 1}$	$1.38 \times 10^{1}$
Angiotensin IV	Orig.	$1.26 \times 10^{15}$	$4.49 \times 10^{8}$	$3.82 \times 10^{8}$	$2.55$	$3.32$	$4.47 \times 10^{- 1}$	$3.23 \times 10^{- 1}$	$1.49 \times 10^{1}$
Amino acids – Angiotensin IV	Prop.	$1.41 \times 10^{8}$	$1.60 \times 10^{5}$	$1.39 \times 10^{5}$	$2.61$	$3.13$	$3.88 \times 10^{- 1}$	$1.26 \times 10^{- 1}$	$1.29 \times 10^{1}$
Neurotensin	Orig.	$2.20 \times 10^{26}$	$1.62 \times 10^{14}$	$1.41 \times 10^{14}$	$2.63$	$3.19$	$3.75 \times 10^{- 1}$	$1.88 \times 10^{- 1}$	$1.25 \times 10^{1}$
Amino acids – Neurotensin	Prop.	$3.12 \times 10^{8}$	$2.25 \times 10^{5}$	$1.95 \times 10^{5}$	$2.60$	$3.25$	$4.00 \times 10^{- 1}$	$2.47 \times 10^{- 1}$	$1.33 \times 10^{1}$
Bradykinin	Orig.	$4.43 \times 10^{18}$	$2.01 \times 10^{10}$	$1.82 \times 10^{10}$	$2.72$	$3.14$	$2.82 \times 10^{- 1}$	$1.39 \times 10^{- 1}$	$9.40$
Amino acids – Bradykinin	Prop.	$1.86 \times 10^{8}$	$1.60 \times 10^{5}$	$1.39 \times 10^{5}$	$2.62$	$3.33$	$3.82 \times 10^{- 1}$	$3.32 \times 10^{- 1}$	$1.27 \times 10^{1}$
Angiotensin I	Orig.	$3.84 \times 10^{21}$	$6.38 \times 10^{11}$	$5.54 \times 10^{11}$	$2.60$	$3.33$	$3.98 \times 10^{- 1}$	$3.29 \times 10^{- 1}$	$1.33 \times 10^{1}$
Amino acids – Angiotensin I	Prop.	$2.33 \times 10^{8}$	$1.60 \times 10^{5}$	$1.41 \times 10^{5}$	$2.66$	$3.10$	$3.42 \times 10^{- 1}$	$1.03 \times 10^{- 1}$	$1.14 \times 10^{1}$
Gastrin-14	Orig.	$7.03 \times 10^{23}$	$1.02 \times 10^{13}$	$8.71 \times 10^{12}$	$2.57$	$3.18$	$4.27 \times 10^{- 1}$	$1.81 \times 10^{- 1}$	$1.42 \times 10^{1}$
Amino acids – Gastrin-14	Prop.	$2.17 \times 10^{8}$	$1.60 \times 10^{5}$	$1.44 \times 10^{5}$	$2.71$	$3.12$	$2.87 \times 10^{- 1}$	$1.19 \times 10^{- 1}$	$9.58$
GLU_CYS_GLY	Orig.	$5.47 \times 10^{9}$	$8.95 \times 10^{5}$	$7.64 \times 10^{5}$	$2.56$	$3.11$	$4.41 \times 10^{- 1}$	$1.14 \times 10^{- 1}$	$1.47 \times 10^{1}$
Amino acids – GLU_CYS_GLY	Prop.	$2.46 \times 10^{7}$	$5.67 \times 10^{4}$	$5.20 \times 10^{4}$	$2.75$	$3.03$	$2.48 \times 10^{- 1}$	$3.48 \times 10^{- 2}$	$8.28$
ALA_HIS	Orig.	$3.01 \times 10^{8}$	$2.25 \times 10^{5}$	$2.03 \times 10^{5}$	$2.71$	$3.22$	$2.94 \times 10^{- 1}$	$2.17 \times 10^{- 1}$	$9.79$
Amino acids – ALA_HIS	Prop.	$2.66 \times 10^{7}$	$5.67 \times 10^{4}$	$4.89 \times 10^{4}$	$2.58$	$3.18$	$4.15 \times 10^{- 1}$	$1.83 \times 10^{- 1}$	$1.38 \times 10^{1}$
PRO_GLY_PRO	Orig.	$1.87 \times 10^{9}$	$4.49 \times 10^{5}$	$3.81 \times 10^{5}$	$2.54$	$3.04$	$4.56 \times 10^{- 1}$	$3.95 \times 10^{- 2}$	$1.52 \times 10^{1}$
Amino acids – PRO_GLY_PRO	Prop.	$1.79 \times 10^{7}$	$5.67 \times 10^{4}$	$5.13 \times 10^{4}$	$2.72$	$3.11$	$2.85 \times 10^{- 1}$	$1.12 \times 10^{- 1}$	$9.48$
GLY_HIS_LYS	Orig.	$1.44 \times 10^{10}$	$1.26 \times 10^{6}$	$1.13 \times 10^{6}$	$2.69$	$3.31$	$3.08 \times 10^{- 1}$	$3.12 \times 10^{- 1}$	$1.03 \times 10^{1}$
Amino acids – GLY_HIS_LYS	Prop.	$4.89 \times 10^{7}$	$8.01 \times 10^{4}$	$7.20 \times 10^{4}$	$2.70$	$3.10$	$3.03 \times 10^{- 1}$	$9.93 \times 10^{- 2}$	$1.01 \times 10^{1}$
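
To make the error columns of Table 4 easier to interpret, the sketch below recomputes two of them for the first three rows. It assumes that $Δ T$ is the absolute difference between the analytical and empirical Toffoli counts, normalized by the analytical count, and that $Δ C_{1}$ is the absolute deviation of the fitted $C_{1}$ from the analytical value of 3. Both definitions are inferred from the tabulated numbers rather than quoted from the text, and the helper name `relative_error_percent` is hypothetical. Results agree with the table only up to the rounding of the displayed inputs.

```python
C_ANALYTICAL = 3.0  # C1 = C2 = 3, as stated in the Table 4 caption

# (label, analytical Toffoli, empirical Toffoli *, fitted C1), copied from Table 4
rows = [
    ("Alanine (Orig.)", 2.02e4, 1.77e4, 2.62),
    ("R_ala + Base Structures (Prop.)", 2.59e3, 2.37e3, 2.74),
    ("Histidine (Orig.)", 5.67e4, 5.10e4, 2.70),
]


def relative_error_percent(analytical: float, empirical: float) -> float:
    """Relative error in percent, normalized by the analytical estimate (assumed)."""
    return 100.0 * abs(analytical - empirical) / analytical


results = []
for label, t_model, t_empirical, c1 in rows:
    delta_t = relative_error_percent(t_model, t_empirical)
    delta_c1 = abs(C_ANALYTICAL - c1)
    results.append((label, delta_t, delta_c1))
    print(f"{label}: delta_T = {delta_t:.1f}%, |delta_C1| = {delta_c1:.2f}")
```

Under these assumed definitions, the analytical model with $C_{1} = C_{2} = 3$ overestimates the empirical count in each of these rows, since Toffoli exceeds Toffoli * in all three.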

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Atchade-Adelomou, P.; Coronas Sala, L. A Quantum Strategy for the Simulation of Large Proteins: From Fragmentation in Small Proteins to Scalability in Complex Systems. Electronics 2025, 14, 2601. https://doi.org/10.3390/electronics14132601

