Review Reports - A Quantum Strategy for the Simulation of Large Proteins: From Fragmentation in Small Proteins to Scalability in Complex Systems

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This paper proposes a fragment-based scalable quantum protein simulation framework. By systematically decomposing large protein systems into amino acid or small peptide fragments, independent quantum simulations are performed, supplemented by chemically-guided energy-corrected recombination. Regression models are established based on the QMProt dataset to predict quantum resource requirements: a linear model for qubit count based on electron number, and an exponential model for Hamiltonian coefficient scale. Combined with the SelectSwap algorithm to optimize Toffoli gate count, this approach achieves a recombination error below 3% while reducing quantum gate resource requirements by up to 20 orders of magnitude, providing a scalable pathway for quantum simulation of biological macromolecules.

Improvements:

1. Hamiltonian Coefficient Prediction: While an exponential model is used to predict the number of Hamiltonian coefficients, Appendix I indicates significant prediction error (CV = 877.5%). Furthermore, the risk of extrapolating this model to extreme scales (e.g., 10^48 coefficients) is not sufficiently discussed.

Suggestion: Supplement the analysis with model uncertainty quantification. For instance, at the scale of glucagon, does the 95% confidence interval remain practically useful? Compare the performance of alternative models (e.g., polynomial fits or machine learning methods) on large systems to validate the current model's generalizability.

2. Recombination Error Scaling: Table 1 shows recombination errors of only 0.005% for small peptide fragments (<200 electrons), but this rises to 2.8% for larger molecules like glucagon. The paper does not explicitly explain the cause of this error growth, nor does it adequately discuss whether corrections for inter-fragment interactions (e.g., long-range electrostatics, hydrogen bonding) are sufficient.

Suggestion: Quantify the proportional contribution of the ΔE_coupling correction term at different scales to identify the primary source(s) of error.

Suggestion: Supplement the strategy with a multi-level fragmentation approach (e.g., secondary fragmentation for critical residue clusters) and validate its effectiveness in reducing error for large molecules.

3. Toffoli Gate Estimation & Benchmarking: The coefficients (C1=3, C2=3) in the key Toffoli gate estimation formula (Eq. 15) lack experimental calibration and rely solely on theoretical literature [14].

Suggestion: Provide specific parameters of the SelectSwap algorithm used in fragment simulations, such as the impact of the selection threshold E_select on gate count.

Suggestion: Add resource comparisons (e.g., gate count / qubit number for identical fragments) with benchmark algorithms like Double Factorization to highlight the advantageous scenarios of the proposed method.

4. The selection basis for the constants C1=3 and C2=3 in formula (A15) in the article is not fully explained, which may affect the credibility of the results. If there is literature support, please add it or provide a theoretical basis.

5. The relative error of 2-3% for macromolecules (such as Glucagon) in Table 1 has not been compared with classical methods (such as FMO), which cannot reflect quantum advantages.

6. The QMProt dataset only mentions 45 molecules, but does not specify the specific amino acid/peptide composition and electron number distribution.

7. Both Figure 1 and Figure 3 show the increasing trend of coefficients with the number of electrons, and the expression of confidence intervals is repeated.

8. The coupling energy in equation (1) is mixed up with the ΔE_IJ symbol in equation (3).

9. Appendix G only provides GitHub links and does not specify code licenses, dependency libraries, or example inputs.

Comments for author File: Comments.pdf

Author Response

Response to Reviewer

We sincerely thank the reviewers and editors for their time, careful reading, and constructive feedback. Their insightful comments have significantly improved the clarity, rigor, and overall quality of our manuscript titled:

“Quantum Strategy for the Simulation of Large Proteins: From Fragmentation in Small Proteins to Scalability in Complex Systems.”

In this response, we address each comment point-by-point. For clarity, the reviewers’ comments are reproduced in italic font, followed by our detailed replies in regular font. All changes made to the manuscript in response to the comments are clearly noted, and page numbers refer to the revised version of the manuscript unless otherwise specified.

We hope that the revised manuscript meets the expectations of the reviewers and the editorial team. We remain available for any further clarifications or adjustments if necessary.

Reviewer Comment 1:
The exponential model shows high prediction error (CV = 877.5%) and extrapolation risks are not sufficiently discussed. Suggests uncertainty quantification and comparison with alternative models.

Author Response:
Thank you for this valuable observation. Upon closer examination of our original dataset, we discovered several data-entry errors that affected the regression analysis. Specifically, the electron count for the leucine radical was incorrectly recorded as 14 instead of 58, and similar misreporting affected the arginine and tyrosine radicals. These errors introduced significant outliers, which inflated the variance and led to unreliable model metrics such as the reported CV = 877.5%.

To address this, we corrected the dataset and implemented two robust regression models—Huber and Theil–Sen—to improve resistance to outliers and increase generalizability. The results are presented in Table III (page 10) and show substantially improved statistical performance. For example, the standard deviation of the log-transformed residuals is consistently below 0.4 across all molecular size segments.

Additionally, we complemented the exponential regression with local interpolation models (Figure 6) and 95% confidence intervals (Figure 5) to better capture model uncertainty. These steps significantly improve predictive robustness while mitigating the risk of over-extrapolation.
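
The robust-fitting idea in this response can be illustrated with a minimal sketch. It assumes the exponential model is fitted as a straight line in log space, log N = a + b * n_e, and it uses a hand-written Theil–Sen estimator (median of pairwise slopes) rather than the authors' actual pipeline. The data, the function names, and the injected outlier are all hypothetical.

```python
import math
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Theil-Sen line fit: median of pairwise slopes, then median intercept."""
    points = list(zip(x, y))
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if x2 != x1
    ]
    slope = median(slopes)
    intercept = median([yi - slope * xi for xi, yi in points])
    return slope, intercept


def least_squares(x, y):
    """Ordinary least-squares line fit, shown only for contrast."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: coefficient counts following exp(2.0 + 0.15 * n_e).
electrons = [10, 20, 30, 40, 50, 60]
counts = [math.exp(2.0 + 0.15 * n) for n in electrons]
counts[2] /= 1000.0  # simulate one mis-recorded entry (an outlier)

# Fit in log space, where the exponential model becomes a straight line.
log_counts = [math.log(c) for c in counts]
robust_slope, robust_intercept = theil_sen(electrons, log_counts)
ols_slope, ols_intercept = least_squares(electrons, log_counts)

print(f"Theil-Sen: slope={robust_slope:.4f}, intercept={robust_intercept:.4f}")
print(f"OLS:       slope={ols_slope:.4f}, intercept={ols_intercept:.4f}")
```

With a single corrupted point among six, the median-based fit recovers the generating slope and intercept, while the least-squares slope is visibly pulled away. This is the kind of sensitivity a mis-recorded dataset entry can introduce.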

Reviewer Comment 2:
Table 1 shows recombination errors of only 0.005% for small peptide fragments (<200 electrons), but this rises to 2.8% for larger molecules like glucagon. The paper does not explicitly explain the cause of this error growth, nor does it adequately discuss whether corrections for inter-fragment interactions (e.g., long-range electrostatics, hydrogen bonding) are sufficient. Suggests quantifying the proportional contribution of the ΔE_coupling correction term at different scales and supplementing the strategy with a multi-level fragmentation approach.

Author Response:
We thank the reviewer for highlighting this important point. In the revised manuscript, we have substantially rewritten the introduction to Section III (Methodology), explicitly reintroducing the fragmentation equation and clarifying the assumptions and trade-offs inherent to our recombination scheme (see Equation (5), page 3). This formulation builds upon the fragmentation protocol developed in our prior work [33], extending it to support scalable quantum simulations.

We emphasize that our framework applies fixed energy corrections at cutting points based on a chemically-guided capping strategy (e.g., adding methane for methyl caps or subtracting water in peptide bonds), which avoids the need for iterative many-body or self-consistent inter-fragment coupling. These correction values are precomputed and consistently reused, allowing for significant computational savings while maintaining chemical consistency.

However, as the number of fragments increases, so does the cumulative effect of fixed approximation errors. We now explicitly explain in both the Methodology and Discussion sections that this is the primary source of recombination error growth for large systems such as glucagon.
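
The fixed-correction recombination and its error growth can be sketched as follows. This is a simplified illustration, not Equation (5) of the manuscript: it assumes one cut per peptide bond, a single precomputed water energy subtracted per bond, and a constant worst-case error per cut. All energies and function names are hypothetical.

```python
def recombine_energy(fragment_energies, n_peptide_bonds, water_energy):
    """Sum fragment energies and subtract one water energy per peptide bond."""
    return sum(fragment_energies) - n_peptide_bonds * water_energy


def worst_case_error(n_cuts, error_per_cut):
    """Upper bound on accumulated error if every cut adds the same fixed error."""
    return n_cuts * error_per_cut


# Hypothetical energies (arbitrary units), chosen only to keep the arithmetic simple.
fragments = [-1.0, -2.0, -3.0]  # three amino-acid fragments
water = -0.5                    # precomputed correction, reused at every cut
total = recombine_energy(fragments, n_peptide_bonds=2, water_energy=water)
print("recombined energy:", total)

# The bound grows linearly with the number of cuts.
for n_cuts in (2, 10, 28):
    print(n_cuts, "cuts ->", worst_case_error(n_cuts, error_per_cut=0.25))
```

Because the correction is a lookup rather than a self-consistent calculation, the cost per cut is constant, but so is the error it leaves behind. A linear bound of this kind is one simple way to see why longer chains accumulate more error.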

Although we do not yet implement a multi-level fragmentation scheme, we acknowledge its potential and have included it as a direction for future improvement. We also note that this framework is expected to scale favorably over time: once low-depth quantum algorithms are able to simulate each fragment directly (e.g., without Hartree–Fock precomputation), the compounded error from classical approximations will diminish accordingly.

Reviewer Comment 3:
The coefficients (C₁ = 3, C₂ = 3) in the key Toffoli gate estimation formula (Eq. 15) lack experimental calibration and rely solely on theoretical literature [14]. Suggests providing specific parameters of the SelectSwap algorithm used in fragment simulations and comparing with benchmark algorithms.

Author Response:
We are sincerely grateful to the reviewer for raising this point, which led us to substantially enhance the technical depth and structure of the manuscript. In response, we conducted an in-depth review of the state of the art and learned—through technical discussions with members of the quantum computing community and private forums—that Xanadu has been actively developing a resource estimation module for circuit-level cost analysis, accessible via the pennylane.labs.resource_estimation interface.

Although this tool is still under development and not yet part of the stable PennyLane release, it is publicly available via Xanadu’s GitHub repository. Following the reviewer’s suggestion, we integrated this tool into our pipeline to simulate the qubitization circuits of each fragment and to extract empirical Toffoli gate counts.

The results of this benchmarking are presented in Table IV (page 11), where we compare the analytical estimates from Eq. (20) (assuming C₁ = C₂ = 3) against the empirical Toffoli counts returned by PennyLane’s simulation engine. The observed discrepancies are summarized in terms of both absolute and relative error.
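
The two error measures mentioned here can be written down explicitly. The sketch below assumes the relative error is taken with respect to the empirical count. The manuscript may normalize differently, and the numbers are placeholders rather than values from Table IV.

```python
def toffoli_discrepancy(analytical, empirical):
    """Return (absolute error, relative error) of an analytical gate estimate."""
    absolute = abs(analytical - empirical)
    relative = absolute / empirical  # assumption: normalize by the empirical count
    return absolute, relative


# Placeholder counts for one hypothetical fragment.
abs_err, rel_err = toffoli_discrepancy(analytical=1200, empirical=1000)
print(f"absolute error: {abs_err} gates, relative error: {rel_err:.1%}")
```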

To give this validation the importance it deserves, we have included it as a dedicated subsection under Section III (Methodology), titled “Experimental Validation.” Additionally, Appendix C (“Per-Fragment Coefficient Determination”) details the methodology used to extract per-fragment values of C₁ and C₂, including the precision settings and the aggregation process across fragment types. In the Discussion section, we analyze the implications of these results and highlight their importance in reinforcing the predictive fidelity of our cost model.

This deeper validation process also led us to revise the abstract to reflect the strength of this empirical component. We would again like to thank the reviewer for this crucial and impactful suggestion.

Reviewer Comment 4:
The selection basis for the constants C₁ = C₂ = 3 in formula (A15) in the article is not fully explained, which may affect the credibility of the results. If there is literature support, please add it or provide theoretical basis.

Author Response:
We thank the reviewer for this follow-up remark. As discussed in our response to Comment 3, we have incorporated a dedicated “Experimental Validation” subsection into the Methodology section. This includes a comprehensive comparison between theoretical and empirical Toffoli counts, as well as the extraction of fitted values for C₁ and C₂ across all fragments. These results are presented in Table IV (page 11) and further detailed in Appendix C. Given the scope of these additions, we consider this point fully addressed through the extended treatment provided in response to Comment 3.

Reviewer Comment 5:
The relative error of 2–3% for macromolecules (such as Glucagon) in Table 1 has not been compared with classical methods (such as FMO), which cannot reflect quantum advantages.

Author Response:
We thank the reviewer for this observation. As discussed in our response to Comment 2, the main source of recombination error in our current framework stems from the classical approximation methods used to compute fragment energies—namely, the reliance on Hartree–Fock calculations. Our approach deliberately avoids explicit inter-fragment interaction calculations, focusing instead on a scalable fragmentation strategy supported by fixed chemical corrections.

We believe the real advantage of this approach will emerge as logical qubit quality improves, enabling direct execution of quantum algorithms such as Quantum Phase Estimation (QPE) or Qubitization for each fragment, rather than relying on classical precomputations. This would significantly reduce the cumulative error introduced by Hartree–Fock approximations, bringing the accuracy of the full-molecule energy reconstruction closer to exact quantum expectations.

While today’s hardware is not yet capable of executing such routines for full proteins, our architecture is designed with this trajectory in mind. We therefore view this work as laying the foundation for a quantum-native fragmentation scheme that will become increasingly powerful as fault-tolerant quantum resources become available.

Reviewer Comment 6:
The QMProt dataset only mentions 45 molecules, but does not specify the specific amino acid/peptide composition and electron number distribution.

Author Response:
We thank the reviewer for this observation. In this work, we chose not to reproduce the detailed listings of amino acid compositions or electron counts for each molecule in the QMProt dataset, as such information is fully documented in our prior publication [33], which is properly cited.

Our editorial intent was to keep the manuscript focused on the methodological and analytical contributions of this study, rather than reiterating previously published dataset descriptions. Nonetheless, key numerical details necessary to follow and replicate our analysis are included in Tables 1 and 2, which report per-molecule values such as electron counts, coefficient growth, and resource scaling across peptide fragments. These tables serve to anchor the discussion and support transparency without diverting from the core contributions of this paper.

Should the editor request it, we are willing to include additional dataset information in an appendix or supplementary material.

Reviewer Comment 7:
The figure 1 in the article is of little significance and can be deleted or replaced with other figures.

Author Response:
We sincerely thank the reviewer for this helpful suggestion. Following the comment, we have removed the original Figure 1 and reorganized the numbering of all subsequent figures accordingly. This adjustment has allowed us to emphasize more relevant visual content that directly supports the main contributions of the paper. We appreciate the reviewer’s guidance in improving the focus and clarity of the manuscript.

Reviewer Comment 8:
There is a missing link between the discussion on Eqs. (3)–(5) and the methodology description, which may confuse readers.

Author Response:
We thank the reviewer for pointing this out. After a careful review of the manuscript, we believe that the equations in question—Eqs. (3)–(5)—are properly contextualized within the “Related Work” section, where we discuss prior analytical models and theoretical cost formulations.

Nonetheless, we are open to improving the clarity of the manuscript. If the reviewer could kindly indicate the specific location where the linkage seems unclear or where a transitional explanation might be helpful, we would be glad to revise the text accordingly in the final version.

Reviewer Comment 9:
Appendix G only provides GitHub links and does not specify code licenses, dependency libraries, or example inputs.

Author Response:
We thank the reviewer for raising this point. The source code used in our empirical analysis is publicly available on GitHub. The license type is clearly specified in the repository’s README file and follows the terms of the MIT License, which ensures open access and permissive use.

Additionally, all software dependencies—including the development version of the PennyLane resource estimation module—are listed in the same README, allowing full reproducibility of the experiments.

Should the editor or reviewer consider it necessary, we can include this information explicitly in a footnote or an appendix.

Please let us know if further clarifications or additional materials are required. Once again, we are grateful for your thoughtful comments and the opportunity to improve our manuscript.

Sincerely,
The Authors

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

In this work, the authors presented a strategy of simulating proteins, combining many-body expansion and hybrid quantum classical approaches. As a chemist, I have some concerns before I could recommend publication, and I welcome the authors to correct me if I made any mistake:

In lines 128–131, how is the number of coefficients determined? What Hamiltonian are the authors using? If I use a second-quantized Hamiltonian with a smaller active space in my electronic structure method, there shouldn’t be that many coefficients. Are the authors considering the full CI space? With a basis set as small as STO-3G, there is no point in using FCI.

In Figure 2, can the authors explain why there’s an outlier where there are ~34 electrons but only ~15 qubits? Usually if second quantization is applied, the number of qubits will be 2x the number of the orbitals the basis set provides. Z2 reduction will only reduce a small part that keeps the spin/particle number fixed. For biomolecules, there’s usually no notable symmetry. What reductions are used in this case?

Why do the authors not consider using an ansatz like UCCSD and applying VQE instead? VQE can still be implemented on a fault-tolerant quantum device, the result will still be exact with an exact ansatz, and the number of quantum gates could be reduced.

Author Response

Dear Reviewer,

We sincerely thank you for your careful reading and constructive feedback. Your insights — especially regarding the Hamiltonian representation, the interpretation of Figure 2, and the choice of quantum algorithms — have been extremely helpful in refining our manuscript. Below, we respond to each of your comments point by point, and we have updated the manuscript and data accordingly.

Comment 1 (Lines 128–131 – Hamiltonian and coefficient counts):
“In line 128–131, how are the number of coefficients determined? What Hamiltonian are the authors using?”

Author Response:
Thank you for this important and technically insightful question. The Hamiltonians used in our simulations are derived from second-quantized electronic structure calculations at the SCF level (not Full Configuration Interaction, FCI). As explicitly defined in our code, the parameter run_fci=False is set in PySCF. The resulting fermionic Hamiltonians are then mapped to qubit operators using the Jordan–Wigner transformation.

The number of coefficients reported corresponds to the number of Pauli terms in the mapped qubit Hamiltonian.

This approach ensures an accurate count of qubits and the decomposition of the Hamiltonian into measurable operators, which are then formatted for compatibility with PennyLane. The basis set used in all cases is STO-3G, selected for its balance between computational feasibility and consistency across biomolecular fragments.

We would also like to take this opportunity to acknowledge that several outliers in the reported data—due to errors in coefficient extraction and JSON serialization—may have affected the reviewer’s interpretation. These errors have since been corrected in the revised dataset and manuscript. We sincerely apologize for any confusion this may have caused and thank the reviewer again for highlighting these inconsistencies, which helped us identify and resolve the underlying issues.

Comment 2 (Figure 2 – Outlier in qubit/electron ratio):
“In Figure 2, can the authors explain why there's an outlier where there are ∼34 electrons but only ∼15 qubits?”

Author Response:
We thank the reviewer for carefully pointing this out. Upon re-evaluating our dataset, we discovered that the apparent outlier was due to a reporting error in the original JSON output file. Specifically, the isoleucine radical was incorrectly recorded as having only 14 qubits, whereas the correct value is 58. This discrepancy arose from a misalignment in the script used to serialize outputs from PySCF and PennyLane.

We cross-validated the corrected value by comparing with the leucine radical, which shares the same number of electrons and had the correct qubit count. We also reviewed and corrected smaller inconsistencies in the reported qubit counts for tyrosine and arginine.
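
The corrected value can be sanity-checked by counting basis functions. The sketch below assumes no active-space truncation or qubit tapering, one qubit per spin orbital under the Jordan–Wigner mapping, and the standard STO-3G minimal-basis sizes (1 function for H, 5 for C, N and O, 9 for S). Treating the isoleucine radical as a C4H9 side chain is an assumption made here for illustration, not something stated in the response.

```python
# Number of STO-3G basis functions (spatial orbitals) per atom.
STO3G_FUNCTIONS = {"H": 1, "C": 5, "N": 5, "O": 5, "S": 9}
ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8, "S": 16}


def electron_count(formula):
    """Total electrons of a neutral species given as {element: atom count}."""
    return sum(ATOMIC_NUMBER[el] * n for el, n in formula.items())


def jordan_wigner_qubits(formula):
    """Qubits for the full STO-3G space: two spin orbitals per spatial orbital."""
    spatial_orbitals = sum(STO3G_FUNCTIONS[el] * n for el, n in formula.items())
    return 2 * spatial_orbitals


# Assumed composition of the side-chain fragment (hypothetical: C4H9).
side_chain = {"C": 4, "H": 9}
print("electrons:", electron_count(side_chain))
print("qubits:   ", jordan_wigner_qubits(side_chain))
```

Under these assumptions a C4H9 fragment has 33 electrons and needs 58 qubits. That is consistent with the reviewer's figure of roughly 34 electrons and with the corrected qubit count quoted above, and it agrees with the reviewer's rule that the qubit count is twice the number of orbitals.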

The updated dataset and revised Figure 2 now reflect accurate values, and the outlier in question has been resolved. We sincerely apologize for this error and any confusion it may have caused during the review process. We are grateful to the reviewer for drawing attention to this issue, which allowed us to improve the quality and integrity of our data presentation.

Comment 3 (Use of QPE vs. VQE):
“Why do the authors not consider using ansatz like UCCSD and apply VQE instead?”

Author Response:
We fully agree that VQE, especially with chemically motivated ansätze such as UCCSD, remains a valuable and practical approach for near-term quantum devices. However, for the purposes of this study, we chose to focus on Quantum Phase Estimation (QPE).

Our rationale is that QPE provides a true eigenvalue of the Hamiltonian (i.e., the ground-state energy), whereas VQE returns a variational approximation based on expectation values. While VQE is more suited to NISQ-era devices, it may suffer from convergence limitations in larger or more complex systems. Since a major objective of this work is to validate the structure of the Hamiltonians and to explore their utility in fault-tolerant settings, QPE offers a more direct and scalable characterization of energy levels, even though it entails higher circuit depth.

Please let us know if further clarifications or additional materials are required. Once again, we are grateful for your thoughtful comments and the opportunity to improve our manuscript.

Sincerely,
The Authors

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

I thank the authors for their detailed answers to my concerns. I believe the manuscript can be accepted.