Direct Phasing of Protein Crystals with Continuous Iterative Projection Algorithms and Refined Envelope Reconstruction

Liu, Yang; Fu, Ruijiang; Su, Wu-Pei; He, Hongxing

doi:10.3390/biom16020227

¹ Department of Physics, School of Physical Science and Technology, Ningbo University, Ningbo 315211, China

² Department of Physics and Texas Center for Superconductivity, University of Houston, Houston, TX 77204, USA

* Authors to whom correspondence should be addressed.

† These authors contributed equally to this work.

Biomolecules 2026, 16(2), 227; https://doi.org/10.3390/biom16020227

Submission received: 24 December 2025 / Revised: 21 January 2026 / Accepted: 25 January 2026 / Published: 2 February 2026

(This article belongs to the Special Issue State-of-the-Art Protein X-Ray Crystallography)


Abstract

Direct methods provide a model-free approach to solving the crystallographic phase problem and deliver unbiased atomic structures. However, conventional iterative projection algorithms such as Hybrid Input–Output (HIO) face two critical challenges: discontinuous density modification at the protein-solvent boundary and inaccurate molecular envelope reconstruction that fails to account for trapped solvent, particularly in crystals with solvent content approaching the lower limits of direct phasing applicability. We introduced four continuous iterative projection algorithms, including our improved continuous version, which implements smooth density modification at protein-solvent interfaces. To address envelope inaccuracy, we developed a two-step refined reconstruction scheme using sequential large-radius and small-radius Gaussian filters to identify trapped solvent molecules within surface cavities and internal channels. This scheme enhances the performance of both continuous and classical algorithms, including HIO, the difference map, and our improved versions. Benchmarking on 28 protein structures (solvent contents 55–78%, resolutions 1.46–3.2 Å, reported R-factor less than 0.22) showed that the refined envelope scheme increased average success rates of continuous algorithms by 45.7% and classical algorithms by 60.5%. The performance of continuous algorithms and improved classical algorithms proved comparable to the well-established HIO algorithm, forming a top-tier group that exceeded other classical algorithms. Integrating a genetic algorithm co-evolution strategy further enhanced average success rates by approximately 2.5-fold and accelerated convergence through population-wide information sharing. Although the success rate correlates with solvent content, our strategy improved success probability at any given solvent level, extending the practical boundaries of direct methods. 
The high success rate enabled averaging of multiple independent solutions, which reduced mean phase error by approximately 6.83° and yielded atomic models with backbone root-mean-square deviation (RMSD) typically below 0.5 Å relative to structures reported in the Protein Data Bank (PDB). This work introduces novel algorithms, a refined envelope reconstruction methodology, and an effective optimization strategy with genetic algorithm evolution. The complete framework enhances the capability and reliability of direct methods for phasing protein crystals with limited solvent content and provides a toolkit for addressing challenging cases in structural biology.

Keywords: direct methods; phase problem; iterative projection algorithms; continuous density modification; molecular envelope reconstruction; genetic algorithm; high solvent content; protein crystallography

1. Introduction

Protein crystallography remains the primary method for determining three-dimensional structures of biological macromolecules through X-ray diffraction. While diffraction experiments record structure factor amplitudes, the phase information is inherently lost, creating the phase problem. Conventional structure determination approaches, such as molecular replacement, depend on AlphaFold-predicted models [1,2,3] or homologous structures, which may introduce model bias, while experimental phasing methods require heavy atom derivatization and considerable experimental investment. In contrast, direct methods operate independently of prior structural information, offering an unbiased route to structure determination. This makes them valuable for validating predicted structures and for determining novel protein folds through ab initio approaches. Direct methods iteratively enforce physical constraints in real space, such as uniform bulk solvent density and characteristic protein density distribution, while simultaneously satisfying experimental diffraction data in reciprocal space until convergence to the true electron density is achieved.

Direct phasing methods have evolved from small-molecule techniques to macromolecular approaches. Early methods for small molecules exploited statistical relationships in diffraction data, including Sayre’s equation [4], triplet phase relationships [5], tangent formula [6,7], and minimal principle [8], implemented in software such as SHELX [9]. However, these classical techniques require atomic-resolution data better than 1.2 Å and are limited to structures containing fewer than 1000 non-hydrogen atoms [10], rendering them unsuitable for most protein crystals that diffract to resolutions around 2 Å and contain thousands of non-hydrogen atoms. Traditional methods could only provide low-resolution structural information for macromolecules [11]. The breakthrough came with recognizing that additional physical constraints must be maximally exploited, the most powerful being the nearly constant electron density of bulk solvent regions in protein crystals. This led to the development of iterative projection algorithms (IPAs) designed to leverage this constraint.

IPAs are primary computational tools for the direct solution of macromolecular structures by utilizing constraints in both real and reciprocal space. Seminal contributions include the Hybrid Input–Output (HIO) algorithm by Fienup [12] and the Difference Map (DM) algorithm by Elser [13,14], both applied to ab initio protein structure determination. Millane et al. demonstrated using HIO for phasing periodic structures [15,16,17]. Miao et al. and Marchesini et al. demonstrated using HIO for phasing non-periodic structures [18,19]. Lunin et al. explored the use of density connectivity for searching low-resolution structures [20]. Liu et al. have shown that given a reasonable molecular envelope, HIO can recover atomic-resolution protein structures [21]. Further developments include automated envelope generation for HIO by He and Su [22], the application of DM to proteins and viruses by Lo and Millane [23,24,25], workflows incorporating envelope clustering by Kingston et al. [26,27], and partial structure completion with machine-learning models by Pan et al. [28]. Our previous research introduced ab initio non-crystallographic symmetry (NCS) averaging [29], the transition-region-based THIO algorithm [30], and advanced strategies including resolution-weighted phasing [31] and genetic algorithm co-evolution [32]. Despite these advances, existing IPA-based methods remain largely restricted to structures with solvent content exceeding 65% [32]. However, statistical analysis of the Protein Data Bank (PDB) reveals that only 9.6% of deposited structures exceed this threshold (see Appendix A, Figure A1 for the distribution of solvent content). We categorize crystals as low solvent content (<45%), medium solvent content (45–55%), or high solvent content (>55%), with each category comprising approximately one-third of PDB-reported crystal structures. 
This distribution highlights the need for methodological improvements to extend applicability toward lower solvent content within the high-solvent range. Two fundamental limitations of current IPAs restrict their broader application to lower-solvent-content and structurally complex systems.

The first limitation arises from discontinuous density modification at the protein-solvent boundary. Traditional IPAs apply different update rules to protein regions, where density is preserved or adjusted through histogram matching [33], versus bulk solvent regions, where strong negative feedback forces density toward zero. This binary treatment imposes an abrupt mathematical discontinuity at the molecular boundary that contradicts the continuous nature of electron density, propagating errors throughout iteration and hindering convergence. The second limitation concerns inaccurately reconstructed molecular envelopes. Starting from random phases, precisely delineating the true protein boundary during early iterations is difficult. Consequently, researchers employ loose, oversized envelopes to ensure complete protein enclosure. However, this approach encompasses substantial volumes of discrete solvent molecules residing within surface cavities and internal channels. Conventional algorithms mistakenly treat these trapped solvent regions as protein density, failing to apply the powerful constant-density constraint these regions would otherwise provide, representing a waste of valuable prior information. When the overall solvent content is already limited, this inefficiency becomes detrimental to successful phase recovery.

To address these challenges, we developed a framework that enhances phase recovery capability and reliability through innovations at three levels. At the algorithmic level, we introduce continuous IPAs, including established algorithms Continuous Hybrid Input–Output (CHIO) [34], Hybrid Projection Reflection (HPR) [35], and Transition Hybrid Input–Output (THIO) [30], along with our improved variant, Modified CHIO (MCHIO). These algorithms establish transition zones between protein and solvent regions, enabling smooth density modification. At the methodological level, we propose a two-step refined envelope reconstruction scheme. This scheme employs a coarse-to-fine strategy using sequential large-radius and small-radius Gaussian filters to identify and recover solvent erroneously included within coarse envelopes. By incorporating recovered solvent into constrained regions, the scheme maximizes utilization of available information. This methodology enhances performance for both continuous and classical IPAs, including HIO, DM, and our improved versions, Modified Difference Map (MDM) and Modified Relaxed Averaged Alternating Reflections (MRAAR). At the strategic level, we performed optimization comparing three phasing strategies: conventional full-resolution, resolution-weighted progressive [31], and genetic algorithm (GA) co-evolution [32]. The GA strategy establishes information-sharing and inheritance mechanisms among multiple independent reconstruction processes, achieving enhancement in global search capability that elevates success rates.

This work makes several contributions. We provide the first comparison and validation of multiple continuous IPAs for protein direct phasing, establishing their advantages and practical performance. We introduce the two-step refined envelope reconstruction scheme as a method of elevating baseline performance across various algorithms. We develop improved iterative algorithms, including MDM, MCHIO, and MRAAR. Through comprehensive testing on 28 protein structures with diverse space groups, solvent contents ranging from 55% to 78%, resolutions spanning 1.46 to 3.2 Å, and PDB-reported R-work below 0.22, we quantify performance gains under different strategies. For structures with favorable conditions, accurate diffraction data, and suitable molecular packing, our methods successfully phase structures and extend applicability to the lower boundary of the high-solvent-content range. Linear regression analysis suggests the potential applicability boundary reaches approximately 55% solvent content, compared with traditional approaches that typically require solvent contents above 65%. This extends the accessible pool from 9.6% (structures > 65%) to potentially 32.3% (structures > 55%) of PDB structures (Appendix A, Figure A1) when data quality and structural characteristics permit, though success rates decrease substantially as solvent content approaches this lower limit.

While the methods presented extend the practical boundaries of direct phasing, we acknowledge their scope and limitations. Success remains dependent on adequate solvent content and sufficient data quality. Although the GA strategy enhances success rates, it requires greater computational resources, and its effectiveness depends on at least one individual achieving convergence. The GA strategy therefore does not eliminate the dependence of phase recovery on solvent content constraints. For crystals with solvent content approaching or below 55%, integration with higher-quality diffraction data, complementary experimental information, or additional physical constraints will likely remain necessary. Nevertheless, this work provides more powerful computational tools and mechanistic understanding for addressing challenging crystallographic problems, advancing model-free phase retrieval methods toward broader practical applications in structural biology.

2. Materials and Methods

2.1. Overall Workflow and Key Operations in Direct Phasing

Direct phasing in protein crystallography cyclically enforces constraints in both real space and reciprocal space through iterative alternation between these two domains. Figure 1 illustrates the workflow of this process. The algorithm begins with structure factor amplitudes $|F_{obs}(h)|$ obtained from experimentally measured diffraction intensities, where $h$ denotes the reciprocal lattice indices. Since phase information is unavailable from the experiment, an initial set of random phases $\phi_{rand}$ is generated and combined with the observed amplitudes to construct the initial complex structure factors: $|F_{obs}(h)| \exp[i \phi_{rand}(h)]$.

The algorithm then enters an iterative loop where constraints are applied to improve the phase estimates. At the k-th iteration, the current electron density $\rho_{k}(r)$ is first subjected to reciprocal space constraints. This begins with a Fourier transformation converting the real-space density into reciprocal space:

$$F_{cal}(h) = \int \rho(r) \exp[2 \pi i \, h \cdot r] \, dr. \tag{1}$$

The experimental amplitude constraint is enforced by replacing the calculated amplitude $|F_{cal}(h)|$ with the observed amplitude $|F_{obs}(h)|$ while preserving the calculated phase $\phi_{cal}(h)$, yielding an updated structure factor:

$$F'_{cal}(h) = |F_{obs}(h)| \exp[i \phi_{cal}(h)]. \tag{2}$$

This amplitude replacement ensures that the electron density satisfies the experimental diffraction data at each iteration. The modified structure factor is then transformed back to real space through inverse Fourier transformation, producing an updated density:

$$\rho'(r) = V^{-1} \sum_{h} F'_{cal}(h) \exp[-2 \pi i \, h \cdot r], \tag{3}$$

where $V$ represents the unit cell volume, and $r$ denotes the real-space coordinate vector.
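The reciprocal-space step of Equations (1)–(3) can be illustrated on a small one-dimensional periodic grid. The following is only a sketch under stated assumptions: the function names are hypothetical, a naive discrete Fourier transform stands in for a production FFT, and the factor $1/N$ on the grid plays the role of $V^{-1}$. The sign convention follows Equations (1) and (3).

```python
import cmath
import math


def dft(rho):
    """Discrete analogue of Eq. (1): F(h) = sum_r rho(r) exp(+2*pi*i*h*r/N)."""
    n = len(rho)
    return [
        sum(rho[r] * cmath.exp(2j * math.pi * h * r / n) for r in range(n))
        for h in range(n)
    ]


def idft(f):
    """Discrete analogue of Eq. (3); the 1/N factor stands in for 1/V."""
    n = len(f)
    return [
        sum(f[h] * cmath.exp(-2j * math.pi * h * r / n) for h in range(n)) / n
        for r in range(n)
    ]


def project_amplitudes(rho, f_obs):
    """Eq. (2): keep the calculated phases, impose the observed amplitudes.

    Assumes rho is real and f_obs obeys Friedel symmetry, so the result is
    real up to rounding and only the real part is returned.
    """
    f_cal = dft(rho)
    f_new = [a * cmath.exp(1j * cmath.phase(f)) for a, f in zip(f_obs, f_cal)]
    return [value.real for value in idft(f_new)]
```

Applying `project_amplitudes` to a density that already reproduces the observed amplitudes leaves it unchanged, which is the defining property of a projection.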

Following reciprocal space operations, constraints are applied in real space through a procedure comprising two components: molecular envelope reconstruction and density modification based on the reconstructed envelope. The envelope reconstruction step aims to identify the approximate spatial region occupied by the protein molecule, the envelope domain $S$, from the current electron density $\rho'(r)$, which may still contain inaccuracies. This is achieved by computing a weighted average density $w(r)$ through Gaussian low-pass filtering of $\rho'(r)$:

$$w(r) = \int \rho'(r') \, G(|r - r'|; \sigma) \, dr'. \tag{4}$$

In this expression, $G(r; \sigma)$ denotes a Gaussian kernel with standard deviation $\sigma$ that smooths local density fluctuations and emphasizes the contiguous regions of macromolecular structure. Using the estimated solvent content $x_{solv}$ derived from the Matthews coefficient [36] based on unit cell parameters and protein molecular weight, a threshold value $w_{cutoff}$ is determined from the distribution of $w(r)$ values throughout the unit cell. Regions satisfying the condition $w(r) > w_{cutoff}$ are then assigned to the protein region $S$, although this initial envelope remains coarse. Depending on whether a classical or continuous iterative projection algorithm is employed, a transition region $T$ may be defined at the protein-solvent interface. Once the envelope domain $S$ for the current iteration has been established, which may include the transition region $T$ for continuous algorithms, the density modification rules of the chosen algorithm are applied to $\rho'(r)$ to generate the updated density $\rho_{k+1}(r)$ for the subsequent iteration. This density modification step is the core mathematical operation of the algorithm and enforces physical constraints: the protein region density should conform to expected statistical distributions, achieved through histogram matching [33], while the solvent region density should approach a constant value [37].
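The envelope step built on Equation (4) can be sketched on a one-dimensional periodic grid. This is an illustration only: the function names are hypothetical, and choosing $w_{cutoff}$ as the quantile of $w$ that leaves a fraction $x_{solv}$ of grid points below it is an assumption about how the threshold is read off the distribution of $w(r)$.

```python
import math


def gaussian_smooth_periodic(rho, sigma, spacing=1.0):
    """Discrete Eq. (4): periodic convolution with a normalized Gaussian."""
    n = len(rho)
    # Kernel indexed by grid offset, using the minimum-image distance.
    kernel = [
        math.exp(-0.5 * (min(d, n - d) * spacing / sigma) ** 2) for d in range(n)
    ]
    norm = sum(kernel)
    return [
        sum(rho[j] * kernel[(i - j) % n] for j in range(n)) / norm
        for i in range(n)
    ]


def envelope_mask(w, x_solv):
    """Mark as protein the grid points with w above the solvent-fraction cutoff.

    Assumption: the cutoff is the largest of the round(x_solv * N) lowest values.
    """
    n_solvent = round(x_solv * len(w))
    if n_solvent == 0:
        return [True] * len(w)
    w_cutoff = sorted(w)[n_solvent - 1]
    return [value > w_cutoff for value in w]
```

For a block of high density surrounded by flat solvent, smoothing followed by `envelope_mask` with the matching solvent fraction recovers the block as the protein region.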

The iterative cycle repeats for several thousand iterations until convergence is achieved. Throughout this process, multiple quality metrics are computed to monitor progress and evaluate convergence. These include the working R-factor $R_{work}$, the free R-factor $R_{free}$, the density deviations $\Delta\rho$ in protein and bulk solvent regions, and, when a reference structure is available, the mean phase error $\Delta\phi$ and the intersection-over-union ratio of the envelope. Successful convergence is characterized by the reduction of both $R_{work}$ and $R_{free}$ to stable low values, accompanied by improvements in the other monitored metrics. The result of this iterative refinement is an electron density map representing the protein structure. This density map can be utilized by automated model-building software packages such as ARP/wARP version 8.0 [38,39], Buccaneer [40] from the CCP4 software suite version 9 [41], or Phenix (version 1.16-3549) AutoBuild [42,43] to construct and refine the atomic model, producing a three-dimensional protein structure.
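Two of the monitored quantities have simple standard definitions that can be sketched in code. The text does not spell out its exact formulas, so the following assumes the conventional crystallographic R-factor, $\sum \big| |F_{obs}| - |F_{cal}| \big| / \sum |F_{obs}|$, with no scale factor applied, and the usual intersection-over-union for two boolean envelopes. The function names are hypothetical. $R_{work}$ and $R_{free}$ would use the same expression on the working and held-out reflection sets, respectively.

```python
def r_factor(f_obs, f_cal):
    """Conventional R-factor on amplitude lists (assumption: no scaling)."""
    numerator = sum(abs(obs - cal) for obs, cal in zip(f_obs, f_cal))
    return numerator / sum(f_obs)


def envelope_iou(mask_a, mask_b):
    """Intersection over union of two boolean envelope masks."""
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    union = sum(1 for a, b in zip(mask_a, mask_b) if a or b)
    # Two empty envelopes are treated as identical.
    return intersection / union if union else 1.0
```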

2.2. Mathematical Framework and Continuous Modification of Iterative Projection Algorithms

The mathematical foundation of iterative projection algorithms centers on identifying a solution that satisfies two constraint sets. In crystallographic direct phasing, these are the real-space constraints $A$, which encompass the statistical characteristics of protein density distribution and the uniform nature of solvent regions, and the reciprocal-space constraints $B$, which represent the experimentally measured diffraction amplitudes. The objective is to refine the electron density $\rho$ through repeated application of projection operators $P_{A}$ and $P_{B}$ until it converges to a state satisfying both constraint sets.

2.2.1. Partitioned Update Form of Classical Iterative Projection Algorithms

Classical iterative projection algorithms such as the Hybrid Input–Output (HIO) algorithm are implemented using a partitioned update strategy adapted to protein crystals. This approach relies on the premise that at the k-th iteration, a molecular envelope domain $S_{k}$ can be estimated from the current electron density $\rho_{k}(r)$. The projection operator $P_{A}$ incorporates the following: within this envelope domain $S_{k}$, constraints appropriate to protein density are enforced through histogram matching procedures, while in the region outside $S_{k}$ corresponding to the bulk solvent, a constant-density constraint is applied through solvent flattening operations.

The HIO algorithm [12] exemplifies this partitioned approach, with its discretized form for protein crystallography expressed as

$$\rho_{k+1}(r) = \begin{cases} P_{A} P_{B} \rho_{k}(r), & r \in S_{k}, \\ \rho_{k}(r) - \beta P_{B} \rho_{k}(r), & r \notin S_{k}, \end{cases} \tag{5}$$

where $\beta$ represents the negative feedback factor applied in the solvent region, typically assigned values between 0.7 and 0.9, with $\beta = 0.75$ used throughout this study. In this formulation, $P_{B}$ denotes the reciprocal-space projection corresponding to the amplitude replacement operation defined in Equation (2), while $P_{A}$ represents the real-space constraint operations that include histogram matching within the protein region and solvent flattening in the solvent region. We have inserted $P_{A}$ into the partitioned form of HIO. This formulation reveals the origin of the discontinuity problem: at the envelope boundary, the density update rule abruptly transitions from $P_{A} P_{B} \rho_{k}$ inside the protein region to $\rho_{k} - \beta P_{B} \rho_{k}$ in the solvent region, potentially introducing a discontinuous step change in the updated density $\rho_{k+1}$ across this boundary.
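The partitioned rule of Equation (5) acts point by point once the projected densities are available. The sketch below assumes that $P_{B}\rho_{k}$ and $P_{A}P_{B}\rho_{k}$ have already been computed and are passed in as flat lists along with a boolean envelope mask. The function and argument names are hypothetical.

```python
def hio_update(rho, pb_rho, pa_pb_rho, in_envelope, beta=0.75):
    """Eq. (5): constrained density inside S_k, negative feedback outside.

    rho         -- current density rho_k on the grid
    pb_rho      -- P_B rho_k (after amplitude replacement)
    pa_pb_rho   -- P_A P_B rho_k (after real-space constraints)
    in_envelope -- True where the grid point belongs to S_k
    """
    return [
        constrained if inside else current - beta * projected
        for current, projected, constrained, inside in zip(
            rho, pb_rho, pa_pb_rho, in_envelope
        )
    ]
```

Two neighbouring grid points on either side of the envelope boundary can receive very different values from the two branches, which is the step change described above.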

2.2.2. Continuous Iterative Projection Algorithms: Introduction of a Transition Region

To address the discontinuity problem in classical algorithms, continuous iterative projection algorithms introduce a transition region $T_{k}$ at the k-th iteration, positioned at the interface between the protein region $S_{k}$ and the bulk solvent region. This partitioning divides real space into three zones: the protein core region $S_{k}$, the interfacial transition region $T_{k}$, and the bulk solvent region. The objective of these continuous algorithms is to implement a smooth update strategy within the transition region $T_{k}$ that provides gradual and continuous modulation between the density modification rules applied in the protein and solvent regions. We implemented and evaluated four continuous algorithms that achieve this objective through different mathematical approaches. We have updated the original forms of those algorithms to make them suitable for protein crystallography.

The Continuous Hybrid Input–Output (CHIO) algorithm [34] extends the framework of HIO by introducing intermediate feedback behavior in the transition region. Its update rule is formulated as

$$\rho_{k+1}(r) = \begin{cases} P_{A} P_{B} \rho_{k}(r), & r \in S_{k}, \\ \rho_{k}(r) - \gamma P_{B} \rho_{k}(r), & r \in T_{k}, \\ \rho_{k}(r) - \beta P_{B} \rho_{k}(r), & r \notin S_{k} \cup T_{k}, \end{cases} \tag{6}$$

where the transition feedback parameter $\gamma$ is defined as $\gamma = (1 - \alpha) / \alpha$. In the original CHIO formulation [34], the parameter $\alpha$ is typically set to 0.4, which yields $\gamma = 1.5$ and results in negative feedback that is stronger in the transition region than in the bulk solvent region where $\beta \approx 0.75$.

The Hybrid Projection Reflection (HPR) algorithm [35] employs a reflection-projection framework and can be expressed in partitioned form as

$$\rho_{k+1}(r) = \begin{cases} P_{A} P_{B} \rho_{k}(r), & r \in S_{k}, \\ \rho_{k}(r) - \beta_{HPR} P_{B} \rho_{k}(r), & r \notin S_{k}, \end{cases} \tag{7}$$

where $\beta_{HPR}$ is an algorithm-specific parameter set to 0.588 in the original formulation [35]. In typical partitioned implementations, the HPR algorithm applies the same feedback factor uniformly across both the transition region $T_{k}$ and the bulk solvent region, with its continuity manifesting through consistent treatment of the entire region outside the protein core $S_{k}$.

Based on our analysis of CHIO and HPR algorithms, we developed the modified CHIO (MCHIO) algorithm to address a limitation. We observed that the relatively strong feedback in CHIO’s transition region, with $\gamma = 1.5$, might be excessive when $T_{k}$ contains portions of protein side-chain density, potentially leading to inappropriate suppression of structural features. To address this, we propose a modified update rule:

$$\rho_{k+1}(r) = \begin{cases} P_{A} P_{B} \rho_{k}(r), & r \in S_{k}, \\ \rho_{k}(r) - \gamma_{M} P_{B} \rho_{k}(r), & r \in T_{k}, \\ \rho_{k}(r) - \beta P_{B} \rho_{k}(r), & r \notin S_{k} \cup T_{k}, \end{cases} \tag{8}$$

where $\gamma_{M}$ represents the adjusted feedback factor for the transition region. To achieve more appropriate behavior that is gentler than bulk solvent treatment while effectively leveraging the solvent constraint, we set $\gamma_{M} = 0.5$ throughout this study. This value is chosen to be lower than both $\beta = 0.75$ and $\beta_{HPR} = 0.588$, thereby implementing a gentler, more conservative density modification in the transition region compared with the bulk solvent. This adjustment aims to avoid excessive suppression of potential protein density features that might be incorrectly assigned to the transition region $T_{k}$, while still effectively applying the solvent-like constraint.
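CHIO (Equation (6)) and MCHIO (Equation (8)) share the same three-zone structure and differ only in the transition feedback factor, so one sketch can cover both. The zone labels, function names, and the representation of zones as a list of strings are assumptions made for illustration. The projected densities are taken as precomputed inputs.

```python
PROTEIN, TRANSITION, SOLVENT = "S", "T", "solvent"


def chio_gamma(alpha=0.4):
    """Transition feedback of CHIO: gamma = (1 - alpha) / alpha."""
    return (1.0 - alpha) / alpha


def three_zone_update(rho, pb_rho, pa_pb_rho, zones, gamma, beta=0.75):
    """Eqs. (6) and (8): separate rules for S_k, T_k and bulk solvent."""
    updated = []
    for current, projected, constrained, zone in zip(rho, pb_rho, pa_pb_rho, zones):
        if zone == PROTEIN:
            updated.append(constrained)
        elif zone == TRANSITION:
            updated.append(current - gamma * projected)
        else:
            updated.append(current - beta * projected)
    return updated


# CHIO uses gamma = chio_gamma(0.4), i.e. about 1.5; MCHIO uses gamma_M = 0.5.
GAMMA_CHIO = chio_gamma(0.4)
GAMMA_MCHIO = 0.5
```

With $\gamma_{M} = 0.5$ a transition-zone point is pushed less strongly than a bulk-solvent point at $\beta = 0.75$, whereas with the CHIO value of about 1.5 it is pushed more strongly.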

The Transition Hybrid Input–Output (THIO) algorithm, introduced in our previous research [30], implements a density-weighted assignment approach within the transition region. Its updated formulation is

$$\rho_{k+1}(r) = \begin{cases} P_{A} P_{B} \rho_{k}(r), & r \in S_{k}, \\ \omega_{k}(r) P_{A} P_{B} \rho_{k}(r) + [1 - \omega_{k}(r)] [\rho_{k}(r) - \beta P_{B} \rho_{k}(r)], & r \in T_{k}, \\ \rho_{k}(r) - \beta P_{B} \rho_{k}(r), & r \notin S_{k} \cup T_{k}, \end{cases} \tag{9}$$

where the weight function $\omega_{k}(r)$ takes values in $[0, 1]$ and is computed based on the weighted average density at each grid point $r$ using a small Gaussian kernel radius of approximately $\sigma = 1.5$ Å. This weight represents the local probability that a given position belongs to the protein rather than solvent. Through this linear interpolation scheme, THIO achieves continuous and physically motivated density modulation throughout the transition region, with the update rule smoothly varying based on local density characteristics.
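The interpolation in Equation (9) can be sketched as follows. How $\omega_{k}(r)$ is derived from the smoothed density is not detailed here, so the weights are simply taken as a precomputed input in $[0, 1]$. The names and zone labels are hypothetical.

```python
PROTEIN, TRANSITION, SOLVENT = "S", "T", "solvent"


def thio_update(rho, pb_rho, pa_pb_rho, zones, weights, beta=0.75):
    """Eq. (9): blend the protein and solvent rules inside T_k.

    weights -- omega_k(r) in [0, 1]; only used where zone == TRANSITION
    """
    updated = []
    for current, projected, constrained, zone, omega in zip(
        rho, pb_rho, pa_pb_rho, zones, weights
    ):
        solvent_rule = current - beta * projected
        if zone == PROTEIN:
            updated.append(constrained)
        elif zone == TRANSITION:
            updated.append(omega * constrained + (1.0 - omega) * solvent_rule)
        else:
            updated.append(solvent_rule)
    return updated
```

A weight of 1 reproduces the protein rule and a weight of 0 reproduces the bulk-solvent rule, so the update varies continuously across the transition region as the weight falls from the protein side to the solvent side.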

The fundamental concept unifying all these continuous algorithms is the mathematical refinement of electron density update behavior across the molecular boundary to achieve smoother transitions. This enhancement is designed to improve numerical stability and increase the likelihood of convergence when dealing with ambiguous boundaries and limited solvent content. Figure 2 provides a schematic comparison of the update rules employed by these algorithms, illustrating their distinctive characteristics and the progressive refinement of boundary treatment strategies.

2.2.3. Classical Iterative Projection Algorithms and Improved Variants

The two-step refined envelope reconstruction scheme demonstrated in the subsequent section exhibits universal applicability, providing performance enhancement for both continuous and classical iterative projection algorithms. To establish a comprehensive algorithmic framework, we evaluated classical iterative projection algorithms, including HIO, DM, ASR, and RAAR, and developed improved variants, MDM and MRAAR, optimized for protein crystallography. The HIO algorithm has been presented in the previous section, following the update rule in Equation (5).

The Difference Map (DM) algorithm [13] generates two candidate solutions at the k-th iteration:

$$\rho_{k,A} = P_{A} \left[ (1 + \beta_{DM}^{-1}) P_{B} \rho_{k} - \beta_{DM}^{-1} \rho_{k} \right], \tag{10}$$

$$\rho_{k,B} = P_{B} \left[ (1 - \beta_{DM}^{-1}) P_{A} \rho_{k} + \beta_{DM}^{-1} \rho_{k} \right]. \tag{11}$$

These solutions satisfy real-space constraint set $A$ and reciprocal-space constraint set $B$, respectively. The DM iteration formula is

$$\rho_{k+1} = \rho_{k} + \beta_{DM} (\rho_{k,A} - \rho_{k,B}), \tag{12}$$

where $\beta_{DM}$ is the relaxation parameter (typically 0.5 to 1.0, set to 0.75 in this study) controlling convergence speed and search capability. Convergence is achieved when the two candidate solutions become equal. Considering partitioned updates for protein region $S_{k}$ and solvent region, the DM formula can be expressed as

$$\rho_{k+1}(r) = \begin{cases} \rho_{k}(r) + \beta_{DM} [\rho_{k,A}(r) - \rho_{k,B}(r)], & r \in S_{k}, \\ \rho_{k}(r) - \beta_{DM} \rho_{k,B}(r), & r \notin S_{k}. \end{cases} \tag{13}$$
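The operator form of Equations (10)–(12) can be sketched with the two projections passed in as functions. The toy constraint sets below (second coordinate fixed at 0 for $A$, first coordinate fixed at 1 for $B$) are purely illustrative assumptions and are not the crystallographic projections; they are chosen only because their intersection, the point (1, 0), is easy to verify by hand.

```python
def dm_step(rho, project_a, project_b, beta=0.75):
    """One difference-map iteration following Eqs. (10)-(12).

    Returns the new iterate together with the two candidate solutions.
    """
    inv = 1.0 / beta
    pa_rho = project_a(rho)
    pb_rho = project_b(rho)
    # Eq. (10): rho_A = P_A[(1 + 1/beta) P_B rho - (1/beta) rho]
    rho_a = project_a([(1 + inv) * b - inv * x for b, x in zip(pb_rho, rho)])
    # Eq. (11): rho_B = P_B[(1 - 1/beta) P_A rho + (1/beta) rho]
    rho_b = project_b([(1 - inv) * a + inv * x for a, x in zip(pa_rho, rho)])
    # Eq. (12): rho_{k+1} = rho_k + beta (rho_A - rho_B)
    new_rho = [x + beta * (a - b) for x, a, b in zip(rho, rho_a, rho_b)]
    return new_rho, rho_a, rho_b


def toy_project_a(x):
    """Toy set A: second coordinate equals 0 (illustration only)."""
    return [x[0], 0.0]


def toy_project_b(x):
    """Toy set B: first coordinate equals 1 (illustration only)."""
    return [1.0, x[1]]
```

Starting from an arbitrary point, iterating `dm_step` with these toy projections reaches the intersection of the two sets, where the two candidate solutions coincide, which is the convergence criterion stated above.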

Experimental results revealed that DM exhibits low success rates for protein phase retrieval, primarily due to insufficient constraint enforcement within the protein region during iteration. To address this, we propose an improved variant that explicitly applies constraint operators $P_{A}$ and $P_{B}$ to the protein region:

$$\rho_{k+1}(r) = \begin{cases} P_{A} P_{B} \rho_{k}(r) + \beta_{DM} [\rho_{k,A}(r) - \rho_{k,B}(r)], & r \in S_{k}, \\ \rho_{k}(r) - \beta_{DM} \rho_{k,B}(r), & r \notin S_{k}. \end{cases} \tag{14}$$

This improved formulation, designated Modified DM (MDM), strengthens constraint application in the protein region and enhances phase retrieval performance.

The Averaged Successive Reflections (ASR) algorithm [44] employs reflection operators $R_{A} = (2 P_{A} - I)$ and $R_{B} = (2 P_{B} - I)$, where $I$ is the identity operator. The ASR update rule is

$$\rho_{k+1}(r) = \frac{1}{2} (R_{A} R_{B} + I) \rho_{k}(r). \tag{15}$$

Substituting the reflection operators and considering partitioned updates yields

$$\rho_{k+1}(r) = \begin{cases} \rho_{k}(r) + (2 P_{A} P_{B} - P_{A} - P_{B}) \rho_{k}(r), & r \in S_{k}, \\ \rho_{k}(r) - P_{B} \rho_{k}(r), & r \notin S_{k}. \end{cases} \tag{16}$$

The Relaxed Averaged Alternating Reflections (RAAR) algorithm [45] introduces a relaxation parameter $\beta_{RAAR}$ (typically 0.95) with the update rule:

$$\rho_{k+1}(r) = \left[ \frac{1}{2} \beta_{RAAR} (R_{A} R_{B} + I) + (1 - \beta_{RAAR}) P_{B} \right] \rho_{k}(r). \tag{17}$$

In partitioned form, this becomes

$$\rho_{k+1}(r) = \begin{cases} \beta_{RAAR} \rho_{k}(r) + (2 \beta_{RAAR} P_{A} P_{B} - \beta_{RAAR} P_{A} - 2 \beta_{RAAR} P_{B} + P_{B}) \rho_{k}(r), & r \in S_{k}, \\ \beta_{RAAR} \rho_{k}(r) - (2 \beta_{RAAR} - 1) P_{B} \rho_{k}(r), & r \notin S_{k}. \end{cases} \tag{18}$$
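The step from the reflection form in Equation (17) to the expanded protein-region branch of Equation (18) can be checked numerically. The sketch below uses hypothetical function names and toy projections that are not the crystallographic ones. The expanded form coincides with the operator form when $P_{A}$ acts linearly, as the toy $P_{A}$ here does.

```python
def reflect(project, rho):
    """Reflection R = 2P - I applied to rho."""
    return [2 * p - x for p, x in zip(project(rho), rho)]


def raar_step(rho, project_a, project_b, beta=0.95):
    """Eq. (17): [beta/2 (R_A R_B + I) + (1 - beta) P_B] rho."""
    ra_rb = reflect(project_a, reflect(project_b, rho))
    pb_rho = project_b(rho)
    return [
        0.5 * beta * (rr + x) + (1 - beta) * b
        for rr, x, b in zip(ra_rb, rho, pb_rho)
    ]


def asr_step(rho, project_a, project_b):
    """Eq. (15): RAAR with beta = 1 reduces to ASR."""
    return raar_step(rho, project_a, project_b, beta=1.0)


def raar_expanded(rho, project_a, project_b, beta=0.95):
    """Protein-region branch of Eq. (18), written out term by term."""
    pa_rho = project_a(rho)
    pb_rho = project_b(rho)
    pa_pb_rho = project_a(pb_rho)
    return [
        beta * x + 2 * beta * ab - beta * a - 2 * beta * b + b
        for x, ab, a, b in zip(rho, pa_pb_rho, pa_rho, pb_rho)
    ]


def toy_project_a(x):
    """Toy linear projection: second coordinate set to 0 (illustration only)."""
    return [x[0], 0.0]


def toy_project_b(x):
    """Toy projection: first coordinate set to 1 (illustration only)."""
    return [1.0, x[1]]
```

Evaluating `raar_step` and `raar_expanded` at the same point gives the same result for these toy operators, and setting `beta=1.0` reproduces the protein-region branch of Equation (16).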

Both ASR and RAAR exhibited low success rates in our experiments, attributable to insufficient constraint enforcement in the protein region. We developed an improved variant by setting $\beta_{RAAR} = 1$, which reduces RAAR to the ASR formulation, then explicitly applying operators $P_{A} P_{B}$ to the protein region:

$$\rho_{k+1}(r) = \begin{cases} P_{A} P_{B} \rho_{k}(r) + (2 P_{A} P_{B} - P_{A} - P_{B}) \rho_{k}(r), & r \in S_{k}, \\ \rho_{k}(r) - P_{B} \rho_{k}(r), & r \notin S_{k}. \end{cases} \tag{19}$$

This improved algorithm, designated Modified RAAR (MRAAR) or Modified ASR (MASR), enhances performance. This formulation is mathematically equivalent to the Hybrid Difference Map - formula 1 (HDM-f1) algorithm developed in our concurrent work [46], which was derived from a different starting point through modification of the DM framework, representing a convergence of approaches toward optimal algorithm design. The improved classical algorithms, when combined with the two-step refined envelope reconstruction scheme, expand the available toolkit for direct phasing of protein crystals.

2.3. Envelope Reconstruction Strategies: From One-Step Coarse Design to Two-Step Refined Design

The effectiveness of iterative projection algorithms, in particular their capacity to enforce constraints within solvent regions, depends on the accuracy of the reconstructed protein molecular envelope. Traditional envelope reconstruction employs a single-step approach in which the k-th iteration’s electron density $\rho_{k}(r)$ is convolved with a Gaussian kernel, usually with a large radius of $\sigma_{1} \approx 2.5$–$4.0$ Å, to produce a smoothed weighted average density map $w_{k}(r)$. Based on the estimated solvent content $x_{solv}$ derived from the Matthews coefficient [36], a threshold value $w_{cutoff}$ is determined from the statistical distribution of $w_{k}(r)$ values. Grid points satisfying the condition $w_{k}(r) > w_{cutoff}$ are classified as belonging to the protein region, while the remaining volume is the solvent region. This method offers computational simplicity and can rapidly delineate the general molecular shape during early iteration stages. However, it suffers from limitations that become problematic for challenging structures. The large-scale smoothing operation erases fine structural details at the molecular surface, resulting in boundaries that are rough and imprecise. The resulting coarse envelope tends to be over-expanded to ensure complete enclosure of the protein molecule, and it therefore incorporates solvent molecules situated within surface crevices, pockets, and internal channels. In classical iterative algorithms, these incorrectly assigned solvent regions are treated as protein density throughout the modification process, failing to exploit the constant-density constraint that these regions would otherwise provide. This inefficient use of available information becomes detrimental when the overall solvent content is limited.

To overcome these deficiencies and maximize the utilization of all available solvent constraints within the crystal unit cell, we developed a two-step refined envelope reconstruction scheme. The concept underlying this scheme is a strategy that proceeds from coarse localization to fine-scale local refinement. This approach provides a more physically rational foundation for defining the transition region in continuous algorithms and functions as a universal technique capable of enhancing the performance of all iterative projection algorithms.

The first step focuses on coarse envelope generation following an approach similar to the traditional single-step method. A Gaussian kernel with a large radius, set to

σ_{1} = 2.5

Å in this study, is applied to smooth the k-th iteration’s electron density

ρ_{k} (r)

, yielding a globally smoothed density distribution

w_{k}^{(1)} (r)

. Using the estimated solvent content

x_{solv}

, a threshold value

w_{cutoff}^{(1)}

is determined to define an initial coarse envelope

M_{coarse}

that serves as the protein candidate region:

M_{coarse} = {r ∣ w_{k}^{(1)} (r) > w_{cutoff}^{(1)}} .

(20)

The objective of this initial step is to capture the protein’s center of mass and overall molecular shape while avoiding artificial fragmentation of the envelope that could arise from localized density fluctuations. By employing a large smoothing kernel, this step ensures robust identification of the main protein volume even when the current density estimate contains noise or systematic errors.

The second step is the refinement stage that distinguishes our scheme from traditional approaches. Here, a second smoothing operation is performed on the same electron density

ρ_{k} (r)

using a Gaussian kernel with a smaller radius, set to

σ_{2} = 1.5

Å in this work, which produces a locally refined density map

w_{k}^{(2)} (r)

that preserves fine-scale density variations. All subsequent operations in this step are confined to the interior of the coarse envelope

M_{coarse}

established in the first step. Within this domain, the values of

w_{k}^{(2)} (r)

at all grid points are ranked by magnitude, and the lowest-ranked grid points, amounting to approximately 5% of the asymmetric unit, are identified as the transition region

T_{k}

, which most likely represents trapped solvent rather than protein density. This 5% threshold was determined empirically from our benchmark set and proved robust across diverse protein structures. For truly novel structures without homologous or predicted references, this value can be adjusted within a range of 3–7% through systematic trials, as the method tolerates moderate variations in this parameter. The identified low-density regions are then removed from the protein candidate volume to yield refined spatial assignments. We define the refined protein core region as

S_{k} = M_{coarse} - T_{k}

, representing the volume that remains after removing the identified solvent-like regions from the coarse envelope. The refined transition region is defined as

T_{k} = {r \in M_{coarse} ∣ w_{k}^{(2)} (r) is within the lowest \sim 5 % percentile} .

(21)
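Equations (20) and (21) can be illustrated with the following sketch. It is written under stated assumptions rather than taken from the authors' code: the map is periodic on an orthogonal grid with about 1 Å spacing, the grid is treated as a single asymmetric unit, and the names `smooth` and `two_step_envelope` are hypothetical.

```python
import numpy as np

def smooth(rho, sigma_grid):
    """Periodic Gaussian smoothing via FFT; sigma is given in grid steps."""
    freqs = np.meshgrid(*[np.fft.fftfreq(n) for n in rho.shape], indexing="ij")
    k2 = sum(f**2 for f in freqs)
    kernel_ft = np.exp(-2.0 * (np.pi * sigma_grid) ** 2 * k2)
    return np.real(np.fft.ifftn(np.fft.fftn(rho) * kernel_ft))

def two_step_envelope(rho, solvent_fraction, strip_fraction=0.05,
                      sigma1=2.5, sigma2=1.5):
    """Return (core S_k, transition T_k, coarse envelope M_coarse) as boolean maps."""
    # Step 1, Eq. (20): coarse envelope from the large-radius filter.
    w1 = smooth(rho, sigma1)
    m_coarse = w1 > np.quantile(w1, solvent_fraction)

    # Step 2, Eq. (21): rank the small-radius map inside M_coarse only.
    w2 = smooth(rho, sigma2)
    inside = np.flatnonzero(m_coarse)
    # Assumption: the strip fraction is counted against the whole grid,
    # which stands in for the asymmetric unit here.
    n_strip = min(int(round(strip_fraction * rho.size)), inside.size)
    order = np.argsort(w2.ravel()[inside], kind="stable")

    transition = np.zeros(rho.size, dtype=bool)
    transition[inside[order[:n_strip]]] = True  # lowest-density points in M_coarse
    transition = transition.reshape(rho.shape)

    core = m_coarse & ~transition  # S_k = M_coarse - T_k
    return core, transition, m_coarse
```

By construction the core and transition regions are disjoint and together tile the coarse envelope, so every grid point belongs to exactly one of core, transition, or external solvent.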

The bulk solvent region comprises both the external volume lying outside

M_{coarse}

and the internal stripped region

T_{k}

, reclassified based on low local density. The transition region identifies trapped solvent molecules within surface cavities and internal channels, enabling the application of solvent-like constraints during iteration. This enhances convergence for crystals with limited solvent content by maximizing utilization of available constraint information. However, although physically consisting of trapped solvent, the transition region exhibits slightly non-uniform density due to proximity to the protein surface. Negative feedback applied during iteration drives density toward uniform values, deviating from actual surface solvent density. This slightly non-uniform density is incompatible with the strict uniform-density requirements of comprehensive solvent flattening [37] in final-stage refinement. Therefore, in the last 2000 iterations, the transition region is linearly reduced from 5% (of the asymmetric unit) to zero over 1500 iterations, followed by 500 iterations of solvent flattening applied uniformly to all trials. Removing the transition region eliminates density constraint errors from surface-proximal solvent and enables optimal uniform-density enforcement in true bulk solvent regions.
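The end-of-run schedule for the transition region can be written as a small function of the iteration index. The sketch below assumes a fixed run length of 10,000 iterations with no early termination; the function name and its parameters are illustrative only.

```python
def transition_fraction(iteration, total_iterations=10_000, initial_fraction=0.05,
                        shrink_iterations=1500, flatten_iterations=500):
    """Fraction of the asymmetric unit assigned to the transition region."""
    # The shrink phase starts 2000 iterations before the end (iteration 8000 by default).
    shrink_start = total_iterations - shrink_iterations - flatten_iterations
    if iteration < shrink_start:
        return initial_fraction
    if iteration >= shrink_start + shrink_iterations:
        # Final phase: plain solvent flattening, no transition region.
        return 0.0
    progress = (iteration - shrink_start) / shrink_iterations
    return initial_fraction * (1.0 - progress)
```

With the default values, the fraction stays at 5% up to iteration 8000, reaches 2.5% at iteration 8750, and is zero from iteration 9500 onward.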

This two-step scheme offers three advantages over traditional single-step methods. First, the small-radius filter is sensitive to local low-density features corresponding to surface cavities and internal channels, enabling accurate identification of trapped solvent. Second, the scheme maximizes constraint utilization by identifying trapped solvent within the coarse envelope. Although the transition region

T_{k}

is spatially adjacent to protein, it exhibits low-density characteristics of solvent. During iteration, this region is treated with solvent constraints, such as negative feedback in continuous algorithms, increasing the effective constraint volume. As discussed above, the transition region should be removed in the final refinement stage to enable strict solvent flattening. Third, for continuous algorithms, the transition region

T_{k}

delineated through refined local density analysis is less likely to contain protein side-chain density compared with regions defined by traditional single-step approaches, improving the purity and effectiveness of applied constraints.

Section 3.3 provides a direct visual comparison using protein structures 3rd5 [47] and 2fg0 [48] as examples, demonstrating that envelopes generated by our two-step method reconstruct finer and more accurate molecular surface details compared with those produced by the traditional one-step approach. Although multi-step refinement with progressively decreasing kernel sizes was tested, we found that the two-step approach represents an optimal balance between envelope accuracy and computational efficiency. Additional refinement steps beyond two did not improve phasing success rates, as both one-step and two-step envelopes represent approximations to the true molecular boundary, and the two-step scheme already provides sufficient accuracy for iterative projection algorithms to converge to correct solutions.

2.4. Phase Retrieval Strategies: From Full-Resolution to Genetic Co-Evolution

We implemented and compared three phasing strategies to search the solution space for correct phases. These strategies optimize data utilization and search organization to enhance success rate and computational efficiency.

2.4.1. Full-Resolution Phasing Strategy

The full-resolution approach represents the fundamental strategy. All available experimental diffraction data, excluding only a reserved free set for cross-validation, participate equally in the reciprocal-space amplitude constraint at every iteration. This strategy applies no preprocessing or weighting to diffraction data, requiring the algorithm to simultaneously satisfy constraints spanning the entire resolution range. The advantages are simplicity and straightforward implementation. However, for high-dimensional optimization problems, requiring immediate fitting of structural information across all spatial scales can create convergence difficulties and entrapment in local minima for structures with limited solvent content.

2.4.2. Resolution-Weighted Progressive Phasing Strategy

The resolution-weighted progressive strategy implements a hierarchical data utilization scheme prioritizing low-resolution diffraction information during initial iterations, then introducing high-resolution data as convergence progresses. This is implemented through a time-dependent weighting function applied to observed structure factor amplitudes:

| F_{obs, w} (h) | = | F_{obs} (h) | \cdot \exp [- 2 {(π σ_{w} S (h))}^{2}],

(22)

where

S (h) = 1 / d (h)

represents the reciprocal of resolution spacing, and

σ_{w}

is a time-varying parameter controlling filter bandwidth. Initially,

σ_{w}

is set to 0.8∼1.0 Å, attenuating high-resolution contributions. As iterations progress,

σ_{w}

is reduced toward zero following a predetermined annealing schedule [31]. This progressive expansion from low to high spatial frequencies follows the logic of first establishing global protein fold before resolving atomic positions, facilitating the establishment of correct low-resolution phase relationships that provide a stable foundation for convergence.
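Equation (22) translates directly into code. In the sketch below, `resolution_weighted_amplitudes` implements the weighting itself, while `sigma_w_schedule` is a hypothetical linear schedule included only to make the example complete; the annealing schedule actually used is the one described in [31].

```python
import numpy as np

def resolution_weighted_amplitudes(f_obs, d_spacing, sigma_w):
    """Apply Eq. (22): |F_obs| * exp(-2 (pi * sigma_w * S)^2), with S = 1/d."""
    s = 1.0 / np.asarray(d_spacing, dtype=float)
    weight = np.exp(-2.0 * (np.pi * sigma_w * s) ** 2)
    return np.asarray(f_obs, dtype=float) * weight

def sigma_w_schedule(iteration, n_anneal, sigma_start=1.0):
    """Assumed linear decay of sigma_w from sigma_start to zero (illustrative only)."""
    return sigma_start * max(0.0, 1.0 - iteration / n_anneal)
```

For σ_w = 1.0 Å, a reflection at d = 2 Å is scaled by about 0.007, whereas a reflection at d = 20 Å retains about 0.95 of its amplitude, which shows how strongly the early iterations are dominated by low-resolution data. At σ_w = 0 the weights are all 1 and the observed amplitudes are used unchanged.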

2.4.3. Genetic Algorithm-Enhanced Co-Evolution Phasing Strategy

To overcome limitations of independent random-start searches, we implemented a genetic algorithm co-evolution strategy using population-based intelligence. This strategy treats multiple parallel phase retrieval processes, typically 100 individuals, as an evolutionary population, with each individual’s electron density map representing an organism. By simulating biological evolution through selection, crossover, and mutation, this approach establishes information exchange mechanisms enabling high-quality density features to propagate throughout the population, enhancing global search capability.

The workflow proceeds through several interconnected stages. Population initialization generates N individuals, each with random phases to ensure diversity. During independent evolution, all individuals execute a predetermined number of iterations in parallel using standard iterative projection algorithms, together with the envelope reconstruction schemes and resolution-weighted strategies described above. This stage acts as a local search within each individual’s neighborhood of the solution space. Periodically, typically every 100 iterations, the algorithm performs evaluation and selection. Individual fitness

f_{i}

is quantified through

f_{i} = max (0, \frac{R_{thres} - R_{work, i}}{R_{thres} - R_{\min}}),

(23)

R_{work} = \frac{\sum_{h \in work} || F_{obs} (h) | - λ | F_{cal} (h) ||}{\sum_{h \in work} | F_{obs} (h) |},

(24)

where

R_{\min}

and

R_{avg}

represent the best and the average

R_{work}

within the population, and

R_{thres} = R_{avg} + (R_{avg} - R_{\min})

. Individuals with higher fitness have a greater probability of being selected for genetic operations.
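A short numerical sketch makes Equations (23) and (24) concrete. The scale factor λ is not defined in this section, so the default used below (ratio of summed amplitudes) is an assumption, as are the function names.

```python
import numpy as np

def r_factor(f_obs, f_cal, scale=None):
    """Eq. (24): sum | |F_obs| - lambda |F_cal| | / sum |F_obs| over the given set."""
    f_obs = np.abs(np.asarray(f_obs, dtype=float))
    f_cal = np.abs(np.asarray(f_cal, dtype=float))
    if scale is None:
        # Assumed scaling: match the summed amplitudes.
        scale = f_obs.sum() / f_cal.sum()
    return np.abs(f_obs - scale * f_cal).sum() / f_obs.sum()

def population_fitness(r_work):
    """Eq. (23) with R_thres = R_avg + (R_avg - R_min)."""
    r = np.asarray(r_work, dtype=float)
    r_min, r_avg = r.min(), r.mean()
    r_thres = r_avg + (r_avg - r_min)
    denom = r_thres - r_min
    if denom <= 0.0:
        # All individuals have the same R_work; treat them as equally fit.
        return np.ones_like(r)
    return np.maximum(0.0, (r_thres - r) / denom)
```

For a population with R_work values of 0.2, 0.5, 0.5, 0.5, and 0.8, the threshold is R_thres = 0.8, so the fitness values are 1, 0.5, 0.5, 0.5, and 0: the best individual always has fitness 1, and any individual at or above the threshold is never selected.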

Genetic operations comprise two mechanisms, crossover and mutation. During crossover, two parents are selected according to fitness-weighted probabilities. The asymmetric unit is divided into spatial blocks, and the density values in a chosen subset of blocks are exchanged between the parents to generate offspring. Prior to crossover, all density maps undergo rotational and translational alignment to eliminate crystallographic origin and enantiomorphic ambiguities. Mutation introduces stochasticity by randomly selecting approximately 1% of grid points in the offspring density maps and assigning them new random values, which adds variation and prevents premature convergence. The resulting offspring typically exhibit improved fitness relative to the parent population. Population update then replaces low-fitness individuals with these offspring, forming the next generation. The algorithm then returns to independent evolution for another round of local search, followed by global information exchange. Further details, such as elite inheritance and the similarity penalty that prevents premature convergence, can be found in our previous paper [32].
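The crossover and mutation operators can be sketched as follows. This is an illustration under several assumptions that the text does not specify: the maps are already aligned, blocks are cubes of a fixed edge length, each block is taken from the second parent with probability 0.5, and mutated values are drawn uniformly between the minimum and maximum of the map. The function names are hypothetical, and the full procedure is described in [32].

```python
import numpy as np

def block_crossover(parent_a, parent_b, block=4, swap_probability=0.5, rng=None):
    """Build a child from parent_a, copying randomly chosen blocks from parent_b."""
    rng = np.random.default_rng() if rng is None else rng
    child = parent_a.copy()
    nx, ny, nz = parent_a.shape
    for x in range(0, nx, block):
        for y in range(0, ny, block):
            for z in range(0, nz, block):
                if rng.random() < swap_probability:
                    region = (slice(x, x + block), slice(y, y + block), slice(z, z + block))
                    child[region] = parent_b[region]
    return child

def mutate(density, fraction=0.01, rng=None):
    """Assign new random values to about `fraction` of the grid points."""
    rng = np.random.default_rng() if rng is None else rng
    out = density.copy()
    n = int(round(fraction * out.size))
    idx = rng.choice(out.size, size=n, replace=False)
    # Assumed distribution for the new values: uniform over the map's value range.
    out.flat[idx] = rng.uniform(density.min(), density.max(), size=n)
    return out
```

Parents could be drawn with `rng.choice(len(fitness), size=2, replace=False, p=fitness / fitness.sum())`, which requires at least two individuals with non-zero fitness.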

The effectiveness of this strategy derives from synergistic effects between local refinement and global information sharing. Once any individual approaches the correct solution, manifested as a sharp R-factor decrease, high-quality density features disseminate throughout the population via crossover. This collective guidance enables population convergence toward the global optimum at rates exceeding independent random searches. The GA strategy upgrades traditional multi-start independent searching to intelligent multi-start cooperative searching with active information sharing, representing an advancement for enhancing the reliability of direct-method phase retrieval in challenging applications.

2.5. Error Metrics, Missing Reflections, and Model Building

We employed a validation pipeline with quality metrics to monitor convergence, strategies to handle incomplete data, and automated model building from recovered density maps.

2.5.1. Error Metrics

Assessment metrics are organized into two categories: reference-dependent metrics for validation during method development and internal consistency metrics applicable to de novo structure determination. Reference-dependent metrics provide validation when PDB coordinates are available. The mean phase error quantifies the angular deviation between retrieved and true phases:

Δ ϕ = 〈 \arccos (\cos (ϕ_{cal} (h) - ϕ_{true} (h))) 〉,

(25)

where

〈 \cdot 〉

denotes averaging over unique reflections in the working set. The envelope intersection-over-union metric assesses spatial accuracy of the reconstructed molecular boundary:

IoU = \frac{| S_{cal} \cap S_{true} |}{| S_{cal} \cup S_{true} |},

(26)

quantifying the overlap ratio between reconstructed envelope domain

S_{cal}

and true envelope domain

S_{true}

.
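Both reference-dependent metrics are simple to compute once the retrieved and reference quantities share a common origin and hand. The sketch below assumes that alignment has already been done and uses hypothetical function names.

```python
import numpy as np

def mean_phase_error_deg(phi_cal_deg, phi_true_deg):
    """Eq. (25): mean of arccos(cos(delta)), which wraps each difference into [0, 180] degrees."""
    delta = np.radians(np.asarray(phi_cal_deg, dtype=float)
                       - np.asarray(phi_true_deg, dtype=float))
    wrapped = np.arccos(np.clip(np.cos(delta), -1.0, 1.0))
    return float(np.degrees(wrapped).mean())

def envelope_iou(mask_cal, mask_true):
    """Eq. (26): intersection over union of two boolean envelope masks."""
    mask_cal = np.asarray(mask_cal, dtype=bool)
    mask_true = np.asarray(mask_true, dtype=bool)
    union = np.logical_or(mask_cal, mask_true).sum()
    if union == 0:
        # Both masks empty: treated here as perfect agreement (a convention, not from the paper).
        return 1.0
    return float(np.logical_and(mask_cal, mask_true).sum() / union)
```

Because the arccos(cos(·)) form wraps every difference into the range 0° to 180°, phases of 350° and 10° differ by 20° rather than 340°, and uniformly random phases give a mean error near 90°.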

Internal consistency metrics serve as criteria for evaluating convergence when reference coordinates are unavailable. The working R-factor

R_{work}

, defined in Equation (24), and free R-factor

R_{free}

measure agreement between calculated and experimental amplitudes:

R_{free} = \frac{\sum_{h \in free} || F_{obs} (h) | - λ | F_{cal} (h) ||}{\sum_{h \in free} | F_{obs} (h) |},

(27)

where

R_{free}

is computed using a randomly selected subset (1% of reflections in this study), excluded from all iterative processes as an independent validation dataset. Successful convergence is characterized by reduction of both

R_{work}

and

R_{free}

to stable low values. Density convergence indicators monitor the magnitude of real-space density modifications. For HIO-type algorithms, we compute

〈 | P_{B} ρ (r) | 〉

for grid points

r \notin S

. For MRAAR-type algorithms, we compute

Δ ρ = \begin{cases} 〈 | (2 P_{A} P_{B} - P_{A} - P_{B}) ρ (r) | 〉, & r \in S \\ 〈 | P_{B} ρ (r) | 〉, & r \notin S \end{cases}

(28)

where

〈 \cdot 〉

denotes averaging over grid points in each region. As convergence proceeds, these update magnitudes diminish toward zero, indicating stable density satisfying constraints.

2.5.2. Handling of Missing and Weak Diffraction Data

Diffraction datasets are incomplete for several reasons: reflections reserved for the free set, missing low-resolution reflections (resolution lower than 15 Å), and weak reflections failing signal-to-noise criteria (

| F_{obs} (h) | < 2 σ_{| F_{obs} (h) |}

). To handle such incomplete data, we implemented a dynamic mixing strategy. For missing or weak reflections, calculated amplitude

| F_{cal} (h) |

is blended with experimental values using weight factor

α_{miss}

:

| F_{filled} (h) | = α_{miss} \cdot λ \cdot | F_{cal} (h) | + (1 - α_{miss}) \cdot | F_{obs} (h) |,

(29)

where

λ

is a global scale factor. For missing data,

α_{miss} = 1.0

; for weak data,

α_{miss} = 0.5

. This strategy ensures reciprocal-space data completeness, preventing information loss and projection artifacts from data gaps.
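Equation (29) amounts to a per-reflection choice of the weight α_miss. A minimal sketch, assuming the missing reflections are flagged by a boolean array and that λ is supplied by the caller (all names are hypothetical):

```python
import numpy as np

def fill_amplitudes(f_obs, sigma_f, f_cal, missing, scale=1.0):
    """Eq. (29): blend scaled calculated amplitudes into missing or weak reflections."""
    missing = np.asarray(missing, dtype=bool)
    f_cal = np.asarray(f_cal, dtype=float)
    # Replace undefined entries at missing reflections so that arithmetic stays finite.
    f_obs = np.where(missing, 0.0, np.asarray(f_obs, dtype=float))
    sigma_f = np.where(missing, 0.0, np.asarray(sigma_f, dtype=float))

    weak = ~missing & (f_obs < 2.0 * sigma_f)
    alpha = np.zeros(f_obs.shape)
    alpha[weak] = 0.5      # weak data: equal mix of calculated and observed
    alpha[missing] = 1.0   # missing data: calculated amplitude only
    return alpha * scale * f_cal + (1.0 - alpha) * f_obs
```

Well-measured reflections keep α_miss = 0 and therefore pass through unchanged, so the amplitude constraint is relaxed only where the experimental value is absent or unreliable.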

2.5.3. Electron Density Map Post-Processing and Automated Model Building

After convergence, we applied post-processing to enhance density map quality. When multiple solutions converged successfully under the GA strategy, we performed spatial alignment and density averaging to reduce noise. The resulting electron density map was used for automated model building, which identifies secondary structure elements, constructs polypeptide backbones, and positions side-chain atoms based on density features and stereochemical constraints. The initial model exhibits over 80% residue completeness and is refined through several iterations using phenix.refine [49], optimizing atomic coordinates and displacement parameters against experimental data to yield the final structural model.

2.6. Test Datasets, Computational Implementation, and Parameter Settings

To ensure robustness, we assembled a benchmark dataset of 28 protein crystal structures with diversity in space group symmetry, solvent content, and diffraction resolution. All diffraction data (

| F_{obs} |

) and reference coordinates were obtained from the Protein Data Bank. The structures comprise proteins containing 826 to 7475 non-hydrogen atoms with 0 to 635 bound water molecules. The dataset includes resolutions from 1.46 Å to 3.2 Å, solvent contents from 55% to 78%, PDB-reported R-work values below 0.22 (ranging from 0.133 to 0.215), and diverse space groups including

C 121

,

C 222_{1}

,

P 2_{1} 2_{1} 2_{1}

,

P 3_{1} 21

,

P 4_{1} 32

,

P 4_{1} 2_{1} 2

,

P 4_{3} 2_{1} 2

,

P 6_{5}

,

P 6_{1} 22

,

H 32

,

I 222

,

I 2_{1} 3

, and

I 4_{1} 22

, ensuring evaluation under varied crystallographic conditions. Diffraction datasets contain 4918 to 76,249 observed reflections, with missing low-resolution reflections ranging from 3 to 236. Detailed descriptions of all test structures are provided in Appendix A, Table A1. Solvent content for each structure was estimated using the matthews_coef program from the CCP4 suite [41], based on the Matthews coefficient [36]. The calculations incorporated unit cell parameters, space group symmetry, and protein molecular weight derived from the amino acid sequence. This estimated solvent content, along with space group information and unit cell parameters, constitutes essential prior knowledge used as constraint information during iterative phasing.
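For orientation, the solvent-content estimate can be reproduced with the widely used Matthews approximation, x_solv ≈ 1 − 1.23 / V_M, where V_M is the crystal volume per dalton of protein in Å³/Da. The sketch below is an illustration of that textbook formula, not a reimplementation of the matthews_coef program, whose internal constants may differ; the function names are hypothetical.

```python
import math

def unit_cell_volume(a, b, c, alpha_deg, beta_deg, gamma_deg):
    """Volume (in cubic angstroms) of a general triclinic unit cell."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha_deg, beta_deg, gamma_deg))
    return a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)

def matthews_solvent_fraction(cell_volume, molecular_weight_da,
                              n_symmetry_ops, n_copies_per_asu=1):
    """Return (V_M, x_solv) using the approximation x_solv = 1 - 1.23 / V_M."""
    # Z is the number of protein molecules in the unit cell.
    z = n_symmetry_ops * n_copies_per_asu
    v_m = cell_volume / (z * molecular_weight_da)
    return v_m, 1.0 - 1.23 / v_m
```

As a made-up example, a 20 kDa protein with one copy per asymmetric unit in a cubic cell of edge 100 Å and 24 symmetry operators gives V_M ≈ 2.08 Å³/Da and a solvent fraction of about 0.41.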

The computational framework was implemented using the Clipper C++ libraries for crystallographic computing [50] combined with MPI (Message Passing Interface) for parallel processing. Key parameters were standardized: electron density maps were discretized on 3D grids with approximately 1.0 Å spacing; Gaussian filter radii were

σ_{1} = 2.5

Å (large-radius) and

σ_{2} = 1.5

Å (small-radius); the refined scheme stripped 5% of grid points in the asymmetric unit from the coarse envelope; the solvent region negative feedback factor was

β = 0.75

; continuous algorithm parameters were

α = 0.4

for CHIO,

β_{HPR} = 0.588

for HPR, and

γ_{M} = 0.5

for MCHIO; GA population size was 100 individuals with genetic operations every 100 iterations, crossover probability 0.5, and mutation probability 0.01.

Real-space histogram matching requires a reference electron density distribution. When available, a homologous structure or AlphaFold-predicted model can serve as a reference; structure factors are computed via phenix.fmodel with bulk solvent correction and resolution-appropriate temperature factors. In computing structure factors from model coordinates, the constant term F(0,0,0) (F000) represents the mean electron density of the unit cell. We adjusted F000 such that bulk solvent regions exhibit zero mean electron density. During density modification iterations, solvent flattening constraints drive bulk solvent densities toward zero through negative feedback, and pre-setting the reference histogram with zero-centered solvent regions ensures consistency between the target distribution and algorithmic behavior.

For optimal histogram generation from simulated diffraction data, temperature factors (B-factors) must be assigned to the atomic model. However, there is no exact theoretical method to determine optimal B-factors for histogram matching applications. Several empirical approaches exist: atomic B-factors can be estimated from the Wilson B-factor [51] calculated from the experimental diffraction data of the unknown structure; alternatively, statistical analysis of deposited PDB structures can provide estimates for both atomic and bulk solvent B-factors [52]. In this work, we employ empirical relationships based on the high-resolution limit

d_{high}

(Å). For bulk solvent regions, the B-factor is approximated as

B_{sol} \approx 25 d_{high} + 25 Å^{2}

. For homologous or AI-predicted protein models, isotropic atomic B-factors are set to

B_{atom} \approx 15 d_{high} + 5 Å^{2}

. These empirical relationships are applicable for typical macromolecular crystallography resolution ranges (

1.0 \leq d_{high} \leq 3.5

Å).
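The two empirical relationships are easy to tabulate. The helper below is a hypothetical convenience function that applies them and rejects resolutions outside the stated range of validity.

```python
def empirical_b_factors(d_high):
    """Return (B_sol, B_atom) in square angstroms for a high-resolution limit d_high in angstroms."""
    if not 1.0 <= d_high <= 3.5:
        raise ValueError("empirical relations are stated for 1.0 <= d_high <= 3.5 angstroms")
    b_sol = 25.0 * d_high + 25.0   # bulk solvent B-factor
    b_atom = 15.0 * d_high + 5.0   # isotropic atomic B-factor for the reference model
    return b_sol, b_atom
```

At the two ends of the benchmark resolution range, these relations give B_sol = 61.5 Å² and B_atom = 26.9 Å² for d_high = 1.46 Å, and B_sol = 105 Å² and B_atom = 53 Å² for d_high = 3.2 Å.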

With appropriately configured temperature factors and F000 value, electron density histograms generated from homologous or AI-predicted models closely resemble those from experimentally determined structures. Our tests demonstrate that at identical resolution and with appropriately set parameters, electron density distribution histograms remain remarkably similar even across different protein structures, validating their use as reliable constraint conditions for protein region density in unknown structures. This observation is consistent with the conclusions of Zhang and Main’s seminal work on histogram matching [33]. For benchmarking purposes in this study, we generated reference histograms from deposited PDB coordinates using this temperature factor protocol, which does not affect our conclusions, as the histograms provide statistically representative protein density distributions.

For each combination of test structure, envelope scheme (one-step coarse or two-step refined), phasing strategy (full-resolution, resolution-weighted, or GA-enhanced), and iterative algorithm (10 in total), we executed 100 independent phase retrieval attempts from different random phase seeds. Each attempt ran for at most 10,000 iterations, with early termination upon convergence detection (reduction of

R_{work}

,

R_{free}

, and

Δ ρ

to stable plateaus). Statistics collected from all 100 attempts included success rate, minimum iteration count for first successful convergence, and median iteration count across successful cases. Calculations were performed on a Dell R740 server with 52 cores (104 threads) at 2.1 GHz. A complete evaluation for a single structure with 100 independent trials required approximately 3 h.

3. Results

3.1. Constraint Framework for Ab Initio Phasing

All direct phasing tests in this study operate within a defined constraint framework that combines experimental measurements with general physical and statistical priors, without requiring knowledge of a specific homologous atomic structure. The reciprocal-space constraint is provided by the experimentally measured structure factor amplitudes

| F_{obs} (h) |

, while crystallographic symmetry (space group and unit cell parameters) is obtained directly from the diffraction experiment. The solvent content

x_{solv}

, estimated via the Matthews coefficient from the unit cell volume and protein molecular weight (derived from sequence information), serves as standard prior knowledge for envelope reconstruction. In real space, two physical constraints are enforced: solvent flattening imposes the near-constant electron density characteristic of bulk solvent regions, and histogram matching guides the protein-region density toward a statistically representative distribution. For genuinely novel structures, this reference histogram can be obtained without model bias either from known proteins [33] or from high-confidence AlphaFold predictions, which provide plausible density distributions without precise atomic coordinates. (In this benchmarking study, reference histograms were generated from deposited PDB coordinates using an empirical temperature factor protocol described in Section 2.6). The molecular envelope itself is reconstructed iteratively from the evolving density, guided by the estimated

x_{solv}

and local density continuity. This constraint set leverages measurable experimental data and fundamental physical principles, distinguishing our approach from model-dependent methods like molecular replacement that require specific atomic templates. All subsequent results are presented within this framework.

3.2. Validation Case Study: Direct Phasing of Structure 3rd5 with Continuous Algorithms

To validate the effectiveness of continuous iterative projection algorithms in addressing discontinuous density modification at molecular boundaries, we conducted a case study using protein crystal structure 3rd5, a putative uncharacterized protein from Mycobacterium paratuberculosis [47]. With an estimated solvent content of 59.52%, this structure lies at the lower end of the high-solvent-content range and falls below the 65% threshold above which traditional iterative projection algorithms typically converge, making it a suitable test of continuous algorithms under challenging conditions.

Among the continuous iterative projection algorithms, we took HPR as an example and applied it with the full-resolution phasing strategy and two-step refined envelope reconstruction. The experiment consisted of 100 independent phase retrieval attempts, each initialized with random phases and run for up to 10,000 iterations. After 8000 iterations, the transition region was linearly shrunk to zero over 1500 iterations, followed by 500 iterations of solvent flattening to finalize convergence. Figure 3 presents monitoring results across multiple convergence metrics. Each trajectory in Figure 3a–e represents the evolution over iterations of a key quality indicator for a single independent attempt; the indicators are mean phase error, envelope intersection-over-union ratio, working R-factor, free R-factor, and solvent region density deviation. In 96 of the 100 attempts, these metrics fluctuated within high-error regimes without sustained improvement, reflecting the high dimensionality and rugged topography of the solution space.

Figure 3 also shows the successful convergence events obtained with HPR. In four attempts, distinguished from the others by bold lines, all five monitored metrics underwent a dramatic transition. Mean phase error plummeted from near 90° to values indicating correct phase determination (Figure 3a). Simultaneously, envelope intersection-over-union jumped from around 0.8 to above 0.9 (Figure 3b). Both working and free R-factors dropped from approximately 0.55 to lower stable values (Figure 3c,d), while solvent region density deviation decreased sharply (Figure 3e). This coordinated improvement across independent quality metrics is the signature of successful convergence to the global optimum. The scatter plot in Figure 3f displays the final R-factor pairs for all 100 attempts. The 96 unsuccessful attempts cluster in the high R-factor region, while the four successful cases appear as outliers in the low R-factor region.

The electron density map from these successful convergences was of sufficient quality for automated atomic model building. After refinement, the backbone root-mean-square deviation between the resulting model and the PDB reference coordinates was only 0.10 Å, confirming high phase accuracy. This case study demonstrates that, starting from random phases, continuous iterative projection algorithms can recover atomic-resolution structural information for challenging protein crystals with limited solvent content, albeit with a success rate of only 4%. The mechanistic basis lies in the transition region concept, which provides smoother and more physically realistic density modification pathways at the protein-solvent interface. By avoiding the abrupt discontinuities of classical algorithms, continuous methods navigate the multidimensional solution landscape more effectively.

3.3. Enhancement of Phase Retrieval Performance by the Refined Envelope Scheme

The case study in Section 3.2 demonstrates that continuous iterative projection algorithms can successfully recover phases for challenging structures with limited solvent content when combined with refined envelope reconstruction. To systematically evaluate the impact of envelope quality on phase retrieval performance, we compared the traditional one-step coarse envelope versus our proposed two-step refined envelope reconstruction scheme. We conducted this evaluation using structures 3rd5 [47] and 2fg0 [48] (63.86% solvent), employing CHIO under a full-resolution phasing strategy.

Figure 4a,b provides a visual comparison of transition regions and molecular boundaries generated by the two envelope-reconstruction approaches at equivalent iteration stages for 3rd5 and 2fg0. The traditional one-step method (left panels) produces boundaries that appear smooth but are crude and oversimplified. The transition region resembles a uniform thick shell enveloping the molecular core without discrimination, overlaying portions of protein surface side chains and causing inappropriate application of solvent constraints to protein density features. In contrast, molecular envelopes from the two-step method (right panels) exhibit richer surface detail, conforming closely to true molecular geometry and capturing surface crevices, pockets, and protrusions. The transition region delineated by the refined scheme no longer manifests as a uniform shell but concentrates in spatial regions exhibiting low electron density situated on or near the molecular surface. This selective localization demonstrates identification of physical solvent spaces erroneously included within traditional coarse envelopes.

The envelope precision improvement translates into enhanced convergence behavior. Figure 4c illustrates transition region evolution for structure 3rd5 using CHIO with refined envelope scheme, comparing pre-convergence and post-convergence states. While the pre-convergence transition region remains diffuse, upon successful convergence, the transition region becomes aligned with the final molecular boundary, reflecting confidence in boundary determination. Figure 4d illustrates the progressive refinement of the protein envelope, showing the transformation from an incomplete envelope that fails to fully encompass the protein structure (with numerous atoms exposed outside the boundary) to a well-converged envelope that accurately conforms to the protein surface.

The advantage of enhanced envelope accuracy is the maximization of available constraint information within the unit cell volume. By re-identifying approximately 5% of the asymmetric unit within traditional coarse envelopes as solvent or transition region based on local density characteristics, the refined scheme provides iterative algorithms with additional reliable prior knowledge guiding convergence. For challenging crystals like 3rd5, where total bulk solvent volume is limited, this reclaimed solvent represents valuable information, strengthening the constraint framework. These enhancements demonstrate that refined envelope reconstruction, by optimizing molecular boundary determination, elevates phase retrieval capability. This represents progress toward solving challenges of phasing structures with limited solvent content, historically resistant to direct methods. As demonstrated through benchmarking in subsequent sections, performance gains prove universal, enhancing both continuous and classical iterative projection algorithms across diverse structural types and crystallographic conditions.

3.4. Systematic Benchmarking: Synergistic Effects of Algorithms, Envelopes, and Strategies

We conducted systematic benchmarking across twenty-eight diverse protein structures, comparing ten iterative projection algorithms (six classical or improved: ASR, RAAR, DM, MDM, MRAAR, HIO; four continuous: CHIO, HPR, MCHIO, THIO) under two envelope schemes (one-step coarse and two-step refined) and three phasing strategies (conventional full-resolution, resolution-weighted, genetic algorithm-enhanced). For each configuration, 100 independent phase retrieval attempts from random phase seeds were executed, recording success rate, minimum iteration count for the first convergence, and median iteration count for all successful runs. Detailed numerical results are provided in Appendix A, Table A2, Table A3, Table A4, Table A5, Table A6 and Table A7.

3.4.1. Universal Performance Enhancement from Refined Envelope Reconstruction

To contextualize the averaged performance comparisons, we first examine the success rate of two representative algorithms, the classical HIO and the continuous HPR, under the conventional full-resolution phasing strategy with one-step coarse and two-step refined envelope reconstruction schemes across all 28 test structures (Figure 5a). The results, ordered by solvent content, reveal a general upward trend but with significant variation. Structures like 1xhd [53] achieve near-perfect success, while others, such as 4qb6 [54], 3rd5 [47], 4q82 [55], 2fg0 [48], 2bke [56], and 4tpl [57], show very low or zero success rates. This preliminary view highlights the inherent difficulty of certain structures and motivates the need for both improved algorithms and enhanced strategies, the effects of which are analyzed in the following averaged results.

Figure 5b–d demonstrates that refined envelope reconstruction delivers universal enhancement across the 10 iterative projection algorithms. As shown in Figure 5c, continuous algorithms improved from a 14.3% to a 20.9% success rate (45.7% relative gain), with reduced minimum and median iteration counts. For classical algorithms (excluding ASR and RAAR), the improvement was more striking: from 9.9% to 15.9% (60.5% relative gain). In Figure 5b, the proposed improved algorithms MDM, MRAAR, and MCHIO achieved 17.1%, 20.4%, and 20.5% success rates under refined envelopes, exceeding their original versions (DM, RAAR, and CHIO) by significant margins. These results establish refined envelope reconstruction as a universal performance enhancer for algorithms relying on solvent flatness constraints, while demonstrating that the proposed improved algorithms, when combined with a refined envelope, expand the available toolkit for direct phasing.

3.4.2. Algorithm Performance Differences and Selection Strategy

Figure 5b shows that continuous algorithms (CHIO, HPR, MCHIO, THIO) and improved classical variants (MDM, MRAAR) perform comparably to the established HIO algorithm when used with the two-step refined envelope scheme. Together, these algorithms form a high-performing group that surpasses classical DM, while ASR and RAAR consistently yield lower success rates.

However, examining success rates per structure in Appendix A Table A2 and Table A5 reveals that no single algorithm excels in all cases. For a given structure, different algorithms can produce widely varying success rates. If only HIO is used and its success rate is near zero for a particular structure (e.g., 3rd5), thousands of random trials may be needed to obtain a solution—a computationally demanding process. With multiple algorithms available, a more efficient strategy emerges: by testing several algorithms with a limited number of trials each (e.g., 100 random seeds per algorithm), one may identify an algorithm that performs substantially better than HIO for that structure, enabling phase determination within hundreds rather than thousands of attempts.
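The trade-off between hundreds and thousands of attempts follows from elementary probability. As a hedged illustration, assuming each random-seed trial succeeds independently with the same probability p, the chance of at least one success in n trials is 1 − (1 − p)^n. The function name `trials_needed` and the 95% confidence target are assumptions for this sketch, not quantities defined in this work.

```python
import math


def trials_needed(p_success: float, confidence: float = 0.95) -> int:
    """Smallest n with 1 - (1 - p)^n >= confidence, assuming independent trials."""
    if not 0.0 < p_success < 1.0:
        raise ValueError("p_success must lie strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_success))


# Hypothetical per-trial success probabilities, chosen only for illustration.
for p in (0.001, 0.01, 0.03):
    print(f"p = {p}: about {trials_needed(p)} trials for a 95% chance of one solution")
```

Under this simple model, moving from a per-trial success probability of 0.1% to a few percent reduces the required number of trials from thousands to roughly one hundred, which is the motivation for screening several algorithms with a limited budget each.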

For practical applications to unknown structures, we recommend first testing multiple algorithms from the top-performing group—including continuous algorithms (CHIO, HPR, MCHIO, THIO), improved classical variants (MDM, MRAAR), and HIO—in combination with the two-step refined envelope scheme. If computational resources allow, the genetic algorithm co-evolution strategy should also be employed, as it enhances search efficiency through population-wide information sharing. This integrated approach exploits the benefits of improved envelope accuracy, algorithmic diversity, and collaborative optimization. If any combination succeeds, the structure is considered solved. If all tested combinations show very low success rates, increasing the number of random trials per algorithm may be attempted, though such cases likely belong to a challenging category where additional constraints or experimental data may be needed.

3.4.3. Structural Validation: From Electron Density to Atomic Models

To validate practical utility, we selected seven representative phased structures (3rd5 [47], 2fg0 [48], 1hp4 [58], 1nh6 [59], 2bke [56], 2boz [60], 4gtf [61]) for automated model building. As shown in Figure 6, rebuilt models exhibit excellent spatial alignment with PDB structures, with backbone RMSD below 0.5 Å. Local electron density maps for biologically significant regions, including bound ligands, coordinated ions, and secondary structure motifs, display clarity and spatial continuity sufficient to determine side-chain orientations and resolve structural features. This density quality validates that our direct phasing methods achieve sufficient accuracy to support atomic model construction and enable biological interpretation.

3.5. Evolution of Phasing Strategies: From Full-Resolution to Genetic Co-Evolution

We evaluated three phasing strategies: full-resolution baseline, resolution-weighted progressive, and genetic algorithm-enhanced co-evolution. Figure 7 illustrates the GA-enhanced strategy (incorporating resolution weighting) using structure 3rd5, with results compared against Figure 3. Six panels track the temporal evolution of five quality metrics (mean phase error, envelope intersection-over-union, working R-factor, free R-factor, and solvent density deviation) across 100 individuals, each starting from an independent random seed. The figure shows successful convergence unfolding in two phases: after approximately 5100 iterations, a single individual achieved convergence manifested through abrupt coordinated improvements across all metrics. Subsequently, high-quality structural information from this pioneer propagated throughout the population via genetic operations, guiding the majority of other individuals to converge within a few hundred additional iterations. Upon achieving near 100% population convergence at iteration 5500, early stopping was triggered. The transition region was then linearly reduced to zero over 1500 iterations, followed by 500 iterations of solvent flattening, with the process terminating at iteration 7500.
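The post-convergence schedule described above can be written as a short sketch. The function names, the unit initial width, and the treatment of the width as a single scalar are assumptions for illustration; only the 1500-iteration linear ramp and the 500 flattening iterations are taken from the text.

```python
def transition_width(iteration: int, converged_at: int, initial_width: float,
                     ramp_iterations: int = 1500) -> float:
    """Transition-region width: constant until early stopping, then a linear ramp to zero."""
    if iteration <= converged_at:
        return initial_width
    elapsed = iteration - converged_at
    return initial_width * max(0.0, 1.0 - elapsed / ramp_iterations)


def final_iteration(converged_at: int, ramp_iterations: int = 1500,
                    flattening_iterations: int = 500) -> int:
    """Iteration at which the run terminates after the ramp and solvent flattening."""
    return converged_at + ramp_iterations + flattening_iterations


# Early stopping at iteration 5500, as in the 3rd5 example.
print(final_iteration(5500))                 # 5500 + 1500 + 500 = 7500
print(transition_width(6250, 5500, 1.0))     # halfway through the ramp: 0.5
```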

3.5.1. Resolution-Weighted Strategy: Modest Gains with Computational Trade-Offs

The resolution-weighted strategy prioritizes low-resolution data initially, then progressively introduces high-resolution data. Analysis of Figure 8g shows modest success rate increases of approximately 2 percentage points compared with the full-resolution baseline, regardless of envelope scheme. However, Figure 8h,i reveals increased minimum and median iteration counts. This slowdown occurs because suppressing high-resolution data during early stages, while beneficial for establishing stable low-resolution phases and accurate envelope determination, delays structural detail reconstruction.
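One way to picture such a progressive scheme is a soft resolution cutoff that moves outward as the run advances. The sketch below is a hypothetical weighting function, not the one used in this work: the Gaussian roll-off, the parameter names (`d_start`, `d_min`, `sigma`), and their default values are all assumptions chosen for illustration.

```python
import numpy as np


def resolution_weights(d_spacing, progress, d_start=6.0, d_min=2.0, sigma=0.02):
    """Weights in [0, 1] for reflections with resolution d (in Å).

    progress runs from 0 (start) to 1 (all data fully weighted). The cutoff
    moves linearly in s = 1/d from 1/d_start to 1/d_min; reflections beyond
    the cutoff are damped with a Gaussian roll-off of width sigma (in 1/Å).
    """
    s = 1.0 / np.asarray(d_spacing, dtype=float)
    progress = float(np.clip(progress, 0.0, 1.0))
    s_cut = (1.0 - progress) / d_start + progress / d_min
    excess = np.clip(s - s_cut, 0.0, None)   # zero for reflections inside the cutoff
    return np.exp(-0.5 * (excess / sigma) ** 2)


d = [10.0, 4.0, 2.0]
for t in (0.0, 0.5, 1.0):
    print(t, np.round(resolution_weights(d, t), 3))
```

In this sketch the high-resolution reflections contribute almost nothing early on, which is consistent with the delayed reconstruction of structural detail noted above.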

3.5.2. Genetic Algorithm Strategy: Breakthrough Through Population Intelligence

The GA co-evolution strategy, introduced in our previous work [32], delivered breakthrough performance by transforming phase retrieval from multi-start independent search into population-based collaborative learning. While our previous study demonstrated GA effectiveness using only the HIO algorithm with traditional one-step envelope reconstruction, the current work extends this evaluation to 10 iterative algorithms, including four continuous IPAs, and examines the synergistic effects when GA is combined with the two-step refined envelope scheme. Statistical analysis (Figure 8a,g) demonstrates that the GA-enhanced strategy significantly improved average success rates compared with the conventional strategy. For coarse envelope design, success rates increased from 12.1% to 43.3%, while for refined envelope design, rates improved from 18.4% to 63.4%. The GA-enhanced rates are thus about 3.5 times the conventional ones, that is, an increase of approximately 2.5 times the baseline value for both envelope designs. Figure 8d reveals that the GA-enhanced strategy universally improves success rates across all 10 algorithms. Notably, Figure 8l shows that when combining refined envelope reconstruction with the GA phasing strategy, continuous IPAs (CHIO, HPR, MCHIO, THIO) and improved classical algorithms (MDM, MRAAR) achieve success rates comparable to HIO and substantially exceeding DM. In contrast, ASR and RAAR consistently yield the lowest success rates.

Regarding convergence speed, the GA-enhanced strategy shows distinct effects on different convergence metrics (Figure 8e,f,h,i, and Appendix A, Figure A3). Minimum iteration counts remain comparable across all three strategies for both coarse and refined envelope schemes. However, median iteration counts decrease substantially under the GA-enhanced strategy, demonstrating improved overall convergence consistency across both envelope designs. Specifically, with the refined envelope design, median iterations dropped from 3681 (conventional) and 4398 (resolution-weighted) to 2112 (GA-enhanced), representing approximately a 43–52% reduction. This improvement in median convergence reflects GA’s collective optimization mechanism illustrated in Figure 7: successful pioneer individuals emerge and rapidly propagate their high-quality density features throughout the population via genetic operations. The synergy between refined envelope reconstruction and GA strategy enhances convergence reliability—refined envelopes provide accurate spatial constraints that increase pioneer emergence probability, while GA efficiently disseminates validated solutions through population-wide information sharing, thereby elevating the performance of the entire population rather than just isolated optimal cases.
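The quoted reduction range can be checked directly from the three median iteration counts given above. The helper name `percent_reduction` is introduced here only for illustration.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


ga_median = 2112
for label, baseline in [("conventional", 3681), ("resolution-weighted", 4398)]:
    print(f"{label}: {percent_reduction(baseline, ga_median):.1f}% fewer median iterations")
# conventional: 42.6% fewer median iterations
# resolution-weighted: 52.0% fewer median iterations
```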

3.6. Correlation Analysis: Solvent Content Dependence and Extended Applicability

Solvent content is a key parameter governing phase retrieval difficulty [63,64]. We conducted a correlation analysis of the relationship between solvent content and success rate across the 28 structures. Figure 9 presents average success rates (across eight algorithms, excluding ASR and RAAR) versus solvent content under three phasing strategies and two envelope schemes. The 28 structures are arranged by increasing solvent content, with each represented by paired bars for coarse (blue) and refined (orange) envelopes. Results reveal that the success rate exhibits an overall increasing trend with rising solvent content, albeit with considerable variation. Within every solvent content interval, the refined envelope (orange bars) exceeds the coarse envelope (blue bars).

Figure 9 shows that several structures (4qb6 [54], 3rd5 [47], 4iqk [65], 4q82 [55], 2fg0 [48], 1nh6 [59], 2bke [56], 4tpl [57]) consistently exhibit below-average success rates regardless of strategy or envelope scheme. Detailed analysis identifies three contributing factors, as illustrated in Appendix A, Figure A2 for representative cases. First, extensive surface-bound water molecules reduce the bulk solvent available for constraints by requiring loose envelopes for encapsulation; examples are 3rd5 (413 waters, 16% of non-hydrogen atoms), 4q82 (472, 11%), 2fg0 (428, 11%), and 1nh6 (635, 13%). Second, space group symmetry combined with molecular packing produces multiple equivalent origin choices with nearly identical envelopes (two proximate choices in 3rd5, 2fg0, and 4tpl; four converging choices in 4qb6), making it difficult for iterative reconstruction to select a definitive convergence pathway among these equivalent configurations. Third, severe protein-solvent interdigitation creates highly irregular envelope topologies with fragmented bulk solvent in 4iqk, 2bke, and 4tpl, which hinder accurate ab initio envelope reconstruction. Individual structures may exhibit a single factor or combinations; for instance, 3rd5 combines surface waters with origin ambiguity, 4qb6 combines four-choice origin ambiguity with surface waters (approximately 100 water molecules, 8% of atoms), while 4tpl exhibits both interdigitation and origin ambiguity.

Linear regression analysis (Figure 10) of success rates versus solvent content across six methodological combinations (three phasing strategies, two envelope designs) reveals positive correlations (coefficients ∼0.5), confirming the role of solvent constraints. Success rates decline sharply as solvent content decreases, with extrapolated trends suggesting near-zero success below 55% solvent content. Refined envelope regression lines consistently lie above coarse envelope lines across all strategies. These results demonstrate that refined envelope reconstruction combined with a GA strategy extends the applicable range to 55% solvent content by more efficiently exploiting structural constraints rather than overcoming the fundamental physical dependence. Below 55% solvent content, direct phasing remains challenging and requires integration with more constraints or complementary techniques. Analysis of convergence speed (Appendix A Figure A4 and Figure A5) reveals weaker correlations with solvent content compared with success rates. All three strategies show a slight trend toward faster convergence at higher solvent content, with refined envelopes requiring fewer iterations than coarse envelopes at equivalent solvent content.
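A regression of this kind can be sketched as follows. This is a minimal illustration that assumes an ordinary least-squares line and a Pearson correlation coefficient; the function name and the input values are hypothetical, and the numbers are synthetic, perfectly linear placeholders rather than results from this study.

```python
import numpy as np


def fit_success_vs_solvent(solvent_percent, success_percent):
    """Least-squares line, Pearson r, and the solvent content where the line reaches zero."""
    x = np.asarray(solvent_percent, dtype=float)
    y = np.asarray(success_percent, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    r = np.corrcoef(x, y)[0, 1]
    x_zero = -intercept / slope   # extrapolated zero-success solvent content
    return slope, intercept, r, x_zero


# Synthetic placeholder values (NOT data from the paper): y = 2 * (x - 55).
solvent = [55, 60, 65, 70, 75]
success = [0, 10, 20, 30, 40]
slope, intercept, r, x_zero = fit_success_vs_solvent(solvent, success)
print(f"slope={slope:.2f}, intercept={intercept:.1f}, r={r:.2f}, zero at {x_zero:.1f}%")
```

The extrapolated zero crossing is the quantity behind statements such as near-zero success below a given solvent content; with scattered real data its uncertainty can be large, so it is best read as an approximate boundary.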

3.7. Multi-Solution Averaging: Enhanced Precision Through Population Convergence

High GA success rates approaching 100% enable phase accuracy enhancement through multi-solution averaging. When multiple independent runs converge through distinct stochastic pathways to fixed points clustered around the true solution, their residual random errors are uncorrelated. Spatial alignment and averaging of multiple converged solutions suppress random noise, yielding a consensus solution.
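The noise-suppression argument can be illustrated numerically. The sketch below assumes that the converged solutions have already been aligned to a common origin and enantiomorph, and it models each solution as a shared "true" map plus independent Gaussian noise; the function name, grid size, and noise level are hypothetical, and the maps are synthetic.

```python
import numpy as np


def consensus_rms_error(n_solutions, noise=0.5, shape=(16, 16, 16), seed=0):
    """RMS deviation from the true map after averaging n noisy, pre-aligned solutions."""
    rng = np.random.default_rng(seed)
    truth = rng.normal(size=shape)                       # stand-in for the true density
    solutions = truth + noise * rng.normal(size=(n_solutions, *shape))
    consensus = solutions.mean(axis=0)                   # voxel-wise average
    return float(np.sqrt(np.mean((consensus - truth) ** 2)))


for n in (1, 5, 20, 50):
    print(n, round(consensus_rms_error(n), 3))
```

For uncorrelated errors the residual falls roughly as one over the square root of the number of averaged solutions, so most of the gain is realized with the first few tens of solutions.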

Figure 11a compares mean phase error before (red bars) and after (green bars) averaging for 27 successfully solved structures (excluding 4qb6). Green bars are lower, demonstrating a reduction across diverse structures. Figure 11b quantifies improvements: averaging reduced phase error by 6.83° on average, lowering the mean from 39.06° to 32.23°. This enhancement yields electron density maps with improved signal-to-noise ratios, facilitating automated model building and identification of structural details.

Figure 11c reveals no correlation between phase error reduction magnitude and solvent content, indicating universal benefit regardless of structural difficulty. This occurs because averaging suppresses random noise arising from the stochastic search rather than systematic biases, so challenging crystals with limited solvent content benefit as well. Analysis shows diminishing returns beyond approximately 20 averaged solutions (Figure 7f), indicating an optimal cost-benefit balance. Multi-solution averaging imposes minimal computational overhead while delivering a 6.83° average phase error reduction, making it an essentially free byproduct of the GA strategy.

4. Discussion

4.1. Theoretical and Practical Value of Continuous Density Modulation

Continuous iterative projection algorithms provide a more physically motivated mathematical description. Traditional algorithms impose binary update strategies at protein-solvent interfaces, creating mathematical discontinuities that contradict the continuous nature of electron density and propagate errors, hindering convergence. By introducing transition regions, continuous algorithms achieve smooth density modification, resulting in smoother optimization paths that reduce local minima entrapment. Results across 28 structures demonstrate that continuous algorithms (CHIO, HPR, MCHIO, THIO) achieve performance levels comparable to the well-established HIO algorithm under both envelope schemes (Figure 5b). The smooth density transitions at protein-solvent interfaces can provide stable convergence pathways in certain cases. It is important to note that algorithm performance varies across individual structures, and no single algorithm universally excels for all crystallographic scenarios.
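The contrast between a binary and a continuous update can be made concrete with a generic sketch. The code below is not the CHIO, HPR, MCHIO, or THIO update rule; it is a simple illustrative blend, assuming a zero solvent level, between the standard HIO treatment of the protein region (accept the Fourier-constrained density) and of the solvent region (negative feedback with parameter beta). The function name and the weight map are hypothetical.

```python
import numpy as np


def blended_hio_update(rho_prev, rho_proj, weight, beta=0.75):
    """Generic smooth interpolation between the two branches of an HIO update.

    weight = 1 in the protein region, 0 in bulk solvent, and intermediate
    values in a transition region. With a binary weight this reduces to the
    classical HIO update (solvent level assumed to be zero).
    """
    w = np.clip(np.asarray(weight, dtype=float), 0.0, 1.0)
    rho_prev = np.asarray(rho_prev, dtype=float)
    rho_proj = np.asarray(rho_proj, dtype=float)
    inside = rho_proj                        # protein: keep Fourier-constrained density
    outside = rho_prev - beta * rho_proj     # solvent: HIO negative feedback
    return w * inside + (1.0 - w) * outside


# Three voxels: protein, transition region, bulk solvent (toy values).
print(blended_hio_update([1.0, 1.0, 1.0], [0.4, 0.4, 0.4], [1.0, 0.5, 0.0], beta=0.5))
```

The point of the sketch is only that a weight varying smoothly across the boundary removes the jump between the two update branches; the specific functional forms of the continuous algorithms are those given in the Methods.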

4.2. Mechanism and Universal Significance of Refined Envelope Reconstruction

The two-step refined envelope reconstruction (Section 2.3) improved average success rates by approximately 50% (Figure 8a), demonstrating that optimizing foundational constraints can be as impactful as algorithmic innovations. Traditional one-step methods must balance complete protein enclosure against solvent misclassification. Our two-step approach uses coarse-scale smoothing to capture overall molecular shape, then fine-scale analysis to identify trapped solvent. This recovered solvent provides additional constraints for iterative refinement and benefits all algorithm types (Figure 5b). For crystals with limited solvent content where constraint information is already scarce, the ability to reclaim even 5% additional solvent volume provides crucial marginal gains that can determine success or failure.
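The coarse-then-fine idea can be sketched in a few lines. The following is a simplified illustration rather than the procedure of Section 2.3: the filter radii in voxels, the quantile thresholds, the function names, and the 5% reclaim fraction used as a default are all assumptions, and the periodic Gaussian filter is applied in Fourier space because the unit cell is periodic.

```python
import numpy as np


def gaussian_blur_periodic(grid, sigma_vox):
    """Gaussian smoothing with periodic boundaries, applied in Fourier space."""
    freqs = np.meshgrid(*[np.fft.fftfreq(n) for n in grid.shape], indexing="ij")
    k2 = sum(f ** 2 for f in freqs)                      # squared frequency, cycles/voxel
    kernel = np.exp(-2.0 * (np.pi * sigma_vox) ** 2 * k2)
    return np.real(np.fft.ifftn(np.fft.fftn(grid) * kernel))


def two_step_envelope(density, solvent_fraction, sigma_large=4.0,
                      sigma_small=1.0, reclaim_fraction=0.05):
    """Return (refined protein mask, trapped-solvent mask) as boolean grids."""
    # Step 1: large-radius filter; the highest-valued voxels form the coarse envelope.
    coarse = gaussian_blur_periodic(density, sigma_large)
    protein = coarse > np.quantile(coarse, solvent_fraction)
    # Step 2: small-radius filter; the lowest local densities inside the coarse
    # envelope are reassigned as trapped solvent.
    fine = gaussian_blur_periodic(density, sigma_small)
    cutoff = np.quantile(fine[protein], reclaim_fraction)
    trapped = protein & (fine < cutoff)
    return protein & ~trapped, trapped


density = np.random.default_rng(0).normal(size=(16, 16, 16))   # toy map, not real data
mask, trapped = two_step_envelope(density, solvent_fraction=0.6)
print(mask.mean(), trapped.mean())
```

In practice the trapped-solvent voxels could be treated either as solvent or as part of a transition region, which corresponds to the choice described in the Results.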

4.3. Genetic Algorithm Strategy: Mechanism and Boundaries

The GA strategy was introduced in our previous work [32], using the HIO algorithm with traditional one-step envelope reconstruction. In the current study, we demonstrate that this enhancement extends to all 10 tested algorithms, with the combination of GA and a two-step refined envelope achieving approximately a 2.5-fold improvement (Figure 8g). While the GA mechanism itself remains unchanged, the current work reveals important synergistic effects: the refined envelope scheme increases the probability of pioneer emergence by providing more accurate constraints, enabling GA to achieve higher absolute success rates and faster convergence than reported previously. However, important boundaries exist. Regression analysis (Figure 10c,f,i) reveals that when solvent content falls below approximately 55%, even GA success rates approach zero, confirming that GA uncovers existing success probability within solution space but does not create new constraints. For constraint-insufficient problems, any search strategy fails. The primary trade-off of the GA strategy is increased computational cost. However, when GA succeeds, it typically achieves convergence faster (about a 40% reduction in median iterations, Figure 8i) compared with a conventional phasing strategy, partially offsetting the computational overhead. GA should thus be viewed as the optimal tool for excavating maximum success potential under given constraints, particularly valuable when data quality permits but traditional methods show marginal success rates.

4.4. Factors Influencing Phase Recovery Success

Our previous testing experience and the systematic analysis of 28 diverse protein crystals in this work reveal that the success rate is influenced by several primary factors including data quality and structural distribution characteristics. Building on the algorithmic selection guidelines provided in Section 3.4.2, we now analyze these underlying factors and discuss comprehensive strategies for challenging cases.

4.4.1. Data Quality

The quality of diffraction data, encompassing both measurement accuracy and completeness, fundamentally influences phase recovery success. Measurement accuracy, reflected in the PDB-reported R-work values, appears more critical than completeness for the methods presented here. Structures in our test set with R-work below 0.22 generally yielded favorable phasing outcomes, whereas those with substantially higher R-values presented greater challenges due to reduced signal-to-noise ratio, making ab initio envelope reconstruction and phase solution increasingly difficult; addressing such high-error cases remains a direction for future development.

Regarding data completeness, particularly at low resolution, missing reflections (detailed in Appendix A, Table A1) can affect the initial envelope reconstruction. While extensive missing low-resolution data (e.g., 236 reflections for 2boz) could hinder this initial stage, our dynamic data filling strategy (Section 2.5.2) enables the iterative algorithm to compensate progressively using the observed medium- and high-resolution data. Crucially, the algorithm’s ability to reconstruct accurate phases does not solely depend on low-resolution completeness, as evidenced by structures like 4qb6, which has very few missing low-resolution reflections (only 4) yet remains challenging to phase. This indicates that for such cases, the primary obstacles arise from solvent content and structural distribution factors rather than data incompleteness.

4.4.2. Structural Distribution Within the Unit Cell

Beyond molecular shape, the distribution pattern of molecules within the crystallographic unit cell directly determines envelope reconstruction difficulty and consequently affects success rates. When protein surfaces exhibit complex interdigitation with bulk solvent regions (Appendix A Figure A2a, structure 2bke), ab initio envelope reconstruction struggles to delineate clear boundaries between protein and solvent domains. The irregular topology creates ambiguity in envelope determination, hindering convergence.

Crystallographic space groups permit multiple equivalent origin choices related by allowed origin translations. Non-centrosymmetric space groups introduce additional ambiguity through enantiomorphic structures, spatially inverted configurations that produce identical diffraction amplitudes. For instance, space group $P2_{1}2_{1}2_{1}$ presents 8 equivalent origin choices from crystallographic symmetry plus 8 from spatial inversion, totaling 16 equivalent origin selections. Iterative algorithms and physical constraints cannot distinguish among these mathematically equivalent origins, and reconstructed envelopes select among them at random with equal probability. Whether different origin choices yield similar envelopes depends not only on space group symmetry but also on the specific molecular packing within the unit cell. When they do, such as when envelopes corresponding to two different origin choices exhibit spatial proximity (Appendix A Figure A2b–d, structure 3rd5) or when four origin choices converge to quite similar envelopes (Appendix A Figure A2f–j, structure 4qb6), the reconstruction process oscillates among equivalent configurations and fails to converge to a definitive solution.
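The count of 16 can be enumerated explicitly, and the amplitude equivalence checked numerically. The sketch below uses a random toy grid, which does not itself obey the space group symmetry; it only illustrates that half-cell origin shifts and inversion leave the Fourier amplitudes of a real density unchanged. The function name and grid size are assumptions.

```python
from itertools import product

import numpy as np

# Allowed origin shifts for this space group: 0 or 1/2 along each cell axis.
ORIGIN_SHIFTS = list(product((0.0, 0.5), repeat=3))


def equivalent_densities(rho):
    """Yield the 16 origin/enantiomorph variants of a 3D density grid (even grid sizes assumed)."""
    for hand in (rho, rho[::-1, ::-1, ::-1]):            # original and inverted copy
        for shift in ORIGIN_SHIFTS:
            steps = tuple(int(round(s * n)) for s, n in zip(shift, rho.shape))
            yield np.roll(hand, steps, axis=(0, 1, 2))


rho = np.random.default_rng(0).random((8, 8, 8))         # toy density, not real data
amplitudes = np.abs(np.fft.fftn(rho))
variants = list(equivalent_densities(rho))
same = all(np.allclose(np.abs(np.fft.fftn(v)), amplitudes) for v in variants)
print(len(ORIGIN_SHIFTS), len(variants), same)
```

Because every variant reproduces the measured amplitudes equally well, the diffraction data alone cannot favor one of them, which is the source of the ambiguity discussed above.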

Structures containing numerous fixed water molecules integrated into the molecular surface (Appendix A Figure A2e, structure 3rd5) require loose envelopes to encompass these structured solvent shells. This necessity encroaches upon limited bulk solvent volumes, reducing the effectiveness of solvent flatness constraints. Moreover, a subtle pathology occasionally emerges in which envelope reconstruction succeeds but the electron density in the protein region adopts the inverted sign, so that the phases differ by 180° from the correct ones while the diffraction amplitudes remain unchanged. Although histogram matching constraints with positive skewness typically correct this inversion, convergence sometimes stalls at intermediate error states between correct and inverted solutions. The genetic algorithm strategy, with its population-based search and information sharing, is particularly effective at escaping such metastable states and driving the system toward the correct global minimum. The most challenging structures typically combine limited solvent content, measurement errors (high R-factors), and one or more geometric complications, creating compounding difficulties that substantially reduce success rates.
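The sign-inversion pathology follows from the linearity of the Fourier transform, which a short numerical check makes explicit. The grid below is a random toy density used only for illustration.

```python
import numpy as np

rho = np.random.default_rng(1).random((8, 8, 8))   # toy density, not real data
F = np.fft.fftn(rho)
F_neg = np.fft.fftn(-rho)                           # structure factors of the sign-inverted map

amplitudes_equal = np.allclose(np.abs(F), np.abs(F_neg))
phase_shift_deg = np.degrees(np.abs(np.angle(F_neg / F)))

print(amplitudes_equal)                             # amplitudes are unchanged
print(phase_shift_deg.min(), phase_shift_deg.max()) # every phase is shifted by 180 degrees
```

Since the amplitudes cannot distinguish the two maps, only real-space information such as the positive skewness of the density histogram can break the tie, which is why histogram matching is the constraint that normally corrects this state.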

In summary, phase recovery success is governed by the interplay of data quality (measurement accuracy and completeness) and structural characteristics (solvent content, molecular packing complexity, and origin ambiguity). The two-step refined envelope scheme helps mitigate boundary-related issues, while the GA strategy is particularly effective at escaping metastable states arising from these structural complications.

4.5. Practical Considerations and Future Directions

The performance of the improved algorithms (MCHIO, MDM, MRAAR) highlights the importance of domain-specific adaptation. MCHIO corrects excessive CHIO transition region feedback; MDM and MRAAR explicitly incorporate $P_{A} P_{B}$ operators to address insufficient protein region constraints, elevating performance to levels comparable with HIO. Different algorithms perform optimally under different conditions (Figure 8j–l), indicating that no single universal algorithm exists. Therefore, our multi-level synergistic framework becomes necessary: refined envelope reconstruction provides high-quality constraints; a diverse algorithm toolbox enriches algorithm selection; resolution weighting and GA offer strategic optimization. This transforms direct methods from experience-dependent art into a systematic, reproducible methodological pipeline.

For particularly challenging cases, integrated strategies may be needed. When available, homologous structures or AlphaFold predictions can guide envelope definition before applying direct methods for unbiased phase retrieval. For crystals with medium and low solvent content or complex packing, experimental phasing via anomalous diffraction can provide complementary phase information. When success rates are low but non-zero, increasing computational resources through more trials or larger genetic algorithm populations may eventually yield solutions.

Looking forward, integration of AI-predicted structural models offers promising avenues for further constraint enhancement. Beyond reference histograms, high-confidence prediction regions (pLDDT scores > 90) can serve as partial structural constraints, reducing the phase problem to determining only the uncertain fragments. This hybrid approach may reduce solvent content requirements, extending applicability into the medium- and low-solvent-content range. Combined with the multi-algorithm framework established in this work, such AI-augmented strategies represent a natural evolution toward widely applicable structure determination.

5. Conclusions

This study addresses two challenges in direct-method phase retrieval for protein crystals, discontinuous density modification and crude molecular envelope reconstruction, through a systematic approach. We introduced continuous iterative projection algorithms into this domain, validating their value in enhancing convergence stability by implementing smooth density transitions at the molecular interface. We developed improved algorithm variants, including MCHIO, MDM, and MRAAR, that optimize constraint enforcement, achieving performance levels comparable to or exceeding established methods (Figure 5b). Moreover, the proposed two-step refined envelope reconstruction scheme serves as a universal enhancement, elevating average phase retrieval success rates (Figure 8a). This demonstrates that optimizing foundational constraints can be as impactful as algorithmic innovations. While continuous algorithms demonstrate competitive performance comparable to HIO, algorithm effectiveness varies across individual structures, and no single method universally excels for all cases. For practical structure determination, we recommend testing multiple algorithms from both continuous (CHIO, HPR, MCHIO, THIO) and classical (HIO, MDM, MRAAR) categories. At the strategy level, the resolution-weighted strategy provided limited improvement, while the genetic algorithm co-evolution strategy delivered breakthrough performance that enabled multi-solution averaging, reducing mean phase error by approximately 6.83° (Figure 11a,b). This precision gain was achieved universally across different solvent content levels, consistently improving model quality.

Our methods shift the success rate versus solvent content curve upward, extending the applicability of direct methods within the high-solvent-content range to its lower boundary around 55% (Figure 10). Method efficacy remains constrained by physical limits, including low solvent contents (approaching or below 55%), insufficient data quality (large R-work values due to measurement errors or poor crystal quality), and complex structural arrangements within the unit cell, so continued development is needed. Nevertheless, the framework constructed in this study provides a powerful and systematic solution for the unbiased structure determination of challenging systems in structural biology. This framework synergistically combines refined envelope reconstruction, continuous and improved iterative projection algorithms whose performance is comparable to or exceeds established methods, and genetic algorithm strategies. The compiled algorithms developed in this work are accessible on GitHub [66].

Author Contributions

H.H.: conceptualization, methodology, software, formal analysis, data curation, writing—original draft preparation, and visualization; Y.L. and R.F. assisted with minor parameter testing; W.-P.S.: writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The diffraction data were downloaded from the Protein Data Bank at https://www.rcsb.org (accessed on 20 December 2025).

Acknowledgments

We acknowledge the computational resources provided by the Department of Physics, Ningbo University.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

ASR	Averaged Successive Reflections
CHIO	Continuous Hybrid Input–Output
DM	Difference Map
GA	Genetic Algorithm
HIO	Hybrid Input–Output
HPR	Hybrid Projection Reflection
IPA	Iterative Projection Algorithm
MASR	Modified Averaged Successive Reflections
MCHIO	Modified Continuous Hybrid Input–Output
MPI	Message Passing Interface
MRAAR	Modified Relaxed Averaged Alternating Reflections
PDB	Protein Data Bank
RAAR	Relaxed Averaged Alternating Reflections
THIO	Transition Hybrid Input–Output

Appendix A. Supplementary Figures, Structure Information, and Numerical Results

Figure A1. Distribution of solvent content in protein crystal structures retrieved from the Protein Data Bank. The histogram (blue bars, left y-axis) shows the number of structures in each 5% solvent content interval, while the curve (dark red line, right y-axis) represents the cumulative percentage calculated from high to low solvent content. The analysis encompasses approximately 199,083 protein crystal structures. Orange and blue dashed lines mark the 55% and 65% solvent content thresholds, corresponding to cumulative percentages of 32.3% and 9.6%, respectively.

Figure A2. Challenging structural distribution patterns within crystallographic unit cells that impede envelope reconstruction and reduce direct phasing success rates. All molecular surfaces are rendered from PDB-deposited coordinates. (a) Structure 2bke illustrating severe interdigitation between protein surfaces (gray) and bulk solvent regions (white), creating highly irregular envelope topologies with fragmented bulk solvent that hinder accurate ab initio envelope reconstruction. (b,c) Structure 3rd5 demonstrating two equivalent origin choices in space group $P2_{1}2_{1}2_{1}$, showing highly similar envelope configurations that cause oscillation between equivalent solutions during iterative reconstruction. (d) Superposition of panels (b,c) highlighting the spatial proximity of envelopes from different origin choices. (e) Structure 3rd5 showing numerous ordered water molecules (red spheres) bound to the protein surface, which must be encompassed by the reconstructed envelope, thereby reducing available bulk solvent volume. (f–i) Structure 4qb6 displaying four equivalent origin choices, all exhibiting quite similar envelope topologies. (j) Superposition of panels (f–i) demonstrating the convergence challenge when multiple equivalent origin selections produce indistinguishable envelope geometries. Structures with limited solvent content combined with one or more of these geometric complications present substantial challenges for direct phasing methods. Visualization using PyMOL 3.1 [62].

Figure A3. Comprehensive algorithm-by-algorithm comparison under three phasing strategies. Success rates (a–c), minimum convergence iterations ((d–f), lower is better), and median convergence iterations ((g–i), lower is better) are shown for the conventional (first column), resolution-weighted (second column), and GA-enhanced (third column) phasing strategies. Each panel directly compares the performance of all 10 iterative projection algorithms using the one-step coarse envelope (blue bars) versus the two-step refined envelope (orange bars). This detailed breakdown allows for the evaluation of individual algorithm sensitivity to envelope design and phasing strategy. Key observations include the universal enhancement provided by the refined envelope reconstruction scheme, the breakthrough performance of the GA strategy, and the formation of a top-tier performance group comprising continuous algorithms (CHIO, HPR, MCHIO, THIO), improved classical variants (MDM, MRAAR), and HIO.

Figure A4. Correlation between minimum convergence iterations and solvent content for conventional (first column), resolution-weighted (second column), and GA-enhanced (third column) phasing strategies. (a–c) Coarse envelope design results; (d–f) improved envelope design results; (g–i) overlaid comparison. Linear regression analysis reveals a negative correlation between minimum iterations and solvent content across all strategies and designs. Pearson correlation coefficients (r) and p-values are shown in each panel. The improved envelope design consistently achieves lower minimum iterations than the coarse design at equivalent solvent contents.
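
The Pearson coefficient and least-squares line reported in these panels can be reproduced in outline with the short Python sketch below. It is an illustration only: it pairs the Matthews-coefficient solvent contents from Table A1 with the HIO conventional-strategy minimum iterations from Table A6 for ten structures, which is an assumed choice of data and not necessarily how the per-structure values plotted in the figure were aggregated. The function name is hypothetical, and the p-value is omitted because it needs a t-distribution that the standard library does not provide.

```python
import math


def pearson_and_fit(x, y):
    """Return (r, slope, intercept) for a least-squares line y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equally long sequences with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


# Illustrative subset: solvent content (%) from Table A1 and HIO "Con" minimum
# iterations from Table A6 for 1xhd, 1hes, 2b71, 1vmg, 1gk6, 1gc7, 1nw3, 1hp4,
# 1vh7 and 3rd5. The selection of structures and column is an assumption.
solvent_percent = [77.99, 78.39, 73.56, 62.40, 64.07, 72.09, 71.82, 67.19, 63.36, 59.52]
hio_min_iterations = [76, 175, 162, 270, 436, 445, 800, 937, 1794, 5126]

r, slope, intercept = pearson_and_fit(solvent_percent, hio_min_iterations)
print(f"Pearson r = {r:.2f}, slope = {slope:.1f} iterations per % solvent")
```

A negative r and a negative slope correspond to the trend described in the caption: structures with more solvent tend to converge in fewer iterations.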

Table A1. Structure information, data statistics, and ab initio phasing results for 28 test cases.

PDB ID	Description	Space Group	PDB Reported R-Work	Resolution Range High (Å)	Resolution Range Low (Å)	Number of Non-Hydrogen Atoms	Number of Water Molecules	Number of Reflections Observed	Number of Missing Low-Resolution Reflections	Matthews Coefficient Corresponding Solvent Content (%)	Volume Not Occupied by Model (%)	PDB Posted Solvent Content (%)	$\Delta\varphi$ Before Averaging Multiple Solutions (°)	$\Delta\varphi$ After Averaging Multiple Solutions (°)
1a53	Indole-3-glycerol phosphate synthase	$P 2_{1} 2_{1} 2_{1}$	0.159	2.0	15.4	2264	242	27,283	86	64.70	58.0	68.50	31.04	27.14
1gaj	ABC transporter ATP-binding protein	$P 432$	0.205	2.5	39.0	2171	133	13,564	12	65.23	58.7	67.04	49.96	41.62
1gc7	Radixin FERM domain	$P 4_{1} 2_{1} 2$	0.215	2.8	30.0	2482	0	15,929	22	72.09	66.8	71.87	39.60	32.15
1gk6	Vimentin coil 2B fragment	$P 3_{1} 21$	0.199	1.9	35.0	1064	194	15,422	3	64.07	57.3	67.50	33.54	28.45
1hes	Mu2 adaptin subunit with peptide	$P 6_{4}$	0.21	3.0	42.0	2138	33	13,045	6	78.39	74.3	79.00	40.55	31.79
1hp4	Beta-N-acetylhexosaminidase	$P 6_{1} 22$	0.181	2.2	32.2	4273	382	47,413	25	67.19	61.0	69.43	27.37	23.58
1nh6	Chitinase A-hexasaccharide complex	$I 222$	0.197	2.05	37.0	4858	635	61,725	17	66.04	59.6	66.04	32.83	28.23
1nw3	DOT1L histone methyltransferase	$P 6_{5}$	0.204	2.5	43.0	2780	68	22,926	7	71.82	66.5	71.82	44.63	34.37
1reu	Bone morphogenetic protein 2	$H 32$	0.215	2.65	19.8	833	13	4918	19	67.15	61.0	67.10	45.99	36.96
1vh7	Imidazolglycerolphosphate synthase	$P 6_{1} 22$	0.202	1.9	36.0	2202	245	34,390	10	63.36	56.4	65.67	29.54	24.47
1vmg	MazG nucleotide pyrophosphohydrolase	$I 4_{1} 22$	0.142	1.46	26.0	826	107	24,059	8	62.40	55.3	62.64	57.15	48.93
1xhd	Acetyltransferase from B. cereus	$I 2_{1} 3$	0.152	1.9	32.0	1516	130	37,517	12	77.99	73.8	80.20	23.58	20.45
1yh2	Ubiquitin-conjugating enzyme	$P 4_{3} 2_{1} 2$	0.202	2.0	20.0	1431	179	18,669	31	63.94	57.1	68.00	38.11	31.92
2b44	Lysostaphin-type enzymes	$P 3_{2} 21$	0.181	1.83	19.3	2226	188	49,103	62	74.25	69.4	76.00	31.36	25.75
2b71	Plasmodium cyclophilin-like protein	$P 4_{1} 32$	0.19	2.5	40.0	1473	122	13,563	7	73.56	68.6	71.80	36.12	28.48
2bke	Crenarchaeal RadA recombinase	$P 3_{1} 21$	0.174	3.2	50.0	2327	40	9062	3	70.72	65.2	67.80	47.25	38.90
2boz	Photosynthetic reaction center	$P 3_{1} 21$	0.174	2.4	18.0	7475	368	76,249	236	74.75	70.0	74.75	31.82	25.80
2bt1	Epstein Barr Virus dUTPase complex	$P 6_{2} 22$	0.19	2.7	20.0	2100	57	13,164	54	69.53	63.8	69.53	40.61	34.71
2buj	Serine-threonine kinase 16 complex	$C 222_{1}$	0.185	2.6	64.8	4766	76	33,626	5	71.12	65.7	72.30	44.27	37.24
2fg0	Gamma-D-glutamyl endopeptidase	$P 4_{1} 2_{1} 2$	0.154	1.79	29.5	3977	428	68,770	26	63.86	57.0	63.86	36.79	30.92
3rd5	Mycobacterium paratuberculosis protein	$P 2_{1} 2_{1} 2_{1}$	0.133	1.5	29.0	2549	413	69,906	15	59.52	51.9	65.00	45.14	38.61
3tqe	Malonyl CoA-acyl carrier protein transacylase	$C 121$	0.146	1.5	41.0	2915	347	72,567	5	56.29	48.0	63.07	45.18	38.65
4bsj	VEGFR-3 extracellular domains D4-5	$P 3_{1} 21$	0.21	2.5	44.0	1722	21	17,525	5	76.20	71.7	74.10	45.01	37.74
4gtf	Flavin-dependent thymidylate synthase	$I 4_{1} 22$	0.162	1.77	39.0	2086	124	34,764	6	60.72	53.3	63.38	36.08	30.11
4iqk	Keap1 Kelch domain with inhibitor	$C 121$	0.153	1.97	17.0	2341	122	30,772	57	63.25	56.3	63.77	42.57	32.45
4q82	Phospholipase/Carboxylesterase	$P 2_{1} 2_{1} 2$	0.153	1.85	36.0	4204	472	66,642	14	62.70	55.7	62.49	36.55	27.62
4qb6	CBM35 in complex with aldouronic acid	$P 2_{1} 2_{1} 2_{1}$	0.164	1.35	32.0	1265	100	41,390	4	54.75	46.2	55.82	-	-
4tpl	West Nile virus NS1 protein	$P 321$	0.17	2.9	48.5	5767	62	33,918	10	73.74	68.8	72.50	41.88	33.05
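
For readers who want to relate the solvent-content columns to unit-cell parameters, the sketch below applies the standard Matthews relation, solvent fraction = 1 − 1.23/V_M, where V_M is the Matthews coefficient in Å³/Da and the constant 1.23 assumes a typical protein partial specific volume of about 0.74 cm³/g. Whether the "Matthews Coefficient Corresponding Solvent Content" column was computed with exactly this constant is an assumption, and the function names and the example cell are hypothetical.

```python
# 1.23 Å^3/Da assumes a protein partial specific volume of roughly 0.74 cm^3/g.
PROTEIN_VOLUME_FACTOR = 1.23


def matthews_coefficient(cell_volume_A3, copies_in_cell, molecular_weight_da):
    """V_M in Å^3/Da: unit-cell volume per dalton of protein in the cell."""
    return cell_volume_A3 / (copies_in_cell * molecular_weight_da)


def solvent_fraction(vm):
    """Solvent fraction (0..1) implied by a Matthews coefficient."""
    return 1.0 - PROTEIN_VOLUME_FACTOR / vm


def vm_from_solvent_fraction(fraction):
    """Inverse relation: Matthews coefficient implied by a solvent fraction."""
    return PROTEIN_VOLUME_FACTOR / (1.0 - fraction)


# Hypothetical crystal: 500,000 Å^3 cell holding 8 copies of a 25 kDa protein.
vm = matthews_coefficient(500_000.0, 8, 25_000.0)
print(f"V_M = {vm:.2f} Å^3/Da, solvent content = {100 * solvent_fraction(vm):.1f}%")

# Working backwards from the 64.70% listed for 1a53 in Table A1.
print(f"1a53: implied V_M = {vm_from_solvent_fraction(0.6470):.2f} Å^3/Da")
```

The "Volume Not Occupied by Model" column is a different quantity derived from the deposited coordinates, so it is not expected to match this estimate.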

Figure A5. Correlation between median convergence iterations and solvent content for conventional (first column), resolution-weighted (second column), and GA-enhanced (third column) phasing strategies. (a–c) Coarse envelope design results; (d–f) improved envelope design results; (g–i) overlaid comparison. Linear regression analysis reveals a negative correlation between median iterations and solvent content across all strategies and designs. Pearson correlation coefficients (r) and p-values are shown in each panel. The improved envelope design consistently achieves lower median iterations than the coarse design at equivalent solvent contents.

Table A2. Phase recovery success rates (%) for continuous and classical IPAs across 28 protein structures using conventional, resolution-weighted, and GA-enhanced strategies with one-step envelope reconstruction.

	ASR			RAAR			DM			MDM			MRAAR			HIO			CHIO			HPR			MCHIO			THIO
PDB ID	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA
1a53	0	0	100	0	0	100	0	0	0	0	17	100	0	29	100	0	4	100	0	8	100	7	83	100	0	27	100	0	4	100
1gaj	0	0	0	0	0	0	0	0	0	0	0	0	0	2	0	0	0	100	0	0	0	0	0	0	0	0	4	0	0	100
1gc7	0	0	0	0	0	0	1	0	14	22	2	99	38	33	100	39	27	100	24	15	100	15	15	100	32	23	100	49	56	100
1gk6	0	0	0	0	0	0	0	0	0	5	2	0	7	2	0	4	0	100	9	8	100	5	0	1	7	10	100	10	8	100
1hes	0	0	0	0	0	0	43	46	100	56	56	76	54	60	100	55	48	100	51	48	99	58	62	99	57	56	100	56	60	100
1hp4	0	0	0	0	0	0	0	0	0	2	0	0	6	0	0	7	4	0	2	2	0	0	0	0	1	0	5	19	6	100
1nh6	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
1nw3	0	0	0	0	0	0	0	0	0	0	0	0	27	33	100	19	48	100	6	8	0	8	4	100	15	8	100	10	23	100
1reu	0	0	0	0	0	0	0	0	0	2	2	0	25	4	100	29	23	0	21	40	83	8	23	94	19	25	95	40	46	9
1vh7	0	0	0	0	0	0	0	0	0	0	0	0	0	2	100	4	2	100	4	2	2	0	0	1	2	0	0	6	0	100
1vmg	0	0	0	0	0	0	12	2	100	23	15	100	21	31	100	20	17	100	34	40	100	37	42	0	34	29	0	32	52	0
1xhd	0	0	0	0	0	0	38	44	100	21	46	100	54	81	100	54	67	100	100	100	100	100	100	100	100	100	100	100	100	100
1yh2	0	0	0	0	0	0	0	0	0	4	4	0	6	2	0	4	6	0	21	12	0	15	2	1	15	4	0	27	27	100
2b44	0	0	0	0	0	0	23	21	100	15	19	100	15	1	2	35	42	100	27	14	0	6	6	13	29	16	25	46	25	100
2b71	0	0	0	0	0	0	25	16	0	19	27	100	33	51	100	33	47	100	38	41	100	44	27	100	42	42	100	38	48	100
2bke	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	100	0	0	0	0	0	0	0	0	0	0	0	0
2boz	0	0	0	0	0	0	4	23	100	15	25	100	31	37	100	21	17	100	0	41	100	25	22	38	6	34	100	0	34	100
2bt1	0	0	0	0	0	0	0	5	0	0	13	100	0	5	0	25	16	100	2	0	1	0	0	0	2	3	2	0	1	0
2buj	0	0	0	0	0	0	0	0	0	2	0	0	0	4	100	6	4	0	0	2	20	0	0	0	8	5	100	6	7	100
2fg0	0	0	0	0	0	0	0	0	0	0	0	0	3	2	0	0	0	0	0	3	0	3	0	2	3	1	100	1	2	0
3rd5	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	1	0	0	0	0	0	1	1	0	2	4	0
3tqe	0	0	0	0	0	0	0	0	0	29	3	0	17	19	0	0	8	100	2	18	100	27	14	100	8	28	100	6	26	100
4bsj	0	3	0	0	0	0	0	2	0	12	7	100	6	9	100	2	5	100	0	2	0	0	1	5	2	7	100	2	9	100
4gtf	0	0	0	0	0	0	0	0	0	8	7	0	17	12	100	6	6	100	6	23	0	23	6	0	23	19	100	6	31	0
4iqk	0	0	0	0	0	0	0	0	0	0	0	0	2	0	0	0	0	100	2	1	0	4	0	0	2	1	0	4	3	0
4q82	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0
4qb6	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
4tpl	0	0	0	0	0	0	0	0	0	0	3	100	6	1	1	0	0	100	0	1	0	2	1	8	0	0	0	0	1	0
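
Tables A2–A7 share one layout: a PDB ID followed by 30 values, three per algorithm (Con, Res, GA) in the order given by the header row, with "-" marking runs that never converged. The Python sketch below shows one way to load such rows and average a column. It is a reading aid under that assumed layout, with hypothetical function names, and the three sample rows are copied from Table A2, so the printed means describe only those rows and not the full 28-structure benchmark.

```python
ALGORITHMS = ["ASR", "RAAR", "DM", "MDM", "MRAAR", "HIO", "CHIO", "HPR", "MCHIO", "THIO"]
STRATEGIES = ["Con", "Res", "GA"]

# Three rows copied from Table A2 (1a53, 1hes, 1xhd); whitespace-separated here.
SAMPLE_ROWS = """\
1a53 0 0 100 0 0 100 0 0 0 0 17 100 0 29 100 0 4 100 0 8 100 7 83 100 0 27 100 0 4 100
1hes 0 0 0 0 0 0 43 46 100 56 56 76 54 60 100 55 48 100 51 48 99 58 62 99 57 56 100 56 60 100
1xhd 0 0 0 0 0 0 38 44 100 21 46 100 54 81 100 54 67 100 100 100 100 100 100 100 100 100 100 100 100 100
"""


def parse_row(line):
    """Split one table row into (pdb_id, {(algorithm, strategy): value or None})."""
    fields = line.split()
    pdb_id, tokens = fields[0], fields[1:]
    expected = len(ALGORITHMS) * len(STRATEGIES)
    if len(tokens) != expected:
        raise ValueError(f"{pdb_id}: expected {expected} values, got {len(tokens)}")
    row = {}
    for i, token in enumerate(tokens):
        key = (ALGORITHMS[i // 3], STRATEGIES[i % 3])
        row[key] = None if token == "-" else float(token)  # "-" means no converged run
    return pdb_id, row


def column_means(rows):
    """Average each (algorithm, strategy) column, skipping missing entries."""
    means = {}
    for algorithm in ALGORITHMS:
        for strategy in STRATEGIES:
            values = [r[(algorithm, strategy)] for r in rows if r[(algorithm, strategy)] is not None]
            means[(algorithm, strategy)] = sum(values) / len(values) if values else None
    return means


rows = [parse_row(line)[1] for line in SAMPLE_ROWS.splitlines() if line.strip()]
means = column_means(rows)
for strategy in STRATEGIES:
    print(f"HIO {strategy}: mean success rate over sample rows = {means[('HIO', strategy)]:.1f}%")
```

For the iteration tables (A3, A4, A6, A7), the same parser leaves unconverged entries as `None`, so averages are taken only over structures where at least one run succeeded.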

Table A3. Minimum iterations required for successful convergence of continuous and classical IPAs across 28 protein structures using conventional, resolution-weighted, and GA-enhanced strategies with one-step envelope reconstruction.

	ASR			RAAR			DM			MDM			MRAAR			HIO			CHIO			HPR			MCHIO			THIO
PDB ID	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA
1a53	-	-	1465	-	-	1113	-	-	-	-	1386	584	-	476	552	-	2155	899	-	1620	783	5150	540	257	-	1093	749	-	2219	574
1gaj	-	-	-	-	-	-	-	-	-	-	-	-	-	8730	-	-	-	1483	-	-	-	-	-	-	-	-	8939	-	-	3629
1gc7	-	-	-	-	-	-	3322	-	3633	389	8923	3770	417	1273	1752	653	1385	778	1029	5708	3176	1906	5891	4115	945	2350	2342	801	1467	1188
1gk6	-	-	-	-	-	-	-	-	-	3122	5854	-	1184	4018	-	650	-	1134	493	584	3981	6048	-	9208	1677	2262	2846	291	240	1542
1hes	-	-	-	-	-	-	128	153	225	154	9340	9248	125	189	269	79	117	222	1875	1938	2496	5914	3702	1742	223	683	1038	179	331	339
1hp4	-	-	-	-	-	-	-	-	-	6454	-	-	6710	-	-	2361	7382	-	8839	9475	-	-	-	-	9416	-	8763	1504	1702	3174
1nh6	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
1nw3	-	-	-	-	-	-	-	-	-	-	-	-	925	1368	839	1004	778	687	6898	999	-	8256	2707	1229	2969	3931	1051	525	1509	881
1reu	-	-	-	-	-	-	-	-	-	1173	3221	-	1251	3770	1040	2780	2144	-	3791	2759	1463	7345	2378	565	1946	1896	699	1616	634	380
1vh7	-	-	-	-	-	-	-	-	-	-	-	-	-	5181	6711	7288	9255	2362	7390	8810	9041	-	-	9465	7335	-	-	2486	-	4673
1vmg	-	-	-	-	-	-	987	1945	880	295	417	431	278	276	369	481	333	350	228	131	182	297	153	-	153	221	-	168	105	-
1xhd	-	-	-	-	-	-	77	131	86	75	5837	5347	83	135	94	71	76	89	50	73	57	68	125	129	55	69	41	53	75	79
1yh2	-	-	-	-	-	-	-	-	-	4873	7447	-	6726	9197	-	4403	3034	-	1645	5105	-	1418	9256	9167	139	888	-	1847	2372	2163
2b44	-	-	-	-	-	-	592	1699	1121	776	1785	637	1089	9261	8133	373	948	1054	1884	9079	-	8697	9160	8840	2233	8680	8570	328	7082	5992
2b71	-	-	-	-	-	-	200	512	-	256	328	190	194	166	185	217	114	152	770	170	81	96	142	169	180	127	191	419	140	232
2bke	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	1155	-	-	-	-	-	-	-	-	-	-	-	-
2boz	-	-	-	-	-	-	5584	1244	719	770	723	589	396	948	1031	468	698	581	-	3912	2568	2231	8515	8585	1052	5998	4708	-	3148	1292
2bt1	-	-	-	-	-	-	-	6697	-	-	1318	781	-	5758	-	1623	1217	1095	9491	-	8600	-	-	-	9464	9100	9149	-	8939	-
2buj	-	-	-	-	-	-	-	-	-	9146	-	-	-	4505	2972	2235	4430	-	-	8899	8779	-	-	-	5599	8706	7278	1677	5840	4516
2fg0	-	-	-	-	-	-	-	-	-	-	-	-	4153	5427	-	-	-	-	-	5508	-	7244	-	9160	7202	7491	4283	3170	4460	-
3rd5	-	-	-	-	-	-	-	-	-	-	-	-	-	8752	-	-	-	-	7228	-	-	-	-	-	6765	1470	-	7827	1691	-
3tqe	-	-	-	-	-	-	-	-	-	2394	8167	-	4095	2179	-	-	2692	1851	6432	2411	2751	1057	3501	4884	4353	2251	2012	4643	1519	1538
4bsj	-	2852	-	-	-	-	-	1416	-	2985	806	780	178	214	942	4600	715	447	-	8190	-	-	8671	8742	6734	167	4518	2504	2305	539
4gtf	-	-	-	-	-	-	-	-	-	2265	3371	-	521	1553	1332	2854	1822	1372	1433	3444	-	2021	8452	-	1034	3019	2544	1171	789	-
4iqk	-	-	-	-	-	-	-	-	-	-	-	-	5261	-	-	-	-	3080	7731	7334	-	3310	-	-	3326	5672	-	6318	4492	-
4q82	-	-	-	-	-	-	-	-	-	-	-	-	-	6416	-	-	-	-	-	3790	-	-	-	-	-	-	-	-	-	-
4qb6	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
4tpl	-	-	-	-	-	-	-	-	-	-	7609	5275	5318	9737	9769	-	-	2990	-	9418	-	2221	9169	8995	-	-	-	-	7088	-

Table A4. Median iterations required for successful convergence of continuous and classical IPAs across 28 protein structures using conventional, resolution-weighted, and GA-enhanced strategies with one-step envelope reconstruction.

	ASR			RAAR			DM			MDM			MRAAR			HIO			CHIO			HPR			MCHIO			THIO
PDB ID	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA
1a53	-	-	1701	-	-	1402	-	-	-	-	1819	900	-	2127	800	-	4211	1199	-	5896	1054	7314	2618	601	-	5110	1000	-	5300	800
1gaj	-	-	-	-	-	-	-	-	-	-	-	-	-	8730	-	-	-	2007	-	-	-	-	-	-	-	-	9111	-	-	4000
1gc7	-	-	-	-	-	-	3322	-	9200	4678	8923	4400	5310	4983	2018	4998	4959	1100	6426	8747	3902	8849	8445	4700	5952	6458	2802	4502	4680	1900
1gk6	-	-	-	-	-	-	-	-	-	4456	5854	-	3556	4018	-	4834	-	1500	5379	3658	4301	7202	-	9208	6268	5600	3204	3412	1744	1944
1hes	-	-	-	-	-	-	572	1844	500	472	9494	9444	515	589	500	351	469	418	8429	8631	8611	8588	8608	8571	5991	7029	6216	1238	2567	2337
1hp4	-	-	-	-	-	-	-	-	-	6616	-	-	7251	-	-	3728	7598	-	8975	9475	-	-	-	-	9416	-	9198	3403	3523	3500
1nh6	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
1nw3	-	-	-	-	-	-	-	-	-	-	-	-	2417	2856	1089	2460	2101	900	8388	6233	-	8906	5938	1902	7437	4356	1591	1242	1968	1302
1reu	-	-	-	-	-	-	-	-	-	1173	3221	-	5173	5516	1500	7342	6938	-	5078	6848	3768	7634	7390	5708	7379	6163	4562	5993	5076	8771
1vh7	-	-	-	-	-	-	-	-	-	-	-	-	-	5181	7200	7581	9255	2600	8044	8810	9073	-	-	9465	7335	-	-	3609	-	5101
1vmg	-	-	-	-	-	-	3751	1945	1205	1025	2508	818	824	1437	608	1674	631	716	1202	494	407	1450	1011	-	1330	563	-	620	706	-
1xhd	-	-	-	-	-	-	212	274	216	176	5950	5782	224	434	300	172	343	200	371	240	234	326	348	300	253	252	208	286	364	269
1yh2	-	-	-	-	-	-	-	-	-	5382	8466	-	7838	9197	-	4872	8040	-	5344	6965	-	4854	9256	9167	4600	2002	-	4701	5055	2538
2b44	-	-	-	-	-	-	1772	4050	1400	1044	6775	1057	5149	9261	8895	980	5151	1401	9023	9393	-	8779	9487	9327	8720	9128	9182	2421	8902	6308
2b71	-	-	-	-	-	-	5694	7436	-	1878	5592	572	2884	2514	500	1070	1514	400	3466	8818	702	8180	8705	703	2214	6624	652	1068	3418	600
2bke	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	1600	-	-	-	-	-	-	-	-	-	-	-	-
2boz	-	-	-	-	-	-	5808	3156	1100	803	2567	800	840	3935	1300	1502	1330	700	-	5929	3205	7735	8938	8932	1327	8056	5092	-	5934	1802
2bt1	-	-	-	-	-	-	-	7346	-	-	6636	1155	-	7596	-	6629	4568	1801	9491	-	8600	-	-	-	9464	9389	9332	-	8939	-
2buj	-	-	-	-	-	-	-	-	-	9146	-	-	-	5738	3400	2340	7514	-	-	9092	9232	-	-	-	7772	9127	7817	3234	8075	4801
2fg0	-	-	-	-	-	-	-	-	-	-	-	-	7672	7392	-	-	-	-	-	6779	-	7761	-	9194	7221	7491	4514	3170	6188	-
3rd5	-	-	-	-	-	-	-	-	-	-	-	-	-	8752	-	-	-	-	7228	-	-	-	-	-	6765	1470	-	8332	1767	-
3tqe	-	-	-	-	-	-	-	-	-	5570	9181	-	6730	6515	-	-	6834	2036	6432	5447	3012	6832	6878	5304	6340	6236	2500	8307	5414	1816
4bsj	-	6638	-	-	-	-	-	4118	-	4226	4587	1072	315	3780	1304	4600	2067	802	-	8533	-	-	8671	8819	6734	5832	4902	2504	7003	901
4gtf	-	-	-	-	-	-	-	-	-	3936	8020	-	2462	4716	1700	6136	3020	1560	1663	6649	-	5097	8932	-	3437	8129	3020	1934	4660	-
4iqk	-	-	-	-	-	-	-	-	-	-	-	-	5261	-	-	-	-	3302	7731	7334	-	5875	-	-	3326	5672	-	6332	4709	-
4q82	-	-	-	-	-	-	-	-	-	-	-	-	-	6416	-	-	-	-	-	3790	-	-	-	-	-	-	-	-	-	-
4qb6	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
4tpl	-	-	-	-	-	-	-	-	-	-	7833	5600	7872	9737	9769	-	-	3300	-	9418	-	2221	9169	9582	-	-	-	-	7088	-

Table A5. Phase recovery success rates (%) for continuous and classical IPAs across 28 protein structures using conventional, resolution-weighted, and GA-enhanced strategies with two-step envelope reconstruction.

	ASR			RAAR			DM			MDM			MRAAR			HIO			CHIO			HPR			MCHIO			THIO
PDB ID	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA
1a53	0	0	100	0	0	100	0	0	0	0	71	100	6	75	100	0	10	100	0	4	100	3	73	100	0	10	100	0	4	100
1gaj	0	0	0	0	0	0	0	0	0	1	0	100	11	21	100	8	21	100	6	33	100	7	23	100	8	17	100	8	23	100
1gc7	0	0	0	0	0	0	4	4	100	21	10	98	37	33	0	48	31	100	47	35	100	50	35	100	52	42	100	50	48	100
1gk6	0	0	0	0	0	0	0	2	0	6	4	100	13	19	100	7	4	100	13	12	100	21	10	100	12	15	100	8	8	100
1hes	0	0	0	0	0	0	44	52	100	39	42	84	56	50	100	58	50	100	61	50	100	48	52	100	61	52	100	50	42	100
1hp4	0	0	0	0	0	0	0	0	0	9	2	0	12	6	0	19	15	100	17	0	3	4	0	0	12	4	0	14	19	100
1nh6	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	6	2	0	0	0	100	0	0	0	0	0	100	6	0	100
1nw3	0	0	0	0	0	0	0	0	0	0	0	0	29	42	100	54	48	100	48	35	100	48	31	100	44	42	100	46	50	100
1reu	0	0	0	0	0	0	0	0	0	15	12	100	35	21	100	58	48	100	38	60	100	29	37	100	42	40	100	56	50	100
1vh7	0	0	0	0	0	0	0	0	0	25	0	100	8	8	100	15	27	100	21	23	100	19	8	100	19	27	100	15	21	100
1vmg	0	0	0	0	0	0	16	12	0	32	25	100	32	38	100	31	37	0	19	31	100	26	33	100	31	27	100	23	29	0
1xhd	0	0	0	0	0	0	50	44	100	65	75	100	100	100	100	100	100	100	100	100	100	100	100	100	100	100	100	100	100	100
1yh2	0	0	0	0	0	0	0	0	0	27	8	0	25	4	78	25	29	100	25	6	2	10	4	0	17	10	100	29	12	100
2b44	0	0	0	0	0	0	19	25	100	27	23	100	0	2	0	48	38	100	44	21	0	21	15	24	44	21	99	38	46	100
2b71	0	0	0	0	0	0	10	10	0	29	15	100	35	52	100	50	46	100	35	46	100	52	54	100	35	31	100	29	40	100
2bke	0	0	0	0	0	0	0	0	0	2	0	0	0	0	0	2	0	100	0	0	0	0	0	4	2	0	100	0	2	0
2boz	0	0	0	0	0	0	0	27	100	33	42	100	40	31	100	0	31	100	15	52	100	46	58	100	19	48	100	27	54	100
2bt1	2	0	0	2	0	0	0	0	100	2	15	100	0	8	100	19	25	100	8	27	100	17	8	100	17	31	100	27	50	100
2buj	0	0	0	0	0	0	0	0	0	2	2	100	6	6	0	17	2	0	10	2	100	15	2	100	19	4	100	8	4	100
2fg0	0	0	0	0	0	0	0	0	0	8	4	0	7	2	2	0	5	0	0	4	100	3	2	0	0	4	0	1	7	100
3rd5	0	0	0	0	0	0	0	0	0	3	1	0	4	1	0	1	8	0	1	4	100	4	5	0	1	3	0	1	5	100
3tqe	0	0	0	0	0	0	0	0	0	75	33	100	58	60	100	6	40	100	4	52	100	54	50	100	21	58	100	12	38	100
4bsj	0	4	0	0	0	0	0	0	100	10	42	100	15	12	100	8	15	100	15	27	100	12	31	100	2	19	100	12	19	100
4gtf	0	0	0	0	0	0	0	2	0	29	10	100	25	40	100	10	21	0	12	27	0	38	38	0	15	35	100	19	21	0
4iqk	0	0	0	0	0	0	0	0	0	10	2	0	10	0	1	2	10	1	0	2	0	6	0	0	0	0	0	0	2	0
4q82	0	0	0	0	0	0	0	0	0	2	0	0	2	2	27	0	6	0	0	0	0	4	0	0	0	0	0	0	0	0
4qb6	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
4tpl	0	0	0	0	0	0	0	0	0	6	8	100	4	8	100	2	6	100	0	2	74	8	8	100	2	6	100	0	8	100

Table A6. Minimum iterations required for successful convergence of continuous and classical IPAs across 28 protein structures using conventional, resolution-weighted, and GA-enhanced strategies with two-step envelope reconstruction.

	ASR			RAAR			DM			MDM			MRAAR			HIO			CHIO			HPR			MCHIO			THIO
PDB ID	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA
1a53	-	-	1640	-	-	1997	-	-	-	-	511	306	3331	640	478	-	1801	484	-	1179	1014	5771	706	521	-	3488	452	-	5047	846
1gaj	-	-	-	-	-	-	-	-	-	693	-	919	1898	3818	590	2203	1019	1110	1502	1661	664	5812	619	1125	1876	1018	1064	955	1459	670
1gc7	-	-	-	-	-	-	1619	7012	2549	753	1414	3397	951	798	-	445	553	657	741	419	1146	807	1209	1066	360	1129	1297	666	1044	1131
1gk6	-	-	-	-	-	-	-	6651	-	584	1129	561	476	543	367	436	1377	288	339	1253	290	483	2287	737	384	558	540	556	835	349
1hes	-	-	-	-	-	-	135	802	198	151	9265	9201	168	292	347	175	226	227	147	244	491	130	140	346	149	264	271	91	243	265
1hp4	-	-	-	-	-	-	-	-	-	1405	3909	-	4780	3570	-	937	3808	1574	1714	-	9161	8368	-	-	1413	8889	-	1050	2027	4163
1nh6	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	5636	8465	-	-	-	6611	-	-	-	-	-	3660	3602	-	2564
1nw3	-	-	-	-	-	-	-	-	-	-	-	-	1228	1057	848	800	1216	438	555	768	940	524	1251	942	929	1388	545	463	873	640
1reu	-	-	-	-	-	-	-	-	-	668	555	568	1147	1487	897	463	639	528	618	1122	185	1170	1665	396	968	2781	807	788	579	924
1vh7	-	-	-	-	-	-	-	-	-	734	-	1591	3460	1436	3061	1794	957	880	1748	2921	1328	1935	3761	1531	1699	2171	832	880	1200	1296
1vmg	-	-	-	-	-	-	670	725	-	228	396	424	173	272	320	270	325	-	248	321	180	236	296	257	168	295	258	436	325	-
1xhd	-	-	-	-	-	-	70	63	69	64	5186	4756	86	117	77	76	73	98	63	78	58	75	107	108	68	95	89	83	87	94
1yh2	-	-	-	-	-	-	-	-	-	2087	2559	-	1590	8081	7558	1595	1651	2169	1014	4537	9004	3429	6839	-	1717	3889	4456	1118	805	2902
2b44	-	-	-	-	-	-	666	2498	875	338	1218	1168	-	9223	-	443	2990	1543	1092	8457	-	4734	9033	8717	743	7069	4483	464	1358	986
2b71	-	-	-	-	-	-	3150	2190	-	171	176	233	278	325	383	162	231	182	195	206	271	211	201	278	167	105	158	184	196	134
2bke	-	-	-	-	-	-	-	-	-	5408	-	-	-	-	-	7768	-	2365	-	-	-	-	-	8142	3096	-	6189	-	4794	-
2boz	-	-	-	-	-	-	-	1392	915	470	1355	572	792	1607	1137	-	1310	769	579	1172	527	819	1000	686	561	1015	597	934	697	532
2bt1	7511	-	-	7806	-	-	-	-	2995	8920	1868	868	-	1339	859	905	777	767	796	456	479	2791	5522	1296	663	2374	1761	1470	1007	640
2buj	-	-	-	-	-	-	-	-	-	1903	6964	1131	2656	3770	-	2449	3727	-	1907	8779	2883	1672	6220	3960	1484	5204	2904	1833	7342	1447
2fg0	-	-	-	-	-	-	-	-	-	2844	3622	-	3299	8450	8504	-	5444	-	-	4849	3895	5759	6009	-	-	5304	-	4667	2744	1920
3rd5	-	-	-	-	-	-	-	-	-	1579	8055	-	2063	7747	-	5126	1576	-	7802	3427	5097	3020	2422	-	8740	3050	-	3996	2316	2435
3tqe	-	-	-	-	-	-	-	-	-	532	3465	1762	935	1637	1972	4145	1391	1226	4385	1454	5097	2049	1929	1566	3193	1541	1356	4439	1875	1285
4bsj	-	6951	-	-	-	-	-	-	1029	1305	500	592	1359	1890	587	453	1133	541	918	1154	390	1380	348	774	6365	798	824	591	941	340
4gtf	-	-	-	-	-	-	-	5149	-	1127	1055	785	1329	1048	1814	2010	758	-	1962	1168	-	1481	2608	-	2414	1280	1316	1056	841	-
4iqk	-	-	-	-	-	-	-	-	-	1567	8230	-	3110	-	9233	6994	6152	9314	-	9016	-	3644	-	-	-	-	-	-	5534	-
4q82	-	-	-	-	-	-	-	-	-	7035	-	-	7467	7752	8102	-	3553	-	-	-	-	6915	-	-	-	-	-	-	-	-
4qb6	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
4tpl	-	-	-	-	-	-	-	-	-	1403	6653	2673	2825	2929	3187	2474	3906	2046	-	2972	8847	5330	4490	2718	3429	2555	2168	-	4433	2978

Table A7. Median iterations required for successful convergence of continuous and classical IPAs across 28 protein structures using conventional, resolution-weighted, and GA-enhanced strategies with two-step envelope reconstruction.

	ASR			RAAR			DM			MDM			MRAAR			HIO			CHIO			HPR			MCHIO			THIO
PDB ID	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA	Con	Res	GA
1a53	-	-	1902	-	-	2304	-	-	-	-	1705	581	4956	1824	700	-	2337	800	-	2248	1200	6579	3008	801	-	5652	700	-	6318	1200
1gaj	-	-	-	-	-	-	-	-	-	693	-	1110	6182	6220	1000	7427	4386	1389	7468	5814	976	7092	4613	1401	7676	5719	1401	5510	5558	912
1gc7	-	-	-	-	-	-	4838	8239	2900	3541	2594	3800	3251	3246	-	2844	3034	1000	2332	4263	1524	3060	4100	1600	3404	2945	1800	3034	4636	1500
1gk6	-	-	-	-	-	-	-	6651	-	1124	2652	1011	2060	2912	702	4528	3452	608	1962	2337	702	3355	3975	1103	1569	4676	901	3098	2360	600
1hes	-	-	-	-	-	-	508	1645	474	330	9410	9373	1220	1048	600	482	758	500	525	939	1282	756	1063	1010	503	498	600	342	494	500
1hp4	-	-	-	-	-	-	-	-	-	4196	3909	-	5385	3658	-	4057	5309	1800	6048	-	9349	9142	-	-	3520	9100	-	4454	5180	4400
1nh6	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	5842	8465	-	-	-	6900	-	-	-	-	-	4000	6136	-	2800
1nw3	-	-	-	-	-	-	-	-	-	-	-	-	1844	3246	1114	2056	2816	700	2437	2474	1304	2349	2176	1201	1715	4288	902	1634	1633	901
1reu	-	-	-	-	-	-	-	-	-	2238	2634	969	3534	5582	1500	5093	5336	902	6316	5630	602	4138	7223	1340	4888	5645	1235	4229	4682	1400
1vh7	-	-	-	-	-	-	-	-	-	3457	-	1900	4320	5199	3600	4017	5912	1500	4530	5463	1606	4438	7608	1901	3774	3464	1301	2716	3960	1578
1vmg	-	-	-	-	-	-	3922	2889	-	821	1365	638	523	902	538	952	1565	-	752	719	417	680	1405	504	618	1982	507	918	894	-
1xhd	-	-	-	-	-	-	240	205	200	198	5598	5081	539	594	300	381	408	218	414	392	219	480	434	300	266	428	236	474	326	215
1yh2	-	-	-	-	-	-	-	-	-	3526	4809	-	5418	8779	8000	2794	5534	2401	4764	5277	9020	4742	6922	-	3554	8128	4901	5229	4638	3150
2b44	-	-	-	-	-	-	1429	5854	1100	823	1620	1455	-	9223	-	1569	8159	1910	5758	9188	-	8764	9236	9264	2508	8606	4904	1199	4068	1462
2b71	-	-	-	-	-	-	6367	4319	-	1192	530	738	1721	1285	700	727	1427	401	1377	1068	601	1048	902	600	1442	862	500	1026	836	500
2bke	-	-	-	-	-	-	-	-	-	5408	-	-	-	-	-	7768	-	2802	-	-	-	-	-	9036	3096	-	6848	-	4794	-
2boz	-	-	-	-	-	-	-	2686	1146	2124	4732	716	1334	3650	1402	-	3968	905	1339	2087	903	1271	3375	915	1159	2311	801	2764	1896	720
2bt1	7511	-	-	7806	-	-	-	-	3400	8920	4722	1222	-	5580	1200	3283	3486	1101	6274	3596	801	8290	7662	1800	6210	5205	2100	2946	3181	1100
2buj	-	-	-	-	-	-	-	-	-	1903	6964	1440	3086	8763	-	4094	3727	-	3398	8779	3301	3977	6220	4301	3041	6584	3102	3964	8401	1800
2fg0	-	-	-	-	-	-	-	-	-	6096	6188	-	5880	8677	8738	-	6405	-	-	5510	4300	7398	7578	-	-	7010	-	4667	6266	2188
3rd5	-	-	-	-	-	-	-	-	-	2468	8055	-	3766	7747	-	5126	5140	-	7802	4060	5362	6716	3691	-	8740	6559	-	3996	3374	2701
3tqe	-	-	-	-	-	-	-	-	-	5386	6774	2076	4036	6345	2304	6060	4891	1430	4509	3349	5362	5068	5900	1824	6720	4598	1624	6094	4032	1574
4bsj	-	7900	-	-	-	-	-	-	1500	3974	3023	808	5175	5609	1004	3820	1596	806	5258	5895	804	6203	2426	1192	6365	5657	1107	4907	4305	616
4gtf	-	-	-	-	-	-	-	5149	-	2993	3484	1100	3852	4506	2101	3645	3312	-	4493	4797	-	5054	6480	-	3537	3842	1508	4379	2258	-
4iqk	-	-	-	-	-	-	-	-	-	7851	8230	-	4158	-	9233	6994	8353	9314	-	9016	-	7175	-	-	-	-	-	-	5534	-
4q82	-	-	-	-	-	-	-	-	-	7035	-	-	7467	7752	8515	-	4613	-	-	-	-	7956	-	-	-	-	-	-	-	-
4qb6	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
4tpl	-	-	-	-	-	-	-	-	-	6473	6764	3011	4812	7748	3604	2474	3939	2306	-	2972	9081	7546	5946	3002	3429	2874	2556	-	6149	3300

References

Abramson, J.; Adler, J.; Dunger, J.; Evans, R.; Green, T.; Pritzel, A.; Ronneberger, O.; Willmore, L.; Ballard, A.J.; Bambrick, J.; et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 2024, 630, 493–500.
Terwilliger, T.C.; Afonine, P.V.; Liebschner, D.; Croll, T.I.; McCoy, A.J.; Oeffner, R.D.; Williams, C.J.; Poon, B.K.; Richardson, J.S.; Read, R.J.; et al. Accelerating crystal structure determination with iterative AlphaFold prediction. Acta Cryst. D 2023, 79, 234–244.
Li, Z.; Fan, H.; Ding, W. Solving protein structures by combining structure prediction, molecular replacement and direct-methods-aided model completion. IUCrJ 2024, 11, 152–167.
Sayre, D. The squaring method: A new method for phase determination. Acta Cryst. 1952, 5, 60–65.
Cochran, W.T. Relations between the phases of structure factors. Acta Cryst. 1955, 8, 473–478.
Karle, J.; Hauptman, H. A theory of phase determination for the four types of non-centrosymmetric space groups 1P222, 2P22, 3P₁2, 3P₂2. Acta Cryst. 1956, 9, 635–651.
Schenk, H. An Introduction to Direct Methods: The Most Important Phase Relationships and Their Application in Solving the Phase Problem; University College Cardiff Press: Cardiff, UK, 1984.
Miller, R.; DeTitta, G.T.; Jones, R.; Langs, D.A.; Weeks, C.M.; Hauptman, H.A. On the application of the minimal principle to solve unknown structures. Science 1993, 259, 1430–1433.
Sheldrick, G.M. A short history of SHELX. Acta Cryst. A 2008, 64, 112–122.
Giacovazzo, C.; Siliqi, D.; Gonzalez Platas, J.; Hecht, H.J.; Zanotti, G.; York, B. The Ab Initio Crystal Structure Solution of Proteins by Direct Methods. VI. Complete Phasing up to Derivative Resolution. Acta Cryst. A 1996, 52, 813–825.
Su, W.-P. Retrieving low- and medium-resolution structural features of macromolecules directly from the diffraction intensities—A real-space approach to the X-ray phase problem. Acta Cryst. A 2008, 64, 625–630.
Fienup, J.R. Phase retrieval algorithms: A comparison. Appl. Opt. 1982, 21, 2758–2769.
Elser, V. Phase retrieval by iterated projections. J. Opt. Soc. Am. A 2003, 20, 40–55.
Elser, V. Solution of the crystallographic phase problem by iterated projections. Acta Cryst. A 2003, 59, 201–209.
Millane, R.P.; Stroud, W.J. Reconstructing symmetric images from their undersampled Fourier intensities. J. Opt. Soc. Am. A 1997, 14, 568–579.
Millane, R.P. Phase retrieval in crystallography and optics. J. Opt. Soc. Am. A 1990, 7, 394–411.
Plas, J.L.; Millane, R.P. Ab Initio Phasing in Protein Crystallography. In Image Reconstruction from Incomplete Data; SPIE: Bellingham, WA, USA, 2000; Volume 4123.
Miao, J.; Sayre, D.; Chapman, H.N. Phase retrieval from the magnitude of the Fourier transforms of non-periodic objects. J. Opt. Soc. Am. A 1998, 15, 1662–1669.
Marchesini, S.; He, H.; Chapman, H.N.; Hau-Riege, S.P.; Noy, A.; Howells, M.R.; Weierstall, U.; Spence, J.C.H. X-ray image reconstruction from a diffraction pattern alone. Phys. Rev. B 2003, 68, 140101.
Lunin, V.Y.; Lunina, N.L.; Petrova, T.E.; Urzhumtsev, A.G.; Podjarny, A.D. On the ab initio solution of the phase problem for macromolecules at very low resolution. II. Generalized likelihood based approach to cluster discrimination. Acta Cryst. D 1998, 54, 726–734.
Liu, Z.C.; Xu, R.; Dong, Y.H. Phase retrieval in protein crystallography. Acta Cryst. A 2012, 68, 256–265.
He, H.; Su, W.-P. Direct phasing of protein crystals with high solvent content. Acta Cryst. A 2015, 71, 92–98.
Millane, R.P.; Lo, V.L. Iterative projection algorithms in protein crystallography. I. Theory. Acta Cryst. A 2013, 69, 517–527.
Lo, V.L.; Kingston, R.L.; Millane, R.P. Iterative projection algorithms in protein crystallography. II. Application. Acta Cryst. A 2015, 71, 451–459.
Lo, V.L.; Kingston, R.L.; Millane, R.P. Iterative projection algorithms for ab initio phasing in virus crystallography. J. Struct. Biol. 2016, 196, 407–413.
Kingston, R.L.; Millane, R.P. A general method for directly phasing diffraction data from high-solvent-content protein crystals. IUCrJ 2022, 9, 648–665.
Barnett, M.J.; Millane, R.P.; Kingston, R.L. Analysis of crystallographic phase retrieval using iterative projection algorithms. Acta Cryst. D 2024, 80, 800–818.
Pan, T.; Dramko, E.; Miller, M.D.; Kyrillidis, A.; Phillips, G.N., Jr. Completion of partial structures using Patterson maps with the CrysFormer machine-learning model. Acta Cryst. D 2025, 81, 668–677.
He, H.; Jiang, M.C.; Su, W.-P. Direct phasing of protein crystals with non-crystallographic symmetry. Crystals 2019, 9, 55.
Fu, R.; Su, W.-P.; He, H. Refining protein envelopes with a transition region for enhanced direct phasing in protein crystallography. Crystals 2024, 14, 85.
He, H.; Su, W.-P. Improving the convergence rate of a hybrid input-output phasing algorithm by varying the reflection data weight. Acta Cryst. A 2018, 74, 36–43.
Fu, R.; Su, W.-P.; He, H. Genetic algorithm-enhanced direct method in protein crystallography. Molecules 2025, 30, 288.
Zhang, K.Y.J.; Main, P. Histogram matching as a new density modification technique for phase refinement and extension of protein molecules. Acta Cryst. A 1990, 46, 41–46.
Fienup, J.R. Phase retrieval with continuous version of hybrid input-output. In Frontiers in Optics, OSA Technical Digest (CD); Optica Publishing Group: Washington, DC, USA, 2003.
Bauschke, H.H.; Combettes, P.L.; Luke, D.R. Hybrid projection-reflection method for phase retrieval. J. Opt. Soc. Am. A 2003, 20, 1025–1034.
Matthews, B.W. Solvent Content of Protein Crystals. J. Mol. Biol. 1968, 33, 491–497.
Wang, B.C. Resolution of phase ambiguity in macromolecular crystallography. Methods Enzymol. 1985, 115, 90–112.
Chojnowski, G.; Pereira, J.; Lamzin, V.S. Sequence assignment for low-resolution modeling of protein crystal structures. Acta Cryst. D 2019, 75, 753–763.
Kovalevskiy, O.; Nicholls, R.A.; Long, F.; Murshudov, G.N. Overview of refinement procedures within REFMAC5: Utilizing data from different sources. Acta Cryst. D 2018, 74, 492–505.
Cowtan, K. Fitting molecular fragments into electron density. Acta Cryst. D 2008, 64, 83–89.
Winn, M.D.; Ballard, C.C.; Cowtan, K.D.; Dodson, E.J.; Emsley, P.; Evans, P.R.; Keegan, R.M.; Krissinel, E.B.; Leslie, A.G.; McCoy, A.; et al. Overview of the CCP4 suite and current developments. Acta Cryst. D 2011, 67, 235–242.
Adams, P.D.; Afonine, P.V.; Bunkóczi, G.; Chen, V.B.; Davis, I.W.; Echoo ls, N.; Headd, J.J.; Hung, L.-W.; Kapral, G.J.; Grosse-Kunstleve, R.W.; et al. PHENIX: A comprehensive Python-based system for macromolecular structure solution. Acta Cryst. D 2010, 66, 213–221. [Google Scholar] [CrossRef]
Terwilliger, T.C.; Grosse-Kunstleve, R.W.; Afonine, P.V.; Moriarty, N.W.; Zwart, P.H.; Hung, L.-W.; Read, R.J.; Adams, P.D. Iterative model building, structure refinement and density modification with the PHENIX AutoBuild wizard. Acta Cryst. D 2008, 64, 61–69. [Google Scholar] [CrossRef]
Bauschke, H.H.; Combettes, P.L.; Luke, D.R. Phase retrieval, error reduction algorithm, and Fienup variants: A view from convex optimization. J. Opt. Soc. Am. A 2002, 19, 1334–1345. [Google Scholar] [CrossRef]
Luke, D.R. Relaxed averaged alternating reflections for diffraction imaging. Inverse Probl. 2005, 21, 37–50. [Google Scholar] [CrossRef]
He, H.; Liu, Y.; Su, W.-P. Direct phasing of protein crystals with hybrid difference map algorithms. Molecules, 2026; in press. [Google Scholar] [CrossRef]
Baugh, L.; Phan, I.; Begley, D.W.; Clifton, M.C.; Armour, B.; Dranow, D.M.; Taylor, B.M.; Muruthi, M.M.; Abendroth, J.; Fairman, J.W.; et al. Increasing the structural coverage of tuberculosis drug targets. Tuberculosis 2015, 95, 142–148. [Google Scholar] [CrossRef] [PubMed]
Xu, Q.; Sudek, S.; McMullan, D.; Miller, M.D.; Geierstanger, B.; Jones, D.H.; Krishna, S.S.; Spraggon, G.; Bursalay, B.; Abdubek, P.; et al. Structural basis of murein peptide specificity of a gamma-D-glutamyl-L-diamino acid endopeptidase. Structure 2009, 17, 303–313. [Google Scholar] [CrossRef]
Afonine, P.V.; Grosse-Kunstleve, R.W.; Echols, N.; Headd, J.J.; Moriarty, N.W.; Mustyakimov, M.; Terwilliger, T.C.; Urzhumtsev, A.; Zwart, P.H.; Adams, P.D. Towards automated crystallographic structure refinement with phenix.refine. Acta Cryst. D 2012, 68, 352–367. [Google Scholar] [CrossRef] [PubMed]
Cowtan, K. The Clipper C++ libraries for X-ray crystallography. IUCr Comput. Comm. Newsl. 2003, 2, 4–9. Available online: http://www.iucr.org/resources/commissions/computing/newsletters/2 (accessed on 1 December 2025).
Wilson, A.J.C. The probability distribution of X-ray intensities. Acta Cryst. 1949, 2, 318–321. [Google Scholar] [CrossRef]
Fokine, A.; Urzhumtsev, A. Flat bulk-solvent model: Obtaining optimal parameters. Acta Cryst. D 2002, 58, 1387–1392. [Google Scholar] [CrossRef]
Osipiuk, J.; Zhou, M.; Moy, S.; Collart, F.; Joachimiak, A. X-Ray Crystal Structure of Putative Acetyltransferase, Product of BC4754 Gene from Bacillus cereus; RCSB PDB: Piscataway, NJ, USA, 2026. [Google Scholar]
Sainz-Polo, M.A.; Valenzuela, S.V.; Gonzalez, B.; Pastor, F.I.; Sanz-Aparicio, J. Structural analysis of glucuronoxylan-specific xyn30D and its attached CBM35 domain gives insights into the role of modularity in specificity. J. Biol. Chem. 2014, 289, 31088–31101. [Google Scholar] [CrossRef]
Kim, Y.; Hatzos-Skintges, C.; Endres, M.; Joachimiak, A. Crystal Structure of Phospholipase/Carboxylesterase from Haliangium ochraceum. To be published. 2026. Available online: https://www.rcsb.org/structure/4Q82 (accessed on 1 December 2025).
Ariza, A.; Richard, D.L.; White, M.F.; Bond, C.S. Conformational flexibility revealed by the crystal structure of a crenarchaeal RadA. Nucleic Acids Res. 2005, 33, 1465. [Google Scholar] [CrossRef]
Akey, D.L.; Brown, W.C.; Konwerski, J.R.; Ogata, C.M.; Smith, J.L. Use of massively multiple merged data for low-resolution S-SAD phasing and refinement of flavivirus NS1. Acta Cryst. D 2014, 70, 2719–2729. [Google Scholar] [CrossRef]
Mark, B.L.; Vocadlo, D.J.; Knapp, S.; Triggs-Raine, B.L.; Withers, S.G.; James, M.N. Crystallographic evidence for substrate-assisted catalysis in a bacterial beta-hexosaminidase. J. Biol. Chem. 2001, 276, 10330–10337. [Google Scholar] [CrossRef] [PubMed]
Aronson, N.N., Jr.; Halloran, B.A.; Alexyev, M.F.; Amable, L.; Madura, J.D.; Pasupulati, L.; Worth, C.; Van Roey, P. Family 18 chitinase-oligosaccharide substrate interaction: Subsite preference and anomer selectivity of Serratia marcescens chitinase A. Biochem. J. 2003, 376, 87–95. [Google Scholar] [CrossRef] [PubMed]
Potter, J.A.; Fyfe, P.K.; Frolov, D.; Wakeham, M.C.; Van Grondelle, R.; Robert, B.; Jones, M.R. Strong effects of an individual water molecule on the rate of light-driven charge separation in the rhodobacter sphaeroides reaction center. J. Biol. Chem. 2005, 280, 27155. [Google Scholar] [CrossRef] [PubMed]
Koehn, E.M.; Perissinotti, L.L.; Moghram, S.; Prabhakar, A.; Lesley, S.A.; Mathews, I.I.; Kohen, A. Folate binding site of flavin-dependent thymidylate synthase. Proc. Natl. Acad. Sci. USA 2012, 109, 15722–15727. [Google Scholar] [CrossRef]
Schrödinger LLC. The PyMOL Molecular Graphics System; Version 3.0; Schrödinger LLC: New York, NY, USA, 2025. [Google Scholar]
Millane, R.P.; Arnal, R.D. Uniqueness of the macromolecular crystallographic phase problem. Acta Cryst. A 2015, 71, 592–598. [Google Scholar] [CrossRef]
Elser, V.; Millane, R.P. Reconstruction of an object from its symmetry—Averaged diffraction pattern. Acta Cryst. A 2008, 64, 273–279. [Google Scholar] [CrossRef]
Marcotte, D.; Zeng, W.; Hus, J.C.; McKenzie, A.; Hession, C.; Jin, P.; Bergeron, C.; Lugovskoy, A.; Enyedy, I.; Cuervo, H.; et al. Small molecules inhibit the interaction of Nrf2 and the Keap1 Kelch domain through a non-covalent mechanism. Bioorg. Med. Chem. 2013, 21, 4011–4019. [Google Scholar] [CrossRef]
Direct Phasing of Protein Crystals with Continuous IPAs and Refined Envelope. Available online: https://github.com/hhe2/Direct-Phasing-of-Protein-Crystals-with-Continuous-IPAs-and-Refined-Envelope (accessed on 15 December 2025).

Figure 1. Schematic illustration of the iterative projection algorithm for direct phasing. Starting from random initial phases or density, the algorithm alternates between real-space operations and reciprocal-space constraints. Convergence is typically achieved after several thousand iterations, yielding interpretable electron density maps or phases.

Figure 2. Schematic illustration of density modification strategies across different algorithms. Here, ρ_{k}^{'} = P_{B} ρ_{k}. The horizontal axis spans from the bulk solvent region (left) through the boundary to the protein region (right), while the vertical axis indicates the protein-solvent boundary defined by the coarse envelope. HIO exhibits discontinuous modification at the boundary, while continuous IPAs (CHIO, HPR, MCHIO, THIO) achieve smooth transitions through a refined transition region. Classical algorithms (DM, ASR, RAAR) show continuous modification but lack protein-region constraints, resulting in lower success rates.

Figure 3. Performance of HPR for ab initio phasing of 3rd5 using a conventional full-resolution strategy with two-step refined envelope reconstruction. Evolution of five quality metrics over 10,000 iterations from 100 independent trials with different random seeds: (a) mean phase error Δϕ, (b) protein envelope IoU, (c) R_{work}, (d) R_{free}, (e) solvent density deviation Δρ. Each curve represents one trial. Four trials successfully converged, showing sudden improvement in all metrics. In iterations 8000–9500, the transition region was linearly reduced from 5% (of the asymmetric unit) to zero, followed by solvent flattening in iterations 9500–10,000. (f) Final R-factor distribution clearly distinguishes the four successful solutions (low R-value) from 96 failed attempts (clustered at high R-values).
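
The end-of-run schedule described in the Figure 3 caption can be sketched as a small function. This is only an illustration under stated assumptions, not the authors' implementation: the function name `transition_schedule`, its parameter names, the stage labels, and the treatment of the boundary iterations (8000 starts the reduction, 9500 starts solvent flattening) are hypothetical; only the numbers (5%, 8000, 9500) come from the caption. It also assumes the transition region is held at 5% before iteration 8000.

```python
def transition_schedule(iteration, start=8000, end=9500, initial_fraction=0.05):
    """Return (transition_fraction, stage) for a given iteration index.

    Assumed behaviour (hypothetical boundary conventions):
      iteration < start        -> fraction held at initial_fraction
      start <= iteration < end -> fraction reduced linearly to zero
      iteration >= end         -> no transition region, solvent flattening
    The fraction is expressed relative to the asymmetric unit volume.
    """
    if iteration < start:
        return initial_fraction, "phasing"
    if iteration < end:
        # Linear ramp from initial_fraction (at start) down to zero (at end).
        progress = (iteration - start) / (end - start)
        return initial_fraction * (1.0 - progress), "transition_reduction"
    return 0.0, "solvent_flattening"


if __name__ == "__main__":
    for it in (0, 8000, 8750, 9499, 9500, 9999):
        fraction, stage = transition_schedule(it)
        print(f"iteration {it:5d}: transition = {fraction:.4f}, stage = {stage}")
```

With these defaults, the midpoint of the ramp (iteration 8750) gives half of the initial transition fraction. The same function would cover the GA-enhanced run of Figure 7 by passing `start=5500, end=7000`.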

Figure 4. Improved two-step envelope reconstruction versus the conventional one-step approach. The refined method generates transition regions with adaptive thickness based on local density, capturing fine structural details while minimizing protein inclusion, unlike the uniform-shell approach. (a,b) Envelope comparison for 3rd5 and 2fg0 demonstrating superior detail recovery with the two-step method. Blue dashed circles in (b) indicate surface cavities and internal channels. (c,d) Evolution of the 3rd5 envelope during CHIO phasing: cross-section and full view showing progression from incomplete (pre-convergence) to complete protein encapsulation (post-convergence).

Figure 5. Benchmark results on 28 proteins using the conventional full-resolution strategy. (a) Success rates of the HIO and HPR algorithms under the conventional full-resolution phasing strategy with the one-step coarse and two-step refined envelope reconstruction schemes for all 28 test structures. (b–d) Success rate, (e–g) minimum convergence iterations, and (h–j) median convergence iterations, organized by algorithm (b,e,h), algorithm type (c,f,i), and envelope scheme (d,g,j). The two-step refined envelope scheme enhances success rates by 60.5% for classical IPAs (excluding ASR and RAAR) and 45.7% for continuous IPAs. Individual algorithm analysis reveals that continuous algorithms (CHIO, HPR, MCHIO, THIO) and improved classical algorithms (MDM, MRAAR) achieve performance levels comparable to HIO, while exceeding ASR, RAAR, and DM. Optimal performance is achieved by combining top-performing algorithms from both categories with refined envelope reconstruction.

Figure 6. Ab initio structure determination for seven proteins. Each column represents one structure. Row 1: PDB reported structures (gray cartoon, green ligands and ions). Row 2: automatically built models from ab initio phasing results (red cartoon). Row 3: superposition of rows 1 and 2, demonstrating excellent agreement (backbone RMSD < 0.5 Å). Row 4: enlarged views of the regions marked as green sticks in row 1, showing electron density maps calculated from ab initio phasing results (orange mesh) superimposed on the PDB reported structures, highlighting density quality for secondary structures (3rd5, 1hp4), bound ligands (2fg0, 1nh6, 2boz, 4gtf), and ions (2bke, 4gtf). Structure 2boz is a transmembrane protein with membrane-embedded ligands. Visualization using PyMOL 3.1 [62].

Figure 7. GA-enhanced phasing demonstration (CHIO on 3rd5 with refined envelope scheme, 100 trials). (a–e) Quality metrics versus iterations; each line represents one trial. Genetic crossover every 100 iterations enables information sharing. First convergence at around 5100 iterations, population-wide success in about 400 additional iterations. Early stopping at iteration 5500, followed by transition region reduction (iterations 5500–7000) and solvent flattening (iterations 7000–7500). (f) Phase error reduction via multi-solution averaging (100 GA solutions), showing consistent improvement with diminishing returns beyond 20 averaged maps.

Figure 8. Comprehensive performance evaluation of coarse versus refined envelope designs across three phasing strategies (conventional, resolution-weighted, and GA-enhanced). Row 1 (a–c): Strategy comparison for success rate, minimum iterations, and median iterations (averaged over 8 algorithms, excluding ASR and RAAR). Row 2 (d–f): Algorithm-specific trends across strategies. Row 3 (g–i): Envelope design comparison across strategies. Row 4 (j–l): Success rate versus median iterations scatter plots showing algorithm-wise improvements from coarse (blue) to refined (orange) envelopes. Refined envelope consistently outperforms coarse envelope, with GA-enhanced strategy achieving optimal performance.

Figure 9. Average success rate across 28 protein structures ordered by solvent content for three phasing strategies: (a) conventional strategy, (b) resolution-weighted strategy, (c) GA-enhanced strategy. Paired bars compare coarse envelope (blue) versus refined envelope (orange) designs, averaged over 8 algorithms (excluding ASR and RAAR) with 100 trials per algorithm. Across all strategies, success rate generally increases with solvent content, though with notable variation among structures. The refined envelope design consistently outperforms the coarse envelope design regardless of solvent content.

Figure 10. Correlation analysis between success rate and solvent content for coarse and refined envelope designs across three phasing strategies. Rows show: (a–c) coarse envelope design, (d–f) refined envelope design, (g–i) overlaid comparison. Columns represent (a,d,g) conventional strategy, (b,e,h) resolution-weighted strategy, (c,f,i) GA-enhanced strategy. Linear regression analysis reveals a positive correlation between success rate and solvent content across all strategies and envelope designs (Pearson r and p-values shown in each panel). The refined envelope design consistently yields higher success rates at equivalent solvent contents, extending the lower limits of direct phasing applicability.

Figure 11. Benefits of multi-solution averaging in GA phasing. (a) Phase errors before/after averaging for 27 structures (excluding 4qb6) showing consistent improvement. (b) Absolute phase error reduction per structure (average: 6.83°). (c) Phase improvement versus solvent content reveals no correlation, demonstrating uniform benefits across all solvent content ranges. (d,e) Electron density for two ligands in 2boz before (d) and after (e) averaging, demonstrating enhanced density continuity. PDB reported structure displayed as gray sticks.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Liu, Y.; Fu, R.; Su, W.-P.; He, H. Direct Phasing of Protein Crystals with Continuous Iterative Projection Algorithms and Refined Envelope Reconstruction. Biomolecules 2026, 16, 227. https://doi.org/10.3390/biom16020227

