Processing of Multicrystal Diffraction Patterns in Macromolecular Crystallography Using Serial Crystallography Programs

Cryocrystallography is a widely used method for determining the crystal structure of macromolecules. This technique uses a cryoenvironment, which significantly reduces radiation damage to the crystals and has the advantage of requiring only one crystal for structural determination. In standard cryocrystallography, a single crystal is used for data collection, so the recorded data consist of single-crystal diffraction patterns. However, the recorded X-ray data often contain diffraction patterns from several crystals. Indexing multicrystal diffraction patterns in cryocrystallography requires more elaborate data processing and is therefore time-consuming. Here, an approach for processing multicrystal diffraction data using a serial crystallography program is introduced that allows multiple crystal lattices to be indexed and integrated from a single image. Multicrystal diffraction data were collected from lysozyme crystals and processed using the serial crystallography program CrystFEL. From 360 images containing multicrystal diffraction patterns, 1138 and 691 crystal lattices were obtained using the XGANDALF and MOSFLM indexing algorithms, respectively. Using this multi-lattice indexing information, the crystal structure of lysozyme was determined successfully at a resolution of 1.9 Å. These results indicate that the proposed approach, which is based on serial crystallography, is suitable for processing multicrystal diffraction data in cryocrystallography.


Introduction
Cryocrystallography is a widely used X-ray crystallographic technique for determining the structure of macromolecules, and it has the advantage of significantly reducing the radiation damage that crystals experience during data collection [1,2]. Typically, a single crystal is mounted under cryogenic conditions (e.g., at 100 K), which yields higher-quality diffraction data than those collected from a single crystal at room temperature [3].
During typical cryocrystallographic data collection, the preferred approach is to mount a single crystal in the path of the X-ray beam and record its single-crystal diffraction patterns to determine the structure [4]. In the recorded X-ray images, the Bragg peaks are detected and used to determine the crystal system, the unit-cell dimensions, and the orientation of the crystal in the X-ray beam [5]. Based on the unit-cell and orientation information, the diffraction pattern is indexed, that is, each Bragg peak is assigned a Miller index consisting of three integers, h, k, and l [6]. Based on the indexing results, the Bragg peaks in the diffraction patterns are integrated and scaled, yielding the structure factors used to determine the crystal structure [7]. Accurate indexing of the Bragg peaks is therefore critical for the success of the initial data processing step. To prevent misindexing, single-crystal diffraction patterns are preferred during typical X-ray data collection.
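To make the indexing step concrete, the following minimal sketch shows how Miller indices can be assigned once an orientation matrix is known. It is not the algorithm of any particular program: it assumes a matrix UB whose columns are the reciprocal basis vectors a*, b*, c* in the laboratory frame, so that a peak's scattering vector q satisfies q = UB·(h, k, l). Solving for (h, k, l) and rounding gives the indices, and peaks far from integer values are left unindexed. The function names, the `max_dev` threshold, and the approximate tetragonal lysozyme cell used in the example are illustrative assumptions.

```python
import math


def inverse_3x3(m):
    """Invert a 3x3 matrix given as a list of rows, using the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-15:
        raise ValueError("orientation matrix is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def assign_miller_indices(ub, q_vectors, max_dev=0.2):
    """Return (h, k, l) for each scattering vector, or None if it lies off the lattice.

    ub: 3x3 matrix (list of rows) whose columns are a*, b*, c*, so q = UB @ (h, k, l).
    max_dev: largest allowed distance of any fractional index from an integer.
    """
    inv = inverse_3x3(ub)
    indices = []
    for q in q_vectors:
        frac = [sum(inv[r][c] * q[c] for c in range(3)) for r in range(3)]
        hkl = tuple(round(x) for x in frac)
        worst = max(abs(x - n) for x, n in zip(frac, hkl))
        indices.append(hkl if worst <= max_dev else None)
    return indices


def rotated_orientation_matrix(a, b, c, angle_deg):
    """Toy UB for an orthogonal cell (a, b, c in Å) rotated about the z axis."""
    t = math.radians(angle_deg)
    rot = [[math.cos(t), -math.sin(t), 0.0], [math.sin(t), math.cos(t), 0.0], [0.0, 0.0, 1.0]]
    b_mat = [[1 / a, 0.0, 0.0], [0.0, 1 / b, 0.0], [0.0, 0.0, 1 / c]]
    return [[sum(rot[i][k] * b_mat[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


# Approximate tetragonal lysozyme cell (illustrative values) rotated by 30 degrees.
ub = rotated_orientation_matrix(79.1, 79.1, 37.9, 30.0)
peaks = [[sum(ub[i][j] * hkl[j] for j in range(3)) for i in range(3)]
         for hkl in [(1, 2, 3), (-2, 0, 5), (0.5, 0, 0)]]
example = assign_miller_indices(ub, peaks)  # the last peak lies between lattice points
```

Real indexing programs must first find UB from the peak positions alone. That search is exactly what becomes ambiguous when peaks from several crystals are superimposed.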
In practice, however, crystals often grow in multiple layers or as clusters [8][9][10]. When such crystals are exposed to X-rays, they produce multicrystal diffraction patterns. In addition, several crystals may be captured together in the crystal-mounting tool (e.g., CryoLoop or MicroLoop) during mounting. In such instances, the recorded diffraction images may contain diffraction patterns from multiple crystals. A single-crystal lattice must then be distinguished within these multicrystal diffraction patterns and indexed to obtain the structure factors.
In such cases, it is generally possible to index a crystal lattice by selecting the highest-intensity Bragg peaks detected in the multicrystal diffraction pattern [7]. However, only one diffraction pattern per image is then indexed and integrated, and the unselected Bragg peaks are not used. Furthermore, if the crystal lattice is not clearly distinguishable within the multicrystal diffraction patterns, the indexing program may return incorrect crystal lattice information [10]. When indexing of multicrystal diffraction patterns fails, the collected data cannot be used for structural determination. Therefore, when multicrystal diffraction patterns are collected during cryocrystallography, it is important to use a program that can index the crystal lattices efficiently.
CrystFEL is one of the most commonly employed serial crystallography (SX) data processing programs [33,34] and can be used with various indexing algorithms, such as MOSFLM [39], XDS [40], DirAx [41], TakeTwo [42], FELIX [43], and XGANDALF [44]. CrystFEL can index and integrate multiple crystal lattices from images containing multicrystal diffraction patterns [33,34]. It is therefore expected that using an SX program to process multicrystal diffraction images obtained by cryocrystallography would simplify data processing and yield more indexed diffraction patterns than conventional programs. However, to the best of my knowledge, this has not been demonstrated experimentally.
Here, a new approach is presented for processing multicrystal diffraction data obtained by cryocrystallography using an SX program. Several lysozyme crystals were exposed to an X-ray beam to produce multicrystal diffraction patterns. These diffraction patterns were then successfully processed using CrystFEL with the XGANDALF and MOSFLM indexing algorithms, and the crystal structure of lysozyme was determined at a resolution of 1.9 Å. The results confirm that SX programs can be employed to process multicrystal diffraction patterns obtained by conventional cryocrystallography.

Sample Preparation
Lysozyme powder from hen egg white was purchased from Sigma-Aldrich (L6876; St. Louis, MO, USA) and crystallized by the batch method as previously reported [31]. Briefly, the lysozyme powder was dissolved in a buffer containing 10 mM Tris-HCl (pH 8.0) and 200 mM NaCl. The lysozyme solution (25 mg/mL) and a crystallization solution containing 0.1 M sodium acetate (pH 4.5), 2 M NaCl, and 8% (w/v) PEG 8000 were transferred to a 1.5 mL microcentrifuge tube and immediately vortexed at 3000 rpm for 30 s. The mixture was then incubated overnight at 20 °C. The resulting crystals were approximately 40-100 µm in size.

Data Collection
The diffraction data were collected at beamline 11C of Pohang Light Source II (Republic of Korea) [45]. The lysozyme crystal suspension was transferred to a siliconized cover glass using a pipette. The lysozyme crystals were harvested using a CryoLoop and soaked for 10 s in a cryoprotectant solution consisting of 0.1 M sodium acetate (pH 4.5), 2 M NaCl, 8% (w/v) PEG 8000, and 20% (v/v) ethylene glycol. The CryoLoop carrying the crystals was mounted on the beamline goniometer in a nitrogen stream at 100 K. Multiple crystals were present in the CryoLoop, and diffraction data were collected over a 360° rotation in 1° steps, with an X-ray exposure time of 1 s per image. During data collection, 3-6 crystals were exposed to the X-ray beam, depending on the rotation angle of the CryoLoop. The recorded multicrystal diffraction data were processed using CrystFEL [33]. To record single-crystal diffraction data, a single lysozyme crystal was immersed in the cryoprotectant solution and mounted on the goniometer, and data were collected using the same parameters as for the multicrystal diffraction data. The single-crystal diffraction data were processed using HKL2000 [7].

Structural Determination
Electron density maps were obtained by molecular replacement using Molrep [46], with the room-temperature lysozyme structure (PDB code 7DTB) [29] as the search model. The model was built in Coot [47], refined using the phenix.refine program of the Phenix package [48], and validated with MolProbity [49]. Structural figures were generated using PyMOL (https://pymol.org (accessed on 12 January 2022)).

Results
When multicrystal diffraction images arise from crystals of different sizes, or when only part of an image contains overlapping patterns, the crystal lattice can often be indexed successfully by selecting only the diffraction pattern with high-intensity Bragg peaks. However, it is considerably more difficult to index multiple diffraction patterns of similar intensity generated by crystals of similar size. To test this more challenging case, multicrystal diffraction patterns were deliberately obtained in this study by exposing crystals of similar size to X-rays, so that the diffraction patterns of the individual crystals could not be distinguished easily.
Similar-sized lysozyme crystals were obtained using the batch method, which is widely used in SX [29,31]. After being soaked in the cryoprotectant solution, multiple crystals were mounted on the goniometer in a nitrogen stream at 100 K (Figure 1). The CryoLoop containing the lysozyme crystals was aligned so that multiple crystals were exposed to the X-ray beam as the CryoLoop was rotated through 360° during data collection (Figure 1).
Because the crystals were distributed throughout the CryoLoop, the number and positions of the crystals exposed to the incident X-ray beam varied with the rotation angle as the CryoLoop was rotated through 360°. Accordingly, when the X-ray beam passed through the two nylon points of the CryoLoop, the images contained the largest number of overlapping diffraction patterns. By contrast, when the X-rays travelled perpendicular to the plane of the CryoLoop, the images contained fewer diffraction patterns, some of which could be partially distinguished. As a result, all collected images contained diffraction patterns from several crystals (Figure 2), and in almost all images it was difficult to discern by eye the regularly spaced Bragg peaks typical of a lysozyme diffraction pattern.
The multicrystal diffraction patterns were initially processed using HKL2000, which is widely employed in cryocrystallography, but this program misindexed the patterns collected in this experiment, returning an incorrect space group and incorrect unit-cell dimensions for the lysozyme crystal. The multicrystal diffraction patterns were therefore indexed using CrystFEL [33]. Because different indexing algorithms process the data differently, the indexing rate and data statistics are expected to differ depending on the algorithm used. In this study, both single and multiple lattices were indexed from the multicrystal diffraction images using CrystFEL with the XGANDALF, MOSFLM, XDS, and DirAx indexing algorithms. For all algorithms, the key indexing parameters, such as the signal-to-noise ratio (SNR) threshold, tolerance, and integration radius, were kept constant at their default values. The lysozyme unit cell was provided as input, and the acceptable indexing tolerances for the dimensions and angles of the crystal lattice were set at 5% and 1.5%, respectively; indexed crystal lattices exceeding these tolerances were excluded from the data.
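The processing settings described above could be expressed as a CrystFEL `indexamajig` run. The sketch below only assembles the argument list and does not execute it. The options used (`-i`, `-g`, `-p`, `-o`, `-j`, `--indexing`, `--tolerance`, `--multi`) are assumed to match the indexamajig manual page of recent CrystFEL releases and should be checked against the installed version. The file names and thread count are hypothetical placeholders, because the exact command used in this study is not reported. Peak-search SNR and integration radius are deliberately left unset so that CrystFEL's defaults apply, as in the text.

```python
import shlex


def build_indexamajig_command(image_list, geometry_file, cell_file, stream_file,
                              methods=("xgandalf",), multi_lattice=True,
                              tolerance=(5, 5, 5, 1.5), threads=8):
    """Return an indexamajig argument list (hypothetical helper; nothing is run)."""
    cmd = [
        "indexamajig",
        "-i", image_list,       # text file listing the diffraction images
        "-g", geometry_file,    # detector geometry description
        "-p", cell_file,        # reference lysozyme unit cell
        "-o", stream_file,      # output stream with indexed lattices
        "-j", str(threads),     # number of parallel worker processes
        "--indexing=" + ",".join(methods),
        # Cell-length tolerances followed by the angular tolerance, as given in the text.
        "--tolerance=" + ",".join(str(t) for t in tolerance),
    ]
    if multi_lattice:
        cmd.append("--multi")   # "subtract and retry" multi-lattice indexing
    return cmd


# Hypothetical file names; the study does not report them.
xgandalf_multi = build_indexamajig_command(
    "images.lst", "detector.geom", "lysozyme.cell", "xgandalf.stream")
print(shlex.join(xgandalf_multi))
```

On a system with CrystFEL installed, the returned list could be passed to `subprocess.run`. Generating one command per algorithm keeps every parameter except `methods` fixed, which mirrors the comparison in Table 1.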
In this study, single-lattice indexing refers to obtaining only one crystal lattice from each image, and multi-lattice indexing refers to obtaining multiple crystal lattices from each image using the "subtract and retry" method.
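To illustrate the definition above, the following toy sketch implements the subtract-and-retry loop on one-dimensional "peak positions". The indexers are hypothetical stand-ins that search a list of candidate spacings, not XGANDALF or MOSFLM. The assumption that, at each retry, the listed methods are tried in order until one succeeds is made for illustration only. After each success, the peaks explained by the new lattice are removed and indexing is attempted again. Setting `max_lattices=1` reproduces single-lattice indexing.

```python
def subtract_and_retry(peaks, indexers, max_lattices=10):
    """Toy multi-lattice indexing loop.

    indexers: callables tried in order; each returns (lattice, explained_indices)
    for the peaks it can account for, or None if it fails.
    """
    remaining = list(peaks)
    lattices = []
    while remaining and len(lattices) < max_lattices:
        found = None
        for indexer in indexers:          # try each method in the given order
            found = indexer(remaining)
            if found is not None:
                break
        if found is None:
            break                         # no method explains the remaining peaks
        lattice, explained = found
        lattices.append(lattice)
        # "Subtract": drop the peaks explained by this lattice, then "retry".
        remaining = [p for i, p in enumerate(remaining) if i not in explained]
    return lattices


def make_toy_indexer(candidate_spacings, tol=0.02, min_peaks=4):
    """Hypothetical 1D indexer: picks the spacing d explaining the most peaks,
    i.e., positions within tol of a non-zero integer multiple of d."""
    def indexer(peaks):
        best = None
        for d in candidate_spacings:
            explained = set()
            for i, p in enumerate(peaks):
                n = round(p / d)
                if n != 0 and abs(p - n * d) <= tol:
                    explained.add(i)
            if len(explained) >= min_peaks and (best is None or len(explained) > len(best[1])):
                best = (d, explained)
        return best
    return indexer


# Two superimposed toy "lattices" with spacings 1.0 and 1.7.
peaks = [1.0, 1.7, 2.0, 3.0, 3.4, 4.0, 5.0, 5.1, 6.8]
method_a = make_toy_indexer([1.0])   # can only recognize the first lattice
method_b = make_toy_indexer([1.7])   # can only recognize the second lattice
both_lattices = subtract_and_retry(peaks, [method_a, method_b])
```

In this toy case each method alone finds one lattice, whereas listing both finds two. This is loosely analogous to the gain reported for the MOSFLM/XDS combination in the Discussion, although real indexers and real peak overlaps are far more complex.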
For single-lattice indexing, XGANDALF, MOSFLM, XDS, and DirAx indexed 80, 78, 72, and 1 of the 360 images, respectively (Table 1); all indexing algorithms thus showed single-lattice indexing rates of less than 23% for the multicrystal diffraction patterns. For multi-lattice indexing, XGANDALF, MOSFLM, XDS, and DirAx indexed 360, 326, 213, and 1 images, respectively (Table 1). XGANDALF indexed all 360 images (100%), whereas MOSFLM and XDS indexed 90.55% and 59.16% of the images, respectively. In contrast, DirAx recognized a diffraction pattern in only one image. In the multi-lattice indexing mode, XGANDALF, MOSFLM, and XDS extracted 1288, 385, and 308 single-crystal lattices, respectively (Table 1).
Both XGANDALF and MOSFLM extracted more single-crystal lattices than there were images; specifically, XGANDALF yielded 3.53 times as many crystal lattices as images. A representative example of multi-lattice indexing of a diffraction image using XGANDALF is shown in Figure 3. Taken together, these results show that multicrystal diffraction patterns obtained by cryocrystallography can be indexed successfully using an SX program, with the indexing results depending on the indexing algorithm used. Meanwhile, for the single-crystal diffraction data, 360 single-crystal lattices were obtained from the 360 images using HKL2000 (Table 1).
Next, the crystal structure of lysozyme was determined from the multi- and single-crystal diffraction data. For the multicrystal diffraction data, the Rwork/Rfree values of the final 1.9 Å lysozyme structures obtained from data indexed using XGANDALF and MOSFLM were 0.207/0.264 and 0.199/0.246, respectively. For both XGANDALF and MOSFLM, the overall electron density maps were sufficiently clear to interpret all amino acid residues (Figure 4a,b), although the refinement statistics of the two models differed (Table 2). Although the data statistics for the multicrystal diffraction patterns processed using XDS were poor, these data were also used for structural determination. As expected, the Rwork and Rfree values of the resulting 2.1 Å lysozyme structure were 0.266 and 0.341, respectively, indicating that this model was not satisfactory (Figure 4c). For the single-crystal diffraction data, the Rwork/Rfree values of the final 1.6 Å lysozyme structure were 0.171/0.209. This dataset also yielded clear electron density maps (Figure 4d), in which almost all amino acid residues were well defined.
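For readers less familiar with the refinement statistics quoted above, the following is a minimal sketch of the conventional crystallographic R factor, R = Σ| |Fobs| - k|Fcalc| | / Σ|Fobs|. Rwork is computed over the reflections used in refinement, and Rfree over a test set excluded from refinement. This is a simplification: phenix.refine applies more elaborate scaling (including bulk-solvent corrections), so the sketch would not reproduce the values in Table 2. The least-squares scale factor fitted on the working set and the function names are assumptions.

```python
def r_factor(f_obs, f_calc, scale):
    """R = sum(| |Fobs| - k|Fcalc| |) / sum(|Fobs|) over the given reflections."""
    numerator = sum(abs(fo - scale * fc) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(f_obs)


def r_work_r_free(f_obs, f_calc, free_flags):
    """Return (Rwork, Rfree) from amplitudes; free_flags[i] marks test-set reflections.

    The scale k is the least-squares value fitted on the working set only,
    so the test set does not influence it.
    """
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if free]
    k = sum(fo * fc for fo, fc in work) / sum(fc * fc for _, fc in work)
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work], k)
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test], k)
    return r_work, r_free


# Toy amplitudes: the model fits the working set perfectly but not the test reflection.
toy_r_work, toy_r_free = r_work_r_free([10, 20, 30, 40], [10, 20, 30, 44],
                                       [False, False, False, True])
```

Because the test reflections never influence the model, a large gap between Rwork and Rfree is a warning sign of overfitting. This is why Rfree is used in the comparisons above.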

Discussion
In cryocrystallography, diffraction data from a single crystal are preferred because the periodic Bragg peaks produced by one crystal are straightforward to index, which yields reliable and accurate structural information [7]. For multicrystal diffraction patterns, in contrast, indexing often yields incorrect Bravais lattice information, and overlapping Bragg peaks can also degrade the SNR and the measured structure factors, depending on the degree of overlap.
In this study, both multi- and single-crystal lysozyme diffraction data were collected. After data processing, the single-crystal diffraction data showed higher resolution, SNR, and CC values than the multicrystal diffraction data (Table 1). In addition, the single-crystal diffraction data yielded a more reliable refined model in terms of the Rfree value (Table 2). These results suggest that high-quality diffraction data from a single crystal are preferable in cryocrystallography. In practice, however, the recorded diffraction patterns often originate from multiple crystals, owing to the nature of the crystal samples or because several crystals were fished together. In such cases, if the multicrystal diffraction patterns cannot be indexed, the diffraction experiment is typically repeated with a fresh crystal. However, when single-crystal diffraction data are difficult to collect or no additional crystals are available, the structural information must be extracted from the multicrystal diffraction patterns, even though these may be of relatively low quality.
Multicrystal diffraction patterns can be processed using popular X-ray crystallography programs such as HKL2000, which are widely used in the field. The following approaches are possible: (i) indexing crystal lattices only from images in which a single-crystal diffraction pattern is clearly distinguishable from the others; (ii) indexing only the high-intensity Bragg peaks by raising the σ-cutoff level; (iii) indexing based on the partial crystal lattice pattern in a specific region, at either low or high resolution; and (iv) indexing the crystal lattice after manually adding or removing Bragg peaks in peak-search mode.
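Approaches (ii) and (iii) amount to filtering the peak list before indexing. The sketch below illustrates only that filtering idea and is not how HKL2000 implements it. It keeps peaks whose I/σ(I) reaches a cutoff and that lie within an optional resolution window. The `Peak` fields, the cutoff values, and the example peaks are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    x: float          # detector position (pixels)
    y: float
    intensity: float
    sigma: float      # estimated standard deviation of the intensity
    d_spacing: float  # resolution of the peak in Å


def select_peaks(peaks, sigma_cutoff=3.0, d_max=None, d_min=None):
    """Keep peaks with I/sigma >= sigma_cutoff, optionally restricted to d_min <= d <= d_max."""
    selected = []
    for p in peaks:
        if p.sigma <= 0 or p.intensity / p.sigma < sigma_cutoff:
            continue  # too weak (or invalid sigma) at this cutoff
        if d_max is not None and p.d_spacing > d_max:
            continue  # lower resolution than the chosen window
        if d_min is not None and p.d_spacing < d_min:
            continue  # higher resolution than the chosen window
        selected.append(p)
    return selected


example_peaks = [
    Peak(100, 100, 5000, 50, 3.0),
    Peak(200, 150, 300, 100, 2.0),
    Peak(50, 60, 900, 100, 8.0),
    Peak(10, 10, 100, 0, 5.0),
]
strong = select_peaks(example_peaks, sigma_cutoff=10.0)  # approach (ii): raise the cutoff
```

Raising the cutoff tends to keep peaks from the most strongly diffracting crystal. This is why such approaches usually recover only one lattice per image.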
These general approaches are useful for indexing multicrystal diffraction patterns. However, they can be time-consuming and tedious when the multicrystal diffraction patterns are complex or of poor quality. Moreover, these approaches can usually extract only one diffraction pattern from each multicrystal image. If multi-lattice information could be extracted from each image, data-quality indicators such as completeness, redundancy, and resolution would be expected to improve.
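The following toy sketch shows why extra lattices per image help. Completeness is the fraction of expected unique reflections measured at least once, and multiplicity (redundancy) is the average number of measurements per measured unique reflection. For simplicity, the sketch assumes P1 symmetry with Friedel's law, so only (h, k, l) and (-h, -k, -l) are merged. Real lysozyme data would be merged under the crystal's actual point-group symmetry. The reflection lists are invented for illustration.

```python
def friedel_key(hkl):
    """Representative of a reflection and its Friedel mate (P1, no anomalous signal)."""
    h = tuple(hkl)
    return max(h, tuple(-x for x in h))


def completeness_and_multiplicity(observed, expected_unique):
    """observed: iterable of (h, k, l) from all indexed lattices.
    expected_unique: set of friedel_key() values expected to the resolution limit.
    Returns (completeness, multiplicity) over the expected unique reflections."""
    counts = {}
    for hkl in observed:
        key = friedel_key(hkl)
        if key in expected_unique:
            counts[key] = counts.get(key, 0) + 1
    n_measured = len(counts)
    completeness = n_measured / len(expected_unique)
    multiplicity = sum(counts.values()) / n_measured if n_measured else 0.0
    return completeness, multiplicity


expected = {friedel_key(h) for h in [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0)]}
lattice_1 = [(1, 0, 0), (0, 1, 0)]             # reflections from one indexed lattice
lattice_2 = [(-1, 0, 0), (0, 0, 1), (1, 1, 0)]  # a second lattice from the same image
one_lattice = completeness_and_multiplicity(lattice_1, expected)
two_lattices = completeness_and_multiplicity(lattice_1 + lattice_2, expected)
```

In this example, adding the second lattice raises completeness from 0.5 to 1.0 and multiplicity from 1.0 to 1.25, because the extra lattice samples different orientations of reciprocal space.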
Here, I processed multicrystal diffraction patterns obtained by cryocrystallography using CrystFEL, a program widely used for SX. The multicrystal diffraction patterns were indexed successfully and used to determine the crystal structure of lysozyme at 1.9 Å resolution. The proposed approach has the following advantages: (i) images do not need to be inspected manually before the multicrystal diffraction patterns are indexed; (ii) multi-lattice information can be extracted from a single image; and (iii) the indexing efficiency can be improved by using various indexing algorithms, either individually or in combination.
In this study, multicrystal diffraction patterns were processed using CrystFEL with the XGANDALF, MOSFLM, XDS, and DirAx indexing algorithms. Each algorithm resulted in a different indexing rate, number of crystal lattices, and set of data statistics (Table 1). Overall, XGANDALF and MOSFLM showed higher indexing efficiencies than XDS and DirAx. However, the outcome of multicrystal diffraction pattern analysis depends on the actual patterns recorded as well as on the data processing parameters used. Hence, the choice of indexing algorithm should depend on the quality of the multicrystal diffraction patterns. When indexing multicrystal diffraction patterns from cryocrystallography with an SX program, an algorithm that maximizes the indexing efficiency should be used; such an algorithm can be identified by evaluating several candidates. The indexing parameters, such as the integration radius and detector geometry, should also be optimized.
In this study, the XGANDALF, MOSFLM, and XDS indexing algorithms were each used successfully to index multiple lattices. Among them, XGANDALF and MOSFLM exhibited the best indexing performance and yielded the highest-quality electron density maps with acceptable Rfree values. In contrast, XDS performed poorly, yielding low CC and SNR values and a high Rfree value despite the use of 308 crystal lattices. This result is likely attributable not to the XDS indexing algorithm itself but to the multicrystal diffraction images used in this study being unsuitable for processing with XDS. These results also indicate that even when a sufficiently large number of images are indexed in the expected space group, the success of data processing can be confirmed only by analyzing both the final data statistics and the refinement results.
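The text does not define CC. Assuming it denotes the half-dataset correlation coefficient CC1/2, a figure of merit commonly reported for SX data (for example, by CrystFEL's compare_hkl), the following minimal sketch shows how it is obtained. The observations of each unique reflection are split at random into two halves, each half is averaged, and the Pearson correlation between the two sets of half-dataset means is computed across reflections. The dictionary layout and function names are hypothetical, and reflections measured only once are skipped.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cc_half(observations, seed=0):
    """observations: dict mapping unique hkl -> list of intensity measurements."""
    rng = random.Random(seed)        # fixed seed makes the random split reproducible
    half1, half2 = [], []
    for intensities in observations.values():
        if len(intensities) < 2:
            continue                 # need at least one measurement in each half
        shuffled = list(intensities)
        rng.shuffle(shuffled)
        mid = len(shuffled) // 2
        half1.append(mean(shuffled[:mid]))
        half2.append(mean(shuffled[mid:]))
    return pearson(half1, half2)


consistent = {(1, 0, 0): [100.0, 100.0], (2, 0, 0): [50.0, 50.0, 50.0],
              (3, 0, 0): [10.0, 10.0], (4, 0, 0): [5.0]}
cc_consistent = cc_half(consistent)  # perfectly reproducible toy data
```

Misindexed or poorly separated lattices add measurements that disagree between the two halves, which lowers CC1/2 even when many lattices are used. This is consistent with the point that the number of indexed lattices alone does not guarantee good data.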
In this study, four different indexing algorithms were used to index multicrystal diffraction patterns, and the results were compared. However, combining several indexing algorithms may give better results, because this could improve the data statistics. To assess the effect of combining indexing algorithms on the data obtained in this experiment, further data processing was performed using three combinations of algorithms: (1) XGANDALF/MOSFLM, (2) XGANDALF/XDS, and (3) MOSFLM/XDS. The XGANDALF/MOSFLM and XGANDALF/XDS combinations gave identical results: 360 indexed images with 1288 crystal diffraction patterns. Because these numbers are the same as those obtained with XGANDALF alone, combining algorithms offered no significant advantage in these cases, which indicates that XGANDALF alone achieved the maximum indexing efficiency for the data collected in this experiment. In contrast, the MOSFLM/XDS combination yielded 331 indexed images and 617 crystal diffraction patterns, that is, 232 and 309 more indexed patterns than MOSFLM and XDS alone, respectively.

Conclusions
In this study, I showed that multicrystal diffraction patterns collected during cryocrystallography can be processed readily and efficiently using an SX program. This approach should also be suitable for indexing cryocrystallographic multicrystal diffraction patterns from both macromolecular and small-molecule crystals.