Applications of X-ray Powder Diffraction in Protein Crystallography and Drug Screening

Abstract: Providing fundamental information on intra/intermolecular interactions and physicochemical properties, the three-dimensional structural characterization of biological macromolecules is of extreme importance towards understanding their mechanism of action. Among X-ray diffraction methods, powder diffraction (XRPD) has demonstrated its applicability in numerous studies of different materials, and it is now a respectable tool for identifying macromolecular phase transitions, quantitative analysis, and determining structural modifications of samples ranging from small organics to full-length proteins. This review presents an overview of XRPD applications and recent improvements related to the study of challenging macromolecules and peptides toward structure-based drug design. It gathers recent studies in the field of drug formulation and delivery processes, as well as in polymorph identification and the effect of ligands and environmental conditions upon crystal characteristics. These studies further demonstrate the efficiency of protein XRPD for quick and accurate preliminary structural characterization.


Introduction
X-ray crystallography has been for more than sixty years the most accurate and reliable approach to obtain detailed structural information for biological macromolecules. Relying on the availability of high-quality crystals, this method provides significant insights into the molecular mechanisms revealing the function of macromolecules, as well as inter/intramolecular interactions forming complex supramolecular assemblies [1,2]. Although, in recent years, Single-Crystal X-ray Diffraction (SCXD) has been considered the most powerful structural characterization tool for proteins, limitations related to the requirement of a sizeable single-crystal, stability, and diffraction quality have considerably reduced the number of molecules that can be studied via this method [3]. X-ray diffraction by crystalline powders is one of the most powerful and widely used methods for analyzing matter. It was discovered just 100 years ago, independently, by Paul Scherrer and Peter Debye in Göttingen, Germany, who managed to use the powder diffraction method for structure solution in 1916, as they studied a polycrystalline material of lithium fluoride (LiF) [4][5][6].
The SCXD method for structure solution consolidated its applicability in macromolecules early on, with the first crystal structure of a protein, myoglobin, solved in 1960, while increasing numbers of large molecules and macromolecular assemblies have been determined by crystallographic methods during the following years. In the field of X-ray Powder Diffraction (XRPD), in 1947 the [...]
The present review article focuses on recent advances in macromolecular XRPD, summarizing crystallographic case studies of standard (model) and challenging molecules investigated under different crystallization and environmental conditions. The experimental results presented herein confirm the suitability of the method for both the extraction of structural information and polymorph screening for the purpose of improving and designing therapeutics.

Challenging Samples: Macromolecular Assemblies & Subunits
Genome projects of several organisms have revealed numerous new genes, as well as their transcripts (proteins), which are potentially involved in the onset of certain diseases [40].
Viruses have developed different strategies for their proliferation and propagation. Visualization of diverse and complex three-dimensional structures of intact virus particles, as well as their constituent proteins and complexes as recorded in the Protein Data Bank (PDB) or in the specialized database Virus Particle Explorer (VIPER), provides scientists with useful information towards understanding their biological function [41][42][43].
Enveloped viruses have rarely yielded crystals [41]; however, solubilized protein components of their membranes have been crystallized and studied with great success. To date, about 10,837 virus-related structures are available in the PDB, more than 77% of them containing viruses only. Most virus structures in the PDB (around 85%) have been determined using X-ray crystallography (PDB: https://www.wwpdb.org/; [44]) [45]. The abovementioned structures include not only proteins forming icosahedral capsids, proteins of cylindrical viruses, and different components of tailed phages, but also many virus enzymes such as proteases and RNA/DNA polymerases [41,43].
Crystal growth has for many years been essentially empirical, following a more or less trial-and-error process. Screening for the optimal conditions has been made easier through automation and the introduction of commercially available crystallization kits and robots. Many parameters can be changed in these experiments, such as temperature, pH, and ionic strength, but perhaps the most important variable, the protein itself, is sometimes neglected [40].
Advances in recombinant DNA technology in recent years have had a massive effect in the area of protein crystallization. Large amounts of pure protein produced in various expression systems allow for preliminary experiments before initiating crystallization trials, such as solubility, purity, and aggregation tendencies [46]. However, one of the most common problems that scientists have encountered with recombinant proteins produced in E. coli is the formation of inclusion bodies containing large amounts of insoluble protein [46].
The combination of methods and approaches, as well as advances in bioinformatics, protein expression, purification, and methods for accelerating crystallization and X-ray data collection, are of extreme importance toward removing the so-called "crystallization bottleneck" from the process of determining protein crystal structures [40,46-52]. Among other approaches, the XRPD method offers a significant gain in time, especially in the case of complex and difficult-to-crystallize molecules, as it contributes to the detection of crystalline symmetry and phase sorting through the comprehensive examination of even low-quality crystals.

Polycrystalline Samples and First Virus Protein XRPD Studies
Macromolecular crystallization in general is an inherently complex phenomenon [53], while the most appropriate way to correlate the precipitant agents with the protein concentration is the phase diagram.
Crystallization techniques, even the most sophisticated among them [54,55], attempt to drive the solution briefly to the nucleation and then the metastable zone. Each method follows a different trajectory (Figure 1; [56]). Crystallization proceeds in two phases: nucleation and growth. Once a critical nucleus is formed, growth follows spontaneously [56][57][58][59]. Nonetheless, excess nucleation consistently occurs, resulting in the formation of numerous low-quality micro-/nano-crystals [60,61].

Figure 1.
Phase diagram (protein concentration versus precipitant concentration). The solubility curve separates the undersaturated region from the supersaturated region, the latter being required for crystallization. The trajectories correspond to (i) Batch, (ii) Vapor Diffusion, (iii) Dialysis, and (iv) Free-Interface Diffusion. The supersaturated area consists of the metastable zone, the nucleation (or labile) zone, and the precipitation zone [62].
In earlier days, XRPD methods were employed toward the investigation of several different crystalline proteins [63]. This kind of research established the presence of long lattice spacings in the corresponding structures and confirmed the applicability of X-ray diffraction studies of macromolecules.
XRPD data depicted as Debye-Scherrer rings were also obtained from virus proteins and specifically from precipitated tobacco mosaic virus proteins [64]. In this study, emphasis was placed on the large number of peaks in the diffraction profiles (within a range of 80-3 Å). Indeed, it was reported that those patterns were exactly as expected for crystalline samples of molecules as large as these proteins. Further studies of other plant virus proteins [65] allowed for the determination of unit cell dimensions in several cases. Additionally, the use of powder methods was also broadened in a study of the crystalline inclusion bodies (1-3 mm in linear size) of cytoplasmic polyhedrosis virus from Bombyx mori [23,66,67]. The inclusion bodies were placed in a capillary tube while immersed in buffer, and X-ray diffraction experiments led to powder data extending to 8.2 Å resolution. However, reductions in crystal size below ~20 μm for a 100 Å unit cell are not foreseen soon due to radiation damage effects [68]. Undeniably, the solution and refinement of structures from sub-μm sized protein crystals containing only a few unit cells is still a major challenge for crystallography [23].

Preliminary Structural Data of Virus Proteins via XRPD
The difficulty of obtaining sufficiently large and well-diffracting single crystals for many proteins has underlined the applicability of XRPD for the structural characterization of virus proteins. This was originally confirmed a decade ago, when the macro domain moiety of the nsP3 protein from the Mayaro virus (MAYV), which appears in tropical regions of South America, was investigated. Despite significant efforts, good quality single crystals of the MAYV nsP3 macro domain were not available, whereas crystallization trials resulted in reproducible needle-shaped microcrystalline samples, and the first structural information was obtained via synchrotron XRPD measurements [69]. X-ray diffraction data were collected at the European Synchrotron Radiation Facility (ESRF), while indexing of the diffraction patterns indicated a trigonal/hexagonal unit cell (space group: P31, a [...]).
Figure 2.
[...] The high-resolution powder diffraction beamline ID31, RT, λ = 1.29984 Å. Insets correspond to magnifications of selected profile regions [69].
In addition, the application of the XRPD method provides the advantage of considerably reducing the amount of time necessary for fine-tuning crystallization experiments, and in the case of virus proteins, it may be useful to examine multiple crystallization conditions by investigating the formation of different crystalline polymorphs [70][71][72]. This allows for the examination of physicochemical characteristics that each polymorph bears, as well as the ability to bind molecules that inactivate the action of any such protein, considering their potential utility as drug precursors.
The applicability of XRPD measurements in antiviral research, and their ability to provide preliminary structural information, as first shown by Papageorgiou and colleagues (2010), triggered research on difficult-to-crystallize virus proteins. Recently, a study focused on a 20.5 kDa protein, protease 3C (3Cpro) of an emerging Enterovirus, Coxsackievirus B3 (CVB3), has come to support this claim further [71]. CVB3 may cause various diseases ranging from pleuropneumonia or "Bornholm disease" to myocarditis leading to permanent heart damage or even death [73], while 3Cpro is one of the functional virus proteins and is responsible for the majority of proteolytic cleavages occurring within the host cell [74].
Experimentally, 3Cpro was expressed and purified in a recombinant form, employing bacterial cultures and inducible factors. A crystallization condition containing stable resolving agents was employed in a range of polymer concentrations and pH values and resulted in polycrystalline material (~50 μm). In order to optimize data quality, different instruments and sources were used for data collection.
Using laboratory instrumentation (Malvern Panalytical, X'Pert PRO), initial extraction of unit cell parameters and crystal symmetry (indexing) was feasible, while the best diffraction profiles in terms of angular and d-spacing resolution were obtained at the ESRF, allowing accurate identification of unit-cell parameters and characterization of peak shape and background coefficients in the absence of a structural model using the Pawley method [75]. XRPD data analysis demonstrated no structural modifications or alterations in the diffraction peak positions throughout the crystallization conditions examined, with all samples containing crystals of monoclinic symmetry (space group C2) (Figure 3) [71].
Analogous studies of selected virus proteins and protein domains that have a critical role in a virus's lifecycle (suggesting potential methods for virus inactivation via lifecycle disruption) have been conducted in recent years. Dengue virus 3 (DENV3) non-structural protein 5 (NS5) participates in a virus replication system; it is a bimodular enzyme carrying a methyltransferase domain (MTase) at its N-terminus and a polymerase (RdRp) at its C-terminus. DENV3 NS5 MTase catalyzes two consecutive methylation reactions associated with the synthesis of the RNA-cap structure. Dengue viruses are, in general, pathogenic flaviviruses transmitted by Aedes mosquitoes [76], and the diseases they cause range in severity from undifferentiated acute febrile disease and classical epidemic fever (Dengue Fever/DF) to life-threatening Dengue Hemorrhagic Fever (DHF) and Dengue Shock Syndrome (DSS), conditions which may lead to neurological disorders [77].
Crystallographic studies have been performed on the DENV3 NS5 MTase domain in the absence [78,79] or presence [80-82] of organic molecules (ligands), leading to the identification of potential inhibitors against DENV [72]. However, only a small number of the examined fragments selected by a primary biophysical screening could yield well-diffracting single crystals and thus the structure of the complex [80], limiting options for the development of potent inhibitors. Thus, the production of polycrystalline material, as well as XRPD structural analysis, have been performed using different crystallization conditions, aiming at diffraction data collection and preliminary extraction of structural information in terms of high-throughput crystal screening and polymorph identification (Figure 5). Analysis of the synchrotron XRPD data indicated no profile variation of the diffraction patterns (peak positions) throughout the crystallization conditions examined. Pattern indexing revealed crystals with orthorhombic symmetry (space group: P21212) for all samples [72]. The aforementioned studies underline the capability of XRPD to accurately provide preliminary structural information for demanding biological samples, employing lower quality crystalline precipitate. Even in these cases, where the resolution of the data does not allow for complete structural characterization, space group and lattice parameters are extracted using peak positions at the lower 2θ angles for indexing purposes [83,84]. This makes the proposed process suitable for fast and systematic crystal symmetry identification.
Aiming to facilitate antiviral research on a wide spectrum of virus proteins, forthcoming studies will focus on complete structural determination via XRPD, as well as on employing the technique for the evaluation of co-crystallization experiments of virus proteins with small-molecule ligands in the context of creating new pharmaceutical compounds.

Protein Structure Solution via XRPD
If the powder data contain a sufficient amount of information, the structure of a specific protein can be solved and refined, a process which can be briefly described in the following steps.
Considering the fact that XRPD data are characterized by peak overlap, combining multiple data sets together where either the cell parameters or the preferred orientation is different allows the contributing reflections within a cluster of overlapped peaks to be more easily distinguished. The PRODD refinement program [23,85,86] has been modified to allow a multi-pattern Pawley fit [75] leading to more accurate intensity extraction.
Optimized peak shape and background parameters of each dataset are extracted via the Le Bail method using a pseudo-Voigt peak profile function [87].
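As an illustration of the peak profile mentioned above, the following minimal sketch evaluates a pseudo-Voigt function as a linear mix of a Gaussian and a Lorentzian sharing one full width at half maximum. This simple form is an assumption chosen for clarity; refinement programs typically use more elaborate parameterizations, and the function names and the numerical values in the usage lines are hypothetical.

```python
import math


def gaussian(x, x0, fwhm):
    """Area-normalised Gaussian with full width at half maximum `fwhm`."""
    c = 4.0 * math.log(2.0)
    return math.sqrt(c / math.pi) / fwhm * math.exp(-c * ((x - x0) / fwhm) ** 2)


def lorentzian(x, x0, fwhm):
    """Area-normalised Lorentzian with the same FWHM convention."""
    return (2.0 / (math.pi * fwhm)) / (1.0 + 4.0 * ((x - x0) / fwhm) ** 2)


def pseudo_voigt(x, x0, fwhm, eta):
    """eta * Lorentzian + (1 - eta) * Gaussian, with mixing 0 <= eta <= 1."""
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    return eta * lorentzian(x, x0, fwhm) + (1.0 - eta) * gaussian(x, x0, fwhm)


# Hypothetical peak: centre 1.80 deg 2-theta, FWHM 0.01 deg, half Lorentzian.
peak_height = pseudo_voigt(1.80, 1.80, 0.01, 0.5)
half_height = pseudo_voigt(1.805, 1.80, 0.01, 0.5)
```

Because both components share the same width, the mixed profile falls to half of its maximum at a distance of FWHM/2 from the centre, whatever the value of the mixing parameter.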
A molecular replacement (MR) step then follows [88]. A starting model is positioned and oriented in the new unit cell until the set of calculated intensities effectively match the experimental data. There are only six degrees of freedom per molecule; three of them are related to the orientation, while three more define the position of the molecule with respect to the symmetry elements of the space group.
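The six degrees of freedom can be made concrete with a short sketch: three angles define a rotation and three numbers define a translation, and together they place every atom of the starting model. The Z-Y-Z Euler convention, the Cartesian (Å) translation, and the function names below are illustrative assumptions and do not correspond to the parameterization of any particular MR program.

```python
import math


def rotation_matrix(alpha, beta, gamma):
    """Rotation R = Rz(alpha) * Ry(beta) * Rz(gamma); angles in radians."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [ca * cb * cg - sa * sg, -ca * cb * sg - sa * cg, ca * sb],
        [sa * cb * cg + ca * sg, -sa * cb * sg + ca * cg, sa * sb],
        [-sb * cg, sb * sg, cb],
    ]


def place_model(coords, angles, shift):
    """Apply x' = R x + t to every atom: three angles plus three shifts."""
    rot = rotation_matrix(*angles)
    placed = []
    for x, y, z in coords:
        placed.append(tuple(
            row[0] * x + row[1] * y + row[2] * z + t
            for row, t in zip(rot, shift)
        ))
    return placed


# Hypothetical two-atom "model" (coordinates in Å).
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
moved = place_model(model, (0.3, 1.1, -0.7), (10.0, -4.0, 2.5))
```

Since the transformation is rigid, all interatomic distances of the model are preserved; an MR search only scores how well each trial placement reproduces the measured intensities.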
A suite of stereochemical restraints, with automatic recognition of atom and bond types for the standard amino acid residues, is later implemented for structure refinement using the Rietveld method [13]. A restraint is also used to describe the two-dimensional pseudo-potential surface of a Ramachandran plot [89], while a solvent correction based on Babinet's principle is employed to account for the disordered solvent within the crystal structure [90].
Pioneering experiments with polycrystalline metmyoglobin and lysozyme conducted by Von Dreele and co-workers [20,21,91,92], as well as by Margiolaki and co-workers shortly after at ESRF [10], originally introduced the idea of protein structure determination and refinement using XRPD data. A few years later and after a long series of significant methodological improvements, macromolecular powder diffraction was employed for the examination of the second SH3 (Src homology-3) domain of ponsin (SH3.2) [11], while shortly after, Doebbler and Von Dreele achieved structure solution via MR from powder diffraction data collected using image plates and not multianalyzer diffractometers [93]. The SH3.2 domain binds to the cytoskeletal proteins paxillin and vinculin at the extracellular matrix adhesion sites [94], while its interaction with paxillin is associated with muscle differentiation processes forming the costameres, namely the lateral cell-matrix contacts of muscle cells [95]. The unit cell characterization step (space group: P212121, a = 24.70420 (9) Å, b = 36.42638 (14) Å, c = 72.09804 (26) Å) was followed by structure solution, model building, and refinement of this 67-residue protein domain (Figure 6).
Figure 6.
Electrostatic potential representation (using PYMOL) of the domain, with the water molecules additionally shown as red spheres [11].
Ongoing advances in data analysis, implemented in the General Structure Analysis Software (GSAS; [90,96]) and other software packages, further enhanced the applicability of the method.
In 2013, a novel approach for refining structures of protein molecules using XRPD data was introduced in GSAS, where each amino acid is considered as a flexible rigid body (FRB), requiring a smaller number of refinable parameters and restraints [12]. The approach was applied for the structure refinement of the T6 hexameric form of bovine insulin, a molecule highly homologous to the human hormone, which is responsible for glucose metabolism.
A total of 1542 stereochemical restraints were imposed in order to refine the positions of 800 protein atoms, two Zn2+ ions, and 44 water molecules in the asymmetric unit using experimental data in the resolution range 18.2-2.7 Å. The molecular structure was obtained via a 14-pattern stereochemically-restrained Rietveld refinement which exploits the anisotropic variations in unit-cell parameters for T6 insulin, thereby resolving the peak-overlap phenomenon [11,97] and resulting in an average crystal structure over a pH range of 5.9 to 7.7 (Figure 7).

Figure 7.
Selected regions of the total OMIT map contoured at 1σ clearly indicating the positions and coordination of the two zinc ions present in T6 bovine insulin. The map was computed using SFCHECK. The residues represented as cyan sticks correspond to the starting model, 2a3g, and the grey spheres represent the two independent zinc ions, (a) ZnB.1 and (b) ZnB.2, octahedrally coordinated by three symmetry-related HisB10 side chains. This figure was generated using PYMOL [12].

Polymorph Identification
An important feature of crystalline matter for pharmaceutical industries towards drug development is polymorphism, meaning the ability of a molecule or compound to exist in more than one molecular or crystalline phase [98]. Variation in the crystallization conditions, such as solvent polarity, initial macromolecular concentration, and precipitant agents, may result in different crystal and/or molecular polymorphs [30-32,99].
Differences in the physicochemical characteristics of crystalline polymorphs may determine the manufacturability of a drug candidate [100,101] or affect production processes and properties such as stability, bioavailability, and toxicity of the final pharmaceutical product, and, ultimately, the therapeutic efficacy of the substance [102,103]. Approximately 90% of the existing pharmaceutical compounds based on small organic molecules have been reported to exist in more than one crystalline phase [104], each of which can exhibit diverse properties [105,106].
XRPD is a front-line technique in polymorph screening, as it provides a fingerprint of every crystalline phase, each exhibiting a unique diffraction pattern. Specifically, with XRPD patterns, differences between the various crystalline forms can be observed by examining the peak positions and intensities [101] (Figure 8). Even small changes in the XRPD patterns in the form of new peaks, additional shoulders, or shifts in the peak positions often imply the presence of a second polymorph [107]. Thus, information about crystalline sample composition is obtained, showing whether the sample consists of one or more phases. The existence of multiple phases in the same formulation can be problematic when homogeneous formulations are required, which is usually the case. Understanding the crystalline form(s) of a pharmaceutical compound provides a road map to help direct development processes at multiple levels, ranging from crystallization, formulation, packaging, and storage to the performance of the selected polymorph, in addition to the preferred ADME characteristics [35].
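The fingerprinting idea can be sketched in a few lines: peak positions of a measured pattern are compared with those of a reference phase, and any peak without a counterpart within a tolerance is flagged as a possible sign of a second phase. The function name, the tolerance, and the peak lists below are hypothetical and serve only as an illustration; in practice, peak shifts may also arise from lattice changes or instrumental offsets, so such a check is a first screening step rather than a phase identification.

```python
def unmatched_peaks(observed, reference, tol=0.02):
    """Return observed peak positions (deg 2-theta) that have no reference
    peak within `tol` degrees."""
    return [p for p in observed if all(abs(p - r) > tol for r in reference)]


# Hypothetical peak lists (degrees 2-theta), not taken from any study.
reference_phase = [1.80, 3.11, 3.60]
measured_sample = [1.80, 3.12, 3.45, 3.60]

extra = unmatched_peaks(measured_sample, reference_phase)
print(extra)
```

An empty result is consistent with a single known phase, whereas remaining peaks call for closer inspection of the pattern.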

Macromolecular Polymorph Screening: The Case of Human Insulin
Human insulin (HI), a peptide hormone of 5.8 kDa produced by β-pancreatic cells that promotes carbohydrate absorption from the blood to the tissues, was one of the first proteins ever isolated [108] and crystallographically studied [109]. In its active form, HI consists of 51 amino acids in two polypeptide chains: A and B (21 and 30 amino acids, respectively). The secondary structure of insulin consists of two, almost antiparallel, α-helices in chain A and one α-helix followed by a turn and a β-strand in chain B [110]. The tertiary structure is stabilized by two inter-chain and one intra-chain (in chain A) disulfide bonds, crucial for proper binding to the insulin receptor [111].
Historically, the first insulin crystals were produced in 1926, exhibiting rhombohedral symmetry (R3) with the T6 chain B configuration [112]. In 1934, David Aylmer Scott noted that the addition of zinc (Zn) and other divalent metals (such as Cd, Co, Ni) was necessary to create crystals [113]. There is a variety of insulin formulations and analogues against diabetes with different onset (time until action), peak (time to achieve the maximum impact), and duration (time until they wear off) of action. Several studies also underline the advantages of microcrystalline HI drugs over aqueous formulations, as they provide higher compound concentration, increased stability, and resistance to structural modifications, since they are less prone to chemical or enzymatic degradation [114]. Toward improvement of the onset of insulin injections, the first successful results were recorded in 1936 when Hagedorn mixed insulin, zinc (Zn), and protamine [115], producing a less soluble complex (NPH, Neutral Protamine Hagedorn), the ancestor of all modern prolonged-action insulin formulations. A few years later, the production of the Lente [116] series and the examination of various crystallization parameters including pH, zinc, and insulin concentrations (protamine-free) prepared the ground for the production of an ever-growing variety of preparations with differing durations of action (Table 1). These preparations contain crystalline material, amorphous material, or mixtures of both (such as the Semilente formulation), while insulin molecules with altered amino acid sequence (i.e., insulin analogues) are also commercially available in the form of ready-for-injection solutions (Aspart, Lispro, Glargine, etc.) [117].
Table 1. Classification of insulin and insulin analogue formulations based on their initiation and duration of action [118].

[Table 1 columns: Type of Formulation; Insulin Formulation; Start Action; Maximum Action. First row group: Rapid-acting analogues. The remaining table content is not recoverable.]

The structural behavior of HI is at the center of scientific interest, owing to its high crystal and molecular polymorphism [119]. To date, several different crystal polymorphs of monoclinic, rhombohedral, tetragonal, and cubic symmetries have been identified in various crystallization conditions. Insulin microcrystals enclose zinc-based insulin hexamers in one out of three different conformations, known as T6, T3R3f, and R6, depending on the conformation of the monomers' N-terminal residues of chain B (Figure 9). T stands for an extended, Rf for a "frayed" intermediate, and R for a helical conformation, while the subscript is indicative of the number of monomers that exhibit the aforementioned arrangement [120]. Phenolic or non-phenolic organic molecules that can act as ligands have been used in HI co-crystallization experiments, resulting in a diverse assortment of polymorphs [30-32,98,120-122].
The interconversion among the three conformations is mediated by ligand-binding in allosteric sites, the most important among them being the hydrophobic pockets (3 in T3R3f and 6 in R6), which bind phenol-like ligands [99,124]. In the absence of allosteric ligands, insulin hexamers adopt the T6 conformation. The T3R3f conformation can be induced by thiocyanate anions [125], while the T3R3f and R6 conformations are induced and further stabilized by the binding of phenol and its derivatives to the abovementioned hydrophobic pockets (phenolic pockets) [122,126,127]. The three conformations display different biochemical stability in the following descending order: R6 > T3R3f > T6 [128]. Furthermore, it is being examined whether a single microcrystalline pharmaceutical formulation could contain two active components, via the co-crystallization of HI with selected organic molecules of proven pharmacological importance, providing better regulation of insulin release combined with the availability and the mode of action of the co-crystallized molecule.
It is evident that insulin is distinguished for its polymorphism at both the molecular and crystal levels. A combination of both types of polymorphic characteristics may lead to products with improved features. Thus, identification of these polymorphs must be performed in the polycrystalline sample, which should be examined as a whole. XRPD is the optimum research tool that makes this type of study feasible. Early attempts were made by Norrman and his colleagues in 2006 [122], but data quality only allowed for the extraction of limited structural information via data clustering based on pattern similarities and principal component analysis [129]. XRPD patterns of each cluster can, however, be used as "fingerprints" for the different insulin polymorphs. In the following years, improvements in instrumentation led to enhanced data resolution, maximizing the extracted structural information.
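Similarity-based grouping of powder patterns can be illustrated with a deliberately simple sketch. It is not the principal component analysis used in [122,129]; instead, it substitutes a plain Pearson correlation between intensity profiles sampled on a common 2θ grid, followed by single-linkage grouping above a threshold. The function names, the threshold, and the toy profiles are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally sampled profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def cluster_patterns(patterns, threshold=0.9):
    """Single-linkage grouping: a pattern joins every cluster that contains
    at least one member correlated with it at or above `threshold`."""
    clusters = []
    for idx, pattern in enumerate(patterns):
        linked = [c for c in clusters
                  if any(pearson(pattern, patterns[j]) >= threshold for j in c)]
        merged = [idx]
        for c in linked:
            merged.extend(c)
            clusters.remove(c)
        clusters.append(sorted(merged))
    return clusters


# Toy intensity profiles on a shared grid (hypothetical values).
profile_a = [0, 10, 0, 0, 5, 0]
profile_b = [0, 9, 1, 0, 5, 0]
profile_c = [0, 0, 8, 0, 0, 6]

groups = cluster_patterns([profile_a, profile_b, profile_c])
print(groups)
```

Each resulting group collects the indices of patterns that resemble one another, so a representative pattern per group can then serve as the "fingerprint" of a candidate polymorph.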
External insulin is provided subcutaneously via injections obviating its degradation by gastric enzymes, while research aiming toward administration of HI in the form of a pill or inhalation is still proceeding [130][131][132][133].

Distinct and Novel HI Polymorphs Identified via XRPD
Depending on pH and ion concentration upon crystallization, the conformation of HI shifts between different molecular and crystal polymorphs. In "ligand-free" samples, when pH ranges from 5 to 6.5, the rhombohedral symmetry (T6 molecular conformation) of HI (space group: R3, a = 82.99 Å, c = 34.07 Å) has been identified, while in the pH range from 6.9 to 7.5, the T6 conformation alters to T3R3f (a = 80.66 Å, c = 37.74 Å), a transition which is evidently depicted in peak position changes (Figure 10). Early results indicate an additional structural modification in samples prepared at pH values 7.8-8.6, as a first order phase transition occurs and HI molecules obtain cubic symmetry (space group: I213, a = 79.1 Å) (PDB ID: 9INS, [134]). The coexistence of two phases in the pH range from 7.02 to 7.39 was evident in high resolution diffraction profiles from a synchrotron source, the quality of which allowed for simultaneous refinement and accurate extraction of unit-cell parameters via the Pawley method.
The structural behavior of HI in the presence of several organic additives, mainly phenolic derivatives which were originally used in pharmaceutical compounds as preservatives by virtue of their antimicrobial properties, has been extensively studied [30,121,135]. In the presence of phenolic ligands, insulin-based pharmaceutical products bear improved physicochemical properties, as well as enhanced resistance to degradation. Toward the development of new pharmaceuticals and the improvement of already existing ones, molecules with well-established pharmacological action employed as ligands provide new prospects for currently known treatment approaches.
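How a change in unit-cell parameters translates into shifted peak positions can be sketched with Bragg's law and the d-spacing formula for hexagonal axes. The cell parameters are the T6 and T3R3f values quoted above; the wavelength (borrowed from the ID31 measurement mentioned earlier in this review) and the two reflections, chosen because they satisfy the obverse rhombohedral condition -h + k + l = 3n, are assumptions made purely for illustration.

```python
import math


def d_hexagonal(h, k, l, a, c):
    """d-spacing (Å) on hexagonal axes:
    1/d^2 = (4/3) * (h^2 + h*k + k^2) / a^2 + l^2 / c^2."""
    inv_d2 = 4.0 / 3.0 * (h * h + h * k + k * k) / a ** 2 + (l / c) ** 2
    return 1.0 / math.sqrt(inv_d2)


def two_theta(d, wavelength):
    """Diffraction angle 2-theta in degrees from Bragg's law, λ = 2 d sin θ."""
    return 2.0 * math.degrees(math.asin(wavelength / (2.0 * d)))


WAVELENGTH = 1.29984   # Å; assumed, not stated for these particular data
T6 = (82.99, 34.07)    # a, c in Å (pH 5 to 6.5)
T3R3F = (80.66, 37.74)  # a, c in Å (pH 6.9 to 7.5)

for hkl in [(1, 1, 0), (0, 0, 3)]:
    angle_t6 = two_theta(d_hexagonal(*hkl, *T6), WAVELENGTH)
    angle_t3r3f = two_theta(d_hexagonal(*hkl, *T3R3F), WAVELENGTH)
    print(hkl, round(angle_t6, 3), round(angle_t3r3f, 3))
```

Because a contracts while c expands across the transition, the (1 1 0) reflection moves to a higher angle and the (0 0 3) reflection to a lower one, which is the kind of anisotropic peak shift that makes the two phases distinguishable in a powder profile.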
Widely employed ligands such as phenol, resorcinol, and m-cresol enter inside the hydrophobic pockets of insulin and strongly stabilize the hexameric conformation by forming two H-bonds between the phenolic hydroxyl and the carbonyl oxygen of Cys A6 and the amide NH of Cys A11 at one end of the pocket [124].
Another important factor which strongly affects insulin and protein crystallinity in general is the crystallization pH, as this has been established by several earlier studies [136][137][138][139]. Within a wide pH range, protein molecules may modify in various ways, leading, for example, to partial amino acid neutralization, disrupting the formation of salt bridges between protein molecules, and thus decreasing the crystallization rate.
One of the first successful structure refinements using XRPD data was conducted by R.B. Von Dreele and referred to insulin, when a sample of microcrystalline precipitate, produced as a byproduct of single crystal production process, was examined. The experiment led to the identification of a previously unknown rhombohedral polymorph with a = 81.2780 (7) Å , c = 73.0389 (9) Å , which is fundamentally a doubled c axis superlattice of the T3R3 f structure (a phase denoted as T3R3 f DC). The complete structural determination was achieved via XRPD and verified later via SCXD experiments [140].
Novel insulin polymorphs were also reported by Norrman & Schluckebier [120], providing a driving force for further research on insulin. Specifically, variation in pH and co-crystallization with different ligands led to the production of new crystalline polymorphs with diverse physicochemical properties, thorough investigation, and analysis of which revealed enhanced characteristics in terms of physical stability and dissolution rate. Polycrystalline materials of bovine insulin were studied later on, in pH values from 5.0 to 7.6 [12] and data disclosed to the T6 hexameric insulin conformation (space group: R3, a = 82.5951 (9) Å , c = 33.6089 (3) Å for the sample crystallized at pH: 5.0). Despite significant efforts devoted to the structural characterization of HI and its complexes with different ligands, there are still novel crystalline phases to be discovered complementary to the rich diagram of phase transitions including the C2221 and C2 polymorphs identified a few years ago [120,122], and two previously unknown monoclinic formulations, P21(α) & P21(γ), reported by our team [30][31][32].
To date, our research has focused on polymorph identification using the XRPD method for HI in the absence and presence of organic ligands and phenolic derivatives under pH variation. Ligands such as phenol and resorcinol derivatives, which led to the formation of more than one polymorph of monoclinic symmetry, are of particular interest [31,32] (Figure 11). The previously referenced P21(γ) crystal polymorph (a = 87.0749 (7) Å, b = 70.1190 (5) Å, c = 48.1679 (5) Å, β = 106.7442 (8)°) was identified when HI was crystallized in the presence of m-cresol (pH 4.5 to 6.7) and 4-nitrophenol (pH 5.1 to 6.3), as illustrated in Table 2, while at pH 6.7 to 8.6 and 6.2 to 8.1 the complexes adopt the rhombohedral R3 symmetry, with the R6 and T3R3 f HI conformations, respectively [26,31]. In the remarkable case of 4-ethylresorcinol, monoclinic symmetry was observed throughout the whole pH range (4.95 to 8.05) in repeated crystallization experiments [32]. Four different monoclinic polymorphs were identified, two of which [C2 and P21(β)] were structurally known, whereas the other two belong to the P21 space group and were first reported by our team in previous studies [P21(α) and P21(γ)] [30,31] (Table 2), with HI adopting the R6 molecular conformation in the case of the P21(γ) polymorph.
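As a quick consistency check on lattice parameters such as those quoted above, the unit-cell volume of a monoclinic cell follows from V = a·b·c·sin β. The sketch below applies this standard relation to the P21(γ) values given in the text; the function name is our own, and the estimated standard deviations in parentheses are ignored.

```python
import math


def monoclinic_cell_volume(a, b, c, beta_deg):
    """Unit-cell volume in Å^3 for a monoclinic cell (unique axis b).

    Uses V = a * b * c * sin(beta), with beta given in degrees.
    """
    return a * b * c * math.sin(math.radians(beta_deg))


# P21(gamma) lattice parameters as quoted in the text (Å and degrees).
v_gamma = monoclinic_cell_volume(87.0749, 70.1190, 48.1679, 106.7442)
print(f"P21(gamma) cell volume: {v_gamma:.0f} Å^3")
```

The result is on the order of 2.8 × 10^5 Å^3, which illustrates how large these cells are compared with those of small organic compounds.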
Even more recent studies from our research team revealed two additional novel monoclinic polymorphs upon co-crystallization of HI with two phenolic derivatives, p-coumaric acid and resveratrol [99]. The first one, namely P21(η), was identified in the presence of p-coumaric acid (pH:  Table 2). However, both complexes adopt the rhombohedral R3 symmetry at pH values around 6.5 to 7.5, while for HI-p-coumaric acid crystals an additional first-order transition to a cubic phase (space group: I213) was detected. It has also been reported that binding interactions of ligands in the phenolic pockets are further stabilized by the binding of certain anions such as halides, pseudohalides, and organic carboxylates [124,128,141,142]. Based on the previously identified HI complexes with small organic molecules, distinct and novel monoclinic P21 polymorphs have been reported, mainly at mildly acidic pH (5.3-6.5), around the isoelectric point of HI. Concerning the pH of the newly identified polymorphs, we could speculate that HI molecules, owing to their decreased electric charge around the pI, are more prone to adopt various crystalline arrangements of low symmetry, a process strongly affected by the presence of the different ligands. Furthermore, it seems that the P21(ζ) polymorph has the highest packing efficiency among the P21 polymorphs, according to the percentage of unit-cell volume occupied by protein molecules [143], as listed in Table 3. Owing to the very dense molecular packing, additional inter-hexamer interactions may arise, further increasing stability and, thus, extending the life of crystalline insulin formulations. The latter could be of particular interest for the development of therapeutics, as the combination of tightly packed hexamers and a minimum amount of solvent is often linked directly with a prolonged dissociation period after injection.
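One common way to estimate the fraction of the unit cell occupied by protein is a Matthews-type calculation; whether this is exactly the procedure behind Table 3 is not stated here, so the following is only a sketch. It uses the bovine insulin R3 cell quoted earlier (pH 5.0) as input. The number of asymmetric units (9 for R3 in the hexagonal setting), the content of one insulin dimer per asymmetric unit, the round monomer mass of 5.8 kDa and the factor 1.23 Å^3/Da are assumptions made for illustration.

```python
import math

# Conventional conversion factor in Matthews-type estimates (assumed here).
PROTEIN_VOLUME_PER_DALTON = 1.23  # Å^3/Da


def hexagonal_cell_volume(a, c):
    """Volume in Å^3 of a cell in the hexagonal setting: V = a^2 * c * sin(120°)."""
    return a * a * c * math.sin(math.radians(120.0))


def matthews_coefficient(cell_volume, n_asym_units, mass_per_asym_unit_da):
    """Crystal volume per unit of protein mass, V_M, in Å^3/Da."""
    return cell_volume / (n_asym_units * mass_per_asym_unit_da)


def protein_volume_fraction(v_m):
    """Approximate fraction of the cell volume occupied by protein."""
    return PROTEIN_VOLUME_PER_DALTON / v_m


# R3 cell quoted in the text; asymmetric-unit content and mass are assumptions.
volume = hexagonal_cell_volume(82.5951, 33.6089)
v_m = matthews_coefficient(volume, n_asym_units=9, mass_per_asym_unit_da=2 * 5800.0)
fraction = protein_volume_fraction(v_m)
print(f"V_M = {v_m:.2f} Å^3/Da, protein fraction = {fraction:.0%}")
```

Comparing such fractions across polymorphs gives a rough ranking of packing density, provided the asymmetric-unit content of each polymorph is known.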
Crystals 2020, 10  The XRPD technique is increasingly used in the context of characterizing pharmaceutically important crystalline phases, which may display advantageous physicochemical characteristics such as altered solubility levels and prolonged release of the active pharmaceutical ingredient, based on identification of the composition of macromolecular polycrystalline precipitates.

Macromolecular Polymorph Screening: The Case of Urate Oxidase
The identification of novel HI formulations with remarkable physicochemical properties reinforced the use of powder diffraction as a fundamental tool in daily research, important for the identification and verification of batch-to-batch variations during large-scale crystallization in the production process. However, HI is not the only highly polymorphic protein on which the validity of XRPD has been demonstrated. Another molecule of high pharmacological importance, rasburicase (recombinant urate oxidase enzyme (Uox) from Aspergillus flavus), a homotetrameric enzyme of 135 kDa, was also examined.
Uox catalyzes the initial step in the degradation of uric acid to allantoin; however, it is absent in humans. Even though uric acid has strong antioxidant properties, higher concentrations of the molecule can lead to acute hyperuricemia and gout. Consequently, Uox can be used as a protein-based drug [24,144].
Crystallization may be employed in order to formulate a protein drug [35,145], as it ensures better stability of the molecule during storage than a solution does and has a considerably lower manufacturing cost than lyophilization. Additionally, this approach allows for a highly concentrated formulation with minimum viscosity, which makes drug handling significantly easier.
Different protocols were followed, exploiting a variety of crystallization conditions. In all cases, Uox complexed with the inhibitor 8-azaxanthine (AZA) remained in the orthorhombic I222 phase. However, in the absence of AZA during crystallization, ligand-free Uox was significantly affected by the type of salt, resulting in different crystal forms [35] (Figure 13). The related crystalline phases were characterized by means of high-resolution synchrotron X-ray powder diffraction, verifying the homogeneity and phase purity of the protein precipitates, whereas the extraction of accurate lattice parameters allows for direct observation of slight structural modifications due to radiation and/or sample-induced effects.

(e) Ligand-free Uox crystallized with NaCl and 8% PEG 8000 (P21), (f) ligand-free Uox crystallized with KCl and 10% PEG 8000 (P3121), (g) Uox complexed with AZA and crystallized with NaCl (I222). The black, red and lower black lines represent the experimental data, the calculated pattern and the difference between the experimental and calculated profiles, respectively (Q = 4π·sinθ·λ⁻¹). The vertical bars correspond to Bragg reflections compatible with the particular space group [24].
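The scattering-vector definition quoted in the caption, Q = 4π·sinθ/λ, makes patterns collected at different wavelengths directly comparable. A minimal sketch of the conversion from the diffraction angle 2θ to Q, and from Q to the d-spacing via d = 2π/Q, is given below; the function names and the example wavelength and angle are hypothetical and not taken from the experiments described.

```python
import math


def q_from_two_theta(two_theta_deg, wavelength):
    """Scattering vector magnitude Q = 4*pi*sin(theta)/lambda.

    two_theta_deg is the diffraction angle 2θ in degrees; the unit of Q is
    the inverse of the wavelength unit (Å^-1 for a wavelength in Å).
    """
    theta = math.radians(two_theta_deg / 2.0)
    return 4.0 * math.pi * math.sin(theta) / wavelength


def d_from_q(q):
    """Interplanar spacing from Bragg's law rewritten in Q: d = 2*pi/Q."""
    return 2.0 * math.pi / q


# Hypothetical example: a peak at 2θ = 2.0° measured with λ = 1.3 Å.
q = q_from_two_theta(2.0, 1.3)
print(f"Q = {q:.4f} Å^-1, d = {d_from_q(q):.2f} Å")
```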

Drug Screening
XRPD has been recently recognized to be at the forefront of industrial studies as an analytical tool of pharmaceuticals due to its wide range of applications [36]. Namely, the technique is ideal for the identification of impurities, monitoring of structural changes and different crystal or molecular polymorphs that often occur during drug formulation [37]. Therefore, in early drug development processes, XRPD is often used as a primary research technique and a means of differentiating between the experimentally generated materials [146].
The applicability of the method in detecting and certifying different polymorphs, as previously discussed, as well as its ability to detect fine characteristics of the microcrystals (for example their size and strains) allows for its use towards improvement of the final form of the drug, aiming at greater potency at the lowest possible cost [147]. This is an important aspect as any change in the crystalline state of the active ingredient(s) in the final product, as a result of the manufacturing process, can influence the drug's bioavailability. Thus, detection of any changes in morphology during production will ensure the consistent behavior of the final product, making the method directly related to the final drug performance.
Because XRPD measures the sample as a whole, materials can be investigated directly under the conditions in which they would be used for specific applications. In particular, the applicability of the method lies largely in the ability to determine the percentage of each individual crystalline component of the drug in the final dosage form, together with the percentage of any amorphous content or crystallization agents (i.e., salts) used [148].
As an additional advantage, XRPD can be employed for the analysis of final dosage forms, leading toward the determination of the integrity of the active ingredient in the final product, while its capacity for detection of crystalline impurities reaches 0.05% when inorganic or small organic molecules are under examination [146,149]. The crystallinity percentage is a valuable parameter for drug dosage forms in certain cases, as it has a significant influence on manufacturing and processing as well as the pharmacological behavior. In the following sections, the use of the XRPD method for the structural characterization of pharmaceutical peptides is reported. In addition, in-situ studies of the physicochemical stability of protein crystals in terms of variable temperature and relative humidity, as well as their applicability in the development of therapeutics, are also discussed.

Structure Refinement of a Pharmaceutical Peptide via XRPD
Currently, the majority of pharmaceutical products that are used to treat a wide spectrum of diseases are small-molecular-weight, well-characterized molecules that are generally manufactured by chemical synthesis [150]. Synthetic peptides that are analogues of natural hormones are of particularly high scientific interest due to their wide range of pharmaceutical and biological properties. In these cases, the artificial peptides are much smaller than the native hormone, while specific modifications in the amino acid sequence provide them with increased activity and resistance to proteases following their administration to the human body [151].
A representative example of such synthetic analogues is octreotide, an eight-amino-acid molecule that mimics the action of the 14-amino-acid human hormone somatostatin. Its superior characteristics lie mostly in its longer half-life (up to 2 h) compared with somatostatin, which means that it can be infused at intervals or even be orally administered [152]. Octreotide's multiple physiological functions and applications have led to its widespread clinical use.
Octreotide was derived from somatostatin-14 (SS-14), with amino acids 7 to 10 (Phe 7 -Trp 8 -Lys 9 -Thr 10 ) being retained, since they are considered essential for receptor binding. In octreotide, this active tetrapeptide sequence is structurally constrained by a disulfide bridge. Additionally, in octreotide the terminal Thr-COOH group is reduced to an alcohol group (Figure 14), which is, in theory, more stable to enzymatic degradation, while Trp 4 (L-Tryptophan) has been replaced by the non-natural enantiomer D-Tryptophan [153] in order to increase the peptide's biological activity, overcoming difficulties such as proteolytic degradation at the application site [154]. Thus, research on in vivo stable synthetic SS agonists has focused on peptides containing the necessary -Phe 7 -(D)Trp 8 -Lys 9 -Thr 10 - fragment.

Figure 14.
Comparison of amino acid sequences of somatostatin-14 and octreotide. The amino acids necessary for binding to the receptor are shaded [155].
Because the latest crystallographic study of this peptide was performed back in 1995 [153], our research team decided to conduct new XRPD measurements of freshly prepared polycrystalline specimens, in order to elucidate the three-dimensional arrangement of the peptide, examine its properties and investigate the existence of different polymorphs [34]. Additionally, the abovementioned study discusses whether the polycrystalline precipitates produced could be employed in the production of longer-lasting formulations of this molecule.

In Situ XRPD Measurements upon Variation of the Physicochemical Environment
Structural behavior as well as dehydration range tolerance in response to environmental changes are of extreme importance for a variety of pharmaceutical compounds with regard to optimization of their production and storage conditions. Today, a steadily increasing fraction of pharmaceutical compounds contain well-hydrated micro-/nano-crystals constituted from a wide selection of molecules ranging from inorganics to small organics and more recently peptides and proteins [158]. XRPD measurements upon relative humidity (rH) or temperature variation are routinely employed for identification of structural modifications for small organics and inorganics [36,159], an approach which until recently was not common for molecular microcrystals.
In cases of protein/peptide crystals, extensive amounts of solvent are present, surrounding macromolecules with layers of water molecules which preserve their structure during crystallization [160,161]. The amount of water is closely related to relative humidity or temperature levels around the sample. Even small changes in the sample's environment may cause subsequent alterations in solvent channels, driving protein molecules not to occupy exactly equivalent positions within or between unit cells, frequently leading to insufficient resolution of their diffraction patterns [28,29,162,163].
XRPD experiments upon variable temperature and relative humidity can be conducted employing laboratory X-ray sources properly equipped with a built-in transmission temperature-humidity chamber allowing for in situ studies with gradual variation of environmental conditions. The main goals of such experiments are either the improvement of the diffraction patterns obtained, or, from a biological point of view, the structural characterization of a molecule in a very specific condition or the inspection of its behavior upon rH variation.
Recently, the effect of relative humidity on protein crystal structures was investigated in two studies of hen egg-white lysozyme (HEWL) polycrystalline precipitates, via in situ laboratory XRPD measurements [28,29]. Two different crystallization protocols were employed in which microcrystals were grown using the salting-out approach [177] in batch by mixing equal amounts of protein solution and crystallization buffer [28,29].
In situ XRPD data were collected upon controlled rH variation using a laboratory Empyrean diffractometer (Malvern Panalytical) equipped with a built-in transmission temperature-humidity chamber (MHC-trans, Anton Paar) [178]. Polycrystalline specimens were loaded into thin Kapton-foil holders in order to reduce background contribution and were placed on a multiple-position sample holder inside the chamber. In order to investigate the behavior of HEWL crystals over a wide humidity range, two series of experiments were performed: direct crystal dehydration to lower humidity levels (type 1) and gradual crystal de-/re-hydration experiments (type 2). In general, all experiments were conducted following the steps: 1. Setting a specific rH level; 2. Equilibration (minutes to hours) between the sample and its environment; 3. XRPD data collection; 4. Change to a new rH level.
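The four-step loop described above (set rH, equilibrate, collect, move to the next level) can be written down as a simple measurement schedule. The sketch below only generates and lists such a schedule for the two experiment types; it does not control any instrument, and all rH levels, step sizes and equilibration times are hypothetical placeholders rather than the values used in [28,29].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    rh_percent: float         # target relative humidity for this step
    equilibration_min: float  # waiting time before data collection


def direct_dehydration(start_rh, target_rh, equilibration_min):
    """Type 1: jump from the starting rH straight to a lower target level."""
    return [Step(start_rh, equilibration_min), Step(target_rh, equilibration_min)]


def gradual_cycle(start_rh, lowest_rh, step_rh, equilibration_min):
    """Type 2: stepwise dehydration to lowest_rh, then stepwise rehydration."""
    n_steps = int(round((start_rh - lowest_rh) / step_rh))
    down = [start_rh - i * step_rh for i in range(n_steps + 1)]
    up = down[-2::-1]  # retrace the same levels without repeating the minimum
    return [Step(rh, equilibration_min) for rh in down + up]


def run(schedule):
    """Return the ordered list of actions for a schedule (steps 1 to 4)."""
    log = []
    for step in schedule:
        log.append(f"set rH to {step.rh_percent:g}%")
        log.append(f"equilibrate for {step.equilibration_min:g} min")
        log.append("collect XRPD pattern")
    return log


# Hypothetical type 2 experiment: 95% -> 75% -> 95% in 5% steps, 60 min each.
for action in run(gradual_cycle(95, 75, 5, 60)):
    print(action)
```

Keeping the equilibration time as an explicit field of each step reflects the point made in the text that waiting times range from minutes to hours and matter for the outcome.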
Once all diffraction patterns were obtained (Figure 17), they were indexed with the Dicvol indexing package [179] using the fitted positions of at least the first 20 reflections of the powder diffraction profiles. In order to obtain accurate values of the unit-cell parameters and characterize the peak shape and background coefficients without a structural model, Pawley fits were performed. All tasks were executed using the HighScore Plus software [180]. Analysis of the XRPD data collected during the humidity variation experiments revealed several structural modifications, as well as a novel monoclinic HEWL phase which, to our knowledge, has never been observed before. When HEWL was crystallized at pH 4.5 and 293 K, a new polymorph of monoclinic symmetry (space group P21) was obtained with unit-cell parameters a = 28.  Structural changes were observed during both direct and gradual dehydration of the crystals. When the rH levels were decreased slowly, crystals retained their structure for a longer time than during rapid humidity reduction. Rehydration of the already dehydrated crystalline samples was also attempted in order to examine whether the almost collapsed crystal matrix could be reorganized. In samples where crystallinity was not completely lost at low rH levels, rehydration was successful, restoring the crystal structure and diffraction data quality. However, after long exposure, collapse of the crystal matrix was irreversible. These experiments indicate that the lowest rH at which crystals preserve their structure is between 75% and 80% for those of monoclinic symmetry, and between 71% and 75% for those of tetragonal symmetry, and they underline the need for a sufficiently long waiting time for the crystalline samples to reach equilibrium [28,29].
This is the first study establishing a preliminary protocol for quick and accurate extraction of structural information from protein polycrystalline precipitates upon humidity variation using X-ray powder diffraction and laboratory instrumentation. These observations, on a well-studied molecule such as HEWL, underline not only the high impact of humidity on biological crystal structures, but also the significance of in-house XRPD as an analytical tool in industrial drug development and its potential to provide information for enhancing the manufacturing of pharmaceuticals.
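Indexing works backwards from observed peak positions to a unit cell; the forward calculation, from a trial cell to expected d-spacings, is what any candidate cell is checked against. The sketch below is not the Dicvol algorithm. It simply evaluates the standard monoclinic d-spacing formula and lists the 20 largest spacings. Because the HEWL cell is not fully given in the text above, the example uses the insulin P21(γ) cell quoted earlier; the function names and the index limit are our own choices, and systematic absences (such as 0k0 with k odd in P21) are not applied.

```python
import math
from itertools import product


def monoclinic_d_spacing(h, k, l, a, b, c, beta_deg):
    """d-spacing in Å of reflection (h k l) for a monoclinic cell, unique axis b.

    1/d^2 = [h^2/a^2 + k^2 sin^2(beta)/b^2 + l^2/c^2
             - 2 h l cos(beta)/(a c)] / sin^2(beta)
    """
    beta = math.radians(beta_deg)
    sin2 = math.sin(beta) ** 2
    inv_d2 = (
        h * h / (a * a)
        + k * k * sin2 / (b * b)
        + l * l / (c * c)
        - 2.0 * h * l * math.cos(beta) / (a * c)
    ) / sin2
    return 1.0 / math.sqrt(inv_d2)


def first_reflections(a, b, c, beta_deg, n=20, index_max=4):
    """Return the n reflections with the largest d-spacing as (d, (h, k, l)).

    Only one member of each symmetry-equivalent set (Laue class 2/m) is kept:
    k >= 0 and l >= 0, with h >= 0 when l == 0.
    """
    reflections = []
    indices = range(-index_max, index_max + 1)
    for h, k, l in product(indices, range(index_max + 1), range(index_max + 1)):
        if (h, k, l) == (0, 0, 0) or (l == 0 and h < 0):
            continue
        d = monoclinic_d_spacing(h, k, l, a, b, c, beta_deg)
        reflections.append((d, (h, k, l)))
    reflections.sort(key=lambda item: -item[0])
    return reflections[:n]


# Insulin P21(gamma) cell quoted earlier in the text (Å and degrees).
for d, hkl in first_reflections(87.0749, 70.1190, 48.1679, 106.7442):
    print(f"{hkl}: d = {d:.2f} Å")
```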

Conclusions and Perspectives
The present review article outlines the application of XRPD methods to different types of biological samples in order to design and improve pharmaceutical formulations. The important contribution of microcrystalline drug technology is indisputable due to its advantages in terms of the protection of beneficial substances, but also the screening of molecular and crystalline polymorphs, leading to prolonged action formulations. During the last twenty years, significant progress has been made in the field of macromolecular powder diffraction, while recent advances of experimental methods and computational tools have strengthened this technique and widened the systems that can be studied.
Polymorphism of therapeutic substances must be fully characterized in order to formulate a drug. XRPD has proved its applicability as the most suitable tool for high-throughput and accurate characterization of numerous microcrystalline suspensions by virtue of the simplicity of XRPD data collection and the uniqueness of each polymorph's diffraction pattern. To date, research on HI microcrystals has revealed fascinating polymorphism, occurring upon physicochemical modifications of their environment, namely pH, temperature, and relative humidity, or upon ligand binding, and further expanding the phase diagram of the molecule [26]. Further advantages of XRPD measurements include homogeneity and purity control of the precipitates, whereas, even in cases of challenging samples, powders can easily lead to the extraction of accurate lattice parameters, which allow for the detection of structural modifications. Moreover, the combined action of molecules used for co-crystallization could be exploited for the design of microcrystalline drugs with further benefits, whereas exploration of the physicochemical characteristics of the polymorphs obtained could lead to drugs that replace the highly concentrated injectable solutions available today, leading to a minimization of injection times and offering a life-quality improvement of great importance for millions of patients.
As discussed, a very important field that benefits from the XRPD technique is the development of drugs from small peptides/hormone analogues, a challenge with particular prospects due to the enhanced characteristics of the modified peptides, in combination with the protective properties of the crystals [154]. A deeper understanding of the physicochemical features related to the conformation and action of these peptides through enhanced ADME tools will help in accelerating the development of peptides into successful drugs [182].
Despite the widely reported advantages, the technique also has some limitations compared to SCXD, mainly related to the quantity of crystalline material needed and the quality of the diffraction data. The XRPD method requires a large amount of polycrystalline precipitate for a single measurement, which is rapidly destroyed by radiation damage, as cryoprotection strategies considerably affect diffraction quality. Additionally, there is a considerable loss of structural information, as three-dimensional diffraction data are collapsed onto one dimension.
The most important problem, nevertheless, arises from the overlapping reflections, as a large number of crystals contribute to different diffraction signals simultaneously, complicating considerably the analysis of the diffraction data obtained.
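The severity of the overlap problem can be gauged with a standard order-of-magnitude estimate: the number of reciprocal-lattice points with spacing d ≥ d_min is roughly N ≈ (4π/3)·V/d_min³, where V is the unit-cell volume. All of these reflections fall onto the single 2θ axis of a powder pattern. The sketch below compares a hypothetical small-molecule cell with a cell of the size of the insulin P21(γ) polymorph (about 2.8 × 10^5 Å^3, from a·b·c·sin β of the parameters quoted earlier). The estimate ignores symmetry equivalence and systematic absences, and the resolution limit is an arbitrary example.

```python
import math


def reflections_within_resolution(cell_volume, d_min):
    """Approximate count of reciprocal-lattice points with d >= d_min.

    Counts lattice points inside a sphere of radius 1/d_min in reciprocal
    space, where the point density equals the real-space cell volume.
    """
    return (4.0 * math.pi / 3.0) * cell_volume / d_min ** 3


D_MIN = 3.0            # Å, arbitrary example resolution limit
SMALL_CELL = 1.0e3     # Å^3, hypothetical small-molecule cell
PROTEIN_CELL = 2.8e5   # Å^3, approximate volume of the P21(gamma) insulin cell

n_small = reflections_within_resolution(SMALL_CELL, D_MIN)
n_protein = reflections_within_resolution(PROTEIN_CELL, D_MIN)
print(f"small molecule: ~{n_small:.0f} lattice points to {D_MIN} Å")
print(f"protein:        ~{n_protein:.0f} lattice points to {D_MIN} Å")
```

Since the count scales linearly with cell volume, a protein cell packs a few hundred times more reflections into the same angular range than the small-molecule example, which is why peak overlap dominates the analysis.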
However, advances in instrumentation as well as the development of powerful crystallographic software have significantly facilitated the collection of high-resolution diffraction data and have made XRPD particularly useful for the extraction of structural information. For example, the ID22 experimental station at ESRF provides the possibility of using a two-dimensional detector in combination with crystal analyzers in order to retrieve high-resolution powder diffraction patterns [183], while the angular resolution of datasets has been improved by employing the Mythen II detector of MS-X04SA at SLS [157] and following a strategy that combines vertical focusing of the beam on the detector instead of the sample with data collection at multiple detector positions.
Furthermore, the introduction of the X-ray free-electron laser (XFEL) to the study of structure and dynamics in biology has the potential to prevent the effects of radiation damage [184]. XFELs provide femtosecond pulses with up to 10^12 times higher photon flux than synchrotrons [185], allowing both structure determination and time-resolved studies of submicrometer crystals that are too small for XRPD measurements, by delivering them to the XFEL beam in a stream of their mother liquor at room temperature [186,187]. The speed and brightness offered by XFELs are crucial for certain types of experiments, and pulses are so short that data can be collected while avoiding the effects of radiation damage [188]. This application of XFELs is valuable for the field of structural biology, creating many new opportunities for crystallography and imaging at atomic resolution on timescales from femtoseconds to seconds (Serial Femtosecond Crystallography/SFX) [189].
In order to fully understand the biochemical operations that macromolecules accomplish, characterizing the corresponding molecular mechanisms is essential. Structural visualization is invaluable, especially when done for multiple functional states of the macromolecule of interest [190]. X-ray crystallography has been the primary technique for determining models of macromolecules and macromolecular complexes at atomic resolution during recent decades [191]. This approach has been enormously powerful but is limited by the fact that the molecule or complex of interest must be crystallized, which is not always possible [191]. When macromolecules and complexes prove hard to crystallize, or cannot be produced at a sufficient concentration to even attempt crystallization trials, 3D electron microscopy is a potential alternative to X-ray crystallography that is quickly gaining popularity among structural biologists [185]. In the electron microscope, in order to withstand the high vacuum and minimize the visible effects of radiation damage, which strongly affect biological studies, samples can be studied in a frozen hydrated state after vitrification (cryo-EM).
Overall, although XRPD on macromolecules usually requires a cooperative employment of different X-ray sources and instrumentation to provide data suitable for structure determination, recent studies have indicated that XRPD is an efficient tool in structural biology, which can be employed routinely, providing insight into important biological problems.

Funding: Part of the research conducted and presented in this review article has been financially supported by the Hellenic Foundation for Research and Innovation (HFRI), Andreas Mentzelopoulos scholarships of University of Patras, COST Action (CM1306) and the NanoMEGAS company.