Independent Validation of an In Silico Tool for Pilot-Scale Pharmaceutical Crystallization Process Development

Abstract: There are many published models for predicting crystal size distribution (CSD) in the literature. However, none of them have been independently and comprehensively tested, which is important for industrial acceptance of and confidence in these models. Therefore, in this study, using solubility and kinetic data from the literature, an in silico tool for predicting the crystallization process performance of a model compound system (paracetamol in ethanol) was developed and challenged by independent experiments at the 50 L pilot scale. The solute concentration was tracked, and the final CSD was quantified using three measurement techniques, including a novel image analysis tool. The reported parameter uncertainties were also addressed using Monte Carlo simulations. The results showed that, when the models were used within their validity range (e.g., suspended solids), they were able to describe the observed process trends/dynamics (CSD and solute concentration) under varying experimental conditions (cooling time and seed mass) with R2 ranging from 0.72 to 0.90. Overall, the results indicate that, using Monte Carlo simulations to account for known parametric uncertainties, the models can support model-based approaches for crystallization process development from scale-down to scale-up studies as well as control evaluation.


Introduction
Crystallization remains the most widely applied separation and purification technique for the recovery of solids from solutions in the pharmaceutical industry. The design of the operating conditions of a crystallization process plays a crucial role, since properties of the crystal product such as purity, morphology, size distribution, and polymorphic form are determined by the operating conditions. Moreover, the performance of the downstream processes following the crystallization step, as well as the efficacy of the drug formulation, are strongly affected by these product properties [1,2]. Conventional pharmaceutical manufacturing has followed recipe-based strategies to operate crystallization processes in accordance with the specifications submitted for regulatory procedures. Crystal product quality has been ensured by testing at the end of the process, so-called quality-by-testing (QbT). This traditional approach has succeeded in delivering quality products but also often leads to profit losses due to failed batches that are only tested at the end of the process [2][3][4]. Moreover, recipe-based approaches also require a high number of experiments and considerable resources in order to determine a robust envelope for a crystallization operation that also remains reliable during scale-up [5]. Meanwhile, the need for more efficient process development and manufacturing in the pharmaceutical industry has emerged due to challenges such as competition from the generic market following the patent expiration of drugs, the increasing costs of R&D, awareness of greener products and processes, a changing mindset in industry, as well as regulatory incentives [6].
In academia, extensive efforts have been made to improve the understanding, modeling, and development of advanced process control strategies for crystallization processes [7][8][9][10][11][12]. Progress on population balance models has given more insight into the effects of the major process phenomena on the quality of the crystal product. Moreover, the introduction of process analytical technology (PAT) tools and incentives by regulatory authorities have promoted research and development in crystallization monitoring and control, driving improvements in the critical crystal product quality attributes. However, pharmaceutical companies have rarely reported the integration and utilization of the developed technologies in their manufacturing, or their mindset on crystallization process development from the laboratory to production scales [1,3]. One of the main reasons is the scarcity of convincing studies on the independent validation of the developed tools in large-scale crystallization processes.
Therefore, in this work, a study on the independent validation of an in silico tool for pharmaceutical crystallization process development is presented. The motivation was to test independently whether model-based approaches work, i.e., whether they are capable of predicting process trends. If so, they are useful for accelerating crystallization process development and scale-up studies, especially in the pharmaceutical industry. Often in the literature, crystallization kinetic models are developed by first performing several experiments and then fitting a model to the data. Afterwards, the validity of the developed models is tested against other experimental data by the same research group, using the same experimental setup, and mostly at laboratory scale. An affirmative, confirming outcome is obtained, as expected. However, whether these models are also valid independently of the research group and experimental facility remains doubtful. Therefore, in this work, we put emphasis on the term "independent". To this end, crystallization process simulations were performed first, and physical experiments were performed afterwards to verify "independently" the predictive ability of the model. Accordingly, the crystallization kinetic models taken from the literature were used directly, without any modification or parameter fitting, and the simulation results were then compared with the data obtained from the crystallization experiments. Herein, a particular challenge for the model predictions was created by performing the experiments at a 50 L pilot scale. Four different cases were tested to understand the strengths as well as the pitfalls of the models. The differences between the reported kinetic data for the same compound were also taken into account in the modeling by means of Monte Carlo simulations.
Three different measurement techniques, namely, laser diffraction, focused beam reflectance measurement (FBRM), and a novel microscopy-based image analysis, were used in order to quantify the crystal size distribution (CSD), or a footprint of the CSD, and to understand the feasibility of the available CSD measurement tools for modeling. To the best of the authors' knowledge, this study is a novel work with an independent pilot-scale application to a pharmaceutical crystallization process; it aims to build trust in model-based approaches and to encourage pharmaceutical companies to invest in, as well as to change mindsets towards, the development and implementation of in silico tools.

In Silico Tool Framework
The framework of the in silico tool for crystallization process development is shown in Figure 1. In the framework, solubility and crystallization kinetic equations are the main inputs to the kinetic model, which includes mass, energy, and population balances. The parameters of the solubility and crystallization kinetic equations are considered together with their reported uncertainty ranges (95% confidence intervals of the estimated parameters) and, if available, with correlation information as well. After generating a predefined number of random parameter values within their uncertainty ranges based on an appropriate sampling technique, the same predefined number of Monte Carlo simulations of the crystallization kinetic model are performed to obtain all the possible model prediction variabilities, or uncertainties, due to the propagated input parameter uncertainties. The final CSD and the solute concentration are selected as the comparison criteria for the model output, since they are also experimentally quantifiable. The results obtained from the model simulations are compared with the experimental data. If the experimental data confirm the simulation predictions, the model is considered validated. Otherwise, mixing in the specific crystallizer should be studied further (e.g., via computational fluid dynamics (CFD) and mixing modeling) in order to take the effects of mixing on the crystallization kinetics into consideration as well. While the momentum balance of the solvent phase can be obtained from single-phase CFD simulations, the momentum balance of the solid phase might also require consideration of the classification of the solids in the crystallizer geometry. A comprehensive tool includes a crystallization kinetic model coupled with the mixing behavior based on a compartmental modeling approach. However, CFD and the corresponding compartmentalization studies are not reported in the current work.
The validated model can be used for exploring the process operation space, uncertainty and sensitivity analysis, process risk assessment, and development and testing of different process control strategies.

Crystallization Modeling and Monte Carlo Simulations
A population balance equation combined with mass and energy balance equations was used to model the batch cooling crystallization process (Equations (1)-(4)). The method of classes was applied to solve the population balance equation [13][14][15][16]. Only nucleation and growth were considered as dominant mechanisms during crystallization. The population balance is given in Equation (1):

∂(n(L, t)V)/∂t + V ∂(G n(L, t))/∂L = V B δ(L − L0) (1)

where n(L, t) is the crystal number density (#/m3), B is the total nucleation rate (#/m3 s), t is the time (s), δ(L − L0) represents the birth of crystals only into the first class at L0 (m), G is the growth rate of the crystals (m/s), and V is the volume of the solvent (m3). In the method of classes, the internal size domain (L) was discretized into 500 classes between 0 and 2000 µm. The solute concentration, C (kg/kg), is described in Equation (2):

dC/dt = −3 kv (ρc/ρl) ∫ G L^2 n(L, t) dL (2)

where kv is the volume shape factor of the crystals, ρc is the crystal density (kg/m3), and ρl is the solvent density (kg/m3). The bulk temperature, Tb (°C), and the cooling water temperature, Tc (°C), are described in Equations (3) and (4):

ρl V Cp,l dTb/dt = −U A (Tb − Tc) − (∆Hc/Mc) ρl V dC/dt (3)

ρw Vc Cp,w dTc/dt = ρw Fw Cp,w (Tc,in − Tc) + U A (Tb − Tc) (4)

where Cp,l is the heat capacity of the solvent (2460 J/kg °C), U is the overall heat transfer coefficient (75 W/m2 °C), A is the heat transfer area (0.65 m2), ∆Hc is the heat of crystallization (26.140 × 10^6 J/kmol), and Mc is the molar mass of paracetamol (kg/kmol). Cp,w, ρw, and Fw are the heat capacity, density, and in/out flow rate of the cooling liquid, respectively; Vc is the cooling-jacket volume, and Tc,in is the inlet cooling water temperature.

For a comparison with the experimental data, a linear cooling profile from 40 °C to 20 °C over the experimental cooling time was applied to the bulk solution. However, the cooling water flow rate in the pilot-plant crystallizer was controlled by an internal controller during the actual experiments; therefore, these data were not available. For this reason, the experimental cooling liquid temperature data were used directly in Equation (3), first to estimate the heat transfer coefficient and then to predict the bulk temperature, Tb. It is also possible to use dimensionless correlations such as the Nusselt number correlation [17] for the calculation of heat transfer coefficients. However, the parameters of these correlations are often available only for commonly used vessel geometries and internals. As a case study, the batch cooling crystallization of paracetamol from ethanol was simulated using the modeling tool implemented in Matlab/Simulink.
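As an illustration of the method of classes, the following minimal sketch advances a discretized number density with first-order upwind growth and nucleation birth into the first size class. The seed population, growth rate, and nucleation rate values are made up for illustration; they are not the paper's fitted parameters.

```python
import numpy as np

def simulate_pbe(n0, L_edges, G, B, t_end, dt):
    """Advance the number density n(L, t) (#/m^4: per m^3 of solvent, per m of
    size) with size-independent growth G (m/s, first-order upwind) and a
    nucleation rate B (#/m^3 s) born into the first size class."""
    n = n0.copy()
    dL = np.diff(L_edges)                         # class widths (m)
    for _ in range(int(t_end / dt)):
        flux = G * n                              # growth flux at class upper edges
        dn = np.empty_like(n)
        dn[0] = B / dL[0] - flux[0] / dL[0]       # nucleation birth + growth out
        dn[1:] = (flux[:-1] - flux[1:]) / dL[1:]  # growth in from below, out above
        n = n + dt * dn
    return n

# 500 classes between 0 and 2000 um, as in the paper
L_edges = np.linspace(0.0, 2000e-6, 501)
n0 = np.zeros(500)
n0[10:25] = 1e10                                  # hypothetical seed population
n_final = simulate_pbe(n0, L_edges, G=1e-8, B=1e5, t_end=600.0, dt=0.1)
```

Because the upwind scheme is conservative, the total crystal number only changes through the nucleation source term, which is a convenient sanity check for the discretization.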
The solubility of paracetamol in ethanol has been measured at different temperatures in the literature [18][19][20]. The solubility expression developed by Fernandes (1999) [18] was used in this work and is shown in Table 1. The kinetic expressions for the nucleation and growth mechanisms of the paracetamol-ethanol system were adopted from previous studies [20][21][22][23] and are also shown in Table 1. Uncertainties in the model input parameters were considered in the process model by means of a Monte Carlo (MC)-based uncertainty propagation approach. The idea was to perform multiple simulations by assigning multiple random values to the uncertain parameters within their individual uncertainty ranges. As a consequence, all the possible process outcomes (outcome uncertainties) due to the parameter uncertainties were obtained. This was achieved by performing the following steps [24]:
• Step 1. Definition of input uncertainty: the parameters of crystal nucleation and growth, solubility, and the volumetric shape factor were considered sources of uncertainty in the model. The reported 95% confidence intervals of the nucleation and growth parameters were regarded as their uncertainty ranges. In the work of Fernandes (1999), the temperature range of the solubility measurements was between 35 and 10 °C, and the standard deviation of the measured solubility was reported instead of the confidence interval of the estimated solubility parameters. Therefore, the 95% confidence interval of the predicted solubility was calculated assuming a sample number of 6. Another source of uncertainty in the model was the volume shape factor, kv, which is used in the calculation of the crystal volume of each size class and in the solute mass balance. The uncertainty range for kv was determined to be between 0.674 and 0.866, the reported values for paracetamol crystals [18,25].
• Step 2. Sampling from the input uncertainty space: random samples (values) from the input space (uncertainty range) with a predefined sampling number N were created with an appropriate sampling technique (e.g., Latin Hypercube Sampling (LHS)). The choice of the sample size, N, is a tradeoff between the MC integration error, σM = σOutput/√N (where σOutput is the standard deviation of the output of interest), and the computational time of the model simulations. Often, a large sample size is recommended for MC-based uncertainty analysis so that the calculated MC integration error remains low [24,26]. A sample size of 200 was selected since it was sufficient to obtain statistically significant results (e.g., a σM of 4.2 × 10−5). In the work of Mitchell (2012) [23], no information regarding the correlation between the estimated parameters was reported, while in the work of Meisler (2014) [27], a correlation matrix between the estimated nucleation and growth parameters was given. Therefore, two different sets of sampling matrices were formed (a total of 400 samples, with an N of 200 for each parameter set). In the first set, using the parameters reported by Mitchell (2012), a sampling matrix based on the LHS technique was created [28]. In the second set, using the parameter values and the correlation matrix reported by Meisler (2014), a second sampling matrix was created in which the correlation between the parameters was taken into account [29]. Plots of the generated sampling matrices can be found in the Supplementary Materials.
• Step 3. Performing Monte Carlo simulations: the total of 2 × N model simulations, with the 2 sets of N input parameter samples created in the previous step, were performed.
• Step 4. Review and analysis of the results: the outputs of the 2 × N model simulations were plotted, and the results were statistically analyzed (e.g., mean, standard deviation, etc.) [24].

Table 1. Literature data on the solubility and kinetics of paracetamol crystallization from ethanol, and parameter values with 95% confidence intervals.
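Steps 1-4 can be sketched as follows. The two-parameter toy growth model and its uncertainty bounds are illustrative stand-ins, not the kinetic parameters of Table 1; the Latin Hypercube sampler is a minimal uncorrelated version written with NumPy only.

```python
import numpy as np

rng = np.random.default_rng(0)

def latin_hypercube(n, bounds, rng):
    """Draw n Latin Hypercube samples; bounds is a list of (low, high) pairs.
    One point falls in each of n equal-probability strata per parameter, and
    the strata are shuffled independently across parameters."""
    d = len(bounds)
    strata = np.tile(np.arange(n), (d, 1))        # stratum indices per parameter
    strata = rng.permuted(strata, axis=1).T       # shuffle each parameter's strata
    u = (strata + rng.random((n, d))) / n         # stratified uniforms in (0, 1)
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    return lo + u * (hi - lo)

def toy_growth_rate(kg, g, S=1.05):
    """Toy power-law growth model G = kg * (S - 1)^g (illustrative only)."""
    return kg * (S - 1.0) ** g

# Hypothetical 95% ranges for a growth prefactor and a growth order
samples = latin_hypercube(200, [(1e-8, 5e-8), (1.0, 2.0)], rng)
outputs = np.array([toy_growth_rate(kg, g) for kg, g in samples])
sigma_M = outputs.std(ddof=1) / np.sqrt(len(outputs))   # MC integration error
```

Correlated parameter sets, as needed for the Meisler (2014) case, would additionally require transforming the uniform samples through the reported correlation matrix (e.g., a rank-correlation-preserving reordering), which is omitted here.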

Sensitivity Analysis for Design of the Validation Experiments
The prediction performance of the developed tool was tested through independent experiments at the 50 L pilot scale. Traditional approaches to the design of experiments demand a high number of experiments, time, and resources, which is not practical for pilot-scale applications. Therefore, we also aimed to use the mechanistic models and the developed tool to investigate the relationship between different operational design variables and the process performance, and eventually to arrive at fewer and more targeted experiments. Herein, we performed a sensitivity analysis to determine which operational parameters to vary in the experiments by quantifying the sensitivity of selected process outputs to variations in the operational design parameters. Cooling time, cooling profile, seed mass, and seed size fraction (including mean and standard deviation) were the operational variables selected, while the yield and nucleation rate were defined as the process performance outputs. Two sensitivity analysis techniques were employed: Morris screening [30][31][32] and Polynomial Chaos Expansion (PCE)-based Sobol indices [33,34] as a variance decomposition method.
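A minimal, self-contained version of Morris screening looks like the sketch below. The toy yield model and the parameter ranges are illustrative assumptions, not the actual process model or design space; mu* (the mean absolute elementary effect) is the importance measure used for ranking.

```python
import numpy as np

def morris_mu_star(f, bounds, r=20, rng=None):
    """Morris screening: r random one-at-a-time trajectories, returning the
    mean absolute elementary effect (mu*) per input as an importance measure."""
    if rng is None:
        rng = np.random.default_rng(1)
    d = len(bounds)
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    delta = 0.25                                # step size in the unit hypercube
    ee = np.zeros((r, d))
    for t in range(r):
        x = rng.random(d) * (1.0 - delta)       # base point, leaving room for +delta
        for j in rng.permutation(d):            # step each input once, random order
            x2 = x.copy()
            x2[j] += delta
            y1 = f(lo + x * (hi - lo))
            y2 = f(lo + x2 * (hi - lo))
            ee[t, j] = (y2 - y1) / delta        # elementary effect in unit space
            x = x2
    return np.abs(ee).mean(axis=0)

def toy_yield(p):
    """Toy stand-in for the process model: yield vs (cooling time s, seed mass kg,
    stir rpm); coefficients are invented for illustration."""
    cool_time, seed_mass, stir = p
    return 0.8 * np.tanh(cool_time / 3600.0) + 0.15 * seed_mass + 0.001 * stir

mu_star = morris_mu_star(toy_yield, [(1800.0, 10800.0), (0.1, 1.0), (50.0, 150.0)])
```

For the linear terms of the toy model, the elementary effects are exact regardless of the base point, so the ranking of seed mass versus stirring is deterministic here.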

Materials
Paracetamol (acetaminophen) and n-hexane were purchased from Merck Millipore. Ethanol 99.9% was obtained from Univar A/S.

Apparatus
For evaporation of the solvent, a Rotavapor R-210 from Büchi Labortechnik AG with a B-491 heating bath and a V-850 vacuum controller was used. For drying of the samples overnight, a VACUTHERM vacuum oven from Thermo Fisher Scientific was used (at 40 °C). For the crystal size measurements based on the laser diffraction method, a Beckman Coulter LS 13 320 Laser Diffraction Particle Size Analyzer with a dry powder system (TORNADO) was used. A ParticleTrack G400 from Mettler Toledo was used for measuring the chord length distribution based on the FBRM technique. Crystal size distribution measurements based on image analysis were performed using a Particle Analyzer instrument (oCelloScope) applying the Standard Segmentation Algorithm version 0.4.14.161, which was kindly provided by the company ParticleTech ApS in Farum, Denmark.

Crystallizer Geometry
A crystallizer (Büchi AG, Switzerland) equipped with a curved three-blade turbine and two baffles, shown in Figure 2, was used. The baffle with three wings had a temperature sensor (Pt100) at its bottom. The ethanol volume in the crystallizer was 50 L, which increased to about 62 L after the addition of 10.8 kg of paracetamol, corresponding to a solution height of 535 mm. The impeller was operated at two different rotation speeds, 50 rpm and 150 rpm, both of which create turbulent flow in the crystallizer, with Reynolds numbers of 40,288 and 120,860, respectively.
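The turbulence claim can be checked with the stirred-tank Reynolds number Re = ρND²/µ. The impeller diameter and the ethanol properties below are assumed round values for illustration, not the actual equipment specification, so the computed numbers differ somewhat from the paper's.

```python
def impeller_reynolds(rpm, d_imp=0.25, rho=789.0, mu=1.0e-3):
    """Stirred-tank Reynolds number Re = rho * N * D^2 / mu, with N in rev/s.
    Impeller diameter (m) and ethanol density/viscosity are assumed values."""
    return rho * (rpm / 60.0) * d_imp ** 2 / mu

re_low, re_high = impeller_reynolds(50), impeller_reynolds(150)
# Both exceed the commonly cited fully turbulent threshold (~10^4) for
# baffled stirred vessels; Re scales linearly with the rotation speed.
```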

Seed Preparation
Paracetamol seeds were prepared by sieving the purchased product. Sieved paracetamol below 300 µm was used as the seed material.

Crystallization Experiments
A sensitivity analysis was performed in order to identify the experimental conditions so that fewer but more targeted experiments could be performed. The purpose of the sensitivity analysis was to study which design parameters have the largest effect on the outputs of interest and, therefore, to obtain a parameter importance ranking. Once the most influential parameter pair was identified, the experiments were designed to vary those parameters. Among the operating parameters, cooling time, cooling profile, seed mass, and seed size fraction (mean and standard deviation) were selected for investigation in the sensitivity analysis. Yield and nucleation rate were selected as the output performance criteria, since a higher yield was desired and nucleation was to be avoided. It is important to emphasize that the purpose of these experiments was to test the prediction ability of the developed model rather than to optimize the process. Detailed information about the design space of the studied parameters and the calculated sensitivity indices can be found in the Supplementary Materials. The results of the sensitivity analysis showed that cooling time, seed mass, and cooling profile were the most important design parameters. Slight changes in the importance ranking of these parameters were obtained depending on the output of interest and the sensitivity analysis method. Consequently, seed mass and cooling time were chosen to be varied under the guidance of the sensitivity analysis results. Moreover, the impeller speed was defined as a third variable, since the mixing behavior and its effect on the process are often a concern, particularly in large-scale systems. Crystallization of paracetamol from ethanol was conducted at the four different experimental conditions listed in Table 2. At the beginning of each experiment, an initial supersaturation ratio of 1.01 at 40 °C was targeted.
For this purpose, in the first experiment, Exp P01, 10.8 kg of paracetamol was dissolved in 39.5 kg of ethanol (50 L), heated up to 50 °C, and kept at this temperature for 30 min to ensure complete dissolution. Afterwards, the solution was cooled down slowly to 40 °C. Each experiment started when the seed material was added into the crystallizer and jacket cooling was activated. During the experiments, several samples were taken from the bottom of the crystallizer for gravimetric measurement of the concentration. The samples were first filtered under vacuum to separate the solid crystals from the solution; 50 mL of the filtered solution sample was weighed, and the ethanol solvent was evaporated in a rotary evaporator operated between 150 and 98 mbar at 40 °C. Complete drying of the samples was ensured by keeping them overnight in a vacuum oven. The dry paracetamol samples were weighed again to determine the amount of evaporated ethanol, so that the solution concentration in terms of kg of paracetamol per kg of ethanol could be calculated. This procedure was repeated twice for each sample. At the end of each crystallization experiment, an additional sample of around 1 L was taken for the analysis of the final CSD. To separate the crystals from the suspension, the sample was vacuum filtered and washed twice with pure ethanol at 20 °C using the displacement wash technique. In this washing step, pure ethanol solvent maintained at the same temperature as the suspension sample was added on top of the filter cake while the process solvent still stood on the filter up to the level of the crystal cake. Since the system was under vacuum, the newly added solvent quickly replaced the process solvent and washed the crystals. This method is advised in order to avoid cracking of the crystals, crystallization of the impurities, or inclusion of another solvent. Afterwards, the crystals were dried overnight in the vacuum oven.
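As a worked example of the gravimetric concentration calculation described above (all masses below are made up for illustration, not measured values):

```python
def concentration_kg_per_kg(m_solution_g, m_dry_solute_g):
    """Solute concentration (kg paracetamol per kg ethanol) from the weighed
    solution sample and the dry residue left after evaporating the solvent."""
    m_ethanol_g = m_solution_g - m_dry_solute_g   # mass of evaporated solvent
    return m_dry_solute_g / m_ethanol_g

# e.g., a 41.2 g solution sample leaving 8.1 g of dry paracetamol
c = concentration_kg_per_kg(41.2, 8.1)
```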
The size distribution of the obtained paracetamol crystals was measured using three different techniques, which are discussed in the following section. CSD analysis based on the laser diffraction method was performed using dry crystals. For the image analysis, the crystals were suspended in n-hexane, which does not dissolve paracetamol, and were analyzed in a titer-plate in the previously mentioned image analysis tool. For the offline FBRM measurement, 5 g of crystals was put into 100 mL of distilled water in a reactor and mixed at an impeller speed of 200 rpm, and the chord length distribution (CLD) was measured. It is also important to mention that, after the first experiment, the remaining suspension was not discarded but reused for the next experiment. For this purpose, at the end of each experiment, the suspension was reheated to 50 °C to dissolve the crystallized paracetamol, and another sample was taken for the measurement of the concentration. During the experiments, the volumes of all samples taken were recorded as well. Using this information (concentration and removed sample volumes), the required amounts of solid (paracetamol) and solvent (ethanol) were calculated and added into the crystallizer before the next experiment, so that the same initial conditions could be achieved.
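The replenishment arithmetic described above can be sketched as a simple mass balance; the target concentration and removed mass below are illustrative, and the sketch ignores secondary losses (e.g., solvent evaporation between batches).

```python
def makeup_masses(c_target, m_solution_removed_kg):
    """Paracetamol and ethanol (kg) to add back after removing solution
    samples, so that the original charge at concentration c_target
    (kg solute per kg solvent) is restored."""
    m_ethanol = m_solution_removed_kg / (1.0 + c_target)
    m_paracetamol = m_solution_removed_kg - m_ethanol
    return m_paracetamol, m_ethanol

# e.g., 2.5 kg of solution removed at a hypothetical 0.216 kg/kg concentration
m_p, m_e = makeup_masses(0.216, 2.5)
```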

Crystal Size Distribution Measurement Techniques
Crystal size distribution is one of the major quality attributes used to specify a crystalline material [35]. In industrial practice, CSD is often measured based on offline analysis, i.e., a (dry) sample is taken and quantified in a laboratory. It is also known that the quantified CSD strongly depends on the measurement principle of the technique utilized. Additionally, the more the crystal shape deviates from a sphere, the larger the deviation between the CSDs quantified by different techniques [36]. The most commonly used CSD measurement techniques are laser diffraction, FBRM [37], and image analysis.
The laser diffraction technique is based on the principle of measuring forward scattering (diffraction) of the light from a laser that is related to the projected area of the crystals.
The orientation of a spherical crystal does not influence the projected area in the laser beam. However, this is not valid for non-spherical crystals. Assuming a random orientation of the non-spherical crystals in the laser beam, an average diameter over all the dimensions of the crystal is quantified by the light scattering method. As a consequence, a tail in the volume distribution appears as the shape of the crystal deviates from a sphere [36]. Another aspect to consider is that the most common commercial laser diffraction equipment reconstructs the size distribution from the diffraction pattern using a scatter matrix derived for spherical particles. This causes larger deviations in the volume distribution calculations, especially for flake-like or elongated particles compared to cubic ones [38].
On the other hand, the FBRM technique is used by a large community as an online monitoring and control tool. The FBRM sensor measures backscattered light using a focused laser beam that scans along the full length of a particle passing the probe window. As the laser beam intersects the edge of a particle, the backscattered light is detected by a detector in the same probe, producing a rise in the signal until the laser beam reaches the other edge of the particle. The chord length (a straight line between two edges of a particle) is calculated by multiplying the rise time by the tangential velocity of the rotating laser beam. It is still complicated to establish a direct correlation between the measured chord lengths and the actual CSD, and often a sensor model might be needed [37,39]. The measured chord length distribution also depends on the optical properties of the particles, which requires particular caution when measuring transparent particles.
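The rise-time-to-chord-length conversion can be illustrated numerically. The scan radius and rotation frequency below are assumed values chosen only to make the arithmetic concrete; they are not the specifications of the ParticleTrack G400.

```python
import math

def chord_length_um(rise_time_s, scan_radius_m=4.0e-3, rev_per_s=75.0):
    """Chord length (um) = rise time x tangential velocity of the rotating
    beam, v = 2*pi*r*f. The instrument geometry values are illustrative."""
    v = 2.0 * math.pi * scan_radius_m * rev_per_s   # tangential velocity (m/s)
    return v * rise_time_s * 1.0e6

chord = chord_length_um(5.0e-5)   # chord implied by a 50 us signal rise
```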
The image analysis method applies advanced algorithms to images taken by a high-speed, high-resolution camera. These algorithms identify the crystals and quantify the related sizes (e.g., average feret diameter) of the detected crystals. Image analysis can be used for dynamic online systems and can provide fast tracking of changes in the crystals. However, a large number of images might need to be processed so that a representative sample of the crystal population is obtained [37]. The image analysis equipment used in this work is based on the oCelloScope optical scanning technology. The instrument contains a digital camera, an illumination unit, and a lens. It enables the rapid and high-throughput generation of a sequence of 6.25°-tilted images along the horizontal plane, forming an image Z-stack [40] from which a best-focus image is constructed. The best-focus image is then processed by applying the advanced segmentation and feature extraction algorithms developed by ParticleTech ApS. As a result, detailed 3D information on each identified particle down to 0.5 µm in size is obtained. Dry and suspension samples can be analyzed offline (using a titer-plate or microscope glass slide) or at-line (pumping the suspension into a flow-cell) [41].

Statistical Performance Indicators
For the comparison between the experimental measurements and the model predictions, statistical measures such as the coefficient of determination (R2), root-mean-square error (RMSE), average absolute error (AAE), and average relative error (ARE) were used [39,42,43]. R2 describes the goodness of the model fit and is shown in Equation (5):

R2 = 1 − Σi (yi,Exp − yi,Pred)^2 / Σi (yi,Exp − µ)^2 (5)

where yi,Exp is the experimental measurement and yi,Pred is the model prediction for the data points i = 1, . . . , N, and µ is the mean of the experimental measurements.
RMSE is a measure of the differences between the observed and predicted values and is calculated using Equation (7):

RMSE = [ Σi (yi,Exp − yi,Pred)^2 / N ]^(1/2) (7)

AAE measures the deviation of the predicted values from the experimental measurements, and the calculation of AAE is shown in Equation (8):

AAE = Σi |yi,Exp − yi,Pred| / N (8)

ARE measures the average relative error calculated with respect to the experimental measurements, as shown in Equation (9):

ARE = (1/N) Σi |yi,Exp − yi,Pred| / yi,Exp (9)
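The four indicators can be implemented compactly as below; this is a minimal sketch in which R2 takes its standard 1 − SSres/SStot form, and ARE assumes strictly positive experimental values.

```python
import numpy as np

def fit_statistics(y_exp, y_pred):
    """R2, RMSE, AAE, and ARE between measurements and model predictions."""
    y_exp = np.asarray(y_exp, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_exp - y_pred
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y_exp - y_exp.mean()) ** 2)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": np.sqrt(np.mean(resid ** 2)),
        "AAE": np.mean(np.abs(resid)),
        "ARE": np.mean(np.abs(resid) / y_exp),
    }

stats = fit_statistics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```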

Seed
The CSD of the seed material was calculated based on image analysis, where over 12,000 seed particles were detected on the several images taken. An example of the microscopy pictures of the seed material and the particles detected by the image analysis algorithm can be seen in Figure 3. The image analysis algorithm returns a list of all the detected crystals with their measured mean feret diameters. Therefore, a probability distribution of the detected crystals was calculated assuming a constant crystal size class (bin width) of 4 µm. A beta distribution with a mean diameter of 38.7 µm was obtained and is shown as a histogram in Figure 3.

Figure 4 shows the simulation predictions of the solute concentration profile and the final crystal size distribution. Uncertainties in the model input parameters were propagated to the model output through Monte Carlo simulations; the resulting model prediction variability can be seen in Figure 4 as well. The gravimetrically measured experimental data of the solute concentration during crystallization are also plotted for comparison. At the end of the experiments, the CSD of the final crystal product was quantified using the novel image analysis method. This method was preferred over the other techniques for the comparison, since the population balance is built on the principle of conservation of particle numbers. Accordingly, the number-based distribution obtained from the image analysis method can be used directly in the population balance model, because image analysis yields a list of crystals with their mean sizes. The detected crystals can therefore be classified directly into the classes (bins) of crystal diameters, yielding the density of the number-based distribution. On the other hand, converting a volume distribution or a chord length distribution to a number-based CSD requires a sensor (calibration) model that correlates the measured distributions to the number-based crystal size distribution.
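Turning a list of detected crystal sizes into a number-based density, as described above, is a direct binning operation. The feret diameters below are synthetic stand-ins (a skewed beta-distributed sample) for the image-analysis output, not the measured seed data.

```python
import numpy as np

rng = np.random.default_rng(7)
# Synthetic, right-skewed feret diameters (um) standing in for ~12,000
# detected particles; the real values came from the image analysis tool.
ferets_um = rng.beta(2.0, 8.0, size=12000) * 200.0

bin_width = 4.0                                   # constant 4 um size classes
edges = np.arange(0.0, ferets_um.max() + bin_width, bin_width)
counts, _ = np.histogram(ferets_um, bins=edges)
prob_density = counts / counts.sum()              # number-based probability per class
mean_feret = ferets_um.mean()                     # population mean diameter (um)
```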
An example of the microscopy pictures and the particles detected by image analysis for Exp P01 can be seen in Figure 5. The simulation predictions show that, in general, the trend of solute concentration consumption during crystallization can be captured by the crystallization model. The calculated statistical measures for the comparison of the experimental concentration measurements and the model predictions can be seen in Table 3. In all four experiments, a faster de-supersaturation was observed, especially at the beginning of each experiment, and the crystallization model initially underpredicted the rate of solute concentration consumption. The deviations between the predictions and the experiments increase especially in the cases of fast cooling (Exp P01 and Exp P03), where the supersaturation was higher throughout the experiment. Accordingly, higher deviations of the concentration predictions from the experimental measurements, in terms of RMSE, AAE, and ARE, were calculated for Exp P01 and Exp P03 than for Exp P02 and Exp P04. The reason behind these observations might be that the nucleation kinetics are highly uncertain and that, upon scale-up, the metastable limit for nucleation might have changed. The secondary nucleation kinetic equation employed is an empirical equation that lacks, e.g., particle-impeller collision effects upon scale-up. In Exp P04 and Exp P03, at the initial experimental sample points, the gravimetrically measured solute concentration is below the solubility curve, which is physically unreasonable. Here, we suspect experimental measurement errors in the samples taken especially at the beginning of the experiments, when the temperature of the solution was much higher than the room temperature.
Even though the samples were filtered under vacuum as soon as possible after being taken, some of the solute in the solution might still have crystallized on the wall of the sampling tube or during the filtration step because of the large temperature difference between the environment and the sample. For a comparison of the final CSD, the crystal sizes at 10-90% (d10, d20, . . . , d80, d90) of the fractional number density distribution were calculated and used for the statistical measures. Table 4 shows the calculated values. The best experiment-model fit was observed in Exp P01, with an R2 of 0.90 and an RMSE of 48.35 µm, followed by Exp P04 with an R2 of 0.72 (RMSE of 94.12 µm). In the final CSD plot of Exp P04 in Figure 4, it can be seen that the crystal size range predicted by the model overlaps the experimental measurements. However, the absence of a clear peak in the experimental measurements, possibly caused by insufficient offline sampling, lowers the statistical agreement between prediction and measurement reported in Table 4. The high ARE value in Exp P04 is due to the high number of crystals detected by the measurements in the crystal size range below 10 µm, which resulted in large differences between the calculated d10 and d20 values. If these two measures were excluded from the statistical calculations, the ARE would fall below 50%. On the other hand, the poorest fit between model predictions and experimental data was observed in Exp P02 (R2 of 0.69) and Exp P03 (R2 of 0.53). It is important to mention that, during these two experiments, in which the impeller speed was set to 50 rpm, settling of particles was observed in the crystallizer; 50 rpm had been chosen based on one of the scale-up rules, namely, constant impeller tip speed between scales. The particle settling observed at this mixing intensity also demonstrates the unreliability of this scale-up rule.
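The fractional sizes d10 through d90 used above can be read off the cumulative number distribution by interpolation. A minimal sketch, assuming a binned number-based density as input (the function name and the linear-interpolation choice are ours):

```python
import numpy as np

def fractional_sizes(bin_centers_um, number_density,
                     fractions=np.arange(0.1, 1.0, 0.1)):
    """Crystal sizes (d10, d20, ..., d90) at given cumulative fractions
    of a number-based density distribution, by linear interpolation of
    the cumulative distribution over the bin centers."""
    q = np.asarray(number_density, dtype=float)
    q = q / q.sum()                      # normalize to fractions summing to 1
    cdf = np.cumsum(q)                   # cumulative number fraction per bin
    return np.interp(fractions, cdf, np.asarray(bin_centers_um, dtype=float))
```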
At the end of these experiments, we increased the impeller speed to 150 rpm to suspend the crystals homogeneously and to take a representative sample from the crystallizer. However, the settling phenomenon was not taken into account in the modeling of the crystallization process at this stage, which can be a reason for the underprediction of the growth of seed crystals in Exp P02 and Exp P03. We expect that the settled crystals were directly exposed to a much colder heat exchanger surface and therefore to a much higher supersaturation, which accelerated the crystallization kinetics and resulted in higher growth compared to the crystals kept suspended and continuously moving by the impeller. Moreover, the calculated experimental and predicted yields of the processes are overall in good agreement, especially for Exp P02-Exp P04, as shown in Table 5, since the final solute concentration plays the major role for the yield. Table 5 also compares the mean crystal size, d32. In conclusion, in the absence of poor mixing, the simulation results of the crystallization model are in good agreement with the measurements of the independent pilot-scale experiments. Overall, it can be concluded that the models can be used to guide crystallization process development, exploration of the design space, and optimization when homogeneous mixing is guaranteed. For a closer fit between the model predictions and the experiments, further work is still needed for specific systems, e.g., nucleation models that include impeller-crystal collisions. To obtain comprehensive precision in the predictions, the crystallization kinetic model should be coupled with solid- and liquid-phase momentum models, e.g., through compartmental modeling, in order to examine whether poor mixing can occur and how the predictions vary in its presence.
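For reference, the two quantities compared in Table 5 can be computed from the model outputs as sketched below. The d32 formula is the standard Sauter mean over a number-based distribution; the yield expression shown (solute consumed relative to the maximum consumable at the final temperature) is an assumed common definition, not necessarily the exact one used in the study:

```python
import numpy as np

def sauter_mean(bin_centers_um, number_density):
    """Sauter mean diameter d32 = sum(n_i * d_i^3) / sum(n_i * d_i^2)
    over a binned number-based distribution."""
    d = np.asarray(bin_centers_um, dtype=float)
    n = np.asarray(number_density, dtype=float)
    return np.sum(n * d ** 3) / np.sum(n * d ** 2)

def crystallization_yield(c0, c_final, c_sat_final):
    """Assumed yield definition: solute actually consumed divided by the
    maximum consumable at the final temperature (all concentrations in
    kg solute / kg solvent)."""
    return (c0 - c_final) / (c0 - c_sat_final)
```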

Crystal Size Distribution Measurements
Since this study concerns modeling of the crystallization process by means of the population balance, the image analysis technique was preferred, because it does not require an additional sensor model (e.g., the CLD measured by FBRM requires a sensor model to correlate the CLD with the CSD). However, the two other most widely used methods are additionally presented here. The main purpose is to understand the differences between them and their applicability rather than to make a direct comparison. An example of the measured crystal population distributions (fractional density distributions) of the product from Exp P01 based on the three techniques is presented in Figure 6. These techniques deliver different measures of the population distribution, namely volume, chord count, and number. Therefore, a probability density was calculated by dividing the measurement data in each bin (or channel) by the sum of the measurement data over all bins (or channels). In the measurements of the products from all experiments, we also observed a tail, especially in the larger crystal size region (>400 µm). This can be due to the deviation of the particle shape from a sphere, as discussed in Section 2.2.6. Similar d50 values were obtained with the laser diffraction and image analysis methods, 194 µm and 186 µm, respectively. However, the width (span) of the distribution remains broader for the laser diffraction method than for the volume-based distribution calculated by converting the number-based distribution obtained with the image analysis method. A lower d50 of 121 µm was obtained by offline FBRM measurement, while the span of that distribution lay between those of the laser diffraction and image analysis techniques. A comparison of the CSDs of the remaining experiments (Exp P02-Exp P04) and the calculated d10, d50, d90, and distribution widths can be found in the Supplementary Materials.
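The number-to-volume conversion mentioned above amounts, under a volume-equivalent-sphere assumption, to weighting each bin of the number density by d cubed and renormalizing; a minimal sketch (the function name is ours):

```python
import numpy as np

def number_to_volume_density(bin_centers_um, number_density):
    """Convert a binned number-based density to a volume-based density,
    assuming volume-equivalent spheres: weight each bin by d^3, then
    renormalize so the fractions again sum to 1."""
    d = np.asarray(bin_centers_um, dtype=float)
    n = np.asarray(number_density, dtype=float)
    v = n * d ** 3
    return v / v.sum()
```

This shifts weight toward the larger size classes, which is why a volume-based d50 (as reported by laser diffraction) is generally larger than the number-based d50 from image analysis for the same sample.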
In general, larger deviations were observed in the seed measurements than in the product measurements. This can be explained by the shape of the seed material deviating more from a sphere than that of the products, as can be seen in Figures 3 and 5.

Conclusions
An in silico tool for a pharmaceutical crystallization process was developed in this work. The prediction ability of the tool was tested by independent experiments performed in a 50 L pilot-scale crystallizer.
In the tool development stage, a batch cooling crystallization model was developed based on a population balance equation combined with mass and energy balance equations. The reported solubility and kinetic data for paracetamol crystallization from ethanol were taken from the literature. Uncertainties in the reported parameters were addressed by performing Monte Carlo simulations. Unconventionally, a sensitivity analysis was performed in order to design the validation experiments. The purpose was to run fewer but more targeted experiments, since pilot-scale runs consume considerable energy, time, and resources.
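The Monte Carlo treatment of parameter uncertainty can be sketched as follows. The power-law growth expression and all parameter values below are illustrative placeholders, not the literature kinetics used in the study; the sketch only shows the mechanics of sampling parameters from their reported uncertainty and collecting the spread of a model output:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical kinetic parameters as (mean, standard deviation) pairs;
# placeholders standing in for literature values with reported uncertainty.
params = {"kg": (2.0e-7, 2.0e-8),   # growth rate constant, m/s
          "g":  (1.6, 0.1)}         # growth order, dimensionless

def growth_rate(S, kg, g):
    """Simple power-law crystal growth rate G = kg * (S - 1)^g for S > 1."""
    return kg * max(S - 1.0, 0.0) ** g

# Monte Carlo: sample parameter sets and evaluate the model output.
S = 1.2  # example supersaturation ratio
samples = [growth_rate(S,
                       rng.normal(*params["kg"]),
                       rng.normal(*params["g"]))
           for _ in range(1000)]
mean_G = np.mean(samples)
lo, hi = np.percentile(samples, [2.5, 97.5])  # 95% prediction band
```

In the full tool, each sampled parameter set drives one population balance simulation, and the envelope of the resulting concentration profiles and CSDs gives the prediction bands shown in Figure 4.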
In the validation stage, four crystallization experiments with different conditions were performed at the 50 L pilot scale. In these experiments, seed amount, cooling time, and impeller speed were varied, as the sensitivity analysis suggested. The solute concentration and the final CSD, quantified by a novel image analysis tool, were used as comparison criteria for the model predictions. The in silico tool delivered comparable results. In particular, the model predictions agreed reasonably well with the two experiments in which no mixing problem was observed. However, particle settling, which the model did not take into account, occurred in the remaining two experiments, and consequently the model underestimated the growth of the crystals. The analytical technique used to measure the CSD affects every comparison with the model predictions and should therefore be chosen with caution. This promising study showed that such modeling tools can be used to predict process trends, especially in the absence of poor mixing, and should be of interest to those involved in the process development of pharmaceutical crystallization processes (e.g., exploration of the operational design space, uncertainty and sensitivity analysis, process risk assessment, and development of process control strategies). As future work, a detailed comparative study of mixing (e.g., implementing solid- and liquid-phase momentum balances, and down- and upscaling) is planned.