3.1. SEC Method
Size exclusion chromatography (SEC) separates proteins on the basis of their hydrodynamic radius, is simple to use and compatible with high-throughput operation, and is the method of choice for the analysis of aggregation in therapeutic proteins [
11]. The quantities of aggregate and monomer in a protein sample are often evaluated using SEC coupled with an ultraviolet (UV) spectrophotometer. SEC with multi-angle light scattering (SEC/MALLS) and analytical ultracentrifugation sedimentation velocity (AUC-SV) are characterization techniques used to support the SEC analysis [
11]. Oliva et al. [
12] have used SEC coupled with different detection modes to study the stability, the level and identity of impurities, and the aggregation mechanism of bevacizumab. Typically, the SEC method is used in lot release testing and in comparability and similarity assessments for purity-related product quality attributes, but not for the protein content as a critical quality attribute (CQA), because of several disadvantages of the technique: reversible aggregation, sample dilution effects, and adsorption of the protein to the chromatographic matrix. All these factors would reduce the quantity of aggregate and monomeric protein detected [
11]. Some of these were previously analyzed by us [
12].
Several international regulatory organizations [
6,
7] recommend carrying out the validation process at two levels: The “pre-study” validation should show that the method achieves its objectives, whereas the “in-study” validation verifies that the method remains stable and in-control over time [
6]. The latter plays a vital role in the analytical similarity assessment of biosimilar drugs. In all these procedures, it is necessary to consider the measurement uncertainty associated with the results and to determine whether the level of variability satisfies the specification limits [
7].
3.1.1. Pre-Study Validation
A pre-study validation procedure was conducted to prove that the method could deliver quality results, following ICH Q2(R1) [
8]. System suitability parameters such as the retention time, tailing factor, and theoretical plate number were measured to check the performance, and all were within the specified limits. The excipients (trehalose and polysorbate 20) did not interfere with the peak of interest, since their elution times were longer. To evaluate the effect of the chromatographic matrix, the same 30 µg/mL sample was injected into the system six times, and an RSD of 0.35% was obtained. It was concluded that the measured quantity of aggregate and monomeric protein is not affected by the matrix. The identity of both species was checked by us in a previous work using SEC/MALLS [
12].
The calibration standards were prepared at five different concentration levels ranging from 5 to 30 µg/mL (n = 30), using the mobile phase as a diluent, and analytic runs were performed on different days. In such a situation, it is important to check that there is no difference between days in order to pool data. For this, the following statistical model was used:
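$$y_{ij} = \beta_0 + \beta_1 x_j + \alpha_{0,i} + \alpha_{1,i} x_j + \varepsilon_{ij}$$
where $y_{ij}$ is the response for the $j$-th concentration $x_j$ on the $i$-th day; $\beta_0$ and $\beta_1$ correspond to the intercept and slope of the first day; $\alpha_{0,i}$ are the differences between the intercept for day 1 and those for days 2, 3, …, k; and $\alpha_{1,i}$ are the corresponding differences in the slopes. The term $\varepsilon_{ij}$ represents the error associated with the $j$-th concentration measured on the $i$-th day.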
Table 1 shows the ANOVA results obtained using the “lm” function in R [
13].
The null hypothesis of no differences between intercepts and slopes was not rejected in any case (p > 0.05). This implies, first, that the observed variability is due only to the analytic method error and, second, that the data from different days can be pooled to obtain the regression line. In our case, a linear relationship between the peak areas of bevacizumab (μV·s) and the corresponding concentrations (μg/mL) was found (
Figure 1). The absence of systematic errors (bias) was verified by plotting the reference versus the predicted values for the calibration samples [
14]. The limit of detection (LOD) and limit of quantitation (LOQ) were determined as 2.14 and 6.49 µg/mL, respectively, using the signal-to-noise approach.
The accuracy was assessed from the recovery data of a minimum of nine determinations over a minimum of three concentration levels covering the specified range. The average recovery was 99.1%, with an RSD of 3.00%. The system precision, expressed as repeatability, was determined from six replicate injections of a sample at 20 µg/mL, yielding an RSD of 0.85%. The intermediate precision of the assay was also evaluated by the same analysts, under the same working conditions, but on three different days, in accordance with the design proposed by Maroto et al. [
15]. In this case, the overall precision corresponds to a time-different intermediate precision. From this study, the between-measurement variance ($s_m^2$) and the between-day variance ($s_{day}^2$) were calculated to be 0.025 and 0.090, respectively, for a concentration of 20 µg/mL, giving an intermediate precision of 1.70% expressed as a percentage RSD.
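The link between the two variance components and the reported intermediate precision can be checked numerically. The following is a minimal sketch, assuming that the intermediate precision is the square root of the sum of the two variances expressed relative to the 20 µg/mL level; the function name is illustrative and not part of the original work.

```python
import math


def intermediate_precision_rsd(var_measurement, var_day, concentration):
    """Percentage RSD from the combined measurement and day variance components."""
    total_sd = math.sqrt(var_measurement + var_day)
    return 100.0 * total_sd / concentration


# Variance components reported in the text for the 20 µg/mL level.
rsd = intermediate_precision_rsd(0.025, 0.090, 20.0)
print(f"Intermediate precision: {rsd:.2f}% RSD")
```

Under this assumption the calculation reproduces the reported value of 1.70%.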
A statistical design of experiments was used to evaluate the robustness of the analytical method, in our case a factorial design with two variables (flow rate and pH) at two levels. In the two-way ANOVA, only the flow-rate main effect contributed to the model, whereas the pH term had no significant influence.
Figure 2 shows the contour plot of the monomer content as a function of both variables. Considering a variability of ±2% for the flow rate (a realistic situation for this type of analytical method) and changes in pH of ±0.2 units, the monomer content remains within the expected value, indicating that the analytical method is sufficiently robust.
The bevacizumab was stable at 5 °C over a 24-month period; no more than a 1% difference was observed with respect to the label claim. The long-term stability data from the manufacturer show that no significant changes occur when stored at 5 °C for up to 24 months [
12].
3.1.2. In-Study Validation
The main objective of this validation process was to check the maintenance of the validation conditions in the laboratory over a long time period. For this, a control sample with a nominal concentration of 25 µg/mL was analyzed each working day (n = 34). Different control charts have been used to monitor the quality and reliability of an instrument for testing CQAs. The analytic method can provide robust results if the observed variability falls within the pre-specified acceptance limits. In this case, the uncertainty in estimating these limits should be taken into account [
16].
Control charts are designed under the assumption that a method to be monitored will produce measurements that are independent and identically distributed over time, when only the inherent sources of variability are present in the method [
17]. For this, the normality of the data must be checked through Q-Q plots and statistical tests (e.g., Anderson-Darling, Shapiro-Wilk or chi-square). The results confirmed that the concentration distribution is not significantly different from a normal distribution at a 5% significance level according to Q-Q plots and the Anderson-Darling test. Assuming that the data are indeed normally distributed, the method mean was estimated to be 24.9 µg/mL from the X-chart. At first sight, all individual measurements were within the lower (LCL) and upper (UCL) control limits [24.4, 25.4] calculated according to Montgomery [
17]. The Moving Range (MR) chart allows the method’s standard deviation to be estimated, the value being 0.17 µg/mL. These data indicate that the analytical method is in control and stable. Therefore, control charts provide enough information to establish the method requirements.
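The individuals (X) and moving-range calculations described above can be sketched as follows. This is an illustrative example, assuming the usual textbook estimator in which the standard deviation is the average moving range of span 2 divided by the constant d2 = 1.128; the daily results in `demo` are hypothetical and are not the study data.

```python
from statistics import mean

D2_SPAN_2 = 1.128  # d2 constant for moving ranges of two consecutive points


def limits_from_estimates(centre, sigma, width=3.0):
    """Control limits at centre ± width × sigma."""
    return centre - width * sigma, centre + width * sigma


def individuals_chart(values, width=3.0):
    """Return centre line, SD estimated from the moving ranges, and (LCL, UCL)."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    centre = mean(values)
    sigma = mean(moving_ranges) / D2_SPAN_2
    return centre, sigma, limits_from_estimates(centre, sigma, width)


# Hypothetical daily control results (µg/mL), for illustration only.
demo = [25.0, 24.8, 25.1, 24.9, 24.7, 25.0]
centre, sigma, (lcl, ucl) = individuals_chart(demo)
out_of_control = [x for x in demo if not lcl <= x <= ucl]

# Limits implied by the estimates reported in the text (mean 24.9, SD 0.17).
reported_lcl, reported_ucl = limits_from_estimates(24.9, 0.17)
print(round(reported_lcl, 1), round(reported_ucl, 1))
```

With the reported mean and standard deviation, three-sigma limits round to the interval [24.4, 25.4] quoted above.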
Cumulative Sum (CuSum) and Exponentially Weighted Moving-Average (EWMA) control charts efficiently complement the X-bar and MR control charts when there is interest in detecting small changes in the process, around ±1.5 SD, and the sample consists of an individual unit. Which of the two performs better for a given size of shift has been widely discussed [
15]. In practice, the EWMA control chart worked well with the parameters
λ = 0.4 (smoothing constant) and L = 3.054 (control limit width, approximately three standard deviations), values recommended by Montgomery [
15]. Under this scenario, sample #22 fell outside the control limits (see
Figure 3). This is attributable to a column change. Using a value of
λ = 0.2, the change was clearer, since samples #21 and #23 were also flagged.
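The EWMA statistic and its control limits can be sketched as below. This is a minimal example, assuming the standard recursion z_i = λx_i + (1 − λ)z_(i−1) started at the target value, with time-varying limits; the target and standard deviation are taken from the X and MR charts above, while the data series is hypothetical.

```python
import math


def ewma_chart(values, target, sigma, lam=0.4, width=3.054):
    """Return (index, ewma, lcl, ucl, flagged) tuples, with the index starting at 1."""
    rows = []
    z = target  # the EWMA is started at the target value
    for i, x in enumerate(values, start=1):
        z = lam * x + (1.0 - lam) * z
        # Limits widen with i and converge to their asymptotic value.
        half_width = width * sigma * math.sqrt(
            lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * i))
        )
        lcl, ucl = target - half_width, target + half_width
        rows.append((i, z, lcl, ucl, not lcl <= z <= ucl))
    return rows


# Hypothetical series: four in-control points, then a sustained shift of about 2 SD.
target, sigma = 24.9, 0.17
series = [24.9, 25.0, 24.8, 24.9, 25.25, 25.25, 25.25, 25.25]
signals = [i for i, _, _, _, flagged in ewma_chart(series, target, sigma) if flagged]
print(signals)
```

In this illustration the shift starts at point 5 and is first flagged at point 7, which shows the short delay introduced by the smoothing.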
The CuSum control charts directly incorporate all the information into the sequence of sample values by plotting the cumulative sums of their deviations from a target value [
18]. When an upward or downward trend appears, this indicates that the process average has changed, which requires a search for the causes. The CuSum control chart showed two altered zones. The first, with an upward tendency (C+), was located between positions 19 and 25, similar to what was observed in the EWMA control chart, whereas the second was located close to the end of the process (see
Figure 4) and showed a downward tendency (C−). The latter change was attributable to the introduction of a new bevacizumab lot from sample #28 onwards.
In principle, the CuSum control chart performed better for detecting shifts smaller than 1.5 SD. However, the EWMA provides a forecast of where the average will be in the next period, which makes it easier to apply in method control.
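The upper (C+) and lower (C−) cumulative sums can be computed with a tabular CuSum, sketched below. The reference value k = 0.5 and decision interval h = 5 (both in SD units) are common textbook choices and are assumptions here, since the text does not state the settings used; the data series is hypothetical.

```python
def tabular_cusum(values, target, sigma, k=0.5, h=5.0):
    """Return (c_plus, c_minus, signals) for a two-sided tabular CuSum."""
    slack, limit = k * sigma, h * sigma
    c_plus, c_minus, signals = [], [], []
    upper = lower = 0.0
    for i, x in enumerate(values, start=1):
        # Accumulate only the deviations that exceed the slack on each side.
        upper = max(0.0, x - (target + slack) + upper)
        lower = max(0.0, (target - slack) - x + lower)
        c_plus.append(upper)
        c_minus.append(lower)
        if upper > limit or lower > limit:
            signals.append(i)
    return c_plus, c_minus, signals


# Hypothetical series: four in-control points, then a sustained upward shift.
series = [24.9, 25.0, 24.8, 24.9, 25.25, 25.25, 25.25, 25.25]
c_plus, c_minus, signals = tabular_cusum(series, target=24.9, sigma=0.17)
print(signals)
```

A steadily rising C+ sequence corresponds to the upward tendency described above, and a rising C− sequence to a downward one.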
3.2. Risk Assessment
Determining the suitability of the analytical method and the risk of taking incorrect decisions depends on both the criteria used and their consistency [
6]. Moreover, the analysis of control chart data during the in-study validation can estimate the accuracy and precision, providing a measure of data quality, whereas a joint measure including both parameters can help to gauge the associated risk [
19,
20]. In our case, the
β-expectation tolerance interval rule was used to assess both aspects. The balanced one-way random effects model proposed by Mee [
21] was used to construct two-sided
β-content tolerance intervals. This only requires calculating the within-run ($\sigma^2_E$), between-run ($\sigma^2_B$), and total ($\sigma^2_{Tot}$) variances of the method, and the quantiles of the standard normal and chi-square distributions for the corresponding degrees of freedom [
21,
22]. In this case, the data consist of measurements taken at a nominal concentration of 20 µg/mL, following a sampling design with nine independent runs (I = 9) and three replicates per run (J = 3). The resulting two-sided
β-expectation tolerance limits for a nominal concentration of 25 µg/mL were 25.28 and 24.58 µg/mL, corresponding to deviations of 1.13% and −1.69% from the nominal concentration, respectively [
22]. Thus, the performance is adequate with regard to the nominal concentration, since both deviations were smaller in absolute value than the acceptance limit,
λ, which was set at 3%, although a value of 2% may also be acceptable.
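The calculation of a two-sided β-expectation tolerance interval from a balanced one-way random effects design can be sketched as follows. This is an illustration under stated assumptions: it uses the ANOVA variance components and a Satterthwaite-type degrees of freedom, a formulation commonly associated with Mee's approach, and the exact expressions used in this study are not given in the text. The Student t quantile is supplied by the user (for example from tables), and the data are hypothetical.

```python
import math
from statistics import mean


def variance_components(runs):
    """ANOVA estimates for a balanced design: list of I runs, each with J replicates.

    Assumes the within-run variance is greater than zero.
    """
    n_runs, n_rep = len(runs), len(runs[0])
    run_means = [mean(r) for r in runs]
    grand = mean(run_means)
    ms_within = sum(
        (x - m) ** 2 for r, m in zip(runs, run_means) for x in r
    ) / (n_runs * (n_rep - 1))
    ms_between = n_rep * sum((m - grand) ** 2 for m in run_means) / (n_runs - 1)
    var_within = ms_within
    var_between = max(0.0, (ms_between - ms_within) / n_rep)  # truncated at zero
    ratio = var_between / var_within
    # Satterthwaite-type degrees of freedom for the total variance.
    dof = (ratio + 1.0) ** 2 / (
        (ratio + 1.0 / n_rep) ** 2 / (n_runs - 1)
        + (1.0 - 1.0 / n_rep) / (n_runs * n_rep)
    )
    return {
        "mean": grand, "within": var_within, "between": var_between,
        "total": var_within + var_between, "ratio": ratio, "dof": dof,
        "n_runs": n_runs, "n_rep": n_rep,
    }


def beta_expectation_interval(comp, t_quantile):
    """Tolerance limits: mean ± t × sqrt(total variance × (1 + 1 / (I·J·B²)))."""
    b_sq = (comp["ratio"] + 1.0) / (comp["n_rep"] * comp["ratio"] + 1.0)
    inflation = 1.0 + 1.0 / (comp["n_runs"] * comp["n_rep"] * b_sq)
    half = t_quantile * math.sqrt(comp["total"] * inflation)
    return comp["mean"] - half, comp["mean"] + half


# Hypothetical data: 3 runs × 2 replicates (µg/mL).
runs = [[24.7, 24.9], [24.9, 25.1], [25.1, 25.3]]
comp = variance_components(runs)
# For these data the degrees of freedom come out at 3; 3.182 is the
# two-sided 95% Student t quantile for 3 degrees of freedom.
lower, upper = beta_expectation_interval(comp, t_quantile=3.182)
relative = [100.0 * (v - 25.0) / 25.0 for v in (lower, upper)]
```

The relative limits in `relative` are the quantities compared with the acceptance limit λ.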
3.3. Analytical Method Performance
The performance requirements of any analytical method must be established beforehand using appropriate tools and criteria; the ATP is one good option. In this case, the overall uncertainty (i.e., precision + accuracy = ±3%) and the probability (95%) were established as criteria, based on the quality of the results and the capability of the analytical method used in this study.
Applying the traditional acceptance criteria to the analytical measurements at a nominal concentration of 25 µg/mL indicates that the measurement has no more than ±2% bias and no more than 2% variability, expressed as precision. However, several values fall outside the region defined for 3% when the ATP approach is applied and, therefore, it is not possible to ensure with at least a 95% probability that the samples are within ± 3% of 95–105% of the nominal concentration. This is because, unlike the traditional acceptance criteria, the ATP approach treats bias and precision as interdependent; therefore, the method does not meet the requirements of the ATP. In principle, the data analysis indicates that the bias is under control but that the precision must improve in order to meet the ATP. Thus, the chosen level of performance has a great impact on the quality of the results and consequently affects their validity.
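The interdependence of bias and precision can be illustrated by computing the probability that a single result falls within ±3% of the nominal value. This is a minimal sketch, assuming normally distributed results whose mean is the bias and whose standard deviation is the precision, both in percent of nominal; the function name and the example values are illustrative.

```python
from statistics import NormalDist


def prob_within_limits(bias_pct, sd_pct, limit_pct=3.0):
    """Probability that a result lies within ±limit_pct of the nominal value."""
    dist = NormalDist(mu=bias_pct, sigma=sd_pct)
    return dist.cdf(limit_pct) - dist.cdf(-limit_pct)


# A method at the edge of the traditional criteria (2% bias, 2% precision).
p_edge = prob_within_limits(2.0, 2.0)
# An unbiased method with 1.5% precision.
p_unbiased = prob_within_limits(0.0, 1.5)

print(f"{p_edge:.3f} {p_unbiased:.3f}")
```

Under these assumptions a method that just meets both traditional criteria gives a probability well below 95%, whereas an unbiased method with 1.5% precision exceeds it, which shows why the two parameters cannot be judged separately.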
3.4. Quantification of Bevacizumab in Pharmaceutical Preparation
Directive 75/318/EEC, as amended, states: “Unless there is appropriate justification, the maximum acceptable deviation in the active substance content of the finished products shall not exceed ±5% at the time of manufacture. On the basis of the stability test the manufacturer must propose and justify maximum acceptable tolerance limits in the active substance content of the finished product up to the end of the proposed shelf-life” [
23]. Within this limit, both the variability in production and the test procedure for the assays, i.e., the analytic method error, are included.
According to this Directive, the active substance content of the finished product up to the end of the shelf life must be higher than 95%. To satisfy this, the expected value for the content must be at least 95% plus the analytic method error. The analytic method must be capable of meeting this objective; otherwise, it must be optimized or a new one must be found.
The results obtained show that all the analysed bevacizumab lots have a monomer content, expressed as a percentage of the declared content, within the acceptance limits established by the Directive. Any analytic method used to test particular CQAs should be developed, qualified, validated and implemented and, in particular, its suitability for the intended purpose should be demonstrated. All these aspects have been analysed and verified for the proposed SEC method.
3.5. Analytical Similarity Evaluation
In its most recent guidance, published in 2019, the FDA recommends a quality range (QR) method for the comparative analytical assessment [
1]. The QR of the reference product for a specific CQA is defined as $(\hat{\mu}_R - x\hat{\sigma}_R,\ \hat{\mu}_R + x\hat{\sigma}_R)$, where $\hat{\mu}_R$ is the mean of the reference product lots, $\hat{\sigma}_R$ is the standard deviation of the reference product lots, and the multiplier $x$ should be appropriately justified. A comparative analysis of a quality attribute would generally support a finding that the proposed product is highly similar to the reference product when a sufficient percentage of biosimilar lot values (e.g., 90%) fall within the QR defined for that attribute.
In this work, we tested six commercial bevacizumab lots manufactured at different times and with at least 1 year of residual shelf life, taking two measurements for each lot. To apply the QR method, and for illustration purposes, (1) we arbitrarily assigned a lot number to each sample (scenario #1) and (2) the approach proposed by Wang and Chow [
24] was used to obtain the between-lot and within-lot variances.
Table 2 shows the results obtained for the different scenarios. For scenario #1, the between-lot and within-lot variances were 0.15 and 0.59, respectively, the total standard deviation being 0.86, whereas the mean for the reference product was 99.46%. The analytical method error, expressed as RSD, was 0.78%, the between-lot variability being 0.38%. Considering these data, the total variability is lower than 1.0%. In scenario #2, we increased the total variability up to 2.5% (a value corresponding to the estimated analytic method error). The mean content was 99.26%, whereas the between-lot variance was similar to that obtained for scenario #1, and the within-lot variance (i.e., analytic method error) was 4.28, approximately 7 times higher than that estimated for scenario #1. The total variability was 2.11%.
At this point, the analytic method error is within the expected limits (2.5%), whereas the between-lot variability is lower than 1%. The results indicate that the two main sources of variation are in control; indeed, the observed level of variability is close to the capability of the analytic method.
To apply the QR method, it is necessary to fix the value of the constant x. Recent publications and FDA presentations specify x values ranging from 2 to 3 to ensure that a high percentage (e.g., at least 95%) of the test values fall within the QR [
25,
26]. The FDA recommends a lower x value for higher-risk quality attributes such as the protein content. To apply this approach, we have taken the mean contents of the six lots used in this study as the individual test lot values (see
Figure 5). Based on the FDA criteria, a high similarity is demonstrated for both scenarios, since more than 90% of the data points (all data) for the test product were within the QR of the reference product (
Figure 5A). In principle, the reference product is authorized and should always be similar to itself; the results confirmed this.
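The QR check can be written in a few lines. The sketch below uses the mean (99.46%) reported for scenario #1 and takes 0.86 as the reference standard deviation, which is an assumption about how that value enters the QR; x = 2 is one of the values in the range mentioned above, and the test lot means are hypothetical.

```python
def quality_range(ref_mean, ref_sd, x):
    """Quality range of the reference product: mean ± x × SD."""
    return ref_mean - x * ref_sd, ref_mean + x * ref_sd


def fraction_within(test_values, low, high):
    """Fraction of test lot values that fall inside the quality range."""
    inside = sum(low <= v <= high for v in test_values)
    return inside / len(test_values)


# Scenario #1 summary values from the text; x = 2 chosen for illustration.
low, high = quality_range(99.46, 0.86, x=2.0)

# Hypothetical test lot means (% of declared content).
test_lots = [99.1, 99.8, 100.2, 98.9, 99.5, 99.3]
share = fraction_within(test_lots, low, high)
highly_similar = share >= 0.90

print(f"QR = [{low:.2f}, {high:.2f}], share inside = {share:.0%}")
```

A larger x widens the range, so the choice of x directly controls how demanding the similarity criterion is.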
In a second step, we analyzed the real data published by Burdick et al. [
27] for the protein content (% of declared content) by RP-HPLC for the US Neupogen and commercial EP 2006 lots. To apply the QR method, we used the “sample” function in R, with replacement, to obtain the replicates for each lot [
13]. For scenario #3, the between-lot and within-lot variances were 1.15 and 0.167, respectively, the total standard deviation being 1.15, whereas the mean for the reference product was 100.14% (Table 2). The analytical method error, expressed as RSD, was 0.40%, the between-lot variability being 1.15%. Considering these data, the total variability was slightly higher than 1.0%. In scenario #4, we increased the total variability up to 2.15% (the value published by the author). The mean content was 100.48%, whereas the within-lot variance was 0.670, and the between-lot variance (i.e., lot-to-lot variability) was 4.35, approximately 4 times higher than that estimated for scenario #3. The total variability was 2.32%.
At this point, the analytic method error is within the expected limits for an HPLC method (2–5%), whereas the between-lot variability is higher than 2.0%. Based on the FDA criteria, a high similarity is only demonstrated for scenario #4, since more than 90% of the data points (all data) for the test product were within the QR of the reference product (
Figure 5B).