Predictive Potential of Cmax Bioequivalence in Pilot Bioavailability/Bioequivalence Studies, through the Alternative ƒ2 Similarity Factor Method

Pilot bioavailability/bioequivalence (BA/BE) studies are downsized trials that can be conducted prior to the definitive pivotal trial. In these trials, 12 to 18 subjects are usually enrolled, although, in principle, a sample size is not formally calculated. In a previous work, the authors recommended an alternative to the average bioequivalence methodology for evaluating pilot study data: the geometric mean (Gmean) ƒ2 factor with a cut off of 35, which was shown to be an appropriate method to assess potential bioequivalence for the maximum observed concentration (Cmax) metric under the assumptions of a true Test-to-Reference Geometric Mean Ratio (GMR) of 100% and an inter-occasion variability (IOV) in the range of 10% to 45%. In this work, the authors evaluated the proposed ƒ2 factor in comparison with the standard average bioequivalence in more extreme scenarios, using a true GMR of 90% or 111% for truly bioequivalent formulations, and 80% or 125% for truly bioinequivalent formulations, in order to better derive conclusions on the potential of this analysis method. Several scenarios of pilot BA/BE crossover studies were simulated through population pharmacokinetic modelling, accounting for different IOV levels. A redefined decision tree is proposed, suggesting a fixed sample size of 20 subjects for pilot studies in the case of an intra-subject coefficient of variation (ISCV%) > 20% or unknown variability, and suggesting that study results be assessed through the average bioequivalence analysis and, additionally, through the Gmean ƒ2 factor method when the 90% confidence interval (CI) for the GMR falls outside the regulatory bioequivalence acceptance interval of [80.00–125.00]%.
Using this alternative approach, the certainty levels to proceed with pivotal studies, depending on Gmean ƒ2 values and variability scenarios tested (20–60% IOV), were assessed, which is expected to be helpful in terms of the decision to proceed with pivotal bioequivalence studies.


Introduction
The approval of brand-name and generic drugs under the European Medicines Agency (EMA) [1] and US Food and Drug Administration (FDA) [2] usually requires bioavailability/bioequivalence (BA/BE) studies. These pharmacokinetic clinical studies are designed to demonstrate comparable bioavailability or bioequivalence, defined as the absence of a significant difference in the rate and extent to which the active substance in pharmaceutically equivalent or pharmaceutical alternative medicinal products becomes available at the site of drug action when administered at the same molar dose under similar conditions [1]. Claiming bioequivalence between two products assumes an equivalent therapeutic efficacy and safety.
When companies are uncertain whether a new formulation is potentially bioequivalent to a so-called Reference product, it is usual to carry out downsized pilot studies as an in vivo gatekeeping strategy to decide whether or not to move forward with a full-size pivotal study [3][4][5].
Pilot study data are usually analyzed similarly to pivotal studies, using the average bioequivalence approach, given that no formal methodologies are provided in the guidelines. However, due to the low number of subjects usually enrolled, the results obtained from these studies are difficult to interpret, particularly when the inter-occasion (IOV) or intra-subject variability is high, as the point estimate obtained for the means ratio may not be close to the real population value [5,6]. Consequently, pilot studies are considered underpowered studies.
In a previous work, the authors suggested a decision tree to be applied for the analysis of data from pilot BA/BE studies, which included the use of an alternative approach to the average bioequivalence, i.e., the similarity factor ƒ2 applied to the comparison of the geometric means (Gmean) of plasma concentration-time profiles [3]. A cut off of 35 for the Gmean ƒ2 factor has been proposed to conclude on a potential similarity between the Test and Reference formulations on the absorption rate (as assessed by the maximum observed concentration [Cmax]), which is regulatorily required to be demonstrated in pivotal BA/BE studies [3]. For the tested simulated scenarios, this cut off demonstrated a good balance between avoiding type I error (which represents the probability of erroneously concluding bioequivalence, known as the consumer's risk) and type II error (which represents the probability of erroneously concluding bioinequivalence, known as the producer's risk). However, the method was tested in ideal simulated scenarios, i.e., assuming either completely equal Test and Reference formulations (truly bioequivalent with a true Test-to-Reference Geometric Least Square Means [LSM] ratio [GMR] of 100%), or completely different formulations (truly bioinequivalent with a true GMR of 70%) [3].
Considering that during drug product development, less favorable GMRs are commonly expected, in this work, the authors aimed to further investigate the proposed Gmean ƒ2 factor in comparison with the standard average bioequivalence in more extreme and realistic scenarios, in order to better derive conclusions on the potential of this analysis method to be applied to pilot BA/BE studies. Hence, two major scenarios are tested:
1. The Test product presents a lower bioavailability (BA) than the Reference product, with a true GMR of 90% (truly bioequivalent formulations) and 80% (truly bioinequivalent formulations).
2. The Test product presents a higher bioavailability than the Reference product, with a true GMR of 111% (truly bioequivalent formulations) and 125% (truly bioinequivalent formulations).
For each of the two major scenarios tested, several pilot BA/BE crossover studies were simulated through population pharmacokinetic modelling, accounting for different IOV levels. Method performance was measured with a confusion matrix.

Materials and Methods
For each major scenario, a total of 140,000 BA/BE crossover trials were simulated, corresponding to 5,880,000 different simulated concentration-time profiles per major scenario. For each major scenario, simulations were performed using two different Test-to-Reference ratios of the mean population values for the absorption rate constant (ka), different sample sizes and different IOV levels for the volume of distribution (V). In all simulations, a fixed value was used for inter-individual variability (IIV) (Figure 1).
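The trial and profile counts can be cross-checked with a short calculation. The sketch below is not the authors' code; it assumes that the tested sample sizes run from 12 to 30 subjects in steps of 2 (an assumption that is consistent with the reported totals), and it uses the two true GMR levels, seven IOV levels, and 1000 trials per scenario described in this section.

```python
# Back-of-the-envelope check of the simulation totals per major scenario.
# Assumption (not stated explicitly in the text): sample sizes are 12, 14, ..., 30.
sample_sizes = list(range(12, 31, 2))       # 10 hypothetical sample sizes
iov_levels = [0, 10, 20, 30, 40, 50, 60]    # % IOV levels tested for V
true_gmr_levels = 2                         # truly bioequivalent and truly bioinequivalent
trials_per_cell = 1000                      # trials per variability/sample-size scenario
periods = 2                                 # each subject receives Test and Reference

n_trials = true_gmr_levels * len(iov_levels) * len(sample_sizes) * trials_per_cell
n_profiles = (true_gmr_levels * len(iov_levels) * trials_per_cell
              * periods * sum(sample_sizes))

print(n_trials)    # 140000
print(n_profiles)  # 5880000
```

Under these assumptions, both totals match the figures quoted above.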
Trial simulations and statistical analysis were performed with R version 4.0.3 (R Foundation for Statistical Computing, Vienna, Austria, 2013).

Study Design and Pharmacokinetic Simulation
Two-treatment (Test and Reference), two-sequence (Sequence 1 and Sequence 2), two-period crossover (2 × 2 × 2) studies were simulated (Figure 1-Study Design) as described by Henriques et al. (2023) [3]. A range of 12 to 30 simulated subjects per study were randomized and administered a single 50 mg oral dose of either Test or Reference products, separated by a washout of 7 days (Figure 1-Study Design).
As in the previous work [3], oral drug absorption and disposition were described using a one-compartmental model with first-order absorption and first-order elimination, defined through ordinary differential equations (ODE), parameterized with micro constants (Figure 1-Structural Model, and Equations (1) and (2)), and considering a log-normal additive experimental error of 10% (Equation (3)) [3,7].
Regarding the compartmental model parameter ka, for the scenario where the Test product showed a lower bioavailability than the Reference product, a fixed mean population value of 1.22 h−1 was assumed for the Reference product, and 0.732 h−1 (truly bioequivalent formulations) or 0.484 h−1 (truly bioinequivalent formulations) was assumed for the Test product. These ka values were expected to provide a true GMR of approximately 90% and 80% for truly bioequivalent and truly bioinequivalent formulations, respectively (Figure 1-Covariate Model).
For the scenario where the Test product showed a higher bioavailability than the Reference product, a fixed mean population value of 1.22 h−1 was assumed for the Test product, and 0.732 h−1 (truly bioequivalent formulations) or 0.484 h−1 (truly bioinequivalent formulations) was assumed for the Reference product. These ka values were expected to provide a true GMR of approximately 111% and 125% for truly bioequivalent and truly bioinequivalent formulations, respectively (Figure 1-Covariate Model).
For the ke and F model parameters, a mean population value of 0.150 h−1 for ke and of 0.9 for F was assumed in all simulation scenarios.
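To see why these ka values target the stated GMRs, the noise-free Cmax can be computed from the closed-form solution of a one-compartment model with first-order absorption and elimination. This is a minimal sketch, not the authors' simulation code: it ignores variability and residual error, and the function name `tmax_cmax` is hypothetical. The volume of 58.8 L is the population mean reported for V; the Cmax ratio does not depend on it.

```python
import math

def tmax_cmax(ka, ke, f, dose, v):
    """Noise-free tmax (h) and Cmax for a one-compartment oral model (requires ka != ke)."""
    tmax = math.log(ka / ke) / (ka - ke)
    cmax = f * dose * ka / (v * (ka - ke)) * (math.exp(-ke * tmax) - math.exp(-ka * tmax))
    return tmax, cmax

# Population values taken from the text; dose expressed in micrograms (50 mg).
KE, F, DOSE_UG, V_L = 0.150, 0.9, 50_000.0, 58.8

_, cmax_fast = tmax_cmax(1.22, KE, F, DOSE_UG, V_L)    # formulation with ka = 1.22 h-1
_, cmax_be = tmax_cmax(0.732, KE, F, DOSE_UG, V_L)     # truly bioequivalent counterpart
_, cmax_bie = tmax_cmax(0.484, KE, F, DOSE_UG, V_L)    # truly bioinequivalent counterpart

ratio_be = cmax_be / cmax_fast      # slower/faster formulation, about 0.89
ratio_bie = cmax_bie / cmax_fast    # slower/faster formulation, about 0.79
half_life = math.log(2) / KE        # elimination half-life in hours, about 4.6

print(round(ratio_be, 3), round(ratio_bie, 3), round(half_life, 2))
```

The two ratios land close to the targeted 90% and 80% (and their reciprocals close to 111% and 125%) when the roles of Test and Reference are swapped.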
Previous results showed that IOV in V was the variability with the highest impact on the evaluation of the Cmax bioequivalence metric. The variability tested for the other model parameters had no relevant impact [3]. Therefore, in the present study, only IOV for V was included in the model.
For each individual and occasion, V was generated considering a mean population value of 58.8 L, a log-normal distribution, a 30% IIV, and one of the following seven (7) different levels of IOV: (i) 0%, (ii) 10%, (iii) 20%, (iv) 30%, (v) 40%, (vi) 50%, and (vii) 60% (Figure 1-Statistical Model, Equation (4)). The impact of IIV was not assessed, as this variability was not expected to provide differences in the statistical analysis results, since it was suppressed by using a crossover design, as shown in previous simulations [3].
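One common way to generate such values is sketched below. It is an illustration only: Equation (4) is not reproduced in this section, so the exponential form V = Vpop · exp(η + κ), the conversion of a CV into a log-scale standard deviation, and the names `cv_to_sigma` and `sample_volumes` are all assumptions rather than the authors' implementation.

```python
import math
import random

def cv_to_sigma(cv):
    """Log-scale standard deviation of a log-normal variable with the given CV."""
    return math.sqrt(math.log(1.0 + cv ** 2))

def sample_volumes(n_subjects, v_pop=58.8, iiv_cv=0.30, iov_cv=0.20,
                   n_occasions=2, seed=1):
    """Return one list of per-occasion volumes (L) for each subject."""
    rng = random.Random(seed)
    omega = cv_to_sigma(iiv_cv)    # between-subject (IIV) component
    kappa_sd = cv_to_sigma(iov_cv) # between-occasion (IOV) component
    volumes = []
    for _ in range(n_subjects):
        eta = rng.gauss(0.0, omega)  # shared by both occasions of a subject
        volumes.append([v_pop * math.exp(eta + rng.gauss(0.0, kappa_sd))
                        for _ in range(n_occasions)])
    return volumes

# With 0% IOV, both occasions of a subject share the same V, so the
# within-subject comparison in a crossover design is unaffected by IIV.
for v_period1, v_period2 in sample_volumes(3, iov_cv=0.0):
    print(round(v_period1, 2), round(v_period2, 2))
```

Because the subject-level term is common to both periods, it cancels in the within-subject Test-to-Reference comparison, which is the reason given above for not assessing the impact of IIV.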
Within each group of simulations and for each variability scenario, 1000 bioequivalence crossover trials were simulated. As in previous work, simulations only studied the effect of variability on the bioequivalence of Cmax [3].

Simulation Bioequivalence Analysis
Simulation bioequivalence analysis and measure of methods' performance were performed as described by Henriques et al. (2023) [3].
As an alternative to the average bioequivalence approach, the arithmetic (Amean) and geometric (Gmean) mean ƒ2 factor approaches were tested with a cut off of 35, as proposed by Henriques et al. (2023) [3]. By placing a cut off of 35 for the ƒ2 factor, a maximum difference of 20% between the concentration-time profiles until the Reference tmax was tested [3,14].
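The correspondence between the cut off and the percentage difference can be checked numerically. The sketch below assumes the standard similarity factor definition, ƒ2 = 50 · log10(100 / sqrt(1 + mean squared difference)), applied to profiles expressed as percentages; the helper `f2_uniform`, which applies the same difference at every timepoint, is a hypothetical convenience and not part of the described method.

```python
import math

def f2(reference, test):
    """Similarity factor for two profiles expressed on a 0-100% scale."""
    n = len(reference)
    mean_sq_diff = sum((r - t) ** 2 for r, t in zip(reference, test)) / n
    return 50.0 * math.log10(100.0 / math.sqrt(1.0 + mean_sq_diff))

def f2_uniform(diff_pct, n_points=6):
    """f2 when Test differs from Reference by diff_pct at every timepoint."""
    reference = [100.0] * n_points
    test = [100.0 - diff_pct] * n_points
    return f2(reference, test)

for diff in (10, 15, 20):
    print(diff, round(f2_uniform(diff)))  # 10 -> 50, 15 -> 41, 20 -> 35
```

A uniform 20% difference therefore yields an ƒ2 of about 35, while 15% and 10% differences yield about 41 and 50, respectively.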
Likewise, for each simulated combination of variability and number of subjects, the performance of each bioequivalence evaluation method (average bioequivalence, centrality of the Test-to-Reference GMR, and Amean and Gmean ƒ2 factors) was measured with a confusion matrix in terms of sensitivity/power (capacity of avoiding type II errors), specificity (capacity of avoiding type I errors), precision (identified bioequivalent simulations that are truly bioequivalent), negative predictive value (NPV, identified bioinequivalent simulations that are truly bioinequivalent), accuracy (true bioequivalent and bioinequivalent predictions), F1 (harmonic mean of sensitivity and precision), Matthews' Correlation Coefficient (MCC, correlation between the truth and the method prediction), and Cohen's Kappa (κ, agreement relative to what would be expected by chance) [3,15,16]. For each tested method, the confusion matrix performance results were graphically presented over the number of subjects, for each tested variability scenario. Sensitivity and specificity were also plotted over the tested IOV.
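For reference, the listed statistics can be derived from the four cells of a confusion matrix as sketched below. The function name `confusion_metrics` and the example counts are hypothetical and are not results from this study; a "positive" is taken to mean a trial classified as bioequivalent.

```python
import math

def confusion_metrics(tp, fp, tn, fn):
    """Performance statistics from confusion-matrix counts (all margins must be non-zero)."""
    n = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)            # power: 1 - type II error
    specificity = tn / (tn + fp)            # 1 - type I error
    precision = tp / (tp + fp)
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / n
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "npv": npv, "accuracy": accuracy,
            "f1": f1, "mcc": mcc, "kappa": kappa}

# Hypothetical example: 1000 truly bioequivalent and 1000 truly bioinequivalent trials.
metrics = confusion_metrics(tp=600, fp=50, tn=950, fn=400)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this hypothetical example, the type I error is 1 − specificity = 5% and the power is 60%.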

Simulated Pharmacokinetic Data
The summary statistics of the simulated pharmacokinetic parameter V are presented in the Supplementary Materials section, along with the 90% CI of the simulated concentration-time profiles, and the summary statistics of the estimated pharmacokinetic metrics Cmax, tmax and AUC.
For the simulation scenarios where a lower ka value for the Test product was assumed in comparison to the Reference product, the Test product showed, in the case of truly bioequivalent formulations, a Gmean value for Cmax between 577 and 583 µg/L (Supplementary Material S.1.3.1), which was reached between 1.0 and 6 h (median tmax = 2.75 h) (Supplementary Material S.1.3.2), and a Gmean value for AUC0-t between 4900 and 4930 µg·h/L (Supplementary Material S.1.3.3). In the case of truly bioinequivalent formulations, the Test product showed a Gmean value for Cmax between 512 and 515 µg/L (Supplementary Material S.1.3.1), which was reached between 1.5 and 8 h (median tmax = 3.25 h) (Supplementary Material S.1.3.2), and a Gmean value for AUC0-t between 4880 and 4900 µg·h/L (Supplementary Material S.1.3.3). For both truly bioequivalent and truly bioinequivalent formulations, the Reference product demonstrated a Gmean value for Cmax between 641 and 649 µg/L (Supplementary Material S.1.3.1), which was reached between 0.75 and 4 h (median tmax = 2.25 h) (Supplementary Material S.1.3.2), and a Gmean value for AUC0-t between 4938 and 5000 µg·h/L (Supplementary Material S.1.3.3).
For the simulation scenarios where a higher ka value for the Test product was assumed in comparison to the Reference product, for both truly bioequivalent and truly bioinequivalent formulations, the Test product demonstrated a Gmean value for Cmax between 639 and 646 µg/L (Supplementary Material S.2.3.1), which was reached between 0.75 and 4 h (median tmax = 2.25 h) (Supplementary Material S.2.3.2). The Reference product demonstrated, in the case of truly bioequivalent formulations, a Gmean value for Cmax between 577 and 583 µg/L (Supplementary Material S.2.3.1), which was reached between 1.0 and 6 h (median tmax = 2.75 h) (Supplementary Material S.2.3.2), and a Gmean value for AUC0-t between 4900 and 4950 µg·h/L (Supplementary Material S.2.3.3). In the case of truly bioinequivalent formulations, the Reference product demonstrated a Gmean value for Cmax between 511 and 515 µg/L (Supplementary Material S.2.3.1), which was reached between 1.5 and 8 h (median tmax = 3.25 h) (Supplementary Material S.2.3.2), and a Gmean value for AUC0-t between 4867 and 4904 µg·h/L (Supplementary Material S.2.3.3).
For V, Cmax, and AUC, the 95% CIs for Gmean were tight, indicating an appropriate number of simulations per scenario. Moreover, the estimated geometric coefficient of variation (GCV%) results from the IIV and IOV components.
No differences were found for the apparent elimination half-life of the different simulated formulations, t1/2 ≈ 4.6 h.

Bioequivalence Evaluation
As planned, for the scenario where the Test product presents a lower bioavailability than the Reference product, the simulations for truly bioequivalent and truly bioinequivalent formulations demonstrated a mean GMR of approximately 90% and 80%, respectively, while for the scenario where the Test product presents a higher bioavailability than the Reference product, the simulations for truly bioequivalent and truly bioinequivalent formulations demonstrated a mean GMR of approximately 111% and 125%, respectively, with a coefficient of variation (CV%) of approximately 2%, 4%, 7%, 10%, 13%, 17%, and 20% for the simulations with an IOV of 0%, 10%, 20%, 30%, 40%, 50%, and 60%, respectively (Figure 2).
For the two major scenarios tested, a mean ƒ2 factor of 37 was observed for truly bioequivalent formulations and a mean ƒ2 factor of 24 was observed for truly bioinequivalent formulations, with a CV% of approximately 5%, 10%, 18%, 25%, 30%, 35%, and 40% for simulations with an IOV of 0%, 10%, 20%, 30%, 40%, 50%, and 60%, respectively. These values corroborate the use of a cut off of 35 for the ƒ2 metric to evaluate a potential bioequivalence between two formulations in terms of Cmax (Figure 4).
As with previous simulations [3], an inverted V-shaped correlation between the ƒ2 factor and GMR was found (Figure 5). However, unlike previous simulations where the plot was centered on a GMR of 100% [3], in the current simulations the V shape was moved to the opposite direction of the true GMR. Such behavior is a consequence of the fact that the ƒ2 factor was based on the normalization of the mean concentrations of the Test and Reference until the Reference tmax. For simulations where the Test product shows a lower bioavailability than the Reference product (true GMR of 80% and 90%), the Reference product presented a faster absorption, resulting in a Reference tmax < Test tmax, and hence in a cut off of the normalization of the mean concentration curves earlier than the occurrence of the Test Cmax. Consequently, the number of timepoints used for the calculation of the ƒ2 factor was reduced, increasing the ƒ2 value. On the other hand, for simulations where the Test product showed a higher bioavailability than the Reference product (true GMR of 111% and 125%), the Reference product had a slower absorption, resulting in a Reference tmax > Test tmax, and hence in a cut off of the normalization of the mean concentration curves after the Cmax of the Test product was reached. Consequently, the number of timepoints used for the calculation of the ƒ2 factor was increased, decreasing the ƒ2 value. Such behavior did not affect the performance of the method.
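How the truncation at the Reference tmax changes the number of compared timepoints can be illustrated with noise-free curves. The sketch below rests on several assumptions that are not taken from this section: a hypothetical 15-min sampling grid from 0.25 to 8 h, normalization of both curves to the Reference Cmax, and the helper names `conc` and `f2_until_ref_tmax`. It does not reproduce the simulated ƒ2 distributions; it only shows the change in the truncation point.

```python
import math

def conc(t, ka, ke=0.150, f=0.9, dose=50_000.0, v=58.8):
    """Noise-free one-compartment oral concentration at time t (population values)."""
    return f * dose * ka / (v * (ka - ke)) * (math.exp(-ke * t) - math.exp(-ka * t))

def f2_until_ref_tmax(ka_test, ka_ref, times):
    """Number of timepoints up to the Reference tmax, and f2 over those timepoints."""
    ref = [conc(t, ka_ref) for t in times]
    test = [conc(t, ka_test) for t in times]
    i_max = max(range(len(times)), key=ref.__getitem__)  # index of observed Reference Cmax
    scale = 100.0 / ref[i_max]                           # assumed normalization: Reference Cmax = 100%
    ref_pct = [c * scale for c in ref[:i_max + 1]]
    test_pct = [c * scale for c in test[:i_max + 1]]
    mean_sq_diff = sum((r - t) ** 2 for r, t in zip(ref_pct, test_pct)) / len(ref_pct)
    return len(ref_pct), 50.0 * math.log10(100.0 / math.sqrt(1.0 + mean_sq_diff))

times = [0.25 * k for k in range(1, 33)]  # hypothetical sampling grid, 0.25 to 8 h

n_low, f2_low = f2_until_ref_tmax(ka_test=0.732, ka_ref=1.22, times=times)    # true GMR ~90%
n_high, f2_high = f2_until_ref_tmax(ka_test=1.22, ka_ref=0.732, times=times)  # true GMR ~111%

print(n_low, n_high)  # fewer timepoints when the Reference is absorbed faster
```

On this grid, the comparison stops at 2.0 h when the Reference is the faster formulation and at 2.75 h when it is the slower one, so more timepoints enter the ƒ2 calculation in the second case.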
For both major scenarios tested, a mean ISCV% of approximately 6%, 12%, 20%, 30%, 42%, 53%, and 65% was observed for simulations with an IOV of 0%, 10%, 20%, 30%, 40%, 50%, and 60%, respectively (Figure 3).
For both major scenarios tested, and for the lowest tested variability (an IOV from 0% to 10%), average bioequivalence was shown to be the most sensitive method, being able to detect nearly 100% of the truly bioequivalent formulations simulated with a 0% IOV, and being able to detect approximately 78% to 99% of the truly bioequivalent formulations simulated with a 10% IOV, in studies with 12 or 30 subjects. On the other hand, Amean and Gmean ƒ2 factor approaches were less sensitive, detecting approximately 88% to 93% of the truly bioequivalent formulations simulated with a 0% IOV, and approximately 70% to 80% of the truly bioequivalent formulations simulated with a 10% IOV, in studies with 12 or 30 subjects (Figure 6 and Table 1 for the Test product with a lower bioavailability than the Reference product, and Figure 7 and Table 2 for a Test product with a higher bioavailability than the Reference product).
However, for simulations with a higher IOV (IOV ≥ 20%), the Amean and Gmean ƒ2 factor demonstrated a higher sensitivity than the standard average bioequivalence analysis.
Table 1. Cross-tabulated matrix statistics calculated for each bioequivalence evaluation method (average bioequivalence, centrality of the Test-to-Reference GMR, and Amean and Gmean ƒ2 factor evaluated with a cut off of 35) for each tested variability, considering a Test product with a lower bioavailability than the Reference product (i.e., true GMR of 90% and 80%). Column groups: Average Bioequivalence, GMR Centrality, Amean ƒ2 Factor, Gmean ƒ2 Factor.
The ability of the average bioequivalence method to detect truly bioequivalent formulations decreased greatly, towards approximately 35% to 70% with 20% IOV, 10% to 40% with 30% IOV, and 2% to 20% with 40% IOV, in studies with 12 or 30 subjects, respectively. For an IOV greater than 50%, the sensitivity of the method was inferior to an IOV of 10% (Figure 6 and Table 1 for a Test product with a lower bioavailability than the Reference product, and Figure 7 and Table 2 for a Test product with a higher bioavailability than the Reference product).
The sensitivity/power of the Amean and Gmean ƒ2 factor decreased as well with the increment of IOV, but not so steeply as with the standard method, allowing this alternative approach to demonstrate a superior sensitivity in the tested scenarios. For simulations with 20% IOV, the ƒ2 factor correctly identified around 60% to 70% of the truly bioequivalent formulations, while for simulations within 30% to 60% IOV, the Amean and Gmean ƒ2 factor allowed nearly 50% to 60% of the truly bioequivalent formulations to be correctly identified (Figure 6 and Table 1 for a Test product with a lower bioavailability than the Reference product, and Figure 7 and Table 2 for a Test product with a higher bioavailability than the Reference product).
Pharmaceutics 2023, 15, x FOR PEER REVIEW 15 of 24
Figure 6. Variation in sensitivity/power for the bioequivalence evaluation methods (average bioequivalence, centrality of the Test-to-Reference GMR, and Amean and Gmean ƒ2 factor evaluated with a cut off of 35) as function of the number of subjects for each tested variability (above) and as function of inter-occasion variability (below), considering a Test product with a lower bioavailability than the Reference product (i.e., true GMR of 90%).
Table 2. Cross-tabulated matrix statistics calculated for each bioequivalence evaluation method (average bioequivalence, centrality of the Test-to-Reference GMR, and Amean and Gmean ƒ2 factor evaluated with a cut off of 35) for each tested variability, considering a Test product with a higher bioavailability than the Reference product (i.e., true GMR of 111% and 125%).
Regarding the specificity, as expected, the average bioequivalence was suitable for avoiding the identification of false bioequivalent formulations and maintaining the type I error around 5%, irrespective of the sample size and IOV. The Amean and Gmean ƒ2 factor approaches performed well in avoiding type I errors for an IOV < 40%. However, these approaches inflated type I errors when the IOV increased above 40%. For an IOV of 40%, a type I error < 5% was reached for trials simulated with 14 subjects. For an IOV of 50%, a type I error < 10% was reached for trials simulated with 16 subjects, decreasing to <5% for trials simulated with 24 subjects. For an IOV of 60%, a type I error < 10% was reached for trials simulated with 20 subjects, and the type I error was close to 5% for trials simulated with 28 subjects (Figure 8 and Table 1 for a Test product with a lower bioavailability than the Reference product, and Figure 9 and Table 2 for a Test product with a higher bioavailability than the Reference product).
For each of the two major scenarios, precision, NPV, accuracy, MCC, F1 and κ were also calculated in order to better understand the potential of each evaluation method in pilot BA/BE trials (Figure 10 and Table 1 for a Test product with a lower bioavailability than the Reference product, and Figure 11 and Table 2 for a Test product with a higher bioavailability than the Reference product).
The ƒ2 factor was always the most precise method, i.e., the method for which the identified bioequivalent formulations were most likely to be truly bioequivalent (Figure 10 and Table 1 for a Test product with a lower bioavailability than the Reference product, and Figure 11 and Table 2 for a Test product with a higher bioavailability than the Reference product).
Moreover, for higher-variability scenarios (an IOV ≥ 20%), the ƒ2 method was also the most reliable method for the identification of truly bioinequivalent formulations, i.e., the methodology with a higher NPV. For this method, the NPV varied little with the increment of subjects within the same variability scenarios (approximately 90% for 0% IOV, 80% for 10% IOV, and 70% for an IOV ≥ 20%). However, for lower variabilities (an IOV < 20%), the average bioequivalence was the method with a higher NPV (Figure 10 and Table 1 for a Test product with a lower bioavailability than the Reference product, and Figure 11 and Table 2 for a Test product with a higher bioavailability than the Reference product).
Figure 11. Variation in precision, negative predictive value (NPV), accuracy, F1, Matthews' Correlation Coefficient (MCC), and Cohen's Kappa (κ) for the bioequivalence evaluation methods (average bioequivalence, centrality of the Test-to-Reference GMR, and Amean and Gmean ƒ2 factor evaluated with a cut off of 35) as function of the number of subjects for each tested variability, considering a Test product with a higher bioavailability than the Reference product (i.e., true GMR of 111% and 125%).
For an IOV ≥ 20%, the ƒ2 factor was also the most accurate methodology, showing a similar accuracy, despite the increase in sample size, within each simulated variability scenario (approximately 100% for 0% IOV, 90% for 10% IOV, 80% for an IOV within 20% and 30%, and 70% for an IOV within 40% and 60%) (Figure 10 and Table 1 for a Test product with a lower bioavailability than the Reference product, and Figure 11 and Table 2 for a Test product with a higher bioavailability than the Reference product).
Weighing sensitivity and precision together, the average bioequivalence was the method with the lowest F1. On the other hand, the ƒ2 factor method maintained a balance between sensitivity and precision, with an F1 of approximately 100% for a 0% IOV, 80% for an IOV within 10% and 20%, 70% for an IOV within 30% and 40%, and 60% for an IOV within 50% and 60% (Figure 10 and Table 1 for a Test product with a lower bioavailability than the Reference product, and Figure 11 and Table 2 for a Test product with a higher bioavailability than the Reference product).
Considering the correlation between the true classes and the predicted labels, once again, the average bioequivalence was the method that scored lowest, and the ƒ2 factor was again the best-performing method, with an MCC of approximately 90% for 0% IOV, 80% for a 10% IOV, 70% for a 20% IOV, 60% for an IOV within 30% and 40%, and 50% for an IOV within 50% and 60% (Figure 10 and Table 1 for a Test product with a lower bioavailability than the Reference product, and Figure 11 and Table 2 for a Test product with a higher bioavailability than the Reference product).
Additionally, the study of the distribution of the calculated ƒ2 values for truly bioequivalent and truly bioinequivalent studies could also improve the certainty of the obtained results. These simulations showed that nearly 100% of the ƒ2 values above or equal to 50 (corresponding to a 10% difference between Test and Reference products [3,14]) were true positives (i.e., precision), irrespective of the IOV. Moreover, up to a 40% IOV, more than 90% of the ƒ2 values above or equal to 41 (corresponding to a 15% difference between the Test and Reference products [3,14]) were true positives, and for an IOV within 50% to 60%, the precision of an ƒ2 above or equal to 41 was above 80%. For ƒ2 factors above or equal to 35, the precision was above 90% for simulations below a 20% IOV, and was above 80% for an IOV of 30%. For an IOV above 40%, more than 60% of the ƒ2 values above or equal to 35 were true positives (Figure 12). The combination of the ƒ2 factor method with the centrality of the GMR did not improve the precision of the method.
Simulations also showed that, for truly bioequivalent formulations where Test and Reference differ by a maximum of 10% on Cmax, the probability of the point estimate (GMR) being centered within [90.00-111.11]% is around 60% in baseline studies without a tested IOV. This probability decreases linearly to between 32.2% and 42.2% for simulations with a 60% IOV, in trials with 12 or 30 subjects. For truly bioinequivalent formulations where Test and Reference differ by at least 20% on Cmax, the probability of a falsely centered GMR can be around 20% for the higher tested IOV. Nevertheless, the probability that a centered GMR indicates a truly bioequivalent simulation (precision) was nearly 100% for simulations with a 10% IOV, approximately 80% to 90% for simulations with an IOV between 20% and 30%, and approximately 60% to 70% for simulations with an IOV between 40% and 60% (Figure 10 and Table 1 for a Test product with a lower bioavailability than the Reference product, and Figure 11 and Table 2 for a Test product with a higher bioavailability than the Reference product).

Discussion
The average bioequivalence method proved to be the most sensitive method in the simulations performed for the lowest tested variability scenarios (an IOV from 0% to 10%), in both major scenarios tested. Based on Figure 6 and Table 1 results (regarding a Test product with a lower bioavailability than the Reference product), considering an IOV of 0%, the sensitivity was ≥99.4% for the average bioequivalence method, while for the other methods it was ≤94.4%. Considering an IOV of 10%, the sensitivity was ≥78.4% for the average bioequivalence method, while for the other methods it was ≤79.5%. Based on Figure 7 and Table 2 results (regarding a Test product with a higher bioavailability than the Reference product), for an IOV of 0% the sensitivity was 100% for the average bioequivalence method, and it was ≤97.0% for the other methods. Considering an IOV of 10%, the sensitivity was ≥77.7% for the average bioequivalence method and ≤77.6% for the other methods.
However, for an IOV ≥ 20%, the Amean and Gmean ƒ2 factor approaches showed a higher sensitivity/power than the standard average bioequivalence analysis (Figure 6 and Table 1 for a Test product with a lower bioavailability than the Reference product, and Figure 7 and Table 2 for a Test product with a higher bioavailability than the Reference product). Based on Table 1 results and considering an IOV of 20%, the sensitivity values derived for the Amean and Gmean ƒ2 factor methods spanned a narrower range (between 60.2 and 68.8%), while for the average bioequivalence method the derived sensitivity ranged from 35.2 to 74.3%. For an IOV ≥ 30%, the derived sensitivity for the Amean and Gmean ƒ2 factor methods was ≥49.7%, and it was always considerably higher than the sensitivity for the average bioequivalence method in each scenario. Based on Table 2 results and considering an IOV of 20%, the sensitivity values derived for the Amean and Gmean ƒ2 factor methods also spanned a narrower range (between 62.5 and 63.5%), while for the average bioequivalence method the derived sensitivity ranged from 36.2 to 67.6%. For an IOV ≥ 30%, the derived sensitivity for the Amean and Gmean ƒ2 factor methods was ≥46.9%, and it was also always considerably higher than the sensitivity for the average bioequivalence method in each scenario.
The sensitivity/power of the average bioequivalence method demonstrated a sigmoidal decrease, from ≈100% to ≈0%, as a function of IOV, with slopes decreasing considerably as the number of subjects per trial increased (Figures 6 and 7). Thus, such results confirm that the power of this method depends strongly on the number of subjects.
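The dependence of the average bioequivalence method on both variability and sample size follows from the way the 90% CI is built. As a minimal sketch, assuming a balanced 2×2 crossover, a log-normal Cmax, and an observed point estimate equal to the true GMR (a simplification, since simulated point estimates scatter around the true value), the CI can be written in Python as below. The function names are hypothetical, and the critical value `t_crit` (the 0.95 quantile of Student's t with n − 2 degrees of freedom) is supplied by the caller; 1.734 is the tabulated value for 18 degrees of freedom.

```python
from math import exp, log, sqrt


def gmr_90ci(gmr, iscv, n_subjects, t_crit):
    """90% CI (in %) for the Test-to-Reference GMR in a balanced 2x2 crossover.

    gmr        -- point estimate as a ratio (e.g. 0.90)
    iscv       -- intra-subject CV as a fraction (e.g. 0.30)
    n_subjects -- total number of subjects completing both periods
    t_crit     -- 0.95 quantile of Student's t with n_subjects - 2 df (assumed input)
    """
    within_var = log(1 + iscv ** 2)               # residual variance on the log scale
    std_err = sqrt(2 * within_var / n_subjects)   # SE of the Test - Reference difference
    half_width = t_crit * std_err
    return (100 * exp(log(gmr) - half_width), 100 * exp(log(gmr) + half_width))


def passes_abe(ci, lower=80.0, upper=125.0):
    """True when the whole 90% CI lies within the acceptance interval."""
    return lower <= ci[0] and ci[1] <= upper


# 20 subjects (18 df), true GMR of 90%, increasing variability.
for iscv in (0.10, 0.20, 0.30):
    ci = gmr_90ci(gmr=0.90, iscv=iscv, n_subjects=20, t_crit=1.734)
    print(f"ISCV {iscv:.0%}: 90% CI = [{ci[0]:.1f}, {ci[1]:.1f}]%, pass = {passes_abe(ci)}")
```

Under these assumptions the CI half-width scales with the square root of the within-subject variance divided by the number of subjects, so a GMR of 90% leaves little room before the lower limit crosses 80% as the variability rises.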
On the other hand, and based on the same figures, the sensitivity/power of the Amean and Gmean ƒ2 factor methods decreased exponentially from ≈90% to a plateau of ≈60%, showing no meaningful differences in the slope with the increase in the number of subjects per trial. Moreover, for higher variabilities (an IOV > 30%), further increases in IOV did not produce a greater decrease in sensitivity. Considering that this method relies on the mean profile of the concentration-time curves, the increase in sample size does not greatly increase the sensitivity of the ƒ2 factor method. Nevertheless, for a higher variability (an IOV > 40%), the increase in the sample size can reduce the rate of type I errors, as assessed by the specificity. For an IOV of 40%, a type I error < 5% was reached for trials simulated with 14 subjects, and for an IOV of 50%, a type I error < 5% was reached for trials simulated with 24 subjects. For an IOV of 60%, the type I error was close to 5% for trials simulated with 28 subjects (Figure 8 for a Test product with a lower bioavailability than the Reference product, and Figure 9 for a Test product with a higher bioavailability than the Reference product).
Based on Figure 10 and Table 1 for a Test product with a lower bioavailability than the Reference product, as well as on Figure 11 and Table 2 for a Test product with a higher bioavailability than the Reference product, the ƒ2 factor was always the most precise method and the method that demonstrated the best relationship between sensitivity/power and precision (F1). The ƒ2 factor was also the method with the best correlation between reality and the method prediction, as defined by accuracy, MCC, and κ. Additionally, the ƒ2 factor was the most reliable method for the identification of truly bioinequivalent formulations (NPV) when the variability was high (≥20%). Nevertheless, for lower variability scenarios (<20%), bioinequivalent results were more reliable for the average bioequivalence method.
Considering the observations from the current simulations, the authors refined the previously proposed decision tree for the analysis of data from pilot BA/BE studies [3] (Figure 13). This decision tree is intended to assist companies in their decision to move forward with a full-size pivotal study, for drugs following a one-compartment model, with a median tmax ranging from 0.75 to 8 h, a mean elimination half-life of approximately 4.6 h, and a mean volume of distribution of approximately 60 L, as the limits of the tested scenarios. As before, for drug products with a known ISCV% below 20%, the authors propose the estimation of the sample size for a pilot study assuming a GMR of 100%, a power of 80%, and an α of 0.05 [3]. However, for cases of higher ISCV% or unknown variability, the authors propose the use of a fixed sample size of 20 subjects in the current work, as larger sample sizes were not shown to increase the study power meaningfully, while this sample size was sufficient to avoid substantial type I errors. Regarding the analysis of data from pilot studies, the authors keep the methodology previously proposed [3], i.e., to initially analyze the data using the average bioequivalence approach. For the case in which the calculated GMR and the corresponding 90% CI are not within [80.00-125.00]%, the alternative Gmean ƒ2 factor method should be used with a cut off of 35 (Figure 13), as it was shown to be a valuable indicator of the potential of the Test formulation to be bioequivalent in terms of Cmax with a Reference product [3]. Nevertheless, in this work the authors redefine the interpretation of the Gmean ƒ2 factor results based on the magnitude of the calculated value (Figures 12 and 13):

1. If the ƒ2 factor is above or equal to 35 (corresponding to a difference of 20% between Test and Reference concentration-time profiles until the Reference tmax), the confidence to proceed to a pivotal study is higher than 90% when ISCV% is lower than or equal to 20%; the confidence is higher than 80% when ISCV% is between 20% and 30%; and the confidence is higher than 60% when ISCV% is higher than 40%.
2. If the ƒ2 factor is above or equal to 41 (corresponding to a difference of 15% between Test and Reference concentration-time profiles until the Reference tmax), the confidence to proceed to a pivotal study is higher than 90% for ISCV% up to 40%, and higher than 80% for ISCV% between 50% and 60%.
3. If the ƒ2 factor is above or equal to 50 (corresponding to a difference of 10% between Test and Reference concentration-time profiles until the Reference tmax), the probability of the Test product being truly bioequivalent to the Reference product in terms of Cmax, i.e., the confidence to proceed to a pivotal study, is higher than 90%, irrespective of the ISCV%.
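The three interpretation rules above can be expressed as a small lookup. The Python sketch below is only an illustration of those rules: the function name is hypothetical, and the handling of ISCV% ranges that the rules do not state explicitly (between 30% and 40% for a cut off of 35, between 40% and 50% for a cut off of 41, and anything above the tested 60%) is an assumption of this sketch, resolved conservatively.

```python
def confidence_to_proceed(f2, iscv_percent):
    """Lower bound (%) on the confidence to proceed to a pivotal study, given the
    Gmean f2 factor and the ISCV%, or None when no level is stated for the case."""
    if f2 >= 50:
        return 90  # stated as irrespective of the ISCV%
    if f2 >= 41:
        if iscv_percent <= 40:
            return 90
        if iscv_percent <= 60:
            return 80  # stated for 50-60%; 40-50% is an assumption of this sketch
        return None    # above the tested variability range (assumption)
    if f2 >= 35:
        if iscv_percent <= 20:
            return 90
        if iscv_percent <= 30:
            return 80
        if iscv_percent <= 60:
            return 60  # stated for ISCV% > 40%; 30-40% is an assumption of this sketch
        return None    # above the tested variability range (assumption)
    return None        # f2 below the cut off of 35: no confidence level is given


# Hypothetical pilot results, for illustration only.
for f2, iscv in ((52, 55), (43, 35), (36, 25), (30, 10)):
    print(f"f2 = {f2}, ISCV = {iscv}% -> {confidence_to_proceed(f2, iscv)}")
```

The rules are checked from the strictest cut off downwards so that a given ƒ2 value always receives the highest confidence level it qualifies for.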

Conclusions
Because of their reduced sample size, pilot BA/BE trials are underpowered, and the results derived from such trials for drugs or drug products showing considerable variability (ISCV% > 20%) are dubious; consequently, the conclusions on the potential of a Test formulation to be bioequivalent to a Reference formulation are uncertain. Therefore, the authors have proposed the Gmean ƒ2 as an alternative approach to the average bioequivalence methodology that is generally applied to pilot studies to assess the rate of drug absorption [3]. The Gmean ƒ2 was shown to be capable of reducing the uncertainty of these underpowered studies, which can meaningfully aid pharmaceutical companies in the decision to go forward with pivotal bioequivalence studies [3].
In this project, the authors continued their previous work [3] and performed simulations in more realistic and more extreme scenarios, using a true GMR of 90% or 111% for truly bioequivalent formulations, and 80% or 125% for truly bioinequivalent formulations, in order to better derive conclusions on the potential of this analysis method.
A redefined decision tree is proposed, suggesting a fixed sample size of 20 subjects for pilot studies in the case of an ISCV% > 20% or unknown variability, and suggesting the assessment of study results through the average bioequivalence analysis and, additionally, through the Gmean ƒ2 factor method when the 90% CI for the GMR is outside the regulatory bioequivalence acceptance interval of [80.00-125.00]% (Figure 13). Using this alternative approach, the certainty levels to proceed to pivotal studies, depending on the Gmean ƒ2 values and the variability scenarios tested (20-60% IOV), were assessed, which is expected to be helpful in the decision to go forward with pivotal bioequivalence studies.

Figure 2. Distribution of the Test-to-Reference Geometric Least Square Means Ratio (GMR), estimated from the average bioequivalence method, in the form of box plots.

Figure 3. Distribution of the intra-subject coefficient of variation (ISCV%), estimated from the average bioequivalence method, in the form of box plots.

Figure 4. Distribution of the calculated ƒ2 factor, in the form of box plots.

Figure 5. Relationship between Gmean ƒ2 factor and Test-to-Reference GMR (above) for all simulated truly bioequivalent (blue) and truly bioinequivalent (red) studies. Vertical dotted lines correspond to 10% and 20% difference between Test and Reference formulations, tested by the average bioequivalence approach. Horizontal dotted lines correspond to ƒ2 values of 50, 41, and 35.

Figure 6. Variation in sensitivity/power for the bioequivalence evaluation methods (average bioequivalence, centrality of the Test-to-Reference GMR, and Amean and Gmean ƒ2 factor evaluated with a cut off of 35) as function of the number of subjects for each tested variability (above) and as function of inter-occasion variability (below), considering a Test product with a lower bioavailability than the Reference product (i.e., true GMR of 90%).

Figure 7. Variation in sensitivity/power for the bioequivalence evaluation methods (average bioequivalence, centrality of the Test-to-Reference GMR, and Amean and Gmean ƒ2 factor evaluated with a cut off of 35) as function of the number of subjects for each tested variability (above) and as function of inter-occasion variability (below), considering a Test product with a higher bioavailability than the Reference product (i.e., true GMR of 111%).

Figure 8. Variation in specificity for the bioequivalence evaluation methods (average bioequivalence, centrality of the Test-to-Reference GMR, and Amean and Gmean ƒ2 factor evaluated with a cut off of 35) as function of the number of subjects for each tested variability (above) and as function of inter-occasion variability (below), considering a Test product with a lower bioavailability than the Reference product (i.e., true GMR of 80%).

Figure 9. Variation in specificity for the bioequivalence evaluation methods (average bioequivalence, centrality of the Test-to-Reference GMR, and Amean and Gmean ƒ2 factor evaluated with a cut off of 35) as function of the number of subjects for each tested variability (above) and as function of inter-occasion variability (below), considering a Test product with a higher bioavailability than the Reference product (i.e., true GMR of 125%).

Figure 10. Variation in precision, negative predictive value (NPV), accuracy, F1, Matthews' Correlation Coefficient (MCC), and Cohen's Kappa (κ) for the bioequivalence evaluation methods (average bioequivalence, centrality of the Test-to-Reference GMR, and Amean and Gmean ƒ2 factor evaluated with a cut off of 35) as function of the number of subjects for each tested variability, considering a Test product with a lower bioavailability than the Reference product (i.e., true GMR of 90% and 80%).

Figure 11. Variation in precision, negative predictive value (NPV), accuracy, F1, Matthews' Correlation Coefficient (MCC), and Cohen's Kappa (κ) for the bioequivalence evaluation methods (average bioequivalence, centrality of the Test-to-Reference GMR, and Amean and Gmean ƒ2 factor evaluated with a cut off of 35) as function of the number of subjects for each tested variability, considering a Test product with a higher bioavailability than the Reference product (i.e., true GMR of 111% and 125%).

Figure 12. Variation in precision, for the Gmean ƒ2 factor evaluated with a cut off of 35, 41, and 50 as function of the number of subjects for each tested variability. An ƒ2 factor of 35, 41, and 50 corresponds to a difference of 20%, 15%, and 10%, respectively, between Test and Reference concentration-time profiles until the Reference tmax.

Figure 13. Newly proposed decision tree for planning and analysis of pilot BA/BE studies.
Values represent the range calculated from simulated studies with 12 and 30 subjects. When statistics do not change between 12 and 30 subjects, unique values are presented instead of ranges. F1: harmonic mean of sensitivity and precision; κ: Cohen's Kappa; MCC: Matthews' correlation coefficient; NPV: negative predictive value; NC: not calculated.