Evaluation of the Predictive Performance of Population Pharmacokinetic Models of Adalimumab in Patients with Inflammatory Bowel Disease

Adalimumab is a monoclonal antibody used for inflammatory bowel disease. Because of its highly variable pharmacokinetics, the risk of loss of response and the development of anti-adalimumab antibodies, a model-informed precision dosing approach is highly recommended. The aim of this study is to evaluate the predictive performance of different population pharmacokinetic models of adalimumab for inflammatory bowel disease in order to determine the model(s) that best suit our population for use in clinical routine. A retrospective observational study with 134 patients was conducted at the General University Hospital of Alicante between 2014 and 2019. The adequacy of each model was evaluated by the distribution of the individual pharmacokinetic parameters and the NPDE plots, whereas predictive performance was assessed by calculating bias and precision. Moreover, stochastic simulations were performed to optimize the maintenance doses in the clinical protocols so as to reach the target of 8 mg/L in at least 75% of the population. Of the six population pharmacokinetic models found in the literature, two were selected because they performed better in terms of adequacy and predictive performance. The stochastic simulations suggested the benefits of increasing the maintenance dose in the protocol to reach the 8 mg/L target.


Introduction
Crohn's disease (CD) and ulcerative colitis (UC) are chronic inflammatory bowel diseases (IBD) characterized by the intermittent destructive inflammation of the intestinal tract associated with significant morbidity, high burden of hospitalization and a severe impact on the quality of life of patients. There are several pharmacological alternatives available, including corticosteroids, immunosuppressive agents (methotrexate or azathioprine) and monoclonal antibodies that have shown clinical response in the treatment of these diseases [1][2][3].
Adalimumab is a human monoclonal antibody that binds specifically to the tumor necrosis factor (TNF) and neutralizes its biological function, decreasing the process of inflammation. Adalimumab is effective for induction and maintenance of remission in patients with moderate-to-severe IBD older than 6 years who fail with corticosteroids, immunosuppressive agents or other biologic therapy [4][5][6].
Several published studies of adalimumab have shed light on the clinical relevance of individualized dosing. Historically, the empiric approach to adapting the adalimumab dosage has consisted of intensifying the treatment in patients with loss of response and later, if this fails, switching to another biological treatment. In the last decade, several studies have shown that some patients can experience a loss of response to adalimumab or can develop antibodies against adalimumab (AAA) after long periods of subtherapeutic drug levels [7][8][9][10][11][12][13][14]. However, in most cases, serum concentration-guided dosing has been carried out through algorithms [15,16].
In this line, Model-Informed Precision Dosing (MIPD) is an approach based on population PK (PopPK) models and a prospective Bayesian approach that aims to make drug exposure more homogeneous across patients, improving treatment outcomes by achieving the optimal balance between efficacy and toxicity for each individual patient [17]. IBD patients could benefit from dose optimization because adalimumab has highly variable pharmacokinetics (PK) [16,18].
Recently, a multicenter retrospective study showed that early monitoring of adalimumab levels combined with an MIPD approach can prevent immunogenicity and achieve better long-term outcomes in terms of IBD-related surgery or hospitalization, with a lower risk of developing AAA or serious infusion reactions; it also proved to be more cost-effective than empirical and/or reactive dose optimization by dose escalation [19]. However, the selection of the appropriate PopPK model is fundamental to apply MIPD, especially when multiple models for patients with IBD are available in the literature. In most of them, the structural model is defined as a one-compartment model with linear absorption and elimination kinetics, although the values of the PopPK parameters and the covariates included in the model vary significantly. Therefore, the aim of this study is to evaluate the predictive performance of the PopPK models of adalimumab found in the literature in patients with IBD, in order to determine the pharmacokinetic model(s) best suited to our population for subsequent use in the clinical setting with MIPD.

Literature Search
A systematic literature search was conducted in databases in the field of Health Sciences: MEDLINE (via PubMed), Embase and Scopus. To define the search terms, the Medical Subject Headings (MeSH), a thesaurus developed by the U.S. National Library of Medicine, was used. The MeSH descriptors "Crohn Disease", "Colitis, Ulcerative", "adalimumab" and "pharmacokinetics" were considered suitable. Likewise, these terms, together with "inflammatory bowel diseases", were used to query the databases in the title and abstract field (Title/Abstract). The search covered the period from the first available date until May 2021, according to the characteristics of each database. Additionally, a manual search for population models was conducted by inspecting the bibliographies of relevant journal articles to minimize the number of relevant papers missed by the review.
The search strategy was defined for PubMed and adapted to the other databases. The inclusion criteria were the following: original articles published in peer-reviewed journals that describe a novel population pharmacokinetic model, with the complete text available and written in English or Spanish. Only one version of each document was included. The exclusion criteria were the following: articles on diseases other than CD or UC and studies developed in animal models.
The following information was extracted from the articles: patient characteristics, model structure, typical PopPK parameters, inter-individual variability (IIV), residual variability (RV) and covariates.

Study Design
A retrospective observational study was conducted at the General University Hospital of Alicante, performed on patients diagnosed with IBD undergoing treatment with adalimumab and who followed a dose optimization program developed between 2014 and 2019.

Patients and Data Collection
Trough serum concentrations (TSC) were collected from patients diagnosed with moderate or severe IBD treated with adalimumab at the General University Hospital of Alicante, Spain. The following inclusion criteria were applied: participants had to be diagnosed with IBD, treated with adalimumab, and have at least two adalimumab TSC in their medical history. Patients were excluded if they were treated with monoclonal antibodies other than adalimumab, such as infliximab, vedolizumab or ustekinumab, or if they were diagnosed with autoimmune diseases other than IBD, such as rheumatoid arthritis, psoriasis or ankylosing spondylitis.
Relevant data were collected from the medical records and included age, sex, height, body weight, lean body weight (LBW), body mass index (BMI), AAA status and AAA serum concentration, dose of adalimumab, adalimumab serum concentration, serum albumin levels, serum C-reactive protein (CRP) levels, fecal calprotectin (FCP), type of disease, use of concomitant immunomodulators and time of the event recorded. Missing values of continuous covariates were imputed by their expected mean values. Data were excluded from the analysis if there was uncertainty about any relevant information such as the time of dosing or the time of drug concentration measurement and the loss to follow-up during their treatment.
Serum adalimumab concentrations and AAA were measured using an enzyme-linked immunosorbent assay (LISA TRACKER Duo Drug + ADAb from TheraDiag ® ) with a limit of quantification established to be 0.1 mg/L. Patients were considered as positive for AAA if titers were above 10 mg/L on at least one occasion.

Evaluation of Model Adequacy
The first step in the evaluation of the different PopPK models found in the literature was to assess model adequacy by analyzing and comparing how well each model describes the studied population using all the available TSC in the dataset (full dataset). Models that showed the greatest systematic bias in the empirical Bayesian estimates (EBEs) of the PK parameters, or in the Normalised Prediction Distribution Errors (NPDE) [20,21], were discarded. Only the models that properly described our population were used in the subsequent evaluation of predictive performance.
Therefore, the distribution of the EBEs of the PK parameters was calculated for each PopPK model after performing a post-hoc analysis using the full dataset. This distribution was then compared with the theoretical distribution of these PK parameters according to each PopPK model.
On the other hand, any trends observed in the NPDE plots (e.g., cone-shaped graph) might indicate model misspecifications and inferior model adequacy.
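The post-hoc step that yields an EBE can be illustrated with a minimal sketch. It assumes a one-compartment model with first-order absorption at steady state, a single log-normal random effect on CL/F, and an additive residual error; the parameter values, the additive error model, the grid search and the function names (`trough_ss`, `map_eta_cl`) are all hypothetical choices for illustration and are not taken from any of the evaluated models or from the software used in the study.

```python
import math

def trough_ss(dose_mg, tau_h, cl, v, ka):
    """Steady-state trough (mg/L), one-compartment model, first-order absorption.
    cl and v are apparent values (CL/F in L/h, V/F in L); ka is in 1/h."""
    ke = cl / v
    acc_e = math.exp(-ke * tau_h) / (1.0 - math.exp(-ke * tau_h))
    acc_a = math.exp(-ka * tau_h) / (1.0 - math.exp(-ka * tau_h))
    return dose_mg * ka / (v * (ka - ke)) * (acc_e - acc_a)

def map_eta_cl(obs_troughs, dose_mg, tau_h, cl_pop, v_pop, ka, omega_cl, sigma_add):
    """Maximum a posteriori estimate of eta on CL/F by a simple grid search.
    Objective: weighted squared residuals plus the prior penalty on eta."""
    best_eta, best_obj = 0.0, float("inf")
    for k in range(-2000, 2001):
        eta = k / 1000.0
        pred = trough_ss(dose_mg, tau_h, cl_pop * math.exp(eta), v_pop, ka)
        obj = sum(((y - pred) / sigma_add) ** 2 for y in obs_troughs)
        obj += (eta / omega_cl) ** 2
        if obj < best_obj:
            best_eta, best_obj = eta, obj
    return best_eta

# Hypothetical population values (not those of M1 to M6).
CL_POP, V_POP, KA, OMEGA_CL, SIGMA = 0.015, 8.0, 0.02, 0.4, 1.0

# A patient whose true clearance is exp(0.3) times the typical value.
y_obs = trough_ss(40.0, 336.0, CL_POP * math.exp(0.3), V_POP, KA)
eta_one = map_eta_cl([y_obs], 40.0, 336.0, CL_POP, V_POP, KA, OMEGA_CL, SIGMA)
eta_two = map_eta_cl([y_obs, y_obs], 40.0, 336.0, CL_POP, V_POP, KA, OMEGA_CL, SIGMA)
print(eta_one, eta_two)
```

In this toy setting the estimate is shrunk toward the population value (eta = 0), and the shrinkage decreases when a second concentration is available. Systematic differences between the EBEs and the theoretical parameter distribution are the kind of bias that the adequacy evaluation looks for.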

Evaluation of Predictive Performance
The evaluation of predictive performance was only performed in those models which best describe the studied population, according to the evaluation of the model adequacy.
To evaluate the predictive performance, the last TSC of each patient, named "last observed TSC", was left out and not used to calculate the EBEs. The individual prediction of this last TSC was then estimated for each patient using the EBEs, and bias and precision were calculated by comparing the last observed TSC with the individual predictions of each PK model. Predictive performance was evaluated in two different scenarios:
Scenario 1: the EBEs were calculated from the previous TSC obtained from each patient.
Scenario 2: the EBEs were calculated from the two previous TSC of each patient.
In the bias and precision equations, Y-hat represents the model-predicted adalimumab concentration, Y represents the observed adalimumab concentration, and n is the number of observations.
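The two equations themselves are not reproduced in this text. Assuming the commonly used definitions, with bias as the mean prediction error and precision as the root mean squared error (an assumption that the text does not confirm), they would read as follows, using the symbols defined above:

```latex
\mathrm{Bias} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{Y}_i - Y_i\right),
\qquad
\mathrm{Precision} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{Y}_i - Y_i\right)^{2}}
```

Under these definitions both quantities are expressed in concentration units (mg/L), and a negative bias indicates that the model underpredicts the observed concentrations on average.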
A bootstrap of the data was performed to compare the statistical significance of the differences between bias and precision among the selected models.
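As an illustration of this step, the sketch below computes bias and precision and a percentile bootstrap confidence interval. It assumes that bias is the mean prediction error and precision the root mean squared error; the data, the function names and the bootstrap settings are hypothetical and do not reproduce the analysis of the study.

```python
import math
import random

def bias(pred, obs):
    """Mean prediction error (predicted minus observed)."""
    return sum(p - o for p, o in zip(pred, obs)) / len(obs)

def precision(pred, obs):
    """Root mean squared prediction error."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def bootstrap_ci(pred, obs, metric, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap CI, resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(obs)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([pred[i] for i in idx], [obs[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical last observed TSC and their individual predictions (mg/L).
observed = [6.0, 8.5, 3.2, 10.1, 7.4]
predicted = [5.5, 8.0, 4.0, 9.0, 7.0]

b = bias(predicted, observed)
rmse = precision(predicted, observed)
ci_low, ci_high = bootstrap_ci(predicted, observed, bias)
print(b, rmse, (ci_low, ci_high))
```

To compare two models, the same resampled patient indices could be applied to both sets of predictions and the difference in the metric collected at each iteration; an interval for the difference that excludes zero would then suggest a significant difference.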

Clinical Impact
Stochastic simulations were performed to optimize the initial maintenance doses in the clinical protocols, in order to achieve the target TSC in at least 75% of the population. The simulated dosage regimens were 40 and 80 mg administered subcutaneously every week or every other week. The target TSC considered was 8 mg/L for clinical remission [18,22].
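A stochastic simulation of this kind can be sketched as follows. The example assumes a one-compartment model with first-order absorption at steady state and log-normal inter-individual variability on CL/F only; all parameter values and function names are hypothetical, no covariates such as AAA are included, and the numbers it produces are not the results of M2 or M4.

```python
import math
import random

def trough_ss(dose_mg, tau_h, cl, v, ka):
    """Steady-state trough (mg/L), one-compartment model, first-order absorption.
    The closed form is undefined only in the unlikely case ka == cl / v."""
    ke = cl / v
    acc_e = math.exp(-ke * tau_h) / (1.0 - math.exp(-ke * tau_h))
    acc_a = math.exp(-ka * tau_h) / (1.0 - math.exp(-ka * tau_h))
    return dose_mg * ka / (v * (ka - ke)) * (acc_e - acc_a)

def fraction_on_target(dose_mg, tau_h, n=5000, target=8.0, seed=42,
                       cl_pop=0.015, v_pop=8.0, ka=0.02, omega_cl=0.4):
    """Fraction of simulated patients whose steady-state trough reaches the target.
    Defaults are illustrative assumptions (CL/F in L/h, V/F in L, ka in 1/h)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        cl_i = cl_pop * math.exp(rng.gauss(0.0, omega_cl))  # individual CL/F
        if trough_ss(dose_mg, tau_h, cl_i, v_pop, ka) >= target:
            hits += 1
    return hits / n

# The four simulated regimens: 40 or 80 mg, every week or every other week.
regimens = {
    "40 mg every other week": (40.0, 336.0),
    "40 mg every week": (40.0, 168.0),
    "80 mg every other week": (80.0, 336.0),
    "80 mg every week": (80.0, 168.0),
}
for label, (dose, tau) in regimens.items():
    frac = fraction_on_target(dose, tau)
    print(f"{label}: {frac:.1%} at or above 8 mg/L, meets 75% goal: {frac >= 0.75}")
```

Using the same seed for every regimen means that the same virtual patients are compared across regimens, so differences in the fraction on target reflect the regimen rather than sampling noise.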

Ethics Approval
All studies were conducted in accordance with principles for human experimentation as defined in the Declaration of Helsinki and were approved by the Human Investigational Review Board of each study center.

Consent
The need for written consent was waived because of the retrospective nature of the study.

Literature Search
A total of 211 publications (72, 52 and 87 from PubMed, Embase and Scopus, respectively), published from 2003 to 2021, were retrieved from the database search using the keywords mentioned in the methods section. After removing duplicate articles and applying the inclusion and exclusion criteria, six PopPK models [26][27][28][29][30][31] were selected. The models were numbered from 1 to 6 and are referred to as M1 to M6. All selected PopPK models were one-compartment models. Four of them were developed using only trough levels of adalimumab (M2, M3, M4 and M5), whereas the others (M1 and M6) were derived from complete profiles of serum concentrations of adalimumab. Five of the six models were developed using NONMEM ® software, while one model (M2) was developed using Monolix ® software. Further information can be found in Table 1.
Typical values for adalimumab apparent clearance (CL/F) in the studies ranged from 11.7 to 17.5 mL/h, with the lowest value reported in the study performed in a pediatric population (M3). The typical apparent volume of distribution (V/F) ranged from 4.07 to 13.5 L. The absorption rate constant (ka) was estimated in three models and fixed in the others. All models estimated the IIV (coefficient of variation [CV], in percent) associated with adalimumab CL/F, with values ranging from 16.4% to 65%. Three models (M1, M2 and M5) estimated the IIV of V/F, with values ranging from 35.1% to 48%. The characteristics of each study are summarized in Table 2.

Patients
The dataset included 134 IBD patients in treatment with adalimumab with at least two TSC. Baseline demographics, disease characteristics and missing values for the different covariates of the patient population are listed in Table 3. Seventy-five percent of the patients were younger than 57 years and weighed less than 75 kg. Approximately 85% of the patients were diagnosed with CD and 8% of them developed AAA.
Eighty-two patients were treated subcutaneously with 160/80 mg and 18 with 80/40 mg at weeks 0/2 as an induction phase. For the rest, the information regarding the induction phase was not available in their medical histories. Following this phase, as a maintenance phase, all patients were treated with 40 mg of adalimumab every other week. A total of 398 TSC in the maintenance phase were available for the analysis; at the first measurement, 25.4% of the concentrations were over 8 mg/L, 46.3% were between 3 and 8 mg/L and 28.3% were below 3 mg/L. AAA were detected in 11 patients. Seventy-three patients were on a concomitant immunomodulator (azathioprine, 6-mercaptopurine, methotrexate or prednisone).
The dosage regimen was increased to 40 mg every 10 days or 40 mg every week, on 31 and 70 dose adjustments, respectively. Similarly, the dosage regimen was increased to 80 mg every other week or 80 mg every week, on 7 and 11 dose adjustments, respectively. On the other hand, on 7 dose adjustments, the dosage regimen was decreased to 40 mg every 3 weeks, at any time during their treatment. In 36 patients the dosage regimen was maintained at 40 mg every other week.

Evaluation of Model Adequacy
The distribution of the individual CL/F obtained in the post-hoc analysis, compared with its theoretical distribution, is represented in Figure 1. The distribution of the individual V/F was not analyzed because half of the models (M3, M4 and M6) did not include IIV in the V/F. The QQ-plot of the NPDE and their distribution versus time are depicted in Figure 2.
In M2 and M4, the 20th and 80th percentiles of the EBEs of CL/F are close to the 95% confidence interval of the 20th and 80th percentiles of the simulated distribution of CL/F for these models. Moreover, the NPDE performed better in these models. Hence, M2 and M4 were the models that best described the studied population, with less bias and better NPDE performance. Therefore, the predictive performance was evaluated in these models.
Figure 3 shows the predictive performance for M2 and M4 represented as the IRES vs. the model-based prediction of the last observed TSC. Both models behave similarly, with a limited bias and a similar dispersion of the IRES. Table 4 also shows the bias and precision for M2 and M4 and their confidence intervals. M2 and M4 are statistically better (p < 0.05) than the other models in terms of bias and precision in both scenarios (data not shown).

Figure legend: gray solid lines, prediction intervals; blue-shaded area, 90% confidence interval (CI) of the 5% and 95% critical values; pink-shaded area, 90% CI of 0; red-shaded area, outliers of the bounds of the CI. The M numbers represent the models described in Table 1.

Clinical Impact
The results of the stochastic simulations of different dosage regimens using M2 and M4 are summarized in Figure 4. None of the dosage regimens could reach the desired target (TSC > 8 mg/L) in at least 75% of the population that developed AAA. Similarly, 40 mg every other week was insufficient to reach the target in at least 75% of the population without AAA, although it is the standard dose recommended by protocol. In contrast, 40 mg every week, or 80 mg every week or every other week, is enough to reach the target in at least 75% of the population without AAA. Interestingly, according to M2, the plasma concentration profiles of 40 mg every week and 80 mg every other week are very similar, which is not the case in M4.

Discussion
MIPD applied in clinical routine commonly makes use of PopPK models found in the literature, given the lack of data available to develop in-house models in most hospitals. However, these models must be validated in the target population. An important aspect to validate is the predictive performance of the models under conditions similar to the clinical routine. Many validations published in the literature do not really validate the predictive performance, but rather evaluate the adequacy of the model to the data. In the present work, the predictive performance was evaluated with TSC that had not been used to calculate the EBEs, mimicking the real-world scenario. To our knowledge, this is the first validation and comparison of the PopPK models of adalimumab in the literature for their use in clinical routine. Six PopPK models for adalimumab were found in the literature for CD and/or UC patients, with similar structure (one-compartment model), although the covariates included differ among them. The PopPK models included patients with both induction and maintenance treatment, and only one was developed with data from a pediatric population.
The evaluation of model adequacy showed that M2 and M4 performed better than the rest. However, the mean individual CL/F obtained with all six PopPK models after the Bayesian post-hoc estimation (Figure 1) is somewhat higher than the expected mean CL/F. One possible explanation for this systematic trend is that the mean albumin value of our population is slightly lower than those reported for the models found in the literature, which indicates worse disease control. Several studies have demonstrated a correlation between low albumin levels and increased clearance of similar drugs such as infliximab [32,33].
Consequently, four of the six models were discarded because of the significant bias in the distribution of the NPDE as well as in the EBEs of the PopPK parameters, and models M2 and M4 were the candidates for the evaluation of predictive performance. Both models performed reasonably well, with a bias of at most 0.91 mg/L in absolute value, which is less than 13% of the trough target (8 mg/L). The bootstrap analysis of the predictive performance showed no statistical difference between the two models, so, with the available data, both could be considered equally good for clinical routine purposes. According to the results of the stochastic simulations, AAA is the covariate with the highest impact on the pharmacokinetic parameters.
According to the drug label, the recommended maintenance dose after the induction phase is 40 mg every other week [5]. This scheme results in a mean steady-state TSC of approximately 7 mg/L in Crohn's disease patients, which agrees with the mean steady-state TSC observed in our population (7.3 mg/L). So far, the exposure target is highly dependent on the therapeutic objective (clinical, endoscopic, biochemical or histologic remission) and on whether patients are diagnosed with CD or UC [34]. A recent study showed that patients with concentrations <8.3 mg/L had a higher risk of developing AAA by week 12 and experienced less clinical benefit from dose escalation due to a loss of response [22]. Another study indicates that adalimumab TSC of 8-12 mg/L are required to achieve mucosal healing in 80-90% of IBD patients [18]. According to the stochastic simulations performed with M2 and M4 and considering a target TSC over 8 mg/L, the recommended maintenance dosage regimen that should be included in the protocols is 40 mg every week or 80 mg every other week, in order to reach the target in at least 75% of the population. These recommendations are in line with the MIPD interventions in our population, where 75% of the patients needed a dose increase to reach the 8 mg/L target.
The main limitation of this study lies in its retrospective design, in which patients were selected for MIPD based on the clinical decision of the physician. This implies a bias in the severity of the disease, reflected in the mean albumin values of our population. A prospective study in which patients are included for MIPD in a structured way, regardless of their clinical situation, should be carried out to avoid selection bias and validate these results in a wider population.
In summary, two of the PopPK models found in the literature were better than the others in terms of model adequacy and predictive performance. However, the EBEs of the individual CL/F were biased when compared with the population mean values of the models, which suggests the need to update the models with the available data. On the other hand, the stochastic simulations performed with these models suggested the benefits of increasing the maintenance dose in the protocol to reach the 8 mg/L target.