Machine Learning and Pharmacometrics for Prediction of Pharmacokinetic Data: Differences, Similarities and Challenges Illustrated with Rifampicin

Pharmacometrics (PM) and machine learning (ML) are both valuable for drug development to characterize pharmacokinetics (PK) and pharmacodynamics (PD). Pharmacokinetic/pharmacodynamic (PKPD) analysis using PM provides mechanistic insight into biological processes but is time- and labor-intensive. In contrast, ML models are much quicker trained, but offer less mechanistic insights. The opportunity of using ML predictions of drug PK as input for a PKPD model could strongly accelerate analysis efforts. Here exemplified by rifampicin, a widely used antibiotic, we explore the ability of different ML algorithms to predict drug PK. Based on simulated data, we trained linear regressions (LASSO), Gradient Boosting Machines, XGBoost and Random Forest to predict the plasma concentration-time series and rifampicin area under the concentration-versus-time curve from 0–24 h (AUC0–24h) after repeated dosing. XGBoost performed best for prediction of the entire PK series (R2: 0.84, root mean square error (RMSE): 6.9 mg/L, mean absolute error (MAE): 4.0 mg/L) for the scenario with the largest data size. For AUC0–24h prediction, LASSO showed the highest performance (R2: 0.97, RMSE: 29.1 h·mg/L, MAE: 18.8 h·mg/L). Increasing the number of plasma concentrations per patient (0, 2 or 6 concentrations per occasion) improved model performance. For example, for AUC0–24h prediction using LASSO, the R2 was 0.41, 0.69 and 0.97 when using predictors only (no plasma concentrations), 2 or 6 plasma concentrations per occasion as input, respectively. Run times for the ML models ranged from 1.0 s to 8 min, while the run time for the PM model was more than 3 h. Furthermore, building a PM model is more time- and labor-intensive compared with ML. ML predictions of drug PK could thus be used as input into a PKPD model, enabling time-efficient analysis.


Introduction
Pharmacometrics (PM) and machine learning (ML) are promising approaches used in drug discovery and development, regulatory decision making and personalized medicine. In pharmacokinetic/pharmacodynamic (PKPD) analysis, pharmacokinetic (PK) data are commonly used as input to drive the exposure-response relationship, which can be both continuous drug concentrations or derived PK parameters (e.g., area under the concentration-time curve, AUC) [1,2]. When data are sparse, utilizing classical methodologies such as noncompartmental analysis (NCA) is not appropriate. In this case, PKPD modelling using the population approach is commonly applied. This requires the availability or development of a population PK model, which can be time-and labor-intensive. Growing access to big data in recent years has increased the interest in utilization of ML for predictions. ML, with its high computational efficiency, has significant potential and is starting to be used in drug development [3], but is not applied as much as in discovery yet [4].
While population PK modelling is a method very frequently used for drug PK predictions, ML is less commonly utilized. There are, however, a few examples in the literature where ML has been applied to predict PK data. Poynton et al. [5], for example, have evaluated the performance of several ML algorithms for prediction of remifentanil PK and compared the performance to population PK [5]. The authors show that an ensemble model integrating both an artificial neural network (ANN) and a population PK model best described remifentanil plasma concentration over time [5]. In addition, Woillard et al. [6,7] have shown that the XGBoost algorithm is appropriate for predictions of tacrolimus and mycophenolate mofetil exposure. Woillard et al. [6] and Poynton et al. [5] were some of the first authors to apply ML to PK datasets. While ML has the advantage of being fast and efficient, as well as being able to handle large datasets, PM models are based on biological mechanisms, contributing to mechanistical understanding, biological interpretability of the results and the potential to simulate in silico experiments from the model. ML carries an inherent risk of returning outputs that are not clinically relevant. Thus, this is another reason (other than time) that these two disciplines should work together. Due to the multiple challenges faced in PM and ML, there is a continuous interest in both communities to identify possible ways of combining expertise from both fields [4,[8][9][10]. Attempts to combine PM and ML were made by others previously. Current work includes improved methods for covariate selection [8], as well as research focused on combining ML algorithms with the compartmental structure of PM models [11,12] aiming to improve efficiency in model development. One way of combining ML and PM could be to use PK predicted from an ML model as input for a pharmacometrics PKPD model, provided that the predictive performance is acceptable (illustrated in Figure 1). In order to evaluate whether ML could support PKPD modelling by fast and accurate prediction of PK data, the aim of this case study was to investigate the ability of different ML algorithms to accurately and precisely predict both plasma concentration over time and derived PK parameters (such as AUC), using rifampicin as an example. Rifampicin is an antibiotic commonly used to treat drug-susceptible tuberculosis as part of an approved four-drug regimen [13]. It is a drug known for its complex PK, including autoinduction of elimination, concentrationdependent nonlinear clearance and dose-dependent bioavailability [14][15][16], as well as a high inter-occasion variability (IOV) [17]. Nonlinear PK sometimes leads to a complex modelling process, increasing labor efforts and time needed. This is the key reason why we believe that using an ML algorithm can actually assist PM in being more efficient.
When merging both methods, it is critical to first of all understand similarities and differences between the two fields, to uncover gaps in either methodology and to establish a common terminology, enabling both communities to communicate with each other. PM is the science of mathematical and statistical modelling with the aim of quantifying a drug's PK, pharmacodynamics (PD), PKPD or disease progression. PM models can be utilized within the entire spectrum of drug development, from discovery all the way to life-cycle management and label updates [18][19][20][21][22][23][24][25]. According to the FDA, model-informed drug development (MIDD) is an integral component in the development of a drug [26]. A common method used in PM is nonlinear mixed effects (NLME) modelling, first defined by Lindstrom and Bates [27]. It describes simultaneous model fitting to PK or PD data from all individuals within a population. NLME models consist of fixed-effects parameters describing the population as a whole and random-effects parameters describing the variability within the population and in the individual [27,28]. A population can, e.g., be a group of patients, healthy volunteers, but even a set of in vitro or in vivo data. The aim of a PM analysis is typically to quantify both the variability in the PK or PD response between patients and within a patient, as well as to identify predictors (=covariates) informing about the source of variability (see Figure S1), which can be endogenous (datadriven) or exogenous (e.g., between study sites). NLME models describe (1) the general tendency within the population at hand, (2) the variability between different patients (inter-individual variability (IIV)), (3) variability within the same patient on different occasions (inter-occasion variability (IOV)), (4) the remaining residual unexplained variability (RUV) and (5) predictors descriptive of the variability between patients, termed covariates. NLME models are defined by ordinary differential equations (ODEs) and stochastic When merging both methods, it is critical to first of all understand similarities and differences between the two fields, to uncover gaps in either methodology and to establish a common terminology, enabling both communities to communicate with each other. PM is the science of mathematical and statistical modelling with the aim of quantifying a drug's PK, pharmacodynamics (PD), PKPD or disease progression. PM models can be utilized within the entire spectrum of drug development, from discovery all the way to life-cycle management and label updates [18][19][20][21][22][23][24][25]. According to the FDA, model-informed drug development (MIDD) is an integral component in the development of a drug [26]. A common method used in PM is nonlinear mixed effects (NLME) modelling, first defined by Lindstrom and Bates [27]. It describes simultaneous model fitting to PK or PD data from all individuals within a population. NLME models consist of fixed-effects parameters describing the population as a whole and random-effects parameters describing the variability within the population and in the individual [27,28]. A population can, e.g., be a group of patients, healthy volunteers, but even a set of in vitro or in vivo data. The aim of a PM analysis is typically to quantify both the variability in the PK or PD response between patients and within a patient, as well as to identify predictors (= covariates) informing about the source of variability (see Figure S1), which can be endogenous (datadriven) or exogenous (e.g., between study sites). NLME models describe (1) the general tendency within the population at hand, (2) the variability between different patients (interindividual variability (IIV)), (3) variability within the same patient on different occasions (inter-occasion variability (IOV)), (4) the remaining residual unexplained variability (RUV) and (5) predictors descriptive of the variability between patients, termed covariates. NLME models are defined by ordinary differential equations (ODEs) and stochastic differential equations or analytical solutions [29,30]. They are usually expressed as compartmental models describing the absorption, distribution, metabolism and elimination of a drug, as illustrated in Figure 2. Since all data are fit simultaneously, model building with sparse or imbalanced data is possible [1,29,30]. Model building is performed in a step-wise manner and PM models contain biological or pharmacological mechanisms in their structure, thus allowing the derivation of parameters which are biologically sound and interpretable. The final model parameter estimates can be used to perform simulations of "what if" scenarios, answering research questions in order to inform future in vitro/in vivo experiments or clinical trials regarding their chances of success [31][32][33][34][35], which can greatly reduce the cost for an experiment or trial [36].
Pharmaceutics 2022, 14, x FOR PEER REVIEW 4 of 31 differential equations or analytical solutions [29,30]. They are usually expressed as compartmental models describing the absorption, distribution, metabolism and elimination of a drug, as illustrated in Figure 2. Since all data are fit simultaneously, model building with sparse or imbalanced data is possible [1,29,30]. Model building is performed in a stepwise manner and PM models contain biological or pharmacological mechanisms in their structure, thus allowing the derivation of parameters which are biologically sound and interpretable. The final model parameter estimates can be used to perform simulations of "what if" scenarios, answering research questions in order to inform future in vitro/in vivo experiments or clinical trials regarding their chances of success [31][32][33][34][35], which can greatly reduce the cost for an experiment or trial [36].
Illustration of a two-compartment pharmacokinetic model for a fictive drug. AGI, amount of drug in the gastrointestinal tract; k01, absorption rate constant; k10, elimination rate constant; k12, rate constant describing distribution from central to peripheral compartment; k21, rate constant describing distribution from peripheral to central compartment; V1, volume of central compartment (e.g., blood); V2, volume of peripheral compartment (e.g., brain tissue). Drug clearance is expressed as During and after model development, PM modellers utilize numerous diagnostics, both quantitative and/or graphical, in order to select the final model that best describes the observed data. For quantitative model comparison, modellers in the PM community commonly turn to the objective function value (OFV). The likelihood of the predictions to fit the data, i.e., the probability of the model parameters being able to describe the data, is estimated in a maximum likelihood parameter estimation, often using differential equation systems. The model parameters are estimated by minimizing the OFV, which is proportional to the −2*log likelihood that the model parameter values occur from the data. For comparison of nested models, the likelihood ratio test is used [37,38].
Graphical model evaluation is performed most commonly using visual predictive checks (VPCs) [39][40][41], basic goodness-of-fit (GOF) plots and individual plots [42,43]. The VPC is a tool to investigate the predictive performance of the model and allows for comparison between alternative models, evaluation of model fit and visualization of how the model could be improved [39]. The observed data are compared to data simulated from the model parameters. Commonly, the 90th percentile (sometimes lower) of the observed data is compared to the 95% (or lower) confidence interval (CI) of the simulated 90th percentile data. GOF plots include, for example, evaluation of population predictions or individual predictions versus observations (see Table 1 for details regarding terminology). In individual plots, individual observed data are compared to individual predictions. Lastly, precision in the parameter estimates, clinical relevance and scientific plausibility are considered during model evaluation.

Machine Learning
Machine learning (ML) has been defined as "a field of statistical research for training computational algorithms that split, sort and transform a set of data to maximize the abil- Illustration of a two-compartment pharmacokinetic model for a fictive drug. A GI , amount of drug in the gastrointestinal tract; k 01 , absorption rate constant; k 10 , elimination rate constant; k 12 , rate constant describing distribution from central to peripheral compartment; k 21 , rate constant describing distribution from peripheral to central compartment; V 1 , volume of central compartment (e.g., blood); V 2 , volume of peripheral compartment (e.g., brain tissue). Drug clearance is expressed as k 10 × V 1 . During and after model development, PM modellers utilize numerous diagnostics, both quantitative and/or graphical, in order to select the final model that best describes the observed data. For quantitative model comparison, modellers in the PM community commonly turn to the objective function value (OFV). The likelihood of the predictions to fit the data, i.e., the probability of the model parameters being able to describe the data, is estimated in a maximum likelihood parameter estimation, often using differential equation systems. The model parameters are estimated by minimizing the OFV, which is proportional to the −2*log likelihood that the model parameter values occur from the data. For comparison of nested models, the likelihood ratio test is used [37,38].
Graphical model evaluation is performed most commonly using visual predictive checks (VPCs) [39][40][41], basic goodness-of-fit (GOF) plots and individual plots [42,43]. The VPC is a tool to investigate the predictive performance of the model and allows for comparison between alternative models, evaluation of model fit and visualization of how the model could be improved [39]. The observed data are compared to data simulated from the model parameters. Commonly, the 90th percentile (sometimes lower) of the observed data is compared to the 95% (or lower) confidence interval (CI) of the simulated 90th percentile data. GOF plots include, for example, evaluation of population predictions or individual predictions versus observations (see Table 1 for details regarding terminology). In individual plots, individual observed data are compared to individual predictions. Lastly, precision in the parameter estimates, clinical relevance and scientific plausibility are considered during model evaluation.

Machine Learning
Machine learning (ML) has been defined as "a field of statistical research for training computational algorithms that split, sort and transform a set of data to maximize the ability to classify, predict, cluster or discover patterns in a target dataset" [44]. ML is commonly divided into supervised, unsupervised and semi-supervised methods [45]. Supervised ML aims to predict human assigned labels or experimentally determined outputs based on independent variables, i.e., these models are trained to predict established and expected outcomes based on a loss function. Algorithms belonging to supervised ML models either solve regression and classification problems, e.g., logistic regression, neural networks, support vector machines and decision trees [46]. In contrast, unsupervised machine learning does not require prior knowledge regarding the outcomes, and aims to reveal unexpected patterns in the data, i.e., clustering [47]. Semi-supervised learning is a combination of using labelled and unlabelled data applied in situations where only parts of the data have been labelled [48]. ML has frequently been used in drug discovery and is now increasingly being applied to drug development. Applications in drug development include biomarker identification, prediction of clinical outcomes and planning of clinical trials [9,10,45,[49][50][51].
Here, we focus on supervised ML, in which the dataset is separated into training, validation and test datasets. Common methodologies to split the dataset are either n-fold cross-validation or bootstrapping (for terminology, see Table 1). The training dataset is used for learning patterns in the data, while the validation dataset determines optimal parameterization, for instance, the number of trees in the decision tree, or epochs in a neural network. The loss function is calculated and optimized on the validation dataset to avoid under-or overfitting on the training dataset. Different methods for computation of the loss function exist, such as L1 (Least Absolute Deviations) (Equation (1)) or L2 (Least Square Errors) (Equation (2)).
In Equations (1) and (2), L1 indicates Lasso regression, L2 Ridge regression, Y observed observed data and Y predicted ML prediction. For estimating generalizability to predict novel data points [45], the test dataset is exclusively used for the final model evaluation, never for any training or parameterization [52][53][54]. Data leakage of the test dataset mostly leads to severe overestimation of the model's predictive performance, i.e., overfitting [45].

Terminology
Different terminologies are used in PM and ML, often describing either the same or a similar part of the analysis. While in PM one "builds" or "fits" a model, in ML a model is "trained". Both terms essentially describe the process of developing a model, its structure and parameters of interest. When it comes to predictors used in the model, there is a difference between ML and PM. In ML, the word "features" is used to describe all input variables used to train the model in order to predict the desired outcome [54]. This could in drug development, for example, include time, dose and patient characteristics such as bodyweight, age or creatinine clearance. In PM, on the other hand, predictors are usually termed "covariates", which are not directly comparable to features. Covariates describe predictors that aim to explain the sources of the PK and PD variability between patients (IIV). Since certain variables such as time and dose are normally already part of the structural model, they are not considered covariates [55]. Only the predictors explaining variability between patients in addition to the variables already included in the structural model are considered covariates, which would in this example be bodyweight, age or creatinine clearance. In this work, we use the word "predictor" as an umbrella term for both feature and covariate. In PM, only scientifically plausible and clinically relevant predictors should be included in the model, which are identified in a preselection step conducted by an expert. This can also be beneficial for ML models; however, if the dataset is large enough, preselection of predictors is not necessary. ML can then be used to identify unexpected associations in a data-driven manner.
The parameters of a PM model are often not comparable to those in ML. In PM, parameters are descriptive of PK and PD processes, such as drug clearance, distribution volume or drug absorption, as well as drug effect (e.g., the maximal drug effect (E max ) or the drug's potency, such as EC 50 ). ML parameters, however, are more mathematical and from a biological point of view less interpretable. Parameters are configuration variables of the model and are estimated during the model training process enabling predictions from the final model. They are determined automatically and include, for example, weights, coefficients and support vectors. Hyperparameters, on the other hand, are variables defined by the modeller as they cannot be estimated from the data, but are tuned during the learning process [56], such as the regularization parameter λ and the number of trees k in eXtreme Gradient Boosting (XGBoost) [57]. Determination of hyperparameters is called "hyperparameter tuning" and is often achieved by testing different hyperparameters and then choosing the ones providing the best model fit [58,59]; however, the process can also be automated [59].
As mentioned above, in supervised ML the data are split into a "training dataset" used for learning general rules, a "validation dataset" for internal validation and a "test dataset" for unbiased evaluation using cross-validation [52][53][54]. Separating the data is not common practice in pharmacometrics model building, where, instead, all available data are used for model building and only external validation is performed with new data.
The overall workflows for both PM and ML are illustrated in Figure 3.
Pharmaceutics 2022, 14, x FOR PEER REVIEW 6 of 31 The parameters of a PM model are often not comparable to those in ML. In PM, parameters are descriptive of PK and PD processes, such as drug clearance, distribution volume or drug absorption, as well as drug effect (e.g., the maximal drug effect (Emax) or the drug's potency, such as EC50). ML parameters, however, are more mathematical and from a biological point of view less interpretable. Parameters are configuration variables of the model and are estimated during the model training process enabling predictions from the final model. They are determined automatically and include, for example, weights, coefficients and support vectors. Hyperparameters, on the other hand, are variables defined by the modeller as they cannot be estimated from the data, but are tuned during the learning process [56], such as the regularization parameter λ and the number of trees k in eXtreme Gradient Boosting (XGBoost) [57]. Determination of hyperparameters is called "hyperparameter tuning" and is often achieved by testing different hyperparameters and then choosing the ones providing the best model fit [58,59]; however, the process can also be automated [59].
As mentioned above, in supervised ML the data are split into a "training dataset" used for learning general rules, a "validation dataset" for internal validation and a "test dataset" for unbiased evaluation using cross-validation [52][53][54]. Separating the data is not common practice in pharmacometrics model building, where, instead, all available data are used for model building and only external validation is performed with new data.
The overall workflows for both PM and ML are illustrated in Figure 3.  Table 1 provides a comprehensive overview of differences and similarities in terminology used by the PM and ML community.  Table 1 provides a comprehensive overview of differences and similarities in terminology used by the PM and ML community. Table 1. Overview of terminology commonly used by the pharmacometrics and/or machine learning community.

Covariates Features
Both terms describe predictors. Features are all input variables used to train a model. Covariates are predictors explaining variability between patients in addition to the variables already included in the structural pharmacometrics model.

Objective function value (OFV) Loss
The OFV is one of the main metrics for model evaluation in pharmacometrics model building. It is proportional to −2*log likelihood that the model parameter values occur from the data [37,38].
In ML, the loss is used as a goodness of fit. It represents the distance between predictions and observations which can be computed in different ways, such as L1, L2 or MAPE.
Build/Fit a model Train a model Both terms define the process of developing a model by determining model parameters that describe the input data in order to reach a predefined objective.

Validation dataset Validation dataset
In PM, the term validation dataset is often used for external validation. In ML, the term is commonly used for the data that are held back for internal validation to evaluate model performance during training.

Overparameterization Overfitting
In PM, a model can be overparameterized, meaning too many parameters are estimated in relation to the amount of information, leading to minimization issues. Overfitting in ML describes a phenomenon where the model has been trained to fit the training data too well. The model is forced to predict in a very narrow direction, which may result in poor predictive ability.

Model parameters Model parameters
Even though both communities use the same term, model parameters in PM are different from parameters in ML. Model parameters in PM describe biological or pharmacological processes, such as drug clearance, drug distribution volume or rate of absorption. These parameters are directly interpretable. In ML, on the other hand, model parameters are mathematical parameters learnt during the model training process and are part of the final model describing the data. They do not provide biological interpretation in the first instance at least.

Model averaging Ensemble model
An ensemble model combines multiple ML algorithms, which in most cases leads to better predictive performance compared to single algorithms [60]. There is a similar method used in PM called model averaging [61], where several models are combined using weights determined by their individual fit to the data.

Shrinkage Shrinkage
The term "shrinkage" has a different meaning in the PM and ML communities. In PM, shrinkage describes overparameterization, where 0 indicates very informative data and no overfit, and 1 uninformative data and overfitting. In ML, shrinkage methods in different ML models reduce the possibility of overfitting or underfitting by providing a trade-off between bias and variance.

Bootstrapping Bootstrapping
Describes a random resampling method with replacement. In PM, it is used during model development and evaluation for estimation of the model performance. In ML, bootstrapping is part of some algorithms, such as XGBoost or Random Forest, and is also used to estimate the model's predictive performance.

Cross-validation Cross-validation
In PM, cross-validation is used occasionally, for example, in covariate selection procedures in order to assess the true alpha error. In ML, cross-validation is commonly applied to prevent overfitting and to obtain robust predictions. Cross-validation describes the process of splitting the data into a training dataset and a test dataset. The training dataset is used for model development and the test dataset for external model evaluation. In n-fold cross-validation, the data are split into n non-overlapping subsets, where n − 1 subsets are used for training and the left-out subset for evaluation. This procedure is repeated until all subsets have been used for model evaluation. Model performance is then computed across all test sets [45].  [45]. -

Oversampling/ Upsampling
Oversampling is an approach used to deal with highly imbalanced data. Data in areas with sparse data are resampled or synthesized using different methods, for example, Synthetic Minority Oversampling Technique (SMOTE) [62].

Empirical Bayes Estimates (EBEs) Bayesian optimization
EBEs in PM are the model parameter estimates for an individual, estimated based on the final model parameters as well as observed data using Bayesian estimation [63]. In artificial intelligence (AI), Bayesian optimization is used to tune artificial neural networks (ANNs), particularly in deep learning.

Typical value Typical value
The typical value in PM is the most likely parameter estimate for the whole population given a set of covariates. It could, e.g., be the drug clearance estimate that best summarizes the clearance of the whole population. In ML, the typical value in unsupervised learning, for example, is the center of a cluster (e.g., k-means).

Inter-individual variability (IIV) -
Variability between individuals in a population. Describes the difference between typical and individual PK parameters. Often assumed to be log-normally distributed.
Inter-occasion variability (IOV) -Variability within an individual on different occasions (e.g., sampling or dosing occasions). Often assumed to be log-normally distributed.
Residual error variability (RUV) -Remaining random unexplained variability. Describes the difference between individual prediction and observed value.

Population prediction -
The population prediction is the most likely representation of the population given a set of covariates.
Individual prediction -Predictions for an individual using the population estimates in combination with the observed data for this individual, computed in a Bayesian posthoc step.

Data
Rifampicin PK data were simulated from a previously published population PK model describing rifampicin plasma concentrations over time in tuberculosis patients [15], which has been shown to perform best compared to other population PK models for rifampicin and has undergone external validation [64]. The population PK model consists of a 1compartment distribution model with nonlinear elimination (described as a Michaelis-Menten equation), autoinduction of elimination and dose-dependent bioavailability [15]. The simulated scenario was as in the original HIGHRIF1 clinical phase 2 trial [65] in order to create clinically relevant data. PK in 83 individuals following 10, 20, 25, 30, 35 or 40 mg/kg oral rifampicin once a day for 2 weeks was simulated. The dose group was randomly assigned to a simulated individual. The simulated individual's bodyweight defined the actual dose. The dose was then rounded to the next 150 mg increment, due to availability of tablet strength. In total, the 13 doses, which are 450, 600, 900, 1050, 1200, 1350, 1500, 1650, 1800, 1950, 2250, 2400, 2550 and 2700 mg, were randomly assigned to 8, 9, 1, 7, 8, 3, 17, 2, 13, 5, 3, 4, 1 and 2 individuals, respectively. IIV, IOV and RUV were included in the simulations. PK sampling time-points were at pre-dose and 0.5, 1, 1.5, 2, 3, 4, 6, 8, 12 and 24 h after dose at two different sampling occasions (day 7 and 14 of treatment). Patient covariates were randomly sampled from the parametric covariate distribution in the HIGHRIF1 trial [65], taking into account the correlation between bodyweight (WT), fat-free mass (FFM) and gender, as described previously [17]. The covariates were sampled from either a truncated normal distribution (WT, FFM, age, body height (HT)) or a binomial distribution (gender, HIV-coinfection, race). Body mass index (BMI) was calculated from WT and HT.
Predictors included in the simulations to create the dataset were time after dose (TAD), treatment week (OCC) and dose, which are part of the structural PM model, as well as the covariate FFM. In total, 1826 (83 patients × 2 occasions × 11 samples per occasion) rifampicin plasma concentrations were simulated. The simulated rifampicin concentrations and the covariates included in the simulation are considered to be the true observed concentrations and predictors, respectively. The final simulated dataset is provided in Supplementary Materials Data S1.

ML Model Training
In this work, we evaluate ML model performance for prediction of PK data using rifampicin as an example drug. Since either a rifampicin plasma concentration-time series or exposure indices can be used as input for a PKPD model, it is of interest to investigate the predictive ability of ML for either outcome. As an exposure index, the area under the plasma concentration-time curve from 0 to 24 h (AUC 0-24h ) was chosen here, since the AUC 0-24h /MIC has been shown to be the best predictor of rifampicin efficacy [66]. The individual AUC 0-24h values were calculated using noncompartmental analysis (NCA) based on rich simulated profiles (20 observations per sampling occasion). For AUC 0-24h calculation, the trapezoidal rule implemented in the pmxTools R package [67] was utilized. The derived AUC 0-24h values were considered the true values.
For ML model training, features included in the training dataset were TAD, dose, OCC, BMI, age, gender, race, WT, HT, HIV co-infection and FFM. The target was either the rifampicin plasma concentration or the rifampicin AUC 0-24h . From the whole dataset, 5 datasets each containing 80% of the data for training and 20% for testing were created using patient identifier (ID) as a grouping variable, enabling 5-fold cross-validation. When splitting the data, it was ensured that each of the 5 test datasets contained different IDs, making sure that every ID was left out once, thus avoiding overlapping and bias. The test set was solely used to evaluate final performance, never for any training or fitting parameters of any model. The training dataset was again split in 80% training and 20% validation for 5-fold internal cross-validation. Utilizing the training datasets as input, different ML algorithms were trained for prediction of rifampicin PK. The predictive performance for each of the 5 test datasets was averaged in order to compute the overall predictive performance.
The current study deals with a regression problem; therefore, different supervised ML algorithms included in the SuperLearner R package [68] were tested. The three topperforming algorithms were chosen for further evaluation, which were GBM, XGBoost and Random Forest. In addition to these three nonlinear models, a linear model (LASSO) was evaluated for comparison.
The algorithms were optimized by testing different model parameters (hyperparameter tuning). Hyperparameter tuning was performed in a grid search for LASSO, GBM and Random Forest, where all possible parameter combinations were explored. For XGBoost, a sequential, heuristic search was performed due to very long run times for the full grid search. The ranges of investigated hyperparameters of the final models are presented in Table S1.
For comparison of run times, as well as creation of a VPC, the population PK model [15] was re-estimated in NONMEM [37] using the simulated dataset.

Feature Ranking
GBM, XGBoost, Random Forest and LASSO were compared based on their capability to identify features. Since the dataset was created by simulating from the population PK model, including TAD, OCC, dose and FFM, which were previously shown to influence rifampicin PK [15], the data include a correlation between rifampicin plasma concentrations and these predictors. Other predictors included in the dataset, which were not used in the simulation of the plasma concentrations, can therefore be considered noise. In contrast to a real-world dataset, where the true predictors are unknown, in this simulation-based study the true predictors are known. In this work, we explored whether the four different algorithms were able to identify these true predictors, i.e., TAD, OCC, dose and FFM. Evaluated predictors were TAD, dose, OCC, BMI, age, gender, race, WT, HT, HIV coinfection and FFM. The machine learning model evaluates all features and assigns weights to them in the training process, which will determine the prediction of the outcome variable.
The importance score of GBM is based on calculating the amount by which each feature's split point improves the efficiency in a single decision tree. The importance scores are averaged across all decision trees. Therefore, the greater the improvement efficiency measure of a feature on the split point (closer to the root node), the greater the weight. This means that the more the promotion tree is selected, the higher the feature importance. XGBoost, as the implemented method of GBM, has the same algorithms in feature ranking, but uses a more regularized model formalization to control over-fitting which would affect the importance scores. Random Forest calculates importance scores by evaluating the decrease in node impurity weighted by the probability of reaching that node. LASSO adds the L1 norm of the coefficient as a penalty term to the loss function. Since the regularization term is non-zero, the coefficients corresponding to less important features are therefore discarded.

PK Predictions
The amount of available clinical PK data is very different across the several stages of drug development and thus an additional objective was to explore ML model performance across varying data sizes. To assess the amount of information required by the ML algorithms to predict well, scenarios including varying numbers of observed rifampicin concentrations as input variables in addition to the weighted features included in the model (TAD, dose, OCC, BMI, age, gender, race, WT, HT, HIV co-infection and FFM) were investigated (see Table 2). Based on this input, a full plasma concentration-time series (at pre-dose and 0.5, 1, 1.5, 2, 3, 4, 6, 8, 12 and 24 h post-dose) and the AUC 0-24h at treatment days 7 and 14 were predicted (see Table 2). As an example, in scenario 2 (Table 2), the abovementioned features, as well as 2 observed rifampicin plasma concentrations at 2 and 4 h post-dose are used as input variables to the model to predict the rifampicin plasma concentration at pre-dose and 0.5, 1, 1.5, 2, 3, 4, 6, 8, 12 and 24 h post-dose, i.e., a full pharmacokinetic profile.

Model Evaluation
Model performance was evaluated using the R 2 (Equation (3)) between observations and predictions, the root mean square error (RMSE) describing precision (Equation (4)) and the mean absolute error (MAE) describing bias (Equation (5)).
In Equation (3), SS res are the squared residuals reflecting the fit between observed and predicted value and SS tot are the total sum of squares, reflecting the total variance. SS res is defined as where Observed i is the individual plasma concentration or AUC 0-24h value simulated from the population PK model, and Predicted i is the individual ML model-based prediction.
SS tot is defined as where Observed i is the individual plasma concentration or AUC 0-24h value simulated from the population PK model.
In Equations (4) and (5), RMSE stands for root mean squared error describing precision, MAE stands for mean absolute error describing bias, Observed i is the individual plasma concentration or AUC 0-24h value simulated from the population PK model and Predicted i is the individual ML model-based prediction.

Feature Ranking
In scenario 1, i.e., predicting a rifampicin plasma concentration-time series based on features only, XGBoost, Random Forest and GBM selected TAD and dose as the most important predictors as expected, since these were included in the creation of the data. Instead of the true covariate FFM, all three algorithms selected WT and BMI as the third and fourth most important predictors, which can be explained by the direct correlation between WT, BMI and FFM. The linear LASSO algorithm showed poor feature selection performance, which was not better than random. Therefore, feature importance was not evaluated for LASSO. The importance score of each feature for the different ML algorithms (scenario 1) is presented in Figure 4. The feature importance for the five other scenarios is graphically presented in Figure S2.

Predictions of Rifampicin Plasma Concentration over Time
XGBoost, Random Forest, GBM and LASSO were trained and optimized. The final, well-trained models were used to predict rifampicin plasma concentration over time (11 time-points per sampling occasion) using the features in the test dataset as input. The final model predictions and run times are presented in Table 3. For all algorithms, the run time was shorter (range across all algorithms: 1.0 (scenario 5 in Lasso)-508.7 s (scenario 3 in Random Forest)) compared to the population PK model (11,479 s) using 1 core of the UPP-MAX cluster.
With regards to prediction of plasma concentration over time using features as input only, XGBoost showed the highest R 2 (0.60) and precision (RMSE: 10.6 mg/L). GBM had an R 2 of 0.57 and RMSE of 10.9 mg/L and Random Forest had an R 2 of 0.54 and RMSE of 11.3 mg/L. The linear model LASSO had an R 2 of 0.25. Using 2 rifampicin plasma concentrations per sampling occasion as input for prediction of the whole time series (11 timepoints per sampling occasion) substantially improved model performance compared to features only. In this scenario, XGBoost and GBM had the highest predictive performance with an R 2 of 0.76, closely followed by Random Forest with an R 2 value of 0.75. The use of 6 rifampicin plasma concentrations led to the best predictive performance in all algorithms. XGBoost had the highest R 2 (0.84) and precision (RMSE: 6.9 mg/L). Random Forest and GBM showed comparable performances, with R 2 values of 0.82 and 0.83, respectively.

Predictions of Rifampicin Plasma Concentration over Time
XGBoost, Random Forest, GBM and LASSO were trained and optimized. The final, well-trained models were used to predict rifampicin plasma concentration over time (11 time-points per sampling occasion) using the features in the test dataset as input. The final model predictions and run times are presented in Table 3. For all algorithms, the run time was shorter (range across all algorithms: 1.0 (scenario 5 in Lasso)-508.7 s (scenario 3 in Random Forest)) compared to the population PK model (11,479 s) using 1 core of the UPPMAX cluster.  With regards to prediction of plasma concentration over time using features as input only, XGBoost showed the highest R 2 (0.60) and precision (RMSE: 10.6 mg/L). GBM had an R 2 of 0.57 and RMSE of 10.9 mg/L and Random Forest had an R 2 of 0.54 and RMSE of 11.3 mg/L. The linear model LASSO had an R 2 of 0.25. Using 2 rifampicin plasma concentrations per sampling occasion as input for prediction of the whole time series (11 time-points per sampling occasion) substantially improved model performance compared to features only. In this scenario, XGBoost and GBM had the highest predictive performance with an R 2 of 0.76, closely followed by Random Forest with an R 2 value of 0.75. The use of 6 rifampicin plasma concentrations led to the best predictive performance in all algorithms. XGBoost had the highest R 2 (0.84) and precision (RMSE: 6.9 mg/L). Random Forest and GBM showed comparable performances, with R 2 values of 0.82 and 0.83, respectively. LASSO exhibited poor performance (R 2 : 0.39). Model performance across all algorithms and scenarios is summarized in Table 3. The results clearly show that increasing the amount of data per simulated patient improves the predictive performance of all four ML algorithms, as shown in Table 3 and Figure 5. The prediction interval-based VPC for the best performing algorithm (XGBoost) ( Figure 6) shows accurate prediction of the median but underprediction of the true variability in the population. The rifampicin concentrations simulated from the population PK model, considered to be observations in this study, for 15 randomly selected IDs in the test dataset and the model predictions from the best-performing ML model (XGBoost) were compared across the dose groups, which are presented in Figures 7 and S3. Even though XGBoost showed the best predictive performance in scenarios 1, 2 and 3 (see Table 3), all three nonlinear algorithms exhibited acceptable predictive performance, considering the small dataset. A VPC for the re-estimated population PK model using the simulated data is shown in VPC (Figure 8).

Predictions of Rifampicin AUC0-24h
The different ML models were trained to predict rifampicin AUC0-24h using varying plasma concentrations of rifampicin as input (see Table 2). The R 2 , imprecision (RMSE), bias (MAE) and run time for each scenario are summarized in Table 4. Graphical exploration revealed good performance across all four algorithms (Figure 9), but LASSO was superior in regard to precision and accuracy (Table 4). Figure 8. Visual predictive check for the re-estimated population PK model based on the simulated data. Open blue circles are the rifampicin plasma concentrations simulated from the population PK model, considered to be observed data in this study. The upper and lower dashed lines are the 95th and 5th percentiles of the observed data, respectively, and the solid line is the median of the observed data. The shaded areas (top to bottom) are the 95% confidence intervals of the 95th (blue shaded area), median (red shaded area) and 5th (blue shaded area) percentiles of the simulated data.

Predictions of Rifampicin AUC 0-24h
The different ML models were trained to predict rifampicin AUC 0-24h using varying plasma concentrations of rifampicin as input (see Table 2). The R 2 , imprecision (RMSE), bias (MAE) and run time for each scenario are summarized in Table 4. Graphical exploration revealed good performance across all four algorithms (Figure 9), but LASSO was superior in regard to precision and accuracy (Table 4).   In scenario 4, the models were trained to predict rifampicin AUC0-24h based only on features (no plasma concentrations) at days 7 and 14. In scenario 5, the models were trained to predict rifampicin

Discussion
When comparing PM and ML methodology in drug development in terms of physiological plausibility and clinical relevance, the difference that probably stands out the most is that PKPD models are based on statistical models as well as underlying biological mechanisms where the models can vary from compartmental models to full mechanistic as in physiology-based pharmacokinetic models, while ML is often a purely data-driven approach, though not in all cases. In PM, the parameters are directly biologically interpretable, which can aid in identifying underlying mechanisms. ML parameters, however, are mathematically interpretable albeit to a lesser extent from a biological perspective. There are, however, some artificial intelligence (AI) methodologies, such as causal AI, that provide a better interpretability. In PM, the previous mechanistic knowledge that is used as input can also introduce a bias into the analysis if imputed inappropriately, which is avoided in ML. On the other hand, the previous knowledge used in PM models is crucial for analysis of sparse datasets. There are also differences in the model-building process. When already-available ML models are used, the modeller is less involved in the model building but facilitates the selection of an appropriate algorithm, hyperparameter tuning and model evaluation. In PM, a model is manually built in an iterative manner, constantly evaluating the model's performance at each step, based on which the modeller takes the decision on how to proceed (Figure 3). The process is thus very time-and labor-intensive, and can lead to different modellers developing slightly different models based on the same dataset. This way of developing a model ensures that modelling is guided by a physiological intent, i.e., it is easily understandable if a PK profile follows a one-or a two-compartment model structure and how that potentially relates to the distribution of the drug in the body and site of action. ML cannot account for that in the first instance at least, but is a faster, more time-and labor-efficient method compared to PM. Challenges with purely data-driven ML algorithms certainly lie in the need for rich data, as well as the "Garbage in-Garbage out" problem, which describes the need for high-quality and diverse data in order to build robust models [45]. One of the major advantages of PM modelling is the ability to perform simulations of different scenarios that were not part of the data used for model building. An example is clinical trial simulations where a drug's PK and/or effect is predicted in a virtual population for different scenarios, for example, to support dose selection for a future clinical trial or for prediction of optimal doses in renal impairment patients or pediatric populations [70]. While this can also be achieved using ML, e.g., with reinforcement methodologies, larger amounts of data are needed to train robust models.
In this work, we have investigated the identification of key features (Figure 4), described the variability in the PK predictions ( Figure 6) and predicted PK using time as a continuous variable (Table 3, Figures 6 and 7).
Accurate selection of informative predictors is crucial both to obtain a high model performance and for clinical use, as they inform about potential differences in dose adjustments in special populations. This work suggests that all three nonlinear algorithms are able to correctly identify TAD and dose as the most important predictors. Since the linear LASSO algorithm performed very poorly, feature selection was not evaluated. Despite the correct identification of TAD and dose, none of the nonlinear algorithms selected FFM in third place, which is the true covariate. All three algorithms, however, selected WT and BMI as third and fourth most important, which are directly correlated with FFM. We hypothesize that WT was assigned a high importance instead of FFM due to the high correlation between the two features. The ML model assigns a high feature importance to WT and a low importance to FFM, because the contribution of FFM is no longer needed after inclusion of WT. In the PM model, on the other hand, FFM was selected as more important than WT. This might be due to the fact that PM and ML are two very different methods, which select features in a different way. However, this should not have a large impact on the results since FFM and WT are directly correlated and assigning one a high and the other one a low importance would likely describe the data very similarly. In PM, the true covariate FFM would be added as a covariate on, e.g., drug clearance or drug volume of distribution, indicating which process the covariate influences. This information, however, is not easily obtainable from ML models, and it thus remains unknown how exactly the exposure is influenced by a feature. Another difference is that in PM TAD, OCC and dose are a part of the structural model, and are always used for prediction of concentration over time in population PK modelling, while in ML these are considered features, just like FFM.
Using ML instead of PM could be promising for prediction of PK due to its time-and labor-efficiency in situations when the gold standard population PK cannot be used, or there is no need for a more mechanistic understanding of the PK, or to develop a model that can perform simulations. All the tested ML algorithms in this study outperformed the PM model with respect to run time, being at least 22 times faster. Considering the fact that PM model development is a manual, stepwise procedure, it is also important to note that a PM model has to be run repeatedly, especially when performing covariate analysis. It is often necessary to run the model 20-100 times until the final model is reached. In ML, however, many algorithms have an embedded feature selection, i.e., the model only has to be run once. A fairer comparison would thus be a multiple of 11,479 s (191 min) for the PM model, depending on how many developmental steps are involved and how many covariates are explored, versus one run time for the ML model plus the run time for hyperparameter tuning. However, both pure run times and the model development itself are faster using ML. Due to the stepwise, manual model building process in PM, it usually takes several weeks for an experienced modeller to develop a population PK model, while an ML model can most often be developed in less time. With large amounts of diverse data, the time ratio would change even more. This increased efficiency is especially beneficial in light of the good predictive performance of the nonlinear ML models explored in this study when at least two plasma concentrations are used as input. In this work, we demonstrated a good precision and accuracy in the ML predictions for both longitudinal data (XGBoost: RMSE: 6.9 mg/L, MAE: 4.0 mg/L) and exposure indices (AUC 0-24h ) (LASSO: RMSE: 29.1 h·mg/L, MAE: 18.8 h·mg/L), using six concentrations as input. In both cases, the inclusion of observed rifampicin plasma concentrations as features considerably improved the model performance.
For prediction of the plasma concentration-time series, the predictions ( Figure 5) indicated that the nonlinear ML algorithms, especially XGBoost, were in good accordance with the data simulated from the population PK model, and the shape of the concentrationtime profiles was predicted accurately (Figures 6 and 7). Despite the high complexity in rifampicin PK, including autoinduction of elimination, concentration-dependent nonlinear clearance, dose-dependent bioavailability [14][15][16] and high IOV [17], the nonlinear ML models performed well when at least two plasma concentrations were used as input. The results showed that the best-performing model (XGBoost) was able to predict accurately at treatment weeks 1 and 2 (Figure 7), suggesting that the model can handle autoinduction [14,15]. The good performance across all dose groups (Figure 7) indicated that dose-dependent bioavailability and concentration-dependent elimination are handled properly. However, the XGBoost predictions may have a weakness in identifying the true upper range of high exposure (see Figure 6). When comparing the individual predicted plasma concentrationtime profiles to the data simulated from the population PK model, here considered to be observed data, it becomes obvious that there is a larger variability in the observed data (95% PI predictions: 62.6 mg/L, 95% PI observations: 69.4 mg/L) (see Figures 6 and 8).
This could be due to the fact that the population PK model incorporates different sources of variability, such as IIV, IOV and residual error, and is thus able to describe variability between patients and within a patient well. The ML algorithms explored in this study, however, only use the features as a source of variability and cannot account for IIV that is not caused by any of the features. This could be important to appreciating/predicting risks of individual patients reaching a safety threshold of exposure.
For prediction of AUC 0-24h , all four algorithms performed well when at least two plasma concentrations were used as input. The results clearly show a correlation between the amount of data used for training and model performance. Using features only without rifampicin plasma concentrations as input, resulted in weak model performance with an R 2 of 0.41, RMSE of 117.9 h·mg/L and MAE of 74.2 h·mg/L (LASSO). Using two plasma concentrations at 2 h and 4 h post-dose as input, resembling a limited sampling strategy [64], led to a higher model performance (LASSO, R 2 : 0.69, RMSE: 86.8 h·mg/L, MAE: 54.5 h·mg/L). The best predictive performance was achieved when using six plasma concentrations as input, representing a richer sampling, where LASSO performed very well with an R 2 of 0.97 (RMSE: 29.1 h·mg/L, MAE: 18.8 h·mg/L) (see also Table 4). This indicates that predicting AUC 0-24h accurately and precisely without drug plasma concentrations is challenging. At least two concentrations are needed to reach acceptable model performance.
The generalizability of the algorithms to predict well across all test sets is important to evaluate overfitting and identify outliers. The generalizability of the models across the test sets obtained from the five-fold cross-validation was acceptable as the range of RMSE and MAE was in general spread evenly around the average value (see Tables 3 and 4), with the exception of GBM in scenario 6.
Due to the different nature of the evaluated algorithms (linear versus nonlinear), they performed very differently for prediction of longitudinal data and AUC 0-24h . While the linear model LASSO showed excellent performance for AUC 0-24h prediction using six concentrations (R 2 : 0.97), it was not able to predict longitudinal data (R 2 : 0.39). This is likely due to the different nature of the longitudinal data and the AUC 0-24h . A concentration-time series has a distinct shape (see, e.g., Figure 6), which a linear algorithm such as LASSO is not able to describe. The AUC 0-24h , however, is a summary variable, describing the whole time-series in one value. This simplifies the problem and enables even a linear algorithm to predict well. The nonlinear models performed well for both prediction of longitudinal data as well as AUC 0-24h , including at least two plasma concentrations as input. However, the PM model using NLME methodology still performs better for prediction of longitudinal data, as shown in Figure 8. NLME models are better able to capture the variability between and within patients compared to the ML models investigated here (Figures 6 and 8). There is thus a need for further studies investigating how the variability could be better captured using ML.
PK data are naturally imbalanced due to the fact that plasma PK sampling is often performed very sparsely. This can often lead to poor performance in areas of sparse information, e.g., around the C max using ML. One approach addressing this issue is oversampling, which is a method increasing the data size in sparse areas. Due to the fact that PK data is not assumed to be normally distributed, we did not apply oversampling in this work, but believe that oversampling methods appropriate for sparse PK data should be investigated.
In this work, we used an exemplary simplified case study and time efficiencies provided an example of a proof of value study. ML is a purely data-driven approach and should thus be regarded as an assistive tool rather than a decision maker at this stage. Richer, more diverse data will help produce a more accurate tool, but in this study, we exemplified the potential of ML. In this study, the model performance between PM and ML could not be directly compared, since the data were simulated from the PM model, which might bias PM model performance. There is thus a need for further studies comparing the performance of PM and ML directly using real patient data. In addition, the generalizability of the ML methods should be investigated further. A limitation of this work is that data simulated from a population PK model instead of real patient data were used as input for the ML model training. Even though real patient data would have reflected the real-world scenario perfectly, we believe that using simulated data instead did not impact the results significantly, since the population PK model used to perform the simulations [15] has been validated externally and shown to predict real patient data accurately and precisely [64]. Moreover, in order to be able to make the simulated dataset in Supplementary Materials Data S1 public, the patient covariates were sampled from a parametric covariate distribution of the HIGHRIF1 trial [65] and not bootstrapped. Even though bootstrap is the gold standard method, it is assumed that the results do not vary much, since the correlations between covariates were retained and the sampling was performed from a truncated distribution using the reported ranges as minimum and maximum values. One drawback of the ML model is that some plasma concentrations were predicted to be negative, which is not the case in the simulated data, as the lowest possible prediction is restricted to 0 mg/L due to scientific plausibility. Restricting the ML models or working with log-transformed data for future work could be one way to avoid this issue. In addition, we see a need to do more work on inflation of classifier accuracy due to data crosstalk. Furthermore, the time-points resembling sparse PK sampling (2 and 4 h post-dose) are based on a PM study [64]. A sparse sampling strategy in PM is often based on gaining information on exposure during the absorption and elimination phase. This might not be the case for ML models, and thus there is a need to investigate an optimal sampling strategy appropriate for ML.
There are few studies investigating how ML and PM could be combined to further improve drug development [5,[8][9][10][11][12]. This work is one of the most comprehensive illustrative examples of a side-by-side comparison of the two methods. We here exemplify in a case study the use of ML for PK predictions, which could then be used as input into a PKPD model, enabling faster but still accurate PKPD analysis. We demonstrate that ML could be a useful tool for PK analysis, providing fast predictions of both the full concentration time series as well as PK exposure indices (e.g., AUC 0-24h ) with acceptable precision. Further work is needed to investigate this tool using real patient data. Bridging the gap between PM and ML seems promising considering that ML can add value to PM workflows through increased time and labor efficiency.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/pharmaceutics14081530/s1, Data S1: Simulated dataset, Text S2: Final model code. Figure S1. Illustration of the different types of variability in a nonlinear mixed effects model. The plasma concentration time-curve for a fictive drug is shown. The blue line represents the model predictions for the typical individual and the orange line the model predictions for one specific individual. The blue dots represent the observations of all individuals in the population and the orange dots the observations for the one specific individual. The covariates for the blue and orange lines are consistent. The difference between typical pharmacokinetic (PK) parameters (population PK parameters) and the individual PK parameters is expressed as interindividual variability (IIV), and is reflected by the difference in drug exposure between the typical individual (blue line) and an individual (orange line). The remaining difference between individual model predictions (orange line) and observed concentrations (orange dots) are described by residual unexplained variability (RUV). Figure S2: Importance scores for evaluated features shown for the machine learning algorithms GBM, Random forest, XGBoost and LASSO for (A) AUC 0-24h predictions using features only as input (scenario 4), (B) AUC 0-24h predictions using 2 plasma concentrations as input (scenario 5), (C) AUC 0-24h predictions using 6 plasma concentrations as input (scenario 6), (D) prediction of the plasma concentration-time series using 2 plasma concentrations as input (scenario 2), (E) prediction of the plasma concentration-time series using 6 plasma concentrations as input (scenario 3). AGE, age (years); BMI, body mass index (kg/m 2 ); DOSE, daily rifampicin dose (mg); FFM, fat-free mass (kg); HIV, HIV-coinfection; HT, body height (cm); OCC, treatment week, RACE, race; SEX, gender; TAD, time after dose (h); WT, bodyweight (kg); TAD, time after dose (h); TAD_0.5, rifampicin plasma concentration at 0.