1. Introduction
Residual unexplained variability (RUV) modeling is a fundamental component of population pharmacokinetic (PK) analysis. The residual error model accounts for all sources of variability not captured by structural and statistical models, including assay error, model misspecification, and intra-individual variability [1]. The choice of residual error structure (additive, proportional, or combined) affects not only the precision of parameter estimates but also the validity of model diagnostics and the reliability of simulation outputs. In NONMEM, the residual error model is implemented in the $ERROR block, where the analyst defines the predicted observation (Y), the individual residual (IRES), and a scaling factor (W) used to compute the individual weighted residual (IWRES = IRES/W) [2]. While the mathematical foundations of these models are well established, the practical implementation in NONMEM allows for multiple coding approaches that, although structurally equivalent in terms of the likelihood function and parameter estimation, may differ substantially in their impact on diagnostic residuals.
For IWRES to be interpretable as a standardized residual (approximately normally distributed with a mean of zero and unit variance), W must reflect the true standard deviation of the residual error as encoded in the fitted model [3,4,5,6]. In the remainder of this paper, we refer to such coding as normalized W coding.
By contrast, we use the term non-normalized W coding when W omits one or more estimated variance components and, therefore, does not standardize IWRES, even though the underlying likelihood may remain unchanged.
Non-normalized codings are nevertheless common in routine practice, worked examples, user documentation, and teaching material, including W = 1 for additive models and W = F or W = IPRED for proportional models [7,8]. Although such codings may be operationally convenient, they do not necessarily standardize IWRES. In these cases, W does not incorporate the estimated variance components, and IWRES is, therefore, not a standardized residual, even though the underlying model fit is identical.
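The distinction between normalized and non-normalized W can be illustrated with a minimal Python sketch. It is not the NONMEM implementation: it assumes a hypothetical additive residual standard deviation of 0.5 mg/L, draws stand-in IRES values directly from a normal distribution (so shrinkage is ignored), and uses illustrative variable names throughout.

```python
import random
import statistics

random.seed(1)

sigma_add = 0.5  # assumed residual SD (mg/L) for an additive error model
# Stand-in for IRES = DV - IPRED; drawn directly, so shrinkage is ignored
ires = [random.gauss(0.0, sigma_add) for _ in range(10_000)]

w_normalized = sigma_add     # analogous to W = SQRT(SIGMA(1,1))
w_non_normalized = 1.0       # analogous to W = 1

iwres_norm = [r / w_normalized for r in ires]
iwres_raw = [r / w_non_normalized for r in ires]

sd_norm = statistics.stdev(iwres_norm)
sd_raw = statistics.stdev(iwres_raw)
print(f"SD(IWRES), normalized W:     {sd_norm:.3f}")
print(f"SD(IWRES), non-normalized W: {sd_raw:.3f}")
```

Under these assumptions, the first standard deviation is close to 1 and the second is close to the residual standard deviation itself, although both codings describe exactly the same residuals.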
The consequences of different coding choices for the combined residual error model on parameter estimation have been systematically examined by Proost [9]. That study compared two main implementation strategies in NONMEM: the variance-based method (VAR), in which W is defined as the square root of the total residual variance, and the standard-deviation-based method (SD), in which W is expressed as a linear combination of proportional and additive standard deviation terms. Proost demonstrated that different implementations of the VAR method yield identical parameter estimates and objective function values (OFV), whereas the SD method produces numerically different but statistically valid estimates with a different parameterization of the residual error components [9]. Crucially, however, that analysis was limited to parameter estimation and objective function values. The impact of these coding choices on diagnostic residuals, specifically IWRES, WRES, and CWRESI, was not examined.
To our knowledge, no systematic analysis has characterized how the definition of W in the $ERROR block affects the numerical values and diagnostic interpretability of IWRES across additive, proportional, and combined residual error models, nor whether inconsistent W coding across model runs can produce misleading diagnostic plots. This issue may have remained underappreciated because different codings can produce identical likelihoods, objective function values, and parameter estimates in simple settings, while still altering the scale of commonly inspected individual residual diagnostics.
This gap has practical consequences. Although IWRES is no longer the most robust standalone diagnostic in contemporary pharmacometrics, it remains widely reported and visually inspected in routine NONMEM workflows. A pharmacometrician comparing IWRES-based plots between two models coded differently may observe apparent differences in residual dispersion, heteroscedasticity, or concentration-dependent bias that are purely artifactual. This may lead to incorrect conclusions about residual behavior, residual error structure, or perceived model adequacy, even when the underlying fit is unchanged.
The aim of this paper is to provide a systematic and practical characterization of the impact of W coding on diagnostic residuals in NONMEM. The present analysis is therefore intentionally restricted to the NONMEM framework, where the flexibility of the $ERROR block and the implementation of residual diagnostics make this question especially relevant. Using three simulated population PK datasets, each generated under a known additive, proportional, or combined residual error structure, we estimated each model using multiple coding variants of the $ERROR block and compared the resulting IWRES, WRES, and CWRESI values across runs. We illustrate how non-normalized W definitions distort individual-level diagnostic plots, how the choice between one- and two-EPS parameterizations affects population-level residuals in the combined model, and how cross-run comparisons of IWRES can be misleading when W is not consistently defined. Based on these results, we provide practical recommendations for consistent, interpretable residual error coding in NONMEM.
2. Methods
2.1. Simulation of Population Pharmacokinetic Datasets
Three population PK datasets were generated by stochastic simulation using the rxode2 package (version 5.0.1) in R (version 4.5.1), each corresponding to a distinct residual error structure, as follows: additive, proportional, and combined. A fixed random seed was used throughout to ensure reproducibility. All three datasets were simulated from a one-compartment model with first-order oral absorption, parameterized in terms of apparent clearance (CL/F), apparent volume of distribution (V/F), and absorption rate constant (KA). Between-subject variability (BSV) was implemented on CL/F and V/F using log-normal distributions with a variance of 0.09 on the log scale, corresponding to an approximate coefficient of variation of 30% for each parameter. This level of variability was chosen as a moderate and realistic didactic setting, sufficient to generate plausible interindividual PK differences while preserving a clear interpretation of the residual error coding effects under study.
For each dataset, 500 subjects received a single oral dose of 100 mg, with 12 concentration measurements collected per subject at nominal times of 0.25, 0.5, 1, 1.5, 2, 4, 6, 8, 10, 12, 18, and 24 h post-dose, yielding 6000 observations per dataset. The true population parameters used for simulation were TVCL = 5.0 L/h, TVV = 50.0 L, and KA = 1.0 h⁻¹ (fixed). Residual error was added to the individual model-predicted concentrations (IPRED) according to the structure of each dataset. For the additive dataset, a single normally distributed error term was used with true standard deviation σ_add = 0.5 mg/L. For the proportional dataset, a multiplicative error was applied with true proportional coefficient of variation CV = 20% (σ_prop = 0.20). For the combined dataset, two independent error terms were used simultaneously, a proportional component with σ_prop = 0.15 and an additive component with σ_add = 0.5 mg/L, consistent with the structure described by Proost [9]. These values were selected to create three distinct and interpretable residual error scenarios. The proportional-only dataset used a slightly larger proportional component to represent a clearly multiplicative error structure, whereas the combined dataset used a somewhat smaller proportional contribution so that both additive and proportional variability would remain visible across the concentration range. Simulated concentrations below 0.001 mg/L were truncated to that value to avoid numerical issues. All true simulation parameter values are summarized in Table 1.
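The simulation design described above can be sketched in a few lines of Python. This is only a stand-in for the rxode2 script, not a reproduction of it: it assumes the closed-form solution of the one-compartment model with first-order absorption and a dose at time zero, the seed is hypothetical, and the function and variable names are illustrative.

```python
import math
import random

random.seed(2024)  # hypothetical seed; the seed used in the study is not stated

DOSE, KA, TVCL, TVV = 100.0, 1.0, 5.0, 50.0
OMEGA_SD = math.sqrt(0.09)  # log-scale SD of the between-subject variability
TIMES = [0.25, 0.5, 1, 1.5, 2, 4, 6, 8, 10, 12, 18, 24]
# (proportional SD, additive SD in mg/L) for each residual error structure
ERROR = {"additive": (0.0, 0.5), "proportional": (0.20, 0.0), "combined": (0.15, 0.5)}

def ipred(t, cl, v):
    """Closed-form one-compartment concentration after a single oral dose."""
    ke = cl / v
    return DOSE * KA / (v * (KA - ke)) * (math.exp(-ke * t) - math.exp(-KA * t))

def simulate(structure, n_subjects=500):
    s_prop, s_add = ERROR[structure]
    rows = []
    for sid in range(1, n_subjects + 1):
        cl = TVCL * math.exp(random.gauss(0.0, OMEGA_SD))
        v = TVV * math.exp(random.gauss(0.0, OMEGA_SD))
        for t in TIMES:
            f = ipred(t, cl, v)
            dv = f + f * random.gauss(0.0, s_prop) + random.gauss(0.0, s_add)
            rows.append((sid, t, max(dv, 0.001)))  # truncate at 0.001 mg/L
    return rows

data = {name: simulate(name) for name in ERROR}
print({name: len(rows) for name, rows in data.items()})
```

Each simulated dataset contains 500 × 12 = 6000 records, matching the design stated in the text.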
2.2. Estimation Models
Each dataset was analyzed in NONMEM (version 7.6; Icon Development Solutions, Ellicott City, MD, USA) using the first-order conditional estimation method with the INTERACTION option (FOCE-I; “METHOD = 1 INTERACTION”). The structural model and between-subject variability structure were identical across all runs and matched the true simulation model, as follows: a one-compartment model with first-order absorption (ADVAN2 TRANS2), with BSV on CL and V modeled as exponential random effects. KA was fixed to its true simulation value of 1.0 h⁻¹ in all runs to focus comparisons exclusively on residual error behavior. The following nine estimation runs were performed in total, grouped by residual error structure:
Additive error models (applied to the additive dataset):
- ADD.1: normalized coding: “W = SQRT(SIGMA(1,1))”, “Y = IPRED + ERR(1)”. The scaling factor W explicitly incorporates the estimated residual standard deviation, yielding a properly standardized IWRES;
- ADD.2: non-normalized coding: “W = 1”, “Y = IPRED + W*ERR(1)”. Although the Y equation is structurally identical to ADD.1 and, therefore, produces the same likelihood, OFV, and parameter estimates, W does not incorporate the estimated variance component, resulting in an unnormalized IWRES;
- ADD.3: THETA-based normalized coding: “W = THETA(4)”, “Y = IPRED + W*ERR(1)”, with “$SIGMA 1 FIX”. The residual standard deviation is estimated as a fixed effect, with SIGMA fixed to unity. This parameterization is expected to yield a normalized IWRES numerically equivalent to ADD.1 under correct model convergence.
Proportional error models (applied to the proportional dataset):
- PROP.1: normalized coding: “W = SQRT(IPRED**2 * SIGMA(1,1))”, “Y = IPRED + IPRED*ERR(1)”. W corresponds to the model-implied observation-specific residual standard deviation;
- PROP.2: non-normalized coding: “W = IPRED”, “Y = IPRED + IPRED*ERR(1)”. Y is structurally identical to PROP.1 and produces the same estimates, but W omits the SIGMA term, yielding an unnormalized IWRES;
- PROP.3: THETA-based normalized coding: “W = IPRED * THETA(4)”, “Y = IPRED + W*ERR(1)”, with “$SIGMA 1 FIX”. THETA(4) estimates the proportional coefficient of variation as a dimensionless fraction.
Combined error models (applied to the combined dataset):
- COMB VAR.1: variance-based method with two EPS (Proost VAR.1): “W = SQRT(IPRED**2 * SIGMA(1,1) + SIGMA(2,2))”, “Y = IPRED + IPRED*ERR(1) + ERR(2)”. This is the canonical two-epsilon parameterization, in which SIGMA(1,1) and SIGMA(2,2) represent the proportional and additive variance components, respectively. This coding matches the true simulation structure;
- COMB VAR.3: variance-based method with one EPS (Proost VAR.3): “W = SQRT(IPRED**2 * THETA(4)**2 + THETA(5)**2)”, “Y = IPRED + W*ERR(1)”, with “$SIGMA 1 FIX”. THETA(4) and THETA(5) estimate the proportional and additive standard deviations, respectively, as fixed effects. W is constructed to equal the true residual standard deviation of the combined model;
- COMB SD: standard-deviation-based method (Proost SD): “W = THETA(4)*IPRED + THETA(5)”, “Y = IPRED + W*ERR(1)”, with “$SIGMA 1 FIX”. As discussed by Proost (2017), this W does not represent the true standard deviation of the combined variance model but a linear approximation thereof, and it yields different but statistically valid parameter estimates compared to the VAR methods.
The nomenclature for COMB VAR.1, COMB VAR.3, and COMB SD follows that of Proost (2017) [9] to facilitate cross-referencing with that work.
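For reference, the scaling factors used by the nine runs can be collected in a single display. The notation is an assumption introduced here for compactness: σ_add and σ_prop denote the estimated additive and proportional residual standard deviations, whether they are obtained as SQRT(SIGMA) or as a THETA with SIGMA fixed to 1, and IPRED is taken to be positive.

```latex
\begin{aligned}
\text{ADD.1, ADD.3:}\quad & W = \sigma_{\mathrm{add}} \\
\text{ADD.2:}\quad & W = 1 \\
\text{PROP.1, PROP.3:}\quad & W = \sigma_{\mathrm{prop}}\,\mathrm{IPRED} \\
\text{PROP.2:}\quad & W = \mathrm{IPRED} \\
\text{COMB VAR.1, COMB VAR.3:}\quad & W = \sqrt{\sigma_{\mathrm{prop}}^{2}\,\mathrm{IPRED}^{2} + \sigma_{\mathrm{add}}^{2}} \\
\text{COMB SD:}\quad & W = \theta_{4}\,\mathrm{IPRED} + \theta_{5}
\end{aligned}
```

Only ADD.2 and PROP.2 omit the estimated variance component from W; COMB SD includes estimated parameters but combines them linearly rather than on the variance scale.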
2.3. Comparison of Diagnostic Residuals
For each estimation run, the NONMEM output table (“.fit”) was generated to include the following variables: “ID”, “TIME”, “DV”, “IPRED”, “IWRES”, “WRES”, “CWRESI”, “PRED”, and “RES”. All post-processing and comparisons were performed in R (version 4.5.1).
Pairwise comparisons of IWRES, WRES, and CWRESI were conducted between runs sharing the same dataset and differing only in their $ERROR coding. These comparisons were based on both numerical summaries and graphical diagnostics, so that the conclusions of the study would not rely on visual inspection alone. For each pair, the following three numerical metrics were computed: (i) the observation-by-observation ratio of IWRES values, summarized as the mean ± standard deviation across all observations; (ii) the Pearson correlation coefficient between corresponding residual vectors; and (iii) the standard deviation of IWRES across all observations, used as a measure of correct normalization. A well-specified model with a correctly normalized W is expected to yield an SD(IWRES) close to 1.0 [5].
For the non-normalized coding schemes (ADD.2 and PROP.2), the implicit effective scaling factor, that is, the value of W that would be required to reproduce the same IWRES as the normalized reference run, was back-calculated observation by observation as the ratio of the IRES to IWRES from the non-normalized run and compared to the theoretical expectation derived from the estimated SIGMA of the corresponding reference run.
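The three pairwise metrics can be sketched as follows. This is an illustrative Python version, not the R code used in the study; the function names `pearson` and `compare_runs` and the toy residual vectors are hypothetical.

```python
import math
import statistics

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def compare_runs(iwres_ref, iwres_alt):
    """Ratio summary, correlation, and SD(IWRES) for a pair of runs."""
    ratios = [b / a for a, b in zip(iwres_ref, iwres_alt) if a != 0]
    return {
        "ratio_mean": statistics.fmean(ratios),
        "ratio_sd": statistics.stdev(ratios),
        "pearson_r": pearson(iwres_ref, iwres_alt),
        "sd_ref": statistics.stdev(iwres_ref),
        "sd_alt": statistics.stdev(iwres_alt),
    }

# Toy vectors: the alternative run is the reference run rescaled by 0.5
ref = [-1.2, 0.4, 0.9, -0.3, 1.5, -0.8]
alt = [0.5 * v for v in ref]
summary = compare_runs(ref, alt)
print(summary)
```

A constant ratio with near-zero spread, combined with a correlation of 1, is the signature of a pure rescaling: the two runs carry the same residual information on different scales.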
2.4. Evaluation of Diagnostic Plots
Graphical diagnostics were generated for each run using ggplot2 4.0.3 in R. To illustrate the impact of W coding on individual-level diagnostics, IWRES versus IPRED plots were constructed for the normalized and non-normalized coding variants within each error type. For the combined model, CWRESI versus IPRED plots were compared across coding variants rather than IWRES, because it is the population-level residual that differs meaningfully across combined error parameterizations. Finally, a cross-model scenario was constructed by superimposing IWRES versus IPRED plots from a non-normalized additive run and a normalized proportional run, to illustrate how inconsistent W definitions across runs can produce apparently meaningful differences in residual scatter that are purely artifactual.
2.5. Code Availability
All NONMEM control files (.ctl) corresponding to the coding variants analyzed in the manuscript, the rxode2 simulation script, and the R analysis scripts used to generate the figures and tables are provided as Supplementary Materials. The simulated datasets (additive, proportional, and combined) are also included to allow for full reproduction of the estimation results. The Supplementary Materials are intended not only to support reproducibility but also to provide directly reusable examples of the alternative $ERROR codings discussed in the main text.
2.6. Language Editing and Stylistic Improvements
The authors used a generative AI tool (ChatGPT 5.5) to assist with language editing and stylistic improvements. The scientific content, analyses, and interpretations were fully developed and verified by the authors, who take full responsibility for the manuscript.
4. Discussion
4.1. Principal Findings
This study provides the first systematic numerical characterization of the impact of W coding in the NONMEM $ERROR block on diagnostic residuals, across additive, proportional, and combined residual error models. Three principal findings emerge.
First, for additive and proportional models, the definition of W has a direct, quantifiable, and algebraically predictable impact on IWRES, while leaving WRES and CWRESI strictly unchanged. This effect is multiplicative: IWRES is rescaled by the square root of the estimated variance component. For the additive model, ADD.1 (W = SQRT(SIGMA(1,1))) estimated SIGMA(1,1) = 0.216, giving W = 0.465. ADD.2 (W = 1), which shares a structurally identical Y equation, estimated the same SIGMA(1,1) and produced SD(IWRES) = 0.440. The ratio SD(IWRES2)/SD(IWRES1) = 0.440/0.947 ≈ 0.465 equals √(SIGMA(1,1)) = √0.216 ≈ 0.465, confirming the algebraic relationship. ADD.3, with W = THETA(4) = 0.465 and $SIGMA 1 FIX, produced a normalized IWRES with SD = 0.947, matching ADD.1 and confirming the equivalence of SIGMA-based and THETA-based normalized codings.
This confirms that the difference between these codings concerns residual standardization rather than model fit. The mechanistic basis of this point is developed in Section 4.2.
Second, for the proportional model, the result is even more striking. PROP.1 and PROP.2 produced exactly identical parameter estimates (SIGMA(1,1) = 0.083, CL = 4.72 L/h, V = 34.8 L, OFV = −9622.81), and the pairwise ratio IWRES2/IWRES1 was essentially constant at 0.288 (SD = 5.6 × 10⁻⁶) (Table 2 (Panel A)). This near-exact constancy arises because the two runs share a structurally identical Y = IPRED + IPRED*ERR(1) equation, which produces exactly identical IPRED values (maximum difference = 0). Consequently, IRES is identical across both runs, and the compression factor reduces to √(SIGMA(1,1)) = √0.082931 = 0.288.
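The constancy of this ratio follows from the algebra alone and can be checked with a short Python sketch. The SIGMA(1,1) value is the one reported in the text; the (IPRED, DV) pairs are hypothetical and serve only to show that the ratio does not depend on the observation.

```python
import math

sigma11 = 0.082931  # SIGMA(1,1) reported for PROP.1 and PROP.2

# Hypothetical (IPRED, DV) pairs, used only to illustrate the algebra
observations = [(0.8, 0.95), (1.6, 1.31), (2.4, 2.9), (0.3, 0.27)]

ratios = []
for ipred, dv in observations:
    ires = dv - ipred
    iwres_prop1 = ires / math.sqrt(ipred**2 * sigma11)  # W = SQRT(IPRED**2 * SIGMA(1,1))
    iwres_prop2 = ires / ipred                           # W = IPRED
    ratios.append(iwres_prop2 / iwres_prop1)

print([round(r, 3) for r in ratios])  # every entry is 0.288
```

Because IPRED cancels in the ratio, every observation yields the same factor, whatever its concentration or residual.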
Third, for the combined error model, all three codings (VAR.1, VAR.3, COMB SD) produced globally concordant IWRES (Pearson r ≥ 0.999 for all pairs) and identical OFV for VAR.1 and VAR.3 (−1800.81). However, CWRESI differed non-negligibly between the one-EPS and two-EPS parameterizations (maximum difference of up to 0.586), and COMB SD generated 19 extreme IWRES observations (0.32%) at low predicted concentrations (IPRED ≤ 1.94 mg/L), with values ranging from −59.0 to +74.3.
4.2. The Mechanistic Basis of W’s Impact on IWRES
The behavior of IWRES across alternative codings arises from the way residual variability is parameterized in the observation model.
In NONMEM, the likelihood and the objective function value (OFV) are determined by the statistical model defined in the $ERROR block through the equation used to compute Y. The variance entering the likelihood is, therefore, governed by the EPS/SIGMA structure embedded in Y, rather than by the user-defined variable W. Consequently, when different codings produce the same observation model Y, they necessarily produce the same likelihood, identical parameter estimates, and identical OFV values.
This explains the invariance observed between codings such as ADD.1 and ADD.2 or PROP.1 and PROP.2. In these cases, the observation model is unchanged and the residual variance structure entering the likelihood remains identical. The user-defined variable W is used only in the computation of diagnostic residuals through the expression
IWRES = IRES/W
and, therefore, affects the scaling of IWRES without affecting the likelihood or the fitted parameters.
More generally, the quantity represented by W corresponds to the model-predicted standard deviation of the residual error for a given observation. When W is correctly defined as the square root of the residual variance implied by the model, IWRES are expected to behave approximately as standardized normal residuals. When W is defined differently, IWRES are simply rescaled versions of the same individual residuals, which explains the substantial differences in IWRES distributions observed across codings that, nevertheless, produce identical OFV and parameter estimates.
A different situation arises when residual variability is parameterized directly within the definition of Y, for example through THETA-based parameterizations with SIGMA fixed to one. In these cases, the residual variance entering the likelihood is carried by model parameters appearing explicitly in Y. Different codings may, therefore, represent alternative parameterizations of the same variance model, potentially preserving the OFV while modifying the interpretation and numerical values of estimated residual error parameters.
Taken together, these results show that the apparent impact of W coding on IWRES reflects a change in residual standardization rather than a change in model fit. The likelihood remains unaffected as long as the statistical model defined in Y is unchanged.
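The rescaling argument can be written compactly. In the sketch below, the superscripts (1) and (2) are notation introduced here for the normalized and non-normalized runs of a pair that share the same Y equation, and therefore the same IRES for observation j of subject i; σ² stands for the estimated SIGMA(1,1).

```latex
\mathrm{IWRES}^{(k)}_{ij} = \frac{\mathrm{IRES}_{ij}}{W^{(k)}_{ij}}
\quad\Longrightarrow\quad
\frac{\mathrm{IWRES}^{(2)}_{ij}}{\mathrm{IWRES}^{(1)}_{ij}} = \frac{W^{(1)}_{ij}}{W^{(2)}_{ij}}

\text{Additive (ADD.2 vs. ADD.1):}\quad \frac{\sqrt{\sigma^{2}}}{1} = \sigma
\qquad
\text{Proportional (PROP.2 vs. PROP.1):}\quad \frac{\sqrt{\mathrm{IPRED}_{ij}^{2}\,\sigma^{2}}}{\mathrm{IPRED}_{ij}} = \sigma
```

In both cases the ratio is the estimated residual standard deviation, independent of the observation, which is why the non-normalized IWRES is a uniformly rescaled copy of the normalized one.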
4.3. Why SD(IWRES) ≠ 1.0 for Normalized Runs: ε-Shrinkage
A potentially unexpected finding is that ADD.1, despite being correctly normalized, yields SD(IWRES) = 0.947, below the theoretical value of 1.0. This is consistent with ε-shrinkage [5]. When individual empirical Bayes estimates shrink individual predictions towards the population, IRES are compressed towards zero, reducing SD(IWRES) below 1.0. The quantity 1 − SD(IWRES) estimates the ε-shrinkage: for ADD.1, ε-shrinkage = 1 − 0.947 = 5.3%, indicating modest shrinkage consistent with a well-specified additive model in this dataset. This explains why SD(IWRES) rarely equals exactly 1.0 in real datasets, even when W is correctly defined.
Among normalized codings, ADD.1 and ADD.3 produced identical SD(IWRES) = 0.947, both slightly below the theoretical value of 1.0 due to ε-shrinkage. With identical OFV values (−2095.254), the two runs estimated the same residual standard deviation through different parameterizations: W = 0.465 for ADD.1 (from SQRT(SIGMA(1,1)) = SQRT(0.216)) versus THETA(4) = 0.465 for ADD.3, with SIGMA fixed to 1. These are numerically equivalent representations of the same residual standard deviation, confirming that THETA-based and SIGMA-based normalized codings produce equivalent IWRES when correctly specified.
For PROP.1, ε-shrinkage = 1 − 0.933 = 6.7%, slightly higher than for ADD.1 (5.3%) but of the same modest magnitude. Because the two values come from different datasets and residual error structures while both use a normalized W, this difference illustrates that ε-shrinkage is a property of the data and model structure, not of the coding.
It is, therefore, useful to distinguish, at a conceptual level, the systematic deviation of SD(IWRES) from 1.0 due to ε-shrinkage from the artificial compression introduced by non-normalized W. In simple settings these two effects can be discussed separately, but in practice they may interact, particularly in combined error models and at low predicted concentrations. Correctly defining W removes the coding-related compression of IWRES, but it does not eliminate shrinkage-related departures from the ideal unit-variance behavior.
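The distinction can be made concrete with the SD(IWRES) values reported in this paper. The short Python sketch below applies the 1 − SD(IWRES) formula to two normalized runs and, for contrast, to a non-normalized run; the function name `eps_shrinkage` is illustrative.

```python
def eps_shrinkage(sd_iwres):
    """epsilon-shrinkage estimated as 1 - SD(IWRES); meaningful only for normalized W."""
    return 1.0 - sd_iwres

# SD(IWRES) values reported in the text
reported = {"ADD.1": 0.947, "PROP.1": 0.933, "ADD.2": 0.440}

shrinkage = {run: eps_shrinkage(sd) for run, sd in reported.items()}
for run, value in shrinkage.items():
    print(f"{run}: {100 * value:.1f}%")
```

For ADD.2, the formula returns 56.0%, but this figure is not shrinkage: it is dominated by the coding-related compression (W = 1), which is why the formula should be applied only after verifying that W is normalized.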
4.4. Why WRES and CWRESI Are Invariant to W Coding in Simple Models
The invariance of WRES and CWRESI to W coding for the additive and proportional models warrants explicit explanation. WRES and CWRESI are computed by NONMEM from the marginal distribution of observations, using the full variance–covariance structure implied by the Y equation, OMEGA, and SIGMA [3].
For the additive model, the marginal residual variance entering the likelihood is determined solely by the SIGMA term appearing in the observation model Y. When the model is written as Y = IPRED + ERR(1), the residual variance is simply SIGMA(1,1), independently of the user-defined scaling factor W used to compute IWRES. Consequently, changing W from SQRT(SIGMA(1,1)) (ADD.1) to W = 1 (ADD.2) does not modify the statistical model used for estimation and, therefore, leaves WRES and CWRESI unchanged. This was confirmed numerically: the maximum absolute difference in CWRESI between ADD.1 and ADD.2 was ≤ 10⁻⁴, attributable to NONMEM’s output rounding.
This invariance breaks down for the combined model when moving from two EPS to one EPS. With two epsilon terms (Y = IPRED + IPRED*ERR(1) + ERR(2)), NONMEM handles residual variability through a variance structure that is not internally identical to that used for a single-epsilon coding of the form Y = IPRED + W*ERR(1), even when the marginal residual variance is numerically similar at the observation level. As a result, parameterizations that are nearly equivalent for IWRES may still differ for population-level residuals.
Under FOCE, CWRESI depends on the variance model propagated through NONMEM’s internal variance–covariance computations. The one-EPS and two-EPS formulations, therefore, do not enter the approximation in the same way. This explains why VAR.3 and COMB SD can remain highly concordant with VAR.1 at the level of IWRES, yet still produce detectable differences in CWRESI for individual observations.
4.5. Relationship with Proost (2017)
A key conceptual point is that VAR-based parameterizations represent the true variance of the combined error model, whereas the SD parameterization approximates the standard deviation as a linear function of IPRED. This approximation is generally adequate at moderate concentrations but may diverge at low concentrations.
The present results directly extend the analysis of Proost (2017) [9] in two directions. First, we numerically confirm that the equivalence of the VAR.1 and VAR.3 parameterizations, demonstrated by Proost at the level of parameter estimates and OFV, extends to individual diagnostic residuals: IWRES3/IWRES1 ≈ 1.00 (SD = 0.003, Pearson r = 1.00). Second, we demonstrate that the COMB SD coding, identified by Proost as producing different but valid parameter estimates (OFV_SD = −1844.85 vs. OFV_VAR = −1800.81, ΔOFV = 44.04), also produces detectably different WRES and CWRESI (maximum CWRESI difference = 0.577) and introduces local IWRES instability at low concentrations (Table 2 (Panel A)).
Importantly, the ΔOFV of 44.04 between COMB SD and COMB VAR.1 cannot be interpreted as evidence of better or worse model fit. As Proost (2017) [9] clarifies, the COMB SD parameterization does not model the same error distribution as the VAR methods: it models W as a linear function of IPRED rather than as the true standard deviation of the combined variance. These two models are therefore not nested and their OFV values cannot be directly compared for model selection. This distinction is critical in practice: a pharmacometrician applying likelihood-ratio testing between COMB SD and COMB VAR models would be committing a methodological error.
Unlike Proost (2017), who analyzed real-world data with unknown true parameters, our simulation-based approach allows assessment of parameter recovery.
The COMB SD parameterization estimated CV_prop = 18.0% and σ_add = 0.313 mg/L, both different from the true values (15% and 0.5 mg/L) and from the VAR estimates (CV_prop ≈ 29%, σ_add ≈ 0.384 mg/L). This underscores that COMB SD does not estimate exactly the same quantities as the VAR methods, even when applied to the same data. This difference should be interpreted as a consequence of parameterization rather than as evidence that the SD method is inherently inappropriate.
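The reason the two parameterizations cannot share the same parameter values follows from expanding the square of the SD-form W. The subscripts VAR and SD are labels introduced here for the W of COMB VAR.3 and COMB SD, written with the same THETA(4) and THETA(5) for comparison.

```latex
\begin{aligned}
W_{\mathrm{VAR}}^{2} &= \theta_{4}^{2}\,\mathrm{IPRED}^{2} + \theta_{5}^{2},\\
W_{\mathrm{SD}}^{2} &= \left(\theta_{4}\,\mathrm{IPRED} + \theta_{5}\right)^{2}
  = \theta_{4}^{2}\,\mathrm{IPRED}^{2} + 2\,\theta_{4}\theta_{5}\,\mathrm{IPRED} + \theta_{5}^{2},\\
W_{\mathrm{SD}}^{2} - W_{\mathrm{VAR}}^{2} &= 2\,\theta_{4}\theta_{5}\,\mathrm{IPRED}.
\end{aligned}
```

For positive parameters and concentrations, the cross term 2·THETA(4)·THETA(5)·IPRED is strictly positive, so identical parameter values imply a larger residual variance under the SD form. The estimates must therefore differ between the two codings when they are fitted to the same data, which is consistent with a difference of parameterization rather than of adequacy.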
4.6. Practical Implications
Based on the results of this study, the following practical recommendations can be formulated:
- Always define W as the model-predicted standard deviation of the residual error when interpreting IWRES;
- Avoid comparing IWRES across model runs unless W is defined consistently;
- Do not use IWRES-based criteria (e.g., |IWRES| > 2) when W is not normalized;
- Prefer CWRESI or simulation-based diagnostics (e.g., VPCs) for model evaluation;
- For combined error models, prefer the two-EPS variance-based parameterization (VAR.1) when population-level diagnostics are of interest;
- Interpret extreme IWRES values in SD parameterizations with caution, especially at low predicted concentrations.
These recommendations aim to ensure consistent and interpretable use of residual diagnostics in NONMEM.
IWRES plots require a normalized W, but IWRES should be interpreted within the broader hierarchy of pharmacometric diagnostics. In current practice, CWRESI and simulation-based diagnostics such as VPCs or NPDE are generally more robust than IWRES for model evaluation, especially in nonlinear settings. Nevertheless, because IWRES remains widely used in NONMEM workflows, its correct scaling remains important. For IWRES to be interpretable as a standardized residual with unit variance, W must incorporate the estimated residual standard deviation. Non-normalized codings (W = 1, W = IPRED) produce IWRES rescaled by a factor of √(SIGMA(1,1)) relative to correctly normalized runs. In our data, this compression reached a factor of 3.5 for the proportional model (SD(IWRES) = 0.269 for PROP.2 vs. 0.933 for PROP.1). Any numerical criterion applied to IWRES distributions, such as the proportion of observations with |IWRES| > 2, will be strongly distorted by non-normalized coding and should not be used without first verifying that W is correctly defined.
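The distortion of a threshold criterion can be illustrated with a minimal Python sketch. It assumes idealized, exactly standard normal IWRES for the normalized run (no shrinkage) and applies the compression factor of 0.288 reported for the proportional model; the seed, sample size, and function name are hypothetical.

```python
import random

random.seed(7)

compression = 0.288  # sqrt(SIGMA(1,1)) reported for the proportional runs
# Idealized normalized IWRES: standard normal draws, shrinkage ignored
standardized = [random.gauss(0.0, 1.0) for _ in range(20_000)]
# The same residuals as they would appear with a non-normalized W
compressed = [compression * z for z in standardized]

def frac_outside(values, limit=2.0):
    """Fraction of residuals with absolute value above the limit."""
    return sum(abs(v) > limit for v in values) / len(values)

frac_normalized = frac_outside(standardized)
frac_compressed = frac_outside(compressed)
print(f"|IWRES| > 2, normalized W:     {100 * frac_normalized:.2f}%")
print(f"|IWRES| > 2, non-normalized W: {100 * frac_compressed:.2f}%")
```

Under these assumptions, roughly 4 to 5% of normalized residuals exceed the threshold, whereas essentially none of the compressed residuals do, so the criterion would wrongly suggest an absence of outlying observations.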
WRES and CWRESI are robust to W coding for simple models. This invariance is consistent with the broader observation that population-level residuals are more robust diagnostic tools than individual-level residuals in nonlinear mixed-effects models [6]. For additive and proportional error models, WRES and CWRESI are invariant to the W definition, as confirmed numerically to a precision of 10⁻⁴. Diagnostic plots based on CWRESI can, therefore, be reliably interpreted regardless of the W coding used, which is reassuring given its recommended role as a primary diagnostic for FOCE-estimated models [3,4].
Cross-run IWRES comparisons are misleading unless W consistency is verified. As illustrated in Section 3.5, comparing IWRES-based diagnostic plots between runs using different W definitions can generate entirely artifactual differences in apparent residual scatter. A non-normalized additive run (ADD.2, SD(IWRES) = 0.440) compared visually with a normalized proportional run (PROP.1, SD(IWRES) = 0.933) would falsely suggest that the proportional model has twice the residual variability, a conclusion with no basis in the underlying likelihood or fit quality.
For the combined error model, prefer two SIGMA for population-level diagnostics. The VAR.1 two-EPS parameterization produces CWRESI fully consistent with the combined variance structure. The one-EPS alternatives (VAR.3, COMB SD) introduce small but detectable CWRESI differences (up to 0.586 for individual observations). When the primary objective is model comparison via population-level diagnostics, VAR.1 is preferred. VAR.3 remains a valid and practical alternative, with near-perfect concordance with VAR.1 for IWRES and parameter estimates.
Exercise caution with COMB SD at low concentrations. The COMB SD parameterization generated extreme IWRES values for 19 observations (0.32%) at IPRED ≤ 1.94 mg/L. These values do not reflect model misspecification but rather a numerical artifact of the linear W approximation where it diverges from the true combined standard deviation. This does not imply that the SD parameterization is intrinsically invalid or unusable. In some practical settings, it may remain attractive because of parameterization preferences or model fitting considerations. However, its diagnostic consequences should be recognized, especially when interpreting IWRES at low concentrations or comparing residual-based diagnostics across coding approaches.
In practical NONMEM workflows, the impact of W coding is likely to be most visible when IWRES plots are compared across runs using different coding conventions, when proportional or combined residual error models are used, and when low predicted concentrations amplify the effect of approximation in SD-based combined parameterizations.
4.7. Limitations
Several limitations of this study merit acknowledgement. First, all analyses were performed on simulated data from a single one-compartment oral absorption model with dense sampling and no deliberate model misspecification. This simplified framework was chosen to isolate the effect of residual error coding on diagnostic residuals under controlled conditions. Although the algebraic role of W in the definition of IWRES is general, the magnitude and practical visibility of the effects described here may differ in more complex settings, including multi-compartment models, nonlinear kinetics, sparse sampling designs, or models with stronger covariate structure. This is particularly relevant for combined error models and for population-level residuals, whose numerical behavior may be more sensitive to model structure and estimation context.
Second, all analyses were performed in NONMEM 7.6 with FOCE-I. Other nonlinear mixed-effects platforms, such as Monolix, Phoenix NLME, Pumas, or Bayesian frameworks, implement residual error models and residual diagnostics differently. Accordingly, the present conclusions should not be extrapolated mechanically beyond NONMEM, and the exact behavior of IWRES, WRES, and CWRESI under alternative coding strategies may differ across software environments. Third, we did not evaluate the impact of W coding on simulation-based diagnostics such as visual predictive checks (VPCs) or normalized prediction distribution errors (NPDE), which are increasingly recommended as alternatives to IWRES-based plots [4]. Fourth, the THETA-based parameterizations (ADD.3, PROP.3, and COMB VAR.3) produced IWRES that were normalized but not numerically identical to their SIGMA-based counterparts (ADD.1, PROP.1, and COMB VAR.1), reflecting minor differences in the optimizer convergence path. Users should be aware that even between normalized codings, small numerical IWRES differences may arise.
In addition, the present simulations used 500 subjects to obtain stable and visually clear residual patterns. The qualitative conclusions regarding the mathematical role of W are not expected to depend on this exact sample size. However, with smaller datasets, such as 100 or 200 subjects, estimation uncertainty, shrinkage, and random variability may become more prominent, and the observed effects may, therefore, appear less stable or less cleanly separated. Finally, evaluation of these issues in real-world datasets would be a valuable extension of the present work. Such analyses could help determine how often these coding-related distortions materially affect diagnostic interpretation in routine applied modeling settings.
5. Conclusions
This study demonstrates that, within NONMEM, the coding of W in the $ERROR block can strongly influence the numerical values of IWRES without altering the likelihood, the objective function value (OFV), or the estimated model parameters. When the observation model Y remains unchanged, alternative definitions of W rescale the individual residuals used to compute IWRES. In such cases, differences in IWRES reflect differences in residual standardization rather than differences in model fit.
Conversely, when residual variability is parameterized directly within the definition of Y, alternative codings may correspond to different parameterizations of the same variance model. These parameterizations can preserve the likelihood while modifying the interpretation and numerical values of estimated residual error parameters.
These findings have practical implications for pharmacometric modeling within the NONMEM framework. First, when IWRES are intended to behave as standardized residual diagnostics, W should correspond to the model-predicted standard deviation of the residual error. Failure to define W accordingly can produce misleading IWRES distributions, even when the underlying model fit is identical. Second, apparent discrepancies in IWRES across alternative model codings should not be interpreted as evidence of model misspecification without first verifying whether the underlying observation model Y has actually changed.
More broadly, this work highlights the importance of clearly distinguishing between the statistical model used for estimation and the scaling used for residual diagnostics. Misinterpretation of this distinction may lead to incorrect conclusions about model adequacy when comparing alternative residual error codings.
In practice, W affects the scaling of residual diagnostics, whereas the likelihood is determined by the residual variance encoded in the observation model.