Imputation for Repeated Bounded Outcome Data: Statistical and Machine-Learning Approaches

Abstract: Real-life data are often bounded and heavy-tailed. Zero-one-inflated beta (ZOIB) regression is commonly used to model such variables, yet there are no well-established methods for handling missing data in repeated bounded outcomes. We developed an imputation method based on ZOIB regression (i-ZOIB) and compared its performance with those of naïve and machine-learning methods, using different distribution shapes and settings designed in a simulation study. Performance was measured with the mean absolute error (MAE), the root mean squared error (RMSE) and the unscaled mean bounded relative absolute error (UMBRAE). The results varied depending on the missingness rate and mechanism; the i-ZOIB method and the machine-learning ANN, SVR and RF methods showed the best performance.


Introduction
In any research study, one of the most important tasks is data analysis. The results of such analysis can support or refute the hypotheses proposed by the researchers. It is, therefore, important to have high-quality data from which to draw and extrapolate conclusions. Several aspects can be considered in assessing data quality, such as timeliness (ensuring that the recorded value is current), consistency (the representation of the data should be invariant in all cases), completeness (the data should be complete, without missing values), and accuracy (ensuring that the recorded value is identical to the true value) [1].
Missing data are among the most frequent and often-evaluated problems in all types of surveys, especially in repeated or longitudinal studies. In the latter type of study, the missingness or dropout rates can be affected by many known and unknown factors, such as refusal to participate, death of the subject, etc. Little and Rubin [2] have classified the resulting missingness mechanisms into three groups: (a) missing completely at random (MCAR): the missing information is due to chance; (b) missing at random (MAR): the lack of information is conditioned solely by the observed values; and finally, (c) missing not at random (MNAR): the missing information depends on both missing and non-missing information.
Numerous studies conclude that imputing unobserved data is better than ignoring them [3][4][5]. The most widely used classical imputation methods replace the unobserved value with the mean or median or with the last observed value, or use multiple imputation or regression [6][7][8][9]. However, in most cases, the response variable is assumed to be an unbounded real value that follows a normal distribution; in real life, outcomes are frequently bounded and heavy-tailed. Section 4 applies the methods to the CARESS-CCR study. Finally, Section 5 discusses the results and draws some general conclusions.

Distributions
Beta regression is an increasingly popular statistical technique used to predict outcomes with values in the range (0,1), such as rates and proportions, patient-reported outcomes, or economic studies [11,29]. Nevertheless, in certain situations, outcome variables can also take the boundary values 0 and/or 1. In this case, zero-or-one-inflated beta (ZOIB) regression can be employed, when outcomes have values in the range [0, 1), (0, 1], or [0, 1].
Variables bounded in the interval [0,1] can take the following forms (Figure 1). The general idea is to model the response variable (call it y) as a mixture of Bernoulli and beta distributions, from which the true 0 s and 1 s and the values between 0 and 1 are generated. The probability density function is:

f_ZOIB(y; α, γ, µ, φ) = α(1 − γ) if y = 0; αγ if y = 1; (1 − α) f(y; µ, φ) if 0 < y < 1,

where 0 < α, γ, µ < 1 and φ > 0, and f(y; µ, φ) is the probability density function of the beta distribution, parameterised in terms of its mean µ and precision φ:

f(y; µ, φ) = [Γ(φ) / (Γ(µφ) Γ((1 − µ)φ))] y^(µφ−1) (1 − y)^((1−µ)φ−1), 0 < y < 1.

Here α is a mixture parameter that determines the extent to which the Bernoulli or the beta component dominates the probability density function; γ determines the probability that y = 1 given that y comes from the Bernoulli component; µ and φ are the expected value and the precision of the beta component, which is usually parameterised in terms of the shape parameters p and q, with µ = p/(p + q) and φ = p + q [12].
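The mixture density described above can be evaluated directly. The following is a minimal sketch in Python (function names are illustrative, not the authors' code), using the mean-precision parameterisation of the beta component:

```python
import math

def beta_pdf(y, mu, phi):
    """Beta density parameterised by mean mu and precision phi:
    shape parameters are p = mu*phi and q = (1 - mu)*phi."""
    p, q = mu * phi, (1 - mu) * phi
    log_norm = math.lgamma(phi) - math.lgamma(p) - math.lgamma(q)
    return math.exp(log_norm + (p - 1) * math.log(y) + (q - 1) * math.log(1 - y))

def zoib_pdf(y, alpha, gamma, mu, phi):
    """Zero-one-inflated beta density: Bernoulli point masses at 0 and 1,
    a continuous beta component strictly between 0 and 1."""
    if y == 0:
        return alpha * (1 - gamma)
    if y == 1:
        return alpha * gamma
    return (1 - alpha) * beta_pdf(y, mu, phi)
```

Note that the two point masses sum to α, so the continuous beta part carries the remaining 1 − α of the probability.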
It should be noted that these results can be generalised univocally to variables y whose values lie in an interval [a, b] by using the transformation y'' = (y − a)/(b − a), where the y'' values range within the [0, 1] interval.
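The rescaling transformation and its inverse are straightforward; a short sketch (helper names are illustrative):

```python
def to_unit_interval(y, a, b):
    """Map a value y in [a, b] onto [0, 1] via y'' = (y - a) / (b - a)."""
    return (y - a) / (b - a)

def from_unit_interval(y01, a, b):
    """Inverse map: a [0, 1] value back onto the original [a, b] scale."""
    return a + y01 * (b - a)
```

This is how outcomes bounded on arbitrary scales (e.g., a 0-100 questionnaire score) can be modelled with ZOIB and then mapped back.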

Imputation via Zero-One-Inflated Regression (i-ZOIB)
In this work, we assess the performance of zero-one-inflated beta (ZOIB) regression as a missing-data imputation technique. ZOIB regression, based on Bayesian estimation, was developed [11] to model bounded outcomes whose values lie within the zero-one interval. After obtaining the predicted values, we used the predictive mean matching methodology [7,30] to impute the missing values.
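The predictive mean matching step can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function name, the donor-pool size k and the uniform random donor draw are assumptions:

```python
import random

def pmm_impute(pred_missing, pred_observed, y_observed, k=5, rng=None):
    """Predictive mean matching: for each missing case, find the k observed
    cases whose model predictions are closest to its own prediction, then
    impute by drawing the actual observed outcome of a random donor."""
    rng = rng or random.Random(0)
    imputed = []
    for pm in pred_missing:
        # rank observed cases by distance between predicted values
        donors = sorted(range(len(pred_observed)),
                        key=lambda i: abs(pred_observed[i] - pm))[:k]
        imputed.append(y_observed[rng.choice(donors)])
    return imputed
```

Because the imputed value is always a truly observed outcome, PMM keeps imputations inside the natural range of the variable, which complements the bounded ZOIB predictions.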

• K-nearest neighbour (K-NN)
The k-nearest neighbour method belongs to the machine-learning family. This approach is a donor-based method where the replaced value is either a value measured for another record, or the average of k (k > 1) measured values. In contrast to classical techniques, k-NN can be applied to mixed-continuous and categorical data. In our case, the 1-NN technique has been applied according to the literature [31].
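The donor-based rule described above can be sketched for a single numeric covariate. This is a toy illustration under assumed names, not the study's implementation:

```python
def knn_impute(x_missing, x_observed, y_observed, k=1):
    """Donor-based k-NN imputation: the replaced value is the mean of the
    outcomes of the k donors nearest in covariate space (k = 1 reduces to
    copying the single nearest donor's value)."""
    imputed = []
    for xm in x_missing:
        nearest = sorted(range(len(x_observed)),
                         key=lambda i: abs(x_observed[i] - xm))[:k]
        imputed.append(sum(y_observed[i] for i in nearest) / k)
    return imputed
```

With k = 1, as used in this work, the imputed value is always an actually observed value from another record.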
• Support vector regression (SVR)
Support vector machines (SVM) originated in research on statistical learning theory and were introduced in the 1990s by Vapnik and his collaborators [32,33]. Support vector regression (SVR) is a variant of the SVM analysis, used as a regression scheme to predict continuous values.
The SVM represents the sample points in space, separating the classes into two regions, as wide as possible, by employing a separation hyperplane, giving rise to support vectors (formed by the data points close to the hyperplane boundary).
When a new sample is added to the SVM model, it is categorised into one of the classes depending on the region of the space to which it belongs; a good separation between the classes allows a correct classification. Thus, an SVM builds a hyperplane, or a set of hyperplanes, in a space of very high or even infinite dimensionality that can be used in classification or regression problems. To extend this concept to non-linear decision surfaces, kernel functions are applied to map the original data into a new space. In this way, the points that lie on one side of the hyperplane are labelled as belonging to one category and those on the other side to the other. The predictor variables are called attributes, and the transformed attributes used to define the hyperplane are called features.
The input data are viewed as a p-dimensional vector (an ordered list of p numbers), as in most supervised classification methods. The SVM looks for a hyperplane that optimally separates the points of one class from another, which could have been previously projected in the space of higher dimensionality. Support vector regression uses the same principle with some minor changes. As the output is a real number, it becomes difficult to predict the information "manually" since there are infinite possibilities. In the case of regression, a tolerance margin (epsilon) is established near the vector to minimise the error, considering that a part of that error is tolerated [34].
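The tolerance margin described above corresponds to the epsilon-insensitive loss that SVR minimises: deviations smaller than epsilon cost nothing, and larger ones are penalised only beyond the margin. A minimal sketch (the eps value is an arbitrary assumption for illustration):

```python
def epsilon_insensitive_loss(y_true, y_pred, eps=0.1):
    """SVR's epsilon-insensitive loss: errors within +/- eps are tolerated
    (zero cost); larger deviations are penalised linearly beyond the margin."""
    return sum(max(0.0, abs(t - p) - eps) for t, p in zip(y_true, y_pred))
```

This is what makes SVR robust to small prediction errors: only points outside the epsilon tube become support vectors and contribute to the loss.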
• Random forest (RF)
Random forest (RF) is an ensemble-learning algorithm based on decision trees, developed by Breiman [35].
Given a set of original data values S = {(x_i, y_i)}, i = 1, . . . , n, and a new independent test case C_test with predictor x_test, the steps for developing an RF model are the following:
i. The set of values S is sampled with replacement using the bootstrap technique, yielding resamples B_1, . . . , B_L.
ii. On each resample B_j (j = 1, . . . , L), a regression tree T_j is grown.
iii. Finally, the RF prediction for the test case C_test with predictor x_test is obtained as the average of the predictions given by the individual trees T_j. Mathematically, if y*_j (j = 1, . . . , L) is the prediction for C_test obtained from tree T_j, then the final RF prediction is y_test = (1/L) Σ_j y*_j.
RF is generally used for both data classification and regression. For its performance, RF requires fixing the number of trees (ntree) and the number of predictors sampled at each split (mtry). RF hardly overfits when more trees are added, and produces a limited generalisation error [36].
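The bootstrap-and-average recursion in steps i-iii can be sketched with a deliberately simplified stand-in for the regression tree (a 1-nearest-x rule); this is a toy illustration of the averaging principle only, not a real random forest:

```python
import random

def rf_predict(S, x_test, L=50, rng=None):
    """Toy RF-style prediction: draw L bootstrap resamples B_j of S, fit a
    trivial 'tree' (nearest-x rule) on each, and average the L predictions
    y*_j, mirroring y_test = (1/L) * sum_j y*_j."""
    rng = rng or random.Random(1)
    n = len(S)
    preds = []
    for _ in range(L):
        resample = [S[rng.randrange(n)] for _ in range(n)]   # bootstrap B_j
        # stand-in for a regression tree T_j: nearest-x rule on the resample
        xi, yi = min(resample, key=lambda pair: abs(pair[0] - x_test))
        preds.append(yi)
    return sum(preds) / L
```

A real RF would also subsample mtry predictors at each split; the averaging over bootstrap resamples is what limits the generalisation error as trees are added.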

• Artificial neural network (ANN)
An ANN is an algorithm based on an analogy with the nervous system; the general idea is to emulate its learning capacity. Figure 2 shows a graphical illustration of a basic ANN architecture. The ANN learns to identify a pattern of association between the values of a set of predictor variables ({X_i}, i = 1, . . . , n, the inputs) and the states or values considered dependent on them (Y, the output). Thus, the ANN consists of a group of processing units (nodes) that resemble neurons and are interconnected through a network of relationships ({w_i}, i = 1, . . . , n, the weights), combined as z = Σ_i X_i · w_i. Before this information is transferred to the next node, an activation function f is applied to z; the output a = f(z) is then passed on to the subsequent node.
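The single-node computation just described (weighted sum z followed by an activation f) can be sketched as follows, with softplus as the default activation since that is the function used later in this work:

```python
import math

def node_output(X, w, activation=lambda z: math.log1p(math.exp(z))):
    """Single ANN node: weighted sum z = sum(X_i * w_i), then an activation
    f applied to z. The default f is softplus, f(z) = ln(1 + e^z)."""
    z = sum(x * wi for x, wi in zip(X, w))
    return activation(z)
```

A full network chains such nodes layer by layer, so every variable contributes to every hidden unit rather than being considered in isolation.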
In the nodes, three types of operations are performed to determine the output of the neuron: the propagation rule, the activation function and the output function. There are several activation functions, such as the linear, ReLU or softplus transformations, to transfer the information from the hidden layer (internal nodes) to the final output.

• Extreme Gradient Boosting (XGBoost)
The XGBoost (Extreme Gradient Boosting) algorithm is a supervised learning technique [37], also based on decision trees, which is considered the state of the art in the evolution of these algorithms.
It consists of a sequential ensemble of decision trees. Trees are added sequentially, each learning from the results of the previous trees and correcting the error they produced, until the error can no longer be reduced (this is known as gradient descent). It works as follows:
i. An initial tree F_0 is obtained to predict the target variable y; the result is associated with a residual (y − F_0).
ii. A new tree h_1 is obtained that fits the error of the previous step.
iii. The results of F_0 and h_1 are combined to obtain the model F_1 = F_0 + h_1, whose root mean squared error will be lower than that of F_0.
iv. This process is continued iteratively, F_m = F_{m−1} + h_m, until the error is minimised as much as possible.
The main advantages of the XGBoost algorithm are: (a) it can handle large databases with multiple variables as well as missing values; (b) excellent execution speed. The main difference between the XGBoost and random forest algorithms is that in the former the user defines the extent of the trees, while in the latter the trees grow to their maximum extent. XGBoost uses parallel processing, tree pruning, handling of missing values and regularisation (optimisation that penalises the complexity of the models) to avoid overfitting or bias as much as possible.
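The boosting recursion in steps i-iv can be sketched with deliberately simple weak learners. Here each h_m is a constant fitted to the mean of the current residuals; real XGBoost fits regularised regression trees instead, so this is an illustration of the recursion only:

```python
def boost_constant_models(y, n_rounds=10, lr=1.0):
    """Sketch of the boosting recursion: start from F_0 = 0, fit each weak
    learner h_m to the current residuals (y - F_{m-1}), and update
    F_m = F_{m-1} + lr * h_m. Constant learners stand in for trees."""
    n = len(y)
    F = [0.0] * n                     # initial model F_0
    for _ in range(n_rounds):
        residuals = [yi - fi for yi, fi in zip(y, F)]
        h = sum(residuals) / n        # weak learner fitted to the residuals
        F = [fi + lr * h for fi in F]
    return F
```

With constant learners the fit converges to the mean of y after one round; with trees, each round can also reduce within-group error, which is where the predictive power comes from.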

• Gradient Boosting algorithms: Light Gradient Boosting models
This algorithm (LightGBM) uses a tree-based learning algorithm developed by Guolin Ke et al. [38]. The biggest difference from XGBoost is that LightGBM does not grow trees level-wise but leaf-wise, which makes the algorithm faster.

Simulation Design
A simulation study was conducted to assess the performance of different statistical and machine-learning methods of missing data handling.
For repeated data, let us consider 500 subjects and two measurements for each subject. The functional form of the model combines fixed and subject-specific random effects, Y_ij = X_ij β + b_i + ε_ij, where i = 1, . . . , 500 indexes subjects and j = 1, 2 indexes measurements.
For the fixed effects, we constructed a design matrix X = [X_1 . . . X_1000], where each X_k is a vector of two elements: the first element was constant and equal to 1, and the second was sampled from a beta distribution. For the random effects, we constructed a matrix b = [b_1 . . . b_500], where each b_k was sampled from a chi-square distribution with 3 degrees of freedom. X is fully observed.
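The design just described can be sketched as follows. The beta shape parameters (2, 5) are an assumption for illustration, since the paper does not state them here; the chi-square(3) draw is generated as a sum of three squared standard normals:

```python
import random

def simulate_design(n_subjects=500, n_times=2, rng=None):
    """Sketch of the simulation design: one fixed-effects row [1, x] per
    subject-time combination, with x drawn from a beta distribution, plus
    one chi-square(3) random effect per subject."""
    rng = rng or random.Random(42)
    X = [[1.0, rng.betavariate(2, 5)]            # intercept + beta covariate
         for _ in range(n_subjects * n_times)]
    b = [sum(rng.gauss(0, 1) ** 2 for _ in range(3))   # chi-square, 3 df
         for _ in range(n_subjects)]
    return X, b
```

With 500 subjects and 2 time points this yields the 1000 fixed-effects rows and 500 random effects described above.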
As noted in the Introduction, we focused on the management of missing data in repeated bounded outcomes. As bounded outcomes can adopt different shapes, we considered three distinct scenarios, based on the designs mentioned above, that are related to real-life situations:
• Scenario 1: inverse-J-shaped distribution both at the baseline and at the follow-up. In this scenario, the subjects have low scores at the beginning and during the follow-up.

• Scenario 2: inverse-J-shaped distribution at the baseline and inverse-U-shaped distribution at the follow-up. Here, subjects present low values at the beginning, but the values increase slightly during the follow-up.

• Scenario 3: J-shaped distribution both at the baseline and at the follow-up. In this scenario, the participants show high scores at both time points.
For each of the three scenarios, we generated missing observations for the follow-up measurement according to the missingness mechanism (MCAR, MAR or MNAR) and different missingness rates (10%, 20% and 30%), each mechanism defining the missingness probability p_i on Y for a given dataset. Finally, we ran this experiment 250 times for each combination of missingness mechanism and rate in each assessed scenario.
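The missingness generation can be sketched as follows. This is an illustration only: MCAR drops cases uniformly at random, while the MAR/MNAR rules shown here (ranking by the baseline or the follow-up value and dropping the top fraction) are assumptions, since the exact form of p_i is defined in the paper's equations:

```python
import random

def make_missing(y_follow, rate, mechanism, y_base=None, rng=None):
    """Sketch of missingness generation for the follow-up measurement.
    MCAR: purely random deletion. MAR: deletion driven by the observed
    baseline value. MNAR: deletion driven by the (unobserved) follow-up
    value itself. Deleted entries are returned as None."""
    rng = rng or random.Random(7)
    n = len(y_follow)
    k = round(n * rate)
    if mechanism == "MCAR":
        idx = rng.sample(range(n), k)
    elif mechanism == "MAR":
        idx = sorted(range(n), key=lambda i: -y_base[i])[:k]
    else:  # MNAR: probability depends on the value that goes missing
        idx = sorted(range(n), key=lambda i: -y_follow[i])[:k]
    dropped = set(idx)
    return [None if i in dropped else y_follow[i] for i in range(n)]
```

Repeating this 250 times per mechanism-rate-scenario combination reproduces the structure of the simulation experiment.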

Setting up Parameters for Statistical and Machine-Learning Approaches
We applied the statistical and machine-learning methods under each of the three scenarios. A value of k = 1 was chosen for the K-NN imputation approach, as recommended in the literature [31]. For the remaining ML/AI approaches (SVR, RF, ANN, XGBoost and LightGBoost), hyperparameter tuning was carried out to determine the best-performing model on each simulated dataset, also taking run times into account.

Accuracy Measures
Different accuracy measures were used to evaluate the performance of statistical and machine-learning imputation methods.
Given a dataset with n observations, let Y denote the true distribution of the variable of interest and Y* the imputed variable obtained with each tested method. We define the error e = (Y − Y*) as the difference between the true and the imputed distributions. The error over the observations in which missing data were generated is e_miss = (Y − Y*)_miss. Thus, to assess the impact of each imputation technique, we focused only on the observations whose values had been set to missing and imputed afterwards.
There are many accuracy measures to assess the performance [39]; we employed the most commonly used: (a) Mean absolute error method (MAE, defined as the arithmetic mean of absolute errors).
(b) Mean squared error (MSE)/root mean squared error (RMSE) method. The arithmetic mean of the squared errors is computed. Outliers affect the MSE, since squaring gives large weight to large errors; moreover, as the errors are squared, the results are on a different scale from the data. Thus, the RMSE (the square root of the MSE) is preferable to the MSE. Nevertheless, the RMSE is also sensitive to outliers.
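Restricted to the errors e_miss over the imputed cases, the two measures are computed as follows (a minimal sketch):

```python
import math

def mae(errors):
    """Mean absolute error: arithmetic mean of |e| over the imputed cases."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root mean squared error: sqrt of the mean of e^2. Squaring weights
    large errors more heavily, hence the sensitivity to outliers."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

Both are scale-dependent, which is what motivates the scale-free UMBRAE measure discussed next.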
As already noted, the MAE, MSE and RMSE are scale-dependent accuracy measures with various advantages and disadvantages. It is recommended, however, to use a measure that satisfies several related requirements, i.e., one that is informative, resistant to outliers, symmetric, scale-independent and easy to understand (provides intuitive results). A relatively new accuracy metric, the unscaled mean bounded relative absolute error (UMBRAE) [40], has been defined to tackle some of these problems, and we included it in our imputation performance assessment. Given the errors e_miss and e*_miss of the assessed and the reference method, the bounded relative absolute error of observation i is BRAE_i = |e_miss,i| / (|e_miss,i| + |e*_miss,i|). Based on this quantity, the mean bounded relative absolute error is MBRAE = (1/n) Σ_i BRAE_i, and UMBRAE = MBRAE / (1 − MBRAE). When UMBRAE is equal to 1, the proposed method performs roughly as well as the reference method; when UMBRAE < 1, the assessed method performs roughly (1 − UMBRAE) × 100% better than the reference; and when UMBRAE > 1, it performs roughly (UMBRAE − 1) × 100% worse. In this study, the i-ZOIB technique was the proposed method, and its performance was compared with those of the remaining imputation approaches. For instance, an i-ZOIB/LOCF UMBRAE value below 1 means that i-ZOIB performed better than LOCF; if UMBRAE > 1, then i-ZOIB performed worse.
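The UMBRAE computation, following its definition in [40], can be sketched as follows (the handling of pairs where both errors are zero is an assumption of this sketch):

```python
def umbrae(e_method, e_reference):
    """UMBRAE: bounded relative absolute errors against a reference method,
    BRAE_i = |e_i| / (|e_i| + |e*_i|), averaged into MBRAE and then
    'unscaled' as MBRAE / (1 - MBRAE). Pairs with both errors zero are
    skipped to avoid division by zero."""
    braes = [abs(e) / (abs(e) + abs(r))
             for e, r in zip(e_method, e_reference)
             if abs(e) + abs(r) > 0]
    mbrae = sum(braes) / len(braes)
    return mbrae / (1 - mbrae)
```

If the assessed method's errors are identical to the reference's, every BRAE_i is 0.5, MBRAE is 0.5 and UMBRAE is exactly 1; errors twice as large as the reference's give UMBRAE = 2.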
This was applied to each of the 250 runs. Thus, we computed the summary statistics (mean and standard deviations, as well as the median and the 25th and the 75th percentiles) in order to assess the imputation method performance.

Simulation Results
Simulation results for all the assessed scenarios, for the different missingness rates and mechanisms, are displayed in Tables A1-A6 (Appendix A). Bold numbers show the lowest MAE and RMSE values on each missingness rate and mechanism.
For each simulated scenario, 250 distinct samples with missing data were generated, with different missingness mechanisms and rates. Then, they were imputed. Finally, these new distributions were compared with those not containing imputed data, yielding the results depending on the assessed scenario.
• Scenario 1: inverse-J-shaped distribution both at the baseline and at the follow-up
The MAE and RMSE results for the inverse-J-shaped imputed and non-imputed distributions are displayed in Tables A1 and A2. Boxplots of the distribution of the MAE accuracy measure are also depicted in Figure 3.
In general terms, the mean MAE was below 0.07 under both the MCAR and MAR mechanisms, whatever the percentage of data loss. However, when the unobserved values followed an MNAR mechanism, the SVR, 1-NN and LightGBoost methods showed mean MAE values around or above 0.10 (e.g., MNAR with 10% data loss: SVR: 0.09, LightGBoost: 0.106; MNAR with 30%: 1-NN: 0.099, LightGBoost: 0.121). On the other hand, the ANN method showed the smallest mean MAE of all the compared techniques in every combination of mechanism and percentage of data loss. More precisely, when the missing values followed an MNAR pattern, the i-ZOIB method, together with the ANN, presented the lowest imputation error whatever the missingness rate.
The SVR method presents a variability of MAE values higher than the rest of the methods in most of the situations studied, especially if the data imputation is carried out under the MAR or MNAR mechanism ( Figure 3).
Regarding the RMSE findings, the same patterns are observed as for the MAE values. As the missing-value mechanism depends more heavily on the observed and unobserved data, the RMSE values increase: in the MCAR case, the RMSE ranges between 0.067 (ANN, 10% loss) and 0.09 (1-NN), while in the MNAR case, the results range between 0.084 (ANN, 10% loss) and 0.17 (SVR, 30% loss) (Table A2).
In a second step, boxplots of the UMBRAE results were constructed for the different fixed missingness rates and mechanisms (Figure 4). Based on these boxplots, all methods performed similarly to i-ZOIB. However, XGBoost showed UMBRAE values below 1 in the nine mechanism-percentage-loss combinations studied in this data simulation, whereas in situations where the loss of information was 30% (under any mechanism), the RF method showed UMBRAE values greater than 1. Overall, the performance of i-ZOIB was comparable to that of the best method among them.
• Scenario 2: inverse-J-shaped distribution at the baseline and inverse U-shaped distribution at the follow-up.
In the second evaluated setting, the MAE and RMSE results for the imputed inverse-U-shaped distribution and the non-imputed distribution were compared; the data are displayed in Tables A3 and A4. Regarding the RMSE measure, its values differed according to the data-loss mechanism: under both the MCAR and MAR missingness mechanisms, all methods showed RMSE values lower than 0.10, while when imputing missing values generated under MNAR, the values were higher than 0.10, except for ANN (at all percentages of unobserved values) and i-ZOIB (RMSE: 0.085 at a 10% missingness rate; 0.098 at 20% of missing data). The methods that presented the lowest RMSE, by type of pattern, were the following: SVR and ANN (MCAR); SVR, RF and ANN (MAR); i-ZOIB and ANN (MNAR).
Boxplots of the UMBRAE performance measurements for fixed missingness rates and mechanisms were constructed (Figure 6). Overall, the machine-learning and AI-based algorithms showed performance similar to the reference (i-ZOIB). Nevertheless, under the MNAR mechanism at the 10% and 20% missingness rates, the RF, XGBoost and LightGBoost methods showed UMBRAE values below 1. On the other hand, at 30% loss of information (under any mechanism), the RF method showed UMBRAE values above 1.
• Scenario 3: J-shaped distribution both at the baseline and at the follow-up
Summary statistics of the MAE and RMSE accuracy measures according to the missingness mechanisms and rates are displayed in Tables A5 and A6, respectively. The corresponding boxplots are depicted in Figure 7.
In this scenario, the mean MAE values ranged between 0.026 (LightGBoost, MNAR, 10% losses) and 0.12 (SVR, MNAR, 30% missing values). Under both the MCAR and MAR mechanisms, i-ZOIB showed higher values than the rest of the techniques (MAE: 0.06, versus 0.03-0.04 for the other proposals). Furthermore, the variability of the MAE observed for each imputation method was similar (Figure 7). On the other hand, under MNAR, the SVR method presented the worst MAE values (MAE at 30% losses: 0.12 vs. 0.02-0.03 for the rest of the methods), also showing higher variability than the rest of the techniques.
Boxplots show the distributions of the UMBRAE values by imputation method, mechanism and missingness rate (Figure 8). Under the MCAR mechanism, the ML/AI techniques show UMBRAE values above 1, while under MAR, the 1-NN method is the only one that exceeds the threshold of 1. However, in the MNAR setting, SVR, ANN and the boosting-based algorithms (XGBoost, LightGBoost) presented values below 1, whereas both 1-NN and RF continued to show values above 1.

Description of the Data Set
The CARESS-CCR study started in 2010 and is still in progress. Patients with colorectal cancer (CRC) who had undergone a surgical procedure have been consecutively recruited and followed up. The study has been approved by the research committees of the corresponding centres, and the patients provided their informed consent to participate in the study.
Socio-demographic data and clinical variables were obtained, and patient-reported outcomes (PRO) were also included in the study. The participants could report their health-related quality of life (HRQoL) status by filling in the appropriate questionnaires, starting at the baseline measurement point. A subset of patients with no missing values in the covariates related to the outcomes and the dropout process was selected. Moreover, to match the repeated study design required by our objectives, we selected the PRO measures examined at baseline and one year after the index surgical intervention.
We employed the k-NN, SVR and i-ZOIB methods to impute missing values in this subset of the outcomes, using the already-mentioned naïve techniques as control methods.
Our outcomes of interest, and the variables to be imputed, were of two types. The first was the total score of the European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire, Core 30 (QLQ-C30) [41,42]. This summary score (ranging from 0 to 100) covers all symptom domains (e.g., fatigue, pain) and function domains (e.g., emotional and social functioning) obtained for cancer patients with the QLQ-C30; a single, higher-order HRQoL score has been proposed as a more meaningful and reliable measure in oncological research, where the higher the value, the better the patient's status. We also considered the anxiety level (HADS-A) reported by the patients using the HADS questionnaire [43,44], which ranges from 0 to 21, with high scores reflecting high anxiety levels and reduced well-being. As covariates, we also selected the patients' baseline HRQoL status and their health-status scores.
The differences between the distributions obtained by the proposed imputation methods were evaluated.

Results
The study sample comprised 968 patients. All participants reported their EORTC QLQ-C30 and HADS-A scores at the baseline point. Three hundred subjects (31%) did not fill in the EORTC QLQ-C30 questionnaire a year after the surgical intervention; similarly, 310 (32%) did not report their HADS-A levels during the follow-up. Figure 9 shows the baseline and follow-up distributions of the HADS-A and EORTC QLQ-C30 scores. The former followed a distribution corresponding to Scenario 3 of the simulation study, whereas the latter formed J-shaped curves at both time points (Scenario 1 of the simulation study).
To decide which imputation model to use, a logistic regression procedure was performed to assess the missingness mechanism related to the EORTC QLQ-C30 dropout during the year of follow-up. The results indicated that both the baseline EORTC QLQ-C30 and HADS-A scores were associated with the dropout process. Likewise, both scores were related to the HADS-A follow-up dropout.
The distributions of the scores after applying the different imputation methods are displayed in Figures 10 and 11. For both outcomes, imputing the missing values with the XGBoost method led to biased distributions, since some imputed values lay outside the natural range of the imputed variable. The i-ZOIB method, as well as the ANN, SVR, RF and 1-NN imputation techniques, provided similar curve shapes.

Discussion
Here, we evaluated the performance of several statistical and machine-learning approaches for handling missing data in bounded and non-bell-shaped repeated outcomes, a problem not often analysed so far. Our findings show that the i-ZOIB technique and ANN, SVR and RF machine-learning methods outperform the others in the accuracy assessments performed for different missingness mechanisms and rates in each of the evaluated scenarios.
Imputation using naïve methods such as the mean, median, or the LOCF technique is widely used. These naïve techniques ignore the correlation between the variables and underestimate the variability of the variable itself [45].
In the simulation study, we used the i-ZOIB method: a zero-one-inflated beta regression (based on Bayesian analysis) combined with the predictive mean matching (PMM) methodology, in which each missing value is imputed with an observed value drawn from a donor pool of k candidate donors. i-ZOIB has several advantages over the other selected methods. The regression is appropriate for modelling outcomes with values within the [0,1] interval, as noted in the Methods section, which ensures that the predicted values remain within the same interval. It also introduces variability in the predicted values, as it is based on a Bayesian regression model. Moreover, it is combined with the PMM method, which relaxes some of the assumptions of parametric imputation. Similar opinions have been presented in analogous studies [46]. Other statistical approaches have been considered to address this issue [27], but under a distinct setting. To our knowledge, this is the first approach for handling missing bounded outcomes in a repeated-measurement setting.
In this work, we have compared the performance of machine-learning and artificial-intelligence techniques for the treatment of missing values. In addition to K-NN, the traditional ANN, SVR and RF algorithms, and those based on gradient boosting (XGBoost and LightGBoost), have been applied. The results of the simulation study indicate that the mean RMSE values obtained by ANN in the different scenarios were lower than those of the rest of the techniques. This may be because the SVR, RF, XGBoost and LightGBoost methods are based on feature selection, considering the influence of each variable separately to establish the prediction, whereas ANN models evaluate the information by fusing all the variables together. Furthermore, overestimation could be avoided (RFs are prone to providing overestimated models) by using regularisation and the choice of activation function during the training process; in our case, we used the softplus function, a smoothed version of the well-known ReLU. We also conducted the imputation procedures using the K-NN approach, performed in our case as 1-NN. Compared with the other ML/AI methods, it did not provide good results in the MAE and RMSE assessments, although, as a single-imputation machine-learning technique, it takes into account relationships between variables based on distance measurements.
In our results, both in the simulation and in the application study, the XGBoost method produced imputed values beyond the range over which the variable of interest is defined. This may be due to the design of the method itself: after the first iteration, each subsequent tree is fitted to the error of the previous step, so while the initial tree is restricted to the domain of the variable to be imputed, the sum across gradient trees is not [31,47].
Another important aspect is the type of pattern or mechanism that the missing values follow. The MCAR and MAR mechanisms are patterns where the loss of information occurs randomly (MCAR) or is conditioned solely by the observed values (MAR). No less important is MNAR, in which the missing values depend not only on the observed information but also on the unobserved information. MNAR is non-ignorable, because ignoring it would lead to biased results; therefore, knowing the mechanism is useful for identifying the most appropriate analysis. Our findings indicate that there is no single best method for every data-loss pattern and percentage, since, in each combination, we find, in addition to the ANN, other methods such as SVR or i-ZOIB presenting similar mean MAE values. Similar results have been reported in different settings [13][14][15][16][17][18][19][20][21][22][23]: the performance of ML/AI techniques when handling missing data depends on the scenario and the type of data structure to be imputed.
We used the MAE, RMSE and UMBRAE accuracy measures to assess the performance of the chosen imputation methods. Remarkably, UMBRAE is a good index for comparing all the benchmarks given a reference method. It combines the best features of various alternative measures into a single one: it is simple, flexible, easy to use and understand, and resistant to outliers. Moreover, the benchmark for calculating UMBRAE is selectable, and the ideal choice is a forecasting method that the assessed methods are expected to outperform.
The results of this study can be only extrapolated to settings similar to those considered in our simulation, i.e., the three plausible real-life scenarios where the outcomes (bounded data) followed a heavy-tailed distribution.

Conclusions
We compared statistical and machine-learning approaches for handling missing data in skewed, bounded and repeated outcomes, using plausibly simulated real-life data. In the most frequent real-life settings (Scenarios 1 and 3), the MAE and RMSE assessments of the i-ZOIB (statistical) as well as the ANN, SVR and RF (machine-learning) imputation methods yielded low values. These methods can handle high missingness rates; however, their MAE and RMSE results could be improved further, which might be achieved by optimising the parameters of each of these approaches.
In conclusion, our results can significantly contribute to improvements in the accuracy of analyses of bounded, heavy-tailed and repeated outcomes. This would benefit many future research endeavours and real-life applications, such as modelling repeated HRQoL data or cost-effectiveness studies.

Funding: This work was partially supported by a research grant from Instituto de Salud Carlos III (PI13/00013, PI18/01589) to U. Aguirre; Department of Health of the Basque Country (2010111098); KRONIKGUNE, Institute for Health Services Research (KRONIK 11/006); and the European Regional Development Fund. These institutions had no further role in the study design; in the collection, analysis or interpretation of data; in the writing of the manuscript; or in the decision to submit the paper for publication.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the application study.

Data Availability Statement:
The dataset used for the application study is available from the corresponding author on reasonable request.

Acknowledgments:
The authors extend their appreciation to Jose María Quintana. The authors would also like to thank the anonymous referees for their valuable comments and efforts, which definitely helped to improve the quality of the paper.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
Supplementary tables displaying the MAE and RMSE values obtained for each of the studied scenarios.