Time Series of Counts under Censoring: A Bayesian Approach

Censored data are frequently found in diverse fields including environmental monitoring, medicine, economics and social sciences. Censoring occurs when observations are available only for a restricted range, e.g., due to a detection limit. Ignoring censoring produces biased estimates and unreliable statistical inference. The aim of this work is to contribute to the modelling of time series of counts under censoring using convolution closed infinitely divisible (CCID) models. The emphasis is on estimation and inference problems, using Bayesian approaches with Approximate Bayesian Computation (ABC) and Gibbs sampler with Data Augmentation (GDA) algorithms.


Introduction
Observations collected over time or space are usually correlated rather than independent. Time series are often observed with data irregularities such as missing values or detection limits. For instance, a monitoring device may have a technical detection limit and it records the limit value when the true value exceeds/precedes the detection limit. Such data is called censored (type 1) data and are common in environmental monitoring, physical sciences, business and economics. In particular, in the context of time series of counts, censored data arise in call centers. In fact, the demand measured by the number of calls is limited by the number of operators. When the number of calls is higher than the number of operators the data is right censored and the call center incurs under-staffing and poor service to the costumers.
The main consequence of neglecting censoring in the time series analysis is the loss of information that is reflected in biased and inconsistent estimators and altered serial correlation. These consequences can be summarized as problems in inference that lead to model misspecification, biased parameter estimation, and poor forecasts. These problems have been solved in regression settings (i.i.d.) and partially solved for Gaussian time series (see for instance [1][2][3][4][5][6][7]). However, the problem of modelling time series under censoring in the context of time series of counts has, as yet, received little attention in the literature even though its relevance for inference. Count time series occur in many areas such as telecommunications, actuarial science, epidemiology, hydrology and environmental studies where the modelling of censored data may be invaluable in risk assessment.
In the context of time series of counts, Ref. [8] deal with correlated under-reported data through INAR(1)-hidden Markov chain models. A naïve method of parameter estimation was proposed, jointly with the maximum likelihood method based on a revised version of the forward algorithm. Additionally, Ref. [9] propose a random-censoring Poisson model for under-reported data, which accounts for the uncertainty about both the count and the data reporting processes.
Here, the problem of modelling count data under censoring is considered under a Bayesian perspective. In this paper, we consider a general class of convolution closed infinitely divisible (CCID) models as proposed by [10].
We investigate two natural approaches to analyse censored convolution closed infinitely divisible models of first order, CCID(1), using the Bayesian framework: the Approximate Bayesian Computation (ABC) methodology and the Gibbs sampler with Data Augmentation (GDA).
Since the CCID(1) under censoring presents an intractable likelihood, we resort to the Approximate Bayesian Computation methodology for estimating the model parameters. The presupposed model is simulated by using sample parameters taken from the prior distribution, then a distance between the simulated dataset and the observations is computed and when the simulated dataset is very close to the observed, the corresponding parameter samples are accepted as part of the posterior.
In addition, a widely used strategy to deal with censored data is to fill in censored data in order to create a data-augmented (complete) dataset. When the data-augmented posterior and the conditional pdf of the latent process are both available in a tractable form, the Gibbs sampler allows us to sample from the posterior distribution of the parameters of the complete dataset. This methodology is called Gibbs sampler with Data Augmentation (GDA). Here, a modified GDA, in which the data augmentation is achieved by multiple sampling of the latent variables from the truncated conditional distributions (GDA-MMS), is adopted.
The Poisson integer-valued autoregressive models of first-order, PoINAR(1), is one of the most popular classes of CCID models. It was proposed by [11,12] and extensively studied in the literature and applied to many real-world problems because of its ease of interpretation. To motivate the proposed approaches, we present in Figure 1 a synthetic dataset with n = 350 observations generated from a PoINAR(1) process with parameters α = 0.5 and λ = 5 (X t , blue line) and the respective right-censored dataset (Y t , red line), at L = 11, corresponding to 30% of censoring. If we disregard the censoring, the estimates for the parameters (assuming an PoINAR(1) model without censoring) present a strong bias. For instance, in the frequentist framework, the conditional maximum likelihood estimates areα CML = 0.6174 andλ CML = 3.4078, while in the Bayesian framework, the Gibbs sampler givesα Bayes = 0.6242 andλ Bayes = 3.3297. On the other hand, if we assume a PoINAR(1) model under censoring, the parameter estimates given by the proposed approaches described in this work are, respectively,α ABC = 0.4623 andλ ABC = 5.2259, and α GDA = 0.4834 andλ GDA = 4.9073. Therefore, it is important to consider the censoring in data in order to avoid some inference issues that lead to a poor time series analysis. The remainder of this work is organized as follows. Section 2 presents a general class of convolution closed infinitely divisible (CCID) models under censoring. Two Bayesian approaches proposed to estimate the parameters of the censored CCID(1) model are described in Section 3. The proposed methodologies are illustrated and compared with synthetic data in Section 4. Finally, Section 5 concludes the paper.

A Model for Time Series of Counts under Censoring
This section introduces a class of models adequate for censored time series of counts based on the convolution closed infinitely divisible (CCID) models as proposed by [10].

Convolution Closed Models for Count Time Series
First we introduce some notation. Consider a random variable X with a distribution F µ , µ > 0, belonging to the convolution closed infinitely divisible (CCID) parametric family [10]. This means, in particular, that the distribution F µ is closed under convolution, F µ 1 * F µ 2 = F µ 1 +µ 2 , where * is the convolution operator. Let R(·) denote a random operator on X such that R(X) ∼ F αµ , 0 < α < 1 and the conditional distribution of R(X) given As an example, consider a Poisson random variable, X ∼ Po(µ) and a binomial thinning operation, (1−α)µ,x is the Binomial distribution with parameters x and α.

Modelling Censoring in CCID(1) Time Series
Given a model as in (1), a basic question is whether it properly describes all the observations of a given time series, or whether some observations have been affected by censoring. Here, we describe a model for dealing with censored observations in CCID(1) processes and study some of its properties.
Exogenous censoring can be modelled assuming (1) as a latent process and Y t = min{L, X t } as the observed process, where L is a constant that is assumed to be known. For simplicity of exposition we assume exogenous right censoring but all the results are easily extended to left-censoring or interval censoring. Hence, for right exogenous censoring Although X t , a CCID(1) process is Markovian, the exogenous censoring implies that Y t is not Markovian because Y t depends on X t and L. Furthermore, Y t is not CLAR (Conditionally Linear AutoRegressive). In fact, The authors Zeger and Brookmeyer [1] established a procedure to obtain the likelihood of an observed time series under censoring, y = (Y 1 , . . . , Y n ), which becomes infeasible when the proportion of censoring is large. To overcome this issue, this work considers a Bayesian approach.

Bayesian Modelling
The Bayesian approach to the inference of an unknown parameter vector θ is based on the posterior distribution π(θ|y), defined as where L(y|θ) is the likelihood function of the observed data y and π(θ) is the prior distribution of the model parameters.
When the likelihood is computationally prohibitive or even impossible to handle, but it is feasible to simulate samples from the model (bypass the likelihood evaluation), as is the case of censored CCID(1) processes, Approximate Bayesian Computation (ABC) algorithms are an alternative. This methodology accepts the parameter draws that produce a match between the observed and the simulated sample, depending on a set of summary statistics, a chosen distance and a selected tolerance. The accepted parameters are then used to estimate (an approximation of) the posterior distribution (conditioned on the summary statistics that afforded the match).
On the other hand, the idea of imputation arises naturally in the context of censored data. The Gibbs sampler with Data Augmentation (GDA) allows us to obtain an augmented dataset from the censored data by using a modified version of the Gibbs sampler, which samples not only the parameters of the model from its complete conditional but also the censored observations. The usual inference procedures may then be applied to the augmented data set.

Approximate Bayesian Computation
Approximate Bayesian Computation (ABC) is based on an acceptance-rejection algorithm. ABC is used to compute a draw from an approximation of the posterior distributions, based on simulated data obtained from the assumed model in situations where its likelihood function is intractable or numerically difficult to handle. Summary statistics from the synthetic data are compared with the corresponding statistics from the observed data and a parameter draw is retained when there is a match (in some sense) between the simulated sample and the observed time series observation.
Recently, Ref. [16] provided the asymptotic results pertaining to the ABC posterior, such as Bayesian consistency and asymptotic distribution of the posterior mean.
It is well known that, if we use a proper distance measure, then as δ tends to zero, the distribution of the accepted values tends to the posterior distribution of the parameter given the data. When the summary statistics are sufficient for the parameter, then the distribution of the accepted values tends to the true posterior as δ tends to zero, assuming a proper distance measure on the space of sufficient statistics. The latent structure of the thinning operator means that the reduction to a sufficient set of statistics of dimension smaller than the sample size is not feasible and, therefore, informative summary statistics are often used [19].
In this work, given the characteristics of the data under study to compare the observed data (y 0 ) and the synthetic (simulated ) data (y), we consider two distinctive characteristics of CCID(1) time series which are affected by the censoring: (i) the empirical marginal distribution and (ii) lag 1 auto-correlation.
To measure the similarity between the empirical marginal distributions the Kullback-Leibler (Note that Kullback-Leibler distance measures the dissimilarity between two probability distributions.) distance is calculated as wherep 0 j andp j denote the empirical marginal distribution of the observed time series and that of the simulated time series, respectively, estimated by the corresponding sample proportions, p 0 j = 1 n ∑ n j=1 I {Y 0 j =j} andp j = 1 n ∑ n j=1 I {Y j =j} . Wheneverp 0 j is zero, the contribution of the jth term is interpreted as zero because lim p→0 p ln(p) = 0.
Additionally, we estimated the censoring rates, S 3 (y 0 ) = 1 n ∑ n t=1 I {y 0 t =L} and S 3 (y) = 1 n ∑ n t=1 I {Y t =L} , which are also compared by their squared difference. Thus, for each set of parameters, θ (k) , a time series x (k) is generated from the model CCID (1) and right censored at L, yielding y (k) = (Y (k) 1 , . . . , Y (k) n ) and the above statistics, S 1 (y (k) ), S 2 (y (k) ) and S 3 (y (k) ) are computed. Combining these statistics in a metric leading to the choice of the parameters θ requires scaling. Thus, we propose the following metric where S i (y 0 ) and S i (y (k) ) are the ith statistics obtained respectively from the observed and kth simulated data and V(S 1 (y)) and V(S i (y 0 ) − S i (y)) are the corresponding sample variances across the replications.
In summary, we propose Algorithm 1 for ABC approach based on [20]: (1) For k = 1, ..., N Sample θ (k) from the prior distribution π(θ) Generate a time series with n observations, x (k) from the CCID(1) model Right truncate at L x (k) to obtain the simulated data y (k) Compute S 1 (y (k) ), S 2 (y (k) ) and S 3 (y (k) ) Select the values θ (k) corresponding to the 0.1% quantile of d

Gibbs Sampler with Data Augmentation
Gibbs sampling is a Markov chain Monte Carlo (MCMC) algorithm that can generate samples of the posterior distribution from their full conditional distributions [21]. When the data are under censoring or there are missing values, both cases leading to an incomplete data set, Ref. [22] proposed to combine the Gibbs sampler with data augmentation. This methodology implies imputing the censored (or missing) data, thus obtaining a complete dataset, and then dealing with the posterior of the complete data through the iterative Gibbs sampler. Therefore, the Gibbs sampler is modified in order to sample not only the parameters of the model from their complete conditionals but also the censored observations, obtaining an augmented (complete) dataset z = (z 1 , . . . , z n ) where where F µ (x|x ≥ L) is the truncated marginal distribution of the CCID(1) model with support in [L, +∞[. Furthermore, we consider a modified sampling procedure for the imputation, designated as Mean of Multiple Simulation (MMS), proposed by [23] consisting in sampling from F µ (x|x ≥ L) multiple times, say m, and then imputing with the (nearest integer value) median of the m samples. This procedure is designated by GDA-MMS . The augmented dataset can be considered as a CCID(1) time series and with a conditional likelihood function given by Equation (3). The posterior distribution of θ is given by where π(θ) is the prior distribution of the parameters. In CCID(1) models the complexity of p(θ|z) requires resorting to Markov Chain Monte Carlo (MCMC) techniques for sampling from the full conditional distributions. The procedure is summarized in Algorithm 2 and detailed for the CensPoINAR(1) case in Sections 3.3 and 4.1. (1) , . . . , θ (N) ] and z (N) .

The Particular Case of CensPoINAR(1)
This section details the ABC and GDA-MMS procedures to estimate a censored CCID(1) with the binomial thinning operation and Poisson marginal distribution, the censored Poisson INAR(1), CensPoINAR(1), model.
Consider the censored observations y = (Y 1 , . . . , Y n ) from a PoINAR(1) time series x = (X 1 , . . . , X n ) defined as with α • X t−1 = ∑ X t−1 i=1 ξ ti , ξ ti ∼ iid Ber(α), e t ∼ Po(λ) and X t ∼ Po( λ 1−α ). Then θ = (α, λ) and given x, the conditional likelihood function is given by Under a Bayesian approach, we need a prior distribution for θ. In the absence of prior information, we use weakly informative prior distributions for θ detailed below. (1) The ABC procedure described in Algorithm 1 is now implemented for the censored PoINAR(1). For the parameter 0 < α < 1, we choose a non-informative prior U(0, 1), while for the positive parameter λ, we choose a non-informative U(0, 10). The former allows us to explore all the support space for α. The choice of U(0, 10) as a prior for λ > 0 allows us to explore a restricted support for the parameter that is in accordance with small counts. (1) Under the GDA-MMS approach, we first need to obtain a complete data set z = (z 1 , . . . , z n ) by imputing the censored observations, see (8). In this work, we draw m = 10 replicates of the right truncated at L Poisson distribution with parameter λ 1−α , w i ∼ Po λ 1−α × I (w i ≥L) and set z t = median(w) ( c represents the integer ceiling of c), Figure 2 shows an augmented dataset (Z t , black line) from the synthetic data presented in Figure 1. As remarked above, given the complexity of the posterior distribution, Markov Chain Monte Carlo techniques are required for sampling from the full conditional distributions. Thus, the Adaptive Rejection Metropolis Sampling (ARMS) is used inside the Gibbs sampler [24]. Also in this approach, in the absence of prior information, we use weakly informative prior distributions for (α, λ). Thus, for the parameter 0 < α < 1, we choose a non-informative beta prior, conjugate of the binomial distribution, with parameters (a, b), while for the positive parameter µ, we choose a non-informative Gamma (shape, rate) prior, conjugate of the Poisson distribution, with parameters (c, d). The full conditional of λ is given by

Illustration
This section illustrates the procedures proposed above to model CCID(1) right-censored time series in the particular case of Poisson distribution and binomial thinning operation.
For the ABC estimates, we run N = 10 6 replications and choose the pairs (α, λ) corresponding to the 0.1% lower quantile of d (k) S , Equation (7), in total of 1000 values from which the estimates are computed as the mean value. Software R [25] was used to implement the ABC algorithm.
To implement GDA-MMS algorithm we consider the initial values θ (0) = (α (0) , λ (0) ) given by the Conditional Least squares estimates of α and λ [24]. The hyper-parameters for the prior distributions of α and λ are the following α ∼ Beta(a = 2, b = 2) and λ ∼ Gamma(c = 0.1, d = 0.1). In this work, the function armspp was used from the package armspp [26] in R to sample from the full conditional distributions. Several experiments were carried out to analyse the size that the chain should have in order to be stable and, thus, the number of Gibbs sampler iterations used in this work is N = 15,000. Among these, we ignored the first 5000 simulations as burn-in time and, to reduce autocorrelation between MCMC observations, we considered only simulations from every 30 iterations. Therefore, we use a simulated sample with size 323 to obtain the Bayesian estimates. A convergence analysis with the usual diagnostic tests was performed with the package coda [27] in R [25]. Tables 2 and 3 summarize ABC and GDA-MMS results for the several scenarios described above: point estimates,α andλ, obtained as sample means, the corresponding bias, standard deviation and the coefficient of variation. The results indicate that the bias tends to decrease for large sample sizes and small censoring rates. The results also indicate that overall ABC presents estimates with smaller bias but larger variability when compared with GDA-MMS. Table 2. ABC and GDA-MMS results for parameter α (sample mean, and the corresponding bias, standard deviation and coefficient of variation) for synthetic data generated from CensPoINAR(1) models. Additionally, Figures 3 and 4 represent the corresponding posterior densities. The plots show unimodal and approximately symmetric distributions, with a dispersion that clearly decreases with increasing sample size and smaller censoring rate. The posterior densities indicate that the ABC approach produces posteriors that are flatter but with modes very close to the true value, while the corresponding GDA-MMS approach, despite producing posteriors which are more concentrated also evidence higher bias. However, the behaviour of GDA-MMS estimates varies with the parameters and even the sample sizes. These results are representative of the properties of GDA-MMS estimates across a large number of experiments, not reported here for conciseness.

Simulation Study for GDA-MMS
This section presents the results of a simulation study designed to further analyse the sample properties of GDA-MMS, in particular the bias of the resulting Bayesian estimates.
For that purpose, realizations with sample sizes n = 100 and n = 350 of Cen-sPoINAR(1) models with parameters θ = (0.2, 3) and θ = (0.5, 5), are generated, considering two levels of censorship, namely 30% and 5%. To analyse the performance of the procedure, the sample posterior mean, standard deviation and mean squared error were calculated over 50 repetitions.
Boxplots of the sample bias for the 50 repetitions of GDA-MMS methodology are presented in Figures 5 and 6. The bias increases with the rate of censoring and the variability decreases with the sample size. Furthermore, in general, the estimates for α presents positive sample mean biases, indicating that α is overestimated, whilst the estimates for λ shows negative sample biases, indicating underestimation for λ. Both bias and dispersion seem larger for λ.  Tables 4 and 5 present the sample posterior measures forα andλ, respectively. We can see improvement of the estimation methods performance when the sample size increases. Additionally, the higher the censoring percentage, the worse the behavior of the proposed methods.

Final Comments
This work approaches the problem of estimating CCID(1) models for time series of counts under censoring from a Bayesian perspective. Two algorithms are proposed: one is based on ABC methodology and the second a Gibbs Data Augmentation modified with multiple sampling. Experiments with synthetic data allow us to conclude that both approaches lead to estimates that present less bias than those obtained neglecting the censoring. Moreover, the GDA-MMS approach allows us to obtain a complete data set, making it a valuable method in other situations such as missing data.
In this study, we focus on the most popular CCID(1) model, the Poisson INAR(1). However, if the data under study present over-or under-dispersion, other CCID(1) models with appropriate distributions for the innovations, such as Generalised Poisson or Negative Binomial, can easily be entertained. Furthermore, one can consider different models for time series of counts under censoring, based on INGARCH models, ( [28,29] using a switching mechanism) if they are more suitable to the data set to be modeled. These issues are beyond the scope of this paper and are a topic for a future research project.
Author Contributions: The authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.
Funding: The first and third authors were partially supported by The Center for Research and Development in Mathematics and Applications (CIDMA) through the Portuguese Foundation for Science and Technology (FCT-Fundação para a Ciência e a Tecnologia), reference UIDB/04106/2020. The second author is partially financed by National Funds through the Portuguese funding agency, FCT, within project UIDB/50014/2020.

Conflicts of Interest:
The authors declare no conflict of interest.