Analysis of Longitudinal Binomial Data with Positive Association between the Number of Successes and the Number of Failures: An Application to Stock Instability Study

Numerous methods have been developed for longitudinal binomial data in the literature. These traditional methods are reasonable for longitudinal binomial data with a negative association between the number of successes and the number of failures over time; however, a positive association may occur between the number of successes and the number of failures over time in some behaviour, economic, disease aggregation and toxicological studies as the numbers of trials are often random. In this paper, we propose a joint Poisson mixed modelling approach to longitudinal binomial data with a positive association between longitudinal counts of successes and longitudinal counts of failures. This approach can accommodate both a random and zero number of trials. It can also accommodate overdispersion and zero inflation in the number of successes and the number of failures. An optimal estimation method for our model has been developed using the orthodox best linear unbiased predictors. Our approach not only provides robust inference against misspecified random effects distributions, but also consolidates the subject-specific and population-averaged inferences. The usefulness of our approach is illustrated with an analysis of quarterly bivariate count data of stock daily limit-ups and limit-downs.


Introduction
There is continuing interest in developing mixed models for longitudinal binomial data [1,2]; however, these methods in the literature generally assume a fixed number of trials, and thus imply a negative association between the number of successes and the number of failures. As the number of trials is fixed, an increase in the number of successes implies a decrease in the number of failures, and vice versa. In practice, however, the number of successes and the number of failures are often positively associated when the number of trials is random. That is, the number of successes and the number of failures can increase or decrease simultaneously. This can be illustrated further with a hypothetical example in Table 1. Clearly the sample correlation is 1 although both probabilities of success and failure are 0.5. This is because the two outcomes can increase together if the totals are not fixed. On the other hand, when the totals are fixed, the increase of one outcome is at the loss of the other; therefore, they are negatively associated. Thus, the inference methods for the case of fixed cluster sizes are no longer valid for the case of varying cluster sizes since the key assumption of negative association has been violated [3].  Time 1  Time 2  Time 3  Time 4  Time 5  Time 6  Time 7  Time 8   Success  10  20  30  40  50  60  70  80  Failure  10  20  30  40  50  60  70  80   Total  20  40  60  80  100  120 140 160 In the analysis of clustered binary data, the importance of accounting for randomness in the cluster sizes has long been recognized in developmental toxicity studies, disease aggregation and behaviour studies [4,5]; therefore, varying cluster sizes are likely to occur if such data are collected longitudinally in these areas. Hence, mixed models that can accommodate a random number of trials are also needed in the analysis of longitudinal binomial analysis of positively associated numbers of successes and numbers of failures. In addition, traditional approaches to mixed models for longitudinal binomial data usually rely on the specification of particular random effects distributions; therefore, concerns over the validity of the assumed random effects distributions and the robustness of such inferences were raised [6].
Our work is motivated by the daily price limit policy in Chinese stock market. Quarterly bivariate counts of stock daily limit-ups and limit-downs were collected over 49 seasons from the second quarter of 2007 to the second quarter of 2019 for 60 randomly selected stocks. The quarterly counts of stock daily limit-ups and the quarterly counts of stock limit-downs were positively correlated over time within every stock; the Pearson correlation coefficient ranged from 0.11 to 0.92 with an average of 0.54 for these 60 stocks. The mixed models for longitudinal binomial data in the literature imply a negative association between the number of successes and the number of failures and are inappropriate for the analysis of quarterly bivariate counts of stock daily limit-ups and limit-downs for these stocks. These binomial mixed models in the literature also imply that the number of trials is always positive; however, more than 58% of the corresponding number of trials for our quarterly bivariate counts of stock daily limit-ups and limit-downs were exactly zero. The research on the price limit of stock mainly focuses on its impact on stock volatility [7][8][9]. It is generally believed that the price limit policy strengthened the herding effect of the market and made stock prices a self-exciting process. This is indeed one of the characteristics of Chinese stock market, that is, stocks are prone to frequent limit-ups and limit-downs. For market managers, in the process of extreme market fluctuations, the limit-ups and limitdowns bring a lack of liquidity, and the market is prone to systemic risks. For investors, the extreme volatility of stocks will directly affect their investment decisions and investment returns; therefore, the instability of the stocks is of great interest in the analysis of extreme price fluctuations but has not been studied so far.
In this paper, we propose a three-level joint Poisson mixed model for both longitudinal counts of successes and longitudinal counts of failures in the longitudinal binomial data with varying numbers of trials over time. Without loss of generality, we describe the model here in terms of quarterly data of stock daily limit-ups and limit-downs to facilitate understanding. First, we introduce stock-specific random effects shared by both counts of stock daily limit-ups and limit-downs. The higher the stock-specific distribution-free random effects, the higher both quarterly counts of stock daily limit-ups and limit-downs; therefore, the stock-specific random effect characterizes the instability of the corresponding stock. Second, conditioning on stock-specific random effects, we introduce two sequences of autocorrelated distribution-free random effects; one for quarterly counts of stock daily limitups, whereas the other is for quarterly counts of stock daily limit-downs. Finally, given both stock-specific and the corresponding autocorrelated random effects, we assume that both quarterly counts of stock daily limit-ups and limit-downs follow Poisson distributions with time-dependent covariates. This joint Poisson mixed model can accommodate randomness and zero in the total counts of quarterly stock daily limit-ups and limit-downs at each time point and imply a positive cross association between the quarterly counts of stock daily limit-ups and the quarterly counts of stock limit-downs. Following Ma and Jørgensen [10] and Ma et al. [5], we develop a model estimation based on the orthodox best linear unbiased predictors (BLUP) of random effects given the data. As our approach does not require the specification of any parametric distribution for random effects, our inference is robust against misspecified random effects distributions. To the best of our knowledge, this is the first time a mixed model is developed for longitudinal binomial data where the number of successes and the number of failures are positively associated over time.
The rest of the paper is organized as follows. After introducing quarterly count data of stock daily limit-ups and limit-downs in Section 2, we propose a joint model for bivariate longitudinal counts and discuss its implied longitudinal binomial model in Section 3. In Section 4, we discuss the orthodox best linear unbiased predictors of random effects and model estimation. We analyze the quarterly stock price limits data in Section 5 and conclude in Section 6.

Quarterly Data of Stock Daily Limit-Ups and Limit-Downs
A unique daily price limit policy has been in place in Chinese stock market since 13 December 1996. The purpose of the price limit policy is to reduce the volatility of prices by setting limits on how much each stock can rise or fall on a daily basis. The price can move within 10% of the previous day's closing price, and a quotation that is outside the range will be invalid. These daily limits on rise or fall are called stock daily limit-ups and limit-downs.
To characterize the instability of the stocks, quarterly bivariate counts of stock daily limit-ups and limit-downs were collected over 49 seasons from the second quarter of 2007 to the second quarter of 2019 for 60 selected stocks. These 60 stocks were randomly selected from the CSI Small cap 500 index (CSI 500) index in order to make the selected samples generally representative of the market and make the conclusion of this research widely applicable. The data are available from http://www.sse.com.cn/ (accessed on 5 January 2020) and http://www.szse.cn/ (accessed on 5 January 2020). The parallel boxplots of this pair of quarterly counts of stock daily limit-ups and limit-downs are displayed in Figure 1 below. The quarterly counts of stock daily limit-ups range between 0 and 21, whereas the quarterly counts of stock daily limit-downs range between 0 and 18. All these counts are far below their ceiling counts of around 60 (trading days); thus, the ceiling counts are not necessarily a concern for our use of conditional Poisson for the quarterly counts [11].
To study the relationship between quarterly bivariate counts of stock daily limit-ups and limit-downs and basic characteristics of stocks that are prone to rise and fall limits, we also collected information on the following three variables quarterly: price-to-earnings ratio (PE), price-to-book ratio (PB) and price-to-sales ratio (PS). These three variables are common and important growth indicators reflecting stock fundamentals [12][13][14]. Our analysis results are expected to help managers and investors in risk management and investment decision-making.

The Model
Let Y i1t be the quarterly count of daily limit-ups and Y i2t the quarterly count of daily limit-downs, where i indexes one of these 60 stocks, i = 1, . . . , m = 60, and t = 1, . . . , T = 49 indexes the seasons under study with t = 1 corresponding to the second quarter of 2007 and t = 49 the second quarter of 2019. Let N it be the total number of limit-ups or limit-downs for the ith stock in the tth season. We model the quarterly counts of daily limit-ups and limit-downs jointly through a three-level Poisson mixed model as follows.

Assumption 1.
At the top level, we introduce a stock-specific random effect U i for each stock, i = 1, . . . , m. We assume that the U i 's are positive, independently and identically distributed with mean 1 and variance σ 2 .

Assumption 2.
At the second level, we introduce two sequences of random effects for quarterly counts of daily limit-ups and limit-downs of each stock, V i1t for the count of daily limit-ups Y i1t and V i2t for the count of daily limit-downs Y i2t . Denote U = (U 1 , U 2 , . . . , U T ) . Conditioning on U, we assume that these stock × season random effects are positive and identically distributed with This formulation of correlations of the random effects is general as it encompasses various correlation structures including unstructured, m-dependence, Toeplitz, exchangeable, etc. In this paper, we focus on the first-order autoregression (AR(1)), in Assumption 3. At the response level, we assume the quarterly counts of limit-ups and limitdowns are conditionally Poisson distributed, given the stock-specific random effects and the stock × season random effects.
. Specifically, we assume that counts Y i1t 's and Y i2t 's are conditionally independent and Poisson distributed with . . , z itp ) are known vectors of covariates, α and β are unknown regression coefficients. Here, we employ the same strategy as in Ma et al. [5] to incorporate covariates in the model. In general, z and x can be different, but they are the same in the analysis of this paper: Remarks.
(1) The proposed joint Poisson mixed model can accommodate both a random and zero number of trials. From Equation (1), given all the random effects, the total number of counts N it is also conditionally independent and Poisson distributed, Clearly, N it can take a value of zero with positive probability and thus, our model can handle the case that both the quarterly counts of limit-ups and limit-downs are zeros. This is advantageous to traditional logistic mixed models for the number of successes in which the portion of data with zero number of trials has to be excluded from the analysis.
(2) As each of the pair of Poisson mixed models is a generalization of a negative binomial model, our model can accommodate overdispersion and zero inflation.
(3) In Assumptions 1 and 2 above, we only assume the mean and variance structures of the random effects, without specifying any parametric form for their distributions. Furthermore, our estimation method to be discussed in the next section requires only these first two moments of the random effects. Thus, our model is robust against misspecified random effects distributions.
(4) Our model consolidates subject-specific and population-averaged inferences for the number of successes and for the number of failures. Under the model setup, the marginal means of the counts are Comparing Equations (1) and (3), it is clear that the regression parameters α and β can be interpreted marginally the same way as conditionally.
(5) The number of successes Y i1t is conditionally binomial, given the number of trials and the random effects: where Thus, the binomial probability p it is linear in the regression parameters β under the logit link as follows: Note that α does not appear in Equation (5); therefore, it is auxiliary in this induced binomial model.

Covariance Structure
We now give the moment structures of the random effects and responses which will be used in the model estimation; these can be obtained after some algebraic calculations by the method of conditioning on random effects. For ease of programming, we present some results in matrix forms.
In addition to the vectors introduced so far, let The means of the random effects and the responses are respectively, where 1 is a vector of ones. The variances of V i is where R j is the correlation matrix of V ij , j = 1, 2. The variance of Y i is where diag(µ i ) is the diagonal matrix of µ i . The covariances between the random effects and the responses are respectively.

Model Estimation
Similar to Ma et al. [10], we adopt an iterative EM-like algorithm for the model estimation. While updating a component in an iteration, which can be either a vector of random effects, regression parameters or dispersion parameters, we keep other unknowns at their current values. Thus, in the subsections below, we treat random effects and/or parameters as known except the ones under discussion.

Prediction of Random Effects
Let the inverse of Var(Y i ) be denoted by Var −1 (Y i ), i = 1, 2, . . . , m. The values of the random effects are updated by the orthodox BLUPs of the random effects as follows: where the terms in the equations are given in Equations (6)- (9). As pointed out in [10], the orthodox BLUPs minimize the mean squared distance between the random effects and their predictors within the class of linear functions of the responses. The estimating equations for the regression and random effects parameters can then be constructed based on these predictors.

Estimation of Regression Parameters
As in Ma et al. [5], we may rewrite Equation (1) into a single Poisson mixed model with regression parameters γ = (α , β ) as follows: where x i1t = (z it , x it ) and x i2t = (z it , 0 ) . Thus, we may adapt the orthodox BLUP approach in [10] to our model. We first differentiate the partially observed "joint" log-likelihood for the data and the random effects with respect to the regression parameters γ to obtain the partially observed "joint" score function. We then have an unbiased estimating equation for γ below, by replacing the random effects with their corresponding orthodox BLUP predictors: Following Ma et al. [10], Equation (13) can be solved iteratively using the Newton scoring algorithm [15] with γ being updated as follows: with the explicit expression of the sensitivity matrix given by where X is the design matrix formed by stacking x ijt 's, i.e., X i = (x i11 , . . ., x 11T , x i21 , . . ., x i2T ) . The optimality results in Ma et al. [10] still hold, i.e., under mild regularity conditions, the solution to Equation (13) is consistent and asymptotically normal with asymptotic mean γ and asymptotic variance given by the inverse of the negative sensitivity matrix S(γ).

Estimation of Random Effects Parameters
In this subsection, we present a moment approach to estimate the unknown random effects parameters σ 2 , τ 2 j and ρ j , j = 1, 2. Following Ma and Jørgensen [10], the dispersion parameter σ 2 for the stock-specific random effects can be estimated in terms of their corresponding orthodox BLUPsÛ i 's with a bias correction. After some algebraic calculation, the iterative equation for updating σ 2 can be expressed asσ where c i is a bias-correction term defined as withσ 2 r−1 as the estimate from the previous iteration. Similarly, the iterative equations for estimating τ 2 1 and τ 2 2 are given aŝ where the bias-correction term d ijt is expressed as For unstructured correlation structures, ρ j,(t,t ) can be estimated using an adjusted Pearson estimator aŝ where b ij,(t,t ) is the correction term which can be simplified as For various patterned correlation, we can obtain the patterned correlation matrix from Equation (17). To estimate ρ j under AR(1) structure, it would be sufficient to estimate the lag 1 (ρ 1 j = ρ j ) correlation only, which can be estimated aŝ

Computational Procedures
To start the estimating algorithm, we need some sensible initial values for the unknown parameters. We obtained the initial valuesγ 0 of the regression parameters by fitting a Poisson regression to Equation (12) ignoring the random effects. The initial valueρ j,0 of ρ j was taken to be the lag 1 sample correlation of the quarterly counts of daily limit-ups or limit-downs. Similarly, by treating appropriate averages of the counts as rough estimates of the random effects, we obtained the initial values of the dispersion parameters as the sample moments of the random effects. Specifically, the initial value of σ 2 was taken aŝ and the initial value of τ 2 j waŝ The algorithm was iterated as follows.
Step 2: At the rth iteration, (1) Update regression parameters γ by Equation (14), (2) Predict the random effects by their orthodox BLUPs given in Equations (10) and (11) Step 3: Repeat Step 2 until the sum of absolute changes in the parameters is below a prespecified threshold, for example, 10 −4 or 10 −7 .

Analysis of Quarterly Counts of Stock Daily Limit Ups and Limit Downs
We first present some descriptive statistics of the variables in Tables 2 and 3. From Table 2, the variables PB and PS are highly dispersed; for computational stability consideration, we divided all these three predictors PE, PB and PS by 100 in our analysis. In Table 3, we observe that the counts of limit-ups and limit-downs are positively correlated (r = 0.56), and that the counts are also positively correlated with PE and PB although the correlations are weaker. The trace plots of all these five variables, limit-ups, limit-downs, PE, PB and PS, are presented in Figure 2 for four stocks, one in each panel. There seems to be some evidence that PE and PB follow the same temporal pattern. Furthermore, PE looks different than PB and PS.
We fitted the proposed model to the stock instability data. The parameter estimation results are presented in Table 4.
The price-to-book ratio variable PB is defined as the stock price divided by the net asset value per share. PB represents the intrinsic value of the stock. The higher the PB is, the lower the intrinsic value of the stock is. The estimated effect of PB on the quarterly count of daily limit-ups was β * 1 = 0.0345 and significant, with a p-value 0.0200 < 0.05, whereas the estimated effect of PB on the quarterly count of daily limit-downs was α 1 = 0.0308 but insignificant, with a p-value 0.2070 > 0.05. Furthermore, the estimated effect of PB on the corresponding binomial proportion was β 1 = 0.0038, but highly insignificant, with a p-value of 0.8938. In other words, only the quarterly count of daily limit-ups was affected by PB significantly, whereas neither the quarterly count of daily limit-downs nor the corresponding binomial proportion was affected by PB significantly.
The price-to-earnings ratio variable PE is defined as the price of a stock divided by the earnings per share. PE represents the valuation of the stock: the higher the PE, the higher the valuation of the stock. The estimated effects of PE on both quarterly counts of daily limit-ups and limit-downs were positive and highly significant. More specifically, the corresponding regression parameters for quarterly numbers of limit-ups and limitdowns were estimated as β * 2 = 9.2711 and α 2 = 5.9616 with p-values of 0.0000 and 0.0000, respectively. Furthermore, the estimated effect of PE on binomial proportion was also positive and highly significant with β 2 = 3.3095 and a p-value of 0.0080. That is, with PB and PS being held constant, as PE increased, both quarterly counts of daily limit-ups and limit-downs tended to increase significantly, and so did the corresponding binomial proportion. In other words, as PE increased, the quarterly count of daily limit-ups tended to increase faster than the quarterly count of limit-downs. The price-to-sales ratio variable PS was defined as the price of a stock divided by the sales per share. None of the estimated effects of PS on the quarterly count of daily limit-ups, the quarterly count of daily limit-downs and the corresponding binomial proportion were significant; these insignificance results of PS were also in agreement with the close-to-zero sample correlations with the counts of limit-ups (r = −0.01) and limit-downs (r = −0.01) in Table 3.
The stock-specific random effects helped characterize the positive association between quarterly counts of daily limit-ups and limit-downs. The higher the stock-specific random effects, the higher the frequencies of both quarterly counts of daily limit-ups and limitdowns; therefore, the stock-specific random effects reflected stock instabilities. Thus, we term stock-specific random effects as stock-specific instabilities hereafter. We present the parallel box plots of the predicted stock-specific random effects by industry in Figure 3 to assess stock-specific instabilities. First, the mining industry tended to have much higher stock instabilities than other industries. Second, the production and supply of electricity, construction, manufacturing, service and pharmaceutical industries tended to have relatively lower stock instabilities. Third, the IT industry had a wide range of stock instabilities from very low to very high, whereas the financial industry tended to have higher than average stock instabilities with much narrower range. Finally, the logistics industry tended to have very low stock instabilities with an exception.

Discussion
For longitudinal binomial data, we proposed a joint Poisson mixed model for the number of successes and the number of failures when their association was positive over time. Its usefulness and advantages were demonstrated through the analysis of quarterly data of stock daily limit-ups and limit-downs. Compared with traditional logistic mixed models, the proposed joint model could still assess covariate effects on the binomial proportions through the induced binomial model (see Equation (5)), while the influence of a random number of trials was incorporated. In addition, the ability to allow for zero number of trials was also an advantage as excluding this portion of data might lead to biased inferences on the binomial proportion. Furthermore, the predicted stock-specific random effects enabled the assessment of stock instabilities by industry.
In accordance with the proposed estimation method, we did not specify a parametric form for the random effects distributions. While this formulation enjoys the property of robustness against misspecified random effects, it is in principle straightforward to assume some parametric distributions for the random effects and estimate the parameters using alternative methods. For example, the hierarchical nature of the model makes it easy to implement in Bayesian computing package JAGS or OpenBUGS, which only requires users to specify the model structure.
Finally, the proposed joint Poisson mixed model can be easily extended to longitudinal multinomial data by directly modelling counts of each category with a Poisson mixed model.  Data Availability Statement: The data are available from http://www.sse.com.cn/ (accessed on 5 January 2020) and http://www.szse.cn/ (accessed on 5 January 2020).