Abstract
The density ratio model has been widely used in many research fields. To test the homogeneity of the model, the empirical likelihood ratio test (ELRT) has been shown to be valid. In this paper, we propose a parametric test procedure. We transform the hypothesis of homogeneity into one on the equality of the mean parameters of an exponential family of distributions. Then, we propose a modified Wald test and give its asymptotic power. We further apply it to the semicontinuous case, in which there is an excess of zeros in the sample. Simulation studies show that the new test controls the type-I error better than the ELRT while retaining competitive power. Because the test statistic has a simple closed form, its computational cost is small. We also use a real data example to illustrate the effectiveness of our test.
Keywords:
density ratio model; homogeneity test; multiple semicontinuous data; exponential family of distributions
MSC:
43T50
1. Introduction
The density ratio model (DRM) was first introduced by Anderson [1] and later popularized by Qin and Zhang [2], who found the relationship between the two-sample DRM and the logistic regression model in case–control studies. The DRM describes the difference between two independent samples in a semi-parametric way. Assume that and are two samples independently drawn from two cumulative distribution functions and . The DRM postulates that
where is a d-dimensional pre-specified basis function while and are unknown parameters. We can also generalize the DRM to the sample case as follows
where
for . Even though the form of is unspecified, many parametric distribution families are in the DRM, including normal, exponential, and gamma distributions, among others.
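As a concrete check that normal families satisfy the DRM, consider two normal densities with a common variance and take the basis function $q(x)=x$. The symbols $f_0$, $f_1$, $\mu_0$, $\mu_1$, $\sigma$, $\alpha$ and $\beta$ below are notation assumed for this illustration, with the log density ratio written as $\alpha+\beta q(x)$:

```latex
\begin{aligned}
\log\frac{f_1(x)}{f_0(x)}
  &= -\frac{(x-\mu_1)^2}{2\sigma^2} + \frac{(x-\mu_0)^2}{2\sigma^2} \\
  &= \underbrace{\frac{\mu_0^2-\mu_1^2}{2\sigma^2}}_{\alpha}
   + \underbrace{\frac{\mu_1-\mu_0}{\sigma^2}}_{\beta}\,x .
\end{aligned}
```

In this example, homogeneity corresponds to $\beta=0$, i.e., $\mu_1=\mu_0$. If the variances differ, the same calculation leaves an $x^2$ term, so the basis function would need to be $q(x)=(x,x^2)^\top$.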
Due to its flexibility and utility, increasing importance has been attached to the DRM. Zhang [3] proposed a weighted Kolmogorov–Smirnov type statistic to test the validity of the DRM based on case–control data. Qin [4] and Zou et al. [5] applied the DRM to the semi-parametric mixture model and developed test statistics based on the empirical likelihood function. Zhang [6] derived a quantile estimator under a two-sample semi-parametric model and Chen and Liu [7] generalized the estimator to the -sample case. Another problem of interest is to test the homogeneity of the DRM, that is, to test whether . Fokianos et al. [8] outlined a method based on the classical normal-based one-way analysis of variance. Cai et al. [9] studied the properties of the dual empirical likelihood ratio test for general hypotheses on the parameters. Moreover, let be the initial cumulative distribution function (cdf) of a population, and be the cdf of the weighted distribution of , so that their densities are connected to each other as follows,
Then, , in the context of the DRM, corresponds to , and X is a random variable with density . Thus, the DRM falls within the framework of weighted distributions, which have many applications in various fields. The problem of detecting or estimating the weight function is of interest in the framework of weighted distributions; see Patil and Rao [10], Rao [11,12] and Lele and Keim [13].
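The link between weighted distributions and the DRM can be written out in one line. The sketch below uses assumed notation ($f$ for the original density, $f^{w}$ for the weighted density, $w$ for the weight function, and $\alpha$, $\beta$, $q$ for the DRM quantities) and the standard definition of a weighted density:

```latex
f^{w}(x) = \frac{w(x)\,f(x)}{E\{w(X)\}}, \qquad X \sim f ,
\qquad\text{so that}\qquad
w(x) = \exp\{\beta^{\top} q(x)\}
\;\Longrightarrow\;
f^{w}(x) = \exp\{\alpha + \beta^{\top} q(x)\}\, f(x),
\quad
\alpha = -\log E\bigl[\exp\{\beta^{\top} q(X)\}\bigr].
```

Under this reading, the intercept $\alpha$ is simply the normalizing constant of the weight, which is why it is determined by $\beta$ once the baseline density is fixed.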
Recent research on the DRM has mainly relied on the empirical likelihood function. We give a brief introduction to this method below. Given and , the likelihood function of the model (2) has the form
If is restricted to a discretized distribution as
where is constrained by
for . Then, the Lagrange multiplier method described in Qin and Lawless [14] is used to obtain the maximum empirical likelihood estimate of . However, the type-I error of the empirical likelihood ratio test is not well controlled in finite samples. To deal with this problem, Wang et al. [15] suggested using a nonparametric bootstrap procedure, but its computational cost is non-negligible, especially when m is large.
We also notice that there is increasing interest in the case when there are zero values in the samples. This phenomenon occurs in many research fields such as meteorology, health, economics, and life sciences; see Tu and Zhou [16], Muralidharan and Kale [17] and Kassahun-Yimer et al. [18]. For example, in meteorological studies, a group of zero observations may correspond to dry days on which no rainfall is recorded. Another example arises in dietary intake studies, where zero observations may occur for food components that are consumed episodically. In these examples, the samples consist of two parts: the zero observations and the positive observations. Such a distribution is called a semicontinuous distribution, which has the form
where p indicates the probability of drawing a zero observation and is a positive and continuous distribution. We recommend the reviews of Neelon et al. [19,20] for more details. In this paper, we adopt the DRM as the model for , which benefits from the advantages introduced above. Thus, the model becomes
where
for , where I is the indicator function.
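To make the two-part structure concrete, the following minimal sketch draws from a semicontinuous distribution: a zero with a given probability, and otherwise a positive value. The function name, the zero probability 0.3 and the log-normal positive part are illustrative assumptions, not settings taken from the paper.

```python
import random

def sample_semicontinuous(n, p_zero, draw_positive, rng):
    """Draw n values: 0 with probability p_zero, else a positive draw."""
    return [0.0 if rng.random() < p_zero else draw_positive(rng)
            for _ in range(n)]

rng = random.Random(2023)
# Hypothetical setting: 30% zeros, log-normal positive part.
data = sample_semicontinuous(5000, 0.3, lambda r: r.lognormvariate(0.0, 1.0), rng)

n_zero = sum(1 for x in data if x == 0.0)      # size of the zero part
p_hat = n_zero / len(data)                     # estimate of the zero probability
positives = [x for x in data if x > 0.0]       # the continuous part
```

The zero part carries information only through its count, while the positive part is what the DRM is applied to.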
A two-part test has been proposed to test the homogeneity of the model (3), which is a fundamental problem in real applications. For example, differences in the distribution of precipitation across years in a given area may influence the strategy of agricultural irrigation. Furthermore, in colorectal cancer clinical trials, it is important to compare the efficacy and safety between two or more treatment arms; see Lachenbruch [21], Su et al. [22], Smith et al. [23] and Wang and Tu [24]. The two-part test consists of a test for the binomial part and another for the continuous responses. For the two-sample case, Wang et al. [15] suggested that the former is a $\chi^2$ test while the latter can be a Wilcoxon–Mann–Whitney rank-sum test or a two-sample t-test. For the -sample case, the latter can be replaced by a Kruskal–Wallis rank-sum test or an ANOVA F-test; see, for example, Wilcox [25], Hallstrom [26] and Pauly et al. [27]. However, as far as we know, the tests mentioned above may perform badly in heteroskedastic cases.
In this paper, we propose an efficient method based on the exponential family of distributions. First, the problem of testing homogeneity is transformed into testing the equality of the mean parameters. Second, a Wald test statistic is proposed to test the equality. Since is unknown, we modify the Wald test statistic based on the sample from . This modified statistic has a simple closed form, and we show that it converges in distribution to the $\chi^2$ distribution under the null hypothesis. We also give the local asymptotic power. Third, the Bernoulli distribution can be regarded as a DRM, and we obtain the combined modified Wald test for the semicontinuous case. Finally, the simulation studies illustrate that the computational cost of the modified Wald test is much less than that of the bootstrap procedure, while it controls the type-I error better than the empirical likelihood ratio test. Moreover, the power of the modified Wald test is competitive.
The rest of the paper is organized as follows. In Section 2, we propose the method for testing the homogeneity of the two-sample model for both continuous and semicontinuous distributions. In Section 3, we generalize the result to multiple-sample cases. In Section 4, we illustrate the performance of the modified Wald test and compare it with the empirical likelihood ratio test through simulations. In Section 5, we analyze a real data example to show the practicability of our method, and we give the conclusions in Section 6.
2. Two-Sample Case
2.1. Density Ratio Model
In this section, we assume that and are the two independent samples drawn from and , respectively. It is further assumed that for certain d-dimensional ,
where and are the densities of and with respect to a $\sigma$-finite measure , respectively. The hypotheses for testing the homogeneity are
Since is a density function, we have
Hence, there is a function such that
Then,
Construct an exponential family of distributions
where
is the natural parameter space. Under the family , the hypotheses (4) are equivalent to
For family , we give two simple assumptions.
Assumption 1.
is a full-rank exponential family of distributions.
Then, under Assumption 1, the Fisher information matrix of is positive definite and continuous. By the properties of the exponential family,
for an interior point of .
Assumption 2.
The origin is an interior point of .
Although always because is a density, it may not be an interior point. For example, if , and , the density of the standard normal distribution, then .
Hypotheses (6) are expressed in terms of the natural parameter of . We further want to represent them with the mean parameter of , which is defined as
The following lemma is needed.
Lemma 1.
Under Assumptions 1 and 2, if and only if .
The proof is given in Appendix A.
Lemma 1 shows that the hypotheses (6) are equivalent to
First, consider the case where is known. Based on the data , the maximum likelihood estimator of is
The Wald test statistic of hypotheses (7) is then
When , by the central limit theorem, we have
where denotes convergence in distribution. Then, . The Wald test with significance level is given by the critical region
where denotes the -quantile of the .
However, the test (9) is not applicable when is unknown, because and in are unknown. Fortunately, we have a sample from , which can be used to estimate and instead. The estimators are
Then, the test statistic (8) can be modified to
We refer to this statistic as a modified Wald statistic.
Since the two populations are the same under the null hypothesis, let
then, we can use
as an estimate of and obtain , which is
Assumption 3.
Let . When ,
Theorem 1.
Assume that Assumptions 1–3 hold. Then,
- 1. Under in (7),
- 2. Take , . Under this alternative, where , the non-centrality parameter.
The proof is given in Appendix A.
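The closed form of a two-sample Wald-type statistic of this kind can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the paper's exact formula: the basis function q(x) = (x, x^2), the scaling n0*n1/(n0+n1), and the use of the pooled sample covariance of q as the information estimate are all assumptions chosen to be consistent with the description above, and the function names are hypothetical.

```python
import random

def q(x):
    # Hypothetical basis function q(x) = (x, x^2), so d = 2.
    return (x, x * x)

def mean_vec(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def cov_2x2(rows, center):
    n = len(rows)
    return [[sum((r[a] - center[a]) * (r[b] - center[b]) for r in rows) / n
             for b in range(2)] for a in range(2)]

def quad_form_inv(v, m):
    # v^T m^{-1} v for a symmetric 2x2 matrix m, using the explicit inverse.
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return (m[1][1] * v[0] ** 2 - 2.0 * m[0][1] * v[0] * v[1]
            + m[0][0] * v[1] ** 2) / det

def wald_two_sample(x, y):
    qx, qy = [q(v) for v in x], [q(v) for v in y]
    n0, n1 = len(qx), len(qy)
    eta0, eta1 = mean_vec(qx), mean_vec(qy)      # sample means of q
    pooled = qx + qy
    info = cov_2x2(pooled, mean_vec(pooled))     # pooled covariance of q
    diff = [b - a for a, b in zip(eta0, eta1)]
    return n0 * n1 / (n0 + n1) * quad_form_inv(diff, info)

rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(100)]
y = [rng.gauss(1.0, 1.0) for _ in range(100)]    # hypothetical shifted sample
stat_shifted = wald_two_sample(x, y)
stat_same = wald_two_sample(x, x)
critical_value = 5.991   # 0.95 quantile of chi-square with 2 degrees of freedom
```

Only sample means and one covariance matrix are needed, so no iterative optimization is involved; the statistic is compared with a chi-square critical value whose degrees of freedom equal the dimension of q.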
Now, the modified Wald test with level is determined by the critical region
The local asymptotic power of the modified Wald test is given by
where . Since , is maximized at , i.e., . Furthermore, the power increases in .
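The effect of the sample-size proportion on the local power can be illustrated numerically. The sketch below assumes that the non-centrality parameter has the form rho*(1 - rho)*c, where rho is the limiting proportion of one sample and c stands for the quadratic form in the local alternative; both the formula and the value c = 40 are assumptions for illustration. It is restricted to d = 2 so that the chi-square tail has a closed form and only the standard library is needed.

```python
import math

def chi2_sf_even(x, df):
    """P(chi2_df > x) for even df, using the closed-form Poisson sum."""
    half, term, total = x / 2.0, 1.0, 0.0
    for j in range(df // 2):
        total += term
        term *= half / (j + 1)
    return math.exp(-half) * total

def ncx2_sf_even(x, df, delta, terms=200):
    """P(chi2_df(delta) > x) as a Poisson(delta/2) mixture of central tails."""
    lam = delta / 2.0
    weight, total = math.exp(-lam), 0.0
    for i in range(terms):
        total += weight * chi2_sf_even(x, df + 2 * i)
        weight *= lam / (i + 1)
    return total

def local_power(rho, c, alpha=0.05):
    # d = 2 here, so the (1 - alpha) quantile of chi2_2 is -2 * log(alpha).
    crit = -2.0 * math.log(alpha)
    delta = rho * (1.0 - rho) * c        # assumed non-centrality parameter
    return ncx2_sf_even(crit, 2, delta)

power_balanced = local_power(0.5, 40.0)
power_unbalanced = local_power(0.2, 40.0)
size = local_power(0.5, 0.0)             # no departure from the null
```

Because rho*(1 - rho) peaks at rho = 1/2, the balanced design gives the larger non-centrality and hence the larger power, and with c = 0 the power reduces to the nominal level.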
Remark 1.
The distributions we consider in the next subsection are semicontinuous, where the data are one-dimensional and non-negative. However, Theorem 1 holds for in which the supports of the distributions can be either multivariate or negative.
2.2. Semicontinuous Data
In this subsection, we consider the case when both populations are semicontinuous. Specifically, assume that the two independent samples and are drawn from and , respectively, where
The distributions and satisfy (1) and their supports are in . Denote their densities by and . Then, the hypotheses for testing homogeneity are
Let and be the numbers of zero observations and let and be the numbers of non-zero observations in two populations, respectively. Without loss of generality, assume that the first of and of are non-zero. Then, the estimates of and are
A natural test statistic for is
Then, the two-part test statistic is a combination of test statistics (16) and (11), which is
where
and
Corollary 1.
Assume that Assumptions 1–3 hold and . Then,
- 1. Under in (14),
- 2. Take , , . Under this alternative, where is the non-centrality parameter.
The proof is given in Appendix A.
Now, the modified Wald test with level is determined by the critical region
The local asymptotic power of the modified Wald test is given by
where . Interestingly, although the numbers of non-zero observations in two samples are random, the non-central parameter
as in Theorem 1 (2).
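The zero part of the two-part statistic can be sketched as a Wald-type comparison of two proportions. The pooled-variance form below is an assumption (a common choice for comparing zero proportions), the counts are made up for illustration, and adding the zero-part and continuous-part statistics relies on their independence as described above.

```python
def zero_part_statistic(zeros0, n0, zeros1, n1):
    """Wald-type statistic for equal zero proportions, pooled variance."""
    p0, p1 = zeros0 / n0, zeros1 / n1
    pooled = (zeros0 + zeros1) / (n0 + n1)
    return n0 * n1 / (n0 + n1) * (p1 - p0) ** 2 / (pooled * (1.0 - pooled))

def two_part_statistic(zero_stat, continuous_stat):
    # Independent parts are combined by addition.
    return zero_stat + continuous_stat

# Hypothetical counts: 30 zeros out of 100 versus 45 zeros out of 100.
zero_stat = zero_part_statistic(30, 100, 45, 100)
combined = two_part_statistic(zero_stat, 3.2)   # 3.2 is a made-up continuous part
```

Here the zero part equals 50 * 0.15^2 / (0.375 * 0.625) = 4.8, and the combined statistic would be referred to a chi-square distribution with one more degree of freedom than the continuous part alone.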
3. Multiple Sample Case
In this section, we generalize the conclusions of the last section to the case of more than two populations. As before, we first study the case when all the populations follow the DRM. Then, we move on to the semicontinuous case.
3.1. Density Ratio Model
Assume that are samples independently drawn from the distributions . Let be the density of . Then, the density function of satisfies
where . is known. , and are unknown parameters. For convenience, we also define and . As in Section 2.1, there exists a function such that
for . Then, to test the homogeneity of the DRM is equivalent to testing
With Lemma 1, testing the homogeneity is equivalent to testing
Based on the sample , the MLE of the mean vector is
Then, under , by the central limit theorem, we have
We can construct the test statistic as
Then, by the independence of , this statistic converges in distribution to a $\chi^2$ distribution with $md$ degrees of freedom, that is,
When is unknown, and and cannot be computed directly. Analogously, the estimates of them using the samples and are
where
and . Then, the test statistic (22) is estimated by
However, the statistic above may not converge in distribution to since there is in all the terms of (23). So, we construct a modified test statistic as
where .
Assumption 4.
When ,
Theorem 2.
Assume that Assumptions 1, 2, and 4 hold. Then,
- 1. Under in (21),
- 2. Take , , . Under this alternative, where
The proof is given in Appendix A.
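For intuition, one standard multi-sample statistic with a chi-square limit under homogeneity is the between-group quadratic form in the sample means of q. The sketch below is not the paper's modified statistic; it assumes a scalar basis function (d = 1, identity by default) and the pooled variance of q, and the data are made up.

```python
def wald_multi_sample(samples, q=lambda v: v):
    """Between-group quadratic form in the means of q, pooled variance."""
    groups = [[q(v) for v in s] for s in samples]
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    grand = sum(pooled) / n
    var = sum((v - grand) ** 2 for v in pooled) / n
    return sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups) / var

# Three made-up samples whose means are 2, 3 and 4.
stat = wald_multi_sample([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]])
stat_equal = wald_multi_sample([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])
```

With two samples this quadratic form reduces to n0*n1/(n0+n1) times the squared mean difference over the pooled variance, and with m + 1 samples and scalar q it is compared with a chi-square distribution with m degrees of freedom.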
Now, the modified Wald test with level is determined by the critical region
The local asymptotic power of the modified Wald test is given by
where .
Remark 3.
When , . In this case, δ becomes
This means that δ is maximized at .
Remark 3 above can be naturally generalized to the following question: when the total sample size n is fixed, how should be allocated to maximize the local power? To solve this problem, we first let
and
3.2. Semicontinuous Data
Now, we consider the model (3) where the populations are semicontinuous. Assume that is drawn from
Let and be the numbers of zero and non-zero observations . Without loss of generality, assume that the first samples of are non-zero. The densities of are denoted by and satisfy
where and . From the continuous case considered in the last subsection, the hypotheses of testing the homogeneity are equivalent to
The test for homogeneity of the continuous part is considered in the last subsection. The remaining task is to test the homogeneity of binomial distributions. The hypotheses are
As in the proof of Corollary 1, the Bernoulli distributions can be expressed as a DRM, where
and . Then, the MLE of is
The Fisher information is estimated by
where
Then, we can construct the test statistic for the binomial part using Theorem 2.
Finally, we combine the two test statistics together to obtain the test statistic for the semicontinuous case. Let
and
where
Then, the test statistic for the semicontinuous case is
Corollary 2.
Assume that Assumptions 1, 2, and 4 hold and . Then,
- 1. Under in (27),
- 2. Take , , , . Under this alternative, where
The proof is given in Appendix A.
Now, the modified Wald test with level is determined by the critical region
The local asymptotic power of the modified Wald test is given by
where .
4. Simulation Study
In our simulations, we compare three tests. In addition to the proposed modified Wald test, denoted by “MWT”, we consider the dual empirical likelihood ratio test proposed by Cai et al. [9] and the empirical likelihood ratio test using the bootstrap procedure proposed by Wang et al. [15], denoted by “DELRT” and “BELRT”, respectively. We aim to show that the modified Wald test is applicable in different cases. In the first simulation study, we consider the case when the number of populations is large and compare the performance and computational cost of the three tests. It can be seen that MWT controls the type-I error better than DELRT while taking much less time than BELRT. In the second study, we look into three normal distributions with the same scale and study how the tests perform as the location parameter changes, so that the three populations vary from identical to totally different; Figure 1 shows how the three tests perform. In the third simulation study, we verify Remark 3, which describes how the power is affected by the sample sizes under certain alternative hypotheses. In the last study, we consider the semicontinuous case when the continuous part is either a log-normal or a gamma distribution. The same parameter settings are also considered by Wang et al. [15]. Figure 2 and Figure 3 show that our method is competitive.
4.1. Scenario 1
We consider the DRM when , and 11. Let be the standard normal distribution while the rest are normal distributions with scale fixed to 1 and location fixed to . We consider the cases when . We choose the same sample size and 50 for all the populations and generate repetitions for each situation with different m and . Then, we calculate the type-I error of the three statistics when and their power when at the 5% significance level. The results are shown in Table 1 and Table 2, respectively.
Table 1.
Type-I error and power of the three test statistics for different and when the sample size is 30.
Table 2.
Type-I error and power of the three test statistics for different and when the sample size is 50.
It can be seen that the type-I error of DELRT is not as well controlled as that of the other two tests. The type-I error and the power of MWT are similar to those of BELRT. However, the computational cost of MWT is much smaller. For DELRT and the modified Wald test, realizing a repetition of when needs no more than 40 s. However, for the bootstrap procedure when , it takes nearly 4 h using the “for” loop in the R programming language to realize a single repetition of when and 12 h when . When it comes to , it takes nearly a whole day. Parallel computing can certainly accelerate the computation, but the running time remains a big challenge. The modified Wald test statistic we propose seems to be a promising compromise, especially when the number of populations is large. It controls the type-I error better than DELRT while retaining a similar computational cost.
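Type-I error and power estimates of this kind are rejection frequencies over repeated samples, and a closed-form statistic makes each repetition cheap. The sketch below is a scaled-down illustration, not a reproduction of the reported settings: it uses two normal samples, a scalar basis function q(x) = x, a pooled-variance Wald-type statistic, 4000 repetitions, and a hypothetical location shift of 0.8, all of which are assumptions.

```python
import random

def wald_scalar(x, y):
    """Two-sample Wald-type statistic with q(x) = x and pooled variance."""
    n0, n1 = len(x), len(y)
    pooled = x + y
    grand = sum(pooled) / (n0 + n1)
    var = sum((v - grand) ** 2 for v in pooled) / (n0 + n1)
    diff = sum(y) / n1 - sum(x) / n0
    return n0 * n1 / (n0 + n1) * diff * diff / var

def rejection_rate(n0, n1, shift, reps, rng, crit=3.841):
    # crit is the 0.95 quantile of chi-square with 1 degree of freedom.
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n0)]
        y = [rng.gauss(shift, 1.0) for _ in range(n1)]
        hits += wald_scalar(x, y) > crit
    return hits / reps

rng = random.Random(2023)
type1 = rejection_rate(30, 30, 0.0, 4000, rng)   # null: both samples N(0, 1)
power = rejection_rate(30, 30, 0.8, 4000, rng)   # hypothetical shift of 0.8
```

A bootstrap-calibrated test would repeat an inner resampling loop inside every one of these repetitions, which is where the large difference in running time comes from.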
4.2. Scenario 2
In the second simulation study, we show how our test statistic performs in the case of three continuous populations. We choose the three populations as normal distributions with the scale equal to 1. The location parameters of the three are set to be , 0, and . Then, we change from 0.2 to to see how our test statistic performs when the three distributions vary from “similar” to “totally different”. We consider the case with equal sample sizes , and 50, . For each sample size, we consider , and . We generate M = 10,000 repetitions for each case and show the comparison of the three statistics in Table 3 and Figure 1. In this figure, “MWT”, “DELRT”, and “BELRT” denote the modified Wald test, dual empirical likelihood ratio test, and bootstrap empirical likelihood ratio test, respectively.
Table 3.
Type-I error and the power of the three statistics in the case of three populations.
Figure 1.
Type-I error and power (%) of the three statistics in simulation two for different sample sizes.
It can be seen that the modified Wald test controls the type-I error well in this case, even when the sample size is small. The power of the modified Wald test is always smaller than that of DELRT, which reflects its better control of the type-I error. However, the disparity gradually vanishes as the sample size and the differences between the populations increase.
4.3. Scenario 3
In this simulation study, we verify the conclusion in Remark 3. The total sample size n is fixed and and 4 are under consideration. We choose different for both cases and compare the power for different sample sizes. We fix to , , and . The rest are chosen to be the same distribution corresponding to with different , and 0.7 for the normal and log-normal cases and , and 1.6 for the location parameter in the gamma case. For each sample size and , we generate M = 100,000 repetitions and calculate the power. The details are given in Table 4 and Table 5. The symbols I to VIII in Table 5 denote different sample sizes, which are shown in Table 6.
Table 4.
The power of testing at significance level 0.05 for different sample sizes and when .
Table 5.
The power of testing at significance level 0.05 for different sample sizes and when .
Table 6.
The different settings of in Table 5.
It can be seen that the conclusion in Remark 3 essentially holds. It is obvious that has the biggest impact on the power, while the rest of the sample sizes do not seem to have much influence. This can be seen quite clearly from the comparison of the first four sample sizes in the three-sample case, and of cases I and II and cases V and VI in the five-sample case.
4.4. Scenario 4
In this simulation study, we consider the semicontinuous case. We adopt the same parameter settings as in Wang et al. [15]. Assume that the samples are generated from
for , where ’s are all log-normal or gamma distributions. The parameters of are presented in Table 7. Each of LN–LN and GAM–GAM in the first column denotes a mixture model whose continuous part follows a log-normal or gamma distribution. denotes the probability of drawing a zero observation for . LN denotes a log-normal distribution whose associated normal distribution has the mean and variance . GAM denotes a gamma distribution with shape parameter and scale parameter . We consider both equal sample sizes where and unequal sample sizes where . For every parameter setting, we generate M = 10,000 repetitions. We calculate the type-I error of testing homogeneity at the 5% significance level for LN–LN and GAM–GAM, and the power for the rest of the parameter settings. The type-I errors of the three statistics are shown in Table 8 while the powers are shown in Table 9 and Table 10, respectively, for the log-normal and the gamma cases. For a clearer view, we also show the powers of the three statistics in Figure 2 and Figure 3. It can be seen that the results are competitive.
Table 7.
Parameter settings for simulation study 4.
Table 8.
Type I error rates (%) for testing at significance level 0.05 when data are generated from LN–LN and GAM–GAM in Table 7.
Table 9.
Power (%) for testing at significance level 0.05 when data are generated from LN–LN in Table 7.
Table 10.
Power (%) for testing at significance level 0.05 when data are generated from GAM–GAM in Table 7.
Figure 2.
Power (%) for testing at significance level 0.05 when data are generated from LN–LN in Table 7.
Figure 3.
Power (%) for testing at significance level 0.05 when data are generated from GAM–GAM in Table 7.
5. Real Data Sample
In this section, we employ the real data example suggested by Wang et al. [15] which is available from the website of the University of Waterloo weather station data archive (http://weather.uwaterloo.ca/data.html, accessed on 1 June 2023). We focus on the data that records the daily precipitation measurements (in millimeters) in the North Campus of the University of Waterloo, Canada and investigate whether the precipitation distribution has changed over the past few years.
Following Wang et al. [15], to reduce the time dependence among the observations, we take every fourth measurement into our analysis, i.e., we only use the observations on days 1, 5, 9, …, 361, which gives a sample size of 91 for each year. We consider two cases, one from 2003 to 2006 and the other from 2008 to 2012, hoping to obtain some information about changes in the precipitation distribution over these years. Some summaries of the samples are given below.
- From 2003 to 2006, the estimates of the probability of dry days are (0.30, 0.40, 0.42, 0.42) while those of 2008 to 2012 are (0.45, 0.49, 0.43, 0.38, 0.40).
- The sample means of 2003 to 2006 are (2.05, 3.54, 3.40, 3.50) while those of 2008 to 2012 are (3.42, 1.37, 2.29, 4.08, 3.09).
- The sample variances are (17.52, 41.07, 76.10, 59.50) and (95.19, 13.53, 18.35, 73.83, 59.76), respectively.
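The thinning step used to reduce time dependence (keeping days 1, 5, 9, …, 361) can be sketched as follows. The function names and the placeholder series are hypothetical; no real precipitation values are used.

```python
def thinned_days(first=1, last=361, step=4):
    """Day-of-year indices kept after taking every fourth measurement."""
    return list(range(first, last + 1, step))

def thin(daily_values):
    # daily_values[0] is day 1; keep only the selected days.
    return [daily_values[d - 1] for d in thinned_days()]

days = thinned_days()
example_year = [0.0] * 365          # placeholder series, not real data
kept = thin(example_year)
```

The index list has (361 - 1) / 4 + 1 = 91 entries, which matches the stated sample size per year.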
For each null and alternative hypothesis, we fit the data to both the log-normal and the gamma mixture under the assumption of the density ratio model using maximum likelihood estimation. The details are given in Table 11 below. There is a small difference between our parameters and those of Wang et al. [15], which may be caused by an error in summarizing the data of the year 2003. LN and GAM are the parameters under the null hypothesis of the case of 2003 to 2006, while LN and GAM are those of 2008 to 2012. The rest of the parameters are for the alternative hypotheses.
Table 11.
The parameter settings for the null and alternative hypothesis for testing homogeneity.
We apply the modified Wald test to the null hypotheses LN and GAM, respectively. The test statistic is 21.65 for the log-normal mixture and 24.02 for the gamma mixture. Both statistics are larger than the 0.95 quantile of , which is 15.51, so the null hypothesis is rejected at the significance level 0.05. We then move on to the case of 5 years. This time the result is quite different. The test statistic for LN is 11.70, while that for GAM is 9.95; both are smaller than the 0.95 quantile of , which is 18.3074, so the null hypothesis cannot be rejected at the significance level 0.05. The two analyses above indicate that the precipitation distribution of the area changed from 2003 to 2006, but may have remained unchanged from 2008 to 2012.
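The decisions above can be checked directly from the quoted numbers. The sketch below assumes that the critical values 15.51 and 18.3074 belong to chi-square distributions with 8 and 10 degrees of freedom, which is the reading under which their upper-tail probability is 0.05; the tail function is a closed form valid for even degrees of freedom only.

```python
import math

def chi2_sf_even(x, df):
    """P(chi2_df > x) for even df (closed-form Poisson sum)."""
    half, term, total = x / 2.0, 1.0, 0.0
    for j in range(df // 2):
        total += term
        term *= half / (j + 1)
    return math.exp(-half) * total

# Critical values and statistics quoted in the text; df values are assumed.
cases = {
    "2003-2006": {"crit": 15.51, "df": 8, "stats": {"LN": 21.65, "GAM": 24.02}},
    "2008-2012": {"crit": 18.3074, "df": 10, "stats": {"LN": 11.70, "GAM": 9.95}},
}
tail = {k: chi2_sf_even(c["crit"], c["df"]) for k, c in cases.items()}
reject = {k: {name: s > c["crit"] for name, s in c["stats"].items()}
          for k, c in cases.items()}
```

Both tail probabilities come out at about 0.05, and the comparison gives rejection for both mixtures in 2003 to 2006 and no rejection for either mixture in 2008 to 2012.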
6. Conclusions
In this paper, we propose a modified Wald test for homogeneity of the density ratio model. Since the density functions are unknown, recent works mainly focus on the empirical likelihood ratio test, which is a nonparametric method. We transform the problem of testing homogeneity into testing the equality of the mean parameters of the exponential family of distributions. Then, we propose a modified Wald test, which is a parametric method. The simulations show that the type-I error of the modified Wald test is better controlled than that of the empirical likelihood ratio test. Since the modified Wald test statistic converges in distribution to the $\chi^2$ distribution, it can further be applied to semicontinuous data. It should be noted that for the DRM, we test the hypotheses . This can be generalized to testing the hypotheses .
Author Contributions
Conceptualization, X.X.; methodology, X.X.; software, Y.W.; validation, Y.W. and X.X.; formal analysis, Y.W. and X.X.; writing—original draft preparation, Y.W.; writing—review and editing, Y.W.; visualization, Y.W.; supervision, X.X.; project administration, X.X.; funding acquisition, X.X. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by the National Natural Science Foundation of China under grant no. 11471030 and 11471035.
Institutional Review Board Statement
The study did not require ethical approval.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
Appendix A
Proof of Lemma 1.
We only need to prove that for two parameters and , the equation holds only if . Assume that . Let
The derivative of is
Since , . Then, is a strictly increasing function. However, it is easy to compute that when ,
This is a contradiction. Hence, . Then, the lemma is proved by letting and . □
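The monotonicity argument in this proof can be written out as follows. This is a sketch in assumed notation: $\theta_1,\theta_2$ are two interior points of the natural parameter space, $\eta(\theta)$ is the mean parameter, $I(\theta)$ is the Fisher information (the Jacobian of $\eta$ in an exponential family), and $g$ is an auxiliary function on $[0,1]$; the segment between the two points is assumed to stay in the interior, which holds because the natural parameter space is convex.

```latex
\begin{aligned}
g(t) &= (\theta_1-\theta_2)^{\top}\,\eta\bigl(\theta_2 + t(\theta_1-\theta_2)\bigr),
      \qquad t\in[0,1],\\
g'(t) &= (\theta_1-\theta_2)^{\top}\, I\bigl(\theta_2 + t(\theta_1-\theta_2)\bigr)\,
        (\theta_1-\theta_2) \;>\; 0
      \quad\text{whenever } \theta_1\neq\theta_2,\\
g(1)-g(0) &= (\theta_1-\theta_2)^{\top}\bigl\{\eta(\theta_1)-\eta(\theta_2)\bigr\} \;=\; 0
      \quad\text{if } \eta(\theta_1)=\eta(\theta_2).
\end{aligned}
```

A strictly increasing $g$ cannot satisfy $g(1)=g(0)$, so equal mean parameters force $\theta_1=\theta_2$; positive definiteness of $I$ is where the full-rank assumption enters.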
Proof of Theorem 1.
- As , by Assumption 3, . Hence, under , By Assumption 1, . Thus, Then, Again by Assumption 3 and ,
- The Taylor expansion of is Then, By Assumption 1, This means that By Assumption 2, . Then, As in the proof of (1), we have
□
Proof of Corollary 1.
- First, we show that the Bernoulli distributions can be expressed as a DRM. Let Then, Thus, where and . Thus, by Theorem 1, the binomial test converges in distribution to . For the continuous test, by Assumption 3 and , with probability tending to 1. Then, as in the proof of Theorem 1, Then, by the independence of the two test statistics, we have
- Since , then by Theorem 1, for the binomial part, where Notice that for a fixed , Since , . Then, Similarly, Thus, in the same way as in the proof of Theorem 1, we can obtain where Then, by independence,
□
Proof of Theorem 2.
- Let Furthermore, we define When the null hypothesis is true, by the independence of for , we have where , is the -order identity matrix and ⊗ is the Kronecker product. We further define Then, left multiply (A2) by and we obtain By Assumption 4, when , and converge to a and , respectively, that is, Let we have Then, Since and converge to a and , respectively, when , the test statistic also converges in distribution to when . Then, the test statistic (A4) is rewritten as Putting , and into the formula, we obtain
- Under the alternative, by Theorem 1, Thus, We can obtain the expression of in the same way as in the proof of (1), that is
□
Proof of Corollary 2.
- From the construction of (28) and Theorem 2, it is easy to prove that . Then, by the independence of the two test statistics,
- Since , then by Theorem 2, where Since then As with the test statistic for the continuous part, we can prove that Since , . Then, Thus, in the same way as in the proof of Theorem 2, we obtain where Thus, by independence,
□
References
- Anderson, J.A. Multivariate logistic compounds. Biometrika 1979, 66, 17–26.
- Qin, J.; Zhang, B. A goodness-of-fit test for logistic regression models based on case-control data. Biometrika 1997, 84, 609–618.
- Zhang, B. Assessing goodness-of-fit of generalized logit models based on case-control data. J. Multivar. Anal. 2002, 82, 17–38.
- Qin, J. Empirical likelihood ratio based confidence intervals for mixture proportions. Ann. Stat. 1999, 27, 1368–1384.
- Zou, F.; Fine, J.P.; Yandell, B.S. On empirical likelihood for a semiparametric mixture model. Biometrika 2002, 89, 61–75.
- Zhang, B. Quantile estimation under a two-sample semi-parametric model. Bernoulli 2000, 6, 491–511.
- Chen, J.; Liu, Y. Quantile and quantile-function estimations under density ratio model. Ann. Stat. 2013, 41, 1669–1692.
- Fokianos, K.; Kedem, B.; Qin, J.; Short, D.A. A semiparametric approach to the one-way layout. Technometrics 2001, 43, 56–65.
- Cai, S.; Chen, J.; Zidek, J.V. Hypothesis testing in the presence of multiple samples under density ratio models. Statist. Sin. 2017, 27, 761–783.
- Patil, G.P.; Rao, C.R. Weighted Distributions and Size-Biased Sampling with Applications to Wildlife Populations and Human Families. Biometrics 1978, 34, 179–189.
- Rao, C.R. Weighted Distributions Arising Out of Methods of Ascertainment: What Population Does a Sample Represent? In A Celebration of Statistics; Springer: New York, NY, USA, 1985; pp. 543–569.
- Rao, C.R. On Discrete Distributions Arising out of Methods of Ascertainment. Sankhyā Indian J. Stat. Ser. A 1965, 27, 311–324.
- Lele, S.R.; Keim, J.L. Weighted distributions and estimation of resource selection probability functions. Ecology 2006, 87, 3021–3028.
- Qin, J.; Lawless, J. Empirical likelihood and general estimating equations. Ann. Stat. 1994, 22, 300–325.
- Wang, C.; Marriott, P.; Li, P. Testing homogeneity for multiple nonnegative distributions with excess zero observations. Comput. Stat. Data Anal. 2017, 114, 146–157.
- Tu, W.; Zhou, X.H. A Wald test comparing medical costs based on log-normal distributions with zero valued costs. Stat. Med. 1999, 18, 2749–2761.
- Muralidharan, K.; Kale, B.K. Modified Gamma distribution with singularity at zero. Commun. Stat.-Simul. Comput. 2002, 31, 143–158.
- Kassahun-Yimer, W.; Albert, P.S.; Lipsky, L.M.; Nansel, T.R.; Liu, A. A joint model for multivariate hierarchical semicontinuous data with replications. Stat. Methods Med. Res. 2019, 28, 858–870.
- Neelon, B.; O’Malley, A.J.; Smith, V.A. Modeling zero-modified count and semicontinuous data in health services research part 1: Background and overview. Stat. Med. 2016, 35, 5070–5093.
- Neelon, B.; O’Malley, A.J.; Smith, V.A. Modeling zero-modified count and semicontinuous data in health services research part 2: Case studies. Stat. Med. 2016, 35, 5094–5112.
- Lachenbruch, P.A. Analysis of data with excess zeros. Stat. Methods Med. Res. 2002, 11, 297–302.
- Su, L.; Tom, B.D.M.; Farewell, V.T. Bias in 2-part mixed models for longitudinal semicontinuous data. Biostatistics 2009, 10, 374–389.
- Smith, V.A.; Preisser, J.S.; Neelon, B.; Maciejewski, M.L. A marginalized two-part model for semicontinuous data. Stat. Med. 2014, 33, 4891–4903.
- Wang, C.; Tu, D. A bootstrap semiparametric homogeneity test for the distributions of multigroup proportional data, with applications to analysis of quality of life outcomes in clinical trials. Stat. Med. 2020, 39, 1715–1731.
- Wilcox, R.R. ANOVA: A Paradigm for Low Power and Misleading Measures of Effect Size? Rev. Educ. Res. 1995, 65, 51–77.
- Hallstrom, A.P. A modified Wilcoxon test for non-negative distributions with a clump of zeros. Stat. Med. 2009, 29, 391–400.
- Pauly, M.; Brunner, E.; Konietschke, F. Asymptotic permutation tests in general factorial designs. J. R. Stat. Soc. Ser. B. Stat. Methodol. 2015, 77, 461–473.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).