Abstract
The problem of examining how well data fit a supposed distribution is very important, and the fit must be confirmed prior to any data analysis, because many data analysis methods assume a specific distribution of the data. For this purpose, histograms or Q-Q plots are commonly employed. Additionally, a goodness-of-fit (GoF) test statistic (TstS) uses a distance between the empirical distribution function and the theoretical cumulative distribution function (cdf) to evaluate the data distribution. In life-testing or reliability studies, the failure times of some test units may not be recorded. The GoF TstSs for completely observed data can no longer be used for progressive Type II censored data (PrCsD). In this paper, we suggest GoF TstSs and a new plot method for the GoF test of symmetric and asymmetric location-scale distributions (LoScD) based on PrCsD. The power of the suggested TstSs is estimated through Monte Carlo (MC) simulations and compared with that of the TstSs based on order statistics (OrSt). Furthermore, we analyze real data examples (symmetric and asymmetric data).
1. Introduction
The problem of examining how well data fit a supposed distribution is very important, and the fit must be confirmed prior to any data analysis, because many data analysis methods assume a specific distribution of the data. Evaluating the GoF of a statistical model involves assessing how well the model agrees with a given set of observations. Measures quantifying GoF typically summarize the discrepancies between the observed values and those expected under the model being considered. Usually, histograms or Q-Q plots are employed for the assessment of the data distribution. Additionally, a GoF TstS uses a distance between the empirical distribution function and the theoretical cumulative distribution function (cdf) to evaluate the data distribution. In this situation, we reject the hypothesized null distribution if the distance is sufficiently large.
In life-testing or reliability studies, the failure times of some test units may not be recorded. Furthermore, there are situations in which the removal of units prior to failure is pre-planned in order to reduce the cost or time associated with testing. Among the censoring methods, progressive Type II censoring schemes (PrCs) have become quite popular in life-testing or reliability studies. A PrCs arises as follows. Suppose that $n$ units are placed on test. After the 1st failure is observed, $R_1$ surviving units are randomly removed from the test. After the 2nd failure is observed, $R_2$ surviving units are randomly removed. Finally, after the $m$th failure is observed, all the remaining $R_m = n - m - R_1 - \cdots - R_{m-1}$ units are removed. In a PrCs, the integer $m$ and the scheme $(R_1, \ldots, R_m)$ are pre-assigned. The $m$ ordered observed failure times are referred to as PrCsD (Ref. [1]). Recently, several studies on PrCs have been carried out (Refs. [2,3,4,5,6,7]). Ref. [2] discussed the estimation of reliability in a multi-component stress–strength model for a general class of inverted exponentiated distributions under PrCs. Ref. [3] discussed classical and Bayesian estimation of the inverse Weibull distribution using a progressive Type I censoring scheme. Ref. [4] discussed applying transformer insulation using Weibull extended distribution based on PrCs. Ref. [5] discussed inference on maintenance service policy under step-stress partially accelerated life tests using PrCs. Ref. [6] discussed monitoring the Weibull shape parameter (ShPm) under PrCs in the presence of independent competing risks. Ref. [7] discussed the analysis of the gamma distribution under PrCs.
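The censoring mechanism described above can be sketched in a few lines. This is a minimal illustration, not code from the paper: the function name `progressive_type2_sample` and the argument `scheme` (the pre-assigned numbers of units withdrawn after each observed failure) are hypothetical names introduced here, and the lifetimes are assumed to be fully known to the simulator.

```python
import random


def progressive_type2_sample(lifetimes, scheme, rng=None):
    """Return the m ordered failure times observed under a progressive
    Type II censoring scheme.

    lifetimes : latent failure times of all n units on test
    scheme    : numbers of surviving units withdrawn after the 1st, 2nd,
                ..., m-th observed failure (hypothetical argument name)
    """
    rng = rng or random.Random()
    m = len(scheme)
    if len(lifetimes) != m + sum(scheme):
        raise ValueError("need n = m + sum(scheme)")
    alive = sorted(lifetimes)          # units still on test, by latent lifetime
    observed = []
    for withdrawn in scheme:
        # The next observed failure is the smallest lifetime still on test.
        observed.append(alive.pop(0))
        # Withdraw the prescribed number of survivors at random.
        for _ in range(withdrawn):
            alive.pop(rng.randrange(len(alive)))
    return observed


if __name__ == "__main__":
    rng = random.Random(0)
    latent = [rng.expovariate(1.0) for _ in range(10)]
    print(progressive_type2_sample(latent, [2, 0, 1, 0, 2], rng))
```

With a scheme of all zeros and m = n, the function returns the complete ordered sample, which is a convenient sanity check.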
The GoF TstSs for completely observed data can no longer be used for PrCsD. For this reason, GoF tests based on PrCsD have received attention from many authors (Refs. [8,9,10,11,12,13,14,15,16,17]). Ref. [8] proposed a GoF test for the exponential distribution (ExpD) based on PrCsD using spacings. Ref. [9] proposed approximate GoF tests for LoScD based on PrCsD using the empirical distribution function. Ref. [10] proposed GoF tests for LoScD based on PrCsD using spacings. Recently, in Ref. [11], Lee and Lee (2019) proposed GoF tests and a plot method for LoScD based on PrCsD using a generalized LrCv. Ref. [12] proposed a GoF test for the inverse Rayleigh distribution based on PrCsD using entropy. Ref. [13] proposed a GoF test based on the Gini index of spacings for PrCsD. Ref. [14] proposed a GoF test for the ExpD based on general PrCsD using spacings. Ref. [15] proposed a GoF test for the inverse Weibull distribution based on PrCsD using OrSt. Ref. [16] studied the statistical power of GoF tests based on Type I left-censored data. Ref. [17] proposed a GoF test for the Rayleigh distribution based on censored data.
While numerous GoF TstSs under PrCsD have been proposed in the literature for various distributions, to the best of our knowledge, these tests do not encompass both TstSs and graphical methods. This motivates us to develop new GoF TstSs and graphical methods for LoScD under PrCsD. In this paper, therefore, we suggest GoF TstSs and a new graphical method for the GoF test of LoScD based on PrCsD. The rest of this paper is organized as follows. The LrCv is introduced in Section 2. In Section 3, we propose GoF TstSs and a new plot method for the GoF test that use the LrCv. In Section 4, the power of the suggested TstSs is estimated through MC simulations and compared with that of the TstSs proposed by Ref. [10]. In Section 5, we analyze two examples (real data sets). Finally, in Section 6, we present the conclusion.
2. Lorenz Curve
The LrCv provides a means of comparing income inequality, for example between two countries. Ref. [18] gave conditions under which such an LrCv inequality comparison has normative significance. For an increasing and strictly concave utility function, Ref. [18] indicates that, when the LrCvs do not cross, one prefers the distribution with the dominating LrCv. Ref. [19] presented an alternative definition of the LrCv in terms of the inverse cdf, which applies to continuous as well as discrete variables. Let F denote the cdf of the income distribution, where income is assumed to be non-negative. Furthermore, for a given percentile p, let
$$F^{-1}(p) = \inf\{y : F(y) \ge p\}$$
denote the inverse cdf. We suppose throughout that F is a continuous cdf with finite support. Then, the LrCv corresponding to the distribution with cdf F is defined as
$$L(p) = \frac{1}{\mu} \int_0^p F^{-1}(t)\,dt, \quad 0 \le p \le 1,$$
where $\mu$ means the mean of the distribution. Assume that $Y_1, \ldots, Y_n$ are positive random variables (RanV) with OrSt $Y_{(1)} \le \cdots \le Y_{(n)}$. Then, the sample LrCv (Ref. [20]) is defined by
$$L_n(p) = \frac{\sum_{i=1}^{[np]} Y_{(i)}}{\sum_{i=1}^{n} Y_{(i)}}, \quad 0 \le p \le 1,$$
where $[np]$ denotes the integer part of $np$.
Given that the LrCv possesses the property of comparing the degree of wealth distribution between two different distributions, our intention is to utilize it in the development of a goodness-of-fit test statistic.
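As a concrete illustration of the sample LrCv, the sketch below computes its ordinates at the points i/n, assuming the usual form in which the ordinate at i/n is the share of the total held by the i smallest observations. The function name `sample_lorenz_curve` is a hypothetical helper introduced here, not part of the paper's supplementary code.

```python
def sample_lorenz_curve(values):
    """Return the points (i/n, L_n(i/n)) of the sample Lorenz curve,
    where L_n(i/n) is the sum of the i smallest observations divided
    by the sum of all observations."""
    ys = sorted(values)
    if not ys or ys[0] < 0:
        raise ValueError("need a non-empty sample of non-negative values")
    total = sum(ys)
    if total == 0:
        raise ValueError("the sample total must be positive")
    n = len(ys)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, y in enumerate(ys, start=1):
        running += y                      # cumulative sum of order statistics
        points.append((i / n, running / total))
    return points


if __name__ == "__main__":
    for p, share in sample_lorenz_curve([4, 1, 3, 2]):
        print(f"p = {p:.2f}, L_n(p) = {share:.2f}")
```

For a perfectly equal sample the curve lies on the diagonal; the more unequal the sample, the further the curve bends below it.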
3. Test Statistics
Let be the PrCsD with PrCs from the LoScD. Moreover, let the PrCsD have an LoScD with a probability density function (pdf)
where is the known function (Ref. [8]). Furthermore, we assume that location and scale parameters, and , respectively, of are unknown and is the standard form of the . Then, we want to test whether the PrCsD comes from an LoScD with Equation (1), and test the null hypothesis ()
where denotes the distribution function (Ref. [8]).
First, we introduce the GoF TstSs based on the distance between OrSts (Ref. [10]). Let . Then, denotes the deviation between the jth OrSt () and its expected value () based on the PrCsD. Here,
Then, TstSs based on the deviation between OrSts are obtained as
Here, the above TstSs are related to the modified Kolmogorov–Smirnov TstS.
Now, we propose TstSs by using the sample LrCv. Not all LoScDs have non-negative support, whereas the sample LrCv assumes that Y is a non-negative income. To resolve this, the value of the 1st ordered PrCsD is subtracted from every value of the ordered PrCsD, and the resulting differences are then added up. Furthermore, a sample LrCv cannot show the shape of the distribution. To resolve this, the result is added from . Then, the modified sample LrCv is derived as
We used the percentile points (%pts) of Gumbel distribution (GumDist), log-gamma distribution (LGamDist) with ShPm 3, 6, 9 and ∞; normal distribution (NormDist); and t distribution (tDist) with 4, 5, 6 and 7 degrees of freedom (DoF) (Figure 1 and Figure 2). As shown in Figure 1 and Figure 2, the modified sample LrCv of LoScDs has a different shape. Here, the modified sample LrCv using the percentile points of LoScD is obtained as
Then, the ratio modified sample LrCv using the Equations (4) and (5) is obtained as
Here, the has the following result.
Figure 1.
Sample LrCv and modified sample LrCv of GumDist and LGamDist.
Figure 2.
Sample LrCv and modified sample LrCv of NormDist and tDist.
Lemma 1.
and are location-scale (LoSc) invariant statistics.
Proof of Lemma 1.
Let Y be a RanV with a location parameter (LoPm) and scale parameter (ScPm) . Let , then . The distribution of X does not depend on LoPm and ScPm .
of is
of is
Let have a LoPm and ScPm . If , then . The distribution of does not depend on LoPm and ScPm .
of is
of is
□
Theorem 1.
is a LoSc invariant statistic.
Proof of Theorem 1.
Theorem 1 is straightforward according to Lemma 1. □
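The invariance claimed in Lemma 1 and Theorem 1 can be spot-checked numerically. The sketch below is an assumption-laden illustration: the statistic `shifted_lorenz_ordinates` is a hypothetical stand-in built only from the verbal description in this section (subtract the 1st ordered value, then form cumulative shares), and it is not the exact modified sample LrCv of the paper. It shows why such a construction is unaffected by the transformation y -> a*y + b with a > 0: the shift b cancels in the differences and the scale a cancels in the ratio.

```python
def shifted_lorenz_ordinates(ordered):
    """Cumulative shares of (y_j - y_1) for an ordered sample.

    Hypothetical illustration only: subtracting the smallest value removes
    the location, and dividing by the total removes the scale.
    """
    base = ordered[0]
    shifted = [y - base for y in ordered]
    total = sum(shifted)
    if total <= 0:
        raise ValueError("need at least two distinct ordered values")
    ordinates, running = [], 0.0
    for s in shifted:
        running += s
        ordinates.append(running / total)
    return ordinates


if __name__ == "__main__":
    y = [0.3, 1.1, 2.0, 4.5]
    z = [2.5 * v - 7.0 for v in y]        # same sample, new location and scale
    print(shifted_lorenz_ordinates(y))
    print(shifted_lorenz_ordinates(z))    # agrees up to rounding error
```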
If the data come from an LoScD, we expect all the values to be 1. By applying these properties of to Ref. [10]’s TstSs (Equation (3)), we propose the following TstSs.
If the data come from an LoScD, we expect , , , , and TstSs to be 0. Consequently, large values of , , , , and TstSs lead to the rejection of (Equation (2)). Therefore, we reject if , , , , and TstSs exceed the corresponding null critical values (CrVal). Because the exact null distributions of , , , , and TstSs are difficult to derive and the CrVal are not available in closed form, the %pts must be determined through MC simulations.
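The MC determination of the %pts can be sketched as follows. Everything named here is an assumption made for illustration: `placeholder_statistic` is a hypothetical LoSc invariant statistic standing in for the proposed TstSs (whose formulas are not reproduced), the null is taken to be the standard NormDist, and the censoring scheme in the usage line is arbitrary. Because the statistic is LoSc invariant, simulating from the standard form of the null is sufficient.

```python
import math
import random


def censored_sample(rng, scheme, draw):
    """One progressively Type II censored sample (ordered failure times)."""
    n = len(scheme) + sum(scheme)
    alive = sorted(draw(rng) for _ in range(n))
    observed = []
    for withdrawn in scheme:
        observed.append(alive.pop(0))                 # next observed failure
        for _ in range(withdrawn):
            alive.pop(rng.randrange(len(alive)))      # random withdrawal
    return observed


def placeholder_statistic(obs):
    """Hypothetical LoSc invariant statistic, NOT one of the paper's TstSs."""
    mid = obs[len(obs) // 2]
    return (obs[-1] - mid) / (obs[-1] - obs[0])


def null_percentage_points(scheme, alphas=(0.10, 0.05, 0.01),
                           n_rep=10_000, seed=1):
    """Empirical upper percentage points of the statistic under the null."""
    rng = random.Random(seed)
    draw = lambda r: r.gauss(0.0, 1.0)    # standard null: location 0, scale 1
    sims = sorted(
        placeholder_statistic(censored_sample(rng, scheme, draw))
        for _ in range(n_rep)
    )
    points = {}
    for alpha in alphas:
        k = min(n_rep - 1, math.ceil((1 - alpha) * n_rep) - 1)
        points[alpha] = sims[k]           # empirical (1 - alpha) quantile
    return points


if __name__ == "__main__":
    print(null_percentage_points([1, 0, 2, 0, 1]))
```

The null hypothesis would then be rejected at level alpha when the observed statistic exceeds the corresponding simulated percentage point.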
Furthermore, using , we propose a new plot method for the GoF test. If the data come from an LoScD, we expect all the values to be 1. Therefore, using this property of , we propose a new plot method as follows.
If the data come from an LoScD, the is 1 and converges with . Therefore, we assess whether the data follow the LoScD by examining how far the departs from the .
To check the shape of of various LoScDs, we generate %pts of NormDist; tDist with 4, 5, 6 and 7 DoF; GumDist; and LGamDist with ShPm 3, 6, 9 and ∞. Furthermore, we draw the . The results of for various LoScDs appear in Figure 3. From Figure 3, converges to the for NormDist and GumDist. For tDist and LGamDist, however, departs from the x-axis.
Figure 3.
of various LoScDs.
4. Simulation Result
In this section, we assess the power of the proposed TstSs by comparing their simulated power values with those of Ref. [10]’s TstSs. First, we generated 10,000 PrCsD samples for each of various PrCs (different choices of sample size and censoring scheme). Here, the PrCs are those used in Ref. [21].
The proposed TstSs are designed to be free of the LoSc parameters, so that their null distributions do not depend on the specific values of the LoPm and ScPm. Consequently, the standard form of the distribution can be used as the null distribution, and the power of the test remains the same irrespective of the parameter values in the null distribution. The alternative distribution, on the other hand, incorporates an ShPm with diverse values to represent a range of distribution shapes.
We consider the NormDist and GumDist as the null distribution (NuDist). For testing the NormDist, the alternative distribution is taken to be the tDist with 4, 5, 6 and 7 DoF. For testing the GumDist, the alternative distribution is taken to be the LGamDist with ShPm 3, 6, 9 and ∞. All numerical computations are carried out in R 4.3.2 (Supplementary Materials) using two packages, ‘goftest’ and ‘VGAM’.
When the data are simulated from the alternative distribution, the rejection probability estimates the power of the TstSs. A power value approaching 1 indicates higher test effectiveness. The estimated power values are presented in Table 1 and Table 2. The proposed TstSs gained better power as the PrCsD size increased. Table 1 presents the estimated power of the TstSs when the stipulates NormDist and the corresponds to tDist with 4, 5, 6 and 7 DoF. Table 1 shows that the TstS possessed better power than Ref. [10]’s TstSs in a number of PrCs (indicated in bold). The TstS was found to be better than Ref. [10]’s TstSs in all PrCs. When the data were generated from tDist with 4 DoF, the proposed TstSs gained better power. The TstS was always more powerful than the other proposed TstSs. Furthermore, , , , , and TstSs were compared with , , , , and TstSs, respectively. As a result, , , and TstSs were found to be better than , , and TstS in 64, 80, 68 and 68 out of 108 PrCs, respectively.
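The way a power value of this kind is estimated can be sketched as below. This is a simplified illustration under stated assumptions, not the paper's simulation: it uses complete (uncensored) samples, a placeholder LoSc invariant statistic (`excess_kurtosis`, a hypothetical stand-in for the proposed TstSs), a NormDist null, and a tDist alternative generated with the Python standard library. The paper's own computations were done in R.

```python
import math
import random


def excess_kurtosis(sample):
    """Placeholder LoSc invariant statistic (NOT one of the paper's TstSs)."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    return m4 / (m2 * m2) - 3.0


def draw_normal(rng, n):
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def draw_t(rng, n, dof):
    # t variate: Z / sqrt(chi2 / dof), where chi2 ~ Gamma(shape=dof/2, scale=2)
    return [
        rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(dof / 2.0, 2.0) / dof)
        for _ in range(n)
    ]


def rejection_rate(statistic, draw, critical_value, n_rep, rng):
    """Fraction of simulated samples whose statistic exceeds the CrVal."""
    hits = sum(statistic(draw(rng)) > critical_value for _ in range(n_rep))
    return hits / n_rep


def estimate_power(n=30, dof=4, alpha=0.05, n_rep=2000, seed=2024):
    rng = random.Random(seed)
    # Step 1: null critical value from samples of the standard NormDist.
    null_stats = sorted(excess_kurtosis(draw_normal(rng, n)) for _ in range(n_rep))
    crit = null_stats[min(n_rep - 1, math.ceil((1 - alpha) * n_rep) - 1)]
    # Step 2: rejection rate under the null (size) and the alternative (power).
    size = rejection_rate(excess_kurtosis, lambda r: draw_normal(r, n), crit, n_rep, rng)
    power = rejection_rate(excess_kurtosis, lambda r: draw_t(r, n, dof), crit, n_rep, rng)
    return crit, size, power


if __name__ == "__main__":
    crit, size, power = estimate_power()
    print(f"critical value {crit:.3f}, size {size:.3f}, power {power:.3f}")
```

The estimated size should be close to the nominal level, while the estimated power reflects how often the statistic detects the heavier tails of the alternative.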
Table 1.
Estimated power values for tDist with 4, 5, 6, and 7 DoF when is the NormDist.
Table 2.
Estimated power values for LGamDist with ShPm 3, 6, 9, and ∞ when the is the GumDist.
Table 2 presents the estimated power of the TstSs when the stipulates GumDist and the corresponds to LGamDist with ShPm 3, 6, 9 and ∞. Table 2 shows that the TstS possessed better power than Ref. [10]’s TstSs in a number of PrCs (indicated in bold). TstS was found to be better than Ref. [10]’s TstSs in 64 out of 108 PrCs. When the data were generated from LGamDist with ShPm ∞, the proposed TstSs gained better power. The TstS was almost always more powerful than the other proposed TstSs. Moreover, , , , , and TstSs were compared with , , , , and TstSs, respectively. As a result, , , , and TstSs were found to be better than , , , and TstSs in 99, 93, 93, 75 and 68 out of 108 PrCs, respectively.
Overall, the simulation results indicate that the TstSs using the LrCv are generally more powerful than the TstSs using the OrSt.
5. Real Data Analysis
In this Section, we present two examples of real data analysis using Ref. [10]’s TstSs and the proposed TstSs for illustrative purposes.
Example 1 (Breaking strength data)
The Example 1 data were previously studied by Refs. [11,22,23]. In Example 1, Refs. [11,22,23] generated a PrCsD of size from . The PrCsD are given in Table 3.
Table 3.
Example 1 data.
The values of the TstSs and the corresponding p-values are presented in Table 4. Table 4 shows that all p-values are greater than the significance level of 0.05. Therefore, the p-values support the null hypothesis of the NormDist for the data. This result is in agreement with the findings of Refs. [10,11,22,23].
Table 4.
TstSs and the corresponding p-values for example 1.
We can confirm this with . The of Example 1 is presented in Figure 4. Figure 4 shows that the of Example 1 is close to . Thus, concludes that Example 1 follows NormDist.
Figure 4.
of examples.
Example 2 (log transformed insulating fluid test data)
The Example 2 data were previously studied by Refs. [11,23]. In Example 2, Refs. [11,23] generated a PrCsD of size from . The PrCsD are given in Table 5.
Table 5.
Example 2 data.
The values of the TstSs and the corresponding p-values are presented in Table 6. Table 6 shows that all p-values are greater than the significance level of 0.05. Therefore, the p-values support the null hypothesis of the GumDist for the data. This result is in agreement with the findings of Refs. [10,11,23].
Table 6.
TstSs and the corresponding p-values for example 2.
6. Conclusions
The problem of examining how well data fit a supposed distribution is very important, and the fit must be confirmed prior to any data analysis. Usually, a histogram or Q-Q plot is used for the assessment of the data distribution, along with a GoF TstS. In life-testing or reliability studies, the failure times of some test units may not be recorded. The GoF TstSs for completely observed data can no longer be used for PrCsD. In this paper, we suggest GoF TstSs and a new plot method for the GoF test of LoScD based on PrCsD.
The proposed TstSs are designed to be free of the LoSc parameters, so that their null distributions do not depend on the specific values of the LoPm and ScPm. Consequently, the standard form of the distribution can be used as the null distribution, and the power of the test remains the same irrespective of the parameter values in the null distribution. The power of the suggested TstSs is estimated through MC simulations and compared with that of the TstSs using the OrSts. As the parent distributions, we consider NormDist and GumDist. For testing the NormDist and GumDist, the alternative distribution is taken to be tDist with 4, 5, 6 and 7 DoF and LGamDist with ShPm 3, 6, 9 and ∞, respectively.
For testing the NormDist, the TstS possessed better power than Ref. [10]’s TstSs in a number of PrCs. TstS was found to be better than Ref. [10]’s TstSs in all PrCs. For testing the GumDist, the TstS possessed better power than Ref. [10]’s TstSs in a number of PrCs. TstS was found to be better than Ref. [10]’s TstSs in 64 out of 108 PrCs. Overall, these results indicate that the TstSs using the LrCv are generally more powerful than the TstSs using the OrSts. Moreover, the proposed method not only provides test statistics but also incorporates graphical representations, allowing for the visual interpretation of results.
Although we have supposed that the LoScDs are GumDist and NormDist, any other LoScD can also be considered.
Supplementary Materials
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/sym16020202/s1.
Funding
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (NRF-2022R1I1A3068582).
Data Availability Statement
The author confirms that the data supporting the findings of this study are available within the article.
Acknowledgments
The author would like to express deep thanks to the Editor and the referees for their helpful comments and suggestions, which led to a considerable improvement in the presentation of this paper.
Conflicts of Interest
The author declares no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| GoF | goodness-of-fit | LoScD | location-scale distribution |
| LrCv | Lorenz curve | PrCsD | progressive Type II censored data |
| PrCs | progressive Type II censoring scheme | OrSt | order statistic |
| %pts | percentile points | LoPm | location parameter |
| ScPm | scale parameter | ShPm | shape parameter |
| LoSc | location and scale | CrVal | critical values |
| RanV | random variable | NormDist | normal distribution |
| tDist | t distribution | DoF | degrees of freedom |
| GumDist | Gumbel distribution | LGamDist | log-gamma distribution |
| MC | Monte Carlo | TstS | test statistic |
References
1. Balakrishnan, N.; Aggarwala, R. Progressive Censoring: Theory, Methods and Applications; Birkhäuser: Boston, MA, USA, 2000.
2. Mahto, A.K.; Tripathi, Y.M.; Kizilaslan, F. Estimation of reliability in a multicomponent stress-strength model for a general class of inverted exponentiated distributions under progressive censoring. J. Stat. Theory Pract. 2020, 14, 1–35.
3. Algarni, A.; Elgarhy, M.; Almarashi, A.M.; Fayomi, A.; El-Saeed, A.R. Classical and Bayesian estimation of the inverse Weibull distribution: Using progressive type-I censoring scheme. Adv. Civ. Eng. 2021, 2021, 1–15.
4. Almongy, H.M.; Alshenawy, F.Y.; Almetwally, E.M.; Abdo, D.A. Applying transformer insulation using Weibull extended distribution based on progressive censoring scheme. Axioms 2021, 10, 100.
5. Alam, I.; Ahmed, A. Inference on maintenance service policy under step-stress partially accelerated life tests using progressive censoring. J. Stat. Comput. Simul. 2022, 92, 813–829.
6. Moharib Alsarray, R.M.; Kazempoor, J.; Ahmadi Nadi, A. Monitoring the Weibull shape parameter under progressive censoring in presence of independent competing risks. J. Appl. Stat. 2023, 50, 945–962.
7. Dey, S.; Elshahhat, A.; Nassar, M. Analysis of progressive type-II censored gamma distribution. Comput. Stat. 2023, 38, 481–508.
8. Wang, B. Goodness-of-fit test for the exponential distribution based on progressively Type II censored sample. J. Stat. Comput. Simul. 2008, 78, 125–132.
9. Pakyari, R.; Balakrishnan, N. A general purpose approximate goodness-of-fit test for progressively type-II censored data. IEEE Trans. Reliab. 2012, 61, 238–244.
10. Pakyari, R.; Balakrishnan, N. Goodness-of-fit tests for progressively Type II censored data from location-scale distribution. J. Stat. Comput. Simul. 2013, 83, 167–178.
11. Lee, W.; Lee, K. Goodness-of-fit tests for progressively Type II censored data from a location-scale distributions. Commun. Stat. Appl. Methods 2019, 26, 191–203.
12. Ma, Y.; Gui, W. Entropy-based and non-entropy-based goodness of fit test for the inverse Rayleigh distribution with progressively type II censored data. Probab. Eng. Informational Sci. 2021, 35, 631–649.
13. Pakyari, R. Goodness-of-fit testing based on Gini Index of spacings for progressively Type-II censored data. Commun. Stat. Simul. Comput. 2023, 52, 3223–3232.
14. Qin, X.; Yu, J.; Gui, W. Goodness-of-fit test for exponentiality based on spacings for general progressive Type-II censored data. J. Appl. Stat. 2022, 49, 599–620.
15. Cho, Y.; Lee, K. Goodness-of-fit test for progressive censored data from an inverse Weibull distribution. J. Korean Data Inf. Sci. Soc. 2023, 34, 505–513.
16. Fusek, M. Statistical power of goodness-of-fit tests for type I left-censored data. Austrian J. Stat. 2023, 52, 51–61.
17. Vaisakh, K.M.; Xavier, T.; Sreedevi, E.P. Goodness of fit test for Rayleigh distribution with censored observations. J. Korean Stat. Soc. 2023, 52, 794–815.
18. Atkinson, A.B. On the measurement of inequality. J. Econ. Theory 1970, 2, 244–263.
19. Gastwirth, J.L. A general definition of the Lorenz curve. Econometrica 1971, 39, 1037–1039.
20. Gail, M.H.; Gastwirth, J.L. A scale-free goodness-of-fit test for the exponential distribution based on the Lorenz curve. J. Am. Stat. Assoc. 1978, 73, 787–793.
21. Balakrishnan, N.; Ng, H.K.T.; Kannan, N. Goodness-of-fit tests based on spacings for progressively Type II censored data from a general location-scale distribution. IEEE Trans. Reliab. 2004, 53, 349–356.
22. King, J.R. Probability Charts for Decision Making; Industrial Press: New York, NY, USA, 1971.
23. Nelson, W. Applied Life Data Analysis; John Wiley & Sons: New York, NY, USA, 1982.
© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).