Abstract
The goal of the present work is to estimate the nonlinear correlation between two random variables when the sample is drawn from a Farlie–Gumbel–Morgenstern (FGM)-type bivariate gamma distribution. The dependence parameter is estimated by maximum likelihood (ML), and ML estimators are given for simple random sampling (SRS) and ranked set sampling (RSS). Additionally, we consider generalized modified RSS (GMRSS), which requires only a single rank to obtain a sample. Using GMRSS, we aim to observe the effect of the rth order statistic and its concomitant on the ML estimator. A Monte Carlo simulation shows that the ML estimator based on RSS is as efficient as the ML estimator based on SRS, whereas the ML estimator based on GMRSS (with minimum or maximum ranked pairs) is the best option among the studied ML estimators. These findings are made more meaningful by the fact that GMRSS is easier to obtain than SRS and RSS.
Keywords:
ranked set sampling; algorithm for sampling data; order statistics; dependence parameter; FGM family; maximum likelihood estimation
MSC:
62D99; 62F10; 62G30; 62H05; 65C05
1. Introduction
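The Farlie–Gumbel–Morgenstern (FGM) family is attractive because it has a simple form and allows the dependence between two random variables to be modeled. The family was introduced by Morgenstern [1], Gumbel [2] and Farlie [3]. Several researchers have studied the FGM distribution family; see [4,5,6,7,8,9,10]. D’Este [11] and Gupta and Wong [12] discussed the bivariate gamma distribution, which is derived from the FGM family. The joint cumulative distribution function (CDF) of the random variables X and Y on $(0,\infty)\times(0,\infty)$ is given by
$$H(x,y) = F(x)\,G(y)\,\bigl[1+\theta\,(1-F(x))(1-G(y))\bigr], \tag{1}$$
with the corresponding joint probability density function (PDF)
$$h(x,y) = f(x)\,g(y)\,\bigl[1+\theta\,(1-2F(x))(1-2G(y))\bigr], \tag{2}$$
where $f$ and $g$ are gamma PDFs,
$$f(x) = \frac{x^{\alpha-1}e^{-x}}{\Gamma(\alpha)}, \quad x>0, \tag{3}$$
$$g(y) = \frac{y^{\beta-1}e^{-y}}{\Gamma(\beta)}, \quad y>0, \tag{4}$$
with their CDFs $F$ and $G$,
$$F(x) = \frac{\gamma(\alpha,x)}{\Gamma(\alpha)}, \qquad G(y) = \frac{\gamma(\beta,y)}{\Gamma(\beta)}. \tag{5}$$
Moreover, $\theta$ is a dependence parameter with $-1\le\theta\le 1$. The random variables are independent if the dependence parameter equals zero. In Equations (3)–(5), $\Gamma(\cdot)$ and $\gamma(\cdot,\cdot)$ are the gamma and (lower) incomplete gamma functions, respectively. For the characteristic properties of the FGM-type bivariate gamma distribution, see Kotz et al. [13].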
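The step from the joint CDF to the joint PDF of an FGM-type distribution can be checked by differentiating term by term. The short derivation below is a sketch in assumed notation: $H$ and $h$ for the joint CDF and PDF, $F$, $G$ and $f$, $g$ for the marginal CDFs and PDFs, and $\theta$ for the dependence parameter.

```latex
\begin{aligned}
H(x,y) &= F(x)\,G(y)\,\bigl[1+\theta\,(1-F(x))(1-G(y))\bigr],\\
\frac{\partial H}{\partial x}
  &= f(x)\,G(y)\,\bigl[1+\theta\,(1-F(x))(1-G(y))\bigr]
     - \theta\,f(x)\,F(x)\,G(y)\,(1-G(y))\\
  &= f(x)\,G(y)\,\bigl[1+\theta\,(1-2F(x))(1-G(y))\bigr],\\
h(x,y) = \frac{\partial^2 H}{\partial x\,\partial y}
  &= f(x)\,g(y)\,\bigl[1+\theta\,(1-2F(x))(1-G(y))\bigr]
     - \theta\,f(x)\,g(y)\,G(y)\,(1-2F(x))\\
  &= f(x)\,g(y)\,\bigl[1+\theta\,(1-2F(x))(1-2G(y))\bigr].
\end{aligned}
```

Setting $\theta = 0$ reduces the last line to $f(x)\,g(y)$, which is the independence case.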
Ranked set sampling (RSS) was introduced by McIntyre [14] as a cost-effective sampling method. The estimation of the dependence parameter using RSS and its modifications has been addressed by some authors, such as Stokes [15], Modarres and Zheng [16], Al-Saleh and Samawi [17] and Sevil and Yildiz [18].
The RSS procedure (k: set size; m: number of cycles) is as follows:
- Step I: Select $k^2$ pairs from the population for the jth cycle.
- Step II: Divide the pairs into k sets of size k at random.
- Step III: Select the rth order statistic and its concomitant from the rth set, where r = 1, 2, … , k.
- Step IV: Repeat Steps I–III for m cycles, j = 1, 2, … , m.
Example 1.
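Let $k$ be the set size and $m = 1$. A sample of size $k^2$ is selected from the population and the rth set is denoted as $\{(X_{r1}, Y_{r1}), \ldots, (X_{rk}, Y_{rk})\}$, where $r = 1, 2, \ldots, k$. In each set, the Xs are ranked from the smallest to the largest, and the Ys are ordered according to the ranks of the corresponding Xs. From the rth set, the rth order statistic $X_{(r)r}$ and its concomitant $Y_{[r]r}$ are selected. Then, RSS with one cycle is represented by $\{(X_{(r)r}, Y_{[r]r}),\ r = 1, \ldots, k\}$. If $m > 1$, then RSS is $\{(X_{(r)j}, Y_{[r]j})\}$, where $r = 1, \ldots, k$ and $j = 1, \ldots, m$.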
Another rank-based sampling design is generalized modified RSS (GMRSS), which is defined by Sevil and Yildiz [18]. In the GMRSS procedure, only the rth order statistic and its concomitant are selected from every set. Thus, the sampling design is called GMRSS (r) and is denoted by $\{(X_{(r)ij}, Y_{[r]ij})\}$, where $r \in \{1, \ldots, k\}$ is fixed, $i = 1, \ldots, k$ and $j = 1, \ldots, m$.
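The two rank-based designs differ only in which rank is taken from each set, which a short sketch can make concrete. The Python code below is an illustration under assumptions, not the authors' implementation: the function names, the zero-argument `draw_pair` callable and the toy population are hypothetical. Drawing k fresh pairs per set stands in for selecting $k^2$ pairs and dividing them into k sets at random, which is equivalent for independent draws.

```python
import random

def ranked_set_cycle(draw_pair, k, ranks):
    """One cycle: for each rank r in `ranks`, draw a set of k pairs,
    order the set by X, and keep the pair holding the r-th smallest X.
    The Y of that pair is the concomitant of the r-th order statistic."""
    selected = []
    for r in ranks:
        pairs = [draw_pair() for _ in range(k)]
        pairs.sort(key=lambda xy: xy[0])  # rank on X only (perfect ranking)
        selected.append(pairs[r - 1])
    return selected

def rss(draw_pair, k, m):
    """Balanced RSS: ranks 1..k, one per set, repeated for m cycles."""
    return [p for _ in range(m)
            for p in ranked_set_cycle(draw_pair, k, range(1, k + 1))]

def gmrss(draw_pair, k, m, r):
    """GMRSS (r): the same rank r is taken from all k sets in every cycle."""
    return [p for _ in range(m)
            for p in ranked_set_cycle(draw_pair, k, [r] * k)]

# Hypothetical toy population: Y depends on X through an additive noise term.
rng = random.Random(1)
def toy_pair():
    x = rng.random()
    return (x, x + rng.random())

sample_rss = rss(toy_pair, k=3, m=2)           # 6 pairs, ranks 1,2,3,1,2,3
sample_min = gmrss(toy_pair, k=3, m=2, r=1)    # 6 pairs, all minimum-ranked
```

Both designs return km pairs, so estimators based on them can be compared with an SRS of the same size.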
To the best of the authors’ knowledge, few studies have addressed the estimation of the dependence parameter, especially when the random variables are nonlinearly related. Moreover, the fact that FGM is a useful distribution family makes this study more meaningful. The rest of the paper is organized as follows: First, we provide some algorithms to generate sampling data from the FGM-type bivariate gamma distribution in Section 2. Then, in Section 3, we examine ML estimators based on SRS, RSS and GMRSS (r). Section 4 provides the biases and efficiencies of the ML estimators based on SRS, RSS and GMRSS. Finally, we give some critical comments on the simulation results in Section 5.
2. Algorithms to Generate Samples
In this section, we use the copula as a tool. Suppose that $U = F(X)$ and $V = G(Y)$, where both X and Y are continuous. The joint distribution of U and V is then called a copula, which is represented by $C$ with domain $[0,1]^2$. According to Sklar’s theorem [19], we define a bivariate copula $C$ with
$$C(u,v) = H\bigl(F^{-1}(u),\,G^{-1}(v)\bigr),$$
where $H$ is the joint CDF on $(0,\infty)^2$ with marginal CDFs $F$ and $G$. If $F$ and $G$ are continuous, $C$ is unique. Let $F^{-1}$ be an inverse function of $F$; then $F(F^{-1}(u)) = u$ for $u \in [0,1]$. By taking partial derivatives of $C$ with respect to u and v, the density function of the bivariate copula can be obtained,
$$c(u,v;\theta) = \frac{\partial^2 C(u,v;\theta)}{\partial u\,\partial v},$$
where $\theta$ ($-1\le\theta\le 1$) is the dependence parameter of the bivariate copula. Thus, the form of the joint density function is
$$h(x,y) = c\bigl(F(x),G(y);\theta\bigr)\,f(x)\,g(y).$$
Now, the copula form of FGM is $C(u,v) = uv\,[1+\theta(1-u)(1-v)]$, where $-1\le\theta\le 1$. Thanks to Sklar’s theorem [19], we only need to generate a pair $(u,v)$ on $[0,1]^2$ whose joint distribution function is $C$. An algorithm for generating a pair of observations of uniform random variables was introduced by Johnson [20] and Nelsen [21]. This algorithm uses a conditional distribution approach. It is assumed that $c_u(v) = P(V \le v \mid U = u) = \partial C(u,v)/\partial u$ is the conditional distribution function for V given $U = u$.
The following theorem demonstrates that $c_u(v)$ exists and is nondecreasing almost everywhere on $[0,1]$.
Theorem 1
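(Nelsen [21]). Let C be a copula. For any $v \in [0,1]$, the partial derivative $\partial C(u,v)/\partial u$ exists for almost all u, and for such v and u, $0 \le \partial C(u,v)/\partial u \le 1$. Moreover, the function $v \mapsto c_u(v) = \partial C(u,v)/\partial u$ is defined and nondecreasing almost everywhere on $[0,1]$.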
Johnson [20] obtained a quasi-inverse ($c_u^{-1}$) of $c_u$ by solving the equation $c_u(v) = p$, which is quadratic in v. This equation has one root for $v \in [0,1]$ and it is $v = 2p/(b+a)$, where $a = 1+\theta(1-2u)$ and $b = \sqrt{a^2-4(a-1)p}$. Algorithm 1 simulates SRS data from an FGM-type bivariate gamma distribution.
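Algorithm 1: Generating data from FGM-type bivariate gamma distribution.
- Step I: Generate u and p as i.i.d. uniform $(0,1)$ random variates;
- Step II: Compute $a = 1+\theta(1-2u)$ and $b = \sqrt{a^2-4(a-1)p}$;
- Step III: Set $v = 2p/(b+a)$;
- Step IV: The desired pair is $(u,v)$;
- Step V: Compute $x = F^{-1}(u)$ and $y = G^{-1}(v)$;
- Step VI: RETURN $(x,y)$.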
Here, $F^{-1}$ and $G^{-1}$ are the inverse functions of F and G, which are given by Equation (5). There is no closed form for these inverse functions, but the desired pair $(x,y)$ can be obtained by using the “qgamma” function in the R statistical programming language. To generate a ranked set sample from an FGM-type bivariate gamma distribution, Algorithm 2 is proposed in this paper. Using Algorithm 2, m pairs are generated from each rank, so RSS is balanced. If only the rth ranked pairs are selected from all sets, then GMRSS (r) can be constructed by using Algorithm 2. In that case, the sample is not balanced. We will see that GMRSS provides more information than SRS and RSS, even though it consists of pairs from a single rank.
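The conditional-distribution steps of Algorithm 1 can be sketched in a few lines. The Python code below is a minimal illustration under stated assumptions, not the authors' R code: the function names are hypothetical, the gamma marginals are taken to have unit scale, and the quantile function is a simple bisection on a series for the incomplete gamma function, standing in for R's `qgamma`.

```python
import math
import random

def gamma_cdf(x, shape):
    """Gamma(shape, scale=1) CDF from the power series of the
    lower incomplete gamma function (all terms are positive)."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / shape
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= x / (shape + n)
        total += term
        n += 1
    return total * math.exp(shape * math.log(x) - x - math.lgamma(shape))

def gamma_quantile(p, shape):
    """Invert gamma_cdf by bisection; p is clamped away from 0 and 1."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    lo, hi = 0.0, 1.0
    while gamma_cdf(hi, shape) < p:   # widen the bracket until it covers p
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def fgm_copula_pair(theta, rng):
    """Steps I-IV: one (u, v) pair from the FGM copula."""
    u, p = rng.random(), rng.random()
    a = 1.0 + theta * (1.0 - 2.0 * u)
    # max() guards against a tiny negative value caused by rounding
    b = math.sqrt(max(0.0, a * a - 4.0 * (a - 1.0) * p))
    v = 2.0 * p / (a + b)
    return u, v

def fgm_gamma_pair(theta, alpha, beta, rng):
    """Steps V-VI: map (u, v) to gamma marginals with shapes alpha, beta."""
    u, v = fgm_copula_pair(theta, rng)
    return gamma_quantile(u, alpha), gamma_quantile(v, beta)

rng = random.Random(2023)
sample = [fgm_gamma_pair(0.5, 2.0, 3.0, rng) for _ in range(5)]
```

A quick sanity check on the copula step is that the sample mean of $(1-2u)(1-2v)$ should be close to $\theta/9$, its expected value under the FGM copula.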
Algorithm 2: Generating RSS data from FGM-type bivariate gamma distribution.
- Step I: Generate a random sample $(x_i, y_i)$, $i = 1, \ldots, k^2$, using Algorithm 1 for the jth cycle;
- Step II: Divide the units in the sample randomly into k sets of size k each;
- Step III: Rank the units in each set from the smallest to the largest by using the variable X;
- Step IV: Select the rth order statistic and its concomitant variable from the rth set;
- Step V: RETURN $\{(x_{(r)j}, y_{[r]j})\}$, where $r = 1, \ldots, k$ and $j = 1, \ldots, m$.
The authors will provide the R functions for Algorithms 1 and 2 upon request.
3. Maximum-Likelihood Estimates of Dependence Parameter
Let $(\alpha, \beta, \theta)$ be the parameter vector, where $\alpha$ and $\beta$ are the shape parameters of the variables X and Y, respectively, and $\theta$ is the dependence parameter. Similar studies investigate the ML estimator of the dependence parameter under three scenarios: (i) the parameters of the variables X and Y are known, (ii) the parameters of the variable X (which is more easily accessible) are known, but those of Y are not, and (iii) all parameters are unknown. We begin with case (i), since an examination of the ML estimator for the dependence parameter in this case can be meaningful and offer some hints for cases (ii) and (iii).
3.1. ML Estimator from Simple Random Sample
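Suppose that $(X_i, Y_i)$, $i = 1, \ldots, n$, is a sample drawn at random from $H(x,y)$. Let the log-likelihood function be $\ell_{SRS}(\theta)$. Since the ML estimator has no closed-form expression, the ML estimate of $\theta$ ($\hat{\theta}_{SRS}$) can be obtained by solving the following equation:
$$\ell'_{SRS}(\theta) = \sum_{i=1}^{n} \frac{A_i}{1+\theta A_i} = 0,$$
where $\ell'_{SRS}$ is the first derivative of $\ell_{SRS}$ with respect to $\theta$, and $A_i = (1-2F(x_i))(1-2G(y_i))$. The asymptotic variance of $\hat{\theta}_{SRS}$ is the inverse of the Fisher information:
$$\operatorname{Var}(\hat{\theta}_{SRS}) = \left[-E\!\left(\frac{\partial^2 \ell_{SRS}(\theta)}{\partial\theta^2}\right)\right]^{-1}.$$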
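For case (i), with known marginals, the likelihood equation for an FGM-type density involves the data only through the products $a_i = (1-2F(x_i))(1-2G(y_i))$, and the score $\sum_i a_i/(1+\theta a_i)$ is strictly decreasing in $\theta$. The sketch below exploits this with a plain bisection on $[-1, 1]$ instead of R's `optim`. The function names and the eight $(F(x_i), G(y_i))$ values are hypothetical and serve only to show the mechanics; the variance line is a plug-in approximation based on the observed information.

```python
def score(theta, a_vals):
    """First derivative of the FGM log-likelihood with respect to theta."""
    return sum(a / (1.0 + theta * a) for a in a_vals)

def ml_theta(a_vals, iters=60):
    """Maximize the likelihood over [-1, 1]; the score is decreasing,
    so the maximizer is a boundary point or the unique root."""
    lo, hi = -1.0, 1.0
    if score(lo, a_vals) <= 0.0:
        return lo
    if score(hi, a_vals) >= 0.0:
        return hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid, a_vals) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical values of (F(x_i), G(y_i)) for eight observed pairs.
uv = [(0.1, 0.2), (0.8, 0.9), (0.2, 0.8), (0.3, 0.25),
      (0.9, 0.4), (0.7, 0.1), (0.6, 0.7), (0.45, 0.95)]
a_vals = [(1.0 - 2.0 * u) * (1.0 - 2.0 * v) for u, v in uv]

theta_hat = ml_theta(a_vals)

# Observed information: minus the second derivative at theta_hat.
obs_info = sum((a / (1.0 + theta_hat * a)) ** 2 for a in a_vals)
var_hat = 1.0 / obs_info
```

With only eight pairs the estimate is far too noisy to interpret; the point is that the same two functions apply to any sample once the $a_i$ have been computed.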
3.2. ML Estimator from Ranked Set Sample
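In this section, the ML estimator based on RSS ($\hat{\theta}_{RSS}$) is established. It is assumed that $\{(X_{(r)j}, Y_{[r]j})\}$ is a ranked set sample, where $r = 1, \ldots, k$ and $j = 1, \ldots, m$. Under the perfect ranking condition, $X_{(r)j}$ has the same PDF as the rth order statistic,
$$f_{(r)}(x) = \frac{k!}{(r-1)!\,(k-r)!}\,F(x)^{r-1}\,[1-F(x)]^{k-r}\,f(x), \tag{13}$$
where $f$ and $F$ are given by Equations (3) and (5), respectively. The joint PDF of $(X_{(r)j}, Y_{[r]j})$ can be written as follows:
$$f_{(r)[r]}(x,y) = f_{(r)}(x)\,\frac{h(x,y)}{f(x)}, \tag{14}$$
where $h(x,y)$ is the joint PDF given by Equation (2). Thus, the log-likelihood function can be expressed as $\ell_{RSS}(\theta) = \sum_{r=1}^{k}\sum_{j=1}^{m} \log f_{(r)[r]}(x_{(r)j}, y_{[r]j})$. Since $\theta$ appears only in $h(x,y)$,
$$\ell'_{RSS}(\theta) = \sum_{r=1}^{k}\sum_{j=1}^{m} \frac{A_{(r)j}}{1+\theta A_{(r)j}} = 0, \tag{15}$$
where $A_{(r)j} = (1-2F(x_{(r)j}))(1-2G(y_{[r]j}))$, $r = 1, \ldots, k$ and $j = 1, \ldots, m$. The ML estimator based on RSS ($\hat{\theta}_{RSS}$) can be found by solving Equation (15) numerically. The asymptotic variance of $\hat{\theta}_{RSS}$ is obtained using the following equation:
$$\operatorname{Var}(\hat{\theta}_{RSS}) = \left[-E\!\left(\frac{\partial^2 \ell_{RSS}(\theta)}{\partial\theta^2}\right)\right]^{-1}. \tag{16}$$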
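The reason the dependence parameter enters the ranked-pair likelihood through a single term can be seen by expanding the log of the joint density of an order statistic and its concomitant. The derivation below is a sketch in assumed notation ($F$, $G$, $f$, $g$ for the marginal CDFs and PDFs, $\theta$ for the dependence parameter, $k$ for the set size and $r$ for the rank), for an FGM-type joint density under perfect ranking.

```latex
\begin{aligned}
\log f_{(r)[r]}(x,y)
  &= \log\frac{k!}{(r-1)!\,(k-r)!}
     + (r-1)\log F(x) + (k-r)\log\bigl[1-F(x)\bigr] \\
  &\quad + \log f(x) + \log g(y)
     + \log\bigl[1+\theta\,(1-2F(x))(1-2G(y))\bigr], \\
\frac{\partial}{\partial\theta}\log f_{(r)[r]}(x,y)
  &= \frac{(1-2F(x))(1-2G(y))}{1+\theta\,(1-2F(x))(1-2G(y))}.
\end{aligned}
```

When the marginal parameters are known, every term except the last is constant in $\theta$, so the score has the same functional form as under SRS; only the distribution of the observed pairs changes with the sampling design.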
3.3. ML Estimator from Generalized Modified Ranked Set Sample
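Let $\{(X_{(r)ij}, Y_{[r]ij})\}$ denote GMRSS (r), where $r \in \{1, \ldots, k\}$ is fixed, $i = 1, \ldots, k$ and $j = 1, \ldots, m$. The pairs in this sampling scheme are selected from the rth ranked pairs. Assume that $\hat{\theta}_{(r)}$ is the ML estimator based on GMRSS (r). Under perfect ranking conditions, $X_{(r)ij}$ has the same PDF as the rth order statistic that is given by Equation (13) for each i and j. Thus, the joint PDF of $(X_{(r)ij}, Y_{[r]ij})$, denoted here by $f_{(r)[r]}$, can be defined from Equation (14). The log-likelihood function is $\ell_{(r)}(\theta) = \sum_{i=1}^{k}\sum_{j=1}^{m} \log f_{(r)[r]}(x_{(r)ij}, y_{[r]ij})$. By taking the first derivative of $\ell_{(r)}$ with respect to $\theta$, the following equation is obtained:
$$\ell'_{(r)}(\theta) = \sum_{i=1}^{k}\sum_{j=1}^{m} \frac{A_{(r)ij}}{1+\theta A_{(r)ij}} = 0, \tag{17}$$
where $A_{(r)ij} = (1-2F(x_{(r)ij}))(1-2G(y_{[r]ij}))$, $i = 1, \ldots, k$ and $j = 1, \ldots, m$. The ML estimator based on GMRSS (r) is represented by $\hat{\theta}_{(r)}$ and can be obtained by solving Equation (17). The asymptotic variance of $\hat{\theta}_{(r)}$ is described as follows:
$$\operatorname{Var}(\hat{\theta}_{(r)}) = \left[-E\!\left(\frac{\partial^2 \ell_{(r)}(\theta)}{\partial\theta^2}\right)\right]^{-1}. \tag{18}$$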
4. Results
In this section, we present Monte Carlo simulation results. In the simulation, 10,000 samples were generated by using Algorithms 1 and 2 for several values of the dependence parameter $\theta$, the sample size n and the set size k. In Table 1, Table 2 and Table 3, the estimated values $\hat{\theta}_{*}$, relative efficiency (RE) and relative information (RI) are reported, where $*$ = SRS, RSS and $(r)$ (GMRSS (r)). The REs of the ML estimators based on RSS and GMRSS (r) are computed with respect to the ML estimator based on SRS. The RIs are defined as $\mathrm{FI}_{RSS}/\mathrm{FI}_{SRS}$ and $\mathrm{FI}_{(r)}/\mathrm{FI}_{SRS}$, where the FIs are the Fisher information values for SRS, RSS and GMRSS (r). In the Monte Carlo simulation, the first and second derivatives of the log-likelihoods with respect to $\theta$ were obtained numerically with the “optim” function in the R programming language, with the method argument set to “L-BFGS-B”.
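The summary measures of such a simulation can be computed from the replicated estimates in a few lines. The sketch below rests on an assumption: RE is taken to be the ratio of mean squared errors with the SRS-based estimator in the numerator, so that values above 1 favour the competing design. The function names and the four replicate values per design are hypothetical.

```python
def bias(estimates, theta):
    """Average estimate minus the true parameter value."""
    return sum(estimates) / len(estimates) - theta

def mse(estimates, theta):
    """Mean squared error of the replicated estimates around theta."""
    return sum((e - theta) ** 2 for e in estimates) / len(estimates)

def relative_efficiency(reference, competitor, theta):
    """MSE of the reference design divided by MSE of the competitor
    (assumed definition; values above 1 favour the competitor)."""
    return mse(reference, theta) / mse(competitor, theta)

theta_true = 0.4
est_srs = [0.2, 0.6, 0.5, 0.3]        # hypothetical SRS-based replicates
est_gmrss = [0.3, 0.5, 0.45, 0.35]    # hypothetical GMRSS-based replicates

re_gmrss = relative_efficiency(est_srs, est_gmrss, theta_true)
bias_srs = bias(est_srs, theta_true)
```

In a full study the two lists would hold one estimate per Monte Carlo replicate, for example 10,000 values each.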
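Table 1.
Estimated values ($\hat{\theta}_{SRS}$, $\hat{\theta}_{RSS}$), REs and RIs of $\hat{\theta}_{RSS}$ with respect to $\hat{\theta}_{SRS}$.
Table 2.
Estimated values ($\hat{\theta}_{(r)}$), REs and RIs of $\hat{\theta}_{(r)}$ with respect to $\hat{\theta}_{SRS}$.
Table 3.
Estimated values ($\hat{\theta}_{(r)}$), REs and RIs of $\hat{\theta}_{(r)}$ with respect to $\hat{\theta}_{SRS}$.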
5. Discussion
The present work provides ML estimators based on SRS, RSS and GMRSS () for the dependence parameter of FGM-type bivariate gamma distribution. According to the Monte Carlo simulation, we obtained the following findings:
- In Table 1, the estimated values of $\hat{\theta}_{SRS}$ and $\hat{\theta}_{RSS}$ are similar. The estimated values become closer to the actual $\theta$ as the sample size increases. Additionally, the REs and RIs indicate that the ML estimator based on RSS is as efficient as the ML estimator based on SRS. For researchers who study RSS and its extensions, this result might be a surprise. Similar results, however, were found by Stokes [15] and Sevil and Yildiz [18].
- Table 2 shows that the ML estimators based on GMRSS (1) and GMRSS (k) have smaller biases than the ML estimator based on GMRSS (r) with an intermediate rank r. Additionally, compared to SRS, RSS and GMRSS (r) with an intermediate rank, GMRSS (1) and GMRSS (k) provide more efficient ML estimators. On the other hand, there is no evidence that the REs and RIs are monotone increasing or decreasing functions of the sample size n.
- Compared with SRS and RSS, Table 3 shows that GMRSS (1) and GMRSS (k) have the smallest biases in most of the settings considered. It is also revealed that the ML estimator based on GMRSS (r) approaches the actual $\theta$ as the set size increases. Additionally, the highest REs and RIs are obtained when the samples are selected from the minimum ranked ($r = 1$) pairs or maximum ranked ($r = k$) pairs. According to Table 2 and Table 3, the REs and RIs decrease as the rank r increases over the lower ranks, and increase in r over the upper ranks.
- Since the GMRSS design needs only one rank value while RSS requires all rank values, a sample can be obtained by using GMRSS with less effort than RSS. Consequently, the authors recommend using the GMRSS (1) or GMRSS (k) design to estimate the dependence parameter of the FGM-type bivariate gamma distribution for a set size k.
Author Contributions
Conceptualization, Y.C.S. and T.O.Y.; methodology, Y.C.S. and T.O.Y.; software, Y.C.S.; validation, T.O.Y.; simulation, Y.C.S.; interpretation of results, Y.C.S. and T.O.Y.; writing—original draft preparation, Y.C.S.; writing—review and editing, Y.C.S. and T.O.Y.; supervision, T.O.Y. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Acknowledgments
The authors thank the reviewers and the editor for the helpful comments that have improved the paper. The first author is supported by the Scientific and Technological Research Council of Turkey (TÜBİTAK) 2211/A National PhD scholarship program and the Higher Education Council of Turkey (YÖK) 100/2000 PhD scholarship program.
Conflicts of Interest
The authors declare no conflict of interest.
References
1. Morgenstern, D. Einfache Beispiele zweidimensionaler Verteilungen. Mitt. Math. Statist. 1956, 8, 234–235.
2. Gumbel, E.J. Bivariate exponential distributions. J. Am. Stat. Assoc. 1960, 55, 698–707.
3. Farlie, D.J. The performance of some correlation coefficients for a general bivariate distribution. Biometrika 1960, 47, 307–323.
4. Nelsen, R.; Quesada-Molina, J.; Rodriguez-Lallena, A.J. Bivariate copulas with cubic sections. J. Nonparametric Stat. 1997, 7, 205–220.
5. Bairamov, I.; Kotz, S. On local dependence function for multivariate distributions. New Trends Probab. Stat. 2000, 5, 27–44.
6. Abo-Eleneen, Z.; Nagaraja, H. Fisher information in an order statistic and its concomitant. Ann. Inst. Stat. Math. 2002, 54, 667–680.
7. Ucer, B.H.; Yildiz, T.O. Estimation and goodness-of-fit procedures for Farlie–Gumbel–Morgenstern bivariate copula of order statistics. J. Stat. Comput. Simul. 2012, 82, 137–147.
8. Ucer, B.H.; Gurler, S. On the Mean Residual Lifetime at System Level in Two-Component Parallel Systems for the FGM Distribution. Hacet. J. Math. Stat. 2012, 41, 139–145.
9. Gurler, S.; Ucer, B.H.; Bairamov, I. On the mean remaining strength at the system level for some bivariate survival models based on exponential distribution. J. Comput. Appl. Math. 2015, 290, 535–542.
10. Yildiz, T.; Ucer, B.H. Fisher Information of Dependence in Progressive Type II Censored Order Statistics and Their Concomitants. Int. J. Appl. Math. Stat. 2017, 56, 1–10.
11. D’Este, G. A Morgenstern-type bivariate gamma distribution. Biometrika 1981, 68, 339–340.
12. Gupta, A.K.; Wong, C. On a Morgenstern-type bivariate gamma distribution. Metrika 1984, 31, 327–332.
13. Kotz, S.; Balakrishnan, N.; Johnson, N.L. Continuous Multivariate Distributions, Volume 1: Models and Applications, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2004.
14. McIntyre, G. A method for unbiased selective sampling, using ranked sets. Aust. J. Agric. Res. 1952, 3, 385–390.
15. Stokes, S.L. Inferences on the correlation coefficient in bivariate normal populations from ranked set samples. J. Am. Stat. Assoc. 1980, 75, 989–995.
16. Modarres, R.; Zheng, G. Maximum likelihood estimation of dependence parameter using ranked set sampling. Stat. Probab. Lett. 2004, 68, 315–323.
17. Al-Saleh, M.F.; Samawi, H.M. Estimation of the correlation coefficient using bivariate ranked set sampling with application to the bivariate normal distribution. Commun. Stat. Theory Methods 2005, 34, 875–889.
18. Sevil, Y.C.; Yildiz, T.O. Gumbel’s bivariate exponential distribution: Estimation of the association parameter using ranked set sampling. Comput. Stat. 2022, 37, 1695–1726.
19. Sklar, M. Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Statist. Univ. Paris 1959, 8, 229–231.
20. Johnson, M.E. Multivariate Statistical Simulation; Wiley: New York, NY, USA, 1987.
21. Nelsen, R.B. An Introduction to Copulas; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2007.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).