Abstract
Estimation based on probability-weighted moments is a well-established method and an excellent alternative to the classic method of moments or the maximum likelihood method, especially for small sample sizes. In this research, we developed a new class of estimators for the parameters of the Pareto type I distribution. A generalization of the probability-weighted moments approach is the foundation for this new class of estimators. It has the advantage of being valid in the entire parameter space of the Pareto distribution. We established the asymptotic normality of the new estimators and applied them to simulated and real datasets in order to illustrate their finite sample behavior. The results of comparisons with the most used estimation methods were also analyzed.
Keywords:
asymptotic distribution; Pareto distribution; parameter estimation; probability-weighted moment
MSC:
62E20; 62F10; 62F12
1. Introduction
The Pareto distribution originated in the work of the economist Vilfredo Pareto [1]. Pareto observed that the number of taxpayers with an income greater than x could be approximated by $a x^{-b}$, where a and b are positive parameters. This fact led to the introduction of several variants of the Pareto distribution, with a survival function proportional to $x^{-b}$. The most common Pareto distribution, often referred to as Pareto type I, will be investigated in this work. Given a random variable X with a Pareto type I distribution, the distribution function (d.f.) is as follows:
$$F(x) = 1 - \left(\frac{x}{c}\right)^{-a}, \quad x > c,\ a > 0,\ c > 0, \tag{1}$$
where a and c are the shape and scale parameters, respectively. The corresponding probability density function is
$$f(x) = a\,c^{a}\,x^{-a-1}, \quad x > c. \tag{2}$$
The parameter c corresponds to the lower bound for the support of the random variable, whereas the parameter a quantifies the heaviness of the right tail and is also referred to as the tail or Pareto index [2,3]. As a decreases, the tail becomes heavier. The d.f. in (1) is inverted to produce the associated quantile function of X, which is represented by
$$Q(p) = c\,(1 - p)^{-1/a}, \quad 0 < p < 1, \tag{3}$$
where the lower tail probability is denoted by p. Despite the simple analytic expressions in Equations (1)–(3), this model has been successfully applied in a large number of different fields, such as bibliometrics, demography, economics, geology, insurance and finance, among others. An alternative form of the Pareto model results from the change in location $X - c$. This alternative form is known as the Pareto type II or the Lomax ([4,5]) distribution. Another related model is the generalized Pareto distribution [6]. Under a semiparametric framework, the Pareto type I distribution is often used in the analysis of extreme events. Under such a framework, we use Equation (1) as an upper tail model and work with the reciprocal parameter $1/a$, the so-called extreme value index. Detailed discussions and reviews of estimation for Pareto-tailed models can be found in works by Beirlant et al. [7,8,9], Gomes and Guillou [10] and Peng and Qi [11], among others.
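As an illustration of the Pareto type I model, the following minimal Python sketch evaluates the standard d.f. $F(x) = 1 - (x/c)^{-a}$, the density and the quantile function, and draws a sample by inverting the d.f. The function names are illustrative assumptions and not part of the original study, whose computations were carried out in R.

```python
import random

def pareto_cdf(x, a, c):
    """d.f. of the Pareto type I model: 1 - (x/c)^(-a) for x > c, 0 otherwise."""
    return 1.0 - (x / c) ** (-a) if x > c else 0.0

def pareto_pdf(x, a, c):
    """Density a * c^a * x^(-a-1) for x > c, 0 otherwise."""
    return a * c ** a * x ** (-a - 1.0) if x > c else 0.0

def pareto_quantile(p, a, c):
    """Quantile function c * (1 - p)^(-1/a) for 0 <= p < 1."""
    return c * (1.0 - p) ** (-1.0 / a)

def pareto_sample(n, a, c, rng):
    """Inverse-transform sampling: apply the quantile function to uniforms."""
    return [pareto_quantile(rng.random(), a, c) for _ in range(n)]

rng = random.Random(1)  # fixed seed, for reproducibility only
sample = pareto_sample(5, a=2.0, c=1.0, rng=rng)
```

Since `rng.random()` returns values in [0, 1), every simulated observation is at least c, in agreement with the support of the model.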
The estimation of the shape and scale parameters a and c is an important and popular research topic. Although maximum likelihood estimators have optimal properties, such properties are only guaranteed asymptotically. Thus, many authors have proposed alternative estimation methods that perform better than the maximum likelihood method for small or moderate sample sizes. Quandt [12] compared the performance of the maximum likelihood estimator with that of the moments estimator, a least squares estimator and four quantile estimators. Different least squares estimators were examined by Lu and Tao [13], Caeiro et al. [14], Kantar [15] and Kim et al. [16]. Robust estimators of the shape parameter were introduced by Brazauskas and Serfling [17] and Vandewalle et al. [18]. Bayesian estimators can be found in Arnold and Press [19], Rasheed and Al-Gazi [20] and Han [21]. Singh and Guo [22], Caeiro and Gomes [23,24] and Munir et al. [25] considered probability-weighted moment estimators. Bhatti et al. [26,27] proposed modified maximum likelihood estimators and Chen et al. [28] dealt with the estimation of the Pareto parameters with a modification of ranked set sampling.
The purpose of this article is to examine a new method for estimating the shape and scale parameters of a Pareto model. The remainder of the paper is structured as follows. In Section 2, we review the most common estimators for the parameters of the Pareto distribution and introduce the new class of estimators. These estimators, called log-generalized probability-weighted moment estimators, are derived from a modification of the classic probability-weighted moments method. In Section 3, we study the asymptotic results for the new class of estimators. A Monte Carlo simulation study and two real data applications are provided in Section 4 to illustrate the performance of the estimators. Some concluding remarks are given in Section 5.
2. Traditional and New Techniques for Estimating the Parameters of the Pareto Distribution
This section covers some common estimation methods for the shape and scale parameters of the Pareto distribution in (1) and introduces a new estimation procedure. Assume that $(X_1, \ldots, X_n)$ is a sample of independent and identically distributed (i.i.d.) random variables from a Pareto distribution, as defined in (1), with both parameters, a and c, unknown. The sample of non-decreasing order statistics is denoted as $(X_{1:n} \le X_{2:n} \le \cdots \le X_{n:n})$.
2.1. Maximum Likelihood Estimators
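The maximum likelihood (ML) estimators are found by maximizing the log-likelihood function and have the closed-form expressions
$$\hat a^{ML} = \frac{n}{\sum_{i=1}^{n} \ln\left(X_i / X_{1:n}\right)}, \qquad \hat c^{ML} = X_{1:n}, \tag{4}$$
where $X_{1:n}$ denotes the sample minimum.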
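The closed-form ML estimators of the Pareto type I model (scale estimated by the sample minimum, shape by n divided by the sum of the log-ratios to the minimum) can be sketched as follows. The function name `pareto_ml` is an illustrative assumption.

```python
import math

def pareto_ml(sample):
    """Closed-form ML estimates (shape, scale) for the Pareto type I model."""
    n = len(sample)
    c_hat = min(sample)  # scale estimate: the sample minimum
    log_sum = sum(math.log(x / c_hat) for x in sample)
    if log_sum == 0.0:
        raise ValueError("all observations are equal; the shape estimate is undefined")
    a_hat = n / log_sum  # shape estimate
    return a_hat, c_hat

# Small worked example: the log-ratios are 0, 1 and 2, so the shape estimate is 3/3.
a_hat, c_hat = pareto_ml([1.0, math.e, math.e ** 2])
```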
2.2. Moment Estimators
It is well known that the non-central moments of order k for the Pareto model are expressed as follows:
$$E(X^k) = \frac{a\,c^k}{a - k}, \quad a > k.$$
In applications, the method of moments based on the first two moments is unpopular because the second moment only exists for $a > 2$, and other moment-based estimators have emerged in the literature. To extend the domain of validity of the estimators based on moments, Quandt [12] considered the first non-central moment of X, $E(X) = ac/(a-1)$, and the first moment of the sample minimum, $E(X_{1:n}) = nac/(na-1)$. The sample minimum of a Pareto distribution has a Pareto distribution whose scale and shape parameters are c and $na$, respectively. Quandt obtained the moment (M) estimators by equating the two aforementioned theoretical moments to the corresponding sample moments and solving the system of equations for the parameters of the distribution. The estimators obtained are consistent for $a > 1$ and given by
$$\hat a^{M} = \frac{n\bar X - X_{1:n}}{n\left(\bar X - X_{1:n}\right)}, \qquad \hat c^{M} = \frac{\left(n\hat a^{M} - 1\right) X_{1:n}}{n\,\hat a^{M}}, \tag{5}$$
where $\bar X$ denotes the arithmetic sample mean.
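The moment estimators of Quandt can be sketched by solving the two moment equations $\bar X = ac/(a-1)$ and $X_{1:n} = nac/(na-1)$ for a and c. The closed form used below follows from that system by elementary algebra; the function name is an illustrative assumption.

```python
def pareto_moment(sample):
    """Moment estimates (shape, scale) from the sample mean and the sample minimum."""
    n = len(sample)
    x_min = min(sample)
    x_bar = sum(sample) / n
    if x_bar == x_min:
        raise ValueError("all observations are equal; the estimates are undefined")
    a_hat = (n * x_bar - x_min) / (n * (x_bar - x_min))
    c_hat = (n * a_hat - 1.0) * x_min / (n * a_hat)
    return a_hat, c_hat

# Worked example: n = 3, mean 2, minimum 1 gives a = 5/3 and c = 0.8.
a_hat, c_hat = pareto_moment([1.0, 2.0, 3.0])
```

Substituting the estimates back into the two moment equations reproduces the sample mean and the sample minimum, which is a convenient sanity check.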
2.3. Probability Weighted Moment Estimators
The probability-weighted moment (PWM) method (Greenwood et al. [29]) is currently a well-established estimation procedure in the field of hydrology. Studies using Monte Carlo simulations demonstrated that, for small sample sizes, PWM estimators outperform other estimation techniques (Hosking et al. [30]). The PWMs of a random variable X, with d.f. F, are defined as
$$M_{k,r,s} = E\left(X^k\,(F(X))^r\,(1 - F(X))^s\right), \tag{6}$$
where k, r and s are real numbers. If the mean value exists, then $M_{1,r,s}$ exists for any real positive values r and s. The PWM method generalizes the classic method of moments: when $r = s = 0$, $M_{k,0,0}$ are the non-central moments of order k. For models that have a closed-form quantile function, Q, it may be more convenient to compute the PWMs as
$$M_{k,r,s} = \int_0^1 (Q(p))^k\,p^r\,(1 - p)^s\,dp.$$
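More recently, this method was modified for models without an analytic d.f. and quantile function (see Jing et al. [31]). The PWM estimators are derived by equating $M_{k,r,s}$ with their respective sample moments and then solving those equations for the parameters of the distribution. Greenwood et al. [29] and Hosking et al. [30] recommend using $M_{1,r,s}$, since the relations between parameters and moments are usually much simpler. The empirical estimate of $M_{1,r,s}$ is usually less sensitive to outliers and has good properties when the sample size is small. For convenience, several authors chose to use $k = 1$ and non-negative integer values for r and s. This approach will be referred to as the classic PWM method. In addition, when r and s are non-negative integers, it is more convenient to work with the PWMs
$$\alpha_r = M_{1,0,r} = E\left(X\,(1 - F(X))^r\right) \tag{7}$$
or
$$\beta_r = M_{1,r,0} = E\left(X\,(F(X))^r\right). \tag{8}$$
It should be noted that $(F(X))^r (1 - F(X))^s$ can be represented as a linear combination of powers of $F(X)$ or $1 - F(X)$ for non-negative integers r and s. As a result, we may use the following equations to relate $\alpha_r$ and $\beta_r$:
$$\alpha_r = \sum_{i=0}^{r} (-1)^i \binom{r}{i} \beta_i, \qquad \beta_r = \sum_{i=0}^{r} (-1)^i \binom{r}{i} \alpha_i,$$
where $\binom{r}{i}$ denotes the binomial coefficient. Using $\alpha_r$ or $\beta_r$ is equivalent as long as the values for r are non-negative integers that are as small as possible. For non-negative integer values of r, the unbiased estimators of the PWMs $\alpha_r$ and $\beta_r$, defined in (7) and (8), are, respectively (Landwehr et al. [32]),
$$\hat\alpha_r = \frac{1}{n} \sum_{i=1}^{n} \frac{\binom{n-i}{r}}{\binom{n-1}{r}}\,X_{i:n}, \qquad \hat\beta_r = \frac{1}{n} \sum_{i=1}^{n} \frac{\binom{i-1}{r}}{\binom{n-1}{r}}\,X_{i:n}. \tag{9}$$
Instead of the unbiased estimators, one may prefer to use the biased estimators
$$\hat\alpha_r = \frac{1}{n} \sum_{i=1}^{n} (1 - p_{i:n})^r\,X_{i:n}, \qquad \hat\beta_r = \frac{1}{n} \sum_{i=1}^{n} p_{i:n}^r\,X_{i:n}, \tag{10}$$
where r can be a real number and $p_{i:n}$ are the plotting positions; that is, empirical estimates of $F(X_{i:n})$. The options that are most frequently used for plotting positions are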
or
where b is a continuity correction factor. Landwehr et al. [33] concluded empirically that moderated biased estimators of the PWMs could produce more accurate estimates of upper quantiles.
For the Pareto distribution in (1), the PWMs in (6) are given by
where stands for the complete beta function. By setting the exponents , we obtain the classical PWMs for the Pareto distribution, valid for and given by
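Singh and Guo [22], Caeiro and Gomes [23,34], Munir et al. [25] and Caeiro et al. [35] took the PWMs $\alpha_0 = E(X)$ and $\alpha_1 = E(X(1 - F(X)))$ into account and deduced the associated PWM estimators for the shape and scale parameters of the Pareto distribution. Those estimators are
$$\hat a^{PWM} = \frac{\hat\alpha_0 - \hat\alpha_1}{\hat\alpha_0 - 2\hat\alpha_1}, \qquad \hat c^{PWM} = \frac{\hat\alpha_0\,\hat\alpha_1}{\hat\alpha_0 - \hat\alpha_1}, \tag{11}$$
with $\hat\alpha_0$ and $\hat\alpha_1$ given in (9). As stated earlier, the PWM estimators in (11) are only defined for a Pareto model with finite mean value ($a > 1$).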
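The classic PWM estimators can be sketched as follows. The sketch assumes the PWMs $\alpha_0 = E(X) = ac/(a-1)$ and $\alpha_1 = E(X(1-F(X))) = ac/(2a-1)$, estimated by the standard unbiased estimators with weights $(n-i)/(n-1)$, and solves these two equations for a and c. The function name is an illustrative assumption.

```python
def pareto_pwm(sample):
    """Classic PWM estimates (shape, scale) based on alpha_0 and alpha_1."""
    x = sorted(sample)
    n = len(x)
    alpha0 = sum(x) / n  # estimator of alpha_0 = E(X)
    # Unbiased estimator of alpha_1: weight (n - i)/(n - 1) on the i-th order statistic.
    alpha1 = sum((n - i) * xi for i, xi in enumerate(x, start=1)) / (n * (n - 1))
    a_hat = (alpha0 - alpha1) / (alpha0 - 2.0 * alpha1)
    c_hat = alpha0 * alpha1 / (alpha0 - alpha1)
    return a_hat, c_hat

# Worked example: alpha0 = 2 and alpha1 = 2/3, which correspond to a = 2 and c = 1.
a_hat, c_hat = pareto_pwm([1.0, 2.0, 3.0])
```

Nothing in this construction forces the scale estimate to stay below the sample minimum, so in practice it is prudent to check `c_hat <= min(sample)` before using the fitted model.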
2.4. Extended Class of PWM Estimators
The theoretical PWMs defined in (6) can have any real values for the exponents k, r and s; however, early applications only considered non-negative integer exponents. Rasmussen [36] explored PWMs with real exponents and referred to this method as generalized PWM (GPWM) to distinguish it from the classic PWM approach. He found that, in most cases, the GPWM method outperforms the classic PWM method. To simplify the GPWM method, it is recommended to limit the class of GPWMs by setting or . This restriction leads to the use of simpler analytical formulas for GPWMs. The GPWM estimators are the ones in (10) for any real value of r. Another version of the PWM method was introduced by Caeiro and Prata Gomes [37]. The authors worked in the context of Pareto-type tails and considered a different type of PWM, specified by
with , , . Such a class of PWMs was named log PWM (LPWM) and has the advantage of extending the domain of validity of the estimators to the complete parameter space for the Pareto model. Caeiro and Mateus [38] considered the LPWMs in (12) with and studied the corresponding LPWMs for the Pareto model.
If we take into consideration the LPWMs and , the respective LPWM estimators of the shape and scale parameters of the Pareto distribution in (1) are, respectively,
where , are the unbiased empirical estimator of given by
Recently, Chen [39] introduced an extended class of GPWMs by evaluating the PWMs in (12) with g a suitable continuous function and r and s any real values. Mateus and Caeiro [40] considered the extended class of GPWMs with for a rescaled sample of the Pareto model. This approach, called log-generalized probability-weighted (LGPWM), uses one theoretical moment and only provides an estimator for the shape parameter of the Pareto distribution. For the estimation of the scale parameter, Mateus and Caeiro [40] used an estimator similar to the moment estimator, .
2.5. New Class of LGPWM Estimators
We now introduce a new LGPWM class of estimators for the Pareto distribution that provides shape and scale estimators and generalizes the LPWM estimators in (14). The new LGPWM estimators are built using the moments in (13) for any real value of . Then, for each real s, the corresponding empirical (biased) estimator is provided by
where are the plotting positions. To estimate the two parameters of the Pareto distribution, we shall consider the theoretical moments and in (13) with . Equating the moments and to the corresponding empirical estimates in (16) and solving the system of equations for the parameters a and c, we obtain the following estimators:
and
where . The tuning parameters and should be chosen carefully in order to obtain a good fit of the sample data. A possible selection of and will be presented in Section 4.
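To make the construction concrete, the sketch below implements one possible reading of the LGPWM approach. It rests on assumptions that are not confirmed by the text above: the theoretical moment is taken to be $l_s = E\left((1-F(X))^s \ln X\right)$, which for the Pareto model equals $\ln c/(s+1) + 1/(a(s+1)^2)$ for $s > -1$; the empirical moment uses the plotting positions $p_{i:n} = (i - b)/n$ with the hypothetical default $b = 0.35$; and all function names are illustrative.

```python
import math
import random

def lgpwm_moment(a, c, s):
    """Assumed theoretical moment E[(1 - F(X))^s ln X] for the Pareto model, s > -1."""
    return math.log(c) / (s + 1.0) + 1.0 / (a * (s + 1.0) ** 2)

def empirical_moment(sample, s, b=0.35):
    """Biased empirical counterpart with plotting positions (i - b)/n (assumed form)."""
    x = sorted(sample)
    n = len(x)
    return sum((1.0 - (i - b) / n) ** s * math.log(xi)
               for i, xi in enumerate(x, start=1)) / n

def solve_shape_scale(l1, l2, s1, s2):
    """Solve the two moment equations for (a, c), given moments at s1 != s2."""
    w1, w2 = s1 + 1.0, s2 + 1.0
    # (s + 1) * l_s = ln c + 1 / (a (s + 1)); subtracting the two equations removes ln c.
    a = (1.0 / w1 - 1.0 / w2) / (w1 * l1 - w2 * l2)
    c = math.exp(w1 * l1 - 1.0 / (a * w1))
    return a, c

def lgpwm_estimates(sample, s1, s2, b=0.35):
    """Shape and scale estimates from two empirical moments."""
    return solve_shape_scale(empirical_moment(sample, s1, b),
                             empirical_moment(sample, s2, b), s1, s2)

# Simulated Pareto sample with a = 2 and c = 1, generated by inverting the d.f.
rng = random.Random(0)
sample = [(1.0 - rng.random()) ** (-1.0 / 2.0) for _ in range(5000)]
a_hat, c_hat = lgpwm_estimates(sample, s1=0.0, s2=1.0)
```

Because $\ln X$ has finite moments for every $a > 0$, estimators of this type do not require a finite mean, which is the property emphasized above.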
3. Distributional Behavior of the LGPWM Estimators
To better understand the behavior of the estimators under consideration, and in order to compare their relative performance with other established estimators from the literature, it is important to study their sampling distribution. Unfortunately, for the estimators depending on a weighted average of the complete set of order statistics, the exact distribution cannot be derived analytically. As a compromise, we will study the asymptotic sampling distribution of the estimators considered here. Such asymptotic distributions can be used as an approximation to the exact distribution for large values of n and usually provide a good approximation for samples of sizes larger than 50.
In the following, $\xrightarrow{d}$ and $\stackrel{d}{=}$ stand, respectively, for convergence and equality in distribution. Next, we present, without proof, in Propositions 1 and 2, the non-degenerate asymptotic distributions of the commonly known estimators from the literature given in (4), (5) and (11).
Proposition 1.
(Mateus and Caeiro [40,41]). Suppose that $(X_1, \ldots, X_n)$ is an i.i.d. sample from the Pareto population with d.f. in (1). Then,
and
where $N(\mu, \sigma^2)$ represents a normal random variable with mean value μ and variance $\sigma^2$.
Proposition 2.
Under the conditions of Proposition 1, we have
and
where $E$ refers to a standard exponential random variable with d.f.
$$P(E \le x) = 1 - e^{-x}, \quad x \ge 0. \tag{25}$$
The following lemma and proposition are required to state the non-degenerate asymptotic limit behavior of the LGPWM estimators.
Lemma 1.
Let X be a Pareto random variable with d.f. given in (1) and E a standard exponential random variable with d.f. given in Equation (25). Then, $\ln X$ has a shifted and re-scaled standard exponential distribution (Arnold [42]):
$$\ln X \stackrel{d}{=} \ln c + \frac{E}{a}.$$
Moreover, since the previous relation between the Pareto and exponential distributions is strictly increasing, it follows that, for a sample of size n,
$$\ln X_{i:n} \stackrel{d}{=} \ln c + \frac{E_{i:n}}{a}, \quad i = 1, \ldots, n,$$
where $E_{1:n} \le \cdots \le E_{n:n}$ are the non-decreasing order statistics from n mutually independent and identically distributed standard exponential random variables.
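For completeness, the relation between the Pareto and the exponential distributions follows in one line from the d.f. $F(x) = 1 - (x/c)^{-a}$. The derivation below is a standard argument added for the reader's convenience, not a reproduction of the proof in Arnold [42].

```latex
\begin{aligned}
P\!\left(a \ln\frac{X}{c} \le t\right)
  &= P\!\left(X \le c\,e^{t/a}\right)
   = 1 - \left(\frac{c\,e^{t/a}}{c}\right)^{-a}
   = 1 - e^{-t}, \qquad t \ge 0 .
\end{aligned}
```

Hence $a \ln(X/c)$ is a standard exponential random variable, which is equivalent to $\ln X$ being distributed as $\ln c + E/a$. Since the map $x \mapsto a\ln(x/c)$ is strictly increasing, it preserves the ordering of the sample, which gives the statement for the order statistics.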
Proposition 3.
Consider a sample of size n from a Pareto population and define
with and any real ω. The asymptotic limit distribution
holds true, with
and
Proof of Proposition 3.
Using Lemma 1 we can write
with
and
Hence, note that converges toward . By utilizing the asymptotic result in the study by Arnold et al. [43] (p.229), for linear functions of order statistics, we obtain
with
and
Combining the asymptotic results for and , the limit distribution in (27) follows straightforwardly. □
Proposition 4.
Let us consider the conditions of Proposition 3. Then,
and
with .
Proof.
Let . Then, invoking Proposition 3 with , we obtain
Noticing that and applying the delta method, the asymptotic result in (29) is established. Then, defining and using the result from Proposition 3 again, with , we obtain
Applying the delta method to , we obtain the limit distribution in (30). □
4. Numerical Results
In this section, we analyze simulated and real datasets to assess the performance of the estimation procedures discussed in Section 2. For the LGPWM estimators in (17) and (18), we used the empirical values of (16) with plotting positions , where . Since the LGPWM estimation method requires two tuning parameters, we first present a data-driven algorithm to determine these parameters.
4.1. Data-Driven Tuning Parameter Selection for the LGPWM Estimator
Consider the LGPWM estimators and with tuning parameters and taking values in , with , discretized in small steps of length 0.1. For each pair of values (), we analyze the fit of the Pareto model by comparing the empirical cumulative distribution function, , with the fitted cumulative distribution function, , as defined in (1), using an appropriate goodness-of-fit statistic. Lastly, we select the set of parameters that provides the best fit. To measure the agreement between the observations and the model, the following goodness-of-fit statistic tests were considered:
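- Kolmogorov–Smirnov (KS) statistic:
$$KS = \max\left(D^{+}, D^{-}\right), \tag{33}$$
with
$$D^{+} = \max_{1 \le i \le n}\left(\frac{i}{n} - \hat F(x_{i:n})\right)$$
and
$$D^{-} = \max_{1 \le i \le n}\left(\hat F(x_{i:n}) - \frac{i-1}{n}\right),$$
where $\hat F$ denotes the fitted Pareto d.f. and $x_{1:n} \le \cdots \le x_{n:n}$ the ordered observations.
- Cramér–von Mises (CvM) statistic:
$$CvM = \frac{1}{12n} + \sum_{i=1}^{n}\left(\hat F(x_{i:n}) - \frac{2i-1}{2n}\right)^2. \tag{34}$$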
- Modified Anderson–Darling (MAD) statistic (Ahmad et al. [44]):
Relative to the usual Anderson–Darling statistic, the statistic in (35) gives more weight to the data in the upper tail. Smaller values of the statistics in (33)–(35) correspond to a better fit of the Pareto model. For a statistical power comparison between some of the aforementioned statistical tests, see Razali and Wah [45] or Singla et al. [46].
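The selection step can be sketched as follows for the Kolmogorov–Smirnov and Cramér–von Mises statistics, using their standard computational formulas. The modified Anderson–Darling statistic is omitted from the sketch. The helper `select_best_fit` and the representation of candidates as (a, c) pairs are illustrative assumptions; in the data-driven method, each candidate pair would be the LGPWM estimate obtained from one pair of tuning parameters on the grid.

```python
def pareto_cdf(x, a, c):
    """Fitted Pareto type I d.f.; equals 0 at or below the scale parameter."""
    return 1.0 - (x / c) ** (-a) if x > c else 0.0

def ks_statistic(sample, a, c):
    """Kolmogorov-Smirnov distance between the empirical and the fitted d.f."""
    x = sorted(sample)
    n = len(x)
    f = [pareto_cdf(xi, a, c) for xi in x]
    d_plus = max(i / n - fi for i, fi in enumerate(f, start=1))
    d_minus = max(fi - (i - 1) / n for i, fi in enumerate(f, start=1))
    return max(d_plus, d_minus)

def cvm_statistic(sample, a, c):
    """Cramer-von Mises statistic for the fitted d.f."""
    x = sorted(sample)
    n = len(x)
    return 1.0 / (12.0 * n) + sum(
        (pareto_cdf(xi, a, c) - (2 * i - 1) / (2.0 * n)) ** 2
        for i, xi in enumerate(x, start=1))

def select_best_fit(sample, candidates, statistic):
    """Return the (a, c) pair with the smallest goodness-of-fit statistic."""
    return min(candidates, key=lambda ac: statistic(sample, ac[0], ac[1]))

# Toy data: exact Pareto(a = 2, c = 1) quantiles at p = (2i - 1)/(2n), n = 4.
toy = [(1.0 - (2 * i - 1) / 8.0) ** (-0.5) for i in range(1, 5)]
best = select_best_fit(toy, [(0.5, 1.0), (2.0, 1.0), (5.0, 1.0)], cvm_statistic)
```

For the toy data the candidate (2, 1) attains the smallest possible Cramér–von Mises value, 1/(12n), so it is the one selected.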
4.2. Simulation Study
In this subsection, we conduct a Monte Carlo simulation experiment to illustrate the performance of the aforementioned estimation methods for the shape and scale parameters of the Pareto model. We refer to the LGPWM estimators as LGPWM-KS, LGPWM-CvM and LGPWM-MAD when the tuning parameters and are selected using the data-driven method described in Section 4.1 based on the statistics in (33), (34) and (35), respectively. All computation was performed in software R. We simulated samples of sizes , 20, 30, 40, 50, 75, 100, 150 and 200 from the Pareto distribution, taking the following combination of shape and scale parameters: = , , and . To evaluate the accuracy and efficiency of the various estimators, we computed the simulated bias and the root mean squared error (RMSE) for each sample size, each set of parameters and the estimator under study.
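The simulated bias and RMSE can be computed along the following lines. This is a Python sketch for the ML estimator of the shape parameter only; the original study was run in R, and the number of replications, the seed and the function names used here are hypothetical choices made for illustration.

```python
import math
import random

def bias_and_rmse(estimates, true_value):
    """Simulated bias and root mean squared error of a list of estimates."""
    m = len(estimates)
    bias = sum(estimates) / m - true_value
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / m)
    return bias, rmse

def ml_shape(sample):
    """Closed-form ML estimate of the Pareto shape parameter."""
    c_hat = min(sample)
    return len(sample) / sum(math.log(x / c_hat) for x in sample)

def simulate(n, a, c, replications, seed):
    """Monte Carlo bias and RMSE of the ML shape estimator for samples of size n."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replications):
        # Inverse-transform sampling from the Pareto type I model.
        sample = [c * (1.0 - rng.random()) ** (-1.0 / a) for _ in range(n)]
        estimates.append(ml_shape(sample))
    return bias_and_rmse(estimates, a)

bias, rmse = simulate(n=20, a=2.0, c=1.0, replications=2000, seed=123)
```

The same loop applies to any other estimator by replacing `ml_shape` with the corresponding function.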
The simulated results are summarized in Table 1 and Table 2. As can be seen from these tables, the estimated biases and root mean squared errors generally tend toward zero for all estimation methods as the sample size increases, except for the M and PWM. This can be explained by the fact that the M estimator of a and both PWM estimators are not consistent if . Moreover, most of the estimators usually overestimate the target parameter. Regarding the LGPWM estimator, the optimal selection of tuning parameters is obtained through the data-driven method outlined in Section 4.1 using the MAD statistic.
Table 1.
Bias and RMSE of the estimators of the shape parameter a for the Pareto distribution.
Table 2.
Bias and RMSE of the estimators of the scale parameter c for the Pareto distribution.
For the estimation of the shape parameter a, the LGPWM-MAD estimator always has the smallest absolute bias and, for small sample sizes, the smallest RMSE. For larger sample sizes, the ML estimator has the lowest RMSE. In addition, the performance of the ML estimator is always quite close to that of the LGPWM-MAD estimator. Comparing the performance of all of the estimators for the scale parameter c, we observe that the M estimator usually has the smallest RMSE. The LGPWM-MAD estimator generally provides good results in terms of absolute bias.
4.3. Real Data Analysis
We now analyze the fit of a Pareto model to two real datasets: the population of the 150 largest metropolitan areas in the world and the estimated number of deaths from major earthquakes.
4.3.1. Population of the Largest Metropolitan Areas in the World
This dataset has the 150 largest cities in the world, by population, and was retrieved from the worldatlas website [47]. Since the webpage with the dataset is no longer available, data can be retrieved using the Wayback Machine website (https://web.archive.org/ (accessed on 15 May 2021)) or in Appendix A. Values were converted to millions (). In Table 3 we provide the descriptive statistics obtained with the function summary in R software.
Table 3.
Summary statistics for the population data.
If data come from a Pareto distribution, high-order moments might not exist. Therefore, to assess the skewness, we computed the Bowley [48] coefficient of skewness,
$$B = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1},$$
where $Q_1$, $Q_2$ and $Q_3$ are the first, second and third empirical quartiles, respectively. This measure of skewness is robust against extreme values. For other robust measures of skewness, see, among others, Horn [49], Kim and White [50] and Brys et al. [51]. Since the coefficient is positive and the median is smaller than the mean, we conclude that the underlying model is positively skewed. The histogram and the boxplot of these observations, in Figure 1, confirm the skewness of the data.
Figure 1.
Histogram and boxplot for the population data.
Figure 2 suggests a Paretian behavior of the data. For more details regarding the construction of the Pareto Q-Q plot, see refs. [7,52].
Figure 2.
Pareto Q-Q Plot for the population data.
The parameter estimates for the fitted Pareto distribution, provided by the ML, L, PWM and LGPWM estimators, and the values of , and test statistics in (33), (34) and (35), respectively, are shown in Table 4. The smallest value of each test statistic is presented in bold. We took all possible combinations of values and chose the three combinations that provided the smallest values for each of the aforementioned test statistics. The values of the test statistics show that the new LGPWM estimators are relatively better than any other considered estimators. The choice of parameters and for the LGPWM estimators provides the smallest or second smallest value of the test statistics , and . Note that not all methods perform well: the estimator produced an inadequate estimate ().
Table 4.
Parameter estimates and goodness-of-fit statistics for the population data.
4.3.2. Estimated Number of Deaths in Major Earthquakes
The second data set is available in Clark [53] and contains the estimated number of deaths in international earthquakes (from 1900 to 2011). The values of the data are as follows: 316,000, 242,769, 227,898, 200,000, 142,800, 110,000, 87,587, 86,000, 72,000, 70,000, 50,000, 40,900, 32,700, 32,610, 31,000, 30,000, 28,000, 25,000, 23,000, 20,896, 20,085. Values were converted to thousands (). Table 5 shows the descriptive statistics of the data.
Table 5.
Summary statistics for the estimated number of deaths.
The Bowley coefficient of skewness is 0.5. Figure 3 shows the histogram and the boxplot, which are clearly right skewed.
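The reported value can be checked directly from the 21 observations listed above, using the Bowley coefficient $(Q_3 + Q_1 - 2Q_2)/(Q_3 - Q_1)$. The sketch assumes quartiles obtained by linear interpolation at position $(n-1)p$ between order statistics, which is what the `inclusive` method of Python's `statistics.quantiles` computes; the function name is illustrative.

```python
import statistics

# Estimated number of deaths in major earthquakes, as listed in this section.
deaths = [316000, 242769, 227898, 200000, 142800, 110000, 87587,
          86000, 72000, 70000, 50000, 40900, 32700, 32610,
          31000, 30000, 28000, 25000, 23000, 20896, 20085]

def bowley_skewness(data):
    """Quartile-based skewness: (Q3 + Q1 - 2*Q2) / (Q3 - Q1)."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (q3 + q1 - 2.0 * q2) / (q3 - q1)

skewness = bowley_skewness(deaths)  # quartiles are 30000, 50000 and 110000
```

With these quartiles the coefficient equals (110000 + 30000 − 100000)/80000 = 0.5. The coefficient is scale invariant, so converting the values to thousands does not change it.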
Figure 3.
Histogram and boxplot for the estimated number of deaths.
The Q-Q plot in Figure 4 suggests a Paretian behavior of the data.
Figure 4.
Pareto Q-Q Plot for the estimated number of deaths.
The parameter estimates of the Pareto model and the empirical value of the Kolmogorov–Smirnov, Cramér–von Mises and modified Anderson–Darling criteria are shown in Table 6. Overall, the LGPWM method provides a good fit. From Table 6, it is seen that there is no significant difference between using the Cramér–von Mises or modified Anderson–Darling criteria. In addition, notice that the scale PWM estimate is again invalid, since it is greater than the sample minimum.
Table 6.
Parameter estimates and goodness-of-fit statistics for estimated number of deaths.
5. Conclusions
In this research, we propose a new class of estimators for the shape and scale parameters of a Pareto distribution, named the log-generalized probability-weighted moment. This new class can be viewed as a generalization of the well-known probability-weighted moments and offers the advantage of extending the domain of the validity of the estimators to the complete parameter space of the Pareto distribution. Additionally, the asymptotic sampling distribution of the estimators provided by this method can be used as an approximation of the exact distribution for large sample sizes. The usefulness of the new estimation method was illustrated through a simulation study and two real data applications. It is concluded that, with appropriate choices of the tuning parameters and , the proposed LGPWM estimators are capable of competing with the most commonly used estimation methods. As future research, we plan to examine the utilization of other goodness-of-fit statistics in the data-driven method for selecting the tuning parameters.
Author Contributions
Conceptualization, F.C.; methodology, F.C. and A.M.; validation, A.M.; investigation, F.C. and A.M.; data curation, F.C.; writing—original draft preparation, F.C. and A.M.; writing—review and editing, F.C. and A.M. All authors have read and agreed to the published version of the manuscript.
Funding
Research partially supported by National Funds through FCT—Fundação para a Ciência e a Tecnologia, projects UIDB/00297/2020 and UIDP/00297/2020 (Centro de Matemática e Aplicações).
Data Availability Statement
The data supporting the findings in Section 4.3 of this study are available within the article.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A
Table A1.
Population of the largest metropolitan areas in the world.
| Rank | City | Country | Population | Rank | City | Country | Population |
|---|---|---|---|---|---|---|---|
| 1 | Tokyo | Japan | 38,001,000 | 76 | Abidjan | Cote d’Ivoire | 4,859,798 |
| 2 | Delhi | India | 25,703,168 | 77 | Guadalajara | Mexico | 4,843,241 |
| 3 | Shanghai | China | 23,740,778 | 78 | Yangon | Myanmar | 4,801,930 |
| 4 | São Paulo | Brazil | 21,066,245 | 79 | Alexandria | Egypt | 4,777,677 |
| 5 | Mumbai | India | 21,042,538 | 80 | Ankara | Turkey | 4,749,968 |
| 6 | Mexico City | Mexico | 20,998,543 | 81 | Kabul | Afghanistan | 4,634,875 |
| 7 | Beijing | China | 20,383,994 | 82 | Qingdao | China | 4,565,549 |
| 8 | Osaka | Japan | 20,237,645 | 83 | Chittagong | Bangladesh | 4,539,393 |
| 9 | Cairo | Egypt | 18,771,769 | 84 | Monterrey | Mexico | 4,512,572 |
| 10 | New York | United States | 18,593,220 | 85 | Sydney | Australia | 4,505,341 |
| 11 | Dhaka | Bangladesh | 17,598,228 | 86 | Dalian | China | 4,489,380 |
| 12 | Karachi | Pakistan | 16,617,644 | 87 | Xiamen | China | 4,430,081 |
| 13 | Buenos Aires | Argentina | 15,180,176 | 88 | Zhengzhou | China | 4,387,118 |
| 14 | Kolkata | India | 14,864,919 | 89 | Boston | United States | 4,249,036 |
| 15 | Istanbul | Turkey | 14,163,989 | 90 | Melbourne | Australia | 4,203,416 |
| 16 | Chongqing | China | 13,331,579 | 91 | Brasília | Brazil | 4,155,476 |
| 17 | Lagos | Nigeria | 13,122,829 | 92 | Jiddah | Saudi Arabia | 4,075,803 |
| 18 | Manila | Philippines | 12,946,263 | 93 | Phoenix | United States | 4,062,605 |
| 19 | Rio de Janeiro | Brazil | 12,902,306 | 94 | Ji’nan | China | 4,032,150 |
| 20 | Guangzhou | China | 12,458,130 | 95 | Montréal | Canada | 3,980,708 |
| 21 | Los Angeles | United States | 12,309,530 | 96 | Shantou | China | 3,948,813 |
| 22 | Moscow | Russia | 12,165,704 | 97 | Nairobi | Kenya | 3,914,791 |
| 23 | Kinshasa | D. Rep. Congo | 11,586,914 | 98 | Medellín | Colombia | 3,910,989 |
| 24 | Tianjin | China | 11,210,329 | 99 | Fortaleza | Brazil | 3,880,202 |
| 25 | Paris | France | 10,843,285 | 100 | Kunming | China | 3,779,558 |
| 26 | Shenzhen | China | 10,749,473 | 101 | Changchun | China | 3,762,390 |
| 27 | Jakarta | Indonesia | 10,323,142 | 102 | Changsha | China | 3,761,018 |
| 28 | London | United Kingdom | 10,313,307 | 103 | Recife | Brazil | 3,738,526 |
| 29 | Bangalore | India | 10,087,132 | 104 | Rome | Italy | 3,717,956 |
| 30 | Lima | Peru | 9,897,033 | 105 | Zhongshan | China | 3,691,360 |
| 31 | Chennai | India | 9,890,427 | 106 | Cape Town | South Africa | 3,660,447 |
| 32 | Seoul | South Korea | 9,773,746 | 107 | Detroit | United States | 3,639,050 |
| 33 | Bogotá | Colombia | 9,764,769 | 108 | Hanoi | Vietnam | 3,629,493 |
| 34 | Nagoya | Japan | 9,406,264 | 109 | Tel Aviv | Israel | 3,608,265 |
| 35 | Johannesburg | South Africa | 9,398,698 | 110 | Porto Alegre | Brazil | 3,602,526 |
| 36 | Bangkok | Thailand | 9,269,823 | 111 | Kano | Nigeria | 3,587,049 |
| 37 | Hyderabad | India | 8,943,523 | 112 | Salvador | Brazil | 3,582,967 |
| 38 | Chicago | United States | 8,744,835 | 113 | Faisalabad | Pakistan | 3,566,952 |
| 39 | Lahore | Pakistan | 8,741,365 | 114 | Berlin | Germany | 3,563,194 |
| 40 | Tehran | Iran | 8,432,196 | 115 | Aleppo | Syria | 3,561,796 |
| 41 | Wuhan | China | 7,905,572 | 116 | Dakar | Senegal | 3,520,215 |
| 42 | Chengdu | China | 7,555,705 | 117 | Casablanca | Morocco | 3,514,958 |
| 43 | Dongguan | China | 7,434,935 | 118 | Urumqi | China | 3,498,591 |
| 44 | Nanjing | China | 7,369,157 | 119 | Taiyuan | China | 3,481,810 |
| 45 | Ahmadabad | India | 7,342,850 | 120 | Curitiba | Brazil | 3,473,681 |
| 46 | Hong Kong | Hong Kong | 7,313,557 | 121 | Jaipur | India | 3,460,701 |
| 47 | Ho Chi Minh City | Vietnam | 7,297,780 | 122 | Shizuoka | Japan | 3,368,988 |
| 48 | Foshan | China | 7,035,945 | 123 | Hefei | China | 3,347,591 |
| 49 | Kuala Lumpur | Malaysia | 6,836,911 | 124 | San Francisco | United States | 3,300,075 |
| 50 | Baghdad | Iraq | 6,642,848 | 125 | Fuzhou | China | 3,282,932 |
| 51 | Santiago | Chile | 6,507,400 | 126 | Shijiazhuang | China | 3,264,498 |
| 52 | Hangzhou | China | 6,390,637 | 127 | Seattle | United States | 3,248,724 |
| 53 | Riyadh | Saudi Arabia | 6,369,710 | 128 | Addis Ababa | Ethiopia | 3,237,525 |
| 54 | Shenyang | China | 6,315,470 | 129 | Nanning | China | 3,234,379 |
| 55 | Madrid | Spain | 6,199,254 | 130 | Lucknow | India | 3,221,817 |
| 56 | Xi’an | China | 6,043,700 | 131 | Busan | South Korea | 3,216,298 |
| 57 | Toronto | Canada | 5,992,739 | 132 | Wenzhou | China | 3,207,846 |
| 58 | Miami | United States | 5,817,221 | 133 | Ibadan | Nigeria | 3,160,190 |
| 59 | Pune | India | 5,727,530 | 134 | Ningbo | China | 3,131,921 |
| 60 | Belo Horizonte | Brazil | 5,716,422 | 135 | San Diego | United States | 3,107,034 |
| 61 | Dallas | United States | 5,702,641 | 136 | Milan | Italy | 3,098,974 |
| 62 | Surat | India | 5,650,011 | 137 | Yaounde | Cameroon | 3,065,692 |
| 63 | Houston | United States | 5,638,045 | 138 | Athens | Greece | 3,051,899 |
| 64 | Singapore | Singapore | 5,618,866 | 139 | Wuxi | China | 3,049,042 |
| 65 | Philadelphia | United States | 5,585,211 | 140 | Campinas | Brazil | 3,047,102 |
| 66 | Kitakyushu | Japan | 5,510,478 | 141 | Izmir | Turkey | 3,040,416 |
| 67 | Luanda | Angola | 5,506,000 | 142 | Kanpur | India | 3,020,795 |
| 68 | Suzhou | China | 5,472,033 | 143 | Mashhad | Iran | 3,014,424 |
| 69 | Haerbin | China | 5,457,414 | 144 | Puebla | Mexico | 2,984,048 |
| 70 | Barcelona | Spain | 5,258,319 | 145 | Sana’a | Yemen | 2,961,934 |
| 71 | Atlanta | United States | 5,142,140 | 146 | Santo Domingo | Dominican Rep. | 2,945,353 |
| 72 | Khartoum | Sudan | 5,129,358 | 147 | Douala | Cameroon | 2,943,318 |
| 73 | Dar es Salaam | Tanzania | 5,115,670 | 148 | Kiev | Ukraine | 2,941,884 |
| 74 | Saint Petersburg | Russia | 4,992,991 | 149 | Guatemala City | Guatemala | 2,918,337 |
| 75 | Washington D.C. | United States | 4,955,139 | 150 | Caracas | Venezuela | 2,916,183 |
References
- Pareto, V. Cours d’Economie Politique; Librairie Droz: Lausanne, Switzerland, 1897; Volume 2.
- Kleiber, C.; Kotz, S. Statistical Size Distributions in Economics and Actuarial Sciences; John Wiley & Sons: Hoboken, NJ, USA, 2003; Volume 470.
- Finkelstein, M.; Tucker, H.G.; Veeh, J.A. Pareto Tail Index Estimation Revisited. N. Am. Actuar. J. 2006, 10, 1–10.
- Lomax, K.S. Business Failures: Another Example of the Analysis of Failure Data. J. Am. Stat. Assoc. 1954, 49, 847–852.
- Bourguignon, M.; Gallardo, D.I.; Gómez, H.J. A Note on Pareto-Type Distributions Parameterized by Its Mean and Precision Parameters. Mathematics 2022, 10, 528.
- Charpentier, A.; Flachaire, E. Pareto models for top incomes and wealth. J. Econ. Inequal. 2022, 20, 1–25.
- Beirlant, J.; Goegebeur, Y.; Segers, J.; Teugels, J.L. Statistics of Extremes: Theory and Applications; John Wiley & Sons: Chichester, UK, 2004.
- Beirlant, J.; Caeiro, F.; Gomes, M.I. An overview and open research topics in statistics of univariate extremes. REVSTAT-Stat. J. 2012, 10, 1–31.
- Albrecher, H.; Beirlant, J.; Teugels, J.L. Reinsurance: Actuarial and Statistical Aspects; John Wiley & Sons, Ltd.: Chichester, UK, 2017.
- Gomes, M.I.; Guillou, A. Extreme value theory and statistics of univariate extremes: A review. Int. Stat. Rev. 2015, 83, 263–292.
- Peng, L.; Qi, Y. Inference for Heavy-Tailed Data: Applications in Insurance and Finance; Academic Press: Cambridge, MA, USA, 2017.
- Quandt, R.E. Old and new methods of estimation and the Pareto distribution. Metrika 1966, 10, 55–82.
- Lu, H.L.; Tao, S.H. The Estimation of Pareto Distribution by a Weighted Least Square Method. Qual. Quant. 2007, 41, 913–926.
- Caeiro, F.; Martins, A.P.; Sequeira, I.J. Finite sample behaviour of classical and quantile regression estimators for the Pareto distribution. AIP Conf. Proc. 2015, 1648, 540007.
- Kantar, Y.M. Generalized least squares and weighted least squares estimation methods for distributional parameters. REVSTAT-Stat. J. 2015, 13, 263–282.
- Kim, J.H.; Ahn, S.; Ahn, S. Parameter estimation of the Pareto distribution using a pivotal quantity. J. Korean Stat. Soc. 2017, 46, 438–450.
- Brazauskas, V.; Serfling, R. Robust and Efficient Estimation of the Tail Index of a Single-Parameter Pareto Distribution. N. Am. Actuar. J. 2000, 4, 12–27.
- Vandewalle, B.; Beirlant, J.; Christmann, A.; Hubert, M. A robust estimator for the tail index of Pareto-type distributions. Comput. Stat. Data Anal. 2007, 51, 6252–6268.
- Arnold, B.C.; Press, S.J. Bayesian estimation and prediction for Pareto data. J. Am. Stat. Assoc. 1989, 84, 1079–1084.
- Rasheed, H.A.; Al-Gazi, N.A.A. Bayes estimators for the shape parameter of Pareto type I distribution under generalized square error loss function. Math. Theory Model. 2014, 4, 20–32.
- Han, M. The E-Bayesian estimation and its E-MSE of Pareto distribution parameter under different loss functions. J. Stat. Comput. Simul. 2020, 90, 1834–1848.
- Singh, V.P.; Guo, H. Parameter estimations for 2-parameter Pareto distribution by POME. Water Resour. Manag. 1995, 9, 81–93.
- Caeiro, F.; Gomes, M.I. Semi-parametric tail inference through probability-weighted moments. J. Stat. Plan. Inference 2011, 141, 937–950.
- Caeiro, F.; Gomes, M.I. A Class of Semi-parametric Probability Weighted Moment Estimators. In Recent Developments in Modeling and Applications in Statistics; Springer: Berlin/Heidelberg, Germany, 2013; pp. 139–147.
- Munir, R.; Saleem, M.; Aslam, M.; Ali, S. Comparison of different methods of parameters estimation for Pareto Model. Casp. J. Appl. Sci. Res. 2013, 2, 45–56.
- Bhatti, S.H.; Hussain, S.; Ahmad, T.; Aslam, M.; Aftab, M.; Raza, M.A. Efficient estimation of Pareto model: Some modified percentile estimators. PLoS ONE 2018, 13, e0196456.
- Bhatti, S.H.; Hussain, S.; Ahmad, T.; Aftab, M.; Ali Raza, M.; Tahir, M. Efficient estimation of Pareto model using modified maximum likelihood estimators. Sci. Iran. 2019, 26, 605–614.
- Chen, W.; Yang, R.; Yao, D.; Long, C. Pareto parameters estimation using moving extremes ranked set sampling. Stat. Pap. 2019, 62, 1195–1211.
- Greenwood, J.A.; Landwehr, J.M.; Matalas, N.C.; Wallis, J.R. Probability weighted moments: Definition and relation to parameters of several distributions expressable in inverse form. Water Resour. Res. 1979, 15, 1049–1054.
- Hosking, J.R.M.; Wallis, J.R.; Wood, E.F. Estimation of the generalized extreme-value distribution by the method of probability-weighted moments. Technometrics 1985, 27, 251–261.
- Jing, D.; Dedun, S.; Ronfu, Y.; Yu, H. Expressions relating probability weighted moments to parameters of several distributions inexpressible in inverse form. J. Hydrol. 1989, 110, 259–270.
- Landwehr, J.M.; Matalas, N.; Wallis, J. Probability weighted moments compared with some traditional techniques in estimating Gumbel parameters and quantiles. Water Resour. Res. 1979, 15, 1055–1064.
- Landwehr, J.M.; Matalas, N.; Wallis, J. Estimation of parameters and quantiles of Wakeby distributions: 1. Known lower bounds. Water Resour. Res. 1979, 15, 1361–1372.
- Caeiro, F.; Gomes, M.I. Computational Study of the Adaptive Estimation of the Extreme Value Index with Probability Weighted Moments. In Proceedings of the Recent Developments in Statistics and Data Science: SPE2021, Évora, Portugal, 13–16 October 2021; Bispo, R., Henriques-Rodrigues, L., Alpizar-Jara, R., de Carvalho, M., Eds.; Springer International Publishing: Cham, Switzerland, 2022; pp. 29–39.
- Caeiro, F.; Gomes, M.I.; Vandewalle, B. Semi-parametric probability-weighted moments estimation revisited. Methodol. Comput. Appl. Probab. 2014, 16, 1–29.
- Rasmussen, P.F. Generalized probability weighted moments: Application to the generalized Pareto distribution. Water Resour. Res. 2001, 37, 1745–1751.
- Caeiro, F.; Prata Gomes, D. A Log Probability Weighted Moment Estimator of Extreme Quantiles. In Theory and Practice of Risk Assessment; Springer Proceedings in Mathematics & Statistics; Kitsos, C., Oliveira, T., Rigas, A., Gulati, S., Eds.; Springer: Cham, Switzerland, 2015; Volume 136, pp. 293–303.
- Caeiro, F.; Mateus, A. Log Probability Weighted Moments Method for Pareto distribution. In Proceedings of the 17th Applied Stochastic Models and Data Analysis International Conference with the 6th Demographics Workshop, London, UK, 6–9 June 2017; Skiadas, C.H., Ed.; 2017; pp. 211–218.
- Chen, H.; Cheng, W.; Zhao, J.; Zhao, X. Parameter estimation for generalized Pareto distribution by generalized probability weighted moment-equations. Commun. Stat.-Simul. Comput. 2017, 46, 7761–7776.
- Mateus, A.; Caeiro, F. A new class of estimators for the shape parameter of a Pareto model. Comput. Math. Methods 2021, 3, e1133.
- Mateus, A.; Caeiro, F. Confidence intervals for the shape parameter of a Pareto distribution. AIP Conf. Proc. 2022, 2425, 320003.
- Arnold, B.C. Pareto and Generalized Pareto Distributions. In Modeling Income Distributions and Lorenz Curves; Chotikapanich, D., Ed.; Springer: New York, NY, USA, 2008; pp. 119–145.
- Arnold, B.C.; Balakrishnan, N.; Nagaraja, H.N. A First Course in Order Statistics; SIAM: Philadelphia, PA, USA, 1992; Volume 54.
- Ahmad, M.I.; Sinclair, C.D.; Spurr, B.D. Assessment of flood frequency models using empirical distribution function statistics. Water Resour. Res. 1988, 24, 1323–1328.
- Razali, N.M.; Wah, Y.B. Power Comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling Tests. J. Stat. Model. Anal. 2011, 2, 21–33.
- Singla, N.; Jain, K.; Kumar Sharma, S. Goodness of Fit Tests and Power Comparisons for Weighted Gamma Distribution. REVSTAT-Stat. J. 2016, 14, 29–48.
- The 150 Largest Cities in the World. Available online: https://www.worldatlas.com/citypops.htm (accessed on 15 May 2021).
- Bowley, A.L. Elements of Statistics; PS King & Son: London, UK, 1901.
- Horn, P.S. Robust quantile estimators for skewed populations. Biometrika 1990, 77, 631–636.
- Kim, T.H.; White, H. On more robust estimation of skewness and kurtosis. Financ. Res. Lett. 2004, 1, 56–73.
- Brys, G.; Hubert, M.; Struyf, A. A Robust Measure of Skewness. J. Comput. Graph. Stat. 2004, 13, 996–1017.
- Cirillo, P. Are your data really Pareto distributed? Phys. A Stat. Mech. Its Appl. 2013, 392, 5947–5962.
- Clark, D. A Note on the Upper-truncated Pareto distribution. Casualty Actuar. Soc. E-Forum 2013, Winter 1, 1–22.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).