On fitting the Lomax distribution: a comparison between minimum distance estimators and other estimation techniques

In this paper we investigate the performance of a variety of estimation techniques for the scale and shape parameters of the Lomax distribution. These methods include traditional techniques such as the maximum likelihood estimator and the method of moments estimator. A version of the maximum likelihood estimator adjusted for bias is also included. Furthermore, alternative moment-based estimation techniques such as the $L$-moment estimator and the probability weighted moments estimator are included, along with three different minimum distance estimators. The finite sample performances of these estimators are compared via an extensive Monte Carlo study. We find that no single estimator outperforms its competitors uniformly. We recommend one of the minimum distance estimators for use with smaller samples, while a bias-reduced version of the maximum likelihood estimator is recommended for use with larger samples, where the desirable asymptotic properties of maximum likelihood estimation also come into play. We also include a practical application demonstrating the use of the techniques on observed data.


Introduction
The Pareto distribution is a heavy-tailed distribution that was originally developed in the 19th century to model the distribution of income among individuals [34]. However, in the years following its introduction, it has been extensively modified to produce several variants, referred to as the Type I, II, III, and IV Pareto distributions, as well as the so-called generalised Pareto distribution (GPD). The focus of this paper is on the Pareto Type II distribution with location parameter zero, also known as the Lomax distribution. This rather popular distribution was originally introduced to model business failure data [31], but has subsequently been employed as a model in a wide range of settings. It has also been found to be extremely useful in modelling survival times when censoring is present. In [33] the lifetimes of electric power transformers are modelled and it is found that the Lomax distribution provides the best fit among the various distributions considered.
Another important application of the Lomax distribution is found in autoregressive conditional duration (ACD) models, which model high-frequency financial data, where the durations between market events are of interest. Without discussing the details of the model, we briefly describe the role of the Lomax distribution in ACD models (for an in-depth exposition of ACD models, see [18] and [41]). We only note that the error terms of the ACD model are historically assumed to be exponential or Weibull distributed; see, for example, [32]. However, in [7] and [8], it is found that these choices do not adequately describe the characteristics of the observed data. In [14], the authors advocate the use of infinite mixtures of exponential distributions to model the distribution of the error term, finding that this choice improves the fit of the model.
If the scale parameter of this mixing exponential distribution follows an inverse Gaussian distribution, then the resulting distribution of the error term is Lomax. It is therefore important for the efficient implementation of these ACD models that we are able to accurately estimate the parameters of this distribution. Another important area of ongoing research is goodness-of-fit tests for the Lomax and related distributions, as these tests check the assumptions underlying the ACD models.
Other scenarios in which the Lomax distribution has been used as a model include wealth distribution [4], the distribution of queueing service times [23], life testing [24], and the sizes of files on a computer server [25], to name but a few.
We say a random variable $X$ follows a Lomax distribution with scale parameter $\sigma > 0$ and shape parameter $\beta > 0$ if its cumulative distribution function (CDF) is $F_{\sigma,\beta}(x) = 1 - (1 + x/\sigma)^{-\beta}$, $x > 0$, with probability density function (PDF) $f_{\sigma,\beta}(x) = (\beta/\sigma)(1 + x/\sigma)^{-(\beta+1)}$, $x > 0$.
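For concreteness, these two formulas and the corresponding inverse-transform sampler can be sketched in Python as follows (a minimal illustration only; the paper's own computations are done in R, and the function names here are our own):

```python
import math
import random

def lomax_cdf(x, sigma, beta):
    """CDF of the Lomax distribution: F(x) = 1 - (1 + x/sigma)^(-beta), x > 0."""
    return 1.0 - (1.0 + x / sigma) ** (-beta)

def lomax_pdf(x, sigma, beta):
    """PDF of the Lomax distribution: f(x) = (beta/sigma)*(1 + x/sigma)^(-(beta+1)), x > 0."""
    return (beta / sigma) * (1.0 + x / sigma) ** (-(beta + 1.0))

def lomax_sample(n, sigma, beta, rng=random):
    """Inverse-transform sampling: solving F(x) = u gives
    F^{-1}(u) = sigma*((1 - u)^(-1/beta) - 1)."""
    return [sigma * ((1.0 - rng.random()) ** (-1.0 / beta) - 1.0) for _ in range(n)]
```

Samplers of this kind are used below to generate the Monte Carlo data sets.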
In this paper, we investigate different approaches used for estimating the parameters of the Lomax distribution. We also draw parallels with the methods used to estimate the parameters of other Pareto distributions in order to ascertain how one might go about obtaining reliable estimators. For example, when considering the estimation of the parameters of the GPD, a number of approaches such as maximum likelihood estimation (MLE), method of moments estimation (MME), and probability weighted moments estimation (PWME) have been studied. However, not all approaches are equally viable in all settings. It is noted in [28], for example, that the MLE experiences difficulty when estimating the parameters of the GPD under a variety of configurations of the shape and scale parameter values. Similarly, [15] find that the usefulness of the MLE is restricted to very large sample sizes; the use of the PWME is advocated in the case of moderate sample sizes. For the Lomax distribution in particular, similar studies have been conducted in, for example, [20] and [38]. These papers are concerned with the estimation of the Lomax parameters using diverse traditional methods of estimation, including MLE, MME, and PWME. The results of these methods are compared to one another and recommendations are made concerning the use of these more traditional estimation approaches. It is also found in these papers that MLE approaches are not particularly reliable when used with moderately sized samples. The conclusion from these studies is that these traditional approaches have utility in certain settings, but not all.
The aim of this paper is twofold. First, we provide an overview of estimation techniques used for the Lomax distribution and we compare the performance of the various estimators. Second, we focus on the minimum distance estimators (MDEs), which represent a largely unexplored avenue for the estimation of the parameters of the Lomax distribution. While some of these procedures have already been examined in [29] for the GPD, focusing on the Lomax distribution provides specific insight into the usefulness of these methods for this distribution. In addition, the MDEs we consider differ substantially from those proposed in [29] and [5]: those papers consider only density power divergence measures, whereas we additionally explore some distribution function based methods.
Generally speaking, MDEs measure the difference between an empirical estimate of the probability density or distribution function and the corresponding theoretical function [6]. The use of minimum distance estimation has been advocated by a number of authors when modelling data containing extreme values, including [11], [36] and [35], since these estimators have good robustness qualities.
This paper studies the general case where both the scale (σ) and shape (β) parameters of the Lomax distribution are unknown. We start by considering the many traditional estimation methods proposed for related distributions, including the L-moment estimator, the PWME, the MLE, and the MME. We then go on to propose the use of MDEs in the hope that these will be competitive alternatives to the more traditional estimators. The performances of the estimators are compared to one another via a comprehensive numerical study involving Monte Carlo simulations, in which the finite sample variance, relative bias, and mean squared error (MSE) of each estimator are approximated. In addition, an omnibus measure allowing one to gauge the MSE of the estimation of both σ and β simultaneously is employed in the simulation study.
The remainder of the article is organised as follows. In Section 2, the different parameter estimation methods are introduced and discussed. Thereafter, the Monte Carlo study is presented in Section 3, including a detailed discussion of its results. In Section 4, we apply the different estimators to a real-world example. Finally, in Section 5, we conclude the paper by making a number of general observations regarding the estimators and provide some recommendations regarding the use of each.

Estimation of parameters
In this section, we discuss the parameter estimation methods which are used to estimate the shape and scale parameters of the Lomax distribution. Included here are traditional methods of parameter estimation, along with methods based on optimising a variety of distance measures. In addition to the performance of the MDEs, we provide an overview of the performance of a wide variety of competing estimators. This facilitates recommendations regarding the choice of estimator to use in various settings. The theoretical detail relevant to the estimation of the parameters is provided in each case.
For the remainder of the section, we assume that we have data $X_1, X_2, \ldots, X_n$ which are independent and identically distributed (i.i.d.) from the Lomax distribution with parameters β > 0 and σ > 0. The order statistics based on this sample are denoted by $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$.

Method of moment estimators (MMEs)
The MME is one of the most widely used estimation techniques. It has been used to investigate key aspects of probability distributions, such as central tendency, spread, skewness, and kurtosis. However, because only the moments of order lower than β are finite for the Lomax distribution with shape parameter β, this approach is often not ideal for the parameters of the Lomax distribution. It is ultimately presented here, and in the Monte Carlo study in Section 3.1, to illustrate its performance for a range of values of β.
To derive the MMEs for the Lomax distribution, we note that the $r$th theoretical raw moment of the distribution is $\mu_r' = E(X^r) = \sigma^r\, r! / \prod_{i=1}^{r}(\beta - i)$, which is finite only when $\beta > r$. The first and second theoretical raw moments are then given by $\mu_1 = \sigma/(\beta - 1)$ and $\mu_2 = 2\sigma^2/((\beta - 1)(\beta - 2))$. (1) From (1) we find that we can express β and σ as $\beta = 2(\mu_2 - \mu_1^2)/(\mu_2 - 2\mu_1^2)$ and $\sigma = \mu_1(\beta - 1)$. (2) By simply substituting $\mu_1$ and $\mu_2$ with their sample estimators in (2), the MMEs of these parameters are obtained.
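The substitution step can be sketched in Python as follows (a hedged illustration, assuming the standard Lomax raw moments $\mu_1 = \sigma/(\beta-1)$ and $\mu_2 = 2\sigma^2/((\beta-1)(\beta-2))$; the function name is our own and the paper's own computations are done in R):

```python
def lomax_mme(x):
    """Method-of-moments estimates (beta_hat, sigma_hat) for the Lomax distribution.

    Inverts mu1 = sigma/(beta-1), mu2 = 2*sigma^2/((beta-1)(beta-2)) to
    beta = 2*(mu2 - mu1^2)/(mu2 - 2*mu1^2) and sigma = mu1*(beta - 1).
    The population second moment is finite only when the true beta > 2.
    """
    n = len(x)
    mu1 = sum(x) / n                    # first sample raw moment
    mu2 = sum(v * v for v in x) / n     # second sample raw moment
    beta_hat = 2.0 * (mu2 - mu1 ** 2) / (mu2 - 2.0 * mu1 ** 2)
    sigma_hat = mu1 * (beta_hat - 1.0)
    return beta_hat, sigma_hat
```

The inversion is exact when the sample moments equal the population moments, which is a convenient sanity check.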

L-moment estimators (LMEs)
L-moments provide an alternative way to describe the shapes of probability distributions and are defined to be the expected values of linear combinations of order statistics. They were proposed in [27] as robust alternatives to classical moments. The method of L-moments has advantages over classical moments, including unbiasedness, robustness to the presence of outliers in the data, and being less sensitive to sampling variability, especially with data having heavy tails (which is the case in the current context). If the mean of the distribution exists, it follows that all of the L-moments exist and uniquely define the distribution [27]. The $r$th population L-moment is given by $\lambda_r = r^{-1} \sum_{k=0}^{r-1} (-1)^k \binom{r-1}{k} E(X_{r-k:r})$, where $r$ is the integer order of the L-moment and $E(X_{r-k:r})$ is the expectation of the $(r-k)$th order statistic of a sample of size $r$. The first two sample L-moments are $l_1 = \bar{X}$ and $l_2 = \frac{1}{n(n-1)} \sum_{i=1}^{n} (2i - n - 1) X_{(i)}$. Similar to the method of moments, the L-moments can be used to obtain parameter estimates by equating the theoretical L-moments of the distribution to the corresponding sample L-moments. For the Lomax distribution, $\lambda_1 = \sigma/(\beta - 1)$ and $\lambda_2 = \sigma\beta/((\beta - 1)(2\beta - 1))$, so the L-moment estimators of the parameters are $\hat{\beta}_{LM} = l_2/(2l_2 - l_1)$ and $\hat{\sigma}_{LM} = l_1(\hat{\beta}_{LM} - 1)$; see [38]. L-moments can be a good starting point for the iterative numerical procedure needed to obtain maximum likelihood estimates [26].
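A Python sketch of the LME follows (hedged: the closed-form inversion below is derived from the Lomax L-moments $\lambda_1 = \sigma/(\beta-1)$ and $\lambda_2 = \sigma\beta/((\beta-1)(2\beta-1))$, which correspond to the standard GPD L-moments under the reparameterisation $\xi = 1/\beta$; the exact presentation in [38] may be parameterised differently, and the function names are ours):

```python
def sample_lmoments(x):
    """First two sample L-moments (unbiased versions):
    l1 = mean, l2 = (1/(n(n-1))) * sum_i (2i - n - 1) * x_(i)."""
    xs = sorted(x)
    n = len(xs)
    l1 = sum(xs) / n
    l2 = sum((2 * (i + 1) - n - 1) * v for i, v in enumerate(xs)) / (n * (n - 1))
    return l1, l2

def lomax_lme(x):
    """L-moment estimates (beta_hat, sigma_hat): inverting
    lambda1 = sigma/(beta-1), lambda2 = sigma*beta/((beta-1)(2beta-1))
    gives beta = l2/(2*l2 - l1) and sigma = l1*(beta - 1).
    (The denominator 2*l2 - l1 can be non-positive for atypical samples.)"""
    l1, l2 = sample_lmoments(x)
    beta_hat = l2 / (2.0 * l2 - l1)
    sigma_hat = l1 * (beta_hat - 1.0)
    return beta_hat, sigma_hat
```

Being closed-form, these estimates are also convenient starting values for the iterative optimisers used elsewhere in the paper.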

Probability weighted moment estimators (PWMEs)
PWMEs were proposed in [22] as a tool for estimating the parameters of probability distributions. Probability weighted moments of a random variable $X$ with CDF $F(\cdot)$ are quantities defined as $M_{p,u,v} = E[X^p F(X)^u (1 - F(X))^v]$, where $p$, $u$, and $v$ are integers. Useful special cases are $u = v = 0$, yielding the conventional non-central moments, and $p = 1$ with either $u$ or $v$ set to zero. These quantities permit unbiased estimators such as $\hat{M}_{1,0,0} = \bar{X}$ and $\hat{M}_{1,0,1} = \frac{1}{n} \sum_{j=1}^{n} \frac{n-j}{n-1} X_{(j)}$ [see 17, 38]. Finally, as with L-moment and conventional moment estimators, the PWMEs are obtained by equating the expressions for $M_{p,u,v}$ to their estimators and solving the resulting equations. In this case, the moments $M_{1,0,0} = \sigma/(\beta - 1)$ and $M_{1,0,1} = \sigma/(2(2\beta - 1))$ are used. For small samples, both PWMEs and L-moment estimators perform well. Indeed, because these quantities are linear combinations of each other, the calculation of one implies the calculation of the other, and thus inferences based on either are identical [27, 3].
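Using the unbiased sample PWMs, the estimation step can be sketched in Python as follows (a hedged illustration; the population moment $M_{1,0,1} = \sigma/(2(2\beta-1))$ is derived directly from the Lomax CDF, the inversion below is our own algebra rather than a formula quoted from [22] or [38], and the function name is ours):

```python
def lomax_pwme(x):
    """PWM estimates (beta_hat, sigma_hat) for the Lomax distribution.

    Equates the unbiased sample versions of M_{1,0,0} = E[X] = sigma/(beta-1)
    and M_{1,0,1} = E[X(1-F(X))] = sigma/(2(2beta-1)) to their population
    forms.  With rho = M_{1,0,0}/M_{1,0,1} = 2(2beta-1)/(beta-1) (> 4 for
    beta > 1), solving gives beta = (rho - 2)/(rho - 4), sigma = m0*(beta-1).
    """
    xs = sorted(x)
    n = len(xs)
    m0 = sum(xs) / n                                                  # unbiased M_{1,0,0}
    m1 = sum(v * (n - (j + 1)) / (n - 1.0)
             for j, v in enumerate(xs)) / n                           # unbiased M_{1,0,1}
    rho = m0 / m1
    beta_hat = (rho - 2.0) / (rho - 4.0)
    sigma_hat = m0 * (beta_hat - 1.0)
    return beta_hat, sigma_hat
```

Consistent with the equivalence noted above, this returns the same estimates as the L-moment estimator, since $l_2 = M_{1,0,0} - 2 M_{1,0,1}$.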

Maximum likelihood estimators (MLEs)
Maximum likelihood estimation is one of the most popular methods for estimating the unknown parameters of probability distributions. The popularity of MLEs can be attributed to their desirable asymptotic properties, such as consistency and asymptotic unbiasedness. However, for small sample sizes, these properties may not hold, resulting in biased MLEs.
In this section, we first use the MLEs to estimate the unknown parameters, and then, in the section that follows, we consider a bias-adjusted approach which reduces the bias of the MLEs to order $O(n^{-1})$; see [20].

MLE of the Lomax distribution
The MLEs for the Lomax distribution are obtained by maximising the log-likelihood function $\ell(\sigma, \beta) = n \log \beta - n \log \sigma - (\beta + 1) \sum_{i=1}^{n} \log(1 + X_i/\sigma)$. (3) Differentiating the log-likelihood with respect to β and σ, respectively, yields the equations $\partial\ell/\partial\beta = n/\beta - \sum_{i=1}^{n} \log(1 + X_i/\sigma)$ (4) and $\partial\ell/\partial\sigma = -n/\sigma + \frac{\beta + 1}{\sigma} \sum_{i=1}^{n} \frac{X_i}{\sigma + X_i}$. (5) Unfortunately, one cannot obtain closed-form solutions for both estimators using (4) and (5), so numerical procedures are required for optimisation purposes.
To simplify the optimisation problem, the expressions are reduced to the optimisation of a single parameter (this is the approach followed in [16] using their flomax function). This is accomplished by first noting that setting (4) to zero and solving gives the MLE $\hat{\beta}_{ML} = n / \sum_{i=1}^{n} \log(1 + X_i/\hat{\sigma}_{ML})$. (6) The maximised log-likelihood, $\hat{\ell} := \ell(\hat{\beta}_{ML}, \hat{\sigma}_{ML}; X_1, \ldots, X_n)$, can therefore be expressed solely in terms of the MLE $\hat{\sigma}_{ML}$ by substituting the form of $\hat{\beta}_{ML}$ given in (6) into (3). Then, using R's one-dimensional optimisation function, optimize [37], the MLEs $\hat{\sigma}_{ML}$ and $\hat{\beta}_{ML}$ (via (6)) are obtained.

MLEs of the Lomax distribution adjusted for bias (MLE.b)
For small sample sizes, the MLEs sometimes perform poorly. In order to address this problem, [20] proposed the use of second-order bias-adjusted versions of the MLEs. To obtain these estimators, the bias of the MLE is first approximated and then subtracted from the MLE to produce the bias-corrected estimate.
Let $K$ be the Fisher information matrix for the Lomax distribution. The $O(n^{-1})$ bias is obtained in [20] in terms of $K$ and a matrix $A$, where $\mathrm{vec}(\cdot)$ is an operator that, when applied to an $(m \times n)$ matrix, produces a single column vector of dimension $(mn \times 1)$ by stacking the column vectors below one another. The bias-adjusted estimators proposed in [20] are then obtained by subtracting an estimator of this approximated bias from the original MLEs. (7) The bias-adjusted estimators in (7) have simple closed-form expressions, making them computationally appealing. It is expected that the bias-adjusted estimators, $\hat{\sigma}_{MLB}$ and $\hat{\beta}_{MLB}$, will outperform $\hat{\sigma}_{ML}$ and $\hat{\beta}_{ML}$, respectively, for smaller to moderate samples.

Minimum distance estimators (MDEs)
MDEs are estimators minimising, for example, some distance measure between the theoretical CDF and the empirical distribution function (EDF) of the observed sample. We define the EDF as $F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I(X_i \le x)$, where $I(\cdot)$ is the indicator function and $n$ is the sample size. The main idea is to gauge how close the theoretical CDF is to the EDF; the distance between two probability distributions captures the difference in information between them. Minimum distance estimation was first subjected to in-depth study in a series of papers discussed in [43] and has since been considered as a method for deriving efficient and robust estimators by [9], [10], [11] as well as [36], among others. Several minimum distance estimation methods have been proposed, including those derived from the empirical distribution function, such as the Cramér-von Mises statistic, and those derived from entropy, such as the Kullback-Leibler divergence. In this section we explore several distance measures: the Cramér-von Mises (CVM) statistic, least squares, and some phi-divergence measures, namely the Kullback-Leibler (KL) divergence, the total variation (TV) distance, and the chi-square divergence.
In Section 3 we calculate these MDEs using the optim function in R, specifically we use the "BFGS" optimisation method; see [13], [19], [21] and [39].This is a quasi-Newton method reliant on gradients to perform optimisation.As initial values we use the LMEs as they are available in closed-form and the LMEs generally perform well compared to other estimators.

A Cramér-von Mises distance measure (MDE.CvM)
The Cramér-von Mises (CVM) statistic is based on the squared integrated difference between the EDF and the theoretical distribution. It has the form $n \int_{-\infty}^{\infty} (F_n(x) - F(x))^2 \, dF(x)$.
We can define the CVM distance measure used to estimate the parameters of the Lomax distribution via the MDE approach as $D_n^{CM}(\sigma, \beta) = n \int_{0}^{\infty} (F_n(x) - F_{\sigma,\beta}(x))^2 f_{\sigma,\beta}(x) \, dx$, where $F_{\sigma,\beta}(x)$ and $f_{\sigma,\beta}(x)$ are the CDF and PDF of the Lomax distribution with scale parameter σ and shape parameter β, respectively. This distance measure permits a simple calculable form given by $D_n^{CM}(\sigma, \beta) = \frac{1}{12n} + \sum_{j=1}^{n} \left( F_{\sigma,\beta}(X_{(j)}) - \frac{2j-1}{2n} \right)^2$. Finally, the resulting estimators for σ and β are simply the values of those parameters that minimise this distance measure: $(\hat{\sigma}_{CM}, \hat{\beta}_{CM}) = \arg\min_{\sigma,\beta} D_n^{CM}(\sigma, \beta)$.
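This estimator can be sketched in Python as follows (hedged: the distance below is the standard CvM computing formula, and a crude coordinate-wise search stands in for the BFGS routine the paper uses via R's optim; the function names are ours):

```python
import math

def cvm_distance(x, sigma, beta):
    """CvM distance in its calculable form:
    D = 1/(12n) + sum_j (F(x_(j)) - (2j-1)/(2n))^2, with F the Lomax CDF."""
    xs = sorted(x)
    n = len(xs)
    d = 1.0 / (12.0 * n)
    for j, v in enumerate(xs, start=1):
        f = 1.0 - (1.0 + v / sigma) ** (-beta)
        d += (f - (2.0 * j - 1.0) / (2.0 * n)) ** 2
    return d

def lomax_mde_cvm(x, start, steps=60):
    """Minimise the CvM distance over (sigma, beta) by a simple
    multiplicative coordinate search; start = (sigma0, beta0),
    e.g. the closed-form L-moment estimates, as advocated in the text."""
    sigma, beta = start
    step = 0.5
    for _ in range(steps):
        improved = False
        for ds, db in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            s2, b2 = sigma * (1.0 + ds), beta * (1.0 + db)
            if s2 > 0 and b2 > 0 and cvm_distance(x, s2, b2) < cvm_distance(x, sigma, beta):
                sigma, beta = s2, b2
                improved = True
        if not improved:
            step /= 2.0  # refine the search once no move helps
    return sigma, beta
```

Any sensible optimiser can replace the search loop; only the distance function is specific to this estimator.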

A 'squared difference' distance estimator (MDE.SD)
An alternative to the Cramér-von Mises distance measure is the much simpler 'squared difference' measure, which focuses only on the squared difference between the quantities $F_n$ and $F$. Using the Lomax distribution function $F_{\sigma,\beta}(\cdot)$, the corresponding distance measure $D_n^{SD}(\sigma, \beta)$ admits a tractable calculation form, and the estimators are obtained by minimising it: $(\hat{\sigma}_{SD}, \hat{\beta}_{SD}) = \arg\min_{\sigma,\beta} D_n^{SD}(\sigma, \beta)$.

The φ-divergence distance measures
The φ-divergence distance measures represent a broad class of distance measures describing the distance between two densities $f$ and $g$, defined as $D_\phi(f, g) = \int g(x) \, \phi\!\left(\frac{f(x)}{g(x)}\right) dx$, (8) where $\phi(\cdot)$ is a convex function such that $\phi(1) = 0$ and $\phi''(1) > 0$. In (8), we set $g(x)$ to be the Lomax density function, $f_{\sigma,\beta}(x)$, and set $f(x)$ to be some empirical estimator, $\hat{f}(x)$, of the density, to obtain the distance measure which is to be minimised in order to estimate the parameters σ and β.
The required parameter estimates are $(\hat{\sigma}, \hat{\beta}) = \arg\min_{\sigma,\beta} D_\phi(\hat{f}, f_{\sigma,\beta})$. In this setting, we estimate the density using the kernel density estimator $\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} k\!\left(\frac{x - X_i}{h}\right)$, where $k(\cdot)$ is the kernel function, chosen to be the standard normal density function, and $h$ is the bandwidth, chosen using Silverman's rule of thumb $h = 0.9 n^{-1/5} \min(s, (q_3 - q_1)/1.34)$, where $q_1$ and $q_3$ denote the sample quartiles and $s$ denotes the sample standard deviation [40]. The practical implementation of the kernel density estimator is executed using the density function in R [37]. Different choices of the function $\phi(\cdot)$ lead to different forms of this distance measure; we consider three choices of $\phi(\cdot)$, described below.
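The kernel density estimator with Silverman's bandwidth can be sketched in Python as follows (a hedged illustration; the quartile convention below is a simple linear interpolation and may differ slightly from the defaults of R's density function, which the paper actually uses):

```python
import math
import statistics

def silverman_bandwidth(x):
    """Silverman's rule of thumb: h = 0.9 * n^(-1/5) * min(s, (q3 - q1)/1.34)."""
    n = len(x)
    s = statistics.stdev(x)
    xs = sorted(x)

    def quantile(p):
        # simple linearly interpolated sample quantile
        k = p * (n - 1)
        f = int(k)
        return xs[f] + (k - f) * (xs[min(f + 1, n - 1)] - xs[f])

    iqr = quantile(0.75) - quantile(0.25)
    return 0.9 * n ** (-0.2) * min(s, iqr / 1.34)

def kde(x, h=None):
    """Gaussian kernel density estimator f_h(t) = (1/(nh)) sum_i k((t - x_i)/h),
    with k the standard normal density. Returns a callable density."""
    if h is None:
        h = silverman_bandwidth(x)
    n = len(x)

    def f(t):
        return sum(math.exp(-0.5 * ((t - v) / h) ** 2)
                   for v in x) / (n * h * math.sqrt(2.0 * math.pi))

    return f
```

The returned callable is what plays the role of $\hat{f}$ in the φ-divergence distances below.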
Kullback-Leibler φ-divergence distance measure (MDE.KL) The Kullback-Leibler φ-divergence distance measure can be obtained by setting $\phi(t) = t \log t$. The resulting distance measure is $D_{KL}(\hat{f}, f_{\sigma,\beta}) = \int \hat{f}(x) \log\frac{\hat{f}(x)}{f_{\sigma,\beta}(x)} \, dx$, and we denote the estimator by $(\hat{\sigma}_{KL}, \hat{\beta}_{KL})$. [5] note that this Kullback-Leibler form simply represents a robust extension of the MLE. This approach is also followed in [29], where a general class of density power divergence estimators is employed to estimate the parameters of the GPD.

Finite sample results
Below we consider the Monte Carlo setting used before turning our attention to the results obtained.

Monte Carlo simulation settings
In order to compare the performance of the various estimators, a comprehensive Monte Carlo simulation study was conducted. The simulation study sets out to approximate several distributional properties of the estimators, including the expected value, variance, relative bias (RB) and mean squared error (MSE). Using these approximations, the estimators can be compared to one another in a sensible manner.
The Monte Carlo study was conducted by simulating MC = 10 000 samples of size n ∈ {30, 50, 100, 200, 500} from the Lomax distribution using a variety of parameter settings. For the parameter σ, the values used were σ ∈ {0.5, 1, 2}, whereas for the parameter β, we used β ∈ {1.1, 1.5, 2, 2.1}. We consider only values of β exceeding 1, since the mean of the distribution is infinite if β ≤ 1; see (1). Furthermore, the value β = 2.1 is included so as to consider a case where the variance exists. Note that, for sample size n = 30, we omit the results pertaining to σ = 0.5: the majority of the simulations in this setting produced data sets for which the optimisation procedures failed to converge, rendering the results unusable.
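A small-scale sketch of such a study follows in Python (hedged: this is not the paper's R code; the L-moment estimator is repeated inline so the sketch is self-contained, and we take the combined TMSE to be the sum of the two per-parameter MSEs, which is our reading of the combined measure described below):

```python
import random

def lomax_sample(n, sigma, beta, rng):
    # inverse-transform sampling from the Lomax CDF
    return [sigma * ((1.0 - rng.random()) ** (-1.0 / beta) - 1.0) for _ in range(n)]

def lme(x):
    # closed-form L-moment estimator, repeated here for self-containment
    xs = sorted(x)
    n = len(xs)
    l1 = sum(xs) / n
    l2 = sum((2 * (i + 1) - n - 1) * v for i, v in enumerate(xs)) / (n * (n - 1))
    beta_hat = l2 / (2.0 * l2 - l1)
    return beta_hat, l1 * (beta_hat - 1.0)

def monte_carlo(estimator, n, sigma, beta, mc=1000, seed=1):
    """Approximate the relative bias of beta_hat and the MSEs of both
    parameter estimators, plus the combined TMSE, over mc simulated samples."""
    rng = random.Random(seed)
    bs, ss = [], []
    for _ in range(mc):
        b, s = estimator(lomax_sample(n, sigma, beta, rng))
        bs.append(b)
        ss.append(s)
    rb_beta = (sum(bs) / mc - beta) / beta
    mse_beta = sum((b - beta) ** 2 for b in bs) / mc
    mse_sigma = sum((s - sigma) ** 2 for s in ss) / mc
    return rb_beta, mse_beta, mse_sigma, mse_beta + mse_sigma
```

Any of the estimators discussed in Section 2 can be plugged in for `estimator`, provided it returns a (beta, sigma) pair.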
For each randomly generated sample the parameters were estimated and collected to ultimately obtain the approximations of the distributional properties mentioned.All calculations were done using R [37].
In addition to calculating the MSE for each parameter separately, a combined MSE yielding a single value was also calculated as follows. For every pair of estimates, define $\mathrm{TMSE} = \sum_{j=1}^{2} \varsigma_j$, where $\varsigma_1$ and $\varsigma_2$ denote the MSEs of the estimators of σ and β, respectively, as the measure of the total mean squared error (TMSE) of the estimation technique.
Tables 1 to 8 present the Monte Carlo approximations of the expected value, variance, RB and MSE for all estimators of σ and β, as well as the TMSE described above, for sample sizes n = 30, 50, and 100. The remaining tables, for sample sizes n = 200 and 500, are presented in the appendix. The RB of an estimator is defined to be its bias divided by the true parameter value. Figures 1 and 2 show the logarithm of the TMSE as a function of sample size. We use a logarithmic scale as this increases the separation between the lines on the graph, making the figures easier to interpret.

Results
To get a better understanding of the performance of the estimators considered in this study, we present a brief discussion of Tables 1 to 14. We compare the variance, RB, and MSE performance of the estimators in the various parameter settings of the Monte Carlo study.
We start by noting that the RB, variance, and MSE of each of the estimation methods all improve as the sample size increases. The most notable improvements in overall MSE as the sample size increases from 30 to 500 are in the MLE, MDE.TV and MME; however, as mentioned later, the MME should likely not be considered for estimation even for large samples. Interestingly, we find that the methods that perform well for smaller sample sizes do not always perform well for larger samples when compared to the competitor methods. In particular, when the sample size is small (n = 30 or n = 50), one can see that the MDE.SD, MDE.CvM and LME have, in the majority of cases, the smallest variance, RB and MSE values. Note that this small sample behaviour of the variance and MSE measures holds for all but the parameter value β = 1.1. However, when the sample size is small and the parameter β is specified to be small, there are instances where the MDE.CvM exhibits comparatively larger bias than the MDE.SD, LME, and even MLE.b. It is thus clear that the underlying parameter values also play a role in the relative performance of the various estimation techniques considered. Finally, when considering the RB for the various estimators, we note that the MDE.SD most frequently gives rise to negative RB values. Based on the generally positive RB values associated with the remaining estimators, it therefore seems that these other estimators tend to overestimate the true parameter value. This behaviour of the MDE.SD (and, indeed, of all other MDE-type estimators considered) is difficult to explain, as their distributional properties depend on the parameters and do not permit simple closed-form expressions. Note that while the MDE.KL, MDE.SD and LME are also relatively good estimators (in an MSE sense) when the sample size is large (n = 200 or n = 500), the MLE method has the best performance among all methods considered in this setting.
Next, we consider the behaviour of the estimators as the parameter settings of the Monte Carlo study change. As β increases from 1.1 to 2, the approximated variance, RB, and MSE of the estimators of both β and σ tend to increase as well. However, these values start to decrease again when β is strictly larger than 2.
When the value of σ increases from 0.5 to 2, the simulation approximation of the variance of the estimators of σ generally increases for all of the estimation methods. The same is not necessarily true for the estimation of β: there are instances where the approximated variance of the estimator of β (using a given estimation method) decreases when the true σ increases from 1 to 2, and instances where it increases.
When comparing the class of estimators based on minimum distance measures to the remaining methods, one can readily see that the MDE.SD and MDE.CvM estimators are the best performers in terms of the MSE for small to moderate samples. The LME is an excellent competitor, but the MLE and MLE.b estimators perform relatively poorly in terms of the MSE in these cases. However, for larger sample sizes, the MLE and MLE.b clearly outperform the entire MDE class of estimators. Interestingly, the LME remains competitive here too, but is almost never found to have the best MSE performance.
The performances of the estimators within the class of MDE methods also differ wildly from one another. We have already noted that the MDE.SD and MDE.CvM methods generally have good MSE performance for small to moderate sample sizes, but within this class we find that the MDE.TV and MDE.χ² have some of the worst performances. The MDE.KL method is something of a mixed bag: some parameter settings yield good MSE performance, while others yield extremely poor MSE performance. Generally, however, we can state that the MDE.KL method is a better performer than the MDE.TV and MDE.χ², although this still does not place it near the top performers among the estimation methods considered.
The overall worst performing estimator is the MME, regardless of which value of β or σ is considered. While this estimator improves with an increase in sample size, it still has the worst performance when compared to all other estimation methods.
In Figures 1 and 2, the logarithm of the TMSE measure is plotted against the sample sizes in order to produce a more intuitive representation of the behaviour of a selection of estimators over varying sample sizes. The estimators MLE, MLE.b, LME, MDE.CvM and MDE.SD are selected for inclusion in these figures since, according to the tables, these estimators generally perform well (the TMSE values of the remaining estimators can be found in the tables). We see that, as expected, the TMSE measure generally decreases as the sample size increases. We note, however, that the MLE and MLE.b accuracies are erratic for small samples; see, for example, the graphs associated with β = 2 in Figures 1 and 2. Furthermore, we see that while the MLE and MLE.b techniques have the worst TMSE performance for small sample sizes, this trend reverses as the sample sizes increase. Conversely, it is also clear that the MDE.CvM and MDE.SD estimators perform well, in terms of TMSE, for smaller sample sizes, in many cases being outperformed only when the sample sizes increase to n = 100 and higher. It should be noted that the LME also performs well for smaller sample sizes. Furthermore, the numerical results indicate that the relative performance of the estimators depends, to some degree, on the specific parameter values considered.

Remark:
The extremely large values of the variance reported for the MLE and MLE.b estimators can be attributed to the difficulty of estimating the parameters for small sample sizes. The large variances appear because, in a small number of aberrant simulated data sets, the estimation procedure produces extremely large estimates for the parameters. However, if we calculate the 'trimmed' variance, that is, the variance of the smallest (100 − α)% of the parameter estimates, the variance drastically reduces to more reasonable values. For example, in the case where n = 30, β = 1.5 and σ = 1, we find that the approximate variance of the MLEs is 537.8, but, trimming only the 1% largest estimates, the trimmed variance is approximately 5.5.

Practical application
The various Lomax parameter estimation techniques are now applied to a set of observed losses resulting from wind related catastrophes. This data set is also analysed in [12] and [2]. The data set comprises the monetary expenses incurred as a result of wind related catastrophes in 40 separate instances during 1977 (these values are rounded to the nearest million US dollars). This rounding causes unrealistic clustering in the data, which may lead to problems when fitting the Lomax distribution. In order to circumvent the associated problems, we use the de-grouping algorithm discussed in the two papers mentioned above. This algorithm replaces the values in each group of tied observations with the expected values of the order statistics of the uniform distribution with the same range. That is, if there are $k$ observations in an interval $(l, u)$, we replace the observations by $l + (u - l)\frac{j}{k+1}$, for $j \in \{1, \ldots, k\}$. We emphasise that this de-grouping algorithm does not change the mean of the data. The de-grouped data can be found in Table 9. Since the observed (rounded) minimum amount of damage is 2, we conclude that no value less than 1.5 was observed. As a result, we subtract 1.5 from each of the observed values so as to ensure that the support of the distribution is appropriate. Assuming a Lomax distribution for the shifted data, we use each of the estimation techniques to obtain the parameter estimates of the distribution. The resulting parameter estimates are shown in Table 10. The table indicates that, with the exception of MDE.TV and MME, the parameter estimates obtained are closely related to each other. The difference between the estimates arrived at using these two estimators and those obtained using the remaining estimators is unsurprising, given the irregular performance of these two estimators observed in the Monte Carlo study above. The code for this example can be found here: https://tinyurl.com/LomaxPracticalCode Figure 3 shows the EDF of the data together with several fitted distribution functions obtained using the MLE, MLE.b, LME and MDE.CvM. Visual inspection of Figure 3 indicates that, while minor deviations from the Lomax distribution are present, the Lomax provides a suitable model for the wind catastrophe data.
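The de-grouping step described above can be sketched in Python as follows (a hedged illustration: for data rounded to the nearest unit we take the interval for a value $v$ to be $(v - 0.5, v + 0.5)$, which is our assumption rather than a detail stated in the text, and the function name is ours):

```python
from collections import Counter

def degroup(values, half_width=0.5):
    """Replace each group of k tied (rounded) observations at value v by the
    expected order statistics of the uniform distribution on
    (v - half_width, v + half_width): l + (u - l)*j/(k + 1), j = 1, ..., k.
    The group mean is preserved, since these expected order statistics
    average to the interval midpoint v."""
    counts = Counter(values)
    out = []
    for v in sorted(counts):
        k = counts[v]
        l, u = v - half_width, v + half_width
        out.extend(l + (u - l) * j / (k + 1.0) for j in range(1, k + 1))
    return out
```

This makes explicit why the algorithm leaves the sample mean unchanged.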
We also perform formal goodness-of-fit tests of the hypothesis that the data are realised from a Lomax distribution. For this purpose, we use the parametric bootstrap approach detailed in [1]; for the sake of brevity, we omit the details. The p-values associated with the fitted distributions are 0.28, 0.77, 0.43 and 0.03, respectively. As a result, we are able to use the fitted Lomax model to calculate theoretical properties such as quantiles and moments parametrically.
Remark: As suggested by a reviewer, due to the instability observed in some of the estimators for smaller sample sizes, we include a small Monte Carlo simulation similar to those presented in Section 3. We specify the sample size to be n = 40 and the parameter values to be β = 6 and σ = 50, as these values closely correspond to the majority of the estimated values obtained for the wind catastrophe data set. The results of the study are displayed in Table 11. When analysing the wind catastrophe data, we are able to obtain estimates of the parameters using the MLE and MLE.b methods. In contrast, in the simulation study, we find that the optimisers required for these estimators very rarely converge, meaning that these estimates often do not exist. We thus exclude the MLE and MLE.b from the table and report only the remaining estimators. We must therefore caution the user against using these methods in the case of small samples. Similar erratic tendencies for these estimators were observed in Figures 1 and 2.

Conclusion
In this study different parameter estimation methods for the two parameters of the Lomax distribution are explored and discussed. We review L-moment estimators, probability weighted moments estimators, maximum likelihood estimators (with and without a bias adjustment), method of moments estimators, and three different minimum distance estimators. The specific goals of the study were to provide an overview of the different estimation methods and to determine whether, for this distribution, the minimum distance estimators are feasible alternatives to traditional methods of estimation such as maximum likelihood. In order to ascertain the properties of these estimators, all of the methods were compared to one another via a comprehensive Monte Carlo simulation, assuming different parameter values for small to large samples. Unsurprisingly, this study showed that the MLEs perform well in large samples, but yield severely biased estimates in small samples. The bias correction outlined in [20], however, noticeably reduces the bias of the MLE for moderate to large sample sizes. The traditional MME is found to be a universally poor performer in terms of MSE for all settings of the parameters β and σ and all sample sizes considered. The simulation shows that the LME is stable under different parameter settings, including varying values of β, σ, and the sample size. So, while this estimator is almost never found to have the best performance in either small or large sample settings, it can be considered a 'safe' option for the estimation of the parameters of the Lomax distribution. Finally, several estimation methods based on distance measures were investigated and were found to produce varied results. These methods all estimate the parameters of the Lomax distribution by minimising different measures of distance. The measures considered include the Cramér-von Mises (CvM) distance, the squared difference, and some phi-divergence measures (including the Kullback-Leibler (KL) divergence, the total variation (TV) distance and the chi-square divergence). The results of the simulation study established that, within this class of MDE methods, the MDE.SD and MDE.CvM methods have the best overall MSE performance for small to moderate sample sizes, and we would recommend either of these methods for the estimation of the Lomax parameters σ and β when sample sizes are smaller than 100. However, within this MDE class we also find two of the worst performing methods, namely the MDE.TV and MDE.χ² methods. The MDE.KL method produces reasonably good MSE performance for moderate to large sample sizes but, as with the MLE, the quality of the estimator fluctuates tremendously for smaller samples.
The results show that no single estimator has the best overall performance, but we can identify the distance-based measure, MDE.SD, as generally having the best MSE performance for small sample sizes (n = 30 and n = 50), whereas MLE.b is found to be the better choice for moderate and large sample sizes (n = 100, n = 200 and n = 500), for almost all σ and β values considered.
In the literature one finds competing views about the usefulness of MDEs and MLEs. In the case of multivariate data (specifically, estimators for the parameters of a copula), [30] advocates the use of MDEs because MLEs require the density of the copula, whereas MDEs require only the empirical distribution estimate of the copula. However, in an extensive overview paper comparing MDEs and MLEs for parameter estimation of copulas, [42] found that MLEs, in addition to having a lower computational cost, also produce lower estimation biases. Naturally, the other main advantages of the traditional MLEs are the ease with which asymptotic standard errors can be obtained and the fact that they have asymptotically normal distributions, from which confidence intervals for the parameters being estimated are easily constructed. In contrast, the standard errors of the other estimators considered in this paper require either complex analytical derivations (possibly with no tractable forms) or the use of resampling methods.
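To illustrate the point about asymptotic standard errors, the following sketch (our own helper, not code from the paper) obtains ML estimates of the Lomax parameters and then approximates their standard errors from the inverse of the observed information matrix, with the Hessian of the negative log-likelihood evaluated by central finite differences. The starting values are heuristic assumptions.

```python
import numpy as np
from scipy import stats, optimize

def lomax_mle_with_se(x):
    """ML fit of the Lomax (shape, scale) with asymptotic standard errors
    from the inverse of the numerically evaluated observed information."""
    x = np.asarray(x, float)

    def nll(theta):
        b, s = theta
        if b <= 0 or s <= 0:
            return np.inf
        return -np.sum(stats.lomax.logpdf(x, b, scale=s))

    # heuristic starting values: unit shape, sample mean as scale
    res = optimize.minimize(nll, x0=[1.0, np.mean(x)], method="Nelder-Mead")
    theta = res.x

    # central-difference Hessian of the negative log-likelihood
    h = 1e-4 * np.maximum(np.abs(theta), 1.0)
    H = np.empty((2, 2))
    for i in range(2):
        for j in range(2):
            ei = np.eye(2)[i] * h[i]
            ej = np.eye(2)[j] * h[j]
            H[i, j] = (nll(theta + ei + ej) - nll(theta + ei - ej)
                       - nll(theta - ei + ej) + nll(theta - ei - ej)) / (4 * h[i] * h[j])

    # inverse observed information approximates the asymptotic covariance
    se = np.sqrt(np.diag(np.linalg.inv(H)))
    return theta, se
```

Approximate 95% confidence intervals then follow as estimate ± 1.96 × standard error, by the asymptotic normality of the MLE.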

Figure 1: Plots of the log of the TMSE measure against sample size for 5 different estimation techniques when σ = 1.

Figure 3: The empirical distribution function and fitted distribution functions based on the wind catastrophes data.

Table 1: Comparison of different estimation methods for different values of β (σ = 1 and n = 30).

Table 2: Comparison of different estimation methods for different values of β (σ = 2 and n = 30).

Table 3: Comparison of different estimation methods for different values of β (σ = 0.5 and n = 50).

Table 4: Comparison of different estimation methods for different values of β (σ = 1 and n = 50).

Table 5: Comparison of different estimation methods for different values of β (σ = 2 and n = 50).

Table 6: Comparison of different estimation methods for different values of β (σ = 0.5 and n = 100).

Table 7: Comparison of different estimation methods for different values of β (σ = 1 and n = 100).

Table 8: Comparison of different estimation methods for different values of β (σ = 2 and n = 100).

Table 9: Wind catastrophes data set.

Table 10: Estimated parameters for the wind catastrophes data set.

Table 12: Comparison of different estimation methods for different values of β (σ = 0.5 and n = 200).

Table 13: Comparison of different estimation methods for different values of β (σ = 1 and n = 200).

Table 14: Comparison of different estimation methods for different values of β (σ = 2 and n = 200).

Table 15: Comparison of different estimation methods for different values of β (σ = 0.5 and n = 500).