Article

Robust Reliability Estimation for Lindley Distribution—A Probability Integral Transform Statistical Approach

by Muhammad Aslam Mohd Safari, Nurulkamal Masseran * and Muhammad Hilmi Abdul Majid
Department of Mathematical Sciences, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Bangi 43600, Selangor, Malaysia
* Author to whom correspondence should be addressed.
Mathematics 2020, 8(9), 1634; https://doi.org/10.3390/math8091634
Submission received: 27 August 2020 / Revised: 18 September 2020 / Accepted: 18 September 2020 / Published: 21 September 2020
(This article belongs to the Special Issue Probability, Statistics and Their Applications)

Abstract
In the modeling and analysis of reliability data via the Lindley distribution, the maximum likelihood estimator is the most commonly used for parameter estimation. However, the maximum likelihood estimator is highly sensitive to the presence of outliers. In this paper, based on the probability integral transform statistic, a robust and efficient estimator of the parameter of the Lindley distribution is proposed. We investigate the relative efficiency of the new estimator compared to that of the maximum likelihood estimator, as well as its robustness based on the breakdown point and influence function. It is found that this new estimator provides reasonable protection against outliers while also being simple to compute. Using a Monte Carlo simulation, we compare the performance of the new estimator and several well-known methods, including the maximum likelihood, ordinary least-squares and weighted least-squares methods in the absence and presence of outliers. The results reveal that the new estimator is highly competitive with the maximum likelihood estimator in the absence of outliers and outperforms the other methods in the presence of outliers. Finally, we conduct a statistical analysis of four reliability data sets, the results of which support the simulation results.

1. Introduction

Reliability is defined as the ability of a system or component to perform its required functions under stated conditions for a specified period of time [1]. In reliability theory, various aspects of reliability, probability, statistics and stochastic modeling are studied in combination with engineering principles in the design and scientific understanding of failure mechanisms [2]. Reliability analysis has been utilized to analyze data from various fields, including engineering, medicine, biology, ecology, economics, sociology and the social sciences [3,4,5]. In some areas, reliability data analysis is also referred to as lifetime, failure-time, survival or event-time data analysis [3]. In the analysis of reliability data, reliability properties are often defined using the mean time to failure, reliability function and failure rate function.
Parametric statistical distributions are often used to model and analyze reliability data. The advantages of applying a parametric model in the analysis of reliability data are as follows: a parametric distribution can be described concisely by only a few parameters, rather than reporting an entire curve, and it provides smooth estimates of failure-time distributions [3]. Several useful parametric models that are often considered for reliability analysis are the exponential, Weibull, gamma and lognormal distributions [3,4,5]. Due to its mathematical simplicity, the exponential distribution is recognized as the most popular and widely used model for reliability data analysis [3,4].
It has been argued that the Lindley distribution provides a better fit than the exponential distribution [6]. In fact, Ghitany et al. [6] found that various statistical properties of the Lindley distribution are more flexible than those of the exponential distribution. However, due to the popularity of the exponential distribution in many fields, the Lindley distribution has yet to be well explored and has attracted little attention in the literature [6,7]. The Lindley distribution, first proposed by Lindley [8], is a two-component mixture of an exponential distribution and a gamma distribution [9]. Over the last decade, research proposing new models related to the family of Lindley distributions has attracted considerable interest from several researchers [10,11,12,13,14,15,16,17,18]. The main objective of introducing extensions of the Lindley distribution is to offer more flexible distribution structures for fitting data.
The presence of outliers is common in data sets, including reliability data. Outliers are data points that appear to deviate markedly from the bulk of the observations [19]. Outliers may arise from errors or simply from natural variation in a data set. In reliability modeling, the maximum likelihood (ML) estimation method is often used to estimate the parameters of a particular parametric model. Indeed, the ML estimator is well known to be efficient for any parametric distribution. However, in the presence of outliers, the ML estimator is not robust and suffers severe bias [20]. Biased parameter estimates of a parametric model can, in turn, result in biased estimation of reliability. Thus, when outliers are present in the data, the ML estimator should be avoided and an alternative, more robust method should be applied to ensure unbiased estimation of the parameters.
In the literature, several robust estimators have been proposed for estimating the parameters of parametric models, including the exponential, Weibull, gamma and lognormal distributions [21,22,23,24,25,26,27]. However, to our knowledge, a robust method for estimating the parameter of the Lindley distribution has not been proposed. In this study, we propose a new robust and efficient estimator for the parameter of the Lindley distribution, based on the probability integral transform statistic, that offers reasonable protection against outliers. In probability theory, the probability integral transform converts any continuous random variable into a standard uniform one [28]. For instance, if a random variable X has a continuous distribution function F(x), then the random variable U = F(X) has a uniform distribution on the interval (0, 1). In previous studies, this approach has been used to develop robust estimators for the shape parameters of the Pareto [29] and inverse Pareto [30,31] distributions. In this paper, we investigate the relative efficiency of the new estimator compared to that of the ML estimator, as well as its robustness based on the breakdown point and influence function. Based on a simulation study and real data applications, we show that the estimation of the Lindley model parameter is more reliable with the new estimator than with some well-known estimators, both in the absence and presence of outliers in the data.
The rest of this paper is organized as follows. In Section 2, we present the Lindley model and its reliability characteristics. In Section 3, we discuss three well-known methods for estimating the parameter of the Lindley distribution. A brief explanation of M-estimators is provided in Section 4. In Section 5, we propose the new estimator for the parameter of the Lindley distribution and explore its properties. In Section 6, we compare the performance of the proposed estimator with those of several other estimators in the absence and presence of outliers based on a simulation study. In Section 7, four real reliability data sets are analyzed to assess the performance of the new estimator relative to those of some competing estimators. Finally, in Section 8, we draw our conclusions.

2. Lindley Distribution and Reliability Measures

Let X be a random variable that follows a Lindley distribution. The respective probability density function (PDF), cumulative distribution function (CDF) and quantile function of the Lindley distribution are given by the following:
$$f(x;\theta)=\frac{\theta^{2}}{1+\theta}(1+x)e^{-\theta x};\qquad x>0,\ \theta>0, \tag{1}$$
$$F(x;\theta)=1-\left(1+\frac{\theta x}{1+\theta}\right)e^{-\theta x};\qquad x>0,\ \theta>0, \tag{2}$$
and
$$Q(u;\theta)=-1-\frac{1}{\theta}-\frac{1}{\theta}W_{-1}\!\left((1+\theta)e^{-(\theta+1)}(u-1)\right);\qquad 0<u<1, \tag{3}$$
where θ is the parameter of the Lindley distribution and W₋₁ denotes the negative branch of the Lambert W function [32].
The reliability or survival function of the Lindley distribution is given by:
$$R(t)=1-F(t;\theta)=\left(1+\frac{\theta t}{1+\theta}\right)e^{-\theta t};\qquad t>0,\ \theta>0. \tag{4}$$
The expectation or the mean time to failure (MTTF) of the Lindley distribution can be written as:
$$\mathrm{MTTF}=E[X]=\frac{\theta+2}{\theta(\theta+1)};\qquad\theta>0. \tag{5}$$
The failure or hazard rate function of Lindley distribution is as follows:
$$h(t)=\frac{f(t;\theta)}{R(t)}=\frac{\theta^{2}(1+t)}{\theta+1+\theta t};\qquad t>0,\ \theta>0. \tag{6}$$
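For readers who wish to experiment with these quantities, the following R sketch (our own illustration, not part of the original article) implements the PDF, CDF and quantile function of Equations (1)-(3) together with an inversion sampler; it assumes the lamW package, which provides the W₋₁ branch via lambertWm1().

# Sketch (ours): d/p/q/r functions for the Lindley distribution of
# Equations (1)-(3); assumes the 'lamW' package for lambertWm1().
library(lamW)
dlindley <- function(x, theta) theta^2/(1 + theta)*(1 + x)*exp(-theta*x)
plindley <- function(x, theta) 1 - (1 + theta*x/(1 + theta))*exp(-theta*x)
qlindley <- function(u, theta) {
  # Equation (3), using the negative branch W_{-1}
  -1 - 1/theta - lambertWm1((1 + theta)*exp(-(theta + 1))*(u - 1))/theta
}
rlindley <- function(n, theta) qlindley(runif(n), theta)  # inversion sampling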

3. Several Estimators of the Parameter of the Lindley Distribution

In this section, we present three well-known methods for estimating the parameter of the Lindley distribution, namely ML, ordinary least-squares (OLS) and weighted least-squares (WLS).

3.1. ML Estimator

Let X₁, X₂, …, Xₙ be a random sample of size n from the Lindley distribution with the PDF shown in Equation (1). As reported by Ghitany et al. [6], the ML estimator of the parameter θ is given by:
$$\hat{\theta}_{ML}=\frac{-(\bar{X}-1)+\sqrt{(\bar{X}-1)^{2}+8\bar{X}}}{2\bar{X}};\qquad\bar{X}>0, \tag{7}$$
where X̄ is the sample mean. Note that, as explained by Ghitany et al. [6], the method of moments (MOM) estimator of the parameter θ coincides with the ML estimator.
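As a quick illustration (ours), the closed-form estimator of Equation (7) is a one-liner in R:

ml_lindley <- function(x) {
  xbar <- mean(x)  # Equation (7): positive root of the ML quadratic
  (-(xbar - 1) + sqrt((xbar - 1)^2 + 8*xbar))/(2*xbar)
}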

3.2. OLS and WLS Estimators

Suppose that X₍₁₎ ≤ X₍₂₎ ≤ … ≤ X₍ₙ₎ are the order statistics of a random sample from the Lindley distribution with the CDF shown in Equation (2). Recall the following results for order statistics:
$$E\left[F(X_{(i)})\right]=\frac{i}{n+1}\qquad\text{and}\qquad Var\left[F(X_{(i)})\right]=\frac{i(n-i+1)}{(n+1)^{2}(n+2)},$$
for i ∈ {1, 2, …, n} and for all values of the parameter θ.
The OLS estimate of the parameter θ is obtained by minimizing the following function with respect to θ:
$$L(\theta)=\sum_{i=1}^{n}\left[F(x_{(i)};\theta)-\frac{i}{n+1}\right]^{2}, \tag{8}$$
where x₍ᵢ₎, for i ∈ {1, 2, …, n}, are the ordered observations, that is, x₍₁₎ ≤ x₍₂₎ ≤ … ≤ x₍ₙ₎.
Taking the derivative of L(θ) with respect to θ and equating it to 0, we obtain
$$\sum_{i=1}^{n}\left[F(x_{(i)};\theta)-\frac{i}{n+1}\right]\Delta(x_{(i)},\theta)=0, \tag{9}$$
where
$$\Delta(x,\theta)=\frac{\partial F(x;\theta)}{\partial\theta}=\frac{\theta x\left[\theta(1+x)+x+2\right]e^{-\theta x}}{(1+\theta)^{2}}. \tag{10}$$
By numerically solving the non-linear equation shown in Equation (9), the OLS estimate θ̂_OLS is obtained.
The WLS estimate of the parameter θ can be obtained by minimizing the following function with respect to θ:
$$W(\theta)=\sum_{i=1}^{n}\frac{(n+1)^{2}(n+2)}{i(n-i+1)}\left[F(x_{(i)};\theta)-\frac{i}{n+1}\right]^{2}. \tag{11}$$
Equivalently, the WLS estimate θ̂_WLS can be obtained by solving the following non-linear equation:
$$\sum_{i=1}^{n}\frac{1}{i(n-i+1)}\left[F(x_{(i)};\theta)-\frac{i}{n+1}\right]\Delta(x_{(i)},\theta)=0. \tag{12}$$
Here, Δ(x, θ) is given by Equation (10).
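The criteria in Equations (8) and (11) can be minimized numerically; the following hedged R sketch (ours) uses optimize() with plindley() from the Section 2 sketch, and the default search interval is an assumption to be adapted to the data:

ols_lindley <- function(x, lower = 1e-6, upper = 100) {
  xs <- sort(x); n <- length(x); p <- (1:n)/(n + 1)
  # minimize L(theta) of Equation (8); equivalent to solving Equation (9)
  optimize(function(theta) sum((plindley(xs, theta) - p)^2),
           interval = c(lower, upper))$minimum
}
wls_lindley <- function(x, lower = 1e-6, upper = 100) {
  xs <- sort(x); n <- length(x); i <- 1:n
  w <- (n + 1)^2*(n + 2)/(i*(n - i + 1))  # weights of Equation (11)
  optimize(function(theta) sum(w*(plindley(xs, theta) - i/(n + 1))^2),
           interval = c(lower, upper))$minimum
}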

4. M-Estimators

M-estimators are generalizations of ML estimators that provide tools for studying the robustness of the ML estimator. As stated by Huber [33], an estimator δₙ defined either by
$$\delta_{n}=\arg\min_{\delta}\sum_{i=1}^{n}\rho(x_{i},\delta) \tag{13}$$
or by
$$\sum_{i=1}^{n}\psi(x_{i},\delta_{n})=0 \tag{14}$$
is called an M-estimator. Here, ρ is a measurable function on 𝒳 × Θ and ψ(x, δ) = (∂/∂δ)ρ(x, δ) is the derivative of ρ with respect to δ (when it exists). Note that if ρ(x, δ) = −log f(x; δ), then δₙ is the ordinary ML estimator.

4.1. Efficiency Measure: Asymptotic Relative Efficiency

The ML estimator is well known to be efficient. For this reason, the ML estimator provides a useful quantitative benchmark for the measure of efficiency. Note that the ML estimator of the Lindley parameter θ, given in Equation (7), is asymptotically normal with mean θ and variance θ²(θ + 1)²/(n(θ² + 4θ + 2)), that is,
$$\hat{\theta}_{ML}\sim N\!\left(\theta,\ \frac{\theta^{2}(\theta+1)^{2}}{n(\theta^{2}+4\theta+2)}\right).$$
For any competing estimator of the parameter θ, say θ̂₀, the asymptotic relative efficiency (ARE) is defined as the ratio of the asymptotic variance of the ML estimator to the asymptotic variance of the competing estimator:
$$\mathrm{ARE}(\hat{\theta}_{0})=\lim_{n\to\infty}\frac{Var(\hat{\theta}_{ML})}{Var(\hat{\theta}_{0})}. \tag{15}$$
In other words, the ARE measures the efficiency of the competing estimator θ̂₀ relative to θ̂_ML: the closer the ARE is to 1, the closer the efficiency of θ̂₀ is to that of the ML estimator.

4.2. Robustness Measures: Influence Function and Breakdown Point

The breakdown point (BP), which is useful for assessing the robustness of a statistical approach, measures the degree of sensitivity of an estimator to data contamination. The BP is defined as the largest proportion of contamination that an estimator can tolerate before breaking down [20]. An estimator with a higher BP is more robust against data contamination. Note that there are two types of BP, the lower breakdown point (LBP) and the upper breakdown point (UBP). In the present context of estimating θ, the LBP is the largest proportion of lower contamination that the estimator can tolerate before being forced to θ̂ → ∞, and the UBP is the largest proportion of upper contamination that it can tolerate before being forced to θ̂ → 0. Since contamination of the upper end of the distribution is of greater interest in most typical applications, we emphasize only the UBP in this study. As mentioned by Huber [33], an estimator with an unbounded ψ function has a BP equal to 0. The ψ function of the ML estimator is ψ(x, θ̂_ML) = 2/θ̂_ML − 1/(θ̂_ML + 1) − x. For a finite sample, if a single observation xᵢ → ∞, then Σᵢ₌₁ⁿ xᵢ → ∞ and consequently θ̂_ML → 0. This shows that ψ(x, θ̂_ML) is unbounded in x, which implies that θ̂_ML has a UBP equal to 0: even a single extreme contaminated observation in the upper tail of the data makes θ̂_ML unreliable.
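As a quick check (ours, not spelled out in the original), this ψ function is simply the score of the Lindley log-density:

$$\log f(x;\theta)=2\log\theta-\log(1+\theta)+\log(1+x)-\theta x \quad\Longrightarrow\quad \psi(x,\theta)=\frac{\partial}{\partial\theta}\log f(x;\theta)=\frac{2}{\theta}-\frac{1}{1+\theta}-x,$$

which is linear in x and therefore unbounded.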
Another approach for measuring robustness is the use of an influence function (IF). An estimator is considered to have desirable robustness if it has a bounded IF [33,34,35]. According to Hampel et al. [34] (p. 101), the IF of an estimator δₙ that satisfies Equation (14) is given by:
$$\mathrm{IF}(x;\psi,F)=\frac{\psi(x,\delta(F))}{-\int(\partial/\partial\theta)\left[\psi(y,\theta)\right]_{\theta=\delta(F)}\,dF(y)}, \tag{16}$$
where δ(F) denotes the solution δₙ of Equation (14) when the sample is generated from the CDF F. For the ML estimator, since the function ψ(x, θ̂_ML) is unbounded in x, its IF is also unbounded. Therefore, it is clear that θ̂_ML is not a robust estimator of the parameter θ. For the case when ρ is not differentiable, the IF can be obtained using the following expression [34] (p. 84):
$$\mathrm{IF}(x;\delta,F)=\lim_{t\to 0}\frac{\delta\left((1-t)F+t\Delta_{x}\right)-\delta(F)}{t}, \tag{17}$$
where Δₓ denotes the probability measure that puts mass 1 at the point x.

5. New Robust M-Estimator for the Parameter of the Lindley Distribution

In this section, a new robust M-estimator is proposed based on the probability integral transform statistic. We also discuss the ARE and robustness of this new estimator.

5.1. Probability Integral Transform Statistic Estimator

Let X₁, X₂, …, Xₙ be a random sample from a Lindley distribution. Since the CDF of the Lindley distribution in Equation (2) is continuous and strictly increasing, the random variables F(X₁), F(X₂), …, F(Xₙ) follow a standard uniform distribution, that is, F(X) ~ U(0, 1). The new robust M-estimator for the parameter θ, which we refer to as the probability integral transform statistic (PITS) estimator, is based on the statistic:
$$H_{n,\tau}(\theta)=n^{-1}\sum_{i=1}^{n}\left[\left(1+\frac{\theta X_{i}}{1+\theta}\right)e^{-\theta X_{i}}\right]^{\tau}, \tag{18}$$
where τ > 0 denotes a tuning parameter that will be used later to adjust the balance between efficiency and robustness. Notice that [1 + θXᵢ/(1 + θ)]e^(−θXᵢ) = 1 − F(Xᵢ) is itself a random variable that follows the standard uniform distribution. If u₁, u₂, …, uₙ is a random sample from a standard uniform distribution then, by the strong law of large numbers, n⁻¹ Σᵢ₌₁ⁿ uᵢᵗ converges to E[uᵗ] = 1/(τ + 1) with probability 1 as n → ∞. Therefore, the PITS estimator of the parameter θ is defined as the solution with respect to θ of the following equation:
$$H_{n,\tau}(\theta)=n^{-1}\sum_{i=1}^{n}\left[\left(1+\frac{\theta X_{i}}{1+\theta}\right)e^{-\theta X_{i}}\right]^{\tau}=\frac{1}{\tau+1}. \tag{19}$$
Note that Equation (19) can be solved numerically using a method such as the Newton-Raphson, bisection or secant method. It is clear that the PITS estimator of the parameter of the Lindley distribution given by Equation (19) belongs to the class of M-estimators, with:
$$\psi(x,\theta)=\left[\left(1+\frac{\theta x}{1+\theta}\right)e^{-\theta x}\right]^{\tau}-\frac{1}{\tau+1}. \tag{20}$$
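A small numeric check (ours) of the identity behind Equation (19): at the true θ, the transformed values 1 − F(Xᵢ) are standard uniform, so the mean of their τ-th powers should be close to 1/(τ + 1). The snippet assumes rlindley() from the Section 2 sketch.

set.seed(1)
theta <- 2; tau <- 1.45              # tau = 1.45 corresponds to 65% ARE (Table 1)
x <- rlindley(1e5, theta)
u <- (1 + theta*x/(1 + theta))*exp(-theta*x)  # equals 1 - F(x; theta)
c(mean(u^tau), 1/(tau + 1))          # the two values should nearly agree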
Lemma 1.
For any fixed τ > 0, Equation (19), that is, H_{n,τ}(θ) = 1/(τ + 1), has exactly one solution.
Proof of Lemma 1.
Note that H_{n,τ}(θ) is continuous on [0, ∞) and that H_{n,τ}(0) = 1 > 1/(τ + 1), whereas lim_{θ→∞} H_{n,τ}(θ) = 0 < 1/(τ + 1). By the intermediate value theorem, H_{n,τ}(θ) = 1/(τ + 1) for some θ in (0, ∞), so Equation (19) has at least one solution. It can also be shown that
$$H'_{n,\tau}(\theta)=-n^{-1}\sum_{i=1}^{n}\frac{\theta x_{i}\tau\left[(x_{i}+1)\theta+x_{i}+2\right]\left[\theta x_{i}/(\theta+1)+1\right]^{\tau}e^{-\theta x_{i}\tau}}{(\theta+1)\left[(x_{i}+1)\theta+1\right]}<0, \tag{21}$$
for all θ > 0. Therefore, H_{n,τ}(θ) is strictly decreasing in θ, and it can be concluded that the solution is unique.  □

5.2. ARE of the PITS Estimator

By applying Equation (15), the ARE of the PITS estimator of the parameter θ is given by:
$$\mathrm{ARE}(\hat{\theta}_{PITS})=\lim_{n\to\infty}\frac{Var(\hat{\theta}_{ML})}{Var(\hat{\theta}_{PITS})}. \tag{22}$$
To compute the ARE of the PITS estimator θ̂_PITS, we must find its asymptotic distribution. One way to obtain the asymptotic distribution of θ̂_PITS is to follow Corollary 2.5 in Chapter 3 of Huber [33]. However, in the present context this corollary cannot be applied, since the function
$$\lambda(\beta)=\int_{0}^{\infty}\psi(x,\beta)f(x)\,dx$$
is rather complicated to obtain, so the asymptotic variance of θ̂_PITS cannot be determined analytically. As an alternative, we apply the Monte Carlo simulation method to estimate Var(θ̂_PITS).
As mentioned above, the balance between efficiency and robustness of the PITS estimator can be adjusted by changing the value of the tuning parameter τ. As τ increases, the ARE decreases: the PITS estimator gains robustness but loses relative efficiency. By taking a value of τ close to 0, the ARE of the PITS estimator can be made arbitrarily close to 1. Table 1 presents the AREs of the PITS estimator and the corresponding tuning parameter values obtained from 10,000 simulation runs.
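The Monte Carlo computation can be sketched as follows (our illustration; it assumes the pits() solver of Appendix B and the rlindley() and ml_lindley() sketches from earlier sections, and the sample size and search interval are arbitrary choices):

are_mc <- function(theta, tau, n = 500, R = 10000) {
  est <- replicate(R, {
    x <- rlindley(n, theta)
    c(ml_lindley(x), pits(x, tau, a = 1e-6, b = 100))
  })
  var(est[1, ])/var(est[2, ])  # ARE = Var(ML)/Var(PITS)
}
# e.g., are_mc(1, tau = 0.16) should come out near 0.98 (Table 1)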

5.3. BP of PITS Estimator

To obtain the UBP and LBP of the PITS estimator, we apply the argument presented in Finkelstein et al. [29]. The UBP and LBP of the PITS estimator are presented in Theorem 1.
Theorem 1.
The finite-sample UBP and LBP of the PITS estimator are ⌈nτ/(τ + 1)⌉/n and ⌈n/(τ + 1)⌉/n, respectively, so the asymptotic UBP and LBP of the PITS estimator are τ/(τ + 1) and 1/(τ + 1).
Proof of Theorem 1.
For any integer 1 ≤ k ≤ n, the estimator θ̂_PITS satisfies:
$$n^{-1}\sum_{i=1}^{k}\left[\left(1+\frac{x_{i}\hat{\theta}_{PITS}}{1+\hat{\theta}_{PITS}}\right)e^{-x_{i}\hat{\theta}_{PITS}}\right]^{\tau}+n^{-1}\sum_{i=k+1}^{n}\left[\left(1+\frac{x_{i}\hat{\theta}_{PITS}}{1+\hat{\theta}_{PITS}}\right)e^{-x_{i}\hat{\theta}_{PITS}}\right]^{\tau}=\frac{1}{\tau+1}. \tag{23}$$
For simplicity, let
$$h(x)=\left[\left(1+\frac{x\hat{\theta}_{PITS}}{1+\hat{\theta}_{PITS}}\right)e^{-x\hat{\theta}_{PITS}}\right]^{\tau}. \tag{24}$$
It can be shown that
$$h'(x)=-\tau\hat{\theta}_{PITS}^{2}\left[\left(1+\frac{x\hat{\theta}_{PITS}}{1+\hat{\theta}_{PITS}}\right)e^{-x\hat{\theta}_{PITS}}\right]^{\tau-1}e^{-x\hat{\theta}_{PITS}}\left[\frac{1+x}{1+\hat{\theta}_{PITS}}\right]<0. \tag{25}$$
Therefore, h(x) is strictly decreasing in x for x > 0. It is also worth noting that 0 < h(x) < 1 for any x > 0.
We prove the UBP first. Assume that x₁, x₂, …, x_k take on values that approach ∞ and let x_min = min{x₁, …, x_k}. For any ε > 0, suppose
$$x_{\min}>-\frac{1+\hat{\theta}_{PITS}}{\tau\hat{\theta}_{PITS}^{2}}\log\left(\frac{\varepsilon n}{k}\right)+\frac{1+\hat{\theta}_{PITS}}{\hat{\theta}_{PITS}^{2}}. \tag{26}$$
Then, it can be shown that
$$\frac{1}{\tau}\log\left(\frac{\varepsilon n}{k}\right)>-x_{\min}\hat{\theta}_{PITS}+\left(1+\frac{x_{\min}\hat{\theta}_{PITS}}{1+\hat{\theta}_{PITS}}\right)>-x_{\min}\hat{\theta}_{PITS}+\log\left(1+\frac{x_{\min}\hat{\theta}_{PITS}}{1+\hat{\theta}_{PITS}}\right). \tag{27}$$
The second inequality above follows from the fact that x > log(x) for any x > 0. It follows that
$$\varepsilon>\frac{k}{n}\,h(x_{\min})\geq\left|\frac{1}{n}\sum_{i=1}^{k}h(x_{i})\right|, \tag{28}$$
since h(x) is strictly decreasing in x and is always positive for x > 0. Therefore, by definition,
$$\lim_{x_{i}\to\infty;\ i=1,\dots,k}\left\{\frac{1}{n}\sum_{i=1}^{k}h(x_{i})\right\}=0. \tag{29}$$
From Equation (23), we can write
$$\frac{1}{\tau+1}=\frac{1}{n}\sum_{i=1}^{k}h(x_{i})+\frac{1}{n}\sum_{i=k+1}^{n}h(x_{i})<\varepsilon+\frac{1}{n}\sum_{i=k+1}^{n}h(x_{i})<\varepsilon+\frac{n-k}{n}, \tag{30}$$
since h(x) < 1 for any x > 0. This is valid if and only if k < nε + nτ/(τ + 1). Thus, the finite-sample UBP is ⌈nτ/(τ + 1)⌉/n, where ⌈·⌉ denotes the ceiling function. Letting n → ∞ in the finite-sample UBP, we find that the UBP equals τ/(τ + 1).
For the LBP, suppose that x₁, x₂, …, x_k take on values that approach 0 and let x_max = max{x₁, …, x_k}. For any 0 < ε < k/n, suppose
$$x_{\max}<-\frac{1}{\tau\hat{\theta}_{PITS}}\log\left(1-\frac{\varepsilon n}{k}\right). \tag{31}$$
Then, it follows that
$$\left(1-\frac{\varepsilon n}{k}\right)^{1/\tau}<e^{-x_{\max}\hat{\theta}_{PITS}}<\left(1+\frac{x_{\max}\hat{\theta}_{PITS}}{1+\hat{\theta}_{PITS}}\right)e^{-x_{\max}\hat{\theta}_{PITS}}. \tag{32}$$
Since h(x) < 1 for any x > 0 and h(x_max) ≤ h(xᵢ) for i = 1, …, k, we have
$$\varepsilon>\frac{k}{n}\left(1-h(x_{\max})\right)\geq\frac{1}{n}\left(k-\sum_{i=1}^{k}h(x_{i})\right)=\frac{1}{n}\left|\sum_{i=1}^{k}h(x_{i})-k\right|. \tag{33}$$
If ε ≥ k/n, then the above inequality holds for all xᵢ, i = 1, …, k. Hence, by definition,
$$\lim_{x_{i}\to 0;\ i=1,\dots,k}\left\{\frac{1}{n}\sum_{i=1}^{k}h(x_{i})\right\}=\frac{k}{n}. \tag{34}$$
From Equation (23), we can write
$$\frac{1}{\tau+1}=\frac{1}{n}\sum_{i=1}^{k}h(x_{i})+\frac{1}{n}\sum_{i=k+1}^{n}h(x_{i})>\frac{k}{n}-\varepsilon+\frac{1}{n}\sum_{i=k+1}^{n}h(x_{i})>\frac{k}{n}-\varepsilon, \tag{35}$$
since h(x) > 0 for any x > 0. This is valid if and only if k < nε + n/(τ + 1). Therefore, the finite-sample LBP is ⌈n/(τ + 1)⌉/n and, taking n → ∞, the LBP equals 1/(τ + 1).  □
Based on Theorem 1, Table 2 lists the UBPs and LBPs of the PITS estimator for different ARE levels. As the ARE level decreases, the UBP increases, which means that the PITS estimator becomes more robust against upper contamination. On the other hand, the LBP decreases as the ARE level decreases, which means that its robustness against lower contamination decreases.
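Since Theorem 1 gives the breakdown points in closed form, Table 2 can be reproduced with two lines of R (our own illustration):

tau <- c(0.16, 0.29, 0.46, 0.63, 0.81, 1.00, 1.21, 1.45, 1.72, 2.04, 2.41)
round(rbind(UBP = tau/(tau + 1), LBP = 1/(tau + 1)), 2)  # matches Table 2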

5.4. IF of PITS Estimator

According to Maronna et al. [35], the IF of an estimator is an asymptotic version of its sensitivity curve (SC). In general, the SC measures the sensitivity of an estimator to the location of an outlier x₀ for a particular random sample. Since using Equation (16) to obtain the IF of the PITS estimator is rather complicated, we use the SC as an approximation of the IF. The SC of the PITS estimator θ̂_PITS for the random sample x₁, x₂, …, xₙ is defined as a function of the location of the outlier x₀:
$$\mathrm{SC}(x_{0})=\hat{\theta}_{PITS}(x_{1},x_{2},\dots,x_{n},x_{0})-\hat{\theta}_{PITS}(x_{1},x_{2},\dots,x_{n}), \tag{36}$$
where θ̂_PITS(x₁, x₂, …, xₙ) is the PITS estimate based on the sample x₁, x₂, …, xₙ. To obtain the SC of the PITS estimator, we generated a random sample of size n = 50 from the Lindley distribution with parameter θ = 0.5, 1, 2, 3. Then, a single outlier x₀ was added to the sample, with its value varied from 0 to 50 in increments of 1. Note that the 99th percentile is often used as an upper boundary for outlier identification: any data point that exceeds the 99th percentile can be considered a potential outlier. For θ = 0.5, 1, 2 and 3, the 99th percentiles of the Lindley distribution are 12.4940, 5.9902, 2.8330 and 1.8222, respectively, so values of x₀ exceeding these percentiles are large enough to be considered potential outliers. We obtained SC(x₀) for the PITS estimator at several different AREs for each value of x₀, as shown in Figure 1.
Based on Figure 1, we can see that as x₀ increases, the SCs of the PITS estimators with 98%, 90%, 80%, 70% and 60% AREs converge to a finite limit, indicating that the curves are bounded. This suggests that the PITS estimator has a bounded IF and is robust against the location of a single outlier. As the ARE level decreases, the limit of each curve moves closer to 0, which suggests that the PITS estimator becomes more robust.
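A hedged sketch of this sensitivity-curve experiment (ours), assuming rlindley() from the Section 2 sketch and pits() from Appendix B:

sc_curve <- function(theta, tau, n = 50, x0 = 0:50) {
  x <- rlindley(n, theta)
  base <- pits(x, tau, a = 1e-6, b = 100)
  # SC(x0): refit with one added outlier at each candidate location x0
  sapply(x0, function(o) pits(c(x, o), tau, a = 1e-6, b = 100) - base)
}
# plot(0:50, sc_curve(1, tau = 0.46), type = "l")  # 90% ARE case (Table 1)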

6. Simulation Study

In this section, we investigate the performance of the ML, OLS, WLS and PITS (98%, 90%, 80%, 70% and 60% AREs) estimators in the absence and presence of outliers via a simulation study. In Sections 6.1 and 6.2, we present the framework and results of the simulation study, respectively. Then, in Section 6.3, we provide some guidelines for selecting the appropriate ARE of the PITS estimator for practical applications.

6.1. Simulation Framework

The simulation procedure goes as follows:
Step 1:
Generate a random sample of size n from the Lindley distribution under two sample size settings, namely small (n = 30, 50, 70) and large (n = 100, 300, 500), with parameter θ = 0.5, 1, 2, 3.
Step 2:
Randomly select some observations and replace them with outliers generated from the Lindley distribution with parameter 0.05θ. Note that multiplying the true parameter θ by 0.05 gives a Lindley distribution with a much heavier upper tail, which produces large values that act as outliers. For the small sample sizes, generate outliers in several fixed numbers, m = 0, 1, 3, 5. For the large sample sizes, generate outliers in several fixed proportions, ε = 0%, 1%, 5%, 10%.
Step 3:
Estimate the parameter θ using the ML, OLS, WLS and PITS (98%, 90%, 80%, 70% and 60% AREs) methods.
Step 4:
Repeat steps 1–3 10,000 times.
Step 5:
Calculate the performance of each estimator using the percentage relative root mean square error (RRMSE). For a given true value of the parameter θ, the RRMSE is given by:
$$\mathrm{RRMSE}=\frac{100}{\theta}\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{\theta}_{i}-\theta\right)^{2}}, \tag{37}$$
where θ̂ᵢ is the estimated parameter for the i-th sample (i = 1, 2, …, N) and N is the number of simulated samples. A smaller RRMSE value indicates a more accurate and precise estimator. Thus, the estimation method that minimizes the RRMSE provides the best estimate of the parameter θ.
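One replication of Steps 1-3 can be sketched in R as follows (our illustration; it assumes the helper functions sketched earlier and the pits() solver of Appendix B, and shows only the ML and PITS (60% ARE) estimators):

one_rep <- function(n, theta, m) {
  x <- rlindley(n, theta)
  x[sample(n, m)] <- rlindley(m, 0.05*theta)  # Step 2: heavy upper-tail outliers
  c(ML = ml_lindley(x), PITS60 = pits(x, tau = 1.72, a = 1e-6, b = 100))
}
# Steps 4-5: repeat and compute the RRMSE of Equation (37), e.g. for theta = 1:
# est <- replicate(10000, one_rep(50, theta = 1, m = 5))
# 100/1*sqrt(rowMeans((est - 1)^2))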

6.2. Simulation Results

Table 3, Table 4, Table 5, Table 6, Table 7 and Table 8 list the results of the simulation study based on the obtained RRMSEs. We summarize these results as follows:
  • In the cases of both small and large sample sizes (Table 3, Table 4, Table 5, Table 6, Table 7 and Table 8), we found the following:
    • When there are no outliers (m = 0 and ε = 0%) in the data, both the ML and PITS (98% ARE) estimators perform similarly and slightly outperform the OLS and WLS estimators.
    • In the presence of outliers, the performance of the ML estimator is much worse than that of the other estimators. As the degree of contamination increases, the performance of the ML estimator deteriorates significantly.
    • The OLS and WLS estimators are quite robust and offer some protection against outliers.
    • As the sample size increases, for all the cases considered, the performance of all the estimators improves, with the RRMSE values becoming smaller.
  • In the case of a small sample size (Table 3, Table 4 and Table 5), we found the following:
    • When the number of outliers is small (m = 1), the PITS (90% ARE) estimator performs best, that is, slightly better than the OLS and WLS estimators.
    • When the number of outliers is moderate (m = 3), the OLS, WLS and PITS (70% and/or 60% AREs) estimators perform almost equally well and are considered to be the best methods for this particular case.
    • When the number of outliers is large (m = 5), the PITS (60% ARE) estimator performs best, that is, slightly better than the OLS and WLS estimators.
  • In the case of a large sample size (Table 6, Table 7 and Table 8), we found the following:
    • When the proportion of outliers is small (ε = 1%), for n = 100, the PITS (90% ARE) estimator performs best, that is, slightly better than the OLS and WLS estimators. For n = 300, 500, the OLS, WLS and PITS (90% and 80% AREs) estimators perform almost equally well and are considered to be the best methods for this particular case.
    • When the proportion of outliers is moderate (ε = 5%), for n = 100, the OLS and PITS (70% ARE) estimators perform very similarly, slightly outperforming the WLS estimator. For n = 300, 500, the PITS (60% ARE) estimator performs best, outperforming all the other methods.
    • When the proportion of outliers is large (ε = 10%), the performance of the PITS (60% ARE) estimator also surpasses that of other methods.
Overall, in the presence of outliers, the use of the ML estimator should be avoided since it is not robust and provides no protection against outliers. It is also interesting to note that we found the OLS and WLS estimators to be quite robust and able to offer some protection against outliers. However, the proposed PITS estimator provides the most flexible approach as it can be applied in both the absence and presence of outliers.

6.3. Some Guidelines for Selecting the Appropriate ARE of PITS Estimator

Based on the results of a comprehensive simulation study, here we provide some guidelines for selecting the appropriate ARE of the PITS estimator in practical applications:
  • When there are no outliers in the data, the PITS (98% ARE) estimator should be applied for estimating the parameter θ.
  • For a very small sample size, that is, n < 30:
    • When the number of outliers is m ≤ 3, the PITS (60-90% AREs) estimators are preferable.
    • When the number of outliers is m ≥ 4, the PITS (50-60% AREs) estimators are preferable.
  • For a small sample size, that is, 30 ≤ n ≤ 70:
    • When the number of outliers is m ≤ 2, the PITS (80-90% AREs) estimators are recommended.
    • When the number of outliers is 3 ≤ m ≤ 4, the PITS (60-80% AREs) estimators are preferable.
    • When the number of outliers is m ≥ 5, the PITS (50-60% AREs) estimators are recommended.
  • For a moderately large sample size, that is, 70 < n ≤ 100:
    • When the number of outliers is m ≤ 3, the PITS (70-90% AREs) estimators are recommended.
    • When the number of outliers is 4 ≤ m ≤ 6, the PITS (60-70% AREs) estimators are recommended.
    • When the number of outliers is m ≥ 7, the PITS (50-60% AREs) estimators are preferable.
  • For a large sample size, that is, n > 100:
    • When the proportion of outliers is ε ≤ 3%, the PITS (80-90% AREs) estimators are preferable.
    • When the proportion of outliers is 3% < ε ≤ 7%, the PITS (60-80% AREs) estimators are preferable.
    • When the proportion of outliers is ε > 7%, the PITS (50-60% AREs) estimators are recommended.

7. Applications and Discussion

In this section, we report four applications of the Lindley distribution as a reliability model using real data sets and compare the performance of the ML, OLS, WLS and PITS (at several AREs) estimators. Note that the AREs of the PITS estimator are chosen based on the proportion of outliers found in the data. The first data set (Data Set 1) consists of the times to failure of 18 electronic devices reported by Wang [36]. The second data set (Data Set 2) represents the survival times of 44 patients suffering from head and neck cancer (treated using radiotherapy and chemotherapy), initially reported by Efron [37] (see also Reference [38]). The third data set (Data Set 3) is an uncensored data set of the remission times of a random sample of 128 bladder cancer patients, obtained from Lee and Wang [5]. Finally, the fourth data set (Data Set 4) represents the lengths of stay of 300 patients suffering from breast cancer, which can be found in Reference [39]. All four data sets are given in Appendix A.
For all the data sets considered, we applied the generalized boxplot method [40] to identify the presence of outliers; this method is suitable for skewed and/or heavy-tailed distributions. In practical applications, we suggest applying this method for outlier detection in data that follow the Lindley distribution. Table 9 provides descriptive statistics for all the data sets and Figure 2 shows their generalized boxplots.
To compare the performance of the considered methods in estimating the parameter of the Lindley distribution, we applied the Kolmogorov-Smirnov (K-S) test, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) to assess goodness of fit. The best method is the one yielding the highest p-value of the K-S test and the smallest values of the K-S statistic, AIC and BIC. Table 10 lists the estimated parameters and goodness-of-fit measures obtained for the Lindley distribution for all the data sets. As presented in Table 10, the PITS estimator provides a better estimation of the Lindley parameter than the ML, OLS and WLS estimators based on the smallest K-S statistic and the highest p-value of the K-S test. On the other hand, the ML estimator appears to be the best method based on the smallest values of AIC and BIC. This is because the ML estimator maximizes the likelihood function of the Lindley model, so the AIC and BIC will always favor the ML estimator. Thus, in the present context, the AIC and BIC are biased measures for goodness-of-fit assessment. To further support this claim, we provide another example in which data are simulated from a Lindley distribution with θ = 2 and sample size n = 100. We then randomly select 5% of the observations and replace them with outliers. Based on these data, we compare the performance of all the methods in estimating the parameter of the Lindley distribution, again using the K-S test, AIC and BIC to assess goodness of fit. The result of this comparative study is presented in Table 11.
Based on the results in Table 11, the parameter estimated by the ML estimator (θ̂ = 0.81958) deviates greatly from the true value θ = 2. Nevertheless, if the AIC and BIC are used to determine the best model, the best-fitting Lindley distribution appears to be the one whose parameter is estimated by the ML estimator. This further supports our claim that the AIC and BIC always favor the ML estimator even when outliers are present in the data. In contrast, the K-S statistic and the p-value of the K-S test correctly identify the best method for estimating the parameter θ, namely PITS (85% ARE). Since the AIC and BIC provide biased measures for goodness-of-fit assessment, in this study the best Lindley model is determined based only on the smallest K-S statistic and the highest p-value of the K-S test.
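The goodness-of-fit computations can be sketched in R as follows (our illustration; dlindley() and plindley() are the Section 2 sketches, and the AIC/BIC penalties reflect the single estimated parameter):

gof_lindley <- function(x, theta) {
  ks <- ks.test(x, function(q) plindley(q, theta))  # K-S test against fitted CDF
  ll <- sum(log(dlindley(x, theta)))                # Lindley log-likelihood
  c(KS = unname(ks$statistic), p.value = ks$p.value,
    AIC = -2*ll + 2, BIC = -2*ll + log(length(x)))
}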
As mentioned above, based on the smallest K-S statistic and the highest p-value of the K-S test, the PITS estimator provides a better estimation of the Lindley parameter than the ML, OLS and WLS estimators. This result is supported by the fitted PDFs shown in Figure 3. In addition, given the substantial increase in the p-value of the K-S test, the fits of the Lindley distribution for Data Sets 2, 3 and 4 improve significantly when the PITS estimator, rather than the ML estimator, is used to estimate the parameter θ. This is due to the presence of outliers far beyond the rest of the data, as shown in Figure 3b-d. The OLS and WLS estimators are also found to be quite reliable for estimating the parameter θ despite the presence of outliers in the data.
Since the reliability measures based on the Lindley distribution depend on parameter θ, it is important to employ the most suitable method for estimating parameter θ. Based on its application to real data sets, we have shown that the proposed PITS estimator is a viable alternative for estimating the parameter of the Lindley distribution especially when outliers are present in the data.

8. Conclusions

In this paper, we proposed a new robust and efficient estimator of the parameter of the Lindley distribution based on the PITS. The advantage of the PITS estimator is that it is conceptually simple and easy to compute. An assessment of the robustness of the PITS estimator based on the BP and IF revealed that it has a high BP and a bounded IF, which means that it offers reasonable protection against outliers. In a simulation study, we compared the performance of the PITS estimator with those of several well-known estimators, namely ML, OLS and WLS. The simulation results indicated that the PITS estimator performs similarly to the ML estimator in the absence of outliers and outperforms all the other methods in the presence of outliers. We also note that the OLS and WLS estimators are quite robust and outperformed the ML estimator in the presence of outliers. The parameter of the Lindley distribution was then estimated for four real data sets. The results demonstrated that the PITS estimator provides a better fit of this model than the other methods in terms of the smallest K-S statistic and the highest p-value of the K-S test. Finally, all abbreviations are listed in the Abbreviations section and the R commands for the PITS estimator are available in Appendix B.

Author Contributions

Methodology and writing, M.A.M.S.; supervision, review and editing, N.M.; programming, review and editing, M.H.A.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the Universiti Kebangsaan Malaysia [grant number DIP-2018-038].

Acknowledgments

The authors would like to thank the editor for the time spent reviewing our manuscript. The authors would also like to thank the reviewers for their careful and insightful review.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

AIC	Akaike information criterion
ARE	Asymptotic relative efficiency
BIC	Bayesian information criterion
BP	Breakdown point
CDF	Cumulative distribution function
IF	Influence function
K-S	Kolmogorov-Smirnov
LBP	Lower breakdown point
ML	Maximum likelihood
MOM	Method of moments
MTTF	Mean time to failure
OLS	Ordinary least-squares
PDF	Probability density function
PITS	Probability integral transform statistic
RRMSE	Relative root mean square error
UBP	Upper breakdown point
WLS	Weighted least-squares

Appendix A. Real Data Sets

Data Set 1 [36] (p. 309):
5, 1, 21, 31, 46, 75, 98, 122, 145, 165, 195, 224, 245, 293, 321, 330, 350, 420.
Data Set 2 [37,38] (p. 415, p. 169):
12.20, 23.56, 23.74, 25.87, 31.98, 37, 41.35, 47.38, 55.46, 58.36, 63.47, 68.46, 78.26, 74.47, 81.43, 84, 92, 94, 110, 112, 119, 127, 130, 133, 140, 146, 155, 159, 173, 179, 194, 195, 209, 249, 281, 319, 339, 432, 469, 519, 633, 725, 817, 1776.
Data Set 3 [5] (p. 231):
0.08, 2.09, 3.48, 4.87, 6.94, 8.66, 13.11, 23.63, 0.20, 2.23, 3.52, 4.98, 6.97, 9.02, 13.29, 0.40, 2.26, 3.57, 5.06, 7.09, 9.22, 13.80, 25.74, 0.50, 2.46, 3.64, 5.09, 7.26, 9.47, 14.24, 25.82, 0.51, 2.54, 3.70, 5.17, 7.28, 9.74, 14.76, 26.31, 0.81, 2.62, 3.82, 5.32, 7.32, 10.06, 14.77, 32.15, 2.64, 3.88, 5.32, 7.39, 10.34, 14.83, 34.26, 0.90, 2.69, 4.18, 5.34, 7.59, 10.66, 15.96, 36.66, 1.05, 2.69, 4.23, 5.41, 7.62, 10.75, 16.62, 43.01, 1.19, 2.75, 4.26, 5.41, 7.63, 17.12, 46.12, 1.26, 2.83, 4.33, 5.49, 7.66, 11.25, 17.14, 79.05, 1.35, 2.87, 5.62, 7.87, 11.64, 17.36, 1.40, 3.02, 4.34, 5.71, 7.93, 11.79, 18.10, 1.46, 4.40, 5.85, 8.26, 11.98, 19.13, 1.76, 3.25, 4.50, 6.25, 8.37, 12.02, 2.02, 3.31, 4.51, 6.54, 8.53, 12.03, 20.28, 2.02, 3.36, 6.76, 12.07, 21.73, 2.07, 3.36, 6.93, 8.65, 12.63, 22.69.
Data Set 4 [39] (p. 2045):
8, 4, 3, 30, 15, 54, 24, 7, 4, 21, 5, 7, 3, 12, 26, 2, 61, 9, 11, 26, 12, 20, 60, 4, 24, 2, 12, 9, 36, 14, 12, 37, 35, 21, 11, 7, 2, 7, 33, 13, 25, 33, 11, 26, 31, 13, 26, 12, 22, 9, 21, 4, 8, 10, 2, 48, 30, 17, 6, 7, 15, 6, 12, 19, 13, 15, 5, 10, 7, 22, 26, 15, 55, 7, 5, 9, 6, 11, 10, 26, 24, 37, 7, 3, 16, 26, 15, 9, 16, 13, 11, 7, 2, 9, 10, 10, 20, 9, 7, 17, 19, 26, 7, 2, 11, 7, 8, 15, 6, 4, 21, 5, 13, 13, 37, 2, 8, 7, 16, 11, 15, 25, 8, 3, 10, 21, 10, 11, 4, 29, 28, 13, 10, 15, 20, 60, 12, 10, 3, 51, 17, 31, 4, 5, 11, 9, 30, 17, 26, 5, 40, 74, 14, 16, 33, 23, 19, 3, 89, 14, 20, 48, 26, 13, 12, 10, 10, 15, 14, 5, 23, 36, 6, 5, 3, 28, 28, 23, 12, 3, 4, 68, 10, 4, 30, 8, 6, 23, 14, 14, 1, 1, 16, 80, 1, 14, 18, 30, 17, 26, 5, 50, 17, 14, 15, 33, 23, 17, 9, 3, 59, 40, 27, 40, 3, 14, 87, 16, 14, 14, 11, 11, 32, 24, 15, 18, 31, 2, 8, 11, 17, 7, 48, 1, 25, 25, 77, 7, 2, 6, 2, 32, 12, 17, 19, 13, 1, 23, 20, 16, 46, 10, 14, 2, 5, 35, 9, 18, 2, 50, 7, 40, 22, 46, 19, 31, 13, 15, 26, 31, 5, 26, 1, 25, 46, 94, 9, 11, 12, 27, 12, 15, 31, 10, 30, 16, 14, 14, 49, 22, 17, 22, 7, 17, 4, 17, 13, 5, 33, 27.

Appendix B. R Commands for PITS Estimator

### PITS Estimator ###
# flin() evaluates the left-hand side of the PITS estimating equation,
# H_{n,tau}(theta) - 1/(tau + 1); its root in theta is the PITS estimate.
flin <- function(theta, data, tau){
  n <- length(data)
  fx <- (sum(((1 + ((theta*data)/(1 + theta)))*exp(-theta*data))^tau)/n) - (1/(tau + 1))
  return(fx)
}
# Solve flin(theta) = 0 with uniroot() (a bracketing root finder);
# a and b are the lower and upper ends of the search interval.
pits <- function(data, tau, a, b){
  theta <- uniroot(flin, interval = c(a, b), data = data, tau = tau)$root
  return(theta)
}
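For example (our own illustration), fitting Data Set 1 with the PITS estimator at 75% ARE (τ = 1.00 by Table 1) should reproduce the estimate reported in Table 10:

data1 <- c(5, 1, 21, 31, 46, 75, 98, 122, 145, 165, 195,
           224, 245, 293, 321, 330, 350, 420)
pits(data1, tau = 1.00, a = 1e-6, b = 1)  # approximately 0.01180 (Table 10)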
		

References

  1. IEEE. IEEE Standard Computer Dictionary: A Compilation of IEEE Standard Computer Glossaries; IEEE Std 610; IEEE Press: New York, NY, USA, 1991; pp. 1–217.
  2. Murthy, D.N.P.; Rausand, M.; Østerås, T. Product Reliability: Specification and Performance; Springer: London, UK, 2008.
  3. Meeker, W.Q.; Escobar, L.A. Statistical Methods for Reliability Data; John Wiley & Sons: New York, NY, USA, 2014.
  4. Blischke, W.R.; Murthy, D.N.P. Reliability: Modeling, Prediction, and Optimization; John Wiley & Sons: New York, NY, USA, 2000.
  5. Lee, E.T.; Wang, J. Statistical Methods for Survival Data Analysis, 3rd ed.; John Wiley & Sons: New York, NY, USA, 2003; Volume 476.
  6. Ghitany, M.E.; Atieh, B.; Nadarajah, S. Lindley distribution and its application. Math. Comput. Simul. 2008, 78, 493–506.
  7. Krishna, H.; Kumar, K. Reliability estimation in Lindley distribution with progressively type II right censored sample. Math. Comput. Simul. 2011, 82, 281–294.
  8. Lindley, D.V. Fiducial distributions and Bayes’ theorem. J. R. Stat. Soc. Ser. B 1958, 20, 102–107.
  9. Nie, J.; Gui, W. Parameter estimation of Lindley distribution based on progressive type-II censored competing risks data with binomial removals. Mathematics 2019, 7, 646.
  10. Ghitany, M.E.; Alqallaf, F.; Al-Mutairi, D.K.; Husain, H.A. A two-parameter weighted Lindley distribution and its applications to survival data. Math. Comput. Simul. 2011, 81, 1190–1201.
  11. Nadarajah, S.; Bakouch, H.S.; Tahmasbi, R. A generalized Lindley distribution. Sankhya B 2011, 73, 331–359.
  12. Bakouch, H.S.; Al-Zahrani, B.M.; Al-Shomrani, A.A.; Marchi, V.A.A.; Louzada, F. An extended Lindley distribution. J. Korean Stat. Soc. 2012, 41, 75–85.
  13. Ghitany, M.E.; Al-Mutairi, D.K.; Balakrishnan, N.; Al-Enezi, L.J. Power Lindley distribution and associated inference. Comput. Stat. Data Anal. 2013, 64, 20–33.
  14. Oluyede, B.O.; Yang, T. A new class of generalized Lindley distributions with applications. J. Stat. Comput. Simul. 2015, 85, 2072–2100.
  15. Ashour, S.K.; Eltehiwy, M.A. Exponentiated power Lindley distribution. J. Adv. Res. 2015, 6, 895–905.
  16. Asgharzadeh, A.; Bakouch, H.S.; Nadarajah, S.; Sharafi, F. A new weighted Lindley distribution with application. Braz. J. Probab. Stat. 2016, 30, 1–27.
  17. Kemaloglu, S.A.; Yilmaz, M. Transmuted two-parameter Lindley distribution. Commun. Stat. Theory Methods 2017, 46, 11866–11879.
  18. MirMostafaee, S.M.T.K.; Alizadeh, M.; Altun, E.; Nadarajah, S. The exponentiated generalized power Lindley distribution: Properties and applications. Appl. Math. 2019, 34, 127–148.
  19. Barnett, V.; Lewis, T. Outliers in Statistical Data, 3rd ed.; Wiley: New York, NY, USA, 1994.
  20. Huber, S. (Non-)robustness of maximum likelihood estimators for operational risk severity distributions. Quant. Financ. 2010, 10, 871–882.
  21. Ahmed, E.S.; Volodin, A.I.; Hussein, A.A. Robust weighted likelihood estimation of exponential parameters. IEEE Trans. Reliab. 2005, 54, 389–395.
  22. Shahriari, H.; Radfar, E.; Samimi, Y. Robust estimation of systems reliability. Qual. Technol. Quant. Manag. 2017, 14, 310–324.
  23. Boudt, K.; Caliskan, D.; Croux, C. Robust explicit estimators of Weibull parameters. Metrika 2011, 73, 187–209.
  24. Adatia, A. Robust estimators of the 2-parameter gamma distribution. IEEE Trans. Reliab. 1988, 37, 234–238.
  25. Marazzi, A.; Ruffieux, C. Implementing M-estimators of the gamma distribution. In Robust Statistics, Data Analysis, and Computer Intensive Methods: In Honor of Peter Huber’s 60th Birthday; Rieder, H., Ed.; Springer: New York, NY, USA, 1996; pp. 277–297.
  26. Clarke, B.R.; McKinnon, P.L.; Riley, G. A fast robust method for fitting gamma distributions. Stat. Pap. 2012, 53, 1001–1014.
  27. Serfling, R. Efficient and robust fitting of lognormal distributions. N. Am. Actuar. J. 2002, 6, 95–109.
  28. Quesenberry, C.P. Probability integral transformations. In Encyclopedia of Statistical Sciences; John Wiley & Sons: New York, NY, USA, 2004; Volume 10.
  29. Finkelstein, M.; Tucker, H.G.; Veeh, J.A. Pareto tail index estimation revisited. N. Am. Actuar. J. 2006, 10, 1–10.
  30. Safari, M.A.M.; Masseran, N.; Ibrahim, K.; Hussain, S.I. A robust and efficient estimator for the tail index of inverse Pareto distribution. Phys. A Stat. Mech. Its Appl. 2019, 517, 431–439.
  31. Safari, M.A.M.; Masseran, N.; Ibrahim, K.; AL-Dhurafi, N.A. The power-law distribution for the income of poor households. Phys. A Stat. Mech. Its Appl. 2020, 557, 124893.
  32. Jodrá, P. Computer generation of random variables with Lindley or Poisson–Lindley distribution via the Lambert W function. Math. Comput. Simul. 2010, 81, 851–859.
  33. Huber, P.J. Robust Statistics; John Wiley & Sons: New York, NY, USA, 1981.
  34. Hampel, F.R.; Ronchetti, E.M.; Rousseeuw, P.J.; Stahel, W.A. Robust Statistics: The Approach Based on Influence Functions; John Wiley & Sons: New York, NY, USA, 1986.
  35. Maronna, R.A.; Martin, R.D.; Yohai, V.J.; Salibián-Barrera, M. Robust Statistics: Theory and Methods (with R), 2nd ed.; John Wiley & Sons: Chichester, UK, 2019.
  36. Wang, F.K. A new model with bathtub-shaped failure rate using an additive Burr XII distribution. Reliab. Eng. Syst. Saf. 2000, 70, 305–312.
  37. Efron, B. Logistic regression, survival analysis, and the Kaplan–Meier curve. J. Am. Stat. Assoc. 1988, 83, 414–425.
  38. Sharma, V.K.; Singh, S.K.; Singh, U.; Agiwal, V. The inverse Lindley distribution: A stress–strength reliability model with application to head and neck cancer data. J. Ind. Prod. Eng. 2015, 32, 162–173.
  39. Adamu, P.I.; Oguntunde, P.E.; Okagbue, H.I.; Agboola, O.O. Statistical data analysis of cancer incidences in insurgency affected states in Nigeria. Data Brief 2018, 18, 2029–2046.
  40. Bruffaerts, C.; Verardi, V.; Vermandele, C. A generalized boxplot for skewed and heavy-tailed distributions. Stat. Probab. Lett. 2014, 95, 110–117.
Figure 1. Sensitivity curves (SCs) of the PITS estimator with several AREs for (a) θ = 0.5, (b) θ = 1, (c) θ = 2 and (d) θ = 3.
Figure 2. Generalized boxplots for (a) Data Set 1, (b) Data Set 2, (c) Data Set 3 and (d) Data Set 4.
Figure 3. Fitted Lindley densities, by several different estimators, on histograms of (a) Data Set 1, (b) Data Set 2, (c) Data Set 3 and (d) Data Set 4.
Table 1. Asymptotic relative efficiency (ARE) values of the probability integral transform statistic (PITS) estimator and corresponding tuning parameter τ values.

| τ | 0.16 | 0.29 | 0.46 | 0.63 | 0.81 | 1.00 | 1.21 | 1.45 | 1.72 | 2.04 | 2.41 |
|---|------|------|------|------|------|------|------|------|------|------|------|
| ARE (%) | 98 | 95 | 90 | 85 | 80 | 75 | 70 | 65 | 60 | 55 | 50 |
Table 2. Upper breakdown points (UBPs) and lower breakdown points (LBPs) of the PITS estimator for different ARE levels.

| ARE (%) | 98 | 95 | 90 | 85 | 80 | 75 | 70 | 65 | 60 | 55 | 50 |
|---------|----|----|----|----|----|----|----|----|----|----|----|
| UBP | 0.14 | 0.22 | 0.32 | 0.39 | 0.45 | 0.50 | 0.55 | 0.59 | 0.63 | 0.67 | 0.71 |
| LBP | 0.86 | 0.78 | 0.68 | 0.61 | 0.55 | 0.50 | 0.45 | 0.41 | 0.37 | 0.33 | 0.29 |
Table 3. Relative root mean square error (RRMSE) results for estimations of parameter θ with n = 30 and m = 0, 1, 3, 5.

| θ | m | ML | OLS | WLS | PITS (98% ARE) | PITS (90% ARE) | PITS (80% ARE) | PITS (70% ARE) | PITS (60% ARE) |
|---|---|----|-----|-----|------|------|------|------|------|
| 0.5 | 0 | 14.02 | 15.42 | 14.94 | 14.01 | 14.24 | 14.82 | 15.61 | 16.66 |
| 0.5 | 1 | 38.97 | 15.51 | 15.08 | 18.71 | 14.48 | 14.55 | 15.18 | 16.14 |
| 0.5 | 3 | 64.24 | 17.55 | 17.83 | 46.07 | 24.03 | 19.23 | 17.76 | 17.60 |
| 0.5 | 5 | 74.95 | 22.53 | 22.81 | 67.20 | 38.26 | 27.91 | 23.87 | 21.70 |
| 1 | 0 | 14.56 | 16.17 | 15.62 | 14.56 | 14.91 | 15.64 | 16.60 | 17.89 |
| 1 | 1 | 39.91 | 16.05 | 15.75 | 19.19 | 15.03 | 15.27 | 16.10 | 17.32 |
| 1 | 3 | 64.71 | 18.25 | 18.59 | 46.66 | 24.66 | 19.85 | 18.42 | 18.40 |
| 1 | 5 | 75.36 | 23.46 | 23.86 | 67.58 | 39.14 | 28.70 | 24.66 | 22.60 |
| 2 | 0 | 15.52 | 17.37 | 16.69 | 15.52 | 16.01 | 16.89 | 18.04 | 19.58 |
| 2 | 1 | 41.21 | 17.47 | 16.83 | 20.13 | 15.97 | 16.37 | 17.37 | 18.82 |
| 2 | 3 | 65.80 | 19.58 | 19.94 | 47.84 | 25.92 | 21.05 | 19.60 | 19.70 |
| 2 | 5 | 76.22 | 25.00 | 25.49 | 68.11 | 40.61 | 30.08 | 25.99 | 24.06 |
| 3 | 0 | 16.27 | 18.26 | 17.50 | 16.26 | 16.78 | 17.76 | 19.07 | 20.82 |
| 3 | 1 | 42.41 | 18.35 | 17.64 | 20.44 | 16.34 | 16.88 | 18.07 | 19.75 |
| 3 | 3 | 66.83 | 20.03 | 20.31 | 48.52 | 26.36 | 21.37 | 20.10 | 20.23 |
| 3 | 5 | 76.60 | 25.71 | 26.28 | 68.37 | 41.44 | 30.77 | 26.63 | 24.69 |
The best method for each case is the one with the smallest RRMSE.
Table 4. RRMSE results for estimations of parameter θ with n = 50 and m = 0, 1, 3, 5.

| θ | m | ML | OLS | WLS | PITS (98% ARE) | PITS (90% ARE) | PITS (80% ARE) | PITS (70% ARE) | PITS (60% ARE) |
|---|---|----|-----|-----|------|------|------|------|------|
| 0.5 | 0 | 10.63 | 11.66 | 11.30 | 10.63 | 10.81 | 11.24 | 11.82 | 12.60 |
| 0.5 | 1 | 29.44 | 11.73 | 11.41 | 12.97 | 10.94 | 11.13 | 11.64 | 12.38 |
| 0.5 | 3 | 52.62 | 12.76 | 12.94 | 28.73 | 15.72 | 13.37 | 12.80 | 12.90 |
| 0.5 | 5 | 64.76 | 15.52 | 15.91 | 45.84 | 23.49 | 17.93 | 15.95 | 15.03 |
| 1 | 0 | 11.13 | 12.36 | 11.89 | 11.14 | 11.42 | 11.96 | 12.66 | 13.58 |
| 1 | 1 | 30.03 | 12.45 | 12.05 | 13.21 | 11.37 | 11.72 | 12.38 | 13.27 |
| 1 | 3 | 53.48 | 13.30 | 13.48 | 29.30 | 16.11 | 13.84 | 13.39 | 13.73 |
| 1 | 5 | 65.42 | 16.04 | 16.51 | 46.39 | 23.90 | 18.31 | 16.42 | 15.42 |
| 2 | 0 | 11.66 | 13.14 | 12.59 | 11.68 | 12.10 | 12.77 | 13.61 | 14.70 |
| 2 | 1 | 31.98 | 13.22 | 12.71 | 13.99 | 12.10 | 12.54 | 13.32 | 14.36 |
| 2 | 3 | 55.12 | 14.27 | 14.47 | 30.48 | 17.08 | 14.78 | 14.40 | 14.94 |
| 2 | 5 | 66.69 | 17.29 | 17.84 | 47.71 | 25.26 | 19.53 | 17.61 | 16.73 |
| 3 | 0 | 12.23 | 13.93 | 13.28 | 12.25 | 12.77 | 13.55 | 14.51 | 15.73 |
| 3 | 1 | 32.77 | 14.02 | 13.39 | 14.42 | 12.64 | 13.21 | 14.11 | 15.30 |
| 3 | 3 | 55.81 | 14.94 | 15.11 | 31.06 | 17.62 | 15.38 | 15.05 | 15.57 |
| 3 | 5 | 67.58 | 17.88 | 18.45 | 48.36 | 25.87 | 20.07 | 18.17 | 17.05 |
The best method for each case is the one with the smallest RRMSE.
Table 5. RRMSE results for estimations of parameter θ with n = 70 and m = 0, 1, 3, 5.

| θ | m | ML | OLS | WLS | PITS (98% ARE) | PITS (90% ARE) | PITS (80% ARE) | PITS (70% ARE) | PITS (60% ARE) |
|---|---|----|-----|-----|------|------|------|------|------|
| 0.5 | 0 | 8.88 | 9.80 | 9.46 | 8.89 | 9.07 | 9.44 | 9.92 | 10.57 |
| 0.5 | 1 | 55.91 | 9.87 | 9.56 | 10.69 | 9.16 | 9.37 | 9.81 | 10.43 |
| 0.5 | 3 | 79.24 | 10.52 | 10.61 | 23.47 | 12.35 | 10.81 | 10.56 | 10.91 |
| 0.5 | 5 | 86.74 | 12.21 | 12.55 | 39.28 | 17.67 | 13.69 | 12.46 | 11.91 |
| 1 | 0 | 9.29 | 10.36 | 9.95 | 9.30 | 9.56 | 10.01 | 10.57 | 11.30 |
| 1 | 1 | 56.68 | 10.42 | 10.04 | 11.11 | 9.62 | 9.91 | 10.44 | 11.14 |
| 1 | 3 | 79.67 | 11.06 | 11.15 | 24.22 | 12.85 | 11.31 | 11.10 | 11.51 |
| 1 | 5 | 87.03 | 12.98 | 13.29 | 40.34 | 18.36 | 14.31 | 13.09 | 12.79 |
| 2 | 0 | 9.72 | 11.00 | 10.51 | 9.74 | 10.10 | 10.68 | 11.41 | 12.34 |
| 2 | 1 | 58.04 | 11.06 | 10.61 | 11.59 | 10.11 | 10.53 | 11.21 | 12.12 |
| 2 | 3 | 80.51 | 11.69 | 11.79 | 25.33 | 13.48 | 11.94 | 11.74 | 12.39 |
| 2 | 5 | 87.30 | 13.88 | 14.13 | 41.95 | 19.27 | 15.08 | 13.88 | 13.68 |
| 3 | 0 | 10.22 | 11.53 | 10.99 | 10.23 | 10.60 | 11.22 | 11.99 | 12.98 |
| 3 | 1 | 59.02 | 11.60 | 11.08 | 11.92 | 10.50 | 10.99 | 11.73 | 12.70 |
| 3 | 3 | 81.13 | 12.18 | 12.26 | 26.09 | 13.93 | 12.40 | 12.22 | 12.99 |
| 3 | 5 | 87.76 | 14.30 | 14.64 | 43.13 | 19.87 | 15.58 | 14.38 | 14.21 |
The best method for each case is the one with the smallest RRMSE.
Table 6. RRMSE results for estimations of parameter θ with n = 100 and ε = 0%, 1%, 5%, 10%.

| θ | ε (%) | ML | OLS | WLS | PITS (98% ARE) | PITS (90% ARE) | PITS (80% ARE) | PITS (70% ARE) | PITS (60% ARE) |
|---|-------|----|-----|-----|------|------|------|------|------|
| 0.5 | 0 | 7.34 | 8.15 | 7.86 | 7.35 | 7.51 | 7.85 | 8.28 | 8.84 |
| 0.5 | 1 | 18.48 | 8.19 | 7.92 | 8.23 | 7.56 | 7.83 | 8.23 | 8.78 |
| 0.5 | 5 | 48.82 | 9.63 | 9.89 | 23.91 | 12.64 | 10.38 | 9.66 | 9.75 |
| 0.5 | 10 | 65.36 | 13.77 | 14.27 | 45.56 | 22.86 | 16.75 | 14.31 | 13.03 |
| 1 | 0 | 7.70 | 8.56 | 8.23 | 7.71 | 7.92 | 8.30 | 8.78 | 9.41 |
| 1 | 1 | 19.23 | 8.60 | 8.28 | 8.57 | 7.95 | 8.25 | 8.70 | 9.32 |
| 1 | 5 | 49.74 | 10.05 | 10.35 | 24.53 | 13.06 | 10.78 | 10.10 | 10.22 |
| 1 | 10 | 66.05 | 14.52 | 15.12 | 46.34 | 23.63 | 17.41 | 14.95 | 13.60 |
| 2 | 0 | 8.21 | 9.26 | 8.85 | 8.22 | 8.54 | 9.01 | 9.59 | 10.32 |
| 2 | 1 | 20.65 | 9.30 | 8.90 | 9.05 | 8.54 | 8.93 | 9.49 | 10.21 |
| 2 | 5 | 51.52 | 10.88 | 11.20 | 25.60 | 13.86 | 11.56 | 10.98 | 11.13 |
| 2 | 10 | 67.26 | 15.63 | 16.31 | 47.55 | 24.80 | 18.45 | 15.96 | 14.68 |
| 3 | 0 | 8.61 | 9.72 | 9.25 | 8.63 | 8.95 | 9.46 | 10.10 | 10.90 |
| 3 | 1 | 21.45 | 9.75 | 9.29 | 9.36 | 8.88 | 9.33 | 9.95 | 10.75 |
| 3 | 5 | 52.70 | 11.26 | 11.58 | 26.22 | 14.27 | 11.94 | 11.34 | 11.58 |
| 3 | 10 | 68.16 | 16.23 | 16.97 | 48.34 | 25.53 | 19.06 | 16.52 | 15.28 |
The best method for each case is the one with the smallest RRMSE.
Table 7. RRMSE results for estimations of parameter θ with n = 300 and ε = 0%, 1%, 5%, 10%.

| θ | ε (%) | ML | OLS | WLS | PITS (98% ARE) | PITS (90% ARE) | PITS (80% ARE) | PITS (70% ARE) | PITS (60% ARE) |
|---|-------|----|-----|-----|------|------|------|------|------|
| 0.5 | 0 | 4.25 | 4.75 | 4.56 | 4.26 | 4.37 | 4.57 | 4.81 | 5.11 |
| 0.5 | 1 | 17.25 | 4.81 | 4.71 | 6.09 | 4.70 | 4.69 | 4.86 | 5.13 |
| 0.5 | 5 | 49.24 | 7.41 | 7.85 | 23.64 | 11.57 | 8.67 | 7.47 | 7.07 |
| 0.5 | 10 | 65.80 | 12.63 | 13.18 | 45.57 | 22.60 | 16.06 | 13.23 | 11.58 |
| 1 | 0 | 4.40 | 4.86 | 4.67 | 4.40 | 4.51 | 4.72 | 4.98 | 5.33 |
| 1 | 1 | 18.14 | 4.92 | 4.83 | 6.26 | 4.84 | 4.83 | 5.02 | 5.33 |
| 1 | 5 | 50.26 | 7.77 | 8.28 | 24.27 | 11.97 | 9.00 | 7.82 | 7.38 |
| 1 | 10 | 66.48 | 13.31 | 13.97 | 46.26 | 23.28 | 16.64 | 13.77 | 12.10 |
| 2 | 0 | 4.68 | 5.24 | 5.00 | 4.69 | 4.84 | 5.10 | 5.42 | 5.82 |
| 2 | 1 | 19.58 | 5.34 | 5.22 | 6.73 | 5.26 | 5.26 | 5.49 | 5.84 |
| 2 | 5 | 52.02 | 8.50 | 9.08 | 25.36 | 12.79 | 9.72 | 8.54 | 8.07 |
| 2 | 10 | 67.74 | 14.49 | 15.27 | 47.57 | 24.59 | 17.77 | 14.82 | 13.10 |
| 3 | 0 | 4.83 | 5.45 | 5.18 | 4.84 | 5.02 | 5.30 | 5.64 | 6.08 |
| 3 | 1 | 20.30 | 5.51 | 5.36 | 6.85 | 5.39 | 5.42 | 5.68 | 6.07 |
| 3 | 5 | 53.22 | 8.76 | 9.39 | 25.93 | 13.13 | 9.99 | 8.80 | 8.34 |
| 3 | 10 | 68.63 | 14.99 | 15.85 | 48.30 | 25.26 | 18.31 | 15.29 | 13.53 |
The best method for each case is the one with the smallest RRMSE.
Table 8. RRMSE results for estimations of parameter θ with n = 500 and ε = 0%, 1%, 5%, 10%.

| θ | ε (%) | ML | OLS | WLS | PITS (98% ARE) | PITS (90% ARE) | PITS (80% ARE) | PITS (70% ARE) | PITS (60% ARE) |
|---|-------|----|-----|-----|------|------|------|------|------|
| 0.5 | 0 | 3.25 | 3.62 | 3.48 | 3.26 | 3.34 | 3.48 | 3.67 | 3.91 |
| 0.5 | 1 | 16.98 | 3.75 | 3.71 | 5.53 | 3.80 | 3.71 | 3.79 | 3.98 |
| 0.5 | 5 | 49.32 | 6.89 | 7.37 | 23.56 | 11.34 | 8.29 | 7.00 | 6.41 |
| 0.5 | 10 | 65.89 | 12.38 | 12.93 | 45.53 | 22.51 | 15.90 | 12.99 | 11.21 |
| 1 | 0 | 3.36 | 3.75 | 3.59 | 3.36 | 3.46 | 3.63 | 3.84 | 4.11 |
| 1 | 1 | 17.89 | 3.91 | 3.87 | 5.78 | 4.01 | 3.88 | 3.98 | 4.19 |
| 1 | 5 | 50.46 | 7.31 | 7.89 | 24.28 | 11.82 | 8.69 | 7.39 | 6.78 |
| 1 | 10 | 66.57 | 13.14 | 13.82 | 46.32 | 23.29 | 16.57 | 13.61 | 11.80 |
| 2 | 0 | 3.55 | 4.02 | 3.82 | 3.57 | 3.70 | 3.90 | 4.14 | 4.45 |
| 2 | 1 | 19.29 | 4.19 | 4.13 | 6.09 | 4.26 | 4.15 | 4.28 | 4.52 |
| 2 | 5 | 52.23 | 7.88 | 8.54 | 25.28 | 12.49 | 9.24 | 7.90 | 7.27 |
| 2 | 10 | 67.82 | 14.16 | 14.97 | 47.49 | 24.46 | 17.55 | 14.51 | 12.67 |
| 3 | 0 | 3.71 | 4.23 | 4.02 | 3.73 | 3.89 | 4.12 | 4.39 | 4.73 |
| 3 | 1 | 20.11 | 4.40 | 4.33 | 6.32 | 4.49 | 4.35 | 4.52 | 4.79 |
| 3 | 5 | 53.42 | 8.25 | 8.96 | 25.95 | 12.95 | 9.62 | 8.27 | 7.61 |
| 3 | 10 | 68.70 | 14.81 | 15.68 | 48.30 | 25.25 | 18.21 | 15.10 | 13.20 |
The best method for each case is the one with the smallest RRMSE.
Table 9. Descriptive statistics for Data Sets 1, 2, 3 and 4.

| Set | Sample Size (n) | Mean | Median | Min | Max | Std. Deviation | Skewness | No. of Outliers (Proportion) |
|-----|-----------------|------|--------|-----|-----|----------------|----------|------------------------------|
| Set 1 | 18 | 171.50 | 155.00 | 1.00 | 420.00 | 132.27 | 0.28 | 1 (5.55%) |
| Set 2 | 44 | 223.48 | 128.50 | 12.20 | 1776.00 | 305.43 | 3.27 | 1 (2.27%) |
| Set 3 | 128 | 9.37 | 6.40 | 0.08 | 79.05 | 10.51 | 3.25 | 6 (4.69%) |
| Set 4 | 300 | 18.44 | 14.00 | 1.00 | 94.00 | 15.89 | 1.96 | 17 (5.67%) |
Table 10. Parameter estimates and goodness-of-fit results for the Lindley distributions of Data Sets 1, 2, 3 and 4.

| Data | Method | Estimated Parameter (θ̂) | K-S Statistic | p-Value | AIC | BIC |
|------|--------|--------------------------|---------------|---------|-----|-----|
| Set 1 | ML | 0.01160 | 0.1737 | 0.5895 | 230.7422 | 231.6326 |
| Set 1 | OLS | 0.01115 | 0.1802 | 0.5434 | 230.7963 | 231.6867 |
| Set 1 | WLS | 0.01127 | 0.1786 | 0.5550 | 230.7720 | 231.6623 |
| Set 1 | PITS (75% ARE) | 0.01180 | 0.1707 | 0.6112 | 230.7532 | 231.6435 |
| Set 1 | PITS (70% ARE) | 0.01214 | 0.1655 | 0.6483 | 230.8193 | 231.7097 |
| Set 1 | PITS (65% ARE) | 0.01261 | 0.1641 | 0.6583 | 231.0041 | 231.8945 |
| Set 1 | PITS (60% ARE) | 0.01324 | 0.1866 | 0.4998 | 231.4050 | 232.2954 |
| Set 2 | ML | 0.00891 | 0.2194 | 0.0243 | 581.1628 | 582.9470 |
| Set 2 | OLS | 0.01325 | 0.1374 | 0.3453 | 597.0666 | 598.8508 |
| Set 2 | WLS | 0.01333 | 0.1380 | 0.3404 | 597.5451 | 599.3293 |
| Set 2 | PITS (95% ARE) | 0.01035 | 0.1510 | 0.2425 | 583.2425 | 585.0267 |
| Set 2 | PITS (90% ARE) | 0.01117 | 0.1225 | 0.4864 | 586.0117 | 587.7959 |
| Set 2 | PITS (85% ARE) | 0.01178 | 0.1220 | 0.4916 | 588.7263 | 590.5104 |
| Set 2 | PITS (80% ARE) | 0.01227 | 0.1280 | 0.4306 | 591.2379 | 593.0220 |
| Set 3 | ML | 0.19605 | 0.1164 | 0.0623 | 841.0598 | 843.9118 |
| Set 3 | OLS | 0.23028 | 0.0597 | 0.7509 | 847.9594 | 850.8115 |
| Set 3 | WLS | 0.22761 | 0.0580 | 0.7826 | 846.9738 | 849.8258 |
| Set 3 | PITS (80% ARE) | 0.22368 | 0.0555 | 0.8247 | 845.6468 | 848.4988 |
| Set 3 | PITS (75% ARE) | 0.22635 | 0.0571 | 0.7977 | 846.5308 | 849.3828 |
| Set 3 | PITS (70% ARE) | 0.22852 | 0.0586 | 0.7718 | 847.2994 | 850.1514 |
| Set 3 | PITS (65% ARE) | 0.23032 | 0.0598 | 0.7504 | 847.8874 | 850.7394 |
| Set 4 | ML | 0.10338 | 0.0772 | 0.0558 | 2326.7150 | 2330.4190 |
| Set 4 | OLS | 0.10998 | 0.0439 | 0.6099 | 2329.0490 | 2332.7520 |
| Set 4 | WLS | 0.10922 | 0.0476 | 0.5049 | 2328.5550 | 2332.2590 |
| Set 4 | PITS (75% ARE) | 0.10929 | 0.0472 | 0.5152 | 2328.6020 | 2332.3060 |
| Set 4 | PITS (70% ARE) | 0.10973 | 0.0451 | 0.5753 | 2328.8820 | 2332.5860 |
| Set 4 | PITS (65% ARE) | 0.11012 | 0.0432 | 0.6306 | 2329.1500 | 2332.8540 |
| Set 4 | PITS (60% ARE) | 0.11039 | 0.0419 | 0.6673 | 2329.3420 | 2333.0460 |
The best method is the one with the smallest K-S statistic and the highest p-value of the K-S test.
Table 11. Parameter estimates and goodness-of-fit results for the Lindley distribution fitted to data simulated from a Lindley distribution with θ = 2, n = 100 and 5% outliers.

| Method | Estimated Parameter (θ̂) | K-S Statistic | p-Value | AIC | BIC |
|--------|--------------------------|---------------|---------|-----|-----|
| ML | 0.81958 | 0.4256 | <0.0001 | 401.3450 | 403.9502 |
| OLS | 2.06962 | 0.0618 | 0.8383 | 608.0914 | 610.6965 |
| WLS | 2.04933 | 0.0610 | 0.8505 | 603.0340 | 605.6391 |
| PITS (90% ARE) | 1.92935 | 0.0658 | 0.7784 | 573.7682 | 576.3734 |
| PITS (85% ARE) | 1.98865 | 0.0581 | 0.8875 | 588.0901 | 590.6952 |
| PITS (80% ARE) | 2.02738 | 0.0600 | 0.8638 | 597.5959 | 600.2011 |
| PITS (75% ARE) | 2.05503 | 0.0612 | 0.8471 | 604.4503 | 607.0555 |
| PITS (70% ARE) | 2.07697 | 0.0621 | 0.8340 | 609.9304 | 612.5355 |
| PITS (65% ARE) | 2.09547 | 0.0629 | 0.8232 | 614.5775 | 617.1827 |
The best method is the one with the smallest K-S statistic and the highest p-value of the K-S test.
