Three-Part Composite Pareto Modelling for Income Distribution in Malaysia

Abstract: Income distribution models can be useful for describing the economic properties of a population. In this study, three-part composite Pareto models are fitted to the income distribution in Malaysia for the years 2007, 2009, 2012, 2014, and 2016. The three-part composite Pareto models divide the population into three parts, each following a different distribution model. The lower part follows the inverse Pareto distribution, the upper part follows the Pareto distribution, and the middle part follows another unspecified distribution model. For application in income data, the use of Gaussian mixture distribution is proposed for the middle part, making the inverse Pareto-Gaussian mixture-Pareto distribution model semi-parametric. From the model, it is found that the levels of income inequality in the lower and upper income groups decrease over the period of study. Additionally, the proportion of data following the inverse Pareto distribution in the model is highly correlated with the official absolute poverty incidence.


Introduction
The research on income distribution has been around for a long time, starting from Vilfredo Pareto's observation on income in 1896. Even though the topic has been discussed at length, it is still relevant and important, as income is heavily related to the well-being of a country. For example, it is found that income inequality is highly correlated with the number of criminal activities [1,2] and countries with low income inequality have healthier citizens mentally and physically [3][4][5][6]. High income inequality between groups in a population may also cause political instability [7] and is considered one of the main causes of the racial riot in Malaysia in 1969 [8].
There have been many models proposed for describing income distribution. For example, the lognormal, Weibull, gamma, Dagum, beta distribution of the second kind, and Singh-Maddala distributions have been used to model the income distribution of the whole population. However, these distributions may not fit the upper and lower tails of the income distribution well. Dagum [9] and Singh and Maddala [10], for example, have noted that the lognormal or gamma distributions alone are not enough to describe the upper and lower tails of the income distribution. On the other hand, the Pareto distribution has been used extensively to model the upper tail of the income distribution in various countries [11][12][13][14][15], and the inverse Pareto distribution can be used for the lower tail, as both the upper and lower tails of income distribution exhibit power-law behaviour [16,17]. In general, the power-law behaviour is observed for the top 5-10% of the population [18]. In the context of household income in Malaysia, previous studies have identified the power-law behaviour in the upper tail of the income distribution [19][20][21][22]. Additionally, Safari et al. [23] have noted that the inverse Pareto distribution is a good model for lower income groups in Malaysia.
The three-part composite Pareto model can be seen as an extension of the two-part composite Pareto model introduced by Cooray and Ananda [24]. This composite model is also known as the spliced distribution in some literature [25,26]. In introducing the two-part composite Pareto model, Cooray and Ananda used the Pareto distribution for the upper tail and the lognormal distribution for the rest of the data. Since the model was introduced, there have been many advances and varieties of the two-part composite Pareto model [27][28][29][30][31][32][33][34]. As for the three-part composite Pareto model, Mendes and Lopes [35] have used a composite model with a t-distribution for the middle part and two generalized Pareto distributions for the lower and upper parts of the data. Luckstead and Devadoss [36] and Luckstead et al. [37], on the other hand, have used the inverse Pareto distribution for the lower part, the lognormal distribution for the middle part, and the Pareto distribution for the upper part to model the city size distribution in the US and India. Wiegand and Nadarajah [38] have also used the three-part composite Pareto model with the Pareto type IV distribution for the upper part and lognormal, gamma, beta Weibull, or Pareto type IV for the lower and middle parts of the data for categorizing companies based on their market value, sales, assets, and profits. The two-part composite Pareto model has been applied to income data [39], but to the authors' knowledge, the three-part composite Pareto model has not been proposed or used to describe the income distribution.
Since the Pareto distribution fits the upper tail well and the inverse Pareto distribution is suitable for the lower tail of the household income data, a model that combines these two distributions with another distribution for the middle part of the data may be useful. In this paper, we propose the use of three-part composite Pareto models for income distribution that describe the lower, middle, and upper parts of the data using separate distributions: the inverse Pareto distribution for the lower part, an unspecified distribution for the middle part, and the Pareto distribution for the upper part of the income data. For the middle part of the data, this paper proposes the use of the Gaussian mixture distribution. By combining these three distributions, the three-part composite Pareto model can divide the population into three categories: the lower, middle, and upper income groups. Further analysis of each of these categories can be performed by studying the properties of the respective distribution for the group. This approach of combining three separate distributions differs from the practice in other literature, which analyses each part separately [14,[40][41][42].
The Pareto and inverse Pareto distributions are chosen for the composite model because their properties fit the upper and lower parts of the income distribution, together with their simplicity. While there are other distributions that can be used, for example Pareto Type II-IV, Generalized Pareto, or Generalized Extreme Value distributions for the upper tail, using these distributions increases the complexity of the composite model. Additionally, the shape parameters in the Pareto and inverse Pareto distributions are useful for measuring income inequalities, as discussed in Section 2.2. The Lorenz curve and Gini index, used for measuring income inequality under the model, will also be shown. Finally, the model is applied to the household income data in Malaysia for the years 2007, 2009, 2012, 2014, and 2016.
This paper is organized as follows. Section 2 discusses the methodologies used in the study. This includes the three-part composite Pareto model, the Lorenz curve and Gini index, the semi-parametric three-part composite Pareto model, as well as the pseudo-likelihood approach used to estimate the parameters in the model. Section 3 focuses on the application of the three-part composite Pareto model to the income distribution in Malaysia. Finally, Section 4 concludes the paper.

Three-Part Composite Pareto Model
In the three-part composite Pareto (3PCP) model, the data are divided into three parts (the lower, middle, and upper parts), each following a different distribution model.
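The lower part of the data follows the inverse Pareto distribution with probability density function (PDF)

$$f_{IP}(x|\tau_1, \alpha_1) = \frac{\alpha_1 x^{\alpha_1 - 1}}{\tau_1^{\alpha_1}}, \quad 0 < x \le \tau_1, \tag{1}$$

where $\tau_1$ and $\alpha_1$ are the threshold and shape parameters of the inverse Pareto distribution, respectively. The inverse Pareto distribution is also called the power function distribution in some literature [43]. The upper part of the data follows the Pareto distribution with PDF

$$f_P(x|\tau_2, \alpha_2) = \frac{\alpha_2 \tau_2^{\alpha_2}}{x^{\alpha_2 + 1}}, \quad x > \tau_2, \tag{2}$$

where $\tau_2$ and $\alpha_2$ are the threshold and shape parameters of the Pareto distribution, respectively. Finally, the middle part of the data follows another distribution that is not specified or fixed. Figure 1 gives a graphical example of the PDF of the 3PCP model.
Let $h(x|\eta)$ and $H(x|\eta)$ be the PDF and cumulative distribution function (CDF) of the middle part of the data with parameter $\eta$, respectively, where $h$ is a PDF on $(\tau_1, \tau_2]$. Combining the three distributions for each part gives the following PDF for the 3PCP model:

$$f(x|\theta) = \begin{cases} \rho_1 f_{IP}(x|\tau_1, \alpha_1), & 0 < x \le \tau_1, \\ (1 - \rho_1 - \rho_2)\, h(x|\eta), & \tau_1 < x \le \tau_2, \\ \rho_2 f_P(x|\tau_2, \alpha_2), & x > \tau_2, \end{cases} \tag{3}$$

where $\theta$ is the collection of all parameters in the model, $\rho_1$ is the proportion of data in the lower part following the inverse Pareto distribution, and $\rho_2$ is the proportion of data in the upper part following the Pareto distribution.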
The PDF in Equation (3) indicates that there are two threshold parameters, $\tau_1$ and $\tau_2$. Any observation with a value less than $\tau_1$ follows the inverse Pareto distribution with PDF $f_{IP}(x|\tau_1, \alpha_1)$. Any observation with a value between $\tau_1$ and $\tau_2$ follows the middle part distribution with PDF $h(x|\eta)$. Lastly, any observation with a value greater than $\tau_2$ follows the Pareto distribution with PDF $f_P(x|\tau_2, \alpha_2)$. Because $f_{IP}(x|\tau_1, \alpha_1)$, $h(x|\eta)$, and $f_P(x|\tau_2, \alpha_2)$ are all PDFs, $\int_0^\infty f(x|\theta)\,dx$ must be equal to 1. The PDF in Equation (3) can also be considered as a mixture distribution, except that the component distributions do not overlap each other.
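The CDF for the 3PCP model can be calculated by integrating the PDF in Equation (3) to obtain

$$F(x|\theta) = \begin{cases} \rho_1 \left(\dfrac{x}{\tau_1}\right)^{\alpha_1}, & 0 < x \le \tau_1, \\ \rho_1 + (1 - \rho_1 - \rho_2)\, H(x|\eta), & \tau_1 < x \le \tau_2, \\ 1 - \rho_2 \left(\dfrac{\tau_2}{x}\right)^{\alpha_2}, & x > \tau_2. \end{cases} \tag{4}$$

Moreover, the quantile function for the model is

$$F^{-1}(u|\theta) = \begin{cases} \tau_1 \left(\dfrac{u}{\rho_1}\right)^{1/\alpha_1}, & 0 < u \le \rho_1, \\ H^{-1}\!\left(\dfrac{u - \rho_1}{1 - \rho_1 - \rho_2}\,\Big|\,\eta\right), & \rho_1 < u \le 1 - \rho_2, \\ \tau_2 \left(\dfrac{1 - u}{\rho_2}\right)^{-1/\alpha_2}, & 1 - \rho_2 < u < 1. \end{cases} \tag{5}$$

The overall mean when $\alpha_2 > 1$ is

$$\mu_X = \rho_1 \frac{\alpha_1 \tau_1}{\alpha_1 + 1} + (1 - \rho_1 - \rho_2) \int_{\tau_1}^{\tau_2} x\, h(x|\eta)\,dx + \rho_2 \frac{\alpha_2 \tau_2}{\alpha_2 - 1}. \tag{6}$$

If $\alpha_2 \le 1$, then the integral $\int_{\tau_2}^{\infty} x f_P(x|\tau_2, \alpha_2)\,dx$ diverges and $\mu_X = \infty$.
However, note that the PDF in Equation (3) may not be continuous or differentiable. Additional constraints are required if continuity and differentiability are desired. Equating the one-sided limits of the PDF at $\tau_1$ and $\tau_2$ and solving for the proportions, the continuity of the PDF can be achieved by setting

$$\rho_1 = \frac{\alpha_2 \tau_1 h(\tau_1|\eta)}{\alpha_1 \alpha_2 + \alpha_2 \tau_1 h(\tau_1|\eta) + \alpha_1 \tau_2 h(\tau_2|\eta)} \tag{7}$$

and

$$\rho_2 = \frac{\alpha_1 \tau_2 h(\tau_2|\eta)}{\alpha_1 \alpha_2 + \alpha_2 \tau_1 h(\tau_1|\eta) + \alpha_1 \tau_2 h(\tau_2|\eta)}. \tag{8}$$

As for the differentiability of the PDF, Equations (7) and (8) must be satisfied together with

$$\rho_1 \frac{\alpha_1 (\alpha_1 - 1)}{\tau_1^2} = (1 - \rho_1 - \rho_2)\, h'(\tau_1|\eta) \tag{9}$$

and

$$(1 - \rho_1 - \rho_2)\, h'(\tau_2|\eta) = -\rho_2 \frac{\alpha_2 (\alpha_2 + 1)}{\tau_2^2}, \tag{10}$$

where $h'(x|\eta)$ is the first derivative of $h(x|\eta)$ with respect to $x$.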
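The piecewise structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it takes the standard inverse Pareto and Pareto densities, weights the middle density by 1 - ρ1 - ρ2, and uses a uniform middle part on [τ1, τ2] purely as a stand-in for h(x|η). The function names and all numerical values are hypothetical and do not come from the paper.

```python
def continuity_weights(tau1, alpha1, tau2, alpha2, h_tau1, h_tau2):
    """Solve rho1*f_IP(tau1) = m*h(tau1) and rho2*f_P(tau2) = m*h(tau2),
    with m = 1 - rho1 - rho2, for (rho1, rho2)."""
    a = alpha1 / tau1  # inverse Pareto density evaluated at tau1
    b = alpha2 / tau2  # Pareto density evaluated at tau2
    denom = a * b + b * h_tau1 + a * h_tau2
    return b * h_tau1 / denom, a * h_tau2 / denom


def pdf_3pcp(x, tau1, alpha1, tau2, alpha2, rho1, rho2, h):
    """Piecewise density: inverse Pareto, middle part, Pareto."""
    if x <= 0:
        return 0.0
    if x <= tau1:
        return rho1 * alpha1 * x ** (alpha1 - 1) / tau1 ** alpha1
    if x <= tau2:
        return (1.0 - rho1 - rho2) * h(x)
    return rho2 * alpha2 * tau2 ** alpha2 / x ** (alpha2 + 1)


def cdf_3pcp(x, tau1, alpha1, tau2, alpha2, rho1, rho2, H):
    """Piecewise CDF obtained by integrating the density above."""
    if x <= 0:
        return 0.0
    if x <= tau1:
        return rho1 * (x / tau1) ** alpha1
    if x <= tau2:
        return rho1 + (1.0 - rho1 - rho2) * H(x)
    return 1.0 - rho2 * (tau2 / x) ** alpha2


# Hypothetical parameter values, chosen only for illustration.
TAU1, ALPHA1, TAU2, ALPHA2 = 1000.0, 3.0, 5000.0, 2.5


def h_uniform(x):
    """Stand-in middle density: uniform on [TAU1, TAU2] (an assumption)."""
    return 1.0 / (TAU2 - TAU1)


def H_uniform(x):
    """CDF of the stand-in middle density, so H(TAU1) = 0 and H(TAU2) = 1."""
    return (x - TAU1) / (TAU2 - TAU1)


RHO1, RHO2 = continuity_weights(
    TAU1, ALPHA1, TAU2, ALPHA2, h_uniform(TAU1), h_uniform(TAU2)
)
PARAMS = (TAU1, ALPHA1, TAU2, ALPHA2, RHO1, RHO2)
```

With these weights the density has no jump at either threshold, and the CDF passes through ρ1 at τ1 and 1 - ρ2 at τ2. Imposing differentiability as well would add two further equations on the derivative of the middle density.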

Lorenz Curve and Gini Index
The Lorenz curve and Gini index are commonly used tools to measure the level of income inequality in a population [18,44]. The general formula for the Lorenz curve of a population with a given distribution function is [45]

$$LC(u) = \frac{1}{\mu_X} \int_0^u F^{-1}(y|\theta)\,dy, \quad 0 \le u \le 1, \tag{11}$$

where $\mu_X$ is the overall mean and $F^{-1}(y|\theta)$ is the quantile function. The value of $LC(u)$ for a specific $u$ is the proportion of cumulative wealth or income earned by the lowest $u$ proportion of the population. Let $A(u)$ be the auxiliary function defined in Equation (12). Then, it can be shown that the Lorenz curve for the 3PCP model takes the form given in Equation (13). Using the Lorenz curve function in Equation (13), the Lorenz curve can be plotted on a unit square where the x-axis is the population proportion, $u$, and the y-axis is the proportion of cumulative wealth or income, $LC(u)$, and compared to the 45° equality line. The closer the Lorenz curve is to the 45° equality line, the lower the level of income inequality.
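As a small numerical illustration of the quantile-based definition of the Lorenz curve, the sketch below integrates a quantile function with the midpoint rule and divides by the mean. A plain Pareto distribution is used as the test case because its Lorenz curve has the known closed form 1 - (1 - u)^(1 - 1/α); the parameter values and function names are hypothetical and are not taken from the paper.

```python
def lorenz_from_quantile(quantile, u, mean, steps=10_000):
    """Approximate LC(u) = (1/mean) * integral_0^u quantile(y) dy
    using the midpoint rule with `steps` equal sub-intervals."""
    width = u / steps
    integral = sum(quantile((i + 0.5) * width) for i in range(steps)) * width
    return integral / mean


# Hypothetical Pareto parameters used only to check the numerical routine.
ALPHA, TAU = 2.5, 1.0
PARETO_MEAN = ALPHA * TAU / (ALPHA - 1.0)  # finite because ALPHA > 1


def pareto_quantile(p):
    """Quantile function of a Pareto distribution with threshold TAU."""
    return TAU * (1.0 - p) ** (-1.0 / ALPHA)


def pareto_lorenz(u):
    """Closed-form Lorenz curve of the Pareto distribution."""
    return 1.0 - (1.0 - u) ** (1.0 - 1.0 / ALPHA)


LC_HALF = lorenz_from_quantile(pareto_quantile, 0.5, PARETO_MEAN)
```

For a composite model the same routine could be applied to its quantile function, although a closed form on the two tails is usually cheaper than numerical integration there.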
The Gini index, on the other hand, is a numerical measure calculated from the Lorenz curve that can be used to assess the level of income inequality. Its value lies between 0 and 1, and the higher the Gini index, the higher the level of income inequality. In terms of the Lorenz curve, the Gini index equals $1 - 2\int_0^1 LC(u)\,du$; using the Lorenz curve in Equation (13), it can be shown that the Gini index for the 3PCP model takes the form given in Equation (14). The integral $\int A(u)\,du$ in that expression may require a numerical method, for example the trapezoidal rule, to approximate its value. Using Equation (14), the income inequality for the whole population can be measured.
Note that the Lorenz curve and Gini index can both be calculated empirically or using a non-parametric approach, without having to specify a distribution model for the income. In general, if the number of observations in the data is large, the Lorenz curve and Gini index calculated empirically or using a non-parametric approach provide good estimates of inequality measures. However, the Lorenz curve and Gini index calculated using the underlying distribution model have the advantage when the sample size is small, and provide more reliable estimates as compared to the non-parametric approach [46]. This is true for any distribution model, including the 3PCP model, provided that the distribution model fits the data adequately.
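For comparison with the model-based measures, the empirical route can be sketched as follows. The code builds the weighted empirical Lorenz curve from sorted incomes and applies the trapezoidal rule to obtain the Gini index as one minus twice the area under the curve. The function names and toy inputs are hypothetical, and this is one common empirical estimator rather than the specific estimator of any cited reference.

```python
def lorenz_points(incomes, weights=None):
    """Return cumulative population shares u and income shares lc,
    both starting at 0 and ending at 1, from (optionally weighted) incomes."""
    if weights is None:
        weights = [1.0] * len(incomes)
    pairs = sorted(zip(incomes, weights))  # poorest households first
    total_weight = sum(w for _, w in pairs)
    total_income = sum(x * w for x, w in pairs)
    u, lc = [0.0], [0.0]
    cum_weight = cum_income = 0.0
    for x, w in pairs:
        cum_weight += w
        cum_income += x * w
        u.append(cum_weight / total_weight)
        lc.append(cum_income / total_income)
    return u, lc


def gini_trapezoid(u, lc):
    """Gini index = 1 - 2 * (area under the Lorenz curve), trapezoidal rule."""
    area = sum(
        (u[i + 1] - u[i]) * (lc[i + 1] + lc[i]) / 2.0 for i in range(len(u) - 1)
    )
    return 1.0 - 2.0 * area


# Toy inputs: perfect equality and one household holding all income.
GINI_EQUAL = gini_trapezoid(*lorenz_points([5.0, 5.0, 5.0, 5.0]))
GINI_CONCENTRATED = gini_trapezoid(*lorenz_points([10.0, 0.0, 0.0, 0.0]))
```

With four households, the fully concentrated case gives (n - 1)/n = 0.75 rather than 1, which is the usual finite-sample behaviour of this estimator.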
Additionally, it can be shown that the Gini index for the inverse Pareto distribution with shape parameter $\alpha_1$ is [23]

$$G_{IP} = \frac{1}{2\alpha_1 + 1}, \tag{15}$$

whereas for the Pareto distribution with shape parameter $\alpha_2 > 1$, the Gini index is

$$G_P = \frac{1}{2\alpha_2 - 1}. \tag{16}$$

From these two equations, we can use the values of $\alpha_1$ and $\alpha_2$ in the 3PCP model to evaluate the income inequalities in the lower and upper data, respectively. Both Gini indices decrease as the shape parameter increases. For example, if the value of $\alpha_1$ is high, this indicates a low level of income inequality in the lower part of the data. On the other hand, if $\alpha_1$ is low, then the level of income inequality in the lower part of the data is high, and similarly for $\alpha_2$ for the upper part of the data. Note that comparisons of income inequalities using the shape parameters can be made for different datasets with different proportions of the upper and lower data, as long as the proportions are not too small. As shown in Equations (15) and (16), the Gini index depends on the shape of the distribution, and not on the threshold parameters or the proportions of data.
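The link between the shape parameters and the tail Gini indices can be checked with a short calculation. The sketch assumes the standard closed forms 1/(2α1 + 1) for the inverse Pareto (power function) distribution and 1/(2α2 - 1) for the Pareto distribution with α2 > 1; the function names are hypothetical. The shape values used are the 2007 and 2016 estimates quoted in the application section of the paper.

```python
def gini_inverse_pareto(alpha1):
    """Gini index of the inverse Pareto (power function) distribution."""
    if alpha1 <= 0:
        raise ValueError("alpha1 must be positive")
    return 1.0 / (2.0 * alpha1 + 1.0)


def gini_pareto(alpha2):
    """Gini index of the Pareto distribution; needs a finite mean (alpha2 > 1)."""
    if alpha2 <= 1:
        raise ValueError("alpha2 must exceed 1 for the mean to be finite")
    return 1.0 / (2.0 * alpha2 - 1.0)


# Shape estimates quoted in the paper for 2007 and 2016.
LOWER_2007 = gini_inverse_pareto(3.25)
LOWER_2016 = gini_inverse_pareto(4.59)
UPPER_2007 = gini_pareto(2.37)
UPPER_2016 = gini_pareto(2.64)
```

In both tails the index falls between 2007 and 2016, which matches the reading that a larger shape parameter means lower inequality within the group.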

Semi-Parametric Three-Part Composite Pareto Model
A problem might occur when commonly used models for income distribution, for example the lognormal, gamma, or Weibull distributions, are used for the middle part of the data. Suppose, for example, that the overall income distribution comes from a lognormal distribution. Then, when the 3PCP model with the PDF in Equation (3) is applied to the data with $h(x|\eta)$ being the lognormal density, we would expect $\rho_1$ and $\rho_2$ to be zero, as the inverse Pareto and Pareto distributions are not required for describing the income distribution, and the lognormal distribution alone is enough for the whole data. In that case, information regarding the lower and upper income earners is lost and the 3PCP model is not useful.
To overcome this problem, we can set the middle part to follow a semi-parametric model, for example the Gaussian mixture with $k$ components. We can set $h(x|\eta)$ and $H(x|\eta)$ as in Equations (17) and (18), where $f_N(x|\mu_j, \sigma_j^2)$ and $F_N(x|\mu_j, \sigma_j^2)$ are the PDF and CDF of a normal distribution with mean $\mu_j$ and variance $\sigma_j^2$, respectively, and $r_j$ is the weight for the $j$th component in the mixture model with $\sum_{j=1}^{k} r_j = 1$ and $r_j > 0$ for all $j = 1, \ldots, k$. The number of components $k$ is not specified and depends on the data themselves. The value of $k$ can be chosen as the one that gives the lowest AIC or BIC value. Additionally, notice that from Equation (18), $H(\tau_1|\eta) = 0$ and $H(\tau_2|\eta) = 1$. In this paper, the model that uses this specification is called the inverse Pareto-Gaussian mixture-Pareto (IP-GM-P) model.
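The property that the middle-part CDF is 0 at τ1 and 1 at τ2 can be illustrated with a Gaussian mixture restricted to [τ1, τ2]. The sketch below renormalises the whole mixture by its mass on that interval; this is one possible construction chosen for illustration and is an assumption, not necessarily the exact form of the paper's Equations (17) and (18). All names and numerical values are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def truncated_mixture(weights, mus, sigmas, tau1, tau2):
    """Return (h, H): density and CDF of a Gaussian mixture renormalised
    to the interval [tau1, tau2] (assumed construction)."""
    components = list(zip(weights, mus, sigmas))

    def raw_pdf(x):
        return sum(r * normal_pdf(x, m, s) for r, m, s in components)

    def raw_cdf(x):
        return sum(r * normal_cdf(x, m, s) for r, m, s in components)

    mass = raw_cdf(tau2) - raw_cdf(tau1)  # mixture mass inside the interval

    def h(x):
        return raw_pdf(x) / mass if tau1 <= x <= tau2 else 0.0

    def H(x):
        if x <= tau1:
            return 0.0
        if x >= tau2:
            return 1.0
        return (raw_cdf(x) - raw_cdf(tau1)) / mass

    return h, H


# Hypothetical two-component mixture on a hypothetical interval.
h_mid, H_mid = truncated_mixture(
    [0.6, 0.4], [2000.0, 3500.0], [600.0, 900.0], 1000.0, 5000.0
)
```

Because the mixture is cut off at both thresholds, it cannot absorb the tails, which is what leaves room for the inverse Pareto and Pareto components.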
Using the Gaussian mixture for the middle part of the data leaves room for the lower and upper parts of the data to be modelled by the inverse Pareto and Pareto distributions, respectively. In general, a normal distribution is not suitable for income distribution due to its properties. For example, a normal distribution is symmetric and supported on the whole real line, whereas the income distribution is commonly skewed to the right with a heavy upper tail and takes positive values only. When the Gaussian mixture distribution is used for the middle part of the data, the Pareto and inverse Pareto distributions are both required to fit the upper and lower parts of the data, respectively. Thus, information about the upper and lower data in the form of the Pareto and inverse Pareto distributions is not lost.
Finite mixture models have been used extensively to model the whole income distribution and to separate income groups within the population [47]. Some finite mixture models that have been used for the whole income distribution include the Gaussian mixture model [48,49], the gamma mixture model [50], and the lognormal mixture model [51]. But as mentioned, heavy-tailed distributions such as the lognormal, gamma, or Weibull distributions, including their mixture models, should be avoided for the middle part of the data when the 3PCP model is to be fitted to income distribution. To our knowledge, the mixture model has not been used to model middle-class income specifically. The Gaussian mixture model is chosen for the middle part of the 3PCP model for two reasons: the normal distribution on its own is not suitable for the whole income distribution, so the tails are left to the inverse Pareto and Pareto components, and any continuous distribution can be approximated by a Gaussian mixture with a large enough number of components [52].
By substituting Equation (17) into Equation (3), the PDF for the IP-GM-P model is obtained as Equation (19). Additionally, the overall mean for the IP-GM-P model can be written as in Equation (20), where $\Phi(\cdot)$ is the CDF of the standard normal distribution.
For the IP-GM-P model, it can be shown that the $A(u)$ function in Equation (12) can be written as in Equation (21), which is expressed in terms of a quantity $u^*$ (Equation (22)) and the quantile function $H^{-1}(u|\eta)$ of the Gaussian mixture. Since $H(x|\eta)$ is an increasing function, calculating $u^*$ is easy, for example by using the bisection method. Using Equations (20) and (21), the Lorenz curve and Gini index for the IP-GM-P model can be calculated using Equations (13) and (14), respectively. The integral $\int_{\rho_1}^{1-\rho_2} A(u)\,du$ in Equation (14) requires a numerical method, for example the trapezoidal rule, to approximate it. The approximation is fast and easy as the integral is bounded.
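Inverting an increasing CDF by bisection, as mentioned above, can be sketched as follows. The example inverts the CDF of an ordinary (untruncated) Gaussian mixture, which has no closed-form quantile function; the mixture parameters, the search bracket, and the function names are hypothetical choices for illustration only.

```python
import math


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def mixture_cdf(x, weights, mus, sigmas):
    """CDF of a Gaussian mixture with component weights summing to one."""
    return sum(r * normal_cdf(x, m, s) for r, m, s in zip(weights, mus, sigmas))


def invert_increasing(cdf, target, lo, hi, tol=1e-10, max_iter=200):
    """Find x in [lo, hi] with cdf(x) close to target by bisection.
    Assumes cdf is increasing and cdf(lo) <= target <= cdf(hi)."""
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < target:
            lo = mid  # the root lies to the right of mid
        else:
            hi = mid  # the root lies at or to the left of mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Hypothetical two-component mixture.
WEIGHTS, MUS, SIGMAS = [0.4, 0.6], [2000.0, 4000.0], [500.0, 800.0]


def example_cdf(x):
    return mixture_cdf(x, WEIGHTS, MUS, SIGMAS)


Q30 = invert_increasing(example_cdf, 0.3, 0.0, 10000.0)
```

Each iteration halves the bracket, so about 47 iterations reduce an interval of width 10,000 to below the default tolerance.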

Statistical Methods for Complex Survey Data
In complex survey data, samples in the survey are given different weights depending on the size of the target population and the size of the samples collected. These weights, when available, should be included in the analysis to improve accuracy and to avoid bias in the results [53]. To include sample weights in the parameter estimation of the model, the pseudo-likelihood approach can be used.
Let $x_i$ be the income of the $i$th household in the sample with weight $w_i$, and let $n$ be the sample size. The weight is scaled such that the total weight is $n$ using the following expression:

$$w_i = \frac{n\, w_i^*}{\sum_{j=1}^{n} w_j^*}, \tag{23}$$

where $w_i^*$ is the unscaled sample weight. Then, the pseudo-likelihood function of the data can be written as

$$PL(\theta) = \prod_{i=1}^{n} f(x_i|\theta)^{w_i}. \tag{24}$$

Notice that if $w_i = 1$ for all $i$, as in simple random sampling, then the pseudo-likelihood function is the regular likelihood function. The maximum pseudo-likelihood estimate is defined as the parameter value that gives the highest value of the pseudo-likelihood function [54]:

$$\hat{\theta} = \arg\max_{\theta} PL(\theta). \tag{25}$$

Unfortunately, due to the complexity of the 3PCP model, an analytical form of the solution is not available. In this paper, the mle2 function in the bbmle R package is used to estimate the parameters. This function uses the optim optimizer in R and gives numerical estimates of the parameter values that maximize the log pseudo-likelihood.
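The weight scaling and the weighted log-likelihood can be sketched in Python as below. To keep the example checkable, it fits only a single Pareto distribution with a known threshold, for which the maximiser of the log pseudo-likelihood has a closed form; this is a deliberate simplification and not the 3PCP fit, which the paper carries out numerically with mle2 in R. The data, weights, and function names are hypothetical.

```python
import math


def scale_weights(raw_weights):
    """Rescale survey weights so that they sum to the sample size n."""
    n = len(raw_weights)
    total = sum(raw_weights)
    return [n * w / total for w in raw_weights]


def log_pseudo_likelihood(xs, weights, log_pdf):
    """Weighted sum of log densities: log of prod f(x_i)^(w_i)."""
    return sum(w * log_pdf(x) for x, w in zip(xs, weights))


def pareto_log_pdf(x, tau, alpha):
    """Log density of a Pareto distribution for x > tau."""
    return math.log(alpha) + alpha * math.log(tau) - (alpha + 1.0) * math.log(x)


def weighted_pareto_shape(xs, weights, tau):
    """Closed-form maximiser of the log pseudo-likelihood in alpha
    when the threshold tau is treated as known."""
    return sum(weights) / sum(w * math.log(x / tau) for x, w in zip(xs, weights))


# Hypothetical toy sample above a hypothetical threshold.
INCOMES = [5200.0, 6100.0, 7500.0, 9800.0, 15000.0]
RAW_WEIGHTS = [120.0, 95.0, 80.0, 60.0, 45.0]
TAU = 5000.0

WEIGHTS = scale_weights(RAW_WEIGHTS)
ALPHA_HAT = weighted_pareto_shape(INCOMES, WEIGHTS, TAU)
```

Setting every weight to one recovers the ordinary log-likelihood and the usual Pareto shape estimator, mirroring the remark about simple random sampling.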
To perform a goodness-of-fit test, the modified Kolmogorov-Smirnov (KS) test is used. The KS goodness-of-fit test determines whether the data fit the model by comparing the empirical CDF with the theoretical CDF. If the p-value of the test is lower than the significance level, then the null hypothesis that the model fits the data is rejected. Since the sample weights are included in the analysis, the test statistic for this test is modified such that

$$D_n = \sup_x \left| F_n(x) - F(x) \right|, \qquad F_n(x) = \frac{1}{n} \sum_{i=1}^{n} w_i\, \mathbb{1}(x_i \le x), \tag{26}$$

where $F_n(x)$ is the weighted empirical CDF and $F(x)$ is the theoretical CDF. Observe that if $w_i = 1$ for all $i$, then $D_n$ in the expression above reduces to the regular KS test statistic. It has been shown that $D_n$ in Equation (26) converges weakly to the KS distribution as $n \to \infty$ [55,56].
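A weighted KS statistic of this kind can be computed by walking through the sorted sample and comparing the theoretical CDF with the weighted empirical CDF just before and just after each jump. The sketch below is a minimal version under the assumption that the weighted empirical CDF is the cumulative weight share; it returns only the statistic, not a p-value, and the function names and toy inputs are hypothetical.

```python
def weighted_ks_statistic(xs, weights, cdf):
    """Largest absolute gap between the weighted empirical CDF and `cdf`."""
    total_weight = sum(weights)
    pairs = sorted(zip(xs, weights))
    d_max = 0.0
    cum_weight = 0.0
    for x, w in pairs:
        theoretical = cdf(x)
        # Gap just before the empirical CDF jumps at x.
        d_max = max(d_max, abs(theoretical - cum_weight / total_weight))
        cum_weight += w
        # Gap just after the jump at x.
        d_max = max(d_max, abs(cum_weight / total_weight - theoretical))
    return d_max


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1], used as a toy model."""
    return min(max(x, 0.0), 1.0)


D_UNWEIGHTED = weighted_ks_statistic([0.25, 0.75], [1.0, 1.0], uniform_cdf)
D_WEIGHTED = weighted_ks_statistic([0.25, 0.75], [3.0, 1.0], uniform_cdf)
```

In the toy case, putting three quarters of the weight on the smaller observation raises the statistic from 0.25 to 0.5, showing how the weights shift the empirical CDF.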
To find the best number of components, $k$, the pseudo-likelihood-based BIC values are used. The formula for this information criterion is as follows:

$$BIC = -2 \log L + d \log n, \tag{27}$$

where $d$ is the number of parameters in the model and $L$ is the value of the pseudo-likelihood function at the maximum pseudo-likelihood estimate. The model with the lowest BIC value is preferred. The consistency of the pseudo-likelihood-based BIC has been established by Xu et al. [57].
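Choosing k by the lowest BIC amounts to a small comparison once each candidate model has been fitted. The sketch below assumes the usual form -2 log L + d log n. The log pseudo-likelihood values are made up for illustration, and the parameter counts 3k + 3 (four tail parameters plus 3k - 1 mixture parameters, with the proportions fixed by continuity) are an assumption rather than a count stated in the paper.

```python
import math


def pseudo_bic(log_pl, n_params, n_obs):
    """BIC computed from a maximised log pseudo-likelihood."""
    return -2.0 * log_pl + n_params * math.log(n_obs)


def select_k(candidates, n_obs):
    """Pick the k whose (log_pl, n_params) pair gives the lowest BIC."""
    return min(candidates, key=lambda k: pseudo_bic(*candidates[k], n_obs))


# Hypothetical fitted values: k -> (log pseudo-likelihood, parameter count).
CANDIDATES = {
    1: (-10450.0, 6),
    2: (-10380.0, 9),
    3: (-10375.0, 12),
    4: (-10373.0, 15),
}
N_OBS = 12000  # hypothetical sample size

BEST_K = select_k(CANDIDATES, N_OBS)
```

Here the improvement in fit from k = 2 to k = 3 is too small to offset the penalty for three extra parameters, so k = 2 is selected.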

Household Income Survey
The data used in this paper are from the Household Income and Basic Amenities Survey (HIS & BA) conducted by the Department of Statistics Malaysia. Twice every five years, the Department of Statistics Malaysia conducts this survey to collect information related to the economic well-being of the citizens of Malaysia. In this paper, the household income and household size are used to study the changes in household income in Malaysia. Five datasets are used: household income for the years 2007, 2009, 2012, 2014, and 2016. These datasets are obtained from the Bank Data UKM through its agreement with the Department of Statistics Malaysia. Each dataset consists of at least 12,000 households. The monthly gross income, household size, and weight of each sample in the data are used to model the income distribution with the 3PCP model using the pseudo-likelihood approach. The monthly gross income is first equivalized by dividing the income by the square root of the household size. This square root equivalization is used in many studies to take the household size into account when considering the economic status of a household [58,59].
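The square root equivalization described above is a one-line transformation. A minimal sketch follows; the function name and the example figures are hypothetical.

```python
import math


def equivalized_income(monthly_income, household_size):
    """Divide household income by the square root of the household size."""
    if household_size < 1:
        raise ValueError("household_size must be at least 1")
    return monthly_income / math.sqrt(household_size)


# A four-person household earning 4000 is treated like a single person earning 2000.
EXAMPLE_FOUR = equivalized_income(4000.0, 4)
EXAMPLE_ONE = equivalized_income(3000.0, 1)
```

The square root reflects economies of scale: doubling the household size is assumed to raise needs by less than a factor of two.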

Application of the Model
The 3PCP models are applied to the HIS data. Originally, the 3PCP models with lognormal, gamma, or Weibull distributions for the middle data are applied to the HIS data. It is found that the inverse Pareto-lognormal-Pareto model with continuous but not differentiable PDF fits all five datasets based on the KS test statistics and has the lowest BIC values compared to the other models. However, for some datasets, the estimated values of $\rho_1$ and $\rho_2$ are found to be too small, with the smallest value being 0.0054 followed by 0.0074, and so cannot be interpreted as proportions of poor and rich subpopulations, respectively. The estimated proportions are also inconsistent across the five datasets. This may occur because the lognormal distribution is already a good fit for some of the data, without needing the inverse Pareto and Pareto distributions in the model. This is where the semi-parametric IP-GM-P model can be useful, as it makes sure the lower and upper parts of the data are modelled by the inverse Pareto and Pareto distributions, respectively. The IP-GM-P model used is specified such that its PDF is continuous but not differentiable, by specifying the values of $\rho_1$ and $\rho_2$ as in Equations (7) and (8). Adding the differentiability condition to the IP-GM-P model causes the number of components in the Gaussian mixture to increase just to make the PDF differentiable. While the differentiability condition is more realistic, it is not useful for the IP-GM-P model.
When applying the IP-GM-P model, the number of components $k$ for each dataset is first determined by trying $k = 1, 2, 3$, and $4$ and finding the value of $k$ that gives the lowest BIC value. It is found that $k = 2$ gives the lowest BIC value for the HIS data for the years 2007, 2009, and 2012, whereas $k = 3$ gives the lowest BIC value for the HIS data for the years 2014 and 2016. Table 1 shows the estimated values of $\alpha_1$, $\alpha_2$, $\rho_1$, $\rho_2$, $\tau_1$, and $\tau_2$ for the IP-GM-P model. The table also shows the p-values for the KS goodness-of-fit test. Based on the very large p-values for all five datasets, the null hypothesis that the IP-GM-P model fits the data is not rejected for any of them. This is not unexpected, because the large number of parameters in the IP-GM-P model helps with fitting the model.

Income Inequality Using IP-GM-P Model
The income inequality of the data can be measured using the values of $\alpha_1$ and $\alpha_2$, as well as the Lorenz curve and Gini index. Looking at the estimated values of $\rho_1$ in Table 1, the IP-GM-P model estimates that around 2.45% to 7.81% of the population belongs to the lower income group. Note that the estimated values also drop from 2007 to 2016. Here, comparisons are made on the proportions and not the threshold parameters, as proportions are unit-free. If comparisons were made using threshold parameters, the inflation effect would have to be taken into consideration. Additionally, from the table, the estimated value of $\alpha_1$ generally increases from 3.25 in 2007 to 4.59 in 2016. Since the Gini index for the lower data is inversely related to $\alpha_1$, these values indicate that, in general, the level of income inequality for the lower income group decreases from 2007 to 2016. Figure 2 shows the changes in $\alpha_1$ and $\alpha_2$ from 2007 to 2016. From the figure, it can be inferred that the level of income inequality for the lower income group decreases from 2007 to 2012 (as the value of $\alpha_1$ increases), and does not change much from 2012 to 2016. For the upper income group, Table 1 shows that the estimated proportion of the upper income group in the population is around 2.87% to 9.96%. There does not seem to be any trend in the changes in $\rho_2$. As for $\alpha_2$, the table shows that its estimated value generally increases from 2.37 in 2007 to 2.64 in 2016. The changes in $\alpha_2$ are also shown in Figure 2. Similar to the lower income group, the increase in $\alpha_2$ indicates that the level of income inequality for the upper income group decreases over the period.
Figure 3 shows the Lorenz curve for household income for all five datasets obtained by using the IP-GM-P model. This Lorenz curve is obtained by substituting Equations (20) and (21) into Equation (13). From the figure, it can be observed that the Lorenz curve moves closer to the equality line from 2007 to 2016. Additionally, the Gini index is obtained by substituting Equations (20) and (21) into Equation (14) and using the trapezoidal rule to estimate the integral in Equation (14). It is found that the Gini indices obtained by using the IP-GM-P model for the HIS data for the years 2007, 2009, 2012, 2014, and 2016 are 0.4434, 0.4406, 0.4267, 0.4051, and 0.3929, respectively. The decrease in the Gini index, together with the increasing proximity of the Lorenz curve to the equality line, suggests that, overall, the level of income inequality in Malaysia decreased from 2007 to 2016.

Comparison with Official Poverty Rate
The proportions of households in the lower income group represented by the inverse Pareto distribution in the IP-GM-P model are compared to the official poverty incidences published by the Department of Statistics Malaysia [60]. There are two types of poverty used by the Department of Statistics Malaysia. The first type is absolute poverty, which includes households with income lower than a minimum threshold called the poverty line income. According to the Department of Statistics Malaysia [61], the poverty line income is the minimum income required for a household to satisfy the basic needs of its members, as identified through research conducted by the Economic Planning Unit, Prime Minister's Department and the Department of Statistics Malaysia in collaboration with the United Nations Development Programme. The second type is relative poverty, defined as households with income less than half of the median household income of the population. Overall, the absolute poverty incidence decreases over the period and no trend can be observed for the relative poverty incidence. It is also noted that the relative poverty incidence is much higher than the absolute poverty incidence. Overall, Table 2 shows that the percentage of households modelled by the inverse Pareto distribution lies between the absolute poverty incidence and the relative poverty incidence. Additionally, the percentage of households modelled by the inverse Pareto distribution also decreases from 2007 to 2016, and the same can be observed for the absolute poverty incidence. Figure 4 shows the relationship between the percentage of the lower income group and the absolute and relative poverty incidences. Based on the figure, the percentage of the lower income group seems to be linearly related to the absolute poverty incidence, with a high correlation coefficient. However, no relationship can be observed between the percentage of the group and the relative poverty incidence.
Although the percentage of the lower income group is not exactly equal to either of the two poverty incidences reported by the Department of Statistics Malaysia, there is a strong relationship between this percentage and the absolute poverty incidence based on the high correlation coefficient. This indicates that $\rho_1$ in the IP-GM-P model may be related to the absolute poverty incidence, and can be used to determine the absolute poverty incidence without the need to determine the poverty line income, which may require additional time and cost.

Conclusions
This paper proposes the use of a three-part composite Pareto (3PCP) model for the income distribution. The 3PCP model is a combination of the inverse Pareto distribution for the lower part of the data, the Pareto distribution for the upper part of the data, and another unspecified distribution for the middle part of the data. The general form of the probability density function (PDF) as well as the constraints required for the PDF to be continuous and differentiable are also given. Additionally, the Lorenz curve and Gini index for the 3PCP model are given. For the middle part of the data, this paper proposes a semi-parametric approach using the Gaussian mixture distribution. This inverse Pareto-Gaussian mixture-Pareto (IP-GM-P) distribution model has the benefit that it allows the lower and upper parts of the data to be described by the inverse Pareto and Pareto distributions, respectively.
The main advantage of the 3PCP model is that it divides the population into three categories (the lower, middle, and upper income groups) and analyses them simultaneously, unlike previous literature that analyses each group separately. Additionally, the shape parameters in the Pareto and inverse Pareto distributions give insight into the levels of income inequality in the upper and lower income groups, respectively. Knowing how income inequality changes in the lower and upper income groups may help policy makers in making decisions. It is also found, at least for the Malaysian household income, that the proportion of data following the inverse Pareto distribution is highly correlated with the official absolute poverty incidence. Therefore, the 3PCP model can be used to estimate the absolute poverty incidence in a country without having to find the poverty line income, which can be difficult.
However, there are some challenges to the 3PCP model. First, due to the model complexity, the parameter estimation process can be difficult. In this paper, the parameters are estimated numerically, which may not give reliable results. In some cases, several initial values were used to find the maximum pseudo-likelihood estimates, and there is no guarantee that the numerically estimated values are the ones that maximize the pseudo-likelihood. Additionally, the lower, middle, and upper income groups derived from the 3PCP model may not align with the definitions used by governments and policy makers. In many countries, the income groups are defined by quantiles; for example, lower income earners are those in the bottom 40% of the population. A classification based on quantiles is easier for the general public to understand than estimates found using the 3PCP model.
For future work, the performance of the 3PCP model must be assessed for other countries and not just Malaysia. It would be interesting to see if the 3PCP model can explain properties of income distribution in other countries. This paper focuses on household income in Malaysia due to data availability and to make comparisons with the poverty incidence based on the poverty line income. We expect the 3PCP model to fit the income distribution of other countries. Comparisons of the proportions of data following the Pareto and inverse Pareto distributions based on the 3PCP model can also be made for different countries. Furthermore, the robustness of the 3PCP model may be explored further, but we expect the robustness of the 3PCP model to be similar to that of the Pareto distribution for data with extreme outliers. Robust estimators for the 3PCP model may also be developed.

Figure 1. A graphical example of the PDF of the 3PCP model with Gaussian mixture distribution for the middle data.

Figure 2. The estimated values for $\alpha_1$ and $\alpha_2$ using the IP-GM-P model together with their 95% confidence intervals.

Figure 3. Lorenz curve for the household income data in Malaysia using the IP-GM-P model.

Table 2 shows the percentage of households represented by the inverse Pareto distribution in the IP-GM-P model together with the official poverty incidences published by the Department of Statistics Malaysia for the years 2007, 2009, 2012, 2014, and 2016.

Figure 4. Comparison between $\rho_1$ obtained from the IP-GM-P model and the official poverty incidences.

Table 1. Estimated parameter values for the IP-GM-P model, together with the p-values for the KS goodness-of-fit test.

Table 2. Percentages of lower income data explained by the inverse Pareto distribution and the official poverty incidences in Malaysia.