A Hybrid MCMC Sampler for Unconditional Quantile Based on Influence Function

In this study, we provide a Bayesian estimation method for the unconditional quantile regression model based on the Re-centered Influence Function (RIF). The method makes use of the dichotomous structure of the RIF and estimates a non-linear probability model by a logistic regression using a Gibbs within a Metropolis-Hastings sampler. This approach performs better in the presence of heavy-tailed distributions. Applied to a nationally-representative household survey, the Senegal Poverty Monitoring Report (2005), the results show that the change in the rate of returns to education across quantiles is substantially lower at the primary level.


Introduction
Introduced by Koenker and Bassett (1978), quantile regression models have been increasingly used in empirical labor market studies to parsimoniously describe the entire distribution of an outcome variable. To overcome some limitations of conditional quantile regression models, Firpo et al. (2009) propose the Re-centered Influence Function (RIF)-regression. This regression evaluates the impact of changes in the distribution of covariates on the quantiles of the marginal distribution of the dependent variable. The two-step estimation of the RIF-regression first requires an estimation of the density function that enters the RIF. A "classical" approach consists of estimating independently the RIF and the regression coefficients (see Firpo et al. 2009). This approach does not take into account the uncertainty related to the first step of estimation. Lubrano and Ndoye (2014) provide a Bayesian estimation of the RIF-regression where they consider the two steps of estimation sequentially, estimating the density function of the outcome variable by a mixture of normal distributions. While being consistent in the presence of heavy tails, their approach makes the underlying restrictive hypothesis of linearity. However, the estimated RIF function is a binary dependent variable; the linearity and the normality assumptions are strong and may sometimes lead to predicted probabilities that are negative or greater than one. In this study, we implement a Bayesian estimation method for the RIF-regression by considering the dichotomous structure of the RIF function. The method consists of running a logistic regression where coefficients are estimated by the Metropolis-Hastings sampler using Gibbs output in the first step of estimation.
Since the collective agreement in April 2000 to place education at the heart of the development priorities for eradicating extreme poverty, the last two decades have seen a large increase in the enrollment rate of primary education in most developing countries, responding also to the second priority of the Millennium Development Goals (MDGs), "primary education for all". While education is increasingly acknowledged as an important dimension of poverty reduction, there remain some challenges in measuring its return, for example on a household's welfare. Studies emphasizing the role of education in poverty reduction have recently multiplied, and regression analysis relying on both household surveys and cross-country data has been widely used in this literature. These regressions, using reduced-form equations, generally provide a simple, but partial framework for examining the marginal effect of education on a household's income 4. Since the distribution of income is generally skewed to the right, mean regression models do not provide complete and meaningful information; analyzing each point of the distribution is therefore of particular interest for assessing changes at these different points.
The proposed approach is employed in the empirical analysis to measure the return to education and to address the extent to which the rate of the marginal effect of primary education on a household's income changes across quantiles compared with those of higher education.
Investment in primary education receives the largest budget allocation in developing countries to fulfill development priorities (Psacharopoulos 1994; Psacharopoulos and Patrinos 2002). In Senegal, the enrollment rate in primary school climbed from 54 percent in 1994 to 70 percent in 2001 and 82.5 percent in 2005, accompanied by an increase in the female enrollment rate and the rural sector enrollment rate 5. However, the IMF (2007) report reveals that 78.51% of Senegalese youth aged 15-19 dropped out before finishing lower secondary school.
The empirical analysis of this paper uses the data from a nationally-representative survey: the Senegal Poverty Monitoring Report (ESPS, 2005) conducted by the National Agency of Statistics and Demography (ANSD) 6. This survey is widely used in empirical studies, government monitoring reports, institutional strategic documents and poverty reduction strategy papers (PRSPs) in Senegal 7.
This study applies the RIF-regression method to a Mincer-type 8 equation, primarily to investigate the changes in the return to education across quantiles.
The empirical results primarily provide evidence of a heterogeneous pattern of changes in the rate of return to education across quantiles. The rate of change in the return to primary education does not vary much between the lower and the upper quantiles (0.50, 0.75, 0.90) compared to those for secondary and tertiary education. This result supports findings showing that in countries that rapidly expand access to primary education, the returns to primary education fall, while returns to higher education rise (Psacharopoulos 1994; Psacharopoulos and Patrinos 2002).
The paper is organized as follows: Section 2 presents the RIF-regression and the different estimation methods employed. It implements a Bayesian RIF-logit estimation by a Gibbs-Metropolis-Hastings sampler. Section 3 describes the data. Section 4 discusses the empirical results. Section 5 concludes and discusses some policy implications.

4 The consumption expenditure is considered as an indicator of a household's income.
5 Source: published reports and papers; see for instance (IMF 2007; Delaunay 2012). These ratios correspond to the number of students formally registered in primary school.
7 Among the studies using the ESPS datasets, we can cite Boccanfuso et al. (2008); Boccanfuso et al. (2009); Diawara (2012), among others, and the national and institutional reports: DSRP 2005; IMF 2007; ANSD 2007.
8 The standard (Mincer 1974) earnings equation linearly regresses the log of wage on the years of education and a quadratic function of labor market experience.

Unconditional Quantile Regression Models
We consider the following quantile regression model:

$$y_i = x_i'\beta_\tau + u_{i\tau},$$

where $(y_i, x_i)$, $i = 1, 2, \ldots, n$, are independent observations, $y_i$ being the single-response variable and $x_i$ the vector of covariates; $\beta_\tau$ represents the $(k + 1)$ unknown regression parameters, and $u_{i\tau}$, $i = 1, \ldots, n$, are the error terms, which are supposed to be independent and identically distributed. The $\tau$-th quantile of $u_{i\tau}$ is assumed equal to zero, $q_\tau(u_{i\tau} \mid X) = 0$.
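The text does not spell out how a conditional quantile model of this kind is fitted. As a reminder, the usual criterion from Koenker and Bassett (1978) minimizes the sum of "check" losses of the residuals. The sketch below illustrates that criterion on a toy sample; the function names and the data are illustrative assumptions, not taken from the paper.

```python
def check_loss(residual, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def total_check_loss(y, fitted, tau):
    """Sum of check losses of the residuals y_i - fitted_i."""
    return sum(check_loss(yi - fi, tau) for yi, fi in zip(y, fitted))


# Toy sample (hypothetical values) with one large outlier.
y = [1.0, 2.0, 3.0, 4.0, 100.0]
tau = 0.5

# With a constant as the only regressor, the minimizer of the criterion
# is a sample quantile; we search over the observed values only.
best_constant = min(y, key=lambda c: total_check_loss(y, [c] * len(y), tau))
```

For tau = 0.5, the criterion is half the sum of absolute deviations, so the search returns the sample median and is unaffected by the outlier.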

RIF-Regression Models
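Firpo et al. (2009) developed an unconditional quantile regression method based on the Re-centered Influence Function (RIF) to evaluate the marginal impact of changes in the distribution of the explanatory variables on the quantiles of the marginal distribution of the dependent variable.
The Influence Function (IF) studies how a change in the distribution of covariates affects a distributional statistic $\nu(F)$, where $F$ belongs to a class of distribution functions. It is defined as:

$$\mathrm{IF}(y, \nu, F) = \lim_{\varepsilon \to 0} \frac{\nu(F_{\varepsilon, \Delta_y}) - \nu(F)}{\varepsilon},$$

where $\Delta_y$ is a perturbation distribution, which puts a mass of one at any point $y$, and $F_{\varepsilon, \Delta_y} = (1 - \varepsilon) F + \varepsilon \Delta_y$. Firpo et al. (2009) consider the $\tau$-th quantile, $q_\tau$, as the distributional statistic $\nu(F)$, and show that the IF can be expressed as:

$$\mathrm{IF}(y_i, q_\tau, F) = \frac{\tau - 1\mathrm{I}(y_i \le q_\tau)}{f(q_\tau)}.$$

Firpo et al. (2009) define the Re-centered Influence Function (RIF) as $\mathrm{RIF}(y_i, \nu, F) = \mathrm{IF}(y_i, \nu, F) + \nu(F)$. For quantiles, the RIF can be expressed in the following convenient way:

$$\mathrm{RIF}(y_i, q_\tau, F) = c_{1\tau}\, 1\mathrm{I}(y_i > q_\tau) + c_{2\tau}, \qquad (3)$$

where $c_{1\tau} = 1/f(q_\tau)$ and $c_{2\tau} = q_\tau - c_{1\tau}(1 - \tau)$. The RIF-regression model consists of regressing the function RIF given in (3) on a set of covariates $X$.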
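To make the two-valued structure of the quantile RIF concrete, here is a minimal sketch that computes RIF(y_i) = c1 * 1(y_i > q) + c2, with c1 = 1/f(q) and c2 = q - c1 * (1 - tau). It is only an illustration: the Gaussian kernel density estimate, the fixed bandwidth of 0.3, the order-statistic quantile and the simulated standard normal sample are all assumptions made here, and correspond to the "classical" plug-in route rather than to the Bayesian mixture approach of this paper.

```python
import math
import random


def gaussian_kde(sample, point, bandwidth):
    """Gaussian kernel density estimate of the sample at a single point."""
    norm = len(sample) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((point - y) / bandwidth) ** 2) for y in sample) / norm


def empirical_quantile(sample, tau):
    """Order-statistic estimate of the tau-th quantile (no interpolation)."""
    ordered = sorted(sample)
    index = min(len(ordered) - 1, max(0, math.ceil(tau * len(ordered)) - 1))
    return ordered[index]


def rif_quantile(sample, tau, bandwidth):
    """Return (q_tau, c1, c2, list of RIF values) for the tau-th quantile."""
    q_tau = empirical_quantile(sample, tau)
    c1 = 1.0 / gaussian_kde(sample, q_tau, bandwidth)
    c2 = q_tau - c1 * (1.0 - tau)
    rif = [c1 * (1.0 if y > q_tau else 0.0) + c2 for y in sample]
    return q_tau, c1, c2, rif


# Hypothetical data: 2000 draws from a standard normal distribution.
random.seed(0)
y = [random.gauss(0.0, 1.0) for _ in range(2000)]
q_tau, c1, c2, rif = rif_quantile(y, tau=0.5, bandwidth=0.3)
mean_rif = sum(rif) / len(rif)
```

The RIF takes only the two values c2 and c1 + c2, which is the dichotomous structure the logit step relies on, and its sample mean recovers the quantile itself.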

Bayesian Estimation of the RIF-Regression
Running the two-step estimation of the RIF-regression remains a challenging problem. The "classical" approach consists of estimating independently the influence function by kernel estimation and the regression coefficients (see Firpo et al. 2009). However, the kernel density estimation in the first step may lead to unreliable inference in the presence of heavy-tailed distributions, as theoretically shown by Bahadur and Savage (1956) and empirically evidenced by Davidson (2012). The Bayesian estimation method of the RIF consists of choosing a mixture representation for the density function by solving a data augmentation problem by a Gibbs sampler and then estimating the regression coefficients. A first MCMC algorithm, which combines the two steps of estimation in a sequential process in linear RIF-regression, was suggested by Lubrano and Ndoye (2014). However, the estimated RIF function is a binary dependent variable; the linearity and the normality assumptions are strong and may sometimes lead to predicted probabilities that are negative or greater than one. Following the dichotomous structure of the RIF in (3), a non-linear model can be estimated using a logistic (or probit) regression. We take this opportunity to introduce a hybrid MCMC method, called a Gibbs within a Metropolis-Hastings algorithm.
The conditional expectation of the RIF is expressed as:

$$E[\mathrm{RIF}(Y, q_\tau, F) \mid X = x] = c_{1\tau}\, P(Y > q_\tau \mid X = x) + c_{2\tau}, \qquad (4)$$

and the average marginal effect of covariates is given by:

$$\beta_\tau = \hat{c}_{1\tau} \int \frac{\partial P(Y > q_\tau \mid X = x)}{\partial x}\, dF_X(x),$$

where $\hat{c}_{1\tau} = 1/f(q_\tau \mid \theta)$ and $\theta$ denotes the mixture parameters estimated by the Gibbs sampler. The average marginal effect of $x$ on $P(Y > q_\tau \mid X = x)$ can be consistently estimated by a logit regression in which the dummy variable $y_{i\tau} = 1\mathrm{I}(y_i > q_\tau)$ is regressed on $x_i$ to derive the RIF-regression coefficients, $\gamma_\tau$. A Bayesian estimation of a logit regression can be done by a Metropolis-Hastings sampler where the starting values are derived from the estimation of the regression coefficients in a linear probability model.
The average marginal effect from a logit model will be consistent only if:

$$P(Y > q_\tau \mid X = x) = \Lambda(x'\gamma_\tau),$$

where $\Lambda(\cdot)$ is the cumulative distribution function of a logistic distribution.
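Under the logit assumption P(Y > q | X = x) = Lambda(x'gamma), the derivative of the probability with respect to a continuous covariate j is Lambda(x'gamma) * (1 - Lambda(x'gamma)) * gamma_j. Averaging over the sample and scaling by c1 gives an average marginal effect on the quantile. The sketch below is a minimal illustration of that arithmetic only; the function names, the toy design matrix and the values of gamma and c1 are hypothetical, and dummy covariates would normally be handled with discrete differences rather than derivatives.

```python
import math


def logistic_cdf(z):
    """Logistic CDF, written to stay stable for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def rif_logit_marginal_effects(x_rows, gamma, c1):
    """c1 * mean_i[Lambda_i * (1 - Lambda_i)] * gamma_j for every coefficient j.

    The entry for the constant term is returned too, but it is not a
    marginal effect and should be ignored.
    """
    n = len(x_rows)
    average_slope = 0.0
    for row in x_rows:
        p = logistic_cdf(sum(g * v for g, v in zip(gamma, row)))
        average_slope += p * (1.0 - p) / n
    return [c1 * average_slope * g for g in gamma]


# Hypothetical inputs: a constant plus one covariate, two observations.
x_rows = [[1.0, 0.0], [1.0, 1.0]]
gamma = [0.0, 1.0]
c1 = 2.0
effects = rif_logit_marginal_effects(x_rows, gamma, c1)
```

In a Bayesian setting, the same function could be applied to each posterior draw of gamma (and of c1) to obtain a posterior distribution of the marginal effects, which is one way to carry the first-step uncertainty through.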
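The likelihood of the sample is then given by:

$$L(\gamma_\tau; y, x) = \prod_{i=1}^{n} \Lambda(x_i'\gamma_\tau)^{y_{i\tau}} \left[1 - \Lambda(x_i'\gamma_\tau)\right]^{1 - y_{i\tau}}.$$

For a given prior $\pi(\gamma_\tau)$, the posterior distribution $\pi(\gamma_\tau \mid y, x)$ is:

$$\pi(\gamma_\tau \mid y, x) \propto \pi(\gamma_\tau) \prod_{i=1}^{n} \Lambda(x_i'\gamma_\tau)^{y_{i\tau}} \left[1 - \Lambda(x_i'\gamma_\tau)\right]^{1 - y_{i\tau}}.$$

The Gibbs sampler is difficult to implement since conjugate priors do not exist because the logistic likelihood function does not belong to the exponential family. Therefore, we consider a Metropolis-Hastings sampler, which can be tuned only with the likelihood function under a flat prior on $\gamma_\tau$.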
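The Metropolis-Hastings step for the logit coefficients can be sketched as a random-walk sampler: under a flat prior the acceptance ratio reduces to a likelihood ratio. The code below is a minimal illustration under assumptions that are not in the paper: a Gaussian random-walk proposal with a fixed step of 0.1, starting values at zero (the text instead starts from linear probability model estimates), simulated data with hypothetical true coefficients, and an arbitrary burn-in of 500 draws.

```python
import math
import random


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def logit_log_likelihood(gamma, x_rows, d):
    """Sum of d*log(Lambda) + (1 - d)*log(1 - Lambda) over observations."""
    total = 0.0
    for row, di in zip(x_rows, d):
        z = sum(g * v for g, v in zip(gamma, row))
        # log Lambda(z) = -softplus(-z); log(1 - Lambda(z)) = -softplus(z)
        total -= softplus(-z) if di == 1 else softplus(z)
    return total


def random_walk_mh(x_rows, d, start, step, n_draws, rng):
    """Random-walk Metropolis-Hastings under a flat prior on gamma."""
    current = list(start)
    current_ll = logit_log_likelihood(current, x_rows, d)
    draws, accepted = [], 0
    for _ in range(n_draws):
        candidate = [g + rng.gauss(0.0, step) for g in current]
        candidate_ll = logit_log_likelihood(candidate, x_rows, d)
        # Flat prior: accept with probability min(1, likelihood ratio).
        if rng.random() < math.exp(min(0.0, candidate_ll - current_ll)):
            current, current_ll = candidate, candidate_ll
            accepted += 1
        draws.append(list(current))
    return draws, accepted / n_draws


# Simulated data (hypothetical): constant plus one standard normal covariate.
rng = random.Random(1)
true_gamma = [-0.5, 1.0]
x_rows = [[1.0, rng.gauss(0.0, 1.0)] for _ in range(400)]
d = [
    1 if rng.random() < 1.0 / (1.0 + math.exp(-(true_gamma[0] + true_gamma[1] * row[1]))) else 0
    for row in x_rows
]

draws, acceptance_rate = random_walk_mh(x_rows, d, [0.0, 0.0], 0.1, 2000, rng)
kept = draws[500:]  # discard burn-in
posterior_mean = [sum(g[j] for g in kept) / len(kept) for j in range(2)]
```

In practice the proposal step would be tuned to reach a reasonable acceptance rate, and an informative prior would simply add its log density to both log-likelihood terms in the acceptance ratio.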
The proposed approach for the RIF-logit is a Gibbs within a Metropolis-Hastings sampler algorithm, as it first requires the use of the Gibbs sampler to estimate the mixture of lognormal densities that yields $\hat{c}_{1\tau} = 1/f(q_\tau \mid \theta)$.
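Given one draw of the mixture parameters from the Gibbs sampler, the scaling constant 1/f(q | theta) can be obtained by locating the tau-th quantile of the mixture and evaluating the mixture density there. The following sketch shows one possible way to do this for a mixture of lognormals; the bisection search, the function names and the parameter values (weights, mus, sigmas) are assumptions for illustration and are not the estimates reported in the paper.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lognormal_mixture_pdf(y, weights, mus, sigmas):
    """Density of a finite mixture of lognormals at y > 0."""
    total = 0.0
    for w, mu, s in zip(weights, mus, sigmas):
        z = (math.log(y) - mu) / s
        total += w * math.exp(-0.5 * z * z) / (y * s * math.sqrt(2.0 * math.pi))
    return total


def lognormal_mixture_cdf(y, weights, mus, sigmas):
    """CDF of a finite mixture of lognormals at y > 0."""
    return sum(
        w * normal_cdf((math.log(y) - mu) / s)
        for w, mu, s in zip(weights, mus, sigmas)
    )


def mixture_quantile(tau, weights, mus, sigmas):
    """tau-th quantile of the mixture, found by bisection on the CDF."""
    low, high = 1e-12, 1.0
    while lognormal_mixture_cdf(high, weights, mus, sigmas) < tau:
        high *= 2.0
    for _ in range(200):
        mid = 0.5 * (low + high)
        if lognormal_mixture_cdf(mid, weights, mus, sigmas) < tau:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


def c1_hat(tau, weights, mus, sigmas):
    """Return (q_tau, 1 / f(q_tau | theta)) for one parameter draw theta."""
    q_tau = mixture_quantile(tau, weights, mus, sigmas)
    return q_tau, 1.0 / lognormal_mixture_pdf(q_tau, weights, mus, sigmas)


# Hypothetical two-component draw theta = (weights, mus, sigmas).
weights, mus, sigmas = [0.6, 0.4], [-1.0, 0.5], [0.5, 0.8]
q_median, c1_median = c1_hat(0.5, weights, mus, sigmas)
```

Repeating this computation for every retained Gibbs draw would give a posterior sample of the scaling constant, which can then be paired with the Metropolis-Hastings draws of the logit coefficients.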
Gibbs within a Metropolis-Hastings sampler algorithm.
The Gibbs sampler for the mixture of lognormal densities was developed in Lubrano and Ndoye (2016); see also Marin and Robert (2007) for the mixture of normal distributions.
With probability $\rho(\gamma_\tau, \cdot)$, the Metropolis-Hastings step accepts the candidate value of $\gamma_\tau$, and the resulting draws are used to obtain the estimates of the RIF-regression coefficient, $\beta_\tau$. Without any prior information, the flat prior on $\gamma_\tau$ can be considered, $\pi(\gamma_\tau) \propto 1$. For comparison purposes, we will also consider Zellner's non-informative G-prior. We can notice that the RIF-logit estimation approach makes assumptions about the functional form of $P(Y > q_\tau \mid X = x)$ in (4). Firpo et al. (2009) suggest the nonparametric-RIF (NP-RIF) regression method based on polynomial series approximations and show that RIF-logit regression yields estimates very close to the fully-nonparametric estimator. However, the choice of the nonparametric estimator is not crucial in large samples, as discussed by Newey (1994); if the domain is unbounded, the polynomial series would also poorly approximate the tails.

Data and Descriptive Statistics
The Senegal Poverty Monitoring Report (ESPS, 2005) is a nationally-representative survey conducted by the National Agency of Statistics and Demography. The survey is constructed to provide information related to the evaluation of poverty and to the assessment of the impact of public policies. The ESPS sample covers 13,500 households of all social classes and from all geographical areas of residence.
Table 1 reports descriptive statistics concerning the characteristics of households and information on the head of the household. It shows that two-thirds of household-heads are illiterate, around 13 percent have reached primary education, 9 percent a secondary education level and less than 5 percent a tertiary level and equivalent. Senegalese families are often extended, nine persons per household on average, and more than half are between 40 and 65 years old. About 80 percent of household-heads are employed (self-employed or salaried). More details on the descriptive statistics of these data are given in the summary reports of the two surveys published by the National Agency of Demography (ANSD 2007).
The estimation of a given equivalence scale relies on a particular consumption model, which is rather restrictive and therefore may lead to identification problems. The usual practice consists of using the per capita income, obtained by dividing the household income by the household size. This is the approach we use in this study, following Deaton and Muellbauer (1980), Deaton (1997) and empirical work by the World Bank such as Ravallion (2001).

Real Consumption Expenditure Per Capita Distribution
We consider the annual real consumption expenditure as an indicator of permanent income. The consumption expenditures are expressed in CFA francs. The WAEMU Harmonized Consumer Prices Index (HCPI) was 10.94 in 2001 and 11.3 in 2005, respectively, revealing a small inflation rate of 0.036 points. The total consumption expenditures in the survey are already deflated by sectors using the national Consumer Prices Index (CPI). The differences in weight in the CPI between urban and rural sectors reflect the consumption expenditure structure. In fact, foods are typically less expensive in the rural sectors, and urban households are more likely to consume higher quality goods, which increases their consumption expenditures. The total consumption expenditure in the sample is the sum of food and non-food expenditures, with self-consumption added.
Table 2 presents the distribution of the real annual consumption expenditure per capita. The sample reveals that the largest part of the Senegalese household's consumption expenditure is on food (45.6%) and housing (20%); the remainder of the budget is mostly used to cover clothing, health and other items.
Since the distribution of the consumption expenditure is often skewed to the right, we impose a restriction on the form of the distribution. We estimate the density function by a mixture of normals using a Gibbs sampler. Figure 1 presents the estimation of the real consumption expenditure per capita (× 10⁻⁶) by a mixture of two lognormal distributions.

Empirical Application
In the RIF-regression models, we consider a Mincer-type model where the logarithm of the consumption expenditure per capita is the dependent variable. We estimate returns to education at different levels by converting the continuous years-of-schooling variable into three dummy variables referring to the completion of the main schooling cycles. This return to education refers to the marginal effect of the level of education on the household's consumption expenditure per capita.
We consider the following set of covariates: primary, secondary and tertiary as dummies, which refer to the level of education of the head of household; age and its square refer to the age of the head of household; the dummy female refers to a female-headed household; the dummy married refers to a married household head; the dummy rural refers to a rural geographical area of residence. We restrict the estimations to five quantiles (0.10, 0.25, 0.50, 0.75, 0.90).
In this case, the RIF-regression allows us to evaluate the marginal effect of the changes in the distribution of covariates on the quantiles of the marginal distribution of the total consumption expenditure per capita.
Tables 3 and 4 report the RIF-regression estimates. They show the marginal effects of different covariates on the household's consumption expenditure per capita and their changes across the five quantiles. The regression coefficients are estimated by the hybrid MCMC RIF-estimation methods developed in this paper. The density function of the dependent variable (log of the consumption expenditure per capita) is estimated by a mixture of normal distributions.
Returns to education: For both estimations, the marginal effect of education monotonically increases with the level of education and with quantiles. The rate of change in the returns to education across quantiles provides evidence of significant differences between the bottom and the top of the distribution. For all educational attainment levels, the marginal effects and their rate of change are significantly larger for upper quantiles (0.50, 0.75, 0.90), especially for the secondary and the tertiary levels. The marginal effects of secondary and tertiary education largely dominate the upper part of the distribution. Primary education is significant for all quantiles except the lowest 10 percent; its return increases from the first quartile to the third quartile and then slightly decreases for the highest quantiles. The rate of change in the return to primary education is small and much lower than those to secondary and tertiary education (see also Table A1 in Appendix A). This result is in line with findings showing that in countries that rapidly expand access to primary education, the returns to primary education fall, while returns to higher education rise (see for instance Psacharopoulos 1994; Psacharopoulos and Patrinos 2002). In contrast, "primary education continues to be the number one investment priority in developing countries" (Psacharopoulos and Patrinos 2002).
Including age-square, the results show an overall negative effect of age on the household consumption expenditure. Its marginal effect monotonically increases across the first four quantiles and is not significant for the 0.90 quantile. On average, an additional year of age decreases the household consumption expenditure (in log) by approximately 0.667, 0.395, 0.294 and 0.243 for the quantiles (0.10, 0.25, 0.50, 0.75), respectively. These marginal effects also increase with age.
The marginal effects of the household's size monotonically decrease, and their rates of change across quantiles are higher for upper quantiles. Living in rural areas has a negative and significant effect on the consumption expenditures for all quantiles. Senegal's rural economy is largely agricultural, which is seasonal. The marginal effects of living in rural areas are comparatively higher than the other effects of covariates for poor households. Indeed, the urban labor force is more skilled and earns higher wages than the rural labor force.
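The statement that the age effect itself varies with age follows from the quadratic specification. As a hedged illustration, suppose the index is linear in age and its square with coefficients written here as $b_{1\tau}$ and $b_{2\tau}$ (this notation is an assumption, not the paper's). In a linear RIF-regression the marginal effect of age is then:

```latex
% Assumed specification: ... + b_{1\tau}\,\mathrm{age} + b_{2\tau}\,\mathrm{age}^2 + ...
\[
\frac{\partial\, E[\mathrm{RIF}(Y, q_\tau, F) \mid X]}{\partial\, \mathrm{age}}
  = b_{1\tau} + 2\, b_{2\tau}\, \mathrm{age}
\]
```

A negative $b_{1\tau}$ combined with a positive $b_{2\tau}$ would produce an effect that is negative overall but rises with age, which matches the pattern described in the text. Under the RIF-logit, the same expression would additionally be scaled by the logistic density term and by $\hat{c}_{1\tau}$.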

Conclusions and Policy Implications
In this study, we provide a Bayesian estimation method for the unconditional quantile regression model based on the Re-centered Influence Function (RIF). The method makes use of the dichotomous structure of the RIF and estimates a non-linear probability model by a logistic regression using a Gibbs within a Metropolis-Hastings sampler. This approach performs better in the presence of heavy-tailed distributions. Applied to a nationally-representative household survey, the Senegal Poverty Monitoring Report (2005), the empirical results primarily provide evidence of a heterogeneous pattern of changes in the rate of returns to education across quantiles and across the different levels of education. The marginal effects of education monotonically increase and are comparatively higher for upper quantiles (0.50, 0.75, 0.90). The return to primary education does not vary much across quantiles compared with those to secondary and tertiary education.
In most developing countries, promoting education is not only for development policy and for eradicating poverty, but it is also an argument to attract institutional financing and other forms of aid from donors. Senegal witnessed one of the largest increases in the achievement of the second priority of the MDGs. The rate of primary education in Senegal climbed from 54 percent in 1994 to over 82 percent in 2005. In Senegal, as well as in most developing countries, the quality of education in public schools has deteriorated following the increase of enrollment rates. The growing number of primary schools has partially contributed to literacy and encouraged the education of girls. In contrast, the growing number of public primary schools disadvantages children from low-income families due to the lack of educational resources.

6 "Suivie de la Pauvreté au Sénégal", 2005-2006; ANSD, "Agence National de la Statistique et de la Démographie".

Table 1. Characteristics of heads of households.

Table 2. Real annual consumption expenditure per capita.

Table 3. Bayesian RIF estimates on the log-income without using prior β. RIF, Re-centered Influence Function. The age variable was divided by 100. age² represents the square of age. Standard errors are indicated in parentheses. Bold figures correspond to posterior means for which 0 is contained in a 95% HPD interval.

Table 4. Bayesian RIF estimates on the log-income. The age variable was divided by 100. age² represents the square of age. Standard errors are indicated in parentheses. Bold figures correspond to posterior means for which 0 is contained in a 95% HPD interval.