Higher-Order Risk–Returns to Education

In the traditional human capital framework, education is often considered an investment rather than consumption, although consumption is not necessarily precluded. Whether education is an investment is empirically unclear and relatively under-explored. We shed light on this issue by estimating the risk–return trade-off in the context of education. If education is indeed an investment, risk could play an important role in individual educational decisions, just as with risky assets. As portfolio theory predicts, there could be a trade-off between returns to education and risks concerning those returns: higher risks are generally associated with higher returns. We contribute to the literature by proposing various measures of risk based on the entire distribution of returns to education recovered by our nonparametric models. Our results confirm a trade-off between returns and variance. We also find statistically significant impacts for the higher moments: skewness and kurtosis. Interestingly, we find the relationship between mean returns and variance to be linear, whereas the relationship between expected returns and the higher moments (skewness and kurtosis) is nonlinear.


Introduction
Education has been considered both an investment and a consumption good (Stiglitz 1974; Schaafsma 1976). The investment aspect of education is similar to financial assets in that both affect future income and well-being, and investments in human capital and financial assets are both associated with considerable uncertainty (although insurance options for reducing riskiness in human capital accumulation do not exist; Shaw 1996). To that end, a growing body of literature argues that risk should be included in traditional human capital models (Heckman et al. 2008; Heckman et al. 2006; Mazza et al. 2013; Cunha and Heckman 2007). If individuals consider education to be an investment good, as opposed to a consumption good, decisions about the level of investment in education should be made in the same way as decisions about any other risky asset.
An extensive theoretical and empirical literature links individuals' decision making under uncertainty with their risk attitudes, which are the basis of portfolio theory for risky assets. Most of the focus has been on second-order risk, known as risk aversion (starting with Bernoulli 1738). Risk-averse investors prefer a less risky portfolio to a riskier one for a given mean return. This assumption implies that an investor will take on more risk only if they expect higher returns. Empirically, we should observe a positive correlation between the risk level of a portfolio, usually measured by the variance (second-order risk) of the distribution of its returns, and its expected return (i.e., the mean of the distribution of its returns): the so-called risk-return trade-off. Within the relatively scarce literature examining the investment aspect of education, emphasis has been placed on testing the risk-return trade-off, and supporting empirical evidence has been found.

Literature Review and Our Contributions
Our work is related to the earnings dispersion literature. Kuznets and Friedman (1939) first discovered a positive relationship between the mean and variance of incomes among workers (from a selected few professions). In subsequent work, authors have found a positive relationship between the mean and variance of earnings across occupations (including Hartog and Vijverberg 2007; McGoldrick and Robst 1996; Cubas and Silos 2017), across fields and/or types of education (Christiansen et al. 2007; Backes-Gellner et al. 2010) and across different levels of general education (Palacios-Huerta 2003). By including risk preferences (such as second-order risk attitude) into a basic model, researchers have been able to better predict individuals' choices in the presence of risk/uncertainty (Ghosh and Ray 1997).
1 See Hartog and Vijverberg (2007) for a study on second and third-order effects of risk with respect to log wages in a parametric setting.
We contribute to this literature in three different ways. First, we use the distribution of returns to education (gradients) directly as opposed to the distribution of earnings or residuals. Most of the empirical literature on compensation for risk in an educational context uses either moments from the distributions of estimated earnings or moments from the distributions of estimated residual earnings to measure risk-return trade-offs. Specifically, they estimate those earnings (or residuals) using a Mincer earnings equation, obtain the higher moments (variance and occasionally skewness) from the distributions of those estimates and then add those higher moments back into the Mincer equation (risk augmented Mincer earnings equation). Instead of looking at earnings dispersion, we estimate the moments from the distributions of the gradients on earnings obtained from our augmented Mincer equation. We believe that this will allow us to directly capture the risk of investing in education rather than the more general risk of the labor market as captured by earnings (or residuals) dispersion.
Second, in addition to the variance (second moment), we also use the third and fourth moments (skewness and kurtosis) of the distribution of rates of return to capture risk. Only a few papers have looked at the relationship between higher-order risk, particularly skewness, and the mean, and none so far (as far as we are aware) has examined the trade-off between kurtosis (fourth-order risk) and mean returns. The theoretical literature on decision making under uncertainty suggests that higher-order risk attitudes also matter when individuals make decisions. By adding those higher moments into our model, we can investigate whether there is also compensation for the third and fourth moment risk types. If we find compensation for those other types of risk, it would suggest that the variance does not fully capture the risk of investing in education and that higher-order risk attitudes play a role in education investment decisions.
Third, as opposed to the parametric structure (log-linear model) that the literature has typically imposed, we model the relationship between mean returns and higher-order risks nonparametrically. That is, we allow both the relationship between years of schooling and earnings and the relationship between the risk of investing in one more year of schooling and expected earnings to be nonlinear. The relationship between risk and expected earnings is unknown and likely to be nonlinear. For example, compensation for risk may vary with the level of expected returns. Also, risk attitudes are likely to be heterogeneous. This cannot be captured by a single, constant parameter without assumptions on the underlying preferences of individuals (e.g., constant relative risk aversion). For example, risk attitudes might have changed over time (e.g., individuals in the 1980s might be more risk-averse than individuals in the 2010s and would require more compensation for risk, i.e., have different gradients). Our nonparametric model allows us to observe this potential heterogeneity, which so far has been ignored in the literature.

Methodology
To estimate whether there are risk-return trade-offs to education, we employed a four-step procedure. Our approach was as follows: (1) estimate each individual's rate of return to education; (2) merge those rates into our original dataset; (3) split the sample by year and state and obtain our various measures of risk; and (4) establish whether there is indeed compensation for that risk.
In our first step, we use a Mincer-type equation which regresses log wage on education and experience. More formally, we nonparametrically regress the log of an individual's weekly wage on education and other control variables, separately for each of the 8 years in our sample (1980, 1985, 1990, 1995, 2000, 2005, 2010, 2015), via

$$\log(wage_i) = m(educ_i, X_i) + u_i, \tag{1}$$

where wage is the weekly wage, educ corresponds to the number of years of education (beginning with first grade), X is a vector of control variables (e.g., years of experience, marital status, metropolitan area, etc.) and u is the error term. As our sample includes both continuous and discrete variables, we use categorical regression splines (CRS), a method developed by Ma et al. (2015), to deal with categorical variables that cannot be accommodated by traditional spline regression. In short, the method combines spline regression (for continuous variables) and discrete kernel weights (for categorical variables). In our case, continuous variables such as education and experience are modeled using splines, while categorical variables (such as race and metropolitan area) are incorporated using kernel weights. 2 As extensively discussed in the literature (Card 1999; Card 2001), schooling choice may be related to unobservable variables such as innate ability, which can bias our estimates (i.e., omitted variable bias). To address this issue, we use the nonparametric instrumental variable (IV) approach developed in Horowitz (2011). 3 This approach is desirable relative to alternative IV approaches in the literature, such as that of Darolles et al. (2011), given the relative computational requirements.
Horowitz (2011)'s approach is analogous to the IV approach in the linear case (OLS with IV) and thus has similar requirements for its instruments to be valid. Specifically, this approach assumes the existence of an IV, W, such that, for the error term u of Equation (1),

$$E[u \mid W, X] = 0,$$

where W is an instrument for education. It follows that, using the CRS methodology, we estimate the unknown function m(·) such that

$$E[\log(wage) - m(educ, X) \mid W, X] = 0.$$

Similar to parametric IV methods, we want an instrument W that is correlated with years of education (educ), but not with ability (and thus does not affect the weekly wage directly). Unlike the parametric IV approach, the number of instruments cannot exceed the number of endogenous variables. In our case, as we consider education to be the only endogenous variable in our model, we use a single instrument (discussed in the data section).
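To make the analogy with the linear case concrete, the following is a minimal simulated sketch of the linear IV idea only. It is not Horowitz (2011)'s nonparametric estimator and not the CRS procedure used here. The data-generating process, the coefficient values and all names (`simulate`, `cov`, `ability`) are illustrative assumptions: unobserved ability raises both schooling and wages, while the instrument shifts schooling but is unrelated to ability.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def simulate(n=20000, beta=0.10, seed=0):
    """Toy data (assumed DGP): ability is unobserved and confounds schooling."""
    rng = random.Random(seed)
    educ, logwage, inst = [], [], []
    for _ in range(n):
        ability = rng.gauss(0.0, 1.0)          # unobserved by the analyst
        w = rng.gauss(0.0, 1.0)                # instrument, independent of ability
        e = w + ability + rng.gauss(0.0, 1.0)  # schooling depends on both
        y = beta * e + 0.5 * ability + rng.gauss(0.0, 1.0)
        educ.append(e)
        logwage.append(y)
        inst.append(w)
    return educ, logwage, inst


educ, logwage, inst = simulate()

# OLS slope: biased upward because ability sits in the error term.
ols_slope = cov(educ, logwage) / cov(educ, educ)

# Simple IV (ratio) estimator with one instrument and one endogenous regressor.
iv_slope = cov(inst, logwage) / cov(inst, educ)

print(f"OLS slope: {ols_slope:.3f}  IV slope: {iv_slope:.3f}  (true value 0.10)")
```

In this toy setting the OLS slope absorbs part of the ability effect, whereas the IV ratio recovers a value close to the assumed true return. The nonparametric approach replaces the constant slope with an unknown function but relies on the same kind of exclusion restriction.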
In step 2, we obtain individual-specific returns to education using nonparametric estimation. Specifically, for each of our t = 1, 2, . . . , T (8 years) regressions, 4 we obtain gradients for each individual as

$$\widehat{rr}_{it} = \frac{\partial \hat{m}_t(educ_{it}, X_{it})}{\partial educ_{it}},$$

where $\hat{m}_t$ is the estimate of m(·) from the year-t regression.

In step 3, we compute both average returns and various measures of risk for each state j, in each time period t. Specifically, we obtain the first four moments for our analysis of the risk-return trade-off in education as

$$\text{Mean}_{tj} = \overline{rr}_{tj},$$

and, analogously, $\text{Var}_{tj}$, $\text{Skew}_{tj}$ and $\text{Kurt}_{tj}$, where $\overline{rr}_{tj}$ is the sample average (over individuals i) for state j in year t; "Var" is the variance; "Skew", the skewness, measures the degree and direction of asymmetry in the distribution; and "Kurt", the kurtosis, measures the heaviness of the tails. $N_{tj}$ is the sample size for each of the 408 subsamples (=51 states in each of the 8 years). 5

In our final step, we use a standard spline regression to regress the mean of returns to education 6 on its corresponding higher moments (our measures of risk) for each subsample, where our nonparametric model allows the relationship between the measure of risk and the return to education to be potentially nonlinear. There are several reasons why such nonlinearity may exist. First, a risk-averse individual may require a different level of compensation at a different level of risk, income and/or education. For example, an individual may not act the same way at the margin in a relatively low-risk environment compared to a high-risk environment. Second, individuals may have heterogeneous risk attitudes. That is, relatively more risk-averse individuals may require larger earnings compensation for the same level of risk than relatively more risk-seeking individuals. Further, the trade-off between mean returns and one type of risk may also depend on other types of risk. If the theoretical literature on decision making under uncertainty holds, we expect to see a risk-return trade-off in education; that is, in the fourth step, our gradients on variance and kurtosis will be positive and our gradients on skewness will be negative.
2 According to Ma et al. (2015)'s simulation results, for a small number of continuous variables (less than 5) and a relatively large sample size (more than 500), as we have here, CRS results in a lower computational burden compared to a kernel-only regression. Moreover, their simulations show that CRS outperforms the only two spline regression alternatives in the presence of categorical variables: a frequency estimator which breaks the sample into subsets and an additive regression spline model (which is only consistent if the model is additively separable).
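Step 3 amounts to grouping the individual gradients by state-year market and summarising each group. The sketch below is illustrative only: it assumes unstandardised central moments normalised by the group size (whether the skewness and kurtosis measures are standardised, and the exact normalisation, are assumptions here), and the function names and toy records are hypothetical.

```python
from collections import defaultdict


def central_moments(values):
    """Mean plus 2nd-4th central moments, each divided by n (assumed normalisation)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    return {
        "n": n,
        "mean": mean,
        "var": sum(d ** 2 for d in dev) / n,
        "skew": sum(d ** 3 for d in dev) / n,   # unstandardised third moment
        "kurt": sum(d ** 4 for d in dev) / n,   # unstandardised fourth moment
    }


def moments_by_market(records):
    """records: iterable of (year, state, gradient) tuples, one per individual."""
    groups = defaultdict(list)
    for year, state, rr in records:
        groups[(year, state)].append(rr)
    return {market: central_moments(vals) for market, vals in groups.items()}


# Toy gradients (made-up numbers) for two state-year markets.
toy = [
    (1980, "KS", 0.05), (1980, "KS", 0.07), (1980, "KS", 0.12),
    (1980, "MN", 0.06), (1980, "MN", 0.08),
]
result = moments_by_market(toy)
for market, stats in sorted(result.items()):
    print(market, {k: round(v, 6) for k, v in stats.items()})
```

With the full sample, the returned dictionary would hold one entry per state-year market, which is the unit of observation in the final-step regression.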

Data
We use individual-level data obtained from the March Current Population Survey (CPS) for quinquennial survey years between 1980 and 2015. The CPS provides information on individual characteristics such as earnings and educational attainment in the calendar year preceding the March survey. Following the literature, we focus our analysis on men 7 in the labor force with a positive income, between the ages of 20 and 59 (working age), living in a household (excluding group quarters). We also exclude self-employed men as their income measures tend to be less reliable and they tend to have different risk tolerances than typical wage earners (Cramer et al. 2002; Ekelund et al. 2005). Our final potential sample was composed of 245,401 men with information on their race, age, years of education (0 to 20), occupations, household characteristics (children and marital status dummies) and geographical location (state dummies, region dummies and metropolitan area dummies).
5 We are abusing the definition and calling the District of Columbia a state here.
6 Most papers in the literature (e.g., Hartog and Vijverberg 2007) parametrically regress log wage on risk, obtained via the residuals of a first stage Mincer regression, whereas we are using the mean returns themselves.
7 The literature has typically focused on men as women's labor force participation rates are generally low and there exists sample selection for working women (Henderson et al. 2011). Not only are the nonparametric methods to address this issue challenging (Ozabaci and Henderson 2015), but they often require an exclusion restriction that can be difficult to find, let alone be non-controversial. Given that our main contribution is the methodology, we leave the analysis of women to future research.
Following the literature on the estimation of the Mincer earnings equation, we use log weekly wages as our dependent variable, 8 after adjusting for inflation. We exclude 69 individuals with zero weekly wages. In our model, we also control for a dummy variable indicating whether or not wages are top-coded (one if yes, zero otherwise). 9 Education, our main independent variable of interest, is simply the number of years spent in school, starting from first grade.
When estimating the returns to education, we control for the following variables. As we do not have information about individuals' work experience, we proxy it using potential years of experience, exp = age − years of schooling − 6 (negative numbers were replaced by zero). To avoid multicollinearity issues from having both age and experience in our model, we use age groups. The (ordered) age group variable age is equal to 1 for men in their 20s, 2 for men in their 30s, 3 for men in their 40s and 4 for men in their 50s. Information about race was combined into one dummy, which equals 1 if the individual is not white and zero otherwise. We control for two household characteristics: children and marital status. The child dummy is equal to 1 if any children are living in the same house. The married dummy is equal to 1 if the individual is both married and living with his spouse, and equal to 0 otherwise (e.g., the individual is married but the spouse is absent, or the individual is divorced, widowed, or has never been married). Finally, we control for regional effects using nine regional dummies and for city effects using a metropolitan (metro) dummy which equals 1 if the individual lives in a central city or right outside a central city.
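The variable construction described above can be sketched as follows. The function names are hypothetical and the inputs are assumed to be already-cleaned integer ages and years of schooling; this is an illustration of the stated rules rather than the code used to build the sample.

```python
def potential_experience(age, years_of_schooling):
    """Potential experience: age - schooling - 6, with negatives set to zero."""
    return max(age - years_of_schooling - 6, 0)


def age_group(age):
    """Ordered age group: 1 for the 20s, 2 for the 30s, 3 for the 40s, 4 for the 50s."""
    if not 20 <= age <= 59:
        raise ValueError("the sample is restricted to ages 20 to 59")
    return (age - 20) // 10 + 1


def married_dummy(is_married, spouse_present):
    """1 only if married AND living with the spouse; 0 in every other case."""
    return 1 if (is_married and spouse_present) else 0


def nonwhite_dummy(race):
    """1 if the individual is not white, 0 otherwise (race label is an assumed string)."""
    return 0 if race == "white" else 1


print(potential_experience(39, 13), age_group(39), married_dummy(True, True))
```

For instance, a 20-year-old with 16 years of schooling would have a raw value of −2, which the rule above replaces with zero.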
For the nonparametric IV approach, we borrow a measure of family characteristics. The literature has proposed some candidates for an IV, but all of them have been shown to have some limitations. For example, some used cost of schooling in terms of money and time (Card 2001;Card 1993), changes in legislation/reforms (Harmon and Walker 1995;Meghir and Palme 2005), birth date/cohort (Angrist and Krueger 1991;Angrist and Krueger 1992;Kaymak 2009), a combination of birth and reforms (Acemoglu and Angrist 1999), a combination of birth and family background (Winters 2015) and family background (Blackburn and Neumark 1993;Parker and Van Praag 2006;Hoogerheide et al. 2012). We use spouse's yearly wage, a measure of family characteristics, as our IV for two reasons. First, it is a good proxy for a spouse's productivity and ability. Considering the theory of positive assortative matching on education in marriages, this would be highly related to one's education, but not necessarily directly related to one's wages. Second, while our data include parents' education, spouse's education, parents' income and spouse's income (bottom of Table 1), this measure minimizes the loss of information due to missing values. 10 We need to emphasize that our goal here is not to debate about the validity of these IVs, but rather use one to illustrate our proposal in evaluating the risk-return trade-off. For alternative IVs, one can readily employ our approach to assess the robustness of our results.
In our third step, we use our state variables to obtain mean returns and various risk measures for state local markets. That is, we infer that each state's labor market faces different risks due to different "local" conditions and laws.
From the descriptive analysis of our full sample (Table 1), we can observe that the average worker earns a wage of about $830 per week, has about 13 years of schooling (starting with 1st grade), has 19 years of work experience and is about 39 years old; 13.8% of our sample was non-white, 67.5% was married (with the spouse present in the household) and 55.7% had at least one child living with them. Note that the last five variables in Table 1 are the five possible instrumental variables (IVs) we considered. However, too few individuals reported their father's income, father's education and mother's education for them to be used in our regressions. On average, the spouse of the men in our sample earned $17,175 per year (about $488 per week). We also considered using the spouse's weekly wage, but favored yearly wage as it was reported at a higher rate. 11,12
8 We use weekly wage instead of yearly wage to control for productivity.
9 Data that is top-coded reduces the chance of a right-skewed distribution of wages.
10 This can create further sample selection issues. In the Chinese context, Wang (2013) shows that the issue is potentially more severe when using parental characteristics than spousal characteristics.

Results
This section is organized according to our four-step procedure: (1) we used categorical regression splines with an IV (Ma et al. 2015) to separately estimate the returns to education for each individual (in each time period) in our sample; (2) we merged the data from the first step back into our original dataset; (3) we split the sample by year and state, and obtained the first four moments (mean, variance, skewness, kurtosis) of the distribution of rates of return for each state in each year; and (4) we regressed the mean (first moment) returns to education on the higher-order moments to see whether we observe compensation for each type of risk.

Step 1: Estimated Rates of Returns
In our first step, we regress (using CRS) the log weekly wage on schooling and our control variables (described in the data section) separately for each quinquennial year. 13 The summary statistics of men's gradients on schooling (i.e., returns to schooling) for each year are presented in Table 2. 14 For example, the first row of Table 2 presents the summary statistics of the returns to schooling for all the men in our sample surveyed in 1980. Hence, on average, men surveyed in 1980 earned an additional 7.37% on their weekly wage for one more year spent in school. Similarly, the median man earned an additional 7.53%. For completeness, we also performed parametric regressions (adding polynomial terms for education and experience) with and without an IV (spouse's wage). We found that the estimated rates of return range from about 6% to 12% in the models without IVs, and from about 4% to 23% in the models with IVs. These results are in line with what is commonly found in the literature. While the estimated rates of return are similar, we do see important differences, especially in terms of our ability to handle dispersion (which we highlight below).
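Because the dependent variable is the log weekly wage, a gradient such as 0.0737 is a change in log points, and reading it as 7.37% is the usual small-change approximation. A minimal sketch of the exact conversion, assuming the gradient is the change in log wage for one additional year of schooling (the function name is hypothetical):

```python
import math


def log_points_to_percent(gradient):
    """Exact percentage change in the wage implied by a change in log wage."""
    return (math.exp(gradient) - 1.0) * 100.0


mean_1980 = log_points_to_percent(0.0737)    # mean gradient reported for 1980
median_1980 = log_points_to_percent(0.0753)  # median gradient reported for 1980

print(f"mean: {mean_1980:.2f}%  median: {median_1980:.2f}%")
```

For gradients of this size the exact and approximate figures differ by only a few tenths of a percentage point; the gap widens for the larger gradients at the top of the distribution.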
Overall, we note a general increase in individuals' returns to education over time. This general trend is typically found in the literature (Deschênes 2006). The two panels in Figure 1 show this trend over time. The top panel plots every individual's gradient on schooling for each year. The upper and lower lines, respectively, connect the maximum gradients for each year and the minimum gradients for each year. This shows how the range of returns to education varied substantially across states and over time. The middle line connects the mean gradients. The bottom panel zooms in on the mean gradient to better show its evolution over time. The two dashed lines represent the mean of the 95% upper and lower bounds of the estimates of the gradients on schooling. 15 Apart from a general increase throughout the whole period, we note a significant drop in mean returns in 2010, the aftermath of the financial crisis, followed by a rebound in 2015.
13 All regressions had knots placed at equally spaced quantiles. The spline degree and the number of segments (i.e., knots minus one) were selected via cross-validation (CV) with 8 restarts from different random initial points (5 is the default; we increased it to avoid local optima). The number of segments selected via CV for both schooling and experience was between 1 and 5 and the degree was either 2 or 3. We used the default regularization method, "Landweber-Fridman."
14 The R-squared for each of those regressions ranged from 0.30 to 0.65.
15 The upper and lower estimates are often mistaken as an alternative to the commonly used confidence bounds in parametric models.

Figure 1. Gradients on schooling (education) from Step 1.

Figure 2 shows the resulting density of rate of returns to education for each year. Note that those distributions do not appear to be Gaussian. Other than perhaps for 1980, they all appear to be skewed (to varying degree). Their variance and tails (kurtosis) also seem to vary from year to year. The results here are similar to those in Henderson et al. (2011), who study the distribution of returns to education over time, but we note that they did not control for endogeneity in their study.

Steps 2 and 3: Moments by Year and State
In our second step, using these individual-specific gradients, we can compute mean returns and various risk measures for each state labor market in each year. 16 We compute four moments for each state market in each year: mean, variance, skewness and kurtosis. We thus have 408 observations (=51 states in each of our 8 years) for each moment. Table 3 presents the summary statistics for each of the 4 moments obtained. To have a better idea of what the data look like, we also plot the density of each moment in Figure 3.

The density of mean returns appears to have two modes (one near 0.08 and another near 0.20). On the other hand, the densities for the other three moments appear flat or unimodal. The variance typically goes from 0 to 0.10, while the skewness has some negative cases (not uncommon, see Hartog and Vijverberg 2007), but most lie between 0 and 0.04. Kurtosis is also relatively minor with values that lie between 0 and 0.02.

Dynamics of Each Moment
To get a better idea of the variation that exists in the data, we examine how each moment evolves over time (Figures 4-7). Although expected returns have been generally increasing over time, Figure 4 shows that they decreased in both the period between 1990 and 1995 and the period between 2005 and 2010. Such decreases could be attributed to the small recession of 1990 and the economic crisis of 2008, respectively. 17 The range of expected returns was particularly large in 1995 and 2000.
We also note that variance is positively correlated with time (year), and more specifically, it has generally been rising over time. The correlation between the variance and year is roughly 0.60. This is different from the results of  on the dispersion of rate of returns to education. They obtained their variance using a random coefficient model and found that the variance in rate of returns did not increase over time. They link this result to the rapid increase in participation in higher education and argue that if the expansion was driven by easier access to credit and by dipping further into the ability distribution, then we should observe an increase in the variance of returns over time.
Our results are able to support that theory. Figure 5 shows a general increase in the variance over time for both the top and bottom five states. Note that a high variance indicates high inequality in rates of return to education within each state. While the year 2000 had unusually high variance between states, it also had high variance on average compared to other years, indicating high inequality within states as well. Minnesota and Kansas, in particular, stand out with very high variance in 2000. Those two states also stand out with very high skewness and kurtosis in 2000 (Figures 6 and 7).
In contrast to the results for mean returns and variance, the correlations between the other two higher moments and time are much lower. Skewness increases slightly over time (i.e., becomes more positive, with a correlation of 0.13 with year). Kurtosis, however, does not change much over time (a correlation of 0.02 with time). Figures 6 and 7 show skewness and kurtosis over time, respectively. While neither varies much from year to year, the years 1980 and 2000 both display a few states with large values. 18 This suggests that the risk faced by individuals varied greatly between states. Note that Minnesota and Kansas stand out again with very large estimates for skewness and kurtosis in 2000.

Table 4 shows the correlation between each pair of moments. There is a positive correlation between expected returns (mean) and all three higher moments. However, the correlation with the fourth moment (kurtosis) is low and not statistically significant. While the correlations of expected returns with variance and with skewness are both statistically significant, the correlation with skewness is much lower (0.25).

Table 4. Moments correlation matrix.

            Mean        Variance    Skewness    Kurtosis
mean        1.00        0.86 ***    0.25 ***    0.06
variance    0.86 ***    1.00        0.55 ***    0.46 ***
skewness    0.25 ***    0.55 ***    1.00        0.83 ***
kurtosis    0.06        0.46 ***    0.83 ***    1.00

Note: * p < 0.1; ** p < 0.05; *** p < 0.01.

17 Note that the District of Columbia was an exception in the early 1990s drop in expected returns. Its particularly low expected return to education in 1990 (compared to other states) was followed by a dramatic increase in its expected return between 1990 and 1995.
18 While 2015 does not have particularly extreme values, it does show a wider range than usual for both skewness and kurtosis.

Figure 8 plots the higher moments against the expected rate of returns to schooling (mean). For each row of plots, the right panel is a zoom-in of the left panel (i.e., excludes more extreme values). Plotting the expected returns against their variances (top two panels) yields an unsurprising result: the higher the variance of the distribution of rates of return to schooling, the higher the mean of returns. The second row of plots in Figure 8 shows the relationship between skewness and the mean rate of return for each distribution. The majority of the 408 distributions are positively skewed. Considering that income has a lower bound (i.e., zero), it is not surprising that rate of return distributions tend to be positively skewed. The more positively skewed the distribution, the higher the expected returns appear to be. If taken literally, this result indicates that individuals may consider any type of skewness (positive or negative) as a risk and seek compensation for both types. This is in contrast to what is often found in studies regressing log wages on skewness, whereby authors argue that workers exhibit skewness affection (Hartog and Vijverberg 2007).
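The significance pattern in the first row of Table 4 can be sanity-checked from the reported correlations alone. The sketch below assumes that the stars come from the standard t-test for a Pearson correlation with n = 408 state-year observations, which the text does not state explicitly, and it uses normal critical values as an approximation to the t distribution with 406 degrees of freedom.

```python
import math

N_MARKETS = 408  # 51 states x 8 years


def t_statistic(r, n=N_MARKETS):
    """t-statistic for H0: correlation = 0, given a sample Pearson correlation r."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Correlations of the mean with each higher moment, as reported in Table 4.
mean_row = {"variance": 0.86, "skewness": 0.25, "kurtosis": 0.06}
t_stats = {name: t_statistic(r) for name, r in mean_row.items()}

Z_10, Z_01 = 1.645, 2.576  # two-sided normal critical values at 10% and 1%

for name, t in t_stats.items():
    if abs(t) > Z_01:
        verdict = "significant at 1%"
    elif abs(t) > Z_10:
        verdict = "significant at 10%"
    else:
        verdict = "not significant at 10%"
    print(f"{name}: t = {t:.2f} ({verdict})")
```

Under these assumptions, the correlations with variance and skewness clear the 1% threshold, while the correlation with kurtosis does not reach the 10% threshold, which matches the stars shown in the table.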
The bottom two panels in Figure 8 plot the kurtosis against the expected rate of returns to schooling for each of the state-year labor markets. While the left panel only shows us that most of the distributions have a kurtosis close to zero, the right panel seems to indicate a positive relationship between the first and fourth moments of the distributions for very low levels of kurtosis (below 0.01). Looking at the bottom-right panel of Figure 8, we note that the kurtosis values in our sample are all between 0 and 3 (platykurtic distributions). While a few are higher than 0.5, most are very close to zero (i.e., thin tails). This result is interesting as it suggests that there exist relatively few extreme average mean returns in our sample (as evidenced by the top and bottom five figures).

Step 4: Risk-Return Trade-Offs
In our final step (4), we nonparametrically (spline) regress the mean return to education on its higher moments (i.e., variance, skewness and kurtosis) to investigate whether there is indeed compensation for risk (while including a time trend and state fixed effects). 19 Our model specification is obtained via cross-validation. The cross-validation procedure determined that the relationship between expected returns and variance is linear, while the relationships between expected returns and the higher moments, skewness and kurtosis, are nonlinear. 20 Tables 5 and 6 present the summary statistics of the gradients found via this spline regression. In Table 6, we exclude gradients that are not statistically significant. While all of the gradients on variance are constant and statistically significant, about 40% and 20% of the gradients on skewness and kurtosis, respectively, are not statistically significant. As the distributions in our sample have kurtosis values very close to zero (with little variation), it is not surprising that the gradients on those low kurtosis estimates are not statistically different from zero. The gradients on skewness that are not statistically significant are also relatively small. This result is to be expected, as there should not be any additional compensation for symmetric distributions (i.e., skewness close to zero). For comparison purposes, we also show the OLS results (Table 7) using the same moments found nonparametrically in our first step. Compared to the spline results, OLS seems to overestimate the compensation on variance while underestimating the compensation on skewness (0.15 is less than the first quartile of the skewness gradients, 0.85). More strikingly, the OLS coefficient on kurtosis seems to suggest that a higher kurtosis will result in a lower expected return. This contradicts theory.
However, the spline results find gradients of far higher magnitude and of both signs, suggesting a far more complex relationship between kurtosis and expected returns. While Figure 9 shows the histograms of gradients on skewness and kurtosis, Figures 10-13 plot the gradients of each higher moment against the first moment (left panel) as well as against skewness and kurtosis (right panel). Note that the cross-validation procedure determined that variance enters linearly and so there is no variation (a constant derivative) in the gradients of variance across both the expected returns and the variance itself. Figure 14 shows the statistically significant gradients on 45° plots (Henderson et al. 2012) with their respective 95 percent confidence bounds. The gradients on variance are all positive, as expected, and statistically significant. That is, we do find compensation for second-order risk, as has been found in the literature. Interestingly, we find that this relationship is linear and thus constant over both the expected returns and the variances themselves. The gradients are equal to 1.34. This means that for any level of variance, increasing the variance by 1 unit will increase the expected return by 1.34 units, on average. However, the variance ranges from 0.01 to 0.28. Therefore, an increase of 0.10 units or 0.01 units would be more reasonable than an increase of 1 unit. Hence, if variance increases by 0.01 units, the expected rate of returns to education will increase by 0.0134 units or 1.34 percentage points. Going, for example, from 20% to 21.34%, is not a negligible amount.

The effect of skewness on the expected returns is a bit more complicated, and shows a relationship that could not have been captured by a simple linear model. Educational investment is unique as compared to other types of investments in that rate of returns to education distributions are mostly positively skewed.
As we are dealing with mean returns as opposed to log wages, it is less clear what type of results we should expect. However, we know that the empirical literature on educational investments has found that skewness either has no effect or that the relationship is negative (Hartog and Vijverberg 2007), implying that the more positively skewed a distribution is, the lower its expected return is going to be on average. Our results show that all of the distributions with a negative skewness (upper left quadrant on the right panel of Figures 10 and 11) have positive gradients. This suggests that negative skewness is not compensated by a higher expected rate of return to education, as one might expect. If taken literally, this implies that decreasing skewness from −0.01 to −0.02 (i.e., more risk) would result in a decrease in the expected rate of return to education (0.85 × −0.01 = −0.0085). That being said, there are only a few observations with negative skewness (only 5 out of 408), so little can be concluded from so little variation.
We also observe that some of those distributions have a very high and statistically significant compensation for a slightly positively skewed distribution, while distributions with a slightly more positive skewness have a statistically significant negative compensation. The latter seems to indicate that individuals are getting a lower return to education on average when they are in a more positively skewed market. The former suggests the opposite, that is, individuals require compensation for positive skewness as well.
The effect of kurtosis on expected returns is nonlinear as well. We see both positive and negative gradients. In the case of log wages we may expect only positive coefficients. While many of these gradients are statistically significant, the kurtosis values themselves are very close to zero. Consider, for example, the few gradients on kurtosis that are equal to 9.81. If the kurtosis increased by 0.01 units (still a relatively big increase considering that most of the kurtosis values lie between 0 and 0.02), the expected returns would increase by 0.0981 units or 9.81 percentage points. Hence, those percentage changes are large, but the absolute magnitudes are not.
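The back-of-the-envelope conversions used in this section for variance, skewness and kurtosis all follow the same first-order logic: the implied change in the expected return is approximately the local gradient times the change in the moment. The short Python sketch below reproduces that arithmetic. It is an illustration only and not part of our estimation code; the function names are hypothetical, and the skewness gradient of 0.85 is simply the first-quartile value used as an example in the text.

```python
# Illustrative sketch: first-order conversion of a local gradient into an
# implied change in the expected return. Names are hypothetical.

def implied_change(gradient, delta_moment):
    """Approximate change in expected return (in units) for a small change in a moment."""
    return gradient * delta_moment


def to_percentage_points(delta_return):
    """Express a change in the rate of return (e.g., 0.0134) in percentage points (1.34)."""
    return 100.0 * delta_return


# (gradient, change in moment) pairs taken from the examples discussed in the text.
cases = {
    "variance": (1.34, 0.01),   # constant gradient on variance
    "skewness": (0.85, -0.01),  # skewness moving from -0.01 to -0.02
    "kurtosis": (9.81, 0.01),   # one of the few large kurtosis gradients
}

results = {name: implied_change(g, d) for name, (g, d) in cases.items()}

for name, delta in results.items():
    print(f"{name}: {delta:+.4f} units = {to_percentage_points(delta):+.2f} percentage points")
```

Because the gradients on skewness and kurtosis vary across observations, this linear approximation is only meaningful locally, for small changes around a given observation; for variance, where the gradient is constant, it holds over the whole observed range.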
For completeness, Figure 15 plots the gradients on time versus time. The time trend effect seems to be very low overall. Interestingly, the expected returns seem to have increased for the first 20 years in our sample, while they decreased on average for the last 10 years (2005-2015).

Conclusions
In this paper, we investigated whether education acts as an investment good. If so, part of an individual's return to education could be a premium for the risk incurred when investing in education. In other words, risk could play an important role in the educational decisions of risk-averse individuals. To examine the existence of compensation for risk in education, we employed nonparametric methods to obtain individual-specific returns to education and hence the entire distribution of returns to education. This approach allows us to uncover risk measures beyond variance, including skewness and kurtosis.
Overall, our results suggest that education acts like an investment good rather than a consumption good. We obtained statistically significant compensations for variance, which is in line with the previous empirical literature. Our results also show that there exists compensation for skewness and kurtosis. The compensations for higher-order risks beyond variance are nonlinear with their respective moments, and are not always in the direction (the gradient's sign) that one would expect when the outcome variable is a log wage. While kurtosis has thus far not been studied in an educational investment context, our analysis found that 80% of our gradients on kurtosis are statistically significant. This suggests that kurtosis is also an important risk measure to consider when analyzing educational investment behaviors.
Our results have important implications. First, there is a need to take into account higher-order risks in the theoretical modeling of human capital investment, which have typically been ignored in the literature. Second, the literature has typically observed an increasing trend in returns to education. Our results suggest a potential explanation for this pattern: the trend may partly reflect an increase over time in the various risks surrounding educational investment. These results are also important for discussing governmental policies on educational investment, as high estimated returns to education may not be entirely the result of enhanced productivity. This may also explain why individuals choose to drop out of school despite high returns to education. Policies aimed at promoting education should also consider ways to mitigate the risks associated with educational investment.
A deeper analysis could help better explain the nonlinearities we observed. For example, sub-sampling our final stage further could show possible heterogeneity in risk and in risk compensation. It would be interesting to study the rates of return to education by sex, occupation, race and educational level. It would also be worthwhile to consider the costs of education (e.g., student loans). Future research is also warranted to examine the risk-return trade-off for females, and consider the role of sample selection in estimating the return.