Peer-Review Record

Compulsory Schooling and Returns to Education: A Re-Examination

Econometrics 2019, 7(3), 36; https://doi.org/10.3390/econometrics7030036

by Sophie van Huellen * and Duo Qin

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 25 May 2019 / Revised: 22 July 2019 / Accepted: 29 August 2019 / Published: 2 September 2019

Round 1

Reviewer 1 Report

Referee report on: Compulsory Schooling and The Returns to Education: A Re-Examination

Summary: The paper re-examines the instrumental variable (IV) approach to estimating the effect of compulsory schooling laws (CSL) on wages by using two U.S. examples from existing research on the topic. The authors argue that the IV approach yields inconsistent estimates and is conceptually confused because the educational variable is actually not endogenous, so that we should prefer the OLS over the IV estimates. They underpin this statement by using machine learning techniques, specifically cross-validation, which suggests that OLS outperforms the IV approach in terms of generalizability, stability, and consistency. They conclude that data-guided model selection is more important than the choice of consistent estimators.

The paper is concerned with an interesting question but the argumentation is very confusing. In the following, I want to discuss several points in more detail, which might help to improve the paper.

Generally, I found the paper difficult to follow. The authors use two examples of previous IV studies (AK and SY), which are based on different instruments, to illustrate their main points. However, as the authors note on page 1, we already know from the literature that IV estimates might vary considerably across studies, e.g., due to a different choice of instruments, as the IV identifies a LATE for a specific group of compliers. Moreover, the AK study has already been heavily criticized by later research, mainly questioning the validity of the instrument. So what’s the reason for re-examining the AK study if we know that their instruments are wrong? I’d suggest that the authors focus on the re-examination of the SY results, which is the latest installment in the debate on the magnitude of returns to education in the U.S.

The authors emphasize the importance of the distinction between the ARTE of education and the ATE of the CSL via schooling. I appreciate the formal representation of the two effects in Section 2, which makes it easier to follow the paper’s arguments later on. However, I am not convinced that the paper adds anything new to the literature in this dimension. From equation (6) it is apparent that the ATE the authors have in mind corresponds to what is usually labeled the “reduced-form” effect in the literature. Obviously, the ARTE is not equal to the reduced-form effect of the CSL, but we already know that.
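The link between the reduced-form effect and a per-year return can be written as the standard Wald ratio. The notation here is an assumption for illustration (a binary CSL indicator Z, years of schooling S, log wage Y) and need not match the paper's equation (6):

```latex
\[
\beta_{\mathrm{IV}}
= \frac{E[Y \mid Z=1] - E[Y \mid Z=0]}{E[S \mid Z=1] - E[S \mid Z=0]}
= \frac{\text{reduced-form effect of the CSL on wages}}
       {\text{first-stage effect of the CSL on schooling}}
\]
```

Under this notation the two quantities coincide only when the law shifts schooling by exactly one year on average. With a smaller first stage, the per-year estimate is a multiple of the reduced-form effect.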

On page 4 the authors mention that SY’s results show that the CSL indicators correlate with regional factors and other controls, which basically invalidates the exclusion restriction. Is this a U.S.-specific problem or does it generally apply to all studies using CSL as instruments (i.e., for CSL changes in other countries)? Can the authors relate their work to the literature that tries to bound the IV estimates when the exogeneity assumption is violated (e.g., Conley et al., 2012)?

In Section 3, the authors replicate SY’s results and argue that the IV estimates “lack empirical consistency and robustness relative to their OLS counterpart” (page 7). However, Figure 4 also illustrates that the OLS estimates vary considerably across the samples, but not across model specifications. In contrast, the IV estimates vary across specifications, but remain relatively stable across different samples. I think that it is difficult to conclude based on these results that the OLS results are more robust than the IV estimates. (It is confusing that the IV and OLS estimates are not depicted on the same scale in this figure.)

I found the results in Table 1 interesting, especially the finding that the instruments used by SY do not pass the Sargan test of overidentifying restrictions in most cases. This finding implies that their results are indeed inconsistent. Maybe the authors should pursue this issue more. Again, a clear focus on re-examination of SY’s results would definitely add to the paper.
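To make the overidentification check concrete, the following is a minimal sketch of a Sargan statistic on simulated data. It is not SY's or the authors' specification: the data-generating process, the names `z1`, `z2` and `ability`, and all coefficients are hypothetical. The statistic is computed by hand with NumPy as n times the R-squared from regressing the 2SLS residuals on the full instrument set.

```python
import numpy as np

CRIT_5PCT_1DOF = 3.841  # 5% critical value of a chi-squared with 1 degree of freedom


def two_sls(y, X, Z):
    """2SLS coefficients. X: regressors, Z: instruments incl. exogenous regressors."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # first stage: project X on Z
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]    # second stage


def sargan(y, X, Z):
    """Sargan statistic n * R^2 and its degrees of freedom."""
    beta = two_sls(y, X, Z)
    resid = y - X @ beta                                # residuals use the actual X
    fitted = Z @ np.linalg.lstsq(Z, resid, rcond=None)[0]
    stat = len(y) * (fitted @ fitted) / (resid @ resid)
    dof = Z.shape[1] - X.shape[1]                       # number of overidentifying restrictions
    return stat, dof


def simulate(direct_effect_of_z2, n=5000, seed=0):
    """Hypothetical DGP: z2 violates the exclusion restriction if the effect is nonzero."""
    rng = np.random.default_rng(seed)
    z1, z2, ability = rng.normal(size=(3, n))
    s = 0.5 * z1 + 0.5 * z2 + ability + rng.normal(size=n)   # schooling
    y = 0.1 * s + direct_effect_of_z2 * z2 + 0.5 * ability + rng.normal(size=n)
    const = np.ones(n)
    X = np.column_stack([const, s])
    Z = np.column_stack([const, z1, z2])
    return sargan(y, X, Z)


stat_ok, dof_ok = simulate(direct_effect_of_z2=0.0)    # both instruments valid
stat_bad, dof_bad = simulate(direct_effect_of_z2=0.3)  # z2 enters wages directly
print(f"valid instruments:   Sargan = {stat_ok:.2f} (dof = {dof_ok})")
print(f"invalid instrument:  Sargan = {stat_bad:.2f} (dof = {dof_bad})")
```

In this sketch a rejection signals that the two instruments imply different slopes. With heterogeneous effects such a difference need not mean an instrument is invalid, which is the caveat Reviewer 2 raises in point 4.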

Table 2 then splits SY’s results by educational group (the cutoff is 12 years of schooling). There are several problems with this table. First, the splits might lead to endogenously selected samples if the CSL affect the probability of obtaining more than 12 years of schooling. It is plausible that individuals obliged to attend school for one additional year change their attitudes towards their preferred final educational attainment. Thus, the authors should check whether the CSL affect the dummy for less than 12 vs. 12 and more years of education. Second, the authors conclude that the IV estimates are insignificant for the higher sub-sample, thereby confirming the conjecture of zero ATE by CSL for this group. However, some of the coefficients are huge in magnitude. This should be explained. Third, it would be interesting to know how the first-stage results vary across the sample splits.
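The first check suggested above can be sketched as a linear probability model of the split dummy on the CSL indicator. The data below are simulated and every number is hypothetical. The point is only the shape of the check: a clearly nonzero coefficient would indicate that the law moves people across the 12-year cutoff, so the two sub-samples are selected on the instrument.

```python
import numpy as np


def split_dummy_regression(csl, schooling, cutoff=12):
    """Regress 1[schooling >= cutoff] on a constant and the CSL indicator.

    Returns the CSL coefficient and its t-statistic. The standard error is the
    classical homoskedastic one; a robust version would be preferable in
    applied work, since a linear probability model is heteroskedastic.
    """
    high = (schooling >= cutoff).astype(float)
    X = np.column_stack([np.ones_like(high), csl])
    coef = np.linalg.lstsq(X, high, rcond=None)[0]
    resid = high - X @ coef
    sigma2 = resid @ resid / (len(high) - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return coef[1], coef[1] / se


# Hypothetical data: the law raises latent schooling by half a year.
rng = np.random.default_rng(0)
n = 20_000
csl = rng.integers(0, 2, n).astype(float)
schooling = np.round(11.5 + 0.5 * csl + rng.normal(0.0, 2.0, n))

effect, t_stat = split_dummy_regression(csl, schooling)
print(f"effect of CSL on P(schooling >= 12): {effect:.3f} (t = {t_stat:.1f})")
```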

In the abstract and introduction, the authors emphasize that they use machine learning techniques to show the inconsistency of the IV approach. Thus, I actually expected it to be the core contribution of the paper. It is a little bit disappointing that the use of machine learning techniques is limited to Figures 5 and 6.

Based on Figure 6 the authors conclude that the IV bias increases with k, suggesting overfitting. I do not really see significant increases of the bias with increasing k. The confidence intervals overlap throughout and the changes in the bias are tiny. Or do I misinterpret the scale of the y-axis?

I am also not convinced that Section 4 improves the AK and SY estimates by using more parsimonious specifications, as the authors claim. For example, adding the uni variable to the vector of controls seems awkward as it might be endogenous to the CSL. Further, the results in Table 3 suggest that estimates from the alternative model are not substantially more stable compared to the original one.

In Tables 4 and 5, the authors find several negative effects for the school sub-sample. How should we interpret these effects?

Other minor comments:

In the abstract the authors state that they re-examine the IV estimates of the effect of CSL on education, but in fact, the focus of the paper is on effects on wages.

Page 2: Please rewrite the last sentence in the second paragraph. It is simply too long.

Page 14: The authors compare their estimates with those from Pischke and von Wachter (2008) for Germany and Oreopoulos (2006a) for Canada, the U.S., and Britain, amongst others. First, what do we actually learn from these comparisons of estimates for different countries? Why not focus on previous results for the U.S.? Second, there is already evidence that the mentioned results for Germany and Britain are not correct (Cygan-Rehm (2018) and Devereux and Hart (2010), respectively).

Consider rewriting the concluding remarks, as they are currently too technical and not very informative about the paper’s main contributions and its implications for further research.

References:

Conley, T. G., Hansen, C. B., & Rossi, P. E. (2012). Plausibly exogenous. Review of Economics and Statistics, 94(1), 260-272.

Cygan-Rehm, K. (2018). Is additional schooling worthless? Revising the zero returns to compulsory schooling in Germany. CESifo Working Paper No. 7191.

Devereux, P. J., & Hart, R. A. (2010). Forced to be rich? Returns to compulsory schooling in Britain. The Economic Journal, 120(549), 1345-1364.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

In this paper, the authors replicate two famous previous studies on the returns to compulsory schooling in the US and they propose a different empirical strategy based on DAGs. The authors argue that the (by now) traditional way of dealing with endogeneity problems in applied microeconometrics is „conceptually confused“ and provides „empirically inconsistent estimates“. They present a solution based on DAGs and data-guided model selection (where actually, the part on data-guided model selection is quite short).

I am not familiar with DAGs and the whole literature on this kind of causal model mainly put forward by Judea Pearl. Therefore, I was very interested in reading the paper – probably, I am not the right person to judge it, however, because there are parts in it that I do not fully understand (yet, it should be the authors‘ job to make it understandable to people not using DAGs). What I am aware of is that Pearl claims his approach to be the one and only correct one and insults basically the whole economics and statistics profession as ignorant idiots because they do not follow. The authors‘ work has a similarly aggressive tone and throughout speaks of inconsistent approaches, conceptually confused approaches and so on. They might be right or wrong. But this is not how professional scientists should communicate with each other. Moreover, they might make their lives easier with a more neutral tone. Why not formulate it more neutrally and try not to scare off the reader?

Having said this, I will try to state the objections I have with the paper in its current form.

1. I do not understand what audience the authors address. The authors attack the current state of the art in applied microeconometrics. Yet, they do it in the language of the DAG-framework which shares similarities but is quite different from the mainstream econometrics language. If they want to have impact, they should translate their ideas into the language of mainstream econometrics. Otherwise people will try to read this but will not understand it without making the effort of reading a DAG introduction. Currently, they speak to the DAG community, which is already proselytized. Thus, this paper will not have impact. It just takes the reader too long to translate their ideas into their own world.

2. A major source of the authors‘ critique seems to be that originally one might be interested in the ATE of education but that IV is only able to identify a local average treatment effect (LATE). In their notation, they are interested in beta but IV only identifies beta^L because it uses the predicted value from the first stage. I guess this is meant by „rejection of the key causal variable as a valid conditional variable“. Now, this has been known for decades. Of course, it would be desirable to be able to identify the ATE but, as Imbens (2010) says: „Better LATE than nothing!“. By the way, the concept of marginal treatment effects of Heckman and co-authors also makes this point and offers a solution to use IV-type estimation but nevertheless determine ATEs.

3. Throughout the paper, the authors speak of inconsistency of IV estimation. But nowhere do they define what they mean by „inconsistency“. Econometricians would in such a setting think of the statistical concept of „consistency“ that is, the ability of an estimator to come close to the true parameter. It took me a while to understand that the authors seem to mean something else. Their view is more about stability of estimates. No question, IV estimates often are unstable and imprecise. But this does not mean that they are inconsistent. I suggest different wording here.

4. Moreover, they seem to have a different concept of what an „invalid instrument“ is. Their view seems to be a fully data driven one. However, econometricians agree that validity of an instrument cannot be tested or judged by looking at estimation results. Validity needs to be assumed and the assumption can be credible or not credible. One of the most important improvements in applied econometrics is the need for transparent discussion of instrument validity. Here, the authors take different estimation results with different IV specifications as evidence for „invalid instruments“. Also, having effect heterogeneity in mind – something the authors have, otherwise their critique does not make sense, see below – the traditional overidentification tests they use do not make sense anymore and should not be used. Different estimates using different instruments do not necessarily inform about instrument validity but can just be due to different LATEs.

5. Having said this, the chosen instruments might be invalid here (for other reasons than mentioned by the authors). Thus, IV can fail, of course! But that does not mean the IV as a method in general fails. This, again, makes clear the authors‘ view (prevailing among many who apply machine learning and are concerned with prediction problems) that all we need is data and models to solve an empirical question. But we need data, models and assumptions. Assumptions which cannot be tested.

6. Rigor: the authors are very confident about their superiority but their exposition lacks rigor and many things remain unclear. As one example, the authors seem to have heterogeneous effects of S on Y in mind, but define a beta that does not seem to be allowed to vary by individuals. It is difficult to discuss the issue of ATE and LATE in this context. Funnily, in the model the authors set up, there is no effect heterogeneity and ATE equals LATE, so one of their main points would not hold here.

As another example they try to split up the sample into always-takers and compliers. They seem to be aware that it is impossible to do this. So why derive such strong and confident conclusions from an approach that obviously does not really work (or only works under assumptions that are not stated here)?

7. Probably I am wrong but the authors want to show superiority of their approach: should they not use a simulation study to show it? Within the empirical application this does not work because we do not know the „true“ ATE that they would like to estimate.

8. The CV-approach on page 9 is not explained and, therefore, impossible to understand. Yet, I guess it does not make sense here, anyway, the way I understand it. CV makes sense in prediction problems. It is also no surprise that OLS outperforms IV here because it is much more precisely estimated. But the research question is not about prediction, it is about identification of causal effects.

9. Again, although redundant (but the text is full of statements like this): Page 11 „Given the overwhelming rejection of s^L in favour of s...“ The „rejection“ seems to be in terms of stability of estimates, which, however, is not the main criterion in causal analysis.
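Points 3, 7 and 8 above can be illustrated with a small Monte Carlo sketch in which the true effect is known. Everything here is an assumption made for illustration and is not taken from the paper: the data-generating process, the value of `TRUE_BETA`, the unobserved `ability` confounder and the single binary instrument. It compares OLS and a just-identified IV estimator on two criteria, closeness to the true effect and out-of-sample prediction error.

```python
import numpy as np

TRUE_BETA = 0.1  # hypothetical causal return to one year of schooling


def fit_line(x, s, y):
    """Slope cov(x, y) / cov(x, s) with the matching intercept.

    Passing x = s gives OLS; passing x = z gives the just-identified IV
    (Wald) estimator.
    """
    slope = np.cov(x, y)[0, 1] / np.cov(x, s)[0, 1]
    return y.mean() - slope * s.mean(), slope


def one_replication(rng, n=5000):
    z = rng.integers(0, 2, n).astype(float)          # binary instrument
    ability = rng.normal(size=n)                     # unobserved confounder
    s = 12 + 0.5 * z + ability + rng.normal(size=n)  # schooling
    y = TRUE_BETA * s + 0.5 * ability + rng.normal(size=n)
    train, test = slice(0, n // 2), slice(n // 2, n)
    out = []
    for x in (s, z):                                 # OLS first, then IV
        a, b = fit_line(x[train], s[train], y[train])
        mse = np.mean((y[test] - a - b * s[test]) ** 2)  # hold-out prediction error
        out.extend([b, mse])
    return out                                       # [ols_slope, ols_mse, iv_slope, iv_mse]


rng = np.random.default_rng(42)
results = np.array([one_replication(rng) for _ in range(500)])
ols_slope, ols_mse, iv_slope, iv_mse = results.T

print(f"true beta          : {TRUE_BETA:.3f}")
print(f"OLS slope mean (sd): {ols_slope.mean():.3f} ({ols_slope.std():.3f})")
print(f"IV  slope mean (sd): {iv_slope.mean():.3f} ({iv_slope.std():.3f})")
print(f"hold-out MSE, OLS  : {ols_mse.mean():.3f}")
print(f"hold-out MSE, IV   : {iv_mse.mean():.3f}")
```

Under these assumptions OLS is the more stable estimator and predicts better out of sample, yet it is centred away from the true effect, while IV is noisier but centred near it. Stability and predictive fit are therefore different criteria from consistency in the statistical sense.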

Author Response

Please see attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Thank you for the fast revision. I think that you addressed most of the issues in a satisfactory way.

Reviewer 2 Report

It is reassuring to see that the authors partly dropped their aggressive tone. I still think that the authors make a big fuss out of points that have been known for quite a while. Also, I still think that the paper is written in a form that scares off readers and, therefore, it will have limited impact. Moreover, to be honest, I did not understand all the authors' responses to my previous points; to me, some just answer different questions.

The paper still is full of strange expressions. Just three examples from the introduction:

"the paper has since entered the standard economics curriculum as evident from its appearance in two popular textbooks by Angrist and Pischke (2009; 2015)." I do not say that this is wrong but it is a strange argument because the textbook is by one of the authors who just cites himself.

"Stephens Jr. and Yang (2014), replicate the research design..." The point SY make is not about the instruments! It is about controlling for state fixed effects. In my view, this statement is misleading!

"Angrist and Krueger (1991) interpret their results as consistent estimates of the ARTE" Come on! Five years later the LATE-interpretation was introduced, also by Angrist. You cannot blame him for having called this an ARTE 30 years ago!

Yet, I acknowledge that the authors see this differently and I just leave it at that. All the best!
