Adaptive Reduction of Curse of Dimensionality in Nonparametric Instrumental Variable Estimation

Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
GENERAL COMMENT. This paper is on nonparametric dimensionality reduction. It is rather well written, and as far as I can see the proofs are correct. It suffers from an important drawback that prevents one from really measuring the degree of novelty of the paper (see comment 1 below).
SPECIFIC COMMENTS.
1. Nonparametric variable/model selection methods based on kernel tools have been widely studied in the literature. The authors fail to give a complete discussion of the state of the art. The precursor paper (Vieu, 1994) is not discussed, and many more recent key papers, such as those by Horowitz or by Van de Geer, among many others, need to be presented briefly in order to highlight the interest of this new paper.
2. This paper is on the multivariate setting. Recent works by Aneiros (see the references in the survey by Novo et al. in JMVA two years ago) showed that any procedure available in the multivariate setting (as this one intends to be) can be (more or less easily) adapted to the functional data framework. A short discussion on this point is necessary, with relevant references. Stating this as an open question for the future would greatly increase the impact of this paper.
3. Curiously (see formula (16) and some others on page 8), when estimating the smooth distribution function F, the authors use a discontinuous indicator kernel. Why can a smooth kernel not also be used there?
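To make the question in comment 3 concrete, the following is a minimal sketch and not the authors' estimator in formula (16), which is not reproduced in this record. It assumes a generic kernel-weighted estimator of a conditional distribution function F(y | x) and compares an indicator in the y-direction with a smooth integrated Gaussian kernel. All function names, bandwidths (`h`, `b`), and the simulated data are hypothetical choices made for illustration only.

```python
import math
import numpy as np


def gaussian_kernel(u):
    """Standard normal density, used to weight observations in the x-direction."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def gaussian_cdf(u):
    """Standard normal CDF (an integrated Gaussian kernel), vectorized via math.erf."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(u / math.sqrt(2.0)))


def cond_cdf_indicator(y, x, Y, X, h):
    """Kernel-weighted estimate of F(y | x) with a discontinuous indicator 1{Y_i <= y}."""
    w = gaussian_kernel((X - x) / h)
    return np.sum(w * (Y <= y)) / np.sum(w)


def cond_cdf_smooth(y, x, Y, X, h, b):
    """Same estimate, with the indicator replaced by a smooth integrated kernel."""
    w = gaussian_kernel((X - x) / h)
    return np.sum(w * gaussian_cdf((y - Y) / b)) / np.sum(w)


# Hypothetical simulated data: Y = X + noise (illustration only).
rng = np.random.default_rng(0)
n = 500
X = rng.uniform(-1.0, 1.0, n)
Y = X + rng.normal(0.0, 0.5, n)

grid = np.linspace(-2.0, 2.0, 9)
F_ind = np.array([cond_cdf_indicator(y, 0.0, Y, X, h=0.2) for y in grid])
F_smooth = np.array([cond_cdf_smooth(y, 0.0, Y, X, h=0.2, b=0.2) for y in grid])

for y, fi, fs in zip(grid, F_ind, F_smooth):
    print(f"y={y:+.1f}  indicator={fi:.3f}  smooth={fs:.3f}")
```

Both versions yield a nondecreasing function of y with values in [0, 1]. The smooth version requires a second bandwidth `b`, which is one practical reason an indicator is sometimes preferred. Whether that is the authors' reason is not stated in this record.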
Author Response
Please find the responses in the attached file.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
Please see my report.
Comments for author File: Comments.pdf
Author Response
Please find the responses in the attached file.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
The authors propose a nonparametric sufficient dimension reduction framework to mitigate the curse of dimensionality. The method introduces various central subspaces to identify different treatment effects, and the authors propose estimators for these subspaces and treatment effects. The theoretical properties of the proposed estimators are also studied.
The method preserves the nonparametric nature of the instrumental variable estimator. Its major merits over existing competitors are that the proposed sufficient dimension reduction technique avoids stringent distributional assumptions and uncovers parsimonious structures in the population distribution. Moreover, the proposed framework contains fully nonparametric models as a special case when the data do not support any dimension reduction. Two types of sufficient dimension reduction subspaces are introduced to identify the treatment effects of interest. The proposed framework guarantees that the estimated central subspace is n^(1/2)-consistent. Moreover, the nonparametric instrumental variable estimator of the proposed method achieves the same asymptotic distribution as when the true central subspace is known.
Q1. To minimize the prediction risk (12), the authors propose a forward selection algorithm. The CV functions are key to implementing the algorithm. However, the CV functions and their approximate forms are complicated and are defined through integrals. The authors should explain how to compute the CV functions in practical applications.
Q2. The proposed methods are good, and the authors present their theoretical properties in detail. However, the implementation is not clear enough. The simulation results in Section 2.5 are brief and could be clearer. Real-world examples, or additional paragraphs demonstrating applications of the proposed methods, would be appreciated and would give readers a comprehensive picture of how to use the proposed method.
Q3. The extension in Section 3. In (19), U is an unobserved variable and h_Y is an unknown function, so both U and h_Y are unknown. On p. 14, an assumption on the conditional distribution of U|X is needed. It is not clear how to verify or impose this in practice. Please explain.
Q4. This paper is well written but too long. It would be better to shorten and condense it.
Author Response
Please find the responses in the attached file.
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
Generally, I am satisfied with the authors' revision. I have only a few remaining minor comments. Nevertheless, I do hope that the authors take them into account. The following comments are in no particular order:
Equation (9): Why is equation (8) of interest? In the end, you want to do inference on the LATE, which is specified in (9). Or do you need (8) for the technical analysis?
p. 7, line 181: There seems to be a typo in the CV, since the right-hand side should be evaluated at \hat B_\pi instead of B. Please clarify! It is also confusing that the \hat F's are evaluated here at the data-driven bandwidth values.
p. 1, lines 37-38: The explanation is not correct; it should rather read: In this setup, [14] establishes adaptive hypothesis testing based on a random exponential scan for the data-driven selection of optimal smoothing parameters.
p. 10, line 219: ".. whose order will be ever..." does not make sense. Please correct the grammar.
Comments on the Quality of English Language
See comments above.
Author Response
Thanks for your comments.
Please find the responses in the attached file.
Author Response File: Author Response.pdf