A Parametric Bayesian Approach in Density Ratio Estimation
Round 1
Reviewer 1 Report
Please see my report.
Comments for author File: Comments.pdf
Author Response
Dear Reviewer,
Many thanks for your valuable comments and suggestions. I have addressed all of your concerns and comments in the manuscript, marked in ORANGE.
Here are my responses to your comments:
Response to comment 1:
We added two references that deal more specifically with machine learning and information theory.
Response to comments 2, 4, 8, 10, 20: Typos have been fixed.
Response to comment 3:
We added some explanatory sentences to define p(t) and q(t) clearly and to emphasize how they are obtained using Bayes' rule. Please see lines 48 and 49 on pages 2 and 3.
Response to comment 5:
Yes, P(.) is the cumulative distribution function, and it had already been introduced (between lines 55 and 56).
Response to comment 6:
Yes, it was an error in eq (10). We defined the CDF $\Pi(.)$ and fixed the Bayes risk formula (eq 10).
Response to comment 7:
Yes, it was a typo. I removed the integral and fixed it; please see line 71. I also removed the P(.) related to that integral.
Response to comment 9:
Yes, it is the Bregman divergence in the exponential family. What we did regarding your comments:
1- We moved Definition 1 from its previous position to just before Lemmas 1 and 2 (see p. 6).
2- We corrected Definition 1 by emphasizing "exponential family".
3- We included c(.) in the problem setup (see lines 39 and 40).
4- Now that we have referred to the exponential family, i.e., Eq. (1), the meanings of $\eta$, $\gamma$ and $\kappa$ are clear.
Response to comment 11:
Lemma 1 was introduced and proved by Nielsen and Nock (2010); we have mentioned this in the proof (line 90). Some lines before Definition 1 in the old manuscript have been modified or deleted. Also, as mentioned in Response 9 (1), we have moved and reorganized Definition 1 along with Lemmas 1 and 2.
Lemma 2 does not follow trivially from Lemma 1; we introduced Lemma 1 precisely in order to obtain Lemma 2, which gives the DRE under the loss function $E^{p}(\log \hat{r}(t)-\log r(t))^2$ (as you pointed out correctly. Thanks!).
Response to comment 12:
We have changed H() to h().
Response to comment 13:
We rephrased it and referred to Eq. (15) in order to avoid any confusion.
Response to comment 14:
The loss functions in Examples 1 and 2 have now been added. Please see the lines between 106/107 and between 113/114.
Response to comment 15:
It has been fixed now. Please see line 115.
Response to comment 16:
We deleted this remark because it is now explained within Lemma 2.
Response to comment 17:
Following another reviewer's suggestion, we tried to emphasize why the Huber loss is important. Since finding the Bayesian DREs under Log-Huber requires finding them under either Log-L2 or Log-L1, most of the paper is devoted to finding the closed-form Bayesian DRE under Log-L2 (and Log-L1). In fact, Figures 4 and 5 illustrate the role of Log-Huber and its robustness.
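For readers less familiar with the terminology, here is a minimal sketch of how a Log-Huber loss is typically assembled from the Log-L2 and Log-L1 pieces; the threshold and notation used in the manuscript may differ. With $u=\log \hat{r}(t)-\log r(t)$, the Huber loss is $\rho_{c}(u)=\tfrac{1}{2}u^{2}$ for $|u|\le c$ and $\rho_{c}(u)=c\,|u|-\tfrac{1}{2}c^{2}$ for $|u|>c$, so it agrees with the Log-L2 loss for small log-errors and grows only linearly, like the Log-L1 loss, for large ones; this is the source of the robustness illustrated in Figures 4 and 5.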
Response to comment 18:
Parametric methods.
Response to comment 19:
No, I have not used this loss function. However, this loss (the alpha-divergence loss) embraces many well-known loss functions such as KL and Hellinger. Furthermore, estimating divergence functions is of independent interest in the literature.
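To illustrate this point, in one common parameterization (which may differ from the one intended here) the alpha-divergence is $D_{\alpha}(p\|q)=\frac{1}{\alpha(\alpha-1)}\big(\int p(t)^{\alpha}\,q(t)^{1-\alpha}\,dt-1\big)$, which tends to $KL(p\|q)$ as $\alpha\to 1$, to $KL(q\|p)$ as $\alpha\to 0$, and is proportional to the squared Hellinger distance at $\alpha=1/2$.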
Response to comment 21:
We corrected "variance varies" to "ratio of variances varies". We added some sentences to address your concerns, such as whether the means are known or unknown. Kindly see lines 142 to 148.
Response to your GENERAL comment:
You asked why one needs to care about frequentist performance when we could use prior elicitation.
That is a good question, but please note that when prior elicitation methods are followed and we use experts' opinions (no longer using conjugate priors), we cannot hope to obtain any closed-form density ratio estimator. The main idea of this article is to highlight the importance of parametric methods and of having a closed form for the DRE based on the exponential family (that is why we used conjugate priors: they yield elegant results in the exponential family). By contrast, many works in the literature use non-parametric methods, and the lack of a closed form has its own disadvantages; for instance, it is computationally expensive.
Also, we study the performance of the obtained estimators in terms of the frequentist risk function, which enables us to evaluate them as functions of the (hyper)parameters.
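To make the point about conjugacy concrete, here is the standard closed-form update in the exponential family, written in generic natural-parameter notation that may not match the manuscript's $\eta$, $\gamma$, $\kappa$ and c(.) exactly: for $p(t\mid\theta)=h(t)\exp\{\theta^{\top}T(t)-A(\theta)\}$ with conjugate prior $\pi(\theta)\propto\exp\{\theta^{\top}\gamma-\kappa A(\theta)\}$, the posterior is $\pi(\theta\mid t)\propto\exp\{\theta^{\top}(\gamma+T(t))-(\kappa+1)A(\theta)\}$, i.e., the same family with updated hyperparameters $(\gamma+T(t),\kappa+1)$. It is this closure under sampling that allows the Bayesian DRE to be written in closed form.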
Thank you so much once more for your useful comments.
Reviewer 2 Report
The paper solves an interesting problem, but the introduction mainly relies on old literature. Please add the newest literature. See, for example, https://michaelgutmann.github.io/assets/slides/Gutmann-2017-06-20.pdf , https://arxiv.org/pdf/1611.10242.pdf
Please explain better why you decided to use the Huber loss, the least squares, and the absolute error functions, and why these functions are interesting for the reader.
From the text it is not obvious whether the correction factors (Table 1) were derived by the authors; similarly for the KL in Table 2. Please put this information into the text.
The English must be corrected.
Some mistakes occur, for example:
Row 34 delete Se
row 63: delete it is
row 73: delete and (and and)
row 74: (see, Brwon, 1986) - does not exist in References
row 88: I propose to delete text ... is that is the Bayesian DRE
under equation (16) parameters for given t
row 96: The names of the distributions must have a capital first character. You have not described all the shortcuts used in Table 1 and Table 2 (for example: the IG, Bin, Ge, Bet, Ray, and U distributions).
Row 99 r e. - must be r/e.
Please correct these mistakes and check that all publications are cited in the text.
Author Response
Dear Reviewer,
Many thanks for your useful comments. Here are my responses:
I have addressed all of your concerns and comments in the draft, marked in BLUE.
1- The paper solves an interesting problem, but the introduction mainly relies on old literature. Please add the newest literature. See, for example, https://michaelgutmann.github.io/assets/slides/Gutmann-2017-06-20.pdf , https://arxiv.org/pdf/1611.10242.pdf
My response: I have added the mentioned work to the references and explained the application of DRE to the posterior density estimation problem, as addressed in that paper.
2- Please explain better why you decided to use the Huber loss, the least squares, and the absolute error functions, and why these functions are interesting for the reader.
My resp.: In Sec. 3.1, I have explained why the Huber loss is important and why L2 and L1, which are widely used, may not be suitable. I rephrased it a bit to make it clearer, added a reference, and explained the application of the Huber loss to the robust regression problem (all in blue).
3- From the text it is not obvious whether the correction factors (Table 1) were derived by the authors; similarly for the KL in Table 2. Please put this information into the text.
My resp.: Table 1 is our contribution; it is based on the correction factor, which is new (Thm. 1). For Table 2, we simply calculated the KL loss between well-known densities.
4-The English must be corrected.
My resp.: All have been corrected, and the references have been double-checked.
Thank you again.
Reviewer 3 Report
English needs a lot of corrections. The proposed ones are annotated in the attached file.
I recommend a further check of English.
At line 74, the authors cite Brwon 1986, but this reference is not in the bibliography at the end of the paper.
I believe that other references are absent from the bibliography, or are cited in a wrong way (for instance, with the order of the authors changed).
Please check all citations and references.
Comments for author File: Comments.pdf
Author Response
Dear Reviewer,
Many thanks for your useful comments.
Here are my responses:
All of your concerns and comments have been addressed in the manuscript, marked in GREEN.
Line 74: the correct name is Brown (1986), which has been fixed and added to the references.
All citations and references have been double-checked, and their order has now been fixed.
Thank you again.
Round 2
Reviewer 1 Report
Some minor comments
Your notation for the expectation on the left in (15) is somewhat misleading, as it looks like a posterior risk. I would drop the superscript theta|t. Also, this is the "prior risk", not the "Bayes risk", which is defined as the smallest value of the prior risk.
line 142 larger ratios
There is absolutely no reason to suppose that a prior is not elicited from a conjugate family. In fact this is probably the most common use of priors and elicitation. If a conjugate family is rich enough to express prior beliefs, then there are lots of good reasons to use such a family and design the elicitation algorithm around it. In my experience this is the primary way that proper priors are elicited and the nonparametric approach does not avoid the need to elicit.
Author Response
Dear Reviewer,
Many thanks for your comments again. I have addressed your concerns. Please see Lemma 2.
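For completeness, the distinction the reviewer raises can be stated in generic decision-theoretic notation (which may differ from the manuscript's): for an estimator $\hat{r}$, the frequentist risk is $R(\theta,\hat{r})=E^{t\mid\theta}L(r(t),\hat{r}(t))$; the prior (integrated) risk is $r(\pi,\hat{r})=\int R(\theta,\hat{r})\,\pi(\theta)\,d\theta$; and the Bayes risk is its smallest value, $\inf_{\hat{r}} r(\pi,\hat{r})$, attained by the Bayes estimator. Equation (15) therefore displays a prior risk rather than the Bayes risk, as the reviewer points out.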