Peer-Review Record

A Family of Correlated Observations: From Independent to Strongly Interrelated Ones

Stats 2020, 3(3), 166-184; https://doi.org/10.3390/stats3030014
by Daniel A. Griffith
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 22 May 2020 / Revised: 20 June 2020 / Accepted: 27 June 2020 / Published: 30 June 2020
(This article belongs to the Section Statistical Methods)

Round 1

Reviewer 1 Report

The manuscript presents an interesting proposal on correlation structures.

Author Response

This reviewer makes no specific comments. S/he indicates that the narrative relating to research design, methods, results presentation, and results supporting conclusions can be improved. These improvements were addressed by the revisions and insertions in reply to the other three reviewers.

Reviewer 2 Report

1) It seems that the new classification is a subset of well-known lattice models, which can be found in the books by Cressie (1993) and Cressie and Wikle (2011) on spatial and spatio-temporal data analysis.

2) The main contribution of the author is the estimation of covariance matrix density for the models under consideration. However, with modern computers and moderate-size data sets, the main problem is not the density of the matrix but the number of parameters used to parametrize it. A wide range of devices, such as variogram models, can be used to define a dense covariance matrix with only a few parameters.
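To make the reviewer's point concrete, here is a minimal sketch assuming an exponential covariance model (one common variogram model, not one taken from the paper). Three parameters (nugget, partial sill, range) determine every entry of a fully dense covariance matrix; the coordinates and parameter values are hypothetical.

```python
import math


def exponential_covariance(h, partial_sill, range_param, nugget=0.0):
    """Exponential model: nugget + partial sill at distance 0,
    partial_sill * exp(-h / range_param) at distance h > 0."""
    if h == 0:
        return nugget + partial_sill
    return partial_sill * math.exp(-h / range_param)


def dense_covariance(coords, partial_sill, range_param, nugget=0.0):
    """Build the full n-by-n covariance matrix from point coordinates."""
    n = len(coords)
    V = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            h = math.dist(coords[i], coords[j])  # Euclidean distance
            V[i][j] = exponential_covariance(h, partial_sill, range_param, nugget)
    return V


# Hypothetical locations: every off-diagonal entry is nonzero,
# yet only three parameters were estimated.
coords = [(0, 0), (1, 0), (0, 1), (3, 4)]
V = dense_covariance(coords, partial_sill=2.0, range_param=1.5, nugget=0.5)
for row in V:
    print(["%.4f" % v for v in row])
```

The parameter count stays at three however many locations are added, which is the parsimony the reviewer contrasts with matrix density.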

3) The author uses the same letter V for both the covariance matrix and the inverse covariance matrix.

 

 

Author Response

In response to the first point, citations to both Cressie (1993) and Wikle and Cressie (2011) were inserted in the respective sections about space series and space-time series. I originally considered including these two references and, only for brevity, decided at that time not to; they are now cited in addition to their predecessors. I agree that they make the presentation more complete.

 

In response to the second point, I incorporated it into the first paragraph of the conclusion, in part commenting on it as a topic for future research.

 

In response to the third point, I use V and V⁻¹ to emphasize that the same covariance matrix is involved; this usage parallels the multivariate statistics notation Σ and Σ⁻¹. For better clarity, I revised the one occurrence that might be confusing.

 

My revisions and insertions in reply to all of the reviewers improve the narrative relating to the introduction, research design, methods, results presentation, and results supporting conclusions.

 

Reviewer 3 Report

 

The topic of independent samples, correlated samples, and the study of potential data structures and correlation structures is fundamental in statistics. The chosen topic of this paper is therefore very worthwhile, particularly with the arrival of big data and the data deluge, which raise further issues when considering data samples. Unfortunately, the paper does not adopt this orientation, but never mind: there is probably enough to discuss, or re-discuss, with a focus on effective samples and minimal samples.

 

The scholastic tone of the paper is appealing, but it is difficult to dig out the points being made.

 

The paper builds on a previous classification of samples by Liu and Liang (1997) and proposes to modify it, particularly by introducing the category “matched k-tuples (k ≥ 2)”, which extends the concept of matched pairs, and hence of correlation. Excursions into textbooks give some examples, mostly of k = 2 matched data, but without fully showing how this category differs from those of Liu and Liang. The treatment of time and spatial autocorrelation is limited to order 1, and although some developments of these cases seem to yield results, the paper does not express them in a very comprehensive way.
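The variance consequences at stake can be illustrated with a small sketch. It assumes unit-variance observations and two illustrative correlation structures: equicorrelated matched k-tuples, and first-order autoregressive dependence. It compares the variance of the sample mean with the naive σ²/n that ignores the correlation. All function names are hypothetical, not taken from the paper.

```python
def variance_of_mean(R, sigma2=1.0):
    """Var(ybar) = sigma2 / n**2 * (sum of all entries of correlation matrix R)."""
    n = len(R)
    return sigma2 * sum(sum(row) for row in R) / n ** 2


def effective_sample_size(R):
    """n* solving sigma2 / n* = Var(ybar); equals n when R is the identity."""
    return 1.0 / variance_of_mean(R, sigma2=1.0)


def matched_tuples_corr(m, k, rho):
    """Hypothetical helper: m independent k-tuples, correlation rho within a tuple."""
    n = m * k
    return [[1.0 if i == j else (rho if i // k == j // k else 0.0)
             for j in range(n)] for i in range(n)]


def ar1_corr(n, rho):
    """Hypothetical helper: first-order autoregressive correlation rho**|i - j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


n = 30
cases = [("independent", matched_tuples_corr(30, 1, 0.0)),
         ("matched triples", matched_tuples_corr(10, 3, 0.5)),
         ("AR(1)", ar1_corr(n, 0.5))]
for label, R in cases:
    print(f"{label:16s} Var(ybar) = {variance_of_mean(R):.4f} "
          f"(naive 1/n = {1 / n:.4f}), n* = {effective_sample_size(R):.1f}")
```

For the matched triples the sketch reproduces the closed form n/[1 + (k − 1)ρ] = 30/2 = 15; in both correlated cases the naive σ²/n understates Var(ȳ), which is the variance inflation at issue.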

Exchangeability disappears without much discussion, yet it is associated with important results (e.g., de Finetti's theorem).
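A compact way to see what exchangeability contributes, sketched under the assumption of an equicorrelated (compound symmetry) structure, which is the correlation form of n exchangeable observations with finite variances:

```latex
R_n(\rho) = (1-\rho)\,I_n + \rho\,\mathbf{1}_n\mathbf{1}_n^{\top},
\qquad
\lambda_1 = 1 + (n-1)\rho,
\quad
\lambda_2 = \cdots = \lambda_n = 1 - \rho .
```

Hence R_n(ρ) is a valid correlation matrix if and only if −1/(n − 1) ≤ ρ ≤ 1. As n → ∞ the lower bound tends to 0, which agrees with the standard consequence of de Finetti's theorem that two members of an infinite exchangeable sequence (with finite variance) cannot be negatively correlated.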

 

I recommend drastically rewriting the paper, with more convincing arguments for each category and more precise conclusions or interpretations of each result. An example in which the true structure is ignored at first, and which then goes through the different categories expressing the consequences, would be quite informative.

 

 

Some details: 

 

 

Line 2: the Section 2 title, and line 122: the Section 3 title, need rewriting, probably keeping only the wording after the colon.

Section 2 does not describe the methods; rather, it describes the first concepts …

Section 3 does not seem to present the results; rather, it is the core of the paper …

 

 

Line 83: As this is the first time the concept of effective sample size is used in relation to observations, it may be necessary to give some details or explanation of the wording “a sample size of one with n observations”. Perhaps the analogy to the multivariate normal distribution should state that the n observations are seen as a vector of n random variables (which are the same).
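One way to make the phrase concrete, offered as a sketch under the simplifying assumption of equal variances σ² and a common pairwise correlation ρ (not the paper's general setting): treat y = (y₁, …, yₙ)ᵀ as a single draw from Nₙ(μ1, V) with V = σ²Rₙ(ρ). Then

```latex
\operatorname{Var}(\bar{y})
  = \frac{1}{n^{2}}\,\mathbf{1}^{\top} V \mathbf{1}
  = \frac{\sigma^{2}}{n^{2}}\bigl[n + n(n-1)\rho\bigr]
  = \frac{\sigma^{2}}{n}\bigl[1 + (n-1)\rho\bigr],
\qquad
n^{*} = \frac{\sigma^{2}}{\operatorname{Var}(\bar{y})}
      = \frac{n}{1 + (n-1)\rho}.
```

With ρ = 0 this gives n* = n, the independent case; with ρ = 1 it gives n* = 1, that is, n observations carrying the information of a sample of size one.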

 

… and then, at line 92, the case where the n observations are uncorrelated corresponds to the likelihood of an i.i.d. sample of one Gaussian variable.
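The reviewer's reading can be checked numerically with a minimal sketch that treats the n observations as one draw from an n-dimensional normal distribution. When V = σ²I, the vector log-likelihood coincides with the sum of i.i.d. univariate normal log-likelihoods; a non-diagonal V changes it. The data values and the equicorrelation of 0.6 are hypothetical, and the small Cholesky routine is written out only to keep the sketch dependency-free.

```python
import math


def cholesky(V):
    """Lower-triangular L with V = L L^T (V symmetric positive definite)."""
    n = len(V)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = V[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def mvn_logpdf(y, mu, V):
    """Log-density of a single vector y under N_n(mu, V)."""
    n = len(y)
    L = cholesky(V)
    # Forward substitution: L z = y - mu, so z^T z = (y - mu)^T V^{-1} (y - mu).
    z = [0.0] * n
    for i in range(n):
        z[i] = (y[i] - mu[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    quad = sum(zi * zi for zi in z)
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2 * math.pi) + logdet + quad)


def iid_normal_loglik(y, mu, sigma2):
    """Sum of univariate N(mu, sigma2) log-densities (independent sample)."""
    return sum(-0.5 * (math.log(2 * math.pi * sigma2) + (yi - mu) ** 2 / sigma2)
               for yi in y)


y = [1.2, -0.4, 0.7, 2.1]
mu, sigma2 = 0.5, 1.7
n = len(y)
V_indep = [[sigma2 if i == j else 0.0 for j in range(n)] for i in range(n)]
V_corr = [[sigma2 * (1.0 if i == j else 0.6) for j in range(n)] for i in range(n)]

ll_vector = mvn_logpdf(y, [mu] * n, V_indep)
ll_iid = iid_normal_loglik(y, mu, sigma2)
ll_corr = mvn_logpdf(y, [mu] * n, V_corr)
print(ll_vector, ll_iid, ll_corr)
```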

 

The concept of independence needs to be defined and described in the introduction in relation to uncorrelatedness. This is important because, for the Gaussian distribution, uncorrelated sets imply independent sets, as used in line 120.
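The distinction can be illustrated with a standard counterexample (not from the paper): a non-Gaussian pair that is uncorrelated yet dependent. Zero correlation implies independence only when the variables are jointly Gaussian. The sketch uses exact fractions so no sampling noise is involved; the distribution is a hypothetical choice.

```python
from fractions import Fraction

# Hypothetical discrete example: X uniform on {-1, 0, 1} and Y = X**2.
support = [-1, 0, 1]
p = Fraction(1, 3)  # P(X = x) for each x in support

E_X = sum(p * x for x in support)            # 0
E_Y = sum(p * x ** 2 for x in support)       # 2/3
E_XY = sum(p * x * x ** 2 for x in support)  # E[X^3] = 0
cov_XY = E_XY - E_X * E_Y                    # 0, so X and Y are uncorrelated

# Independence would require P(X = 0, Y = 0) = P(X = 0) * P(Y = 0).
P_X0 = sum(p for x in support if x == 0)                     # 1/3
P_Y0 = sum(p for x in support if x ** 2 == 0)                # 1/3
P_X0_Y0 = sum(p for x in support if x == 0 and x ** 2 == 0)  # 1/3, not 1/9

print("Cov(X, Y) =", cov_XY)
print("dependent:", P_X0_Y0 != P_X0 * P_Y0)
```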

 

 

Equation (3) is somewhat abrupt and needs more explanation.

 

Theorem 1 and Corollary 1 should be combined into a single theorem, including the comment at line 120. The comment at line 108 referring to a “relevant standard theorem” is not straightforward and should be explained better.

 

The social network autocorrelation example could lead to a big data discussion.

 

Author Response

In response to the first point, a comment about big data was added at the end of the introduction and at the end of §3.5.

 

A summary statement was added at the end of each section emphasizing its main point(s).

 

Parsimony was emphasized for each of the correlated data situations in order to keep the focus on the unfolding taxonomy. Insertions at the beginning of §3 and in the first paragraph of the conclusion highlight this perspective.

 

A reference to the de Finetti theorem was inserted in the 2nd paragraph of the introduction, and in the 1st paragraph of the conclusion.

 

The first example in each section now has an explicit statement of the variance consequences of ignoring and of accounting for correlation among observations, supplementing the original contrast, which was given only for repeated measures. A new §3 opening paragraph summarizes these examples, stressing the consequence of variance inflation.

 

The §2 and §3 titles were revised as suggested.

 

The effective sample size, multivariate normal, and Gaussian discussion were incorporated as suggested.

 

Additional discussion about independence was inserted into the 1st paragraph of the introduction, as well as immediately preceding equation (2).

 

Equation (3) now contains some of the algebraic steps between its left- and right-hand sides.

 

The Theorem 1 discussion was modified as suggested, and a citation was added for the phrase “standard mathematical statistics.”

 

A big data statement expressing this sentiment was added to the last paragraph of the introduction.

 

My revisions and insertions in reply to all of the reviewers improve the narrative relating to the introduction, research design, methods, results presentation, and results supporting conclusions. I am puzzled about the English language assessment; the 4th reviewer suggested only conducting a spell-check, which I have done.

Reviewer 4 Report

see attached file

Comments for author File: Comments.pdf

Author Response

I disagree with the 2nd point. I can make these changes if the editor wishes, but the use of ρ is consistent with the classic paper by Ord (JASA, 1975), and ρ would still be needed for the theorem and the repeated measures example, creating a mixture of notation if the other cases were changed; Cressie (1993) uses θ. As I note in my reply to Reviewer #2, I see no reason not to use V and V⁻¹; a definition is stated at the beginning of §2.
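As a hedged illustration of why both V and V⁻¹ arise naturally alongside ρ, assume the simultaneous autoregressive (SAR) specification associated with Ord (1975), with a hypothetical spatial weights matrix W and I − ρW invertible (the paper's exact specification may differ):

```latex
\mathbf{y} = \rho W \mathbf{y} + \boldsymbol{\varepsilon},
\quad
\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^{2} I)
\;\Longrightarrow\;
V = \sigma^{2}\bigl[(I - \rho W)^{\top}(I - \rho W)\bigr]^{-1},
\qquad
V^{-1} = \sigma^{-2}\,(I - \rho W)^{\top}(I - \rho W).
```

When W is sparse, V⁻¹ is sparse while V is generally dense, which is one reason the model is often written in terms of V⁻¹.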

 

The adjectives “weak” and “strong” are replaced with “small domain” and “large domain,” which seem more appropriate.

 

I supplemented the Kronecker specifications with the suggested specification because Kronecker operations also occur in equations (10)-(12), and elsewhere; in other words, removing it from this section would not remove it from the paper. Furthermore, because regression is a common and unifying thread throughout this paper, it needs to be retained in this section.
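A minimal sketch of the kind of Kronecker specification referred to here, assuming a separable space-time correlation structure in which a temporal correlation matrix is Kronecker-multiplied by a spatial one. The matrix values and the `kron` helper are hypothetical, and the paper's equations (10)-(12) may use a different ordering or form.

```python
def kron(A, B):
    """Kronecker product A ⊗ B of two matrices given as lists of lists."""
    p, q = len(A), len(A[0])
    r, s = len(B), len(B[0])
    return [[A[i // r][j // s] * B[i % r][j % s] for j in range(q * s)]
            for i in range(p * r)]


# Hypothetical separable space-time correlation:
# AR(1) across T = 2 time points, fixed correlations among 3 locations.
rho_t = 0.5
T = 2
R_time = [[rho_t ** abs(i - j) for j in range(T)] for i in range(T)]
R_space = [[1.0, 0.3, 0.1],
           [0.3, 1.0, 0.3],
           [0.1, 0.3, 1.0]]

# 6 x 6 matrix; rows and columns are ordered time-major: (t, location).
R_st = kron(R_time, R_space)
for row in R_st:
    print(["%.3f" % v for v in row])
```

Each entry is the product of a temporal and a spatial correlation, so the full matrix is governed by the parameters of its two small factors.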

 

Because variance impacts attributed to correlated observations are an emphasis of this paper, the ANOVA example is retained. The repeated measures category is a necessary class in the taxonomy. Meanwhile, a sentence was added at the end of §4 pointing out the noteworthiness of scrutinizing the fox rabies dataset: “because of their rarity in the past, datasets like Andrew and Herzberg’s Southern Germany fox rabies dominate the sample datasets analyzed in statistics courses and textbooks, and appear in such collections as the R datasets.” This is a contention reiterated before Figure 4.

 

My revisions and insertions in reply to all of the reviewers improve the narrative relating to the methods and results presentation. I ran spell check.

 

Round 2

Reviewer 2 Report

no other comments

Author Response

This reviewer appears to find my previous revisions acceptable.

Reviewer 3 Report

No convincing answers to my comments were given, and no drastic changes were made along the lines requested in my comments.

The topic is interesting and deserves a better analysis and discourse than what is rendered here so far.

Author Response

see attached

Author Response File: Author Response.pdf
