D-plots: Visualizations for Analysis of Bivariate Dependence Between Continuous Random Variables
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This paper introduces a visual framework that synthesizes rank plots, copula-based measures, and a combined display known as the dplot, aiming to help analysts more effectively assess dependence between two continuous random variables.
I appreciate the effort that went into this work and the thoughtful presentation of visual methods for exploring statistical dependence between continuous random variables. The paper has potential, but it would benefit from substantial revisions before it can be considered for publication. Below, I’ve outlined several key areas where improvements could help clarify and strengthen the overall contribution.
1. Clarify the Novel Contribution: The work currently presents a combination of existing methods rather than a new statistical tool or theoretical result. While this can still be valuable, especially from a visualization or pedagogical perspective, it’s important to clarify what exactly is new here. Is the innovation primarily in the visual design (dplot)? If so, how does it compare to existing visualization frameworks?
2. Include Empirical Validation: The work makes claims about the benefits of using dplots to assess dependence more effectively than traditional scatter plots. However, these claims are not supported by empirical evidence. I’d recommend adding either (or both) of the following:
- A simulation study and/or a real-world study showing how dplots aid in detecting non-quadrant dependence that might be missed by standard techniques.
3. Address Scope and Limitations: The work focuses on continuous variables, which is a reasonable starting point, but in practice, many datasets include ordinal, categorical, or mixed type variables. It would be helpful for the authors to explicitly acknowledge this limitation and briefly discuss whether the dplot framework could be extended or adapted to those broader settings.
To strengthen the paper and broaden its impact, I also would recommend citing the following relevant works, which address association in ordinal and mixed-type data:
(a) Liu, D., Li, S., Yu, Y. and Moustaki, I., (2021). Assessing partial association between ordinal variables: quantification, visualization, and hypothesis testing. Journal of the American Statistical Association, 116(534), pp.955-968.
(b) Li, S., Fan, Z., Liu, I., Morrison, P.S. and Liu, D., (2024). Surrogate method for partial association between mixed data with application to well-being survey analysis. The Annals of Applied Statistics, 18(3), pp.2254-2276.
4. There are typos and grammar errors in the manuscript.
(a) On Page 10, Line 308, the phrase “...unbiased and consistent estimations...” should use “estimates” instead of “estimations,” which is the correct term in this context.
(b) Page 11, Line 339, the phrase “...especially to identify cases...” should be revised to “...especially for identifying cases...” for smoother and more natural phrasing.
(c) Page 20, Line 502, the phrase “Similarly to the comparison...” should be revised to “Similar to the comparison...” for more natural and grammatically correct phrasing.
(d) Page 21, Line 553, the phrase “...code and data sets...is available...” should be revised to “...Code and data sets... are available...” to match the plural subject with a plural verb.
Author Response
Reviewer's comment 1: The work currently presents a combination of existing methods rather than a new statistical tool or theoretical result. While this can still be valuable, especially from a visualization or pedagogical perspective, it’s important to clarify what exactly is new here. Is the innovation primarily in the visual design (dplot)? If so, how does it compare to existing visualization frameworks?
Response 1: We appreciate the reviewer’s observation. The novelty of our work lies primarily in the integration of existing theoretical concepts (such as copula-based rank plots, Schweizer-Wolff’s dependence, and Spearman’s concordance) into a cohesive visualization ensemble—the dependence plot (dplot)—that enables nuanced visual and numerical assessment of bivariate dependence. This ensemble goes beyond traditional scatter plots and correlation matrices by making the copula structure visually interpretable, especially in cases involving non-quadrant dependence. We have now clarified this contribution in both the Abstract and Introduction, emphasizing that the dplot is a novel visualization design for pedagogical and analytical purposes.
Reviewer's comment 2: The work makes claims about the benefits of using dplots to assess dependence more effectively than traditional scatter plots. However, these claims are not supported by empirical evidence. I’d recommend adding either (or both) of the following: A simulation study and/or a real-world study showing how dplots aid in detecting non-quadrant dependence that might be missed by standard techniques.
Response 2: The benefits of using dplots to assess dependence more effectively than traditional scatter plots have been supported by empirical evidence since the first submission: 2 simulated examples in Section 6, and 3 real data examples in Section 7. Julia programming code and data sets used for calculations and generating figures are available for reproducibility.
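As an illustration of the kind of case discussed in this exchange, the following is a minimal Python sketch, not the authors' Julia code and not one of the manuscript's examples. It assumes a toy sample with Y = X² and X uniform on (-1, 1), which is neither positively nor negatively quadrant dependent, and compares Spearman's rho with a grid-based Schweizer–Wolff measure computed from the empirical copula. The function names (`ranks`, `empirical_copula_grid`, `schweizer_wolff_sigma`, `spearman_rho`) are hypothetical helpers introduced here for illustration only.

```python
import numpy as np

def ranks(x):
    """Ranks 1..n of a sample, assuming no ties (continuous data)."""
    return np.argsort(np.argsort(x)) + 1

def empirical_copula_grid(x, y):
    """Empirical copula C_n evaluated on the grid (i/n, j/n), i, j = 1..n."""
    n = len(x)
    counts = np.zeros((n, n))
    counts[ranks(x) - 1, ranks(y) - 1] = 1.0  # one point per pair of ranks
    # Cumulative sums in both directions count points with R <= i and S <= j.
    return counts.cumsum(axis=0).cumsum(axis=1) / n

def schweizer_wolff_sigma(x, y):
    """Grid version of 12 * integral of |C_n(u, v) - u v| over the unit square."""
    n = len(x)
    grid = np.arange(1, n + 1) / n
    independence = np.outer(grid, grid)  # product copula u * v on the grid
    diff = np.abs(empirical_copula_grid(x, y) - independence)
    return 12.0 / (n**2 - 1) * diff.sum()

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return np.corrcoef(ranks(x), ranks(y))[0, 1]

# Assumed toy data: Y is a deterministic, non-monotone function of X.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=1000)
y = x**2

rho = spearman_rho(x, y)             # concordance: close to zero here
sigma = schweizer_wolff_sigma(x, y)  # dependence: clearly away from zero
print(f"Spearman rho = {rho:.3f}, Schweizer-Wolff sigma = {sigma:.3f}")
```

In this toy setting the population value of Spearman's rho is 0 while the population Schweizer–Wolff measure is 0.5, so a concordance measure alone would miss a dependence that the rank-based dependence measure reports.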
Reviewer's comment 3: The work focuses on continuous variables, which is a reasonable starting point, but in practice, many datasets include ordinal, categorical, or mixed type variables. It would be helpful for the authors to explicitly acknowledge this limitation and briefly discuss whether the dplot framework could be extended or adapted to those broader settings. To strengthen the paper and broaden its impact, I also would recommend citing the relevant works by Liu et al. (2021) and Li et al. (2024).
Response 3: We have added a paragraph in the Conclusions and Discussion (Section 8) acknowledging that the current framework is limited to continuous random variables. We also briefly discuss how future work may adapt the approach to discrete or mixed data using subcopula theory and alternative dependence measures, citing the relevant works by Liu et al. (2021) and Li et al. (2024).
Reviewer's comment 4: There are 4 typos and grammar errors in the manuscript.
Response 4: We thank the reviewer for noticing typos and grammar errors, which have been corrected accordingly.
Reviewer 2 Report
Comments and Suggestions for Authors
\documentclass{article}
\topmargin -0.5in \textheight 21cm \textwidth 17cm \hoffset -0.8in
\begin{document}
\thispagestyle{empty}
\begin{center}
{\large \textbf{Referee's report on the manuscript by Erdely and Rubio-S\'anchez \\[2mm]
submitted to STATS}}
\end{center}
This is a neatly written paper, essentially typo-free, which explains and illustrates why it is essential to use rank plots rather than traditional scatter plots when visualizing dependence between two continuous random variables. I~enjoyed the authors' step-by-step pedagogical approach and thought Figure~1 was particularly compelling. As nothing is perfect, I~would like to raise a small number of issues which the authors should fix before their paper is published.
\bigskip
\noindent
\textbf{Major points}
\begin{enumerate}
\item
It would be important to state more clearly that rank plots have been in use for dependence and copula modeling for many years. The book by Hofert et al. (2018), which appears as reference [38], is fine but fairly recent. For an earlier reference, see, e.g., Ge\-nest \& Favre (2007), ``Everything you always wanted to know about copula modeling but were afraid to ask,'' \textit{Journal of Hydrologic Engineering}, 12(4), 347--368.
\item
It would also be essential to state why rank plots, which exhibit the support of the empirical copula, are a reliable reflection of the underlying dependence structure captured by a copula. Right now, the authors do this in a roundabout way by stating, around line 308 on p.~10, that the marginal empirical distributions are unbiased and consistent estimators of the underlying margins. This is not sufficient. What needs to be added here is that the empirical copula $C_n$ defined at Eq.~(10) is itself a consistent estimator of the underlying copula $C$. In fact, much more is known about the asymptotic behavior of the stochastic process $\sqrt{n} (C_n - C)$. The most up-to-date references about this topic (indeed, one could say ``the final word'') are the following two papers, which consider the case of data from continuous distributions and data from arbitrary distributions, respectively:
\begin{itemize}
\item [a)] Segers (2012), ``Asymptotics of empirical copula processes under nonrestrictive smoothness assumptions,'' \textit{Bernoulli}, 18(3), 764--782.
\item [b)] Genest, Ne\v{s}lehov\'a \& R\'emillard (2017), ``Asymptotic behavior of the empirical multilinear copula process under broad conditions,'' \textit{Journal of Multivariate Analysis}, 159, 82--110.
\end{itemize}
I encourage the authors to cite them both.
\item
On p.~3, line 101, it is good to cite Friend's book, as the authors do. Another reference that would be relevant in the context of dependence visualization for discrete data would be Genest \& Green (1987), ``A graphical display of association in two-way contingency tables,'' \emph{The Statistician}, 36(4), 371--380.
\item
On p.~3, line 121, it is claimed that ``In Figure 1 the scatter plot located in the top-left corner is precisely a rank plot.'' This is wrong. Rank plots are based on ranks, which is not the case here. Granted, there is only a minor difference between a scatter plot and a rank plot when the marginal distributions are uniform on the interval $(0, 1)$. Nevertheless, there is a difference, which becomes less and less apparent as the number of observations increases.
\item
On p.~5, line 151, the restriction that $F$ be strictly increasing is not necessary. The result holds as long as $F$ is increasing (e.g., it could have plateaus). See, e.g., Proposition~3.1 in Embrechts \& Hofert (2013), ``A note on generalized inverses,'' \textit{Mathematical Methods of Operations Research}, 77(3), 423--432.
\item
I was a bit disappointed that no mention of Kendall's tau was made. Although the parallel with Pearson's correlation is less direct, Kendall's tau is generally preferred in copula modeling, particularly in connection with elliptical copula models.
\item
Would it be worthwhile to inform the reader that there exist rank-based tests of PQD? For a review, see Tang et al.~(2021), ``Testing for positive quadrant dependence,'' \textit{The American Statistician}, 75(1), 23--30.
\end{enumerate}
\bigskip
\noindent
\textbf{Minor points}
\begin{enumerate}
\item
The expression ``copula function'' which is used repeatedly in the manuscript is a pleonasm, because a copula \textit{is} a function. Recommendation: Replace ``copula function'' by ``copula'' everywhere.
\item
In expressions such as Fr\'echet--Hoeffding and Schweizer--Wolff, it is customary to place a longer hyphen between the two names as a visual clue that they refer to two different persons. Right now, there is only one hyphen, e.g., Fr\'echet-Hoeffding. Please fix.
\item
There are minor inconsistencies in the reference style. For example, the title of Nelsen's book [28] is not capitalized, whereas Wasserman's book [37] is. A similar comment applies to the titles of articles; cf. [12] and [13]. Please fix according to the journal's style file.
\item
In reference [39], replace ``d\'ependence'' by ``d\'epend\underline{a}nce'' and, later on, ``ind\'ependence'' by ``ind\'epend\underline{a}nce.''
\item
In reference [27] to Sklar's 1959 paper, the ``n'' preceding the word ``dimensions'' should be italicized, as it is a mathematical symbol, viz. ``... \`a $n$ dimensions...''
\end{enumerate}
\bigskip
\noindent
Report submitted with a positive recommendation (minor revision) on April 25, 2025
\end{document}
Author Response
Reviewer's comment 1: State more clearly that rank plots have been in use for dependence and copula modeling for many years. Consider earlier references such as Genest & Favre (2007).
Response 1: We agree and appreciate the suggestion. Although we cited Hofert et al. (2018), we now explicitly acknowledge that the use of rank-based methods for dependence modeling has a longer history. We have added a sentence and citation to this effect in the introduction to Subsection 5.4, and we now include Genest & Favre (2007) as an additional reference.
Reviewer's comment 2: State clearly that the empirical copula C_n is a consistent estimator of the copula C, and mention asymptotic results. Cite Segers (2012) and Genest, Nešlehová & Rémillard (2017).
Response 2: We appreciate this relevant clarification. We have revised the discussion in Subsection 5.4 to explicitly state that the empirical copula C_n is a consistent estimator of the true copula C, and we now cite Segers (2012) and Genest et al. (2017) as suggested.
Reviewer's comment 3: On p. 3, line 101, add a reference relevant to dependence visualization in discrete data: Genest & Green (1987).
Response 3: We have added the suggested citation in the context of visualizing association for categorical data, complementing our discussion of mosaic plots.
Reviewer's comment 4: On p. 3, line 121: The top-left plot in Figure 1 is not a rank plot—it is a scatter plot with uniform marginals. Please correct.
Response 4: We have corrected the description of the top-left plot in Figure 1 and clarified the distinction.
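The distinction raised in this point can be checked numerically. The sketch below is an illustration in Python under stated assumptions, not the authors' code: it draws a Uniform(0, 1) sample, computes rescaled ranks R_i/(n + 1) (one common convention for pseudo-observations; the manuscript may use a different scaling), and reports the largest gap between each observation and its rescaled rank for increasing sample sizes. The names `pseudo_observations` and `max_rank_gap` are hypothetical.

```python
import numpy as np

def pseudo_observations(x):
    """Rescaled ranks R_i / (n + 1); assumes no ties (continuous data)."""
    ranks = np.argsort(np.argsort(x)) + 1  # ranks 1..n
    return ranks / (len(x) + 1)

def max_rank_gap(n, rng):
    """Largest distance between a uniform sample and its rescaled ranks."""
    u = rng.uniform(size=n)
    return np.max(np.abs(u - pseudo_observations(u)))

rng = np.random.default_rng(0)
gaps = {n: max_rank_gap(n, rng) for n in (50, 500, 5000, 50000)}
for n, gap in gaps.items():
    print(f"n = {n:6d}   max |u_i - R_i/(n+1)| = {gap:.4f}")
```

The gap is nonzero for every finite sample, so a scatter plot of uniform data is not literally a rank plot, but it shrinks as n grows, in line with the reviewer's remark that the difference becomes less apparent with more observations.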
Reviewer's comment 5: On p. 5, line 151, remove the strict increasing condition for the CDF; cite Embrechts & Hofert (2013).
Response 5: We have replaced “strictly increasing” and cited the appropriate reference.
Reviewer's comment 6: Kendall’s tau is often preferred in copula modeling, especially for elliptical copulas. It should be at least mentioned.
Response 6: We now mention Kendall’s tau at the end of Subsection 5.2, including a brief remark about it.
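For reference, both rank-based concordance measures mentioned in this point can be computed from the ranks alone. The following minimal Python sketch assumes continuous data without ties and uses a small made-up sample. It is an illustration only, and the helper names are hypothetical rather than taken from the manuscript or its Julia code.

```python
import numpy as np

def ranks(x):
    """Ranks 1..n of a sample, assuming no ties."""
    return np.argsort(np.argsort(x)) + 1

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    return np.corrcoef(ranks(x), ranks(y))[0, 1]

def kendall_tau(x, y):
    """Kendall's tau: (concordant - discordant) pairs over n choose 2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    sx = np.sign(x[:, None] - x[None, :])
    sy = np.sign(y[:, None] - y[None, :])
    # The full n x n sum counts every unordered pair twice.
    return (sx * sy).sum() / (n * (n - 1))

# Made-up sample: 5 concordant pairs and 1 discordant pair out of 6.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 2.0, 4.0])
tau = kendall_tau(x, y)   # (5 - 1) / 6
rho = spearman_rho(x, y)  # 1 - 6 * 2 / (4 * 15)
print(f"Kendall tau = {tau:.4f}, Spearman rho = {rho:.4f}")
```

Both quantities depend on the data only through the ranks, so they are unchanged by strictly increasing transformations of either variable. For elliptical copulas, Kendall's tau is linked to the correlation parameter ρ by the standard relation τ = (2/π) arcsin(ρ), which is one reason it is convenient in that setting.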
Reviewer's comment 7: It would be worthwhile to inform the reader that there exist rank-based tests of PQD. For a review, see Tang et al. (2021).
Response 7: Such related work has now been mentioned and cited in Subsection 5.1.
Response to Reviewer's minor points: All have been corrected.
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
I really appreciate the authors taking the time to address all of my questions. I’ve reviewed the latest version, and I think the paper is in great shape now. I’m happy to recommend it for publication.