1. Introduction
For over three decades, the data visualization community has been innovating, crafting sophisticated and interactive methods to analyze and present data. Despite these advancements, simple visual representations such as bar, line, and pie charts remain indispensable. Their simplicity not only facilitates communication with a wide audience; for certain types of information, they can also be the most appropriate and effective visualizations.
Among these foundational visual tools is the scatter plot, which is regarded as one of the most useful and popular statistical graphs [1]. It depicts bivariate numerical data as points within a Cartesian coordinate system, presenting the data “as they are” (i.e., no information is lost through its visual encoding), allowing for the direct reading of values by projecting the points onto labeled axes. In this regard, scatter plots have numerous benefits and are the predominant technique for visualizing the data of two numerical variables simultaneously. They can be extended by incorporating additional layers, such as regression curves, confidence bands, modifications in point characteristics (shape, size, color, opacity), regions/areas of interest, histograms, density contours, correlation/dependence measures, winglets, glyphs, and so forth (see [2,3,4,5,6,7,8,9]). Moreover, scatter plots enable users to assess similarity between observations through distance, aiding in the detection of clusters, outliers, and class separation.
A scatter plot may be regarded as the quintessential graph for showing the relationship between two numerical variables. However, perhaps surprisingly, it has limitations when it comes to illustrating the statistical dependence between variables. In particular, it includes information about the marginal distributions of the variables, which is irrelevant to their dependence. This excess information can obscure the true relationship between the variables. To address this, we turn our focus to rank plots, i.e., scatter plots of data ranks, which omit information related to marginal distributions and rest on solid mathematical theory.
While previous approaches like scatterplot matrices, corrgrams, or scagnostics aid in multivariate visual exploration, they often fail to isolate dependence information from marginal influences. In contrast, our proposed d-plot integrates dependence-focused visual components—including rank plots and empirical copula diagnostics—into a compact ensemble that supports both exploratory analysis and interpretability grounded in copula theory.
The main contributions of this paper are the following: (1) explaining, through copula theory, the limitations of scatter plots for dependence visualization and the preference for measures like Spearman’s rank correlation over Pearson’s correlation; (2) advocating for rank plots as more suitable tools for analyzing statistical dependence; (3) providing interpretation guidelines for common patterns in rank plots; and (4) introducing a novel graphical ensemble that we call a d-plot for performing a comprehensive analysis of the relationship between two continuous random variables.
The remainder of this paper is structured as follows: A review of relevant literature is presented in Section 2, followed by a discussion on how marginal distributions influence scatter plots under an identical dependence relationship in Section 3. In Section 4, we briefly delve into copula theory, while in Section 5, we revisit key concepts of dependence, association measures, and graphical representations. In Section 6, we introduce the concept of a d-plot, and in Section 7, we show its utility through several examples employing real data. Finally, Section 8 contains the main conclusions and a discussion.
2. Related Work
Statistical dependence is often quantified through computational methods, yet the value of visualization techniques should not be underestimated, as illustrated by Anscombe’s quartet [10] and other synthetic datasets (see [11]).
Scatter plots are the predominant method for visualizing the relationship between continuous random variables, with scatter plot matrices (SPLOM) extending this approach to pairwise analysis of several variables. For the latter, correlation matrices or corrgrams [12], which represent correlation measures through color coding, are frequently employed. Although Pearson’s correlation coefficient is the default measure in many software packages, it is not always the most appropriate choice, particularly if the data do not follow a joint Gaussian distribution. We discuss preferable alternatives in Section 5.2.
Several research works have been conducted on the perception of Pearson’s correlation coefficient in scatter plots. Pioneering work by Doherty et al. [13] focused on absolute estimates, while subsequent works have examined discriminative judgments by analyzing just-noticeable differences (JNDs) between the correlations of two scatter plots presented simultaneously. Rensink and Baldridge [14], as well as Harrison et al. [15], found that correlation adheres to Weber’s Law, suggesting a linear relationship between correlation and JNDs. Conversely, Kay and Heer [16] proposed a log-linear model for this relationship. A critical limitation of these studies is their reliance on Gaussian-distributed data or data generated through linear regression models [13]. In contrast, Sher et al. [17] expanded the dataset variety by altering aspects like density, shape, and number of clusters, ultimately questioning the reliability of human estimates of Pearson correlation in diverse scatter plots and challenging their utility. Recently, Strain et al. [18,19] have studied the perception of correlation when varying aspects such as contrast and point size, proposing solutions to mitigate the underestimation of correlation judgements. The rank plots under study in this work could be used to visually assess the degree of association between continuous random variables. However, note that a complementary user study falls beyond the scope of this paper.
Wilkinson et al. [20] developed a collection of graph-theoretic scagnostics, which are measures related to scatter plots designed to aid in the exploration of large scatter plot matrices. Specifically, they considered the squared Spearman’s correlation coefficient to measure monotonicity, i.e., trends or degree of association (see [21]). In this paper, we employ rank plots to visualize monotonic relationships, which offer greater insight into dependence than a single numerical value. Moreover, we use Spearman’s correlation coefficient (not squared) along with Schweizer–Wolff’s dependence measure [22] due to their stronger theoretical foundation and advantages.
This paper relies on copula theory to explore dependence between random variables, a framework that has not been fully exploited in the visualization literature. Previous works by Hazarika et al. [23,24] applied copulas to visualize uncertainty and to analyze large-scale multivariate simulation data. However, our focus is on visualizations that specifically facilitate the assessment and interpretation of dependence through copulas and their transformations.
Lastly, for discrete or categorical variables, visual tools such as mosaic plots [25] or graphical association displays [26] have been used to assess dependence, though our study only concentrates on continuous random variables.
3. Effect of Marginal Distributions on Scatter Plots
Basic probability theory states that the relationship between two continuous random variables is encapsulated within their joint probability density or their joint cumulative distribution function, whereas marginals alone lack such dependency information. However, scatter plots inherently reflect data from marginal distributions. Consequently, distinct scatter plots may exhibit notable differences even when their associated variables adhere to the same dependency structure.
Figure 1 shows 25 different scatter plots for the simplest probabilistic relationship between two random variables: independence. Despite sharing the same dependence structure (in this case, independence), the scatter plots are quite diverse due to variations in the shapes of their corresponding marginal distributions. In this example, we have selected five types of marginal distributions (see Galtung’s classification of distributions [27]): (1) uniform, (2) unimodal monotone (peak on the left), (3) unimodal non-monotone (peak in the middle), (4) bimodal, and (5) “skewed” bimodal.
This example suggests that traditional scatter plots may not be ideal for visualizing dependence, since they are not invariant under changes in marginal distributions. However, as we will elucidate in the subsequent sections, a unique type of scatter plot, known as a rank plot (i.e., a scatter plot of ranks), adequately represents dependence information while remaining invariant to marginal distributions. A key characteristic of rank plots is that their associated marginals are always uniform.
Note that the scatter plot in the top-left corner of Figure 1 displays variables with uniform marginals, which closely resembles a rank plot. However, it is not a true rank plot based on ranks or empirical distribution functions. In this case, the uniform distribution of the plotted points over the unit square clearly indicates independence (roughly speaking, the values of one variable are unaffected by those of the other). Thus, in general, we can infer that traditional scatter plots distort the information contained in rank plots by incorporating details about marginal distributions, potentially leading to misinterpretations when assessing dependence. Subsequent sections will present the main copula theory results supporting the use of rank plots.
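To make the effect of the marginals concrete, the following minimal Python sketch draws independent pairs under a few different marginal shapes and scatters them, in the spirit of Figure 1. The specific distributions below are our own stand-ins for Galtung’s categories (uniform, peak on the left, peak in the middle, bimodal, skewed bimodal), not necessarily those used to produce the figure.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
n = 500

# Assumed stand-ins for Galtung-style marginal shapes; not the exact
# distributions used to generate Figure 1.
marginals = {
    "uniform": stats.uniform(),
    "peak left": stats.expon(scale=0.3),
    "peak middle": stats.norm(loc=0.5, scale=0.15),
    "bimodal": stats.beta(0.4, 0.4),
    "skewed bimodal": stats.beta(0.3, 0.7),
}

fig, axes = plt.subplots(5, 5, figsize=(10, 10))
for i, (name_x, dist_x) in enumerate(marginals.items()):
    for j, (name_y, dist_y) in enumerate(marginals.items()):
        x = dist_x.rvs(n, random_state=rng)  # X and Y are drawn independently, so
        y = dist_y.rvs(n, random_state=rng)  # every panel shares the same copula (independence)
        axes[i, j].scatter(x, y, s=3)
        axes[i, j].set_xticks([]); axes[i, j].set_yticks([])
        if i == 0:
            axes[i, j].set_title(name_y, fontsize=8)
        if j == 0:
            axes[i, j].set_ylabel(name_x, fontsize=8)
plt.tight_layout()
plt.show()
```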
4. Bivariate Copulas
A key concept regarding statistical dependence between continuous random variables is the copula, introduced in 1959 by Sklar [28]. He proved that, for a given vector (X, Y) of two continuous random variables, there exists a unique function $C:[0,1]^2\to[0,1]$ such that
$$H(x,y) \;=\; C\big(F(x),\,G(y)\big), \qquad (1)$$
where $H$ is the joint cumulative distribution function of (X, Y), and $F$ and $G$ define the marginal cumulative distribution functions of X and Y, respectively. $C$ is called the copula of the random vector (X, Y) and represents the unique functional link between the joint distribution and its marginals. Since marginal distributions have no information about how each random variable interacts with others, all the information about the dependence between random variables is contained in their underlying copulas. Thus, Sklar’s theorem implies that any proposal to analyze, measure, or visualize dependence should be based only on the information that can be obtained from the underlying copula of the random variables.
Basic results and properties about copulas may be found in [29,30]. For example, the underlying copula of a random vector is invariant under continuous strictly increasing transformations of its random variables:
$$C_{\alpha(X),\,\beta(Y)} \;=\; C_{X,Y} \quad \text{for } \alpha,\beta \text{ continuous and strictly increasing.} \qquad (2)$$
This last property implies that the random vector (X, Y) has exactly the same dependence structure as $(\alpha(X),\beta(Y))$, even though they could have different marginals. Consequently, dependence measures and visualizations aimed at illustrating only dependence should be identical in both cases.
Furthermore, recall from basic probability theory that for any continuous random variable X with cumulative probability distribution function $F$, the transformed random variable defined as $U = F(X)$ has a continuous uniform distribution over the closed interval $[0,1]$; see [31] for a detailed discussion on generalized inverses (since $F$ might not be strictly increasing). In addition, its cumulative distribution function is the identity function $F_U(u) = u$ for $u \in [0,1]$. These facts, combined with (1) and (2), have several important implications:
- (C1) For any vector (U, V) of continuous uniform (0, 1) random variables, $H_{U,V}(u,v) = C_{U,V}(u,v)$ for all $(u,v)\in[0,1]^2$. In other words, copulas may be regarded as joint distributions with uniform (0, 1) marginals.
- (C2) The random vectors (X, Y) and $(F(X), G(Y))$ have the same underlying copula, due to (2), and therefore exactly the same dependence relationship.
- (C3) Even though (X, Y) and $(F(X), G(Y))$ share the same copula (i.e., dependence structure), their scatter plots may look considerably different, since the latter has uniform (0, 1) marginals but (X, Y) typically will not.
Hence, scatter plots do not have a unique representation for the same type of dependence. Also, as a consequence of (C2) and (C3), we may consider the scatter plot of $(F(X), G(Y))$, which has uniform marginals, as a canonical dependence representation of any other random vector with the same dependence relationship but with any other marginal distributions. Lastly, as a consequence of (C1), such a canonical representation would be a scatter plot of a random vector with uniform marginals and joint cumulative distribution equal to the underlying copula. In other words, the joint cumulative distribution of $(F(X), G(Y))$ would be the copula $C_{X,Y}$. Thus, a scatter plot of observations from $(F(X), G(Y))$ is a valid way to represent the information of a copula.
For example, consider the dependence structure associated with independence. Recall that two random variables X and Y are independent if and only if their joint distribution is equal to the product of its marginals, that is, $H(x,y) = F(x)\,G(y)$. Therefore, as a consequence of Sklar’s theorem (1), their unique underlying copula is given by
$$\Pi(u,v) \;=\; uv,$$
which is usually known as the independence or product copula. For example, the independent datasets associated with the 25 scatter plots in Figure 1 all have the same underlying copula $\Pi$ (despite having different marginals). Moreover, the canonical dependence scatter plot in this case would be the one in the upper left corner, which is a continuous uniform distribution over the unit square, where the marginals are uniform.
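The following short sketch illustrates implications (C2) and (C3) with an assumed toy construction: a dependent pair with non-uniform marginals is transformed through its marginal distribution functions (the unknown one is approximated by the empirical CDF, anticipating Section 5.4). Rank-based quantities such as Spearman’s correlation are unchanged by the transformation, while Pearson’s correlation is not, because only the former depend exclusively on the copula.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 2000

# Toy dependent pair with non-uniform marginals (an assumption for illustration):
# X is exponential and Y is a monotone function of X plus noise.
x = rng.exponential(scale=2.0, size=n)
y = np.log1p(x) + rng.normal(scale=0.2, size=n)

# U = F(X) uses the known marginal CDF of X; V approximates G(Y) with the
# empirical CDF, which is exactly the rank-plot construction of Section 5.4.
u = stats.expon(scale=2.0).cdf(x)
v = stats.rankdata(y) / n

# Pearson's correlation changes under the transformation, Spearman's does not,
# because (X, Y) and (F(X), G(Y)) share the same copula.
print("Pearson  (X, Y):       %.3f" % stats.pearsonr(x, y)[0])
print("Pearson  (F(X), G(Y)): %.3f" % stats.pearsonr(u, v)[0])
print("Spearman (X, Y):       %.3f" % stats.spearmanr(x, y)[0])
print("Spearman (F(X), G(Y)): %.3f" % stats.spearmanr(u, v)[0])
```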
5. Dependence Types, Measures, and Plots
5.1. Quadrant Dependence
According to the results from Section 4, if the observations from $(F(X), G(Y))$ appear uniformly distributed over the unit square, we may conjecture that the random variables are independent (or exhibit a very weak dependence). Departures from this scenario imply some kind of dependence relationship that requires assessment. E. Lehmann described a comprehensive catalog of general types of dependencies [32], while R. Nelsen identified them in terms of copulas [29]. The most general and simple type is known as quadrant dependence, which may be positive (PQD) or negative (NQD):
$$\text{PQD}(X,Y)\;\Longleftrightarrow\; H(x,y) \,\ge\, F(x)\,G(y) \ \ \text{for all } (x,y), \qquad (4)$$
$$\text{NQD}(X,Y)\;\Longleftrightarrow\; H(x,y) \,\le\, F(x)\,G(y) \ \ \text{for all } (x,y). \qquad (5)$$
Intuitively, PQD(X, Y) implies that the joint probability wherein values from X and Y are simultaneously small (or simultaneously large) is greater than or equal to the analogous probability if the variables were independent. As a consequence of (1), the copula of (X, Y) will be greater than or equal to the product copula $\Pi$. We can also interpret that small (large) values of X tend to be more likely associated with small (large) values of Y. Conversely, NQD(X, Y) implies that small (large) values of X tend to be more likely associated with large (small) values of Y (and the copula of (X, Y) will be less than or equal to $\Pi$). Thus, roughly speaking, the PQD and NQD are associated with increasing and decreasing trends, respectively. It is worthwhile to mention that there exist rank-based statistical tests of the PQD; see, for example, [33].
Regression models that fall into the category of quadrant dependence are those of the form $Y = g(X) + \varepsilon$, where $g$ is a continuous and strictly monotone function (increasing or decreasing), and $\varepsilon$ is a random noise variable centered around zero. Particularly, if $g$ is a linear function, we have the case of linear regression. In terms of observations from $(F(X), G(Y))$, the PQD can be identified when they appear close to the graph of $v = u$ for $u \in [0,1]$ (i.e., the “main” diagonal). Alternatively, for the NQD, the points will lie close to the “secondary” diagonal corresponding to $v = 1 - u$.
Another significant aspect concerning copulas is the ability to ascertain the proximity to the PQD, NQD, and independence. As we have seen, a uniform distribution on $[0,1]^2$ of observations from $(F(X), G(Y))$ indicates independence. Therefore, a departure from uniformity indicates some type of dependence. The maximum deviation from independence can be established by applying Sklar’s theorem (1) to the Fréchet–Hoeffding bounds [34,35] for bivariate joint distributions:
$$W(u,v) \;:=\; \max(u+v-1,\,0)\;\le\; C(u,v)\;\le\;\min(u,v)\;=:\;M(u,v), \qquad (6)$$
where $W$ and $M$ are also copulas; see Figure 2. It is straightforward to prove that if $Y = g(X)$, where $g$ is a continuous strictly increasing function, then $C_{X,Y} = M$, indicating a PQD. In such instances, the observations from $(F(X), G(Y))$ align precisely along the main diagonal v = u. Conversely, if $g$ is a continuous strictly decreasing function, then $C_{X,Y} = W$, signifying an NQD. In that case, the scatter plot of observations from $(F(X), G(Y))$ would exclusively contain points along the secondary diagonal $v = 1 - u$ (see Figure 2).
For a general regression model $Y = g(X) + \varepsilon$ with $g$ continuous and strictly monotone, the smaller the variability of $\varepsilon$, the closer the underlying copula gets to one of the Fréchet–Hoeffding bounds (6), and the closer a scatter plot of observations from $(F(X), G(Y))$ gets to one of the diagonal lines v = u or $v = 1 - u$. Furthermore, as the variability of $\varepsilon$ increases, the underlying copula will increasingly resemble the independence copula $\Pi$, and observations from $(F(X), G(Y))$ will more closely resemble a uniform distribution across the unit square.
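As a quick numerical illustration of this behavior, the sketch below (with an assumed marginal for X and an assumed increasing g) varies the noise level in Y = g(X) + ε and reports how far the rank-plot points lie from the main diagonal v = u, together with Spearman’s rho.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 1000
x = rng.gamma(shape=2.0, scale=1.0, size=n)   # assumed marginal, for illustration only

for sd in (0.01, 0.5, 5.0):
    # Y = g(X) + noise with g strictly increasing.
    y = np.sqrt(x) + rng.normal(scale=sd, size=n)
    # Pseudo-observations (rank-plot coordinates): ranks scaled to the unit square.
    u = stats.rankdata(x) / n
    v = stats.rankdata(y) / n
    # Mean distance of the rank-plot points to the main diagonal v = u:
    # small noise -> points hug the diagonal and Spearman's rho approaches 1;
    # large noise -> the copula drifts toward independence and rho approaches 0.
    diag_dist = np.mean(np.abs(v - u)) / np.sqrt(2)
    print(f"noise sd={sd:>4}: mean distance to diagonal={diag_dist:.3f}, "
          f"Spearman rho={stats.spearmanr(x, y)[0]:.3f}")
```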
5.2. Dependence Versus Concordance
According to Sklar’s theorem (1), since all the information about the dependence relationship between a pair of continuous random variables (X, Y) is in their underlying unique copula $C$, any attempt to measure dependence must be copula-based only, considering how far $C$ is from the independence copula $\Pi$. Schweizer and Wolff (see [22]) proposed a measure $\sigma_{X,Y}$ based on the $L_1$ distance between the graphs of $C$ and $\Pi$, defined as follows:
$$\sigma_{X,Y} \;=\; 12\int_0^1\!\!\int_0^1 \big|\,C(u,v) - uv\,\big|\;du\,dv. \qquad (7)$$
It can be shown that $\sigma_{X,Y}$ satisfies the properties of a measure of dependence, as defined in [29]. In particular, the double integral is multiplied by 12 in order to provide a normalized measure between 0 and 1. Note that the farthest a copula can be from $\Pi$ is one of the Fréchet–Hoeffding bounds (6), and the $L_1$ distance between $M$ and $\Pi$, and between $\Pi$ and $W$, is 1/12.
Furthermore, observe that $\sigma_{X,Y}$ only depends on the copula (e.g., it does not depend on marginal distributions). Also, $\sigma_{X,Y} = 0$ if and only if $C = \Pi$, which occurs if and only if X and Y are independent. Note that this desirable unique characterization of independence is not provided by the popular Pearson’s correlation coefficient, for which a zero value does not necessarily imply independence.
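Definition (7) can be evaluated numerically for any copula with an explicit formula. The sketch below uses the Farlie–Gumbel–Morgenstern family as an assumed example, for which Spearman’s concordance is known in closed form (θ/3), so the output can be checked against theory; since this family satisfies C ≥ Π for θ > 0, the integral with and without the absolute value coincide.

```python
import numpy as np
from scipy import integrate

# Farlie-Gumbel-Morgenstern copula, an assumed example family for which
# Spearman's concordance is known in closed form (theta / 3).
theta = 0.7
C = lambda u, v: u * v + theta * u * v * (1.0 - u) * (1.0 - v)

# Schweizer-Wolff's sigma (7): 12 times the double integral of |C(u, v) - uv|.
sigma, _ = integrate.dblquad(lambda v, u: abs(C(u, v) - u * v), 0, 1, 0, 1)
sigma *= 12.0

# The same integral without the absolute value (this is Spearman's concordance,
# introduced as Equation (9) below).
rho, _ = integrate.dblquad(lambda v, u: C(u, v) - u * v, 0, 1, 0, 1)
rho *= 12.0

# For theta > 0 this family satisfies C >= Pi everywhere, so sigma equals rho.
print(f"sigma = {sigma:.4f}, rho = {rho:.4f}, theta/3 = {theta / 3:.4f}")
```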
Embrechts et al. [36] analyzed in detail additional pitfalls of Pearson’s correlation coefficient $r_{X,Y}$ in terms of measuring dependence. If $Y = g(X)$ with $g$ strictly monotone, then in general, $r_{X,Y}$ fails to achieve any of its extreme values $\pm 1$; for example, if X is uniform (0, 1) and $Y = X^n$ for any integer $n \ge 2$, then $r_{X,Y} < 1$ even though Y is a deterministic increasing function of X. In general, $r_{X,Y} \ne r_{g(X),\,h(Y)}$ for strictly increasing $g$ and $h$, even though the copula of (X, Y) and $(g(X), h(Y))$ is exactly the same, as a consequence of (2). Moreover, $r_{X,Y}$ does not exist for every pair of random variables, since it depends on the existence of the marginal variances. Thus, it is not even a general linearity measure; for example, if X is Cauchy distributed and $Y = aX + b$ with $a \ne 0$, then even though there is a clear linear relationship between X and Y, $r_{X,Y}$ does not even exist.
The pitfalls of Pearson’s correlation have their root in its link to marginal characteristics that have no information about the dependence of random variables, since, by Hoeffding’s covariance identity,
$$r_{X,Y} \;=\; \frac{1}{\sigma_X\,\sigma_Y}\int_{-\infty}^{+\infty}\!\!\int_{-\infty}^{+\infty}\Big[\,C\big(F(x),G(y)\big) - F(x)\,G(y)\,\Big]\;dx\,dy; \qquad (8)$$
that is, Pearson’s correlation coefficient blends information from the dependence (given by the underlying copula $C$) with marginal information that has nothing to do with dependence.
We could improve this correlation coefficient to make it marginal-free by considering $r_{F(X),\,G(Y)}$. Since $F(X)$ and $G(Y)$ are continuous uniform (0, 1) distributions, their variances always exist and are equal to $1/12$. Furthermore, the dependence information from the random vector $(F(X), G(Y))$ is the same as for (X, Y), as a consequence of (2). In fact, $r_{F(X),\,G(Y)}$ is Spearman’s correlation $\rho_{X,Y}$ [37], also known as Spearman’s concordance measure, and (8) becomes
$$\rho_{X,Y} \;=\; 12\int_0^1\!\!\int_0^1 \big[\,C(u,v) - uv\,\big]\;du\,dv. \qquad (9)$$
It can be shown that $\rho_{X,Y}$ satisfies the properties of a measure of concordance, as defined in [29]. The only difference between $\sigma_{X,Y}$ in (7) and $\rho_{X,Y}$ is that the integrand in (9) is not in absolute value. Therefore, $\rho_{X,Y} = 0$ does not necessarily imply independence, and it cannot be considered a dependence measure.
From (4), (5), (7), and (9), we have the following relationships:
- (B1) $|\rho_{X,Y}| \le \sigma_{X,Y}$;
- (B2) $\sigma_{X,Y} = 0$ implies $\rho_{X,Y} = 0$, but not vice versa;
- (B3) X and Y are PQD if and only if $\sigma_{X,Y} = \rho_{X,Y}$;
- (B4) X and Y are NQD if and only if $\sigma_{X,Y} = -\rho_{X,Y}$;
- (B5) X and Y are neither PQD nor NQD if and only if $|\rho_{X,Y}| < \sigma_{X,Y}$.
The pair of values $(\rho_{X,Y}, \sigma_{X,Y})$ provides valuable insights into the relationship between two continuous random variables. By understanding these two values, we can promptly discern their independence and, if dependent, classify whether the relationship is PQD, NQD, or neither of these.
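Properties B1–B5 translate directly into a simple decision rule for an estimated pair (ρ, σ). The following sketch is a heuristic reading of that rule; the tolerance is an assumption needed because empirical estimates satisfy the equalities only approximately.

```python
def quadrant_dependence_type(rho: float, sigma: float, tol: float = 0.02) -> str:
    """Heuristic reading of the pair (rho, sigma) following B1-B5.

    `tol` is an assumed tolerance: with empirical estimates the equalities
    in B3 and B4 only hold approximately.
    """
    if sigma < tol:
        return "close to independence"
    if abs(sigma - rho) < tol:            # B3: sigma == rho   -> PQD
        return "PQD"
    if abs(sigma + rho) < tol:            # B4: sigma == -rho  -> NQD
        return "NQD"
    return "neither PQD nor NQD (mixed)"  # B5: |rho| strictly below sigma


print(quadrant_dependence_type(rho=0.41, sigma=0.42))    # -> PQD
print(quadrant_dependence_type(rho=-0.30, sigma=0.31))   # -> NQD
print(quadrant_dependence_type(rho=0.05, sigma=0.35))    # -> neither PQD nor NQD (mixed)
```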
Another widely used concordance measure is Kendall’s $\tau$; see, for example, [38]. It is also based solely on the copula and has a well-defined interpretation as the probability of concordance minus the probability of discordance. Unfortunately, this concordance measure does not have a closely related dependence measure counterpart, as Spearman’s concordance does with Schweizer–Wolff’s measure.
5.3. Non-Quadrant Dependence
If $|\rho_{X,Y}| < \sigma_{X,Y}$, then X and Y are not quadrant dependent. This implies, as inferred from Equations (4) and (5), that within certain subsets of the unit square, $C$ exceeds $\Pi$, while in complementary regions $C$ falls below $\Pi$. Equivalently, within certain areas of the support of (X, Y) the relationship between X and Y is a PQD, while in other regions it is an NQD. In subsequent subsections, we analyze two primary approaches for achieving this behavior (other alternatives exist).
5.3.1. Convex Linear Combinations
If $C_1$ and $C_2$ are two copulas and $0 < \alpha < 1$, then, as proven in [29], any convex linear combination of them is also a copula:
$$C(u,v) \;=\; \alpha\,C_1(u,v) + (1-\alpha)\,C_2(u,v). \qquad (10)$$
If, for example, we choose as $C_1$ a PQD copula and as $C_2$ an NQD one, then (10) would be a non-quadrant dependence copula, and by (1) we can build a joint cumulative distribution function for a vector (X, Y) of continuous random variables with such a copula and any given continuous marginals; see, for example, case R4 in Figure 3.
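A minimal sketch of this construction, under the assumption that the two components are Gaussian copulas with correlations +0.8 (PQD) and −0.8 (NQD) and that the mixing weight is α = 0.5: sampling from the convex combination (10) amounts to drawing each point from C1 with probability α and from C2 otherwise.

```python
import numpy as np
from scipy import stats

def sample_gaussian_copula(n, r, rng):
    """Draw n points from a bivariate Gaussian copula with correlation r."""
    z = rng.multivariate_normal([0.0, 0.0], [[1.0, r], [r, 1.0]], size=n)
    return stats.norm.cdf(z)               # componentwise Phi -> uniform (0, 1) marginals

rng = np.random.default_rng(3)
n, alpha = 2000, 0.5                        # mixing weight, chosen only for illustration

# Convex combination of a PQD copula (r = +0.8) and an NQD copula (r = -0.8):
# each point comes from C1 with probability alpha and from C2 otherwise.
from_c1 = rng.random(n) < alpha
uv = np.where(from_c1[:, None],
              sample_gaussian_copula(n, +0.8, rng),
              sample_gaussian_copula(n, -0.8, rng))

print("Spearman rho:", stats.spearmanr(uv[:, 0], uv[:, 1])[0])   # near 0 by symmetry
```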
5.3.2. Gluing Copulas
It is possible to combine two copulas through a copula construction technique known as “gluing copulas” [39]. Given two copulas $C_1$ and $C_2$ and a fixed value $0 < \theta < 1$, called a gluing point, we scale and glue them (intuitively, “concatenate” them) horizontally according to a vertical partition of the unit square. In particular, $C_1$ is scaled to $[0,\theta]\times[0,1]$ and $C_2$ to $[\theta,1]\times[0,1]$. Finally, they are glued into a single copula:
$$C_{1,2,\theta}(u,v) \;=\;
\begin{cases}
\theta\,C_1\!\big(\tfrac{u}{\theta},\,v\big), & 0 \le u \le \theta,\\[4pt]
(1-\theta)\,C_2\!\big(\tfrac{u-\theta}{1-\theta},\,v\big) + \theta v, & \theta \le u \le 1.
\end{cases} \qquad (11)$$
If we choose as $C_1$ a PQD copula and as $C_2$ an NQD copula, then (11) would be a non-quadrant dependence copula. By (1), we can build a joint cumulative distribution function for a vector (X, Y) of continuous random variables by $H(x,y) = C_{1,2,\theta}\big(F(x), G(y)\big)$, where the relationship is PQD if $F(x) \le \theta$ but NQD if $F(x) > \theta$ (see case R7 in Figure 3). The gluing copula technique is particularly useful for piecewise regression [40].
5.4. Rank Plots and Empirical Estimation
The use of rank plots and pseudo-observations in dependence and copula modeling is well established in the literature (see, e.g., [38,41]). These plots are widely used for visualizing empirical copulas and assessing the structure of dependence between continuous random variables.
In practice, we usually have a random sample of paired observed values $\{(x_k, y_k)\}_{k=1}^{n}$ from a random vector (X, Y) with an unknown joint probability distribution and also unknown marginal distributions. In this case, a natural replacement for observed values of $(F(X), G(Y))$ would be the set of pairs $\{(F_n(x_k), G_n(y_k))\}_{k=1}^{n}$, which constitutes a consistent empirical approximation of the vector $(F(X), G(Y))$. Here $F_n$ and $G_n$ are unbiased and consistent estimates of the marginal distributions $F$ and $G$, respectively, which are known as empirical distribution functions [42]. Moreover, the empirical copula $C_n$ defined in (12) is a consistent estimator of the true copula C, and its asymptotic behavior has been studied in depth. In particular, the process $\sqrt{n}\,(C_n - C)$ converges weakly under broad conditions; see [43] for continuous margins and [44] for more general distributions.
For continuous random variables, there are no repeated values, and therefore, $n\,F_n(x_k) = \mathrm{rank}(x_k)$, that is, the total number of observations from X that are equal to or less than $x_k$. Thus, a scatter plot of the $\{(F_n(x_k), G_n(y_k))\}_{k=1}^{n}$ values is a bivariate plot of observed ranks scaled to lie in the unit square, which we will call a rank plot (some authors call it a plot of pseudo-observations [41]), and it constitutes an empirical approximation of a scatter plot from the unknown underlying copula.
According to what has already been discussed in the previous sections, rank plots appropriately illustrate the dependence between the data variables. Table 1 and Figure 3 show a categorization of dependence types that can be used as a set of guidelines for interpreting rank plots. In practice, the described patterns might be quite clear, but sometimes they are not. To help in this empirical dependence assessment, in addition to rank plots, we propose analyzing empirical estimations of Schweizer–Wolff’s dependence measure (7) and Spearman’s concordance (9) to take advantage of their combined interpretation, as explained in Section 5.2.
The empirical estimation of the underlying bivariate copula is given by a function $C_n$, with domain the grid $\{0, \tfrac{1}{n}, \ldots, \tfrac{n-1}{n}, 1\}^2$, which is defined as follows:
$$C_n\!\Big(\tfrac{i}{n},\tfrac{j}{n}\Big) \;=\; \frac{1}{n}\sum_{k=1}^{n} \mathbf{1}\big(x_k \le x_{(i)},\; y_k \le y_{(j)}\big), \qquad C_n\!\Big(\tfrac{i}{n},0\Big) = C_n\!\Big(0,\tfrac{j}{n}\Big) = 0, \qquad (12)$$
where $x_{(i)}$ denotes the $i$-th order statistic of the observed x values, and $y_{(j)}$ the $j$-th order statistic of the observed y values. The function $C_n$ is usually referred to as the empirical copula, though originally it was introduced as the empirical dependence function by [45]. In addition, the empirical estimation of (9) is [29]
$$\rho_n \;=\; \frac{12}{n^2-1}\sum_{i=1}^{n}\sum_{j=1}^{n}\Big[\,C_n\!\Big(\tfrac{i}{n},\tfrac{j}{n}\Big) - \tfrac{i}{n}\cdot\tfrac{j}{n}\,\Big], \qquad (13)$$
while for (7), its empirical estimation $\sigma_n$ is obtained by replacing the differences in the sums in (13) with the absolute value of the differences.
5.5. Diagonal Sections
Besides $\rho_{X,Y}$ and $\sigma_{X,Y}$, there are other characteristics of the copula that help in interpreting a rank plot. For example, it is useful to determine whether the copula is above or below $\Pi$, or how close it is to M or W. Even though we could visualize the copula either by a 3D surface plot (as in Figure 2) or a contour plot, it would be difficult to grasp the nuances that allow us to interpret the dependence structure in these visualizations. Alternatively, we propose visualizing the graphs of the main and secondary diagonal sections of the copula. These simplifications can be useful for detecting departures from independence, PQD, NQD, and especially for identifying cases exhibiting both PQD and NQD, which may be analyzed through gluing copulas (see [40,46]).
The main diagonal section of a copula C is given by $\delta_C(u) = C(u,u)$, and the secondary diagonal by $\gamma_C(u) = C(u, 1-u)$. As an immediate consequence of (6), we have
$$\delta_W(u) \,\le\, \delta_C(u) \,\le\, \delta_M(u) \qquad \text{and} \qquad \gamma_W(u) \,\le\, \gamma_C(u) \,\le\, \gamma_M(u)$$
for $u \in [0,1]$, where $\delta_W(u) = \max(2u-1,\,0)$ and $\gamma_W(u) = 0$ represent the main and secondary diagonal sections of W, respectively. Similarly, $\delta_M(u) = u$ and $\gamma_M(u) = \min(u,\,1-u)$ are the diagonal sections of M. Figure 4 shows these bounds, together with the diagonal sections from the independence copula, which are $\delta_\Pi(u) = u^2$ and $\gamma_\Pi(u) = u(1-u)$. In cases of PQD (X, Y), we have $\delta_C(u) \ge u^2$ and $\gamma_C(u) \ge u(1-u)$. Likewise, for NQD (X, Y), we have $\delta_C(u) \le u^2$ and $\gamma_C(u) \le u(1-u)$. If there is a crossing between $\delta_C$ and $\delta_\Pi$ and/or between $\gamma_C$ and $\gamma_\Pi$, then there would not be a PQD or an NQD, and the crossing point could be a gluing point of two copulas, as described in (11).
6. D-plots
In order to comprehensively analyze bivariate data, effectively visualizing both the dependence structure between the variables (as discussed in the previous sections) and the characteristics depicted in traditional scatter plots, we advocate for the simultaneous visualization of the following:
- (a) The regular scatter plot, to visualize characteristics such as concrete data values, clusters, or outliers.
- (b) The rank plot, to analyze the dependence between the variables without the distortion of the marginal distributions and to identify its type according to the categorization in Table 1 and Figure 3.
- (c) Marginal histograms, to visualize the marginal behavior of the probability density functions (pdfs) in cases where it could be hard to extract from the scatter plot (e.g., due to occlusion caused by a large number of displayed points) and to understand their combined influence on the shape of the scatter plot (as in Figure 1).
- (d) Marginal box plots, to identify the presence and number of outliers and to understand the scales of the axes of the scatter plot.
- (e) Empirical copula diagonals, to visualize the presence/absence of quadrant dependence and possibly the need to partition the data (applying the gluing copula technique) to decompose the dependence into simpler quadrant dependencies.
- (f) A bar chart showing the absolute value of the empirical Spearman’s concordance $\rho_n$ and the Schweizer–Wolff’s dependence measure $\sigma_n$ (if the bar colors are different, then the sign of $\rho_n$ is negative). The combination of these two values is helpful for quantifying the degree of quadrant dependence and for finding gluing points when applying the gluing copula technique.
In this paper, we have organized these components into a $3 \times 3$ grid, as shown in Figure 5 and other examples (although other configurations would also be valid), which we call a dependence plot (d-plot). In the remainder of this section, we will introduce two theoretical cases, while Section 7 will contain examples with real data.
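As an illustration of how such an ensemble can be assembled, the following sketch builds one possible 3 × 3 arrangement with matplotlib, reusing the empirical estimators of Section 5.4. The particular placement of the marginal histograms and box plots, the bin counts, and the colors are assumptions inspired by Figure 5, not the authors’ exact implementation.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def empirical_copula_grid(x, y):
    """Empirical copula (12) on the grid {i/n} x {j/n} (Section 5.4)."""
    n = len(x)
    rx = stats.rankdata(x).astype(int)
    ry = stats.rankdata(y).astype(int)
    grid = np.zeros((n, n))
    for k in range(n):
        grid[rx[k] - 1:, ry[k] - 1:] += 1.0
    return grid / n

def d_plot(x, y):
    n = len(x)
    u, v = stats.rankdata(x) / n, stats.rankdata(y) / n           # pseudo-observations
    cn = empirical_copula_grid(x, y)
    g = np.arange(1, n + 1) / n
    pi = np.outer(g, g)
    rho = 12.0 / (n * n - 1.0) * np.sum(cn - pi)                  # empirical Spearman (13)
    sigma = 12.0 / (n * n - 1.0) * np.sum(np.abs(cn - pi))        # empirical Schweizer-Wolff
    main_diag = cn[np.arange(n), np.arange(n)]
    sec_diag = np.append(cn[np.arange(n - 1), n - 2 - np.arange(n - 1)], 0.0)

    fig, ax = plt.subplots(3, 3, figsize=(9, 9))
    ax[0, 1].scatter(u, v, s=3); ax[0, 1].set_title("rank plot")
    ax[1, 1].scatter(x, y, s=3); ax[1, 1].set_title("scatter plot")
    ax[0, 2].plot(g, main_diag); ax[0, 2].plot(g, g ** 2, "--")
    ax[0, 2].set_title("main diagonal vs u^2")
    ax[1, 2].plot(g, sec_diag); ax[1, 2].plot(g, g * (1 - g), "--")
    ax[1, 2].set_title("secondary diagonal vs u(1-u)")
    ax[2, 0].bar(["|rho|", "sigma"], [abs(rho), sigma],
                 color=["gold" if rho < 0 else "black", "black"])  # light bar marks rho < 0
    ax[2, 0].set_title("association measures")
    ax[2, 1].hist(x, bins=30); ax[2, 1].set_title("X histogram")
    ax[1, 0].hist(y, bins=30, orientation="horizontal"); ax[1, 0].set_title("Y histogram")
    ax[2, 2].boxplot([x, y]); ax[2, 2].set_xticklabels(["X", "Y"])
    ax[2, 2].set_title("box plots")
    ax[0, 0].axis("off")
    plt.tight_layout()
    return fig

rng = np.random.default_rng(7)
x = rng.lognormal(size=500)
y = np.sqrt(x) + rng.normal(scale=0.4, size=500)   # assumed toy data
d_plot(x, y)
plt.show()
```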
6.1. Example 1: Non-Quadrant Dependence
Consider a vector of two continuous random variables (X, Y) with the following characteristics:
- X is bimodal (Kumaraswamy distribution with parameters 0.25 and 0.15);
- Y is non-monotone unimodal (Student’s t with location parameter 3.0, scale parameter 1.5, and 2.5 degrees of freedom);
- If X is below its median, then there is NQD (X, Y), while otherwise, there is PQD (X, Y). In both cases, we have used the parametric Frank family of copulas (see [29]) with parameters −30.0 and 30.0, respectively.
This example falls into category R8 of Table 1, and its corresponding d-plot is shown in Figure 5. In the center, we have the usual scatter plot of (X, Y). The associated rank plot is immediately above, and it clearly exhibits a gluing of an NQD copula with a PQD one, with $u = 0.5$ as the gluing point. This can be confirmed by analyzing the graphs comparing the diagonal sections of the empirical copula with those of the independence copula $\Pi$ (top-right and center-right). Note that the empirical diagonals cross the $\Pi$ diagonals at 0.5. Also, the empirical diagonals start below the $\Pi$ diagonals (which confirms NQD) but end up above them after u = 0.5.
In the bar chart in the lower left corner, we represent the absolute value of Spearman’s empirical concordance $|\rho_n|$, together with Schweizer–Wolff’s $\sigma_n$ (since it is useful to compare these magnitudes). Also, we indicate the sign of $\rho_n$ through color. For negative values we use a light color (in this case, yellow), and for positive values, we use black (which we also use for the bar associated with $\sigma_n$). In this case, the bar chart shows a clear numerical difference between $|\rho_n|$ and $\sigma_n$ (by construction of the example, the theoretical value of Spearman’s concordance is exactly equal to zero). This confirms that the dependence between X and Y is neither PQD nor NQD all the time, as expected.
The remainder of the graphs are related to marginal characteristics of the variables. There are no observed outliers for X, but there are several for Y. Also, the pdf for X is considerably left-skewed. These are the main reasons why the dependency structure is hard to visualize in the scatter plot, but it is apparent in the marginal-free rank plot. Furthermore, users might erroneously perceive an overall decreasing trend, since the points show a slight decreasing trend when X is below its median. Moreover, the value of the empirical Pearson’s correlation coefficient in this case is quite misleading.
6.2. Example 2: Positive Quadrant Dependence with Noise
Consider a random vector (X, Y), where X is a Pareto (2, 10) random variable, $\varepsilon$ is a random noise distributed normal (0, 0.03), Z is another Pareto (2, 10) random variable independent from X, and B is a Bernoulli (0.4) random variable. If we define a random variable Y as
$$Y \;=\; (1-B)\,(X + \varepsilon) \;+\; B\,Z,$$
we have a probabilistic model for (X, Y) where, with probability 0.6, the variables exhibit a strong linear relationship, but with probability 0.4, they behave as independent random variables.
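A sketch of this generative model is shown below. The parameterization of Pareto(2, 10) as shape 2 and scale 10, and the reading of normal(0, 0.03) as a standard deviation of 0.03, are assumptions about notation; the structure of the model follows the description above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n = 2000

# Assumed parameterization: Pareto(2, 10) read as shape b = 2 and scale = 10,
# and normal(0, 0.03) read as mean 0 with standard deviation 0.03.
x = stats.pareto(b=2, scale=10).rvs(n, random_state=rng)
z = stats.pareto(b=2, scale=10).rvs(n, random_state=rng)   # independent of X
eps = rng.normal(loc=0.0, scale=0.03, size=n)
b = rng.random(n) < 0.4                                    # Bernoulli(0.4)

# B = 0 (prob. 0.6): Y = X + eps, a strong linear relationship;
# B = 1 (prob. 0.4): Y = Z, independent of X.
y = np.where(b, z, x + eps)

print("Spearman rho:", stats.spearmanr(x, y)[0])
# Pseudo-observations for reproducing a rank plot of type R5:
u, v = stats.rankdata(x) / n, stats.rankdata(y) / n
```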
Figure 6 shows the corresponding d-plot, where the rank plot is of type R5 (see Table 1 and Figure 3). Note that it can be interpreted as a (convex) combination of R1 (independence) and R2 (PQD, due to the positive linear relationship). In this example, Schweizer–Wolff’s dependence $\sigma_n$ is equal in magnitude and sign to the empirical Spearman’s concordance $\rho_n$, which implies PQD (see property B3 in Section 5.2). The diagonal sections also distinctly confirm PQD, as the empirical copula consistently exceeds the independence copula $\Pi$. Lastly, while discerning the linear relationship in the scatter plot is relatively straightforward, the independence relation is less apparent. Conversely, both relationships are clearly evident in the rank plot.
7. Real Data Examples
In this section we illustrate the previous ideas and proposals with real data available from public sources.
7.1. Example 3: NQD with Outliers
In this example, we used the dataset “CEO vs Worker Pay in Top 3000 US Companies [2023]” (see [47]) to analyze the dependence between the “pay ratio” (X = CEO/worker salary) and the “median worker pay” (Y); see Figure 7. Firstly, it is difficult to analyze the relationship through the scatter plot, mainly due to the presence of outliers, which are clearly shown in the box plots. In this case, the rank plot falls into the R3 category (since the distribution of points is not uniform, we can discard independence). Other components of the d-plot also confirm a negative quadrant dependence. The equal height of the bars for Schweizer–Wolff’s dependence and the absolute value of Spearman’s concordance, where the latter is negative (yellow bar), along with the empirical diagonals that in both cases lie below the independence diagonals, clearly indicate NQD. Numerically, $\sigma_n$ and $|\rho_n|$ essentially coincide, in contrast to Pearson’s correlation, which is far from the concordance value, since it is affected by the presence of outliers.
7.2. Example 4: Gluing PQD and NQD
For the next example, we used the “Cloud” dataset [48] with features about images related to climate. Specifically, we analyzed the dependence between “infrared minimum value (ir-min)” (X) and “contrast” (Y) through the d-plot in Figure 8. The scatter plot appears to show a strictly increasing trend for Y in terms of X plus some random noise. However, the rank plot suggests something slightly different: a trend that is increasing for most of the data but decreasing for larger values of X. This would explain the difference between Spearman’s concordance $\rho_n$ and Schweizer–Wolff’s dependence $\sigma_n$, which suggests the presence of both PQD and NQD. The empirical diagonals also indicate both PQD and NQD, since they cross the diagonals of the independence copula $\Pi$ at about u = 0.8, passing from being above $\Pi$ to lying slightly below it (in this case, it may be necessary to zoom in).
The analysis of the d-plot suggests a gluing of a PQD copula followed by an NQD copula. If we order the bivariate observations in terms of the observed values of X and split them into two subsets, the first 80% as data subset 1 and the rest as subset 2, the corresponding d-plots (see Figure 9 and Figure 10) reveal that subset 1 is PQD and subset 2 is NQD, which confirms that using $u = 0.8$ as the gluing point is a good choice. It is worth noticing that the best gluing point is the one such that, in each subset, the absolute value of Spearman’s concordance equals that of Schweizer–Wolff’s dependence. Specifically, for subset 1 the estimates satisfy $\sigma_n \approx \rho_n$ (PQD), and for subset 2 they satisfy $\sigma_n \approx -\rho_n$ (NQD). Finally, this would be an example of type R7 dependence, with gluing point u = 0.8 of PQD followed by NQD. In terms of the original variables, it is equivalent to splitting the observed data by conditioning on whether X lies below or above its empirical 0.8-quantile, leading to PQD and NQD, respectively. However, note that it is not easy to determine from the scatter plot where the relationship between the variables begins to decrease.
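The search for such a gluing point can be automated by scanning candidate split quantiles of X and keeping the one for which, within each subset, |ρ_n| is closest to σ_n (their gap is zero under exact quadrant dependence, by B1 and B3–B4). The sketch below applies this idea to synthetic stand-in data with a PQD-then-NQD pattern, since the Cloud dataset is not loaded here.

```python
import numpy as np
from scipy import stats

def empirical_rho_sigma(x, y):
    """Empirical Spearman's rho (13) and Schweizer-Wolff's sigma (Section 5.4)."""
    n = len(x)
    rx = stats.rankdata(x).astype(int)
    ry = stats.rankdata(y).astype(int)
    grid = np.zeros((n, n))
    for k in range(n):
        grid[rx[k] - 1:, ry[k] - 1:] += 1.0
    grid /= n
    g = np.arange(1, n + 1) / n
    pi = np.outer(g, g)
    return (12.0 / (n * n - 1.0) * np.sum(grid - pi),
            12.0 / (n * n - 1.0) * np.sum(np.abs(grid - pi)))

def best_gluing_point(x, y, candidates):
    """Split X at candidate empirical quantiles and keep the split for which each
    subset is closest to pure quadrant dependence (|rho_n| ~= sigma_n on both sides)."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best = None
    for theta in candidates:
        m = int(round(theta * len(xs)))
        if m < 20 or len(xs) - m < 20:                  # skip tiny subsets
            continue
        r1, s1 = empirical_rho_sigma(xs[:m], ys[:m])
        r2, s2 = empirical_rho_sigma(xs[m:], ys[m:])
        gap = (s1 - abs(r1)) + (s2 - abs(r2))           # zero under exact PQD/NQD in each part
        if best is None or gap < best[0]:
            best = (gap, theta, (r1, s1), (r2, s2))
    return best

# Synthetic stand-in data with a PQD-then-NQD pattern (not the Cloud dataset).
rng = np.random.default_rng(9)
x = rng.uniform(size=500)
y = np.where(x < 0.8, x, 1.6 - x) + rng.normal(scale=0.05, size=500)
print(best_gluing_point(x, y, candidates=np.arange(0.5, 0.95, 0.05)))
```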
Jointly, or separately in subsets 1 and 2, the marginal density for Y seems to be monotone unimodal, while X appears to be multimodal, especially in Figure 8 and Figure 9, which produces the slight impression in the scatter plot of two clusters divided approximately by a vertical line. Thus, the scenario is similar to the one in the graph in row 2, column 4, of Figure 1. The appearance of clusters is due to the bimodal marginal distribution of X and not to the statistical dependence between the variables.
7.3. Example 5: Apparently Independent Variables
In this example, we used a random subsample of size 1000 from the “Song Features Dataset - Regressing Popularity” (Spotify song features) dataset in [49], comparing tempo (X) and acousticness (Y) through the d-plot in Figure 11. This is an example where, from both the scatter plot and even the rank plot, we may be tempted to assess independence at first glance. For example, the scatter plot appears to be similar to the one in row 5, column 3, of Figure 1. However, note a slight shift to the left of the points in the upper part of the scatter plot compared to the points in the lower part. Also, in the rank plot, there is slightly less density of points in the bottom-left and upper-right corners compared to the density in the upper-left and bottom-right counterparts. Thus, the plots suggest a weak decreasing relationship. This is confirmed by examining the bar chart and diagonal sections. Specifically, not only does Schweizer–Wolff’s dependence measure $\sigma_n$ indicate some dependence (albeit a weak one); in this case it also coincides with $|\rho_n|$, with $\rho_n$ negative, which clearly indicates NQD. Furthermore, the empirical diagonals also suggest NQD, since they lie (slightly) below their independence counterparts. In conclusion, this would be an example of weak dependence of type R3 according to Table 1, where by “weak” we mean not too far from R1 but not as clear as the R3 type in Figure 7.
8. Conclusions and Discussion
This work presents several contributions related to the visual assessment of dependence between two continuous random variables. Firstly, it reviews essential theory regarding copulas, which is necessary for understanding why rank plots should be chosen over traditional scatter plots for assessing dependence. In contrast to scatter plots, rank plots do not include uninformative, and possibly misleading, information about marginal distributions that are unrelated to dependence. The paper also provides guidelines (see Table 1 and Figure 3) for using and interpreting rank plots, identifying nine categories related to broad types of dependencies and combinations of these.
Regarding association measures, we have highlighted the superior reliability of dependence measures, such as Schweizer–Wolff’s (7), and concordance measures like Spearman’s (9) over Pearson’s correlation coefficient. Similar to the comparison between rank plots and scatter plots, the key characteristic of Schweizer–Wolff’s dependence and Spearman’s concordance is that they derive solely from the copula and are therefore unaffected by marginal distributions. Instead, Pearson’s correlation combines information from the copula with marginal characteristics and can therefore be misleading, as we have shown in several examples. We believe this is relevant for the entire scientific community, since Pearson’s correlation is arguably the most popular association measure used in practice and the default option in many software packages, despite previous efforts to communicate its limitations (see [36]). It is also relevant for the visualization community, since Pearson’s correlation has been studied intensively in relation to scatter plots. Moreover, some authors may view scatter plots as tools for communicating Pearson correlation [18,19].
The paper coalesces around the idea of the dependence plot (d-plot), an ensemble of nine graphs that encapsulates both the scatter plot and marginal distributions, together with visualizations that focus solely on aspects of dependence. The former are useful for detecting clusters, outliers, and examining the specific data values, among other tasks. Regarding the latter, rank plots provide a faithful description of the dependence between the variables, while Spearman’s concordance and Schweizer–Wolff’s dependence are appropriate association summaries, since they depend exclusively on the copula. In addition, the visualizations related to diagonal sections of the copula can help users decompose complex dependence patterns into simpler quadrant-dependent scenarios by conditioning on one variable. This idea is related to gluing copulas (i.e., describing the dependence through several copulas). Although we have addressed the gluing of two copulas, extending this concept to multiple copulas and/or conditioning in both variables is straightforward.
The ideas put forward in the paper can be useful for using or developing other visualization techniques in which it may be appropriate to visualize the dependency between two continuous random variables. For example, the theory clearly advocates for replacing data values with ranks (note that it would be straightforward to incorporate ranks in methods like parallel coordinates [50], table lens [51], and many others). Naturally, researchers should also consider replacing or complementing Pearson’s correlation with Spearman’s rank correlation or Schweizer–Wolff’s dependence in their visualizations.
The general approach of this paper is non-parametric, as no model assumptions are made regarding the bivariate data under analysis. Nevertheless, the guidelines provided in
Table 1 for interpreting rank plots are useful in making decisions about fitting specific parametric families of copulas. For instance, if it is evident from the data that negative quadrant dependence (NQD) is present, then only parametric families of copulas accommodating NQD should be considered for goodness-of-fit testing. Similarly, if the rank plot of the data resembles, for example, case R8, the data should first be divided using the gluing copula technique. Subsequently, we may try fitting a parametric family of copulas with NQD to one subset and a parametric family with positive quadrant dependence (PQD) to the other subset.
The current formulation of d-plots is designed for continuous variables due to the reliance on rank-based transformations and copula theory. However, extending this framework to ordinal or mixed-type data is a natural next step; see, for example, [52,53]. In such cases, subcopulas and discrete-specific dependence measures may be employed (e.g., as discussed in [54]). Adapting rank plots to handle tied data and incorporating appropriate estimation techniques remains an open and promising research direction.
Finally, we also envision carrying out perceptual studies of Spearman’s rank correlation and Schweizer–Wolff’s dependence measure on rank plots, given their superiority over Pearson’s correlation and scatter plots for assessing dependence.