A High-Breakdown MCD-Based Robust Concordance Correlation Coefficient

Bulut, Hasan; Zobu, Müjgan; Sağlam, Vedat

doi:10.3390/math14010196


by Hasan Bulut ¹,*, Müjgan Zobu ² and Vedat Sağlam ¹

¹ Department of Statistics, Faculty of Science, Ondokuz Mayıs University, Samsun 55139, Türkiye

² Department of Statistics, Faculty of Science and Letters, Amasya University, Amasya 05100, Türkiye

* Author to whom correspondence should be addressed.

Mathematics 2026, 14(1), 196; https://doi.org/10.3390/math14010196 (registering DOI)

Submission received: 17 November 2025 / Revised: 30 December 2025 / Accepted: 31 December 2025 / Published: 4 January 2026

(This article belongs to the Special Issue Advances in Statistical Approaches with Applications for Multivariate Data Analysis)


Abstract

The concordance correlation coefficient (CCC) is a popular measure of agreement between two continuous variables but is highly sensitive to outliers and data contamination. In this study, we propose a robust reformulation of the CCC by replacing classical moment estimators with Minimum Covariance Determinant (MCD) estimators. The proposed robust CCC preserves the interpretability of the classical coefficient while providing substantially improved robustness. Comprehensive Monte Carlo simulations under normal and non-normal distributions, varying sample sizes, correlation levels, and contamination schemes compare the proposed coefficient with the classical CCC and existing robust alternatives. The results show that the proposed robust CCC achieves superior stability and accuracy in contaminated settings while remaining competitive under clean data. Theoretical properties of the estimator are discussed, and its practical usefulness is demonstrated using real glucose measurement and blood pressure data sets. The proposed method is implemented in the MVTests R package, enabling straightforward application to real-world data.

Keywords: concordance correlation; robust estimation; robust concordance correlation; MCD; MVTests

MSC: 62H12; 62H20; 62G35

1. Introduction

The comparison of quantitative measurements requires assessing the extent to which different methods, devices, or observers can record the same value in an interchangeable manner. For this purpose, the Pearson product–moment correlation coefficient $(ρ)$ is frequently used to measure the strength of the linear relationship between two variables. When data points cluster tightly around a straight line, the correlation $ρ$ approaches 1. However, the Pearson correlation $ρ$ does not account for systematic differences (bias) or scale discrepancies. For example, suppose two devices measuring TSH values are highly linearly related, such as $y = 2x + 10$. Even if $ρ$ is close to 1, when device X records a TSH value of 3, device Y may record 16, leading to severe bias. Thus, a high $ρ$ does not guarantee absolute agreement or interchangeability between methods.

To address this limitation, Lin [1] proposed the concordance correlation coefficient (CCC), which simultaneously considers both precision and accuracy. The CCC is defined in Equation (1).

$$ρ_{C} = 1 - \frac{E[(X - Y)^{2}]}{E_{\mathrm{indep}}[(X - Y)^{2}]} = \frac{2 σ_{XY}}{σ_{X}^{2} + σ_{Y}^{2} + (μ_{X} - μ_{Y})^{2}} = \frac{2 ρ σ_{X} σ_{Y}}{σ_{X}^{2} + σ_{Y}^{2} + (μ_{X} - μ_{Y})^{2}} = ρ C_{b} \tag{1}$$

where $C_{b} = \frac{2 σ_{X} σ_{Y}}{σ_{X}^{2} + σ_{Y}^{2} + (μ_{X} - μ_{Y})^{2}}$ and $E_{\mathrm{indep}}$ is the expected value under the assumption that the variables X and Y are independent. The CCC measures the degree of agreement between two raters around the equality line $(y = x)$ in a scatterplot. In Equation (1), the term $(μ_{X} - μ_{Y})^{2}$ represents a measure of bias between the two raters, while $σ_{X}^{2}$ and $σ_{Y}^{2}$ express the scale differences. Therefore, the agreement between the two variables is penalized as the bias and scale differences increase. The Pearson correlation coefficient focuses solely on the strength of the linear association between two variables. By contrast, the concordance correlation coefficient assesses the degree to which two measurements are in absolute agreement and therefore interchangeable [1,2].
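The contrast between $ρ$ and $ρ_{C}$ in the device example $y = 2x + 10$ can be checked numerically. The following is a minimal sketch, not the authors' implementation: the five x-values are hypothetical, and the moments use $1/n$ denominators, which is an assumption made here for simplicity.

```python
import math

def moments(x, y):
    """Means, variances and covariance with 1/n denominators (an assumption)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return mx, my, vx, vy, cxy

def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    _, _, vx, vy, cxy = moments(x, y)
    return cxy / math.sqrt(vx * vy)

def ccc(x, y):
    """Plug-in version of Equation (1): 2*cov / (var_x + var_y + bias^2)."""
    mx, my, vx, vy, cxy = moments(x, y)
    return 2 * cxy / (vx + vy + (mx - my) ** 2)

# Hypothetical readings from device X, and device Y following y = 2x + 10.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2 * v + 10 for v in x]

print(pearson(x, y))  # perfect linear association
print(ccc(x, y))      # 8/179, far below 1 because of the location and scale shifts
```

Here the variances are 2 and 8, the covariance is 4, and the squared mean difference is 169, so the plug-in CCC is $8/179$ even though the Pearson correlation is 1.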

The key distinction between the Pearson correlation and the CCC is illustrated in Figure 1, where the red dashed line represents the equality line $(y = x)$ and the blue line corresponds to the OLS regression line. The Pearson correlation is high when the points lie close to the blue line, whereas the CCC is high when they cluster near the red equality line. In scenario (A), both $ρ$ and the CCC are large since the two methods yield compatible results. In scenario (B), however, method Y consistently records higher values than method X. Here $ρ$ remains high despite the systematic bias, but because interchangeability does not hold, the points deviate from the equality line $(y = x)$ and the CCC is therefore relatively smaller.

Lin [1] further demonstrated that the CCC is asymptotically normally distributed and defined the Z transformation given in Equation (2).

$$\hat{Z} = \frac{1}{2} \ln\left(\frac{1 + \hat{ρ}_{C}}{1 - \hat{ρ}_{C}}\right) \tag{2}$$
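Equation (2) is the inverse hyperbolic tangent of $\hat{ρ}_{C}$, so it can be inverted with `tanh`. The sketch below only illustrates the transformation and its inverse; the function names are hypothetical and no standard error is computed.

```python
import math

def ccc_to_z(rho_c):
    """Z transformation of Equation (2); requires -1 < rho_c < 1."""
    return 0.5 * math.log((1 + rho_c) / (1 - rho_c))

def z_to_ccc(z):
    """Inverse of Equation (2), mapping Z back to the (-1, 1) scale."""
    return math.tanh(z)

z = ccc_to_z(0.85)
print(z, z_to_ccc(z))  # the second value recovers 0.85 up to rounding
```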

Since its introduction, CCC has attracted substantial attention, particularly in the medical sciences [3,4,5,6,7,8]. Applications often involve the evaluation of agreement between diagnostic markers, measurement devices, or raters. In addition, several methodological developments have been made: Lin [2] discussed the use of CCC in assay validation and provided guidelines for sample size and power analysis; King and Chinchilli [9] generalized CCC to both continuous and categorical data; King, et al. [10] and Carrasco, et al. [11] extended CCC to repeated measurements; Williamson, et al. [12] proposed a resampling-based hypothesis testing approach; Feng, et al. [13] proposed jackknife confidence intervals for $ρ_{C}$, and Feng, et al. [14] compared several confidence intervals.

Nevertheless, the vulnerability of the classical CCC to outliers has been widely emphasized in the literature. Because CCC relies on moment-based estimators of means, variances, and covariances, it becomes unreliable under heavy-tailed distributions, skewed structures, or contamination. To mitigate this problem, King and Chinchilli [15] reformulated Equation (1) as in Equation (3):

$$ρ_{g} = \frac{[E_{F_{X} F_{Y}} g(X - Y) - E_{F_{X} F_{Y}} g(X + Y)] - [E_{F_{XY}} g(X - Y) - E_{F_{XY}} g(X + Y)]}{E_{F_{X} F_{Y}} g(X - Y) - E_{F_{X} F_{Y}} g(X + Y) + \frac{1}{2} E_{F_{X}} g(2X) + \frac{1}{2} E_{F_{Y}} g(2Y)} \tag{3}$$

where $E_{F_{X} F_{Y}} g(\cdot)$ is the expected value of the function $g$ under $F_{XY}(\cdot) = F_{X}(\cdot) F_{Y}(\cdot)$, $F_{XY}$ is the cumulative distribution function (cdf) of the bivariate population for pairs $(X_{i}, Y_{i})$, and $F_{X}$ and $F_{Y}$ are the marginal distribution functions of X and Y, respectively.

The function $g(z)$ must satisfy the following conditions:

$g(0) = 0$,
$g(-z) = g(z)$ for all $z$,
$g(z)$ is a nondecreasing function of $z$ for all $z \geq 0$.

King and Chinchilli [15] proposed five candidate functions meeting these conditions:

(a): $g_{1}(z) = z^{2}$ for the squared distance function (classical CCC),
(b): $g_{2}(z) = \begin{cases} z^{2}, & |z| \leq z_{0} \\ z_{0}^{2}, & |z| > z_{0} \end{cases}$ for the Winsorized squared distance function,
(c): $g_{3}(z) = |z|^{δ}$, where $1 \leq δ \leq 2$, for the absolute distance function,
(d): $g_{4}(z) = \begin{cases} |z|^{δ}, & |z| \leq z_{0} \\ |z_{0}|^{δ}, & |z| > z_{0} \end{cases}$ for the Winsorized absolute distance function,
(e): $g_{5}(z) = \begin{cases} z^{2}/2, & |z| \leq z_{0} \\ z_{0}|z| - z_{0}^{2}/2, & |z| > z_{0} \end{cases}$ for Huber’s function.

Here, $z_{0} > 0$ is a user-selected constant. Function (a) corresponds to the classical CCC, whereas functions (b)–(e) define robust CCC variants. King and Chinchilli [15] showed that these robust CCC estimators achieve smaller errors at moderate contamination levels (10–25%).
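The five candidate functions and a plug-in reading of Equation (3) can be sketched as follows. This is an illustration under stated assumptions rather than the estimator of King and Chinchilli [15]: expectations under independence are approximated by averaging over all $n^{2}$ pairings $(x_{i}, y_{j})$, expectations under the joint distribution by averaging over the observed pairs, and the data and the constant $z_{0} = 1.5$ are hypothetical.

```python
def g1(z):
    return z * z                                   # squared distance

def g2(z, z0):
    return z * z if abs(z) <= z0 else z0 * z0      # Winsorized squared distance

def g3(z, delta):
    return abs(z) ** delta                         # absolute distance, 1 <= delta <= 2

def g4(z, z0, delta):
    return abs(z) ** delta if abs(z) <= z0 else abs(z0) ** delta

def g5(z, z0):
    return z * z / 2 if abs(z) <= z0 else z0 * abs(z) - z0 * z0 / 2   # Huber

def rho_g(x, y, g):
    """Plug-in sketch of Equation (3) (assumed empirical averages, see lead-in)."""
    n = len(x)
    ind_minus = sum(g(a - b) for a in x for b in y) / n ** 2    # E_{FxFy} g(X - Y)
    ind_plus = sum(g(a + b) for a in x for b in y) / n ** 2     # E_{FxFy} g(X + Y)
    joint_minus = sum(g(a - b) for a, b in zip(x, y)) / n       # E_{Fxy} g(X - Y)
    joint_plus = sum(g(a + b) for a, b in zip(x, y)) / n        # E_{Fxy} g(X + Y)
    half_2x = 0.5 * sum(g(2 * a) for a in x) / n
    half_2y = 0.5 * sum(g(2 * b) for b in y) / n
    num = (ind_minus - ind_plus) - (joint_minus - joint_plus)
    den = ind_minus - ind_plus + half_2x + half_2y
    return num / den

# Hypothetical paired measurements.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.9, 5.3]

print(rho_g(x, y, g1))                       # squared distance: the classical plug-in CCC
print(rho_g(x, y, lambda z: g5(z, 1.5)))     # Huber variant with a hypothetical z0
```

With $g_{1}$, the numerator reduces to four times the covariance and the denominator to twice the denominator of Equation (1), which is why this choice reproduces the classical coefficient.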

Another important robust extension of the concordance correlation coefficient is the robust Bayesian CCC, proposed by Feng, et al. [16]. In this approach, robustness is achieved by modelling the joint distribution of paired measurements using a multivariate Student’s t-distribution instead of a Gaussian distribution, thereby naturally downweighting extreme observations through heavy-tailed behaviour. Under this framework, the CCC is defined as

$$ρ_{\mathrm{Bayes}} = \frac{2 \sum_{i=1}^{d-1} \sum_{j=i+1}^{d} \frac{ν}{ν - 2} σ_{ij}}{(d - 1) \sum_{i=1}^{d} \frac{ν}{ν - 2} σ_{i}^{2} + \sum_{i=1}^{d-1} \sum_{j=i+1}^{d} (μ_{i} - μ_{j})^{2}} \tag{4}$$

where $ν$ is the degrees of freedom, the $μ_{i}$ are the components of the location vector, the $σ_{i}^{2}$ are the diagonal elements of the scale matrix, and the $σ_{ij}$ are the off-diagonal elements of the scale matrix. As $ν \to \infty$, the multivariate t-distribution converges to the normal distribution, and the Bayesian CCC reduces to the classical CCC. By embedding the CCC within a Bayesian framework, this method enables posterior inference, credible intervals, and the incorporation of covariates or repeated measurements. Simulation studies and real data applications have shown that the robust Bayesian CCC is less sensitive to outliers and deviations from normality than classical moment-based estimators. Nevertheless, the approach relies on parametric distributional assumptions and Markov chain Monte Carlo estimation, which may increase computational cost and limit scalability in large-sample or high-dimensional settings [16].

In addition to M-estimation–based approaches, Vallejos, et al. [17] proposed an alternative robust formulation of the concordance correlation coefficient based on L1-type loss functions, which replaces the squared distance used in the classical CCC with absolute deviations. The resulting L1-based concordance coefficient, given in Equation (5), measures agreement by normalizing the expected absolute difference between paired observations relative to the case of independence.

$$ρ_{L1} = 1 - \frac{E(|X_{1} - X_{2}|)}{E(|X_{1} - X_{2}| \mid σ_{12} = 0)}. \tag{5}$$

By relying on absolute differences, the L1-based CCC reduces the influence of extreme observations and heavy-tailed distributions, offering increased robustness compared to moment-based estimators. This formulation is particularly effective in situations where marginal outliers distort second-order moments. However, since robustness is achieved through a univariate distance measure, the L1-based CCC primarily addresses marginal deviations and does not explicitly account for the joint multivariate structure of the data, which may limit its effectiveness under complex or multivariate contamination patterns [17]. In our study, these robust CCC variants are used as benchmarks for comparison with the proposed estimator.

Finally, some studies have also explored the use of CCC in longitudinal data with missing and contaminated observations, combining robust covariance estimators with imputation strategies [18].

In this study, we introduce a robust alternative to the classical CCC that provides reliable estimates in contaminated data without being distorted by outliers. The methodological details of the proposed estimator are presented in Section 2. Section 3 reports an extensive simulation study evaluating its performance, while Section 4 demonstrates its practical utility using a real dataset. Section 5 describes the R functions developed for implementation, and Section 6 concludes with a discussion of the findings.

2. Proposed Robust Concordance Correlation Coefficients (RCCC)

2.1. Motivation

When data contain outliers, the CCC defined in Equation (1) can yield misleading results, primarily because it relies on classical mean, variance, and covariance estimators that are sensitive to contamination. This problem is illustrated in Figure 2. In Figure 2a, the points cluster tightly around the equality line, yielding a CCC of 0.85. However, after adding just a single outlier, the CCC drops dramatically to 0.47, despite the majority of points still lying close to the equality line. This illustrates the lack of robustness of CCC, and underscores the need for a robust concordance correlation coefficient.

Although King and Chinchilli [15] suggested using alternative functions instead of the squared function to provide resistance against outliers, the performance of their coefficients depends on the choice of function and also varies with the selection of the constant $z_{0}$. Similarly, L1-based concordance measures were proposed to reduce sensitivity to extreme observations by replacing the quadratic loss with an absolute deviation criterion; however, this comes at the cost of reduced efficiency under clean data and may lead to unstable behavior when the underlying distributions deviate from symmetry [17]. Finally, the robust Bayesian concordance approach suggested by Feng, Baumgartner and Svetnik [16] achieves resistance to outliers by introducing heavy-tailed distributions and hierarchical modeling assumptions. While effective in many settings, its performance depends on prior specifications, fixed degrees-of-freedom parameters, and computational complexity, which may limit its practical applicability and reproducibility.

These limitations collectively motivate the development of a new robust concordance correlation coefficient that preserves high efficiency under clean data while providing stable and reliable performance in the presence of contamination, without relying on tuning constants, strong distributional assumptions, or intensive computational procedures.

2.2. Definition

Let $(X_{i}, Y_{i})$ $(i = 1, 2, \dots, n)$ be pairs drawn from an arbitrary bivariate distribution with joint distribution function $F_{XY}(x, y)$, mean vector $μ = [μ_{X}, μ_{Y}]^{T}$ and covariance matrix $Σ = \begin{bmatrix} σ_{X}^{2} & σ_{XY} \\ σ_{YX} & σ_{Y}^{2} \end{bmatrix}$. The population concordance correlation coefficient (CCC) is defined as $ρ_{C} = \frac{2 σ_{XY}}{σ_{X}^{2} + σ_{Y}^{2} + (μ_{X} - μ_{Y})^{2}}$. When classical estimates of these parameters are used in this expression, the CCC may be sensitive to contamination. To obtain a robust CCC while preserving its form, we propose replacing the classical mean and covariance estimates with their Minimum Covariance Determinant (MCD) counterparts.

Intuitively, the robustness of the Minimum Covariance Determinant estimator stems from its focus on the most homogeneous subset of the data rather than the full sample. Instead of being influenced by all observations, including extreme or contaminated points, the MCD identifies the subset with the smallest covariance determinant, which corresponds to the core data cloud. As a result, observations lying far from the main data structure receive little or no influence on the estimated location and scatter, making MCD-based estimators naturally resistant to outliers and contamination.

For a trimming level $h \in \{⌈n/2⌉, \dots, n\}$, consider all index subsets $H$ of size $h$. For each $H$, let us define

the subset mean $\hat{μ}_{H} = \frac{1}{h} \sum_{i \in H} z_{i}$ with $z_{i} = [X_{i}, Y_{i}]^{T}$,
the subset covariance $\hat{Σ}_{H} = \frac{1}{h - 1} \sum_{i \in H} [z_{i} - \hat{μ}_{H}] [z_{i} - \hat{μ}_{H}]^{T}$,

where $[\cdot]^{T}$ denotes the transpose operator.

The MCD estimator selects the subset $H^{*}$ that minimizes $\det(\hat{Σ}_{H})$ over all such $H$. The MCD location and scatter estimators are then $\hat{μ}_{MCD} = \hat{μ}_{H^{*}}$ and $\hat{Σ}_{MCD} = \hat{Σ}_{H^{*}}$ [19].
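For very small samples, the definition above can be followed literally by enumerating every subset of size $h$. The sketch below does exactly that for bivariate data; it is meant only to make the definition concrete, since the number of subsets grows combinatorially and practical software relies on approximate searches such as FAST-MCD. The data, the choice $h = 8$, and the function names are hypothetical.

```python
from itertools import combinations

def mean_cov(points):
    """Subset mean and covariance with the 1/(h - 1) denominator used in the text."""
    h = len(points)
    mx = sum(p[0] for p in points) / h
    my = sum(p[1] for p in points) / h
    sxx = sum((p[0] - mx) ** 2 for p in points) / (h - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (h - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (h - 1)
    return (mx, my), (sxx, sxy, syy)

def mcd_exhaustive(points, h):
    """Return (mean, covariance, subset) of the h-subset with the smallest determinant."""
    best_det, best = None, None
    for subset in combinations(points, h):
        mu, (sxx, sxy, syy) = mean_cov(subset)
        det = sxx * syy - sxy ** 2          # determinant of the 2x2 covariance matrix
        if best_det is None or det < best_det:
            best_det, best = det, (mu, (sxx, sxy, syy), subset)
    return best

# Eight hypothetical points near the line y = x, plus one gross outlier.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1),
          (6.0, 6.2), (7.0, 6.9), (8.0, 8.1), (30.0, 2.0)]

mu, cov, subset = mcd_exhaustive(points, h=8)
print(mu)                         # location of the selected core subset
print((30.0, 2.0) in subset)      # False: the outlier is left out of the core subset
```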

According to this definition, the MCD estimates of the mean vector and covariance matrix of the bivariate distribution from which the pairs $(X_{i}, Y_{i})$ $(i = 1, 2, \dots, n)$ are drawn can then be expressed as in Equation (6):

$$\hat{μ}_{MCD} = \begin{bmatrix} \hat{μ}_{X.MCD} \\ \hat{μ}_{Y.MCD} \end{bmatrix}, \quad \hat{Σ}_{MCD} = \begin{bmatrix} \hat{σ}_{X.MCD}^{2} & \hat{σ}_{XY.MCD} \\ \hat{σ}_{YX.MCD} & \hat{σ}_{Y.MCD}^{2} \end{bmatrix} \tag{6}$$

Let $α = 1 - h/n$ denote the trimming proportion. The breakdown point of the MCD estimates equals $α$ and attains its maximum, near 50%, when $h \approx n/2$, providing high resistance to outliers [20].

As a result, in this study we propose replacing the classical mean, variance, and covariance terms in the concordance correlation coefficient $ρ_{C}$ given in Equation (1) with MCD estimators, leading to the robust concordance correlation coefficient $(ρ_{R})$ defined in Equation (7):

$$ρ_{R} = \frac{2 \hat{σ}_{XY.MCD}}{\hat{σ}_{X.MCD}^{2} + \hat{σ}_{Y.MCD}^{2} + (\hat{μ}_{X.MCD} - \hat{μ}_{Y.MCD})^{2}} \tag{7}$$

On clean data, the robust and classical summaries approximately coincide and $ρ_{R}$ is close to the usual sample $ρ_{C}$; under contamination, $ρ_{R}$ remains stable while retaining the interpretability of $ρ_{C}$. Since $ρ_{R}$ is based on MCD estimators with a high breakdown point, it is resistant to outliers. Thus, the proposed coefficient avoids misleading assessments such as the one illustrated in Figure 2b when outliers are present. The performance of the proposed measure is demonstrated through both simulation studies and a real data application.

2.3. Range and Affine Invariance Properties of $ρ_{R}$

This subsection establishes that the proposed robust concordance correlation coefficient $(ρ_{R})$ is well-defined under mild conditions and is bounded within the closed interval [−1, 1]. We further characterize equality conditions, degenerate cases (such as zero variance), and invariance properties that are relevant for practical applications.

Let us define the denominator $D = \hat{σ}_{X.MCD}^{2} + \hat{σ}_{Y.MCD}^{2} + (\hat{μ}_{X.MCD} - \hat{μ}_{Y.MCD})^{2} \geq 0$ and the proposed coefficient $ρ_{R} = 2 \hat{σ}_{XY.MCD} / D$.

Lemma 1. The robust CCC $(ρ_{R})$ lies in the closed interval [−1, 1]:

$$-1 \leq ρ_{R} \leq 1 \tag{8}$$

Proof of Lemma 1. Let $D > 0$. By the Cauchy–Schwarz (C-S) inequality, we can write $|\hat{σ}_{XY.MCD}| \leq \sqrt{\hat{σ}_{X.MCD}^{2} \times \hat{σ}_{Y.MCD}^{2}}$. Moreover, by the Arithmetic Mean–Geometric Mean (AM-GM) inequality, we know that $\sqrt{ab} \leq (a + b)/2$ for any $a, b \geq 0$. Combining these properties, we obtain

$$|\hat{σ}_{XY.MCD}| \leq \sqrt{\hat{σ}_{X.MCD}^{2} \times \hat{σ}_{Y.MCD}^{2}} \leq \frac{\hat{σ}_{X.MCD}^{2} + \hat{σ}_{Y.MCD}^{2}}{2}.$$

Because $(\hat{μ}_{X.MCD} - \hat{μ}_{Y.MCD})^{2} \geq 0$, we can write

$$|\hat{σ}_{XY.MCD}| \leq \frac{\hat{σ}_{X.MCD}^{2} + \hat{σ}_{Y.MCD}^{2} + (\hat{μ}_{X.MCD} - \hat{μ}_{Y.MCD})^{2}}{2} = \frac{D}{2}.$$

Multiplying this inequality by 2 and dividing by $D$ gives

$$|ρ_{R}| = \frac{2 |\hat{σ}_{XY.MCD}|}{D} \leq 1,$$

and hence $-1 \leq ρ_{R} \leq 1$. □

If $\hat{μ}_{X.MCD} = \hat{μ}_{Y.MCD}$ and $\hat{σ}_{X.MCD}^{2} = \hat{σ}_{Y.MCD}^{2} = |\hat{σ}_{XY.MCD}|$, then $|ρ_{R}| = 1$. In finite samples these conditions are rarely met exactly; hence $|ρ_{R}| < 1$ typically holds.

The denominator $D$ is always nonnegative by construction. If both robust marginal variances vanish, $\hat{σ}_{X.MCD}^{2} = \hat{σ}_{Y.MCD}^{2} = 0$, and the robust means coincide, $\hat{μ}_{X.MCD} = \hat{μ}_{Y.MCD}$, then $D = 0$. This corresponds to a degenerate sample with no variation and perfect equality, for which $ρ_{R}$ is indeterminate $(0/0)$. In practice, this occurs only if X and Y are both constant and equal to each other (after robust trimming). If exactly one variance is zero, then $\hat{σ}_{XY.MCD} = 0$ as well (since the covariance with a constant is zero), whence $\hat{ρ}_{R} = 0$. Thus, outside fully degenerate scenarios $(D = 0)$, $ρ_{R}$ is well-defined and bounded.
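The plug-in form of Equation (7) and the degenerate cases just described can be sketched as a small function. It takes location and scatter summaries as inputs, so it is agnostic about how they were obtained; the function name and the choice of returning NaN when $D = 0$ are assumptions made for illustration, not the behaviour of the MVTests package.

```python
import math

def rccc_from_summaries(mu_x, mu_y, var_x, var_y, cov_xy):
    """Evaluate 2*cov / D with D = var_x + var_y + (mu_x - mu_y)^2."""
    d = var_x + var_y + (mu_x - mu_y) ** 2
    if d == 0.0:
        return math.nan          # fully degenerate case: 0/0 is indeterminate
    return 2.0 * cov_xy / d

print(rccc_from_summaries(3.0, 3.0, 0.0, 0.0, 0.0))   # nan: both constant and equal
print(rccc_from_summaries(1.0, 2.0, 0.0, 3.0, 0.0))   # 0.0: one variable is constant
print(rccc_from_summaries(0.0, 0.0, 2.0, 2.0, 2.0))   # 1.0: equality conditions hold
```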

Because the MCD estimator is affine equivariant, the proposed $ρ_{R}$ is invariant under common affine transformations of the two variables. In particular, translating $X$ and $Y$ by the same constant or multiplying both by the same positive factor leaves $\hat{ρ}_{R}$ unchanged. Opposite sign rescalings flip the sign consistently with correlation. These properties parallel those of the classical $ρ_{C}$.
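The invariance under a common shift and a common positive rescaling follows because every term in the numerator and denominator is multiplied by the same factor. The sketch below checks this numerically with classical $1/n$ moments standing in for the MCD summaries (an assumption made for brevity; the same algebra applies to any affine-equivariant location and scatter pair). The data are hypothetical.

```python
def ccc(x, y):
    """Plug-in concordance coefficient with 1/n moments (stand-in for MCD inputs)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * cxy / (vx + vy + (mx - my) ** 2)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.9, 5.3]

base = ccc(x, y)
# Same shift (+7) and same positive scale (x3) applied to both variables.
common = ccc([3 * a + 7 for a in x], [3 * b + 7 for b in y])
# Shifting only Y introduces bias, so the coefficient changes.
shifted_y = ccc(x, [b + 5 for b in y])

print(base, common, shifted_y)
```

The first two values agree up to rounding, while the third is smaller, which is why the translation must be common to both variables.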

Taken together, these results ensure that $ρ_{R}$ is well-defined outside fully degenerate samples and bounded in [−1, 1]. Moreover, because the MCD location and scatter estimators are affine equivariant, the resulting concordance measure $ρ_{R}$ is affine invariant: under common affine transformations applied to both variables (e.g., joint translations and positive rescalings), its value remains unchanged (with sign changes behaving consistently under sign reversals). In addition, since MCD attains a high breakdown point for location and scatter, $ρ_{R}$ inherits strong stability under substantial contamination through its robust inputs. These theoretical properties underpin the robust performance patterns observed in the simulation study.

3. Simulation Study

In this study, a comprehensive Monte Carlo simulation was conducted to compare the performance of the concordance correlation coefficient (CCC) proposed by Lin [1], the robust CCC estimators based on alternative $g(\cdot)$ functions [15], the Bayes approach [16], L1 estimations [17], and the proposed robust CCC (rCCC) based on Minimum Covariance Determinant (MCD) estimators. The simulation was carried out for three different population correlation coefficients $(ρ = 0.45, 0.6, 0.9)$ and five different sample sizes $(n = 30, 50, 100, 200, 500)$ in each scenario. Assuming that the variables X and Y follow a bivariate distribution, data were generated under clean conditions using five different distributional/design combinations.

Design A: Random observations were drawn from the bivariate normal distribution $\begin{bmatrix} X \\ Y \end{bmatrix} \sim N_{2}\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 & ρ \\ ρ & 1 \end{bmatrix}\right)$. Here, the two variables have identical locations and scales.
Design B: Random observations were drawn from the bivariate normal distribution $\begin{bmatrix} X \\ Y \end{bmatrix} \sim N_{2}\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 & 2ρ \\ 2ρ & 4 \end{bmatrix}\right)$. Here, the two variables have identical locations but different scales.
Design C: Random observations were drawn from the bivariate normal distribution $\begin{bmatrix} X \\ Y \end{bmatrix} \sim N_{2}\left(\begin{bmatrix} 0 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 & ρ \\ ρ & 1 \end{bmatrix}\right)$. Here, the two variables have identical scales but different locations.
Design D: In this design, random observations are initially generated from Design A and subsequently subjected to an exponential transformation, yielding variables with a log-normal distribution.
Design E: Random observations were drawn from a multivariate t distribution with degrees of freedom $df = 5$ and covariance matrix $\begin{bmatrix} 1 & ρ \\ ρ & 1 \end{bmatrix}$.

Designs D and E were chosen to examine the performance of the coefficients under skewed and heavy-tailed distributions.
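The five designs can be sketched with the standard library alone. This is an illustrative generator, not the authors' simulation code. Two assumptions are made and flagged in the comments: the matrix in Design E is treated as the scale matrix of the t distribution, with both locations set to zero, and the function names are hypothetical.

```python
import math
import random

def bivariate_normal(n, mu, sd, rho, rng):
    """Draw n pairs from N2 with means mu, standard deviations sd, correlation rho."""
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = mu[0] + sd[0] * z1
        y = mu[1] + sd[1] * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        out.append((x, y))
    return out

def generate(design, n, rho, rng):
    if design == "A":      # identical locations and scales
        return bivariate_normal(n, (0.0, 0.0), (1.0, 1.0), rho, rng)
    if design == "B":      # variances 1 and 4, covariance 2*rho
        return bivariate_normal(n, (0.0, 0.0), (1.0, 2.0), rho, rng)
    if design == "C":      # locations 0 and 2
        return bivariate_normal(n, (0.0, 2.0), (1.0, 1.0), rho, rng)
    base = bivariate_normal(n, (0.0, 0.0), (1.0, 1.0), rho, rng)
    if design == "D":      # exponential transform of Design A (log-normal)
        return [(math.exp(x), math.exp(y)) for x, y in base]
    if design == "E":      # t with df = 5; matrix treated as scale matrix (assumption)
        df = 5
        out = []
        for x, y in base:
            w = rng.gammavariate(df / 2.0, 2.0)   # chi-square with df degrees of freedom
            s = math.sqrt(df / w)                 # one common mixing factor per pair
            out.append((x * s, y * s))
        return out
    raise ValueError("unknown design: " + str(design))

rng = random.Random(2026)                         # hypothetical seed
sample = generate("B", 30, 0.6, rng)
print(sample[:3])
```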

In each simulation design, the population value of the CCC $(ρ_{C})$ defined by Lin [1] is expressed in terms of the parameter $ρ$, which was used as the measure of association in the simulation, as given in Table 1 (full derivations in Appendix A).
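Since the population CCC depends only on the means, variances, and covariance, its value for each design can be worked out by substituting the design parameters into Equation (1). The derivation below is a sketch obtained from the design definitions alone and is not copied from Table 1; for Design E it assumes a common location for both components, in which case any common variance factor $c > 0$ cancels.

```latex
\begin{align*}
\text{Design A:}\quad & ρ_{C} = \frac{2ρ}{1 + 1 + (0 - 0)^{2}} = ρ, \\
\text{Design B:}\quad & ρ_{C} = \frac{2(2ρ)}{1 + 4 + (0 - 0)^{2}} = \frac{4ρ}{5}, \\
\text{Design C:}\quad & ρ_{C} = \frac{2ρ}{1 + 1 + (0 - 2)^{2}} = \frac{ρ}{3}, \\
\text{Design D:}\quad & E(X) = E(Y) = e^{1/2},\quad
  \operatorname{Var}(X) = \operatorname{Var}(Y) = e(e - 1),\quad
  \operatorname{Cov}(X, Y) = e(e^{ρ} - 1), \\
 & ρ_{C} = \frac{2e(e^{ρ} - 1)}{2e(e - 1)} = \frac{e^{ρ} - 1}{e - 1}, \\
\text{Design E:}\quad & ρ_{C} = \frac{2cρ}{c + c + 0} = ρ .
\end{align*}
```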

For each design, the value of the parameter $ρ_{C}$ was calculated based on the population correlation values $ρ \in \{0.45, 0.6, 0.9\}$, and this value was taken as the ‘true’ concordance correlation coefficient. Subsequently, from the corresponding populations, 3000 random samples of sizes $n \in \{30, 50, 100, 200, 500\}$ were drawn for each scenario and combination, and eight different estimators $(\hat{ρ}_{C}, \hat{ρ}_{R}, \hat{ρ}_{g_{2}}, \hat{ρ}_{g_{3}}, \hat{ρ}_{g_{4}}, \hat{ρ}_{g_{5}}, \hat{ρ}_{L1}, \hat{ρ}_{\mathrm{Bayes}})$ were computed. The performance of these estimators was evaluated using the root mean square error (RMSE) values, calculated as in Equation (9).

$$\mathrm{RMSE} = \sqrt{\frac{1}{3000} \sum_{i=1}^{3000} (ρ_{C} - \hat{ρ}_{i})^{2}} \tag{9}$$

Here, $\hat{ρ}_{i}$ stands for the value, in the $i$-th sample, of one of the estimators $(\hat{ρ}_{C}, \hat{ρ}_{R}, \hat{ρ}_{g_{2}}, \hat{ρ}_{g_{3}}, \hat{ρ}_{g_{4}}, \hat{ρ}_{g_{5}}, \hat{ρ}_{L1}, \hat{ρ}_{\mathrm{Bayes}})$, and the RMSE of the selected estimator is calculated according to Equation (9).
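The evaluation loop behind Equation (9) can be sketched as follows for a single estimator. The example uses the classical plug-in CCC under Design A, where Equation (1) gives a true value of $ρ_{C} = ρ$ because the means coincide and both variances are 1. The replication count is reduced from 3000 to keep the sketch fast, and the function names and seed are hypothetical.

```python
import math
import random

def rmse(estimates, true_value):
    """Root mean square error of a list of estimates around the true value."""
    return math.sqrt(sum((true_value - e) ** 2 for e in estimates) / len(estimates))

def ccc(x, y):
    """Classical plug-in CCC with 1/n moments (an assumption of this sketch)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * cxy / (vx + vy + (mx - my) ** 2)

def simulate_design_a(n, rho, reps, seed):
    """Monte Carlo RMSE of the classical CCC under Design A."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        x, y = [], []
        for _ in range(n):
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            x.append(z1)
            y.append(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        estimates.append(ccc(x, y))
    return rmse(estimates, rho)        # Design A: the true CCC equals rho

print(simulate_design_a(n=50, rho=0.9, reps=200, seed=1))
```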

3.1. Simulation Study for Clean Data

In this subsection, data were randomly generated for the five designs described previously, and the RMSE values were computed for each case as given in Equation (9). For visualization purposes, Figure 3, Figure 4, Figure 5, Figure 6 and Figure 7 report the corresponding $\log(\mathrm{RMSE})$ values to facilitate a clearer comparison across estimators with different error scales, whereas the exact RMSE values are provided in the Supplementary File Tables S1–S5.

According to Figure 3, Figure 4, Figure 5, Figure 6 and Figure 7, several consistent patterns emerge across all designs under clean data conditions. First, for all estimators, the error levels decrease monotonically as the sample size increases from 30 to 500, reflecting the expected consistency behaviour. This pattern is observed uniformly across all correlation levels and designs.

Second, stronger population correlations are associated with smaller error values. In all designs, the estimators exhibit noticeably lower $\log(\mathrm{RMSE})$ when the true correlation is high $(ρ = 0.90)$ compared to weaker association levels $(ρ = 0.45)$, indicating improved agreement estimation as the underlying concordance increases.

Third, across all five designs, the proposed rCCC performs almost indistinguishably from the classical CCC in the absence of contamination. Their curves largely overlap in the graphical results, demonstrating that the proposed robust estimator does not suffer from efficiency loss under clean conditions. This behaviour holds not only under the standard bivariate normal setting, but also for skewed and heavy-tailed yet uncontaminated designs.

Fourth, the alternative robust concordance measures, including the g2–g5 variants of King and Chinchilli [15] as well as the Bayes approach [16] and the L1 estimations [17], generally yield larger error levels in clean data. While some of these methods occasionally approach the performance of CCC and rCCC at large sample sizes or high correlations, they do not consistently outperform them and often exhibit reduced efficiency, particularly at lower correlations.

Finally, departures from normality alone—such as skewness or heavy tails without explicit outliers—do not alter the overall conclusions. In all clean designs, the classical CCC and the proposed rCCC remain the most efficient estimators, whereas the alternative robust methods provide no clear advantage in the absence of contamination. Overall, the clean-data simulations confirm that rCCC preserves the efficiency of CCC while retaining its robustness-oriented formulation.

3.2. Simulation Study for Unidirectional Contaminated Data

In this subsection, we investigated the performance of all concordance estimators under unidirectional contamination, where outliers affect only one of the variables. During the contamination process, random data were first generated cleanly according to the five different designs described in Section 3.1. Then, the X-values of the last $h = ⌊n \times ε⌋$ observations in the dataset were multiplied by 10 to introduce contamination, where $n$ denotes the sample size and $ε$ the contamination rate ($ε = 0.1, 0.25$). The resulting RMSE values are reported in the Supplementary File Tables S6–S10, while graphical summaries are presented in Figure 8, Figure 9, Figure 10, Figure 11 and Figure 12.
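The unidirectional scheme described above amounts to a few lines of code. The sketch below follows the description literally: the X-values of the last $⌊n × ε⌋$ pairs are multiplied by 10 and the Y-values are left unchanged. The function name and the small tolerance added before taking the floor (to guard against floating-point products falling just below an integer) are assumptions of this illustration.

```python
import math

def contaminate_x(points, eps, factor=10.0):
    """Multiply the X-values of the last floor(n * eps) pairs by `factor`."""
    n = len(points)
    k = math.floor(n * eps + 1e-9)     # tolerance guards against rounding error
    cut = n - k
    clean_part = list(points[:cut])
    contaminated_part = [(factor * x, y) for x, y in points[cut:]]
    return clean_part + contaminated_part

# Hypothetical sample of 30 pairs lying exactly on the equality line.
pts = [(float(i), float(i)) for i in range(1, 31)]
out = contaminate_x(pts, eps=0.1)
print(out[26:])    # the last three pairs have X multiplied by 10
```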

According to these results, the classical CCC is highly sensitive to unidirectional contamination. As illustrated in Figure 8, Figure 9, Figure 10, Figure 11 and Figure 12, its $\log(\mathrm{RMSE})$ values increase sharply once contamination is introduced. This deterioration is visible across all designs and becomes more pronounced as the contamination rate increases from 10% to 25%. Notably, increasing the sample size does not mitigate this effect, indicating that the classical CCC remains vulnerable even in large samples.

In contrast, the proposed MCD-based rCCC shows strong robustness. In Figure 8, Figure 9, Figure 10, Figure 11 and Figure 12, the rCCC curves remain relatively flat across different sample sizes and contamination levels. This pattern is consistent for all population concordance levels and across different distributional designs. The graphical results clearly demonstrate that the rCCC is largely unaffected by one-sided contamination.

The robust CCC variants proposed by King and Chinchilli (g2–g5) exhibit mixed behaviour. In the figures, some variants show partial improvement over the classical CCC, particularly at lower contamination levels. However, their $\log(\mathrm{RMSE})$ curves display noticeable variability across designs and correlation levels. In several scenarios, these estimators show increasing error as contamination intensifies, and none achieves the consistent stability observed for the rCCC.

The L1-based concordance coefficient provides improved robustness relative to the classical CCC. As seen in Figure 8, Figure 9, Figure 10, Figure 11 and Figure 12, its error curves lie below those of the classical CCC in most scenarios. This improvement reflects the reduced sensitivity of absolute deviations to extreme observations. Nevertheless, the L1-based estimator still exhibits visible performance degradation as contamination increases, especially at higher correlation levels, suggesting limited protection against distortions in the joint dependence structure.

Similarly, the robust Bayesian CCC improves resistance to outliers through its heavy-tailed modelling framework. In the graphical results, it generally outperforms the classical CCC and shows smoother behaviour across sample sizes. However, its $\log(\mathrm{RMSE})$ curves remain above those of the rCCC in most settings, and its performance becomes less stable under stronger contamination.

As expected, increasing the sample size leads to lower error levels for all estimators. This trend is clearly visible in Figure 8, Figure 9, Figure 10, Figure 11 and Figure 12. However, the relative ranking of the methods remains unchanged. Estimators that are sensitive to unidirectional contamination continue to exhibit higher error levels even for large samples.

Overall, the graphical evidence confirms that while several robust approaches—including L1-based and Bayesian CCCs—provide meaningful improvements over the classical CCC, the proposed MCD-based rCCC consistently delivers the most stable and reliable performance across all designs, contamination levels, and sample sizes.

3.3. Simulation Study for Bidirectional Contaminated Data

In order to investigate the behaviour of the competing estimators under more challenging contamination mechanisms, we consider a bidirectional contamination scheme, where entire observations are simultaneously corrupted in both variables. Let

$$Z_{i} = (X_{i}, Y_{i})^{\top}, \quad i = 1, \dots, n \tag{10}$$

denote a bivariate sample generated from the clean data-generating mechanism described in Section 3.1. For a given contamination rate $ε \in \{0.10, 0.25\}$, we randomly select $m = ⌈εn⌉$ observations and replace them with outlying values. The contaminated observations are generated independently from a bivariate normal distribution with shifted location and inflated dispersion,

$$Z_{i}^{(out)} \sim N_{2}\left(\begin{pmatrix} 8 \\ -8 \end{pmatrix}, \begin{pmatrix} 9 & 0 \\ 0 & 9 \end{pmatrix}\right) \tag{11}$$

which induces extreme values in both components. This construction produces rowwise contamination and naturally introduces leverage-type outliers, known to severely affect classical covariance- and concordance-based estimators.

For non-Gaussian designs, the same transformation applied to the clean data is also applied to the contaminated observations. In particular, for the log-normal design, the contaminated values are transformed as

$$Z_{i}^{(out)} \leftarrow \exp\left(\frac{Z_{i}^{(out)}}{2}\right) \tag{12}$$

ensuring consistency with the underlying data-generating process.
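The contamination step in Equations (10)–(12) can be sketched in base R as follows. This is a minimal illustration, not the authors' simulation code: the function name `contaminate_bidirectional` and its arguments are hypothetical, and the clean sample is drawn from a Design A type bivariate normal purely for demonstration.

```r
# Sketch of the bidirectional (rowwise) contamination scheme, base R only.
# Function and argument names are illustrative, not taken from the paper's code.
contaminate_bidirectional <- function(z, eps, lognormal = FALSE) {
  n <- nrow(z)
  m <- ceiling(eps * n)                  # m = ceiling(eps * n) rows to replace
  idx <- sample.int(n, m)                # rows chosen at random
  # Outliers from N2((8, -8)', diag(9, 9)): independent components with sd = 3
  out <- cbind(rnorm(m, mean = 8, sd = 3),
               rnorm(m, mean = -8, sd = 3))
  if (lognormal) out <- exp(out / 2)     # Equation (12) for the log-normal design
  z[idx, ] <- out                        # replace whole rows (both variables)
  list(data = z, outlier_rows = idx)
}

set.seed(1)
n <- 100
rho <- 0.9
u <- rnorm(n)
v <- rho * u + sqrt(1 - rho^2) * rnorm(n)   # clean bivariate normal sample
res <- contaminate_bidirectional(cbind(u, v), eps = 0.25)
length(res$outlier_rows)                    # number of replaced rows
```

Because whole rows are replaced, every outlier is discordant in both coordinates at once, which is what distinguishes this scheme from the unidirectional case.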

This bidirectional contamination scheme represents a more severe and realistic departure from the ideal model than unidirectional contamination, as entire observations rather than individual variables are affected. Monte Carlo replications are then conducted for each combination of design, sample size, contamination level, and population concordance. The resulting RMSE values are reported in Supplementary File Tables S11–S15, while graphical summaries are presented in Figures 13–17.

Figures 13–17 present the $\log(\mathrm{RMSE})$ values of all estimators under bidirectional contamination, where both variables are simultaneously affected by outliers. The results are shown for two contamination levels ($ε = 10\%$ and $ε = 25\%$) and three population correlation values $(0.45, 0.6, 0.9)$, across the five designs.

Several clear patterns emerge from these figures. First, in contrast to the clean data case, the classical CCC exhibits a pronounced deterioration under bidirectional contamination. Its error levels remain high and show little improvement with increasing sample size, particularly at moderate contamination levels, highlighting its sensitivity to leverage-type outliers.

Second, the proposed rCCC consistently demonstrates the strongest robustness across all designs, contamination levels, and correlation strengths. In both the 10% and 25% contamination settings, rCCC achieves substantially lower error values than all competing methods, and its $\log(\mathrm{RMSE})$ decreases steadily as the sample size increases. This behavior is especially pronounced at higher correlations ($ρ = 0.9$), where rCCC clearly separates from the other estimators.

Third, the alternative robust concordance measures based on the g2–g5 functions show mixed performance. While these methods offer some protection against contamination compared to the classical CCC, their error levels remain noticeably higher than those of rCCC, particularly under heavier contamination. Among them, g4 and g5 tend to be more stable than g2 and g3, yet none consistently match the robustness-efficiency balance achieved by rCCC.

Fourth, the L1-based and Bayesian estimators exhibit improved robustness relative to the classical CCC, especially under moderate contamination. However, their performance remains inferior to rCCC across most scenarios. In particular, their error curves often flatten as the sample size increases, indicating limited gains from additional observations under severe contamination.

Finally, increasing the contamination level from 10% to 25% amplifies the differences between the estimators. While the performance of CCC, g-based, L1, and Bayesian methods deteriorates substantially, the proposed rCCC maintains stable and comparatively low error levels. This robustness persists across all five designs, including skewed and heavy-tailed distributions, confirming that the proposed method effectively handles bidirectional and leverage-type contamination.

Overall, the bidirectional contamination results reinforce the advantages of the proposed rCCC. Unlike the clean data case—where rCCC matches the efficiency of CCC—under contaminated settings rCCC clearly outperforms all competing estimators, providing reliable agreement assessment even in the presence of severe and symmetric outliers.

To enhance transparency and reproducibility, we have made a representative sample of the simulation framework publicly available. Specifically, a fully executable example corresponding to Design A, including clean, unidirectional, and bidirectional contamination scenarios, is provided in a public GitHub repository (version 1) [21]. This example illustrates the data generation mechanism, contamination schemes, and performance evaluation based on RMSE with respect to the true population correlation, consistent with the simulation methodology described in this section. The complete large-scale simulation study reported in the paper involves additional designs and extensive repetitions, and is therefore not fully included.
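As a rough illustration of the RMSE-based evaluation described above, the following base R sketch estimates the RMSE of the classical CCC for a Design A type sample, with and without bidirectional contamination. It is not the code from the cited repository: the helper names, the number of replications (`reps = 500`), and the chosen `n` and `rho` are assumptions made for the example. For Design A the population concordance equals $ρ$, so the error is measured against `rho`.

```r
# Classical (moment-based) CCC; uses n - 1 denominators via var() and cov()
ccc_classical <- function(x, y) {
  2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2)
}

# Monte Carlo RMSE for a given estimator under Design A (hypothetical helper)
rmse_mc <- function(estimator, n, rho, eps, reps = 500) {
  est <- replicate(reps, {
    u <- rnorm(n)
    v <- rho * u + sqrt(1 - rho^2) * rnorm(n)
    m <- ceiling(eps * n)
    if (m > 0) {                          # bidirectional contamination
      idx <- sample.int(n, m)
      u[idx] <- rnorm(m, mean = 8, sd = 3)
      v[idx] <- rnorm(m, mean = -8, sd = 3)
    }
    estimator(u, v)
  })
  sqrt(mean((est - rho)^2))               # Design A: population CCC equals rho
}

set.seed(2026)
rmse_clean <- rmse_mc(ccc_classical, n = 50, rho = 0.9, eps = 0)
rmse_cont  <- rmse_mc(ccc_classical, n = 50, rho = 0.9, eps = 0.10)
c(clean = rmse_clean, contaminated = rmse_cont)
```

Any other estimator with the same `(x, y)` interface can be passed in place of `ccc_classical` to compare methods on identical settings.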

4. Real Example

In order to demonstrate the practical performance of the proposed robust concordance correlation coefficient $ρ_{R}$, we considered two real data applications from different fields: a glucose measurement dataset [22] and a blood pressure dataset [23].

4.1. Glucose Data Example

In this subsection, we use the well-known glucose dataset from the R package MethComp (version 1.30.2) [22]. This dataset contains glucose measurements obtained from serum and plasma samples of the same individuals at different time points. Following the approach of King and Chinchilli [15], we focused on the baseline measurements (time = 0) and compared plasma versus serum readings.

First, we computed the correlation coefficients using the clean dataset, in which the plasma–serum relationship is strongly linear and free of obvious outliers. Subsequently, we introduced an artificial outlier (plasma = 15, serum = 2) to deliberately disrupt the concordance between the two methods. This allowed us to investigate how sensitive each coefficient is to contamination. All results are summarized in the Glucose Data section of Table 2.

As shown in Table 2, the glucose data example clearly demonstrates the sensitivity of different concordance measures to contamination. While all coefficients indicate strong agreement between plasma and serum measurements under clean conditions, the introduction of a single influential outlier leads to substantial deterioration in several methods. Pearson’s correlation and the classical concordance correlation coefficient collapse to negative values, indicating a complete breakdown of agreement. The robust variants proposed by King and Chinchilli exhibit inconsistent behavior, with some estimators decreasing sharply and others increasing, reflecting instability in the presence of leverage points. The Bayesian CCC and the L1-based CCC show partial robustness: although they remain positive after contamination, both experience noticeable reductions compared to their clean-data values, indicating sensitivity to extreme observations. In contrast, the proposed robust concordance correlation coefficient $ρ_{R}$ remains entirely unchanged under contamination, highlighting its strong resistance to outliers while maintaining efficiency in clean data settings.
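The single-outlier effect can be reproduced qualitatively on synthetic data. The sketch below does not use the MethComp glucose data and does not call the MVTests functions. It plugs MCD location and scatter estimates from `MASS::cov.rob(method = "mcd")` into the CCC formula as a stand-in, which may differ in detail from the authors' implementation. All variable and helper names are illustrative.

```r
library(MASS)  # recommended package shipped with R; provides cov.rob()

# CCC computed from a location vector and a 2 x 2 scatter matrix
ccc_from_moments <- function(center, scatter) {
  unname(2 * scatter[1, 2] /
           (scatter[1, 1] + scatter[2, 2] + (center[1] - center[2])^2))
}

ccc_classical <- function(x, y) {
  z <- cbind(x, y)
  ccc_from_moments(colMeans(z), cov(z))
}

# MCD plug-in version (assumption: a stand-in for the proposed estimator)
ccc_mcd <- function(x, y) {
  fit <- cov.rob(cbind(x, y), method = "mcd")
  ccc_from_moments(fit$center, fit$cov)
}

set.seed(42)
x <- rnorm(50, mean = 5, sd = 1)          # synthetic readings, method 1
y <- x + rnorm(50, mean = 0, sd = 0.2)    # synthetic readings, method 2
x_out <- c(x, 15)                         # append one discordant point,
y_out <- c(y, 2)                          # mirroring (15, 2) in the text

results <- c(
  classical_clean        = ccc_classical(x, y),
  classical_contaminated = ccc_classical(x_out, y_out),
  mcd_clean              = ccc_mcd(x, y),
  mcd_contaminated       = ccc_mcd(x_out, y_out)
)
round(results, 3)
```

The classical value drops sharply once the discordant point is appended, whereas the MCD plug-in stays close to its clean-data value because the outlying row is excluded from the robust location and scatter estimates.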

4.2. Blood Pressure Data Example

In this subsection, we use the blood pressure dataset (bpres) from the R package cccrm [23], which contains systolic and diastolic blood pressure measurements collected from 384 subjects using two instruments: a manual (mercury sphygmomanometer) device and an automatic device. Each subject was measured twice by each method, yielding replicated paired observations. For comparability with the glucose example, we focus on systolic blood pressure (SIS) and construct one paired value per subject and method by taking the median across replicates, resulting in a clean paired sample (manual vs. automatic) without missing values.
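The reduction from replicated measurements to one paired value per subject can be sketched as follows. The toy data frame and its column names (`id`, `method`, `sbp`) are hypothetical and are not the actual layout of the bpres data. The example only illustrates taking the median across replicates for each subject and method.

```r
# Toy long-format data: 3 subjects, 2 methods, 2 replicates each.
# Column names are hypothetical, chosen for illustration only.
bp_long <- data.frame(
  id     = rep(1:3, each = 4),
  method = rep(c("manual", "manual", "auto", "auto"), times = 3),
  sbp    = c(120, 124, 118, 121,
             135, 131, 137, 140,
             110, 112, 109, 115)
)

# Median across replicates for every subject-method combination;
# the result is a matrix with one row per subject and one column per method
paired <- tapply(bp_long$sbp,
                 list(id = bp_long$id, method = bp_long$method),
                 median)
paired
```

The two columns of `paired` can then be passed as the `x` and `y` vectors to any concordance estimator.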

First, we computed the competing concordance measures using the clean paired SIS data, where the agreement between the two instruments is high and no extreme discordant points are evident. Subsequently, to assess robustness, we introduced a deliberately discordant contamination by appending an artificial leverage-type observation designed to disrupt agreement (i.e., one method taking an unusually high systolic value while the other is set to an unusually low value), thereby creating a severe outlier in the method-comparison setting. All results are summarized in the Blood Pressure Data section of Table 2.

As shown in Table 2, the blood pressure example confirms that several coefficients are highly sensitive to a single extreme discordant point: Pearson’s correlation and the classical CCC can flip sign and become strongly negative, indicating that agreement can be misrepresented under contamination. The robust variants of King and Chinchilli again behave inconsistently across contamination, with some remaining positive while others deteriorate substantially or change sign, reflecting method-dependent instability when leverage-type outliers are present. The Bayesian CCC and the L1-based CCC provide only partial protection: both drop markedly and may also change sign under severe contamination, suggesting that they are not fully resistant to extreme discordant observations. In contrast, the proposed robust concordance correlation coefficient $ρ_{R}$ remains essentially unchanged, reinforcing its strong robustness to leverage-type contamination while preserving high agreement in the uncontaminated setting.

These real examples provide strong empirical evidence that $ρ_{R}$ is more reliable than the classical $ρ_{C}$ and other robust alternatives when outliers are present.

5. Software Availability

We have implemented two functions, ccc() and rccc(), in the R package MVTests [24] to compute the classical concordance correlation coefficient ($ρ_{C}$) introduced by Lin [1] and the proposed robust concordance correlation coefficient ($ρ_{R}$). The ccc() function requires two arguments, x and y, which are numeric vectors containing the observed values of the two variables to be compared. The rccc() function extends this by including an additional argument, alpha, which controls the robustness of the Minimum Covariance Determinant (MCD) estimator. Specifically, alpha is a trimming parameter that can take values between 0.5 and 1 (default: alpha = 0.75), reflecting the proportion of uncontaminated data assumed in the analysis. A lower alpha increases robustness against outliers.

With these functions, researchers can easily apply both the classical and robust concordance correlation coefficients to real datasets. The robust version (rccc) is particularly useful when the data are contaminated by outliers, as it yields more reliable estimates of agreement. The MVTests package (version 2.3.1) can be installed directly from GitHub using the following code:

install.packages("devtools")

devtools::install_github("hsnbulut/MVTests")
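Once the package is installed, a call might look like the sketch below. The argument names x, y, and alpha are those described in this section, while the simulated data and the availability check are illustrative additions. The structure of the returned objects is not described here, so the example simply prints whatever is returned.

```r
set.seed(1)
x <- rnorm(40, mean = 100, sd = 10)   # simulated readings, method 1
y <- x + rnorm(40, sd = 3)            # simulated readings, method 2

# Only call the functions if an MVTests version exporting them is installed
has_rccc <- requireNamespace("MVTests", quietly = TRUE) &&
  all(c("ccc", "rccc") %in% getNamespaceExports("MVTests"))

if (has_rccc) {
  print(MVTests::ccc(x = x, y = y))                  # classical CCC
  print(MVTests::rccc(x = x, y = y, alpha = 0.75))   # robust CCC, default alpha
} else {
  message("An MVTests version providing ccc() and rccc() is not installed.")
}
```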

In addition, all R functions and analysis scripts used to generate the results presented in this paper, including the real-data applications, are publicly available in a dedicated GitHub repository at: https://github.com/hsnbulut/rccc (accessed on 30 December 2025).

6. Conclusions

This paper introduced the robust concordance correlation coefficient (rCCC), which replaces the classical mean, variance, and covariance estimators in the CCC formula with robust alternatives derived from the Minimum Covariance Determinant (MCD). The proposed approach directly addresses the well-documented sensitivity of the classical CCC to outliers and skewed distributions.

The findings from an extensive Monte Carlo simulation study across five different designs provide compelling evidence of the advantages of rCCC. In clean data scenarios, rCCC behaves almost identically to the classical CCC, showing that the introduction of robustness does not cause efficiency loss. In contaminated scenarios with 10% and 25% outliers, rCCC consistently achieved the lowest RMSE values among all competitors, including the classical CCC and the King–Chinchilli robust variants. This stability was observed across varying sample sizes, correlation levels, and distributional forms, confirming the general applicability of the proposed method.

The observed variability in the performance of the King–Chinchilli variants across different simulation settings can be attributed to their reliance on moment-based components and weighting schemes. In particular, these approaches combine measures of precision and accuracy through variance- and covariance-driven adjustments, which may become unstable under contamination or distributional asymmetry. As a result, small changes in variance structure or the presence of outliers can disproportionately influence the estimated concordance, leading to performance that is highly scenario-dependent.

The real data example using plasma and serum glucose measurements offered further support. When a single artificial outlier was introduced, both Pearson’s correlation and the classical CCC collapsed, even yielding negative estimates of agreement. In contrast, rCCC remained virtually unchanged, providing a realistic and reliable measure of concordance.

Beyond its theoretical and empirical contributions, the practical relevance of rCCC has been enhanced by its implementation in the R package MVTests. Researchers can now easily compute both classical and robust CCCs using simple functions, with the trimming parameter α offering additional control over robustness.

Taken together, this study demonstrates that rCCC is both a robust and efficient estimator. It provides reliable results under clean data and substantial protection against outliers under contaminated data. Therefore, rCCC should be considered a strong alternative to the classical CCC in biomedical, clinical, and other applied research settings where measurement agreement is of critical importance.

Potential directions for future research include extending the proposed robust concordance correlation coefficient to repeated measures and longitudinal settings, where within-subject dependence must be explicitly accounted for. Another promising avenue is the development of multivariate or high-dimensional versions of the robust CCC, enabling agreement assessment among more than two measurement methods. Such extensions would further broaden the applicability of the proposed approach in complex real-world data structures.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/math14010196/s1, Table S1: RMSE values for Design-A in clean data; Table S2: RMSE values for Design-B in clean data; Table S3: RMSE values for Design-C in clean data; Table S4: RMSE values for Design-D in clean data; Table S5: RMSE values for Design-E in clean data; Table S6: RMSE values for Design-A in unidirectional contaminated data; Table S7: RMSE values for Design-B in unidirectional contaminated data; Table S8: RMSE values for Design-C in unidirectional contaminated data; Table S9: RMSE values for Design-D in unidirectional contaminated data; Table S10: RMSE values for Design-E in unidirectional contaminated data; Table S11: RMSE values for Design-A in bidirectional contaminated data; Table S12: RMSE values for Design-B in bidirectional contaminated data; Table S13: RMSE values for Design-C in bidirectional contaminated data; Table S14: RMSE values for Design-D in bidirectional contaminated data; Table S15: RMSE values for Design-E in bidirectional contaminated data.

Author Contributions

Conceptualization, H.B.; methodology, H.B. and V.S.; software, H.B.; validation, H.B.; formal analysis, H.B. and V.S.; investigation, H.B. and M.Z.; resources, H.B. and M.Z.; data curation, H.B. and M.Z.; writing—original draft preparation, H.B. and M.Z.; writing—review and editing, H.B., V.S. and M.Z.; visualization, H.B.; supervision, H.B.; project administration, H.B.; funding acquisition, M.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the Supplementary Materials. Further inquiries can be directed to the corresponding author.

Acknowledgments

During the preparation of this study, the authors used ChatGPT (version 5.1) for the purposes of proofreading. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

CCC	Concordance Correlation Coefficient
rCCC	Robust Concordance Correlation Coefficient
MCD	Minimum Covariance Determinant

Appendix A. Derivations of $ρ_{C}$ for Designs A–E

Design A: Since $\begin{bmatrix} X \\ Y \end{bmatrix} \sim N_{2}\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 & ρ \\ ρ & 1 \end{bmatrix}\right)$, it follows that

$$ρ_{C} = \frac{2 σ_{XY}}{σ_{X}^{2} + σ_{Y}^{2} + (μ_{X} - μ_{Y})^{2}} = \frac{2 ρ}{1 + 1 + (0 - 0)^{2}} = ρ.$$

Design B: Since $\begin{bmatrix} X \\ Y \end{bmatrix} \sim N_{2}\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 & 2ρ \\ 2ρ & 4 \end{bmatrix}\right)$, it follows that

$$ρ_{C} = \frac{2 σ_{XY}}{σ_{X}^{2} + σ_{Y}^{2} + (μ_{X} - μ_{Y})^{2}} = \frac{2 (2 ρ)}{1 + 4 + (0 - 0)^{2}} = 0.8 ρ.$$

Design C: Since $\begin{bmatrix} X \\ Y \end{bmatrix} \sim N_{2}\left(\begin{bmatrix} 0 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 & ρ \\ ρ & 1 \end{bmatrix}\right)$, it follows that

$$ρ_{C} = \frac{2 σ_{XY}}{σ_{X}^{2} + σ_{Y}^{2} + (μ_{X} - μ_{Y})^{2}} = \frac{2 ρ}{1 + 1 + (0 - 2)^{2}} = ρ / 3.$$
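The three results above can be checked numerically by plugging the population moments into the CCC formula. The helper name `pop_ccc` and the value $ρ = 0.6$ are chosen only for illustration.

```r
# Population CCC from means, variances, and covariance
pop_ccc <- function(mu_x, mu_y, var_x, var_y, cov_xy) {
  2 * cov_xy / (var_x + var_y + (mu_x - mu_y)^2)
}

rho <- 0.6
design_a <- pop_ccc(0, 0, 1, 1, rho)       # expected: rho
design_b <- pop_ccc(0, 0, 1, 4, 2 * rho)   # expected: 0.8 * rho
design_c <- pop_ccc(0, 2, 1, 1, rho)       # expected: rho / 3

all.equal(c(design_a, design_b, design_c),
          c(rho, 0.8 * rho, rho / 3))
```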

Design D: First, let the variables $U$ and $V$ be generated from a normal distribution as in Design A. Accordingly, the random variables $U$ and $V$ are drawn from the bivariate normal distribution $\begin{bmatrix} U \\ V \end{bmatrix} \sim N_{2}\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 & ρ \\ ρ & 1 \end{bmatrix}\right)$. Then the transformations $X = \exp(U)$ and $Y = \exp(V)$ are applied. Since $U$ and $V$ follow the standard normal distribution, the transformed variables $X$ and $Y$ follow the log-normal distribution. Accordingly, letting $μ_{U} = 0$ and $σ_{U}^{2} = 1$, the expectation and variance of the random variable $X$ can be written as:

$$E(X) = \exp(μ_{U} + σ_{U}^{2} / 2) = \exp(0 + 1 / 2) = \sqrt{e}$$

$$V(X) = \exp(2 μ_{U} + σ_{U}^{2}) (\exp(σ_{U}^{2}) - 1) = \exp(2 \times 0 + 1) (\exp(1) - 1) = e (e - 1) = e^{2} - e$$

Similarly, letting $μ_{V} = 0$ and $σ_{V}^{2} = 1$, the expectation and variance of the random variable $Y$ can be written as

$$E(Y) = \exp(μ_{V} + σ_{V}^{2} / 2) = \exp(0 + 1 / 2) = \sqrt{e}$$

$$V(Y) = \exp(2 μ_{V} + σ_{V}^{2}) (\exp(σ_{V}^{2}) - 1) = \exp(2 \times 0 + 1) (\exp(1) - 1) = e (e - 1) = e^{2} - e$$

To calculate the covariance between $X$ and $Y$, we first need to compute the expected value $E(XY)$:

$$E(XY) = E(e^{U} e^{V}) = E(e^{U + V})$$

Since $U$ and $V$ follow the standard normal distribution with covariance $ρ$, the sum $U + V$ is also normally distributed. The expected value and variance of $U + V$ are computed as follows:

$$E(U + V) = E(U) + E(V) = 0 + 0 = 0$$

$$V(U + V) = V(U) + V(V) + 2 \mathrm{Cov}(U, V) = 1 + 1 + 2 ρ = 2 + 2 ρ$$

In summary,

$$(U + V) \sim N(0, 2 + 2 ρ)$$

By definition, the variable $e^{U + V}$ follows a log-normal distribution, and its expectation gives $E(e^{U + V})$:

$$E(XY) = E(e^{U + V}) = \exp(μ_{(U + V)} + σ_{(U + V)}^{2} / 2) = \exp(0 + (2 + 2 ρ) / 2) = \exp(1 + ρ)$$

Thus, the covariance between $X$ and $Y$ can be calculated as:

$$\mathrm{Cov}(X, Y) = E(XY) - E(X) E(Y) = \exp(1 + ρ) - \sqrt{e} \times \sqrt{e} = \exp(1 + ρ) - e = e (e^{ρ} - 1)$$

Finally, for Design D, the population value of the concordance correlation coefficient $ρ_{C}$ can be obtained in terms of $ρ$ as below:

$$ρ_{C} = \frac{2 σ_{XY}}{σ_{X}^{2} + σ_{Y}^{2} + (μ_{X} - μ_{Y})^{2}} = \frac{2 e (e^{ρ} - 1)}{(e^{2} - e) + (e^{2} - e) + (\sqrt{e} - \sqrt{e})^{2}} = \frac{2 e (e^{ρ} - 1)}{2 e (e - 1)} = \frac{e^{ρ} - 1}{e - 1}.$$
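A quick Monte Carlo check of the Design D result is sketched below, assuming the transformation $X = \exp(U)$, $Y = \exp(V)$ stated in this appendix. The sample size, seed, and $ρ = 0.6$ are arbitrary choices. Because log-normal moments converge slowly, only rough agreement should be expected.

```r
set.seed(123)
n   <- 1e6
rho <- 0.6

# Correlated standard normals with correlation rho
u <- rnorm(n)
v <- rho * u + sqrt(1 - rho^2) * rnorm(n)

# Log-normal transformation used in Design D
x <- exp(u)
y <- exp(v)

# Moment-based sample CCC versus the closed form derived above
sample_ccc  <- 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2)
closed_form <- (exp(rho) - 1) / (exp(1) - 1)

c(sample = sample_ccc, closed_form = closed_form)
```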

Design E: An important property of the multivariate t-distribution with zero location vector is that, when $df > 1$, the mean (expected value) is zero. Since the degrees of freedom are $df = 5$, this condition is satisfied. Accordingly, we can write $E(X) = E(Y) = 0$.

The variance matrix of the multivariate t-distribution is obtained by multiplying the scale matrix by $df / (df - 2)$, provided that $df > 2$. Therefore,

$$\text{Variance matrix} = \frac{5}{5 - 2} \begin{bmatrix} 1 & ρ \\ ρ & 1 \end{bmatrix} = \begin{bmatrix} 5 / 3 & 5 ρ / 3 \\ 5 ρ / 3 & 5 / 3 \end{bmatrix}$$

According to the variance matrix of the multivariate t-distribution, we can write

$$V(X) = V(Y) = 5 / 3$$

$$\mathrm{Cov}(X, Y) = 5 ρ / 3$$

As a result, for Design E, the population value of the concordance correlation coefficient $ρ_{C}$ can be obtained in terms of $ρ$ as follows:

$$ρ_{C} = \frac{2 σ_{XY}}{σ_{X}^{2} + σ_{Y}^{2} + (μ_{X} - μ_{Y})^{2}} = \frac{2 (5 ρ / 3)}{(5 / 3) + (5 / 3) + (0 - 0)^{2}} = \frac{(10 / 3) ρ}{10 / 3} = ρ.$$
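The Design E moments can also be checked by simulation. The sketch below builds a bivariate t sample with $df = 5$ by dividing correlated normals by a shared $\sqrt{χ^{2}_{df} / df}$ factor, which is a standard construction. The sample size, seed, and $ρ = 0.6$ are arbitrary choices for illustration.

```r
set.seed(123)
n   <- 2e5
rho <- 0.6
df  <- 5

# Correlated standard normals
z1 <- rnorm(n)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)

# A shared chi-square scaling turns the pair into a bivariate t with df = 5
w <- sqrt(rchisq(n, df = df) / df)
x <- z1 / w
y <- z2 / w

sample_var <- var(x)                      # theory: df / (df - 2) = 5 / 3
sample_ccc <- 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2)

c(var_x = sample_var, theory_var = df / (df - 2),
  ccc = sample_ccc, theory_ccc = rho)
```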

References

1. Lin, L.I. A Concordance Correlation-Coefficient to Evaluate Reproducibility. Biometrics 1989, 45, 255–268.
2. Lin, L.I.K. Assay Validation Using the Concordance Correlation-Coefficient. Biometrics 1992, 48, 599–604.
3. Abdelgani, S.; Nadeem, R.; Fallouh, Y.; Leff, P.; Patel, A.; Chughtai, M.; Hemant, S.; Patel, S.; Dharia, A.; Ameneni, G.; et al. Assessment of the correlation and concordance between ELF and VCTE in the AASLD algorithm for MASLD severity assessment. J. Hepatol. 2024, 80, S547–S548.
4. Belenguer-Muncharaz, A.; Bernal-Julián, F.; Hernández-Garcés, H.; Hermosilla-Semikina, I.; Rodriguez, L.T.; Marco, C.V. Correlation and concordance of SaO₂/FiO₂ and paO₂/FiO₂ ratios in patients with COVID-19 pneumonia who received non-invasive ventilation in two intensive care units. Med. Intensiva 2024, 48, 298–300.
5. Belenguer-Muncharaz, A.; Hermosilla-Semikina, I.; Bernal-Julián, F.; Hernández-Garcés, H.; Tormo-Rodriguez, L.; Granero-Gasamans, E. Correlation and concordance of HACOR and IROX scales in patients with COVID-19 pneumonia who received non-invasive ventilation in two intensive care units. Med. Intensiva 2025, 49, 177–180.
6. De Keukeleire, S.J.; Vermassen, T.; Deron, P.; Huvenne, W.; Duprez, F.; Creytens, D.; Van Dorpe, J.; Ferdinande, L.; Rottey, S. Concordance, Correlation, and Clinical Impact of Standardized PD-L1 and TIL Scoring in SCCHN. Cancers 2022, 14, 2431.
7. Meredith, L.R.; Kirkland, A.E.; Green, R.; Tomko, R.L.; Browning, B.D.; Gray, K.M.; Anton, R.F.; Miranda, R., Jr.; Squeglia, L.M. Strong concordance and correlation between direct alcohol biomarkers and alcohol self-report among youth with variable drinking patterns. Drug Alcohol Depend. 2025, 274, 112734.
8. Uras, C.; Tsoulos, N.; Giannoulakis, S.; Kapetsis, G.; Ergoren, M.C.; Olgun, P.; Diker, O.; Karanlik, H.; Bilir, C.; Arslan, C.; et al. Examining the Correlation and Concordance Between Ki-67 Expression and Oncotype DX Recurrence Score: Findings from a Multicenter Study in Turkey. Breast 2025, 80, 103928.
9. King, T.S.; Chinchilli, V.M. A generalized concordance correlation coefficient for continuous and categorical data. Stat. Med. 2001, 20, 2131–2147.
10. King, T.S.; Chinchilli, V.M.; Carrasco, J.L. A repeated measures concordance correlation coefficient. Stat. Med. 2007, 26, 3095–3113.
11. Carrasco, J.L.; King, T.S.; Chinchilli, V.M. The Concordance Correlation Coefficient for Repeated Measures Estimated by Variance Components. J. Biopharm. Stat. 2009, 19, 90–105.
12. Williamson, J.M.; Crawford, S.B.; Lin, H.M. Resampling dependent concordance correlation coefficients. J. Biopharm. Stat. 2007, 17, 685–696.
13. Feng, D.; Baumgartner, R.; Svetnik, V. A short note on jackknifing the concordance correlation coefficient. Stat. Med. 2014, 33, 514–516.
14. Feng, D.; Svetnik, V.; Coimbra, A.; Baumgartner, R. A Comparison of Confidence Interval Methods for the Concordance Correlation Coefficient and Intraclass Correlation Coefficient with Small Number of Raters. J. Biopharm. Stat. 2014, 24, 272–293.
15. King, T.S.; Chinchilli, V.M. Robust estimators of the concordance correlation coefficient. J. Biopharm. Stat. 2001, 11, 83–105.
16. Feng, D.; Baumgartner, R.; Svetnik, V. A robust Bayesian estimate of the concordance correlation coefficient. J. Biopharm. Stat. 2015, 25, 490–507.
17. Vallejos, R.; Osorio, F.; Ferrer, C. A new coefficient to measure agreement between continuous variables. arXiv 2025, arXiv:2507.07913.
18. Tsai, M.Y. A comparison of robust and weighted multiple imputation approaches for estimating concordance correlation coefficients in longitudinal normal data. Commun. Stat.-Simul. Comput. 2025, 1–20.
19. Rousseeuw, P.J. Multivariate estimation with high breakdown point. Math. Stat. Appl. 1985, 8, 283–297.
20. Rousseeuw, P.J.; Driessen, K.V. A fast algorithm for the minimum covariance determinant estimator. Technometrics 1999, 41, 212–223.
21. Bulut, H. Rccc: Robust Concordance Correlation Coefficent. Github. Available online: https://github.com/hsnbulut/rccc/tree/main (accessed on 30 December 2025).
22. MethComp: Analysis of Agreement in Method Comparison Studies; Version 1.30.2; CRAN: Vienna, Austria, 2024.
23. Carrasco, J.L.; Martinez, J.P.; Carrasco, M.J.L. Package ‘cccrm’. 2015. Available online: https://mirrors.nics.utk.edu/cran/web/packages/cccrm/cccrm.pdf (accessed on 30 December 2025).
24. Bulut, H. An R Package for Multivariate Hypothesis Tests: MVTests. Technol. Appl. Sci. 2019, 14, 132–138.

Figure 1. The changes in Pearson correlation ($ρ$) and CCC ($ρ_{C}$) under different scenarios: (A) $ρ$ and $ρ_{C}$ high; (B) $ρ$ high, $ρ_{C}$ low.

Figure 2. The effect of outliers on CCC: (a) Clean data; (b) Contaminated data.

Figure 3. The line plots of RMSE values for Design-A in clean data.

Figure 4. The line plots of RMSE values for Design-B in clean data.

Figure 5. The line plots of RMSE values for Design-C in clean data.

Figure 6. The line plots of RMSE values for Design-D in clean data.

Figure 7. The line plots of RMSE values for Design-E in clean data.

Figure 8. The line plots of $\log(\mathrm{RMSE})$ values for Design-A in unidirectional contaminated data.

Figure 9. The line plots of $\log(\mathrm{RMSE})$ values for Design-B in unidirectional contaminated data.

Figure 10. The line plots of $\log(\mathrm{RMSE})$ values for Design-C in unidirectional contaminated data.

Figure 11. The line plots of $\log(\mathrm{RMSE})$ values for Design-D in unidirectional contaminated data.

Figure 12. The line plots of $\log(\mathrm{RMSE})$ values for Design-E in unidirectional contaminated data.

Figure 13. The line plots of $\log(\mathrm{RMSE})$ values for Design-A in bidirectional contaminated data.

Figure 14. The line plots of $\log(\mathrm{RMSE})$ values for Design-B in bidirectional contaminated data.

Figure 15. The line plots of $\log(\mathrm{RMSE})$ values for Design-C in bidirectional contaminated data.

Figure 16. The line plots of $\log(\mathrm{RMSE})$ values for Design-D in bidirectional contaminated data.

Figure 17. The line plots of $\log(\mathrm{RMSE})$ values for Design-E in bidirectional contaminated data.

Table 1. Population concordance $ρ_{C}$ corresponding to the design parameter $ρ$.

Design	$ρ_{C}$
A	$ρ$
B	$0.8 ρ$
C	$ρ / 3$
D	$(e^{ρ} - 1) / (e - 1)$
E	$ρ$

Table 2. Comparison of correlation coefficients under clean and contaminated real data sets.

	Glucose Data		Blood Pressure Data
Coefficient	Clean Data	Contaminated Data	Clean Data	Contaminated Data
$ρ_{Pearson}$	0.927	−0.458	0.953	−0.81
$ρ_{C}$	0.897	−0.351	0.947	−0.672
$ρ_{R}$	0.918	0.918	0.976	0.974
$ρ_{g_{2}}$	0.685	0.437	0.812	0.267
$ρ_{g_{3}}$	0.820	0.136	0.906	−0.147
$ρ_{g_{4}}$	0.502	0.712	0.686	0.524
$ρ_{g_{5}}$	0.812	0.145	0.905	−0.223
$ρ_{Bayes}$	0.827	0.760	0.965	−0.224
$ρ_{L1}$	0.633	0.259	0.791	−0.193

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

