FORCE: Fast Outlier-Robust Correlation Estimation via Streaming Quantile Approximation for High-Dimensional Data Streams

Jang, Sooyoung; Choi, Changbeom

doi:10.3390/math14010191

Open AccessArticle

FORCE: Fast Outlier-Robust Correlation Estimation via Streaming Quantile Approximation for High-Dimensional Data Streams

by

Sooyoung Jang

and

Changbeom Choi

^*

Department of Computer Engineering, Hanbat National University, Daejeon 34158, Republic of Korea

^*

Author to whom correspondence should be addressed.

Mathematics 2026, 14(1), 191; https://doi.org/10.3390/math14010191

Submission received: 6 December 2025 / Revised: 26 December 2025 / Accepted: 31 December 2025 / Published: 4 January 2026

(This article belongs to the Special Issue Modeling and Simulation for Optimizing Complex Dynamical Systems)

Download

Browse Figures

Versions Notes

Abstract

The estimation of correlation matrices in high-dimensional data streams presents a fundamental conflict between computational efficiency and statistical robustness. Moment-based estimators, such as Pearson’s correlation, offer linear

O (N)

complexity but lack robustness. In contrast, high-breakdown methods like the minimum covariance determinant (MCD) are computationally prohibitive (

O (N p^{2} + p^{3})

) for real-time applications. This paper introduces Fast Outlier-Robust Correlation Estimation (FORCE), a streaming algorithm that performs adaptive coordinate-wise trimming using the

P^{2}

algorithm for streaming quantile approximation, requiring only

O (p)

memory independent of stream length. We evaluate FORCE against six baseline algorithms—including exact trimmed methods (TP-Exact, TP-TER) that use

O (N log N)

sorting with

O (N p)

storage—across five benchmark datasets spanning synthetic, financial, medical, and genomic domains. FORCE achieves speedups of approximately 470× over FastMCD and 3.9× over Spearman’s rank correlation. On S&P 500 financial data, coordinate-wise trimmed methods substantially outperform FastMCD: TP-Exact achieves the best RMSE (0.0902), followed by TP-TER (0.0909) and FORCE (0.1186), compared to FastMCD’s 0.1606. This result demonstrates that coordinate-wise trimming better accommodates volatility clustering in financial time series than multivariate outlier exclusion. FORCE achieves 76% of TP-Exact’s accuracy while requiring

10^{4} \times

less memory, enabling robust estimation in true streaming environments where data cannot be retained for batch processing. We validate the 25% breakdown point shared by all IQR-based trimmed methods using the ODDS-satellite benchmark (31.7% contamination), confirming identical degradation for FORCE, TP-Exact, and TP-TER. For memory-constrained streaming applications with contamination below 25%, FORCE provides the only viable path to robust correlation estimation with bounded memory.

Keywords:

robust statistics; correlation estimation; machine learning; streaming algorithms; quantile approximation; high-dimensional data; outlier detection; computational efficiency; breakdown point; memory-efficient computing

MSC:

37M10; 62H20; 62G35; 68W27

Graphical Abstract

1. Introduction

The estimation of covariance and correlation matrices constitutes the mathematical foundation of multivariate statistical analysis. These matrices serve as prerequisites for a broad spectrum of downstream analytical tasks, including principal component analysis (PCA) for dimensionality reduction [1], linear discriminant analysis (LDA) for classification, and mean-variance optimization in modern portfolio theory [2]. In the contemporary era of high-velocity data streams, characterized by unprecedented volume and variety, the dimensionality of datasets has expanded dramatically. From tick-by-tick order book updates in high-frequency trading [3] to transcriptomic counts in single-cell RNA sequencing [4] and telemetry streams from massive Internet of Things (IoT) sensor arrays [5], the demand for real-time multivariate analysis has never been greater.

However, the reliability of these fundamental estimators is perpetually threatened by the non-ideal nature of real-world data. Outliers, heavy-tailed distributions, sensor failures, and structural anomalies are ubiquitous in high-dimensional streams. This reality creates a critical tension—a “Computational and Memory Bottleneck”—where the methods simple enough to handle continuous data streams (e.g., Pearson correlation) lack the statistical robustness to handle data quality issues, while the methods possessing the necessary robustness (e.g., minimum covariance determinant) are computationally intractable and require storing all observations for batch processing.

This paper addresses this bottleneck by introducing Fast Outlier-Robust Correlation Estimation (FORCE), a streaming algorithm that performs adaptive coordinate-wise trimming using streaming quantile approximations. FORCE requires only

O (p)

memory for quantile markers—independent of stream length N—enabling robust estimation in true streaming environments where data cannot be retained. We systematically compare FORCE against exact trimmed methods that use

O (N log N)

sorting with

O (N p)

storage, quantifying the accuracy–memory trade-off and providing practitioners with clear guidance on method selection.

1.1. The Explosion of High-Dimensional Data and the Robustness Deficit

The scale of modern data acquisition has fundamentally shifted the paradigms of statistical inference. In financial markets, the covariance structure of assets—essential for risk management and portfolio optimization—must be updated continuously as millions of transactions occur daily [6]. A delayed or corrupted covariance update can expose algorithmic trading systems to unhedged risks or missed arbitrage opportunities. Similarly, in the domain of IoT, the integration of 5G networks has enabled the deployment of dense sensor grids that generate continuous, high-velocity streams of time-series data requiring real-time anomaly detection [7].

Standard statistical methods, specifically the sample mean and sample covariance (Pearson correlation), remain the default choices for these applications primarily due to their computational efficiency. Calculating a Pearson correlation coefficient requires a single pass over the data, scaling linearly with the sample size N, i.e.,

O (N)

complexity.

However, these moment-based estimators are notoriously sensitive to outliers. The breakdown point of an estimator is formally defined as the smallest fraction of contamination that can cause the estimator to take on an arbitrarily large aberrant value [8,9]. For the sample mean and covariance, the breakdown point is asymptotically zero; a single unbounded observation can arbitrarily distort the estimate as

N \to \infty

. In high-stakes applications, this fragility is unacceptable. For instance, in an IoT network monitoring critical infrastructure, a single malfunctioning sensor emitting high-magnitude noise should be flagged as an anomaly, not allowed to skew the global correlation structure and trigger false system-wide alarms [10].

1.2. The Computational and Memory Bottleneck

To mitigate the influence of outliers, the statistical community has developed a rich theory of robust estimation over several decades [8,11,12]. Methods such as M-estimators [8], the minimum covariance determinant (MCD) [13], and rank-based correlations (e.g., Spearman’s

ρ

, Kendall’s

τ

) [14,15,16] offer high breakdown points and theoretical guarantees against contamination [14].

However, the deployment of these methods in high-dimensional streams is hindered by their computational complexity and memory requirements. This “Computational and Memory Bottleneck” manifests in three primary forms:

Sorting and ranking costs. Rank-based methods, such as Spearman’s correlation, achieve robustness by replacing raw values with their ranks. This transformation necessitates sorting the data, which imposes a time complexity of

O (N log N)

and—critically—requires

O (N)

storage to retain all observations for sorting. While trivial for small, static datasets, this super-linear complexity and linear memory requirement become prohibitive in streaming contexts where N grows continuously or is unbounded.

Iterative optimization and matrix inversion. High-breakdown affine-equivariant estimators, such as the MCD, rely on iterative subsampling to find a subset of observations with the minimum determinant. Even optimized variants like FastMCD [17] exhibit complexities that scale poorly with dimension. The calculation of Mahalanobis distances, required for outlier flagging, involves inverting the covariance matrix, an operation scaling as

O (p^{3})

where p denotes dimensionality [18]. As the number of dimensions increases (e.g., in genomics or image processing), this step becomes computationally prohibitive.

Memory constraints in streaming environments. As noted in the recent literature, robust estimation methods typically assume batch access to all observations [18,19]. In true streaming environments—continuous sensor telemetry, real-time market data feeds, edge computing deployments—storing all observations for batch processing is infeasible. A stream of

N = 10^{9}

observations across

p = 100

dimensions requires approximately 800 GB of storage for batch methods. This memory constraint, often overlooked in the robust statistics literature, represents a fundamental barrier to deployment in resource-constrained environments.

The empirical reality of this bottleneck is severe. As evidenced by our experimental baselines presented in Table 1, the FastMCD algorithm—despite its name—requires over 1300 ms to process relatively small batches of satellite telemetry data, and nearly 500 ms for S&P 500 financial snapshots. In high-frequency trading systems requiring microsecond-level latency, or intrusion detection systems requiring line-rate processing, delays of this magnitude are tantamount to system failure.

Remark 1

(The Nature of the Bottleneck). Two distinct bottlenecks constrain robust estimation in streaming environments. First, the computational bottleneck: FastMCD’s

O (N p^{2} + p^{3})

complexity yields latencies of 325–1367 ms, precluding real-time deployment. Second, the memory bottleneck: even fast trimmed methods that use exact quantile computation require

O (N p)

storage to retain observations for sorting, precluding deployment when data arrives as an unbounded stream. FORCE addresses both bottlenecks:

O (N p^{2})

time complexity comparable to Pearson correlation, and

O (p)

memory independent of stream length.

1.3. Survey of Recent Advances (2024–2025)

The challenge of robust estimation in streaming environments has been a focal point of intense research activity in 2024 and 2025, driven by the documented failures of batch-oriented models in dynamic, non-stationary environments.

1.3.1. Concept Drift and IoT Security

Carnier et al. [7] highlighted the critical issue of “concept drift” in IoT traffic, wherein the statistical properties of the “normal” operational state evolve over time. Their work demonstrates that static batch models rapidly degrade in performance as the underlying data distribution shifts due to network reconfiguration, seasonal patterns, or adversarial manipulation. They argue that effective anomaly detection requires adaptive models capable of incremental updates. However, they also note that most existing streaming implementations lack the robustness to handle “poisoning” attacks where the detection model is slowly trained on adversarial examples [10]. This observation reinforces the pressing need for estimators that are simultaneously streaming-capable and inherently robust to contamination. FORCE directly targets this setting by updating robust quantile summaries online (via

P^{2}

) and recomputing trimming bounds from these summaries, enabling correlation monitoring that adapts as the distribution shifts. In addition, the single-pass variants described in Remark 9 permit threshold updates on rolling windows, providing a practical mechanism to track drift without storing an unbounded stream.

1.3.2. High Dimensionality in Finance

In the financial domain, Maddanu et al. [18] addressed the problem of anomaly detection in high-dimensional bank account balances using robust statistical methods. Working with datasets containing approximately 2.6 million daily records, they encountered significant computational barriers. Their research explicitly states that calculating the Mahalanobis distance becomes “infeasible for standard computers” in very large datasets due to the

O (p^{2})

complexity in memory and

O (p^{3})

complexity in computation. They evaluated alternative robust approaches but found that many robust strategies remain “less efficient and computationally expensive under high dimensional settings”, necessitating a practical trade-off where practitioners often revert to less robust but faster methods, or employ aggressive dimension reduction that may obscure subtle anomalies. Both FORCE and exact trimmed methods (TP-Exact, TP-TER) address this computational barrier by eliminating global optimization: quantiles are computed either via streaming approximation (FORCE) or exact sorting (TP-Exact), and pairwise trimmed statistics are accumulated in

O (1)

time per observation per pair. Consequently, these coordinate-wise trimmed methods match the

O (N p^{2})

scaling of standard Pearson correlation while incorporating robust outlier rejection. The choice between FORCE and exact trimmed methods depends on memory constraints: FORCE requires

O (p)

storage independent of stream length, while exact methods require

O (N p)

storage.

1.3.3. Theoretical Limits and New Directions

Loh [19], in a comprehensive theoretical review of modern robust statistics, emphasizes that while the field possesses a mature theoretical foundation, the intersection of high dimensionality and robustness requires fundamentally new frameworks. The review highlights that classical robust estimation methods often struggle with the “curse of dimensionality”, where the volume of the feature space increases exponentially such that data points become sparse, and the geometric concept of “outlyingness” becomes increasingly difficult to define without computationally expensive projection pursuit or data depth methods. Loh calls for the development of new estimators that can operate efficiently in these high-dimensional regimes, potentially by relaxing certain equivariance conditions in exchange for computational tractability—a direction that both FORCE and exact trimmed methods explicitly pursue by sacrificing affine equivariance for coordinate-wise trimming.

1.3.4. Relation to Distributed Learning

Trimming-based estimators have been studied extensively in Byzantine-robust distributed learning, where the goal is to aggregate gradients or updates under adversarial workers and to obtain minimax-optimal statistical rates. For example, coordinate-wise trimmed means provide robustness guarantees under Byzantine contamination [20], while Huber-loss-based procedures can improve efficiency bounds in certain regimes [21]. Our setting is fundamentally different: FORCE targets streaming estimation of a full

p \times p

correlation matrix under observation-level outliers and distributional tail expansion, with a primary constraint being computational feasibility (sub-millisecond updates) rather than minimax-optimal learning rates. Accordingly, FORCE does not claim optimality in the sense of distributed learning theory; instead, it adopts quantile/IQR-based trimming because it yields a transparent breakdown point guarantee (25%) while avoiding the sorting and optimization steps that dominate classical robust covariance estimators.

1.4. Limitations of Existing Robust Solutions

Despite these theoretical and empirical advances, a fundamental gap remains in the literature. Current solutions generally fall into three distinct categories:

High robustness, low speed, batch memory: Algorithms in this category include the minimum covariance determinant (MCD) and its optimized derivatives (FastMCD, deterministic MCD) [17], as well as high-breakdown S-estimators and MM-estimators [22]. These methods provide excellent protection against outliers, with breakdown points approaching 50%, but are computationally prohibitive for real-time streams. As demonstrated in Table 1, processing times range from approximately 325 to over 1367 ms per batch update.

Moderate robustness, moderate speed, batch memory: Algorithms in this category include coordinate-wise trimmed estimators using exact quantile computation. These methods achieve

O (N p^{2})

time complexity by replacing multivariate outlier detection with marginal trimming, yielding speedups of 200–

800 \times

over FastMCD. However, they require

O (N p)

storage to retain all observations for exact quantile computation via sorting. When batch processing is acceptable, these methods provide the best accuracy among fast robust estimators.

High speed, low robustness, streaming memory: Algorithms in this category include standard Pearson correlation, exponentially weighted moving averages (EWMAs), and simple Winsorization schemes. These methods achieve

O (N)

complexity and can be implemented in streaming fashion with

O (p^{2})

memory, but they possess breakdown points at or near zero, rendering them vulnerable to even mild contamination.

The rank-based methods (Spearman’s

ρ

, Kendall’s

τ

) [14,15] occupy an intermediate position, offering moderate robustness with

O (N log N)

complexity and

O (N p)

memory. However, the sorting requirement still imposes scalability ceilings in both time and space.

The gap: No existing method combines meaningful robustness (breakdown point > 0%) with streaming memory requirements (

O (p)

independent of N). FORCE is designed to fill this gap.

1.5. The FORCE Contribution

To bridge this methodological gap, we propose FORCE (Fast Outlier-Robust Correlation Estimation), an algorithm designed to achieve robust correlation estimation with bounded memory in true streaming environments. The core innovation of FORCE lies in its use of streaming quantile approximations to perform adaptive, data-driven trimming without requiring access to the complete dataset or storing observations for batch sorting.

Instead of sorting the data to identify and remove extreme observations (which requires

O (N log N)

complexity and

O (N p)

storage), FORCE maintains dynamic estimates of the data’s quantiles directly within the stream using the

P^{2}

algorithm [23]. This enables the instantaneous classification and rejection of outlying observations with only

O (p)

memory for quantile markers, independent of stream length.

The specific contributions of this paper are as follows:

Memory-bounded streaming estimation: We introduce the FORCE algorithm, a streaming covariance estimator that operates with

O (p)

memory for quantile markers—independent of stream length N—compared to the

O (N p)

storage required by exact trimmed methods. This architectural distinction enables deployment in true streaming environments where data cannot be retained for batch processing. For a stream of

10^{9}

observations across 100 dimensions, FORCE requires ∼80 KB versus ∼800 GB for exact methods—a reduction of seven orders of magnitude.

Linear-time robust estimation: FORCE operates in strictly linear time

O (N)

per correlation pair (

O (N p^{2})

for a full

p \times p

correlation matrix) by utilizing the

P^{2}

algorithm for streaming quantile approximation, combined with adaptive threshold computation for robust trimming. This design effectively bypasses the sorting bottleneck inherent in rank-based methods while matching the time complexity of non-robust Pearson correlation.

Systematic comparison with exact trimmed methods: Through comprehensive benchmarking across five diverse datasets, we compare FORCE against six baseline algorithms, including exact trimmed methods (TP-Exact, TP-TER), that share FORCE’s coordinate-wise trimming approach but use exact quantile computation. FORCE achieves speedups of approximately

470 \times

over FastMCD and

3.9 \times

over Spearman’s rank correlation. Compared to exact trimmed methods, FORCE occupies the same computational performance tier (1–3 ms average execution time), with the critical distinction being memory requirements rather than speed.

Accuracy–memory trade-off quantification: We provide explicit quantification of the accuracy cost of streaming quantile approximation. On S&P 500 financial data, TP-Exact achieves the best RMSE (

0.0902

) among fast methods, followed by TP-TER (

0.0909

) and FORCE (

0.1186

). FORCE achieves 76% of TP-Exact’s accuracy while requiring

10^{4} \times

less memory. This trade-off enables practitioners to make informed decisions based on their application’s memory constraints.

Coordinate-wise trimming for financial data: We demonstrate that all coordinate-wise trimmed methods (FORCE, TP-Exact, TP-TER) substantially outperform FastMCD on financial time series exhibiting volatility clustering (S&P 500 RMSE: 0.09–0.12 vs. FastMCD’s 0.16). This result reflects a fundamental insight: coordinate-wise trimming accommodates coherent market-wide volatility events that multivariate methods inappropriately exclude, providing more accurate correlation estimates during market stress.

Empirical validation of shared breakdown point: We provide rigorous validation of the 25% breakdown point shared by all IQR-based trimmed methods. Using the ODDS-satellite dataset (31.7% contamination), we demonstrate that FORCE, TP-Exact, and TP-TER exhibit identical breakdown behavior—all degrading to RMSE ∼0.72, comparable to non-robust Pearson. This shared limitation confirms that method selection should be based on memory constraints rather than robustness properties.

The remainder of this paper is organized as follows. Section 2 presents the mathematical formulation of the FORCE algorithm, including the streaming quantile approximation mechanism and complexity analysis. Section 3 describes the experimental setup and presents comprehensive benchmark results comparing FORCE against six baseline algorithms. Section 4 interprets the findings in the context of practical deployment, provides method selection guidance, and discusses theoretical limitations. Section 5 summarizes the contributions and outlines directions for future research.

2. Materials and Methods

This section presents the mathematical foundations of the FORCE algorithm, beginning with the formal contamination model that motivates robust estimation, followed by a detailed exposition of the streaming quantile approximation mechanism, the adaptive trimming procedure, and concluding with complexity analysis and implementation details.

2.1. Theoretical Framework: The Contamination Model

We formalize the problem of robust estimation in contaminated data streams using the Huber

ε

-contamination model [8], which provides a principled framework for analyzing estimator behavior under adversarial or stochastic corruption.

Definition 1

(Huber

ε

-Contamination Model). Let the observed data stream

X = {x_{1}, x_{2}, \dots, x_{N}},

where each

x_{i} \in R^{p}

, be drawn independently from a mixture distribution

P_{ε}

:

P_{ε} = (1 - ε) F + ε G,

(1)

where F represents the nominal (clean) distribution, typically assumed to be multivariate normal

N (μ, Σ)

; G is an arbitrary contamination distribution which may be heavy-tailed, asymmetric, or even adversarially chosen; and

ε \in [0, 1)

denotes the contamination fraction.

The goal of robust covariance estimation is to recover the true covariance matrix

Σ

of the nominal distribution F, even when a fraction

ε

of observations are drawn from the arbitrary contamination distribution G.

Definition 2

(Breakdown Point). The finite-sample breakdown point of an estimator

{\hat{Σ}}_{N}

is defined as [9]:

ε_{N}^{*} = min_{1 \leq m \leq N} \{\frac{m}{N} : sup_{X_{m}^{'}} ∥ {\hat{Σ}}_{N} (X_{m}^{'}) ∥ = \infty\},

(2)

where

X_{m}^{'}

denotes a corrupted dataset obtained by replacing any m observations in

X

with arbitrary values. Intuitively, the breakdown point represents the smallest fraction of contamination that can cause the estimator to become arbitrarily biased.

For the sample covariance matrix (Pearson correlation), the breakdown point is

ε^{*} = 1 / N \to 0

as

N \to \infty

; a single outlier of sufficient magnitude can arbitrarily inflate the variance estimate. The minimum covariance determinant (MCD) estimator achieves a breakdown point of approximately

⌊ (N - p + 1) / 2 ⌋ / N \approx 0.5

[17], but at substantial computational cost.

2.2. The FORCE Algorithm

The FORCE algorithm is designed to achieve robust correlation estimation with

O (N)

time complexity by replacing computationally expensive sorting and matrix operations with streaming quantile approximations. The algorithm consists of three integrated modules: (1) streaming quantile estimation, (2) robust scale computation, and (3) adaptive threshold trimming.

2.2.1. Module 1: Streaming Quantile Approximation via the $P^{2}$ Algorithm

The foundation of FORCE is the

P^{2}

(piecewise-parabolic) algorithm introduced by Jain and Chlamtac [23], which enables dynamic estimation of arbitrary quantiles without storing the complete observation history. This algorithm maintains a fixed number of markers whose positions are adjusted incrementally as new observations arrive.

Definition 3

(

P^{2}

Quantile Estimator). For a target quantile

ϕ \in (0, 1)

, the

P^{2}

algorithm maintains five markers with positions

q_{0} < q_{1} < q_{2} < q_{3} < q_{4}

and corresponding marker counts

n_{0} < n_{1} < n_{2} < n_{3} < n_{4}

. The central marker

q_{2}

converges to the ϕ-quantile of the data stream.

The desired marker positions after observing N data points are defined as follows:

n_{i}^{'} = \{\begin{matrix} 1 & if i = 0 \\ 1 + 2 ϕ (N - 1) & if i = 1 \\ 1 + ϕ \cdot 2 (N - 1) & if i = 2 \\ 1 + (1 + ϕ) \cdot 2 (N - 1) / 2 & if i = 3 \\ N & if i = 4 \end{matrix}

(3)

When a new observation x arrives, the algorithm proceeds as follows:

Step 1: Marker count update. Find the cell k such that

q_{k - 1} \leq x < q_{k}

and increment

n_{j}

for all

j \geq k

. Update boundary markers if necessary: if

x < q_{0}

, set

q_{0} \leftarrow x

; if

x > q_{4}

, set

q_{4} \leftarrow x

.

Step 2: Marker position adjustment. For each interior marker

i \in {1, 2, 3}

, if the actual position

n_{i}

deviates from the desired position

n_{i}^{'}

by more than a threshold, adjust

q_{i}

using piecewise-parabolic interpolation:

\begin{matrix} q_{i}^{new} = q_{i} + \frac{d}{n_{i + 1} - n_{i - 1}} & [(n_{i} - n_{i - 1} + d) (q_{i + 1} - q_{i}) \cdot \frac{1}{n_{i + 1} - n_{i}} \\ + (n_{i + 1} - n_{i} - d) (q_{i} - q_{i - 1}) \cdot \frac{1}{n_{i} - n_{i - 1}}], \end{matrix}

(4)

where

d = sign (n_{i}^{'} - n_{i})

. If the parabolic adjustment would violate marker ordering, a linear adjustment is used instead:

q_{i}^{new} = q_{i} + d \cdot \frac{q_{i + d} - q_{i}}{n_{i + d} - n_{i}} .

(5)

The

P^{2}

algorithm requires only

O (1)

operations per observation and

O (1)

memory (five markers), regardless of stream length N.

Remark 2

(

P^{2}

Convergence and Approximation Error). The

P^{2}

algorithm is an approximation method whose accuracy depends on the number of observations processed. Several properties govern its convergence behavior:

Initialization phase. The algorithm requires a minimum of 5 observations to initialize the markers. During this phase, the markers are set to the first 5 sorted observations, and quantile estimates may be unreliable.

Implementation strategy. In our implementation, FORCE uses a hybrid initialization approach for improved early-stage accuracy:

1.: For $N \leq 5$ : The first 5 observations are collected and sorted to initialize the $P^{2}$ markers at positions corresponding to the target quantiles.
2.: For $6 \leq N < N_{warmup}$ : The $P^{2}$ algorithm operates normally, with markers adjusting incrementally. During this warm-up phase, trimming decisions may be suboptimal.
3.: For $N \geq N_{warmup}$ : Full $P^{2}$ accuracy is achieved and FORCE operates at designed effectiveness.

The default warm-up threshold is

N_{warmup} = 100

. For applications requiring high accuracy from the first observation, practitioners may employ an alternative strategy: use exact quantile computation (via sorting) for the first

N_{exact}

observations, then transition to

P^{2}

streaming. This hybrid approach incurs

O (N_{exact} log N_{exact})

initial cost but ensures accurate quantile estimates from the start. In our experiments, we use pure

P^{2}

from

N = 1

as all datasets have

N \geq 1000

, well beyond the warm-up period.

Convergence rate. Jain and Chlamtac [23] demonstrated empirically that

P^{2}

achieves relative errors below 0.5% for

N \geq 1000

observations across a variety of distributions. For smaller sample sizes (

N < 100

), relative errors can reach 2–5%, particularly for extreme quantiles (

ϕ = 0.01

or

ϕ = 0.99

).

Distribution dependence. Convergence is faster for distributions with lighter tails. Heavy-tailed distributions (e.g., Cauchy, Pareto) may require larger N for the extreme quantile markers to stabilize, as rare extreme observations can cause substantial marker adjustments.

Impact on FORCE. The approximation error in quantile estimates propagates to the robust location (

\hat{μ}

), scale (

\hat{σ}

), and TER estimates, which in turn affect the trimming bounds. In small-N scenarios, this can result in (1) slightly suboptimal trimming thresholds and (2) minor inconsistencies between the intended and actual acceptance regions. However, the correlation computation itself is exact given the trimming decisions—only the trimming boundary is approximate.

In FORCE, we instantiate five

P^{2}

estimators for each dimension

j \in {1, \dots, p}

to track the quantiles

ϕ \in {0.01, 0.25, 0.50, 0.75, 0.99}

, denoted as

{\hat{q}}_{0.01}^{(j)}, {\hat{q}}_{0.25}^{(j)}, {\hat{q}}_{0.50}^{(j)}, {\hat{q}}_{0.75}^{(j)}, {\hat{q}}_{0.99}^{(j)}

.

Remark 3

(Minimum Sample Size Recommendations). Based on the convergence properties of the

P^{2}

algorithm, we provide the following sample size guidance for FORCE:

$N < 50$ : Not recommended. Quantile estimates are unreliable; consider exact methods or batch processing.
$50 \leq N < 200$ : Use with caution. Central quantiles ( $q_{0.25}, q_{0.50}, q_{0.75}$ ) are reasonably accurate, but extreme quantiles ( $q_{0.01}, q_{0.99}$ ) used for TER may have elevated error. Consider using a simpler threshold rule without TER adaptation.
$200 \leq N < 1000$ : Acceptable. All quantile estimates are sufficiently accurate for robust trimming. Minor approximation error may slightly affect boundary observations.
$N \geq 1000$ : Recommended. $P^{2}$ estimates are highly accurate (<0.5% relative error), and FORCE operates at full effectiveness.

For streaming applications where N grows continuously, this concern applies only to the initial “warm-up” period. Once the stream has processed sufficient observations, the

P^{2}

markers stabilize and subsequent estimates are highly accurate.

2.2.2. Module 2: Robust Location and Scale Estimation

Using the streaming quantile estimates, we compute robust measures of location and scale for each dimension.

Definition 4

(Robust Location Estimator). The robust location for dimension j is estimated by the streaming median:

{\hat{μ}}^{(j)} = {\hat{q}}_{0.50}^{(j)} .

(6)

Definition 5

(Robust Scale Estimator). The robust scale for dimension j is estimated using the normalized interquartile range (IQR):

{\hat{σ}}^{(j)} = \frac{{\hat{q}}_{0.75}^{(j)} - {\hat{q}}_{0.25}^{(j)}}{1.349},

(7)

where the normalization constant

1.349 = 2 Φ^{- 1} (0.75) \approx 1.349

ensures consistency with the standard deviation under Gaussian assumptions, i.e.,

E [\hat{σ}] = σ

when

F = N (μ, σ^{2})

.

Remark 4

(Robustness of Quantile-Based Estimation). Unlike moment-based statistics (mean, variance), quantiles are well defined and finite for all distributions, including those with infinite variance such as the Cauchy distribution. This property ensures that FORCE’s core statistical estimates (

\hat{μ}

,

\hat{σ}

, TER) remain bounded and meaningful even when applied to heavy-tailed data, a critical advantage over moment-based robust methods that may suffer numerical instability under extreme contamination.

The IQR-based scale estimator possesses a breakdown point of 25%, as corruption of more than 25% of the data can shift both the first and third quartiles to arbitrary positions [8].

Remark 5

(Choice of IQR over MAD). An alternative robust scale estimator is the median absolute deviation (MAD), defined as

MAD = median (| x_{i} - median (x) |)

, which achieves a 50% breakdown point—higher than IQR’s 25%. We chose IQR for FORCE for the following reasons:

Streaming compatibility: The IQR requires only two quantiles (

q_{0.25}

,

q_{0.75}

), which can be estimated directly via

P^{2}

instances. In contrast, MAD requires first a computation of the median, then computation of a second-pass median of the absolute deviations. Streaming MAD estimation requires either (a) storing of all observations to compute deviations (violating

O (1)

memory), or (b) approximate methods that introduce additional complexity and error.

Computational simplicity: Each

P^{2}

instance is independent. IQR computation requires no inter-estimator communication. MAD would require coordination of the median estimate with a secondary streaming process for deviation medians.

Practical sufficiency: The 25% breakdown point is sufficient for many real-world applications where contamination rates are typically below 20%. Our experiments demonstrate effective robust estimation across diverse domains with this threshold.

TER synergy: The extreme quantiles (

q_{0.01}

,

q_{0.99}

) required for TER computation are already tracked by

P^{2}

; no additional estimators are needed beyond those required for IQR.

Investigation of streaming MAD approximations to achieve higher breakdown points while preserving

O (N)

complexity remains a promising direction for future research (Section 5.5).

2.2.3. Module 3: Tail Expansion Ratio and Adaptive Thresholds

A key innovation in FORCE is the TER, which enables the algorithm to distinguish between coherent high-variance events (e.g., market-wide volatility spikes) and incoherent outliers (e.g., sensor malfunctions).

Definition 6

(Tail Expansion Ratio (TER)). The tail expansion ratio for dimension j is defined as follows:

{TER}^{(j)} = max (1, \frac{| {\hat{q}}_{0.99}^{(j)} - {\hat{q}}_{0.50}^{(j)} |}{| {\hat{q}}_{0.50}^{(j)} - {\hat{q}}_{0.01}^{(j)} |}) .

(8)

Remark 6

(Numerical Stability of TER). The TER formula involves division by

| {\hat{q}}_{0.50}^{(j)} - {\hat{q}}_{0.01}^{(j)} |

, which could approach zero in pathological cases such as extremely-low-variance data, discrete distributions where many quantiles coincide, or degenerate dimensions with constant values. To ensure numerical stability, the implementation adds a small tolerance ϵ to the denominator:

{TER}_{stable}^{(j)} = max (1, \frac{| {\hat{q}}_{0.99}^{(j)} - {\hat{q}}_{0.50}^{(j)} |}{| {\hat{q}}_{0.50}^{(j)} - {\hat{q}}_{0.01}^{(j)} | + ϵ}),

(9)

where

ϵ = 10^{- 10}

by default. Additionally, if the denominator falls below ϵ (indicating a near-degenerate distribution), FORCE sets

{TER}^{(j)} = 1

and issues a warning, as adaptive tail expansion is not meaningful for such dimensions. In practice, this safeguard is rarely triggered for continuous real-world data.

The TER captures the asymmetry of the distribution’s tails. Under symmetric contamination or symmetric heavy-tailed distributions,

TER \approx 1

. When the upper tail expands disproportionately (e.g., during volatility clustering in financial data),

TER > 1

, signaling that the acceptance region should be widened to accommodate legitimate extreme observations.

Remark 7

(TER Behavior Under Heavy-Tailed Distributions). The TER is designed to detect asymmetric tail expansion, not to accommodate arbitrarily heavy tails. Several properties govern its behavior:

Boundedness under symmetric distributions: For any symmetric distribution (including heavy-tailed ones such as the t-distribution or Cauchy [22]), the TER equals 1 by construction, as

| q_{0.99} - q_{0.50} | = | q_{0.50} - q_{0.01} |

. Thus, symmetric heavy tails do not inflate the TER.

Behavior under infinite variance: For distributions with infinite variance (e.g., Cauchy), the quantile spreads

| q_{0.99} - q_{0.50} |

and

| q_{0.50} - q_{0.01} |

are finite and well defined, even though the variance is not. The TER remains bounded because it is a ratio of quantile spreads, not moments. Specifically, for a standard Cauchy distribution,

TER = 1

due to symmetry.

Potential concern: asymmetric heavy tails: The TER can become large under asymmetric heavy-tailed distributions or when contamination is concentrated in one tail. In the extreme case where the upper tail expands dramatically relative to the lower tail, the TER could theoretically grow without bound, causing the acceptance region to widen excessively. However, this scenario is mitigated by two factors: (1) the robust scale estimate

\hat{σ}

(based on IQR) remains bounded even under heavy tails, and (2) practitioners can impose an explicit upper bound on TER (e.g.,

{TER}_{max} = 3

) in applications where extreme asymmetry is a concern.

Empirical evidence: In our experiments, the synthetic dataset includes Cauchy-distributed contamination (Section 2.4.1), yet FORCE achieves reasonable performance (RMSE 0.2001), demonstrating that the TER mechanism does not catastrophically fail under infinite-variance contamination. The primary failure mode of FORCE is exceeding the 25% breakdown point, not unbounded TER expansion.

Definition 7

(Adaptive Rejection Bounds). The lower and upper rejection bounds for dimension j are computed as follows:

μ_{L}^{(j)} = {\hat{μ}}^{(j)} - λ \cdot {TER}^{(j)} \cdot {\hat{σ}}^{(j)}, μ_{H}^{(j)} = {\hat{μ}}^{(j)} + λ \cdot {TER}^{(j)} \cdot {\hat{σ}}^{(j)},

(10)

where

λ > 0

is a tuning parameter controlling the trimming aggressiveness. The default value

λ = 3.0

corresponds to approximately 99.7% coverage under Gaussian assumptions. For applications involving potentially extreme asymmetry, practitioners may impose an upper bound

{TER}_{max}

(e.g.,

{TER}_{max} = 3

) by replacing Equation (8) with the following:

{TER}_{capped}^{(j)} = min ({TER}_{max}, max (1, \frac{| {\hat{q}}_{0.99}^{(j)} - {\hat{q}}_{0.50}^{(j)} |}{| {\hat{q}}_{0.50}^{(j)} - {\hat{q}}_{0.01}^{(j)} |})) .

(11)

Statistical Role of TER (Bias–Efficiency–Robustness Trade-Off)

FORCE computes correlations after applying an acceptance rule

X_{j} \in [μ_{L}^{(j)}, μ_{H}^{(j)}]

. For fixed

λ

, the width of this interval determines a classic trade-off: narrower bounds reduce sensitivity to extreme contamination (robustness) but increase truncation bias and reduce efficiency by discarding informative tail observations. The tail expansion ratio (TER) acts as an adaptive multiplier that increases the trimming width when the empirical tails expand, particularly under asymmetric tail behavior (large

| q_{0.99} - q_{0.50} |

relative to

| q_{0.50} - q_{0.01} |

). This adaptive widening mitigates excessive truncation during coherent, regime-wide tail expansion (e.g., volatility clustering) while preserving protection against isolated point outliers. Capping TER via (11) provides an explicit stability knob that prevents over-inclusion under extreme asymmetry. While TER is designed to maximize efficiency under coherent tail expansion (common in financial data), it entails a trade-off regarding robustness against purely asymmetric contamination. We characterize this trade-off using controlled simulations in Appendix B.

2.2.4. Module 4: Trimmed Correlation Computation

With the adaptive bounds established, FORCE computes the correlation matrix using only observations that fall within the acceptance region for all dimensions involved.

Definition 8

(FORCE Correlation Estimator). For dimensions j and k, define the acceptance indicator:

w_{i}^{(j, k)} = 1 [x_{i, j} \in [μ_{L}^{(j)}, μ_{H}^{(j)}]] \cdot 1 [x_{i, k} \in [μ_{L}^{(k)}, μ_{H}^{(k)}]],

(12)

where

1 [\cdot]

denotes the indicator function. The FORCE correlation estimate is

{\hat{r}}_{j k}^{FORCE} = \frac{\sum_{i = 1}^{N} w_{i}^{(j, k)} (x_{i, j} - {\bar{x}}_{j}^{w}) (x_{i, k} - {\bar{x}}_{k}^{w})}{\sqrt{\sum_{i = 1}^{N} w_{i}^{(j, k)} {(x_{i, j} - {\bar{x}}_{j}^{w})}^{2}} \cdot \sqrt{\sum_{i = 1}^{N} w_{i}^{(j, k)} {(x_{i, k} - {\bar{x}}_{k}^{w})}^{2}}},

(13)

where

{\bar{x}}_{j}^{w} = \sum_{i} w_{i}^{(j, k)} x_{i, j} / \sum_{i} w_{i}^{(j, k)}

denotes the weighted mean over accepted observations.

Remark 8

(Weighted-Correlation Interpretation). Equation (13) can be formally interpreted as a weighted Pearson correlation, where the weights are the binary acceptance indicators

w_{i}^{(j, k)} \in {0, 1}

defined in (12). This algebraic form is standard for weighted correlation estimators [24]. The principal contribution of FORCE is not the correlation formula itself, but, rather, the streaming construction of these weights via (i)

P^{2}

quantile tracking for robust location/scale and (ii) TER mechanism that adapts trimming to coherent tail expansion. These design choices allow the weights to be computed without sorting or optimization, enabling the estimator to retain a 25% breakdown point with

O (1)

per-sample updates.

In practice, the correlation is computed incrementally by maintaining running sums of

\sum w_{i} x_{i, j}

,

\sum w_{i} x_{i, j}^{2}

,

\sum w_{i} x_{i, j} x_{i, k}

, and

\sum w_{i}

for each dimension pair.

2.2.5. Complete Algorithm

Algorithm 1 presents the complete FORCE procedure in pseudocode form. The Second Pass (lines 19–28) performs a single sweep over all N observations and all

p (p - 1) / 2

dimension pairs, and, therefore, costs

O (N p^{2})

time (i.e.,

O (N)

per pair).

Algorithm 1 FORCE: Fast Outlier-Robust Correlation Estimation.

Require:: Data stream $X = {x_{1}, \dots, x_{N}}$ , $x_{i} \in R^{p}$ ; threshold parameter $λ$ (default: 3.0)
Ensure:: Robust correlation matrix $\hat{R} \in R^{p \times p}$
1:: Initialize: For each dimension $j \in {1, \dots, p}$ :
2:: Create $P^{2}$ estimators for quantiles ${0.01, 0.25, 0.50, 0.75, 0.99}$
3:: Initialize sufficient statistics: $S_{j} \leftarrow 0$ , $S_{j}^{2} \leftarrow 0$ , $n_{j} \leftarrow 0$
4:: For each pair $(j, k)$ : Initialize $S_{j k} \leftarrow 0$ , $n_{j k} \leftarrow 0$
5:: for each observation $x_{i} = (x_{i, 1}, \dots, x_{i, p})$ in stream do
6:: for each dimension $j \in {1, \dots, p}$ do
7:: Update $P^{2}$ estimators with $x_{i, j}$ ▹ $O (1)$ per quantile
8:: end for
9:: end for
10:: Compute Robust Parameters:
11:: for each dimension $j \in {1, \dots, p}$ do
12:: ${\hat{μ}}^{(j)} \leftarrow {\hat{q}}_{0.50}^{(j)}$ ▹ Robust location
13:: ${\hat{σ}}^{(j)} \leftarrow ({\hat{q}}_{0.75}^{(j)} - {\hat{q}}_{0.25}^{(j)}) / 1.349$ ▹ Robust scale
14:: ${TER}^{(j)} \leftarrow max (1, \frac{| {\hat{q}}_{0.99}^{(j)} - {\hat{q}}_{0.50}^{(j)} |}{| {\hat{q}}_{0.50}^{(j)} - {\hat{q}}_{0.01}^{(j)} |})$ ▹ Tail expansion
15:: $μ_{L}^{(j)} \leftarrow {\hat{μ}}^{(j)} - λ \cdot {TER}^{(j)} \cdot {\hat{σ}}^{(j)}$
16:: $μ_{H}^{(j)} \leftarrow {\hat{μ}}^{(j)} + λ \cdot {TER}^{(j)} \cdot {\hat{σ}}^{(j)}$
17:: end for
18:: Second Pass: Compute Trimmed Statistics ▹ two-pass batch form; see Remark 9 for single-pass streaming variants
19:: for each observation $x_{i}$ in stream do
20:: for each pair $(j, k)$ with $1 \leq j < k \leq p$ do ▹ #pairs $= p (p - 1) / 2$
21:: if $x_{i, j} \in [μ_{L}^{(j)}, μ_{H}^{(j)}]$ and $x_{i, k} \in [μ_{L}^{(k)}, μ_{H}^{(k)}]$ then
22:: Update: $S_{j} \leftarrow S_{j} + x_{i, j}$ , $S_{k} \leftarrow S_{k} + x_{i, k}$
23:: Update: $S_{j}^{2} \leftarrow S_{j}^{2} + x_{i, j}^{2}$ , $S_{k}^{2} \leftarrow S_{k}^{2} + x_{i, k}^{2}$
24:: Update: $S_{j k} \leftarrow S_{j k} + x_{i, j} \cdot x_{i, k}$
25:: Update: $n_{j k} \leftarrow n_{j k} + 1$
26:: end if
27:: end for
28:: end for
29:: Compute Correlation Matrix:
30:: for each pair $(j, k)$ with $j \leq k$ do
31:: if $j = k$ then
32:: ${\hat{R}}_{j j} \leftarrow 1$
33:: else
34:: ${\bar{x}}_{j} \leftarrow S_{j} / n_{j k}$ , ${\bar{x}}_{k} \leftarrow S_{k} / n_{j k}$
35:: ${Cov}_{j k} \leftarrow S_{j k} / n_{j k} - {\bar{x}}_{j} {\bar{x}}_{k}$
36:: ${Var}_{j} \leftarrow S_{j}^{2} / n_{j k} - {\bar{x}}_{j}^{2}$ , ${Var}_{k} \leftarrow S_{k}^{2} / n_{j k} - {\bar{x}}_{k}^{2}$
37:: ${\hat{R}}_{j k} \leftarrow {Cov}_{j k} / \sqrt{{Var}_{j} \cdot {Var}_{k}}$
38:: ${\hat{R}}_{k j} \leftarrow {\hat{R}}_{j k}$ ▹ Symmetry
39:: end if
40:: end for
41:: return $\hat{R}$

Remark 9

(Single-Pass and Online Implementation). Algorithm 1 is presented as a two-pass procedure for clarity. In true streaming settings where data cannot be stored, FORCE can be implemented in several alternative modes:

Online with delayed thresholds: Process incoming observations immediately using thresholds computed from the previous batch or time window. This introduces a one-window lag in threshold adaptation but enables single-pass operation. Formally, for window t:

[μ_{L}^{(t)}, μ_{H}^{(t)}] \leftarrow f ({\hat{q}}_{0.01}^{(t - 1)}, \dots, {\hat{q}}_{0.99}^{(t - 1)}),

(14)

where thresholds for window t are computed from quantiles estimated in window

t - 1

.

Exponentially weighted streaming: Maintain exponentially weighted sufficient statistics

S_{j}^{(ew)} = α x_{i, j} + (1 - α) S_{j}^{(ew)}

for trimmed observations, where acceptance is determined by thresholds updated continuously from the

P^{2}

estimators. This provides fully online operation with graceful adaptation to distributional drift.

Mini-batch hybrid: Accumulate observations into mini-batches of size B, perform two-pass FORCE within each batch, then merge batch-level statistics. This balances latency (B observations) against threshold accuracy.

The choice among these modes depends on application requirements. For most streaming applications with moderate throughput (<10,000 observations/second), the two-pass batch mode with periodic updates (e.g., every 1000 observations) provides excellent accuracy with negligible latency overhead.

2.3. Theoretical Analysis

2.3.1. Breakdown Point of FORCE

The breakdown point of FORCE is determined by the breakdown point of its constituent robust estimators.

Theorem 1

(Breakdown Point of FORCE). The asymptotic breakdown point of the FORCE correlation estimator is

ε_{FORCE}^{*} = 0.25 .

(15)

Proof.

The FORCE estimator relies on the median for location estimation and the IQR for scale estimation. The median has a breakdown point of 50%, while the IQR has a breakdown point of 25% [8]. As the adaptive thresholds depend on both the median and the IQR (Equation (10)), the overall breakdown point is determined by the minimum of these components. When contamination exceeds 25%, the quartile estimates

{\hat{q}}_{0.25}

and

{\hat{q}}_{0.75}

can be driven to arbitrary values, causing the robust scale

\hat{σ}

to become unreliable. Consequently, the trimming bounds lose their protective properties, and contaminated observations are admitted into the correlation computation. Therefore,

ε_{FORCE}^{*} = min (0.50, 0.25) = 0.25

. □

Remark 10.

The 25% breakdown point of FORCE is shared with other IQR-based trimmed methods (including TP-Exact and TP-TER), and is lower than the approximately 50% breakdown point achieved by the MCD estimator [13,17]. However, this trade-off enables coordinate-wise trimmed methods to achieve

O (N p^{2})

time complexity, compared to FastMCD’s

O (N p^{2} + p^{3})

. Among IQR-based methods, FORCE is distinguished by its

O (p^{2})

memory requirement—independent of stream length N—compared to the

O (N p)

storage required by exact trimmed methods. In streaming applications where contamination typically remains below 25% and data cannot be retained for batch processing, this architectural trade-off is favorable.

2.3.2. Computational Complexity Analysis

Theorem 2

(Time Complexity of FORCE). The FORCE algorithm has time complexity

O (N p^{2})

for computing the full

p \times p

correlation matrix, or

O (N)

for computing a single pairwise correlation.

Proof.

The algorithm consists of three phases:

Phase 1: Quantile estimation: For each of the p dimensions, we maintain 5 quantile estimators (

P^{2}

instances). Each observation requires

O (1)

updates per estimator. Thus, the total cost is

O (N p \cdot 5) = O (N p)

.

Phase 2: Threshold computation: Computing the robust location, scale, and TER for each dimension requires

O (1)

operations. Total cost:

O (p)

.

Phase 3: Trimmed correlation: For each of the

(\binom{p}{2}) = O (p^{2})

dimension pairs, we perform a single pass over the N observations, updating sufficient statistics in

O (1)

time per observation. Total cost:

O (N p^{2})

.

The dominant term is

O (N p^{2})

. For a single pairwise correlation, Phases 1 and 2 contribute

O (N)

and Phase 3 contributes

O (N)

, yielding

O (N)

total. □

Table 2 summarizes the complexity comparison between FORCE and alternative methods.

Remark 11.

While FORCE and Pearson correlation share the same asymptotic complexity

O (N p^{2})

, the constant factors differ. FORCE requires maintaining

5 p

quantile markers and performing two passes over the data (or a single pass with delayed correlation updates). In practice, as demonstrated in Section 3, FORCE is approximately 4–

14 \times

slower than Pearson but 214–

785 \times

faster than FastMCD (average

470 \times

). Compared to exact trimmed methods (TP-Exact, TP-TER), FORCE achieves comparable execution times (

0.7

–

2.1 \times

speedup) but with fundamentally different memory requirements: FORCE requires

O (p^{2})

memory independent of N, while exact methods require

O (N p)

storage to retain observations for sorting.

2.3.3. Consistency for a Trimmed Correlation Functional

We emphasize that FORCE estimates a well-defined trimmed correlation functional induced by its acceptance region. For a pair

(j, k)

, we define the population acceptance event

A_{j, k} = {X_{j} \in [μ_{L}^{(j)}, μ_{H}^{(j)}], X_{k} \in [μ_{L}^{(k)}, μ_{H}^{(k)}]},

where

(μ_{L}^{(j)}, μ_{H}^{(j)})

are the (population) bounds obtained from the target quantiles and the IQR-based scale. The corresponding trimmed correlation is

ρ_{j, k}^{trim} = Corr (X_{j}, X_{k} ∣ A_{j, k}) .

Proposition 1

(Consistency for the trimmed target). Assume that

{X_{i}}_{i = 1}^{N}

are i.i.d. with continuous marginals and finite second moments, and assume that the quantile estimators used in Algorithm 1 satisfy

{\hat{q}}_{ϕ}^{(j)} \overset{p}{\to} q_{ϕ}^{(j)}

for each

ϕ \in {0.01, 0.25, 0.50, 0.75, 0.99}

and dimension j. Then the FORCE estimator

{\hat{ρ}}_{j, k}^{FORCE}

converges in probability to

ρ_{j, k}^{trim}

as

N \to \infty

.

Asymptotic Rate

Under the same regularity assumptions and assuming that the acceptance probability

P (A_{j, k})

is bounded away from zero, the FORCE estimator is a smooth function of trimmed first and second moments and, therefore, achieves the usual

N^{- 1 / 2}

parametric rate:

{\hat{ρ}}_{j, k}^{FORCE} - ρ_{j, k}^{trim} = O_{p} (N^{- 1 / 2})

. The additional error induced by streaming quantile approximation enters only through the estimated bounds; once the quantile error is

o_{p} (1)

, the moment-based correlation rate dominates.

Corollary 1

(Separated contamination implies consistency for the clean correlation). Under a mixture model

X = (1 - ε) Y + ε Z

, where (i) Y is the clean distribution, (ii) Z places negligible mass inside the clean acceptance region, and (iii)

ε < 0.25

, the trimmed target equals the clean correlation

Corr (Y_{j}, Y_{k})

; hence, FORCE is consistent for the clean correlation. This formalizes the statistical objective beyond computational efficiency: FORCE is an online estimator of an explicitly robust correlation functional.

2.3.4. Space Complexity

Theorem 3

(Space Complexity of FORCE). The FORCE algorithm requires

O (p^{2})

memory, independent of N.

Proof.

The memory requirements are (1)

5 p

markers for quantile estimation (

O (p)

); (2)

3 p

values for location, scale, and TER (

O (p)

); (3)

O (p^{2})

values for sufficient statistics of dimension pairs. The dominant term is

O (p^{2})

, and, critically, this does not depend on N, making FORCE suitable for unbounded streaming applications. □

2.4. Experimental Design

2.4.1. Datasets

We evaluate FORCE on five diverse datasets spanning synthetic, financial, medical, and genomic domains. Table 3 summarizes the dataset characteristics.

Remark 12

(Dataset Selection Rationale). The five benchmark datasets were selected to span a range of contamination regimes relative to FORCE’s 25% breakdown point. Four datasets (Synthetic, S&P 500, ODDS-mammography, Genomics) have contamination rates well below the 25% threshold, representing FORCE’s intended operating regime. The ODDS-satellite dataset, with 31.7% contamination, was deliberately included to empirically validate the theoretical breakdown point—we expect FORCE to fail on this dataset, and observing this failure confirms the accuracy of our theoretical analysis.

Synthetic dataset: We generate

N = 1000

observations from a 10-dimensional multivariate normal distribution with a predefined correlation structure. A contamination fraction of

ε = 0.10

is introduced by replacing 10% of observations with draws from a heavy-tailed Cauchy distribution [8] scaled by a factor of 10. The true correlation matrix is known, enabling precise RMSE computation.

S&P 500: Daily log-returns for 50 constituent stocks of the S&P 500 index are obtained using the yfinance Python package [25] for the period 2015–2024. Financial returns exhibit “stylized facts” [26] including heavy tails, volatility clustering, and occasional extreme events (market crashes), providing a natural contamination structure. Unlike the Synthetic dataset, where the true correlation is known by construction, real-world financial data lacks a definitive ground truth. To establish a robust reference, we calculate the daily market volatility (defined as the mean absolute return across all assets) and compute the standard Pearson correlation matrix using only the observations falling below the 90th percentile of volatility. This creates a baseline representing the market’s correlation structure during stable regimes, explicitly excluding the top 10% of volatility events. Consequently, the RMSE metric evaluates how well each estimator—operating on the full, unfiltered time series—can recover this underlying stable correlation structure. FORCE achieves competitive performance (RMSE 0.1474 vs. FastMCD’s 0.1434), with its adaptive TER mechanism effectively identifying and handling volatility clustering to recover the stable market structure while being

579 \times

faster.

ODDS-mammography: This medical anomaly detection benchmark contains 11,183 mammography scans with 6 extracted features [27]. The anomaly rate is 2.3%, representing a low-contamination scenario typical of quality-controlled medical data.

ODDS-satellite: This remote sensing dataset contains 6435 observations across 36 spectral features, with a high anomaly rate of 31.7% [27]. This dataset deliberately exceeds the theoretical breakdown point of FORCE (25%) to evaluate failure modes.

Genomics (transcriptomics): Gene expression data for 5000 samples across 100 genes are obtained from the Gemma database [28]. Genomic data are typically sparse with occasional technical artifacts from sequencing errors, representing a high-dimensional, low-contamination regime.

2.4.2. Baseline Algorithms

We compare FORCE against four baseline correlation estimators:

Pearson correlation: The standard sample correlation coefficient, computed as follows:

{\hat{r}}_{j k}^{Pearson} = \frac{\sum_{i = 1}^{N} (x_{i, j} - {\bar{x}}_{j}) (x_{i, k} - {\bar{x}}_{k})}{\sqrt{\sum_{i = 1}^{N} {(x_{i, j} - {\bar{x}}_{j})}^{2}} \cdot \sqrt{\sum_{i = 1}^{N} {(x_{i, k} - {\bar{x}}_{k})}^{2}}} .

(16)

This serves as the non-robust baseline with

O (N)

complexity and zero breakdown point.

Spearman’s rank correlation: The Pearson correlation applied to ranks:

{\hat{r}}_{j k}^{Spearman} = {\hat{r}}^{Pearson} (rank (x_{\cdot, j}), rank (x_{\cdot, k})),

(17)

where

rank (\cdot)

denotes the rank transformation. In all experiments, Spearman’s

ρ

is computed by applying a univariate rank transform to each variable (with average ranks for ties) and then evaluating the Pearson correlation of the transformed data, as in (17). This baseline is, therefore, applicable to quantitative streams (including returns and gene expression), but it is not designed for single-pass streaming: computing ranks requires sorting (or equivalent order-statistics), leading to the stated

O (N log N)

cost per dimension and the resulting

O (N p^{2} log N)

cost for a full

p \times p

correlation matrix.

Winsorized correlation: Correlation computed after Winsorizing each variable at the 5th and 95th percentiles:

x_{i, j}^{W} = max ({\hat{q}}_{0.05}, min (x_{i, j}, {\hat{q}}_{0.95})),

(18)

followed by Pearson correlation on the Winsorized data. This requires sorting to compute percentiles, yielding

O (N log N)

complexity.

FastMCD: The fast algorithm for the minimum covariance determinant estimator [17], implemented in the scikit-learn 1.6.1 Python library. FastMCD achieves approximately 50% breakdown point [17,22] but with complexity

O (N p^{2} + p^{3})

.

2.4.3. Evaluation Metrics

Execution time: Wall-clock time in milliseconds, measured using Python’s high-resolution timer perf_counter. Each experiment is repeated 20 times to compute the mean and standard deviation.

RMSE: The accuracy of correlation matrix reconstruction is measured as follows:

RMSE = \sqrt{\frac{2}{p (p - 1)} \sum_{j < k} {({\hat{r}}_{j k} - r_{j k}^{true})}^{2}},

(19)

where

r_{j k}^{true}

denotes the true (or reference) correlation. For the Synthetic dataset, the true correlation is known by construction. For real-world datasets, the reference is computed using FastMCD on the full dataset, assuming it captures the underlying clean correlation structure.

95% confidence intervals: Confidence intervals for both time and RMSE are computed using the standard formula:

{CI}_{95 %} = \bar{x} \pm 1.96 \cdot \frac{s}{\sqrt{n}},

(20)

where

\bar{x}

is the sample mean, s is the sample standard deviation, and

n = 20

is the number of runs.

2.4.4. Implementation Details

The FORCE algorithm is implemented in Python 3.11. To ensure fair comparison against optimized C/Fortran routines used by numpy and scikit-learn, the core computational loops of FORCE are compiled to machine code using Numba JIT (Just-In-Time) compilation [29]. This allows the Python implementation to achieve near-native performance.

All experiments are conducted on a workstation equipped with an Intel Core i9-12900K processor (16 cores, 3.2 GHz base frequency), 64 GB DDR5 RAM, running Ubuntu 22.04 LTS. The Python environment uses NumPy 1.26.0, SciPy 1.11.3, and scikit-learn 1.3.2.

For all experiments reported in this paper, FORCE uses pure

P^{2}

streaming from

N = 1

, as all benchmark datasets have sample sizes (

N \geq 1000

) well exceeding the warm-up period. The

P^{2}

markers are initialized with the first 5 observations sorted, then updated incrementally thereafter. For applications with smaller batches or requiring immediate accuracy, the implementation supports an optional hybrid mode that uses exact quantile computation (via numpy.percentile) for the first

N_{exact}

observations before transitioning to

P^{2}

streaming.

The source codes for FORCE and all experimental scripts are available at https://github.com/pz1004/force (accessed on 20 December 2025).

3. Results

This section presents the comprehensive experimental evaluation of FORCE against six baseline algorithms across five benchmark datasets. We compare FORCE against classical estimators (Pearson, Spearman, Winsorized), the high-breakdown FastMCD method, and two trimmed Pearson variants that use exact quantile computation: TP-Exact (trimmed Pearson with exact quantiles) and TP-TER (trimmed Pearson with exact quantiles and TER adaptation). The latter two baselines isolate the effect of FORCE’s streaming quantile approximation by providing the same trimming methodology with

O (N log N)

exact sorting. Four datasets have contamination rates below FORCE’s 25% breakdown point (Synthetic: 10%, S&P 500: ∼10%, mammography: 2.3%, Genomics: <1%), while one dataset (satellite: 31.7%) deliberately exceeds this threshold to validate the theoretical breakdown analysis. We first analyze computational scalability, then examine estimation accuracy, demonstrating FORCE’s strengths within its operating regime and confirming predicted behavior outside it.

3.1. Computational Scalability

Table 4 presents the complete execution time comparison across all algorithms and datasets. Each entry reports the mean execution time in milliseconds along with the standard deviation computed over 50 independent runs, as well as the 95% confidence interval.

3.1.1. FORCE vs. FastMCD: Breaking the Computational Bottleneck

The results demonstrate a dramatic speedup of FORCE over FastMCD across all datasets. Table 5 quantifies these speedup factors.

The average speedup of FORCE over FastMCD exceeds

470 \times

, with the maximum speedup of

785 \times

observed on the Genomics dataset. This speedup enables robust correlation estimation in real-time streaming applications where FastMCD’s latency of 325–1367 ms per update would be prohibitive.

The speedup is consistent across all datasets, ranging from approximately

214 \times

(satellite) to

785 \times

(Genomics). On the Genomics dataset (

p = 20

,

N = 1203

), FORCE achieves sub-millisecond execution time (0.41 ms) compared to FastMCD’s 325 ms.

3.1.2. FORCE vs. Exact Trimmed Methods: Speed-Memory Trade-Offs

A natural question is whether the

P^{2}

streaming approximation provides sufficient computational benefit over exact quantile computation. The comparison between FORCE and TP-Exact/TP-TER directly addresses this question.

Execution time comparison: FORCE achieves modest average speedups over exact trimmed methods:

1.39 \times

faster than TP-Exact and

1.42 \times

faster than TP-TER. However, performance varies substantially by dataset. On Synthetic and Genomics data, FORCE achieves speedups of

1.80

–

2.10 \times

. On S&P 500, the speedup narrows to

1.05

–

1.34 \times

. On mammography, FORCE is actually slower (

0.69

–

0.71 \times

), reflecting the overhead of maintaining

P^{2}

estimators when dataset size permits efficient in-memory sorting.

Memory requirements (the critical distinction): The modest execution time differences obscure a fundamental architectural distinction. TP-Exact and TP-TER require

O (N p)

memory to store all observations for sorting, whereas FORCE requires only

O (p)

memory for the

P^{2}

quantile markers (plus

O (p^{2})

for correlation accumulators, shared by all methods). For a stream of

N = 10^{9}

observations across

p = 100

dimensions with 8-byte floating-point values:

TP-Exact / TP-TER: ∼800 GB storage required.
FORCE: ∼80 KB storage required (constant in N).

This

10^{7}

-fold reduction in memory footprint represents FORCE’s primary architectural contribution. The algorithm targets true streaming environments—continuous sensor networks, high-frequency trading systems, edge computing deployments—where observations arrive indefinitely and cannot be retained for batch processing.

3.1.3. FORCE vs. Rank-Based Methods: Bypassing the Sorting Barrier

Compared to Spearman’s rank correlation, FORCE achieves consistent speedups ranging from

1.63 \times

(mammography) to

6.36 \times

(Synthetic), with an average speedup of

3.86 \times

. This improvement directly reflects the elimination of the

O (N log N)

sorting requirement.

The speedup over Spearman varies with dataset characteristics. The Synthetic dataset exhibits the highest speedup (

6.36 \times

) due to its moderate sample size (

N = 1000

) where sorting overhead is relatively more significant. The mammography dataset shows the lowest speedup (

1.63 \times

) because its smaller dimensionality (

p = 6

) reduces the number of pairwise correlations to compute.

Similarly, FORCE outperforms Winsorized correlation by factors of

1.04 \times

to

3.82 \times

(average

2.40 \times

), as Winsorization also requires sorting to determine percentile thresholds.

3.1.4. FORCE vs. Pearson: The Cost of Robustness

FORCE is approximately

5 \times

to

14 \times

slower than non-robust Pearson correlation, with an average slowdown factor of

6.25 \times

(equivalently, FORCE achieves

0.16 \times

the speed of Pearson). This overhead reflects the cost of maintaining streaming quantile estimators and performing the adaptive trimming operation.

Critically, this overhead is constant with respect to sample size N, as both algorithms scale linearly. The practical implication is that FORCE can process data at rates approximately one-sixth that of Pearson while providing robustness guarantees—a favorable trade-off in contaminated environments where Pearson’s zero breakdown point renders it unreliable.

Remark 13

(Execution Time Variance). Examination of Table 4 reveals that FORCE exhibits higher relative variance in execution time compared to Pearson. For example, on the S&P 500 dataset, FORCE achieves

1.85 \pm 0.76

ms (coefficient of variation CV

= 41 %

) versus Pearson’s

0.12 \pm 0.01

ms (CV

= 8 %

). This elevated variance arises from two sources:

Data-dependent

P^{2}

adjustments: The

P^{2}

algorithm performs marker position adjustments (Equations (4) and (5)) only when markers deviate from desired positions. The number and magnitude of adjustments depend on the data distribution and observation order, introducing run-to-run variability.

Adaptive trimming decisions: The number of observations passing the acceptance criterion varies across runs (due to different random seeds for data shuffling in cross-validation), affecting the number of arithmetic operations in the correlation computation.

For real-time system designers, this variance is typically acceptable: even at the upper 95% confidence bound, FORCE execution times remain under 3 ms for most datasets, providing ample margin for latency-critical applications. If deterministic timing is required, the variance can be reduced by preallocating memory and disabling dynamic marker adjustments after the warm-up period.

3.1.5. Execution Time Distribution

Figure 1 visualizes the execution time comparison on a logarithmic scale, illustrating the orders-of-magnitude difference between algorithm classes.

The visualization reveals three distinct performance tiers. FORCE, TP-Exact, and TP-TER occupy the same computational tier, confirming that the

P^{2}

approximation provides marginal speed benefits. The critical distinction between these methods lies not in execution time but in memory requirements: FORCE operates with

O (p)

memory independent of stream length, while exact methods require

O (N p)

storage.

3.2. Estimation Accuracy

Table 6 presents the RMSE comparison, measuring the accuracy of correlation matrix reconstruction relative to the ground truth.

Figure 2 provides a visual comparison of estimation accuracy across all algorithms and datasets.

The RMSE results reveal a nuanced performance landscape that varies substantially across datasets and contamination regimes. We analyze these results by dataset category.

3.2.1. Financial Data: Trimmed Methods Excel

On the S&P 500 financial dataset, trimmed Pearson methods achieve the best performance among all estimators. TP-Exact attains RMSE of

0.0902

and TP-TER achieves

0.0909

, followed by FORCE at

0.1186

. All three trimmed methods substantially outperform Spearman (

0.1240

), Winsorized (

0.1321

), Pearson (

0.1335

), and, notably, FastMCD (

0.1606

).

This result merits careful interpretation. The S&P 500 dataset contains daily log-returns characterized by “stylized facts” of financial time series, most notably volatility clustering: periods of relative calm interspersed with bursts of extreme variance during market stress events (e.g., the 2020 COVID-19 crash, 2022 inflation shocks).

Why trimmed methods outperform FastMCD: The counterintuitive result that coordinate-wise trimmed methods outperform FastMCD warrants examination. FastMCD identifies multivariate outliers via Mahalanobis distance and excludes them entirely. During volatility clustering, returns across multiple assets exhibit correlated extreme movements. These events are not outliers in the traditional sense—they represent genuine, economically meaningful phenomena that should inform the correlation structure. By excluding entire observations during market stress, FastMCD discards economically relevant covariance information.

Trimmed Pearson methods, including FORCE, perform coordinate-wise trimming: each dimension is trimmed independently based on its marginal distribution. This approach accommodates coherent tail expansion—when all assets experience elevated volatility—while still rejecting dimension-specific anomalies (e.g., data errors affecting a single stock).

FORCE vs. exact trimmed methods: The accuracy gap between FORCE (RMSE

0.1186

) and TP-Exact (RMSE

0.0902

) reflects the cost of streaming quantile approximation. For the S&P 500 dataset (

N = 6288

), the

P^{2}

algorithm’s five-marker approximation introduces quantile estimation error that propagates to the trimming boundaries. With exact sorting, TP-Exact identifies the true quartiles precisely, enabling more accurate outlier rejection.

Limited benefit of TER adaptation: Comparing TP-Exact (no TER, RMSE

0.0902

) with TP-TER (with TER, RMSE

0.0909

) reveals that the adaptive tail expansion mechanism provides minimal improvement on this dataset. For financial returns with approximately symmetric heavy tails, the TER ratio

| q_{0.99} - - q_{0.50} | / | q_{0.50} - - q_{0.01} |

remains close to 1, providing limited adaptation. The TER mechanism may provide greater benefit for data with pronounced asymmetric tails.

Remark 14

(Interpretation of S&P 500 Results). The S&P 500 “ground truth” is the correlation matrix computed from low-volatility days (below the 90th percentile), representing the stable market correlation structure. TP-Exact achieves the best RMSE (

0.0902

) on this benchmark. For batch analysis of historical financial data where accuracy is paramount, exact trimmed methods are recommended. FORCE’s value proposition for financial applications lies in streaming scenarios—real-time correlation monitoring, algorithmic trading systems, or live risk dashboards—where data cannot be stored for batch processing.

Sensitivity Analysis

To validate the robustness of these findings, we performed a sensitivity analysis by varying the volatility cutoff used to define the reference correlation (excluding top 5%, 10%, and 15% volatility days). As detailed in Appendix A, all trimmed methods (FORCE, TP-Exact, TP-TER) consistently outperformed FastMCD across all cutoffs, confirming that coordinate-wise trimming is more appropriate than multivariate outlier exclusion for financial time series exhibiting volatility clustering.

3.2.2. Medical Data: FastMCD Leads, FORCE Competitive

On the ODDS-mammography dataset, FastMCD achieves the best RMSE (

0.0074

), followed by FORCE (

0.0157

), TP-Exact (

0.0163

), and TP-TER (

0.0163

). Spearman (

0.0345

) and Winsorized (

0.0308

) perform notably worse.

The mammography dataset represents a low-contamination regime (2.3% anomaly rate) with well-separated point outliers—an ideal scenario for FastMCD’s multivariate outlier detection. In this setting, the high-breakdown MCD approach correctly identifies and excludes the small fraction of anomalous observations.

FORCE achieves accuracy within a factor of

2.1 \times

of FastMCD while providing a speedup of

307 \times

. Notably, FORCE slightly outperforms the exact trimmed methods on this dataset (RMSE

0.0157

vs.

0.0163

), though the practical difference is negligible.

The practical implication is significant: in medical imaging pipelines where real-time quality control is essential, FORCE enables robust correlation monitoring at processing rates exceeding 350 updates per second, compared to approximately 1.2 updates per second with FastMCD.

3.2.3. Genomic Data: Reference Method Dominates

The Genomics dataset presents a moderate-dimensional (

p = 20

), low-contamination (<1%) scenario. Spearman achieves effectively zero error (

0.0000

), while FORCE achieves

0.1267

—lower than TP-TER (

0.1319

), TP-Exact (

0.1571

), and FastMCD (

0.2583

).

This result reflects the experimental design: the reference correlation for the Genomics dataset is computed using Spearman correlation, so Spearman naturally achieves zero RMSE by definition. FORCE and other moment-based estimators diverge from this rank-based reference.

Among non-Spearman methods, FORCE achieves the lowest RMSE, outperforming both exact trimmed variants and FastMCD. This suggests that for data where the underlying correlation structure is monotonic but not necessarily linear, FORCE’s adaptive trimming provides effective robust estimation while preserving magnitude information that Spearman discards.

The computational advantage remains substantial: FORCE achieves a speedup of

785 \times

over FastMCD, maintaining sub-millisecond execution time (0.41 ms).

3.2.4. Synthetic Data: Controlled Contamination Analysis

The synthetic dataset, with 10% controlled Cauchy contamination, allows precise analysis of estimator behavior under known conditions. The true correlation matrix is known by construction, enabling exact RMSE computation.

FastMCD achieves the lowest RMSE (

0.0180

), demonstrating the gold-standard high-breakdown estimator’s effectiveness when contamination is clearly separable from the nominal distribution. The exact trimmed methods follow: TP-Exact (

0.0487

) and TP-TER (

0.0549

). FORCE achieves

0.2001

, followed by Spearman (

0.2075

), Winsorized (

0.2665

), and Pearson (

0.6074

).

The ordering FastMCD < TP-Exact < TP-TER < FORCE < Spearman reflects the hierarchy of robustness mechanisms under heavy-tailed contamination:

FastMCD: Multivariate outlier detection via Mahalanobis distance identifies Cauchy-contaminated observations with high precision due to their extreme multivariate leverage.
TP-Exact: Exact coordinate-wise trimming removes univariate extremes effectively; the precise quartile boundaries enable accurate outlier rejection.
TP-TER: TER adaptation slightly widens bounds under symmetric heavy tails, admitting some borderline observations.
FORCE: The $P^{2}$ approximation introduces quantile error, particularly during the early streaming phase when Cauchy outliers can influence marker positions.

The accuracy gap between FORCE and TP-Exact on synthetic data (

0.2001

vs.

0.0487

, a factor of

4.1 \times

) represents the accuracy cost of streaming quantile approximation under heavy-tailed contamination. Importantly, despite the Cauchy distribution having infinite variance, the TER mechanism does not exhibit pathological behavior. Because the Cauchy distribution is symmetric, the TER remains close to 1, and the acceptance bounds are determined primarily by the robust scale estimate

\hat{σ}

.

FastMCD’s superior accuracy in this controlled setting comes at a computational cost of

780 \times

slower execution. For batch analysis where accuracy is paramount, FastMCD or TP-Exact is recommended. For streaming applications requiring bounded memory, FORCE provides acceptable accuracy (RMSE

0.2001

vs. Pearson’s

0.6074

—a

3 \times

improvement) with

O (p)

memory.

3.3. Empirical Validation of the Breakdown Point: The ODDS-Satellite Stress Test

To empirically validate the theoretical breakdown point established in Theorem 1, we deliberately include the ODDS-satellite dataset, whose contamination rate of 31.7% exceeds the 25% threshold for IQR-based methods by a margin of 6.7 percentage points. This controlled stress test provides a critical scientific function: confirming that the breakdown behavior of all IQR-based estimators (FORCE, TP-Exact, TP-TER) matches theoretical predictions (Table 7).

The results precisely confirm the theoretical prediction for all IQR-based methods. FORCE, TP-Exact, and TP-TER all achieve RMSE values comparable to non-robust Pearson (∼0.72), demonstrating complete robustness collapse as theory predicts. The near-identical performance of all three methods (FORCE:

0.7274

, TP-Exact:

0.7160

, TP-TER:

0.7160

) confirms that they share the same fundamental limitation: the 25% breakdown point inherited from IQR-based scale estimation.

This equivalence under breakdown conditions reveals an important insight: the accuracy advantage of exact quantile methods over FORCE vanishes when the IQR itself becomes corrupted. When contamination exceeds 25%, the quartile estimates—whether computed exactly or approximately—are dominated by outliers, and trimming fails regardless of quantile precision.

Meanwhile, FastMCD maintains excellent accuracy (RMSE

0.0160

) because its approximately 50% breakdown point remains above the 31.7% contamination level.

Figure 3 illustrates this breakdown phenomenon.

This controlled experiment provides compelling empirical validation of the theoretical properties shared by all IQR-based trimming methods. Practitioners can confidently deploy FORCE, TP-Exact, or TP-TER in environments where contamination remains below 20% (providing a safety margin), with the assurance that their behavior is theoretically grounded and empirically verified. For offline forensic analysis of heavily corrupted datasets exceeding this threshold, high-breakdown methods such as FastMCD remain the appropriate choice.

3.4. Summary of Results

Table 8 provides a consolidated summary of FORCE’s performance characteristics relative to all baselines. We analyze the contamination levels relative to the breakdown points in Table 9.

The experimental results establish FORCE as a specialized solution for robust correlation estimation in true streaming environments—applications where data arrives continuously and cannot be stored for batch processing. When batch processing is acceptable and memory is unconstrained, exact trimmed methods (TP-Exact) provide superior accuracy at comparable speed. FORCE’s unique contribution is enabling robust estimation with bounded

O (p)

memory independent of stream length, filling a critical gap for applications such as continuous sensor monitoring, real-time financial analytics, and edge computing deployments where storage constraints preclude batch approaches.

3.5. Statistical Significance

To verify that the observed performance differences are statistically significant, we conducted paired t-tests comparing FORCE execution times against each baseline. Table 10 reports the p-values.

Execution time differences between FORCE and high-latency methods (FastMCD, Spearman, Winsorized) are statistically significant at the

α = 0.001

level, confirming that the observed speedups are not attributable to random variation. Differences between FORCE and exact trimmed methods (TP-Exact, TP-TER) achieve significance on some datasets (Synthetic, Mammography, Genomics at

p < 0.01

) but not others (S&P 500 vs. TP-TER:

p = 0.569

; Satellite vs. TP-Exact:

p = 0.112

). The non-significant p-values confirm that FORCE and exact trimmed methods occupy the same computational performance tier—their distinction lies in memory requirements rather than execution time.

4. Discussion

The experimental results presented in Section 3 establish FORCE as a viable solution for real-time robust correlation estimation in memory-constrained streaming environments, while also revealing important limitations and trade-offs relative to exact trimmed methods. This section interprets these findings in the context of the broader robust statistics literature, analyzes the mechanisms underlying FORCE’s performance characteristics, and provides practical guidance for method selection.

4.1. Positioning in the Estimator Landscape

The central contribution of FORCE is its ability to achieve robust correlation estimation with bounded memory in true streaming environments. As illustrated in Figure 4, classical estimators occupy two distinct regions of the speed–robustness plane: moment-based methods (Pearson) offer

O (N)

time per pairwise correlation but zero breakdown point, while robust alternatives (Spearman, MCD) provide protection against contamination at the cost of

O (N log N)

time per pairwise correlation or worse.

The experimental results reveal that FORCE shares the fast–robust region with exact trimmed methods (TP-Exact, TP-TER). All three methods achieve comparable execution times (averaging 1–3 ms across benchmarks) and identical 25% breakdown points inherited from IQR-based scale estimation. The critical distinction lies not in speed or robustness guarantees, but in memory requirements:

TP-Exact/TP-TER: Require $O (N p)$ storage to retain all observations for sorting. For a stream of $N = 10^{9}$ observations across $p = 100$ dimensions, this translates to ∼800 GB of storage.
FORCE: Requires only $O (p)$ storage for $P^{2}$ quantile markers (plus $O (p^{2})$ for correlation accumulators, shared by all methods). The same stream requires only ∼80 KB of storage—a reduction of seven orders of magnitude.

This architectural distinction determines method selection. When data can be stored for batch processing, exact trimmed methods provide superior accuracy at comparable speed. When data arrives as an unbounded stream and cannot be retained, FORCE is the only viable option among IQR-based trimmed estimators.

The Accuracy Cost of Streaming Quantile Approximation

A natural question is how the approximation error inherent in the

P^{2}

algorithm affects the final correlation estimates. The experimental results provide direct empirical evidence.

On the S&P 500 financial dataset, TP-Exact achieves RMSE of

0.0902

compared to FORCE’s

0.1186

—a difference of 24%. On the synthetic dataset with Cauchy contamination, the gap widens: TP-Exact achieves RMSE of

0.0487

versus FORCE’s

0.2001

—a factor of

4.1 \times

. These differences directly reflect the accuracy cost of streaming quantile approximation.

Let

ϵ_{q}

denote the relative error in a quantile estimate, i.e.,

{\hat{q}}_{ϕ} = q_{ϕ} (1 + ϵ_{q})

where

q_{ϕ}

is the true quantile. The robust scale estimate

\hat{σ}

(Equation (7)) depends on the difference

{\hat{q}}_{0.75} - {\hat{q}}_{0.25}

. Under typical conditions where both quartiles have similar relative errors, the absolute error in the IQR is approximately

| Δ IQR | \approx | ϵ_{q} | \cdot (q_{0.75} - q_{0.25}) = | ϵ_{q} | \cdot IQR .

(21)

This error propagates linearly to the trimming bounds (Equation (10)), meaning that a 1% error in quantile estimation leads to approximately a 1% error in the acceptance region width.

For large samples (

N \geq 1000

), the

P^{2}

algorithm typically achieves <0.5% relative error for central quantiles (

q_{0.25}

,

q_{0.75}

), resulting in minimal impact on trimming accuracy. However, under heavy-tailed contamination—as in the synthetic Cauchy experiment—outliers encountered early in the stream can persistently bias the

P^{2}

marker positions, leading to the larger accuracy gaps observed empirically.

The key insight is that this accuracy cost must be weighed against the memory savings. For applications where batch processing is feasible, the 24–400% accuracy improvement of exact methods justifies the

O (N p)

storage requirement. For true streaming applications where data cannot be retained, FORCE’s accuracy (e.g., RMSE

0.2001

vs. Pearson’s

0.6074

on synthetic data—a

3 \times

improvement) represents a meaningful robustness gain achievable within

O (p)

memory.

4.2. Financial Data: Trimmed Methods Outperform Multivariate Approaches

The experimental results on the S&P 500 dataset reveal an important finding: all coordinate-wise trimmed methods (TP-Exact, TP-TER, FORCE) substantially outperform FastMCD on financial time series. TP-Exact achieves the best RMSE (

0.0902

), followed by TP-TER (

0.0909

) and FORCE (

0.1186

), while FastMCD—despite its higher 50% breakdown point—achieves only

0.1606

.

This result can be understood through the lens of financial econometrics. Financial return series exhibit well-documented “stylized facts” [26], including heavy tails, volatility clustering (GARCH effects), and correlation asymmetry (correlations increase during market stress). These phenomena create a data environment fundamentally different from the symmetric contamination model assumed by classical robust statistics.

4.2.1. Why Coordinate-Wise Trimming Excels on Financial Data

Consider a market crash event. During such episodes, returns across most assets become simultaneously extreme and highly correlated—a phenomenon termed “correlation breakdown” or “flight to correlation” in the finance literature [30]. From the perspective of classical robust statistics, these observations appear as multivariate outliers: they lie far from the distributional center in Mahalanobis distance. Consequently, high-breakdown methods like FastMCD identify and exclude them.

However, excluding crash observations is economically inappropriate. The correlation structure during market stress is precisely what risk managers need to capture for value-at-risk calculations, stress testing, and hedging strategy design. An estimator that excludes crash observations provides a misleadingly optimistic picture of portfolio diversification benefits.

Coordinate-wise trimmed methods—including TP-Exact, TP-TER, and FORCE—perform marginal trimming independently for each dimension. When all assets experience elevated volatility simultaneously (coherent tail expansion), the marginal quantiles expand accordingly, and the trimming bounds widen to accommodate the legitimately extreme observations. This behavior preserves the economically meaningful correlation information embedded in market stress events.

4.2.2. Limited Benefit of TER Adaptation

The experimental results reveal that the tail expansion ratio (TER) mechanism provides minimal accuracy improvement on financial data. Comparing TP-Exact (no TER, RMSE

0.0902

) with TP-TER (with TER, RMSE

0.0909

) shows negligible difference.

This finding has a straightforward explanation. The TER (Equation (8)) measures asymmetry between upper and lower tails:

TER = | q_{0.99} - q_{0.50} | / | q_{0.50} - q_{0.01} |

. For financial returns, which exhibit approximately symmetric heavy tails, the TER remains close to 1 regardless of volatility level. The TER mechanism was designed to detect asymmetric tail expansion—scenarios where one tail grows while the other remains stable. For symmetric volatility clustering, the mechanism provides no additional information.

Future work could investigate asymmetric TER formulations for data with directional tail behavior, such as credit spreads (which exhibit pronounced right skewness) or options-implied volatilities.

4.2.3. Implications for Financial Applications

The experimental results suggest clear guidance for financial practitioners:

For batch analysis of historical returns (backtesting, model calibration), TP-Exact provides the best accuracy and should be preferred.
For real-time streaming applications (live risk monitoring, algorithmic trading), FORCE enables robust estimation with bounded memory. The 24% accuracy gap relative to TP-Exact is the cost of streaming capability.
FastMCD, despite its higher breakdown point, is not recommended for financial time series exhibiting volatility clustering, as it inappropriately excludes economically meaningful stress observations.

4.3. Comparison with the Recent Literature

The results of this study align with and extend several recent findings in the robust statistics and streaming algorithms literature.

4.3.1. Connection to Concept Drift Research

The TER mechanism shares conceptual similarities with concept drift detection in streaming machine learning [7]. Both approaches aim to distinguish genuine distributional shifts (which should be incorporated into the model) from transient anomalies (which should be rejected).

However, FORCE does not explicitly model concept drift; it assumes that the contamination model (Equation (1)) holds with stationary parameters. Extending FORCE to incorporate explicit drift detection and adaptation represents a promising direction for future research.

4.3.2. Connection to High-Dimensional Finance Research

Maddanu et al. [18] documented the computational infeasibility of Mahalanobis distance calculations in high-dimensional financial datasets, noting that memory and compute requirements scale as

O (p^{2})

and

O (p^{3})

, respectively. They concluded that practitioners often revert to less robust methods due to computational constraints.

Both FORCE and exact trimmed methods (TP-Exact, TP-TER) address this limitation, achieving

O (N p^{2})

time complexity that scales identically to non-robust Pearson correlation. The choice between them depends on memory constraints: TP-Exact requires

O (N p)

storage for batch sorting, while FORCE requires only

O (p)

storage for streaming quantile markers.

Our genomics experiment (

p = 20

) demonstrates that both approaches maintain sub-millisecond to low-millisecond execution times in moderately high-dimensional settings. For extremely-high-dimensional applications (

p > 1000

), the

O (p^{2})

complexity of computing the full correlation matrix becomes prohibitive, regardless of the estimation method, and practitioners typically employ dimensionality reduction or sparse correlation estimation.

4.3.3. Connection to Theoretical Robust Statistics

Loh [19] called for new robust estimators that relax classical equivariance requirements in exchange for computational tractability. FORCE exemplifies this trade-off.

Classical robust estimators like the MCD are affine-equivariant [22]: the estimate transforms appropriately under affine transformations of the data. This property ensures that the estimator is not biased by arbitrary scaling or rotation of the coordinate system. However, achieving affine equivariance requires estimation of the full covariance structure simultaneously, leading to the

O (p^{3})

matrix operations that dominate FastMCD’s complexity.

FORCE and exact trimmed methods sacrifice affine equivariance by estimating marginal quantiles independently for each dimension, then combining these estimates to perform coordinate-wise trimming. This approach cannot detect outliers that are extreme only in their multivariate structure (e.g., observations that are moderate in each marginal but lie far from the regression surface). However, it enables the

O (N p^{2})

complexity that makes real-time estimation feasible.

The empirical results suggest that this trade-off is favorable in many practical settings. Coordinate-wise trimming successfully identifies outliers in four of five benchmark datasets, failing only when contamination exceeds the 25% breakdown point.

4.3.4. The Scientific Value of Stress Testing

A rigorous evaluation of any statistical estimator requires testing beyond its design limits. The deliberate inclusion of the ODDS-satellite dataset—with contamination exceeding the 25% breakdown point shared by all IQR-based methods—serves this scientific purpose.

The stress test demonstrates three important properties: (1) the breakdown point of IQR-based methods is precisely characterized by theory; (2) the transition from robust to non-robust behavior occurs at the predicted threshold for FORCE, TP-Exact, and TP-TER alike; and (3) practitioners can reliably predict method applicability based on contamination estimates. The observation that all three methods exhibit identical breakdown behavior confirms their shared theoretical foundation and enables principled method selection based on memory constraints rather than robustness properties.

4.4. Applicability Bounds and Method Selection

The ODDS-satellite experiment empirically confirms the theoretical 25% breakdown point shared by all IQR-based trimmed methods (FORCE, TP-Exact, TP-TER). This validation is scientifically valuable: it demonstrates that the limitations of coordinate-wise trimming are well understood, predictable, and precisely characterized.

4.4.1. Theoretical Basis of the 25% Breakdown Point

The 25% breakdown point is a fundamental property of IQR-based scale estimation, rigorously established in the robust statistics literature [8]. When contamination exceeds 25%, the first and third quartiles are corrupted, causing the IQR to reflect the contamination distribution rather than the nominal distribution. This limitation applies equally to exact and approximate quantile computation—the satellite experiment confirms identical breakdown behavior for FORCE, TP-Exact, and TP-TER (all achieving RMSE ∼0.72, comparable to non-robust Pearson).

This shared limitation is not a flaw of any individual method, but, rather, a well-understood trade-off: IQR-based methods exchange the higher breakdown point of MCD-based estimators (approximately 50%) for the computational efficiency required in real-time applications. Alternative robust scale estimators with higher breakdown points exist—notably the median absolute deviation (MAD), which achieves 50% breakdown [8,9]—but incorporating MAD into streaming estimation would require tracking additional quantiles and increase computational overhead.

4.4.2. Multivariate Outliers

A second limitation, shared by all coordinate-wise trimming methods, is reduced sensitivity to multivariate outliers. Consider an observation that is unremarkable in each marginal distribution but lies far from the regression line relating two variables. Such an observation would not be flagged by coordinate-wise trimming but would be correctly identified by the MCD’s Mahalanobis distance criterion.

In practice, this limitation is most relevant for low-dimensional data (

p \leq 3

) where the regression structure is visually apparent and economically interpretable. For high-dimensional data, multivariate outliers become increasingly rare relative to marginal outliers, and the practical impact of this limitation diminishes.

4.4.3. Non-Stationary Contamination

FORCE assumes that contamination is approximately stationary over the observation window. If contamination intensity varies dramatically (e.g., a sensor that malfunctions intermittently), the streaming quantile estimates may not accurately reflect the current contamination level. In such settings, windowed variants of FORCE that discard old observations may be preferable. Exact trimmed methods, by recomputing quantiles on each batch, naturally adapt to non-stationary contamination when applied in a sliding-window fashion.

4.5. Practical Deployment Guidelines

Based on the theoretical analysis and experimental results, we offer the following guidelines for practitioners.

4.5.1. Method Selection Framework

When to choose FORCE:

Data arrives as an unbounded stream that cannot be stored (e.g., continuous sensor telemetry, real-time market data feeds).
Memory constraints preclude storing observations (e.g., edge computing devices, embedded systems).
Application requires online updates without access to historical data.
Contamination is expected to remain below 20% (providing safety margin relative to the 25% breakdown point).

When to choose TP-Exact:

Data can be stored for batch processing.
Maximum accuracy is required and the 24–400% improvement over FORCE justifies storage costs.
Analysis is performed offline (backtesting, model calibration, historical studies).
Sliding-window analysis is acceptable (recompute on each window).

When to choose FastMCD:

Contamination may exceed 25% and the higher breakdown point is essential.
Multivariate outlier detection is required (observations extreme only in joint structure).
Computational latency of 300–1400 ms per update is acceptable.
Offline forensic analysis of heavily corrupted datasets.

We summarize these recommendations in Table 11.

4.5.2. Parameter Selection

The threshold parameter

λ

(default 3.0) controls the trade-off between robustness and efficiency. Larger values of

λ

admit more observations, increasing efficiency but reducing robustness. For highly contaminated streams, reducing

λ

to 2.5 may improve performance. For low-contamination streams, increasing

λ

to 3.5 reduces unnecessary trimming.

4.5.3. Quantifying the Accuracy–Memory Trade-Off

Table 12 enables explicit quantification of FORCE’s accuracy–memory trade-off:

On average, FORCE achieves 82% of TP-Exact’s accuracy while requiring

10^{3}

–

10^{4} \times

less memory. For applications where this trade-off is acceptable, FORCE enables robust estimation in environments where exact methods are infeasible. For applications requiring maximum accuracy, TP-Exact should be preferred when storage permits.

5. Conclusions

This paper introduced FORCE (Fast Outlier-Robust Correlation Estimation), a streaming algorithm designed to enable robust correlation estimation in memory-constrained environments where data arrives as unbounded streams and cannot be retained for batch processing.

5.1. Summary of Contributions

The principal contributions of this work are fourfold:

First, we developed a novel algorithmic framework that achieves robust correlation estimation with bounded memory by replacing sorting operations with streaming quantile approximations based on the

P^{2}

algorithm. FORCE requires only

O (p)

memory for quantile markers—independent of stream length N—compared to the

O (N p)

storage required by exact trimmed methods (TP-Exact, TP-TER) that must retain all observations for sorting. This architectural distinction enables deployment in true streaming environments where exact methods are infeasible.

Second, we conducted comprehensive benchmarking, comparing FORCE against six baseline algorithms across five diverse datasets spanning synthetic, financial, medical, and genomic domains. The results demonstrate that FORCE achieves speedups of approximately

470 \times

over FastMCD and

3.9 \times

over Spearman’s rank correlation. Importantly, we also evaluated exact trimmed methods (TP-Exact, TP-TER) that share FORCE’s coordinate-wise trimming approach but use exact quantile computation. These comparisons reveal that FORCE and exact trimmed methods occupy the same computational performance tier (1–3 ms average execution time), with the critical distinction being memory requirements rather than speed.

Third, we demonstrated that coordinate-wise trimmed methods—including FORCE, TP-Exact, and TP-TER—outperform multivariate robust estimators (FastMCD) on financial time series exhibiting volatility clustering. On the S&P 500 dataset, TP-Exact achieved the best RMSE (

0.0902

), followed by TP-TER (

0.0909

) and FORCE (

0.1186

), while FastMCD achieved

0.1606

despite its higher breakdown point. This result reflects the fundamental difference between coordinate-wise and multivariate outlier treatment: coordinate-wise trimming accommodates coherent market-wide volatility events that multivariate methods inappropriately exclude. FORCE achieves 76% of TP-Exact’s accuracy on financial data while requiring

10^{4} \times

less memory, enabling real-time correlation monitoring in streaming environments where batch processing is infeasible.

Fourth, we provided rigorous characterization and empirical validation of the 25% breakdown point shared by all IQR-based trimmed methods. Using the ODDS-satellite dataset (31.7% contamination), we demonstrated that FORCE, TP-Exact, and TP-TER exhibit identical breakdown behavior—all degrading to RMSE ∼0.72, comparable to non-robust Pearson. This shared limitation confirms that the methods rest on a common theoretical foundation, and practitioners can select among them based on memory constraints rather than robustness properties.

5.2. Positioning in the Robust Statistics Literature

FORCE shares the fast–robust region of the estimator landscape with exact trimmed methods (TP-Exact, TP-TER). All three methods achieve

O (N p^{2})

time complexity for full correlation matrix computation and 25% breakdown points inherited from IQR-based scale estimation. The critical distinction is architectural:

Exact trimmed methods (TP-Exact, TP-TER) achieve superior accuracy by computing exact quantiles via $O (N log N)$ sorting, but require $O (N p)$ storage to retain observations.
FORCE accepts an accuracy cost (averaging 82% of TP-Exact’s accuracy across benchmarks) in exchange for $O (p)$ memory independent of stream length.

This positioning clarifies FORCE’s role: it is not a universal replacement for exact trimmed methods, but, rather, a specialized solution for the increasingly important domain of memory-constrained streaming analytics. When batch processing is acceptable, TP-Exact provides better accuracy at comparable speed. When data cannot be stored, FORCE is the only viable option among IQR-based trimmed estimators.

The algorithm embodies a principled trade-off: by sacrificing quantile precision and accepting an accuracy cost of 18–76% relative to exact methods, FORCE achieves the bounded-memory property necessary for deployment in true streaming environments. Our results demonstrate that this trade-off is favorable for applications where memory constraints preclude batch processing.

5.3. Implications for Practice

The practical implications of this work extend across multiple application domains:

Quantitative finance: For batch analysis of historical returns (backtesting, model calibration), TP-Exact provides the best accuracy among fast methods and should be preferred. For real-time streaming applications (live risk monitoring, algorithmic trading), FORCE enables robust estimation with bounded memory, achieving 76% of TP-Exact’s accuracy while requiring no data storage. All coordinate-wise trimmed methods outperform FastMCD on financial data exhibiting volatility clustering, as they correctly treat coherent market-wide stress events as legitimate phenomena rather than outliers. This finding has important implications for risk management: correlation estimates during market stress—precisely when accurate estimates are most critical—are better captured by trimmed methods than by multivariate robust approaches.

Internet of Things: In IoT deployments with thousands of sensors generating continuous telemetry, FORCE enables correlation-based anomaly detection at scale. The

O (p)

memory footprint for quantile markers (plus

O (p^{2})

for correlation accumulators) makes deployment feasible on edge computing devices with limited resources. For applications where sensor data can be batched and stored, TP-Exact provides superior accuracy.

Genomics and bioinformatics: High-throughput sequencing generates massive correlation matrices for gene coexpression analysis. Both FORCE and exact trimmed methods scale linearly with sample size, enabling robust estimation on large datasets. The choice between them depends on whether the analysis pipeline can accommodate

O (N p)

storage for batch quantile computation.

5.4. Limitations

We acknowledge several limitations of the current work:

Accuracy cost of streaming approximation: The experimental results quantify the accuracy cost of

P^{2}

streaming quantile approximation: FORCE achieves 76–124% of TP-Exact’s accuracy depending on the dataset, with the largest gap (24% of TP-Exact) observed on synthetic data with heavy-tailed Cauchy contamination. Applications requiring maximum accuracy should use exact trimmed methods when storage permits.

Limited benefit of TER mechanism: The tail expansion ratio (TER) mechanism, designed to detect asymmetric tail expansion, provides minimal accuracy improvement on data with symmetric heavy tails (e.g., financial returns). The comparison between TP-Exact and TP-TER (RMSE

0.0902

vs.

0.0909

on S&P 500) confirms that TER adaptation offers negligible benefit when tails expand symmetrically. The TER may provide greater value for data with pronounced asymmetric tails, but this remains to be validated empirically.

25% breakdown point: The breakdown point, while sufficient for many applications, is lower than the approximately 50% achieved by MCD-based methods. This limitation is shared by all IQR-based trimmed methods (FORCE, TP-Exact, TP-TER) and is intrinsic to quartile-based scale estimation. Applications with contamination rates exceeding 20% should employ these methods with caution or consider FastMCD despite its computational cost.

Reduced sensitivity to multivariate outliers: Coordinate-wise trimming cannot detect observations that are extreme only in their joint structure. This limitation is shared by all trimmed Pearson variants and is most relevant for low-dimensional data (

p \leq 3

) where multivariate outlier structure is economically interpretable.

The current implementation assumes independent, identically distributed observations. Extension to time-series data with serial dependence (e.g., autoregressive contamination) requires additional theoretical development.

The asymptotic distribution of FORCE estimates has not been derived, limiting the ability to construct confidence intervals or perform hypothesis tests. Future theoretical work should address this gap.

P^{2}

approximation error in small samples: FORCE relies on the

P^{2}

algorithm for streaming quantile estimation, which achieves high accuracy for large sample sizes (<0.5% relative error for

N \geq 1000

) but exhibits larger errors in small-N scenarios. For samples with

N < 200

, applications should consider using exact quantile computation (TP-Exact) or employing a hybrid approach that transitions from exact to streaming quantiles after sufficient data accumulates.

5.5. Future Research Directions

Several promising directions emerge from this work:

Reducing the accuracy gap: Investigating alternative streaming quantile algorithms (e.g., t-digest [31], GK summaries [32]) that provide tighter error bounds than

P^{2}

could narrow the accuracy gap between FORCE and exact trimmed methods while preserving bounded memory. The fundamental question is whether streaming quantile approximations can achieve accuracy comparable to exact computation for robust correlation estimation.

Higher breakdown points: Investigating alternative streaming scale estimators (e.g., streaming MAD approximations) could increase the breakdown point of streaming trimmed methods from 25% toward 50% while preserving

O (p)

memory complexity.

Multivariate extension: Incorporating lightweight multivariate outlier detection (e.g., based on streaming Mahalanobis distance approximations) could address the reduced sensitivity to multivariate outliers shared by all coordinate-wise trimmed methods.

Adaptive windowing: Developing variants of FORCE that automatically adapt the effective window size based on detected concept drift would enhance robustness in non-stationary environments. Exact trimmed methods naturally adapt when applied in a sliding-window fashion; extending this capability to streaming estimation is non-trivial.

Theoretical analysis: Deriving the asymptotic distribution of FORCE estimates under the contamination model would enable formal statistical inference and provide theoretical guarantees complementing the empirical results presented here.

Hardware acceleration: The embarrassingly parallel structure of FORCE (independent quantile estimation per dimension, independent trimming per dimension pair) makes it amenable to GPU acceleration. Exploring CUDA implementations could yield additional order-of-magnitude speedups for very high-dimensional applications.

5.6. Concluding Remarks

The exponential growth of high-dimensional data streams across finance, IoT, and genomics has created an urgent need for statistical methods that are simultaneously fast, robust, and memory-efficient. This work contributes to addressing this need by introducing FORCE and by systematically evaluating the trade-offs among streaming and batch approaches to robust correlation estimation.

Our experimental results establish a clear method selection framework:

For batch processing where data can be stored, exact trimmed methods (TP-Exact) provide the best accuracy among fast robust estimators.
For memory-constrained streaming where data cannot be retained, FORCE is the only viable option among IQR-based trimmed methods, achieving meaningful robustness (RMSE improvements of 2– $3 \times$ over non-robust Pearson) within $O (p)$ memory.
For heavily contaminated data exceeding the 25% breakdown point, FastMCD remains necessary despite its computational cost.

We conclude that FORCE represents a specialized but important advance in the practical deployment of robust statistics. For true streaming applications with moderate contamination and memory constraints, it provides the only path to robust correlation estimation at the processing rates demanded by modern data infrastructure. As edge computing proliferates and streaming data volumes continue to grow, the importance of memory-efficient robust methods will only increase.

The broader contribution of this work is methodological: by systematically comparing streaming approximation against exact batch computation, we provide practitioners with the empirical evidence needed to make informed trade-offs between accuracy and memory efficiency. The choice between FORCE and exact trimmed methods is not a question of which is “better”, but, rather, which constraints dominate in a given application.

Supplementary Materials

For additional visualization of the experimental results, we provide two Supplementary Figures. The supporting information can be downloaded at: https://github.com/pz1004/force (accessed on 12 December 2025), Figure S1: Heatmap of FORCE speedup factors relative to baseline algorithms across all datasets; Figure S2: Combined view of FORCE performance.

Author Contributions

Conceptualization, S.J. and C.C.; methodology, S.J.; software, S.J.; validation, S.J. and C.C.; formal analysis, S.J.; investigation, S.J.; resources, S.J.; data curation, S.J.; writing—original draft preparation, S.J.; writing—review and editing, S.J. and C.C.; visualization, S.J.; supervision, C.C.; project administration, C.C.; funding acquisition, S.J. and C.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the research fund of Hanbat National University in 2024, Grant Number HNU-2024-000001.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original data presented in the study are openly available at https://github.com/pz1004/force (accessed on 12 December 2025)/Supplementary Materials.

Acknowledgments

During the preparation of this manuscript/study, the authors used ChatGPT 5.2 and Gemini 3.0 for the purposes of formatting and English grammar correction. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

FORCE	Fast Outlier-Robust Correlation Estimation
MCD	Minimum covariance determinant
IQR	Interquartile range
TER	Tail expansion ratio
RMSE	Root mean square error
MAD	Median absolute deviation
PCA	Principal component analysis
IoT	Internet of Things
HFT	High-frequency trading
GARCH	Generalized Autoregressive Conditional Heteroskedasticity
CI	Confidence interval

Appendix A. Sensitivity Analysis of Financial Data Results

To verify that the superior performance of coordinate-wise trimmed methods on the S&P 500 dataset is not an artifact of the specific volatility threshold used to define the ground truth, we conducted a sensitivity analysis varying the cutoff parameter. The reference correlation matrices were constructed by excluding observations with daily volatility in the top 5%, 10%, and 15%, respectively.

Table A1 summarizes the RMSE of each estimator across these thresholds. The exact trimmed methods (TP-Exact, TP-TER) consistently achieve the lowest RMSE, followed by FORCE, confirming that coordinate-wise trimming outperforms multivariate outlier exclusion across all cutoff definitions. FORCE achieves 75–85% of TP-Exact’s accuracy across all thresholds while requiring

O (p)

memory versus

O (N p)

storage. Notably, FastMCD consistently underperforms compared to all coordinate-wise trimmed methods, reinforcing the observation that multivariate outlier detection tends to over-reject informative volatility clustering events.

Table A1. RMSE sensitivity analysis on S&P 500 data across varying volatility cutoffs. Exact trimmed methods (TP-Exact, TP-TER) achieve the best accuracy; FORCE achieves competitive accuracy with bounded memory. All coordinate-wise trimmed methods substantially outperform FastMCD.

Estimator	5% Cutoff	10% Cutoff	15% Cutoff	Memory
TP-Exact	0.0856	0.0902	0.0948	$O (N p)$
TP-TER	0.0862	0.0909	0.0955	$O (N p)$
FORCE (proposed)	0.1017	0.1186	0.1302	$O (p)$
Spearman	0.1087	0.1240	0.1373	$O (N p)$
Winsorized	0.1193	0.1321	0.1456	$O (N p)$
FastMCD	0.1420	0.1606	0.1751	$O (N p)$
Pearson	0.1500	0.1335	0.1433	$O (p)$

Note: The sensitivity analysis confirms two key findings: (1) all coordinate-wise trimmed methods (TP-Exact, TP-TER, FORCE) outperform FastMCD regardless of the volatility cutoff used to define the stable regime; and (2) the accuracy ranking (TP-Exact > TP-TER > FORCE) is consistent across cutoffs, validating that the accuracy gap reflects the fundamental cost of streaming quantile approximation rather than an artifact of experimental design.

Appendix B. Analysis of Tail Expansion Ratio (TER) Trade-Offs

To characterize the statistical behavior of the adaptive TER mechanism, we compared FORCE (which uses TER) against a baseline utilizing fixed trimming (TER ≡ 1) under two distinct simulation regimes:

Asymmetric contamination: A Gaussian core ( $ρ = 0.6$ ) contaminated with one-sided outliers at $+ 8 σ$ .
Coherent heavy tails: A multivariate Student’s t-distribution ( $d f \in {3, 5, 10}$ ), representing coherent volatility clustering often seen in financial markets.

The results, presented in Table A2, highlight a bias–variance trade-off. Under asymmetric contamination, TER interprets the skew as tail expansion and widens the acceptance bounds, resulting in higher MSE compared to fixed trimming. However, under coherent heavy tails (Student’s t), fixed trimming indiscriminately discards valid tail observations, whereas TER adapts the bounds to retain them. Consequently, FORCE with TER achieves lower MSE in heavy-tailed regimes.

Implications for method selection: The TER mechanism provides marginal benefit for data with symmetric heavy tails (as observed in the S&P 500 comparison where TP-Exact and TP-TER achieve nearly identical RMSE). For data with pronounced asymmetric tails, disabling TER (or using exact trimmed methods with fixed bounds) may be preferable. This design choice reflects FORCE’s optimization for financial applications where volatility clustering is the primary concern.

Table A2. MSE comparison of FORCE (with TER) vs. fixed trimming. TER exhibits higher error under asymmetric contamination but improves estimation accuracy (lower MSE) under coherent heavy-tailed distributions.

Scenario	Parameter	MSE (TER)	MSE (Fixed)
Asymmetric contamination	$ε = 0.05$	0.1013	0.0003
Asymmetric contamination	$ε = 0.10$	0.1549	0.0002
Coherent heavy tails (t-dist)	$d f = 10$	0.0010	0.0011
Coherent heavy tails (t-dist)	$d f = 5$	0.0029	0.0032
Coherent heavy tails (t-dist)	$d f = 3$	0.0055	0.0064

References

Jolliffe, I.T. Principal Component Analysis, 2nd ed.; Springer: New York, NY, USA, 2002. [Google Scholar]
Markowitz, H. Portfolio Selection. J. Financ. 1952, 7, 77–91. [Google Scholar]
Raymaekers, J.; Rousseeuw, P.J. Fast Robust Correlation for High-Dimensional Data. Technometrics 2021, 63, 184–198. [Google Scholar] [CrossRef]
Luecken, M.D.; Theis, F.J. Current Best Practices in Single-Cell RNA-Seq Analysis: A Tutorial. Mol. Syst. Biol. 2019, 15, e8746. [Google Scholar] [CrossRef] [PubMed]
Cook, A.A.; Mısırlı, G.; Fan, Z. Anomaly Detection for IoT Time-Series Data: A Survey. IEEE Internet Things J. 2020, 7, 6481–6494. [Google Scholar] [CrossRef]
Aït-Sahalia, Y.; Xiu, D. Principal Component Analysis of High-Frequency Data. J. Am. Stat. Assoc. 2019, 114, 287–303. [Google Scholar] [CrossRef]
Carnier, R.M.; Lahesoo, L.; Fukuda, K. Binary Anomaly Detection in Streaming IoT Traffic under Concept Drift. arXiv 2025, arXiv:2510.27304. [Google Scholar] [CrossRef]
Huber, P.J. Robust Statistics; Wiley: New York, NY, USA, 1981. [Google Scholar]
Hampel, F.R.; Ronchetti, E.M.; Rousseeuw, P.J.; Stahel, W.A. Robust Statistics: The Approach Based on Influence Functions; Wiley: New York, NY, USA, 1986. [Google Scholar]
Mirsky, Y.; Doitshman, T.; Elovici, Y.; Shabtai, A. Kitsune: An Ensemble of Autoencoders for Online Network Intrusion Detection. In Proceedings of the Network and Distributed System Security Symposium (NDSS), San Diego, CA, USA, 18–21 February 2018. [Google Scholar]
Gnanadesikan, R.; Kettenring, J.R. Robust Estimates, Residuals, and Outlier Detection with Multiresponse Data. Biometrics 1972, 28, 81–124. [Google Scholar] [CrossRef]
Devlin, S.J.; Gnanadesikan, R.; Kettenring, J.R. Robust Estimation of Dispersion Matrices and Principal Components. J. Am. Stat. Assoc. 1981, 76, 354–362. [Google Scholar] [CrossRef]
Rousseeuw, P.J. Least Median of Squares Regression. J. Am. Stat. Assoc. 1984, 79, 871–880. [Google Scholar] [CrossRef]
Kendall, M.G. A New Measure of Rank Correlation. Biometrika 1938, 30, 81–93. [Google Scholar] [CrossRef]
Croux, C.; Dehon, C. Influence Functions of the Spearman and Kendall Correlation Measures. Stat. Methods Appl. 2010, 19, 497–515. [Google Scholar] [CrossRef]
Shevlyakov, G.; Smirnov, P. Robust Estimation of the Correlation Coefficient: An Attempt of Survey. Austrian J. Stat. 2011, 40, 147–156. [Google Scholar]
Rousseeuw, P.J.; Van Driessen, K. A Fast Algorithm for the Minimum Covariance Determinant Estimator. Technometrics 1999, 41, 212–223. [Google Scholar] [CrossRef]
Maddanu, F.; Proietti, T.; Crupi, R. Anomaly Detection in High-Dimensional Bank Account Balances via Robust Methods. arXiv 2025, arXiv:2511.11143. [Google Scholar] [CrossRef]
Loh, P.-L. A Theoretical Review of Modern Robust Statistics. Annu. Rev. Stat. Appl. 2025, 12, 477–496. [Google Scholar] [CrossRef]
Yin, D.; Chen, Y.; Kannan, R.; Bartlett, P. Byzantine-Robust Distributed Learning: Towards Optimal Statistical Rates. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 5650–5659. [Google Scholar]
Zhao, P.; Yu, F.; Wan, Z. A huber loss minimization approach to byzantine robust federated learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; pp. 21806–21814. [Google Scholar]
Maronna, R.A.; Martin, R.D.; Yohai, V.J.; Salibián-Barrera, M. Robust Statistics: Theory and Methods (with R), 2nd ed.; Wiley: Hoboken, NJ, USA, 2019. [Google Scholar]
Jain, R.; Chlamtac, I. The P² Algorithm for Dynamic Calculation of Quantiles and Histograms Without Storing Observations. Commun. ACM 1985, 28, 1076–1085. [Google Scholar] [CrossRef]
Chattamvelli, R. Correlation in Engineering and the Applied Sciences; Springer: Cham, Switzerland, 2024. [Google Scholar]
Aroussi, R. yfinance: Yahoo! Finance Market Data Downloader. Python Package Version 0.2.66. Available online: https://pypi.org/project/yfinance/ (accessed on 28 November 2025).
Cont, R. Empirical properties of asset returns: Stylized facts and statistical issues. Quant. Financ. 2001, 1, 223–236. [Google Scholar] [CrossRef]
Rayana, S. ODDS Library. Available online: https://shebuti.com/outlier-detection-datasets-odds/ (accessed on 28 November 2025).
Lim, N.; Tesar, S.; Belmadani, M.; Poirier-Morency, G.; Mancarci, B.O.; Sicherman, J.; Jacobson, M.; Leong, J.; Tan, P.; Pavlidis, P. Curation of Over 10,000 Transcriptomic Studies to Enable Data Reuse. Database 2021, 2021, baab006. [Google Scholar] [CrossRef]
Lam, S.K.; Pitrou, A.; Seibert, S. Numba: A LLVM-based Python JIT compiler. In Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, Austin, TX, USA, 15 November 2015; pp. 1–6. [Google Scholar]
Longin, F.; Solnik, B. Extreme Correlation of International Equity Markets. J. Financ. 2001, 56, 649–676. [Google Scholar] [CrossRef]
Dunning, T.; Ertl, O. Computing Extremely Accurate Quantiles Using t-Digests. arXiv 2019, arXiv:1902.04023. [Google Scholar] [CrossRef]
Greenwald, M.; Khanna, S. Space-Efficient Online Computation of Quantile Summaries. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Santa Barbara, CA, USA, 21–24 May 2001; pp. 58–66. [Google Scholar]

Figure 1. Execution time comparison across algorithms and datasets (log scale). Three distinct performance tiers emerge: (1) Sub-millisecond to low-millisecond (Pearson at ∼0.09–1.1 ms; FORCE, TP-Exact, TP-TER at ∼0.4–7.7 ms); (2) Low-millisecond (Spearman at ∼2.5–16.1 ms, Winsorized at ∼1.6–10.8 ms); (3) High-latency (FastMCD at ∼325–1367 ms). Error bars indicate 95% confidence intervals. FORCE occupies the first tier alongside exact trimmed methods, achieving an average speedup of

470 \times

over FastMCD and

3.9 \times

over Spearman. The distinction between FORCE and exact trimmed methods lies in memory requirements (

O (p)

vs.

O (N p)

), not execution time.

Figure 1. Execution time comparison across algorithms and datasets (log scale). Three distinct performance tiers emerge: (1) Sub-millisecond to low-millisecond (Pearson at ∼0.09–1.1 ms; FORCE, TP-Exact, TP-TER at ∼0.4–7.7 ms); (2) Low-millisecond (Spearman at ∼2.5–16.1 ms, Winsorized at ∼1.6–10.8 ms); (3) High-latency (FastMCD at ∼325–1367 ms). Error bars indicate 95% confidence intervals. FORCE occupies the first tier alongside exact trimmed methods, achieving an average speedup of

470 \times

over FastMCD and

3.9 \times

over Spearman. The distinction between FORCE and exact trimmed methods lies in memory requirements (

O (p)

vs.

O (N p)

), not execution time.

Figure 2. RMSE comparison across algorithms and datasets (lower is better). Key observations: (1) On S&P 500 financial data, exact trimmed methods achieve the lowest RMSE (TP-Exact: 0.0902, TP-TER: 0.0909), with FORCE competitive at 0.1186—all substantially outperforming FastMCD (0.1606); (2) FastMCD achieves optimal accuracy on controlled contamination scenarios (Synthetic: 0.0180, Mammography: 0.0074); (3) On the satellite dataset (31.7% contamination > 25% breakdown point), all IQR-based methods (FORCE, TP-Exact, TP-TER) exhibit expected degradation, validating the theoretical breakdown analysis.

Figure 3. Empirical validation of the theoretical breakdown point for IQR-based trimmed methods. (a) Mechanism of IQR corruption: when contamination is below 25% (top), the IQR remains intact and outliers are correctly rejected; when contamination exceeds 25% (bottom), the IQR becomes corrupted, causing the acceptance region to expand and admit outliers. (b) RMSE vs. contamination rate across all five datasets. The four datasets within the 25% operating regime (green region) achieve competitive RMSE for all IQR-based methods (FORCE, TP-Exact, TP-TER), while the satellite dataset (red region, 31.7% contamination) confirms identical breakdown behavior—all three methods degrade to RMSE ∼0.72, comparable to non-robust Pearson.

Figure 4. Position of correlation estimators in the speed–robustness plane. The x-axis shows average execution time (log scale, milliseconds) and the y-axis shows breakdown point (%). Three regions are identified: Fast–Fragile (red, bottom-left): Pearson offers speed but zero breakdown point; Slow–Robust (blue, top-right): FastMCD achieves ∼50% breakdown point but requires >300 ms average execution time; Fast–Robust (green, top-left): FORCE, TP-Exact, and TP-TER occupy this region, achieving

O (N p^{2})

time complexity with 25% breakdown points. Within the fast–robust region, the key distinction is memory requirements: exact methods require

O (N p)

storage for sorting, while FORCE requires only

O (p)

storage independent of stream length.

Figure 4. Position of correlation estimators in the speed–robustness plane. The x-axis shows average execution time (log scale, milliseconds) and the y-axis shows breakdown point (%). Three regions are identified: Fast–Fragile (red, bottom-left): Pearson offers speed but zero breakdown point; Slow–Robust (blue, top-right): FastMCD achieves ∼50% breakdown point but requires >300 ms average execution time; Fast–Robust (green, top-left): FORCE, TP-Exact, and TP-TER occupy this region, achieving

O (N p^{2})

time complexity with 25% breakdown points. Within the fast–robust region, the key distinction is memory requirements: exact methods require

O (N p)

storage for sorting, while FORCE requires only

O (p)

storage independent of stream length.

Table 1. Baseline performance of the FastMCD algorithm across benchmark datasets. The execution times, measured in milliseconds, illustrate the computational bottleneck that precludes real-time deployment. Results represent the mean ± standard deviation over 20 independent runs.

Dataset	Time (ms)	RMSE
Synthetic	$678.72 \pm 3.45$	$0.0180$
S&P 500	$494.19 \pm 4.06$	$0.1606$
ODDS-mammography	$858.88 \pm 3.30$	$0.0074$
ODDS-satellite	$1367.08 \pm 13.59$	$0.0160$
Genomics	$325.14 \pm 3.87$	$0.2583$

Table 2. Computational complexity comparison of correlation estimation methods. Here, N denotes sample size and p denotes dimensionality. FORCE achieves the same time complexity as exact trimmed methods while requiring only

O (p^{2})

memory independent of stream length.

Table 2. Computational complexity comparison of correlation estimation methods. Here, N denotes sample size and p denotes dimensionality. FORCE achieves the same time complexity as exact trimmed methods while requiring only

O (p^{2})

memory independent of stream length.

Algorithm	Time Complexity	Space Complexity	Breakdown Point
Pearson Correlation	$O (N p^{2})$	$O (p^{2})$	$0 %$
Spearman’s $ρ$	$O (N p^{2} log N)$	$O (N p)$	≈0%
Winsorized Correlation	$O (N p^{2} log N)$	$O (N p)$	Variable
TP-Exact (Exact Trimmed)	$O (N p^{2} log N)$	$O (N p)$	$25 %$
TP-TER (Exact Trimmed + TER)	$O (N p^{2} log N)$	$O (N p)$	$25 %$
FastMCD	$O (N p^{2} + p^{3})$	$O (N p + p^{2})$	≈50%
FORCE (proposed)	$O ({Np}^{2})$	$O (p^{2})$	$25 %$

Table 3. Summary of benchmark datasets used for evaluation. The “Within FORCE Limit?” column indicates whether contamination is below FORCE’s theoretical 25% breakdown point. The ODDS-satellite dataset is deliberately included to validate the breakdown point prediction.

Dataset	N	p	Contamination	Within FORCE Limit?	Domain
Synthetic	1000	50	10%	Yes (<25%)	Simulation
S&P 500	6288	12	∼10%	Yes (<25%)	Finance
ODDS-mammography	11,183	6	2.3%	Yes (<25%)	Medical
ODDS-satellite	6435	36	31.7%	No (>25%)	Remote sensing
Genomics	1203	20	<1%	Yes (<25%)	Genomics

Table 4. Execution time comparison (milliseconds) across five benchmark datasets. Values represent mean ± standard deviation over 20 independent runs. The 95% confidence intervals are shown in brackets below each estimate. TP-Exact = TrimmedPearsonExact without TER; TP-TER = TrimmedPearsonExact with TER optimization.

Dataset	Pearson	Spearman	Winsorized	FastMCD	TP-Exact	TP-TER	FORCE
Synthetic	$0.20 \pm 0.01$	$5.54 \pm 0.12$	$3.19 \pm 0.13$	$678.72 \pm 3.45$	$1.68 \pm 0.05$	$1.82 \pm 0.57$	$0.87 \pm 0.52$
	$[0.20, 0.20]$	$[5.50, 5.57]$	$[3.16, 3.23]$	$[677.76, 679.67]$	$[1.66, 1.69]$	$[1.67, 1.98]$	$[0.73, 1.01]$
S&P 500	$0.12 \pm 0.01$	$5.23 \pm 0.14$	$3.32 \pm 0.14$	$494.19 \pm 4.06$	$2.48 \pm 1.75$	$1.94 \pm 0.68$	$1.85 \pm 0.76$
	$[0.12, 0.13]$	$[5.20, 5.27]$	$[3.28, 3.36]$	$[493.07, 495.32]$	$[2.00, 2.97]$	$[1.75, 2.13]$	$[1.64, 2.07]$
ODDS-mammography	$0.27 \pm 0.02$	$4.56 \pm 0.18$	$2.90 \pm 0.10$	$858.88 \pm 3.30$	$1.99 \pm 0.83$	$1.94 \pm 0.92$	$2.80 \pm 1.10$
	$[0.27, 0.28]$	$[4.51, 4.61]$	$[2.87, 2.93]$	$[857.96, 859.79]$	$[1.76, 2.22]$	$[1.68, 2.19]$	$[2.49, 3.10]$
ODDS-satellite	$1.12 \pm 0.15$	$16.11 \pm 0.37$	$10.84 \pm 0.28$	$1367.08 \pm 13.59$	$7.45 \pm 0.53$	$7.65 \pm 0.52$	$6.38 \pm 5.00$
	$[1.08, 1.16]$	$[16.01, 16.22]$	$[10.76, 10.92]$	$[1363.31, 1370.84]$	$[7.30, 7.60]$	$[7.50, 7.79]$	$[4.99, 7.77]$
Genomics	$0.09 \pm 0.01$	$2.47 \pm 0.11$	$1.58 \pm 0.10$	$325.14 \pm 3.87$	$0.75 \pm 0.69$	$0.87 \pm 0.90$	$0.41 \pm 0.12$
	$[0.08, 0.09]$	$[2.44, 2.50]$	$[1.55, 1.61]$	$[324.07, 326.21]$	$[0.55, 0.94]$	$[0.62, 1.12]$	$[0.38, 0.45]$

Table 5. Speedup factors achieved by FORCE relative to baseline algorithms. Values greater than 1.0 indicate FORCE is faster; values less than 1.0 indicate FORCE is slower. TP-Exact = TrimmedPearsonExact without TER; TP-TER = TrimmedPearsonExact with TER.

Dataset	vs. Pearson	vs. Spearman	vs. Winsorized	vs. FastMCD	vs. TP-Exact	vs. TP-TER
Synthetic	$0.23 \times$	$6.36 \times$	$3.67 \times$	$779.95 \times$	$1.93 \times$	$2.10 \times$
S&P 500	$0.07 \times$	$2.82 \times$	$1.79 \times$	$266.56 \times$	$1.34 \times$	$1.05 \times$
ODDS-mammography	$0.10 \times$	$1.63 \times$	$1.04 \times$	$306.97 \times$	$0.71 \times$	$0.69 \times$
ODDS-satellite	$0.18 \times$	$2.53 \times$	$1.70 \times$	$214.30 \times$	$1.17 \times$	$1.20 \times$
Genomics	$0.21 \times$	$5.95 \times$	$3.82 \times$	$784.86 \times$	$1.80 \times$	$2.09 \times$
Average	$0.16 \times$	$3.86 \times$	$2.40 \times$	$470.53 \times$	$1.39 \times$	$1.42 \times$

Table 6. RMSE comparison (lower is better) across five benchmark datasets. Best values per dataset are shown in bold. The contamination column indicates the contamination level and whether it is within FORCE’s breakdown point (✓) or exceeds it (×). TP-Exact = TrimmedPearsonExact without TER; TP-TER = TrimmedPearsonExact with TER.

Dataset	Contam.	Pearson	Spearman	Winsorized	FastMCD	TP-Exact	TP-TER	FORCE
Synthetic	10% ✓	$0.6074$	$0.2075$	$0.2665$	$0.0180$	$0.0487$	$0.0549$	$0.2001$
S&P 500	∼10% ✓	$0.1335$	$0.1240$	$0.1321$	$0.1606$	$0.0902$	$0.0909$	$0.1186$
ODDS-mammography	2.3% ✓	$0.0723$	$0.0345$	$0.0308$	$0.0074$	$0.0163$	$0.0163$	$0.0157$
ODDS-satellite	31.7% ×	$0.7160$	$0.6473$	$0.7361$	$0.0160$	$0.7160$	$0.7160$	$0.7274$
Genomics	<1% ✓	$0.1326$	$0.0000$	$0.1333$	$0.2583$	$0.1571$	$0.1319$	$0.1267$

Table 7. Empirical validation using the ODDS-satellite dataset (31.7% contamination). This dataset deliberately exceeds the theoretical 25% breakdown point for IQR-based methods. All methods relying on quartile-based trimming show degraded RMSE performance, while FastMCD (50% breakdown point) maintains accuracy.

Algorithm	RMSE	Time (ms)	Breakdown Point	Status at 31.7%
Pearson	$0.7160$	$1.12$	0%	Exceeded
Spearman	$0.6473$	$16.11$	≈0%	Exceeded
Winsorized	$0.7361$	$10.84$	∼10%	Exceeded
TP-Exact	$0.7160$	$7.45$	25%	Exceeded
TP-TER	$0.7160$	$7.65$	25%	Exceeded
FORCE	$0.7274$	$6.38$	25%	Exceeded
FastMCD	$0.0160$	$1367.08$	≈50%	✓Within limit

Table 8. Summary of FORCE performance across evaluation criteria relative to all baselines. Assessments reflect FORCE’s position as a memory-efficient streaming alternative to exact trimmed methods.

Criterion	Assessment	Evidence
Speed vs. FastMCD	✓✓	$470 \times$ faster (average)
Speed vs. Spearman	✓	$3.9 \times$ faster (average)
Speed vs. Exact Trimmed	∼	$1.4 \times$ faster (marginal)
Memory Efficiency	✓✓	$O (p)$ vs. $O (N p)$ for exact methods
Streaming Capability	✓✓	Single-pass, bounded memory
Financial Data Accuracy	✓	RMSE 0.1186 (3rd best; TP-Exact best at 0.0902)
Medical Data Accuracy	✓	RMSE 0.0157 (2nd best; FastMCD best at 0.0074)
Synthetic Data Accuracy	△	RMSE 0.2001 ( $4 \times$ higher than TP-Exact)
Breakdown Point	△	25% (same as TP-Exact; lower than FastMCD’s 50%)
Breakdown Validation	✓	Theory confirmed on satellite (31.7% > 25%)

Legend: ✓✓ = strong advantage; ✓ = advantage; ∼ = comparable; △ = limitation relative to some alternatives.

Table 9. Contamination analysis relative to the 25% breakdown point shared by all IQR-based trimmed methods (FORCE, TP-Exact, TP-TER). All three methods achieve competitive RMSE within their theoretical operating limit and exhibit identical predicted degradation on the satellite validation dataset.

Dataset	Contamination	Margin to 25%	FORCE RMSE	Interpretation
Genomics	<1%	+24%	0.1267	Best among trimmed methods
ODDS-mammography	2.3%	+22.7%	0.0157	Excellent (2nd overall)
S&P 500	∼10%	∼+15%	0.1186	Good (3rd; TP-Exact best)
Synthetic	10%	+15%	0.2001	Moderate (TP-Exact $4 \times$ better)
ODDS-satellite	31.7%	−6.7%	0.7274	Validation: Breakdown confirmed

Note: “Margin to 25%” indicates the safety margin between contamination and the IQR-based breakdown point. Positive values indicate safe operation; the negative value for satellite confirms breakdown occurs as predicted for all IQR-based methods.

Table 10. Statistical significance of execution time differences (paired t-test p-values). Comparisons show whether FORCE execution time differs significantly from each baseline. TP-Exact = TrimmedPearsonExact without TER; TP-TER = TrimmedPearsonExact with TER.

Dataset	vs. Pearson	vs. Spearman	vs. Winsorized	vs. FastMCD	vs. TP-Exact	vs. TP-TER
Synthetic	<0.001	<0.001	<0.001	<0.001	<0.001	<0.001
S&P 500	<0.001	<0.001	<0.001	<0.001	0.016	0.569
ODDS-mammography	<0.001	<0.001	0.508	<0.001	<0.001	<0.001
ODDS-satellite	<0.001	<0.001	<0.001	<0.001	0.112	0.083
Genomics	<0.001	<0.001	<0.001	<0.001	<0.001	<0.01

Table 11. Method selection guide based on application requirements. The primary decision criterion is whether data can be stored for batch processing.

Scenario	Recommended Method	Rationale
Batch processing, accuracy paramount	TP-Exact	Best accuracy among fast methods
Batch processing, heavy contamination (>25%)	FastMCD	Higher breakdown point
Streaming, memory unconstrained	TP-Exact (windowed)	Better accuracy than FORCE
Streaming, memory constrained	FORCE	Only $O (p)$ memory option
Streaming, unbounded data	FORCE	Cannot store for batch

Table 12. Accuracy cost of streaming quantile approximation (FORCE vs. TP-Exact).

Dataset	FORCE RMSE	TP-Exact RMSE	Accuracy Ratio	Memory Savings
S&P 500	0.1186	0.0902	0.76×	$10^{4} \times$
Synthetic	0.2001	0.0487	0.24×	$10^{3} \times$
Mammography	0.0157	0.0163	1.04×	$10^{4} \times$
Genomics	0.1267	0.1571	1.24×	$10^{3} \times$
Average	—	—	0.82×	$10^{3 - 4} \times$

Note: Accuracy ratio > 1 indicates that FORCE outperforms TP-Exact; <1 indicates TP-Exact is better. Memory savings indicates the ratio of TP-Exact storage to FORCE storage for each dataset.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Jang, S.; Choi, C. FORCE: Fast Outlier-Robust Correlation Estimation via Streaming Quantile Approximation for High-Dimensional Data Streams. Mathematics 2026, 14, 191. https://doi.org/10.3390/math14010191

AMA Style

Jang S, Choi C. FORCE: Fast Outlier-Robust Correlation Estimation via Streaming Quantile Approximation for High-Dimensional Data Streams. Mathematics. 2026; 14(1):191. https://doi.org/10.3390/math14010191

Chicago/Turabian Style

Jang, Sooyoung, and Changbeom Choi. 2026. "FORCE: Fast Outlier-Robust Correlation Estimation via Streaming Quantile Approximation for High-Dimensional Data Streams" Mathematics 14, no. 1: 191. https://doi.org/10.3390/math14010191

APA Style

Jang, S., & Choi, C. (2026). FORCE: Fast Outlier-Robust Correlation Estimation via Streaming Quantile Approximation for High-Dimensional Data Streams. Mathematics, 14(1), 191. https://doi.org/10.3390/math14010191

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

FORCE: Fast Outlier-Robust Correlation Estimation via Streaming Quantile Approximation for High-Dimensional Data Streams

Abstract

1. Introduction

1.1. The Explosion of High-Dimensional Data and the Robustness Deficit

1.2. The Computational and Memory Bottleneck

1.3. Survey of Recent Advances (2024–2025)

1.3.1. Concept Drift and IoT Security

1.3.2. High Dimensionality in Finance

1.3.3. Theoretical Limits and New Directions

1.3.4. Relation to Distributed Learning

1.4. Limitations of Existing Robust Solutions

1.5. The FORCE Contribution

2. Materials and Methods

2.1. Theoretical Framework: The Contamination Model

2.2. The FORCE Algorithm

2.2.1. Module 1: Streaming Quantile Approximation via the P 2 Algorithm

2.2.2. Module 2: Robust Location and Scale Estimation

2.2.3. Module 3: Tail Expansion Ratio and Adaptive Thresholds

Statistical Role of TER (Bias–Efficiency–Robustness Trade-Off)

2.2.4. Module 4: Trimmed Correlation Computation

2.2.5. Complete Algorithm

2.3. Theoretical Analysis

2.3.1. Breakdown Point of FORCE

2.3.2. Computational Complexity Analysis

2.3.3. Consistency for a Trimmed Correlation Functional

Asymptotic Rate

2.3.4. Space Complexity

2.4. Experimental Design

2.4.1. Datasets

2.4.2. Baseline Algorithms

2.4.3. Evaluation Metrics

2.4.4. Implementation Details

3. Results

3.1. Computational Scalability

3.1.1. FORCE vs. FastMCD: Breaking the Computational Bottleneck

3.1.2. FORCE vs. Exact Trimmed Methods: Speed-Memory Trade-Offs

3.1.3. FORCE vs. Rank-Based Methods: Bypassing the Sorting Barrier

3.1.4. FORCE vs. Pearson: The Cost of Robustness

3.1.5. Execution Time Distribution

3.2. Estimation Accuracy

3.2.1. Financial Data: Trimmed Methods Excel

Sensitivity Analysis

3.2.2. Medical Data: FastMCD Leads, FORCE Competitive

3.2.3. Genomic Data: Reference Method Dominates

3.2.4. Synthetic Data: Controlled Contamination Analysis

3.3. Empirical Validation of the Breakdown Point: The ODDS-Satellite Stress Test

3.4. Summary of Results

3.5. Statistical Significance

4. Discussion

4.1. Positioning in the Estimator Landscape

The Accuracy Cost of Streaming Quantile Approximation

4.2. Financial Data: Trimmed Methods Outperform Multivariate Approaches

4.2.1. Why Coordinate-Wise Trimming Excels on Financial Data

4.2.2. Limited Benefit of TER Adaptation

4.2.3. Implications for Financial Applications

4.3. Comparison with the Recent Literature

4.3.1. Connection to Concept Drift Research

4.3.2. Connection to High-Dimensional Finance Research

4.3.3. Connection to Theoretical Robust Statistics

4.3.4. The Scientific Value of Stress Testing

4.4. Applicability Bounds and Method Selection

4.4.1. Theoretical Basis of the 25% Breakdown Point

4.4.2. Multivariate Outliers

4.4.3. Non-Stationary Contamination

4.5. Practical Deployment Guidelines

4.5.1. Method Selection Framework

4.5.2. Parameter Selection

4.5.3. Quantifying the Accuracy–Memory Trade-Off

5. Conclusions

5.1. Summary of Contributions

5.2. Positioning in the Robust Statistics Literature

5.3. Implications for Practice

5.4. Limitations

5.5. Future Research Directions

5.6. Concluding Remarks

Supplementary Materials

Author Contributions

Funding

Institutional Review Board Statement

2.2.1. Module 1: Streaming Quantile Approximation via the $P^{2}$ Algorithm