The estimation of correlation matrices in high-dimensional data streams presents a fundamental conflict between computational efficiency and statistical robustness. Moment-based estimators, such as Pearson’s correlation, offer linear complexity but lack robustness. In contrast, high-breakdown methods like the minimum covariance determinant (MCD) are computationally prohibitive for real-time applications. This paper introduces Fast Outlier-Robust Correlation Estimation (FORCE), a streaming algorithm that performs adaptive coordinate-wise trimming using a streaming quantile approximation algorithm, requiring memory that is independent of stream length. We evaluate FORCE against six baseline algorithms, including exact trimmed methods (TP-Exact, TP-TER) that require sorting and storage, across five benchmark datasets spanning synthetic, financial, medical, and genomic domains. FORCE achieves speedups of approximately 470× over FastMCD and 3.9× over Spearman’s rank correlation. On S&P 500 financial data, coordinate-wise trimmed methods substantially outperform FastMCD: TP-Exact achieves the best RMSE (0.0902), followed by TP-TER (0.0909) and FORCE (0.1186), compared to FastMCD’s 0.1606. This result suggests that coordinate-wise trimming accommodates volatility clustering in financial time series better than multivariate outlier exclusion does. FORCE achieves 76% of TP-Exact’s accuracy while requiring less memory, enabling robust estimation in true streaming environments where data cannot be retained for batch processing. We validate the 25% breakdown point shared by all IQR-based trimmed methods using the ODDS-satellite benchmark (31.7% contamination), confirming identical degradation for FORCE, TP-Exact, and TP-TER. For memory-constrained streaming applications with contamination below 25%, FORCE provides the only viable path to robust correlation estimation with bounded memory.
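
To make the idea of coordinate-wise IQR trimming concrete, the following is a minimal batch sketch in Python. It is not the FORCE algorithm: it computes exact quartiles from retained data rather than using a streaming quantile approximation, and the function names and the fence multiplier k = 1.5 are illustrative assumptions that do not come from the paper.

```python
import math
import statistics


def iqr_fences(values, k=1.5):
    """Return (low, high) trimming fences from the quartiles of one coordinate.

    k = 1.5 is an assumed, Tukey-style multiplier, not a value from the paper.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def trimmed_correlation(x, y, k=1.5):
    """Pearson correlation over pairs whose coordinates both lie inside
    their own fences (each coordinate is trimmed independently)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    x_lo, x_hi = iqr_fences(x, k)
    y_lo, y_hi = iqr_fences(y, k)
    kept = [(a, b) for a, b in zip(x, y)
            if x_lo <= a <= x_hi and y_lo <= b <= y_hi]
    if len(kept) < 2:
        return float("nan")
    xs, ys = zip(*kept)
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in kept)
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant coordinate
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: a perfectly linear relation plus one gross outlier in y.
x = [float(v) for v in range(1, 21)] + [10.0]
y = [2.0 * v + 1.0 for v in range(1, 21)] + [1000.0]
print(trimmed_correlation(x, y))                 # outlier pair is trimmed
print(trimmed_correlation(x, y, k=float("inf")))  # infinite fences: plain Pearson
```

Because the fences are built from the first and third quartiles, contamination of more than a quarter of the observations on one side can shift a quartile arbitrarily, which is the intuition behind the 25% breakdown point mentioned above.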