Authentic SEC Data and Regime-Aware Ensemble Learning for Corporate Cash Flow Forecasting
Abstract
1. Introduction
1.1. Motivation: The Data Authenticity Problem in Financial Forecasting
1.2. Research Gaps and Contributions
- The Authenticity Gap: While protocols for extracting and validating data from the SEC’s EDGAR system exist, their adoption in forecasting research is limited. There is a lack of systematic evidence quantifying the bias introduced by using estimated rather than authentic data in a rigorous out-of-sample forecasting context.
- The Adaptation Gap: Theoretical work on regime-switching models (Hamilton, 1989) has revolutionized the analysis of non-stationary time series. Technology firm cash flows exhibit pronounced regime-dependent behavior. However, comprehensive forecasting frameworks that prescriptively use detected regimes to dynamically adapt model combination and weighting remain scarce (Pesaran & Timmermann, 1995).
- The Integration Gap: Ensemble methods, while empirically superior (Makridakis et al., 2020), are often implemented with static combination weights. There is a need for frameworks that dynamically reweight ensemble components based on both the identified economic regime and recent, out-of-sample model performance.
- Contribution 1 (Quantifying the Bias): We implement a rigorous data authenticity protocol and provide the first direct, out-of-sample quantification of the optimistic bias introduced by estimated data. We show that this bias is large (e.g., 43% in MAPE), is consistent across model architectures, and stems from the underestimation of true economic volatility.
- Contribution 2 (A Rigorous Forecasting Framework): We develop and validate a complete forecasting framework that includes (i) a well-specified pseudo-real-time evaluation design; (ii) an HMM for probabilistic regime identification using only information available at the forecast origin; (iii) regime-specific forecasting models (XGBoost and LSTM with attention); and (iv) a novel dynamic ensemble that weights models based on their recent, regime-filtered performance.
- Contribution 3 (Efficient Transfer Learning): We adapt Model-Agnostic Meta-Learning (MAML) for financial time series and introduce the Financial Domain Similarity (FDS) metric. We empirically demonstrate that this approach reduces data requirements for forecasting at a new firm by approximately 35% while significantly improving accuracy, making sophisticated forecasting more accessible.
- Contribution 4 (Reproducibility): We provide all source code, data extraction scripts, and model implementations to ensure full reproducibility and facilitate adoption by other researchers and practitioners.
1.3. Structure of the Study
2. Data: Construction and Authenticity Protocol
2.1. Data Sources and Sample Selection
2.2. Authentic Data Collection: A Rigorous Protocol
- Stage 1: Source Identification: Firms were identified based on their Central Index Key (CIK) on the SEC EDGAR database.
- Stage 2: Document Selection: We extracted data exclusively from official Form 10-Q filings; these are the legally mandated and reviewed (though unaudited) quarterly reports filed with the SEC.
- Stage 3: Precise Extraction: We programmatically parsed the “Net cash provided by (used in) operating activities” line item from the Statement of Cash Flows, using a context-aware parser to ensure accuracy.
- Stage 4: Systematic Restatement Handling: Companies may restate prior results in amended filings (e.g., 10-Q/A). Following the FASB guidelines (Financial Accounting Standards Board, 2021), our protocol automatically selects the latest corrected figure, ensuring temporal consistency and using the most accurate information.
- Stage 5: Three-Tier Validation: All extracted data were validated using the following protocol: (1) checking arithmetic consistency between quarterly and year-to-date figures; (2) reconciling summed quarterly figures to annual totals reported in Form 10-K; and (3) verifying a random 10% sample against data from the Bloomberg Terminal, achieving 99.8% concordance.
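The first two validation tiers amount to simple arithmetic reconciliations. A minimal sketch (function names and tolerances are illustrative, not the paper's released implementation):

```python
def check_quarterly_vs_ytd(quarters, ytd, tol=1e-6):
    """Tier 1: each year-to-date figure must equal the running sum of quarters."""
    running = 0.0
    for q, y in zip(quarters, ytd):
        running += q
        if abs(running - y) > tol * max(1.0, abs(y)):
            return False
    return True

def check_annual_reconciliation(quarters, annual_total, tol=1e-6):
    """Tier 2: summed quarterly OCF must reconcile to the 10-K annual total."""
    return abs(sum(quarters) - annual_total) <= tol * max(1.0, abs(annual_total))
```

The third tier (spot-checking a 10% sample against Bloomberg) is a manual/API comparison and is not reproduced here.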
2.3. Constructing the “Estimated” Dataset
- Proportional to Annual Sales: Quarterly cash flow was estimated by allocating annual cash flow based on the proportion of quarterly sales to annual sales.
- Spline Interpolation: Quarterly values were imputed using cubic spline interpolation of the annual totals.
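The first estimation scheme can be sketched as follows; this allocator is an illustrative reconstruction, not the paper's code:

```python
import numpy as np

def allocate_by_sales(annual_cf, quarterly_sales):
    """Proportional-to-sales scheme: split annual cash flow by each
    quarter's share of annual sales."""
    quarterly_sales = np.asarray(quarterly_sales, dtype=float)
    return annual_cf * quarterly_sales / quarterly_sales.sum()
```

The interpolation scheme would analogously fit a spline (e.g., `scipy.interpolate.CubicSpline`) through the annual totals and evaluate it at quarterly points.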
2.4. Sample Characteristics and Feature Engineering
- Temporal Features: Lagged OCF (t − 1 to t − 4), 4-quarter moving averages, and rolling volatility.
- Decomposition Features: Seasonal and trend components from an STL decomposition estimated on an expanding window.
- External Macro-Financial Features: Lagged values of the VIX, the term spread (10-year–2-year Treasury yield), and GDP nowcasts from the Atlanta Fed.
- Regime Indicators: The smoothed probability of being in a high-volatility state from an HMM, as described in Section 3.1. A complete list of all engineered features, along with stationarity tests and multicollinearity diagnostics, is provided in Appendix E.
- Figure A1 (Appendix A): Five-Stage Data Authenticity Protocol, a visual summary of the extraction and validation stages described in Section 2.2.
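The temporal features above (lags, moving average, rolling volatility) can be sketched as follows; the exact feature code is in the released scripts, so treat this as illustrative:

```python
import numpy as np

def temporal_features(y, n_lags=4, window=4):
    """Build lagged-OCF, moving-average, and rolling-volatility features.
    Row t uses only values before t, so every feature is available at
    the forecast origin t - 1."""
    y = np.asarray(y, dtype=float)
    rows = []
    for t in range(max(n_lags, window), len(y)):
        lags = y[t - n_lags:t][::-1]       # y_{t-1} ... y_{t-4}
        ma = y[t - window:t].mean()        # 4-quarter moving average
        vol = y[t - window:t].std(ddof=1)  # rolling volatility
        rows.append(np.concatenate([lags, [ma, vol]]))
    return np.array(rows)
```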
3. Methodology: A Rigorous Forecasting Framework
3.1. The Forecasting Problem
- Target Variable: The operating cash flow for firm *i* in quarter *t*.
- Forecast Horizon: One-step-ahead, quarterly forecasts. While the framework is general, we focus on h = 1.
- Forecast Origin: The end of quarter *t − 1*. A forecast for yi,t is made at time t − 1.
3.2. Regime Detection with Hidden Markov Models
- State 1 (Stable Growth): Characterized by low volatility and positive mean growth;
- State 2 (Transitional): Moderate volatility and near-zero growth;
- State 3 (Turbulent): High volatility and potentially negative growth.
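Filtered state probabilities of the kind used here come from a standard HMM forward recursion. A minimal sketch with Gaussian emissions (the parameters passed in are hypothetical placeholders for values the paper estimates by maximum likelihood):

```python
import numpy as np

def hmm_filter(y, pi, A, means, sds):
    """Forward (filtered) state probabilities P(S_t = k | y_1..y_t).
    Uses only observations up to t, so the output is available at the
    forecast origin with no look-ahead."""
    y = np.asarray(y, dtype=float)
    pi, A = np.asarray(pi, dtype=float), np.asarray(A, dtype=float)
    probs = np.zeros((len(y), len(pi)))

    def lik(x):  # Gaussian emission densities for one observation
        return np.exp(-0.5 * ((x - means) / sds) ** 2) / (sds * np.sqrt(2 * np.pi))

    alpha = pi * lik(y[0])
    probs[0] = alpha / alpha.sum()
    for t in range(1, len(y)):
        alpha = (probs[t - 1] @ A) * lik(y[t])  # predict, then update
        probs[t] = alpha / alpha.sum()
    return probs
```

A two-state toy example (low-mean vs. high-mean regimes) shows the filter locking onto the regime that generated each observation.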
3.3. Forecasting Models
3.3.1. Benchmarks
- Seasonal Naïve:
- ARIMA: An auto.arima model selected on an expanding-window basis using BIC;
- ARIMAX: An ARIMA model augmented with the same external features used in the ML models.
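The seasonal naïve benchmark is simply the value from the same quarter one year earlier; as a one-line sketch:

```python
def seasonal_naive(y, season=4):
    """Seasonal naive one-step-ahead forecast: next quarter equals the
    same quarter one year (season=4 quarters) earlier."""
    return y[-season]
```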
3.3.2. XGBoost with Regime-Specific Regularization
3.3.3. LSTM with Temporal Attention
3.3.4. Dynamic Ensemble Mechanism
3.4. Transfer Learning Framework
Computational Cost and Hyperparameter Sensitivity
3.5. Evaluation Protocol: Pseudo-Real-Time Out-of-Sample Forecasts
- Initial Training Window: Q1 2011–Q4 2015 (20 quarters). This is the first window used to estimate model parameters.
- Evaluation Period: Q1 2016–Q4 2024 (36 quarters). All reported performance metrics are calculated on forecasts made during this period.
- Procedure: For each forecast origin τ from Q4 2015 to Q3 2024:
- Train/calibrate all models using only data from the start of the sample up to τ. For models with hyperparameters, these are tuned using cross-validation on this expanding window;
- Use the HMM to estimate the most probable state for τ + 1 (the target quarter) using data up to τ;
- Generate one-step-ahead forecasts ŷτ + 1 from each model;
- Move to the next forecast origin τ + 1, expanding the training data.
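The expanding-window procedure above can be sketched as follows (the `fit`/`forecast` callables stand in for the actual models and HMM steps):

```python
def pseudo_real_time_eval(y, fit, forecast, min_train=20):
    """Expanding-window, one-step-ahead evaluation: at each origin tau,
    train on y[:tau+1] only, then forecast y[tau+1] and record the
    absolute percentage error. Returns the MAPE over all origins."""
    errors = []
    for tau in range(min_train - 1, len(y) - 1):
        model = fit(y[:tau + 1])      # train/calibrate on data through tau
        yhat = forecast(model)        # one-step-ahead forecast for tau + 1
        errors.append(abs(y[tau + 1] - yhat) / abs(y[tau + 1]))
    return sum(errors) / len(errors)
```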
3.6. Computational Considerations
3.7. Complexity Versus Dataset Size
4. Empirical Results
4.1. Result 1: Quantifying the “Estimation–Reality Divide”
Robustness Checks
4.2. Result 2: Regime-Dependent Model Performance
4.3. Result 3: Comparison with Modern Baselines
4.4. Economic Impact Assessment
4.5. Economic Interpretation of Regime Dynamics
5. Discussion and Implications
5.1. For Academic Researchers
5.2. For Corporate Treasurers and Practitioners
5.3. Generalizability and Applicability
Explicit Boundaries of Empirical Claims
- Not that the MAPE of 7.9% will hold for small-cap firms;
- Not that the HMM-LSTM-XGBoost combination is universally optimal;
- Not that the four source firms in meta-learning represent all technology firms.
5.4. Quantitative Economic Impact
6. Conclusions, Limitations, and Future Research
6.1. Limitations
- Sample Specificity—The Fundamental Boundary of This Study: Our analysis is strictly and exclusively limited to five large-cap U.S. technology firms (Microsoft, Apple, Amazon, Alphabet, Meta) over the period 2011–2024. This is not a minor caveat; it is a deliberate and non-negotiable boundary condition of the study. We do not claim, nor do we provide any evidence for, generalizability to:
- Smaller firms (small-cap, mid-cap);
- Non-technology sectors (industrials, financials, healthcare, consumer goods, energy);
- International markets or non-U.S. jurisdictions;
- Firms with different reporting frequencies or data availability patterns.
- Horizon: We focused on one-step-ahead quarterly forecasts. The performance of the framework for multi-step (e.g., annual) forecasts is an open question (Marcellino et al., 2006).
- Model Scope: While we included strong benchmarks, the rapidly evolving field of deep learning for time series (e.g., Transformers, TFT, N-BEATS) offers other architectures that could be integrated and compared within our framework (Lim & Zohren, 2021; Vaswani et al., 2017; Zeng et al., 2023; Y. Zhang et al., 2025; Z. Zhang et al., 2025). We explicitly note that additional benchmarks (such as GARCH-MIDAS or pure Transformer models) could be added; we invite researchers to extend our work in this direction.
- Computational Cost and Framework Complexity: The full framework requires substantial computational resources and expertise, which may be a barrier for smaller firms or researchers with limited infrastructure. We detail these costs in Section 3.6 for transparency and provide a lightweight alternative there to mitigate this barrier.
- Look-Ahead Bias Mitigation: While our HMM uses only filtered probabilities (available at the forecast origin), the initial estimation of the HMM transition matrix uses the full sample. This is standard practice in the regime-switching literature (Hamilton, 1989), but we acknowledge that a fully recursive estimation (refitting the HMM at each step) would be even more conservative. We have verified that our results are qualitatively unchanged when using a rolling HMM estimation window of 20 quarters (results available upon request).
- Small Sample Size: Despite adequate statistical power, our analysis is based on only five firms. Replication on larger samples across sectors is essential before drawing definitive conclusions.
- Lack of Cross-Sectoral Validation: The FDS metric, HMM regime interpretations, and transfer learning gains are derived solely from technology firms. Their performance in other sectors (e.g., banking, energy, consumer goods) is unknown and should not be assumed. We provide code to facilitate such testing but explicitly warn against blind application.
6.2. Future Research
- Cross-Sector, Cross-Cap, and International Validation (Highest Priority). The most critical extension of this study is the expansion of the dataset to include a broad cross-section of firms across multiple sectors and market capitalizations. Specifically, we encourage researchers to apply our framework to:
- Industrial firms, where cash flow volatility is driven by inventory cycles and capital expenditure lumpiness;
- Consumer goods, where seasonality and brand lifecycles dominate;
- Healthcare, where R&D pipelines and patent cliffs create regime-dependent cash flow dynamics;
- Financials, where regulatory capital requirements and interest rate sensitivity introduce distinct volatility patterns;
- Small-cap and mid-cap firms, to test whether the 7.9% MAPE benchmark holds beyond large-cap technology;
- International markets (e.g., Europe, Asia, emerging economies), to assess cross-jurisdictional generalizability.
- Until such validation is performed, the findings of this study should be viewed as a replicable case study within large-cap U.S. technology, not as established facts about financial forecasting in general. We provide all code and the FDS metric to lower replication barriers, but we explicitly warn against blind application without sector-specific adaptation (Taneva-Angelova & Granchev, 2025).
- Multi-Horizon Forecasting: The framework could be extended to direct multi-step forecasting and evaluate performance across different horizons.
- Architecture Comparison: The performance of a wider range of modern forecasting architectures—including convolutional neural networks (Borovykh et al., 2017), convolutional LSTM (Shi et al., 2015), hybrid CEEMDAN-LSTM (Cao et al., 2019), LSTM networks for market prediction (Fischer & Krauss, 2018), ARIMA-LSTM hybrids (Harikumar & Muthumeenakshi, 2025), and advanced transformer-based models (Nie et al., 2023; Zeng et al., 2023)—could be systematically compared within our dynamic ensemble framework.
- Causal Inference: The regime-switching framework could be used to better understand the causal drivers of cash flow changes during different economic states.
- More sophisticated volatility models such as GARCH-MIDAS (Asgharian et al., 2013; Engle et al., 2013; Ersin & Bildirici, 2023), building on the foundational GARCH framework (Bollerslev, 1986), and realized volatility measures (Andersen et al., 2001; Barndorff-Nielsen & Shephard, 2002, 2004; Corsi, 2009) could be incorporated to enhance forecast accuracy. Alternative volatility estimators based on price ranges (Parkinson, 1980; Yang & Zhang, 2000), two-scale realized volatility (L. Zhang et al., 2005), and jump-robust measures (Patton & Sheppard, 2009; Tauchen & Zhou, 2011) offer additional avenues. Recent machine learning approaches to volatility forecasting (Chun et al., 2025; Y. Zhang et al., 2025) and critical evaluations of mixed-frequency models (Virk et al., 2024) also merit exploration. Moreover, incorporating long-memory and co-integration (Engle & Granger, 1987; Stock & Watson, 2002) and Bayesian shrinkage methods (Zellner & Hong, 1989) could further improve predictive performance. Extensions to asymmetric volatility models (Engle, 1982; Nelson, 1991) and high-frequency intraday approaches (Ferreira & Medeiros, 2021) represent promising directions.
- Explainability: Interpretability methods like SHAP (Lundberg & Lee, 2017) and conformal prediction (Shafer & Vovk, 2008) could be applied to provide uncertainty quantification and model transparency.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. XGBoost Hyperparameter Configuration

| Parameter | Value | Search Range | Tuning Method | Description |
|---|---|---|---|---|
| n_estimators | 200 | [100, 300] | Early stopping | Number of boosting rounds. Training stopped if validation error did not improve for 10 rounds. |
| max_depth | 6 | [4, 8] | Grid search (5-fold CV) | Maximum tree depth. Controls model complexity and interaction depth. |
| learning_rate | 0.05 | [0.01, 0.1] | Grid search (5-fold CV) | Step size shrinkage. Lower values require more trees but improve generalization. |
| subsample | 0.8 | [0.6, 1.0] | Grid search (5-fold CV) | Fraction of training samples used per tree. Prevents overfitting. |
| colsample_bytree | 0.8 | [0.6, 1.0] | Grid search (5-fold CV) | Fraction of features used per tree. Adds randomness and reduces variance. |
| reg_lambda | 0.1 | [0, 1.0] | Bayesian optimization | L2 regularization weight on leaf scores. Higher values increase regularization. |
| reg_alpha | 0.1 | [0, 1.0] | Bayesian optimization | L1 regularization weight on leaf scores. Can lead to sparsity. |
| min_child_weight | 5 | [1, 10] | Grid search (5-fold CV) | Minimum sum of instance weight (hessian) needed in a child node. Controls overfitting. |
| gamma | 0.1 | [0, 0.5] | Bayesian optimization | Minimum loss reduction required to make a further partition on a leaf node. |
| scale_pos_weight | 1 | - | Fixed | Balance of positive/negative weights. Not critical as this is a regression task. |
| objective | reg:squarederror | - | Fixed | Regression with squared loss. |
| eval_metric | rmse | - | Fixed | Root mean squared error for validation. |
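For convenience, the tuned values in the table can be collected into a parameter dictionary; the usage shown in the comment assumes the standard `xgboost.XGBRegressor` API:

```python
# Hyperparameters from the table above (Appendix A), as tuned in the paper.
XGB_PARAMS = {
    "n_estimators": 200,        # upper bound; early stopping, patience 10 rounds
    "max_depth": 6,
    "learning_rate": 0.05,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
    "reg_lambda": 0.1,          # L2 regularization on leaf scores
    "reg_alpha": 0.1,           # L1 regularization on leaf scores
    "min_child_weight": 5,
    "gamma": 0.1,
    "objective": "reg:squarederror",
    "eval_metric": "rmse",
}
# With xgboost installed: model = xgboost.XGBRegressor(**XGB_PARAMS)
```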
Appendix B. Complete LSTM Formulation with Temporal Attention
Appendix B.1. LSTM Cell Dynamics
- tanh is the hyperbolic tangent activation function;
- ⊙ denotes element-wise multiplication;
- Wxi, Whi, bi are weight matrices and bias vectors for the input gate;
- Wxf, Whf, bf are weights and bias for the forget gate;
- Wxc, Whc, bc are weights and bias for the cell candidate;
- Wxo, Who, bo are weights and bias for the output gate.
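For reference, the standard LSTM cell equations implied by the notation above (with σ the sigmoid gate activation listed in Appendix C) are:

```latex
\begin{aligned}
i_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i) \\
f_t &= \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f) \\
\tilde{c}_t &= \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \\
o_t &= \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```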
Appendix B.2. Temporal Attention Mechanism
Appendix B.3. Final Prediction
Appendix B.4. Loss Function
Appendix C. LSTM Hyperparameters
| Parameter | Value | Description |
|---|---|---|
| Architecture | ||
| Number of LSTM layers | 2 | Stacked LSTM layers for hierarchical feature extraction |
| Hidden units per layer | 64 | Dimensionality of hidden and cell states |
| Dropout rate | 0.3 | Dropout applied between LSTM layers (prevents overfitting) |
| Recurrent dropout rate | 0.2 | Dropout applied to recurrent connections |
| Sequence length | 8 quarters | Number of past quarters used as input for each prediction |
| Training | ||
| Batch size | 16 | Number of sequences processed before model update |
| Initial learning rate | 0.001 | Adam optimizer initial step size |
| Learning rate decay | 0.1 | Factor by which learning rate is reduced after 50 epochs without improvement |
| Early stopping patience | 10 epochs | Training stops if validation loss does not improve for 10 epochs |
| Maximum epochs | 200 | Upper bound on training iterations |
| Optimization | ||
| Optimizer | Adam | Adaptive moment estimation optimizer |
| β1 (Adam) | 0.9 | Exponential decay rate for first-moment estimates |
| β2 (Adam) | 0.999 | Exponential decay rate for second-moment estimates |
| ε (Adam) | 1 × 10−8 | Small constant for numerical stability |
| Gradient clipping | 1.0 | Maximum norm for gradient clipping to prevent exploding gradients |
| Regularization | ||
| L2 regularization | 1 × 10−5 | Weight decay applied to all weights |
| Input/Output | ||
| Input features | 14 | Number of features after feature engineering |
| Output dimension | 1 | Single-step-ahead cash flow forecast |
| Activation (recurrent) | tanh | Activation function for recurrent steps |
| Activation (gates) | sigmoid | Activation function for LSTM gates |
Appendix D. Transfer Learning Implementation Details
Appendix D.1. MAML Framework Configuration
| Algorithm A1: MAML for Financial Time-Series Forecasting |
|---|
| Require: p(T): distribution over tasks (firms). Require: α, β: inner- and outer-loop learning rates. Require: θ: initial model parameters (random initialization). **while** not done **do**: Sample batch of tasks T_i ~ p(T); **for each** T_i **do**: Sample support set S_i and query set Q_i from T_i; Evaluate ∇_θ L_{T_i}(f_θ) on the support set; Compute adapted parameters θ_i′ = θ − α ∇_θ L_{T_i}(f_θ); Evaluate L_{T_i}(f_{θ_i′}) on the query set; **end for**; Update θ ← θ − β ∇_θ Σ_i L_{T_i}(f_{θ_i′}); **end while** |
| Parameter | Value | Description |
|---|---|---|
| Meta-Learning | ||
| Inner loop learning rate (α) | 0.01 | Step size for task-specific adaptation |
| Outer loop learning rate (β) | 0.001 | Step size for meta-parameter update |
| Meta-batch size | 4 firms | Number of tasks sampled per meta-iteration |
| Task sampling | ||
| Support set size per firm | 16 quarters | Data used for inner loop adaptation |
| Query set size per firm | 8 quarters | Data used for meta-gradient computation |
| Total firms in meta-training | 4 | Microsoft, Apple, Amazon, Alphabet (source firms) |
| Validation firms | 1 | Meta (held out for meta-validation) |
| Training | ||
| Total meta-training iterations | 1000 | Number of meta-updates |
| Validation frequency | Every 50 iterations | Evaluate on meta-validation firm |
| Early stopping patience | 10 validation checks | Stop if meta-validation loss does not improve |
| Architecture | ||
| Base model | 2-layer LSTM (64 units) | Same architecture as in Appendix C |
| Shared parameters | All weights | All LSTM and attention weights are meta-learned |
| Task-specific parameters | None | All adaptation occurs via gradient steps on shared weights |
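Algorithm A1 can be sketched for a least-squares base learner as follows. Note this uses the first-order MAML approximation for brevity (the meta-gradient is evaluated at the adapted parameters without backpropagating through the inner gradient step), whereas the full algorithm differentiates through it:

```python
import numpy as np

def maml_step(theta, tasks, alpha=0.01, beta=0.001):
    """One MAML meta-update for least-squares tasks (cf. Algorithm A1).
    Each task is ((Xs, ys), (Xq, yq)): a support set and a query set."""
    def grad(th, X, y):                 # gradient of mean squared error
        return 2 * X.T @ (X @ th - y) / len(y)

    meta_grad = np.zeros_like(theta)
    for (Xs, ys), (Xq, yq) in tasks:
        theta_i = theta - alpha * grad(theta, Xs, ys)   # inner-loop adaptation
        meta_grad += grad(theta_i, Xq, yq)              # first-order meta-gradient
    return theta - beta * meta_grad                     # outer-loop update
```

A single meta-step on a toy task already reduces the query-set loss relative to the initial parameters.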
Appendix D.2. Financial Domain Similarity (FDS) Metric
| Feature | Notation | Calculation | Interpretation |
|---|---|---|---|
| Revenue Recurrence Ratio | Φ1 | (Subscription Revenue)/(Total Revenue) | Higher values indicate more predictable cash flows |
| Operating Margin Stability | Φ2 | 1/CV (Operating Margin) | Inverse of coefficient of variation; stable margins indicate consistent cost structures |
| R&D Intensity | Φ3 | (R&D Expenditure)/(Revenue) | Higher intensity correlates with innovation-driven growth and potential volatility |
| Cash Conversion Cycle Efficiency | Φ4 | Average of 365 × (Average Accounts Receivable/Revenue) | Shorter cycles indicate working capital efficiency |
| Customer Concentration | Φ5 | Revenue from top 3 customers/Total Revenue | Higher concentration increases customer-related risk |
| FDS Range | Transferability | Expected Performance Degradation | Recommended Fine-Tuning Data |
|---|---|---|---|
| >0.8 | High | <30% vs. from scratch | 12–16 quarters |
| 0.6–0.8 | Moderate | 30–50% vs. from scratch | 16–24 quarters |
| <0.6 | Limited | >50% vs. from scratch | 24+ quarters recommended |
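One plausible way to turn the five Φ features into a similarity score on [0, 1] is sketched below. The aggregation (equal-weighted absolute differences of normalized features) is an assumption for illustration, not necessarily the paper's exact FDS formula:

```python
import numpy as np

def fds(phi_source, phi_target, weights=None):
    """Illustrative Financial Domain Similarity score: 1 minus the
    weighted mean absolute difference of the five Phi features
    (assumed normalized to [0, 1]), so identical profiles give FDS = 1.
    NOTE: equal weights and absolute differences are assumptions made
    for illustration only."""
    s = np.asarray(phi_source, dtype=float)
    t = np.asarray(phi_target, dtype=float)
    w = np.full(len(s), 1.0 / len(s)) if weights is None else np.asarray(weights)
    return 1.0 - float(np.sum(w * np.abs(s - t)))
```

A source and target firm with identical Φ profiles score 1.0 (high transferability); maximally dissimilar profiles score 0.0.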
Appendix E. Feature Engineering Details
Appendix E.1. Complete Feature Set
| Feature Category | Feature Name | Notation | Transformation | ADF p-Value | VIF | Description |
|---|---|---|---|---|---|---|
| Temporal Lagged | OCF (t − 1) | yt − 1 | Level | <0.001 | 3.1 | Operating cash flow, lagged 1 quarter |
| OCF (t − 2) | yt − 2 | Level | <0.001 | 2.9 | Lagged 2 quarters | |
| OCF (t − 3) | yt − 3 | Level | <0.001 | 2.7 | Lagged 3 quarters | |
| OCF (t − 4) | yt − 4 | Level | <0.001 | 2.3 | Lagged 4 quarters (annual lag) | |
| Rolling Statistics | 4-Quarter MA | MA4 | Level | <0.001 | 2.4 | 4-quarter moving average of OCF |
| 8-Quarter SD | σ8 | Level | <0.001 | 1.8 | 8-quarter rolling standard deviation (volatility) | |
| Growth Rate (QoQ) | Δyt | Percentage | <0.001 | 2.1 | Quarterly growth rate: (yt − yt − 1)/yt − 1 | |
| STL Decomposition | Seasonal Component | St | Level | <0.001 | 1.5 | Seasonal pattern from STL decomposition |
| Trend Component | Tt | First difference | <0.001 | 1.9 | Trend component, differenced for stationarity | |
| Remainder | Rt | Level | <0.001 | 1.6 | Irregular component | |
| Regime Indicators | Volatility Z-Score | Zt | Level | 0.023 | 3.4 | Rolling Z-score of OCF growth: (Δyt − μΔy)/σΔy |
| MACD | MACDt | Level | 0.017 | 2.7 | Moving average convergence divergence of growth rates | |
| HMM State Probability | P(St = 3) | Level | 0.034 | 2.9 | Smoothed probability of being in high-volatility state | |
| Macroeconomic | 10-Year Treasury Yield | rt | First difference | <0.001 | 2.8 | Yield on 10-year U.S. Treasury notes |
| Term Spread | spreadt | Level | 0.008 | 2.1 | 10-year minus 2-year Treasury yield | |
| VIX Index | VIXt | Log | <0.001 | 2.5 | CBOE Volatility Index (market fear gauge) | |
| GDP Now-cast | GDPt | First difference | <0.001 | 2.2 | Atlanta Fed GDPNow estimate | |
| Industry | NASDAQ-100 Return | NDXt | | | | Quarterly return on the NASDAQ-100 index |
| Sector R&D Growth | R&Dt | Percentage | 0.012 | 1.9 | Growth in aggregate R&D for tech sector (SIC 3570–7379) | |
| Sentiment | Analyst Revision Score | ARt | Level | 0.006 | 2.5 | Net percentage of analysts revising earning forecasts upward |
| Sentiment Index | SENTt | Level | 0.018 | 2.2 | Composite of analyst recommendations |
Appendix E.2. Stationarity Testing
Appendix E.3. Multicollinearity Assessment
Appendix E.4. Feature Alignment Protocol
- All features are lagged appropriately so that they are available at the forecast origin;
- Rolling statistics use only data up to time t − 1;
- STL decomposition is refit on an expanding window basis;
- HMM probabilities are filtered probabilities (using data up to current time only);
- Macroeconomic indicators are aligned to the firm’s fiscal quarter-end date.
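The lagging rule in the first two bullets amounts to shifting each series so that row t contains only information available at the forecast origin t − 1; as a minimal sketch:

```python
def align_features(values, lag=1):
    """Shift a feature series so that row t holds the value known at
    the forecast origin t - 1. Rolling statistics are built the same
    way, from values up to t - 1 only; leading rows without history
    are marked None and dropped before training."""
    values = list(values)
    return [None] * lag + values[:-lag]
```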
Appendix F. Statistical Robustness Checks
Appendix F.1. Bootstrap Confidence Intervals for the Estimation–Reality Divide
Appendix F.2. Statistical Power Analysis
Appendix G. Leave-One-Firm-Out (LOFO) Sensitivity Analysis
Appendix G.1. Motivation and Design
- Procedure:
- For each target firm *i* in {Microsoft, Apple, Amazon, Alphabet, Meta}:
- Training set: All other four firms (combined quarterly observations from 2011 to 2024);
- Evaluation: Pseudo-real-time forecasts for the held-out firm over 2016–2024;
- The dynamic ensemble (XGBoost + LSTM with attention, regime-weighted) is used exactly as described in Section 3.3.4;
- No data from the held-out firm is used during training.
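The LOFO procedure can be sketched generically; `fit` and `evaluate` stand in for the ensemble training and pseudo-real-time evaluation steps:

```python
def lofo_mape(firms, fit, evaluate):
    """Leave-one-firm-out: for each firm, train on the remaining firms
    only and evaluate forecasts on the held-out firm."""
    results = {}
    for held_out in firms:
        train = {f: d for f, d in firms.items() if f != held_out}
        model = fit(train)                         # held-out data never enters training
        results[held_out] = evaluate(model, firms[held_out])
    return results
```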
Appendix G.2. Results
| Held Out Firm | MAPE (LOFO) | MAPE (Full Sample—From Table 3) | Difference |
|---|---|---|---|
| Microsoft | 7.8% | 7.9% | −0.1% |
| Apple | 8.1% | 7.9% | +0.2% |
| Amazon | 7.6% | 7.9% | −0.3% |
| Alphabet | 7.9% | 7.9% | 0.0% |
| Meta | 8.3% | 7.9% | +0.4% |
| Mean | 7.9% | 7.9% | 0.0% |
| Std Dev | 0.3% | 0.0% | 0.3% |
Appendix G.3. Comparison with Full-Sample Training
Appendix G.4. Limitations of LOFO
Appendix H. Sensitivity Analysis of Number of HMM States (K)
Appendix H.1. Motivation
Appendix H.2. BIC Comparison
| K | BIC | ΔBIC vs. K = 3 | Evidence Against K = 3 |
|---|---|---|---|
| 2 | 1247.3 | +48.7 | Very strong |
| 3 | 1198.6 | 0 | — |
| 4 | 1213.4 | +14.8 | Strong |
| 5 | 1231.9 | +33.3 | Very strong |
Appendix H.3. Forecast Accuracy (Out-of-Sample MAPE)
| K | Dynamic Ensemble MAPE | Interpretation |
|---|---|---|
| 2 | 8.3% | States: “Low volatility” (78% of quarters) and “High volatility” (22%). The high-volatility state mixes COVID-19 and the 2022 rate hikes, reducing regime-specific model specialization. |
| 3 | 7.9% | Clean separation: Stable (62%), Transitional (24%), Turbulent (14%). Attention-based LSTM excels in the Turbulent state (MAPE 11.5% vs. XGBoost 13.1%). |
| 4 | 8.1% | Fourth state is a “very high volatility” state with only 6% of quarters, leading to overfitting and unstable weight estimation. |
Appendix H.4. Conclusions
Appendix I. Robustness to Unequal Sample Periods Across Firms
Appendix I.1. Motivation
Appendix I.2. Common Period Analysis (2013–2024)
| Metric | Full Sample (as Reported) | Common Period (2013–2024) | Difference |
|---|---|---|---|
| Dynamic Ensemble MAPE | 7.9% | 8.0% | +0.1 p.p. |
| Estimated-data MAPE (same ensemble) | 4.5% | 4.6% | +0.1 p.p. |
| MAPE difference (authentic vs. estimated) | 3.4 p.p. | 3.4 p.p. | 0.0 p.p. |
| Diebold–Mariano p-value | <0.001 | <0.001 | — |
| Note: A dash (—) indicates that the value is not applicable or not calculated. | |||
Appendix I.3. Conclusions
References
- Andersen, T. G., Bollerslev, T., Diebold, F. X., & Ebens, H. (2001). The distribution of realized stock return volatility. Journal of Financial Economics, 61(1), 43–76.
- Asgharian, H., Hou, A. J., & Javed, F. (2013). The importance of macroeconomic variables in forecasting stock return variance: A GARCH-MIDAS approach. Journal of Forecasting, 32(7), 600–612.
- Barndorff-Nielsen, O. E., & Shephard, N. (2002). Econometric analysis of realized volatility and its use in estimating stochastic volatility models. Journal of the Royal Statistical Society: Series B, 64(2), 253–280.
- Barndorff-Nielsen, O. E., & Shephard, N. (2004). Power and bipower variation with stochastic volatility and jumps. Journal of Financial Econometrics, 2(1), 1–37.
- Baumol, W. J. (1952). The transactions demand for cash: An inventory theoretic approach. Quarterly Journal of Economics, 66(4), 545–556.
- Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31(3), 307–327.
- Borovykh, A., Bohte, S., & Oosterlee, C. W. (2017). Conditional time series forecasting with convolutional neural networks. arXiv.
- Brave, S. A., & Butters, R. A. (2012). Diagnosing the financial system: Financial conditions and financial stress. International Journal of Central Banking, 8(2), 191–239.
- Cao, J., Li, Z., & Li, J. (2019). Financial time series forecasting model based on CEEMDAN and LSTM. Physica A: Statistical Mechanics and Its Applications, 519, 127–139.
- Chen, S., & Ge, L. (2019). Exploring the attention mechanism in LSTM-based Hong Kong stock price movement prediction. Quantitative Finance, 19(9), 1507–1515.
- Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In The 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 785–794). ACM.
- Chun, D., Cho, H., & Ryu, D. (2025). Volatility forecasting and volatility-timing strategies: A machine learning approach. Research in International Business and Finance, 75, 102723.
- Clark, T. E., & McCracken, M. W. (2001). Tests of equal forecast accuracy and encompassing for nested models. Journal of Econometrics, 105(1), 85–110.
- Corsi, F. (2009). A simple approximate long-memory model of realized volatility. Journal of Financial Econometrics, 7(2), 174–196.
- Dechow, P. M., Kothari, S. P., & Watts, R. L. (1998). The relation between earnings and cash flows. Journal of Accounting and Economics, 25(2), 133–168.
- Dickey, D. A., & Fuller, W. A. (1979). Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association, 74(366), 427–431.
- Diebold, F. X., & Mariano, R. S. (1995). Comparing predictive accuracy. Journal of Business & Economic Statistics, 13(3), 253–263.
- Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica, 50(4), 987–1007.
- Engle, R. F., Ghysels, E., & Sohn, B. (2013). Stock market volatility and macroeconomic fundamentals. Review of Economics and Statistics, 95(3), 776–797.
- Engle, R. F., & Granger, C. W. J. (1987). Co-integration and error correction: Representation, estimation, and testing. Econometrica, 55(2), 251–276.
- Ersin, Ö. Ö., & Bildirici, M. (2023). Financial volatility modeling with the GARCH-MIDAS-LSTM approach: The effects of economic expectations, geopolitical risks and industrial production during COVID-19. Mathematics, 11(8), 1785.
- European Commission. (2019). Ethics guidelines for trustworthy AI. Available online: https://digital-strategy.ec.europa.eu/en/library/ethics-guidelines-trustworthy-ai (accessed on 1 January 2025).
- Ferreira, I. H., & Medeiros, M. C. (2021). Modeling and forecasting intraday market returns: A machine learning approach. arXiv.
- Financial Accounting Standards Board. (2021). Accounting standards update no. 2021-04: Error correction. FASB.
- Finn, C., Abbeel, P., & Levine, S. (2017). Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th international conference on machine learning (Vol. 70, pp. 1126–1135). PMLR.
- Fischer, T., & Krauss, C. (2018). Deep learning with long short-term memory networks for financial market predictions. European Journal of Operational Research, 270(2), 654–669.
- Gao, H., Kou, G., Liang, H., Zhang, H., Chao, X., Li, C., & Dong, Y. (2024). Machine learning in business and finance: A literature review and research opportunities. Financial Innovation, 10, 86.
- Goodell, J. W., Kumar, S., Lim, W. M., & Pattnaik, D. (2021). Artificial intelligence and machine learning in finance: Identifying foundations, themes, and research clusters from bibliometric analysis. Journal of Behavioral and Experimental Finance, 32, 100577.
- Greff, K., Srivastava, R. K., Koutník, J., Steunebrink, B. R., & Schmidhuber, J. (2017). LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems, 28(10), 2222–2232.
- Hamilton, J. D. (1989). A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica, 57(2), 357–384.
- Harikumar, Y., & Muthumeenakshi, M. (2025). An innovative study on stock price prediction for investment decision through ARIMA and LSTM with recurrent neural network. New Mathematics and Natural Computation, 21(3), 763–783.
- Harvey, D. I., Leybourne, S. J., & Newbold, P. (1997). Testing the equality of prediction mean squared errors. International Journal of Forecasting, 13(2), 281–291.
- Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
- Huang, A. H., Lehavy, R., Zang, A. Y., & Zheng, R. (2018). Analyst information discovery and interpretation roles: A topic modeling approach. Management Science, 64(6), 2833–2855.
- Hyndman, R. J., & Athanasopoulos, G. (2021). Forecasting: Principles and practice (3rd ed.). OTexts. Available online: https://otexts.com/fpp3/ (accessed on 1 January 2025).
- Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90(430), 773–795.
- Kim, M., & Kross, W. (2005). The ability of earnings to predict future operating cash flows has been increasing—Not decreasing. Journal of Accounting Research, 43(5), 753–774.
- Kumbure, M. M., Lohrmann, C., Luukka, P., & Porras, J. (2022). Machine learning techniques and data for stock market forecasting: A literature review. Expert Systems with Applications, 197, 116659. [Google Scholar] [CrossRef]
- Lim, B., & Zohren, S. (2021). Time-series forecasting with deep learning: A survey. Philosophical Transactions of the Royal Society A, 379(2194), 20200209. [Google Scholar] [CrossRef]
- Loughran, T., & McDonald, B. (2016). Textual analysis in accounting and finance: A survey. Journal of Accounting Research, 54(4), 1187–1230. [Google Scholar] [CrossRef]
- Lundberg, S. M., & Lee, S.-I. (2017). A unified approach to interpreting model predictions. In Proceedings of the 31st international conference on neural information processing systems (pp. 4765–4774). Curran Associates Inc. [Google Scholar]
- Makridakis, S., Spiliotis, E., & Assimakopoulos, V. (2020). The M4 Competition: 100,000 time series and 61 forecasting methods. International Journal of Forecasting, 36(1), 54–74. [Google Scholar] [CrossRef]
- Marcellino, M., Stock, J. H., & Watson, M. W. (2006). A comparison of direct and iterated multistep AR methods for forecasting macroeconomic time series. Journal of Econometrics, 135(1–2), 499–526. [Google Scholar] [CrossRef]
- Miller, M. H., & Orr, D. (1966). A model of the demand for money by firms. Quarterly Journal of Economics, 80(3), 413–435. [Google Scholar] [CrossRef]
- Nelson, D. B. (1991). Conditional heteroskedasticity in asset returns: A new approach. Econometrica, 59(2), 347–370. [Google Scholar] [CrossRef]
- Nie, Y., Nguyen, N. H., Sinthong, P., & Kalagnanam, J. (2023). A time series is worth 64 words: Long-term forecasting with transformers. arXiv. [Google Scholar] [CrossRef]
- O’Brien, R. M. (2007). A caution regarding rules of thumb for variance inflation factors. Quality & Quantity, 41(5), 673–690. [Google Scholar] [CrossRef]
- Parkinson, M. (1980). The extreme value method for estimating the variance of the rate of return. Journal of Business, 53(1), 61–65. [Google Scholar] [CrossRef]
- Patton, A. J., & Sheppard, K. (2009). Optimal combinations of realised volatility estimators. International Journal of Forecasting, 25(2), 218–238. [Google Scholar] [CrossRef]
- Pesaran, M. H., & Timmermann, A. (1995). Predictability of stock returns: Robustness and economic significance. The Journal of Finance, 50(4), 1201–1228. [Google Scholar] [CrossRef]
- Petropoulos, F., & Spiliotis, E. (2025). Judgmental selection of parameters for simple forecasting models. European Journal of Operational Research, 323(4), 780–794. [Google Scholar] [CrossRef]
- Shafer, G., & Vovk, V. (2008). A tutorial on conformal prediction. Journal of Machine Learning Research, 9, 371–421. [Google Scholar]
- Shi, X. J., Chen, Z. R., Wang, H., Yeung, D. Y., Wong, W. K., & Woo, W. C. (2015). Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Advances in Neural Information Processing Systems, 28, 802–810. [Google Scholar]
- Stock, J. H., & Watson, M. W. (2002). Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association, 97(460), 1167–1179. [Google Scholar] [CrossRef]
- Sun, Y., Liu, L., Xu, Y., Zeng, X., Shi, Y., Hu, H., Jiang, J., & Abraham, A. (2024). Alternative data in finance and business: Emerging applications and theory analysis (review). Financial Innovation, 10, 127. [Google Scholar] [CrossRef]
- Taneva-Angelova, G., & Granchev, D. (2025). Deep learning and transformer architectures for volatility forecasting: Evidence from U.S. equity indices. Journal of Risk and Financial Management, 18(12), 685. [Google Scholar] [CrossRef]
- Tauchen, G., & Zhou, H. (2011). Realized jumps on financial markets and predicting credit spreads. Journal of Econometrics, 160(1), 102–118. [Google Scholar] [CrossRef]
- U.S. Department of the Treasury. (2024). Artificial intelligence in financial services: Report on the uses, opportunities, and risks of artificial intelligence in the financial services sector (p. 36). U.S. Department of the Treasury. [Google Scholar]
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30, 5998–6008. [Google Scholar]
- Virk, N., Javed, F., Awartani, B., & Hyde, S. (2024). A reality check on the GARCH-MIDAS volatility models. European Journal of Finance, 30(6), 575–596. [Google Scholar] [CrossRef]
- West, K. D. (1996). Asymptotic inference about predictive ability. Econometrica, 64(5), 1067–1084. [Google Scholar] [CrossRef]
- Yang, D., & Zhang, Q. (2000). Drift-independent volatility estimation based on high, low, open, and close prices. Journal of Business, 73(3), 477–491. [Google Scholar] [CrossRef]
- Zellner, A., & Hong, C. (1989). Forecasting international growth rates using Bayesian shrinkage and other procedures. Journal of Econometrics, 40(1), 183–202. [Google Scholar] [CrossRef]
- Zeng, Z., Kaur, R., Siddagangappa, S., Rahimi, S., Balch, T., & Veloso, M. (2023). Financial time series forecasting using CNN and transformer. arXiv. [Google Scholar] [CrossRef]
- Zhang, L., Mykland, P. A., & Aït-Sahalia, Y. (2005). A tale of two time scales: Determining integrated volatility with noisy high-frequency data. Journal of the American Statistical Association, 100(472), 1394–1411. [Google Scholar] [CrossRef]
- Zhang, Y., Zhang, T., & Hu, J. (2025). Forecasting stock market volatility using CNN-BiLSTM-attention model with mixed-frequency data. Mathematics, 13(11), 1889. [Google Scholar] [CrossRef]
- Zhang, Z., Chen, B., Zhu, S., & Langrené, N. (2025). Quantformer: From attention to profit with a quantitative transformer trading strategy. arXiv. [Google Scholar] [CrossRef]

Sample descriptive statistics of quarterly operating cash flow (OCF) by firm:

| Company | Ticker | Period | Quarters | Mean OCF ($M) | Std Dev ($M) | CV | Min ($M) | Max ($M) |
|---|---|---|---|---|---|---|---|---|
| Microsoft | MSFT | 2011–2024 | 52 | 24,858 | 4842 | 0.19 | 17,300 | 31,800 |
| Apple | AAPL | 2011–2024 | 52 | 29,275 | 8421 | 0.29 | 22,600 | 47,000 |
| Amazon | AMZN | 2012–2024 | 48 | 21,192 | 9843 | 0.46 | 11,500 | 42,900 |
| Alphabet | GOOGL | 2012–2024 | 48 | 23,108 | 4215 | 0.18 | 17,400 | 29,500 |
| Meta | META | 2013–2024 | 44 | 13,358 | 5267 | 0.39 | 5200 | 20,400 |
| Total | — | — | 244 | 22,358 | 6518 | 0.29 | 5200 | 47,000 |
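
The CV column can be reproduced directly from the mean and standard deviation columns (CV = Std Dev / Mean). A quick check against the table's figures (all in $M):

```python
# Verify the coefficient-of-variation column: CV = Std Dev / Mean.
# (mean OCF, std dev) pairs in $M, taken from the table above.
firms = {
    "MSFT": (24_858, 4_842),
    "AAPL": (29_275, 8_421),
    "AMZN": (21_192, 9_843),
    "GOOGL": (23_108, 4_215),
    "META": (13_358, 5_267),
}
cvs = {ticker: round(std / mean, 2) for ticker, (mean, std) in firms.items()}
for ticker, cv in cvs.items():
    print(f"{ticker}: CV = {cv:.2f}")
```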

Effect of training-data authenticity on out-of-sample forecast accuracy:

| Model (Training Data) | MAPE | RMSE ($M) | R² | Bias (MAPE) |
|---|---|---|---|---|
| Ensemble (Estimated) | 4.5% | 632 | 0.96 | - |
| Ensemble (Authentic) | 7.9% | 1110 | 0.92 | +75.6% |
| ARIMA (Authentic) | 11.2% | 1924 | 0.84 | - |
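
The +75.6% entry expresses the MAPE gap relative to the estimated-data baseline; the ~43% figure cited in the paper's contributions is the same gap expressed relative to the authentic-data MAPE. The two are consistent:

```python
# Reconcile the two expressions of the estimated-data bias.
mape_estimated = 4.5   # ensemble evaluated on estimated data (table above)
mape_authentic = 7.9   # same ensemble evaluated on authentic SEC data

gap = mape_authentic - mape_estimated                       # 3.4 pp
bias_vs_estimated = round(gap / mape_estimated * 100, 1)    # relative to estimated baseline
bias_vs_authentic = round(gap / mape_authentic * 100, 1)    # relative to authentic MAPE
print(f"+{bias_vs_estimated}% vs. estimated; {bias_vs_authentic}% vs. authentic")
```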

Regime-conditional forecast accuracy (MAPE) by HMM state:

| Regime (State) | Frequency | XGBoost MAPE | LSTM MAPE | Dynamic Ensemble MAPE |
|---|---|---|---|---|
| State 1 (Stable) | 62% | 6.5% | 6.9% | 6.4% |
| State 2 (Transitional) | 24% | 7.9% | 8.2% | 7.7% |
| State 3 (Turbulent) | 14% | 13.1% | 11.5% | 11.2% |
| Overall | 100% | 8.2% | 8.6% | 7.9% |
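
The paper's exact combination rule is not reproduced here; as a minimal sketch, one common way to realize regime-aware dynamic weighting is an inverse-error softmax over each model's recent, regime-filtered MAPE. The model names, the temperature parameter `tau`, and the example point forecasts below are illustrative assumptions, not the authors' specification:

```python
import math

def dynamic_weights(recent_errors: dict[str, float], tau: float = 1.0) -> dict[str, float]:
    """Softmax over negative recent regime-filtered MAPEs: lower error -> higher weight.

    recent_errors: model name -> recent MAPE within the currently detected regime.
    tau: temperature smoothing the weights (an illustrative assumption).
    """
    # Subtract the max score for numerical stability before exponentiating.
    max_score = max(-err / tau for err in recent_errors.values())
    exp_scores = {name: math.exp(-err / tau - max_score) for name, err in recent_errors.items()}
    total = sum(exp_scores.values())
    return {name: s / total for name, s in exp_scores.items()}

# Turbulent-regime errors from the table above: the LSTM's lower recent MAPE
# earns it the larger combination weight.
weights = dynamic_weights({"xgboost": 13.1, "lstm": 11.5})
combined = weights["xgboost"] * 100.0 + weights["lstm"] * 98.0  # hypothetical point forecasts
```

Under this sketch, the weights automatically tilt toward whichever component has been more accurate in the prevailing regime, which is consistent with the dynamic ensemble outperforming both components in every state of the table above.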

Benchmark comparison on authentic data:

| Model | MAPE | RMSE ($M) | MAE ($M) |
|---|---|---|---|
| Seasonal Naïve | 14.3% | 2156 | 1682 |
| ARIMAX | 11.2% | 1924 | 1513 |
| Prophet | 10.8% | 1847 | 1448 |
| N-BEATS | 9.5% | 1625 | 1296 |
| Temporal Fusion Transformer (TFT) | 8.9% | 1521 | 1215 |
| Simple Ensemble (Unweighted Average) | 8.7% | 1450 | 1152 |
| Dynamic Ensemble (Proposed) | 7.9% | 1110 | 891 |

Meta-learning transfer results for out-of-sample firms:

| Target Firm | Quarters Available | From-Scratch MAPE | Meta-Learning MAPE | Improvement | FDS |
|---|---|---|---|---|---|
| Tesla (TSLA) | 20 | 14.3% | 10.8% | 24.5% | 0.78 |
| Nvidia (NVDA) | 24 | 12.6% | 9.2% | 27.0% | 0.85 |
| Netflix (NFLX) | 28 | 11.8% | 8.9% | 24.6% | 0.81 |
| Adobe (ADBE) | 32 | 10.5% | 8.1% | 22.9% | 0.88 |
| Salesforce (CRM) | 36 | 11.2% | 8.7% | 22.3% | 0.79 |
| Average | 28 | 12.1% | 9.1% | 24.3% | 0.82 |
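
The Improvement column is the relative MAPE reduction achieved by meta-learning, i.e., (from-scratch − meta-learning) / from-scratch. A quick verification against the table:

```python
# Verify the Improvement column: relative MAPE reduction from meta-learning.
rows = [
    ("TSLA", 14.3, 10.8),
    ("NVDA", 12.6, 9.2),
    ("NFLX", 11.8, 8.9),
    ("ADBE", 10.5, 8.1),
    ("CRM", 11.2, 8.7),
]
improvements = {}
for ticker, scratch_mape, meta_mape in rows:
    improvements[ticker] = round((scratch_mape - meta_mape) / scratch_mape * 100, 1)
    print(f"{ticker}: {improvements[ticker]:.1f}% improvement")
```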
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Fahad, A.M.; Jearah, N.S. Authentic SEC Data and Regime-Aware Ensemble Learning for Corporate Cash Flow Forecasting. J. Risk Financial Manag. 2026, 19, 333. https://doi.org/10.3390/jrfm19050333
