A Modular Knowledge-Extraction Framework for Deep Learning Forecasts of Multi-Tier Commodity Prices
Abstract
1. Introduction
1.1. Related Work
1.1.1. Cross-Series Neural Forecasting
1.1.2. Regime-Switching and Gated Architectures
1.1.3. Decomposition-Based Deep Learning
1.1.4. Vertical Price Transmission and Constrained Multivariate Forecasting
1.1.5. Relationship to Prior Author Work
1.2. Research Question and Proposed Framework
1.3. Two Architectural Propositions
1.4. Contributions and Paper Organisation
- A directed cross-market attention layer that combines a domain-informed topology between vertically linked market tiers with time-varying attention intensities learned from data, with attention queries restricted to lower tiers and keys to upstream tiers (Section 2, Component C1).
- A two-stage regime-informed modal-weighting layer that mixes two trainable softmax weight profiles over IMF-aligned latent components through a filtered Markov-switching state probability, exposing regime-dependent latent-component weighting as an inspectable architectural object rather than absorbing it into the encoder’s weights (Section 2, Component C2).
- An empirical evaluation on a multi-tier commodity-price system spanning 2038 trading days, reporting root mean squared error against three external benchmarks (random walk, ARIMA(2,0,2), and an exogenous-only LSTM), together with a constraint-projection comparison and a decision-aware income-smoothing metric for farm-gate forecasts, providing the evidence on which the contributions of C1 and C2 are assessed (Section 3). Complementary point-error metrics (MAE and MASE) and Diebold–Mariano significance tests are provided in the companion repository submitted as Supplementary Material; the primary per-component ablations (E1 and E2) are reported in the main text (Section 3.4), with the broader ablation set in the companion repository.
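The directed routing described for Component C1 can be illustrated with a small NumPy sketch. It is not the paper's implementation: it assumes one state vector per tier, a fixed boolean topology in which R may attend to F and G may attend to F and R, a single attention head, and a residual connection. The names `ALLOWED`, `directed_cross_tier_attention`, `w_q`, `w_k`, and `w_v` are hypothetical.

```python
import numpy as np

TIERS = ["F", "R", "G"]  # ordered upstream -> downstream

# ALLOWED[i, j] is True when tier i (query) may attend to tier j (key).
# Assumed topology: R looks at F; G looks at F and R; F has no upstream tier.
ALLOWED = np.array([
    [False, False, False],  # F
    [True,  False, False],  # R
    [True,  True,  False],  # G
])

def directed_cross_tier_attention(h, w_q, w_k, w_v):
    """h: (3, d) tier states. Returns (updated states, attention matrix)."""
    q, k, v = h @ w_q, h @ w_k, h @ w_v
    scores = (q @ k.T) / np.sqrt(k.shape[1])
    # The domain-informed topology enters as a hard mask on the scores.
    scores = np.where(ALLOWED, scores, -np.inf)
    alpha = np.zeros_like(scores)
    has_upstream = ALLOWED.any(axis=1)
    masked = scores[has_upstream]
    masked = np.exp(masked - masked.max(axis=1, keepdims=True))
    alpha[has_upstream] = masked / masked.sum(axis=1, keepdims=True)
    # Residual update (an assumption): tiers without upstream keys pass through.
    return h + alpha @ v, alpha

rng = np.random.default_rng(0)
d, d_k = 4, 2
h = rng.normal(size=(3, d))
out, alpha = directed_cross_tier_attention(
    h,
    rng.normal(size=(d, d_k)),
    rng.normal(size=(d, d_k)),
    rng.normal(size=(d, d)),
)
```

In this simplified form the mask fixes where information may flow, while the learned scores decide how strongly. Tier R has a single upstream key, so its attention weight on F is trivially 1, and tier F is left unchanged.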
2. Materials and Methods
2.1. Data: A Three-Tier Vertically Linked Rubber Price System
2.1.1. Tier Construction
2.1.2. Exogenous Covariates
2.1.3. Splits
2.2. Regime Detection (Offline Preprocessing)
Smoothed Versus Filtered Probabilities
2.3. Per-Tier Variational Mode Decomposition (Offline Preprocessing)
Causal Rolling Re-Decomposition
2.4. Hierarchical Encoder with Domain-Informed Cross-Tier Attention
2.4.1. Tier-Level BiLSTM Encoders
2.4.2. Directed Cross-Tier Attention (Component C1)
Degenerate Attention at Tier R
2.5. Two-Stage Regime-Informed Modal-Weighting Layer (Component C2)
2.5.1. Layer Construction
2.5.2. Layer Properties
2.6. Forecast Head and Training Loss
Log-to-Level Conversion
2.7. Auxiliary Sample-Ratio Constraint Projection (Component C3)
2.7.1. Static Constraint Projection
2.7.2. Regime-Conditional Constraint Projection
Properties of the Regime-Conditional Projection
2.8. Training Protocol
2.9. Framework Summary
2.10. Baselines and Evaluation Protocol
2.10.1. Reported Metrics
RMSE
MAE
MASE
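The three point-error metrics can be written compactly as below. This is a sketch using the standard definitions, with MASE scaled by the in-sample mean absolute error of a naive lag-`m` forecast; the choice `m=1` and the function names are assumptions, not details taken from the paper.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean squared error."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    """Mean absolute error."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs(y - y_hat)))

def mase(y, y_hat, y_train, m=1):
    """MAE divided by the in-sample MAE of the naive lag-m forecast."""
    y_train = np.asarray(y_train, float)
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return mae(y, y_hat) / float(scale)

y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 5.0]
train = [1.0, 2.0, 4.0, 7.0]
print(rmse(y_true, y_pred), mae(y_true, y_pred), mase(y_true, y_pred, train))
```

A MASE below 1 means the forecast beats the in-sample naive benchmark on average absolute error, which makes the metric comparable across tiers with different price levels.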
2.10.2. Baselines
Comparison Fairness
2.10.3. Component Ablations
- E1: No regime gating. Force during training. The regime-informed modal-weighting layer (Section 2.5) reduces to a fixed-weight aggregation z = and the filtered gating signal no longer affects forecasts. This isolates proposition P2 of Section 1.3.
- E2: No cross-tier attention. Bypass the cross-tier attention layers: and . Each tier’s encoder produces its forecast from its own VMD inputs and exogenous covariates only. This isolates proposition P1 of Section 1.3.
- E5: No VMD. Replace the five IMF inputs with the raw price series (or a wider window of recent prices). Tests whether the per-tier decomposition is responsible for the encoder’s signal.
- E7: No constraint projection. Use the unreconciled base forecasts directly, without applying the post hoc projection of Section 2.7. Tests whether the auxiliary projection improves out-of-sample RMSE.
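To make the E1 reduction concrete, the following sketch mixes two softmax weight profiles over five latent components through a regime probability, then shows that holding the gate constant yields weights that no longer depend on the filtered signal. The exact quantity forced in E1 is not recoverable from the text here, so the constant gate value `GATE_CONST = 0.5`, the interpretation of `p` as the filtered high-volatility probability, and all function names are assumptions.

```python
import numpy as np

def softmax(a):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(a - np.max(a))
    return e / e.sum()

def modal_weights(logits_high, logits_calm, p):
    """Convex mix of two softmax profiles, gated by p in [0, 1]."""
    return p * softmax(logits_high) + (1.0 - p) * softmax(logits_calm)

def aggregate(components, weights):
    """components: (K, d) latent components; returns their weighted sum (d,)."""
    return weights @ components

K, d = 5, 3  # five IMF-aligned components, hypothetical latent width
rng = np.random.default_rng(0)
components = rng.normal(size=(K, d))
logits_high = rng.normal(size=K)
logits_calm = rng.normal(size=K)

# Full model: weights move with the filtered regime probability.
w_calm_day = modal_weights(logits_high, logits_calm, p=0.1)
w_high_day = modal_weights(logits_high, logits_calm, p=0.9)
z = aggregate(components, w_high_day)

# E1-style ablation (assumed form): the gate ignores the filtered signal.
GATE_CONST = 0.5
filtered_probs = [0.1, 0.9]
w_e1 = [modal_weights(logits_high, logits_calm, GATE_CONST) for _ in filtered_probs]
```

Because each profile is a softmax and the mix is convex, the resulting weights are non-negative and sum to one for any `p`, which is what makes them readable as an inspectable weighting over components.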
2.11. Decision-Loss Simulation
- Decision Rule
- Realised Income
- Income-Smoothing Metric
2.12. Reproducibility
3. Results
3.1. Experimental Setup
Reported Metrics
3.2. Base Forecast Accuracy
3.2.1. Horizon Dependence
3.2.2. Variance Diagnostic
3.3. Benchmark Comparison
- HVB-RA uniformly outperforms the contemporary deep learning baseline. Against NHITS, supplied with the same five exogenous covariates and evaluated on the identical aligned test origins, HVB-RA attains lower RMSE in all nine of nine tier-horizon cells, with the margin widening at long horizons (R-: versus ; G-: versus ). This establishes that HVB-RA, taken as a complete configuration, attains lower point error than a current deep forecaster on the same information set; the marginal point-error contribution of the individual components is examined separately in the component ablation of Section 3.4.
- No method improves on the random-walk floor at most cells. Under the residual target definition the no-change forecast is the natural zero-prediction baseline. No method—classical, exogenous, contemporary, or the proposed framework—reduces RMSE below the random-walk floor at the majority of tier-horizon cells. This is the expected behaviour for daily commodity prices under weak-form efficiency and is reported here without qualification. The proposed framework does not claim to dominate the near-floor classical baselines on point error; its demonstrated advantage is over the contemporary deep learning baseline.
- The farm-gate tier remains well-described by short-memory dynamics. ARIMA(2,0,2) and the random walk are strongest at the farm-gate tier, consistent with the relatively slower information flow there, where linear short-memory dynamics provide a sufficient statistical model. This is also consistent with prior single-tier evidence for the same commodity: Pinitjitsamut [6] reports a competitive ARIMA baseline on a single-tier rubber-price forecasting task.
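The statement that the no-change forecast is the zero-prediction baseline under a residual target can be checked numerically. The sketch below assumes the residual target is the h-step price change on whatever scale the series is supplied (logs or levels); the function names and the toy prices are hypothetical.

```python
import numpy as np

def residual_targets(prices, h):
    """Price change y[t+h] - y[t] for every origin t with an observed target."""
    prices = np.asarray(prices, float)
    return prices[h:] - prices[:-h]

def rw_floor_rmse(prices, h):
    """RMSE of predicting zero change, i.e. the RMS of the residual targets."""
    r = residual_targets(prices, h)
    return float(np.sqrt(np.mean(r ** 2)))

def no_change_rmse(prices, h):
    """RMSE of the no-change forecast y_hat[t+h] = y[t], computed on the price scale."""
    prices = np.asarray(prices, float)
    return float(np.sqrt(np.mean((prices[h:] - prices[:-h]) ** 2)))

toy_prices = [1.50, 1.52, 1.49, 1.55, 1.53, 1.58]
for h in (1, 2):
    print(h, rw_floor_rmse(toy_prices, h), no_change_rmse(toy_prices, h))
```

The two quantities coincide by construction, so any model that outputs a residual forecast can only beat the floor by predicting the sign and size of the change better than zero does.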
Statistical Significance of the RMSE Differences
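A pairwise comparison of this kind can be sketched as follows. The code computes a Diebold–Mariano statistic under squared-error loss with the Harvey–Leybourne–Newbold small-sample correction, and a Holm step-down adjustment for a family of p-values. Two simplifications are assumptions of this sketch rather than details from the paper: the p-value uses a normal approximation instead of the Student-t reference distribution, and the long-run variance uses a rectangular window of h−1 autocovariances. All function names are hypothetical.

```python
import math
import numpy as np

def dm_statistic(e1, e2, h=1):
    """HLN-corrected Diebold-Mariano statistic for squared-error loss."""
    d = np.asarray(e1, float) ** 2 - np.asarray(e2, float) ** 2
    n = d.size
    dc = d - d.mean()
    # Long-run variance of the loss differential from lags 0 .. h-1.
    lrv = (dc @ dc) / n
    for k in range(1, h):
        lrv += 2.0 * (dc[k:] @ dc[:-k]) / n
    if lrv <= 0:
        raise ValueError("non-positive long-run variance estimate")
    dm = d.mean() / math.sqrt(lrv / n)
    correction = math.sqrt((n + 1 - 2 * h + h * (h - 1) / n) / n)
    return dm * correction

def two_sided_p_normal(stat):
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(stat) / math.sqrt(2.0))

def holm_adjust(pvals):
    """Holm step-down adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, float)
    m = p.size
    adjusted = np.empty(m)
    running = 0.0
    for rank, idx in enumerate(np.argsort(p)):
        running = max(running, (m - rank) * p[idx])
        adjusted[idx] = min(1.0, running)
    return adjusted

errors_a = np.array([0.1, -0.2, 0.1, -0.3, 0.2, -0.1])
errors_b = np.array([0.5, -0.4, 0.6, -0.7, 0.3, -0.5])
stat = dm_statistic(errors_a, errors_b, h=1)
p_value = two_sided_p_normal(stat)
```

A negative statistic indicates that the first forecast has the lower squared error on average. With nine tier-horizon cells tested at once, passing the nine p-values through `holm_adjust` controls the family-wise error rate.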
3.4. Real-Data Component Ablation
3.5. Constraint Projection
Out-of-Sample Covariance Estimation
- The unreconciled base forecasts attain the lowest RMSE at every tier; the projection trades point accuracy for exact coherence. The base forecasts attain the lowest RMSE on each tier and the lowest average RMSE ( USD/kg) but a non-zero constraint violation (). All projection variants drive the violation to zero at the cost of higher RMSE: the static (and regime-conditional) projection raises average RMSE to (), top–down to , and bottom–up to . Each tier’s base forecast is already near its own random-walk floor (Section 3.3), so the cross-tier coherence constraint necessarily pulls forecasts away from those individually-good values.
- Static and regime-conditional projections coincide in this test period. Both produce identical RMSE because the convex mix reduces to when (the fallback rule activates because no calibration observation falls in the calm regime). Regime conditioning would become identifiable only in test windows that contain both regimes; on the present split it is non-identified, a property of this particular calibration–test window rather than of the mechanism.
- The auxiliary projection is a coherence-enforcement option, not an accuracy-improvement mechanism, on this window. All four projection methods reduce the constraint violation from to effectively zero (). When exact cross-tier coherence is required by a downstream consumer, the projection supplies it at a quantified RMSE cost; when it is not required, the base forecasts are preferable. Whether the projection improves accuracy under regime mixing or richer feature sets remains an open question discussed in Section 4.
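The two mechanical facts in this list, that a projection drives the constraint violation to zero and that the regime-conditional variant collapses to the static one when only one regime is observed, can be reproduced in a few lines. The sketch assumes a linear constraint set of the form A y = 0 built from fixed tier ratios, a covariance-weighted projection, and a fallback to the pooled covariance when a regime has too few calibration residuals. The ratio values, `min_obs`, and every function name are hypothetical, and the residuals are simulated.

```python
import numpy as np

def project(y_hat, A, W):
    """Project y_hat onto {y : A y = 0}, weighting deviations by covariance W."""
    correction = W @ A.T @ np.linalg.solve(A @ W @ A.T, A @ y_hat)
    return y_hat - correction

def regime_covariances(resid, is_high, min_obs=5):
    """Pooled, high-volatility and calm covariances with a pooled fallback."""
    pooled = np.cov(resid, rowvar=False)

    def cov_or_pooled(mask):
        if mask.sum() >= min_obs:
            return np.cov(resid[mask], rowvar=False)
        return pooled  # fallback rule: too few observations in this regime

    return pooled, cov_or_pooled(is_high), cov_or_pooled(~is_high)

def mixed_covariance(pi_high, w_high, w_calm):
    """Convex mix of regime covariances by the high-volatility probability."""
    return pi_high * w_high + (1.0 - pi_high) * w_calm

# Hypothetical ratio constraints: R = 0.9 * F and G = 0.8 * R.
A = np.array([[0.9, -1.0, 0.0],
              [0.0, 0.8, -1.0]])

rng = np.random.default_rng(0)
resid = 0.1 * rng.normal(size=(50, 3))   # simulated calibration residuals (F, R, G)
is_high = np.ones(50, dtype=bool)        # every origin labelled high-volatility

pooled, w_high, w_calm = regime_covariances(resid, is_high)
y_hat = np.array([1.80, 1.60, 1.30])     # hypothetical base forecasts, USD/kg

y_static = project(y_hat, A, pooled)
y_regime = project(y_hat, A, mixed_covariance(1.0, w_high, w_calm))
violation_before = np.abs(A @ y_hat).max()
violation_after = np.abs(A @ y_static).max()
```

With every calibration origin in one regime, the high-volatility covariance equals the pooled one and the calm covariance falls back to it, so the mixed covariance is the pooled covariance for any mixing probability and the two projections return the same forecasts.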
3.6. Decision-Aware Income Smoothing
3.7. Other Robustness Diagnostics
3.8. Summary of Findings
- Finding 1: HVB-RA outperforms the contemporary deep learning baseline at every cell. Against NHITS, a current neural hierarchical-interpolation forecaster supplied with the same five exogenous covariates and evaluated on the identical aligned test origins, HVB-RA achieves lower RMSE at all nine of nine tier-horizon cells (Table 7), with the margin widening at long horizons (R-: versus ; G-: versus ). The full HVB-RA configuration thus attains lower error than a current deep forecaster on the same information set. The component ablation (Section 3.4) localises this aggregate result: on the present single-regime window the cross-tier attention and regime-conditioning components do not themselves add point-forecast value, so the advantage over the contemporary baseline is not attributable to them.
- Finding 2: Classical baselines are strongest at the farm-gate tier. ARIMA(2,0,2) and the random walk attain the lowest RMSE across the farm-gate horizons (Table 7). Linear short-memory dynamics appear sufficient for the slower farm-gate price process at the resolution of this evaluation.
- Finding 3: No method improves on the random-walk floor at most cells. Under the residual target definition the no-change forecast is the natural zero-prediction baseline. No method—classical, exogenous, contemporary, or the proposed framework—reduces RMSE below the random-walk floor at the majority of tier-horizon cells, the expected behaviour for daily commodity prices under weak-form efficiency. HVB-RA’s demonstrated advantage is over the contemporary deep learning baseline (Finding 1), not over the near-floor classical baselines. The regime-conditioned components of the architecture require multi-regime evaluation for identification (Section 4.1), whereas the cross-tier routing component is evaluated on the present test window through the E2 ablation reported in Section 3.4.
- Finding 4: The regime-conditional component of the constraint projection is not identifiable on this test window. The trained Markov-switching classifier assigns every calibration origin and every test origin to the high-volatility state; the regime-conditional projection therefore reduces numerically to the static projection. Identification of the regime-conditional contribution requires a calibration–test split that contains both regimes.
- Finding 5: Decision-aware income smoothing is a null operational result in the current configuration. The HVB-RA decision rule produces with a descriptive seed-based interval that includes zero and a delay frequency that ranges from 0 to 278 across seeds, indicating that the rule operates near its degenerate boundary. The decision-aware evaluation is reported as a methodological template, not as a substantive operational contribution.
4. Discussion
4.1. What the Evaluation Identifies, and What It Does Not
4.2. The Single-Regime Test Window
4.3. Why the Backbone and Exogenous-Only LSTM Are So Close
4.4. The Framework as a Knowledge-Extraction Proposal
4.5. Generalisation Beyond Rubber
4.6. Limitations
4.7. Priorities for Follow-Up Evaluation
4.8. Closing Remarks
5. Conclusions
Supplementary Materials
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations

| Abbreviation | Definition |
|---|---|
| AR | Autoregressive |
| ARIMA | Autoregressive Integrated Moving Average |
| BiLSTM | Bidirectional Long Short-Term Memory |
| CI | Confidence Interval |
| DM | Diebold–Mariano (test) |
| F | Futures tier |
| G | Farm-gate tier |
| HVB-RA | Hybrid VMD–BiLSTM with Regime-Aware components |
| IMF | Intrinsic Mode Function |
| IS | Income Smoothing |
| LSTM | Long Short-Term Memory |
| MASE | Mean Absolute Scaled Error |
| MS-AR | Markov-Switching Autoregressive |
| R | Regional (spot) tier |
| RMSE | Root Mean Squared Error |
| SGX | Singapore Exchange |
| SHFE | Shanghai Futures Exchange |
| StdR | Standard-Deviation Ratio |
| TOCOM | Tokyo Commodity Exchange |
| VMD | Variational Mode Decomposition |

References
1. Ge, Y.; Wang, H.H.; Ahn, S.K. Cotton market integration and the impact of China’s new exchange rate regime. Agric. Econ. 2014, 45, 5–27.
2. Khin, A.A.; Ramli, M.A.F.B. Price transmission and volatility spillovers in the natural rubber market: Evidence from major rubber-producing countries. Int. J. Supply Chain. Manag. 2019, 8, 432–440.
3. Wu, Z.; Pan, S.; Long, G.; Jiang, J.; Chang, X.; Zhang, C. Connecting the dots: Multivariate time series forecasting with graph neural networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD ’20), Virtual Event, 6–10 July 2020; pp. 753–763.
4. Shazeer, N.; Mirhoseini, A.; Maziarz, K.; Davis, A.; Le, Q.; Hinton, G.; Dean, J. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv 2017, arXiv:1701.06538.
5. Dragomiretskiy, K.; Zosso, D. Variational mode decomposition. IEEE Trans. Signal Process. 2014, 62, 531–544.
6. Pinitjitsamut, M. Multi-scale forecasting of natural rubber prices using VMD-augmented BiLSTM: A hybrid architecture ablation study. Forecasting 2026, 8, 43.
7. Olivares, K.G.; Challu, C.; Marcjasz, G.; Weron, R.; Dubrawski, A. Neural basis expansion analysis with exogenous variables: Forecasting electricity prices with NBEATSx. Int. J. Forecast. 2023, 39, 884–900.
8. Lim, B.; Arik, S.Ö.; Loeff, N.; Pfister, T. Temporal Fusion Transformers for interpretable multi-horizon time series forecasting. Int. J. Forecast. 2021, 37, 1748–1764.
9. Salinas, D.; Flunkert, V.; Gasthaus, J.; Januschowski, T. DeepAR: Probabilistic forecasting with autoregressive recurrent networks. Int. J. Forecast. 2020, 36, 1181–1191.
10. Challu, C.; Olivares, K.G.; Oreshkin, B.N.; Garza, F.G.; Mergenthaler-Canseco, M.; Dubrawski, A. NHITS: Neural hierarchical interpolation for time series forecasting. Proc. AAAI Conf. Artif. Intell. 2023, 37, 6989–6997.
11. Nie, Y.; Nguyen, N.H.; Sinthong, P.; Kalagnanam, J. A time series is worth 64 words: Long-term forecasting with transformers. arXiv 2022, arXiv:2211.14730.
12. Oreshkin, B.N.; Carpov, D.; Chapados, N.; Bengio, Y. N-BEATS: Neural basis expansion analysis for interpretable time series forecasting. arXiv 2019, arXiv:1905.10437.
13. Cao, D.; Wang, Y.; Duan, J.; Zhang, C.; Zhu, X.; Huang, C.; Tong, Y.; Xu, B.; Bai, J.; Tong, J.; et al. Spectral temporal graph neural network for multivariate time-series forecasting. Adv. Neural Inf. Process. Syst. 2020, 33, 17766–17778.
14. Hamilton, J.D. A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica 1989, 57, 357–384.
15. Klaassen, F. Improving GARCH volatility forecasts with regime-switching GARCH. Empir. Econ. 2002, 27, 363–394.
16. Teräsvirta, T. Specification, estimation, and evaluation of smooth transition autoregressive models. J. Am. Stat. Assoc. 1994, 89, 208–218.
17. van Dijk, D.; Teräsvirta, T.; Franses, P.H. Smooth transition autoregressive models—A survey of recent developments. Econom. Rev. 2002, 21, 1–47.
18. Tong, H. Non-Linear Time Series: A Dynamical System Approach; Oxford University Press: Oxford, UK, 1990.
19. Guidolin, M.; Pedio, M. Essentials of Time Series for Financial Applications; Academic Press: Cambridge, MA, USA, 2018.
20. Ang, A.; Timmermann, A. Regime changes and financial markets. Annu. Rev. Financ. Econ. 2012, 4, 313–337.
21. Bucci, A. Realized volatility forecasting with neural networks. J. Financ. Econom. 2020, 18, 502–531.
22. Marinho, P.; de Andrade, B.B.; Hotta, L.K. A regime-switching approach for forecasting commodity prices. J. Forecast. 2021, 40, 1090–1112.
23. Huang, N.E.; Shen, Z.; Long, S.R.; Wu, M.C.; Shih, H.H.; Zheng, Q.; Yen, N.-C.; Tung, C.C.; Liu, H.H. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. A 1998, 454, 903–995.
24. Engle, R.F.; Granger, C.W.J. Co-integration and error correction: Representation, estimation, and testing. Econometrica 1987, 55, 251–276.
25. Wickramasuriya, S.L.; Athanasopoulos, G.; Hyndman, R.J. Optimal forecast reconciliation for hierarchical and grouped time series through trace minimization. J. Am. Stat. Assoc. 2019, 114, 804–819.
26. Panagiotelis, A.; Gamakumara, P.; Athanasopoulos, G.; Hyndman, R.J. Probabilistic forecast reconciliation: Properties, evaluation and score optimisation. Eur. J. Oper. Res. 2023, 306, 693–706.
27. Kim, C.-J. Dynamic linear models with Markov-switching. J. Econom. 1994, 60, 1–22.
28. Hyndman, R.J.; Koehler, A.B. Another look at measures of forecast accuracy. Int. J. Forecast. 2006, 22, 679–688.
29. Harvey, D.; Leybourne, S.; Newbold, P. Testing the equality of prediction mean squared errors. Int. J. Forecast. 1997, 13, 281–291.
30. Diebold, F.X.; Mariano, R.S. Comparing predictive accuracy. J. Bus. Econ. Stat. 1995, 13, 253–263.
31. Holm, S. A simple sequentially rejective multiple test procedure. Scand. J. Stat. 1979, 6, 65–70.

| Tier | Constituent Series | Source |
|---|---|---|
| F (global futures) | TOCOM RSS3/SGX TSR20/SHFE RU front-month | TOCOM, SGX, SHFE official feeds |
| R (regional spot) | SICOM RSS3/MRB benchmark/GAPKINDO composite | SICOM, MRB, GAPKINDO |
| G (farm-gate) | RAOT (TH)/MRB-DOSM (MY)/GAPKINDO provincial (ID) | RAOT, DOSM, GAPKINDO |

| Setting | Value |
|---|---|
| Optimiser | AdamW, learning rate , weight decay |
| Batch size | 64 |
| Maximum epochs | 100 |
| Sequence length L | 60 trading days |
| BiLSTM hidden size | 64 |
| Attention head dimension | (single head) |
| Dropout (BiLSTM/attention) | 0.2/0.1 |
| Gradient clipping | Global norm 1.0 |
| Early stopping | Patience 20 epochs on calibration loss |
| Decision-loss threshold | |
| Random seeds | 5 seeds (3407, 42, 1234, 2024, 7777) |

| Section No. | Component | Module Type | Where Parameters Come from |
|---|---|---|---|
| Section 2.3 | Per-tier VMD decomposition | Signal processing, offline | ADMM on training-window prices, rolling re-fit on calibration and test |
| Section 2.2 | Markov-switching regime detection | Classical statistical, offline | Expectation–maximisation on training-window returns |
| Section 2.4.1 | Tier-level BiLSTM encoders | Differentiable neural, joint training | Backpropagation on Equation (14) |
| Section 2.4.2 | Directed cross-tier attention (C1) | Differentiable neural, joint training | Backpropagation on Equation (14) |
| Section 2.5 | Regime-informed modal weighting (C2) | Differentiable neural, joint training | Backpropagation on Equation (14) |
| Section 2.7 | Sample-ratio constraint projection (C3) | Linear projection, post hoc | Calibration-window residuals |

| Variant | Tier F | | | Tier R | | | Tier G | | |
|---|---|---|---|---|---|---|---|---|---|
| Full | 0.116 | 0.116 | 0.126 | 0.117 | 0.115 | 0.129 | 0.141 | 0.130 | 0.117 |
| E1 (no regime) | 0.116 | 0.116 | 0.126 | 0.117 | 0.115 | 0.129 | 0.141 | 0.130 | 0.117 |
| E2 (no attn) | 0.121 | 0.123 | 0.128 | 0.115 | 0.116 | 0.130 | 0.133 | 0.122 | 0.112 |

| Split | Date Range | N (Days) |
|---|---|---|
| Training | 2 May 2018 to 30 June 2023 | 1348 |
| Calibration | 3 July 2023 to 31 December 2024 | 392 |
| Test | 2 January 2025 to 4 March 2026 | 298 |

| Tier × Horizon | RMSE (USD/kg) |
|---|---|
| F (global futures), | |
| F, | |
| F, | |
| R (regional spot), | |
| R, | |
| R, | |
| G (farm-gate), | |
| G, | |
| G, |

| Tier | h | Random Walk | ARIMA(2,0,2) | NHITS | Exog-LSTM | HVB-RA |
|---|---|---|---|---|---|---|
| F | 1 | |||||
| F | 5 | |||||
| F | 21 | |||||
| R | 1 | |||||
| R | 5 | |||||
| R | 21 | |||||
| G | 1 | |||||
| G | 5 | |||||
| G | 21 |

| Tier | h | Full | E1 (No Regime) | E2 (No Attn) | (E2−Full) |
|---|---|---|---|---|---|
| F | 1 | ||||
| F | 5 | ||||
| F | 21 | ||||
| R | 1 | ||||
| R | 5 | ||||
| R | 21 | ||||
| G | 1 | ||||
| G | 5 | ||||
| G | 21 |

| Method | F RMSE | R RMSE | G RMSE | Avg RMSE | Constraint |
|---|---|---|---|---|---|
| Base (unreconciled) | |||||
| Bottom–up | |||||
| Top–down | |||||
| Static constraint projection | |||||
| Regime-conditional projection |

| Method | IS | (of 278) |
|---|---|---|
| Random walk (always sell) | ||
| HVB-RA ( forecast) |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Pinitjitsamut, M. A Modular Knowledge-Extraction Framework for Deep Learning Forecasts of Multi-Tier Commodity Prices. Mach. Learn. Knowl. Extr. 2026, 8, 185. https://doi.org/10.3390/make8070185
